Finding if exercising works with RStudio
Does exercising make me lose weight or body fat? I’ve gathered 6 years of health data (on myself) and tried using RStudio to tease out if exercise works. Answer: probably, maybe.
Today I talked about how I became a Hadoop specialist/data engineer at the ITNEXT Data Engineering & DevOps meetup.
Here are a couple of links that were or were not in my presentation:
The (what I call) “hype-o-meter” site from YCombinator: https://news.ycombinator.com/
Coursera (fixed-date courses): Coursera.org
Udacity (self-paced courses): Udacity.org
Udemy (non-MOOC course site with crazy discounts): udemy.com
MOOC search engine: class-central.com
MongoDB University (free as long as it’s MongoDB 🙂 ): university.mongodb.com
Elasticsearch was one of the open source products on my list to try out, ever since I got rejected for a couple of assignments as a consultant last year. Apparently it’s a popular product. But why do you need a search engine in a Big Data architecture? This I explain Read more
Back on the ferry to the north of Amsterdam I went, back for day two of Codemotion Amsterdam 2018.

Daniel Gebler from PicNic told us about what they are doing today to bring groceries home for people. I’ve seen two presentations by PicNic before and I could really see their progress from session to session.
Daniel explained how they use a recommender system to let customers buy their most common groceries with one tap in the PicNic app. That is actually hard. Even with 90% precision on the prediction for a single item, errors compound over a whole set (if you treat the items as independent): for 12 items you get only about 28% precision (0.9^12), and for 20 items about 12%. So they really had to work to get a much better precision per item. They managed to do that by working with two dimensions of data: big and deep data. (more…)
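The compounding of per-item precision over a whole basket can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the prediction for each item is independent of the others (that assumption is mine, not something stated in the talk); the function name basket_precision is made up for illustration.

```python
def basket_precision(per_item_precision: float, n_items: int) -> float:
    """Chance that every item in a basket is predicted correctly.

    Assumes the per-item predictions are independent, so the
    probabilities simply multiply.
    """
    if not 0.0 <= per_item_precision <= 1.0:
        raise ValueError("per_item_precision must be between 0 and 1")
    if n_items < 0:
        raise ValueError("n_items must not be negative")
    return per_item_precision ** n_items


if __name__ == "__main__":
    # Compare a few per-item precisions for baskets of 12 and 20 items.
    for precision in (0.90, 0.95, 0.99):
        for size in (12, 20):
            result = basket_precision(precision, size)
            print(f"per item {precision:.2f}, {size} items: {result:.1%}")
```

Running it shows why the precision per item has to be pushed well above 90% before a one-tap basket becomes usable.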
Last Friday I almost felt I had to explain to a colleague that I don’t always win raffles and lotteries. Because yep, I won another ticket. Again via the Roaring Elephant podcast. It’s pretty worthwhile listening to them, is all I’m saying.
This was a ticket for CodeMotion Amsterdam 2018. CodeMotion is a conference for developers with topics like the blockchain, Big Data, Internet of Things, DevOps, software architectures, but also front-end development, game development and AR/VR.

Amsterdam from the ferry to the north of the city.
Next week (1 May 2018) I will start as a Hadoop specialist/data steward/data custodian/data something something at the Advanced Analytics team at Port of Rotterdam. We haven’t worked out a fancy data something title yet. I’m already working at this team as a consultant. I’ve been involved with security and data governance of the data lake (for people outside Big Data: a data lake is simply a Hadoop cluster).

The World Port Center
Back for round two of keynotes, good technical sessions and discussing them with fellow data specialists in between. Keynotes First up was Frank Säuberlich from Teradata, who had an interesting example of machine learning for fraud detection at Danske Bank. They used transaction data sort of as pixels and ran Read more
I’m back at Dataworks Summit this year. This time I didn’t win a ticket, but my new employer, Port of Rotterdam, arranged for me to go. Pretty cool, because I did not want to miss it. This time it’s happening in Berlin.

It started with keynotes. Scott Gnau from Hortonworks announced Data Steward Studio for better data governance. Scott’s message was that your data strategy is your cloud strategy is your business strategy. You should not see them as totally different things. (more…)
This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server, three master nodes for Hadoop name nodes and such. Now we need worker nodes for processing the data.
Creating the worker nodes is not that much different from creating the master nodes. But the workers need more powerful instances.
Log in at Amazon Web Services again, in the same AWS region as the edge and master nodes. We start with one worker node and clone 2 more later on. Go to the EC2 dashboard in the AWS interface and click “Launch instance”. Then choose Ubuntu Server 16.04 from the Amazon Machine Images. (more…)
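For readers who prefer scripting over clicking, the same "Launch instance" step could be sketched with boto3, the AWS SDK for Python. This is not how the series does it (the series uses the web console); it is a rough equivalent under assumptions. The AMI ID, instance type, key pair name and region are all placeholders you would have to fill in yourself, and the helper names worker_launch_params and launch_worker are made up for illustration.

```python
from typing import Any, Dict


def worker_launch_params(ami_id: str, instance_type: str, key_name: str) -> Dict[str, Any]:
    """Build the arguments for launching exactly one worker node.

    ami_id should point at an Ubuntu Server 16.04 image in your region;
    instance_type and key_name are your own choices (not from the post).
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "KeyName": key_name,
        "MinCount": 1,  # start with a single worker,
        "MaxCount": 1,  # the other two are cloned later
    }


def launch_worker(region: str, params: Dict[str, Any]) -> str:
    """Launch the instance and return its instance ID.

    Needs the third-party boto3 package and configured AWS credentials.
    Use the same region as the edge and master nodes.
    """
    import boto3  # imported here so the helper above works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]
```

Keeping the parameters in a plain dictionary makes it easy to review them before anything is actually launched (and billed).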
When I started studying Hadoop, Python and machine learning in 2016, I found something out that I didn’t expect. I feel better when I study. When I finished another problem, exam or course, and I stepped outside the house to do some shopping or to go to work, I felt Read more