Author Archives: Marcel-Jan Krijgsman

About Marcel-Jan Krijgsman

In 2017 I made the leap to Big Data after 20 years of experience with Oracle databases. I followed courses on Hadoop, Big Data Analytics, Machine Learning and Python, MongoDB and Elasticsearch.

Showing a complex Excel sheet who’s boss with Python and pandas

Data engineering isn’t always creating serverless APIs and ingesting terabyte-a-minute streams with doohickeys on Kubernetes. Sometimes people just want their Excel sheet in the data lake. Is that big data? Not even close. It’s very small. But for … Continue reading

Posted in Howto, Python

Book review: Seven Databases in Seven Weeks

There are so many data-related open source products nowadays. On the one hand that’s great. On the other hand it’s hard for one human to grasp them all. To be sure, there’s great documentation on them all. And there are … Continue reading

Posted in Active Learning

Doing data science on my health data in R Studio – Part 1

Up to seven years ago my doctor would nag me every six months that I should lose some weight. Nagging didn’t work on me that much. What did work, however, was competition. I wanted to become faster in a bike … Continue reading

Posted in Learning Big Data, Weird experiments

Check your /tmp on HDFS

If you have sensitive data on your Hadoop cluster, you might want to check /tmp on HDFS once in a while to see what ends up there. /tmp is used by several components. Hive for example stores its “scratch data” there. … Continue reading

Posted in Learning Big Data

Tech dossier: Elasticsearch / the ELK stack

Because tech is moving so fast, I’ve been keeping dossiers in Evernote of open source products I have to learn more about, which I’ve decided to put on my blog. My last one was about Kubernetes. This one is about … Continue reading

Posted in Tech dossier

Tech dossier: Kubernetes

Because tech is moving so fast, I’ve been keeping dossiers in Evernote of open source products I have to learn more about. Like Kubernetes. This morning I suddenly thought this would be perfect for a blog... if properly organized. My … Continue reading

Posted in Kubernetes, Tech dossier

Making a Hertzsprung-Russell diagram from Gaia DR2 data with Elasticsearch

Elasticsearch was one of the open source products on my list to try out, ever since I got rejected for a couple of assignments as a consultant last year. Apparently it’s a popular product. But why do you need a … Continue reading

Posted in Learning Big Data, NoSQL

Codemotion Amsterdam 2018, day two

Back on the ferry to the north of Amsterdam I went, back for day two of Codemotion Amsterdam 2018. Keynote speaker Daniel Gebler from PicNic told us about what they are doing today to bring groceries home for people. I’ve seen … Continue reading

Posted in Conferences, Events

Codemotion Amsterdam 2018, day one

Last Friday I almost felt I had to explain to a colleague that I don’t always win raffles and lotteries. Because yep, I won another ticket. Again via the Roaring Elephant podcast. It’s pretty worthwhile listening to them, is all I’m … Continue reading

Posted in Conferences, Events

Starting at Port of Rotterdam per 1 May 2018

Next week (1 May 2018) I will start as a Hadoop specialist/data steward/data custodian/data something something in the Advanced Analytics team at Port of Rotterdam. We haven’t worked out a fancy data something title yet. I’m already working at this … Continue reading

Posted in Uncategorized