Plotting video locations from my Sony camera in Python

Two years ago I bought a Sony FDR-X3000 action cam to record video on my bike rides, and I’m really happy with it. It’s great to relive my rides in 4K, going downhill for kilometers from some col I climbed. I also make compilation videos for fellow cyclists. Like these:

(more…)

By Marcel-Jan Krijgsman, 4 years ago (May 3, 2022)

Apache Products for Outsiders

Gaining insights on my workout data with Apache Superset

For a few years I’ve been gathering data on my workouts. In Excel. It’s not exactly state-of-the-art data architecture, but it was fine for a while. But data alone doesn’t do much. I wanted some questions answered. Lately I’ve been hearing a lot about Apache Superset. (Well, Read more

By Marcel-Jan Krijgsman, 4 years ago

Howto

I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.

TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark [Update 2021-11-09: Since Docker Desktop turned “Expose daemon on tcp://localhost:2375 without TLS” off by default there have been all kinds of connection problems running Read more

By Marcel-Jan Krijgsman, 5 years ago (October 25, 2020)

Howto

A humidity sensor network on a Raspberry Pi with Zigbee2MQTT

I was looking for a way to detect leakage in my apartment with some kind of IoT solution. Someone on the Dutch technology forum Tweakers.net told me Xiaomi humidity sensors, combined with Zigbee2MQTT, might be a good fit. The sensors are quite cheap and so is the CC2531 sniffer Read more

By Marcel-Jan Krijgsman, 6 years ago

Active Learning

Neo4J: Loading rocket data in a graph database

When I first learned about graph databases, like Neo4J, I didn’t get it. That’s how I always start with new technology: not getting at all why people get so enthusiastic about it. Then I read “Seven Databases in Seven Weeks, 2nd edition” (as reviewed in January). It describes Neo4J as Read more

By Marcel-Jan Krijgsman, 7 years ago (June 8, 2019)

Howto

Showing a complex Excel sheet who’s boss with Python and pandas

Data engineering isn’t always creating serverless APIs and ingesting terabyte-a-minute streams with doohickeys on Kubernetes. Sometimes people just want their Excel sheet in the data lake. Is that big data? Not even close. It’s very small. But for some people it’s a first step in a data-driven world.

But does Hadoop read Excel? Not to my knowledge. But NiFi, that wonderful open source data flow software, has an Excel processor. It can even help you rework the data a little. But some Excel sheets need too much reworking, and that’s too big a job for NiFi. I’ve used Python and the pandas library to create a CSV file that Hadoop can handle.

(more…)

By Marcel-Jan Krijgsman, 7 years ago (March 8, 2019)

Howto

Building HDP 2.6 on AWS, Part 3: the worker nodes

This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server and three master nodes for Hadoop name nodes and such. Now we need worker nodes for processing the data.

Creating the worker nodes is not that much different from creating the master nodes. But the workers need more powerful instances.

Creating the first worker node

Log in to Amazon Web Services again, in the same AWS region as the edge and master nodes. We start with one worker node and clone 2 more later on. Go to the EC2 dashboard in the AWS interface and click “Launch instance”. Then choose Ubuntu Server 16.04 from the Amazon Machine Images. (more…)

By Marcel-Jan Krijgsman, 8 years ago (April 10, 2018)

Apache Products for Outsiders

Recovering your HDP 2.6.1 Sandbox on VirtualBox after a restart

If you’ve worked with later versions of the Hortonworks Data Platform 2.x sandbox in VirtualBox and made it shut down rather vigorously, you might have noticed that you won’t get past this startup screen the next time you try to start it:

I had this a couple of times and that’s why I decided to pause my sandbox every time and save it before shutting down my laptop. But yesterday Windows 10 decided to step in. After a day of studying it was high time for me to have dinner, during which I kept the laptop on. Little did I know that Windows 10 decided at that moment to update and restart. And to do this, it needed to shut down every application. Including VirtualBox. When I came back I found out to my horror that my carefully prepared HDP sandbox had been shut down in the roughest of ways. Thanks, Microsoft! (more…)

By Marcel-Jan Krijgsman, 8 years ago (November 17, 2017)

Howto

Fun with Data: Python and space rocks!

Last week I had a little fun playing with Python, the pandas and matplotlib libraries, and a JSON file with asteroid data. Here is what I did.

By Marcel-Jan Krijgsman, 8 years ago (August 7, 2017)

Howto

Building HDP 2.6 on AWS, Part 2: the master nodes

This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the master nodes.

Creating the first master node

Make sure you are logged in to Amazon Web Services, in the same AWS region as the edge node. To create 3 master nodes, we have to start with one. Once again we go to the EC2 dashboard in the AWS interface and click “Launch instance”. And again we have a choice of Amazon Machine Images and again we choose Ubuntu Server 16.04.

(more…)

By Marcel-Jan Krijgsman, 9 years ago (May 26, 2017)