Tag Archives: Docker

My Github repo got 50 stars

I never imagined myself as a maintainer of a data-engineering-related open source thing. Yet. But when I was working on our data engineering course, I needed some kind of data lake software. At first I used the Cloudera …

Posted in Apache Products for Outsiders, Data engineering

Gaining insights on my workout data with Apache Superset

For a few years I’ve been gathering data on my workouts. In Excel. It’s not exactly state-of-the-art data architecture, but it was fine for a while. But data alone doesn’t do much. I wanted some questions answered. …

Posted in Apache Products for Outsiders, Howto

I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.

TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark [Update 2021-11-09: Since Docker Desktop turned “Expose daemon on tcp://localhost:2375 without TLS” off by …

Posted in Howto, Spark | 23 Comments

Dataworks Summit Berlin 2018, day two

Back for round two of keynotes, good technical sessions and discussing them with fellow data specialists in between. Keynotes: First up was Frank Säuberlich from Teradata, who had an interesting example of machine learning for fraud detection at Danske Bank. …

Posted in Conferences, Events