Tag Archives: Docker
My Github repo got 50 stars
I never imagined myself as the maintainer of a data engineering-related open source project, yet here I am. When I was working on our data engineering course, I needed some kind of data lake software. At first I used the Cloudera … Continue reading
Posted in Apache Products for Outsiders, Data engineering
Tagged Docker, docker-compose, Github, Hadoop, stars
Gaining insights on my workout data with Apache Superset
For a few years I’ve been gathering data on my workouts. In Excel. It’s not exactly state-of-the-art data architecture, but it was fine for a while. But data alone doesn’t do much. I wanted some questions answered. … Continue reading
Posted in Apache Products for Outsiders, Howto
Tagged Apache Superset, DATETIME, Docker, docker-compose, health data, PostgreSQL
I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.
TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark [Update 2021-11-09: Since Docker Desktop turned “Expose daemon on tcp://localhost:2375 without TLS” off by … Continue reading
Posted in Howto, Spark
Tagged Apache Spark, Big Data Europe, DIKW, Docker, docker-compose, Hadoop, Hive
23 Comments
Dataworks Summit Berlin 2018, day two
Back for round two of keynotes, good technical sessions and discussing them with fellow data specialists in between. Keynotes: First up was Frank Säuberlich from Teradata, who had an interesting example of machine learning for fraud detection at Danske Bank. … Continue reading
Posted in Conferences, Events
Tagged Apache Atlas, Apache Metron, Apache Ranger, Data Steward Studio, Dataworks Summit, Docker, GDPR, Personal data, Roaring Elephant podcast, Spark, Synerscope, TPC-H