Tag Archives: Apache Spark

I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.

TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files here: https://github.com/Marcel-Jan/docker-hadoop-spark

How it started

We at DIKW are working on a Certified Data Engineering …

Posted in Howto, Learning Big Data, Spark

Book review: Spark in Action, 2nd edition

There are lots of books on Spark, but not many aimed at data engineers. Data engineers use Spark to ingest and transform data, which is different from what data scientists use it for. On the Roaring Elephant …

Posted in Data engineering, Spark