Tag Archives: Apache Spark
I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.
TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark
How it started
We at DIKW are working on a Certified Data Engineering … Continue reading
Posted in Howto, Learning Big Data, Spark
Tagged Apache Spark, Big Data Europe, DIKW, Docker, docker-compose, Hadoop, Hive
8 Comments
Book review: Spark in Action, 2nd edition
There are lots of books on Spark, but not many that are aimed at the data engineer. Data engineers use Spark to ingest and transform data, which is different from what data scientists use it for. On the Roaring Elephant … Continue reading
Posted in Data engineering, Spark
Tagged Apache Spark, Jean-Georges Perrin, Roaring Elephant podcast, Spark
2 Comments