Tag Archives: Hive

I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.

TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark [Update 2021-11-09: Since Docker Desktop turned “Expose daemon on tcp://localhost:2375 without TLS” off by … Continue reading

Posted in Howto, Spark | 23 Comments

Recovering your HDP 2.6.1 Sandbox on VirtualBox after a restart

If you’ve worked with the Hortonworks Data Platform 2.x sandbox or later versions in VirtualBox and made it shut down rather vigorously, you might have noticed that you won’t get past this startup screen when you try to start it up … Continue reading

Posted in Apache Products for Outsiders, Howto | 2 Comments

Tutorial: Let’s throw some asteroids in Apache Hive

This is a tutorial on how to import fixed-length data into Apache Hive (in Hortonworks Data Platform 2.6.1). The idea is that people who are not Hive or Hadoop savvy can follow along, so let me know if I succeeded (make … Continue reading

Posted in Apache Products for Outsiders | Leave a comment

Dataworks Summit München 2017 – day one

Just left the beer garden party at Dataworks Summit 2017 in München. Okay, let’s see how well I blog after three of these large beers. Luckily I took notes before. Tell me when I start to become incoherent. So actually for … Continue reading

Posted in Uncategorized | 2 Comments