Tag Archives: Hadoop
I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.
TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark
How it started
We at DIKW are working on a Certified Data Engineering … Continue reading
Posted in Howto, Learning Big Data, Spark
Tagged Apache Spark, Big Data Europe, DIKW, Docker, docker-compose, Hadoop, Hive
8 Comments
Building HDP 2.6 on AWS, Part 3: the worker nodes
This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server and three master nodes for the Hadoop name nodes and such. Now … Continue reading
Posted in Howto, Learning Big Data
Tagged Amazon Web Services, AWS, cloning nodes, Hadoop, HDP, Hortonworks Data Platform, Ubuntu Server, worker nodes
Hadoop in a Hurry – Security
Hadoop security involves many products and features. What do all of them do? This video gives a high-level overview.
Hadoop High Availability In A Hurry – Part 2: YARN
If you don’t know a lot about YARN and why it’s called a data operating system, you’re in luck. I found it necessary to explain how YARN works before I could explain the solutions for high availability. At first YARN … Continue reading
Posted in Apache Products for Outsiders, Learning Big Data
Tagged Application Master, Container, Hadoop, Node Manager, Resource Manager, YARN, ZooKeeper
1 Comment
Hadoop High Availability In A Hurry – Part 1: HDFS
I’ve spent a couple of hours studying how Hadoop high availability works, for the HDPCA exam. Now I’ve condensed that knowledge into a video on HDFS HA that runs just under 9 minutes. Enjoy!
Posted in Apache Products for Outsiders, Learning Big Data
Tagged DataNode, edits file, Fencing, fsimage, Hadoop, HDFS, High availability, JournalNode, NameNode, Split brain, ZKFC, ZooKeeper
1 Comment
Building HDP 2.6 on AWS, Part 2: the master nodes
This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the … Continue reading
Posted in Howto, Learning Big Data
Tagged Amazon Web Services, AWS, cloning nodes, Hadoop, HDP, Hortonworks Data Platform, master node, Ubuntu Server
5 Comments
Building HDP 2.6 on AWS, Part 1: the edge node
Installing Hortonworks Data Platform 2.6 on Amazon Web Services (Amazon’s cloud platform), how hard could it be? It’s click, click, next, next, confirm, right? Well-lll, not quite. Especially if HDP or AWS is new to you. There are many steps … Continue reading
Posted in Howto, Learning Big Data
Tagged Amazon Web Services, AWS, edge node, Hadoop, HDP, Hortonworks Data Platform, Ubuntu Server
2 Comments
Dataworks Summit München 2017 – day two
Day two started with more keynotes. Ross Porter of Dell EMC talked about the ingredients of a successful analytics project. Carlo Vaiti of HP Enterprise had an interesting talk about trends in big data, but I would advise him to … Continue reading
Posted in Events
Tagged Apache NiFi, Apache Storm, Big Data, Big Data sizing, Dataworks Summit, ElasticSearch, Hadoop, HopsFS, Machine Learning, München
1 Comment
Dataworks Summit München 2017 – day one
Just left the beergarten party at Dataworks Summit 2017 in München. Okay, let’s see how well I blog after three of these large beers. Luckily I took notes before. Tell me when I start to become incoherent. So actually for … Continue reading
Posted in Uncategorized
Tagged Apache Ranger, DWS 2017, Hadoop, Hive, München, open source
2 Comments