Tag Archives: Hadoop
My Github repo got 50 stars
I never imagined myself as a maintainer of a data engineering-related open source thing. Yet. But when I was working on our data engineering course, I needed some kind of data lake software. At first I used the Cloudera … Continue reading
Posted in Apache Products for Outsiders, Data engineering, Learning Big Data
Tagged Docker, docker-compose, Github, Hadoop, stars
What a year 2021 has been
So at the end of 2021 I found myself in the waiting room of an emergency dentist. An infection above my front teeth had become unbearable. Fortunately, antibiotics make my life much better now. Let that event not colour my view … Continue reading
Posted in Active Learning, Data engineering
Tagged astronomy, Certified Data Engineering Professional, cycling, Github, Hadoop, Kupka, Paris, vacation
I built a working Hadoop-Spark-Hive cluster on Docker. Here is how.
TL;DR: I made a Docker Compose setup that runs Hadoop, Spark and Hive in a multi-container environment. You can find the necessary files for it here: https://github.com/Marcel-Jan/docker-hadoop-spark [Update 2021-11-09: Since Docker Desktop turned “Expose daemon on tcp://localhost:2375 without TLS” off by … Continue reading
Posted in Howto, Learning Big Data, Spark
Tagged Apache Spark, Big Data Europe, DIKW, Docker, docker-compose, Hadoop, Hive
17 Comments
Building HDP 2.6 on AWS, Part 3: the worker nodes
This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server and three master nodes for Hadoop name nodes and such. Now … Continue reading
Posted in Howto, Learning Big Data
Tagged Amazon Web Services, AWS, cloning nodes, Hadoop, HDP, Hortonworks Data Platform, Ubuntu Server, worker nodes
Hadoop in a Hurry – Security
Hadoop security involves so many products and features. What do all of them do? This video gives a high-level overview.
Hadoop High Availability In A Hurry – Part 2: YARN
If you don’t know a lot about YARN and why it’s called a data operating system, you’re in luck. I found it necessary to explain how YARN works before I could explain the solutions for high availability. At first YARN … Continue reading
Posted in Apache Products for Outsiders, Learning Big Data
Tagged Application Master, Container, Hadoop, Node Manager, Resource Manager, YARN, ZooKeeper
1 Comment
Hadoop High Availability In A Hurry – Part 1: HDFS
I’ve spent a couple of hours studying how Hadoop high availability works for the HDPCA exam. And now I’ve condensed that knowledge into a video on HDFS HA in just under 9 minutes. Enjoy!
Posted in Apache Products for Outsiders, Learning Big Data
Tagged DataNode, edits file, Fencing, fsimage, Hadoop, HDFS, High availability, JournalNode, NameNode, Split brain, ZKFC, ZooKeeper
1 Comment
Building HDP 2.6 on AWS, Part 2: the master nodes
This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the … Continue reading
Posted in Howto, Learning Big Data
Tagged Amazon Web Services, AWS, cloning nodes, Hadoop, HDP, Hortonworks Data Platform, master node, Ubuntu Server
5 Comments
Building HDP 2.6 on AWS, Part 1: the edge node
Installing Hortonworks Data Platform 2.6 on Amazon Web Services (Amazon’s cloud platform), how hard could it be? It’s click, click, next, next, confirm, right? Well-lll, not quite. Especially if HDP or AWS is new to you. There are many steps … Continue reading
Posted in Howto, Learning Big Data
Tagged Amazon Web Services, AWS, edge node, Hadoop, HDP, Hortonworks Data Platform, Ubuntu Server
2 Comments