Category Archives: Learning Big Data

Making a Hertzsprung-Russell diagram from Gaia DR2 data with Elasticsearch

Elasticsearch was one of the open source products on my list to try out, ever since I got rejected for a couple of assignments as a consultant last year. Apparently it’s a popular product. But why do you need a …

Posted in Learning Big Data, NoSQL

Building HDP 2.6 on AWS, Part 3: the worker nodes

This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server, plus three master nodes for the Hadoop NameNodes and similar services. Now …

Posted in Howto, Learning Big Data

I feel great when I study

When I started studying Hadoop, Python and machine learning in 2016, I discovered something I didn’t expect: I feel better when I study. When I finished another problem, exam or course, and I stepped outside the house to …

Posted in Learning Big Data

Recovering your HDP 2.6.1 Sandbox on VirtualBox after a restart

If you’ve worked with one of the later versions of the Hortonworks Data Platform 2.x sandbox in VirtualBox and shut it down rather abruptly, you may have noticed that it won’t get past the startup screen when you try to start it again …

Posted in Apache Products for Outsiders, Howto, Learning Big Data

Tutorial: Let’s throw some asteroids in Apache Hive

This is a tutorial on how to import fixed-length data into Apache Hive (in Hortonworks Data Platform 2.6.1). The idea is that anyone who isn’t Hive- or Hadoop-savvy can follow along, so let me know if I succeeded (make …

Posted in Apache Products for Outsiders, Learning Big Data

Hadoop High Availability In A Hurry – Part 2: YARN

If you don’t know much about YARN or why it’s called a data operating system, you’re in luck. I found it necessary to explain how YARN works before I could explain the solutions for high availability. At first YARN …

Posted in Apache Products for Outsiders, Learning Big Data

Hadoop High Availability In A Hurry – Part 1: HDFS

I’ve spent a couple of hours studying how Hadoop high availability works for the HDPCA exam, and I’ve condensed that knowledge into a video on HDFS HA that runs just under 9 minutes. Enjoy!

Posted in Apache Products for Outsiders, Learning Big Data

Certifying as HDP Certified Administrator

Let’s talk about certification: the thing by which you try to show potential employers and customers that you actually know what you are doing at work. Until last Tuesday, my only experience with IT product certifications was with Oracle’s …

Posted in Learning Big Data

Building HDP 2.6 on AWS, Part 2: the master nodes

This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the …

Posted in Howto, Learning Big Data

Building HDP 2.6 on AWS, Part 1: the edge node

Installing Hortonworks Data Platform 2.6 on Amazon Web Services (Amazon’s cloud platform), how hard could it be? It’s click, click, next, next, confirm, right? Well-lll, not quite. Especially if HDP or AWS is new to you. There are many steps …

Posted in Howto, Learning Big Data