Fun with Data: Python and space rocks!

Last week I had a little fun playing with Python, the pandas and matplotlib libraries, and a JSON file with asteroid data. Here is what I did.
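To give a rough idea of that kind of exploration, here is a minimal pandas sketch. It assumes the JSON file is an array of records with hypothetical `name`, `diameter_km` and `hazardous` fields. The schema and the sample values are made up for illustration and are not the real asteroid data.

```python
import json

import pandas as pd

# Hypothetical sample data: the field names and values are assumptions,
# not the schema or contents of the real asteroid file.
SAMPLE_JSON = """
[
  {"name": "Alpha", "diameter_km": 1.2, "hazardous": false},
  {"name": "Beta", "diameter_km": 0.4, "hazardous": true},
  {"name": "Gamma", "diameter_km": 3.1, "hazardous": false}
]
"""


def load_asteroids(text: str) -> pd.DataFrame:
    """Parse a JSON array of records into a DataFrame (one row per asteroid)."""
    return pd.DataFrame(json.loads(text))


def largest(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Return the n asteroids with the largest diameter, biggest first."""
    return df.nlargest(n, "diameter_km")


def mean_diameter_by_hazard(df: pd.DataFrame) -> pd.Series:
    """Average diameter per value of the (hypothetical) hazardous flag."""
    return df.groupby("hazardous")["diameter_km"].mean()


if __name__ == "__main__":
    asteroids = load_asteroids(SAMPLE_JSON)
    print(largest(asteroids))
    print(mean_diameter_by_hazard(asteroids))
```

With matplotlib installed, something like `df.plot(kind="bar", x="name", y="diameter_km")` would turn the same DataFrame into a chart.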

Posted in Howto, Python

Hadoop High Availability In A Hurry – Part 2: YARN

If you don’t know a lot about YARN and why it’s called a data operating system, you’re in luck. I found it necessary to explain how YARN works before I could explain the solutions for high availability.

At first YARN High Availability seemed like a different beast from HDFS High Availability. But when I read more about the topic, I found out that the solutions are actually very similar. Enjoy!
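One place the resemblance shows is the configuration. Below is a minimal sketch, assuming Hadoop 2.x property names (HDP 2.6 ships Hadoop 2.7), that builds the core yarn-site.xml settings for ResourceManager HA: ResourceManagers with logical ids, plus a ZooKeeper quorum they use to decide which one is active. The host names and cluster id are hypothetical, and a real cluster needs more settings than these; on HDP, Ambari's HA wizard would normally write them for you.

```python
import xml.etree.ElementTree as ET


def yarn_rm_ha_properties(cluster_id, rm_hosts, zk_quorum):
    """Build a minimal subset of yarn-site.xml properties for ResourceManager HA.

    rm_hosts: hostnames of the ResourceManagers (one active, the rest standby).
    zk_quorum: "host:port" entries of the ZooKeeper ensemble the RMs use
    to elect the active ResourceManager.
    """
    rm_ids = [f"rm{i}" for i in range(1, len(rm_hosts) + 1)]
    props = {
        "yarn.resourcemanager.ha.enabled": "true",
        "yarn.resourcemanager.cluster-id": cluster_id,
        "yarn.resourcemanager.ha.rm-ids": ",".join(rm_ids),
        "yarn.resourcemanager.zk-address": ",".join(zk_quorum),
    }
    # One hostname entry per logical RM id, e.g. yarn.resourcemanager.hostname.rm1
    for rm_id, host in zip(rm_ids, rm_hosts):
        props[f"yarn.resourcemanager.hostname.{rm_id}"] = host
    return props


def to_hadoop_xml(props):
    """Render a property dict in Hadoop's <configuration><property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical host names; replace them with your own master nodes.
    masters = ["master1.example.com", "master2.example.com", "master3.example.com"]
    props = yarn_rm_ha_properties(
        "hdp-cluster", masters[:2], [f"{h}:2181" for h in masters]
    )
    print(to_hadoop_xml(props))
```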

Posted in Apache Products for Outsiders, Learning Big Data

Hadoop High Availability In A Hurry – Part 1: HDFS

For the HDPCA exam, I’ve spent a couple of hours studying how Hadoop high availability works. Now I’ve condensed that knowledge into a video on HDFS HA of just under 9 minutes. Enjoy!
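For readers who prefer configuration to video, here is a minimal sketch of the settings involved in HDFS HA with the Quorum Journal Manager: a logical nameservice with two NameNodes, a set of JournalNodes that share the edit log, and ZooKeeper for automatic failover. It only builds Python dicts of hdfs-site.xml and core-site.xml properties. The nameservice id and host names are hypothetical, the ports are common defaults, and a working cluster needs more settings (HTTP addresses, for example).

```python
def hdfs_ha_properties(nameservice, namenodes, journalnodes, zk_hosts):
    """Build a minimal subset of HDFS HA settings using the Quorum Journal Manager.

    namenodes: mapping of logical NameNode id -> host (one active, one standby).
    journalnodes / zk_hosts: host names of the JournalNodes and ZooKeeper servers.
    Returns a (hdfs_site, core_site) pair of dicts.
    """
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # The active NameNode writes its edit log to the JournalNodes;
        # the standby reads from them to stay in sync.
        "dfs.namenode.shared.edits.dir": "qjournal://"
        + ";".join(f"{h}:8485" for h in journalnodes)
        + f"/{nameservice}",
        # Lets clients try each NameNode until they find the active one.
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
        # ZKFailoverController processes handle automatic failover via ZooKeeper.
        "dfs.ha.automatic-failover.enabled": "true",
        # Assumption: a no-op fence, often used with QJM; check your own setup.
        "dfs.ha.fencing.methods": "shell(/bin/true)",
    }
    for nn_id, host in namenodes.items():
        hdfs_site[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    core_site = {
        # Clients address the logical nameservice, not a single NameNode host.
        "fs.defaultFS": f"hdfs://{nameservice}",
        "ha.zookeeper.quorum": ",".join(f"{h}:2181" for h in zk_hosts),
    }
    return hdfs_site, core_site


if __name__ == "__main__":
    # Hypothetical host names for three master nodes.
    masters = ["master1", "master2", "master3"]
    hdfs_site, core_site = hdfs_ha_properties(
        "mycluster", {"nn1": "master1", "nn2": "master2"}, masters, masters
    )
    for key, value in {**hdfs_site, **core_site}.items():
        print(f"{key} = {value}")
```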

Posted in Apache Products for Outsiders, Learning Big Data

Certifying as HDP Certified Administrator

Let’s talk about certification: the thing with which you try to show potential employers and customers that you actually know what you are doing at work. Up to last Tuesday, my only experience with IT product certifications was Oracle’s Certified Professional program. I’ve been an OCP for the database from 8i to 11g, and I’m also an 11g Database Performance Tuning Certified Expert. But all these exams were mainly multiple choice, and to really test your knowledge they often contained obscure material that you would rarely use. I’ll never forget the question about v$waitstat in one of these exams… well, I digress.

OCP wasn’t exactly embraced by all Oracle DBAs either. A lot of experienced DBAs saw it more as a way for inexperienced DBAs to show they… knew how to learn lots of facts about Oracle databases. Companies with lots of inexperienced DBAs loved it, hoping it would entice customers to invite their otherwise green “medior” DBAs.


Posted in Learning Big Data

Building HDP 2.6 on AWS, Part 2: the master nodes

This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the master nodes.

Creating the first master node

Make sure you are logged in to Amazon Web Services, in the same AWS region as the edge node. To create 3 master nodes, we start with one. Once again we go to the EC2 dashboard in the AWS console and click “Launch instance”. Again we get a choice of Amazon Machine Images, and again we choose Ubuntu Server 16.04.
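The console clicks can also be scripted. Below is a minimal sketch using boto3's `run_instances`, assuming boto3 is installed and AWS credentials are configured. The AMI id, instance type, key pair, security group and region are hypothetical placeholders; you would look up the Ubuntu Server 16.04 AMI id for your own region. Building the request is kept separate from sending it, so the parameters can be checked without touching AWS.

```python
def master_node_request(ami_id, instance_type, key_name, security_group_id, name):
    """Build keyword arguments for EC2 run_instances for a single master node."""
    return {
        "ImageId": ami_id,  # Ubuntu Server 16.04 AMI id for your region (placeholder)
        "InstanceType": instance_type,
        "KeyName": key_name,
        "SecurityGroupIds": [security_group_id],
        # Start with one master node, as in the console walkthrough.
        "MinCount": 1,
        "MaxCount": 1,
        "TagSpecifications": [
            {
                "ResourceType": "instance",
                "Tags": [{"Key": "Name", "Value": name}],
            }
        ],
    }


def launch(region, request):
    """Send the request to EC2 and return the new instance id."""
    import boto3  # imported here so building the request needs no AWS SDK

    # Use the same region as the edge node.
    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    request = master_node_request(
        ami_id="ami-xxxxxxxx",  # hypothetical placeholder
        instance_type="m4.xlarge",  # hypothetical choice
        key_name="hdp-key",  # hypothetical key pair name
        security_group_id="sg-xxxxxxxx",  # hypothetical placeholder
        name="master1",
    )
    print(request)
```

Calling `launch("eu-central-1", request)` with real values (the region here is again a placeholder) would then start the instance.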


Posted in Howto, Learning Big Data | Tagged , , , , , , , | 5 Comments

Building HDP 2.6 on AWS, Part 1: the edge node

Installing Hortonworks Data Platform 2.6 on Amazon Web Services (Amazon’s cloud platform), how hard could it be? It’s click, click, next, next, confirm, right?

Well-lll, not quite. Especially if HDP or AWS is new to you. There are many steps and many things to look out for. That’s why I wrote a manual, initially for myself, and here for you.

Disclaimer: This blogpost might change slightly as I gain more experience with my HDP cluster. Most of it works, but I have some problems with a few services. I’ll list any changes I’ve made at the end of this post.


Posted in Howto, Learning Big Data

Quickly start off the NiFi crash course

As I said in my last blogpost, I followed the Apache NiFi crash course that Hortonworks provides. The tutorial describes several different scenarios and options, and you have to read through all of them to find the one you want. And you don’t have time for that. You’re probably doing this in your spare time and you have a whole Netflix backlog.

So in this guide we cut right to the chase. It took me about 10 hours to follow Tutorials 0, 1, 2 and 3. Perhaps this guide can help you do it in about 4 hours.

1. Preparing the VM

First download the Hortonworks Sandbox. There are VirtualBox (used in this example), VMware and Docker images that come with many products preinstalled, but NiFi isn’t installed yet (this guide is based on the HDP 2.6 sandbox).


Posted in Apache Products for Outsiders

The new product anxiety cycle

Am I the only one who has this? Let me know.

Phase 1: Discovery of New Product

Suddenly everybody talks about New Product. They say it changes everything. Articles about New Product appear on Hacker News for weeks. Then even colleagues on LinkedIn mention New Product (Warning! People you know, know New Product!). (Or they’re just linking to articles about New Product, so they look cool. Either way: they must know New Product!)


Posted in Uncategorized

My first experiences with Apache NiFi

There are a lot of data-related Apache products out there and it’s hard to keep up with all of them. There are several products to stream or flow data (what’s the difference?), like Kafka, Storm, Flink and NiFi. Yes, all these products have documentation, but to an outsider their descriptions sound like “enterprise scalable streaming solutions”. What does that tell you?

I followed a Crash Course on Apache NiFi at the DataWorks Summit in München last month and was quite impressed. At heart I’m a command-line kind of guy, but this graphical interface is really slick, and it’s amazing how much NiFi lets you find out about where your data goes. I decided to organize a workshop for my colleagues at Open Circle Solutions.

Posted in Apache Products for Outsiders

How to learn Big Data

“How did you get into Big Data?” is a question people have asked me a couple of times now. So let me answer it in a blogpost as well.

I’ve used eight sources of Big Data related knowledge and skills:

  • Massive Open Online Courses (MOOCs)
  • Books
  • Meetups and summits
  • Podcasts
  • Videos
  • Online documentation
  • Hands-on experience
  • Learning sites/“universities” of vendors


Posted in Learning Big Data