Building HDP 2.6 on AWS, Part 2: the master nodes

This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the master nodes.

Creating the first master node

Make sure you are logged in to Amazon Web Services, in the same AWS region as the edge node. To create 3 master nodes, we have to start with one. Once again we go to the EC2 dashboard in the AWS interface and click “Launch instance”. Again we get a choice of Amazon Machine Images, and again we choose Ubuntu Server 16.04.

(more…)

Building HDP 2.6 on AWS, Part 1: the edge node

Installing Hortonworks Data Platform 2.6 on Amazon Web Services (Amazon’s cloud platform), how hard could it be? It’s click, click, next, next, confirm, right?

Well-lll, not quite. Especially if HDP or AWS is new to you. There are many steps and many things to look out for. That’s why I wrote a manual, initially for myself, and here for you.

Disclaimer: This blogpost might change slightly after I’ve gained more experience with my HDP cluster. Most of it works, but I have some problems with a few services. I’ll note any changes I make at the end of this post.

(more…)

Quick start for the NiFi crash course

As I said in my last blogpost, I have followed the Apache NiFi crash course that Hortonworks provides. The tutorial describes several different scenarios and options, and you have to read through all of them to find the one you want. And you don’t have time for that. You’re probably doing this in your spare time and you have a whole Netflix backlog.

So in this guide we cut right to the chase. It took me about 10 hours to follow Tutorials 0, 1, 2 and 3. But perhaps this guide can help you do it in about 4 hours.

1. Preparing the VM

First download the Hortonworks Sandbox. There are VirtualBox (used in this example), VMware and Docker images that come preinstalled with many products, but NiFi isn’t installed just yet (this guide is based on the HDP 2.6 sandbox).

(more…)

The new product anxiety cycle

Am I the only one who has this? Let me know.

Phase 1: Discovery of New Product

Suddenly everybody talks about New Product. It’s said it changes everything. Articles about New Product appear on Hacker News for weeks. Then colleagues on LinkedIn even mention New Product (Warning! People you know, know New Product!). (Or they’re just linking to articles about New Product, so they look cool. Either way: they must know New Product!)

(more…)

My first experiences with Apache NiFi

There are a lot of data-related Apache products out there and it’s hard to keep up with all of them. There are several products to stream or flow data (what’s the difference?), like Kafka, Storm, Flink and NiFi. Yes, all products have documentation, but to an outsider their descriptions all sound like “enterprise scalable streaming solutions”. What does that tell you?

I followed a Crash Course on Apache NiFi at the DataWorks Summit in München last month and was quite impressed. At heart I’m a command line kind of guy, but this graphical interface is really slick and it’s amazing how well NiFi lets you find out where your data goes. I decided to organize a workshop for my colleagues at Open Circle Solutions. (more…)