A quick start to the NiFi crash course

As I said in my last blog post, I have followed the Apache NiFi crash course that Hortonworks provides. The tutorial describes several different scenarios and options, and you have to read through all of them to find the one you want. And you don’t have time for that. You’re probably doing this in your spare time and you have a whole Netflix backlog.

So in this guide we cut right to the chase. It took me about 10 hours to follow Tutorials 0, 1, 2 and 3, but perhaps this guide can get you through them in about 4 hours.

1. Preparing the VM

First download the Hortonworks Sandbox. There are VirtualBox (used in this example), VMware and Docker images that come preinstalled with many products, but NiFi isn’t installed yet (this guide is based on the HDP 2.6 sandbox).

Sandbox

When you start it (here in VirtualBox), you can get to a prompt with Alt-F5 (on a Mac: Fn-Alt-F5), but this console isn’t very friendly. You don’t have any control (that I could find) over the window, such as the number of lines.

The prompt in VirtualBox.

So instead let’s use PuTTY. Connect to 127.0.0.1 on port 2222.

You can log in as root with password hadoop. That password you’ll have to change. While you’re at it, better change the Ambari admin password right away too, because we don’t have time for anything less than full privileges.

# ambari-admin-password-reset
Please set the password for admin:

You can also add sandbox.hortonworks.com to your /etc/hosts or C:\Windows\System32\Drivers\etc\hosts file. It’s not strictly necessary, but without it the link to the NiFi UI in Ambari won’t work, because your browser can’t resolve sandbox.hortonworks.com. Instead, you can also replace the hostname in the URL with 127.0.0.1 by hand. No biggie.

127.0.0.1       localhost sandbox.hortonworks.com
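On Linux or macOS, a small helper like the one below appends that mapping only if the hostname isn’t listed yet. It’s a sketch under a few assumptions: `add_sandbox_host` is a hypothetical name, it writes a separate `127.0.0.1 sandbox.hortonworks.com` line instead of editing your existing localhost line, and writing to /etc/hosts needs root. On Windows, just paste the line into the hosts file with an editor run as administrator.

```bash
# add_sandbox_host (hypothetical helper): map sandbox.hortonworks.com to 127.0.0.1
# in a hosts file, unless that hostname is already listed there.
# Usage: add_sandbox_host [hosts_file]   (defaults to /etc/hosts; needs root for that)
add_sandbox_host() {
  local hosts_file="${1:-/etc/hosts}"
  local entry="127.0.0.1 sandbox.hortonworks.com"
  # Match the hostname as a whole whitespace-separated field, not as a substring.
  if grep -qE '(^|[[:space:]])sandbox\.hortonworks\.com([[:space:]]|$)' "$hosts_file" 2>/dev/null; then
    echo "sandbox.hortonworks.com already in $hosts_file"
  else
    printf '%s\n' "$entry" >> "$hosts_file"
    echo "added '$entry' to $hosts_file"
  fi
}
```

Running it twice is harmless: the second call sees the entry and leaves the file alone. To see what it does first, point it at a copy of your hosts file.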

2. Install NiFi

Open this address in your browser: http://127.0.0.1:8080. Welcome to Ambari. The tutorial logs in with user maria_dev, password maria_dev. I overlooked that and used admin. Anyway, you will land on the dashboard, which will look something like this:

The Ambari dashboard

Unlike many other products, NiFi isn’t up and running yet, so we’ll have to do a bit of installing. In the Ambari dashboard, scroll down until you see the Actions button (on the left, under the list of services). This is a drop-down menu, and in it you’ll find the option Add Service.

Now you can choose from a whole lot of services, most of which are already installed.

Scroll, scroll, scroll and you’ll find NiFi. Tick its checkbox.

Then it’s a lot of button pushing. These to be precise:

A bit of waiting and...

Click Next (I think it was) and you’ll get the summary. Click Complete.

As you may have noticed (before clicking Complete), some services need to be restarted. Don’t restart them one by one; let Ambari sort it out. In the Ambari dashboard, scroll down, click Actions and choose Restart All Required.

After this, I noticed several times that not every service starts up all that well. For example, Oozie has a big red warning mark next to it. For your NiFi tutorial this doesn’t matter, though.

Still in the dashboard, go to the NiFi service (way down in the services list). This opens the NiFi service page in Ambari. There’s a drop-down menu called Quick Links with one option: NiFi UI. If you didn’t set up your hosts file, this link will lead nowhere. Instead, go to http://127.0.0.1:9090/nifi/.
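If the page doesn’t load, you can check from a terminal whether anything is answering at all (handy while Ambari is still restarting services). A sketch, assuming curl is installed and the default URL from this guide; `nifi_ui_status` is a made-up name, not part of NiFi or Ambari.

```bash
# nifi_ui_status (hypothetical helper): print the HTTP status code the NiFi UI returns.
# Defaults to the URL used in this guide; pass another URL as the first argument.
nifi_ui_status() {
  local url="${1:-http://127.0.0.1:9090/nifi/}"
  # -s: no progress output, -o /dev/null: discard the page itself,
  # -w '%{http_code}': print only the status code. curl prints 000 when
  # no HTTP response arrived at all (connection refused, timeout).
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url"
}
```

Run `nifi_ui_status`: a 2xx or 3xx code means something is answering on port 9090, while 000 means nothing is listening there (yet).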

And welcome to NiFi.

An empty NiFi canvas

Now would probably be the time to read about how NiFi works (to, you know, understand what you’re doing). Or you can start the tutorial right where the action is and still catch up on House of Cards.


3. A couple of last things

For the tutorials you can download templates, which I assume should set up all the necessary processors right away. However:

  1. I haven’t seen that work for Lab 1, and I haven’t tried any other templates.
  2. We want this tutorial to go fast, but I assume you want to learn at least something about NiFi.

For Tutorial 1 you’ll need to upload a zip file with traffic patterns to the sandbox (see Tutorial 1, under Approach 1). You can use WinSCP to do that.

WinSCP connection to the sandbox.

Also, you’ll need a couple of directories with 777 permissions to run the tutorial. Best to create them (as root) before you forget.

# mkdir -p also creates missing parents, such as /tmp/nifi/output
mkdir -p /tmp/nifi/input
mkdir -p /tmp/nifi/output/filtered_transitLoc_data
mkdir -p /tmp/nifi/output/nearby_neighborhoods_search
mkdir -p /tmp/nifi/output/nearby_neighborhoods_liveStream
# open up the directories NiFi reads from and writes to
chmod 777 /tmp/nifi /tmp/nifi/input
chmod 777 /tmp/nifi/output/filtered_transitLoc_data
chmod 777 /tmp/nifi/output/nearby_neighborhoods_search
chmod 777 /tmp/nifi/output/nearby_neighborhoods_liveStream
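Before starting Tutorial 1 you can double-check that everything is in place. This is a sketch, not part of the tutorial: `check_nifi_dirs` is a hypothetical name, it only checks the directories listed above, and it relies on GNU `stat -c %a` (macOS’s BSD `stat` uses different flags).

```bash
# check_nifi_dirs (hypothetical helper): report whether each tutorial directory
# exists and has mode 777. Returns 0 if all are fine, 1 otherwise.
# Usage: check_nifi_dirs [base_dir]   (defaults to /tmp/nifi)
check_nifi_dirs() {
  local base="${1:-/tmp/nifi}"
  local status=0 d mode
  for d in "$base" "$base/input" \
           "$base/output/filtered_transitLoc_data" \
           "$base/output/nearby_neighborhoods_search" \
           "$base/output/nearby_neighborhoods_liveStream"; do
    if [ ! -d "$d" ]; then
      echo "MISSING  $d"
      status=1
      continue
    fi
    # GNU stat: %a prints the permission bits in octal, e.g. 777 or 755.
    mode=$(stat -c %a "$d")
    if [ "$mode" = "777" ]; then
      echo "OK       $d"
    else
      echo "NOT 777  $d ($mode)"
      status=1
    fi
  done
  return "$status"
}
```

Any line starting with MISSING or NOT 777 points at a directory to fix with mkdir -p or chmod 777.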

For Tutorial 2 you will need a Google account, because you need to create an API key. The tutorial explains how. It is possible that in Tutorial 2 the processors downstream of InvokeHTTP stop working properly after a while. I had that problem, so I took the URLs NiFi was sending (you can find them in the details of the Data Provenance events) and opened them in my browser. Google’s response was:

According to Google, this means you’ve hit your daily request quota (if you use Google’s Places API for free without further registration). So if this works the way I think it does, you’ll have to call it a day for your NiFi experiment. To make this less likely to happen, adjust the ControlRate processor.

The way this works: the rate is “Maximum Rate” per “Time Duration”. So either make Maximum Rate lower (but it is already 1) or make Time Duration higher, as I did.
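To get a feel for the numbers: with Rate Control Criteria set to flowfile count, ControlRate releases at most Maximum Rate flow files per Time Duration, so over 24 hours that is roughly Maximum Rate × 86400 ÷ (Time Duration in seconds). The calculator below assumes each flow file that passes ControlRate turns into one Places API call from InvokeHTTP, and that you convert Time Duration to seconds yourself; `max_requests_per_day` is a hypothetical helper, not something NiFi provides.

```bash
# max_requests_per_day (hypothetical helper): rough upper bound on the flow files
# ControlRate releases in 24 hours when counting flow files.
# Args: MAX_RATE (the "Maximum Rate" property) and TIME_DURATION_SECONDS
# ("Time Duration" converted to seconds by hand, e.g. "1 min" -> 60).
max_requests_per_day() {
  local max_rate="$1" duration_seconds="$2"
  # Bash integer arithmetic: any fraction of a request is truncated.
  echo $(( max_rate * 86400 / duration_seconds ))
}
```

For example, `max_requests_per_day 1 60` prints 1440 and `max_requests_per_day 1 300` prints 288: stretching Time Duration from one to five minutes cuts the daily maximum by a factor of five. Compare the result with the quota Google shows for your key.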


You can now start at tutorial 1, Approach 2 (step 2.1).

Analyze Traffic Patterns with Apache NiFi

So have fun with Apache NiFi. I think you’ll find it’s a great product. In no time you can start your own NSA and direct all those data sources to your own Hadoop cluster. (Did I say that out loud?)

And with that, you can go back to Frank Underwood.

About Marcel-Jan Krijgsman

In 2017 I made the leap to Big Data after 20 years of experience with Oracle databases. I followed courses on Hadoop, Big Data Analytics, Machine Learning and Python, MongoDB and Elasticsearch.
This entry was posted in Apache Products for Outsiders.
