As I said in my last blog post, I followed the Apache NiFi crash course that Hortonworks provides. The tutorial describes several different scenarios and options, and you have to read through all of them to find the one you want. And you don’t have time for that. You’re probably doing this in your spare time and you have a whole Netflix backlog.
So in this guide we cut right to the chase. It took me about 10 hours to follow Tutorials 0, 1, 2 and 3. But perhaps this guide can get you through them in about 4 hours.
1. Preparing the VM
First download the Hortonworks Sandbox. There are VirtualBox (used in this example), VMware and Docker images that come preinstalled with many products, but NiFi isn’t installed just yet (this guide is based on the HDP 2.6 sandbox).
When you start it (here in VirtualBox), you can go to the prompt with Alt-F5 (or Mac: Fn-Alt-F5), but in this window it’s not really very friendly. You don’t have any control (that I could find) over the command line window, like number of lines and such.
So instead let’s use Putty. Connect to 127.0.0.1 on port 2222.
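If you are on macOS or Linux and don’t have Putty, the regular OpenSSH client should do the same job. Here is a minimal sketch; the function name sandbox_ssh is my own invention, and the host and port defaults are simply the ones mentioned above.

```shell
# Open an SSH session to the sandbox as root.
# Usage: sandbox_ssh [host] [port]   (defaults: 127.0.0.1 and 2222)
sandbox_ssh() {
    host="${1:-127.0.0.1}"
    port="${2:-2222}"
    # -p selects the port that is forwarded to the sandbox
    ssh -p "$port" "root@$host"
}
```

Calling sandbox_ssh without arguments connects to 127.0.0.1 on port 2222.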
You can log in as root with password: hadoop. The latter you have to change. While you’re at it, better change the Ambari admin password right away as well, because we don’t have time for anything less than full privileges.
# ambari-admin-password-reset
Please set the password for admin:
What you can also do is put sandbox.hortonworks.com in your /etc/hosts or c:\Windows\System32\Drivers\etc\hosts file. It’s not strictly necessary, but without it the link to the NiFi UI in Ambari doesn’t work, because your browser can’t find sandbox.hortonworks.com. Alternatively, you can change the address manually to 127.0.0.1. No biggie.
127.0.0.1 localhost sandbox.hortonworks.com
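On Linux or macOS you could add that line from the command line instead of opening an editor. This is only a sketch: the function name add_sandbox_host is made up for illustration, and it takes the hosts file as an argument so you can try it on a copy first. Writing to the real /etc/hosts needs root.

```shell
# Append the sandbox entry to a hosts file unless the hostname is already there.
# Usage: add_sandbox_host [hosts_file]   (defaults to /etc/hosts)
add_sandbox_host() {
    hosts_file="${1:-/etc/hosts}"
    entry="127.0.0.1 localhost sandbox.hortonworks.com"
    # -q: no output, -F: match the hostname as a fixed string
    if grep -qF "sandbox.hortonworks.com" "$hosts_file" 2>/dev/null; then
        echo "already present in $hosts_file"
    else
        printf '%s\n' "$entry" >> "$hosts_file"
        echo "added to $hosts_file"
    fi
}
```

Running it twice leaves only one entry in the file, so it is safe to repeat.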
2. Install NiFi
Open this address in your browser: http://127.0.0.1:8080. Welcome to Ambari. The tutorial logs in with user: maria_dev, password: maria_dev. I overlooked that and used admin. Anyway, you will land at the dashboard, which will look something like this:
Now unlike many other products, NiFi isn’t up and running yet. We’ll have to do a little bit of installation. In the Ambari dashboard, scroll down until you see a button Actions (on the left-hand side, under the list of services). This is a drop-down menu and there you’ll find the option Add Service.
Now you can make a choice out of a whole lot of services, most of which are already installed.
Scroll, scroll, scroll and you’ll find NiFi. Check that mark.
Then it’s a lot of button pushing. These to be precise:
Bit of waiting and..
Click Next (I think it was) and you’ll get the summary. Click Complete.
As you could see (before pushing Complete) some services have to be restarted. Don’t restart them one by one. Let Ambari sort it out. So in the Ambari dashboard, scroll down, click Actions and Restart All Required.
After this, I’ve noticed several times that not every service starts up all that well. For example Oozie has a big red warning mark next to it. For your NiFi tutorial this won’t matter, though.
Still in the dashboard, go to the NiFi service (way down below in the services list). What you get now, is the NiFi dashboard. There’s a drop down menu called Quick Links and it has one option: NiFi UI. Now if you didn’t set up your (/etc/) hosts file, this is where you’ll end up nowhere. Instead go to: http://127.0.0.1:9090/nifi/.
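NiFi can take a little while to come up after a start, so if the page stays blank it may help to check from a terminal whether the UI answers at all. A small sketch using curl; the function name nifi_ui_status is hypothetical, and the default URL is the one given above.

```shell
# Print the HTTP status code the NiFi UI answers with.
# Usage: nifi_ui_status [url]
nifi_ui_status() {
    url="${1:-http://127.0.0.1:9090/nifi/}"
    # -s: silent, -o /dev/null: discard the page body,
    # -w '%{http_code}': print only the status code, --max-time: give up after 5 s
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url"
}
```

A 200 means the UI is being served; if curl cannot connect at all it prints 000.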
And welcome to NiFi.
Now would probably be the time to read about how NiFi works (to, you know, understand what you’re doing). Or you can start the tutorial right where the action is and still catch up on House of Cards.
3. Couple of last things
For the tutorials you can download templates, which I assume should install all necessary (data) processors right away. Now:
- I haven’t seen it working for Lab 1 and haven’t tried any other templates.
- We want this tutorial to go fast, but I assume you want to learn at least something about NiFi.
Also, you’re going to need a couple of directories with 777 permissions to run the tutorial. Best to create these (as root), before you forget them.
cd /tmp
mkdir nifi
chmod 777 nifi
mkdir nifi/input
chmod 777 nifi/input
mkdir -p /tmp/nifi/output/filtered_transitLoc_data
chmod 777 /tmp/nifi/output/filtered_transitLoc_data
mkdir -p /tmp/nifi/output/nearby_neighborhoods_search
chmod 777 /tmp/nifi/output/nearby_neighborhoods_search
mkdir -p /tmp/nifi/output/nearby_neighborhoods_liveStream
chmod 777 /tmp/nifi/output/nearby_neighborhoods_liveStream
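If you would rather not type all of that, the same set of directories can be created with a small loop. This is a sketch under my own assumptions: the function name make_nifi_dirs is made up, and the optional base directory argument only exists so you can try it somewhere other than /tmp/nifi. mkdir -p also creates missing parent directories such as the output directory.

```shell
# Create the tutorial directories and open up their permissions.
# Usage: make_nifi_dirs [base_dir]   (defaults to /tmp/nifi)
make_nifi_dirs() {
    base="${1:-/tmp/nifi}"
    for dir in \
        "$base" \
        "$base/input" \
        "$base/output/filtered_transitLoc_data" \
        "$base/output/nearby_neighborhoods_search" \
        "$base/output/nearby_neighborhoods_liveStream"
    do
        mkdir -p "$dir"     # -p: create missing parents, no error if it exists
        chmod 777 "$dir"    # read, write and execute for everyone
    done
}
```

Run make_nifi_dirs as root on the sandbox to get the layout the tutorial expects.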
For tutorial 2 you will need a Google account, because you need to create an Application key. The tutorial will explain how. Now it is possible that in tutorial 2 the processors that come after InvokeHTTP stop working properly after a while. I had that problem, so I took the URLs NiFi was sending (which you can find in the details of the Data Provenance), tried them in my browser, and Google’s response was:
According to Google, this means you have hit your daily request quota (if you use Google’s Places API for free without further registration). So if this works how I think it works, you’ll have to call it a day for your NiFi experiment. To make this less likely to happen, it’s better to adjust the ControlRate processor.
The way this works is: the rate is “Maximum Rate” per “Time Duration”. So either make Maximum Rate lower (but it is already 1) or Time Duration higher, like I did here.
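To get a feeling for what a given setting means per day, you can do the arithmetic yourself. The sketch below assumes that ControlRate is counting flow files and that every flow file it lets through ends up as one request to Google; the function name requests_per_day and the example values are mine, not the tutorial’s.

```shell
# Upper bound on the number of flow files passed per day.
# $1 = Maximum Rate (a count), $2 = Time Duration in seconds
requests_per_day() {
    max_rate="$1"
    duration_seconds="$2"
    # 86400 seconds in a day
    echo $(( max_rate * 86400 / duration_seconds ))
}

requests_per_day 1 1     # 1 per second -> 86400
requests_per_day 1 60    # 1 per minute -> 1440
```

Compare the outcome with whatever quota Google reports for your key and pick a Time Duration that stays below it.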
You can now start at tutorial 1, Approach 2 (step 2.1).
So have fun with Apache NiFi. I think you’ll find it’s a great product. In no time you can start your own NSA and direct all those data sources to your own Hadoop cluster. (Did I say that out loud?)
And with that, you can go back to Frank Underwood.