{"id":123,"date":"2017-05-11T07:32:02","date_gmt":"2017-05-11T07:32:02","guid":{"rendered":"http:\/\/marcel-jan.eu\/datablog\/?p=123"},"modified":"2017-05-11T07:32:02","modified_gmt":"2017-05-11T07:32:02","slug":"quickly-start-of-the-nifi-crash-course","status":"publish","type":"post","link":"https:\/\/marcel-jan.eu\/datablog\/2017\/05\/11\/quickly-start-of-the-nifi-crash-course\/","title":{"rendered":"Quickly start of the Nifi crash course"},"content":{"rendered":"<p>As I mentioned in <a href=\"https:\/\/marcel-jan.eu\/datablog\/2017\/05\/05\/a-first-look-at-apache-nifi\/\">my last blog post<\/a>, I have followed the Apache NiFi crash course that Hortonworks provides. The tutorial describes several different scenarios and options, and you have to read through all of them to find the one you want. And you don&#8217;t have time for that. You&#8217;re probably doing this in your spare time, and you have a whole Netflix backlog.<\/p>\n<p>So in this guide we cut right to the chase. It took me about 10 hours to work through Tutorials 0, 1, 2 and 3. With this guide you can perhaps do it in about 4 hours.<\/p>\n<h2>1. Preparing the VM<\/h2>\n<p>First, download the Hortonworks Sandbox. There are VirtualBox (used in this example), VMware and Docker images that come with many products preinstalled, but NiFi isn&#8217;t installed yet (this guide is based on the HDP 2.6 sandbox).<\/p>\n<p><!--more--><\/p>\n<p>https:\/\/hortonworks.com\/products\/sandbox\/<\/p>\n<p>When you start it (here in VirtualBox), you can get to the prompt with Alt-F5 (on a Mac: Fn-Alt-F5), but this console isn&#8217;t very friendly. 
You have no control (that I could find) over the console window, such as the number of lines.<\/p>\n<div id=\"attachment_126\" style=\"width: 536px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-126\" class=\" wp-image-126\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart1-300x204.png\" alt=\"\" width=\"526\" height=\"358\" \/><p id=\"caption-attachment-126\" class=\"wp-caption-text\">The prompt in VirtualBox.<\/p><\/div>\n<p>So instead, let&#8217;s use <a href=\"http:\/\/www.putty.org\/\">PuTTY<\/a>. Connect to 127.0.0.1 on port 2222.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-125\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart2-300x153.png\" alt=\"\" width=\"433\" height=\"221\" \/><\/p>\n<p>You can log in as root with password hadoop, which you will have to change right after logging in. While you&#8217;re at it, change the Ambari admin password right away too, because we don&#8217;t have time for anything less than full privileges.<\/p>\n<pre># ambari-admin-password-reset\r\nPlease set the password for admin:<\/pre>\n<p>You can also add sandbox.hortonworks.com to your \/etc\/hosts or C:\\Windows\\System32\\Drivers\\etc\\hosts file. It isn&#8217;t strictly necessary, but without it the link to the NiFi UI in Ambari won&#8217;t work, because your browser can&#8217;t resolve sandbox.hortonworks.com. In that case you can simply replace the host name in the URL with 127.0.0.1 by hand. No biggie.<\/p>\n<pre>127.0.0.1       localhost sandbox.hortonworks.com<\/pre>\n<h2>2. Install NiFi<\/h2>\n<p>Open <a href=\"http:\/\/127.0.0.1:8080\">http:\/\/127.0.0.1:8080<\/a> in your browser. Welcome to Ambari. The tutorial logs in as user maria_dev with password maria_dev; I overlooked that and used admin. 
Anyway, you will land at the dashboard, which looks something like this:<\/p>\n<div id=\"attachment_127\" style=\"width: 579px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-127\" class=\" wp-image-127\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart3a-300x154.png\" alt=\"\" width=\"569\" height=\"292\" \/><p id=\"caption-attachment-127\" class=\"wp-caption-text\">The Ambari dashboard<\/p><\/div>\n<p>Unlike many of the other products in the sandbox, NiFi isn&#8217;t up and running yet, so we&#8217;ll have to do a bit of installing. In the Ambari dashboard, scroll down until you see the Actions button (on the left, below the list of services). This is a drop-down menu, and in it you&#8217;ll find the option Add Service.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-128\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart3.png\" alt=\"\" width=\"266\" height=\"213\" \/><\/p>\n<p>Now you can choose from a whole lot of services, most of which are already installed.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-129\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart4-300x122.png\" alt=\"\" width=\"566\" height=\"230\" \/><\/p>\n<p>Scroll, scroll, scroll and you&#8217;ll find NiFi. Tick its checkbox.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-130\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart4a-300x76.png\" alt=\"\" width=\"533\" height=\"135\" \/><\/p>\n<p>Then it&#8217;s a lot of button pushing. 
These, to be precise:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-131\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart5-300x172.png\" alt=\"\" width=\"424\" height=\"243\" \/><\/p>\n<p>A bit of waiting and...<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-132\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart6-300x27.png\" alt=\"\" width=\"511\" height=\"46\" \/><\/p>\n<p>Click Next (I think it was) and you&#8217;ll get the summary. Click Complete.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-133\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart7-300x130.png\" alt=\"\" width=\"514\" height=\"223\" \/><\/p>\n<p>As you could see before pushing Complete, some services need to be restarted. Don&#8217;t restart them one by one; let Ambari sort it out. In the Ambari dashboard, scroll down, click Actions and choose Restart All Required.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-134\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart8-245x300.png\" alt=\"\" width=\"245\" height=\"300\" \/><\/p>\n<p>I&#8217;ve noticed several times that after this, not every service starts up all that well. Oozie, for example, gets a big red warning mark next to it. For the NiFi tutorial this doesn&#8217;t matter, though.<\/p>\n<p>Still in the dashboard, go to the NiFi service (way down in the services list). This takes you to NiFi&#8217;s page in Ambari. There&#8217;s a drop-down menu called Quick Links with one option: NiFi UI. If you didn&#8217;t set up your hosts file, this link leads nowhere. 
Instead, go to <a href=\"http:\/\/127.0.0.1:9090\/nifi\/\">http:\/\/127.0.0.1:9090\/nifi\/<\/a>.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-137\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart9-1-300x86.png\" alt=\"\" width=\"576\" height=\"165\" \/><\/p>\n<p>And welcome to NiFi.<\/p>\n<div id=\"attachment_139\" style=\"width: 546px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-139\" class=\" wp-image-139\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart10-300x131.png\" alt=\"\" width=\"536\" height=\"234\" \/><p id=\"caption-attachment-139\" class=\"wp-caption-text\">An empty NiFi canvas<\/p><\/div>\n<p>Now would probably be the time to read <a href=\"https:\/\/hortonworks.com\/hadoop-tutorial\/learning-ropes-apache-nifi\/#section_2\">about how NiFi works<\/a> (to, you know, understand what you&#8217;re doing). Or you can skip straight to where the action is and still catch up on House of Cards.<\/p>\n<h2>3. A couple of last things<\/h2>\n<p>For the tutorials you can download templates, which I assume should set up all the necessary (data) processors right away. However:<\/p>\n<ol>\n<li>I haven&#8217;t seen that work for Lab 1, and I haven&#8217;t tried any other templates.<\/li>\n<li>We want this tutorial to go fast, but I assume you still want to learn something about NiFi, and building the flows yourself is how you do that.<\/li>\n<\/ol>\n<p>For Tutorial 1 you&#8217;ll need to upload a zip file with traffic patterns to the sandbox (see <a href=\"https:\/\/hortonworks.com\/hadoop-tutorial\/learning-ropes-apache-nifi\/#section_4\">tutorial 1<\/a>, under Approach 1). 
You can use <a href=\"https:\/\/winscp.net\/eng\/download.php\">WinSCP<\/a> to do that; connect to 127.0.0.1 on port 2222, just as with PuTTY.<\/p>\n<div id=\"attachment_142\" style=\"width: 512px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-142\" class=\" wp-image-142\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart11-300x203.png\" alt=\"\" width=\"502\" height=\"340\" \/><p id=\"caption-attachment-142\" class=\"wp-caption-text\">WinSCP connection to the sandbox.<\/p><\/div>\n<p>You&#8217;ll also need a couple of directories with 777 permissions to run the tutorial. Best to create them now (as root), before you forget.<\/p>\n<pre># -p also creates missing parent directories such as \/tmp\/nifi\/output\r\nmkdir -p \/tmp\/nifi\/input\r\nmkdir -p \/tmp\/nifi\/output\/filtered_transitLoc_data\r\nmkdir -p \/tmp\/nifi\/output\/nearby_neighborhoods_search\r\nmkdir -p \/tmp\/nifi\/output\/nearby_neighborhoods_liveStream\r\n# give \/tmp\/nifi and everything below it 777 permissions\r\nchmod -R 777 \/tmp\/nifi<\/pre>\n<p>For <a href=\"https:\/\/hortonworks.com\/hadoop-tutorial\/learning-ropes-apache-nifi\/#section_5\">tutorial 2<\/a> you&#8217;ll need a Google account, because you have to create an API key. The tutorial explains how. In tutorial 2 you may find that, after a while, the processors downstream of InvokeHTTP stop working properly. 
I had that problem. Eventually I opened the URLs NiFi was sending (you can find them in the details of the Data Provenance events) in my browser, and Google&#8217;s response was:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-140\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/googleapierror-300x82.png\" alt=\"\" width=\"571\" height=\"156\" \/><\/p>\n<p>According to Google, the quota you&#8217;ve hit is a <em>daily<\/em> request quota (if you use Google&#8217;s Places API for free without further registration). So if this works the way I think it does, you&#8217;ll have to call it a day on your NiFi experiment. To make this less likely to happen, adjust the ControlRate processor.<\/p>\n<p>The rate is &#8220;Maximum Rate&#8221; per &#8220;Time Duration&#8221;, so you can either lower Maximum Rate (but it is already 1) or raise Time Duration, as I did here.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-141\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/nifi_quickstart12-300x221.png\" alt=\"\" width=\"532\" height=\"392\" \/><\/p>\n<p>You can now start at <a href=\"https:\/\/hortonworks.com\/hadoop-tutorial\/learning-ropes-apache-nifi\/#section_4\">tutorial 1<\/a>, Approach 2 (step 2.1).<\/p>\n<p>So have fun with Apache NiFi. I think you&#8217;ll find it&#8217;s a great product. In no time you can start your own NSA and direct all those data sources to your own Hadoop cluster. (Did I say that out loud?)<\/p>\n<p>And with that, you can go back to Frank Underwood.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>As I mentioned in my last blog post, I have followed the Apache NiFi crash course that Hortonworks provides. 
The tutorial describes several different scenarios and options, and you have to read through all of them to find the one you want. And you don&#8217;t have time for that. You&#8217;re probably doing [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[35],"tags":[12,44,42,23,43],"class_list":["post-123","post","type-post","status-publish","format-standard","hentry","category-apache-products-for-outsiders","tag-apache-nifi","tag-google-places-api","tag-hdp-sandbox","tag-hortonworks","tag-tutorial"],"_links":{"self":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/123","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/comments?post=123"}],"version-history":[{"count":9,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/123\/revisions"}],"predecessor-version":[{"id":163,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/123\/revisions\/163"}],"wp:attachment":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media?parent=123"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/categories?post=123"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/tags?post=123"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}