{"id":202,"date":"2017-05-26T20:48:16","date_gmt":"2017-05-26T20:48:16","guid":{"rendered":"http:\/\/marcel-jan.eu\/datablog\/?p=202"},"modified":"2017-05-28T19:11:37","modified_gmt":"2017-05-28T19:11:37","slug":"building-hdp-2-6-on-aws-part-2-the-master-nodes","status":"publish","type":"post","link":"https:\/\/marcel-jan.eu\/datablog\/2017\/05\/26\/building-hdp-2-6-on-aws-part-2-the-master-nodes\/","title":{"rendered":"Building HDP 2.6 on AWS, Part 2: the master nodes"},"content":{"rendered":"<p>This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we <a href=\"https:\/\/marcel-jan.eu\/datablog\/2017\/05\/15\/building-hdp-2-6-on-aws-part-1-the-edge-node\/\">created an edge node<\/a> where we will later install Ambari Server. The next step is creating the master nodes.<\/p>\n<h1>Creating the first master node<\/h1>\n<p>Make sure you are logged in to Amazon Web Services, in the same AWS region as the edge node. To create 3 master nodes, we have to start with one. Once again we go to the EC2 dashboard in the AWS interface and click &#8220;Launch instance&#8221;. Again we get a choice of Amazon Machine Images, and again we choose Ubuntu Server 16.04.<\/p>\n<p><!--more--><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-204\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master1-300x134.png\" alt=\"\" width=\"642\" height=\"287\" \/><\/p>\n<p>This time I chose a general purpose instance that is a bit more powerful than our edge node. Select the one with type m3.large.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-205\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master2_instancetype-300x116.png\" alt=\"\" width=\"639\" height=\"247\" \/><\/p>\n<h2>Networking<\/h2>\n<p>We will create our master node in a new subnet. 
We don&#8217;t want it to be accessible from the Internet, but it should be able to reach the Internet itself (for updates).<\/p>\n<p>Because this subnet is not accessible from the outside world, you might put something like &#8220;private&#8221; in the name tag. Create it in the same VPC as last time, so select that VPC. It does need a different IPv4 CIDR block from the subnet we created last time, because subnets in the same VPC can&#8217;t share a range of IP addresses.<\/p>\n<p>To give an example, in the last post I created a subnet for public access with IPv4 CIDR block 100.10.42.0\/24. We&#8217;re staying in the 100.10.x.x range, but we can&#8217;t choose 100.10.42.x for our new subnet, because that range is already taken. So let&#8217;s use 100.10.43.0\/24.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-208\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master3_subnet-300x148.png\" alt=\"\" width=\"493\" height=\"243\" \/><\/p>\n<h2>Storage<\/h2>\n<p>On another Hadoop cluster we had space usage warnings on the root partition of some of the master nodes. As I mentioned in part 1, it&#8217;s not easy to enlarge a root partition, so let&#8217;s make it big enough from the start. 150 GB should do it.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-209\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master5_storage-300x109.png\" alt=\"\" width=\"630\" height=\"229\" \/><\/p>\n<h2>Tags<\/h2>\n<p>Again, we can add a Name tag.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-210\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master6_tag-300x60.png\" alt=\"\" width=\"655\" height=\"131\" \/><\/p>\n<h2>Security groups<\/h2>\n<p>The master nodes operate in a network space that is not accessible from the outside world. 
So if there isn&#8217;t a private security group available already, you have to create one.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-200\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_edge8a_securitygroup-1-300x103.png\" alt=\"\" width=\"472\" height=\"162\" \/><\/p>\n<p>While we do not want the master nodes to be accessible directly, we&#8217;re going to need that direct access at the start of the installation. So we need to place the master node temporarily in both the public and private security groups.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-212\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master7_securitygroup-300x144.png\" alt=\"\" width=\"608\" height=\"292\" \/><\/p>\n<h2>Review<\/h2>\n<p>Check that the configuration is in order and click Launch.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-213\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master8_review-300x150.png\" alt=\"\" width=\"604\" height=\"302\" \/><\/p>\n<h2>Key pair<\/h2>\n<p>Let&#8217;s use the same key pair as for the edge node.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-216\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master9_keypair-300x185.png\" alt=\"\" width=\"470\" height=\"290\" \/><\/p>\n<h2>Elastic IP<\/h2>\n<p>To temporarily reach our first master we&#8217;re also going to need an Elastic IP. So let&#8217;s make one. You&#8217;ll find it in the VPC Dashboard. 
If you&#8217;re not sure where to look, check out <a href=\"https:\/\/marcel-jan.eu\/datablog\/2017\/05\/15\/building-hdp-2-6-on-aws-part-1-the-edge-node\/\">Part 1<\/a>, where we also created one.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-217\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master11-elasticip-300x47.png\" alt=\"\" width=\"581\" height=\"91\" \/><\/p>\n<p>And let&#8217;s associate it with our first master.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-218\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master13-elasticip-300x59.png\" alt=\"\" width=\"620\" height=\"122\" \/><\/p>\n<p>You get to choose the associated instance here.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-219\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master14-elasticip-300x135.png\" alt=\"\" width=\"518\" height=\"233\" \/><\/p>\n<h2>Connecting with PuTTY<\/h2>\n<p>Everything is set to connect to our master node. For this I use PuTTY again.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-220\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master15-putty-300x141.png\" alt=\"\" width=\"409\" height=\"192\" \/><\/p>\n<h2>Passwordless login from the edge node<\/h2>\n<p>After we have created our master and worker nodes, we will register them in an Ambari cluster. 
This will not work if a passphrase has to be entered when the edge node connects to a master node (I know from experience).<\/p>\n<p>To avoid that, we need to copy the \/root\/.ssh\/id_rsa.pub file from the edge node to the same location (\/root\/.ssh) on the master node.<\/p>\n<p>The way I did this:<\/p>\n<ol>\n<li>On the edge node, as root, I copied \/root\/.ssh\/id_rsa.pub to \/tmp.<\/li>\n<li>I copied the file with WinSCP from \/tmp to my workstation.<\/li>\n<li>From my workstation I copied it with WinSCP to \/tmp on my master node.<\/li>\n<li>On the master node, as root, I copied the file from \/tmp to \/root\/.ssh.<\/li>\n<\/ol>\n<p>There is one more step. On the master node we need to append the contents of id_rsa.pub to the file authorized_keys:<\/p>\n<pre>cat \/root\/.ssh\/id_rsa.pub &gt;&gt; \/root\/.ssh\/authorized_keys<\/pre>\n<h2>Enable NTP<\/h2>\n<p>Next we need to do a couple of things on the master node as root. I&#8217;m following the Hortonworks documentation here, but with Ubuntu commands.<\/p>\n<p>First we need to install NTP:<\/p>\n<pre>apt-get update\r\napt-get install ntp<\/pre>\n<p>Now we enable NTP:<\/p>\n<pre>update-rc.d ntp defaults\r\nupdate-rc.d ntp enable\r\n\/etc\/init.d\/ntp start<\/pre>\n<h2>Fully Qualified Domain Name<\/h2>\n<p>According to the documentation, your hostname needs to be equal to the FQDN, or Fully Qualified Domain Name. Here is how I did that. 
Of course you need to fill in the hostname and IP of your master.<\/p>\n<pre>MYHOST=ip-100-10-43-237.eu-west-1.compute.internal\r\nMYIP=\"100.10.43.237\"\r\n\r\nhostname $MYHOST\r\necho $MYHOST &gt; \/etc\/hostname\r\necho $MYIP $MYHOST &gt;&gt; \/etc\/hosts<\/pre>\n<p>Because getting these settings right is important, it&#8217;s best to check them. Both hostname and hostname -f should print the FQDN:<\/p>\n<pre>hostname\r\nhostname -f\r\ncat \/etc\/hostname\r\ncat \/etc\/hosts<\/pre>\n<h2>Turning off security<\/h2>\n<p>At this point the documentation tells me <a href=\"https:\/\/docs.hortonworks.com\/HDPDocuments\/Ambari-2.5.0.3\/bk_ambari-installation\/content\/configuring_iptables.html\">to turn off iptables<\/a>, the Linux firewall software. Ubuntu Server 16.04 manages its firewall with ufw (Uncomplicated Firewall) instead. If you query its status, you&#8217;ll find it&#8217;s turned off already:<\/p>\n<pre># sudo ufw status\r\n\r\nStatus: inactive<\/pre>\n<p>SELinux also <a href=\"https:\/\/docs.hortonworks.com\/HDPDocuments\/Ambari-2.5.0.3\/bk_ambari-installation\/content\/disable_selinux_and_packagekit_and_check_the_umask_value.html\">has to be turned off<\/a>. SELinux stands for Security-Enhanced Linux. It&#8217;s a kernel security module with which you can control what each program is allowed to do. Where it is active, you turn it off with this command (Ubuntu Server 16.04 uses AppArmor instead of SELinux by default, so on a standard installation SELinux isn&#8217;t active and setenforce usually isn&#8217;t even installed):<\/p>\n<pre>setenforce 0<\/pre>\n<p>Turning it off is a bit of a shame, because I recently <a href=\"https:\/\/stopdisablingselinux.com\/\">learned<\/a> SELinux is actually <a href=\"https:\/\/youtu.be\/cNoVgDqqJmM\">not that hard to maintain<\/a>.<\/p>\n<p>But believe me, for now it&#8217;s better to follow <a href=\"https:\/\/docs.hortonworks.com\/HDPDocuments\/Ambari-2.5.0.3\/bk_ambari-installation\/content\/configuring_iptables.html\">HDP&#8217;s documentation<\/a> to the letter, because there&#8217;s so much that can go wrong.<\/p>\n<p>One more command sets the umask:<\/p>\n<pre>umask 0022<\/pre>\n<p>Checking the setting is easy. 
This command should return 0022.<\/p>\n<pre>umask<\/pre>\n<p>Then let&#8217;s add the umask to the system-wide profile, \/etc\/profile, so it is also set for new login sessions:<\/p>\n<pre>echo umask 0022 &gt;&gt; \/etc\/profile<\/pre>\n<h2>Installing necessary software<\/h2>\n<p>We&#8217;re going to need <a href=\"https:\/\/docs.hortonworks.com\/HDPDocuments\/HDP2\/HDP-2.6.0\/bk_support-matrices\/content\/ch_matrices-ambari.html#ambari_software\">some software<\/a>. Here is how I checked whether everything is installed. Note that scp isn&#8217;t a package of its own. It comes with the openssh-client package.<\/p>\n<pre>sudo apt-get update\r\napt list --installed curl\r\napt list --installed openssh-client\r\napt list --installed unzip\r\napt list --installed openssl\r\napt list --installed tar\r\napt list --installed wget\r\napt list --installed python\r\napt list --installed openjdk-8-jdk<\/pre>\n<p>JDK 7 is also supported. On my system unzip was missing. This command installs it:<\/p>\n<pre>sudo apt install unzip<\/pre>\n<h2>A couple of checks<\/h2>\n<p>I wrote these commands down, but I can&#8217;t seem to find them in the documentation anymore. 
The default values were okay, so it&#8217;s not a big issue.<\/p>\n<p>This command checks free memory:<\/p>\n<pre># free -m\r\n\r\n              total        used        free      shared  buff\/cache   available\r\nMem:           7061         126        6114          16         821        6642\r\nSwap:             0           0           0<\/pre>\n<p>Maximum number of open files (soft and hard limit):<\/p>\n<pre># ulimit -Sn\r\n\r\n1024\r\n\r\n# ulimit -Hn\r\n\r\n4096<\/pre>\n<h2>Removing the Elastic IP and security group<\/h2>\n<p>Alright, we have done everything necessary at the command prompt. 
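<\/p>\n<p>Before leaving the command prompt, it can be handy to re-run the most important checks in one go. The script below is a minimal sketch, not something from the HDP documentation. It assumes a bash shell and the setup from this post: the ntp service from the Ubuntu package, the hostname set to the FQDN, and umask 0022. The helper name check_value is hypothetical.<\/p>\n<pre># Re-run the command-line checks from this post in one go (a sketch, not from the HDP docs).\r\n# check_value is a hypothetical helper: it runs a command and compares its output with the value we expect.\r\ncheck_value() {\r\n  local label=$1 expected=$2\r\n  shift 2\r\n  local actual\r\n  actual=$(\"$@\") || true   # a failing command should not abort the script\r\n  if [ \"$actual\" = \"$expected\" ]; then\r\n    echo \"OK    $label\"\r\n  else\r\n    echo \"FAIL  $label: expected '$expected', got '$actual'\"\r\n  fi\r\n}\r\n\r\ncheck_value umask 0022 umask\r\ncheck_value fqdn \"$(hostname)\" hostname -f\r\ncheck_value ntp active systemctl is-active ntp<\/pre>\n<p>Run it with bash on each master node. Every line prints OK or FAIL, so a FAIL points you to the step to revisit before moving on.<\/p>\n<p>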
Now it&#8217;s time to remove what we no longer need: the Elastic IP and the public security group.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-224\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master15-remove-elasticip-300x143.png\" alt=\"\" width=\"300\" height=\"143\" \/><\/p>\n<p>Next, go to the Change Security Groups option in the instances overview.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-225\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master16a-remove-sg-300x159.png\" alt=\"\" width=\"445\" height=\"236\" \/><\/p>\n<p>Remove the public security group.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-226\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master16-remove-sg-300x168.png\" alt=\"\" width=\"439\" height=\"246\" \/><\/p>\n<p>Don&#8217;t forget to test that you can now log in from the edge node to this master node without a password.<\/p>\n<h1>Bring in the clones<\/h1>\n<p>Time to clone us some master nodes. I&#8217;ve used the Launch More Like This option for this.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-227\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master17-more-like-this-300x71.png\" alt=\"\" width=\"625\" height=\"148\" \/><\/p>\n<p>The review window pops up immediately, but we have some changes to make.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-228\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master18-review-clone-300x151.png\" alt=\"\" width=\"618\" height=\"311\" \/><\/p>\n<p>So click Previous. 
Here I&#8217;ve already entered a name, to which I&#8217;ll add numbers for the individual nodes later on.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-229\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master19-add-tag-300x66.png\" alt=\"\" width=\"623\" height=\"137\" \/><\/p>\n<p>Click Previous a couple more times until you arrive at Step 3: Configure Instance Details. Here we change the number of instances to 2, so we create 2 new master nodes instead of one.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-230\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master20-config-300x150.png\" alt=\"\" width=\"626\" height=\"313\" \/><\/p>\n<p>Now click Review and Launch. Choose the same key pair as for the first master.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-216\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master9_keypair-300x185.png\" alt=\"\" width=\"436\" height=\"269\" \/><\/p>\n<p>And before you know it, you have 3 master nodes. Let&#8217;s change their Name tags right away, so we can tell them apart.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-231\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master21-change-names-300x169.png\" alt=\"\" width=\"300\" height=\"169\" \/><\/p>\n<h2>Test passwordless login, again<\/h2>\n<p>Now test whether you can log in from the edge node to your new master nodes without a password. Keep in mind that Launch More Like This doesn&#8217;t copy the first master&#8217;s disk. It only reuses configuration details such as the AMI, instance type, subnet and security groups. The new nodes therefore start from a fresh Ubuntu image, and the command-line steps above (the edge node&#8217;s key, NTP, hostname, umask and software) have to be repeated on each of them. Alternatively, create an AMI from the configured first master and launch the other masters from that image. The hostname settings still need to be adjusted per node either way.<\/p>\n<h1>Conclusion<\/h1>\n<p>It&#8217;s quite a bit of work to make these master nodes, but don&#8217;t skip any of the steps. 
Do this and the registration of your master (and worker) nodes in Ambari will go more smoothly.<\/p>\n<p>If you have any remarks about my process here, let me know in the comments.<\/p>\n<p>In the next post we&#8217;re going to create worker nodes. Luckily that process looks very much like what we did for the master nodes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the master nodes. Creating the first master node Make sure you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[50,49,54,4,48,47,53,51],"class_list":["post-202","post","type-post","status-publish","format-standard","hentry","category-howto","tag-amazon-web-services","tag-aws","tag-cloning-nodes","tag-hadoop","tag-hdp","tag-hortonworks-data-platform","tag-master-node","tag-ubuntu-server"],"_links":{"self":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/202","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/comments?post=202"}],"version-history":[{"count":7,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/202\/revisions"}],"predecessor-version":[{"id":235,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/202\/revisions\/235"}],"wp:attachment":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media?parent=202"}],"wp:term":[{"taxo
nomy":"category","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/categories?post=202"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/tags?post=202"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}