{"id":236,"date":"2018-04-10T14:54:41","date_gmt":"2018-04-10T14:54:41","guid":{"rendered":"http:\/\/marcel-jan.eu\/datablog\/?p=236"},"modified":"2018-04-11T12:23:52","modified_gmt":"2018-04-11T12:23:52","slug":"building-hdp-2-6-on-aws-part-3-the-worker-nodes","status":"publish","type":"post","link":"https:\/\/marcel-jan.eu\/datablog\/2018\/04\/10\/building-hdp-2-6-on-aws-part-3-the-worker-nodes\/","title":{"rendered":"Building HDP 2.6 on AWS, Part 3: the worker nodes"},"content":{"rendered":"<p>This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have <a href=\"https:\/\/marcel-jan.eu\/datablog\/2017\/05\/15\/building-hdp-2-6-on-aws-part-1-the-edge-node\/\">an edge node<\/a> to run Ambari Server and <a href=\"https:\/\/marcel-jan.eu\/datablog\/2017\/05\/26\/building-hdp-2-6-on-aws-part-2-the-master-nodes\/\">three master nodes<\/a> for Hadoop name nodes and such. Now we need worker nodes for processing the data.<\/p>\n<p>Creating the worker nodes is not much different from creating the master nodes, but the workers need more powerful instances.<\/p>\n<h1>Creating the first worker node<\/h1>\n<p>Log in to Amazon Web Services again, in the same AWS region as the edge and master nodes. We start with one worker node and clone two more later on. Go to the EC2 dashboard in the AWS interface and click \u201cLaunch instance\u201d. Then choose Ubuntu Server 16.04 from the Amazon Machine Images.<!--more--><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-204\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_master1-300x134.png\" alt=\"\" width=\"611\" height=\"273\" \/><\/p>\n<p>For the workers we need machines with a little more oomph. 
Select a general purpose instance of type m4.2xlarge.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-239\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2017\/05\/hdp_op_aws_worker1-ami-300x89.png\" alt=\"\" width=\"590\" height=\"175\" \/><\/p>\n<h2>Networking<\/h2>\n<p>The worker nodes will be in the same subnet as the masters.<\/p>\n<h2>Storage<\/h2>\n<p>This is possibly more storage than necessary: 100 GB for root, 25 GB of type EBS for \/dev\/sdc and 100 GB of type EBS for \/dev\/sdb.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-465\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2018\/04\/hdp_op_aws_worker2-storage-300x99.png\" alt=\"\" width=\"715\" height=\"236\" \/><\/p>\n<h2>Tags<\/h2>\n<p>The tags are, of course, different.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-467\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2018\/04\/hdp_op_aws_worker3-tags-300x59.png\" alt=\"\" width=\"656\" height=\"129\" \/><\/p>\n<h2>Review<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-471\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2018\/04\/hdp_op_aws_worker4-review-1-300x149.png\" alt=\"\" width=\"775\" height=\"385\" \/><\/p>\n<h2>Extra preparation to install Ambari and HDP<\/h2>\n<p>You need unzip:<\/p>\n<pre>sudo apt-get update\r\n\r\nsudo apt install unzip<\/pre>\n<h2>Elastic IP<\/h2>\n<p>Create a new Elastic IP and associate the worker node with it.<\/p>\n<p>&nbsp;<\/p>\n<h1>Clone the worker node<\/h1>\n<p>Now we clone the first worker node to two new workers. 
This works the same way as cloning the master nodes, except that this time we choose m4.2xlarge instances.<\/p>\n<p>Change the tags in the instance list:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-medium wp-image-470\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2018\/04\/hdp_op_aws_worker5-clones-300x229.png\" alt=\"\" width=\"300\" height=\"229\" \/><\/p>\n<p>The nodes have the same software as the first worker node, but passwordless access is something you have to configure on all of them. You need to put the worker nodes in the OCS-POC edge \u2013 sg security group and associate them, one by one, with the Elastic IP, so you can log in directly.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-472\" src=\"https:\/\/marcel-jan.eu\/datablog\/wp-content\/uploads\/2018\/04\/hdp_op_aws_worker6-securitygroup-300x153.png\" alt=\"\" width=\"781\" height=\"398\" \/><\/p>\n<h2>Test the connection<\/h2>\n<p>When the worker nodes are started, you should have access as root from the edge node.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server and three master nodes for Hadoop name nodes and such. Now we need worker nodes for processing the data. 
Creating the [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[55],"tags":[50,49,54,4,48,47,51,108],"class_list":["post-236","post","type-post","status-publish","format-standard","hentry","category-howto","tag-amazon-web-services","tag-aws","tag-cloning-nodes","tag-hadoop","tag-hdp","tag-hortonworks-data-platform","tag-ubuntu-server","tag-worker-nodes"],"_links":{"self":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/236","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/comments?post=236"}],"version-history":[{"count":8,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/236\/revisions"}],"predecessor-version":[{"id":477,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/posts\/236\/revisions\/477"}],"wp:attachment":[{"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/media?parent=236"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/categories?post=236"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/marcel-jan.eu\/datablog\/wp-json\/wp\/v2\/tags?post=236"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}