This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server and three master nodes for the Hadoop name nodes and such. Now we need worker nodes for processing the data.
Creating the worker nodes is not that much different from creating the master nodes. The main difference is that the workers need more powerful instances.
Creating the first worker node
Log in at Amazon Web Services again, in the same AWS region as the edge and master nodes. We start with one worker node and clone two more later on. Go to the EC2 dashboard in the AWS interface and click “Launch instance”. Then choose Ubuntu Server 16.04 from the Amazon Machine Images.
For the workers we need machines with a little more oomph. Select a general purpose instance with type m4.2xlarge.
The worker nodes will be in the same subnet as the masters.
For storage we use 100 GB for the root volume, 100 GB of type EBS for /dev/sdb and 25 GB of type EBS for /dev/sdc. This is possibly more storage than necessary.
The tags, of course, are different from those of the master nodes.
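For those who prefer the command line over the console, the same launch can be sketched with the AWS CLI. This is only a sketch under assumptions: the AMI ID, subnet ID, key pair name and the Name tag value are hypothetical placeholders, and /dev/sda1 is assumed to be the root device name of the chosen Ubuntu AMI.

```shell
#!/usr/bin/env bash
# Sketch: launch one worker node with the AWS CLI instead of the console.
# All IDs below are hypothetical placeholders; replace them with your own.
AMI_ID="ami-xxxxxxxx"        # Ubuntu Server 16.04 AMI in your region (placeholder)
SUBNET_ID="subnet-xxxxxxxx"  # same subnet as the master nodes (placeholder)
KEY_NAME="my-key"            # existing EC2 key pair (placeholder)

# Storage layout from the text: 100 GB root, 100 GB on /dev/sdb, 25 GB on /dev/sdc.
# /dev/sda1 is assumed to be the root device of the AMI; verify this for yours.
BLOCK_DEVICES='[
  {"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100}},
  {"DeviceName":"/dev/sdb","Ebs":{"VolumeSize":100}},
  {"DeviceName":"/dev/sdc","Ebs":{"VolumeSize":25}}
]'

launch_worker() {
  # $1 = value for the Name tag, e.g. "worker-1" (hypothetical naming scheme)
  aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type m4.2xlarge \
    --subnet-id "$SUBNET_ID" \
    --key-name "$KEY_NAME" \
    --block-device-mappings "$BLOCK_DEVICES" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$1}]"
}

# Usage: launch_worker worker-1
```

Nothing is launched until launch_worker is called, so the placeholders can be filled in first.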
Extra preparation to install Ambari and HDP
You need unzip:
sudo apt-get update
sudo apt install unzip
Create a new Elastic IP and associate the worker node with it.
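The Elastic IP step can also be done from the command line. A minimal sketch, assuming the AWS CLI is configured for the right region and the instance runs in a VPC; the function name and the instance ID in the usage line are hypothetical.

```shell
# Sketch: allocate a new Elastic IP and associate it with one instance.
assign_elastic_ip() {
  # $1 = instance ID of the worker node (placeholder in the usage line below)
  local instance_id="$1"
  local allocation_id
  # allocate-address returns the allocation ID of the new Elastic IP
  allocation_id="$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text)" || return 1
  aws ec2 associate-address \
    --instance-id "$instance_id" \
    --allocation-id "$allocation_id"
}

# Usage: assign_elastic_ip i-0123456789abcdef0
```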
Clone the worker node
Now we clone the first worker node to two new workers. This works in the same way as we cloned the master nodes, except this time m4.2xlarge nodes were chosen.
Change the tags in the instance list:
The clones have the same software as the first worker node, but passwordless access is something you have to configure on each of them. You also need to put the worker nodes in the OCS-POC edge – sg security group and associate them, one by one, with an Elastic IP, so you can log in directly.
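One possible way to set up the passwordless access is to append the edge node's public key to root's authorized_keys on every worker. The sketch below is an assumption, not necessarily how it was done for the master nodes: it presumes the default "ubuntu" login with an EC2 key file still works on the workers, and the key paths and host names are hypothetical.

```shell
# Sketch: push the edge node's public key to root on a worker.
# Assumptions (not from this series): the worker accepts the default "ubuntu"
# login with an EC2 key file, and a key pair already exists on the edge node.
PUBKEY="${PUBKEY:-$HOME/.ssh/id_rsa.pub}"     # edge node public key (assumed path)
EC2_KEY="${EC2_KEY:-$HOME/.ssh/ec2-key.pem}"  # EC2 key file (hypothetical name)

push_key() {
  # $1 = worker host name or IP
  # The public key is sent over stdin and appended to root's authorized_keys.
  ssh -i "$EC2_KEY" "ubuntu@$1" \
    'sudo mkdir -p /root/.ssh && sudo tee -a /root/.ssh/authorized_keys >/dev/null' \
    < "$PUBKEY"
}

# Usage, with hypothetical host names:
# for host in worker1 worker2 worker3; do push_key "$host"; done
```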
Test the connection
When the worker nodes are started, you should have access as root from the edge node.
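A quick way to check this for all workers at once is a small loop run on the edge node. This is a minimal sketch; the function name and the host names in the usage line are hypothetical.

```shell
# Sketch: check passwordless root login from the edge node to each worker.
check_workers() {
  # Arguments: worker host names or IPs
  local host failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  # Non-zero exit status if at least one worker could not be reached.
  return "$failed"
}

# Usage: check_workers worker1 worker2 worker3
```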