Building HDP 2.6 on AWS, Part 3: the worker nodes

This is part 3 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. By now we have an edge node to run Ambari Server and three master nodes for the Hadoop name nodes and related services. Now we need worker nodes to process the data.

Creating the worker nodes is not much different from creating the master nodes, but the workers need more powerful instances.

Creating the first worker node

Log in to Amazon Web Services again, in the same AWS region as the edge and master nodes. We start with one worker node and clone two more later on. Go to the EC2 dashboard in the AWS interface and click “Launch instance”. Then choose Ubuntu Server 16.04 from the Amazon Machine Images.

For the workers we need machines with a little more oomph. Select a general purpose instance of type m4.2xlarge.


The worker nodes will be in the same subnet as the masters.


This is possibly more storage than necessary: 100 GB for root, 25 GB of type EBS for /dev/sdc and 100 GB of type EBS for /dev/sdb.


The tags of course are different.
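The console steps above can also be scripted. What follows is a minimal sketch, assuming the AWS CLI is installed and configured for the same region. The function name launch_worker, the AMI ID, key pair name, subnet ID, security group ID and the Name tag value are all placeholders that do not come from this series, and /dev/sda1 as the root device name is an assumption to check against your AMI.

```shell
#!/usr/bin/env bash
# Sketch only: launch one worker node with the AWS CLI.
# Every ID passed in is a placeholder you must supply yourself.

launch_worker() {
  local ami_id="$1" key_name="$2" subnet_id="$3" sg_id="$4" name_tag="$5"

  # Storage as described above: 100 GB root, 100 GB on /dev/sdb, 25 GB on /dev/sdc.
  # /dev/sda1 as the root device name is an assumption; check your AMI.
  local volumes='[
    {"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100}},
    {"DeviceName":"/dev/sdb","Ebs":{"VolumeSize":100}},
    {"DeviceName":"/dev/sdc","Ebs":{"VolumeSize":25}}
  ]'

  # Prints the ID of the new instance.
  aws ec2 run-instances \
    --image-id "$ami_id" \
    --instance-type m4.2xlarge \
    --count 1 \
    --key-name "$key_name" \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --block-device-mappings "$volumes" \
    --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=$name_tag}]" \
    --query 'Instances[0].InstanceId' \
    --output text
}

# Example call (placeholder values):
# launch_worker ami-xxxxxxxx my-key subnet-xxxxxxxx sg-xxxxxxxx worker1
```

The function only wraps a single run-instances call, so the console wizard stays the reference for what each setting means.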


Extra preparation to install Ambari and HDP

You need unzip:

sudo apt-get update

sudo apt install unzip

Elastic IP

Create a new Elastic IP and associate the worker node with it.
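For those who prefer the command line, here is a hedged sketch of the same step with the AWS CLI. It assumes the CLI is configured for the right region; attach_elastic_ip is a hypothetical helper name and the instance ID is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch only: allocate a new Elastic IP and associate it with an instance.

attach_elastic_ip() {
  local instance_id="$1"
  local allocation_id

  # Allocate a new address for use in a VPC and capture its allocation ID.
  allocation_id="$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text)" || return 1

  # Associate that address with the given instance.
  aws ec2 associate-address \
    --instance-id "$instance_id" \
    --allocation-id "$allocation_id" >/dev/null || return 1

  # Print the allocation ID so it can be reused or released later.
  echo "$allocation_id"
}

# Example call (placeholder instance ID):
# attach_elastic_ip i-xxxxxxxxxxxxxxxxx
```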


Clone the worker node

Now we clone the first worker node to two new workers. This works the same way as cloning the master nodes, except that this time m4.2xlarge instances are chosen.
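One possible way to script the cloning is to create an image of the first worker and launch two instances from it. This is a sketch under assumptions and not necessarily the method used for the master nodes: the function names, image name and all IDs are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: clone a configured worker by imaging it, then launching from the image.

create_worker_image() {
  local instance_id="$1" image_name="$2"
  local image_id

  # Note: by default create-image reboots the source instance.
  image_id="$(aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "$image_name" \
    --query 'ImageId' --output text)" || return 1

  # Block until the new AMI is ready to launch from.
  aws ec2 wait image-available --image-ids "$image_id" || return 1
  echo "$image_id"
}

launch_clones() {
  local image_id="$1" key_name="$2" subnet_id="$3" sg_id="$4"

  # Two more workers, same instance type as the first one.
  aws ec2 run-instances \
    --image-id "$image_id" \
    --instance-type m4.2xlarge \
    --count 2 \
    --key-name "$key_name" \
    --subnet-id "$subnet_id" \
    --security-group-ids "$sg_id" \
    --query 'Instances[].InstanceId' \
    --output text
}

# Example (placeholder values):
# img="$(create_worker_image i-xxxxxxxxxxxxxxxxx worker-image)"
# launch_clones "$img" my-key subnet-xxxxxxxx sg-xxxxxxxx
```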

Change the tags in the instance list:

The nodes have the same software as the first worker node, but passwordless access is something you have to configure on each of them. You need to put the worker nodes in the OCS-POC edge – sg security group and associate them, one by one, with the Elastic IP, so you can log in directly.
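The tagging and security group changes can be sketched with the AWS CLI as well. The helper name configure_clone, the instance ID, the Name tag value and the security group ID are placeholders; you would look up the ID of the OCS-POC edge – sg group yourself.

```shell
#!/usr/bin/env bash
# Sketch only: give a cloned worker its own Name tag and security group.

configure_clone() {
  local instance_id="$1" name_tag="$2" sg_id="$3"

  # Set (or overwrite) the Name tag shown in the instance list.
  aws ec2 create-tags \
    --resources "$instance_id" \
    --tags "Key=Name,Value=$name_tag" || return 1

  # Caution: --groups replaces the instance's whole security group list.
  aws ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --groups "$sg_id"
}

# Example call (placeholder values):
# configure_clone i-xxxxxxxxxxxxxxxxx worker2 sg-xxxxxxxx
```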

Test the connection

When the worker nodes are started, you should have access to them as root from the edge node.
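A quick way to check this from the edge node is a small loop over the workers. This is a sketch that assumes passwordless root SSH is already configured; check_workers is a hypothetical helper and the host names in the example are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: verify passwordless root SSH to each host given as an argument.

check_workers() {
  local host failed=0

  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done

  # Non-zero exit status if any host could not be reached.
  return "$failed"
}

# Example call (placeholder host names):
# check_workers worker1 worker2 worker3
```

A FAIL line usually means the key is missing on that worker or the security group does not allow SSH from the edge node.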


About Marcel-Jan Krijgsman

In 2017 I made the leap to Big Data after 20 years of experience with Oracle databases. I followed courses on Hadoop, Big Data Analytics, Machine Learning and Python, MongoDB and Elasticsearch.