Building HDP 2.6 on AWS, Part 2: the master nodes

This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the master nodes.

Creating the first master node

Make sure you are logged in to Amazon Web Services, in the same AWS region as the edge node. To create 3 master nodes, we start with one. Once again we go to the EC2 dashboard in the AWS interface and click “Launch Instance”. Again we get a choice of Amazon Machine Images, and again we choose Ubuntu Server 16.04.

This time I chose a general purpose instance that is a bit more powerful than our edge node. Select the one with type m3.large.

Networking

We will create our master node in a new subnet. We don’t want it to be accessible from the Internet, but it should be able to reach the Internet itself (for updates).

Because this subnet is not accessible from the outside world, you might put something like “private” in the Name tag. This subnet will be created in the same VPC as last time, so select that VPC. But it gets a different IPv4 CIDR block than the one we created last time, because we need a different range of IP addresses.

To give an example: in the last post I created a subnet for public access with IPv4 CIDR block 100.10.42.0/24. We’re staying in the 100.10.x.x range, but we can’t choose 100.10.42.x for our new subnet, because that range is already taken. So let’s use 100.10.43.0/24.
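If you want to double-check that two CIDR blocks don’t collide, you can print their ranges. This is just an illustration, run on your own workstation, and it assumes python3 is available there:

```shell
# Print the address range of each /24 and check for overlap.
python3 -c "
import ipaddress
pub  = ipaddress.ip_network('100.10.42.0/24')   # public subnet from part 1
priv = ipaddress.ip_network('100.10.43.0/24')   # new private subnet
print('public :', pub[0], '-', pub[-1])
print('private:', priv[0], '-', priv[-1])
print('overlap:', pub.overlaps(priv))
"
```

It prints `overlap: False`, confirming the two ranges are disjoint.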

Storage

On another Hadoop cluster we got space usage warnings on the root partition of some of the master nodes. As I mentioned in part 1, it’s not easy to enlarge a root partition afterwards, so size it generously now. 150 GB should do it.

Tags

Again, we can add a Name tag.

Security groups

The master nodes operate in a network space that is not accessible from the outside world. So if there isn’t a private security group available already, you have to create one.

While we do not want the master nodes to be accessible directly, we’re going to need that direct access at the start of the installation. So we place the master node temporarily in both the public and private security groups.

Review

Check that the configuration is in order and click Launch.

Key pair

Let’s use the same key pair as for the edge node.

Elastic IP

To temporarily reach our first master we’re also going to need an Elastic IP. So let’s make one. You’ll find it in the VPC Dashboard. If you’re not sure where to look, check out Part 1 where we also created one.

And let’s associate it with our first master.

You get to choose the associated instance here.

Connecting with PuTTY

All is set to connect to our master node. For this I use PuTTY again.

Passwordless login from the edge node

After we have created our master and worker nodes, we will register them in an Ambari cluster. That registration fails when a passphrase has to be entered for the SSH connection from our edge node to our master nodes (I know this from experience).

To enable this, we need to copy the /root/.ssh/id_rsa.pub file from the edge node to the same location (/root/.ssh) on the master node.

The way I did this:

  1. On the edge node, as root, I copied /root/.ssh/id_rsa.pub to /tmp.
  2. I copied the file with WinSCP from /tmp to my workstation.
  3. From my workstation I copied it with WinSCP to my master node to /tmp.
  4. On the master node, from /tmp as root I copied the file to /root/.ssh.

There is one more step. On the master node, as root in /root/.ssh, append id_rsa.pub to the file authorized_keys:

cat id_rsa.pub >> authorized_keys
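One thing to watch out for: sshd (with its default StrictModes setting) ignores authorized_keys if the file or the .ssh directory is group- or world-writable. Here is a sketch of the end state, demonstrated in a scratch directory so you can try it anywhere; on the real master node the directory is /root/.ssh and the copied key sits in /tmp:

```shell
# Demonstrate the append plus the permissions sshd expects.
# A scratch directory stands in for /root/.ssh on the master node.
demo=$(mktemp -d)
mkdir "$demo/.ssh"
echo "ssh-rsa AAAAB3...fake root@edge" > "$demo/id_rsa.pub"   # stand-in key
cat "$demo/id_rsa.pub" >> "$demo/.ssh/authorized_keys"
chmod 700 "$demo/.ssh"
chmod 600 "$demo/.ssh/authorized_keys"
stat -c '%a' "$demo/.ssh" "$demo/.ssh/authorized_keys"   # prints 700, then 600
```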

Enable NTP

Next we need to do a couple of things on the master node as root. I’m following the documentation here, but with Ubuntu commands.

First we need to install NTP:

apt-get update
apt-get install ntp

Now we enable NTP:

update-rc.d ntp defaults
update-rc.d ntp enable
/etc/init.d/ntp start
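Ubuntu 16.04 is systemd-based, so if you prefer systemd commands over the SysV-style ones above, the equivalent (as far as I know) is:

```shell
sudo systemctl enable ntp     # start NTP at boot
sudo systemctl start ntp      # start it now
sudo systemctl status ntp     # should report "active (running)"
```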

Fully Qualified Domain Name

According to the documentation your hostname needs to be equal to the FQDN, or Fully Qualified Domain Name. Here is how I did that. Of course you need to fill in the hostname and IP of your master.

MYHOST=ip-100-10-43-237.eu-west-1.compute.internal
MYIP="100.10.43.237"

hostname $MYHOST
echo $MYHOST > /etc/hostname
echo $MYIP $MYHOST >> /etc/hosts

Because the right settings here are rather important, it’s best to check:

hostname
cat /etc/hostname
cat /etc/hosts

Turning off security

At this point the documentation tells me to turn off iptables, the Linux firewall software. But Ubuntu Server 16.04 uses a different firewall product, ufw. If you query its status, you’ll find it’s turned off already:

# sudo ufw status

Status: inactive

SELinux also has to be turned off. SELinux stands for Security-Enhanced Linux. It’s a kernel security module with which you can control what software is allowed to do what. Note that Ubuntu ships with AppArmor rather than SELinux, so on a stock Ubuntu image the command below may not even exist — in that case there is nothing to turn off. You turn SELinux off with this command:

setenforce 0

Bit of a shame, because I recently learned SELinux is actually not that hard to maintain.
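Before running setenforce, you can check whether SELinux is even present. A small sketch:

```shell
# Check whether the SELinux tools are installed before trying to disable it.
if command -v getenforce >/dev/null 2>&1; then
  getenforce        # prints Enforcing, Permissive or Disabled
else
  echo "SELinux tools not installed; nothing to turn off"
fi
```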

But believe me, for now it’s better to follow the instructions in HDP’s documentation to the letter, because there’s so much that can go wrong.

There’s one more command to run, to set the umask.

umask 0022

Checking the setting is easy. This command should return 0022.

umask

And then let’s put the umask in the standard profile.

echo umask 0022 >> /etc/profile
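As a sanity check, this is what a 0022 umask does in practice: newly created files come out as 644 and directories as 755, so only the owner has write permission. You can see that with a quick test in a scratch directory:

```shell
# With umask 0022, new files get mode 644 and new directories 755.
umask 0022
tmp=$(mktemp -d)
touch "$tmp/afile"
mkdir "$tmp/adir"
stat -c '%a %n' "$tmp/afile" "$tmp/adir"   # prints 644 for the file, 755 for the dir
```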

Installing necessary software

We’re going to need some software. Here is how I’ve checked if everything is available. Note that there is no separate scp package: the scp command is part of openssh-client.

sudo apt-get update
apt list --installed curl
apt list --installed openssh-client
apt list --installed unzip
apt list --installed openssl
apt list --installed tar
apt list --installed wget
apt list --installed python
apt list --installed openjdk-8-jdk

JDK 7 is also supported. On my system unzip was missing; to install it, run this command:

sudo apt install unzip
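Instead of checking packages one by one, you can also loop over the commands themselves and see which ones are missing from the PATH. A sketch (remember that scp comes from openssh-client and java from openjdk-8-jdk):

```shell
# Report any required command that is not on the PATH.
for cmd in curl scp unzip openssl tar wget python java; do
  command -v "$cmd" >/dev/null 2>&1 || echo "missing: $cmd"
done
```

If it prints nothing, everything is already installed.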

Couple of checks

I wrote these commands down, but I can’t seem to find them in the documentation anymore. The default values were okay, so it’s not a big issue.

This command checks free memory:

# free -m

              total        used        free      shared  buff/cache   available
 Mem:           7061         126        6114          16         821        6642
 Swap:             0           0           0

Maximum number of open files (soft and hard limit):

# ulimit -Sn

1024

# ulimit -Hn

4096

Removing the Elastic IP and security group

Alright, we have done everything necessary at the command line. Now it’s time to remove the unnecessary: the Elastic IP and the public security group.

First disassociate the Elastic IP from the instance (you’ll find it in the VPC Dashboard, just like when we created it). Then go to the Change Security Groups option in the instances overview and remove the public security group.

Don’t forget to test that you can now log in from the edge node to this master node without a password.
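One way to test that, from the edge node as root (using the example IP from earlier; the BatchMode option makes ssh fail instead of prompting, so a hanging passphrase prompt can’t hide):

```shell
# Should print the master's hostname without any password or passphrase prompt.
ssh -o BatchMode=yes root@100.10.43.237 hostname
```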

Bring in the clones

Time to clone us some master nodes. I’ve used the Launch More Like This option for this.

Immediately the review window pops up, but we have some changes to make.

So click Previous. Here I’ve already entered a name to which I will add numbers for different nodes later on.

Let’s go back by clicking Previous a couple more times until you arrive at Step 3: Configure Instance Details. Here we change the number of instances to 2. This means we’re going to create 2 clones instead of one.

Now click Review and Launch. You have to choose the same key pair as the first master.

And before you know it, you have 3 master nodes. Let’s change their tags right away, so we can distinguish them from each other.

Test passwordless login, again

If I’m not very much mistaken, you can now log in without a password from the edge node to your new master nodes.

Conclusion

It’s quite a bit of work to build these master nodes, but if you follow each step, the registration of your master (and worker) nodes in Ambari will go much more smoothly.

If you have any remarks about my process here, let me know in the comments.

 

Next post we’re going to create worker nodes. Luckily this process will look very much like what we did to create master nodes.

About Marcel-Jan Krijgsman

In 2017 I made the leap to Big Data after 20 years of experience with Oracle databases. I followed courses on Hadoop, Big Data Analytics, Machine Learning and Python, MongoDB and Elasticsearch.
This entry was posted in Howto, Learning Big Data. Bookmark the permalink.

5 Responses to Building HDP 2.6 on AWS, Part 2: the master nodes

  1. shr says:

    Where can I find the remaining parts of this tutorial?

    • Marcel-Jan Krijgsman says:

      I have the notes on them, but I don’t have the environment anymore. So I will put the next parts of the tutorial online, but I can’t check certain things anymore.

  2. shr says:

    Thanks Marcel, Even if you can put your notes detailing the steps (without screenshots of AWS) online would be helpful.

  3. Pingback: Building HDP 2.6 on AWS, Part 3: the worker nodes | Expedition Data
