This is part 2 in a series on how to build a Hortonworks Data Platform 2.6 cluster on AWS. In part 1 we created an edge node where we will later install Ambari Server. The next step is creating the master nodes.
Creating the first master node
Make sure you are logged in to Amazon Web Services, in the same AWS region as the edge node. To create 3 master nodes, we have to start with one. Once again we go to the EC2 dashboard in the AWS interface and click “Launch instance”. From the choice of Amazon Machine Images we again pick Ubuntu Server 16.04.
This time I chose a general purpose instance that is a bit more powerful than our edge node. Select the one with type m3.large.
Networking
We will create our master node in a new subnet. We don’t want it to be accessible from the Internet, but it should be able to access the Internet itself (for updates).
Because this subnet is not accessible from the outside world, you might put something like “private” in the name tag. This subnet will be created in the same VPC as last time, so select that. But it will have a different IPv4 CIDR block than the one we created last time, because we need a different range of IPs.
To give an example, in the last post I created a subnet for public access with IPv4 CIDR block 100.10.42.0/24. We’re going to stay in the 100.10.x.x range, but we can’t choose 100.10.42.x for our new subnet, because that is already taken. So let’s use 100.10.43.0/24.
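To see why the two blocks can’t collide: a /24 fixes the first three octets, so 100.10.42.0/24 and 100.10.43.0/24 each cover their own disjoint set of addresses. A quick sanity check of the block size, as a sketch in plain shell arithmetic:

```shell
# number of addresses in a /24 block: 2^(32-24) = 256
# (100.10.42.0/24 spans 100.10.42.0 - 100.10.42.255, and
#  100.10.43.0/24 spans 100.10.43.0 - 100.10.43.255, so they never overlap)
prefix=24
echo $(( 1 << (32 - prefix) ))
```

So every distinct value of the third octet gives you a fresh, non-overlapping /24 subnet to use.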
Storage
On another Hadoop cluster we had space usage warnings on the root partition of some of the master nodes. As I mentioned in part 1, it’s not easy to enlarge a root partition afterwards. 150 GB should do it, though.
Tags
Again, we can add a Name tag.
Security groups
The master nodes operate in a network space that is not accessible from the outside world. So if there isn’t a private security group available already, you have to create one.
While we do not want the master nodes to be accessible directly, we’re going to need that direct access at the start of the installation. So we need to place the master node temporarily in both the public and private security groups.
Review
Check that the configuration is in order and click Launch.
Key pair
Let’s use the same key pair as for the edge node.
Elastic IP
To temporarily reach our first master we’re also going to need an Elastic IP. So let’s make one. You’ll find it in the VPC Dashboard. If you’re not sure where to look, check out Part 1 where we also created one.
And let’s associate it with our first master.
You get to choose the associated instance here.
Connection with Putty
All is set to connect to our master node. For this I use Putty again.
Passwordless login from the edge node
After we have created our master and worker nodes, we will register them in an Ambari cluster. This will not work if the process has to enter a passphrase when making a connection from our edge node to our master nodes (I know from experience).
To allow this we need to copy the /root/.ssh/id_rsa.pub file from the edge node to the same location (/root/.ssh) on the master node.
The way I did this:
- On the edge node, as root, I copied /root/.ssh/id_rsa.pub to /tmp.
- I copied the file with WinSCP from /tmp to my workstation.
- From my workstation I copied it with WinSCP to my master node to /tmp.
- On the master node, from /tmp as root I copied the file to /root/.ssh.
There is one more step. On the master node, in /root/.ssh, we need to append id_rsa.pub to the file authorized_keys:
cat id_rsa.pub >> authorized_keys
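If you want to rehearse this step without touching the real /root/.ssh, here is a sketch using stand-in paths (everything under /tmp/ssh-demo, including the fake key, is hypothetical):

```shell
# rehearsal of the key-append step with stand-in paths (not the real /root/.ssh)
DEMO=/tmp/ssh-demo
rm -rf "$DEMO" && mkdir -p "$DEMO/.ssh"

# a fake public key standing in for the edge node's id_rsa.pub
echo "ssh-rsa AAAAB3FakeKeyData root@edge-node" > "$DEMO/id_rsa.pub"

# the actual step: append the public key to authorized_keys
cat "$DEMO/id_rsa.pub" >> "$DEMO/.ssh/authorized_keys"

# sshd refuses keys in files with loose permissions, so tighten them
chmod 700 "$DEMO/.ssh"
chmod 600 "$DEMO/.ssh/authorized_keys"
```

Note the `>>`: appending rather than overwriting keeps any keys that were already authorized.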
Enable NTP
Next we need to do a couple of things on the master node as root. I’m following the documentation here, but with Ubuntu commands.
First we need to install NTP:
apt-get update
apt-get install ntp
Now we enable NTP:
update-rc.d ntp defaults
update-rc.d ntp enable
/etc/init.d/ntp start
Fully Qualified Domain Name
According to the documentation your hostname needs to be equal to the FQDN, or Fully Qualified Domain Name. Here is how I did that. Of course you need to fill in the hostname and IP of your master.
MYHOST=ip-100-10-43-237.eu-west-1.compute.internal
MYIP="100.10.43.237"
hostname $MYHOST
echo $MYHOST > /etc/hostname
echo $MYIP $MYHOST >> /etc/hosts
Because the right settings here are rather important, it’s best to check:
hostname
cat /etc/hostname
cat /etc/hosts
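The checks can also be scripted. Here is a sketch that dry-runs against stand-in files under /tmp (the real files are /etc/hostname and /etc/hosts; MYHOST and MYIP are the example values from above):

```shell
# dry-run of the FQDN consistency check against stand-in files
MYHOST=ip-100-10-43-237.eu-west-1.compute.internal
MYIP=100.10.43.237
HOSTNAME_FILE=/tmp/hostname.demo   # stand-in for /etc/hostname
HOSTS_FILE=/tmp/hosts.demo         # stand-in for /etc/hosts

echo "$MYHOST" > "$HOSTNAME_FILE"
echo "$MYIP $MYHOST" > "$HOSTS_FILE"

# the hostname file must contain exactly the FQDN,
# and the hosts file must map the IP to that FQDN
grep -qx "$MYHOST" "$HOSTNAME_FILE" && grep -q "$MYIP $MYHOST" "$HOSTS_FILE" \
  && echo "FQDN configuration consistent"
```

Run against the real /etc files on the master node, this tells you in one line whether the three manual checks would pass.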
Turning off security
At this point the documentation tells me to turn off iptables, the Linux firewall software. But Ubuntu Server 16.04 uses a different firewall product, ufw. If you query its status, you’ll find it’s turned off already:
# sudo ufw status
Status: inactive
SELinux also has to be turned off. SELinux stands for Security-Enhanced Linux; it’s a kernel module with which you can control what software is allowed to do what. (On Ubuntu, SELinux is usually not installed at all, since Ubuntu ships AppArmor instead; in that case there is nothing to disable.) You turn it off with this command:
setenforce 0
Bit of a shame, because I recently learned SELinux is actually not that hard to maintain. But believe me, for now it’s better to follow the HDP documentation to the letter, because there’s so much that can go wrong.
There’s one more command to run, to set the umask:
umask 0022
Checking the setting is easy. This command should return 0022.
umask
And then let’s put the umask in the standard profile.
echo umask 0022 >> /etc/profile
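What umask 0022 buys you: newly created files come out 644 (owner-writable, world-readable) and new directories 755. A quick demonstration:

```shell
# demonstrate the effect of umask 0022 on newly created files and directories
umask 0022
rm -rf /tmp/umask-demo
mkdir /tmp/umask-demo            # directories: 777 masked by 022 -> 755
touch /tmp/umask-demo/file       # files: 666 masked by 022 -> 644
stat -c '%a %n' /tmp/umask-demo /tmp/umask-demo/file
```

That default matters for Hadoop: a stricter umask would create config and log files that other service accounts cannot read.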
Installing necessary software
We’re going to need some software. Here is how I checked whether everything is available:
sudo apt-get update
apt list --installed curl
apt list --installed scp
apt list --installed unzip
apt list --installed openssl
apt list --installed tar
apt list --installed wget
apt list --installed python
apt list --installed openjdk-8-jdk
JDK 7 is also supported. On my system unzip was missing; to install it, run this command:
sudo apt install unzip
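Instead of listing the packages one by one, you could loop over the required tools and flag the missing ones. A sketch (this checks the PATH for each command rather than querying the dpkg database, so it is a quick approximation):

```shell
# report which of the required command-line tools are not on the PATH
for cmd in curl scp unzip openssl tar wget python; do
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "ok:      $cmd"
    else
        echo "missing: $cmd"
    fi
done
```

Anything flagged missing can then be installed with sudo apt install, as shown above for unzip.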
Couple of checks
I wrote these commands down, but I can’t seem to find them in the documentation anymore. The default values were okay, so it’s not a big issue.
This command checks free memory:
# free -m
              total        used        free      shared  buff/cache   available
Mem:           7061         126        6114          16         821        6642
Swap:             0           0           0
Maximum number of open files:
# ulimit -Sn
1024
# ulimit -Hn
4096
Removing the Elastic IP and security group
Alright, we have done everything necessary on the command line. Now it’s time to remove what we no longer need: the Elastic IP and the public security group.
Next go to the Change Security Groups option in the instances overview.
Remove the public security group.
Don’t forget to test whether you can now log in without a password from the edge node to this master node.
Bring in the clones
Time to clone us some master nodes. I’ve used the Launch More Like This option for this.
Immediately the review window pops up, but we have some changes to make.
So click Previous. Here I’ve already entered a name to which I will add numbers for different nodes later on.
Let’s go back by clicking Previous a couple more times until you arrive at Step 3: Configure Instance Details. Here we change the number of instances to 2, which means we’re going to create 2 clones instead of one.
Now click Review and Launch. You have to choose the same key pair as the first master.
And before you know it, you have 3 master nodes. Let’s change their tags right away, so we can distinguish them from each other.
Test passwordless login, again
If I’m not very much mistaken, you can now log in without a password from the edge node to your new master nodes.
Conclusion
It’s quite a bit of work to create these master nodes, but follow each step and the registration of your master (and worker) nodes in Ambari will go more smoothly.
If you have any remarks about my process here, let me know in the comments.
Next post we’re going to create worker nodes. Luckily this process will look very much like what we did to create master nodes.
Where can I find the remaining parts of this tutorial?
I have the notes on them, but I don’t have the environment anymore. So I will put the next parts of the tutorial online, but I can’t check certain things anymore.
Thanks Marcel, Even if you can put your notes detailing the steps (without screenshots of AWS) online would be helpful.