Running Mizan on EC2

Introduction

If you want to try Mizan on EC2, you can use our pre-installed copy of Mizan, which is distributed as an Amazon Machine Image (AMI). We provide an AMI for each region listed below:

Region  AMI ID
US East (N. Virginia) ami-52ed743b

This tutorial shows how to run Mizan on both a single EC2 machine and a cluster of EC2 machines with minimal configuration, using our pre-configured Mizan AMIs. The EC2 AMI might contain an old version of Mizan, so we recommend downloading and compiling the latest copy of Mizan from: https://code.google.com/p/mizan-graph-bsp/downloads/list

Sign up/Sign in to Amazon AWS

You should have an account with Amazon AWS to access Amazon’s cloud resources. If you don’t have one, sign up with a valid credit card through the following link: https://aws.amazon.com/

Once you sign up, sign in to your “AWS Management Console” and then select EC2 as shown below.

AWS Console

EC2 Service

Selecting EC2 Region

You need to select an EC2 region, which controls the physical location of your EC2 instances, from the AWS console. This tutorial uses the “US East (N. Virginia)” region; select the region you would like to use from the upper right corner of your AWS console as shown below:

EC2 Region

Setting up Security Group

Once you select your EC2 region, you need to create a new security group that opens the network ports required by both Mizan and Hadoop. From the left side of the AWS console, click on “Security Groups” as shown below:

Security Group

Create a new security group by clicking “Create Security Group”, and name it “MizanSG” as shown below:

Create Security Group

MizanSG

With the “MizanSG” security group selected, click on “Inbound” as shown below:

MizanSG Inbound

Add the following rules to “MizanSG” security group, then click “Apply Rule Changes”:

Create a New Rule   Port range   Source       Comment
Custom TCP rule     22           0.0.0.0/0    Enable SSH from local and global machines
ALL TCP             (all)        10.0.0.0/8   Enable TCP ports from local machines within EC2
ALL UDP             (all)        10.0.0.0/8   Enable UDP ports from local machines within EC2
Custom TCP rule     50030        0.0.0.0/0    Enable Hadoop Map/Reduce Administration to local and global machines
Custom TCP rule     50070        0.0.0.0/0    Enable Hadoop NameNode Administration to local and global machines
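
For readers who prefer the command line, the same rules can probably be expressed with the AWS CLI. This is only a sketch and not part of the original console-based walkthrough: it assumes the `aws` tool is installed and configured, and that the account accepts `--group-name` (EC2-Classic or a default VPC). The function name `print_mizan_sg_commands` is made up for this example, and it only prints the commands so they can be reviewed before anything is changed.

```shell
# Sketch: print AWS CLI commands equivalent to the MizanSG rules above.
# Assumption: the aws CLI is installed and configured; nothing is executed here.
print_mizan_sg_commands() (
    group="${1:-MizanSG}"
    echo "aws ec2 create-security-group --group-name $group --description \"Mizan and Hadoop ports\""
    # Each input line below is: protocol, port range, source CIDR.
    while read -r proto ports cidr; do
        echo "aws ec2 authorize-security-group-ingress --group-name $group --protocol $proto --port $ports --cidr $cidr"
    done <<EOF
tcp 22 0.0.0.0/0
tcp 0-65535 10.0.0.0/8
udp 0-65535 10.0.0.0/8
tcp 50030 0.0.0.0/0
tcp 50070 0.0.0.0/0
EOF
)
```

Running `print_mizan_sg_commands` shows six commands; once they look right, piping the output to `sh` would apply them.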


Launching an EC2 Instance

Once your security group is ready, you can launch a new EC2 instance from one of the available Mizan AMIs. Click on “Launch Instance” as shown below:

Launch EC2

Then select the classic wizard.

Select the “Community AMIs” tab in the “Request Instances Wizard” window. Choose “Public Images” from the “Viewing” list and search for either “Mizan” or the AMI ID that matches your region (listed in the table at the beginning of this post):

Mizan AMI

Once you find Mizan’s AMI, select it and continue with the wizard to select the instance type, count, availability zone, storage configuration and key pair. Make sure to select “MizanSG” security group before finishing the wizard. Don’t forget to stop or terminate your new EC2 instance after you finish.

Security Group

Running Mizan on a Single EC2 Instance

You need to log in to your EC2 instance remotely using SSH. First, get the public DNS name of your EC2 instance by clicking on it as shown below:

Starting Mizan

Connect to your instance using the command:

ssh -i {Path to your Key} ubuntu@{EC2 instance public DNS name}
#example: ssh -i khayyat.pem ubuntu@ec2-50-19-40-128.compute-1.amazonaws.com

After you access your instance, make sure to set a hostname for it. To change your hostname, open file “/etc/hostname”:

sudo nano /etc/hostname

Then write your new hostname. Save the file, reboot the instance and log in again. We assume that the new hostname is “cloud1”. Now you need to configure Hadoop with the new hostname. Open the “masters” file:

nano ~/hadoop-1.0.4/conf/masters

Replace the content of “masters” with the hostname of your EC2 instance and save the file. Then open the “slaves” file:

nano ~/hadoop-1.0.4/conf/slaves

Replace the content of “slaves” with the hostname of your EC2 instance and save the file. Then open the file “core-site.xml” by running the command:

nano ~/hadoop-1.0.4/conf/core-site.xml

Add the following lines to the file and save it; make sure to use your correct hostname:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/ubuntu/hadoop_data</value>
        </property>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://cloud1:54310</value>
        </property>
</configuration>

Now, open file “mapred-site.xml” with the command:

nano ~/hadoop-1.0.4/conf/mapred-site.xml

Add the following lines to the file and save it; make sure to use your correct hostname:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 
<!-- Put site-specific property overrides in this file. -->
 
<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>cloud1:54311</value>
        </property>
</configuration>
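
The two files above differ from one installation to the next only in the hostname, so they could also be generated instead of edited by hand. The following is a sketch under that assumption; the function name `write_hadoop_site_files` and its arguments are made up for this example, while the property names, ports and the temporary directory are copied from the listings above.

```shell
# Sketch: write core-site.xml and mapred-site.xml for a given master hostname.
# Property names, ports and hadoop.tmp.dir are taken from the listings above;
# the function name and its arguments are hypothetical.
write_hadoop_site_files() (
    master="$1"                                  # e.g. cloud1
    conf_dir="${2:-$HOME/hadoop-1.0.4/conf}"     # Hadoop conf directory used in this tutorial
    header='<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>'

    cat > "$conf_dir/core-site.xml" <<EOF
$header
<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/ubuntu/hadoop_data</value>
        </property>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://$master:54310</value>
        </property>
</configuration>
EOF

    cat > "$conf_dir/mapred-site.xml" <<EOF
$header
<configuration>
        <property>
                <name>mapred.job.tracker</name>
                <value>$master:54311</value>
        </property>
</configuration>
EOF
)
```

For the hostname used in this tutorial the call would be `write_hadoop_site_files cloud1`; note that it overwrites both files.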

You need to include the correct IP of your instance in “/etc/hosts”. First find out your local IP with the command:

ifconfig

The IP should be in the format “10.x.x.x”. Now let’s put the correct IP in “/etc/hosts”. Open the file with the following command:

sudo nano /etc/hosts

Then add the private IP and hostname of your instance at the end of the file in the following format:

{your Private IP}    cloud1
#Example: 10.203.21.65      cloud1
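
If this step is repeated (for example after a reboot), it is easy to end up with duplicate entries. A small helper along the following lines could add the entry only when the hostname is not listed yet. It is a sketch, not something shipped with the AMI; the name `add_hosts_entry` is made up for this example.

```shell
# Sketch: add "IP hostname" to a hosts file unless the hostname is already listed.
# The function name is hypothetical; the file argument defaults to /etc/hosts.
add_hosts_entry() (
    ip="$1"; name="$2"; file="${3:-/etc/hosts}"
    # Look for the hostname in any non-comment line (field 2 onwards).
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
        echo "$name already present in $file"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
    fi
)
```

Writing to “/etc/hosts” requires root, so on the instance the function would have to be defined and called from a root shell, for example `add_hosts_entry 10.203.21.65 cloud1`.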

Now, let’s format Hadoop’s namenode:

hadoop namenode -format

We are now done with Hadoop. You can find Mizan in the home directory of the user “ubuntu”. All the required packages are already installed and configured. Let’s start Hadoop and partition the sample graph “web-Google.txt”, which is located in Mizan’s directory:

start-all.sh
cd ~/Mizan-0.1b/preMizan
./preMizan.sh ./exampleGraphs/web-Google.txt 2

You will get the following message on the terminal. Type “1” into your terminal to select Hash-based graph partitioning:

Select your partitioning type:
   1) Hash Graph Partitioning
   2) Range Graph Partitioning

Now, let’s run PageRank on the “web-Google.txt” graph with two workers, given that web-Google has been pre-partitioned using hash-based partitioning and the Linux username is “ubuntu”:

cd ../Release
make clean; make all
mpirun -np 2 ./Mizan-0.1b -u ubuntu -g web-Google.txt -w 2
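
In the example above the number 2 appears three times: as the partition count given to preMizan.sh, as mpirun’s “-np”, and as Mizan’s “-w”. Assuming these are meant to carry the same worker count (the tutorial suggests this but does not state it outright), a small helper can print a consistent set of commands for a different count. The function name `print_mizan_run` is made up for this example.

```shell
# Sketch: print the partition-and-run commands for a given worker count.
# Assumption (taken from the example above, not confirmed by Mizan's docs):
# the preMizan.sh partition count, mpirun -np and Mizan -w are the same number.
print_mizan_run() (
    graph="$1"; workers="$2"; user="${3:-ubuntu}"
    echo "cd ~/Mizan-0.1b/preMizan"
    echo "./preMizan.sh ./exampleGraphs/$graph $workers"
    echo "cd ../Release"
    echo "mpirun -np $workers ./Mizan-0.1b -u $user -g $graph -w $workers"
)
```

For instance, `print_mizan_run web-Google.txt 4` prints the same sequence with four partitions and four workers. The commands are only printed because preMizan.sh asks for the partitioning type interactively.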

Running Mizan on a Cluster of EC2 Instances

To run Mizan on a cluster of EC2 instances, you need to start multiple instances of the same AMI. First, complete the steps in “Running Mizan on a Single EC2 Instance” and assign a representative hostname to that instance (we use “cloud1”). After that, right-click your “cloud1” instance and choose “Launch More Like This” to start additional instances from the same AMI.

Launch More Like This

You need to change the hostname of each new instance, note its private IP, and reboot it. After you fix the hostnames of your new EC2 instances, go to your first EC2 instance “cloud1” and add the hostnames and IPs of all your new EC2 instances to the file “/etc/hosts”. First open the “hosts” file with the command:

sudo nano /etc/hosts

Then add the IPs of all instances in your cluster at the end of the file in the following format:

{your Private IP}    cloud1
{your Private IP}    cloud2
{your Private IP}    cloud3
.
.
.
{your Private IP}    cloudX
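
With more than a few instances, typing these lines by hand invites mistakes. As a sketch, the entries can be generated from the list of private IPs, numbering the hostnames from cloud1 in the order the IPs are given. The function name `cluster_hosts_entries` is made up for this example.

```shell
# Sketch: print one "IP cloudN" line per private IP, numbering from cloud1.
# The function name is hypothetical; IPs must be passed in cluster order.
cluster_hosts_entries() (
    n=1
    for ip in "$@"; do
        printf '%s\tcloud%d\n' "$ip" "$n"
        n=$((n + 1))
    done
)
```

The output can be reviewed and then appended, for example with `cluster_hosts_entries 10.203.21.65 10.203.21.66 | sudo tee -a /etc/hosts` (both addresses here are placeholders; use the private IPs of your own instances).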

Then open Hadoop’s “slaves” file:

nano ~/hadoop-1.0.4/conf/slaves

Add the hostnames of all your EC2 instances; we assume the hostnames follow the form “cloudX”:

cloud1
cloud2
cloud3
.
.
.
cloudX
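
Since the hostnames follow a fixed pattern, the list can also be generated rather than typed. The sketch below assumes the “cloudX” naming used in this tutorial; the function name `cluster_hostnames` is made up for this example.

```shell
# Sketch: print cloud1 .. cloudN, one hostname per line.
# The function name is hypothetical; the prefix defaults to the tutorial's "cloud".
cluster_hostnames() (
    count="$1"; prefix="${2:-cloud}"
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$prefix$i"
        i=$((i + 1))
    done
)
```

For an eight-instance cluster, `cluster_hostnames 8 > ~/hadoop-1.0.4/conf/slaves` would replace the content of the “slaves” file with cloud1 through cloud8.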

Now you need to copy Hadoop’s configuration files to all other workers by running the following commands; we assume the user has 8 EC2 instances from “cloud1” up to “cloud8”:

for i in {2..8}; do scp ~/hadoop-1.0.4/conf/mapred-site.xml ~/hadoop-1.0.4/conf/core-site.xml ~/hadoop-1.0.4/conf/masters cloud$i:~/hadoop-1.0.4/conf;done
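
The loop above hard-codes the range 2..8. As an alternative sketch, the target hosts can be read from the “slaves” file so the command keeps working when the cluster size changes. The function name `copy_hadoop_conf` and the `COPY` variable are made up for this example; setting `COPY=echo` turns the call into a dry run that only prints what would be copied.

```shell
# Sketch: copy the Hadoop config files to every host in the slaves file
# except the master. COPY defaults to scp; set COPY=echo for a dry run.
# The function name and the COPY variable are hypothetical.
copy_hadoop_conf() (
    master="${1:-cloud1}"
    conf="${2:-$HOME/hadoop-1.0.4/conf}"
    while read -r host; do
        # Skip blank lines and the master itself.
        [ -z "$host" ] && continue
        [ "$host" = "$master" ] && continue
        # stdin is redirected so the copy command cannot consume the host list.
        "${COPY:-scp}" "$conf/mapred-site.xml" "$conf/core-site.xml" "$conf/masters" \
            "$host:~/hadoop-1.0.4/conf" < /dev/null
    done < "$conf/slaves"
)
```

A cautious first run would be `COPY=echo; copy_hadoop_conf cloud1`, followed by `unset COPY; copy_hadoop_conf cloud1` once the printed targets look right.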

After that, all of the EC2 instances need a copy of the hosts file. You can do that by copying “/etc/hosts” to all other machines as root with the following commands:

sudo su - root
for i in {2..8}; do scp /etc/hosts cloud$i:/etc/;done
exit

When you return to user “ubuntu”, start Hadoop on all machines, then go to Mizan’s Release folder and create a machines file for MPICH2:

start-all.sh
cd ~/Mizan-0.1b/Release/
nano machines

Add all hostnames for your EC2 instances in the file:

cloud1
cloud2
cloud3
.
.
.
cloudX

Now copy Mizan’s binary to the other workers; you need to do this after each recompilation of Mizan:

for i in {2..8}; do scp ~/Mizan-0.1b/Release/Mizan-0.1b cloud$i:~/Mizan-0.1b/Release/;done

Now execute the following command to run Mizan in a distributed environment; note the use of “machines” in the MPI parameters:

cd ../Release
mpirun -f machines -np 2 ./Mizan-0.1b -u ubuntu -g web-Google.txt -w 2