Friday, December 27, 2013

Setting up multi-node Hadoop cluster on Mac


On Mac (OS X Lion 10.7.5)
  1. Download
    1. Download VMWare Fusion
    2. Install it. You may need to buy a license after the trial period.
    3. Download the Ubuntu Desktop 12.04 LTS image to a local folder

  2. Launch
    1. Common for all the four nodes below
      1. We will create four nodes to simulate a realistic multi-node cluster: an edge node, a name node and two data nodes. The edge node hosts Cloudera Manager (to install Hadoop on the cluster) and Eclipse (to develop code), and is where you submit jobs. The name node runs the namenode, secondary namenode and job tracker services. The data nodes run the datanode and task tracker services.
      2. Open VMWare Fusion and create four virtual machines (VMs) using the Ubuntu image that you downloaded earlier: click on "Add" >> "New" >> "Install from disc or image" >> "Continue" >> "Use another disc or disc image" >> point to the downloaded Ubuntu image file >> "Customize" based on the memory/processors you have available on your Mac, and then name each virtual machine according to its usage. (You will need to remember the user ID and password that you provide here.)
      3. Launch each of the machines
      4. Log in to the machine
      5. Click on "Dash" >> search for "Terminal" >> Open Terminal
      6. sudo apt-get install openssh-server (to accept SSH connections)
      7. ssh-keygen -t rsa -P "" (press Enter at the prompt to accept the default key file location)
      8. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      9. chmod 600 ~/.ssh/authorized_keys (so that only your user can read or modify the list of accepted keys)
      10. Run the "ifconfig" command and note down the IP address of the machine
      11. sudo vi /etc/hostname (replace "ubuntu" with the new VM name, e.g. "edge", "nn1", "dn1" or "dn2")
      12. sudo hostname <VM_Name> (e.g. sudo hostname edge) to apply the new host name right away
      13. sudo vi /etc/hosts (comment out the lines for "localhost" and "ubuntu", then add a line for each of the four VMs in the form "IPaddress   Machine_Name")
      14. sudo vi /etc/sudoers (add a line at the bottom "<user_id>  ALL=(ALL) NOPASSWD: ALL") to give <user_id> root privileges without a password prompt
      15. Set time and timezone
        1. sudo apt-get install ntp
        2. sudo dpkg-reconfigure tzdata
      16. Restart the machine (click on the power button at the top right >> "shutdown" >> "restart")
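The /etc/hosts step above needs one "IPaddress   Machine_Name" line per VM, identical on all four machines. Here is a rough sketch of generating those lines instead of typing them four times. The IP addresses are placeholders I made up (use the ones ifconfig reported on your VMs), and hosts_entries is just a helper name invented for this example.

```shell
#!/bin/sh
# Sketch: build the /etc/hosts lines for the four VMs.
# The IP addresses below are placeholders (assumptions); replace them with
# the addresses reported by ifconfig on each VM.

# Print one "IPaddress<TAB>Machine_Name" line per "name=ip" argument.
hosts_entries() {
    for pair in "$@"; do
        name=${pair%%=*}   # text before the first "="
        ip=${pair#*=}      # text after the first "="
        printf '%s\t%s\n' "$ip" "$name"
    done
}

# Print the entries for review; nothing is written to /etc/hosts here.
hosts_entries edge=192.168.1.101 nn1=192.168.1.102 dn1=192.168.1.103 dn2=192.168.1.104
```

Once the output looks right, it can be appended on each VM by piping the same call into "sudo tee -a /etc/hosts".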
    2. Edge node
      1. cat ~/.ssh/id_rsa.pub
      2. Highlight and copy the contents from the cat command above
      3. On each of the other three nodes, run vi ~/.ssh/authorized_keys
      4. Paste the copied contents at the end of the above file & save
      5. SSH to all four machines, including the edge node itself, a couple of times to make sure you are not prompted for anything. The first time you may need to type "yes" to accept the host key (e.g. ssh nn1, ssh dn1, ssh dn2)
      6. Download the Cloudera Manager installer & run it per the instructions at the same link
      7. cd to the download directory
      8. chmod +x cloudera-manager-installer.bin
      9. sudo ./cloudera-manager-installer.bin
      10. Follow the instructions
      11. Open a browser and go to http://localhost:7180
      12. Log in with user name "admin" and password "admin"
      13. Start the install; use the same user ID that you chose when you created the VMs
      14. Enter the IP addresses of all four nodes (including the edge node itself)
      15. Continue the installation.
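After the installer finishes, the web UI at http://localhost:7180 may take a little while to come up. A small sketch of polling it from the terminal before opening the browser is below. It assumes curl is installed on the edge node (it may need "sudo apt-get install curl" first); wait_for_url is a hypothetical helper, and the retry count and delay are arbitrary choices of mine.

```shell
#!/bin/sh
# Sketch: poll a web UI until it responds.
# wait_for_url is a hypothetical helper; the defaults (30 tries, 10 s apart)
# are arbitrary and not taken from any Cloudera documentation.

# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Returns 0 as soon as the URL answers, 1 if all attempts fail.
wait_for_url() {
    url=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # -s: quiet, -o /dev/null: discard the page,
        # --max-time: give up on a connection that hangs
        if curl -s -o /dev/null --max-time 5 "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example call on the edge node:
# wait_for_url http://localhost:7180 && echo "Cloudera Manager is up"
```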
    3. Name node
      1. cat ~/.ssh/id_rsa.pub
      2. Highlight and copy the contents from the cat command above
      3. On each of the other three nodes, run vi ~/.ssh/authorized_keys
      4. Paste the copied contents at the end of the above file & save
      5. SSH to all four machines, including the name node itself, a couple of times to make sure you are not prompted for anything. The first time you may need to type "yes" to accept the host key (e.g. ssh nn1, ssh dn1, ssh dn2)
    4. Data node 1
      1. cat ~/.ssh/id_rsa.pub
      2. Highlight and copy the contents from the cat command above
      3. On each of the other three nodes, run vi ~/.ssh/authorized_keys
      4. Paste the copied contents at the end of the above file & save
      5. SSH to all four machines, including data node 1 itself, a couple of times to make sure you are not prompted for anything. The first time you may need to type "yes" to accept the host key (e.g. ssh nn1, ssh dn1, ssh dn2)
    5. Data node 2
      1. cat ~/.ssh/id_rsa.pub
      2. Highlight and copy the contents from the cat command above
      3. On each of the other three nodes, run vi ~/.ssh/authorized_keys
      4. Paste the copied contents at the end of the above file & save
      5. SSH to all four machines, including data node 2 itself, a couple of times to make sure you are not prompted for anything. The first time you may need to type "yes" to accept the host key (e.g. ssh nn1, ssh dn1, ssh dn2)
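The copy/paste key exchange above is the same on every node, so it can be sketched as a small script run once per node. Note that this swaps the manual paste into authorized_keys for ssh-copy-id, which is not what the steps above use. It assumes the host names edge, nn1, dn1 and dn2 already resolve via /etc/hosts and that ~/.ssh/id_rsa.pub exists. DRY_RUN and the function names are made up for this example; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run on each node to exchange keys with all four nodes.
# Assumes the host names below resolve via /etc/hosts and the same user ID
# exists on every VM. DRY_RUN is a hypothetical switch added for this
# example; with DRY_RUN=1 (the default) commands are printed, not executed.

NODES="edge nn1 dn1 dn2"
DRY_RUN=${DRY_RUN:-1}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Append this node's public key to authorized_keys on every node.
# ssh-copy-id stands in for the manual copy/paste; it asks for the
# account password once per host.
distribute_key() {
    for node in $NODES; do
        run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "$node"
    done
}

# BatchMode=yes makes ssh fail instead of prompting, so a failure here
# means passwordless login to that node is not working yet.
verify_logins() {
    for node in $NODES; do
        run ssh -o BatchMode=yes "$node" true ||
            echo "passwordless login to $node failed" >&2
    done
}

distribute_key
verify_logins
```

Running it with DRY_RUN=0 set in the environment executes the commands for real.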
