Friday, December 27, 2013

Setting up multi-node Hadoop cluster on Mac


On Mac (Lion OS 10.7.5)
  1. Download
    1. Download VMware Fusion
    2. Install it. You may need to buy a license after the trial period.
    3. Download the Ubuntu Desktop 12.04 LTS image to a local folder

  2. Launch
    1. Common for all the four nodes below
      1. We will create the four nodes below to simulate a typical multi-node cluster. The edge node runs Cloudera Manager (to install Hadoop on the cluster) and Eclipse (to develop code, submit jobs, etc.). The name node runs the namenode, secondary namenode, and job tracker services. The data nodes run the datanode and task tracker services.
      2. Open VMware Fusion and create four virtual machines (VMs) from the Ubuntu image you downloaded earlier: click "Add" >> "New" >> "Install from disc or image" >> "Continue" >> "Use another disc or disc image" >> point to the downloaded Ubuntu image file >> "Customize" based on the memory/processors available on your Mac, then name each virtual machine according to its role. (You will need to remember the user ID and password that you provide here.)
      3. Launch each of the machines
      4. Login to the machine
      5. Click on "Dash" >> search for "Terminal" >> Open Terminal
      6. sudo apt-get install openssh-server (to accept SSH connections)
      7. ssh-keygen -t rsa -P "" (hit Enter when prompted for the file in which to save the key)
      8. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      9. chmod 600 ~/.ssh/authorized_keys (it is authorized_keys, not the public key, that needs restricted permissions; ssh-keygen already creates the private key with mode 600)
      10. Run "ifconfig" command and note down the ip address for each of the machines
      11. sudo vi /etc/hostname (replace "ubuntu" with new VM name e.g. "edge" or "nn1" or "dn1" or "dn2")
      12. sudo hostname <VM_Name> (e.g. sudo hostname edge) to change the VM name
      13. sudo vi /etc/hosts (Comment out the lines for "localhost" and "ubuntu" then add a line for each of the VMs "IPaddress   Machine_Name")
      14. sudo vi /etc/sudoers (add a line at the bottom: "<user_id>  ALL=(ALL) NOPASSWD: ALL") to grant passwordless root privileges to <user_id>
      15. Set time and timezone
        1. sudo apt-get install ntp
        2. sudo dpkg-reconfigure tzdata
      16. Restart the machine (click the power button at top right >> "Shut Down" >> "Restart")
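The common per-node steps above can be collected into a single script. This is a sketch, assuming the node names used in this walkthrough (edge, nn1, dn1, dn2); the IP addresses below are placeholders you must replace with the values reported by ifconfig on each VM:

```shell
#!/bin/bash
# Common prep for every VM; run once per node after first login.
# NODE_NAME is a placeholder -- set it to edge, nn1, dn1, or dn2.
NODE_NAME="edge"

# Accept SSH connections and create a passphrase-less key pair
sudo apt-get install -y openssh-server
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Rename the machine
echo "$NODE_NAME" | sudo tee /etc/hostname
sudo hostname "$NODE_NAME"

# Register all four nodes in /etc/hosts (placeholder IPs -- use the
# addresses that ifconfig reported on each VM)
sudo tee -a /etc/hosts <<'EOF'
192.168.0.1  edge
192.168.0.2  nn1
192.168.0.3  dn1
192.168.0.4  dn2
EOF

# Keep clocks in sync and set the timezone (Hadoop daemons are
# sensitive to clock skew between nodes)
sudo apt-get install -y ntp
sudo dpkg-reconfigure tzdata
```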
    2. Edge node
      1. cat ~/.ssh/id_rsa.pub
      2. Highlight and copy the contents from the cat command above
      3. Go to each of the other three nodes, vi ~/.ssh/authorized_keys
      4. Paste the copied contents at the end of the above file & save
      5. SSH to all four machines, including this one, a couple of times to make sure you are not prompted for anything. The first time you may need to type "yes" at the host-key prompt (e.g. ssh edge, ssh nn1, ssh dn1, ssh dn2)
      6. Download Cloudera Manager and run it per the instructions on the download page
      7. cd to download dir
      8. chmod +x cloudera-manager-installer.bin
      9. sudo ./cloudera-manager-installer.bin
      10. Follow the instructions
      11. Open a browser and go to http://localhost:7180
      12. Log in with username "admin" and password "admin"
      13. Start the install; use the user ID that you created when setting up the VMs
      14. Enter the IP addresses of all four nodes (including the edge node itself)
      15. Continue the installation.
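Once the installer finishes, a quick sanity check helps before moving on. A sketch; cloudera-scm-server is the standard service name for the Cloudera Manager server daemon, and 7180 is the admin UI port used above:

```shell
# Verify the Cloudera Manager server came up after the install
sudo service cloudera-scm-server status

# The admin UI should answer on port 7180; any HTTP response
# (200, or a redirect to the login page) means it is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:7180
```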
    3. Name node, Data node 1, and Data node 2 (repeat the following on each)
      1. cat ~/.ssh/id_rsa.pub
      2. Highlight and copy the output of the cat command above
      3. On each of the other three nodes, vi ~/.ssh/authorized_keys
      4. Paste the copied contents at the end of that file and save
      5. SSH to all four machines, including this one, a couple of times to make sure you are not prompted for anything. The first time you may need to type "yes" at the host-key prompt (e.g. ssh edge, ssh nn1, ssh dn1, ssh dn2)
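The manual copy/paste of public keys in the node sections above can also be done with ssh-copy-id, which appends the local public key to the remote authorized_keys for you. A sketch, assuming the hostnames chosen earlier; run it on each of the four nodes:

```shell
# Push this node's public key to every node (including itself),
# then confirm passwordless login works
for host in edge nn1 dn1 dn2; do
    ssh-copy-id "$host"                # asks for the password once per host
    ssh -o BatchMode=yes "$host" true && echo "passwordless ssh to $host OK"
done
```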

Thursday, December 19, 2013

Model thinking

Course by Scott E. Page. Models help solve problems by making abstract or complex things concrete, into something you can tweak and play around with. With models you can introduce, one by one, the various parameters that influence the result and analyze each. Models help explain why things happened one way (why the rich get richer) or help in coming up with equations/rules so you can easily solve for (predict) what is going to happen next when parameters change.

Productivity is highest during wars. Why? People are focused and produce more. A manager should introduce problems periodically so that the team produces more and becomes more resilient (compare controlled forest fires).

Some problems become very easy to solve once you change the way you represent them (e.g. Cartesian versus polar coordinates, or the sum-to-15 puzzle, which becomes tic-tac-toe once the numbers are arranged in a magic square).

Many models are better than one: when a single model gets stuck, another may still make progress.

Do higher-level work and delegate lower-level work to machines; that technology advantage is what sustains the growth rate. New innovation (new skills) is the way to keep increasing your salary (output). Typically salary rises rapidly in the early years and then flattens out (just as a country's high growth peters out after a while unless it innovates).

Data points are given. Now find some insight in them. Build models. Predict what is going to happen next.

Saturday, March 9, 2013

Startup = Growth

Startup = Growth:
Are you growing consistently every week? If not, you are not a startup.

'via Blog this'

How to Get Startup Ideas

How to Get Startup Ideas: "When you have an idea for a startup, ask yourself: who wants this right now? Who wants this so much that they'll use it even when it's a crappy version one made by a two-person startup they've never heard of? If you can't answer that, the idea is probably bad."


Monday, January 21, 2013

How to Spot the Five Tool Superstar | LinkedIn

How to Spot the Five Tool Superstar | LinkedIn:


Public Data Sets : Amazon Web Services

Public Data Sets : Amazon Web Services:
Infochimps, Lexis Nexis, Genome, Census


Freebase - Wikipedia, the free encyclopedia

Freebase - Wikipedia, the free encyclopedia:


Web 3.0 - Wikipedia, the free encyclopedia

Web 3.0 - Wikipedia, the free encyclopedia:
Semantic Web and personalization; a first-generation Metaverse (convergence of the virtual and physical worlds); a web development layer that includes TV-quality open video, 3D simulations, augmented reality, human-constructed semantic standards, and pervasive broadband, wireless, and sensors. Web 3.0's early geosocial (Foursquare, etc.) and augmented reality (Layar, etc.) webs are an extension of Web 2.0's participatory technologies and social networks (Facebook, etc.) into 3D space.


Mashup (web application hybrid) - Wikipedia, the free encyclopedia

Mashup (web application hybrid) - Wikipedia, the free encyclopedia:


Thursday, January 17, 2013

Big Data Means Big IT Job Opportunities -- for the Right People - CIO.com

Big Data Means Big IT Job Opportunities -- for the Right People - CIO.com:
data scientist, data architect, data visualizer and data change agent.
math, statistics, data analysis, business analytics and even natural language processing.
require knowledge of programming and the ability to develop applications, as well as an understanding of how to meet business needs.
"They have to take ideas from one field and apply them to another field, and they have to be comfortable with ambiguity."
In short, big data folks seem to be jacks of all trades and masters of none, and their greatest skill may be the ability to serve as the "glue" in an organization. "You can take someone who maybe is not the world's greatest software engineer [nor] the world's greatest statistician, but they have the communications skills to talk to people on both sides" as well as to the marketing team and C-level executives.

The 3 Puzzle Pieces That Shape Your Career Path | LinkedIn

The 3 Puzzle Pieces That Shape Your Career Path | LinkedIn:

Career: Your assets (skills, network, cash balance), Aspirations and Values (thinking big), Market realities (what does market want) -- Reid Hoffman


Tuesday, January 15, 2013

Apply BigData to the Science of Management | LinkedIn

Apply BigData to the Science of Management | LinkedIn:


Man Busts Out of Google, Rebuilds Top-Secret Query Machine | Wired Enterprise | Wired.com

Man Busts Out of Google, Rebuilds Top-Secret Query Machine | Wired Enterprise | Wired.com:


Amazon.com: The Start-up of You: Adapt to the Future, Invest in Yourself, and Transform Your Career (9780307888907): Reid Hoffman, Ben Casnocha: Books

Amazon.com: The Start-up of You: Adapt to the Future, Invest in Yourself, and Transform Your Career (9780307888907): Reid Hoffman, Ben Casnocha: Books: "Being an entrepreneur isn’t really about starting a business. It’s a way of looking at the world: seeing opportunity where others see obstacles, taking risks when others take refuge."


Breakout Opportunities Are What Accelerates Your Career | LinkedIn

Breakout Opportunities Are What Accelerates Your Career | LinkedIn:



Using Cloudera’s Hadoop AMIs to process EBS datasets on EC2 | Apache Hadoop for the Enterprise | Cloudera

Using Cloudera’s Hadoop AMIs to process EBS datasets on EC2 | Apache Hadoop for the Enterprise | Cloudera:
