Hadoop Single-Node Cluster Installation
Minh Tran – Software Architect
05/2013
Prerequisites
• Ubuntu Server 10.04 (Lucid Lynx)
• JDK 6u34 Linux
• Hadoop 1.0.4
• VMWare Player / VMWare Workstation /
VMWare Server
• Ubuntu Server VMWare Image:
http://www.thoughtpolice.co.uk/vmware/#ubuntu10.04 (notroot / thoughtpolice)
Install SSH
• sudo apt-get update
• sudo apt-get install openssh-server
Install JDK
• wget -c -O jdk-6u34-linux-i586.bin http://download.oracle.com/otn/java/jdk/6u34-b04/jdk-6u34-linux-i586.bin?AuthParam=1347897296_c6dd13e0af9e099dc731937f95c1cd01
• chmod +x jdk-6u34-linux-i586.bin
• ./jdk-6u34-linux-i586.bin
• sudo mv jdk1.6.0_34 /usr/local
• sudo ln -s /usr/local/jdk1.6.0_34 /usr/local/jdk
Create group / account for Hadoop
• sudo addgroup hadoop
• sudo adduser --ingroup hadoop hduser
Install Local Hadoop
• wget http://mirrors.digipower.vn/apache/hadoop/common/hadoop-1.0.4/hadoop-1.0.4.tar.gz
• tar -zxvf hadoop-1.0.4.tar.gz
• sudo mv hadoop-1.0.4 /usr/local
• sudo chown -R hduser:hadoop /usr/local/hadoop-1.0.4
• sudo ln -s /usr/local/hadoop-1.0.4 /usr/local/hadoop
Install Apache Ant
• wget http://mirrors.digipower.vn/apache/ant/binaries/apache-ant-1.9.0-bin.tar.gz
• tar -zxvf apache-ant-1.9.0-bin.tar.gz
• sudo mv apache-ant-1.9.0 /usr/local
• sudo ln -s /usr/local/apache-ant-1.9.0 /usr/local/apache-ant
Modify environment variables
• su - hduser
• vi .bashrc
• export JAVA_HOME=/usr/local/jdk
• export HADOOP_PREFIX=/usr/local/hadoop
• export PATH=${JAVA_HOME}/bin:${HADOOP_PREFIX}/bin:${PATH}
• . .bashrc
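The vi edit above can also be applied non-interactively. A minimal sketch (run as hduser; it appends the same three exports to the real ~/.bashrc and reloads it):

```shell
# Append the environment setup to .bashrc (same three lines as above).
cat >> "$HOME/.bashrc" <<'EOF'
export JAVA_HOME=/usr/local/jdk
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=${JAVA_HOME}/bin:${HADOOP_PREFIX}/bin:${PATH}
EOF
# Reload so the current shell picks the variables up (same as ". .bashrc").
. "$HOME/.bashrc"
echo "$JAVA_HOME"
```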
Try 1st example
hduser@ubuntu:/usr/local/hadoop$ cd $HADOOP_PREFIX
hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar pi 2 10
Number of Maps = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/04/03 15:01:40 INFO mapred.FileInputFormat: Total input paths to process : 2
13/04/03 15:01:41 INFO mapred.JobClient: Running job: job_201304031458_0003
13/04/03 15:01:42 INFO mapred.JobClient: map 0% reduce 0%
13/04/03 15:02:00 INFO mapred.JobClient: map 100% reduce 0%
13/04/03 15:02:15 INFO mapred.JobClient: map 100% reduce 100%
13/04/03 15:02:19 INFO mapred.JobClient: Job complete: job_201304031458_0003
13/04/03 15:02:19 INFO mapred.JobClient: Counters: 30
13/04/03 15:02:19 INFO mapred.JobClient: Job Counters
…
13/04/03 15:02:19 INFO mapred.JobClient: Reduce output records=0
13/04/03 15:02:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1118670848
13/04/03 15:02:19 INFO mapred.JobClient: Map output records=4
Job Finished in 39.148 seconds
Estimated value of Pi is 3.80000000000000000000
Setup Single Node Cluster
• Disabling IPv6
• Configuring SSH
• Configuration
– hadoop-env.sh
– conf/*-site.xml
• Start / Stop node cluster
• Running MapReduce job
Disabling IPv6
• Open /etc/sysctl.conf and add the following lines:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
• Reboot your machine
• Verify whether IPv6 is disabled:
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
(0 – enabled, 1 – disabled)
Configuring SSH
• Create an SSH key pair on localhost
su - hduser
ssh-keygen -t rsa -P ""
• Authorize the public key id_rsa.pub on localhost
touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
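The permission steps matter: sshd ignores an authorized_keys file that is group- or world-writable. A sketch of the same sequence, with a placeholder key line standing in for the real id_rsa.pub produced by ssh-keygen:

```shell
# Create ~/.ssh with strict permissions and authorize a key line.
# The key below is a placeholder; on the real VM, append
# $HOME/.ssh/id_rsa.pub as shown above.
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
touch "$HOME/.ssh/authorized_keys" && chmod 600 "$HOME/.ssh/authorized_keys"
echo "ssh-rsa PLACEHOLDERKEY hduser@ubuntu" >> "$HOME/.ssh/authorized_keys"
stat -c '%a' "$HOME/.ssh/authorized_keys"   # prints 600
```

After the real key is in place, `ssh localhost` should log in without a password prompt.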
Configuration
• Edit /usr/local/hadoop/conf/hadoop-env.sh
and add the following line:
export JAVA_HOME=/usr/local/jdk
Configuration (cont.)
• Create the folders that will store the node's data
sudo mkdir -p /hadoop_data/name
sudo mkdir -p /hadoop_data/data
sudo mkdir -p /hadoop_data/temp
sudo chown hduser:hadoop /hadoop_data/name
sudo chown hduser:hadoop /hadoop_data/data
sudo chown hduser:hadoop /hadoop_data/temp
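The same layout can be created in one loop. A sketch that uses a scratch root so it can run unprivileged; on the real VM the root is /hadoop_data, created with sudo, and the chown lines above are still required:

```shell
# Create the name/data/temp layout under a scratch root
# (stand-in for /hadoop_data so no sudo is needed here).
ROOT="${TMPDIR:-/tmp}/hadoop_data_demo"
for d in name data temp; do
  mkdir -p "$ROOT/$d"
done
ls "$ROOT"   # lists the three directories
```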
conf/core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop_data/temp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>
conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
<description>The host and port that the MapReduce job tracker runs at. If
"local", then jobs are run in-process as a single map and reduce task.
</description>
</property>
</configuration>
conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.name.dir</name>
<!-- Path to store namespace and transaction logs -->
<value>/hadoop_data/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<!-- Path to store data blocks in datanode -->
<value>/hadoop_data/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>Default block replication. The actual number of replications can
be specified when the file is created. The default is used if replication is
not specified at create time.
</description>
</property>
</configuration>
Format a new filesystem
notroot@ubuntu:/usr/local/hadoop/conf$ su - hduser
Password:
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
13/04/03 13:41:24 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu.localdomain/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.0.4
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290;
compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
************************************************************/
Re-format filesystem in /hadoop_data/name ? (Y or N) Y
13/04/03 13:41:26 INFO util.GSet: VM type = 32-bit
13/04/03 13:41:26 INFO util.GSet: 2% max memory = 19.33375 MB
13/04/03 13:41:26 INFO util.GSet: capacity = 2^22 = 4194304 entries
….
13/04/03 13:41:28 INFO common.Storage: Storage directory /hadoop_data/name has been successfully formatted.
13/04/03 13:41:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu.localdomain/127.0.1.1
************************************************************/
Do not format a running Hadoop file system as you will lose all the
data currently in the cluster (in HDFS)!
Start Single Node Cluster
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
starting namenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-tasktracker-ubuntu.out
How to verify Hadoop processes
• A nifty tool for checking whether the expected Hadoop processes are running is jps
(part of the Sun JDK tools)
hduser@ubuntu:~$ jps
1203 NameNode
1833 Jps
1615 JobTracker
1541 SecondaryNameNode
1362 DataNode
1788 TaskTracker
• You can also check with netstat if Hadoop is listening on the configured ports.
notroot@ubuntu:/usr/local/hadoop/conf$ sudo netstat -plten | grep java
tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 7167 2438/java
tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 7949 2874/java
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 7898 2791/java
tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 8035 2874/java
tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 7202 2438/java
tcp 0 0 0.0.0.0:57143 0.0.0.0:* LISTEN 1001 7585 2791/java
tcp 0 0 0.0.0.0:41943 0.0.0.0:* LISTEN 1001 7222 2608/java
tcp 0 0 0.0.0.0:58936 0.0.0.0:* LISTEN 1001 6969 2438/java
tcp 0 0 127.0.0.1:50234 0.0.0.0:* LISTEN 1001 8158 3050/java
tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 7697 2608/java
tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 7775 2608/java
tcp 0 0 0.0.0.0:40067 0.0.0.0:* LISTEN 1001 7764 2874/java
tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 7939 2608/java
Stop your single node cluster
hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Running a MapReduce job
• We will use three ebooks from Project
Gutenberg for this example:
– The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
– The Notebooks of Leonardo Da Vinci
– Ulysses by James Joyce
• Download each ebook in Plain Text UTF-8
encoding and store the files in
/tmp/gutenberg
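Staging the input directory looks like this. A sketch: the ebook URLs are not repeated here, so empty placeholder files stand in for the three downloads; on the real machine each placeholder line would be a wget of the corresponding Plain Text UTF-8 link:

```shell
# Stage /tmp/gutenberg with the three expected input files.
mkdir -p /tmp/gutenberg
for f in pg20417.txt pg4300.txt pg5000.txt; do
  : > "/tmp/gutenberg/$f"   # placeholder; replace with the wget of the ebook
done
ls /tmp/gutenberg
```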
Running a MapReduce job
(cont.)
• Copy these files into HDFS
hduser@ubuntu:~$ hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
hduser@ubuntu:~$ hadoop dfs -ls /user/hduser/gutenberg
Found 3 items
-rw-r--r-- 1 hduser supergroup 661807 2013-04-03 14:01
/user/hduser/gutenberg/pg20417.txt
-rw-r--r-- 1 hduser supergroup 1540092 2013-04-03 14:01
/user/hduser/gutenberg/pg4300.txt
-rw-r--r-- 1 hduser supergroup 1391684 2013-04-03 14:01
/user/hduser/gutenberg/pg5000.txt
Running a MapReduce job
(cont.)
hduser@ubuntu:~$ cd /usr/local/hadoop
hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
13/04/03 14:02:45 INFO input.FileInputFormat: Total input paths to process : 3
13/04/03 14:02:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/03 14:02:45 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/03 14:02:45 INFO mapred.JobClient: Running job: job_201304031352_0001
13/04/03 14:02:46 INFO mapred.JobClient: map 0% reduce 0%
13/04/03 14:03:09 INFO mapred.JobClient: map 66% reduce 0%
13/04/03 14:03:32 INFO mapred.JobClient: map 100% reduce 0%
13/04/03 14:03:47 INFO mapred.JobClient: map 100% reduce 100%
13/04/03 14:03:53 INFO mapred.JobClient: Job complete: job_201304031352_0001
13/04/03 14:03:53 INFO mapred.JobClient: Counters: 29
13/04/03 14:03:53 INFO mapred.JobClient: Job Counters
13/04/03 14:03:53 INFO mapred.JobClient: Launched reduce tasks=1
…
13/04/03 14:03:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=59114
13/04/03 14:03:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=361
13/04/03 14:03:53 INFO mapred.JobClient: Reduce input records=102321
13/04/03 14:03:53 INFO mapred.JobClient: Reduce input groups=82334
13/04/03 14:03:53 INFO mapred.JobClient: Combine output records=102321
13/04/03 14:03:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=576069632
13/04/03 14:03:53 INFO mapred.JobClient: Reduce output records=82334
13/04/03 14:03:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1490481152
13/04/03 14:03:53 INFO mapred.JobClient: Map output records=629172
Check the result
• hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -ls /user/hduser/gutenberg-output
Found 3 items
-rw-r--r-- 1 hduser supergroup 0 2013-04-03 14:03 /user/hduser/gutenberg-output/_SUCCESS
drwxr-xr-x - hduser supergroup 0 2013-04-03 14:02 /user/hduser/gutenberg-output/_logs
-rw-r--r-- 1 hduser supergroup 880829 2013-04-03 14:03 /user/hduser/gutenberg-output/part-r-00000
• hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000 | more
"(Lo)cra" 1
"1490 1
"1498," 1
"35" 1
"40," 1
"A 2
"AS-IS". 1
"A_ 1
"Absoluti 1
"Alack! 1
"Alack!" 1
"Alla 1
"Allegorical 1
"Alpha 1
"Alpha," 1
…
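For intuition about what the job just did, the same map / shuffle / reduce shape can be approximated on one machine with standard tools: tr emits one word per line (map), sort brings identical words together (shuffle), and uniq -c counts each group (reduce). Tokenization differs slightly from the Hadoop example, so the counts will not match it exactly; a tiny sample input is used here instead of the ebooks:

```shell
# Single-process approximation of the wordcount job on sample text.
printf 'to be or not to be\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c \
  | sort -rn   # counts, largest first
```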
Hadoop Interfaces
• NameNode Web UI:
http://192.168.65.134:50070/
• JobTracker Web UI:
http://192.168.65.134:50030/
• TaskTracker Web UI:
http://192.168.65.134:50060/
NameNode Web UI
JobTracker Web UI
TaskTracker Web UI
Troubleshooting
• VMware Ubuntu image lost eth0 after moving it:
http://www.whiteboardcoder.com/2012/03/vmware-ubuntu-image-lost-eth0-after.html
• Hadoop Troubleshooting:
http://wiki.apache.org/hadoop/TroubleShooting
• Error when formatting the Hadoop filesystem:
http://askubuntu.com/questions/35551/error-when-formatting-the-hadoop-filesystem
THANK YOU

More Related Content

What's hot

HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성Young Pyo
 
Apache Hadoop & Hive installation with movie rating exercise
Apache Hadoop & Hive installation with movie rating exerciseApache Hadoop & Hive installation with movie rating exercise
Apache Hadoop & Hive installation with movie rating exerciseShiva Rama Krishna Dasharathi
 
Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2IMC Institute
 
Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013grim_radical
 
How to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHubHow to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHubTiago Simões
 
How to go the extra mile on monitoring
How to go the extra mile on monitoringHow to go the extra mile on monitoring
How to go the extra mile on monitoringTiago Simões
 
Automated infrastructure is on the menu
Automated infrastructure is on the menuAutomated infrastructure is on the menu
Automated infrastructure is on the menujtimberman
 
Ansible ex407 and EX 294
Ansible ex407 and EX 294Ansible ex407 and EX 294
Ansible ex407 and EX 294IkiArif1
 
Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...
Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...
Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...Nagios
 
Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016StackIQ
 
Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016StackIQ
 
Vagrant, Ansible, and OpenStack on your laptop
Vagrant, Ansible, and OpenStack on your laptopVagrant, Ansible, and OpenStack on your laptop
Vagrant, Ansible, and OpenStack on your laptopLorin Hochstein
 

What's hot (17)

Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성HADOOP 실제 구성 사례, Multi-Node 구성
HADOOP 실제 구성 사례, Multi-Node 구성
 
Apache Hadoop & Hive installation with movie rating exercise
Apache Hadoop & Hive installation with movie rating exerciseApache Hadoop & Hive installation with movie rating exercise
Apache Hadoop & Hive installation with movie rating exercise
 
Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2Set up Hadoop Cluster on Amazon EC2
Set up Hadoop Cluster on Amazon EC2
 
Hadoop on ec2
Hadoop on ec2Hadoop on ec2
Hadoop on ec2
 
Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013Puppet: Eclipsecon ALM 2013
Puppet: Eclipsecon ALM 2013
 
How to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHubHow to create a secured multi tenancy for clustered ML with JupyterHub
How to create a secured multi tenancy for clustered ML with JupyterHub
 
How to go the extra mile on monitoring
How to go the extra mile on monitoringHow to go the extra mile on monitoring
How to go the extra mile on monitoring
 
Automated infrastructure is on the menu
Automated infrastructure is on the menuAutomated infrastructure is on the menu
Automated infrastructure is on the menu
 
Hadoop completereference
Hadoop completereferenceHadoop completereference
Hadoop completereference
 
Ansible ex407 and EX 294
Ansible ex407 and EX 294Ansible ex407 and EX 294
Ansible ex407 and EX 294
 
Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...
Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...
Nagios Conference 2014 - Mike Weber - Expanding NRDS Capabilities on Linux Sy...
 
Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016Salesforce at Stacki Atlanta Meetup February 2016
Salesforce at Stacki Atlanta Meetup February 2016
 
Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016Introduction to Stacki at Atlanta Meetup February 2016
Introduction to Stacki at Atlanta Meetup February 2016
 
Vagrant, Ansible, and OpenStack on your laptop
Vagrant, Ansible, and OpenStack on your laptopVagrant, Ansible, and OpenStack on your laptop
Vagrant, Ansible, and OpenStack on your laptop
 
Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04Hadoop 2.4 installing on ubuntu 14.04
Hadoop 2.4 installing on ubuntu 14.04
 
Dev ops
Dev opsDev ops
Dev ops
 

Similar to Hadoop single cluster installation

Installing odoo v8 from github
Installing odoo v8 from githubInstalling odoo v8 from github
Installing odoo v8 from githubAntony Gitomeh
 
/etc/rc.d配下とかのリーディング勉強会
/etc/rc.d配下とかのリーディング勉強会/etc/rc.d配下とかのリーディング勉強会
/etc/rc.d配下とかのリーディング勉強会Naoya Nakazawa
 
Creating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server HardeningCreating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server Hardeningarchwisp
 
ONOS SDN Controller - Clustering Tests & Experiments
ONOS SDN Controller - Clustering Tests & Experiments ONOS SDN Controller - Clustering Tests & Experiments
ONOS SDN Controller - Clustering Tests & Experiments Eueung Mulyana
 
Intrusion Detection System using Snort
Intrusion Detection System using Snort Intrusion Detection System using Snort
Intrusion Detection System using Snort webhostingguy
 
Intrusion Detection System using Snort
Intrusion Detection System using Snort Intrusion Detection System using Snort
Intrusion Detection System using Snort webhostingguy
 
ERP System Implementation Kubernetes Cluster with Sticky Sessions
ERP System Implementation Kubernetes Cluster with Sticky Sessions ERP System Implementation Kubernetes Cluster with Sticky Sessions
ERP System Implementation Kubernetes Cluster with Sticky Sessions Chanaka Lasantha
 
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESQuick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESJan Kalcic
 
Install and configure linux
Install and configure linuxInstall and configure linux
Install and configure linuxVicent Selfa
 
Openstack 101
Openstack 101Openstack 101
Openstack 101POSSCON
 
Openstack Testbed_ovs_virtualbox_devstack_single node
Openstack Testbed_ovs_virtualbox_devstack_single nodeOpenstack Testbed_ovs_virtualbox_devstack_single node
Openstack Testbed_ovs_virtualbox_devstack_single nodeYongyoon Shin
 
Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...
Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...
Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...ginniapps
 
Linux 系統管理與安全:進階系統管理系統防駭與資訊安全
Linux 系統管理與安全:進階系統管理系統防駭與資訊安全Linux 系統管理與安全:進階系統管理系統防駭與資訊安全
Linux 系統管理與安全:進階系統管理系統防駭與資訊安全維泰 蔡
 
How To Install Openbravo ERP 2.50 MP43 in Ubuntu
How To Install Openbravo ERP 2.50 MP43 in UbuntuHow To Install Openbravo ERP 2.50 MP43 in Ubuntu
How To Install Openbravo ERP 2.50 MP43 in UbuntuWirabumi Software
 
Virtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + PuppetVirtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + PuppetOmar Reygaert
 
Hands-On Session Docker
Hands-On Session DockerHands-On Session Docker
Hands-On Session DockerLinetsChile
 

Similar to Hadoop single cluster installation (20)

Installing odoo v8 from github
Installing odoo v8 from githubInstalling odoo v8 from github
Installing odoo v8 from github
 
Instalar PENTAHO 5 en CentOS 6
Instalar PENTAHO 5 en CentOS 6Instalar PENTAHO 5 en CentOS 6
Instalar PENTAHO 5 en CentOS 6
 
/etc/rc.d配下とかのリーディング勉強会
/etc/rc.d配下とかのリーディング勉強会/etc/rc.d配下とかのリーディング勉強会
/etc/rc.d配下とかのリーディング勉強会
 
Creating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server HardeningCreating "Secure" PHP applications, Part 2, Server Hardening
Creating "Secure" PHP applications, Part 2, Server Hardening
 
ONOS SDN Controller - Clustering Tests & Experiments
ONOS SDN Controller - Clustering Tests & Experiments ONOS SDN Controller - Clustering Tests & Experiments
ONOS SDN Controller - Clustering Tests & Experiments
 
Intrusion Detection System using Snort
Intrusion Detection System using Snort Intrusion Detection System using Snort
Intrusion Detection System using Snort
 
Intrusion Detection System using Snort
Intrusion Detection System using Snort Intrusion Detection System using Snort
Intrusion Detection System using Snort
 
Hacking the swisscom modem
Hacking the swisscom modemHacking the swisscom modem
Hacking the swisscom modem
 
ERP System Implementation Kubernetes Cluster with Sticky Sessions
ERP System Implementation Kubernetes Cluster with Sticky Sessions ERP System Implementation Kubernetes Cluster with Sticky Sessions
ERP System Implementation Kubernetes Cluster with Sticky Sessions
 
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLESQuick-and-Easy Deployment of a Ceph Storage Cluster with SLES
Quick-and-Easy Deployment of a Ceph Storage Cluster with SLES
 
Install and configure linux
Install and configure linuxInstall and configure linux
Install and configure linux
 
Openstack 101
Openstack 101Openstack 101
Openstack 101
 
Openstack Testbed_ovs_virtualbox_devstack_single node
Openstack Testbed_ovs_virtualbox_devstack_single nodeOpenstack Testbed_ovs_virtualbox_devstack_single node
Openstack Testbed_ovs_virtualbox_devstack_single node
 
Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...
Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...
Discoverer 11.1.1.7 web logic (10.3.6) & ebs r12 12.1.3) implementation guide...
 
Linux 系統管理與安全:進階系統管理系統防駭與資訊安全
Linux 系統管理與安全:進階系統管理系統防駭與資訊安全Linux 系統管理與安全:進階系統管理系統防駭與資訊安全
Linux 系統管理與安全:進階系統管理系統防駭與資訊安全
 
Sun raysetup
Sun raysetupSun raysetup
Sun raysetup
 
How To Install Openbravo ERP 2.50 MP43 in Ubuntu
How To Install Openbravo ERP 2.50 MP43 in UbuntuHow To Install Openbravo ERP 2.50 MP43 in Ubuntu
How To Install Openbravo ERP 2.50 MP43 in Ubuntu
 
Snort-IPS-Tutorial
Snort-IPS-TutorialSnort-IPS-Tutorial
Snort-IPS-Tutorial
 
Virtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + PuppetVirtualization and automation of library software/machines + Puppet
Virtualization and automation of library software/machines + Puppet
 
Hands-On Session Docker
Hands-On Session DockerHands-On Session Docker
Hands-On Session Docker
 

Recently uploaded

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Hadoop single cluster installation

  • 9. Try 1st example
    hduser@ubuntu:/usr/local/hadoop$ cd $HADOOP_PREFIX
    hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar pi 2 10
    Number of Maps = 2
    Samples per Map = 10
    Wrote input for Map #0
    Wrote input for Map #1
    Starting Job
    13/04/03 15:01:40 INFO mapred.FileInputFormat: Total input paths to process : 2
    13/04/03 15:01:41 INFO mapred.JobClient: Running job: job_201304031458_0003
    13/04/03 15:01:42 INFO mapred.JobClient: map 0% reduce 0%
    13/04/03 15:02:00 INFO mapred.JobClient: map 100% reduce 0%
    13/04/03 15:02:15 INFO mapred.JobClient: map 100% reduce 100%
    13/04/03 15:02:19 INFO mapred.JobClient: Job complete: job_201304031458_0003
    13/04/03 15:02:19 INFO mapred.JobClient: Counters: 30
    13/04/03 15:02:19 INFO mapred.JobClient: Job Counters
    …
    13/04/03 15:02:19 INFO mapred.JobClient: Reduce output records=0
    13/04/03 15:02:19 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1118670848
    13/04/03 15:02:19 INFO mapred.JobClient: Map output records=4
    Job Finished in 39.148 seconds
    Estimated value of Pi is 3.80000000000000000000
  • 10. Setup Single Node Cluster
    • Disabling ipv6
    • Configuring SSH
    • Configuration
      – hadoop-env.sh
      – conf/*-site.xml
    • Start / Stop node cluster
    • Running MapReduce job
  • 11. Disabling ipv6
    • Open /etc/sysctl.conf and add the following lines:
      # disable ipv6
      net.ipv6.conf.all.disable_ipv6 = 1
      net.ipv6.conf.default.disable_ipv6 = 1
      net.ipv6.conf.lo.disable_ipv6 = 1
    • Reboot your machine
    • Verify whether ipv6 is enabled or disabled:
      cat /proc/sys/net/ipv6/conf/all/disable_ipv6
      (0 – enabled, 1 – disabled)
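The verification step above can be wrapped in a small script. This is only an illustrative sketch: `check_ipv6` is a hypothetical helper name, and the only assumption carried over from the slide is the flag file path `/proc/sys/net/ipv6/conf/all/disable_ipv6`.

```shell
#!/bin/sh
# check_ipv6 (hypothetical helper): read a kernel flag file and report
# whether IPv6 is disabled (1) or enabled (0), matching the manual
# `cat /proc/sys/net/ipv6/conf/all/disable_ipv6` check from the slide.
check_ipv6() {
    if [ "$(cat "$1" 2>/dev/null)" = "1" ]; then
        echo "ipv6 disabled"
    else
        echo "ipv6 enabled"
    fi
}

# On a real node you would pass /proc/sys/net/ipv6/conf/all/disable_ipv6;
# here we demonstrate against a temporary file so the snippet is self-contained.
flag=$(mktemp)
echo 1 > "$flag"
check_ipv6 "$flag"    # prints: ipv6 disabled
rm -f "$flag"
```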
  • 12. Configuring SSH
    • Create SSH keys on the localhost:
      su - hduser
      ssh-keygen -t rsa -P ""
    • Append the public key id_rsa.pub to localhost's authorized keys:
      touch ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys
      cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
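The same key setup can be rehearsed in a throwaway directory before touching the real ~/.ssh. A sketch, assuming OpenSSH's ssh-keygen is installed; the directory is arbitrary and exists only for the demonstration:

```shell
#!/bin/sh
# Rehearse the passwordless-SSH key setup in a throwaway directory
# (on the cluster you would operate on $HOME/.ssh as shown on the slide).
dir=$(mktemp -d)
ssh-keygen -q -t rsa -N "" -f "$dir/id_rsa"            # empty passphrase, like -P ""
touch "$dir/authorized_keys" && chmod 600 "$dir/authorized_keys"
cat "$dir/id_rsa.pub" >> "$dir/authorized_keys"
grep -c 'ssh-rsa' "$dir/authorized_keys"               # prints: 1
```

After the real setup, `ssh localhost` should log in without prompting for a password.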
  • 13. Configuration
    • Edit /usr/local/hadoop/conf/hadoop-env.sh and add the following line:
      export JAVA_HOME=/usr/local/jdk
  • 14. Configuration (cont.)
    • Create folders to store data for the node:
      sudo mkdir -p /hadoop_data/name
      sudo mkdir -p /hadoop_data/data
      sudo mkdir -p /hadoop_data/temp
      sudo chown hduser:hadoop /hadoop_data/name
      sudo chown hduser:hadoop /hadoop_data/data
      sudo chown hduser:hadoop /hadoop_data/temp
  • 15. conf/core-site.xml
    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop_data/temp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:54310</value>
        <description>The name of the default file system. A URI whose scheme
        and authority determine the FileSystem implementation. The uri's
        scheme determines the config property (fs.SCHEME.impl) naming the
        FileSystem implementation class. The uri's authority is used to
        determine the host, port, etc. for a filesystem.</description>
      </property>
    </configuration>
  • 16. conf/mapred-site.xml
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:54311</value>
        <description>The host and port that the MapReduce job tracker runs
        at. If "local", then jobs are run in-process as a single map and
        reduce task.</description>
      </property>
    </configuration>
  • 17. conf/hdfs-site.xml
    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <!-- Path to store namespace and transaction logs -->
        <value>/hadoop_data/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <!-- Path to store data blocks in datanode -->
        <value>/hadoop_data/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication. The actual number of
        replications can be specified when the file is created. The default
        is used if replication is not specified in create time.</description>
      </property>
    </configuration>
  • 18. Format a new filesystem
    notroot@ubuntu:/usr/local/hadoop/conf$ su - hduser
    Password:
    hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format
    13/04/03 13:41:24 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG: host = ubuntu.localdomain/127.0.1.1
    STARTUP_MSG: args = [-format]
    STARTUP_MSG: version = 1.0.4
    STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1393290; compiled by 'hortonfo' on Wed Oct 3 05:13:58 UTC 2012
    ************************************************************/
    Re-format filesystem in /hadoop_data/name ? (Y or N) Y
    13/04/03 13:41:26 INFO util.GSet: VM type = 32-bit
    13/04/03 13:41:26 INFO util.GSet: 2% max memory = 19.33375 MB
    13/04/03 13:41:26 INFO util.GSet: capacity = 2^22 = 4194304 entries
    ….
    13/04/03 13:41:28 INFO common.Storage: Storage directory /hadoop_data/name has been successfully formatted.
    13/04/03 13:41:28 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at ubuntu.localdomain/127.0.1.1
    ************************************************************/
    Do not format a running Hadoop file system, as you will lose all the data currently in the cluster (in HDFS)!
  • 19. Start Single Node Cluster
    hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh
    starting namenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-namenode-ubuntu.out
    localhost: starting datanode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-datanode-ubuntu.out
    localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
    starting jobtracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-jobtracker-ubuntu.out
    localhost: starting tasktracker, logging to /usr/local/hadoop-1.0.4/libexec/../logs/hadoop-hduser-tasktracker-ubuntu.out
  • 20. How to verify Hadoop processes
    • A nifty tool for checking whether the expected Hadoop processes are running is jps (part of the Sun JDK tools):
      hduser@ubuntu:~$ jps
      1203 NameNode
      1833 Jps
      1615 JobTracker
      1541 SecondaryNameNode
      1362 DataNode
      1788 TaskTracker
    • You can also check with netstat whether Hadoop is listening on the configured ports:
      notroot@ubuntu:/usr/local/hadoop/conf$ sudo netstat -plten | grep java
      tcp 0 0 127.0.0.1:54310 0.0.0.0:* LISTEN 1001 7167 2438/java
      tcp 0 0 127.0.0.1:54311 0.0.0.0:* LISTEN 1001 7949 2874/java
      tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 1001 7898 2791/java
      tcp 0 0 0.0.0.0:50030 0.0.0.0:* LISTEN 1001 8035 2874/java
      tcp 0 0 0.0.0.0:50070 0.0.0.0:* LISTEN 1001 7202 2438/java
      tcp 0 0 0.0.0.0:57143 0.0.0.0:* LISTEN 1001 7585 2791/java
      tcp 0 0 0.0.0.0:41943 0.0.0.0:* LISTEN 1001 7222 2608/java
      tcp 0 0 0.0.0.0:58936 0.0.0.0:* LISTEN 1001 6969 2438/java
      tcp 0 0 127.0.0.1:50234 0.0.0.0:* LISTEN 1001 8158 3050/java
      tcp 0 0 0.0.0.0:50010 0.0.0.0:* LISTEN 1001 7697 2608/java
      tcp 0 0 0.0.0.0:50075 0.0.0.0:* LISTEN 1001 7775 2608/java
      tcp 0 0 0.0.0.0:40067 0.0.0.0:* LISTEN 1001 7764 2874/java
      tcp 0 0 0.0.0.0:50020 0.0.0.0:* LISTEN 1001 7939 2608/java
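The jps check lends itself to scripting. In this sketch, `missing_daemons` is a hypothetical helper (not part of Hadoop), and the sample input mimics the jps output shown above:

```shell
#!/bin/sh
# missing_daemons (hypothetical helper): read `jps`-style output on stdin
# and print any expected Hadoop 1.x daemon that is not listed.
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        echo "$out" | grep -qw "$d" || echo "missing: $d"
    done
}

# On a live node: jps | missing_daemons
# Demonstration with sample output in which the TaskTracker has died:
printf '1203 NameNode\n1362 DataNode\n1541 SecondaryNameNode\n1615 JobTracker\n' \
  | missing_daemons    # prints: missing: TaskTracker
```

Note that `grep -w` matches whole words, so looking for NameNode does not falsely match the SecondaryNameNode line.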
  • 21. Stop your single node cluster
    hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh
    stopping jobtracker
    localhost: stopping tasktracker
    stopping namenode
    localhost: stopping datanode
    localhost: stopping secondarynamenode
  • 22. Running a MapReduce job
    • We will use three ebooks from Project Gutenberg for this example:
      – The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
      – The Notebooks of Leonardo Da Vinci
      – Ulysses by James Joyce
    • Download each ebook as a text file in Plain Text UTF-8 encoding and store the files in /tmp/gutenberg
  • 23. Running a MapReduce job (cont.)
    • Copy these files into HDFS:
      hduser@ubuntu:~$ hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
      hduser@ubuntu:~$ hadoop dfs -ls /user/hduser/gutenberg
      Found 3 items
      -rw-r--r-- 1 hduser supergroup 661807 2013-04-03 14:01 /user/hduser/gutenberg/pg20417.txt
      -rw-r--r-- 1 hduser supergroup 1540092 2013-04-03 14:01 /user/hduser/gutenberg/pg4300.txt
      -rw-r--r-- 1 hduser supergroup 1391684 2013-04-03 14:01 /user/hduser/gutenberg/pg5000.txt
  • 24. Running a MapReduce job (cont.)
    hduser@ubuntu:~$ cd /usr/local/hadoop
    hduser@ubuntu:/usr/local/hadoop$ hadoop jar hadoop-examples-1.0.4.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
    13/04/03 14:02:45 INFO input.FileInputFormat: Total input paths to process : 3
    13/04/03 14:02:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/04/03 14:02:45 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/04/03 14:02:45 INFO mapred.JobClient: Running job: job_201304031352_0001
    13/04/03 14:02:46 INFO mapred.JobClient: map 0% reduce 0%
    13/04/03 14:03:09 INFO mapred.JobClient: map 66% reduce 0%
    13/04/03 14:03:32 INFO mapred.JobClient: map 100% reduce 0%
    13/04/03 14:03:47 INFO mapred.JobClient: map 100% reduce 100%
    13/04/03 14:03:53 INFO mapred.JobClient: Job complete: job_201304031352_0001
    13/04/03 14:03:53 INFO mapred.JobClient: Counters: 29
    13/04/03 14:03:53 INFO mapred.JobClient: Job Counters
    13/04/03 14:03:53 INFO mapred.JobClient: Launched reduce tasks=1
    …
    13/04/03 14:03:53 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=59114
    13/04/03 14:03:53 INFO mapred.JobClient: SPLIT_RAW_BYTES=361
    13/04/03 14:03:53 INFO mapred.JobClient: Reduce input records=102321
    13/04/03 14:03:53 INFO mapred.JobClient: Reduce input groups=82334
    13/04/03 14:03:53 INFO mapred.JobClient: Combine output records=102321
    13/04/03 14:03:53 INFO mapred.JobClient: Physical memory (bytes) snapshot=576069632
    13/04/03 14:03:53 INFO mapred.JobClient: Reduce output records=82334
    13/04/03 14:03:53 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1490481152
    13/04/03 14:03:53 INFO mapred.JobClient: Map output records=629172
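What the wordcount job computes can be approximated on a single machine with standard Unix tools, which is handy for sanity-checking a small input before running the cluster job. A sketch (the sample file below is made up; the classic WordCount example also tokenizes on whitespace, via Java's StringTokenizer):

```shell
#!/bin/sh
# A local approximation of the wordcount example: split the input into
# whitespace-separated tokens, then count each distinct token.
printf 'hello hadoop\nhello world\n' > /tmp/wc_demo.txt
counts=$(tr -s '[:space:]' '\n' < /tmp/wc_demo.txt | grep -v '^$' | sort | uniq -c | sort -rn)
echo "$counts"
```

This prints "hello" with a count of 2 and "hadoop" and "world" with a count of 1 each, mirroring the word/count pairs the MapReduce job writes to part-r-00000.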
  • 25. Check the result
    • hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -ls /user/hduser/gutenberg-output
      Found 3 items
      -rw-r--r-- 1 hduser supergroup 0 2013-04-03 14:03 /user/hduser/gutenberg-output/_SUCCESS
      drwxr-xr-x - hduser supergroup 0 2013-04-03 14:02 /user/hduser/gutenberg-output/_logs
      -rw-r--r-- 1 hduser supergroup 880829 2013-04-03 14:03 /user/hduser/gutenberg-output/part-r-00000
    • hduser@ubuntu:/usr/local/hadoop$ hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000 | more
      "(Lo)cra" 1
      "1490 1
      "1498," 1
      "35" 1
      "40," 1
      "A 2
      "AS-IS". 1
      "A_ 1
      "Absoluti 1
      "Alack! 1
      "Alack!" 1
      "Alla 1
      "Allegorical 1
      "Alpha 1
      "Alpha," 1
      …
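Each line of part-r-00000 pairs a word with its count. A common follow-up is listing the most frequent words; on the cluster you could pipe the `hadoop dfs -cat` output through `sort`, sketched here against a made-up sample file (the words and counts below are illustrative, not the real ebook results):

```shell
#!/bin/sh
# Sort wordcount output (word<TAB>count lines) by count, descending,
# and keep the top entries. Sample data is fabricated for illustration.
printf 'the\t9771\nof\t6433\nAlack!\t1\n' > /tmp/part_demo.txt
sort -k2,2nr /tmp/part_demo.txt | head -2
```

`sort -k2,2nr` sorts numerically (`n`) and in reverse (`r`) on the second whitespace-separated field, so the two highest counts ("the" and "of" in this sample) come first.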
  • 26. Hadoop Interfaces
    • NameNode Web UI: http://192.168.65.134:50070/
    • JobTracker Web UI: http://192.168.65.134:50030/
    • TaskTracker Web UI: http://192.168.65.134:50060/
  • 27. NameNode Web UI (screenshot)
  • 30. Troubleshooting
    • VMware Ubuntu image lost eth0 after moving it: http://www.whiteboardcoder.com/2012/03/vmware-ubuntu-image-lost-eth0-after.html
    • Hadoop Troubleshooting: http://wiki.apache.org/hadoop/TroubleShooting
    • Error when formatting the Hadoop filesystem: http://askubuntu.com/questions/35551/error-when-formatting-the-hadoop-filesystem

Editor's Notes

  1. In the preceding sample, MapReduce ran in local mode, without starting any servers and using the local filesystem as the storage system for inputs, outputs, and working data. The following diagram shows what happened in the WordCount program under the covers: