Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples
Big Data Analytics with R and Hadoop
D. Praveen Kumar
Research Scholar (Full-Time)
Department of Computer Science & Engineering
YSREC of Yogi Vemana University, Proddatur
Kadapa Dt., A. P, India
November 30, 2016
YSR Engineering College of YVU, Proddatur, Kadapa
Big Data Analytics with R and Hadoop
November 30, 2016 Slide: 1 / 70
1 Introduction
2 RHadoop
3 RHadoop Installation
4 rhdfs Methods
5 rmr2
6 Examples
Big Data - Introduction
Big Data refers to large and complex data sets that can be
structured, semi-structured, or unstructured and that typically do not
fit into memory. Such data has to be processed in place, which
means that the computation is done where the data resides.
Big Data - 3V’s
Velocity refers to the low-latency, real-time speed at which the
analytics need to be applied (for example, performing analytics
on a continuous stream of data originating from a social
networking site).
Volume refers to the size of the data set. It may be in KB,
MB, GB, TB, or PB, depending on the type of application that
generates or receives the data.
Variety refers to the various types of data that can exist,
for example, text, audio, video, and photos.
Big Data - 3V’s (Cont..)
Popular Organizations that hold Big Data
Some of the popular organizations that hold Big Data are as
follows (figures as of 2014):
Facebook: It has 40 PB of data and captures 100 TB/day
Yahoo!: It has 60 PB of data
Twitter: It captures 8 TB/day
eBay: It has 40 PB of data and captures 50 TB/day
Hadoop - Introduction
Apache Hadoop is an open source Java framework for
processing and querying vast amounts of data on large
clusters of commodity hardware.
Hadoop is a top-level Apache project, initiated and led by
Yahoo! and Doug Cutting.
Its impact can be summarized in four salient characteristics: it
provides scalable, cost-effective, flexible, and fault-tolerant solutions.
Apache Hadoop has two main components:
HDFS (Hadoop Distributed File System) - storage
MapReduce - processing
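The MapReduce processing model can be illustrated in plain R without a cluster. The following is a minimal in-memory sketch, not Hadoop's implementation: map_phase, shuffle_phase and reduce_phase are hypothetical names for the three stages that Hadoop runs in parallel across the nodes. Map emits one (word, 1) pair per word, shuffle groups the values by key, and reduce combines each group into a single result.

```r
# Map: split each input line into words and emit a (word, 1) pair per word.
map_phase <- function(lines) {
  words <- unlist(strsplit(lines, " ", fixed = TRUE))
  words <- words[nzchar(words)]  # drop empty tokens produced by repeated spaces
  data.frame(key = words, value = rep(1L, length(words)),
             stringsAsFactors = FALSE)
}

# Shuffle: collect all values that share the same key into one group.
shuffle_phase <- function(pairs) {
  split(pairs$value, pairs$key)
}

# Reduce: collapse each group of values into a single count per key.
reduce_phase <- function(groups) {
  vapply(groups, sum, integer(1))
}

lines <- c("big data with R", "R and Hadoop", "big clusters")
counts <- reduce_phase(shuffle_phase(map_phase(lines)))
print(counts)
```

In Hadoop, the map and reduce stages run as separate tasks on different nodes, and the shuffle moves the intermediate pairs over the network; here all three are ordinary function calls on one machine.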
Requirements
Necessary
Java >= 7
ssh
Linux OS (Ubuntu >= 14.04)
Hadoop framework
Optional
Eclipse
Internet connection
Java 7 & Installation
Hadoop requires a working Java installation; Java 1.7 or later
is recommended.
The following commands install Java on a Linux (Ubuntu) platform:
sudo apt-get install openjdk-7-jdk (or)
sudo apt-get install default-jdk
Java PATH Setup
We need to set the JAVA_HOME path.
Open the .bashrc file located in your home directory:
gedit ~/.bashrc
Add the following line at the end:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Installation & Configuration of SSH
Hadoop requires SSH (Secure Shell) access to manage its
nodes, i.e. remote machines plus your local machine if you
want to use Hadoop on it.
Install SSH using the following command:
sudo apt-get install ssh
Next, generate a DSA SSH key with an empty passphrase for the user
and add it to the authorized keys, so that Hadoop can log in to
localhost without a password:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Download & Extract Hadoop
Download Hadoop from the Apache Download Mirrors
http://mirror.fibergrid.in/apache/hadoop/common/
Extract the contents of the Hadoop package to a location of your
choice. I picked /usr/local/hadoop.
$ sudo chmod 777 /usr/local
$ cd /usr/local
$ tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop
Add Hadoop configuration in .bashrc
Add the following Hadoop settings to the .bashrc file in your home directory:
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
Create the tmp, NameNode & DataNode directories
Run the following command to create the NameNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Run the following command to create the DataNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Run the following commands to create Hadoop's tmp directory
(hadoop1 is the Hadoop user in this setup):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
Files to Configure
The following are the files we need to configure
core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml
Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following snippets between the
<configuration> ... </configuration> tags in the core-site.xml
file.
Add the following property to specify the location of the tmp directory:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
Add the following property to specify the default file
system and its port number (fs.default.name is the deprecated
name of fs.defaultFS, which Hadoop 2.x still accepts):
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Uncomment the JAVA_HOME line and set it to the correct Java path:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml
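This file specifies the host name and port at which the MapReduce
JobTracker runs. In Hadoop 2.x the file does not exist by default and is
created by copying mapred-site.xml.template. Add the following to mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
Note that mapred.job.tracker is a Hadoop 1.x (JobTracker) property; on
Hadoop 2.x, jobs are submitted to YARN only when mapreduce.framework.name
is set to yarn.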
Add properties in ... etc/hadoop/hdfs-site.xml
Add the following to hdfs-site.xml:
Set the replication factor (1 is sufficient for a single-node cluster):
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Specify the NameNode directory:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
Specify the DataNode directory:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
Formatting the HDFS file system via the NameNode
The first step in starting up your Hadoop installation is
formatting the Hadoop file system.
This needs to be done only the first time you set up Hadoop.
Do not format a running Hadoop file system, as you will lose
all the data currently in HDFS.
To format the file system, run the command (hdfs namenode -format
is the non-deprecated Hadoop 2.x form):
hadoop namenode -format
Starting single-node cluster
Run the command:
start-all.sh
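This starts a NameNode, SecondaryNameNode, DataNode,
ResourceManager and NodeManager on your machine. In Hadoop 2.x,
start-all.sh is deprecated; running start-dfs.sh followed by
start-yarn.sh has the same effect.
The jps tool, which lists running Java processes, is a quick way to
check whether the expected Hadoop daemons are running: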
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
Stopping your single-node cluster
Run the following command to stop all the daemons running on your machine:
stop-all.sh
The output will look like this:
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode
R - Introduction
R is an open source software package for performing statistical
analysis on data.
R is a programming language derived from S, a language for
statistical computing.
R provides a wide variety of statistical, machine learning, and
graphical techniques, and is highly extensible.
R can also connect to other data stores, such as MySQL,
SQLite, MongoDB, and Hadoop.
R - Features
The following are some of R's features:
Effective statistical programming language
Relational database support
Data analytics
Data visualization
Extension through the vast library of R packages
R - Operations
R supports data analytics through various operations, such as:
Regression
Classification
Clustering
Recommendation
Text mining
R - Installation (Windows)
For Windows, follow the given steps:
1 Navigate to www.r-project.org.
2 Click on the CRAN section, select a CRAN mirror, and select
Windows as your OS (Linux is preferable, though, since Hadoop
is almost always used in a Linux environment).
3 Download the latest R version from the mirror.
4 Execute the downloaded .exe to install R.
R - Installation (Ubuntu)
For Linux-Ubuntu, follow the given steps:
1 Navigate to www.r-project.org.
2 Click on the CRAN section, select a CRAN mirror, and select
your OS.
3 Add the CRAN <mirror> entry to the /etc/apt/sources.list file.
4 Download and update the package lists from the repositories
using the sudo apt-get update command.
5 Install R using the sudo apt-get install r-base command.
RHEL/CentOS
For Linux-RHEL/CentOS, follow the given steps:
1 Navigate to www.r-project.org.
2 Click on CRAN, select a CRAN mirror, and select Red Hat as the OS.
3 Download the R-*core-*.rpm file.
4 Install the .rpm package using the rpm -ivh R-*core-*.rpm
command.
5 Install R using sudo yum install R.
Hadoop MapReduce in R
There are three ways to run Hadoop MapReduce from R:
1 R and Hadoop Integrated Programming Environment (RHIPE)
2 Hadoop Streaming
3 RHadoop
Among these three, RHadoop is the most efficient and the easiest to use.
RHadoop - Introduction
RHadoop was developed by Revolution Analytics.
RHadoop consists of three main R packages:
1 rhdfs - provides HDFS data operations
2 rmr - provides MapReduce execution operations
3 rhbase - provides access to HBase as a data source
It is not necessary to install all three RHadoop packages to
run Hadoop MapReduce operations with R and Hadoop.
RHadoop - Architecture
rhdfs
rhdfs is an R interface that makes HDFS usable from the R
console.
The rhdfs package calls the HDFS API in the back end to operate on
data sources stored on HDFS.
With rhdfs methods, R programmers can easily perform read
and write operations on distributed data files.
rmr
rmr is an R interface that provides the Hadoop MapReduce facility
inside the R environment.
The R programmer only needs to divide the application logic into
map and reduce phases and submit them with the rmr
methods.
rmr then calls the Hadoop Streaming MapReduce API with job
parameters such as the input directory, output directory,
mapper and reducer, to run the R MapReduce job on the
Hadoop cluster.
rhbase
rhbase is an R interface for operating on the Hadoop HBase data
source stored on the distributed network, via a Thrift server.
The rhbase package provides methods for initialization,
read/write, and table manipulation operations.
R and Hadoop installation
R and Hadoop have already been installed in the previous sections.
Installing the R packages
To connect R and Hadoop, the following packages need to be installed:
httr
functional
devtools
plyr
reshape2
rJava
RJSONIO
itertools
digest
Rcpp
install.packages(c('httr', 'functional', 'devtools', 'plyr', 'reshape2'))
install.packages(c('rJava', 'RJSONIO', 'itertools', 'digest', 'Rcpp'))
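Re-installing packages that are already present wastes time, especially rJava and Rcpp, which compile native code on Linux. The following is a minimal sketch, using only base R's utils functions, that installs just the packages from the list above that are still missing. install_missing is a hypothetical helper, not part of RHadoop; its installer argument exists only so that it can be tried without network access.

```r
# The helper packages listed on this slide.
rhadoop_deps <- c("httr", "functional", "devtools", "plyr", "reshape2",
                  "rJava", "RJSONIO", "itertools", "digest", "Rcpp")

# Install only the packages from `pkgs` that are not installed yet.
# `installer` defaults to utils::install.packages; it is a parameter so the
# function can be exercised with a stand-in that does not touch the network.
install_missing <- function(pkgs, installer = utils::install.packages) {
  missing <- setdiff(pkgs, rownames(utils::installed.packages()))
  if (length(missing) > 0) {
    installer(missing)
  }
  invisible(missing)
}
```

Calling install_missing(rhadoop_deps) in a session then does the same job as the two install.packages() calls, while skipping anything already installed.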
Setting environment variables
We need to set the following environment variables from the R console.
## Setting HADOOP_CMD
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
## Setting HADOOP_STREAMING (the jar version must match the installed Hadoop release)
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar")
Alternatively, the same variables can be set on the command line before starting R:
export HADOOP_CMD="/usr/local/hadoop/bin/hadoop"
export HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar"
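The streaming jar's file name embeds the Hadoop version (these slides install 2.7.2), so a hard-coded path breaks as soon as the version differs. The following is a minimal sketch, assuming the standard Hadoop 2.x layout under /usr/local/hadoop, that looks the jar up instead. find_streaming_jar and set_rhadoop_env are hypothetical helpers, not part of rhdfs or rmr2.

```r
# Locate the streaming jar under <hadoop_home>/share/hadoop/tools/lib.
# The pattern accepts only release jars such as hadoop-streaming-2.7.2.jar
# (digits and dots), so test or sources jars are not picked up by accident.
find_streaming_jar <- function(hadoop_home = "/usr/local/hadoop") {
  lib_dir <- file.path(hadoop_home, "share", "hadoop", "tools", "lib")
  jars <- list.files(lib_dir, pattern = "^hadoop-streaming-[0-9.]+\\.jar$",
                     full.names = TRUE)
  if (length(jars) == 0) {
    stop("no hadoop-streaming jar found in ", lib_dir)
  }
  jars[1]
}

# Set HADOOP_CMD and HADOOP_STREAMING for the current R session,
# failing early if the hadoop binary is not where we expect it.
set_rhadoop_env <- function(hadoop_home = "/usr/local/hadoop") {
  hadoop_cmd <- file.path(hadoop_home, "bin", "hadoop")
  if (!file.exists(hadoop_cmd)) {
    stop("hadoop binary not found: ", hadoop_cmd)
  }
  Sys.setenv(HADOOP_CMD = hadoop_cmd,
             HADOOP_STREAMING = find_streaming_jar(hadoop_home))
  invisible(Sys.getenv(c("HADOOP_CMD", "HADOOP_STREAMING")))
}
```

Calling set_rhadoop_env() before library(rhdfs) and hdfs.init() sets both variables in one step and reports a clear error if Hadoop is not installed where expected.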
Usage of Hadoop Streaming jar
Downloading RHadoop Packages
Download the RHadoop packages from the GitHub repository of
Revolution Analytics:
https://github.com/RevolutionAnalytics/RHadoop
rmr: [rmr2_3.3.1.tar.gz]
rhdfs: [rhdfs-1.0.8.tar.gz]
rhbase: [rhbase-1.2.1.tar.gz]
These packages can be installed from the command line or from RStudio.
Installing rmr package
Install from a terminal (shell) using the following command:
R CMD INSTALL rmr2_3.3.1.tar.gz
To install using RStudio, follow these steps:
Click on Tools → Install Packages
Change the Install from option from Repository (CRAN) to
Package Archive File (.tar.gz)
Choose the rmr2_3.3.1.tar.gz file from your local system
Click on the Install button (the supporting packages of rmr2 must
already be installed, because installing from an archive file does
not fetch dependencies)
Installing rhdfs package
Install from a terminal (shell) using the following command:
R CMD INSTALL rhdfs-1.0.8.tar.gz
To install using RStudio, follow these steps:
Click on Tools → Install Packages
Change the Install from option from Repository (CRAN) to
Package Archive File (.tar.gz)
Choose the rhdfs-1.0.8.tar.gz file from your local system
Click on the Install button (the supporting packages of rhdfs must
already be installed, because installing from an archive file does
not fetch dependencies)
Installing rhbase package
Install from a terminal (shell) using the following command:
R CMD INSTALL rhbase-1.2.1.tar.gz
To install using RStudio, follow these steps:
Click on Tools → Install Packages
Change the Install from option from Repository (CRAN) to
Package Archive File (.tar.gz)
Choose the rhbase-1.2.1.tar.gz file from your local system
Click on the Install button (the supporting packages of rhbase must
already be installed, because installing from an archive file does
not fetch dependencies)
Loading the RHadoop libraries
RHadoop libraries are loaded like any other R library, using the
require() or library() functions.
library('rhdfs') # Loading HDFS
library('rmr2') # Loading MapReduce
library('rhbase') # Loading HBase
Initializing the RHadoop
Initialize the rhdfs package with a parameter specifying the location
of the Hadoop configuration files.
Syntax:
hdfs.init(hadoop=PATH)
Here PATH specifies the location of the Hadoop configuration files.
If no parameter is passed, the configuration files are located
through the HADOOP_CMD environment variable.
hdfs.ls
This method lists the files and directories in HDFS. It returns a
data frame whose columns correspond to the permissions, owner,
group, size (in bytes), modification time and file or directory
name.
Syntax: hdfs.ls(path, recurse=FALSE)
If recurse is TRUE, it recursively lists the subdirectories.
hdfs.defaults
This method is used to set and get the default configuration of
HDFS.
Syntax:
hdfs.defaults(arg)
arg is the name of a parameter, or NULL.
The function lists the following values:
local: an rJava object corresponding to the local file system
blocksize: default block size of the files stored in HDFS
fs: an rJava object corresponding to HDFS
fu: a helper object for rhdfs
classpath: the Java classpath
replication: default replication factor in HDFS
conf: name-value mappings for Hadoop configuration parameters
hdfs.defaults : Examples
hdfs.cat
This method reads lines from a file on HDFS.
Syntax:
hdfs.cat(path,n,buffersize)
path : Location of the source file
n : Number of lines to read from the file
buffersize : Size of the buffer (optional)
Example:
hdfs.cat('/RHadoop/1/example.txt')
hdfs.put
This method transfers data from the local system to HDFS.
Syntax:
hdfs.put(src,dest,dstFS=hdfs.defaults("fs"))
src : Location of the source directory or file
dest : Location of the destination directory or file
dstFS : The destination file system (optional)
Example:
hdfs.put('/home/dp/Desktop/example.txt','/RHadoop/1/')
hdfs.get
This method transfers data from HDFS to the local system.
Syntax:
hdfs.get(src,dest,srcFS=hdfs.defaults("fs"))
src : Location of the source directory or file
dest : Location of the destination directory or file
srcFS : The source file system (optional)
Example:
hdfs.get('/RHadoop/1/','/home/dp/Desktop/1/')
hdfs.copy | hdfs.cp
This method copies data from one location in HDFS to another
location in HDFS.
Syntax:
hdfs.copy(src,dest,overwrite=FALSE)
src : Location of the source directory or file
dest : Location of the destination directory or file
overwrite : Whether an existing destination file should be overwritten
Example:
hdfs.copy('/RHadoop/1/','/RHadoop/2/')
hdfs.move
This method moves data from one location in HDFS to another
location in HDFS and removes the source directory or file.
Syntax:
hdfs.move(src,dest)
src : Location of the source directory or file
dest : Location of the destination directory or file
Example:
hdfs.move('/RHadoop/1/','/RHadoop/2/')
hdfs.rename
This method renames a file or directory in HDFS from R.
Syntax:
hdfs.rename(src,dest)
src : Location of the source directory or file
dest : Location of the destination directory or file
Example:
hdfs.rename('/RHadoop/1/example.txt','/RHadoop/1/sample.txt')
hdfs.rm | hdfs.rmr | hdfs.delete
These functions delete files or directories in HDFS from R.
Syntax:
hdfs.delete(path)
hdfs.rm(path)
hdfs.rmr(path)
Example (any one of the following):
hdfs.delete("/RHadoop/1/")
hdfs.rm("/RHadoop/1/")
hdfs.rmr("/RHadoop/1/")
hdfs.chmod
This method changes the permissions of HDFS files or
directories.
Syntax:
hdfs.chmod(path, permissions='777')
permissions is a character string giving the permission mode in
octal notation, as with the Unix chmod command.
Example:
hdfs.chmod("/RHadoop", permissions='777')
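Because the permissions string is an octal mode, it helps to spell out what a value grants before applying it; for example, '777' gives every user write access. The following is a minimal base-R sketch; octal_to_symbolic is a hypothetical helper, not part of rhdfs, that expands a three-digit mode into the familiar owner/group/others symbolic form used by ls -l.

```r
# Expand a three-digit octal mode such as "755" into "rwxr-xr-x".
# Each digit is a sum of read (4), write (2) and execute (1) bits.
octal_to_symbolic <- function(mode) {
  digits <- suppressWarnings(as.integer(strsplit(mode, "")[[1]]))
  if (length(digits) != 3 || anyNA(digits) || any(digits > 7)) {
    stop("expected a three-digit octal mode such as \"755\"")
  }
  triplet <- function(d) {
    paste0(if (bitwAnd(d, 4L) > 0) "r" else "-",
           if (bitwAnd(d, 2L) > 0) "w" else "-",
           if (bitwAnd(d, 1L) > 0) "x" else "-")
  }
  paste(vapply(digits, triplet, character(1)), collapse = "")
}

octal_to_symbolic("777")  # owner, group and others may all read, write, execute
octal_to_symbolic("755")  # only the owner may write
```

For a shared HDFS directory, a mode such as '755' is usually safer than '777', since it lets other users read the data without being able to modify or delete it.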
hdfs.dircreate | hdfs.mkdir
Both of these functions create a directory in the HDFS file system.
Syntax:
hdfs.mkdir(Dirname)
Example:
hdfs.mkdir("/RHadoop/3/")
hdfs.file
This function opens a file on the local system or HDFS for read or
write operations.
Syntax:
hdfs.file(path, mode, buffersize, ...)
mode is 'r' for read mode or 'w' for write mode; append mode is not
supported.
Example:
f = hdfs.file("/RHadoop/2/README.txt", "r", buffersize=104857600)
hdfs.write
This function writes to a file stored in HDFS via streaming.
Syntax:
hdfs.write(object,con,hsync=FALSE)
object is any R object, and con is the HDFS connection.
Example:
obj = c(1, 2, 3, 4, 5, 6, 7)
f = hdfs.file("/RHadoop/2/numbers.bin", "w")  # hypothetical file name
hdfs.write(obj, f, hsync=FALSE)
hdfs.close(f)
hdfs.read
This function reads from binary files stored in HDFS, using the
stream to deserialize the data.
Syntax:
hdfs.read(con,n,start)
n is the number of bytes to read, and start indicates the starting block.
Example:
f = hdfs.file("/RHadoop/2/README.txt", "r", buffersize=104857600)
m = hdfs.read(f)
txt = rawToChar(m)
print(txt)
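hdfs.read returns a raw vector, so rawToChar() turns a whole text file into one long string. The following is a minimal base-R sketch of a hypothetical helper, raw_to_lines (not part of rhdfs), that splits such bytes into one element per line. It assumes the bytes hold complete text with no embedded NUL bytes; a read limited to n bytes can stop in the middle of a line or of a multi-byte character.

```r
# Convert raw bytes (such as `m` returned by hdfs.read) into a character
# vector with one element per line.
raw_to_lines <- function(bytes) {
  text <- rawToChar(bytes)
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  sub("\r$", "", lines)  # strip a trailing carriage return (Windows line endings)
}
```

With the example above, raw_to_lines(m) gives the lines of README.txt, ready for functions such as grep() or length().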
hdfs.close
This function closes the stream when a file operation is complete;
no further file operations are allowed on the connection afterwards.
Syntax:
hdfs.close(con)
con is the HDFS connection.
Example:
hdfs.close(f)
hdfs.file.info
This function returns metadata about a file stored in HDFS.
Syntax:
hdfs.file.info(PATH)
Example:
to.dfs
This function writes R objects to the file system.
Syntax:
to.dfs(kv, output, format="native")
kv is any valid key-value pair, or a vector, matrix, etc.; output is
any valid path; and format is a string naming the format.
Example: small.ints <- to.dfs(1:10)
from.dfs
This function reads R objects from the HDFS file system that are
stored in a binary-encoded (serialized) format.
Syntax:
from.dfs(input, format)
input is any valid path, and format is a string naming the format.
Example:
from.dfs('/tmp/RtmpRMIXzb/file2bda3fa07850')
mapreduce
This function defines and executes a MapReduce job.
Syntax:
mapreduce(input, output, map, reduce, input.format, output.format)
input: Path to the input folder on HDFS
output: Path to the output folder on HDFS
map: An optional R function returning NULL or a value of keyval()
reduce: An optional R function of two arguments, a key and a
data structure representing all the values associated with that key
input.format: Format of the input data
output.format: Format of the output data
keyval
The keyval function creates the return values of the map and
reduce functions, which are themselves parameters to mapreduce.
Syntax:
keyval(key,val)
where key is the desired key or keys, and val is the desired
value or values.
WordCount Mapreduce source code
# Set environment variables
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
# The streaming jar version must match the installed Hadoop release
Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar")
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop/")
# Load libraries
library(rmr2)
library(rhdfs)
# Initialize the rhdfs package
hdfs.init()
Cont..
WordCount Mapreduce source code - cont..
# Map: split each line into words and emit (word, 1) pairs
map <- function(k, lines) {
  words.list <- strsplit(lines, ' ')
  words <- unlist(words.list)
  return(keyval(words, 1))
}
# Reduce: sum the counts collected for each word
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
wordcount <- function(input, output) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}
Cont..
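Splitting on a single space, as the map function does, turns a run of two or more spaces into an empty-string "word" that then gets counted. A quick local check in base R (the sample lines are made up for illustration) shows the effect and a more robust regular expression; it also gives reference counts to compare with the job output on small inputs.

```r
# Sample input lines (made up); note the double space in the last one
lines <- c("big data with r", "r and hadoop", "big  data")

# Same tokenization as the map function: split on a single space
naive <- unlist(strsplit(lines, " "))

# More robust variant: split on runs of whitespace, then drop empty tokens
robust <- unlist(strsplit(lines, "[[:space:]]+"))
robust <- robust[nzchar(robust)]

# Reference counts computed locally with table()
print(table(naive))   # includes an empty-string "word"
print(table(robust))
```

If the job's output contains an empty word, the "[[:space:]]+" pattern can be substituted into the strsplit() call inside map.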
WordCount MapReduce source code - cont..
## Read text files from the HDFS folder /in1/wc
hdfs.root <- '/in1'
hdfs.data <- file.path(hdfs.root, 'wc')
## Save the result in the HDFS folder /in1/out
hdfs.out <- file.path(hdfs.root, 'out')
## Submit the job, then read the result back from HDFS into R
out <- wordcount(hdfs.data, hdfs.out)
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors = FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
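head(results.df) shows the rows in the order they were read back from HDFS. To list the most frequent words first, the data frame can be sorted with base R. The sketch below builds a small stand-in for from.dfs(out) with made-up counts, assuming (as the as.data.frame() call above implies) that the result is a list holding the keys and the values; with a real job, drop the stand-in line and use the results object from the slide.

```r
# Stand-in for from.dfs(out): a list of keys and values (made-up counts)
results <- list(key = c("data", "big", "r", "hadoop"), val = c(2, 3, 2, 1))

results.df <- as.data.frame(results, stringsAsFactors = FALSE)
colnames(results.df) <- c("word", "count")

# Most frequent words first; ties broken alphabetically
top <- results.df[order(-results.df$count, results.df$word), ]
rownames(top) <- NULL
head(top, 3)
```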
WordCount Output
Thank You
RHadoop

  • 1. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Big Data Analytics with R and Hadoop D. Praveen Kumar Research Scholar (Full-Time) Department of Computer Science & Engineering YSREC of Yogi Vemana University, Proddatur Kadapa Dt., A. P, India November 30, 2016 YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 1 / 70
  • 2. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples 1 Introduction 2 RHadoop 3 RHadoop Installation 4 rhdfs Methods 5 rmr2 6 Examples YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 2 / 70
  • 3. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Big Data - Introduction Big Data has to deal with large and complex data sets that can be structured, semi-structured, or unstructured and will typically not fit into memory to be processed. They have to be processed in place, which means that computation has to be done where the data resides for processing. YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 3 / 70
  • 4. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Big Data - 3V’s Velocity refers to the low latency, real-time speed at which the analytics need to be applied. (Example: to perform analytics on a continuous stream of data originating from a social networking site) Volume refers to the size of the data set. It may be in KB, MB, GB, TB, or PB based on the type of the application that generates or receives the data. Variety refers to the various types of the data that can exist, for example, text, audio, video, and photos. YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 4 / 70
  • 5. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Big Data - 3V’s (Cont..) YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 5 / 70
  • 6. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Popular Organizations that hold Big Data Some of the popular organizations that hold Big Data are as follows: (upto 2014) Facebook: It has 40 PB of data and captures 100 TB/day Yahoo!: It has 60 PB of data Twitter: It captures 8 TB/day EBay: It has 40 PB of data and captures 50 TB/day YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 6 / 70
  • 7. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Hadoop - Introduction Apache Hadoop is an open source Java framework for processing and querying vast amounts of data on large clusters of commodity hardware. Hadoop is a top level Apache project, initiated and led by Yahoo! and Doug Cutting. Its impact can be boiled down to four salient characteristics: scalable, cost-effective, flexible, fault-tolerant solutions. Apache Hadoop has two main features: HDFS (Hadoop Distributed File System) - Storing Map Reduce - Processing YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 7 / 70
  • 8. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Requirements Necessary Java >= 7 ssh Linux OS (Ubuntu >= 14.04) Hadoop framework Optional Eclipse Internet connection YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 8 / 70
  • 9. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Java 7 & Installation Hadoop requires a working Java installation. However, using java 1.7 or more is recommended. Following command is used to install java in linux platform sudo apt-get install openjdk-7-jdk (or) sudo apt-get install default-jdk YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 9 / 70
  • 10. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Java PATH Setup We need to set JAVA path Open the .bashrc file located in home directory gedit ~/.bashrc Add below line at the end: export JAVA HOME=/usr/lib/jvm/java−7−openjdk−amd64 YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 10 / 70
  • 11. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Installation & Configuration of SSH Hadoop requires SSH(Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it. Install SSH using following command sudo apt-get install ssh First, we have to generate DSA an SSH key for user. ssh-keygen -t dsa -P ’’ -f ~ /.ssh/id dsa cat ~ /.ssh/id dsa.pub >> ~ /.ssh/authorized keys YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 11 / 70
  • 12. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Download & Extract Hadoop Download Hadoop from the Apache Download Mirrors http://mirror.fibergrid.in/apache/hadoop/common/ Extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop. $ sudo chmod 777 /usr/local $ cd /usr/local $ tar xzf hadoop-2.7.2.tar.gz $ sudo mv hadoop-2.7.2 hadoop YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 12 / 70
  • 13. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Add Hadoop configuration in .bashrc Add Hadoop configuration in .bashrc in home directory. export HADOOP INSTALL=/usr/local/hadoop export PATH=$PATH:$HADOOP INSTALL/bin export PATH=$PATH:$HADOOP INSTALL/sbin export HADOOP MAPRED HOME=$HADOOP INSTALL export HADOOP HDFS HOME=$HADOOP INSTALL export HADOOP COMMON HOME=$HADOOP INSTALL export YARN HOME=$HADOOP INSTALL export HADOOP COMMON LIB NATIVE DIR=$HADOOP INSTALL/lib/native export HADOOP OPTS="-Djava.library.path=$HADOOP INSTALL/lib" YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 13 / 70
  • 14. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Create temp file, DataNode & NameNode Execute below commands to create NameNode mkdir -p /usr/local/hadoopdata/hdfs/namenode Execute below commands to create DataNode mkdir -p /usr/local/hadoopdata/hdfs/datanode Execute below code to create the tmp directory in hadoop sudo mkdir -p /app/hadoop/tmp sudo chown hadoop1:hadoop1 /app/hadoop/tmp sudo chmod 750 /app/hadoop/tmp YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 14 / 70
  • 15. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Files to Configure The following are the files we need to configure core-site.xml hadoop-env.sh mapred-site.xml hdfs-site.xml YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 15 / 70
  • 16. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Add properties in /usr/local/hadoop/etc/core-site.xml Add the following snippets between the < configuration > ... < /configuration > tags in the core-site.xml file. Add below property to specify the location of tmp < property > < name > hadoop.tmp.dir < /name > < value > /app/hadoop/tmp < /value > < /property > Add below property to specify the location of default file system and its port number. < property > < name > fs.default.name < /name > < value > hdfs : //localhost : 54310 < /value > < /property > YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 16 / 70
  • 17. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Add properties in /usr/local/hadoop/etc/hadoop-env.sh Un-Comment the JAVA HOME and Give Correct Path For Java. export JAVA HOME=/usr/lib/jvm/java-7-openjdk-amd64 YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 17 / 70
  • 18. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml In file we add The host name and port that the MapReduce job tracker runs at. Add following in mapred-site.xml : < property > < name > mapred.job.tracker < /name > < value > localhost : 54311 < /value > < /property > YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 18 / 70
  • 19. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Add properties in ... etc/hadoop/hdfs-site.xml In file hdfs-site.xml add following: Add replication factor < property > < name > dfs.replication < /name > < value > 1 < /value > < /property > Specify the NameNode < property > < name > dfs.namenode.name.dir < /name > < value > file : /usr/local/hadoopdata/hdfs/namenode < /value > < /property > Specify the DataNode < property > < name > dfs.datanode.name.dir < /name > < value > file : /usr/local/hadoopdata/hdfs/datanode < /value > < /property > YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 19 / 70
  • 20. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Formatting the HDFS file system via the NameNode The first step to starting up your Hadoop installation is Formatting the Hadoop file system We need to do this the first time you set up a Hadoop. Do not format a running Hadoop file system as you will lose all the data currently in HDFS To format the file system, run the command hadoop namenode -format YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 20 / 70
  • 21. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Starting single-node cluster Run the command: start-all.sh This will startup a NameNode,SecondaryNameNode, DataNode, ResourceManager and a NodeManager on your machine. A nifty tool for checking whether the expected Hadoop processes are running is jps hadoop1@hadoop1:/usr/local/hadoop$ jps 2598 NameNode 3112 ResourceManager 3523 Jps 2917 SecondaryNameNode 2727 DataNode 3242 NodeManager YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 21 / 70
  • 22. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples Stopping your single-node cluster Run the command stop-all.sh To stop all the daemons running on your machine output will be like this. stopping NodeManager localhost: stopping ResourceManager stopping NameNode localhost: stopping DataNode localhost: stopping SecondaryNameNode YSR Engineering College of YVU, Proddatur, Kadapa Big Data Analytics with R and Hadoop November 30, 2016 Slide: 22 / 70
  • 23. R - Introduction
    R is an open-source software package for performing statistical analysis on data. R is a programming language derived from the S statistical language. R provides a wide variety of statistical, machine learning and graphical techniques, and is highly extensible. R can also connect to other data stores, such as MySQL, SQLite, MongoDB and Hadoop.
  • 24. R - Features
    Some of R's features are: an effective statistical programming language; relational database support; data analytics; data visualization; and extension through the vast library of R packages.
  • 25. R - Operations
    R supports data analytics through operations such as regression, classification, clustering, recommendation and text mining.
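As a small illustration of two of these operations, the sketch below uses only base R and its built-in datasets (mtcars and iris, chosen here purely for illustration) to fit a linear regression and run k-means clustering. Everything runs in memory on a single machine.

```r
# Regression: model fuel efficiency (mpg) as a linear function of weight (wt)
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))            # intercept and slope; heavier cars have lower mpg

# Clustering: group the four numeric iris measurements into 3 clusters
set.seed(42)                # k-means uses random starts, so fix the seed
cl <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
print(table(cl$cluster, iris$Species))   # compare clusters with true species
```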
  • 26. R - Installation (Windows)
    For Windows, follow these steps: 1. Navigate to www.r-project.org. 2. Click on the CRAN section, select a CRAN mirror, and select Windows as your OS. 3. Download the latest R version from the mirror. 4. Run the downloaded .exe to install R. (For this tutorial, prefer Linux: Hadoop is almost always used in a Linux environment.)
  • 27. R - Installation (Ubuntu)
    For Ubuntu Linux, follow these steps: 1. Navigate to www.r-project.org. 2. Click on the CRAN section, select a CRAN mirror, and select your OS. 3. Add the CRAN <mirror> entry to the /etc/apt/sources.list file. 4. Download and update the package lists from the repositories using the sudo apt-get update command. 5. Install R using the sudo apt-get install r-base command.
  • 28. RHEL/CentOS
    For RHEL/CentOS Linux, follow these steps: 1. Navigate to www.r-project.org. 2. Click on CRAN, select a CRAN mirror, and select Red Hat as your OS. 3. Download the R-*core-*.rpm file. 4. Install the .rpm package using the rpm -ivh R-*core-*.rpm command. 5. Install R using sudo yum install R.
  • 29. Hadoop MapReduce in R
    There are three ways to run Hadoop MapReduce from R: 1. R and Hadoop Integrated Programming Environment (RHIPE) 2. Hadoop Streaming 3. RHadoop. Among these three, RHadoop is the easiest and most efficient.
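To make option 2 concrete: Hadoop Streaming runs any executable as the mapper or reducer, feeding input records as lines on standard input and reading tab-separated key/value lines from standard output; the reducer receives its input sorted by key. The minimal base-R sketch below shows word-count mapper and reducer logic in that shape. The function names are hypothetical, and in a real job each function would live in its own Rscript file that reads `file("stdin")`, e.g. `writeLines(stream_map(readLines(file("stdin"))))`.

```r
# Mapper: turn input lines into "word<TAB>1" records
stream_map <- function(lines) {
  words <- unlist(strsplit(lines, "[[:space:]]+"))
  words <- words[nzchar(words)]               # drop empty tokens
  if (length(words) == 0) return(character(0))
  paste(words, 1L, sep = "\t")
}

# Reducer: its input is sorted by key, so identical keys arrive in runs;
# sum the counts of each run and emit "word<TAB>total"
stream_reduce <- function(lines) {
  parts  <- strsplit(lines, "\t", fixed = TRUE)
  keys   <- vapply(parts, `[`, character(1), 1)
  counts <- as.integer(vapply(parts, `[`, character(1), 2))
  runs   <- rle(keys)                         # consecutive identical keys
  ends   <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  totals <- mapply(function(s, e) sum(counts[s:e]), starts, ends)
  paste(runs$values, totals, sep = "\t")
}

# Local simulation of map -> sort -> reduce; the byte-order ("radix") sort
# stands in for the sort Hadoop performs between the two phases
mapped  <- stream_map(c("big data with R", "R and Hadoop"))
reduced <- stream_reduce(sort(mapped, method = "radix"))
print(reduced)
```

Note that the reducer must detect key boundaries itself; Streaming sorts the records but does not group them into one call per key, which is one of the chores RHadoop's rmr takes over.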
  • 30. RHadoop - Introduction
    RHadoop was developed by Revolution Analytics. RHadoop consists of three main R packages: 1. rhdfs - provides HDFS data operations 2. rmr - provides MapReduce execution operations 3. rhbase - provides access to HBase as an input data source. It is not necessary to install all three RHadoop packages to run Hadoop MapReduce operations with R and Hadoop.
  • 31. RHadoop - Architecture
  • 32. rhdfs
    rhdfs is an R interface that makes HDFS usable from the R console. The rhdfs package calls the HDFS API in the backend to operate on data sources stored in HDFS. With rhdfs methods, an R programmer can easily perform read and write operations on distributed data files.
  • 33. rmr
    rmr is an R interface that provides Hadoop MapReduce facilities inside the R environment. The R programmer just divides the application logic into map and reduce phases and submits it with the rmr methods. rmr then calls the Hadoop Streaming MapReduce API with job parameters such as the input directory, output directory, mapper and reducer to run the R MapReduce job on the Hadoop cluster.
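The division of labour rmr asks for can be illustrated without a cluster. The base-R sketch below is a hypothetical in-memory stand-in, not part of rmr: `kv` and `local_mapreduce` are illustrative names. It mirrors rmr's calling convention, where map(k, v) returns key/value pairs and reduce(k, vv) receives one key with all of its values, and it performs the grouping (shuffle) step that Hadoop carries out between the two phases. Pairs are plain lists here rather than rmr's keyval objects.

```r
# Hypothetical stand-in for keyval(): parallel keys and values, recycled
# to a common length
kv <- function(key, val) {
  n <- max(length(key), length(val))
  list(key = rep_len(key, n), val = rep_len(val, n))
}

# Run map over each input record, group the emitted values by key (the
# shuffle), then call reduce once per distinct key.
local_mapreduce <- function(keys, values, map, reduce) {
  mapped   <- Map(map, keys, values)                  # one kv() per record
  all_keys <- unlist(lapply(mapped, `[[`, "key"))
  all_vals <- unlist(lapply(mapped, `[[`, "val"))
  groups   <- split(all_vals, all_keys)               # values per key
  reduced  <- Map(reduce, names(groups), groups)
  kv(unlist(lapply(reduced, `[[`, "key")),
     unlist(lapply(reduced, `[[`, "val")))
}

# Example: word count, with map and reduce written in the rmr style
lines <- c("R and Hadoop", "big data with R and Hadoop")
out <- local_mapreduce(seq_along(lines), lines,
  map    = function(k, line) kv(strsplit(line, " ")[[1]], 1),
  reduce = function(word, counts) kv(word, sum(counts)))
print(setNames(out$val, out$key))
```

Only the two small functions passed as `map` and `reduce` are application logic; everything else in `local_mapreduce` is the plumbing that rmr and Hadoop provide at scale.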
  • 34. rhbase
    rhbase is an R interface for operating on the Hadoop HBase data source stored across the distributed network, via a Thrift server. The rhbase package provides methods for initialization, read/write and table manipulation operations.
  • 35. R and Hadoop installation
    We have already installed R and Hadoop.
  • 36. Installing the R packages
    To connect R and Hadoop, we need to install the following packages: httr, functional, devtools, plyr, reshape2, rJava, RJSONIO, itertools, digest and Rcpp.
      install.packages(c('httr', 'functional', 'devtools', 'plyr', 'reshape2'))
      install.packages(c('rJava', 'RJSONIO', 'itertools', 'digest', 'Rcpp'))
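When re-running the setup on a machine that already has some of these packages, it can help to install only what is missing. A minimal sketch; `missing_packages` is a hypothetical helper name, and the package list is the one from this slide.

```r
# Packages listed on this slide as prerequisites for RHadoop
rhadoop_deps <- c("httr", "functional", "devtools", "plyr", "reshape2",
                  "rJava", "RJSONIO", "itertools", "digest", "Rcpp")

# Return the subset of `pkgs` not installed in any library on .libPaths();
# system.file() returns "" for a package that is not installed
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))]
}

to_install <- missing_packages(rhadoop_deps)
print(to_install)
# Uncomment to install only what is missing (needs access to a CRAN mirror):
# if (length(to_install) > 0) install.packages(to_install)
```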
  • 37. Setting environment variables
    We need to set the following environment variables from the R console:
      ## Setting HADOOP_CMD
      Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
      ## Setting HADOOP_STREAMING
      Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar")
    Alternatively, we can set them in the shell before starting R:
      export HADOOP_CMD="/usr/local/hadoop/bin/hadoop"
      export HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar"
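A wrong path in either variable usually only surfaces later, when rhdfs or rmr2 tries to call Hadoop. The sketch below checks both variables up front. `check_hadoop_env` is a hypothetical helper, and it assumes both variables should point at existing files (the hadoop executable and the streaming jar), as in the settings on this slide.

```r
# Hypothetical helper: verify that each variable is set and points to an
# existing file before loading rhdfs/rmr2, which read these variables.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  paths <- Sys.getenv(vars, unset = "")        # named character vector
  ok <- nzchar(paths) & file.exists(paths)
  if (!all(ok)) {
    warning("Not set or not found: ", paste(vars[!ok], collapse = ", "))
  }
  ok                                            # named logical, one per variable
}

# Usage, after the Sys.setenv() calls:
# check_hadoop_env()
```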
  • 38. Usage of Hadoop Streaming jar
  • 39. Downloading RHadoop packages
    Download the RHadoop packages from the Revolution Analytics GitHub repository: https://github.com/RevolutionAnalytics/RHadoop
      rmr: rmr2_3.3.1.tar.gz
      rhdfs: rhdfs_1.0.8.tar.gz
      rhbase: rhbase_1.2.1.tar.gz
    These packages can be installed from the command line or from RStudio.
  • 40. Installing the rmr package
    To install from the shell (terminal), run:
      R CMD INSTALL rmr2_3.3.1.tar.gz
    To install using RStudio: click Tools → Install Packages; change the "Install from" option from Repository (CRAN) to Package Archive File (.tar.gz); choose the rmr2_3.3.1.tar.gz file from your local system; click the Install button. Installing from an archive file does not fetch dependencies automatically, so install the supporting packages from slide 36 first.
  • 41. Installing the rhdfs package
    To install from the shell (terminal), run:
      R CMD INSTALL rhdfs_1.0.8.tar.gz
    To install using RStudio: click Tools → Install Packages; change the "Install from" option from Repository (CRAN) to Package Archive File (.tar.gz); choose the rhdfs_1.0.8.tar.gz file from your local system; click the Install button. As with rmr, dependencies are not installed automatically from an archive file.
  • 42. Installing the rhbase package
    To install from the shell (terminal), run:
      R CMD INSTALL rhbase_1.2.1.tar.gz
    To install using RStudio: click Tools → Install Packages; change the "Install from" option from Repository (CRAN) to Package Archive File (.tar.gz); choose the rhbase_1.2.1.tar.gz file from your local system; click the Install button. As with rmr and rhdfs, dependencies are not installed automatically from an archive file.
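The three installations can also be done from the R console. RStudio's "Package Archive File" option essentially runs `install.packages(<file>, repos = NULL, type = "source")`, which the sketch below loops over. `install_rhadoop` and its `dry_run` flag are hypothetical, and the tarball names assume the versions listed on slide 39.

```r
# Tarball names assumed from the versions listed on slide 39
rhadoop_tarballs <- c("rmr2_3.3.1.tar.gz", "rhdfs_1.0.8.tar.gz",
                      "rhbase_1.2.1.tar.gz")

# Hypothetical helper: install whichever tarballs are present in `dir`.
# repos = NULL means "install from a local file", so dependencies are NOT
# resolved; install the prerequisite packages first.
install_rhadoop <- function(dir, files = rhadoop_tarballs, dry_run = FALSE) {
  paths <- file.path(dir, files)
  found <- setNames(file.exists(paths), files)
  if (!dry_run) {
    for (p in paths[found]) install.packages(p, repos = NULL, type = "source")
  }
  found   # which tarballs were found (and, unless dry_run, installed)
}

# Usage (adjust the folder to where the tarballs were downloaded):
# install_rhadoop("~/Downloads")
```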
  • 43. Loading the RHadoop libraries
    Just as we load any other R library, we can load the RHadoop libraries using the require() or library() functions:
      library('rhdfs')    # Loading HDFS
      library('rmr2')     # Loading MapReduce
      library('rhbase')   # Loading HBase
  • 44. Initializing RHadoop
    Initialize the rhdfs package with a parameter specifying the location of the Hadoop configuration files. Syntax: hdfs.init(hadoop = PATH), where PATH specifies the location of the Hadoop configuration files. If no parameter is passed, the configuration is located through the HADOOP_CMD environment variable by default.
  • 45. hdfs.ls
    Lists files and directories in HDFS. It returns a data frame whose columns correspond to permissions, owner, group, size (in bytes), modification time and file or directory name. Syntax: hdfs.ls(path, recurse = FALSE). If recurse is TRUE, subdirectories are listed recursively.
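Because hdfs.ls() returns an ordinary data frame, its result can be filtered with base R. The sketch below finds the largest files in a listing. It ASSUMES the columns are named permission, owner, group, size, modtime and file, with directory permissions starting with "d"; `largest_files` is a hypothetical helper, and the mock listing holds illustrative values only, so the code runs without a cluster.

```r
# Hypothetical helper. ASSUMPTION: hdfs.ls() columns are named
# permission, owner, group, size, modtime, file.
largest_files <- function(listing, n = 3) {
  files <- listing[!startsWith(listing$permission, "d"), ]  # drop directories
  files <- files[order(files$size, decreasing = TRUE), ]
  head(files[, c("file", "size")], n)
}

# With a cluster: largest_files(hdfs.ls("/RHadoop", recurse = TRUE))
# Offline mock listing (illustrative values only):
listing <- data.frame(
  permission = c("drwxr-xr-x", "-rw-r--r--", "-rw-r--r--"),
  owner = "hadoop1", group = "supergroup",
  size = c(0, 1366, 52),
  modtime = "2016-11-30 10:00",
  file = c("/RHadoop/1", "/RHadoop/1/example.txt", "/RHadoop/2/README.txt"),
  stringsAsFactors = FALSE)
print(largest_files(listing))
```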
  • 46. hdfs.defaults
    This method is used to set and get the default HDFS configuration. Syntax: hdfs.defaults(arg), where arg is the name of a parameter or NULL. It exposes the following values: local: rJava object corresponding to the local file system; blocksize: default block size of files stored in HDFS; fs: rJava object corresponding to HDFS; fu: helper object for rhdfs; classpath: the Java classpath; replication: default replication factor in HDFS; conf: name-value mappings for Hadoop configuration parameters.
  • 47. hdfs.defaults: Examples
  • 48. hdfs.cat
    Reads lines from a file in HDFS. Syntax: hdfs.cat(path, n, buffersize). path: location of the source file; n: number of lines to read from the file; buffersize: size of the buffer (optional). Example: hdfs.cat('/RHadoop/1/example.txt')
  • 49. hdfs.put
    Copies data from the local file system to HDFS. Syntax: hdfs.put(src, dest, dstFS = hdfs.defaults("fs")). src: location of the source directory or file; dest: location of the destination directory or file; dstFS: the destination file system (optional). Example: hdfs.put('/home/dp/Desktop/example.txt', '/RHadoop/1/')
  • 50. hdfs.get
    Copies data from HDFS to the local file system. Syntax: hdfs.get(src, dest, srcFS = hdfs.defaults("fs")). src: location of the source directory or file; dest: location of the destination directory or file; srcFS: the source file system (optional). Example: hdfs.get('/RHadoop/1/', '/home/dp/Desktop/1/')
  • 51. hdfs.copy | hdfs.cp
    Copies data from one location in HDFS to another location in HDFS. Syntax: hdfs.copy(src, dest, overwrite = FALSE). src: location of the source directory or file; dest: location of the destination directory or file; overwrite: whether an existing destination file should be overwritten. Example: hdfs.copy('/RHadoop/1/', '/RHadoop/2/')
  • 52. hdfs.move
    Moves data from one location in HDFS to another location in HDFS, removing the source directory or file. Syntax: hdfs.move(src, dest). src: location of the source directory or file; dest: location of the destination directory or file. Example: hdfs.move('/RHadoop/1/', '/RHadoop/2/')
  • 53. hdfs.rename
    Renames a file or directory in HDFS from R. Syntax: hdfs.rename(src, dest). src: location of the source directory or file; dest: location of the destination directory or file. Example: hdfs.rename('/RHadoop/1/example.txt', '/RHadoop/1/sample.txt')
  • 54. hdfs.rm | hdfs.rmr | hdfs.delete
    These functions delete files or directories in HDFS from R. Syntax: hdfs.delete(path), hdfs.rm(path), hdfs.rmr(path). Example: hdfs.delete("/RHadoop/1/"), hdfs.rm("/RHadoop/1/"), hdfs.rmr("/RHadoop/1/")
  • 55. hdfs.chmod
    Changes the permissions of HDFS files or directories. Syntax: hdfs.chmod(path, permissions = '777'), where permissions is a character string representing the permissions of the file or directory. Example: hdfs.chmod("/RHadoop", permissions = '777')
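The permissions argument uses the octal notation ('777'), while listings show the symbolic form (such as rwxr-xr-x). The small base-R sketch below converts one into the other. `symbolic_to_octal` is a hypothetical helper, not part of rhdfs: each of the three rwx triplets becomes one digit, with r = 4, w = 2 and x = 1.

```r
# Hypothetical helper: convert a 9-character symbolic permission string
# such as "rwxr-xr-x" into the octal string ("755") that hdfs.chmod() takes.
symbolic_to_octal <- function(sym) {
  stopifnot(nchar(sym) == 9)
  bits <- strsplit(sym, "")[[1]] != "-"      # TRUE where a permission is set
  groups <- matrix(bits, nrow = 3)           # columns: user, group, other
  digits <- colSums(groups * c(4, 2, 1))     # r = 4, w = 2, x = 1 per column
  paste(digits, collapse = "")
}

print(symbolic_to_octal("rwxr-xr-x"))   # "755"
# e.g. hdfs.chmod("/RHadoop", permissions = symbolic_to_octal("rwxr-x---"))
```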
  • 56. hdfs.dircreate | hdfs.mkdir
    Both functions create a directory in the HDFS file system. Syntax: hdfs.mkdir(dirname). Example: hdfs.mkdir("/RHadoop/3/")
  • 57. hdfs.file
    Initializes a file to be used for read/write operations on the local system or HDFS. Syntax: hdfs.file(path, mode, buffersize, ...), where mode is 'r' for read or 'w' for write; append mode is not allowed. Example: f = hdfs.file("/RHadoop/2/README.txt", "r", buffersize = 104857600)
  • 58. hdfs.write
    Writes to a file stored in HDFS via streaming. Syntax: hdfs.write(object, con, hsync = FALSE), where object is any R object and con is an HDFS connection opened with hdfs.file() in 'w' mode. Example:
      obj = c(1, 2, 3, 4, 5, 6, 7)
      hdfs.write(obj, con, hsync = FALSE)
  • 59. hdfs.read
    Reads from binary files in HDFS, using the stream to deserialize the data. Syntax: hdfs.read(con, n, start), where n is the number of bytes to read and start indicates the starting block. Example:
      f = hdfs.file("/RHadoop/2/README.txt", "r", buffersize = 104857600)
      m = hdfs.read(f)
      txt = rawToChar(m)
      print(txt)
  • 60. hdfs.close
    Closes the stream when a file operation is complete; no further file operations are allowed on it afterwards. Syntax: hdfs.close(con), where con is an HDFS connection. Example: hdfs.close(f)
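The hdfs.file(), hdfs.read() and hdfs.close() calls above are typically used together. The sketch below wraps them in one function and uses on.exit() so the stream is closed even if reading fails. `read_hdfs_text` is a hypothetical helper. Calling it requires rhdfs to be installed and hdfs.init() to have been run, and, like the slide 59 example, it reads a single buffer's worth of data.

```r
# Hypothetical helper combining hdfs.file(), hdfs.read() and hdfs.close().
# on.exit() guarantees the stream is closed even if reading fails.
# Requires the rhdfs package and a prior call to hdfs.init().
read_hdfs_text <- function(path, buffersize = 104857600) {
  f <- rhdfs::hdfs.file(path, "r", buffersize = buffersize)
  on.exit(rhdfs::hdfs.close(f), add = TRUE)
  rawToChar(rhdfs::hdfs.read(f))   # one buffer, as in the slide 59 example
}

# Usage, with the file from the earlier examples:
# cat(read_hdfs_text("/RHadoop/2/README.txt"))
```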
  • 61. hdfs.file.info
    This is used to get meta information about a file stored in HDFS. Syntax: hdfs.file.info(PATH) Example:
  • 62. to.dfs
    Writes R objects to the file system. Syntax: to.dfs(kv, output, format = "native"), where kv is any valid key-value pair, or a vector, matrix, etc.; output is any valid path; and format is a string naming the format. Example: small.ints <- to.dfs(1:10)
  • 63. from.dfs
    Reads R objects stored in serialized binary format from the HDFS file system. Syntax: from.dfs(input, format), where input is any valid path and format is a string naming the format. Example: from.dfs('/tmp/RtmpRMIXzb/file2bda3fa07850')
  • 64. mapreduce
    Defines and executes a MapReduce job. Syntax: mapreduce(input, output, map, reduce, input.format, output.format). input: path to the input folder on HDFS; output: path to the output folder on HDFS; map: an optional R function returning NULL or a value of keyval(); reduce: an optional R function of two arguments, a key and a data structure representing all the values associated with that key; input.format: format of the input data; output.format: format of the output data.
  • 65. keyval
    The keyval function creates the return values of map and reduce functions, which are themselves parameters to mapreduce. Syntax: keyval(key, val), where key is the desired key or keys and val is the desired value or values.
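to.dfs, mapreduce, keyval and from.dfs can be tried together without a cluster by switching rmr2 to its local backend, which runs map and reduce in the current R session. The sketch below is a map-only job that pairs each integer with its square. `squares_job` is a hypothetical wrapper name, and running it requires the rmr2 package.

```r
# Sketch: a map-only job on rmr2's local backend (no Hadoop needed).
# Requires the rmr2 package; the job is only defined here, not run.
squares_job <- function() {
  rmr2::rmr.options(backend = "local")       # run in this R session
  small.ints <- rmr2::to.dfs(1:10)           # write the input
  out <- rmr2::mapreduce(
    input = small.ints,
    map = function(k, v) rmr2::keyval(v, v^2))   # key = n, value = n^2
  rmr2::from.dfs(out)                        # keyval object with $key and $val
}

# Usage (with rmr2 installed):
# res <- squares_job(); head(cbind(res$key, res$val))
```

Switching back with `rmr2::rmr.options(backend = "hadoop")` submits the same job to the cluster unchanged, which makes the local backend convenient for debugging map and reduce functions.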
  • 66. WordCount MapReduce source code
      # Set environment variables (adjust the jar version to match your Hadoop installation)
      Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
      Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")
      Sys.setenv(HADOOP_HOME = "/usr/local/hadoop/")
      # Load libraries
      library(rmr2)
      library(rhdfs)
      # Initialize the rhdfs package
      hdfs.init()
    (continued)
  • 67. WordCount MapReduce source code (continued)
      # Map: split each line into words and emit a (word, 1) pair per word
      map <- function(k, lines) {
        words.list <- strsplit(lines, '[[:space:]]+')   # split on runs of whitespace
        words <- unlist(words.list)
        words <- words[nzchar(words)]                     # drop empty tokens
        return(keyval(words, rep(1, length(words))))
      }
      # Reduce: sum the counts collected for each word
      reduce <- function(word, counts) {
        keyval(word, sum(counts))
      }
      wordcount <- function(input, output) {
        mapreduce(input = input, output = output, input.format = "text",
                  map = map, reduce = reduce)
      }
    (continued)
  • 68. WordCount MapReduce source code (continued)
      ## Read text files from the folder /in1/wc/
      hdfs.root <- '/in1'
      hdfs.data <- file.path(hdfs.root, 'wc')
      ## Save the result in the folder /in1/out (it must not already exist)
      hdfs.out <- file.path(hdfs.root, 'out')
      ## Submit the job
      out <- wordcount(hdfs.data, hdfs.out)
      ## Fetch the results from HDFS into an R data frame
      results <- from.dfs(out)
      results.df <- as.data.frame(results, stringsAsFactors = FALSE)
      colnames(results.df) <- c('word', 'count')
      head(results.df)
  • 69. WordCount Output
  • 70. Thank you