RHadoop
1. Outline Introduction RHadoop RHadoop Installation rhdfs rmr2 Examples
Big Data Analytics with R and Hadoop
D. Praveen Kumar
Research Scholar (Full-Time)
Department of Computer Science & Engineering
YSREC of Yogi Vemana University, Proddatur
Kadapa Dt., A. P, India
November 30, 2016
YSR Engineering College of YVU, Proddatur, Kadapa
Big Data Analytics with R and Hadoop
November 30, 2016 Slide: 1 / 70
1 Introduction
2 RHadoop
3 RHadoop Installation
4 rhdfs Methods
5 rmr2
6 Examples
Big Data - Introduction
Big Data refers to large and complex data sets that can be
structured, semi-structured, or unstructured and that typically do
not fit into memory for processing. Such data has to be processed in
place, which means that the computation is done where the
data resides.
Big Data - 3V’s
Velocity refers to the low latency, real-time speed at which the
analytics need to be applied. (Example: to perform analytics
on a continuous stream of data originating from a social
networking site)
Volume refers to the size of the data set. It may be in KB,
MB, GB, TB, or PB based on the type of the application that
generates or receives the data.
Variety refers to the various types of the data that can exist,
for example, text, audio, video, and photos.
Big Data - 3V’s (Cont..)
Popular Organizations that hold Big Data
Some of the popular organizations that hold Big Data are as
follows (figures up to 2014):
Facebook: It has 40 PB of data and captures 100 TB/day
Yahoo!: It has 60 PB of data
Twitter: It captures 8 TB/day
eBay: It has 40 PB of data and captures 50 TB/day
Hadoop - Introduction
Apache Hadoop is an open source Java framework for
processing and querying vast amounts of data on large
clusters of commodity hardware.
Hadoop is a top level Apache project, initiated and led by
Yahoo! and Doug Cutting.
Its impact can be boiled down to four salient characteristics:
scalable, cost-effective, flexible, fault-tolerant solutions.
Apache Hadoop has two main components:
HDFS (Hadoop Distributed File System) - Storage
MapReduce - Processing
Requirements
Necessary
Java >= 7
ssh
Linux OS (Ubuntu >= 14.04)
Hadoop framework
Optional
Eclipse
Internet connection
Java 7 & Installation
Hadoop requires a working Java installation; Java 1.7 or later is
recommended.
Use either of the following commands to install Java on Ubuntu:
sudo apt-get install openjdk-7-jdk
sudo apt-get install default-jdk
Java PATH Setup
We need to set the Java path.
Open the .bashrc file located in the home directory:
gedit ~/.bashrc
Add the line below at the end:
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Installation & Configuration of SSH
Hadoop requires SSH (Secure Shell) access to manage its
nodes, i.e. remote machines plus your local machine if you
want to use Hadoop on it.
Install SSH using the following command:
sudo apt-get install ssh
First, generate a DSA SSH key pair for the user and authorize it for
login (newer OpenSSH releases disable DSA keys by default; use
-t rsa and id_rsa there):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
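Key-based login to localhost often fails at this step because sshd, with its default StrictModes setting, ignores keys whose directory or file is accessible to other users. The sketch below shows the usual permission fix. It works on a scratch directory: `ssh_dir` is a hypothetical stand-in for ~/.ssh, so nothing in the real home directory is touched.

```shell
# Scratch directory standing in for ~/.ssh (hypothetical; use "$HOME/.ssh" for real).
ssh_dir=$(mktemp -d)
touch "$ssh_dir/authorized_keys"

# Only the owner may enter the directory or read/write the key list.
chmod 700 "$ssh_dir"
chmod 600 "$ssh_dir/authorized_keys"

# The first ten characters of the ls output are the permission string.
dir_mode=$(ls -ld "$ssh_dir" | cut -c1-10)
file_mode=$(ls -l "$ssh_dir/authorized_keys" | cut -c1-10)
echo "$dir_mode $file_mode"
```

With the same modes applied to the real ~/.ssh, running ssh localhost should log in without asking for a password.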
Download & Extract Hadoop
Download Hadoop from the Apache Download Mirrors
http://mirror.fibergrid.in/apache/hadoop/common/
Extract the contents of the Hadoop package to a location of your
choice. I picked /usr/local/hadoop.
$ sudo chmod 777 /usr/local
$ cd /usr/local
$ tar xzf hadoop-2.7.2.tar.gz
$ sudo mv hadoop-2.7.2 hadoop
Add Hadoop configuration in .bashrc
Add the Hadoop configuration to .bashrc in the home directory.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
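Re-running a setup script tends to leave duplicate export lines in .bashrc. One possible way around that is a small helper that appends a line only when it is not already present. The helper name `append_once` and the scratch file `rc` are assumptions made for this sketch; the variable names are written with underscores (HADOOP_INSTALL), as the shell requires.

```shell
# Append a line to a file only if that exact line is not already present.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Scratch file standing in for ~/.bashrc (hypothetical; set rc="$HOME/.bashrc" for real use).
rc=$(mktemp)
append_once "$rc" 'export HADOOP_INSTALL=/usr/local/hadoop'
append_once "$rc" 'export PATH=$PATH:$HADOOP_INSTALL/bin'
append_once "$rc" 'export HADOOP_INSTALL=/usr/local/hadoop'   # repeated call adds nothing
cat "$rc"
```

The single quotes keep $PATH and $HADOOP_INSTALL unexpanded, so they are resolved when .bashrc is read rather than when the line is written.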
Create temp file, DataNode & NameNode
Run the command below to create the NameNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/namenode
Run the command below to create the DataNode directory:
mkdir -p /usr/local/hadoopdata/hdfs/datanode
Run the commands below to create the Hadoop tmp directory, owned by
the Hadoop user (hadoop1 here):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp
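The directory layout can be tried out without root access before creating it under /usr/local. The following is a minimal sketch; `DATA_ROOT` is a hypothetical variable that defaults to a scratch directory and is not something Hadoop itself reads.

```shell
# DATA_ROOT is a hypothetical scratch location for a dry run; the slides use
# /usr/local/hadoopdata on the real machine.
DATA_ROOT="${DATA_ROOT:-$(mktemp -d)}"

# One storage directory per HDFS role.
for role in namenode datanode; do
    mkdir -p "$DATA_ROOT/hdfs/$role"
done

# Confirm that both directories exist before pointing hdfs-site.xml at them.
for role in namenode datanode; do
    [ -d "$DATA_ROOT/hdfs/$role" ] && echo "ok: $DATA_ROOT/hdfs/$role"
done
```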
Files to Configure
The following are the files we need to configure
core-site.xml
hadoop-env.sh
mapred-site.xml
hdfs-site.xml
Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following snippets between the
<configuration> ... </configuration> tags in the core-site.xml
file.
Add the property below to specify the location of the tmp directory:
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
Add the property below to specify the default file system and its
port number:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
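For reference, the two properties above can be assembled into a complete file. This sketch writes the file into a scratch directory instead of the live Hadoop configuration; `CONF_DIR` is a hypothetical variable, and the property names and values are the ones given on the slide.

```shell
# CONF_DIR is a hypothetical scratch location; the real file lives in the
# Hadoop configuration directory (etc/hadoop under the install path).
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF

# Rough sanity check: every opened <property> should also be closed.
opened=$(grep -c '<property>' "$CONF_DIR/core-site.xml")
closed=$(grep -c '</property>' "$CONF_DIR/core-site.xml")
echo "properties opened=$opened closed=$closed"
```

Hadoop 2.x also accepts fs.defaultFS as the newer name for fs.default.name.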
Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Uncomment the JAVA_HOME line and give the correct path for
Java.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
Add property in
/usr/local/hadoop/etc/hadoop/mapred-site.xml
This file holds the host name and port at which the MapReduce job
tracker runs. Add the following to mapred-site.xml:
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
Add properties in ... etc/hadoop/hdfs-site.xml
Add the following to hdfs-site.xml.
Set the replication factor:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
Specify the NameNode directory:
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>
Specify the DataNode directory:
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
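Writing property blocks by hand makes it easy to mistype a tag, so one option is to generate them from name/value pairs. In the sketch below the helper `property` and the variable `CONF_DIR` are hypothetical. Note that it uses dfs.datanode.data.dir, which is the Hadoop 2.x property name for the DataNode storage directory.

```shell
# Emit one Hadoop <property> element from a name and a value.
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Scratch output location (hypothetical); copy the result into the Hadoop
# configuration directory once it looks right.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
{
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    property dfs.replication 1
    property dfs.namenode.name.dir file:/usr/local/hadoopdata/hdfs/namenode
    property dfs.datanode.data.dir file:/usr/local/hadoopdata/hdfs/datanode
    echo '</configuration>'
} > "$CONF_DIR/hdfs-site.xml"
cat "$CONF_DIR/hdfs-site.xml"
```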
Formatting the HDFS file system via the NameNode
The first step in starting up your Hadoop installation is
formatting the Hadoop file system.
This is needed only the first time you set up Hadoop.
Do not format a running Hadoop file system, as you will lose
all the data currently in HDFS.
To format the file system, run the command:
hadoop namenode -format
Starting single-node cluster
Run the command:
start-all.sh
This starts a NameNode, SecondaryNameNode,
DataNode, ResourceManager, and NodeManager on your
machine.
A handy tool for checking whether the expected Hadoop
processes are running is jps:
hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
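Checking the jps listing by eye is easy to get wrong, so the comparison against the five expected daemons can be scripted. This is a sketch only: the function name `missing_daemons` is an assumption, and the sample input is the listing above with the DataNode line removed to show what a failure looks like.

```shell
# Print the names of expected Hadoop daemons that are absent from jps-style
# output read on standard input.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode SecondaryNameNode DataNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the whole second field so that
        # NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$listing" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Sample listing with the DataNode line removed on purpose.
sample='2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
3242 NodeManager'

missing=$(printf '%s\n' "$sample" | missing_daemons)
echo "missing: ${missing:-none}"
```

On a running machine the same function can be fed directly from the tool, as in jps | missing_daemons.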
Stopping your single-node cluster
Run the command:
stop-all.sh
This stops all the daemons running on your machine. The output
looks like this:
stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode
R - Introduction
R is an open source software package for statistical
analysis of data.
R is a programming language derived from the S (Statistical) language.
R provides a wide variety of statistical, machine learning, and
graphical techniques, and is highly extensible.
R can connect to other data stores, such as MySQL,
SQLite, MongoDB, and Hadoop.
R - Features
The following are some of the features of R:
Effective statistical programming language
Relational database support
Data analytics
Data visualization
Extension through the vast library of R packages
R - Operations
R supports data analytics through operations such as:
Regression
Classification
Clustering
Recommendation
Text mining
R - Installation (Windows)
For Windows, follow the given steps:
1 Navigate to www.r-project.org.
2 Click on the CRAN section, select CRAN mirror, and select
your Windows OS (stick to Linux; Hadoop is almost always
used in a Linux environment).
3 Download the latest R version from the mirror.
4 Execute the downloaded .exe to install R.
R - Installation (Ubuntu)
For Linux-Ubuntu, follow the given steps:
1 Navigate to www.r-project.org.
2 Click on the CRAN section, select CRAN mirror, and select
your OS.
3 In the /etc/apt/sources.list file, add the CRAN
<mirror> entry.
4 Download and update the package lists from the repositories
using the sudo apt-get update command.
5 Install R system using the sudo apt-get install r-base
command.
RHEL/CentOS
For Linux-RHEL/CentOS, follow the given steps:
1 Navigate to www.r-project.org.
2 Click on CRAN, select CRAN mirror, and select Red Hat OS.
3 Download the R-*core-*.rpm file.
4 Install the .rpm package using the rpm -ivh R-*core-*.rpm
command.
5 Install R system using sudo yum install R.
Hadoop MapReduce in R
Hadoop MapReduce can be used from R in three ways:
1 R and Hadoop Integrated Programming Environment
(RHIPE)
2 HadoopStreaming
3 RHadoop
Of these three, RHadoop is the most efficient and the easiest to use.
RHadoop - Introduction
RHadoop was developed by Revolution Analytics.
RHadoop is available as three main R packages:
1 rhdfs - provides HDFS data operations
2 rmr - provides MapReduce execution operations
3 rhbase - provides access to HBase as an input data source
It is not necessary to install all three RHadoop
packages to run Hadoop MapReduce operations with R
and Hadoop.
RHadoop - Architecture
rhdfs
rhdfs is an R interface that makes HDFS usable from
the R console.
The rhdfs package calls the HDFS API in the backend to operate on
data sources stored in HDFS.
With the rhdfs methods, an R programmer can easily perform read
and write operations on distributed data files.
rmr
rmr is an R interface that provides the Hadoop MapReduce facility
inside the R environment.
The R programmer only needs to divide the application logic into
the map and reduce phases and submit it with the rmr
methods.
rmr then calls the Hadoop streaming MapReduce API
with job parameters such as input directory, output
directory, mapper, and reducer to run the R
MapReduce job on the Hadoop cluster.
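The streaming model that rmr builds on can be imitated on one machine with an ordinary pipeline: a mapper that reads lines and emits key/value pairs, a sort that plays the role of the shuffle, and a reducer that sees equal keys next to each other. The sketch below counts words this way. The function names `map_words` and `reduce_counts` are assumptions for illustration, and nothing here touches Hadoop.

```shell
# Mapper: emit one "word<TAB>1" pair per word, as a streaming mapper would.
map_words() {
    tr -s '[:space:]' '\n' | awk 'NF { print $0 "\t" 1 }'
}

# Reducer: input arrives sorted by key, so equal words are adjacent; sum them.
reduce_counts() {
    awk -F '\t' '
        $1 != key { if (NR > 1) print key "\t" total; key = $1; total = 0 }
        { total += $2 }
        END { if (NR > 0) print key "\t" total }'
}

# sort stands in for the shuffle phase that Hadoop runs between map and reduce.
result=$(printf 'big data with r\nr and hadoop\nbig data\n' \
    | map_words | LC_ALL=C sort | reduce_counts)
printf '%s\n' "$result"
```

On a cluster, Hadoop streaming takes the mapper and reducer as the commands given to its -mapper and -reducer options and handles the sorting and distribution itself. rmr generates the equivalent of these two commands from R functions.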
rhbase
rhbase is an R interface for operating the Hadoop HBase data
source stored at the distributed network via a Thrift server.
The rhbase package is designed with several methods for
initialization and read/write and table manipulation
operations.
R and Hadoop installation
We already installed R and Hadoop
Installing the R packages
To connect R and Hadoop we need to install some of the packages:
httr
functional
devtools
plyr
reshape2
rJava
RJSONIO
itertools
digest
Rcpp
install.packages(c('httr','functional','devtools','plyr','reshape2'))
install.packages(c('rJava','RJSONIO','itertools','digest','Rcpp'))
Setting environment variables
We need to set the following environment variables through the R console.
## Setting HADOOP_CMD
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
## Setting up HADOOP_STREAMING
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar")
Alternatively, set them in the shell before starting R (the version in
the jar name must match the installed Hadoop release):
export HADOOP_CMD="/usr/local/hadoop/bin/hadoop"
export HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.3.jar"
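The streaming jar carries the Hadoop version in its file name, so a hard-coded path breaks as soon as a different release is installed. A possible alternative is to look the jar up by pattern. In this sketch the function `find_streaming_jar` and the scratch tree `demo_root` are hypothetical. On a real machine the function would be called with the install directory, /usr/local/hadoop in these slides.

```shell
# Locate the streaming jar under a Hadoop install without hard-coding its version.
find_streaming_jar() {
    for jar in "$1"/share/hadoop/tools/lib/hadoop-streaming-*.jar; do
        # If nothing matches, the pattern stays unexpanded and is not a file.
        if [ -f "$jar" ]; then
            printf '%s\n' "$jar"
            return 0
        fi
    done
    return 1
}

# Demonstration against a scratch tree that mimics the Hadoop layout.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/share/hadoop/tools/lib"
touch "$demo_root/share/hadoop/tools/lib/hadoop-streaming-2.7.2.jar"

HADOOP_STREAMING=$(find_streaming_jar "$demo_root")
export HADOOP_STREAMING
echo "HADOOP_STREAMING=$HADOOP_STREAMING"
```

The function returns a non-zero status when no jar is found, so a setup script can stop early instead of exporting an empty value.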
Usage of Hadoop Streaming jar
Downloading RHadoop Packages
Download the RHadoop packages from the GitHub repository of
Revolution Analytics:
https://github.com/RevolutionAnalytics/RHadoop
rmr: [rmr-2_3.3.1.tar.gz]
rhdfs: [rhdfs-1.0.8.tar.gz]
rhbase: [rhbase-1.2.1.tar.gz]
We can install these packages using the R command line or RStudio.
Installing rmr package
Install from the command line using the following command:
R CMD INSTALL rmr-2_3.3.1.tar.gz
To install using RStudio, follow these steps:
Click on Tools → Install Packages
Change the Install from option from Repository (CRAN) to
Package Archive File (.tar.gz)
Choose the rmr-2_3.3.1.tar.gz file from your local system
Click on the Install button (it also installs the supporting packages of
rmr)
Installing rhdfs package
Install from the command line using the following command:
R CMD INSTALL rhdfs-1.0.8.tar.gz
To install using RStudio, follow these steps:
Click on Tools → Install Packages
Change the Install from option from Repository (CRAN) to
Package Archive File (.tar.gz)
Choose the rhdfs-1.0.8.tar.gz file from your local system
Click on the Install button (it also installs the supporting packages of
rhdfs)
Installing rhbase package
Install from the command line using the following command:
R CMD INSTALL rhbase-1.2.1.tar.gz
To install using RStudio, follow these steps:
Click on Tools → Install Packages
Change the Install from option from Repository (CRAN) to
Package Archive File (.tar.gz)
Choose the rhbase-1.2.1.tar.gz file from your local system
Click on the Install button (it also installs the supporting packages of
rhbase)
Loading the RHadoop libraries
RHadoop libraries are loaded like any other R library, using the
require() or library() functions.
library('rhdfs') # Loading HDFS
library('rmr2') # Loading MapReduce
library('rhbase') # Loading HBase
Initializing the RHadoop
Initialize the rhdfs package with a parameter specifying the location
of the Hadoop configuration files.
Syntax:
hdfs.init(hadoop=PATH)
Here PATH specifies the location of the Hadoop configuration file.
If no parameter is passed, the location is taken by default
from the HADOOP_CMD environment variable.
hdfs.ls
It lists the files and directories of HDFS. It returns a
data frame with columns for permissions, owner,
group, size (in bytes), modification time, and file or directory
name.
Syntax: hdfs.ls(path, recurse=FALSE)
If recurse is TRUE, it also lists the subdirectories recursively.
hdfs.defaults
This method is used to get and set the default configuration of
HDFS.
Syntax:
hdfs.defaults(arg)
arg is the name of a parameter, or NULL.
This function lists the following values:
local: rJava object corresponding to local system.
blocksize: default block size of the files stored in HDFS
fs: an rJava object corresponds to the HDFS
fu: Helper object for rhdfs
classpath: The java classpath
replication: default replication factor in HDFS
conf : name-value mappings for Hadoop configuration
parameters
hdfs.defaults : Examples
hdfs.cat
This method reads lines from a file on HDFS.
Syntax:
hdfs.cat(path,n,buffersize)
path : Location of the source file
n : Number of lines to read from the file
buffersize : Size of the buffer (Optional)
Example:
hdfs.cat('/RHadoop/1/example.txt')
hdfs.put
This method transfers data from the local system
to HDFS.
Syntax:
hdfs.put(src,dest,dstFS=hdfs.defaults("fs"))
src : Location of the source directory or file
dest : Location of the destination directory or file
dstFS : The destination file system (Optional)
Example:
hdfs.put('/home/dp/Desktop/example.txt','/RHadoop/1/')
hdfs.get
This method transfers data from HDFS to the local
system.
Syntax:
hdfs.get(src,dest,srcFS=hdfs.defaults("fs"))
src : Location of the source directory or file
dest : Location of the destination directory or file
srcFS : The source file system (Optional)
Example:
hdfs.get('/RHadoop/1/','/home/dp/Desktop/1/')
hdfs.copy | hdfs.cp
This method copies data from one location in
HDFS to another location in HDFS.
Syntax:
hdfs.copy(src,dest,overwrite=FALSE)
src : Location of the source directory or file
dest : Location of the destination directory or file
overwrite : Whether an existing file should be overwritten
Example:
hdfs.copy('/RHadoop/1/','/RHadoop/2/')
hdfs.move
This method moves data from one location in
HDFS to another location in HDFS and removes the source
directory or file.
Syntax:
hdfs.move(src,dest)
src : Location of the source directory or file
dest : Location of the destination directory or file
Example:
hdfs.move('/RHadoop/1/','/RHadoop/2/')
hdfs.rename
This method renames a file or directory in HDFS
from R.
Syntax:
hdfs.rename(src,dest)
src : Location of the source directory or file
dest : Location of the destination directory or file
Example:
hdfs.rename('/RHadoop/1/example.txt','/RHadoop/1/sample.txt')
hdfs.rm | hdfs.rmr | hdfs.delete
These functions are used to delete files or directories of HDFS
using R.
Syntax:
hdfs.delete(path)
hdfs.rm(path)
hdfs.rmr(path)
Example:
hdfs.delete("/RHadoop/1/")
hdfs.rm("/RHadoop/1/")
hdfs.rmr("/RHadoop/1/")
hdfs.chmod
This method changes the permissions of HDFS files or
directories.
Syntax:
hdfs.chmod(path, permissions='777')
permissions is a character string that represents the permissions of a
file or directory.
Example:
hdfs.chmod("/RHadoop", permissions='777')
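The permissions string follows the usual octal convention: one digit each for owner, group, and others, where 4 means read, 2 means write, and 1 means execute. The sketch below spells a three-digit mode out in rwx form. The helper `octal_to_rwx` is hypothetical and is not part of rhdfs or Hadoop.

```shell
# Translate a three-digit octal mode such as 750 into its rwx form.
octal_to_rwx() {
    out=""
    # Put a space after every digit so the loop sees one digit at a time.
    for digit in $(printf '%s' "$1" | sed 's/./& /g'); do
        case $digit in
            0) bits='---' ;;
            1) bits='--x' ;;
            2) bits='-w-' ;;
            3) bits='-wx' ;;
            4) bits='r--' ;;
            5) bits='r-x' ;;
            6) bits='rw-' ;;
            7) bits='rwx' ;;
            *) echo "not an octal digit: $digit" >&2; return 1 ;;
        esac
        out="$out$bits"
    done
    printf '%s\n' "$out"
}

octal_to_rwx 777   # owner, group, and others all get full access
octal_to_rwx 750   # owner full, group read and execute, others nothing
```

A mode of 777 gives every user write access, so a tighter mode such as 750 is usually preferable outside of a single-user test setup.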
hdfs.dircreate | hdfs.mkdir
Both these functions will be used for creating a directory over the
HDFS filesystem.
Syntax:
hdfs.mkdir(Dirname)
Example:
hdfs.mkdir("/RHadoop/3/")
hdfs.file
This initializes a file for read/write operations
on the local system or HDFS.
Syntax:
hdfs.file(path, mode, buffersize ..)
mode is 'r' for read mode or 'w' for write mode. Append mode is not
allowed.
Example:
f = hdfs.file("/RHadoop/2/README.txt","r",buffersize=104857600)
hdfs.write
This writes to a file stored in HDFS via streaming.
Syntax:
hdfs.write(object,con,hsync=FALSE)
object is any R object; con is an HDFS connection.
Example:
obj = c(1,2,3,4,5,6,7)
# con must be an HDFS connection opened in write mode with hdfs.file()
hdfs.write(obj,con,hsync=FALSE)
hdfs.read
This reads from binary files in an HDFS directory. It
uses the stream for deserialization of the data.
Syntax:
hdfs.read(con,n,start)
n is the number of bytes to read; start is the starting block.
Example:
f = hdfs.file("/RHadoop/2/README.txt","r",buffersize=104857600)
m = hdfs.read(f)
c = rawToChar(m)
print(c)
hdfs.close
This is used to close the stream when a file operation is complete.
It will close the stream and will not allow further file operations.
Syntax:
hdfs.close(con)
con indicates connection of HDFS
Example:
hdfs.close(f)
hdfs.file.info
This is used to get meta information about the file stored at
HDFS.
Syntax:
hdfs.file.info(PATH)
Example:
to.dfs
Writes R objects to the file system.
Syntax:
to.dfs(kv,output,format="native")
kv is any valid key-value pair, vector, matrix, etc.;
output is any valid path; and format is a string naming the format.
Example: small.ints <- to.dfs(1:10)
from.dfs
This reads R objects from the HDFS file system that
are stored in the binary encoded (native) format.
Syntax:
from.dfs(input,format)
input is any valid path, and format is a string naming the format.
Example:
from.dfs('/tmp/RtmpRMIXzb/file2bda3fa07850')
mapreduce
This defines and executes a MapReduce job.
Syntax:
mapreduce(input, output, map, reduce, input.format, output.format)
input: Path to the input folder on HDFS
output: Path to the output folder on HDFS
map: An optional R function returning NULL or a value of
keyval()
reduce: An optional R function of two arguments, a key and a
data structure representing all the values associated with that key
input.format: Type of input data
output.format: Type of output data
keyval
The keyval function creates the return values of map and
reduce functions, which are themselves parameters to mapreduce.
Syntax:
keyval(key,val)
where key is the desired key or keys, and val is the desired
value or values.
WordCount Mapreduce source code
# Set environment variables
Sys.setenv(HADOOP_CMD="/usr/local/hadoop/bin/hadoop")
Sys.setenv(HADOOP_STREAMING="/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar")
Sys.setenv(HADOOP_HOME="/usr/local/hadoop/")
# Load libraries
library(rmr2)
library(rhdfs)
# Initialize the rhdfs package
hdfs.init()
WordCount Mapreduce source code - cont..
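# Map: split each line into words and emit (word, 1) pairs
map <- function(k,lines) {
  words.list <- strsplit(lines, ' ')
  words <- unlist(words.list)
  return( keyval(words, 1) )
}
# Reduce: sum the counts collected for each word
reduce <- function(word, counts) {
  keyval(word, sum(counts))
}
# Run the job, reading the input as plain text
wordcount <- function (input, output) {
  mapreduce(input=input, output=output, input.format="text", map=map,
            reduce=reduce)
}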
WordCount Mapreduce source code - cont..
## read text files from folder /in1/wc/
hdfs.root <- '/in1'
hdfs.data <- file.path(hdfs.root, 'wc')
## save result in folder /in1/out
hdfs.out <- file.path(hdfs.root, 'out')
## Submit job
out <- wordcount(hdfs.data, hdfs.out)
results <- from.dfs(out)
results.df <- as.data.frame(results, stringsAsFactors=FALSE)
colnames(results.df) <- c('word', 'count')
head(results.df)
WordCount Output
Thank You