RHive tutorial - Installation
There are 3 ways to install RHive.

   •   Installation using CRAN
   •   Download an already-built R package from the RHive project homepage,
       then install it with R CMD INSTALL
   •   Download the source from Github, build it, then install it.

Except for the version deployed on CRAN, all RHive packages and sources
can be found at the site below:

RHive’s Github repository path: https://github.com/nexr/RHive
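For completeness, the CRAN route (the first option above) amounts to a single install.packages() call inside R. A minimal sketch, which only writes out the command rather than running it, since running it requires R on the PATH:

```shell
# Hedged sketch: script the CRAN install command instead of typing it in R.
# The install.packages() call is the standard CRAN route; whether it runs
# depends on R being installed on this machine.
echo 'R -e '\''install.packages("RHive")'\''' > cran_install.sh
cat cran_install.sh
```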

Contents of this Tutorial
This tutorial explains how to install and run R and RHive in an environment
where Hadoop and Hive are running.

Environments used in this Tutorial
This tutorial is written with installing RHive on 64-bit CentOS 5 Linux in
mind.
Installation procedures on other Linux distributions or Mac OS X are virtually
identical; only the methods of installing packages such as git or ant may
differ per distribution.
Using RHive on Windows will be covered in a separate article.


Hadoop and Hive Structural Environment
The modules installed and running on the servers used in this tutorial are
as follows.

10.1.1.1 - Hadoop namenode, Hive server, R, RHive
10.1.1.[2-4] - Hadoop job node, DFS node, Rserve node

Thus, this tutorial supposes the following have already been composed.

   •   The Hadoop namenode is installed on server 10.1.1.1, and Hive is
       installed there with the Hive server running.
   •   Servers 10.1.1.2, 10.1.1.3, and 10.1.1.4 each have a Hadoop DFS node
       and a Hadoop job node running.
   •   Hadoop and Hive are functioning normally.
Should you require guidance beginning from Hadoop and Hive installation,
please consult the Hadoop and Hive references.

Note

It is generally not a good idea to install anything other than the namenode
itself on the Hadoop namenode server, but for the sake of fast composition
on a small-scale cluster (and out of convenience), this tutorial installs the
Hive server, R, and RHive there.
Should a larger scale with simultaneous usage by multiple users be desired,
an appropriately adapted application of this tutorial's contents should
suffice.

Method of Installing Git to Download Sources
Downloading the source code from Github and installing it is not much of a
bother, and it has the advantage of letting you build and use the newest
packages directly.
If a problem is found in the RHive version you are using and the source code
has since been updated, it is faster to simply download the source and build it.

The Github repository where you can download RHive’s source code is as
follows: git://github.com/nexr/RHive.git

If the OS you are using is Linux or Mac OS X and you want to open a terminal
and work within the server, then you can use SSH to connect to the remote
server you plan to work on.
This tutorial is going to use a root account as a work account, if the user’s
environment grants no permission to connect via a root account, then the user
has to obtain sudoer permission and work with a sudo command.

Connecting to or opening a terminal

Open a terminal window or

connect to the server you plan to work on

ssh root@10.1.1.1

Note: we assume 10.1.1.1 is the server on which RHive should be installed.

Download Source Code

Make a temporary directory, download the RHive source into it via git, and
then move into the automatically created subdirectory, 'RHive'.
mkdir RHive_source
cd RHive_source
git clone git://github.com/nexr/RHive.git
# if it succeeds, a directory named "RHive" is created automatically
cd RHive

If git is not installed and you are therefore unable to clone, use the command
below to install it, then follow the directions above.

yum install git

Using ant to build jar

Before building the RHive package, you must build its sub-modules, which
are written in Java and packaged as jar files.
This is not required when downloading from CRAN or downloading a finished
package, but it is required when downloading the source and building it
manually.
That is, the jar modules used by RHive must be compiled and ready before
the RHive package can be turned into a form installable by R.

Run ant to compile the jar files included in the RHive sub-modules.

ant build

If ant is not installed, install it first, then execute the aforementioned
procedure. Java must be installed as well, of course.
Ant can be installed with the following command:

yum install ant

Once the command has been executed, output like the following results:

# ant
Buildfile: build.xml

compile:
    [mkdir] Created dir: /mnt/srv/RHive_package/RHive/build/classes
    [javac] Compiling 5 source files to /mnt/srv/RHive_package/RHive/build/classes
    [unjar] Expanding: /mnt/srv/RHive_package/RHive/RHive/inst/javasrc/lib/REngine.jar into /mnt/srv/RHive_package/RHive/build/classes
    [unjar] Expanding: /mnt/srv/RHive_package/RHive/RHive/inst/javasrc/lib/RserveEngine.jar into /mnt/srv/RHive_package/RHive/build/classes

jar:
      [jar] Building jar: /mnt/srv/RHive_package/RHive/rhive_udf.jar

cran:
     [copy] Copying 1 file to /mnt/srv/RHive_package/RHive/RHive/inst/java
     [copy] Copying 13 files to /mnt/srv/RHive_package/RHive/build/CRAN/rhive/inst
     [copy] Copying 9 files to /mnt/srv/RHive_package/RHive/build/CRAN/rhive/man
     [copy] Copying 3 files to /mnt/srv/RHive_package/RHive/build/CRAN/rhive/R
     [copy] Copying 1 file to /mnt/srv/RHive_package/RHive/build/CRAN/rhive
     [copy] Copying 1 file to /mnt/srv/RHive_package/RHive/build/CRAN/rhive
   [delete] Deleting: /mnt/srv/RHive_package/RHive/rhive_udf.jar

main:

BUILD SUCCESSFUL

You can see the build succeeded. If it failed, the quickest solution is to
consult the RHive development team.

Building RHive Package

After building the sub-modules, the RHive package must be built into an R
package in order to install it.
Check that the current path is the directory where the jar was built, then
build the RHive package as below:


# pwd
/root/RHive_package/RHive
# ls -l
total 76
-rw-r--r-- 1 root root  1413 Dec 11 16:41 ChangeLog
-rw-r--r-- 1 root root  2068 Dec 11 16:41 INSTALL
-rw-r--r-- 1 root root  2444 Dec 11 16:41 README
drwxr-xr-x 5 root root  4096 Dec 11 16:41 RHive
drwxr-xr-x 4 root root  4096 Dec 11 16:42 build
-rw-r--r-- 1 root root  2999 Dec 11 16:41 build.xml
-rw-r--r-- 1 root root 35244 Dec 11 16:41 rhive-logo.jpg
-rw-r--r-- 1 root root 12732 Dec 11 16:41 rhive-logo.png
# R CMD build ./RHive

If the build was successful then you may see the following result message.

* checking for file ‘./RHive/DESCRIPTION’ ... OK

* preparing ‘RHive’:

* checking DESCRIPTION meta-information ... OK

* checking for LF line-endings in source and make files

* checking for empty or unneeded directories

* building ‘RHive_0.0-4.tar.gz’

You can see that RHive_0.0-4.tar.gz has been created.
This package is installable by R.
The created file's name will differ according to the RHive package version
used for the build.
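Since the tarball's name varies with the version, an install script can glob for it rather than hard-coding the name. A hedged sketch; the touch just creates a stand-in file so the glob has something to match:

```shell
# Create a stand-in tarball so the example is self-contained; in practice
# the file comes from "R CMD build ./RHive".
touch RHive_0.0-4.tar.gz
# Pick up whichever RHive tarball exists, whatever its version suffix.
tarball=$(ls RHive_*.tar.gz | head -n 1)
echo "R CMD INSTALL ./${tarball}" > install_cmd.txt
cat install_cmd.txt
```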


Install RHive Package

Now we shall install the just created or downloaded RHive Package.
It can be installed with the following command:

R CMD INSTALL ./RHive_0.0-4.tar.gz

If there are no errors, the installation succeeded.

But you might encounter errors related to the rJava and Rserve packages.
* installing to library ‘/usr/lib64/R/library’
ERROR: dependencies ‘rJava’, ‘Rserve’ are not available for package ‘RHive’
* removing ‘/usr/lib64/R/library/RHive’
  

This error message indicates that the R packages rJava and Rserve are not
installed in the R you are using.
RHive depends on the rJava and Rserve packages, so they must be installed
first.
Installing RHive from CRAN will automatically install these dependencies for
you, but when installing from source, automatic installation is not possible;
install them manually.

# open R
install.packages("rJava")
install.packages("Rserve")
# and install RHive
install.packages("./RHive_0.0-4.tar.gz", repos=NULL)

No errors indicate a successful installation.

Directly downloading RHive package from project site

The URL where you can download a built package is as follows:
https://github.com/nexr/RHive/downloads

Download a suitable version from the above site.
This tutorial installs the version listed below:

RHive_0.0-4-2011121201.tar.gz - RHive_0.0-4 SNAPSHOT (build2011121201) - R package

You can also download this file via a web browser and install it to a laptop or
desktop, or install by sending the file to a remote server via FTP.
This tutorial will exemplify how to install it to a remote Linux server.

First, use a terminal to connect to the remote Linux server where RHive will
be installed.
In this tutorial it is server 10.1.1.1, located on the internal network.


ssh root@10.1.1.1
mkdir RHive_installable
cd RHive_installable

Now, inside the temporary directory just created, use wget to download the file.
The download link path can be obtained from the aforementioned download
site.
Remember to pass the --no-check-certificate option to wget.

wget --no-check-certificate https://github.com/downloads/nexr/RHive/RHive_0.0-4-2011121401.tar.gz

Once download is complete your current directory will contain the following file:

# ls -al
total 3240
drwxr-xr-x 2 root root    4096 Dec 11 18:00 .
drwxr-x--- 6 root root    4096 Dec 11 18:02 ..
-rw-r--r-- 1 root root 3302766 Dec 12  2011 RHive_0.0-4-2011121401.tar.gz

This file is a package the RHive development team created for uploading to
CRAN, and therefore doesn't require a separate build procedure.
It can be installed straightforwardly using R.

R CMD INSTALL ./RHive_0.0-4-2011121201.tar.gz

If you encounter an error message about the rJava and Rserve dependencies
like the one mentioned before, install those inside R first and then
reinstall the downloaded file.
As mentioned before, they can be installed via the following method:

Open R
install.packages('rJava')
install.packages('Rserve')

No errors mean a completed installation.
Downloading source code without using Git client

You can download the source code from Github even without the use of Git
command or Git client.
Github supports the use of web browsers to download the compressed source
code.
You can download the newest source code like below.

wget --no-check-certificate https://github.com/nexr/RHive/zipball/master -O RHive.zip
unzip RHive.zip
cd nexr-RHive-df7341c/
  

Compiling the sources and building the package proceeds the same as if you
had downloaded the RHive source with the Git client.

Installing R and RServe

In order to use RHive, all job nodes of Hadoop must have Rserve installed.
RHive controls Rserve by referencing the slaves list in RHive's conf.
Installing Rserve is not hard.

Connect to the Hadoop namenode and each job node and install R and Rserve
on each, except the namenode, which does not need Rserve installed.
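Since the same installation must be repeated on every job node, a small loop over the node IPs used in this tutorial can help. A sketch: the ssh line is commented out because it needs real hosts and credentials, so the script only builds the plan.

```shell
# Node IPs are the job nodes used in this tutorial (10.1.1.2-4).
: > install_plan.txt
for node in 10.1.1.2 10.1.1.3 10.1.1.4; do
  echo "install R and Rserve on ${node}" >> install_plan.txt
  # ssh "root@${node}" 'yum install -y R R-devel'   # then install.packages("Rserve") inside R
done
cat install_plan.txt
```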

ssh root@10.1.1.1

If R is not already installed, install it first.
On CentOS 5, you can use the following method to install the newest version
of R.
Remember to install R-devel too, because it is necessary for installing Rserve.

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm

yum install R
yum install R-devel
  
If the required packages are installed, install Rserve via the following
command.

open R
install.packages("Rserve")

If the installed R does not have a file named libR.so, the following error
occurs when attempting to install Rserve.

* installing *source* package ‘Rserve’ ...
** package ‘Rserve’ successfully unpacked and MD5 sums checked
checking whether to compile the server... yes
configure: error: R was configured without --enable-R-shlib or --enable-R-static-lib

*** Rserve requires R (shared or static) library.                       ***
*** Please install R library or compile R with either --enable-R-shlib ***
*** or --enable-R-static-lib support                                    ***

    Alternatively use --without-server if you wish to build only Rserve client.

ERROR: configuration failed for package ‘Rserve’
* removing ‘/usr/lib64/R/library/Rserve’
  

To solve this problem, R must be compiled with --enable-R-shlib or
--enable-R-static-lib. However, most Linux distributions ship R compiled with
these options, so this error is probably caused by something else.
First, use the command below to find the path where R's library files are.

# R CMD config --ldflags
-L/usr/lib64/R/lib -lR
  

You might encounter the following error while executing the above command.

[root@i-10-24-1-34 Rserve]# R CMD config --ldflags
/usr/lib64/R/bin/config: line 142: make: command not found
/usr/lib64/R/bin/config: line 143: make: command not found
  

This means the 'make' utility is missing; Rserve needs it for installation, so
'make' has to be installed.
Install the 'make' utility as below, then execute "R CMD config --ldflags"
again and check whether the library path is displayed successfully.

yum install make
  

And let’s check if libR.so is indeed in the printed path.

# ls -al /usr/lib64/R/lib
total 4560
drwxr-xr-x 2 root root    4096 Dec 13 03:00 .
drwxr-xr-x 7 root root    4096 Dec 13 03:35 ..
-rwxr-xr-x 1 root root 2996480 Nov  8 14:19 libR.so
-rwxr-xr-x 1 root root  177176 Nov  8 14:19 libRblas.so
-rwxr-xr-x 1 root root 1470264 Nov  8 14:19 libRlapack.so
  

libR.so is confirmed to be there. Now that all preparations for installing Rserve
are complete, retry and finish installing Rserve.

open R

install.packages("Rserve")

Running Rserve
Once Rserve installation is complete, run Rserve as a daemon.
Before running Rserve, its configuration must be adjusted to enable remote
connections to Rserve.
Adjust the configuration as follows:

Connect to the server where Rserve will be run. On all Hadoop job nodes,
open the file "/etc/Rserv.conf" with a text editor; if there is no such file,
it must be created.
Insert 'remote enable' into the file, then save and exit.
Rserv.conf can configure many other options. Details pertaining to
configuration can be found at the URL below:

http://www.rforge.net/Rserve/doc.html
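The edit described above is small enough to script. A sketch that writes to a local file for safety; on a real job node the path would be /etc/Rserv.conf:

```shell
# Write the one required line; a local file stands in for /etc/Rserv.conf here.
conf=./Rserv.conf
echo 'remote enable' > "${conf}"
cat "${conf}"
```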
  

Then run Rserve from the command prompt.

R CMD Rserve

If Rserve is running as a daemon, the following command can be used to
check whether it is listening on a port.

# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address   State    PID/Program name
tcp        0      0 0.0.0.0:6311                0.0.0.0:*         LISTEN   25516/Rserve
tcp        0      0 :::59873                    :::*              LISTEN   13023/java
tcp        0      0 :::50020                    :::*              LISTEN   13023/java
tcp        0      0 ::ffff:127.0.0.1:46056      :::*              LISTEN   13112/java
tcp        0      0 :::50060                    :::*              LISTEN   13112/java
tcp        0      0 :::22                       :::*              LISTEN   1109/sshd
tcp        0      0 :::50010                    :::*              LISTEN   13023/java
tcp        0      0 :::50075                    :::*              LISTEN   13023/java

You can see the Rserve daemon listening on port 6311.
Port 6311 is the default port Rserve uses. It can be changed in the
configuration, but don't change it unless there is a special reason to.
If the port isn't open due to a firewall, permission must be obtained so that
the internal servers can connect to each other.
To check this, first see whether the server where RHive will run can make a
connection.

# connect to the RHive server
ssh root@10.1.1.1
# telnet 10.1.1.2 6311
Trying 10.1.1.2...
Connected to 10.1.1.2.
Escape character is '^]'.
Rsrv0103QAP1

--------------
# telnet 10.1.1.3 6311
Trying 10.1.1.3...
Connected to 10.1.1.3.
Escape character is '^]'.
Rsrv0103QAP1

--------------
# telnet 10.1.1.4 6311
Trying 10.1.1.4...
Connected to 10.1.1.4.
Escape character is '^]'.
Rsrv0103QAP1

--------------
  

Configuring Hadoop and Hive for RHive

In order to run RHive, the laptop or desktop with RHive installed must also
have Hadoop and Hive installed, and its Hadoop configuration must match
the configuration of the Hadoop cluster.
If the server planned for the RHive installation does not have Hadoop or
Hive installed, install the same versions as those of the Hadoop cluster,
then copy over the Hadoop configuration so they match.
After that, configure the environment variables.

export HADOOP_HOME=/service/hadoop-0.20.203.0
export HIVE_HOME=/service/hive-0.7.1

In the contents above, /service/hadoop-0.20.203.0 is the path where Hadoop
is installed and /service/hive-0.7.1 is where Hive is installed.
These must be put into /etc/profile.
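Appending the exports can be scripted as well; a sketch using a local stand-in file instead of /etc/profile:

```shell
# A local file stands in for /etc/profile in this sketch.
profile=./profile.local
cat >> "${profile}" <<'EOF'
export HADOOP_HOME=/service/hadoop-0.20.203.0
export HIVE_HOME=/service/hive-0.7.1
EOF
cat "${profile}"
```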

If RHive is installed in the same server as Hadoop namenode then no
separate configuring is required.
But if it’s a different server or a laptop then edit the contents of
/service/hadoop-0.20.203.0/conf to be the same as the Hadoop cluster you
plan to use.

Running the RHive Example

As stated before, in order to use RHive, the environment variables must be
configured before running R.
More precisely, suitable environment variables must be set before
initializing RHive.
If you forgot to set HIVE_HOME and HADOOP_HOME in the laptop's or
server's environment variables, or wish to toggle between different versions,
they can be set after running R, as listed below.

Open R
Sys.setenv(HIVE_HOME="/service/hive-0.7.1")
Sys.setenv(HADOOP_HOME="/service/hadoop-0.20.203.0")
library(RHive)

You can skip this if you edited /etc/profile or similar. This method has the
disadvantage of needing to be done every time R is run.
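An alternative that avoids repeating Sys.setenv() in every session is to export the variables in the shell before starting R; a sketch:

```shell
# Export once in the shell; an R process started afterwards inherits these.
export HIVE_HOME=/service/hive-0.7.1
export HADOOP_HOME=/service/hadoop-0.20.203.0
# Record them so the result can be inspected.
echo "HIVE_HOME=${HIVE_HOME}" > env_check.txt
echo "HADOOP_HOME=${HADOOP_HOME}" >> env_check.txt
cat env_check.txt
```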

Checking for and Setting RHive Environment Variables

You can check whether the environment variables are properly set by running
R and using the rhive.env() function.
Should either the Hive home directory or the Hadoop home directory not show
up properly, recheck whether they have been set correctly.

rhive.env()
Hive Home Directory : /mnt/srv/hive-0.8.1
Hadoop Home Directory : /mnt/srv/hadoop-0.20.203.0
Default RServe List
node1 node2 node3
Disconnected HiveServer and HDFS

RHive connect

After loading RHive and before doing any work, the rhive.connect function
must be called to make a connection to the Hive server.
If the connection isn't made, the RHive functions will not work.

rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/service/hive-0.7.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/service/hadoop-0.20.203.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

Checking the contents of HDFS files
You might see various complex messages when making the connection;
these may be ignored.
Now you can use the rhive.hdfs.* functions to handle Hadoop's HDFS;
these correspond to the "hadoop fs" commands.
You can use the rhive.hdfs.ls() function to check HDFS's list of files.

rhive.hdfs.ls("/")
  permission owner      group   length      modify-time        file
1 rwxr-xr-x  root  supergroup        0 2011-12-07 14:27    /airline
2 rwxr-xr-x  root  supergroup        0 2011-12-07 13:16 /benchmarks
3 rw-r--r--  root  supergroup 11186419 2011-12-06 03:59   /messages
4 rwxr-xr-x  root  supergroup        0 2011-12-07 22:05        /mnt
5 rwxr-xr-x  root  supergroup        0 2011-12-07 22:15      /rhive
6 rwxr-xr-x  root  supergroup        0 2011-12-07 20:19        /tmp

Checking table list of Hive

Also, you can check the list of tables registered in Hive by using the
rhive.list.tables() function.
If you have not made any tables, you will see the following result.

rhive.list.tables()
[1] tab_name
<0 rows> (or 0-length row.names)
  

Creating Hive table

You can use a simple command to save R’s data frame to a Hive table.

tablename <- rhive.write.table(USArrests)
  
USArrests is a dataset provided with R. RHive converts the data frame's
object name into a Hive table name and stores it as a Hive table.

Checking Table descriptions

And you can use the rhive.desc.table() function to see the description of a
Hive table.


rhive.desc.table("USArrests")
  col_name data_type comment
1  rowname    string
2   murder    double
3  assault       int
4 urbanpop       int
5     rape    double

As a note, Hive's table names are not case-sensitive.

Creating Hive Tables 2

It is also possible to take other data from the MASS package, or data loaded
from CSV files, and store it into Hive.

library(MASS)
tablename <- rhive.write.table(Aids2)
rhive.desc.table(tablename)
rhive.load.table(tablename)

This method is useful for uploading relatively small data to Hive. If
attempting to save several GBs of data to Hive, the recommended method is
to save the files to HDFS and configure them as an external table.
RHive does not currently handle this automatically for users; such a
feature is still on the drawing board.
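For reference, the HDFS-plus-external-table route mentioned above might look like the following sketch; every table and path name here is invented for illustration, not taken from the tutorial:

```shell
# Write a (hypothetical) external-table DDL to a script file.
cat > create_external.hql <<'EOF'
CREATE EXTERNAL TABLE big_logs (line STRING)
LOCATION '/user/root/big_logs';
EOF
# A real run would then upload the data and execute the DDL, e.g.:
#   hadoop fs -put big_logs.csv /user/root/big_logs/
#   hive -f create_external.hql
cat create_external.hql
```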

Executing a simple SQL syntax
You can use the rhive.query() function to send SQL to Hive.
Let's try running a simple SQL statement that counts the total number of
records in the Hive table usarrests.

rhive.query("SELECT COUNT(*) FROM usarrests")
  X_c0
1   50
  

The SQL above was executed by Map/Reduce using Hadoop and Hive. If you
saw SQL results like those above, it indicates that the RHive, Hadoop, and
Hive configurations are all right, and that Hadoop computed and output the
total count of the input data.

One thing to note is that this example used very small data, so it cannot be
said to make full use of the potential of Hive and Hadoop, which are
distributed processing platforms.

Small data such as "usarrests" that can be loaded into a single server's
memory can be processed within R, without the use of RHive.
This step is just checking if the configurations are properly calibrated and
basic functions are in working order.

If you wish to use RHive with Hadoop and Hive, it is fitting to use data of at
least several GiB to tens of GiB.

FAQ and Contact Info
Consult the reference materials for explanations and details of each RHive
function.
If you find a bug or have difficulty using RHive, file a bug report on the
RHive site or e-mail the RHive development team.
The RHive development team is always open and responsive to questions,
requests, and bug reports.
e-mail: rhive@nexr.com

Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation Mahantesh Angadi
 
How to make debian package from scratch (linux)
How to make debian package from scratch (linux)How to make debian package from scratch (linux)
How to make debian package from scratch (linux)Thierry Gayet
 
Control your deployments with Capistrano
Control your deployments with CapistranoControl your deployments with Capistrano
Control your deployments with CapistranoRamazan K
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase clientShashwat Shriparv
 
Environment isolation with Docker (Alex Medvedev, Alpari)
Environment isolation with Docker (Alex Medvedev, Alpari)Environment isolation with Docker (Alex Medvedev, Alpari)
Environment isolation with Docker (Alex Medvedev, Alpari)Symfoniacs
 

Semelhante a RHive tutorial - Installation (20)

R hive tutorial supplement 1 - Installing Hadoop
R hive tutorial supplement 1 - Installing HadoopR hive tutorial supplement 1 - Installing Hadoop
R hive tutorial supplement 1 - Installing Hadoop
 
Single node setup
Single node setupSingle node setup
Single node setup
 
Deploying your rails application to a clean ubuntu 10
Deploying your rails application to a clean ubuntu 10Deploying your rails application to a clean ubuntu 10
Deploying your rails application to a clean ubuntu 10
 
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDaysLuisRodriguezLocalDevEnvironmentsDrupalOpenDays
LuisRodriguezLocalDevEnvironmentsDrupalOpenDays
 
R hive tutorial supplement 2 - Installing Hive
R hive tutorial supplement 2 - Installing HiveR hive tutorial supplement 2 - Installing Hive
R hive tutorial supplement 2 - Installing Hive
 
Rally_Docker_deployment_JumpVM
Rally_Docker_deployment_JumpVMRally_Docker_deployment_JumpVM
Rally_Docker_deployment_JumpVM
 
Linux Presentation
Linux PresentationLinux Presentation
Linux Presentation
 
Getting started-with-zend-framework
Getting started-with-zend-frameworkGetting started-with-zend-framework
Getting started-with-zend-framework
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Apache HDFS - Lab Assignment
Apache HDFS - Lab AssignmentApache HDFS - Lab Assignment
Apache HDFS - Lab Assignment
 
How tos nagios - centos wiki
How tos nagios - centos wikiHow tos nagios - centos wiki
How tos nagios - centos wiki
 
Rapid miner r extension 5
Rapid miner r extension 5Rapid miner r extension 5
Rapid miner r extension 5
 
grate techniques
grate techniquesgrate techniques
grate techniques
 
Single node hadoop cluster installation
Single node hadoop cluster installation Single node hadoop cluster installation
Single node hadoop cluster installation
 
Install Guide
Install GuideInstall Guide
Install Guide
 
How to make debian package from scratch (linux)
How to make debian package from scratch (linux)How to make debian package from scratch (linux)
How to make debian package from scratch (linux)
 
Control your deployments with Capistrano
Control your deployments with CapistranoControl your deployments with Capistrano
Control your deployments with Capistrano
 
Configure h base hadoop and hbase client
Configure h base hadoop and hbase clientConfigure h base hadoop and hbase client
Configure h base hadoop and hbase client
 
Run wordcount job (hadoop)
Run wordcount job (hadoop)Run wordcount job (hadoop)
Run wordcount job (hadoop)
 
Environment isolation with Docker (Alex Medvedev, Alpari)
Environment isolation with Docker (Alex Medvedev, Alpari)Environment isolation with Docker (Alex Medvedev, Alpari)
Environment isolation with Docker (Alex Medvedev, Alpari)
 

Mais de Aiden Seonghak Hong

RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치Aiden Seonghak Hong
 
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치Aiden Seonghak Hong
 
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치Aiden Seonghak Hong
 
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스Aiden Seonghak Hong
 
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수Aiden Seonghak Hong
 
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수Aiden Seonghak Hong
 
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수Aiden Seonghak Hong
 
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정Aiden Seonghak Hong
 
R hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduceR hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduceAiden Seonghak Hong
 
RHive tutorials - Basic functions
RHive tutorials - Basic functionsRHive tutorials - Basic functions
RHive tutorials - Basic functionsAiden Seonghak Hong
 

Mais de Aiden Seonghak Hong (12)

IoT and Big data with R
IoT and Big data with RIoT and Big data with R
IoT and Big data with R
 
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
RHive tutorial supplement 3: RHive 튜토리얼 부록 3 - RStudio 설치
 
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
RHive tutorial supplement 2: RHive 튜토리얼 부록 2 - Hive 설치
 
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
RHive tutorial supplement 1: RHive 튜토리얼 부록 1 - Hadoop 설치
 
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
RHive tutorial 5: RHive 튜토리얼 5 - apply 함수와 맵리듀스
 
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
RHive tutorial 4: RHive 튜토리얼 4 - UDF, UDTF, UDAF 함수
 
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
RHive tutorial 3: RHive 튜토리얼 3 - HDFS 함수
 
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
RHive tutorial 2: RHive 튜토리얼 2 - 기본 함수
 
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
RHive tutorial 1: RHive 튜토리얼 1 - 설치 및 설정
 
R hive tutorial 1
R hive tutorial 1R hive tutorial 1
R hive tutorial 1
 
R hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduceR hive tutorial - apply functions and map reduce
R hive tutorial - apply functions and map reduce
 
RHive tutorials - Basic functions
RHive tutorials - Basic functionsRHive tutorials - Basic functions
RHive tutorials - Basic functions
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

RHive tutorial - Installation

an appropriately altered application of the contents of this tutorial should suffice.

Installing Git to Download the Source

Downloading the source code from Github and installing it is not much trouble, and it has the advantage that you can build and use the newest package directly. If a problem is found in the RHive version you are currently using and the source code has since been updated, it is faster to simply download the source and build it yourself.

The Github repository from which you can download RHive's source code is as follows:

git://github.com/nexr/RHive.git

If the OS you are using is Linux or Mac OS X and you want to work in a terminal on the server, you can use SSH to connect to the remote server you plan to work on. This tutorial uses the root account as the work account; if your environment grants no permission to connect as root, obtain sudoer permission and prefix the commands with sudo.

Connecting to or opening a terminal

Open a terminal window or connect to the server you plan to work on:

ssh root@10.1.1.1

Note: we assume 10.1.1.1 is the server on which RHive should be installed.

Download Source Code

Make a temporary directory and download the RHive source into it via git, then move to the automatically created subdirectory, RHive.
mkdir RHive_source
cd RHive_source
git clone git://github.com/nexr/RHive.git
# if it succeeds, a directory named "RHive" is created automatically
cd RHive

If git is not installed and the clone therefore fails, use the command below to install git, then follow the directions above.

yum install git

Using ant to build the jar

Before building the RHive package, you must build its sub modules, which are written in Java and packaged as jar files. This is not required when installing from CRAN or when downloading a finished package; it is required only when you download the source and build it manually. In other words, the jar modules used by RHive's sub modules must be compiled and ready before the RHive package can be made into a form that R can install.

Compile the jar files included in the RHive sub modules with ant:

ant build

If ant is not installed, install it on the Linux machine first and then execute the procedure above. Java must be installed as well, of course. Ant can be installed with the following command:

yum install ant

Once the command has been executed, output like the following should result:

# ant
Buildfile: build.xml

compile:
    [mkdir] Created dir: /mnt/srv/RHive_package/RHive/build/classes
    [javac] Compiling 5 source files to
/mnt/srv/RHive_package/RHive/build/classes
    [unjar] Expanding: /mnt/srv/RHive_package/RHive/RHive/inst/javasrc/lib/REngine.jar into /mnt/srv/RHive_package/RHive/build/classes
    [unjar] Expanding: /mnt/srv/RHive_package/RHive/RHive/inst/javasrc/lib/RserveEngine.jar into /mnt/srv/RHive_package/RHive/build/classes

jar:
    [jar] Building jar: /mnt/srv/RHive_package/RHive/rhive_udf.jar

cran:
    [copy] Copying 1 file to /mnt/srv/RHive_package/RHive/RHive/inst/java
    [copy] Copying 13 files to /mnt/srv/RHive_package/RHive/build/CRAN/rhive/inst
    [copy] Copying 9 files to /mnt/srv/RHive_package/RHive/build/CRAN/rhive/man
    [copy] Copying 3 files to /mnt/srv/RHive_package/RHive/build/CRAN/rhive/R
    [copy] Copying 1 file to /mnt/srv/RHive_package/RHive/build/CRAN/rhive
    [copy] Copying 1 file to /mnt/srv/RHive_package/RHive/build/CRAN/rhive
    [delete] Deleting: /mnt/srv/RHive_package/RHive/rhive_udf.jar

main:

BUILD SUCCESSFUL

This output shows the build succeeded. If it fails, the quickest solution is to consult the RHive development team.

Building the RHive Package

After building the sub modules, the RHive package must be built in R's package format before it can be installed. Check that the current path is the directory where the jar was built, then build the RHive package as shown below:

# pwd
/root/RHive_package/RHive
# ls -l
total 76
-rw-r--r-- 1 root root  1413 Dec 11 16:41 ChangeLog
-rw-r--r-- 1 root root  2068 Dec 11 16:41 INSTALL
-rw-r--r-- 1 root root  2444 Dec 11 16:41 README
drwxr-xr-x 5 root root  4096 Dec 11 16:41 RHive
drwxr-xr-x 4 root root  4096 Dec 11 16:42 build
-rw-r--r-- 1 root root  2999 Dec 11 16:41 build.xml
-rw-r--r-- 1 root root 35244 Dec 11 16:41 rhive-logo.jpg
-rw-r--r-- 1 root root 12732 Dec 11 16:41 rhive-logo.png
# R CMD build ./RHive

If the build was successful, you will see a result message like the following:

* checking for file ‘./RHive/DESCRIPTION’ ... OK
* preparing ‘RHive’:
* checking DESCRIPTION meta-information ... OK
* checking for LF line-endings in source and make files
* checking for empty or unneeded directories
* building ‘RHive_0.0-4.tar.gz’

You can see that RHive_0.0-4.tar.gz has been created. This package can be installed by R. The created file's name will differ according to the RHive package version used for the build.

Install the RHive Package

Now install the RHive package that was just created or downloaded. It can be installed with the following command:

R CMD INSTALL ./RHive_0.0-4.tar.gz

No errors mean the installation succeeded, but you might encounter errors related to the rJava and Rserve packages.
* installing to library ‘/usr/lib64/R/library’
ERROR: dependencies ‘rJava’, ‘Rserve’ are not available for package ‘RHive’
* removing ‘/usr/lib64/R/library/RHive’

This error message indicates that the R packages rJava and Rserve are not installed in the R you are using. RHive depends on the rJava and Rserve packages, so they must be installed first. Installing RHive from CRAN installs its dependencies automatically, but when installing from source this is not done, so install them manually:

# Open R
install.packages("rJava")
install.packages("Rserve")
# and install RHive
install.packages("./RHive_0.0-4.tar.gz", repos = NULL)

No errors indicate a successful installation.

Directly Downloading the RHive Package from the Project Site

The URL where you can download a built package is as follows:

https://github.com/nexr/RHive/downloads

Download a suitable version from the above site. This tutorial will install the version listed below:

RHive_0.0-4-2011121201.tar.gz — RHive_0.0-4 SNAPSHOT (build 2011121201) - R package

You can download this file via a web browser and install it on a laptop or desktop, or install it on a remote server by sending the file there via FTP. This tutorial shows how to install it on a remote Linux server.

First, use a terminal to connect to the remote Linux server on which RHive will be installed. In this tutorial it is server 10.1.1.1, located on the internal network.

ssh root@10.1.1.1
mkdir RHive_installable
cd RHive_installable

Create a temporary directory as above, then use wget to download the file. The download link can be obtained from the download site mentioned earlier. Remember to pass the --no-check-certificate option to wget.

wget --no-check-certificate https://github.com/downloads/nexr/RHive/RHive_0.0-4-2011121401.tar.gz

Once the download is complete, your current directory will contain the following file:

# ls -al
total 3240
drwxr-xr-x 2 root root    4096 Dec 11 18:00 .
drwxr-x--- 6 root root    4096 Dec 11 18:02 ..
-rw-r--r-- 1 root root 3302766 Dec 12  2011 RHive_0.0-4-2011121401.tar.gz

This file is the package the RHive development team prepared for uploading to CRAN, so it does not require a separate build step. It can be installed straightforwardly with R:

R CMD INSTALL ./RHive_0.0-4-2011121401.tar.gz

If you encounter an error message about the rJava and Rserve dependencies like the one mentioned before, install those packages inside R first and then install the downloaded file again. As noted before, they can be installed via the following method:

# Open R
install.packages('rJava')
install.packages('Rserve')

No errors mean a completed installation.
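The manual dependency steps above can also be scripted. A minimal sketch under this tutorial's assumptions (the function name is hypothetical, and it only prints the R commands so you can review them before piping them into R):

```shell
#!/bin/sh
# Sketch: print the R commands that install RHive's dependencies and then
# RHive itself. Pipe the output into `R --no-save` to execute them.
# The default tarball name is the one used in this tutorial; pass the
# file you actually downloaded or built as the first argument.
rhive_install_commands() {
    tarball=${1:-./RHive_0.0-4-2011121401.tar.gz}
    cat <<EOF
install.packages("rJava")
install.packages("Rserve")
install.packages("$tarball", repos = NULL)
EOF
}

# Usage sketch (on the target server):
#   rhive_install_commands | R --no-save
```

Printing the commands first, rather than running them directly, makes it easy to inspect the exact package list before touching the R library directory.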
Downloading the Source Code without a Git Client

You can download the source code from Github even without the git command or a Git client: Github supports downloading the compressed source code through a web browser. You can fetch the newest source code like this:

wget --no-check-certificate https://github.com/nexr/RHive/zipball/master -O RHive.zip
unzip RHive.zip
cd nexr-RHive-df7341c/

Compiling the sources and building the package is then the same as if you had downloaded the RHive source with a Git client.

Installing R and Rserve

In order to use RHive, every Hadoop job node must have Rserve installed. RHive controls Rserve by referencing the slaves list in its conf. Installing Rserve is not hard: connect to the Hadoop name node and each job node and install R on all of them, and Rserve on each job node; the name node itself does not need Rserve installed.

ssh root@10.1.1.1

If R is not already installed, install it first. On CentOS 5 you can install the newest version of R with the following method. Remember to install R-devel as well, because it is necessary for installing Rserve.

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install R
yum install R-devel
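Since the same R and Rserve steps must be repeated on every job node (10.1.1.2 through 10.1.1.4 in this tutorial), it can help to generate one ssh command per node instead of typing them by hand. A sketch (the helper name, the node list, and the root login are this tutorial's assumptions; the function only prints the commands so you can review them first):

```shell
#!/bin/sh
# Sketch: print one ssh command per job node for a given remote command.
# Review the output, then pipe it to `sh` to actually run it.
for_each_node() {
    nodes=$1
    remote_cmd=$2
    for node in $nodes; do
        echo "ssh root@$node \"$remote_cmd\""
    done
}

# Usage sketch (matches the yum steps above):
#   for_each_node "10.1.1.2 10.1.1.3 10.1.1.4" "yum install -y R R-devel" | sh
```

For larger clusters a tool like pdsh or configuration management would be more appropriate, but for three nodes this keeps the steps visible and repeatable.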
If the required packages are installed, install Rserve with the following command:

open R
install.packages("Rserve")

If the installed R does not have a file named libR.so, the following error occurs when attempting to install Rserve:

* installing *source* package 'Rserve' ...
** package 'Rserve' successfully unpacked and MD5 sums checked
checking whether to compile the server... yes
configure: error: R was configured without --enable-R-shlib or --enable-R-static-lib

*** Rserve requires R (shared or static) library.                      ***
*** Please install R library or compile R with either --enable-R-shlib ***
*** or --enable-R-static-lib support                                   ***
    Alternatively use --without-server if you wish to build only Rserve client.

ERROR: configuration failed for package 'Rserve'
* removing '/usr/lib64/R/library/Rserve'

To solve this problem, R must be compiled with --enable-R-shlib or --enable-R-static-lib. Most Linux distributions, however, already ship R built with one of these options, so the error is probably caused by something else. First, use the command below to find the path where R's library files are:

# R CMD config --ldflags
-L/usr/lib64/R/lib -lR

You might encounter the following error while executing the above command:

[root@i-10-24-1-34 Rserve]# R CMD config --ldflags
/usr/lib64/R/bin/config: line 142: make: command not found
/usr/lib64/R/bin/config: line 143: make: command not found

This means the 'make' utility is missing; Rserve needs it for installation, so it has to be installed. Install 'make' as below, then execute "R CMD config --ldflags" again and check that the library path is displayed successfully.

yum install make

Now check whether libR.so really is in the printed path:

# ls -al /usr/lib64/R/lib
total 4560
drwxr-xr-x 2 root root    4096 Dec 13 03:00 .
drwxr-xr-x 7 root root    4096 Dec 13 03:35 ..
-rwxr-xr-x 1 root root 2996480 Nov  8 14:19 libR.so
-rwxr-xr-x 1 root root  177176 Nov  8 14:19 libRblas.so
-rwxr-xr-x 1 root root 1470264 Nov  8 14:19 libRlapack.so

libR.so is confirmed to be there. Now that all preparations for installing Rserve are complete, retry and finish installing Rserve.

open R
install.packages("Rserve")

Running Rserve
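Before the daemon is started, each job node needs /etc/Rserv.conf to permit remote connections, as described next. The following is a minimal sketch of creating that file from the shell (run as root on a job node; the fallback to a local ./Rserv.conf is purely illustrative, so the snippet can be tried without root):

```shell
# Sketch: create Rserve's config file and enable remote connections.
# /etc/Rserv.conf is the path Rserve reads by default; the local fallback
# path below is illustrative, for trying the snippet without root.
CONF=/etc/Rserv.conf
touch "$CONF" 2>/dev/null || CONF=./Rserv.conf
# Append the line only if it is not already there (idempotent).
grep -qx 'remote enable' "$CONF" 2>/dev/null || echo 'remote enable' >> "$CONF"
cat "$CONF"
```

The grep guard makes the snippet safe to re-run: the 'remote enable' line is never duplicated.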
Once the Rserve installation is complete, run Rserve as a daemon. Before running Rserve, its configuration must be adjusted to allow remote connections. Adjust the configuration as follows:

Connect to each server where Rserve will run. On every Hadoop job node, open the file "/etc/Rserv.conf" with a text editor; if there is no such file, it must be created. Insert 'remote enable' into the file, then save and exit.

Rserv.conf can configure many other options. Details pertaining to the configuration can be found at the URL below:

http://www.rforge.net/Rserve/doc.html

Then leave R and run Rserve from the command prompt:

R CMD Rserve

Once Rserve is running as a daemon, the following command can be used to check which port it is listening on:

# netstat -nltp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address    State    PID/Program name
tcp        0      0 0.0.0.0:6311            0.0.0.0:*          LISTEN   25516/Rserve
tcp        0      0 :::59873                :::*               LISTEN   13023/java
tcp        0      0 :::50020                :::*               LISTEN   13023/java
tcp        0      0 ::ffff:127.0.0.1:46056  :::*               LISTEN   13112/java
tcp        0      0 :::50060                :::*               LISTEN   13112/java
tcp        0      0 :::22                   :::*               LISTEN   1109/sshd
tcp        0      0 :::50010                :::*               LISTEN   13023/java
tcp        0      0 :::50075                :::*               LISTEN   13023/java

You can see the Rserve daemon listening on port 6311, the default port Rserve uses. This can be changed in the configuration, but do not change it unless there is a special reason to. If the port is blocked by a firewall, it must be opened so that the internal servers can connect to each other. To check this, first see whether the server where RHive will run can make the connection:

# connect to the RHive server
ssh root@10.1.1.1

# telnet 10.1.1.2 6311
Trying 10.1.1.2...
Connected to 10.1.1.2.
Escape character is '^]'.
Rsrv0103QAP1

--------------
# telnet 10.1.1.3 6311
Trying 10.1.1.3...
Connected to 10.1.1.3.
Escape character is '^]'.
Rsrv0103QAP1

--------------
# telnet 10.1.1.4 6311
Trying 10.1.1.4...
Connected to 10.1.1.4.
Escape character is '^]'.
Rsrv0103QAP1

--------------

Configuring Hadoop and Hive for RHive

In order to run RHive, the laptop or desktop with RHive installed must also have Hadoop and Hive installed, and its Hadoop configuration must match the configuration of the Hadoop cluster. If the server where RHive will be installed does not have Hadoop or Hive, install the same versions as those used by the Hadoop cluster, then copy the Hadoop configuration over so they match. After that, configure the environment variables:

export HADOOP_HOME=/service/hadoop-0.20.203.0
export HIVE_HOME=/service/hive-0.7.1

In the lines above, /service/hadoop-0.20.203.0 is the path where Hadoop is installed and /service/hive-0.7.1 is where Hive is installed. These lines should be put into /etc/profile.

If RHive is installed on the same server as the Hadoop name node, no separate configuration is required. But if it is a different server or a laptop, edit the contents of /service/hadoop-0.20.203.0/conf to be the same as those of the Hadoop cluster you plan to use.

Running the RHive Example

As stated before, in order to use RHive, the environment variables must be configured before running R. To put it more precisely, suitable environment variables must be set before initializing RHive. If you forgot to set HIVE_HOME and HADOOP_HOME among the laptop's or server's environment variables, or wish to switch between different versions, they can also be set after running R, as listed below.

open R
Sys.setenv(HIVE_HOME="/service/hive-0.7.1")
Sys.setenv(HADOOP_HOME="/service/hadoop-0.20.203.0")
library(RHive)

You can skip this if you already edited /etc/profile or similar. This method has the disadvantage of having to be repeated every time R is run.

Checking for and Setting RHive Environment Variables

You can check whether the environment variables are properly set by running R and using the rhive.env() function. Should either the Hive home directory or the Hadoop home directory not show up properly, recheck whether they have been correctly set.

rhive.env()
Hive Home Directory : /mnt/srv/hive-0.8.1
Hadoop Home Directory : /mnt/srv/hadoop-0.20.203.0
Default RServe List
node1 node2 node3
Disconnected HiveServer and HDFS

RHive connect

After loading RHive and before doing any work, the rhive.connect function must be called to make the connection to the Hive server. If the connection is not made, the RHive functions will not work.

rhive.connect()
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/service/hive-0.7.1/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/service/hadoop-0.20.203.0/lib/slf4j-log4j12-1.4.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

You may see many such messages when making the connection. They can be ignored.

Checking the contents of HDFS files

Now you can use the rhive.hdfs.* functions to handle Hadoop's HDFS; they correspond to the "hadoop fs" commands. You can use the rhive.hdfs.ls() function to check the HDFS's list of files.

rhive.hdfs.ls("/")
  permission owner       group     length modify-time       file
1 rwxr-xr-x  root   supergroup         0  2011-12-07 14:27  /airline
2 rwxr-xr-x  root   supergroup         0  2011-12-07 13:16  /benchmarks
3 rw-r--r--  root   supergroup  11186419  2011-12-06 03:59  /messages
4 rwxr-xr-x  root   supergroup         0  2011-12-07 22:05  /mnt
5 rwxr-xr-x  root   supergroup         0  2011-12-07 22:15  /rhive
6 rwxr-xr-x  root   supergroup         0  2011-12-07 20:19  /tmp

Checking the table list of Hive

You can also check the list of tables registered in Hive by using the rhive.list.tables() function. If you have not made any tables, you will see the following result:

rhive.list.tables()
[1] tab_name
<0 rows> (or 0-length row.names)

Creating a Hive table

You can use a simple command to save an R data frame as a Hive table.

tablename <- rhive.write.table(USArrests)
USArrests is a dataset provided with R. RHive converts the data frame's object name into a Hive table name and stores the data as a Hive table.

Checking table descriptions

You can use the rhive.desc.table() function to see the description of a Hive table.

rhive.desc.table("USArrests")
  col_name data_type comment
1  rowname    string
2   murder    double
3  assault       int
4 urbanpop       int
5     rape    double

As a note, Hive's table names do not distinguish between upper and lower case.

Creating Hive Tables 2

It is also possible to take other data from the MASS package, or data loaded from CSV files, and store it in Hive.

library(MASS)
tablename <- rhive.write.table(Aids2)
rhive.desc.table(tablename)
rhive.load.table(tablename)

This method is useful for uploading relatively small data to Hive. If you are trying to save several GBs of data, the recommended method is to save the files to HDFS and configure them as an external table. RHive currently does not handle this automatically for users; such a feature is still on the drawing board.

Executing a simple SQL syntax
You can use the rhive.query() function to send SQL to Hive. Let's try running a simple SQL statement that counts the total number of records in the Hive table usarrests.

rhive.query("SELECT COUNT(*) FROM usarrests")
  X_c0
1   50

The SQL statement above was executed as a Map/Reduce job using Hadoop and Hive. If you saw a result like the one above, it indicates that the RHive, Hadoop, and Hive configurations are fine, and Hadoop computed and output the total count of the input data.

One thing to keep in mind is that this example used only very small data, so it does not really exercise the potential of Hive and Hadoop, which are distributed processing platforms. Small data such as "usarrests", which can be loaded into a single server's memory, can be processed within R without the use of RHive. This step merely checks that the configuration is properly calibrated and the basic functions are in working order. If you wish to make real use of RHive through Hadoop and Hive, it is fitting to use data of at least several GiBs to tens of GiBs.

FAQ and Contact Info

Consult the reference materials for explanations and details of each RHive function. If you find a bug or have difficulty using RHive, file a bug report on the RHive site or ask the RHive development team via e-mail. The RHive development team is always open and responsive to questions, requests, and bug reports.

e-mail: rhive@nexr.com