© 2012 coreservlets.com and Dima May
Customized Java EE Training: http://courses.coreservlets.com/
Hadoop, Java, JSF 2, PrimeFaces, Servlets, JSP, Ajax, jQuery, Spring, Hibernate, RESTful Web Services, Android.
Developed and taught by well-known author and developer. At public venues or onsite at your location.
HBase Installation & Shell
Originals of slides and source code for examples: http://www.coreservlets.com/hadoop-tutorial/
Also see the customized Hadoop training courses (onsite or at public venues) – http://courses.coreservlets.com/hadoop-training.html
For live customized Hadoop training (including prep
for the Cloudera certification exam), please email
info@coreservlets.com
Taught by recognized Hadoop expert who spoke on Hadoop
several times at JavaOne, and who uses Hadoop daily in
real-world apps. Available at public venues, or customized
versions can be held on-site at your organization.
• Courses developed and taught by Marty Hall
– JSF 2.2, PrimeFaces, servlets/JSP, Ajax, jQuery, Android development, Java 7 or 8 programming, custom mix of topics
– Courses available in any state or country. Maryland/DC area companies can also choose afternoon/evening courses.
• Courses developed and taught by coreservlets.com experts (edited by Marty)
– Spring, Hibernate/JPA, GWT, Hadoop, HTML5, RESTful Web Services
Contact info@coreservlets.com for details
Agenda
• Learn about installation modes
• How to set up Pseudo-Distributed Mode
• HBase Management Console
• HBase Shell
– Define Schema
– Create, Read, Update and Delete
Runtime Modes
• Local (Standalone) Mode
– Comes Out-of-the-Box, easy to get started
– Uses local filesystem (not HDFS), NOT for production
– Runs HBase & Zookeeper in the same JVM
• Pseudo-Distributed Mode
– Requires HDFS
– Mimics Fully-Distributed but runs on just one host
– Good for testing, debugging and prototyping
– Not for production use or performance benchmarking!
– Development mode used in class
• Fully-Distributed Mode
– Run HBase on many machines
– Great for production and development clusters
Set Up Pseudo-Distributed Mode
1. Verify Installation Requirements
– Java, password-less SSH
2. Configure Java
3. Configure the use of HDFS
– Specify the location of Namenode
– Configure replication
4. Make sure HDFS is running
5. Start HBase
6. Verify HBase is running
1: Verify Installation Requirements
• Latest release of Java 6 from Oracle
• Must have compatible release of Hadoop!
– Runs on top of HDFS
– Today runs on Hadoop 0.20.x
– Can run on top of local FS
• Will lose data when it crashes
• Needs HDFS's durable sync for data fault-tolerance
– HDFS provides confirmation that the data has been saved
– Confirmation is provided after all blocks are successfully replicated
to all the required nodes
1: Verify Installation Requirements
• SSH installed, sshd must be running
– Just like Hadoop
– Need password-less SSH to all the nodes including yourself (a minimal setup sketch follows this slide)
– Required for both pseudo-distributed and fully-distributed
modes
• Windows
– Very little testing – for development only
– Will need Cygwin
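A minimal sketch of setting up password-less SSH to localhost, assuming OpenSSH (the key type and file locations are just the common defaults):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
$ ssh localhost
The last command should log you in without prompting for a password.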
2: Configure Java
• vi <HBASE_HOME>/conf/hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0
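To sanity-check the setting, run the java binary under the configured JAVA_HOME and confirm the version (the path above is an example; adjust to your install):
$ /usr/java/jdk1.6.0/bin/java -version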
3: Configure the use of HDFS
• Point to HDFS for its filesystem
– Edit <hbase_home>/conf/hbase-site.xml
– hbase.rootdir property:
• Uses HDFS URI
• Recall URI format: scheme://namenode/path
– Example: hdfs://localhost:9000/hbase
• The location of namenode
• Directory on HDFS where Region Servers will save their data
– If directory doesn't exist it will be created
– dfs.replication property:
• The number of times data (HLog and HFile) will be replicated across HDFS datanodes
• Will set to 1 since there is only 1 host
3: Configure the use of HDFS
<hbase_home>/conf/hbase-site.xml
<configuration>
...
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
<description>The directory shared by RegionServers.
</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
<description>The replication count for HLog and HFile
storage. Should not be greater than HDFS datanode count.
</description>
</property>
...
</configuration>
Will this configuration work on a remote client?
3: Configure the use of HDFS
• Since 'localhost' was specified as the location of the namenode, remote clients can't use this configuration (a sketch of the fix follows the snippet below)
...
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:9000/hbase</value>
</property>
...
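A hedged sketch of the fix: point hbase.rootdir at a hostname that remote clients can resolve instead of 'localhost' (master.example.com is a placeholder):
...
<property>
<name>hbase.rootdir</name>
<value>hdfs://master.example.com:9000/hbase</value>
</property>
...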
4: Make sure HDFS is running
• Make sure HDFS is running
– Easy way is to check web-based management console
• http://localhost:50070/dfshealth.jsp
– Or use command line
• $ hdfs dfs -ls /
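If HDFS is not running, it can typically be started with the script that ships with Hadoop (the script's location varies by Hadoop version and install):
$ start-dfs.sh
$ jps
jps should then list NameNode and DataNode among the running Java processes.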
5: Start HBase
$ cd <hbase_home>/bin
$ ./start-hbase.sh
starting master, logging to
/home/hadoop/Training/logs/hbase/hbase-hadoop-master-
hadoop-laptop.out
...
6: Verify HBase is Running
$ hadoop fs -ls /hbase
Found 5 items
drwxr-xr-x - hadoop supergroup 0 2011-12-31 13:18 /hbase/-ROOT-
drwxr-xr-x - hadoop supergroup 0 2011-12-31 13:18 /hbase/.META.
drwxr-xr-x - hadoop supergroup 0 2011-12-31 13:18 /hbase/.logs
drwxr-xr-x - hadoop supergroup 0 2011-12-31 13:18 /hbase/.oldlogs
-rw-r--r-- 1 hadoop supergroup 3 2011-12-31 13:18 /hbase/hbase.version
HBase data and metadata are stored in HDFS
$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011
hbase(main):001:0> list
TABLE
0 row(s) in 0.4070 seconds
Run a command to verify that the cluster is actually running
6: Verify HBase is Running
• By default HBase manages Zookeeper
daemon for you
• Logs by default go to <hbase_home>/logs
– Change the default by editing <hbase_home>/conf/hbase-env.sh
• export HBASE_LOG_DIR=/new/location/logs
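For completeness, the companion script next to start-hbase.sh shuts the daemons (including the managed Zookeeper) down cleanly; a minimal sketch:
$ cd <hbase_home>/bin
$ ./stop-hbase.sh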
HBase Management Console
• HBase comes with a web-based management console
– http://localhost:60010
• Both Master and Region servers run a web server
– Browsing Master will lead you to region servers
• Region servers run on port 60030
• Firewall considerations
– Opening <master_host>:60010 in firewall is not enough
– Have to open up <region(s)_host>:60030 on every slave host
– An easy option is to open a browser behind the firewall
• SSH tunneling and Virtual Network Computing (VNC)
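As one hedged example of the "browser behind the firewall" option, an SSH tunnel can forward the Master's console port to your workstation (user and master_host are placeholders):
$ ssh -L 60010:localhost:60010 user@master_host
Then browse to http://localhost:60010 locally. Note that links from the Master page to region servers still point at the 60030 ports, so each region server would need its own tunnel.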
HBase Management Console
HBase Shell
• JRuby IRB (Interactive Ruby Shell)
– HBase commands added
– If you can do it in IRB you can do it in HBase shell
• http://en.wikipedia.org/wiki/Interactive_Ruby_Shell
• To run, launch the shell as shown below
– Puts you into IRB
– Type 'help' to get a listing of commands
• help "command" shows help for a single command (quotes are required)
– hbase> help "get"
$ <hbase_install>/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.90.4-cdh3u2, r, Thu Oct 13 20:32:26 PDT 2011
hbase(main):001:0>
HBase Shell
• Quote all names
– Table and column names
– Single quotes for text
• hbase> get 't1', 'myRowId'
– Double quotes for binary
• Use hexadecimal representation of that binary value
• hbase> get 't1', "key\x03\x3f\xcd"
• Uses Ruby hashes to specify parameters
– {'key1' => 'value1', 'key2' => 'value2', …}
– Example:
hbase> get 'UserTable', 'userId1', {COLUMN => 'address:str'}
HBase Shell
• HBase Shell supports various commands
– General
• status, version
– Data Definition Language (DDL)
• alter, create, describe, disable, drop, enable, exists, is_disabled,
is_enabled, list
– Data Manipulation Language (DML)
• count, delete, deleteall, get, get_counter, incr, put, scan, truncate
– Cluster administration
• balancer, close_region, compact, flush, major_compact, move, split,
unassign, zk_dump, add_peer, disable_peer, enable_peer,
remove_peer, start_replication, stop_replication
• Learn more about each command
– hbase> help "<command>"
HBase Shell - Check Status
• Display cluster's status via status command
– hbase> status
– hbase> status 'detailed'
• Similar information can be found on HBase
Web Management Console
– http://localhost:60010
HBase Shell - Check Status
hbase> status
1 servers, 0 dead, 3.0000 average load
hbase> status 'detailed'
version 0.90.4-cdh3u2
0 regionsInTransition
1 live servers
hadoop-laptop:39679 1326056194009
requests=0, regions=3, usedHeap=30, maxHeap=998
.META.,,1
stores=1, storefiles=0, storefileSizeMB=0, …
-ROOT-,,0
stores=1, storefiles=1, storefileSizeMB=0, …
Blog,,1326059842133.c1b865dd916b64a6228ecb4f743 …
0 dead servers
HBase Shell DDL and DML
• Let's walk through an example
1. Create a table
• Define column families
2. Populate table with data records
• Multiple records
3. Access data
• Count, get and scan
4. Edit data
5. Delete records
6. Drop table
1: Create Table
• Create table called 'Blog' with the following
schema
– 2 families
• 'info' with 3 columns: 'title', 'author', and 'date'
• 'content' with 1 column: 'post'
Blog
Family 'info': columns title, author, date
Family 'content': column post
1: Create Table
• Various options to create tables and
families
– hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
– hbase> create 't1', {NAME => 'f1', VERSIONS => 1,
TTL => 2592000, BLOCKCACHE => true}
– hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'},
{NAME => 'f3'}
– hbase> create 't1', 'f1', 'f2', 'f3'
hbase> create 'Blog', {NAME=>'info'}, {NAME=>'content'}
0 row(s) in 1.3580 seconds
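To double-check that both families were created as intended, the describe command from the DDL group prints the table schema (exact output formatting varies by version):
hbase> describe 'Blog'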
2: Populate Table With Data Records
• Populate data with multiple records
• Put command format:
hbase> put 'table', 'row_id', 'family:column', 'value'
Row Id info:title info:author info:date content:post
Matt-001 Elephant Matt 2009.05.06 Do elephants like monkeys?
Matt-002 Monkey Matt 2011.02.14 Do monkeys like elephants?
Bob-003 Dog Bob 1995.10.20 People own dogs!
Michelle-004 Cat Michelle 1990.07.06 I have a cat!
John-005 Mouse John 2012.01.15 Mickey mouse.
2: Populate Table With Data Records
# insert row 1
put 'Blog', 'Matt-001', 'info:title', 'Elephant'
put 'Blog', 'Matt-001', 'info:author', 'Matt'
put 'Blog', 'Matt-001', 'info:date', '2009.05.06'
put 'Blog', 'Matt-001', 'content:post', 'Do elephants like monkeys?'
…
…
… # insert rows 2-4
…
…
# row 5
put 'Blog', 'John-005', 'info:title', 'Mouse'
put 'Blog', 'John-005', 'info:author', 'John'
put 'Blog', 'John-005', 'info:date', '2012.01.15'
put 'Blog', 'John-005', 'content:post', 'Mickey mouse.'
1 put statement per cell
3. Access data - count
• Access Data
– count: display the total number of records
– get: retrieve a single row
– scan: retrieve a range of rows
• Count is simple
– hbase> count 'table_name'
– Will scan the entire table! May be slow for a large table
• Alternatively can run a MapReduce job (more on this
later...)
– $ yarn jar hbase.jar rowcount
– Specify INTERVAL to display the count every n rows. Default is 1000
• hbase> count 't1', INTERVAL => 10
3. Access data - count
hbase> count 'Blog', {INTERVAL=>2}
Current count: 2, row: John-005
Current count: 4, row: Matt-002
5 row(s) in 0.0220 seconds
hbase> count 'Blog', {INTERVAL=>1}
Current count: 1, row: Bob-003
Current count: 2, row: John-005
Current count: 3, row: Matt-001
Current count: 4, row: Matt-002
Current count: 5, row: Michelle-004
Affects how often the count is displayed
3. Access data - get
• Select single row with 'get' command
– hbase> get 'table', 'row_id'
• Returns an entire row
– Requires table name and row id
– Optional: timestamp or time-range, and versions
• Select specific columns
– hbase> get 't1', 'r1', {COLUMN => 'c1'}
– hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
• Select specific timestamp or time-range
– hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
– hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
• Select more than one version
– hbase> get 't1', 'r1', {VERSIONS => 4}
3. Access data - get
hbase> get 'Blog', 'unknownRowId'
COLUMN CELL
0 row(s) in 0.0250 seconds
hbase> get 'Blog', 'Michelle-004'
COLUMN CELL
content:post timestamp=1326061625690, value=I have a cat!
info:author timestamp=1326061625630, value=Michelle
info:date timestamp=1326061625653, value=1990.07.06
info:title timestamp=1326061625608, value=Cat
4 row(s) in 0.0420 seconds
Row Id doesn't exist
Returns ALL the columns, displays one column per output line!!!
3. Access data - get
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>['info:author','content:post']}
COLUMN CELL
content:post timestamp=1326061625690, value=I have a cat!
info:author timestamp=1326061625630, value=Michelle
2 row(s) in 0.0100 seconds
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>['info:author','content:post'],
TIMESTAMP=>1326061625690}
COLUMN CELL
content:post timestamp=1326061625690, value=I have a cat!
1 row(s) in 0.0140 seconds
Narrow down to just two columns
Narrow down via columns and timestamp
Only one timestamp matches
3. Access data - get
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>'info:date', VERSIONS=>2}
COLUMN CELL
info:date timestamp=1326071670471, value=1990.07.08
info:date timestamp=1326071670442, value=1990.07.07
2 row(s) in 0.0300 seconds
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>'info:date'}
COLUMN CELL
info:date timestamp=1326071670471, value=1990.07.08
1 row(s) in 0.0190 seconds
Asks for the latest two versions
By default only the latest version is returned
3. Access data - Scan
• Scan entire table or a portion of it
• Load entire row or explicitly retrieve column
families, columns or specific cells
• To scan an entire table
– hbase> scan 'table_name'
• Limit the number of results
– hbase> scan 'table_name', {LIMIT=>1}
• Scan a range
– hbase> scan 'Blog', {STARTROW=>'startRow',
STOPROW=>'stopRow'}
– Start row is inclusive, stop row is exclusive
– Can provide just start row or just stop row
3. Access data - Scan
• Limit what columns are retrieved
– hbase> scan 'table', {COLUMNS=>['col1', 'col2']}
• Scan a time range
– hbase> scan 'table', {TIMERANGE => [1303, 13036]}
• Limit results with a filter
– hbase> scan 'Blog', {FILTER =>
org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
– More about filters later
3. Access data - Scan
hbase(main):014:0> scan 'Blog'
ROW COLUMN+CELL
Bob-003 column=content:post, timestamp=1326061625569,
value=People own dogs!
Bob-003 column=info:author, timestamp=1326061625518, value=Bob
Bob-003 column=info:date, timestamp=1326061625546,
value=1995.10.20
Bob-003 column=info:title, timestamp=1326061625499, value=Dog
John-005 column=content:post, timestamp=1326061625820,
value=Mickey mouse.
John-005 column=info:author, timestamp=1326061625758,
value=John
…
Michelle-004 column=info:author, timestamp=1326061625630,
value=Michelle
Michelle-004 column=info:date, timestamp=1326071670471,
value=1990.07.08
Michelle-004 column=info:title, timestamp=1326061625608,
value=Cat
5 row(s) in 0.0670 seconds
Scan the entire table, grab ALL the columns
3. Access data - Scan
hbase> scan 'Blog', {STOPROW=>'John'}
ROW COLUMN+CELL
Bob-003 column=content:post, timestamp=1326061625569,
value=People own dogs!
Bob-003 column=info:author, timestamp=1326061625518,
value=Bob
Bob-003 column=info:date, timestamp=1326061625546,
value=1995.10.20
Bob-003 column=info:title, timestamp=1326061625499,
value=Dog
1 row(s) in 0.0410 seconds
Stop row is exclusive, row ids that
start with John will not be included
3. Access data - Scan
hbase> scan 'Blog', {COLUMNS=>'info:title',
STARTROW=>'John', STOPROW=>'Michelle'}
ROW COLUMN+CELL
John-005 column=info:title, timestamp=1326061625728,
value=Mouse
Matt-001 column=info:title, timestamp=1326061625214,
value=Elephant
Matt-002 column=info:title, timestamp=1326061625383,
value=Monkey
3 row(s) in 0.0290 seconds
Only retrieve 'info:title' column
4: Edit data
• Put command inserts a new value if row id
doesn't exist
• Put updates the value if the row does exist
• But does it really update?
– Inserts a new version for the cell
– Only the latest version is selected by default
– N versions are kept per cell
• configured per family at creation:
• 3 versions are kept by default
hbase> create 'table', {NAME => 'family', VERSIONS => 7}
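The version count can also be changed after creation with the alter command; a hedged sketch (in this HBase version the table generally must be disabled first):
hbase> disable 'Blog'
hbase> alter 'Blog', {NAME => 'info', VERSIONS => 5}
hbase> enable 'Blog'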
4: Edit data
hbase> put 'Blog', 'Michelle-004', 'info:date', '1990.07.06'
0 row(s) in 0.0520 seconds
hbase> put 'Blog', 'Michelle-004', 'info:date', '1990.07.07'
0 row(s) in 0.0080 seconds
hbase> put 'Blog', 'Michelle-004', 'info:date', '1990.07.08'
0 row(s) in 0.0060 seconds
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>'info:date', VERSIONS=>3}
COLUMN CELL
info:date timestamp=1326071670471, value=1990.07.08
info:date timestamp=1326071670442, value=1990.07.07
info:date timestamp=1326071670382, value=1990.07.06
3 row(s) in 0.0170 seconds
Update the same exact row with a different value
Keeps three versions of each cell by default
4: Edit data
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>'info:date', VERSIONS=>2}
COLUMN CELL
info:date timestamp=1326071670471, value=1990.07.08
info:date timestamp=1326071670442, value=1990.07.07
2 row(s) in 0.0300 seconds
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>'info:date'}
COLUMN CELL
info:date timestamp=1326071670471, value=1990.07.08
1 row(s) in 0.0190 seconds
Asks for the latest two versions
By default only the latest version is returned
5: Delete records
• Delete cell by providing table, row id and
column coordinates
– delete 'table', 'rowId', 'column'
– Deletes all the versions of that cell
• Optionally add a timestamp to delete only the versions at or before the provided timestamp
– delete 'table', 'rowId', 'column', timestamp
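To remove an entire row rather than a single cell, the shell also provides deleteall (listed in the DML group earlier); a minimal sketch:
hbase> deleteall 'Blog', 'Bob-003'
An optional column and timestamp can be appended, just as with delete.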
5: Delete records
hbase> get 'Blog', 'Bob-003', 'info:date'
COLUMN CELL
info:date timestamp=1326061625546, value=1995.10.20
1 row(s) in 0.0200 seconds
hbase> delete 'Blog', 'Bob-003', 'info:date'
0 row(s) in 0.0180 seconds
hbase> get 'Blog', 'Bob-003', 'info:date'
COLUMN CELL
0 row(s) in 0.0170 seconds
5: Delete records
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>'info:date', VERSIONS=>3}
COLUMN CELL
info:date timestamp=1326254742846, value=1990.07.08
info:date timestamp=1326254739790, value=1990.07.07
info:date timestamp=1326254736564, value=1990.07.06
3 row(s) in 0.0120 seconds
hbase> delete 'Blog', 'Michelle-004', 'info:date', 1326254739791
0 row(s) in 0.0150 seconds
hbase> get 'Blog', 'Michelle-004',
{COLUMN=>'info:date', VERSIONS=>3}
COLUMN CELL
info:date timestamp=1326254742846, value=1990.07.08
1 row(s) in 0.0090 seconds
3 versions before the delete
Delete timestamp is 1 millisecond after the second version
Only the version after the delete timestamp remains
6: Drop table
• Must disable before dropping
– puts the table “offline” so schema-based operations can be performed
– hbase> disable 'table_name'
– hbase> drop 'table_name'
• For a large table it may take a long time...
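If the goal is only to empty a table while keeping its schema, the truncate command from the DML group combines these steps; a hedged sketch:
hbase> truncate 'Blog'
It disables, drops, and recreates the table with the same definition.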
6: Drop table
hbase> list
TABLE
Blog
1 row(s) in 0.0120 seconds
hbase> disable 'Blog'
0 row(s) in 2.0510 seconds
hbase> drop 'Blog'
0 row(s) in 0.0940 seconds
hbase> list
TABLE
0 row(s) in 0.0200 seconds
Take the table offline for
schema modifications
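Besides list, the exists command from the DDL group gives a quick yes/no check that the drop worked; a minimal sketch:
hbase> exists 'Blog'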
Wrap-Up
Summary
• We learned
– How to install HBase in Pseudo-Distributed Mode
– How to use HBase Shell
– HBase Shell commands
Questions?
More info:
http://www.coreservlets.com/hadoop-tutorial/ – Hadoop programming tutorial
http://courses.coreservlets.com/hadoop-training.html – Customized Hadoop training courses, at public venues or onsite at your organization
http://courses.coreservlets.com/Course-Materials/java.html – General Java programming tutorial
http://www.coreservlets.com/java-8-tutorial/ – Java 8 tutorial
http://www.coreservlets.com/JSF-Tutorial/jsf2/ – JSF 2.2 tutorial
http://www.coreservlets.com/JSF-Tutorial/primefaces/ – PrimeFaces tutorial
http://coreservlets.com/ – JSF 2, PrimeFaces, Java 7 or 8, Ajax, jQuery, Hadoop, RESTful Web Services, Android, HTML5, Spring, Hibernate, Servlets, JSP, GWT, and other Java EE training
