Working with HBase and Big SQL
Cynthia M. Saracco
IBM Solution Architect
September 14, 2015
Contents
Lab 1  Overview
  1.1  About your environment
  1.2  Getting started
Lab 2  Issuing basic HBase commands
  2.1  Creating and altering a table
  2.2  Inserting and retrieving data
  2.3  Updating data
  2.4  Deleting data
  2.5  Dropping a table
  2.6  Exploring the impact of your work
Lab 3  Using Big SQL to create and query HBase tables
  3.1  Experimenting with your first Big SQL HBase table
  3.2  Creating views over Big SQL HBase tables
  3.3  Loading data into a Big SQL HBase table
  3.4  Optional: Dropping tables created in this lab
Lab 4  Exploring data modeling options
  4.1  De-normalizing relational tables (mapping multiple tables to one HBase table)
  4.2  Generating unique row key values
  4.3  Mapping multiple SQL columns to one HBase row key or column
  4.4  Optional: Dropping tables created in this lab
Lab 5  Summary
Lab 1 Overview
In this hands-on lab, you'll learn the basics of using HBase natively and with Big SQL, IBM's industry-
standard SQL interface for data stored in supported Hadoop-based platforms. HBase is an open source
key-value data storage mechanism commonly used in Hadoop environments. It features a column-
oriented data model that enables programmers to efficiently retrieve data by key values from very large
data sets. It also supports certain data modification operations, readily accommodates sparse data, and
supports various kinds of data. Indeed, HBase doesn’t have primitive data types – all user data is stored
as byte arrays. With Big SQL, programmers can use SQL to query data in Big SQL tables managed by
HBase.
After completing this hands-on lab, you’ll be able to:
• Use the HBase shell (command-line interface) to issue commands
• Create HBase tables that include multiple column families and columns
• Work directly with HBase tables to add, retrieve, and delete data
• Create, populate, and query Big SQL tables that store data in HBase
• Create and query views based on Big SQL tables managed by HBase
• Explore options for mapping relational database designs to Big SQL HBase tables
• Generate unique row key values for Big SQL tables managed by HBase
• Investigate the storage implications of certain Big SQL HBase table designs
• Explore the status of your HBase cluster through a Web interface
Allow 2.5 - 3 hours to complete this lab. Although this lab briefly summarizes HBase architecture and
basic concepts, you may find it helpful to read introductory articles or watch some videos on HBase
before beginning your work.
This lab was developed by Cynthia M. Saracco, IBM Silicon Valley Lab, with thanks to Bruce Brown and
Piotr Pruski for their earlier contributions and Nailah Bissoon, Scott Gray, Dan Kikuchi, Ellen Patterson,
Henry Quach, and Deepa Remesh for their earlier reviews. Please post questions or comments about
this lab or the technologies it describes to the forum on Hadoop Dev at
https://developer.ibm.com/hadoop/.
1.1. About your environment
This lab was developed for a BigInsights 4.1 environment running on the IBM Open Platform for Apache
Hadoop. To complete this lab, install and launch these Hadoop-based services: HDFS, MapReduce,
YARN, Hive, HBase, ZooKeeper, Sqoop, Knox, and Ambari. In addition, you must install and launch the
Big SQL service, as well as install the BigInsights Home service, which includes JSqsh (the SQL
command line environment). Big SQL and BigInsights Home are part of BigInsights Quick Start Edition,
BigInsights Data Analyst, and BigInsights Data Scientist. Before proceeding with this tutorial, ensure that
you have access to a working platform with Big SQL running and JSqsh accessible.
Examples in this lab are based on a sample environment with the configuration shown in the tables
below. If your environment is different, modify the sample code and instructions as needed to
match your configuration.
Account                    User ID    Password
Root account               root       password
Big SQL administrator      bigsql     bigsql
Ambari administrator       admin      admin

Property                                       Value
Host name                                      myhost.ibm.com
Ambari port number                             8080
HBase client directory (with HBase shell)      /usr/iop/current/hbase-client/bin
HBase Master installation directory            /usr/iop/current/hbase-master
HBase Region Server installation directory     /usr/iop/current/hbase-region
Big SQL database name                          bigsql
Big SQL port number                            32051
Big SQL installation directory                 /usr/ibmpacks/bigsql
Big SQL samples directory                      /usr/ibmpacks/bigsql/4.1/bigsql/samples/data
JSqsh installation directory                   /usr/ibmpacks/common-utils/current/jsqsh
About the screen captures, sample code, and environment configuration
Screen captures in this lab depict examples and results that may vary from
what you see when you complete the exercises. In addition, some code
examples may need to be customized to match your environment.
1.2. Getting started
To get started with the lab exercises, you need access to a working BigInsights environment, as
described in the previous section. A free Quick Start Edition is available for download from IBM at
http://www-01.ibm.com/software/data/infosphere/hadoop/trials.html.
This lab was tested against a native BigInsights installation on a multi-node cluster. Although the IBM
Analytics for Hadoop cloud service on Bluemix includes Big SQL, it does not support SSH access. As a
result, examples in this lab that involve JSqsh, the HBase shell, and HDFS commands cannot be
executed using that Bluemix service. Therefore, if you want to follow all examples in this lab, you should
install and configure BigInsights on your own cluster. Consult the instructions in the product's
Knowledge Center (http://www-01.ibm.com/support/knowledgecenter/SSPT3X_4.1.0/com.ibm.swg.im.infosphere.biginsights.welcome.doc/doc/welcome.html).
Before continuing with this lab, verify that Big SQL and all pre-requisite services are running.
If you have any questions or need help getting your environment up and running, visit Hadoop Dev
(https://developer.ibm.com/hadoop/) and review the product documentation or post a
message to the forum. You cannot proceed with subsequent lab exercises without access to a
working environment.
Lab 2 Issuing basic HBase commands
After confirming that all necessary services are running, you're ready to start using HBase directly.
Exercises in this lab are intended for those with little or no prior experience using HBase. As such, after
completing this lab, you'll know how to
• Launch the HBase shell
• Create an HBase table
• Inspect the characteristics of a table
• Alter properties associated with a table
• Populate a table with data
• Retrieve data from a table
• Use HBase Web interfaces to explore information about your environment
As you work through this lab, you'll become familiar with some basic HBase concepts, such as row keys,
column families, and columns. You'll also observe its schema-less nature. However, a detailed
explanation of HBase is beyond the scope of this lab. Visit the Apache HBase site
(http://hbase.apache.org/) or the HBase wiki (http://wiki.apache.org/hadoop/Hbase) for reference
materials on HBase.
To keep this lab simple, you will create one HBase table to track customer reviews of various products.
Each review will have a unique identifier, summary information (e.g., product name and rating), reviewer
data (e.g., name and location), and detailed information (e.g., review comments). In a relational DBMS,
such information might be stored in a single table with one column for each attribute to be tracked (e.g.,
REVIEW-ID, PRODUCT, RATING, REVIEWER-NAME, REVIEWER-LOCATION, COMMENTS).
Furthermore, a data type would be specified for each column at creation – perhaps INT for the REVIEW-
ID, VARCHAR(30) for the PRODUCT, and so on.
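For comparison, here is a minimal sketch of what that relational table definition might look like (the table and column names below are hypothetical, shown only to contrast with the HBase design described next):

-- Hypothetical relational equivalent, for comparison only
CREATE TABLE reviews (
    review_id          INT NOT NULL PRIMARY KEY,
    product            VARCHAR(30),
    rating             INT,
    reviewer_name      VARCHAR(30),
    reviewer_location  VARCHAR(30),
    comments           VARCHAR(100)
);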
In HBase, your table design will be different. The unique identifier for each review will serve as the row
key. Attributes commonly queried together will be grouped into a column family. HBase requires at least
one column family per table. Yours will have three:
• summary, which summarizes essential review information (such as the product name and
rating).
• reviewer, which tracks data related to the user who reviewed the product.
• details, which tracks comments, tips, and other detailed review information.
Each of these column families may contain one or more columns, depending on the data associated with
a given review. For example, one review might contain the name, location, and email address of a
reviewer, while another might only contain the reviewer’s name. HBase is schema-less, so you don’t
define any columns when you create your table. Furthermore, since all data in HBase is stored in byte
arrays, you don’t declare data types for your columns. You’ll put these concepts into practice shortly.
In production environments, programmers typically interact with HBase through applications they've built
in Java or another supported interface, such as REST or Thrift. However, to keep things simple, you'll
work with HBase through its command-line interface.
HBase consists of an active HBase Master Server and one or more Region Server(s). Region Servers
manage user data modeled as HBase tables. HBase automatically partitions tables into regions, storing
a range of rows together based on their key values. Regions are stored in files in your distributed file
system.
Allow 1 – 1.5 hours to complete all sections of this lab. You must have a working Big SQL and HBase
environment, as described in the first module of this series of lab exercises.
2.1. Creating and altering a table
To begin, create a reviews table and alter some of its default properties before populating it with data.
__1. If necessary, open a terminal window.
__2. Launch the HBase shell. From the HBase client …/bin directory (typically
/usr/iop/current/hbase-client/bin), issue this command:
hbase shell
Ignore any informational messages that may appear. Verify that the shell launched successfully
and that your screen appears similar to this:
__3. Optionally, type help to review information about supported shell commands. A portion of this
information is shown here:
__4. Create an HBase table named reviews with 3 column families: summary, reviewer, and details.
Ignore any warnings that may appear involving multiple SLF4J bindings.
create 'reviews', 'summary', 'reviewer', 'details'
About the CREATE command
CREATE only requires the name of the table and one or more column families.
In a moment, you'll see how to add columns to the table dynamically when you
insert some data into the table.
__5. Inspect the default properties associated with your new table:
describe 'reviews'
Note that your table is shown as ENABLED, or ready for use. Also note that each column family
has various properties associated with it. For example, the summary column family's
IN_MEMORY property is set to false in the screen capture below. Since we expect most HBase
queries to reference this column family, let's change the property to TRUE. This instructs HBase
to give priority to caching this data.
__6. To alter (or drop) a table, you must first disable it:
disable 'reviews'
__7. Alter the table to set the IN_MEMORY property of the summary column family to true.
alter 'reviews', {NAME => 'summary', IN_MEMORY => 'true'}
__8. Set the number of versions for the summary and reviewer column families to 2. HBase can
store multiple versions of data for each column family. If your application does not require
multiple versions, the VERSIONS property for each column family should be set to 1.
alter 'reviews', {NAME => 'summary', VERSIONS => 2}, {NAME => 'reviewer',
VERSIONS => 2}
__9. Verify that your property changes were captured correctly:
describe 'reviews'
__10. Enable (or activate) the table so that it’s ready for use.
enable 'reviews'
Now you can populate your table with data and query it.
2.2. Inserting and retrieving data
This exercise introduces you to the PUT, GET, SCAN, and COUNT commands. As you might imagine,
PUT enables you to write data to HBase. GET and SCAN enable you to retrieve data, while COUNT
returns the total number of rows in your table.
__1. Insert some data into your HBase table. The PUT command enables you to write data into a
single cell of an HBase table. This cell may reside in an existing row or may belong to a new
row.
Issue this command:
put 'reviews', '101', 'summary:product', 'hat'
What happened after executing this command
Executing this command caused HBase to add a row with a row key of 101 to
the reviews table and to write the value of hat into the product column of
the summary column family. Note that this command dynamically created the
summary:product column and that no data type was specified for this column.
What if you have more data for this row? You need to issue additional PUT
commands – one for each cell (i.e., each column family:column) in the target
row. You’ll do that shortly. But before you do, consider what HBase just did
behind the scenes . . . .
HBase wrote your data to a Write-Ahead Log (WAL) in your distributed file
system to allow for recovery from a server failure. In addition, it cached your
data in the MemStore of a specific region managed by a specific Region
Server. At some point, when the MemStore becomes full, your data will be
flushed to disk and stored in files (HFiles) in your distributed file system. Each
HFile contains data related to a specific column family.
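Incidentally, if you'd like to see an HFile appear in your distributed file system without waiting for the MemStore to fill, you can force a flush yourself. This optional command (not required for the lab) is issued from the HBase shell:

flush 'reviews'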
__2. Retrieve the row. To do so, provide the table name and row key value to the GET command:
get 'reviews', '101'
__3. Add more cells (columns and data values) to this row:
put 'reviews', '101', 'summary:rating', '5'
put 'reviews', '101', 'reviewer:name', 'Chris'
put 'reviews', '101', 'details:comment', 'Great value'
About your table . . . .
Conceptually, your table looks something like this:
It has one row with 3 column families. The summary column family for this row
contains two columns, while the other two column families for this row each
have one column.
Physically, data in each column family is stored together in your distributed file
system (in one or more HFiles).
__4. Retrieve row key 101 again:
get 'reviews', '101'
About this output
This output can be a little confusing at first, because it’s showing that 4 rows
are returned. This row count refers to the number of lines (rows) displayed on
the screen. Since information about each cell is displayed on a separate line
and there are 4 cells in row 101, the GET command reports 4 rows.
__5. Count the number of rows in the entire table and verify that there is only 1 row:
count 'reviews'
The COUNT command is appropriate for small tables only. (Consult the HBase site for
more efficient approaches for large tables.)
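For larger tables, HBase ships a MapReduce-based row counter that you can run from the operating system command line instead of the shell. A sketch of the invocation appears below; the exact class name and output may vary by release:

hbase org.apache.hadoop.hbase.mapreduce.RowCounter reviews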
__6. Add 2 more rows to your table using these commands:
put 'reviews', '112', 'summary:product', 'vest'
put 'reviews', '112', 'summary:rating', '5'
put 'reviews', '112', 'reviewer:name', 'Tina'
put 'reviews', '133', 'summary:product', 'vest'
put 'reviews', '133', 'summary:rating', '4'
put 'reviews', '133', 'reviewer:name', 'Helen'
put 'reviews', '133', 'reviewer:location', 'USA'
put 'reviews', '133', 'details:tip', 'Sizes run small. Order 1 size up.'
Note that review 112 lacks any detailed information (e.g., a comment), while review 133 contains
a tip in its details. Note also that review 133 includes the reviewer's location, which is not
present in the other rows. Let's explore how HBase captures this information.
__7. Retrieve the entire contents of the table using this SCAN command:
scan 'reviews'
Note that SCAN correctly reports that the table contains 3 rows. The display contains more than
3 lines, because each line includes information for a single cell in a row. Note also that each row
in your table has a different schema and that missing information is simply omitted.
Furthermore, each displayed line includes not only the value of a particular cell in the table but
also its associated row key (e.g., 101), column family name (e.g., details), column name (e.g.,
comment), and timestamp. As you learned earlier, HBase is a key-value store. Together, these
four attributes (row key, column family name, column qualifier, and timestamp) form the key.
Consider the implications of storing this key information with each cell value. Having a large
number of columns with values for all rows (in other words, dense data) means that a lot of key
information is repeated. Also, large row key values and long column family / column names
increase the table’s storage requirements.
__8. Finally, restrict the scan results to retrieve only the contents of the summary column family and
the reviewer:name column for row keys starting at '120' and ending at '150'.
scan 'reviews', {COLUMNS => ['summary', 'reviewer:name'], STARTROW => '120',
STOPROW => '150'}
Given your sample data, only row '133' qualifies. Note that the reviewer's location
(reviewer:location) and all the review details (details:tip) were omitted from the results
due to the scan parameters you specified.
2.3. Updating data
HBase doesn't have an UPDATE command or API. Instead, programmers simply write another set of
column values for the same row key. In this exercise, you'll see how to update data values using the PUT
command (again). You'll also explore how HBase maintains multiple versions of your data for the
summary and reviewer column families. As you'll recall, in an earlier exercise you set the VERSIONS
properties of these families to 2.
__1. Update Tina's review (row key 112) to change the rating to '4':
put 'reviews', '112', 'summary:rating', '4'
__2. Scan the table to inspect the change.
scan 'reviews'
By default, HBase returns the most recent version of data for each cell.
__3. To see multiple versions of your data, issue this command:
scan 'reviews', {VERSIONS => 2}
__4. You can also GET the original rating value from row 112 by explicitly specifying the timestamp
value. This value will differ on your system, so you will need to substitute the value
appropriate for your environment for the timestamp shown below. Consult the output from
the previous step to obtain this value.
get 'reviews', '112', {COLUMN => 'summary:rating', TIMESTAMP =>
1421878110712}
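Alternatively, if you'd rather not look up the timestamp, you can ask GET to return multiple versions of the cell in a single call. This optional variation (not part of the numbered steps above) should display both the current and the previous rating:

get 'reviews', '112', {COLUMN => 'summary:rating', VERSIONS => 2}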
2.4. Deleting data
In this exercise, you'll learn how to delete data in your HBase table. You can delete a single cell value
within a row or all cell values within a row.
__1. Delete Tina's name from her review (row 112).
delete 'reviews', '112', 'reviewer:name'
__2. Scan the table to inspect the change.
scan 'reviews'
__3. Delete all cells associated with Tina's review (i.e., all data for row 112) and scan the table to
inspect the change.
deleteall 'reviews', '112'
scan 'reviews'
About DELETE . . . .
DELETE doesn't remove data from the table immediately. Instead, it marks the
data for deletion, which prevents the data from being included in any
subsequent data retrieval operations. Because the underlying files that form
an HBase table (HFiles) are immutable, storage for deleted data will not be
recovered until an administrator initiates a major compaction operation. This
operation consolidates data and reconciles deletions by removing both the
deleted data and the delete indicator.
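If you're curious, an administrator can trigger such a compaction manually from the HBase shell. The command below is shown only for reference and isn't needed for this lab:

major_compact 'reviews'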
2.5. Dropping a table
In this exercise, you'll learn how to drop an HBase table. Because we want to retain the reviews table
for future exercises, you'll create a new table, verify its existence, and then drop it.
__1. Create a sample table with 1 column family.
create 'sample', 'cf1'
__2. Describe the table or verify that it exists. Issue one of these two commands:
describe 'sample'
exists 'sample'
__3. Disable the table you just created. (Before you can drop a table, you must disable or deactivate
it.)
disable 'sample'
__4. Drop the table.
drop 'sample'
__5. Verify that the table no longer exists.
exists 'sample'
2.6. Exploring the impact of your work
Are you curious about how your work has affected your HBase environment? This exercise helps you
explore some of the metadata available to you about your table as well as your overall HBase
environment.
__1. Launch a Web browser.
__2. Enter the URL and port of your HBase Master Server Web interface (by default, this is port
60010). For example, if your host name is myhost.ibm.com and the HBase Master Service Web
interface port is 60010, you would enter http://myhost.ibm.com:60010.
Locating the HBase Master information port
By default, the HBase Master Web interface port is 60010 on your host
machine. This information is configured in the . . . /conf/hbase-site.xml file
within the installation directory of your HBase Master (e.g.,
/usr/iop/current/hbase-master). Look for the hbase.master.info.port
property in the hbase-site.xml file.
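For example, assuming the default HBase Master installation directory used in this lab, you could check the configured value from a terminal window with a command along these lines:

grep -A 1 'hbase.master.info.port' /usr/iop/current/hbase-master/conf/hbase-site.xml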
__3. Scroll to the Tables section and click the User Tables tab. Note that your reviews table is
present and that its data resides in 1 region (because your table is very small). Also note that
the description highlights a few important aspects of your table, including its column families and
properties that you altered to non-default values.
__4. Click the [Details] link at top to display further details about the tables in your HBase server.
__5. If necessary, scroll down to locate data for your table. Note that all your table's properties and
their values are displayed here.
Where have you seen similar output?
This output should look familiar to you. Indeed, in a previous exercise you
issued the describe command from the HBase shell to inspect the status of
your table (enabled/disabled) as well as its properties.
__6. Click the Back button on your browser to return to the previous page (the main page for the
HBase Master Server Web interface).
__7. In the Tables section, click on the User Tables tab and locate your reviews table again. Click on
the link for this table.
__8. Note the Region Server link for your table and click on it. You’ll find this information in the Table
Regions section of the displayed page.
__9. After you have been redirected to the HBase Region Server Web interface, skim through the
information displayed.
Locating the HBase Region Server information port
By default, the HBase Region Server Web interface port is 60030. This
information is configured in the …/conf/hbase-site.xml file within the
installation directory of your HBase region server. Look for the
hbase.regionserver.info.port property.
__10. In the Server Metrics section at top, click on the Requests tab to display the total read and write
request counts for your server since its launch. (The screen capture below was taken from a
system with 264 read requests and 11 write requests. As with other screen captures, your data
may vary.)
__11. Scroll to the Regions section towards the bottom of the page.
__12. Click on the Request metrics tab in the Regions section to determine the number of read and
write requests for the reviews table. (This screen capture was taken after 18 read requests and
15 write requests were issued.)
__13. If necessary, open a terminal window. From the Linux/Unix command line (not the HBase shell),
list the contents of the HBase data directory in your DFS. The HBase data directory location is
determined at installation. In the example below, this directory is
/apps/hbase/data/data/default. Note that a subdirectory for your reviews table is present.
hdfs dfs -ls /apps/hbase/data/data/default/
__14. Explore the contents of the …/reviews directory and look for a subdirectory with a long,
system-generated name.
hdfs dfs -ls /apps/hbase/data/data/default/reviews
__15. Investigate the contents of this subdirectory and note that it contains 1 subdirectory for each
column family of your table. (Substitute the system-generated subdirectory name in your
environment for the sample name shown below.) Recall that HBase physically organizes data in
your table by column family, which is reflected here.
hdfs dfs -ls
/apps/hbase/data/data/default/reviews/c628ccb212a396d1d97a954504470412
__16. If you'd like to learn how to explore your DFS contents using a Web browser, continue with the
remainder of this lab module. Otherwise, skip to the next lab.
__17. If necessary, determine the URL for your Name Node's Web interface. In your Hadoop client
directory (by default /usr/iop/current/hadoop-client), browse the …/conf/hdfs-site.xml
file. Note the setting for the dfs.namenode.http-address property. By default, this
will be at port 50070 of your Name Node's host.
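Alternatively, the hdfs getconf utility can report the effective value of this property directly, which saves you from browsing the XML file by hand:

hdfs getconf -confKey dfs.namenode.http-address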
__18. Direct your browser to the Name Node's HTTP address and verify that your display is similar to
this:
__19. In the menu at top, click the arrow next to the Utilities tab to expose a drop-down menu.
Select Browse the file system.
__20. Navigate through the DFS directory tree, investigating the contents of your HBase database. For
example, the screen capture below displays the contents of the HBase subdirectory for the
reviews table.
Lab 3 Using Big SQL to create and query HBase tables
Although HBase provides useful data storage and retrieval capabilities, it lacks a rich query language.
Fortunately, IBM's Big SQL technology, a component of BigInsights, enables programmers to use
industry-standard SQL to create, populate, and query Big SQL tables managed by HBase. Native HBase
data retrieval operations (such as GET and SCAN) can be performed on these tables, too. This lab
introduces you to the basics of using Big SQL with HBase. Later, you'll have a chance to explore
additional topics, such as data modeling.
Prior to starting this lab, you must have access to a BigInsights 4.1 environment with Big SQL running.
In addition, you need to know how to connect to your Big SQL server and how to issue a Big SQL query
using JSqsh or another query tool. If necessary, consult the BigInsights product documentation before
proceeding.
After you complete the lessons in this module, you will understand how to:
• Create Big SQL tables that use HBase as the underlying storage mechanism.
• Populate these tables using Big SQL INSERT and LOAD operations.
• Query these tables using projections and restrictions.
• Create views over Big SQL tables managed by HBase and query these views.
Allow 30 minutes to complete this lab.
3.1. Experimenting with your first Big SQL HBase table
To get started, you'll create a Big SQL table named bigsqllab.reviews to capture information about
product reviews. In contrast to the previous reviews table that you created natively in HBase, this Big
SQL table will have a pre-defined set of columns just like any other SQL-based table. Moreover, your
bigsqllab.reviews table definition will also specify how to map these SQL columns to HBase table
attributes -- specifically, a row key and columns in specific column families.
__1. If necessary, refresh your memory about the HBase reviews table created in an earlier lab.
Recall that this table tracked reviews that users posted about various products. The table had 3
column families -- summary, reviewer, and details. And, of course, it had a row key.
__2. Launch your Big SQL query execution environment and connect to your Big SQL database
following the standard process appropriate for your query environment. For example, if you're
using JSqsh and you previously created a Big SQL database connection named bigsql, you
might enter
/usr/ibmpacks/common-utils/current/jsqsh/bin/jsqsh bigsql
Modify this sample command as needed to match your environment.
__3. Create a Big SQL table named bigsqllab.reviews managed by HBase.
CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.REVIEWS (
REVIEWID varchar(10) primary key not null,
PRODUCT varchar(30),
RATING int,
REVIEWERNAME varchar(30),
REVIEWERLOC varchar(30),
COMMENT varchar(100),
TIP varchar(100)
)
COLUMN MAPPING
(
key mapped by (REVIEWID),
summary:product mapped by (PRODUCT),
summary:rating mapped by (RATING),
reviewer:name mapped by (REVIEWERNAME),
reviewer:location mapped by (REVIEWERLOC),
details:comment mapped by (COMMENT),
details:tip mapped by (TIP)
);
About this CREATE HBASE TABLE statement . . . .
This statement creates a Big SQL table named REVIEWS in the BIGSQLLAB
schema and instructs Big SQL to use HBase as the underlying storage manager.
The COLUMN MAPPING clause specifies how SQL columns are to be mapped to
HBase columns in column families. For example, the SQL REVIEWERNAME
column is mapped to the HBase column family:column of 'reviewer:name'.
For simplicity, this example uses a 1:1 mapping of SQL columns to HBase
columns. It also uses a single SQL column as the HBase row key. As you'll see
in a subsequent lab, you may want (or need) to map multiple SQL columns to one
HBase column or to the HBase row key. Big SQL supports doing so.
Also for simplicity, we've accepted various defaults for this table, including default
HBase column family property settings (for VERSIONS and others) and binary
encoding for storage.
Finally, note that the SQL REVIEWID column was defined as the SQL primary
key. This is an informational constraint that can be useful for query optimization.
However, it isn't actively enforced.
__4. If necessary, open a terminal window.
Use Hadoop file system commands to verify that the table was created. List the contents of the
HBase data directory and confirm that a bigsqllab.reviews subdirectory exists.
hdfs dfs -ls /apps/hbase/data/data/default
The root HBase directory is determined at installation. The examples in this lab were
developed for an environment in which HBase was configured to store data in
/apps/hbase/data/data/default. If your HBase configuration is different, adjust
the commands as needed to match your environment.
__5. List the contents of your table's subdirectory. Observe that this subdirectory contains another
subdirectory with a system-generated name. (In the screen capture below, this is shown in the
final line.)
hdfs dfs -ls /apps/hbase/data/data/default/bigsqllab.reviews
__6. List the contents of the . . . /bigsqllab.reviews subdirectory with the system-generated name.
(In the example above, this is the . . . 388912839f9df25a0f11996adede38b3 subdirectory.)
Adjust the path specification below to match your environment.
hdfs dfs -ls /apps/hbase/data/data/default/bigsqllab.reviews/388912839f9df25a0f11996adede38b3
Note that there are additional subdirectories for each of the 3 column families (summary,
reviewer, details) specified in the COLUMN MAPPING clause of your CREATE HBASE TABLE
statement.
__7. Return to your Big SQL execution environment.
__8. Insert a row into your Big SQL reviews table.
insert into bigsqllab.reviews
values ('198','scarf','2','Bruno',null,'Feels cheap',null);
Note that this INSERT statement looks the same as any other SQL statement. You didn't need
to insert one HBase cell value at a time, and you didn't need to understand the underlying HBase
table structure.
__9. Insert another row into your table, specifying values for only a subset of its columns.
insert into bigsqllab.reviews (reviewid, product, rating, reviewername)
values ('298','gloves','3','Beppe');
__10. Use SQL to count the number of rows stored in your table, verifying that 2 are present.
select count(*) from bigsqllab.reviews;
__11. Execute a simple query to return specific information about reviews of products rated 3 or higher.
select reviewid, product, reviewername, reviewerloc
from bigsqllab.reviews
where rating >= 3;
As expected, only 1 row is returned.
Again, note that your SELECT statement looks like any other SQL SELECT -- you didn't need to
add any special code because the underlying storage manager is HBase.
__12. Launch the HBase shell and verify that you can work directly with the bigsqllab.reviews table
from the shell. To do so, scan the table.
scan 'bigsqllab.reviews'
As you would expect, the final line of output reports that there are 2 rows in your table. In case
you're curious, \x00 is both the non-null marker and also the terminator used for variable length
binary encoded values.
3.2. Creating views over Big SQL HBase tables
You can create views over Big SQL tables stored in HBase just as you can create views over Big SQL
tables that store data in the Hive warehouse or in simple DFS files. Creating views over Big SQL HBase
tables is straightforward, as you'll see in this exercise.
__1. From your Big SQL query execution environment, create a view based on a subset of the
bigsqllab.reviews table that you created earlier.
create view bigsqllab.testview as
select reviewid, product, reviewername, reviewerloc
from bigsqllab.reviews
where rating >= 3;
Note that this view definition looks like any other SQL view definition. You didn't need to specify
any syntax unique to Hadoop or HBase.
__2. Query the view.
select reviewid, product, reviewername
from bigsqllab.testview;
3.3. Loading data into a Big SQL HBase table
In this exercise, you'll explore how to use the Big SQL LOAD command to load data from a file into a Big
SQL table managed by HBase. To do so, you will use sample data shipped with Big SQL that is typically
installed with Big SQL client software. By default, this data is installed at
/usr/ibmpacks/bigsql/4.1/bigsql/samples/data. It is also available online at
https://hub.jazz.net/project/jayatheerthan/BigSQLSamples/overview#https://hub.jazz.net/git/jayatheerthan%252FBigSQLSamples/list/master/samples/data.
The sample data represents data exported from a data warehouse that tracks sales of outdoor products.
It includes a series of FACT and DIMENSION tables. For this lab, you will create 1 DIMENSION table
and load sample data from 1 file into it.
__1. Create a Big SQL table in HBase named bigsqllab.sls_product_dim.
-- product dimension table
CREATE HBASE TABLE IF NOT EXISTS bigsqllab.sls_product_dim
( product_key INT PRIMARY KEY NOT NULL
, product_line_code INT NOT NULL
, product_type_key INT NOT NULL
, product_type_code INT NOT NULL
, product_number INT NOT NULL
, base_product_key INT NOT NULL
, base_product_number INT NOT NULL
, product_color_code INT
, product_size_code INT
, product_brand_key INT NOT NULL
, product_brand_code INT NOT NULL
, product_image VARCHAR(60)
, introduction_date TIMESTAMP
, discontinued_date TIMESTAMP
)
COLUMN MAPPING
(
key mapped by (PRODUCT_KEY),
data:line_code mapped by (PRODUCT_LINE_CODE),
data:type_key mapped by (PRODUCT_TYPE_KEY),
data:type_code mapped by (PRODUCT_TYPE_CODE),
data:number mapped by (PRODUCT_NUMBER),
data:base_key mapped by (BASE_PRODUCT_KEY),
data:base_number mapped by (BASE_PRODUCT_NUMBER),
data:color mapped by (PRODUCT_COLOR_CODE),
data:size mapped by (PRODUCT_SIZE_CODE),
data:brand_key mapped by (PRODUCT_BRAND_KEY),
data:brand_code mapped by (PRODUCT_BRAND_CODE),
data:image mapped by (PRODUCT_IMAGE),
data:intro_date mapped by (INTRODUCTION_DATE),
data:discon_date mapped by (DISCONTINUED_DATE)
);
HBase design for this table
The HBase specification of this Big SQL statement placed nearly all SQL
columns into a single HBase column family named 'data'. As you know,
HBase creates physical files for each column family; this is something you
should take into consideration when designing your table's structure. It's often
best to keep the number of column families per table small unless your
workload involves many queries over mutually exclusive columns. For
example, if you knew that PRODUCT_IMAGE data was rarely queried or
frequently queried alone, you might want to store it separately (i.e., in a
different column family:column).
There's something else worth noting about this table definition: the HBase
columns have shorter names than the SQL columns. As you saw in earlier
exercises, HBase stores full key information (row key, column family name,
column name, and timestamp) with the key value. This consumes disk space.
If you keep the HBase column family and column names short and specify
longer, more meaningful names for the SQL columns, your design will
minimize disk consumption and remain friendly to SQL programmers.
__2. Load data into the table. Change the SFTP and file path specifications shown below to
match your environment. This statement will return a warning message providing details on
the number of rows loaded, etc.
LOAD HADOOP using file url
'sftp://yourID:yourPassword@myhost.ibm.com:22/your-dir/data/GOSALESDW.SLS_PRODUCT_DIM.txt'
with SOURCE PROPERTIES ('field.delimiter'='\t')
INTO TABLE bigsqllab.sls_product_dim;
A closer look at LOAD . . . .
Let’s explore the LOAD syntax briefly. This statement loads data into a table
using a file URL specification that relies on SFTP to locate the source file. In
particular, the SFTP specification includes a valid user ID and password
(yourID/yourPassword), the target host server and port
(myhost.ibm.com:22), and the full path of the data file on that system. The
WITH SOURCE PROPERTIES clause specifies that fields in the source data
are delimited by tabs ('\t'). The INTO TABLE clause identifies the target table
for the LOAD operation.
Using SFTP (or FTP) is one way in which you can invoke the LOAD command.
If your target data already resides in your distributed file system, you can
provide the DFS directory information in your file URL specification. Indeed, for
optimal runtime performance, you may prefer to take that approach. In
addition, you can load data directly from a remote relational DBMS via a JDBC
connection. See the BigInsights Knowledge Center (product documentation)
for details.
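For example, if you had already copied the sample file into your distributed file system, a LOAD statement along these lines could read it directly from DFS. (The /user/bigsql/data path below is purely illustrative; substitute a directory from your own environment.)

LOAD HADOOP using file url '/user/bigsql/data/GOSALESDW.SLS_PRODUCT_DIM.txt'
with SOURCE PROPERTIES ('field.delimiter'='\t')
INTO TABLE bigsqllab.sls_product_dim;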
__3. Count the number of rows in the table to verify that 274 are present.
-- total rows in SLS_PRODUCT_DIM = 274
select count(*) from bigsqllab.SLS_PRODUCT_DIM;
__4. Optionally, query the table.
select product_key, introduction_date, product_color_code
from bigsqllab.sls_product_dim
where product_key > 30000
fetch first 5 rows only;
3.4. Optional: Dropping tables created in this lab
If you don't plan to complete any subsequent labs, you may want to clean up your environment. This
optional exercise shows you how to drop the tables you created in this lab.
__1. Drop the bigsqllab.reviews table and the bigsqllab.sls_product_dim table.
drop table bigsqllab.reviews;
drop table bigsqllab.sls_product_dim;
__2. Verify that these tables no longer exist. For example, query each table and confirm that you
receive an error message indicating that the table name you provided is undefined (SQLCODE
-204, SQLSTATE 42704).
select count(*) from bigsqllab.reviews;
select count(*) from bigsqllab.sls_product_dim;
Lab 4 Exploring data modeling options
In this lab, you'll explore a few options for modeling data exported from a relational DBMS into Big SQL
tables managed by HBase. In the previous lab, you saw how you could map data that was originally
modeled as a single relational DBMS table into a Big SQL table. You even saw how you could integrate
some minor HBase design enhancements into your Big SQL table, such as grouping frequently queried
columns together into a single column family and using short names for HBase columns and column
families. However, for simplicity, you looked at your dimension table in isolation; you didn't consider how
this one table related to other tables in the original database.
Although many HBase applications involve managing data that's outside the typical scope of a relational
database, some organizations look to HBase as a potential storage mechanism for offloading seldom-
queried relational data, including "cold" or "stale" data in a relational data warehouse. Such data is often
spread across multiple tables in a normalized (or somewhat normalized) manner. Relationships among
these tables are captured through primary key/foreign key constraints. Indeed, star schema or snowflake
schema database designs are common in such scenarios. Furthermore, in some relational database
designs, primary keys may not be present in all tables or may consist of multiple SQL columns.
In addition, many relational tables are densely populated (i.e., they contain relatively few nulls).
Finally, queries often involve multi-way joins.
Such characteristics can pose challenges when attempting to port the relational design to Big SQL tables
managed by HBase. This lab introduces you to some of these challenges and explores options for
addressing them. While a full discussion of Big SQL and HBase data modeling topics is beyond the
scope of this introductory lab, you'll have a chance to explore:
• Many-to-one mapping of relational tables to an HBase table (de-normalization).
• Many-to-one mapping of relational table columns to an HBase column.
• Generation of unique row key values.
Sample data for this lab is based on sample warehouse data available as part of your BigInsights
installation and publicly at
https://hub.jazz.net/project/jayatheerthan/BigSQLSamples/overview#https://hub.jazz.net/git/jayatheerthan%252FBigSQLSamples/list/master/samples/data.
Allow 1/2 - 1 hour to complete this lab. Prior to starting this lab, you need to know how to connect to your
BigInsights Big SQL database and how to issue a Big SQL query using JSqsh or another supported
query tool.
4.1. De-normalizing relational tables (mapping multiple tables to one
HBase table)
Since multi-way join processing isn't a strength of HBase, you may need to de-normalize a traditional
relational database design to implement an efficient HBase design for your Big SQL tables. Of course,
Big SQL doesn't mandate the use of HBase -- you can use Hive or simple DFS files -- but let's assume
you concluded that you wanted to use HBase as the underlying storage manager.
In this exercise, you will explore one approach for mapping two tables along the PRODUCT dimension of
the relational data warehouse into a single Big SQL table. Since the content for the source tables is
stored in two separate files that each contain different sets of fields, you won't be able to directly load
these files into your target table. Instead, you'll need to pre-process or transform the data before using it
to populate your Big SQL table. You'll use Big SQL to help you with that task.
Specifically, you'll upload the source files into your DFS using standard Hadoop file system commands.
Next, you'll create Big SQL externally managed tables over these files. Doing so simply layers a SQL
schema over these files -- it does not cause the data to be duplicated or copied into the Hive warehouse.
Finally, you'll select the data you want from these external tables and create a Big SQL HBase table
based on that result set.
__1. If necessary, open a terminal window.
__2. Check the directory permissions for your DFS.
hdfs dfs -ls /
If the /user directory cannot be written by the public (as shown in the example above), you will
need to change these permissions so that you can create the necessary subdirectories for this
lab using your standard lab user account.
From the command line, issue this command to switch to the root user ID temporarily:
su root
When prompted, enter the password for this account. Then switch to the hdfs account.
su hdfs
While logged in as user hdfs, issue this command:
hdfs dfs -chmod 777 /user
Next, confirm the effect of your change:
hdfs dfs -ls /
Exit the hdfs user account.
exit
Finally, exit the root user account and return to your standard user account:
exit
__3. Create directories in your distributed file system for the source data files. If desired, alter the
DFS information as appropriate for your environment.
hdfs dfs -mkdir /user/hbase_lab
hdfs dfs -mkdir /user/hbase_lab/sls_product_dim
hdfs dfs -mkdir /user/hbase_lab/sls_product_line_lookup
__4. Upload the source data files into their respective DFS directories. Change the local and DFS
directories information below to match your environment.
hdfs dfs -copyFromLocal /your-dir/data/GOSALESDW.SLS_PRODUCT_DIM.txt
/user/hbase_lab/sls_product_dim/SLS_PRODUCT_DIM.txt
hdfs dfs -copyFromLocal /your-dir/data/GOSALESDW.SLS_PRODUCT_LINE_LOOKUP.txt
/user/hbase_lab/sls_product_line_lookup/SLS_PRODUCT_LINE_LOOKUP.txt
__5. List the contents of the DFS directories into which you copied the files to validate your work.
hdfs dfs -ls /user/hbase_lab/sls_product_dim
hdfs dfs -ls /user/hbase_lab/sls_product_line_lookup
__6. Ensure that the contents of your new DFS directories are publicly accessible.
hdfs dfs -chmod -R 777 /user/hbase_lab
__7. If necessary, launch your Big SQL query execution environment.
__8. Create external Big SQL tables for the sales product dimension (extern.sls_product_dim)
and the sales product line lookup (extern.sls_product_line_lookup) tables. Note that the
LOCATION clause in each statement references the DFS directory into which you copied the
sample data.
-- product dimension table
CREATE EXTERNAL HADOOP TABLE IF NOT EXISTS extern.sls_product_dim
( product_key INT NOT NULL
, product_line_code INT NOT NULL
, product_type_key INT NOT NULL
, product_type_code INT NOT NULL
, product_number INT NOT NULL
, base_product_key INT NOT NULL
, base_product_number INT NOT NULL
, product_color_code INT
, product_size_code INT
, product_brand_key INT NOT NULL
, product_brand_code INT NOT NULL
, product_image VARCHAR(60)
, introduction_date TIMESTAMP
, discontinued_date TIMESTAMP
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
location '/user/hbase_lab/sls_product_dim';
-- look up table with product line info in various languages
CREATE EXTERNAL HADOOP TABLE IF NOT EXISTS extern.sls_product_line_lookup
( product_line_code INT NOT NULL
, product_line_en VARCHAR(90) NOT NULL
, product_line_de VARCHAR(90), product_line_fr VARCHAR(90)
, product_line_ja VARCHAR(90), product_line_cs VARCHAR(90)
, product_line_da VARCHAR(90), product_line_el VARCHAR(90)
, product_line_es VARCHAR(90), product_line_fi VARCHAR(90)
, product_line_hu VARCHAR(90), product_line_id VARCHAR(90)
, product_line_it VARCHAR(90), product_line_ko VARCHAR(90)
, product_line_ms VARCHAR(90), product_line_nl VARCHAR(90)
, product_line_no VARCHAR(90), product_line_pl VARCHAR(90)
, product_line_pt VARCHAR(90), product_line_ru VARCHAR(90)
, product_line_sc VARCHAR(90), product_line_sv VARCHAR(90)
, product_line_tc VARCHAR(90), product_line_th VARCHAR(90)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
location '/user/hbase_lab/sls_product_line_lookup';
If you encounter a SQL -5105 error message such as the one shown below, the DFS directory
permissions for your target directory (e.g., /user/hbase_lab) may be too restrictive.
From an OS terminal window, issue this command:
hdfs dfs -ls /user/hbase_lab/
Your permissions must include rw settings. Consult the earlier steps in this lab for instructions
on how to reset DFS directory permissions.
__9. Verify that you can query each table.
-- total rows in EXTERN.SLS_PRODUCT_DIM = 274
select count(*) from EXTERN.SLS_PRODUCT_DIM;
-- total rows in EXTERN.SLS_PRODUCT_LINE_LOOKUP = 5
select count(*) from EXTERN.SLS_PRODUCT_LINE_LOOKUP;
__10. Become familiar with the content of the extern.sls_product_dim table.
select * from extern.sls_product_dim fetch first 3 rows only;
Note that most of the columns contain only numeric codes; these columns typically serve as join
keys for other tables in the PRODUCT dimension that contain more descriptive information.
__11. Become familiar with the contents of the extern.sls_product_line_lookup table by selecting
data in a few of its columns:
select product_line_code, product_line_en, product_line_de, product_line_fr
from extern.sls_product_line_lookup;
Note that the PRODUCT_LINE_CODE column in this table is likely to be joined with the
PRODUCT_LINE_CODE column of the sales product dimension table you created earlier. This
will make it easy for you to de-normalize (or flatten) these tables into a single HBase table with a
unique row key based on the product key.
__12. Execute a SELECT statement to join the two tables. In a moment, you'll use this statement as a
basis for creating a new HBase table modeled after the result set of this query. (For simplicity,
you will only retrieve a subset of columns in each.)
select product_key, d.product_line_code, product_type_key,
product_type_code, product_line_en, product_line_de
from extern.sls_product_dim d, extern.sls_product_line_lookup l
where d.product_line_code = l.product_line_code
fetch first 3 rows only;
__13. Verify that the following join of these tables will produce a result set with 274 rows -- the same
number as the extern.sls_product_dim table. After all, you simply want to add more
descriptive information (extracted from the extern.sls_product_line_lookup table) to its
contents in your new HBase table.
-- this query should return 274
select count(*)
from extern.sls_product_dim d, extern.sls_product_line_lookup l
where d.product_line_code = l.product_line_code;
__14. Now you’re ready to create a Big SQL HBase table derived from your join query. In effect, you’re
using Big SQL to transform source data in 2 files into a format suitable for a single HBase table.
Issue this statement:
-- flatten 2 product tables into 1 hbase table
CREATE hbase TABLE IF NOT EXISTS bigsqllab.sls_product_flat
( product_key INT NOT NULL
, product_line_code INT NOT NULL
, product_type_key INT NOT NULL
, product_type_code INT NOT NULL
, product_line_en VARCHAR(90)
, product_line_de VARCHAR(90)
)
column mapping
(
key mapped by (product_key),
data:c2 mapped by (product_line_code),
data:c3 mapped by (product_type_key),
data:c4 mapped by (product_type_code),
data:c5 mapped by (product_line_en),
data:c6 mapped by (product_line_de)
)
as select product_key, d.product_line_code, product_type_key,
product_type_code, product_line_en, product_line_de
from extern.sls_product_dim d, extern.sls_product_line_lookup l
where d.product_line_code = l.product_line_code;
__15. Verify that there are 274 rows in the table you just created.
select count(*) from bigsqllab.sls_product_flat;
__16. Query the bigsqllab.sls_product_flat table.
select product_key, product_line_code, product_line_en
from bigsqllab.sls_product_flat
where product_key > 30270;
4.2. Generating unique row key values
In this exercise, you’ll consider a situation in which the original relational table didn’t have a single
column serving as its primary key. Thus, a one-to-one mapping of this table to a Big SQL table won’t be
appropriate unless you take some additional action, such as using the FORCE KEY UNIQUE clause of
the Big SQL CREATE HBASE TABLE statement. You’ll explore that approach here.
__1. Become familiar with the schema for the SLS_SALES_FACT table exported from a relational
database. Inspect the details below, and note that the primary key for this table is comprised of
several columns.
Table: SLS_SALES_FACT
Columns:
ORDER_DAY_KEY
ORGANIZATION_KEY
EMPLOYEE_KEY
RETAILER_KEY
RETAILER_SITE_KEY
PRODUCT_KEY
PROMOTION_KEY
ORDER_METHOD_KEY
SALES_ORDER_KEY
SHIP_DAY_KEY
CLOSE_DAY_KEY
QUANTITY
UNIT_COST
UNIT_PRICE
UNIT_SALE_PRICE
GROSS_MARGIN
SALE_TOTAL
GROSS_PROFIT
Primary Key:
ORDER_DAY_KEY
ORGANIZATION_KEY
EMPLOYEE_KEY
RETAILER_KEY
RETAILER_SITE_KEY
PRODUCT_KEY
PROMOTION_KEY
ORDER_METHOD_KEY
Foreign Key: PRODUCT_KEY
Parent table: SLS_PRODUCT_DIM
__2. If necessary, open a terminal window.
__3. Change directories to the location of the sample data in your local file system. Alter the directory
specification shown below to match your environment.
cd /usr/ibmpacks/bigsql/4.1/bigsql/samples/data
__4. Count the number of lines (records) in the GOSALESDW.SLS_SALES_FACT.txt file, verifying that
446023 are present:
wc -l GOSALESDW.SLS_SALES_FACT.txt
__5. If necessary, launch your Big SQL query execution environment.
__6. From your query execution environment, create a Big SQL HBase table named
bigsqllab.sls_sales_fact_unique. Include a FORCE KEY UNIQUE clause with the row key
specification to instruct Big SQL to append additional data to the ORDER_DAY_KEY values to
ensure that each input record results in a unique value (and therefore a new row in the HBase
table). This additional data won’t be visible to users who query the table.
CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.SLS_SALES_FACT_UNIQUE
(
ORDER_DAY_KEY int,
ORGANIZATION_KEY int,
EMPLOYEE_KEY int,
RETAILER_KEY int,
RETAILER_SITE_KEY int,
PRODUCT_KEY int,
PROMOTION_KEY int,
ORDER_METHOD_KEY int,
SALES_ORDER_KEY int,
SHIP_DAY_KEY int,
CLOSE_DAY_KEY int,
QUANTITY int,
UNIT_COST decimal(19,2),
UNIT_PRICE decimal(19,2),
UNIT_SALE_PRICE decimal(19,2),
GROSS_MARGIN double,
SALE_TOTAL decimal(19,2),
GROSS_PROFIT decimal(19,2)
)
COLUMN MAPPING
(
key mapped by (ORDER_DAY_KEY) force key unique,
cf_data:cq_ORGANIZATION_KEY mapped by (ORGANIZATION_KEY),
cf_data:cq_EMPLOYEE_KEY mapped by (EMPLOYEE_KEY),
cf_data:cq_RETAILER_KEY mapped by (RETAILER_KEY),
cf_data:cq_RETAILER_SITE_KEY mapped by (RETAILER_SITE_KEY),
cf_data:cq_PRODUCT_KEY mapped by (PRODUCT_KEY),
cf_data:cq_PROMOTION_KEY mapped by (PROMOTION_KEY),
cf_data:cq_ORDER_METHOD_KEY mapped by (ORDER_METHOD_KEY),
cf_data:cq_SALES_ORDER_KEY mapped by (SALES_ORDER_KEY),
cf_data:cq_SHIP_DAY_KEY mapped by (SHIP_DAY_KEY),
cf_data:cq_CLOSE_DAY_KEY mapped by (CLOSE_DAY_KEY),
cf_data:cq_QUANTITY mapped by (QUANTITY),
cf_data:cq_UNIT_COST mapped by (UNIT_COST),
cf_data:cq_UNIT_PRICE mapped by (UNIT_PRICE),
cf_data:cq_UNIT_SALE_PRICE mapped by (UNIT_SALE_PRICE),
cf_data:cq_GROSS_MARGIN mapped by (GROSS_MARGIN),
cf_data:cq_SALE_TOTAL mapped by (SALE_TOTAL),
cf_data:cq_GROSS_PROFIT mapped by (GROSS_PROFIT)
);
__7. Load data into this table. Adjust the user ID, password, and directory information as needed for
your environment.
LOAD HADOOP using file url
'sftp://yourID:yourPassword@myhost.ibm.com:22/your-dir/GOSALESDW.SLS_SALES_FACT.txt'
with SOURCE PROPERTIES ('field.delimiter'='\t')
INTO TABLE bigsqllab.sls_sales_fact_unique;
__8. Verify that the table contains the expected number of rows (446023).
select count(*) from bigsqllab.sls_sales_fact_unique;
__9. Query the table.
select order_day_key, product_key, sale_total from bigsqllab.sls_sales_fact_unique
where order_day_key BETWEEN 20040112 and 20040115
fetch first 5 rows only;
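If you're curious about the row key values that Big SQL generated, you can optionally inspect the
underlying HBase table from the HBase shell. This is an illustrative check only: the table name below
assumes the default mapping of the Big SQL schema and table name to a single HBase table named
bigsqllab.sls_sales_fact_unique; if your environment uses a different name, locate it under User
Tables in the HBase Master web interface. Each row key should begin with an ORDER_DAY_KEY value
followed by additional bytes that Big SQL appended to guarantee uniqueness.
scan 'bigsqllab.sls_sales_fact_unique', {LIMIT => 3}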
__10. Optionally, investigate what happens if you omit the FORCE KEY UNIQUE clause when
creating this table.
1. Create a table named bigsqllab.sls_sales_fact_nopk that omits the FORCE
KEY UNIQUE clause from the row key definition.
CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.SLS_SALES_FACT_NOPK
(
ORDER_DAY_KEY int,
ORGANIZATION_KEY int,
EMPLOYEE_KEY int,
RETAILER_KEY int,
RETAILER_SITE_KEY int,
PRODUCT_KEY int,
PROMOTION_KEY int,
ORDER_METHOD_KEY int,
SALES_ORDER_KEY int,
SHIP_DAY_KEY int,
CLOSE_DAY_KEY int,
QUANTITY int,
UNIT_COST decimal(19,2),
UNIT_PRICE decimal(19,2),
UNIT_SALE_PRICE decimal(19,2),
GROSS_MARGIN double,
SALE_TOTAL decimal(19,2),
GROSS_PROFIT decimal(19,2)
)
COLUMN MAPPING
(
key mapped by (ORDER_DAY_KEY),
cf_data:cq_ORGANIZATION_KEY mapped by (ORGANIZATION_KEY),
cf_data:cq_EMPLOYEE_KEY mapped by (EMPLOYEE_KEY),
cf_data:cq_RETAILER_KEY mapped by (RETAILER_KEY),
cf_data:cq_RETAILER_SITE_KEY mapped by (RETAILER_SITE_KEY),
cf_data:cq_PRODUCT_KEY mapped by (PRODUCT_KEY),
cf_data:cq_PROMOTION_KEY mapped by (PROMOTION_KEY),
cf_data:cq_ORDER_METHOD_KEY mapped by (ORDER_METHOD_KEY),
cf_data:cq_SALES_ORDER_KEY mapped by (SALES_ORDER_KEY),
cf_data:cq_SHIP_DAY_KEY mapped by (SHIP_DAY_KEY),
cf_data:cq_CLOSE_DAY_KEY mapped by (CLOSE_DAY_KEY),
cf_data:cq_QUANTITY mapped by (QUANTITY),
cf_data:cq_UNIT_COST mapped by (UNIT_COST),
cf_data:cq_UNIT_PRICE mapped by (UNIT_PRICE),
cf_data:cq_UNIT_SALE_PRICE mapped by (UNIT_SALE_PRICE),
cf_data:cq_GROSS_MARGIN mapped by (GROSS_MARGIN),
cf_data:cq_SALE_TOTAL mapped by (SALE_TOTAL),
cf_data:cq_GROSS_PROFIT mapped by (GROSS_PROFIT)
);
2. Load data into this table from your source file. Adjust the file URL specification as
needed to match your environment.
LOAD HADOOP using file url
'sftp://yourID:yourPassword@myhost.ibm.com:22/your-dir/GOSALESDW.SLS_SALES_FACT.txt'
with SOURCE PROPERTIES ('field.delimiter'='\t')
INTO TABLE bigsqllab.sls_sales_fact_nopk;
3. Count the number of rows in your table. Note that there are only 440 rows.
select count(*) from bigsqllab.sls_sales_fact_nopk;
4. Consider what just occurred. You loaded a file with 446023 records into your Big SQL
HBase table without error, yet only 440 rows are present in your table. That's because
HBase ensures that each row key is unique. If you put 5 different records with the same
row key into a native HBase table, your HBase table will contain only 1 current row for
that row key. Because you mapped a SQL column with non-unique values to the HBase
row key, HBase essentially updated the information for those rows containing duplicate
row key values.
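If you'd like to see this row key behavior natively, you can experiment with a small, throwaway
HBase table from the HBase shell. The table name and values below are purely illustrative.
create 'keytest', 'cf'
put 'keytest', '20040112', 'cf:qty', '10'
put 'keytest', '20040112', 'cf:qty', '25'
count 'keytest'
get 'keytest', '20040112'
disable 'keytest'
drop 'keytest'
The count reports only 1 row, and the get returns the most recent value (25); because both puts
used the same row key, the second put effectively replaced the first.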
4.3. Mapping multiple SQL columns to one HBase row key or column
Until now, you've mapped each field from a source file (i.e., each SQL column in the source relational
table) to a single Big SQL HBase column. Although straightforward to implement, this one-to-one
mapping approach has a significant drawback: as you learned earlier, HBase stores full key information
(row key, column family name, column qualifier, and timestamp) along with each cell value, so a table
with many mapped columns repeats that key information for every column of every row and can consume
considerable disk space.
In this exercise, you'll explore two different many-to-one column mapping options. In particular, you'll
define a composite key for the HBase row key; in other words, your row key will be based on multiple
SQL columns. In addition, you'll define dense columns in your HBase table; in other words, one HBase
column will be based on multiple SQL columns.
__1. Consider the relational schema for the SLS_SALES_FACT table shown at the beginning of the
previous exercise. Recall that its primary key spanned several SQL columns, which you'll model
as a composite row key in your Big SQL HBase table. In addition, let's assume that some SQL
columns are commonly queried together, such as columns related to pricing and cost. Packing
these SQL columns into a single, dense HBase column can reduce the I/O required to read and
write this data.
__2. If necessary, launch your Big SQL query execution environment.
__3. Create a new Big SQL sales fact table named bigsqllab.sls_sales_fact_dense with a
composite key and dense columns.
CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.SLS_SALES_FACT_DENSE
(
ORDER_DAY_KEY int,
ORGANIZATION_KEY int,
EMPLOYEE_KEY int,
RETAILER_KEY int,
RETAILER_SITE_KEY int,
PRODUCT_KEY int,
PROMOTION_KEY int,
ORDER_METHOD_KEY int,
SALES_ORDER_KEY int,
SHIP_DAY_KEY int,
CLOSE_DAY_KEY int,
QUANTITY int,
UNIT_COST decimal(19,2),
UNIT_PRICE decimal(19,2),
UNIT_SALE_PRICE decimal(19,2),
GROSS_MARGIN double,
SALE_TOTAL decimal(19,2),
GROSS_PROFIT decimal(19,2)
)
COLUMN MAPPING
(
key mapped by
(ORDER_DAY_KEY, ORGANIZATION_KEY, EMPLOYEE_KEY, RETAILER_KEY, RETAILER_SITE_KEY,
PRODUCT_KEY, PROMOTION_KEY, ORDER_METHOD_KEY),
cf_data:cq_OTHER_KEYS mapped by
(SALES_ORDER_KEY, SHIP_DAY_KEY, CLOSE_DAY_KEY),
cf_data:cq_QUANTITY mapped by (QUANTITY),
cf_data:cq_MONEY mapped by
(UNIT_COST, UNIT_PRICE, UNIT_SALE_PRICE, GROSS_MARGIN, SALE_TOTAL, GROSS_PROFIT)
);
__4. Load data into this table. (Adjust the SFTP specification below to match your environment.)
LOAD HADOOP using file url
'sftp://yourID:yourPassword@myhost.ibm.com:22/usr/ibmpacks/bigsql/4.1/bigsql/samples/data/GOSALESDW.SLS_SALES_FACT.txt'
with SOURCE PROPERTIES ('field.delimiter'='\t')
INTO TABLE bigsqllab.sls_sales_fact_dense;
__5. Verify that the table contains the expected number of rows (446023).
select count(*) from bigsqllab.sls_sales_fact_dense;
__6. Query the table.
select order_day_key, product_key, sale_total from
bigsqllab.sls_sales_fact_dense where order_day_key BETWEEN 20040112 and
20040115 fetch first 5 rows only;
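Note that SALE_TOTAL is one of several monetary columns packed into the single dense HBase column
cf_data:cq_MONEY, yet the query above selects it just as it would any other column; Big SQL handles
the dense packing transparently. As a further illustration, the following query projects two values
stored in the same dense column:
select order_day_key, unit_cost, sale_total
from bigsqllab.sls_sales_fact_dense
where order_day_key BETWEEN 20040112 and 20040115
fetch first 5 rows only;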
__7. If you're curious about how the storage footprint of the "dense" model of your fact table compares
with the original model that mapped each SQL column to a separate HBase row key or column, open a
terminal window and execute this command:
hdfs dfs -du /apps/hbase/data/data/default | grep -i bigsqllab | sort
Compare the size of the bigsqllab.sls_sales_fact_dense table (shown in the first line of the
sample output) with the size of the bigsqllab.sls_sales_fact_unique table (shown in the
last line of the sample output). The "dense" table consumes much less space than the original
table.
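If you prefer human-readable sizes, you can optionally add the -h flag and narrow the output to the
sales fact tables; this variant assumes the same default HBase data directory shown above and that the
underlying HBase table directories include the Big SQL table names.
hdfs dfs -du -h /apps/hbase/data/data/default | grep -i sls_sales_fact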
4.4. Optional: Dropping tables created in this lab
If you don't plan to complete any subsequent labs, you may want to clean up your environment. This
optional exercise contains instructions for doing so.
__1. Drop the tables you created in this lab.
drop table extern.sls_product_dim;
drop table extern.sls_product_line_lookup;
drop table bigsqllab.sls_product_flat;
drop table bigsqllab.sls_sales_fact_unique;
drop table bigsqllab.sls_sales_fact_nopk;
drop table bigsqllab.sls_sales_fact_dense;
__2. Optionally, verify that these tables no longer exist. For example, query each table and confirm
that you receive an error message indicating that the table name you provided is undefined
(SQLCODE -204, SQLSTATE 42704).
select count(*) from extern.sls_product_dim;
select count(*) from extern.sls_product_line_lookup;
select count(*) from bigsqllab.sls_product_flat;
select count(*) from bigsqllab.sls_sales_fact_unique;
select count(*) from bigsqllab.sls_sales_fact_nopk;
select count(*) from bigsqllab.sls_sales_fact_dense;
Lab 5 Summary
In this lab, you gained hands-on experience using HBase natively as well as using Big SQL with HBase.
You learned how to create and populate HBase tables, and how to retrieve data from them, using the
HBase shell. In addition, you saw how Big SQL can store its data in HBase tables, giving programmers
sophisticated SQL query capabilities over that data. You also explored some of the data modeling
options available through Big SQL.
To expand your skills further, visit Hadoop Dev (https://developer.ibm.com/hadoop/) for links to free
online courses, tutorials, and more.
© Copyright IBM Corporation 2015. Author: Cynthia M. Saracco.
The information contained in these materials is provided for
informational purposes only, and is provided AS IS without warranty
of any kind, express or implied. IBM shall not be responsible for any
damages arising out of the use of, or otherwise related to, these
materials. Nothing contained in these materials is intended to, nor
shall have the effect of, creating any warranties or representations
from IBM or its suppliers or licensors, or altering the terms and
conditions of the applicable license agreement governing the use of
IBM software. References in these materials to IBM products,
programs, or services do not imply that they will be available in all
countries in which IBM operates. This information is based on
current IBM product plans and strategy, which are subject to change
by IBM without notice. Product release dates and/or capabilities
referenced in these materials may change at any time at IBM’s sole
discretion based on market opportunities or other factors, and are not
intended to be a commitment to future product or feature availability
in any way.
IBM, the IBM logo and ibm.com are trademarks of International
Business Machines Corp., registered in many jurisdictions
worldwide. Other product and service names might be trademarks of
IBM or other companies. A current list of IBM trademarks is
available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml.
More Related Content

What's hot

Understanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootUnderstanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootKashif Ali Siddiqui
 
Comparable/ Comparator
Comparable/ ComparatorComparable/ Comparator
Comparable/ ComparatorSean McElrath
 
Scala - The Simple Parts, SFScala presentation
Scala - The Simple Parts, SFScala presentationScala - The Simple Parts, SFScala presentation
Scala - The Simple Parts, SFScala presentationMartin Odersky
 
Introduction to Resilience4j
Introduction to Resilience4jIntroduction to Resilience4j
Introduction to Resilience4jKnoldus Inc.
 
Cloud Native Apps
Cloud Native AppsCloud Native Apps
Cloud Native AppsDavid Chou
 
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...Edureka!
 
Microservice architecture : Part 1
Microservice architecture : Part 1Microservice architecture : Part 1
Microservice architecture : Part 1NodeXperts
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentationArvind Kumar
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation HadoopVarun Narang
 
Spring - Part 1 - IoC, Di and Beans
Spring - Part 1 - IoC, Di and Beans Spring - Part 1 - IoC, Di and Beans
Spring - Part 1 - IoC, Di and Beans Hitesh-Java
 

What's hot (20)

Understanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring BootUnderstanding MicroSERVICE Architecture with Java & Spring Boot
Understanding MicroSERVICE Architecture with Java & Spring Boot
 
Comparable/ Comparator
Comparable/ ComparatorComparable/ Comparator
Comparable/ Comparator
 
Apache hive
Apache hiveApache hive
Apache hive
 
IIS 7.0 Architecture And Integration With Asp.Net
IIS 7.0 Architecture And Integration With Asp.NetIIS 7.0 Architecture And Integration With Asp.Net
IIS 7.0 Architecture And Integration With Asp.Net
 
Hibernate I
Hibernate IHibernate I
Hibernate I
 
Spring boot
Spring bootSpring boot
Spring boot
 
System design interview prep
System design interview prepSystem design interview prep
System design interview prep
 
Scala - The Simple Parts, SFScala presentation
Scala - The Simple Parts, SFScala presentationScala - The Simple Parts, SFScala presentation
Scala - The Simple Parts, SFScala presentation
 
Argo uml
Argo umlArgo uml
Argo uml
 
HDFS Architecture
HDFS ArchitectureHDFS Architecture
HDFS Architecture
 
Introduction to Resilience4j
Introduction to Resilience4jIntroduction to Resilience4j
Introduction to Resilience4j
 
Cloud Native Apps
Cloud Native AppsCloud Native Apps
Cloud Native Apps
 
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
Microservice Architecture | Microservices Tutorial for Beginners | Microservi...
 
Spring Boot
Spring BootSpring Boot
Spring Boot
 
Hive
HiveHive
Hive
 
Microservice architecture : Part 1
Microservice architecture : Part 1Microservice architecture : Part 1
Microservice architecture : Part 1
 
MongoDB Sharding Fundamentals
MongoDB Sharding Fundamentals MongoDB Sharding Fundamentals
MongoDB Sharding Fundamentals
 
Hadoop hive presentation
Hadoop hive presentationHadoop hive presentation
Hadoop hive presentation
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Spring - Part 1 - IoC, Di and Beans
Spring - Part 1 - IoC, Di and Beans Spring - Part 1 - IoC, Di and Beans
Spring - Part 1 - IoC, Di and Beans
 

Similar to Big Data: HBase and Big SQL self-study lab

Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark Cynthia Saracco
 
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
Big Data:  Big SQL web tooling (Data Server Manager) self-study labBig Data:  Big SQL web tooling (Data Server Manager) self-study lab
Big Data: Big SQL web tooling (Data Server Manager) self-study labCynthia Saracco
 
Hbase Tutorial
Hbase TutorialHbase Tutorial
Hbase TutorialGuru99
 
HBase Mongo_DB Project
HBase Mongo_DB ProjectHBase Mongo_DB Project
HBase Mongo_DB ProjectSonali Gupta
 
Big Data: Explore Hadoop and BigInsights self-study lab
Big Data:  Explore Hadoop and BigInsights self-study labBig Data:  Explore Hadoop and BigInsights self-study lab
Big Data: Explore Hadoop and BigInsights self-study labCynthia Saracco
 
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLHands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLPiotr Pruski
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configurationprabakaranbrick
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive AnalyticsManish Chopra
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersYash Sharma
 
Hortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on AzureHortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on AzureAnita Luthra
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guideCynthia Saracco
 
Apache hadoop-administrator-training
Apache hadoop-administrator-trainingApache hadoop-administrator-training
Apache hadoop-administrator-trainingKnowledgehut
 
Talend openstudio bigdata_gettingstarted_6.3.0_en
Talend openstudio bigdata_gettingstarted_6.3.0_enTalend openstudio bigdata_gettingstarted_6.3.0_en
Talend openstudio bigdata_gettingstarted_6.3.0_enManoj Sharma
 
HBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - OperationsHBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - Operationsphanleson
 
Hortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
 

Similar to Big Data: HBase and Big SQL self-study lab (20)

Big Data: Working with Big SQL data from Spark
Big Data:  Working with Big SQL data from Spark Big Data:  Working with Big SQL data from Spark
Big Data: Working with Big SQL data from Spark
 
hbase lab
hbase labhbase lab
hbase lab
 
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
Big Data:  Big SQL web tooling (Data Server Manager) self-study labBig Data:  Big SQL web tooling (Data Server Manager) self-study lab
Big Data: Big SQL web tooling (Data Server Manager) self-study lab
 
Hbase Tutorial
Hbase TutorialHbase Tutorial
Hbase Tutorial
 
HBase Mongo_DB Project
HBase Mongo_DB ProjectHBase Mongo_DB Project
HBase Mongo_DB Project
 
Big Data: Explore Hadoop and BigInsights self-study lab
Big Data:  Explore Hadoop and BigInsights self-study labBig Data:  Explore Hadoop and BigInsights self-study lab
Big Data: Explore Hadoop and BigInsights self-study lab
 
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLHands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 
Hadoop cluster configuration
Hadoop cluster configurationHadoop cluster configuration
Hadoop cluster configuration
 
Working with Hive Analytics
Working with Hive AnalyticsWorking with Hive Analytics
Working with Hive Analytics
 
Apache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCodersApache Hive micro guide - ConfusedCoders
Apache Hive micro guide - ConfusedCoders
 
hive lab
hive labhive lab
hive lab
 
Hortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on AzureHortonworks Setup & Configuration on Azure
Hortonworks Setup & Configuration on Azure
 
Big Data: Getting started with Big SQL self-study guide
Big Data:  Getting started with Big SQL self-study guideBig Data:  Getting started with Big SQL self-study guide
Big Data: Getting started with Big SQL self-study guide
 
Apache hadoop-administrator-training
Apache hadoop-administrator-trainingApache hadoop-administrator-training
Apache hadoop-administrator-training
 
Talend openstudio bigdata_gettingstarted_6.3.0_en
Talend openstudio bigdata_gettingstarted_6.3.0_enTalend openstudio bigdata_gettingstarted_6.3.0_en
Talend openstudio bigdata_gettingstarted_6.3.0_en
 
HBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - OperationsHBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - Operations
 
Hortonworks HBase Meetup Presentation
Hortonworks HBase Meetup PresentationHortonworks HBase Meetup Presentation
Hortonworks HBase Meetup Presentation
 
Hadoop - Apache Hbase
Hadoop - Apache HbaseHadoop - Apache Hbase
Hadoop - Apache Hbase
 
Overview of big data & hadoop version 1 - Tony Nguyen
Overview of big data & hadoop   version 1 - Tony NguyenOverview of big data & hadoop   version 1 - Tony Nguyen
Overview of big data & hadoop version 1 - Tony Nguyen
 
Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1Overview of Big data, Hadoop and Microsoft BI - version1
Overview of Big data, Hadoop and Microsoft BI - version1
 

More from Cynthia Saracco

Using your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and SparkUsing your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and SparkCynthia Saracco
 
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...Cynthia Saracco
 
Big Data: SQL query federation for Hadoop and RDBMS data
Big Data:  SQL query federation for Hadoop and RDBMS dataBig Data:  SQL query federation for Hadoop and RDBMS data
Big Data: SQL query federation for Hadoop and RDBMS dataCynthia Saracco
 
Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and HadoopCynthia Saracco
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Cynthia Saracco
 
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data:  InterConnect 2016 Session on Getting Started with Big Data AnalyticsBig Data:  InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data: InterConnect 2016 Session on Getting Started with Big Data AnalyticsCynthia Saracco
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM Cynthia Saracco
 
Big Data: Get started with SQL on Hadoop self-study lab
Big Data:  Get started with SQL on Hadoop self-study lab Big Data:  Get started with SQL on Hadoop self-study lab
Big Data: Get started with SQL on Hadoop self-study lab Cynthia Saracco
 
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data:  Technical Introduction to BigSheets for InfoSphere BigInsightsBig Data:  Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsightsCynthia Saracco
 
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Cynthia Saracco
 

More from Cynthia Saracco (10)

Using your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and SparkUsing your DB2 SQL Skills with Hadoop and Spark
Using your DB2 SQL Skills with Hadoop and Spark
 
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
Big Data: Getting off to a fast start with Big SQL (World of Watson 2016 sess...
 
Big Data: SQL query federation for Hadoop and RDBMS data
Big Data:  SQL query federation for Hadoop and RDBMS dataBig Data:  SQL query federation for Hadoop and RDBMS data
Big Data: SQL query federation for Hadoop and RDBMS data
 
Big Data: Querying complex JSON data with BigInsights and Hadoop
Big Data:  Querying complex JSON data with BigInsights and HadoopBig Data:  Querying complex JSON data with BigInsights and Hadoop
Big Data: Querying complex JSON data with BigInsights and Hadoop
 
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
Big Data: Using free Bluemix Analytics Exchange Data with Big SQL
 
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data:  InterConnect 2016 Session on Getting Started with Big Data AnalyticsBig Data:  InterConnect 2016 Session on Getting Started with Big Data Analytics
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics
 
Big Data: SQL on Hadoop from IBM
Big Data:  SQL on Hadoop from IBM Big Data:  SQL on Hadoop from IBM
Big Data: SQL on Hadoop from IBM
 
Big Data: Get started with SQL on Hadoop self-study lab
Big Data:  Get started with SQL on Hadoop self-study lab Big Data:  Get started with SQL on Hadoop self-study lab
Big Data: Get started with SQL on Hadoop self-study lab
 
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data:  Technical Introduction to BigSheets for InfoSphere BigInsightsBig Data:  Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights
 
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
Big Data: Introducing BigInsights, IBM's Hadoop- and Spark-based analytical p...
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Big Data: HBase and Big SQL self-study lab

  • 1. Working with HBase and Big SQL Cynthia M. Saracco IBM Solution Architect September 14, 2015
  • 2. Page 2 Introducing HBase Contents LAB 1 OVERVIEW......................................................................................................................................................... 3 1.1. ABOUT YOUR ENVIRONMENT ..................................................................................................................... 3 1.2. GETTING STARTED ................................................................................................................................... 4 LAB 2 ISSUING BASIC HBASE COMMANDS ............................................................................................................. 6 2.1. CREATING AND ALTERING A TABLE ............................................................................................................. 7 2.2. INSERTING AND RETRIEVING DATA............................................................................................................ 10 2.3. UPDATING DATA..................................................................................................................................... 14 2.4. DELETING DATA ..................................................................................................................................... 15 2.5. DROPPING A TABLE ................................................................................................................................ 16 2.6. EXPLORING THE IMPACT OF YOUR WORK................................................................................................... 17 LAB 3 USING BIG SQL TO CREATE AND QUERY HBASE TABLES ...................................................................... 24 3.1. EXPERIMENTING WITH YOUR FIRST BIG SQL HBASE TABLE........................................................................ 24 3.2. CREATING VIEWS OVER BIG SQL HBASE TABLES...................................................................................... 28 3.3. LOADING DATA INTO A BIG SQL HBASE TABLE.......................................................................................... 28 3.4. OPTIONAL: DROPPING TABLES CREATED IN THIS LAB ................................................................................. 31 LAB 4 EXPLORING DATA MODELING OPTIONS..................................................................................................... 33 4.1. DE-NORMALIZING RELATIONAL TABLES (MAPPING MULTIPLE TABLES TO ONE HBASE TABLE) ........................... 33 4.2. GENERATING UNIQUE ROW KEY VALUES ................................................................................................... 39 4.3. MAPPING MULTIPLE SQL COLUMNS TO ONE HBASE ROW KEY OR COLUMN ................................................... 43 4.4. OPTIONAL: DROPPING TABLES CREATED IN THIS LAB ................................................................................. 45 LAB 5 SUMMARY ....................................................................................................................................................... 47
  • 3. Hands On Lab Page 3 Lab 1 Overview In this hands-on lab, you'll learn the basics of using HBase natively and with Big SQL, IBM's industry- standard SQL interface for data stored in supported Hadoop-based platforms. HBase is an open source key-value data storage mechanism commonly used in Hadoop environments. It features a column- oriented data model that enables programmers to efficiently retrieve data by key values from very large data sets. It also supports certain data modification operations, readily accommodates sparse data, and supports various kinds of data. Indeed, HBase doesn’t have primitive data types – all user data is stored as byte arrays. With Big SQL, programmers can use SQL to query data in Big SQL tables managed by HBase. After completing this hands-on lab, you’ll be able to: • Use the HBase shell (command-line interface) to issue commands • Create HBase tables that include multiple column families and columns • Work directly with HBase tables to add, retrieve, and delete data • Create, populate, and query Big SQL tables that store data in HBase • Create and query views based on Big SQL tables managed by HBase • Explore options for mapping relational database designs to Big SQL HBase tables • Generate unique row key values for Big SQL tables managed by HBase • Investigate the storage implications of certain Big SQL HBase table designs • Explore the status of your HBase cluster through a Web interface Allow 2.5 - 3 hours to complete this lab. Although this lab briefly summarizes HBase architecture and basic concepts, you may find it helpful to read introductory articles or watch some videos on HBase before beginning your work. This lab was developed by Cynthia M. Saracco, IBM Silicon Valley Lab, with thanks to Bruce Brown and Piotr Pruski for their earlier contributions and Nailah Bissoon, Scott Gray, Dan Kikuchi, Ellen Patterson, Henry Quach, and Deepa Remesh for their earlier reviews. Please post questions or comments about this lab or the technologies it describes to the forum on Hadoop Dev at https://developer.ibm.com/hadoop/. 1.1. About your environment This lab was developed for a BigInsights 4.1 environment running on the IBM Open Platform for Apache Hadoop. To complete this lab, install and launch these Hadoop-based services: HDFS, MapReduce, YARN, Hive, HBase, ZooKeeper, Sqoop, Knox, and Ambari. In addition, you must install and launch the Big SQL service, as well as install the BigInsights Home service, which includes JSqsh (the SQL command line environment). Big SQL and BigInsights Home are part of BigInsights Quick Start Edition, BigInsights Data Analyst, and BigInsights Data Scientist. Before proceeding with this tutorial, ensure that you have access to a working platform with Big SQL running and JSqsh accessible.
  • 4. Page 4 Introducing HBase Examples in this lab use are based on a sample environment with the configuration shown in the tables below. If your environment is different, modify the sample code and instructions as needed to match your configuration. User Password Root account root password Big SQL Administrator bigsql bigsql Ambari Administrator admin admin Property Value Host name myhost.ibm.com Ambari port number 8080 HBase client directory with HBase shell /usr/iop/current/hbase-client/bin HBase Master installation directory /usr/iop/current/hbase-master HBase Region Server installation directory /usr/iop/current/hbase-region Big SQL database name bigsql Big SQL port number 32051 Big SQL installation directory /usr/ibmpacks/bigsql Big SQL samples directory /usr/ibmpacks/bigsql/4.1/bigsql/samples/data JSqsh installation directory /usr/ibmpacks/common-utils/current/jsqsh About the screen captures, sample code, and environment configuration Screen captures in this lab depict examples and results that may vary from what you see when you complete the exercises. In addition, some code examples may need to be customized to match your environment. 1.2. Getting started To get started with the lab exercises, you need to access to a working BigInsights environment, as described in the previous section. A free Quick Start Edition is available for download from IBM at http://www-01.ibm.com/software/data/infosphere/hadoop/trials.html.
  • 5. Hands On Lab Page 5 This lab was tested against a native BigInsights installation on a multi-node cluster. Although the IBM Analytics for Hadoop cloud service on Bluemix includes Big SQL, it does not support SSH access. As a result, examples in this lab that involve JSqsh, the HBase shell, and HDFS commands cannot be executed using that Bluemix service. Therefore, if you want to follow all examples in this lab, you should install and configure BigInsights on your own cluster. Consult the instructions in the product's Knowledge Center (http://www- 01.ibm.com/support/knowledgecenter/SSPT3X_4.1.0/com.ibm.swg.im.infosphere.biginsights.welcome.d oc/doc/welcome.html). Before continuing with this lab, verify that Big SQL and all pre-requisite services are running. If have any questions or need help getting your environment up and running, visit Hadoop Dev (https://developer.ibm.com/hadoop/) and review the product documentation or post a message to the forum. You cannot proceed with subsequent lab exercises without access to a working environment.
  • 6. Page 6 Introducing HBase Lab 2 Issuing basic HBase commands After confirming that all necessary services are running, you're ready to start using HBase directly. Exercises in this lab are intended for those with little or no prior experience using HBase. As such, after completing this lab, you'll know how to  Launch the HBase shell  Create an HBase table  Inspect the characteristics of a table  Alter properties associated with a table  Populate a table with data  Retrieve data from a table  Use HBase Web interfaces to explore information about your environment As you work through this lab, you'll become familiar with some basic HBase concepts, such as row keys, column families, and columns. You'll also observe its schema-less nature. However, a detailed explanation of HBase is beyond the scope of this lab. Visit the Apache HBase site (http://hbase.apache.org/) or the HBase wiki (http://wiki.apache.org/hadoop/Hbase) for reference materials on HBase. To keep this lab simple, you will create one HBase table to track customer reviews of various products. Each review will have a unique identifier, summary information (e.g., product name and rating), reviewer data (e.g., name and location), and detailed information (e.g., review comments). In a relational DBMS, such information might be stored in a single table with one column for each attribute to be tracked (e.g., REVIEW-ID, PRODUCT, RATING, REVIEWER-NAME, REVIEWER-LOCATION, COMMENTS). Furthermore, a data type would be specified for each column at creation – perhaps INT for the REVIEW- ID, VARCHAR(30) for the PRODUCT, and so on. In HBase, your table design will be different. The unique identifier for each review will serve as the row key. Attributes commonly queried together will be grouped into a column family. HBase requires at least one column family per table. Yours will have three:  summary, which summarizes essential review information (such as the product name and rating).  reviewer, which tracks data related to the user who reviewed the product.  details, which tracks comments, tips, and other detailed review information. Each of these column families may contain one or more columns, depending on the data associated with a given review. For example, one review might contain the name, location, and email address of a reviewer, while another might only contain the reviewer’s name. HBase is schema-less, so you don’t define any columns when you create your table. Furthermore, since all data in HBase is stored in byte arrays, you don’t declare data types for your columns. You’ll put this concepts into practice shortly.
  • 7. Hands On Lab Page 7 In production environments, programmers typically interact with HBase through applications they've built in Java or another supported interface, such as REST or Thrift. However, to keep things simple, you'll work with HBase through its command-line interface. HBase consists of an active HBase Master Server and one or more Region Server(s). Region Servers manage user data modeled as HBase tables. HBase automatically partitions tables into regions, storing a range of rows together based on their key values. Regions are stored in files in your distributed file system. Allow 1 – 1.5 hours to complete all sections of this lab. You must have a working Big SQL and HBase environment, as described in the first module of this series of lab exercises. 2.1. Creating and altering a table To begin, create a reviews table and alter some of its default properties before populating it with data. __1. If necessary, open a terminal window. __2. Launch the HBase shell. From the HBase client …/bin directory (typically /usr/iop/current/hbase-client/bin), issue this command: hbase shell Ignore any informational messages that may appear. Verify that the shell launched successfully and that your screen appears similar to this: __3. Optionally, type help to review information about supported shell commands. A portion of this information is shown here:
  • 8. Page 8 Introducing HBase __4. Create an HBase table named reviews with 3 column families: summary, reviewer, and details. Ignore any warnings that may appear involving multiple SLF4J bindings. create 'reviews', 'summary', 'reviewer', 'details' About the CREATE command CREATE only requires the name of the table and one or more column families. In a moment, you'll see how to add columns to the table dynamically when you insert some data into the table. __5. Inspect the default properties associated with your new table: describe 'reviews' Note that your table is shown as ENABLED, or ready for use. Also note that each column family has some various properties associated with it. For example, the summary column family's IN_MEMORY property is set to false in the screen capture below. Since we expect most HBase queries to reference this column family, let's change the property to TRUE. This instructs HBase to give priority to caching this data.
  • 9. Hands On Lab Page 9 __6. To alter (or drop) a table, you must first disable it: disable 'reviews' __7. Alter the table to set the IN_MEMORY property of the summary column family to true. alter 'reviews', {NAME => 'summary', IN_MEMORY => 'true'} __8. Set the number of versions for the summary and reviewer column families to 2. HBase can store multiple versions of data for each column family. If your application does not require multiple versions, the VERSIONS property for each column family should be set to 1. alter 'reviews', {NAME => 'summary', VERSIONS => 2}, {NAME => 'reviewer', VERSIONS => 2} __9. Verify that your property changes were captured correctly: describe 'reviews'
  • 10. Page 10 Introducing HBase __10. Enable (or activate) the table so that it’s ready for use. enable 'reviews' Now you can populate your table with data and query it. 2.2. Inserting and retrieving data This exercise introduces you to the PUT, GET, SCAN, and COUNT commands. As you might imagine, PUT enables you to write data to HBase. GET and SCAN enable you to retrieve data, while COUNT returns the total number of rows in your table. __1. Insert some data into your HBase table. The PUT command enables you to write data into a single cell of an HBase table. This cell may reside in an existing row or may belong to a new row. Issue this command: put 'reviews', '101', 'summary:product', 'hat'
  • 11. Hands On Lab Page 11 What happened after executing this command Executing this command caused HBase to add a row with a row key of 101 to the reviews table and to write the value of hat into the product column of the summary column family. Note that this command dynamically created the summary:product column and that no data type was specified for this column. What if you have more data for this row? You need to issue additional PUT commands – one for each cell (i.e., each column family:column) in the target row. You’ll do that shortly. But before you do, consider what HBase just did behind the scenes . . . . HBase wrote your data to a Write-Ahead Log (WAL) in your distributed file system to allow for recovery from a server failure. In addition, it cached your data (in a MemStore) of a specific region managed by a specific Region Server. At some point, when the MemStore becomes full, your data will be flushed to disk and stored in files (HFiles) in your distributed file system. Each HFile contains data related to a specific column family. __2. Retrieve the row. To do so, provide the table name and row key value to the GET command: get 'reviews', '101' __3. Add more cells (columns and data values) to this row: put 'reviews', '101', 'summary:rating', '5' put 'reviews', '101', 'reviewer:name', 'Chris' put 'reviews', '101', 'details:comment', 'Great value'
  • 12. Page 12 Introducing HBase About your table . . . . Conceptually, your table looks something like this: It has one row with 3 column families. The summary column family for this row contains two columns, while the other two column families for this row each have one column. Physically, data in each column family is stored together in your distributed file system (in one or more HFiles). __4. Retrieve row key 101 again: get 'reviews', '101' About this output This output can be a little confusing at first, because it’s showing that 4 rows are returned. This row count refers to the number of lines (rows) displayed on the screen. Since information about each cell is displayed on a separate line and there are 4 cells in row 101, the GET command reports 4 rows. __5. Count the number of rows in the entire table and verify that there is only 1 row: count 'reviews'
  • 13. Hands On Lab Page 13 The COUNT command is appropriate for small tables only. (Consult the HBase site for more efficient approaches for large tables.) __6. Add 2 more rows to your table using these commands: put 'reviews', '112', 'summary:product', 'vest' put 'reviews', '112', 'summary:rating', '5' put 'reviews', '112', 'reviewer:name', 'Tina' put 'reviews', '133', 'summary:product', 'vest' put 'reviews', '133', 'summary:rating', '4' put 'reviews', '133', 'reviewer:name', 'Helen' put 'reviews', '133', 'reviewer:location', 'USA' put 'reviews', '133', 'details:tip', 'Sizes run small. Order 1 size up.' Note that review 112 lacks any detailed information (e.g., a comment), while review 133 contains a tip in its details. Note also that review 133 includes the reviewer's location, which is not present in the other rows. Let's explore how HBase captures this information. __7. Retrieve the entire contents of the table using this SCAN command: scan 'reviews' Note that SCAN correctly reports that the table contains 3 rows. The display contains more than 3 lines, because each line includes information for a single cell in a row. Note also that each row in your table has a different schema and that missing information is simply omitted. Furthermore, each displayed line includes not only the value of a particular cell in the table but also its associated row key (e.g., 101), column family name (e.g., details), column name (e.g., comment), and timestamp. As you learned earlier, HBase is a key-value store. Together, these four attributes (row key, column family name, column qualifier, and timestamp) form the key.
  • 14. Page 14 Introducing HBase Consider the implications of storing this key information with each cell value. Having a large number of columns with values for all rows (in other words, dense data) means that a lot of key information is repeated. Also, large row key values and long column family / column names increase the table’s storage requirements. __8. Finally, restrict the scan results to retrieve only the contents of the summary column family and the reviewer:name column for row keys starting at '120' and ending at '150'. scan 'reviews', {COLUMNS => ['summary', 'reviewer:name'], STARTROW => '120', STOPROW => '150'} Given your sample data, only row '133' qualifies. Note that the reviewer's location (reviewer:location) and all the review details (details:tip) were omitted from the results due to the scan parameters you specified. 2.3. Updating data HBase doesn't have an UPDATE command or API. Instead, programmers simply write another set of column values for the same row key. In this exercise, you'll see how to update data values using the PUT command (again). You'll also explore how HBase maintains multiple versions of your data for the summary and reviewer column families. As you'll recall, in an earlier exercise you set the VERSIONS properties of these families to 2. __1. Update Tina's review (row key 112) to change the rating to '4': put 'reviews', '112', 'summary:rating', '4' __2. Scan the table to inspect the change. scan 'reviews' By default, HBase returns the most recent version of data for each cell.
  • 15. Hands On Lab Page 15 __3. To see multiple versions of your data, issue this command: scan 'reviews', {VERSIONS => 2} __4. You can also GET the original rating value from row 112 by explicitly specifying the timestamp value. This value will differ on your system, so you will need to substitute the value appropriate for your environment for the timestamp shown below. Consult the output from the previous step to obtain this value. get 'reviews', '112', {COLUMN => 'summary:rating', TIMESTAMP => 1421878110712} 2.4. Deleting data In this exercise, you'll learn how to delete data in your HBase table. You can delete a single cell value within a row or all cell values within a row. __1. Delete Tina's name from her review (row 112). delete 'reviews', '112', 'reviewer:name' __2. Scan the table to inspect the change. scan 'reviews'
  • 16. Page 16 Introducing HBase __3. Delete all cells associated with Tina's review (i.e., all data for row 112) and scan the table to inspect the change. deleteall 'reviews', '112' scan 'reviews' About DELETE . . . . DELETE doesn't remove data from the table immediately. Instead, it marks the data for deletion, which prevents the data from being included in any subsequent data retrieval operations. Because the underlying files that form an HBase table (HFiles) are immutable, storage for deleted data will not be recovered until an administrator initiates a major compaction operation. This operation consolidates data and reconciles deletions by removing both the deleted data and the delete indicator. 2.5. Dropping a table In this exercise, you'll learn how to drop an HBase table. Because we want to retain the reviews table for future exercises, you'll create a new table, verify its existence, and then drop it. __1. Create a sample table with 1 column family. create 'sample', 'cf1' __2. Describe the table or verify that it exists. Issue one of these two commands:
  • 17. Hands On Lab Page 17 describe 'sample' exists 'sample' __3. Disable the table you just created. (Before you can drop a table, you must disable or deactivate it.) disable 'sample' __4. Drop the table. drop 'sample' __5. Verify that the table no longer exists. exists 'sample' 2.6. Exploring the impact of your work Are you curious about how your work has affected your HBase environment? This exercise helps you explore some of the meta data available to you about your table as well as your overall HBase environment. __1. Launch a Web browser. __2. Enter the URL and port of your HBase Master Server Web interface (by default, this is port 60010). For example, if your host name is myhost.ibm.com and the HBase Master Service Web interface port is 60010, you would enter http://myhost.ibm.com:60010.
  • 18. Page 18 Introducing HBase Locating the HBase Master information port By default, the HBase Master Web interface port is 60010 on your host machine. This information is configured in the . . . /conf/hbase-site.xml file within the installation directory of your HBase Master (e.g., /usr/iop/current/hbase-master). Look for the hbase.master.info.port property in the hbase-site.xml file. __3. Scroll to the Tables section and click the User Tables tab. Note that your reviews table is present and that its data resides in 1 region (because your table is very small). Also note that the description highlights a few important aspects of your table, including its column families and properties that you altered to non-default values.
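As the note above mentions, these Web interface ports are set in hbase-site.xml. For reference, the relevant entries look roughly like this (the values shown are the defaults mentioned in this lab; your file may differ):
<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>
<property>
  <name>hbase.regionserver.info.port</name>
  <value>60030</value>
</property>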
  • 19. Hands On Lab Page 19 __4. Click the [Details] link at top to display further details about the tables in your HBase server. __5. If necessary, scroll down to locate data for your table. Note that all your table's properties and their values are displayed here. Where have you seen similar output? This output should look familiar to you. Indeed, in a previous exercise you issued the describe command from the HBase shell to inspect the status of your table (enabled/disabled) as well as its properties. __6. Click the Back button on your browser to return to the previous page (the main page for the HBase Master Server Web interface). __7. In the Tables section, click on the User Tables tab and locate your reviews table again. Click on the link for this table. __8. Note the Region Server link for your table and click on it. You’ll find this information in the Table Regions section of the displayed page.
  • 20. Page 20 Introducing HBase __9. After you have been redirected to the HBase Region Server Web interface, skim through the information displayed. Locating the HBase Region Server information port By default, the HBase Region Server Web interface port is 60030. This information is configured in the …/conf/hbase-site.xml file within the installation directory of your HBase region server. Look for the hbase.regionserver.info.port property. __10. In the Server Metrics section at top, click on the Requests tab to display the total read and write request counts for your server since its launch. (The screen capture below was taken from a system with 264 read requests and 11 write requests. As with other screen captures, your data may vary.)
  • 21. Hands On Lab Page 21 __11. Scroll to the Regions section towards the bottom of the page. __12. Click on the Request metrics tab in the Regions section to determine the number of read and write requests for the reviews table. (This screen capture was taken after 18 read requests and 15 write requests were issued.) __13. If necessary, open a terminal window. From the Linux/Unix command line (not the HBase shell), list the contents of the HBase data directory in your DFS. The HBase data directory location is determined at installation. In the example below, this directory is /apps/hbase/data/data/default. Note that a subdirectory for your reviews table is present. hdfs dfs -ls /apps/hbase/data/data/default/
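Optionally, you can also check how much space the table's files currently occupy (very little at this point); the -h flag prints human-readable sizes:
hdfs dfs -du -h /apps/hbase/data/data/default/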
  • 22. Page 22 Introducing HBase __14. Explore the contents of the …/reviews directory and look for a subdirectory with a long, system-generated name. hdfs dfs -ls /apps/hbase/data/data/default/reviews __15. Investigate the contents of this subdirectory and note that it contains 1 subdirectory for each column family of your table. (Substitute the system-generated subdirectory name in your environment for the sample name shown below.) Recall that HBase physically organizes data in your table by column family, which is reflected here. hdfs dfs -ls /apps/hbase/data/data/default/reviews/c628ccb212a396d1d97a954504470412 __16. If you'd like to learn how to explore your DFS contents using a Web browser, continue with the remainder of this lab module. Otherwise, skip to the next lab. __17. If necessary, determine the URL for your Name Node's Web interface. In your Hadoop client directory (by default /usr/iop/current/hadoop-client), browse the . . . /conf/hdfs-site.xml file. Note the setting for the dfs.namenode.http-address property. By default, this will be at port 50070 of your Name Node's host. __18. Direct your browser to the Name Node's HTTP address and verify that your display is similar to this:
  • 23. Hands On Lab Page 23 __19. In the menu at top, click the arrow key next to the Utilities tab to expose a drop-down menu. Select Browse the file system. __20. Navigate through the DFS directory tree, investigating the contents of your HBase database. For example, the screen capture below displays the contents of the HBase subdirectory for the reviews table.
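If you prefer the shell to the browser for this kind of exploration, note that HBase also records region information in its hbase:meta catalog table, which you can scan from the HBase shell (an optional extra step):
scan 'hbase:meta', {COLUMNS => ['info:regioninfo']}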
  • 24. Page 24 Introducing HBase Lab 3 Using Big SQL to create and query HBase tables Although HBase provides useful data storage and retrieval capabilities, it lacks a rich query language. Fortunately, IBM's Big SQL technology, a component of BigInsights, enables programmers to use industry-standard SQL to create, populate, and query Big SQL tables managed by HBase. Native HBase data retrieval operations (such as GET and SCAN) can be performed on these tables, too. This lab introduces you to the basics of using Big SQL with HBase. Later, you'll have a chance to explore additional topics, such as data modeling. Prior to starting this lab, you must have access to a BigInsights 4.1 environment with Big SQL running. In addition, you need to know how to connect to your Big SQL server and how to issue a Big SQL query using JSqsh or another query tool. If necessary, consult the BigInsights product documentation before proceeding. After you complete the lessons in this module, you will understand how to:  Create Big SQL tables that use HBase as the underlying storage mechanism.  Populate these tables using Big SQL INSERT and LOAD operations.  Query these tables using projections and restrictions.  Create views over Big SQL tables managed by HBase and query these views. Allow 30 minutes to complete this lab. 3.1. Experimenting with your first Big SQL HBase table To get started, you'll create a Big SQL table named bigsqllab.reviews to capture information about product reviews. In contrast to the previous reviews table that you created natively in HBase, this Big SQL table will have a pre-defined set of columns just like any other SQL-based table. Moreover, your bigsqllab.reviews table definition will also specify how to map these SQL columns to HBase table attributes -- specifically, a row key and columns in specific column families. __1. If necessary, refresh your memory about the HBase reviews table created in an earlier lab. Recall that this table tracked reviews that users posted about various products. The table had 3 column families -- summary, reviewer, and details. And, of course, it had a row key. __2. Launch your Big SQL query execution environment and connect to your Big SQL database following the standard process appropriate for your query environment. For example, if you're using JSqsh and you previously created a Big SQL database connection named bigsql, you might enter /usr/ibmpacks/common-utils/current/jsqsh/bin/jsqsh bigsql Modify this sample command as needed to match your environment.
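If you'd like to confirm that your connection works before creating any tables, one simple option is a catalog query (this assumes the DB2-compatible catalog views that Big SQL provides):
select tabschema, tabname from syscat.tables fetch first 5 rows only;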
  • 25. Hands On Lab Page 25 __3. Create a Big SQL table named bigsqllab.reviews managed by HBase. CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.REVIEWS ( REVIEWID varchar(10) primary key not null, PRODUCT varchar(30), RATING int, REVIEWERNAME varchar(30), REVIEWERLOC varchar(30), COMMENT varchar(100), TIP varchar(100) ) COLUMN MAPPING ( key mapped by (REVIEWID), summary:product mapped by (PRODUCT), summary:rating mapped by (RATING), reviewer:name mapped by (REVIEWERNAME), reviewer:location mapped by (REVIEWERLOC), details:comment mapped by (COMMENT), details:tip mapped by (TIP) ); About this CREATE HBASE TABLE statement . . . . This statement creates a Big SQL table named REVIEWS in the BIGSQLLAB schema and instructs Big SQL to use HBase as the underlying storage manager. The COLUMN MAPPING clause specifies how SQL columns are to be mapped to HBase columns in column families. For example, the SQL REVIEWERNAME column is mapped to the HBase column family:column of 'reviewer:name'. For simplicity, this example uses a 1:1 mapping of SQL columns to HBase columns. It also uses a single SQL column as the HBase row key. As you'll see in a subsequent lab, you may want (or need) to map multiple SQL columns to one HBase column or to the HBase row key. Big SQL supports doing so. Also for simplicity, we've accepted various defaults for this table, including default HBase column family property settings (for VERSIONS and others) and binary encoding for storage. Finally, note that the SQL REVIEWID column was defined as the SQL primary key. This is an informational constraint that can be useful for query optimization. However, it isn't actively enforced. __4. If necessary, open a terminal window. Use Hadoop file system commands to verify that the table was created. List the contents of the HBase data directory and confirm that a bigsqllab.reviews subdirectory exists. hdfs dfs -ls /apps/hbase/data/data/default
  • 26. Page 26 Introducing HBase The root HBase directory is determined at installation. The examples in this lab were developed for an environment in which HBase was configured to store data in /apps/hbase/data/data/default. If your HBase configuration is different, adjust the commands as needed to match your environment. __5. List the contents of your table's subdirectory. Observe that this subdirectory contains another subdirectory with a system-generated name. (In the screen capture below, this is shown in the final line.) hdfs dfs -ls /apps/hbase/data/data/default/bigsqllab.reviews __6. List the contents of the . . . /bigsqllab.reviews subdirectory with the system-generated name. (In the example above, this is the . . . 388912839f9df25a0f11996adede38b3 subdirectory.) Adjust the path specification below to match your environment. hdfs dfs -ls /apps/hbase/data/data/default/bigsqllab.reviews/388912839f9df25a0f11996adede3 8b3 Note that there are additional subdirectories for each of the 3 column families (summary, reviewer, details) specified in the COLUMN MAPPING clause of your CREATE HBASE TABLE statement. __7. Return to your Big SQL execution environment. __8. Insert a row into your Big SQL reviews table. insert into bigsqllab.reviews values ('198','scarf','2','Bruno',null,'Feels cheap',null);
  • 27. Hands On Lab Page 27 Note that this INSERT statement looks the same as any other SQL statement. You didn't need to insert one HBase cell value at a time, and you didn't need to understand the underlying HBase table structure. __9. Insert another row into your table, specifying values for only a subset of its columns. insert into bigsqllab.reviews (reviewid, product, rating, reviewername) values ('298','gloves','3','Beppe'); __10. Use SQL to count the number of rows stored in your table, verifying that 2 are present. select count(*) from bigsqllab.reviews; __11. Execute a simple query to return specific information about reviews of products rated 3 or higher. select reviewid, product, reviewername, reviewerloc from bigsqllab.reviews where rating >= 3; As expected, only 1 row is returned. Again, note that your SELECT statement looks like any other SQL SELECT -- you didn't need to add any special code because the underlying storage manager is HBase. __12. Launch the HBase shell and verify that you can work directly with the bigsqllab.reviews table from the shell. To do so, scan the table. scan 'bigsqllab.reviews'
  • 28. Page 28 Introducing HBase As you would expect, the final line of output reports that there are 2 rows in your table. In case you're curious, \x00 is both the non-null marker and the terminator used for variable length binary encoded values. 3.2. Creating views over Big SQL HBase tables You can create views over Big SQL tables stored in HBase just as you can create views over Big SQL tables that store data in the Hive warehouse or in simple DFS files. Creating views over Big SQL HBase tables is straightforward, as you'll see in this exercise. __1. From your Big SQL query execution environment, create a view based on a subset of the bigsqllab.reviews table that you created earlier. create view bigsqllab.testview as select reviewid, product, reviewername, reviewerloc from bigsqllab.reviews where rating >= 3; Note that this view definition looks like any other SQL view definition. You didn't need to specify any syntax unique to Hadoop or HBase. __2. Query the view. select reviewid, product, reviewername from bigsqllab.testview; 3.3. Loading data into a Big SQL HBase table In this exercise, you'll explore how to use the Big SQL LOAD command to load data from a file into a Big SQL table managed by HBase. To do so, you will use sample data shipped with Big SQL that is typically installed with Big SQL client software. By default, this data is installed at /usr/ibmpacks/bigsql/4.1/bigsql/samples/data. It is also available online at https://hub.jazz.net/project/jayatheerthan/BigSQLSamples/overview#https://hub.jazz.net/git/jayatheerthan%252FBigSQLSamples/list/master/samples/data. The sample data represents data exported from a data warehouse that tracks sales of outdoor products. It includes a series of FACT and DIMENSION tables. For this lab, you will create 1 DIMENSION table and load sample data from 1 file into it. __1. Create a Big SQL table in HBase named bigsqllab.sls_product_dim. -- product dimension table CREATE HBASE TABLE IF NOT EXISTS bigsqllab.sls_product_dim ( product_key INT PRIMARY KEY NOT NULL
  • 29. Hands On Lab Page 29 , product_line_code INT NOT NULL , product_type_key INT NOT NULL , product_type_code INT NOT NULL , product_number INT NOT NULL , base_product_key INT NOT NULL , base_product_number INT NOT NULL , product_color_code INT , product_size_code INT , product_brand_key INT NOT NULL , product_brand_code INT NOT NULL , product_image VARCHAR(60) , introduction_date TIMESTAMP , discontinued_date TIMESTAMP ) COLUMN MAPPING ( key mapped by (PRODUCT_KEY), data:line_code mapped by (PRODUCT_LINE_CODE), data:type_key mapped by (PRODUCT_TYPE_KEY), data:type_code mapped by (PRODUCT_TYPE_CODE), data:number mapped by (PRODUCT_NUMBER), data:base_key mapped by (BASE_PRODUCT_KEY), data:base_number mapped by (BASE_PRODUCT_NUMBER), data:color mapped by (PRODUCT_COLOR_CODE), data:size mapped by (PRODUCT_SIZE_CODE), data:brand_key mapped by (PRODUCT_BRAND_KEY), data:brand_code mapped by (PRODUCT_BRAND_CODE), data:image mapped by (PRODUCT_IMAGE), data:intro_date mapped by (INTRODUCTION_DATE), data:discon_date mapped by (DISCONTINUED_DATE) );
  • 30. Page 30 Introducing HBase HBase design for this table The HBase specification of this Big SQL statement placed nearly all SQL columns into a single HBase column family named 'data'. As you know, HBase creates physical files for each column family; this is something you should take into consideration when designing your table's structure. It's often best to keep the number of column families per table small unless your workload involves many queries over mutually exclusive columns. For example, if you knew that PRODUCT_IMAGE data was rarely queried or frequently queried alone, you might want to store it separately (i.e., in a different column family:column). There's something else worth noting about this table definition: the HBase columns have shorter names than the SQL columns. As you saw in earlier exercises, HBase stores full key information (row key, column family name, column name, and timestamp) with each cell value. This consumes disk space. If you keep the HBase column family and column names short and specify longer, more meaningful names for the SQL columns, your design will minimize disk consumption and remain friendly to SQL programmers. __2. Load data into the table. Change the SFTP and file path specifications shown below to match your environment. This statement will return a warning message providing details on the number of rows loaded, etc. LOAD HADOOP using file url 'sftp://yourID:yourPassword@myhost.ibm.com:22/your-dir/data/GOSALESDW.SLS_PRODUCT_DIM.txt' with SOURCE PROPERTIES ('field.delimiter'='\t') INTO TABLE bigsqllab.sls_product_dim;
  • 31. Hands On Lab Page 31 A closer look at LOAD . . . . Let’s explore the LOAD syntax briefly. This statement loads data into a table using a file URL specification that relies on SFTP to locate the source file. In particular, the SFTP specification includes a valid user ID and password (yourID/yourPassword), the target host server and port (myhost.ibm.com:22), and the full path of the data file on that system. The WITH SOURCE PROPERTIES clause specifies that fields in the source data are delimited by tabs (\t). The INTO TABLE clause identifies the target table for the LOAD operation. Using SFTP (or FTP) is one way in which you can invoke the LOAD command. If your source data already resides in your distributed file system, you can provide the DFS directory information in your file URL specification. Indeed, for optimal runtime performance, you may prefer to take that approach. In addition, you can load data directly from a remote relational DBMS via a JDBC connection. See the BigInsights Knowledge Center (product documentation) for details. __3. Count the number of rows in the table to verify that 274 are present. -- total rows in SLS_PRODUCT_DIM = 274 select count(*) from bigsqllab.SLS_PRODUCT_DIM; __4. Optionally, query the table. select product_key, introduction_date, product_color_code from bigsqllab.sls_product_dim where product_key > 30000 fetch first 5 rows only; 3.4. Optional: Dropping tables created in this lab If you don't plan to complete any subsequent labs, you may want to clean up your environment. This optional exercise shows you how to drop the tables you created in this lab.
  • 32. Page 32 Introducing HBase __1. Drop the bigsqllab.reviews table and the bigsqllab.sls_product_dim table. drop table bigsqllab.reviews; drop table bigsqllab.sls_product_dim; __2. Verify that these tables no longer exist. For example, query each table and confirm that you receive an error message indicating that the table name you provided is undefined (SQLCODE -204, SQLSTATE 42704). select count(*) from bigsqllab.reviews; select count(*) from bigsqllab.sls_product_dim;
  • 33. Hands On Lab Page 33 Lab 4 Exploring data modeling options In this lab, you'll explore a few options for modeling data exported from a relational DBMS into Big SQL tables managed by HBase. In the previous lab, you saw how you could map data that was originally modeled as a single relational DBMS table into a Big SQL table. You even saw how you could integrate some minor HBase design enhancements into your Big SQL table, such as grouping frequently queried columns together into a single column family and using short names for HBase columns and column families. However, for simplicity, you looked at your dimension table in isolation; you didn't consider how this one table related to other tables in the original database. Although many HBase applications involve managing data that's outside the typical scope of a relational database, some organizations look to HBase as a potential storage mechanism for offloading seldom-queried relational data, including "cold" or "stale" data in a relational data warehouse. Such data is often spread across multiple tables in a normalized (or somewhat normalized) manner. Relationships among these tables are captured through primary key/foreign key constraints. Indeed, star schema or snowflake schema database designs are common in such scenarios. Furthermore, in some relational database designs, primary keys may not be present in all tables or may consist of multiple SQL columns. In addition, many relational tables are densely populated (i.e., they contain relatively few nulls). Finally, queries often involve multi-way joins. Such characteristics can pose challenges when attempting to port the relational design to Big SQL tables managed by HBase. This lab introduces you to some of these challenges and explores options for addressing them. While a full discussion of Big SQL and HBase data modeling topics is beyond the scope of this introductory lab, you'll have a chance to explore:  Many-to-one mapping of relational tables to an HBase table (de-normalization).  Many-to-one mapping of relational table columns to an HBase column.  Generation of unique row key values. Sample data for this lab is based on sample warehouse data available as part of your BigInsights installation and publicly at https://hub.jazz.net/project/jayatheerthan/BigSQLSamples/overview#https://hub.jazz.net/git/jayatheerthan%252FBigSQLSamples/list/master/samples/data. Allow 1/2 - 1 hour to complete this lab. Prior to starting this lab, you need to know how to connect to your BigInsights Big SQL database and how to issue a Big SQL query using JSqsh or another supported query tool.
  • 34. Page 34 Introducing HBase In this exercise, you will explore one approach for mapping two tables along the PRODUCT dimension of the relational data warehouse into a single Big SQL table. Since the content for the source tables is stored in two separate files that each contain different sets of fields, you won't be able to directly load these files into your target table. Instead, you'll need to pre-process or transform the data before using it to populate your Big SQL table. You'll use Big SQL to help you with that task. Specifically, you'll upload the source files into your DFS using standard Hadoop file system commands. Next, you'll create Big SQL externally managed tables over these files. Doing so simply layers a SQL schema over these files -- it does not cause the data to be duplicated or copied into the Hive warehouse. Finally, you'll select the data you want from these external tables and create a Big SQL HBase table based on that result set. __1. If necessary, open a terminal window. __2. Check the directory permissions for your DFS. hdfs dfs -ls / If the /user directory cannot be written by the public (as shown in the example above), you will need to change these permissions so that you can create the necessary subdirectories for this lab using your standard lab user account. From the command line, issue this command to switch to the root user ID temporarily: su root When prompted, enter the password for this account. Then switch to the hdfs account. su hdfs While logged in as user hdfs, issue this command: hdfs dfs -chmod 777 /user Next, confirm the effect of your change: hdfs dfs -ls /
  • 35. Hands On Lab Page 35 Exit the hdfs user account. exit Finally, exit the root user account and return to your standard user account: exit __3. Create directories in your distributed file system for the source data files. If desired, alter the DFS information as appropriate for your environment. hdfs dfs -mkdir /user/hbase_lab hdfs dfs -mkdir /user/hbase_lab/sls_product_dim hdfs dfs -mkdir /user/hbase_lab/sls_product_line_lookup __4. Upload the source data files into their respective DFS directories. Change the local and DFS directory information below to match your environment. hdfs dfs -copyFromLocal /your-dir/data/GOSALESDW.SLS_PRODUCT_DIM.txt /user/hbase_lab/sls_product_dim/SLS_PRODUCT_DIM.txt hdfs dfs -copyFromLocal /your-dir/data/GOSALESDW.SLS_PRODUCT_LINE_LOOKUP.txt /user/hbase_lab/sls_product_line_lookup/SLS_PRODUCT_LINE_LOOKUP.txt __5. List the contents of the DFS directories into which you copied the files to validate your work. hdfs dfs -ls /user/hbase_lab/sls_product_dim hdfs dfs -ls /user/hbase_lab/sls_product_line_lookup __6. Ensure that the contents of your new DFS directories are publicly accessible. hdfs dfs -chmod -R 777 /user/hbase_lab
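Optionally, take a quick look at the first few records of one of the uploaded files to confirm that the copy worked as expected:
hdfs dfs -cat /user/hbase_lab/sls_product_dim/SLS_PRODUCT_DIM.txt | head -3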
  • 36. Page 36 Introducing HBase __7. If necessary, launch your Big SQL query execution environment. __8. Create external Big SQL tables for the sales product dimension (extern.sls_product_dim) and the sales product line lookup (extern.sls_product_line_lookup) tables. Note that the LOCATION clause in each statement references the DFS directory into which you copied the sample data. -- product dimension table CREATE EXTERNAL HADOOP TABLE IF NOT EXISTS extern.sls_product_dim ( product_key INT NOT NULL , product_line_code INT NOT NULL , product_type_key INT NOT NULL , product_type_code INT NOT NULL , product_number INT NOT NULL , base_product_key INT NOT NULL , base_product_number INT NOT NULL , product_color_code INT , product_size_code INT , product_brand_key INT NOT NULL , product_brand_code INT NOT NULL , product_image VARCHAR(60) , introduction_date TIMESTAMP , discontinued_date TIMESTAMP ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' location '/user/hbase_lab/sls_product_dim'; -- look up table with product line info in various languages CREATE EXTERNAL HADOOP TABLE IF NOT EXISTS extern.sls_product_line_lookup ( product_line_code INT NOT NULL , product_line_en VARCHAR(90) NOT NULL , product_line_de VARCHAR(90), product_line_fr VARCHAR(90) , product_line_ja VARCHAR(90), product_line_cs VARCHAR(90) , product_line_da VARCHAR(90), product_line_el VARCHAR(90) , product_line_es VARCHAR(90), product_line_fi VARCHAR(90) , product_line_hu VARCHAR(90), product_line_id VARCHAR(90) , product_line_it VARCHAR(90), product_line_ko VARCHAR(90) , product_line_ms VARCHAR(90), product_line_nl VARCHAR(90) , product_line_no VARCHAR(90), product_line_pl VARCHAR(90) , product_line_pt VARCHAR(90), product_line_ru VARCHAR(90) , product_line_sc VARCHAR(90), product_line_sv VARCHAR(90) , product_line_tc VARCHAR(90), product_line_th VARCHAR(90) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' location '/user/hbase_lab/sls_product_line_lookup';
  • 37. Hands On Lab Page 37 If you encounter a SQL -5105 error message such as the one shown below, the DFS directory permissions for your target directory (e.g., /user/biadmin) may be too restrictive. From an OS terminal window, issue this command: hdfs dfs -ls /user/hbase_lab/ Your permissions must include rw settings. Consult the earlier steps in this lab for instructions on how to reset DFS directory permissions. __9. Verify that you can query each table. -- total rows in EXTERN.SLS_PRODUCT_DIM = 274 select count(*) from EXTERN.SLS_PRODUCT_DIM; -- total rows in EXTERN.SLS_PRODUCT_LINE_LOOKUP = 5 select count(*) from EXTERN.SLS_PRODUCT_LINE_LOOKUP; __10. Become familiar with the content of the extern.sls_product_dim table. select * from extern.sls_product_dim fetch first 3 rows only; __11. Become familiar with the contents of the extern.sls_product_line_lookup table by selecting data in a few of its columns: select product_line_code, product_line_en, product_line_de, product_line_fr from extern.sls_product_line_lookup;
  • 38. Page 38 Introducing HBase Note that the PRODUCT_LINE_CODE column in this table is likely to be joined with the PRODUCT_LINE_CODE column of the sales product dimension table you created earlier. This will make it easy for you to de-normalize (or flatten) these tables into a single HBase table with a unique row key based on the product key. __12. Execute a SELECT statement to join the two tables. In a moment, you'll use this statement as a basis for creating a new HBase table modeled after the result set of this query. (For simplicity, you will only retrieve a subset of columns in each.) select product_key, d.product_line_code, product_type_key, product_type_code, product_line_en, product_line_de from extern.sls_product_dim d, extern.sls_product_line_lookup l where d.product_line_code = l.product_line_code fetch first 3 rows only; __13. Verify that the following join of these tables will produce a result set with 274 rows -- the same number as the extern.sls_product_dim table. After all, you simply want to add more descriptive information (extracted from the extern.sls_product_line_lookup table) to its contents in your new HBase table. -- this query should return 274 select count(*) from extern.sls_product_dim d, extern.sls_product_line_lookup l where d.product_line_code = l.product_line_code; __14. Now you’re ready to create a Big SQL HBase table derived from your join query. In effect, you’re using Big SQL to transform source data in 2 files into a format suitable for a single HBase table. Issue this statement: -- flatten 2 product tables into 1 hbase table CREATE hbase TABLE IF NOT EXISTS bigsqllab.sls_product_flat ( product_key INT NOT NULL , product_line_code INT NOT NULL , product_type_key INT NOT NULL , product_type_code INT NOT NULL , product_line_en VARCHAR(90) , product_line_de VARCHAR(90) ) column mapping ( key mapped by (product_key), data:c2 mapped by (product_line_code), data:c3 mapped by (product_type_key), data:c4 mapped by (product_type_code), data:c5 mapped by (product_line_en),
  • 39. Hands On Lab Page 39 data:c6 mapped by (product_line_de) ) as select product_key, d.product_line_code, product_type_key, product_type_code, product_line_en, product_line_de from extern.sls_product_dim d, extern.sls_product_line_lookup l where d.product_line_code = l.product_line_code; __15. Verify that there are 274 rows in the table you just created. select count(*) from bigsqllab.sls_product_flat; __16. Query the bigsqllab.sls_product_flat table. select product_key, product_line_code, product_line_en from bigsqllab.sls_product_flat where product_key > 30270; 4.2. Generating unique row key values In this exercise, you’ll consider a situation in which the original relational table didn’t have a single column serving as its primary key. Thus, a one-to-one mapping of this table to a Big SQL table won’t be appropriate unless you take some additional action, such as using the FORCE KEY UNIQUE clause of the Big SQL CREATE HBASE TABLE statement. You’ll explore that approach here. __1. Become familiar with the schema for the SLS_SALES_FACT table exported from a relational database. Inspect the details below, and note that the primary key for this table is composed of several columns. Table: SLS_SALES_FACT Columns: ORDER_DAY_KEY ORGANIZATION_KEY EMPLOYEE_KEY RETAILER_KEY RETAILER_SITE_KEY PRODUCT_KEY PROMOTION_KEY ORDER_METHOD_KEY SALES_ORDER_KEY SHIP_DAY_KEY CLOSE_DAY_KEY
  • 40. Page 40 Introducing HBase QUANTITY UNIT_COST UNIT_PRICE UNIT_SALE_PRICE GROSS_MARGIN SALE_TOTAL GROSS_PROFIT Primary Key: ORDER_DAY_KEY ORGANIZATION_KEY EMPLOYEE_KEY RETAILER_KEY RETAILER_SITE_KEY PRODUCT_KEY PROMOTION_KEY ORDER_METHOD_KEY Foreign Key: PRODUCT_KEY Parent table: SLS_PRODUCT_DIM __2. If necessary, open a terminal window. __3. Change directories to the location of the sample data in your local file system. Alter the directory specification shown below to match your environment. cd /usr/ibmpacks/bigsql/4.1/bigsql/samples/data __4. Count the number of lines (records) in the GOSALESDW.SLS_SALES_FACT.txt file, verifying that 446023 are present: wc -l GOSALESDW.SLS_SALES_FACT.txt __5. If necessary, launch your Big SQL query execution environment. __6. From your query execution environment, create a Big SQL HBase table named bigsqllab.sls_sales_fact_unique. Include a FORCE KEY UNIQUE clause with the row key specification to instruct Big SQL to append additional data to the ORDER_DAY_KEY values to ensure that each input record results in a unique value (and therefore a new row in the HBase table). This additional data won’t be visible to users who query the table. CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.SLS_SALES_FACT_UNIQUE ( ORDER_DAY_KEY int, ORGANIZATION_KEY int, EMPLOYEE_KEY int, RETAILER_KEY int, RETAILER_SITE_KEY int, PRODUCT_KEY int, PROMOTION_KEY int, ORDER_METHOD_KEY int,
  • 41. Hands On Lab Page 41 SALES_ORDER_KEY int, SHIP_DAY_KEY int, CLOSE_DAY_KEY int, QUANTITY int, UNIT_COST decimal(19,2), UNIT_PRICE decimal(19,2), UNIT_SALE_PRICE decimal(19,2), GROSS_MARGIN double, SALE_TOTAL decimal(19,2), GROSS_PROFIT decimal(19,2) ) COLUMN MAPPING ( key mapped by (ORDER_DAY_KEY) force key unique, cf_data:cq_ORGANIZATION_KEY mapped by (ORGANIZATION_KEY), cf_data:cq_EMPLOYEE_KEY mapped by (EMPLOYEE_KEY), cf_data:cq_RETAILER_KEY mapped by (RETAILER_KEY), cf_data:cq_RETAILER_SITE_KEY mapped by (RETAILER_SITE_KEY), cf_data:cq_PRODUCT_KEY mapped by (PRODUCT_KEY), cf_data:cq_PROMOTION_KEY mapped by (PROMOTION_KEY), cf_data:cq_ORDER_METHOD_KEY mapped by (ORDER_METHOD_KEY), cf_data:cq_SALES_ORDER_KEY mapped by (SALES_ORDER_KEY), cf_data:cq_SHIP_DAY_KEY mapped by (SHIP_DAY_KEY), cf_data:cq_CLOSE_DAY_KEY mapped by (CLOSE_DAY_KEY), cf_data:cq_QUANTITY mapped by (QUANTITY), cf_data:cq_UNIT_COST mapped by (UNIT_COST), cf_data:cq_UNIT_PRICE mapped by (UNIT_PRICE), cf_data:cq_UNIT_SALE_PRICE mapped by (UNIT_SALE_PRICE), cf_data:cq_GROSS_MARGIN mapped by (GROSS_MARGIN), cf_data:cq_SALE_TOTAL mapped by (SALE_TOTAL), cf_data:cq_GROSS_PROFIT mapped by (GROSS_PROFIT) ); __7. Load data into this table. Adjust the user ID, password, and directory information as needed for your environment. LOAD HADOOP using file url 'sftp://yourID:yourPassword@myhost.ibm.com:22/your-dir/GOSALESDW.SLS_SALES_FACT.txt' with SOURCE PROPERTIES ('field.delimiter'='\t') INTO TABLE bigsqllab.sls_sales_fact_unique; __8. Verify that the table contains the expected number of rows (446023). select count(*) from bigsqllab.sls_sales_fact_unique;
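To see why the FORCE KEY UNIQUE clause matters here, an optional query shows that many loaded rows share the same ORDER_DAY_KEY value; each survived as a separate row only because Big SQL appended hidden data to keep every HBase row key unique:
select order_day_key, count(*) as num_rows from bigsqllab.sls_sales_fact_unique group by order_day_key order by order_day_key fetch first 5 rows only;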
  • 42. Page 42 Introducing HBase __9. Query the table. select order_day_key, product_key, sale_total from bigsqllab.sls_sales_fact_unique where order_day_key BETWEEN 20040112 and 20040115 fetch first 5 rows only; __10. Optionally, investigate what happens if you omit the FORCE KEY UNIQUE clause when creating this table. 1. Create a table named bigsqllab.sls_sales_fact_nopk that omits the FORCE KEY UNIQUE clause for the row key definition. CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.SLS_SALES_FACT_NOPK ( ORDER_DAY_KEY int, ORGANIZATION_KEY int, EMPLOYEE_KEY int, RETAILER_KEY int, RETAILER_SITE_KEY int, PRODUCT_KEY int, PROMOTION_KEY int, ORDER_METHOD_KEY int, SALES_ORDER_KEY int, SHIP_DAY_KEY int, CLOSE_DAY_KEY int, QUANTITY int, UNIT_COST decimal(19,2), UNIT_PRICE decimal(19,2), UNIT_SALE_PRICE decimal(19,2), GROSS_MARGIN double, SALE_TOTAL decimal(19,2), GROSS_PROFIT decimal(19,2) ) COLUMN MAPPING ( key mapped by (ORDER_DAY_KEY), cf_data:cq_ORGANIZATION_KEY mapped by (ORGANIZATION_KEY), cf_data:cq_EMPLOYEE_KEY mapped by (EMPLOYEE_KEY), cf_data:cq_RETAILER_KEY mapped by (RETAILER_KEY), cf_data:cq_RETAILER_SITE_KEY mapped by (RETAILER_SITE_KEY), cf_data:cq_PRODUCT_KEY mapped by (PRODUCT_KEY),
  • 43. Hands On Lab Page 43 cf_data:cq_PROMOTION_KEY mapped by (PROMOTION_KEY), cf_data:cq_ORDER_METHOD_KEY mapped by (ORDER_METHOD_KEY), cf_data:cq_SALES_ORDER_KEY mapped by (SALES_ORDER_KEY), cf_data:cq_SHIP_DAY_KEY mapped by (SHIP_DAY_KEY), cf_data:cq_CLOSE_DAY_KEY mapped by (CLOSE_DAY_KEY), cf_data:cq_QUANTITY mapped by (QUANTITY), cf_data:cq_UNIT_COST mapped by (UNIT_COST), cf_data:cq_UNIT_PRICE mapped by (UNIT_PRICE), cf_data:cq_UNIT_SALE_PRICE mapped by (UNIT_SALE_PRICE), cf_data:cq_GROSS_MARGIN mapped by (GROSS_MARGIN), cf_data:cq_SALE_TOTAL mapped by (SALE_TOTAL), cf_data:cq_GROSS_PROFIT mapped by (GROSS_PROFIT) ); 2. Load data into this table from your source file. Adjust the file URL specification as needed to match your environment. LOAD HADOOP using file url 'sftp://yourID:yourPassword@myhost.ibm.com:22/your-dir/GOSALESDW.SLS_SALES_FACT.txt' with SOURCE PROPERTIES ('field.delimiter'='\t') INTO TABLE bigsqllab.sls_sales_fact_nopk; 3. Count the number of rows in your table. Note that there are only 440 rows. select count(*) from bigsqllab.sls_sales_fact_nopk; 4. Consider what just occurred. You loaded a file with 446023 records into your Big SQL HBase table without error, yet only 440 rows are present in your table. That's because HBase ensures that each row key is unique. If you put 5 different records with the same row key into a native HBase table, your HBase table will contain only 1 current row for that row key. Because you mapped a SQL column with non-unique values to the HBase row key, HBase essentially updated the information for those rows containing duplicate row key values. 4.3. Mapping multiple SQL columns to one HBase row key or column Until now, you've mapped each field from a source file (i.e., each SQL column in the source relational table) to a single Big SQL HBase column. Although straightforward to implement, this one-to-one mapping approach has a significant drawback: it consumes considerable disk space. Why? As you learned earlier, HBase stores full key information (row key, column family name, column name, and timestamp) along with each cell value. As a result, HBase tables with many columns can consume considerable disk space. In this exercise, you'll explore two different many-to-one column mapping options. In particular, you'll define a composite key for the HBase row key; in other words, your row key will be based on multiple SQL columns. In addition, you'll define dense columns in your HBase table; in other words, one HBase column will be based on multiple SQL columns.
  • 44. Page 44 Introducing HBase __1. Consider the relational schema for the SLS_SALES_FACT table shown at the beginning of the previous exercise. Recall that its primary key spanned several SQL columns, which you'll model as a composite row key in your Big SQL HBase table. In addition, let's assume that some SQL columns are commonly queried together, such as columns related to pricing and cost. Packing these SQL columns into a single, dense HBase column can reduce the I/O required to read and write this data. __2. If necessary, launch your Big SQL query execution environment. __3. Create a new Big SQL sales fact table named bigsqllab.sls_sales_fact_dense with a composite key and dense columns. CREATE HBASE TABLE IF NOT EXISTS BIGSQLLAB.SLS_SALES_FACT_DENSE ( ORDER_DAY_KEY int, ORGANIZATION_KEY int, EMPLOYEE_KEY int, RETAILER_KEY int, RETAILER_SITE_KEY int, PRODUCT_KEY int, PROMOTION_KEY int, ORDER_METHOD_KEY int, SALES_ORDER_KEY int, SHIP_DAY_KEY int, CLOSE_DAY_KEY int, QUANTITY int, UNIT_COST decimal(19,2), UNIT_PRICE decimal(19,2), UNIT_SALE_PRICE decimal(19,2), GROSS_MARGIN double, SALE_TOTAL decimal(19,2), GROSS_PROFIT decimal(19,2) ) COLUMN MAPPING ( key mapped by (ORDER_DAY_KEY, ORGANIZATION_KEY, EMPLOYEE_KEY, RETAILER_KEY, RETAILER_SITE_KEY, PRODUCT_KEY, PROMOTION_KEY, ORDER_METHOD_KEY), cf_data:cq_OTHER_KEYS mapped by (SALES_ORDER_KEY, SHIP_DAY_KEY, CLOSE_DAY_KEY), cf_data:cq_QUANTITY mapped by (QUANTITY), cf_data:cq_MONEY mapped by (UNIT_COST, UNIT_PRICE, UNIT_SALE_PRICE, GROSS_MARGIN, SALE_TOTAL, GROSS_PROFIT) ); __4. Load data into this table. (Adjust the SFTP specification below to match your environment.)
  • 45. Hands On Lab Page 45 LOAD HADOOP using file url 'sftp://yourID:yourPassword@myhost.ibm.com:22/opt/ibm/biginsights/bigsql/samples/data/GOSALESDW.SLS_SALES_FACT.txt' with SOURCE PROPERTIES ('field.delimiter'='\t') INTO TABLE bigsqllab.sls_sales_fact_dense; __5. Verify that the table contains the expected number of rows (446023). select count(*) from bigsqllab.sls_sales_fact_dense; __6. Query the table. select order_day_key, product_key, sale_total from bigsqllab.sls_sales_fact_dense where order_day_key BETWEEN 20040112 and 20040115 fetch first 5 rows only; __7. If you're curious about how the storage consumption of the "dense" model of your fact table compares with that of the original model that mapped each SQL column to an HBase row key or column, open a terminal window and execute this command: hdfs dfs -du /apps/hbase/data/data/default | grep -i bigsqllab | sort Compare the size of the bigsqllab.sls_sales_fact_dense table (shown in the first line of the sample output) with the size of the bigsqllab.sls_sales_fact_unique table (shown in the last line of the sample output). The "dense" table consumes much less space than the original table. 4.4. Optional: Dropping tables created in this lab If you don't plan to complete any subsequent labs, you may want to clean up your environment. This optional exercise contains instructions to do so. __1. Drop the tables you created in this lab.
  • 46. Page 46 Introducing HBase drop table extern.sls_product_dim; drop table extern.sls_product_line_lookup; drop table bigsqllab.sls_product_flat; drop table bigsqllab.sls_sales_fact_unique; drop table bigsqllab.sls_sales_fact_nopk; drop table bigsqllab.sls_sales_fact_dense; __2. Optionally, verify that these tables no longer exist. For example, query each table and confirm that you receive an error message indicating that the table name you provided is undefined (SQLCODE -204, SQLSTATE 42704). select count(*) from extern.sls_product_dim; select count(*) from extern.sls_product_line_lookup; select count(*) from bigsqllab.sls_product_flat; select count(*) from bigsqllab.sls_sales_fact_unique; select count(*) from bigsqllab.sls_sales_fact_nopk; select count(*) from bigsqllab.sls_sales_fact_dense;
  • 47. Hands On Lab Page 47 Lab 5 Summary In this lab, you gained hands-on experience using HBase natively as well as using Big SQL with HBase. You learned how to create, populate, and retrieve data from HBase using the HBase shell. In addition, you saw how Big SQL can store its data in HBase tables, thereby affording programmers sophisticated SQL query capabilities. You also explored some data modeling options available through Big SQL. To expand your skills even further, visit the HadoopDev web site (https://developer.ibm.com/hadoop/) for links to free online courses, tutorials, and more.
  • 50. © Copyright IBM Corporation 2015. Author: Cynthia M. Saracco. The information contained in these materials is provided for informational purposes only, and is provided AS IS without warranty of any kind, express or implied. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, these materials. Nothing contained in these materials is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in these materials to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. This information is based on current IBM product plans and strategy, which are subject to change by IBM without notice. Product release dates and/or capabilities referenced in these materials may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. IBM, the IBM logo and ibm.com are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.