2. We’ll be talking about...
● What is HBase?
● Benefits
● Why HBase?
● High Level Architecture
● Terminology
● When to use HBase?
● Trivia
3. What is HBase?
HBase is the NoSQL database for Hadoop: a distributed, scalable, big data
store.
HBase is used to find a needle in a haystack.
Source : http://www.startupdaily.net/2014/02/finding-needle-haystack-startup-simplifies-legal-ediscovery/
Needle - Targeted Random Read
Haystack - Big Data Store
4. Benefits
● Highly scalable
● Distributed storage
● Automatic and configurable sharding of tables
● Automatic failover support between RegionServers
● Can be deployed on cheap commodity hardware
● Easy client access - Java API, REST, Avro
5. Why HBase?
Open source
Derived from Google Bigtable concepts
Strong community
Good integration with Hadoop; sits on top of HDFS
7. Must-Know Terminology in HBase
● Master server - The master server (HMaster) coordinates the cluster and
performs administrative operations, such as assigning regions and balancing
load.
● Region servers - The region servers do the real work. Each region server
handles a subset of each table's data. Clients talk to region servers to access
data in HBase.
● ZooKeeper coordinates, communicates, and shares state between the Masters
and RegionServers.
● RowKey is like the primary index; its design should be in line with your
search pattern.
● Column Families - The columns of a column family are stored together on disk,
which is why HBase is referred to as a column-oriented data store.
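To make the region server idea concrete, here is a minimal sketch of how a row key could be mapped to the server hosting its region. It assumes each region covers a contiguous, sorted range of row keys identified by its start key. The class name, server names (`rs1`, `rs2`, `rs3`) and the use of `String` keys are hypothetical simplifications for illustration; this is a toy model, not HBase's actual client code.

```java
import java.util.Map;
import java.util.TreeMap;

public class RegionLookupSketch {
    // Maps each region's start key to the (hypothetical) server hosting it.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String serverName) {
        regionsByStartKey.put(startKey, serverName);
    }

    // The region holding a row is the one with the greatest start key <= rowKey.
    public String serverFor(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("No region covers row key: " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "rs1");   // first region starts at the empty key
        table.addRegion("g", "rs2");
        table.addRegion("p", "rs3");

        System.out.println(table.serverFor("apple")); // prints "rs1"
        System.out.println(table.serverFor("grape")); // prints "rs2"
        System.out.println(table.serverFor("zebra")); // prints "rs3"
    }
}
```

Because rows are located by key range, rows with neighbouring keys tend to land on the same server, which is one reason row key design matters so much.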
9. When to use HBase?
HBase usage is highly dependent on the use case and access pattern.
The access pattern must be clearly defined before designing the row key
(primary index) for HBase.
A lot of engineering goes into designing an efficient row key.
Random reads, random writes, or both
Pointed queries on high volumes of data (TBs)
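As one illustration of matching the row key to the access pattern, the sketch below assumes a hypothetical requirement: "fetch the latest events for a user first". It builds a composite key of user id plus reversed timestamp, so newer events sort ahead of older ones. The class, method names, and key layout are assumptions made for this example; it uses only the Java standard library and assumes non-negative timestamps and user ids of equal length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RowKeySketch {
    // Builds <userId bytes><Long.MAX_VALUE - timestamp> so that, for one user,
    // newer events get smaller keys and therefore sort first.
    public static byte[] eventKey(String userId, long timestampMillis) {
        byte[] user = userId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(user.length + Long.BYTES)
                .put(user)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Unsigned, byte-by-byte lexicographic comparison of two keys.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = eventKey("user42", 1_000L);
        byte[] newer = eventKey("user42", 2_000L);
        // The newer event sorts before the older one.
        System.out.println(compare(newer, older) < 0); // prints "true"
    }
}
```

A different access pattern (for example, scanning all users for a given day) would call for a different key layout, which is why the pattern has to be pinned down first.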
10. Trivia-1
How can you use HBase if you are currently using an RDBMS?
“Splice Machine” and “Phoenix” each provide an SQL engine on top of HBase.
Splice Machine builds on the Apache Derby SQL engine; Phoenix has its own SQL
layer, accessed through a JDBC driver.
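11. Trivia-2
There are no fancy data types such as String, INT, or Long in
HBase.
It's all byte arrays. It's a kind of byte-in, byte-out database.
// Write: the row key is converted to byte[] before it reaches HBase
Put put = new Put(Bytes.toBytes(rowKey));
// Read: the lookup key is converted to byte[] the same way
Get g = new Get(Bytes.toBytes("row1"));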
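The "byte-in, byte-out" point can be sketched with the Java standard library alone. The helper methods below are hypothetical stand-ins written for this example (they are not the HBase `Bytes` class); they assume a 4-byte big-endian encoding for ints and UTF-8 for strings. The takeaway is that the store keeps no type information, so the reader must decode with the same type the writer used.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteEncodingSketch {
    // Encodes an int as 4 big-endian bytes.
    public static byte[] encodeInt(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    // Decodes 4 big-endian bytes back into an int.
    public static int decodeInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Encodes a string as UTF-8 bytes.
    public static byte[] encodeString(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes back into a string.
    public static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The number 10 and the text "10" are stored as different bytes.
        System.out.println(Arrays.toString(encodeInt(10)));      // prints "[0, 0, 0, 10]"
        System.out.println(Arrays.toString(encodeString("10"))); // prints "[49, 48]"

        // Round trips work only when the same type is used on both sides.
        System.out.println(decodeInt(encodeInt(10)));            // prints "10"
        System.out.println(decodeString(encodeString("row1")));  // prints "row1"
    }
}
```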