HBASE
THE SCALABLE DATA STORE
Sampath Rachakonda
Agenda
 Evolution of HBASE
 Overview
 Data Model
 Architecture
 HBase and ZooKeeper
Evolution of HBASE
 File-Systems
 Tapes → Linear Access or Sequential Access.
 Disc → Random Access
Seek Time
Transfer Rate
 DBMS
 RDBMS
 Now NOSQL
Hadoop
 It comprises mainly two things: HDFS and MapReduce.
 HDFS is a scalable, fault-tolerant, high-performance distributed file system (DFS) that can run on commodity hardware.
 MapReduce is a software framework for distributed computation.
 Master/Slave
 Limitations
Batch processing
Sequential data look-up
Not intended for real-time querying
No support for random access
NOSQL
 Massive Data Volumes
 Schema Evolution
A fixed schema is almost impossible to maintain for a web-scale database.
With NoSQL, schema changes can be introduced into systems gradually.
 Extreme Query Load
Bottleneck is Joins
Why HBASE?
 Column-Oriented Stores
 Distributed – Designed to serve large tables
 Horizontally Scalable
 High Performance & Availability
 Storage System
 The basic goal of HBase is to host tables with billions of rows, millions of columns, and thousands of versions
 Supports random, real-time CRUD operations, unlike HDFS
Who uses HBASE?
 Facebook
 Adobe
 Twitter
 Yahoo
 Meetup
 Netflix
 Many More..
When to use HBASE?
 Good for large amounts of data
100s of millions or billions of rows
Requires enough hardware
Large numbers of client requests
Single random selects and range scans by key
Great for variable schema
Analytical
HBASE Data Model
 Data is stored in Tables
 Tables contain rows
Rows are referenced by a unique key
The key is an array of bytes, so anything can be a key.
 Rows are made of columns, which are grouped into column families
Data is stored in cells, each identified by row x column-family x column
 Tables are sorted by the row key in lexicographical
order.
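Because row keys are plain byte arrays, "lexicographical order" means byte-by-byte comparison, which can surprise anyone expecting numeric order. The following is a minimal Python sketch of that ordering, not HBase code; the key names and the `padded_key` helper are hypothetical.

```python
# Row keys are raw bytes; sorting compares them byte by byte.
row_keys = [b"row2", b"row10", b"row1", b"row100"]

# Byte order puts "row10" and "row100" before "row2".
sorted_keys = sorted(row_keys)
print(sorted_keys)


def padded_key(n: int, width: int = 5) -> bytes:
    """Hypothetical helper: zero-pad an id so byte order matches numeric order."""
    return f"row{n:0{width}d}".encode("ascii")


padded = sorted(padded_key(n) for n in (2, 10, 1, 100))
print(padded)
```

Zero-padding (or another fixed-width encoding) is one common way to make range scans by key follow the intended order.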
HBASE Families
 Columns are grouped into families
Labeled as “Family:column”
 Example: “user:name”
Different features are applied per family
 Stored together – HFile/StoreFile
 Compression
 The table schema defines its column families
Each family can consist of any number of columns and versions
A column exists only once a value is inserted; NULLs are free (nothing is stored).
Columns within a family are sorted and stored together.
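The "Family:column" naming and the idea that NULLs cost nothing can be illustrated with a small Python sketch. It is only a model of the layout described on this slide: the row contents and the helper names `split_column` and `columns_by_family` are made up for illustration.

```python
def split_column(name: str):
    """Split a "family:column" label into (family, column)."""
    family, _, qualifier = name.partition(":")
    return family, qualifier


# Each row stores only the columns that were actually inserted.
rows = {
    "row1": {"user:name": "Bob", "user:email": "bob@example.com", "stats:logins": 4},
    "row2": {"user:name": "Mary"},  # no email, no stats: nothing is stored for them
}


def columns_by_family(row: dict) -> dict:
    """Group a row's cells by family, with columns sorted inside each family."""
    grouped = {}
    for name in sorted(row):
        family, qualifier = split_column(name)
        grouped.setdefault(family, []).append((qualifier, row[name]))
    return grouped


stored_cells = sum(len(cells) for cells in rows.values())
print(columns_by_family(rows["row1"]))
print(stored_cells)
```

A missing column takes no space at all, which is why wide, sparse tables are cheap in this model.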
HBASE Timestamps
 Cell values are versioned; 3 versions are kept by default (the default dropped to 1 in HBase 0.96).
 Versions are stored in decreasing timestamp order.
 Reads return the latest version first, which is the current value.
 A value is addressed by
(Table, RowKey, Family, Column, Timestamp) → Value
 This index is always unique
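The versioning rules on this slide can be sketched in a few lines of Python. This is a toy in-memory model, not the HBase client API: the class name `VersionedTable`, its methods, and the integer timestamps are assumptions made for illustration.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model: (row, family, column) -> [(timestamp, value)], newest first."""

    def __init__(self, max_versions: int = 3):
        self.max_versions = max_versions  # the slide's default of 3
        self.cells = defaultdict(list)

    def put(self, row, family, column, timestamp, value):
        versions = self.cells[(row, family, column)]
        # The full coordinate is unique: same timestamp replaces the old value.
        versions[:] = [(ts, v) for ts, v in versions if ts != timestamp]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # decreasing timestamps
        del versions[self.max_versions:]                    # drop the oldest extras

    def get(self, row, family, column, timestamp=None):
        """Return the latest value, or the latest one at or before `timestamp`."""
        for ts, value in self.cells.get((row, family, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = VersionedTable()
table.put("row2", "name", "last_name", 20, "Tompson")
table.put("row2", "name", "last_name", 30, "Thompson")
latest = table.get("row2", "name", "last_name")
as_of_25 = table.get("row2", "name", "last_name", timestamp=25)
```

Reading without a timestamp returns the newest version, while passing a timestamp reads the value as it stood at that time.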
HBASE Cells Example
 Example of how values are stored
Row Key  Timestamp  Name Family              Address Family
                    first_name  last_name    number  address
row1     t1         Bob         Smith
         t5                                  10      First Lane
         t10                                 30      Other Lane
         t15                                 7       Last Street
row2     t20        Mary        Tompson
         t22                                 77      One Street
         t30                    Thompson
HBASE Architecture
 A table is made up of regions
 A region is a range of rows stored together
Regions split dynamically when they become too big and merge when they are too small
 The Master Server is responsible for managing the HBase cluster (i.e., the Region Servers)
 HBase stores its data in HDFS and so relies on HDFS's high availability and fault tolerance features.
 ZooKeeper is used for distributed coordination.
HBASE Architecture
 As follows: [Figure: HBase architecture diagram, not reproduced]
HBASE Regions
 A region is a range of keys, from the start key (inclusive) to the end key (exclusive)
 Initially there is one region; when its data exceeds the configured maximum size (256 MB by default in older releases), the region is split
 The number of regions per server varies from 10 to 10,000, depending on the hardware of the region server.
 Splitting data into regions helps in several ways:
Fast recovery when a region fails
Load balancing when a server is overloaded
Splitting is fast
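A region lookup over half-open key ranges can be sketched with a sorted list of start keys and a binary search. This is a simplified Python model of the idea on this slide, not how HBase itself locates regions; the `RegionMap` class and its method names are hypothetical.

```python
import bisect


class RegionMap:
    """Toy model of a table's regions as sorted, half-open key ranges."""

    def __init__(self):
        # start_keys[i] is the inclusive start of region i; b"" means "first key".
        self.start_keys = [b""]

    def locate(self, row_key: bytes) -> int:
        """Return the index of the region whose [start, end) range holds row_key."""
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def split(self, split_key: bytes) -> None:
        """Split the region containing split_key into two regions at split_key."""
        if split_key not in self.start_keys:
            bisect.insort(self.start_keys, split_key)

    def key_range(self, index: int):
        """Return (start, end) for a region; end is None for the last region."""
        start = self.start_keys[index]
        has_next = index + 1 < len(self.start_keys)
        return start, (self.start_keys[index + 1] if has_next else None)


regions = RegionMap()
before = regions.locate(b"zebra")   # only one region exists at first
regions.split(b"m")                  # region grew too big: split at key "m"
after = [regions.locate(k) for k in (b"apple", b"m", b"zebra")]
```

Since a split only adds one boundary key in this model, no existing keys move, which hints at why splitting can be fast.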
HBASE Data Storage
 When data is added, it is written to the WAL (Write Ahead Log) and also to memory (MemStore)
 When the MemStore exceeds its maximum size, it is flushed to an HFile
 The RegionServer still serves reads and writes during flush operations, writing values to the WAL and MemStore.
 An HFile is little more than a sorted key-value map.
 HDFS doesn't support updates to an existing file, so HFiles are immutable.
 A delete marker is saved to indicate that a record has been removed.
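The write path above can be sketched as a small Python model: every edit goes to a log and to memory, memory is flushed to an immutable sorted file, and a delete is recorded as a marker rather than an in-place change. The names `RegionStore`, `TOMBSTONE`, and `flush_threshold` are hypothetical, and the threshold counts entries rather than bytes to keep the sketch short.

```python
TOMBSTONE = object()  # hypothetical delete marker


class RegionStore:
    """Toy write path: WAL + MemStore, flushed to immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # assumed: max entries in memory
        self.wal = []        # append-only log of every edit
        self.memstore = {}   # in-memory edits not yet flushed
        self.hfiles = []     # immutable sorted files, oldest first

    def put(self, key, value):
        self.wal.append((key, value))   # log first, for crash recovery
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # Files cannot be edited in place, so a delete is just another write.
        self.put(key, TOMBSTONE)

    def flush(self):
        # Write the MemStore out as one sorted, immutable file.
        entries = sorted(self.memstore.items(), key=lambda kv: kv[0])
        self.hfiles.append(tuple(entries))
        self.memstore = {}
        self.wal.clear()  # in this sketch, flushed edits no longer need the log

    def get(self, key):
        if key in self.memstore:
            value = self.memstore[key]
        else:
            value = None
            for hfile in reversed(self.hfiles):   # newest file wins
                entries = dict(hfile)
                if key in entries:
                    value = entries[key]
                    break
        return None if value is TOMBSTONE else value


store = RegionStore(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)    # second entry triggers a flush
store.put("a", 10)   # newer value shadows the flushed one
store.delete("b")    # marker is written; the old file is left untouched
```

The first flushed file still contains `("b", 2)` after the delete; only the marker in the newer file hides it from reads.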
HBASE Data Storage(Contd.)
 Periodic data compactions are performed to control the number of HFiles and to keep the cluster balanced
Minor Compaction:
 Smaller HFiles are merged into larger HFiles
Fast, as the data is already sorted
Delete markers are not applied
Major Compaction:
 Scans all entries and applies deletes as necessary
 Merges all HFiles of a region into a single file per column family
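The difference between the two kinds of compaction can be sketched in Python. This is a simplified model under stated assumptions: each file is a list of `(key, value)` pairs sorted by key, files are passed oldest first, and `TOMBSTONE` is a hypothetical delete marker. The function names are illustrative, not HBase APIs.

```python
import heapq

TOMBSTONE = object()  # hypothetical delete marker


def minor_compact(hfiles):
    """Merge sorted files into one sorted file without applying delete markers."""
    # Tag each entry with its file's age so equal keys stay ordered oldest to newest.
    tagged = (
        [(key, age, value) for key, value in hfile]
        for age, hfile in enumerate(hfiles)
    )
    # heapq.merge streams the inputs, relying on each one being sorted already.
    merged = heapq.merge(*tagged, key=lambda entry: (entry[0], entry[1]))
    return [(key, value) for key, _age, value in merged]


def major_compact(hfiles):
    """Merge all files, keep the newest entry per key, and apply delete markers."""
    newest = {}
    for key, value in minor_compact(hfiles):
        newest[key] = value  # later (newer) entries overwrite older ones
    return [(key, value) for key, value in newest.items() if value is not TOMBSTONE]


older = [("a", 1), ("b", 2)]
newer = [("b", TOMBSTONE), ("c", 3)]

minor_result = minor_compact([older, newer])
major_result = major_compact([older, newer])
```

In this sketch the minor compaction keeps both the old value of `b` and its marker, while the major compaction drops both.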
HBASE Master
 Manages Regions and their locations
Assigns Regions
Balances workload
Recovers regions if a region server becomes unavailable
Uses ZooKeeper as its distributed coordination service
 Clients directly communicate with Region Servers
 Performs Schema Management and changes
Adding/Removing tables and Column Families
HBASE and Zookeeper
 HBase uses ZooKeeper for region assignments
 ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
 File-like API that performs operations on directories and files (znodes)
 Clients connect to ZooKeeper with a session
The session is maintained via heartbeats
Clients listening for updates are notified of deleted nodes and new nodes.
HBASE and Zookeeper(Contd.)
 Each region server creates an ephemeral node.
The Master monitors these nodes to discover available region servers and to detect server failures.
 ZooKeeper is used to make sure that only one master is registered
 HBase cannot run in distributed mode without ZooKeeper.
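The role of ephemeral nodes can be sketched with a tiny in-memory stand-in for the coordination service. This is not the ZooKeeper client API: the `ToyCoordinator` class, its methods, the session ids, and the node paths are all hypothetical, chosen only to mirror the behaviour described on these two slides.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service with ephemeral nodes."""

    def __init__(self):
        self.nodes = {}      # path -> id of the session that owns the node
        self.watchers = []   # callbacks invoked as callback(event, path)

    def watch(self, callback):
        self.watchers.append(callback)

    def create_ephemeral(self, path, session_id) -> bool:
        """Create a node tied to a session; fail if the path already exists."""
        if path in self.nodes:
            return False
        self.nodes[path] = session_id
        self._notify("created", path)
        return True

    def expire_session(self, session_id):
        """Heartbeats stopped: remove every node the session owned."""
        owned = [p for p, owner in self.nodes.items() if owner == session_id]
        for path in owned:
            del self.nodes[path]
            self._notify("deleted", path)

    def _notify(self, event, path):
        for callback in self.watchers:
            callback(event, path)


zk = ToyCoordinator()
events = []
zk.watch(lambda event, path: events.append((event, path)))

first_master = zk.create_ephemeral("/hbase/master", "master-1")
second_master = zk.create_ephemeral("/hbase/master", "master-2")  # already taken
zk.create_ephemeral("/hbase/rs/server-a", "rs-a")
zk.expire_session("rs-a")  # region server stops sending heartbeats
```

Only one session can hold the master path at a time, and the watcher sees the region server's node disappear as soon as its session expires.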
HBASE Access
 HBase Shell
 Native Java API
Fastest and most capable option.
 Avro Server
Requires running an Avro server.
 HBql
SQL-like syntax for HBase
