Scott Miao 2012/7/5




                      1
Agenda
   Course Credit
   Hands-on
       Install tm-puppet
   Client API: The basics
   Hands-on
       Write your own CRUD code~
   Client API: Advanced Features
   All material refers to
HBase: The Definitive Guide - http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/ref=sr_1_1?ie=UTF8&qid=1339060175&sr=8-1
                                                                              2
General Notes (1/2)
   All data mutation operations are atomic
    on a per-row basis

   Create HTable instances only once for
    each thread
     HTable is not thread-safe
     Alternatively, use HTablePool
   Get familiar with the API docs


                                            3
General Notes (2/2)
   Configuration
     hbase-default.xml and hbase-site.xml are loaded
      from the CLASSPATH
     Set properties on the hadoop command line (CLI)
     Set properties in Java code
     Precedence: Java code > hadoop CLI >
      hbase-site.xml > hbase-default.xml




                                               4
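The precedence order above can be pictured as a layered lookup in which the first layer that defines a key wins. The following is a minimal sketch with hypothetical names (ConfigPrecedenceSketch, resolve) and illustrative values; in real client code you would obtain a Configuration via HBaseConfiguration.create() and call conf.set(...), and Hadoop's Configuration class handles the layering itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of configuration precedence; not Hadoop's Configuration class. */
public class ConfigPrecedenceSketch {

    /** Returns the value from the first layer that defines the key, or null if none does. */
    public static String resolve(String key, List<Map<String, String>> layersHighestFirst) {
        for (Map<String, String> layer : layersHighestFirst) {
            if (layer.containsKey(key)) {
                return layer.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, String> javaCode = new HashMap<>();
        Map<String, String> cli = new HashMap<>();
        Map<String, String> site = new HashMap<>();
        Map<String, String> defaults = new HashMap<>();

        // Illustrative values only
        defaults.put("hbase.client.write.buffer", "2097152");
        site.put("hbase.client.write.buffer", "4194304");
        cli.put("hbase.client.scanner.caching", "100");

        // Highest precedence first: Java code > CLI > hbase-site.xml > hbase-default.xml
        List<Map<String, String>> layers = Arrays.asList(javaCode, cli, site, defaults);
        System.out.println(resolve("hbase.client.write.buffer", layers));
        System.out.println(resolve("hbase.client.scanner.caching", layers));
    }
}
```

Here the write buffer value comes from the hbase-site.xml layer (it overrides the default), while the scanner caching value comes from the CLI layer.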
Client API: The basics
 Put
 Get
 Delete
 Batch operations
 Row Locks
 Scan



 Source code: https://github.com/larsgeorge/hbase-book

                                                         5
Put method - Single Put
   ch03/client.PutExample

   Notice the timestamp (ts)
     If the client does not provide a ts, HBase sets it
     The ts determines the version of a cell (by default, HBase keeps 3 versions)
     Client-supplied timestamps may confuse HBase versioning if
     the clients' clocks are not synchronized




                                                      6
Put method - KeyValue
   The low-level representation of cell data in the client API
     <row-key>/<family>:<qualifier>/<version>/<type>/<value-length>

     Put.add(KeyValue kv);
     Map<byte[], List<KeyValue>> Put.getFamilyMap();




                                                                7
Put method –
Client-side write buffer (1/3)
 Collects Put operations so that they are
  sent to the server(s) in one RPC call
 Only used when auto-flush is disabled:
  HTable.setAutoFlush(false)
 ch03/client.PutWriteBufferExample
  sample code
     long getWriteBufferSize()
     void setWriteBufferSize(long writeBufferSize)
     Default is 2 MB (2097152 bytes)
     Configuration property
      ○ hbase.client.write.buffer

                                             8
Put method –
Client-side write buffer (2/3)
   Round-trip time
     The time it takes for a client to send a
      request and for the server to send a response
      over the network
     Does not include the data-transfer time
      ○ Data size is a factor there
     On average, about 1 ms on a LAN
      ○ i.e. about 1000 round-trips per second
   Use case
     Many requests, each with a small amount of data


                                                    9
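The benefit of the write buffer can be estimated with a simple model that counts RPCs. WriteBufferSim is a hypothetical class, not HTable's code: its flush rule (flush once the buffered bytes exceed the buffer size, plus one final explicit flush) is an assumption about the behaviour, and 1 KB per Put is an illustrative size. The 1 ms round-trip figure is the LAN average from the slide above.

```java
/**
 * Hypothetical model of a client-side write buffer (not HBase's HTable code):
 * puts accumulate locally and are sent in one simulated RPC once the buffered
 * bytes exceed the configured size, plus one final explicit flush.
 */
public class WriteBufferSim {
    private final long bufferSize;
    private long bufferedBytes = 0;
    private int pendingPuts = 0;
    private int rpcCount = 0;

    public WriteBufferSim(long bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Buffers one put; flushes when the buffered bytes exceed the buffer size. */
    public void put(long sizeBytes) {
        bufferedBytes += sizeBytes;
        pendingPuts++;
        if (bufferedBytes > bufferSize) {
            flush();
        }
    }

    /** Sends all pending puts in a single simulated RPC. */
    public void flush() {
        if (pendingPuts == 0) {
            return;
        }
        rpcCount++;
        bufferedBytes = 0;
        pendingPuts = 0;
    }

    public int getRpcCount() {
        return rpcCount;
    }

    /** RPCs needed for n equally sized puts, including the final flush. */
    public static int rpcsFor(int n, long putSize, long bufferSize) {
        WriteBufferSim sim = new WriteBufferSim(bufferSize);
        for (int i = 0; i < n; i++) {
            sim.put(putSize);
        }
        sim.flush();
        return sim.getRpcCount();
    }

    public static void main(String[] args) {
        long defaultBuffer = 2L * 1024 * 1024; // 2 MB, the default write buffer size
        int puts = 10_000;
        int buffered = rpcsFor(puts, 1024, defaultBuffer); // 1 KB per put (illustrative)
        double rttMs = 1.0;                                 // ~1 ms per round-trip on a LAN
        System.out.printf("unbuffered: %d RPCs, ~%.0f ms of round-trips%n", puts, puts * rttMs);
        System.out.printf("buffered:   %d RPCs, ~%.0f ms of round-trips%n", buffered, buffered * rttMs);
    }
}
```

Under these assumptions, 10,000 small puts need 10,000 round-trips without the buffer but only 5 with the default 2 MB buffer; data-transfer time is ignored, as on the slide.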
Put method –
Client-side write buffer (3/3)




                                 10
Put method – List of Puts

 ch03/client.PutListExample
 ch03/client.PutListErrorExample2




                                      11
Put method –
Atomic compare-and-set
   Checks a value before applying the Put

   ch03/client.CheckAndPutExample

   Cannot span rows: the check and the Put
    must refer to the same row




                                     12
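To make the atomic compare-and-set idea concrete, here is a minimal in-memory sketch. CheckAndPutSketch is hypothetical and only mirrors the semantics of HTable.checkAndPut(row, family, qualifier, value, put): the check and the write happen under one lock, a null expected value means "the column must not exist yet", and everything happens inside a single row, which is why the operation cannot cross rows.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical in-memory row illustrating check-and-put semantics; not HBase code.
 * The check and the write happen under one lock, so no other writer can slip in between.
 */
public class CheckAndPutSketch {
    private final Map<String, byte[]> columns = new HashMap<>();

    /**
     * If the current value of checkColumn equals expected (null meaning the column
     * must not exist yet), writes newValue to putColumn and returns true; otherwise
     * leaves the row unchanged and returns false.
     */
    public synchronized boolean checkAndPut(String checkColumn, byte[] expected,
                                            String putColumn, byte[] newValue) {
        byte[] current = columns.get(checkColumn);
        boolean matches = (expected == null) ? current == null : Arrays.equals(current, expected);
        if (matches) {
            columns.put(putColumn, newValue);
        }
        return matches;
    }

    public synchronized byte[] get(String column) {
        return columns.get(column);
    }

    public static void main(String[] args) {
        CheckAndPutSketch row = new CheckAndPutSketch();
        byte[] v1 = "val1".getBytes(StandardCharsets.UTF_8);
        byte[] v2 = "val2".getBytes(StandardCharsets.UTF_8);
        System.out.println(row.checkAndPut("colfam1:qual1", null, "colfam1:qual1", v1)); // true: column was absent
        System.out.println(row.checkAndPut("colfam1:qual1", null, "colfam1:qual1", v2)); // false: column now exists
        System.out.println(row.checkAndPut("colfam1:qual1", v1, "colfam1:qual2", v2));   // true: value matches
    }
}
```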
Get method – Single Gets

 client.GetExample
 Result Class
     Contains all the matching cells




                                        13
Get method – List of Gets
 client.GetListExample
 client.GetListErrorExample
 client.GetRowOrBeforeExample
     Returns the row with the specified row key
     If it does not exist, the row right before it
     null if no such row exists




                                  14
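The lookup rule of getRowOrBefore() maps naturally onto a sorted map's floor operation. The sketch below is hypothetical: it uses String row keys in a TreeMap (for ASCII keys their order matches HBase's byte order) and returns only the key, whereas HTable.getRowOrBefore(row, family) returns a Result.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of getRowOrBefore() semantics on sorted, String-keyed rows. */
public class RowOrBeforeSketch {

    /** Returns rowKey if present, else the closest smaller key, else null. */
    public static String rowOrBefore(TreeMap<String, String> table, String rowKey) {
        Map.Entry<String, String> entry = table.floorEntry(rowKey);
        return entry == null ? null : entry.getKey();
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("row-1", "a");
        table.put("row-3", "b");
        System.out.println(rowOrBefore(table, "row-3")); // row-3 (exact match)
        System.out.println(rowOrBefore(table, "row-2")); // row-1 (previous row)
        System.out.println(rowOrBefore(table, "row-0")); // null (nothing before it)
    }
}
```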
Delete method – Single Deletes
   client.DeleteExample




                                 15
Delete method – List of Deletes
 client.DeleteListExample
 client.DeleteListErrorExample




                                  16
Delete method –
Atomic compare-and-delete

   client.CheckAndDeleteExample




                                   17
Batch Operations
   client.BatchExample
     Does not use the client-side write buffer, not even for the Puts in the batch




                                                        18
Row Locks (1/3)
   Two types of locks
     Server-side lock
      ○ Servers create a lock implicitly on your
        behalf, just for the duration of the call
     Client-side lock
      ○ Clients can also acquire explicit locks and use
        them across multiple operations on the same
        row
      ○ RowLock class

   client.RowLockExample

                                                          19
Row Locks (2/3)
   Avoid using row locks whenever
    possible

   Do Gets require a lock?
     No. While a mutation is in progress, all
     reading clients see the previous
     state of all columns




                                                   20
Row Locks (3/3)
   When is a RowLock released?
     When the client releases it explicitly (HTable.unlockRow())
     When the lease on the lock has expired
      ○ Server-side configuration property
         hbase.regionserver.lease.period
         Default is 1 min.
      ○ Related error:
        org.apache.hadoop.hbase.regionserver.LeaseException




                                                         21
Scans (1/3)
   A technique akin to cursors in database
    systems
     Makes use of the sequential, sorted storage
     layout that HBase provides

   Narrowing the scan's scope plays
    to the strengths of HBase
     Since data is stored per column family, the storage
     files of families you do not ask for are not read
     at all

                                                     22
Scans (2/3)
 Scan(byte[] startRow, byte[] stopRow)
     Range is [startRow, stopRow): startRow inclusive, stopRow exclusive
 Scan addFamily(byte[] family)
 Scan addColumn(byte[] family, byte[] qualifier)
 Scan setTimeRange(long minStamp, long maxStamp) throws IOException
 Scan setTimeStamp(long timestamp)
 Scan setMaxVersions()
 Scan setMaxVersions(int maxVersions)
 Scan setFilter(Filter filter)
                                            23
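The half-open range of Scan(startRow, stopRow) behaves like a sorted map's subMap with an inclusive start and an exclusive end. This hypothetical sketch uses String row keys in a TreeMap as a stand-in for a table; the second method shows the closed range that the InclusiveStopFilter (covered under Dedicated Filters) gives you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch of scan ranges over sorted, String-keyed rows; not HBase code. */
public class ScanRangeSketch {

    /** Row keys in [startRow, stopRow): start inclusive, stop exclusive. */
    public static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    /** Row keys in [startRow, stopRow], the range an InclusiveStopFilter gives you. */
    public static List<String> scanInclusiveStop(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, true).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 5; i++) {
            table.put("row-" + i, "value-" + i);
        }
        System.out.println(scan(table, "row-2", "row-4"));              // [row-2, row-3]
        System.out.println(scanInclusiveStop(table, "row-2", "row-4")); // [row-2, row-3, row-4]
    }
}
```

Note that the start and stop rows do not need to exist; the range is defined purely by sort order.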
Scans (3/3)
   ch03/client.ScanExample

   Scans do not ship all the matching rows to
    the client in one RPC
     One call would use up too many resources and
     take a long time
 ResultScanner wraps the Result instance for
  each row in an iterator
     Just like a JDBC ResultSet

                                                      24
Scans – Caching (1/2)
 Deals with many small rows in a very large
  data set
 Table level
     void HTable.setScannerCaching(int scannerCaching)
   Scan level
     void Scan.setCaching(int caching)
   In the configuration file (hbase-site.xml)
     hbase.client.scanner.caching
     Whether it takes effect depends on whether you put it
     in the client-side or the server-side hbase-site.xml
                                                      25
Scans – Caching (2/2)
   Need to find a sweet spot between
     A low number of RPCs
     The memory used on the client and server side


 Scanners use the same lease-based
  mechanism as RowLock
 When the lease expires, the client sees, e.g.,
  org.apache.hadoop.hbase.client.ScannerTimeoutException: 65094ms passed
  since the last invocation, timeout is
  currently set to 60000
 ch03/client.ScanTimeoutExample

                                                  26
Scans – Batching
   Deals with very large rows
     Rows that may not fit into the memory of the client
      process
     Batching works at the column level
 void Scan.setBatch(int batch)
 For example, if your row has 17 columns
  and you set the batch to 5…
     You'll get four Result instances, with 5, 5, 5,
      and 2 columns

                                                        27
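The 17-column example above is a ceiling division: a row is split into Results of at most `batch` columns, and only the last one can be smaller. ScanBatchSketch is a hypothetical helper that reproduces just this arithmetic; it does not call Scan.setBatch().

```java
import java.util.Arrays;

/** Hypothetical helper reproducing how batching splits one row into Results. */
public class ScanBatchSketch {

    /** Number of columns in each Result returned for one row of the given width. */
    public static int[] resultSizes(int columnsInRow, int batch) {
        if (batch <= 0) {
            throw new IllegalArgumentException("batch must be positive");
        }
        int results = (columnsInRow + batch - 1) / batch; // ceiling division
        int[] sizes = new int[results];
        for (int i = 0; i < results; i++) {
            sizes[i] = Math.min(batch, columnsInRow - i * batch);
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(resultSizes(17, 5))); // [5, 5, 5, 2]
    }
}
```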
Scans – Caching & Batching
(1/3)
 ch03/client.ScanCacheBatchExample
 10 rows * 20 columns per row = 200 columns




                                               28
Scans – Caching & Batching
(2/3)
   RPCs = (Rows * Cols per Row) / Min(Cols per Row, Batch Size) / Scanner Caching
        = (10 * 20) / Min(20, 20) / 5
        = 200 / 20 / 5
        = 2
   Plus 1 or 2 RPCs to open/close the scanner
        = 2 + 1 or 2 + 2 = 3 or 4



                                                  29
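The formula above assumes every division comes out even. The hypothetical ScanRpcEstimate below computes the same quantity but rounds up at each step (Results per row, then RPCs per caching window), which is an assumption about the uneven case; the extra 1 or 2 RPCs for opening and closing the scanner are added separately, as on the slide.

```java
/** Hypothetical estimate of scan RPCs from rows, columns, batch and caching. */
public class ScanRpcEstimate {

    /**
     * Data RPCs following (rows * cols) / min(cols, batch) / caching,
     * with ceiling division at each step. Excludes scanner open/close RPCs.
     */
    public static long dataRpcs(long rows, long colsPerRow, long batch, long caching) {
        if (rows < 0 || colsPerRow <= 0 || batch <= 0 || caching <= 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        long resultsPerRow = ceilDiv(colsPerRow, Math.min(colsPerRow, batch));
        long results = rows * resultsPerRow;
        return ceilDiv(results, caching);
    }

    private static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long data = dataRpcs(10, 20, 20, 5);
        System.out.println("data RPCs: " + data);
        System.out.println("with open/close: " + (data + 1) + " or " + (data + 2));
    }
}
```

For the slide's numbers this prints 2 data RPCs and a total of 3 or 4. Where the plain formula would give a fraction (for example 17 columns with a batch of 5), rounding up reflects that a partial Result or a partially filled caching window still costs a whole RPC.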
Scans – Caching & Batching
(3/3)
 1 Table, 9 Rows, with some columns
 Caching set to 6, batch set to 3




                                       30
Hands-On –
Write your own CRUD code
(1/3)
   In the hbase shell
     Create a table
      ○ A table 'MY_SECOND_TABLE'
      ○ With two column families 'FAM_1' and 'FAM_2'
   In Java code
     Put
      ○ Two values
     Scan the table
     Delete
      ○ One value
     Get the last remaining value


                                                    31
Hands-On –
Write your own CRUD code
(2/3)
   Environment
     Let project_home = ${git_home}/hbase-training/002/hands-on/${your_name}
     mkdir ${project_home}
     cp -rf ${git_home}/hbase-training/002/projects/training-002 ${project_home}


   Write your Java code in
     ${project_home}/src/main/java/client/CrudTest.java


                                                             32
Hands-On –
Write your own CRUD code
(3/3)
   Requirements
     After you complete your code,
     run these commands in ${project_home}
      ○ Build the jar file
        mvn clean package
      ○ Run the jar file you built
        sh bin/run.sh > output.txt
      ○ output.txt
        Must show a successful run and the HBase data
        I will verify this file in git

   Commit and push your changes to git
                                                      33
Client API: Advanced
Features
 Filters
 Counters
 Coprocessors
 HTable Pool
 Connection Handling




                        34
Filters
   Get
     Direct access to data
   Scan
     Uses start/end keys
   Filters
     Add further selection criteria to the query
     Applied on the server side
     Can select on
      ○ Column families, column
        qualifiers, timestamps or time ranges, version
        numbers
                                                    35
Filters – How Filters work




                             36
Filters –
Hierarchy
   Various Filter
    implementations for your
    needs

   You can also write
    your own implementation




                         37
Filters – Comparison Filters
They take a comparison operator and a comparator instance
Class Name              Description
RowFilter               Filters based on the row key
FamilyFilter            Filters based on the column family
QualifierFilter         Filters based on the column qualifier
ValueFilter             Filters based on the column value
DependentColumnFilter   Uses the timestamp of a reference column and
                        includes all other columns that have the same
                        timestamp
                                                                          38
Filters –
CompareFilter
Operators




                39
Filters –
CompareFilter Comparators




                            40
Filters – CompareFilter
example

   ch04/filters.RowFilterExample




                                    41
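A comparison filter such as RowFilter keeps a row when "row key <operator> reference value" holds, where values are compared as unsigned bytes in lexicographic order. The sketch below is hypothetical: its operator names mirror HBase's CompareFilter.CompareOp (which additionally has NO_OP), and compareBytes plays the role that a BinaryComparator plays in HBase, but none of this is HBase's code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a comparison filter's keep/drop decision; not HBase code. */
public class CompareFilterSketch {
    public enum CompareOp { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    /** Unsigned lexicographic comparison of two byte arrays (a shorter prefix sorts first). */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** True if "rowKey op reference" holds, i.e. the row would be returned. */
    public static boolean keep(byte[] rowKey, CompareOp op, byte[] reference) {
        int c = compareBytes(rowKey, reference);
        switch (op) {
            case LESS: return c < 0;
            case LESS_OR_EQUAL: return c <= 0;
            case EQUAL: return c == 0;
            case NOT_EQUAL: return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER: return c > 0;
            default: throw new IllegalArgumentException("unknown operator: " + op);
        }
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] reference = bytes("row-22");
        for (String row : new String[] {"row-1", "row-22", "row-3"}) {
            System.out.println(row + " kept: " + keep(bytes(row), CompareOp.LESS_OR_EQUAL, reference));
        }
    }
}
```

Because the comparison is lexicographic, "row-3" sorts after "row-22" and is dropped by LESS_OR_EQUAL, which often surprises people who expect numeric ordering.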
Filters – Dedicated Filters
   Mainly used with Scan; they basically
   filter out entire rows
Class Name                       Description
SingleColumnValueFilter          Filters entire rows based on the value of
                                 a single reference column
SingleColumnValueExcludeFilter   Same as SingleColumnValueFilter, but the
                                 reference column is omitted from the result
PrefixFilter                     Returns all rows whose row key starts with
                                 the given prefix
PageFilter                       Controls how many rows per page are returned
KeyOnlyFilter                    Returns just the keys of each KeyValue,
                                 omitting the actual data
FirstKeyOnlyFilter               Returns the key of the first column in each
                                 row and skips the rest
                                                                          42
Filters – Dedicated Filters
Class Name               Description
InclusiveStopFilter      Changes the Scan range [startRow, stopRow) to
                         [startRow, stopRow]
TimestampFilter          Returns only cells whose timestamp (version) is in
                         the specified list of timestamps (versions)
ColumnCountGetFilter     Returns only the first N columns of a row; mainly
                         for HBase testing purposes
ColumnPaginationFilter   Similar to PageFilter, but pages through the
                         columns in a row
ColumnPrefixFilter       Analogous to PrefixFilter, which filters on row
                         key prefixes; this filter does the same for columns
RandomRowFilter          Includes random rows in the result
                                                                          43
Filters – Decorating Filters

Class Name         Description
SkipFilter         Wraps a given filter and extends it to exclude an entire
                   row when the wrapped filter hints that a KeyValue
                   should be skipped
WhileMatchFilter   Aborts the entire scan as soon as the wrapped filter
                   filters out a piece of information




                                                                                   44
Filters - FilterList
   In practice, you may want to apply more
    than one filter to reduce the data
    returned to your client application

   Operators (MUST_PASS_ALL, MUST_PASS_ONE)




                                              45
Filters - FilterList
   ch04/filters.FilterListExample
     Filters of the first scan:




     Filters of the second scan:




                                     46
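The two FilterList operators behave like logical AND (MUST_PASS_ALL) and OR (MUST_PASS_ONE) over the contained filters. This hypothetical sketch models each filter as a plain Java Predicate over whole rows, which is a simplification: real HBase filters are evaluated on the server per row and per KeyValue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of FilterList operator semantics using plain predicates. */
public class FilterListSketch {
    public enum Operator { MUST_PASS_ALL, MUST_PASS_ONE }

    /** Combines filters: MUST_PASS_ALL behaves like AND, MUST_PASS_ONE like OR. */
    public static <T> Predicate<T> combine(Operator op, List<Predicate<T>> filters) {
        if (op == Operator.MUST_PASS_ALL) {
            return item -> filters.stream().allMatch(f -> f.test(item));
        }
        return item -> filters.stream().anyMatch(f -> f.test(item));
    }

    public static void main(String[] args) {
        Predicate<String> prefix = row -> row.startsWith("row-");
        Predicate<String> endsInFive = row -> row.endsWith("5");
        List<Predicate<String>> filters = Arrays.asList(prefix, endsInFive);

        Predicate<String> all = combine(Operator.MUST_PASS_ALL, filters);
        Predicate<String> one = combine(Operator.MUST_PASS_ONE, filters);
        for (String row : new String[] {"row-5", "row-3", "key-5"}) {
            System.out.println(row + ": ALL=" + all.test(row) + ", ONE=" + one.test(row));
        }
    }
}
```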
Filters – Custom Filters
   If none of the existing filters
    meets your needs
     You can write one
      yourself!


   Write your own filter by
    implementing or extending
     Filter
     FilterBase (provides default implementations)


                                 47
Filters – Custom Filters
 ch04/filters.CustomFilter
 ch04/filters.CustomFilterExample


   Custom Filters Deployment (costly)
    1. Build the jar file
    2. Copy the jar file to every region server
    3. Append the jar file path to HBASE_CLASSPATH in
       hbase-env.sh
    4. Restart all HBase daemons

                                                 48
Filters - Summary




                    49
Filters - Summary




                    50
Counters
   Many applications that collect statistics,
     such as clicks or views in online advertising,
     used to write the data to logfiles that
     were analyzed later


   Counters are all you need!




                                                       51
Counters - shell
   Create a table
     create 'counters', 'daily', 'weekly', 'monthly'


   Initialize a counter
     incr 'counters', '20110101', 'daily:hits', 1
     Let's do it twice


   Get your counter
     get_counter 'counters', '20110101', 'daily:hits'
                                                        52
Counters - shell
   You can also fine-tune your counter
     incr 'counters', '20110101', 'daily:hits', 0
      ○ Incrementing by 0 returns the current value
     incr 'counters', '20110101', 'daily:hits', -1
      ○ A negative value decrements the counter


   Do not use put instead of incr, even though a counter
    is also just a cell value
     Data type issue: counters are stored as longs, put stores strings
   Use get_counter, not get
     It shows the value in human-readable form~

                                                      53
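The long-versus-String issue is about bytes: a counter cell holds an 8-byte big-endian long (the same layout as HBase's Bytes.toBytes(long)), while a shell put of '2' stores the single character "2". This hypothetical sketch shows both encodings with plain java.nio; decodeLong refuses anything that is not 8 bytes wide, which models the rejection you should expect (as far as I know) when incrementing a cell that was written as a string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch contrasting counter (long) and string cell encodings. */
public class CounterEncodingSketch {

    /** 8-byte big-endian encoding, the layout counter cells use. */
    public static byte[] encodeLong(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    /** Decodes a counter cell; anything that is not 8 bytes wide is rejected. */
    public static long decodeLong(byte[] cell) {
        if (cell.length != Long.BYTES) {
            throw new IllegalArgumentException("counter cells must be 8 bytes, got " + cell.length);
        }
        return ByteBuffer.wrap(cell).getLong();
    }

    /** What storing the text of a number (as a shell put does) produces. */
    public static byte[] encodeString(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] counterCell = encodeLong(2);   // e.g. after two incr calls of 1
        byte[] stringCell = encodeString("2"); // e.g. put 'counters', ..., '2'
        System.out.println(counterCell.length + " bytes vs " + stringCell.length + " byte");
        System.out.println(decodeLong(counterCell));
    }
}
```

The 8-byte layout is also why a plain get shows the counter as raw, mostly zero bytes, while get_counter decodes it into a number.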
Counters - API
   Single Counters
     ch04/client.IncrementSingleExample


   Multiple Counters
     ch04/client.IncrementMultipleExample




                                             54
Coprocessors
   With the coprocessor feature in HBase,
    you can even move part of the
    computation to where the data lives

   It acts like a small MapReduce framework
    that can distribute the work across the
    entire cluster




                                               55
Coprocessors
   Two types
     Observer
      ○ Trigger-like
     Endpoint
      ○ Stored procedure-like
   Use cases
     Aggregate functions, e.g. sum(), avg()
     Integrity checks, e.g. when some data is put,
      other related data must exist
     Authentication, authorization and auditing
      ○ Built on coprocessors since HBase 0.92


                                                       56
Coprocessors –
Coprocessor Class
   Priorities defined in Coprocessor.Priority
    enumeration




                                                 57
Coprocessors –
Coprocessor Class
   State defined in Coprocessor.State
    enumeration




                                         58
Coprocessor – Main
Classes




                     59
Coprocessor – Flow




                     60
Coprocessor – Loading from
    Configuration
   Add the following properties to hbase-site.xml




   Region, master, and WAL observers are listed in separate
    properties (hbase.coprocessor.region.classes,
    hbase.coprocessor.master.classes, hbase.coprocessor.wal.classes)
   The order of the fully qualified class names in the value
    determines the execution order
   Deploy the jar file the same way as custom filters
   Applies to every table and region
                                                              61
Coprocessor – Loading from
    table descriptor
   Use HTableDescriptor.setValue(String key, String value)
   Key spec.
     COPROCESSOR[$<number>]
     Ex.
       ○ COPROCESSOR$1
   Value spec.
     <jarFilePath>|<classFullyQualifiedName>|<priority>
     Ex.
       ○ "hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|SYSTEM"
     jarFilePath can use any scheme supported by the Hadoop FileSystem class


   ch04/coprocessor.LoadWithTableDescriptorExample
   Applies only to the regions of the specified table


                                                                         62
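The key and value formats above are plain strings, so they can be built and checked with a couple of helpers. CoprocessorSpecSketch and its methods are hypothetical; with the HBase API you would pass the resulting pair to HTableDescriptor.setValue(key, value) as the slide describes.

```java
/**
 * Hypothetical helpers that build and split the table-descriptor attribute used
 * to load a coprocessor: key "COPROCESSOR$<number>", value
 * "<jarFilePath>|<classFullyQualifiedName>|<priority>".
 */
public class CoprocessorSpecSketch {

    public static String key(int number) {
        return "COPROCESSOR$" + number;
    }

    public static String value(String jarFilePath, String className, String priority) {
        return jarFilePath + "|" + className + "|" + priority;
    }

    /** Splits a value back into {jarFilePath, className, priority}. */
    public static String[] parse(String value) {
        String[] parts = value.split("\\|");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected <jar>|<class>|<priority>: " + value);
        }
        return parts;
    }

    public static void main(String[] args) {
        String k = key(1);
        String v = value("hdfs://localhost:8020/users/leon/test.jar", "coprocessor.Test", "SYSTEM");
        // With the HBase API you would then call htd.setValue(k, v) on an HTableDescriptor.
        System.out.println(k + " => " + v);
    }
}
```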
Coprocessor - Observer
   Callback functions (hooks) that are executed
    when certain events occur
   Comparable to triggers in a DBMS

Observer Type    Description
RegionObserver   Observes events bound to the regions of a table
MasterObserver   Observes events bound to administrative or DDL-type
                 operations (cluster-wide events)
WALObserver      Observes events bound to write-ahead log (WAL)
                 processing
                                                                          63
Coprocessor – Observer main
classes




                              64
Coprocessor – RegionObserver
and Region Life Cycle




                               65
Coprocessor – RegionObserver
Classes
• Handling region life cycle events
• Handling client API events
• ch04/coprocessor.RegionObserverExample




                                           66
Coprocessor – MasterObserver
Classes
• ch04/coprocessor.MasterObserverExample




                                           67
Coprocessor - Endpoint
   User code can be deployed to the
    servers hosting the data to, for example,
    perform server-local computations

   Comparable to stored procedures in a DBMS

   Can be combined with observer
    implementations to directly interact with
    the server-side state

                                                68
Coprocessor – Endpoint main
Classes
 •   ch04/coprocessor.RowCountProtocol
 •   ch04/coprocessor.RowCountEndpoint
 •   ch04/coprocessor.EndpointExample
 •   ch04/coprocessor.EndpointProxyExample




                                             69
Coprocessor –
Single Region vs. Range of Regions




                                      70
HTablePool
 Creating an HTable instance takes a few
  seconds to complete
 That is not acceptable in a highly contended
  environment with thousands of requests
  per second
 Keeping one HTable instance for multiple
  uses avoids this cost, but HTable is not thread-safe



                                             71
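The idea behind HTablePool is to reuse expensive, non-thread-safe instances: each thread borrows one, uses it exclusively, and hands it back. SimplePool below is a hypothetical generic pool that illustrates that idea under stated assumptions (instances are created lazily, at most maxIdle are kept); it is not HTablePool's implementation or API.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical object pool illustrating the idea behind HTablePool: reuse
 * expensive, non-thread-safe instances instead of creating one per request.
 * Each instance is used by one thread at a time between acquire() and release().
 */
public class SimplePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;
    private final int maxIdle;
    private final AtomicInteger created = new AtomicInteger();

    public SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    /** Hands out an idle instance, or creates a new one if none is available. */
    public T acquire() {
        T instance = idle.poll();
        if (instance == null) {
            created.incrementAndGet();
            instance = factory.get();
        }
        return instance;
    }

    /**
     * Returns an instance to the pool. Instances beyond maxIdle are dropped
     * (a real pool would also close them); the size check is approximate under contention.
     */
    public void release(T instance) {
        if (idle.size() < maxIdle) {
            idle.offer(instance);
        }
    }

    public int createdCount() {
        return created.get();
    }

    public static void main(String[] args) {
        SimplePool<StringBuilder> pool = new SimplePool<>(StringBuilder::new, 10);
        StringBuilder first = pool.acquire();
        pool.release(first);
        StringBuilder second = pool.acquire(); // reuses the released instance
        System.out.println((first == second) + ", created=" + pool.createdCount());
    }
}
```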
HTablePool – Sample code




                           72
Connection Handling
   Use a shared connection whenever you can




                                           73
Connection Handling –
Main Classes




                        74
Connection Handling –
Features
 Shared ZooKeeper connections
     Used for the initial lookup of where user table regions are
      located
   Cached common resources
     Region locations are cached on the client side after the first
      round-trips to ZooKeeper and other servers
     When a lookup fails
      ○ e.g. because a region was split
      ○ A built-in retry mechanism refreshes the stale
        cache information
   Do not forget to release your shared connection
     HTable.close()
     HTablePool.closeTablePool(…)

                                                          75
Intermission~
        76

My thoughts for - Building CI/CD Pipelines for Serverless Applications sharingMy thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
My thoughts for - Building CI/CD Pipelines for Serverless Applications sharingScott Miao
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01Scott Miao
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudScott Miao
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsScott Miao
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationScott Miao
 
Attack on graph
Attack on graphAttack on graph
Attack on graphScott Miao
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduseScott Miao
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introductionScott Miao
 
20121022 tm hbasecanarytool
20121022 tm hbasecanarytool20121022 tm hbasecanarytool
20121022 tm hbasecanarytoolScott Miao
 

Mais de Scott Miao (9)

My thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
My thoughts for - Building CI/CD Pipelines for Serverless Applications sharingMy thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
My thoughts for - Building CI/CD Pipelines for Serverless Applications sharing
 
20171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v0120171122 aws usergrp_coretech-spn-cicd-aws-v01
20171122 aws usergrp_coretech-spn-cicd-aws-v01
 
Achieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloudAchieve big data analytic platform with lambda architecture on cloud
Achieve big data analytic platform with lambda architecture on cloud
 
analytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the awsanalytic engine - a common big data computation service on the aws
analytic engine - a common big data computation service on the aws
 
Zero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter MigrationZero-downtime Hadoop/HBase Cross-datacenter Migration
Zero-downtime Hadoop/HBase Cross-datacenter Migration
 
Attack on graph
Attack on graphAttack on graph
Attack on graph
 
004 architecture andadvanceduse
004 architecture andadvanceduse004 architecture andadvanceduse
004 architecture andadvanceduse
 
001 hbase introduction
001 hbase introduction001 hbase introduction
001 hbase introduction
 
20121022 tm hbasecanarytool
20121022 tm hbasecanarytool20121022 tm hbasecanarytool
20121022 tm hbasecanarytool
 

Último

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

002 hbase clientapi

  • 2. Agenda
    - Course Credit
    - Hands-on: Install tm-puppet
    - Client API: The basics
    - Hands-on: Write your own CRUD code
    - Client API: Advanced Features
    - All material refers to HBase: The Definitive Guide - http://www.amazon.com/HBase-Definitive-Guide-Lars-George/dp/1449396100/ref=sr_1_1?ie=UTF8&qid=1339060175&sr=8-1
  • 3. General Notes (1/2)
    - All data-mutating operations are atomic on a per-row basis
    - Create HTable instances only once, one per thread
      ○ HTable is not thread-safe
      ○ HTablePool can be used instead
    - Get familiar with the API docs
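Because HTable is not thread-safe, "one instance per thread" is often implemented with a ThreadLocal holder. A minimal sketch, assuming a hypothetical FakeTable class that stands in for HTable so the pattern runs without a cluster; with the real client you would construct the table inside the initializer instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a non-thread-safe client object such as HTable.
class FakeTable {
    static final AtomicInteger CREATED = new AtomicInteger();
    final String name;

    FakeTable(String name) {
        this.name = name;
        CREATED.incrementAndGet(); // count how many instances were built
    }
}

// Each thread lazily builds its own instance on first use and reuses it afterwards,
// so no instance is ever shared between threads.
class PerThreadTables {
    private static final ThreadLocal<FakeTable> TABLE =
            ThreadLocal.withInitial(() -> new FakeTable("testtable"));

    static FakeTable get() {
        return TABLE.get();
    }
}
```

Repeated calls from one thread return the same object; a second thread gets its own.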
  • 4. General Notes (2/2)
    - Configuration
      ○ hbase-default.xml and hbase-site.xml are loaded from the CLASSPATH
      ○ Properties can be set on the hadoop command line
      ○ Properties can be set in Java code
      ○ Precedence: Java code > hadoop CLI > hbase-site.xml > hbase-default.xml
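The precedence rule can be modelled with java.util.Properties objects chained through their defaults. This only illustrates the lookup order: the real client merges hbase-default.xml and hbase-site.xml through HBaseConfiguration.create(), and the LayeredConfig class below is hypothetical.

```java
import java.util.Properties;

// Each layer falls back to the next-lower one, so getProperty() returns the value
// from the highest layer that defines the key:
// Java code > hadoop CLI > hbase-site.xml > hbase-default.xml.
class LayeredConfig {
    static Properties build(Properties hbaseDefault, Properties hbaseSite,
                            Properties cli, Properties code) {
        Properties site = layer(hbaseSite, hbaseDefault);
        Properties fromCli = layer(cli, site);
        return layer(code, fromCli);
    }

    private static Properties layer(Properties values, Properties fallback) {
        Properties layer = new Properties(fallback); // fallback is consulted on a miss
        for (String key : values.stringPropertyNames()) {
            layer.setProperty(key, values.getProperty(key));
        }
        return layer;
    }
}
```

A key set in code wins over the same key from the CLI; a key only present in hbase-site.xml still overrides hbase-default.xml.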
  • 5. Client API: The basics
    - Put
    - Get
    - Delete
    - Batch operations
    - Row Locks
    - Scan
    - Source code: https://github.com/larsgeorge/hbase-book
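  • 6. Put method - Single Put
    - ch03/client.PutExample
    - Notice the timestamp (ts)
      ○ HBase sets the ts if the client does not provide one
      ○ The ts determines the version of a cell (by default, 3 versions are kept)
      ○ Client-supplied timestamps may confuse HBase versioning if the clients' clocks are not in sync (e.g. clients deriving timestamps from local time in different time zones)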
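A simplified model of how timestamps drive versioning, assuming a cell keeps its newest 3 versions. VersionedCell is hypothetical and drops excess versions immediately, whereas HBase itself filters them out on read and purges them during compactions. The last put in the usage uses an older timestamp, as a client with a lagging clock might, so it never becomes the latest value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// One cell: versions ordered by timestamp, newest first; only maxVersions are retained.
class VersionedCell {
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    private final int maxVersions;

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // No explicit timestamp: use the current time, like the server assigning one.
    void put(String value) {
        put(System.currentTimeMillis(), value);
    }

    void put(long ts, String value) {
        versions.put(ts, value);
        while (versions.size() > maxVersions) {
            versions.remove(versions.lastKey()); // lastKey() is the oldest timestamp here
        }
    }

    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    List<Long> timestamps() {
        return new ArrayList<>(versions.keySet());
    }
}
```

After puts at ts 100, 200, 300 and a late put at ts 150, the cell holds versions 300, 200, 150 and still returns the value written at 300.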
  • 7. Put method - KeyValue
    - The low-level data bytes in the client APIs
      ○ <row-key>/<family>:<qualifier>/<version>/<type>/<value-length>
    - Put.add(KeyValue kv);
    - Map<byte[], List<KeyValue>> Put.getFamilyMap();
  • 8. Put method – Client-side write buffer (1/3)
    - Collects put operations so that they are sent to the server(s) in one RPC call
    - ch03/client.PutWriteBufferExample sample code
    - long getWriteBufferSize()
    - void setWriteBufferSize(long writeBufferSize)
    - The default is 2 MB (2097152 bytes)
      ○ Configuration property: hbase.client.write.buffer
  • 9. Put method – Client-side write buffer (2/3)
    - Round-trip time
      ○ The time it takes for a client to send a request and for the server to send a response over the network
      ○ Does not include the data-transfer time (data size is a separate factor)
      ○ About 1 ms on average on a LAN, i.e. roughly 1000 round-trips per second
    - Use case
      ○ Small data sizes but many requests to send
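To see why buffering pays off for many small puts, here is a hypothetical model of the write buffer (not the HTable implementation; flushing as soon as the buffered size exceeds the limit is an assumption of the model). With 1000 puts of 10 KB each and the 2 MB default, the puts leave the client in 5 RPCs instead of 1000; at roughly 1 ms per round trip, that is about 5 ms instead of about 1 s of round-trip overhead.

```java
// Puts accumulate locally and are shipped together once the buffered size
// exceeds the limit; flush() sends whatever is left, like flushCommits().
class WriteBufferModel {
    private final long limitBytes;
    private long bufferedBytes = 0;
    private int bufferedPuts = 0;
    private int rpcCount = 0;

    WriteBufferModel(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    void put(long sizeBytes) {
        bufferedBytes += sizeBytes;
        bufferedPuts++;
        if (bufferedBytes > limitBytes) {
            flush(); // buffer is full: send everything collected so far in one RPC
        }
    }

    void flush() {
        if (bufferedPuts == 0) {
            return; // nothing to send, no RPC
        }
        rpcCount++;
        bufferedBytes = 0;
        bufferedPuts = 0;
    }

    int rpcCount() {
        return rpcCount;
    }
}
```

The buffer overflows after every 205 puts (205 * 10240 bytes > 2097152), giving 4 automatic flushes plus one final explicit flush.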
  • 10. Put method – Client-side write buffer (3/3)
  • 11. Put method – List of Puts
    - ch03/client.PutListExample
    - ch03/client.PutListErrorExample2
  • 12. Put method – Atomic compare-and-set
    - A check before the put
    - ch03/client.CheckAndPutExample
    - The check cannot cross rows: it must be on the same row as the put
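The point of checkAndPut is that the check and the write happen as one atomic step on a single row. A hypothetical single-row model, not the HBase API: a synchronized method makes the check and the write indivisible, and an expected value of null means "the column must not exist yet", which matches how the HBase client treats a null check value.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// One row: column name -> value. Both methods lock the row object.
class RowModel {
    private final Map<String, String> columns = new HashMap<>();

    synchronized boolean checkAndPut(String checkColumn, String expected,
                                     String putColumn, String value) {
        if (!Objects.equals(columns.get(checkColumn), expected)) {
            return false; // check failed: nothing is written
        }
        columns.put(putColumn, value);
        return true;
    }

    synchronized String get(String column) {
        return columns.get(column);
    }
}
```

The first "insert if absent" succeeds, a second one fails, and a check against the now-present value can guard a write to another column of the same row.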
  • 13. Get method – Single Gets
    - client.GetExample
    - Result class
      ○ Contains all the matching cells
  • 14. Get method – List of Gets
    - client.GetListExample
    - client.GetListErrorExample
    - client.GetRowOrBeforeExample
      ○ Returns the specified row key if it exists
      ○ Otherwise returns the closest row before it
      ○ Returns null if no such row exists
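The getRowOrBefore() lookup rule maps directly onto a floor lookup in a sorted map. A hypothetical model: the real method is getRowOrBefore(byte[] row, byte[] family) and compares raw bytes, while this sketch uses String keys, whose ordering matches byte order for ASCII row keys, and ignores the family.

```java
import java.util.Map;
import java.util.TreeMap;

class RowOrBefore {
    // Exact row if present, else the greatest row sorting before it, else null.
    static String rowOrBefore(TreeMap<String, String> table, String row) {
        Map.Entry<String, String> entry = table.floorEntry(row);
        return entry == null ? null : entry.getKey();
    }
}
```

With rows "row1" and "row3", asking for "row2" yields "row1", and asking for "row0" yields null.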
  • 15. Delete method – Single Deletes
    - client.DeleteExample
  • 16. Delete method – List of Deletes
    - client.DeleteListExample
    - client.DeleteListErrorExample
  • 17. Delete method – Atomic compare-and-delete
    - client.CheckAndDeleteExample
  • 18. Batch Operations
    - client.BatchExample
    - No client-side write buffer is involved: batch() sends its operations directly to the servers
  • 19. Row Locks (1/3)
    - Two types of lock
      ○ Server-side lock: the server creates a lock implicitly on your behalf, just for the duration of the call
      ○ Client-side lock: clients can also acquire explicit locks and use them across multiple operations on the same row (RowLock class)
    - client.RowLockExample
  • 20. Row Locks (2/3)
    - Avoid using row locks whenever possible
    - Do Gets require a lock?
      ○ No. While a mutation is in progress, all reading clients see the previous state of all columns
  • 21. Row Locks (3/3)
    - When is a RowLock released?
      ○ When the lock is explicitly released
      ○ When the lease on the lock has expired
    - Lease period: server-side configuration property hbase.regionserver.lease.period (default 1 min)
    - Related exception: org.apache.hadoop.hbase.regionserver.LeaseException
  • 22. Scans (1/3)
    - A technique akin to cursors in database systems
      ○ It makes use of the underlying sequential, sorted storage layout that HBase provides
    - Narrowing the scan's scope plays to the strengths of HBase
      ○ Since data is stored per column family, the storage files of unrelated families are not read at all
  • 23. Scans (2/3)
    - Scan(byte[] startRow, byte[] stopRow)
      ○ The range is [startRow, stopRow): startRow is inclusive, stopRow is exclusive
    - Scan addFamily(byte[] family)
    - Scan addColumn(byte[] family, byte[] qualifier)
    - Scan setTimeRange(long minStamp, long maxStamp) throws IOException
    - Scan setTimeStamp(long timestamp)
    - Scan setMaxVersions()
    - Scan setMaxVersions(int maxVersions)
    - Scan setFilter(Filter filter)
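The half-open [startRow, stopRow) range can be reproduced with TreeMap.subMap over sorted row keys. A hypothetical model with String keys; the includeStop flag mimics the effect of InclusiveStopFilter from the dedicated-filters list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ScanRange {
    // startRow is always included; stopRow only when includeStop is true.
    static List<String> scan(TreeMap<String, String> table, String startRow,
                             String stopRow, boolean includeStop) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, includeStop).keySet());
    }
}
```

Scanning row-2 to row-4 returns row-2 and row-3; with includeStop it also returns row-4.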
  • 24. Scans (3/3)
    - ch03/client.ScanExample
    - Scans do not ship all the matching rows to the client in one RPC
      ○ A single call would use up too many resources and take a long time
    - ResultScanner wraps the Result instance for each row in an iterator, much like a JDBC ResultSet
  • 25. Scans – Caching (1/2)
    - Useful when scanning a large number of small rows: several rows are fetched per RPC
    - Table level
      ○ void HTable.setScannerCaching(int scannerCaching)
    - Scan level
      ○ void Scan.setCaching(int caching)
    - Configuration file (hbase-site.xml)
      ○ hbase.client.scanner.caching
      ○ Whether it takes effect depends on whether it is set in the client's or the server's configuration
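The iterator-with-caching idea can be sketched as an Iterator that pulls rows from a "server" in chunks of `caching` rows and counts the fetch calls. ChunkedScanner is a hypothetical model, not ResultScanner; in this model the final call returns an empty chunk and only signals the end of the scan.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ChunkedScanner implements Iterator<String> {
    private final List<String> serverRows; // stand-in for the rows a region server would return
    private final int caching;             // rows transferred per fetch call
    private final ArrayDeque<String> buffer = new ArrayDeque<>();
    private int position = 0;
    private boolean exhausted = false;
    int fetchCalls = 0;

    ChunkedScanner(List<String> serverRows, int caching) {
        this.serverRows = serverRows;
        this.caching = caching;
    }

    // Only goes to the "server" when the local buffer is empty.
    private void fetchIfNeeded() {
        if (!buffer.isEmpty() || exhausted) return;
        fetchCalls++;
        int end = Math.min(position + caching, serverRows.size());
        buffer.addAll(serverRows.subList(position, end));
        position = end;
        if (buffer.isEmpty()) exhausted = true; // empty chunk: no more rows
    }

    @Override
    public boolean hasNext() {
        fetchIfNeeded();
        return !buffer.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffer.poll();
    }
}
```

Iterating 10 rows with caching 5 takes three fetch calls: two full chunks and one empty chunk that ends the scan.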
  • 26. Scans – Caching (2/2)
    - Find a sweet spot between
      ○ A low number of RPCs
      ○ The memory used on the client and server side
    - Scanners use the same lease-based mechanism as RowLock
      ○ Example error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 65094ms passed since the last invocation, timeout is currently set to 60000
      ○ ch03/client.ScanTimeoutExample
  • 27. Scans – Batching
    - Deals with very large rows
      ○ Rows that do not fit into the memory of the client process
    - Batching works on the column level
      ○ void Scan.setBatch(int batch)
    - Example: a row has 17 columns and you set the batch to 5
      ○ You get four Result instances, with 5, 5, 5, and 2 columns
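The 17-columns-with-batch-5 example can be checked with a small hypothetical helper that splits one row's columns into partial results of at most `batch` columns (a batch of 0 or less stands for "no batching").

```java
import java.util.ArrayList;
import java.util.List;

class BatchSplitter {
    // Sizes of the partial Result instances one row is split into.
    static List<Integer> partialResultSizes(int columnsInRow, int batch) {
        List<Integer> sizes = new ArrayList<>();
        int step = batch > 0 ? batch : columnsInRow; // no batching: whole row at once
        for (int done = 0; done < columnsInRow; done += step) {
            sizes.add(Math.min(step, columnsInRow - done));
        }
        return sizes;
    }
}
```

partialResultSizes(17, 5) gives [5, 5, 5, 2], matching the slide.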
  • 28. Scans – Caching & Batching (1/3)
    - ch03/client.ScanCacheBatchExample
    - 10 rows * 20 columns per row = 200 columns
  • 29. Scans – Caching & Batching (2/3)
    - RPCs = (Rows * Cols per Row) / Min(Cols per Row, Batch Size) / Scanner Caching
      ○ With batch size 20 and scanner caching 5: (10 * 20) / Min(20, 20) / 5 = 200 / 20 / 5 = 2
    - Add 1 or 2 requests to open/close the scanner: 2 + 1 = 3 or 2 + 2 = 4 RPCs in total
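The formula translates into a small estimator. This is a sketch following the slide's formula; rounding up with ceiling division is an assumption added here so that rows whose column count is not a multiple of the batch size (like the 17-column example) still count their last partial result.

```java
class ScanRpcEstimate {
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    // Number of Result instances: each row is split into ceil(cols / batch) parts when batching is on.
    static long results(long rows, long colsPerRow, long batch) {
        if (rows <= 0 || colsPerRow <= 0) return 0;
        long colsPerResult = batch > 0 ? Math.min(colsPerRow, batch) : colsPerRow;
        return rows * ceilDiv(colsPerRow, colsPerResult);
    }

    // RPCs that carry results: `caching` results are shipped per call.
    // Open/close requests for the scanner are not included.
    static long dataRpcs(long rows, long colsPerRow, long batch, long caching) {
        return ceilDiv(results(rows, colsPerRow, batch), Math.max(1, caching));
    }
}
```

For the slide's numbers, dataRpcs(10, 20, 20, 5) is 2; adding one request to open the scanner gives the lower total of 3.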
  • 30. Scans – Caching & Batching (3/3)
    - 1 table, 9 rows, each with some columns
    - Caching set to 6, batch set to 3
  • 31. Hands-On – Write your own CRUD code (1/3)
    - In the hbase shell
      ○ Create a table 'MY_SECOND_TABLE'
      ○ With two column families 'FAM_1' and 'FAM_2'
    - In Java code
      ○ Put two values
      ○ Scan the table
      ○ Delete one value
      ○ Get the remaining value
  • 32. Hands-On – Write your own CRUD code (2/3)
    - Environment
      ○ Let project_home = ${git_home}/hbase-training/002/hands-on/${your_name}
      ○ mkdir ${project_home}
      ○ cp -rf ${git_home}/hbase-training/002/projects/training-002 ${project_home}
    - Write your Java code in
      ○ ${project_home}/src/main/java/client/CrudTest.java
  • 33. Hands-On – Write your own CRUD code (3/3)
    - Requirements: after you have completed your code, run these commands in ${project_home}
      ○ Build the jar file: mvn clean package
      ○ Run the jar file you built: sh bin/run.sh > output.txt
      ○ output.txt must show that the code ran successfully and print the HBase data; I will verify this file in git
    - Commit and push to your git repository
  • 34. Client API: Advanced Features
    - Filters
    - Counters
    - Coprocessors
    - HTablePool
    - Connection Handling
  • 35. Filters
    - Get: direct access to data
    - Scan: uses a start/end key
    - Filters
      ○ Add more limiting selectors to the query
      ○ Are applied on the server side
      ○ Can select on column families, column qualifiers, timestamps or time ranges, and version numbers
  • 36. Filters – How Filters work
  • 37. Filters – Hierarchy
    - Various Filter implementations are available for your needs
    - You can also write your own implementation
  • 38. Filters – Comparison Filters
    - They take a comparison operator and a comparator instance
      ○ RowFilter: filters based on the row key
      ○ FamilyFilter: filters based on the column family
      ○ QualifierFilter: filters based on the column qualifier
      ○ ValueFilter: filters based on the column value
      ○ DependentColumnFilter: uses the timestamp of a reference column and includes all other columns that have the same timestamp
  • 41. Filters – CompareFilter example
    - ch04/filters.RowFilterExample
  • 42. Filters – Dedicated Filters
    - Mainly used in scans; they basically filter out entire rows
      ○ SingleColumnValueFilter: filters rows based on the value of a single column
      ○ SingleColumnValueExcludeFilter: works like SingleColumnValueFilter, but omits the reference column from the result
      ○ PrefixFilter: returns all rows whose row key matches the given prefix
      ○ PageFilter: controls how many rows per page are returned
      ○ KeyOnlyFilter: returns only the key of each KeyValue, omitting the actual data
      ○ FirstKeyOnlyFilter: returns only the key of the first column in each row and skips the rest
  • 43. Filters – Dedicated Filters
      ○ InclusiveStopFilter: changes the scan range from [startRow, stopRow) to [startRow, stopRow]
      ○ TimestampsFilter: returns only cells whose timestamp (version) is in the specified list of timestamps
      ○ ColumnCountGetFilter: returns only the first N columns of a row (mainly for HBase testing)
      ○ ColumnPaginationFilter: similar to PageFilter, but pages through the columns of a row
      ○ ColumnPrefixFilter: analogous to PrefixFilter, which filters on row key prefixes, but does the same for columns
      ○ RandomRowFilter: includes random rows in the result
  • 44. Filters – Decorating Filters
      ○ SkipFilter: wraps a given filter and extends it to exclude an entire row when the wrapped filter hints that a KeyValue should be skipped
      ○ WhileMatchFilter: aborts the entire scan as soon as the wrapped filter filters out a piece of information
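The difference between a plain filter and WhileMatchFilter can be modelled with Java streams: a plain filter skips rejected rows and keeps going, while the "while match" variant stops at the first rejected row. WhileMatchModel is hypothetical (not the HBase filter classes) and uses Stream.takeWhile, which requires Java 9 or later.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class WhileMatchModel {
    // Plain filter: every row the predicate accepts is returned.
    static List<String> plain(List<String> rows, Predicate<String> keep) {
        return rows.stream().filter(keep).collect(Collectors.toList());
    }

    // WhileMatchFilter-style: the scan ends at the first rejected row.
    static List<String> whileMatch(List<String> rows, Predicate<String> keep) {
        return rows.stream().takeWhile(keep).collect(Collectors.toList());
    }
}
```

With rows row-1 to row-4 and a filter rejecting row-3, the plain filter returns row-1, row-2 and row-4, whereas the while-match variant returns only row-1 and row-2.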
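  • 45. Filters - FilterList
    - In practice, you may want to apply more than one filter to reduce the data returned to your client application
    - Operators
      ○ MUST_PASS_ALL: a value is included only if all filters include it
      ○ MUST_PASS_ONE: a value is included if at least one filter includes it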
  • 46. Filters - FilterList
    - ch04/filters.FilterListExample
      ○ First scan filters
      ○ Second scan filters
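The two FilterList operators behave like AND and OR over the member filters. A hypothetical model using java.util.function.Predicate rather than HBase Filter objects; FilterOp mirrors the operator names only.

```java
import java.util.List;
import java.util.function.Predicate;

// Stand-in for the two FilterList operators.
enum FilterOp { MUST_PASS_ALL, MUST_PASS_ONE }

class FilterListModel {
    // MUST_PASS_ALL: kept only if every filter accepts it (logical AND).
    // MUST_PASS_ONE: kept if at least one filter accepts it (logical OR).
    static <T> Predicate<T> combine(FilterOp op, List<Predicate<T>> filters) {
        Predicate<T> combined;
        if (op == FilterOp.MUST_PASS_ALL) {
            combined = t -> true;
            for (Predicate<T> f : filters) combined = combined.and(f);
        } else {
            combined = t -> false;
            for (Predicate<T> f : filters) combined = combined.or(f);
        }
        return combined;
    }
}
```

With a "starts with row-" filter and an "ends with 5" filter, MUST_PASS_ALL accepts only "row-5", while MUST_PASS_ONE also accepts "row-4" and "key-5".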
  • 47. Filters – Custom Filters
    - If none of the existing filters fits your needs, you can write your own
    - Base your implementation on
      ○ Filter
      ○ FilterBase
  • 48. Filters – Custom Filters
    - ch04/filters.CustomFilter
    - ch04/filters.CustomFilterExample
    - Custom filter deployment (costly)
      1. Build the jar file
      2. Put the jar file on every region server
      3. Add the jar file path to the classpath in hbase-env.sh (HBASE_CLASSPATH)
      4. Restart all HBase daemons
  • 51. Counters
    - Many applications collect statistics
      ○ Such as clicks or views in online advertising
      ○ This data used to be collected in log files that were subsequently analyzed
    - Counters are all you need!
  • 52. Counters - shell
    - Create a table
      ○ create 'counters', 'daily', 'weekly', 'monthly'
    - Increment a counter (this also creates it)
      ○ incr 'counters', '20110101', 'daily:hits', 1
      ○ Let's do it twice
    - Get your counter
      ○ get_counter 'counters', '20110101', 'daily:hits'
  • 53. Counters - shell
    - You can also fine-tune your counter
      ○ incr 'counters', '20110101', 'daily:hits', 0
      ○ incr 'counters', '20110101', 'daily:hits', -1
    - Do not use put instead of incr, even though a counter is also a value
      ○ Data type issue: a counter is stored as a long, whereas put from the shell stores a String
    - Use get_counter, not get
      ○ Its output is more human-readable
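The long-versus-String issue is easy to see at the byte level: a counter cell holds an 8-byte big-endian long, while putting the text "1" stores a single byte that cannot be read back as a long. CounterBytes is a hypothetical helper; ByteBuffer uses big-endian order by default, which matches how HBase encodes longs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CounterBytes {
    // Counter representation: 8 bytes, big-endian.
    static byte[] encodeCounter(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Only an 8-byte cell can be interpreted as a counter.
    static long decodeCounter(byte[] cell) {
        if (cell.length != Long.BYTES) {
            throw new IllegalArgumentException("not a counter cell: " + cell.length + " bytes");
        }
        return ByteBuffer.wrap(cell).getLong();
    }

    // What a string-valued put stores instead.
    static byte[] encodeText(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }
}
```

encodeCounter(1) yields seven zero bytes followed by 0x01, which is also why a plain get shows the raw bytes rather than a readable number.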
  • 54. Counters - API
    - Single counters
      ○ ch04/client.IncrementSingleExample
    - Multiple counters
      ○ ch04/client.IncrementMultipleExample
  • 55. Coprocessors
    - With the coprocessor feature in HBase, you can move part of the computation to where the data lives
    - Like a small MapReduce framework, which can distribute the work across the entire cluster
  • 56. Coprocessors
    - Two types
      ○ Observer: trigger-like
      ○ Endpoint: stored-procedure-like
    - Use cases
      ○ Aggregate functions, e.g. sum(), avg()
      ○ Integrity checks, e.g. when some data is put, other data must exist
      ○ Authentication, authorization, and auditing, based on coprocessors since HBase 0.92
  • 57. Coprocessors – Coprocessor Class
    - Priorities are defined in the Coprocessor.Priority enumeration
  • 58. Coprocessors – Coprocessor Class
    - States are defined in the Coprocessor.State enumeration
  • 61. Coprocessor – Loading from Configuration
    - Add the coprocessor classes to hbase-site.xml
      ○ Region, master, and WAL observers use separate properties (hbase.coprocessor.region.classes, hbase.coprocessor.master.classes, hbase.coprocessor.wal.classes)
      ○ The order of the fully qualified class names in the value determines the execution order
    - Then deploy the jar the same way as custom filters
    - Applies to every table and region
  • 62. Coprocessor – Loading from table descriptor
    - Use HTableDescriptor.setValue(String key, String value)
    - Key spec: COPROCESSOR[$<number>]
      ○ Ex. COPROCESSOR$1
    - Value spec: <jarFilePath>|<classFullyQualifiedName>|<priority>
      ○ Ex. "hdfs://localhost:8020/users/leon/test.jar|coprocessor.Test|SYSTEM"
      ○ jarFilePath can use any protocol supported by the Hadoop FileSystem class
    - ch04/coprocessor.LoadWithTableDescriptorExample
    - Applies only to the regions of the specified table
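Because the value is a pipe-separated string, a typo (for example a missing field) is easy to make. A hypothetical helper, not part of the HBase API, that builds the key and splits a value into its three parts, for instance to validate it before calling HTableDescriptor.setValue().

```java
class CoprocessorSpec {
    final String jarPath;
    final String className;
    final String priority;

    private CoprocessorSpec(String jarPath, String className, String priority) {
        this.jarPath = jarPath;
        this.className = className;
        this.priority = priority;
    }

    // Splits "<jarFilePath>|<classFullyQualifiedName>|<priority>".
    static CoprocessorSpec parse(String value) {
        String[] parts = value.split("\\|", -1); // '|' is a regex metacharacter, so escape it
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected jar|class|priority but got: " + value);
        }
        return new CoprocessorSpec(parts[0].trim(), parts[1].trim(), parts[2].trim());
    }

    // Builds the key in the COPROCESSOR$<number> form.
    static String key(int number) {
        return "COPROCESSOR$" + number;
    }
}
```

Parsing the slide's example yields the HDFS jar path, the class coprocessor.Test, and the priority SYSTEM.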
  • 63. Coprocessor - Observer
    - Callback functions (hooks) are executed when certain events occur
    - Known as triggers in a DBMS
      ○ RegionObserver: observes events bound to the regions of a table
      ○ MasterObserver: observes events bound to administrative or DDL-type operations (cluster-wide events)
      ○ WALObserver: observes events bound to write-ahead log (WAL) processing
  • 64. Coprocessor – Observer main classes
  • 65. Coprocessor – RegionObserver and Region Life Cycle
  • 66. Coprocessor – RegionObserver Classes
    - Handling region life cycle events
    - Handling client API events
    - ch04/coprocessor.RegionObserverExample
  • 67. Coprocessor – MasterObserver Classes
    - ch04/coprocessor.MasterObserverExample
  • 68. Coprocessor - Endpoint
    - User code can be deployed to the servers hosting the data, for example to perform server-local computations
    - Known as stored procedures in a DBMS
    - Can be combined with observer implementations to interact directly with the server-side state
  • 69. Coprocessor – Endpoint main Classes
    - ch04/coprocessor.RowCountProtocol
    - ch04/coprocessor.RowCountEndpoint
    - ch04/coprocessor.EndpointExample
    - ch04/coprocessor.EndpointProxyExample
  • 70. Coprocessor – Single Region vs. Range of Regions
  • 71. HTablePool
    - Creating an HTable instance takes a few seconds to complete
    - That is too slow for a highly contended environment with thousands of requests per second
    - Keeping one HTable instance for multiple uses helps, but HTable is not thread-safe
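Pooling resolves that tension: expensive instances are reused, yet each is used by only one caller at a time. A minimal generic sketch in the spirit of HTablePool, not its API; SimplePool and its method names are hypothetical, and at most maxIdle idle instances are kept for reuse.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

class SimplePool<T> {
    private final ArrayDeque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory; // builds a new instance when none is idle
    private final int maxIdle;
    int created = 0;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Reuse an idle instance if there is one, otherwise create a new one.
    synchronized T borrow() {
        T instance = idle.poll();
        if (instance != null) return instance;
        created++;
        return factory.get();
    }

    // Return an instance after use; extras beyond maxIdle are simply dropped.
    synchronized void giveBack(T instance) {
        if (idle.size() < maxIdle) idle.push(instance);
    }
}
```

Borrowing, returning, and borrowing again hands back the same object, so the expensive construction happens only once.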
  • 73. Connection Handling
    - Use the shared connection whenever you can
  • 75. Connection Handling – Features
    - Shares ZooKeeper connections
      ○ Used for the initial lookup of where user table regions are located
    - Caches common resources
      ○ Region locations are cached on the client side after the first round-trips to ZooKeeper and other servers
      ○ When a lookup fails (e.g. because a region was split), a built-in retry mechanism refreshes the stale cache information
    - Do not forget to release your shared connection
      ○ HTable.close()
      ○ HTablePool.closeTablePool(…)
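The caching and retry behaviour can be sketched as a small location cache. This is a hypothetical model, not the HBase connection code: the cache is keyed by row for simplicity (the real client caches region locations), the catalog function stands in for ZooKeeper and the catalog tables, and a failed call evicts the stale entry and retries once.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class LocationCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> catalog; // row -> server currently hosting it
    int catalogLookups = 0;

    LocationCache(Function<String, String> catalog) {
        this.catalog = catalog;
    }

    // First lookup goes to the catalog; later lookups are answered from the cache.
    String locate(String row) {
        return cache.computeIfAbsent(row, r -> {
            catalogLookups++;
            return catalog.apply(r);
        });
    }

    // Runs a call against the cached server; on failure, drops the stale entry and retries once.
    <R> R withRetry(String row, Function<String, R> call) {
        try {
            return call.apply(locate(row));
        } catch (RuntimeException staleLocation) {
            cache.remove(row);
            return call.apply(locate(row));
        }
    }
}
```

After the region moves, the first attempt hits the old server and fails, the entry is refreshed from the catalog, and the retry succeeds with one extra lookup.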