SlideShare uma empresa Scribd logo
1 de 20
HBase Coprocessor Intro.

   Anty Rao, Schubert Zhang
         Aug. 29 2012
Motivation
• Distributed and parallel computation over data stored within HBase/Bigtable.

• Architecture: {HBase + MapReduce} vs. {HBase with Coprocessor}
    – {Loosely coupled} vs. {Built in}
    – E.g., simple additive or aggregating operations like summing, counting, and the like –
      pushing the computation down to the servers where it can operate on the data
      directly without communication overheads can give a dramatic performance
      improvement over HBase’s already good scanning performance.

• To be a framework for both flexible and generic extension, and of distributed
  computation directly within the HBase server processes.
    – Arbitrary code can run at each tablet in each HBase server.
    – Provides a very flexible model for building distributed services.
    – Automatic scaling, load balancing, request routing for applications.
Motivation (cont.)
• To be a Data-Driven distributed and parallel service platform.
    – Distributed parallel computation framework.
    – Distributed application service platform.

• High-level call interface for clients
    – Calls are addressed to rows or ranges of rows and the coprocessor client library
      resolves them to actual locations;
    – Calls across multiple rows are automatically split into multiple parallelized RPC.

• Origin
    – Inspired by Google’s Bigtable Coprocessors.
    – Jeff Dean gave a talk at LADIS’09
        • http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf, page 66-67
HBase vs. Google Bigtable
• It is a framework that provides a library and runtime
  environment for executing user code within the HBase
  region server and master processes.

• Google coprocessors in contrast run co-located with the
  tablet server but outside of its address space.
  – https://issues.apache.org/jira/browse/HBASE-4047
Google’s Bigtable Coprocessors
Overview of HBase Coprocessor
• Tow scopes
    – System : loaded globally on all tables and regions.
    – Per-table: loaded on all regions for a table.

• Two types
    – Observers
        • Like triggers in conventional databases.
        • The idea behind observers is that we can insert user code by overriding upcall methods provided by the
          coprocessor framework. The callback functions are executed from core HBase code when certain events occur.
    – Endpoints
        • Dynamic PRC endpoints that resemble stored procedures.
        • One can invoke an endpoint at any time from the client. The endpoint implementation will then be executed
          remotely at the target region or regions, and results from those executions will be returned to the client.

• Difference of the tow types
    – Only endpoints return result to client.
Observers
• Currently, three observers interfaces provided
   – RegionObserver
       • Provides hooks for data manipulation events, Get, Put, Delete, Scan, and so on. There is
         an instance of a RegionObserver coprocessor for every table region and the scope of
         the observations they can make is constrained to that region
   – WALObserver
       • Provides hooks for write-ahead log (WAL) related operations. This is a way to observe
         or intercept WAL writing and reconstruction events. A WALObserver runs in the context
         of WAL processing. There is one such context per region server.
   – MasterObserver
       • Provides hooks for DDL-type operation, i.e., create, delete, modify table, etc. The
         MasterObserver runs within the context of the HBase master.

• Multiple Observers are chained to execute sequentially by order of
  assigned priorities.
Observers: Example
Observers: Example Code
package org.apache.hadoop.hbase.coprocessor;

import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;

// Sample access-control coprocessor. It utilizes RegionObserver
// and intercept preXXX() method to check user privilege for the given table
// and column family.
public class AccessControlCoprocessor extends BaseRegionObserver {
  @Override
  public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c,
final Get get, final List<KeyValue> result) throws IOException
    throws IOException {

        // check permissions..
        if (!permissionGranted()) {
           throw new AccessDeniedException("User is not allowed to access.");
        }
    }

    // override prePut(), preDelete(), etc.
}
Endpoint
• Resembling stored procedures
• Invoke an endpoint at any time from the client.
• The endpoint implementation will then be executed
  remotely at the target region or regions.
• Result from those executions will be returned to the client.

• Code implementation
   – Endpoint is an interface for dynamic RPC extension.
Endpoints:
   How to implement a custom Coprocessor?
• Have a new protocol interface which extends CoprocessorProtocol.
• Implement the Endpoint interface and the new protocol interface . The
  implementation will be loaded into and executed from the region
  context.
   – Extend the abstract class BaseEndpointCoprocessor. This convenience class
     hide some internal details that the implementer need not necessary be
     concerned about, such as coprocessor class loading.
• On the client side, the Endpoints can be invoked by two new HBase
  client APIs:
   – Executing against a single region:
       • HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row)
   – Executing against a range of regions:
       • HTableInterface.coprocessorExec(Class<T> protocol, byte[] startKey, byte[]
         endKey, Batch.Call<T,R> callable)
Endpoints: Example
                                Client Code
                         new Batch.Call (on all regions)                                   Region Server 1
                                                                                                                            Endpoint
          Batch.Call<ColumnAggregationProtocol, Long>()                                      tableA, , 12345678
          {                                                                                                         ColumnAggregationProtocol




                                                                                      .)
                                                                                     (..
              public Long call(ColumnAggregationProtocol instance)




                                                                                   or
              throws IOException




                                                                                 ss
                                                                                                                            Endpoint




                                                                                  e
                                                                               oc
              {                                                                            tableA, bbbb, 12345678




                                                                            pr
                return instance.sum(FAMILY, QUALIFIER);




                                                                          Co
                                                                                                                    ColumnAggregationProtocol




                                                                      ec
              }




                                                                     ex
            }



                                     HTable                                                Region Server 2
                                                                                                                            Endpoint
          Map<byte[], Long> sumResults =                                                   tableA, cccc, 12345678
          table.coprocessorExec(ColumnAggregationProtocol.class,                                                    ColumnAggregationProtocol
          startRow, endRow)
                                                                                                                            Endpoint
                                                                                           tableA, dddd, 12345678
                                                                                                                    ColumnAggregationProtocol
                                  Batch Results

                         Map<byte[], Long> sumResults




•   Note that the HBase client has the responsibility for dispatching parallel endpoint invocations to the
    target regions, and for collecting the returned results to present to the application code.
•   Like a lightweight MapReduce job: The “map” is the endpoint execution performed in the region
    server on every target region, and the “reduce” is the final aggregation at the client.
•   The distributed systems programming details behind a clean API.
Step-1: Define protocol interface
/**
 * A sample protocol for performing aggregation at regions.
 */
public interface ColumnAggregationProtocol extends CoprocessorProtocol
{
   /**
    * Perform aggregation for a given column at the region. The aggregation
    * will include all the rows inside the region. It can be extended to allow
    * passing start and end rows for a fine-grained aggregation.
    *
    * @param family
    *        family
    * @param qualifier
    *        qualifier
    * @return Aggregation of the column.
    * @throws exception.
    */
   public long sum(byte[] family, byte[] qualifier) throws IOException;
}
Step-2: Implement endpoint and the interface
                                                            try
public class ColumnAggregationEndpoint extends                    {
BaseEndpointCoprocessor                                            List<KeyValue> curVals = new ArrayList<KeyValue>();
    implements ColumnAggregationProtocol                           boolean done = false;
{                                                                  do
  @Override                                                        {
  public long sum(byte[] family, byte[] qualifier) throws             curVals.clear();
IOException                                                           done = scanner.next(curVals);
  {                                                                   KeyValue kv = curVals.get(0);
    // aggregate at each region                                       sumResult +=
    Scan scan = new Scan();                                 Bytes.toLong(kv.getBuffer(), kv.getValueOffset());
    scan.addColumn(family, qualifier);                             } while (done);
    long sumResult = 0;                                         }
                                                                finally
    InternalScanner scanner =                                   {
         ((RegionCoprocessorEnvironment)                           scanner.close();
getEnvironment()).getRegion()                                   }
             .getScanner(scan);                                 return sumResult;
                                                              }
                                                            }
Step-3 Deployment
• Two chooses
  – Load from configuration (hbase-site.xml, restart HBase)
  – Load from table attribute (disable and enable table)
     • From shell
Step-4: Invoking
                   HTable table = new HTable(util.getConfiguration(), TEST_TABLE);

• On client            Map<byte[], Long> results;

                       // scan: for all regions
  side, invoking       results =
                            table.coprocessorExec(ColumnAggregationProtocol.class,

  the endpoint                   ROWS[rowSeperator1 - 1], ROWS[rowSeperator2 + 1],
                                 new Batch.Call<ColumnAggregationProtocol, Long>()
                                 {
                                     public Long call(ColumnAggregationProtocol instance)
                                          throws IOException
                                     {
                                       return instance
                                            .sum(TEST_FAMILY, TEST_QUALIFIER);
                                     }
                                 });
                       long sumResult = 0;
                       long expectedResult = 0;
                       for (Map.Entry<byte[], Long> e : results.entrySet())
                       {
                          sumResult += e.getValue();
                       }
Server side execution
• Region Server             public interface HRegionInterface
  provide                   extends VersionedProtocol,
  environment to            Stoppable,Abortable
  execute custom
  coprocessor in            {
  region context.           …
• Exec                       ExecResult execCoprocessor(byte[]
   – Custom protocol        regionName, Exec call) throws
     name                   IOException;
   – Method name
                            …
   – Method
     parameters             }
Coprocessor Manangement
• Build your own Coprocessor
   – Write server-side coprocessor code like above example, compiled and
     packaged as a jar file.
      • CoprocessorProtocol (e.g. ColumnAggregationProtocol)
      • Endpoint implementation (e.g. ColumnAggregationEndpoint)
• Coprocessor Deployment
   – Load from Configuration (hbase-site.xml, restart HBase)
      • The jar file must be in classpath of HBase servers.
      • Global for all regions of all tables (system coprocessors).
   – Load from table attribute (from shell)
      • per table basis
      • The jar file should be put into HDFS or HBase servers’ classpath firstly, and set in
        the table attribute.
Future Work based on Coprocessors
•   Parallel Computation Framework (our first goal!)                    • Others
     – Higher level of abstraction
     – E.g. MapReduce APIs similar.                                        – External Coprocessor Host
     – Integration and implementation Dremel and/or dremel                   (HBASE-4047)
       computation model into HBase.
                                                                               • separate processes

•   Distributed application service platform (our second                   – Code Weaving (HBASE-2058)
    goal !?)                                                                   • protect against malicious actions
     – Higher level of abstraction                                               or faults accidentally introduced
     – Data-driven distributed application architecture.                         by a coprocessor.
     – Avoid building similar distributed architecture repeatedly.
                                                                           – …

•   HBase system enhancements
     – HBase internal measurements and statistics for administration.
•   Support application like percolator
     – Observes and notifications.
Reference
• https://blogs.apache.org/hbase/entry/coprocessor_intr
  oduction

Mais conteúdo relacionado

Mais procurados

HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon
 
Tuning Java for Big Data
Tuning Java for Big DataTuning Java for Big Data
Tuning Java for Big DataScott Seighman
 
Nov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeNov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeO'Reilly Media
 
Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...
Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...
Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...Spark Summit
 
Hadoop world g1_gc_forh_base_v4
Hadoop world g1_gc_forh_base_v4Hadoop world g1_gc_forh_base_v4
Hadoop world g1_gc_forh_base_v4YanpingWang
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseCloudera, Inc.
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB ProjectRakuten Group, Inc.
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseenissoz
 
Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance SmackdownDataWorks Summit
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Masao Fujii
 
HBase tales from the trenches
HBase tales from the trenchesHBase tales from the trenches
HBase tales from the trencheswchevreuil
 
Background Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitBackground Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitRedis Labs
 
PostgreSQL Extensions: A deeper look
PostgreSQL Extensions:  A deeper lookPostgreSQL Extensions:  A deeper look
PostgreSQL Extensions: A deeper lookJignesh Shah
 
003 admin featuresandclients
003 admin featuresandclients003 admin featuresandclients
003 admin featuresandclientsScott Miao
 
PGPool-II Load testing
PGPool-II Load testingPGPool-II Load testing
PGPool-II Load testingEDB
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with DockerFabio Fumarola
 

Mais procurados (20)

HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
Tuning Java for Big Data
Tuning Java for Big DataTuning Java for Big Data
Tuning Java for Big Data
 
Nov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars georgeNov. 4, 2011 o reilly webcast-hbase- lars george
Nov. 4, 2011 o reilly webcast-hbase- lars george
 
Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...
Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...
Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing-(Eric Kacz...
 
Hadoop world g1_gc_forh_base_v4
Hadoop world g1_gc_forh_base_v4Hadoop world g1_gc_forh_base_v4
Hadoop world g1_gc_forh_base_v4
 
HBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBaseHBaseCon 2013: Scalable Network Designs for Apache HBase
HBaseCon 2013: Scalable Network Designs for Apache HBase
 
HBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase UpdateHBaseCon 2015: OpenTSDB and AsyncHBase Update
HBaseCon 2015: OpenTSDB and AsyncHBase Update
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
 
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
[RakutenTechConf2014] [D-4] The next step of LeoFS and Introducing NewDB Project
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Hadoop Query Performance Smackdown
Hadoop Query Performance SmackdownHadoop Query Performance Smackdown
Hadoop Query Performance Smackdown
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
 
HBase tales from the trenches
HBase tales from the trenchesHBase tales from the trenches
HBase tales from the trenches
 
Background Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbitBackground Tasks in Node - Evan Tahler, TaskRabbit
Background Tasks in Node - Evan Tahler, TaskRabbit
 
PostgreSQL Extensions: A deeper look
PostgreSQL Extensions:  A deeper lookPostgreSQL Extensions:  A deeper look
PostgreSQL Extensions: A deeper look
 
PostgreSQL Terminology
PostgreSQL TerminologyPostgreSQL Terminology
PostgreSQL Terminology
 
003 admin featuresandclients
003 admin featuresandclients003 admin featuresandclients
003 admin featuresandclients
 
PGPool-II Load testing
PGPool-II Load testingPGPool-II Load testing
PGPool-II Load testing
 
8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker8a. How To Setup HBase with Docker
8a. How To Setup HBase with Docker
 

Semelhante a HBase Coprocessor Introduction

Hadoop map reduce in operation
Hadoop map reduce in operationHadoop map reduce in operation
Hadoop map reduce in operationSubhas Kumar Ghosh
 
weblogic perfomence tuning
weblogic perfomence tuningweblogic perfomence tuning
weblogic perfomence tuningprathap kumar
 
Unleashing your Kafka Streams Application Metrics!
Unleashing your Kafka Streams Application Metrics!Unleashing your Kafka Streams Application Metrics!
Unleashing your Kafka Streams Application Metrics!HostedbyConfluent
 
002 hbase clientapi
002 hbase clientapi002 hbase clientapi
002 hbase clientapiScott Miao
 
Java 어플리케이션 성능튜닝 Part1
Java 어플리케이션 성능튜닝 Part1Java 어플리케이션 성능튜닝 Part1
Java 어플리케이션 성능튜닝 Part1상욱 송
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)Nick Dimiduk
 
Nextcon samza preso july - final
Nextcon samza preso   july - finalNextcon samza preso   july - final
Nextcon samza preso july - finalYi Pan
 
Finagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at PinterestFinagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at PinterestPavan Chitumalla
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream APIApache Apex
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsHostedbyConfluent
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]Igor Lozynskyi
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkEvan Chan
 
Cassandra 2.1 boot camp, Overview
Cassandra 2.1 boot camp, OverviewCassandra 2.1 boot camp, Overview
Cassandra 2.1 boot camp, OverviewJoshua McKenzie
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayQuantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayPhil Estes
 
Lightweight Grids With Terracotta
Lightweight Grids With TerracottaLightweight Grids With Terracotta
Lightweight Grids With TerracottaPT.JUG
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit
 

Semelhante a HBase Coprocessor Introduction (20)

Hadoop map reduce in operation
Hadoop map reduce in operationHadoop map reduce in operation
Hadoop map reduce in operation
 
weblogic perfomence tuning
weblogic perfomence tuningweblogic perfomence tuning
weblogic perfomence tuning
 
Unleashing your Kafka Streams Application Metrics!
Unleashing your Kafka Streams Application Metrics!Unleashing your Kafka Streams Application Metrics!
Unleashing your Kafka Streams Application Metrics!
 
002 hbase clientapi
002 hbase clientapi002 hbase clientapi
002 hbase clientapi
 
Java 어플리케이션 성능튜닝 Part1
Java 어플리케이션 성능튜닝 Part1Java 어플리케이션 성능튜닝 Part1
Java 어플리케이션 성능튜닝 Part1
 
HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)HBase Client APIs (for webapps?)
HBase Client APIs (for webapps?)
 
Nextcon samza preso july - final
Nextcon samza preso   july - finalNextcon samza preso   july - final
Nextcon samza preso july - final
 
Java 8 streams
Java 8 streams Java 8 streams
Java 8 streams
 
Finagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at PinterestFinagle and Java Service Framework at Pinterest
Finagle and Java Service Framework at Pinterest
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream API
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]RxJava applied [JavaDay Kyiv 2016]
RxJava applied [JavaDay Kyiv 2016]
 
Building a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and SparkBuilding a High-Performance Database with Scala, Akka, and Spark
Building a High-Performance Database with Scala, Akka, and Spark
 
Data Pipeline at Tapad
Data Pipeline at TapadData Pipeline at Tapad
Data Pipeline at Tapad
 
Cassandra 2.1 boot camp, Overview
Cassandra 2.1 boot camp, OverviewCassandra 2.1 boot camp, Overview
Cassandra 2.1 boot camp, Overview
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container DayQuantifying Container Runtime Performance: OSCON 2017 Open Container Day
Quantifying Container Runtime Performance: OSCON 2017 Open Container Day
 
Lightweight Grids With Terracotta
Lightweight Grids With TerracottaLightweight Grids With Terracotta
Lightweight Grids With Terracotta
 
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
Accumulo Summit 2016: Introducing Accumulo Collections: A Practical Accumulo ...
 

Mais de Schubert Zhang

Engineering Culture and Infrastructure
Engineering Culture and InfrastructureEngineering Culture and Infrastructure
Engineering Culture and InfrastructureSchubert Zhang
 
Simple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluationSimple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluationSchubert Zhang
 
Scrum Agile Development
Scrum Agile DevelopmentScrum Agile Development
Scrum Agile DevelopmentSchubert Zhang
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processingSchubert Zhang
 
Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算Schubert Zhang
 
Big Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223aBig Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223aSchubert Zhang
 
Hadoop大数据实践经验
Hadoop大数据实践经验Hadoop大数据实践经验
Hadoop大数据实践经验Schubert Zhang
 
Wild Thinking of BigdataBase
Wild Thinking of BigdataBaseWild Thinking of BigdataBase
Wild Thinking of BigdataBaseSchubert Zhang
 
RockStor - A Cloud Object System based on Hadoop
RockStor -  A Cloud Object System based on HadoopRockStor -  A Cloud Object System based on Hadoop
RockStor - A Cloud Object System based on HadoopSchubert Zhang
 
Hadoop compress-stream
Hadoop compress-streamHadoop compress-stream
Hadoop compress-streamSchubert Zhang
 
Ganglia轻度使用指南
Ganglia轻度使用指南Ganglia轻度使用指南
Ganglia轻度使用指南Schubert Zhang
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionSchubert Zhang
 
Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Schubert Zhang
 

Mais de Schubert Zhang (20)

Blockchain in Action
Blockchain in ActionBlockchain in Action
Blockchain in Action
 
科普区块链
科普区块链科普区块链
科普区块链
 
Engineering Culture and Infrastructure
Engineering Culture and InfrastructureEngineering Culture and Infrastructure
Engineering Culture and Infrastructure
 
Simple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluationSimple practices in performance monitoring and evaluation
Simple practices in performance monitoring and evaluation
 
Scrum Agile Development
Scrum Agile DevelopmentScrum Agile Development
Scrum Agile Development
 
Career Advice
Career AdviceCareer Advice
Career Advice
 
Engineering practices in big data storage and processing
Engineering practices in big data storage and processingEngineering practices in big data storage and processing
Engineering practices in big data storage and processing
 
HiveServer2
HiveServer2HiveServer2
HiveServer2
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
 
Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算Bigtable数据模型解决CDR清单存储问题的资源估算
Bigtable数据模型解决CDR清单存储问题的资源估算
 
Big Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223aBig Data Engineering Team Meeting 20120223a
Big Data Engineering Team Meeting 20120223a
 
Hadoop大数据实践经验
Hadoop大数据实践经验Hadoop大数据实践经验
Hadoop大数据实践经验
 
Wild Thinking of BigdataBase
Wild Thinking of BigdataBaseWild Thinking of BigdataBase
Wild Thinking of BigdataBase
 
RockStor - A Cloud Object System based on Hadoop
RockStor -  A Cloud Object System based on HadoopRockStor -  A Cloud Object System based on Hadoop
RockStor - A Cloud Object System based on Hadoop
 
Fans of running gump
Fans of running gumpFans of running gump
Fans of running gump
 
Hadoop compress-stream
Hadoop compress-streamHadoop compress-stream
Hadoop compress-stream
 
Ganglia轻度使用指南
Ganglia轻度使用指南Ganglia轻度使用指南
Ganglia轻度使用指南
 
DaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solutionDaStor/Cassandra report for CDR solution
DaStor/Cassandra report for CDR solution
 
Big data and cloud
Big data and cloudBig data and cloud
Big data and cloud
 
Learning from google megastore (Part-1)
Learning from google megastore (Part-1)Learning from google megastore (Part-1)
Learning from google megastore (Part-1)
 

Último

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Último (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

HBase Coprocessor Introduction

  • 1. HBase Coprocessor Intro. Anty Rao, Schubert Zhang Aug. 29 2012
  • 2. Motivation • Distributed and parallel computation over data stored within HBase/Bigtable. • Architecture: {HBase + MapReduce} vs. {HBase with Coprocessor} – {Loosely coupled} vs. {Built in} – E.g., simple additive or aggregating operations like summing, counting, and the like – pushing the computation down to the servers where it can operate on the data directly without communication overheads can give a dramatic performance improvement over HBase’s already good scanning performance. • To be a framework for both flexible and generic extension, and of distributed computation directly within the HBase server processes. – Arbitrary code can run at each tablet in each HBase server. – Provides a very flexible model for building distributed services. – Automatic scaling, load balancing, request routing for applications.
  • 3. Motivation (cont.) • To be a Data-Driven distributed and parallel service platform. – Distributed parallel computation framework. – Distributed application service platform. • High-level call interface for clients – Calls are addressed to rows or ranges of rows and the coprocessor client library resolves them to actual locations; – Calls across multiple rows are automatically split into multiple parallelized RPC. • Origin – Inspired by Google’s Bigtable Coprocessors. – Jeff Dean gave a talk at LADIS’09 • http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf, page 66-67
  • 4. HBase vs. Google Bigtable • It is a framework that provides a library and runtime environment for executing user code within the HBase region server and master processes. • Google coprocessors in contrast run co-located with the tablet server but outside of its address space. – https://issues.apache.org/jira/browse/HBASE-4047
  • 6. Overview of HBase Coprocessor • Tow scopes – System : loaded globally on all tables and regions. – Per-table: loaded on all regions for a table. • Two types – Observers • Like triggers in conventional databases. • The idea behind observers is that we can insert user code by overriding upcall methods provided by the coprocessor framework. The callback functions are executed from core HBase code when certain events occur. – Endpoints • Dynamic PRC endpoints that resemble stored procedures. • One can invoke an endpoint at any time from the client. The endpoint implementation will then be executed remotely at the target region or regions, and results from those executions will be returned to the client. • Difference of the tow types – Only endpoints return result to client.
  • 7. Observers • Currently, three observers interfaces provided – RegionObserver • Provides hooks for data manipulation events, Get, Put, Delete, Scan, and so on. There is an instance of a RegionObserver coprocessor for every table region and the scope of the observations they can make is constrained to that region – WALObserver • Provides hooks for write-ahead log (WAL) related operations. This is a way to observe or intercept WAL writing and reconstruction events. A WALObserver runs in the context of WAL processing. There is one such context per region server. – MasterObserver • Provides hooks for DDL-type operation, i.e., create, delete, modify table, etc. The MasterObserver runs within the context of the HBase master. • Multiple Observers are chained to execute sequentially by order of assigned priorities.
  • 9. Observers: Example Code package org.apache.hadoop.hbase.coprocessor; import java.util.List; import org.apache.hadoop.hbase.KeyValue; import org.apache.hadoop.hbase.client.Get; // Sample access-control coprocessor. It utilizes RegionObserver // and intercept preXXX() method to check user privilege for the given table // and column family. public class AccessControlCoprocessor extends BaseRegionObserver { @Override public void preGet(final ObserverContext<RegionCoprocessorEnvironment> c, final Get get, final List<KeyValue> result) throws IOException throws IOException { // check permissions.. if (!permissionGranted()) { throw new AccessDeniedException("User is not allowed to access."); } } // override prePut(), preDelete(), etc. }
  • 10. Endpoint • Resembling stored procedures • Invoke an endpoint at any time from the client. • The endpoint implementation will then be executed remotely at the target region or regions. • Result from those executions will be returned to the client. • Code implementation – Endpoint is an interface for dynamic RPC extension.
  • 11. Endpoints: How to implement a custom Coprocessor? • Have a new protocol interface which extends CoprocessorProtocol. • Implement the Endpoint interface and the new protocol interface . The implementation will be loaded into and executed from the region context. – Extend the abstract class BaseEndpointCoprocessor. This convenience class hide some internal details that the implementer need not necessary be concerned about, such as coprocessor class loading. • On the client side, the Endpoints can be invoked by two new HBase client APIs: – Executing against a single region: • HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row) – Executing against a range of regions: • HTableInterface.coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T,R> callable)
  • 12. Endpoints: Example Client Code new Batch.Call (on all regions) Region Server 1 Endpoint Batch.Call<ColumnAggregationProtocol, Long>() tableA, , 12345678 { ColumnAggregationProtocol .) (.. public Long call(ColumnAggregationProtocol instance) or throws IOException ss Endpoint e oc { tableA, bbbb, 12345678 pr return instance.sum(FAMILY, QUALIFIER); Co ColumnAggregationProtocol ec } ex } HTable Region Server 2 Endpoint Map<byte[], Long> sumResults = tableA, cccc, 12345678 table.coprocessorExec(ColumnAggregationProtocol.class, ColumnAggregationProtocol startRow, endRow) Endpoint tableA, dddd, 12345678 ColumnAggregationProtocol Batch Results Map<byte[], Long> sumResults • Note that the HBase client has the responsibility for dispatching parallel endpoint invocations to the target regions, and for collecting the returned results to present to the application code. • Like a lightweight MapReduce job: The “map” is the endpoint execution performed in the region server on every target region, and the “reduce” is the final aggregation at the client. • The distributed systems programming details behind a clean API.
  • 13. Step-1: Define protocol interface /** * A sample protocol for performing aggregation at regions. */ public interface ColumnAggregationProtocol extends CoprocessorProtocol { /** * Perform aggregation for a given column at the region. The aggregation * will include all the rows inside the region. It can be extended to allow * passing start and end rows for a fine-grained aggregation. * * @param family * family * @param qualifier * qualifier * @return Aggregation of the column. * @throws exception. */ public long sum(byte[] family, byte[] qualifier) throws IOException; }
  • 14. Step-2: Implement endpoint and the interface try public class ColumnAggregationEndpoint extends { BaseEndpointCoprocessor List<KeyValue> curVals = new ArrayList<KeyValue>(); implements ColumnAggregationProtocol boolean done = false; { do @Override { public long sum(byte[] family, byte[] qualifier) throws curVals.clear(); IOException done = scanner.next(curVals); { KeyValue kv = curVals.get(0); // aggregate at each region sumResult += Scan scan = new Scan(); Bytes.toLong(kv.getBuffer(), kv.getValueOffset()); scan.addColumn(family, qualifier); } while (done); long sumResult = 0; } finally InternalScanner scanner = { ((RegionCoprocessorEnvironment) scanner.close(); getEnvironment()).getRegion() } .getScanner(scan); return sumResult; } }
  • 15. Step-3 Deployment • Two chooses – Load from configuration (hbase-site.xml, restart HBase) – Load from table attribute (disable and enable table) • From shell
  • 16. Step-4: Invoking HTable table = new HTable(util.getConfiguration(), TEST_TABLE); • On client Map<byte[], Long> results; // scan: for all regions side, invoking results = table.coprocessorExec(ColumnAggregationProtocol.class, the endpoint ROWS[rowSeperator1 - 1], ROWS[rowSeperator2 + 1], new Batch.Call<ColumnAggregationProtocol, Long>() { public Long call(ColumnAggregationProtocol instance) throws IOException { return instance .sum(TEST_FAMILY, TEST_QUALIFIER); } }); long sumResult = 0; long expectedResult = 0; for (Map.Entry<byte[], Long> e : results.entrySet()) { sumResult += e.getValue(); }
  • 17. Server side execution • Region Server public interface HRegionInterface provide extends VersionedProtocol, environment to Stoppable,Abortable execute custom coprocessor in { region context. … • Exec ExecResult execCoprocessor(byte[] – Custom protocol regionName, Exec call) throws name IOException; – Method name … – Method parameters }
  • 18. Coprocessor Manangement • Build your own Coprocessor – Write server-side coprocessor code like above example, compiled and packaged as a jar file. • CoprocessorProtocol (e.g. ColumnAggregationProtocol) • Endpoint implementation (e.g. ColumnAggregationEndpoint) • Coprocessor Deployment – Load from Configuration (hbase-site.xml, restart HBase) • The jar file must be in classpath of HBase servers. • Global for all regions of all tables (system coprocessors). – Load from table attribute (from shell) • per table basis • The jar file should be put into HDFS or HBase servers’ classpath firstly, and set in the table attribute.
  • 19. Future Work based on Coprocessors • Parallel Computation Framework (our first goal!) • Others – Higher level of abstraction – E.g. MapReduce APIs similar. – External Coprocessor Host – Integration and implementation Dremel and/or dremel (HBASE-4047) computation model into HBase. • separate processes • Distributed application service platform (our second – Code Weaving (HBASE-2058) goal !?) • protect against malicious actions – Higher level of abstraction or faults accidentally introduced – Data-driven distributed application architecture. by a coprocessor. – Avoid building similar distributed architecture repeatedly. – … • HBase system enhancements – HBase internal measurements and statistics for administration. • Support application like percolator – Observes and notifications.