Buzzwords Berlin HBase Hackathon, June 2012
Apache Flume and HBase
Alexander Alten-Lorenz | Customer Operations Engineer



About Me

    •   COPS Engineer @ Cloudera
    •   Apache Flume Contributor
    •   Working with Hadoop since 2009
    •   Blogger (mapredit.blogspot.com)
    •   Speaker at Conferences / Meetups /
        Tooling Events



©2012 Cloudera, Inc. All Rights Reserved.
Flume 1.x

    • Mass event collector
    • Stream data (events, not files) from clients
      to sinks
    • Sources: files, syslog, avro, seq, exec
    • Sinks: HDFS files, HBase, …
    • Configurable routing / topology




Architecture
    Component   Function

    Agent       The JVM running Flume. One per machine. Runs
                many sources and sinks.
    Source      Produces data in the form of events. Runs in a
                separate thread.
    Sink        Receives events from a channel. Runs in a separate
                thread.
    Channel     Connects sources to sinks (like a queue).
                Implements the reliability semantics.
    Event       A single datum: a log record, an Avro object, etc.
                Normally around 4 KB.
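
The table above can be pictured as two threads joined by a bounded queue. The following is a minimal sketch under that assumption only: PipelineSketch, run, and the use of plain strings as events are hypothetical names for illustration and are not Flume classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of source -> channel -> sink; not Flume code.
class PipelineSketch {

    /** Runs one source thread and one sink thread joined by a bounded queue. */
    static List<String> run(List<String> records) {
        // The "channel": a bounded FIFO queue between the two threads.
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(100);
        List<String> delivered = new ArrayList<>();

        // The "source": produces events and puts them on the channel.
        Thread source = new Thread(() -> {
            try {
                for (String r : records) {
                    channel.put(r); // blocks while the channel is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // The "sink": takes events from the channel and delivers them.
        Thread sink = new Thread(() -> {
            try {
                for (int i = 0; i < records.size(); i++) {
                    delivered.add(channel.take()); // blocks while empty
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        sink.start();
        try {
            source.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("log line 1", "log line 2")));
    }
}
```

The bounded queue is what lets a slow sink apply back-pressure to the source; a real channel additionally provides the reliability semantics named in the table.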




Agent

    • Runs many sources and sinks
    • Java properties-based configuration
    • Low overhead (-Xmx20m)
      – Adding RAM increases performance
      – Setting -Xms avoids on-demand memory allocation at run time
      – Batching increases performance dramatically




Sources

    • Plugin interface
    • Managed by a SourceRunner that controls
      threading and execution model (e.g. polling
      vs. event-based)
    • Included: exec, avro, syslog, seq




HBase sink
    ls -la flume-ng-sinks/flume-ng-hbase-sink/src/main/java/org/apache/flume/sink/hbase/

    HBaseSink.java
    HbaseEventSerializer.java
    SimpleHbaseEventSerializer.java
    SimpleRowKeyGenerator.java




HBaseSink.java


•   Controls flush()
•   Uses a serializer
•   Controls the transaction
•   Controls rollbacks (in case events could not be
    written)
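
The transaction and rollback bullets above can be sketched as follows. This is a simplified illustration, not the code in HBaseSink.java: TransactionSketch, Writer, and drain are hypothetical names, and an in-memory deque stands in for the channel.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration of take / commit / rollback; not Flume code.
class TransactionSketch {

    /** Stand-in for the HBase client: writes one batch or throws. */
    interface Writer {
        void write(List<String> batch) throws Exception;
    }

    /**
     * Takes up to batchSize events from the channel and hands them to the
     * writer. Returns true if the batch was written (commit). If the write
     * fails, the events go back to the head of the channel in their
     * original order (rollback) and the method returns false.
     */
    static boolean drain(Deque<String> channel, int batchSize, Writer writer) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < batchSize && !channel.isEmpty()) {
            batch.add(channel.pollFirst());
        }
        try {
            writer.write(batch);
            return true; // commit: the events stay removed from the channel
        } catch (Exception e) {
            // Rollback: re-insert in reverse so the original order is kept.
            for (int i = batch.size() - 1; i >= 0; i--) {
                channel.addFirst(batch.get(i));
            }
            return false;
        }
    }
}
```

Because a failed batch is returned to the channel, a later attempt sees the same events again; nothing is dropped when the write fails.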




Configuration


    •   Source uses the seq interface
    •   Listens on a defined port @localhost
    •   Serializer needs some parameters
    •   Column family and column must be known
    •   Valid hbase-site.xml in $CLASSPATH
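
Why the column family and column must be known up front becomes clearer when looking at what a serializer has to produce. The sketch below assumes a deliberately simplified contract: SerializerSketch, Cell, EventSerializer, and the "row-" key scheme are hypothetical and are not the interface defined in HbaseEventSerializer.java.

```java
// Hypothetical, simplified serializer contract; not Flume's interface.
class SerializerSketch {

    /** One cell to write: row key, column family, qualifier, value. */
    static final class Cell {
        final String rowKey;
        final String family;
        final String qualifier;
        final byte[] value;

        Cell(String rowKey, String family, String qualifier, byte[] value) {
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Maps one event body to the cell that should be written. */
    interface EventSerializer {
        Cell toCell(byte[] eventBody, long sequence);
    }

    /**
     * Builds a serializer that stores the raw event body in a fixed
     * column. Family and column are fixed at configuration time, which is
     * why both must be known before the first event arrives.
     */
    static EventSerializer simple(String family, String payloadColumn) {
        return (body, seq) -> new Cell("row-" + seq, family, payloadColumn, body);
    }
}
```

A call such as SerializerSketch.simple("testing", "pcol") mirrors the columnFamily and payloadColumn settings used in the configuration example.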



Configuration Example
host1.sources = src1
host1.sinks = sink1
host1.channels = ch1

host1.sources.src1.type = seq
host1.sources.src1.port = 25001
host1.sources.src1.bind = localhost
host1.sources.src1.channels = ch1
host1.sinks.sink1.type = org.apache.flume.sink.hbase.HBaseSink
host1.sinks.sink1.channel = ch1
host1.sinks.sink1.table = test3
host1.sinks.sink1.columnFamily = testing
host1.sinks.sink1.column = foo
host1.sinks.sink1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
host1.sinks.sink1.serializer.payloadColumn = pcol
host1.sinks.sink1.serializer.incrementColumn = icol
host1.channels.ch1.type=memory
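
The configuration is a plain Java properties file whose keys follow the pattern agent.componentType.componentName.setting. The sketch below only illustrates that key layout with java.util.Properties; it is not how Flume itself loads its configuration, and ConfigSketch, parse, and sinkSetting are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration of the key layout; not Flume's config loader.
class ConfigSketch {

    /** Parses properties text into a Properties object. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("could not read config text", e);
        }
        return props;
    }

    /** Looks up agent.sinks.sink.key, e.g. host1.sinks.sink1.table. */
    static String sinkSetting(Properties props, String agent, String sink, String key) {
        return props.getProperty(agent + ".sinks." + sink + "." + key);
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "host1.sinks = sink1",
                "host1.sinks.sink1.table = test3",
                "host1.sinks.sink1.columnFamily = testing",
                "host1.channels.ch1.type=memory");
        Properties props = parse(config);
        System.out.println(sinkSetting(props, "host1", "sink1", "table"));
        System.out.println(sinkSetting(props, "host1", "sink1", "columnFamily"));
    }
}
```

Whitespace around the equals sign is optional in this format, so `type=memory` and `type = memory` are read the same way. A value that wraps onto the next line without a trailing backslash is not, which is why the serializer class name has to stay on one line.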


Take Away


 •   Flume collects events
 •   Source - Channel - Sink concept
 •   HBase sink needs a serializer interface
 •   Column family and column must be known




Thank You

 • Web: https://cwiki.apache.org/FLUME/getting-started.html
 • ML: flume-user@incubator.apache.org

 • Mail: alexander@cloudera.com
 • Blog: mapredit.blogspot.com
 • Twitter: @mapredit


