Felipe Ferreira
Getting to Know Apache HBase
Natural Partner for Innovation
felipe.ferreira@indt.org.br
• NoSQL datastore built on top of HDFS (Hadoop)
• An Apache Top Level Project
• The goal is the hosting of very large tables (billions of
rows X millions of columns)
• Based on Google’s BigTable paper
What Is HBase?
• Storing large amounts of data (TB/PB)
• High throughput for a large number of requests
• Storing unstructured or variable column data
• Big Data with random reads and writes
Why Use HBase?
• The problem is not a Big Data problem (only use HBase at Big Data scale)
• You read straight through files
• You write all at once or append new files
– That is, no random reads or writes
• The access patterns of the data are ill-defined
When to Consider Not Using HBase?
• More complete list at http://wiki.apache.org/hadoop/Hbase/PoweredBy
HBase in production
HBase Architecture – How It Works
• HBase Master
• RegionServer
• ZooKeeper
• HDFS
– NameNode/Standby NameNode
– DataNode
Meet the Daemons
Daemon Locations
Tables and Column Families
Rows and Columns
Regions
Regions
Write Path
Read Path
HBase API – How to access the data
• Data is not accessed over SQL
• You must:
– Create your own connections
– Keep track of the type of data in a column
– Give each row a key
– Access a row by its key
No SQL Means No SQL
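As a rough illustration of "keep track of the type of data in a column": HBase stores every value as a plain byte array, so the application has to remember how each column was serialized. The sketch below uses only the Python standard library, not an HBase client; the column names, the `COLUMN_TYPES` mapping, and the dict standing in for a table are all assumptions made for this example.

```python
import struct

# Hypothetical schema kept by the application, not by the store:
# column name -> struct format of the value stored under it.
COLUMN_TYPES = {
    b"info:age": ">i",      # 4-byte big-endian signed int
    b"info:balance": ">d",  # 8-byte big-endian double
}

def encode(column: bytes, value) -> bytes:
    """Serialize a typed value to raw bytes for the given column."""
    fmt = COLUMN_TYPES.get(column)
    if fmt is None:
        return str(value).encode("utf-8")  # unlisted columns hold UTF-8 text
    return struct.pack(fmt, value)

def decode(column: bytes, raw: bytes):
    """Turn raw bytes back into a value, using the type we recorded."""
    fmt = COLUMN_TYPES.get(column)
    if fmt is None:
        return raw.decode("utf-8")
    return struct.unpack(fmt, raw)[0]

# A row is addressed only by its key; the cells hold opaque bytes.
row_key = b"user#0001"
row = {
    b"info:name": encode(b"info:name", "Ana"),
    b"info:age": encode(b"info:age", 31),
}
table = {row_key: row}  # plain dict standing in for a table

fetched = table[row_key]
name = decode(b"info:name", fetched[b"info:name"])
age = decode(b"info:age", fetched[b"info:age"])
```

If the reader decoded `info:age` as text instead of a 4-byte integer, it would get garbage back, which is why the type bookkeeping falls on the client.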
• Gets
– Gets a row’s data based on the row key
• Puts
– Updates or inserts a row with data based on the row key
• Scans
– Finds all matching rows based on a range of row keys
– Scan logic can be extended by using filters
Types of Access
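The three access types can be sketched with a toy in-memory table. This is not the HBase client API; `ToyTable` and its method signatures are hypothetical, written only to show the semantics: a put is an upsert by row key, a get is a lookup by row key, and a scan walks a sorted key range with an optional filter.

```python
import bisect

class ToyTable:
    """In-memory stand-in for a table whose rows are sorted by row key."""

    def __init__(self):
        self._keys = []  # row keys, kept in sorted order
        self._rows = {}  # row key -> {column: value}

    def put(self, row_key, columns):
        """Insert the row if the key is new, otherwise update its columns."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)

    def get(self, row_key):
        """Return a copy of one row's data, or None if the key is absent."""
        row = self._rows.get(row_key)
        return dict(row) if row is not None else None

    def scan(self, start, stop, row_filter=None):
        """Yield (key, row) for start <= key < stop, optionally filtered."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if row_filter is None or row_filter(row):
                yield key, dict(row)

table = ToyTable()
table.put(b"user#001", {b"info:city": b"Manaus"})
table.put(b"user#002", {b"info:city": b"Recife"})
table.put(b"user#003", {b"info:city": b"Manaus"})
table.put(b"user#002", {b"info:city": b"Brasilia"})  # same key: an update

one_row = table.get(b"user#002")
in_range = [k for k, _ in table.scan(b"user#001", b"user#003")]
in_manaus = [k for k, _ in table.scan(
    b"user#", b"user$", lambda r: r[b"info:city"] == b"Manaus")]
```

The stop key `b"user$"` works as an upper bound because `$` is the byte immediately after `#`, so every key with the `user#` prefix sorts below it.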
Gets
Puts
Puts
HBase Schema Design – How to design
• Designing schemas for HBase requires in-depth knowledge of how the data will be accessed
• Schema Design is ‘data-centric’ not ‘relationship-centric’
• You design around how data is accessed
• Row keys are engineered
No SQL Means No SQL
• A row key is more than the glue between two tables
• Engineering time is spent just on constructing a row key
– Contents of a row key vary by access pattern
– Often made up of several pieces of data
Row Keys
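A minimal sketch of a row key "made up of several pieces of data", assuming an access pattern of "latest events for a given user". The layout (a fixed-width user id followed by a reversed timestamp) and the helper names are illustrative assumptions, not something prescribed by the slides. Fixed-width big-endian fields are used so that byte-wise key order matches numeric order.

```python
import struct

MAX_TS = 2**63 - 1  # subtracting from this reverses timestamp order

def make_row_key(user_id: int, timestamp_ms: int) -> bytes:
    """Build a 16-byte key: user id, then reversed timestamp.

    Rows for one user sort together, newest event first.
    """
    return struct.pack(">QQ", user_id, MAX_TS - timestamp_ms)

def parse_row_key(key: bytes):
    """Recover (user_id, timestamp_ms) from a key built by make_row_key."""
    user_id, reversed_ts = struct.unpack(">QQ", key)
    return user_id, MAX_TS - reversed_ts

# Sorting the raw bytes mimics how a key-ordered store would lay rows out.
keys = sorted([
    make_row_key(7, 1_000),
    make_row_key(7, 3_000),
    make_row_key(7, 2_000),
    make_row_key(3, 5_000),
])
decoded = [parse_row_key(k) for k in keys]
```

A different access pattern (say, "all events in a time window across users") would call for a different key layout, which is the sense in which the contents of a row key vary by access pattern.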
• Schema design does not start with an ERD
• Access patterns must be known up front
• Denormalize to improve performance
– Fewer, bigger tables
Schema Design
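To make "fewer, bigger tables" concrete, the sketch below folds two relational-style tables into one wide row per user, so that a single lookup by row key returns the user together with their orders. The `users` and `orders` data, the `info:` and `orders:` column prefixes, and the key format are all hypothetical.

```python
# Two normalized, relational-style tables (sample data for illustration).
users = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bruno"}]
orders = [
    {"id": 10, "user_id": 1, "total": "59.90"},
    {"id": 11, "user_id": 1, "total": "12.00"},
    {"id": 12, "user_id": 2, "total": "7.50"},
]

def denormalize(users, orders):
    """Fold both tables into one wide row per user, keyed by user id."""
    table = {}
    for user in users:
        key = f"user#{user['id']:06d}".encode()
        table[key] = {b"info:name": user["name"].encode()}
    for order in orders:
        key = f"user#{order['user_id']:06d}".encode()
        column = f"orders:{order['id']}".encode()  # one column per order
        table[key][column] = order["total"].encode()
    return table

wide = denormalize(users, orders)
ana = wide[b"user#000001"]  # one lookup, no join
```

The cost of this design is that rows have a variable set of columns and the same fact may be stored in more than one place, which is the usual trade-off when denormalizing for read performance.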
HBase in production - examples
• Use of HBase to integrate SMS, chat, email and Facebook Messages into
one inbox
• HydraBase – The evolution of HBase@Facebook
• HBase provides a distributed, read/write backup of all MySQL tables in
Twitter's production
• A number of applications, including people search, rely on HBase internally
for data generation
• Additionally, the operations team uses HBase as a time-series database for
cluster-wide monitoring/performance data
• Uses HBase as a foundation for cloud scale storage for a variety of
applications
• Uses HBase to build a graph service for global web threat entities
evaluation and reputation
Internal Use Only
• Non-profit R&D center founded by Nokia in 2001 in Brazil
• Focused on projects delivering solutions and products in the mobile technology area
• Technical team of 200+
• Located in Brazil: Manaus | Brasilia | Recife | São Paulo
• 50+ invention reports accepted by Nokia/Microsoft to file patent applications
• 500+ items of scientific production
• 300+ completed projects
Our Certifications
Our Awards
• Eco System Saving Tips (app): Mobile World Congress 2012
• Facelock: 1st prize, London Hackathon | Nokia World 2010
• Audio Aid: 1st prize, Forum Nokia Calling All Innovators 2009
• Microsoft Data Gathering: Tele.Síntese award, 2012 & 2013
• About training in Big Data (Developer, Analyst, Admin):
http://www.indt.org/servicos/treinamentos/hadoop-developer
http://www.indt.org/servicos/treinamentos/hadoop-analyst
http://www.indt.org/servicos/treinamentos/hadoop-admin
• About HBase:
http://hbase.apache.org/
• About INDT:
http://www.indt.org
communications@indt.org.br
• About Hortonworks:
http://www.hortonworks.com
communications@indt.org.br
INFOS + CONTACT
