SlideShare uma empresa Scribd logo
1 de 37
Baixar para ler offline
Realtime Apache Hadoop at Facebook



Jonathan Gray & Dhruba Borthakur
June 14, 2011 at SIGMOD, Athens
Agenda

 1   Why Apache Hadoop and HBase?


 2   Quick Introduction to Apache HBase


 3   Applications of HBase at Facebook
Why Hadoop and HBase?
For Realtime Data?
Problems with existing stack
▪ MySQL   is stable, but...
 ▪  Not inherently distributed
 ▪  Table size limits

 ▪  Inflexible schema


▪ Hadoop   is scalable, but...
 ▪  MapReduce  is slow and difficult
 ▪  Does not support random writes

 ▪  Poor support for random reads
Specialized solutions
▪ High-throughput,    persistent key-value
  ▪ Tokyo  Cabinet
▪ Large scale data warehousing

   ▪  Hive/Hadoop

▪  PhotoStore
  ▪  Haystack

▪ Custom    C++ servers for lots of other stuff
What do we need in a data store?
▪ Requirements    for Facebook Messages
  ▪  Massive  datasets, with large subsets of cold data
  ▪  Elasticity and high availability

  ▪  Strong consistency within a datacenter

  ▪  Fault isolation


▪ Some   non-requirements
  ▪  Network  partitions within a single datacenter
  ▪  Active-active serving from multiple datacenters
HBase satisfied our requirements
▪ In   early 2010, engineers at FB compared DBs
   ▪  Apache       Cassandra, Apache HBase, Sharded MySQL

▪ Compared            performance, scalability, and
 features
   ▪  HBase gave excellent write performance, good reads
   ▪  HBase already included many nice-to-have features
        ▪    Atomic read-modify-write operations
        ▪    Multiple shards per server
        ▪    Bulk importing
HBase uses HDFS
We get the benefits of HDFS as a storage
system for free
▪ Fault
      toleranceScalabilityChecksums fix
corruptionsMapReduce

▪ Faultisolation of disksHDFS battle tested at petabyte
scale at Facebook
Lots of existing operational experience
Apache HBase
▪ Originally   part of Hadoop
  ▪  HBase   adds random read/write access to HDFS

▪ Required    some Hadoop changes for FB usage
  ▪  Fileappends
  ▪  HA NameNode

  ▪  Read optimizations


▪ Plus   ZooKeeper!
HBase System Overview
                          Database Layer

                     Master       Backup Master

          Region Server    Region Server    Region Server    ...
                              HBASE
Storage                                               Coordination
Layer                                                      Service

   Namenode     Secondary Namenode                 ZK Peer
                                                               ...
  Datanode    Datanode     Datanode   ...          ZK Peer
                  HDFS                          Zookeeper Quorum
HBase in a nutshell
▪ Sortedand column-oriented
▪ High write throughputHorizontal scalability
  Automatic failoverRegions sharded
  dynamically
Applications of HBase at
Facebook
Use Case 1
    Titan
(Facebook Messages)
The New Facebook Messages




Messages   IM/Chat   email   SMS
Facebook Messaging
  ▪ High   write throughputEvery message, instant
     message, SMS, and e-mailSearch indexes for all of the
     above
  ▪  Denormalized schema



▪ A   product at massive scale on day one
  ▪  6kmessages a second
  ▪  50k instant messages a second

  ▪  300TB data growth/month compressed
Typical Cell Layout
   ▪  Multiple       cells for messaging	

   ▪    20 servers/rack; 5 or more racks per cluster         	

   ▪  Controllers       (master/Zookeeper) spread across racks
ZooKeeper              ZooKeeper          ZooKeeper                ZooKeeper          ZooKeeper
HDFS NameNode          Backup NameNode    Job Tracker              HBase Master       Backup Master

           	

Region Server
Data Node
               	

              	

                       Region Server
                       Data Node
                                    	

            	

                                          Region Server
                                          Data Node
                                                       	

                  	

                                                                   Region Server
                                                                   Data Node
                                                                                	

            	

                                                                                      Region Server
                                                                                      Data Node
                                                                                                   	

Task Tracker           Task Tracker       Task Tracker             Task Tracker       Task Tracker




        19x...              19x...             19x...                   19x...             19x...
           	

Region Server
Data Node
               	

              	

                       Region Server
                       Data Node
                                    	

            	

                                          Region Server
                                          Data Node
                                                       	

                  	

                                                                   Region Server
                                                                   Data Node
                                                                                	

            	

                                                                                      Region Server
                                                                                      Data Node
                                                                                                   	

Task Tracker           Task Tracker       Task Tracker             Task Tracker       Task Tracker
        Rack #1             Rack #2            Rack #3                  Rack #4            Rack #5
High Write Throughput
               Write
               Key Value

                                                         Sequential
     Key val                          Key val              write

     Key val                          Key val
       .
       .
       .
     Key val                            .
                                        .
                                        .
                                      Key val

       .
       .
       .
     Key val
                                  Sorted in memory
                                      Key val


     Key val         Sequential       Key val
                     write

Commit Log(in HDFS)               Memstore (in memory)
Horizontal Scalability
Region

         ...               ...
Automatic Failover
                                         Find new
                       HBase client      server from
                                         META

server
 died




         No physical data copy because data is in HDFS
Use Case 2
   Puma
(Facebook Insights)
Puma
▪ Realtime      Data Pipeline
  ▪  Utilize   existing log aggregation pipeline (Scribe-
     HDFS)
  ▪  Extend low-latency capabilities of HDFS (Sync+PTail)

  ▪  High-throughput writes (HBase)


▪ Support      for Realtime Aggregation
  ▪  Utilize
          HBase atomic increments to maintain roll-ups
  ▪  Complex HBase schemas for unique-user calculations
Puma as Realtime MapReduce
▪ Map   phase with PTail
  ▪  Divide  the input log stream into N shards
  ▪  First version only supported random bucketing

  ▪  Now supports application-level bucketing


▪ Reduce      phase with HBase
  ▪  Every row+column in HBase is an output key
  ▪  Aggregate key counts using atomic counters

  ▪  Can also maintain per-key lists or other structures
Puma for Facebook Insights
▪ Realtime      URL/Domain Insights
  ▪  Domain    owners can see deep analytics for their site
  ▪  Clicks, Likes, Shares, Comments, Impressions

  ▪  Detailed demographic breakdowns (anonymized)

  ▪  Top URLs calculated per-domain and globally


▪ Massive   Throughput
  ▪  Billions of URLs
  ▪  > 1 Million counter increments per second
Use Case 3
         ODS
(Facebook Internal Metrics)
ODS
▪ Operational           Data Store
  ▪  System metrics (CPU, Memory, IO, Network)
  ▪  Application metrics (Web, DB, Caches)

  ▪  Facebook metrics (Usage, Revenue)
       ▪    Easily graph this data over time
       ▪    Supports complex aggregation, transformations, etc.


▪  Difficultto scale with MySQL
  ▪  Millions of unique time-series with billions of points

  ▪  Irregular data growth patterns
Dynamic sharding of regions
Region

                ...      ...




           server
         overloaded
Future of HBase at Facebook
User and Graph Data
      in HBase
HBase for the important stuff
▪ Looking    at HBase to augment MySQL
  ▪  Only single row ACID from MySQL is used
  ▪  DBs are always fronted by an in-memory cache

  ▪  HBase is great at storing dictionaries and lists



▪ Database    tier size determined by IOPS
  ▪  HBase  does only sequential writes
  ▪  Lower IOPs translate to lower cost

  ▪  Larger tables on denser, cheaper, commodity nodes
Conclusion
▪ Facebook   investing in Realtime Hadoop/HBase
  ▪  Work  of a large team of Facebook engineers
  ▪  Close collaboration with open source developers



▪ Much   more detail in Realtime Hadoop paper
  ▪  Technical   details about changes to Hadoop and
     HBase
  ▪  Operational experiences in production
Questions?
jgray@fb.com
dhruba@fb.com

Mais conteúdo relacionado

Mais procurados

HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitterctrezzo
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBayMongoDB
 
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at TwitterHadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at TwitterBill Graham
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionBenoit Perroud
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messagesyarapavan
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbaseRavi Veeramachaneni
 
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaHadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaCloudera, Inc.
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuningVitthal Gogate
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoopmcsrivas
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...MongoDB
 
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeCloudera, Inc.
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with HadoopCloudera, Inc.
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopfann wu
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsEsther Kundin
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batchboorad
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelUwe Printz
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Modern Data Stack France
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnhdhappy001
 

Mais procurados (20)

HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
 
MongoDB at eBay
MongoDB at eBayMongoDB at eBay
MongoDB at eBay
 
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at TwitterHadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
Hadoop Summit 2012 - Hadoop and Vertica: The Data Analytics Platform at Twitter
 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Storage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook MessagesStorage Infrastructure Behind Facebook Messages
Storage Infrastructure Behind Facebook Messages
 
Geo-based content processing using hbase
Geo-based content processing using hbaseGeo-based content processing using hbase
Geo-based content processing using hbase
 
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, InformaticaHadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
Hadoop World 2011: Practical HBase - Ravi Veeramchaneni, Informatica
 
Hadoop configuration & performance tuning
Hadoop configuration & performance tuningHadoop configuration & performance tuning
Hadoop configuration & performance tuning
 
Design, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for HadoopDesign, Scale and Performance of MapR's Distribution for Hadoop
Design, Scale and Performance of MapR's Distribution for Hadoop
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
 
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, AdobeHBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
HBaseCon 2012 | Low Latency OLAP with HBase - Cosmin Lehene, Adobe
 
Getting Started with Hadoop
Getting Started with HadoopGetting Started with Hadoop
Getting Started with Hadoop
 
Facing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoopFacing enterprise specific challenges – utility programming in hadoop
Facing enterprise specific challenges – utility programming in hadoop
 
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry TrendsBig Data and Hadoop - History, Technical Deep Dive, and Industry Trends
Big Data and Hadoop - History, Technical Deep Dive, and Industry Trends
 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
 
Hadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data ModelHadoop meets Agile! - An Agile Big Data Model
Hadoop meets Agile! - An Agile Big Data Model
 
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
Marcel Kornacker: Impala tech talk Tue Feb 26th 2013
 
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarnBikas saha:the next generation of hadoop– hadoop 2 and yarn
Bikas saha:the next generation of hadoop– hadoop 2 and yarn
 

Destaque

Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Adam Kawa
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CAkbajda
 
Big Data Overview 2013-2014
Big Data Overview 2013-2014Big Data Overview 2013-2014
Big Data Overview 2013-2014KMS Technology
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and PigRicardo Varela
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBaseHortonworks
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop EasyNick Dimiduk
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start TutorialCarl Steinbach
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Kevin Weil
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reducerantav
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopZheng Shao
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with HadoopPhilippe Julio
 

Destaque (14)

Big data Analytics Hadoop
Big data Analytics HadoopBig data Analytics Hadoop
Big data Analytics Hadoop
 
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
Hadoop Operations Powered By ... Hadoop (Hadoop Summit 2014 Amsterdam)
 
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CAPresto: Distributed SQL on Anything -  Strata Hadoop 2017 San Jose, CA
Presto: Distributed SQL on Anything - Strata Hadoop 2017 San Jose, CA
 
Big Data Overview 2013-2014
Big Data Overview 2013-2014Big Data Overview 2013-2014
Big Data Overview 2013-2014
 
introduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pigintroduction to data processing using Hadoop and Pig
introduction to data processing using Hadoop and Pig
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
Integration of Hive and HBase
Integration of Hive and HBaseIntegration of Hive and HBase
Integration of Hive and HBase
 
Pig, Making Hadoop Easy
Pig, Making Hadoop EasyPig, Making Hadoop Easy
Pig, Making Hadoop Easy
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Hive Quick Start Tutorial
Hive Quick Start TutorialHive Quick Start Tutorial
Hive Quick Start Tutorial
 
Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)Hadoop, Pig, and Twitter (NoSQL East 2009)
Hadoop, Pig, and Twitter (NoSQL East 2009)
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
 
HIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on HadoopHIVE: Data Warehousing & Analytics on Hadoop
HIVE: Data Warehousing & Analytics on Hadoop
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Semelhante a Realtime Apache Hadoop at Facebook

支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase强 王
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecturedrewz lin
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecturemysqlops
 
Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01jgregory1234
 
Facebook的架构
Facebook的架构Facebook的架构
Facebook的架构yiditushe
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
Shared personalization service. How to scale to 15 k rps (Patrice Pelland)
Shared personalization service. How to scale to 15 k rps (Patrice Pelland)Shared personalization service. How to scale to 15 k rps (Patrice Pelland)
Shared personalization service. How to scale to 15 k rps (Patrice Pelland)Ontico
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesOReillyStrata
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera, Inc.
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon
 
SQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseSQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseMark Ginnebaugh
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera, Inc.
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaDataWorks Summit
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwielerlucenerevolution
 

Semelhante a Realtime Apache Hadoop at Facebook (20)

支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecture
 
Facebook architecture
Facebook architectureFacebook architecture
Facebook architecture
 
Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01Qcon 090408233824-phpapp01
Qcon 090408233824-phpapp01
 
Facebook的架构
Facebook的架构Facebook的架构
Facebook的架构
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
Shared personalization service. How to scale to 15 k rps (Patrice Pelland)
Shared personalization service. How to scale to 15 k rps (Patrice Pelland)Shared personalization service. How to scale to 15 k rps (Patrice Pelland)
Shared personalization service. How to scale to 15 k rps (Patrice Pelland)
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL DatabasesSQL on Hadoop: Defining the New Generation of Analytic SQL Databases
SQL on Hadoop: Defining the New Generation of Analytic SQL Databases
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
 
HBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBaseHBaseCon 2015: Multitenancy in HBase
HBaseCon 2015: Multitenancy in HBase
 
HBase internals
HBase internalsHBase internals
HBase internals
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
SQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data WarehouseSQL Server 2008 Fast Track Data Warehouse
SQL Server 2008 Fast Track Data Warehouse
 
Cloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for HadoopCloudera Impala: A modern SQL Query Engine for Hadoop
Cloudera Impala: A modern SQL Query Engine for Hadoop
 
Data Aggregation System
Data Aggregation SystemData Aggregation System
Data Aggregation System
 
Unified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache SamzaUnified Batch & Stream Processing with Apache Samza
Unified Batch & Stream Processing with Apache Samza
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwieler
 

Mais de parallellabs

Hpca2012 facebook keynote
Hpca2012 facebook keynoteHpca2012 facebook keynote
Hpca2012 facebook keynoteparallellabs
 
Intel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn HintonIntel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn Hintonparallellabs
 
Isca needle a_0610
Isca needle a_0610Isca needle a_0610
Isca needle a_0610parallellabs
 
Dean keynote-ladis2009-jeff-dean
Dean keynote-ladis2009-jeff-deanDean keynote-ladis2009-jeff-dean
Dean keynote-ladis2009-jeff-deanparallellabs
 
Building Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons LearnedBuilding Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons Learnedparallellabs
 
Chalmers microprocessor sept 2010
Chalmers microprocessor sept 2010Chalmers microprocessor sept 2010
Chalmers microprocessor sept 2010parallellabs
 

Mais de parallellabs (6)

Hpca2012 facebook keynote
Hpca2012 facebook keynoteHpca2012 facebook keynote
Hpca2012 facebook keynote
 
Intel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn HintonIntel's Nehalem Microarchitecture by Glenn Hinton
Intel's Nehalem Microarchitecture by Glenn Hinton
 
Isca needle a_0610
Isca needle a_0610Isca needle a_0610
Isca needle a_0610
 
Dean keynote-ladis2009-jeff-dean
Dean keynote-ladis2009-jeff-deanDean keynote-ladis2009-jeff-dean
Dean keynote-ladis2009-jeff-dean
 
Building Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons LearnedBuilding Software Systems at Google and Lessons Learned
Building Software Systems at Google and Lessons Learned
 
Chalmers microprocessor sept 2010
Chalmers microprocessor sept 2010Chalmers microprocessor sept 2010
Chalmers microprocessor sept 2010
 

Último

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Último (20)

Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Realtime Apache Hadoop at Facebook

  • 1.
  • 2. Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens
  • 3. Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at Facebook
  • 4. Why Hadoop and HBase? For Realtime Data?
  • 5.
  • 6. Problems with existing stack ▪ MySQL is stable, but... ▪  Not inherently distributed ▪  Table size limits ▪  Inflexible schema ▪ Hadoop is scalable, but... ▪  MapReduce is slow and difficult ▪  Does not support random writes ▪  Poor support for random reads
  • 7. Specialized solutions ▪ High-throughput, persistent key-value ▪ Tokyo Cabinet ▪ Large scale data warehousing ▪  Hive/Hadoop ▪  PhotoStore ▪  Haystack ▪ Custom C++ servers for lots of other stuff
  • 8. What do we need in a data store? ▪ Requirements for Facebook Messages ▪  Massive datasets, with large subsets of cold data ▪  Elasticity and high availability ▪  Strong consistency within a datacenter ▪  Fault isolation ▪ Some non-requirements ▪  Network partitions within a single datacenter ▪  Active-active serving from multiple datacenters
  • 9. HBase satisfied our requirements ▪ In early 2010, engineers at FB compared DBs ▪  Apache Cassandra, Apache HBase, Sharded MySQL ▪ Compared performance, scalability, and features ▪  HBase gave excellent write performance, good reads ▪  HBase already included many nice-to-have features ▪  Atomic read-modify-write operations ▪  Multiple shards per server ▪  Bulk importing
  • 10. HBase uses HDFS We get the benefits of HDFS as a storage system for free ▪ Fault toleranceScalabilityChecksums fix corruptionsMapReduce ▪ Faultisolation of disksHDFS battle tested at petabyte scale at Facebook Lots of existing operational experience
  • 11. Apache HBase ▪ Originally part of Hadoop ▪  HBase adds random read/write access to HDFS ▪ Required some Hadoop changes for FB usage ▪  Fileappends ▪  HA NameNode ▪  Read optimizations ▪ Plus ZooKeeper!
  • 12. HBase System Overview Database Layer Master Backup Master Region Server Region Server Region Server ... HBASE Storage Coordination Layer Service Namenode Secondary Namenode ZK Peer ... Datanode Datanode Datanode ... ZK Peer HDFS Zookeeper Quorum
  • 13. HBase in a nutshell ▪ Sortedand column-oriented ▪ High write throughputHorizontal scalability Automatic failoverRegions sharded dynamically
  • 14. Applications of HBase at Facebook
  • 15. Use Case 1 Titan (Facebook Messages)
  • 16. The New Facebook Messages Messages IM/Chat email SMS
  • 17. Facebook Messaging ▪ High write throughputEvery message, instant message, SMS, and e-mailSearch indexes for all of the above ▪  Denormalized schema ▪ A product at massive scale on day one ▪  6kmessages a second ▪  50k instant messages a second ▪  300TB data growth/month compressed
  • 18. Typical Cell Layout ▪  Multiple cells for messaging ▪  20 servers/rack; 5 or more racks per cluster ▪  Controllers (master/Zookeeper) spread across racks ZooKeeper ZooKeeper ZooKeeper ZooKeeper ZooKeeper HDFS NameNode Backup NameNode Job Tracker HBase Master Backup Master Region Server Data Node Region Server Data Node Region Server Data Node Region Server Data Node Region Server Data Node Task Tracker Task Tracker Task Tracker Task Tracker Task Tracker 19x... 19x... 19x... 19x... 19x... Region Server Data Node Region Server Data Node Region Server Data Node Region Server Data Node Region Server Data Node Task Tracker Task Tracker Task Tracker Task Tracker Task Tracker Rack #1 Rack #2 Rack #3 Rack #4 Rack #5
  • 19. High Write Throughput Write Key Value Sequential Key val Key val write Key val Key val . . . Key val . . . Key val . . . Key val Sorted in memory Key val Key val Sequential Key val write Commit Log(in HDFS) Memstore (in memory)
  • 21. Automatic Failover Find new HBase client server from META server died No physical data copy because data is in HDFS
  • 22. Use Case 2 Puma (Facebook Insights)
  • 23.
  • 24.
  • 25. Puma ▪ Realtime Data Pipeline ▪  Utilize existing log aggregation pipeline (Scribe- HDFS) ▪  Extend low-latency capabilities of HDFS (Sync+PTail) ▪  High-throughput writes (HBase) ▪ Support for Realtime Aggregation ▪  Utilize HBase atomic increments to maintain roll-ups ▪  Complex HBase schemas for unique-user calculations
  • 26. Puma as Realtime MapReduce ▪ Map phase with PTail ▪  Divide the input log stream into N shards ▪  First version only supported random bucketing ▪  Now supports application-level bucketing ▪ Reduce phase with HBase ▪  Every row+column in HBase is an output key ▪  Aggregate key counts using atomic counters ▪  Can also maintain per-key lists or other structures
  • 27. Puma for Facebook Insights ▪ Realtime URL/Domain Insights ▪  Domain owners can see deep analytics for their site ▪  Clicks, Likes, Shares, Comments, Impressions ▪  Detailed demographic breakdowns (anonymized) ▪  Top URLs calculated per-domain and globally ▪ Massive Throughput ▪  Billions of URLs ▪  > 1 Million counter increments per second
  • 28.
  • 29.
  • 30. Use Case 3 ODS (Facebook Internal Metrics)
  • 31. ODS ▪ Operational Data Store ▪  System metrics (CPU, Memory, IO, Network) ▪  Application metrics (Web, DB, Caches) ▪  Facebook metrics (Usage, Revenue) ▪  Easily graph this data over time ▪  Supports complex aggregation, transformations, etc. ▪  Difficultto scale with MySQL ▪  Millions of unique time-series with billions of points ▪  Irregular data growth patterns
  • 32. Dynamic sharding of regions Region ... ... server overloaded
  • 33. Future of HBase at Facebook
  • 34. User and Graph Data in HBase
  • 35. HBase for the important stuff ▪ Looking at HBase to augment MySQL ▪  Only single row ACID from MySQL is used ▪  DBs are always fronted by an in-memory cache ▪  HBase is great at storing dictionaries and lists ▪ Database tier size determined by IOPS ▪  HBase does only sequential writes ▪  Lower IOPs translate to lower cost ▪  Larger tables on denser, cheaper, commodity nodes
  • 36. Conclusion ▪ Facebook investing in Realtime Hadoop/HBase ▪  Work of a large team of Facebook engineers ▪  Close collaboration with open source developers ▪ Much more detail in Realtime Hadoop paper ▪  Technical details about changes to Hadoop and HBase ▪  Operational experiences in production