Low Latency “OLAP” with HBase - HBaseCon 2012
Cosmin Lehene
1.
Low Latency “OLAP” with HBase
Cosmin Lehene | Adobe
© 2012 Adobe Systems Incorporated. All Rights Reserved. Adobe Confidential.
2.
What we needed … and built
  OLAP Semantics
  Low Latency Ingestion
  High Throughput
  Real-time Query API
  Not hardcoded to web analytics or x-, y-, z- analytics, but extensible
3.
Building Blocks
  Dimensions, Metrics
  Aggregations
  Roll-up, drill-down, slicing and dicing, sorting
4.
OLAP 101 – Queries example
  Date        Country  City     OS       Browser  Sale
  2012-05-21  USA      NY       Windows  FF       0.0
  2012-05-21  USA      NY       Windows  FF       10.0
  2012-05-22  USA      SF       OSX      Chrome   25.0
  2012-05-22  Canada   Ontario  Linux    Chrome   0.0
  2012-05-23  USA      Chicago  OSX      Safari   15.0
  Summary per column:
  Date: 5 visits, 3 days
  Country: 2 countries (USA: 4, Canada: 1)
  City: 4 cities (NY: 2, SF: 1)
  OS: 3 OS-es (Win: 2, OSX: 2)
  Browser: 3 browsers (FF: 2, Chrome: 2)
  Sale: 50.0 (3 sales)
5.
OLAP 101 – Queries example
Rolling up to country level:
  SELECT COUNT(visits), SUM(sales)
  GROUP BY country

  Country  visits  sales
  USA      4       $50
  Canada   1       0
“Slicing” by browser:
  SELECT COUNT(visits), SUM(sales)
  GROUP BY country
  HAVING browser = “FF”

  Country  visits  sales
  USA      2       $10
  Canada   0       0
Top browsers by sales:
  SELECT SUM(sales), COUNT(visits)
  GROUP BY browser
  ORDER BY sales

  Browser  sales  visits
  Chrome   $25    2
  Safari   $15    1
  FF       $10    2
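The three queries above can be reproduced over the five sample visits with a few lines of Python. This is only an illustrative sketch: the `aggregate` helper and its parameters are assumptions made for this example and are not part of SaasBase. One difference from the slide: a group with no matching rows (Canada in the FF slice) is simply absent here instead of being listed with zeros.

```python
from collections import defaultdict

# The five sample visits from the slides: (date, country, city, os, browser, sale).
VISITS = [
    ("2012-05-21", "USA", "NY", "Windows", "FF", 0.0),
    ("2012-05-21", "USA", "NY", "Windows", "FF", 10.0),
    ("2012-05-22", "USA", "SF", "OSX", "Chrome", 25.0),
    ("2012-05-22", "Canada", "Ontario", "Linux", "Chrome", 0.0),
    ("2012-05-23", "USA", "Chicago", "OSX", "Safari", 15.0),
]
COLUMNS = ("date", "country", "city", "os", "browser", "sale")


def aggregate(rows, group_by, where=None):
    """Return {group value: (visit count, total sales)} for one dimension.

    `where` is an optional predicate on a row dict (hypothetical helper API).
    """
    idx = COLUMNS.index(group_by)
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        record = dict(zip(COLUMNS, row))
        if where is not None and not where(record):
            continue
        totals[row[idx]][0] += 1
        totals[row[idx]][1] += record["sale"]
    return {key: (visits, sales) for key, (visits, sales) in totals.items()}


# Roll-up to country level.
rollup = aggregate(VISITS, "country")
# Slice: only visits made with FF, still grouped by country.
ff_slice = aggregate(VISITS, "country", where=lambda r: r["browser"] == "FF")
# Top browsers by sales, highest first.
top_browsers = sorted(
    aggregate(VISITS, "browser").items(),
    key=lambda item: item[1][1],
    reverse=True,
)
```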
6.
OLAP – Runtime Aggregation vs. Pre-aggregation
Aggregate at runtime
  Most flexible
  Fast – scatter gather
  Space efficient
  But:
    I/O, CPU intensive
    Slow for larger data
    Low throughput
Pre-aggregate
  Fast
  Efficient – O(1)
  High throughput
  But:
    More effort to process (latency)
    Combinatorial explosion (space)
    No flexibility
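The "combinatorial explosion" of pre-aggregation can be made concrete by counting the dimension groups that could be materialized. The sketch below is an assumption-laden illustration (the `dimension_groups` helper is hypothetical): it enumerates every non-empty subset of the dimensions, which gives 2^n - 1 candidate groupings for n dimensions.

```python
from itertools import combinations


def dimension_groups(dimensions):
    """Enumerate every non-empty combination of dimensions (unordered)."""
    groups = []
    for size in range(1, len(dimensions) + 1):
        groups.extend(combinations(dimensions, size))
    return groups


# Four dimensions from the sample table.
groups = dimension_groups(["country", "city", "os", "browser"])
group_count = len(groups)
```

Each extra dimension roughly doubles the number of candidate groupings, which is why only the relevant groups are worth indexing.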
7.
Pre-aggregation
  Data needs to be summarized
  Can’t visualize 1B data points (no, not even with Retina display)
  Difficult to comprehend correlations among more than 3 dimensions
  Not all dimension groups are relevant
  Index on a needed basis (view selection problem)
  Runtime aggregation == TeraSort for every query?
  Pre-aggregate to reduce cardinality
8.
SaasBase
  We tune both:
    pre-aggregation level vs. runtime post-aggregation
    (ingestion speed + space) vs. (query speed)
  Think materialized views from RDBMS
9.
SaasBase Domain Model Mapping
10.
SaasBase - Domain Model Mapping
11.
SaasBase - Ingestion, Processing, Indexing, Querying
12.
SaasBase - Ingestion, Processing, Indexing, Querying
13.
Ingestion
14.
Ingestion throughput vs. latency
  Historical data (large batches)
    Optimize for throughput
  Increments (latest data, smaller)
    Optimize for latency
15.
Large, granular input strategies
  Slow listing in HDFS
    Archive processed files
  Filtering input
    FileDateFilter (log name patterns: log-YYYY-MM-dd-HH.log)
    TableInputFormat start/stop row
    File Index in HBase (track processed/new files)
  Map tasks overhead - stitching input splits
    400K files => 400K map tasks => overhead, slow reduce copy
    CombineFileInputFormat – 2GB-splits => 500 splits for 1TB
    FixedMappersTableInputFormat (e.g. 5-region splits)
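Filtering input by the date embedded in the log name could look roughly like the sketch below. It only illustrates the idea behind a date filter over the `log-YYYY-MM-dd-HH.log` pattern; the function name, the half-open range, and the sample file names are assumptions, and this is not the actual FileDateFilter code.

```python
import re
from datetime import datetime

# Matches names such as log-2012-05-22-13.log and captures the timestamp part.
LOG_NAME = re.compile(r"^log-(\d{4}-\d{2}-\d{2}-\d{2})\.log$")


def filter_by_date(names, start, end):
    """Keep file names whose embedded hour falls in [start, end)."""
    selected = []
    for name in names:
        match = LOG_NAME.match(name)
        if not match:
            continue  # not a log file following the naming pattern
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d-%H")
        if start <= stamp < end:
            selected.append(name)
    return selected


# Hypothetical directory listing.
names = [
    "log-2012-05-21-23.log",
    "log-2012-05-22-00.log",
    "log-2012-05-23-00.log",
    "README",
]
selected = filter_by_date(names, datetime(2012, 5, 22), datetime(2012, 5, 23))
```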
16.
Ingestion – Bulk Import
  HFileOutputFormat (HFOF)
    100s X faster than HBase API
    No need to recover from failed jobs
    No unnecessary load on machines
  * No shuffle - global reduce order required!
    e.g. first reduce key needs to be in the first region, last one in the last region
    Watch for uneven partitions
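The global ordering requirement means each key has to be routed to the reducer that owns its region. A minimal sketch of that routing, assuming sorted region start keys are known up front (the `region_for` helper and the split points are hypothetical, not the HFileOutputFormat implementation):

```python
from bisect import bisect_right


def region_for(row_key, split_points):
    """Return the index of the region that holds row_key.

    split_points are the sorted start keys of every region except the first,
    whose start key is empty. A key equal to a split point belongs to the
    region that starts there.
    """
    return bisect_right(split_points, row_key)


# Hypothetical table with three regions split on year boundaries.
split_points = ["2011-01-01", "2012-01-01"]
first = region_for("2010-12-04", split_points)
last = region_for("2012-05-22", split_points)
```

Because the mapping is monotonic, keys sorted inside each reducer are also sorted across reducers, which is the ordering bulk loading depends on.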
17.
HFOF – FileSizeDatePartitioner
  1 partition (reduce) / day for initial import
  Uneven reduce (partitions) due to data growth over time
    Reduce k: 2010-12-04 = 500MB
    Reduce n: 2012-05-22 = 5GB => slow and will result in a 5GB region
  Balance reduce buckets based on input file sizes and the reduce key
  Generate sub-partitions based on predefined size (e.g. 1GB)
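The sub-partitioning idea can be sketched as follows: given the input size per day, split each day into as many reduce buckets as needed to stay near a target size. The function name and the rounding rule are assumptions for illustration, not the actual FileSizeDatePartitioner logic.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def sub_partitions(bytes_per_day, target_bytes=GB):
    """Return {day: number of reduce sub-partitions}, at least one per day."""
    plan = {}
    for day, size in sorted(bytes_per_day.items()):
        plan[day] = max(1, math.ceil(size / target_bytes))
    return plan


# The two days quoted on the slide.
plan = sub_partitions({"2010-12-04": 500 * MB, "2012-05-22": 5 * GB})
```

With a 1GB target, the small early day stays in one bucket while the 5GB day is spread over five reducers.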
18.
Processing
19.
Processing
  Processing involves reading the input (files, tables, events), pre-aggregating it (reducing cardinality) and generating tables that can be queried in real time
  1 year: 1B events => 100B data points indexed
  Query => scan 365 data points (e.g. daily page views)
  Processing could be either MR or real-time (e.g. Storm)
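A toy version of this flow, assuming events arrive as (ISO timestamp, page) pairs: processing collapses raw events into one row per day, and a query then touches only those daily rows. Both helper names and the event shape are assumptions for this sketch.

```python
from collections import Counter


def preaggregate_daily(events):
    """Collapse (ISO timestamp, page) events into {day: page views}."""
    daily = Counter()
    for timestamp, _page in events:
        daily[timestamp[:10]] += 1  # YYYY-MM-DD prefix of the timestamp
    return daily


def page_views(daily, start_day, end_day):
    """Sum pre-aggregated rows for days in [start_day, end_day]."""
    return sum(count for day, count in daily.items() if start_day <= day <= end_day)


# Hypothetical events.
events = [
    ("2012-05-21T10:00:00", "/a"),
    ("2012-05-21T11:00:00", "/b"),
    ("2012-05-22T09:00:00", "/a"),
]
daily = preaggregate_daily(events)
total = page_views(daily, "2012-05-21", "2012-05-22")
```

A year-long query reads at most 365 daily rows, however many raw events were ingested.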
20.
Processing for OLAP semantics
  GROUP BY (process, query)
  COUNT, SUM, AVG, etc. (process, query)
  SORT (process, query)
  HAVING (mostly query, can define pre-process constraints)
21.
SaasBase vs. SQL Views Comparison
22.
reports.json entities definition
23.
Processing Performance
  read, map, partition, combine, copy, sort, reduce, write
  Read:
    Scan.setCaching() (I/O ~ buffer)
    Scan.setBatch() (avoid timeouts for abnormal input, e.g. 1M hits/visit)
    Even region distribution across cluster (distributes CPU, I/O)
  Map:
    No unnecessary transformations: Bytes.toString(bytes) + Bytes.toBytes(string) (CPU)
    Avoid GC: new X() (CPU, Memory)
    Avoid system calls (context switching)
    Stripping unnecessary data (I/O)
24.
Processing Performance
  Hot (in memory) vs. Cold (on disk, on network) data
    Minimize I/O from disk/network
  Single shot MR job: SuperProcessor
    Emit all groups from one map() call
  Incremental processing
    Data format
    YYYY-MM-DD prefixed rowkey (HH:mm for more granularity)
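"Emit all groups from one map() call" can be pictured as below: each input record is read once and contributes to every configured dimension group. This is a sketch of the idea only. The group list, the key layout, and the in-memory reduce are assumptions and do not reflect how SuperProcessor is written.

```python
from collections import defaultdict

# Hypothetical dimension groups to pre-aggregate, each prefixed by date.
GROUPS = [("date",), ("date", "country"), ("date", "browser")]


def map_record(record):
    """Yield one (key, (visits, sale)) pair per dimension group."""
    for group in GROUPS:
        key = (group, tuple(record[dim] for dim in group))
        yield key, (1, record["sale"])


def reduce_all(records):
    """Single pass over the input: sum visits and sales for every group key."""
    totals = defaultdict(lambda: [0, 0.0])
    for record in records:
        for key, (visits, sale) in map_record(record):
            totals[key][0] += visits
            totals[key][1] += sale
    return {key: tuple(value) for key, value in totals.items()}


# Hypothetical input records.
records = [
    {"date": "2012-05-21", "country": "USA", "browser": "FF", "sale": 10.0},
    {"date": "2012-05-21", "country": "USA", "browser": "Chrome", "sale": 5.0},
]
result = reduce_all(records)
```

Reading the input once for all groups avoids re-scanning cold data for each report.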
25.
Indexing
26.
HBase natural order: hierarchical representation
27.
Indexing - Why
  Example: top 10 cities
    ~50K [country, city] combinations per day
    Top 10 cities for 1 year => 365 (days) X 50K ~= 15M data points scanned
    If you add gender => 30M
    If you add Device, OS, Browser …
  Might compress well, but think about the environment
    How much energy would you spend for just top 10 cities?
  * Image from: http://my.neutralexistence.com/images/Green-Earth.jpg
28.
Indexing with HBase
  “10” < “2”
  GROUP BY year, month, country, city ORDER BY visits DESC LIMIT 10
  Lexicographic sorting
    2012/05/USA/0000000000/
    2012/05/USA/4294961296/San Francisco = 1000 visits*
    2012/05/USA/4294961396/New York = 900 visits*
    . . .
    2012/05/USA/9999999999/
  scan 't', {STARTROW => '2012/05/USA/', LIMIT => 10}
  * Padding numbers for lexicographic sorting: 1000 -> Long.MAX_VALUE – 1000 = 4294961296
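The key trick here is to store an inverted, zero-padded count inside the row key so that plain lexicographic order equals descending numeric order. The sketch below assumes a fixed 10-digit field and a ceiling of 9999999999; both are choices made for this example, so the digits it produces differ from the illustrative keys printed on the slide. The Chicago entry is invented sample data.

```python
WIDTH = 10
MAX_COUNT = 10 ** WIDTH - 1  # 9999999999, an assumed ceiling for the metric


def index_key(prefix, visits, label):
    """Build a row key whose lexicographic order is descending by visits."""
    inverted = MAX_COUNT - visits
    return f"{prefix}/{inverted:0{WIDTH}d}/{label}"


# Without padding and inversion, strings sort "10" before "2".
naive_order_is_wrong = "10" < "2"

keys = sorted([
    index_key("2012/05/USA", 900, "New York"),
    index_key("2012/05/USA", 1000, "San Francisco"),
    index_key("2012/05/USA", 10, "Chicago"),  # hypothetical extra city
])
top_cities = [key.rsplit("/", 1)[1] for key in keys]
```

A scan that starts at the prefix and stops after N rows therefore returns the top N cities without sorting at query time.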
29.
Query Engine
  Always reads indexed, compact data
  Query parsing
  Scan strategy
    Single vs. multiple scans
    Start/stop rows (prefixes, index positions, etc.)
    Index selection (volatile indexes with incremental processing)
  Deserialization
  Post-aggregation, sorting, fuzzy-sorting etc.
  Paging
  Custom dimension/metric class loading
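For prefix-based start/stop rows, a common way to derive the stop row is to increment the last byte of the prefix. The helper below is a generic sketch of that technique, not the SaasBase query engine; the function name is an assumption, and the empty stop row follows the usual HBase convention of "no upper bound".

```python
def prefix_scan_range(prefix: bytes):
    """Return (start_row, stop_row) covering every key that starts with prefix.

    The stop row is exclusive. Trailing 0xFF bytes cannot be incremented,
    so they are dropped before bumping the preceding byte.
    """
    stop = bytearray(prefix)
    while stop and stop[-1] == 0xFF:
        stop.pop()
    if not stop:
        return prefix, b""  # empty stop row: scan to the end of the table
    stop[-1] += 1
    return prefix, bytes(stop)


start_row, stop_row = prefix_scan_range(b"2012/05/USA/")
```

Here the stop row ends in `0`, the byte that follows `/`, so the scan covers exactly the keys under the `2012/05/USA/` prefix.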
30.
Conclusions
  OLAP semantics on a simple data model
  Data as first class citizen
  Domain Specific “Language” for Dimensions, Metrics, Aggregations
  Tunable performance, resource allocation
  Framework for vertical analytics systems
31.
Thank you!
  Cosmin Lehene
  @clehene
  http://hstack.org
  Credits: Andrei Dragomir, Adrian Muraru, Andrei Dulvac, Raluca Podiuc, Tudor Scurtu, Bogdan Dragu, Bogdan Drutu
33.
OLAP 101 - Rollup
  Country  Visits  Sale
  USA      4       $50
  Canada   1       $0
  Rollup: SELECT COUNT(visits), SUM(sales) GROUP BY country
34.
OLAP 101 - Slicing
  Date        Country  City     OS       Browser  Sale
  2012-03-02  USA      NY       Windows  FF       0.0
  2012-03-02  USA      NY       Windows  FF       10.0
  2012-03-03  USA      SF       OSX      Chrome   25.0
  2012-03-03  Canada   Ontario  Linux    Chrome   0.0
  2012-03-04  USA      Chicago  OSX      Safari   15.0
  Summary per column:
  Date: 5 visits, 3 days
  Country: 2 countries (USA: 4, Canada: 1)
  City: 4 cities (NY: 2, SF: 1)
  OS: 3 OS-es (Win: 2, OSX: 2)
  Browser: 3 browsers (FF: 2, Chrome: 2)
  Sale: 50.0 (3 sales)
  Filter or Segment or Slice (WHERE or HAVING)
35.
OLAP 101 – Sorting, TOP n
  Browser  Sale
  Chrome   $25
  Safari   $15
  Firefox  $10
  SELECT SUM(sales) as total GROUP BY browser ORDER BY total
Editor's Notes
How many HBase users?
Data as first class citizen
Check contrast on projector
Just like speed vs. space in general CS/algorithms
Queries always hit indexes
Dimensions – read, transform, serialize, deserialize data attributes
Metrics – read/transform/aggregate/serialize
Constraints: ingestion filtering
Report: instrument dimensions groups + metrics with aggregations, sorting
QUERY ENGINE -> INDEX (always realtime)
Initial import/process and NEW reports (not covered) on historical data
18K regions, upgrade to 0.92
Diagram: HARD TO DIGEST (TOO MUCH INFO, TOO CONDENSED)
Process = aggregate, generate indexes (natural)
Query = uses indexes, can do extra aggregation
LEFT: report definition, NOT a QUERY
LIKE A VIEW - CREATED - THEN QUERIED
Inconsistent
Rowkey = dimensions group -> metrics (right)
GO BACK to EXPLAIN
>100K/sec/thread
REALTIME
Data analysts work with familiar concepts