King HUG UK
© King.com Ltd 2013 – Public 2
Database
Relational

Agenda
•  Welcome!
•  A brief history of King
•  King data platform evolution
•  Enter Hive
•  Hive + DB
•  Hive + better DB
•  Questions?

A brief history of King

Who?

Where?

Web, social, mobile

King in numbers
•  100 million daily active users
•  1 billion game plays per day
•  8 offices
•  10 billion events per day
•  Lots and lots of data…

A brief history of me
andy.done@king.com

King data platform evolution

Enter Hive

The road to big
[Chart: compressed event data, gigabytes per day (0 to 350), browser vs. mobile, February 2011 to April 2013. Annotations: "QlikView says no", "Infobright CE says no", 10 nodes, 20 nodes, 40 nodes.]

Scaling accomplished

Hive says…

Data exploration
•  COUNT(*)
•  SELECT DISTINCT
•  COUNT, SUM… GROUP BY date
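The three kinds of exploratory query listed above can be sketched in SQL. This example uses Python's built-in `sqlite3` module as a stand-in for Hive, purely for illustration; the `events` table, its columns and the sample rows are hypothetical. Queries this simple should look much the same in HiveQL.

```python
import sqlite3

# In-memory SQLite database standing in for Hive; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, user_id INTEGER, game TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("2013-01-01", 1, "game_a", 0.0),
        ("2013-01-01", 2, "game_a", 1.5),
        ("2013-01-02", 1, "game_b", 0.0),
        ("2013-01-02", 3, "game_b", 2.0),
        ("2013-01-02", 3, "game_b", 0.5),
    ],
)

# COUNT(*): how many events are there in total?
total_events = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]

# SELECT DISTINCT: which values does a column actually take?
games = [row[0] for row in conn.execute("SELECT DISTINCT game FROM events ORDER BY game")]

# COUNT, SUM ... GROUP BY date: one summary row per day.
daily = conn.execute(
    "SELECT event_date, COUNT(*), COUNT(DISTINCT user_id), SUM(amount) "
    "FROM events GROUP BY event_date ORDER BY event_date"
).fetchall()

print(total_events)  # 5
print(games)         # ['game_a', 'game_b']
print(daily)         # [('2013-01-01', 2, 2, 1.5), ('2013-01-02', 3, 2, 2.5)]
```

On billions of rows, queries like these are exactly where Hive's batch-oriented latency becomes noticeable, which is what the following slides go on to address.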

Hive + DB = ?

Data platform 1.0
[Diagram: Games, Event data, Hive, ETL, Reports, Data scientists]

Data platform 1.5
[Diagram: Games, Event data, Hive, DB, ETL, Reports, Data scientists]

Selection criteria
•  ‘Accessible’ pricing (free?)
•  Single node
•  Easy to set up
•  Low maintenance

Contenders ready
•  Infobright
   •  Columnar MySQL engine
   •  Light tuning and hinting
•  InfiniDB
   •  Columnar MySQL engine
   •  Tuning-less
   •  Faster for our use case

How did that work out?
•  Paid its way
•  Popular
•  100s of queries / day
•  Stability
•  Ceilings
•  Screwed by mobile

The road to big
[Chart: compressed event data, gigabytes per day (0 to 350), browser vs. mobile, February 2011 to April 2013. Annotations: "QlikView says no", "Infobright CE says no", 10 nodes, 20 nodes, 40 nodes, InfiniDB.]

ETL?

Hive + better DB = ?

Data platform 2.0
[Diagram: Games, Event data, Hive, Better DB, ETL, Reports, Data scientists]

State of the market, Jan 2013
•  Hadoop on steroids
   •  Hadapt…
   •  Impala
   •  Nouvaeu Data
   •  Platfora
   •  SiSense
•  MPP analytics databases
   •  Vertica
   •  ExaSol

Contenders ready
Feature          ExaSol          Vertica
Processing       In-memory       Disk-optimised
Administration   Web-based       Command line
Backup           Web-based       Command line
Resiliency       Hot spare       Gradual degradation
Tuning           Self-tuning     User tuning
Licensing        Allocated RAM   Total storage
Vendor           Smaller         Larger

Disclaimers
•  Our data
•  Our queries
•  Our use case
•  Our results

This is our data
Table              Row count
Mobile dimension   161 million
Social dimension   600 million
Mobile facts       1 billion
Social facts       6.7 billion

Single query
[Four chart slides]

Cluster stats
                      Vertica   ExaSol   Hive      InfiniDB
Nodes                 4         4        19        1
Cores                 64        48       228       32
RAM                   512 GB    288 GB   1216 GB   300 GB
Disks                 96        32       76        4
Hardware cost (USD)   $$$$      $$       $$        $
Total cost (USD)      $$$$$$    $$$$$    $$        $$
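Because the clusters differ so much in size, it can help to normalise the table to per-node figures. This is a minimal Python sketch using the numbers from the cluster stats above. The `Cluster` class is a hypothetical helper, and the cost rows are left out because the slide only gives them as relative `$` ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Cluster-wide totals as listed on the 'Cluster stats' slide."""
    name: str
    nodes: int
    cores: int
    ram_gb: int
    disks: int

    def per_node(self) -> dict[str, float]:
        """Divide each cluster-wide total by the node count."""
        return {
            "cores": self.cores / self.nodes,
            "ram_gb": self.ram_gb / self.nodes,
            "disks": self.disks / self.nodes,
        }


CLUSTERS = [
    Cluster("Vertica", nodes=4, cores=64, ram_gb=512, disks=96),
    Cluster("ExaSol", nodes=4, cores=48, ram_gb=288, disks=32),
    Cluster("Hive", nodes=19, cores=228, ram_gb=1216, disks=76),
    Cluster("InfiniDB", nodes=1, cores=32, ram_gb=300, disks=4),
]

for c in CLUSTERS:
    p = c.per_node()
    print(f"{c.name:<9} {p['cores']:>3.0f} cores  {p['ram_gb']:>4.0f} GB RAM  {p['disks']:>3.0f} disks per node")
```

Per node, this works out to 16 cores / 128 GB / 24 disks for Vertica, 12 / 72 / 8 for ExaSol, 12 / 64 / 4 for Hive and 32 / 300 / 4 for the single InfiniDB box. The ExaSol and Hive nodes are broadly comparable machines, even though the Hive cluster has almost five times as many of them.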

Concurrency 2

Concurrency 4

Concurrency 8

Concurrency 16
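The slides do not say how the concurrency runs were driven. One common approach is to launch N parallel query streams at each concurrency level and time the whole batch. The sketch below does this with Python's `concurrent.futures`. `run_query` is a hypothetical callable you would connect to a real database client, and the "each stream runs the full query set" design is an assumption, not King's actual harness.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Sequence


def run_stream(run_query: Callable[[str], None], queries: Sequence[str]) -> list[float]:
    """Run every query once, in order, and return each query's latency in seconds."""
    latencies = []
    for sql in queries:
        start = time.perf_counter()
        run_query(sql)
        latencies.append(time.perf_counter() - start)
    return latencies


def concurrency_test(
    run_query: Callable[[str], None],
    queries: Sequence[str],
    levels: Sequence[int] = (2, 4, 8, 16),
) -> dict[int, dict[str, float]]:
    """For each level, run that many streams in parallel and time the whole batch."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=level) as pool:
            futures = [pool.submit(run_stream, run_query, queries) for _ in range(level)]
            latencies = [lat for f in futures for lat in f.result()]
        results[level] = {
            "wall_clock_s": time.perf_counter() - start,
            "queries_run": len(latencies),
            "median_latency_s": median(latencies),
        }
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a database call; replace with a real client.
    def fake_query(sql: str) -> None:
        time.sleep(0.01)

    for level, stats in concurrency_test(fake_query, ["q1", "q2", "q3"]).items():
        print(level, stats)
```

Threads are a reasonable choice here because a database client spends most of its time waiting on the server, and Python releases the GIL during that I/O. A CPU-heavy client would need processes instead.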

Overall run time

Picture:words
$1.9m
=
4 ExaSol nodes
420 Hive nodes

This is a test
•  Ad hoc query tests
•  DML
   •  INSERTs
   •  UPDATEs
   •  DELETEs
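At its core, a DML test like the one listed above runs INSERT, UPDATE and DELETE statements and checks their effect. The following sketch uses `sqlite3` as a stand-in engine. The `scratch_players` table and the `dml_smoke_test` function are hypothetical. A real test would send the same statements to the candidate database through its own driver.

```python
import sqlite3


def dml_smoke_test(conn: sqlite3.Connection) -> dict[str, int]:
    """Exercise INSERT, UPDATE and DELETE on a scratch table and report row counts."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE scratch_players (player_id INTEGER PRIMARY KEY, level INTEGER)")
    counts = {}

    # INSERT: load a small batch of 100 rows.
    cur.executemany("INSERT INTO scratch_players VALUES (?, ?)", [(i, 1) for i in range(1, 101)])
    counts["after_insert"] = cur.execute("SELECT COUNT(*) FROM scratch_players").fetchone()[0]

    # UPDATE: bump the level of even-numbered players; rowcount reports rows touched.
    cur.execute("UPDATE scratch_players SET level = level + 1 WHERE player_id % 2 = 0")
    counts["updated"] = cur.rowcount

    # DELETE: remove players whose id is above 90.
    cur.execute("DELETE FROM scratch_players WHERE player_id > 90")
    counts["deleted"] = cur.rowcount
    counts["after_delete"] = cur.execute("SELECT COUNT(*) FROM scratch_players").fetchone()[0]

    conn.commit()
    return counts


if __name__ == "__main__":
    print(dml_smoke_test(sqlite3.connect(":memory:")))
    # {'after_insert': 100, 'updated': 50, 'deleted': 10, 'after_delete': 90}
```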

And in the real world
•  Faster processing times
   •  4.5 hours to 20 minutes
•  Happier analysts
•  Happier data warehouse engineers
•  Happier ops
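To put the real-world improvement in proportion: 4.5 hours is 270 minutes, so going to 20 minutes is roughly a 13.5x speed-up. Here is the arithmetic as a small Python check, using only the two figures from the slide:

```python
# Worked arithmetic for the "4.5 hours to 20 minutes" figure quoted on the slide.
before_minutes = 4.5 * 60  # 4.5 hours expressed in minutes: 270
after_minutes = 20

speedup = before_minutes / after_minutes                     # how many times faster
reduction_pct = 100 * (1 - after_minutes / before_minutes)   # share of run time saved

print(f"{speedup:.1f}x faster, {reduction_pct:.1f}% less time")  # 13.5x faster, 92.6% less time
```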

Conclusions
•  For structured workloads, consider a good analytic database to complement your Hadoop infrastructure
•  ExaSol was an excellent fit for our use case
•  We’ll let you know how we get on!

Questions?

We’re hiring!

Thank you
