SlideShare uma empresa Scribd logo
1 de 28
Kylin Tour 
Kylin Tour 
--Extreme OLAP Engine for Big Data 
Luke Han | @lukehq 
Product Manager, eBay 
Oct 2014
Agenda 
 What’s Kylin 
 Architecture 
 Performance 
 Onboarding 
 Open Source 
 Q & A
Kylin Tour 
Big Data Era 
 More and more data becoming available on Hadoop 
 Limitations in existing Business Intelligence (BI) Tools 
 Limited support for Hadoop 
 Data size growing exponentially 
 High latency of interactive queries 
 Challenges to adopt Hadoop as interactive analysis system 
 Majority of analyst groups are SQL savvy 
 No mature SQL interface on Hadoop 
 Full OLAP capability on Hadoop ecosystem not ready yet
Kylin Tour 
Business Needs for Big Data Analysis 
 Sub-second query latency on billions of rows 
 High concurrency – thousands of end users 
 ANSI-SQL for both analysts and engineers 
 Full OLAP capability to offer advanced functionality 
 Support of high cardinality and high dimensions 
 Seamless Integration with BI Tools 
 Distributed and scale out architecture for large data volume
Kylin Tour 
Options Considered 
 Commercial Solutions 
 Open Source Options 
No one solution that could match our business needs
Build an engine from scratch 
6 Kylin Tour
Extreme OLAP Engine for Big Data 
Kylin is an open source Distributed Analytics Engine from eBay 
that provides SQL interface and multi-dimensional analysis 
(OLAP) on Hadoop supporting extremely large datasets 
Kylin Tour 
What’s Kylin 
kylin / ˈkiːˈlɪn / 麒麟 
--n. (in Chinese art) a mythical animal of composite form
Agenda 
 What’s Kylin 
 Architecture 
 Performance 
 Onboarding 
 Open Source 
 Q & A
Kylin is designed to accelerate analytics 
queries performance on Hadoop 
Strategy 
Kylin Tour 
Transaction 
Operation 
High Level 
Aggregation 
•Very High Level, e.g GMV by 
site by vertical by weeks 
Analysis 
Query 
•Middle level, e.g GMV by site by vertical, 
by category (level x) past 12 weeks 
Drill Down 
to Detail •Detail Level (Summary Table) 
Low Level 
Aggregation 
•First Level 
Aggragation 
Transaction 
Level 
•Transaction Data 
9 
Analytics Query Taxonomy 
80+% 
Analytics
Kylin Tour 10 
What’s OLAP Cube? 
• Cuboid = one combination of dimensions 
• Cube = all combination of dimensions (all cuboids) 
time, item 
time item location supplier 
time, item, location 
item, location 
time, location, supplier 
time, item, location, supplier 
time, location 
Time, supplier 
item, supplier 
location, supplier 
time, item, supplier 
item, location, supplier 
0-D(apex) cuboid 
1-D cuboids 
2-D cuboids 
3-D cuboids 
4-D(base) cuboid 
• Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells 
1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier> 
2. (9/15, milk, Urbana, *) - <time, item, location> 
3. (*, milk, Urbana, *) - <item, location> 
4. (*, milk, Chicago, *) - <item, location> 
5. (*, milk, *, *) - <item>
Kylin Tour 11 
From Rational to Key-Value
Data 
Cube 
OLAP 
Cube 
(HBase) 
Kylin Tour 12 
Kylin Architecture Overview 
REST Server 
Query Engine 
Cube 
SQL-Based Tool 
(BI Tools: Tableau…) 
Build Engine 
SQL 
Low Latency - 
Seconds 
Mid Latency - Minutes 
Routing 
3rd Party App 
(Web App, Mobile…) 
Metadata 
Hadoop 
Hive 
REST API JDBC/ODBC 
 Online Analysis Data Flow 
 Offline Data Flow 
 Clients/Users interactive with 
Kylin via SQL 
 OLAP Cube is transparent to 
users 
SQL 
Star Schema Data Key Value Data
Kylin Tour 13 
Why is Kylin fast? 
 Pre-built cube 
 No runtime Hive table scan and MapReduce job 
 Leveraging distributed computing infrastructure 
 Compression and encoding 
 Put “Computing” to “Data” 
 Cached
End User Cube Modeler Admin 
Row Key Column 
Val 1 
Val 2 
Val 3 
Kylin Tour 14 
Data Modeling 
Cube: … 
Fact Table: … 
Dimensions: … 
Measures: … 
Dim 
Fact Storage(HBase): … 
Dim Dim 
Source 
Star Schema 
row A 
row B 
row C 
Column Family 
Target 
HBase Storage 
Mapping 
Cube Metadata
Kylin Tour 15 
Query Engine – Calcite (Optiq) 
 Dynamic data management framework. 
 Formerly known as Optiq, Calcite is an Apache incubator project, used by 
Apache Drill and Apache Hive, among others. 
 http://optiq.incubator.apache.org
Kylin Tour 16 
Query Engine – Kylin Explain Plan 
SELECT test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, 
test_kylin_fact.lstg_format_name, test_sites.site_name, SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT 
FROM test_kylin_fact 
LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt 
LEFT JOIN test_category ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id AND test_kylin_fact.lstg_site_id = 
test_category.site_id 
LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id 
WHERE test_kylin_fact.seller_id = 123456OR test_kylin_fact.lstg_format_name = ’New' 
GROUP BY test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, 
test_kylin_fact.lstg_format_name,test_sites.site_name 
OLAPToEnumerableConverter 
OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], 
LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8]) 
OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()]) 
OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], 
LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0]) 
OLAPFilterRel(condition=[OR(=($3, 123456), =($5, ’New'))]) 
OLAPJoinRel(condition=[=($2, $25)], joinType=[left]) 
OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left]) 
OLAPJoinRel(condition=[=($4, $12)], joinType=[left]) 
OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) 
OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]]) 
OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]]) 
OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
Agenda 
 What’s Kylin 
 Architecture 
 Performance 
 Onboarding 
 Open Source 
 Q & A
Kylin Tour 18 
Kylin vs. Hive 
200 
150 
100 
50 
0 
# Query 
Type 
SQL #1 SQL #2 SQL #3 
Return Dataset Query 
On Kylin (s) 
Query 
On Hive (s) 
Comments 
1 High Level 
Aggregation 
4 0.129 157.437 1,217 times 
2 Analysis Query 22,669 1.615 109.206 68 times 
3 Drill Down to 
Detail 
325,029 12.058 113.123 9 times 
4 Drill Down to 
Detail 
524,780 22.42 6383.21 278 times 
5 Data Dump 972,002 49.054 N/A 
Hive 
Kylin 
High Level 
Aggregatio 
n 
Analysis 
Query 
Drill Down 
to Detail 
Low Level 
Aggregatio 
n 
Transactio 
n Level
Kylin Tour 19 
Performance -- Concurrency 
Linear scale out with more nodes
Kylin Tour 20 
Performance - Query Latency 
90%tile queries <5s
Agenda 
 What’s Kylin 
 Architecture 
 Performance 
 Onboarding 
 Open Source 
 Q & A
Kylin Tour 22 
How to onboard - Data 
 Landing data to Hadoop Cluster 
 Creating Hive tables with Star-Schema 
 Calculating dimensions cardinality statistics 
 Gathering query patterns
Kylin Tour 23 
How to onboard – Build Kylin Cube 
 Sync up Hive Metadata to Kylin 
 Create cube metadata via Cube Designer 
 Trigger cube build job 
 Setup Incremental job 
 Consume from front-end tool, like Tableau
Agenda 
 What’s Kylin 
 Architecture 
 Performance 
 Onboarding 
 Open Source 
 Q & A
Kylin Tour 25 
Open Source 
 Kylin Site: 
 http://kylin.io 
 Github Repo: 
 https://github.com/KylinOLAP 
 Google Group: 
 Kylin OLAP
Kylin Tour 26 
Kylin Ecosystem 
 Kylin Core 
 Fundamental framework of 
Kylin OLAP Engine 
 Extension 
 Plugins to support for 
additional functions and 
features 
 Integration 
 Lifecycle Management 
Support to integrate with 
other applications 
 Interface 
 Allows for third party users to 
build more features via user-interface 
atop Kylin core 
 Driver 
 ODBC and JDBC Drivers 
Kylin OLAP 
Core 
Extension 
 Security 
 SSO 
 Redis Storage 
Interface 
 Web Console 
 Customized BI 
 Ambari/Hue Plugin 
Integration 
 ODBC Driver 
 ETL 
 Scheduling
Kylin Tour 27 
What’s Next? 
• v1.1 (Current version under development) 
– InvertedIndex to support extra high cardinality and high dimensions 
– Remote JDBC Driver 
– Filter on Coprocessor 
• v1.2 
– Job Schedule and Priority 
– Capacity Management 
– Automation 
• v2.0 
– HOLAP (Hybrid OLAP to combine ROLAP and MOLAP) 
– In-Memory Analysis 
– More…
Kylin Tour 28 
Thanks 
 Kylin Web Site: 
 http://kylin.io 
 Google Group: 
 Kylin OLAP 
 Contact: 
 Luke Han | lukhan@ebay.com 
 Medha Samant | msamant@ebay.com

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Apache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBaseApache Kylin’s Performance Boost from Apache HBase
Apache Kylin’s Performance Boost from Apache HBase
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
Apache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on HadoopApache Kylin – Cubes on Hadoop
Apache Kylin – Cubes on Hadoop
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark Meetup
 
Apache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 Updates
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
 
Apache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big dataApache Kylin on HBase: Extreme OLAP engine for big data
Apache Kylin on HBase: Extreme OLAP engine for big data
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache Kylin
 
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015
 
Big Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache KylinBig Data MDX with Mondrian and Apache Kylin
Big Data MDX with Mondrian and Apache Kylin
 
Kylin Engineering Principles
Kylin Engineering PrinciplesKylin Engineering Principles
Kylin Engineering Principles
 
Apache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouseApache kylin 2.0: from classic olap to real-time data warehouse
Apache kylin 2.0: from classic olap to real-time data warehouse
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Datacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheConDatacubes in Apache Hive at ApacheCon
Datacubes in Apache Hive at ApacheCon
 
Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015Apache Kylin @ Big Data Europe 2015
Apache Kylin @ Big Data Europe 2015
 
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep DiveApache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
Apache Kylin: OLAP Engine on Hadoop - Tech Deep Dive
 
The Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanThe Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke Han
 
The Evolution of Apache Kylin
The Evolution of Apache KylinThe Evolution of Apache Kylin
The Evolution of Apache Kylin
 

Semelhante a Kylin OLAP Engine Tour

Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
Indus Khaitan
 
2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...
2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...
2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...
Geir Høydalsvik
 

Semelhante a Kylin OLAP Engine Tour (20)

HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentation
 
Cloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data LakeCloud-native Semantic Layer on Data Lake
Cloud-native Semantic Layer on Data Lake
 
A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache CalciteA Smarter Pig: Building a SQL interface to Pig using Apache Calcite
A Smarter Pig: Building a SQL interface to Pig using Apache Calcite
 
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
Oracle OpenWorld 2016 Review - Focus on Data, BigData, Streaming Data, Machin...
 
Oow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BIOow2016 review-db-dev-bigdata-BI
Oow2016 review-db-dev-bigdata-BI
 
Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
Creating Interactive Olap Applications With My Sql Enterprise And Mondrian Pr...
 
2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...
2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...
2018: State of the Dolphin, MySQL Keynote at Percona Live Europe 2018, Frankf...
 
Querying Druid in SQL with Superset
Querying Druid in SQL with SupersetQuerying Druid in SQL with Superset
Querying Druid in SQL with Superset
 
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache CalciteA smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
A smarter Pig: Building a SQL interface to Apache Pig using Apache Calcite
 
Accelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache KylinAccelerating Big Data Analytics with Apache Kylin
Accelerating Big Data Analytics with Apache Kylin
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 
When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?
 
Google Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better OneGoogle Cloud Dataflow Two Worlds Become a Much Better One
Google Cloud Dataflow Two Worlds Become a Much Better One
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
 
NoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSONNoSQL and MySQL: News about JSON
NoSQL and MySQL: News about JSON
 

Mais de Luke Han

Mais de Luke Han (8)

Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big Data
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics Products
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
 
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
 
Actuate presentation 2011
Actuate presentation   2011Actuate presentation   2011
Actuate presentation 2011
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Kylin OLAP Engine Tour

  • 1. Kylin Tour Kylin Tour --Extreme OLAP Engine for Big Data Luke Han | @lukehq Product Manager, eBay Oct 2014
  • 2. Agenda  What’s Kylin  Architecture  Performance  Onboarding  Open Source  Q & A
  • 3. Kylin Tour Big Data Era  More and more data becoming available on Hadoop  Limitations in existing Business Intelligence (BI) Tools  Limited support for Hadoop  Data size growing exponentially  High latency of interactive queries  Challenges to adopt Hadoop as interactive analysis system  Majority of analyst groups are SQL savvy  No mature SQL interface on Hadoop  Full OLAP capability on Hadoop ecosystem not ready yet
  • 4. Kylin Tour Business Needs for Big Data Analysis  Sub-second query latency on billions of rows  High concurrency – thousands of end users  ANSI-SQL for both analysts and engineers  Full OLAP capability to offer advanced functionality  Support of high cardinality and high dimensions  Seamless Integration with BI Tools  Distributed and scale out architecture for large data volume
  • 5. Kylin Tour Options Considered  Commercial Solutions  Open Source Options No one solution that could match our business needs
  • 6. Build an engine from scratch 6 Kylin Tour
  • 7. Extreme OLAP Engine for Big Data Kylin is an open source Distributed Analytics Engine from eBay that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets Kylin Tour What’s Kylin kylin / ˈkiːˈlɪn / 麒麟 --n. (in Chinese art) a mythical animal of composite form
  • 8. Agenda  What’s Kylin  Architecture  Performance  Onboarding  Open Source  Q & A
  • 9. Kylin is designed to accelerate analytics queries performance on Hadoop Strategy Kylin Tour Transaction Operation High Level Aggregation •Very High Level, e.g GMV by site by vertical by weeks Analysis Query •Middle level, e.g GMV by site by vertical, by category (level x) past 12 weeks Drill Down to Detail •Detail Level (Summary Table) Low Level Aggregation •First Level Aggragation Transaction Level •Transaction Data 9 Analytics Query Taxonomy 80+% Analytics
  • 10. Kylin Tour 10 What’s OLAP Cube? • Cuboid = one combination of dimensions • Cube = all combination of dimensions (all cuboids) time, item time item location supplier time, item, location item, location time, location, supplier time, item, location, supplier time, location Time, supplier item, supplier location, supplier time, item, supplier item, location, supplier 0-D(apex) cuboid 1-D cuboids 2-D cuboids 3-D cuboids 4-D(base) cuboid • Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells 1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier> 2. (9/15, milk, Urbana, *) - <time, item, location> 3. (*, milk, Urbana, *) - <item, location> 4. (*, milk, Chicago, *) - <item, location> 5. (*, milk, *, *) - <item>
  • 11. Kylin Tour 11 From Rational to Key-Value
  • 12. Data Cube OLAP Cube (HBase) Kylin Tour 12 Kylin Architecture Overview REST Server Query Engine Cube SQL-Based Tool (BI Tools: Tableau…) Build Engine SQL Low Latency - Seconds Mid Latency - Minutes Routing 3rd Party App (Web App, Mobile…) Metadata Hadoop Hive REST API JDBC/ODBC  Online Analysis Data Flow  Offline Data Flow  Clients/Users interactive with Kylin via SQL  OLAP Cube is transparent to users SQL Star Schema Data Key Value Data
  • 13. Kylin Tour 13 Why is Kylin fast?  Pre-built cube  No runtime Hive table scan and MapReduce job  Leveraging distributed computing infrastructure  Compression and encoding  Put “Computing” to “Data”  Cached
  • 14. End User Cube Modeler Admin Row Key Column Val 1 Val 2 Val 3 Kylin Tour 14 Data Modeling Cube: … Fact Table: … Dimensions: … Measures: … Dim Fact Storage(HBase): … Dim Dim Source Star Schema row A row B row C Column Family Target HBase Storage Mapping Cube Metadata
  • 15. Kylin Tour 15 Query Engine – Calcite (Optiq)  Dynamic data management framework.  Formerly known as Optiq, Calcite is an Apache incubator project, used by Apache Drill and Apache Hive, among others.  http://optiq.incubator.apache.org
  • 16. Kylin Tour 16 Query Engine – Kylin Explain Plan SELECT test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name, test_sites.site_name, SUM(test_kylin_fact.price) AS GMV, COUNT(*) AS TRANS_CNT FROM test_kylin_fact LEFT JOIN test_cal_dt ON test_kylin_fact.cal_dt = test_cal_dt.cal_dt LEFT JOIN test_category ON test_kylin_fact.leaf_categ_id = test_category.leaf_categ_id AND test_kylin_fact.lstg_site_id = test_category.site_id LEFT JOIN test_sites ON test_kylin_fact.lstg_site_id = test_sites.site_id WHERE test_kylin_fact.seller_id = 123456OR test_kylin_fact.lstg_format_name = ’New' GROUP BY test_cal_dt.week_beg_dt, test_category.category_name, test_category.lvl2_name, test_category.lvl3_name, test_kylin_fact.lstg_format_name,test_sites.site_name OLAPToEnumerableConverter OLAPProjectRel(WEEK_BEG_DT=[$0], category_name=[$1], CATEG_LVL2_NAME=[$2], CATEG_LVL3_NAME=[$3], LSTG_FORMAT_NAME=[$4], SITE_NAME=[$5], GMV=[CASE(=($7, 0), null, $6)], TRANS_CNT=[$8]) OLAPAggregateRel(group=[{0, 1, 2, 3, 4, 5}], agg#0=[$SUM0($6)], agg#1=[COUNT($6)], TRANS_CNT=[COUNT()]) OLAPProjectRel(WEEK_BEG_DT=[$13], category_name=[$21], CATEG_LVL2_NAME=[$15], CATEG_LVL3_NAME=[$14], LSTG_FORMAT_NAME=[$5], SITE_NAME=[$23], PRICE=[$0]) OLAPFilterRel(condition=[OR(=($3, 123456), =($5, ’New'))]) OLAPJoinRel(condition=[=($2, $25)], joinType=[left]) OLAPJoinRel(condition=[AND(=($6, $22), =($2, $17))], joinType=[left]) OLAPJoinRel(condition=[=($4, $12)], joinType=[left]) OLAPTableScan(table=[[DEFAULT, TEST_KYLIN_FACT]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]]) OLAPTableScan(table=[[DEFAULT, TEST_CAL_DT]], fields=[[0, 1]]) OLAPTableScan(table=[[DEFAULT, test_category]], fields=[[0, 1, 2, 3, 4, 5, 6, 7, 8]]) OLAPTableScan(table=[[DEFAULT, TEST_SITES]], fields=[[0, 1, 2]])
  • 17. Agenda  What’s Kylin  Architecture  Performance  Onboarding  Open Source  Q & A
  • 18. Kylin Tour 18 Kylin vs. Hive 200 150 100 50 0 # Query Type SQL #1 SQL #2 SQL #3 Return Dataset Query On Kylin (s) Query On Hive (s) Comments 1 High Level Aggregation 4 0.129 157.437 1,217 times 2 Analysis Query 22,669 1.615 109.206 68 times 3 Drill Down to Detail 325,029 12.058 113.123 9 times 4 Drill Down to Detail 524,780 22.42 6383.21 278 times 5 Data Dump 972,002 49.054 N/A Hive Kylin High Level Aggregatio n Analysis Query Drill Down to Detail Low Level Aggregatio n Transactio n Level
  • 19. Kylin Tour 19 Performance -- Concurrency Linear scale out with more nodes
  • 20. Kylin Tour 20 Performance - Query Latency 90%tile queries <5s
  • 21. Agenda  What’s Kylin  Architecture  Performance  Onboarding  Open Source  Q & A
  • 22. Kylin Tour 22 How to onboard - Data  Landing data to Hadoop Cluster  Creating Hive tables with Star-Schema  Calculating dimensions cardinality statistics  Gathering query patterns
  • 23. Kylin Tour 23 How to onboard – Build Kylin Cube  Sync up Hive Metadata to Kylin  Create cube metadata via Cube Designer  Trigger cube build job  Setup Incremental job  Consume from front-end tool, like Tableau
  • 24. Agenda  What’s Kylin  Architecture  Performance  Onboarding  Open Source  Q & A
  • 25. Kylin Tour 25 Open Source  Kylin Site:  http://kylin.io  Github Repo:  https://github.com/KylinOLAP  Google Group:  Kylin OLAP
  • 26. Kylin Tour 26 Kylin Ecosystem  Kylin Core  Fundamental framework of Kylin OLAP Engine  Extension  Plugins to support for additional functions and features  Integration  Lifecycle Management Support to integrate with other applications  Interface  Allows for third party users to build more features via user-interface atop Kylin core  Driver  ODBC and JDBC Drivers Kylin OLAP Core Extension  Security  SSO  Redis Storage Interface  Web Console  Customized BI  Ambari/Hue Plugin Integration  ODBC Driver  ETL  Scheduling
  • 27. Kylin Tour 27 What’s Next? • v1.1 (Current version under development) – InvertedIndex to support extra high cardinality and high dimensions – Remote JDBC Driver – Filter on Coprocessor • v1.2 – Job Schedule and Priority – Capacity Management – Automation • v2.0 – HOLAP (Hybrid OLAP to combine ROLAP and MOLAP) – In-Memory Analysis – More…
  • 28. Kylin Tour 28 Thanks  Kylin Web Site:  http://kylin.io  Google Group:  Kylin OLAP  Contact:  Luke Han | lukhan@ebay.com  Medha Samant | msamant@ebay.com