SlideShare uma empresa Scribd logo
1 de 37
Augmented OLAP
for Big Data
Luke Han | luke.han@Kyligence.io
Co-founder & CEO of Kyligence
Apache Kylin PMC Chair
Microsoft Reginal Director & MVP
Strata Global Sponsor
BOOTH #410
© Kyligence Inc. 2019.
About Luke Han
• Luke Han
• Co-founder & CEO at Kyligence
• Co-creator and PMC Chair of Apache Kylin
• Apache Software Foundation Member
• Microsoft Regional Director & MVP
• Former eBay Big Data Product Manager Lead
© Kyligence Inc. 2019.
About Apache Kylin
• Leading Open Source OLAP for Big Data
• Rank 1 from googling “big data OLAP”
• Rank 1 from googling “hadoop OLAP”
• Open sourced by eBay in 2014
• Graduated to Apache Top Project in 2015
• 1000+ Adoptions world wild
• 2015 InfoWorld Bossie Awards
• 2016 InfoWorld Bossie Awards
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Video Demo
• Benchmark
• Use Cases
© Kyligence Inc. 2019.
Kyligence = Kylin + Intelligence
• Founded in 2016 by the original creators of Apache Kylin
• CRN Top 10 Big Data Startups 2018
• Backing by leading VCs:
• Redpoint Ventures
• Cisco
• CBC Capital
• Shunwei Capital
• Eight Roads Ventures (Fidelity International Arm)
• Coatue
• Global Offices:
• Shanghai
• Beijing
• Shenzhen
• San Jose
• New York
• Seattle
• …
© Kyligence Inc. 2019.
Telecom
Finance
Manufacturing
Trusted by Global Leaders
Retail &
Others
Most of them are Global Fortune 500
© Kyligence Inc. 2019.
Global Partners
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
© Kyligence Inc. 2019.
Let’s talk about photography story…
https://technave.com/data/files/mall/article/201812271418327393.jpg
© Kyligence Inc. 2019.
Let’s talk about photography story…
How many people
really know how to
setup those?
© Kyligence Inc. 2019.
Let’s talk about photography story…
https://technave.com/data/files/mall/article/201812271437395703.jpg
© Kyligence Inc. 2019.
Let’s talk about photography story…
Google Photos
© Kyligence Inc. 2019.
Let’s talk about photography story…
How do you manage
your 100,000+ photos?
Google Photos
© Kyligence Inc. 2019, Confidential.
Then…
how about your enterprise
data?
© Kyligence Inc. 2019, Confidential.
https://www.slideshare.net/datascienceth/introduction-to-data-science-data-science-thailand-meetup-1
https://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
© Kyligence Inc. 2019, Confidential.
Fast and Changing
Analysis Demand
Slow and Heavy
Big Data Operations
vs
© Kyligence Inc. 2019.
The Typical “Throw in some People” Approach
Business Users Analysts Data Engineers
Business Analysis Data Modeling Lake → Warehouse → Mart Reporting
$$$High Cost :
Administrators
Slow Time-to-Insight:
© Kyligence Inc. 2019, Confidential.
Presentation
Visualization
Impala
Data Lake
Hive Spark SQL Drill
MapReduce Spark …….
Time-to-value Pain
Weeks of waiting breaks the
“online” promise.
Collaboration Pain
Hard to reuse asset across teams.
Each team fights their own path.
Resource Pain
Hard to scale. Where to find so
many skilled big data engineers?
Pains in the “Throw in some People” Approach
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
© Kyligence Inc. 2019, Confidential.
Throw in some Intelligence!
Let a system replace the people.
o Transparent SQL Acceleration
o On-demand Data Preparation
o Interactive Query Performance
o High Concurrency
o Centralized Semantic Layer
Faster time to market. Stay “online”.
Augmented OLAP
Data Mart
Presentation
Visualization
Impala
Data Lake
Hive Drill
MapReduce Spark …….
Spark SQL
Semantic Automation Acceleration Governance
© Kyligence Inc. 2019.
A Learning OLAP System
Business User Insights
Business User Analyst Data Engineer
vs
© Kyligence Inc. 2019.
A Learning OLAP System
Business User Insights
Pattern Detection
Auto Modeling
Data Preparing
Raw Data
Prepared Data
Augmented
OLAP Engine
(Background Learning)
© Kyligence Inc. 2019.
Demo Setup
Tableau
SparkSQL 2.4
Kyligence
Enterprise
Analyze 1 billion rows of
sales records (TPC-H)
Business User
© Kyligence Inc. 2019.
(Embed the Demo Video)
© Kyligence Inc. 2019.
Demo FAQ
Business User
Analyst
reuse
How to improve the first slow
exploration?
What if the analyst operates differently
the second time?
More comprehensive performance
benchmark?
Prepared Data
© Kyligence Inc. 2019.
TPC-H Decision Support Benchmark
TPC-H Benchmark
• Examine large volumes of data
• High complexity queries
• Answers critical business questions
• 22 decision making queries
E.g. The Shipping Priority Query
retrieves the shipping priority and potential
revenue of the orders having the largest revenue
among those that had not been shipped as of a
given date. Top 10 orders are listed in
decreasing order of revenue.
© Kyligence Inc. 2019.
Kyligence Enterprise 4 Beta vs SparkSQL 2.4
To see the trend as data grows
• 3 datasets
• Scale Factor = 20, 35, 50
• TPCH_SF1: Consists of the base row size (several million
elements).
• TPCH_SF20: Consists of the base row size x 20.
• TPCH_SF35: Consists of the base row size x 35.
• TPCH_SF50: Consists of the base row size x 50 (several
hundred million elements).
Billion
© Kyligence Inc. 2019.
Hardware Configurations
Same 4 physical nodes
- Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz * 2
- Totally 86 vCores, 188 GB mem
Same Spark configuration for both KE 4 Beta and SparkSQL 2.4
- spark.driver.memory=16g
- spark.executor.memory=8g
- spark.yarn.executor.memoryOverhead=2g
- spark.yarn.am.memory=1024m
- spark.executor.cores=5
- spark.executor.instances=17
© Kyligence Inc. 2019.
Query Response Time | KE 4 Beta vs. SparkSQL 2.4
Milliseconds
TPC-H 22 queries
For each dataset
- Run each query 3 times
- Record the average time
- No warm up
Lower is better.
SF=50
© Kyligence Inc. 2019.
Total Response Time | KE 4 Beta vs. SparkSQL 2.4
Billion Seconds
Total response time is the sum
of 22 queries’ response time.
Compare over the size of
datasets and feel the trend.
Scale out for the future.
© Kyligence Inc. 2019.
Avg. Acceleration Rate | KE 4 Beta vs. SparkSQL 2.4
Acceleration Rate
= SparkSQL time / KE time
Take average of the 22 and
compare over size of datasets.
© Kyligence Inc. 2019.
SQL
Query Log
Analytic Behavior
Data
Schema
Data
Profile
Machine Learning
Engine
Data Modeling
Automation
Kylin Cube
Learnt Index
Smart Pushdown
BI
Real-time
Analysis
Data-as-a-
Service
Local
Deployment
Cloud
Platform
Container Data Services
© Kyligence Inc. 2018, Confidential.
AI-Augmented Analytics Platform
© Kyligence Inc. 2019.
Agenda
• About Kyligence
• Pains in Big Data Analysis
• Kyligence’s solution: Augmented OLAP
• Use Cases
© Kyligence Inc. 2019.
Use Case: IBM Cognos Replacement
One Kyligence Cube for 800+ Cognos Cubes
Org. Daily
Cube
Merch. Daily
Cube
Channel Daily
Cube
Region Daily
Cube
Org. Monthly
Cube
Merch.
Monthly Cube
Channel
Monthly Cube
Region
Monthly Cube
Shanghai
Merchants
Zhejiang
Merchants
Anhui
Merchants
Guangdong
Merchants
Card Transaction
Dimensions: 167
Measures: 20
800+ Cognos Cube, 1000+ ETL jobs
Functional
Scene
Time
Scene
Geo
Scene
┄ ┄
Data: 300+ B Records
Merchants: 10+m
Cards: 10+B
© Kyligence Inc. 2019.
Use Case: Data as a Services Platform
In the past, due to the limitations of our previous
multi-dimensional analytic tool, we faced challenges
of constrained time range in queries……We are
considering leveraging multi-dimensional data
cubes to replace a number of fragmented legacy
tabular reports in more business units, so that we
can provide better analytic services to our business
users.”
-- Laments Wu Ying, VP of CMBs Development
Center,
Offline Platform
(Hadoop)
Online Platform
(Hadoop)
EDW
(Teradata)
Kyligence Data-as-a-Service Platform
Tenancy 1 Tenancy 2 Tenancy 3 Tenancy N……
Business Intelligence
Cognos Tableau MicroStrategy Superset API
Smart Routine
Intelligent
Modeling
Multi-tenancy
Applications
SecurityJob
MIP MDS CRM DWS
© Kyligence Inc. 2019, Confidential.
Take away: Augmented OLAP, the future for analytics
ImpalaHive Spark SQL Drill
MapReduce Spark …….
AI-Augmented
OLAP
ImpalaHive Drill
MapReduce Spark …….
Spark SQL
Semantic Automation Acceleration Governance
Thanks
luke.han@Kyligence.io |
@lukehq
Homepage: http://kyligence.io
Twitter: @kyligence
Booth: #410

Mais conteúdo relacionado

Mais procurados

Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...
Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...
Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...
VMware Tanzu
 
Javaedge 2010-cschalk
Javaedge 2010-cschalkJavaedge 2010-cschalk
Javaedge 2010-cschalk
Chris Schalk
 
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
DataWorks Summit
 
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Kai Wähner
 

Mais procurados (20)

Cloud-Native Microservices
Cloud-Native MicroservicesCloud-Native Microservices
Cloud-Native Microservices
 
The Manulife Journey
The Manulife JourneyThe Manulife Journey
The Manulife Journey
 
Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...
Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...
Cloud-Native Workshop NYC - Leveraging Google Cloud Services with Spring Boot...
 
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
#GeodeSummit - Modern manufacturing powered by Spring XD and Geode
 
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
#GeodeSummit - Using Geode as Operational Data Services for Real Time Mobile ...
 
Javaedge 2010-cschalk
Javaedge 2010-cschalkJavaedge 2010-cschalk
Javaedge 2010-cschalk
 
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat...
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
 
Cognitive Procurement Masterclass with IBM - SID 51774
Cognitive Procurement Masterclass with IBM - SID 51774Cognitive Procurement Masterclass with IBM - SID 51774
Cognitive Procurement Masterclass with IBM - SID 51774
 
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
Real-time Streaming Analytics: Business Value, Use Cases and Architectural Co...
 
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
Fast Data for Competitive Advantage: 4 Steps to Expand your Window of Opportu...
 
Petabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and LookerPetabytes to Personalization - Data Analytics with Qubit and Looker
Petabytes to Personalization - Data Analytics with Qubit and Looker
 
Stopping the Lake from becoming a Swamp
Stopping the Lake from becoming a SwampStopping the Lake from becoming a Swamp
Stopping the Lake from becoming a Swamp
 
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...Next-Generation BPM - How to create intelligent Business Processes thanks to ...
Next-Generation BPM - How to create intelligent Business Processes thanks to ...
 
Big Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise ArchitectureBig Data Paris - A Modern Enterprise Architecture
Big Data Paris - A Modern Enterprise Architecture
 
How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?How Workato creates robust data pipelines and automations for you?
How Workato creates robust data pipelines and automations for you?
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results
 
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and moreBig Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
Big Data & Analytics - Use Cases in Mobile, E-commerce, Media and more
 
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data LakeFrom BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake
 

Semelhante a Augmented OLAP for Big Data

Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
ConnectaDigital
 
Becoming a data driven organization
Becoming a data driven organization Becoming a data driven organization
Becoming a data driven organization
Magnus Backman
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
tsigitnist02
 

Semelhante a Augmented OLAP for Big Data (20)

Augmented OLAP for Big Data Analytics
Augmented OLAP for Big Data AnalyticsAugmented OLAP for Big Data Analytics
Augmented OLAP for Big Data Analytics
 
Take the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented AnalyticsTake the Bias out of Big Data Insights With Augmented Analytics
Take the Bias out of Big Data Insights With Augmented Analytics
 
Simplify Data Analytics Over the Cloud
Simplify Data Analytics Over the CloudSimplify Data Analytics Over the Cloud
Simplify Data Analytics Over the Cloud
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
Lightning-Fast, Interactive Business Intelligence Performance with MicroStrat...
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
Kyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An OverviewKyligence Cloud 4 - An Overview
Kyligence Cloud 4 - An Overview
 
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
The Enabling Power of Distributed SQL for Enterprise Digital Transformation I...
 
Business Data Lake Best Practices
Business Data Lake Best PracticesBusiness Data Lake Best Practices
Business Data Lake Best Practices
 
Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI Top Trends in Building Data Lakes for Machine Learning and AI
Top Trends in Building Data Lakes for Machine Learning and AI
 
Becoming a data driven organization
Becoming a data driven organization Becoming a data driven organization
Becoming a data driven organization
 
Snowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the UglySnowflake: The Good, the Bad, and the Ugly
Snowflake: The Good, the Bad, and the Ugly
 
Turning Big Data into Better Business Outcomes
Turning Big Data into Better Business OutcomesTurning Big Data into Better Business Outcomes
Turning Big Data into Better Business Outcomes
 
Snowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the UglySnowflake: The Good, the Bad and the Ugly
Snowflake: The Good, the Bad and the Ugly
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Big Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on DataBig Data LDN 2017: The New Dominant Companies Are Running on Data
Big Data LDN 2017: The New Dominant Companies Are Running on Data
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
 
Architecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High PerformanceArchitecting Snowflake for High Concurrency and High Performance
Architecting Snowflake for High Concurrency and High Performance
 
The new dominant companies are running on data
The new dominant companies are running on data The new dominant companies are running on data
The new dominant companies are running on data
 
Kyligence Cloud 4 - Feature Focus: AI-Augmented Engine
Kyligence Cloud 4 - Feature Focus: AI-Augmented EngineKyligence Cloud 4 - Feature Focus: AI-Augmented Engine
Kyligence Cloud 4 - Feature Focus: AI-Augmented Engine
 

Mais de Luke Han

Mais de Luke Han (18)

Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics Products
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
The Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanThe Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke Han
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
 
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai
 
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
1. Apache Kylin Deep Dive - Streaming and Plugin Architecture - Apache Kylin ...
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 Beijing
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015
 
Apache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big DataApache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big Data
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark Meetup
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 Beijing
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine Tour
 
Actuate presentation 2011
Actuate presentation   2011Actuate presentation   2011
Actuate presentation 2011
 

Último

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 

Último (20)

The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 

Augmented OLAP for Big Data

  • 1. Augmented OLAP for Big Data Luke Han | luke.han@Kyligence.io Co-founder & CEO of Kyligence Apache Kylin PMC Chair Microsoft Reginal Director & MVP Strata Global Sponsor BOOTH #410
  • 2. © Kyligence Inc. 2019. About Luke Han • Luke Han • Co-founder & CEO at Kyligence • Co-creator and PMC Chair of Apache Kylin • Apache Software Foundation Member • Microsoft Regional Director & MVP • Former eBay Big Data Product Manager Lead
  • 3. © Kyligence Inc. 2019. About Apache Kylin • Leading Open Source OLAP for Big Data • Rank 1 from googling “big data OLAP” • Rank 1 from googling “hadoop OLAP” • Open sourced by eBay in 2014 • Graduated to Apache Top Project in 2015 • 1000+ Adoptions world wild • 2015 InfoWorld Bossie Awards • 2016 InfoWorld Bossie Awards
  • 4. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Video Demo • Benchmark • Use Cases
  • 5. © Kyligence Inc. 2019. Kyligence = Kylin + Intelligence • Founded in 2016 by the original creators of Apache Kylin • CRN Top 10 Big Data Startups 2018 • Backing by leading VCs: • Redpoint Ventures • Cisco • CBC Capital • Shunwei Capital • Eight Roads Ventures (Fidelity International Arm) • Coatue • Global Offices: • Shanghai • Beijing • Shenzhen • San Jose • New York • Seattle • …
  • 6. © Kyligence Inc. 2019. Telecom Finance Manufacturing Trusted by Global Leaders Retail & Others Most of them are Global Fortune 500
  • 7. © Kyligence Inc. 2019. Global Partners
  • 8. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  • 9. © Kyligence Inc. 2019. Let’s talk about photography story… https://technave.com/data/files/mall/article/201812271418327393.jpg
  • 10. © Kyligence Inc. 2019. Let’s talk about photography story… How many people really know how to setup those?
  • 11. © Kyligence Inc. 2019. Let’s talk about photography story… https://technave.com/data/files/mall/article/201812271437395703.jpg
  • 12. © Kyligence Inc. 2019. Let’s talk about photography story… Google Photos
  • 13. © Kyligence Inc. 2019. Let’s talk about photography story… How do you manage your 100,000+ photos? Google Photos
  • 14. © Kyligence Inc. 2019, Confidential. Then… how about your enterprise data?
  • 15. © Kyligence Inc. 2019, Confidential. https://www.slideshare.net/datascienceth/introduction-to-data-science-data-science-thailand-meetup-1 https://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
  • 16. © Kyligence Inc. 2019, Confidential. Fast and Changing Analysis Demand Slow and Heavy Big Data Operations vs
  • 17. © Kyligence Inc. 2019. The Typical “Throw in some People” Approach Business Users Analysts Data Engineers Business Analysis Data Modeling Lake → Warehouse → Mart Reporting $$$High Cost : Administrators Slow Time-to-Insight:
  • 18. © Kyligence Inc. 2019, Confidential. Presentation Visualization Impala Data Lake Hive Spark SQL Drill MapReduce Spark ……. Time-to-value Pain Weeks of waiting breaks the “online” promise. Collaboration Pain Hard to reuse asset across teams. Each team fights their own path. Resource Pain Hard to scale. Where to find so many skilled big data engineers? Pains in the “Throw in some People” Approach
  • 19. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  • 20. © Kyligence Inc. 2019, Confidential. Throw in some Intelligence! Let a system replace the people. o Transparent SQL Acceleration o On-demand Data Preparation o Interactive Query Performance o High Concurrency o Centralized Semantic Layer Faster time to market. Stay “online”. Augmented OLAP Data Mart Presentation Visualization Impala Data Lake Hive Drill MapReduce Spark ……. Spark SQL Semantic Automation Acceleration Governance
  • 21. © Kyligence Inc. 2019. A Learning OLAP System Business User Insights Business User Analyst Data Engineer vs
  • 22. © Kyligence Inc. 2019. A Learning OLAP System Business User Insights Pattern Detection Auto Modeling Data Preparing Raw Data Prepared Data Augmented OLAP Engine (Background Learning)
  • 23. © Kyligence Inc. 2019. Demo Setup Tableau SparkSQL 2.4 Kyligence Enterprise Analyze 1 billion rows of sales records (TPC-H) Business User
  • 24. © Kyligence Inc. 2019. (Embed the Demo Video)
  • 25. © Kyligence Inc. 2019. Demo FAQ Business User Analyst reuse How to improve the first slow exploration? What if the analyst operates differently the second time? More comprehensive performance benchmark? Prepared Data
  • 26. © Kyligence Inc. 2019. TPC-H Decision Support Benchmark TPC-H Benchmark • Examine large volumes of data • High complexity queries • Answers critical business questions • 22 decision making queries E.g. The Shipping Priority Query retrieves the shipping priority and potential revenue of the orders having the largest revenue among those that had not been shipped as of a given date. Top 10 orders are listed in decreasing order of revenue.
  • 27. © Kyligence Inc. 2019. Kyligence Enterprise 4 Beta vs SparkSQL 2.4 To see the trend as data grows • 3 datasets • Scale Factor = 20, 35, 50 • TPCH_SF1: Consists of the base row size (several million elements). • TPCH_SF20: Consists of the base row size x 20. • TPCH_SF35: Consists of the base row size x 35. • TPCH_SF50: Consists of the base row size x 50 (several hundred million elements). Billion
  • 28. © Kyligence Inc. 2019. Hardware Configurations Same 4 physical nodes - Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz * 2 - Totally 86 vCores, 188 GB mem Same Spark configuration for both KE 4 Beta and SparkSQL 2.4 - spark.driver.memory=16g - spark.executor.memory=8g - spark.yarn.executor.memoryOverhead=2g - spark.yarn.am.memory=1024m - spark.executor.cores=5 - spark.executor.instances=17
  • 29. © Kyligence Inc. 2019. Query Response Time | KE 4 Beta vs. SparkSQL 2.4 Milliseconds TPC-H 22 queries For each dataset - Run each query 3 times - Record the average time - No warm up Lower is better. SF=50
  • 30. © Kyligence Inc. 2019. Total Response Time | KE 4 Beta vs. SparkSQL 2.4 Billion Seconds Total response time is the sum of 22 queries’ response time. Compare over the size of datasets and feel the trend. Scale out for the future.
  • 31. © Kyligence Inc. 2019. Avg. Acceleration Rate | KE 4 Beta vs. SparkSQL 2.4 Acceleration Rate = SparkSQL time / KE time Take average of the 22 and compare over size of datasets.
  • 32. © Kyligence Inc. 2019. SQL Query Log Analytic Behavior Data Schema Data Profile Machine Learning Engine Data Modeling Automation Kylin Cube Learnt Index Smart Pushdown BI Real-time Analysis Data-as-a- Service Local Deployment Cloud Platform Container Data Services © Kyligence Inc. 2018, Confidential. AI-Augmented Analytics Platform
  • 33. © Kyligence Inc. 2019. Agenda • About Kyligence • Pains in Big Data Analysis • Kyligence’s solution: Augmented OLAP • Use Cases
  • 34. © Kyligence Inc. 2019. Use Case: IBM Cognos Replacement One Kyligence Cube for 800+ Cognos Cubes Org. Daily Cube Merch. Daily Cube Channel Daily Cube Region Daily Cube Org. Monthly Cube Merch. Monthly Cube Channel Monthly Cube Region Monthly Cube Shanghai Merchants Zhejiang Merchants Anhui Merchants Guangdong Merchants Card Transaction Dimensions: 167 Measures: 20 800+ Cognos Cube, 1000+ ETL jobs Functional Scene Time Scene Geo Scene ┄ ┄ Data: 300+ B Records Merchants: 10+m Cards: 10+B
  • 35. © Kyligence Inc. 2019. Use Case: Data as a Services Platform In the past, due to the limitations of our previous multi-dimensional analytic tool, we faced challenges of constrained time range in queries……We are considering leveraging multi-dimensional data cubes to replace a number of fragmented legacy tabular reports in more business units, so that we can provide better analytic services to our business users.” -- Laments Wu Ying, VP of CMBs Development Center, Offline Platform (Hadoop) Online Platform (Hadoop) EDW (Teradata) Kyligence Data-as-a-Service Platform Tenancy 1 Tenancy 2 Tenancy 3 Tenancy N…… Business Intelligence Cognos Tableau MicroStrategy Superset API Smart Routine Intelligent Modeling Multi-tenancy Applications SecurityJob MIP MDS CRM DWS
  • 36. © Kyligence Inc. 2019, Confidential. Take away: Augmented OLAP, the future for analytics ImpalaHive Spark SQL Drill MapReduce Spark ……. AI-Augmented OLAP ImpalaHive Drill MapReduce Spark ……. Spark SQL Semantic Automation Acceleration Governance