SlideShare uma empresa Scribd logo
1 de 63
Building a Data Product using
apache Zeppelin (incubating)
NFLabs for ApacheCon ’15 EU
Alexander Bezzubov
Data engineer @NFLabs
based in Seoul, South Korea
bzz@apache.org
So you want to build a data
product…
What do you need?
To build a data product we need:
Idea
Data
Software (to process the data)
Hardware (to run the software)
…and a brain
Idea
Data
Software (to process the data)
Hardware (to run the software)
…and a brain
To build a data product we need:
Software
Software
Software
… …
Software
… …
Zeppelin
Zeppelin
Opensouce analytical environment,
with pluggable backend data-processing systems,
with notebook-style GUI for visualisations
ASF Incubation12.2014
08.2013 NFLabs Internal project Hive/Shark
http://zeppelin.incubator.apache.org
12.2012 Commercial App using AMP Lab Shark 0.5
10.2013 Prototype Hive/Shark
Zeppelin History
Zeppelin History
Zeppelin History
Zeppelin History
Zeppelin History
Zeppelin Current Status
1 Release
63 Contributors worldwide
689 Stars on GH
~300/900 Emails at users/dev
@i.a.o
http://zeppelin.incubator.apache.org
Zeppelin Architecture
Hardware
Hardware
.
.
.
.
.
.
Idea
An Idea
Would not it be cool if …
An Idea
Would not it be cool if …Would not it be cool if …
you could have your own Google
Analytics?
An Idea
Would not it be cool if …Would not it be cool if …
you could have your own Google
Analytics?
sorry, we already saw it in eCG
talk..
ok, let’s pick something else
An Idea
you could be the first to know when
there is a new interesting*
opensouce project
Would not it be cool if …
Data
Data: Github archive
https://www.githubarchive.org• Github logs, hosted in the
cloud
• Collaboration between Github
and Google engineers
• 20+ events, 250+Gb since
2012
• Proprietary software
• available on BigQuery
But what if you could:
analyse this data independently, without asking
permission or paying anybody?
But what if you could:
analyse this data independent, without asking
permission or paying anybody?
well, with ASF you can!
Let’s build a data product for ourselves!
Building a product
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join logs w/ more data from Github API calls
• Shows simple HTML template, to visualise the
list
• Sends email notifications
We are going to build a Notebook that:
basically, sends you digest emails:
Start Zeppelin
./bin/zeppelin-daemon.sh start
& create a new notebook
http://zeppelin.incubator.apache.org/download.html
Zeppelin Architecture
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join logs w/ more data from Github API calls
• Shows simple HTML template, to visualise the
list
• Sends email notifications
Load Dependency
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join logs w/ more data from Github API calls
• Shows simple HTML template, to visualise the
list
• Sends email notifications
Download Data
In serial, sample, using shell interpreter
Download Data
In serial, whole day, using shell interpreter
Don’t need this as we have data
prepared
Download Data
In parallel, using Spark interpreter
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Read Data
Explore Data #1
Pie Chart of Event types
Explore Data #2
Top 10 organisations by event type
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Filter: cleanup
Only organisations open sourcing
repositories
Filter: interesting companies
Only organisations open sourcing
repositories
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
• Shows simple HTML template to visualize the
list
• Sends email notifications
Call Github API
Getting more information about
repository
GitHub personal access token to rise rate-limit
github.com/<username> => Edit Profile => Personal access tokens => Generate new
token
Join orgs and repos
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template
• Sends email notifications
HTML Preview
To generate a template
HTML Preview
To output the results
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Send Email
Send Email
Send Email
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Schedule
Kylin: interpreter setup
Kylin: consume data
Alexander Bezzubov
bzz@apache.org
http://s.apache.org/zeppelin-workshop

Mais conteúdo relacionado

Mais procurados

Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesDataWorks Summit
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin IntroductionLuke Han
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!Cloudera, Inc.
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsXiao Li
 
The Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanThe Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanLuke Han
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanLuke Han
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at TwitterPrasad Wagle
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingLuke Han
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...DataWorks Summit/Hadoop Summit
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupLuke Han
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingLuke Han
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @ShanghaiLuke Han
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development KitJen Aman
 
Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Prasad Wagle
 
Spark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache SparkSachin Aggarwal
 
Apache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big DataApache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big DataLuke Han
 

Mais procurados (20)

Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
The Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanThe Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke Han
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub Hava
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 Beijing
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark Meetup
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 Beijing
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)
 
Spark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean Wampler
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache Spark
 
Apache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big DataApache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big Data
 

Semelhante a 4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai

Make an Instant Website with Webhooks
Make an Instant Website with WebhooksMake an Instant Website with Webhooks
Make an Instant Website with WebhooksAnne Gentle
 
IBM Connections REST API Klompendans
IBM Connections REST API KlompendansIBM Connections REST API Klompendans
IBM Connections REST API KlompendansHenning Schmidt
 
CICD Pipeline Using Github Actions
CICD Pipeline Using Github ActionsCICD Pipeline Using Github Actions
CICD Pipeline Using Github ActionsKumar Shìvam
 
Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018Den Delimarsky
 
Dd13.2013.milano.open ntf
Dd13.2013.milano.open ntfDd13.2013.milano.open ntf
Dd13.2013.milano.open ntfUlrich Krause
 
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsAmazon Web Services
 
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...Neo4j
 
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13Dominopoint - Italian Lotus User Group
 
Learn Electron for Web Developers
Learn Electron for Web DevelopersLearn Electron for Web Developers
Learn Electron for Web DevelopersKyle Cearley
 
Machine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaMachine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaLINE Corporation
 
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...Cisco DevNet
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web APISammy Fung
 
Getting started with sbt
Getting started with sbtGetting started with sbt
Getting started with sbtikenna4u
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the BasicsUlrich Krause
 
Guidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in EvergreenGuidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in Evergreenloriayre
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Mandi Walls
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamMikkel Flindt Heisterberg
 
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...Amazon Web Services
 
How to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework TutorialHow to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework TutorialAssistSoftware
 

Semelhante a 4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai (20)

Make an Instant Website with Webhooks
Make an Instant Website with WebhooksMake an Instant Website with Webhooks
Make an Instant Website with Webhooks
 
IBM Connections REST API Klompendans
IBM Connections REST API KlompendansIBM Connections REST API Klompendans
IBM Connections REST API Klompendans
 
CICD Pipeline Using Github Actions
CICD Pipeline Using Github ActionsCICD Pipeline Using Github Actions
CICD Pipeline Using Github Actions
 
Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018
 
Dd13.2013.milano.open ntf
Dd13.2013.milano.open ntfDd13.2013.milano.open ntf
Dd13.2013.milano.open ntf
 
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
 
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
 
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
 
Learn Electron for Web Developers
Learn Electron for Web DevelopersLearn Electron for Web Developers
Learn Electron for Web Developers
 
Machine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaMachine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE Fukuoka
 
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web API
 
Getting started with sbt
Getting started with sbtGetting started with sbt
Getting started with sbt
 
INLS461_day14a.ppt
INLS461_day14a.pptINLS461_day14a.ppt
INLS461_day14a.ppt
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the Basics
 
Guidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in EvergreenGuidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in Evergreen
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity Stream
 
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
 
How to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework TutorialHow to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework Tutorial
 

Mais de Luke Han

Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big DataLuke Han
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainLuke Han
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsLuke Han
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSILuke Han
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanLuke Han
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @ShanghaiLuke Han
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @ShanghaiLuke Han
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015Luke Han
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine TourLuke Han
 
Actuate presentation 2011
Actuate presentation   2011Actuate presentation   2011
Actuate presentation 2011Luke Han
 

Mais de Luke Han (10)

Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big Data
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics Products
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine Tour
 
Actuate presentation 2011
Actuate presentation   2011Actuate presentation   2011
Actuate presentation 2011
 

Último

WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile EnvironmentVictorSzoltysek
 

Último (20)

WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 

4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai

Notas do Editor

  1. short intro designed as a workshop - go home with some practical skills
  2. Go though some history of Zeppelin project and then I’ll guide through a hands-on example of one data product
  3. any “product”, based on data
  4. Easy, because of Apache &BigData
  5. ASF is a leader in BigData ecosystem one of the biggest foundations, 200+ projects gives you the power to crunch the data that only big companies have interesting engineering challenges, job
  6. a lot computing options some of them do have a web ui to interact with, but are project-secific
  7. with Z - you have visualisation options, agnostic to the backend allows you to be flexible with your stack Spark is the most advanced interpreter now - historical reasons
  8. * what is Zeppelin? and some history
  9. started at NFLabs in Seoul, South Korea open-source (free, as a beer) since 2013 when mature enough, to provide a value to community -> ASF
  10. comercial app
  11. prototype OSS free as a beer
  12. visualisations 2 different backend Hive and Shark
  13. with pluggable backend systems, thought ‘interpreters’
  14. so, what exactly is Zeppelin that’s how it look like in 10 months under ASF
  15. interpreters abstracted was very important
  16. if you want to build a data product - you better be prepared for scale quality of the product depends on the amount of data
  17. * plan for both: a cluster (cloud) and a laptop (dev prod) for free * …and you have those, thanks to ASF!
  18. good example of data product using web logs!
  19. talked about Apache a lot (Kylin, Tez..) we all love Opensource Github is the biggest opensource code repository problem: you cannot follow organisation on Github!
  20. oss is awesome, so many things happen - hard to keep up problem: you cannot follow organisation on Github! say eBay, or any other tech company you are interested in OSS something now have software\hardware and an Idea for data product. Something missing
  21. Github opensources their log data may companies - log-generating machines, but GH pushes it further * available to crunch, using tones of closed-sourced engineering effort
  22. Would not that be cool?
  23. Hands-on part
  24. break-down: 8 steps
  25. build a missing feature definition of “interesting” may vary i.e not just “new from big companies”, but intelligent suggestions based on
  26. going to be using different interpreters: shell, spark, kylin
  27. break-down: 8 steps
  28. break-down: 8 steps
  29. sequential download, does not utilise full channel bandwidth
  30. * simplicity! * in 24 parallel threads