SlideShare uma empresa Scribd logo
1 de 42
OASIS – DATA ANALYSIS
PLATFORM FOR ENTERPRISE
KEIJI YOSHIDA – DATA ENGINEER, LINE CORPORATION
• We have created a web-based data analysis platform named ”OASIS”
• Employees can analyze data of a Hadoop cluster by writing Spark applications
• 100+ employees use it every day
INTRODUCTION
Agenda 1. Motivation
2. Features & System Architecture
3. Use Cases
Agenda 1. Motivation
2. Features & System Architecture
3. Use Cases
• Based in Tokyo, Japan
• Provides a messaging application named “LINE”
• 164M monthly active users across Japan, Thailand, Taiwan, and Indonesia
• Also provides various related services such as games, sticker market, etc.
LINE CORPORATION
DATA PLATFORM AT LINE
Services
Game
Sticker Market
News
Music
Video Streaming
・
・
・
Processing
Data
Hadoop Cluster
Game
LINE Ads Platform
Sticker
Market
News Music
Video
Streaming
LINE Ads
Platform
・
・
・
Analysis
BI / Reporting
• Enable all employees analyze their services’ data as they like
• Speed up their data analysis process and decision making
PUBLICATION OF HADOOP CLUSTER
1. Security
• Each employee can access only the data related to their service
2. Stability
• Queries must not affect the performance of other queries
3. Features
• Each employee can extract data from the Hadoop cluster as they like
• Results can be visualized and shared within a team or a department
REQUIREMENTS
1. Security
• Kerberize the Hadoop cluster and install Apache Ranger
2. Stability
• Use Apache Spark as a query and application execution engine
3. Features
• Try Apache Zeppelin for the Web UI
SOLUTIONS
• Framework to manage access control over a Hadoop cluster
• Used to control each employee’s data access
APACHE RANGER
1. Security
• Kerberize the Hadoop cluster and install Apache Ranger
2. Stability
• Use Apache Spark as a query and application execution engine
3. Features
• Try Apache Zeppelin for the Web UI
SOLUTIONS
• Web-based data analysis tool
• Supports Apache Spark and user impersonation
• Results can be shared within multiple users as a form of a notebook
APACHE ZEPPELIN
SYSTEM ARCHITECTURE
Apache Zeppelin
0.7.3
Hadoop YARN
Cluster
LDAP
Spark application submission
+
User impersonation
End Users
HDFS + Apache
Ranger
User authentication Data access
1. Security
• An arbitrary user can be set to scheduled notebook’s execution user
• Users can access the data which they don’t have access rights to
ISSUES OF APACHE ZEPPELIN 0.7.3
2. Stability
• Runs only on a single server
• Does not support the “yarn-cluster” mode of Apache Spark
• Freezes when a Spark driver program consumes many server resources
ISSUES OF APACHE ZEPPELIN 0.7.3
3. Features
• Users have to set access control to each notebook
• Users cannot execute a notebook while changing only its parameters
without saving it
ISSUES OF APACHE ZEPPELIN 0.7.3
OASIS
COMPARISON
For a single team
• Does not go with Apache Ranger
• Runs only on a single server
• Access control per notebook
Apache Zeppelin
For an enterprise
• Works well with Apache Ranger
• Runs on multiple servers
• Access control per team
OASIS
Agenda 1. Motivation
2. Features & System Architecture
3. Use Cases
• User impersonation
• Data Visualization
• Notebooks Sharing
• Scalable
OVERVIEW
TOP PAGE
• A root directory of notebooks for a team, a department, or a service
• Access right for each user is separately set in each space
• 2 types of access rights: ”read write” and “read only”
SPACE
Space 1
User A (read write) User B (read only)
Space 2
User C (read write) User D (read only)
NOTEBOOK CREATION
• A single Spark application launches per notebook session
• Notebook author’s account is used to access files
• Spark, Spark SQL, PySpark, and SparkR are available
• Each language of a single notebook session shares a single Spark application
SPARK APPLICATION
SPARK APPLICATION
UTILIZATION OF IN-MEMORY CACHING
• Notebooks can be executed automatically on a prescribed schedule
• Contents of notebooks can be kept updated by this feature
• This feature is also used for creating a light weight ETL processing
SCHEDULING
• Parameters can be injected into a notebook during its execution
• “read only” users can execute a notebook while changing its parameters
PARAMETERS
TABLE DEFINITIONS
FILE UPLOAD
SYSTEM ARCHITECTURE
End Users
Frontend /
API
OASIS
Job
Scheduler
MySQL Redis
Spark
Interpreter
Hadoop
YARN
Cluster
HDFS /
Apache
Ranger
• 500 DataNodes / NodeManagers
• HDFS usage: 20PB
• 150+ Hive databases
• 1,500+ Hive tables
HADOOP CLUSTER
Agenda 1. Motivation
2. Features & System Architecture
3. Use Cases
• 1,500+ users
• 100+ daily active users
• 300+ monthly active users
• 30+ spaces (i.e. departments, teams, or services)
• 1,100+ notebooks
• 200+ scheduled notebooks
STATS
1. Report
2. Interactive dashboard
3. ETL
4. Monitoring
5. Ad hoc analysis
USE CASES
1. REPORT
2. INTERACTIVE DASHBOARD
3. ETL
4. MONITORING
5. AD HOC ANALYSIS
• We created OASIS to solve the issues of Apache Zeppelin
• Extracted data can be visualized and shared within a team
• OASIS utilizes the user impersonation feature of Apache Spark
• At LINE, OASIS is used for reporting, data monitoring, ad hoc analysis, etc.
RECAP
THANK YOU

Mais conteúdo relacionado

Mais procurados

More Best Practices With Share Point Solutions
More Best Practices With Share Point SolutionsMore Best Practices With Share Point Solutions
More Best Practices With Share Point SolutionsAlexander Meijers
 
Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!Adi Polak
 
"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ...
"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ..."Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ...
"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ...Provectus
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16
Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16
Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16Alfredo Abate
 
Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...
Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...
Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...Kasun Kodagoda
 
Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...
Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...
Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...Carolyn Duby
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Lucidworks
 
Scala & Spark Online Training
Scala & Spark Online TrainingScala & Spark Online Training
Scala & Spark Online TrainingLearntek1
 
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersKoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersTobias Koprowski
 
Meson: Building a Machine Learning Orchestration Framework on Mesos
Meson: Building a Machine Learning Orchestration Framework on MesosMeson: Building a Machine Learning Orchestration Framework on Mesos
Meson: Building a Machine Learning Orchestration Framework on MesosAntony Arokiasamy
 
SharePoint NYC search presentation
SharePoint NYC search presentationSharePoint NYC search presentation
SharePoint NYC search presentationjtbarrera
 
Spark Hsinchu meetup
Spark Hsinchu meetupSpark Hsinchu meetup
Spark Hsinchu meetupYung-An He
 
Individual Serverless Development Environments for AWS
Individual Serverless Development Environments for AWSIndividual Serverless Development Environments for AWS
Individual Serverless Development Environments for AWSSøren Peter Nielsen
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesRahul Singh
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gatewayLuciano Resende
 

Mais procurados (20)

More Best Practices With Share Point Solutions
More Best Practices With Share Point SolutionsMore Best Practices With Share Point Solutions
More Best Practices With Share Point Solutions
 
Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!Spark UDFs are EviL, Catalyst to the rEsCue!
Spark UDFs are EviL, Catalyst to the rEsCue!
 
"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ...
"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ..."Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ...
"Introduction to Sparkling Water" — Jakub Hava, Senior Software Engineer, at ...
 
SharePoint 2013 - What's New
SharePoint 2013 - What's NewSharePoint 2013 - What's New
SharePoint 2013 - What's New
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16
Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16
Extending_EBS_12_1_3_with_APEX_5_0_COLLABORATE16
 
Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...
Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...
Feature Toggle for .Net Core Apps on Azure with Azure App Configuration Featu...
 
Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...
Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...
Boston Future of Data Meetup: May 2017: Spark Introduction with Credit Card F...
 
SQL TUNING 101
SQL TUNING 101SQL TUNING 101
SQL TUNING 101
 
PerfTest in SOA
PerfTest in SOAPerfTest in SOA
PerfTest in SOA
 
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
Rackspace: Email's Solution for Indexing 50K Documents per Second: Presented ...
 
Elixir in the Wild
Elixir in the WildElixir in the Wild
Elixir in the Wild
 
Scala & Spark Online Training
Scala & Spark Online TrainingScala & Spark Online Training
Scala & Spark Online Training
 
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginnersKoprowskiT_SQLSat409_MaintenancePlansForBeginners
KoprowskiT_SQLSat409_MaintenancePlansForBeginners
 
Meson: Building a Machine Learning Orchestration Framework on Mesos
Meson: Building a Machine Learning Orchestration Framework on MesosMeson: Building a Machine Learning Orchestration Framework on Mesos
Meson: Building a Machine Learning Orchestration Framework on Mesos
 
SharePoint NYC search presentation
SharePoint NYC search presentationSharePoint NYC search presentation
SharePoint NYC search presentation
 
Spark Hsinchu meetup
Spark Hsinchu meetupSpark Hsinchu meetup
Spark Hsinchu meetup
 
Individual Serverless Development Environments for AWS
Individual Serverless Development Environments for AWSIndividual Serverless Development Environments for AWS
Individual Serverless Development Environments for AWS
 
Building Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source TechnologiesBuilding Enterprise Search Engines using Open Source Technologies
Building Enterprise Search Engines using Open Source Technologies
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
 

Semelhante a OASIS Data Analysis Platform for Enterprise

OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
OASIS - Data Analysis Platform for Multi-tenant Hadoop ClusterOASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
OASIS - Data Analysis Platform for Multi-tenant Hadoop ClusterLINE Corporation
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...Tim Vaillancourt
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlySarah Guido
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Cask Data
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Sri Ambati
 
Getting Started with Oracle APEX
Getting Started with Oracle APEXGetting Started with Oracle APEX
Getting Started with Oracle APEXDataNext Solutions
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...Big Data Spain
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLArnab Biswas
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overviewKaran Alang
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learnJohn D Almon
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016clairvoyantllc
 
Profiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty DetailsProfiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty DetailsAchievers Tech
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiDatabricks
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 

Semelhante a OASIS Data Analysis Platform for Enterprise (20)

OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
OASIS - Data Analysis Platform for Multi-tenant Hadoop ClusterOASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
OASIS - Data Analysis Platform for Multi-tenant Hadoop Cluster
 
DataOps with Project Amaterasu
DataOps with Project AmaterasuDataOps with Project Amaterasu
DataOps with Project Amaterasu
 
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
One Tool to Rule Them All- Seamless SQL on MongoDB, MySQL and Redis with Apac...
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
Data Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at BitlyData Science at Scale: Using Apache Spark for Data Science at Bitly
Data Science at Scale: Using Apache Spark for Data Science at Bitly
 
Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
Scaling out Driverless AI with IBM Spectrum Conductor - Kevin Doyle - H2O AI ...
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Oracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_databaseOracle OpenWo2014 review part 03 three_paa_s_database
Oracle OpenWo2014 review part 03 three_paa_s_database
 
Getting Started with Oracle APEX
Getting Started with Oracle APEXGetting Started with Oracle APEX
Getting Started with Oracle APEX
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Apereo OAE - Bootcamp
Apereo OAE - BootcampApereo OAE - Bootcamp
Apereo OAE - Bootcamp
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
Apache Spark - A High Level overview
Apache Spark - A High Level overviewApache Spark - A High Level overview
Apache Spark - A High Level overview
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016
 
Profiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty DetailsProfiling and Tuning a Web Application - The Dirty Details
Profiling and Tuning a Web Application - The Dirty Details
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
App_Engine_PPT.ppt
App_Engine_PPT.pptApp_Engine_PPT.ppt
App_Engine_PPT.ppt
 

Mais de LINE Corporation

JJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTJJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTLINE Corporation
 
Reduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesReduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesLINE Corporation
 
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたKotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたLINE Corporation
 
Use Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionUse Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionLINE Corporation
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingLINE Corporation
 
UI Automation Test with JUnit5
UI Automation Test with JUnit5UI Automation Test with JUnit5
UI Automation Test with JUnit5LINE Corporation
 
Feature Detection for UI Testing
Feature Detection for UI TestingFeature Detection for UI Testing
Feature Detection for UI TestingLINE Corporation
 
LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE Corporation
 
​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享LINE Corporation
 
LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE Corporation
 
日本開發者大會短講分享
日本開發者大會短講分享日本開發者大會短講分享
日本開發者大會短講分享LINE Corporation
 
LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Corporation
 
在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed KubernetesLINE Corporation
 
LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE Corporation
 
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE Corporation
 
LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Corporation
 
LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Corporation
 
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Corporation
 
Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發LINE Corporation
 

Mais de LINE Corporation (20)

JJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LTJJUG CCC 2018 Fall 懇親会LT
JJUG CCC 2018 Fall 懇親会LT
 
Reduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin CoroutinesReduce dependency on Rx with Kotlin Coroutines
Reduce dependency on Rx with Kotlin Coroutines
 
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみたKotlin/NativeでAndroidのNativeメソッドを実装してみた
Kotlin/NativeでAndroidのNativeメソッドを実装してみた
 
Use Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extensionUse Kotlin scripts and Clova SDK to build your Clova extension
Use Kotlin scripts and Clova SDK to build your Clova extension
 
The Magic of LINE 購物 Testing
The Magic of LINE 購物 TestingThe Magic of LINE 購物 Testing
The Magic of LINE 購物 Testing
 
GA Test Automation
GA Test AutomationGA Test Automation
GA Test Automation
 
UI Automation Test with JUnit5
UI Automation Test with JUnit5UI Automation Test with JUnit5
UI Automation Test with JUnit5
 
Feature Detection for UI Testing
Feature Detection for UI TestingFeature Detection for UI Testing
Feature Detection for UI Testing
 
LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享LINE 新星計劃介紹與新創團隊分享
LINE 新星計劃介紹與新創團隊分享
 
​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享​LINE 技術合作夥伴與應用分享
​LINE 技術合作夥伴與應用分享
 
LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣LINE 開發者社群經營與技術推廣
LINE 開發者社群經營與技術推廣
 
日本開發者大會短講分享
日本開發者大會短講分享日本開發者大會短講分享
日本開發者大會短講分享
 
LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享LINE Chatbot - 活動報名報到設計分享
LINE Chatbot - 活動報名報到設計分享
 
在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes在 LINE 私有雲中使用 Managed Kubernetes
在 LINE 私有雲中使用 Managed Kubernetes
 
LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧LINE TODAY高效率的敏捷測試開發技巧
LINE TODAY高效率的敏捷測試開發技巧
 
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
LINE 區塊鏈平台及代幣經濟 - LINK Chain及LINK介紹
 
LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享LINE Things - LINE IoT平台新技術分享
LINE Things - LINE IoT平台新技術分享
 
LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗LINE Pay - 一卡通支付新體驗
LINE Pay - 一卡通支付新體驗
 
LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務LINE Platform API Update - 打造一個更好的Chatbot服務
LINE Platform API Update - 打造一個更好的Chatbot服務
 
Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發Keynote - ​LINE 的技術策略佈局與跨國產品開發
Keynote - ​LINE 的技術策略佈局與跨國產品開發
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 

Último (20)

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

OASIS Data Analysis Platform for Enterprise

  • 1. OASIS – DATA ANALYSIS PLATFORM FOR ENTERPRISE KEIJI YOSHIDA – DATA ENGINEER, LINE CORPORATION
  • 2. • We have created a web-based data analysis platform named ”OASIS” • Employees can analyze data of a Hadoop cluster by writing Spark applications • 100+ employees use it every day INTRODUCTION
  • 3. Agenda 1. Motivation 2. Features & System Architecture 3. Use Cases
  • 4. Agenda 1. Motivation 2. Features & System Architecture 3. Use Cases
  • 5. • Based in Tokyo, Japan • Provides a messaging application named “LINE” • 164M monthly active users across Japan, Thailand, Taiwan, and Indonesia • Also provides various related services such as games, sticker market, etc. LINE CORPORATION
  • 6. DATA PLATFORM AT LINE Services Game Sticker Market News Music Video Streaming ・ ・ ・ Processing Data Hadoop Cluster Game LINE Ads Platform Sticker Market News Music Video Streaming LINE Ads Platform ・ ・ ・ Analysis BI / Reporting
  • 7. • Enable all employees analyze their services’ data as they like • Speed up their data analysis process and decision making PUBLICATION OF HADOOP CLUSTER
  • 8. 1. Security • Each employee can access only the data related to their service 2. Stability • Queries must not affect the performance of other queries 3. Features • Each employee can extract data from the Hadoop cluster as they like • Results can be visualized and shared within a team or a department REQUIREMENTS
  • 9. 1. Security • Kerberize the Hadoop cluster and install Apache Ranger 2. Stability • Use Apache Spark as a query and application execution engine 3. Features • Try Apache Zeppelin for the Web UI SOLUTIONS
  • 10. • Framework to manage access control over a Hadoop cluster • Used to control each employee’s data access APACHE RANGER
  • 11. 1. Security • Kerberize the Hadoop cluster and install Apache Ranger 2. Stability • Use Apache Spark as a query and application execution engine 3. Features • Try Apache Zeppelin for the Web UI SOLUTIONS
  • 12. • Web-based data analysis tool • Supports Apache Spark and user impersonation • Results can be shared within multiple users as a form of a notebook APACHE ZEPPELIN
  • 13. SYSTEM ARCHITECTURE Apache Zeppelin 0.7.3 Hadoop YARN Cluster LDAP Spark application submission + User impersonation End Users HDFS + Apache Ranger User authentication Data access
  • 14. 1. Security • An arbitrary user can be set to scheduled notebook’s execution user • Users can access the data which they don’t have access rights to ISSUES OF APACHE ZEPPELIN 0.7.3
  • 15. 2. Stability • Runs only on a single server • Does not support the “yarn-cluster” mode of Apache Spark • Freezes when a Spark driver program consumes many server resources ISSUES OF APACHE ZEPPELIN 0.7.3
  • 16. 3. Features • Users have to set access control to each notebook • Users cannot execute a notebook while changing only its parameters without saving it ISSUES OF APACHE ZEPPELIN 0.7.3
  • 17. OASIS
  • 18. COMPARISON For a single team • Does not go with Apache Ranger • Runs only on a single server • Access control per notebook Apache Zeppelin For an enterprise • Works well with Apache Ranger • Runs on multiple servers • Access control per team OASIS
  • 19. Agenda 1. Motivation 2. Features & System Architecture 3. Use Cases
  • 20. • User impersonation • Data Visualization • Notebooks Sharing • Scalable OVERVIEW
  • 22. • A root directory of notebooks for a team, a department, or a service • Access right for each user is separately set in each space • 2 types of access rights: ”read write” and “read only” SPACE Space 1 User A (read write) User B (read only) Space 2 User C (read write) User D (read only)
  • 24. • A single Spark application launches per notebook session • Notebook author’s account is used to access files • Spark, Spark SQL, PySpark, and SparkR are available • Each language of a single notebook session shares a single Spark application SPARK APPLICATION
  • 27. • Notebooks can be executed automatically on a prescribed schedule • Contents of notebooks can be kept updated by this feature • This feature is also used for creating a light weight ETL processing SCHEDULING
  • 28. • Parameters can be injected into a notebook during its execution • “read only” users can execute a notebook while changing its parameters PARAMETERS
  • 31. SYSTEM ARCHITECTURE End Users Frontend / API OASIS Job Scheduler MySQL Redis Spark Interpreter Hadoop YARN Cluster HDFS / Apache Ranger
  • 32. • 500 DataNodes / NodeManagers • HDFS usage: 20PB • 150+ Hive databases • 1,500+ Hive tables HADOOP CLUSTER
  • 33. Agenda 1. Motivation 2. Features & System Architecture 3. Use Cases
  • 34. • 1,500+ users • 100+ daily active users • 300+ monthly active users • 30+ spaces (i.e. departments, teams, or services) • 1,100+ notebooks • 200+ scheduled notebooks STATS
  • 35. 1. Report 2. Interactive dashboard 3. ETL 4. Monitoring 5. Ad hoc analysis USE CASES
  • 40. 5. AD HOC ANALYSIS
  • 41. • We created OASIS to solve the issues of Apache Zeppelin • Extracted data can be visualized and shared within a team • OASIS utilizes the user impersonation feature of Apache Spark • At LINE, OASIS is used for reporting, data monitoring, ad hoc analysis, etc. RECAP