Apache PredictionIO: Architecture and Integration
Establish an effective machine learning platform efficiently
李文良 張峰睿
亦思科技
2017/09/30
About us
李文良 (William Lee)
亦思科技, Research & Development Division, Project Manager
williamlee@is-land.com.tw
張峰睿 (Frank Chang)
亦思科技, Research & Development Division, System Architect
frank@is-land.com.tw
Outline
•Background
•PredictionIO Overview
•Quick Start your first Engine
•Customizing an Engine
•Implementation on Enterprise Production
•Summary
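Expectations for Machine Learning
According to the MIT Technology Review and Google Cloud survey report:
• 50% of organizations plan to use machine learning to understand their data more deeply and extract more information from it.
• Organizations use it to gain a competitive advantage or to speed up the analysis of existing data.
• As many as 31% believe machine learning can help reduce costs.
https://s3.amazonaws.com/files.technologyreview.com/whitepapers/MITTR_GoogleforWork_Survey.pdf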
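Machine Learning Applications in Semiconductors
● Semiconductor manufacturing usually involves hundreds of process steps that produce millions of data records.
● During product development, R&D must define SPECs for this data, which are used for quality checks and equipment tuning.
● IT staff are usually needed to extract the datasets and load them into statistical software before any analysis can start.
● A system that combines data collection with machine learning to generate SPECs automatically, for engineers to confirm, can shorten product development time.
(Diagram: Develop product → Collect data → Analyze data → Tune parameters, producing the SPEC)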
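Machine Learning Applications in Finance
● The financial industry has entered the era of electronic trading: historical transactions have accumulated into large datasets, and trading systems keep adding new data.
● Information is extracted from the historical database and analyzed according to different needs.
● A system that consolidates historical data into one database and provides several machine learning algorithms can shorten the time from data analysis to delivered results.
(Diagram: several clients each analyze the historical data for a different goal: product design, risk management, anomaly analysis, market forecasting)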
Running the Spark MLlib examples seems easy enough at first.
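But building a real system raises many issues...
● Trained models can be saved for later use: where are they stored, and how are they managed?
● How is the system integrated with apps or existing systems, and how can predictions be served in real time and conveniently?
● How are algorithms executed, and how are parameters and the environment configured?
● How does data get in, and where is it stored?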
Hidden Technical Debt in Machine Learning Systems
“Only a small fraction of real-world ML systems is composed of the ML code.
The required surrounding infrastructure is vast and complex.”
Hidden Technical Debt in Machine Learning Systems, by Sculley et al., NIPS 2015
Big Data System with Machine Learning Stacks
(Diagram: a layered stack annotated with PredictionIO)
● Apps: API Service Server
● Algorithms: Spark ML, Caffe, DeepLearning4J, TensorFlow, …
● Processing: Hadoop, Spark, …
● DataStore: RDB, Hadoop HDFS, HBase, ES, …
http://sssslide.com/speakerdeck.com/takahiro/building-a-recommendation-engine-with-spark-and-apache-predictionio
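What is Apache PredictionIO
Apache PredictionIO (incubating) is an open-source machine learning server platform.
It lets developers and data scientists build predictive engines quickly and effectively,
and integrates with application systems to achieve Machine Learning as a Service.
PredictionIO is expected to bring the following benefits:
• A simple solution for data collection and storage that unifies the platforms in the existing ecosystem.
• Developers can quickly build a machine learning engine from templates and expose it as a service for integration with external systems.
• Custom machine learning engines can be built by modifying templates.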
Latest release: 9/26
According to the PredictionIO JIRA site:
• Version 0.12.0 was released on 26 Sep 2017.
Architecture
(Diagram components: SDK / Service Client; PredictionIO Platform with Event Server and Prediction Engine; Engine Template; Build Engine and Deploy; Processing; Storage; Analytics Tools)
PIO Storage Alternatives
Quick Start your first Engine
(Diagram: Install & Start EventServer → Train & Deploy Prediction Engine → Query Result via REST)
Alternatives:
● REST APIs
● SDKs
● 54 available templates
● DASE for custom needs
● Source Code
● Docker
Operation steps:
1. Install and Run PredictionIO
2. Create a new Engine from an Engine Template
3. Generate App ID and Access Key
4. Collect Data
5. Deploy the Engine as a Service
6. Use the Engine
Installation & Quick Start
● See https://github.com/apache/incubator-predictionio/
Installing with Docker
● Install Docker first
● Start docker-predictionio
$ docker run -it -p 8000:8000 steveny/predictionio /bin/bash
http://predictionio.incubator.apache.org/community/projects/#docker-installation-for-predictionio
Installing From Source
● Latest version: 0.12.0
● Source code: https://github.com/apache/incubator-predictionio/
● Build dependencies:
  Ecosystem       Supported versions     Default
  Scala           2.10.x, 2.11.x         2.11.8
  Spark           1.6.x, 2.0.x, 2.1.x    2.1.1
  Elasticsearch   1.7.x, 5.x             5.5.2
  Hadoop          2.4.x to 2.7.x         2.7.3 (*)
  HBase           0.98.x, 1.2.x          1.2.6 (*)
https://predictionio.incubator.apache.org/install/install-sourcecode/
● Build a distribution, overriding the default versions (example):
$ ./make-distribution.sh -Dscala.version=2.11.8 -Dspark.version=2.1.0 -Delasticsearch.version=5.3.0
● Setup and Start PredictionIO
Command Line
● General Commands
○ pio status : Displays the install path and running status of the PredictionIO system and its dependencies.
● Event Server Commands
○ pio eventserver : Launch the Event Server.
○ pio app : Manage apps that are used by the Event Server
● Engine Commands
○ pio build : Build the engine at the current directory.
○ pio train : Start training an engine.
○ pio deploy : Deploy an engine as an engine server. If no instance ID is specified, it deploys the latest instance.
https://predictionio.incubator.apache.org/cli/#engine-commands
REST API and SDKs
● REST API:
○ Default ports (verify them against your own setup):
■ Event Server: 7070
■ Engine: 8000
○ Example: send an event to the Event Server
$ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" \
-H "Content-Type: application/json" -d '{$JSON-CONTEXT}'
○ Example: query a deployed engine
$ curl -H "Content-Type: application/json" \
-d '{ $JSON-CONTEXT }' http://localhost:8000/queries.json
● SDKs:
○ Java & Android
○ Python
○ PHP
○ Ruby
https://predictionio.incubator.apache.org/cli/#engine-commands
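The request bodies above are left as `$JSON-CONTEXT` placeholders. As a minimal sketch, assuming the field names of PredictionIO's event JSON (`event`, `entityType`, `entityId`, `targetEntityType`, `targetEntityId`, `properties`) and the recommendation template's query shape (`user`, `num`), the dependency-free Scala below builds one event body and one query body. The `RestBodies` object and its helpers are hypothetical, and the escaping handles only quotes and backslashes.

```scala
// Hypothetical helpers for the two REST calls above (not part of PredictionIO).
// Escaping covers only quotes and backslashes (assumption: IDs contain no control characters).
object RestBodies {
  private def q(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Render key/value pairs (values already JSON-encoded) as a JSON object.
  private def obj(fields: (String, String)*): String =
    fields.map { case (k, v) => q(k) + ":" + v }.mkString("{", ",", "}")

  // Body for POST /events.json: user `userId` rates item `itemId`.
  def rateEvent(userId: String, itemId: String, rating: Double): String =
    obj(
      "event" -> q("rate"),
      "entityType" -> q("user"),
      "entityId" -> q(userId),
      "targetEntityType" -> q("item"),
      "targetEntityId" -> q(itemId),
      "properties" -> obj("rating" -> rating.toString))

  // Body for POST /queries.json: ask for `num` recommendations for `userId`.
  def recommendationQuery(userId: String, num: Int): String =
    obj("user" -> q(userId), "num" -> num.toString)

  // Event Server URL; 7070 is the default port listed above.
  def eventsUrl(host: String, accessKey: String, port: Int = 7070): String =
    s"http://$host:$port/events.json?accessKey=$accessKey"
}
```

The resulting strings can be passed to the curl commands above with `-d`; in a real client, one of the listed SDKs or a JSON library would normally take care of encoding.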
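Customizing your Engine with D-A-S-E
● D: Data Source and Data Preparator (DataSource.scala, Preparator.scala)
● A: Algorithm (ALSAlgorithm.scala)
● S: Serving (Serving.scala)
● E: Evaluation (Evaluation.scala)
● Engine definition and parameter settings: Engine.scala, engine.json
https://predictionio.incubator.apache.org/customize/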
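Engine
(Diagram: Engine.scala defines the Query and PredictedResult case classes and the engine factory object RecommendationEngine, which assembles the DataSource, Preparator, Algorithm, and Serving classes. A query arrives via REST and the engine returns a predicted result. Parameter settings come from engine.json.)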
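The engine's data flow can be modeled without PredictionIO at all. The sketch below is a hypothetical, dependency-free stand-in (`MiniDase.Engine` is not part of PredictionIO's API): it only shows the order in which the D, A, and S parts run during training and while answering a query. Evaluation is handled separately and is not modeled here.

```scala
// Hypothetical model of how an engine chains its D-A-S parts.
object MiniDase {
  final case class Engine[TD, PD, M, Q, P](
      readTraining: () => TD,     // D: DataSource.readTrain()
      prepare: TD => PD,          // D: Preparator.prepare()
      train: PD => M,             // A: Algorithm.train()
      predict: (M, Q) => P,       // A: Algorithm.predict()
      serve: (Q, Seq[P]) => P) {  // S: Serving.serve()

    // What "pio train" does conceptually: run D, then A, and keep the model.
    def trainModel(): M = train(prepare(readTraining()))

    // What a deployed engine does per query: predict, then let Serving decide.
    // (A real engine passes one predicted result per algorithm to serve().)
    def query(model: M, q: Q): P = serve(q, Seq(predict(model, q)))
  }
}
```

With a real template, these functions correspond to `DataSource.readTrain()`, `Preparator.prepare()`, the algorithm's `train()`/`predict()`, and `Serving.serve()`.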
Data Source and Data Preparator (D)
(Diagram: Event Server → DataSource.readTrain() reads the events RDD and converts it into a ratings RDD, the Training Data → Preparator.prepare() (*) → Prepared Data → Algorithm. Files: DataSource.scala, Preparator.scala.)
Note:
* prepare() performs any necessary feature selection, data processing, etc.
Algorithm (A)
(Diagram: in ALSAlgorithm.scala, train() (*) builds the Model from the Prepared Data; predict() answers a Query with the Model and passes the Predicted Result to Serving. Parameters are set in engine.json.)
Note:
* train() is called when you run "pio train".
Example of the algorithm settings in engine.json:
{
...
"algorithms": [
{
"name": "als",
"params": {
"rank": 10,
"numIterations": 20,
"lambda": 0.01,
"seed": 3
}
}
]
...
}
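To make the train()/predict() contract concrete, here is a minimal sketch that swaps ALS for a much simpler per-item mean rating, so it runs without Spark. All class names are hypothetical stand-ins, and the `minRatings` parameter is an assumed example. The idea carried over from the slide is that the fields of the algorithm's parameter case class mirror the keys of the `params` object in engine.json.

```scala
// Hypothetical stand-ins; a real engine uses RDDs and PredictionIO base classes.
case class Rating(user: String, item: String, rating: Double)
case class Query(user: String, num: Int)
case class ItemScore(item: String, score: Double)
case class PredictedResult(itemScores: Seq[ItemScore])

// Mirrors a "params" block such as { "minRatings": 2 } in engine.json.
case class MeanAlgorithmParams(minRatings: Int)

class MeanRatingAlgorithm(ap: MeanAlgorithmParams) {
  type Model = Map[String, Double]

  // train(): average rating per item, keeping items with enough ratings.
  def train(ratings: Seq[Rating]): Model =
    ratings.groupBy(_.item)
      .filter { case (_, rs) => rs.size >= ap.minRatings }
      .map { case (item, rs) => item -> rs.map(_.rating).sum / rs.size }

  // predict(): top `num` items by mean rating (this toy model ignores the user, unlike ALS).
  def predict(model: Model, query: Query): PredictedResult = {
    val top = model.toSeq.sortBy { case (item, score) => (-score, item) }.take(query.num)
    PredictedResult(top.map { case (item, score) => ItemScore(item, score) })
  }
}
```

Changing `minRatings` in the params object and rerunning training would change the model without touching the code, which is the point of keeping tunable values in engine.json.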
Serving (S)
(Diagram: Serving.serve() (*) receives the Query and the Predicted Results, combines them, and returns one Predicted Result as JSON.)
Note:
* If the engine has more than one predictive model, serve() combines their predicted results into one.
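The slide does not say how serve() should combine results. One simple option, sketched here under the assumption that each algorithm returns scored items, is to average each item's score over the models that returned it and keep the top `num`. The names are hypothetical and averaging is only one possible strategy; an engine with a single algorithm can simply return the first result.

```scala
// Hypothetical stand-ins for the engine's result types.
case class ItemScore(item: String, score: Double)
case class PredictedResult(itemScores: Seq[ItemScore])

object AveragingServing {
  // Average each item's score over the results that contain it,
  // then return the `num` highest-scoring items (ties broken by item ID).
  def serve(num: Int, predictedResults: Seq[PredictedResult]): PredictedResult = {
    val byItem = predictedResults.flatMap(_.itemScores).groupBy(_.item)
    val combined = byItem.map { case (item, scores) =>
      ItemScore(item, scores.map(_.score).sum / scores.size)
    }
    PredictedResult(combined.toSeq.sortBy(s => (-s.score, s.item)).take(num))
  }
}
```

Averaging only over the models that returned an item favors items seen by one confident model; summing instead would favor items that several models agree on.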
Quiz (1/4)
- Read Custom Events
Q: How do we change the two event types, rate and buy, to like and dislike?
(Step: DataSource reads the events RDD)
Before
val eventsRDD: RDD[Event] = PEventStore.find(
  appName = dsp.appName,
  entityType = Some("user"),
  eventNames = Some(List("rate", "buy")), // read "rate" and "buy" events
  // targetEntityType is an optional field of an event.
  targetEntityType = Some(Some("item")))(sc)
After
val eventsRDD: RDD[Event] = PEventStore.find(
  appName = dsp.appName,
  entityType = Some("customer"), // change "user" to "customer"
  eventNames = Some(List("like", "dislike")), // read "like" and "dislike" events
  // targetEntityType is an optional field of an event.
  targetEntityType = Some(Some("product")))(sc) // change "item" to "product"
Quiz (2/4)
- Map Custom Events
Q: How do we change the two event types, rate and buy, to like and dislike? (continued)
(Step: DataSource maps the events to the ratings RDD)
Before
val ratingValue: Double = event.event match {
  case "rate" => event.properties.get[Double]("rating")
  case "buy" => 4.0 // map a buy event to a rating of 4.0
  case _ => throw new Exception(s"Unexpected event ${event} is read.")
}
After
val ratingValue: Double = event.event match {
  case "rate" => event.properties.get[Double]("rating")
  case "buy" => 4.0 // map a buy event to a rating of 4.0
  case "like" => 4.0 // map a like event to a rating of 4.0
  case "dislike" => 1.0 // map a dislike event to a rating of 1.0
  case _ => throw new Exception(s"Unexpected event ${event} is read.")
}
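The mapping above can be checked without Spark or an Event Server. A minimal sketch, assuming a hypothetical `SimpleEvent` case class that stands in for PredictionIO's `Event` and holds properties as a plain `Map[String, Double]`:

```scala
// Hypothetical stand-in for PredictionIO's Event: only the fields the mapping needs.
case class SimpleEvent(
    event: String,
    entityId: String,
    targetEntityId: String,
    properties: Map[String, Double] = Map.empty)

case class Rating(user: String, item: String, rating: Double)

object EventMapping {
  // Same rules as the "After" version: explicit ratings pass through,
  // implicit events get fixed values, anything else is rejected.
  def toRating(e: SimpleEvent): Rating = {
    val value: Double = e.event match {
      case "rate"    => e.properties("rating")
      case "buy"     => 4.0
      case "like"    => 4.0
      case "dislike" => 1.0
      case other     => throw new IllegalArgumentException(s"Unexpected event $other")
    }
    Rating(e.entityId, e.targetEntityId, value)
  }
}
```

Keeping the mapping in a small pure function like this makes it easy to unit-test before wiring it into the DataSource.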
Quiz (3/4)
- Customizing Data Preparator
Q: How do we add a blacklist so that the system filters out certain products?
Before
class Preparator
  extends PPreparator[TrainingData, PreparedData] {
  def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
    new PreparedData(ratings = trainingData.ratings)
  }
}
After
import scala.io.Source // ADDED
class Preparator
  extends PPreparator[TrainingData, PreparedData] {
  def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
    // read one blacklisted item ID per line and close the file afterwards
    val source = Source.fromFile("./data/sample_not_train_data.txt")
    val noTrainItems = try source.getLines().toSet finally source.close()
    // exclude noTrainItems from the original trainingData
    val ratings = trainingData.ratings.filter(r => !noTrainItems.contains(r.item))
    new PreparedData(ratings)
  }
}
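The blacklist logic itself is plain Scala and can be tested on its own. The sketch below uses a hypothetical `Blacklist` object and a local `Seq` instead of an RDD. It also trims each line and skips blank ones, so stray whitespace in the file does not silently keep an item in the training data.

```scala
case class Rating(user: String, item: String, rating: Double)

// Hypothetical helper mirroring the Preparator change above.
object Blacklist {
  // One item ID per line; surrounding whitespace and blank lines are ignored.
  def parse(lines: Iterator[String]): Set[String] =
    lines.map(_.trim).filter(_.nonEmpty).toSet

  // Drop every rating whose item is blacklisted.
  def exclude(ratings: Seq[Rating], blacklist: Set[String]): Seq[Rating] =
    ratings.filterNot(r => blacklist.contains(r.item))
}
```

In the Preparator, `parse` would receive `source.getLines()`. For a large blacklist on a cluster, broadcasting the set with `sc.broadcast` is a common alternative to capturing it in the closure (not shown).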
Quiz (4/4)
- Release for your Change
● How do we release the modified engine(s)?
$ pio build
$ pio train
$ pio deploy
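Evaluation (1/4)
(Diagram components: Evaluation (AccuracyEvaluation), Evaluation Metrics, Engine Params List, AccuracyAlgo)
import org.apache.predictionio.controller.AverageMetric
import org.apache.predictionio.controller.EmptyEvaluationInfo

// Scores 1.0 for a correct label and 0.0 otherwise; AverageMetric averages these scores.
case class Accuracy()
  extends AverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
  def calculate(query: Query, predicted: PredictedResult, actual: ActualResult)
    : Double = (if (predicted.label == actual.label) 1.0 else 0.0)
}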
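An `AverageMetric` reports the mean of the per-query scores returned by calculate(). Below is a dependency-free sketch of that averaging, assuming hypothetical `PredictedResult`/`ActualResult` case classes with a `label` field as in the slide. The query is omitted because this metric ignores it, and returning 0.0 for an empty input is an arbitrary choice made here.

```scala
// Hypothetical stand-ins for the engine's result classes.
case class PredictedResult(label: Double)
case class ActualResult(label: Double)

object AccuracySketch {
  // Per-query score, same rule as Accuracy.calculate().
  def calculate(predicted: PredictedResult, actual: ActualResult): Double =
    if (predicted.label == actual.label) 1.0 else 0.0

  // Mean of the per-query scores; 0.0 when there is nothing to score (assumption).
  def average(pairs: Seq[(PredictedResult, ActualResult)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.map { case (p, a) => calculate(p, a) }.sum / pairs.size
}
```

For example, three correct labels out of four evaluation pairs give an accuracy of 0.75.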
Evaluation (2/4)
(Diagram: besides the Query and PredictedResult case classes that RecommendationEngine exchanges via REST, evaluation adds an ActualResult case class, provided by the DataSource class; parameters are set in engine.json.)
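Evaluation (3/4)
(Diagram: for training, DataSource.readTrain() turns the events RDD into a ratings RDD (Training Data), and Preparator.prepare() turns that into Prepared Data for the Algorithm. For evaluation, DataSource.readEval() reads events from the event DB, applies k-fold splitting, and returns, for each fold, TrainingData together with an RDD of (Query, ActualResult) pairs.)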
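Below is a minimal sketch of k-fold splitting as readEval() might do it, assuming records are assigned to folds by index modulo k (one common scheme; the slide does not specify one). `KFold.split` is hypothetical and works on a local `Seq` rather than an RDD. Each fold yields a (training, test) pair, and the test side is where the (Query, ActualResult) pairs would be built.

```scala
object KFold {
  // Element i belongs to fold (i % k). For fold f, the test set is that fold's
  // elements and the training set is everything else, in original order.
  def split[T](data: Seq[T], k: Int): Seq[(Seq[T], Seq[T])] = {
    require(k >= 2, "k must be at least 2")
    val indexed = data.zipWithIndex
    (0 until k).map { fold =>
      val (test, train) = indexed.partition { case (_, i) => i % k == fold }
      (train.map(_._1), test.map(_._1))
    }
  }
}
```

Every record lands in exactly one test set, so each fold's training and test sets together cover the full data.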
Evaluation (4/4)
● Build and run the evaluation
● Deploy the engine with the best parameters
Implementation on Enterprise Production
(Architecture diagram. Data sources: Test Log clusters, sending batch data (pio import) and real-time data (streaming + PIO SDK). PredictionIO Platform: Event Server cluster; Prediction Engine cluster with P1, P2, and P3 engines; storage for meta, event data, and models; off-line training. Clients such as Yield-En. send queries via REST and receive prediction results.)
Deploy the Event Server onto Prediction Cluster
Steps: Setup PredictionIO → Run eventserver → Setup HAProxy
On each node that will run an Event Server, run:
$ pio eventserver &
● HAProxy configuration for the Event Server cluster:
listen pio_engine_7070 :7070
mode http
balance roundrobin
option httpclose
option forwardfor
option redispatch
retries 3
log global
log 127.0.0.1 local4 info
server piovm1 192.168.56.101:7070 check weight 1 maxconn 30
server piovm2 192.168.56.102:7070 check weight 1 maxconn 30
server piovm3 192.168.56.103:7070 check weight 1 maxconn 30
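With `balance roundrobin`, equal weights, and `check`, HAProxy hands requests to the event servers in turn and skips any that fail the health check. The sketch below is a simplified, hypothetical model of that behavior only; it ignores weights, `maxconn`, `retries`, and `redispatch`, and the `Balancer` class is not an HAProxy API.

```scala
// Hypothetical round-robin picker with health checks (simplified HAProxy model).
final class Balancer(servers: Vector[String]) {
  private var next = 0 // index where the next search starts

  // Return the next healthy server in rotation, or None if all are down.
  def pick(isHealthy: String => Boolean): Option[String] = {
    val n = servers.size
    (0 until n).iterator
      .map(offset => (next + offset) % n)
      .find(i => isHealthy(servers(i)))
      .map { i => next = (i + 1) % n; servers(i) }
  }
}
```

With all three servers from the configuration healthy, requests cycle piovm1, piovm2, piovm3; if piovm2 fails its check, traffic alternates between piovm1 and piovm3 until it recovers.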
Deploy the Engine onto Prediction Cluster
(Diagram: Off-Line Engine Training → pio deploy)
On each Prediction Server to be deployed, run:
$ pio deploy --port 8001 --engine-instance-id AV6dTEoKBlbECIGzXhaS
Summary
• Apache PredictionIO is an active and popular project.
• It lets you integrate machine learning functions into your apps effectively and efficiently.
• Multiple PredictionIO nodes can be combined with HAProxy and other Hadoop ecosystem components to build a scalable, stable solution.
Thank You!

Presented at DataCon.TW 2017

  • 1. Apache PredictionIO 架構與整合 Establish an effective machine learning platform efficiently 李文良 張峰睿 亦思科技 2017/09/30
  • 2. About us 李文良 (William Lee) 亦思科技 研究發展處 專案經理 williamlee@is-land.com.tw 張峰睿 (Frank Chang) 亦思科技 研究發展處 系統架構師 frank@is-land.com.tw
  • 3. Outline •Background •PredictionIO Overview •Quick Start your first Engine •Customizing an Engine •Implementation on Enterprise Production •Summary
  • 4. 對於機器學習的期望 《MIT Technology Review 》and 《Google Cloud》 報告裡所提到的: • 有 50% 的組織規劃將在將來透過機器學習來加深對手上資料群的了解,以 便得出更多的資訊。 • 用於取得更多的競爭優勢,或是加速現有資料的分析。 • 甚至有 31% 認為可以透過機器學習達到降低成本的功效。 https://s3.amazonaws.com/files.technologyreview.com/whitepapers/MITTR_GoogleforWork_Survey.pdf
  • 5. 機器學習在半導體領域的應用 ● 半導體生產通常需經過數百道的製程,過程中產出數百萬筆的資料。 ● 在產品的開發過程 RD 必須為這些資料訂定 SPEC 用於檢查品質以及機台調 整。 ● 通常需要 IT 人員協助取得資料集轉入統計軟體方能進行分析。 ● 若系統結合收集資料並且機器學習自動產生 SPEC 輔助人員確認,能減少產 品開發過程中所耗用的時間。 開發產品 資料收集 分析資料 調整參數 SPEC
  • 6. 機器學習在金融領域的應用 ● 現行金融業已經進入電子交易的時代,歷史交易累積成為大量的資料群。 並且隨時會透過交易系統加入新的資料。 ● 根據不同需求的從歷史資料庫中提取資訊進行分析。 ● 若是能建構出系統統合收集歷史資料的資料庫並且提供幾種機器學習的演 算法,便能夠加快分析資料到產出目標的時間。 歷史資料 用戶端 用戶端 用戶端 分析資料 設計產品 風險管理 異常分析 市場預測 分析資料 分析資料
  • 7. 從 Spark MLLib 開始 run examples 時似乎不錯
  • 8. 實際上建構系統時,卻有許多需要注意的部份...... Training Model 可以 存起來下次使用 。 存放在哪? 如何管理? App 或現有系統結合? 如何即時並且方便使用? Algorithm 如何執行? 參數和環境如何設定? 資料怎麼進來?存哪?
  • 9. Hidden Technical Debt in Machine Learning Systems “Only a small fraction of real-world ML systems is composed of the ML code. The required surrounding infrastructure is vast and complex.” Hidden Technical Debt in Machine Learning Systems , by Sculley, et al., NIPS, 2016
  • 10. Big Data System with Machine Learning Stacks API Service Server Spark ML Caffe, DeepLearning4J, Tensorflow, …... Hadoop, Spark, …... RDB, Hadoop HDFS, HBase, ES, …... Apps Algorithms Processing DataStore PredictionIO http://sssslide.com/speakerdeck.com/takahiro/building-a-recommendation-engine-with-spark-and-apache-predictionio
  • 11. Outline •Background •PredictionIO Overview •Quick Start your first Engine •Customizing an Engine •Implementation on Enterprise Production •Summary
  • 12. What is Apache PredictionIO Apache PredictionIO (incubating) 是開源機器學習伺服器平台。 提供開發者及資料科學家有效快速建立預測引擎。 並且整合所有應用系統達到 Machine Learning as a Service 的目標。 PredictionIO 可帶來下列預期效益: • 提供簡便資料收集以及儲存的方案,統合現有生態系中的平台。 • 讓開發者可以快速的使用模組建立 machine learning engine 並提供 Service 便於整合外部系統。 • 可以透過模組修改建立自訂的 machine learning engine。
  • 13. Latest release on 9/26 From PredictionIO JIRA web site, we can find: • Version 0.12.0 was released on 26/Sep’17
  • 14.
  • 15. SDK / Service Client Architecture Processing Event Server Prediction Engine PredictionIO Platform Engine Template Analytics Tools Storage Build Engine and Deploy
  • 17. Outline •Background •PredictionIO Overview •Quick Start your first Engine •Customizing an Engine •Implementation on Enterprise Production •Summary
  • 18. ● REST APIs ● SDKs ● 54 of available templates ● DASE for custom needs ● Source Code ● Docker Quick Start your first Engine Install & Start EventServer Train & Deploy Prediction Engine Query Result via REST 1. Install and Run PredictionIO 2. Create a new Engine from an Engine Template 3. Generate App ID and Access Key 6. Use the Engine Alternatives Operation Steps 5. Deploy the Engine as a Service 4. Collecting Data
  • 19. Installation & Quick Start ● Please refer to https://github.com/apache/incubator-predictionio/
  • 20. Installing with Docker
    ● Install Docker first
    ● Start docker-predictionio:
      $ docker run -it -p 8000:8000 steveny/predictionio /bin/bash
    http://predictionio.incubator.apache.org/community/projects/#docker-installation-for-predictionio
  • 21. Installing From Source
    ● Latest version: 0.12.0
    ● Download the source code: https://github.com/apache/incubator-predictionio/
    ● Build dependencies:
      Ecosystem     | Supported versions  | Default
      Scala         | 2.10.x, 2.11.x      | 2.11.8
      Spark         | 1.6.x, 2.0.x, 2.1.x | 2.1.1
      Elasticsearch | 1.7.x, 5.x          | 5.5.2
      Hadoop        | 2.4.x to 2.7.x      | 2.7.3(*)
      HBase         | 0.98.x, 1.2.x       | 1.2.6(*)
      https://predictionio.incubator.apache.org/install/install-sourcecode/
      $ ./make-distribution.sh -Dscala.version=2.11.8 -Dspark.version=2.1.0 -Delasticsearch.version=5.3.0
    ● Set up and start PredictionIO
  • 22. Command Line
    ● General Commands
      ○ pio status: Displays the install path and running status of the PredictionIO system and its dependencies.
    ● Event Server Commands
      ○ pio eventserver: Launch the Event Server.
      ○ pio app: Manage apps that are used by the Event Server.
    ● Engine Commands
      ○ pio build: Build the engine at the current directory.
      ○ pio train: Kick off a training using an engine.
      ○ pio deploy: Deploy an engine as an engine server. If no instance ID is specified, it will deploy the latest instance.
    https://predictionio.incubator.apache.org/cli/#engine-commands
  • 23. REST API and SDKs
    ● REST API:
      ○ Default port numbers for server access (make sure they match your setup):
        ■ Event Server: 7070
        ■ Engine: 8000
      ○ Examples:
        $ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d '{$JSON-CONTEXT}'
        $ curl -H "Content-Type: application/json" -d '{ $JSON-CONTEXT }' http://localhost:8000/queries.json
    ● SDKs: Java & Android, Python, PHP, Ruby
    https://predictionio.incubator.apache.org/cli/#engine-commands
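A minimal Scala sketch of the two REST calls above, assuming Java 11+ (java.net.http). The event fields (event, entityType, entityId, targetEntityType, targetEntityId, properties) follow the shape of the recommendation quick-start "rate" events; the object PioRest and its helpers are hypothetical, and the hand-rolled JSON is for illustration only (a real client would more likely use one of the SDKs or a JSON library).

```scala
import java.net.{URI, URLEncoder}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets

// Hypothetical helper object: builds Event Server and engine requests by hand.
object PioRest {
  // Minimal JSON string escaping: quotes, backslashes and control characters.
  def esc(s: String): String = {
    val sb = new StringBuilder
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case c if c < ' ' => sb.append("\\u").append(f"${c.toInt}%04x")
      case c            => sb.append(c)
    }
    sb.toString
  }

  // JSON body for a "rate" event from a user to an item.
  def rateEventJson(user: String, item: String, rating: Double): String = {
    def field(k: String, v: String) = "\"" + k + "\":\"" + esc(v) + "\""
    Seq(
      field("event", "rate"),
      field("entityType", "user"),
      field("entityId", user),
      field("targetEntityType", "item"),
      field("targetEntityId", item),
      "\"properties\":{\"rating\":" + rating + "}"
    ).mkString("{", ",", "}")
  }

  // POST <eventServer>/events.json?accessKey=<key> (Event Server default port: 7070).
  def eventRequest(eventServer: String, accessKey: String, body: String): HttpRequest = {
    val key = URLEncoder.encode(accessKey, StandardCharsets.UTF_8.name())
    HttpRequest.newBuilder(URI.create(s"$eventServer/events.json?accessKey=$key"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()
  }

  // POST <engine>/queries.json (deployed engine default port: 8000).
  def queryRequest(engine: String, queryJson: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$engine/queries.json"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(queryJson))
      .build()

  // Actually sends a request; this needs a running server.
  def send(req: HttpRequest): HttpResponse[String] =
    HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
}
```

Building the request is kept separate from sending it, so the URL, headers and body can be checked without a server; only send() touches the network.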
  • 27. Customizing your Engine with D-A-S-E
    ● D (Data source and data preparator): DataSource.scala, Preparator.scala
    ● A (Algorithm): ALSAlgorithm.scala
    ● S (Serving): Serving.scala
    ● E (Evaluation): Evaluation.scala
    ● Parameter settings: Engine.scala, engine.json
    https://predictionio.incubator.apache.org/customize/
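To make the D-A-S-E split concrete, here is a Spark-free Scala sketch of how the four roles hand data to each other. The trait names mirror the roles, not PredictionIO's actual controller classes (PDataSource, PPreparator, P2LAlgorithm, LServing and friends, which also take a SparkContext); treat every signature below as a simplified assumption.

```scala
// Hypothetical, Spark-free stand-ins for the four DASE roles.
object DaseSketch {
  trait DataSource[TD] { def readTraining(): TD }
  trait Preparator[TD, PD] { def prepare(td: TD): PD }
  trait Algorithm[PD, M, Q, P] {
    def train(pd: PD): M
    def predict(model: M, query: Q): P
  }
  trait Serving[Q, P] { def serve(query: Q, predictions: Seq[P]): P }

  final class Engine[TD, PD, M, Q, P](
      ds: DataSource[TD],
      prep: Preparator[TD, PD],
      algos: Seq[Algorithm[PD, M, Q, P]],
      serving: Serving[Q, P]) {

    // Training path: D reads, P prepares, every A trains one model.
    def train(): Seq[M] = {
      val prepared = prep.prepare(ds.readTraining())
      algos.map(_.train(prepared))
    }

    // Query path: every A predicts with its model, then S merges the results.
    def query(models: Seq[M], q: Q): P =
      serving.serve(q, algos.zip(models).map { case (a, m) => a.predict(m, q) })
  }
}
```

Because Engine depends only on the four traits, swapping the algorithm or adding a second one leaves the data source and serving code untouched, which is the point of the DASE split.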
  • 28. Engine (Engine.scala, with parameter settings): defines the Query and PredictedResult case classes and the engine factory object RecommendationEngine, which assembles the D-A-S-E components. A query via REST returns a Predicted Result.
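As a sketch of what travels over REST, the case classes below are modeled on the recommendation template's Query(user, num) and PredictedResult(itemScores). The template uses Array[ItemScore]; Seq is used here so case-class equality behaves. The toJson helpers are hypothetical, hand-written stand-ins for the JSON conversion PredictionIO performs for you.

```scala
// Modeled on the recommendation template's query/result types (assumption: field names as in the template).
object RecommendationTypes {
  // Escapes backslashes and quotes so ids can be embedded in JSON strings.
  private def esc(s: String): String = s.replace("\\", "\\\\").replace("\"", "\\\"")

  // Query sent to /queries.json, e.g. {"user":"1","num":4}
  final case class Query(user: String, num: Int) {
    def toJson: String = "{\"user\":\"" + esc(user) + "\",\"num\":" + num + "}"
  }

  final case class ItemScore(item: String, score: Double)

  // Result returned by the engine, e.g. {"itemScores":[{"item":"22","score":4.5}]}
  final case class PredictedResult(itemScores: Seq[ItemScore]) {
    def toJson: String = itemScores
      .map(s => "{\"item\":\"" + esc(s.item) + "\",\"score\":" + s.score + "}")
      .mkString("{\"itemScores\":[", ",", "]}")
  }
}
```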
  • 30. Data Source and Data Preparator
    Event Server → DataSource.readTraining() (DataSource.scala: events RDD → ratings RDD) → Training Data → Preparator.prepare() (Preparator.scala, action required (*)) → Prepared Data → Algorithm
    Note (*): prepare() performs any necessary feature selection, data processing, etc.
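A Spark-free sketch of the DataSource step, assuming the events are already in memory: find mimics the filtering that PEventStore.find performs on entityType, eventNames and targetEntityType, and readTraining reuses the template's rate/buy mapping shown in the quiz slides. Event and Rating here are simplified stand-ins, not PredictionIO's own types.

```scala
object DataSourceSketch {
  // Simplified stand-in for PredictionIO's Event (only the fields used here).
  final case class Event(
      event: String,
      entityType: String,
      entityId: String,
      targetEntityType: Option[String],
      targetEntityId: Option[String],
      properties: Map[String, Double] = Map.empty)

  final case class Rating(user: String, item: String, rating: Double)

  // In-memory stand-in for PEventStore.find(entityType, eventNames, targetEntityType).
  def find(events: Seq[Event], entityType: String, eventNames: Set[String],
           targetEntityType: String): Seq[Event] =
    events.filter { e =>
      e.entityType == entityType &&
      eventNames.contains(e.event) &&
      e.targetEntityType.contains(targetEntityType)
    }

  // Stand-in for readTraining(): keep "rate"/"buy" events and turn them into ratings.
  def readTraining(events: Seq[Event]): Seq[Rating] =
    find(events, "user", Set("rate", "buy"), "item").map { e =>
      val value = e.event match {
        case "rate" => e.properties("rating")
        case "buy"  => 4.0 // a purchase counts as a rating of 4, as in the template
        case other  => throw new IllegalArgumentException(s"Unexpected event $other")
      }
      val item = e.targetEntityId.getOrElse(
        throw new IllegalArgumentException(s"Event without targetEntityId: $e"))
      Rating(e.entityId, item, value)
    }
}
```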
  • 31. Algorithm (ALSAlgorithm.scala, with parameter settings)
    ● train() (*): Prepared Data → Model
    ● predict(): Model + Query → Predicted Result, which is passed on to Serving
    Note (*): train() is called when you run "pio train".
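The train()/predict() contract can be illustrated without Spark. This sketch deliberately swaps ALS for a much simpler item-average baseline (recommend the unseen items with the highest mean rating), so it shows only the shape of the two calls, not the template's ALSAlgorithm; all names are hypothetical.

```scala
object AlgorithmSketch {
  final case class Rating(user: String, item: String, rating: Double)
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)

  // Model: mean rating per item, plus the items each user has already rated.
  final case class Model(itemMeans: Map[String, Double], seen: Map[String, Set[String]])

  // train(): runs once per training job; here it only aggregates ratings (no ALS).
  def train(ratings: Seq[Rating]): Model = {
    val means = ratings.groupBy(_.item).map { case (item, rs) => item -> rs.map(_.rating).sum / rs.size }
    val seen  = ratings.groupBy(_.user).map { case (user, rs) => user -> rs.map(_.item).toSet }
    Model(means, seen)
  }

  // predict(): runs per query; top `num` unseen items by mean rating, ties broken by item id.
  def predict(model: Model, query: Query): Seq[ItemScore] = {
    val exclude = model.seen.getOrElse(query.user, Set.empty[String])
    model.itemMeans.toSeq
      .filterNot { case (item, _) => exclude.contains(item) }
      .sortWith((a, b) => if (a._2 != b._2) a._2 > b._2 else a._1 < b._1)
      .take(query.num)
      .map { case (item, score) => ItemScore(item, score) }
  }
}
```

The expensive work happens once in train(), and predict() only reads the finished model, which is why a deployed engine can answer REST queries without retraining.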
  • 32. Example engine.json for the Algorithm
    {
      ...
      "algorithms": [
        {
          "name": "als",
          "params": {
            "rank": 10,
            "numIterations": 20,
            "lambda": 0.01,
            "seed": 3
          }
        }
      ]
      ...
    }
  • 33. Serving
    serve() receives the Query and the Predicted Results, combines them into one Predicted Result (*), and returns it as JSON.
    Note (*): if you have more than one predictive model, serve() combines their predicted results into one.
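When several algorithms are configured, serve() decides how their results are merged. The recommendation template's Serving simply returns the first result (predictedResults.head); the sketch below shows one hypothetical alternative: average each item's score across the results that contain it and keep the top num items. The num parameter stands in for the Query, and all names are illustrative.

```scala
object ServingSketch {
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Seq[ItemScore])

  // Hypothetical combination rule: per-item mean score over all results, top `num` items.
  def serve(num: Int, results: Seq[PredictedResult]): PredictedResult = {
    val combined = results
      .flatMap(_.itemScores)
      .groupBy(_.item)
      .map { case (item, scores) => ItemScore(item, scores.map(_.score).sum / scores.size) }
      .toSeq
      .sortWith((a, b) => if (a.score != b.score) a.score > b.score else a.item < b.item)
      .take(num)
    PredictedResult(combined)
  }
}
```

Averaging is only one option; weighted blending or taking the union of top lists are equally valid serve() policies, and the choice is independent of the algorithms themselves.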
  • 34. Quiz (1/4) - Read Custom Events
    Q: How can the rate and buy events be replaced with like and dislike? (events RDD, in the DataSource)
    Before:
      val eventsRDD: RDD[Event] = PEventStore.find(
        appName = dsp.appName,
        entityType = Some("user"),
        eventNames = Some(List("rate", "buy")), // read "rate" and "buy" events
        // targetEntityType is an optional field of an event.
        targetEntityType = Some(Some("item")))(sc)
    After (modified):
      val eventsRDD: RDD[Event] = PEventStore.find(
        appName = dsp.appName,
        entityType = Some("customer"), // change user to customer
        eventNames = Some(List("like", "dislike")), // read "like" and "dislike" events
        // targetEntityType is an optional field of an event.
        targetEntityType = Some(Some("product")))(sc) // change item to product
  • 35. Quiz (2/4) - Map Custom Events
    Q (continued): How can the rate and buy events be replaced with like and dislike? (ratings RDD)
    Before:
      val ratingValue: Double = event.event match {
        case "rate" => event.properties.get[Double]("rating")
        case "buy" => 4.0 // map buy event to rating value of 4
        case _ => throw new Exception(s"Unexpected event ${event} is read.")
      }
    After:
      val ratingValue: Double = event.event match {
        case "rate" => event.properties.get[Double]("rating")
        case "buy" => 4.0 // map buy event to rating value of 4
        case "like" => 4.0 // map a like event to a rating of 4.0
        case "dislike" => 1.0 // map a dislike event to a rating of 1.0
        case _ => throw new Exception(s"Unexpected event ${event} is read.")
      }
  • 36. Quiz (3/4) - Customizing Data Preparator
    Q: How can a blacklist be added so that the system filters out some products?
    Before:
      class Preparator extends PPreparator[TrainingData, PreparedData] {
        def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
          new PreparedData(ratings = trainingData.ratings)
        }
      }
    After:
      import scala.io.Source // ADDED
      class Preparator extends PPreparator[TrainingData, PreparedData] {
        def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
          // read the blacklist once and close the file afterwards
          val source = Source.fromFile("./data/sample_not_train_data.txt")
          val noTrainItems = try source.getLines().toSet finally source.close()
          // exclude noTrainItems from original trainingData
          val ratings = trainingData.ratings.filter(r => !noTrainItems.contains(r.item))
          new PreparedData(ratings)
        }
      }
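The blacklist logic can be checked without Spark or a data file. This sketch, with hypothetical names, reads one item id per line from any scala.io.Source (it also trims whitespace and skips blank lines, which the slide's version does not do), closes the source afterwards, and filters a plain Seq of ratings the same way the Preparator filters the RDD.

```scala
import scala.io.Source

object BlacklistSketch {
  final case class Rating(user: String, item: String, rating: Double)

  // One item id per line; blank lines are ignored; the source is always closed.
  def loadBlacklist(source: Source): Set[String] =
    try source.getLines().map(_.trim).filter(_.nonEmpty).toSet
    finally source.close()

  // Drop every rating whose item is blacklisted.
  def exclude(ratings: Seq[Rating], blacklist: Set[String]): Seq[Rating] =
    ratings.filterNot(r => blacklist.contains(r.item))
}
```

Inside the Preparator, Source.fromFile("./data/sample_not_train_data.txt") would be passed to loadBlacklist, and the resulting Set is captured by the RDD filter closure.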
  • 37. Quiz (4/4) - Releasing Your Changes
    ● How do you release the modified engine(s)?
      $ pio build
      $ pio train
      $ pio deploy
  • 38. Evaluation (1/4)
    Diagram: Evaluation, AccuracyEvaluation, Evaluation Metrics, Engine Params List, AccuracyAlgo
      import org.apache.predictionio.controller.{AverageMetric, EmptyEvaluationInfo}
      case class Accuracy()
        extends AverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
        def calculate(query: Query, predicted: PredictedResult, actual: ActualResult): Double =
          if (predicted.label == actual.label) 1.0 else 0.0
      }
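AverageMetric averages the per-query score returned by calculate() over the evaluation data, so the metric above is plain accuracy. Below is a Spark-free sketch of that aggregation, assuming classification-style labels of type Double as in the slide; the object name and the 0.0 result for empty input are choices of this sketch, not PredictionIO behavior.

```scala
object AccuracySketch {
  final case class PredictedResult(label: Double)
  final case class ActualResult(label: Double)

  // Per-example score, same rule as the slide's calculate(): 1.0 on a match, else 0.0.
  def calculate(predicted: PredictedResult, actual: ActualResult): Double =
    if (predicted.label == actual.label) 1.0 else 0.0

  // Average of per-example scores; this sketch returns 0.0 for empty input.
  def accuracy(pairs: Seq[(PredictedResult, ActualResult)]): Double =
    if (pairs.isEmpty) 0.0
    else pairs.map { case (p, a) => calculate(p, a) }.sum / pairs.size
}
```

With three correct labels out of four, accuracy is 3.0 / 4 = 0.75.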
  • 39. Evaluation (2/4)
    Components involved: the Query, PredictedResult and ActualResult case classes, RecommendationEngine, class DataSource, and the parameter settings. A query via REST returns a Predicted Result.
  • 41. Evaluation (4/4)
    ● Build and run the evaluation
    ● Deploy the engine with the best parameters
  • 43. Implementation on Enterprise Production
    Diagram components (PredictionIO Platform):
      ● Test Log clusters (four shown)
      ● Batch Data (pio import) and Real Time Data (Streaming + PIO SDK)
      ● Yield-En.
      ● Event Server Cluster
      ● Prediction Engine Cluster running the P1, P2 and P3 engines
      ● Meta, Event Data and Model storage
      ● Off-line Training (RDD)
      ● Query via REST → Prediction Result
  • 44. Deploy the Event Server onto the Prediction Cluster
    Steps: Set up PredictionIO → Run eventserver → Set up HAProxy
    ● On each Event Server host to be deployed, run:
      $ pio eventserver &
    ● HAProxy configuration for the Event Server cluster:
      listen pio_engine_7070 :7070
        mode http
        balance roundrobin
        option httpclose
        option forwardfor
        option redispatch
        retries 3
        log global
        log 127.0.0.1 local4 info
        server piovm1 192.168.56.101:7070 check weight 1 maxconn 30
        server piovm2 192.168.56.102:7070 check weight 1 maxconn 30
        server piovm3 192.168.56.103:7070 check weight 1 maxconn 30
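With balance roundrobin, weight 1 on every server, and option redispatch plus retries 3, requests rotate across piovm1 to piovm3 and a failed attempt can be sent to another server. The Scala sketch below is a simplified, hypothetical model of that behavior for illustration only: it moves to the next server on every failed attempt and ignores maxconn and health-check timing, so it does not reproduce HAProxy's exact retry semantics.

```scala
object RoundRobinSketch {
  // Simplified round-robin balancer with equal weights (illustration, not HAProxy).
  final class Balancer(servers: IndexedSeq[String], retries: Int) {
    require(servers.nonEmpty, "need at least one server")
    require(retries >= 0, "retries must be non-negative")
    private var next = 0

    // Tries up to retries + 1 servers in rotation and returns the first one that is up.
    def dispatch(isUp: String => Boolean): Option[String] =
      Iterator
        .continually {
          val server = servers(next)
          next = (next + 1) % servers.size
          server
        }
        .take(retries + 1)
        .find(isUp)
  }
}
```

The rotation pointer advances on every attempt, so after a failure the next request starts from the following server, which spreads load evenly when all nodes are healthy.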
  • 45. Deploy the Engine onto the Prediction Cluster
    ● Off-line engine training, then $ pio deploy
    ● On each Prediction Server to be deployed, run:
      $ pio deploy --port 8001 --engine-instance-id AV6dTEoKBlbECIGzXhaS
  • 46. Summary
    • Apache PredictionIO is an active and popular project.
    • It lets you integrate machine learning functions into your apps effectively and efficiently.
    • Multiple PredictionIO nodes can be consolidated with HAProxy and combined with other Hadoop ecosystem components to provide a scalable and stable solution.