How to Lower the Cost of Deploying Analytics: An Introduction to the Portable Format for Analytics (PFA)

Robert L. Grossman
University of Chicago
and
Open Data Group

The Data Science Conference (Chicago)
April 22, 2016

rgrossman.com
@bobgrossman
  
Life Cycle of Predictive Model

[Diagram. Analytic modeling: select analytic problem & approach; exploratory data analysis; get and clean the data; build model in dev/modeling environment. Deploy model. Analytic operations: deploy model in operational systems with scoring application; scale up deployment; monitor performance (perf. data) and employ champion-challenger methodology; retire model and deploy improved model.]
  
Life Cycle of Predictive Model

[The same diagram, annotated to distinguish the Model Env from the Deployment Env.]
Differences Between the Modeling and Deployment Environments

•  Typically, modelers use specialized languages such as SAS, SPSS or R.
•  Usually, developers responsible for products and services use languages such as Java, JavaScript, Python, C++, etc.
•  This can require significant effort to move the model from the modeling environment to the deployment environment.
  
Ways to Deploy Models into Products/Services/Operations

•  Push code.
•  Embed a static model into a product or service.
•  Export and import tables of scores.
•  Export and import tables of parameters.
•  Have the product/service interact with the model as a web or message service.
•  Import the models into a database.
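Two of the options above, a table of scores and a table of parameters, differ in what crosses from the modeling environment to the deployment environment. The Python sketch below is illustrative only: the logistic model, its `weights`/`bias` fields and the record IDs are hypothetical. A score table can only answer for records that were scored in advance, while a parameter table lets the deployment side score new records, at the cost of reimplementing the scoring function there.

```python
import json
import math

# Modeling environment: a hypothetical logistic-regression model.
MODEL = {"weights": [0.8, -1.2], "bias": 0.1}


def score(params: dict, features: list) -> float:
    """Logistic score computed from a set of model parameters."""
    z = params["bias"] + sum(w * x for w, x in zip(params["weights"], features))
    return 1.0 / (1.0 + math.exp(-z))


# Option A: export a table of precomputed scores, keyed by record ID.
records = {"r1": [1.0, 0.5], "r2": [0.0, 2.0]}
score_table = json.dumps({rid: score(MODEL, f) for rid, f in records.items()})

# Option B: export a table of parameters.
param_table = json.dumps(MODEL)

# Deployment environment: import either table.
scores = json.loads(score_table)
params = json.loads(param_table)

print(scores["r1"])                # lookup only works for records scored in advance
print(score(params, [3.0, 1.0]))   # parameters can score a record never seen before
```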
  
How quickly can the model be updated?

•  Model parameters?
•  New features?
•  New pre- & post-processing?
  
“I write all my models in R, why do I need a model interchange format??”
(Alice, Data Scientist)
[Diagram: Deploying analytic models. A Model Producer (analytic models) exports a model and a Model Consumer (analytic operations, analytic infrastructure) imports it, with PMML & PFA as the interchange formats. PMML and PFA come from the not-for-profit DMG (www.dmg.org).]
  
What is a Scoring Engine?

•  A scoring engine is a component, integrated into products or enterprise IT, that deploys analytic models in operational workflows for products and services.
•  A Model Interchange Format is a format that supports exporting a model from one application and importing it into another application.
•  Model Interchange Formats include the Predictive Model Markup Language (PMML), the Portable Format for Analytics (PFA), and various in-house or custom formats.
•  Scoring engines are integrated once, but allow applications to update models as quickly as reading a model interchange format file.
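The last point, integrate once and then update models by reading a file, can be sketched in Python. This is a hypothetical minimal engine, not PFA and not any real scoring-engine API: the interchange file is assumed to be a small JSON document holding a linear model's `weights` and `bias`, and `ScoringEngine`, `reload` and `export_model` are illustrative names.

```python
import json
import os
import tempfile


class ScoringEngine:
    """Integrated into the application once; models are swapped by reloading a file."""

    def __init__(self, model_path: str):
        self.model_path = model_path
        self.model: dict = {}
        self.reload()

    def reload(self) -> None:
        # Updating the model means re-reading the interchange file, not redeploying code.
        with open(self.model_path) as f:
            self.model = json.load(f)

    def score(self, record: list) -> float:
        return self.model["bias"] + sum(w * x for w, x in zip(self.model["weights"], record))


def export_model(path: str, weights: list, bias: float) -> None:
    """The model producer's side: write the model in the (hypothetical) interchange format."""
    with open(path, "w") as f:
        json.dump({"weights": weights, "bias": bias}, f)


path = os.path.join(tempfile.mkdtemp(), "model.json")
export_model(path, [1.0, 2.0], 0.5)
engine = ScoringEngine(path)
print(engine.score([1.0, 1.0]))

export_model(path, [0.0, 1.0], 0.0)  # a new model version is exported
engine.reload()                      # the application picks it up with no code change
print(engine.score([1.0, 1.0]))
```

The application code that calls `engine.score` never changes; only the file does.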
  
  
PMML Philosophy

•  PMML is a specification of a model, not an implementation of a model.
•  PMML provides a simple means of binding parameters to values for an agreed-upon set of data mining models & transformations.
•  Because of the specification nature of PMML, a compliant scoring engine must support a large number of combinations of specifications, and it can be challenging to develop a consistent scoring engine.
  
  
PFA Philosophy

•  Define primitives for data transformations, data aggregations, and statistical and analytic models.
•  Support composition of data mining primitives (which makes it easy to specify machine learning algorithms and pre-/post-processing of data).
•  Be extensible.
•  Be “safe” to deploy in enterprise IT operational environments.
•  This is a philosophy that is different from, and complementary to, the Predictive Model Markup Language (PMML).
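The idea of composing primitives into pre-processing, model and post-processing steps can be illustrated outside PFA. In this minimal Python sketch, `compose`, `standardize`, `linear_model` and `threshold` are hypothetical stand-ins for library primitives, not PFA functions, and the parameter values are made up.

```python
from functools import reduce
from typing import Callable, List


def compose(*steps: Callable) -> Callable:
    """Chain steps left to right: the output of each step is the input of the next."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


def standardize(means: List[float], stds: List[float]) -> Callable:
    """Pre-processing primitive: center and scale each feature."""
    return lambda xs: [(x - m) / s for x, m, s in zip(xs, means, stds)]


def linear_model(weights: List[float], bias: float) -> Callable:
    """Model primitive: a weighted sum of the features."""
    return lambda xs: bias + sum(w * x for w, x in zip(weights, xs))


def threshold(cutoff: float) -> Callable:
    """Post-processing primitive: turn a score into a label."""
    return lambda score: "alert" if score > cutoff else "ok"


pipeline = compose(
    standardize(means=[10.0, 5.0], stds=[2.0, 1.0]),
    linear_model(weights=[1.5, -0.5], bias=0.0),
    threshold(cutoff=1.0),
)

print(pipeline([14.0, 5.0]))
print(pipeline([10.0, 7.0]))
```

Because each step only agrees on its input and output, any step can be replaced (for example, a different model) without touching the others.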
  
  
PFA Case Study 1

•  A 20+ person data science group develops models in R, Python, Scikit-learn and MATLAB.
•  All the data scientists export their models in PFA.
•  The company’s product imports models in PFA and runs them on its customers’ data as required.

[Diagram: Export PFA → Import PFA; widget records go in and widget scores come out.]
  
PFA Case Study 2

•  Data scientist teams develop analytic models for an adversarial analytics project.
•  Models are developed in Hadoop and exported in PFA every 2 weeks.
•  Models are updated in client systems every week.

[Diagram: Export PFA → Import PFA; event records go in and event scores come out. Timeline labels: Weeks 1/2, 3/4, …; Weeks 2/3, 4/5, …; Weeks 2, 3, 4, 5, …]
  
PFA Functionality

•  PFA can express arbitrary mathematical algorithms, executed in a tightly controlled environment.
•  PFA has all the standard flow control of a programming language: if/then/else & for/while loops.
•  PFA has function calls and function callbacks.
•  PFA has algebraic data types.
•  PFA is encoded as function calls in JSON:
      {function: [arg 1, arg 2, …, arg n]}
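To make the `{function: [arg 1, …, arg n]}` encoding concrete, here is a toy evaluator in Python. It is not a PFA engine: it handles only a hypothetical subset in which numbers are literals, a bare string names a variable, and a single-key object is a call whose list of arguments is evaluated first. PFA's type system, literal strings and special forms (such as if/then/else and loops) are not modeled, and the function table is illustrative.

```python
import json
import math
from typing import Any, Callable, Dict

# Hypothetical function table; real PFA defines a much larger, statically typed library.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "m.sqrt": math.sqrt,
}


def evaluate(expr: Any, env: Dict[str, Any]) -> Any:
    """Evaluate a JSON expression of the form {function: [arg 1, ..., arg n]}."""
    if isinstance(expr, (int, float)):
        return expr  # numeric literal
    if isinstance(expr, str):
        return env[expr]  # in this toy, a bare string is a variable reference
    if isinstance(expr, dict) and len(expr) == 1:
        name, args = next(iter(expr.items()))
        if isinstance(args, list):
            # Evaluate the arguments first, then apply the named function.
            return FUNCTIONS[name](*[evaluate(a, env) for a in args])
    raise ValueError(f"unsupported expression: {expr!r}")


# sqrt(x*x + y*y) written as nested function calls in JSON.
action = json.loads('{"m.sqrt": [{"+": [{"*": ["x", "x"]}, {"*": ["y", "y"]}]}]}')
print(evaluate(action, {"x": 3.0, "y": 4.0}))
```

Because every construct is a function call, the whole program is data that a host can parse, inspect and validate before running it.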
  
  
Example: Scoring Clusters

[Two slides of PFA code for scoring clusters; the code images are not recoverable.]
Source: dmg.org/pfa
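The PFA code on these slides did not survive extraction. As a hedged illustration of what cluster scoring usually computes, the Python sketch below assigns a record to the closest cluster center by Euclidean distance; the cluster IDs and centers are made up. PFA's function library includes cluster-scoring functions (for example `model.cluster.closest`), but this sketch does not reproduce their exact signatures or semantics.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical model parameters: cluster ID -> cluster center.
CLUSTERS: Dict[str, List[float]] = {
    "low": [1.0, 1.0],
    "mid": [5.0, 5.0],
    "high": [9.0, 9.0],
}


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_cluster(record: Sequence[float],
                    clusters: Dict[str, List[float]]) -> Tuple[str, float]:
    """Score a record: return the ID of the nearest center and the distance to it."""
    best = min(clusters, key=lambda cid: euclidean(record, clusters[cid]))
    return best, euclidean(record, clusters[best])


print(closest_cluster([4.0, 6.0], CLUSTERS))
```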
  
  
Benefits of PFA

•  PFA is based upon JSON and Avro and integrates easily into modern big data environments.
•  PFA allows models to be easily chained and composed.
•  PFA allows developers and users of analytic systems to pre-process inputs to models and to post-process outputs from models.
•  PFA is easily integrated with Hadoop, Spark, etc.
•  PFA is easily integrated with Kafka, Storm, Akka and other streaming environments.
•  PFA can be used to integrate multiple tools and applications within an analytic ecosystem.
  
Gaussian Process Model (1 of 5)
  
Gaussian Process Model (2 of 5)
  
```yaml
input: {type: array, items: double}
output: {type: array, items: double}
cells:
  table:
    type:
      {type: array, items: {type: record, name: GP, fields: [
        {name: x, type: {type: array, items: double}},
        {name: to, type: {type: array, items: double}},
        {name: sigma, type: {type: array, items: double}}]}}
    init:
      - {x: [  0,   0], to: [0.01870587, 0.96812508], sigma: [0.2, 0.2]}
      - {x: [  0,  36], to: [0.00242101, 0.95369720], sigma: [0.2, 0.2]}
      - {x: [  0,  72], to: [0.13131668, 0.53822666], sigma: [0.2, 0.2]}
      ...
      - {x: [324, 324], to: [-0.6815587, 0.82271760], sigma: [0.2, 0.2]}
action:
  model.reg.gaussianProcess:
    - input
    - {cell: table}
    - null
    - {fcn: m.kernel.rbf, fill: {gamma: 2.0}}
```
The input and output of the scoring engine are expressed as Avro schemas.
Source: dmg.org/pfa
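Because the engine's input and output are declared as Avro schemas, a host application can reject malformed records before scoring them. The Python sketch below checks a decoded JSON value against the schema `{type: array, items: double}` used above. It covers only a hypothetical handful of Avro types (`double`, `int`, `string`, `array`) and is not a substitute for a real Avro library.

```python
from typing import Any, Union

Schema = Union[str, dict]


def conforms(value: Any, schema: Schema) -> bool:
    """Check a decoded JSON value against a small subset of Avro schemas."""
    if isinstance(schema, dict):
        if schema.get("type") == "array":
            return isinstance(value, list) and all(conforms(v, schema["items"]) for v in value)
        return conforms(value, schema["type"])  # e.g. {"type": "double"}
    if schema == "double":
        # A JSON number is accepted; bool is excluded because it subclasses int in Python.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema == "int":
        return isinstance(value, int) and not isinstance(value, bool)
    if schema == "string":
        return isinstance(value, str)
    raise ValueError(f"schema not covered by this sketch: {schema!r}")


INPUT_SCHEMA = {"type": "array", "items": "double"}

print(conforms([0.0, 36.0], INPUT_SCHEMA))
print(conforms([0.0, "36"], INPUT_SCHEMA))
```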
  
Gaussian Process Model (3 of 5)
  
In the same PFA document, the table cell holds the Gaussian Process model parameters: its type (also Avro) and its value (as JSON, truncated).
Source: dmg.org/pfa
  
Gaussian Process Model (4 of 5)
  
Calling method: the parameters are expressed as JSON.
•  input: get the interpolation point from the input
•  {cell: table}: get the parameters from the table
•  null: no explicit Kriging weight (universal)
•  {fcn: …}: the kernel function
Source: dmg.org/pfa
  
Gaussian Process Model (5 of 5)

```yaml
model.reg.gaussianProcess:
  - input
  - {cell: table}
  - null
  - {fcn: m.kernel.rbf, fill: {gamma: 2.0}}
```

•  This appears declarative, but it is a function call.
–  The fourth parameter is another function: m.kernel.rbf (radial basis kernel, a.k.a. squared exponential).
–  m.kernel.rbf was intended for SVM, but is reusable anywhere.
–  One argument (gamma) is pre-applied so that the function fits the signature required by model.reg.gaussianProcess.
•  Any kernel function could be used, including user-defined functions written in PFA “code.”
•  The Gaussian Process could be used anywhere, even as a pre-processing or post-processing step.
Source: dmg.org/pfa
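Pre-applying gamma is ordinary partial application. The Python sketch below mirrors the idea with `functools.partial`: `rbf` takes three arguments, and fixing `gamma=2.0` leaves a two-argument kernel that the predictor accepts, so any other two-argument kernel could be passed instead. Assumptions: `gp_predict` is a plain zero-mean GP posterior mean (simple kriging), applied to each output dimension separately, with `sigma**2` added to the diagonal as per-point noise. That is a simplification and not the exact algorithm behind `model.reg.gaussianProcess`, whose `null` argument, per the slides, means no explicit Kriging weight (universal). The three training rows are copied from the slide's truncated table.

```python
import math
from functools import partial
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Radial basis (squared exponential) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def gp_predict(point: Vector, table: List[Dict[str, List[float]]], kernel: Kernel) -> List[float]:
    """Zero-mean GP posterior mean at `point`, one output dimension at a time."""
    xs = [row["x"] for row in table]
    n = len(xs)
    prediction = []
    for d in range(len(table[0]["to"])):
        gram = [[kernel(xs[i], xs[j]) + (table[i]["sigma"][d] ** 2 if i == j else 0.0)
                 for j in range(n)] for i in range(n)]
        weights = solve(gram, [row["to"][d] for row in table])
        prediction.append(sum(kernel(point, xs[i]) * weights[i] for i in range(n)))
    return prediction


TABLE = [
    {"x": [0, 0], "to": [0.01870587, 0.96812508], "sigma": [0.2, 0.2]},
    {"x": [0, 36], "to": [0.00242101, 0.95369720], "sigma": [0.2, 0.2]},
    {"x": [0, 72], "to": [0.13131668, 0.53822666], "sigma": [0.2, 0.2]},
]

kernel = partial(rbf, gamma=2.0)  # gamma pre-applied, like fill: {gamma: 2.0}
print(gp_predict([0, 0], TABLE, kernel))
```

Swapping in a different kernel only requires another function with the `(x, y) -> float` signature; `gp_predict` itself does not change.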
  
Summary

•  The Portable Format for Analytics (PFA) is a model interchange format for building analytic models in one environment and deploying them in another.
•  It is based upon data mining primitives.
•  It supports pre-processing, analytic models, post-processing, and composition of primitives and models.
•  You can easily add your own PFA models, since you can add your own PFA functions.
•  There is a reference implementation and thousands of compliance tests.
•  PFA is a standard being developed by the not-for-profit DMG, which developed PMML.
  
Questions?

For more information, see: dmg.org/pfa
  

Mais conteúdo relacionado

Mais procurados

Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationModel Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationRevolution Analytics
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...Bill Liu
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdDatabricks
 
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Spark Summit
 
The MADlib Analytics Library
The MADlib Analytics Library The MADlib Analytics Library
The MADlib Analytics Library EMC
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopDataWorks Summit
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalRevolution Analytics
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopRevolution Analytics
 
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...Revolution Analytics
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...Herman Wu
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labImpetus Technologies
 
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...Yuanyuan Tian
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezBig Data Spain
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and HadoopJosh Patterson
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsSeldon
 
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...Databricks
 

Mais procurados (20)

Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical ComputationModel Building with RevoScaleR: Using R and Hadoop for Statistical Computation
Model Building with RevoScaleR: Using R and Hadoop for Statistical Computation
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
Processing Terabyte-Scale Genomics Datasets with ADAM: Spark Summit East talk...
 
The MADlib Analytics Library
The MADlib Analytics Library The MADlib Analytics Library
The MADlib Analytics Library
 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 final
 
High Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and HadoopHigh Performance Predictive Analytics in R and Hadoop
High Performance Predictive Analytics in R and Hadoop
 
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on ...
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
 
Next generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph labNext generation analytics with yarn, spark and graph lab
Next generation analytics with yarn, spark and graph lab
 
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...
Building A Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and ...
 
Multiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier DominguezMultiplatform Spark solution for Graph datasources by Javier Dominguez
Multiplatform Spark solution for Graph datasources by Javier Dominguez
 
Ai use cases
Ai use casesAi use cases
Ai use cases
 
Machine Learning and Hadoop
Machine Learning and HadoopMachine Learning and Hadoop
Machine Learning and Hadoop
 
TensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative modelsTensorFlow London: Cutting edge generative models
TensorFlow London: Cutting edge generative models
 
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
Deep Learning for Natural Language Processing Using Apache Spark and TensorFl...
 

Destaque

Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Robert Grossman
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Robert Grossman
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Robert Grossman
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? Robert Grossman
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsRobert Grossman
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Robert Grossman
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Robert Grossman
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkRobert Grossman
 
How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...
How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...
How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...Road Safety
 
Road Safety PowerPoint Presentation
Road Safety PowerPoint PresentationRoad Safety PowerPoint Presentation
Road Safety PowerPoint PresentationRoad Safety
 
Introduction To Chemistry
Introduction To ChemistryIntroduction To Chemistry
Introduction To ChemistryOH TEIK BIN
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with DataSeth Familian
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017Drift
 
ACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
ACM Bay Area Data Mining Workshop: Pattern, PMML, HadoopACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
ACM Bay Area Data Mining Workshop: Pattern, PMML, HadoopPaco Nathan
 
Active Technologies - Portable Analytics for Everyone
Active Technologies - Portable Analytics for Everyone Active Technologies - Portable Analytics for Everyone
Active Technologies - Portable Analytics for Everyone Brian Carter
 
On the representation and reuse of machine learning (ML) models
On the representation and reuse of machine learning (ML) modelsOn the representation and reuse of machine learning (ML) models
On the representation and reuse of machine learning (ML) modelsVillu Ruusmann
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Languageaguazzel
 

Destaque (20)

Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data Keynote on 2015 Yale Day of Data
Keynote on 2015 Yale Day of Data
 
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
Clouds and Commons for the Data Intensive Science Community (June 8, 2015)
 
Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)Architectures for Data Commons (XLDB 15 Lightning Talk)
Architectures for Data Commons (XLDB 15 Lightning Talk)
 
What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care? What is a Data Commons and Why Should You Care?
What is a Data Commons and Why Should You Care?
 
Practical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large DatasetsPractical Methods for Identifying Anomalies That Matter in Large Datasets
Practical Methods for Identifying Anomalies That Matter in Large Datasets
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)
 
Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)Big Data, The Community and The Commons (May 12, 2014)
Big Data, The Community and The Commons (May 12, 2014)
 
Adversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World TalkAdversarial Analytics - 2013 Strata & Hadoop World Talk
Adversarial Analytics - 2013 Strata & Hadoop World Talk
 
How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...
How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...
How to prevent Road Accidents, Road Safety tips, Road Safety Seminar, Road Sa...
 
Landscaping Architecture
Landscaping ArchitectureLandscaping Architecture
Landscaping Architecture
 
Road Safety PowerPoint Presentation
Road Safety PowerPoint PresentationRoad Safety PowerPoint Presentation
Road Safety PowerPoint Presentation
 
GOAL SETTING POWERPOINT
GOAL SETTING POWERPOINTGOAL SETTING POWERPOINT
GOAL SETTING POWERPOINT
 
Introduction To Chemistry
Introduction To ChemistryIntroduction To Chemistry
Introduction To Chemistry
 
Culture
CultureCulture
Culture
 
Visual Design with Data
Visual Design with DataVisual Design with Data
Visual Design with Data
 
3 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 20173 Things Every Sales Team Needs to Be Thinking About in 2017
3 Things Every Sales Team Needs to Be Thinking About in 2017
 
ACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
ACM Bay Area Data Mining Workshop: Pattern, PMML, HadoopACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
ACM Bay Area Data Mining Workshop: Pattern, PMML, Hadoop
 
Active Technologies - Portable Analytics for Everyone
Active Technologies - Portable Analytics for Everyone Active Technologies - Portable Analytics for Everyone
Active Technologies - Portable Analytics for Everyone
 
On the representation and reuse of machine learning (ML) models
On the representation and reuse of machine learning (ML) modelsOn the representation and reuse of machine learning (ML) models
On the representation and reuse of machine learning (ML) models
 
PMML - Predictive Model Markup Language
PMML - Predictive Model Markup LanguagePMML - Predictive Model Markup Language
PMML - Predictive Model Markup Language
 

Semelhante a How to Lower the Cost of Deploying Analytics: An Introduction to the Portable Format for Analytics

Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
SigOpt for Hedge Funds
SigOpt for Hedge FundsSigOpt for Hedge Funds
SigOpt for Hedge FundsSigOpt
 
Utilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningUtilisation de MLflow pour le cycle de vie des projet Machine learning
Utilisation de MLflow pour le cycle de vie des projet Machine learningParis Data Engineers !
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
How to Lower the Cost of Deploying Analytics: An Introduction to the Portable Format for Analytics

  • 1. How To Lower the Cost of Deploying Analytics: An Introduction to the Portable Format for Analytics (PFA)
      Robert L. Grossman, University of Chicago and Open Data Group
      The Data Science Conference (Chicago), April 22, 2016
      rgrossman.com, @bobgrossman
  • 2. Life Cycle of Predictive Model [diagram]
      Stages: select analytic problem & approach; get and clean the data; exploratory data analysis; build model in dev/modeling environment; deploy model; deploy model in operational systems with scoring application; monitor performance (perf. data) and employ champion-challenger methodology; retire model and deploy improved model; scale up deployment.
      The stages are grouped into two phases: analytic modeling and analytic operations.
  • 3. Life Cycle of Predictive Model [same diagram as slide 2, annotated to separate the Model Env from the Deployment Env]
  • 4. Differences Between the Modeling and Deployment Environments
      •  Typically, modelers use specialized languages such as SAS, SPSS or R.
      •  Usually, developers responsible for products and services use languages such as Java, JavaScript, Python, C++, etc.
      •  This can result in significant effort to move the model from the modeling environment to the deployment environment.
  • 5. Ways to Deploy Models into Products/Services/Operations
      •  Push code.
      •  Embed a static model into a product or service.
      •  Export and import tables of scores.
      •  Export and import tables of parameters.
      •  Have the product/service interact with the model as a web or message service.
      •  Import the models into a database.
      How quickly can the model be updated?
      •  Model parameters?
      •  New features?
      •  New pre- & post-processing?
  • 6. “I write all my models in R. Why do I need a model interchange format?” (Alice, Data Scientist)
  • 8. Deploying Analytic Models [diagram]
      A Model Producer exports a model and a Model Consumer imports it, using PMML & PFA. The figure spans analytic models, analytic operations and the analytic infrastructure.
  • 9. What is a Scoring Engine?
      •  A scoring engine is a component, integrated into products or enterprise IT, that deploys analytic models in operational workflows for products and services.
      •  A Model Interchange Format is a format that supports the exporting of a model by one application and the importing of that model by another application.
      •  Model Interchange Formats include the Predictive Model Markup Language (PMML), the Portable Format for Analytics (PFA), and various in-house or custom formats.
      •  Scoring engines are integrated once, but allow applications to update models as quickly as they can read a model interchange format file.
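Slide 9 says a scoring engine is integrated once, and models are then swapped simply by reading a file. The following is a minimal Python sketch of that integration pattern. It is a toy, not a PMML or PFA engine. The class name ScoringEngine and the JSON layout (an intercept plus a map of named coefficients for a linear model) are hypothetical, chosen only for illustration.

```python
import json
import os
import tempfile
from typing import Dict


class ScoringEngine:
    """Toy scoring engine: the host application integrates it once, then
    swaps models by loading a new file. The JSON layout is hypothetical,
    a stand-in for a real interchange format such as PMML or PFA."""

    def __init__(self) -> None:
        self._model: Dict = {}

    def load(self, path: str) -> None:
        # Updating the model is just reading a file; no code is redeployed.
        with open(path) as f:
            self._model = json.load(f)

    def score(self, record: Dict[str, float]) -> float:
        # Hypothetical linear model: intercept + sum of coefficient * feature.
        coefficients = self._model["coefficients"]
        return self._model["intercept"] + sum(
            weight * record.get(name, 0.0) for name, weight in coefficients.items()
        )


# The modeling team exports version 1, later version 2; the host
# application keeps calling the same engine object.
engine = ScoringEngine()
model_path = os.path.join(tempfile.mkdtemp(), "model.json")

with open(model_path, "w") as f:
    json.dump({"intercept": 1.0, "coefficients": {"age": 0.5}}, f)
engine.load(model_path)
score_v1 = engine.score({"age": 10.0})

with open(model_path, "w") as f:
    json.dump({"intercept": 0.0, "coefficients": {"age": 0.25, "income": 1.0}}, f)
engine.load(model_path)
score_v2 = engine.score({"age": 10.0, "income": 2.0})
```

The new model adds a feature (income), but the host code does not change. Only the file does.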
  • 10. PMML Philosophy
      •  PMML is a specification of a model, not an implementation of a model.
      •  PMML allows a simple means of binding parameters to values for an agreed-upon set of data mining models & transformations.
      •  Because PMML is a specification, a compliant scoring engine must support a combinatorially large number of combinations of specifications, and it can be challenging to develop a consistent scoring engine.
  • 11. PFA Philosophy
      •  Define primitives for data transformations, data aggregations, and statistical and analytic models.
      •  Support composition of data mining primitives (which makes it easy to specify machine learning algorithms and pre-/post-processing of data).
      •  Be extensible.
      •  Be designed to be “safe” to deploy in enterprise IT operational environments.
      •  This philosophy is different from, and complementary to, that of the Predictive Model Markup Language (PMML).
  • 12. PFA Case Study 1
      •  A 20+ person data science group develops models in R, Python, Scikit-learn and MATLAB.
      •  All the data scientists export their models in PFA.
      •  The company's product imports models in PFA and runs them on its customers' data as required.
      [Diagram: Export PFA → Import PFA; widget records in, widget scores out]
  • 13. PFA Case Study 2
      •  Data scientist teams develop analytic models for an adversarial analytics project.
      •  Models are developed in Hadoop and exported in PFA every 2 weeks.
      •  Models are updated in client systems every week.
      [Diagram: Export PFA → Import PFA; event records in, event scores out; timeline labels “Weeks 1/2, 3/4, …”, “Weeks 2/3, 4/5, …” and “Weeks 2, 3, 4, 5, …”]
  • 14. PFA Functionality
      •  PFA expresses arbitrary mathematical algorithms in a tightly controlled environment.
      •  PFA has all the standard flow control of a programming language: if/then/else & for/while loops.
      •  PFA has function calls and function callbacks.
      •  PFA has algebraic data types.
      •  PFA is encoded as function calls in JSON:
           {function: [arg 1, arg 2, …, arg n]}
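To make “function calls in JSON” concrete, here is a toy Python evaluator for that shape. An expression is a single-key object: the key names a function, and the value lists the arguments. This is not PFA. The function table holds only a few operators and m.sqrt, bare strings are treated as variable names, and if/then/else takes a single expression per branch. The real PFA specification defines a far larger library, static typing and richer control flow.

```python
import json
import math
from typing import Any, Callable, Dict

# Illustrative function table; the names echo PFA's style ("+", "m.sqrt"),
# but this is not the PFA function library.
FUNCTIONS: Dict[str, Callable[..., Any]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "<": lambda a, b: a < b,
    "m.sqrt": math.sqrt,
}


def evaluate(expr: Any, env: Dict[str, Any]) -> Any:
    """Evaluate a JSON expression of the form {function: [arg 1, ..., arg n]}."""
    if expr is None or isinstance(expr, (bool, int, float)):
        return expr  # literals evaluate to themselves
    if isinstance(expr, str):
        return env[expr]  # bare strings name variables, e.g. "input"
    if isinstance(expr, dict) and "if" in expr:
        # Simplified control flow: one expression per branch.
        branch = "then" if evaluate(expr["if"], env) else "else"
        return evaluate(expr[branch], env)
    if isinstance(expr, dict) and len(expr) == 1:
        (name, args), = expr.items()
        if not isinstance(args, list):
            args = [args]  # allow a lone argument without brackets
        return FUNCTIONS[name](*[evaluate(arg, env) for arg in args])
    raise ValueError(f"unsupported expression: {expr!r}")


# The whole "model" is data: it can be shipped as a JSON file and
# evaluated by any host that embeds the evaluator.
action = json.loads(
    '{"if": {"<": ["input", 0]}, "then": 0.0, "else": {"m.sqrt": "input"}}'
)
print(evaluate(action, {"input": 9.0}))   # 3.0
print(evaluate(action, {"input": -4.0}))  # 0.0
```

Because the program is plain JSON, the host controls exactly which functions exist. That is one way to read the slide's phrase “tightly controlled environment.”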
  • 15. Example: Scoring Clusters (Source: dmg.org/pfa)
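The cluster-scoring example itself is not reproduced in this transcript. The sketch below is only a rough illustration of what such a model computes. It assumes a k-means-style model in which each cluster is a center and a record is assigned to the nearest center under Euclidean distance. The cluster ids and centers are made up.

```python
import math
from typing import Any, Callable, Dict, List, Sequence

Cluster = Dict[str, Any]
Metric = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def closest_cluster(
    datum: Sequence[float], clusters: List[Cluster], metric: Metric = euclidean
) -> Cluster:
    """Return the cluster whose center is nearest to datum under metric."""
    return min(clusters, key=lambda c: metric(datum, c["center"]))


# Hypothetical cluster centers, as they might be stored in a model's table.
clusters = [
    {"id": "low", "center": [0.0, 0.0]},
    {"id": "mid", "center": [5.0, 5.0]},
    {"id": "high", "center": [10.0, 10.0]},
]
print(closest_cluster([1.0, 2.0], clusters)["id"])  # low
print(closest_cluster([9.0, 8.0], clusters)["id"])  # high
```

The distance metric is passed in as a function, so a different metric can be substituted without touching the scoring logic.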
  • 17. Benefits of PFA
      •  PFA is based upon JSON and Avro and integrates easily into modern big data environments.
      •  PFA allows models to be easily chained and composed.
      •  PFA allows developers and users of analytic systems to pre-process inputs to models and to post-process their outputs.
      •  PFA is easily integrated with Hadoop, Spark, etc.
      •  PFA is easily integrated with Kafka, Storm, Akka and other streaming environments.
      •  PFA can be used to integrate multiple tools and applications within an analytic ecosystem.
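The chaining and pre-/post-processing benefits can be pictured as plain function composition. In PFA, the steps would themselves be PFA documents or functions. Here they are ordinary Python functions with hypothetical names and constants, so this only sketches the idea.

```python
from functools import reduce
from typing import Any, Callable


def chain(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps left to right: the output of each step feeds the next."""
    return lambda record: reduce(lambda value, step: step(value), steps, record)


# Hypothetical steps; the constants are illustrative only.
def standardize(record: dict) -> dict:
    # Pre-processing: center and scale the raw input.
    return {"x": (record["x"] - 50.0) / 10.0}


def linear_model(features: dict) -> float:
    # The model proper.
    return 2.0 * features["x"] + 1.0


def to_label(score: float) -> str:
    # Post-processing: turn a raw score into a decision.
    return "alert" if score > 3.0 else "ok"


pipeline = chain(standardize, linear_model, to_label)
print(pipeline({"x": 70.0}))  # alert  (x -> 2.0 -> score 5.0)
print(pipeline({"x": 50.0}))  # ok     (x -> 0.0 -> score 1.0)
```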
  • 18. Gaussian Process Model (1 of 5)
  • 19. Gaussian Process Model (2 of 5)
      input: {type: array, items: double}
      output: {type: array, items: double}
      cells:
        table:
          type:
            type: array
            items:
              type: record
              name: GP
              fields:
                - {name: x, type: {type: array, items: double}}
                - {name: to, type: {type: array, items: double}}
                - {name: sigma, type: {type: array, items: double}}
          init:
            - {x: [  0,   0], to: [0.01870587, 0.96812508], sigma: [0.2, 0.2]}
            - {x: [  0,  36], to: [0.00242101, 0.95369720], sigma: [0.2, 0.2]}
            - {x: [  0,  72], to: [0.13131668, 0.53822666], sigma: [0.2, 0.2]}
            ...
            - {x: [324, 324], to: [-0.6815587, 0.82271760], sigma: [0.2, 0.2]}
      action:
        model.reg.gaussianProcess:
          - input
          - {cell: table}
          - null
          - {fcn: m.kernel.rbf, fill: {gamma: 2.0}}
      Callout: the input and output of the scoring engine are expressed as Avro schemas.
      Source: dmg.org/pfa
  • 20. Gaussian Process Model (3 of 5)
      (Same PFA document as slide 19, with the table cell annotated.)
      • Type (also Avro) and value (as JSON, truncated).
      • Gaussian Process model parameters.
      Source: dmg.org/pfa
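Because both the engine's input/output and the table cell's type are Avro schemas, a value can be checked against its declared type before scoring. The Python sketch below is a minimal checker for only the Avro constructs these slides use (`double`, `string`, `array`, `record`); it is not a full Avro validator. The schema and rows come from the slide, with the rows elided there ("...") left out.

```python
from typing import Any


def conforms(datum: Any, schema: Any) -> bool:
    """Check datum against a small subset of Avro schemas: double, string, array, record."""
    if schema == "double":
        # JSON numbers (ints included) are valid doubles; booleans are not.
        return isinstance(datum, (int, float)) and not isinstance(datum, bool)
    if schema == "string":
        return isinstance(datum, str)
    if isinstance(schema, dict):
        kind = schema.get("type")
        if kind == "array":
            return isinstance(datum, list) and all(conforms(item, schema["items"]) for item in datum)
        if kind == "record":
            return isinstance(datum, dict) and all(
                field["name"] in datum and conforms(datum[field["name"]], field["type"])
                for field in schema["fields"]
            )
        return conforms(datum, kind)  # e.g. {"type": "double"}
    raise NotImplementedError(f"schema not covered by this sketch: {schema!r}")


INPUT_SCHEMA = {"type": "array", "items": "double"}
DOUBLES = {"type": "array", "items": "double"}
TABLE_TYPE = {"type": "array", "items": {"type": "record", "name": "GP", "fields": [
    {"name": "x", "type": DOUBLES},
    {"name": "to", "type": DOUBLES},
    {"name": "sigma", "type": DOUBLES},
]}}

# Rows shown on the slide; the rows it elides are omitted here.
INIT = [
    {"x": [0, 0], "to": [0.01870587, 0.96812508], "sigma": [0.2, 0.2]},
    {"x": [0, 36], "to": [0.00242101, 0.95369720], "sigma": [0.2, 0.2]},
    {"x": [0, 72], "to": [0.13131668, 0.53822666], "sigma": [0.2, 0.2]},
    {"x": [324, 324], "to": [-0.6815587, 0.82271760], "sigma": [0.2, 0.2]},
]

print(conforms(INIT, TABLE_TYPE))               # True
print(conforms([1.5, "oops"], INPUT_SCHEMA))    # False
```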
  • 21. Gaussian Process Model (4 of 5)
      (Same PFA document as slide 19, with the action annotated.)
      Calling method: parameters expressed as JSON.
      • input: get the interpolation point from the input
      • {cell: table}: get the parameters from the table
      • null: no explicit Kriging weight (universal)
      • {fcn: …}: kernel function
      Source: dmg.org/pfa
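The annotated call can be mirrored in ordinary code. The Python sketch below is a minimal Gaussian Process regression that, like `model.reg.gaussianProcess`, takes an interpolation point, a table of records with `x`, `to`, and `sigma` fields, and a kernel function. It is not PFA's algorithm. It assumes a zero prior mean (simple kriging) instead of the universal kriging selected by the `null` weight. It also assumes `sigma` is a per-output noise standard deviation, and the training data are made up.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Radial basis (squared exponential) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_predict(point: Vector, table: List[Dict], kernel: Kernel) -> List[float]:
    """Zero-mean GP posterior mean at `point`, computed one output dimension at a time.

    Assumption: sigma[d] is the noise standard deviation of output d and is
    added to the diagonal of the kernel matrix as sigma[d] ** 2.
    """
    n = len(table)
    k_star = [kernel(point, rec["x"]) for rec in table]
    prediction = []
    for d in range(len(table[0]["to"])):
        gram = [[kernel(table[i]["x"], table[j]["x"]) for j in range(n)] for i in range(n)]
        for i in range(n):
            gram[i][i] += table[i]["sigma"][d] ** 2
        y = [rec["to"][d] for rec in table]
        alpha = solve(gram, y)
        prediction.append(sum(k * a for k, a in zip(k_star, alpha)))
    return prediction


# Hypothetical 1-D training data with two outputs (not the slide's table).
TABLE = [
    {"x": [0.0], "to": [0.0, 1.0], "sigma": [0.1, 0.1]},
    {"x": [1.0], "to": [1.0, 0.0], "sigma": [0.1, 0.1]},
    {"x": [2.0], "to": [0.0, 1.0], "sigma": [0.1, 0.1]},
]


def kernel(x: Vector, y: Vector) -> float:
    return rbf(x, y, gamma=2.0)  # gamma fixed, echoing fill: {gamma: 2.0}


print(gp_predict([1.0], TABLE, kernel))  # close to [1.0, 0.0], the training targets at x = 1
```

Near a training point the prediction stays close to that point's targets (pulled slightly by the noise term). Far from all points it falls back to the zero prior mean.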
  • 22. Gaussian Process Model (5 of 5)
      • Appears declarative, but this is a function call.
          – The fourth parameter is another function: m.kernel.rbf (radial basis kernel, a.k.a. squared exponential).
          – m.kernel.rbf was intended for SVM, but is reusable anywhere.
          – One argument (gamma) is preapplied so that it fits the signature for model.reg.gaussianProcess.
      • Any kernel function could be used, including user-defined functions written with PFA "code."
      • The Gaussian Process could be used anywhere, even as a pre-processing or post-processing step.

          model.reg.gaussianProcess:
            - input
            - {cell: table}
            - null
            - {fcn: m.kernel.rbf, fill: {gamma: 2.0}}

      Source: dmg.org/pfa
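The `fill: {gamma: 2.0}` idiom is partial application. In Python the analogous tool is `functools.partial`. The sketch below pre-applies `gamma` to a three-argument RBF kernel so that it matches a two-argument kernel signature. It then shows that a user-defined kernel with that same signature can be swapped in. `gram_matrix` and `linear_kernel` are hypothetical names chosen for illustration, not PFA functions.

```python
import math
from functools import partial
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]  # the two-argument signature the consumer expects


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Radial basis kernel with an explicit gamma: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(points: List[Vector], kernel: Kernel) -> List[List[float]]:
    """Hypothetical consumer that only ever calls kernel(x, y)."""
    return [[kernel(p, q) for q in points] for p in points]


# Analogous to {fcn: m.kernel.rbf, fill: {gamma: 2.0}}: gamma is fixed up front.
rbf_gamma2 = partial(rbf, gamma=2.0)


def linear_kernel(x: Vector, y: Vector) -> float:
    """A user-defined kernel that already has the two-argument signature."""
    return sum(a * b for a, b in zip(x, y))


points = [[0.0, 0.0], [1.0, 0.0]]
print(gram_matrix(points, rbf_gamma2))
print(gram_matrix(points, linear_kernel))  # [[0.0, 0.0], [0.0, 1.0]]
```

The consumer never learns about `gamma`; fixing it beforehand is what lets one kernel interface serve both built-in and user-defined kernels.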
  • 23. Summary
      • The Portable Format for Analytics (PFA) is a model interchange format for building analytic models in one environment and deploying them in another.
      • Based upon data mining primitives.
      • Supports pre-processing, analytic models, post-processing, and composition of primitives and models.
      • You can easily add your own PFA models, since you can add your own PFA functions.
      • There is a reference implementation and thousands of compliance tests.
      • The standard is being developed by the not-for-profit Data Mining Group (DMG), which also developed PMML.
  • 24. Questions?
      For more information, see: dmg.org/pfa