Leah McGuire
Principal Member of Technical Staff, Salesforce Einstein
lmcguire@salesforce.com
@leahmcguire
Types, Types, Types!
Embracing a hierarchy of types to simplify machine learning
Let’s make sure you are in the right talk
​ What I am going to talk about:
• What does machine learning mean at Salesforce
• Problems in machine learning for business to business (B2B) companies
• Automating machine learning and how our AutoML library (Optimus Prime) works
• The utility of having strongly typed features in AutoML
• What we have learned and what we are planning
Salesforce and Machine Learning
Salesforce
The Problem
​ For the majority of businesses, data science is out of reach
Sales Cloud Einstein
Predictive Lead Scoring
Opportunity Insights
Automated Activity Capture
Building World’s Smartest CRM
Commerce Cloud Einstein
Product Recommendations
Predictive Sort
Commerce Insights
App Cloud Einstein
Heroku + PredictionIO
Predictive Vision Services
Predictive Sentiment Services
Predictive Modeling Services
Service Cloud Einstein
Recommended Case Classification
Recommended Responses
Predictive Close Time
Marketing Cloud Einstein
Predictive Scoring
Predictive Audiences
Automated Send-time Optimization
Community Cloud Einstein
Recommended Experts, Articles & Topics
Automated Service Escalation
Newsfeed Insights
Analytics Cloud Einstein
Predictive Wave Apps
Smart Data Discovery
Automated Analytics & Storytelling
IoT Cloud Einstein
Predictive Device Scoring
Recommend Best Next Action
Automated IoT Rules Optimization
Machine learning workflows
And how much more complicated they get for B2B
[Diagram: workflow stages Feature Engineering, Model Training (Model A, Model B, Model C), Model Evaluation]
​ What Kaggle would lead us to believe
Building a machine learning model
Real-life ML
Building an ML model workflow
[Diagram: workflow stages ETL, Feature Engineering, Model Training (Model A, Model B, Model C), Model Evaluation, Deployment, Scoring]
​ Over and over again
Building a machine learning model
[Diagram: the full workflow (ETL, Feature Engineering, Model Training with Models A, B and C, Model Evaluation, Deployment, Scoring) drawn three times]
We can’t build one global model
• Privacy concerns
•  Customers don’t want data cross-pollinated
• Business Use Cases
•  Industries are very different
•  Processes are different
• Platform customization
•  Ability to create custom fields and objects
• Scale, Automation,
•  Ability to create
​ Over and over again
Building a machine learning model
[Diagram: the full workflow (ETL, Feature Engineering, Model Training, Model Evaluation, Deployment, Scoring) tiled dozens of times across the slide]
Automating machine learning
Enter Einstein (and Optimus Prime)
•  ML is not magic, just statistics – generalizing examples
•  But there is a ‘black art’ to producing good models
•  Input data needs to be combined, filtered, cleaned etc.
•  Producing the best features for your model takes time
•  You can’t just throw an ML algorithm at your raw data and expect good results
Turning a black art into a paint by number kit.
Keep it DRY (don’t repeat yourself) and DRO (don’t repeat others)
• The Spark ML pipeline (estimator, transformer) model is nice
• The lack of types in Spark is not
• Want to use more than Spark ML
• Declarative and intuitive syntax – for both workflow generation and developers
• Typed reusable operations
• Multitenant application support
• All built in Scala
Optimus Prime - A library to develop reusable, modular and typed ML workflows
Simple interchangeable parts
​ val featureVector = Seq(pClass, name, gender, age, sibSp, parch, ticket, cabin, embarked).vectorize()
​ val (pred, raw, prob) = featureVector.check(survived).classify(survived)
​ val workflow = new OpWorkflow().setResultFeatures(pred).setDataReader(titanicReader)
​ In a declarative type safe syntax
[Diagram: Features are transformed with Stages (Transformers, Estimators); Stages produce Features; Stages are fitted into Workflows; Features are materialized by Workflows; Readers supply input data to Workflows]
Automating typed feature engineering and modeling (with Optimus Prime)
Features are given a type on creation
​ val gender = FeatureBuilder.Categorical[Titanic]
.extract(d => Option(d.getGender).toSet[String]).asPredictor
• Features are strongly typed
• Each stage takes specific input type(s) and returns specific output type(s)
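The two bullets above can be illustrated with a minimal sketch in plain Scala. This is not the Optimus Prime API: `Feature`, `Stage`, `Pivot`, `FillMissing` and the three value types are hypothetical names invented for the example. It only shows how typing both features and stages lets the compiler reject a stage applied to the wrong kind of feature.

```scala
object TypedStages {
  // Hypothetical feature value types; the real hierarchy is much richer.
  sealed trait FeatureType
  final case class Real(value: Option[Double]) extends FeatureType
  final case class Categorical(value: Set[String]) extends FeatureType
  final case class OPVector(value: Vector[Double]) extends FeatureType

  // A feature is a named column whose element type is known to the compiler.
  final case class Feature[T <: FeatureType](name: String, values: Seq[T])

  // A stage declares the feature type it accepts and the one it produces.
  trait Stage[I <: FeatureType, O <: FeatureType] {
    def name: String
    def transformValue(in: I): O
    def apply(in: Feature[I]): Feature[O] =
      Feature(s"${in.name}_$name", in.values.map(transformValue))
  }

  // One-hot encodes a categorical value against a fixed list of levels.
  final class Pivot(levels: Seq[String]) extends Stage[Categorical, OPVector] {
    val name = "pivot"
    def transformValue(in: Categorical): OPVector =
      OPVector(levels.map(l => if (in.value.contains(l)) 1.0 else 0.0).toVector)
  }

  // Replaces missing reals with a default value.
  final class FillMissing(default: Double) extends Stage[Real, Real] {
    val name = "filled"
    def transformValue(in: Real): Real = Real(Some(in.value.getOrElse(default)))
  }

  val gender: Feature[Categorical] = Feature(
    "gender",
    Seq(Categorical(Set("male")), Categorical(Set("female")), Categorical(Set.empty)))
  val age: Feature[Real] = Feature("age", Seq(Real(Some(22.0)), Real(None)))

  val pivot = new Pivot(Seq("male", "female"))
  val fill = new FillMissing(30.0)

  val genderPivot: Feature[OPVector] = pivot(gender)
  val ageFilled: Feature[Real] = fill(age)

  // pivot(age) // does not compile: age is a Feature[Real], not a Feature[Categorical]
}
```

Uncommenting the last line turns what would be a runtime failure in an untyped pipeline into a compile error.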
​ Death to runtime errors!
[Diagram: Stages produce Features]
Creating a workflow DAG with features
• Features point to a column of data
• The type of the feature determines which stages can act on it
[Diagram: feature DAG with nodes gender, age, name, genderPivot, title, featureVector, survived, rawPred, prob, pred]
Creating a workflow DAG with features
• When a stage acts on a feature it produces a new feature (or features)
• Keep on manipulating features until you reach your goal
[Diagram: feature DAG with nodes gender, age, name, genderPivot, title, featureVector, survived, rawPred, prob, pred, and the stages labeled Pivot, Title Regex, Combine, Model]
Done manipulating your features? Make them.
[Diagram: Features are transformed with Stages (Transformers, Estimators); Stages produce Features; Stages are fitted into Workflows; Features are materialized by Workflows; Readers supply input data to Workflows]
• Once you make your final feature you have the full DAG
• Features are materialized by the workflow
• Initial data into the workflow provided by the reader
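One way to picture "the final feature gives you the full DAG" is a sketch in which each feature remembers the stage that produced it and its parent features. The `Feature` case class and the helper functions here are hypothetical, not the library's own classes. The node and stage names follow the diagram labels on the slides; the exact edges are an assumption.

```scala
object FeatureDag {
  // Hypothetical model of a feature: its name, the stage that produced it
  // (None for raw features read from the data), and the features it was built from.
  final case class Feature(name: String, originStage: Option[String], parents: Seq[Feature])

  def raw(name: String): Feature = Feature(name, None, Nil)
  def derive(name: String, stage: String, parents: Feature*): Feature =
    Feature(name, Some(stage), parents)

  // Walks back from the result feature and lists stages so that every stage
  // appears after the stages that produce its inputs.
  def stagesInOrder(result: Feature): Seq[String] = {
    def visit(f: Feature, seen: Vector[String]): Vector[String] = {
      val afterParents = f.parents.foldLeft(seen)((acc, p) => visit(p, acc))
      f.originStage match {
        case Some(s) if !afterParents.contains(s) => afterParents :+ s
        case _ => afterParents
      }
    }
    visit(result, Vector.empty)
  }

  // Raw features the DAG depends on, i.e. what a reader would have to supply.
  def rawInputs(result: Feature): Set[String] =
    if (result.parents.isEmpty) Set(result.name)
    else result.parents.flatMap(rawInputs).toSet

  val gender = raw("gender")
  val age = raw("age")
  val name = raw("name")
  val survived = raw("survived")

  val genderPivot = derive("genderPivot", "Pivot", gender)
  val title = derive("title", "Title Regex", name)
  val featureVector = derive("featureVector", "Combine", genderPivot, age, title)
  val pred = derive("pred", "Model", survived, featureVector)
}
```

Calling `FeatureDag.stagesInOrder(FeatureDag.pred)` recovers the stage order from the result feature alone, which is roughly what a workflow needs before it can fit and run the stages.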
The power of types!
Using types to automate feature engineering
​ val featureVector = Seq(pClass, name, gender, age, sibSp, parch, ticket, cabin, embarked).vectorize()
• Each feature is mapped to an appropriate .vectorize() stage based on its type
•  gender (a Categorical) and age (a Real) are automatically assigned to different stages
• You also have an option to do the exact type safe manipulations you want
•  age can undergo special transformations if desired
•  val ageBuckets = age.bucketize(buckets(0, 10, 20, 40, 100))
•  val featureVector = Seq(pClass, name, gender, ageBuckets, sibSp, parch, ticket, cabin, embarked).vectorize()
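Type-driven vectorization can be sketched with a Scala type class, where the compiler picks the encoding from the feature's type. This illustrates the idea only and is not how `.vectorize()` is implemented in Optimus Prime. `Vectorizer`, `rows`, `combine` and this `bucketize` are hypothetical, and the encodings (mean imputation plus a missing indicator, one-hot) are assumptions chosen for brevity.

```scala
object TypeDrivenVectorize {
  sealed trait FeatureType
  final case class Real(value: Option[Double]) extends FeatureType
  final case class Categorical(value: Set[String]) extends FeatureType

  final case class Feature[T <: FeatureType](name: String, values: Seq[T])

  // Type class: how to turn a column of T into one numeric vector per row.
  trait Vectorizer[T <: FeatureType] {
    def vectorize(f: Feature[T]): Seq[Vector[Double]]
  }

  object Vectorizer {
    // Reals: impute missing values with the mean and add a "was missing" flag.
    implicit val real: Vectorizer[Real] = new Vectorizer[Real] {
      def vectorize(f: Feature[Real]): Seq[Vector[Double]] = {
        val present = f.values.flatMap(_.value)
        val mean = if (present.isEmpty) 0.0 else present.sum / present.size
        f.values.map(r => Vector(r.value.getOrElse(mean), if (r.value.isEmpty) 1.0 else 0.0))
      }
    }
    // Categoricals: one-hot encode against the sorted observed levels.
    implicit val categorical: Vectorizer[Categorical] = new Vectorizer[Categorical] {
      def vectorize(f: Feature[Categorical]): Seq[Vector[Double]] = {
        val levels = f.values.flatMap(_.value).distinct.sorted
        f.values.map(c => levels.map(l => if (c.value.contains(l)) 1.0 else 0.0).toVector)
      }
    }
  }

  // The compiler selects the Vectorizer instance from the feature's type.
  def rows[T <: FeatureType](f: Feature[T])(implicit v: Vectorizer[T]): Seq[Vector[Double]] =
    v.vectorize(f)

  // Concatenates per-feature vectors row by row.
  def combine(cols: Seq[Vector[Double]]*): Seq[Vector[Double]] =
    cols.transpose.map(_.flatten.toVector)

  // Explicit, type safe alternative: turn a Real into a Categorical bucket label.
  def bucketize(f: Feature[Real], edges: Seq[Double]): Feature[Categorical] =
    Feature(f.name + "Buckets", f.values.map { r =>
      val label = r.value.flatMap { v =>
        edges.sliding(2).collectFirst { case Seq(lo, hi) if lo <= v && v < hi => s"[$lo,$hi)" }
      }
      Categorical(label.toSet)
    })

  val gender: Feature[Categorical] =
    Feature("gender", Seq(Categorical(Set("male")), Categorical(Set("female")), Categorical(Set("male"))))
  val age: Feature[Real] = Feature("age", Seq(Real(Some(8.0)), Real(None), Real(Some(35.0))))

  val featureVector = combine(rows(gender), rows(age))
  val ageBuckets = bucketize(age, Seq(0.0, 10.0, 20.0, 40.0, 100.0))
  val bucketedVector = combine(rows(gender), rows(ageBuckets))
}
```

Because `ageBuckets` is a `Feature[Categorical]`, it is one-hot encoded without any further instruction, while the raw `age` goes through the numeric path.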
Show me the types!
Optimus Prime Type Hierarchy
[Diagram: type hierarchy rooted at FeatureType. Types shown: OPNumeric, OPCollection, OPSet, OPSortedSet, OPList, OPVector, OPMap, NonNullable, Text, Email, Base64, Phone, ID, URL, ComboBox, PickList, TextArea, TextList, DateList, DateTimeList, Integral, Real, Binary, Percent, Currency, Date, DateTime, Categorical, Ordinal, MultiPickList, BinaryMap, IntegralMap, RealMap, CategoricalMap, OrdinalMap, TextMap, StateMap, City, Street, Country, Postal Code, State, Location, Geolocation, ...]
Legend: - inheritance, bold - abstract class, italic - trait, normal - concrete class
Note: all the types are assumed to be nullable, unless the NonNullable trait is mixed in - https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/field_types.htm
Take the types away!!
• Sometimes a type is all you have
• Hierarchy allows both very specific and very general stages
• Type safety for production saves a lot of headaches
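The point about specific and general stages follows from ordinary subtyping. The sketch below uses a hypothetical two-level slice of the hierarchy, with `Email` as a kind of `Text`. `tokenize` and `emailDomain` are made-up stage functions, not library calls.

```scala
object TypeHierarchySketch {
  // Hypothetical slice of the hierarchy; values are nullable, hence Option.
  sealed trait FeatureType
  class Text(val value: Option[String]) extends FeatureType
  final class Email(value: Option[String]) extends Text(value)

  // General stage: accepts any Text, including every subtype of Text.
  def tokenize(t: Text): Seq[String] =
    t.value.toSeq.flatMap(_.toLowerCase.split("[^a-z0-9]+")).filter(_.nonEmpty)

  // Specific stage: only meaningful for an Email, so it only accepts an Email.
  def emailDomain(e: Email): Option[String] =
    e.value.flatMap { s =>
      val at = s.lastIndexOf('@')
      if (at >= 0 && at < s.length - 1) Some(s.substring(at + 1).toLowerCase) else None
    }

  val email = new Email(Some("Leah@Example.com"))
  val note = new Text(Some("Called twice, no answer"))

  val emailTokens = tokenize(email) // the general stage works on the subtype
  val domain = emailDomain(email)   // the specific stage works on Email only

  // emailDomain(note) // does not compile: a plain Text is not an Email
}
```

A generic stage written once against `Text` covers every text-like type, while a type such as `Email` unlocks extra stages that would be meaningless on free text.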
​ Why would we make this monstrosity??
Sanity Checking – the stage that checks your features
• Check data quality before modeling
• Label leakage
• Features have acceptable ranges
• The feature types allow much better checks
val checkedVector = featureVector.check(survived)
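The slides do not say which checks the sanity checker actually runs, so the following is only a guess at the flavor of such a stage. It flags numeric columns that carry no signal (constant) or that correlate almost perfectly with the label, which is a common symptom of label leakage. The function names, the 0.95 threshold and the toy data are all assumptions.

```scala
object SanityCheckSketch {
  // Pearson correlation; defined as 0.0 when either column is constant.
  def correlation(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.nonEmpty, "columns must be non-empty and equally long")
    val n = xs.size
    val mx = xs.sum / n
    val my = ys.sum / n
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sx = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
    val sy = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
    if (sx == 0.0 || sy == 0.0) 0.0 else cov / (sx * sy)
  }

  // Columns to drop before modeling: constant columns and columns that are
  // suspiciously predictive of the label (hypothetical threshold).
  def columnsToDrop(
      columns: Map[String, Seq[Double]],
      label: Seq[Double],
      maxAbsCorr: Double = 0.95): Set[String] =
    columns.collect {
      case (name, values) if values.distinct.size <= 1 => name
      case (name, values) if math.abs(correlation(values, label)) > maxAbsCorr => name
    }.toSet

  val label = Seq(0.0, 1.0, 1.0, 0.0, 1.0)
  val columns = Map(
    "age" -> Seq(22.0, 38.0, 26.0, 35.0, 54.0),
    "leakedLabel" -> label,                      // a copy of the label
    "constant" -> Seq(1.0, 1.0, 1.0, 1.0, 1.0))  // carries no information

  val dropped = columnsToDrop(columns, label)
}
```

With typed features such a check could go further, for example using a different statistic for categorical columns than for reals, which is presumably what "the feature types allow much better checks" refers to.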
Model Selection Stage - Resampling, Hyper-parameter Tuning, Comparing Models
• Many possible models for each class of problem
• Many hyperparameters for each type of model
• Finding the right model for THIS dataset makes a huge difference
val (pred, raw, prob) = checkedVector.classify(survived)
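As a rough picture of what a model selection stage does, the toy sketch below scores each candidate with k-fold cross-validation and keeps the best one. It is not the Optimus Prime implementation. The candidates are deliberately trivial: threshold rules stand in for a hyperparameter grid and a majority-class rule for a second model type. Every name here is hypothetical.

```scala
object ModelSelectionSketch {
  type Row = (Double, Int)   // (single feature, binary label)
  type Model = Double => Int // prediction function

  // A candidate is a name plus a training function.
  final case class Candidate(name: String, train: Seq[Row] => Model)

  def accuracy(model: Model, data: Seq[Row]): Double =
    data.count { case (x, y) => model(x) == y }.toDouble / data.size

  // Deterministic k-fold split: row i is held out in fold i % k.
  def folds(data: Seq[Row], k: Int): Seq[(Seq[Row], Seq[Row])] =
    (0 until k).map { fold =>
      val (test, train) = data.zipWithIndex.partition { case (_, i) => i % k == fold }
      (train.map(_._1), test.map(_._1))
    }

  // Mean held-out accuracy over the k folds.
  def crossValidate(c: Candidate, data: Seq[Row], k: Int): Double = {
    val scores = folds(data, k).map { case (train, test) => accuracy(c.train(train), test) }
    scores.sum / scores.size
  }

  def selectBest(candidates: Seq[Candidate], data: Seq[Row], k: Int): (Candidate, Double) =
    candidates.map(c => (c, crossValidate(c, data, k))).maxBy(_._2)

  // "Hyperparameter grid": one fixed-threshold rule per threshold value.
  def threshold(t: Double): Candidate =
    Candidate(s"threshold=$t", _ => x => if (x >= t) 1 else 0)

  // Second model type: always predict the majority class of the training fold.
  val majority: Candidate = Candidate("majority", train => {
    val ones = train.count(_._2 == 1)
    val cls = if (ones * 2 >= train.size) 1 else 0
    _ => cls
  })

  val data: Seq[Row] =
    Seq((1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1), (2.5, 0), (4.5, 1))

  val (best, bestScore) =
    selectBest(Seq(threshold(2.0), threshold(3.5), threshold(5.0), majority), data, k = 4)
}
```

The same loop generalizes to real estimators: swap the toy candidates for model types crossed with their parameter grids, and accuracy for whatever metric suits the problem.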
Types can save us
​ And if you don’t believe me take a look at the code
val featureVector = Seq(pClass, name, gender, age, sibSp, parch, ticket, cabin, embarked).vectorize()
val (pred, raw, prob) = featureVector.check(survived).classify(survived)
val workflow = new OpWorkflow().setResultFeatures(pred).setDataReader(titanicReader)
// The same Titanic workflow written directly against Spark ML
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col, udf}

def addFeatures(df: DataFrame): DataFrame = {
  // Create a new family size field := siblings + spouses + parents + children + self
  val familySizeUDF = udf { (sibsp: Double, parch: Double) => sibsp + parch + 1 }

  df.withColumn("fsize", familySizeUDF(col("sibsp"), col("parch"))) // <-- full freedom to overwrite
}

def fillMissing(df: DataFrame): DataFrame = {
  // Fill missing age values with average age (first() returns a Row, so extract the Double)
  val avgAge = df.agg(avg("age")).first().getDouble(0)

  // Fill missing embarked values with default "S" (i.e. Southampton)
  val embarkedUDF = udf { (e: String) => e match { case x if x == null || x.isEmpty => "S"; case x => x } }

  df.na.fill(Map("age" -> avgAge)).withColumn("embarked", embarkedUDF(col("embarked")))
}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

// Modify the dataframe
val allData = fillMissing(addFeatures(rawData)).cache() // <-- need to remember about caching
// Split the data and cache it
val Array(trainSet, testSet) = allData.randomSplit(Array(0.75, 0.25)).map(_.cache())

// Prepare categorical columns
val categoricalFeatures = Array("pclass", "sex", "embarked")
val stringIndexers = categoricalFeatures.map(colName =>
  new StringIndexer().setInputCol(colName).setOutputCol(colName + "_index").fit(allData)
)

// Concat all the features into a numeric feature vector
val allFeatures = Array("age", "sibsp", "parch", "fsize") ++ stringIndexers.map(_.getOutputCol)

val vectorAssembler = new VectorAssembler().setInputCols(allFeatures).setOutputCol("feature_vector")

// Prepare Logistic Regression estimator
val logReg = new LogisticRegression().setFeaturesCol("feature_vector").setLabelCol("survived")

// Finally build the pipeline with the stages above
val pipeline = new Pipeline().setStages(stringIndexers ++ Array(vectorAssembler, logReg))
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidationModel, CrossValidator, ParamGridBuilder}

// Cross validate our pipeline with various parameters
val paramGrid =
  new ParamGridBuilder()
    .addGrid(logReg.regParam, Array(1.0, 0.1, 0.01))
    .addGrid(logReg.maxIter, Array(10, 50, 100))
    .build()

val crossValidator =
  new CrossValidator()
    .setEstimator(pipeline) // <-- set our pipeline here
    .setEstimatorParamMaps(paramGrid)
    .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("survived"))
    .setNumFolds(3)

// Train the model & compute scores
val model: CrossValidationModel = crossValidator.fit(trainSet)
val scores: DataFrame = model.transform(testSet)

// Save the model for later use
model.save("/models/titanic-model.ml")
Where are we going and what have we learned
Key takeaways
• ML for B2B is a whole other beast
• Spark ML is great, but it needs type safety
• Simple and intuitive syntax saves you trouble down the road
• Types in ML are incredibly useful
• Scala has all the relevant facilities to provide the above
• Modularity and reusability are key
Going forward with Optimus Prime
• Going beyond Spark ML for algorithms and small scale
• Making everything smarter (feature eng, sanity checking, model selection)
• Template generation
• Improvements to developer interface
If You’re Curious …
einstein-recruiting@salesforce.com
Thank You

 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Último (20)

Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire

  • 1. Leah McGuire Principal Member of Technical Staff, Salesforce Einstein lmcguire@salesforce.com @leahmcguire Types, Types, Types! Embracing a hierarchy of types to simplify machine learning
  • 2. Let’s make sure you are in the right talk. What I am going to talk about: • What machine learning means at Salesforce • Problems in machine learning for business-to-business (B2B) companies • Automating machine learning and how our AutoML library (Optimus Prime) works • The utility of having strongly typed features in AutoML • What we have learned and what we are planning
  • 5. The Problem ​ For the majority of businesses, data science is out of reach
  • 6. Building World’s Smartest CRM. Sales Cloud Einstein: Predictive Lead Scoring, Opportunity Insights, Automated Activity Capture. Commerce Cloud Einstein: Product Recommendations, Predictive Sort, Commerce Insights. App Cloud Einstein: Heroku + PredictionIO, Predictive Vision Services, Predictive Sentiment Services, Predictive Modeling Services. Service Cloud Einstein: Recommended Case Classification, Recommended Responses, Predictive Close Time. Marketing Cloud Einstein: Predictive Scoring, Predictive Audiences, Automated Send-time Optimization. Community Cloud Einstein: Recommended Experts, Articles & Topics; Automated Service Escalation; Newsfeed Insights. Analytics Cloud Einstein: Predictive Wave Apps, Smart Data Discovery, Automated Analytics & Storytelling. IoT Cloud Einstein: Predictive Device Scoring, Recommend Best Next Action, Automated IoT Rules Optimization.
  • 7. Machine learning workflows And how much more complicated they get for B2B
  • 8. Building a machine learning model: what Kaggle would lead us to believe. [Diagram: Feature Engineering, Model Training (Model A, Model B, Model C), Model Evaluation]
  • 9. Real-life ML: building an ML model workflow. [Diagram: ETL, Feature Engineering, Model Training (Model A, Model B, Model C), Model Evaluation, Scoring, Deployment]
  • 10. Building a machine learning model: over and over again. [Diagram: the workflow of ETL, Feature Engineering, Model Training (Model A, Model B, Model C), Model Evaluation, Scoring, Deployment, repeated three times]
  • 11. We can’t build one global model • Privacy concerns: customers don’t want data cross-pollinated • Business use cases: industries are very different; processes are different • Platform customization: ability to create custom fields and objects • Scale, Automation, • Ability to create
  • 12. Building a machine learning model: over and over again. [Diagram: the workflow of ETL, Feature Engineering, Model Training (Model A, Model B, Model C), Model Evaluation, Scoring, Deployment, tiled many times across the slide]
  • 13. Automating machine learning Enter Einstein (and Optimus Prime)
  • 14. Turning a black art into a paint-by-number kit • ML is not magic, just statistics: generalizing from examples • But there is a ‘black art’ to producing good models • Input data needs to be combined, filtered, cleaned, etc. • Producing the best features for your model takes time • You can’t just throw an ML algorithm at your raw data and expect good results
  • 15. Optimus Prime: a library to develop reusable, modular and typed ML workflows. Keep it DRY (don’t repeat yourself) and DRO (don’t repeat others) • The Spark ML pipeline (estimator, transformer) model is nice • The lack of types in Spark is not • Want to use more than Spark ML • Declarative and intuitive syntax, for both workflow generation and developers • Typed reusable operations • Multitenant application support • All built in Scala
  • 16. Simple interchangeable parts, in a declarative type-safe syntax
      val featureVector = Seq(pClass, name, gender, age, sibSp, parch, ticket, cabin, embarked).vectorize()
      val (pred, raw, prob) = featureVector.check(survived).classify(survived)
      val workflow = new OpWorkflow().setResultFeatures(pred).setDataReader(titanicReader)
      [Diagram: Features are transformed with Stages, which produce Features; Stages are Transformers or Estimators, and Estimators are fitted into Transformers; Features are materialized by Workflows, whose input data comes from Readers]
  • 17. Automating typed feature engineering and modeling (with Optimus Prime)
  • 18. Features are given a type on creation. Death to runtime errors!
      val gender = FeatureBuilder.Categorical[Titanic].extract(d => Option(d.getGender).toSet[String]).asPredictor
      • Features are strongly typed • Each stage takes specific input type(s) and returns specific output type(s)
      [Diagram: Stages produce Features]
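To make the "death to runtime errors" point concrete, here is a minimal sketch of typed features in plain Scala. It is not the Optimus Prime API: `Feature`, `Stage`, `IsAdult` and `Passenger` are hypothetical stand-ins, chosen only to show how a stage that declares its input type rejects the wrong feature at compile time.

```scala
// Hypothetical, simplified stand-ins; not the Optimus Prime API.
object TypedFeatureSketch {
  // Feature value types; every value may be missing.
  sealed trait FeatureType
  final case class Real(value: Option[Double]) extends FeatureType
  final case class Categorical(value: Set[String]) extends FeatureType
  final case class Binary(value: Option[Boolean]) extends FeatureType

  // A feature is a named extractor from a raw record R to a typed value T.
  final case class Feature[R, T <: FeatureType](name: String, extract: R => T)

  // A stage declares its input and output types, so wiring mistakes fail to compile.
  trait Stage[I <: FeatureType, O <: FeatureType] {
    def operationName: String
    def transform(in: I): O
    def apply[R](f: Feature[R, I]): Feature[R, O] =
      Feature[R, O](s"${f.name}_$operationName", (r: R) => transform(f.extract(r)))
  }

  // Example stage (assumed for illustration): only accepts Real features.
  object IsAdult extends Stage[Real, Binary] {
    val operationName = "isAdult"
    def transform(in: Real): Binary = Binary(in.value.map(_ >= 18.0))
  }

  // Raw record with nullable fields, as data read from a file often is.
  final case class Passenger(gender: String, age: java.lang.Double)

  val gender: Feature[Passenger, Categorical] =
    Feature[Passenger, Categorical]("gender", p => Categorical(Option(p.gender).toSet))
  val age: Feature[Passenger, Real] =
    Feature[Passenger, Real]("age", p => Real(Option(p.age).map(_.doubleValue)))

  val isAdult: Feature[Passenger, Binary] = IsAdult(age)
  // IsAdult(gender) // does not compile: a Categorical feature is not a Real feature
}
```

The commented-out last line is the whole point: the mistake is caught by the compiler rather than by a failed job.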
  • 19. Creating a workflow DAG with features • Features point to a column of data • The type of the feature determines which stages can act on it [Diagram nodes: gender, age, name, survived, genderPivot, title, featureVector, pred, prob, rawPred]
  • 20. Creating a workflow DAG with features • When a stage acts on a feature it produces a new feature (or features) • Keep manipulating features until you reach your goal [Diagram nodes: gender, age, name, survived, genderPivot, title, featureVector, pred, prob, rawPred; stages: Pivot, Title Regex, Combine, Model]
  • 21. Done manipulating your features? Make them. • Once you make your final feature you have the full DAG • Features are materialized by the workflow • Initial data into the workflow is provided by the reader [Diagram: Features are transformed with Stages, which produce Features; Stages are Transformers or Estimators, and Estimators are fitted into Transformers; Features are materialized by Workflows, whose input data comes from Readers]
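The claim that the final feature carries the full DAG can be sketched as follows. This is an assumed, simplified model rather than the library's implementation: each `Feature` remembers the stage that produced it and its parent features, so walking back from the result recovers every stage in dependency order. The node and stage names are taken from the slide diagram; the wiring between them is a guess for illustration.

```scala
// Hypothetical model of a feature DAG; not the Optimus Prime implementation.
object FeatureDagSketch {
  // A feature knows which stage produced it and which features that stage consumed.
  final case class Feature(name: String, originStage: Option[String], parents: Seq[Feature])

  def raw(name: String): Feature = Feature(name, None, Nil)
  def stage(stageName: String, output: String, inputs: Feature*): Feature =
    Feature(output, Some(stageName), inputs)

  // Depth-first walk from the result feature: a stage is listed only after
  // all the stages it depends on.
  def stagesInOrder(result: Feature): Seq[String] = {
    val seen = scala.collection.mutable.LinkedHashSet.empty[String]
    def visit(f: Feature): Unit = {
      f.parents.foreach(visit)
      f.originStage.foreach(s => seen += s)
    }
    visit(result)
    seen.toSeq
  }

  // The raw inputs the reader has to supply for this result.
  def rawFeatures(result: Feature): Set[String] =
    if (result.parents.isEmpty) Set(result.name)
    else result.parents.flatMap(p => rawFeatures(p)).toSet

  val gender = raw("gender")
  val age = raw("age")
  val name = raw("name")
  val survived = raw("survived")
  val genderPivot = stage("Pivot", "genderPivot", gender)
  val title = stage("Title Regex", "title", name)
  val featureVector = stage("Combine", "featureVector", genderPivot, age, title)
  val pred = stage("Model", "pred", survived, featureVector)
}
```

Asking for `stagesInOrder(pred)` yields the stages a workflow would need to fit and apply, and `rawFeatures(pred)` yields the columns a reader must provide.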
  • 22. The power of types!
  • 23. Using types to automate feature engineering
      val featureVector = Seq(pClass, name, gender, age, sibSp, parch, ticket, cabin, embarked).vectorize()
      • Each feature is mapped to an appropriate .vectorize() stage based on its type: gender (a Categorical) and age (a Real) are automatically assigned to different stages
      • You also have the option to do the exact type-safe manipulations you want: age can undergo special transformations if desired
      val ageBuckets = age.bucketize(buckets(0, 10, 20, 40, 100))
      val featureVector = Seq(pClass, name, gender, ageBuckets, sibSp, parch, ticket, cabin, embarked).vectorize()
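One way to picture type-driven vectorization is a type class that the compiler resolves from the column's element type. The sketch below is an assumption about the general technique, not a description of how `.vectorize()` is implemented: the defaults chosen here (mean imputation plus a missing indicator for `Real`, one-hot encoding for `Categorical`) and the `bucketize` helper are illustrative only.

```scala
// Hypothetical sketch of type-directed vectorization; not the Optimus Prime API.
object VectorizeSketch {
  final case class Real(value: Option[Double])
  final case class Categorical(value: Option[String])

  // Type class: how to turn a whole column of T into numeric vectors.
  trait Vectorizer[T] { def vectorize(column: Seq[T]): Seq[Array[Double]] }

  object Vectorizer {
    // Assumed default for Real: fill missing values with the column mean
    // and append a "was missing" indicator.
    implicit val real: Vectorizer[Real] = new Vectorizer[Real] {
      def vectorize(column: Seq[Real]): Seq[Array[Double]] = {
        val present = column.flatMap(_.value)
        val mean = if (present.isEmpty) 0.0 else present.sum / present.size
        column.map(r => Array(r.value.getOrElse(mean), if (r.value.isEmpty) 1.0 else 0.0))
      }
    }
    // Assumed default for Categorical: one-hot encode the observed levels,
    // with a final slot for missing values.
    implicit val categorical: Vectorizer[Categorical] = new Vectorizer[Categorical] {
      def vectorize(column: Seq[Categorical]): Seq[Array[Double]] = {
        val levels = column.flatMap(_.value).distinct.sorted
        column.map { c =>
          val hot = c.value.map(s => levels.indexOf(s)).getOrElse(levels.size)
          Array.tabulate(levels.size + 1)(i => if (i == hot) 1.0 else 0.0)
        }
      }
    }
  }

  // The compiler picks the vectorizer from the element type of the column.
  def vectorize[T](column: Seq[T])(implicit v: Vectorizer[T]): Seq[Array[Double]] =
    v.vectorize(column)

  // Manual alternative for a Real column: bucket it, then treat it as Categorical.
  def bucketize(column: Seq[Real], splits: Seq[Double]): Seq[Categorical] =
    column.map(r => Categorical(r.value.flatMap { x =>
      splits.sliding(2).collectFirst { case Seq(lo, hi) if x >= lo && x < hi => s"[$lo,$hi)" }
    }))
}
```

Calling `vectorize` on a `Seq[Real]` and on a `Seq[Categorical]` runs different code with no explicit dispatch, and bucketizing first simply changes the type, and therefore the default treatment, of the column.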
  • 24. Show me the types! Optimus Prime Type Hierarchy. [Diagram, types shown: FeatureType, OPNumeric, OPCollection, OPSet, OPSortedSet, OPList, NonNullable, Text, Email, Base64, Phone, ID, URL, ComboBox, PickList, TextArea, OPVector, OPMap, BinaryMap, IntegralMap, RealMap, CategoricalMap, OrdinalMap, DateList, DateTimeList, Integral, Real, Binary, Percent, Currency, Date, DateTime, Categorical, MultiPickList, Ordinal, TextMap, TextList, City, Street, Country, Postal Code, Location, State, Geolocation, StateMap, ...] Legend: - inheritance, bold - abstract class, italic - trait, normal - concrete class. Note: all the types are assumed to be nullable, unless the NonNullable trait is mixed in - https://developer.salesforce.com/docs/atlas.en-us.api.meta/api/field_types.htm
  • 25. Take the types away!! Why would we make this monstrosity?? • Sometimes a type is all you have • Hierarchy allows both very specific and very general stages • Type safety for production saves a lot of headaches [Diagram: the Optimus Prime Type Hierarchy from slide 24, repeated]
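The remark that a hierarchy allows both very specific and very general stages can be illustrated with ordinary subtyping. The slice of the hierarchy below (`Text` with `Email` and `Phone` beneath it) follows the type names on the slide, but the class definitions and both functions are hypothetical.

```scala
// Hypothetical slice of a feature type hierarchy; the real classes may differ.
object TypeHierarchySketch {
  sealed trait FeatureType
  class Text(val value: Option[String]) extends FeatureType
  final class Email(value: Option[String]) extends Text(value)
  final class Phone(value: Option[String]) extends Text(value)

  // General stage: accepts Text and every subtype of it.
  def textLength(t: Text): Option[Int] = t.value.map(_.length)

  // Specific stage: only meaningful for an Email, so only Email is accepted.
  def emailDomain(e: Email): Option[String] =
    e.value.flatMap { s =>
      val at = s.lastIndexOf('@')
      if (at >= 0 && at < s.length - 1) Some(s.substring(at + 1)) else None
    }

  // emailDomain(new Phone(Some("555-0100"))) // does not compile: Phone is not an Email
}
```

A general stage such as `textLength` is written once and reused for every text-like type, while `emailDomain` stays unavailable wherever it would be meaningless.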
  • 26. Sanity Checking: the stage that checks your features • Check data quality before modeling • Label leakage • Features have acceptable ranges • The feature types allow much better checks
      val checkedVector = featureVector.check(survived)
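The slides do not say how the sanity checker detects label leakage, so the following is only one plausible approach, stated as an assumption: flag numeric columns whose Pearson correlation with the label is suspiciously high, and, as a simple data quality check, columns that never vary. The function names and the 0.95 threshold are made up for this sketch.

```scala
// Hypothetical sanity checks; the actual checks in Optimus Prime are not shown in the slides.
object SanityCheckSketch {
  // Pearson correlation between two equally long numeric columns.
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.nonEmpty && xs.size == ys.size, "columns must be non-empty and of equal length")
    val n = xs.size
    val mx = xs.sum / n
    val my = ys.sum / n
    val cov = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sx = math.sqrt(xs.map(x => (x - mx) * (x - mx)).sum)
    val sy = math.sqrt(ys.map(y => (y - my) * (y - my)).sum)
    if (sx == 0.0 || sy == 0.0) 0.0 else cov / (sx * sy)
  }

  // Names of columns to review: constant columns (no information) and columns
  // that track the label almost perfectly (possible leakage).
  def suspectColumns(
      features: Map[String, Seq[Double]],
      label: Seq[Double],
      maxAbsCorr: Double = 0.95): Set[String] =
    features.toSeq.collect {
      case (name, col) if col.distinct.size <= 1 => name
      case (name, col) if math.abs(pearson(col, label)) > maxAbsCorr => name
    }.toSet
}
```

With typed features a checker could go further than this numeric-only version, for example by choosing a different association measure for categorical columns.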
  • 27. Model Selection Stage: Resampling, Hyper-parameter Tuning, Comparing Models • Many possible models for each class of problem • Many hyperparameters for each type of model • Finding the right model for THIS dataset makes a huge difference
      val (pred, raw, prob) = checkedFeatureVector.classify(survived)
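As a rough picture of what a model selection stage has to do, the sketch below runs k-fold cross-validation over a handful of candidates and keeps the best one. It is a toy under stated assumptions, not the library's selector: the single-feature `Example`, the threshold classifier standing in for a model family with one hyperparameter, and the majority-class baseline are all invented for illustration.

```scala
// Hypothetical model selection by cross-validation; not the Optimus Prime selector.
object ModelSelectionSketch {
  type Example = (Double, Boolean) // (single feature, label)
  type Model = Double => Boolean
  final case class Candidate(name: String, train: Seq[Example] => Model)

  def accuracy(model: Model, data: Seq[Example]): Double =
    data.count { case (x, y) => model(x) == y }.toDouble / data.size

  // k-fold cross-validation: fold i holds out every k-th example starting at index i.
  def crossValidate(c: Candidate, data: Seq[Example], k: Int): Double = {
    require(k >= 2 && k <= data.size, "need 2 <= k <= number of examples")
    val indexed = data.zipWithIndex
    val scores = (0 until k).map { fold =>
      val (test, train) = indexed.partition { case (_, i) => i % k == fold }
      accuracy(c.train(train.map(_._1)), test.map(_._1))
    }
    scores.sum / k
  }

  // Compare every candidate (model type and hyperparameter setting) on the same folds.
  def selectBest(candidates: Seq[Candidate], data: Seq[Example], k: Int): Candidate =
    candidates.maxBy(c => crossValidate(c, data, k))

  // Toy "model family": a fixed threshold, with the threshold as its hyperparameter.
  def threshold(t: Double): Candidate =
    Candidate(s"threshold=$t", _ => (x: Double) => x >= t)

  // Baseline: always predict the majority class of the training fold.
  val majority: Candidate = Candidate("majority", train => {
    val positive = train.count(_._2) * 2 >= train.size
    (_: Double) => positive
  })
}
```

A real selector would search over genuine model families and larger hyperparameter grids, but the shape of the loop (resample, fit, score, compare) is the same.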
  • 28. Types can save us: and if you don’t believe me take a look at the code
      val featureVector = Seq(pClass, name, gender, age, sibSp, parch, ticket, cabin, embarked).vectorize()
      val (pred, raw, prob) = featureVector.check(survived).classify(survived)
      val workflow = new OpWorkflow().setResultFeatures(pred).setDataReader(titanicReader)
  • 29. Types can save us: and if you don’t believe me take a look at the code [the Optimus Prime code from slide 28 is repeated, followed by Spark code:]
      import org.apache.spark.sql.DataFrame
      import org.apache.spark.sql.functions.{avg, col, udf}
      def addFeatures(df: DataFrame): DataFrame = {
        // Create a new family size field := siblings + spouses + parents + children + self
        val familySizeUDF = udf { (sibsp: Double, parch: Double) => sibsp + parch + 1 }
        df.withColumn("fsize", familySizeUDF(col("sibsp"), col("parch"))) // <-- full freedom to overwrite
      }
      def fillMissing(df: DataFrame): DataFrame = {
        // Fill missing age values with average age (read the aggregate out of the single result row)
        val avgAge = df.select("age").agg(avg("age")).first().getDouble(0)
        // Fill missing embarked values with default "S" (i.e. Southampton)
        val embarkedUDF = udf { (e: String) => e match { case x if x == null || x.isEmpty => "S"; case x => x } }
        df.na.fill(Map("age" -> avgAge)).withColumn("embarked", embarkedUDF(col("embarked")))
      }
  • 30. Types can save us: and if you don’t believe me take a look at the code
      import org.apache.spark.ml.Pipeline
      import org.apache.spark.ml.classification.LogisticRegression
      import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
      // Modify the dataframe
      val allData = fillMissing(addFeatures(rawData)).cache() // <-- need to remember about caching
      // Split the data and cache it
      val Array(trainSet, testSet) = allData.randomSplit(Array(0.75, 0.25)).map(_.cache())
      // Prepare categorical columns
      val categoricalFeatures = Array("pclass", "sex", "embarked")
      val stringIndexers = categoricalFeatures.map(colName =>
        new StringIndexer().setInputCol(colName).setOutputCol(colName + "_index").fit(allData)
      )
      // Concat all the features into a numeric feature vector
      val allFeatures = Array("age", "sibsp", "parch", "fsize") ++ stringIndexers.map(_.getOutputCol)
      val vectorAssembler = new VectorAssembler().setInputCols(allFeatures).setOutputCol("feature_vector")
      // Prepare Logistic Regression estimator
      val logReg = new LogisticRegression().setFeaturesCol("feature_vector").setLabelCol("survived")
      // Finally build the pipeline with the stages above
      val pipeline = new Pipeline().setStages(stringIndexers ++ Array(vectorAssembler, logReg))
  • 31. Types can save us: and if you don’t believe me take a look at the code
      import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
      import org.apache.spark.ml.tuning.{CrossValidator, CrossValidatorModel, ParamGridBuilder}
      // Cross validate our pipeline with various parameters
      val paramGrid =
        new ParamGridBuilder()
          .addGrid(logReg.regParam, Array(1.0, 0.1, 0.01))
          .addGrid(logReg.maxIter, Array(10, 50, 100))
          .build()
      val crossValidator =
        new CrossValidator()
          .setEstimator(pipeline) // <-- set our pipeline here
          .setEstimatorParamMaps(paramGrid)
          .setEvaluator(new BinaryClassificationEvaluator().setLabelCol("survived"))
          .setNumFolds(3)
      // Train the model & compute scores
      val model: CrossValidatorModel = crossValidator.fit(trainSet)
      val scores: DataFrame = model.transform(testSet)
      // Save the model for later use
      model.save("/models/titanic-model.ml")
  • 32. Where are we going and what have we learned
  • 33. Key takeaways • ML for B2B is a whole other beast • Spark ML is great, but it needs type safety • Simple and intuitive syntax saves you trouble down the road • Types in ML are incredibly useful • Scala has all the relevant facilities to provide the above • Modularity and reusability are key
  • 34. Going forward with Optimus Prime • Going beyond Spark ML for algorithms and small scale • Making everything smarter (feature eng, sanity checking, model selection) • Template generation • Improvements to developer interface
  • 35. If You’re Curious … einstein-recruiting@salesforce.com