SlideShare uma empresa Scribd logo
1 de 38
Building an A.I. Cloud
What We Learned from PredictionIO
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
machine-learning-saas
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Simon Chan
Sr. Director, Product Management, Salesforce
Co-founder, PredictionIO
PhD, University College London
simon@salesforce.com
The A.I. Developer Platform Dilemma
FlexibleSimple
“ Every prediction problem is unique.
3 Approaches to Customize Prediction
GUIAutomation
Custom
Code & Script
FlexibleSimple
10 KEY STEPS
to build-your-own A.I.
P.S. Choices = Complexity
One platform, build multiple apps. Here are 3 examples.
1. E-Commerce
Recommend
products
2. Subscription
Predict churn
3. Social Network
Detect spam
Let’s Go Beyond Textbook Tutorial
// Example from Spark ML website
import org.apache.spark.ml.classification.LogisticRegression
// Load training data
val training = sqlCtx.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
Define the Prediction Problem
Be clear about the goal
1
Define the Prediction Problem
Basic Ideas:
◦ What is the Business Goal?
▫ Better user experience
▫ Maximize revenue
▫ Automate a manual task
▫ Forecast future events
◦ What’s the input query?
◦ What’s the prediction output?
◦ What is a good prediction?
◦ What is a bad prediction?
Decide on the Presentation
It’s (still) all about human perception
2
Decide on the Presentation
NOT SPAM SPAM
NOT SPAM True Negative False Negative
SPAM False Positive True Positive
Actual
Predicted
Mailing List and Social Network, for example, may tolerate false predictions differently
Decide on the Presentation
Some UX/UI Choices:
◦ Toleration to Bad Prediction?
◦ Suggestive or Decisive?
◦ Prediction Explainability?
◦ Intelligence Visualization?
◦ Human Interactive?
◦ Score; Ranking; Comparison; Charts; Groups
◦ Feedback Interface
▫ Explicit or Implicit
Import Free-form Data Source
Life is more complicated than MNIST and MovieLens datasets
3
Import Free-form Data Sources
Some Types of Data:
◦ User Attributes
◦ Item (product/content) Attributes
◦ Activities / Events
Estimate (guess) what you need.
Import Free-form Data Sources
Some Ways to Transfer Data:
◦ Transactional versus Batch
◦ Batch Frequency
◦ Full data or Changed Delta Only
Don’t forget continuous data sanity checking
Construct Features & Labels
from Data
Make it algorithm-friendly!
4
Construct Features & Labels from Data
Some Ways of Transformation:
◦ Qualitative to Numerical
◦ Normalize and Weight
◦ Aggregate - Sum, Average?
◦ Time Range
◦ Missing Records
Different algorithms may need different things
Label-specific:
◦ Delayed Feedback
◦ Implicit Assumptions
◦ Reliability of Explicit Opinion
Construct Features & Labels from Data
Qualitative to Numerical
// Example from Spark ML website - TF-IDF
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
val sentenceData = sqlContext.createDataFrame(Seq(
(0, "Hi I heard about Spark"),
(0, "I wish Java could use case classes"),
(1, "Logistic regression models are neat")
)).toDF("label", "sentence")
val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val wordsData = tokenizer.transform(sentenceData)
val hashingTF = new HashingTF()
.setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)
val featurizedData = hashingTF.transform(wordsData)
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaledData = idfModel.transform(featurizedData)
Set Evaluation Metrics
Measure things that matter
5
Set Evaluation Metrics
Some Challenges:
◦ How to Define an Offline Evaluation that
Reflects Real Business Goal?
◦ Delayed Feedback (again)
◦ How to Present The Results to Everyone?
◦ How to Do Live A/B Test?
Clarify “Real-time”
The same word can mean different things
6
Clarify “Real-time”
Different Needs:
◦ Batch Model Update, Batch Queries
◦ Batch Model Update, Real-time Queries
◦ Real-time Model Update, Real-time Queries
When to Train/Re-train for Batch?
Find the Right Model
The “cool” modeling part - algorithms and hyperparameters
7
Find the Right Model
Example of model hyperparameter selection
// Example from Spark ML website
// We use a ParamGridBuilder to construct a grid of parameters to search over.
val paramGrid = new ParamGridBuilder()
.addGrid(hashingTF.numFeatures, Array(10, 100, 1000))
.addGrid(lr.regParam, Array(0.1, 0.01)).build()
// Note that the evaluator here is a BinaryClassificationEvaluator and its default metric
// is areaUnderROC.
val cv = new CrossValidator()
.setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator)
.setEstimatorParamMaps(paramGrid)
.setNumFolds(2) // Use 3+ in practice
// Run cross-validation, and choose the best set of parameters.
val cvModel = cv.fit(training)
Find the Right Model
Some Typical Challenges:
◦ Classification, Regression, Recommendation
or Something Else?
◦ Overfitting / Underfitting
◦ Cold-Start (New Users/Items)
◦ Data Size
◦ Noise
Serve Predictions
Time to Use the Result
8
Serve Predictions
Some Approaches:
◦ Real-time Scoring
◦ Batch Scoring
Real-time Business Logics/Filters is often added
on top.
Collect Feedback for Improvement
Machine Learning is all about “Learning”
9
Collect Feedback for Improvement
Some Mechanism:
◦ Explicit User Feedback
▫ Allow users to correct, or express opinion
on, prediction manually
◦ Implicit Feedback
▫ Learn from subsequence effects of the
previous predictions
▫ Compare predicted results with the
actual reality
Keep Monitoring & Be Crisis-Ready
Make sure things are still working
10
Keep Monitoring & Be Crisis-Ready
Some Ideas:
◦ Real-time Alert (email, slack, pagerduty)
◦ Daily Reports
◦ Possibly integrate with existing monitoring tools
◦ Ready for production rollback
◦ Version control
For both prediction accuracy and production
issues.
Summary: Processes We Need to Simplify
Define the Prediction Problem
Decide on the Presentation
Import Free-form Data Sources
Construct Features & Labels from Data
Set Evaluation Metrics
Clarify “Real-Time”
Find the Right Model
Serve Predictions
Collect Feedback for Improvement
Keep Monitoring & Be Crisis-Ready
The Future of A.I.
is the automation of A.I.
Thanks! Any Questions?
WE ARE HIRING.
simon@salesforce.com
@simonchannet
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
machine-learning-saas

Mais conteúdo relacionado

Destaque

Wellbeing week 1 jan 16
Wellbeing week 1 jan 16Wellbeing week 1 jan 16
Wellbeing week 1 jan 16mwalsh2015
 
Elena massetti foster
Elena massetti   fosterElena massetti   foster
Elena massetti fosterinfoprogetto
 
Insurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market SegmentsInsurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market SegmentsAgile Financial Technologies
 
5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back Pain5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back PainEason Chan
 
5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides24Slides
 
Verizon communications
Verizon communicationsVerizon communications
Verizon communicationsPeter Jon
 
Ways to keep allergies at bay
Ways to keep allergies at bayWays to keep allergies at bay
Ways to keep allergies at bayEason Chan
 

Destaque (7)

Wellbeing week 1 jan 16
Wellbeing week 1 jan 16Wellbeing week 1 jan 16
Wellbeing week 1 jan 16
 
Elena massetti foster
Elena massetti   fosterElena massetti   foster
Elena massetti foster
 
Insurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market SegmentsInsurance Technology As A Catalyst For Change - Emerging Market Segments
Insurance Technology As A Catalyst For Change - Emerging Market Segments
 
5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back Pain5 Yoga Poses To Help Soothe Neck And Back Pain
5 Yoga Poses To Help Soothe Neck And Back Pain
 
5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides5 Point Checklist to Create Powerful Cover Slides
5 Point Checklist to Create Powerful Cover Slides
 
Verizon communications
Verizon communicationsVerizon communications
Verizon communications
 
Ways to keep allergies at bay
Ways to keep allergies at bayWays to keep allergies at bay
Ways to keep allergies at bay
 

Mais de C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

Mais de C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Último

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Último (20)

Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

Building an AI in the Cloud

  • 1. Building an A.I. Cloud What We Learned from PredictionIO
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ machine-learning-saas
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Simon Chan Sr. Director, Product Management, Salesforce Co-founder, PredictionIO PhD, University College London simon@salesforce.com
  • 5. The A.I. Developer Platform Dilemma FlexibleSimple
  • 6. “ Every prediction problem is unique.
  • 7. 3 Approaches to Customize Prediction GUIAutomation Custom Code & Script FlexibleSimple
  • 8. 10 KEY STEPS to build-your-own A.I. P.S. Choices = Complexity
  • 9. One platform, build multiple apps. Here are 3 examples. 1. E-Commerce Recommend products 2. Subscription Predict churn 3. Social Network Detect spam
  • 10. Let’s Go Beyond Textbook Tutorial // Example from Spark ML website import org.apache.spark.ml.classification.LogisticRegression // Load training data val training = sqlCtx.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt") val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8) // Fit the model val lrModel = lr.fit(training)
  • 11. Define the Prediction Problem Be clear about the goal 1
  • 12. Define the Prediction Problem Basic Ideas: ◦ What is the Business Goal? ▫ Better user experience ▫ Maximize revenue ▫ Automate a manual task ▫ Forecast future events ◦ What’s the input query? ◦ What’s the prediction output? ◦ What is a good prediction? ◦ What is a bad prediction?
  • 13. Decide on the Presentation It’s (still) all about human perception 2
  • 14. Decide on the Presentation NOT SPAM SPAM NOT SPAM True Negative False Negative SPAM False Positive True Positive Actual Predicted Mailing List and Social Network, for example, may tolerate false predictions differently
  • 15. Decide on the Presentation Some UX/UI Choices: ◦ Toleration to Bad Prediction? ◦ Suggestive or Decisive? ◦ Prediction Explainability? ◦ Intelligence Visualization? ◦ Human Interactive? ◦ Score; Ranking; Comparison; Charts; Groups ◦ Feedback Interface ▫ Explicit or Implicit
  • 16. Import Free-form Data Source Life is more complicated than MNIST and MovieLens datasets 3
  • 17. Import Free-form Data Sources Some Types of Data: ◦ User Attributes ◦ Item (product/content) Attributes ◦ Activities / Events Estimate (guess) what you need.
  • 18. Import Free-form Data Sources Some Ways to Transfer Data: ◦ Transactional versus Batch ◦ Batch Frequency ◦ Full data or Changed Delta Only Don’t forget continuous data sanity checking
  • 19. Construct Features & Labels from Data Make it algorithm-friendly! 4
  • 20. Construct Features & Labels from Data Some Ways of Transformation: ◦ Qualitative to Numerical ◦ Normalize and Weight ◦ Aggregate - Sum, Average? ◦ Time Range ◦ Missing Records Different algorithms may need different things Label-specific: ◦ Delayed Feedback ◦ Implicit Assumptions ◦ Reliability of Explicit Opinion
  • 21. Construct Features & Labels from Data Qualitative to Numerical // Example from Spark ML website - TF-IDF import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer} val sentenceData = sqlContext.createDataFrame(Seq( (0, "Hi I heard about Spark"), (0, "I wish Java could use case classes"), (1, "Logistic regression models are neat") )).toDF("label", "sentence") val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words") val wordsData = tokenizer.transform(sentenceData) val hashingTF = new HashingTF() .setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20) val featurizedData = hashingTF.transform(wordsData) val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features") val idfModel = idf.fit(featurizedData) val rescaledData = idfModel.transform(featurizedData)
  • 22. Set Evaluation Metrics Measure things that matter 5
  • 23. Set Evaluation Metrics Some Challenges: ◦ How to Define an Offline Evaluation that Reflects Real Business Goal? ◦ Delayed Feedback (again) ◦ How to Present The Results to Everyone? ◦ How to Do Live A/B Test?
  • 24. Clarify “Real-time” The same word can mean different things 6
  • 25. Clarify “Real-time” Different Needs: ◦ Batch Model Update, Batch Queries ◦ Batch Model Update, Real-time Queries ◦ Real-time Model Update, Real-time Queries When to Train/Re-train for Batch?
  • 26. Find the Right Model The “cool” modeling part - algorithms and hyperparameters 7
  • 27. Find the Right Model Example of model hyperparameter selection // Example from Spark ML website // We use a ParamGridBuilder to construct a grid of parameters to search over. val paramGrid = new ParamGridBuilder() .addGrid(hashingTF.numFeatures, Array(10, 100, 1000)) .addGrid(lr.regParam, Array(0.1, 0.01)).build() // Note that the evaluator here is a BinaryClassificationEvaluator and its default metric // is areaUnderROC. val cv = new CrossValidator() .setEstimator(pipeline).setEvaluator(new BinaryClassificationEvaluator) .setEstimatorParamMaps(paramGrid) .setNumFolds(2) // Use 3+ in practice // Run cross-validation, and choose the best set of parameters. val cvModel = cv.fit(training)
  • 28. Find the Right Model Some Typical Challenges: ◦ Classification, Regression, Recommendation or Something Else? ◦ Overfitting / Underfitting ◦ Cold-Start (New Users/Items) ◦ Data Size ◦ Noise
  • 29. Serve Predictions Time to Use the Result 8
  • 30. Serve Predictions Some Approaches: ◦ Real-time Scoring ◦ Batch Scoring Real-time Business Logics/Filters is often added on top.
  • 31. Collect Feedback for Improvement Machine Learning is all about “Learning” 9
  • 32. Collect Feedback for Improvement Some Mechanism: ◦ Explicit User Feedback ▫ Allow users to correct, or express opinion on, prediction manually ◦ Implicit Feedback ▫ Learn from subsequence effects of the previous predictions ▫ Compare predicted results with the actual reality
  • 33. Keep Monitoring & Be Crisis-Ready Make sure things are still working 10
  • 34. Keep Monitoring & Be Crisis-Ready Some Ideas: ◦ Real-time Alert (email, slack, pagerduty) ◦ Daily Reports ◦ Possibly integrate with existing monitoring tools ◦ Ready for production rollback ◦ Version control For both prediction accuracy and production issues.
  • 35. Summary: Processes We Need to Simplify Define the Prediction Problem Decide on the Presentation Import Free-form Data Sources Construct Features & Labels from Data Set Evaluation Metrics Clarify “Real-Time” Find the Right Model Serve Predictions Collect Feedback for Improvement Keep Monitoring & Be Crisis-Ready
  • 36. The Future of A.I. is the automation of A.I.
  • 37. Thanks! Any Questions? WE ARE HIRING. simon@salesforce.com @simonchannet
  • 38. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ machine-learning-saas