FrugalML: Using ML APIs More Accurately and Cheaply

Databricks
FrugalML: Using ML Prediction APIs More Accurately and Cheaply
Lingjiao Chen
Joint work with James Zou and Matei Zaharia
Outline
- Introduction to MLaaS
- FrugalML: How to save up to 90% using cloud ML APIs?
  - The main idea
  - How to use it
  - Empirical evaluation on real-world ML APIs
- What is next?
Copyright@Lingjiao Chen,
https://lchen001.github.io/
Machine Learning as a Service (MLaaS)
- Goal: mitigate low-level overheads, e.g., model training, data labelling, etc.
- Participants: [slide graphic]
- Value:
  - Previous: USD 1.0 billion in 2019
  - Expected: USD 8.48 billion by 2025
  - CAGR: 43% (chart axis: 2019 to 2024)
Source: Mordor Intelligence
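As a quick sanity check on the market figures above, the growth rate implied by going from USD 1.0 billion (2019) to USD 8.48 billion (2025) can be computed directly. The six-year horizon is an assumption read off the two dates on the slide (the chart axis itself is labelled 2019 to 2024), and the helper name `cagr` is purely illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.43 means 43%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Market values come from the slide; the 6-year span (2019 -> 2025)
# is an assumption, not something the slide states explicitly.
implied = cagr(1.0, 8.48, 2025 - 2019)
print(f"Implied CAGR: {implied:.1%}")
```

Under that assumption the result rounds to the 43% quoted on the slide.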
Example: facial emotion recognition (FER) via the Google Vision API
Cost: $0.0015/image
Problem: Which API to use?
- ML prediction APIs: a data point -> a label (plus a cost)
  e.g., Google API: images -> facial emotions, $0.0015/image
- Many commercial APIs offer the same functionality
- They are heterogeneous in performance and cost
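To make the per-call price concrete, the sketch below models a prediction API as a name plus a per-item price and totals the bill for a batch. The `PredictionAPI` class and the batch size of 10,000 images are illustrative assumptions, not part of the talk; only the $0.0015/image price comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionAPI:
    """Hypothetical description of a pay-per-call ML prediction API."""

    name: str
    price_per_item: float  # US dollars per data point

    def batch_cost(self, n_items: int) -> float:
        """Total cost of sending n_items data points to this API."""
        return self.price_per_item * n_items


# Price taken from the slide; the batch size is an assumed example.
google_fer = PredictionAPI(name="Google API", price_per_item=0.0015)
total = google_fer.batch_cost(10_000)
print(f"{google_fer.name}: ${total:.2f} for 10,000 images")
```

Small per-image prices add up quickly at scale, which is why the choice between services with different prices and accuracies matters.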
Our Proposed Solution: FrugalML
- Optimize for the best sequential strategy under a budget constraint
- Up to 90% cost savings, or 5% better accuracy at the same cost, across all tasks and datasets evaluated
FrugalML: How to use it?
- Call a base service first
- Take the predicted quality score (QS) and predicted label from the base service as features to decide
  - i) if the prediction should be accepted
  - ii) if and which additional API should be invoked
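The two decisions above can be sketched as a small cascade. This is a minimal illustration under stated assumptions, not the released FrugalML code: the function `frugal_predict`, the per-label `thresholds` and `routing` tables, and the toy `cheap_api` / `pricey_api` stubs are all hypothetical names introduced here.

```python
from typing import Callable, Dict, List, Tuple

# An API maps a data point to (predicted_label, quality_score).
Api = Callable[[str], Tuple[str, float]]


def frugal_predict(
    x: str,
    base: Api,
    add_ons: Dict[str, Api],
    thresholds: Dict[str, float],
    routing: Dict[str, str],
) -> Tuple[str, List[str]]:
    """Return (label, names of services called) for one data point.

    thresholds[label]: accept the base prediction when its quality
        score is at least this value (hypothetical per-label values).
    routing[label]: which add-on to call otherwise (hypothetical).
    """
    label, score = base(x)
    called = ["base"]
    if score >= thresholds.get(label, 1.0):
        return label, called  # i) accept the base prediction
    add_on_name = routing.get(label)
    if add_on_name is None:
        return label, called  # no add-on configured: keep the base label
    add_on_label, _ = add_ons[add_on_name](x)  # ii) invoke the chosen add-on
    called.append(add_on_name)
    return add_on_label, called


# Toy stand-ins for real services (assumed behaviour, for illustration).
def cheap_api(x: str) -> Tuple[str, float]:
    return ("happy", 0.95) if "smile" in x else ("sad", 0.40)


def pricey_api(x: str) -> Tuple[str, float]:
    return ("surprise", 0.90)


strategy = dict(
    add_ons={"pricey": pricey_api},
    thresholds={"happy": 0.8, "sad": 0.8},
    routing={"sad": "pricey"},
)
label_a, calls_a = frugal_predict("smile.jpg", cheap_api, **strategy)
label_b, calls_b = frugal_predict("frown.jpg", cheap_api, **strategy)
print(label_a, calls_a)
print(label_b, calls_b)
```

Only the low-confidence input pays for the second call, which is where the cost savings come from in this kind of strategy.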
FrugalML: How to use it?
[Figure: three panels labelled "FrugalML Training", "FrugalML Deploying", and "Google API Deploying"]
FrugalML: How to train it?
Goal: Pick the optimal base/add-on services, thresholds, etc.
Combinatorial optimization problem: provably efficient solver?
Statistically: How many samples are needed?
Computationally: How long does it take for training?
FrugalML: A provably efficient solver
✔ Key lemma: base/add-on services from <3 services (sparsity)
✔ An approx. solver: O(1/N) accuracy loss guarantee
✔ Sample complexity: N samples annotated by APIs
✔ Computational cost: O(N)
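The training problem can be illustrated with a deliberately simplified version: one fixed base service, one fixed add-on, and a single threshold chosen by exhaustive search over samples already annotated by both services. This is a brute-force sketch, not the provably efficient solver described in the talk; the function `pick_threshold`, the sample layout, and the costs (in arbitrary units) are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# One annotated sample: (base_correct, base_quality_score, add_on_correct).
Sample = Tuple[bool, float, bool]


def pick_threshold(
    samples: List[Sample],
    base_cost: float,
    add_on_cost: float,
    budget: float,
) -> Optional[Tuple[float, float, float]]:
    """Brute-force search for the best single acceptance threshold.

    Returns (threshold, accuracy, average_cost) for the most accurate
    threshold whose average per-sample cost stays within the budget,
    or None when no threshold is affordable.
    """
    n = len(samples)
    # 0.0 always accepts the base label; infinity always calls the add-on.
    candidates = sorted({0.0, float("inf"), *(s for _, s, _ in samples)})
    best = None
    for t in candidates:
        correct = 0
        cost = 0.0
        for base_ok, score, add_on_ok in samples:
            cost += base_cost  # the base service is always called
            if score >= t:
                correct += base_ok
            else:
                cost += add_on_cost
                correct += add_on_ok
        accuracy, avg_cost = correct / n, cost / n
        if avg_cost <= budget and (best is None or accuracy > best[1]):
            best = (t, accuracy, avg_cost)
    return best


# Toy annotated data (made up for illustration).
samples = [
    (True, 0.9, True),
    (True, 0.8, False),
    (False, 0.3, True),
    (False, 0.2, True),
]
best = pick_threshold(samples, base_cost=1.0, add_on_cost=10.0, budget=6.0)
print(best)
```

Tightening the budget pushes the search toward a lower threshold that calls the add-on less often, trading accuracy for cost.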
Learned FrugalML Strategy
Case study on a facial emotion dataset, FER+
Budget: $5 (= cheapest commercial API)
[Figure: learned strategy; cost labels on the slide: $15, $10, $0.01]
FrugalML works well in practice
Accuracy and Cost Comparison
Case study on a facial emotion dataset, FER+
[Figure: cost (dollars) and accuracy (%)]
FrugalML works well in practice
Accuracy Budget Trade-offs
Case study on a facial emotion dataset, FER+
[Figure: accuracy (%); legend: Microsoft API, GitHub API, Face++ API, Google API]
FrugalML works well in practice
FrugalML’s cost savings (%) while matching the best commercial API’s accuracy
[Figure panels: Vision, NLP, Speech]
Up to 90% cost savings, or 5% better accuracy at the same cost, across all tasks and datasets evaluated
FrugalML works well in practice
FrugalML’s accuracy improvement (%) while matching the best commercial API’s cost
[Figure panels: Vision, NLP, Speech]
Up to 90% cost savings, or 5% better accuracy at the same cost, across all tasks and datasets evaluated
FrugalML works well in practice
Conclusions and Open Problems
- Question: How to best use the ML APIs on the market within a budget?
- Our solution: FrugalML
  - Provable performance and efficiency guarantees
  - Up to 90% cost savings, or 5% better accuracy at the same cost
  - Dataset with 612,139 samples annotated by APIs and code released
- Open problems: many exist in this under-explored area
  - More complicated tasks?
  - API performance shift?
  - Other requirements (fairness, latency, …)?
Code and Data: github.com/lchen001/FrugalML
More on theoretical analysis and empirical results: please visit our project website and/or full paper!