© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Soji Adeshina, Machine Learning Engineer, Amazon AI
SageMaker Automatic Model Tuning
Roadmap
• Hyperparameters
• Search Based HPO
• Bayesian HPO
• Amazon SageMaker AMT
Hyperparameters
What Is a Hyperparameter?
• Hyperparameter = algorithm parameter
• A training algorithm accepts hyperparameter(s) and returns model parameters
• It affects how the algorithm behaves during the model training process
• “Any decision an algorithm author can’t make for you”
Examples of Hyperparameters
Model:
• Number of layers: 1, 2, 3, …
• Activation functions: Sigmoid, tanh, ReLU, …
Optimization:
• Method: SGD, Adam, AdaGrad, …
• Learning rate: 0.01 to 2
Data:
• Batch size: 8, 16, 32, …
• Augmentation: Resize, Normalize, Color Jitter, …
Model vs Hyperparameter Optimization
Optimize hyperparams ($\theta$): $l^* = \min_{\theta} h(\theta)$
Optimize model params ($w$): $h(\theta) = \min_{w} f(w \mid X, y, \theta)$
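The nested structure of the two problems can be sketched in a few lines of Python. This is a toy illustration, not anything from the talk: the one-dimensional least-squares model, the data, the step count, and the candidate learning rates are all assumptions chosen so that the inner loop (over $w$) and the outer loop (over $\theta$) are easy to see.

```python
# Toy example (assumed, not from the slides): fit y ≈ w * x by gradient descent.
# The learning rate plays the role of the hyperparameter theta.

def f(w, X, y):
    """Training loss f(w | X, y): mean squared error of the model y ≈ w * x."""
    return sum((w * xi - yi) ** 2 for xi, yi in zip(X, y)) / len(X)

def h(theta, X, y, steps=50):
    """Inner problem: approximately minimize f over w using gradient descent
    with learning rate theta, then return the loss that was reached."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * xi - yi) * xi for xi, yi in zip(X, y)) / len(X)
        w -= theta * grad
    return f(w, X, y)

X = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2

# Outer problem: evaluate h(theta) for a few candidates and keep the best one.
candidates = [0.001, 0.01, 0.1, 0.2]
losses = {theta: h(theta, X, y) for theta in candidates}
best_theta = min(losses, key=losses.get)
```

The outer loop only ever sees pairs of (theta, loss); it never uses a gradient with respect to theta. The largest candidate makes the inner gradient descent diverge, which is why its loss is the worst of the four.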
Blackbox Optimization
• We aim to minimize the objective function $h(\theta)$.
• We have no knowledge of what the objective function is.
• We don’t have access to the gradients of the objective function.
• All we know is what goes into the function and what comes out.
Search Based HPO
Grid Search
[Figure: grid of trial points. Axes: Learning Rate (0, 0.5, 1, 1.5, 2) and Activation (Sigmoid, ReLU, tanh)]
Grid Search - Shortcomings
• In grid search, the user specifies a finite set of values for each hyperparameter.
• Each additional hyperparameter adds a degree of freedom, which results in a combinatorial explosion.
• Assume each hyperparameter has 5 options, e.g. Learning Rate: 0, 0.5, 1, 1.5, 2:
  1 HP = 5 combinations
  2 HPs = 5*5 = 25 combinations
  3 HPs = 5*5*5 = 125 combinations
  …
  10 HPs = 5^10 = 9,765,625 combinations
  N HPs = 5^N combinations
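The growth above is easy to reproduce. The sketch below builds the small Learning Rate × Activation grid with `itertools.product` and then counts the 10-hyperparameter case; the one-hour training time and the sequential execution are assumptions used only to turn the count into wall-clock time.

```python
from itertools import product

def grid_size(num_hyperparams, options_per_hyperparam=5):
    """Number of grid-search trials when every hyperparameter has the same number of options."""
    return options_per_hyperparam ** num_hyperparams

# The two-hyperparameter grid from the figure: 5 learning rates x 3 activations.
learning_rates = [0, 0.5, 1, 1.5, 2]
activations = ["Sigmoid", "tanh", "ReLU"]
grid = list(product(learning_rates, activations))

# Ten hyperparameters with five options each, one hour per trial (assumed), run one after another.
trials = grid_size(10)
years = trials / (24 * 365)
```

Under those assumptions the exhaustive grid takes a little over 1,100 years of sequential training.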
Grid Search - Shortcomings
[Figure: grid of trial points. Axes: Learning Rate (0, 0.5, 1, 1.5, 2) and Activation (Sigmoid, ReLU, tanh)]
Some hyperparameters are more important than others.
Grid Search
[Figure: grid of trial points. Axes: Learning Rate (0, 0.5, 1, 1.5, 2) and Activation (Sigmoid, ReLU, tanh)]
Wasted compute.
Random Grid Search
[Figure: randomly sampled trial points. Axes: Learning Rate (0 to 2) and Activation (Sigmoid, ReLU, tanh)]
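A minimal random search over the same two hyperparameters might look like the sketch below. The function names, the seed handling, and especially `toy_objective` are hypothetical stand-ins: in practice the objective would be a full training run that returns a loss.

```python
import random

ACTIVATIONS = ["Sigmoid", "tanh", "ReLU"]

def random_search(objective, num_trials, seed=0):
    """Sample configurations at random and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        config = {
            "learning_rate": rng.uniform(0.0, 2.0),  # continuous range, not a fixed grid
            "activation": rng.choice(ACTIVATIONS),
        }
        loss = objective(config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best

def toy_objective(config):
    """Hypothetical stand-in for a training run: returns a made-up loss."""
    penalty = {"Sigmoid": 0.2, "tanh": 0.1, "ReLU": 0.0}[config["activation"]]
    return (config["learning_rate"] - 0.3) ** 2 + penalty

best_loss, best_config = random_search(toy_objective, num_trials=20)
```

Unlike a grid, the budget is set directly through `num_trials`, and every trial tries a new value of the learning rate rather than repeating one of five fixed values.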
Bayesian HPO
Model based Bayesian HPO
[Figure: axes Learning Rate (0, 0.5, 1, 1.5, 2) and Activation (ReLU). Legend: $h(\theta)$: true (hidden); $D$: samples; $h'(\theta)$: approximate; $c$: candidate]
• $h(\theta)$ is expensive to evaluate, so use an approximation, or surrogate model, $h'(\theta)$ instead.
• Use an acquisition function $\mathbb{E}[I(\lambda)]$ to select the next points.
Model based Bayesian HPO
• Keeps track of previous evaluations and infers expected behaviour.
• It is Bayesian in the sense that the surrogate model combines a prior probability distribution with the observed evaluations to make predictions about the posterior.
$P(Y \mid X) \propto P(X \mid Y)\, P(Y)$
• Improves our beliefs about the objective function by applying iterative learning.
Surrogate Model - Gaussian Process
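• A Gaussian process is a distribution over functions $f: \mathcal{X} \to \mathbb{R}$. At each input it gives the mean and variance of a Gaussian distribution, and any finite set of function values is jointly Gaussian:
$f(X_{t_1}), f(X_{t_2}), \dots, f(X_{t_n}) \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
• A Gaussian distribution is a distribution of random numbers that is described by a mean $\mu$ and a variance $\sigma^2$.
• Each distribution corresponds to a set of hyperparameters $\Lambda$: $\lambda_i \in \Lambda = \prod_{i=1}^{n} \Lambda_i$
• A Gaussian process is fully specified by a mean function $\mu(\lambda)$ and a covariance function $k(\lambda, \lambda')$:
$\mathcal{G}(\mu(\lambda), k(\lambda, \lambda'))$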
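For reference, conditioning a Gaussian process on observed losses gives closed-form predictions. The expressions below are the standard textbook result rather than something stated on the slide; they assume a zero prior mean and Gaussian observation noise with variance $\sigma_n^2$. The symbols $K$, $\mathbf{k}_*$, $\mathbf{y}$ and $\lambda_*$ are introduced here for illustration: $K_{ij} = k(\lambda_i, \lambda_j)$ over the evaluated points, $(\mathbf{k}_*)_i = k(\lambda_i, \lambda_*)$, $\mathbf{y}$ holds the observed losses, and $\lambda_*$ is the candidate being scored.

```latex
\mu_{\text{post}}(\lambda_*) = \mathbf{k}_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \mathbf{y},
\qquad
\sigma_{\text{post}}^2(\lambda_*) = k(\lambda_*, \lambda_*) - \mathbf{k}_*^{\top} \left(K + \sigma_n^2 I\right)^{-1} \mathbf{k}_*
```

The posterior variance shrinks near points that have already been evaluated and stays close to the prior variance far away from them.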
Gaussian Process for model of model loss
Covariance Matrix
Measures the similarity between two points and controls ‘smoothness’.
SageMaker uses a Matérn kernel with $\nu = 5/2$.
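The Matérn kernel with $\nu = 5/2$ has a simple closed form, sketched below for a single numeric hyperparameter. The length scale, the variance, and the choice of learning-rate values are assumptions for illustration; the slide does not say how SageMaker sets them.

```python
import math

def matern52(x1, x2, length_scale=1.0, variance=1.0):
    """Matern kernel with nu = 5/2 for scalar inputs:
    k(r) = variance * (1 + s + s^2 / 3) * exp(-s), with s = sqrt(5) * |x1 - x2| / length_scale."""
    s = math.sqrt(5.0) * abs(x1 - x2) / length_scale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)

def covariance_matrix(points, **kernel_args):
    """Covariance matrix with entries K[i][j] = k(points[i], points[j])."""
    return [[matern52(a, b, **kernel_args) for b in points] for a in points]

# Similarity between the learning rates used in the grid example (length scale assumed).
learning_rates = [0.0, 0.5, 1.0, 1.5, 2.0]
K = covariance_matrix(learning_rates, length_scale=0.5)
```

Entries on the diagonal equal the variance, and the similarity decays as two learning rates move further apart, which encodes the assumption that nearby configurations give similar losses.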
Acquisition Function
• Given the posterior distribution over functions, the expected improvement is:
$\mathbb{E}[I(\lambda)] = \mathbb{E}[\max(f_{\min} - Y, 0)]$
• Used as the criterion for selecting the next candidate hyperparameters for evaluation.
• Often depends on the best hyperparameters seen so far in the search.
• Controls exploration vs. exploitation in the search.
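When the surrogate's prediction at $\lambda$ is Gaussian, $Y \sim \mathcal{N}(\mu(\lambda), \sigma^2(\lambda))$, the expectation above has a well-known closed form. It is given here as a standard result under that Gaussian assumption; $\Phi$ and $\varphi$ denote the standard normal CDF and PDF, and $z$ is a helper symbol introduced for this note.

```latex
\mathbb{E}[I(\lambda)] = \left(f_{\min} - \mu(\lambda)\right) \Phi(z) + \sigma(\lambda)\, \varphi(z),
\qquad
z = \frac{f_{\min} - \mu(\lambda)}{\sigma(\lambda)}
```

The first term rewards candidates whose predicted mean is below the current best (exploitation); the second rewards candidates with high predictive uncertainty (exploration).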
Acquisition Function: Expected Improvement
[Figure: predictive distributions at two candidates $x_1$ and $x_2$ compared with the current best; $EI(x_1) > EI(x_2)$]
Using Acquisition Function
• Expected improvement [maximizing the dashed line] has two components:
  • One depends on $-\mu$ [solid line].
  • The other depends on the uncertainty, or variance, $k(\lambda, \lambda')$ [blue line].
• Therefore, the acquisition function is maximized wherever:
  • the mean, $\mu$, is low, or
  • the uncertainty, $k(\lambda, \lambda')$, is high.
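Both effects can be checked numerically. The sketch below assumes the surrogate's prediction at a candidate is Gaussian with mean `mu` and standard deviation `sigma`, and uses the standard closed form of expected improvement for minimization; the specific numbers are made up for illustration.

```python
import math

def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_min):
    """E[max(f_min - Y, 0)] for Y ~ N(mu, sigma^2), where f_min is the best loss seen so far."""
    if sigma <= 0.0:
        # No uncertainty left: the improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    return (f_min - mu) * normal_cdf(z) + sigma * normal_pdf(z)

f_min = 0.5  # current best loss (illustrative value)
ei_low_mean = expected_improvement(mu=0.4, sigma=0.1, f_min=f_min)
ei_high_mean = expected_improvement(mu=0.8, sigma=0.1, f_min=f_min)
ei_high_uncertainty = expected_improvement(mu=0.8, sigma=0.5, f_min=f_min)
```

With the same uncertainty, the candidate with the lower predicted mean scores higher; with the same (poor) predicted mean, the candidate with more uncertainty scores higher.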
Part 2: Hands On with Amazon SageMaker AMT

Editor's Notes

  1. Various data types: continuous, integer, categorical. Various ranges.
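  2. $f$ and $h$ return the loss, e.g. cross-entropy loss. We can find the gradient with respect to the model parameters $w$: first-order optimization. We can’t find the gradient with respect to $\theta$: zeroth-order optimization. Often there is no closed form.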
  3. The underlying true relationship is hidden. It costs time and money to evaluate. We must sample.
  4. Discretize.
  5. About 1,000 years for a model that takes 1 h to train.
  6. Often some hyperparameters are more important than others.
  7. Wasted compute.
  8. Can limit the number of samples.
  9. Use a quick model to choose the next point to evaluate. Use an acquisition function to choose the next point.
  10. Assumes similar points give similar results: covariance function. Gives probabilistic estimates. Closed-form expressions for the mean and variance.
  11. The most common kernel is the squared exponential kernel (Gaussian radial basis function). Matérn generalizes this. $\nu = \infty$ gives the squared exponential kernel, which is infinitely differentiable. $\nu = 5/2$ can be differentiated twice but not three times: a good default that works on a wide range of problems and is robust. There are simplifications for these cases.