SlideShare uma empresa Scribd logo
1 de 41
Practical Data Science
Implementation on AWS
Ding Li 2021.8
2
1. Analyze Datasets and Train
ML Models using AutoML
3
Data Science and Cloud
4
Register Data with AWS Glue and Query Data with Athena
5
Data Visualization
6
Statistical Bias and SageMaker Clarify
Covariant Drift: distribution of the independent variables or the features can change.
Prior Probability Drift: data distribution of your labels or the targeted variables might change.
Concept Drift: relationship between the features and the labels can change. Concept drift also
called as concept shift can happen when the definition of the label itself changes based
on
a particular feature like age or geographical location.
Measure
Class Imbalance (CI)
• Measures the imbalance in the number of examples that are provided for different facet values.
• Does a particular product category have disproportionately large number of total reviews than
any other category in the dataset?
Difference in Proportions of Labels (DPL)
• Measures the imbalance of positive outcomes between the different facet values.
• If a particular product category has disproportionately higher ratings than other categories.
Amazon SageMaker Clarify
7
Feature Importance SHAP
Rank the individual features in the order of their importance and
contribution to the final model.
SHAP (SHapley Additive exPlanations) GitHub paper YouTube
A game theoretic approach to explain the output of any machine
learning model. It connects optimal credit allocation with local
explanations using the classic Shapley values from game theory and
their related extensions
New Data Flow
Import Data
Add Data Analysis
Feature Importance
8
• Auto ML allows for experts to focus on those hard problems that can't be solved through Auto ML.
• Auto ML can reduce the repetitive work, experts can apply their domain to analyze the results
9
Automatic data pre-processing and feature engineering
• Automatic data pre-processing and feature engineering automatically fills in the missing data, provides statistical insights about columns in your dataset, and automatically
extracts information from non-numeric columns, such as date and time information from timestamps.
• Automatic ML model selection automatically infers the type of predictions that best suit your data, such as binary classification, multi-class classification, or regression. SageMaker
Autopilot then explores high-performing algorithms such as gradient boosting decision tree, feedforward deep neural networks, and logistic regression, and trains and optimizes hundreds of models based
on these algorithms to find the model that best fits your data.
• Model leaderboard can view the list of models, ranked by metrics such as accuracy, precision, recall, and area under the curve (AUC), review model details such as the impact of features on
predictions, and deploy the model that is best suited to your use case.
10
Amazon SageMaker Built-in Algorithms
11
Explore the Use Case and Analyze the Dataset:
• AWS Data Wrangler
• AWS Glue
• Amazon Athena
• Matplotlib
• Seaborn
• Pandas
• Numpy
Data Bias and Feature Importance:
• Measure Pretraining Bias - Amazon SageMaker
• SHAP
Automated Machine Learning:
• Amazon SageMaker Autopilot
Built-in algorithms:
• Elastic Machine Learning Algorithms in Amazon SageMaker
• Word2Vec algorithm
• GloVe algorithm
• FastText algorithm
• Transformer architecture, "Attention Is All You Need"
• BlazingText algorithm
• ELMo algorithm
• GPT model architecture
• BERT model architecture
• Built-in algorithms
• Amazon SageMaker BlazingText
12
2. Build, Train, and Deploy ML
Pipelines using BERT
13
• Dataset best fits the algorithm
• Improve ML model performance
Feature Engineering Steps
Feature Engineering Pipeline
Split Dataset
Feature Engineering
14
BERT Embedding
SageMaker Processing with scikit-learn
Parameters: code, processingInput, processingOutput
15
Feature Store – Reuse the feature engineering results
Centralized Reusable Discoverable
16
17
18
19
20
21
22
Artifact
• the output of a step or task can be consumed the next
step in a pipeline or deployed directly for consumption
SageMaker Pipelines
23
24
Feature Engineering and Feature Store:
• RoBERTa: A Robustly Optimized BERT Pretraining Approach
• Fundamental Techniques of Feature Engineering for Machine Learning
Train, Debug, and Profile a Machine Learning Model:
• PyTorch Hub
• TensorFlow Hub
• Hugging Face open-source NLP transformers library
• RoBERTa model
• Amazon SageMaker Model Training (Developer Guide)
• Amazon SageMaker Debugger: A system for real-time insights into machine learning model training
• The science behind SageMaker’s cost-saving Debugger
• Amazon SageMaker Debugger (Developer Guide)
• Amazon SageMaker Debugger (GitHub)
Deploy End-To-End Machine Learning Pipelines:
• A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
25
3. Optimize ML Models and Deploy
Human-in-the-Loop Pipelines
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
Advanced model training, tuning, and evaluation:
• Hyperband
• Bayesian Optimization
• Amazon SageMaker Automatic Model Tuning
Advanced model deployment, and monitoring:
• A/B Testing
• Autoscaling
• Multi-armed bandit
• Batch Transform
• Inference Pipeline
• Model Monitor
Data labeling and human-in-the-loop pipelines:
• Towards Automated Data Quality Management for Machine Learning
• Amazon SageMaker Ground Truth Developer Guide
• Create high-quality instructions for Amazon SageMaker Ground Truth labeling jobs
• Amazon SageMaker Augmented AI (Amazon A2I) Developer Guide

Mais conteúdo relacionado

Mais procurados

Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
Rajesh Kumar
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMaker
Amazon Web Services
 

Mais procurados (20)

Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기Aws glue를 통한 손쉬운 데이터 전처리 작업하기
Aws glue를 통한 손쉬운 데이터 전처리 작업하기
 
AWS Basics .pdf
AWS Basics .pdfAWS Basics .pdf
AWS Basics .pdf
 
Productionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model ServingProductionzing ML Model Using MLflow Model Serving
Productionzing ML Model Using MLflow Model Serving
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
 
Azure Data Factory Data Flow
Azure Data Factory Data FlowAzure Data Factory Data Flow
Azure Data Factory Data Flow
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Introduction to AWS Glue
Introduction to AWS GlueIntroduction to AWS Glue
Introduction to AWS Glue
 
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
LG 이노텍 - Amazon Redshift Serverless를 활용한 데이터 분석 플랫폼 혁신 과정 - 발표자: 유재상 선임, LG이노...
 
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
대용량 데이터베이스의 클라우드 네이티브 DB로 전환 시 확인해야 하는 체크 포인트-김지훈, AWS Database Specialist SA...
 
MLOps - The Assembly Line of ML
MLOps - The Assembly Line of MLMLOps - The Assembly Line of ML
MLOps - The Assembly Line of ML
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMaker
 
Data Modeling in Looker
Data Modeling in LookerData Modeling in Looker
Data Modeling in Looker
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
 
Webinar: Accelerate Your Cloud Business With CloudHealth
Webinar: Accelerate Your Cloud Business With CloudHealthWebinar: Accelerate Your Cloud Business With CloudHealth
Webinar: Accelerate Your Cloud Business With CloudHealth
 
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
使用 Amazon Athena 直接分析儲存於 S3 的巨量資料
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
How to Build a ML Platform Efficiently Using Open-Source
How to Build a ML Platform Efficiently Using Open-SourceHow to Build a ML Platform Efficiently Using Open-Source
How to Build a ML Platform Efficiently Using Open-Source
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 

Semelhante a Practical data science

Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
DataWorks Summit
 

Semelhante a Practical data science (20)

Machine Learning and AI at Oracle
Machine Learning and AI at OracleMachine Learning and AI at Oracle
Machine Learning and AI at Oracle
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Machine learning
Machine learningMachine learning
Machine learning
 
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
MLIntro_ADA.pptx
MLIntro_ADA.pptxMLIntro_ADA.pptx
MLIntro_ADA.pptx
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
MLOPS By Amazon offered and free download
MLOPS By Amazon offered and free downloadMLOPS By Amazon offered and free download
MLOPS By Amazon offered and free download
 
Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)Building machine learning inference pipelines at scale (March 2019)
Building machine learning inference pipelines at scale (March 2019)
 
AlphaPy
AlphaPyAlphaPy
AlphaPy
 
AlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonAlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in Python
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014
 
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
Unlocking DataDriven Talent Intelligence Transforming TALENTX with Industry P...
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Machine Learning Models in Production
Machine Learning Models in ProductionMachine Learning Models in Production
Machine Learning Models in Production
 
Key projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AIKey projects in AI, ML and Generative AI
Key projects in AI, ML and Generative AI
 
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
Building Machine Learning inference pipelines at scale | AWS Summit Tel Aviv ...
 

Mais de Ding Li

Mais de Ding Li (13)

Software architecture for data applications
Software architecture for data applicationsSoftware architecture for data applications
Software architecture for data applications
 
Seismic data analysis with u net
Seismic data analysis with u netSeismic data analysis with u net
Seismic data analysis with u net
 
Titanic survivor prediction by machine learning
Titanic survivor prediction by machine learningTitanic survivor prediction by machine learning
Titanic survivor prediction by machine learning
 
Find nuclei in images with U-net
Find nuclei in images with U-netFind nuclei in images with U-net
Find nuclei in images with U-net
 
Digit recognizer by convolutional neural network
Digit recognizer by convolutional neural networkDigit recognizer by convolutional neural network
Digit recognizer by convolutional neural network
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Recommendation system
Recommendation systemRecommendation system
Recommendation system
 
Generative adversarial networks
Generative adversarial networksGenerative adversarial networks
Generative adversarial networks
 
AI to advance science research
AI to advance science researchAI to advance science research
AI to advance science research
 
Machine learning with graph
Machine learning with graphMachine learning with graph
Machine learning with graph
 
Natural language processing and transformer models
Natural language processing and transformer modelsNatural language processing and transformer models
Natural language processing and transformer models
 
Great neck school budget 2016-2017 analysis
Great neck school budget 2016-2017 analysisGreat neck school budget 2016-2017 analysis
Great neck school budget 2016-2017 analysis
 
Business Intelligence and Big Data in Cloud
Business Intelligence and Big Data in CloudBusiness Intelligence and Big Data in Cloud
Business Intelligence and Big Data in Cloud
 

Último

Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
HyderabadDolls
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
HyderabadDolls
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 

Último (20)

Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...Fun all Day Call Girls in Jaipur   9332606886  High Profile Call Girls You Ca...
Fun all Day Call Girls in Jaipur 9332606886 High Profile Call Girls You Ca...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service AvailableVastral Call Girls Book Now 7737669865 Top Class Escort Service Available
Vastral Call Girls Book Now 7737669865 Top Class Escort Service Available
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 

Practical data science

  • 2. 2 1. Analyze Datasets and Train ML Models using AutoML
  • 4. 4 Register Data with AWS Glue and Query Data with Athena
  • 6. 6 Statistical Bias and SageMaker Clarify Covariant Drift: distribution of the independent variables or the features can change. Prior Probability Drift: data distribution of your labels or the targeted variables might change. Concept Drift: relationship between the features and the labels can change. Concept drift also called as concept shift can happen when the definition of the label itself changes based on a particular feature like age or geographical location. Measure Class Imbalance (CI) • Measures the imbalance in the number of examples that are provided for different facet values. • Does a particular product category have disproportionately large number of total reviews than any other category in the dataset? Difference in Proportions of Labels (DPL) • Measures the imbalance of positive outcomes between the different facet values. • If a particular product category has disproportionately higher ratings than other categories. Amazon SageMaker Clarify
  • 7. 7 Feature Importance SHAP Rank the individual features in the order of their importance and contribution to the final model. SHAP (SHapley Additive exPlanations) GitHub paper YouTube A game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions New Data Flow Import Data Add Data Analysis Feature Importance
  • 8. 8 • Auto ML allows for experts to focus on those hard problems that can't be solved through Auto ML. • Auto ML can reduce the repetitive work, experts can apply their domain to analyze the results
  • 9. 9 Automatic data pre-processing and feature engineering • Automatic data pre-processing and feature engineering automatically fills in the missing data, provides statistical insights about columns in your dataset, and automatically extracts information from non-numeric columns, such as date and time information from timestamps. • Automatic ML model selection automatically infers the type of predictions that best suit your data, such as binary classification, multi-class classification, or regression. SageMaker Autopilot then explores high-performing algorithms such as gradient boosting decision tree, feedforward deep neural networks, and logistic regression, and trains and optimizes hundreds of models based on these algorithms to find the model that best fits your data. • Model leaderboard can view the list of models, ranked by metrics such as accuracy, precision, recall, and area under the curve (AUC), review model details such as the impact of features on predictions, and deploy the model that is best suited to your use case.
  • 11. 11 Explore the Use Case and Analyze the Dataset: • AWS Data Wrangler • AWS Glue • Amazon Athena • Matplotlib • Seaborn • Pandas • Numpy Data Bias and Feature Importance: • Measure Pretraining Bias - Amazon SageMaker • SHAP Automated Machine Learning: • Amazon SageMaker Autopilot Built-in algorithms: • Elastic Machine Learning Algorithms in Amazon SageMaker • Word2Vec algorithm • GloVe algorithm • FastText algorithm • Transformer architecture, "Attention Is All You Need" • BlazingText algorithm • ELMo algorithm • GPT model architecture • BERT model architecture • Built-in algorithms • Amazon SageMaker BlazingText
  • 12. 12 2. Build, Train, and Deploy ML Pipelines using BERT
  • 13. 13 • Dataset best fits the algorithm • Improve ML model performance Feature Engineering Steps Feature Engineering Pipeline Split Dataset Feature Engineering
  • 14. 14 BERT Embedding SageMaker Processing with scikit-learn Parameters: code, processingInput, processingOutput
  • 15. 15 Feature Store – Reuse the feature engineering results Centralized Reusable Discoverable
  • 16. 16
  • 17. 17
  • 18. 18
  • 19. 19
  • 20. 20
  • 21. 21
  • 22. 22 Artifact • the output of a step or task can be consumed the next step in a pipeline or deployed directly for consumption SageMaker Pipelines
  • 23. 23
  • 24. 24 Feature Engineering and Feature Store: • RoBERTa: A Robustly Optimized BERT Pretraining Approach • Fundamental Techniques of Feature Engineering for Machine Learning Train, Debug, and Profile a Machine Learning Model: • PyTorch Hub • TensorFlow Hub • Hugging Face open-source NLP transformers library • RoBERTa model • Amazon SageMaker Model Training (Developer Guide) • Amazon SageMaker Debugger: A system for real-time insights into machine learning model training • The science behind SageMaker’s cost-saving Debugger • Amazon SageMaker Debugger (Developer Guide) • Amazon SageMaker Debugger (GitHub) Deploy End-To-End Machine Learning Pipelines: • A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
  • 25. 25 3. Optimize ML Models and Deploy Human-in-the-Loop Pipelines
  • 26. 26
  • 27. 27
  • 28. 28
  • 29. 29
  • 30. 30
  • 31. 31
  • 32. 32
  • 33. 33
  • 34. 34
  • 35. 35
  • 36. 36
  • 37. 37
  • 38. 38
  • 39. 39
  • 40. 40
  • 41. 41 Advanced model training, tuning, and evaluation: • Hyperband • Bayesian Optimization • Amazon SageMaker Automatic Model Tuning Advanced model deployment, and monitoring: • A/B Testing • Autoscaling • Multi-armed bandit • Batch Transform • Inference Pipeline • Model Monitor Data labeling and human-in-the-loop pipelines: • Towards Automated Data Quality Management for Machine Learning • Amazon SageMaker Ground Truth Developer Guide • Create high-quality instructions for Amazon SageMaker Ground Truth labeling jobs • Amazon SageMaker Augmented AI (Amazon A2I) Developer Guide