ALEXIS PERRIER
@alexip
Linkedin.com/in/alexisperrier
Data Scientist
Slides:
AWS MACHINE LEARNING
PLAN
▸ AWS Machine Learning: predictive analytics
▸ Simple, efficient, but somewhat limited
▸ Auto-ML on AWS Marketplace
DATA SCIENCE TAKES TIME AND RESOURCES
10 FALLACIES OF DATA SCIENCE
Shane Brennan https://medium.com/towards-data-science/the-ten-fallacies-of-data-science-9b2af78a1862
1. Exists
2. Accessible
3. Consistent
4. Relevant
5. Understandable
6. Processable
7. Reproducibility
8. Compliance and security
9. Results are understood
10. Expected outcomes
Data scientist ROI
▸ Access
▸ Massage
▸ Train models
▸ Production
▸ Presentation
The plan
“AWS WANTS TO PUT MACHINE LEARNING IN REACH OF ANY DEVELOPER”
April 2015 - TechCrunch
MACHINE LEARNING AS A SERVICE
AWS DATA ECOSYSTEM
AWS MACHINE LEARNING
WHAT IT DOES
▸ Supervised Predictive Analytics
▸ On structured data and text
▸ Predicts an outcome as a function of the variables; the ground truth is known on a subset of the data
WHAT IT DOES NOT
▸ Unsupervised learning
▸ Reinforcement learning
▸ Deep learning
WORKFLOW - AWS ML PROJECT
1. Create a datasource from S3, RDS or Redshift
2. Transform the data with recipes (optional)
3. Train a model
4. Evaluate
5. Create endpoints
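The five steps map onto calls of the `machinelearning` client in boto3, the Python SDK. The sketch below only builds the requests, so it can be read and tested without an AWS account; the IDs, names and the 70/30 split are hypothetical choices, and it assumes access to the (legacy) Amazon Machine Learning service. The create calls are asynchronous: each returns immediately and the service processes the object in the background.

```python
import json


def build_workflow(data_s3_uri, schema_s3_uri, prefix="titanic",
                   model_type="BINARY", train_percent=70):
    """Return (method name, keyword arguments) pairs for the five workflow steps.

    Method and parameter names follow the boto3 `machinelearning` client;
    every ID derived from `prefix` is a hypothetical placeholder.
    """
    train_split = {"splitting": {"percentBegin": 0, "percentEnd": train_percent}}
    # complement=True selects the rows outside the range: the held-out part
    eval_split = {"splitting": {"percentBegin": 0, "percentEnd": train_percent,
                                "complement": True}}
    calls = []
    # Step 1: two datasources over the same S3 file, one per split
    for suffix, split in (("train", train_split), ("eval", eval_split)):
        calls.append(("create_data_source_from_s3", {
            "DataSourceId": f"{prefix}-ds-{suffix}",
            "DataSourceName": f"{prefix} {suffix}",
            "DataSpec": {
                "DataLocationS3": data_s3_uri,
                "DataSchemaLocationS3": schema_s3_uri,
                "DataRearrangement": json.dumps(split),
            },
            "ComputeStatistics": True,
        }))
    # Steps 2-3: no Recipe/RecipeUri given, so (as we read the API reference)
    # the service falls back to its default, suggested recipe
    calls.append(("create_ml_model", {
        "MLModelId": f"{prefix}-model",
        "MLModelName": f"{prefix} model",
        "MLModelType": model_type,
        "TrainingDataSourceId": f"{prefix}-ds-train",
    }))
    # Step 4: evaluate on the held-out datasource
    calls.append(("create_evaluation", {
        "EvaluationId": f"{prefix}-eval",
        "EvaluationName": f"{prefix} evaluation",
        "MLModelId": f"{prefix}-model",
        "EvaluationDataSourceId": f"{prefix}-ds-eval",
    }))
    # Step 5: expose the model through a real-time endpoint
    calls.append(("create_realtime_endpoint", {"MLModelId": f"{prefix}-model"}))
    return calls


def run_workflow(client, calls):
    """Send each request to `client`, e.g. boto3.client("machinelearning")."""
    return [getattr(client, name)(**kwargs) for name, kwargs in calls]
```

Keeping request construction separate from execution makes it easy to inspect or log what will be sent before calling `run_workflow(boto3.client("machinelearning"), calls)`.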
DATASOURCE
▸ AWS extracts the schema
▸ AWS analyzes the data
▸ Provides simple visualizations
▸ Offers default transformations
▸ Sources: S3, Redshift, RDS (CLI/SDK only)
TITANIC DATASET - DEFAULT SCHEMA
TITANIC DATASET - SIMPLE ANALYSIS
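AWS infers the schema itself; the sketch below only illustrates what such a schema contains, using a simple type-guessing heuristic of our own (not AWS's actual rules) and the JSON layout of Amazon ML schema files (`version`, `rowId`, `targetAttributeName`, `dataFormat`, `dataFileContainsHeader`, `attributes`). The sample is three rows in the format of the commonly used Kaggle Titanic training CSV. A per-column missing-value count stands in for the "simple analysis".

```python
import csv
import io

TITANIC_SAMPLE = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
"""


def _is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_schema(csv_text, target, row_id=None):
    """Guess an Amazon ML-style schema plus per-column missing-value counts (heuristic)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    attributes, missing = [], {}
    for name in reader.fieldnames:
        values = [row[name] for row in rows if row[name] != ""]
        missing[name] = len(rows) - len(values)
        if name == target and values and set(values) <= {"0", "1"}:
            # only the target is tested for BINARY, so small counts such as
            # SibSp are not mislabelled when the sample happens to be 0/1
            kind = "BINARY"
        elif values and all(_is_number(v) for v in values):
            kind = "NUMERIC"
        elif any(len(v.split()) >= 3 for v in values):
            kind = "TEXT"  # multi-word free text such as passenger names
        else:
            kind = "CATEGORICAL"
        attributes.append({"attributeName": name, "attributeType": kind})
    schema = {
        "version": "1.0",
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": attributes,
    }
    if row_id is not None:
        schema["rowId"] = row_id
    return schema, missing
```

On a sample this small, a heuristic is easily fooled (Pclass comes out NUMERIC rather than CATEGORICAL), which is one reason to review the suggested schema before training.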
SCHEMA, RECIPES AND FEATURES
▸ From the data, AWS suggests a set of transformations, called a recipe
▸ 7 transformations are available:
▸ Text: n-gram, orthogonal sparse bigram (OSB), lowercase, punctuation removal
▸ TF-IDF by default; no stop-word removal, POS tagging, lemmatization, …
▸ Categorical: Cartesian product
▸ Numeric: normalization, quantile binning
▸ Quantile binning captures non-linearities in continuous variables by turning numeric values into categories
▸ The recipe is downloadable
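A recipe is a JSON document with `groups`, `assignments` and `outputs` sections. The recipe below is a hypothetical one for the Titanic columns (the bin count, OSB window and chosen interactions are illustrative assumptions); the helper checks that it only uses the seven transformations listed above, and the JSON string is what would be passed as the `Recipe` argument when creating a model.

```python
import json
import re

# The seven transformations from the slide, by their recipe function names
SEVEN_TRANSFORMS = {"ngram", "osb", "lowercase", "no_punct",
                    "cartesian", "normalize", "quantile_bin"}

# Hypothetical recipe for the Titanic columns
titanic_recipe = {
    "groups": {"FAMILY": "group(SibSp, Parch)"},
    "assignments": {"binned_age": "quantile_bin(Age, 10)"},
    "outputs": [
        "ALL_BINARY",                         # predefined group: binary attributes
        "ALL_CATEGORICAL",                    # predefined group: categorical attributes
        "FAMILY",                             # numeric group passed through unchanged
        "binned_age",                         # Age turned into 10 quantile categories
        "normalize(Fare)",                    # Fare rescaled to mean 0, variance 1
        "cartesian(Sex, binned_age)",         # interaction feature
        "osb(lowercase(no_punct(Name)), 3)",  # text features from the name
    ],
}


def functions_used(recipe):
    """Names of the functions called in the assignments and outputs sections."""
    text = " ".join(list(recipe.get("assignments", {}).values()) + recipe["outputs"])
    return set(re.findall(r"([a-z_]+)\(", text))


recipe_json = json.dumps(titanic_recipe, indent=2)
```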
TITANIC DATASET - DATA TRANSFORMATION
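The two numeric transformations can be sketched locally. Quantile binning places roughly the same number of rows in each bin, so a linear model can learn a separate weight per range of Age or Fare (a non-linear effect); normalization rescales to mean 0 and variance 1. This is a plain-Python illustration, not AWS's implementation, and its tie handling (equal values always share a bin) is our own choice.

```python
import bisect
import statistics


def quantile_edges(values, n_bins):
    """Cut points splitting the sorted values into n_bins groups of roughly equal size."""
    ordered = sorted(values)
    return [ordered[k * len(ordered) // n_bins] for k in range(1, n_bins)]


def quantile_bin(values, n_bins):
    """Map each value to a bin index 0..n_bins-1."""
    edges = quantile_edges(values, n_bins)
    # bin index = number of cut points <= value, so equal values share a bin
    return [bisect.bisect_right(edges, v) for v in values]


def normalize(values):
    """Rescale to mean 0 and variance 1 (z-score)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]
```

For example, the illustrative ages `[22, 38, 26, 35, 35, 54, 2, 27]` split into four bins of two passengers each.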
TRAIN YOUR MODEL
STOCHASTIC GRADIENT DESCENT
One model to rule them all
▸ Simple
▸ Regularization
▸ Epochs
▸ Shuffling
▸ That’s it!
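To make the three knobs concrete, here is a minimal SGD trainer for binary logistic regression in plain Python. It is a generic textbook version under our own assumptions (fixed learning rate, unregularized bias), not AWS's implementation: epochs are full passes over the data, shuffling changes the visiting order on each pass, and L2 regularization adds `l2 * w` to each weight's gradient.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(X, y, epochs=10, lr=0.1, l2=1e-4, shuffle=True, seed=0):
    """Binary logistic regression trained with plain SGD and L2 regularization."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):          # one epoch = one full pass over the data
        if shuffle:
            rng.shuffle(order)       # visit rows in a new random order each pass
        for i in order:
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i]))) - y[i]
            # log-loss gradient (err * x) plus the L2 penalty gradient (l2 * w)
            w = [wj - lr * (err * xj + l2 * wj) for wj, xj in zip(w, X[i])]
            b -= lr * err            # the bias is left unregularized
    return w, b


def predict_proba(w, b, x):
    """Probability of the positive class for one row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```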
STOCHASTIC GRADIENT DESCENT - TUNING
Epochs
Shuffling
Regularization
Stochastic Gradient Descent in scikit-learn
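In the API, these tuning options are passed as a string-valued `Parameters` map when creating the model, under keys such as `sgd.maxPasses`, `sgd.shuffleType`, `sgd.l1RegularizationAmount` and `sgd.l2RegularizationAmount`. The helper below is a hypothetical convenience wrapper; it treats L1 and L2 as mutually exclusive, which is our reading of the API reference, and it deliberately does not encode value ranges or defaults.

```python
def sgd_parameters(max_passes=10, shuffle=True, l1=None, l2=None):
    """Build the string-valued Parameters map for CreateMLModel (hypothetical helper)."""
    if int(max_passes) < 1:
        raise ValueError("sgd.maxPasses must be a positive integer")
    if l1 is not None and l2 is not None:
        # treated here as one regularization type per model
        raise ValueError("choose either L1 or L2 regularization, not both")
    params = {
        "sgd.maxPasses": str(int(max_passes)),             # epochs
        "sgd.shuffleType": "auto" if shuffle else "none",  # shuffling
    }
    if l1 is not None:
        params["sgd.l1RegularizationAmount"] = str(float(l1))
    if l2 is not None:
        params["sgd.l2RegularizationAmount"] = str(float(l2))
    return params

# Rough scikit-learn counterparts, for comparison only (scaling may differ):
# sgd.maxPasses ~ SGDClassifier(max_iter=...), sgd.shuffleType ~ shuffle=True/False,
# L1/L2 amount ~ penalty="l1"/"l2" with alpha=...
```

The result is meant to be passed as `Parameters=sgd_parameters(...)` alongside the other `create_ml_model` arguments.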
CONVERGENCE - POWERFUL QUANTILE BINNING
[Figure: accuracy plots]
EVALUATION
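For a binary model, the evaluation reports AUC and lets you move the score cut-off to trade precision against recall. The sketch below recomputes those quantities from labels and scores so the numbers can be checked offline. It uses the pairwise (Mann-Whitney) definition of AUC, which is O(n²) and fine for illustration only, and counts scores at or above the cut-off as positive, a convention chosen here.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _ratio(num, den):
    return num / den if den else 0.0


def threshold_metrics(labels, scores, cutoff=0.5):
    """Accuracy, precision, recall and false positive rate at a given score cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
        "false_positive_rate": _ratio(fp, fp + tn),
    }
```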
TEXT
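The text transformations can be mimicked in a few lines. The sketch follows our reading of the Amazon ML documentation: `ngram(text, n)` yields all word sequences of length 1 to n, and `osb(text, window)` pairs each word with each of the next `window - 1` words, the number of underscores encoding the distance. The punctuation stripping (only at token edges) is a simplifying assumption, not necessarily AWS's exact rule.

```python
import string


def no_punct(text):
    """Simplification: strip punctuation from the edges of each whitespace token."""
    tokens = (t.strip(string.punctuation) for t in text.split())
    return " ".join(t for t in tokens if t)


def ngrams(text, n):
    """All word n-grams of size 1..n."""
    tokens = text.split()
    return [" ".join(tokens[i:i + size])
            for size in range(1, n + 1)
            for i in range(len(tokens) - size + 1)]


def osb(text, window):
    """Orthogonal sparse bigrams: word pairs up to window-1 apart, distance as underscores."""
    tokens = text.split()
    return [tokens[i] + "_" * gap + tokens[i + gap]
            for i in range(len(tokens))
            for gap in range(1, window)
            if i + gap < len(tokens)]
```

OSB keeps some word-order information beyond adjacent pairs while producing fewer features than full n-grams of the same span.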
ENDPOINT - STREAMING
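Once the real-time endpoint is ready, each incoming row is sent as a `Record` whose values are strings, together with the endpoint URL returned when the endpoint was created. The sketch reads `predictedLabel`, which applies to binary and multiclass models (a regression model returns `predictedValue` instead). Leaving out missing fields is an assumption of this sketch, and the model ID and URL in the test are placeholders.

```python
def endpoint_info(create_endpoint_response):
    """Pull the URL and status out of a CreateRealtimeEndpoint response."""
    info = create_endpoint_response["RealtimeEndpointInfo"]
    return info["EndpointUrl"], info["EndpointStatus"]


def to_record(row):
    """Record values must be strings; missing (None) fields are left out here."""
    return {name: str(value) for name, value in row.items() if value is not None}


def predict_stream(client, model_id, endpoint_url, rows):
    """Yield one predicted label per incoming row, one request at a time."""
    for row in rows:
        response = client.predict(
            MLModelId=model_id,
            Record=to_record(row),
            PredictEndpoint=endpoint_url,
        )
        yield response["Prediction"]["predictedLabel"]
```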
AWS INTEGRATION
STRONG POINTS
▸ Powerful modeling: SGD + quantile binning
▸ AWS ecosystem
▸ Multiple sources (S3, RDS, Redshift)
▸ Simple to setup and use
▸ Great for benchmarking
▸ No need for production code!
▸ CLI and SDKs (Python, …)
ROOM FOR IMPROVEMENT
‣ No cross-validation!
‣ Can’t export your trained models*
‣ No scripting
▸ Limited data visualization
▸ Limited feature engineering
▸ SGD model only: no forests, SVMs, Bayes, …
▸ No deep learning (EC2)
(*) Stealing Machine Learning Models via Prediction APIs: http://www.cs.unc.edu/~reiter/papers/2016/USENIX.pdf
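Cross-validation is not built in, but it can be assembled by hand with `DataRearrangement`: for each fold, one datasource keeps a slice of the data for evaluation and a second one takes its complement for training. The sketch generates the k pairs of rearrangement strings; the seed value is a hypothetical placeholder, and reusing it in every fold keeps the random row order identical so the slices do not overlap.

```python
import json


def kfold_rearrangements(k, strategy="random", random_seed="titanic-cv"):
    """Return k (train_spec, eval_spec) DataRearrangement JSON strings."""
    folds = []
    for i in range(k):
        split = {
            "percentBegin": round(i * 100 / k),
            "percentEnd": round((i + 1) * 100 / k),
            "strategy": strategy,
        }
        if strategy == "random":
            split["randomSeed"] = random_seed  # same seed => same row order in every fold
        eval_spec = {"splitting": split}
        # complement=True: everything outside the evaluation slice
        train_spec = {"splitting": dict(split, complement=True)}
        folds.append((json.dumps(train_spec), json.dumps(eval_spec)))
    return folds
```

Each pair then feeds two datasources, one model and one evaluation; averaging the k evaluation metrics gives the cross-validated estimate, at the cost of k training runs.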
GREAT TIME SAVER, BUT STILL A NEED FOR DOMAIN EXPERTISE AND FEATURE ENGINEERING
AUTO-ML ON AWS
Modeling is simplified, but feature engineering still needs attention:
▸ Feature engineering
▸ Include all available datasets
▸ Feature importance
‣ Feature importance
‣ Composite features surfacing
‣ Complementary dataset integration
AUTOMATIC FEATURE ENGINEERING
Smart (Not naive)

Bayesian Optimization
ON AWS MARKETPLACE
EC2 INSTANCE
DATA EXPLORATION - FEATURE SELECTION
FEATURE SURFACING
AND ITERATE
‣ Reduced time to market (TTM) and total cost of ownership (TCO)
‣ Fewer resources
‣ Powerful benchmarking
‣ Fast iterations
‣ Holistic data integration
Jerry Hargrove @awsgeek https://www.awsgeek.com/posts/amazon-machine-learning-summary
LET’S CONNECT!
@alexip
linkedin.com/in/alexisperrier
THANK YOU

AWS Machine Learning Big Data NYC