Building Data Scientists
Machine Learning Mastery in Python
Mitch Sanders
Jan 10th 2018
Internal Use - Confidential
the Context
Industry Trends for 2018 – How what we’re doing fits into the future
Trend #1
Data Science continues to develop specialties - this means the mythical ‘full stack’ data scientist will disappear
Data Scientist, Data Engineer, Algorithm Coder, Data Storyteller
Trend #2
Non-Data Scientists will perform more fairly sophisticated analytics alongside data scientists
Data Scientist, Algorithm Coder, Data Science Citizens, Advanced Analytics, Programmers, Statisticians, Business Analyst, Coders
the Course
Machine Learning Mastery
- Understand Your Data
- Create Accurate Models
- Work Projects End-To-End
• 16 weeks – May-Oct., 2017
• 20+ class hours – 20% homework, 80% live coding
• 17 notebooks – Python code templates
• 4 Prerequisites – Coding, statistics, algorithms, thirst to learn
• 1 Textbook – Machine Learning Mastery w/ Python - Dr. Jason Brownlee
• 1 Teacher – Mitch Sanders w/ Assistant – Uday Waghmare
• 14 Students – global: software engineers, adv. analysts, statisticians
• Platform – Jupyter, Python 2.7, Anaconda
• Code Repository – GitHub
• NPS Survey – Survey Monkey, LTR = 90
• Awarded – “On the Spot”
the Content
Prepare & Explore | Model | Improve Accuracy & Finalize
Python ML Ecosystem
SciPy
Scikit-learn
Crash Courses
NumPy
Matplotlib
Pandas
Load Libraries & Data
Descriptive Statistics
Attribute Data Types
Class Distribution
Correlation Analysis
Skew of Univariates
Pre Processing
Rescale
Standardize
Normalize
Binarize
Feature Selection
Tree & Univariate
Recursive - RFE
Principal Component Analysis - PCA
Feature Importance
Resampling
Split into Train/Test
K-fold Cross Validation
Leave One Out
Repeated Random
Evaluation Metrics
For Classification
For Regression
Spot Check Classification Algorithms
Linear –
• Logistic Regression
• Linear Discriminant Analysis (LDA)
Non-linear –
• K-Nearest Neighbor (KNN)
• Naïve Bayes
• Classification & Regression Trees (CART)
• Support Vector Machines (SVM)
Compare Algorithms
Spot Check Regression Algorithms
Linear – LR, LASSO, ElasticNet (EN)
Non-Linear – CART, SVR, KNN
Automate w/ Pipelines
Preparation Pipelines
Feature Extraction Pipelines
Modeling Pipelines
Ensembles - Performance Improvements
Boosting –
• AdaBoost
• Gradient Boosting (GBM)
Bagging –
• Random Forest, Extra Trees
• Voting
Algorithm Parameter Tuning
Parameters
Grid Search
Random Search
Finalize Model
Predict on Validation Data
Create Standalone on Entire Data
Save Model for Production
Visualization
Univariate Plots
Multivariate Plots
Case Studies #1 & #2
Key concepts – and flow – the 17 notebooks, #1 to #17
Reference Material
the Course Syllabus
Python Ecosystem for Machine Learning
• Python
• SciPy
• Scikit-learn
• Python Ecosystem Installation
• Summary
Crash Course in Python and SciPy
• Python Crash Course
• NumPy Crash Course
• Matplotlib Crash Course
• Pandas Crash Course
• Summary
How To Load Machine Learning Data
• Considerations When Loading CSV Data
• Pima Indians Dataset
• Load CSV Files with the Python Standard Library
• Load CSV Files with NumPy
• Load CSV Files with Pandas
• Summary
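As a rough illustration of the "Load CSV Files with the Python Standard Library" topic, the sketch below reads a small CSV with the `csv` module. It is written for Python 3 rather than the Python 2.7 used in the course, and the column names and values are invented placeholders, not the Pima Indians data.

```python
import csv
import io

# Hypothetical stand-in for a CSV file on disk; the first line is a header.
# Column names and values are made up for illustration only.
RAW = """feature_a,feature_b,label
1.5,20,0
2.5,35,1
4.0,50,1
"""


def load_csv(handle):
    """Return (header, rows) with every data value parsed as a float."""
    reader = csv.reader(handle)
    header = next(reader)  # first line holds the column names
    rows = [[float(value) for value in row] for row in reader if row]
    return header, rows


header, rows = load_csv(io.StringIO(RAW))
print(header)
print(len(rows), "rows x", len(rows[0]), "columns")
```

With a real file, `open(path, newline="")` would take the place of `io.StringIO(RAW)`. Considerations such as whether a header row exists, or which delimiter is used, map onto the `next(reader)` call and the `delimiter` argument of `csv.reader`.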
Understand Your Data With Visualization
• Univariate Plots
• Multivariate Plots
• Summary
Prepare Your Data For Machine Learning
• Need For Data Pre-processing
• Data Transforms
• Rescale Data
• Standardize Data
• Normalize Data
• Binarize Data (Make Binary)
• Summary
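The four transforms listed above can be sketched in plain Python 3 to show what each one computes. This is an illustrative sketch, not the course notebooks: in scikit-learn the corresponding classes are `MinMaxScaler`, `StandardScaler`, `Normalizer` and `Binarizer`, and the `ages` values below are invented. The functions assume non-constant columns and non-zero rows.

```python
import math
from statistics import mean, pstdev


def rescale(column):
    """Min-max rescale a column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    """Shift a column to mean 0 and scale it to standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def normalize(row):
    """Scale one row (observation) to unit Euclidean length."""
    length = math.sqrt(sum(v * v for v in row))
    return [v / length for v in row]


def binarize(column, threshold=0.0):
    """Map values above the threshold to 1 and all others to 0."""
    return [1 if v > threshold else 0 for v in column]


ages = [20.0, 30.0, 40.0, 60.0]  # made-up example column
print(rescale(ages))             # [0.0, 0.25, 0.5, 1.0]
print(binarize(ages, 35.0))      # [0, 0, 1, 1]
print(normalize([3.0, 4.0]))     # [0.6, 0.8]
```

Note that rescaling and standardizing work column by column, while normalizing works row by row.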
Feature Selection For Machine Learning
• Feature Selection
• Univariate Selection
• Recursive Feature Elimination
• Principal Component Analysis
• Feature Importance
• Summary
Evaluate the Performance of Machine Learning Algorithms with Resampling
• Evaluate Machine Learning Algorithms
• Split into Train and Test Sets
• K-fold Cross-Validation
• Leave One Out Cross-Validation
• Repeated Random Test-Train Splits
• What Techniques to Use When
• Summary
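To make the K-fold idea concrete, here is a minimal Python 3 sketch that only produces the row indices for each fold. The function name `k_fold_indices` and the seed are assumptions for illustration; scikit-learn provides this through `KFold` in `sklearn.model_selection`.

```python
import random


def k_fold_indices(n_rows, k, seed=7):
    """Yield (train_idx, test_idx) pairs; each row is tested exactly once."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_rows=10, k=3))
for train_idx, test_idx in folds:
    print(len(train_idx), "train /", len(test_idx), "test")
```

Setting `k` equal to `n_rows` gives Leave One Out Cross-Validation, and `k=2` with a single fold taken is close to a plain train/test split.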
Machine Learning Algorithm Performance Metrics
• Algorithm Evaluation Metrics
• Classification Metrics
• Regression Metrics
• Summary
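A few of the common classification and regression metrics can be written out by hand to show what they measure. The sketch below is Python 3, uses invented labels and predictions, and its function names are illustrative choices, not the course's code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def mean_absolute_error(y_true, y_pred):
    """Average size of the errors, ignoring their direction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared error; large misses are penalised more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


labels = [1, 0, 1, 1, 0, 1]        # made-up true classes
predicted = [1, 0, 0, 1, 1, 1]     # made-up model output
print(accuracy(labels, predicted))           # 4 of 6 correct
print(confusion_counts(labels, predicted))   # (3, 1, 1, 1)

actual = [3.0, 5.0, 2.0]           # made-up regression targets
estimate = [2.5, 5.0, 4.0]
print(mean_absolute_error(actual, estimate))
print(mean_squared_error(actual, estimate))
```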
Spot-Check Classification Algorithms
• Algorithm Spot-Checking
• Algorithms Overview
• Linear Machine Learning Algorithms
• Nonlinear Machine Learning Algorithms
• Summary
Spot-Check Regression Algorithms
• Algorithms Overview
• Linear Machine Learning Algorithms
• Nonlinear Machine Learning Algorithms
• Summary
Compare Machine Learning Algorithms
• Choose The Best Machine Learning Model
• Compare Machine Learning Algorithms Consistently
• Summary
Automate Machine Learning Workflows with Pipelines
• Automating Machine Learning Workflows
• Data Preparation and Modeling Pipeline
• Feature Extraction and Modeling Pipeline
• Summary
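The point of a data preparation and modeling pipeline can be shown with a small self-contained Python 3 sketch: preparation steps are fitted on the training rows only and then reused unchanged at prediction time. The classes `Standardizer`, `CentroidClassifier` and `MiniPipeline` are hypothetical stand-ins written for this example, and the data is invented. In scikit-learn the equivalent building block is `Pipeline` from `sklearn.pipeline`.

```python
from statistics import mean, pstdev


class Standardizer:
    """Preparation step: learns per-column mean and std from the data it is fit on."""

    def fit(self, X, y=None):
        cols = list(zip(*X))
        self.means = [mean(c) for c in cols]
        # Fall back to 1.0 so a constant column does not divide by zero.
        self.stds = [pstdev(c) or 1.0 for c in cols]
        return self

    def transform(self, X):
        return [[(v - m) / s for v, m, s in zip(row, self.means, self.stds)]
                for row in X]


class CentroidClassifier:
    """Toy model: predicts the class whose mean feature vector is closest."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [r for r, t in zip(X, y) if t == label]
            self.centroids[label] = [mean(c) for c in zip(*rows)]
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda k: dist(row, self.centroids[k]))
                for row in X]


class MiniPipeline:
    """Chains preparation steps and a final model behind one fit/predict."""

    def __init__(self, steps, model):
        self.steps, self.model = steps, model

    def fit(self, X, y):
        for step in self.steps:
            X = step.fit(X, y).transform(X)  # statistics come from training rows only
        self.model.fit(X, y)
        return self

    def predict(self, X):
        for step in self.steps:
            X = step.transform(X)            # reuse the training statistics
        return self.model.predict(X)


X_train = [[1.0, 200.0], [2.0, 180.0], [8.0, 20.0], [9.0, 40.0]]  # made-up rows
y_train = [0, 0, 1, 1]
X_test = [[1.5, 190.0], [8.5, 30.0]]

pipe = MiniPipeline([Standardizer()], CentroidClassifier()).fit(X_train, y_train)
predictions = pipe.predict(X_test)
print(predictions)  # [0, 1]
```

Because the whole chain is one object, it can be passed to a resampling loop so that each fold refits the preparation steps on its own training rows.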
Improve Performance with Ensembles
• Combine Models Into Ensemble Predictions
• Bagging Algorithms
• Boosting Algorithms
• Voting Ensemble
• Summary
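A voting ensemble in its simplest form takes the predictions of several models and keeps the most common label for each row. The Python 3 sketch below assumes three already-trained classifiers whose outputs are given as invented lists; `majority_vote` is an illustrative helper, and scikit-learn offers `VotingClassifier` in `sklearn.ensemble` for the real thing.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label list by majority."""
    combined = []
    for votes in zip(*predictions_per_model):  # one tuple of votes per row
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Made-up predictions from three hypothetical models on four rows.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

combined = majority_vote([model_a, model_b, model_c])
print(combined)  # [1, 0, 1, 1]
```

Using an odd number of models avoids ties for binary labels. Bagging and boosting differ in how the individual models are trained, but their predictions are combined in a similar spirit.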
data science student questions - 1
"So you do Data Science work. What really does that involve? And how is that different from programming, statistical work or data engineering?"
"I want to learn Data Science. Between R, Python and SAS, where should I start and what are the Pros and Cons of each?"
"What is OOP (object-oriented programming) and Structured Programming and what's the difference between them?"
"What are the main differences between Python 2.7 and Python 3.x versions? And why do so many developers stay with Python 2.7?"
"What is the difference between Supervised Learning and Unsupervised Learning?"
"What different graphs might a univariate analysis have compared to a bivariate analysis? Can you graph multivariate?"
"How do you explain machine learning to an 8-year-old child?"
"What is Gradient Descent?"
"What is multicollinearity and how can you overcome it?"
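For the gradient descent question above, a minimal Python 3 sketch may help. It minimises the illustrative function f(x) = (x^2 - 1)^2, chosen here only because it has two minima, at x = -1 and x = 1. The function, learning rate and step count are assumptions for the example.

```python
def gradient(x):
    """Derivative of f(x) = (x**2 - 1)**2, which has minima at x = -1 and x = 1."""
    return 4 * x * (x ** 2 - 1)


def gradient_descent(start, learning_rate=0.05, steps=200):
    """Repeatedly step against the gradient from the given starting point."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


print(round(gradient_descent(0.5), 4))    # 1.0
print(round(gradient_descent(-0.5), 4))   # -1.0
```

The two runs end in different minima, which shows that on a non-convex function the result depends on the starting point.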
data science student questions - 2
"What is the curse of dimensionality?"
"What do you understand by Hypothesis in the context of Machine Learning?"
"What's the difference between a Test Set and a Validation Set?"
"What is cross-validation and what is it used for?"
"What's the difference between a Classification and Regression Tree algorithm and a Random Forest? And when is one better than the other?"
"What are the basic assumptions to be made for linear regression?"
"Can you explain in simple language what is an Eigenvalue and Eigenvector?"
"Do gradient descent methods always converge to the same point?"
"What's the difference between continuous, ordinal and categorical variables?"
"What is K-means? How can you select K for K-means?"
data science student questions - 3
"Why is naive Bayes so ‘naive’?"
"OLS is to linear regression as Maximum likelihood is to logistic regression. Explain the statement."
"What do you understand by Bias-Variance trade-off?"
"Do you suggest that treating a categorical variable as a continuous variable would result in a better predictive model?"
"When does regularization become necessary in Machine Learning?"
"Explain a model and its dimensions to an 8-year-old."
"How do you determine and deal with correlated features in your data set, and how do you reduce the dimensionality of the data?"
"During analysis, how do you treat missing values?"
"What is Regularization and what kind of problems does regularization solve?"
Extras
the Data Scientist Roles
Roles Defined by 3 different Data Science Authors
Data Scientist Core Skills | How To Build A Successful Data Science Team | The seven people you need on your Big Data team | Descriptions
Capture | Data Engineer
• Handyman – Expert in Dell EDW, D3, BO, Hana/BMS, other RDBMS, and ETL work
• Open Source Guru (plus Data Modeler) – Hadoop stack, Cloudera, Linux, data structures and network
Analyze | Machine Learning Expert
• Data Modeler (plus all aspects of Data Engineer and Business Analyst) – SQL, RDBMS, Teradata, Dell infrastructure
• Deep Diver – Machine Learning, R, Python, SQL, ETL work, algorithm modeling, statistics
Present | Business Analyst
• Story Teller – PowerPoint, Design, Tableau, understands customers' business language and technical, artistic eye
• Snoop (plus Handyman skills) – Enthusiastic, deeply creative, super savvy in Dell environments, finds contacts and not hesitant to do work-arounds
• Privacy Wonk – Dell policy meticulous, socially aware, foresees roadblocks

Mais conteĂşdo relacionado

Mais procurados

Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Simplilearn
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data ScienceSpotle.ai
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientistryanorban
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Ilkay Altintas, Ph.D.
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First CourseArnab Majumdar
 
How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace Mohamadreza Mohtat
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceANOOP V S
 
Introduction to Machine Learning & AI
Introduction to Machine Learning & AIIntroduction to Machine Learning & AI
Introduction to Machine Learning & AIMichael Eydman
 
Data science
Data scienceData science
Data science9diov
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceGabriel Moreira
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DSRoopesh Kohad
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Simplilearn
 

Mais procurados (20)

Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
Data Scientist Salary, Skills, Jobs And Resume | Data Scientist Career | Data...
 
Introduction To Data Science
Introduction To Data ScienceIntroduction To Data Science
Introduction To Data Science
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Data Science using Python
Data Science using PythonData Science using Python
Data Science using Python
 
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
Creating a Data Science Ecosystem for Scientific, Societal and Educational Im...
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Data+Science : A First Course
Data+Science : A First CourseData+Science : A First Course
Data+Science : A First Course
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace How To Become a Data Scientist in Iran Marketplace
How To Become a Data Scientist in Iran Marketplace
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
NLP & ML Webinar
NLP & ML WebinarNLP & ML Webinar
NLP & ML Webinar
 
Introduction to Machine Learning & AI
Introduction to Machine Learning & AIIntroduction to Machine Learning & AI
Introduction to Machine Learning & AI
 
Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9Introduction to Data Science by Datalent Team @Data Science Clinic #9
Introduction to Data Science by Datalent Team @Data Science Clinic #9
 
Data science
Data scienceData science
Data science
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
General introduction to AI ML DL DS
General introduction to AI ML DL DSGeneral introduction to AI ML DL DS
General introduction to AI ML DL DS
 
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
Data Scientist vs Data Analyst vs Data Engineer - Role & Responsibility, Skil...
 
Data Science in Action
Data Science in ActionData Science in Action
Data Science in Action
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 

Semelhante a Building Data Scientists

JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceMark West
 
Abhishek Training PPT.pptx
Abhishek Training PPT.pptxAbhishek Training PPT.pptx
Abhishek Training PPT.pptxKashishKashish22
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceMark West
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxsumitkumar600840
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfRAKESHG79
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceMark West
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceMark West
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchRachel Berryman
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesUpXAcademy
 
L2 DS Tools and Application.pptx
L2 DS Tools and Application.pptxL2 DS Tools and Application.pptx
L2 DS Tools and Application.pptxShambhavi Vats
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...jybufgofasfbkpoovh
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Rohit Dubey
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfArmyTrilidiaDevegaSK
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
Introduction to Data Science.pdf
Introduction to Data Science.pdfIntroduction to Data Science.pdf
Introduction to Data Science.pdfUniversity of Sindh
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadVamsiNihal
 

Semelhante a Building Data Scientists (20)

JavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data ScienceJavaZone 2018 - A Practical(ish) Introduction to Data Science
JavaZone 2018 - A Practical(ish) Introduction to Data Science
 
Abhishek Training PPT.pptx
Abhishek Training PPT.pptxAbhishek Training PPT.pptx
Abhishek Training PPT.pptx
 
NDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data ScienceNDC Oslo : A Practical Introduction to Data Science
NDC Oslo : A Practical Introduction to Data Science
 
00-01 DSnDA.pdf
00-01 DSnDA.pdf00-01 DSnDA.pdf
00-01 DSnDA.pdf
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdf
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
A Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data ScienceA Practical-ish Introduction to Data Science
A Practical-ish Introduction to Data Science
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
How to crack Big Data and Data Science roles
How to crack Big Data and Data Science rolesHow to crack Big Data and Data Science roles
How to crack Big Data and Data Science roles
 
L2 DS Tools and Application.pptx
L2 DS Tools and Application.pptxL2 DS Tools and Application.pptx
L2 DS Tools and Application.pptx
 
intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...intro to data science Clustering and visualization of data science subfields ...
intro to data science Clustering and visualization of data science subfields ...
 
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
Data Science Job ready #DataScienceInterview Question and Answers 2022 | #Dat...
 
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdfA New Paradigm on Analytic-Driven Information and Automation V2.pdf
A New Paradigm on Analytic-Driven Information and Automation V2.pdf
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Introduction to Data Science.pdf
Introduction to Data Science.pdfIntroduction to Data Science.pdf
Introduction to Data Science.pdf
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 

Último

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoordharasingh5698
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringmulugeta48
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsArindam Chakraborty, Ph.D., P.E. (CA, TX)
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 

Último (20)

Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 

Building Data Scientists

  • 1. Building Data Scientists: Machine Learning Mastery in Python. Mitch Sanders, Jan 10th 2018. Internal Use - Confidential
  • 2. the Context: Industry Trends for 2018 – How what we’re doing fits into the future
      • Trend #1: Data Science continues to develop specialties - this means the mythical ‘full stack’ data scientist will disappear. Specialties shown: Data Scientist, Data Engineer, Algorithm Coder, Data Storyteller.
      • Trend #2: Non-Data Scientists will perform more fairly sophisticated analytics alongside data scientists. Groups shown: Data Scientist; Algorithm Coder; Data Science Citizens; Advanced Analytics; Programmers; Statisticians; Business Analyst; Coders.
  • 3. the Course: Machine Learning Mastery
      • Goals: Understand Your Data; Create Accurate Models; Work Projects End-To-End
      • 16 weeks – May-Oct., 2017
      • 20+ class hours – 20% homework, 80% live coding
      • 17 notebooks – Python code templates
      • 4 Prerequisites – Coding, statistics, algorithms, thirst to learn
      • 1 Textbook – Machine Learning Mastery w/ Python, Dr. Jason Brownlee
      • 1 Teacher – Mitch Sanders w/ Assistant – Uday Waghmare
      • 14 Students – global: software engineers, adv. analysts, statisticians
      • Platform – Jupyter, Python 2.7, Anaconda
      • Code Repository – GitHub
      • NPS Survey – Survey Monkey, LTR = 90
      • Awarded – “On the Spot”
  • 4. the Content: Key concepts and flow of the 17 notebooks (#1 to #17), across three phases: Prepare & Explore, Model, Improve Accuracy & Finalize
      • Python ML Ecosystem: SciPy, Scikit-learn
      • Crash Courses: NumPy, Matplotlib, Pandas
      • Load Libraries & Data
      • Descriptive Statistics: Attribute Data Types, Class Distribution, Correlation Analysis, Skew of Univariates
      • Pre-Processing: Rescale, Standardize, Normalize, Binarize
      • Feature Selection: Tree & Univariate, Recursive (RFE), Principal Component Analysis (PCA), Feature Importance
      • Resampling: Split into Train/Test, K-fold Cross Validation, Leave One Out, Repeated Random
      • Evaluation Metrics: For Classification, For Regression
      • Spot Check Classification Algorithms: Linear – Logistic Regression, Linear Discriminant Analysis (LDA); Non-linear – K-Nearest Neighbor (KNN), Naïve Bayes, Class & Regression Trees (CART), Support Vector Machines (SVM)
      • Compare Algorithms
      • Spot Check Regression Algorithms: Linear – LR, LASSO, ElasticNet (EN); Non-Linear – CART, SVR, KNN
      • Automate w/ Pipelines: Preparation Pipelines, Feature Extraction Pipelines, Modeling Pipelines
      • Ensembles - Performance Improvements: Boosting – AdaBoost, Gradient Boosting (GBM); Bagging – Random Forest, Extra Trees; Voting
      • Algorithm Parameter Tuning: Parameters, Grid Search, Random Search
      • Finalize Model: Predict on Validation Data, Create Standalone on Entire Data, Save Model for Production
      • Visualization: Univariate Plots, Multivariate Plots
      • Case Studies #1 & #2
  • 6. the Course Syllabus
      • Python Ecosystem for Machine Learning: Python; SciPy; Scikit-learn; Python Ecosystem Installation; Summary
      • Crash Course in Python and SciPy: Python Crash Course; NumPy Crash Course; Matplotlib Crash Course; Pandas Crash Course; Summary
      • How To Load Machine Learning Data: Considerations When Loading CSV Data; Pima Indians Dataset; Load CSV Files with the Python Standard Library; Load CSV Files with NumPy; Load CSV Files with Pandas; Summary
      • Understand Your Data With Visualization: Univariate Plots; Multivariate Plots; Summary
      • Prepare Your Data For Machine Learning: Need For Data Pre-processing; Data Transforms; Rescale Data; Standardize Data; Normalize Data; Binarize Data (Make Binary); Summary
      • Feature Selection For Machine Learning: Feature Selection; Univariate Selection; Recursive Feature Elimination; Principal Component Analysis; Feature Importance; Summary
      • Evaluate the Performance of Machine Learning Algorithms with Resampling: Evaluate Machine Learning Algorithms; Split into Train and Test Sets; K-fold Cross-Validation; Leave One Out Cross-Validation; Repeated Random Test-Train Splits; What Techniques to Use When; Summary
      • Machine Learning Algorithm Performance Metrics: Algorithm Evaluation Metrics; Classification Metrics; Regression Metrics; Summary
      • Spot-Check Classification Algorithms: Algorithm Spot-Checking; Algorithms Overview; Linear Machine Learning Algorithms; Nonlinear Machine Learning Algorithms; Summary
      • Spot-Check Regression Algorithms: Algorithms Overview; Linear Machine Learning Algorithms; Nonlinear Machine Learning Algorithms; Summary
      • Compare Machine Learning Algorithms: Choose The Best Machine Learning Model; Compare Machine Learning Algorithms Consistently; Summary
      • Automate Machine Learning Workflows with Pipelines: Automating Machine Learning Workflows; Data Preparation and Modeling Pipeline; Feature Extraction and Modeling Pipeline; Summary
      • Improve Performance with Ensembles: Combine Models Into Ensemble Predictions; Bagging Algorithms; Boosting Algorithms; Voting Ensemble; Summary
  • 7. data science student questions - 1
      • “So you do Data Science work. What does that really involve? And how is that different from programming, statistical work or data engineering?”
      • “I want to learn Data Science. Between R, Python and SAS, where should I start and what are the pros and cons of each?”
      • “What are OOP (object-oriented programming) and Structured Programming, and what’s the difference between them?”
      • “What are the main differences between Python 2.7 and Python 3.x versions? And why do so many developers stay with Python 2.7?”
      • “What is the difference between Supervised Learning and Unsupervised Learning?”
      • “How might the graphs for a univariate analysis differ from those for a bivariate analysis? Can you graph multivariate?”
      • “How do you explain machine learning to an 8-year-old child?”
      • “What is Gradient Descent?”
      • “What is multicollinearity and how can you overcome it?”
  • 8. data science student questions - 2
      • “What is the curse of dimensionality?”
      • “What do you understand by Hypothesis in the context of Machine Learning?”
      • “What’s the difference between a Test Set and a Validation Set?”
      • “What is cross-validation and what is it used for?”
      • “What’s the difference between a Classification and Regression Tree algorithm and a Random Forest? And when is one better than the other?”
      • “What are the basic assumptions to be made for linear regression?”
      • “Can you explain in simple language what an Eigenvalue and an Eigenvector are?”
      • “Do gradient descent methods always converge to the same point?”
      • “What’s the difference between continuous, ordinal and categorical variables?”
      • “What is K-means? How can you select K for K-means?”
  • 9. data science student questions - 3
      • “Why is naive Bayes so ‘naive’?”
      • “OLS is to linear regression as Maximum likelihood is to logistic regression. Explain the statement.”
      • “What do you understand by Bias Variance trade off?”
      • “Do you suggest that treating a categorical variable as a continuous variable would result in a better predictive model?”
      • “When does regularization become necessary in Machine Learning?”
      • “Explain a model and its dimensions to an 8-year-old.”
      • “How do you determine and deal with correlated features in your data set, and how do you reduce the dimensionality of data?”
      • “During analysis, how do you treat missing values?”
      • “What is Regularization and what kind of problems does regularization solve?”
  • 11. the Data Scientist Roles: Roles Defined by 3 different Data Science Authors
      • Sources: Data Scientist Core Skills; How To Build A Successful Data Science Team; The seven people you need on your Big Data team
      • Capture – Data Engineer
          • Handyman: Expert in Dell EDW, D3, BO, Hana/BMS, other RDBMS, and ETL work
          • Open Source Guru (plus Data Modeler): Hadoop stack, Cloudera, Linux, data structures and network
      • Analyze – Machine Learning Expert
          • Data Modeler (plus all aspects of Data Engineer and Business Analyst): SQL, RDBMS, Teradata, Dell infrastructure
          • Deep Diver: Machine Learning, R, Python, SQL, ETL work, algorithm modeling, statistics
      • Present – Business Analyst
          • Story Teller: PowerPoint, Design, Tableau, understands customers’ business and technical language, artistic eye
          • Snoop (plus Handyman skills): Enthusiastic, deeply creative, super savvy in Dell environments, finds contacts and not hesitant to do work-arounds
          • Privacy Wonk: Meticulous about Dell policy, socially aware, foresees roadblocks
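
The four data transforms named in slides 4 and 6 (Rescale, Standardize, Normalize, Binarize) can be illustrated with a small standard-library sketch. This is not code from the course notebooks, which the slides say are built on scikit-learn; the function names and the sample column below are assumptions made up for illustration.

```python
import math
from statistics import mean, pstdev


def rescale(column, low=0.0, high=1.0):
    """Min-max rescale a column of numbers into the range [low, high]."""
    col_min, col_max = min(column), max(column)
    if col_max == col_min:
        # A constant column carries no spread; map everything to `low`.
        return [low for _ in column]
    return [low + (x - col_min) * (high - low) / (col_max - col_min) for x in column]


def standardize(column):
    """Shift and scale a column to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def normalize(row):
    """Scale one row (observation) to unit Euclidean length."""
    length = math.sqrt(sum(x * x for x in row))
    if length == 0:
        return [float(x) for x in row]
    return [x / length for x in row]


def binarize(column, threshold=0.0):
    """Map values above the threshold to 1 and all others to 0."""
    return [1 if x > threshold else 0 for x in column]


if __name__ == "__main__":
    ages = [21, 35, 50, 64]  # hypothetical sample column
    print(rescale(ages))
    print(standardize(ages))
    print(normalize([3, 4]))
    print(binarize(ages, threshold=40))
```

Note that rescale, standardize and binarize act on a column (one attribute across all rows), while normalize acts on a row (one observation across its attributes). In practice the same operations are available as transformers in scikit-learn's `preprocessing` module.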

Editor's Notes

  1. https://www.datasciencecentral.com/profiles/blogs/6-predictions-about-data-science-machine-learning-and-ai-for-2018