WHY SO MANY DATA SCIENCE PROJECTS FAIL?
Ethan Ram / Aug. 2018
DATA SCIENCE PROJECTS FAIL…
• Between 70% and 80% of corporate business intelligence projects fail (Gartner)
• 55% of big data projects are never finished (Infochimps)
• Only 13% of organizations achieve full-scale production for their in-house big-data implementations (Qubole)
• And the results…
9/3/2018 Why So Many Data Science Projects Fail 2
“I HATE THIS JOB!”
Top of the list of developers who said they are looking for a new job*:
• ML specialists - 14.3%
• Data scientists - 13.2%
* 2018 Stack Overflow survey based on 64,000 developers’ answers
DATA SCIENCE APPLICATION LIFECYCLE
1. Business objective and plan
2. Build dataset
3. Model data and validate
4. Implement application
5. Deploy
6. Monitor, measure & optimize
We’ll look at some common failures in each step and suggest better approaches.
BUSINESS OBJECTIVE FAILURES
• Expecting first-day success
• Expecting no false positives
• Expecting 100% accuracy
• No business value expected
• Expecting that the ML itself would be the product
• Not defining the deliverable
CAN YOU AFFORD A FALSE POSITIVE?
• Google “fixed” its “racist” algorithm by removing gorillas from its image-labeling tech
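One way to make this question concrete is to choose the decision threshold from an explicit false-positive budget instead of a default cut-off. The following is a minimal sketch, assuming a binary classifier that emits one score per example where higher means "more likely positive"; the function name `threshold_for_fp_budget` and its parameters are hypothetical and not taken from the talk.

```python
def threshold_for_fp_budget(scores, labels, max_fp_rate):
    """Return the lowest observed score usable as a threshold within the FP budget.

    An example is predicted positive when its score >= threshold.
    scores: model scores, higher means "more likely positive" (assumption).
    labels: ground truth, 1 for positive and 0 for negative.
    max_fp_rate: largest tolerated share of negatives flagged as positive.
    Returns None if even the strictest threshold exceeds the budget.
    """
    negatives = sum(1 for y in labels if y == 0)
    if negatives == 0:
        raise ValueError("need at least one negative example")
    best = None
    # Try every observed score as a candidate threshold, highest first.
    for t in sorted(set(scores), reverse=True):
        false_pos = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        if false_pos / negatives <= max_fp_rate:
            best = t  # still within budget, so keep lowering the threshold
        else:
            break  # the FP count can only grow as the threshold drops
    return best
```

Setting `max_fp_rate` to 0 expresses "we cannot afford any false positive on this validation set"; the price is usually a higher threshold and therefore more missed positives.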
BE REALISTIC!
• Very few businesses’ core product is AI/ML/data-based
• Most use those tools to improve their bottom lines with existing products
TYPES OF DELIVERABLES
1. Descriptive analysis (offline report)
2. Dashboard (real-time system)
3. Automated decision-making system (“self-driving” system)
4. Dataset with specific qualities (to be used by another ML)
   Define: leverage, friction to impact and cleanness
5. Methodology (dataset >> model)
6. Framework (API/SDK to build methodologies)
7. Proof-of-concept (prove a viable methodology)
PLANNING FAILURES
• Missing diversity in the team
• In many projects, 80% of the work is on the dataset!
• It’s a *research* project!
• Short time to delivery
[Figure: Drew Conway, Data Science Diagram, with an "Engineering" label]
YOLO V3 NETWORK ARCHITECTURE
DATA INVENTORY FAILURES
• Too little data to build on
• Dataset is dirty
• Missing data from the field
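These three failures can be surfaced early with a simple inventory report run before any modeling starts. The sketch below assumes the dataset fits in memory as a list of flat dicts with hashable values; the function name `inventory_report` and the field names are hypothetical illustrations.

```python
from collections import Counter

def inventory_report(rows, required_fields, label_field):
    """Summarize size, missing values, exact duplicates and label balance.

    rows: list of flat dicts, one per record (assumed in-memory format).
    required_fields: fields that must be present and non-empty.
    label_field: the field holding the target label.
    """
    missing = Counter()
    seen = set()
    duplicates = 0
    labels = Counter()
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        # Two records are duplicates when every field and value matches.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
        labels[row.get(label_field)] += 1
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "duplicates": duplicates,
        "labels": dict(labels),
    }
```

A small row count, a skewed label distribution, or many missing required fields are each a signal to go back to data collection before choosing a model.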
DIRTY DATASET: NEGATIVE INFLUENCE
Data-set includes negative influence examples
Resulting classification (with confidence)
DATA MODELING FAILURES
You need to be able to understand the result!
• Jumping to conclusions on what the data is
• Assuming it works based on a small sample
• Feedback-loop in results
• Missing cross validation
• Choosing algorithms that are too heavy for the application
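The "small sample" and "missing cross validation" points can be illustrated with a plain k-fold loop. This is a minimal sketch using only the Python standard library; `fit` and `predict` are caller-supplied callables, and the toy threshold model at the end is a hypothetical stand-in for a real model, not something from the talk.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of examples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5, seed=0):
    """Return per-fold accuracy; every example is held out exactly once."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        hits = sum(predict(model, xs[i]) == ys[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

# Hypothetical toy model: a 1-D threshold halfway between the class means.
# It assumes both classes appear in every training split.
def fit_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2

def predict_threshold(model, x):
    return 1 if x >= model else 0
```

Calling `cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)` returns five accuracies; a wide spread between folds suggests that a single result on a small sample should not be trusted.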
ML ALGORITHMS [PARTIAL] MAP
Application > Class > Algorithms
• Supervised learning
  • Classification: Linear classifiers / Fisher's discriminant; Support vector machines / Least squares; Quadratic classifiers; Kernel estimation; K-nearest neighbor; Boosting; Decision trees; Random forests; Neural networks; Learning vector quantization
  • Regression: Linear Regression; Logistic Regression; CART; Naïve Bayes
  • Ensemble: Bagging with Random Forests; Boosting with XGBoost
• Unsupervised learning
  • Association: Apriori
  • Clustering: K-means; Mean-Shift; Density-Based Spatial; EM-GMM; Agglomerative Hierarchical
  • Dimensionality Reduction
    • Feature Selection: Variance Thresholds; Correlation Thresholds; Genetic Algorithms (GA); Stepwise Search
    • Feature extraction: PCA; Linear Discriminant Analysis (LDA); Autoencoders
• Reinforcement learning
  • Exploration; Criterion of optimality; Brute force; Value function; Direct policy search
APPLICATION FAILURES
• Asking the data science team to build the application…
• Not testing at scale
• Switching from monitoring to automatic action-taking too fast
• Missing safeguards on output
• Not preparing for attack!
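A safeguard on output can be as small as a wrapper that refuses to act on a model result that is not confident, not a valid number, or outside an agreed range. The sketch below is one possible shape under those assumptions; `Decision`, `safeguarded`, the bounds and the 0.8 confidence floor are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    value: float
    source: str  # "model" when the prediction was accepted, else "fallback"

def safeguarded(raw_value, confidence, low, high, fallback, min_confidence=0.8):
    """Accept the model output only if it is a confident number within [low, high].

    Anything else is replaced by a fixed fallback value, so downstream
    automation never acts on an out-of-range or low-confidence prediction.
    """
    # NaN is the only float that is not equal to itself.
    is_number = isinstance(raw_value, (int, float)) and raw_value == raw_value
    if not is_number or confidence < min_confidence:
        return Decision(fallback, "fallback")
    if raw_value < low or raw_value > high:
        return Decision(fallback, "fallback")
    return Decision(float(raw_value), "model")
```

Logging how often `source` is "fallback" gives a cheap signal for the monitoring stage, and a sudden rise may indicate drift or an attack on the inputs.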
DIRECT ATTACK EXAMPLE

SYNTHESIZED ADVERSARIAL EXAMPLE
“WE HAVEN’T SEEN ANYTHING LIKE THIS BEFORE…”
MONITOR > MEASURE > OPTIMIZE FAILURES
• Assuming it just works…
  • Not having a long enough beta
  • Missing feedback from real users
• Missing KPIs
  • Measure business success
  • Find false positives
• Missing built-in A/B testing
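For the A/B testing point, a common way to compare a KPI such as conversion rate between the current system (A) and the ML-driven variant (B) is a two-proportion z-test. The sketch below uses only the standard library; the function name `ab_test` and the choice of conversion rate as the KPI are assumptions made for illustration.

```python
from math import erf, sqrt

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (lift, z, p_value).

    conv_a, conv_b: number of successes (e.g. conversions) in each arm.
    n_a, n_b: number of users exposed to each arm.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return p_b - p_a, z, p_value
```

A small p-value only says the difference is unlikely to be noise; whether the lift matters is still a business decision tied to the KPIs defined up front.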
"Right now, a lot of our AI systems make decisions in ways that people don't really understand… And I don't think that… we want to end up with systems that people don't understand how they're making decisions."
• ZUCKERBERG at Senate hearing 10-Apr-18
DATA SCIENCE APPLICATION LIFECYCLE
1. Business objective and plan
2. Build dataset
3. Model data and validate
4. Implement application
5. Deploy
6. Monitor, measure & optimize
• Q&A

 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Why Data Science Projects Fail?

  • 1. WHY SO MANY DATA SCIENCE PROJECTS FAIL? Ethan Ram / Aug. 2018
  • 2. DATA SCIENCE PROJECTS FAIL…: Between 70% and 80% of corporate business intelligence projects fail (Gartner); 55% of big data projects are never finished (Infochimps); only 13% of organizations achieve full-scale production for their in-house big-data implementations (Qubole). And the results…
  • 3. “I HATE THIS JOB!” Top of the list of developers who said they are looking for a new job*: ML specialists, 14.3%; data scientists, 13.2%. (* 2018 Stack Overflow survey based on 64,000 developers’ answers.)
  • 4. DATA SCIENCE APPLICATION LIFECYCLE: Business objective and plan > Build dataset > Model data and validate > Implement application > Deploy > Monitor, measure & optimize. We’ll look at some common failures in each step and suggest better approaches.
  • 5. BUSINESS OBJECTIVE FAILURES: First-day success; no false positives; 100% accuracy; no business value expected; expecting that the ML itself would be the product; not defining the deliverable.
  • 6. CAN YOU AFFORD A FALSE POSITIVE? Google “fixed” its “racist” algorithm by removing gorillas from its image-labeling tech.
  • 7. BE REALISTIC! Very few businesses’ core product is AI/ML/data based; most use those tools to improve their bottom lines with existing products.
  • 8. TYPES OF DELIVERABLES: (1) Descriptive analysis (offline report); (2) Dashboard (real-time system); (3) Automated decision-making system (“self-driving” system); (4) Dataset with specific qualities (to be used by another ML), for which you define leverage, friction to impact and cleanness; (5) Methodology (dataset >> model); (6) Framework (API/SDK to build methodologies); (7) Proof-of-concept (proves that a methodology is viable).
  • 9. PLANNING FAILURES: Missing diversity in the team; in many projects 80% of the work is on the dataset!; it’s a *research* project!; short time to delivery. (Figure: Drew Conway, Data Science Diagram; label: Engineering.)
  • 10. YOLO V3 NETWORK ARCHITECTURE
  • 11. DATA INVENTORY FAILURES: Too little data to build on; dataset is dirty; missing data from the field.
  • 12. DIRTY DATASET: NEGATIVE INFLUENCE. (Figure panels: dataset that includes negative-influence examples; resulting classification, with confidence.)
  • 13. DATA MODELING FAILURES: You need to be able to understand the result! Jumping to conclusions on what the data is; assuming it works based on a small sample; feedback loop in results; missing cross-validation; choosing algorithms that are too heavy for the application.
  • 14. ML ALGORITHMS [PARTIAL] MAP (columns: Application, Class, Algorithms)
      - Supervised learning
          - Classification: Linear classifiers / Fisher's discriminant; Support vector machines / Least squares; Quadratic classifiers; Kernel estimation; K-nearest neighbor; Boosting; Decision trees; Random forests; Neural networks; Learning vector quantization
          - Regression: Linear Regression; Logistic Regression; CART; Naïve Bayes
          - Ensemble: Bagging with Random Forests; Boosting with XGBoost
      - Unsupervised learning
          - Association: Apriori
          - Clustering: K-means; Mean-Shift; Density-Based Spatial; EM-GMM; Agglomerative Hierarchical
          - Dimensionality Reduction, Feature Selection: Variance Thresholds; Correlation Thresholds; Genetic Algorithms (GA); Stepwise Search
          - Dimensionality Reduction, Feature Extraction: PCA; Linear Discriminant Analysis (LDA); Autoencoders
      - Reinforcement learning
          - Exploration: Criterion of optimality; Brute force; Value function; Direct policy search
  • 15. APPLICATION FAILURES: Asking the data science team to build the application…; not testing to scale; switching from monitoring to automatic action-taking too fast; missing safeguards on output; not preparing for attack!
  • 16. DIRECT ATTACK EXAMPLE
  • 17. SYNTHESIZED ADVERSARIAL EXAMPLE
  • 18. “WE HAVEN’T SEEN ANYTHING LIKE THIS BEFORE…”
  • 19. MONITOR > MEASURE > OPTIMIZE FAILURES: Assuming it just works… (not having a long enough beta; missing feedback from real users); missing KPIs (measure business success; find false positives); missing built-in A/B testing.
  • 20. “Right now, a lot of our AI systems make decisions in ways that people don't really understand… And I don't think that… we want to end up with systems that people don't understand how they're making decisions.” (Zuckerberg at Senate hearing, 10-Apr-18)
  • 21. DATA SCIENCE APPLICATION LIFECYCLE: Business objective and plan > Build dataset > Model data and validate > Implement application > Deploy > Monitor, measure & optimize. Q&A
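
Slide 13 lists "missing cross-validation" and "assuming it works based on a small sample" among the data modeling failures. The following is a minimal sketch of k-fold cross-validation in plain Python, not the author's method: the helper names (`k_fold_indices`, `cross_validate`, `fit_majority`, `accuracy`) and the toy data are assumptions made for illustration. Every sample is held out exactly once, so the score is never measured on data the model was trained on.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Split range(n_samples) into k shuffled, non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, k, fit, score, seed=0):
    """Train on k-1 folds, score on the held-out fold, once per fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            score(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    return scores


# Hypothetical toy "model": always predict the majority class seen in training.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def accuracy(model, xs, ys):
    return sum(1 for y in ys if y == model) / len(ys)


# Made-up data: 20 samples, 14 of class 0 and 6 of class 1.
xs = list(range(20))
ys = [0] * 14 + [1] * 6

fold_scores = cross_validate(xs, ys, k=5, fit=fit_majority, score=accuracy)
print("per-fold accuracy:", fold_scores)
print("mean accuracy:", sum(fold_scores) / len(fold_scores))
```

The spread between the per-fold scores is the useful part: a model that looks good on one small sample but varies widely across folds has not been validated.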

Editor's Notes

  1. Expect magic to happen! YOLO (You Only Look Once) is a lightweight real-time object detector that can detect objects in a video stream. It took 5 years to get to this version. Dan Ariely: “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...”
  2. Precision vs. accuracy. False positive vs. false negative… Think of automatic cancer prediction: it can reduce the false negatives of a human professional, such as a radiologist.
  3. Outputs of data science:
      - Descriptive analysis (a report): a clear answer to a clear question, such as which platform the new product should be released on, Android or iOS. This is usually an offline system.
      - Dashboard: helps a human decide and take action continuously, or again and again. This is usually an online/real-time system.
      - Automated decision making: based on the dashboard, take automatic action (a “self-driving” system).
      - Dataset: data that is then used by another algorithm. For example, a cleaned-up list of addresses that users entered in a form, or a dataset used for training and benchmarking object extraction from images, such as the COCO dataset or the ImageNet dataset. If your dataset is no good you will never get good results. Qualities of a dataset:
          - Leverage: the potential of the dataset, i.e. what it can be used for.
          - Friction to impact: the additional work needed on the dataset to get a significant impact.
          - Cleanness: the percentage of errors in the dataset that may sabotage the learning process.
      - Methodology: the system/algorithm that takes a dataset and creates a model that can then be used to answer a question. For example, how to estimate national poll results based on a sample questionnaire of 500 people (a “bias correction” system), or a recommendation system.
      - Framework: an API or SDK that is used to build (code) methodologies. For example, Google's AI framework, TensorFlow. The framework should help lower the friction to impact.
      - Proof-of-concept: it does not deliver the business impact, but it shows that the methodology is viable. Used for “fail-fast” or as a first milestone in a larger project.
  4. Computer science >> computer engineering. Math & stats: many times done by physicists. Need people who can do the data tagging. It’s a research project! Data science is more than machine learning. The importance of diversity in a data science team: based on the diagram, it is clear that it is very hard to find people who can cover all of the above, especially for a team that is meant to answer questions from a diverse set of domains. Some like offline analysis; some like real-time systems; some are about processes and some are about tools; some are very good in one domain but have zero knowledge about anything else, etc. A good data scientist should have excellent proficiency in at least one side and at least some understanding of the other two sides.
  5. Example of how complicated an ML project can be… YOLO (You Only Look Once) is a lightweight real-time object detector that can detect objects in a video stream. It took 5 years to get to this version.
  6. Google Translate’s Maori dataset is too small, leading to some funny mistakes. Better not train your model on these cat pictures… A statistician would know this, but a computer systems engineer would not. Internal politics: would the engineer get access to the transactions database?
  7. Feedback loop in results => need to understand causality. For example, when testing the size of a “like” button: clicking “like” on the big button brings the item to the top of the list for everyone, so it affects the control group's clicks. You must make sure the observational inference matches causality. You need to be able to understand the result.
  8. Boosting… Consider changing to Yoav’s chart; give examples.
  9. Building an application with a good UX is outside the scope of a data science team. Tay is an AI-based chat bot created by Microsoft and “unleashed” on Twitter in 2016. It soon absorbed what people told it as the truth.
  10. URME Personal Surveillance Identity Prosthetic, by http://www.urmesurveillance.com/ Kerckhoffs-Shannon principle: “one ought to design systems under the assumption that the enemy will immediately gain full familiarity with them”. Don’t rely on the privacy of the model, because sooner or later it will be leaked. You should not base your code entirely on open-source algorithms. You should not base your model on open datasets.
  11. A Generative Adversarial Network (GAN) is sometimes used to fool the original network. In the example: projected gradient descent. Synthesizing adversarial examples for neural networks is surprisingly easy: small, carefully crafted perturbations to inputs can cause neural networks to misclassify inputs in arbitrarily chosen ways. Given that adversarial examples transfer to the physical world and can be made extremely robust, this is a real security concern.
  12. In the GIF: Tesla Model S adaptive cruise control, one second before it crashes into a van parked on the roadside (May 2016).
  13. KPI: Key Performance Indicator
  14. A Generative Adversarial Network (GAN) is sometimes used to fool the original network, so it can also be used to understand how the neural network works.
  15. Original map (interactive): http://scikit-learn.org/stable/tutorial/machine_learning_map/
  16. Types of data, along two axes: is it qualitative (e.g. a questionnaire) or quantitative (e.g. sales transaction logs)? Is it our own data (e.g. logs) or third-party data (e.g. a Wikipedia dataset)?
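
Note 2 contrasts precision with accuracy and false positives with false negatives, using automatic cancer prediction as the example. The sketch below works that distinction through with invented numbers (1,000 scans, 10 of them truly positive); the function names and the data are assumptions for illustration only, not figures from the talk.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Of the cases flagged positive, how many really were positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the real positives, how many were caught (1 - false-negative rate).
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up screening data: 10 real positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990
# Hypothetical model: catches 8 of the 10 positives, raises 12 false alarms.
y_pred = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978
# Baseline that never flags anything.
y_never = [0] * 1000

print("model:   ", metrics(y_true, y_pred))
print("baseline:", metrics(y_true, y_never))
```

On this invented data the do-nothing baseline has the higher accuracy while catching no positive case at all, which is why the deliverable has to state which error, false positive or false negative, the business can afford.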