Making Data Science Work
By
Dr. Kuldeep Deshpande
Saumitra Modak
10x increase in data science jobs!
Out of all Data Science Projects… only this many succeed.
Gartner says 80% of analytics insights will not deliver business outcomes through 2022, and 80% of AI projects will “remain alchemy, run by wizards” through 2020.
Is DS = BS* ?
It is!
Data Science = Business Sense
Why Do Data Science Projects Fail?

Initiation
• Inadequate research
• Starting with the wrong questions
• Not addressing the root cause
• Initiating a data science project because of a blog
• Trying to take on too large a first project

People and Process
• Lack of diverse Subject Matter Experts
• Lacking an experienced data science leader
• Limited business understanding
• Lack of a standardized data science process
• Failing to communicate the value of data science

Solution Design
• The solutions are too complex
• Forming conclusions before data scientists start
• Poorly designed models
• Failing to provide actionable insights
• Using technologies because they are cool

Data Access
• Lack of access to data
• Using faulty / bad data
• Having a data scientist build their own ETLs
• Relying on Excel as the main data storage
• Big data silos or vendor-owned data!

Data Fallacies
• Simpson’s Paradox
• Setting wrong performance measures
• McNamara Fallacy
• Overfitting
• Data Dredging
Availability of right data
More Data Beats A Cleverer Algorithm!
You need 10 times as many examples as degrees of freedom in the model.
What model should I use?
How much training data should I gather?
How much data is required for algorithms?
IT DEPENDS!
More features may help overcome the problem of having less data.
Correctly tagged data may be more useful than a large untagged dataset.
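The 10x rule of thumb above can be turned into a quick sanity check. This is a minimal sketch, assuming a plain linear model whose degrees of freedom are one weight per feature plus an intercept; the function names are illustrative and do not come from the slides.

```python
def linear_model_degrees_of_freedom(n_features: int) -> int:
    """One weight per feature plus an intercept (assumption: plain linear model)."""
    return n_features + 1


def min_examples_rule_of_thumb(degrees_of_freedom: int, factor: int = 10) -> int:
    """The slide's heuristic: roughly `factor` examples per degree of freedom."""
    return factor * degrees_of_freedom


# Hypothetical case: a linear model with 20 input features.
dof = linear_model_degrees_of_freedom(20)
needed = min_examples_rule_of_thumb(dof)
print(f"{dof} degrees of freedom -> at least {needed} examples")
```

As the slide says, the real answer still depends on the problem; the number is a floor to start a conversation, not a guarantee.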
Small Data problems
Over-fitting becomes much harder to avoid
Outliers become much more dangerous.
Noise becomes a real issue!
Setting The Right Performance Measure

Regression
o MSPE
o MSAE
o R Square
o Adjusted R Square

Classification
o Precision-Recall
o ROC-AUC
o Accuracy
o Log-Loss

Unsupervised Models
o Rand Index
o Mutual Information

Others
o CV Error
o Heuristic methods to find K
o BLEU Score (NLP)
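Why the choice of measure matters is easiest to see on imbalanced data. The sketch below uses hypothetical confusion-matrix counts (not results from the talk) to show a model that never flags anything scoring higher on accuracy than a model that actually finds most positives.

```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    # Convention used here: 0.0 when the model predicts no positives at all.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts: 1000 cases, only 10 of them positive.
always_negative = {"tp": 0, "fp": 0, "fn": 10, "tn": 990}
useful_model = {"tp": 8, "fp": 40, "fn": 2, "tn": 950}

for name, c in [("always negative", always_negative), ("useful model", useful_model)]:
    print(
        name,
        "accuracy:", accuracy(**c),
        "precision:", round(precision(c["tp"], c["fp"]), 3),
        "recall:", recall(c["tp"], c["fn"]),
    )
```

Judged by accuracy alone, the useless model wins; judged by recall, it finds nothing.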
Classification of legal documents using Data Science
Availability of right data | Right Performance Measure

Objective
• Automated classification of legal documents from 3000+ government web pages

Data Used
• Unstructured data
• Training data: more than 90000 manually classified

Result
• 95% overall accuracy
• More than 85% recall

Algorithms And Technologies
• Naïve Bayes
• SVM
• Rocchio
• Technologies: Python and Redis
Milking the bull – When data science is going to fail!
Beware of data issues that can lead to failure
• Images and text with very few features
• Inaccurately tagged data
• Low quantity data
• Absence of actual predictors
• Time series data without features
Overfitting
A statistical model is said to be overfitted when it fits its training data too closely.
Trained that way, it starts learning from the noise and inaccurate data entries.
The model then does not categorize new data correctly, because it has absorbed too much detail and noise.
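A toy illustration of learning from noise, using hand-made hypothetical data in which two training labels are deliberately wrong. A model that memorises every training point scores perfectly on the data it has seen and poorly on new data, while a one-parameter threshold rule does the opposite. Both models are stand-ins chosen for brevity, not methods from the talk.

```python
def accuracy(predict, data):
    """Share of (x, label) pairs that `predict` gets right."""
    return sum(predict(x) == y for x, y in data) / len(data)


def nearest_neighbour_predict(train, x):
    """Memorising model: copy the label of the closest training point."""
    _, label = min(train, key=lambda point: abs(point[0] - x))
    return label


def fit_threshold(train):
    """Simple model: choose the cut-off t that best fits 'label = 1 if x >= t'."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), train))


# Hypothetical data. Underlying pattern: label is 1 when x >= 4.5.
# The training labels at x=1 and x=7 are wrong on purpose (noise).
train = [(0, 0), (1, 1), (2, 0), (3, 0), (4, 0), (5, 1),
         (6, 1), (7, 0), (8, 1), (9, 1), (10, 1)]
test = [(0.2, 0), (0.8, 0), (1.2, 0), (2.4, 0), (3.6, 0),
        (5.4, 1), (6.8, 1), (7.2, 1), (8.6, 1), (9.6, 1)]


def memoriser(x):
    return nearest_neighbour_predict(train, x)


threshold = fit_threshold(train)


def simple_rule(x):
    return int(x >= threshold)


print("memoriser   train/test:", accuracy(memoriser, train), accuracy(memoriser, test))
print("simple rule train/test:", accuracy(simple_rule, train), accuracy(simple_rule, test))
```

The gap between training and test accuracy, not the training score itself, is the signal to watch.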
Production forecasting for manufacturers
Milking The Bull | Overfitting

Objective
• Predicting daily production for a manufacturing company

Data Used
• 6 months of hourly production numbers
• Monthly targets not available
• Labor details unreliable
• Machine details not available

Result
• Patterns found in subsets of data not generalizing
• No useful prediction

Algorithms And Technologies
• R
• ARIMA, Linear regression, CatBoost, & Random Forests with feature engineering
How companies use hammers to kill mosquitos!
Sexy technologies don’t guarantee success, appropriate technologies do!

Data must be clean or there should be a way to clean data.
Better Data > Efficient Algorithms
• Domain understanding
• Data Owner interactions
• Acceptance of missing values
• Data Profiling automation

• Remove Unwanted observations
• Fix Structural Errors
• Filter Unwanted Outliers
• Handle Missing Data
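The four cleaning steps can be sketched on a tiny hypothetical table. The field names (`city`, `amount`), the valid range, and the median fill are all assumptions made for illustration; a real project would set them with the data owners. Structural errors are fixed first here so that records differing only in spelling are recognised as duplicates.

```python
from statistics import median


def clean(records, low, high):
    # Fix structural errors: normalise inconsistent spelling of labels.
    normalised = [
        {"city": r["city"].strip().title(), "amount": r["amount"]} for r in records
    ]

    # Remove unwanted observations: drop exact duplicates.
    seen, unique = set(), []
    for r in normalised:
        key = (r["city"], r["amount"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Filter unwanted outliers: keep values inside an agreed plausible range.
    kept = [r for r in unique if r["amount"] is None or low <= r["amount"] <= high]

    # Handle missing data: fill gaps with the median of the observed values.
    fill = median(r["amount"] for r in kept if r["amount"] is not None)
    return [
        {"city": r["city"], "amount": fill if r["amount"] is None else r["amount"]}
        for r in kept
    ]


raw = [
    {"city": " pune", "amount": 120},
    {"city": "Pune", "amount": 120},      # duplicate once spelling is fixed
    {"city": "MUMBAI", "amount": 100},
    {"city": "Mumbai", "amount": None},   # missing value
    {"city": "Nagpur", "amount": 99999},  # implausible outlier
    {"city": "Nagpur", "amount": 140},
]
cleaned = clean(raw, low=0, high=1000)
for row in cleaned:
    print(row)
```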
Customer Churn Prediction For Specialty Insurance
Appropriate Technologies | Clean Data

Objective
• Predict likelihood of churn for each policy
• Get maximum churn detection rate while keeping false alarms lower than 20%

Data Used
• Data understanding with domain experts
• Data cleaning and exploration for insights and predictor identification

Result
• Able to achieve a 75% churn detection rate while keeping the false alarm rate less than 20%
• Timely prediction of churn helping in taking retention action

Algorithms And Technologies
• R
• Random Forest, Logistic Regression, Gradient Boosting and CatBoost
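The objective of this case study (maximum churn detection with false alarms under 20%) amounts to choosing a score cut-off under a constraint. The sketch below assumes "false alarm rate" means the share of non-churning policies that get flagged, and uses made-up scores; neither the scores nor the helper names come from the project.

```python
def rates_at(threshold, scores, labels):
    """Return (detection rate, false alarm rate) when flagging score >= threshold."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y == 1 for f, y in zip(flagged, labels))
    fp = sum(f and y == 0 for f, y in zip(flagged, labels))
    positives = sum(labels)
    negatives = len(labels) - positives
    return tp / positives, fp / negatives


def best_threshold(scores, labels, max_false_alarm=0.20):
    """Highest detection rate among cut-offs whose false alarm rate stays below the cap."""
    best = None
    for t in sorted(set(scores)):
        detection, false_alarm = rates_at(t, scores, labels)
        if false_alarm < max_false_alarm and (best is None or detection > best[1]):
            best = (t, detection, false_alarm)
    return best


# Hypothetical model scores: 5 policies that churned, 10 that did not.
churned = [0.9, 0.8, 0.7, 0.4, 0.2]
retained = [0.85, 0.6, 0.5, 0.35, 0.3, 0.25, 0.15, 0.1, 0.05, 0.02]
scores = churned + retained
labels = [1] * len(churned) + [0] * len(retained)

cutoff, detection, false_alarm = best_threshold(scores, labels)
print(f"cut-off {cutoff}: detection {detection:.0%}, false alarms {false_alarm:.0%}")
```

Stating the constraint this way keeps the business requirement, not a generic accuracy number, in charge of model selection.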
Simpson’s Paradox
A phenomenon in which a trend appears in different groups of data but disappears or reverses when the groups are combined.

Admission to UC Berkeley
Taken as a whole, the admission process seems significantly biased against women.
But broken down by department, the apparent bias does not hold up: several departments are significantly biased against men.
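The reversal can be reproduced with a few lines of arithmetic. The numbers below are hypothetical and chosen only to make the effect visible; they are not the Berkeley figures. Women have the higher admission rate in each department, yet the lower rate overall, because most of them applied to the department that admits fewer people.

```python
from fractions import Fraction

# Hypothetical applications: (admitted, applied) per department and group.
applications = {
    "Dept A": {"men": (80, 100), "women": (18, 20)},
    "Dept B": {"men": (4, 20), "women": (30, 100)},
}


def rate(admitted, applied):
    return Fraction(admitted, applied)


def overall_rate(group):
    admitted = sum(dept[group][0] for dept in applications.values())
    applied = sum(dept[group][1] for dept in applications.values())
    return rate(admitted, applied)


for name, dept in applications.items():
    men, women = rate(*dept["men"]), rate(*dept["women"])
    print(f"{name}: men {float(men):.0%}, women {float(women):.0%}")

print(f"Overall: men {float(overall_rate('men')):.0%}, "
      f"women {float(overall_rate('women')):.0%}")
```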
McNamara Fallacy
Making a decision based solely on quantitative observations and ignoring all others.
• Measure whatever can be easily measured.
• Disregard that which cannot be measured easily.
• Presume that which cannot be measured easily is not important.

Web Traffic Measurement
• Let us assume a company has developed a new E-Commerce website.
• After the new website, site visits are up 50% and the number of newsletter subscriptions is up 25%.
But what if the percentage of people who never open their emails OR who unsubscribe immediately has increased?
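A worked version of the newsletter question, with hypothetical figures. Subscriptions rise by 25%, as in the example, but if the share of subscribers who never open an email or who unsubscribe immediately doubles, the number of people actually reached goes down. The starting count and both percentages are assumptions for illustration.

```python
def engaged_subscribers(subscribers: int, disengaged_percent: int) -> int:
    """Subscribers who neither ignore every email nor unsubscribe right away."""
    return subscribers * (100 - disengaged_percent) // 100


# Hypothetical: 1000 subscribers before the new site, 25% more after it.
before = engaged_subscribers(1000, disengaged_percent=20)
after = engaged_subscribers(1250, disengaged_percent=40)

print("engaged before:", before)
print("engaged after: ", after)
print("headline metric up 25%, engaged audience change:", after - before)
```

The easily measured number improved while the one that matters, and is harder to measure, got worse.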
Biotech Innovation Efficiency Analytics
Simpson’s Paradox | McNamara Fallacy | Right Performance Measure | Domain Understanding

Objective
• Analyze the potential of early stage biotech firms by analyzing their innovation efficiency.
• Find out the statistical correlation of a company's performance with its innovation efficiency.

Data Used
• Clinical Trials
• Press releases
• Stock details
• Patents
• Publications
• Company Financials

Result
• Statistical correlation between financial performance and innovation efficiency for a certain category of companies

Algorithms And Technologies
• Random Forest
• Decision Tree
• Neural Networks
• Regression

Ellicium Solutions - Making Data Science Work

  • 1. Making Data Science Work By Dr. Kuldeep Deshpande Saumitra Modak
  • 2. 10 x increase in data science jobs!
  • 3. Out of all Data Science Projects… Only these many succeed. Gartner says 80% of analytics insights will not deliver business outcomes through 2022 and 80% of AI projects will “remain alchemy, run by wizards” through 2020
  • 4. Is DS = BS* ?
  • 5. It is! Data Science = Business Sense
  • 6. Why Data Science Projects Fail? Initiation: inadequate research; starting with the wrong questions; not addressing the root cause; initiating a data science project because of a blog; trying to take on too large a first project. People and Process: lack of diverse subject matter experts; lacking an experienced data science leader; limited business understanding; lack of a standardized data science process; failing to communicate the value of data science. Solution Design: the solutions are too complex; forming conclusions before data scientists start; poorly designed models; failing to provide actionable insights; using technologies because they are cool. Data Access: lack of access to data; using faulty or bad data; having a data scientist build their own ETLs; relying on Excel as the main data storage; big data silos or vendor-owned data. Data Fallacies: Simpson’s Paradox; setting wrong performance measures; McNamara Fallacy; overfitting; data dredging.
  • 7. Availability of right data. More Data Beats A Cleverer Algorithm! As a rule of thumb, you need 10 times as many examples as degrees of freedom in the model. What model should I use? How much training data should I gather? How much data is required for algorithms? IT DEPENDS! More features may help overcome the problem of limited data. Correctly tagged data may be more useful than a large untagged dataset. Small data problems: overfitting becomes much harder to avoid; outliers become much more dangerous; noise becomes a real issue!
  • 8. Setting The Right Performance Measure. Regression: MSPE, MSAE, R Square, Adjusted R Square. Classification: Precision-Recall, ROC-AUC, Accuracy, Log-Loss. Unsupervised Models and Others: Rand Index, Mutual Information, CV Error, heuristic methods to find K, BLEU Score (NLP).
  • 9. Classification of legal documents using Data Science. Objective: Automated classification of legal documents from 3000+ government web pages. Data Used: • Unstructured data • Training data: more than 90000 manually classified. Result: • 95% overall accuracy • More than 85% recall. Algorithms and Technologies: • Naïve Bayes • SVM • Rocchio • Technologies: Python and Redis. Highlighted themes: Availability of right data; Right Performance Measure.
  • 10. Milking the bull – When data science is going to fail! Beware of data issues that can lead to failure: images and text with very few features; inaccurately tagged data; low quantity of data; absence of actual predictors; time series data without features.
  • 11. Overfitting: A statistical model is said to be overfitted when it fits its training data too closely. It starts learning from the noise and inaccurate data entries instead of the underlying pattern, so it does not categorize new data correctly because it has absorbed too much detail and noise.
  • 12. Production forecasting for manufacturers. Objective: Predicting daily production for a manufacturing company. Data Used: 6 months of hourly production numbers (monthly targets not available; labor details unreliable; machine details not available). Result: • Patterns found in subsets of data not generalizing • No useful prediction. Algorithms and Technologies: • R • ARIMA, linear regression, CatBoost, and Random Forests with feature engineering. Highlighted themes: Milking The Bull; Overfitting.
  • 13. How companies use hammers to kill mosquitos! Sexy technologies don’t guarantee success, appropriate technologies do!
  • 14. Data must be clean or there should be a way to clean data. Better Data > Efficient Algorithms. Domain understanding; data owner interactions; acceptance of missing values; data profiling automation. • Remove unwanted observations • Fix structural errors • Filter unwanted outliers • Handle missing data
  • 15. Customer Churn Prediction For Specialty Insurance. Objective: • Predict likelihood of churn for each policy • Get maximum churn detection rate while keeping false alarms lower than 20%. Data Used: • Data understanding with domain experts • Data cleaning and exploration for insights and predictor identification. Result: • Able to achieve 75% churn detection rate while keeping the false alarm rate less than 20% • Timely prediction of churn, helping in taking retention action. Algorithms and Technologies: • R • Random Forest, Logistic Regression, Gradient Boosting and CatBoost. Highlighted themes: Appropriate Technologies; Clean Data.
  • 16. Simpson’s Paradox: A phenomenon in which a trend appears in different groups of data but disappears or reverses when the groups are combined. Example, admission to UC Berkeley: In the combined figures, the admission process seems significantly biased against women. But broken down by department, that bias largely disappears, and more departments turn out to be significantly biased against men than against women.
  • 17. McNamara Fallacy: Making a decision based solely on quantitative observations and ignoring all others. Measure whatever can be easily measured. Disregard that which cannot be measured easily. Presume that which cannot be measured easily is not important. Example, web traffic measurement: • Let us assume a company has developed a new e-commerce website. • After the new website launches, site visits are up 50% and the number of newsletter subscriptions is up 25%. But what if the percentage of people who never open their emails, or who unsubscribe immediately, has increased?
  • 18. Biotech Innovation Efficiency Analytics. Objective: • Analyze the potential of early-stage biotech firms by analyzing their innovation efficiency • Find the statistical correlation of a company's performance with its innovation efficiency. Data Used: • Clinical trials • Press releases • Stock details • Patents • Publications • Company financials. Result: Statistical correlation between financial performance and innovation efficiency for a certain category of companies. Algorithms and Technologies: • Random Forest • Decision Tree • Neural Networks • Regression. Highlighted themes: Simpson’s Paradox; McNamara Fallacy; Right Performance Measure; Domain Understanding.
  • 19. Why Data Science Projects Fail? Initiation: inadequate research; starting with the wrong questions; not addressing the root cause; initiating a data science project because of a blog; trying to take on too large a first project. People and Process: lack of diverse subject matter experts; lacking an experienced data science leader; limited business understanding; lack of a standardized data science process; failing to communicate the value of data science. Solution Design: the solutions are too complex; forming conclusions before data scientists start; poorly designed models; failing to provide actionable insights; using technologies because they are cool. Data Access: lack of access to data; using faulty or bad data; having a data scientist build their own ETLs; relying on Excel as the main data storage; big data silos or vendor-owned data. Data Fallacies: Simpson’s Paradox; setting wrong performance measures; McNamara Fallacy; overfitting; data dredging.
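
Slides 7 and 11 warn that overfitting is hard to avoid on small data because a flexible model ends up learning the noise. The sketch below is not from the deck. It is a minimal illustration on invented toy data (six points from y = 2x with alternating +1/-1 noise), comparing a least-squares line with a degree-5 polynomial that passes through every training point. The function names are hypothetical.

```python
# Toy data invented for illustration: y = 2x plus alternating +1 / -1 noise.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
# Noise-free held-out points from the same underlying line y = 2x.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [1, 3, 5, 7, 9]


def fit_line(xs, ys):
    """Ordinary least-squares straight line (the simple model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every training point (memorizes the noise)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


line = fit_line(train_x, train_y)
interpolant = fit_interpolant(train_x, train_y)

line_train, line_test = mse(line, train_x, train_y), mse(line, test_x, test_y)
interp_train, interp_test = mse(interpolant, train_x, train_y), mse(interpolant, test_x, test_y)

print(f"line:        train MSE {line_train:.3f}, test MSE {line_test:.3f}")
print(f"interpolant: train MSE {interp_train:.3f}, test MSE {interp_test:.3f}")
```

The interpolant scores a perfect training error yet does far worse than the plain line on the held-out points. That gap between training and held-out error is the practical symptom of overfitting.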
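
Slide 8 lists accuracy next to precision-recall, and slide 9 reports accuracy and recall separately. One reason to track both can be sketched as follows, using invented labels rather than data from the deck: on an imbalanced problem, a model that never predicts the positive class still scores high accuracy. The helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    # Convention chosen here: recall is 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0


# Invented, imbalanced toy labels: 5 positives among 100 cases.
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy = {acc:.2f}, recall = {rec:.2f}")
```

Accuracy alone would rate this model at 95%, while recall shows that it finds none of the cases of interest. That is the kind of mismatch slide 8's "right performance measure" is meant to catch.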
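
Slide 15 states its objective as a constrained one: maximize the churn detection rate while keeping false alarms below 20%. The deck does not say how the threshold was chosen or how "false alarm rate" was defined. The sketch below assumes it means the false positive rate (false alarms divided by all non-churners) and scans candidate score thresholds on invented toy scores. `pick_threshold` is a hypothetical helper, not the authors' code.

```python
def pick_threshold(scores, labels, max_false_alarm=0.20):
    """Return (threshold, detection_rate, false_alarm_rate) with the highest
    detection rate among thresholds whose false alarm rate stays below the cap.
    Returns None if no threshold qualifies."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one churner and one non-churner")

    best = None
    # Flag a policy as "will churn" when its score is >= the threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        detection = tp / positives      # share of churners caught
        false_alarm = fp / negatives    # share of non-churners flagged (assumed definition)
        if false_alarm < max_false_alarm and (best is None or detection > best[1]):
            best = (threshold, detection, false_alarm)
    return best


# Invented model scores and true outcomes (1 = churned) for ten policies.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

best = pick_threshold(scores, labels)
print(best)
```

Stating the business constraint first and then searching for the operating point keeps the performance measure tied to the retention team's tolerance for false alarms, rather than to a default 0.5 cut-off.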
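
Simpson's Paradox from slide 16 is easiest to see with numbers. The counts below are invented for illustration and are not the UC Berkeley figures. In each hypothetical department women are admitted at a higher rate than men, yet the combined rate favors men, because most women applied to the more selective department.

```python
# Hypothetical counts per department and group: (applicants, admitted).
data = {
    "Dept A": {"men": (800, 480), "women": (100, 70)},
    "Dept B": {"men": (200, 20), "women": (900, 135)},
}


def per_department_rates(data):
    """Admission rate for each group within each department."""
    return {
        dept: {group: admitted / applicants
               for group, (applicants, admitted) in groups.items()}
        for dept, groups in data.items()
    }


def overall_rates(data):
    """Admission rate for each group after pooling all departments."""
    totals = {}
    for groups in data.values():
        for group, (applicants, admitted) in groups.items():
            prev_applicants, prev_admitted = totals.get(group, (0, 0))
            totals[group] = (prev_applicants + applicants, prev_admitted + admitted)
    return {group: admitted / applicants
            for group, (applicants, admitted) in totals.items()}


by_dept = per_department_rates(data)
overall = overall_rates(data)

for dept, rates in by_dept.items():
    print(dept, {group: round(rate, 3) for group, rate in rates.items()})
print("Overall", {group: round(rate, 3) for group, rate in overall.items()})
```

Checking the trend both within groups and in aggregate, as done here, is a cheap guard before drawing a conclusion from a pooled table.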