Data Science
  Data Meetup Jan. 12
What is data science?
Besides a reason to have beer and pizza…
What does the literature say?
Hacking
“Good data scientists understand, in a
deep way, that the heavy lifting of
cleanup and preparation isn’t
something that gets in the way of solving
the problem… it is the problem”
                                   DJ Patil



 bash/awk/sed
Statistics
What’s the probability that 2 people in
the front 2 rows share a birthday?
1. ~10%
2. ~20%
3. ~50%
4. ~90%
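
A quick way to check the birthday question is to compute it directly. This is a minimal sketch that assumes 365 equally likely birthdays and ignores leap years; the number of people in the front two rows is not given on the slide, so `n_people` is left as a parameter.

```python
def shared_birthday_probability(n_people: int, days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday.

    Assumes birthdays are independent and uniform over `days` days.
    """
    p_all_distinct = 1.0
    for i in range(n_people):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    # Hypothetical group sizes, chosen only for illustration.
    for n in (10, 23, 40):
        print(n, round(shared_birthday_probability(n), 3))
```

Under these assumptions the probability passes 50% at about 23 people, so the answer depends heavily on how many people sit in those two rows.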

What’s the probability that a positive result from a 99%
accurate test for a 1/1000 disease is correct?
1. ~10%
2. ~50%
3. ~90%
4. ~99%
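
The test question can be worked through with Bayes' rule. This sketch assumes "99% accurate" means both 99% sensitivity and 99% specificity, and that 1/1000 is the prevalence in the tested population; the slide does not spell out either assumption.

```python
def posterior_given_positive(prevalence: float,
                             sensitivity: float,
                             specificity: float) -> float:
    """P(disease | positive test) via Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


if __name__ == "__main__":
    # Assumed reading of the slide: 99% sensitivity, 99% specificity,
    # prevalence of 1 in 1000.
    p = posterior_given_positive(prevalence=0.001,
                                 sensitivity=0.99,
                                 specificity=0.99)
    print(f"{p:.1%}")
```

With those numbers, false positives from the 999 healthy people outnumber true positives from the 1 sick person by roughly ten to one, which is why the result is far below 99%.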
Domain Expertise
Intelligence Cookbook
      Just follow the steps
The Recipe

First, make it valuable.
Then, make it possible.
Then, make it beautiful.
 Then, make it smart.
Example

E-Commerce website
Make it valuable

Find a KPI that is correlated
   to bottom line revenue


e.g. number of products the
  visitor browses through
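
One simple way to check whether a candidate KPI tracks revenue is a Pearson correlation. The sketch below is only illustrative: the function name and the per-day numbers are hypothetical, and a real check would use the site's own data and look beyond a single coefficient.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up daily figures, for illustration only.
    products_browsed_per_visit = [2.1, 2.4, 2.0, 3.1, 2.9]
    revenue_per_visit = [1.10, 1.25, 1.05, 1.60, 1.45]
    print(round(pearson(products_browsed_per_visit, revenue_per_visit), 3))
```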
Make it possible

Develop the simplest heuristic



e.g. show the visitor one of the
     top 10 selling products
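
The top-seller heuristic fits in a few lines. This is a sketch under assumptions: `order_items` is a hypothetical flat list of product ids, one entry per unit sold, and the visitor is shown one top seller picked at random.

```python
import random
from collections import Counter


def top_sellers(order_items, k=10):
    """Return the k best-selling product ids, most sold first."""
    counts = Counter(order_items)
    return [product for product, _ in counts.most_common(k)]


def recommend_top_seller(order_items, k=10, rng=random):
    """Pick one of the top-k sellers at random.

    Raises IndexError if there are no sales yet.
    """
    return rng.choice(top_sellers(order_items, k))


if __name__ == "__main__":
    # Hypothetical sales log.
    sales = ["shoes", "hat", "shoes", "scarf", "hat", "shoes"]
    print(top_sellers(sales, k=2))
    print(recommend_top_seller(sales, k=2))
```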
Make it beautiful

Create a method to quickly test new
    algorithms against old ones


 e.g. create a framework that split-tests
   two models and reports
         which one is better
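
A split-testing framework can start very small. The sketch below is an assumption about what such a framework might look like, not the speaker's implementation: visitors are assigned to model A or B by hashing a hypothetical visitor id, a KPI value is recorded per visit, and the report compares the mean KPI. It deliberately leaves out significance testing, which a real framework would need before declaring a winner.

```python
import hashlib
from statistics import mean


def assign_arm(visitor_id: str) -> str:
    """Deterministically assign a visitor to arm 'A' or 'B'."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return "A" if digest[0] % 2 == 0 else "B"


class SplitTest:
    """Collects a KPI per arm and reports which arm has the higher mean."""

    def __init__(self):
        self.kpi = {"A": [], "B": []}

    def record(self, arm: str, kpi_value: float) -> None:
        self.kpi[arm].append(kpi_value)

    def report(self) -> dict:
        if not self.kpi["A"] or not self.kpi["B"]:
            # Not enough data to compare the two models.
            return {"means": {}, "winner": None}
        means = {arm: mean(values) for arm, values in self.kpi.items()}
        if means["A"] == means["B"]:
            winner = None
        else:
            winner = max(means, key=means.get)
        return {"means": means, "winner": winner}


if __name__ == "__main__":
    test = SplitTest()
    # Hypothetical visits: (visitor id, products browsed).
    for visitor_id, browsed in [("v1", 3), ("v2", 5), ("v3", 2), ("v4", 6)]:
        test.record(assign_arm(visitor_id), browsed)
    print(test.report())
```

Hashing the visitor id keeps each visitor in the same arm across visits, so the two models are compared on stable groups.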
Make it smart

Figure out which field your problem belongs to
 and choose an off-the-shelf algorithm


    e.g. recognize that the problem
   is product recommendation and
       use collaborative filtering
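
To make collaborative filtering concrete, here is a minimal item-based sketch. It is one of several variants and not necessarily the one the talk had in mind: items are scored by cosine similarity over which visitors viewed them. The `baskets` structure (visitor id mapped to a set of product ids) is a hypothetical input format, and a production system would normally use an existing library rather than this loop.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(baskets):
    """Cosine similarity between items, based on shared visitors.

    baskets: dict mapping visitor id -> set of product ids.
    """
    item_counts = defaultdict(int)
    pair_counts = defaultdict(int)
    for items in baskets.values():
        for a in items:
            item_counts[a] += 1
            for b in items:
                if a != b:
                    pair_counts[(a, b)] += 1
    return {
        (a, b): n / sqrt(item_counts[a] * item_counts[b])
        for (a, b), n in pair_counts.items()
    }


def recommend_for(baskets, visitor, k=3):
    """Recommend up to k unseen items, scored by similarity to seen items."""
    seen = baskets[visitor]
    scores = defaultdict(float)
    for (a, b), similarity in item_similarities(baskets).items():
        if a in seen and b not in seen:
            scores[b] += similarity
    # Highest score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


if __name__ == "__main__":
    # Hypothetical browsing history.
    baskets = {
        "u1": {"x", "y"},
        "u2": {"x", "y", "z"},
        "u3": {"x"},
    }
    print(recommend_for(baskets, "u3"))
```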
Common ML problems
•   Supervised learning
    •   Classification
    •   Regression
    •   Anomaly detection
•   Unsupervised learning
    •   Clustering
    •   Separation
•   Recommendation
    •   Feature-based recommendation
    •   Collaborative filtering
•   Search
    •   Indexing
    •   Ranking
To sum it all up
Real data science is hard

but …

Real data science is the last step in data
science, not the first

and besides …

The most important thing in data science is
the business, not the science
Questions?

email: vitalyp@liveperson.com

     Twitter: @bigdatasc
