HUMAN CLONING
The Data Scientist bottleneck resolved
Dr Alex Farquhar




Friday, 24 February 2012
Exabytes of data (IDC/EMC report 2008)
[Chart: data volume by year, 2008 to 2017; y-axis 0 to 20,000 exabytes]




By 2018, the United States alone could face a shortage of 140,000 to 190,000 data people...




WE’RE ALL DOOMED




DATA PEOPLE?




                                     © Drew Conway


MAYBE WE CAN JUST....



    • 1 statistician + 1 developer ≈ 1 data scientist?




HOW ABOUT....



    • 4 statisticians + 4 developers ≈ 4 data scientists?




WHAT CAN WE DO?


    • Train more new data scientists (not fast enough)

    • Cross-train people

    • Cobble together different skills in teams (see above)




WHAT CAN WE DO?



    • Do more work




DOING MORE

    • simplify (fob the work off)

    • automate (fob even more work off)

    • choose/build the right tools

    • parallelise

    • iterate



SIMPLIFY & AUTOMATE



    • Counting stuff is not much fun




SIMPLIFY & AUTOMATE



    [Diagram: Hive, TSV files, Hadoop]
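
The counting that this slide hands off to Hive can be pictured at small scale in Python. The sketch below is an illustration only, not the speaker's code: the column names and sample rows are made up, and on a cluster the same count would be written as a GROUP BY query rather than run in one process.

```python
import csv
import io
from collections import Counter


def count_by_column(tsv_file, column):
    """Count rows per distinct value of `column` in a TSV file with a header row."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row[column] for row in reader)


# Hypothetical sample data standing in for a real TSV file.
sample = io.StringIO(
    "user\tcountry\n"
    "alice\tUK\n"
    "bob\tUS\n"
    "carol\tUK\n"
)

# Roughly what a Hive query such as
#   SELECT country, COUNT(*) FROM visits GROUP BY country
# would return (the table name `visits` is an assumption).
counts = count_by_column(sample, "country")
print(counts.most_common())  # → [('UK', 2), ('US', 1)]
```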

AUTOMATE / PARALLELISE
    [Diagram, top to bottom: magic, Hadoop, Job]



AUTOMATE / PARALLELISE
    [Diagram, top to bottom: magic, Hadoop, lots of jobs at once (Job 1, Job 2, Job 3, Job 4)]
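
The "lots of jobs at once" idea can be sketched on a single machine with Python's standard library. This is a stand-in for a Hadoop cluster, not a description of one: `run_job` and the four job IDs are hypothetical placeholders for real work.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id):
    """Hypothetical job: sum the integers below job_id * 1000."""
    return job_id, sum(range(job_id * 1000))


job_ids = [1, 2, 3, 4]

# Hand all four jobs to the pool at once instead of running them in sequence.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(run_job, job_ids))

for job_id in job_ids:
    print(f"Job {job_id}: {results[job_id]}")
```

Threads suit jobs that mostly wait on I/O; for CPU-heavy work in CPython, `ProcessPoolExecutor` from the same module is the usual swap.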

TOOLS



    • something that allows fast iteration, i.e. not Java

    • R, Ruby, Python




PARALLELISE




ITERATE


    • try different things

    • improve what works

    • dump what doesn’t

    • constant improvement & learning → get faster



WE’RE NOT ALL DOOMED



"Human Cloning: The Data Scientist Bottleneck Resolved" Dr. Alex Farquhar @ds_ldn
