The Three Sexy Skills of Data Scientists (& Data-Driven Startups) Michael Driscoll | Metamarkets IA Ventures Big Data Conference | Oct 2010 For print version: http://www.dataspora.com/blog
I. THE PROMISE OF BIG DATA
What is Big Data? Data that is distributed.
Attack of the Exponentials 1
Attack of the Exponentials 2
Attack of the Exponentials 3
Economics of Data Processing $ extract monetize BIG DATA FEATURES ECONOMIC VALUE
Economic Value > Extraction Cost
II. 3 SEXY SKILLS OF DATA SCIENTISTS… & DATA-DRIVEN START-UPS
=suffering
+ = if ($foo =~  / {2,3}([A-Z]{5,7}) {2,5}/)
Examples of Data Munging Start-ups
=statistics
data model 1000 bytes 2 bytes
Examples of Statistical Data Products at Start-ups
=storytelling
Exploratory Visualization
Narrative Visualization Source: NYT, inspired by Wattenberg & Byron, http://www.leebyron.com/else/streamgraph/
Examples of Data Visualization Start-ups
III. THE BIG DATA ECOSYSTEM
Actions Products (APIs, Dashboards, Tools) Analytics (R, SPSS, SAS, SAP) Insights Data Hadoop, Parallel RDBMS  Data
THANKS! Michael Driscoll mike@metamarketsgroup.com @dataspora
The Three Sexy Skills of Data Scientists (& Data-Driven Startups) Michael Driscoll | Metamarkets IA Ventures Big Data Conference | Oct 2010
EXTRAS
WHAT IS DATA SCIENCE?
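
Sketch for the munging slide: the Perl match shown there (`$foo =~ / {2,3}([A-Z]{5,7}) {2,5}/`) pulls a run of 5 to 7 capital letters out of a space-padded record. A rough Python equivalent follows, since the talk names Python among the usual munging tools. The sample records and the `extract_code` helper are invented for illustration and are not from the talk.

```python
import re

# Python rendering of the Perl pattern on the munging slide:
# 2-3 spaces, a captured run of 5-7 capital letters, then 2-5 spaces.
FIELD = re.compile(r" {2,3}([A-Z]{5,7}) {2,5}")


def extract_code(line):
    """Return the captured uppercase field, or None if the line does not match."""
    match = FIELD.search(line)  # unanchored search, like Perl's =~
    return match.group(1) if match else None


# Hypothetical space-padded records, made up for this example.
records = [
    "0001  LONDON   2010-10-01",
    "0002   PARIS  2010-10-02",
    "0003 rome 2010-10-03",  # lowercase and single-spaced: no match
]

codes = [extract_code(record) for record in records]
print(codes)
```

Records that do not fit the pattern come back as `None` rather than raising, which makes it easy to count how much of a file the pattern actually covers before trusting it.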

Editor's Notes

  1. I’ve added an addendum to this talk: these skills aren’t just sexy for individuals. Start-ups with these skills in-house are also sexy investments; we wouldn’t be meeting here today if that weren’t the case. The motivation for this talk was Hal Varian’s quip that statisticians are the sexy profession of the next decade. I thought about how I could mash up data with sexy, and this is what I got.
  2. Let’s set the stage. Joe Hellerstein has said that we’re living in the Industrial Revolution of Data. Big Data means.
  3. An important note: big data is not just about volume, it’s about velocity. Systems must be dramatically re-architected when they shift from monolithic to modular: unicellular to multicellular. Most of the additional complexity goes into the interfaces between the pieces. Regardless, I define Big Data as data that is distributed. Transition: how did we get here, to a world chock full of exabytes?
  4. Attack of the exponentials.
  5. This is what’s happened in the last four decades. These four factors also happen to be inputs for data generation processes. So what happen
  6. Kurzweil reference; I call this the data singularity. CPU and storage costs have fallen faster than network and disk I/O have risen, meaning more data can be stored and processed locally than can be shipped around. This has strategic implications: data is heavy, and hard to move once it lands somewhere. This puts Amazon, for instance, at an enormous competitive advantage over its cloud computing peers. Data is heavy. Strategic implications. Things can explode.
  7. Kurzweil reference; I call this the data singularity. CPU and storage costs have fallen faster than network and disk I/O have risen, meaning more data can be stored and processed locally than can be shipped around. This has strategic implications: data is heavy, and hard to move once it lands somewhere. This puts Amazon, for instance, at an enormous competitive advantage over its cloud computing peers. Data is heavy. Strategic implications. Things can explode.
  8. Athabasca Sands of Canada. There are parallels: mining value from these tar sands only became worthwhile once the value of the oil extracted exceeded the cost of extraction. The same holds true for data. Where are the Athabasca tar sands of data? (Graphic showing value > cost threshold with example data.) The economics of data aggregation and analysis have shifted dramatically, compelling (i) new categories of data to be stored and collected, and (ii) re-examination of already collected but frequently discarded data. In either case, the criterion is the same: economic value > cost of analysis. But the process of capitalizing on these emerging opportunities, of converting data volumes into value, requires a unique skill set. When concentrated in a single individual or within a start-up, these skills are a powerful cocktail, sexy to employers and investors alike. These are the three sexy skills I discuss next. Not all data is worth keeping / aggregating / analyzing. Rehabilitate data that formerly wasn’t meritorious. Amazon stock chart as punchline. So few people had access to these tools. The scientist moniker is almost counter to what we traditionally think of as a scientist. Call out that hacker ethos of the data scientist.
  9. Few individuals have all these skills concentrated in one. That is, after all, the advantage of a start-up, where talents can complement one another.
  10. It is a painful process. Transition: most of us are used to confronting files that look like:
  11. Grab a screendump from the Oracle database scrape from 10 years of advertising data from a London publishing partner of ours.
  12. Data munging is a labor-intensive and painful process; often 80% of the time in an analysis project is spent on this piece. The tools used are typically high-level scripting languages like Python, Ruby, and Perl. If you want to know more about munging, two world-class data mungers are here with us today: Pete Skomoroch and Flip Kromer. Pete built a site that mines Wikipedia’s edit logs for trending news topics, and Flip is the force behind InfoChimps and has written more parsers than almost the rest of us combined.
  13. Abstraction, symbology, ontology…
  14. Statistics is the grammar of data science. For those who feel that stats is dominated by old white dead men…
  15. That’s because it is. But these old dead white men have some powerful ideas.
  16. Statistics allows us to provide reduced descriptions of the world, in the form of models. In this way, they are reductive: models capture only the essential features of the data.
  17. Statistical or machine-learning-based data products are a staple of nearly every data-driven start-up in town. Here are just a few. In the process of developing a data product, data visualization plays an important role.
  18. Our eyes are the highest-bandwidth channel we have. Visualization is a means of surfacing otherwise intangibly large data sets. Two broad classes. Exploratory: audience of 1 or 2, characterized by rapid iterations, local development, not in print. Narrative: a point of view has been established and the viz is supposed to help drive the story forward.
  19. Tukey
  20. Wattenberg stream graphs
  21. Storytelling. Human-size for human decision makers – telling stories with the data, through visualization, to communicate massive scales to people that execute and make decisions.
  22. Good luck. Tableau is desktop.
  23. This is an open source stack, and a vibrant big data hacker community is actively building these tools. Specifically how it’s manifesting that we’re using in our country; here’s where we’re paying and here’s where we’re not. Here’s the interim solution. The stack is loosely coupled: right tool for the right job. The need for a dedicated analytics RDBMS. You know who sits on the top of that stack? We do. That’s why storytelling is such an important skill. Commoditization moves from the bottom up.
  24. I’ve added an addendum to this talk: these skills aren’t just sexy for individuals. Start-ups with these skills in-house are also sexy investments; we wouldn’t be meeting here today if that weren’t the case. The motivation for this talk was Hal Varian’s quip that statisticians are the sexy profession of the next decade. I thought about how I could mash up data with sexy, and this is what I got.
  25. I’m defining data science as applying tools to data to answer questions. It is at the intersection of these tools. And it is a growing field, because data is getting bigger, and our tools are getting better. (Suffice it to say, the questions we ask have been around since time immemorial: who…) Another word for questions is hypotheses.
  26. There’s been a lot of talk about Big Data in the past year. Articles and conferences.
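
Sketch for note 16 (models as reduced descriptions): one way to picture the "data → model" slide is to fit a two-parameter line to many observations and keep only the parameters. The example below is a minimal illustration under stated assumptions, not the method used in the talk: the data are synthetic, the linear relationship is assumed, and `fit_line` is a hypothetical helper implementing ordinary least squares by hand.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic data (an assumption for illustration): 1000 noisy points
# scattered around the line y = 1 + 2x.
rng = random.Random(0)
xs = list(range(1000))
ys = [1.0 + 2.0 * x + rng.gauss(0, 5) for x in xs]

slope, intercept = fit_line(xs, ys)

# The model keeps two numbers in place of a thousand observations.
print(f"{len(ys)} observations reduced to 2 parameters")
print(f"slope={slope:.3f} intercept={intercept:.2f}")
```

The fitted slope and intercept discard the noise and keep only the trend, which is the sense in which note 16 calls models reductive.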