DS101:
Introduction to AI and DS
Lecture 1: Introduction to Data Science
Dr. Sudheer
hsudheer@ifheindia.org
1
2
Course Code: DS101
Course Title: Introduction to Data Science and Artificial Intelligence
L P U: 3 0 3
Team of Instructors:
1. Ms. Sathya AR
2. Ms. P. Rohini
3. Dr. H. Sudheer
4. Dr. P. Sirisha
Course Objectives:
1. To expose students to the fundamental concepts of data science and their implementation using Python programming.
2. To introduce the mathematical foundations required for data science.
3. To explore various data pre-processing techniques.
4. To summarize the aspects of exploratory data analysis (EDA): uses of EDA, the role of metadata in EDA, and data transformations identified through EDA.
5. To understand AI approaches in data science.
3
Textbook(s)
T1 Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2014.
T2 Stuart Russell and Peter Norvig, “Artificial Intelligence: A Modern Approach”, 3rd Edition, Pearson Education, 2010, ISBN-13: 978-0-13-604259-4.
Reference Book(s)
R1 Jake VanderPlas, “Python Data Science Handbook: Essential Tools for Working with Data”, O’Reilly, 2017.
R2 Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly, 2019.
R3 Field Cady, “The Data Science Handbook”, Wiley, 2017.
R4 Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, 2011, ISBN 0123814790.
Online Resources
R5 https://onlinecourses.nptel.ac.in/noc22_cs72/preview
R6 https://www.udemy.com/course/complete-python-bootcamp/
R7 https://lms.simplilearn.com/courses/4227/Introduction-to-Data-Science/syllabus
4
“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has
to be changed into gas, plastic, chemicals, etc to create a valuable entity that
drives profitable activity; so must data be broken down, analyzed for it to have
value.” — Clive Humby, 2006
5
Increasingly, many companies see themselves as data-driven.
6
Data science is the science that uses computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply knowledge from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
Source: Wikipedia
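The first definition above lists concrete steps: collect, clean, integrate, analyze. The sketch below strings those steps together in Python using only the standard library. It is a minimal illustration, not a reference pipeline. The order records, the customer lookup table, and the function names `clean`, `integrate`, and `analyze` are all hypothetical. Visualization and interaction are left out.

```python
from collections import defaultdict

# Hypothetical raw records, standing in for data that has already been "collected".
RAW_ORDERS = [
    {"order_id": "1", "customer_id": "c1", "amount": "19.99"},
    {"order_id": "2", "customer_id": "c2", "amount": ""},      # missing amount
    {"order_id": "3", "customer_id": "c1", "amount": "5.01"},
    {"order_id": "4", "customer_id": "c3", "amount": "abc"},   # malformed amount
    {"order_id": "5", "customer_id": "c2", "amount": "10.00"},
]
# Hypothetical second source to integrate with: customer -> region.
CUSTOMERS = {"c1": "north", "c2": "south", "c3": "north"}


def clean(records):
    """Keep only records whose amount parses as a number, converting it to float."""
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop missing or malformed amounts
        cleaned.append({**rec, "amount": amount})
    return cleaned


def integrate(records, customers):
    """Join each order with its customer's region (orders with unknown customers are dropped)."""
    return [{**rec, "region": customers[rec["customer_id"]]}
            for rec in records if rec["customer_id"] in customers]


def analyze(records):
    """Total order amount per region, rounded to cents."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += rec["amount"]
    return {region: round(total, 2) for region, total in totals.items()}


summary = analyze(integrate(clean(RAW_ORDERS), CUSTOMERS))
print(summary)
```

Each step is a plain function, so every stage can be inspected or tested on its own. That matters when a final number looks wrong.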
7
Big Data and Data Science Hype
8
“Big Data” Sources
Every:
• Click
• Ad impression
• Billing event
• Fast forward, pause, …
• Server request
• Transaction
• Network message
• Fault
• …
User-generated (web & mobile)
…
Internet of Things / M2M
Health/scientific computing
It’s All Happening Online
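Each click, ad impression, or transaction listed above usually ends up as a line in an event log. Turning such logs into something analyzable often starts with parsing and counting. The sketch below assumes a hypothetical whitespace-separated "timestamp user_id event_type" log format. Real logs vary widely, so treat the format and the `parse` helper as illustrative only.

```python
from collections import Counter

# Hypothetical log lines in an assumed "timestamp user_id event_type" format.
LOG_LINES = [
    "2024-01-01T10:00:00 u1 click",
    "2024-01-01T10:00:01 u2 ad_impression",
    "2024-01-01T10:00:02 u1 click",
    "2024-01-01T10:00:03 u3 transaction",
    "corrupted line",
    "2024-01-01T10:00:05 u2 click",
]


def parse(line):
    """Split a line into (timestamp, user_id, event_type), or return None if malformed."""
    parts = line.split()
    if len(parts) != 3:
        return None
    return tuple(parts)


# Keep only well-formed events, then count how often each event type occurs.
events = [e for e in (parse(line) for line in LOG_LINES) if e is not None]
counts = Counter(event_type for _, _, event_type in events)
print(counts.most_common())
```

Malformed lines are skipped silently here for brevity. In practice it is usually worth counting them too, since a rising share of unparseable lines is itself a signal that something upstream changed.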
11
The Current Landscape (with a Little History)
12
Data science is a broad field that refers to the collective
processes, theories, concepts, tools and technologies that
enable the review, analysis and extraction of valuable
knowledge and information from raw data.
Source: Techopedia
Drew Conway’s Venn diagram of data science
Rise of the Data Scientist
13
Skills of data geeks:
• Statistics – traditional analysis you’re used to thinking about
• Data munging – parsing, scraping, and formatting data
• Visualization – graphs, tools, etc.
Harvard Business Review declared data scientist the “Sexiest Job of the 21st Century”.
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
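Of the three skills listed, data munging is the least glamorous but often the most time-consuming. The short Python sketch below shows what "parsing and formatting" can mean in practice. The messy "name, age, city" rows and the `munge` helper are hypothetical examples, not taken from the source.

```python
# Hypothetical messy input rows: "name, age, city" with stray spaces,
# inconsistent capitalization, and one missing age.
RAW_ROWS = [
    "  alice , 34 , new york",
    "BOB,  , Boston",
    "Carol,29,chicago ",
]


def munge(row):
    """Parse one comma-separated row into a consistently formatted dict."""
    name, age, city = (field.strip() for field in row.split(","))
    return {
        "name": name.title(),
        "age": int(age) if age.isdigit() else None,  # missing age becomes None
        "city": city.title(),
    }


records = [munge(row) for row in RAW_ROWS]
for rec in records:
    print(rec)
```

Representing the missing age as `None` rather than 0 keeps "unknown" distinct from a real value. That distinction matters later, when computing averages.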
The Role of the Social Scientist in Data Science
14
Both LinkedIn and Facebook are social network companies. Oftentimes, a description or definition of a data scientist includes a hybrid of statistician, software engineer, and social scientist.
If the problems are social-science-y, like friend recommendations, “people you know”, or user segmentation, then by all means, bring on the social scientist! Social scientists also tend to be good question askers and have other good investigative qualities, so a social scientist who also has quantitative and programming chops makes a great data scientist.
Data Science Jobs
15
Most job descriptions ask data scientists to be experts in computer science, statistics, communication, and data visualization, and to have extensive domain expertise.
Nobody is an expert in everything, which is why it makes more sense to build teams of people with different profiles and expertise: together, as a team, they can cover all those areas.
A data science profile:
• Computer science
• Math
• Statistics
• Machine learning
• Domain expertise
• Communication and presentation skills
• Data visualization
Rachel’s data science profile, which she created to illustrate how one might visualize oneself as a data scientist. She wanted students and guest lecturers to “riff” on it: to add buckets or remove skills, use a different scale or visualization method, and think about the drawbacks of self-reporting.
16
Data science team profiles can be
constructed from data scientist
profiles; there should be alignment
between the data science team
profile and the profile of the data
problems they try to solve
17
Data science workflow
18
Section 2
https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
Data science workflow
19
Section 2
Data science workflow
20
[Figure: data science workflow diagram with the stages Digging Around in Data, Hypothesize, Model, Large-Scale Exploitation, Evaluate, Interpret, and Clean/Prep]
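Several of the workflow stages above can be seen in miniature in the Python sketch below. The stages are Clean/Prep, Hypothesize, Model, and Evaluate. The data is hypothetical. The hypothesis is that y grows roughly linearly with x. The model is an ordinary least-squares line, fitted by hand to avoid dependencies. Evaluation is the mean absolute error on two held-out rows. The remaining stages are not sketched: digging around, interpreting, and large-scale exploitation.

```python
# Hypothetical (x, y) observations; None marks a missing value to be cleaned out.
DATA = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2), (5, 9.8), (6, 12.1), (7, 13.9), (8, 16.2)]


def clean(rows):
    """Clean/prep: drop rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def fit_line(rows):
    """Model: ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def mean_abs_error(model, rows):
    """Evaluate: mean absolute error of the fitted line on the given rows."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in rows) / len(rows)


rows = clean(DATA)
train, held_out = rows[:-2], rows[-2:]  # simple holdout: last two rows are not used for fitting
model = fit_line(train)
print("intercept=%.2f slope=%.2f" % model)
print("held-out MAE=%.2f" % mean_abs_error(model, held_out))
```

Holding out data that the model never saw is the simplest guard against overgeneralizing. In a real project, the "Interpret" stage would then ask whether a slope of about 2 makes sense in the problem's domain.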
What is hard about Data Science
21
• Overcoming assumptions
• Making ad-hoc explanations of data patterns
• Overgeneralizing
• Communication
• Not checking enough (validate models, data pipeline integrity, etc.)
• Using statistical tests correctly
• Prototype → production transitions
• Data pipeline complexity (who do you ask?)
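"Not checking enough" often shows up at joins. A duplicated key in a lookup table silently multiplies rows, and unmatched keys silently lose information. The sketch below shows one possible integrity check around a naive left join. The function names, table contents, and the specific checks chosen are hypothetical illustrations, not a standard tool.

```python
def left_join(left, right, key):
    """Naive left join of two lists of dicts on `key`; unmatched left rows are kept as-is."""
    out = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[key] == lrow[key]]
        if not matches:
            out.append(dict(lrow))
        for rrow in matches:
            out.append({**lrow, **rrow})
    return out


def integrity_problems(left, right, key):
    """Return human-readable problems found when left-joining `left` with `right`."""
    joined = left_join(left, right, key)
    problems = []
    if len(joined) != len(left):  # duplicate keys in `right` multiply rows
        problems.append(f"row count changed: {len(left)} -> {len(joined)}")
    unmatched = sorted({row[key] for row in left} - {row[key] for row in right})
    if unmatched:
        problems.append(f"unmatched keys: {unmatched}")
    return problems


# Hypothetical tables: customer c2 appears twice in the lookup, c4 is missing from it.
orders = [{"customer_id": "c1", "amount": 10},
          {"customer_id": "c2", "amount": 5},
          {"customer_id": "c4", "amount": 7}]
customers = [{"customer_id": "c1", "region": "north"},
             {"customer_id": "c2", "region": "south"},
             {"customer_id": "c2", "region": "east"}]
print(integrity_problems(orders, customers, "customer_id"))
```

Checks like these are cheap to run on every pipeline execution. That helps with the "who do you ask?" problem: the pipeline reports its own anomalies instead of relying on someone noticing a strange total later.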
What is hard about Data Science
22
What are Data Scientists really doing?
23
Section 2
https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf

Speaker Notes

  1. Ronny Kohavi, keynote at KDD 2015: people are incredibly clever at explaining “very surprising results”. Unfortunately, most very surprising results are caused by data pipeline errors. Beware “HiPPOs” (Highest-Paid Person’s Opinion).
  2. Quote from the paper: “I’d rather the data go away than be wrong and not know.” Assumptions not communicated; transformations not documented.
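Kohavi's warning that surprising results are usually pipeline errors suggests checking the data before celebrating a result. The slides do not name a specific check. One widely used check for online A/B experiments is the sample ratio mismatch (SRM) test. It asks whether the observed split of users between control and treatment is plausible under the intended split, using a chi-square goodness-of-fit test with one degree of freedom. The sketch below is one such implementation. The user counts, the 50/50 design, and the 0.001 threshold are assumptions chosen for illustration.

```python
import math


def srm_p_value(n_control, n_treatment, expected_ratio=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a sample ratio mismatch.

    expected_ratio is the intended share of users assigned to the control group.
    """
    total = n_control + n_treatment
    expected_control = total * expected_ratio
    expected_treatment = total - expected_control
    chi2 = ((n_control - expected_control) ** 2 / expected_control
            + (n_treatment - expected_treatment) ** 2 / expected_treatment)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical experiment designed as a 50/50 split.
p = srm_p_value(50_000, 51_500)
if p < 0.001:  # strict threshold, an assumption for this sketch
    print(f"Possible pipeline problem: sample ratio mismatch (p = {p:.2g})")
else:
    print(f"Split looks consistent with the intended ratio (p = {p:.2g})")
```

A very small p-value here does not say what went wrong, only that the assignment or logging likely differs from the design. That is exactly the kind of result to investigate before trusting any effect the experiment appears to show.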