Lecture_1_Intro_toDS&AI.pptx

DS101:
Introduction to AI and DS
Lecture 1: Introduction to Data Science
Dr. Sudheer
hsudheer@ifheindia.org
Course Code: DS101
Course Title: Introduction to Data Science and Artificial Intelligence
L P U: 3 0 3
Team of Instructors: 1. Ms Sathya AR 2. Ms. P Rohini 3. Dr. H Sudheer 4. Dr. P. Sirisha
Course Objectives:
1. To expose students to the fundamental concepts of data science and their implementation using Python programming.
2. To introduce the mathematical foundations required for data science.
3. To explore the various data pre-processing techniques.
4. To summarize the aspects of exploratory data analysis (EDA): uses of EDA; role of metadata in EDA; data transformations identified through EDA.
5. To understand the AI approaches in data science.
Textbook(s):
T1. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the Frontline”, O’Reilly, 2014.
T2. Stuart Russell and Peter Norvig, “Artificial Intelligence: A Modern Approach”, 3rd Edition, Pearson Education, 2010, ISBN-13: 978-0-13-604259-4.
Reference Book(s):
R1. Jake VanderPlas, “Python Data Science Handbook: Essential Tools for Working with Data”, O’Reilly, 2017.
R2. Joel Grus, “Data Science from Scratch: First Principles with Python”, O’Reilly, 2019.
R3. Field Cady, “The Data Science Handbook”, Wiley, 2017.
R4. Jiawei Han, Micheline Kamber and Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition, 2011, ISBN 0123814790.
Online Resources:
R5. https://onlinecourses.nptel.ac.in/noc22_cs72/preview
R6. https://www.udemy.com/course/complete-python-bootcamp/
R7. https://lms.simplilearn.com/courses/4227/Introduction-to-Data-Science/syllabus
“Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It has
to be changed into gas, plastic, chemicals, etc to create a valuable entity that
drives profitable activity; so must data be broken down, analyzed for it to have
value.” — Clive Humby, 2006
Increasingly many companies see
themselves as data driven.
Data science is the science that uses computer science, statistics, machine learning, visualization, and human-computer interaction to collect, clean, integrate, analyze, visualize, and interact with data in order to create data products.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply knowledge from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
Source: Wikipedia
Big Data and Data Science Hype
“Big Data” Sources
It’s all happening online. Every:
• Click
• Ad impression
• Billing event
• Fast forward, pause, …
• Server request
• Transaction
• Network message
• Fault
• …
Further sources:
• User generated (web and mobile)
• Internet of Things / M2M
• Health / scientific computing
The Current Landscape (with a Little History)
Data science is a broad field that refers to the collective
processes, theories, concepts, tools and technologies that
enable the review, analysis and extraction of valuable
knowledge and information from raw data.
Source: Techopedia
Drew Conway’s Venn diagram of data science
Rise of the Data Scientist
Skills of data geeks:
Statistics – traditional analysis you’re used to thinking about
Data Munging – parsing, scraping, and formatting data
Visualization – graphs, tools, etc.
Harvard Business Review declared data scientist to be the “Sexiest Job of the
21st Century”.
https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
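The data munging skill listed above (parsing and formatting data) followed by a basic statistic can be sketched roughly as below. This is only an illustration: the raw text, the column names, and the `munge` helper are hypothetical and not part of the lecture material.

```python
import csv
import io
import statistics

# Hypothetical raw export: inconsistent spacing, one missing value,
# and one malformed number.
RAW = """user, event ,duration_sec
alice, click , 1.5
bob,  pause,
carol, click, 2.5
dave, click, not_a_number
"""


def munge(raw_text):
    """Parse raw CSV text into clean records, dropping rows that cannot be repaired."""
    reader = csv.DictReader(io.StringIO(raw_text))
    records = []
    for row in reader:
        # Strip stray whitespace from both column names and values.
        clean = {key.strip(): (value or "").strip() for key, value in row.items()}
        try:
            clean["duration_sec"] = float(clean["duration_sec"])
        except ValueError:
            continue  # missing or malformed duration: skip this row
        records.append(clean)
    return records


records = munge(RAW)
durations = [r["duration_sec"] for r in records]
print(len(records), "usable rows; mean duration:", statistics.mean(durations))
```

Dropping bad rows is just one possible policy; in practice one might instead log them or impute the missing values, depending on the problem.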
The Role of the Social Scientist in Data Science
Both LinkedIn and Facebook are social network companies.
Oftentimes a description or definition of a data scientist includes a hybrid of statistician, software engineer, and social scientist.
If they’re social science-y problems like friend recommendations or people you know or user segmentation, then by all means, bring on the social scientist! Social scientists also tend to be good question askers and have other good investigative qualities, so a social scientist who also has the quantitative and programming chops makes a great data scientist.
Data Science Jobs
Most job descriptions ask data scientists to be experts in computer science, statistics, communication, and data visualization, and to have extensive domain expertise.
Nobody is an expert in everything, which is why it makes more sense to create teams of people who have different profiles and different expertise; together, as a team, they can specialize in all those things.
A Data Science Profile :
• Computer science
• Math
• Statistics
• Machine learning
• Domain expertise
• Communication and presentation skills
• Data visualization
Rachel’s data science profile, which she created to illustrate trying to visualize oneself as a data scientist; she wanted students and guest lecturers to “riff” on this: to add buckets or remove skills, use a different scale or visualization method, and think about the drawbacks of self-reporting.
Data science team profiles can be
constructed from data scientist
profiles; there should be alignment
between the data science team
profile and the profile of the data
problems they try to solve
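One way to picture how a team profile might be built from individual profiles and compared with a problem profile is sketched below. The scores, the 0-10 scale, the member names, and the choice of "best score per skill" as the aggregation rule are all assumptions made for illustration; the slides do not prescribe a method.

```python
# Skill buckets taken from the data science profile list in the slides.
SKILLS = [
    "computer science", "math", "statistics", "machine learning",
    "domain expertise", "communication", "data visualization",
]

# Hypothetical self-reported scores on a 0-10 scale.
profiles = {
    "statistician": {"computer science": 4, "math": 8, "statistics": 9,
                     "machine learning": 6, "domain expertise": 3,
                     "communication": 5, "data visualization": 6},
    "engineer": {"computer science": 9, "math": 5, "statistics": 4,
                 "machine learning": 7, "domain expertise": 2,
                 "communication": 4, "data visualization": 3},
    "social scientist": {"computer science": 3, "math": 4, "statistics": 6,
                         "machine learning": 2, "domain expertise": 8,
                         "communication": 9, "data visualization": 5},
}


def team_profile(member_profiles):
    """Aggregate member profiles by taking the best score per skill (an assumed rule)."""
    return {
        skill: max(p.get(skill, 0) for p in member_profiles.values())
        for skill in SKILLS
    }


def skill_gaps(team, problem):
    """Return the skills where the team falls short of what the problem needs."""
    return {
        skill: need - team[skill]
        for skill, need in problem.items()
        if team[skill] < need
    }


team = team_profile(profiles)
# Hypothetical requirements of a data problem.
problem = {"machine learning": 8, "statistics": 7, "data visualization": 5}
print("team profile:", team)
print("gaps:", skill_gaps(team, problem))
```

Self-reported scores carry the drawbacks of self-reporting noted on the earlier slide, so the numbers are better read as a conversation starter than as a measurement.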
Data science workflow
Section 2
https://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext
[Figure: workflow diagram with the stages: Digging around in data; Hypothesize; Model; Large-scale exploitation; Evaluate; Interpret; Clean, prep]
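The workflow stages named above (clean and prep, hypothesize a model, evaluate, interpret) can be walked through on a toy dataset. The sketch below is a minimal illustration under stated assumptions: the data are invented, the straight-line model is just one possible hypothesis, and `fit_line` is a hypothetical helper written with the standard library only.

```python
# Hypothetical raw observations: (hours_studied, exam_score).
# None marks a missing measurement.
raw = [(1, 56), (2, 59), (3, None), (4, 71), (5, 74), (6, 80), (7, 85), (8, 90)]

# Clean, prep: drop incomplete rows.
data = [(x, y) for x, y in raw if x is not None and y is not None]

# Hold out the last two points so the model is evaluated on unseen data.
train, test = data[:-2], data[-2:]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothesize, model: assume the score grows linearly with hours studied.
slope, intercept = fit_line(train)

# Evaluate: mean absolute error on the held-out points.
mae = sum(abs(y - (slope * x + intercept)) for x, y in test) / len(test)

# Interpret: the slope is the estimated score gain per extra hour.
print(f"slope={slope:.2f} intercept={intercept:.2f} held-out MAE={mae:.2f}")
```

In a real project these stages are revisited repeatedly rather than run once in sequence, and large-scale exploitation would only follow once the evaluation holds up.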
What is hard about Data Science
• Overcoming assumptions
• Making ad-hoc explanations of data patterns
• Overgeneralizing
• Communication
• Not checking enough (validate models, data pipeline integrity, etc.)
• Using statistical tests correctly
• Prototype → Production transitions
• Data pipeline complexity (who do you ask?)
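The "not checking enough" item above mentions data pipeline integrity. A lightweight check that could run on each batch before analysis might look like the sketch below; the function name `check_batch`, its parameters, and the example rows are hypothetical, and real pipelines would usually need checks specific to their own data.

```python
def check_batch(rows, expected_columns, min_rows, non_negative=()):
    """Return a list of human-readable problems found in one batch of records."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"row count {len(rows)} is below the expected minimum {min_rows}")
    for i, row in enumerate(rows):
        missing = expected_columns - set(row)
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
            continue  # cannot run value checks on an incomplete row
        for column in non_negative:
            if row[column] < 0:
                problems.append(f"row {i}: negative {column} ({row[column]})")
    return problems


# Hypothetical batch with one incomplete row and one suspicious value.
batch = [
    {"user": "alice", "amount": 10.0},
    {"user": "bob"},
    {"user": "carol", "amount": -5.0},
]
problems = check_batch(batch, {"user", "amount"}, min_rows=3, non_negative=("amount",))
for problem in problems:
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to see the full state of a batch before deciding whether a surprising result is real.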
What are Data Scientists really doing?
https://visit.figure-eight.com/rs/416-ZBE-142/images/CrowdFlower_DataScienceReport_2016.pdf
Editor's Notes

  1. Ronny Kohavi, keynote at KDD 2015: People are incredibly clever at explaining “very surprising results”. Unfortunately, most very surprising results are caused by data pipeline errors. Beware “HiPPOs” (Highest-Paid Person’s Opinion).
  2. Quote from paper: “I’d rather the data go away than be wrong and not know.” Assumptions not communicated; transformations not documented.