Lecture #2
Program: BS(DS)-Fall 2019
Instructor: Konpal Darakshan
DATA SCIENCE
Topics
• What is Data Science?
• Pre-requisites for Data Science.
• Data Science Tasks
• Data Science Life Cycle
• Data Scientist
• Common Data Quality Problems
• How to tackle a Data problem
Data Science
A science of:
■ Interactive analysis of data
■ Interactive retrieval of data
■ Interactive prediction based on foresight and intelligence
■ Generalized definition: Data science is the science that uses
computer science, statistics, machine learning, visualization
and human-computer interaction to collect, clean, integrate,
analyze, visualize, and interact with data to create data
products.
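The generalized definition describes a chain of steps: collect, clean, integrate, analyze. The following is a minimal Python sketch of that chain, not a prescribed workflow. The two data sources, their field names (`customer_id`, `amount`, `city`), and the choice of "total amount per city" as the analysis are all hypothetical, picked only so each step is visible.

```python
# A toy collect -> clean -> integrate -> analyze pipeline.
# Both data sources and all field names are hypothetical.

def collect():
    """Return raw records from two hypothetical sources."""
    orders = [
        {"customer_id": 1, "amount": "120.50"},
        {"customer_id": 2, "amount": None},  # missing value
        {"customer_id": 1, "amount": "80.00"},
        {"customer_id": 3, "amount": "42.25"},
    ]
    customers = [
        {"customer_id": 1, "city": "City A"},
        {"customer_id": 2, "city": "City C"},
        {"customer_id": 3, "city": "City B"},
    ]
    return orders, customers


def clean(orders):
    """Drop orders with a missing amount and convert amounts from text to numbers."""
    return [
        {"customer_id": o["customer_id"], "amount": float(o["amount"])}
        for o in orders
        if o["amount"] is not None
    ]


def integrate(orders, customers):
    """Inner-join orders with customers on customer_id to attach each order's city."""
    city_of = {c["customer_id"]: c["city"] for c in customers}
    return [
        {**o, "city": city_of[o["customer_id"]]}
        for o in orders
        if o["customer_id"] in city_of
    ]


def analyze(rows):
    """Sum order amounts per city."""
    totals = {}
    for r in rows:
        totals[r["city"]] = totals.get(r["city"], 0.0) + r["amount"]
    return totals


orders, customers = collect()
print(analyze(integrate(clean(orders), customers)))
# -> {'City A': 200.5, 'City B': 42.25}
```

Customer 2 (City C) disappears from the result because their only order had no amount. Dropping such rows is just one option; imputing or flagging them may be more appropriate, depending on the problem.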
Pre-requisites for Data Science:
■ Computer Science: the study of both computer hardware and
software design. It encompasses both the study of theoretical
algorithms and the practical problems involved in
implementing them through computer hardware and software.
■ Statistics: a branch of mathematics dealing with the collection,
analysis, interpretation, and presentation of masses of
numerical data.
■ Machine Learning: an application of artificial intelligence (AI)
that gives systems the ability to learn and improve from experience
automatically, without being explicitly programmed. Machine learning
focuses on developing computer programs that can access data and
use it to learn for themselves.
■ Visualization: the process of representing data graphically and
interacting with these representations in order to gain insight
into the data.
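The statistics, machine learning, and visualization prerequisites can be made concrete with a few lines of standard-library Python. This is a minimal sketch under stated assumptions: the hours/scores data are invented, ordinary least squares stands in for "learning from experience" as one of the simplest possible learners (it is not a method named in this lecture), and rows of `#` characters stand in for a real chart.

```python
import statistics

# Hypothetical "experience": hours studied and the exam score obtained.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 67, 72]

# Statistics: summarize the numerical data.
print("mean score:", statistics.mean(scores))          # 62
print("median score:", statistics.median(scores))      # 61
print("std dev:", round(statistics.stdev(scores), 2))  # 7.78


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Machine learning (a very small instance): the rule is estimated from the
# examples instead of being written by hand.
slope, intercept = fit_line(hours, scores)
print(f"learned rule: score = {slope:.1f} * hours + {intercept:.1f}")
print("predicted score for 6 hours:", round(slope * 6 + intercept, 1))

# Visualization (text-only stand-in for a bar chart): one '#' per 5 points.
for h, s in zip(hours, scores):
    print(f"{h}h | {'#' * (s // 5)} {s}")
```

The point of the learning step is that nobody typed the slope or intercept; they come from the data, so feeding in different examples yields a different rule.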
Regular Data Science Tasks
• Data analysis
  • What percentage of users come back to our site?
  • Which products are usually bought together?
• Modeling/statistics
  • How many cars are we going to sell next year?
  • Which city is better for opening a new office?
• Engineering/prototyping
  • Building a product that uses a prediction model
  • Visualization of analytics
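The two data-analysis questions in the task list can each be answered with a short computation. This is a minimal sketch, assuming a hypothetical visit log of `(user_id, day)` pairs and hypothetical shopping baskets. It also assumes "came back" means "visited on two or more distinct days" and that "bought together" means "appears in the same basket"; real definitions would come from the business.

```python
from collections import Counter
from itertools import combinations

# Hypothetical visit log: (user_id, visit_day).
visits = [("u1", 1), ("u2", 1), ("u1", 3), ("u3", 2), ("u2", 5), ("u4", 4)]

# What percentage of users come back (visit on 2+ distinct days)?
days_per_user = {}
for user, day in visits:
    days_per_user.setdefault(user, set()).add(day)
returning = sum(1 for days in days_per_user.values() if len(days) > 1)
return_rate = 100 * returning / len(days_per_user)
print(f"returning users: {return_rate:.0f}%")  # returning users: 50%

# Which products are usually bought together?
# Count how often each unordered pair of products shares a basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)
print(pair_counts.most_common(1))  # [(('bread', 'butter'), 3)]
```

Sorting each basket before calling `combinations` ensures that ("bread", "butter") and ("butter", "bread") are counted as the same pair.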
What is a Data Scientist?
■ Data scientists serve the needs and solve the problems of data users. They use their
formidable skills in math, statistics and programming to clean, manage and organize
the data. Then they apply all their analytic powers to uncover hidden solutions in the data.
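"Clean, manage and organize" usually means fixing routine data-quality problems before any analysis. The following is a minimal sketch on hypothetical records. It assumes that names and cities should be trimmed and title-cased, that two records with identical cleaned values are duplicates, and that a missing age should be kept as `None` rather than guessed. Other projects may need different rules.

```python
# Hypothetical raw records showing common quality problems: stray whitespace,
# inconsistent capitalization, a duplicate person, a missing age, and ages stored as text.
raw = [
    {"name": " Alice ", "city": "paris", "age": "34"},
    {"name": "alice", "city": "Paris", "age": "34"},
    {"name": "Bob", "city": "ROME", "age": ""},
    {"name": "Chen", "city": "Oslo ", "age": "29"},
]


def clean_record(r):
    """Trim and title-case text fields; convert age to int, or None when missing."""
    age = r["age"].strip()
    return {
        "name": r["name"].strip().title(),
        "city": r["city"].strip().title(),
        "age": int(age) if age else None,
    }


def clean(records):
    """Clean every record, then drop exact duplicates, keeping the first copy."""
    seen, out = set(), []
    for r in map(clean_record, records):
        key = (r["name"], r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


# Organize: sort the cleaned records by name so the table is easy to scan.
cleaned = sorted(clean(raw), key=lambda r: r["name"])
for r in cleaned:
    print(r)
```

Deduplication runs after normalization on purpose: " Alice " and "alice" only become recognizable duplicates once whitespace and capitalization have been fixed.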
How to tackle a Data problem?
■ Subject Matter Expert (SME)
■ SMEs possess domain knowledge about the type of problem at
hand, so they are a source of professional wisdom.
■ Anomaly
■ Anomalies are the best-case and worst-case scenarios. The main
aim is to reach “the center” of the information required about
the problem.
■ Risk and Uncertainty in data
■ Uncertainty can be minimized by validating the information
gathered about the problem. Once uncertainty is reduced, risk is
reduced as well, which makes it possible to take good decisions.
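One way to make "anomalies as extremes" and "reaching the center" concrete is to flag extreme values with a simple rule and compare a robust center (the median) with the mean. This is a minimal sketch: the delivery-time data are hypothetical, and Tukey's fences (1.5 × IQR beyond the quartiles) are a commonly used rule swapped in here for illustration, not a method given in the lecture. The standard error is used as one simple measure of how uncertain the estimate of the center is.

```python
import statistics

# Hypothetical daily delivery times in minutes; two values are extreme.
times = [31, 29, 35, 30, 33, 28, 32, 120, 30, 3]

# Anomalies: flag values outside Tukey's fences (1.5 * IQR beyond the quartiles).
q1, _, q3 = statistics.quantiles(times, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
anomalies = [t for t in times if t < low or t > high]
typical = [t for t in times if low <= t <= high]

# "The center": the median is barely moved by the extremes, unlike the mean.
print("mean (all):", statistics.mean(times))      # 37.1
print("median (all):", statistics.median(times))  # 30.5
print("anomalies:", anomalies)                    # [120, 3]

# Uncertainty: the standard error of the mean, stdev / sqrt(n), shrinks as
# more observations are included (for a given spread).
se = statistics.stdev(typical) / len(typical) ** 0.5
print(f"mean of typical values: {statistics.mean(typical):.1f} +/- {se:.1f}")
```

Whether a flagged value is an error to discard or a genuine best-case or worst-case scenario worth studying is exactly the kind of question a subject matter expert can help answer.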