SlideShare uma empresa Scribd logo
1 de 11
The Data Science Process
AAruna Kumari
Assistant Professor
CSE Department
VNRVJIET
Every Day Data
• Every day,
• we create 2.5 quintillion bytes of data —
• so much that 90% of the data in the world today has
been created in the last two years alone.”
Data science
• The interest in data science
•Solve problems and answer questions using
data
•Goal to improve future outcomes
•What is the data science process?
CRISP-DM Methodology diagram
• Business UnderstandingBusiness
Understanding
Analytic
Approach
Data
Requirements
Data
Collection
Data
Understanding
Data
Preparation
Modeling
Evaluation
Deployment
Feedback
1. Business understanding
• Every project begins with business understanding
• Project objective?
• Business sponsors play the most critical role
• What are we trying to do – what is the goal?
• How do you define “success” and how can you
measure it?
Business
Understanding
2. Analytic Approach
• Express problem in context of statistical and machine learning techniques
• Regression:
• “Predicting revenue in the next quarter?”
• Classification:
• “Does this patient have cancer A, cancer B, or are they healthy?”
• Clustering:
• “Are there groups of users that seem to behave similarly to each other?”
• Recommendation/Personalization:
• “How can I target discounts to specific customers?”
• Outlier Detection
Business
Understanding
Analytical
Approach
Statistical / machine learning technique(s)
• Linear regression
• Logistic regression
• Clustering
• K-means
• Hierarchical
• Density-based
• Classification Trees
• Random Forests
• Neural networks
• Text mining
(natural language
processing)
• Principal
component
analysis
• Support Vector
Machines
• Hidden Markov Models
• …
Data compilation
• The chosen analytic approach determines the
• data requirements.
• Content, formats, representations
• Initial data collection is performed.
• Available Data?
• Obtain data?
• Revise data requirements or collect more data?
• Then data understanding is gained.
• Initial insights about data
• Descriptive statistics and visualization
• Additional data collection to fill gaps, if needed
Data
Requirements
Data
Collection
Data
Understanding
Data preparation
Data preparation encompasses all activities to construct and clean the
data set.
• Data cleaning
• Missing or invalid values
• Eliminating duplicate rows
• Formatting properly
• Combining multiple data sources
• Transforming data
• Feature engineering
• Text analysis
• Accelerate data preparation by automating common
• Arguably the most time-consuming step
• “80% of the entire DS process is in data cleaning and preparation”
Data
Preparation
Modeling
• Modeling:
• Developing predictive or descriptive models
• May try using multiple algorithms
• Highly iterative process
Data
PreparationModelingEvaluation
Deployment and feedback
• Once finalized, the model is deployed into a production environment.
• May start in a limited / test environment
• Involves other roles:
• Solution owner
• Marketing
• Application developers
• IT administration
• Getting Feedback :
• How well did the model perform?
• Iterative process for model refinement and redeployment
• A/B testing
• Big Data University:
• Inactive -> Active
Feedback
Deployment

Mais conteúdo relacionado

Mais procurados

Lessons learned from the proverbial battlefield - Hortonworks roadshow
Lessons learned from the proverbial battlefield - Hortonworks roadshowLessons learned from the proverbial battlefield - Hortonworks roadshow
Lessons learned from the proverbial battlefield - Hortonworks roadshowSuhail S.
 
RallyZ: Session 2
RallyZ: Session 2RallyZ: Session 2
RallyZ: Session 2Quadlyfe
 
Analytics 101 - Getting Started
Analytics 101 - Getting Started Analytics 101 - Getting Started
Analytics 101 - Getting Started Gautam Munshi
 
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...Stephen Childs
 
Student Grade Prediction
Student Grade PredictionStudent Grade Prediction
Student Grade PredictionGaurav Sawant
 
CSCL17 Discussant: Learning Analytics and Intelligent Support
CSCL17 Discussant: Learning Analytics and Intelligent SupportCSCL17 Discussant: Learning Analytics and Intelligent Support
CSCL17 Discussant: Learning Analytics and Intelligent SupportBodong Chen
 
Ensuring data quality
Ensuring data qualityEnsuring data quality
Ensuring data qualityIUPUI
 
Bb Tour ANZ 2017 - Predicting Student Success
Bb Tour ANZ 2017 - Predicting Student SuccessBb Tour ANZ 2017 - Predicting Student Success
Bb Tour ANZ 2017 - Predicting Student SuccessBlackboard APAC
 
Big data
Big dataBig data
Big data26Nia
 
Big Data in Education Sector
Big Data in Education SectorBig Data in Education Sector
Big Data in Education SectorKaran Sachdeva
 
Data quality overview
Data quality overviewData quality overview
Data quality overviewAlex Meadows
 
Data Quality Rules introduction
Data Quality Rules introductionData Quality Rules introduction
Data Quality Rules introductiondatatovalue
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...ChemAxon
 
Predicting student performance using aggregated data sources
Predicting student performance using aggregated data sourcesPredicting student performance using aggregated data sources
Predicting student performance using aggregated data sourcesOlugbenga Wilson Adejo
 

Mais procurados (18)

Vitriol
VitriolVitriol
Vitriol
 
Data science
Data science Data science
Data science
 
Lessons learned from the proverbial battlefield - Hortonworks roadshow
Lessons learned from the proverbial battlefield - Hortonworks roadshowLessons learned from the proverbial battlefield - Hortonworks roadshow
Lessons learned from the proverbial battlefield - Hortonworks roadshow
 
RallyZ: Session 2
RallyZ: Session 2RallyZ: Session 2
RallyZ: Session 2
 
Analytics 101 - Getting Started
Analytics 101 - Getting Started Analytics 101 - Getting Started
Analytics 101 - Getting Started
 
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
 
Student Grade Prediction
Student Grade PredictionStudent Grade Prediction
Student Grade Prediction
 
CSCL17 Discussant: Learning Analytics and Intelligent Support
CSCL17 Discussant: Learning Analytics and Intelligent SupportCSCL17 Discussant: Learning Analytics and Intelligent Support
CSCL17 Discussant: Learning Analytics and Intelligent Support
 
Ensuring data quality
Ensuring data qualityEnsuring data quality
Ensuring data quality
 
Bb Tour ANZ 2017 - Predicting Student Success
Bb Tour ANZ 2017 - Predicting Student SuccessBb Tour ANZ 2017 - Predicting Student Success
Bb Tour ANZ 2017 - Predicting Student Success
 
Powerpoint si
Powerpoint siPowerpoint si
Powerpoint si
 
Powerpoint si
Powerpoint siPowerpoint si
Powerpoint si
 
Big data
Big dataBig data
Big data
 
Big Data in Education Sector
Big Data in Education SectorBig Data in Education Sector
Big Data in Education Sector
 
Data quality overview
Data quality overviewData quality overview
Data quality overview
 
Data Quality Rules introduction
Data Quality Rules introductionData Quality Rules introduction
Data Quality Rules introduction
 
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
GPS for Chemical Space - Digital Assistants to Support Molecule Design - Chem...
 
Predicting student performance using aggregated data sources
Predicting student performance using aggregated data sourcesPredicting student performance using aggregated data sources
Predicting student performance using aggregated data sources
 

Semelhante a Datascience methodology

Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxellamangapis2003
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningBarry Leventhal
 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...DATAVERSITY
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdfssuser0413ec
 
Application of mis in textile industry
Application of mis in textile industryApplication of mis in textile industry
Application of mis in textile industryMajharul Islam
 
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxDATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxrandyburney60861
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huwekineheshete
 
Decision support systems and business intelligence
Decision support systems and business intelligenceDecision support systems and business intelligence
Decision support systems and business intelligenceShwetabh Jaiswal
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021Dendej Sawarnkatat
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsVivastream
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data miningHadi Fadlallah
 
Data Analytics: Better Decision, Better Business
Data Analytics: Better Decision, Better BusinessData Analytics: Better Decision, Better Business
Data Analytics: Better Decision, Better BusinessMcKonly & Asbury, LLP
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and PlacementAkhilGGM
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSpartan60
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)SayyedYusufali
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)SayyedYusufali
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)SayyedYusufali
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 

Semelhante a Datascience methodology (20)

Data Analytics course.pptx
Data Analytics course.pptxData Analytics course.pptx
Data Analytics course.pptx
 
Group 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptxGroup 1 Report CRISP - DM METHODOLOGY.pptx
Group 1 Report CRISP - DM METHODOLOGY.pptx
 
An Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data miningAn Introduction to Advanced analytics and data mining
An Introduction to Advanced analytics and data mining
 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
 
DataScience.pptx
DataScience.pptxDataScience.pptx
DataScience.pptx
 
Business Analytics and Data mining.pdf
Business Analytics and Data mining.pdfBusiness Analytics and Data mining.pdf
Business Analytics and Data mining.pdf
 
Application of mis in textile industry
Application of mis in textile industryApplication of mis in textile industry
Application of mis in textile industry
 
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docxDATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
DATA SCIENCE AND BIG DATA ANALYTICSCHAPTER 2 DATA ANA.docx
 
Introducition to Data scinece compiled by hu
Introducition to Data scinece compiled by huIntroducition to Data scinece compiled by hu
Introducition to Data scinece compiled by hu
 
Decision support systems and business intelligence
Decision support systems and business intelligenceDecision support systems and business intelligence
Decision support systems and business intelligence
 
000 introduction to big data analytics 2021
000   introduction to big data analytics  2021000   introduction to big data analytics  2021
000 introduction to big data analytics 2021
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 
Data Analytics: Better Decision, Better Business
Data Analytics: Better Decision, Better BusinessData Analytics: Better Decision, Better Business
Data Analytics: Better Decision, Better Business
 
Data Science Training and Placement
Data Science Training and PlacementData Science Training and Placement
Data Science Training and Placement
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)Data science training in hyd ppt converted (1)
Data science training in hyd ppt converted (1)
 
Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)Data science training in hyd pdf converted (1)
Data science training in hyd pdf converted (1)
 
Data science training in hydpdf converted (1)
Data science training in hydpdf  converted (1)Data science training in hydpdf  converted (1)
Data science training in hydpdf converted (1)
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 

Último

ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 

Último (20)

ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 

Datascience methodology

  • 1. The Data Science Process AAruna Kumari Assistant Professor CSE Department VNRVJIET
  • 2. Every Day Data • Every day, • we create 2.5 quintillion bytes of data — • so much that 90% of the data in the world today has been created in the last two years alone.”
  • 3. Data science • The interest in data science •Solve problems and answer questions using data •Goal to improve future outcomes •What is the data science process?
  • 4. CRISP-DM Methodology diagram • Business UnderstandingBusiness Understanding Analytic Approach Data Requirements Data Collection Data Understanding Data Preparation Modeling Evaluation Deployment Feedback
  • 5. 1. Business understanding • Every project begins with business understanding • Project objective? • Business sponsors play the most critical role • What are we trying to do – what is the goal? • How do you define “success” and how can you measure it? Business Understanding
  • 6. 2. Analytic Approach • Express problem in context of statistical and machine learning techniques • Regression: • “Predicting revenue in the next quarter?” • Classification: • “Does this patient have cancer A, cancer B, or are they healthy?” • Clustering: • “Are there groups of users that seem to behave similarly to each other?” • Recommendation/Personalization: • “How can I target discounts to specific customers?” • Outlier Detection Business Understanding Analytical Approach
  • 7. Statistical / machine learning technique(s) • Linear regression • Logistic regression • Clustering • K-means • Hierarchical • Density-based • Classification Trees • Random Forests • Neural networks • Text mining (natural language processing) • Principal component analysis • Support Vector Machines • Hidden Markov Models • …
  • 8. Data compilation • The chosen analytic approach determines the • data requirements. • Content, formats, representations • Initial data collection is performed. • Available Data? • Obtain data? • Revise data requirements or collect more data? • Then data understanding is gained. • Initial insights about data • Descriptive statistics and visualization • Additional data collection to fill gaps, if needed Data Requirements Data Collection Data Understanding
  • 9. Data preparation Data preparation encompasses all activities to construct and clean the data set. • Data cleaning • Missing or invalid values • Eliminating duplicate rows • Formatting properly • Combining multiple data sources • Transforming data • Feature engineering • Text analysis • Accelerate data preparation by automating common • Arguably the most time-consuming step • “80% of the entire DS process is in data cleaning and preparation” Data Preparation
  • 10. Modeling • Modeling: • Developing predictive or descriptive models • May try using multiple algorithms • Highly iterative process Data PreparationModelingEvaluation
  • 11. Deployment and feedback • Once finalized, the model is deployed into a production environment. • May start in a limited / test environment • Involves other roles: • Solution owner • Marketing • Application developers • IT administration • Getting Feedback : • How well did the model perform? • Iterative process for model refinement and redeployment • A/B testing • Big Data University: • Inactive -> Active Feedback Deployment