Introduction to Data Mining
Ch. 2 Data Preprocessing
Heon Gyu Lee ( [email_address] ), http://dblab.chungbuk.ac.kr/~hglee
DB/Bioinfo. Lab., http://dblab.chungbuk.ac.kr, Chungbuk National University
Why Data Preprocessing?
What is Data? (figure labels: Attributes, Objects)
Types of Attributes
Discrete and Continuous Attributes
Data Quality
Noise (figure panels: Two Sine Waves; Two Sine Waves + Noise)
Outliers
Missing Values
Duplicate Data
Major Tasks in Data Preprocessing
Forms of Data Preprocessing
Data Cleaning
Data Cleaning: How to Handle Missing Data?
Data Cleaning: How to Handle Noisy Data?
Data Cleaning: Binning Methods
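The bullet text of the binning slide did not survive extraction, so the following is only a minimal sketch of one common binning method, smoothing by bin means over equal-frequency bins. The function name `smooth_by_bin_means` and the sample values are illustrative assumptions, not taken from the slides.

```python
def smooth_by_bin_means(values, n_bins):
    """Sort the values, split them into equal-frequency bins,
    and replace every value with the mean of its bin."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    smoothed = []
    start = 0
    for i in range(n_bins):
        # The first `extra` bins take one additional value each.
        end = start + size + (1 if i < extra else 0)
        bin_values = ordered[start:end]
        if bin_values:
            mean = sum(bin_values) / len(bin_values)
            smoothed.extend([mean] * len(bin_values))
        start = end
    return smoothed


# Illustrative data (an assumption, not from the slides).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
```

Smoothing by bin medians or bin boundaries follows the same partitioning step and differs only in the replacement value.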
Data Cleaning: Regression (figure labels: axes x and y; line y = x + 1; points X1, Y1, Y1')
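As a rough sketch of smoothing by regression, the code below fits a least-squares line to noisy (x, y) pairs and replaces each observed y with the value on the fitted line, which appears to be the role of Y1' in the figure. The helper names `fit_line` and `smooth_by_regression` and the sample points are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def smooth_by_regression(xs, ys):
    """Replace each observed y with the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]


# Illustrative noisy points scattered around y = x + 1 (assumed data).
xs = [0, 1, 2, 3]
ys = [1.2, 1.8, 3.1, 3.9]
print(fit_line(xs, ys))
print(smooth_by_regression(xs, ys))
```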
Data Cleaning: Cluster Analysis
Data Integration
Data Integration: Handling Redundancy in Data Integration
Data Integration: Correlation Analysis (Numerical Data)
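The formula on this slide was lost in extraction. A standard measure for numerical attributes is the Pearson correlation coefficient, sketched below under that assumption; the function name `pearson` is hypothetical. Values near +1 or -1 suggest that one attribute may be redundant given the other.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Illustrative attributes (assumed data): the second is an exact multiple of the first.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```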
Data Integration: Correlation Analysis (Categorical Data)
Chi-Square Calculation: An Example
Observed counts (expected counts in parentheses):
                           Play chess   Not play chess   Sum (row)
Like science fiction       250 (90)     200 (360)        450
Not like science fiction   50 (210)     1000 (840)       1050
Sum (col.)                 300          1200             1500
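The chi-square example can be checked with a short sketch. It takes the observed counts from the slide (250, 200, 50, 1000), derives the expected counts from the row and column sums, and accumulates (observed - expected)^2 / expected over the four cells. The function names are assumptions for illustration.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row_sum * col_sum / total."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square(observed):
    """Pearson chi-square statistic for a contingency table."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: like / not like science fiction. Columns: play / not play chess.
observed = [[250, 200], [50, 1000]]
print(expected_counts(observed))  # matches the parenthesised values: 90, 360, 210, 840
print(chi_square(observed))       # roughly 507.94
```

A statistic this large, for a 2 x 2 table with one degree of freedom, indicates that the two attributes are strongly correlated in this group.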
Data Transformation
Data Transformation: Normalization
Where j is the smallest integer such that max(|ν'|) < 1
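The surviving condition on j matches normalization by decimal scaling, in which each value ν is divided by 10^j. The sketch below assumes that reading; the function name `decimal_scaling` and the sample values are illustrative.

```python
def decimal_scaling(values):
    """Return (j, scaled) where scaled[i] = values[i] / 10**j and j is the
    smallest non-negative integer such that max(|scaled|) < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return j, [v / 10 ** j for v in values]


# Illustrative values (assumed data): the largest magnitude is 986, so j = 3.
j, scaled = decimal_scaling([-986, 917])
print(j, scaled)
```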
Data Reduction Strategies
Data Reduction: Aggregation
Data Reduction: Aggregation (figure: Variation of Precipitation in Australia; panels: Standard Deviation of Average Monthly Precipitation, Standard Deviation of Average Yearly Precipitation)
Data Reduction: Sampling
Data Reduction: Types of Sampling
Data Reduction: Dimensionality Reduction
Dimensionality Reduction: PCA (figure labels: x1, x2, e)
Dimensionality Reduction: PCA (figure labels: x1, x2, e)
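The PCA figures label two attributes x1 and x2 and a direction e. Assuming e denotes the first principal component (the eigenvector of the covariance matrix with the largest eigenvalue), a two-dimensional case can be sketched in closed form without any libraries. The function names and sample points are assumptions for illustration.

```python
import math


def first_principal_component(points):
    """Unit eigenvector for the largest eigenvalue of the 2x2 sample covariance matrix."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Covariance matrix [[a, b], [b, c]] with the n - 1 denominator.
    a = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    c = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    b = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    if b == 0:
        # Attributes are uncorrelated: the axis with larger variance wins.
        return (1.0, 0.0) if a >= c else (0.0, 1.0)
    largest = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    vx, vy = largest - c, b
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)


def project(points, e):
    """Reduce each centered 2-D point to one coordinate along direction e."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [(p[0] - mean_x) * e[0] + (p[1] - mean_y) * e[1] for p in points]


# Illustrative points lying near the line x2 = x1 (assumed data).
pts = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8)]
e = first_principal_component(pts)
print(e, project(pts, e))
```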
Data Reduction: Feature Subset Selection
Data Reduction: Feature Subset Selection
Data Reduction: Feature Creation
Data Reduction: Mapping Data to a New Space (figure panels: Two Sine Waves; Two Sine Waves + Noise; Frequency)
Data Reduction: Discretization Using Class Labels (figure panels: 3 categories for both x and y; 5 categories for both x and y)
Data Reduction: Discretization Without Using Class Labels (figure panels: Data; Equal interval width; Equal frequency; K-means)
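Two of the unsupervised discretization schemes named in this figure, equal interval width and equal frequency, can be sketched as follows. The function names and the sample values are assumptions for illustration; the K-means variant is omitted to keep the sketch short.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin labels so that each bin holds roughly the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels


# Illustrative data with one extreme value (assumed data).
data = [1, 2, 3, 4, 5, 100]
print(equal_width_bins(data, 3))
print(equal_frequency_bins(data, 3))
```

With a skewed attribute like this one, equal-width binning puts almost every value in the first interval, while equal-frequency binning spreads the values evenly across bins.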
Data Reduction: Attribute Transformation
Question & Answer
