INTRODUCTION TO DATA MINING
SUSHIL KULKARNI
DATA
DATA UNCOVER HIDDEN INFORMATION DATA MINING
DATA MINING DEFINITION
DEFINE DATA MINING
FEW TERMS
EXAMPLE OF LARGE DATASETS
EXAMPLES OF DATA MINING APPLICATIONS
THUS: DATA MINING
NUGGETS
PEOPLE THINK
DATA MINING PROCESS
EXAMPLE
DATA MINING QUERIES
DB VS DM PROCESSING
QUERY EXAMPLES
KDD PROCESS
STEPS OF KDD PROCESS
VISUALIZATION TECHNIQUES
Graphical: bar charts, pie charts, histograms
Geometric: box plots, scatter plots
Icon-based: using colors and figures as icons
Pixel-based: data shown as colored pixels
Hierarchical: hierarchically dividing the display area
Hybrid: a combination of the above approaches
KDD is the nontrivial extraction of implicit, previously unknown, and potentially useful knowledge from data.
[Figure: the KDD process. Labels: Operational Databases, Data Cleaning, Data Integration, Data Warehouses, Data Preprocessing, Selection, Data Transformation, Data Mining, Pattern Evaluation, Knowledge]
KDD PROCESS EX: WEB LOG
DATA MINING VS. KDD
KDD ISSUES
DATA MINING TASKS AND METHODS
ARE ALL THE ‘DISCOVERED’ PATTERNS INTERESTING?
CAN WE FIND ALL AND ONLY INTERESTING PATTERNS?
Data Mining
Predictive: Classification, Regression, Time Series Analysis, Prediction
Descriptive: Clustering, Summarization, Association Rules, Sequence Discovery
Data Mining Tasks
Data Mining Methods
DATA PREPROCESSING
DIRTY DATA
WHY DATA PREPROCESSING?
Why can Data be Incomplete?
Why can Data be Noisy / Inconsistent?
TASKS IN DATA PREPROCESSING
Major Tasks in Data Preprocessing
outliers = exceptions!
Forms of data preprocessing
DATA CLEANING
HOW TO HANDLE MISSING DATA?
Fill missing values using aggregate functions (e.g., the average) or probabilistic estimates based on the global value distribution.

Age  Income  Team     Gender
23   24,200  Red Sox  M
39   ?       Yankees  F
45   45,390  ?        F

Missing income (age 39): put the average income here, or the most probable income given that the person is 39 years old.
Missing team (age 45): put the most frequent team here.
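The two fill strategies named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `MISSING` marker, and the helper names `fill_with_mean` and `fill_with_mode` are hypothetical and do not come from the slides. It uses the global average for income, not the age-conditioned estimate the slide also mentions.

```python
from collections import Counter

MISSING = None  # hypothetical marker standing in for the "?" cells

# The three rows from the slide's table.
records = [
    {"age": 23, "income": 24200, "team": "Red Sox", "gender": "M"},
    {"age": 39, "income": MISSING, "team": "Yankees", "gender": "F"},
    {"age": 45, "income": 45390, "team": MISSING, "gender": "F"},
]

def fill_with_mean(rows, field):
    """Replace missing numeric values with the mean of the known values."""
    known = [r[field] for r in rows if r[field] is not MISSING]
    mean = sum(known) / len(known)
    return [{**r, field: mean} if r[field] is MISSING else r for r in rows]

def fill_with_mode(rows, field):
    """Replace missing categorical values with the most frequent known value."""
    known = [r[field] for r in rows if r[field] is not MISSING]
    # With only two known teams here the counts tie; Counter breaks ties
    # by first occurrence, so treat the chosen team as arbitrary.
    mode = Counter(known).most_common(1)[0][0]
    return [{**r, field: mode} if r[field] is MISSING else r for r in rows]

filled = fill_with_mode(fill_with_mean(records, "income"), "team")
for row in filled:
    print(row)
```

Both helpers return new rows and leave the input untouched, which makes it easy to compare the data before and after imputation.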
HOW TO HANDLE NOISY DATA? Discretization
HOW TO HANDLE NOISY DATA? Discretization: Smoothing techniques
SIMPLE DISCRETISATION METHODS: BINNING
BINNING: EXAMPLE
EXAMPLE: EQUI-WIDTH BINNING

Bin #  Bin Elements        Bin Boundaries
1      {0, 4}              [-, 10)
2      {12, 16, 16, 18}    [10, 20)
3      {23, 26, 28}        [20, +)

EXAMPLE: EQUI-DEPTH BINNING

Bin #  Bin Elements        Bin Boundaries
1      {0, 4, 12}          [-, 14)
2      {16, 16, 18}        [14, 21)
3      {23, 26, 28}        [21, +)
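The two tables can be reproduced with a short Python sketch. The function names and the choice of a fixed bin width of 10 are assumptions made for illustration; the slides give only the resulting bins.

```python
def equi_width_bins(values, width):
    """Group values into consecutive intervals of equal width.

    Bin k holds values in [k * width, (k + 1) * width). Empty bins are
    skipped in the result.
    """
    bins = {}
    for v in sorted(values):
        bins.setdefault(int(v // width), []).append(v)
    return [bins[k] for k in sorted(bins)]

def equi_depth_bins(values, n_bins):
    """Split the sorted values into n_bins groups of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        # The first `extra` bins take one additional element each.
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins

data = [0, 4, 12, 16, 16, 18, 23, 26, 28]
print(equi_width_bins(data, 10))  # [[0, 4], [12, 16, 16, 18], [23, 26, 28]]
print(equi_depth_bins(data, 3))   # [[0, 4, 12], [16, 16, 18], [23, 26, 28]]
```

Equi-width bins can end up with very different counts (2, 4, and 3 here), while equi-depth bins hold the same number of values but span intervals of different widths.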
SMOOTHING USING BINNING METHODS
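The body of this slide did not survive, so the following is a sketch of two commonly taught variants, smoothing by bin means and smoothing by bin boundaries, on the assumption that these are what the slide covered. The function names are hypothetical, and the input reuses the equi-depth bins from the earlier example.

```python
def smooth_by_bin_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_bin_boundaries(bins):
    """Replace every value with the closer of its bin's min and max.

    Ties go to the lower boundary (an arbitrary choice for this sketch).
    """
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if v - lo <= hi - v else hi for v in b])
    return smoothed

bins = [[0, 4, 12], [16, 16, 18], [23, 26, 28]]
print(smooth_by_bin_means(bins))
print(smooth_by_bin_boundaries(bins))  # [[0, 0, 12], [16, 16, 18], [23, 28, 28]]
```

Either way, each value is pulled toward its neighbours in the same bin, which dampens small random fluctuations at the cost of some detail.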
SIMPLE DISCRETISATION METHODS: BINNING
Example: customer ages (vertical axis: number of values)
Equi-width binning: 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80
Equi-depth binning: 0-22, 22-31, 32-38, 38-44, 44-48, 48-55, 55-62, 62-80
FEW TASKS
BASIC DATA MINING TASKS
CLUSTERING
CLUSTER ANALYSIS
[Figure: plot of salary and age with a cluster and an outlier marked]
CLASSIFICATION
REGRESSION
Example of linear regression: y = x + 1
[Figure: fitted line y = x + 1. Labels: x, y, X1, Y1, (salary), (age)]
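As an illustration of how such a line is obtained, here is a minimal ordinary least-squares fit in plain Python. The sample points are made up so that they lie exactly on the slide's line y = x + 1, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Illustrative points chosen to lie on y = x + 1.
xs = [1, 2, 3, 4, 5]
ys = [x + 1 for x in xs]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)

# Predict y for a new x using the fitted line.
print(slope * 10 + intercept)
```

With noisy real data the fitted slope and intercept would only approximate the underlying relationship; the line is the one that minimizes the sum of squared vertical distances to the points.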
DATA INTEGRATION
DATA TRANSFORMATION
NORMALIZATION
where j is the smallest integer such that Max(|v'|) < 1
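The formula on this slide was lost, but the surviving condition on j matches the usual definition of normalization by decimal scaling, in which each value v is divided by 10 to the power j. Assuming that is what the slide showed, a minimal Python sketch looks like this; the function name is hypothetical, and j is restricted to non-negative integers as a simplifying assumption.

```python
def decimal_scaling(values):
    """Scale values by 10**j, with j the smallest j >= 0 giving max |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    # Increase j until the largest magnitude falls strictly below 1.
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

scaled, j = decimal_scaling([-986, 917])
print(j)       # 3
print(scaled)  # values now lie strictly between -1 and 1
```

Because only the decimal point moves, the original values can be recovered by multiplying by 10 to the power j again.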
SUMMARIZATION
DATA EXTRACTION, SELECTION, CONSTRUCTION, COMPRESSION
TERMS
SELECTION: DECISION TREE INDUCTION: Example
Initial attribute set: {A1, A2, A3, A4, A5, A6}
[Figure: decision tree with root test A4?, child tests A1? and A6?, and leaves labeled Class 1 and Class 2]
Reduced attribute set: {A1, A4, A6}
DATA COMPRESSION
[Figure: Original Data and Compressed Data (lossless); Original Data Approximated (lossy)]
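The difference between the two paths in the figure can be shown with a small Python sketch. The sensor-style readings are invented for illustration; `zlib` stands in for any lossless codec, and rounding stands in for any lossy approximation.

```python
import zlib

# Hypothetical, repetitive readings so that compression has something to exploit.
readings = [20.113, 20.127, 20.131, 20.118, 20.125] * 200
raw = ",".join(f"{r:.3f}" for r in readings).encode("ascii")

# Lossless: decompressing gives back exactly the original bytes.
compressed = zlib.compress(raw)
restored = zlib.decompress(compressed)
print(len(raw), len(compressed), restored == raw)

# Lossy: rounding keeps less detail, and the original values
# cannot be recovered from the approximation.
approximated = [round(r, 1) for r in readings]
max_error = max(abs(a - r) for a, r in zip(approximated, readings))
print(max_error)
```

Lossless methods are limited by how much redundancy the data contains, while lossy methods trade a bounded error for a smaller representation.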
NUMEROSITY REDUCTION: Reduce the volume of data
HISTOGRAM
HISTOGRAM TYPES
HIERARCHICAL REDUCTION
MULTIDIMENSIONAL INDEX STRUCTURES CAN BE USED FOR DATA REDUCTION
[Figure: Example: an R-tree with regions R0 to R6 over data objects a to i]
SAMPLING
[Figure: Raw Data sampled by SRSWOR (simple random sample without replacement) and by SRSWR (simple random sample with replacement)]
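Both schemes are available in Python's standard `random` module, so the contrast can be sketched directly. The population, sample sizes, and seed below are arbitrary choices for illustration.

```python
import random

def srswor(population, k, rng):
    """Simple random sample WITHOUT replacement: no item is drawn twice."""
    return rng.sample(population, k)

def srswr(population, k, rng):
    """Simple random sample WITH replacement: an item may be drawn repeatedly."""
    return rng.choices(population, k=k)

rng = random.Random(42)        # fixed seed so runs are repeatable
raw_data = list(range(1, 9))   # stand-in for the raw tuples T1..T8

without = srswor(raw_data, 4, rng)
with_repl = srswr(raw_data, 12, rng)

print(without)    # 4 distinct items
print(with_repl)  # 12 draws from 8 items, so repeats are guaranteed
```

Sampling without replacement can never return more items than the population holds, whereas sampling with replacement can produce a sample of any size.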
[Figure: Raw Data and Cluster/Stratified Sample]
LINK ANALYSIS
EX: TIME SERIES ANALYSIS
DATA MINING DEVELOPMENT
VIEW DATA USING DATA MINING
DATA MINING METRICS
VISUALIZATION TECHNIQUES
DATABASE PERSPECTIVE ON DATA MINING
RELATED CONCEPTS OUTLINE
Goal: Examine some areas which are related to data mining.
DB AND OLTP SYSTEMS
FUZZY SETS AND LOGIC
FUZZY SETS
The fuzzy sets are drawn as triangular membership functions. Membership in "short" decreases gradually, membership in "medium" gradually increases and then decreases, and membership in "tall" increases gradually.
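The three shapes described here can be written as simple piecewise-linear membership functions. The breakpoints of 150, 170, and 190 cm are assumptions chosen for illustration; the slides do not give numeric values.

```python
# Hypothetical breakpoints in centimetres (not taken from the slides).
LOW, MID, HIGH = 150.0, 170.0, 190.0

def short(height):
    """Full membership up to LOW, then a gradual decrease to 0 at MID."""
    if height <= LOW:
        return 1.0
    if height >= MID:
        return 0.0
    return (MID - height) / (MID - LOW)

def medium(height):
    """Triangle: rises from LOW to a peak at MID, then falls to 0 at HIGH."""
    if height <= LOW or height >= HIGH:
        return 0.0
    if height <= MID:
        return (height - LOW) / (MID - LOW)
    return (HIGH - height) / (HIGH - MID)

def tall(height):
    """No membership up to MID, then a gradual increase to 1 at HIGH."""
    if height <= MID:
        return 0.0
    if height >= HIGH:
        return 1.0
    return (height - MID) / (HIGH - MID)

for h in (150, 160, 170, 180, 190):
    print(h, short(h), medium(h), tall(h))
```

A height of 160 cm is then partly "short" and partly "medium" at the same time, which is the point of a fuzzy set: membership is a degree between 0 and 1, not a yes-or-no decision.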
CLASSIFICATION/PREDICTION IS FUZZY
[Figure: Accept and Reject regions against loan amount, shown for the Simple and Fuzzy cases]
INFORMATION RETRIEVAL
IR QUERY RESULT MEASURES AND CLASSIFICATION
[Figure: panels labeled IR and Classification]
DIMENSION MODELING
AGGREGATION HIERARCHIES
STATISTICS
MACHINE LEARNING
PATTERN MATCHING (RECOGNITION)
THANKS!

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 

Recently uploaded (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Ch 1 Intro to Data Mining

  • 1. SUSHIL KULKARNI INTRODUCTION TO DATA MINING
  • 7. DATA MINING DEFINITION
  • 19. DATA MINING PROCESS
  • 24. DATA MINING QUERIES
  • 28. KDD PROCESS
  • 33. VISUALIZATION TECHNIQUES: Graphical (bar charts, pie charts, histograms); Geometric (box plots, scatter plots); Icon-based (colored figures used as icons); Pixel-based (data shown as colored pixels); Hierarchical (display area divided hierarchically); Hybrid (a combination of the above approaches)
  • 40. DATA MINING TASKS AND METHODS
  • 45. Data Mining: Predictive (Classification, Regression, Time Series Analysis, Prediction); Descriptive (Clustering, Summarization, Association Rules, Sequence Discovery)
  • 56. TASKS IN DATA PREPROCESSING
  • 59. Forms of data preprocessing
  • 64. HOW TO HANDLE MISSING DATA? Fill missing values using aggregate functions (e.g., the average) or probabilistic estimates based on the global value distribution.
        Age   Income   Team      Gender
        23    24,200   Red Sox   M
        39    ?        Yankees   F
        45    45,390   ?         F
        E.g., for the missing income, put the average income, or the most probable income given that the person is 39 years old. E.g., for the missing team, put the most frequent team.
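
A minimal sketch of the aggregate-fill idea on this slide, assuming the three rows of the table are held as Python dictionaries with `None` standing in for "?". The helper name `fill_missing` is hypothetical, not from the slides. With only three rows the two known teams are tied, so the "most frequent team" fill is arbitrary here.

```python
from statistics import mean, mode

# Rows from the slide's table; None marks a missing value ("?").
records = [
    {"age": 23, "income": 24200, "team": "Red Sox", "gender": "M"},
    {"age": 39, "income": None, "team": "Yankees", "gender": "F"},
    {"age": 45, "income": 45390, "team": None, "gender": "F"},
]

def fill_missing(rows, column, aggregate):
    """Return copies of rows where None in `column` is replaced by
    aggregate(known values of that column)."""
    known = [r[column] for r in rows if r[column] is not None]
    fill_value = aggregate(known)
    return [
        {**r, column: fill_value if r[column] is None else r[column]}
        for r in rows
    ]

# Numeric attribute: fill with the average of the known incomes.
filled = fill_missing(records, "income", mean)
# Categorical attribute: fill with the most frequent known team.
# On a tie, statistics.mode (Python 3.8+) returns the first value seen.
filled = fill_missing(filled, "team", mode)

for row in filled:
    print(row)
```

The probabilistic variant mentioned on the slide (most probable income given age 39) would need a model of income conditioned on age, which this sketch does not attempt.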
  • 74. SIMPLE DISCRETISATION METHODS: BINNING. Example: customer ages (histograms of number of values per bin). Equi-width binning: 0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, 70-80. Equi-depth binning: 0-22, 22-31, 32-38, 38-44, 44-48, 48-55, 55-62, 62-80.
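
The two binning schemes can be sketched as follows. This is an illustration under assumptions: the `ages` list is made-up sample data (the slide does not list the raw ages), and the function names are hypothetical. Equi-width binning splits the value range into intervals of equal length; equi-depth binning splits the sorted values into groups of (nearly) equal size.

```python
def equal_width_edges(values, k):
    """Return the k+1 boundaries of k intervals of equal length
    spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]

def equal_depth_bins(values, k):
    """Split the sorted values into k bins holding (nearly) the same
    number of values each."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]

# Hypothetical customer ages, for illustration only.
ages = [0, 5, 18, 22, 25, 31, 38, 44, 48, 55, 62, 80]

print(equal_width_edges(ages, 8))  # boundaries 0, 10, 20, ..., 80
print(equal_depth_bins(ages, 4))   # four bins of three ages each
```

Equi-width bins are easy to read but can be nearly empty where the data are sparse; equi-depth bins adapt their boundaries to the data, which is why the slide's equi-depth intervals have unequal lengths.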
  • 75. FEW TASKS
  • 78. CLUSTER ANALYSIS (figure: scatter plot with axes salary and age, showing a cluster and an outlier)
  • 81. REGRESSION: Example of linear regression (figure: data points with the fitted line y = x + 1; axes x and y, labeled age and salary; a sample value X1 and its fitted value Y1)
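
As a sketch of how such a line can be obtained, the following fits y = slope * x + intercept by ordinary least squares. The sample points are assumptions chosen to lie exactly on the slide's line y = x + 1; they are not data from the slides, and `fit_line` is a hypothetical helper name.

```python
def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept
    to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical points lying exactly on y = x + 1.
points = [(1, 2), (2, 3), (3, 4)]
slope, intercept = fit_line(points)
print(slope, intercept)

# Smoothing a value: replace an observed y by the fitted value at x1.
x1 = 2.5
y1 = slope * x1 + intercept
```

In data cleaning, the fitted value can replace a noisy observation, which is how regression serves as a smoothing method.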
  • 82. DATA INTEGRATION
  • 85. DATA TRANSFORMATION
  • 91. DATA EXTRACTION, SELECTION, CONSTRUCTION, COMPRESSION
  • 94. SELECTION: DECISION TREE INDUCTION. Example: initial attribute set {A1, A2, A3, A4, A5, A6}. The induced tree tests A4 at the root and then A1 and A6, with leaves labeled Class 1 or Class 2. Reduced attribute set (the attributes that appear in the tree): {A1, A4, A6}
  • 97. DATA COMPRESSION (figure: Original Data is reduced to Compressed Data; lossless compression recovers the Original Data exactly, lossy compression recovers only an approximation of the Original Data)
  • 109. SAMPLING (figure: Raw Data sampled by SRSWOR, simple random sample without replacement, and by SRSWR, simple random sample with replacement)
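
The difference between the two schemes can be sketched with Python's standard `random` module. The raw data (the integers 1 to 20), the sample size, and the seed are assumptions made for illustration.

```python
import random

rng = random.Random(0)           # fixed seed so the sketch is repeatable
raw_data = list(range(1, 21))    # hypothetical raw data: 20 distinct tuples
sample_size = 5

# SRSWOR: each tuple can be drawn at most once.
srswor = rng.sample(raw_data, sample_size)

# SRSWR: a drawn tuple is put back, so it may be drawn again.
srswr = rng.choices(raw_data, k=sample_size)

print("SRSWOR:", srswor)
print("SRSWR: ", srswr)
```

`random.sample` never repeats an element of the population, while `random.choices` draws independently each time, so duplicates are possible in the second sample.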
  • 115. VIEW DATA USING DATA MINING
  • 123. FUZZY SETS
  • 124. FUZZY SETS: The figure shows triangular membership functions for the fuzzy sets short, medium, and tall. Membership in short decreases gradually, membership in medium gradually increases and then decreases, and membership in tall increases gradually.
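
A minimal sketch of such membership functions for height. The breakpoints (150, 170, and 190 cm) and all function names are assumptions chosen for illustration; the slide gives no numeric values.

```python
def falling(x, full, zero):
    """Membership 1 at or below `full`, decreasing linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)

def rising(x, zero, full):
    """Membership 0 at or below `zero`, increasing linearly to 1 at `full`."""
    return 1.0 - falling(x, zero, full)

def triangle(x, left, peak, right):
    """Membership rising from 0 at `left` to 1 at `peak`, then falling to 0 at `right`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical breakpoints in centimetres.
def short(height):
    return falling(height, 150, 170)

def medium(height):
    return triangle(height, 150, 170, 190)

def tall(height):
    return rising(height, 170, 190)

for h in (150, 160, 170, 180, 190):
    print(h, short(h), medium(h), tall(h))
```

A height of 160 cm belongs to short and to medium with degree 0.5 each, which is the gradual transition the slide describes; a crisp set would force it into exactly one class.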
  • 125. CLASSIFICATION/PREDICTION IS FUZZY (figure: Accept and Reject regions plotted against loan amount, comparing a simple classification with a fuzzy one)
  • 128. IR QUERY RESULT MEASURES AND CLASSIFICATION (figure panels: IR, Classification)
  • 137. THANKS! SUSHIL KULKARNI