SlideShare uma empresa Scribd logo
1 de 28
Lecture Notes 1: Introduction to Data Mining Zhangxi Lin ISQS 6347 Texas Tech University ISQS 6347, Data & Text Mining
What is Data Mining? ,[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Data Mining Process ISQS 6347, Data & Text Mining
What is Text Mining? ,[object Object],ISQS 6347, Data & Text Mining Patterns Trends Associations
Motivation for Text Mining ,[object Object],[object Object],ISQS 6347, Data & Text Mining 90% Structured Numerical or Coded Information 10% Unstructured or Semi-structured Information
Text Mining Process ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Why Mine Data? Commercial Viewpoint ISQS 6347, Data & Text Mining
Why Mine Data? Scientific Viewpoint ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],Origins of Data Mining ISQS 6347, Data & Text Mining Machine Learning/ Pattern   Recognition Statistics/ AI Data Mining Database systems
ISQS 6347, Data & Text Mining ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Mining Tasks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Classification: Definition ,[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Classification Example ISQS 6347, Data & Text Mining categorical categorical continuous class Training  Set Learn  Classifier Test Set Model
Classification: Application 1 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining From [Berry & Linoff] Data Mining Techniques, 1997
Classification: Application 2 ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Clustering Definition ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Illustrating Clustering ISQS 6347, Data & Text Mining ,[object Object],Intracluster distances are minimized Intercluster distances are maximized
Clustering Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Association Rule Discovery: Definition ,[object Object],[object Object],ISQS 6347, Data & Text Mining Rules Discovered: {Milk} --> {Coke} {Diaper, Milk} --> {Beer}
Association Rule Discovery Example ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Regression ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Deviation/Anomaly Detection ,[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining Typical network traffic at University level may reach over 100 million connections per day
Text Mining Tasks ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Example:  Decision Support using Bank Call Center Data ISQS 6347, Data & Text Mining
Example:  Decision Support using Bank Call Center Data ,[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining AC2G31, 01, 0101, PCC, 021, 0053352,  NEW YORK, NY , H-SUPRVR8,  STMT ,  “ Mr. Stark has been with the company for about 20 yrs. He  hates  his  stmt   format and wishes that we would show a daily balance to help him know when he falls below the required balance on the account.”
Challenges of Data Mining ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
Challenges of Text Mining ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining
SAS Training/Self-taught Courses ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],ISQS 6347, Data & Text Mining

Mais conteúdo relacionado

Mais procurados

Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
mayurik19
 
Information Technology Data Mining
Information Technology Data MiningInformation Technology Data Mining
Information Technology Data Mining
samiksha sharma
 

Mais procurados (19)

Introduction to data mining technique
Introduction to data mining techniqueIntroduction to data mining technique
Introduction to data mining technique
 
Data mining presentation.ppt
Data mining presentation.pptData mining presentation.ppt
Data mining presentation.ppt
 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
 
Data mining and knowledge Discovery
Data mining and knowledge DiscoveryData mining and knowledge Discovery
Data mining and knowledge Discovery
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data mining
Data miningData mining
Data mining
 
Data mining & data warehousing
Data mining & data warehousingData mining & data warehousing
Data mining & data warehousing
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Information Technology Data Mining
Information Technology Data MiningInformation Technology Data Mining
Information Technology Data Mining
 
Importance of Data Mining
Importance of Data MiningImportance of Data Mining
Importance of Data Mining
 
MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)MC0088 Internal Assignment (SMU)
MC0088 Internal Assignment (SMU)
 
Data mining concepts
Data mining conceptsData mining concepts
Data mining concepts
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
An introduction to data mining and its techniques
An introduction to data mining and its techniquesAn introduction to data mining and its techniques
An introduction to data mining and its techniques
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)
 
Data Mining
Data MiningData Mining
Data Mining
 
Introduction data mining
Introduction data miningIntroduction data mining
Introduction data mining
 
Abstract
AbstractAbstract
Abstract
 

Destaque (7)

Makers all around you
Makers all around youMakers all around you
Makers all around you
 
My Law
My LawMy Law
My Law
 
Testing
TestingTesting
Testing
 
My Law
My LawMy Law
My Law
 
Seguimiento Merge Sort
Seguimiento Merge SortSeguimiento Merge Sort
Seguimiento Merge Sort
 
Creative spaces
Creative spaces Creative spaces
Creative spaces
 
Testing
TestingTesting
Testing
 

Semelhante a D M1

Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
Rohit Kumar
 

Semelhante a D M1 (20)

Data mining-basic
Data mining-basicData mining-basic
Data mining-basic
 
Data Mining
Data MiningData Mining
Data Mining
 
Week-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptxWeek-1-Introduction to Data Mining.pptx
Week-1-Introduction to Data Mining.pptx
 
Data-Mining-ppt (1).pptx
Data-Mining-ppt (1).pptxData-Mining-ppt (1).pptx
Data-Mining-ppt (1).pptx
 
Data-Mining-ppt.pptx
Data-Mining-ppt.pptxData-Mining-ppt.pptx
Data-Mining-ppt.pptx
 
data.2.pptx
data.2.pptxdata.2.pptx
data.2.pptx
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Data mining final year project in jalandhar
Data mining final year project in jalandharData mining final year project in jalandhar
Data mining final year project in jalandhar
 
Data mining final year project in ludhiana
Data mining final year project in ludhianaData mining final year project in ludhiana
Data mining final year project in ludhiana
 
Data Mining – A Perspective Approach
Data Mining – A Perspective ApproachData Mining – A Perspective Approach
Data Mining – A Perspective Approach
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Data mining
Data miningData mining
Data mining
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Data Mining
Data MiningData Mining
Data Mining
 
6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana6months industrial training in data mining,ludhiana
6months industrial training in data mining,ludhiana
 
6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar6months industrial training in data mining, jalandhar
6months industrial training in data mining, jalandhar
 
6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana6 weeks summer training in data mining,ludhiana
6 weeks summer training in data mining,ludhiana
 
6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar6 weeks summer training in data mining,jalandhar
6 weeks summer training in data mining,jalandhar
 
Data mining
Data miningData mining
Data mining
 

Último

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

D M1

  • 1. Lecture Notes 1: Introduction to Data Mining Zhangxi Lin ISQS 6347 Texas Tech University ISQS 6347, Data & Text Mining
  • 2.
  • 3. Data Mining Process ISQS 6347, Data & Text Mining
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Classification Example ISQS 6347, Data & Text Mining categorical categorical continuous class Training Set Learn Classifier Test Set Model
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.