Data Mining
www.StudsPlanet.com
Agenda
• What is Data Mining?
• Data Mining Tasks
• Challenges in Data Mining

What is Data Mining?
• Data mining is an integral part of knowledge discovery in databases (KDD), the overall process of converting raw data into useful information. This process consists of a series of transformation steps, from data preprocessing to postprocessing of data mining results.

Process of Knowledge Discovery in Databases (KDD)
Input Data -> Data Preprocessing -> Data Mining -> Postprocessing -> Information
• Data Preprocessing: normalization, data subsetting
• Postprocessing: filtering patterns, visualization, pattern interpretation

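As a rough illustration of the three KDD stages above, the following Python sketch passes a toy input through preprocessing (data subsetting and min-max normalization), a stand-in mining step, and postprocessing (pattern filtering). The function names, the sample records, and the 0.5 threshold are assumptions made for this example; they do not come from the slides.

```python
def preprocess(records, field):
    """Data subsetting + min-max normalization of one numeric field."""
    # Data subsetting: keep only records that actually have the field.
    subset = [r for r in records if r.get(field) is not None]
    values = [r[field] for r in subset]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in subset]


def mine(records, field, group_by):
    """Stand-in 'mining' step: average of the field per group."""
    sums, counts = {}, {}
    for r in records:
        key = r[group_by]
        sums[key] = sums.get(key, 0.0) + r[field]
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def postprocess(patterns, min_value):
    """Filtering patterns: keep only groups at or above a threshold."""
    return {k: v for k, v in patterns.items() if v >= min_value}


# Hypothetical raw input data.
raw = [
    {"region": "north", "spend": 10.0},
    {"region": "north", "spend": 30.0},
    {"region": "south", "spend": 50.0},
    {"region": "south", "spend": None},  # dropped by data subsetting
]
info = postprocess(mine(preprocess(raw, "spend"), "spend", "region"), 0.5)
print(info)
```

A real mining step would replace the per-group average with an actual algorithm such as classification or clustering; the point here is only the order of the stages.
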
Data Mining Tasks
• Data mining tasks are generally divided into two categories:
1. Predictive tasks
2. Descriptive tasks

Predictive Tasks
• Objective: Predict the value of a specific attribute (the target, or dependent variable) based on the values of other attributes (the explanatory variables).
• Example: Judge whether a patient has a specific disease based on his/her medical test results.

Descriptive Tasks
• Objective: Derive patterns (correlations, trends, trajectories) that summarize the underlying relationships in the data.
• Example: Identifying web pages that are accessed together (a human-interpretable pattern).

Data Mining Tasks [contd.]
• Classification [Predictive]
• Clustering [Descriptive]
• Association Rule Discovery [Descriptive]
• Sequential Pattern Discovery [Descriptive]
• Regression [Predictive]
• Deviation Detection [Predictive]

Classification: Definition
• Given a collection of records:
  - Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.

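The train/test procedure described above can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the model is a 1-nearest-neighbour classifier (chosen here only because it is short, not because the slides prescribe it), and the records, the 25% test fraction, and the function names are hypothetical.

```python
import random


def split(records, test_fraction=0.25, seed=0):
    """Shuffle the labelled records and split them into training and test sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict(train, attributes):
    """1-nearest-neighbour model: return the class of the closest training record."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(train, key=lambda rec: sq_dist(rec[0], attributes))
    return nearest[1]


def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    hits = sum(predict(train, attrs) == label for attrs, label in test)
    return hits / len(test)


# Each record is ((attribute values...), class). Hypothetical toy data.
data = [((x, x + 1), "low") for x in range(0, 8)] + \
       [((x, x + 1), "high") for x in range(20, 28)]
train_set, test_set = split(data)
print(len(train_set), len(test_set), accuracy(train_set, test_set))
```

The test records are never shown to the model while it is built, so the accuracy measured on them estimates how well previously unseen records would be classified.
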
Classification: Example
• Direct Marketing
  - Goal: Reduce the cost of mailing by targeting a set of consumers likely to buy a new product.
  - Approach:
    - Use the data for a similar product introduced before.
    - We know which customers decided to buy and which decided otherwise. This {buy, don’t buy} decision forms the class attribute.
    - Collect various demographic, lifestyle, and company-interaction related information about all such customers (type of business, where they stay, how much they earn, etc.).
    - Use this information as input attributes to learn a classifier model.
(From Berry & Linoff, 1997)

Clustering: Definition
• Given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
  - Data points in the same cluster are more similar to one another.
  - Data points in separate clusters are less similar to one another.

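The two conditions in this definition can be checked numerically for a given grouping. The sketch below assumes Euclidean distance as the (dis)similarity measure and uses made-up 2-D points; neither choice comes from the slides.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def mean_distances(clusters):
    """Return (mean intra-cluster distance, mean inter-cluster distance)."""
    labelled = [(p, label) for label, pts in clusters.items() for p in pts]
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(labelled, 2):
        # Pairs with the same label are intra-cluster, all others inter-cluster.
        (intra if lp == lq else inter).append(dist(p, q))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical clustering of six points into two groups.
clusters = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}
intra, inter = mean_distances(clusters)
print(intra < inter)
```

A grouping that satisfies the definition should show a clearly smaller average distance within clusters than between them.
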
Clustering: Example
• Document Clustering:
  - Goal: Find groups of documents that are similar to each other based on the important terms appearing in them.
  - Approach: Identify frequently occurring terms in each document. Form a similarity measure based on the frequencies of different terms. Use it to cluster.
  - Gain: Information retrieval can use the clusters to relate a new document or search term to clustered documents.

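The approach above (term frequencies, a similarity measure, then clustering) might look like the following Python sketch. Cosine similarity, the greedy grouping strategy, the 0.3 threshold, and the four sample headlines are all assumptions chosen for brevity; the slides do not specify them, and no stop-word filtering is applied here.

```python
from collections import Counter
from math import sqrt


def term_frequencies(text):
    """Count how often each lower-cased word occurs in a document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster(docs, threshold=0.3):
    """Greedy clustering: join the first cluster whose seed document is similar enough."""
    clusters = []  # each cluster is a list of document indices; index 0 is the seed
    vectors = [term_frequencies(d) for d in docs]
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # no similar cluster found: start a new one
    return clusters


# Hypothetical documents.
docs = [
    "stocks fall as bank shares slide",
    "team wins final match of the season",
    "bank shares recover and stocks rise",
    "coach praises team after the match",
]
print(cluster(docs))
```

With these sample documents the two finance headlines end up in one cluster and the two sports headlines in another. A new document could then be related to the existing clusters by comparing its term-frequency vector against each cluster's seed.
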
Illustrating Document Clustering

Category        Total Articles   Correctly Placed
Financial       555              364
Foreign         341              260
National        273              36
Metro           943              746
Sports          738              573
Entertainment   354              278

• Clustering points: 3204 articles of the Los Angeles Times.
• Similarity measure: how many words are common in these documents (after some word filtering).
(Introduction to Data Mining, 2007)

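The table is easier to read as per-category placement rates. The short Python sketch below uses only the counts reported above; the variable names are arbitrary. The category totals sum to the 3204 articles stated on the slide.

```python
# (total articles, correctly placed) per category, copied from the table.
results = {
    "Financial": (555, 364),
    "Foreign": (341, 260),
    "National": (273, 36),
    "Metro": (943, 746),
    "Sports": (738, 573),
    "Entertainment": (354, 278),
}

total = sum(t for t, _ in results.values())
correct = sum(c for _, c in results.values())

# Share of articles in each category that the clustering placed correctly.
for category, (t, c) in results.items():
    print(f"{category:<14}{c / t:6.1%}")
print(f"{'Overall':<14}{correct / total:6.1%}")
```

The National category stands out: only 36 of its 273 articles were placed correctly, far below every other category.
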
Association Rule Discovery: Definition
• Given a set of records, each of which contains some number of items from a given collection, find dependency rules that predict the occurrence of an item based on the occurrences of other items.
• Apriori principle: If an itemset is frequent, then all of its subsets are also frequent.

TID   Items
1     Bread, Coke, Milk
2     Beer, Bread
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

Rules discovered:
{Milk} -> {Coke}
{Diaper, Milk} -> {Beer}

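To make the two discovered rules concrete, the Python sketch below scores them on the five transactions from the table. Support and confidence are the usual measures for association rules, but the slide does not name them, so treat them and the function names as assumptions of this example.

```python
# The five transactions from the table, one set of items per TID.
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    return support(lhs | rhs) / support(lhs)


for lhs, rhs in [({"Milk"}, {"Coke"}), ({"Diaper", "Milk"}, {"Beer"})]:
    print(sorted(lhs), "->", sorted(rhs),
          "support:", support(lhs | rhs),
          "confidence:", round(confidence(lhs, rhs), 2))
```

The Apriori principle shows up directly in these numbers: {Milk} occurs in four of the five transactions, so any itemset containing Milk, such as {Milk, Coke}, can occur in at most four.
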
Other Mining Tasks in a Nutshell
• Sequential Pattern Discovery: In point-of-sale transaction sequences,
  - Computer bookstore: (Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies, Tcl_Tk)
• Regression: Neural networks
• Deviation Detection: Detect deviations from normal behavior, e.g., credit card fraud.

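Deviation detection can be illustrated with a very simple rule: flag values that lie far from the mean. The Python sketch below uses a z-score threshold of 2.0 on made-up card charges; the rule, the threshold, and the data are assumptions for illustration, and practical fraud detection relies on far richer models.

```python
from statistics import mean, pstdev


def deviations(amounts, z_threshold=2.0):
    """Return the amounts lying more than z_threshold standard deviations from the mean."""
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [a for a in amounts if abs(a - mu) / sigma > z_threshold]


# Hypothetical charges on one credit card.
card_charges = [12.0, 15.5, 9.9, 14.2, 11.0, 13.7, 10.4, 950.0]
print(deviations(card_charges))
```

Note that a single extreme value inflates the standard deviation itself, which is one reason robust statistics such as the median are often preferred in practice.
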
Challenges of Data Mining
• Scalability
• Dimensionality
• Complex and Heterogeneous Data
• Data Quality
• Data Ownership and Distribution
• Privacy Preservation
• Streaming Data

References
• Tan, P., Steinbach, M., & Kumar, V. Introduction to Data Mining. Addison-Wesley, 2006.
