Data mining concepts and work

Presented to:
Dr. Rabie
By:
Amr Abd EL Latief Abd El Al

Data Mining Definition
 Definition:
 Data mining is the extraction of interesting patterns or
knowledge from huge amounts of data.
Also known by different names:
 knowledge discovery (mining) in databases (KDD)
 knowledge extraction,
 data/pattern analysis,
 data archeology,
 data dredging,
 information harvesting,
 business intelligence and others. [1]

What is Data Mining
 Data Mining enables data exploration, data analysis,
and data visualization of huge databases at a high level
of abstraction, without a specific hypothesis in mind.
 Data mining works through a method called modeling:
a model is built and then used to make predictions.

Data Mining Technologies
 These include:
 artificial neural networks
 decision trees
 genetic algorithms
 machine learning
 evolutionary computing
 MOEA (multi-objective evolutionary computing)

Classifications
 Data types (application)
 Data types (data structure)
 Functionality

Data Types (Application S.V.)
 Business transactions
 Scientific data
 Medical and personal data
 Surveillance video and pictures
 Satellite sensing
 Text reports and memos (e-mail messages)
 Most of the communications
 The World Wide Web repositories

Data Types (Data Structure S.V.)
 Flat files
 Relational Databases
 Data Warehouses
 Transaction Databases
 Multimedia Databases
 Spatial Databases
 World Wide Web

FUNCTIONALITIES AND CLASSIFICATIONS OF DATA MINING
 Characterization
 Discrimination
 Association analysis
 Classification
 Uses given class labels to order the objects in
 the data collection. Classification approaches normally use a
 training set where all objects are already associated with
 known class labels. The classification algorithm learns from
 the training set and builds a model. The model is used to
 classify new objects.
 Prediction

Data Mining Systems
 specialized
 comprehensive
Classification according to:
 data source mined
 data model drawn on
 kind of knowledge discovered
 mining techniques used

Classification according to the type
of data source mined
 This classification categorizes data mining systems
according to the type of data handled:
 spatial data
 multimedia data
 time-series data
 text data
 World Wide Web.

Classification according to the data
model drawn on
 This classification categorizes data mining systems
based on the data model involved:
 relational database
 object-oriented database
 data warehouse
 transactional
 others

Classification according to the kind
of knowledge discovered
 This classification categorizes data mining systems
based on the kind of knowledge discovered or data
mining functionalities:
 characterization
 discrimination
 association
 classification
 clustering
 others

Classification according to mining
techniques used
 The classification categorizes data mining systems
according to the data analysis approach used:
 machine learning
 neural networks
 Genetic algorithms
 Statistics
 visualization
 database oriented
 data warehouse-oriented
 others

Classification according to the degree
of user interaction
 The classification can also take into account the degree of
user interaction involved in the data mining process:
 query-driven systems,
 interactive exploratory systems
 autonomous systems
Note:
 A comprehensive system would provide a wide variety
of data mining techniques to fit different situations
and options, and offer different degrees of user
interaction.

Data Mining Goals
 The two main goals of DM are:
 description
 prediction
 Standard tasks in the field of DM are: description,
clustering, association discovery, sequential pattern
analysis, classification and regression.
 Description: can be obtained by characterization or by
discrimination.
 Characterization: a summarization of the general features
of the objects in a class.
 Discrimination: closely related to characterization. It
consists of characterizing a class by comparison with
another one.

Data Mining Goals
 Clustering: differs from classification in that it analyses
data objects without knowing their class.
 Association discovery: results in a set of association rules
which represent attribute-value conditions frequently
occurring in a given set of data.
 Sequential pattern analysis: consists of searching for
frequently occurring patterns related to time.
 Regression: uses existing values of some variables to
forecast the values of another, continuous variable.
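
The regression task above can be sketched as follows. This is a minimal illustration only, assuming a single predictor variable and an ordinary least-squares straight-line fit; the function name fit_line and the data values are invented for the example and do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical existing values: x is the known variable, y the continuous
# variable we want to forecast.
known_x = [1.0, 2.0, 3.0, 4.0]
known_y = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(known_x, known_y)

# Forecast the continuous variable for a new, unseen value of x.
forecast = slope * 5.0 + intercept
print(slope, intercept, forecast)
```

Real data mining tools usually fit several predictor variables at once; the single-variable case is used here only to keep the arithmetic visible.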

Machine Learning
 An ML system uses a finite set of objects (examples)
which represent observations of the environment; the
learning algorithm learns a model from this set, which
is called the training set.
 In DM, the data sources ML learns from include:
 databases
 data warehouses
 flat files

Classification in DM
 Classification:
is a form of data analysis that can be used to extract
models describing important classes or to predict future
trends.
 It represents:
a learning paradigm which consists of segmenting data by
assigning it to groups, or classes, that are already defined.
 Classification algorithms often assume a small database size,
but in Data Mining the technique must be scalable.

Classification in DM
 Classes are represented by:
the values of a particular attribute, called the goal
attribute; the remaining attributes are called predicting
attributes.
 The resulting model is usually represented as:
a set of IF-THEN prediction rules, where each one
predicts a class from the predicting attributes.
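
A set of IF-THEN prediction rules of this kind might be represented as below. This is a sketch under assumptions: the attribute names (outlook, humidity, windy), the rule list and the default class are hypothetical, and "first matching rule wins" is just one possible way to resolve overlapping rules.

```python
# Each rule pairs an IF-part (conditions on predicting attributes) with a
# THEN-part (the predicted value of the goal attribute).
RULES = [
    ({"outlook": "sunny", "humidity": "high"}, "no"),
    ({"outlook": "rainy", "windy": True}, "no"),
    ({"outlook": "overcast"}, "yes"),
]
DEFAULT_CLASS = "yes"  # assumed fallback when no rule matches


def predict(example, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose IF-part matches the example."""
    for conditions, predicted_class in rules:
        if all(example.get(attr) == value for attr, value in conditions.items()):
            return predicted_class
    return default


example = {"outlook": "sunny", "humidity": "high", "windy": False}
print(predict(example))
```

Because every rule is a readable condition on attribute values, a model in this form can be inspected directly by a human.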

ML in Classification
 Procedure:
 Algorithms are first applied to the so-called training set,
which contains training examples with a known class, to
discover rules.
 The model is then used for classification on a set of
examples called the test set.
 The predictive accuracy of the model is evaluated on the
test set.
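
The three steps of this procedure can be sketched as follows. The learner here is a deliberately simple stand-in (one rule per value of a single attribute, predicting the majority class), not a method named in the slides; the attribute name outlook, the data and the default class are all invented for illustration.

```python
from collections import Counter, defaultdict


def learn_rules(training_set, attribute):
    """Step 1: discover one rule per attribute value (predict the majority class)."""
    counts = defaultdict(Counter)
    for example, label in training_set:
        counts[example[attribute]][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def classify(rules, example, attribute, default):
    """Step 2: apply the learned rules to an example."""
    return rules.get(example[attribute], default)


def predictive_accuracy(rules, test_set, attribute, default):
    """Step 3: fraction of test examples whose predicted class is correct."""
    correct = sum(
        1 for example, label in test_set
        if classify(rules, example, attribute, default) == label
    )
    return correct / len(test_set)


# Hypothetical training examples with a known class.
training_set = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "sunny"}, "yes"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rainy"}, "yes"),
    ({"outlook": "rainy"}, "yes"),
]
# Hypothetical test set, kept separate from the training set.
test_set = [
    ({"outlook": "sunny"}, "no"),
    ({"outlook": "overcast"}, "yes"),
    ({"outlook": "rainy"}, "no"),
    ({"outlook": "sunny"}, "no"),
]

rules = learn_rules(training_set, "outlook")
accuracy = predictive_accuracy(rules, test_set, "outlook", default="yes")
print(rules, accuracy)
```

Keeping the test set separate from the training set is what makes the measured accuracy an estimate of how the model behaves on new objects.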

Classification Methods
 Main classification methods are:
 decision tree induction (scalability problem)
 Bayesian classification
 neural network learning
 Drawbacks:
 time-consuming
 results are difficult for humans to interpret

ASSOCIATION ANALYSIS
 Association rules show relationships between attributes.
Their typical application domain is market basket and
transaction data analysis.
 Association Rules:
 An association rule is generally defined as an expression
 X => Y,
 where X and Y are sets of attribute-value terms.

ASSOCIATION ANALYSIS
 Rules do not need to be strictly correct in order
to be useful. It is generally enough to find
rules which are true to some degree only.
 X implies Y
 X tends to imply Y
 Support and confidence
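
Support and confidence for a rule X => Y can be computed as sketched below, using the usual definitions: support is the fraction of transactions containing all items of an itemset, and confidence is the support of X together with Y divided by the support of X. The transactions and function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, x, y):
    """How often Y appears in the transactions that contain X."""
    return support(transactions, set(x) | set(y)) / support(transactions, x)


# Hypothetical market basket data: one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(transactions, {"bread", "milk"})
rule_confidence = confidence(transactions, {"bread"}, {"milk"})
print(rule_support, rule_confidence)
```

Here the rule bread => milk holds in two of the three transactions that contain bread, which is the sense in which X "tends to imply" Y rather than strictly implying it.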

Apriori Algorithm
 Depends on frequent occurrence
 Drawbacks:
 Large number of database scans
 Large size of generated intermediate sets
 Apriori mines only Boolean and single-dimensional
association rules.
 These rules are adapted to market basket analysis and can
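
The level-wise search behind Apriori can be sketched as below. This is a simplified illustration of frequent itemset mining only (rule generation is left out); the transactions and the minimum support count are assumptions. It makes the two drawbacks visible: every level needs one full pass over the transactions, and the candidate sets are held in memory as intermediate results.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # One scan of the whole database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune every
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent.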

GA Advantages in Data Mining
 A DM problem needs robustness of solutions and
scalability
 GA Advantages:
 high ability to find patterns in very large spaces
 parallel implementation
 performs a kind of global search rather than local
hill-climbing
 the patterns produced are directly understandable
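
The population-based, global character of a GA search can be sketched as below. This is a toy example, not a rule-mining GA: the fitness function (counting 1s in a bit string), the operators (tournament selection, one-point crossover, bit-flip mutation, elitism) and every parameter value are assumptions chosen only to keep the sketch short.

```python
import random


def one_max(bits):
    """Toy fitness: number of 1s in the bit string."""
    return sum(bits)


def genetic_search(length=20, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    # The population samples many points of the search space at once.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=one_max, reverse=True)
        history.append(one_max(population[0]))
        next_gen = [population[0][:]]  # elitism: keep the best individual
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            parent1 = max(rng.sample(population, 3), key=one_max)
            parent2 = max(rng.sample(population, 3), key=one_max)
            cut = rng.randrange(1, length)  # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Bit-flip mutation keeps some diversity in the population.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=one_max)
    history.append(one_max(best))
    return best, history


best, history = genetic_search()
print(one_max(best), history[0], history[-1])
```

Each individual's fitness can be evaluated independently of the others, which is what makes a parallel implementation natural.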

Search Challenges
 Scalability is an important research
challenge too.
 Multi-objective rule extraction
 MOEA issues
