Presentation on recent data mining techniques and future research directions, drawn from recent research papers. Prepared in the pre-master's program at Cairo University under the supervision of Dr. Rabie.
2. Data Mining Definition
Definition:
Data mining is the extraction of interesting patterns or
knowledge from huge amounts of data.
It is also known by other names:
knowledge discovery (mining) in databases (KDD),
knowledge extraction,
data/pattern analysis,
data archeology,
data dredging,
information harvesting,
business intelligence and others. [1]
3. What is Data Mining
Data Mining enables data exploration, data analysis,
and data visualization of huge databases at a high level
of abstraction, without a specific hypothesis in mind.
How data mining works is best understood through modeling:
a model is built from the data and then used to make predictions.
4. Data Mining Technologies
These include:
artificial neural networks
decision trees
genetic algorithms
machine learning
evolutionary computing
MOEA (multi-objective evolutionary computing)
9. Data Types (Application S.V.)
Business transactions
Scientific data
Medical and personal data
Surveillance video and pictures
Satellite sensing
Text reports and memos (e-mail messages)
Most of the communications
The World Wide Web repositories
10. Types of Data (Data Structure S.V.)
Flat files
Relational Databases
Data Warehouses
Transaction Databases
Multimedia Databases
Spatial Databases
World Wide Web
11. FUNCTIONALITIES AND CLASSIFICATIONS OF DATA MINING
Characterization
Discrimination
Association analysis
Classification
uses given class labels to order the objects in
the data collection. Classification approaches normally use a
training set where all objects are already associated with
known class labels. The classification algorithm learns from
the training set and builds a model. The model is used to
classify new objects.
Prediction
12. Data Mining Systems
Data mining systems range from specialized to comprehensive.
Classification according to:
the type of data source mined
the data model drawn on
the kind of knowledge discovered
the mining techniques used
13. Classification according to the type
of data source mined
This classification categorizes data mining systems
according to the type of data handled:
spatial data
multimedia data
time-series data
text data
World Wide Web.
14. Classification according to the data
model drawn on
This classification categorizes data mining systems
based on the data model involved:
Relational database
object-oriented database
data warehouse
Transactional
others
15. Classification according to the kind
of knowledge discovered
This classification categorizes data mining systems
based on the kind of knowledge discovered or data
mining functionalities:
Characterization
discrimination
Association
classification
clustering
others
16. Classification according to mining
techniques used
This classification categorizes data mining systems
according to the data analysis approach used:
machine learning
neural networks
Genetic algorithms
Statistics
visualization
database oriented
data warehouse-oriented
others
17. Degree of user interaction
Classifications can also take into account the degree of
user interaction involved in the data mining process:
query-driven systems,
interactive exploratory systems
autonomous systems
Note:
A comprehensive system would provide a wide variety
of data mining techniques to fit different situations
and options, and offer different degrees of user
interaction.
19. Data Mining Goals
The two main goals of DM are:
description
prediction
Standard tasks in the field of DM are: description,
clustering, association discovery, sequential pattern
analysis, classification and regression.
Description: can be obtained by characterization or by
discrimination.
Characterization: a summarization of the general features
of the objects in a class.
Discrimination: does not differ much from
characterization. It consists of characterizing a class by
comparison with another one.
20. Data Mining Goals
Clustering: differs from classification in that it analyzes data
objects without knowing their class.
Association discovery: results in a set of association rules
which represent attribute-value conditions frequently
occurring in a given set of data.
Sequential pattern analysis: consists of searching for
frequently occurring patterns related to time.
Regression: uses existing values of some variables in order
to forecast what values of another continuous variable will
be.
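The regression task described above can be illustrated with a minimal sketch: an ordinary least-squares fit of one continuous variable from another. The function name `fit_line` and the toy values are assumptions made for illustration; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: existing values of one variable (xs) and of the
# continuous variable to forecast (ys).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
forecast = slope * 5.0 + intercept  # forecast for an unseen x = 5.0
```

Real regression tasks usually involve several predictor variables; the single-variable case is used here only to keep the arithmetic visible.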
21. Machine Learning
An ML system uses a finite set of objects, or
examples, which represent observations of the
environment. The learning algorithm learns a model
from this set, which is called the training set.
Data sources for ML in DM include:
databases
data warehouses
flat files
22. Classification in DM
Classification:
is a form of data analysis that can be used to extract
models describing important classes or to predict future
trends.
It represents:
a learning paradigm which consists of segmenting data by
assigning it to groups, or classes, that are already defined.
Classical methods assume a small database, but in data mining
classification must be a scalable technique.
23. Classification in DM
Classes are represented by:
the values of a particular attribute called the goal attribute;
the remaining attributes are called predicting
attributes.
The resulting model is usually represented as:
a set of IF-THEN prediction rules, each of which
predicts a class from the predicting attributes.
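As a rough illustration of a model expressed as IF-THEN prediction rules, the sketch below stores each rule as a set of conditions on predicting attributes plus the class it predicts. The attribute names (`outlook`, `windy`), the class labels, and the first-match policy are all hypothetical choices, not taken from the slides.

```python
# Each rule: (IF-part as attribute -> required value, THEN-part as class).
# Attribute names, values and classes are hypothetical.
RULES = [
    ({"outlook": "sunny", "windy": False}, "play"),
    ({"outlook": "rainy"}, "stay_home"),
]
DEFAULT_CLASS = "unknown"  # returned when no rule fires


def classify(example, rules=RULES, default=DEFAULT_CLASS):
    """Return the class of the first rule whose IF-part matches the example."""
    for conditions, predicted_class in rules:
        if all(example.get(attr) == value for attr, value in conditions.items()):
            return predicted_class
    return default


label = classify({"outlook": "sunny", "windy": False})
```

Here the goal attribute is the class label returned by `classify`, and every key used in a rule's IF-part is a predicting attribute.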
24. ML in Classification
Procedure:
Algorithms are first applied to the so-called training set,
which contains training examples with a known class, to
discover rules.
The model is then used for classification on a separate set of examples,
called the test set.
The predictive accuracy of the model is evaluated on the
test set.
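The procedure above (learn rules on a training set, classify a test set, measure predictive accuracy) can be sketched with a deliberately simple learner. The slides do not name a specific algorithm; the one-attribute rule learner below, in the spirit of the 1R method, is a stand-in, and the data, attribute names and function names are assumptions.

```python
from collections import Counter, defaultdict


def train_one_rule(examples, attributes, goal):
    """Learn IF-THEN rules on the single attribute with fewest training errors."""
    best = None
    for attr in attributes:
        # Count goal classes observed for each value of this attribute.
        counts = defaultdict(Counter)
        for ex in examples:
            counts[ex[attr]][ex[goal]] += 1
        # One rule per value: IF attr == value THEN most frequent class.
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for ex in examples if rules[ex[attr]] != ex[goal])
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    # Fallback class for attribute values never seen during training.
    default = Counter(ex[goal] for ex in examples).most_common(1)[0][0]
    return best[1], best[2], default


def predict(model, example):
    attr, rules, default = model
    return rules.get(example[attr], default)


def accuracy(model, examples, goal):
    """Fraction of examples whose predicted class equals the known class."""
    correct = sum(1 for ex in examples if predict(model, ex) == ex[goal])
    return correct / len(examples)


# Hypothetical training and test sets; "play" is the goal attribute.
training_set = [
    {"outlook": "sunny", "windy": False, "play": "yes"},
    {"outlook": "sunny", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "no"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
]
test_set = [
    {"outlook": "sunny", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": True, "play": "no"},
    {"outlook": "foggy", "windy": False, "play": "yes"},
]

model = train_one_rule(training_set, ["outlook", "windy"], "play")
test_accuracy = accuracy(model, test_set, "play")
```

Keeping the test set separate from the training set matters: accuracy measured on the training examples themselves would overstate how well the rules generalize.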
25. Classification Methods
Main classification methods are:
decision tree induction
Scalability problem
Bayesian classification
neural network learning.
Drawbacks:
time-consuming
results are difficult for humans to interpret
26. ASSOCIATION ANALYSIS
Association rules show relationships between attributes. Their
typical application domain is market basket and
transaction data analysis.
Association Rules:
An association rule is generally defined as an expression
X => Y,
where X and Y are sets of attribute-value terms.
27. ASSOCIATION ANALYSIS
Rules need not be strictly correct to be useful. It is
generally enough to find rules which are true only to some degree:
not "X implies Y" but "X tends to imply Y".
This degree is measured by support and confidence.
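Support and confidence are not defined on the slide. Under the standard definitions, the support of an itemset is the fraction of transactions containing it, and the confidence of X => Y is support(X and Y together) divided by support(X). A minimal sketch, with hypothetical market basket data and function names:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Estimate of P(Y | X): support of X and Y together over support of X."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical market basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

For the rule bread => milk, two of the four baskets contain both items and three contain bread, so the rule holds in two out of three of the cases where it applies: true to some degree only.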
28. Apriori Algorithm
Depends on frequent occurrence.
Drawbacks:
Large number of database scans.
Large size of generated intermediate sets.
Apriori mines only Boolean, single-dimensional
association rules.
These rules are adapted to market basket analysis and can
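A compact sketch of level-wise frequent itemset mining in the Apriori style may make the drawbacks listed above concrete: each level scans the whole database once, and the candidate sets are the intermediate sets that can grow large. The function name, the simplified pairwise-union join, and the toy transactions are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # One full scan of the database per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: single items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent(candidates)
        result.update(level)
        k += 1
    return result


# Hypothetical market basket transactions (Boolean: item present or absent).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent_itemsets = apriori(transactions, min_support=0.5)
```

The prune step relies on the property that every subset of a frequent itemset is itself frequent, which is what lets the search discard most candidates without counting them.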
29. GA Advantages in Data Mining
A DM problem needs robustness of solutions and
scalability.
GA Advantages:
high ability to find patterns in very large spaces
parallel implementation
it performs a kind of global search rather than local
hill-climbing
the patterns produced are directly understandable
30. Search Challenges
Scalability is an important research
challenge too.
MULTI-OBJECTIVE RULE EXTRACTION
MOEA Issues