18. Architecture of a Typical Data Mining System
Components, from the user-facing layer down to the data:
Graphical user interface
Pattern evaluation
Data mining engine
Database or data warehouse server
Data cleaning, data integration, and filtering
Data warehouse and databases
Knowledge base
Lecture-2 What is Data Mining?
19. Major sources of abundant data
Business: Web, e-commerce, transactions, stocks, …
Science: Remote sensing, bioinformatics, scientific simulation, …
Society and everyone: news, digital cameras, YouTube
21. Data Mining and Business Intelligence
Pyramid layers, from top to bottom (potential to support business decisions increases toward the top):
Making Decisions
Data Presentation: Visualization Techniques
Data Mining: Information Discovery
Data Exploration: Statistical Analysis, Querying and Reporting
Data Warehouses / Data Marts: OLAP, MDA
Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA
24. Example 1.1 A relational database for AllElectronics. The AllElectronics company is described by the following relation tables: customer, item, employee, and branch.
Fragments of the tables
.
The relation customer consists of a set of attributes, including a unique customer identity number (cust_ID), customer name, address, age, occupation, annual income, credit information, category, and so on.
Similarly, each of the relations item, employee, and branch consists of a set of attributes describing its properties.
28. When data mining is applied to relational databases, we can go further by searching for trends or data patterns. For example, data mining systems can analyze customer data to predict the credit risk of new customers based on their income, age, and previous credit information.
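As a rough illustration of the credit-risk example, the sketch below classifies a new customer by majority vote among the three most similar past customers (a k-nearest-neighbour rule). The slide does not name a technique, so the choice of method, the records in TRAINING, the feature encoding, and the function names are all assumptions made for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical past customers (not real data):
# (annual income in $1000s, age, previous credit OK? 1 = yes, 0 = no) -> known risk
TRAINING = [
    ((25, 22, 0), "high"),
    ((30, 45, 0), "high"),
    ((28, 30, 1), "high"),
    ((60, 35, 1), "low"),
    ((85, 50, 1), "low"),
    ((70, 41, 0), "low"),
]


def _scaler(rows):
    """Build a function that min-max scales each feature to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1 for col in columns]
    return lambda row: tuple((v - lo) / s for v, lo, s in zip(row, lows, spans))


def predict_risk(customer, training=TRAINING, k=3):
    """Predict credit risk by majority vote among the k nearest past customers."""
    scale = _scaler([features for features, _ in training])
    target = scale(customer)
    nearest = sorted(training, key=lambda rec: dist(scale(rec[0]), target))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Calling predict_risk((27, 24, 0)) asks for the risk of a young, low-income customer with no good credit history. Scaling matters here because income and age are on different numeric ranges than the 0/1 credit flag.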
38. Concept/Class Description: Characterization and Discrimination
These descriptions can be derived via
(1) data characterization, by summarizing the data of the class under study (often called the target class), or
(2) data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes), or
(3) both data characterization and discrimination.
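The two derivations can be sketched in a few lines. In this minimal example, characterization summarizes the target class by averaging its numeric attributes, and discrimination reports how those averages differ from a contrasting class. The records, the class labels "big_spender" and "budget", and the use of a plain mean as the summary are assumptions for illustration only.

```python
from statistics import mean

# Hypothetical customer records; values are made up for illustration.
CUSTOMERS = [
    {"category": "big_spender", "age": 45, "income": 90},
    {"category": "big_spender", "age": 50, "income": 110},
    {"category": "budget", "age": 25, "income": 30},
    {"category": "budget", "age": 35, "income": 40},
]


def characterize(records, target, attrs=("age", "income")):
    """Data characterization: summarize the target class by attribute means."""
    rows = [r for r in records if r["category"] == target]
    return {a: mean(r[a] for r in rows) for a in attrs}


def discriminate(records, target, contrasting, attrs=("age", "income")):
    """Data discrimination: difference between target and contrasting summaries."""
    t = characterize(records, target, attrs)
    c = characterize(records, contrasting, attrs)
    return {a: t[a] - c[a] for a in attrs}
```

A real system would typically produce richer summaries (generalized relations, rules, charts); the mean is used here only to keep the contrast between the two operations visible.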
45. Data Mining: Confluence of Multiple Disciplines
Data Mining sits at the center, drawing on:
Database Technology
Statistics
Information Science
Machine Learning
Visualization
Other Disciplines
61. OLTP vs. OLAP
                    OLTP                          OLAP
users               clerk, IT professional        knowledge worker
function            day to day operations         decision support
DB design           application-oriented          subject-oriented
data                current, up-to-date;          historical;
                    detailed, flat relational;    summarized, multidimensional;
                    isolated                      integrated, consolidated
usage               repetitive                    ad-hoc
access              read/write;                   lots of scans
                    index/hash on primary key
unit of work        short, simple transaction     complex query
# records accessed  tens                          millions
# users             thousands                     hundreds
DB size             100MB-GB                      100GB-TB
metric              transaction throughput        query throughput, response
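The difference in access pattern and unit of work can be shown with Python's built-in sqlite3 module. This is only a sketch: the sales table, its columns, and its rows are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product TEXT, country TEXT, amount REAL)"
)
# Hypothetical rows, made up for illustration.
conn.executemany(
    "INSERT INTO sales (sale_id, product, country, amount) VALUES (?, ?, ?, ?)",
    [
        (1, "TV", "Canada", 400.0),
        (2, "PC", "Canada", 900.0),
        (3, "TV", "Mexico", 350.0),
        (4, "TV", "Canada", 450.0),
    ],
)


def oltp_update_amount(sale_id, new_amount):
    """OLTP-style unit of work: a short read/write transaction located by primary key."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE sales SET amount = ? WHERE sale_id = ?", (new_amount, sale_id))
    return conn.execute("SELECT amount FROM sales WHERE sale_id = ?", (sale_id,)).fetchone()[0]


def olap_sales_by_product():
    """OLAP-style query: a read-only scan that summarizes many records."""
    rows = conn.execute("SELECT product, SUM(amount) FROM sales GROUP BY product ORDER BY product")
    return dict(rows.fetchall())
```

The update touches one record through the primary-key index, while the summary has to read every row of the table, which is the "tens" versus "millions" contrast in the table above at toy scale.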
78. A Concept Hierarchy: Dimension (location)
Levels, from most general to most specific: all, region, country, city, office
all: all
region: Europe, ..., North_America
country: Germany, ..., Spain, Canada, ..., Mexico
city: Frankfurt, ..., Vancouver, ..., Toronto
office: L. Chan, ..., M. Wind
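A concept hierarchy like this one can be stored as child-to-parent links, which makes rolling a measure up to a coarser level a short loop. The sketch below is an assumption-laden illustration: only the city, country, and region links that follow from ordinary geography are filled in (the office level is left out because the slide text does not say which city each office belongs to), and the figures passed to roll_up are hypothetical.

```python
# Child -> parent links for part of the location dimension.
PARENT = {
    "Frankfurt": "Germany",
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Germany": "Europe",
    "Spain": "Europe",
    "Canada": "North_America",
    "Mexico": "North_America",
    "Europe": "all",
    "North_America": "all",
}


def path_to_all(member):
    """Return the chain of ancestors from a member up to the root 'all'."""
    path = [member]
    while path[-1] != "all":
        path.append(PARENT[path[-1]])
    return path


def roll_up(facts, levels_up):
    """Aggregate {member: value} facts to the ancestor `levels_up` steps higher."""
    totals = {}
    for member, value in facts.items():
        ancestor = member
        for _ in range(levels_up):
            ancestor = PARENT[ancestor]
        totals[ancestor] = totals.get(ancestor, 0) + value
    return totals
```

For example, roll_up({"Frankfurt": 10, "Vancouver": 5, "Toronto": 7}, 1) moves city-level values to the country level, and passing 2 moves them to the region level.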
79. Multidimensional Data
Sales volume as a function of product, month, and region
Cube axes: Region, Product, Month
Dimensions: Product, Location, Time
Hierarchical summarization paths:
Product: Industry, Category, Product
Location: Region, Country, City, Office
Time: Year, Quarter, Month or Week, Day
80. A Sample Data Cube
Figure: a data cube with three dimensions.
Date: 1Qtr, 2Qtr, 3Qtr, 4Qtr, sum
Product: TV, PC, VCR, sum
Country: U.S.A, Canada, Mexico, sum
Highlighted cell: total annual sales of TV in U.S.A.
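The "sum" members on each axis are aggregates over that dimension. The sketch below builds them from a handful of base cells; the sales figures are hypothetical, only a few cells are filled in for brevity, and storing the cube as a plain dictionary is an assumption made to keep the example short.

```python
from itertools import product as cartesian

# Hypothetical base cells keyed by (date, product, country); values are made up.
BASE_CELLS = {
    ("1Qtr", "TV", "U.S.A"): 100,
    ("2Qtr", "TV", "U.S.A"): 120,
    ("3Qtr", "TV", "U.S.A"): 90,
    ("4Qtr", "TV", "U.S.A"): 140,
    ("1Qtr", "PC", "Canada"): 60,
    ("1Qtr", "TV", "Mexico"): 30,
}


def build_cube(base_cells):
    """Add a 'sum' member on every dimension by aggregating the base cells."""
    cube = {}
    for key, value in base_cells.items():
        # Each base cell contributes to itself and to every cell obtained by
        # replacing some of its coordinates with 'sum' (2**3 cells in total).
        for mask in cartesian((False, True), repeat=len(key)):
            cell = tuple("sum" if m else k for k, m in zip(key, mask))
            cube[cell] = cube.get(cell, 0) + value
    return cube
```

With cube = build_cube(BASE_CELLS), the cell cube[("sum", "TV", "U.S.A")] plays the role of the highlighted "total annual sales of TV in U.S.A." cell, and cube[("sum", "sum", "sum")] is the grand total.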
81. Cuboids Corresponding to the Cube
0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product, date; product, country; date, country
3-D (base) cuboid: product, date, country
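The lattice above is every subset of the three dimensions, grouped by size, so n dimensions give 2^n cuboids when no concept hierarchies are considered. A minimal sketch follows; the function name cuboids is an assumption for illustration.

```python
from itertools import combinations


def cuboids(dimensions):
    """Group every subset of the dimensions by size: 0-D (apex) up to n-D (base)."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }
```

Calling cuboids(("product", "date", "country")) returns one empty tuple for the apex cuboid, three 1-D cuboids, three 2-D cuboids, and the single 3-D base cuboid, eight in all.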
90. Multi-Tiered Architecture
Tiers, from data sources to front end:
Data Sources: Operational DBs, other sources
Extract, Transform, Load, Refresh (with Monitor & Integrator and Metadata)
Data Storage: Data Warehouse, Data Marts
Serve
OLAP Server: OLAP Engine
Front-End Tools: Analysis, Query, Reports, Data mining
94. Data Warehouse Development: A Recommended Approach
Stages, from starting point to final result:
Define a high-level corporate data model
Model refinement: Data Marts
Model refinement: Enterprise Data Warehouse
Distributed Data Marts
Multi-Tier Data Warehouse
110. An OLAM Architecture
Layers, from top to bottom:
Layer4 User Interface: User GUI API (mining query in, mining result out)
Layer3 OLAP/OLAM: OLAM Engine and OLAP Engine, over the Data Cube API
Layer2 MDDB: MDDB and Meta Data, over the Database API (Filtering & Integration)
Layer1 Data Repository: Databases and Data Warehouse (Data cleaning, Data integration, Filtering)