SlideShare uma empresa Scribd logo
1 de 64
Data Mining and Data
Warehousing
Introduction
Course Title: Data Mining and Data Warehousing (Elective)
• Course code: IT 308
• Credits: 3
• Lecture Hours: 48
• Course Objective
– The objective of the course is to make learner understand foundation
principles and techniques of data mining and data warehousing.
– Students will be able to select and use various data mining language and
tools very useful for adding business value of an organization.
• Course Description
– Introduction, Data Preprocessing- Data Integration and Transformation,
Classification, Association Analysis, Cluster Analysis, Information Privacy
and Data Mining, Advanced Applications, Search engines, Data
Warehouses, Capacity Planning.
2Compiled By: Kamal Acharya7/2/2019
Course Details
• Unit 1: Introduction LH 2
– 1.1. Data Mining Origin
– 1.2. Data Mining & Data Warehousing basics
• Unit 2: Data Preprocessing LH 6
– 2.1. Data Types and Attributes
– 2.2. Data Pre-processing
– 2.3. OLAP
– 2.4 Characteristics of OLAP Systems
– 2.5 Multidimensional View and Data cube
– 2.6 Data Cube Implementation
– 2.7 Data Cube Operations (Roll-up, Roll Down, slice and dice and pivot)
– 2.8 Guidelines for OLAP Implementation
3Compiled By: Kamal Acharya7/2/2019
Contd..
• Unit 3: Classification LH 7
– 3.1. Basics and Algorithms
– 3.2. Decision Tree Classifier
– 3.3. Rule Based Classifier
– 3.4. Nearest Neighbor Classifier
– 3.5. Bayesian Classifier
– 3.6. Artificial Neural Network Classifier
– 3.7. Issues : Over-fitting, Validation, Model Comparison
• Unit 4: Association Analysis LH 7
– 4.1. Basics and Algorithms
– 4.2. Frequent Item-set Pattern & Apriori Principle
– 4.3. FP-Growth, FP-Tree
– 4.4. Handling Categorical Attributes
4Compiled By: Kamal Acharya7/2/2019
Contd..
• Unit 5: Cluster Analysis LH 7
– 5.1. Basics and Algorithms
– 5.2. K-means Clustering
– 5.3. Hierarchical Clustering
– 5.4. Density-based spatial clustering of applications with noise (DBSCAN)
Clustering
• Unit 6: Information Privacy and Data Mining LH 3
– 6.1 Basic principles to Protect Information Privacy
– 6.2 Uses and Misuses of Data Mining
– 6.3 Primary Aims of data Mining
– 6.4 Pitfalls of Data Mining
5Compiled By: Kamal Acharya7/2/2019
Contd..
• Unit 7: Advanced Applications LH 3
– 7.1. Web-mining: Web content mining, web usage mining
– 7.2. Time-series data mining
• Unit 8: Search Engines LH 3
– 8.1 Characteristics of search engine
– 8.2 Search Engine functionality
– 8.3 Ranking of Web pages
6Compiled By: Kamal Acharya7/2/2019
Contd..
• Unit 9: Data Warehousing LH 7
– 9.1 Operational Data sources
– 9.2 ETL (Extract, Transform, Load)
– 9.3 Data Warehouse Processes, Managers and their functions
– 9.4 Data Warehouses and Data Warehouses Design
– 9.5 Guidelines for Data Warehouse Implementation
• Unit 10 Capacity Planning LH 3
– 10.1 Calculating storage requirement, CPU requirements
7Compiled By: Kamal Acharya7/2/2019
Contd..
• Practical: Students should practice enough on real-world data intensive
problems
8Compiled By: Kamal Acharya7/2/2019
Contd..
• References: Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introductionto Data
Mining, 2005, Addison- Wesley.
– Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition,
2006, Morgan Kaufmann.
9Compiled By: Kamal Acharya7/2/2019
Contd..
– G.K. Gupta, Introduction to Data Mining with Case Studies, Prentice Hall of India
– IBM, An Introduction to Building the Data Warehouse, Prentice Hall of India
– IBM, Introduction to Business Intelligence and Data Warehousing, Prentice Hall of
India
– Adriaans Pieter, D. Zantige, "Data Mining", Pearson Education Asia Pub. Ltd, 2002
10Compiled By: Kamal Acharya7/2/2019
Unit 1 : Introduction to Data Mining and Data
Warehousing
What is Data?
– A representation of facts, concepts, or instructions in a formal manner
suitable for communication, interpretation, or processing by human
beings or by computers.
11Compiled By: Kamal Acharya7/2/2019
Origin of Data mining
• The steady and amazing progress of computer hardware
technology in the past three decades has led to large supplies
of powerful and affordable computers, data collection
equipment, and storage media.
• This technology provides a great boost to the database and
information industry, and makes a huge number of databases
and information re-positories available.
• This availability of huge data repositories creates a Data
explosion problem (data rich knowledge poor situation).
12Compiled By: Kamal Acharya7/2/2019
Contd..
• We are drowning in data, but starving for knowledge!
• So, Powerful and versatile tools are badly needed to automatically uncover
valuable information from tremendous amounts of data and to transform
such data into organized knowledge.
Necessity is the Mother of invention! -plato
• This necessity has led to the birth of data mining.
13Compiled By: Kamal Acharya7/2/2019
What is Data Mining?
Extraction of interesting (non-trivial, implicit, previously unknown and potentially
useful) patterns or knowledge from huge amount of data.
7/2/2019 14Compiled By: Kamal Acharya
Contd...
• Data mining: a misnomer?
• Scenario: Remember that the mining of gold from the rocks or sand is
referred to as gold mining rather than rock or sand mining.
– Thus, data mining should have been more appropriately named as “knowledge
mining” which emphasis on mining knowledge from large amounts of data.
– But, which is unfortunately somewhat long so, named “data mining”
• The overall goal of the data mining process is to extract pattern
from a data set and transform it into an understandable
structure for further use.
15Compiled By: Kamal Acharya7/2/2019
Contd..
• Alternative names for data mining:
– Knowledge discovery(mining) in databases (KDD)
– knowledge extraction
– data/pattern analysis
– data archeology
– data dredging
– information harvesting
– business intelligence, etc.
16Compiled By: Kamal Acharya7/2/2019
Contd..
• The key properties of data mining are:
– Automatic discovery of patterns
• E.g., Market basket analysis.
– Prediction of likely outcomes
• E.g., weather forecasting
– Creation of actionable information
• E.g., Police investigation
– Focus on large datasets and databases
17Compiled By: Kamal Acharya7/2/2019
Data mining is not
•Brute-force crunching of bulk data.
•“Blind” application of algorithms.
•Going to find non-existential relationships.
•Presenting data in different ways
•Queries to the database are not DM.
•A magic that will turn your data into gold.
7/2/2019 18Compiled By: Kamal Acharya
Contd..
• Data Mining: Confluence of Multiple Disciplines :
19Compiled By: Kamal Acharya
Data Mining
Database
Technology Statistics
Machine
Learning
Pattern
Recognition
Algorithm
Other
Disciplines
Visualization
7/2/2019
Why Data Mining?—Potential Applications
• Data analysis and decision support
– Market analysis and management
• Market basket analysis, sale techniques, customer feedback on
items (Opinion Mining)
– Risk analysis and management
• Forecasting, decision support system
– Fraud detection and detection of unusual patterns (outliers)
207/2/2019 Compiled By: Kamal Acharya
Why Data Mining?—Potential Applications
• Other Applications
– Text mining (news group, email, documents) and Web
mining
– Stream data mining (mining from continuous / rapid data
• Eg Telephone communication pattern, Web Searching, Sensor data
– Bioinformatics and bio-data analysis
217/2/2019 Compiled By: Kamal Acharya
Data Mining: On What Kinds of Data?
• As a general technology, data mining can be applied to any kind of
data as long as the data are meaningful for a target application.
– Database-oriented data sets and applications
• Relational database, data warehouse, transactional database
227/2/2019 Compiled By: Kamal Acharya
Contd..
• Advanced data sets and advanced applications
– Data streams and sensor data
– Time-series data
– graphs, social networks and multi-linked data
– Heterogeneous databases and legacy databases
– Spatial data and spatiotemporal data
– Multimedia database
– Text databases
– The World-Wide Web
237/2/2019 Compiled By: Kamal Acharya
Knowledge Discovery (KDD) Process..
• Simply stated, data mining refers to extracting or “mining”
knowledge from large amounts of data stored in databases,
data warehouses, or other information repositories.
• Many people treat data mining as a synonym for another
popularly used term, Knowledge Discovery from Data, or
KDD.
• Alternatively, others view data mining as simply an essential
step in the process of knowledge discovery.
• Knowledge discovery consists of an iterative sequence of the
following steps:
24Compiled By: Kamal Acharya7/2/2019
Contd..
Figure: Knowledge Discovery Process (Stages of KDD)
25Compiled By: Kamal Acharya7/2/2019
Contd..
• Data cleaning :
– It removes noise and inconsistent data
• Data integration:
– This combines data from multiple data sources
• Data selection:
– Data relevant to the analysis task are retrieved from the database
• Data transformation:
– Data are transformed or consolidated into forms appropriate for mining
by performing summary or aggregation operations.
26Compiled By: Kamal Acharya7/2/2019
Contd..
• Data mining:
– an essential process where intelligent methods are applied in order to
extract data patterns
• Pattern evaluation:
– Identifies the truly interesting patterns representing knowledge based
on some interestingness measures.
• Knowledge presentation:
– Knowledge representation techniques are used to present the mined
knowledge to the user.
27Compiled By: Kamal Acharya7/2/2019
Contd..
• According to this view, data mining is only one step in the knowledge
discovery process.
• However, in industry, in media, and in the database research milieu, the
term data mining is becoming more popular than the longer term of
knowledge discovery from data.
• Therefore, we choose to use the term data mining.
• Based on this view, the architecture of a typical data mining system is
described in the following slides
28Compiled By: Kamal Acharya7/2/2019
Architecture of Data Mining System
• A typical data mining system may have the following major components.
Figure: Architecture of Data Mining System
29Compiled By: Kamal Acharya7/2/2019
Contd..
• Database, Data Warehouse, World Wide Web, or Other Information
Repository:
– This is one or a set of databases, data warehouses, spreadsheets, or
other kinds of information repositories.
– Data cleaning and data integration techniques may be performed on the
data.
30Compiled By: Kamal Acharya7/2/2019
Contd..
• Database or Data Warehouse Server:
– The database or data warehouse server is responsible for
fetching the relevant data, based on the user’s data mining
request.
31Compiled By: Kamal Acharya7/2/2019
Contd..
• Knowledge Base:
– This is the domain knowledge that is used to guide the search or
evaluate the interestingness of resulting patterns. It is simply stored in
the form of set of rules.
– Such knowledge can include concept hierarchies, used to organize
attributes or attribute values into different levels of abstraction.
32Compiled By: Kamal Acharya7/2/2019
Contd..
• Data Mining Engine:
– This is essential to the data mining system and ideally consists of a set
of functional modules for tasks such as association and correlation
analysis, classification, prediction, cluster analysis, outlier analysis and
etc.
33Compiled By: Kamal Acharya7/2/2019
Contd..
• Pattern Evaluation Module:
– This component typically employs interestingness measures and
interacts with the data mining modules so as to focus the search toward
interesting patterns.
– It may use interestingness thresholds to filter out discovered patterns.
34Compiled By: Kamal Acharya7/2/2019
Contd..
• User interface:
– This module communicates between users and the data
mining system, allowing the user to interact with the
system by specifying a data mining query or task.
– In addition, this component allows the user to browse
database and data warehouse schemas or data structures,
evaluate mined patterns, and visualize the patterns in
different forms.
35Compiled By: Kamal Acharya7/2/2019
What is a Data Warehouse?
• A warehouse in general terms is a historic repository of information
collected from multiple sources, stored under a unified schema, and that
usually resides at a single site.
• Data Warehouses are constructed via a process of
– DATA CLEANING,
– DATA INTEGRATION,
– DATA TRANSFORMATION,
– DATA LOADING, and
– PERIODIC DATA REFRESHING.
• A data warehouse stores historical data of an organization so that they can
analyze their performance over the past time (days, weeks, months or
years) and plan for the future.
36Compiled By: Kamal Acharya7/2/2019
Contd……
• The popular definition of the data warehouse is given by WH
Inmon:
• “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision-
making process.”
• Data warehousing:
– The process of constructing and using data warehouses.
– Is the process of extracting & transferring operational data
into informational data & loading it into a central data store
(warehouse)
377/2/2019 Compiled By: Kamal Acharya
Data Warehouse—Integrated
• Constructed by integrating multiple,
heterogeneous data sources
– relational databases, flat files, on-line
transaction records
• Data cleaning and data integration
techniques are applied.
– Ensure consistency in naming conventions,
encoding structures, attribute measures, etc.
among different data sources
• E.g., Hotel price: currency, tax, etc.
Sales
system
Payroll
system
Purchasing
system
Customer
data
387/2/2019 Compiled By: Kamal Acharya
Data Warehouse—Subject-Oriented
• Organized around major subjects, such as
customer, product, sales.
• Focusing on the modeling and analysis of
data for decision makers, not on daily
operations or transaction processing.
• Provide a simple and concise view around
particular subject issues by excluding data
that are not useful in the decision support
process.
Sales
system
Payroll
system
Purchasing
system
Customer
data
Vendor
data
Employee
data
Operational data DW
397/2/2019 Compiled By: Kamal Acharya
Data Warehouse—Time Variant
• The time horizon for the data warehouse is significantly
longer than that of operational systems.
– Operational database: current value data.
– Data warehouse data: provide information from a historical
perspective (e.g., past 5-10 years)
407/2/2019 Compiled By: Kamal Acharya
Data Warehouse—Non-Volatile
• A physically separate store of data
transformed from the operational
environment.
• Operational update of data does not
occur in the data warehouse
environment.
– Does not require transaction processing,
recovery, and concurrency control
mechanisms
– Requires only two operations in data
accessing:
• initial loading of data and access of
data.
Sales
system
create
update
insert
delete Customer
data
load
access
DBMS DW
417/2/2019 Compiled By: Kamal Acharya
Data Warehouse Usage
• Three kinds of data warehouse applications
– Information processing
• supports querying, basic statistical analysis, and reporting
using tables, charts and graphs.
427/2/2019 Compiled By: Kamal Acharya
Contd..
• Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations(drill-down, roll-up, slice-dice, drilling,
pivoting, which allows the user to view the data at differing degree of
summarization.
• Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models, performing
classification and prediction, and presenting the mining results using
visualization tools.
437/2/2019 Compiled By: Kamal Acharya
General Architecture
Data
Warehouse
Query
and
Data Analysis
Component
External
Sources
Data
Integration
Component
OLAP
Server
OLAP
queries/
reports
data
mining
Metadata
Monitoring
Administration
Internal
Sources
Data
acquisition
Data
extraction
Construction &
maintenance 447/2/2019 Compiled By: Kamal Acharya
45
3 main phases
• Data acquisition:
– relevant data collection
– Recovering: transformation into the data warehouse model from
existing models
– Loading: cleaning and loading in the Data Warehouse.
• Storage
• Data extraction
– Tool examples: Query/report, SQL, multidimensional analysis (OLAP
tools), data mining
• Maintenance(Optional)
7/2/2019 Compiled By: Kamal Acharya
THE USE OF A DATA WAREHOUSE
INVENTORY
DATABASE
PERSONNEL
DATABASE
NEWCASTLE
SALES DB
LONDON
SALES DB
GLASGOW
SALES DB
STEP 2: Question the Data Warehouse
DECISIONS
and ACTIONS!
STEP 3: Do something
with what you learn from
the Data Warehouse
STEP 1: Load the Data Warehouse
DATA
WAREHOUSE
467/2/2019 Compiled By: Kamal Acharya
Benefits of Data Warehousing
• Queries do not impact Operational systems
• Provides quick response to queries for reporting
• Enables Subject Area Orientation
• Integrates data from multiple, diverse sources
• Enables multiple interpretations of same data by different users or groups
• Provides thorough analysis of data over a period of time
• Accuracy of Operational systems can be checked
• Provides analysis capabilities to decision makers
7/2/2019 47Compiled By: Kamal Acharya
• Increase customer profitability
• Cost effective decision making
• Manage customer and business partner relationships
• Manage risk, assets and liabilities
• Integrate inventory, operations and manufacturing
• Reduction in time to locate, access, and analyze
information (Link multiple locations and geographies)
• Identify developing trends and reduce time to market
• Strategic advantage over competitors
7/2/2019 48Compiled By: Kamal Acharya
• Potential high returns on investment
• Competitive advantage
• Increased productivity of corporate decision-makers
• Provide reliable, High performance access
• Consistent view of Data: Same query, same data. All users
should be warned if data load has not come in.
• Quality of data is a driver for business re-engineering.
7/2/2019 49Compiled By: Kamal Acharya
Applications of Data Mining
• Data mining is an interdisciplinary field with wide and diverse
applications
– There exist nontrivial gaps between data mining principles and
domain-specific applications
• Some application domains
– Financial data analysis
– Retail industry
– Telecommunication industry
– Biological data analysis
7/2/2019 50Compiled By: Kamal Acharya
Data Mining for Financial Data Analysis
• Financial data collected in banks and financial institutions are often relatively
complete, reliable, and of high quality
• Design and construction of data warehouses for multidimensional data analysis and
data mining
– View the debt and revenue changes by month, by region, by sector, and by other factors
– Access statistical information such as max, min, total, average, trend, etc.
• Loan payment prediction/consumer credit policy analysis
– feature selection and attribute relevance ranking
– Loan payment performance
– Consumer credit rating
7/2/2019 51Compiled By: Kamal Acharya
• Classification and clustering of customers for targeted
marketing
– multidimensional segmentation by nearest-neighbor,
classification, decision trees, etc. to identify customer groups or
associate a new customer to an appropriate customer group
• Detection of money laundering and other financial crimes
– integration of from multiple DBs (e.g., bank transactions,
federal/state crime history DBs)
– Tools: data visualization, linkage analysis, classification,
clustering tools, outlier analysis, and sequential pattern analysis
tools (find unusual access sequences)
7/2/2019 52Compiled By: Kamal Acharya
Data Mining for Retail Industry
• Retail industry: huge amounts of data on sales, customer shopping history,
etc.
• Applications of retail data mining
– Identify customer buying behaviors
– Discover customer shopping patterns and trends
– Improve the quality of customer service
– Achieve better customer retention and satisfaction
– Enhance goods consumption ratios
– Design more effective goods transportation and distribution policies
7/2/2019 53Compiled By: Kamal Acharya
• Example 1. Design and construction of data warehouses based on the
benefits of data mining
– Multidimensional analysis of sales, customers, products, time, and region
• Example 2. Analysis of the effectiveness of sales campaigns
• Example 3. Customer retention: Analysis of customer loyalty
– Use customer loyalty card information to register sequences of purchases of
particular customers
– Use sequential pattern mining to investigate changes in customer
consumption or loyalty
– Suggest adjustments on the pricing and variety of goods
• Example 4. Purchase recommendation and cross-reference of items
7/2/2019 54Compiled By: Kamal Acharya
Data Mining for Telecommunication Industry
• A rapidly expanding and highly competitive industry and a great demand
for data mining
– Understand the business involved
– Identify telecommunication patterns
– Catch fraudulent activities
– Make better use of resources
– Improve the quality of service
• Multidimensional analysis of telecommunication data
– Intrinsically multidimensional: calling-time, duration, location of caller,
location of callee, type of call, etc.
7/2/2019 55Compiled By: Kamal Acharya
• Fraudulent pattern analysis and the identification of unusual patterns
– Identify potentially fraudulent users and their typical usage
patterns
– Detect attempts to gain fraudulent entry to customer accounts
– Discover unusual patterns which may need special attention
• Multidimensional association and sequential pattern analysis
– Find usage patterns for a set of communication services by
customer group, by month, etc.
– Promote the sales of specific services
– Improve the availability of particular services in a region
• Use of visualization tools in telecommunication data analysis
7/2/2019 56Compiled By: Kamal Acharya
Biomedical Data Analysis
• DNA sequences: 4 basic building blocks (nucleotides): adenine (A),
cytosine (C), guanine (G), and thymine (T).
• Gene: a sequence of hundreds of individual nucleotides arranged in a
particular order
• Humans have around 30,000 genes
• Tremendous number of ways that the nucleotides can be ordered and
sequenced to form distinct genes
• Semantic integration of heterogeneous, distributed genome databases
– Current: highly distributed, uncontrolled generation and use of a wide variety
of DNA data
– Data cleaning and data integration methods developed in data mining will help
7/2/2019 57Compiled By: Kamal Acharya
• Similarity search and comparison among DNA sequences
– Compare the frequently occurring patterns of each class (e.g., diseased and
healthy)
– Identify gene sequence patterns that play roles in various diseases
• Association analysis: identification of co-occurring gene sequences
– Most diseases are not triggered by a single gene but by a combination of
genes acting together
– Association analysis may help determine the kinds of genes that are likely
to co-occur together in target samples
• Path analysis: linking genes to different disease development stages
– Different genes may become active at different stages of the disease
– Develop pharmaceutical interventions that target the different stages
separately
• Visualization tools and genetic data analysis
7/2/2019 58Compiled By: Kamal Acharya
Problems in Data Warehousing
• Underestimation of resources for data loading
• Hidden problems with source systems
• Required data not captured
• Increased end-user demands
• Data homogenization
• High demand for resources
• Data ownership
• High maintenance
• Long duration projects
• Complexity of integration
7/2/2019 59Compiled By: Kamal Acharya
Major Challenges in Data Warehousing
• Data mining requires single, separate, clean, integrated, and self-consistent source of data.
– A DW is well equipped for providing data for mining.
• Data quality and consistency is essential to ensure the accuracy of the predictive models.
– DWs are populated with clean, consistent data
• Advantageous to mine data from multiple sources to discover as many interrelationships as
possible.
– DWs contain data from a number of sources.
• Selecting relevant subsets of records and fields for data mining
– requires query capabilities of the DW.
• Results of a data mining study are useful if can further investigate the uncovered patterns.
– DWs provide capability to go back to the data source.
7/2/2019 60Compiled By: Kamal Acharya
• The largest challenge a data miner may face is the sheer
volume of data in the data warehouse.
• It is quite important, then, that summary data also be
available to get the analysis started.
• A major problem is that this sheer volume may mask
the important relationships the data miner is interested
in.
• The ability to overcome the volume and be able to
interpret the data is quite important.
7/2/2019 61Compiled By: Kamal Acharya
 Efficiency and scalability of data mining algorithms
 Parallel, distributed, stream, and incremental mining methods
 Handling high-dimensionality
 Handling noise, uncertainty, and incompleteness of data
 Incorporation of constraints, expert knowledge, and background knowledge in data
mining
 Pattern evaluation and knowledge integration
 Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web,
software/system engineering, information networks
 Application-oriented and domain-specific data mining
 Invisible data mining (embedded in other functional modules)
 Protection of security, integrity, and privacy in data mining
Major Challenges in Data Mining
7/2/2019 62Compiled By: Kamal Acharya
Homework
• Briefly explain data mining and define it. Why data mining being used
more widely now?
• State and explain the major applications of data mining.
• Explain briefly some limitations of data mining.
• What is the future of data mining?
• Can data mining in some areas assist in identifying corruption? Select one
area and study the possibilities.
• How is a data warehouse different from a database? How are they similar.
• Explain what data warehousing and OLAP aim to achieve that can not be
achieved by OLTP systems.
7/2/2019 Compiled By: Kamal Acharya 63
Thank You !
64Compiled By: Kamal Acharya7/2/2019

Mais conteúdo relacionado

Mais procurados

Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and workAmr Abd El Latief
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process Shuvra Ghosh
 
Data Preprocessing || Data Mining
Data Preprocessing || Data MiningData Preprocessing || Data Mining
Data Preprocessing || Data MiningIffat Firozy
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEZalpa Rathod
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data miningHadi Fadlallah
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processingVijayasankariS
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modelingvivekjv
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
Fundamentals of Database ppt ch01
Fundamentals of Database ppt ch01Fundamentals of Database ppt ch01
Fundamentals of Database ppt ch01Jotham Gadot
 
Online Analytical Processing
Online Analytical ProcessingOnline Analytical Processing
Online Analytical Processingnayakslideshare
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouseAbhi Bhardwaj
 

Mais procurados (20)

Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Knowledge discovery process
Knowledge discovery process Knowledge discovery process
Knowledge discovery process
 
Data Preprocessing || Data Mining
Data Preprocessing || Data MiningData Preprocessing || Data Mining
Data Preprocessing || Data Mining
 
OLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSEOLAP & DATA WAREHOUSE
OLAP & DATA WAREHOUSE
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 
Data warehousing and online analytical processing
Data warehousing and online analytical processingData warehousing and online analytical processing
Data warehousing and online analytical processing
 
Data Warehouse Modeling
Data Warehouse ModelingData Warehouse Modeling
Data Warehouse Modeling
 
01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.01 Data Mining: Concepts and Techniques, 2nd ed.
01 Data Mining: Concepts and Techniques, 2nd ed.
 
Abstract data types
Abstract data typesAbstract data types
Abstract data types
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Fundamentals of Database ppt ch01
Fundamentals of Database ppt ch01Fundamentals of Database ppt ch01
Fundamentals of Database ppt ch01
 
Online Analytical Processing
Online Analytical ProcessingOnline Analytical Processing
Online Analytical Processing
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
3 Data Mining Tasks
3  Data Mining Tasks3  Data Mining Tasks
3 Data Mining Tasks
 
Project Presentation on Data WareHouse
Project Presentation on Data WareHouseProject Presentation on Data WareHouse
Project Presentation on Data WareHouse
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 

Semelhante a Introduction to Data Mining and Data Warehousing

BIM Data Mining Unit1 by Tekendra Nath Yogi
 BIM Data Mining Unit1 by Tekendra Nath Yogi BIM Data Mining Unit1 by Tekendra Nath Yogi
BIM Data Mining Unit1 by Tekendra Nath YogiTekendra Nath Yogi
 
From data mining to knowledge discovery in
From data mining to knowledge discovery inFrom data mining to knowledge discovery in
From data mining to knowledge discovery inRaj Kumar Ranabhat
 
Data Mining - Presentation.pptx
Data Mining - Presentation.pptxData Mining - Presentation.pptx
Data Mining - Presentation.pptxfahadusman23
 
Data mining and business intelligence
Data mining and business intelligenceData mining and business intelligence
Data mining and business intelligencechirag patil
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16AnwarrChaudary
 
Datamininglecture
DatamininglectureDatamininglecture
DatamininglectureManish Rana
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesSaif Ullah
 
Data mining concepts
Data mining conceptsData mining concepts
Data mining conceptsBasit Rafiq
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationDenodo
 

Semelhante a Introduction to Data Mining and Data Warehousing (20)

BIM Data Mining Unit1 by Tekendra Nath Yogi
 BIM Data Mining Unit1 by Tekendra Nath Yogi BIM Data Mining Unit1 by Tekendra Nath Yogi
BIM Data Mining Unit1 by Tekendra Nath Yogi
 
From data mining to knowledge discovery in
From data mining to knowledge discovery inFrom data mining to knowledge discovery in
From data mining to knowledge discovery in
 
Web mining
Web mining Web mining
Web mining
 
Bigdata-Intro.pptx
Bigdata-Intro.pptxBigdata-Intro.pptx
Bigdata-Intro.pptx
 
Digital curation centre
Digital curation centreDigital curation centre
Digital curation centre
 
Data mining
Data miningData mining
Data mining
 
Unit 1
Unit 1Unit 1
Unit 1
 
Dma unit 1
Dma unit   1Dma unit   1
Dma unit 1
 
Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Data Mining - Presentation.pptx
Data Mining - Presentation.pptxData Mining - Presentation.pptx
Data Mining - Presentation.pptx
 
dwdm unit 1.ppt
dwdm unit 1.pptdwdm unit 1.ppt
dwdm unit 1.ppt
 
Data mining and business intelligence
Data mining and business intelligenceData mining and business intelligence
Data mining and business intelligence
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16
 
Datamininglecture
DatamininglectureDatamininglecture
Datamininglecture
 
Data mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniquesData mining (lecture 1 & 2) conecpts and techniques
Data mining (lecture 1 & 2) conecpts and techniques
 
Seminar Report Vaibhav
Seminar Report VaibhavSeminar Report Vaibhav
Seminar Report Vaibhav
 
Data mining concepts
Data mining conceptsData mining concepts
Data mining concepts
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 

Mais de Kamal Acharya

Programming the basic computer
Programming the basic computerProgramming the basic computer
Programming the basic computerKamal Acharya
 
Introduction to Computer Security
Introduction to Computer SecurityIntroduction to Computer Security
Introduction to Computer SecurityKamal Acharya
 
Making decision and repeating in PHP
Making decision and repeating  in PHPMaking decision and repeating  in PHP
Making decision and repeating in PHPKamal Acharya
 
Working with arrays in php
Working with arrays in phpWorking with arrays in php
Working with arrays in phpKamal Acharya
 
Text and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHPText and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHPKamal Acharya
 
Capacity Planning of Data Warehousing
Capacity Planning of Data WarehousingCapacity Planning of Data Warehousing
Capacity Planning of Data WarehousingKamal Acharya
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data MiningKamal Acharya
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data MiningKamal Acharya
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data miningKamal Acharya
 

Mais de Kamal Acharya (20)

Programming the basic computer
Programming the basic computerProgramming the basic computer
Programming the basic computer
 
Computer Arithmetic
Computer ArithmeticComputer Arithmetic
Computer Arithmetic
 
Introduction to Computer Security
Introduction to Computer SecurityIntroduction to Computer Security
Introduction to Computer Security
 
Session and Cookies
Session and CookiesSession and Cookies
Session and Cookies
 
Functions in php
Functions in phpFunctions in php
Functions in php
 
Web forms in php
Web forms in phpWeb forms in php
Web forms in php
 
Making decision and repeating in PHP
Making decision and repeating  in PHPMaking decision and repeating  in PHP
Making decision and repeating in PHP
 
Working with arrays in php
Working with arrays in phpWorking with arrays in php
Working with arrays in php
 
Text and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHPText and Numbers (Data Types)in PHP
Text and Numbers (Data Types)in PHP
 
Introduction to PHP
Introduction to PHPIntroduction to PHP
Introduction to PHP
 
Capacity Planning of Data Warehousing
Capacity Planning of Data WarehousingCapacity Planning of Data Warehousing
Capacity Planning of Data Warehousing
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
Web Mining
Web MiningWeb Mining
Web Mining
 
Information Privacy and Data Mining
Information Privacy and Data MiningInformation Privacy and Data Mining
Information Privacy and Data Mining
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Association Analysis in Data Mining
Association Analysis in Data MiningAssociation Analysis in Data Mining
Association Analysis in Data Mining
 
Classification techniques in data mining
Classification techniques in data miningClassification techniques in data mining
Classification techniques in data mining
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Functions in Python
Functions in PythonFunctions in Python
Functions in Python
 
Python Flow Control
Python Flow ControlPython Flow Control
Python Flow Control
 

Último

TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxannathomasp01
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Pooja Bhuva
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxPooja Bhuva
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxmarlenawright1
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 

Último (20)

TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 

Introduction to Data Mining and Data Warehousing

  • 1. Data Mining and Data Warehousing Introduction
  • 2. Course Title: Data Mining and Data Warehousing (Elective) • Course code: IT 308 • Credits: 3 • Lecture Hours: 48 • Course Objective – The objective of the course is to make learner understand foundation principles and techniques of data mining and data warehousing. – Students will be able to select and use various data mining language and tools very useful for adding business value of an organization. • Course Description – Introduction, Data Preprocessing- Data Integration and Transformation, Classification, Association Analysis, Cluster Analysis, Information Privacy and Data Mining, Advanced Applications, Search engines, Data Warehouses, Capacity Planning. 2Compiled By: Kamal Acharya7/2/2019
  • 3. Course Details • Unit 1: Introduction LH 2 – 1.1. Data Mining Origin – 1.2. Data Mining & Data Warehousing basics • Unit 2: Data Preprocessing LH 6 – 2.1. Data Types and Attributes – 2.2. Data Pre-processing – 2.3. OLAP – 2.4 Characteristics of OLAP Systems – 2.5 Multidimensional View and Data cube – 2.6 Data Cube Implementation – 2.7 Data Cube Operations (Roll-up, Roll Down, slice and dice and pivot) – 2.8 Guidelines for OLAP Implementation 3Compiled By: Kamal Acharya7/2/2019
  • 4. Contd.. • Unit 3: Classification LH 7 – 3.1. Basics and Algorithms – 3.2. Decision Tree Classifier – 3.3. Rule Based Classifier – 3.4. Nearest Neighbor Classifier – 3.5. Bayesian Classifier – 3.6. Artificial Neural Network Classifier – 3.7. Issues : Over-fitting, Validation, Model Comparison • Unit 4: Association Analysis LH 7 – 4.1. Basics and Algorithms – 4.2. Frequent Item-set Pattern & Apriori Principle – 4.3. FP-Growth, FP-Tree – 4.4. Handling Categorical Attributes 4Compiled By: Kamal Acharya7/2/2019
  • 5. Contd.. • Unit 5: Cluster Analysis LH 7 – 5.1. Basics and Algorithms – 5.2. K-means Clustering – 5.3. Hierarchical Clustering – 5.4. Density-based spatial clustering of applications with noise (DBSCAN) Clustering • Unit 6: Information Privacy and Data Mining LH 3 – 6.1 Basic principles to Protect Information Privacy – 6.2 Uses and Misuses of Data Mining – 6.3 Primary Aims of data Mining – 6.4 Pitfalls of Data Mining 5Compiled By: Kamal Acharya7/2/2019
  • 6. Contd.. • Unit 7: Advanced Applications LH 3 – 7.1. Web-mining: Web content mining, web usage mining – 7.2. Time-series data mining • Unit 8: Search Engines LH 3 – 8.1 Characteristics of search engine – 8.2 Search Engine functionality – 8.3 Ranking of Web pages 6Compiled By: Kamal Acharya7/2/2019
  • 7. Contd.. • Unit 9: Data Warehousing LH 7 – 9.1 Operational Data sources – 9.2 ETL (Extract, Transform, Load) – 9.3 Data Warehouse Processes, Managers and their functions – 9.4 Data Warehouses and Data Warehouses Design – 9.5 Guidelines for Data Warehouse Implementation • Unit 10 Capacity Planning LH 3 – 10.1 Calculating storage requirement, CPU requirements 7Compiled By: Kamal Acharya7/2/2019
  • 8. Contd.. • Practical: Students should practice enough on real-world data intensive problems 8Compiled By: Kamal Acharya7/2/2019
  • 9. Contd.. • References: Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Introductionto Data Mining, 2005, Addison- Wesley. – Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition, 2006, Morgan Kaufmann. 9Compiled By: Kamal Acharya7/2/2019
  • 10. Contd.. – G.K. Gupta, Introduction to Data Mining with Case Studies, Prentice Hall of India – IBM, An Introduction to Building the Data Warehouse, Prentice Hall of India – IBM, Introduction to Business Intelligence and Data Warehousing, Prentice Hall of India – Adriaans Pieter, D. Zantige, "Data Mining", Pearson Education Asia Pub. Ltd, 2002 10Compiled By: Kamal Acharya7/2/2019
  • 11. Unit 1 : Introduction to Data Mining and Data Warehousing What is Data? – A representation of facts, concepts, or instructions in a formal manner suitable for communication, interpretation, or processing by human beings or by computers. 11Compiled By: Kamal Acharya7/2/2019
  • 12. Origin of Data mining • The steady and amazing progress of computer hardware technology in the past three decades has led to large supplies of powerful and affordable computers, data collection equipment, and storage media. • This technology provides a great boost to the database and information industry, and makes a huge number of databases and information re-positories available. • This availability of huge data repositories creates a Data explosion problem (data rich knowledge poor situation). 12Compiled By: Kamal Acharya7/2/2019
  • 13. Contd.. • We are drowning in data, but starving for knowledge! • So, Powerful and versatile tools are badly needed to automatically uncover valuable information from tremendous amounts of data and to transform such data into organized knowledge. Necessity is the Mother of invention! -plato • This necessity has led to the birth of data mining. 13Compiled By: Kamal Acharya7/2/2019
  • 14. What is Data Mining? Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data. 7/2/2019 14Compiled By: Kamal Acharya
  • 15. Contd... • Data mining: a misnomer? • Scenario: Remember that the mining of gold from the rocks or sand is referred to as gold mining rather than rock or sand mining. – Thus, data mining should have been more appropriately named as “knowledge mining” which emphasis on mining knowledge from large amounts of data. – But, which is unfortunately somewhat long so, named “data mining” • The overall goal of the data mining process is to extract pattern from a data set and transform it into an understandable structure for further use. 15Compiled By: Kamal Acharya7/2/2019
  • 16. Contd.. • Alternative names for data mining: – Knowledge discovery(mining) in databases (KDD) – knowledge extraction – data/pattern analysis – data archeology – data dredging – information harvesting – business intelligence, etc. 16Compiled By: Kamal Acharya7/2/2019
  • 17. Contd.. • The key properties of data mining are: – Automatic discovery of patterns • E.g., Market basket analysis. – Prediction of likely outcomes • E.g., weather forecasting – Creation of actionable information • E.g., Police investigation – Focus on large datasets and databases 17Compiled By: Kamal Acharya7/2/2019
  • 18. Data mining is not •Brute-force crunching of bulk data. •“Blind” application of algorithms. •Going to find non-existential relationships. •Presenting data in different ways •Queries to the database are not DM. •A magic that will turn your data into gold. 7/2/2019 18Compiled By: Kamal Acharya
  • 19. Contd.. • Data Mining: Confluence of Multiple Disciplines : 19Compiled By: Kamal Acharya Data Mining Database Technology Statistics Machine Learning Pattern Recognition Algorithm Other Disciplines Visualization 7/2/2019
  • 20. Why Data Mining?—Potential Applications • Data analysis and decision support – Market analysis and management • Market basket analysis, sale techniques, customer feedback on items (Opinion Mining) – Risk analysis and management • Forecasting, decision support system – Fraud detection and detection of unusual patterns (outliers) 207/2/2019 Compiled By: Kamal Acharya
  • 21. Why Data Mining?—Potential Applications • Other Applications – Text mining (news group, email, documents) and Web mining – Stream data mining (mining from continuous / rapid data • Eg Telephone communication pattern, Web Searching, Sensor data – Bioinformatics and bio-data analysis 217/2/2019 Compiled By: Kamal Acharya
  • 22. Data Mining: On What Kinds of Data? • As a general technology, data mining can be applied to any kind of data as long as the data are meaningful for a target application. – Database-oriented data sets and applications • Relational database, data warehouse, transactional database 227/2/2019 Compiled By: Kamal Acharya
  • 23. Contd.. • Advanced data sets and advanced applications – Data streams and sensor data – Time-series data – graphs, social networks and multi-linked data – Heterogeneous databases and legacy databases – Spatial data and spatiotemporal data – Multimedia database – Text databases – The World-Wide Web 237/2/2019 Compiled By: Kamal Acharya
  • 24. Knowledge Discovery (KDD) Process.. • Simply stated, data mining refers to extracting or “mining” knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. • Many people treat data mining as a synonym for another popularly used term, Knowledge Discovery from Data, or KDD. • Alternatively, others view data mining as simply an essential step in the process of knowledge discovery. • Knowledge discovery consists of an iterative sequence of the following steps: 24Compiled By: Kamal Acharya7/2/2019
  • 25. Contd.. Figure: Knowledge Discovery Process (Stages of KDD) 25Compiled By: Kamal Acharya7/2/2019
  • 26. Contd.. • Data cleaning : – It removes noise and inconsistent data • Data integration: – This combines data from multiple data sources • Data selection: – Data relevant to the analysis task are retrieved from the database • Data transformation: – Data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations. 26Compiled By: Kamal Acharya7/2/2019
  • 27. Contd.. • Data mining: – an essential process where intelligent methods are applied in order to extract data patterns • Pattern evaluation: – Identifies the truly interesting patterns representing knowledge based on some interestingness measures. • Knowledge presentation: – Knowledge representation techniques are used to present the mined knowledge to the user. 27Compiled By: Kamal Acharya7/2/2019
  • 28. Contd.. • According to this view, data mining is only one step in the knowledge discovery process. • However, in industry, in media, and in the database research milieu, the term data mining is becoming more popular than the longer term of knowledge discovery from data. • Therefore, we choose to use the term data mining. • Based on this view, the architecture of a typical data mining system is described in the following slides 28Compiled By: Kamal Acharya7/2/2019
  • 29. Architecture of Data Mining System • A typical data mining system may have the following major components. Figure: Architecture of Data Mining System 29Compiled By: Kamal Acharya7/2/2019
  • 30. Contd.. • Database, Data Warehouse, World Wide Web, or Other Information Repository: – This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. – Data cleaning and data integration techniques may be performed on the data. 30Compiled By: Kamal Acharya7/2/2019
  • 31. Contd.. • Database or Data Warehouse Server: – The database or data warehouse server is responsible for fetching the relevant data, based on the user’s data mining request. 31Compiled By: Kamal Acharya7/2/2019
  • 32. Contd.. • Knowledge Base: – This is the domain knowledge that is used to guide the search or evaluate the interestingness of resulting patterns. It is simply stored in the form of set of rules. – Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. 32Compiled By: Kamal Acharya7/2/2019
  • 33. Contd.. • Data Mining Engine: – This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as association and correlation analysis, classification, prediction, cluster analysis, outlier analysis and etc. 33Compiled By: Kamal Acharya7/2/2019
  • 34. Contd.. • Pattern Evaluation Module: – This component typically employs interestingness measures and interacts with the data mining modules so as to focus the search toward interesting patterns. – It may use interestingness thresholds to filter out discovered patterns. 34Compiled By: Kamal Acharya7/2/2019
  • 35. Contd.. • User interface: – This module communicates between users and the data mining system, allowing the user to interact with the system by specifying a data mining query or task. – In addition, this component allows the user to browse database and data warehouse schemas or data structures, evaluate mined patterns, and visualize the patterns in different forms. 35Compiled By: Kamal Acharya7/2/2019
  • 36. What is a Data Warehouse? • A warehouse in general terms is a historic repository of information collected from multiple sources, stored under a unified schema, and that usually resides at a single site. • Data Warehouses are constructed via a process of – DATA CLEANING, – DATA INTEGRATION, – DATA TRANSFORMATION, – DATA LOADING, and – PERIODIC DATA REFRESHING. • A data warehouse stores historical data of an organization so that they can analyze their performance over the past time (days, weeks, months or years) and plan for the future. 36Compiled By: Kamal Acharya7/2/2019
  • 37. Contd…… • The popular definition of the data warehouse is given by WH Inmon: • “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision- making process.” • Data warehousing: – The process of constructing and using data warehouses. – Is the process of extracting & transferring operational data into informational data & loading it into a central data store (warehouse) 377/2/2019 Compiled By: Kamal Acharya
  • 38. Data Warehouse—Integrated • Constructed by integrating multiple, heterogeneous data sources – relational databases, flat files, on-line transaction records • Data cleaning and data integration techniques are applied. – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources • E.g., Hotel price: currency, tax, etc. Sales system Payroll system Purchasing system Customer data 387/2/2019 Compiled By: Kamal Acharya
  • 39. Data Warehouse—Subject-Oriented • Organized around major subjects, such as customer, product, sales. • Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process. Sales system Payroll system Purchasing system Customer data Vendor data Employee data Operational data DW 397/2/2019 Compiled By: Kamal Acharya
  • 40. Data Warehouse—Time Variant • The time horizon for the data warehouse is significantly longer than that of operational systems. – Operational database: current value data. – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) 407/2/2019 Compiled By: Kamal Acharya
  • 41. Data Warehouse—Non-Volatile • A physically separate store of data transformed from the operational environment. • Operational update of data does not occur in the data warehouse environment. – Does not require transaction processing, recovery, and concurrency control mechanisms – Requires only two operations in data accessing: • initial loading of data and access of data. Sales system create update insert delete Customer data load access DBMS DW 417/2/2019 Compiled By: Kamal Acharya
  • 42. Data Warehouse Usage • Three kinds of data warehouse applications – Information processing • supports querying, basic statistical analysis, and reporting using tables, charts and graphs. 427/2/2019 Compiled By: Kamal Acharya
  • 43. Contd.. • Analytical processing • multidimensional analysis of data warehouse data • supports basic OLAP operations(drill-down, roll-up, slice-dice, drilling, pivoting, which allows the user to view the data at differing degree of summarization. • Data mining • knowledge discovery from hidden patterns • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools. 437/2/2019 Compiled By: Kamal Acharya
  • 45. 45 3 main phases • Data acquisition: – relevant data collection – Recovering: transformation into the data warehouse model from existing models – Loading: cleaning and loading in the Data Warehouse. • Storage • Data extraction – Tool examples: Query/report, SQL, multidimensional analysis (OLAP tools), data mining • Maintenance(Optional) 7/2/2019 Compiled By: Kamal Acharya
  • 46. THE USE OF A DATA WAREHOUSE INVENTORY DATABASE PERSONNEL DATABASE NEWCASTLE SALES DB LONDON SALES DB GLASGOW SALES DB STEP 2: Question the Data Warehouse DECISIONS and ACTIONS! STEP 3: Do something with what you learn from the Data Warehouse STEP 1: Load the Data Warehouse DATA WAREHOUSE 467/2/2019 Compiled By: Kamal Acharya
  • 47. Benefits of Data Warehousing • Queries do not impact Operational systems • Provides quick response to queries for reporting • Enables Subject Area Orientation • Integrates data from multiple, diverse sources • Enables multiple interpretations of same data by different users or groups • Provides thorough analysis of data over a period of time • Accuracy of Operational systems can be checked • Provides analysis capabilities to decision makers 7/2/2019 47Compiled By: Kamal Acharya
  • 48. • Increase customer profitability • Cost effective decision making • Manage customer and business partner relationships • Manage risk, assets and liabilities • Integrate inventory, operations and manufacturing • Reduction in time to locate, access, and analyze information (Link multiple locations and geographies) • Identify developing trends and reduce time to market • Strategic advantage over competitors 7/2/2019 48Compiled By: Kamal Acharya
  • 49. • Potential high returns on investment • Competitive advantage • Increased productivity of corporate decision-makers • Provide reliable, High performance access • Consistent view of Data: Same query, same data. All users should be warned if data load has not come in. • Quality of data is a driver for business re-engineering. 7/2/2019 49Compiled By: Kamal Acharya
  • 50. Applications of Data Mining • Data mining is an interdisciplinary field with wide and diverse applications – There exist nontrivial gaps between data mining principles and domain-specific applications • Some application domains – Financial data analysis – Retail industry – Telecommunication industry – Biological data analysis 7/2/2019 50Compiled By: Kamal Acharya
  • 51. Data Mining for Financial Data Analysis • Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality • Design and construction of data warehouses for multidimensional data analysis and data mining – View the debt and revenue changes by month, by region, by sector, and by other factors – Access statistical information such as max, min, total, average, trend, etc. • Loan payment prediction/consumer credit policy analysis – feature selection and attribute relevance ranking – Loan payment performance – Consumer credit rating 7/2/2019 51Compiled By: Kamal Acharya
  • 52. • Classification and clustering of customers for targeted marketing – multidimensional segmentation by nearest-neighbor, classification, decision trees, etc. to identify customer groups or associate a new customer to an appropriate customer group • Detection of money laundering and other financial crimes – integration of from multiple DBs (e.g., bank transactions, federal/state crime history DBs) – Tools: data visualization, linkage analysis, classification, clustering tools, outlier analysis, and sequential pattern analysis tools (find unusual access sequences) 7/2/2019 52Compiled By: Kamal Acharya
  • 53. Data Mining for Retail Industry • Retail industry: huge amounts of data on sales, customer shopping history, etc. • Applications of retail data mining – Identify customer buying behaviors – Discover customer shopping patterns and trends – Improve the quality of customer service – Achieve better customer retention and satisfaction – Enhance goods consumption ratios – Design more effective goods transportation and distribution policies 7/2/2019 53Compiled By: Kamal Acharya
  • 54. • Example 1. Design and construction of data warehouses based on the benefits of data mining – Multidimensional analysis of sales, customers, products, time, and region • Example 2. Analysis of the effectiveness of sales campaigns • Example 3. Customer retention: Analysis of customer loyalty – Use customer loyalty card information to register sequences of purchases of particular customers – Use sequential pattern mining to investigate changes in customer consumption or loyalty – Suggest adjustments on the pricing and variety of goods • Example 4. Purchase recommendation and cross-reference of items 7/2/2019 54Compiled By: Kamal Acharya
  • 55. Data Mining for Telecommunication Industry • A rapidly expanding and highly competitive industry and a great demand for data mining – Understand the business involved – Identify telecommunication patterns – Catch fraudulent activities – Make better use of resources – Improve the quality of service • Multidimensional analysis of telecommunication data – Intrinsically multidimensional: calling-time, duration, location of caller, location of callee, type of call, etc. 7/2/2019 55Compiled By: Kamal Acharya
  • 56. • Fraudulent pattern analysis and the identification of unusual patterns – Identify potentially fraudulent users and their typical usage patterns – Detect attempts to gain fraudulent entry to customer accounts – Discover unusual patterns which may need special attention • Multidimensional association and sequential pattern analysis – Find usage patterns for a set of communication services by customer group, by month, etc. – Promote the sales of specific services – Improve the availability of particular services in a region • Use of visualization tools in telecommunication data analysis 7/2/2019 56Compiled By: Kamal Acharya
  • 57. Biomedical Data Analysis • DNA sequences: 4 basic building blocks (nucleotides): adenine (A), cytosine (C), guanine (G), and thymine (T). • Gene: a sequence of hundreds of individual nucleotides arranged in a particular order • Humans have around 30,000 genes • Tremendous number of ways that the nucleotides can be ordered and sequenced to form distinct genes • Semantic integration of heterogeneous, distributed genome databases – Current: highly distributed, uncontrolled generation and use of a wide variety of DNA data – Data cleaning and data integration methods developed in data mining will help 7/2/2019 57Compiled By: Kamal Acharya
  • 58. • Similarity search and comparison among DNA sequences – Compare the frequently occurring patterns of each class (e.g., diseased and healthy) – Identify gene sequence patterns that play roles in various diseases • Association analysis: identification of co-occurring gene sequences – Most diseases are not triggered by a single gene but by a combination of genes acting together – Association analysis may help determine the kinds of genes that are likely to co-occur together in target samples • Path analysis: linking genes to different disease development stages – Different genes may become active at different stages of the disease – Develop pharmaceutical interventions that target the different stages separately • Visualization tools and genetic data analysis 7/2/2019 58Compiled By: Kamal Acharya
  • 59. Problems in Data Warehousing • Underestimation of resources for data loading • Hidden problems with source systems • Required data not captured • Increased end-user demands • Data homogenization • High demand for resources • Data ownership • High maintenance • Long duration projects • Complexity of integration 7/2/2019 59Compiled By: Kamal Acharya
  • 60. Major Challenges in Data Warehousing • Data mining requires single, separate, clean, integrated, and self-consistent source of data. – A DW is well equipped for providing data for mining. • Data quality and consistency is essential to ensure the accuracy of the predictive models. – DWs are populated with clean, consistent data • Advantageous to mine data from multiple sources to discover as many interrelationships as possible. – DWs contain data from a number of sources. • Selecting relevant subsets of records and fields for data mining – requires query capabilities of the DW. • Results of a data mining study are useful if can further investigate the uncovered patterns. – DWs provide capability to go back to the data source. 7/2/2019 60Compiled By: Kamal Acharya
  • 61. • The largest challenge a data miner may face is the sheer volume of data in the data warehouse. • It is quite important, then, that summary data also be available to get the analysis started. • A major problem is that this sheer volume may mask the important relationships the data miner is interested in. • The ability to overcome the volume and be able to interpret the data is quite important. 7/2/2019 61Compiled By: Kamal Acharya
  • 62.  Efficiency and scalability of data mining algorithms  Parallel, distributed, stream, and incremental mining methods  Handling high-dimensionality  Handling noise, uncertainty, and incompleteness of data  Incorporation of constraints, expert knowledge, and background knowledge in data mining  Pattern evaluation and knowledge integration  Mining diverse and heterogeneous kinds of data: e.g., bioinformatics, Web, software/system engineering, information networks  Application-oriented and domain-specific data mining  Invisible data mining (embedded in other functional modules)  Protection of security, integrity, and privacy in data mining Major Challenges in Data Mining 7/2/2019 62Compiled By: Kamal Acharya
  • 63. Homework • Briefly explain data mining and define it. Why data mining being used more widely now? • State and explain the major applications of data mining. • Explain briefly some limitations of data mining. • What is the future of data mining? • Can data mining in some areas assist in identifying corruption? Select one area and study the possibilities. • How is a data warehouse different from a database? How are they similar. • Explain what data warehousing and OLAP aim to achieve that can not be achieved by OLTP systems. 7/2/2019 Compiled By: Kamal Acharya 63
  • 64. Thank You ! 64Compiled By: Kamal Acharya7/2/2019