MIS 542 Data Warehousing and Data Mining
Spring 2006
Instructor: Bertan Badur, Ph.D.
Office: HKB 226
Phone: (212) 359 70 27
E-mail: badur@boun.edu.tr
Course Hours: Lectures: Mondays 6,7,8 (14:00-16:50)
URL: www.mis.boun.edu.tr/badur/MIS542
Course Description:
This course consists of three parts. The first part covers the basic concepts and
methodologies of knowledge discovery from large databases and data warehouses. Basic data
mining functionalities such as concept description, association, classification, prediction
and clustering are introduced, and data warehousing and OLAP are presented. The second
part discusses in detail the various algorithms that implement these basic data mining
functionalities. Applications of these concepts and techniques to real-world problems are
discussed with the aid of data mining software tools. The third part introduces advanced
topics such as text mining, web mining, and mining spatial or temporal data.
Motivation:
As huge volumes of data accumulate in business, scientific and engineering databases,
the development of reliable and scalable analysis procedures is essential to extract hidden
rules or useful patterns from these large databases. Data mining is an emerging
interdisciplinary science that aims to develop automatic or semiautomatic techniques to
discover knowledge hidden in these databases, so that decision-making processes in
business and in other environments become faster and more efficient. Hence, the use of
data mining in the finance, marketing, and telecommunication industries has been
increasing dramatically in recent years.
Text Book:
• Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan
Kaufmann Publishers, 2001
Recommended:
• Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, by Ian
H. Witten and Eibe Frank, Morgan Kaufmann Publishers, 2005
• Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Pearson
Education Inc., 2003
• Data Mining: Concepts, Models, Methods, and Algorithms, by Mehmed Kantardzic,
IEEE Press / Wiley-Interscience, 2003
Supplementary Text Books:
Technical books
• Data Mining: A Tutorial Based Primer, by Richard J. Roiger and Michael W. Geatz,
Addison Wesley, 2003
• Machine Learning, by Tom M. Mitchell, McGraw-Hill International Editions,
1997
• Predictive Data Mining, by S. M. Weiss and N. Indurkhya, Morgan Kaufmann Publishers,
1998
• Principles of Data Mining, by D. Hand, H. Mannila and P. Smyth, MIT Press, 2001
• Discovering Knowledge in Data: An Introduction to Data Mining, by D. T. Larose,
Wiley-Interscience, 2005
Business Oriented Books
• Mastering Data Mining: The Art and Science of Customer Relationship
Management, by Michael J. A. Berry and Gordon Linoff, Wiley Computer
Publishing, 2000
• Data Mining Techniques: For Marketing, Sales and Customer Relationship
Management, by Michael J. A. Berry and Gordon Linoff, Wiley Computer
Publishing, 2004
• Data Mining Cookbook: Modeling Data for Marketing, Risk, and CRM, by O. P. Rud,
John Wiley & Sons Inc., 2001
• The Data Warehouse Lifecycle Toolkit, by R. Kimball, L. Reeves, M. Ross and
W. Thornthwaite, Wiley, 1998
Course Outline:
• Introduction (1 Week)
    • Motivation and Preliminary Definitions
    • Methodology of Knowledge Discovery in Databases
    • Architectures of Data Mining Systems
    • Descriptive/Predictive Data Mining or Supervised and Unsupervised Learning
    • Data Mining Functionalities
    • Business Applications
• Basic Data Mining Techniques (1 Week)
    • Decision Trees
    • ID3 Algorithm
    • Association Rules
    • Apriori Algorithm
    • Clustering
    • k-Means Algorithm
• Methodology of Knowledge Discovery in Databases (1 Week)
    • KDD Process Model
    • Data Preprocessing
    • Handling Missing Data
    • Data Transformation
    • Discretization
    • Sampling
• Data Warehouses and OLAP (1 Week)
    • Basic Concepts of Data Warehousing
    • A Multidimensional Data Model
    • Architectures of Data Warehousing Systems
    • Computation of OLAP Cubes
• Frequent Pattern Mining (2 Weeks)
    • Single Dimensional Association Rules
    • Multilevel Association Rules
    • Multidimensional Association Rules
    • Constraint Based Association Mining
    • Sequential Pattern Mining
• Midterm
• Classification and Prediction (3 Weeks)
    • Decision Trees
    • C4.5 Algorithm
    • CART
    • Bayesian Classification
    • Naïve Bayesian Classification
    • Bayesian Belief Networks
    • Classification by Backpropagation
    • k-Nearest Neighbor Classification
    • Combining Classifiers
    • Classification Accuracy
• Cluster Analysis (2 Weeks)
    • Types of Data in Cluster Analysis
    • Partitioning Methods
    • K-medoids
    • CLARA
    • Hierarchical Methods
    • BIRCH
    • Density Based Methods
    • DBSCAN
    • EM Algorithm
    • Model Based Methods
    • Self Organizing Maps
• Case Studies (1 Week)
Grading:
Homework 20%
Paper reviews and presentations 10%
Project 20%
Midterm 25%
Final Exam 25%
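As a minimal sketch of how these weights combine, the Python snippet below computes a weighted course grade. The component names, the 0-100 score scale, and the function name `course_grade` are illustrative assumptions and not part of the official grading procedure.

```python
# Weights copied from the Grading list; the dictionary keys are illustrative names.
WEIGHTS = {
    "homework": 0.20,
    "paper_review": 0.10,
    "project": 0.20,
    "midterm": 0.25,
    "final": 0.25,
}


def course_grade(scores):
    """Return the weighted average of component scores.

    Assumption: every component score is given on a 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {
        "homework": 90,
        "paper_review": 80,
        "project": 85,
        "midterm": 70,
        "final": 75,
    }
    print(f"Course grade: {course_grade(example):.2f}")
```

Under these assumptions, the midterm and the final exam together account for half of the grade, and the coursework components (homework, paper review, project) for the other half.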
Project:
Each student or group of students (at most two) is required to develop a term project.
Implementation of selected data mining algorithms, application of studied techniques to a
real-world problem, or a performance study of selected data mining algorithms can be
accepted as a term project.
Paper Reviews and Presentations:
Each student is expected to write a short critical review of a recent paper, related to an
application of data mining. A short presentation of the reviewed paper in class is
required as well.
Homework:
There will be 5 or 6 homework sets. These may include discussion questions, numerical
problems and data mining problems using real-world or artificially generated data.
Software:
• DBMiner: DBMiner 2.0 Educational Version, developed by J. Han (author of the textbook
“Data Mining: Concepts and Techniques”) and his team; compatible with the textbook;
performs association, classification and cluster analysis.
• SPSS
    • Neural Connection: Performs neural network modeling for classification and
    prediction
    • Answer Tree: Decision tree analysis
• Microsoft SQL Server Analysis Services
• MATLAB
Data Sources:
• FoodMart or WareMart Database of Microsoft Analysis Services
• Data sources from the internet
    • UCI KDD Archive
    • UCI Machine Learning Repository
• Financial/Macroeconomic data from IMKB or TCMB
• Textbook datasets
Schedule of Some Events:
Project Proposals: 10.04.2006
Paper presentations: - 22.05.2006
Midterm: 03.04.2006
Project Final Report: After finals
Project Presentations: - After finals
Late Submission Policy:
20% cut for each late school day
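One possible reading of this policy is sketched below in Python: each late school day removes 20% of the earned score, and the result never drops below zero. The policy above does not state whether the cut applies to the earned score or to the maximum score, so this interpretation and the function name `late_adjusted_score` are assumptions.

```python
def late_adjusted_score(score, late_school_days, cut_per_day=0.20):
    """Apply a per-day late cut to an earned score.

    Assumption: the cut is a fraction of the earned score, it accumulates
    linearly with the number of late school days, and the adjusted score
    is floored at zero.
    """
    if late_school_days < 0:
        raise ValueError("late_school_days must be non-negative")
    remaining_fraction = max(0.0, 1.0 - cut_per_day * late_school_days)
    return score * remaining_fraction


if __name__ == "__main__":
    for days in range(0, 7):
        print(days, late_adjusted_score(90, days))
```

Under this reading, a submission that is five or more school days late receives no credit.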