MIS 542 Data Mining Concepts and Techniques
                              Spring 2008

Instructor:     Bertan Badur, Ph.D.
Office:         HKB 226
Phone:          (212) 359 70 27
E-mail:         badur@boun.edu.tr

Course Hours: Lectures: Fridays 2,3,4 (10:00-12:50)
URL:          www.mis.boun.edu.tr/badur/MIS542
Course Assistant: Ümit Topaçan
Office:           HKB 229
Phone:            (212) 359 71 13
E-mail:            topacan@boun.edu.tr

Course Description:

This course consists of three parts. The first part covers the basic concepts and
methodologies of knowledge discovery from large databases and data warehouses. Basic data
mining functionalities such as concept description, association, classification, prediction,
and clustering are introduced, and data warehousing and OLAP are presented. The second
part of the course is a detailed discussion of various algorithms that achieve these basic
data mining functionalities. Applications of these concepts and techniques to real-world
problems are discussed with the aid of data mining software tools. The third part introduces
advanced topics such as text mining, web mining, and mining spatial or temporal data.

Motivation:

As huge volumes of data accumulate in business, scientific, and engineering databases,
the development of reliable and scalable analysis procedures is essential to extract hidden
rules or useful patterns from these large databases. Data mining is an emerging
interdisciplinary science that aims to develop automatic or semiautomatic techniques to
discover knowledge hidden in these databases, so that decision-making processes in
business and in other environments become faster and more efficient. Hence, the use of
data mining in the finance, marketing, and telecommunication industries has been
increasing dramatically in recent years.

Text Book:

•   Introduction to Data Mining, by P.-N. Tan, M. Steinbach, V. Kumar, Pearson
    Addison Wesley, 2006

Recommended:

•   Data Mining: Concepts and Techniques, 2nd ed., by Jiawei Han and M. Kamber, Morgan
    Kaufmann Publishers, 2005
•   Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed., by Ian
    H. Witten and E. Frank, Morgan Kaufmann Publishers, 2005
•   Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham, Pearson
    Education Inc., 2003
•   Data Mining: Concepts, Models, Methods, and Algorithms, by Mehmed Kantardzic,
    IEEE Press / Wiley-Interscience, 2003
Supplementary Text Books:
   Technical Books
   • Data Mining: A Tutorial-Based Primer, by Richard J. Roiger, Michael W. Geatz,
      Addison Wesley, 2003
   • Machine Learning, by Tom M. Mitchell, McGraw-Hill International Editions,
      1997
   • Predictive Data Mining, by S. M. Weiss and N. Indurkhya, Morgan Kaufmann
      Publishers, 1998
   • Principles of Data Mining, by D. Hand, H. Mannila, P. Smyth, MIT Press, 2001
   • Discovering Knowledge in Data: An Introduction to Data Mining, by D. T. Larose,
      Wiley-Interscience, 2005
   • Handbook of Data Mining and Knowledge Discovery, by Willi Klösgen, J. M. Zytkow,
      Oxford University Press, 2002
   Business Oriented Books
   • Mastering Data Mining: The Art and Science of Customer Relationship
      Management, by Michael J. A. Berry, Gordon Linoff, Wiley Computer
      Publishing, 2000
   • Data Mining Techniques: For Marketing, Sales, and Customer Relationship
      Management, by Michael J. A. Berry, Gordon Linoff, Wiley Computer
      Publishing, 2004
   • Data Mining Cookbook: Modeling Data for Marketing, Risk, and CRM, by O. P. Rud,
      John Wiley & Sons Inc., 2001

Course Outline:

•   Introduction (1 Week)
    • Motivation and Preliminary Definitions
    • Methodology of Knowledge Discovery in Databases
    • Architectures of Data Mining Systems
    • Descriptive/Predictive Data Mining or Supervised and Unsupervised Learning
    • Data Mining Functionalities
    • Business Applications
•   Basic Data Mining Techniques (1 Week)
    • Decision Trees
        • ID3 Algorithm
    • Association Rules
        • Apriori Algorithm
    • Clustering
        • k-Means Algorithm
•   Methodology of Knowledge Discovery in Databases (1 Week)
    • KDD Process Model
    • Data Preprocessing
    • Handling Missing Data
    • Data Transformation
    • Discretization
    • Sampling
•   Data Warehouses and OLAP (1 Week)
    • Basic Concepts of Data Warehousing
    • A Multidimensional Data Model
    • Architectures of Data Warehousing Systems
    • Computation of OLAP Cubes

•   Cluster Analysis (2 Weeks)
    • Types of Data in Cluster Analysis
    • Partitioning Methods
       • K-medoids
       • CLARA
    • Hierarchical Methods
       • BIRCH
    • Density Based Methods
       • DBSCAN
    • Model Based Methods
       • EM Algorithm
       • Self Organizing Maps
•   Classification and Prediction (3 Weeks)
    • Decision Trees
       • C4.5 Algorithm
       • CART
    • Bayesian Classification
       • Naïve Bayesian Classification
       • Bayesian Belief Networks
    • Classification by Backpropagation
    • k-Nearest Neighbor Classification
    • Combining Classifiers
    • Classification Accuracy
•   Midterm
•   Frequent Pattern Mining (2 Weeks)
    • Single Dimensional Association Rules
    • Multilevel Association Rules
    • Multidimensional Association Rules
    • Constraint Based Association Mining
    • Sequential Pattern Mining
•   Case Studies (1 Week)

Grading:

Homework                            20%
Paper reviews and presentations      5%
Project                             20%
Midterm                             25%
Final Exam                          30%
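
As an illustration of how the weights above combine into a course grade, the following Python sketch computes a weighted average. The 0-100 score scale, the function name course_grade, and the sample scores are assumptions made for this example; the syllabus itself specifies only the weights.

```python
# Weights copied from the grading table (percent of the course grade).
WEIGHTS = {
    "homework": 20,
    "paper_review": 5,
    "project": 20,
    "midterm": 25,
    "final_exam": 30,
}


def course_grade(scores):
    """Return the weighted course grade for component scores on a 0-100 scale."""
    # The weights must account for the whole grade.
    assert sum(WEIGHTS.values()) == 100
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "homework": 80,
    "paper_review": 100,
    "project": 90,
    "midterm": 70,
    "final_exam": 60,
}
print(course_grade(example))  # 74.5
```

With these sample scores the midterm and final exam together contribute 35.5 of the 74.5 points, which reflects their combined 55% weight.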

Project:

Each student or group of students (at most two) is required to develop a term project.
Implementation of selected data mining algorithms, application of studied techniques to a
real-world problem, or a performance study of selected data mining algorithms can be
accepted as term projects.

Paper Reviews and Presentations:

Each student is expected to write a short critical review of a recent paper related to an
application of data mining. A short presentation of the reviewed paper in class is
required as well.

Homework:

There will be 4 or 5 homework sets. These may include discussion questions, numerical
problems, and data mining problems using real-world or artificially generated data.

Software:

•   DBMiner 2.0 Educational Version: developed by J. Han, author of the book “Data
    Mining: Concepts and Techniques”, and his team; compatible with the book;
    performs association, classification, and cluster analysis.
•   SPSS
    • Neural Connection: neural network modeling for classification and prediction
    • AnswerTree: decision tree analysis
•   Microsoft SQL Server Analysis Services
•   MATLAB

Data Sources:

•   FoodMart or WareMart Database of Microsoft Analysis Services
•   Data sources from internet
    • UCI KDD Archive
    • UCI Machine Learning Repository
•   Financial/macroeconomic data from IMKB or TCMB
•   Textbook datasets

Schedule of Some Events:

Project Proposals: 04.04.2008
Paper Presentations: 23.05.2008
Midterm: 25.04.2008
Project Final Report: After finals
Project Presentations: After finals

Late Submission Policy:

25% cut for each late school day
