ST: CS504
Advanced Topics in Data Mining Techniques
Supplement to the main course page: http://husky.if.uidaho.edu/DMf07/
Fall 2007



Class notes & tentative schedule

Additional material (textbook): Data Mining: Concepts and Techniques, 2nd ed., by Jiawei Han and Micheline Kamber (slides, software, figures): original web site and course site.

    Session 01
    08.28.2007

    Introduction - class description • Class web page • Textbooks • Goals & topics of the course • Intro on data mining • Data warehousing • Data cubes • Data warehousing schemas

    Session 02
    08.30.2007
    Ch.2-3 textbook • What is Data Mining? • Data mining steps, DM in business intelligence, DM as a confluence of disciplines • DM Query Language (DMQL) • Integration and coupling of DM and DW • DM Architecture • DM Functionalities and task primitives • OLTP vs. OLAP • Data warehouse and data cubes • Schemas: star, snowflake, and fact constellation
    Session 03
    09.04.2007
    Ch.3 textbook • Multidimensional Data Model (MDDM) • Measures • Concept Hierarchies (CHs) • OLAP in MDDM • Starnet Model • Data Warehouse Architecture • Views, approaches and steps • 3-tier architecture • Back-end tools • Metadata • OLAP servers (ROLAP, MOLAP, and HOLAP)
    Session 04
    09.06.2007

    Ch.3-4 textbook • Data Warehouse implementation • Data cube computations • Data cube construction (number, access, query) • Materialization (no, partial, full) • Partial materialization (iceberg cube, shell cube) • Indexing OLAP data (bitmap indexing, join indexing)

    Session 05
    09.11.2007
    Ch.3-4 textbook • Data Cube Computation & Materialization • Ancestor and descendant • MultiWay Array Aggregation • Bottom-Up Computation (BUC)
    Session 06
    09.14.2007
    Ch.3-4 textbook • Data Cube Computation & Materialization (Star-cubing) • Star-cubing computation (shared dimensions, cuboid trees, star-tree construction, star-tree in computing iceberg cube)
    Session 07
    09.18.2007

    Ch.6 textbook & Berry's book on DM for marketing • Extracting models for data classification & prediction • Two phases of data classification • Classification (supervised learning, unsupervised learning) • Classification accuracy (training and testing, accuracy, boosting the accuracy) • Classification & Prediction (data preparation, data transformation, algorithms) • Decision Trees (example, ID3 & CART, induction, splitting attributes, selection methods, scenarios, stopping criterion, complexity, diversity and purity, purity measures: Gini, entropy)

    Session 08
    09.20.2007

    Practical Data Mining • Real-world issues concerning you and your data

    Session 09
    09.25.2007
    Data Mining Guest Lecture: Have fun. Look ma, it's a network.
    Session 10
    09.27.2007
    Practical issues with decision trees • Building, browsing a data cube in SQL • MDX
    Session 11
    10.02.2007
    Project papers presentations and discussions
    Session 12
    10.04.2007
    Learning from Neighbors • Eager vs. lazy learners • Lazy learners • K-nearest-neighbor classifier • Case-based reasoning (CBR) • Distance measures • Euclidean space • Coding theory • Fuzzy space
    Session 13
    10.09.2007

    Distance measures in FL & NNs • Fuzzy logic (Intro, Fuzzy systems, Distance in fuzzy logic) • Neural Networks (Intro, Hamming distance and net value, CPN networks) • Data Mining Based on Experience • Memory (Case) Based Reasoning (Examples & steps, Case study from Berry's book)

    Session 14
    10.11.2007
    Data Mining Based on Experience: Collaborative (Information) Filtering • Steps • Examples • Applications • Papers • Resources
    Session 15
    10.16.2007

    Crisp (Hard) vs. Fuzzy Clustering • Clustering Based on ED • Clustering using Neural Networks (Kohonen Self-Organizing Map, forming clusters as you go)
    Hard clustering (k-means clustering) • Fuzzy clustering (c-means clustering)

    Session 16
    10.18.2007
    Mining the Web Page Layout Structure • HITS (Hyperlink-Induced Topic Search) • Steps • Examples • Problems
    Session 17
    10.23.2007
    Review - Analytic Geometry in Euclidean Space with Cartesian Coordinates (slope intercept, line intercept, scalar form of a line, 2 & 3 point form, one point and vector form) • Neuron as linear classifier (Neuron, definition, threshold, bias) • Neuron as linear classifier (Linear classifier with more than two classes, Linear classifier in multidimensional space) • Support Vector Machines • History, books • Algorithm • Derivation for linearly separable patterns • Derivation for linearly inseparable patterns • XOR example • A few more examples of neural networks (Radial Basis Function Networks (RBF), Perceptron adjustable rule)
    Session 18
    10.25.2007
    Discussion of previous session material • Brief review of ANNs for classification and learning (RBF – Radial Basis Function Networks, CPN – Counter Propagation Networks, LVQ – Learning Vector Quantization, Functional Link Networks, Polynomial Networks, Perceptron adjustable rule)
    Session 19
    10.30.2007

    Classification and Prediction - Regression Analysis (Types, history, Least squares method, Measure of goodness-of-fit, Multiple linear, nonlinear, fuzzy regression) • Accuracy & error measures (Accuracy, misclassification rate, confusion matrix, Other measures - TP, FP, TN, FN, P, Accuracy vs. threshold, Predictor error measures)

    Session 20
    11.01.2007
    Exam review
    Session 21
    11.06.2007
    Project paper presentations
    Session 22
    11.08.2007
    Project paper presentations • Reading material presentations
    Session 23
    11.13.2007
    Techniques for accuracy estimation • Holdout and random subsampling • k-fold cross-validation • Bootstrap • Techniques for accuracy improvement • Bagging • Boosting
    Session 24
    11.15.2007
    Fuzzy Preference Relation • Definitions (fuzzy number, fuzzy value, av, gamma resolution) • Fuzzy preference relations and properties (Orlovsky, Lee) • Fuzzy satisfaction degree and properties • Fuzzy preference relation in continuous domain and visualization • Applications (decision support systems, attack signatures, fault tolerance voters)
    Session 25
    11.20.2007
    Revisiting covered material • Discussion

    No class
    11.22.2007

    Thanksgiving, no class
    Session 26
    11.27.2007
    Post-exam review • Weka data mining tool
    Session 27
    11.29.2007
    Mining Frequent Patterns • Definitions (frequent itemset, frequent sequential pattern, frequent structured pattern, market basket analysis) • Association Rules (support & confidence) • Strong rules (occurrence frequency of an itemset; relative support; absolute support; frequent itemset; confidence) • Challenges in association rule mining
    Session 28
    12.04.2007
    Revisiting covered material • Discussion
    Session 29
    12.06.2007
    Paper discussions and class project paper discussions
    Session 30
    12.11.2007
    No examination week • Final class paper presentations
    Session 31
    12.13.2007

    No examination week • Remaining class paper presentations • Course summary (Topics, assignments, presentations)

    Week of
    12.17.2007
    Final examination week