DATA MINING SPECIAL TOPICS CLASS

Spring 2001



Class Topics.
I will post links to materials we use in the class here.

DATE | TOPICS | LINKS
January 18 | Introduction to Data Mining HTML | This is an edited version of Han and Kamber's Chapter 1 notes
 | A warehousing and mining case study | -
 | Association Rules | See Han and Kamber's Chapter 6 notes
January 25 | Relational databases and SQL | There are excellent notes here and here.
 | Data Warehousing | See Han and Kamber's Chapter 2 notes
 | Ideas for projects HTML | -
 | Visualizing and Exploring Data HTML PPT | The OLIVE Library
February 1 | Some basic principles HTML PPT | Chapter 4, 5, & 6 of HMS
 | Ideas for projects HTML | -
 | *Support Vector Machines HTML PPT | -
February 8 | Descriptive modeling HTML PPT | Chapter 9 of HMS
 | Graphical Markov Models-Intro | -
 | Ideas for projects HTML | -
 | *Global Partial Orders from Sequential Data | PDF PS
February 15 | Graphical Markov Models-Gaussian UDGs | -
 | Ideas for projects HTML | -
 | Score Functions HTML PPT | Chapter 7 of HMS
February 22 | Predictive Models for Classification HTML PPT | Chapter 10 of HMS
 | Bias versus Variance for Classification | PS PDF
March 1 | Bias correction for Naive Bayes | -
 | Search and Optimization Methods HTML PPT | Chapter 8 of HMS
 | *Flexible Metric NN Classification | ps.Z PPT
March 8 | *Variational methods | -
 | Monte Carlo methods | -
March 15 | Spring break | -
March 22 | *Likelihood-based squashing | PDF
 | Bayesian networks | -
March 29 | Bayesian networks continued | -
April 5 | Bagging, Boosting, etc. PPT | Check this list. See also Greg Ridgeway's Interface paper.
 | Neural Networks | PPT (slides by Sebastian Thrun)
April 12 | Information retrieval and filtering | -
April 19 | I need to reschedule this class | -
April 26 | Project presentations | -
* indicates a "special topic"


Software for Projects.
You can use any software you like for the projects. I recommend that you use tools like Matlab, SAS, or S-Plus so as not to get too bogged down in software issues. Feel free either to use available implementations of data mining algorithms (many are publicly available; e.g., see the UC Irvine Machine Learning Data Archive for a list of data mining/machine-learning software) or to implement an algorithm directly in a language of your choice. Naturally you will learn more if you opt for the second option, but please make sure you understand the complexity of the project you are proposing and that you will have time to finish it. Check the kdnuggets site for an extensive listing of software tools.


Other Data Mining Courses


Most of the links below come from Padhraic Smyth's course at UCI.

General Pointers to Data Mining and related Web pages


Below are some pointers to fairly large collections of online data sets. If you need to find a data set for a project, there is probably something of interest on one of these sites. I'll try to update this list and add some more recent data sets.
 

General Benchmark Data Collections

Collections of Application-Specific Data Sets

Sites which have lists of various data set collections