We will also make use of J. Han and M. Kamber (2000), Data Mining: Concepts and Techniques, Morgan Kaufmann. Han and Kamber have a terrific set of lecture notes on the Web.
Class Topics.
I will post links to materials we use in the class here.
| DATE | TOPICS | LINKS |
| --- | --- | --- |
| January 18 | Introduction to Data Mining (HTML); A warehousing and mining case study; Association Rules | This is an edited version of Han and Kamber's Chapter 1 notes - See Han and Kamber's Chapter 6 notes |
| January 25 | Relational databases and SQL; Data Warehousing; Ideas for projects (HTML); Visualizing and Exploring Data (HTML, PPT) | There are excellent notes here and here. See Han and Kamber's Chapter 2 notes - The OLIVE Library |
| February 1 | Some basic principles (HTML, PPT); Ideas for projects (HTML); *Support Vector Machines (HTML, PPT) | Chapters 4, 5, and 6 of HMS - - |
| February 8 | Descriptive modeling (HTML, PPT); Graphical Markov Models-Intro; Ideas for projects (HTML); *Global Partial Orders from Sequential Data | Chapter 9 of HMS - - PDF PS |
| February 15 | Graphical Markov Models-Gaussian UDGs; Ideas for projects (HTML); Score Functions (HTML, PPT) | - - Chapter 7 of HMS |
| February 22 | Predictive Models for Classification (HTML, PPT); Bias versus Variance for Classification | Chapter 10 of HMS PS PDF |
| March 1 | Bias correction for Naive Bayes; Search and Optimization Methods (HTML, PPT); *Flexible Metric NN Classification | - Chapter 8 of HMS ps.Z PPT |
| March 8 | *Variational methods; Monte Carlo methods | - - |
| March 15 | Spring break | - |
| March 22 | *Likelihood-based squashing; Bayesian networks | PDF - |
| March 29 | Bayesian networks continued | - |
| April 5 | Bagging, Boosting, etc. (PPT); Neural Networks | Check this list. See also Greg Ridgeway's Interface paper. PPT (slides by Sebastian Thrun) |
| April 12 | Information retrieval and filtering | - |
| April 19 | I need to reschedule this class | - |
| April 26 | Project presentations | - |
Software for Projects.
You can use any software you like for the projects.
I recommend that you use tools like Matlab,
SAS, or S-Plus so as not to get too bogged down
in software issues.
Feel free to either use available implementations of data mining algorithms
(many are publicly available; e.g., see the UC Irvine Machine Learning Data
Archive for a list of data mining/machine-learning software)
or implement an algorithm directly in a
language of your choice. Naturally you will learn more if you opt for the second
option, but please make sure you understand the complexity of the project you
are proposing and that you will have time to finish it.
Check the KDnuggets site
for an extensive listing of software tools.