R Code
About the code:
- R code: none of the code posted here is optimized for speed.
- Please consult the corresponding paper for details about algorithms.
- Please do not redistribute.
- Code for "Simultaneous gene clustering and subset selection for sample classification via MDL"
- Code for data depth: cluster validation, clustering and classification via the L1 data depth.
- Some R code for the gap statistic that I've implemented.
- Please consult the paper by Robert Tibshirani, Guenther Walther and Trevor Hastie for details: "Estimating the number of clusters in a dataset via the Gap statistic". Tech. report, March 2000 (published in Journal of the Royal Statistical Society, B, 63:411-423, 2001).
- Required libraries: cluster
- Helper functions: source the entire function file in R; the supporting functions are located inside the main file.
- The main programs are gappcalg. and gapalg.q (the principal component version and the standard gap statistic, respectively).
- Input (for both)
- The data matrix: the rows are the objects that get clustered.
- kvec: a sequence of the numbers of clusters we can select (e.g. seq(1,10)).
- B: the number of simulated null data sets.
- Calling the functions:
- gappcalg(leukemiadata,seq(1,6),10)
- Output:
- ksel: the selected number of clusters; Sgap and Sdgap: the gap statistics for the different numbers of clusters and their sd; diffk: the differences between the gap statistics for consecutive numbers of clusters.
- The gap statistic +/- prediction sd is plotted for the values of kvec. Look for an "elbow", the smallest k for which a significant increase in gap is observed.
- Code for cluster validation using the gap statistic (PC method), and the standard gap statistic.
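
For readers who want to see the idea before downloading the files, here is a minimal base-R sketch of the standard gap statistic as described in the Tibshirani, Walther and Hastie paper. It is not the code in gapalg.q: the function name gap_sketch, its return fields (k_hat, gap, se), the use of kmeans as the clustering method, and the uniform reference distribution over the range of each column are all assumptions made for illustration. It takes the same kind of input as above (a data matrix with rows to cluster, a consecutive kvec, and B null data sets).

```r
# Illustrative sketch only; gap_sketch and its outputs are hypothetical names,
# not the functions or output names used in the downloadable files.
gap_sketch <- function(x, kvec = 1:6, B = 10) {
  x <- as.matrix(x)

  # log of the pooled within-cluster sum of squares for k clusters
  log_wk <- function(dat, k) {
    if (k == 1) {
      return(log(sum(scale(dat, scale = FALSE)^2)))
    }
    log(kmeans(dat, centers = k, nstart = 10)$tot.withinss)
  }

  # observed log(W_k) for each k in kvec
  obs <- sapply(kvec, function(k) log_wk(x, k))

  # B null data sets, each drawn uniformly over the range of every column
  ref <- replicate(B, {
    null_dat <- apply(x, 2, function(col) runif(length(col), min(col), max(col)))
    sapply(kvec, function(k) log_wk(null_dat, k))
  })
  ref <- matrix(ref, nrow = length(kvec))  # rows: k, columns: null data sets

  # gap = mean null log(W_k) minus observed log(W_k)
  gap <- rowMeans(ref) - obs

  # sd over the null sets (divisor B), inflated by sqrt(1 + 1/B)
  se <- apply(ref, 1, sd) * sqrt((B - 1) / B) * sqrt(1 + 1 / B)

  # smallest k with gap(k) >= gap(k + 1) - se(k + 1); assumes kvec is consecutive
  n_k <- length(kvec)
  ok <- which(gap[-n_k] >= gap[-1] - se[-1])
  k_hat <- if (length(ok) > 0) kvec[ok[1]] else kvec[n_k]

  list(k_hat = k_hat, gap = gap, se = se)
}

# Toy data: two well-separated groups of 50 points in two dimensions
set.seed(1)
toy <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
res <- gap_sketch(toy, kvec = 1:5, B = 10)
```

Plotting res$gap with error bars of plus or minus res$se against kvec gives the same kind of picture as described in the output notes above. B = 10 is small and is used here only to keep the run short.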
Back to: Rebecka
05/03