R Code

About the code:

R code: none of the code posted here is optimized for speed.
Please consult the corresponding paper for details about the algorithms.
Please do not redistribute.

  • Codes for "Simultaneous gene clustering and subset selection for sample classification via MDL"
  • Codes for Data depth cluster validation, clustering and classification via the L1 data depth.

  • Some R code for the gap statistic that I've implemented.
    Please consult the paper by Robert Tibshirani, Guenther Walther and Trevor Hastie for details: "Estimating the number of clusters in a dataset via the Gap statistic". Tech. report, March 2000 (published in Journal of the Royal Statistical Society, B, 63:411-423, 2001).

    Required libraries: cluster
    Assist functions: source the entire function file into R; the supporting functions are located inside the main file.

    The main programs are gappcalg. (the principal component version) and gapalg.q (the standard gap).
    Input (for both):
    The data matrix: the rows are the objects that are clustered.
    kvec: a sequence of the numbers of clusters we can select (e.g. seq(1,10)).
    B: the number of simulated null data sets.

    Calling the functions:
    gappcalg(leukemiadata,seq(1,6),10)
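    As a rough illustration of the three inputs, the sketch below builds a small synthetic matrix in place of leukemiadata (which is not included here) and guards the call so that it only runs once the function file has been sourced. The synthetic data and the variable names x and res are assumptions made for this example.

```r
# Synthetic stand-in for the data matrix: 60 rows (the objects to cluster), 5 columns.
set.seed(1)
x <- rbind(matrix(rnorm(150, mean = 0), ncol = 5),
           matrix(rnorm(150, mean = 4), ncol = 5))

kvec <- seq(1, 6)  # candidate numbers of clusters
B <- 10            # number of simulated null data sets

# gappcalg only exists after the function file has been sourced,
# so the call is skipped if it is not defined.
if (exists("gappcalg")) {
  res <- gappcalg(x, kvec, B)
}
```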

    Output:

    ksel: the selected number of clusters.
    Sgap and Sdgap: the gap statistics for the different numbers of clusters and their sd.
    diffk: the difference between the gap statistics for consecutive numbers of clusters.

    The gap statistic +/- prediction sd is plotted for the values of kvec. Look for an "elbow", the smallest k for which a significant increase in gap is observed.
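    For orientation, here is a minimal sketch of the standard gap statistic as described in the Tibshirani, Walther and Hastie paper. It is not the code distributed on this page. It assumes k-means clustering, a uniform reference distribution over the range of each column, and consecutive values in kvec; the names within_ss and gap_sketch, and the names of the returned list elements, are made up for this example.

```r
# Pooled within-cluster sum of squares W_k for a k-means partition.
within_ss <- function(x, k) {
  if (k == 1) return(sum(scale(x, center = TRUE, scale = FALSE)^2))
  kmeans(x, centers = k, nstart = 10, iter.max = 50)$tot.withinss
}

# Hypothetical sketch of the standard gap statistic (kvec assumed consecutive).
gap_sketch <- function(x, kvec = 1:6, B = 10) {
  x <- as.matrix(x)
  n <- nrow(x)
  log_w <- sapply(kvec, function(k) log(within_ss(x, k)))

  # B null data sets, drawn uniformly over the observed range of each column.
  rng <- apply(x, 2, range)
  log_w_null <- matrix(0, nrow = B, ncol = length(kvec))
  for (b in seq_len(B)) {
    z <- apply(rng, 2, function(r) runif(n, min = r[1], max = r[2]))
    log_w_null[b, ] <- sapply(kvec, function(k) log(within_ss(z, k)))
  }

  # Gap(k) = mean over null sets of log(W*_k) minus log(W_k).
  gap <- colMeans(log_w_null) - log_w
  # s_k = sd of the null log(W*_k) (denominator B) times sqrt(1 + 1/B).
  s <- apply(log_w_null, 2, function(v) sqrt(mean((v - mean(v))^2))) * sqrt(1 + 1 / B)

  # Select the smallest k with Gap(k) >= Gap(k+1) - s(k+1);
  # fall back to the largest k if no k qualifies.
  m <- length(kvec)
  ok <- which(gap[-m] >= gap[-1] - s[-1])
  ksel <- if (length(ok) > 0) kvec[ok[1]] else kvec[m]

  list(ksel = ksel, gap = gap, s = s, diffk = diff(gap))
}

# Example on two well-separated synthetic groups.
set.seed(1)
x <- rbind(matrix(rnorm(150, mean = 0), ncol = 5),
           matrix(rnorm(150, mean = 4), ncol = 5))
res <- gap_sketch(x, kvec = 1:6, B = 10)
res$ksel
```

    The principal component version in the paper differs only in how the null data are generated: the uniform box is aligned with the principal components of the data instead of the original columns.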

  • Codes for Cluster validation using the Gap statistic (PC method), and the standard gap statistic.

    05/03