Comparative and Functional Genomics. 2003 Jun;4(3):287–299. doi: 10.1002/cfg.290

Performance Assessment of Kernel Density Clustering for Gene Expression Profile Data

Guoping Shu, Beiyan Zeng, Yiping P Chen, Oscar H Smith
PMCID: PMC2448457  PMID: 18629292

Abstract

Kernel density smoothing techniques have been used in classification or supervised learning of gene expression profile (GEP) data, but their applications to clustering or unsupervised learning of those data have not been explored and assessed. Here we report a kernel density clustering method for analysing GEP data and compare its performance with the three most widely used clustering methods: hierarchical clustering, K-means clustering, and multivariate mixture model-based clustering. Using several methods to measure agreement, between-cluster isolation, and within-cluster coherence, such as the Adjusted Rand Index, the Pseudo F test, the r² test, and the profile plot, we have assessed the effectiveness of kernel density clustering for recovering clusters, and its robustness against noise on clustering both simulated and real GEP data. Our results show that the kernel density clustering method has excellent performance in recovering clusters from simulated data and in grouping large real expression profile data sets into compact and well-isolated clusters, and that it is the most robust clustering method for analysing noisy expression profile data compared to the other three methods assessed.
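
As an illustration of the agreement measure named in the abstract, the following is a minimal sketch of the Adjusted Rand Index in its standard pair-counting (Hubert and Arabie) form. It is not taken from the article: the function name `adjusted_rand_index` and the example label vectors are hypothetical, and the authors' own implementation may differ in detail.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two partitions of the same items.

    Illustrative helper (hypothetical name), standard pair-counting form.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: partitions trivially agree

    # Contingency counts: items sharing each (true, predicted) label pair.
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    # Number of item pairs placed together in both, or in each, partition.
    together_both = sum(comb(c, 2) for c in cell_counts.values())
    together_true = sum(comb(c, 2) for c in true_counts.values())
    together_pred = sum(comb(c, 2) for c in pred_counts.values())

    # Agreement expected by chance, and the maximum attainable agreement.
    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2

    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


if __name__ == "__main__":
    known = [0, 0, 1, 1]       # hypothetical true cluster labels
    recovered = [1, 1, 0, 0]   # same grouping, different label names
    print(adjusted_rand_index(known, recovered))
```

The index depends only on which items are grouped together, not on the label names, so a relabelled but otherwise identical partition scores 1, while values near 0 indicate agreement no better than chance.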

Articles from Comparative and Functional Genomics are provided here courtesy of Wiley
