NeuroRx. 2012 Sep 5;3(3):384–395. doi: 10.1016/j.nurx.2006.05.037

Utility of correlation measures in analysis of gene expression

Anthony Almudevar 1, Lev B Klebanov 2, Xing Qiu 1, Peter Salzman 1, Andrei Y Yakovlev 1
PMCID: PMC3593386  PMID: 16815221

Summary

The correlation structure of gene expression data plays a two-fold role: it is at once a source of complications and a source of useful information. Statistical methodologies for microarray data analysis that ignore the strong stochastic dependence between gene expression levels may perform poorly. However, the correlation structure also holds a wealth of valuable information that deserves a closer look. Proper use of correlation measures can remedy deficiencies of currently practiced methods, which focus too heavily on strong effects in terms of differential expression of genes. The present paper discusses the utility of correlation measures in microarray data analysis and gene regulatory network reconstruction, along with various pitfalls in both research areas that have been uncovered in methodological studies. These issues apply broadly to all genomic studies examining the biology, diagnosis, and treatment of neurological disorders.
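
As an illustration of why ignoring dependence between genes can hurt, the following minimal sketch simulates null test statistics for many genes, once as independent and once with a common pairwise correlation, and compares how much the number of false positives varies from experiment to experiment. It is a toy model and not the authors' method: the equicorrelated Gaussian statistics, the function name `false_positive_counts`, and all parameter values (500 genes, correlation 0.5, threshold 1.96) are assumptions chosen only for demonstration.

```python
import random
import statistics


def false_positive_counts(n_genes, rho, n_experiments, threshold=1.96, seed=0):
    """Simulate null test statistics with pairwise correlation rho.

    Each statistic is sqrt(rho) * shared + sqrt(1 - rho) * noise, so it is
    standard normal marginally, and any two genes have correlation rho.
    Returns the number of statistics with |z| > threshold per experiment.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_experiments):
        shared = rng.gauss(0.0, 1.0)  # factor common to all genes
        count = 0
        for _ in range(n_genes):
            noise = rng.gauss(0.0, 1.0)  # gene-specific component
            z = (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * noise
            if abs(z) > threshold:
                count += 1
        counts.append(count)
    return counts


# No gene is truly differentially expressed, so every rejection is false.
independent = false_positive_counts(n_genes=500, rho=0.0, n_experiments=200)
correlated = false_positive_counts(n_genes=500, rho=0.5, n_experiments=200)

for label, counts in (("independent", independent), ("correlated", correlated)):
    print(
        label,
        "mean:", round(statistics.mean(counts), 1),
        "sd:", round(statistics.stdev(counts), 1),
    )
```

In this toy setting the expected number of false positives is the same in both cases, but the spread across experiments is much wider under correlation, so a procedure calibrated under independence can behave very differently from one data set to the next.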

Key Words: Gene expression, correlation, microarrays, genetic networks

Articles from NeuroRx are provided here courtesy of Am. Soc. for Experimental NeuroTherapeutics
