Summary
Despite advances in microarray technology that have increased reproducibility and substantially reduced cost, successful use of the technology remains elusive for many researchers, and data analysis in particular is a substantial bottleneck for many biomedical researchers. There are many reasons for this, including the expense of analysis software and a lack of adequate training in its use. An additional reason is that microarray data analysis has largely been treated as a set of separate steps, with most of the emphasis placed on statistical analysis and visualization of the data. For many biomedical researchers, determining the biological significance of the data has been the greatest challenge, and in the last several years more emphasis has been placed on this aspect of the analysis process. Despite this broadening of scope, several related and interdependent aspects of the process continue to be neglected, including experimental design, data accessibility, and platform selection. Though not traditionally thought of as integral to data analysis, these factors have profound effects on it. This article discusses the importance of these additional aspects, as well as statistical analysis and the determination of biological significance of microarray data. A summary of currently available software options is also presented, with a focus on the aspects discussed.
Key Words: Microarray data analysis, biological significance, microarray, statistical analysis, clustering