Abstract
Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR), defined as the expected proportion of false positives among the genes claimed to be significant. Controlling the FDR correctly therefore depends on the accuracy of the FDR estimator. Xie et al. found that the standard permutation method of estimating the FDR is biased and proposed deleting the predicted differentially expressed (DE) genes when estimating the FDR for one-sample comparisons. However, the FDR formula used in their paper is incorrect, which makes the comparison results they report unconvincing. Their method has two further problems: the FDR estimate is biased by over- or under-deletion of DE genes, and it implicitly relies on an unreasonable estimator of the true proportion of equivalently expressed (EE) genes. Because accurate FDR estimation is so important in microarray data analysis, these problems need to be pointed out and improved methods proposed.
Results: Our results confirm that the standard permutation method overestimates the FDR. With the correct FDR formula, we show that the method of Xie et al. always gives a biased estimate of the FDR: it overestimates when the number of claimed significant genes is small and underestimates when that number is large. To overcome these problems, we propose two modifications. Simulation results show that our estimator is more accurate.
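For orientation, the standard permutation estimate discussed above can be sketched as follows. This is a minimal, generic illustration and not the estimator of Xie et al. or the modified estimator proposed here; the function names, the t-type statistic, the per-replicate sign flipping and the `pi0` parameter (an assumed proportion of EE genes) are all assumptions introduced for the example.

```python
import math
import random
from statistics import mean, stdev


def one_sample_t(values):
    """One-sample t-type statistic: mean divided by its standard error.

    Assumes at least two replicates that are not all identical.
    """
    return mean(values) / (stdev(values) / math.sqrt(len(values)))


def permutation_fdr(data, threshold, n_perm=200, pi0=1.0, seed=0):
    """Estimate the FDR for the rule |statistic| > threshold.

    data: list of per-gene lists of log-ratios (one-sample design).
    pi0:  assumed proportion of EE genes (hypothetical parameter;
          1.0 is the conservative choice).
    The null distribution pools ALL genes, including any truly DE genes,
    which is the source of the overestimation described in the abstract.
    """
    rng = random.Random(seed)
    n_rep = len(data[0])

    # Number of genes claimed significant on the observed data.
    total_positives = sum(abs(one_sample_t(g)) > threshold for g in data)
    if total_positives == 0:
        return 0.0

    null_counts = []
    for _ in range(n_perm):
        # One random sign per replicate, applied to every gene.
        signs = [rng.choice((-1, 1)) for _ in range(n_rep)]
        count = 0
        for gene in data:
            flipped = [x * s for x, s in zip(gene, signs)]
            if abs(one_sample_t(flipped)) > threshold:
                count += 1
        null_counts.append(count)

    # Expected false positives divided by claimed positives, capped at 1.
    expected_false = pi0 * mean(null_counts)
    return min(1.0, expected_false / total_positives)
```

Calling `permutation_fdr(data, threshold=2.0)` on a genes-by-replicates list returns a value between 0 and 1. Deleting predicted DE genes before building `null_counts`, as Xie et al. proposed, would change only which genes enter the inner loop.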
Contact: szhang3@unl.edu
REFERENCES
- Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. 1995;57:289–300.
- Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 2001;29:1165–1188.
- Efron B, et al. Empirical Bayes analysis of a microarray experiment. J. Am. Stat. Assoc. 2001;96:1151–1160.
- Guo X, Pan W. Using weighted permutation scores to detect differential gene expression with microarray data. J. Bioinform. Comput. Biol. 2005;3:989–1006. doi: 10.1142/s021972000500134x.
- Kendziorski CM, et al. On parametric empirical Bayes methods for comparing multiple groups using replicated gene expression profiles. Stat. Med. 2003;22:3899–3914. doi: 10.1002/sim.1548.
- Kerr MK, et al. Analysis of variance for gene expression microarray data. J. Comput. Biol. 2000;7:819–837. doi: 10.1089/10665270050514954.
- Newton MA, et al. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J. Comput. Biol. 2001;8:37–52. doi: 10.1089/106652701300099074.
- Pan W, et al. A mixture model approach to detecting differentially expressed genes with microarray data. Funct. Integr. Genomics. 2003;3:117–124. doi: 10.1007/s10142-003-0085-7.
- Pan W. On the use of permutation in the performance of a class of nonparametric methods to detect differential gene expression. Bioinformatics. 2003;19:1333–1340. doi: 10.1093/bioinformatics/btg167.
- Pollard KS, et al. Multiple testing procedures: R multtest package and applications to genomics. U.C. Berkeley Division of Biostatistics Working Paper Series, Working Paper 164. 2004. Available at http://www.bepress.com/ucbbiostat/paper164 (last accessed December 2004).
- Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 2004;3:Article 3. doi: 10.2202/1544-6115.1027.
- Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
- Thomas JG, et al. An efficient and robust statistical modeling approach to discover differentially expressed genes using genomic expression profiles. Genome Res. 2001;11:1227–1236. doi: 10.1101/gr.165101.
- Tusher VG, et al. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl Acad. Sci. USA. 2001;98:5116–5121. doi: 10.1073/pnas.091062498.
- Xie Y, et al. A note on using permutation based false discovery rate estimate to compare different analysis methods for microarray data. Bioinformatics. 2005;21:4280–4288. doi: 10.1093/bioinformatics/bti685.
- Yekutieli D, Benjamini Y. Resampling based false discovery rate controlling multiple testing procedure for correlated test statistics. J. Stat. Plann. Inference. 1999;82:171–196.
- Zhao Y, Pan W. Modified nonparametric approaches to detecting differentially expressed genes in replicated microarray experiments. Bioinformatics. 2003;19:1046–1054. doi: 10.1093/bioinformatics/btf879.
- Zhang S. An improved nonparametric approach for detecting differentially expressed genes with replicated microarray data. Stat. Appl. Genet. Mol. Biol. 2006;5:Article 30. doi: 10.2202/1544-6115.1246.
- Zhong S, et al. Evolutionary genomics of ecological specialization. Proc. Natl Acad. Sci. USA. 2004;101:11719–11724. doi: 10.1073/pnas.0404397101.