Abstract
A recent editorial in PLoS Biology by MacCallum and Hill (2006) pointed out the inappropriateness of studies that evaluate signatures of positive selection based solely on single-site analyses. The rising number of recently published articles claiming positive selection therefore raises the question of how to improve bioinformatics standards for reliably detecting positive selection. Deeper integrative efforts using state-of-the-art methodologies at the gene level and protein level are improving positive selection studies. Here we provide some computational guidelines to thoroughly document molecular adaptation.
Keywords: bioinformatics, positive selection, molecular adaptation
The expression of the genetic information of living organisms depends largely on the functions of proteins. Important protein functionalities can be preserved over long evolutionary time periods by purifying selection, which reduces genetic variability. In contrast, extensive genetic variation favoring amino-acid replacements in protein-coding genes through positive selection may give rise to novel functionalities. Understanding which genes are being influenced by natural selection can provide fundamental biological insight into species evolution and ecological fitness.
Selection can be inferred by comparing the rates of synonymous (silent; dS) and nonsynonymous (amino-acid replacement; dN) substitutions, where dS < dN is an indication of positive selection, and dS > dN suggests negative selection (Hughes and Nei, 1988). Powerful single-site analyses to detect selection have been developed (Yang and Bielawski, 2000) and have been implemented in relatively easy-to-use computer packages such as PAML (Yang, 1997). However, because these algorithms are so sensitive at detecting selection, many journals no longer publish papers that use only software such as PAML to identify adaptively evolving genes. Indeed, this issue was addressed by a recent editorial in PLoS Biology (MacCallum and Hill, 2006), in which the editors point out the increasing number of recently published articles claiming positive selection. To quote the editorial: “It is, therefore, no longer appropriate to sequence a gene in several species, stake a claim for positive selection, and expect the results to be published in a top-tier journal.” Such a policy is not limited to PLoS journals, but is also now being applied at more specialized journals such as Molecular Biology and Evolution, underscoring the need to improve evolutionary bioinformatics assays of molecular adaptation.
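The dN versus dS comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `classify_selection` and the tolerance `tol` are hypothetical, the inputs are taken to be point estimates of dN and dS that were obtained elsewhere, and the "neutral" label for dN = dS is the conventional reading rather than something stated in the text. A real analysis would test the ratio statistically rather than compare raw estimates.

```python
def classify_selection(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Label the direction of selection from point estimates of dN and dS.

    Assumption: dn and ds are already-estimated substitution rates;
    no significance test is performed here.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the dN/dS ratio")
    omega = dn / ds  # the dN/dS ratio
    if abs(omega - 1.0) <= tol:
        return "neutral"
    # dS < dN (omega > 1) suggests positive selection; dS > dN suggests negative selection
    return "positive" if omega > 1.0 else "negative"


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    print(classify_selection(0.30, 0.10))
    print(classify_selection(0.05, 0.20))
```

Because the function only compares two numbers, it inherits every weakness discussed in the following paragraphs; it is meant to make the definition concrete, not to serve as a test for selection.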
There are two main criticisms of single-site analyses. First, there is potentially a high probability of obtaining false positives (Suzuki and Nei, 2002; Guindon et al. 2006). Second, a high dN/dS ratio may not actually reflect a signature of selection, but may instead result from population demographic events (Kreitman, 2000) or from non-neutral evolution at synonymous sites (Chamary et al. 2006). Regardless, the controversy around the topic of positive selection raises the question of how to improve the standards and statistics for reliable bioinformatics studies of positive selection.
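One general way to limit false positives when many sites are tested at once is to control the false discovery rate, the theme of the Guindon et al. (2006) citation above. The sketch below implements the standard Benjamini-Hochberg step-up procedure; it is offered as a generic illustration and is not claimed to be the specific procedure used in that paper. The function name, the `alpha` default, and the example p-values are all hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where a test is rejected at FDR level alpha.

    Standard Benjamini-Hochberg step-up procedure over per-site p-values.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical per-site p-values, for illustration only
    site_p_values = [0.001, 0.04, 0.03, 0.5]
    print(benjamini_hochberg(site_p_values))
```

Such a correction addresses only the first criticism (false positives); it does nothing about demographic effects or selection at synonymous sites.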
Increasingly powerful computational genomics and proteomics tools may be the ultimate bridge between structural biology and molecular evolution. Many of the recent studies claiming positive selection have relied mostly on single-site analyses, and the link with protein function, when addressed, has relied mostly on locating potentially selected sites in available crystal structures, along with speculation about their functional importance. Clearly, complementary and deeper protein-level approaches, which have been mostly unexploited previously, are required. Indeed, recent studies have shown that protein evolutionary history can be largely retraced (Weinreich et al. 2006; Yoshikuni et al. 2006), suggesting that deeper integrative efforts using state-of-the-art methodologies at the gene level and protein level may significantly improve positive selection studies. Here we provide some computational guidelines to thoroughly document molecular adaptation.
First, single-site analyses (Yang and Bielawski, 2000) are useful for detecting selection at the gene level when it operates more or less constantly over evolutionary time, but are less useful when selection operates episodically, as appears to occur for most biological innovations. Thus, second, recent methods that combine gene and protein information should be applied. The nature of the amino-acid change (“conservative” or “radical”, depending on the magnitude of the physicochemical difference between amino acids; Smith, 2003; Woolley et al. 2003) and the physical location of amino-acid sites in the three-dimensional (3D) protein structure (Suzuki, 2004; Berglund et al. 2005) are important assets for deciphering and interpreting molecular adaptation. Moreover, rate-shift models are also useful for testing protein functional divergence (Knudsen and Miyamoto, 2001).
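The “conservative” versus “radical” distinction can be made concrete with a small Python sketch. It assumes one particular physicochemical criterion, charge, with a commonly used three-way grouping (positive: R, H, K; negative: D, E; neutral: all others). That grouping, the table `CHARGE_CLASS`, and the function `classify_replacement` are illustrative choices and not the scheme of any specific method cited above; other properties such as polarity or volume would give different labels.

```python
# Assumed grouping of the 20 standard amino acids by side-chain charge
CHARGE_CLASS = {
    **{aa: "positive" for aa in "RHK"},
    **{aa: "negative" for aa in "DE"},
    **{aa: "neutral" for aa in "ACFGILMNPQSTVWY"},
}


def classify_replacement(ancestral: str, derived: str) -> str:
    """Label an amino-acid replacement as conservative or radical by charge class."""
    a, d = ancestral.upper(), derived.upper()
    if a not in CHARGE_CLASS or d not in CHARGE_CLASS:
        raise ValueError("expected one-letter codes for the 20 standard amino acids")
    if a == d:
        return "identical"  # no replacement occurred
    # Same charge class: conservative; different charge class: radical
    return "conservative" if CHARGE_CLASS[a] == CHARGE_CLASS[d] else "radical"


if __name__ == "__main__":
    print(classify_replacement("K", "R"))  # both positively charged
    print(classify_replacement("D", "K"))  # negative to positive
```

Counting radical and conservative replacements along a phylogeny, and asking where they fall in the 3D structure, is the kind of combined gene-and-protein information the paragraph above recommends.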
Third, molecular adaptation studies should apply protein-level analyses that can overcome some of the limitations of single-site methods (Suzuki and Nei, 2002; Suzuki, 2004). These include computational techniques such as molecular mechanics, quantum mechanics, and hybrid methods that study biological systems in atomic detail, including enzyme mechanistic assessments and rational drug design (reviewed in Ramos and Fernandes, 2006). Homology modeling is a reliable technique for computationally inferring an unknown protein 3D structure from the experimentally determined 3D structure of a related protein (>50% amino-acid identity) (Martí-Renom et al. 2000). Even when tools such as dN/dS fail to detect the selective history of a gene, a 3D structural homology model may detect non-negligible functional shifts (Andrés et al. 2004). Computational mutagenesis, molecular docking, and the calculation of molecular electrostatic potentials and free energies of association reveal important functional interactions in enzymatic systems (ligand-receptor complexes) and in protein-protein interactions (Ramos and Fernandes, 2006). Implementing such techniques with distributed computing and grid computing solutions may have great potential for future protein-level analyses at a genome-wide level.
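The >50% amino-acid identity guideline for choosing a homology-modeling template can be checked with a short Python sketch. It assumes the two sequences are already aligned (equal length, gaps written as "-") and computes identity over columns where neither sequence has a gap; that denominator is one of several conventions in use. The function names and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_suitable_template(target: str, template: str, threshold: float = 50.0) -> bool:
    """True when identity exceeds the threshold (the >50% guideline in the text)."""
    return percent_identity(target, template) > threshold


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only
    print(percent_identity("ACDE", "ACDF"))
    print(is_suitable_template("ACDE", "ACDF"))
```

Sequence identity is only a first filter; model quality also depends on alignment accuracy and on the quality of the template structure.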
Genomics and proteomics are rapidly evolving research fields, and their rational integration with other disciplines such as ecology and evolution has the potential to provide new perspectives on the relevance of adaptation and on the neutral theory (da Fonseca et al. 2007; Marques et al. 2006). Rigorous interpretation and functional validation of targeted genes under adaptive evolution, using integrated gene-level and protein-level information, will improve the standards for reliable detection of positive selection and will be necessary to understand these fundamental evolutionary processes.
Acknowledgments
This work was supported in part by Projects POCTI/BSE/47559/2002 and PTDC/BIA-BDE/69144/2006 from the Portuguese Foundation for Science and Technology. Comments made by two anonymous referees improved a previous version of this manuscript.
Footnotes
Please note that this article may not be used for commercial purposes. For further information please refer to the copyright statement at http://www.la-press.com/copyright.htm
References
- Andrés AM, Soldevila M, Navarro A, Kidd KK, Oliva B, Bertranpetit J. Positive selection in MAOA gene is human exclusive: determination of the putative amino acid change selected in the human lineage. Hum. Genet. 2004;115(5):377–86. doi: 10.1007/s00439-004-1179-6.
- Berglund AC, Wallner B, Elofsson A, Liberles DA. Tertiary windowing to detect positive diversifying selection. J. Mol. Evol. 2005;60(4):499–504. doi: 10.1007/s00239-004-0223-4.
- Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat. Rev. Genet. 2006;7:98–108. doi: 10.1038/nrg1770.
- da Fonseca R, Antunes A, Mélo A, Ramos MJ. Structural divergence and adaptive evolution in mammalian cytochromes P450 2C. Gene. 2007;387:58–66. doi: 10.1016/j.gene.2006.08.017.
- Guindon S, Black M, Rodrigo A. Control of the false discovery rate applied to the detection of positively selected amino acid sites. Mol. Biol. Evol. 2006;23:919–926. doi: 10.1093/molbev/msj095.
- Hughes AL, Nei M. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature. 1988;335:167–70. doi: 10.1038/335167a0.
- Knudsen B, Miyamoto MM. A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins. Proc. Natl. Acad. Sci. U.S.A. 2001;98:14512–7. doi: 10.1073/pnas.251526398.
- Kreitman M. Methods to detect selection in populations with applications to the human. Annu. Rev. Genomics Hum. Genet. 2000;1:539–59. doi: 10.1146/annurev.genom.1.1.539.
- MacCallum C, Hill E. Being positive about selection. PLoS Biol. 2006;4:e87. doi: 10.1371/journal.pbio.0040087.
- Marques A, Antunes A, Fernandes PA, Ramos MJ. Comparative evolutionary genomics of the HADH2 gene encoding amyloid beta-binding alcohol dehydrogenase/17beta-hydroxysteroid dehydrogenase type 10 (ABAD/HSD10). BMC Genomics. 2006;7:202. doi: 10.1186/1471-2164-7-202.
- Martí-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 2000;29:291–325. doi: 10.1146/annurev.biophys.29.1.291.
- Ramos MJ, Fernandes PA. Atomic-level rational drug design. Curr. Comp.-Aided Drug Design. 2006;2:57–81.
- Smith NG. Are radical and conservative substitution rates useful statistics in molecular evolution? J. Mol. Evol. 2003;57:467–478. doi: 10.1007/s00239-003-2500-z.
- Suzuki Y, Nei M. Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol. Biol. Evol. 2002;19:1865–9. doi: 10.1093/oxfordjournals.molbev.a004010.
- Suzuki Y. Three-dimensional window analysis for detecting positive selection at structural regions of proteins. Mol. Biol. Evol. 2004;21:2352–9. doi: 10.1093/molbev/msh249.
- Weinreich DM, Delaney NF, Depristo MA, Hartl DL. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science. 2006;312:111–114. doi: 10.1126/science.1123539.
- Woolley S, Johnson J, Smith MJ, Crandall KA, McClellan DA. TreeSAAP: selection on amino acid properties using phylogenetic trees. Bioinformatics. 2003;19:671–2. doi: 10.1093/bioinformatics/btg043.
- Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 1997;13:555–6. doi: 10.1093/bioinformatics/13.5.555.
- Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 2000;15:496–503. doi: 10.1016/S0169-5347(00)01994-7.
- Yoshikuni Y, Ferrin TE, Keasling JD. Designed divergent evolution of enzyme function. Nature. 2006;440:1078–82. doi: 10.1038/nature04607.