Bioinformatics. 2010 Aug 26;26(19):2462–2463. doi: 10.1093/bioinformatics/btq467

RDP3: a flexible and fast computer program for analyzing recombination

Darren P Martin, Philippe Lemey, Martin Lott, Vincent Moulton, David Posada, Pierre Lefeuvre
PMCID: PMC2944210  PMID: 20798170

Abstract

Summary: RDP3 is a new version of the RDP program for characterizing recombination events in DNA-sequence alignments. Among other novelties, this version includes four new recombination analysis methods (3SEQ, VISRD, PHYLPRO and LDHAT), new tests for recombination hotspots, a range of matrix methods for visualizing overall patterns of recombination within datasets and recombination-aware ancestral sequence reconstruction. Complementary to a high degree of analysis flow automation, RDP3 also has a highly interactive and detailed graphical user interface that enables more focused hands-on cross-checking of results with a wide variety of newly implemented phylogenetic tree construction and matrix-based recombination signal visualization methods. The new RDP3 can accommodate large datasets and is capable of analyzing alignments ranging in size from 1000×10 kilobase sequences to 20×2 megabase sequences within 48 h on a desktop PC.

Availability: RDP3 is available for free from its web site http://darwin.uvigo.es/rdp/rdp.html

Contact: darrenpatrickmartin@gmail.com

Supplementary information: The RDP3 program manual contains detailed descriptions of the various methods it implements and a step-by-step guide describing how best to use these.


rdp3 is a computer program for statistical identification and characterization of historical recombination events. Given a set of aligned nucleotide sequences, rdp3 will rapidly analyze these with a range of powerful non-parametric recombination detection methods (including bootscan, maxchi, chimaera, 3seq, geneconv, siscan, phylpro and visrd; Boni et al., 2007; Gibbs et al., 2000; Lemey et al., 2009; Padidam et al., 1999; Posada and Crandall, 2001; Weiller, 1998). It will provide a detailed breakdown of recombination breakpoint locations and of the identities of recombinant and parental sequences. For further downstream analyses, the program enables users to save edited sequence alignments with (i) recombinant sequences removed; (ii) recombinationally derived tracts of sequence removed; or (iii) recombinant sequences split into their constituent parts.
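
The three export options can be illustrated with a minimal Python sketch. It is not rdp3's code: the dictionary-based alignment, the half-open tract coordinates, the gap masking and the `_major`/`_minor` suffixes are all assumptions made for illustration, since the source does not describe how the saved alignments are represented.

```python
from typing import Dict, List, Tuple

GAP = "-"
Alignment = Dict[str, str]                  # sequence name -> aligned sequence
Tracts = Dict[str, List[Tuple[int, int]]]   # name -> half-open [start, end) columns


def _tract_mask(length: int, regions: List[Tuple[int, int]]) -> List[bool]:
    """Mark the alignment columns that fall inside any recombinant tract."""
    mask = [False] * length
    for start, end in regions:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = True
    return mask


def remove_recombinants(aln: Alignment, tracts: Tracts) -> Alignment:
    """Option (i): drop every sequence that carries a recombinant tract."""
    return {name: seq for name, seq in aln.items() if name not in tracts}


def remove_tracts(aln: Alignment, tracts: Tracts) -> Alignment:
    """Option (ii): keep all sequences but blank out the recombinant tracts."""
    out = {}
    for name, seq in aln.items():
        mask = _tract_mask(len(seq), tracts.get(name, []))
        out[name] = "".join(GAP if m else c for c, m in zip(seq, mask))
    return out


def split_recombinants(aln: Alignment, tracts: Tracts) -> Alignment:
    """Option (iii): split each recombinant into two gap-padded rows."""
    out = {}
    for name, seq in aln.items():
        if name not in tracts:
            out[name] = seq
            continue
        mask = _tract_mask(len(seq), tracts[name])
        # Hypothetical naming: the part outside the tracts and the part inside.
        out[name + "_major"] = "".join(GAP if m else c for c, m in zip(seq, mask))
        out[name + "_minor"] = "".join(c if m else GAP for c, m in zip(seq, mask))
    return out
```

All three functions leave the alignment length unchanged, so the edited alignments stay column-compatible with the original for downstream analyses.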

An important strength of rdp3 that makes it applicable to a variety of recombination analysis problems is that, unlike many other recombination detection programs such as simplot (Lole et al., 1999), dual brothers (Minin et al., 2005), jphmm (Schultz et al., 2006) or scueal (Kosakovsky Pond et al., 2009), it does not screen predefined sets of potentially recombinant (or query) sequences against other predefined sets of non-recombinant (or reference) sequences. rdp3 instead treats every sequence within an input alignment as a potential recombinant and systematically screens large numbers of sequence triplets and/or quartets to identify sets of three or four sequences that contain a recombinant and two sequences resembling its parents. Such an approach means that rdp3 can simultaneously detect the entire scope of recombination evident within a dataset (i.e. not just that occurring between the reference strains or species), enabling its use in the characterization of complex recombinants such as those derived through recombination between parental sequences that were themselves recombinant. The drawback of such a flexible, exploratory framework is that it can often be difficult to assess the uncertainty associated with inferred recombination patterns. However, with its wide range of cross-checking tools, rdp3 is complementary to probabilistic recombination analysis approaches.
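
To give a sense of what exhaustive triplet screening involves, the following Python sketch enumerates every triplet in an alignment and, for one assignment of roles within a triplet, records which candidate parent the candidate recombinant matches at each informative site. The function names and the 'P'/'Q' encoding are assumptions made for illustration. This is not the test statistic of any of the methods listed above, only the kind of site pattern such methods evaluate.

```python
from itertools import combinations
from math import comb
from typing import Dict, Iterator, Tuple


def all_triplets(aln: Dict[str, str]) -> Iterator[Tuple[str, str, str]]:
    """Yield every unordered triplet of sequence names in the alignment."""
    return combinations(sorted(aln), 3)


def triplet_count(n_sequences: int) -> int:
    """Number of unordered triplets among n sequences: n choose 3."""
    return comb(n_sequences, 3)


def informative_pattern(child: str, parent_p: str, parent_q: str) -> str:
    """Record, site by site, which parent the child matches exclusively.

    Sites where the child matches both parents or neither are skipped.
    Long runs of 'P' followed by long runs of 'Q' (or the reverse) are the
    mosaic signal that a recombination test would go on to assess.
    """
    pattern = []
    for c, p, q in zip(child, parent_p, parent_q):
        if c == p and c != q:
            pattern.append("P")
        elif c == q and c != p:
            pattern.append("Q")
    return "".join(pattern)
```

Because each sequence in a triplet can play the role of the recombinant, a full screen considers three role assignments per triplet. With 1000 sequences there are already 166,167,000 triplets, which illustrates why the speed of the screening methods matters.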

1 NEW FEATURES IN RDP3

Although the graphically intensive and highly interactive rdp3 interface remains superficially unchanged from that of its predecessor, rdp2 (Martin et al., 2005a, b), it includes simple point-and-click access to a multitude of powerful new features. Among these are three new non-parametric recombination detection methods (3seq, visrd and phylpro; Boni et al., 2007; Lemey et al., 2009; Weiller, 1998), a parametric recombination rate estimation method (ldhat; McVean et al., 2004), two new tree construction methods (maximum likelihood with phyml and Bayesian with mrbayes; Guindon and Gascuel, 2003; Ronquist and Huelsenbeck, 2003), two recombination hotspot tests (Heath et al., 2006), a test of recombination-induced protein misfolding (Lefeuvre et al., 2007; Voigt et al., 2002), recombination-aware methods for reconstructing ancestral sequences (Arenas and Posada, 2010) and a range of matrix methods for visualizing overall patterns of recombination within datasets (Jakobsen and Easteal, 1996; Lefeuvre et al., 2009; McVean et al., 2004).

In addition to the new methods implemented in rdp3, another important improvement over rdp2 is the way in which rdp3 automatically scans alignments for recombination signals and then infers the minimum number of recombination events needed to account for these signals. rdp3 implements a range of heuristic recombinant sequence identification methods based on the phylpro (Weiller, 1998), visrd (Lemey et al., 2009) and subtree-prune-and-regraft methods (which identify recombinant sequences as those that ‘jump’ between the branches of phylogenetic trees constructed from different fragments of the same sequence alignment; Beiko and Hamilton, 2006; Heath et al., 2006). rdp3 also automatically checks detected recombination signals to determine whether they might be better accounted for by sequence misalignment than by recombination. Misalignments introduce homoplasy and are a common cause of false positive recombination signals. Misalignments are automatically detected in rdp3 by separately realigning recombinant sequences with each of their identified parents (rdp3 uses clustalw to do this; Chenna et al., 2003) and comparing these pairwise alignments to those of the corresponding sequence pairs in the full multiple sequence alignment. By more accurately identifying recombinant sequences and discounting recombination signals attributable to sequence misalignments, rdp3 significantly outperforms rdp2 for overall quantitative assessments of recombination patterns such as those carried out in the new breakpoint hotspot and protein folding disruption tests.
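
The comparison between a pairwise realignment and the corresponding rows of the multiple sequence alignment might be sketched as follows. The source does not say how rdp3 scores the agreement between the two alignments, so the Jaccard-style score here is an assumption, as are the function names. The pairwise alignment is taken as given (for example, produced by clustalw) rather than computed.

```python
from typing import Set, Tuple

GAP = "-"


def aligned_pairs(row_a: str, row_b: str) -> Set[Tuple[int, int]]:
    """Return the residue index pairs (i, j) that share an alignment column.

    Indices count ungapped residues, so the result does not depend on
    all-gap columns or on the overall alignment length.
    """
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != GAP and b != GAP:
            pairs.add((i, j))
        if a != GAP:
            i += 1
        if b != GAP:
            j += 1
    return pairs


def alignment_agreement(msa_a: str, msa_b: str, pw_a: str, pw_b: str) -> float:
    """Fraction of aligned residue pairs shared by the MSA rows and the
    independent pairwise alignment (1.0 means the two agree completely)."""
    msa_pairs = aligned_pairs(msa_a, msa_b)
    pw_pairs = aligned_pairs(pw_a, pw_b)
    union = msa_pairs | pw_pairs
    if not union:
        return 1.0
    return len(msa_pairs & pw_pairs) / len(union)
```

A low agreement score for a recombinant and one of its inferred parents would suggest that the signal deserves a second look as a possible misalignment. Any cut-off applied to the score would be a further assumption.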

In addition to streamlined tools for managing, testing and editing information on detected recombination events, rdp3 also provides a range of new tools for users to cross-check how accurately the program has identified (i) groups of recombinants supposedly sharing traces of the same recombination events; (ii) recombinant and parental sequences; and (iii) recombination breakpoint positions. These include heat-plots indicating how closely the recombination patterns in two recombinants resemble one another in relation to their supposed parental sequences; color-coded phylogenetic trees for identifying recombinants and parental sequences; and maxchi (Maynard Smith, 1992) and lard (Holmes et al., 1999) breakpoint matrices for manually identifying breakpoint positions.
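
The idea behind a chi-square breakpoint scan of the kind used by maxchi can be sketched in simplified form. The version below assumes a single candidate breakpoint, a fixed half-window on either side of it, and that every site passed in is informative. The published method and the breakpoint matrices in rdp3 differ in detail, and the function names are hypothetical.

```python
from typing import List, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic (no continuity correction) for the 2x2 table
    [[a, b], [c, d]]; returns 0.0 when a row or column total is zero."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi2_breakpoint_scan(seq_x: str, seq_y: str,
                         half_window: int) -> List[Tuple[int, float]]:
    """Slide a candidate breakpoint along a sequence pair and compare the
    match/mismatch counts in the windows to its left and right."""
    match = [x == y for x, y in zip(seq_x, seq_y)]
    scores = []
    for bp in range(half_window, len(match) - half_window + 1):
        left = match[bp - half_window:bp]
        right = match[bp:bp + half_window]
        a, c = sum(left), sum(right)           # matches left / right
        b, d = half_window - a, half_window - c  # mismatches left / right
        scores.append((bp, chi2_2x2(a, b, c, d)))
    return scores
```

A peak in the returned scores marks the position at which the similarity between the two sequences changes most sharply, which is the kind of feature a user would look for when locating a breakpoint by hand.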

All of the automated recombination detection methods in rdp3 have been rigorously optimized for speed, and as a result the program is able to analyze datasets containing up to 40 million nt within 48 h on a standard 2 GHz processor with 2 GB of RAM. Such large datasets might, for example, consist of 20 full bacterial genome sequences or 1000 full viral genome sequences. With default program settings, datasets containing 100 sequences that are each 10 kb long can be analyzed within 10 min.
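
The dataset sizes quoted here and in the abstract can be checked with a few lines of arithmetic. The helper name is hypothetical, and the sequence counts and lengths are the ones given in the text.

```python
def total_nucleotides(n_sequences: int, length_nt: int) -> int:
    """Total alignment size in nucleotides: sequences x aligned length."""
    return n_sequences * length_nt


# 20 sequences of 2 megabases each (bacterial-genome scale)
bacterial = total_nucleotides(20, 2_000_000)   # 40,000,000 nt
# 1000 sequences of 10 kilobases each (viral-genome scale)
viral = total_nucleotides(1000, 10_000)        # 10,000,000 nt
# 100 sequences of 10 kilobases each (the 10 min example)
small = total_nucleotides(100, 10_000)         # 1,000,000 nt
```

Only the 20×2 megabase case reaches the 40 million nt upper limit. The 1000×10 kilobase case holds a quarter as many nucleotides but contains far more sequences.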

Funding: Wellcome Trust (to D.P.M.); Postdoctoral fellowship from the Fund for Scientific Research (FWO) Flanders (to Ph.L.); South African Centre of High Performance Computing bursary (to M.L.); European Research Council (ERC-2007-Stg 203161-PHYGENOM to D.P.); Spanish Ministry of Science and Education (BFU2009-08611 to D.P.); GIS CRVOI (grant NPRAO/AIRD/CRVOI/08/03 to Pi.L.); Wellcome Trust (grant number GR079127MA).

Conflict of Interest: none declared.

REFERENCES

  1. Arenas M, Posada D. The effect of recombination on the reconstruction of ancestral sequences. Genetics. 2010;184:1133–1139. doi: 10.1534/genetics.109.113423.
  2. Beiko RG, Hamilton N. Phylogenetic identification of lateral genetic transfer events. BMC Evol. Biol. 2006;6:15. doi: 10.1186/1471-2148-6-15.
  3. Boni MF, et al. An exact nonparametric method for inferring mosaic structure in sequence triplets. Genetics. 2007;176:1035–1047. doi: 10.1534/genetics.106.068874.
  4. Chenna R, et al. Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003;31:3497–3500. doi: 10.1093/nar/gkg500.
  5. Gibbs MJ, et al. Sister-scanning: a Monte Carlo procedure for assessing signals in recombinant sequences. Bioinformatics. 2000;16:573–582. doi: 10.1093/bioinformatics/16.7.573.
  6. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 2003;52:696–704. doi: 10.1080/10635150390235520.
  7. Heath L, et al. Recombination patterns in aphthoviruses mirror those found in other picornaviruses. J. Virol. 2006;80:11827–11832. doi: 10.1128/JVI.01100-06.
  8. Holmes EC, et al. Phylogenetic evidence for recombination in dengue virus. Mol. Biol. Evol. 1999;16:405–409. doi: 10.1093/oxfordjournals.molbev.a026121.
  9. Jakobsen IB, Easteal S. A program for calculating and displaying compatibility matrices as an aid in determining reticulate evolution in molecular sequences. Comput. Appl. Biosci. 1996;12:291–295. doi: 10.1093/bioinformatics/12.4.291.
  10. Kosakovsky Pond SL, et al. An evolutionary model-based algorithm for accurate phylogenetic breakpoint mapping and subtype prediction in HIV-1. PLoS Comput. Biol. 2009;5:e1000581. doi: 10.1371/journal.pcbi.1000581.
  11. Lefeuvre P, et al. Avoidance of protein fold disruption in natural virus recombinants. PLoS Pathog. 2007;3:e181. doi: 10.1371/journal.ppat.0030181.
  12. Lefeuvre P, et al. Widely conserved recombination patterns among single-stranded DNA viruses. J. Virol. 2009;83:2697–2707. doi: 10.1128/JVI.02152-08.
  13. Lemey P, et al. Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning. BMC Bioinformatics. 2009;10:126. doi: 10.1186/1471-2105-10-126.
  14. Lole KS, et al. Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 1999;73:152–160. doi: 10.1128/jvi.73.1.152-160.1999.
  15. Martin DP, et al. RDP2: recombination detection and analysis from sequence alignments. Bioinformatics. 2005a;21:260–262. doi: 10.1093/bioinformatics/bth490.
  16. Martin DP, et al. A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Res. Hum. Retrovir. 2005b;21:98–102. doi: 10.1089/aid.2005.21.98.
  17. Maynard Smith J. Analyzing the mosaic structure of genes. J. Mol. Evol. 1992;34:126–129. doi: 10.1007/BF00182389.
  18. McVean GA, et al. The fine-scale structure of recombination rate variation in the human genome. Science. 2004;304:581–584. doi: 10.1126/science.1092500.
  19. Minin VN, et al. Dual multiple change-point model leads to more accurate recombination detection. Bioinformatics. 2005;21:3034–3042. doi: 10.1093/bioinformatics/bti459.
  20. Padidam M, et al. Possible emergence of new geminiviruses by frequent recombination. Virology. 1999;265:218–225. doi: 10.1006/viro.1999.0056.
  21. Posada D, Crandall KA. Evaluation of methods for detecting recombination from DNA sequences: computer simulations. Proc. Natl Acad. Sci. USA. 2001;98:13757–13762. doi: 10.1073/pnas.241370698.
  22. Ronquist F, Huelsenbeck JP. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003;19:1572–1574. doi: 10.1093/bioinformatics/btg180.
  23. Schultz AK, et al. A jumping profile Hidden Markov Model and applications to recombination sites in HIV and HCV genomes. BMC Bioinformatics. 2006;7:265. doi: 10.1186/1471-2105-7-265.
  24. Voigt CA, et al. Protein building blocks preserved by recombination. Nat. Struct. Biol. 2002;9:553–558. doi: 10.1038/nsb805.
  25. Weiller GF. Phylogenetic profiles: a graphical method for detecting genetic recombinations in homologous sequences. Mol. Biol. Evol. 1998;15:326–335. doi: 10.1093/oxfordjournals.molbev.a025929.

Articles from Bioinformatics are provided here courtesy of Oxford University Press
