2012 Apr 6;7(4):e33394. doi: 10.1371/journal.pone.0033394

Table 2. A workflow for phylogenetic inference using RAD sequences.

Steps to determine if the species you wish to study are appropriate for RAD phylogenetics
How much evolutionary divergence time do you expect between taxa? RAD appears to work well for divergences of ≤50 million years but consistently fails at ≥100 million years.
Collect samples
High quality genomic DNA is the required input. It is better if there is a continuum of relatedness between taxa so that each species has at least some close relatives included in the analysis.
Prep and sequence DNA
This can be done either in house or by sending samples to a sequencing facility [8]. The basic procedure is to cut genomic DNA with the chosen restriction enzyme, randomly fragment the resulting pieces, barcode the samples so that many can be pooled in a single sequencing run for cost-efficiency, and sequence.
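Restriction digestion can also be simulated in silico, which helps show what a RAD locus is: the sequence flanking each restriction site. The following is an illustrative sketch only. It assumes SbfI's recognition site (CCTGCAGG) as the example enzyme and an uppercase genome string. The names `rad_tags` and `reverse_complement` are hypothetical. It also ignores the exact cut position within the site and the random shearing step.

```python
# Hypothetical in silico RAD digest (illustrative sketch, not a lab protocol).
# Assumption: SbfI (site CCTGCAGG) is the example enzyme; substitute your own.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rad_tags(genome: str, site: str = "CCTGCAGG", read_len: int = 20) -> list[str]:
    """Return the two tags flanking every occurrence of `site`.

    Each tag starts at the recognition site and extends outward for
    `read_len` bases in total, read 5'->3' away from the site. The left-hand
    tag is therefore reverse-complemented; because SbfI's site is
    palindromic, both tags begin with the site itself. Tags that would run
    past either end of `genome` are skipped.
    """
    if read_len < len(site):
        raise ValueError("read_len must be at least the length of the site")
    tags = []
    start = genome.find(site)
    while start != -1:
        end = start + len(site)
        if end - read_len >= 0:  # left-hand tag fits
            tags.append(reverse_complement(genome[end - read_len:end]))
        if start + read_len <= len(genome):  # right-hand tag fits
            tags.append(genome[start:start + read_len])
        start = genome.find(site, start + 1)
    return tags
```

For example, `rad_tags("GGGGGGGGGG" + "CCTGCAGG" + "TTTTTTTTTT", read_len=12)` yields one tag from each side of the single site.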
Filter sequences and call consensus loci
Some sequence reads will be ambiguous or of low quality; these should be discarded. High coverage of each locus allows probabilistic inference of the most likely base at each position [12][15].
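Filtering and consensus calling can be sketched as follows. The thresholds (`min_q`, `max_n`, `min_depth`, `min_freq`) and all function names are hypothetical. The consensus uses a simple majority rule as a stand-in for the probabilistic base-calling methods cited above [12][15]. It is not those methods.

```python
from collections import Counter


def phred_scores(qual_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in qual_string]


def passes_filter(seq: str, quals: list[int], min_q: int = 20, max_n: int = 0) -> bool:
    """Keep a read only if every base has quality >= min_q and it has at
    most max_n ambiguous 'N' bases (hypothetical thresholds)."""
    if not quals or len(seq) != len(quals):
        return False
    return seq.upper().count("N") <= max_n and min(quals) >= min_q


def consensus(reads: list[str], min_depth: int = 5, min_freq: float = 0.8) -> str:
    """Majority-rule consensus of equal-length reads stacked at one locus.

    A position is called 'N' if fewer than min_depth reads cover it with a
    non-N base, or if the most common base is below min_freq of those reads.
    """
    if not reads:
        return ""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch assumes reads of equal length")
    calls = []
    for i in range(length):
        bases = [r[i] for r in reads if r[i] != "N"]
        if len(bases) < min_depth:
            calls.append("N")  # too little coverage to call a base
            continue
        base, count = Counter(bases).most_common(1)[0]
        calls.append(base if count / len(bases) >= min_freq else "N")
    return "".join(calls)
```

Raising `min_depth` trades fewer called loci for more confident calls. This mirrors the point that high coverage is what makes base calling reliable.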
Cluster sequences (Step 1 from Methods)
A range of clustering similarity thresholds should be tried to test the consistency and plausibility of the results. UCLUST [16] is fast and effective at finding homologous sequences.
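The effect of the similarity threshold can be illustrated with a simplified greedy, centroid-based clustering. This is in the spirit of tools such as UCLUST but is not UCLUST's actual algorithm. Each sequence joins the first existing centroid it matches at or above the threshold; otherwise it founds a new cluster. The sketch assumes equal-length, ungapped reads. `identity` and `greedy_cluster` are hypothetical names.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, ungapped reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: list[tuple[str, str]], threshold: float) -> list[list[tuple[str, str]]]:
    """Greedy centroid clustering of (taxon, sequence) pairs in input order."""
    centroids: list[str] = []
    clusters: list[list[tuple[str, str]]] = []
    for name, seq in seqs:
        for idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[idx].append((name, seq))
                break
        else:
            # No centroid is similar enough: start a new cluster.
            centroids.append(seq)
            clusters.append([(name, seq)])
    return clusters
```

Running it over several thresholds, e.g. `for t in (0.8, 0.85, 0.9): print(t, len(greedy_cluster(reads, t)))`, shows how lower similarity thresholds merge more sequences into fewer clusters.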
Choose minimum taxa cluster sizes (Step 2)
Small minimum taxa cluster sizes tend to produce the best topologies, but larger values may be useful with very large datasets. Any cluster containing fewer taxa than the chosen minimum is excluded, as is any cluster in which a sample is represented by more than one sequence.
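The two exclusion rules in this step can be written as a short filter. Clusters are assumed to be lists of `(taxon, sequence)` pairs, and `filter_clusters` is a hypothetical name.

```python
def filter_clusters(clusters: list[list[tuple[str, str]]], min_taxa: int) -> list[list[tuple[str, str]]]:
    """Apply the two exclusion rules of the minimum taxa cluster size step."""
    kept = []
    for cluster in clusters:
        taxa = [taxon for taxon, _ in cluster]
        if len(set(taxa)) != len(taxa):
            continue  # some sample is represented by more than one sequence
        if len(taxa) < min_taxa:
            continue  # fewer taxa than the chosen minimum cluster size
        kept.append(cluster)
    return kept
```

Because the minimum is a single integer, comparing results across several `min_taxa` values is a simple loop.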
Align clusters of sequences (Step 3)
Each cluster of sequences should be individually aligned using an automated alignment program. The volume of data precludes manual alignment.
Concatenate clusters (Step 3)
All clusters should be concatenated, with gaps filled in for any taxon missing from a given cluster. The resulting matrix will contain many missing sequences.
Reconstruct phylogeny
RAxML [19] is fast, can handle matrices millions of base pairs long, and can reconstruct accurate topologies from this type of data, but other methods can also be used.
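RAxML reads alignments in PHYLIP format. The sketch below serializes a concatenated matrix (a dict of taxon name to sequence) into relaxed PHYLIP. In that format the header line gives the number of taxa and characters, and each row is a name, a space, and the sequence. The helper `to_phylip` is hypothetical.

```python
def to_phylip(matrix: dict[str, str]) -> str:
    """Serialize a taxon -> sequence matrix as relaxed PHYLIP text."""
    lengths = {len(s) for s in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")
    nchar = lengths.pop()
    lines = [f"{len(matrix)} {nchar}"]
    for name, seq in matrix.items():
        if any(ch.isspace() for ch in name):
            # Whitespace would be read as the end of the taxon name.
            raise ValueError(f"taxon name contains whitespace: {name!r}")
        lines.append(f"{name} {seq}")
    return "\n".join(lines) + "\n"
```

After writing the text to a file, an invocation along the lines of `raxmlHPC -s rad.phy -n rad_run -m GTRGAMMA -p 12345` starts a search. The binary name and substitution model depend on the installed version and the analysis, so check the RAxML manual before relying on this command.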
Compare results from different parameters
Different sets of reasonably chosen parameters should produce similar topologies. Although low clustering similarities were successful in our study, higher similarities may be more useful for more recently diverged taxa. Lower clustering thresholds may retain more data. However, they may also cause more data to be discarded, because multiple sequences from a single species more often end up in the same cluster, and such clusters are excluded.
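The table does not name a measure for comparing topologies. One common choice, swapped in here as an assumption, is the unrooted Robinson–Foulds distance: the number of nontrivial bipartitions found in one tree but not the other. The minimal Newick parser below ignores branch lengths and internal node labels and does not handle quoted names. All function names are hypothetical.

```python
import re


def parse_clades(newick: str) -> tuple[set[str], list[frozenset[str]]]:
    """Return leaf names and the leaf set below every internal node."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick)
    stack: list[list[str]] = [[]]
    leaves: set[str] = set()
    clades: list[frozenset[str]] = []
    prev = None
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            members = stack.pop()
            clades.append(frozenset(members))
            stack[-1].extend(members)
        elif tok not in ",;":
            label = tok.split(":")[0].strip()  # drop any branch length
            if label and prev != ")":  # text after ')' is an internal label
                leaves.add(label)
                stack[-1].append(label)
        prev = tok
    return leaves, clades


def splits(newick: str) -> tuple[set[str], set[frozenset[str]]]:
    """Nontrivial bipartitions, each stored as the side lacking a reference taxon."""
    leaves, clades = parse_clades(newick)
    ref = min(leaves)
    out = set()
    for clade in clades:
        side = clade if ref not in clade else frozenset(leaves - clade)
        if 2 <= len(side) <= len(leaves) - 2:
            out.add(side)
    return leaves, out


def robinson_foulds(tree1: str, tree2: str) -> int:
    """Unrooted Robinson-Foulds distance between two trees on the same taxa."""
    leaves1, s1 = splits(tree1)
    leaves2, s2 = splits(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must contain the same taxa")
    return len(s1 ^ s2)
```

A distance of 0 means identical unrooted topologies. For fully resolved trees on n taxa, dividing by 2(n − 3) gives a value between 0 and 1. This makes runs with different clustering similarities or minimum taxa cluster sizes easier to compare.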