Gomez-Alpizar et al. 10.1073/pnas.0611479104.
Fig. 3. (A) Intron Ras sequence chromatograms of Phytophthora infestans showing sites with double peaks indicating heterozygotes. F, forward; R, reverse. (B) NruI restriction digestion of Ras PCR products from P. infestans isolates. Lanes 2 and 3, CR-51 and CR-52 (homozygotes). Lanes 4 and 5, PER 800 and PER 832 (heterozygotes). Lanes 1 and 6, 100-bp ladder. (C) NcoI restriction digestion of β-tubulin PCR products from P. infestans isolates. Lane 2, undigested PCR product. Lanes 3, 4, and 5, BOL-3, BOL-6, and BOL-14 (heterozygotes). Lanes 6 and 7, US-13 and US-14 (heterozygotes). Lanes 8 and 9, PIC97620 and PIC97224 (homozygotes). Lanes 10 and 11, CR-52 and CR-61 (heterozygotes). Lanes 1 and 12, 100-bp ladder.
Fig. 4. Flowchart of SNAP Workbench platform. The double arrows indicate the data steps taken (Path 1) in analysis of the Phytophthora infestans multilocus sequence data.
SI Text
Statistical analysis.
Sequence data from mitochondrial loci (P3 and P4) and nuclear loci (Intron Ras, Ras, and β-tubulin) were combined separately into alignment files using SNAP Combine (52). Sequences were collapsed into unique haplotypes using SNAP Map (52) after removing insertions and deletions (indels) from each of the aligned multilocus data sets and excluding infinite-sites violations. Base substitutions were categorized as phylogenetically informative or uninformative, as transitions or transversions, and as replacement or synonymous amino acid changes in the coding region of each alignment. The resulting haplotype data sets were used to examine the overall support or conflict among the variable sites in the DNA sequence alignment. A site compatibility matrix was generated from each haplotype data set using SNAP Clade and Matrix (50). Compatibility matrices were used to examine compatibility or incompatibility among all variable sites, and any incompatible sites were removed from the data set. This step was important because subsequent coalescent analyses assume that all variable sites are fully compatible. Data sets were also evaluated using RecMin (53) for evidence of recombination boundaries and to estimate the minimum number of recombination events. Conflicting data partitions and putative recombinant haplotypes were also excluded from further analyses, except when testing for population subdivision using Hudson's test statistics, because recombination increases the power of these tests (28). Nonrecombining data sets were collapsed into unique haplotypes, excluding infinite-sites violations, using SNAP Map (Fig. 4, path 1). Three nonrecombining data sets were defined as follows: the mitochondrial regions P3 and P4, intron Ras (Intron Ras), and the combined intron Ras and Ras gene (IntronRas + Ras). Data from the β-tubulin gene were not used in further analyses because the only segregating site found in this gene was incompatible with the variation in all other loci examined.
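The collapsing and compatibility steps described above can be illustrated with a minimal sketch. It is not the SNAP Map or SNAP Clade and Matrix implementation; it assumes an indel-free alignment given as equal-length strings, treats every variable site as biallelic, and uses the four-gamete criterion as a stand-in for pairwise site compatibility under the infinite-sites model. All function names and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def variable_sites(seqs):
    """Return indices of alignment columns that carry more than one state."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def collapse_haplotypes(seqs):
    """Map each unique sequence (haplotype) to its frequency in the sample."""
    return Counter(seqs)


def compatible(seqs, i, j):
    """Four-gamete criterion (assumes biallelic sites): two sites are
    compatible if fewer than four two-site combinations are observed."""
    return len({(s[i], s[j]) for s in seqs}) < 4


def incompatible_pairs(seqs):
    """List every pair of variable sites that fails the four-gamete criterion."""
    sites = variable_sites(seqs)
    return [(i, j) for i, j in combinations(sites, 2)
            if not compatible(seqs, i, j)]


# Toy alignment (hypothetical data, not from the study).
alignment = ["AACG", "AACG", "ATCG", "ATTG", "AATG"]
haplotypes = collapse_haplotypes(alignment)
conflicts = incompatible_pairs(alignment)
```

In this toy alignment, sites 1 and 2 show all four two-site combinations, so the pair is flagged as incompatible; dropping the last sequence removes the conflict.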
Neutrality tests and population subdivision.
Polymorphism analysis was carried out on nonrecombining data sets using the program DnaSP version 4 (54). Estimates of the number of segregating sites and sequence diversity statistics, including Watterson's θ (estimated as θw) (55) and Tajima's π (56), were calculated for the entire sample and for each locus separately, as well as for the eight sampled areas in the South American (SA: Brazil, Bolivia, Peru, and Ecuador) and non-South American (NSA: Costa Rica, Mexico, the US, and Ireland) regions. To increase sample sizes, we amalgamated Brazilian and Bolivian isolates (BRABO), Peruvian and Ecuadorian isolates (PEECU), Mexican and Costa Rican isolates (MECO), and US and Irish isolates (USIR). Criteria for amalgamating populations included sample size, geographical location (sampling areas), sharing of genotypes (Table S8), and relation to migration hypotheses. Sequence variation was tested for deviations from neutrality using Tajima's D, Fu and Li's D* and F*, and Fu's Fs tests (57, 58, 59). Genetic differentiation among populations was analyzed using SNAP Map, Seqtomatrix, and Permtest (28, 30) as implemented in SNAP Workbench. Permtest is a nonparametric permutation method based on Monte Carlo simulations that estimates Hudson's test statistics (KST, KS, and KT) under the null hypothesis of no genetic differentiation. Significance was evaluated by performing 1,000 permutations for each nuclear and mitochondrial data set, including incompatible sites and recombinant sequences (recombination blocks; Fig. 4, path 1).
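As a rough illustration of the quantities named above, the following sketch computes the number of segregating sites, Watterson's θ, the mean number of pairwise differences (π, per locus rather than per site), Tajima's D, and a permutation test on KST = 1 - KS/KT. It is not the DnaSP or Permtest code. The weighting of within-population diversity by n_i/n in KS is an assumption made here for simplicity, and the function names, seed, and toy sequences are hypothetical.

```python
import math
import random
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that carry more than one state."""
    return sum(len(set(col)) > 1 for col in zip(*seqs))


def mean_pairwise(seqs):
    """Mean number of pairwise differences per locus (pi, not per site)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)


def watterson_theta(seqs):
    """Watterson's estimator: S divided by a1 = sum_{i=1}^{n-1} 1/i."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return segregating_sites(seqs) / a1


def tajimas_d(seqs):
    """Tajima's D; returns NaN when undefined (n < 4 or no segregating sites)."""
    n, s = len(seqs), segregating_sites(seqs)
    if n < 4 or s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    e1 = (b1 - 1.0 / a1) / a1
    e2 = (b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2) / (a1 ** 2 + a2)
    return (mean_pairwise(seqs) - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def kst(groups):
    """KST = 1 - KS/KT, with KS weighted by n_i/n (assumed weighting)."""
    pooled = [s for g in groups for s in g]
    kt = mean_pairwise(pooled)
    if kt == 0:
        return 0.0
    ks = sum(len(g) / len(pooled) * mean_pairwise(g) for g in groups)
    return 1.0 - ks / kt


def permutation_test(groups, n_perm=1000, seed=1):
    """Shuffle sequences among groups of fixed size; return (KST, p-value)."""
    rng = random.Random(seed)
    observed = kst(groups)
    pooled = [s for g in groups for s in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if kst(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy samples (hypothetical data, not from the study).
sa = ["AAGT", "AAGT", "AAGC", "AAGT"]
nsa = ["CTGT", "CTGT", "CTGC", "CTGT"]
observed_kst, p_value = permutation_test([sa, nsa], n_perm=1000, seed=1)
```

The p-value is the fraction of permuted data sets whose KST is at least as large as the observed value, which mirrors the logic of testing against the null hypothesis of no genetic differentiation.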
Migration analysis.
In populations where subdivision was observed, we further tested the null hypothesis of isolation between SA and NSA populations using the "isolation with migration" (IM) program developed by Hey and Nielsen (31). Under the null hypothesis of isolation, we would expect migration (M) to be near 0. If migration rates are nonzero and the SA and NSA populations split a long time ago, then divergence is consistent with a two-island model and equilibrium migration rates. IM implements Markov chain Monte Carlo simulations to estimate the posterior probability distribution of multiple demographic parameters, including effective population size, divergence time, and migration rates, for a pair of closely related populations or species. The IM model assumes neutrality and no recombination; therefore, these analyses were performed on nonrecombining and neutrally evolving loci (Fig. 4, path 1).
Genealogical analysis.
The ancestral history of the populations for each nonrecombining data set [mitochondrial (P3+P4) and nuclear (IntronRas, separate data not shown, and IntronRas+Ras)] was inferred using Genetree (version 9.0) (32) in SNAP Workbench (50). The genealogy with the highest root probability, the ages of mutations, the time to the most recent common ancestor (TMRCA) of the sample, and the geographic distribution of the mutations were estimated from coalescent simulations. Coalescent analyses with population subdivision were performed, and a backward migration matrix was estimated for each locus using IM. Recombinant haplotypes were identified a priori using SNAP Clade and excluded from the analysis. Coalescent simulations were performed assuming an infinite-sites model, constant population size, and population subdivision. Gene genealogies for each locus were inferred using five million simulations of the coalescent. Additionally, we performed five independent runs of five million simulations, each with a different starting random number seed, to ensure convergence.
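The idea of checking convergence across independently seeded runs can be sketched with a much simpler model than the one used by Genetree. The example below simulates only the TMRCA of a sample under the standard neutral coalescent with constant population size and no subdivision, in coalescent time units, and compares the mean across seeds with the known expectation 2(1 - 1/n). It does not condition on sequence data, and the function names, sample size, number of simulations, and seeds are all hypothetical.

```python
import random


def simulate_tmrca(n, rng):
    """One draw of the TMRCA for n lineages: while k lineages remain, the
    waiting time to the next coalescence is exponential with rate k(k-1)/2."""
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)
    return t


def mean_tmrca(n, n_sims, seed):
    """Average TMRCA over n_sims simulations started from a given seed."""
    rng = random.Random(seed)
    return sum(simulate_tmrca(n, rng) for _ in range(n_sims)) / n_sims


def run_replicates(n, n_sims, seeds):
    """Run one independent set of simulations per seed."""
    return {seed: mean_tmrca(n, n_sims, seed) for seed in seeds}


# Five independent runs with different seeds (hypothetical settings).
estimates = run_replicates(n=10, n_sims=20000, seeds=[1, 2, 3, 4, 5])
spread = max(estimates.values()) - min(estimates.values())
expected = 2.0 * (1.0 - 1.0 / 10)
```

A small spread among the per-seed estimates, relative to their distance from the expected value, suggests that the number of simulations is large enough for the summary of interest to be stable.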