Supporting Materials and Methods

DNA Sequencing. For analysis of mtDNA, we sequenced the HVS1 control region, as described (1). PCR products were purified by using the Qiaquick purification kit (Qiagen, Valencia, CA). Independent sequences from both DNA strands were obtained by using the ABI PRISM Dye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems). For Helicobacter pylori, gene fragments from atpA, efp, mutY, ppa, trpC, ureI, vacA, and yphC were amplified and sequenced from all isolates, as described (2).

Microsatellite Genotyping. We genotyped individuals for 17 unlinked nuclear microsatellites (GenBank accession nos. D20S161, G08055, G08056, G08057, UT717, UT970, UT1220, UT1257, UT1376, UT1674, UT1689, UT1704, UT1708, UT2092, UT2127, UT5029, and UT6540), largely consisting of tetranucleotide repeats. PCR amplifications were conducted in 10 µl volumes containing 10–50 ng of DNA, 300 pmol of each primer, 75 µM of each nucleotide, 1.2 mM MgCl2, 1× Taq buffer (10 mM Tris•HCl pH 9, 50 mM KCl) and 0.25 units of Taq polymerase (Perkin–Elmer). The PCR protocol comprised an initial denaturation at 95°C for 4 min, followed by 32 cycles of denaturation at 95°C for 30 s, primer annealing at 55°C for 45 s, and extension at 72°C for 1 min, and a final 5-min extension at 72°C. We mixed 5 µl of each PCR product with 2 µl of blue formamide and loaded them on a 12% polyacrylamide gel. Electrophoresis was for 15 h at 15 or 25 W, depending on fragment sizes. The allele sizes were calculated by reference to a size standard (10-bp DNA ladder; Invitrogen).

Statistical Analysis. Sequences were aligned by using seqlab and pileup [Wisconsin Package Version 9.1, Genetics Computer Group (GCG), Madison, WI]. The best-fit model of DNA substitution and the parameter estimates used for HVS1 tree reconstruction were chosen by performing hierarchical likelihood ratio tests implemented in paup* (3) and modeltest 1.05 (4). A neighbor-joining tree (5) was estimated with paup*, incorporating the best-fit maximum-likelihood model of evolution. Confidence in the tree relationships was assessed by using 1,000 bootstrap replicates (6). A neighbor-joining tree of the multilocus haplotypes from H. pylori was estimated in mega (7). The hierarchical components of variation in human microsatellite genotypes and mtDNA as well as microbial sequences were computed using the Analysis of Molecular Variance (AMOVA) procedure in arlequin 2.0 (available at http://anthropologie.unige.ch/arlequin). The AMOVA procedure incorporates both the estimated divergence between sequences and their frequencies. The significance of differences in haplotype frequencies between populations was estimated by using χ² test 2 of clump (8), which assesses the significance of data in sparse matrices by a Monte Carlo method. To detect subtle, individual genetic patterns, a factorial correspondence analysis (FCA) was performed with genetix, which graphically projects the individuals on the factor space defined by the similarity of their allelic states.
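The Monte Carlo assessment of a sparse haplotype-by-population table can be illustrated with a minimal sketch. It assumes the simplest variant of the idea: a plain χ² statistic on the raw table, with significance estimated by shuffling haplotype labels among individuals so that both margins stay fixed. It does not reproduce clump's own statistics or its handling of rare columns, and the function names and example counts are hypothetical.

```python
import random
from collections import Counter


def chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # empty columns contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def monte_carlo_p(table, n_sim=2000, seed=0):
    """Empirical p-value from permutations that keep both margins fixed."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (population, haplotype) label pair per individual.
    rows = [i for i, row in enumerate(table) for _ in range(sum(row))]
    cols = [j for row in table for j, c in enumerate(row) for _ in range(c)]
    observed = chi_square(table)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(cols)  # reassign haplotypes to individuals at random
        counts = Counter(zip(rows, cols))
        sim = [[counts[(i, j)] for j in range(n_cols)] for i in range(n_rows)]
        if chi_square(sim) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Hypothetical counts: two populations (rows) by three haplotypes (columns).
example = [[10, 0, 0], [0, 10, 0]]
stat, p_value = monte_carlo_p(example)
```

Because the null distribution is built from the data, the test does not depend on the large-sample χ² approximation, which is unreliable when many cells have small expected counts.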

To examine whether genetic clusters corresponded to the two ethnic groups, microsatellite genotypes and bacterial sequences were analyzed with structure (9), which identifies clustered sources of ancestry of multilocus genotypes among genetically similar individuals. We assumed that each individual had multiple sources of ancestry due to admixture and therefore estimated the fraction of ancestry from each source for each individual. The bacterial sequences include linked nucleotides within each gene fragment and were evaluated with the "linkage" model of structure (10), which uses the map location of individual polymorphisms for its calculations. Microsatellites are essentially unlinked and were therefore analyzed with the "admixture" model, which does not use information from genetic linkage to recognize populations. The number of ancestral clusters, K, was determined by comparing log-likelihoods in multiple runs for values of K between one and five. Each run consisted of 100,000 iterations with a burn-in period of 30,000, and multiple runs with the same value of K produced nearly identical results.
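The comparison of log-likelihoods across values of K can be sketched as follows. The text states only that log-likelihoods were compared. Converting them to approximate posterior probabilities under a uniform prior on K is an assumption made here for illustration, and the function name and log-likelihood values are hypothetical rather than results from this study.

```python
import math


def posterior_over_k(log_likelihoods):
    """Approximate Pr(K | data) from replicate ln Pr(data | K) values.

    log_likelihoods maps each K to a list of log-likelihoods, one per run.
    A uniform prior on K is assumed.
    """
    mean_ll = {k: sum(v) / len(v) for k, v in log_likelihoods.items()}
    best = max(mean_ll.values())
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = {k: math.exp(ll - best) for k, ll in mean_ll.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical log-likelihoods from two runs at each K.
runs = {
    1: [-1000.0, -1001.0],
    2: [-950.0, -950.5],
    3: [-960.0, -961.0],
}
posterior = posterior_over_k(runs)
best_k = max(posterior, key=posterior.get)
```

Averaging over replicate runs is reasonable only when runs at the same K agree closely, as reported above.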

Genetic Variability and Population Genetics Parameters. To determine whether population expansion had occurred, we calculated the theoretical mismatch distribution expected for a population of constant size using DnaSP (11) and compared it with the observed data using the Kolmogorov–Smirnov test. Allelic diversity, genetic variation [observed heterozygosity under Hardy–Weinberg equilibrium (HWE)], deviation from HWE and genetic differentiation were all calculated with genepop (available at http://wbiomed.curtin.edu.au/genepop). Variation of allelic frequencies among samples was assessed with genetix 4.0, as described (12): first, the null hypothesis of homogeneity in allelic distribution was tested by Fisher’s exact test using the Markov chain method, and then the standardized variance in allelic frequencies (θ) was determined as an estimator of FST.
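The mismatch comparison can be illustrated with a short sketch. It assumes aligned sequences of equal length, takes the constant-size expectation to be the geometric form θ^i/(1+θ)^(i+1) with θ set to the mean number of pairwise differences, and computes only the Kolmogorov–Smirnov distance, not its p-value. These choices are assumptions for illustration and are not confirmed to match the DnaSP settings used here; the sequences and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(a, b):
    """Number of sites at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def mismatch_distribution(seqs):
    """Observed frequency of each pairwise-difference class, from 0 upward."""
    counts = Counter(pairwise_differences(a, b) for a, b in combinations(seqs, 2))
    n_pairs = sum(counts.values())
    return [counts[d] / n_pairs for d in range(max(counts) + 1)]


def constant_size_expectation(theta, max_d):
    """Expected frequencies under constant population size (assumed form)."""
    return [theta**i / (1 + theta) ** (i + 1) for i in range(max_d + 1)]


def ks_distance(observed, expected):
    """Largest absolute gap between the two cumulative distributions."""
    distance = cum_obs = cum_exp = 0.0
    for o, e in zip(observed, expected):
        cum_obs += o
        cum_exp += e
        distance = max(distance, abs(cum_obs - cum_exp))
    return distance


# Hypothetical aligned sequences.
seqs = ["AAAA", "AAAT", "AATT", "TTTT"]
observed = mismatch_distribution(seqs)
theta = sum(d * f for d, f in enumerate(observed))  # mean pairwise difference
expected = constant_size_expectation(theta, len(observed) - 1)
d_stat = ks_distance(observed, expected)
```

A smooth, unimodal observed distribution that departs strongly from the constant-size expectation is the pattern usually read as a signal of past expansion.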

1. Salas, A., Richards, M., De la Fe, T., Lareu, M. V., Sobrino, B., Sanchez-Diz, P., Macaulay, V. & Carracedo, A. (2002) Am. J. Hum. Genet. 71, 1082–1111.

2. Achtman, M., Azuma, T., Berg, D. E., Ito, Y., Morelli, G., Pan, Z.-J., Suerbaum, S., Thompson, S., van der Ende, A. & van Doorn, L. J. (1999) Mol. Microbiol. 32, 459–470.

3. Swofford, D. L. (2002) paup*, Phylogenetic Analysis Using Parsimony (*and Other Methods) (Sinauer, Sunderland, MA), Version 4.

4. Posada, D. & Crandall, K. A. (1998) Bioinformatics 14, 817–818.

5. Saitou, N. & Nei, M. (1987) Mol. Biol. Evol. 4, 406–425.

6. Felsenstein, J. (1985) Evolution 39, 783–791.

7. Kumar, S., Tamura, K., Jakobsen, I. B. & Nei, M. (2001) Bioinformatics 17, 1244–1245.

8. Sham, P. C. & Curtis, D. (1995) Ann. Hum. Genet. 59, 97–105.

9. Pritchard, J. K., Stephens, M. & Donnelly, P. (2000) Genetics 155, 945–959.

10. Falush, D., Stephens, M. & Pritchard, J. K. (2003) Genetics 164, 1567–1587.

11. Rozas, J. & Rozas, R. (1999) Bioinformatics 15, 174–175.

12. Weir, B. S. & Cockerham, C. C. (1984) Evolution 38, 1358–1370.