Supporting information for Aanen et al. (2002) Proc. Natl. Acad. Sci. USA, 10.1073/pnas.222313099

 

Taxon Sampling with Geographical Origin and GenBank Accession Numbers

Notes on Termite Taxon Sampling.

Kambhampati and Eggleton (1) recognize 14 genera of Macrotermitinae. Of these:
  1. Sphaerotermes was included in our original ingroup data set but did not fall within the same clade as the fungus-growing termites (i.e., the presently recognized Macrotermitinae is polyphyletic).
  2. Euscaiotermes and Parahypotermes are described inadequately and probably are best synonymized with Odontotermes.
  3. We were unable to sequence samples of Allodontermes and Megaprotermes (old 80% alcohol samples from the Natural History Museum, London). The former genus is closest morphologically to Synacanthotermes (W. A. Sands, personal communication), and the latter genus is closest morphologically to (and may be nested phylogenetically within) Protermes (2).

In Table 1, details are provided on all samples used in this study.

General Information on Phylogenetic Analysis

We analyzed the different DNA sequences by using Bayesian techniques (3) to estimate the phylogenetic histories of the two interacting mutualists. Advantages of the Bayesian approach are: (i) it is fast so that complex models of sequence evolution can be used, (ii) a single analysis gives information on both a phylogeny and support for individual branches, and (iii) phylogenetic uncertainty can be taken into account when using phylogenies to test hypotheses. Because Bayesian posterior probabilities cannot be calculated analytically even for small phylogenetic problems, they must be estimated by using other methods. One method is the Markov chain Monte Carlo approach with Metropolis-Hastings sampling (3). For most analyses we used mrbayes 2.01, but for the fungal sequences we performed a combined analysis with test version 3.0, because this version allows the simultaneous use of different models for different data partitions. In Termite Data Set and Phylogenetic Analyses and Fungal Data Set and Phylogenetic Analyses we give the specific details for the termite and fungal data sets, respectively. Here we also provide details on alternative maximum-likelihood (ML) and maximum-parsimony (MP) analyses of the same data sets.
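The Metropolis-Hastings idea can be illustrated with a minimal sketch. This is not the mrbayes implementation: instead of trees and substitution-model parameters, it samples a single toy parameter whose unnormalized log posterior is assumed to be that of a standard normal distribution. The function names, step size, chain length, and burn-in are all illustrative assumptions.

```python
import math
import random

def log_target(x):
    """Unnormalized log posterior of a toy parameter (assumed standard normal)."""
    return -0.5 * x * x

def metropolis_hastings(n_steps, step_size=1.0, start=0.0, seed=1):
    """Random-walk Metropolis-Hastings sampler with a symmetric uniform proposal."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.uniform(-step_size, step_size)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept with probability min(1, ratio). The proposal is symmetric,
        # so the Hastings correction term cancels.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

samples = metropolis_hastings(20000)
kept = samples[2000:]  # discard an initial burn-in portion of the chain
mean = sum(kept) / len(kept)
variance = sum((s - mean) ** 2 for s in kept) / len(kept)
```

The key property is that only the ratio of posterior densities is needed, so the normalizing constant never has to be computed. In a phylogenetic setting the state would be a tree with branch lengths and model parameters, and the proposals would modify those.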

Termite Data Set and Phylogenetic Analyses

Termite total DNA was isolated from legs (large species) or thorax tissue including legs (small species) by using the Qiagen DNeasy tissue kit. A region of cytochrome oxidase subunit 1 (COI), ~1,000 bp in size, was amplified as two fragments, with an overlap of 50–150 bp. Before this study no COI sequences for termites were known, so primers initially were based on the COI sequence of Blattella germanica (GenBank accession no. S72627) (4) to amplify a fragment of 700 bp (Bl-L-1960 and Bl-H-2653) for two termite species. Based on these first two sequences, new primers for termites were designed and tested in combination with one Blattella primer (BL1812 and BH2928). Subsequently, other primers were developed based on the growing number of sequences becoming available during this study. In Table 2 the sequences of the primers used in this study to amplify the termite COI region are given.

Model of Sequence Evolution.

We used modeltest (5) to estimate the best-fit model for the evolution of the sequences. The model selected was the general time-reversible model with gamma-distributed rate variation and a proportion of invariable sites (GTR + G + I). We then tested whether the GTR model with site-specific substitution rates (GTR + SSR) statistically improved the fit of the data. The ln(L) of the ML tree using site-specific rates was –7,501.17378, and the ln(L) of the ML tree using gamma-distributed site rate variation was –7,675.73763 (d = 174.56385). We used parametric bootstrapping (6) to test the significance of the observed difference. We produced 100 data sets of the same size as the original data set using seq-gen (7), with model parameters estimated from the original data according to the null model [GTR + G + I, with R matrix (0.3124, 5.7921, 0.6569, 0.00001, 6.9015, 1); base frequencies (A = 0.3843, C = 0.3028, G = 0.0983, T = 0.2146); shape of gamma-distribution 1.9792; and proportion of invariable sites 0.6318]. We calculated log-likelihood scores of the simulated data under both H0 and H1 in paup* 4.0b10 [heuristic search option using a neighbor-joining (NJ) starting tree and tree bisection-reconnection (TBR) branch swapping (with 300 rearrangements)] (8) and calculated the difference d between each pair of scores. We tested the significance of the two models according to ref. 14 and rejected the GTR + G + I model in favor of the GTR + SSR model (in all simulations d < 0, P < 0.01).
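The arithmetic of this test can be sketched as follows. The two log-likelihoods are the values reported above. The list of simulated differences is a hypothetical placeholder that only mirrors the reported pattern (all simulated d < 0); it is not the study's output. The function name is likewise an assumption.

```python
def bootstrap_p_value(observed_d, simulated_ds):
    """Fraction of null-simulated differences at least as large as the observed one."""
    exceed = sum(1 for d in simulated_ds if d >= observed_d)
    return exceed / len(simulated_ds)

# Reported log-likelihoods of the ML trees under the two models.
lnl_h1 = -7501.17378  # GTR with site-specific rates
lnl_h0 = -7675.73763  # GTR + G + I (null model)
observed_d = lnl_h1 - lnl_h0  # approximately 174.56385

# PLACEHOLDER values standing in for the 100 simulated differences.
simulated_ds = [-0.1 * (i + 1) for i in range(100)]

p = bootstrap_p_value(observed_d, simulated_ds)
# With no exceedances among n replicates, the result is reported as P < 1/n.
report = f"P < {1 / len(simulated_ds):.2f}" if p == 0 else f"P = {p:.2f}"
```

With 100 replicates and no simulated difference reaching the observed value, the smallest bound that can be stated is P < 0.01, which matches the resolution reported in the text.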

Bayesian Analysis.

In an initial analysis we included nine species of non-fungus-growing Termitidae (including Sphaerotermes sphaerothorax) and one species of the Rhinotermitidae to root the tree. From this first analysis we concluded that the fungus-growing termites form a monophyletic group (posterior probability = 0.98, Fig. 2). The sister group of the fungus-growing termites consisted of the species Labritermes butelreepeni and Foraminitermes valens (posterior probability 0.99). Theoretical and empirical data have shown that including (members of) the sister group as the outgroup is to be preferred over (members of) different clades as outgroups (9). Therefore, we used this sister group of the fungus-growing termites consisting of L. butelreepeni and F. valens as the outgroup in subsequent analyses. We obtained slightly different results for the ingroup relationships using this outgroup (Fig. 3). Most noticeably, the genus Microtermes had a basal position in the first analysis, whereas it had a terminal position as the sister group of the genus Ancistrotermes in the second analysis. The latter result is consistent with morphological evidence.

To test the stability of the posterior probabilities found in a Bayesian analysis, we did three separate analyses, each started from a different random tree. The topologies of the three majority-rule consensus trees of the sampled trees were identical, and posterior probabilities for individual nodes differed only slightly (maximum 4%, trees not shown). Based on these results, we employed a single run (1.5 million generations, burnin = 100,000) for additional analyses.
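How posterior probabilities for nodes follow from the sampled trees can be sketched in a few lines. Each sampled tree is represented here as a set of clades, and the posterior probability of a clade is estimated as its frequency among the samples kept after burn-in. The sampling interval is not stated above, so `generations_per_sample` is a hypothetical parameter, and the taxa and chain are toy values.

```python
from collections import Counter

def clade_posteriors(sampled_trees, generations_per_sample, burnin_generations):
    """Estimate clade posterior probabilities from post-burn-in tree samples."""
    n_discard = burnin_generations // generations_per_sample
    kept = sampled_trees[n_discard:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: count / len(kept) for clade, count in counts.items()}

# Toy clades over four hypothetical taxa A-D.
ab = frozenset({"A", "B"})
cd = frozenset({"C", "D"})
ac = frozenset({"A", "C"})

# Toy chain: two early samples (discarded as burn-in) followed by four kept samples.
trees = [{ac}, {ac}, {ab, cd}, {ab, cd}, {ab, cd}, {ab}]

posteriors = clade_posteriors(trees, generations_per_sample=100, burnin_generations=200)
# A majority-rule consensus keeps the clades found in more than half of the kept samples.
majority_rule = {clade for clade, prob in posteriors.items() if prob > 0.5}
```

Comparing such frequencies between independent runs started from different random trees is the kind of stability check described above.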

To test the influence of outgroups on ingroup relationships further, we did three additional analyses: (i) F. valens as the single outgroup (no positively differing results), (ii) L. butelreepeni as the single outgroup with one difference—Pseudacanthotermes militaris becomes basal, and (iii) no outgroup with no positively differing results.

ML and MP Analyses of Termite Sequences.

ML and MP analyses were performed with paup* (8). The ML analyses employed heuristic searches by using an NJ starting tree and TBR branch swapping (with 100,000 rearrangements). The model and settings found in Estimation of Models of Sequence Evolution were used [GTR + SSR, with R matrix (0.3124, 5.7921, 0.6569, 0.00001, 6.9015, 1); base frequencies (A = 0.3843, C = 0.3028, G = 0.0983, T = 0.2146)]. Nonparametric bootstrap analyses were conducted by using 100 bootstrap replicates generated by codonbootstrap (10). codonbootstrap takes the protein-coding structure of the data set into account by bootstrapping codons instead of single-nucleotide positions. For every replicate, heuristic searches were done by using an NJ starting tree, TBR branch swapping (with 700 rearrangements), and the model and settings found with modeltest (5). Our ML reconstruction was not positively different from the Bayesian majority-rule consensus tree (Fig. 4), but the bootstrap support estimates were sometimes lower than the corresponding Bayesian posterior probabilities. MP analyses employed heuristic searches by using a stepwise addition starting tree and 1,000 random addition sequences with TBR branch swapping. Bootstrap analyses were conducted by using 1,000 bootstrap replicates. For every replicate, heuristic searches were done by using a stepwise addition starting tree and 10 random addition sequences with TBR branch swapping. The strict consensus tree of the 22 most parsimonious trees found (length = 1,527 steps; consistency index = 0.34; retention index = 0.59) was not positively different from the Bayesian majority-rule consensus tree but had a lower resolution (Fig. 5). The low consistency index of the unweighted parsimony trees shows that the data set has a high degree of homoplasy. Therefore, a successive approximation was performed by a single round of reweighting characters according to their mean rescaled consistency index (11, 12). This procedure produced a single tree (Fig. 6) that differed only at three terminal positions from the Bayesian and ML tree (the positions of Synacanthotermes heterodon, Odontotermes sp. 1, and P. militaris were slightly different).
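Bootstrapping codons instead of single-nucleotide positions can be sketched as below. This illustrates the resampling idea only and is not the codonbootstrap program (10). The function name, the toy alignment, and the assumption that the alignment starts in frame with a length divisible by three are all illustrative.

```python
import random

def codon_bootstrap(alignment, seed=None):
    """Resample whole codon columns with replacement.

    alignment: dict mapping taxon name to an in-frame, aligned sequence.
    The same codon columns are drawn for every taxon.
    """
    rng = random.Random(seed)
    length = len(next(iter(alignment.values())))
    if length % 3 != 0 or any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned, in frame, and a multiple of 3 long")
    n_codons = length // 3
    picks = [rng.randrange(n_codons) for _ in range(n_codons)]
    return {
        name: "".join(seq[3 * i:3 * i + 3] for i in picks)
        for name, seq in alignment.items()
    }

# Toy alignment with hypothetical taxon names (four codons each).
toy = {
    "taxon1": "ATGGCCTTAGGA",
    "taxon2": "ATGGCATTGGGT",
}
replicate = codon_bootstrap(toy, seed=0)
```

Each replicate has the same length as the original alignment, and the three positions of a codon always stay together, so the codon-position structure of the data is preserved.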

Fungal Data Set and Phylogenetic Analyses

Fungal DNA was obtained from comb material, basidiocarps, and termite gut contents. For the comb material and basidiocarps the Qiagen DNeasy plant kit was used, whereas fungal DNA from gut contents was isolated by using the Qiagen DNeasy tissue kit. Two sequences were determined for the fungal symbionts:

  1. Approximately 530 bp from the 5' side (final alignment 553 nucleotides) of the 25S nuclear RNA gene (nLSU-rDNA). A region of maximally 11 nucleotides was excluded from the analysis, because it could not be aligned unambiguously.
  2. Approximately 320 bp of the 12S mitochondrial RNA gene (mtSSU-rDNA). The final alignment included 327 nucleotides. A region of maximally 28 bp was excluded from the analysis because it could not be aligned unambiguously.

In order to be able to use gut contents to obtain fungal sequences, specific primers were developed for both sequences (D.K.A. and J.J.B., unpublished data). For the 25S fragment, one specific primer was made (25S4R, ACAAGTGCTGAGTTCCTCAG) based on sequences already available in GenBank [used in ref. 17, accession nos. AF042585.1 (Termitomyces cylindricus), AF042586.1 (Termitomyces heimii), AF223174.1 (Termitomyces subhyalinus), and AF042587.1 (Podabrella microcarpus)]. The specificity of this primer was checked by doing blast searches (www.ncbi.nlm.nih.gov/blast/Blast.cgi). This primer was used in combination with primer ITS4R (GCATATCAATAAGCGGAGGA, the reverse complement of ITS4) (13) to amplify a product of ~530 nucleotides.
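The statement that ITS4R is the reverse complement of ITS4 can be checked with a few lines of code. The helper name is an assumption, and only unambiguous bases (A, C, G, T) are handled.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

its4r = "GCATATCAATAAGCGGAGGA"    # primer sequence given in the text
its4 = reverse_complement(its4r)  # "TCCTCCGCTTATTGATATGC"
```

Applying the function twice returns the original primer, which is a quick sanity check when sequences are retyped by hand.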

For the mitochondrial fragment we first used universal primers on the comb material (MS1 and MS2, http://plantbio.berkeley.edu/~bruns/primers.html) and developed and tested several specific primers from the first five sequences obtained. The specificity of these primers was checked by doing blast searches. The following two specific primers were used: (i) ssufw105, specific for Termitomyces (TCGCGTTAGCATCGTTACTAGT) and (ii) ssurev475, specific for some Lyophylleae including Termitomyces (GCCAGAGACGCGAACGTTAGTCG).

The following groups of termite species had fungal symbionts with identical sequences for both DNA fragments sequenced such that only one of the sequences was used in the analyses:

  1. Microtermes sp. 1 (dka3), Microtermes sp. 2 (dka38), Microtermes sp. 3 (dka142), Microtermes sp. 3 (dka103), Microtermes sp. 3 (dka145), Ancistrotermes cavithorax (dka34), Ancistrotermes cavithorax (dka46), Ancistrotermes crucifer (dka69), Ancistrotermes crucifer/Termitomyces medius (dka138), Ancistrotermes crucifer (dka140), S. heterodon (dka118), and S. heterodon (dka119);
  2. Macrotermes bellicosus (dka8) and Macrotermes bellicosus (dka14);
  3. Macrotermes bellicosus (dka50) and Macrotermes bellicosus (dka60);
  4. Macrotermes bellicosus (dka19) and Macrotermes bellicosus (dka52);
  5. Macrotermes muelleri (367551) and Macrotermes nobilis (367569);
  6. Acanthotermes acanthothorax (dka120) and Acanthotermes acanthothorax (dka136);
  7. Odontotermes latericius (dka39) and Odontotermes latericius (dka42);
  8. Odontotermes minutus (dka161) and Odontotermes billitoni (dka168); and
  9. Protermes minutus (367567) and Protermes prorepens (dka129).

Models of Sequence Evolution.

The two fungal sequences were tested for combinability by using the partition homogeneity test (15) implemented in paup* 4.0b (8), which evaluates the significance of incongruence between data partitions. There was no significant incongruence between the two data sets (1,000 artificial data sets, P = 0.24), which is consistent with earlier findings (16).

We used modeltest (5) to estimate the best-fit model for the evolution of the two respective sequences. For the nuclear 25S data set the model selected was Tamura and Nei's 1993 model with gamma-distributed rate variation, a proportion of invariable sites, and equal base frequencies [TrNef + I + G, R matrix (1, 4.3636, 1, 1, 12.0306, 1); proportion of invariable sites = 0.4915; and shape of gamma distribution = 0.7257]. For the mitochondrial 12S data set the model selected was F81 + G (all rates equal, base frequencies: A = 0.3124, C = 0.1859, G = 0.1469, T = 0.3548, and gamma shape parameter = 0.5038). For the combined data set the model selected was Tamura and Nei's 1993 model with gamma-distributed rate variation and a proportion of invariable sites [TrN + I + G, R matrix (1, 2.8404, 1, 1, 4.9821, 1); base frequencies (A = 0.2885, C = 0.1961, G = 0.2175, T = 0.2979); proportion of invariable sites = 0.4655; and shape of gamma distribution = 0.8866].
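To make the reported parameters concrete, the sketch below builds the instantaneous rate matrix Q for the combined data set from the R matrix and base frequencies given above. It assumes that the R matrix lists the exchangeabilities in the order A-C, A-G, A-T, C-G, C-T, G-T, which is consistent with the TrN pattern (two distinct transition rates, one transversion rate) but is not stated in the text. Rate variation among sites (I + G) is not modeled, and the function name is hypothetical.

```python
BASES = "ACGT"
# Assumed order of the R-matrix entries: A-C, A-G, A-T, C-G, C-T, G-T.
PAIRS = [("A", "C"), ("A", "G"), ("A", "T"), ("C", "G"), ("C", "T"), ("G", "T")]

def build_q(r_matrix, freqs):
    """Build a time-reversible rate matrix with q_ij = r_ij * pi_j for i != j."""
    rate = {}
    for (a, b), r in zip(PAIRS, r_matrix):
        rate[(a, b)] = rate[(b, a)] = r
    q = {a: {} for a in BASES}
    for a in BASES:
        for b in BASES:
            if a != b:
                q[a][b] = rate[(a, b)] * freqs[b]
        # Diagonal entries make each row sum to zero.
        q[a][a] = -sum(q[a][b] for b in BASES if b != a)
    # Rescale so that the expected substitution rate at equilibrium equals 1.
    scale = -sum(freqs[a] * q[a][a] for a in BASES)
    return {a: {b: q[a][b] / scale for b in BASES} for a in BASES}

# Values reported for the combined fungal data set (TrN + I + G).
r_matrix = (1, 2.8404, 1, 1, 4.9821, 1)
freqs = {"A": 0.2885, "C": 0.1961, "G": 0.2175, "T": 0.2979}
q = build_q(r_matrix, freqs)
```

The resulting matrix satisfies detailed balance (pi_i q_ij = pi_j q_ji), which is what makes the model time reversible.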

Bayesian Analysis.

We performed an analysis on the combined data set by using mrbayes 3.0* (test version). The two partitions were defined, and for each partition a separate model was used: the GTR + I + G (lset nst = 6 rates = invgamma) for the nuclear 25S and the GTR + G (lset nst = 6 rates = gamma) for the mitochondrial 12S (Fig. 7). To assess the stability of the topology found in the combined analysis, we did some additional analyses. First, we did separate Bayesian analyses of the two sequences, which gave only minor positive conflicts between these trees and the combined tree (Fig. 8, nuc-25S; Fig. 9, mt-12S). Second, we did extra Bayesian analyses in which we varied the number of outgroups: (i) Lyophyllum atratum and Tephrocybe rancida, identical topology; (ii) only T. rancida, identical topology except for the unresolved position of the fungus of Odontotermes sp. 1; and (iii) no outgroup, almost identical topology with no positive differences.

ML and MP Analyses of Fungal Sequences.

ML and MP analyses were performed with paup* 4.0 (8). ML analyses employed heuristic searches by using an NJ starting tree and TBR branch swapping (with 100,000 rearrangements). The model and settings found in Estimation of Models of Sequence Evolution were used [TrN + I + G, R matrix (1, 2.8404, 1, 1, 4.9821, 1); base frequencies (A = 0.2885, C = 0.1961, G = 0.2175, T = 0.2979); proportion of invariable sites = 0.4655; and shape of gamma distribution = 0.8866]. Nonparametric bootstrap analyses were conducted by using 100 bootstrap replicates. For every replicate, heuristic searches were done by using an NJ starting tree and TBR branch swapping (with 1,000 rearrangements) and the model and settings found in Estimation of Models of Sequence Evolution. The ML reconstruction was not positively different from the Bayesian majority-rule consensus tree (Fig. 10), but the bootstrap support was sometimes lower than the Bayesian posterior probabilities. MP analyses on the combined data set employed heuristic searches by using a stepwise addition starting tree and 1,000 random addition sequences with TBR branch swapping. Bootstrap analyses were conducted by using 1,000 bootstrap replicates. For every replicate, heuristic searches were done by using a stepwise addition starting tree and 10 random addition sequences with TBR branch swapping. The strict consensus tree of the six most parsimonious trees found (length = 332 steps; consistency index = 0.69; retention index = 0.75) (Fig. 11) was almost identical to the Bayesian majority-rule consensus tree (except for a minor difference in the position of Odontotermes sp. 4).

 

Data Sets in Nexus Format

Five data sets in Nexus format can be downloaded from www.zi.ku.dk/eunet/aanenetal.htm by using the password "ruudnenaa."

The data sets are:

COI-dataset-2outgroups

(termite data set with two outgroups, with which most of the analyses have been done).

COI-dataset-10outgr

(termite data set with 10 outgroups, with which an initial analysis has been done to show monophyly of fungus-growing Macrotermitinae).

fungicomplete

(both fungal data sets in one file).

fungi25S

(fungal nuclear 25S).

fungi-mt12S

(fungal mitochondrial 12S).

1. Kambhampati, S. & Eggleton, P. (2000) in Termites: Evolution, Sociality, Symbioses, Ecology, eds. Abe, T., Bignell, D. E. & Higashi, M. (Kluwer, Dordrecht, The Netherlands), pp. 1–24.

2. Ruelle, J. E. (1978) J. Entomol. Soc. Afr. 41, 17–23.

3. Huelsenbeck, J. P. & Ronquist, F. (2001) Bioinformatics 17, 754–755.

4. Martinez-Gonzalez, J. & Hegardt, F. G. (1994) Insect Biochem. Mol. Biol. 24, 619–626.

5. Posada, D. & Crandall, K. A. (1998) Bioinformatics 14, 817–818.

6. Huelsenbeck, J. P. & Crandall, K. A. (1997) Annu. Rev. Ecol. Syst. 28, 437–466.

7. Rambaut, A. & Grassly, N. C. (1997) Comput. Appl. Biosci. 13, 235–238.

8. Swofford, D. L. (2001) paup* (Sinauer, Sunderland, MA), Version 4.0b10.

9. Smith, A. B. (1994) Biol. J. Linn. Soc. 51, 279–292.

10. Bollback, J. P. (2001) codonbootstrap (Dept. of Biology, Univ. of Rochester, Rochester, NY), Version 3.0b4.

11. Farris, J. S. (1969) Syst. Zool. 18, 374–385.

12. Kretzer, A. M. & Bruns, T. D. (1999) Mol. Phylogenet. Evol. 13, 483–492.

13. White, T. J., Bruns, T. D., Lee, S. & Taylor, J. (1990) in PCR Protocols: A Guide to Methods and Applications, eds. Innis, M. A., Gelfand, D. H., Sninsky, J. J. & White, T. J. (Academic, San Diego), pp. 315–322.

14. Goldman, N. (1993) J. Mol. Evol. 36, 182–198.

15. Farris, J. S., Kallersjo, M., Kluge, A. G. & Bult, C. (1994) Cladistics 10, 315–319.

16. Moncalvo, J.-M., Drehmel, D. & Vilgalys, R. (2000) Mol. Phylogenet. Evol. 16, 48–63.

17. Moncalvo, J.-M., Lutzoni, F. M., Rehner, S. A., Johnson, J. & Vilgalys, R. (2000) Syst. Biol. 49, 278–305.