Abstract
Identifying the factors that influence the outcome of host-microbial interactions is critical to protecting biodiversity, minimizing agricultural losses, and improving human health. A few genes that determine symbiosis or resistance to infectious disease have been identified in model species, but a comprehensive examination of how a host's genotype influences the structure of its microbial community is lacking. Here we report the results of a field experiment with the model plant Arabidopsis thaliana to identify the fungi and bacteria that colonize its leaves and the host loci that influence the microbes’ numbers. The composition of this community differs among accessions of A. thaliana. Genome-wide association studies (GWAS) suggest that plant loci responsible for defense and cell wall integrity affect variation in this community. Furthermore, species richness in the bacterial community is shaped by host genetic variation, notably at loci that also influence the reproduction of viruses, trichome branching and morphogenesis.
INTRODUCTION
Plants, the main driver of primary productivity in terrestrial ecosystems, provide habitat to countless bacteria, yeasts, filamentous fungi, protists, oomycetes, and nematodes. Recent studies have investigated the role of the environment and host-genetics in affecting the bacteria that live in both the rhizosphere1-5 and phyllosphere6-9. The discovery that these communities are shaped, at least in part, by host genetic variation motivates the search for the host genes involved2-4,6,9.
The plant genetic model, A. thaliana, is ideal for investigating the molecular bases of traits of ecological and agricultural interest, including resistance to fungal and bacterial species, and has been used successfully to identify loci that recognize individual isolates of model pathogens10,11. Here we investigate which microbial species colonize the leaves of A. thaliana, and whether host-genetic factors play a discernible role. For this purpose, we grew a worldwide diversity panel of 196 accessions10 (Supplementary Table 1), in replicate, in a field site where the species occurs. To be consistent with the predominantly winter-annual life history of A. thaliana, we conducted our experiment from autumn to spring, and at the end of the experiment took a ‘snapshot’ of the microbial community by flash-freezing samples in the field. Here, in addition to characterizing the bacteria and fungi that live in the leaves of A. thaliana, we identify the host genes that contribute to the structure of its microbial community.
RESULTS
The leaf microbial community of A. thaliana
Leaves were washed and vortexed to remove loosely associated microbes before extracting DNA from each leaf rosette. To characterize the bacterial community in each sample, variable regions 5 (V5), 6, and 7 of bacterial 16S ribosomal DNA (rDNA) genes were PCR amplified using the primer pair 799F and 1193R. In addition, the first internal transcribed spacer (ITS1) within eukaryotic rDNA was amplified using the fungal specific primer ITS1-F with ITS2. All amplicons were sequenced, in multiplex, using a 454 FLX system (Titanium chemistry). After basic quality control (Methods), ~3,186 ± 2,202 (mean ± s.d.) bacterial reads (1,768,402 total reads) and ~526 ± 248 fungal reads (297,871 reads) were obtained from each sample. DNA sequences sharing ≥ 97% pairwise similarity were clustered to identify species-level operational taxonomic units (OTUs).
Across accessions, we found 5,057 non-singleton bacterial OTUs, with the majority belonging to families in the Proteobacteria, Bacteroidetes, and Actinobacteria (Supplementary Figs. 1a-d). In particular, Sphingomonas (α-proteobacteria), Flavobacterium (Bacteroidetes), Rhizobium (α-proteobacteria), and Pseudomonas (γ-proteobacteria) – all of which are known to occur in the phyllosphere of A. thaliana throughout much of the species' range12,13 – were common genera. A total of 2,582 non-singleton fungal OTUs were also observed, mostly representing families from the ascomycete classes Dothideomycetes and Sordariomycetes, and the basidiomycete class Tremellomycetes (Supplementary Figs. 1e-h). Genera known to contain plant pathogens included Epicoccum, Alternaria, Mycosphaerella, Fusarium, and Plectosphaerella. The most heavily sequenced (i.e. most 'abundant') fungal OTUs share taxonomic affinity with the genus Tetracladium, which, although originally assumed to be restricted to aquatic environments, are frequently found on plants14.
After correcting for differences in sequencing among samples and adjusting for technical confounders, strong and significant species associations (Kendall's Test of Concordance15, P = 0.001, 1000 permutations) were observed within both the bacterial and fungal communities (Supplementary Figs. 2 and 3), suggesting that members of the microbial community interact or that portions of the microbial community respond to the same host factors. To take into account these correlations, we summarized each community using eigenvector techniques (Methods), including principal components analysis (PCA) and canonical correspondence analysis (CCA).
The leaf microbial community is shaped by host genetics
We found that genetic variation within A. thaliana clearly shapes the leaf bacterial community, but only when we focused on the most heavily sequenced OTUs. As an example, PCA of the bacterial community distinguishes accessions of A. thaliana according to host-genotype, with inbred replicates of the same accession significantly clustered together (Fig. 1a; Methods) when analyzing, at most, the top 50% of the community (H2 ~ 40%; P = 0.044, 1000 permutations; for the top 1%, H2 ~ 42%; P = 0.004). However, these 2,528 bacterial OTUs correspond to more than 99% of the sequencing reads, which suggests that rare species or sequencing artifacts16,17 may obscure evidence that hosts structure their microbial communities (Fig. 1b).
Figure 1. Genetic variation within A. thaliana shapes the composition of the best-sequenced members of the microbial community.
(a) Using eigenvector techniques, inbred replicates of A. thaliana cluster together only when analyzing the most heavily sequenced bacteria. Nevertheless, the vast majority of the sequencing effort characterizes a small number (and %) of taxa in each community (b). Taken together, this implies that vagrant species and other poorly characterized/sequenced taxa (and occasionally, sequencing artifacts) obscure evidence that hosts shape their microbial communities. (c) Host-genetic variation within A. thaliana also affects the ability of fungi to colonize and proliferate on its leaves. All P values take into account technical confounders.
Species of bacteria tend to be more prevalent (i.e. common) across host samples than species of fungi, leading to higher estimates of turnover (β-diversity) in the fungal than bacterial community (Supplementary Fig. 4). It is unclear if fungi disperse poorly compared to bacteria, or if other factors (e.g. host-selection and/or interspecific-competition) differentially shape these two communities. Nevertheless, both presence-absence and abundance data reveal clear evidence that host-genetic variation shapes the communities of fungi associated with the leaves of A. thaliana, but for only the most heavily sequenced taxa (Fig. 1c).
We looked for further evidence that hosts shape their microbial communities by using genome-wide single nucleotide polymorphism (SNP) data18 to estimate the relatedness among accessions, before asking whether more closely related individuals harbor more similar communities. This approach is likely to underestimate the heritability of traits influenced by non-additive effects, genetic heterogeneity19, or by rare causal SNPs in incomplete linkage disequilibrium (LD) with genotyped SNPs4,20; nevertheless, heritable eigenvectors were found in both communities, regardless of the ordination technique used (Methods; Supplementary Table 2). For example, SNPs explain 9% of the variance for PC1 (P = 0.003) and 8% of the variance for PC2 (P = 0.015) from PCA of the fungal community, as well as 11% of the variance for PC2 of the bacterial community (P = 0.001).
The genes associated with the leaf microbial community
Having established that microbial communities are shaped by host genotypes, we turned to GWAS21-23 to map any major genetic variants underlying variation in these eigenvectors and, separately, the presence-absence and abundance of the most heavily sequenced (n = 100) taxa in each kingdom. In addition, to explore the processes shaping each microbial community, we used a false discovery rate24 (FDR) of 10% to identify enriched gene ontology (GO) categories25 (Methods).
We found that bacterial and fungal communities are shaped by similar biological processes, albeit by different underlying genes. In the analysis of individual OTUs, a few genomic regions stand out as being generally important (Fig. 2 and Supplementary Table 3), and candidate genes significantly overrepresented across analyses (Methods) tend to be associated with OTUs in only one kingdom (but see Supplementary Table 4). In contrast, gene set enrichment analyses reveal that the most common biological process overrepresented across analyses is 'defense response', followed closely by kinase-related activities, for both the bacterial and fungal community (Table 1).
Figure 2. The most frequently observed genomic region in the results from GWAS of the 100 most heavily sequenced bacterial OTUs.
The points illustrate the minimum P-value, per 10-kb region, from these separate analyses (i.e. separate GWAS of individual OTUs), and this region is shared in the extreme tail for 9 out of these 100 OTUs (100,000 permutations; P = 1 × 10⁻⁵). Notable a priori candidate genes include FAD2 and TBL1; as mentioned in the main text, the TBL gene family is involved in secondary cell wall synthesis and cellulose deposition. The association peaks, however, on TETRASPANIN 6 (TET6), a gene involved in metal ion transport.
Table 1.
Biological categories most often enriched in GWAS of the 100 most abundant OTUs. Storey's procedure24 was used to correct for multiple testing (FDR ≤ 10%). Only the top 3 enriched GO-terms are shown, unless there are ties among results. The probability of observing the same category across analyses was determined through 100,000 permutations (Methods).
| Kingdom | Biological category | Number of OTUs | Rank | P-value |
|---|---|---|---|---|
| Fungi | defense response | 21 | 1 | 1 × 10⁻⁵ |
| Fungi | signal transduction | 12 | 2 | 1 × 10⁻⁵ |
| Fungi | protein serine/threonine kinase activity | 9 | 3 | 2 × 10⁻⁵ |
| Bacteria | defense response | 9 | 1 | 1 × 10⁻⁵ |
| Bacteria | kinase activity | 8 | 2 | 1 × 10⁻⁵ |
| Bacteria | Casparian strip | 7 | 3 | 0.00015 |
| Bacteria | cell wall modification | 7 | 3 | 0.00015 |
| Bacteria | cell-cell junction assembly | 7 | 3 | 0.00015 |
| Bacteria | plasma membrane part | 7 | 3 | 0.00015 |
The cell wall, composed of the polysaccharides cellulose (β-1,4-glucan), callose (β-1,3-glucan) and pectin (a heteropolysaccharide), is one of the first obstacles for any plant pathogen, and biological processes associated with the cell wall are significantly overrepresented across GWAS of individual bacterial species. Similarly, for the combined fungal community, the strongest GWAS peaks for PC1 and PC2 from PCA each fall within candidate genes implicated in cell wall integrity. For PC1, the top SNP lies within GLUCANSYNTHASE-LIKE 11 (GSL11); a related locus (GSL5) in A. thaliana seals wounds that arise during fungal infection using callose26. For PC2, the top SNP falls within a member of the TRICHOME BIREFRINGENCE-LIKE gene family (TBL37), which is involved in secondary cell wall formation through the deposition of cellulose27.
Plant microtubules, which form the cytoskeleton and are regularly moved to the site of contact with a microbe, act either as a defense mechanism or, after reorganization of the plant cell wall, as a means of enabling compatible symbioses with diverse microbial species28. Still other pathogens depolymerize microtubules to facilitate infection; in the case of viruses, microtubules provide a means for intra- and intercellular mobility. Several distinct microtubule-related categories are significantly enriched in the results from GWAS of the fungal community (Supplementary Table 5).
Although many of the strongest associations are implicated in the presence-absence or abundance of only one or a few OTUs, several of these are members of large gene families, some of which are likely to be functionally redundant. For example, ATP binding cassette (ABC) transporters ferry metabolites around the cell and across the cell membrane, and mutations in ABC transporters lead to various human diseases (e.g. cystic fibrosis29) and plant resistance to a number of toxins and pathogens30. ABC transporters are found among the strongest associations from GWAS of both bacterial and fungal OTUs (e.g. Fig. 3a, b and Supplementary Table 4). As another example, pectin in the cell wall is frequently degraded by pathogen-produced enzymes (i.e. pectinases)31. Even so, we found several (non-allelic) host polymorphisms involved in the synthesis and esterification of pectin to be associated with various OTUs (Fig. 3b-d), which highlights the role of cell wall integrity in shaping the composition of the leaf microbial community. The results from all analyses have been deposited in the Dryad Digital Repository (http://doi.org/10.5061/dryad.8sm01).
Figure 3. Genes implicated in the community composition of the leaves.
The ABC transporter C family members 7 and 8 (multidrug resistance-associated proteins 7 and 8) are associated (Chr 3, ~4.21 Mb) with the abundance of an OTU assigned to Mycosphaerella (a), while ABC transporter G family member 35 (pleiotropic drug resistance 7; Chr 1, ~5.23 Mb) and a pectinesterase (AT2G36710; Chr 2, ~15.392 Mb) are implicated in the abundance of an OTU assigned to Sphingomonas (b). Other pectin related enzymes include the pectate lyase (AT4G13210; Chr 4, ~7.67 Mb) associated with the abundance of Chryseobacterium (c) and the pectinesterase (AT5G26810; Chr 5 ~9.432 Mb) associated with the abundance of Xanthomonas (d). Notable a priori candidate genes also include TERPENE SYNTHASE 10 (TPS10; Chr 2, ~10.297 Mb) identified in (a), the resistance gene (R-gene) pinpointed (Chr 5, ~18.287 Mb) in (c), and the oxidoreductase (Chr 4, ~9.708 Mb) illustrated in (d). To assess genome-wide significance, a permutation approach was used that takes into account population structure (Methods).
Finally, we investigated heritability of broad community descriptors and found that the number of bacterial species (i.e. “richness”) in the leaf is affected by host genetic variation (H2 ~ 46%; P = 0.021), with host SNPs explaining ~8% (P = 0.023) of the phenotypic variance. Among the most significantly enriched biological processes in the results from GWAS (Table 2) are categories related to trichomes, which modify water use, leaf reflectance, and temperature32. In the case of plant defense, trichomes tend to discourage insect herbivory33,34, and have been reported to facilitate infection by some species of fungi, both by catching spores35 and by giving fungi a means to proliferate on the leaf36. It isn't clear how trichomes shape the bacterial community, but it is interesting to note that bacterial species richness does not change with the number of trichomes on a leaf10 (P = 0.32, simple linear regression), unless the plants were induced with the defense hormone jasmonic acid (β = −0.13, R2 = 0.06; P = 0.026). It is thus tempting to speculate that richness in the leaf bacterial community is shaped by other plant enemies (e.g. insects, fungi) that vector bacteria or trigger defense responses. An additional difficulty is that the pathways responsible for trichome and cuticle synthesis overlap37, and mutants in cuticle formation host altered microbial communities9. Deciphering how hosts shape bacterial communities is clearly complex, and one must remain aware of both genetic constraints within the host and impacts of other species. In fact, in the results from GWAS of leaf bacterial richness, the most significantly enriched category involves the reproduction of viruses, implying that these loci are pleiotropic or that leaf-associated bacteria and viruses interact, as has been observed during human respiratory38 and polio39 infections.
Table 2.
Biological categories enriched in the 5% tail from GWAS of the log of species richness (S) in the bacterial community. Storey's procedure24 was used to correct for multiple testing (FDR ≤ 10%).
| Biological process | Enrichment | Storey's FDR, q < 0.1 |
|---|---|---|
| regulation of viral reproduction | 20.1 | 0.022 |
| trichome branching | 4.5 | 0.029 |
| meiosis | 4.7 | 0.082 |
| plastid stroma | 4.5 | 0.082 |
| trichome morphogenesis | 3.4 | 0.085 |
| perinuclear region of cytoplasm | 8.6 | 0.096 |
| xyloglucan biosynthetic process | 8.6 | 0.096 |
DISCUSSION
In summary, our results demonstrate that GWAS can help to identify the loci and host processes that structure microbial communities. However, our results also emphasize the need, moving forward, to consider the role of genetic heterogeneity and interactions among microbes in shaping these communities. The role of life-history traits40 (i.e. plant phenology) and the environment should also be taken into account. Studies of the rhizosphere demonstrate a role of soil type and chemistry in addition to host genetics3-5. In our study, we controlled for the environment (Methods), but differences in the environment could cause distinct loci or host processes (e.g. Tables 1 and 2) to shape the leaf microbiome of A. thaliana at different times or places. Similar patterns have been observed for flowering time, a trait for which few candidate genes are identified in both field and greenhouse conditions41. Be that as it may, adjusting for environmental factors improves power in mapping studies42,43, and an understanding of important environmental factors should improve the ability to predict microbial phenotypes. As sequencing costs continue to decrease, the ability to dissect the host-microbial interactions affecting human disease, agriculture, and conservation efforts is finally within reach.
METHODS
Field experiment
We sowed 4 replicates of each of 196 accessions of A. thaliana (Supplementary Table 1) in two randomized blocks (2 replicates per accession per block) using a mixture (1:1) of Fafard C2 and Metromix 200 soil. The soil was autoclaved to reduce the number of greenhouse bacteria and fungi on plants, before transferring them to the field. Seeds were watered and then stored in a cold dark room (4°C) to homogenize germination. After 7 days of stratification, all plants were moved to a glass greenhouse and grown in 12 hours of light (20°C) for 19 days (allowing most accessions to germinate and reach the 4-leaf stage).
These plants were then transferred to a field site (42.0831°N, 86.351°W; Southwest Michigan Research and Extension Center, Benton Harbor, MI, USA; October 22nd, 2008) known to host a naturalized population of A. thaliana. Within blocks, samples were planted 10 cm apart from one another, and the blocks were separated by 2 m. The plants were watered generously on the day of transplanting, but were otherwise left untreated until the end of the experiment. Weather records for the field station can be found at: http://www.enviroweather.msu.edu/weather.php?stn=swm
The following spring (March 27th, 2009), we used sterile technique and flash-froze samples in the field using liquid nitrogen, before transferring them to the lab on ice. Samples were stored at −80°C until further processing.
Isolation of host-associated microbial DNA
Prior to DNA extraction, we removed the most loosely associated microbes from each rosette, by washing each sample using an earlier approach13,44. Briefly, we washed each sample first in 0.1 M potassium phosphate buffer, pH 8.0, then in 70% ethanol, and finally in sterile water; the water wash was repeated 3 times. Samples were vortexed (20 seconds) and centrifuged between each wash before the supernatants were discarded, presumably leaving the most tightly associated members of the epiphytic communities, as well as the endophytic communities. The samples were then extracted using Mo Bio's Ultra-clean htp-96 well Plant DNA Isolation Kit. To increase cell-lysis, we repeated the manufacturer's recommended freeze-thaw method 3 times before DNA extraction. DNA was stored at −20°C until used in PCR.
Amplicon library preparation and sequencing (16S and ITS)
To characterize the bacterial and fungal communities of A. thaliana, each sample was used as template to PCR amplify phylogenetically informative regions of 16S (bacteria) and ITS-1 (fungi). We used 454 FLX Titanium emPCR Kits (Lib-L) for all sequencing.
Bacteria
To survey bacterial communities, GS FLX Titanium Primer B (5’-CCTATCCCCTGTGTGCCTTGGCAGTCTCAG-3’) was attached to 799F (5’-AACMGGATTAGATACCCKG-3’)45; Primer A (5’-CCATCTCATCCCTGCGTGTCTCCGACTCAG-3’) was combined with a 12-bp error-correcting barcode46, a 2-bp linker (5’-AT-3’) and the reverse primer 1193R (5’-ACGTCATCCCCACCTTCC-3’)13. Together, 799F and 1193R amplify the hypervariable regions V5, V6 and V7 of the 16S gene.
Fungi
To amplify ITS-1, Primer B (above) was attached to the fungus-specific ITS1F (5’-CTTGGTCATTTAGAGGAAGTAA-3’)47; Primer A included a 12-mer barcode, a 2-bp linker (CA) and ITS2 (5’-GCTGCGTTCTTCATCGATGC-3’)48.
Each sample was PCR-amplified in triplicate, and each 25-μl reaction contained 2 μl genomic DNA, 10 μl 2.5x HotMasterMix (5-Prime) and 0.2 μM of each primer. PCR conditions included: an initial denaturing step at 94°C for 2.5 minutes, followed by 30 cycles of a denaturing step (94°C for 30 s), an annealing step (55°C for 40 s), and an extension step (68°C for 40 s). A final extension step at 68°C was performed for 7 min before storing the samples at 4°C. When necessary, PCR dropouts were re-amplified. All samples were quantified using Picogreen (Invitrogen) and these barcoded libraries were pooled to equimolar concentrations.
799F and 1193R exclude chloroplast DNA45. To exclude the remaining mtDNA, we captured the phylogenetic target (~505 bp including the above primers) using a 2% agarose gel. Although this approach is effective in minimizing the amplification of host DNA, it likely misrepresents the abundances of several interesting taxa45, such as the Cyanobacteria. The gel slices were extracted (QIAGEN's QIAquick), and samples were further purified with Ampure magnetic purification beads (Agencourt). Finally, samples were quantified using the Qubit dsDNA HS Assay Kit (Invitrogen) and sequenced using 454 FLX Titanium-based chemistry (Roche Life Sciences).
16S/ITS1 rDNA Data Processing
We denoised all of the SFF files generated from pyrosequencing using AmpliconNoise17 and QIIME49. We required sequence reads to be less than 500 bp and used Perseus17 to minimize the number of chimeras. We initially created 560 bacterial amplicon libraries and 570 fungal amplicon libraries with PCR; denoising these resulted in 555 bacterial and 566 fungal samples.
We used the default parameters in QIIME to pick operational taxonomic units (OTUs) sharing 97% sequence similarity (using the algorithm, ‘cdhit’). Each bacterial OTU was assigned taxonomic status using the RDP algorithm, also implemented in QIIME. To determine the taxonomic affinity of fungal OTUs, we used the software package MARTA50.
Samples with poor sequencing coverage were omitted from all analyses. We required a minimum of 800 reads per bacterial sample, and 200 reads per fungal sample; this resulted in 512 bacterial and 549 fungal samples. To correct for differences in sequencing effort (coverage), each sample was resampled to either 800 reads (for bacteria) or 200 reads (for fungi). However, all samples were resampled to contain 200 reads before making comparisons between the bacterial and fungal communities (e.g. Supplementary Fig. 4). Because the samples were grown in different blocks (above), PCR-amplified in separate 96-well plates, and sequenced on separate picotiter (ptp) plates, we took into account these covariates in the analyses described below.
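The coverage filter and resampling step can be illustrated with a minimal sketch. It assumes that "resampled" means drawing reads without replacement (rarefaction), which the text does not state explicitly; the function name `rarefy` and the toy counts are hypothetical and are not taken from the QIIME workflow used in the study.

```python
import random

def rarefy(otu_counts, depth, seed=None):
    """Subsample one sample's reads without replacement to a fixed depth.

    otu_counts: dict mapping OTU id -> read count for one sample.
    Returns a dict of subsampled counts, or None when the sample has
    fewer than `depth` reads (such samples would be omitted).
    """
    total = sum(otu_counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    # Expand the counts into one entry per read, then draw `depth` reads.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    picked = rng.sample(reads, depth)
    out = {}
    for otu in picked:
        out[otu] = out.get(otu, 0) + 1
    return out

# Toy bacterial sample with 1,200 reads, resampled to 800 reads.
sample = {"OTU_1": 900, "OTU_2": 250, "OTU_3": 50}
rarefied = rarefy(sample, depth=800, seed=1)
```

Under this assumption, every retained sample contributes exactly the same number of reads, so differences among samples cannot be driven by sequencing depth alone.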
Microbial analyses
Associations among microbes
To perform Kendall's Test, we used the kendall.global function in the R-package51 vegan52.
Ordination techniques
The function rda (scale = T) in vegan was used to perform principal components analysis, while cca was used for canonical (constrained) correspondence analysis (CCA). decorana was used to perform detrended correspondence analysis (DCA).
To test the hypothesis that accessions of A. thaliana differ with respect to the composition of their microbial communities, which we characterized with these ordination techniques, we used the functions envfit (for unconstrained ordination techniques) and anova.cca (for ordinations produced by CCA). To investigate whether host genetic differences are easier to discern for well-sequenced taxa than rare taxa (i.e. due to species turnover among rare species in the microbial community, sequencing artifacts, or some other mechanism), we ordered the species matrix by total (maximum) sequencing coverage per OTU. We prefer to characterize well-sequenced taxa as “most heavily sequenced OTUs” rather than “most abundant” because of common technical artifacts (due to primer biases or rRNA operon count differences, etc.).
Briefly, envfit identifies the direction in multi-dimensional ordination space that is maximally associated with an environmental variable (here, host genotype, or accession_id). The goodness-of-fit statistic is r2, which is equal to 1 − (ssw/sst), where ssw is the within-group sum of squares and sst is the total sum of squares. To assess the significance of this association, we permuted the data 999 times and counted the number of times that these simulated r2 values matched or exceeded the observed r2 value (including the observed r2 value, which is assumed to be an observation from the null distribution). To determine whether ordinations produced with CCA are shaped by host-genotype, we used the function anova.cca. This function also relies on permutation tests, but does so to determine how often the observed constrained inertia (the constraint being host-genotype) is exceeded when the data are randomly permuted.
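The goodness-of-fit statistic and its permutation test can be sketched as follows. This is a simplified illustration, not vegan's envfit implementation: it assumes that the ordination scores are plain coordinate tuples, that host genotype is a grouping label, and that labels are permuted freely across samples. The names `factor_r2` and `permutation_p` and the toy scores are hypothetical.

```python
import random

def factor_r2(scores, groups):
    """r2 = 1 - ssw/sst for ordination scores grouped by a factor."""
    dims = range(len(scores[0]))

    def centroid(points):
        return [sum(p[d] for p in points) / len(points) for d in dims]

    def ss(points, center):
        # Sum of squared distances from each point to the given center.
        return sum((p[d] - center[d]) ** 2 for p in points for d in dims)

    sst = ss(scores, centroid(scores))
    by_group = {}
    for point, g in zip(scores, groups):
        by_group.setdefault(g, []).append(point)
    ssw = sum(ss(pts, centroid(pts)) for pts in by_group.values())
    return 1.0 - ssw / sst

def permutation_p(scores, groups, n_perm=999, seed=0):
    """Observed r2 and the fraction of r2 values at least as large."""
    rng = random.Random(seed)
    observed = factor_r2(scores, groups)
    labels = list(groups)
    hits = 1  # the observed value counts as one draw from the null
    for _ in range(n_perm):
        rng.shuffle(labels)
        if factor_r2(scores, labels) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

# Toy example: two replicates of each of three accessions on two axes.
scores = [(0.0, 0.1), (0.1, 0.0), (1.0, 1.1), (1.1, 1.0), (2.0, 2.1), (2.1, 2.0)]
accession_id = ["A", "A", "B", "B", "C", "C"]
r2, p = permutation_p(scores, accession_id)
```

Counting the observed value among the permuted ones means the smallest attainable P value with 999 permutations is 0.001.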
Genome Wide Association Studies (GWAS)
We used a mixed-model approach21,22 as implemented in the mixmogam package23 to account for the complex pattern of relatedness among our accessions for all GWAS. To estimate a genome-wide P value threshold, we performed permutations, where we re-ran association scans (genome-wide) using a linear transformation of the phenotype values. Our method controls for population structure using an approach similar to53; however, instead of simulating phenotypes under the null, we permute the transformed phenotype values. This allows us to also control for false positives that can arise if the residuals do not come from a Gaussian distribution. The transformation matrix is the Cholesky decomposition of the inverse phenotypic covariance matrix, as estimated from the mixed model. By applying this linear transformation to the phenotype values, the resulting vector contains values that are expected to be uncorrelated; we randomly permute these to obtain a new vector that we transform as described in53, i.e. using the Cholesky decomposition of the phenotypic covariance. Because we use efficient mixed models, and because we only need to transform the phenotypes and the genotypes once, the time complexity of the permutation is O(n²m + knm), where n is the number of individuals, m is the number of genotype variants tested, and k is the number of permutations. We performed k = 100 permutations for each trait.
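The decorrelate, permute, and re-correlate idea can be sketched on a toy example. This is not the mixmogam code: it assumes a known phenotypic covariance matrix V (in the study it is estimated from the mixed model), ignores fixed effects, and whitens by solving L z = y with the lower Cholesky factor L of V, which is one of several equivalent ways to obtain uncorrelated values. All function names and the toy matrix are hypothetical.

```python
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (a[i][i] - s) ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, y):
    """Solve L z = y for z by forward substitution."""
    z = []
    for i in range(len(y)):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def matvec(L, z):
    """Multiply the matrix L by the vector z."""
    return [sum(row[k] * z[k] for k in range(len(z))) for row in L]

def permute_phenotype(y, V, rng):
    L = cholesky(V)
    z = forward_solve(L, y)  # whitened values, expected to be uncorrelated
    rng.shuffle(z)           # permute the (approximately) exchangeable values
    return matvec(L, z)      # restore the covariance structure

# Toy covariance among three accessions and a toy phenotype vector.
V = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]]
y = [0.4, -1.2, 0.9]
y_perm = permute_phenotype(y, V, random.Random(0))
```

Each permuted phenotype would then be scanned genome-wide, and the most extreme statistic from each scan would contribute to the empirical threshold.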
Phenotypes
To correct for differences in sequencing effort among samples, all data were resampled to either 800 reads (bacterial community) or 200 reads (fungal community) before conducting GWAS using common SNPs (minor allele frequency ≥ 5%). To identify loci underlying variation in the structure of the bacterial and fungal communities, we considered each (separate) community as an aggregate. The raw data from each community were Hellinger54 transformed (i.e. the OTUs in each sample were expressed as a fraction of the sampling effort, and then square-root transformed) before PCA was performed on the most-heavily sequenced members of these communities. To be consistent with the results illustrated in Figure 1, we analyzed the top 2% of the fungal community, and the top 50% of the bacterial community. However, we noticed that we could explain a larger fraction of the variance (both from PCA and from SNPs) by analyzing smaller fractions of the bacterial community, due to species turnover in the community and the different number of variables considered by PCA. We also conducted GWAS after analyzing each microbial community using CCA on the top 2% of the bacterial community and top 3% of the fungal community; DCA was performed on the top 5% of the fungal community and the top 2% of the bacterial community. In general, many researchers prefer CCA over PCA because it is more robust to the so-called ‘horseshoe effect’; its drawback is that eigenvalues from CCA are not as easily interpreted as in PCA. The top 5 eigenvectors from CCA and PCA were analyzed in GWAS. Only 4 axes (DCA1-4) are output from decorana (vegan's function to perform DCA).
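As a small worked illustration of the Hellinger transformation described above, each count is divided by the total number of reads in its sample and the square root is taken. The function name `hellinger` and the toy counts are hypothetical; in R, the same transformation is available through vegan's decostand.

```python
import math

def hellinger(counts):
    """Hellinger-transform one sample: sqrt(count / total reads in the sample)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [math.sqrt(c / total) for c in counts]

# Toy read counts for four OTUs in one sample, summing to 800 reads.
sample = [400, 300, 100, 0]
transformed = hellinger(sample)
```

The squared transformed values of a sample sum to 1, so every sample has the same length in the transformed space, and the square root reduces the weight of the most heavily sequenced OTUs.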
In order to evaluate the association between these SNPs and the abundance of individual bacterial or fungal OTUs, each species matrix was either square root transformed or analyzed as the presence/absence of the 100 most heavily sequenced OTUs in each community. As above, we used the vegan function cca (which performs QR decomposition) to ‘partial-out’ the technical confounders (block, picotiter plate, and PCR plate), using the residuals from cca as phenotypes in GWAS, similar to earlier PCA-based approaches55.
To identify loci associated with bacterial species richness56 (diversity of order 0), the number of species within each sample was tallied and log-transformed; technical confounders (above) were regressed out before conducting GWAS. Because rRNA operon counts differ among species, and because PCR introduces bias, we avoided estimating “true” diversity (diversity of order 1), which is often calculated using Shannon diversity.
The most common results from GWAS
To identify genomic regions shared in the top-results from these GWAS, we combined GWAS analyses of the colonization (presence/absence) and proliferation (abundance) for each OTU into one dataset. To do so, we combined P values from GWAS using Brown's method57, which is similar to Fisher's combined P value approach, but suitable for correlated datasets (e.g. the P values from these 2 analyses).
We split the results from each analysis into 10-kb windows (yielding 11,614 windows). Then, to make the results comparable across GWAS, we ranked and calculated an empirical P value for each window. Next, we determined the amount of overlap in the top results (P ≤ 0.001, empirical P value) from GWAS of individual OTUs in each community. To determine the significance of observed sharing, we used 100,000 simulations to construct a null distribution; each observation in the null was based on selecting 11 (P ≤ 0.001) windows from each of 100 simulated GWAS results (i.e. 100 OTUs). We then counted the number of times a window was shared ‘x’ or more times. To assess the probability of observing the same genomic region in the bacterial and fungal analyses, we sampled from 200 simulated GWAS results (i.e. 100 OTUs from each community).
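The null distribution for window sharing can be sketched with a small simulation. This is one reading of the procedure, under stated assumptions: the 11 top windows of each simulated GWAS are drawn uniformly at random and independently, and the P value is taken as the fraction of simulations in which the most widely shared window appears in x or more GWAS. The function names are hypothetical, and the default of 500 simulations is chosen only to keep the example fast (the study used 100,000).

```python
import random
from collections import Counter

def max_sharing(n_windows, n_top, n_gwas, rng):
    """Largest number of simulated GWAS that share any single top window."""
    tally = Counter()
    for _ in range(n_gwas):
        # Each simulated GWAS contributes n_top distinct windows.
        tally.update(rng.sample(range(n_windows), n_top))
    return max(tally.values())

def sharing_p_value(x, n_sim=500, n_windows=11614, n_top=11, n_gwas=100, seed=0):
    """Fraction of simulations where some window is shared by >= x GWAS."""
    rng = random.Random(seed)
    hits = sum(max_sharing(n_windows, n_top, n_gwas, rng) >= x for _ in range(n_sim))
    return hits / n_sim

p_two = sharing_p_value(2)   # sharing between two OTUs is expected by chance
p_nine = sharing_p_value(9)  # sharing among nine OTUs is very unlikely by chance
```

Setting `n_gwas=200` gives the corresponding null for regions shared across the combined bacterial and fungal analyses.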
Enrichment of GO categories in the results from GWAS
To determine which biological processes underlie variation in the composition of A. thaliana's microbial community, we tested for an overrepresentation of Gene Ontology (‘goterm’) categories25 (ftp://ftp.arabidopsis.org/home/tair/Ontologies/Gene_Ontology/) in the top results from GWAS. We omitted gene models with low confidence (evidence code: ‘Inferred from Electronic Annotation’ (IEA)) and any biological category represented by only one gene model, leaving 3,588 unique GO-terms.
Next, we split the results from GWAS into 10-kb windows and took the minimum score within each window as the test statistic. We then counted the number of gene models (± 1,000 bp of surrounding DNA) within the top 5% of these (windowed) GWAS results. To ensure that we identified ‘broadly’ (i.e. genome-wide) enriched categories, we required at least three 10-kb windows to contain gene models from the gene set category. To account for multiple testing, all P values were corrected using Storey's approach24 at an FDR level of 10%.
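The windowing and multiple-testing steps can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in the study: the ‘score’ is assumed to be a P value, the function names are hypothetical, and the q-value calculation uses a simplified version of Storey's approach in which the proportion of true nulls is estimated at a single fixed threshold (lam = 0.5) instead of by smoothing over a range of thresholds. The enrichment test itself is not shown because the text does not specify it.

```python
def window_minima(snp_results, window_size=10_000):
    """Minimum score per window from (chromosome, position, score) tuples."""
    minima = {}
    for chrom, pos, score in snp_results:
        key = (chrom, pos // window_size)
        if key not in minima or score < minima[key]:
            minima[key] = score
    return minima

def top_windows(window_scores, fraction=0.05):
    """Windows with the smallest scores, making up the given fraction of all windows."""
    n_top = max(1, int(len(window_scores) * fraction))
    return sorted(window_scores, key=window_scores.get)[:n_top]

def storey_q_values(p_values, lam=0.5):
    """Simplified Storey q-values with a fixed-lambda estimate of pi0.

    If no P value exceeds lam, pi0 is estimated as 0 and all q-values are 0;
    a production implementation would guard against that case.
    """
    m = len(p_values)
    pi0 = min(1.0, sum(p > lam for p in p_values) / (m * (1.0 - lam)))
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * p_values[i] / rank)
        q[i] = running_min
    return q

# Hypothetical SNP results: (chromosome, position in bp, P value).
snps = [(1, 5, 0.2), (1, 9_999, 0.05), (1, 10_000, 0.5), (2, 3, 0.1)]
windows = window_minima(snps)
best = top_windows(windows)

# Hypothetical enrichment P values for eight GO categories.
q = storey_q_values([0.01, 0.02, 0.03, 0.04, 0.6, 0.7, 0.8, 0.9])
significant = [i for i, value in enumerate(q) if value <= 0.10]
```

When the estimated proportion of true nulls is 1, as in this small example, the q-values coincide with Benjamini-Hochberg adjusted P values.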
To determine the probability of observing a GO-term enriched in the results from GWAS multiple times (as illustrated in Table 1), we simulated Gene Set Enrichment Analyses. That is, we used 100,000 permutations to construct a null distribution in which each observation was constructed by randomly selecting (and tallying), for each of 100 OTUs, the same number of GO-terms that were significantly enriched in the analysis of that OTU. The P values reported in Table 1 reflect the number of times that a biological category is shared x or more times in this null distribution.
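A minimal sketch of this permutation scheme follows. It assumes that terms are drawn uniformly and without replacement from the 3,588 retained GO-terms, and that the null statistic is the largest number of OTUs sharing any one term; the function names and the example counts are hypothetical, and the default number of permutations is reduced to keep the example fast.

```python
import random
from collections import Counter

def go_sharing_null(n_enriched_per_otu, n_terms=3_588, n_perm=1_000, seed=0):
    """Null distribution of the maximum number of OTUs sharing a GO-term.

    n_enriched_per_otu lists, for each OTU, how many GO-terms were
    significantly enriched in that OTU's GWAS results.
    """
    rng = random.Random(seed)
    null_max = []
    for _ in range(n_perm):
        tally = Counter()
        for k in n_enriched_per_otu:
            tally.update(rng.sample(range(n_terms), k))
        null_max.append(max(tally.values(), default=0))
    return null_max

def sharing_p_value(null_max, x):
    """Fraction of permutations in which some term is shared x or more times."""
    return sum(m >= x for m in null_max) / len(null_max)

# Hypothetical numbers of enriched GO-terms for four OTUs.
null = go_sharing_null([3, 0, 5, 2], n_perm=500)
p_shared_twice = sharing_p_value(null, 2)
```

Unlike a simulation with a fixed number of draws per analysis, this null preserves the observed number of enriched terms for each OTU, so OTUs with many enriched categories contribute proportionally more chances of sharing.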
Supplementary Material
ACKNOWLEDGMENTS
We thank D. Francis for providing the field site, and M. Palmer for helpful discussions about ordination techniques. We are also grateful to J. Ding, J. Higgins, F.G. Sperone, P.R. Stone, X. Sun, and W. Zhang for help sowing, collecting, and/or cleaning the samples. This work was supported by grants from NIH (GM057994, GM083068) and NSF (MCB 0603515) to J.B. M.W.H. was supported by an Achievement Rewards for College Scientists (ARCS) Foundation Scholarship. N.B. was supported by a Swiss NSF post-doctoral fellowship.
Footnotes
Accession codes. The 454 pyrosequencing data have been deposited in the ERA Sequence Read Archive with accession code PRJEB7247. GWAS results are available at the Dryad Digital Repository: http://doi.org/10.5061/dryad.8sm01
AUTHOR CONTRIBUTIONS
M.W.H., N.B. and J.B. conceived of and designed the project. M.W.H. and N.B. set up and executed the experiments with help from K.B., M.V., and D.M. B.M., S.S. and J.I.G. oversaw and performed quality control on the sequencing. M.W.H., B.J.V. and M.N. designed code to conduct GWAS, and M.W.H. carried out all analyses. M.W.H. and J.B. wrote the paper, with comments from the other authors.
COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.
References
- 1. Fromin N, Achouak W, Thiéry JM, Heulin T. The genotypic diversity of Pseudomonas brassicacearum populations isolated from roots of Arabidopsis thaliana: influence of plant genotype. FEMS Microbiology Ecology. 2001;37:21–29.
- 2. Micallef SA, Shiaris MP, Colón-Carmona A. Influence of Arabidopsis thaliana accessions on rhizobacterial communities and natural variation in root exudates. Journal of Experimental Botany. 2009;60:1729–1742. doi: 10.1093/jxb/erp053.
- 3. Lundberg DS, et al. Defining the core Arabidopsis thaliana root microbiome. Nature. 2012;488:86–90. doi: 10.1038/nature11237.
- 4. Peiffer JA, et al. Diversity and heritability of the maize rhizosphere microbiome under field conditions. Proc Natl Acad Sci USA. 2013 doi: 10.1073/pnas.1302837110.
- 5. Bulgarelli D, et al. Revealing structure and assembly cues for Arabidopsis root-inhabiting bacterial microbiota. Nature. 2012;488:91–95. doi: 10.1038/nature11336.
- 6. Balint-Kurti P, Simmons SJ, Blum JE, Ballare CL, Stapleton AE. Maize leaf epiphytic bacteria diversity patterns are genetically correlated with resistance to fungal pathogen infection. Mol Plant Microbe Interact. 2010;23:473–484. doi: 10.1094/MPMI-23-4-0473.
- 7. Hunter PJ, Hand P, Pink D, Whipps JM, Bending GD. Both leaf properties and microbe-microbe interactions influence within-species variation in bacterial population diversity and structure in the lettuce (Lactuca Species) phyllosphere. Appl Environ Microbiol. 2010;76:8117–8125. doi: 10.1128/AEM.01321-10.
- 8. Reisberg EE, Hildebrandt U, Riederer M, Hentschel U. Distinct phyllosphere bacterial communities on Arabidopsis wax mutant leaves. PLoS ONE. 2013;8:e78613. doi: 10.1371/journal.pone.0078613.
- 9. Bodenhausen N, Bortfeld-Miller M, Ackermann M, Vorholt JA. A synthetic community approach reveals plant genotypes affecting the phyllosphere microbiota. PLoS Genetics. 2014;10:e1004283. doi: 10.1371/journal.pgen.1004283.
- 10. Atwell S, et al. Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010;465:627–631. doi: 10.1038/nature08800.
- 11. Nemri A, et al. Genome-wide survey of Arabidopsis natural variation in downy mildew resistance using combined association and linkage mapping. Proc Natl Acad Sci USA. 2010;107:10302–10307. doi: 10.1073/pnas.0913160107.
- 12. Delmotte N, et al. Community proteogenomics reveals insights into the physiology of phyllosphere bacteria. Proc Natl Acad Sci USA. 2009;106:16428–16433. doi: 10.1073/pnas.0905240106.
- 13. Bodenhausen N, Horton MW, Bergelson J. Bacterial communities associated with the leaves and the roots of Arabidopsis thaliana. PLoS ONE. 2013;8:e56329. doi: 10.1371/journal.pone.0056329.
- 14. Selosse MA, Vohník M, Chauvet E. Out of the rivers: are some aquatic hyphomycetes plant endophytes? New Phytologist. 2008;178:3–7. doi: 10.1111/j.1469-8137.2008.02390.x.
- 15. Legendre P. Species associations: The Kendall coefficient of concordance revisited. J. Agric. Biol. Environ. Stat. 2005;10:226–245.
- 16. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology. 2010;12:118–123. doi: 10.1111/j.1462-2920.2009.02051.x.
- 17. Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ. Removing noise from pyrosequenced amplicons. BMC Bioinformatics. 2011;12:38. doi: 10.1186/1471-2105-12-38.
- 18. Horton MW, et al. Genome-wide patterns of genetic variation in worldwide Arabidopsis thaliana accessions from the RegMap panel. Nat Genet. 2012;44:212–216. doi: 10.1038/ng.1042.
- 19. McClellan J, King MC. Genetic heterogeneity in human disease. Cell. 2010;141:210–217. doi: 10.1016/j.cell.2010.03.032.
- 20. Yang J, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet. 2010;42:565–569. doi: 10.1038/ng.608.
- 21. Kang HM, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–354. doi: 10.1038/ng.548.
- 22. Zhang Z, et al. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42:355–360. doi: 10.1038/ng.546.
- 23. Segura V, et al. An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet. 2012;44:825–830. doi: 10.1038/ng.2314.
- 24. Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003;100:9440–9445. doi: 10.1073/pnas.1530509100.
- 25. The Gene Ontology Consortium The Gene Ontology project in 2008. Nucleic Acids Res. 2008;36:D440–444. doi: 10.1093/nar/gkm883.
- 26. Jacobs AK, et al. An Arabidopsis Callose Synthase, GSL5, Is Required for Wound and Papillary Callose Formation. Plant Cell. 2003;15:2503–2513. doi: 10.1105/tpc.016097.
- 27. Bischoff V, et al. TRICHOME BIREFRINGENCE and Its Homolog AT5G01360 Encode Plant-Specific DUF231 Proteins Required for Cellulose Biosynthesis in Arabidopsis. Plant Physiology. 2010;153:590–602. doi: 10.1104/pp.110.153320.
- 28. Hardham AR. Microtubules and biotic interactions. The Plant Journal. 2013;75:278–289. doi: 10.1111/tpj.12171.
- 29. Gadsby DC, Vergani P, Csanady L. The ABC protein turned chloride channel whose failure causes cystic fibrosis. Nature. 2006;440:477–483. doi: 10.1038/nature04712.
- 30. Stein M, et al. Arabidopsis PEN3/PDR8, an ATP binding cassette transporter, contributes to nonhost resistance to inappropriate pathogens that enter by direct penetration. Plant Cell. 2006;18:731–746. doi: 10.1105/tpc.105.038372.
- 31. Yoder MD, Keen NT, Jurnak F. New domain motif: the structure of pectate lyase C, a secreted plant virulence factor. Science. 1993;260:1503–1507. doi: 10.1126/science.8502994.
- 32. Ehleringer J, Bjorkman O, Mooney HA. Leaf pubescence: effects on absorptance and photosynthesis in a desert shrub. Science. 1976;192:376–377. doi: 10.1126/science.192.4237.376.
- 33. Levin DA. The role of trichomes in plant defense. Q Rev Biol. 1973:3–15.
- 34. Dai X, et al. TrichOME: a comparative omics database for plant trichomes. Plant Physiol. 2010;152:44–54. doi: 10.1104/pp.109.145813.
- 35. Roda A, Nyrop J, English-Loeb G. Leaf pubescence mediates the abundance of non-prey food and the density of the predatory mite Typhlodromus pyri. Exp Appl Acarol. 2003;29:193–211. doi: 10.1023/a:1025874722092.
- 36. Calo L, Garcia I, Gotor C, Romero LC. Leaf hairs influence phytopathogenic fungus infection and confer an increased resistance when expressing a Trichoderma alpha-1,3-glucanase. Journal of Experimental Botany. 2006;57:3911–3920. doi: 10.1093/jxb/erl155.
- 37. Xia Y, et al. The glabra1 mutation affects cuticle formation and plant responses to microbes. Plant Physiology. 2010;154:833–846. doi: 10.1104/pp.110.161646.
- 38. Bosch AA, Biesbroek G, Trzcinski K, Sanders EA, Bogaert D. Viral and bacterial interactions in the upper respiratory tract. PLoS Pathogens. 2013;9:e1003057. doi: 10.1371/journal.ppat.1003057.
- 39. Kuss SK, et al. Intestinal microbiota promote enteric virus replication and systemic pathogenesis. Science. 2011;334:249–252. doi: 10.1126/science.1211057.
- 40. Wagner MR, et al. Natural soil microbes alter flowering phenology and the intensity of selection on flowering time in a wild Arabidopsis relative. Ecol Lett. 2014;17:717–726. doi: 10.1111/ele.12276.
- 41. Brachi B, et al. Linkage and Association Mapping of Arabidopsis thaliana Flowering Time in Nature. PLoS Genetics. 2010;6:e1000940. doi: 10.1371/journal.pgen.1000940.
- 42. Igl W, et al. Modeling of Environmental Effects in Genome-Wide Association Studies Identifies SLC2A2 and HP as Novel Loci Influencing Serum Cholesterol Levels. PLoS Genetics. 2010;6:e1000798. doi: 10.1371/journal.pgen.1000798.
- 43. Hamza TH, et al. Genome-Wide Gene-Environment Study Identifies Glutamate Receptor Gene GRIN2A as a Parkinson's Disease Modifier Gene via Interaction with Coffee. PLoS Genetics. 2011;7:e1002237. doi: 10.1371/journal.pgen.1002237.
- 44. Qvit-Raz N, Jurkevitch E, Belkin S. Drop-size soda lakes: transient microbial habitats on a salt-secreting desert tree. Genetics. 2008;178:1615–1622. doi: 10.1534/genetics.107.082164.
- 45. Chelius MK, Triplett EW. The Diversity of Archaea and Bacteria in Association with the Roots of Zea mays L. Microb Ecol. 2001;41:252–263. doi: 10.1007/s002480000087.
- 46. Fierer N, Hamady M, Lauber CL, Knight R. The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci USA. 2008;105:17994–17999. doi: 10.1073/pnas.0807920105.
- 47. Gardes M, Bruns TD. ITS primers with enhanced specificity for basidiomycetes--application to the identification of mycorrhizae and rusts. Mol Ecol. 1993;2:113–118. doi: 10.1111/j.1365-294x.1993.tb00005.x.
- 48. White TJ, Bruns TD, Lee SB, Taylor JW. PCR Protocols: a guide to methods and applications. Academic Press; 1990. pp. 315–322.
- 49. Caporaso JG, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7:335–336. doi: 10.1038/nmeth.f.303.
- 50. Horton M, Bodenhausen N, Bergelson J. MARTA: a suite of Java-based tools for assigning taxonomic status to DNA sequences. Bioinformatics. 2010;26:568–569. doi: 10.1093/bioinformatics/btp682.
- 51. R Core Team R: A Language and Environment for Statistical Computing. 2012
- 52. Oksanen J, et al. vegan: Community Ecology Package. 2011 http://cran.r-project.org/package=vegan.
- 53. Muller BU, Stich B, Piepho HP. A general method for controlling the genome-wide type I error rate in linkage and association mapping experiments in plants. Heredity (Edinb) 2011;106:825–831. doi: 10.1038/hdy.2010.125.
- 54. Legendre P, Gallagher ED. Ecologically meaningful transformations for ordination of species data. Oecologia. 2001;129:271–280. doi: 10.1007/s004420100716.
- 55. Pickrell JK, et al. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010;464:768–772. doi: 10.1038/nature08872.
- 56. Hill MO. Diversity and Evenness: A Unifying Notation and Its Consequences. Ecology. 1973;54:427–432.
- 57. Brown MB. Method for combining non-independent, one-sided tests of significance. Biometrics. 1975;31:987–992.