PLOS One. 2020 Sep 18;15(9):e0237493. doi: 10.1371/journal.pone.0237493

Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities

Barbara A Methe 1,2, David Hiltbrand 3, Jeffrey Roach 4, Wenwei Xu 5, Stuart G Gordon 6, Brad W Goodner 7, Ann E Stapleton 3,*
Editor: Sara Amancio 8
PMCID: PMC7500591  PMID: 32946440

Abstract

The phyllosphere epiphytic microbiome is composed of microorganisms that colonize the external aerial portions of plants. Relationships of plant responses to specific microorganisms–both pathogenic and beneficial–have been examined, but the phyllosphere microbiome functional and metabolic profile responses are not well described. Changing crop growth conditions, such as increased drought, can have profound impacts on crop productivity. Also, epiphytic microbial communities provide a new target for crop yield optimization. We compared Zea mays leaf microbiomes collected under drought and well-watered conditions by examining functional gene annotation patterns across three physically disparate locations each with and without drought treatment, through the application of short read metagenomic sequencing. Drought samples exhibited different functional sequence compositions at each of the three field sites. Maize phyllosphere functional profiles revealed a wide variety of metabolic and regulatory processes that differed in drought and normal water conditions and provide key baseline information for future selective breeding.

Introduction

Plants form a wide variety of intimate associations with a diversity of microorganisms in the phyllosphere, the above-ground plant surface [1, 2]. Microorganisms can exist as endophytes within the plant, as epiphytes on plant surfaces (which together compose the phyllosphere) and in the soil surrounding the roots and within the roots [3–5]. The ubiquity and intricacy of these plant-microbe associations support the model of the plant as a “meta-organism” or “holobiont” consisting of the host and its microbiome (the collection of microorganisms and their gene content) which maintain a relationship over the lifetime of the plant [3, 6, 7]. The plant-associated microbiome, the phytobiome, is a complex and dynamic system existing as both an agonist and antagonist of plant fitness and adaptability [8, 9]. Therefore, elucidating the nature and extent of these interactions offers significant opportunities for improving plant health, for example, through alterations in nutrient cycling, neutralizing toxic compounds, discouraging pathogens, and promoting resistance to abiotic stresses that have the potential for generating significant impact on plant productivity [10–14]. Optimization of selective breeding for epiphytes presents new challenges in ensuring that microbe colonization occurs as needed, while presenting new potential effective indirect genetic selection [14–16] for crop improvement. Ultimately, engineering microbial and plant genotypes for optimal function and resilience will also require causal, mechanistic analyses of gene and pathway level processes; one first step in such mechanistic analysis for the microbial components of the phyllosphere is construction of controlled synthetic communities of microbes or assembly of specific sets of microbial functional genes [17].

In contrast to the rhizosphere, the region of soil that is directly influenced by root secretions, the phyllosphere is both a relatively understudied and transitory microbial environment [2, 18]. Microbial epiphytes of the phyllosphere experience an environment subject to different influences than those found in the rhizosphere and from host endophytes. Those in the phyllosphere experience atmospheric influences including direct sunlight exposure during diurnal cycles, and barriers such as waxy cuticle resulting in an oligotrophic environment [19, 20]. More labile associations between epiphytic microbes and host leaves do present an opportunity for interventions. For example, inoculation of beneficials or application of probiotics [21] could be done rapidly, during crop growth, since above-ground leaves and stems are easy to access. Longer term interventions such as selection of host genotypes that support specific desired microbial functions on external leaf surfaces at key points during growth or in response to biotic or abiotic stress could also be attempted [22, 23].

Corn, Zea mays L., is a widely grown and economically important annual crop. Drought is an abiotic stress that can negatively affect plant productivity [24]. Hence, understanding the role that the phyllosphere may play in association with maize undergoing abiotic stress is a priority. Epiphytic microbes are a unique target for drought tolerance. Targeting such microbes has potential advantages in the speed of alterations relative to plant breeding. It also provides the potential for temporal targeting through inoculation only during the adverse conditions [14]. Supporting this potential, seed microbial inoculation for crop drought tolerance is already in commercial use (for example, https://www.indigoag.com/).

Only a small fraction of microbial diversity is culturable in vitro [25, 26]. This has led to the use of culture-independent methods for study of microbial community structure and function. For approximately the past two decades, microbial and fungal diversity has been described via the sequencing of amplicons representing biomarkers, such as the 16S rRNA gene (bacteria) and internal transcribed spacer (ITS) regions (fungi) [27, 28]. More recently, techniques in microbial community research have shifted to investigate community structure and function at a systems-wide level. One such systems method, metagenomics, involves sequencing and analyzing genes derived from whole communities as opposed to individual genomes. Examining microbiomes at this level has shown that microbes ultimately function within communities rather than as individual species [9]. The traditional use of taxa to investigate microbiomes does not fully account for metabolic interactions between species. Typically, functional genes exhibit different patterns than taxa, and functional genes are often better predictors of niche [29–31]. In addition, functional gene content can be more heritable (i.e., more driven by host genetic interactions) [32]. Functional gene analyses also provide key information needed for community-level metabolic engineering [14, 22].

To address our questions about functional differences between microbial communities, we selected a factorial design with use of multiple field sites to increase generality. We know that plant breeding requires consideration of environmental contributions. By prioritizing multiple field sites in our initial investigations, our results provide critical information for future experimental designs for breeding and extension of the experiments found here.

Materials and methods

Seed stocks

Zea mays L. inbred B73 seed was supplied by the Maize Stock Center, http://maizecoop.cropsci.uiuc.edu/, and seed was increased at the Central Crops Agricultural Research Station, Clayton, NC using standard maize nursery procedures. Genotype of the seed lots used for these experiments was verified by SSR genotyping using eleven markers and comparison of fragment sizes to the sizes listed in the MaizeGDB database [33], http://www.maizegdb.org/ssr.php.

Experimental design and field sampling

Research field sites were generously provided by collaborators with ongoing scientific and extension projects; no additional permits or permissions were required. For this experiment we used a hierarchical design, with the treatment plots nested in each field site. There were three randomly arranged plots within each treatment level at each field site, surrounded by additional plant plots. Replicated field plots were planted in Albany, CA at the USDA University of California-Berkeley field site (abbreviated as CA), 37 degrees 53 min 12.8 sec N 122 degrees 17 min 59.8 sec W, on June 6, 2012. The field site had uniform soil and subsurface irrigation and fertilization supplied according to normal agronomic practice for this growing area. The southern section had normal irrigation throughout the season. However, the northern section had normal irrigation until vegetative growth stage V5 when all watering was stopped for two weeks; after sampling of leaves irrigation was resumed to allow plant growth to maturity. Seeds were planted at two sites in Texas, Dumas Etter field (abbreviated as DE) 35.998744 degrees N 101.988583 degrees W on May 8, 2013, and Halfway, TX field (abbreviated as HF), 34.184136 degrees N 101.943636 degrees W on April 26, 2012. The sites had center-pivot irrigation and standard maize field management. Drought treatment blocks were watered at 75% of the normal rate at DE and at 50% of the normal rate at the HF field site. The DE field site had one replicate plot that experienced additional rain late in the season (after phyllosphere sample collection). The HF field site had no unmanaged precipitation between July 9 and harvest. Late-season (post-phyllosphere sampling) field trait measurement methods and data files for each field site are provided in Supplemental Plant Traits files 1–7 in doi: 10.5061/dryad.7m0cfxprs.
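The CA site coordinates above are given in degrees, minutes, and seconds, while the two Texas sites are given in decimal degrees. The following minimal sketch shows one way to put them on the same footing. The function name `dms_to_decimal` and the sign convention (south and west negative) are assumptions made for illustration and are not part of the published methods.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.

    Hypothetical helper: South and West are returned as negative values,
    which is the usual convention but is an assumption here.
    """
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


# CA field site coordinates as stated in the text.
ca_lat = dms_to_decimal(37, 53, 12.8, "N")
ca_lon = dms_to_decimal(122, 17, 59.8, "W")

# Texas sites are already decimal; west longitudes negated for consistency.
sites = {
    "CA": (ca_lat, ca_lon),
    "DE": (35.998744, -101.988583),
    "HF": (34.184136, -101.943636),
}

for name, (lat, lon) in sites.items():
    print(f"{name}: {lat:.6f}, {lon:.6f}")
```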

Field trait measurement

At the Texas field sites (DE and HF) plant and ear heights were measured once per plot when growth was complete after tasseling. Ears were harvested and shipped to UNCW for measurement. For each ear, cob diameter at the base was measured with digital calipers, and twenty seeds were removed from the middle of each cob, placed in envelopes, and weighed. For the CA site, individual plant heights were measured and cobs were collected at the end of the season, October 1–3, 2012. Seed development was not complete, so only cob traits were measured. Cob diameter at base was measured with digital calipers; cob length was measured with a ruler. Plant data for each location and trait are included in doi: 10.5061/dryad.7m0cfxprs as plant trait files 1 to 6, with metadata about the column headers in file 7.

Leaf sampling and DNA extraction

Samples were collected from DE on June 26, 2012 and from HF on June 27, 2012, at developmental stage V8. The CA phyllosphere samples were taken August 7 and 8, 2012, at developmental stage V8. Six fully expanded leaves from the top quarter of the plants in each plot were placed into sterile bags (Whirl-Pak, Nasco, Fort Atkinson, WI) prefilled with 300 mL sterile water and 3 microliters Silwet L-77 (EMCO, North Chicago, IL). Bags were moved to nearby shelters, sonicated for one minute to loosen epiphytic microbes, and the 300 mL of wash solution was filtered through sterile Pall microfunnel 0.2 micron filter cups (VWR, Radnor, PA) to collect microbial cells on the filter surface. The filter was removed from the cup with sterile tweezers and dropped into small sterile Whirl-Pak bags, then stored frozen until DNA extraction. DNA was extracted from each filter with a PowerSoil Mega kit (MoBio, Carlsbad, CA). Samples were concentrated with filter-sterilized sodium chloride and absolute ethanol according to the manufacturer’s instructions and shipped frozen to JCVI for sequencing. Supplemental methods video links are available in the supplemental files repository at Data Dryad doi: 10.5061/dryad.7m0cfxprs, to provide additional details on the protocol used for leaf washes and filtering. Mock and soil samples were collected using the leaf-wash protocol, to allow detection of any sequences likely to have entered the samples from soil particles or air and from our sampling equipment.

Library construction and sequencing

All library construction and sequencing were completed using Illumina reagents and protocols. Samples PHYLLO09 and PHYLLO10 were sequenced with Illumina HiSeq and all other samples were sequenced using the MiSeq platform. Two nucleic acid negative control filters were also processed through DNA extraction, library construction, and sequencing to test for the presence of any significant contamination of experimental samples by exogenous DNA. One sample from HF drought was lost during processing. The raw data and processed reads are accessible from the NCBI Short Read Archive under Bioproject PRJNA297239.

Because of the nature of the sampling collection and nucleic acid procedures, plant host genomic DNA was inevitably included in the nucleic acid samples used for library construction. Therefore, a screening process was implemented to remove both sequencing artifacts and reads most likely to be of maize origin. Adapter sequences were removed from the SRA sequencing reads using Trim Galore version 0.4.3 <https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/>. The reads were subsequently filtered to remove maize sequences by alignment using bowtie2 version 2.2.9 [34] to v4 of the B73 Zea mays reference, Zm-B73-REFERENCE-GRAMENE-4.0 <ftp://ftp.ensemblgenomes.org/pub/plants/release-37/fasta/zea_mays/dna/Zea_mays.AGPv4.dna.toplevel.fa.gz> [35]. Quality at each processing step (initial reads, after adapter trimming, and after host filtering) was verified with FastQC v0.11.7. Read pairs that could be joined were joined with vsearch v1.10.2_linux_x86_64 <https://github.com/torognes/vsearch> [36], and all resulting single-end reads, both those that joined and those that did not, were retained for further analysis. UniProt50 protein annotation was performed using HUMAnN2 v0.9.1 <https://github.com/leylabmpi/humann2> [37], resulting in estimates of gene family count, path abundance, and path coverage together with estimates of taxonomic profile at the species level generated by MetaPhlAn2 [38]. Gene family HUMAnN2 output was explicitly normalized to counts per megabase to adjust for different input library sizes. HUMAnN2 is a reference-based method, so we focus on comparisons between drought and control conditions within the experiments (as all reference-based methods rely on existing data). Full details of parameters, software packages, and scripts used to manage analyses are available in Data Dryad repository doi: 10.5061/dryad.7m0cfxprs.
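To illustrate why normalization is needed before samples of very different sequencing depth are compared, the sketch below rescales each sample so that its gene family values sum to a fixed total. This is a generic sum-scaling example, not the HUMAnN2 implementation; the fixed total of one million, the function name `rescale_to_total`, and the gene family labels are all assumptions chosen for illustration.

```python
def rescale_to_total(abundances, total=1_000_000.0):
    """Rescale one sample's gene family values so they sum to `total`.

    Hypothetical helper illustrating library-size adjustment; the target
    total is an assumed constant, not a value taken from the paper.
    """
    sample_sum = sum(abundances.values())
    if sample_sum == 0:
        # An empty sample (for example, a clean negative control) stays at zero.
        return {family: 0.0 for family in abundances}
    return {family: value * total / sample_sum
            for family, value in abundances.items()}


# Two made-up samples with the same composition but tenfold different depth.
shallow = {"family_A": 10.0, "family_B": 30.0}
deep = {"family_A": 100.0, "family_B": 300.0}

shallow_norm = rescale_to_total(shallow)
deep_norm = rescale_to_total(deep)

for family in shallow:
    print(family, shallow_norm[family], deep_norm[family])
```

After rescaling, the two made-up samples have matching values, so any remaining difference between real samples reflects composition rather than sequencing depth.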

Count data analysis

Analysis of the number of reads for each UniProt annotation in each sample was performed with ENNB [39]. The parameters and full R scripts for analyzing the data (along with an R notebook explaining the process) are available in Data Dryad at doi: 10.5061/dryad.7m0cfxprs. ENNB is a two-stage process that uses an elastic net for feature selection and then a negative binomial fit to identify significant annotations. It can fit only one factor; nested or full factorial designs with multiple experimental factors cannot be fit using this two-stage multivariate method. The package was downloaded from the An web page (http://cals.arizona.edu/anling/software.htm) and scripts were written to run both method 1, the trimmed mean (TMM) from the edgeR package, and method 2, DE-Seq-type count overdispersion. Statistical analysis of annotations that differed between drought and well-watered conditions was carried out separately for each field site. The simulations that were created as described in the Simulation Construction section below were used to set the P-value threshold for the analysis of the samples. Imputation of samples was used to calculate the lambda value for cross-validation in ENNB, as specified in the ENNB documentation. The multiple imputation function within ENNB was used to create a third HF drought data column, as ENNB required three samples. After analysis, the annotation data sets were cleaned to remove any rows with annotation IDs that were present in the soil or mock-collected sequenced samples. All input files, R code, an R notebook explaining the analysis, and output files are available at doi: 10.5061/dryad.7m0cfxprs.
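The cleaning step described above, in which annotation IDs seen in soil or mock samples are dropped from the experimental samples, can be sketched as a simple set filter. This is an illustrative Python version under stated assumptions and is not the authors' R code; the function name, the rule that any nonzero control count marks an annotation as a contaminant, and the annotation IDs are hypothetical.

```python
def remove_control_annotations(sample_counts, control_counts):
    """Drop annotation IDs that have any reads in a control sample.

    sample_counts / control_counts: {sample_name: {annotation_id: count}}.
    Assumption: a count above zero in any control flags the annotation.
    """
    flagged = {
        annotation_id
        for counts in control_counts.values()
        for annotation_id, n in counts.items()
        if n > 0
    }
    return {
        sample: {a: n for a, n in counts.items() if a not in flagged}
        for sample, counts in sample_counts.items()
    }


# Made-up counts; the annotation IDs are placeholders, not real UniProt IDs.
leaf_samples = {
    "leaf_drought": {"ann_1": 12, "ann_2": 0, "ann_3": 7},
    "leaf_watered": {"ann_1": 3, "ann_2": 5, "ann_3": 9},
}
controls = {
    "mock": {"ann_3": 2},
    "soil": {"ann_1": 0},
}

cleaned = remove_control_annotations(leaf_samples, controls)
print(cleaned)
```

Only `ann_3` is removed in this toy case, because `ann_1` has a zero count in the soil control.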

Visualization of significant annotations

UniProt lists were converted to Gene Ontology lists (not a 1 to 1 mapping) using the conversion web tool at EBI, with lists available in the supplemental data in doi: 10.5061/dryad.7m0cfxprs. The lists of GO Process and GO Function annotations that were significantly different upon output from ENNB were then visualized using REVIGO [40], http://revigo.irb.hr/, with the SimRel and medium list defaults selected. The REVIGO Cytoscape-format xgmml network files were color-coded and the network layout redrawn using Cytoscape v3.2.1 [41]. Venn diagrams for comparison of lists were created with http://www.webgestalt.org/GOView/ [42].

Simulation construction for analysis validation

To measure the precision and accuracy of our analysis pipeline, we constructed simulated files of sequences and processed these through our analysis pipeline to generate simulated counts. Then, we analyzed the simulated counts with ENNB and with functions that tabulate true and false positives. We modified and updated FunctionSim (https://cals.arizona.edu/anling/software/FunctionSIM.htm) to generate sequences with signal and noise that were made independently of our real data. The full set of scripts and parameters is available in Data Dryad at doi: 10.5061/dryad.7m0cfxprs. We tested multiple ENNB thresholds for declaring significant annotations to select suitable cutoff and analysis options with the lowest possible false positive rate. The goal of the simulations was to determine whether ENNB was a viable method for detecting differences in gene counts between groups. We used a threshold alpha of 0.001. The lowest FDR (0.088) calculated using simulation-group comparisons was used to determine that ENNB would be an acceptable tool to run against the real data, provided the sequence match value to declare similarity was set to a suitably high level. The confusion matrices (true and false positives and negatives) for a range of parameter and sequence similarities are available in the Dryad repository; the notebook is genefamilies_simulations.Rmd.
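The tabulation of true and false positives and the resulting false discovery rate can be sketched as follows. This is a minimal illustration and not the code in genefamilies_simulations.Rmd; the function names and the simulated annotation labels are hypothetical, and the numbers are made up and do not reproduce the FDR of 0.088 reported above.

```python
def confusion_counts(called, truth, universe):
    """Tabulate a confusion matrix for one simulation run.

    called:   annotations declared significant by the analysis
    truth:    annotations simulated with a real difference (signal)
    universe: every annotation present in the simulation
    """
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    tn = len(universe - called - truth)
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def false_discovery_rate(counts):
    """FDR = FP / (TP + FP); defined as 0.0 when nothing was called."""
    called_total = counts["TP"] + counts["FP"]
    return counts["FP"] / called_total if called_total else 0.0


# Made-up simulation: ten annotations, four carry signal, five are called.
universe = {f"sim_{i}" for i in range(10)}
truth = {"sim_0", "sim_1", "sim_2", "sim_3"}
called = {"sim_0", "sim_1", "sim_2", "sim_7", "sim_8"}

counts = confusion_counts(called, truth, universe)
print(counts, false_discovery_rate(counts))
```

Repeating this tabulation over a grid of thresholds and keeping the setting with the lowest FDR mirrors the selection logic described in the text.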

Statistical analysis of plant traits

Plant traits (seed weight, plant height, and cob diameter) were analyzed with linear regression models using JMP11 Pro (SAS, Cary, NC) with an adjusted alpha of 0.05. Models were fit with water treatment (as a nominal factor) for each trait. For HF and DE cob diameter traits, plot numbers were used to identify the group of plants within the larger field and those plot IDs were included in the model to account for the blocks. The number of replicates for each comparison is provided in the box plot figure legend.
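For a single two-level nominal factor such as water treatment, a least-squares linear model reduces to estimating the mean of one group and the difference between group means. The sketch below shows that equivalence in plain Python. It is not the JMP analysis, it omits the plot (block) terms and the significance test, and the trait values are invented for illustration.

```python
from statistics import mean


def fit_two_level_factor(values, is_drought):
    """Ordinary least squares for y = intercept + effect * drought_indicator.

    Returns (intercept, effect): the intercept is the well-watered mean and
    the effect is the drought mean minus the well-watered mean.
    """
    x = [1.0 if flag else 0.0 for flag in is_drought]
    x_bar, y_bar = mean(x), mean(values)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("both treatment levels must be present")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, values))
    effect = sxy / sxx
    intercept = y_bar - effect * x_bar
    return intercept, effect


# Hypothetical plant heights in meters (not data from this study).
heights = [2.0, 2.2, 2.1, 1.7, 1.9, 1.8]
drought = [False, False, False, True, True, True]

intercept, effect = fit_two_level_factor(heights, drought)
print(f"well-watered mean = {intercept:.3f} m, drought effect = {effect:.3f} m")
```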

Availability of data and materials

Metagenomic sequences are available in the SRA repository, identifier BIOPROJECT PRJNA297239. All data analysis scripts, simulations, intermediate files and metadata files are available from Data Dryad doi: 10.5061/dryad.7m0cfxprs. A preliminary version of this work is available in bioRxiv under bioRxiv 104331 doi https://doi.org/10.1101/104331.

Results

To examine the microbial metabolic and regulatory functions important for leaf epiphytic community differences between drought and well-watered field plots, we developed a nested experimental design and a per-field-site analysis using factorial multivariate approaches suitable for our zero-inflated annotation read count data. We prioritized comparisons within multiple geographically diverse field sites. Genotype–environment interaction is a key logistic and experimental constraint for future host plant breeding for improved varieties that would support optimal microbial communities.

We saw little relationship between depth of microbial sequence and annotation quality (Table 1); for example, in samples 11 and 12, sequence depth was not correlated with signal level. Both of the soil samples and one mock sample had no sequence signal (Table 1). The second mock sample contained some sequences that were not classified as contaminants. All annotation rows present in the mock sample were removed from all sample rows before statistical analysis.

Table 1. Sample characteristics.

Sample ID | Field Site¹ | Treatment Type | Sequence Amount² | Sequence Comment
PHYLLO9 | HF | watered | deep |
PHYLLO10 | DE | watered | deep |
PHYLLO11 | CA | watered | small | all contaminant
PHYLLO12 | CA | drought | large | low proportion of signal
PHYLLO13 | CA | watered | large |
PHYLLO14 | CA | drought | moderate |
PHYLLO15 | DE | watered | moderate |
PHYLLO16 | DE | drought | small |
PHYLLO17 | CA soil | watered | small | soil sample below watered plot plants, all contaminant
PHYLLO18 | CA soil | drought | small | soil sample below drought plot plants, all contaminant
PHYLLO19 | mockDE | none | small |
PHYLLO20 | mockCA | none | small | low proportion of signal
PHYLLO21 | CA | watered | large |
PHYLLO22 | CA | drought | large |
PHYLLO23 | DE | watered | large |
PHYLLO24 | DE | drought | small |
PHYLLO25 | DE | drought | moderate |
PHYLLO26 | HF | watered | deep |
PHYLLO27 | HF | watered | deep |
PHYLLO28 | HF | drought | deep |
PHYLLO29 | HF | drought | deep |

1 Full field information for these two-letter abbreviations is available in the Methods section.

2 Small indicates that the sample contained less than 233k reads, moderate indicates 233-500k reads, large indicates 500k-1.6m reads, deep indicates greater than 1.7m reads.
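The sequence amount categories in the second footnote can be written as a small lookup, sketched below. Two details are assumptions, because the footnote does not settle them: a sample with exactly 500k reads is assigned to "large", and read counts between 1.6m and 1.7m, which the footnote does not cover, are returned as "unspecified". The function name is hypothetical.

```python
def sequence_amount_category(read_count):
    """Map a read count to the Table 1 sequence amount label.

    Assumptions: exactly 500,000 reads counts as "large", and the
    1.6m to 1.7m range not covered by the footnote is "unspecified".
    """
    if read_count < 233_000:
        return "small"
    if read_count < 500_000:
        return "moderate"
    if read_count <= 1_600_000:
        return "large"
    if read_count > 1_700_000:
        return "deep"
    return "unspecified"


# Made-up read counts, one per category.
for n in (100_000, 300_000, 900_000, 1_650_000, 2_500_000):
    print(n, sequence_amount_category(n))
```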

Annotations differing between drought and watered treatments

To robustly determine the ENNB parameters with the fewest false positives we created simulations using an independent sequence database. Then, we processed the simulations through our sequence read and statistical analysis code and measured the number of true and false detections. For count analysis, use of the trimmed mean adjustment (Tmm1) and a threshold of 0.001 for negative binomial fitting gave fewer false positives and we report results using those thresholds. Our analyses may be re-run using the scripts and setting information provided in the Dryad repository supplemental files if different P-value thresholds are desired.

Drought and watered plots at each site had significant differences in read counts for regulatory and metabolic functions. The ENNB analysis with normalization by TMM generated a list of significant GO Process and GO Function annotations in watered as compared to drought-treated phyllosphere samples for each field site, with groups of related GO terms from REVIGO analysis indicated by edges between GO node terms. Larger nodes indicate the frequency of the annotation in the GO database, so smaller nodes with no edges such as bacteriocin immunity (Fig 1A) are the most unique. The significant GO Process terms identified as semantically distinct in the drought treatment for the Albany, CA field (abbreviated as CA) site (Fig 1A) include biochemical pathways involved in basic cellular responses, such as transcription and DNA replication, and specific metabolic remodeling pathways, such as isoleucine biosynthesis. Pathways we observed that are likely to be important for microbial community interactions include bacteriocin immunity and amino acid transport [43]. Functional annotations (Fig 1B) for the CA field site are similar to process annotations, with the addition of a cluster of energy-metabolism related binding functions, such as NADP binding (Fig 1B).

Fig 1. Network visualization of Gene Ontology process and function annotation differences between normal water and drought treatments at the CA site.

Significant Gene Ontology (GO) annotations from ENNB analysis were grouped by semantic similarity into a network. The size of each node is proportional to the frequency of annotation relative to the GO database. Similar terms are linked with edges. Circles and boxes indicate terms shared between field sites. a) CA field site GO Process annotations that were significantly different between fully watered and drought microbial phyllosphere samples. b) CA field site GO function annotations that were significantly different between fully watered and drought microbial phyllosphere samples.

Functional annotations that were significant from the Dumas-Etter, TX field site (abbreviated as DE) include a range of metabolic and regulatory terms, with a large cluster of amino acid, nucleic acid, and sugar metabolic enzymes (center of Fig 2A) and a second large cluster of regulatory and response categories (top of Fig 2A), such as quorum sensing. Topics related to response to oxidative stress form a smaller cluster. Unusual categories with single small nodes include protein refolding and reactive oxygen species metabolism. The term ‘transcriptional regulation’ was shared with the CA term list (circled in Figs 1 and 2). The function term network (Fig 2B) also has a cluster for metal ion binding (visible at the top left of Fig 2B). After quality control, the DE site retained all six samples (Table 1) and this site had the largest number of significant annotations (Figs 2 and S1 via https://doi.org/10.5061/dryad.7m0cfxprs).

Fig 2. Network visualization of Gene Ontology process and function annotation differences between normal water and drought treatments at the DE site.

Significant Gene Ontology (GO) annotations from ENNB analysis were grouped by semantic similarity into a network. The size of each node is proportional to the frequency of annotation relative to the GO database. Similar terms are linked with edges. Circles and boxes indicate terms shared between field sites. a) DE field site GO Process annotations that were significantly different between fully watered and drought microbial phyllosphere samples. b) DE field site GO function annotations that were significantly different between fully watered and drought microbial phyllosphere samples.

Significant annotations from the Halfway, TX field site (abbreviated as HF) include a group of biosynthetic enzymes for amino and fatty acid synthesis (Fig 3A top left), and amino acid biosynthesis enzymes (Fig 3A top right). The process annotation ‘translation’ was shared with the DE site (indicated by the dashed square around the node and annotation label), and amino acid transport was shared with the CA field site (indicated by a dashed diamond). In the process listing, an example unusual pathway found only in HF is self proteolysis. Functional annotations include a set of regulatory activities (e.g., kinases) and several ion binding activities. The zinc ion binding activity was shared with the DE annotation list. One unusual annotation found only in HF function was cob(I)yrinic acid a,c-diamide adenosyltransferase, which is part of the vitamin B12 cofactor pathway.

Fig 3. Network visualization of Gene Ontology process and function annotation differences between normal water and drought treatments at the HF site.

Significant Gene Ontology (GO) annotations from ENNB analysis were grouped by semantic similarity into a network. The size of each node is proportional to the frequency of annotation relative to the GO database. Similar terms are linked with edges. Circles and boxes indicate terms shared between field sites. a) HF field site GO Process annotations that were significantly different between fully watered and drought microbial phyllosphere samples. b) HF field site GO function annotations that were significantly different between fully watered and drought microbial phyllosphere samples.

Plant traits

To confirm that drought treatment plots were relevant for host plant performance, we analyzed plant growth measurements. All plant measurements at all sites showed significantly less growth in the treatment with less water (Fig 4). Plot effects were examined for each trait and no significant interaction between plot and replicate was found (results not shown). Mid-season plant heights were significantly less (P<0.0001) in the drought condition for the CA site. The drought-treated plants were 20% shorter, with an estimated difference between normal water and drought of 0.158 meters. The DE field site with plot 101 excluded exhibited significant (P = 0.0139) effects of drought on end-of-season seed weight (Fig 4B), with the seed weights in drought reduced by about 25% (estimated difference of 0.468 grams less in drought samples). Plot 101 from the 75% site had a late-season rain event after microbiome sampling but before seed harvest that necessitated its exclusion. Drought reduced seed weight by 50% at the HF field site (Fig 4C), with P<0.0001 and an effect difference of 1.206 g less in drought seed samples. Cob diameters were also significantly smaller in the drought-treated plants (Fig 4D, 4E and 4F) with the effect size differences ranking the drought intensity of DE (1.66 mm less in drought) less than CA (2.52 mm less in drought), with the most severe cob diameter drought effects at the HF site (3.24 mm less in drought).

Fig 4. Drought effects on plant growth.

Error bars are standard error and colors are grouped within a field site. a) Comparison of plant heights in drought and well-watered plots from the California-Albany (CA) field site; bar heights indicate average height in meters. Drought (less drip irrigation) n = 40, well-watered (regular drip irrigation) n = 40. b) Comparison of seed weights from mild drought (75% of normal irrigation) and well-watered (100% irrigation) field blocks at the Texas-Dumas-Etter (DE) field site. Colored bar indicates mean value. Drought n = 18, well-watered n = 17. c) Comparison of seed weights from intense drought (half of normal watering level, DRT) and well-watered (WW) field plots at the Texas-Halfway (HF) site. Colored bar indicates mean value. DRT n = 4 and WW n = 22. Zeros (cobs with no seeds from DRT) were not included in the analysis. d) Box plot of cob diameter of CA samples; white line is mean and quantiles are indicated by the box and whiskers, n = 12. e) Comparison of cob diameter by water treatment in samples from the DE site. Colored bar indicates mean value, n(75%) = 28, n(100%) = 26. f) Comparison of cob diameter by water treatment in samples from the HF site. Colored bar indicates mean value, n(DRT) = 10, n(WW) = 22.

Discussion

We qualitatively compared functional genes across all three sites (S1 Fig, Data Dryad repository doi: 10.5061/dryad.7m0cfxprs), though we did not fit a statistical model for comparisons of drought effects across field sites, as the field sites differed in multiple ways. There were more shared drought-treatment-relevant functional categories in comparisons of the CA and DE field sites than in comparisons with the HF site (S1 Fig). This suggests that drought severity could play a role in functional gene importance despite differences in soil and other aspects of each field environment, because the CA and DE plots did share milder drought conditions despite differences in delivery of irrigation. We would expect differences across field sites based on plant physiology and known differences in maize growth across field sites [44]. However, field site is confounded with the field-specific drought treatments in our study and we thus cannot quantitatively compare the field site annotation networks. Shared annotations across field sites often were not consistently increased or decreased in read count levels. For example, amino acid transport process read counts were higher in watered samples at the HF site and higher in drought samples at the CA site. However, the extent of drought-treatment significant annotation term sharing (without consideration of read count levels, as shown in S1 Fig) is consistent with the extent of plant growth effect, with HF sharing fewer terms and having more severe drought.

Lists of phyllosphere ribotypes from prior field studies [45–48] were used to generate a list of expected species. Expected phyllosphere species that were also present in our samples include Methylobacterium spp., Dietzia spp., and Pseudomonas spp. (Supplemental file metaphlan2.tsv in Data Dryad repository doi: 10.5061/dryad.7m0cfxprs). We carried out a detailed comparison of the annotations from the rice phyllosphere proteome [49] to our list. Six rice GO process terms were in the metaproteome pfam list [49], and three of the six were shared with our process lists. Recent literature on functional genes suggests that functions are more predictive than ribotype profiles [30, 31]. Therefore, in future experiments, testing of the effects of synthetic communities with similar ribosomal but different functional composition would be of broad interest. Our functional gene information is a step toward designing a future synthetic community test of functional annotation predictive ability.

In a maize leaf microbe association genetics experiment, predicted metabolic functions were more heritable than ribotypes, which also suggests that function is key [32]. Selection for specific microbial functional genes or generic markers for pathways could easily be incorporated into newer DNA-based crop genomic selection processes that are sequencing based [50–52]. The importance of incorporating microbial sequence predictors lends support to the movement toward sequencing to collect all DNA data, not just filtered SNP sets or SNPs with prior data on causality. Microbial sequences are not in linkage disequilibrium like chromosomal SNPs, so it would not be possible to take advantage of tag SNPs. Because the cost of complete sequencing is decreasing, we advocate for modeling and tests of full-sequence predictors that include both host chromosomal and epiphyte functional DNA information.

We suggest that a key next step in understanding use of leaf microbial annotations for crop improvement would be to measure microbial community annotations in selected and unselected breeding program lines across multiple test sites. This would allow the estimation of the genotype and environment breeding values for functional gene annotation. That information would determine future breeding strategy and would be efficient, because collection of functional gene information could be an add-on to host breeding experiments such as g2f for maize (https://www.genomes2fields.org/) and terraRef for sorghum (http://terraref.org/). There are few publicly available field sites for drought experiments–we know of only five within the continental USA–so public-private partnerships and use of large-scale field experimental networks are logical next steps for better understanding of microbial community development for crop improvement.

Leaf epiphytes offer both short- and long-term intervention possibilities. Indirect selection for host effects is likely to be more cost-effective than inoculation, but it takes much longer to implement because multiple breeding cycles are required. Leaf microbes are typically not in seeds and thus are not consumed. These microbes are therefore logical targets for improved forage quality, energy extraction from biomass, or optimization of soil fertility for the next season as well as for plant host benefit.

We advocate for future experiments that build on the functional genes we identified and that combine synthetic community development approaches with breeding experiments to generate the knowledge needed for future holobiont breeding system development. Our results allow prioritization of specific gene function pathways when choosing culturable microbe mixtures for future experiments on the design of drought-tolerant epiphytic communities.

Conclusion

We identified a range of biosynthetic and regulatory microbial functional and process annotations that differed between drought and well-watered maize leaf epiphytic communities at three different field sites. These functions now provide a target for selection of beneficial microbes and for design of synthetic community causal tests of community interactions.

Supporting information

S1 Fig

(TXT)

Acknowledgments

We thank Neha Gupta, Bryan Frank and Kelvin Li for their work on library construction and sequencing. We are obliged to Stephen P. Talley for his hard work on the ENNB analysis of an initial annotation dataset. We very much appreciate the contributions of Sarah Hake, China Lunde and the Hake lab members, who provided field and lab space for this project and carried out the field management for us. We are grateful to Robert L. Bryden, Danielle Allery Nail, and Bonnie M. Mitchell for assistance with plant trait measurements and to Monika Bihan for assistance with the sequencing and annotations. Author Roles: Ann E. Stapleton conceived the experiment, designed and carried out the field sampling, oversaw the data analysis, and wrote the manuscript. Wenwei Xu set up the Texas field sites with drought and normal irrigation treatments, planted the experimental plots, collected trait data, and edited the manuscript. Jeffrey Roach constructed the simulations. Dave Hiltbrand analyzed the annotation count data. Stuart Gordon and Brad Goodner edited the manuscript. Barbara Methe supervised the sequencing and sequence data processing and edited the manuscript.

Data Availability

All relevant data are uploaded to the Dryad repository and publicly accessible via the following DOI: 10.5061/dryad.7m0cfxprs. https://datadryad.org/stash/dataset/doi:10.5061/dryad.7m0cfxprs

Funding Statement

This material is based upon work supported by the National Science Foundation under Grant No. 1126938 to AES and BM. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1.Bulgarelli D, Schlaeppi K, Spaepen S, van Themaat EVL, Schulze-Lefert P. Structure and Functions of the Bacterial Microbiota of Plants. Annual Review of Plant Biology. 2013;64(1):807–838. 10.1146/annurev-arplant-050312-120106 [DOI] [PubMed] [Google Scholar]
  • 2.Müller DB, Vogel C, Bai Y, Vorholt JA. 2016. Available from: http://www.annualreviews.org/doi/10.1146/annurev-genet-120215-034952.
  • 3.Berg G, Grube M, Schloter M, Smalla K. Unraveling the plant microbiome: looking back and future perspectives. Plant Biotic Interactions. 2014;5:148 10.3389/fmicb.2014.00148 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Knief C. Analysis of plant microbe interactions in the era of next generation sequencing technologies. Plant Genetics and Genomics. 2014;5:216 10.3389/fpls.2014.00216 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Mine A, Sato M, Tsuda K. Toward a systems understanding of plant–microbe interactions. Frontiers in Plant Science. 2014;5 10.3389/fpls.2014.00423 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Morella NM, Weng FCH, Joubert PM, Metcalf CJE, Lindow S, Koskella B. Successive passaging of a plant-associated microbiome reveals robust habitat and host genotype-dependent selection. bioRxiv. 2019; p. 627794 10.1101/627794 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Schlechter RO, Miebach M, Remus-Emsermann MNP. Driving factors of epiphytic bacterial communities: A review. Journal of Advanced Research. 2019; 10.1016/j.jare.2019.03.003 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Vacher C, Hampe A, Porté AJ, Sauer U, Compant S, Morris CE. The Phyllosphere: Microbial Jungle at the Plant–Climate Interface. Annual Review of Ecology, Evolution, and Systematics. 2016;47(1):1–24. 10.1146/annurev-ecolsys-121415-032238 [DOI] [Google Scholar]
  • 9.Leach JE, Triplett LR, Argueso CT, Trivedi P. Communication in the Phytobiome. Cell. 2017;169(4):587–596. 10.1016/j.cell.2017.04.025 [DOI] [PubMed] [Google Scholar]
  • 10.Howden AJM, Preston GM. Nitrilase enzymes and their role in plant–microbe interactions. Microbial biotechnology. 2009;2(4):441–451. 10.1111/j.1751-7915.2009.00111.x [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11.Choudhary DK, Sharma KP, Gaur RK. Biotechnological perspectives of microbes in agro-ecosystems. Biotechnology Letters. 2011;33(10):1905–1910. 10.1007/s10529-011-0662-0 [DOI] [PubMed] [Google Scholar]
  • 12.Churchland C, Grayston SJ. Specificity of plant-microbe interactions in the tree mycorrhizosphere biome and consequences for soil C cycling. Frontiers in Microbiology. 2014;5:261 10.3389/fmicb.2014.00261 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Phieler R, Voit A, Kothe E. Microbially supported phytoremediation of heavy metal contaminated soils: strategies and applications. Advances in Biochemical Engineering/Biotechnology. 2014;141:211–235. [DOI] [PubMed] [Google Scholar]
  • 14.Orozco-Mosqueda MdC, Rocha-Granados MdC, Glick BR, Santoyo G. Microbiome engineering to improve biocontrol and plant growth-promoting mechanisms. Microbiological Research. 2018;208:25–31. 10.1016/j.micres.2018.01.005 [DOI] [PubMed] [Google Scholar]
  • 15.Parnell JJ, Berka R, Young HA, Sturino JM, Kang Y, Barnhart DM, et al. From the Lab to the Farm: An Industrial Perspective of Plant Beneficial Microorganisms. Frontiers in Plant Science. 2016;7 10.3389/fpls.2016.01110 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Mitter B, Pfaffenbichler N, Flavell R, Compant S, Antonielli L, Petric A, et al. A New Approach to Modify Plant Microbiomes and Traits by Introducing Beneficial Bacteria at Flowering into Progeny Seeds. Frontiers in Microbiology. 2017;8 10.3389/fmicb.2017.00011 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Vorholt JA, Vogel C, Carlström CI, Müller DB. Establishing Causality: Opportunities of Synthetic Communities for Plant Microbiome Research. Cell Host & Microbe. 2017;22(2):142–155. 10.1016/j.chom.2017.07.004 [DOI] [PubMed] [Google Scholar]
  • 18.Bringel F, Couée I. Pivotal roles of phyllosphere microorganisms at the interface between plant functioning and atmospheric trace gas dynamics. Frontiers in Microbiology. 2015;6 10.3389/fmicb.2015.00486 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Remus-Emsermann MNP, Schlechter RO. Phyllosphere microbiology: at the interface between microbial individuals and the plant host. New Phytologist. 2018;218(4):1327–1333. 10.1111/nph.15054 [DOI] [PubMed] [Google Scholar]
  • 20.Müller C, Riederer M. Plant surface properties in chemical ecology. Journal of Chemical Ecology. 2005;31(11):2621–2651. 10.1007/s10886-005-7617-7 [DOI] [PubMed] [Google Scholar]
  • 21.Luziatelli F, Ficca AG, Colla G, Baldassarre Švecová E, Ruzzi M. Foliar Application of Vegetal-Derived Bioactive Compounds Stimulates the Growth of Beneficial Bacteria and Enhances Microbiome Biodiversity in Lettuce. Frontiers in Plant Science. 2019;10 10.3389/fpls.2019.00060 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Kerdraon L, Laval V, Suffert F. How can a knowledge of microbiota-pathogen interactions in cereal cropping systems help us to manage residue-borne fungal diseases? arXiv:190302246 [q-bio]. 2019;.
  • 23.Compant S, Samad A, Faist H, Sessitsch A. A review on the plant microbiome: Ecology, functions, and emerging trends in microbial application. Journal of Advanced Research. 2019; 10.1016/j.jare.2019.03.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Schmidhuber J, Tubiello FN. Global food security under climate change. Proceedings of the National Academy of Sciences. 2007;104(50):19703–19708. 10.1073/pnas.0701976104 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Kell DB, Kaprelyants AS, Weichart DH, Harwood CR, Barer MR. Viability and activity in readily culturable bacteria: a review and discussion of the practical issues. Antonie Van Leeuwenhoek. 1998;73(2):169–187. [DOI] [PubMed] [Google Scholar]
  • 26.Torsvik V, Øvreås L. Microbial diversity and function in soil: from genes to ecosystems. Current Opinion in Microbiology. 2002;5(3):240–245. [DOI] [PubMed] [Google Scholar]
  • 27.Rajendhran J, Gunasekaran P. Microbial phylogeny and diversity: small subunit ribosomal RNA sequence analysis and beyond. Microbiological Research. 2011;166(2):99–110. 10.1016/j.micres.2010.02.003 [DOI] [PubMed] [Google Scholar]
  • 28.Methé BA, Lasa I. Microbiology in the ‘omics era: from the study of single cells to communities and beyond. Current Opinion in Microbiology. 2013;16(5):602–604. 10.1016/j.mib.2013.10.002 [DOI] [PubMed] [Google Scholar]
  • 29.Kelly LW, Williams GJ, Barott KL, Carlson CA, Dinsdale EA, Edwards RA, et al. Local genomic adaptation of coral reef-associated microbiomes to gradients of natural variability and anthropogenic stressors. Proceedings of the National Academy of Sciences. 2014;111(28):10227–10232. 10.1073/pnas.1403319111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Burke C, Steinberg P, Rusch D, Kjelleberg S, Thomas T. Bacterial community assembly based on functional genes rather than species. Proceedings of the National Academy of Sciences. 2011;108(34):14288–14293. 10.1073/pnas.1101591108 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Doolittle WF, Inkpen SA. Processes and patterns of interaction as units of selection: An introduction to ITSNTS thinking. Proceedings of the National Academy of Sciences. 2018;115(16):4006–4014. 10.1073/pnas.1722232115 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Wallace JG, Kremling KA, Buckler ES. Quantitative Genetic Analysis of the Maize Leaf Microbiome. bioRxiv. 2018; p. 268532 10.1101/268532 [DOI] [Google Scholar]
  • 33.Sen TZ, Andorf CM, Schaeffer ML, Harper LC, Sparks ME, Duvick J, et al. MaizeGDB becomes ‘sequence-centric’. Database. 2009;2009:bap020 10.1093/database/bap020 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Langmead B. A tandem simulation framework for predicting mapping quality. Genome Biology. 2017;18:152 10.1186/s13059-017-1290-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Jiao Y, Peluso P, Shi J, Liang T, Stitzer MC, Wang B, et al. Improved maize reference genome with single-molecule technologies. Nature. 2017;546(7659):524–527. 10.1038/nature22971 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36.Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016;4:e2584 10.7717/peerj.2584 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS computational biology. 2012;8(6):e1002358 10.1371/journal.pcbi.1002358 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38.Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nature Methods. 2015;12(10):902–903. 10.1038/nmeth.3589 [DOI] [PubMed] [Google Scholar]
  • 39.Pookhao N, Sohn MB, Li Q, Jenkins I, Du R, Jiang H, et al. A two-stage statistical procedure for feature selection and comparison in functional analysis of metagenomes. Bioinformatics. 2015;31(2):158–165. 10.1093/bioinformatics/btu635 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40.Supek F, Bošnjak M, Škunca N, Šmuc T. REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms. PLoS ONE. 2011;6(7):e21800 10.1371/journal.pone.0021800 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41.Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research. 2003;13(11):2498–2504. 10.1101/gr.1239303 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Zhang B, Kirov S, Snoddy J. WebGestalt: an integrated system for exploring gene sets in various biological contexts. Nucleic Acids Research. 2005;33(Web Server issue):W741–748. 10.1093/nar/gki475 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43.Mee MT, Collins JJ, Church GM, Wang HH. Syntrophic exchange in synthetic microbial communities. Proceedings of the National Academy of Sciences. 2014; p. 201405641 10.1073/pnas.1405641111 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44.Carena MJ, Hallauer AR, Miranda Filho JB, Filho JBM. Quantitative Genetics in Maize Breeding. Springer New York; 2010. [Google Scholar]
  • 45.Rastogi G, Sbodio A, Tech JJ, Suslow TV, Coaker GL, Leveau JHJ. Leaf microbiota in an agroecosystem: spatiotemporal variation in bacterial community composition on field-grown lettuce. The ISME journal. 2012;6(10):1812–1822. 10.1038/ismej.2012.32 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Copeland JK, Yuan L, Layeghifard M, Wang PW, Guttman DS. Seasonal Community Succession of the Phyllosphere Microbiome. Molecular Plant-Microbe Interactions. 2015;28(3):274–285. 10.1094/MPMI-10-14-0331-FI [DOI] [PubMed] [Google Scholar]
  • 47.Wagner MR, Lundberg DS, Rio TGd, Tringe SG, Dangl JL, Mitchell-Olds T. Host genotype and age shape the leaf and root microbiomes of a wild perennial plant. Nature Communications. 2016;7:12151 10.1038/ncomms12151 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48.Bodenhausen N, Horton MW, Bergelson J. Bacterial Communities Associated with the Leaves and the Roots of Arabidopsis thaliana. PLOS ONE. 2013;8(2):e56329 10.1371/journal.pone.0056329 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49.Knief C, Delmotte N, Chaffron S, Stark M, Innerebner G, Wassmann R, et al. Metaproteogenomic analysis of microbial communities in the phyllosphere and rhizosphere of rice. The ISME Journal. 2012;6(7):1378–1390. 10.1038/ismej.2011.192 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50.Pérez-Enciso M, Forneris N, Campos Gdl, Legarra A. Evaluating Sequence-Based Genomic Prediction with an Efficient New Simulator. Genetics. 2016; p. genetics.116.194878. 10.1534/genetics.116.194878 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Forneris NS, Vitezica ZG, Legarra A, Pérez-Enciso M. Influence of epistasis on response to genomic selection using complete sequence data. Genetics Selection Evolution. 2017;49:66 10.1186/s12711-017-0340-3 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52.Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, Krivushin K, et al. Genomic evaluation of feed efficiency component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genetics Selection Evolution. 2018;50:14 10.1186/s12711-018-0387-9 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Sara Amancio

24 Dec 2019

PONE-D-19-16307

Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities

PLOS ONE

Dear Dr Stapleton,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The ms “Functional Gene Categories Differentiate Maize Leaf Drought-Related Microbial Epiphytic Communities” PONE-D-19-16307 has been available at the preprint server bioRxiv since 2017 (104331) as “Functional genes that distinguish maize phyllosphere metagenomes in drought and well-watered conditions”.

The present version submitted in June 2019 was reviewed by two specialists who made a number of comments and suggestions.

The authors are asked to prepare a new version addressing the reviewers’ comments and/or to reply point by point to those comments.

Since the preprint version is from 2017, the available information has expanded. References must therefore be updated (cf., for example, reviewer 1 on “culturable microorganisms”) and the text adjusted accordingly.

Also, concerning the reference microbiome: if it is not possible to use a plant-associated bacteria database, give a substantiated justification.

Address (or explain to reviewer 1) the questions raised on Bioinformatics approaches.

Validation of key genes expression by qPCR would address the suggestion by reviewer 2.

We would appreciate receiving your revised manuscript by 2020 February 10th. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Sara Amancio

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: No

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: In the manuscript by Methe et al. the authors perform a metagenomic characterization of the maize leaf epiphytic microbial community in normally irrigated plants and plants under drought stress. The authors use a replicated design across three sites in the USA and find that some functions recurrently have differential abundance between watered and drought stressed plants. The authors further show plant phenotypic changes in response to drought stress.

At this point, I have some major concerns regarding the methods, and clarity of presentation, as well as some minor issues that I point out below.

Major comments

1. I commend the authors for making publicly available all the raw sequencing data and for creating a synapse repository with all the code used for analysis. I was able to download all of it. However, I found that I could only see the contents of the files in synapse.org repositories after registering for an account. The account is free, but I do feel this requirement places an unnecessary barrier for potential readers. It seems like synapse.org doesn’t allow for download without an account so I would encourage the authors to consider alternatives. Finally, I couldn’t find clear documentation on synapse.org regarding the type of stats that are collected regarding data usage, and which of those stats are available to the repository owners; since I was forced to provide an email to create my account and download the repositories, I opted for creating a new email account (unrelated to my affiliation and name) in order to ensure that the confidentiality of the peer review is maintained.

2. The introduction and discussion seem a bit off-topic from the core of the paper. For example, the authors spend a lot of sentences talking about the potential for synthetic communities, but all the work described is on field environments with their own natural microbial communities. No strains are isolated either, so it is not directly clear how this work translates to synthetic communities. On the other hand, there is no description of the phyllosphere microenvironment; what are the defining characteristics of this microbial habitat? Are there any maize specific features that might be relevant? I think presenting more about the phyllosphere itself would help interpret the functional results much better than is possible now.

3. In general, I found the manuscript hard to read because of excessive jargon and complex sentences. I give one example of a very hard to parse sentence “a per-field-site analysis using factorial multivariate approaches suitable for zero-inflated annotation read count data.”. This is the only mention of Zero-Inflation in the manuscript so I actually don’t know what the authors are trying to point out. It could be greatly simplified by saying “We analyzed each site independently with method X”. I think a good deal of effort needs to go into streamlining the manuscript and ensuring that terms are defined and well utilized.

4. Line 46. The number of culturable microorganisms has greatly increased in recent years. The authors provide dated references. For example, in the plant field check doi:10.1038/s41588-017-0012-9 to see that the number is closer to 50%, and this has become apparent in many other environments as well.

5. The authors use the HUMAnN2 pipeline to map their metagenomic sequences to UniProt and their associated GO annotations. This pipeline was defined with the human microbiome in mind and so I wonder how representative the reference is for plant-associated bacteria. What proportion of the reads can be mapped with this pipeline? Did the authors try to map to a plant-associated bacteria database instead? Also, I couldn’t really figure out which file in the 10.7303/syn12933189 repository contains the details for running this pipeline.

6. I didn’t understand the rationale for sequencing two samples with HiSeq and the rest with MiSeq. What proportion of the phyllosphere are the authors capturing? Do saturation plots show that the number of new features reaches a plateau with the amount of sequencing they do?

7. In my experience, when dealing with bacterial sequences of non-model organisms, one can typically annotate less than 50% of bacterial genes with a GO tag, and many of those annotations are very general. I think this is reflected in the overlapping features highlighted in supplementary figure 1b. I would think that using a different annotation pipeline would lead to more relevant and interpretable results. Also, what proportion of the UniProt hits had a GO term?

8. I was confused by the imputation of missing HF and CA samples (lines 166-167). What is a missing sample? Why do you need to impute? What is being imputed? I also was confused by the code at “ENNB_GeneFamilies.R” and “genefamiles_real_data.nb.html”. What I see from the code `pmap_dbl(list(.$PHYLLO28, .$PHYLLO29, .$PHYLLO30), ~ (..1 + ..2 + ..3) / 3)` is that the authors are taking the average of three samples to create a fourth sample. In the next line of code the authors use the average of four samples (including the recently imputed one) to create a new average and a fifth imputed sample. This type of average-based imputation is known to be biased and to overestimate confidence; it also does not match the main text, which mentions package MI. Finally, it seems like 5 samples are being created for each group, but the table in the “Phyllosephere Metagenomics Experimental Design.docx” suggests that there are only 3 per group.

9. The authors describe some simulations to estimate the false positive rate of their pipeline. I think more details are needed. How many simulations? Did they match the number of samples and sequencing depth? What did they use as reference to simulate reads? They mention they selected the lowest false positive rate, but what was that rate? I think a figure (probably supplemental) summarizing the results of this simulation would be a good addition. Further, many assumptions go into simulating reads and/or count data, did the authors try permutation of the real data?

10. I found the two-step method for count comparison quite confusing. The authors use a binomial elastic net (after TMM) to select features, and then use a negative binomial model on the selected features. It seems like this will affect the FDR estimation, since by pre-selecting features one ends up with an excess of small p-values from the negative binomial. Further, I am not sure how well the binomial model captures the data; why not use a Poisson elastic net, which is closer to how the data behave? I also don’t understand why only a two group comparison can be performed. Any model can be defined in the design matrix of the elastic net, and so more complex designs can be used. Moreover, there are tools specifically designed to test for differences between groups that are nested by some other variable (e.g. mixed models), therefore I think it should be possible and more correct to use one of these approaches to model all the samples together.

11. I was surprised by no signal in the soil samples. Drought stress should influence soil organisms as well, do the authors have any explanation for it?

12. The authors show a handful of GO terms that are differential between watered and drought-stressed plants. Some of those terms are differential in multiple sites. However, the way the data is analyzed it is hard to determine if the differential abundance is in the same direction and if the number of overlapping terms is significant or can be explained by chance alone. Please differentiate between drought enriched and drought depleted, and provide statistical quantification of the amount of overlap. Also, are the GO terms that are differential in multiple sites driven by the same uniprot genes?

13. The authors don’t show any relationship between the plant phenotypes and the microbial composition. It is well known that drought stress causes a number of changes in the plant. And it is reasonable to hypothesize that bacteria might mediate or help cope with some of those changes. But the fact that both the microbial functional composition and some plant phenotypes change in drought stress does not really tell us if that hypothesis is true. The authors could directly test for associations between drought stress phenotypes and functional content, but in the current form one cannot draw any conclusions.

14. The authors list a number of references that suggest that functional content might be more informative than taxonomic composition. This is an interesting area for discussion but the authors present no taxonomic comparisons from their data. Therefore, we don’t know if the taxonomic differences in drought vs watered plants are weaker than the functional differences they present. I think the authors should include a taxonomic comparison (which they have readily available from their metagenome sequencing) if they want to claim that functional characterization is more important than taxonomic comparisons.

Minor comments:

1. I was able to figure out from the table in the “Phyllosephere Metagenomics Experimental Design.docx” file that there are 2-3 samples per site x treatment, but I think the precise number of samples should be in the main text as a small table or figure, maybe as part of a diagram of the experimental design.

2. The sample processing methods suggest that only epiphytes are being considered. There is nothing wrong with that, but the authors should be explicit about it in the different sections of the main text (i.e. abstract, introduction, results, discussion).

3. Line 230. Threshold on what parameter?

4. For figs 1-3, I found the circles and boxes hard to follow. I suggest the authors try colors since those are more visually obvious. In any case, I also think the authors should include a graphical legend indicating what the different symbols/sizes/colors mean.

Reviewer #2: The manuscript “Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities” is well written and analyzes the functional genes of the phyllosphere microbiome associated with maize leaves grown under two conditions, drought and well-watered. The analysis of genes in both growth conditions shows the importance of genes related to amino acid biosynthesis and transport, metal ion binding, and regulatory functions such as quorum sensing. However, additional experiments, such as differential expression of genes involved in amino acid biosynthesis or drought responses in maize, would support the hypotheses from the bioinformatic analysis of functional genes. For the agronomic data shown in Figure 4b-f, the scale and letter size of the Y axis in the graphs need to be corrected. Additional information showing the effects on growth of maize plants under the two conditions could complement the agronomic data.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 18;15(9):e0237493. doi: 10.1371/journal.pone.0237493.r002

Author response to Decision Letter 0


12 Apr 2020

We very much appreciate these thorough, thoughtful review comments. Our replies are in plain text below the italicized comments and we have uploaded our reply as a file in this resubmission.

Additional Editor Comments:

The ms “Functional Gene Categories Differentiate Maize Leaf Drought-Related Microbial Epiphytic Communities” PONE-D-19-16307 has been available on the preprint server bioRxiv since 2017 (104331) as “Functional genes that distinguish maize phyllosphere metagenomes in drought and well-watered conditions”.

The present version submitted in June 2019 was reviewed by two specialists who made a number of comments and suggestions.

The authors are asked to prepare a new version addressing the reviewers' comments and/or to reply point by point to those comments.

Since the preprint version is from 2017, the available information has increased. Therefore, the references must be updated. (Cf., for example, reviewer 1 on “culturable microorganisms”.)

We apologize for the confusion – the bioRxiv preprint was an older version with different authors, title and text. We have now updated the bioRxiv preprint to the version we submitted to PLOS ONE. In this version the references are up to date.

Also, concerning the reference microbiome: if it is not possible to use a plant-associated bacteria database, give a substantiated justification.

We address this by explaining the admittedly confusing name of the microbiome resource; the developers do note on the first page of their documentation that the name is a historical artifact and is no longer an accurate descriptor of the software pipeline.

Address (or explain to the reviewer) the questions on Bioinformatics approaches.

We address each point in the reviewer comments below.

Validation of key genes expression by qPCR would address the suggestion by reviewer 2.

We explain in the reply to reviewer 2 that this is not a good option as our experimental design is not suitable for testing for predictive gene expression change (which occurs on short time scales). Our focus is on gene family DNA content differences that reflect microbial community differences across drought factor levels.

Journal requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

We have complied with these format instructions. We have a LaTeX version of our manuscript and we would prefer to submit using that format if possible. We have also created a docx version with track changes as requested.

Reviewer #1: In the manuscript by Methe et al. the authors perform a metagenomic characterization of the maize leaf epiphytic microbial community in normally irrigated plants and plants under drought stress. The authors use a replicated design across three sites in the USA and find that some functions recurrently have differential abundance between watered and drought-stressed plants. The authors further show plant phenotypic changes in response to drought stress.

At this point, I have some major concerns regarding the methods, and clarity of presentation, as well as some minor issues that I point out below.

Major comments

1. I commend the authors for making publicly available all the raw sequencing data and for creating a synapse repository with all the code used for analysis. I was able to download all of it. However, I found that I could only see the contents of the files in synapse.org repositories after registering for an account. The account is free, but I do feel this requirement places an unnecessary barrier for potential readers. It seems like synapse.org doesn’t allow for download without an account, so I would encourage the authors to consider alternatives. Finally, I couldn’t find clear documentation on synapse.org regarding the type of stats that are collected regarding data usage, and which of those stats are available to the repository owners; since I was forced to provide an email to create my account and download the repositories, I opted for creating a new email account (unrelated to my affiliation and name) in order to ensure that the confidentiality of the peer review is maintained.

We agree that the synapse repository rules, which were put into place to ensure that users agreed to the data conduct standards, do not serve anonymous reviewers well. We appreciate all the effort you put into accessing our code and documentation! We used synapse.org as it is free, version-controlled, and allowed enough space for all our files, data and documentation. Other options did not meet all of those constraints for us at that time. We will make our synapse repositories public and freely accessible upon publication of our manuscript.

2. The introduction and discussion seem a bit off-topic from the core of the paper. For example, the authors spend a lot of sentences talking about the potential for synthetic communities, but all the work described is on field environments with their own natural microbial communities. No strains are isolated either, so it is not directly clear how this work translates to synthetic communities. On the other hand, there is no description of the phyllosphere microenvironment; what are the defining characteristics of this microbial habitat? Are there any maize-specific features that might be relevant? I think presenting more about the phyllosphere itself would help interpret the functional results much better than is possible now.

There may have been some confusion between the old preprint and the new PLOS submission. We agree that it is confusing to have the old preprint visible (with the prior title and different author list), so we have updated bioRxiv with the version we submitted to PLOS. That version is now available from bioRxiv. We only discuss synthetic communities as a future research area in the discussion section of the updated current PLOS manuscript.

We did not measure any specific leaf habitat features in this work, so we did not discuss that literature. We have examined leaf surfaces via scanning electron microscopy in some previous work (https://f1000research.com/articles/6-1698), but we did not feel it was necessary to cite this. We would be happy to add this self-citation or other cites to habitat features, but it is not a measured part of the experimental results that we present.

3. In general, I found the manuscript hard to read because of excessive jargon and complex sentences. I give one example of a very hard to parse sentence: “a per-field-site analysis using factorial multivariate approaches suitable for zero-inflated annotation read count data.” This is the only mention of zero inflation in the manuscript, so I actually don’t know what the authors are trying to point out. It could be greatly simplified by saying “We analyzed each site independently with method X”. I think a good deal of effort needs to go into streamlining the manuscript and ensuring that terms are defined and well utilized.

We have added additional explanation to the methods section.

4. Line 46. The number of culturable microorganisms has greatly increased in recent years. The authors provide dated references. For example, in the plant field check doi:10.1038/s41588-017-0012-9 to see that the number is closer to 50%, and this has become apparent in many other environments as well.

We reworded that sentence (line 45) to be up to date and to be less specific about culturability numbers, as they certainly can change over time. The doi (doi:10.1038/s41588-017-0012-9) refers to an analysis of plant roots and does not specifically describe culturability experiments as far as we can tell from the abstract; we do not have access to the full text.

5. The authors use the HUMAnN2 pipeline to map their metagenomic sequences to UniProt and their associated GO annotations. This pipeline was defined with the human microbiome in mind and so I wonder how representative the reference is for plant-associated bacteria. What proportion of the reads can be mapped with this pipeline? Did the authors try to map to a plant-associated bacteria database instead? Also, I couldn’t really figure out which file in the 10.7303/syn12933189 repository contains the details for running this pipeline.

This pipeline “is appropriate for any type of microbial community shotgun sequence profiling (i.e. not just the human microbiome; the name is a historical product of its origin)” as stated by the pipeline developer at https://bitbucket.org/biobakery/biobakery/wiki/humann2.

6. I didn’t understand the rationale for sequencing two samples with HiSeq and the rest with MiSeq. What proportion of the phyllosphere are the authors capturing? Do saturation plots show that the number of new features reaches a plateau with the amount of sequencing they do?

Sequence data from a sample are inherently a population sample, as we cannot plan to sequence every DNA molecule in a sample using this technology. This ‘sampling of molecules’ aspect of sequencing was an important criterion in our data analysis design; we use zero-inflated distributions to appropriately model these data. Rather than rely on saturation plots and ‘equal samples’ we modeled the output reads using a more appropriate distribution and an analysis method that allows for any differences in feature detection. We did use two different sequencing technologies to check for any biases in our quality control processing from the raw reads; we did not observe any consistent differences in QC parameters between the MiSeq and HiSeq outputs.
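For readers unfamiliar with the term, a zero-inflated count distribution mixes "structural" zeros with an ordinary count model. The following is a minimal Python sketch of a zero-inflated negative binomial probability mass function; it is not taken from the authors' R code, and the function names and parameter values (`size`, `mu`, `pi_zero`) are hypothetical choices made only for illustration.

```python
import math

def nb_pmf(k, size, mu):
    """Negative binomial pmf with mean `mu` and dispersion `size`."""
    p = size / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def zinb_pmf(k, size, mu, pi_zero):
    """Zero-inflated NB: with probability `pi_zero` the count is a structural zero."""
    base = (1.0 - pi_zero) * nb_pmf(k, size, mu)
    return pi_zero + base if k == 0 else base

# Hypothetical parameters, chosen only to show the effect of zero inflation.
size, mu, pi_zero = 2.0, 5.0, 0.3
p0_nb = nb_pmf(0, size, mu)
p0_zinb = zinb_pmf(0, size, mu, pi_zero)
total = sum(zinb_pmf(k, size, mu, pi_zero) for k in range(2000))
print(f"P(0) under NB: {p0_nb:.4f}; P(0) under ZINB: {p0_zinb:.4f}")
```

With these assumed parameters the zero-inflated model assigns noticeably more probability to a zero count than the plain negative binomial does, which is the feature of the annotation read counts that the response refers to.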

7. In my experience, when dealing with bacterial sequences of non-model organisms, one can typically annotate less than 50% of bacterial genes with a GO tag, and many of those annotations are very general. I think this is reflected in the overlapping features highlighted in supplementary figure 1b. I would think that using a different annotation pipeline would lead to more relevant and interpretable results. Also, what proportion of the UniProt hits had a GO term?

We chose GO as this annotation type is widely used and interpretable. One reason we made all our raw data and intermediate files available is to allow others to apply updated functional annotations if desired. It would be useful in the future to do a simulation study of the inherent error and distributions in functional annotations, though this kind of simulation would be a more theoretical and statistical aspect of data analysis and thus is not part of our experimental study.

8. I was confused by the imputation of missing HF and CA samples (lines 166-167). What is a missing sample? Why do you need to impute? What is being imputed? I also was confused by the code at “ENNB_GeneFamilies.R” and “genefamiles_real_data.nb.html”. What I see from the code `pmap_dbl(list(.$PHYLLO28, .$PHYLLO29, .$PHYLLO30), ~ (..1 + ..2 + ..3) / 3)` is that the authors are taking the average of three samples to create a fourth sample. In the next line of code the authors use the average of four samples (including the recently imputed one) to create a new average and a fifth imputed sample. This type of average-based imputation is known to be biased and to overestimate confidence; it also does not match the main text, which mentions package MI. Finally, it seems like 5 samples are being created for each group, but the table in the “Phyllosephere Metagenomics Experimental Design.docx” suggests that there are only 3 per group.

We clarified our methods data analysis section and our code repository to more clearly explain how we used the imputations to get the lambda value that was used in the cross validation for the elastic net portion of the ENNB analysis of the sample data. The simulations were used to select appropriate parameters and P-value thresholds for ENNB to ensure optimal error control in our analysis of the sample data. We added more explanation of these steps to our code repository, and we re-confirmed that the R code runs and produces the expected outputs. We agree that this was poorly explained in the original text, and we appreciate your careful review of our code and documentation.
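The reviewer's concern about average-based imputation in comment 8 can be illustrated with a small Python sketch. This is not the authors' R code; the helper name `impute_by_running_mean` and the three observed values are hypothetical, and the sketch only mirrors the pattern the reviewer describes (a fourth value from the mean of three, then a fifth from the mean of four).

```python
import statistics

def impute_by_running_mean(values, n_new):
    """Append n_new values, each the mean of all values present so far."""
    out = list(values)
    for _ in range(n_new):
        out.append(statistics.fmean(out))
    return out

observed = [12.0, 30.0, 45.0]  # hypothetical counts for three real samples
augmented = impute_by_running_mean(observed, 2)

var_observed = statistics.variance(observed)
var_augmented = statistics.variance(augmented)
print(f"mean: {statistics.fmean(observed)} -> {statistics.fmean(augmented)}")
print(f"sample variance: {var_observed} -> {var_augmented}")
```

In this toy case the group mean is unchanged while the sample variance shrinks, because the added values carry no new information but increase the apparent sample size. Whether this affects the authors' results depends on how the imputed values were used; according to the response, they served only to obtain the lambda value for cross validation.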

9. The authors describe some simulations to estimate the false positive rate of their pipeline. I think more details are needed. How many simulations? Did they match the number of samples and sequencing depth? What did they use as reference to simulate reads? They mention they selected the lowest false positive rate, but what was that rate? I think a figure (probably supplemental) summarizing the results of this simulation would be a good addition. Further, many assumptions go into simulating reads and/or count data, did the authors try permutation of the real data?

Permutation of real data is not appropriate for data analysis from our small-n experimental design. We need to choose analysis parameters that have an optimal true and false positive level, and thus we must create known-truth simulations. Permutations would address a different question and are quite problematic for small numbers of highly multivariate data with zero-inflated distributions. The R notebook in the supplemental data has a succinct summary of the simulation analyses that were used, with a discussion of how method performance degrades when genes are not assigned to the correct gene family by sequence comparison.

10. I found the two-step method for count comparison quite confusing. The authors use a binomial elastic net (after TMM) to select features, and then use a negative binomial model on the selected features. It seems like this will affect the FDR estimation since, by pre-selecting features, one ends up with an excess of small p-values from the negative binomial. Further, I am not sure how well the binomial model captures the data; why not use a Poisson elastic net, which is closer to how the data behave? I also don’t understand why only a two-group comparison can be performed. Any model can be defined in the design matrix of the elastic net, and so more complex designs can be used. Moreover, there are tools specifically designed to test for differences between groups that are nested by some other variable (e.g. mixed models); therefore I think it should be possible and more correct to use one of these approaches to model all the samples together.

The true and false positive rates in the ENNB two-stage approach are described at length in their original paper (Pookhao et al, Bioinformatics 2015); the method parameters were carefully calibrated to avoid excess false positives. We chose this ENNB method specifically because it models the count data appropriately, with a well-accepted distribution for counts, the zero-inflated negative binomial. We know that our data type is zero-inflated count data, and thus it is the appropriate distribution to use. The ENNB package only supports two-group (single factor) comparison. We do not know of any packages that allow hierarchical factor analysis (as with our experimental design that has the drought factor nested in location) with zero-inflated count data. Additional details on the complexity of this modeling problem are described in papers such as those from D. Witten and colleagues (https://amstat.tandfonline.com/doi/abs/10.1080/10618600.2015.1067217#.XijgYhf_qkY). We also explain the experimental design limits across geographies in our reply to comment 12 below.
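The general concern raised in comment 10, that selecting features and then testing them on the same data shifts p-values toward small values, can be sketched in Python. This toy example does not implement ENNB, the elastic net, or a negative binomial model; it uses normally distributed null data and a simple z-test, and every number in it is an assumption made for illustration. It says nothing about how well the calibrated ENNB thresholds control error.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

random.seed(0)
n_features, n_per_group, n_keep = 200, 10, 20

# Null data: no feature truly differs between the two groups.
z_scores = []
for _ in range(n_features):
    a = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    b = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    diff = sum(a) / n_per_group - sum(b) / n_per_group
    z_scores.append(diff / math.sqrt(2.0 / n_per_group))

p_values = [two_sided_p(z) for z in z_scores]

# Step 1: keep the features with the largest group difference (same data).
order = sorted(range(n_features), key=lambda i: abs(z_scores[i]), reverse=True)
selected = order[:n_keep]

# Step 2: compare p-values among the pre-selected features with all features.
mean_p_all = sum(p_values) / n_features
mean_p_selected = sum(p_values[i] for i in selected) / n_keep
print(f"mean p, all features: {mean_p_all:.3f}")
print(f"mean p, pre-selected features: {mean_p_selected:.3f}")
```

Because the pre-selected features are by construction those with the smallest p-values, any threshold applied in the second step needs to account for the first step, which is what the simulation-based calibration described in the response is intended to do.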

11. I was surprised by no signal in the soil samples. Drought stress should influence soil organisms as well, do the authors have any explanation for it?

The extraction method was designed to be specific for leaf epiphytes, not for soils. Thus, we expected little DNA from these samples -- and that is what we observed. The soil samples control for the possible confounding effect of having soil particles present on leaf surfaces and inadvertently extracting soil-associated microbial DNA.

12. The authors show a handful of GO terms that are differential between watered and drought-stressed plants. Some of those terms are differential in multiple sites. However, the way the data is analyzed it is hard to determine if the differential abundance is in the same direction and if the number of overlapping terms is significant or can be explained by chance alone. Please differentiate between drought enriched and drought depleted, and provide statistical quantification of the amount of overlap. Also, are the GO terms that are differential in multiple sites driven by the same uniprot genes?

We purposefully did not do a statistical analysis across the geographic locations, as there are many factors that differ across geographies and three sites are thus a priori confounded across these multiple differences. The goal of using multiple sites is to illustrate that functional gene changes are likely to be site-specific, and to encourage follow-up large-scale experiments or future experimental designs such as transplantation of soils and plants across field sites.
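For reference, the kind of overlap quantification requested in comment 12 is often framed as a hypergeometric tail probability. The Python sketch below shows that calculation under stated assumptions; it was not performed in the study, the function name `overlap_p_value` is hypothetical, and the counts are invented. It also assumes that terms are exchangeable and sites are independent, which the response above argues does not hold across geographies.

```python
from math import comb

def overlap_p_value(n_total, n_site_a, n_site_b, n_shared):
    """P(overlap >= n_shared) if site B's terms were a random draw from n_total."""
    upper = min(n_site_a, n_site_b)
    denom = comb(n_total, n_site_b)
    numer = sum(comb(n_site_a, k) * comb(n_total - n_site_a, n_site_b - k)
                for k in range(n_shared, upper + 1))
    return numer / denom

# Invented example: 1000 testable terms, 30 and 40 differential terms at two
# sites, 5 of them shared.
p_overlap = overlap_p_value(1000, 30, 40, 5)
print(f"P(overlap >= 5 by chance) = {p_overlap:.4f}")
```

Such a test addresses only the number of shared terms; the direction of change (drought enriched versus drought depleted) would have to be compared separately.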

13. The authors don’t show any relationship between the plant phenotypes and the microbial composition. It is well known that drought stress causes a number of changes in the plant. And it is reasonable to hypothesize that bacteria might mediate or help cope with some of those changes. But the fact that both the microbial functional composition and some plant phenotypes change in drought stress does not really tell us if that hypothesis is true. The authors could directly test for associations between drought stress phenotypes and functional content, but in the current form one cannot draw any conclusions.

This would require a ‘multi-view’ analysis; this type of analysis is of strong recent research interest (for example, https://academic.oup.com/biostatistics/advance-article-abstract/doi/10.1093/biostatistics/kxz001/5310125). However, there is little clarity on optimal methods for such an analysis at this time. We prefer the normal statistical approach of considering all measurements on a sample to be multivariate. Multivariate analysis of a factorial experimental design does not allow inference on the order of events; though the plant measurements are taken later in the season the analysis is more appropriately considered as one set of multiple measurements of the same sampling unit.

14. The authors list a number of references that suggest that functional content might be more informative than taxonomic composition. This is an interesting area for discussion but the authors present no taxonomic comparisons from their data. Therefore, we don’t know if the taxonomic differences in drought vs watered plants are weaker than the functional differences they present. I think the authors should include a taxonomic comparison (which they have readily available from their metagenome sequencing) if they want to claim that functional characterization is more important than taxonomic comparisons.

We provide our experimental data (including the taxonomic assignments) for anyone interested in using real data to examine this question. In our view, the relative value of taxonomic information for a range of different microbial community questions and for different types of experimental designs is best studied using simulations and large data sets, rather than using the focused experimental designs that we used to address functional differences with and without drought. As you note, we cite these references to support our focus on functional analysis, not as part of a different kind of study on the relative usefulness of taxonomic versus functional annotations.

Minor comments:

1. I was able to figure out from the table in the “Phyllosephere Metagenomics Experimental Design.docx” file that there are 2-3 samples per site x treatment, but I think the precise number of samples should be in the main text as a small table or figure, maybe as part of a diagram of the experimental design.

We appreciate this suggestion but prefer to keep the larger table with full experimental details in the supplemental files to keep the number of figures and tables in the main text more compact.

2. The sample processing methods suggest that only epiphytes are being considered. There is nothing wrong with that, but the authors should be explicit about it in the different sections of the main text (i.e. abstract, introduction, results, discussion).

This is described in the title of the article as well as in these sections. Perhaps there was confusion between the submitted version of the article and the old bioRxiv preprint?

3. Line 230. Threshold on what parameter?

Corrected.

4. For figs 1-3, I found the circles and boxes hard to follow. I suggest the authors try colors since those are more visually obvious. In any case, I also think the authors should include a graphical legend indicating what the different symbols/sizes/colors mean.

We chose the gray-scales, shapes and patterns to be color-blind friendly. We agree that color-coding would be useful for many readers, but we also wish to ensure access for all. If PLOS ONE allows a graphical legend in our case we would be happy to include that within the figures.

Reviewer #2: The manuscript “Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities” is well written and analyzes the functional genes of the phyllosphere microbiome associated with maize leaves grown under two conditions, drought and well-watered. The analysis of genes in both growth conditions shows the importance of genes related to amino acid biosynthesis and transport, metal ion binding, and regulatory functions such as quorum sensing. However, additional experiments, such as differential expression of genes involved in amino acid biosynthesis or drought responses in maize, would support the hypotheses from the bioinformatic analysis of functional genes.

Our experiment was not designed to detect RNA expression differences, which occur on a short time scale and would require much more intensive sampling and a different experimental design. We agree that follow-up experiments should consider appropriate experimental designs for measurement of gene expression if there is future interest in short-term adaptive processes in epiphytic microbiomes.

For the agronomic data shown in Figure 4b-f, the scale and letter size of the Y axis in the graphs need to be corrected.

We have a high-resolution figure with appropriate text size available; unfortunately it could not be submitted in the journal’s MSS submission system. We will work with the editors to ensure that the final figure is of appropriate quality.

Additional information showing the effects on growth of maize plants under the two conditions could complement the agronomic data.

To confirm that the drought conditions we applied in our experiments were relevant, we measured plant traits normally used in agronomic experiments, such as plant height and seed weight. This measurement met our goal of checking that the drought conditions were agronomically relevant.

Attachment

Submitted filename: 2020 reply to reviewers PLOS ONE.docx

Decision Letter 1

Sara Amancio

6 May 2020

PONE-D-19-16307R1

Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities

PLOS ONE

Dear Dr. Stapleton,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The ms “Functional Gene Categories Differentiate Maize Leaf Drought-Related Microbial Epiphytic Communities” PONE-D-19-16307R1 was resubmitted in April 2020, almost four months after the decision on the first version was sent to the authors.

R1 version was reviewed by the same experts as the first version. Both asked for further revision and addressed comments and questions to the present version.

Besides other points raised by the reviewers, soil sampling and statistical analysis must be thoroughly addressed.

The authors are asked to prepare a new version addressing the reviewers' comments and/or to reply point by point to those comments.

We would appreciate receiving your revised manuscript by 15th June. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Sara Amancio

Academic Editor

PLOS ONE

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: No

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The revised manuscript by Methe et al. makes some clarifications regarding various points, but I think it still lacks enough detail in a number of aspects. I also have some methodological comments.

1. I am still surprised by the lack of soil sequences. The authors write in their response that methods were specifically designed for leaf epiphytes, but there is no clear methodological description of how soil samples were generated. DNA extraction with PowerSoil would certainly produce a large amount of microbial DNA from soil samples if standard soil sampling methods were used. Please include a description of the precise soil sampling method.

2. The authors of the HUMAnN2 pipeline do state on their website that their method can be used for any community. However, the HUMAnN2 publication (doi: 10.1038/s41592-018-0176-y) makes no such claim and only provides analysis for human and marine communities. The methods may be general, but since it is a reference-based approach, it will always be sensitive to the reference database. Therefore, I urge great caution when interpreting results from this approach on plant-associated communities.

3. Line 190. I had asked about the false discovery rate (FDR), and the authors state that they chose the lowest possible false positive rate (FPR), but they never state what that rate was. I think the final FDR/FPR ratio needs to be reported in the manuscript; otherwise one doesn’t know how to interpret the results. Also, FPR and FDR are not identical, and the authors should control via FDR, not FPR.

4. Line 220 states that there is “little correlation between depth of microbial sequence and annotation quality (Table 1)”, but the table doesn’t actually show that. It just tells me that some samples were sequenced in different amounts, but there are no quality metrics.

5. Lines 295-297 state that there is a correlation between drought severity and similarity in microbial functional changes, but with only three sites and without a statistical analysis to back it, I don’t see how that apparent correlation can be interpreted.

6. From the title, abstract, and conclusion it seems as if functional categories consistently differentiate between drought and well-watered conditions; however, in their response the authors state that “The goal of using multiple sites is to illustrate that functional gene changes are likely to be site-specific”. I think the authors need to be clearer about what conclusion they are trying to present. Moreover, claiming site-specific differences requires comparisons between sites, which the authors refrain from because of a lack of appropriate statistical tools. I think this is a valid argument, but then the authors need to be explicit about the limitations of their conclusions. One can make claims about specific sites, but it is impossible to claim that the differences between sites are particularly big or small because we don’t know what to expect.

Reviewer #2: Why were the agronomic data in Figure 4 drawn from different sites? For example: a) plant height at the California-Albany (CA) site, b) seed weight at the Texas-Dumas-Etter site, c) seed weight at the Texas-Halfway site, d) stem diameter at the California-Albany site ... The same agronomic data record was made for the different sites, because the comparison between the sites in the different agronomic variables analyzed is not illustrated.

Ideally, the effects of affected plants on the effects of plants against drought can be shown, images of plants grown under both conditions, perhaps modify Figure 4 or attach it as supplementary material.

Why were different degrees of drought used at different sites, rather than the same degree at all sites?

How was plant height measured, at what height on the plant was the diameter taken, and how many seeds were weighed, and in what condition? This is not described in the Materials and Methods, nor is it stated whether the strategy was taken from previous work.

The graphics and the axis lettering are small; they need to be edited, and the figures must be uploaded according to the PLOS ONE guidelines.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Sur Herrera Paredes

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Sep 18;15(9):e0237493. doi: 10.1371/journal.pone.0237493.r004

Author response to Decision Letter 1


20 Jun 2020

May 20, 2020

Reply to reviewer 1 comments. Reviewer requests are in italics, and our replies to reviewer 1 are in plain text.

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

We provide replies to reviewer 1’s comments below.

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The revised manuscript by Methe et al. makes some clarifications regarding various points, but I think it still lacks enough detail in a number of aspects. I also have some methodological comments.

1. I am still surprised by the lack of soil sequences. The authors write in their response that methods were specifically designed for leaf epiphytes, but there is no clear methodological description of how soil samples were generated. DNA extraction with PowerSoil would certainly produce a large amount of microbial DNA from soil samples if standard soil sampling methods were used. Please include a description of the precise soil sampling method.

Since the goal was to check the extent of soil microbial DNA on leaves due to soil particles on the leaf surface, the soil samples were extracted using the same method that was used for the leaf epiphyte samples (placing soil samples taken adjacent to plants into sterile water with Silwet, filtering, and extracting). We added an additional sentence in the methods to explain this more clearly.

2. The authors of the HUMAnN2 pipeline do state on their website that their method can be used for any community. However, the HUMAnN2 publication (doi: 10.1038/s41592-018-0176-y) makes no such claim and only provides analysis for human and marine communities. The methods may be general, but since it is a reference-based approach, it will always be sensitive to the reference database. Therefore, I urge great caution when interpreting results from this approach on plant-associated communities.

We added additional explanation of the fact that this is a reference-based approach to the methods section.

3. Line 190. I had asked about the false discovery rate (FDR), and the authors state that they chose the lowest possible false positive rate (FPR), but they never state what that rate was. I think the final FDR/FPR ratio needs to be reported in the manuscript; otherwise one doesn’t know how to interpret the results. Also, FPR and FDR are not identical, and the authors should control via FDR, not FPR.

We added additional explanation to the main manuscript, with information about where in the supplemental methods additional information such as the confusion matrices may be found. We created groups (A, B, C) with sequences from the same gene family and from different gene families; A and B were low-similarity, while A and C had typical, higher similarity levels. When we compared different combinations of groups (AB, BC, AC), the FDRs after running through the ENNB pipeline with TMM normalization were 0.846, 0.195, and 0.088, respectively, using a threshold of 0.001 for p-values. The goal of the simulations was to determine whether ENNB was a viable method for detecting gene counts between groups. The low FDR calculated using simulations on groups A and C allowed us to determine that ENNB would be an acceptable tool to run against the real data, provided the sequence match value to declare function similarity was set to the higher level. The notebook is genefamilies_simulations.Rmd.
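
To make the FDR and FPR distinction in this reply concrete, the following is a minimal sketch and not the authors' notebook (genefamilies_simulations.Rmd is not reproduced here). It assumes a simulation in which the true status of each gene family is known, builds a confusion matrix at the 0.001 p-value threshold, and computes both rates. The function and class names, the strict `<` comparison, and the toy p-values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Counts from comparing calls against known simulation truth."""
    tp: int  # truly different, called different
    fp: int  # not different, called different
    tn: int  # not different, not called
    fn: int  # truly different, not called

    @property
    def fdr(self) -> float:
        # False discovery rate: share of all calls that are wrong.
        called = self.tp + self.fp
        return self.fp / called if called else 0.0

    @property
    def fpr(self) -> float:
        # False positive rate: share of true nulls that get called.
        nulls = self.fp + self.tn
        return self.fp / nulls if nulls else 0.0


def confusion_at_threshold(
    p_values: Sequence[float],
    truly_different: Sequence[bool],
    alpha: float = 0.001,  # threshold quoted in the reply; strict '<' is an assumption
) -> Confusion:
    tp = fp = tn = fn = 0
    for p, truth in zip(p_values, truly_different):
        called = p < alpha
        if called and truth:
            tp += 1
        elif called and not truth:
            fp += 1
        elif not called and truth:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, tn, fn)


# Hypothetical toy data: ten simulated gene families, not values from the study.
p_values = [0.0001, 0.0004, 0.0009, 0.0002, 0.2, 0.5, 0.03, 0.0005, 0.7, 0.9]
truly_different = [True, True, True, False, False, False, True, False, False, False]

result = confusion_at_threshold(p_values, truly_different)
print(result, f"FDR={result.fdr:.3f}", f"FPR={result.fpr:.3f}")
```

In this toy data five gene families are called and two of those calls are false, so the FDR is 2/5, while two of the six truly unchanged families are called, so the FPR is 2/6. The two rates use different denominators and answer different questions, which is the distinction the reviewer raised.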

4. Line 220 states that there is “little correlation between depth of microbial sequence and annotation quality (Table 1)”, but the table doesn’t actually show that. It just tells me that some samples were sequenced in different amounts, but there are no quality metrics.

We corrected this sentence, thank you for pointing this out.

5. Lines 295-297 state that there is a correlation between drought severity and similarity in microbial functional changes, but with only three sites and without a statistical analysis to back it, I don’t see how that apparent correlation can be interpreted.

We agree that comparisons across sites are qualitative, and ‘correlation’ in this sentence is a descriptive term, not a specific measurement.

6. From the title, abstract, and conclusion it seems as if functional categories consistently differentiate between drought and well-watered conditions; however, in their response the authors state that “The goal of using multiple sites is to illustrate that functional gene changes are likely to be site-specific”. I think the authors need to be clearer about what conclusion they are trying to present. Moreover, claiming site-specific differences requires comparisons between sites, which the authors refrain from because of a lack of appropriate statistical tools. I think this is a valid argument, but then the authors need to be explicit about the limitations of their conclusions. One can make claims about specific sites, but it is impossible to claim that the differences between sites are particularly big or small because we don’t know what to expect.

We edited the conclusion to clarify this point.

Reviewer #2: Why were the agronomic data in Figure 4 drawn from different sites? For example: a) plant height at the California-Albany (CA) site, b) seed weight at the Texas-Dumas-Etter site, c) seed weight at the Texas-Halfway site, d) stem diameter at the California-Albany site ... The same agronomic data record was made for the different sites, because the comparison between the sites in the different agronomic variables analyzed is not illustrated.

Ideally, the effects of affected plants on the effects of plants against drought can be shown, images of plants grown under both conditions, perhaps modify Figure 4 or attach it as supplementary material.

Why were different degrees of drought used at different sites, rather than the same degree at all sites?

How was plant height measured, at what height on the plant was the diameter taken, and how many seeds were weighed, and in what condition? This is not described in the Materials and Methods, nor is it stated whether the strategy was taken from previous work.

The graphics and the axis lettering are small; they need to be edited, and the figures must be uploaded according to the PLOS ONE guidelines.

We have uploaded our correctly formatted figure as an eps file.

Attachment

Submitted filename: May 20 reply to reviewer 1 PLOS ONE.docx

Decision Letter 2

Sara Amancio

29 Jul 2020

Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities

PONE-D-19-16307R2

Dear Dr. Stapleton

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Sara Amancio

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: It is not clear why agronomic data from different sampling sites, such as height, cob diameter, and seed biomass, were used to discuss your results in Figure 4. Do you have any pictures that illustrate the effects on plant phenotype?

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Sur Herrera Paredes

Reviewer #2: No

Acceptance letter

Sara Amancio

17 Aug 2020

PONE-D-19-16307R2

Functional gene categories differentiate maize leaf drought-related microbial epiphytic communities

Dear Dr. Stapleton:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Prof Sara Amancio

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Fig

    (TXT)

    Attachment

    Submitted filename: 2020 reply to reviewers PLOS ONE.docx

    Attachment

    Submitted filename: May 20 reply to reviewer 1 PLOS ONE.docx

    Data Availability Statement

    All relevant data are uploaded to the Dryad repository and publicly accessible via the following DOI: 10.5061/dryad.7m0cfxprs. https://datadryad.org/stash/dataset/doi:10.5061/dryad.7m0cfxprs

    Metagenomic sequences are available in the SRA repository under BioProject identifier PRJNA297239. All data analysis scripts, simulations, intermediate files and metadata files are available from Data Dryad, doi: 10.5061/dryad.7m0cfxprs. A preliminary version of this work is available on bioRxiv (bioRxiv 104331), doi: https://doi.org/10.1101/104331.


    Articles from PLoS ONE are provided here courtesy of PLOS
