Abstract
Culture-independent studies to characterize skin microbiota are increasingly common, due in part to affordable and accessible sequencing and analysis platforms. Compared to culture-based techniques, DNA sequencing of the bacterial 16S ribosomal RNA (rRNA) gene or whole metagenome shotgun (WMS) sequencing provides more precise characterization of microbial communities. Most widely used protocols were developed to characterize microbiota of other habitats (e.g. the gastrointestinal tract) and have not been systematically compared for their utility in skin microbiome surveys. Here we establish a resource for the cutaneous research community to guide experimental design in characterizing skin microbiota. We compare two widely sequenced regions of the 16S rRNA gene to WMS sequencing for recapitulating skin microbiome community composition, diversity, and genetic functional enrichment. We show that WMS sequencing most accurately recapitulates microbial communities, but sequencing of hypervariable regions 1-3 of the 16S rRNA gene provides highly similar results. Sequencing of hypervariable region 4 poorly captures skin commensal microbiota, especially Propionibacterium. WMS sequencing, which is resource- and cost-intensive, provides evidence of a community’s functional potential; however, metagenome predictions based on 16S rRNA sequence tags closely approximate WMS genetic functional profiles. This work highlights the importance of experimental design for downstream results in skin microbiome surveys.
Introduction
Research devoted to the skin microbiome has surged in the past decade, due in large part to accessible, affordable high-throughput DNA sequencing technology and the realization that the microbiome may modulate the pathogenesis of many cutaneous disorders. Most protocols for characterizing microbial communities were initially developed and optimized to survey the gastrointestinal tract or the environment, niches that harbor sets of microbiota distinct from those of the skin. A standardized methodology for skin microbiome studies is lacking, even though protocol choices are often pivotal to study outcomes (Guo et al. 2013; Nelson et al. 2014; Albertsen et al. 2015).
A common approach to characterizing cutaneous microbial communities relies upon amplification, sequencing, and analysis of the prokaryotic 16S rRNA gene. This approach has been employed in multiple studies of skin bacterial communities and their association with health and disease (Hannigan and Grice 2013). Initial studies used full-length 16S rRNA gene sequences (~1,500 bp) generated by Sanger sequencing. Next-generation sequencing platforms allow vastly increased sequencing depth at a fraction of the cost, but they generate shorter reads, making it impractical to sequence the full-length gene. Therefore, one or more hypervariable regions, or 16S tags, are selected for sequencing as a proxy for the full-length gene. No single hypervariable region can distinguish among all bacteria, and primer biases may differentially affect the amplification efficiency of different bacterial taxa. However, specific regions may be optimal for capturing the diversity and composition of particular ecosystems.
More recently, whole metagenomic shotgun (WMS) sequencing has been employed for both taxonomic and functional annotation of skin microbial communities (Human Microbiome Project 2012b; Oh et al. 2014; Hannigan et al. 2015). This approach reduces amplification bias, captures multi-kingdom communities, and allows for strain-level analysis. WMS datasets, although information-rich, are more expensive to generate and require greater computational expertise and resources to store, process, and analyze. Gene content extracted from WMS data can provide insight into the functional processes of a microbial community, but bioinformatic tools now exist to predict functional content from 16S tag sequences (Langille et al. 2013), and taxonomic profiles may in some cases be superior to functional profiles for classifying microbial communities (Xu et al. 2014).
Here we compare experimental strategies to identify optimal parameters for capturing the composition, diversity, and genetic content of the cutaneous microbiome. We applied 16S rRNA tag sequencing to cutaneous swabs and to a publicly available mock community control of 20 bacterial species at known concentrations. We sequenced two 16S tag regions commonly used in microbiome studies, hypervariable regions 1-3 (V1-V3) and region 4 (V4) (Caporaso et al. 2011), to compare their utility in accurately characterizing skin microbiota diversity and composition. Additionally, we performed WMS sequencing on the same swab samples and controls to determine whether WMS sequencing offers additional utility over 16S tag sequencing for characterizing skin microbiota and identifying genetic functional enrichment.
Results
Sampling, sequencing, and quality control
Sixty-two skin swabs were collected from nine healthy volunteers (see Supplemental Table 1 for cohort characteristics). The sampled skin sites represented diverse microenvironments with respect to moisture (sweat) and sebum: sebaceous (retroauricular crease [Ra], occiput [Oc], and forehead [Fh]), moist (toe web [Tw] and umbilicus [Um]), and intermittently moist (antecubital fossa [Ac] and palm [Pa]) (Fig 1A).
Whole genomic DNA was extracted from the swab samples and subjected to microbiome profiling using three different approaches: (1) V1-V3 tag sequencing; (2) V4 tag sequencing; and (3) WMS sequencing (Fig 1B,C). V1-V3 tag sequencing was employed by the Human Microbiome Project (Human Microbiome Project 2012a; 2012b; Aagaard et al. 2013). The V4 region was used in the Earth Microbiome Project (Gilbert et al. 2014) and is widely used to characterize microbiota of other body habitats. We did not include regions further 3’ in the 16S rRNA gene, as these have generally been documented to perform less well for a variety of analyses (Wu et al. 2010; Conlan et al. 2012; Jumpstart Consortium Human Microbiome Project Data Generation Working Group 2012) and/or are not widely used to characterize microbial communities. All sequencing was performed on the Illumina MiSeq or HiSeq 2500 platform. A publicly available mock community control (MCC) was sequenced in parallel with the skin samples.
The V1-V3 dataset contained 2,124,836 total high-quality sequence reads, with a median of 24,891 sequence reads per sample. The V4 dataset contained 5,328,215 total high-quality sequence reads, with a median of 77,928 sequence reads per sample. The WMS dataset contained 81,553,035 total high-quality sequence reads, with a median of 1,233,172 sequence reads per sample (see Supplemental Table 2 for per-sample sequence counts).
Skin bacterial community composition varies by sequencing technique
We compared each sequencing method to determine how well it recapitulated the taxonomic relative abundance of the MCC, which contained 100,000 rRNA operon copies per organism per µL; therefore, each of the 20 species in the MCC should account for 5% of the community by 16S tag sequencing. For WMS sequencing, expected MCC abundances must also take into account the concentration of the input DNA. We first mapped our sequences to the expected MCC species to identify community composition (Fig 2A). Hierarchical clustering of taxonomic profiles indicated that WMS provided a close approximation of the MCC (Fig 2A). Among the 16S approaches, V1-V3 tag sequencing provided the best proxy, while V4 tag sequencing severely underrepresented Staphylococcus epidermidis and Propionibacterium acnes and overrepresented Staphylococcus aureus. Similar trends were observed when OTU-based methods were used to characterize the MCC; however, V1-V3 was unable to classify all taxa to the genus level (Fig S1).
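The expected mock-community proportions follow from simple arithmetic. The Python sketch below assumes that 16S tag reads scale with rRNA operon copies, and that shotgun reads scale with DNA mass, approximated as genome copies (operon copies divided by operons per genome) times genome length. The genome lengths and operon counts passed in are hypothetical user-supplied inputs, not values from the MCC documentation, and the model ignores GC and library-preparation bias.

```python
from typing import Dict, Tuple

OPERON_COPIES_PER_UL = 100_000  # per organism in the MCC, as stated above


def expected_16s_fractions(operon_copies: Dict[str, float]) -> Dict[str, float]:
    """Expected 16S tag relative abundance, proportional to rRNA operon copies."""
    total = sum(operon_copies.values())
    return {taxon: copies / total for taxon, copies in operon_copies.items()}


def expected_wms_fractions(
    operon_copies: Dict[str, float],
    genome_info: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Expected shotgun read fraction, assuming reads scale with DNA mass.

    genome_info maps taxon -> (genome length in bp, 16S operons per genome).
    These are assumed inputs, not values shipped with the MCC.
    """
    dna_mass = {}
    for taxon, copies in operon_copies.items():
        genome_length_bp, operons_per_genome = genome_info[taxon]
        genome_copies = copies / operons_per_genome
        dna_mass[taxon] = genome_copies * genome_length_bp  # relative DNA mass
    total = sum(dna_mass.values())
    return {taxon: mass / total for taxon, mass in dna_mass.items()}


# Twenty placeholder species names at equal operon copy number.
mcc_copies = {f"species_{i:02d}": OPERON_COPIES_PER_UL for i in range(1, 21)}
fractions_16s = expected_16s_fractions(mcc_copies)
```

With 20 organisms at equal operon copy number, every species comes out at 0.05 (the 5% stated above), whereas the WMS expectation shifts toward species with larger genomes and fewer 16S operons per genome.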
Propionibacterium (including P. acnes), Staphylococcus (including S. epidermidis and S. hominis), and Corynebacterium were the dominant bacterial genera on healthy human skin (Fig 2B; Fig S2). Most notably, Propionibacterium was vastly underrepresented in the V4 dataset. We used simple linear regression to compare the relative abundances of three prominent skin bacterial genera in the V1-V3 and V4 datasets with their relative abundances in the WMS dataset, which fairly accurately recapitulated the composition of the MCC (Fig S3). Relative abundances in the V1-V3 dataset correlated much more strongly with WMS relative abundances than did those in the V4 dataset for Propionibacterium (R² = 0.931 vs. 0.499), Staphylococcus (R² = 0.736 vs. 0.153), and Corynebacterium (R² = 0.789 vs. 0.281). These data indicate that V4 representations of skin microbiome composition are severely biased against bacteria that are highly prevalent and abundant on human skin.
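The R² values above come from simple linear regressions of per-sample relative abundances. As a sketch (in Python, whereas the study's statistics were run in R), R² for an ordinary least-squares fit of one method's abundances on another's equals the squared Pearson correlation; the abundance vectors below are hypothetical.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares fit y = a + b*x (simple linear regression)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    # With a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-sample relative abundances of one genus.
wms_abundance = [0.62, 0.10, 0.45, 0.05, 0.30]
v1v3_abundance = [0.58, 0.14, 0.40, 0.07, 0.33]
print(round(r_squared(wms_abundance, v1v3_abundance), 3))
```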
Hierarchical clustering revealed that this bias was not equal across all microenvironments. Intermittently moist and sebaceous samples from the V1-V3 and WMS datasets clustered together, whereas the corresponding V4 samples were less similar (Fig S4). This clustering appears to be driven largely by the underrepresentation of Propionibacterium in V4 tags. Moist sites were the most taxonomically similar across sequencing methods and clustered together regardless of method.
Staphylococcus species-level classification in 16S datasets is enabled by phylogenetic placement algorithms
A trade-off of cost-effective next-generation sequencing is the short read lengths these platforms generate, which present a challenge for accurate genus-, species-, and strain-level classification. Using OTU-based methods, both V1-V3 and V4 tag sequencing failed to accurately identify more than 30% of the species in the MCC (Fig S5A). Moreover, only 13.7% of the V1-V3 OTUs and 7.6% of the V4 OTUs from the cutaneous swab samples were classified to the species level (Fig 3A).
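Species-level fractions like those above can be tallied directly from the taxonomy string assigned to each OTU. This sketch assumes Greengenes 13_8-style lineages in which an unresolved rank is left as an empty prefix (e.g. `s__`); the function names and example lineages are illustrative, not part of QIIME.

```python
from typing import Iterable


def has_species_rank(lineage: str) -> bool:
    """True if a Greengenes-style lineage carries a non-empty 's__' rank."""
    for rank in lineage.split(";"):
        rank = rank.strip()
        if rank.startswith("s__"):
            return len(rank) > len("s__")
    return False  # lineage truncated before the species rank


def fraction_species_level(lineages: Iterable[str]) -> float:
    """Fraction of OTU lineages resolved to species."""
    lineages = list(lineages)
    if not lineages:
        raise ValueError("no lineages supplied")
    return sum(has_species_rank(lin) for lin in lineages) / len(lineages)


# Illustrative lineages: two resolved to species, two not.
otu_lineages = [
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
    "f__Staphylococcaceae; g__Staphylococcus; s__epidermidis",
    "k__Bacteria; p__Firmicutes; c__Bacilli; o__Bacillales; "
    "f__Staphylococcaceae; g__Staphylococcus; s__",
    "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; "
    "f__Propionibacteriaceae; g__Propionibacterium; s__acnes",
    "k__Bacteria; p__Actinobacteria; c__Actinobacteria; o__Actinomycetales; "
    "f__Corynebacteriaceae; g__Corynebacterium",
]
```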
Species-level resolution of skin microbiota is especially important when trying to differentiate between commensals (e.g. S. epidermidis) and pathogens (e.g. S. aureus). Using the RDP classifier in QIIME, we were unable to speciate Staphylococcus in the V4 samples and identified only S. epidermidis in the V1-V3 samples (Fig 3B), despite evidence that additional Staphylococcus species live on the skin (Fig 3C). One approach to improving the taxonomic resolution of 16S rRNA tag sequence data is to use phylogenetic information. We attempted to classify Staphylococcus species in the 16S datasets using pplacer (Matsen et al. 2010), an algorithm that uses maximum-likelihood criteria to place sequences on a fixed phylogenetic reference tree.
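The Methods retain pplacer/guppy species calls only when their likelihood exceeds 0.75. The sketch below applies that rule to a simplified, hypothetical record format of (read ID, rank, taxon, likelihood) tuples; it does not parse guppy's actual output, and the helper names are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical simplified record: (read_id, rank, taxon, likelihood).
# This is NOT guppy's output schema; adapt the parsing to real files.
Record = Tuple[str, str, str, float]

LIKELIHOOD_CUTOFF = 0.75  # species calls must exceed this (per the Methods)


def species_calls(records: Iterable[Record], cutoff: float = LIKELIHOOD_CUTOFF) -> Dict[str, str]:
    """Map read_id -> species for reads whose best species call exceeds cutoff."""
    best: Dict[str, Tuple[str, float]] = {}
    for read_id, rank, taxon, likelihood in records:
        if rank != "species":
            continue
        if read_id not in best or likelihood > best[read_id][1]:
            best[read_id] = (taxon, likelihood)
    return {rid: taxon for rid, (taxon, lik) in best.items() if lik > cutoff}


def summarize(records: List[Record]) -> Tuple[float, Dict[str, float]]:
    """Return (fraction of reads speciated, species composition of those reads)."""
    read_ids = {rec[0] for rec in records}
    if not read_ids:
        return 0.0, {}
    calls = species_calls(records)
    counts = Counter(calls.values())
    n_called = sum(counts.values())
    composition = {sp: n / n_called for sp, n in counts.items()} if n_called else {}
    return len(calls) / len(read_ids), composition
```

Reads without a confident species call still count toward the denominator, so the first value corresponds to the "fraction classifiable at the species level" reported in the Results.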
WMS accurately identified the two Staphylococcus species in the MCC, S. epidermidis and S. aureus (Fig S5B). Only 11% and <1% of Staphylococcus sequences from V1-V3 and V4 tags, respectively, were classifiable at the species level using pplacer (Fig S5B). Pplacer classification of V1-V3 tags identified the correct species but overrepresented the relative abundance of S. epidermidis. V4 tag species-level classification identified S. aureus but also falsely identified S. hominis and S. haemolyticus.
Pplacer analysis of the skin swabs revealed agreement between the V1-V3 and WMS datasets, but not the V4 dataset. WMS identified the predominant Staphylococcus species to be S. epidermidis, S. hominis, and S. capitis (Fig 3A). Of the sequences identified as Staphylococcus at the genus level in the V1-V3 dataset, 59% were classified at the species level. S. epidermidis and S. hominis were identified, but S. capitis was absent (Fig 3B). Less than 1% of the V4 Staphylococcus sequences were classified by pplacer and they were predominantly characterized as S. aureus and S. haemolyticus (Fig 3C).
Computationally predicted versus observed functional profiles
A perceived advantage of WMS approaches for skin microbiome studies is the functional insight gained through analysis of genetic enrichment. However, functional genetic profiles can be predicted from 16S rRNA sequences with the program PICRUSt (Langille et al. 2013), which uses reference genomes to infer a composite metagenome and predict abundance of gene families. Therefore, we compared functional genetic profiles obtained by WMS to PICRUSt-predicted functional genetic profiles of V4 and V1-V3 tag sequence datasets.
Functional enrichment analysis of the MCC identified variation in KEGG pathway enrichment by sequencing technique (Fig S6A) but did not reveal significant differences in Shannon diversity (Fig S6B). Notably, several metabolic pathways, including “metabolism of cofactors and vitamins” and “carbohydrate metabolism”, were more abundant, and “energy metabolism” and “biosynthesis of other secondary metabolites” were less abundant, in the WMS dataset than in metagenomes predicted from 16S tag sequence data. Functional profiles of each skin swab generated from the WMS dataset also differed compositionally from their matched V1-V3 and V4 predicted metagenomes. We focused on the 102 pathways identified across all datasets, in at least 4 samples, and at greater than 0.5% abundance. We grouped these pathways into 28 higher-level KEGG categories, 21 of which were shared by all datasets and significantly differentially enriched between at least one of the 16S datasets and the WMS dataset (Fig 4A; FDR-corrected paired Wilcoxon test, p < 0.05). The KEGG category “xenobiotics biodegradation and metabolism” was enriched in both 16S-predicted functional profiles (Fig 4B), with the greatest differences observed in pathway ko00930 (“Caprolactam degradation”). Conversely, the “translation” category was more prominent in the WMS dataset, with significant differences in ribosomal (ko03008, ko03010) and tRNA (ko00970) pathways (Fig 4B). Several KEGG categories also differed between the V1-V3 and V4 sequencing techniques, including “glycan biosynthesis and metabolism”, which differed significantly between the V4 and WMS datasets but not between the V1-V3 and WMS datasets (Fig 4B). Despite these differences, Spearman correlations revealed strong associations between the mean relative abundances of higher-level KEGG pathways in the predicted functional profiles and those in the WMS dataset across all body sites sampled (Fig S7).
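The text does not name the FDR procedure. Assuming it is Benjamini-Hochberg (the procedure R's `p.adjust` applies for `method = "fdr"`), the adjustment applied to one paired-Wilcoxon p-value per KEGG category could look like this Python sketch; the category p-values below are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone (and <= 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-category p-values from paired Wilcoxon tests (16S vs WMS).
category_p = {
    "translation": 0.001,
    "energy metabolism": 0.02,
    "glycan biosynthesis and metabolism": 0.04,
    "cell motility": 0.30,
}
q_values = dict(zip(category_p, benjamini_hochberg(list(category_p.values()))))
significant = [cat for cat, q in q_values.items() if q < 0.05]
```

In this toy example "glycan biosynthesis and metabolism" has a raw p-value below 0.05 but does not survive correction, which is the kind of result the FDR step guards against.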
Diversity trends are dependent on methodology
We estimated and compared taxonomic alpha diversity of skin bacterial communities using the Shannon diversity index, which accounts for both the number of species in the community (richness) and the evenness of their abundances. All three sequencing approaches identified sebaceous sites as the least diverse, and as significantly less diverse than intermittently moist sites (Fig 5A; p < 0.05, Kruskal-Wallis test with multiple-comparison post hoc test). While V1-V3 and WMS sequencing identified significant differences in diversity between moist and sebaceous sites, V4 tag sequencing did not. Conversely, V4 tag sequencing found intermittently moist sites to be significantly more diverse than moist sites, a trend that was not confirmed by the other methods.
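For reference, the Shannon index is H' = -Σ p_i log p_i, where p_i is the proportion of reads assigned to taxon i. A minimal Python version follows; the log base changes the scale of H' but not the ranking of samples (QIIME 1's `shannon` metric uses base 2, while vegan's `diversity()` defaults to the natural log).

```python
import math
from typing import Sequence


def shannon_index(counts: Sequence[float], base: float = math.e) -> float:
    """Shannon diversity H' = -sum(p_i * log(p_i)), skipping zero counts."""
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p) / math.log(base)
    return h
```

An even community such as [50, 50] scores higher than a skewed one such as [90, 10] with the same richness, which is how evenness enters the index.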
Cutaneous functional diversity, calculated from predicted gene functions, has previously been shown to vary by biogeography (Oh et al. 2014). However, we found that the apparent effect of site microenvironment on functional diversity depended on the sequencing method: both V4 and V1-V3 tags identified significant differences by microenvironment that were not found in the WMS sequencing dataset (Fig 5B; p < 0.05, Kruskal-Wallis test with multiple-comparison post hoc test).
We also compared beta diversity, or bacterial community structure, as recapitulated by V1-V3 and V4 tag sequencing. We applied Procrustes analysis to Bray-Curtis dissimilarity matrices to determine whether the use of different 16S rRNA sequence tags would yield similar beta diversity conclusions. Although significant, the Procrustes analysis showed very weak congruence between the datasets (Fig 5C; m12² = 0.6338, p = 0.0001).
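The m12² statistic is the symmetric Procrustes sum of squares: both ordinations are centered and scaled to unit total sum of squares, one is optimally rotated onto the other, and m12² = 1 - (Σ s_k)², where s_k are the singular values of XᵀY. It ranges from 0 (identical configurations) to 1 (no shared structure), which is why a value of 0.6338 indicates weak congruence. The study used vegan's `protest` in R; the NumPy sketch below re-implements the statistic and a row-permutation test for illustration only. It assumes ordination coordinates (e.g. from NMDS of Bray-Curtis matrices) for the same samples in the same row order.

```python
import numpy as np


def _standardize(coords: np.ndarray) -> np.ndarray:
    """Center columns and scale to unit total sum of squares."""
    centered = coords - coords.mean(axis=0)
    norm = np.sqrt((centered ** 2).sum())
    if norm == 0:
        raise ValueError("configuration has zero variance")
    return centered / norm


def procrustes_m12_squared(x, y) -> float:
    """Symmetric Procrustes sum of squares: 0 = same shape, 1 = no fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("configurations must have the same shape")
    xs, ys = _standardize(x), _standardize(y)
    # Sum of singular values of X^T Y is the best trace achievable by rotating Y.
    singular_values = np.linalg.svd(xs.T @ ys, compute_uv=False)
    return float(1.0 - singular_values.sum() ** 2)


def procrustes_permutation_test(x, y, permutations: int = 999, seed: int = 0):
    """Return (m12^2, p-value), permuting the rows (samples) of y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = procrustes_m12_squared(x, y)
    hits = 0
    for _ in range(permutations):
        permuted = y[rng.permutation(y.shape[0])]
        if procrustes_m12_squared(x, permuted) <= observed:
            hits += 1  # permuted fit at least as good as the observed one
    return observed, (hits + 1) / (permutations + 1)
```

Because the p-value is (hits + 1) / (permutations + 1), the smallest attainable value is set by the number of permutations.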
Discussion
As microbiome sequencing surveys become increasingly common, effective study design is crucial for the development of meaningful datasets. We make the following recommendations for studying skin microbiota from swab samples: 1) For 16S rRNA gene amplicon sequencing, the V1-V3 region provides more accurate assessments of human skin microbiota than the V4 region. 2) WMS sequencing is superior for species-level taxonomic classification, and previous reports have demonstrated its utility for strain-level analysis and for capturing non-prokaryotic elements of the skin microbiome (Oh et al. 2014). However, V1-V3 tags provide reasonable proxies for taxonomic composition and diversity at much lower cost and effort. The goal of the experiment should be carefully considered, along with the resources available for generating and analyzing the resulting datasets. 3) Functional genetic predictions based on 16S rRNA tags are remarkably similar to those provided by WMS sequencing, and may in some cases provide a reasonable estimate of functional enrichment when the expertise and/or resources to perform WMS are not available. However, owing to strain variability and widespread horizontal gene transfer between bacteria, results of predictive analyses should be interpreted with caution. Predictive analyses are also limited in their ability to identify antibiotic resistance and virulence genes, which may be of interest and could be detected by WMS sequencing.
Primers amplifying the V4 variable region, as employed here, were not able to recover Propionibacterium or reliably speciate Staphylococcus. This is not unexpected since the V4 hypervariable region is much shorter than the V1-V3 region and has a higher degree of sequence conservation (Chakravorty et al. 2007). A separate study also remarked on the absence of Propionibacterium in V4 libraries, suggesting that a single nucleotide difference between the 27F forward primer and annealing site in the P. acnes 16S rRNA gene may impair detection (Nelson et al. 2014). Our findings underscore the importance of thoroughly vetting primers for their ability to capture microbiota of importance to the skin habitat.
A recent study noted that skin biogeography, as well as individuality, shaped strain-level cutaneous diversity (Oh et al. 2014). Particular P. acnes strains have also been found to be differentially associated with acne (Fitz-Gibbon et al. 2013). Species- and strain-level identification of microbiota may be important if the ultimate goal is to identify a putative causal microbe or community for downstream mechanistic studies. We noted that the V1-V3 region was able to speciate the majority of Staphylococcus sequences based on phylogenetic placement against a curated reference database of Staphylococcus species. However, WMS sequencing would be a superior approach for identifying overall strain-level variability, or when a reliable curated reference database is not available for the genus or species of interest.
Given the striking differences in taxonomic composition among the datasets, the strong correlations of KEGG pathway abundance across sequencing methods are surprising, but they may indicate shared functionality among different microorganisms in cutaneous communities. It remains unclear whether functional units provide additional insight and are more effective at characterizing microbiome datasets than the taxonomic units currently in use. Xu et al. found that taxonomic profiles are better at classifying samples into biologically meaningful categories (Xu et al. 2014). However, this may change with technological advances, including improvements in predictive tools and database annotations.
While our study focuses on the effect of primer selection and sequencing approach, there are many other factors to consider when designing a skin microbiome survey. First, the sample collection technique should be consistent throughout the entire study. Here, we used a skin swab method that is minimally invasive. Other studies have reported the utility of deeper sampling of the skin layers (Nakatsuji et al. 2013). The utility of these sampling methods for WMS sequencing, however, is probably limited, as the amount of human DNA present in these samples would greatly overwhelm the microbial DNA present.
Second, studies investigating low-biomass sites, such as the skin, must also take sequencing depth into account and employ appropriate controls (Salter et al. 2014). Controls accounting for reagent contamination are critical for interpreting results. We recommend eliminating potential contaminants wherever possible during sample preparation by purchasing high quality, ultra-pure, DNA-free reagents, UV treating equipment and reagents, and performing all experiments in a hood.
Third, computational analysis and selection of variables, such as OTU picking method and alpha diversity metric, can greatly impact the interpretation of results. We employed default and commonly used variables, when possible, to make the analysis widely applicable.
Finally, we did not culture skin samples in parallel with swab collection for comparison with our sequencing results. Although it would be of interest to compare culture with 16S tag sequencing for deciphering community composition, we expect that, as reported previously (Findley et al. 2013; Gardner et al. 2013), cultures would greatly underestimate the diversity and composition of the skin microbiota.
Overall, our comparison of three DNA sequencing methods indicates that 16S tag sequencing of the V1-V3 region is a reasonable, cost-effective approach when the goal is simply to profile the composition of a skin microbial community or to identify biomarkers associated with skin disease.
Materials & Methods
Sample collection
The University of Pennsylvania Institutional Review Board approved all human subject recruitment and sample collection. Healthy adult volunteers residing in Philadelphia, PA, and surrounding areas were recruited to provide cutaneous swabs. Samples were collected after subjects gave written informed consent. Exclusion criteria included self-reported antibiotic treatment (oral or systemic) in the six months prior to enrollment, observable dermatologic disease, and significant comorbidities, including HIV or other immunocompromised states. Subjects were instructed to avoid hand sanitizers and antimicrobial soaps and skincare products for 1 week prior to the sample collection appointment, and not to shower for 24 hours prior to the appointment. Cutaneous swabs (Epicentre) were collected as described previously (Grice et al. 2009) and stored in 300 µL yeast cell lysis solution (from the Epicentre MasterPure Yeast DNA Purification kit) at −20°C immediately following collection. Swabs were incubated for one hour at 37°C with shaking and 10,000 units of ReadyLyse Lysozyme solution (Epicentre). Samples were subjected to bead beating for ten minutes at maximum speed on a vortex mixer with 0.5 mm glass beads (MoBio), followed by a 30 minute incubation at 65°C with shaking. As previously described (Gardner et al. 2013), protein precipitation reagent (Epicentre) was added and samples were centrifuged at maximum speed. The supernatant was removed, mixed with isopropanol, and applied to a column from the PureLink Genomic DNA Mini Kit (Invitrogen). The manufacturer's instructions for the PureLink kit were followed exactly, and DNA was eluted in 50 µL elution buffer (Invitrogen). At each sampling event, control swabs that never came into contact with the skin were collected, prepared, and sequenced exactly as the experimental samples. No significant background contamination from reagents or collection procedures was detected.
16S rRNA sequencing, sequence processing, and analysis
Sequencing libraries were prepared using Invitrogen AccuPrime reagents for PCR, the AMPure kit for PCR product cleanup and normalization, and the Qiagen MinElute column for pooled PCR product purification. Sequencing was performed at the Penn Next Generation Sequencing Core on the Illumina MiSeq. The mock community control (MCC; obtained from BEI Resources, NIAID, NIH as part of the Human Microbiome Project: Genomic DNA from Microbial Mock Community B (Even, Low Concentration), v5.1L, for 16S rRNA Gene Sequencing, HM-782D) was sequenced in parallel with experimental samples. Sequencing of the V4 region was performed using 150 bp paired-end chemistry, and reads between 248 and 255 nucleotides long were retained for analysis (99.58% of total sequences). Sequencing of the V1-V3 region was performed using 300 bp paired-end chemistry, and reads between 465 and 535 nucleotides long were retained (96.74% of total sequences). Samples were processed in QIIME 1.8.0 (Caporaso et al. 2010), and statistical analysis and visualization were performed in the R statistical computing environment (R Core Team 2015) as follows. Sequences were clustered into operational taxonomic units (OTUs) at a 97% similarity threshold by reference-based Uclust clustering (Edgar 2010) against the Greengenes database 13_8 (DeSantis et al. 2006).
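The read-length windows above amount to a simple filter on merged amplicon reads. This sketch parses FASTA text and keeps reads inside an inclusive window; whether the original bounds were inclusive is an assumption, and the region keys and helper names are illustrative.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Assumed-inclusive length windows for merged reads, from the text above.
LENGTH_WINDOWS: Dict[str, Tuple[int, int]] = {"V4": (248, 255), "V1-V3": (465, 535)}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(
    records: Iterable[Tuple[str, str]], region: str
) -> Tuple[List[Tuple[str, str]], float]:
    """Keep reads whose length lies in the region's window; return (kept, fraction kept)."""
    lo, hi = LENGTH_WINDOWS[region]
    records = list(records)
    kept = [(h, s) for h, s in records if lo <= len(s) <= hi]
    return kept, (len(kept) / len(records) if records else 0.0)
```

The returned fraction is the analogue of the "99.58% of total sequences" retained for V4.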
Taxonomy was assigned using the RDP classifier (Wang et al. 2007). Chimeric sequences were identified using ChimeraSlayer (Haas et al. 2011) and removed, along with sequences classified as Cyanobacteria. OTUs were removed if they represented only one sequence or were present in only one sample. Samples were rarefied to an even depth of 2,500 sequences per sample, after which alpha and beta diversities were calculated. In addition to the OTU-based methods, the MCC datasets were searched with BLAST against a custom database (blastn, max_target_seqs 1, e-value < 1e-10; alignment length > 300 for V1-V3 and > 150 for V4 samples) to calculate community composition. Sequences classified as Staphylococcus at the genus level were analyzed using the pplacer algorithm with "--keep-at-most 100 --max-pitches 100" (Matsen et al. 2010) and a curated phylogenetic reference package (Conlan et al. 2012). Taxonomic classifications were generated using the guppy program, and species-level classifications with a maximum likelihood greater than 0.75 were retained. Closed-reference OTU picking against the Greengenes database at 97% identity was used to generate BIOM-formatted OTU tables for functional prediction with PICRUSt (Langille et al. 2013); the predictions were subsequently annotated with HUMAnN v0.99 (Abubucker et al. 2012). Kruskal-Wallis and multiple comparison post hoc tests were calculated in R with the pgirmess package (Giraudoux 2015). Procrustes analysis was performed in R, using Bray-Curtis beta diversity dissimilarity matrices generated in QIIME and the metaMDS and protest functions in the vegan package (Oksanen et al. 2015).
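The OTU filtering and rarefaction steps above can be sketched in pure Python on a nested dictionary of sample → OTU → count, a stand-in for a BIOM table rather than QIIME's implementation. Samples with fewer reads than the target depth are dropped here, which is a common convention but an assumption about this study; the seeds are arbitrary.

```python
import random
from collections import Counter
from typing import Dict, Optional

OtuTable = Dict[str, Dict[str, int]]  # sample -> OTU -> read count


def filter_rare_otus(table: OtuTable) -> OtuTable:
    """Drop OTUs seen as a single read overall or present in only one sample."""
    totals: Counter = Counter()
    prevalence: Counter = Counter()
    for counts in table.values():
        for otu, n in counts.items():
            if n > 0:
                totals[otu] += n
                prevalence[otu] += 1
    keep = {otu for otu in totals if totals[otu] > 1 and prevalence[otu] > 1}
    return {s: {o: n for o, n in c.items() if o in keep} for s, c in table.items()}


def rarefy(counts: Dict[str, int], depth: int, seed: Optional[int] = None) -> Optional[Dict[str, int]]:
    """Subsample one sample's reads to `depth` without replacement.

    Returns None when the sample has fewer than `depth` reads.
    """
    if sum(counts.values()) < depth:
        return None
    rng = random.Random(seed)
    pool = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
    return dict(Counter(rng.sample(pool, depth)))


def rarefy_table(table: OtuTable, depth: int = 2500, seed: int = 0) -> OtuTable:
    """Rarefy every sample to `depth`, dropping samples that fall short."""
    out: OtuTable = {}
    for i, (sample, counts) in enumerate(sorted(table.items())):
        sub = rarefy(counts, depth, seed=seed + i)
        if sub is not None:
            out[sample] = sub
    return out
```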
Whole metagenome sequencing and analysis
Libraries were prepared using the Nextera XT (Illumina) library preparation kit according to the manufacturer’s instructions, with the exception that PCR cycles were increased to 15. Additionally, instead of using the manufacturer’s Nextera XT bead-based normalization protocol, we manually normalized and pooled libraries based on DNA concentration and average fragment length. Sequencing was performed at the Penn Next Generation Sequencing Core on the Illumina MiSeq and/or HiSeq 2500 (rapid chemistry) to obtain 150 bp paired-end reads.
Sequence data were obtained in FASTQ format. Adapters were removed using cutadapt (version 1.4.1) with an error rate of 0.1 and a minimum overlap of 10. Low-quality sequences (quality score < 33) were removed using the standalone FASTX toolkit (version 0.0.14) with default parameters. Sequences mapping to the human genome were removed from the quality-trimmed dataset using the standalone DeconSeq toolkit (version 0.4.3) with default parameters and the human reference GRCh37 (Schmieder and Edwards 2011). Because a 1% spike-in of PhiX Control was added to the sequencing runs for quality control purposes, any sequences mapping to the PhiX174 genome (NCBI Accession: NC_001422) were also removed using DeconSeq. Sequences < 80 nucleotides long were removed from the quality-trimmed, DeconSeq-filtered FASTQ files, and one read of each pair (SE1) was input into MetaPhlAn version 1.7.7 (Segata et al. 2012; 2013) for taxonomic classification. For the MCC sample, SE1 reads were searched with BLAST against a custom database of genomes from the 20 expected bacterial species (blastn, max_target_seqs 1, e-value < 1e-10; alignment length > 50) to calculate community composition. Alpha diversity was calculated in vegan (Oksanen et al. 2015) using the BIOM table generated from MetaPhlAn output. For functional annotation and comparison, the SE1 reads for each sample were subsampled to 200,000 sequences, queried against a reduced KEGG reference database version 56 (blastx; max_target_seqs 1, e-value < 1e-10), and input into HUMAnN v0.99 (Abubucker et al. 2012).
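The MCC composition step can be sketched as a pass over BLAST tabular output. This assumes the default 12-column `-outfmt 6` layout (alignment length in column 4, e-value in column 11) and a hypothetical mapping from reference sequence IDs to species names; only the first reported hit per read is considered, and the function name is illustrative.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def mcc_composition(
    blast_lines: Iterable[str],
    subject_to_species: Dict[str, str],
    min_aln_length: int,
    max_evalue: float = 1e-10,
) -> Dict[str, float]:
    """Relative abundance of expected species from BLAST tabular (-outfmt 6) hits."""
    seen = set()
    counts: Counter = Counter()
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        qseqid, sseqid = row[0], row[1]
        if qseqid in seen:
            continue  # only the first reported hit per query read is used
        seen.add(qseqid)
        aln_length, evalue = int(row[3]), float(row[10])
        if evalue < max_evalue and aln_length > min_aln_length:
            counts[subject_to_species.get(sseqid, "unmapped")] += 1
    total = sum(counts.values())
    return {sp: n / total for sp, n in counts.items()} if total else {}
```

For shotgun reads `min_aln_length=50` mirrors the threshold above; the same function with 300 or 150 would cover the V1-V3 and V4 amplicon searches.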
Supplementary Material
Acknowledgments
We thank the volunteers for their participation in this study, Penn Next Generation Sequencing Core for sequencing support, the Penn Medicine Academic Computing Services for computing resources, and members of the Grice laboratory for their underlying contributions. EAG is supported by grants from the National Institute of Arthritis and Musculoskeletal and Skin Diseases and the National Institute of Nursing Research (R00-AR060873, R01-AR066663, and R01-NR015639). JSM is supported by NIH T32 HG000046 Computational Genomics Training Grant, GDH and AJS are supported by the Department of Defense National Defense Science and Engineering Graduate fellowship program, and BPH was supported by NIH T32 AR007465 Dermatology Research Training Grant. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the Department of Defense.
Abbreviations
- HUMAnN: HMP unified metabolic analysis network
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- OTU: Operational taxonomic unit
- PCR: Polymerase chain reaction
- PICRUSt: Phylogenetic investigation of communities by reconstruction of unobserved states
- QIIME: Quantitative Insights into Microbial Ecology
- WMS: Whole metagenomic shotgun sequencing
- 16S rRNA: The prokaryote-specific small subunit of the ribosomal RNA
- V4: Hypervariable region 4 of the 16S rRNA gene
- V1-V3: Hypervariable regions 1 through 3 of the 16S rRNA gene
Footnotes
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Conflict of Interest
The authors state no conflicts of interest.
Author Contributions
JSM and EAG conceived and designed the study. AST collected skin swabs from subjects. JSM, GDH, AJS, and AST prepared samples for sequencing. JSM, GDH, BPH, and QZ analyzed sequence data. JSM and EAG drafted the manuscript.
Data Access
Sequences are deposited in the NCBI Short Read Archive (Accession PRJNA295605 and PRJNA266117). Analysis scripts and intermediate files are archived at Figshare and available at https://dx.doi.org/10.6084/m9.figshare.1544714.v1.
References
- Aagaard K, Petrosino J, Keitel W, Watson M, Katancik J, Garcia N, et al. The Human Microbiome Project strategy for comprehensive sampling of the human microbiome and why it matters. FASEB J. 2013 Mar;27(3):1012–22. doi: 10.1096/fj.12-220806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Abubucker S, Segata N, Goll J, Schubert AM, Izard J, Cantarel BL, et al. Eisen JA, editor. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS computational biology. 2012;8(6):e1002358. doi: 10.1371/journal.pcbi.1002358. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Albertsen M, Karst SM, Ziegler AS, Kirkegaard RH, Nielsen PH. Back to Basics – The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated Sludge Communities. PLoS ONE. 2015 Jul 16;10(7):e0132783. doi: 10.1371/journal.pone.0132783.
- Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, et al. QIIME allows analysis of high-throughput community sequencing data. Nature Methods. 2010 May;7(5):335–6. doi: 10.1038/nmeth.f.303.
- Caporaso JG, Lauber CL, Walters WA, et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proceedings of the National Academy of Sciences. 2011. doi: 10.1073/pnas.1000080107.
- Chakravorty S, Helb D, Burday M, Connell N, Alland D. A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. Journal of Microbiological Methods. 2007 May;69(2):330–9. doi: 10.1016/j.mimet.2007.02.005.
- Conlan S, Kong HH, Segre JA. Species-Level Analysis of DNA Sequence Data from the NIH Human Microbiome Project. PLoS ONE. 2012 Oct 10;7(10):e47075. doi: 10.1371/journal.pone.0047075.
- DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and Environmental Microbiology. 2006 Jul;72(7):5069–72. doi: 10.1128/AEM.03006-05.
- Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010 Sep 23;26(19):2460–1. doi: 10.1093/bioinformatics/btq461.
- Findley K, Oh J, Yang J, Conlan S, Deming C, Meyer JA, et al. Topographic diversity of fungal and bacterial communities in human skin. Nature. 2013 Jun 20;498(7454):367–70. doi: 10.1038/nature12171.
- Fitz-Gibbon S, Tomida S, Chiu B-H, Nguyen L, Du C, Liu M, et al. Propionibacterium acnes strain populations in the human skin microbiome associated with acne. The Journal of Investigative Dermatology. 2013 Sep;133(9):2152–60. doi: 10.1038/jid.2013.21.
- Gardner SE, Hillis SL, Heilmann K, Segre JA, Grice EA. The neuropathic diabetic foot ulcer microbiome is associated with clinical factors. Diabetes. 2013 Mar;62(3):923–30. doi: 10.2337/db12-0771.
- Gilbert JA, Jansson JK, Knight R. The Earth Microbiome project: successes and aspirations. BMC Biol. 2014;12(1):69. doi: 10.1186/s12915-014-0069-1.
- Giraudoux P. pgirmess: Data Analysis in Ecology. 2015.
- Grice EA, Kong HH, Conlan S, Deming CB, Davis J, Young AC, et al. Topographical and temporal diversity of the human skin microbiome. Science. 2009 May 29;324(5931):1190–2. doi: 10.1126/science.1171700.
- Guo F, Ju F, Cai L, Zhang T. Taxonomic precision of different hypervariable regions of 16S rRNA gene and annotation methods for functional bacterial groups in biological wastewater treatment. PLoS ONE. 2013;8(10):e76185. doi: 10.1371/journal.pone.0076185.
- Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Research. 2011 Mar;21(3):494–504. doi: 10.1101/gr.112730.110.
- Hannigan GD, Grice EA. Microbial Ecology of the Skin in the Era of Metagenomics and Molecular Microbiology. Cold Spring Harbor Perspectives in Medicine. 2013 Dec 2;3(12):a015362. doi: 10.1101/cshperspect.a015362.
- Hannigan GD, Meisel JS, Tyldsley AS, Zheng Q, Hodkinson BP, SanMiguel AJ, et al. The Human Skin Double-Stranded DNA Virome: Topographical and Temporal Diversity, Genetic Enrichment, and Dynamic Associations with the Host Microbiome. mBio. 2015;6(5):e01578-15. doi: 10.1128/mBio.01578-15.
- Human Microbiome Project Consortium. A framework for human microbiome research. Nature. 2012a Jun 14;486(7402):215–21. doi: 10.1038/nature11209.
- Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature. 2012b Jun 14;486(7402):207–14. doi: 10.1038/nature11234.
- Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE. 2012;7(6):e39315. doi: 10.1371/journal.pone.0039315.
- Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nature Biotechnology. 2013 Sep;31(9):814–21. doi: 10.1038/nbt.2676.
- Matsen FA, Kodner RB, Armbrust EV. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics. 2010;11(1):538. doi: 10.1186/1471-2105-11-538.
- Nakatsuji T, Chiang H-I, Jiang SB, Nagarajan H, Zengler K, Gallo RL. The microbiome extends to subepidermal compartments of normal skin. Nature Communications. 2013 Feb 5;4:1431. doi: 10.1038/ncomms2441.
- Nelson MC, Morrison HG, Benjamino J, Grim SL, Graf J. Analysis, Optimization and Verification of Illumina-Generated 16S rRNA Gene Amplicon Surveys. PLoS ONE. 2014 Apr 10;9(4):e94249. doi: 10.1371/journal.pone.0094249.
- Oh J, Byrd AL, Deming C, Conlan S, NISC Comparative Sequencing Program, Kong HH, et al. Biogeography and individuality shape function in the human skin metagenome. Nature. 2014 Oct 2;514(7520):59–64. doi: 10.1038/nature13786.
- Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O'Hara RB, et al. vegan: Community Ecology Package. 2015.
- R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2015. Available from: https://www.R-project.org/
- Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, et al. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 2014 Nov 12;12(1):87. doi: 10.1186/s12915-014-0087-z.
- Schmieder R, Edwards R. Fast identification and removal of sequence contamination from genomic and metagenomic datasets. PLoS ONE. 2011;6(3):e17288. doi: 10.1371/journal.pone.0017288.
- Segata N, Boernigen D, Tickle TL, Morgan XC, Garrett WS, Huttenhower C. Computational meta'omics for microbial community studies. Molecular Systems Biology. 2013;9(1):666. doi: 10.1038/msb.2013.22.
- Segata N, Waldron L, Ballarini A, Narasimhan V, Jousson O, Huttenhower C. Metagenomic microbial community profiling using unique clade-specific marker genes. Nature Methods. 2012 Aug;9(8):811–4. doi: 10.1038/nmeth.2066.
- Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology. 2007 Aug;73(16):5261–7. doi: 10.1128/AEM.00062-07.
- Wu GD, Lewis JD, Hoffmann C, Chen Y-Y, Knight R, Bittinger K, et al. Sampling and pyrosequencing methods for characterizing bacterial communities in the human gut using 16S sequence tags. BMC Microbiol. 2010;10(1):206. doi: 10.1186/1471-2180-10-206.
- Xu Z, Malmer D, Langille MGI, Way SF, Knight R. Which is more important for classifying microbial communities: who's there or what they can do? The ISME Journal. 2014 Dec;8(12):2357–9. doi: 10.1038/ismej.2014.157.