Abstract
Elucidating the factors that impinge on the stability of bacterial communities in the vagina may help in predicting the risk of diseases that affect women’s health. Here, we describe the temporal dynamics of the composition of vaginal bacterial communities in 32 reproductive age women over a 16-week period. The analysis revealed the dynamics of five major classes of bacterial communities and showed that some communities change markedly over short time periods, whereas others are relatively stable. Modeling community stability using new quantitative measures indicates that deviation from stability correlates with time in the menstrual cycle, bacterial community composition and sexual activity. The women studied are healthy, thus it appears that neither variation in community composition per se, nor higher levels of observed diversity (co-dominance) are necessarily indicative of dysbiosis, in which there is microbial imbalance accompanied by symptoms.
INTRODUCTION
Bacterial communities in the human vagina are key components of a multifaceted antimicrobial defense system that operates to protect women against disease (1). The mechanisms by which these bacterial communities effectively exclude non-indigenous organisms in reproductive-age women are not known with certitude, but one appears to be the production of acidic fermentation products (primarily lactic acid) that serve to create an acidic environment that restricts the growth of most pathogens (2). Consistent with this, microbiologists have demonstrated that vaginal bacterial communities of healthy women are often dominated by species of lactic acid-producing bacteria. The bacterial communities that help to provide these ecosystem services are composed of markedly different species, but can be clustered into five main groups that are commonly found in reproductive-age women of different ethnic groups and ages (3, 4). However, little is known about the temporal dynamics of these sorts of communities and whether they differ in terms of resistance and resilience to internally or externally imposed disturbances such as menses, sexual behaviors, lubricants, and hygiene practices. A better understanding of factors that lead to the development and maintenance of specific and stable vaginal bacterial communities is needed so that strategies can be developed to promote and maintain reproductive health.
From ecological theory it can be anticipated that less stable communities may be more susceptible to invasion by pathogenic organisms (5). In the human vagina, an imbalance in the microbial community is thought to elicit the symptoms associated with bacterial vaginosis (6). BV is the most frequently cited cause of vaginal discharge and malodor, and is the most common vaginal syndrome of reproductive-age women. It results in millions of health care visits annually in the United States alone (7), and is associated with increased risk for acquiring sexually transmitted infections (8) and developing obstetrics complications (9). The condition is characterized by the occurrence of an excessive thin white vaginal discharge, a fishy unpleasant vaginal odor, an elevated vaginal pH (>4.5), and the presence of “clue” cells (squamous epithelial cells covered with adherent bacteria). In clinical settings BV is diagnosed when three of these four criteria are met (10). However, in research and laboratory settings, BV is diagnosed by scoring a Gram stained vaginal smear, often using the criteria defined by Nugent et al. (11). The Nugent score is based on the premise that high numbers of Lactobacillus spp. are indicative of health and their depletion coupled with increased proportions of morphotypes resembling Gardnerella, Bacteroides spp. and curved Gram-variable rods is indicative of BV. Although the causes of BV remain an enigma (12), and there is contentious debate as to how it should be diagnosed, the condition is clearly a risk factor for adverse sequelae that affect women’s health.
Our understanding of vaginal bacterial community composition has broadened as a result of investigators using cultivation-independent methods based on the analysis of 16S ribosomal RNA (rRNA) gene sequences (3, 13, 14). These studies have shown that vaginal bacterial communities vary among women, are species rich, and include bacteria that had not been detected by traditional culture-based methods. However, to date, most studies of vaginal microbiology have used cross-sectional designs in which samples are obtained from individuals at a single time point or the interval between sampling spans weeks or months. Although these studies have provided important information on the species composition of vaginal communities, they yield little insight about the normal temporal dynamics of these bacterial communities within individuals and do not allow for the estimation of community stability. As a result, it is not known if vaginal bacterial communities are relatively stable over time, can exist in alternative equilibrium states, oscillate in accordance with hormonal cycles, or exhibit more complex dynamics, all of which could affect the risk of disease. Moreover, the influences of personal behaviors, hygiene practices, and host characteristics (i.e., immune and genetic factors) on community dynamics remain largely unknown.
Here we describe changes in the identity and abundances of bacteria in the vaginal communities of 32 women by analyzing vaginal samples obtained twice weekly over a 16-week period. The kinds of bacteria present in the samples were identified by classifying 16S rRNA gene sequences in each sample using high-throughput pyrosequencing technology and characterizing vaginal community function by determining the metabolites produced throughout the 16-week period.
RESULTS
Characterization of vaginal bacterial communities
To determine the temporal dynamics of vaginal bacterial communities in healthy reproductive-age women a longitudinal study was conducted in which 32 women (Table S1) self-collected mid-vaginal swabs twice-weekly for 16 weeks using a validated self-collection protocol (15). The bacterial diversity in the vaginal communities sampled was determined by pyrosequencing variable regions 1 and 2 (V1–V2) of bacterial rRNA genes. A data set of 2,522,080 high-quality classifiable 16S rRNA gene sequences was obtained from 937 samples with an average of 2,692 ± 910 (SD) sequences per sample. By adapting methods we have described previously (3) (see methods section and Supplementary Online Materials), the bacterial communities sampled were classified into one of five community state types based on differences in species composition and their relative abundances (fig. S1 and table S2). Three of these clusters, which were designated community state types I–III, were dominated by L. crispatus, L. gasseri or L. iners, respectively. Unlike our previous study (3), none of the communities examined here were dominated by L. jensenii, probably because of the low prevalence of this community state type and the sampling of too few women. Communities that clustered in community state type IV-A and IV-B were heterogeneous in composition, lacked significant numbers of Lactobacillus sp. but differed in species composition (fig. S1 and table S2). Communities of state type IV-A were generally characterized by modest proportions of either L. crispatus, L. iners, or other Lactobacillus spp., along with low proportions of various species of strictly anaerobic bacteria such as Anaerococcus, Corynebacterium, Finegoldia or Streptococcus. In contrast, communities of state type IV-B had higher proportions of the genus Atopobium, in addition to Prevotella, Parvimonas, Sneathia, Gardnerella, Mobiluncus or Peptoniphilus and several other taxa. Some of the taxa in communities of state type IV-B have previously been shown to be associated with bacterial vaginosis (16). Overall, high Nugent scores were most often associated with communities state type IV-B, whereas low Nugent scores were associated with community state type IV-A (Fig. 1E and fig. S2).
We further assessed the association of subject ethnicity with vaginal bacterial community composition and compared the results to the findings of our previous study (3). That study showed that Black women are more likely to harbor communities of state type IV, whereas white women have a higher prevalence of community state type I. A log-linear model was used to assess similarity between proportions of community state types in Black and white women in both studies. We showed that the association between community state types and ethnicity of the cohort of this study was similar to that previously reported (see Supplementary Online Material and fig. S3).
Community classes and community state type transitions
Profiles of community state types were derived from a time series of community samples and clustered into five classes of longitudinal dynamics, which we refer to as community classes (Fig. 1A and 1B), designated LC, LG, LI, DA and DB (see Methods section). The term community class is further defined in the Supplementary Material. These classes reflect similarities in how community composition changed over time (Fig. 1). Nearly all the temporal profiles were complex and somewhat individualized (Fig. 1C). Not all kinds of transitions between community states were equally likely to occur within a time series, and some were not observed (Table 1 and fig. S4). For example, vaginal bacterial communities of state type IV-B often transitioned to state type III, but only rarely to state type I. Likewise, the results show that vaginal communities of state type I, which are dominated by L. crispatus, most often transition to become community state type III, which are dominated by L. iners, or community state type IV-A. Of note, communities of state type III were two times more likely to switch to community state type IV-B than to community state type IV-A. The communities of state type II are dominated by L. gasseri and rarely underwent transitions to other community state types. Conversely, we observed no transitions from community state type I to community state type II, and there were only two transitions from community state type III to community state type II.
Table 1.
from to | I | II | III | IV,A | IV,B |
---|---|---|---|---|---|
I | 133 (84.2) | 0 | 12 (7.6) | 11 (7.0) | 2 (1.3) |
II | 0 | 50 (89.3) | 2 (3.6) | 4 (7.1) | 0 |
III | 10 (2.7) | 2 (0.5) | 314 (85.8) | 16 (4.4) | 24 (6.6) |
IV,A | 12 (13.6) | 3 (3.4) | 17 (19.3) | 49 (55.7) | 7 (8.0) |
IV,B | 3 (1.3) | 0 | 22 (9.3) | 4 (1.7) | 208 (87.8) |
Index of stability of vaginal bacterial communities
By definition, communities that persist in the same state type over time display a high level of stability, while those that often transition to different state types have low levels of stability. We introduce the normalized Jensen-Shannon divergence index, defined as the median of Jensen-Shannon distances between all pairs of community states, which provides a quantitative measure of community stability (see the Supplementary Online Material and Fig. 1D). As the index becomes smaller less change is observed between different community states and the community is more constant over time. The vaginal bacterial communities of subjects 3, 4, 5 and 6 were of class DB, and displayed higher levels of overall constancy and persistently high Nugent score even though the subjects did not report any BV symptoms. This calls into question current widely held views on what is considered to be a “normal” vaginal bacterial community that are founded on the premise that the communities of healthy women must contain high proportions of Lactobacillus species (17). This premise is also undermined by the observation that vaginal bacterial communities of subjects 32, 26, 25 of class DA and subjects 30, 24, 31 of class LC had high Jensen-Shannon indices yet persistently low Nugent scores (Fig. 1D and 1E). Thus, highly variable vaginal bacterial communities do not always have persistently high Nugent scores, so variation per se does not always equate with disease. These results suggest that neither variation in community composition, nor constantly high levels of apparent diversity (co-dominance) are necessarily indicative of dysbiosis. The meaning and implications of these observations will remain unknown until molecular characterization of vaginal microbiota is applied to studies of the events leading to adverse outcomes, such as acquisition of sexually transmitted diseases or preterm birth.
Lactobacillus sp. and vaginal bacterial community stability
The stability of vaginal communities was associated with certain species of lactic acid-producing bacteria that dominated a community. For example, the vaginal communities of women in community classes LC (5 women) and LG (2 women) were most often dominated by L. crispatus and L. gasseri, respectively. These communities were rather stable, exhibited fewer transitions between community states, and typically had low Nugent scores. In these women most transitions between community states were associated with menses (fig. S1 and fig. S5). For example, in subject 28 (fig. S5), a community dominated by L. crispatus was resilient, being replaced by a community dominated by L. iners during menses, which then reverted to a community dominated by L. crispatus at the end of menses. Similarly, in subject 24 (Fig. 2A), a community dominated by L. crispatus is replaced by a community dominated by Streptococcus during menses. Communities of class LI (14 women, fig. S5) were typically dominated by L. iners, but varied widely in terms of species composition and stability. This is illustrated by the communities of subjects 14, 15, 16, 18 and 19 that appeared rather stable over time, while others such as 11 and 27 commonly shifted to different community types that were more often associated with higher Nugent scores. The underlying reasons for these differences are unknown but might reflect genomic heterogeneity in the dominating Lactobacillus sp. Heatmaps depicting the dynamics of vaginal communities in a few selected subjects are shown in Figure 2; those for all 32 subjects are shown in fig. S5.
Dynamics of the vaginal microbiota
The rapid and sometimes extensive turnover of human vaginal bacterial communities was visualized by mapping temporal changes in community composition onto the 3-dimensional community space defined previously in a cross-sectional study by Ravel et al. of vaginal communities in 396 women (3) (Fig. 3). These illustrations show that while the vaginal community composition of many individuals changed over time the alternatives were often restricted to specific community states (movie S1 and Fig. 3D). This is consistent with the notion that vaginal communities can exist in alternative equilibrium states as we have previously hypothesized (3). In contrast, communities of most other women evinced dynamics consistent with the “community resilience hypothesis”, in which a community, though dynamic, normally resides in a single region of space (movie S2 and Fig. 3B). In these cases, the composition and structure of a community occasionally changed to a transition state, but they were resilient, and homeostatic mechanisms returned them to their ‘ground state’.
In many women, the bacterial species composition and rank abundances of bacterial species changed markedly over short periods of time (fig. S5). Consistent with this, the communities of some women seemed resilient and showed simple and predictable changes between community states that occurred only during menses [e.g., subjects 24 (Fig. 2A), 12 (Fig. 2D), 28, and 19]. In contrast the communities of other women were almost invariant during menses (e.g., subject 29, Fig. 2B), while still others changed states continuously over time regardless of whether the subjects were menstruating or not (e.g., subjects 1, 2, 7 and 27). The latter might be examples of communities transitioning to alternative equilibrium states. Most of the transitions to other state types were, however, transient in nature with 35% of all state types persisting for not more than a week. This is consistent with previous studies that showed significant yet short-lived changes in Nugent scores (18, 19).
Identifying factors that disturb stability
We sought to evaluate the dependence of vaginal bacterial community stability as a function of time in the menstrual cycle and other time-varying factors. The Jensen-Shannon divergence index described above is a measure of a community’s deviation from constancy over an entire time series, and hence it cannot be used to study temporal changes in community stability. Instead we calculated the rate of change of the log of Jensen-Shannon distances between consecutive community states. The resulting log Jensen-Shannon divergence rate of change over normalized menstrual cycles was modeled using a linear mixed effects model with a Fourier polynomial of the normalized menstrual time component adjusted for hormonal contraception, community type, sexual activities, lubricant use and douching (with the last three evaluated one day prior to sampling) and with subject based degree two polynomial random effects to account for correlation in repeated measurements on each subject (Fig. 4). The Fourier polynomial captured population level dependence of the log Jensen-Shannon divergence rate of change (i.e., community deviation from constancy) on time in the menstrual cycle. The lowest constancy was associated with menses. Interestingly, the two minima of the population level Jensen-Shannon divergence rate of change (indicating highest community constancy) coincided with the two maxima of estradiol concentration as reported by Minassian et al. (20), while the maximum of progesterone concentration coincided with the second minimum of the population level constancy function (Fig. 4). Of all the metadata evaluated in the model, sexual activity was the only one that had a significant (negative) effect on constancy independent of time in the menstrual cycle, but its effect was rather weak in comparison to that of community class (see the Material and Methods section). Interestingly, no statistically significant dependence was observed between Shannon diversity index and time in the menstrual cycle (fig. S6), indicating that while bacterial communities tend to lack stability during menses it is not necessarily accompanied by increased community diversity.
Dynamics of the vaginal metabolome
A total of six samples from subjects 28, 22 and 8 and twelve samples from subject 10 were processed for metabolic profiling by 1H NMR spectroscopy (Fig. 5). These subjects were selected because they each belong to one of the major community classes observed in this study (Fig. 1). Strong resonance signals from the most abundant metabolites were identified in spectra of these samples, including lactate (δ = 1.33 ppm), acetate (δ = 1.92 ppm), and succinate (δ = 2.41 ppm), as well as certain amino acids and sugars (δ = 3.2–5.5 ppm) (Fig. 5). Subjects 28 and 22 maintained high levels of lactate even during menses when community composition shifted (Fig. 5B and C). During week 11 and menses, subject 22 underwent a shift from a community dominated by L. gasseri to a community dominated by Streptococcus sp. without a corresponding decrease in lactate levels, suggesting functional redundancy between these two species and conservation of overall community function. However, Streptococcus sp. dominated communities contained slightly higher levels of acetate (Fig. 5C). Similarly, during menses, the abundance of L. iners in the vaginal bacterial community of subject 28 increased while that of L. crispatus decreased, without effect on the metabolome of the community (Fig. 5B). Over time, the vaginal bacterial community of subject 8 consistently included members of the genera Atopobium, Prevotella, and other anaerobes as well as lower abundance of L. iners. The metabolome consistently showed high levels of succinate, and acetate and lower concentrations of lactate over time as compared to the metabolomes of the other subjects (Fig. 5D).
The normalized 1H NMR spectra of vaginal swabs were analyzed by principal component analysis to obtain an overview of the variations between samples within and between subjects (fig. S7). This analysis confirmed that the metabolic output of each community class was generally unique and somewhat consistent over time except for subject 10 (fig. S7A), which varied more than the others.
Major changes in community composition that lasted several weeks were temporally associated with changes in the metabolome. For example, the vaginal community of subject 10 underwent a profound change beginning in week 8 of the study in which the proportion of Atopobium sp. markedly increased while that of L. iners decreased. This change was characterized by the increase of succinate and acetate and the decrease of lactate (Fig. 5A). The overall clustering was influenced by lactate and glycogen for subject 22, 28 and part of the time series in subject 10, while higher levels of succinate and acetate were responsible for the clustering of samples from subject 8 and 10 (fig. S7B).
DISCUSSION
Here, we have shown that vaginal communities of some women show low constancy and high levels of species turnover, and that community dynamics vary widely among women even among those that cluster in the same community class. These findings suggest that point estimates of community composition that arise from cross sectional studies could be misleading for most women because they experience a distribution of community states over time. This calls into question whether data from cross-sectional studies can be reliably used to link vaginal bacterial community composition to health outcomes. Fluctuation in community composition and constancy are mainly affected by time in the menstrual cycle, community class, and to a certain extent by sexual activity, but other unknown factors are also certainly at play. Nonetheless, it should be noted that in many cases community function is probably maintained despite changes in community composition, because shifts in community structure often only involved changes in the relative dominance of a limited number of different lactic acid producing bacterial species. Such fluctuations could occur while maintaining community performance (i.e., lactic acid production; Fig. 5 A–D) when there is functional redundancy among community members and shifts in the relative abundances of guild members occur due to changes in environmental conditions that favor one population over another. This coupled with the observation that a limited number of community types are found in women of different ethnicities suggest that deterministic processes driven by interspecies interactions, host-microbe interactions, and niche partitioning may account for vaginal community composition. In some cases, changes in community performance occur as a result of changes in community composition (Fig. 5A). These are reflected in the shift in composition of the metabolome, which in this strictly anaerobic environment is dominated by fermentation products such as lactic, succinic and acetic acids that accumulate in vaginal secretions (Fig. 5A–D)
Intervals of increased susceptibility to disease may occur because vaginal microbiota fluctuate. It is envisioned that better knowledge of vaginal community dynamics and the mutualism between the host and indigenous bacterial communities will lead to the development of strategies to manage the vaginal ecosystem in a way that promotes health and minimizes the use of antibiotics. These efforts should take into account differences among individuals in terms of vaginal community dynamics, and the potential problems that might arise from diagnostic and therapeutic strategies based on cross-sectional studies. The data reported in this study should help to inform and improve interpretation of past and future cross-sectional datasets.
MATERIALS AND METHODS
Subjects and sample collection
Between 2006–2007, 39 non-pregnant, reproductive-age women were enrolled in a longitudinal study at Johns Hopkins University School of Medicine, Baltimore, MD (21, 22). The protocol was approved by the Institutional Review Board of the Johns Hopkins University School of Medicine and the University of Maryland School of Medicine. Written informed consent was appropriately obtained from all participants that included permission to use the samples obtained in future studies.
Overall, 32 women successfully completed the longitudinal study (85% retention) (table S1). Self-collected mid-vaginal swabs and vaginal smears were obtained twice-weekly for 16 weeks. All the swabs collected on a given day were stored in a freezer at the participant’s home and transferred on ice to the clinic on the participants’ scheduled visits at week 4 and week 16 (or more frequently) where they were archived at −80°C.
Daily diaries were in the form of a yes/no check-off list to report menstrual bleeding; vaginal douching; sexual activity (including vaginal intercourse, receptive oral sex, digital penetration, anal sex, and use of sex toys, condoms, spermicides and lubricants); wearing of thong undergarments; use of medications or contraceptives, a diaphragm, sanitary napkin, and/or tampons. A comprehensive health and behavior questionnaire was administered at baseline and follow-up visits. Smears and behavioral diaries were submitted to the study site weekly.
Nugent Gram stain analysis
The vaginal smears were heat fixed and Gram stained, then blinded and read in random order. A microscopy score of 0–10 was assigned by an experienced microbiologist using the standardized method described by Nugent et al. (11). Nugent’s scores are composite scores based on the cellular morphologies of the bacteria present in a sample. A score of 0–3 was designated normal, 4–6 as intermediate and 7–10 was considered to be abnormal and indicative of bacterial vaginosis.
DNA extraction and purification
Genomic DNA was extracted from archived vaginal swab specimens. Procedures for the extraction of genomic DNA from frozen vaginal swabs have been developed and validated (3, 15). Briefly, frozen vaginal swabs were immersed in 1 ml of pre-warmed (55°C) cell lysis buffer composed of 0.05M potassium phosphate buffer containing 50 μl lyzosyme (10 mg/ml), 6 μl of mutanolysin (25,000 U/ml; Sigma-Aldrich) and 3 μl of lysostaphin (4,00 U/ml in sodium acetate; Sigma-Aldrich) and the mixture was incubated for 1 hour at 37°C. Then 10 μl proteinase K (20 mg/ml), 100 μl 10% SDS, and 20 μl RNase A (20 mg/ml) were added and the mixture was incubated for 1h at 55°C. The samples were transferred to a FastPrep Lysing Matrix B tube (Bio101) and microbial cells were lysed by mechanical disruption using a bead beater (FastPrep instrument, Qbiogene) set at 6.0 m/s for 30 sec. The lysate was processed using the ZR Fecal DNA extraction kit (ZYMO Research) and according to the manufacturer’s protocol omitting the lysis steps (steps 1–3). The kit includes a column (Zymo-Sin IV-HRC spin filter) specifically designed to remove PCR inhibitors from the DNA samples. The DNA was eluted into 100 μl of TE buffer, pH 8.0. This procedure provided between 2.5 and 5 μg of high quality whole genomic DNA from vaginal swabs.
PCR amplification and sequencing of the V1–V2 region of bacterial 16S rRNA genes. We used two universal primers 27F and 338R for PCR amplification of the V1–V2 hypervariable regions of the 16S rRNA gene. The 338R primer includes a unique sequence tag to barcode the samples. A total of 96 unique 338R primers each with a specific barcode were synthesized. The primers were as follows:
27F - 5′-GCCTTGCCAGCCCGCTCAGTCAGAGTTTGATCCTGGCTCAG-3′
338R - 5′-GCCTCCCTCGCGCCATCAGNNNNNNNNCATTACCGCGGCTGCTGGCA-3′
where the underlined sequences are the 454 Life Sciences® primers B and A in 27F and 338R, respectively, and the bold font denotes the universal 16S rRNA primers 27F and 338R. The barcode within 338R is denoted by 8 Ns.
For every set of 192 vaginal genomic DNA samples PCR amplification of 16S rRNA genes was performed in 96 well microtiter plates as follows: 5.0 μl 10X PCR buffer II (Applied Biosystems, Foster City, CA), 3.0 μl MgCl2 (25 mM; Applied Biosystems), 2.5 μl Triton X-100 (1%), 2.0 μl deoxyribonucleoside triphosphates (10 mM), 1.0 μl each of primer 27F and 338R (20 pmol/μl each), 0.5 μl AmpliTaq DNA polymerase (5U/μl; Applied Biosystems), and 50 ng of template DNA in a total reaction volume of 50 μl. Reactions were run in a PTC-100 thermal controller (MJ Research) using the following cycling parameters: 5 min denaturing at 95°C followed by 20 cycles of 30 sec at 95°C (denaturing), 30 sec at 56°C (annealing) and 90 sec at 72°C (elongation), with a final extension at 72°C for 7 minutes. Separate plates containing negative controls without a template were included for each set of plate processed. The presence of amplicons was confirmed by gel electrophoresis on a 2% agarose gel and stained with SYBRGreen (Ambion). PCR products were quantified using Quant-iT Picogreen® quantification system (Invitrogen) and equimolar amounts (100 ng) of the PCR amplicons were mixed in a single tube. Amplification primers and reaction buffer were removed by processing the amplicons mixture with the AMPure Kit (Agencourt). All PCR amplification reactions that failed were repeated twice using different amounts of template DNA and if these failed the samples were excluded from the analysis.
The purified amplicon mixtures were sequenced by 454 pyrosequencing using 454 Life Sciences® primer A by the Genomics Resource Center at the Institute for Genome Sciences, University of Maryland School of Medicine using Roche/454 Titanium chemistries and protocols recommended by the manufacturer and as amended by the Center.
Sequence Reads Quality Control, Analysis and Taxonomic Assignments. The QIIME software package (23) was used for quality control of the sequence reads using the split-library.pl script and the following criteria: 1) minimum and maximum length of 220 bp and 400 bp; 2) an average of q25 over a sliding window of 50 bp. If the read quality dropped below q25 it was trimmed at the first base pair of the window and then reassessed for length criteria; 4) a perfect match to a barcode sequence; and 5) presence of the 338R 16S primer sequence used for amplification. Sequences were binned based on sample-specific barcode sequences and trimmed by removal of the barcode and primer sequences (forward if present and reverse). High quality sequence reads were first de-replicated using 99% similarity using the UCLUST software package (24) and detection of potential chimeric sequences was performed using the UCHIME component of UCLUST (25). Chimeric sequences were removed prior to taxonomic assignments.
Taxonomic assignments were performed as described by Ravel et al. (3) using a combination of the RDP Naïve Bayesian Classifier (26) and speciateIT (speciateIT.sourceforge.net). Taxonomic assignments (sequence read counts and relative abundances) are shown in Table S2.
Statistical Analysis
Community state type assignments. A community state type is a cluster of community states (the species composition and abundance of a vaginal community at a specific point in time) that are similar in terms of the kinds and relative abundances of observed phylotypes (fig. S1 and Supplementary Online Material). The dissimilarity between community states was measured using Jensen-Shannon metric (Supplementary Online Material). The clustering of community states was done using hierarchical clustering based on the Jensen-Shannon distances between all pairs of community states and Ward linkage. Only phylotypes with at least 20 sequences were used in the clustering process. The number of clusters was determined using the Silhouette measure of the degree of confidence in a particular clustering. It was applied to the clusters of community states that resulted from the Jensen-Shannon hierarchical clustering by cutting the corresponding dendrogram at the level that would have k leaves (clusters), where k was between 2 and 10. The maximum of the Silhouette values was at k = 5. The corresponding five clusters are depicted on Fig. S1 and are labeled I, II, III, IV-A, IV-B.
The Silhouette values lie in the interval [−1, 1], with well-clustered observations having values near 1 and poorly clustered observations having values near −1. The precise definition of Silhouette measure is as follows: for each community state type i, the silhouette width s(i) of i is set to be 0 if i is the only observation in the cluster and if there is more than one element in the cluster of i, s(i) is defined by the formula:
where a(i) is the average Jensen-Shannon distance between i and all other community states of the cluster to which i belongs and for each cluster C that does not contain i, d(i,C) is defined as the average Jensen-Shannon distance between i and all community states of C. b(i) is defined as minC d(i,C), and can be seen as the dissimilarity between i and the nearest cluster to which it does not belong. Community states with a s(i) value close to 1 are very well clustered, a value of s(i) close to 0 mean that the community state lies between two clusters, and a negative value of s(i) means that the community state was probably placed in the wrong cluster. The Silhouette value of clustering results of community states is the average of silhouette widths over all community states. The calculation of Silhouette values was done using R package clValid version 2.0–2 modified to allow for use of Jensen-Shannon metric.
Community class assignments
The temporal patterns of vaginal bacterial community dynamics can be classified using profiles of community state types (see Supplementary Online Material). A profile of community state types is obtained by replacing each community state in the temporal data of phylotype proportions by the corresponding community state type (Fig. 1C). The proportion of time each state type is observed within the profile characterizes persistence of the community in a given community state type (Fig. 1C). Temporal patterns of vaginal bacterial communities are classified based on the long-term patterns of community state type persistence. The dendrogram in Figure 1 was generated using Ward linkage hierarchical clustering of the Euclidean distances between state type persistence vectors. A color bar in Figure 1B delineates five clusters of community state type profiles, and these are referred to as community classes. The number of clusters was determined using the Silhouette method described above in the context of community state type assignment.
Modeling the dependence of the average population-wide deviation from constancy of vaginal bacterial communities on the time in the menstrual cycle. We used a mixed effects model to assess the association between the average population-wide deviation from constancy of vaginal bacterial communities and the time in menstrual cycle, hormonal contraception, community class, sexual activity, lubricant use, and douching. Sexual activity was considered a categorical variable indicating that one of the following activities took place one day before sampling: vaginal sex, anal sex, receptive oral sex, digital penetration or use of a sex toy. Lubricant use and douching were also recorded a day before sampling. The outcome variable in the model is the rate of change of the log of Jensen-Shannon divergence defined as the ratio of the log of the Jensen-Shannon divergences between consecutive community states and the time between states. A structure of the model is captured by the following equations:
(1) |
where Yi is a vector of log Jensen-Shannon divergence rates of change for the i-th subject; s(t) is a Fourier (trigonometric) polynomial capturing the nonlinear, mean pattern of association between Yi and the normalized menstruation time t. The normalized menstrual time was derived from the time variable by rescaling the intervals from the first day of one menstrual cycle to the first day of the next menstrual cycle to create 28 day cycles. If the study ended before the next menstrual cycle and the number of days from the first day of the last menses to the last day in the study was less than 28 days, no rescaling took place. Xi is a design matrix of fixed effects for the i-th subject that in the full model incorporates the effect of hormonal contraception, community class and the one-day-lagged effects of sexual activity, lubricant use and douching; and b is a vector of model parameters measuring the level of association with the aforementioned fixed effects. Zi is a random effects design matrix of a polynomial function of sampling times for the i-th subject; and bi is the associated vector of coefficients assumed random with a joint normal distribution with mean 0 and a variance-covariance matrix D. εi is a vector of residuals that are also assumed normally distributed with a variance-covariance matrices, Σi, specific for each individual and described below. We assume bi and εi to be independent. Model fitting was performed using R package nlme version 3.1–100 (27).
Model selection was performed using the following top-down strategy (28).
Find the optimal random effects structure of a “beyond optimal” model whose fixed component contains all explanatory variables and as many interactions as possible using restricted maximum likelihood (REML) estimators (29).
Find the optimal fixed effects structure (using maximal likelihood estimators) for the model with the optimal random effects structure.
Present the final model using REML estimation.
The fixed effects of the “beyond optimal” model contain hormonal contraception, community class and the one-day-lagged effects of sexual activity, lubricant use, douching and degree four Fourier polynomial of normalized menstrual time stratified by community class. The random structures tested with this model were subject specific polynomials of sampling time of degree 0, 1, 2 and 3. Both likelihood ratio test and the Akaike information criterion (AIC) (29) identified the polynomial of degree two as the optimal random structure. When comparing models with nested random effects structures, the models are nested with respect to the variances. Thus, when performing a likelihood ratio test on such nested models the null hypothesis is H0 : σ2=0 and the alternative hypothesis is H1 : σ2>0. Since the alternative hypothesis is σ2>0 and not σ2 ≠ 0, we are testing on the boundary. This is because if there is no evidence to reject the null-hypothesis, then σ2=0 is the lowest possible value, as the variance is always non-negative. P-values of the likelihood ratio tests for the random structures were computed using Verbeke and Molenberghs’ formulas for the likelihood ratio test while testing on the boundary (30).
The optimal fixed effects structure was identified by a top-down selection of the categorical variables: hormonal contraception, community class, sexual activity, lubricant use and douching (using likelihood ration tests) followed by a selection of the stratification of the Fourier polynomial term (using AIC criteria as these models were not nested within each other) and finally selection of the degree of the Fourier polynomial term. We have tested stratifications of the Fourier polynomial term corresponding to all possible subsets of the set {DA, DB, LI, LC, LG} of community classes, including the empty set (no stratification). The optimal fixed effects structure contains a degree three Fourier polynomial, with no stratification, adjusted for sexual activity and a binary community class factor with two levels, one that combines classes DA and DB, and the other that combines classes LI, LC and LG (referred to as class L). The coefficients and p-values of the model for the sexual activity and binary community coefficients are shown in table S3. Of all the metadata evaluated in the model, sexual activity was the only one with a significant (negative) effect on constancy, independent of time in the menstrual cycle, however the magnitude of the effect is rather weak in comparison to that of binary community state type. Communities LI, LC and LG are significantly more stable (constant) than the communities DA and DB.
The subject-specific normal quantile-quantile residual plots show that the residuals are approximately normally distributed. The within subject plots of normalized residuals versus normalized menstrual time showed no discernible patterns.
We have also built models with the same structure as model (1) but where s(t) was set to be either a thin plate regression spline (31, 32) or a polynomial of normalized menstrual time. The mean population-level deviation from constancy curves of these models were not significantly different from the mean population-level deviation from constancy curves derived using Fourier polynomial model.
To assess the sensitivity of the results to the choice of dissimilarity measures between community states used, the final model was applied to the rates of change between consecutive community states utilizing Bray-Curtis, relative entropy, Euclidean and Manhattan dissimilarity measures. The resulting population-wide deviation from constancy curves all lay within the 95% confidence interval ring of the Jensen-Shannon deviation from constancy curve indicating the robustness of our results.
NMR sample preparation and data acquisition
For each sample, one dry Dacron swab head was cut using scissors, and placed in a 1.5 ml centrifuge tube. The scissors were ethanol sterilized and flamed before each use. Approximately 0.6 ml of deuterated water (D2O) was added to the centrifuge tube as an extraction solvent. The sample was homogenized by vortex mixing for 1 min and stored on ice for 5 min. The solution was pipetted into a clean 1.5 ml tube and centrifuged (3 min, 13,000 rpm, 4 °C) to remove particulates. An unused swab was also processed using the same procedure and served as a control.
1H-NMR was used to assess the metabolite profiles. A total of 500 μl of the swab extract containing 50 mM phosphate buffer at pH 7.0 and 30 μM sodium 3-(trimethylsilyl) propionate-2,2,3,3-d4 as internal chemical shift reference were processed after vortexing and centrifuged at 13,000 rpm for 2 min and transferred to a 5 mm NMR tube. All 1H-NMR experiments were carried out at 25°C on a Varian AS500 spectrometer operating at a proton NMR frequency of 499.75 MHz. One-dimensional spectra were recorded using a standard Carr-Purcell-Meiboom-Gill (CPMG) pulse sequence. An 80 ms CPMG pulse train was used to eliminate signals from large molecules, such as proteins from blood serum. Each spectrum consisted of 128 transients with a spectral width of 12 ppm and relaxation delay of 5.0 s. All free induction decays were Fourier transformed with an exponential function equivalent to a 0.3 Hz line-broadening factor and the spectra were zero filled to 32K points. The resulting spectra were manually phased and baseline corrected using ACDLABS (version 10.0, Advanced Chemistry Development, Inc.). For 1H NMR signal assignment purposes, two-dimensional (2-D) J-resolved spectroscopy (33) and total correlation spectroscopy (TOCSY) (34) NMR spectra were acquired for two selected samples. J-resolved spectra were collected using 128 scans per 32 increments with 5000 Hz spectral width in F2 and 36 Hz in F1. The TOCSY spectra were recorded with a data matrix of 2048 × 128 with spectral width of 5000 Hz in F2 and F1. Sixty-four scans were acquired and a mixing time of 80 ms was used. All 2-D NMR data were processed with the software package NMRPipe (35).
Metabolic composition and Principal Component Analysis
1H-NMR spectra were reduced to 61 integrated regions of varied width corresponding to the region of δ 0.60 – 8.50 ppm using intelligent bucketing from ACDLAB. The region from δ= 4.7 – 5.2 ppm was excluded from the analysis to avoid the phasing effects of the pre-saturation of the residual water signal. Peaks from ethanol at δ =1.19 ppm and 3.65 ppm (Fig. 5A) and glycerol at δ =3.56 ppm, 3.65 ppm and 3.78 ppm (Fig. 5C) were believed to be potentially introduced to the swabs during sample collection or sample handling. Thus the regions of δ =1.16–1.22 ppm, 3.50–3.80 ppm were also excluded from the analysis. Normalization of the integrals to the total sum of the spectrum was carried out on the data prior to principle component analysis (PCA) to allow for differences in signal-to-noise. PCA was performed with the SIMCA-P software (version 10.0, Umetrics, Sweden). The data were pre-processed by Pareto scaling
Supplementary Material
Acknowledgments
We thank Rachel Gicquelais, Elisabeth Neuendorf and Hongqiu Yang for technical assistance with DNA extraction.
Funding. This research was supported by National Institute of Health, National Institute for Allergy and Infectious Diseases awards UH2-AI083264 (JR/LJF), U01-AI70921 (JR), K01-AI080974 (RMB) and R03-AI061131.
Footnotes
Materials and Methods
Fig. S1. Assignment of vaginal community state types.
Fig S2. Profiles of community state types, Nugent scores, and menses for 32 women over a 16-week period.
Fig. S3. Modeling the correlation between ethnicity and community state types.
Fig. S4. Graphical representation of community state type transitions observed between all consecutive pairs of time points (905 transitions, Table 1) and their frequencies among all women.
Fig. S5. Individual profiles of dynamics of vaginal bacterial communities over 16 weeks for each of 32 women enrolled in the study.
Fig. S6. Shannon diversity indices over the menstrual time in 32 women samples twice weekly for 16 weeks.
Fig. S7. Principal component analysis of metabolic profiles.
Table S1. Descriptive characteristics of the 32 women enrolled in the 16-week longitudinal study.
Table S2. Taxonomic assignments, relative abundances of taxa, and metadata.
Table S3. The coefficients and p-values of binary community class and sexual activity for the optimal linear mixed effects model.
Movie S1. Animation of the vaginal microbiota dynamics in subject 26 over a 16-week period
Movie S2. Animation of the vaginal microbiota dynamics in subject 13 over a 16-week period
Author contributions. RMB, JR and LJF conceived the original idea and designed the study. JS, SSKK, LF conducted most of the experiments. GB performed the metabolomic experiments and analyses. PG and ZA developed, implemented and performed the statistical modeling analyses. UMES and XZhong contributed to the statistical modeling analyses. ZM and XZhou contributed to the interpretation of the data. PG, RMB, GB, ZA, LJF and JR interpreted the data and wrote the manuscript.
Competing interests. The authors declare that they have no competing interests.
Accession numbers. The sequence data is available in the small read archive (SRA) (accession no. SRA026073). The metadata associated with the sequence data is available in dbGap (dbGap study no. phs000261).
References
- 1.Sobel JD. Is There a Protective Role for Vaginal Flora? Current infectious disease reports. 1999;1:379. doi: 10.1007/s11908-999-0045-z. [DOI] [PubMed] [Google Scholar]
- 2.Witkin SS, Linhares IM, Giraldo P. Bacterial flora of the female genital tract: function and immune regulation. Best Pract Res Clin Obstet Gynaecol. 2007;21:347. doi: 10.1016/j.bpobgyn.2006.12.004. [DOI] [PubMed] [Google Scholar]
- 3.Ravel J, et al. The vaginal microbiome of reproductive age women. Proc Nat Acad Sci. 2011;108:4680. doi: 10.1073/pnas.1002611107. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Yamamoto T, Zhou X, Williams CJ, Hochwalt A, Forney LJ. Bacterial populations in the vaginas of healthy adolescent women. J Pediatr Adolesc Gynecol. 2009;22:11. doi: 10.1016/j.jpag.2008.01.073. [DOI] [PubMed] [Google Scholar]
- 5.Dunstan PK, Johnson CR. Linking richness, community variability, and invasion resistance with patch size. Ecology. 2006;87:2842. doi: 10.1890/0012-9658(2006)87[2842:lrcvai]2.0.co;2. [DOI] [PubMed] [Google Scholar]
- 6.Marrazzo JM. Interpreting the epidemiology and natural history of bacterial vaginosis: are we still confused? Anaerobe. 2011;17:186. doi: 10.1016/j.anaerobe.2011.03.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Sobel JD. What’s new in bacterial vaginosis and trichomoniasis? Infectious disease clinics of North America. 2005;19:387. doi: 10.1016/j.idc.2005.03.001. [DOI] [PubMed] [Google Scholar]
- 8.Martin HL, et al. Vaginal lactobacilli, microbial flora, and risk of human immunodeficiency virus type 1 and sexually transmitted disease acquisition. J Infect Dis. 1999;180:1863. doi: 10.1086/315127. [DOI] [PubMed] [Google Scholar]
- 9.Hillier SL, et al. Association between bacterial vaginosis and preterm delivery of a low-birth-weight infant. The Vaginal Infections and Prematurity Study Group. N Engl J Med. 1995;333:1737. doi: 10.1056/NEJM199512283332604. [DOI] [PubMed] [Google Scholar]
- 10.Amsel R, et al. Nonspecific vaginitis. Diagnostic criteria and microbial and epidemiologic associations. Am J Med. 1983;74:14. doi: 10.1016/0002-9343(83)91112-9. [DOI] [PubMed] [Google Scholar]
- 11.Nugent RP, Krohn MA, Hillier SL. Reliability of diagnosing bacterial vaginosis is improved by a standardized method of gram stain interpretation. J Clin Microbiol. 1991;29:297. doi: 10.1128/jcm.29.2.297-301.1991. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Marrazzo JM, et al. Bacterial vaginosis: identifying research gaps proceedings of a workshop sponsored by DHHS/NIH/NIAID. Sexually transmitted diseases. 2010;37:732. doi: 10.1097/OLQ.0b013e3181fbbc95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Zhou X, et al. Differences in the composition of vaginal microbial communities found in healthy Caucasian and black women. ISME J. 2007;1:121. doi: 10.1038/ismej.2007.12. [DOI] [PubMed] [Google Scholar]
- 14.Fredricks DN, Marrazzo JM. Molecular methodology in determining vaginal flora in health and disease: its time has come. Current infectious disease reports. 2005;7:463. doi: 10.1007/s11908-005-0049-2. [DOI] [PubMed] [Google Scholar]
- 15.Forney LJ, et al. Comparison of self-collected and physician-collected vaginal swabs for microbiome analysis. J Clin Microbiol. 2010;48:1741. doi: 10.1128/JCM.01710-09. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Fredricks DN, Fiedler TL, Marrazzo JM. Molecular identification of bacteria associated with bacterial vaginosis. N Engl J Med. 2005;353:1899. doi: 10.1056/NEJMoa043802. [DOI] [PubMed] [Google Scholar]
- 17.Srinivasan S, Fredricks DN. The human vaginal bacterial biota and bacterial vaginosis. Interdiscip Perspect Infect Dis. 2008;2008:750479. doi: 10.1155/2008/750479. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Brotman RM, Ravel J, Cone RA, Zenilman JZ. Rapid fluctuation of the vaginal microbiota measured by Gram stain analysis. Sex Trans Infect. 2010 doi: 10.1136/sti.2009.040592. in press. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Schwebke JR, Richey CM, Weiss HL. Correlation of behaviors with microbiological changes in vaginal flora. J Infect Dis. 1999;180:1632. doi: 10.1086/315065. [DOI] [PubMed] [Google Scholar]
- 20.Minassian SS, Wu CH. Free and protein-bound progesterone during normal and luteal phase defective cycles. International journal of gynaecology and obstetrics: the official organ of the International Federation of Gynaecology and Obstetrics. 1993;43:163. doi: 10.1016/0020-7292(93)90324-p. [DOI] [PubMed] [Google Scholar]
- 21.Brotman RM, et al. The effect of vaginal douching cessation on bacterial vaginosis: a pilot study. Am J Obstet Gynecol. 2008;198:628.e1. doi: 10.1016/j.ajog.2007.11.043. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Brotman RM, Ravel J, Cone RA, Zenilman JM. Rapid fluctuation of the vaginal microbiota measured by Gram stain analysis. Sexually transmitted infections. 2010;86:297. doi: 10.1136/sti.2009.040592. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23.Caporaso JG, et al. QIIME allows analysis of high-throughput community sequencing data. Nature methods. 2010;7:335. doi: 10.1038/nmeth.f.303. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26:2460. doi: 10.1093/bioinformatics/btq461. [DOI] [PubMed] [Google Scholar]
- 25.Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. UCHIME improves sensitivity and speed of chimera detection. Bioinformatics. 2011;27:2194. doi: 10.1093/bioinformatics/btr381. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73:5261. doi: 10.1128/AEM.00062-07. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Pinheiro J, Bates D, DebRoy S, Sarkar D. 2006 [Google Scholar]
- 28.Diggle PJ, Heagerty P, Liang KY, Zeger SL. The analysis of longitudinal data. 2. Oxford university Press; 2002. [Google Scholar]
- 29.Zuur AF, Leno EN, Walker NJ, Saveliev AA, Smith GM. Mixed Effects Models and Extensions in Ecology with R. Springer; 2009. [Google Scholar]
- 30.Verbeke G, Molenberghs G. Linear Mixed Models for Longitudinal Data. Springer Verlag; 2009. p. 568. [Google Scholar]
- 31.Wood SN. Thin-plate regression splines. Journal of the Royal Statistical Society (B) 2003;65:95. [Google Scholar]
- 32.Wood SN. Generalized additive models. an introduction with R. CRC Press; 2006. p. 391. [Google Scholar]
- 33.Aue WP, Karham J, Ernst RR. Homonuclear broadband decoupling and two dimensional J-resolved NMR spectroscopy. Journal of Chemical Physics. 1976;64:4226. [Google Scholar]
- 34.Bax A, Davis DG. MLEV-17-based two-dimensional homonuclear magnetization transfer spectroscopy. J Magn Reson. 1985;65:355. [Google Scholar]
- 35.Delaglio F, et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR. 1995;6:277. doi: 10.1007/BF00197809. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.