Significance
Childhood malnutrition is a global health problem not attributable to food insecurity alone. Sequencing DNA viruses present in fecal microbiota serially sampled from 0- to 3-y-old Malawian twin pairs, we identify age-discriminatory viruses that define a “program” of assembly of phage and eukaryotic components of the gut “virome” within and across pairs where both cotwins manifest healthy growth. This program is perturbed (delayed) in both members of discordant pairs where one cotwin develops severe acute malnutrition and the other appears healthy by anthropometry. This developmental delay is not repaired by therapeutic foods. These age- and disease-discriminatory viruses may help define familial risk for childhood malnutrition and provide a viral dimension for characterizing the developmental biology of our gut microbial “organ.”
Keywords: assembly of the human gut DNA virome, childhood malnutrition, age/disease-discriminatory phage and eukaryotic viruses, gnotobiotic mice, epidemiology
Abstract
The bacterial component of the human gut microbiota undergoes a definable program of postnatal development. Evidence is accumulating that this program is disrupted in children with severe acute malnutrition (SAM) and that their persistent gut microbiota immaturity, which is not durably repaired with current ready-to-use therapeutic food (RUTF) interventions, is causally related to disease pathogenesis. To further characterize gut microbial community development in healthy versus malnourished infants/children, we performed a time-series metagenomic study of DNA isolated from virus-like particles (VLPs) recovered from fecal samples collected during the first 30 mo of postnatal life from eight pairs of mono- and dizygotic Malawian twins concordant for healthy growth and 12 twin pairs discordant for SAM. Both members of discordant pairs were sampled just before, during, and after treatment with a peanut-based RUTF. Using Random Forests and a dataset of 17,676 viral contigs assembled from shotgun sequencing reads of VLP DNAs, we identified viruses that distinguish different stages in the assembly of the gut microbiota in the concordant healthy twin pairs. This developmental program is impaired in both members of SAM discordant pairs and not repaired with RUTF. Phage plus members of the Anelloviridae and Circoviridae families of eukaryotic viruses discriminate discordant from concordant healthy pairs. These results disclose that apparently healthy cotwins in discordant pairs have viromes associated with, although not necessarily mediators of, SAM; as such, they provide a human model for delineating normal versus perturbed postnatal acquisition and retention of the gut microbiota’s viral component in populations at risk for malnutrition.
Malnutrition (undernutrition) is a leading cause of child mortality worldwide (1). Severe acute malnutrition (SAM) can manifest itself as progressive wasting (marasmus) or as a more abrupt onset syndrome characterized by generalized edema, hepatic steatosis, skin rashes and ulcerations, and anorexia (kwashiorkor). The configuration of the bacterial component of the gut microbiota of healthy infants evolves to an adult-like configuration during the first 2–3 y of life (2, 3). Normal postnatal maturation of the gut microbial community is perturbed in SAM; children with SAM living in Malawi and in Bangladesh have gut microbiota with bacterial configurations that appear younger (more immature) than the microbiota of chronologically age-matched individuals with healthy growth phenotypes (3, 4). Moreover, this immaturity is only transiently improved with current ready-to-use therapeutic food (RUTF) interventions (3, 4). These children can be viewed as having a persistent developmental abnormality—one that affects a microbial “organ” whose key functions include the biosynthesis of vitamins and the biotransformation of dietary components into products that benefit members of the gut microbial community and their host (2–5).
A study of 317 twin pairs from five rural villages in southern Malawi showed that discordance for moderate acute malnutrition (MAM) and SAM was surprisingly high during the first 3 y of life (43% of pairs) and not significantly different between mono- and dizygotic pairs (concordant undernourished pairs comprised 7% of the cohort) (4). The standard of care in Malawi is to treat both cotwins in pairs discordant for marasmus or kwashiorkor with a peanut-based RUTF for several weeks until a threshold increase in weight has been achieved (both siblings in the pair are treated to avoid potential problems arising from maternal food-sharing practices that emphasize the diseased child and neglect the healthy cotwin) (4, 6). Although short-term administration of RUTF has dramatically reduced mortality, it generally does not ameliorate the long-term morbidities associated with malnutrition—stunting, neurodevelopmental abnormalities, and immune dysfunction (e.g., refs. 6–10).
Transplantation of fecal samples obtained from children with kwashiorkor and their apparently healthy cotwins into separate groups of adult germ-free mice consuming a prototypic macro- and micronutrient-deficient Malawian diet resulted in transmission of discordant weight loss and metabolic and gut barrier dysfunction phenotypes to the animals. Development of these pathologic phenotypes was diet-dependent: they were not observed, or were dramatically mitigated, when gnotobiotic mice harboring a kwashiorkor microbiota received a healthy diet with adequate nutrients (4, 11). Together, these findings indicate that the gut microbiota is causally related to SAM (4) but also raise the question of how discordance for SAM arises and whether the cotwin classified as “healthy” by anthropometry has an underlying perturbation in his/her gut community that reflects familial risk for development of pathology. To address these issues, we focused on the most variable component of the human gut microbial community, the DNA virome. Moreover, although surveys of DNA viruses present in the gut microbiota of healthy adults had revealed a dominance of phage, in particular of lysogenic phages (prophages) (12–15), almost nothing was known about the normal pattern of assembly of the virome and the factors that shape this aspect of postnatal microbiota development (16).
Results
Fecal samples used for the present study had been collected from a subset of the larger 317 twin pair cohort: this subset consisted of 8 monozygotic and 12 dizygotic Malawian twin pairs 0–30 mo of age living in five rural villages (see Table S1 for subject characteristics). Six of the discordant pairs contained a cotwin who developed kwashiorkor, whereas the other twin remained healthy. In the other six discordant pairs, one cotwin developed marasmus, whereas the other sibling remained healthy. Eight other pairs remained concordant for normal growth as defined by anthropometry (Table S1). Both siblings in each discordant pair were treated for 2–8 wk with a peanut-based RUTF. In addition, if fecal samples were available, we characterized the viromes of the twins’ mother and an older sibling.
A total of 231 fecal samples were collected from twins, their mothers, and an older sibling at the time points shown in Fig. S1A. The samples were frozen immediately in cryogenic storage containers maintained at liquid nitrogen temperature, subsequently stored at −80 °C and then used as the starting material for purification of virus-like particles (VLPs) (SI Methods). VLP DNA isolated from each sample and technical replicates from six randomly selected samples were subjected to multiple displacement amplification (MDA) and shotgun pyrosequencing [53,334 ± 2,290 reads/sample (average ± SEM); 365 ± 121 nt/read (average ± SD); Table S1]. On average, 62.4 ± 23% (mean ± SD) of the reads per sample had no significant similarity to sequences in public DNA sequence databases (Fig. S1B) and 35 ± 23% (mean ± SD) had significant hits to an updated viral nonredundant (NR) database [Viral_NR_DB (13)], whereas only 0.95 ± 1.2% of reads had unique hits to a database of 128 human gut-associated bacterial genomes (17) (this latter result also highlights the quality of VLP purification before DNA extraction).
The dataset of raw pyrosequencing reads and a cross-assembly strategy described in Fig. S2A and SI Methods were used to assemble a total of 17,676 contigs ≥500 nt (largest, 228,572 nt); 85 ± 9% of the raw reads per VLP DNA sample mapped to these contigs when a threshold nucleotide sequence identity of ≥95% over the length of the read was applied (Table S1). Analyzing the size distribution of the contigs as a function of their sequencing coverage (Fig. S2B), and considering those with overlapping termini, we identified three distinct size ranges for circular contigs: (i) >30 Kb (the expected size range for circular dsDNA phages belonging to the Caudovirales); (ii) 6–7 Kb [expected size for ssDNA phages in the Microviridae family, notably the Alpavirinae (18)]; and (iii) 3–4 Kb (expected size for ssDNA eukaryotic viruses in the Anelloviridae family). The results were consistent with our taxonomic assignments (Fig. S2C and Table S2): (i) 4,048 contigs had significant similarity to known members of the Caudovirales; (ii) 395 contigs were assigned to Microviridae [164 of these contigs were classified as belonging to the Alpavirinae, a recently described subfamily of temperate phages associated with members of the Bacteroidetes (18)]; and (iii) 2,414 contigs had significant similarity to members of the Anelloviridae, a family of single-stranded viruses that infect different eukaryotic hosts including humans (see Fig. S2 D and E for further taxonomic classification).
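The size-range reasoning above can be illustrated with a minimal sketch. It assumes that circularity is flagged by an exact repeat of the contig's start at its end; the text does not state how overlapping termini were detected, and the function names and the `min_overlap` threshold are hypothetical.

```python
def has_overlapping_termini(seq, min_overlap=10):
    """True if some prefix of length >= min_overlap is repeated at the 3' end.

    min_overlap is an illustrative threshold, not a value from the study.
    """
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return True
    return False


def circular_size_class(length_nt):
    """Map a circular contig's length to the three size ranges quoted in the text."""
    kb = length_nt / 1000
    if kb > 30:
        return "dsDNA Caudovirales range (>30 Kb)"
    if 6 <= kb <= 7:
        return "ssDNA Microviridae range (6-7 Kb)"
    if 3 <= kb <= 4:
        return "ssDNA Anelloviridae range (3-4 Kb)"
    return "outside the three ranges"


if __name__ == "__main__":
    # Toy sequence whose first 12 nt are repeated at the end.
    toy = "ACGTTGCAAGCT" + "T" * 50 + "ACGTTGCAAGCT"
    print(has_overlapping_termini(toy), circular_size_class(3500))
```

A real pipeline would probably tolerate mismatches in the terminal overlap; exact matching is used here only to keep the sketch short.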
Features of Virome Assembly/Development in Young Malawian Twin Pairs.
Raw reads from each virome were mapped to the assembled contigs and a normalized matrix of the number of reads per Kbp of contig sequence per million raw reads per VLP DNA sample (RPKM) was built (a “viral contig abundance matrix”; SI Methods). β-Diversity was measured using the Hellinger distance metric on the log-transformed matrix. This is analogous to using tables of bacterial operational taxonomic units (OTUs) for measurements of the degree of similarity between different (gut) microbial community samples. Distances were computed between fecal viromes sampled from a given individual over time (intrapersonal variation), as well as between individuals belonging to his/her family (interpersonal comparisons of cotwins, twin-mother or twin-older sibling), or to other families (interpersonal comparisons of unrelated twins, unrelated mothers/older siblings, or unrelated twins to unrelated mothers/older siblings).
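As a rough illustration of the abundance matrix and distance metric described above, the sketch below computes RPKM values for one sample and a Hellinger distance between two samples. It rests on several assumptions not spelled out in the text: the log transform is taken as log(1 + x), and the Hellinger distance follows the convention common in community ecology (range 0 to √2, without the 1/√2 scaling used in probability theory). All function names are hypothetical.

```python
import math


def rpkm(mapped_reads, contig_lengths_nt, total_raw_reads):
    """Reads per Kb of contig per million raw reads, for one VLP DNA sample."""
    return [
        reads / (length / 1e3) / (total_raw_reads / 1e6)
        for reads, length in zip(mapped_reads, contig_lengths_nt)
    ]


def log_transform(values):
    """log(1 + x); the exact transform used in the study is an assumption here."""
    return [math.log1p(v) for v in values]


def hellinger(x, y):
    """Hellinger distance between two non-negative abundance profiles."""
    sx, sy = sum(x), sum(y)
    if sx == 0 or sy == 0:
        raise ValueError("each profile needs at least one non-zero abundance")
    return math.sqrt(
        sum((math.sqrt(a / sx) - math.sqrt(b / sy)) ** 2 for a, b in zip(x, y))
    )


if __name__ == "__main__":
    # Toy example: three contigs, two samples (numbers are invented).
    lengths = [2000, 6500, 35000]
    sample_a = rpkm([50, 10, 400], lengths, 1_000_000)
    sample_b = rpkm([5, 80, 350], lengths, 1_000_000)
    print(hellinger(log_transform(sample_a), log_transform(sample_b)))
```

Distances computed this way for every pair of samples can then be grouped into intrapersonal, within-family, and between-family comparisons.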
The highest similarity between fecal DNA viromes was within an individual over time. Cotwins were more similar to each other; this relationship was not significantly affected by zygosity (P = 0.47; Mann–Whitney test); significantly greater differences in viromes were observed between a twin and his/her mother or older sibling or between any two unrelated individuals (Fig. S3A). The similarities between the fecal DNA viromes of unrelated young Malawian twins were significantly higher than between the twins and their mothers or older siblings (Fig. S3A). This latter finding emphasizes the importance of age as a variable affecting virome composition during the first 3 y of life. Age is also a major driver of variation in bacterial community composition, as illustrated by applying a phylogenetic distance metric (unweighted UniFrac) to 16S rRNA datasets generated from the same fecal samples used to purify VLPs (Fig. S3B).
Previous analysis of viromes sampled from fecal VLPs purified from healthy adult twin pairs and their mothers living in the United States showed that each individual’s collection of viruses was highly distinctive and stable (13) (i.e., the Hellinger distances between viromes sampled from cotwins were not significantly different from the distances between the cotwin and mother or another unrelated person). Although a few phages were identified as shared across members of the small cohort of adult US twins (13, 19), the results emphasized the high degree of interpersonal variation that existed between these adults. In contrast, the Malawian twins, who differ from the adult US cohort in a number of respects including age, geographic location, health status, and hygiene practices, exhibited much more substantial similarity in their early-life DNA virome membership.
Age-Discriminatory Viral Contigs.
To determine which viruses were responsible for the age signal described above, we first used a rarefied cross-assembly matrix to calculate two metrics: Shannon diversity and “predicted observed species.” Samples were clustered into 5-mo age bins [a window wide enough to incorporate a sufficient number of samples for analysis while still being narrow enough to not compromise the number of time points (bins) needed to show age-associated changes]. Using this approach, both measures of α-diversity increase significantly as a function of age for single-stranded phages [slope, 0.129; R² = 0.099; P < 0.0001 (one-way ANOVA post test for linear trend); primarily members of Alpavirinae, which, as noted above, are associated with the Bacteroidetes (18)] and bacteria (slope for Shannon diversity index, 0.383; R² = 0.501; P < 0.0001). In contrast, the α-diversity of eukaryotic ssDNA viruses (Fig. S4) exhibits a negative albeit not statistically significant correlation with age (slope, −0.043; R² = 0.015; P = 0.12).
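The diversity-versus-age analysis can be sketched as follows. This is a simplified stand-in: it uses the natural-log Shannon index and an ordinary least-squares slope, whereas the study reports a one-way ANOVA post test for linear trend. The function names are hypothetical.

```python
import math


def shannon(counts):
    """Shannon diversity index (natural log) from a vector of abundances."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def age_bin(age_mo, width_mo=5):
    """Index of the 5-mo age bin a sample falls into (0 = 0 to <5 mo, ...)."""
    return int(age_mo // width_mo)


def ols_slope_r2(xs, ys):
    """Least-squares slope and R^2 of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Invented toy data: (age in months, per-contig abundances).
    samples = [(2, [9, 1]), (7, [6, 3, 1]), (13, [4, 3, 2, 1]), (19, [3, 3, 2, 2])]
    bins = [age_bin(age) for age, _ in samples]
    diversity = [shannon(counts) for _, counts in samples]
    print(ols_slope_r2(bins, diversity))
```

A positive slope corresponds to diversity increasing with age, and a slope near zero with low R² to the weak trend reported for eukaryotic ssDNA viruses.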
A Random Forests regression model was then trained to determine how well the chronologic age of a healthy donor of a given fecal sample could be predicted based on the DNA virome. Linear regression of predicted age against donor chronological age over the range of 6–22 mo yielded a regression coefficient of 0.6 (Fig. 1A). Fig. 1B shows a heat map of 22 age-discriminatory contigs that, when applying the Random Forests machine learning method, explain 68.7 ± 0.31% (mean ± SD) of the observed variance for concordant healthy twin pairs (compared with 54.5 ± 3.1% when using the full set of contigs) and yield a regression coefficient of 0.7. After summarizing the viral abundance matrix by the assignable taxonomy of its component contigs and sorting samples as a function of age, we found that (i) ssDNA eukaryotic viruses belonging to the Anelloviridae are highly abundant in the DNA viromes of healthy infants and children until 15–18 mo of age, after which time, the abundance of Anelloviridae diminishes; (ii) ssDNA phages belonging to the Alpavirinae subfamily of the Microviridae increase in abundance as a function of age; and (iii) dsDNA phages assigned to the family Siphoviridae in the order Caudovirales are highly abundant in VLP samples from 0 to 10 mo of age and then slowly decrease (Fig. S5A). The high representation of this family from the Caudovirales during early phases of virome assembly and the high abundance of unclassified viruses during later months could reflect the high representation of phages from the Proteobacteria and Actinobacteria (early gut colonizers) in public DNA sequence databases and the paucity of full genomes for phages that use gut Bacteroidetes and Firmicutes as their hosts.
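For readers unfamiliar with the "percent of variance explained" figure quoted for regression forests, the sketch below shows one common definition, 100 × (1 − MSE/variance of the observed ages), applied to held-out (for example, out-of-bag) predictions. Whether the study used exactly this formula is an assumption, and the function name is hypothetical.

```python
def percent_variance_explained(actual, predicted):
    """100 * (1 - MSE / Var(actual)), using population variance.

    `predicted` should come from samples not used to fit the model
    (e.g., out-of-bag predictions); this definition is an assumption.
    """
    n = len(actual)
    mean = sum(actual) / n
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    var = sum((a - mean) ** 2 for a in actual) / n
    return 100 * (1 - mse / var)


if __name__ == "__main__":
    # Invented ages (months) and predictions, for illustration only.
    ages = [6, 10, 14, 18, 22]
    preds = [8, 10, 13, 17, 19]
    print(round(percent_variance_explained(ages, preds), 1))
```

Under this definition, a model that always predicts the mean age scores 0%, and a model can score below 0% if it does worse than the mean.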
Viral Contigs That Distinguish Families.
Table S3 shows that there were contigs present in at least one fecal VLP sample from all 20 families, as well as contigs present in up to 60% of the samples regardless of family. The observed family clustering (Fig. S6A) suggested that it should be possible to use viral contig representation to predict family of origin using Random Forests (see SI Methods and Fig. S7 for details about implementation including criteria used for discriminatory feature selection). The Random Forests classifier accurately assigned twin-pair DNA viromes by family of origin [Out-Of-Bag (OOB) error rate of 6.4 ± 0.66% (mean ± SD) using the discriminatory contigs]. A heat map of the abundances of the most discriminatory contigs revealed that twin pairs shared a large percentage of their viromes, whether they were concordant for healthy status or discordant for SAM (Fig. S6B and Table S4). Interestingly, significantly more contigs discriminated families with SAM discordance than families with concordant healthy twin pairs (P = 0.02; Kruskal–Wallis test; Fig. S6C).
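The out-of-bag (OOB) error rate quoted above can be illustrated with a small sketch. It assumes the standard bagging setup: each tree is fitted on a bootstrap sample, and each sample is scored only by trees that did not see it during fitting. The data structure and function name are hypothetical, and the per-tree predictions are supplied directly rather than produced by fitted trees.

```python
from collections import Counter


def oob_error_rate(labels, trees):
    """Fraction of samples misclassified by majority vote of their OOB trees.

    trees: list of (in_bag_indices, predictions) pairs, one per tree, where
    predictions[i] is that tree's predicted label for sample i.
    """
    wrong = scored = 0
    for i, truth in enumerate(labels):
        votes = Counter(pred[i] for in_bag, pred in trees if i not in in_bag)
        if not votes:
            continue  # sample was in-bag for every tree; it cannot be scored
        scored += 1
        if votes.most_common(1)[0][0] != truth:
            wrong += 1
    if scored == 0:
        raise ValueError("no sample has an out-of-bag prediction")
    return wrong / scored


if __name__ == "__main__":
    # Invented toy example: four samples from two families, three trees.
    labels = ["F1", "F1", "F2", "F2"]
    trees = [
        ({0, 2}, ["F1", "F1", "F2", "F1"]),
        ({1, 3}, ["F1", "F1", "F2", "F2"]),
        ({0, 1}, ["F1", "F1", "F2", "F1"]),
    ]
    print(oob_error_rate(labels, trees))
```

Because every sample is evaluated only by trees that never trained on it, the OOB error serves as a built-in estimate of generalization error without a separate test set.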
Contigs That Distinguish Fecal DNA Viromes in Kwashiorkor and/or Marasmus Twin Pairs Compared with Concordant Healthy Pairs.
β-Diversity measurements, based on the Hellinger metric, revealed that fecal viromes sampled from both members of SAM discordant pairs were less variable than those from concordant healthy pairs (i.e., both members of the discordant pairs did not develop more individualistic viromes). This significant reduction in variation was evident in kwashiorkor or marasmus discordant pairs at the time of diagnosis and endured during and following treatment with RUTF (Fig. 2A). These findings suggested that the DNA virome is perturbed in both members of these twin pairs, even though only one member manifests overt disease.
After removing family- and age-discriminatory contigs, we used Random Forests to determine whether health status could be predicted from virome composition. Classification proved to be quite accurate; 125 of the most discriminatory disease-associated viral contigs produced an OOB error rate of 9.61 ± 0.72% (see Fig. S5B for a heat map of these contigs as a function of health status). Importantly, their presence was not limited to the affected cotwin but rather was indicative of twin pairs discordant for marasmus or kwashiorkor; 80 of these disease-discriminatory contigs were significantly associated with pairs containing a child with kwashiorkor, and another 18 contigs were significantly associated with pairs containing a cotwin with marasmus, whereas a third subset of 27 contigs were discriminatory for concordant healthy individuals, either based on their presence in these pairs or by their absence in healthy and presence in discordant pairs (see Table S4 for contigs that fall into these different categories and their annotations).
Fig. 2B presents a subset of 16 contigs, from the group of 125, which comprise a sparse Random Forests-derived model with an OOB error rate equivalent to that achieved with the full dataset of contigs. As is typical with viromes, most of the ORFs do not have significant similarity with known genes. The most common recognizable functions of their encoded proteins are related to virion structures (e.g., tail fibers, capsids) and integration of phage into the host genome (e.g., integrases, transposases, and regulatory genes such as CI). ORFs assigned to the Torque Teno Virus family were found in 5 of the 16 contigs. One of the disease-discriminatory contigs includes a gene specifying an Ig domain-containing protein similar to those identified by Minot et al. (15) and hypothesized to be responsible for a microbiome-derived adaptive immune system (20). We subsequently selected all proteins from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database with annotations of “Ig-like” or “Ig-domain” and blasted all identified proteins in the full dataset of 17,676 contigs to this Ig-only database (e value threshold, 1 × 10⁻³). A total of 384 proteins from 327 contigs had significant hits. The distribution of the Ig-like and Ig-domain proteins was compared among the taxonomically annotated contigs and contigs that were discriminatory for the different variables analyzed. Genes encoding these proteins were significantly enriched in the Caudovirales, in particular Podoviridae (χ² P < 0.0001). Contigs containing these Ig-like genes were enriched among family-discriminatory but not health status-discriminatory contigs (χ² P < 0.0001 and 0.636, respectively).
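The enrichment comparisons above are reported as χ² tests. The sketch below shows a Pearson χ² test for a 2 × 2 table (no continuity correction), with the P value for 1 degree of freedom obtained from the complementary error function. The layout of the study's contingency tables is not given in the text, so the 2 × 2 form is an assumption, the counts in the example are invented, and the function name is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value (1 df) for the table [[a, b], [c, d]].

    Example layout (an assumption): rows = contig in / not in a category,
    columns = contig carries / lacks an Ig-like gene.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Invented counts, for illustration only.
    stat, p = chi2_2x2(20, 5, 5, 20)
    print(stat, p)
```

With small expected counts, an exact test would be a safer choice than the χ² approximation.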
A more comprehensive analysis of 70,160 predicted ORFs in all 17,676 contigs yielded 8,239 (11.74%) known or predicted proteins of which 64% were hypothetical or conserved hypothetical proteins; 44% of the remaining proteins were assigned to two KEGG categories “Nucleotide metabolism” and “Replication and repair,” a predominance that is not surprising for viruses (13). The very small number of proteins in other KEGG categories precluded us from conducting a suitably powered analysis that tests whether there are any functions enriched in the subset of 22 age-discriminatory or 125 disease-discriminatory contigs we had identified.
There was no statistically significant effect of RUTF on the abundances of the diseased twin pair-discriminatory viral contigs compared with the last time point sampled before initiation of the food-based intervention (P = 0.4822; Friedman test) and no effect of zygosity (the latter conclusion comes with the caveat that the number of twin pairs studied is small). Moreover, these contigs were rare in older siblings or mothers with only one, belonging to Circoviridae, present at an abundance ≥ 1 RPKM in more than 50% of these other family members (Table S4A).
Thirty-seven of the disease pair-discriminatory contigs had assignable taxonomy to eukaryotic ssDNA viruses belonging to the Anelloviridae or Circoviridae; this number is significantly more than expected from the distribution of all contigs with assignable taxonomies (Fig. S6D; two-way ANOVA, P = 0.0208). (Note that our study revealed a large number of previously unidentified lineages in these two eukaryotic viral families; Fig. S2 D and E and SI Methods.)
Gnotobiotic Mouse Studies of the Kwashiorkor and Marasmus Viromes.
The identification of eukaryotic ssDNA viruses as a prominent component of the set of viruses that discriminate SAM discordant twin pairs from concordant healthy pairs, but not the cotwin with normal anthropometry from his/her undernourished twin, raised the question of whether these viruses are causally related to disease pathogenesis. As noted above, transplantation of fecal microbiota from members of twin pairs discordant for kwashiorkor to adult germ-free mice consuming a prototypic Malawian diet transmitted discordant weight loss and metabolic phenotypes (including inhibition of the TCA cycle) and an enteropathy characterized by disruption of the small intestinal and colonic epithelial barrier (4, 11). Therefore, we isolated VLPs from fecal pellets that had been collected (and stored at −80 °C) from four groups of gnotobiotic mice sampled 3–32 d after the mice were gavaged with fecal microbiota obtained from kwashiorkor discordant pairs 56 and 57 [n = 10 animals/donor microbiota; donor microbiota were collected before subjects were treated with RUTF (4); see Table S5 for information about the time points when fecal samples were collected from mice; note that VLPs were also recovered from cecal contents harvested at the time the animals were killed]. Shotgun reads from the VLP DNAs [21,721 ± 6,869 reads/sample (mean ± SEM)] were used to query our dataset of 17,676 viral contigs from the human samples. A wide range (7–95%) of the reads from the different VLP samples mapped to a total of 87 contigs (successful transfer of these viruses to mice was defined as a contig present in at least one gut VLP sample at more than 0.1% relative abundance, with ≥10 reads mapping to that contig). Of these 87 contigs, 22 had assignable taxonomy; 19 of these belonged to the Caudovirales order (Table S5). 
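The transfer criterion stated above (a contig present in at least one mouse gut VLP sample at more than 0.1% relative abundance, with ≥10 mapped reads) can be written as a short filter. The sketch assumes that relative abundance is computed against the reads mapped to contigs in that sample; the text does not specify the denominator. The function name and input layout are hypothetical.

```python
def transferred_contigs(read_counts_by_sample, min_rel_abundance=0.001, min_reads=10):
    """Contigs meeting the transfer criterion in at least one sample.

    read_counts_by_sample: {sample_id: {contig_id: mapped_read_count}}
    Relative abundance = contig reads / all mapped reads in the sample
    (the choice of denominator is an assumption).
    """
    kept = set()
    for counts in read_counts_by_sample.values():
        total = sum(counts.values())
        if total == 0:
            continue
        for contig, reads in counts.items():
            if reads >= min_reads and reads / total > min_rel_abundance:
                kept.add(contig)
    return kept


if __name__ == "__main__":
    # Invented toy counts for two mouse VLP samples.
    samples = {
        "mouse1_day3": {"c1": 500, "c2": 9, "c3": 9491},
        "mouse2_day32": {"c1": 0, "c2": 12, "c4": 20000},
    }
    print(sorted(transferred_contigs(samples)))
```

In the toy data, contig c2 fails in both samples: it has too few reads in the first and falls below 0.1% relative abundance in the second.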
Only one of the phages that were successfully transferred and retained represented a SAM discriminatory biomarker with assigned taxonomy—this single-stranded phage was classified as a member of the family Inoviridae (Table S5). The phages that were detected in recipient gnotobiotic mice could either represent lytic viruses transferred with their corresponding host bacterial strains or prophages induced at various time points during the mouse experiment. No assignable eukaryotic viruses were detected in any of the fecal samples obtained from any of the mice at any of the time points surveyed, indicating that these human viruses were not retained in the guts of the gnotobiotic animals under the experimental conditions used. Our failure to capture these eukaryotic DNA viruses in mice is consistent with the fact that although a large number of Anelloviruses have been identified in domesticated and wild animals, including rodents, pigs, and nonhuman primates, successful infection of animal models using human Anelloviruses has yet to be reported (21).
Our findings also suggest that the Anelloviridae (and Circoviridae) detected in the microbiota of the SAM discordant pairs are not necessary or sufficient to produce the transmissible discordant weight loss, barrier disruption, and metabolic phenotypes previously documented in these recipient gnotobiotic mice. Complex transkingdom interactions are being documented between persistent enteric viruses (both DNA and RNA), members of the domain Bacteria as well as Eukarya, and components of the immune system (22–24). Anelloviruses have been identified as chronically infecting viruses and have been isolated from multiple body habitats and biofluids including bile, feces, saliva, urine, amniotic fluid, breast milk, cervical secretions plus sewage, suggesting several potential routes of transmission (25, 26). Although early “infection” with members of this family is almost universal [100% within the first 2 y of life in one Japanese study (27), with constant shedding in feces during the first year documented in another study (28)], there is, as of yet, no proof that Anelloviruses cause any disease (29, 30). Changes in the abundances of Anelloviruses in serum have been reported in lung transplantation patients (31, 32), patients with diverse respiratory tract infections (33), and those who develop AIDS (34). SAM is also associated with defects in immune function, including disturbances in the gut mucosal barrier (9, 11). At present, it is not clear whether these viruses “simply” provide a high-resolution map of disordered immune regulation or whether they are mediators of various aspects of immune function and dysfunction.
Addressing this question will be difficult; the host species specificities of Anelloviridae and Circoviridae and the inability to capture these viruses in gnotobiotic mice harboring human gut microbial communities (but not human immune cell repertoires) represent significant challenges to overcome when designing preclinical models for proof-of-concept tests of whether viruses in these families are “simply” biomarkers of SAM in these twin pairs or causally related to disease pathogenesis.
Epidemiologic Considerations.
Because Random Forests proved successful in accurately classifying viromes based on age and health status, we attempted to use this machine-learning approach to classify viromes based on seasonality and/or village of origin. Accurate predictions were limited to village-of-origin (OOB error rate, 26.35 ± 2.4%). Using a subset of 162 village-discriminatory contigs, it was possible to decrease the OOB error rate to 14.89 ± 0.65% (mean ± SD); 105 of these contigs were members of the Anelloviridae and Circoviridae (Figs. S2 D and E, S6D, and S8A). The distributions of anthropometric measures [weight-to-height Z score (WHZ), weight-to-age Z score (WAZ), and height-to-age Z score (HAZ)] varied between the villages such that there were significant differences between some pairwise comparisons (one-way ANOVA and post hoc Tukey’s tests). The greatest distinction involved Mitondo and Makhwira, which had more infants with lower WHZ and WAZ scores compared with Chamba, Mayaka, and M’biza (Fig. S8B). Mitondo and Makhwira are the two villages positioned in the lower Shire River Valley, where ambient temperatures are higher and rates of childhood illnesses, particularly malaria, are considerably greater than in the rest of the country (Fig. S8C). The distinct distributions of these virotypes should prove useful for subsequent epidemiologic and anthropologic studies that seek to address questions about factors that might affect how these viruses are acquired and transmitted between individuals and the contributions of these viruses to health status.
SI Methods
Sample Collection.
A team of one US pediatrician and a minimum of two trained local personnel visited each site every 1 to 2 wk, where the weight and height of each infant or child was measured, and each child was checked for bilateral pitting edema. Both siblings in each discordant pair were treated with a peanut-based RUTF that was produced in Malawi (36). Note that in the case of monozygotic twin pair 23, discordance was first manifest with one cotwin presenting with marasmus and the other with normal anthropometry. Following RUTF, the affected cotwin improved but was still classified as having MAM. In addition, the initially healthy sibling in this pair developed MAM in the period after RUTF. Therefore, both cotwins were treated with a second round of RUTF. In the case of pair 229, one of the cotwins presented with marasmus and was treated with RUTF but subsequently developed kwashiorkor following cessation of therapy.
During each visit for a scheduled fecal sample collection, each child wore a commercial disposable diaper lined with plastic. Each fecal specimen was flash-frozen in a cryogenic storage container filled with liquid nitrogen, within 10 min after it was produced. All samples were subsequently stored at −80 °C before analyses. For twin pairs who remained concordant for healthy growth, fecal samples were collected on average every 3 mo, yielding 5 ± 1 samples/individual (mean ± SD). In the case of twins who became discordant for kwashiorkor or marasmus, sampling was increased to every 2 wk during the period of RUTF treatment and additionally at 2 and 4 wk following cessation of RUTF, producing 8 ± 3 samples per child (4).
Purification of VLPs.
VLP purification and DNA extraction were performed with minor modifications to the procedure described previously that we and others have shown to result in minimal contamination with bacterial DNA (35, 37). In brief, a 100- to 300-mg aliquot of each fecal sample was resuspended in 0.4 mL SM buffer [100 mM NaCl, 8 mM MgSO4, 50 mM Tris (pH 7.5) and 0.002% gelatin (wt/vol); sterilized by passage through a 0.02-µm pore diameter filter (Whatman)]. After homogenization by vortexing for 5 min, samples were centrifuged twice at 2,500 × g for 10 min at 4 °C to remove large particles and bacterial cells. The resulting supernatant was filtered once through a 0.45-µm pore diameter Millex filter (Millipore) and twice through 0.22-µm pore diameter Millex filters (Millipore). The volume of the filtrate was adjusted to 200 µL with SM buffer if needed. Each sample was treated with 20 µL of lysozyme (10 mg/mL) for 30 min at 37 °C, followed by incubation for 10 min with 0.2 volumes of chloroform. The sample was then centrifuged at 2,500 × g for 5 min at room temperature. The aqueous phase was collected and incubated with 3 U of DNaseI (Sigma) and 20 µL of 10× DNase buffer (50 mM MgCl2, 10 mM CaCl2) for 1 h at 37 °C, after which time the enzyme was inactivated by incubation at 65 °C for 15 min.
To isolate DNA, purified VLPs were incubated with 10 µL of 10% (wt/vol) SDS and 1 µL of proteinase K (20 mg/mL; Sigma) for 20 min at 56 °C; 35 µL of 5 M NaCl and 28 µL of 10% (wt/vol) cetyltrimethylammonium bromide/0.7 M NaCl were then added, followed by incubation at 65 °C for 10 min. The sample was mixed with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1), vortexed, and centrifuged at 8,000 × g for 5 min at room temperature. The resulting aqueous phase was mixed with an equal volume of chloroform and spun at 8,000 × g for 5 min at room temperature. The aqueous phase from this step was passed through a Qiagen MinElute purification column (elution volume, 30 µL).
MDA was performed with Illustra GenomiPhi V2 (GE Healthcare Life Sciences), according to the manufacturer’s instructions (n = 3 independent reactions/VLP DNA sample to prevent single amplification bias). Reactions were subsequently pooled and the DNA product was purified using a Qiagen DNeasy purification kit (elution volume, 75 µL).
Quality-control assays using 16S rDNA PCR was performed to establish the absent to negligible bacterial DNA contamination obtained using the method we used for VLP purification from fecal samples (13).
Shotgun Pyrosequencing of VLP-Derived DNA.
For each library, 100 µL of total DNA in Tris-EDTA (TE) buffer (pH 7.0; 5 ng/µL) was fragmented by sonication in thin-walled 0.2-mL eight-strip PCR tubes using a BioruptorXL multisample sonicator (Diagenode) set on “high”; sonication occurred over the course of 8 min using successive cycles of 30 s “on” followed by 30 s “off.” Sonicated samples were subsequently cleaned using the MinElute 96 UF PCR Purification Kit (Qiagen) according to the manufacturer’s instructions. Each sonicated DNA sample in each well of the 96-well plate was eluted with 22 µL of nuclease-free sterile water. For end repair and A-tailing, 20 µL of sonicated DNA was added to 5 µL of a mixture containing 2.5 µL of 10× T4 DNA ligase buffer (NEB), 1 µL of 1 mM dNTPs (NEB), and 0.5 µL of each of the following enzymes: T4 polymerase (3 U/µL; NEB), T4 polynucleotide kinase (10 U/µL; NEB), and Taq polymerase (5 U/µL; Life Technologies). The solution was mixed by vortexing and then incubated for 30 min at 25 °C, followed by 20 min at 75 °C.
A total of 24 independent adapters were synthesized containing 24 different multiplex identifier (MID) barcodes. These barcodes were ligated to the A-tailed DNA sample in a 27-µL reaction by adding 1 µL of 25 µM adapter mix plus 1 µL of T4 DNA ligase (2,000,000 U/mL; NEB). The adapter mix was prepared by combining 12.5 µL of a 100 µM stock of each adapter oligo and 25 µL of oligo buffer (1× TE, 0.1 M NaCl), incubating the mixture at 95 °C for 1 min and then slowly decreasing the temperature (at a rate of 0.1 °C/s) until reaching 4 °C. After 30 min of incubation at 16 °C, 2.5 µL of 50 mM EDTA was added to stop the ligation reaction. Sets of 24 samples, all harboring different adapter sequences, were pooled. The pool was purified using Agencourt AMPure XP beads (Beckman Coulter) and quantified using the recommended 454 rapid library barcoded fluorescent-labeled adapters and a plate reader (Synergy2; Biotek). After quantification, normalized pools of 24 samples were sequenced using “454 FLX Titanium” chemistry.
Initial quality filtering of the raw pyrosequencer data consisted of parsing reads by their MID, followed by removal of short reads (less than 60 nt), reads with three or more ambiguous (“N”) bases anywhere in the sequence, reads with two continuous N bases, replicate reads (reads where the first 20 nt were >97% identical), and sequences with significant similarity to human reference genomes (blastn with e value < 1 × e−5; to ensure the deidentification of samples).
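The length, ambiguity, and replicate filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study. The function name `filter_reads` and its parameters are hypothetical, the sketch keeps the first read seen for each group of replicates (an assumption), and it treats ">97% identical over the first 20 nt" as an exact 20-nt prefix match, because a single mismatch in 20 nt already gives 95% identity. Demultiplexing by MID and the blastn screen against human reference genomes are not covered.

```python
def filter_reads(reads, min_len=60, max_n=2, prefix_len=20):
    """Return the reads that pass the length, ambiguity, and replicate filters.

    reads: iterable of nucleotide strings (already split by MID barcode).
    """
    kept = []
    seen_prefixes = set()
    for seq in reads:
        seq = seq.upper()
        if len(seq) < min_len:
            continue  # short read (< 60 nt)
        if seq.count("N") > max_n:
            continue  # three or more ambiguous bases anywhere in the read
        if "NN" in seq:
            continue  # two consecutive ambiguous bases
        prefix = seq[:prefix_len]
        if prefix in seen_prefixes:
            continue  # replicate: same first 20 nt as a read already kept
        seen_prefixes.add(prefix)
        kept.append(seq)
    return kept
```

Keeping the first member of each replicate group is one arbitrary choice; a real pipeline might instead keep the longest or highest-quality read.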
All pyrosequencing reads from VLP-derived viromes were used to query (blastx e value < 1 × e−5) (i) the KEGG database (v52); (ii) Cluster of Orthologous Groups (COG)/String database (v8.2); (iii) a database of 128 human gut-associated bacterial genomes (17) that contains representatives of major phylogenetic lineages present in the human gut microbiota but was used to measure the degree of human bacterial contamination rather than as an exhaustive collection of organisms; and (iv) an updated version of a custom NR_Viral_DB composed of a set of 7,077 sequences deduplicated at 95% identity. This latter database consisted of GenBank Viral Refseq entries plus all available complete viral genomes in National Center for Biotechnology Information (NCBI) and European Bioinformatics Institute and 512 predicted prophages identified using PhageFinder (38). A custom MySQL database was built to store information about each blast hit and the read coverage per VLP sample; this database was used to parse blast results and identify whether a read hit more than one database.
Assembly and Annotation of Viral Genomes.
Reads generated from VLP-derived DNA were submitted for assembly using the following pipeline. For each sample, CD-HIT v4.6 (39) was used to cluster reads at 90% global identity; from each cluster, the top five sequences were taken as representatives. These reads were used for de novo assembly with Newbler v2.8 (release 20120726_1306; 454 Life Sciences) using default parameters except for “minimum identity” (90%) and “minimum overlap” (20 nt). FR-HIT (40) was used to map all raw reads to the assembled contigs at 95% identity over the length of the read.
Following assembly of reads from each sample, a cross-assembly strategy was employed using all reads from all samples obtained from a given human family (the rationale being family members are more likely to share virotypes). Contigs >500 bp were first pooled together and deduplicated by blasting “all against all” and removing any contig that was contained in over 90% of its length within another contig. Within a human family, all raw reads that were not used for the individual sample assembly that generated the initial contigs were mapped to the pool of contigs generated from each of the individual assemblies within that given family using FR-HIT at 95% identity to identify contigs present at low abundances at specific time points that prevented contig assembly in a given sample. Reads that remained unmapped were used for de novo assembly using the same strategy used for individual samples. The resulting contigs were deduplicated, and reads were mapped again to identify potential chimeric contigs (contigs with a sudden drop in coverage or with positions where there were high numbers of partially mapped reads). Reads that mapped partially to the edge of contigs were pooled together and used in an attempt to extend the edges of these contigs, while checking for “consistency” (defined as ≥3× coverage with 100% identity). The extended contigs were subsequently assembled using Phrap to identify contigs with terminal overlaps that could be joined (Phrap parameters: -minmatch 20 -maxmatch 20 -bandwidth 5 -minscore 20 -gap_ext -4). The Phrap contigs were pooled with the previous contigs and the full set was dereplicated. Reads were mapped a final time to identify potential chimeric contigs and contigs with overlapping ends. The process yielded, in its final iteration, a total of 17,676 contigs. Script wrappers used to implement the assembly are available at https://github.com/GordonLab/Virome_cross_assembly.
Cross-Contig Comparison.
To analyze viral abundance and diversity within and between individuals, all raw reads generated from each VLP DNA sample were mapped against the assembled VLP contigs using FR-HIT at 90% identity cutoff. As noted in the main text, a matrix of reads per Kbp of contig sequence per million reads of sample (RPKM) was generated. This matrix was used as the equivalent of an OTU table for analysis of α- and β-diversity. Contigs with high similarity to Escherichia coli genomes (with their prophage sequences removed) were observed in cases where the VLP sample had very low yields of DNA after MDA (E. coli DNA contamination may reflect residual DNA contamination in the preparations of recombinant enzymes used for MDA). These contigs were identified (Table S2) and removed from further analysis.
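As an illustration of the normalization described above, the following Python sketch computes RPKM values for one sample. The function name `rpkm` and the dictionary-based inputs are hypothetical, and the sketch assumes that "million reads of sample" refers to the total number of reads in the sample, which the text does not specify.

```python
def rpkm(read_counts, contig_lengths_bp, total_reads):
    """Reads per kbp of contig per million reads of sample.

    read_counts: dict contig -> number of reads mapped to that contig.
    contig_lengths_bp: dict contig -> contig length in bp.
    total_reads: number of reads in the sample (assumed denominator).
    """
    values = {}
    for contig, count in read_counts.items():
        # count / (length_bp / 1e3) / (total_reads / 1e6), written with a
        # single 1e9 factor (1e3 x 1e6) to limit floating-point rounding
        values[contig] = count * 1e9 / (contig_lengths_bp[contig] * total_reads)
    return values
```

For example, 50 reads mapping to a 2,000-bp contig in a sample of 100,000 reads gives 50 / (2 kbp × 0.1 million reads) = 250 RPKM.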
Viral α- and β-Diversity.
For α-diversity analysis, the nonnormalized matrix of read counts mapping to each contig per sample was used and rarefied to 10,000 reads per sample. “Observed species” and the “Shannon diversity index” were calculated for each sample using QIIME (v1.8) (41). To calculate α-diversity metrics associated with specific viral taxa (single-stranded phages, single-stranded eukaryotic viruses, etc.), only the rows corresponding to contigs assigned to that corresponding taxonomy were selected from the rarefied matrix. β-Diversity analyses were performed using a log transformation of the normalized abundance matrix; the Hellinger metric was used as implemented in QIIME (v1.8). Distance matrices were used to generate Unweighted Pair Group Method with Arithmetic Mean (UPGMA) clustering or to evaluate within and between individual virome sample distances.
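The α-diversity calculations were run in QIIME. The standard-library Python sketch below only illustrates the underlying operations: rarefaction by subsampling without replacement, the observed species count, and the Shannon index. Function names, the fixed random seed, and the choice of log base 2 for the Shannon index are assumptions made for illustration.

```python
import math
import random


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement from contig -> read count.

    Returns None when the sample holds fewer than `depth` reads.
    """
    pool = [contig for contig, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)
    rarefied = {}
    for contig in rng.sample(pool, depth):
        rarefied[contig] = rarefied.get(contig, 0) + 1
    return rarefied


def observed_species(counts):
    """Number of contigs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts, base=2):
    """Shannon diversity index, -sum(p * log(p)), over non-zero counts."""
    total = sum(counts.values())
    h = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            h -= p * math.log(p, base)
    return h
```

To restrict the metrics to one viral taxon, the rarefied dictionary would be filtered to the contigs assigned to that taxon before calling `observed_species` or `shannon`.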
Random Forests Analysis.
Random Forests was used either as a classifier for discrete variables (“Family,” “Health Status,” “Village of Origin,” “Season”) or for regression models on quantitative variables (“Age”), as implemented in the randomForest library (v4.6-7) in R. To run the classifier, the dataset was randomly split into training (70%) and test (30%) datasets, making sure that the 70:30 split was even over the different characteristics evaluated. The same protocol was repeated 100 times to ensure that no bias in selection of the training and test sets would significantly affect the predictions. A total of 1,000 random trees were generated per iteration. The out-of-bag (OOB) error, feature importance, and predictions for the 100 iterations were averaged for the results presented. The feature importance score for a given contig was defined as the mean decrease in accuracy over all classes when the contig was removed from a random tree. To reduce the size of the dataset, contigs that were present in only one sample from one twin pair were removed from the analysis.
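The repeated 70:30 partitioning can be sketched as follows. This standard-library Python example covers only the splitting step, stratified by a single class label so that each class contributes about 70% of its samples to the training set. Model fitting (done with randomForest in R in the study) is outside its scope. The function names and the use of the iteration index as random seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into train/test, keeping the ratio within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        cut = round(train_fraction * len(members))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return sorted(train), sorted(test)


def repeated_splits(labels, n_iterations=100, train_fraction=0.7):
    """One stratified split per iteration, each with a different seed."""
    return [stratified_split(labels, train_fraction, seed=i)
            for i in range(n_iterations)]
```

Each of the 100 index pairs would then feed one model fit, with error rates and importance scores averaged across iterations.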
For feature selection, we used the R package VarSelRF (42) to select variables when Random Forests was used for classification; in the case of Age, the R package Boruta (43) was used for feature selection. Both packages select the minimum number of contigs capable of obtaining the same or better classification accuracy, thus minimizing the error. Because Random Forests was run 100 times with different training and test sets, the selected variables could vary from run to run. Furthermore, in cases with equally informative variables, Random Forests will pick one randomly each time. To select a final set with the minimum number of contigs capable of achieving the highest possible classification accuracy, we sorted the contigs in decreasing order by the number of times (out of the 100) that they were picked as important by either VarSelRF or Boruta; subsets of the contigs selected on 5–90% of the occasions were taken, and 100 iterations of Random Forests were performed with each subset. The subset with the minimum number of contigs that had an OOB error rate within 1 SD of the minimum obtained from all datasets was selected as the set of discriminatory contigs.
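The final selection step can be sketched in Python as below. Both function names are hypothetical. The sketch assumes that "within 1 SD of the minimum" means the mean OOB error of a subset is no greater than the lowest mean OOB error plus the SD of that best-performing subset, which is one possible reading of the text.

```python
def subset_by_frequency(pick_counts, min_fraction, n_runs=100):
    """Contigs picked as important in at least `min_fraction` of the runs."""
    return sorted(c for c, k in pick_counts.items() if k / n_runs >= min_fraction)


def pick_smallest_subset(results):
    """Choose the smallest subset whose error is within 1 SD of the minimum.

    results: list of (n_contigs, mean_oob_error, sd_oob_error), one per subset.
    """
    best = min(results, key=lambda r: r[1])
    threshold = best[1] + best[2]  # assumed: SD of the best-performing subset
    eligible = [r for r in results if r[1] <= threshold]
    return min(eligible, key=lambda r: r[0])
```

With made-up results `[(500, 0.060, 0.005), (200, 0.062, 0.006), (50, 0.064, 0.007), (10, 0.090, 0.010)]`, the threshold is 0.065 and the 50-contig subset is chosen.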
To identify all contigs capable of discriminating and classifying samples by family of origin, we measured two different properties of contigs relative to family assignment: (i) “sensitivity” (presence of contigs in samples of a given family/total number of samples where the contig was present); and (ii) “precision” (the presence of contigs in samples of a given family/total number of samples of a given family). We used sensitivity and precision because other metrics such as “specificity” measure true negatives, which in our case, represent absence of the contigs in samples from other families (will always constitute a large percentage of the samples, implying that this number will always be very low, hence biasing the analysis). We took the sum of the negative log2 of each fraction, where a value of 0 will be maximum sensitivity and precision, and larger numbers will indicate lower precision/sensitivity (defined as the “SP metric”).
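A minimal Python sketch of the SP metric follows, using the definitions of "sensitivity" and "precision" exactly as given in the text. The function name `sp_metric` and its argument names are hypothetical, and returning infinity for a contig that is absent from the family is an assumption added to avoid taking the logarithm of zero.

```python
import math


def sp_metric(n_family_with_contig, n_samples_with_contig, n_family_samples):
    """Sum of -log2(sensitivity) and -log2(precision) for one contig and family.

    n_family_with_contig: samples of the family in which the contig is present.
    n_samples_with_contig: all samples in which the contig is present.
    n_family_samples: all samples belonging to the family.
    """
    if n_family_with_contig == 0:
        return math.inf  # contig never seen in this family (assumed convention)
    sensitivity = n_family_with_contig / n_samples_with_contig
    precision = n_family_with_contig / n_family_samples
    return -math.log2(sensitivity) - math.log2(precision)
```

A contig found in all eight samples of a family and nowhere else scores 0. A contig found in four of a family's eight samples and in four samples from other families scores 1 + 1 = 2.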
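We also used an independent metric, mutual information (MI), in a matrix where contig abundances were binned in 10 groups of equal proportions. MI measures the “information” contained in a given contig regarding a specific family. Calculations of MI were based on the general formula, in which p(x, y) is the joint probability and p(x) and p(y) are the marginal probabilities:
$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}$$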
where X and Y are discrete variables, in this case, “family of origin” and “binned abundance of contigs” per sample. Given the distribution of abundances of contigs in each VLP sample and the corresponding families, it is possible to calculate the MI between the abundance of each contig among the samples and their families of origin. After calculating these metrics for all contigs as a function of each family, we found correlations between the two metrics (Fig. S7) (i.e., contigs have in common a low SP value and high MI). We then defined cutoff values for MI and SP to determine the minimal set of discriminatory contigs that minimizes the observed error rate when running Random Forests predictions. We took cutoffs (SP = 1.5, 2, 2.5, 3; and MI = 1, 0.08, 0.06) in all possible combinations; contigs that passed the specific cutoffs were selected and 100 replicates of Random Forests were run. Based on the results, we selected SP 2.0 and MI 0.08 as the cutoff giving the minimum number of contigs with an OOB error rate of 6.38 ± 0.66% (mean ± SD) that falls within 1 SD of the minimum observed.
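The MI calculation between family of origin and binned contig abundance can be illustrated with the standard-library Python sketch below. The function names are hypothetical and the logarithm base (2, giving bits) is an assumption, since the text does not state it. The rank-based binning splits tied abundances arbitrarily across adjacent bins, which is a simplification.

```python
import math
from collections import Counter


def equal_frequency_bins(values, n_bins=10):
    """Assign each value to one of `n_bins` bins of (near) equal size by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(xs, ys, base=2):
    """MI between two equal-length sequences of discrete labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), nxy in joint.items():
        p_xy = nxy / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)), base)
    return mi
```

In use, `xs` would hold the family label of each sample and `ys` the output of `equal_frequency_bins` applied to one contig's abundance across the same samples.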
Phylogenetic Analysis.
Viral contigs were trimmed if their 5′ and 3′ ends overlapped with >99% nucleotide identity. Getorf software from the European Molecular Biology Open Software Suite (EMBOSS) package was used to predict ORFs with canonical start and stop codons and with length greater than 600 nt. Predicted ORFs were queried against NCBI NR database using blastp with an e value cutoff of 1 × e−5. Sequences that shared significant similarity with the replication-associated protein (Rep) of Circoviridae or ORF1 of Anelloviridae were extracted. Representative viruses belonging to Circoviridae and Anelloviridae were selected according to International Committee on Taxonomy of Viruses (Table S6). The Rep proteins from the representative Circoviridae viruses and the ORF1 proteins from the representative Anelloviridae viruses were used for phylogenetic analysis. Multiple sequence alignments were performed with ClustalW (44). Phylogenetic analysis was based on the Neighbor Joining method in the MEGA v5.05 package (45) with 500 bootstrap replicates. Phylogenetic trees were visualized using TreeView (46).
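ORF prediction was performed with EMBOSS getorf. The Python sketch below is not a reimplementation of that tool. It only illustrates the criterion described above: an ORF starts at an ATG, ends at the first in-frame stop codon, and is kept if longer than 600 nt. The function names are hypothetical, the stop codon is counted in the ORF length (an assumption), and coordinates on the minus strand are given relative to the reverse-complemented sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_len=600):
    """Return (strand, start, end, orf_seq) for ATG...stop ORFs longer than min_len nt."""
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon of a new ORF in this frame
                elif start is not None and codon in STOPS:
                    if i + 3 - start > min_len:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs
```

The predicted ORF sequences would then be translated and queried against a protein database, as described in the text.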
Bacterial 16S rRNA Gene Amplification, Amplicon Sequencing, and Data Analysis.
Fecal samples were pulverized with a mortar and pestle in liquid nitrogen. An aliquot (500 mg) of each frozen pulverized fecal sample was resuspended in a solution containing 500 µL of extraction buffer [200 mM Tris (pH 8.0), 200 mM NaCl, 20 mM EDTA], 210 µL of 20% SDS, 500 µL of phenol:chloroform:isoamyl alcohol (25:24:1), and 500 µL of a slurry of 0.1-mm diameter zirconia/silica beads (BioSpec Products). Cells were mechanically disrupted using a bead beater (BioSpec Products) set on high for 2 min at room temperature, followed by extraction with phenol:chloroform:isoamyl alcohol and precipitation with isopropanol. DNA obtained from three separate aliquots of each fecal sample was pooled and used for amplification of bacterial 16S rRNA genes. Approximately 330-bp amplicons, spanning variable region 4 (V4) of bacterial 16S rRNA genes present in fecal samples, were generated by PCR as described previously (3). Four replicate PCRs were performed for each fecal DNA sample. Replicate PCRs were pooled and purified using Ampure magnetic purification beads (Agencourt). DNA was quantified using Qubit, and an equimolar amount of each sample was sequenced with an Illumina MiSeq instrument generating paired-end reads of 250 nt each.
The 16S rRNA reads were analyzed using QIIME (v1.8): raw sequences, quality files, and a mapping file indicating the barcode sequence corresponding to each sample were used as inputs to split reads by samples and bin them into 97% identity (ID) OTUs based on their sequence similarity. Reads were matched to the reference Greengenes database (version May 2013), and taxonomy assignments were made using the naïve Bayesian Ribosomal Database Project (RDP)-classifier version 2.4 (47). OTUs that did not match to the Greengenes database were classified as de novo OTUs. A taxonomic tree was built based on sequence similarity. A sample-by-OTU table was used together with the tree for calculating α- and β-diversity.
Statistical Tests and Plots.
Other statistical tests were performed and heat maps were produced using R and Prism v6.0. Statistical significance in figures was defined as in Prism v6.0: ns, P > 0.05; *P ≤ 0.05; **P ≤ 0.01; ***P ≤ 0.001; ****P ≤ 0.0001.
Discussion
We have conducted a time series comparative metagenomic study of the fecal DNA viromes of twins concordant for healthy status during the first 3 y of life and twins who became discordant for kwashiorkor or marasmus, plus their mothers and older siblings. Our results provide a human model for delineating normal versus perturbed postnatal acquisition and retention of the gut microbiota’s viral component in children at risk for and with manifest undernutrition.
Although read depth was modest in this study, and larger sequencing depth can help identify rare virotypes and hypervariability in viral genomes (15), we benefited from the relatively longer read lengths obtained with pyrosequencing and the cross-assembly strategy used to generate partial or complete viral genomes [23- ± 2-fold (mean ± SEM) sequence coverage of assembled viral contigs]. Remarkably, 95.8% of the 70,160 predicted ORFs identified in the 17,676 assembled viral contigs encode hypothetical proteins or conserved hypothetical proteins. This finding further emphasizes the importance of developing new approaches, such as combining Hidden Markov Models with machine learning methods, to identify proteins that are highly discriminatory markers of the environmental origin and/or taxonomic features of viromes and hence the focus of efforts to delineate their functions.
Studying the assembly of components of the gut microbiota within and across families, including those containing mono- or dizygotic twin pairs, provides a microbial view of human postnatal development. Machine-learning methods (Random Forests) have yielded sparse models composed of a limited number of highly indicative age-discriminatory bacterial strains that together form a signature for defining the normal developmental biology of this microbial organ (3, 5). The fact that time-dependent changes in the representation of these indicative bacterial strains were similar across biologically unrelated individuals living in distinct geographic areas [e.g., Malawi and Bangladesh (3)] suggests that a set of (still-to-be-defined) rules governs development/differentiation of this organ, which is composed of multiple cell lineages (taxa). The present study provides an additional developmental perspective, one focused on the viral component of the gut community. Analogous to the approach used for the bacterial component, applying machine-learning methods to a viral abundance matrix, in which contig abundances were quantified as a function of individual, twin pair, time after birth, family membership, and health status, yielded a set of age-discriminatory phage and eukaryotic viruses.
The identification of viral contigs that discriminate both the kwashiorkor (or marasmus) and apparently healthy cotwins in discordant pairs from members of age-matched concordant healthy pairs is noteworthy from a developmental biology perspective: it reveals that specification of a normal community “fate” is perturbed in both members of a discordant twin pair and that the shared virome features of their “healthy” sibling provide an operational definition of a sensitized, at-risk host/microbial community.
The ability to compare and contrast phenotypes transmitted by microbiota from healthy cotwins in discordant pairs and microbiota from concordant healthy pairs to recipient gnotobiotic mice as a function of various defined perturbations (dietary, manipulations of innate and adaptive immunity, and other characteristics of the gut mucosal barrier, enteropathogen load, host genotype, etc.) provides a way to identify factors, both autonomous (to the community) and nonautonomous (derived from the environment surrounding the community), that control the developmental trajectory of the microbiota and the origins of discordant phenotypes within twin pairs.
Answering the question of how the shared pattern of assembly/maturation of the DNA virome noted across biologically unrelated healthy infants and children is related to the program of succession/assembly of bacterial components of their microbiota and the codevelopment of their gut mucosal immune system may ultimately provide ways for deliberately advancing microbiota maturation in those with SAM (e.g., by introducing phage from a healthy individual whose chronologic age is similar to or older than that of the state of microbiota maturation in a child with SAM to facilitate colonization of human gut bacterial lineages that are not well represented in their developmentally arrested microbiota). Studies involving deliberate introduction of purified human fecal VLP preparations into gnotobiotic mice colonized with a defined consortium of sequenced members of the human gut microbiota have shown that viral–bacterial dynamics in vivo are complex; the correlations between phage and bacterial strain abundances are not always obvious and involve negative correlations shifted in time as a result of a predator–prey dynamic, whereas prophages have linear positive correlations (35). As a consequence, answering this question promises to be challenging.
Methods
Subjects were recruited through health centers located in the Malawian villages of Makhwira, Mitondo, M'biza, Chamba, and Mayaka using procedures approved by the College of Medicine Research Ethics Committee of the University of Malawi and by the Human Research Protection Office of Washington University School of Medicine in St. Louis. All experiments involving mice were performed with protocols approved by the Washington University Animal Studies Committee (4). Procedures for sample collection, purification of VLPs, shotgun pyrosequencing of VLP-derived DNA, assembly and annotation of viral genomes, cross-contig comparisons, calculations of viral α- and β-diversity, Random Forests analysis, viral phylogenetic analysis, bacterial 16S rRNA gene amplification and amplicon sequencing, as well as statistical analyses are described in detail in SI Methods.
Acknowledgments
We thank Sabrina Wagoner, Su Deng, Jessica Hoisington-López, and Marty Meier for superb technical assistance. This work was supported by a grant from the Bill & Melinda Gates Foundation. L.V.B. received support from National Institutes of Health Grant T32 AI007172.
Footnotes
The authors declare no conflict of interest.
Data deposition: The 16S rRNA and shotgun sequencing datasets have been deposited in the European Nucleotide Archive (ENA; www.ebi.ac.uk/ena) in raw format, prior to post-processing and data analysis, under study accession number PRJEB9818. The dataset of 17,676 viral contigs assembled from shotgun sequencing reads has also been deposited in ENA under the same accession number.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1514285112/-/DCSupplemental.
References
- 1. Black RE, et al.; Maternal and Child Nutrition Study Group. Maternal and child undernutrition and overweight in low-income and middle-income countries. Lancet. 2013;382(9890):427–451. doi: 10.1016/S0140-6736(13)60937-X.
- 2. Yatsunenko T, et al. Human gut microbiome viewed across age and geography. Nature. 2012;486(7402):222–227. doi: 10.1038/nature11053.
- 3. Subramanian S, et al. Persistent gut microbiota immaturity in malnourished Bangladeshi children. Nature. 2014;510(7505):417–421. doi: 10.1038/nature13421.
- 4. Smith MI, et al. Gut microbiomes of Malawian twin pairs discordant for kwashiorkor. Science. 2013;339(6119):548–554. doi: 10.1126/science.1229000.
- 5. Subramanian S, et al. Cultivating healthy growth and nutrition through the gut microbiota. Cell. 2015;161(1):36–48. doi: 10.1016/j.cell.2015.03.013.
- 6. Trehan I, Manary MJ. Management of severe acute malnutrition in low-income and middle-income countries. Arch Dis Child. 2015;100(3):283–287. doi: 10.1136/archdischild-2014-306026.
- 7. Victoria JG, Kapoor A, Dupuis K, Schnurr DP, Delwart EL. Rapid identification of known and new RNA viruses from animal tissues. PLoS Pathog. 2008;4(9):e1000163. doi: 10.1371/journal.ppat.1000163.
- 8. Gaayeb L, et al. Effects of malnutrition on children’s immunity to bacterial antigens in Northern Senegal. Am J Trop Med Hyg. 2014;90(3):566–573. doi: 10.4269/ajtmh.12-0657.
- 9. Kosek M, et al.; MAL-ED network. Fecal markers of intestinal inflammation and permeability associated with the subsequent acquisition of linear growth deficits in infants. Am J Trop Med Hyg. 2013;88(2):390–396. doi: 10.4269/ajtmh.2012.12-0549.
- 10. Waber DP, et al. Impaired IQ and academic skills in adults who experienced moderate to severe infantile malnutrition: A 40-year study. Nutr Neurosci. 2014;17(2):58–64. doi: 10.1179/1476830513Y.0000000061.
- 11. Kau AL, et al. Functional characterization of IgA-targeted bacterial taxa from undernourished Malawian children that produce diet-dependent enteropathy. Sci Transl Med. 2015;7(276):276ra24. doi: 10.1126/scitranslmed.aaa4877.
- 12. Breitbart M, et al. Metagenomic analyses of an uncultured viral community from human feces. J Bacteriol. 2003;185(20):6220–6223. doi: 10.1128/JB.185.20.6220-6223.2003.
- 13. Reyes A, et al. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature. 2010;466(7304):334–338. doi: 10.1038/nature09199.
- 14. Minot S, et al. The human gut virome: Inter-individual variation and dynamic response to diet. Genome Res. 2011;21(10):1616–1625. doi: 10.1101/gr.122705.111.
- 15. Minot S, Grunberg S, Wu GD, Lewis JD, Bushman FD. Hypervariable loci in the human gut virome. Proc Natl Acad Sci USA. 2012;109(10):3962–3966. doi: 10.1073/pnas.1119061109.
- 16. Breitbart M, et al. Viral diversity and dynamics in an infant gut. Res Microbiol. 2008;159(5):367–373. doi: 10.1016/j.resmic.2008.04.006.
- 17. Forsberg KJ, et al. The shared antibiotic resistome of soil bacteria and human pathogens. Science. 2012;337(6098):1107–1111. doi: 10.1126/science.1220761.
- 18. Krupovic M, Forterre P. Microviridae goes temperate: Microvirus-related proviruses reside in the genomes of Bacteroidetes. PLoS One. 2011;6(5):e19893. doi: 10.1371/journal.pone.0019893.
- 19. Dutilh BE, et al. A highly abundant bacteriophage discovered in the unknown sequences of human faecal metagenomes. Nat Commun. 2014;5:4498. doi: 10.1038/ncomms5498.
- 20. Barr JJ, et al. Bacteriophage adhering to mucus provide a non-host-derived immunity. Proc Natl Acad Sci USA. 2013;110(26):10771–10776. doi: 10.1073/pnas.1305923110.
- 21. Nishiyama S, et al. Identification of novel anelloviruses with broad diversity in UK rodents. J Gen Virol. 2014;95(Pt 7):1544–1553. doi: 10.1099/vir.0.065219-0.
- 22. Reese TA, et al. Coinfection. Helminth infection reactivates latent γ-herpesvirus via cytokine competition at a viral promoter. Science. 2014;345(6196):573–577. doi: 10.1126/science.1254517.
- 23. Baldridge MT, et al. Commensal microbes and interferon-λ determine persistence of enteric murine norovirus infection. Science. 2015;347(6219):266–269. doi: 10.1126/science.1258025.
- 24. Nice TJ, et al. Interferon-λ cures persistent murine norovirus infection in the absence of adaptive immunity. Science. 2015;347(6219):269–273. doi: 10.1126/science.1258100.
- 25. Breitbart M, Rohwer F. Method for discovering novel DNA viruses in blood using viral particle selection and shotgun sequencing. Biotechniques. 2005;39(5):729–736. doi: 10.2144/000112019.
- 26. Bernardin F, Operskalski E, Busch M, Delwart E. Transfusion transmission of highly prevalent commensal human viruses. Transfusion. 2010;50(11):2474–2483. doi: 10.1111/j.1537-2995.2010.02699.x.
- 27. Ninomiya M, Takahashi M, Nishizawa T, Shimosegawa T, Okamoto H. Development of PCR assays with nested primers specific for differential detection of three human anelloviruses and early acquisition of dual or triple infection during infancy. J Clin Microbiol. 2008;46(2):507–514. doi: 10.1128/JCM.01703-07.
- 28. Kapusinszky B, Minor P, Delwart E. Nearly constant shedding of diverse enteric viruses by two healthy infants. J Clin Microbiol. 2012;50(11):3427–3434. doi: 10.1128/JCM.01589-12.
- 29. Virgin HW, Wherry EJ, Ahmed R. Redefining chronic viral infection. Cell. 2009;138(1):30–50. doi: 10.1016/j.cell.2009.06.036.
- 30. Okamoto H. History of discoveries and pathogenicity of TT viruses. Curr Top Microbiol Immunol. 2009;331:1–20. doi: 10.1007/978-3-540-70972-5_1.
- 31. De Vlaminck I, et al. Temporal response of the human virome to immunosuppression and antiviral therapy. Cell. 2013;155(5):1178–1187. doi: 10.1016/j.cell.2013.10.034.
- 32. Young JC, et al. Viral metagenomics reveal blooms of anelloviruses in the respiratory tract of lung transplant recipients. Am J Transplant. 2015;15(1):200–209. doi: 10.1111/ajt.13031.
- 33. Maggi F, et al. TT virus in the nasal secretions of children with acute respiratory diseases: Relations to viremia and disease severity. J Virol. 2003;77(4):2418–2425. doi: 10.1128/JVI.77.4.2418-2425.2003.
- 34. Thom K, Petrik J. Progression towards AIDS leads to increased Torque teno virus and Torque teno minivirus titers in tissues of HIV infected individuals. J Med Virol. 2007;79(1):1–7. doi: 10.1002/jmv.20756.
- 35. Reyes A, Wu M, McNulty NP, Rohwer FL, Gordon JI. Gnotobiotic mouse model of phage-bacterial host dynamics in the human gut. Proc Natl Acad Sci USA. 2013;110(50):20236–20241. doi: 10.1073/pnas.1319470110.
- 36. Manary MJ. Local production and provision of ready-to-use therapeutic food (RUTF) spread for the treatment of severe childhood malnutrition. Food Nutr Bull. 2006;27(3) Suppl:S83–S89. doi: 10.1177/15648265060273S305.
- 37. Kleiner M, Hooper LV, Duerkop BA. Evaluation of methods to purify virus-like particles for metagenomic sequencing of intestinal viromes. BMC Genomics. 2015;16:7. doi: 10.1186/s12864-014-1207-4.
- 38. Fouts DE. Phage_Finder: Automated identification and classification of prophage regions in complete bacterial genome sequences. Nucleic Acids Res. 2006;34(20):5839–5851. doi: 10.1093/nar/gkl732.
- 39. Li W, Godzik A. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–1659. doi: 10.1093/bioinformatics/btl158.
- 40. Niu B, Zhu Z, Fu L, Wu S, Li W. FR-HIT, a very fast program to recruit metagenomic reads to homologous reference genomes. Bioinformatics. 2011;27(12):1704–1705. doi: 10.1093/bioinformatics/btr252.
- 41. Caporaso JG, et al. QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010;7(5):335–336. doi: 10.1038/nmeth.f.303.
- 42. Díaz-Uriarte R, Alvarez de Andrés S. Gene selection and classification of microarray data using random forest. BMC Bioinformatics. 2006;7:3. doi: 10.1186/1471-2105-7-3.
- 43. Kursa MB. Robustness of Random Forest-based gene selection methods. BMC Bioinformatics. 2014;15:8. doi: 10.1186/1471-2105-15-8.
- 44. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22(22):4673–4680. doi: 10.1093/nar/22.22.4673.
- 45. Tamura K, et al. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011;28(10):2731–2739. doi: 10.1093/molbev/msr121.
- 46. Page RD. TreeView: An application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996;12(4):357–358. doi: 10.1093/bioinformatics/12.4.357.
- 47. Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73(16):5261–5267. doi: 10.1128/AEM.00062-07.