Targeted whole-genome recovery of single viral species in a complex environmental sample

Liyin Chen; Anqi Chen; Xinge Diana Zhang; Maria Teresa Saenz Robles; Hee-Sun Han; Yi Xiao; Gao Xiao; James M Pipas; David A Weitz

doi:10.1073/pnas.2404727121

Proc Natl Acad Sci U S A. 2024 Jul 25;121(31):e2404727121. doi: 10.1073/pnas.2404727121

PMCID: PMC11295033 PMID: 39052829

Significance

As the most abundant biological entity on earth, viruses impact human health and the economy enormously, yet existing viral databases cover at most 0.1% of the global virome. Unknown viruses in an environmental sample can be found and characterized by bulk sequencing followed by metagenomics analysis. However, to identify a specific viral subpopulation, such as an emergent pathogen, bulk-sequencing all viral genomes reduces the sequencing resolution of the targeted subpopulation. To overcome this challenge, we develop a microfluidics-based method to enrich target viral genomes, which comprises single genome encapsulation and isolation, and a one-step DNA amplification and detection assay. The efficiency and reliability of our method are demonstrated by de novo assembly of virus genomes spiked in a sewage sample.

Keywords: single-virus sequencing, de novo genome assembly, isothermal gene detection, droplet microfluidics, whole-genome amplification

Abstract

Characterizing unknown viruses is essential for understanding viral ecology and preparing against viral outbreaks. Recovering complete genome sequences from environmental samples remains computationally challenging using metagenomics, especially for low-abundance species with uneven coverage. We present an experimental method for reliably recovering complete viral genomes from complex environmental samples. Individual genomes are encapsulated into droplets and amplified using multiple displacement amplification. A unique gene detection assay, which employs an RNA-based probe and an exonuclease, selectively identifies droplets containing the target viral genome. Labeled droplets are sorted using a microfluidic sorter, and genomes are extracted for sequencing. We demonstrate this method’s efficacy by spiking two known viral genomes, Simian virus 40 (SV40, 5,243 bp) and Human Adenovirus 5 (HAd5, 35,938 bp), into a sewage sample with a final abundance in the droplets of around 0.1% and 0.015%, respectively. We achieve 100% recovery of the complete sequence of the spiked-in SV40 genome with uniform coverage distribution. For the larger HAd5 genome, we cover approximately 99.4% of its sequence. Notably, genome recovery is achieved with as few as one sorted droplet, which enables the recovery of any desired genome in a complex environmental sample, regardless of its abundance. This method enables single-genome whole-genome amplification and targeted characterization of rare viral species, and will facilitate our ability to access the mutational profile of single-virus genomes and contribute to an improved understanding of viral ecology.

Viruses have enormous impacts on health and the economy, yet existing viral sequence databases cover at most 0.1% of the global virome (1). Recognizing this gap, numerous international consortia and research institutes have been actively surveilling the virome to gain a deeper understanding of viral diversity and ecology (1–3). In particular, characterizing unknown viruses can guide the prediction of the emergence of pathogenic viruses and the formulation of effective countermeasures to mitigate their impact (4). To find unknown viruses, environmental samples, such as sewage water, are analyzed because they harbor a large number of unknown species (5–9). The principal method to study viruses in environmental samples is shotgun metagenomics, where genetic materials of a mixture of viral species are all sequenced together (10, 11). Contigs assembled from metagenomics data can be used to predict functions of potential genes in a virus population and may reveal previously uncharacterized species if the identified genes do not match those in existing viral genome databases (12). In fact, this approach has led to the revelation of numerous unknown viral species, some of which are present in extremely low abundance within the population (9, 11, 13, 14). However, systematic characterization of these unknown species is hindered by the complexity and cost of computationally deriving the complete genome sequences of specific species of interest from metagenomic data (3, 15–19). In particular, it is technically challenging to reconstruct the genomes of rare species in a population or those from closely related strains, as their sequences can suffer from low and uneven representation in the sequencing library due to other prevalent species or population microdiversity.

To simplify genome reconstruction from heterogeneous libraries, single-cell genomics (SCGs) methods that isolate single viral species and prepare individual sequencing libraries have been developed (20–26). When key contig sequences are identified through metagenomics, SCGs methods that enrich genomes containing specific sequences prove more effective than those that profile every viral sequence indiscriminately. However, existing SCGs methods with selection schemes face issues such as high false-positive rates, uneven genome coverage in whole-genome amplification (WGA) libraries, or limited applicability to genomes with low abundance in a sample. Among the various details of the experimental design, the method used to label genomes of interest critically influences the effectiveness of these methods. To provide a signal for selection, probe-based PCR can be used to detect genomes containing specific sequences. However, without careful primer design, the large number of amplicons generated by PCR detection can contaminate the sequencing library, introducing sequencing bias and wasting sequencing reads (26). The PCR amplicons can be digested if deoxythymidine triphosphate (dTTP) is replaced by deoxyuridine triphosphate (dUTP) to enable post-PCR USER digestion (21); nevertheless, the digested fragments can still interfere with WGA, resulting in underrepresented regions in the sequencing library. Therefore, there is a need for alternative methods that can reliably detect and recover the complete genome of any given viral species from a complex sample.

Here, we describe a method that combines droplet microfluidics and a customized genome labeling assay to identify, amplify, and isolate the complete genome of a specific viral species from a heterogeneous sample. Individual genomes are encapsulated into droplets and amplified with multiple displacement amplification (MDA) (27). To label droplets containing the genome of interest, we develop a gene detection assay comprising an RNA-based probe and an exonuclease, which allows for a one-step reaction for genome amplification and labeling. The single-step reaction reduces sample loss compared to the two-step PCR detection process and eliminates the generation of unnecessary detection-related amplicons, ensuring a nonbiased whole-genome sequencing library. Labeled droplets are selected from the mixture with a microfluidic sorter, and these genomes are extracted for sequencing. To validate our method, we spike the genome of a known virus into a sewage sample and recover the whole sequence of the spiked-in genome. When we use Simian virus 40 (SV40) (5,243 bp) as the target virus, the complete genome sequence is 100% recovered with a highly uniform coverage distribution. When Human Adenovirus 5 (HAd5) (35,938 bp) is chosen as the target, the assembled sequences cover 99.4% of its genome. Moreover, this method achieves full genome recovery with as few as one sorted droplet, demonstrating its potential for single-genome WGA and for targeting rare species within complex environmental samples. By enabling thorough characterization of species that could be of significant importance, this method complements metagenomics methods to accelerate our exploration of the virome.

Results and Discussion

Droplet microfluidics is used to compartmentalize individual viral genomes, amplify them inside droplets, and isolate the target genome. We dilute the sewage sample and encapsulate viral genomes into water-in-oil droplets following Poisson statistics, ensuring that nearly all droplets contain no more than one viral genome. Single-DNA encapsulation is the key to generating WGA libraries of single species. To generate clonal copies of each encapsulated genome, we perform MDA in droplets (Fig. 1A). As MDA is sensitive to contamination of exogenous DNA, we pretreat the reagents with UV light and prepare them in a high efficiency particulate air (HEPA)-filtered environment (28). A microfluidic drop maker is used to coencapsulate Φ29 DNA polymerase and other MDA reagents with the genomes for in-drop WGA. The drop maker contains separate inlets for the sewage sample and the MDA reagents, ensuring that the amplification reaction starts only when droplets are formed.
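The effect of dilution on droplet occupancy can be illustrated with a short calculation. The sketch below assumes ideal Poisson loading; the mean occupancies it loops over are hypothetical values chosen for illustration, not the loading used in this work.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a droplet holds exactly k genomes at mean occupancy lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def multi_genome_fraction(lam: float) -> float:
    """Among droplets holding at least one genome, the fraction holding two or more."""
    p0 = poisson_pmf(0, lam)
    p1 = poisson_pmf(1, lam)
    occupied = 1.0 - p0
    return (occupied - p1) / occupied


# Hypothetical mean occupancies (genomes per droplet), for illustration only.
for lam in (0.01, 0.1, 0.5):
    print(
        f"lambda={lam}: empty={poisson_pmf(0, lam):.3f}, "
        f"single={poisson_pmf(1, lam):.3f}, "
        f"multi among occupied={multi_genome_fraction(lam):.3f}"
    )
```

Lower mean occupancy reduces the share of occupied droplets that hold more than one genome, at the cost of producing more empty droplets that must still pass through the sorter.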

Fig. 1. — Microfluidic workflow of isolating, detecting, and amplifying a target viral genome. (A) Each viral genome is encapsulated into a droplet, along with MDA and detection assay reagents. WGA via MDA and the detection assay reaction take place inside each droplet during incubation. (B) The RNA probes in droplets containing the target genomes are digested and their fluorophore and quencher are separated. In contrast, those that are encapsulated with nontarget genomes remain intact. (C) Droplets are reinjected into a microfluidic droplet sorter and those containing the target genomes are sorted out and collected into a tube for extraction. (D) The extracted DNA undergoes a second MDA (i) in bulk or (ii) in water/oil emulsion. (E) The final amplified products are sequenced and processed with the computational workflow to recover the sequence of the target genome.

To isolate the target genomes, the droplets that encapsulate them must be labeled to allow them to be selected. Other selected-sequencing workflows detect genomes of interest using PCR amplification of a known fragment, which is stained by a DNA intercalating dye or a probe (21, 26). These methods inherently generate a large number of PCR amplicons that either interfere with WGA or consume resources in the sequencing library, resulting in uneven genome coverage. When the goal is to analyze genomic variations of a known species with a reference genome, unevenness in genome coverage is acceptable. However, for de novo genome assembly of unknown species, the uniformity of the amplified sequencing library is critically important. Missing regions or artificially overrepresented regions from PCR amplicons can lead to insufficient overlaps and failure to join contiguous sequences. To overcome this issue, we develop a gene detection assay that works directly with MDA and does not require the additional PCR step, ensuring the uniformity of the sequencing library. In addition, this method eliminates the additional microfluidic manipulation steps associated with the PCR reactions, saving sample-processing time and reducing material loss.

Commonly used gene detection methods include nonspecific gene labels such as a double-stranded DNA (dsDNA) binding dye and sequence-specific labels such as TaqMan probes, molecular beacons, and adjacent hybridization probes (29). A dsDNA binding dye will fail to distinguish the droplets containing the target genome because MDA nonspecifically amplifies all DNA in each droplet. TaqMan probes are not applicable as they require the DNA polymerase to exhibit 5′-to-3′ exonuclease activity, which is lacking in the Φ29 DNA polymerase used in MDA. Molecular beacon probes do not require such enzyme activity. However, they cannot generate differentiable signals between target and nontarget genomes in MDA conditions, specifically, in the presence of random primers and the absence of thermal cycling. Similarly, adjacent hybridization probes suffer from high background signals and produce an insufficient signal-to-noise ratio to identify target genomes in MDA reactions (SI Appendix, Fig. S1).

Since no commercially available probes are useful in selectively labeling MDA-amplified target genomes, we design a unique gene detection assay that is compatible with MDA conditions. This assay utilizes custom-designed RNA probes and an enzyme, RNase H. The RNA probe is an RNA oligo with a fluorophore conjugated to its 5′ end and a quencher to its 3′ end. The sequence of the oligo is designed using contigs from metagenomic data to target a viral species of interest. To ensure proper binding to target genes, we select RNA sequences that do not form stable secondary structures and have a melting temperature higher than 30 °C. The choice of the fluorophore–quencher pair can be flexible, but the length of the oligo must be adjusted to achieve optimal quenching efficiency. For common commercial fluorophore–quencher pairs, the optimal length of the oligo is 18 to 24 nt (30); in this work, we use the 5′FAM (Fluorescein)/3′IBFQ (Iowa Black® FQ) pair and 22-nt oligos. RNase H is an endonuclease that digests RNA only when an RNA strand is complexed with its complementary DNA strand. During MDA, if a droplet contains the target genome, the RNA probes anneal to their complementary region in the genome, forming DNA–RNA complexes. When RNase H binds to these complexes, it digests the RNA oligos, releasing the fluorophores from the quenchers. As a result, droplets containing the target genome exhibit high fluorescence signals upon excitation. By contrast, in droplets that contain a nontarget genome or no genome at all, the RNA probes are not digested by RNase H and the fluorophore remains quenched (Figs. 1B and 2A). This gene detection assay is particularly suitable for MDA and can be extended to other room-temperature amplification methods. Unlike probes that depend on a change of their secondary structure, such as molecular beacons, a linear probe circumvents the need for thermal cycling to induce a structural change. The RNase H we choose to digest the RNA probe attains high activity under the buffer conditions of MDA at 30 °C (SI Appendix, Table S2). Moreover, this enzyme acts only on RNA strands that are complexed with complementary DNA, and therefore generates minimal background signal from unbound RNA probes.
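The probe selection criteria described above (fixed oligo length, melting temperature above 30 °C, no stable secondary structure) can be sketched as a simple screen over a contig. This is an illustrative heuristic, not the design procedure used in this work: the melting temperature uses a generic GC-content approximation for DNA oligos rather than an RNA–DNA hybrid model, the hairpin check is a crude search for self-complementary stretches, and the function names, the 5-nt stem length, and the example contig are all assumptions.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def rough_tm(seq: str) -> float:
    """Crude GC-based melting temperature estimate in deg C (DNA-oligo approximation)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def has_self_complement(seq: str, stem: int = 5, min_loop: int = 3) -> bool:
    """True if a stem-length stretch can pair with a downstream stretch (possible hairpin)."""
    for i in range(len(seq) - stem + 1):
        partner = revcomp(seq[i:i + stem])
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False


def candidate_probes(contig: str, length: int = 22, min_tm: float = 30.0):
    """Return (start, RNA sequence) for each window of the contig that passes the screen."""
    contig = contig.upper()
    probes = []
    for start in range(len(contig) - length + 1):
        window = contig[start:start + length]
        if set(window) - set("ACGT"):
            continue  # skip windows with ambiguous bases
        if rough_tm(window) <= min_tm or has_self_complement(window):
            continue
        probes.append((start, window.replace("T", "U")))
    return probes


# Hypothetical contig fragment, for illustration only.
example_contig = "ACGGTCATTGCAAGCTTAGGCATCCGATAGGCTTAACGTTAGC"
for start, probe in candidate_probes(example_contig):
    print(start, probe)
```

In practice, candidates passing such a screen would still need to be checked for specificity against nontarget sequences and validated experimentally, as is done here with bulk MDA.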

Fig. 2. — RNA dual-labeled gene probe allows the detection of specific genomes. (A) Sequence-specific RNA probes bind to target genomes and are subsequently digested by RNase H, leaving the DNA intact and releasing the fluorophores. (B) Droplets containing the target genomes emit fluorescence signals upon excitation. (C) The distribution of fluorescence detected by our sorting setup shows two distinct peaks.

To perform this gene detection assay, we encapsulate all reagents for the detection assay together with MDA reagents and individual viral genomes. After encapsulation, droplets are collected into a tube and incubated at 30 °C for 16 h. To select those droplets containing the genome of interest, we reinject all the droplets into a microfluidic sorting device and perform fluorescence-activated droplet sorting (31). Sorted droplets are collected into a tube, and the target genomes are released from the droplets by adding a demulsifier to break the emulsion (Fig. 1C). To prepare genomic DNA for sequencing, the minimum amount of DNA is 1 ng (Nextera XT, Illumina Inc.); however, samples with 100 or fewer sorted droplets do not meet this requirement. Therefore, we use the genomic copies extracted from sorted droplets as templates and perform a second MDA in bulk to generate sufficient DNA for sequencing (Fig. 1 D, i). The final amplification products are processed into a sequencing library and sequenced with an Illumina platform (Fig. 1E).

To assemble the sequencing reads into a final genome sequence, we develop a computational workflow. We filter raw sequencing reads to eliminate low-quality ones and map them to the human genome to remove potential human contamination. The remaining reads are assembled into contigs with an open-source de novo genome assembler, SPAdes (32). The resulting contigs are aligned to genomes in the NCBI databases. The longest contig that does not map to any known organism is chosen to be the sequence of the target genome.
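The final selection step of this workflow can be sketched as follows. The `Contig` record and its `best_hit` field are hypothetical stand-ins for the assembler output and the database alignment result; the upstream steps (read filtering, human-read removal, assembly, and alignment) are not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    name: str
    length: int
    best_hit: Optional[str]  # best database match, or None if the contig maps to nothing known


def select_target_contig(contigs: list[Contig]) -> Optional[Contig]:
    """Pick the longest contig with no database match; None if every contig has a match."""
    unmatched = [c for c in contigs if c.best_hit is None]
    return max(unmatched, key=lambda c: c.length, default=None)


# Hypothetical assembly summary, for illustration only.
contigs = [
    Contig("contig_1", 5400, None),
    Contig("contig_2", 7200, "known bacteriophage"),
    Contig("contig_3", 310, None),
]
print(select_target_contig(contigs))
```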

To validate our method for sequencing the whole genome of a target viral species from a heterogeneous source, we conduct experiments using two sets of sewage samples, one with spiked-in genomic DNA from the known SV40 virus and one without. We design an RNA probe (SI Appendix, Table S1) to label SV40 and initially assess its performance with bulk MDA before incorporating it into our microfluidic workflow. We assay the two sets of sewage samples and monitor their fluorescence signals at various time points (SI Appendix, Fig. S2). After 6 h of incubation, we observe that in the samples with spiked-in SV40 genomes, the average fluorescence intensity is approximately 1.8 times higher than in samples without SV40 genomes. This difference in fluorescence intensity continues to increase and plateaus between 12 and 16 h, reaching a maximum difference of nearly 2.4-fold. Additionally, we optimize the concentrations of the RNA probe and RNase H by performing bulk MDA with the two sets of sewage samples with a series of concentrations and measuring the fluorescence intensities at the end of the reactions (SI Appendix, Table S3). We find that a probe concentration of 500 nM and an RNase H concentration of 0.2 units/μL yield the largest signal-to-noise ratio.

We apply the optimized assay concentrations to perform the detection assay with in-drop MDA. In the sample with spike-in SV40 genomes, we observe high fluorescence intensity in a subset of droplets (Fig. 2B). By contrast, in the sample without the spike-in, the fluorescence level in all imaged droplets remains low, confirming the specificity of the RNA probe for SV40 (SI Appendix). To verify that the fluorescence intensity differences between droplets containing the SV40 genome copies and those that do not are large enough to be detected by the sorting setup, we inject droplets that have undergone single-genome MDA with a spike-in sample into the sorting chip. Two distinct populations are detected based on the fluorescence intensity. The major dark population corresponds to empty drops or drops containing nontarget viral genomes, whereas the bright population represents droplets containing the SV40 target (Fig. 2C). The bright population comprises ~0.1% of all drops, consistent with the amount of SV40 genome we spike into the sewage sample. The bright droplets are selected and the genomic DNA from these droplets is amplified in a bulk MDA reaction to generate sufficient DNA for sequencing.

To assess the quality of the amplified DNA, we perform a restriction analysis before sending the sample for sequencing. A restriction enzyme, PvuII, is used to digest the amplified DNA, and the resultant fragments are analyzed using gel electrophoresis. When we use DNA from more than five sorted droplets as the template for the second round of MDA, the restriction analysis shows successful amplification of the SV40 genome, as confirmed by the three expected fragments at 1,446, 1,790, and 1,997 bp. However, when only one droplet is sorted, the recovered genome cannot be further amplified because the concentration of the SV40 genome in the MDA reaction is too low, allowing nonspecific amplification from primer dimers or contaminating DNA to dominate the reaction. Samples with only one sorted droplet thus result in nonspecific amplification products which cannot be digested by PvuII (Fig. 3A).
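The expected band pattern for such a restriction analysis can be predicted from a sequence by locating the enzyme's recognition sites on the circular genome. The sketch below assumes the PvuII recognition sequence CAGCTG with a blunt cut in its middle; the toy sequence is invented for illustration and is not the SV40 genome.

```python
def circular_digest(genome: str, site: str = "CAGCTG", cut_offset: int = 3) -> list[int]:
    """Sorted fragment lengths after cutting a circular genome at every occurrence of site."""
    genome = genome.upper()
    n = len(genome)
    # Append the first few bases so that a site spanning the origin is still found.
    wrapped = genome + genome[:len(site) - 1]
    cuts = sorted({(i + cut_offset) % n for i in range(n) if wrapped.startswith(site, i)})
    if not cuts:
        return [n]  # no site: the circle stays intact
    # Distance from each cut to the next one around the circle; a single cut
    # linearizes the genome into one full-length fragment.
    return sorted((cuts[(j + 1) % len(cuts)] - cuts[j]) % n or n for j in range(len(cuts)))


# Toy circular sequence with two sites, for illustration only.
toy = "AAAACAGCTGTTTCAGCTGGG"
print(circular_digest(toy))
```

Because the recognition sequence is palindromic, scanning one strand is sufficient. Running the same function on a reference sequence would give the band sizes to compare against the gel.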

Fig. 3. — Sorted genomes need to be amplified with MDA again to yield sufficient DNA for sequencing, and performing reactions in drops generated by pipetting shows reduced background amplification. Genomes from one sorted droplet are dominated by background signal in bulk MDA (A) but not in pipet-drop MDA (B).

To reduce nonspecific amplification and achieve WGA in samples with only one sorted droplet, we carry out the postsorting MDA in subnanoliter droplets. Instead of using an additional microfluidic device, we create droplets by simple pipetting. We add fluorinated oil and MDA reagents to the DNA solution collected from the sorting, pipet up and down for about 1 min to generate an emulsion, and incubate at 30 °C for 8 h. During pipetting, SV40 genomic copies and any potentially contaminating DNA fragments are compartmentalized into polydisperse water-in-oil droplets. This compartmentalization increases the effective concentration of SV40 genomes in droplets and concurrently suppresses the amplification of primer dimers and any contaminating DNA. We use this simple pipetting method, as opposed to a microfluidic setup, to generate droplets in postsorting MDA because the uniformity of droplets and the precise count of genomic copies within each droplet become less critical at this stage, where the DNA solution primarily consists of genomes from a single viral species. The final amplification products are extracted from the droplets for subsequent processing (Fig. 1 D, ii). With the same restriction analysis described above, we confirm that this simple pipetting approach generates enough genomic DNA for sequencing from as few as one sorted droplet (Fig. 3B). The capability to recover a viral genome from single droplets will enable precise genetic characterization to find rare viruses in the environment and to access the mutational profile of single virus genomes.

The amplification product from a single, sorted droplet is sequenced, and the reads are de novo-assembled using our computational pipeline. The output of the assembler matches the SV40 genome in the NCBI database with 100% sequence coverage and sequence identity (Table 1, Sample 2/3). Interestingly, the length of the assembled contig is longer than that of the SV40 genome. Upon examination of the contig, we find that the sequences at both ends overlap, suggesting that this length difference can be used to infer the circular structure of a genome.
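The end-overlap check described above can be sketched as a search for the longest stretch shared by the start and end of a contig. This is a minimal illustration assuming an exact match between the two ends; the 10-base minimum overlap and the toy genome are arbitrary assumptions.

```python
def terminal_overlap(contig: str, min_overlap: int = 10) -> int:
    """Length of the longest suffix of the contig that equals its prefix (0 if too short)."""
    contig = contig.upper()
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


def trim_circular(contig: str, min_overlap: int = 10) -> str:
    """Remove the duplicated end of a contig assembled from a circular genome."""
    k = terminal_overlap(contig, min_overlap)
    return contig[:-k] if k else contig


# Toy circular genome and a contig that runs past the origin, for illustration only.
toy_genome = "ACGTTGCAAGGCTTAACCGGTAGC"
toy_contig = toy_genome + toy_genome[:12]
print(terminal_overlap(toy_contig), len(trim_circular(toy_contig)))
```

A nonzero overlap is consistent with a circular template, and the trimmed length gives an estimate of the genome size.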

Table 1.

Summary of the de novo assembly of representative samples

Sample	Target species	Number of sorted droplets	Length of selected contig (bp)	Coverage of reference genome (%)	Sequence identity (%)
1/1	SV40	50	5,613	100	100
1/2	SV40	5	5,541	100	100
2/1	SV40	5	5,465	100	100
2/2	SV40	5	5,397	100	100
2/3	SV40	1	5,254	100	100
2/4	SV40	1	5,397	100	100
3/1	HAd5	5	35,725	99.4	99.9
3/2	HAd5	5	35,725	99.4	99.9
3/3	HAd5	1	35,725	99.4	99.9
3/4	HAd5	1	35,725	99.4	99.9

To further investigate any possible bias or contamination in our sequencing library, we analyze the sequence reads by aligning them to the SV40 reference genome. We find complete coverage of the genome with highly uniform distribution (Fig. 4 A and B), indicating minimal bias in our genome amplification process. The uniformity of WGA could be further improved by using a thermostable mutant of the phi29 polymerase, EquiPhi29™, especially for genomes with high G-C content (25). Moreover, the fraction of mapped reads in the sequencing library is close to unity (SI Appendix, Table S4), which confirms a negligible degree of contamination in the workflow and explains the successful assembly of the target genome. Consistent sequencing and assembly results are obtained from five other samples tested (Table 1 and SI Appendix, Fig. S3 and Table S4), further demonstrating the reliability of our method.
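Per-base coverage and a simple uniformity measure can be computed from read alignments as sketched below. The input format (0-based, half-open start and end positions on the reference) and the use of the coefficient of variation as the uniformity metric are assumptions made for illustration; reads that wrap around the origin of a circular genome are not handled.

```python
from statistics import mean, pstdev


def per_base_coverage(genome_length: int, alignments) -> list[int]:
    """Depth at every reference position from (start, end) half-open alignment intervals."""
    diff = [0] * (genome_length + 1)
    for start, end in alignments:
        diff[start] += 1
        diff[end] -= 1
    coverage, depth = [], 0
    for change in diff[:genome_length]:
        depth += change
        coverage.append(depth)
    return coverage


def coverage_summary(coverage: list[int]) -> dict:
    """Breadth (fraction of covered bases), mean depth, and coefficient of variation."""
    mu = mean(coverage)
    return {
        "breadth": sum(c > 0 for c in coverage) / len(coverage),
        "mean_depth": mu,
        "cv": pstdev(coverage) / mu if mu else float("inf"),
    }


# Two hypothetical reads on a 10-base reference, for illustration only.
cov = per_base_coverage(10, [(0, 5), (3, 10)])
print(cov, coverage_summary(cov))
```

A breadth of 1 with a low coefficient of variation corresponds to the complete and uniform coverage described above.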

Fig. 4. — Representative graphs of the genome coverage of the enriched SV40 and HAd5 samples. (A) Per-base coverage of the SV40 genome. (B) Distribution of the per-base coverage of the SV40 genome. (C) Per-base coverage of the HAd5 genome.

To examine the generalizability of our platform, we apply our method to a second virus, HAd5, whose genome is linear and much longer than that of SV40. Given the high processivity and strong strand-displacement activity in MDA, small circular genomes are preferred templates as compared to large linear genomes as they allow the polymerase to proceed continuously around the circular template (33). However, since most viral genomes are linear, we further demonstrate our method on HAd5, a representative virus that has a linear genome. We spike the genomic DNA of HAd5 into the sewage sample, design an RNA probe for it (SI Appendix, Table S1), and enrich its genome using our experimental method. We collect samples containing one or five sorted droplets and use the pipetting method to amplify the genomes a second time prior to sequencing. De novo assembly of the reads generates contigs that match the known HAd5 genome well, consistently achieving 99.4% in sequence coverage and higher than 99.9% in sequence identity with as few as one sorted droplet (Table 1, Sample 3/1-4). To investigate the source of the 0.6% missing coverage, we map the sequence reads to the HAd5 reference genome. The reads cover the complete reference genome as shown in Fig. 4C, suggesting that the missing bases in the de novo-assembled contig are not the result of a defect in our library preparation process. Comparing the assembled contig to the reference genome, we find that the missing bases are mainly the 103 bases from both ends, which are the inverted terminal repeats of HAd5 (34). Interestingly, the assembler outputs this 103 bp sequence as the second-longest contig. The terminal repeats are probably separated by the assembler because the patterns of reads in this region, such as the frequency and existence of reads from both strands, are not the same as in the rest of the genome. We expect that optimizing genome assemblers to resolve inverted terminal repeats would improve the completeness of assembled adenovirus genomes.
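An inverted terminal repeat means that the start of the genome matches the reverse complement of its end, which can be checked directly on a sequence. The sketch below assumes an exact, mismatch-free repeat; the function names, the 500-base search window, and the toy sequence are illustrative assumptions and not part of the workflow described here.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def inverted_terminal_repeat_length(genome: str, max_len: int = 500) -> int:
    """Number of leading bases that match the reverse complement of the genome's end."""
    genome = genome.upper()
    tail_rc = revcomp(genome[-max_len:])  # the right end, read from the opposite strand
    length = 0
    for left_base, right_base in zip(genome, tail_rc):
        if left_base != right_base:
            break
        length += 1
    return length


# Toy linear genome with a 10-base inverted terminal repeat, for illustration only.
itr = "CATCATCAAT"
toy_linear = itr + "GGACGTTCGA" + revcomp(itr)
print(inverted_terminal_repeat_length(toy_linear))
```

A check of this kind could be used to test whether a short leftover contig is a terminal repeat that belongs at both ends of the main contig.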

Our method enables the complete genome sequencing of a specific viral species from as few as one sorted droplet, enabling numerous high-impact applications, including identifying rare viral species, achieving strain-level genome recovery, and analyzing the mutation profiles of individual viral genomes. Additionally, this technique can be adapted to simultaneously target multiple viral species in one experiment by incorporating barcoding schemes (35). By employing high-throughput screening to identify the species of interest, followed by selective barcoding and sequencing of their genomes, this approach concentrates the sequencing reads on specific subsets of species within a sample, allowing for efficient and cost-effective targeted viral studies.

Supplementary Material

Appendix 01 (PDF): pnas.2404727121.sapp.pdf (715.3 KB)

Acknowledgments

The research work of the authors is supported by US NIH grant 5R01AI153156.

Author contributions

L.C., H.-S.H., J.M.P., and D.A.W. designed research; L.C., X.D.Z., and M.T.S.R. performed research; L.C., M.T.S.R., Y.X., and G.X. contributed new reagents/analytic tools; L.C. analyzed data; and L.C. and A.C. wrote the paper.

Competing interests

The authors declare no competing interest.

Footnotes

This article is a PNAS Direct Submission.

Data, Materials, and Software Availability

All study data are included in the article and/or SI Appendix.

References

1. Edgar R. C., et al., Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022), 10.1038/s41586-021-04332-2.
2. Carroll D., et al., The global virome project. Science 359, 872–874 (2018).
3. Neri U., et al., Expansion of the global RNA virome reveals diverse clades of bacteriophages. Cell 185, 4023–4037.e18 (2022), 10.1016/j.cell.2022.08.023.
4. Artika I. M., Wiyatno A., Ma’roef C. N., Pathogenic viruses: Molecular detection and characterization. Infect. Genet. Evol. 81, 104215 (2020), 10.1016/j.meegid.2020.104215.
5. Tisza M., et al., Wastewater sequencing reveals community and variant dynamics of the collective human virome. Nat. Commun. 14, 6878 (2023), 10.1038/s41467-023-42064-1.
6. Cantalupo P. G., et al., Raw sewage harbors diverse viral populations. mBio 2, e00180-11 (2011), 10.1128/mBio.00180-11.
7. Tao Z., et al., Detection of multiple human astroviruses in sewage by next-generation sequencing. Water Res. 218, 118523 (2022), 10.1016/j.watres.2022.118523.
8. Lu J., et al., Capturing noroviruses circulating in the population: Sewage surveillance in Guangdong, China (2013–2018). Water Res. 196, 116990 (2021), 10.1016/j.watres.2021.116990.
9. Kraberger S., Schreck J., Galilee C., Varsani A., Genome sequences of microviruses identified in a sample from a sewage treatment oxidation pond. Microbiol. Resour. Announc. 10, e00373-21 (2021), 10.1128/MRA.00373-21.
10. Chen G., et al., Application of metagenomics to biological wastewater treatment. Sci. Total Environ. 807, 150737 (2022), 10.1016/j.scitotenv.2021.150737.
11. Paez-Espino D., et al., Uncovering Earth’s virome. Nature 536, 425–430 (2016), 10.1038/nature19094.
12. Bharagava R. N., et al., “Applications of metagenomics in microbial bioremediation of pollutants: From genomics to environmental cleanup” in Microbial Diversity in the Genomic Era (Academic Press, 2018), pp. 459–477, 10.1016/B978-0-12-814849-5.00026-5.
13. Loh J., et al., Detection of novel sequences related to African Swine Fever virus in human serum and sewage. J. Virol. 83, 13019–13025 (2009), 10.1128/JVI.00638-09.
14. Pavlopoulos G. A., et al., Unraveling the functional dark matter through global metagenomics. Nature 622, 594–602 (2023), 10.1038/s41586-023-06583-7.
15. Xu Y., Zhao F., Single-cell metagenomics: Challenges and applications. Protein Cell 9, 501–510 (2018), 10.1007/s13238-018-0544-5.
16. Ayling M., Clark M. D., Leggett R. M., New approaches for metagenome assembly with short reads. Brief. Bioinform. 21, 584–594 (2020), 10.1093/bib/bbz020.
17. Ghurye J. S., Cepeda-Espinoza V., Pop M., Metagenomic assembly: Overview, challenges, and applications. Yale J. Biol. Med. 89, 353–362 (2016).
18. Zhou Y., et al., Recovering metagenome-assembled genomes from shotgun metagenomic sequencing data: Methods, applications, challenges, and opportunities. Microbiol. Res. 260, 127023 (2022), 10.1016/j.micres.2022.127023.
19. Martínez Martínez J., Single-virus genomics and beyond. Nat. Rev. Microbiol. 18, 705–716 (2020), 10.1038/s41579-020-00444-0.
20. Martinez-Hernandez F., et al., Single-virus genomics reveals hidden cosmopolitan and abundant viruses. Nat. Commun. 8, 15892 (2017), 10.1038/ncomms15892.
21. Han H. S., et al., Whole-genome sequencing of a single viral species from a highly heterogeneous sample. Angew. Chem. Int. Ed. Engl. 54, 13985–13988 (2015), 10.1002/anie.201507047.
22. Allen L. Z., et al., Single virus genomics: A new tool for virus discovery. PLoS One 6, e17722 (2011), 10.1371/journal.pone.0017722.
23. Nishikawa Y., et al., Validation of the application of gel beads-based single-cell genome sequencing platform to soil and seawater. ISME Commun. 2, 92 (2022), 10.1038/s43705-022-00179-4.
24. Beaulaurier J., et al., Assembly-free single-molecule sequencing recovers complete virus genomes from natural microbial communities. Genome Res. 30, 437–446 (2020), 10.1101/gr.251686.119.
25. Stepanauskas R., et al., Improved genome recovery and integrated cell-size analyses of individual uncultured microbial cells and viral particles. Nat. Commun. 8, 84 (2017), 10.1038/s41467-017-00128-z.
26. Sun C., et al., Droplet-microfluidics-assisted sequencing of HIV proviruses and their integration sites in cells from people on antiretroviral therapy. Nat. Biomed. Eng. 6, 1004–1012 (2022), 10.1038/s41551-022-00864-8.
27. Lasken R. S., Genomic DNA amplification by the multiple displacement amplification (MDA) method. Biochem. Soc. Trans. 37, 450–453 (2009), 10.1042/BST0370450.
28. Woyke T., et al., Decontamination of MDA reagents for single cell whole genome amplification. PLoS One 6, e26161 (2011), 10.1371/journal.pone.0026161.
29. Didenko V. V., DNA probes using fluorescence resonance energy transfer (FRET): Designs and applications. Biotechniques 31, 1106–1116, 1118, 1120–1121 (2001), 10.2144/01315rv02.
30. Thornton B., Basu C., Real-time PCR (qPCR) primer design using free online software. Biochem. Mol. Biol. Educ. 39, 145–154 (2011), 10.1002/bmb.20461.
31.Baret J. C., et al. , Fluorescence-activated droplet sorting (FADS): Efficient microfluidic cell sorting based on enzymatic activity. Lab Chip 9, 1850–1858 (2009), 10.1039/b902504a. [DOI] [PubMed] [Google Scholar]
32.Bankevich A., et al. , SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012), 10.1089/cmb.2012.0021. [DOI] [PMC free article] [PubMed] [Google Scholar]
33.Dean F. B., et al. , Rapid amplification of plasmid and phage DNA using Phi 29 DNA polymerase and multiply-primed rolling circle amplification. Genome Res. 11, 1095–1099 (2001), 10.1101/gr.180501. [DOI] [PMC free article] [PubMed] [Google Scholar]
34.Hatfield L., Hearing P., Redundant elements in the adenovirus type 5 inverted terminal repeat promote bidirectional transcription in vitro and are important for virus growth in vivo. Virology 184, 265–276 (1991), 10.1016/0042-6822(91)90843-z. [DOI] [PubMed] [Google Scholar]
35.Lan F., et al. , Droplet barcoding for massively parallel single-molecule deep sequencing. Nat. Commun. 7, 11784 (2016), 10.1038/ncomms11784. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Appendix 01 (PDF)

pnas.2404727121.sapp.pdf (715.3 KB, PDF)

Data Availability Statement

All study data are included in the article and/or SI Appendix.
