Abstract
Single cell genomics is a powerful and increasingly popular tool for studying the genetic make-up of uncultured microbes. A key challenge for successful single cell sequencing and analysis is the removal of exogenous DNA from whole genome amplification reagents. We found that UV irradiation of the multiple displacement amplification (MDA) reagents, including the Phi29 polymerase and random hexamer primers, effectively eliminates the amplification of contaminating DNA. The methodology is quick, simple, and highly effective, thus significantly improving whole genome amplification from single cells.
Introduction
The large amounts of DNA required for microbial genome sequencing are traditionally harvested from laboratory cultures, yet most microorganisms cannot be easily grown in isolation. Thus, the metabolic information encoded within most species is largely inaccessible with standard genomic approaches. Single cell whole genome amplification (WGA), however, circumvents this requirement for isolation by producing billions of genome copies from a single template. Multiple displacement amplification (MDA) using Phi29 polymerase and random hexamer primers has become the preferred method for single cell WGA, and has successfully enabled partial and full genome recovery of microbes from a variety of environments [1]–[6]. However, commercially available MDA reagents are frequently contaminated with unwanted DNA that is co-amplified with the target DNA, which reduces sequencing efficiency and could confound analysis of unknown microbial genomes [6], [7]. While it is possible to prepare high-purity Phi29 polymerase in house by carefully eliminating contaminating nucleic acids at many steps [7], a simpler and equally effective method of removing contaminants from commercial reagents has not been fully explored.
UV irradiation can cause DNA single- and double-strand breaks, photooxidation damage of bases, and the formation of cyclobutane pyrimidine dimers [8]–[11]. These UV-induced lesions inhibit DNA replication because the polymerase terminates or stalls at the lesion sites. Owing to its simplicity, UV irradiation has been used to treat PCR and MDA reagents to suppress the amplification of unwanted DNA when dealing with one or a few copies of target DNA [3], [12], [13]. In an attempt to standardize the UV irradiation method, we report here the effect of different UV doses on the removal of contaminant DNA from the MDA reagents used for single cell whole genome amplification, as well as the impact of UV on enzymatic activity. From the analysis of genomic sequence data of >100 Escherichia coli single cells, we demonstrate the optimal range of UV treatment of MDA reagents for efficiently removing contaminant DNA without significantly reducing Phi29 activity or introducing additional single cell genome coverage bias or artifacts.
Results and Discussion
Real-time MDA and high throughput shotgun sequencing allowed us to identify the ideal UV exposure required to eliminate exogenous DNA amplification while maintaining sufficient polymerase activity for whole genome amplification. Removal efficiency was assessed by intentionally contaminating MDA reagents with 50 fg of Bacillus subtilis DNA in each reaction, which is equivalent to approximately 10 genome copies. Contaminated and uncontaminated MDA reaction cocktails were irradiated for 0, 30, 60 and 90 min prior to real-time amplification of individual E. coli cells (Figures S1, S2, and S3). Amplification kinetics in the real-time MDA reactions of these single cells and positive controls (reactions with 10–100 E. coli cells) were compared across the UV treatments (Figure 1, Figure S3). The time required to amplify positive controls and single cells increased with UV treatment time. Only a marginal reduction in the number of amplified single cells and in the fluorescence intensity of the final amplified products was observed if the UV treatment time was limited to 60 min. Most of the single cell amplified products represent an approximately 10⁸-fold increase in DNA quantity (i.e. from 5 fg to 0.5 µg). In contrast, UV treatment had a much larger impact on the amplification of background contaminant DNA in the real-time MDA curves. These amplification curves indicate a window of opportunity to harvest the amplified target genomes before background amplification occurs. The observed deterioration of MDA activity was due to a reduction in Phi29 enzymatic activity: MDA activity could be restored by adding more polymerase, suggesting that the hexamers, nucleotides and other components are not the limiting factors in the UV-treated reagents (data not shown).
In summary, the real-time MDA data suggest that a 60 min UV treatment of the reagents effectively eliminates amplification in no-template controls and does not have a significant impact on polymerase activity in single cell reactions.
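As a rough check on the quantities quoted above, the following Python sketch converts DNA mass to genome copies and recomputes the fold amplification. The average base-pair mass (650 g/mol) and both genome lengths are assumptions introduced here for illustration; they are not values reported in this study.

```python
# Back-of-the-envelope check of the spike-in and amplification figures.
AVOGADRO = 6.022e23            # molecules per mole
BP_MASS_G_PER_MOL = 650.0      # assumed average mass of one base pair of dsDNA

# Assumed genome lengths (not stated in the text).
E_COLI_BP = 4_639_675          # E. coli K-12 MG1655
B_SUBTILIS_BP = 4_215_606      # B. subtilis 168


def genome_mass_fg(genome_bp):
    """Mass of one double-stranded genome copy, in femtograms."""
    return genome_bp * BP_MASS_G_PER_MOL / AVOGADRO * 1e15


def genome_copies(mass_fg, genome_bp):
    """Number of genome copies contained in mass_fg femtograms of DNA."""
    return mass_fg / genome_mass_fg(genome_bp)


spike_copies = genome_copies(50.0, B_SUBTILIS_BP)   # 50 fg B. subtilis spike
single_cell_fg = genome_mass_fg(E_COLI_BP)          # one E. coli genome copy
fold_increase = 0.5e-6 / 5e-15                      # 0.5 µg product from 5 fg

print(f"{spike_copies:.1f} copies, {single_cell_fg:.1f} fg, {fold_increase:.0e}-fold")
```

Under these assumptions the 50 fg spike corresponds to roughly 11 genome copies and a single E. coli genome weighs roughly 5 fg, consistent with the approximate figures given in the text.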
To verify our real-time MDA results, we performed shotgun sequencing of 109 E. coli single amplified genomes (SAGs) and 37 control samples on the Illumina GAIIx platform (Figure S1). We generated 7.6 Gbp from these libraries, which corresponds to approximately 10x sequence coverage for each MDA product (Supplementary Methods). Reads were mapped to the E. coli and B. subtilis genomes and compared by BLAST against the nt database to determine the composition of the sequencing libraries. We found that a 60 min UV treatment (a cumulative dose of 11.4 J/cm²) of the MDA reagents completely eliminated the common contaminants (e.g. Pseudomonas and Delftia sequences) typically found in untreated samples (Figure 2A–C). Even the 30 min UV treatment removed most of the common contaminants from the reagents. We also observed that unmapped reads (i.e. reads with no similarity to any GenBank organism) persisted through UV treatments as long as 90 min (a cumulative dose of 17.1 J/cm²). Unmapped reads could represent either contaminant organisms that have not yet been sequenced or the elongated products of hexamers priming each other. The lack of both sequence similarity (blastx hits) to known proteins and predicted long open reading frames (ORFs), as well as the absence of matching reads among different UV-treated samples (data not shown), suggests that these unmapped reads originated from random hexamers. Similarly, the percentage of reads matching B. subtilis in intentionally contaminated libraries dropped to nearly zero after 60 minutes of exposure (Figure S4).
To assess whether the UV treatment diminished genome recovery, we generated rarefaction curves of genome coverage for the single cell genome assemblies (Figure 2D). Rarefaction curves for the different treatment durations did not differ significantly, suggesting that UV treatment does not systematically impact genome recovery. Twelve single E. coli cells were sequenced to a greater depth (∼160x sequence coverage), yielding genome recovery of ∼32–72% based on read mapping or 13–41% when using de novo assembly. These estimates provide a baseline for what one can expect to recover from a single cell given the protocols used in this study and a short-read sequence depth of 160x (Figure S5). We moreover assessed the impact of the photodamaged hexamers on the error rate of the amplified genomes. The average error rate of the resulting E. coli reads was not significantly different among the UV treatments: 1.1±0.1%, 0.9±0.1%, 0.9±0.1%, and 0.8±0.2% for UV treatments of 0, 30, 60, and 90 minutes, respectively. This result indicates that either the photodamaged hexamers were not incorporated into the amplified genomes or the UV treatment does not impact the enzyme's proofreading ability. Thus, UV irradiation is a simple and effective treatment for decontaminating MDA reagents used for single cell genome amplification.
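The per-library coverage quoted above can be approximated with a short calculation. This is only a sketch: it assumes the 7.6 Gbp were spread evenly over all 146 libraries (109 SAGs plus 37 controls) and uses an assumed E. coli K-12 genome length, neither of which is stated in the text.

```python
def mean_coverage(total_bp, n_libraries, genome_bp):
    """Average fold coverage per library if sequence is split evenly."""
    return total_bp / n_libraries / genome_bp


E_COLI_BP = 4_639_675  # assumed reference genome length
coverage = mean_coverage(7.6e9, 109 + 37, E_COLI_BP)
print(f"about {coverage:.0f}x per library")
```

In practice pooled libraries are never perfectly even, so individual libraries will sit above or below this average.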
Materials and Methods
Single-cell sorting
The cells used in this study, Escherichia coli str. K-12 substr. MG1655 (TaxID: 511145), were originally obtained from ATCC (strain #700926). Cells were collected following the clean sorting procedures detailed by Rodrigue et al. 2009 [14]. Briefly, individual E. coli cells from a stationary phase culture were sorted by the Cytopeia Influx Cell Sorter (BD Biosciences) into two 96-well plates containing 3 µl of UV-treated TE. The cells were stained with SYBR Green I (Invitrogen) and illuminated by a 488 nm laser (Coherent Inc.). The sorting window was based on size determined by side scatter and green fluorescence (531/40 bp filter). For each plate, single cells were sorted into eight columns, 100 and 10 cells into one column, a droplet of sheath fluid into one column (noise sort), and no droplets at all into two columns (no sort), for a total of twelve columns, i.e. one full plate (Figure S2).
Single cell lysis and real-time multiple displacement amplification (MDA)
We compared two procedures for UV decontamination of reagents: (i) non-spiked MDA reagents and (ii) spiked MDA reagents. E. coli single cells and controls in one 96-well plate were lysed for 20 min at room temperature using alkaline solution from the Repli-G UltraFast Mini Kit (Qiagen) according to the manufacturer's instructions. After neutralization, the samples were amplified using the Repliphi Phi29 reagents (Epicentre). Each 50 µl reaction contained Phi29 Reaction Buffer (1X final concentration), 50 µM random hexamers with phosphorothioate bonds between the last two nucleotides at the 3′ end (IDT), 0.4 mM dNTP, 5% DMSO (Sigma), 10 mM DTT (Sigma), 100 U Phi29 and 0.5 µM Syto 13 (Invitrogen). A mastermix of MDA reagents minus the Syto 13 (which degrades when exposed to UV) sufficient for a 96-well plate was assembled and then aliquoted into four Eppendorf Safe-Lock 1.5 ml clear microcentrifuge tubes. The tubes of mastermix were UV treated on ice (Figure S1) in the Stratalinker 2400 UV Crosslinker (Stratagene) at 254 nm for 0, 30, 60 and 90 min. These exposures correspond to UV doses of 0, 5.7, 11.4 and 17.1 J/cm², respectively, measured inside the Eppendorf tubes at a distance of 4 cm from the light bulb (Figure S2). Syto 13 was added to the mastermix after UV treatment, and each tube of treated MDA mastermix was added to one quarter of the 96-well plate of lysed E. coli single cells, including the respective controls (Figure S2). The MDA reactions were run in real time on the Roche LightCycler 480 for 17 hours at 30°C. The same procedure was used for a second 96-well plate, but the MDA mastermix was purposefully contaminated with 50 fg of Bacillus subtilis DNA per MDA reaction prior to UV treatment.
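The reported doses scale linearly with exposure time, which implies a constant irradiance inside the tubes. The sketch below back-calculates that irradiance from the 60 min dose and regenerates the other doses. The constant-irradiance assumption and the function names are introduced here for illustration only.

```python
def irradiance_mw_per_cm2(dose_j_per_cm2, minutes):
    """Irradiance (mW/cm²) implied by a dose delivered over `minutes`."""
    return dose_j_per_cm2 / (minutes * 60.0) * 1000.0


def uv_dose_j_per_cm2(irradiance_mw, minutes):
    """Cumulative dose (J/cm²) at constant irradiance: dose = irradiance x time."""
    return irradiance_mw / 1000.0 * minutes * 60.0


# Back-calculate from the reported 60 min dose of 11.4 J/cm².
irradiance = irradiance_mw_per_cm2(11.4, 60)

# Regenerate the dose for every exposure time used in the study.
doses = {t: round(uv_dose_j_per_cm2(irradiance, t), 1) for t in (0, 30, 60, 90)}
print(f"{irradiance:.2f} mW/cm²", doses)
```

A laboratory using a different crosslinker or tube geometry would need to measure its own irradiance and adjust the exposure time to reach a comparable dose.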
Indexed Illumina library construction and sequencing
Single cell amplified DNA was sheared in 100 µl using the Covaris E210 with settings of 10% duty cycle, intensity 5, and 200 cycles per burst for 6 min per sample. The concentration and fragment size of each sheared sample were determined on the Caliper GX machine using the manufacturer's recommended conditions. The fragment sizes were in the range of 250 to 400 bp, and the concentration ranged from 0 to 37.25 ng/µl. The sheared DNA was end-repaired, A-tailed, and ligated to the Illumina adaptors according to the Illumina standard PE protocol. The adaptor-ligated samples were then amplified by PCR for 10 cycles using a set of 96 indexed primers. The concentration of the resulting 96 Illumina indexed libraries was again determined using the Caliper GX machine. Two nM of DNA fragments (0.5 to 12 µl) of each library were pooled together and the main library bands around 300 bp were gel-purified and dissolved in 30 µl TE. The two library pools, one spiked with B. subtilis DNA and one without, had concentrations of 21.5 ng/µl (or 105.9 nM) and 25.4 ng/µl (or 120.1 nM), respectively. One flowcell lane was generated from each library pool and sequenced on an Illumina GAIIx sequencer according to the manufacturer's protocols. Approximately 4.1 and 3.5 Gbp of sequence data were collected from the spiked and unspiked pooled libraries, respectively. Additional 2 nM aliquots from a selected set of twenty indexed libraries derived from the unspiked plate were pooled to form 4 new library pools. Approximately 8 Gbp of additional sequence data was generated from these 4 library pools to increase the sequence depth of these SAGs.
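The paired mass and molar concentrations reported for the two library pools can be related through the average fragment length. The following sketch performs that conversion. The value of 660 g/mol per base pair is an assumed average for double-stranded DNA, and the function names are hypothetical.

```python
BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one base pair of dsDNA


def ng_per_ul_to_nM(conc_ng_per_ul, fragment_bp, bp_mass=BP_MASS_G_PER_MOL):
    """Convert a dsDNA mass concentration to molarity for a given mean length."""
    return conc_ng_per_ul * 1e6 / (bp_mass * fragment_bp)


def implied_fragment_bp(conc_ng_per_ul, molarity_nM, bp_mass=BP_MASS_G_PER_MOL):
    """Mean fragment length implied by a mass and a molar concentration."""
    return conc_ng_per_ul * 1e6 / (bp_mass * molarity_nM)


spiked_bp = implied_fragment_bp(21.5, 105.9)     # spiked library pool
unspiked_bp = implied_fragment_bp(25.4, 120.1)   # unspiked library pool
print(f"spiked: {spiked_bp:.0f} bp, unspiked: {unspiked_bp:.0f} bp")
```

Under this assumption both pools imply a mean fragment length slightly above 300 bp, in line with the gel-purified bands around 300 bp.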
Data analysis
Sequences derived from each SAG were mapped to reference genomes of Escherichia coli K-12 (U00096.2), Delftia acidovorans SPH-1 (CP000884.1), and 22 Pseudomonas genomes (including Pseudomonas syringae (NC_004578.1, NC_004632.1, NC_004633.1, NC_005773.3, NC_007005.1, NC_007274.1, NC_007275.1), Pseudomonas putida (NC_002947.3, NC_009512.1, NC_010322.1, NC_010501.1), Pseudomonas fluorescens (NC_004129.6, NC_007492.2, NC_009444.1, NC_012660.1), Pseudomonas aeruginosa (NC_002516.2, NC_008463.1, NC_009656.1, NC_011770.1), Pseudomonas stutzeri (NC_009434.1), Pseudomonas mendocina (NC_009439.1), and Pseudomonas entomophila (NC_008027.1)) using the short read alignment program bwa (version 0.5.8c, default mapping parameters) [15] to determine the fraction of reads mapping to each of the three groups. Unmapped reads were further compared to NCBI's non-redundant nucleotide database using megablast 2.2.23. The best BLAST hits were used to determine the distribution of phyla matched by the reads from each SAG.
Based on the alignments to Escherichia coli K-12 and de novo assemblies of all SAGs, we calculated the fraction of the reference genome covered by at least one read and by the contigs resulting from assembly, respectively (Figures 2, S4, and S5). MDA introduces a tremendous bias in the sequencing coverage of the genome, which causes problems in the assembly process. Therefore, all raw Illumina sequence data was passed through a filtering program developed at JGI, which filters out known Illumina sequencing and library preparation artifacts. Specifically, all reads containing sequencing adapters, low complexity reads, and reads containing short tandem repeats were removed. Duplicated read pairs derived from PCR amplification during library preparation were identified and consolidated into a single consensus read pair. The artifact-filtered sequence data was screened and trimmed according to the k-mers present in the dataset. High-depth k-mers, presumably derived from MDA amplification bias, cause problems in the assembly, especially if the k-mer depth varies by orders of magnitude between different regions of the genome. We removed reads representing high-abundance k-mers (>32x k-mer depth, k = 31) and trimmed reads that contain unique k-mers.
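To illustrate the idea of depth-based read filtering described above, here is a minimal Python sketch in the style of digital normalization. It is not the JGI filtering program, whose exact rules are not given here. The decision rule (drop a read when the median depth of its k-mers among already-kept reads exceeds the cutoff) is an assumption, and the sketch omits reverse complements and the trimming of unique k-mers.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """All overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def filter_high_depth_reads(reads, k=31, max_depth=32):
    """Keep a read unless its median k-mer depth already exceeds max_depth.

    Depth is counted only over reads kept so far (streaming normalization).
    Assumption: reverse-complement k-mers are not merged in this sketch.
    """
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read is shorter than k
        if median(counts[km] for km in read_kmers) > max_depth:
            continue  # region is already over-represented
        counts.update(read_kmers)
        kept.append(read)
    return kept


# Toy example with a small k and cutoff so the effect is visible.
toy_reads = ["ACGTTGCAAG"] * 10 + ["TTTTGGGGCC"]
toy_kept = filter_high_depth_reads(toy_reads, k=5, max_depth=3)
print(len(toy_reads), "reads in,", len(toy_kept), "reads kept")
```

The point of such a filter is to flatten the extreme coverage peaks produced by MDA so that a de Bruijn graph assembler sees a more uniform k-mer depth.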
The filtered reads of each SAG were assembled into contigs using Velvet version 1.1.04 [16]. The VelvetOptimiser script (version 2.1.7) was used with default optimization functions (n50 for k-mer choice, total number of base pairs in large contigs for cov_cutoff optimization).
Rarefaction analysis was performed by sub-sampling the BAM alignment files generated by bwa (see above). For each sample size an appropriate number of pairs of reads were extracted randomly from the BAM file where both reads mapped to the E. coli reference sequence. The mapping information of the sub-samples was used to calculate the fraction of the E. coli reference covered by at least one read. Additionally, we assembled each subsample and mapped the contigs back to the reference (Figure S5).
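The mapping-based rarefaction described above can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: it assumes the aligned reads have already been extracted from the BAM file as half-open `(start, end)` intervals on the reference, samples individual intervals rather than read pairs, and uses hypothetical function names.

```python
import random


def covered_fraction(intervals, genome_length):
    """Fraction of the reference covered by at least one interval.

    Intervals are 0-based, half-open (start, end) positions on the reference.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(intervals):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def rarefaction_curve(intervals, genome_length, sample_sizes, seed=0):
    """Covered fraction for random subsamples of increasing size."""
    rng = random.Random(seed)
    curve = []
    for n in sample_sizes:
        subsample = rng.sample(intervals, min(n, len(intervals)))
        curve.append((n, covered_fraction(subsample, genome_length)))
    return curve


# Toy example: three 10 bp alignments on a 100 bp reference.
toy_alignments = [(0, 10), (5, 15), (20, 30)]
toy_curve = rarefaction_curve(toy_alignments, 100, [1, 2, 3])
print(toy_curve)
```

Plotting the covered fraction against subsample size for each UV treatment gives curves that can be compared directly, as in Figure 2D.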
We also analyzed the error rate of the Illumina reads to assess whether UV treatment has any impact on Phi29 proofreading activity. BAM alignment files were used to calculate the number of exact matching bases, mismatches, insertions, deletions, and clipped bases (bwa soft clipping). For each E. coli single cell, error rates were calculated for all reads mapping to the E. coli reference genome.
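One way to derive such counts from alignment records is sketched below, using only the CIGAR string and the NM edit-distance tag of each read. The exact error-rate formula used in the study is not stated here, so the definition in this sketch (mismatched, inserted and deleted bases divided by all aligned bases, with soft-clipped bases excluded) is an assumption, as is the availability of the NM tag on every record.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_alignment(cigar, nm):
    """Count matching, mismatched, inserted, deleted and soft-clipped bases.

    nm is the SAM NM tag: mismatches plus inserted plus deleted bases.
    Assumes alignments use M (not = / X) for aligned columns.
    """
    totals = {"M": 0, "I": 0, "D": 0, "S": 0}
    for length, op in CIGAR_OP.findall(cigar):
        if op in totals:
            totals[op] += int(length)
    mismatches = nm - totals["I"] - totals["D"]
    return {
        "match": totals["M"] - mismatches,
        "mismatch": mismatches,
        "insertion": totals["I"],
        "deletion": totals["D"],
        "clipped": totals["S"],
    }


def error_rate(alignments):
    """Error rate over an iterable of (cigar, nm) pairs; clipped bases ignored."""
    errors = 0
    aligned = 0
    for cigar, nm in alignments:
        t = tally_alignment(cigar, nm)
        read_errors = t["mismatch"] + t["insertion"] + t["deletion"]
        errors += read_errors
        aligned += t["match"] + read_errors
    return errors / aligned if aligned else 0.0


example = tally_alignment("5S90M2I3D5M", nm=7)
example_rate = error_rate([("5S90M2I3D5M", 7)])
print(example, example_rate)
```

Note that a rate computed this way combines sequencing errors with amplification errors, so differences between UV treatments are more informative than the absolute values.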
Supporting Information
Acknowledgments
We thank Eric Tang for the library construction and the JGI production sequencing team for generating the sequences.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was conducted by the U.S. Department of Energy Joint Genome Institute and supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Dr. Stepanauskas was supported by National Science Foundation (NSF) grants EF-0633142, MCB-738232 and EF-826924. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
1. Raghunathan A, Ferguson HR Jr, Bornarth CJ, Song W, Driscoll M, et al. Genomic DNA amplification from a single bacterium. Appl Environ Microbiol. 2005;71:3342–3347. doi: 10.1128/AEM.71.6.3342-3347.2005.
2. Marcy Y, Ouverney C, Bik EM, Losekann T, Ivanova N, et al. Dissecting biological “dark matter” with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. Proc Natl Acad Sci U S A. 2007;104:11889–11894. doi: 10.1073/pnas.0704662104.
3. Zhang K, Martiny AC, Reppas NB, Barry KW, Malek J, et al. Sequencing genomes from single cells by polymerase cloning. Nat Biotechnol. 2006;24:680–686. doi: 10.1038/nbt1214.
4. Stepanauskas R, Sieracki ME. Matching phylogeny and metabolism in the uncultured marine bacteria, one cell at a time. Proc Natl Acad Sci U S A. 2007;104:9052–9057. doi: 10.1073/pnas.0700496104.
5. Woyke T, Xie G, Copeland A, Gonzalez JM, Han C, et al. Assembling the marine metagenome, one cell at a time. PLoS One. 2009;4:e5299. doi: 10.1371/journal.pone.0005299.
6. Woyke T, Tighe D, Mavromatis K, Clum A, Copeland A, et al. One bacterial cell, one complete genome. PLoS One. 2010;5:e10314. doi: 10.1371/journal.pone.0010314.
7. Blainey PC, Quake SR. Digital MDA for enumeration of total nucleic acid contamination. Nucleic Acids Res. 2010;39:e19. doi: 10.1093/nar/gkq1074.
8. Cadet J, Berger M, Decarroz C, Wagner JR, van Lier JE, et al. Photosensitized reactions of nucleic acids. Biochimie. 1986;68:813–834. doi: 10.1016/s0300-9084(86)80097-9.
9. Cadet J, Sage E, Douki T. Ultraviolet radiation-mediated damage to cellular DNA. Mutat Res. 2005;571:3–17. doi: 10.1016/j.mrfmmm.2004.09.012.
10. Brash DE, Haseltine WA. UV-induced mutation hotspots occur at DNA damage hotspots. Nature. 1982;298:189–192. doi: 10.1038/298189a0.
11. Varghese AJ, Patrick MH. Cytosine derived heteroadduct formation in ultraviolet-irradiated DNA. Nature. 1969;223:299–300. doi: 10.1038/223299a0.
12. Ou CY, Moore JL, Schochetman G. Use of UV irradiation to reduce false positivity in polymerase chain reaction. Biotechniques. 1991;10:442–446.
13. Champlot S, Berthelot C, Pruvost M, Bennett EA, Grange T, et al. An efficient multistrategy DNA decontamination procedure of PCR reagents for hypersensitive PCR applications. PLoS One. 2010;5:e13042. doi: 10.1371/journal.pone.0013042.
14. Rodrigue S, Malmstrom RR, Berlin AM, Birren BW, Henn MR, et al. Whole genome amplification and de novo assembly of single bacterial cells. PLoS One. 2009;4:e6864. doi: 10.1371/journal.pone.0006864.
15. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
16. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008;18:821–829. doi: 10.1101/gr.074492.107.