Abstract
To improve the metagenomic analysis of complex microbiomes, we have repurposed restriction endonucleases as methyl-specific DNA binding proteins. As an example, we use DpnI immobilized on magnetic beads. The ten-minute extraction technique allows specific binding of genomes containing the DpnI Gm6ATC motif common in the genomic DNA of many bacteria including γ-proteobacteria. Using synthetic genome mixtures, we demonstrate 80% recovery of Escherichia coli genomic DNA even when only femtogram quantities are spiked into 10 µg of human DNA background. Binding is very specific with less than 0.5% of human DNA bound. Next Generation Sequencing of input and enriched synthetic mixtures results in over 100-fold enrichment of target genomes relative to human and plant DNA. We also show comparable enrichment when sequencing complex microbiomes such as those from creek water and human saliva. The technique can be broadened to other restriction enzymes allowing for the selective enrichment of trace and unculturable organisms from complex microbiomes and the stratification of organisms according to restriction enzyme enrichment.
Introduction
Next Generation Sequencing (NGS) has reinvigorated the understanding of the role that bacteria play as symbionts and pathogens of plants [1], insects [2], vertebrates [3] and in the environment [4], [5]. NGS has broadened the study of the prokaryotic world beyond the small fraction of bacteria (less than 1%) thought to be culturable [6], [7], [8]. Using NGS for metagenomic studies, in which an entire sample of mixed organismal DNA is sequenced, has the advantage of querying the entire population of isolated DNA and overcomes many biases of other metagenomic methods such as microarray analysis or multiplex PCR. However, there are some drawbacks to using NGS metagenomic strategies. First, sensitivity to microbes may be decreased in the presence of large amounts of non-informative DNA (e.g. eukaryotic DNA). Second, typical metagenomic samples can contain hundreds of bacterial species making it difficult to parse and assemble genomes [9].
Recently developed methods to selectively enrich prokaryotic DNA exploit the 5-methylcytosine (5mC) in CpG sites of eukaryotes (mCpG), a modification largely absent in the bacterial world. One method uses a methyl-binding protein/Fc fusion protein to bind eukaryotic mCpG containing DNA and remove it from the mixture [10]. In an alternate approach, a truncated version of the human cytidylate-phosphate-deoxyguanylate protein has been used to bind non-methylated CpG sequences in bacterial DNA [11]. Bacteria have other stable epigenetic modifications in addition to 5mC including 6-methyladenine (6mA) and 4-methylcytosine (4mC). The 6mA modification was shown to occur at 94.1% of the 41,791 GATC sites in the Escherichia coli genome [12] and is widespread in prokaryotes but is otherwise reported only in ciliates and lower eukaryotes [13]. The DNA adenine methyltransferase (DamMT) directs adenine methylation within the context of GATC sequences and is found in at least one clade of bacteria consisting of the orders Enterobacteriales, Vibrionales, Aeromonadales, Pasteurellales and Alteromonadales [14]. In E. coli, GATC methylation influences chromosome replication, gene expression and mismatch repair. In Vibrio cholerae it is required for viability and in Salmonella enterica and Haemophilus influenzae it may act as a virulence factor [14]. 6mA is also generated by some methyltransferases (MTases) as part of restriction modification systems [15]. Restriction endonucleases rely on methylation patterns to combat invasive genomes, particularly phage, while avoiding digestion of host DNA. Evolution has thus selected for enzymes with exquisite methylation sensitivity.
Here we present a restriction endonuclease-mediated DNA enrichment approach. DpnI is a methyl-directed restriction endonuclease that restricts DNA only when it is methylated on adenine residues within the GATC sequence [16], [17]. We therefore anticipated that DpnI could distinguish bacterial genomes containing the Gm6ATC DNA modification from other bacterial and eukaryotic DNA. By manipulating the reaction conditions, we can use it to bind DNA without cutting. Since DpnI binds to DNA only when it is adenine methylated within GATC sites, we predicted little or no binding to eukaryotic DNA and highly specific binding to DNA from DamMT+ bacteria. We demonstrate that DpnI can selectively enrich microbial DNA from synthetic and real-world samples. We extend our approach to a second restriction enzyme, DpnII, which specifically enriches non-methylated GATC DNA (e.g. the human genome). DNA enriched by this method can be used for PCR, qPCR and NGS analysis. The technique can enable the targeted enrichment of genomes from various microbiomes or the specific identification of pathogens from complex samples. We envision the use of restriction endonuclease binders to stratify complex metagenomic samples into groupings based on methylome signatures. This could link DNA fragments in otherwise poorly assembled contigs, aiding the reconstruction of genomes from unculturable organisms.
Materials and Methods
Genomic DNA was obtained from the ATCC with the exception of the following: E. coli K12 (Affymetrix, Santa Clara, CA); Yersinia pestis, Francisella tularensis, Burkholderia mallei, Burkholderia cepacia, Brucella abortus, Bacillus anthracis (BEI Resources, Manassas, VA); and Human, Arabidopsis and Rice (Zyagen, San Diego, CA). Commercially available DpnI and pUC19 were purchased from NEB (Ipswich, MA).
DpnI purification and biotinylation
DpnI was purified essentially as described [18] with some modifications. BL21(DE3)A cells transformed with pLS252 were obtained from ATCC. Following a 5 hour expression, cells were harvested, resuspended in 20 mM Tris (pH 7.6), 0.5 M NaCl, 0.1 mM EDTA, 1 mM BME and lysed. Following centrifugation, nucleic acids were removed by polyethyleneimine (PEI) treatment. The PEI supernatant was treated with 75% ammonium sulfate and subjected to centrifugation. The pellet was resuspended in 20 mM Tris pH 7.6, 100 mM NaCl, 0.1 mM EDTA, 1 mM BME and dialyzed against Buffer A (20 mM Tris pH 7.6, 150 mM NaCl, 0.1 mM EDTA, 5 mM BME). The dialysate was loaded onto a phosphocellulose column and eluted with Buffer B (20 mM Tris pH 7.6, 1 M NaCl, 0.1 mM EDTA, 5 mM BME). Fractions containing DpnI were pooled, dialyzed against Buffer A and loaded onto an EMD sulfate column. Fractions containing DpnI were again pooled, dialyzed against Buffer A and loaded onto an EMD sulfate column to remove any remaining contaminants.
DpnI was biotin labeled with the EZ-Link Sulfo-NHS-biotin kit (Pierce, Rockford, IL) following the manufacturer's protocol. The extent of biotinylation was evaluated using the HABA assay (Pierce). Each mole of DpnI was found to contain 4–5 moles of biotin.
Restriction activity assay
1 µg of pUC19 was digested in the presence of 100 ng of purified DpnI, DpnI-biotin or with 20 U of commercial DpnI in 20 mM Tris-HCl (pH 7.6), 50 mM NaCl, 10 mM CaCl2, with or without 20 mM MgCl2 for 1 hour at 37°C. Reactions were stopped by the addition of loading buffer containing SYBR green (Life Technologies, Carlsbad, CA). DNA was separated on a 1.5% TBE agarose gel.
Generation of template DNA
DNA was PCR amplified from pUC19 using primers (IDT, San Diego, CA) that resulted in a 477 nt fragment (Forward- TCTGCGCTCTGCTGAAGCCAGTTAC; reverse- GCTGATAAATCTGGAGCCGGTGAGC) or a 651 nt fragment (forward- GGCAGCAGCCACTGGTAACAGGATT; reverse- GATGGAGGCGGATAAAGTTGCAGGA). The 477 nt fragment was treated with dam methyltransferase (NEB) resulting in DNA containing the Gm6ATC modification. All fragments were gel-purified using agarose gel electrophoresis and the MinElute Gel Extraction kit (Qiagen, Venlo, Limburg).
Electrophoretic mobility shift assay
EMSA was carried out as previously described [19] with some modifications. FAM-labeled duplex oligonucleotide containing one Gm6ATC site with the top strand sequence FAM-GCAGGm6ATCAACAGTCACACT (TriLink, San Diego, CA) was incubated with DpnI (or DpnI-biotin) in the presence of 20 mM Tris-HCl, 50 mM NaCl, 10 mM CaCl2, 1 mg/ml BSA and 10 µg/ml salmon sperm DNA for 30 minutes at room temperature. Glycerol was added to a final concentration of 10% and the samples loaded onto a 20% TBE acrylamide gel (Life Technologies) that had been pre-run for 2 hours at 4°C with TBE. Samples were subjected to separation at 200 V for 1.75 hours. FAM-labeled DNA was imaged using an AlphaImager (Protein Simple, Santa Clara, CA).
DpnI pull-down assay
Preparation of DpnI-coated magnetic beads
20 µl streptavidin magnetic beads (NEB) were washed twice with Binding Buffer (10 mM Tris pH 7.9, 50 mM NaCl, 10 mM CaCl2, 0.01% Tween 20). Biotinylated DpnI was added to the beads at 10 ng DpnI/µl beads. After mixing by pipetting, the beads were washed twice with Binding Buffer and used for binding reactions.
DNA pull-down
DNA samples were prepared in Binding Buffer. The assay was performed either in 1.7 ml microcentrifuge tubes or in a 96-well plate. 50 µl DNA samples were added to the DpnI coated beads. The beads were mixed by end-over-end rotation or on a plate shaker for 5 minutes to 1 hour. Magnetic beads were separated using either a tube magnetic stand (Life Technologies) or a plate magnet (Millipore, Billerica, MA). The beads were washed once with Wash Buffer (10 mM Tris pH 7.9, 500 mM NaCl, 10 mM CaCl2, 0.1% Tween 20) followed by one Binding Buffer wash. Beads were resuspended in 50 µl of Binding Buffer for qPCR analysis.
For gel analysis and next-generation library preparation, the DNA was eluted from beads by incubation with 50 µl 5 M guanidinium thiocyanate at room temperature for 5 minutes. The eluent was transferred to a 3500 MWCO dialysis tube (Thermo Scientific, Waltham, MA) and dialyzed against distilled water for 1 hour at room temperature.
Genomic DNA qPCR analysis
Primers were synthesized by IDT and probes were made by Life Technologies. Reactions were prepared using the QuantiProbe FAST PCR Kit (Qiagen) except for the DYZ assay, which was prepared with TaqMan Universal Master Mix (Life Technologies). Reactions were cycled once at 95°C for 3 minutes followed by 40 cycles of 95°C for 3 seconds and 60°C for 30 seconds on an ABI 7300. The universal bacterial 16S assay has been described previously [20]. Assays specific for human RNaseP, human TERT and the Arabidopsis ACT2 gene were obtained from Life Technologies. E. coli 16S assay: forward - CCAGGGCTACACACGTGCTA; reverse - TCTCGCGAGGTCGCTTCT; probe - AATGGCGCATACAAA. Human DYZ assay: forward - TCGAGTGCATTCCATTCCG; reverse - ATGGAATGGCATCAAACGGAA; probe - TGGCTGTCCATTCCA. Relative abundance was calculated using either a standard curve or the delta Ct method. For the universal 16S assay, standard curves were generated using the genomic DNA of the organism being tested to correct for the varied copy number of the 16S gene.
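The delta Ct calculation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes that input and bound fractions are assayed in equal template volumes (consistent with the 50 µl sample and 50 µl bead resuspension used in the pull-down) and that amplification is a perfect doubling per cycle. The function name, the `efficiency` parameter and the Ct values are hypothetical.

```python
def fraction_recovered(ct_input: float, ct_bound: float, efficiency: float = 2.0) -> float:
    """Estimate the fraction of input template present in the bound fraction.

    Assumptions (not stated in the source): equal template volumes in both
    reactions, and an amplification factor of `efficiency` per cycle
    (2.0 corresponds to perfect doubling).
    """
    delta_ct = ct_bound - ct_input
    return efficiency ** (-delta_ct)


# Hypothetical Ct values, for illustration only.
recovery = fraction_recovered(ct_input=24.0, ct_bound=24.3)
print(f"Estimated recovery: {recovery:.0%}")
```

A bound fraction that crosses threshold one full cycle later than the input would correspond to 50% recovery under these assumptions.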
Preparation of synthetic mixture
Bacterial genomes were obtained through the ATCC or BEI as listed and concentrations determined using the Qubit dsDNA HS assay (Life Technologies). Bacterial genomes were diluted with water to obtain the desired concentrations (Table 1) and validated again using Qubit dsDNA HS assay before assembly of the final synthetic mix.
Table 1. DpnI pulls down genomic DNA from different organisms with varying efficiency.
Family | Organism | Strain | Gram | DamMT | DpnI Pull-Down Efficiency* |
Aeromonadaceae | Aeromonas hydrophila | ATCC 7966 | - | + | ++ |
Enterobacteriaceae | Enterobacter cloacae | ATCC 13047 | - | + | ++ |
Enterobacteriaceae | Escherichia coli | K12 | - | + | ++ |
Enterobacteriaceae | Klebsiella pneumoniae | ATCC 700721 | - | + | ++ |
Enterobacteriaceae | Proteus mirabilis | ATCC 12453 | - | + | ++ |
Enterobacteriaceae | Salmonella typhimurium | SU453 | - | + | ++ |
Enterobacteriaceae | Serratia marcescens subsp. marcescens | ATCC 13880 | - | + | ++ |
Enterobacteriaceae | Yersinia pestis | China CDC | - | + | ++ |
Enterobacteriaceae | Yersinia pseudotuberculosis | ATCC 13979 | - | + | ++ |
Pasteurellaceae | Haemophilus influenzae | ATCC 51907 | - | + | ++ |
Pasteurellaceae | Haemophilus parahaemolyticus | ATCC 10014 | - | + | ++ |
Pasteurellaceae | Haemophilus parainfluenzae | ATCC 33392 | - | + | ++ |
Legionellaceae | Legionella pneumophila | ATCC 33152 | - | (+) | ++ |
Campylobacteraceae | Campylobacter jejuni subsp. jejuni | ATCC 700819 | - | (+) | + |
Helicobacteraceae | Helicobacter pylori | ATCC 700824, J99 | - | (-) | + |
Burkholderiaceae | Burkholderia mallei | CRP 23344 | - | (-) | + |
Burkholderiaceae | Burkholderia cepacia | CRP BRUK102 | - | - | + |
Brucellaceae | Brucella abortus | CRP 2308 | - | - | + |
Pseudomonadaceae | Pseudomonas aeruginosa | ATCC 47085 | - | - | + |
Bacillaceae | Bacillus anthracis | Sterne | + | - | - |
Enterococcaceae | Enterococcus faecium | ATCC 51559 | + | - | - |
Eukaryota - Fungi | Aspergillus fumigatus | MYA-4609 | N/A | - | +/- |
Eukaryota - Brassicaceae | Arabidopsis thaliana | | N/A | - | - |
Eukaryota - Hominidae | Homo sapiens, male | | N/A | - | - |
*Recovery as compared to input by qPCR.
Key: - less than 2%; +/- 2–10%; + 10–50%; ++ 50–100%.
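The scoring key for Table 1 can be expressed as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and because the key does not say which category a value exactly on a boundary (2%, 10% or 50%) belongs to, the code assumes that boundary values fall into the higher category.

```python
def pull_down_symbol(recovery_percent: float) -> str:
    """Map a qPCR recovery (percent of input) to the Table 1 symbol.

    Boundary handling is an assumption: values of exactly 2, 10 or 50
    are placed in the higher category.
    """
    if recovery_percent < 2:
        return "-"
    if recovery_percent < 10:
        return "+/-"
    if recovery_percent < 50:
        return "+"
    return "++"


# Hypothetical recoveries, for illustration only.
for value in (0.4, 5.0, 30.0, 80.0):
    print(value, pull_down_symbol(value))
```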
DNA isolation from saliva
The PowerSoil DNA isolation kit (MO BIO Laboratories, Carlsbad, CA) was used to extract DNA from 1 ml of pooled human saliva (BioReclamation, Farmingdale, NY). The DNA was eluted in DpnI Binding Buffer and 400 ng of the DNA was subjected to the DpnI pull-down assay. The input, unbound, and bound/eluted fractions were used to prepare sequencing libraries.
DNA isolation from creek water
A 1000 ml water sample was collected from a creek 25 meters downstream from a sedimentation pond used for primary passive treatment of ground water run-off. A 100 ml aliquot was filtered over a 0.2 µm Nalgene sterile analytical filter unit (Thermo Scientific) prior to DNA extraction with the PowerWater DNA Isolation Kit (MO BIO Laboratories). A 150 ng aliquot of the DNA was subjected to the DpnI pull-down assay. The input, unbound, and bound/eluted fractions were used to prepare sequencing libraries.
Library preparation and sequencing
The Nextera DNA Sample Preparation Kit (Illumina, San Diego, CA) was used to prepare libraries from input, unbound, and bound/eluted fractions from DpnI pull-down assays. Manufacturer's instructions were followed for the library preparation except for recommended number of PCR cycles, which were varied according to the amount of DNA. For the synthetic mixture, they were as follows: Input – 7 cycles, DpnI bound – 10 cycles, DpnI unbound – 7 cycles. Libraries were sequenced following the manufacturer's instructions for the HiSeq 2500 Rapid Run mode to obtain 50 nucleotide read lengths. The files corresponding to all the raw reads generated in this study are publicly available at the NCBI Short Read Archive (SRP044748).
Sequence analysis
For microbial taxa identification, Illumina data sets were analyzed by an automated pipeline (ZovaSeq from Zova Systems, LLC, San Diego CA) in which identifying sequence reads are assigned to a specific microbial taxon when a given read length is found to occur uniquely within that taxon as defined by the NCBI taxonomy database [21], [22]. Relative abundance was calculated using two methods which gave equivalent results: tallying the number of ZovaSeq identifying reads for each bacterial taxon, or using Bowtie 1.0.0 to map reads to all identified organisms in the sample by perfect match. For known higher eukaryotes in the sample (Homo sapiens, Oryza sativa) reads were mapped using Bowtie 1.0.0 with parameters allowing 2 mismatches in a 28 bp seed region.
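The two Bowtie mapping modes described above might be invoked along the following lines. This is a sketch under stated assumptions rather than the command actually used: it assumes that "perfect match" corresponds to Bowtie 1's `-v 0` end-to-end mode and that the eukaryotic setting corresponds to `-n 2 -l 28` (seed mismatches and seed length). The index prefixes and file names are hypothetical placeholders, and the helper only builds the argument list without running anything.

```python
from typing import List


def bowtie_command(index_prefix: str, reads_fastq: str, out_sam: str,
                   perfect_match: bool) -> List[str]:
    """Build a Bowtie 1 argument list for one of the two mapping modes.

    perfect_match=True  -> '-v 0' (no mismatches over the whole read; assumed
                           to be what "perfect match" means in the text)
    perfect_match=False -> '-n 2 -l 28' (up to 2 mismatches in a 28 bp seed)
    """
    if perfect_match:
        mode = ["-v", "0"]
    else:
        mode = ["-n", "2", "-l", "28"]
    # -S requests SAM output; the positional arguments are index, reads, output.
    return ["bowtie", *mode, "-S", index_prefix, reads_fastq, out_sam]


# Hypothetical file names, for illustration only.
print(" ".join(bowtie_command("bacteria_index", "bound.fastq", "bound_bacteria.sam", True)))
print(" ".join(bowtie_command("human_index", "bound.fastq", "bound_human.sam", False)))
```

Returning a list rather than a shell string keeps the sketch safe to pass to `subprocess.run` without quoting concerns.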
Relative enrichment of the DpnI bound versus input samples was determined by the following equation:
Relative enrichment as compared with human DNA was determined by dividing DpnI enrichment for the organism of interest by DpnI enrichment for human.
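The enrichment equation itself is not reproduced in this text. A minimal sketch, assuming it compares an organism's share of mapped reads in the bound fraction with its share in the input, is given below. This reading is consistent with the human values reported in the Results (59% of mapped reads in the input, 5% in the bound fraction, a ratio below 0.1), but the function names and the read counts are hypothetical.

```python
def relative_enrichment(bound_reads: int, bound_total: int,
                        input_reads: int, input_total: int) -> float:
    """Share of reads in the bound fraction divided by share in the input.

    Assumed form of the enrichment ratio; values above 1 indicate
    enrichment and values below 1 indicate depletion.
    """
    bound_share = bound_reads / bound_total
    input_share = input_reads / input_total
    return bound_share / input_share


def enrichment_vs_human(organism_enrichment: float, human_enrichment: float) -> float:
    """Organism enrichment divided by human enrichment, as described in the text."""
    return organism_enrichment / human_enrichment


# Hypothetical read counts, for illustration only.
target = relative_enrichment(bound_reads=500_000, bound_total=1_000_000,
                             input_reads=10_000, input_total=1_000_000)
human = relative_enrichment(bound_reads=50_000, bound_total=1_000_000,
                            input_reads=500_000, input_total=1_000_000)
print(target, human, enrichment_vs_human(target, human))
```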
Results
6mA is a frequent prokaryotic DNA modification that has only rarely been reported in eukaryotic genomes [13]. Since DpnI is one of a limited number of methyl-directed Type II restriction endonucleases that depend on the presence of 6mA to bind and cut their target DNA sequence [16], [17], we surmised that it could effectively bind Gm6ATC containing genomes for enrichment, allowing segregation away from non-methylated GATC DNA. To test this, we covalently bound biotin to DpnI to facilitate immobilization of the enzyme onto streptavidin coated particles. This necessitated purification of DpnI since commercial sources for the enzyme are dilute and contain other proteins that prevent us from selectively biotinylating the restriction enzyme. The activity of purified DpnI both before and after biotinylation was analyzed by restriction digestion of pUC19 isolated from DamMT+ E. coli. DpnI and DpnI-biotin were both found to be active when compared to commercially available enzyme, with a slight reduction in activity observed when the protein was biotinylated (Figure 1A).
To effectively bind and separate Gm6ATC DNA fragments from a mixture, the cleavage activity of DpnI must be prevented. We tested DpnI digestion of pUC19 in the absence of magnesium ions and did not observe cleavage activity, as previously reported [16]. Since the absence of magnesium might also affect the binding of DpnI to its target, we tested both DpnI and DpnI-biotin in an electrophoretic mobility shift assay. A FAM-labeled oligonucleotide duplex containing a single Gm6ATC sequence was incubated with increasing amounts of DpnI and DpnI-biotin. Both DpnI and DpnI-biotin are able to bind and shift Gm6ATC containing DNA in the absence of magnesium and no noticeable decrease in the binding affinity is observed when DpnI is biotinylated (Figure 1B).
To test our hypothesis that DpnI could be used to separate Gm6ATC containing DNA from fragments without Gm6ATC sites, we used a mixture of a 477 bp Dam-methylated DNA fragment and a 651 bp non-methylated fragment. The two fragments both contained seven GATC sites and were derived from overlapping regions in pUC19 to minimize bias caused by sequence differences. DpnI-biotin was immobilized onto streptavidin-magnetic particles and titrated into a mixture of the two fragments. DNA that bound to the DpnI-coated particles was eluted and desalted. All fractions were separated by electrophoresis on an agarose gel. An increase in the amount of DpnI-beads resulted in further depletion of the 477 bp fragment. The eluted fractions contained only the 477 bp fragment (Figure 1C, lanes 6–9) leaving the non-methylated 651 bp fragment in the supernatant (Figure 1C, lanes 2–5). Thus immobilized DpnI specifically bound Gm6ATC containing DNA (477 bp) which could be purified away from other fragments.
After observing efficient segregation of specific Gm6ATC DNA fragments, we investigated whether DpnI-biotin was suitable for isolating a Gm6ATC-containing genome when mixed with GATC-containing genomes. A synthetic mix containing 1 ng E. coli and 500 ng of Human genomic DNA was prepared and incubated with immobilized DpnI. After separation, fractions were analyzed using qPCR. We found that DpnI-coated particles isolated E. coli genomic DNA with high efficiency (Figure 2A), binding nearly 80% in 5 minutes. Enrichment was also specific, with 99.6% of Human DNA remaining unbound. Comparable isolation efficiency was observed for the DNA mixtures prepared in buffers ranging from pH 4 to 10 (Figure 2B). Additionally when fragment sizes were at least 3 kb, DpnI binding was not significantly affected, but did decrease with smaller fragments (Figure S3).
The relative genomic composition of complex samples varies widely. We therefore tested the limits of DpnI separation by incubating various amounts of E. coli and human DNA with immobilized DpnI. To test the sensitivity of DpnI separation, the level of human DNA was held constant at 1 µg and E. coli DNA was titrated from 1 ng to 10 fg. We observed approximately 80% recovery of E. coli DNA and rejection of 99.5% of human DNA. Sensitivity was observed to 10 fg E. coli DNA, the detection limit of the qPCR assay used (Figure 2C). This demonstrates efficient separation by DpnI of Gm6ATC containing DNA when present at as low as 10⁻⁸ of the level of eukaryotic DNA.
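The mass ratio quoted above follows from a unit conversion, which can be checked with a short sketch. The helper name and the unit table are illustrative; exact fractions are used so that the result is not affected by floating-point rounding.

```python
from fractions import Fraction

# Mass units expressed in femtograms (ug stands in for µg).
UNITS_IN_FG = {"fg": 1, "pg": 10**3, "ng": 10**6, "ug": 10**9}


def mass_ratio(target_amount: int, target_unit: str,
               background_amount: int, background_unit: str) -> Fraction:
    """Return target mass divided by background mass as an exact fraction."""
    target_fg = Fraction(target_amount) * UNITS_IN_FG[target_unit]
    background_fg = Fraction(background_amount) * UNITS_IN_FG[background_unit]
    return target_fg / background_fg


# 10 fg of E. coli DNA in 1 µg of human DNA.
ratio = mass_ratio(10, "fg", 1, "ug")
print(ratio, float(ratio))
```

The same helper applied to 1 ng of target in 10 µg of background gives 1/10000.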
We next tested the ability of DpnI to exclude human DNA present at high concentrations. When the concentration of E. coli DNA was held constant at 1 ng while increasing the concentration of human DNA, we observed E. coli DNA recovery as high as 82% and exceeding 60% even in the presence of 10 µg of human DNA, a 10,000-fold difference (Figure 2D). These results demonstrate that DpnI DNA segregation is effective and efficient with differing ratios of target versus non-target DNA.
We next examined how efficiently DpnI binds genomes from a variety of organisms including some that are clinically relevant [23]. For each organism of interest, 1 ng of bacterial genomic DNA was combined with 1 µg of human DNA. DNA mixtures were incubated with immobilized DpnI. Following segregation, DNA in the DpnI bound and unbound fractions was analyzed by qPCR. DpnI successfully bound and separated genomic DNA from gram-negative organisms known to express DamMT (Table 1). The range of recovery was between 50% and 100% of the measured input. For gram-negative bacteria not known to have a DamMT gene, recovery was lower, from 10% to 45% of the measured input, but still significantly higher than binding to human DNA. Binding of gram-positive bacterial DNA was less than 3% and binding to eukaryotic DNA was below 0.5%. We conclude that DpnI can be used to efficiently bind and segregate genomes from a wide variety of organisms with very little binding to eukaryotic DNA.
To test how well DpnI enrichment can improve the coverage and read depth of prokaryotic DNA in a mixture, we designed a synthetic mixture of genomic DNA that included both eukaryotic and prokaryotic DNA (Table 2). Human DNA made up the bulk of the mixture at approximately 97% by weight. DNA from rice (1%) and Aspergillus (1%) was added to represent plant and fungal genomes, respectively. Microbe genomes were added in a pair-wise fashion. Each pair consisted of an equal amount of DNA from a DamMT+ and a DamMT- organism, and subsequent pairs were diluted ten-fold to test the limit of DpnI enrichment. The DNA mixture was subjected to DpnI segregation. The DNA from the bulk mixture, the unbound fraction and the bound/eluted fraction were used to prepare sequencing libraries. We found that the number of reads from eukaryotes was dramatically reduced in the DpnI-bound fraction (Figure S5). Reads mapping to the human genome made up 59% of the mapped reads in the synthetic mix input but only 5% in the DpnI-bound fraction. The reads mapping to Oryza (rice) were also greatly reduced, from 31% of the mapped reads in the input sample to 2.5% of the mapped reads in the bound fraction (Figure 3A).
Table 2. Genome mix used for sequencing and relative enrichment results.
Species | Strain | Genome Size (bp) | % by mass | Relative Enrichment, Bound vs. Input | Relative Enrichment, Organism vs. Human |
Homo sapiens | | 3,209,290,000 | 96.80% | 0.09 | N/A |
Oryza sativa | | 382,780,000 | 1.00% | 0.08 | 0.9 |
Aspergillus fumigatus | Af293 | 29,390,000 | 1.00% | 0.8 | 9.1 |
Escherichia coli | O157:H7 str. EDL933 | 5,620,000 | 1.00% | 57 | 654 |
Bacillus anthracis | Sterne | 5,228,663 | 0.10% | 1.2 | 13.5 |
Salmonella enterica | Ty2 | 4,790,000 | 0.10% | 58 | 666.1 |
Streptococcus pneumoniae | R6 | 2,038,615 | 0.01% | 0.7 | 7.8 |
Shigella flexneri | 2457T | 4,600,000 | 0.01% | 58.9 | 676.2 |
Staphylococcus aureus | Mu50 | 2,903,147 | 0.001% | 0.9 | 9.8 |
Yersinia pestis | A1122 | 4,660,000 | 0.001% | 72.1 | 827.2 |
Enterococcus faecalis | V583 | 3,360,000 | 0.0001% | ND* | ND* |
Vibrio cholerae | N16961 | 3,745,000 | 0.0001% | 75.4 | 865.2 |
Pantoea ananatis | | N/A** | N/A** | 55.9 | 641.7 |
ND: Not determined. N/A: Not applicable.
*E. faecalis was not detectable in the Input fraction.
**P. ananatis was not knowingly added in the sample mix but is a probable contaminant of the rice genome (O. sativa).
Surprisingly, we observed that DNAs from all microbial organisms, not just from DamMT+ bacteria, were enriched compared to human and rice (Figure 3B, 3C and Figure S5). DNA from DamMT+ bacteria was most effectively enriched, up to 70-fold compared to input levels and up to 800-fold when directly compared to human (Figure 3B). The E. coli DNA in the mixture was enriched from comprising less than 1% of the reads in the sample input to over 50% of the reads in the bound fraction. This resulted in improved sequencing coverage of the E. coli genome. Only 67% of the E. coli genome sequence was covered by reads in the input sample. Following DpnI enrichment, >99% of the E. coli genome sequence was covered, with a depth of coverage averaging 40 reads. Furthermore, there was no discernable coverage bias in the enriched genomes (Figure 4B), indicating that DpnI enrichment can be used to greatly improve whole genome sequencing. A similar pattern of enrichment was observed for the remaining DamMT+ organisms.
As an exemplar clinical sample, DNA in saliva is overwhelmingly derived from human cells [24], with prokaryotic DNA making up less than 4%. We isolated DNA from saliva and performed a DpnI separation. The input, bound/eluted and unbound fractions were sequenced. Whereas human reads made up over 75% of the total reads in the input sample, following DpnI enrichment less than 5% of the total reads were human (Figure 5A). Prokaryotic reads increased from less than 5% of the total reads to over 50% in the DpnI-enriched fraction. There are a significant number of reads that could not be assigned to any organism. This is likely due to the high number of unsequenced organisms in the sample. The most abundant genera in the sample were Haemophilus, Neisseria, Veillonella, Prevotella and Streptococcus. Together these five genera comprised 87% of reads mapped to prokaryotes. As expected, a subset of the organisms was highly enriched in the bound fraction while some organisms were not enriched and yet another set was depleted (Figure 5B). Haemophilus, Aggregatibacter, Actinobacillus, Vibrio and Treponema were all enriched ten-fold in the bound fraction compared to input (Figure 5B). Haemophilus parainfluenzae was a major component of both the input and bound fractions and was enriched 36-fold compared to input. Though not enriched, Prevotella, an organism closely associated with dental caries [25], is still a major component of the bound fraction. Other organisms were undetectable in the input fraction but had mapped reads in the bound fraction (Figure 5B and Table S1).
We next isolated DNA from a water sample collected from a creek after a heavy rain and subjected it to segregation by DpnI. The identified genera segregated into three distinct groups in the bound fraction: highly enriched, slightly enriched and non-enriched (Figure 6A). Eleven genera were enriched over 20-fold compared to input. Of these, Aeromonas, Shewanella, Pantoea, Enterobacter and Rahnella were the most abundant in the bound fraction. For example, we found a high number of identifying reads in the bound fraction that mapped to the fish pathogen Aeromonas salmonicida (over 18% of mapped reads and 0.48% of the total reads). The same organism represented less than 6% of mapped reads and 0.014% of the total reads in the input (Figure S1). The coverage we observed suggests that the sequenced organism is a close relative of Aeromonas salmonicida. DpnI segregation resulted in nearly 35-fold enrichment of this organism's DNA.
Having succeeded in efficiently segregating DNA genomes with DpnI, we investigated whether this approach might be applicable to other restriction enzymes. DpnII is known to have the opposite activity of DpnI in that it recognizes and cuts only non-methylated GATC sequences and its activity is blocked by 6mA. We therefore expected DpnII to bind to human, but not E. coli, genomic DNA. Similar to our experiments with DpnI, we immobilized DpnII to test its ability to separate a mixture of 1 ng of human DNA and 500 ng of E. coli DNA. DpnII was able to enrich the human DNA with minimal binding to E. coli DNA (Figure S2). Therefore restriction endonuclease-mediated DNA separation is not limited to DpnI.
Discussion
Type II restriction endonucleases have been selected during evolution to ensure they do not cut their own DNA, a suicidal event, while quickly binding to and digesting any foreign DNA that lacks the correct methylation pattern [26]. We demonstrate that manipulation of in vitro conditions enables DpnI to bind but not cut DNA containing its target sequence. While the binding affinity of DpnI has not been determined, several restriction enzymes have been measured in the picomolar [27], [28] to nanomolar range [29], [30] and our results support the use of restriction enzymes as strong and specific DNA binding proteins.
DpnI binding to target DNA was rapid, with 75% of E. coli DNA bound after only 5 minutes (Figure 2A). We also observed highly specific binding with over 99.5% of human DNA excluded and over 80% of targeted E. coli DNA binding (Figure 2A, C and D). This rapid and exquisite target discrimination by DpnI in vitro is a reflection of the natural ability of restriction endonucleases to quickly scan and locate target sequences in large amounts of DNA in vivo [31]. Immobilized DpnI can be used to differentially bind and segregate prokaryotic DNA present at 1/10,000 the level of eukaryotic DNA (Figure 2D). Efficient removal of background human genetic material enables pathogen DNA to be concentrated to achieve sensitive detection, which could be particularly useful for unculturable pathogenic bacteria. This feature could be exploited for the diagnostic detection of trace amounts of pathogens in clinical samples such as blood from patients with septicemia, a serious infection that lacks an early detection method [32].
One critique of using a methyl-directed binding protein to enrich DNA is that the process may introduce coverage bias with more reads observed in close proximity to the protein binding site. However, when samples were separated by DpnI and then analyzed by NGS, DpnI enrichment resulted in very low sequence coverage biases (Figure 4). The even coverage is likely due to the frequency and distribution of DpnI binding sites in target DNA. For example, in E. coli O157:H7, there are approximately 42,000 GATC sites, 94% of which have been shown by SMRT sequencing to be adenine methylated with an average gap between GATC sites of about 250 bp [12]. Additionally, DpnI segregation generated low biases when input DNA fragments were above 3 kb (Figure S3). Thus typical DNA isolation procedures are sufficient to achieve efficient DpnI segregation. Biases could arise however if smaller bacterial fragments, from degraded DNA for instance, are present.
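The site frequency argument above can be explored on any genome sequence by counting GATC motifs and the spacing between them. The sketch below is illustrative only: the function names are hypothetical, the demonstration sequence is a toy string rather than real genomic data, and only one strand is scanned, which suffices because GATC is palindromic. Note that counts reported per strand, as in methylation surveys, would be twice the number of sites found this way.

```python
from typing import List, Optional


def gatc_positions(sequence: str) -> List[int]:
    """Return 0-based start positions of every GATC motif on the given strand."""
    seq = sequence.upper()
    positions = []
    start = seq.find("GATC")
    while start != -1:
        positions.append(start)
        start = seq.find("GATC", start + 1)
    return positions


def mean_spacing(positions: List[int]) -> Optional[float]:
    """Average distance between consecutive site starts, or None if fewer than two sites."""
    if len(positions) < 2:
        return None
    return (positions[-1] - positions[0]) / (len(positions) - 1)


# Toy sequence, for illustration only.
demo = "ttGATCaaccggttGATCacgtacgtacGATCtt"
sites = gatc_positions(demo)
print(len(sites), sites, mean_spacing(sites))
```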
We predicted little or no binding to eukaryotic DNA and highly specific binding to DNA from DamMT+ bacteria. We did not anticipate the low level binding of DpnI to micro-organisms not known to contain Gm6ATC sites (Figure 3B). This in vitro non-canonical binding may simply reflect a difference in DNA binding affinity compared to the more rigorously studied specificity of restriction activity. Published factors known to affect restriction specificity of DpnI include the presence of non-GATC sequences that contain a methylated adenine residue [33] and DNA topology effects [34]. Alternatively, DNA modifications other than 6mA may be affecting DpnI binding specificity. Although DpnI needs a Gm6ATC site to cut, it appears that at least some amount of binding occurs when that pattern is absent and that binding decreases in the presence of CpG methylation. We observed that when the Aspergillus fumigatus genome which is not known to contain Gm6ATC is treated with a CpG methyltransferase, binding drops significantly (Figure S4). It is unknown whether this is a differential feature of binding versus digestion or an artifact of biotinylation. A more in-depth study of DpnI binding patterns is needed to better understand the binding to DNA from DamMT- organisms.
Observations to date suggest that methyl signatures created by restriction modification systems are only sporadically distributed amongst microbial taxa [26], [35]. In contrast, orphan MTases, such as DamMT, are often conserved across extensive groups of bacteria, which rely on these methylation patterns to control crucial cellular processes like chromosome replication [14], [36]. We consider DamMT+ bacteria to be part of a more expansive methylome that would include organisms which methylate at GATC sites in other contexts (e.g. B. amyloliquefaciens, BamHI GGm6ATCC). The broad and deep genomic coverage consistently observed when sequencing DpnI enriched DamMT+ bacterial DNA (Figure 4) suggests that the binding kinetics are equivalent across these organisms. We hypothesize that, with regard to Gm6ATC, organisms may divide into genomes that (A) have a DamMT-like density of Gm6ATC sites and are highly enriched, (B) have a lower site density and are only slightly enriched, and (C) have no Gm6ATC sites. This last category may be strongly discriminated against if it possesses mCpG sites, as does human DNA, or may be equally abundant in the bound fraction and the input sample when mCpG sites are absent (Figure 6A), as is the case for most bacteria.
We demonstrated that by purifying DNA by methylome, enrichment exceeding 50-fold of specific genomes is possible. In the case of the water sample, an organism closely related to Aeromonas salmonicida was highly enriched, with hundreds of thousands of non-normalized reads in the DpnI bound fraction compared to approximately 5000 in the input library. Typically, the high complexity of a microbiome would make reassembling genomes of unknown species challenging. Existing methods rely on bioinformatics, using alignment to reference genomes, nucleotide composition [37], differential coverage binning [38], or variations in gene count [9] to achieve partial assemblies. Our enrichment approach increases coverage, facilitates informatics processes and provides opportunities to characterize previously unsequenced and unculturable microbial taxa in diverse microbial communities.
Enrichment upstream of NGS allows for better coverage and increased certainty of the presence of organisms. This may be useful for samples with a very high load of eukaryotic DNA, such as those from the throat, buccal mucosa, or saliva [24]. The DpnI enrichment of pathogen DNA from saliva has several potential applications. Bacterial populations in saliva change in response to many disease conditions [39]. Identification and quantification of bacterial profiles may be important for detection of oral and/or systemic disease. With only about 100 cultivable strains among the more than 700 oral microbiota taxa [39], DpnI enrichment may provide a reliable way to identify novel bacterial species present in saliva using NGS. For example, Aggregatibacter actinomycetemcomitans, a strain known to be involved in periodontitis [40], was enriched 27-fold over input (Table S1). Treponema denticola, another strain implicated in periodontitis [40], was undetectable in the input fraction but had over 300 associated reads in the bound fraction (Table S1).
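A fold-enrichment value of this kind can be sketched as a ratio of read fractions. This is a minimal illustration, not necessarily the normalization used to produce Table S1: the function name `fold_enrichment`, the optional pseudocount, and all read counts below are hypothetical. The sketch assumes that a taxon's reads are first divided by the total reads in each library, so that libraries of different depth are comparable, and it shows why a taxon that is undetectable in the input has no finite fold value unless a pseudocount is added.

```python
import math


def fold_enrichment(bound_reads, bound_total, input_reads, input_total,
                    pseudocount=0.0):
    """Ratio of a taxon's read fraction in the bound library to the input library.

    pseudocount is added to both taxon read counts; it is an illustrative
    convention for handling taxa with zero input reads.
    """
    if bound_total <= 0 or input_total <= 0:
        raise ValueError("library totals must be positive")
    bound_frac = (bound_reads + pseudocount) / bound_total
    input_frac = (input_reads + pseudocount) / input_total
    if input_frac == 0:
        # Taxon absent from the input: the ratio is undefined (0/0) or unbounded.
        return math.nan if bound_frac == 0 else math.inf
    return bound_frac / input_frac


# Hypothetical counts for a taxon detected in both libraries.
detected = fold_enrichment(2700, 1_000_000, 100, 1_000_000)

# Hypothetical counts for a taxon seen only in the bound library.
undetected = fold_enrichment(300, 1_000_000, 0, 1_000_000)
with_pseudocount = fold_enrichment(300, 1_000_000, 0, 1_000_000, pseudocount=1.0)

print(round(detected, 2), undetected, round(with_pseudocount, 2))
```

For a taxon found only in the bound fraction, reporting the raw bound read count, as done above for Treponema denticola, avoids committing to any particular pseudocount.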
DpnI is unique in that it is a methyl-directed type II enzyme that can be used as a tool to bind DNA of a broad clade of widely studied bacteria with impacts on human health. Our demonstration that DpnII, a methyl-inhibited type II endonuclease, can also be used for differential selection of DNA opens the door to using alternative enzymes for DNA segregation. Over 300 restriction endonucleases with methyl-specific recognition specificities have been catalogued [41], potentially offering many more opportunities to discriminate genomes based on methylation patterns. By choosing restriction endonucleases with different methylation specificities, we envision the ability to stratify complex genomic mixtures into various methylomes, thus simplifying the experimental characterization of any microbiome.
The discovery of restriction endonucleases enabled the biotech revolution. These enzymes now offer a new technical utility, expanding on their natural role as discriminators of their own genomes to allow isolation of the genomes of unculturable bacteria present at low levels in diverse hosts and environments. Careful consideration of 6mA, 4mC and 5mC directed or blocked endonucleases has led us to use these molecular biological tools in new ways and to develop new methodologies that promise additional insights into the natural and pathogenic microbiomes of our world.
Supporting Information
Acknowledgments
We are grateful to Kurt Klimpel for his critical review of this manuscript. We also thank the staff of GHC Technologies for early technical support.
Data Availability
The authors confirm that all data underlying the findings are fully available without restriction. The files corresponding to all the raw next generation sequencing reads generated in this study are publicly available at the NCBI Short Read Archive (SRP044748).
Funding Statement
Funding for this research was provided in full through contract HSHQDC-10-C-00019 to FLIR Systems Inc. by the Department of Homeland Security, Science and Technology Directorate, http://www.dhs.gov/. Dr. Hultgren was the DHS technical representative who reviewed and approved the manuscript for publication. The funders had no additional roles in study design, data collection, analysis or preparation of the manuscript.
References
- 1. Philippot L, Raaijmakers JM, Lemanceau P, van der Putten WH (2013) Going back to the roots: the microbial ecology of the rhizosphere. Nat Rev Microbiol 11: 789–799.
- 2. Engel P, Moran NA (2013) The gut microbiota of insects - diversity in structure and function. FEMS Microbiol Rev 37: 699–735.
- 3. Kostic AD, Howitt MR, Garrett WS (2013) Exploring host-microbiota interactions in animal models and humans. Genes Dev 27: 701–718.
- 4. Ntougias S, Bourtzis K, Tsiamis G (2013) The microbiology of olive mill wastes. Biomed Res Int 2013: 784591.
- 5. Alvarez B, Lopez MM, Biosca EG (2007) Influence of native microbiota on survival of Ralstonia solanacearum phylotype II in river water microcosms. Appl Environ Microbiol 73: 7210–7217.
- 6. Vartoukian SR, Palmer RM, Wade WG (2010) Strategies for culture of 'unculturable' bacteria. FEMS Microbiol Lett 309: 1–7.
- 7. Stewart EJ (2012) Growing unculturable bacteria. J Bacteriol 194: 4151–4160.
- 8. Amann RI, Ludwig W, Schleifer KH (1995) Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiological Reviews 59: 143–169.
- 9. Carr R, Shen-Orr SS, Borenstein E (2013) Reconstructing the genomic content of microbiome taxa through shotgun metagenomic deconvolution. PLoS Comput Biol 9: e1003292.
- 10. Feehery GR, Yigit E, Oyola SO, Langhorst BW, Schmidt VT, et al. (2013) A Method for Selectively Enriching Microbial DNA from Contaminating Vertebrate Host DNA. PLoS One 8: e76096.
- 11. Sachse S, Straube E, Lehmann M, Bauer M, Russwurm S, et al. (2009) Truncated Human Cytidylate-Phosphate-Deoxyguanylate-Binding Protein for Improved Nucleic Acid Amplification Technique-Based Detection of Bacterial Species in Human Samples. Journal of Clinical Microbiology 47: 1050–1057.
- 12. Fang G, Munera D, Friedman DI, Mandlik A, Chao MC, et al. (2012) Genome-wide mapping of methylated adenine residues in pathogenic Escherichia coli using single-molecule real-time sequencing. Nat Biotechnol 30: 1232–1239.
- 13. Ratel D, Ravanat JL, Berger F, Wion D (2006) N6-methyladenine: the other methylated base of DNA. Bioessays 28: 309–315.
- 14. Lobner-Olesen A, Skovgaard O, Marinus MG (2005) Dam methylation: coordinating cellular processes. Curr Opin Microbiol 8: 154–160.
- 15. Loenen WA, Raleigh EA (2013) The other face of restriction: modification-dependent enzymes. Nucleic Acids Res 42: 56–69.
- 16. Lacks S, Greenberg B (1975) A deoxyribonuclease of Diplococcus pneumoniae specific for methylated DNA. Journal of Biological Chemistry 250: 4060–4066.
- 17. Vovis GF, Lacks S (1977) Complementary action of restriction enzymes endo R·DpnI and endo R·DpnII on bacteriophage f1 DNA. Journal of Molecular Biology 115: 525–538.
- 18. de la Campa AG, Springhorn SS, Kale P, Lacks SA (1988) Proteins encoded by the DpnI restriction gene cassette. Hyperproduction and characterization of the DpnI endonuclease. J Biol Chem 263: 14696–14702.
- 19. Xu SY, Schildkraut I (1991) Isolation of BamHI variants with reduced cleavage activities. J Biol Chem 266: 4425–4429.
- 20. Bispo PJ, de Melo GB, Hofling-Lima AL, Pignatari AC (2010) Detection and gram discrimination of bacterial pathogens from aqueous and vitreous humor using real-time PCR assays. Invest Ophthalmol Vis Sci 52: 873–881.
- 21. Sayers EW, Barrett T, Benson DA, Bryant SH, Canese K, et al. (2009) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 37: D5–15.
- 22. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2009) GenBank. Nucleic Acids Res 37: D26–31.
- 23. Low DA, Weyand NJ, Mahan MJ (2001) Roles of DNA Adenine Methylation in Regulating Bacterial Gene Expression and Virulence. Infection and Immunity 69: 7197–7204.
- 24. Methé BA NK, Pop M, Creasy HH, Giglio MG, Huttenhower C, et al. (2012) A framework for human microbiome research. Nature 486: 215–221.
- 25. Yang F, Zeng X, Ning K, Liu KL, Lo CC, et al. (2012) Saliva microbiomes distinguish caries-active from healthy human populations. ISME J 6: 1–10.
- 26. Vasu K, Nagaraja V (2013) Diverse Functions of Restriction-Modification Systems in Addition to Cellular Defense. Microbiology and Molecular Biology Reviews 77: 53–72.
- 27. Lynch TW, Kosztin D, McLean MA, Schulten K, Sligar SG (2002) Dissecting the molecular origins of specific protein-nucleic acid recognition: hydrostatic pressure and molecular dynamics. Biophys J 82: 93–98.
- 28. Wong DL, Pavlovich JG, Reich NO (1998) Electrospray ionization mass spectrometric characterization of photocrosslinked DNA-EcoRI DNA methyltransferase complexes. Nucleic Acids Res 26: 645–649.
- 29. Taylor JD, Badcoe IG, Clarke AR, Halford SE (1991) EcoRV restriction endonuclease binds all DNA sequences with equal affinity. Biochemistry 30: 8743–8753.
- 30. Sud'ina AE, Zatsepin TS, Pingoud V, Pingoud A, Oretskaya TS, et al. (2005) Affinity modification of the restriction endonuclease SsoII by 2'-aldehyde-containing double stranded DNAs. Biochemistry (Mosc) 70: 941–947.
- 31. Bonnet I, Biebricher A, Porte PL, Loverdo C, Benichou O, et al. (2008) Sliding and jumping of single EcoRV restriction enzymes on non-cognate DNA. Nucleic Acids Res 36: 4118–4127.
- 32. Yagupsky P, Nolte FS (1990) Quantitative aspects of septicemia. Clin Microbiol Rev 3: 269–279.
- 33. Siwek W, Czapinska H, Bochtler M, Bujnicki JM, Skowronek K (2012) Crystal structure and mechanism of action of the N6-methyladenine-dependent type IIM restriction endonuclease R.DpnI. Nucleic Acids Res 40: 7563–7572.
- 34. Kingston IJ, Gormley NA, Halford SE (2003) DNA supercoiling enables the type IIS restriction enzyme BspMI to recognise the relative orientation of two DNA sequences. Nucleic Acids Res 31: 5221–5228.
- 35. Seshasayee AS, Singh P, Krishna S (2012) Context-dependent conservation of DNA methyltransferases in bacteria. Nucleic Acids Res 40: 7066–7073.
- 36. Marinus MG, Casadesus J (2009) Roles of DNA adenine methylation in host–pathogen interactions: mismatch repair, transcriptional regulation, and more. FEMS Microbiology Reviews 33: 488–503.
- 37. Herlemann DP, Lundin D, Labrenz M, Jurgens K, Zheng Z, et al. (2013) Metagenomic de novo assembly of an aquatic representative of the verrucomicrobial class Spartobacteria. MBio 4: e00569-12.
- 38. Albertsen M, Hugenholtz P, Skarshewski A, Nielsen KL, Tyson GW, et al. (2013) Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat Biotechnol 31: 533–538.
- 39. Malamud D (2011) Saliva as a diagnostic fluid. Dent Clin North Am 55: 159–178.
- 40. Zhang L, Henson BS, Camargo PM, Wong DT (2009) The clinical value of salivary biomarkers for periodontal disease. Periodontol 2000 51: 25–37.
- 41. Roberts RJ, Vincze T, Posfai J, Macelis D (2010) REBASE—a database for DNA restriction and modification: enzymes, genes and genomes. Nucleic Acids Res 38: D234–236.