Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

Momchilo Vuyisich; Ayesha Arefin; Karen Davenport; Shihai Feng; Cheryl Gleasner; Kim McMurry; Beverly Parson-Quintana; Jennifer Price; Matthew Scholz; Patrick Chain

2014 Nov 13;2014:434575. doi: 10.1155/2014/434575

PMCID: PMC4247979 PMID: 25478564

Abstract

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.

1. Introduction

The rapid improvement in quality, quantity, and cost of next generation sequencing (NGS) has resulted in commensurate improvements in analysis techniques. For bacteria, high throughput sequencing has become a routine task. The availability of kits for library preparation, rapid and high content sequencing, and mature data analysis pipelines for genome resequencing and assembly has drastically reduced costs and improved the reliability of the results. The commoditization of bacterial genome sequencing has led to more complex applications: clinical and agricultural diagnostics [1–4], outbreak detection and monitoring [5–7], human health studies [8, 9], biocatalysis [10, 11], environmental studies [12], and many others [13, 14].

Current NGS platforms require that sequencing adapters be ligated to DNA fragments before sequencing is possible. Ligation of adapters to (typically small) DNA fragments is an inefficient process, generating ligated hybrids from only a small fraction of targeted DNA molecules. This limitation in turn increases the required DNA input, solely to generate sufficient numbers of ligated fragments to allow sequencing. Typical library preparation methods require large amounts (~1 μg) at high concentrations (>25 ng/mL) of DNA for successful library generation, limiting the types of samples that can be sequenced reliably.

Existing library preparation methods have several reported limitations. These include high variability of evenness and completeness of genome coverage as a function of %GC content, input DNA quantities, and sequencing technology [15–19]. These impact the amount of sequencing data required and the quality of genome assembly and analysis.

Several library preparation kits that require 1–100 ng of input DNA are now available (New England Biolabs' NEBNext, Illumina's TruSeq Nano, Bioo Scientific's NEXTflex, NuGEN's Ovation Ultralow, etc.). This paper details the results of evaluation of the utility of the NEBNext Ultra library preparation kits for both resequencing and assembly of several bacterial genomes. We compare the evenness and completeness of coverage between NEBNext Ultra and Illumina TruSeq kits for bacterial genomes of varying size and %GC content. Our findings indicate that low DNA input amounts are sufficient to generate high quality sequencing data that can be used for genome resequencing or de novo assembly (if combined with long fragment data).

2. Materials and Methods

2.1. Overview

We sequenced three different bacterial species with various genome lengths (from 5.4 Mb to 6.7 Mb) and containing various %GC contents (from 35% to 68%). Standard input DNA amounts were 100 ng, approximately 10x lower than the required amount for the Illumina TruSeq kit and 10x higher than the minimum DNA inputs per NEBNext Ultra manual specifications. The most challenging (longest genome and highest GC content) bacterial genomes (Burkholderia A and B) were also sequenced with minimal DNA inputs (10 ng). All samples were sequenced on the Illumina HiSeq platform using 2 × 100 bp chemistry. Data analyses consisted of read-mapping the short fragment data to reference genomes using BWA (Burrows-Wheeler Alignment). These data were also combined with long insert mate pair data to evaluate their utility for de novo assembly of the bacterial genomes.

2.2. Bacterial Strains and Genomic DNA Preparation

Genomic DNA from Bacillus anthracis (strain Sterne 34F2) was isolated from a log phase culture using the MO-BIO UltraClean microbial DNA isolation kit. The Escherichia coli strain 2009EL-2050 and genomic DNA purification have been previously described [20]. Burkholderia thailandensis A (strain E254, accession numbers CP004381 and CP004382) and Burkholderia thailandensis B (strain USAMRU Malaysia #20, accession numbers CP004383 and CP004384) are previously reported strains, and DNA was provided by Dr. Paul Keim's group (sequences to be published in Spring 2014). The integrity of all genomic DNA samples was evaluated using agarose gels and their quantity measured with PicoGreen reagents on a Qubit 2.0 instrument.

2.3. Library Preparation (Figure S1)

The NEBNext Ultra library preparation protocol consists of several enzymatic and two purification steps, one of which is used for size selection of library fragments. Genomic DNA samples were sheared in 55 μL of TLE buffer (10 mM Tris, 0.1 mM EDTA, pH 8) using a Covaris E220 with the following settings: duty cycle 10%, intensity 5, cycle 200, and time 100 sec. After shearing, two enzymatic steps (end preparation and adapter ligation) were performed in the same tube, followed by size selection of the library fragments using a double AMPure cleanup. The first AMPure step used 0.4x sample volume of beads, and the supernatant was transferred to a clean tube. The second AMPure step used 0.2x sample volume of beads. Selected library fragments were amplified with barcoded primers (10–12 PCR cycles) and purified one more time with AMPure beads (0.5x bead volume) (see Supplementary Material available online at http://dx.doi.org/10.1155/2014/434575).

2.4. Library Quality Control, Quantification, and Sequencing

NEBNext libraries were analyzed using Bioanalyzer 2100 and DNA 1000 or DNA high sensitivity chips, to quantify the library size and assess the level of adapter-dimer and primer-dimer contamination. Libraries were quantified using Illumina library qPCR quantification kits from KAPA Biosystems and sequenced on either the Illumina MiSeq or Illumina HiSeq.

The Illumina data from this study were trimmed to remove any ambiguous bases; any reads shorter than 70 bp after trimming and the corresponding read pairs were discarded. The total number of reads per sample ranged from 6.2 million to 47.8 million before trimming. All data had read lengths of 151 bp with one exception which had read lengths of 101 bp. After trimming, the average read lengths were reduced by less than 3.5% for all samples. The data for each sample were normalized to 70x coverage of the genome after trimming. The average number of reads with a quality greater than Q20 after trimming and normalization ranged from 61% of the total reads to 97% of the total reads. The total number of reads, the number of reads with quality greater than Q20, and the average read lengths before and after trimming for each sample can be found in Table S1. The assemblies were compared to the reference genomes to consider insertion/deletion errors and rearrangements using an in-house Perl script.
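The trimming and pair-filtering rule described above can be illustrated with a minimal Python sketch. It is not the in-house script used in this study: it assumes that ambiguous bases (N) are removed only from the read ends, and the function names `trim_ambiguous` and `filter_pairs` are hypothetical.

```python
MIN_LEN = 70  # minimum read length after trimming, as stated in the text


def trim_ambiguous(seq, qual):
    """Strip leading and trailing N bases (assumed trimming rule)."""
    start, end = 0, len(seq)
    while start < end and seq[start] in "Nn":
        start += 1
    while end > start and seq[end - 1] in "Nn":
        end -= 1
    return seq[start:end], qual[start:end]


def filter_pairs(pairs):
    """Yield trimmed read pairs; drop the whole pair if either mate is too short.

    Each element of `pairs` is ((seq1, qual1), (seq2, qual2)).
    """
    for (seq1, qual1), (seq2, qual2) in pairs:
        mate1 = trim_ambiguous(seq1, qual1)
        mate2 = trim_ambiguous(seq2, qual2)
        if len(mate1[0]) >= MIN_LEN and len(mate2[0]) >= MIN_LEN:
            yield mate1, mate2
```

Discarding both mates together keeps the remaining data strictly paired, which the paired-end mapping and assembly steps rely on.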

2.5. Mapping of Reads to Reference Genomes

For read-mapping, all trimmed reads from each preparation were used. The Burrows-Wheeler Alignment (BWA) mapping tool was used, combined with SAMtools and in-house Perl scripts for coverage and insert size analysis [21, 22]. For base coverage we used the BWA global alignment option with default parameters. BWA global alignment only reports the best alignment based on a score calculated from a set of parameters. If a read has several equally good alignment positions, BWA randomly assigns the read to one of them. All reads mapped to contigs were used to calculate base coverage. For insert size calculation, only properly paired reads (read pair on the same contig and with correct orientation) were used. We report the mean, standard deviation, minimum, and maximum of the insert size distribution for all short fragment libraries. We utilized three thresholds for reporting coverage: 0%, 1%, and 10% of mean fold coverage.
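As an illustration of the three coverage thresholds, the following Python sketch counts gaps in a per-base depth profile. It assumes that a gap is a maximal run of consecutive positions whose depth is at or below the given fraction of the mean depth; the paper does not state the exact definition used by the in-house scripts, and the function name `count_gaps` is hypothetical.

```python
def count_gaps(depth, fraction):
    """Count maximal runs of positions with depth <= fraction * mean depth.

    `depth` is a list of per-base coverage values for one replicon.
    With fraction = 0, only positions with zero coverage count as gaps.
    """
    if not depth:
        return 0
    cutoff = fraction * (sum(depth) / len(depth))
    gaps = 0
    in_gap = False
    for d in depth:
        low = d <= cutoff
        if low and not in_gap:
            gaps += 1  # a new low-coverage run starts here
        in_gap = low
    return gaps


# Toy profile: two uncovered stretches and one low-coverage stretch.
profile = [100] * 10 + [0] * 3 + [100] * 10 + [5] * 2 + [100] * 10 + [0] + [100] * 10
for fraction in (0.0, 0.01, 0.10):
    print(fraction, count_gaps(profile, fraction))
```

Under this definition, raising the threshold can both create new gaps and merge neighboring ones, so gap counts need not increase monotonically with the cutoff.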

2.6. Genome Coverage

Calculation of evenness of coverage was performed by calculating the average and standard deviation of coverage across nonoverlapping 10 kbp fragments of the finished genome. Evenness for each fragment was calculated as 1 − (standard deviation of coverage/coverage). All data points (genomic and plasmid coverage, where appropriate) were used to generate box and whisker plots in IBM's statistics program SPSS.
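The evenness calculation can be sketched in Python as follows. This is an illustrative reimplementation, not the code used in the study: it assumes the population standard deviation, skips windows with zero mean coverage (where the ratio is undefined), and ignores a trailing partial window. The function name `window_evenness` is hypothetical.

```python
from statistics import mean, pstdev


def window_evenness(depth, window=10_000):
    """Return 1 - (sd / mean) of coverage for each nonoverlapping window.

    `depth` is a list of per-base coverage values; a trailing partial
    window is ignored (an assumption, not stated in the text).
    """
    values = []
    for start in range(0, len(depth) - window + 1, window):
        chunk = depth[start:start + window]
        avg = mean(chunk)
        if avg == 0:
            continue  # evenness is undefined for an uncovered window
        values.append(1 - pstdev(chunk) / avg)
    return values


# Toy example with 4-base windows instead of 10 kbp windows.
print(window_evenness([50, 50, 50, 50, 40, 60, 40, 60], window=4))
```

A perfectly uniform window scores 1, and the score falls as coverage within the window becomes more variable relative to its mean.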

2.7. Assembly Methods

Two de Bruijn graph assembly tools were used to evaluate the quality of the short fragment data for the purpose of assembling high quality genomes. IDBA uses only paired reads from short fragment Illumina libraries [23]. Paired reads were randomly selected (in silico) from each sample to generate libraries of approximately 70-fold genome coverage for each sample, in order to normalize the data. The only exception was the E. coli sample prepared with the TruSeq kit, for which only 61-fold coverage was available. Each data set was assembled with IDBA, version 1.1.0.
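The in silico normalization to approximately 70-fold coverage amounts to computing a target number of read pairs and sampling that many at random. The Python sketch below illustrates the arithmetic under the assumption of a fixed read length; the function names and the fixed random seed are hypothetical and are not taken from the study's scripts.

```python
import random


def pairs_for_coverage(genome_length, target_fold, read_length):
    """Number of read pairs that gives roughly `target_fold` coverage.

    Each pair contributes 2 * read_length bases.
    """
    return round(target_fold * genome_length / (2 * read_length))


def subsample_pairs(pairs, n_pairs, seed=0):
    """Randomly select `n_pairs` read pairs without replacement.

    If fewer pairs are available than requested, all pairs are kept,
    so the resulting coverage is below the target.
    """
    if n_pairs >= len(pairs):
        return list(pairs)
    return random.Random(seed).sample(pairs, n_pairs)


# Example: a 5 Mb genome sequenced with 2 x 100 bp reads.
print(pairs_for_coverage(5_000_000, 70, 100))
```

The fallback that keeps all pairs corresponds to the situation described for the E. coli TruSeq sample, where less than 70-fold coverage was available.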

The 70-fold short fragment Illumina data were combined with previously sequenced long insert mate pair data generated by 454. The 454 data had an average insert size of 8 kbp and provided 7- to 8-fold base coverage, again with the exception of the E. coli samples, which had approximately 3.5-fold coverage. The combined data were assembled with Allpaths, version 44837 [24]. The 454 data were used without trimming or data reduction in the Allpaths assembly.

3. Results and Discussion

3.1. Library Preparation (Figure S1 and Table S1)

The library preparation protocol, as described in Section 2, yields average insert sizes of ~270 ± 15 bp (average library sizes of ~400 ± 15 bp) that are optimal for either 2 × 100 or 2 × 150 bp sequencing on Illumina platforms. Different insert sizes can easily be obtained by adjusting the size selection step (ratio of DNA solution to AMPure beads) as recommended by the manufacturer. It is not necessary to adjust the shearing step, as the sheared DNA produced by Covaris has a very broad size distribution. The NEBNext library process provides very consistent results in terms of library size and concentration, even when performed for the very first time.

Prior to normalization and sequencing, samples were analyzed using Qubit (PicoGreen-based method), Bioanalyzer 2100, and quantitative real-time PCR (qPCR, KAPA Biosystems). When the libraries were quantified by qPCR, accurate normalization and clustering were achieved. This was not the case when molar library concentrations were obtained from Qubit and Bioanalyzer data only (without qPCR). Therefore, we recommend that qPCR library quantification be performed routinely. Sequencing was performed on either Illumina HiSeq (2 × 100 bp) or Illumina MiSeq (2 × 150 bp).

3.2. Evenness of Coverage

Figure 1 (B. anthracis and E. coli) and Figure 2 (B. thailandensis A and B) contain sliding window coverage plots that compare the coverage of each genome by different library preparation method and different DNA input amount. From the figures, it can be seen that the genome coverage is remarkably similar regardless of the library preparation method (NEBNext or TruSeq). Of particular interest is that even the libraries prepared from only 10 ng of genomic DNA produced essentially the same evenness of genome coverage as the rest of the samples (Figures 2(a)–2(d), top panel). There are some differences among the data sets, however. As Table 1 shows, the number of true gaps in coverage (0%) is slightly higher for NEBNext than for TruSeq libraries, while the number of gaps is lower for NEBNext when using 1% or 10% average coverage thresholds. The data were not normalized among all samples prior to evenness of coverage comparisons. Instead, they were normalized within each sample relative to the average coverage.

Figure 1. Evenness of coverage for B. anthracis (a) and E. coli (b) genome sequencing using NEBNext library preparation with 100 ng DNA input (top graph) and TruSeq library preparation with 1,000 ng DNA input (bottom graph). Plots are normalized by the average coverage in each figure.

Figure 2. Evenness of coverage plots for B. thailandensis A ((a) is chromosome 1 and (b) is chromosome 2) and B. thailandensis B ((c) is chromosome 1 and (d) is chromosome 2) genome sequencing. Top graph within each panel shows NEBNext library preparation data with 10 ng or 100 ng DNA input (black color is for 10 ng samples and red color is for 100 ng samples). Bottom graph within each panel shows TruSeq library preparation data with 1,000 ng DNA input. Plots are normalized by the average coverage in each figure.

Table 1.

Comparison of gap counts by organism, replicon, and library preparation method. Input DNA amount for each sample follows the name of the library preparation method.

Gap counts for E. coli genome
Library preparation method-DNA input	Reference	Replicon length, bp	Gaps at 0% of average coverage	Gaps at 1% of average coverage	Gaps at 10% of average coverage
NEBNext-100 ng	Plasmid 1	109,274	0	0	24
NEBNext-100 ng	Plasmid 2	74,213	45	145	318
NEBNext-100 ng	Plasmid 3	1,549	0	0	8
NEBNext-100 ng	Chromosome	5,253,138	26	206	1262
TruSeq-1000 ng	Plasmid	109,274	0	1	49
TruSeq-1000 ng	Plasmid	74,213	44	200	491
TruSeq-1000 ng	Plasmid	1,549	0	0	25
TruSeq-1000 ng	Chromosome	5,253,138	23	449	2833

Gap counts for B. thailandensis A genome
Library preparation method-DNA input	Reference	Replicon length, bp	Gaps at 0% of average coverage	Gaps at 1% of average coverage	Gaps at 10% of average coverage
NEBNext-10 ng	Chromosome 1	3,805,980	47	72	329
NEBNext-10 ng	Chromosome 2	2,870,750	86	125	498
NEBNext-100 ng	Chromosome 1	3,805,980	3	35	669
NEBNext-100 ng	Chromosome 2	2,870,750	78	151	1625
TruSeq-1000 ng	Chromosome 1	3,805,980	0	153	2747
TruSeq-1000 ng	Chromosome 2	2,870,750	30	135	1937

Gap counts for B. thailandensis B genome
Library preparation method-DNA input	Reference	Replicon length, bp	Gaps at 0% of average coverage	Gaps at 1% of average coverage	Gaps at 10% of average coverage
NEBNext-10 ng	Chromosome 1	3,805,980	37	158	1297
NEBNext-10 ng	Chromosome 2	2,870,750	19	38	717
NEBNext-100 ng	Chromosome 1	3,805,980	0	0	0
NEBNext-100 ng	Chromosome 2	2,870,750	6	158	2328
TruSeq-1000 ng	Chromosome 1	3,805,980	0	114	1874
TruSeq-1000 ng	Chromosome 2	2,870,750	0	28	1177


Figure 3 shows box and whisker plots of the evenness of coverage of 10 kbp windows for each genome. In the case of E. coli, the evenness of coverage for NEBNext Ultra libraries prepared with 100 ng of input DNA is superior to that of TruSeq libraries produced with 1 μg of input DNA. For B. anthracis and B. thailandensis, there is more variation in coverage for the NEBNext preparations at both 100 ng and 10 ng. Further examination of this effect suggests that it is proportional to the amount of input DNA, supporting the theory that NEBNext Ultra kits either do not introduce bias or introduce similar bias to TruSeq kits, with a lower DNA requirement.

Figure 3. Box and whisker plots of the variation of coverage when mapping reads to the reference genomes. Variation was calculated for nonoverlapping 10 kbp windows. Evenness of coverage is calculated as 1 − (standard deviation of coverage/median coverage).

3.3. Genome Assembly

Figure 4 shows the results of de novo genome assemblies generated using the short fragment data either alone (Figure 4(a), assembled with IDBA) or complemented with long insert mate pair data (Figure 4(b), assembled with Allpaths). IDBA assemblies show very similar results for low (B. anthracis) to medium (E. coli) %GC genomes. However, data obtained from NEBNext libraries show a dramatic reduction in the number of contigs for high %GC genomes from the two Burkholderia strains. Importantly, NEBNext libraries prepared from just 10 ng of genomic DNA maintain the high quality of genome assembly, producing similar numbers of contigs as with 100 ng input DNA samples. Allpaths assemblies show very similar results in terms of the number of contigs produced. The number of scaffolds does not seem to depend on the library preparation method. However, scaffolding mostly depends on the long insert mate pair data, which are the same for all assemblies. Comprehensive assembly statistics are shown in Tables S2a and S2b.

Figure 4. (a) IDBA assemblies of bacterial genomes using only Illumina short insert paired data. (b) Allpaths assemblies of bacterial genomes using Illumina short insert paired data and 454 long insert paired data.

In conclusion, we have demonstrated that the quality of bacterial genome resequencing and de novo assembly is similar, regardless of the library preparation method and input DNA amount (from 10 to 1000 ng). The only significant difference was observed in the assemblies of the B. thailandensis genomes, where NEBNext library data produced dramatically more contiguous assemblies (Figure 4). In general, the assemblies from the TruSeq libraries were more prone to indels and rearrangements; the full results for the comparisons of our assemblies to the reference genomes are found in Table 2. These differences are likely due to the ability of the NEBNext reagents to amplify more effectively the high %GC regions that are very common in Burkholderia genomes. Modern library preparation methods for next generation sequencing technologies, as represented by NEBNext Ultra, are enabling bacterial genome sequencing from very small input amounts of genomic DNA. These methods likely do not require any additional improvement, since handling and quantifying DNA amounts smaller than 10 ng can become challenging.

Table 2.

Genome assembly statistics and comparison to reference genomes for insertions/deletions and rearrangements.

Organism	Assembly method	Library preparation method	Starting DNA quantity, ng	# of contigs	Total bp	Deletions (relative to reference)	Insertions (relative to reference)	Total indels (relative to reference)	Rearrangements (relative to reference)
B. anthracis	Allpaths	NEBNext	100	21	5,358,817	170	6,518	6,688	80
B. anthracis	Allpaths	TruSeq	1000	23	5,348,296	170	50,762	50,932	92
B. anthracis	IDBA	NEBNext	100	48	5,349,320	1,525	16,827	18,352	102
B. anthracis	IDBA	TruSeq	1000	41	5,483,709	87,019	1,838	88,857	104

E. coli	Allpaths	NEBNext	100	82	5,312,046	758	48,753	49,511	222
E. coli	Allpaths	TruSeq	1000	96	5,269,849	760	80,227	80,987	238
E. coli	IDBA	NEBNext	100	199	5,216,101	111	152,190	152,301	414
E. coli	IDBA	TruSeq	1000	207	5,242,710	224	75,926	76,150	446

B. thailandensis A	Allpaths	NEBNext	10	26	6,652,405	100	12,213	12,313	80
B. thailandensis A	Allpaths	NEBNext	100	27	6,650,888	143	13,042	13,185	154
B. thailandensis A	Allpaths	TruSeq	1000	77	6,603,636	6,294	27,237	33,531	252
B. thailandensis A	IDBA	NEBNext	10	119	6,575,406	206	38,419	38,625	276
B. thailandensis A	IDBA	NEBNext	100	117	6,575,500	351	38,382	38,733	280
B. thailandensis A	IDBA	TruSeq	1000	271	6,582,658	8,320	35,337	43,657	586

B. thailandensis B	Allpaths	NEBNext	10	23	6,660,010	144	12,576	12,720	82
B. thailandensis B	Allpaths	NEBNext	100	27	6,651,385	163	12,278	12,441	86
B. thailandensis B	Allpaths	TruSeq	1000	53	6,655,446	157	16,517	16,674	204
B. thailandensis B	IDBA	NEBNext	10	129	6,579,849	749	45,201	45,950	264
B. thailandensis B	IDBA	NEBNext	100	125	6,579,749	236	44,279	44,515	262
B. thailandensis B	IDBA	TruSeq	1000	281	6,587,931	8,694	45,997	54,691	612


4. Availability

We have deposited the genomes of Burkholderia thailandensis A (strain E254, accession numbers CP004381 and CP004382) and Burkholderia thailandensis B (strain USAMRU Malaysia #20, accession numbers CP004383 and CP004384) into GenBank. The genome sequences will become available in the Spring of 2014.

All experimental and computational methods used during this work are publicly available or can be provided by the authors.

Supplementary Material

Supplementary data describe the NEBNext Ultra workflow (Figure S1), statistics for all NEBNext and TruSeq libraries (Table S1), and statistics for all assemblies generated with IDBA (Table S2a) and Allpaths (Table S2b) algorithms.

434575.f1.pdf (183.7 KB, PDF)

Acknowledgments

The authors thank Defense Threat Reduction Agency (DTRA) for funding. They also thank all colleagues within the Genome Science Programs at Los Alamos National Laboratory and research scientists from New England Biolabs.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

1. Deshpande A., Gans J., Graves S. W., Green L., Taylor L., Kim H. B., Kunde Y. A., Leonard P. M., Li P.-E., Mark J., Song J., Vuyisich M., White P. S. A rapid multiplex assay for nucleic acid-based diagnostics. Journal of Microbiological Methods. 2010;80(2):155–163. doi: 10.1016/j.mimet.2009.12.001.
2. Dong J., Olano J. P., McBride J. W., Walker D. H. Emerging pathogens: challenges and successes of molecular diagnostics. Journal of Molecular Diagnostics. 2008;10(3):185–197. doi: 10.2353/jmoldx.2008.070063.
3. Giljohann D. A., Mirkin C. A. Drivers of biodiagnostic development. Nature. 2009;462(7272):461–464. doi: 10.1038/nature08605.
4. Hu B., Xie G., Lo C.-C., Starkenburg S. R., Chain P. S. G. Pathogen comparative genomics in the next-generation sequencing era: genome alignments, pangenomics and metagenomics. Briefings in Functional Genomics. 2011;10(6):322–333. doi: 10.1093/bfgp/elr042.
5. Gilmour M. W., Graham M., van Domselaar G., et al. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak. BMC Genomics. 2010;11, article 120. doi: 10.1186/1471-2164-11-120.
6. Grad Y. H., Lipsitch M., Feldgarden M., Arachchi H. M., Cerqueira G. C., FitzGerald M., Godfrey P., Haas B. J., Murphy C. I., Russ C., Sykes S., Walker B. J., Wortman J. R., Young S., Zeng Q., Abouelleil A., Bochicchio J., Chauvin S., DeSmet T., Gujja S., McCowan C., Montmayeur A., Steelman S., Frimodt-Møller J., Petersen A. M., Struve C., Krogfelt K. A., Bingen E., Weill F.-X., Lander E. S., Nusbaum C., Birren B. W., Hung D. T., Hanage W. P. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(14, article 3065). doi: 10.1073/pnas.1203955109.
7. Sherry N. L., Porter J. L., Seemann T., Watkins A., Stinear T. P., Howden B. P. Outbreak investigation using high-throughput genome sequencing within a diagnostic microbiology laboratory. Journal of Clinical Microbiology. 2013;51(5):1396–1401. doi: 10.1128/JCM.03332-12.
8. Bischoff S. C. ‘Gut health’: a new objective in medicine? BMC Medicine. 2011;9, article 24. doi: 10.1186/1741-7015-9-24.
9. Eckburg P. B., Bik E. M., Bernstein C. N., et al. Microbiology: diversity of the human intestinal microbial flora. Science. 2005;308(5728):1635–1638. doi: 10.1126/science.1110591.
10. Kovács Z., Benjamins E., Grau K., Ur Rehman A., Ebrahimi M., Czermak P. Recent developments in manufacturing oligosaccharides with prebiotic functions. (Advances in Biochemical Engineering/Biotechnology). Biotechnology of Food and Feed Additives. 2014;143:257–295. doi: 10.1007/10_2013_237.
11. Reetz M. T. Biocatalysis in organic chemistry and biotechnology: past, present, and future. Journal of the American Chemical Society. 2013;135(34):12480–12496. doi: 10.1021/ja405051f.
12. Dunbar J., Eichorst S. A., Gallegos-Graves L. V., et al. Common bacterial responses in six ecosystems exposed to 10 years of elevated atmospheric carbon dioxide. Environmental Microbiology. 2012;14(5):1145–1158. doi: 10.1111/j.1462-2920.2011.02695.x.
13. MacCannell D. Bacterial strain typing. Clinics in Laboratory Medicine. 2013;33(3):630–650. doi: 10.1016/j.cll.2013.03.005.
14. Shtarkman Y. M., Koçer Z. A., Edgar R., Veerapaneni R. S., D'Elia T., Morris P. F., Rogers S. O. Subglacial Lake Vostok (Antarctica) accretion ice contains a diverse set of sequences from aquatic, marine and sediment-inhabiting bacteria and eukarya. PLoS ONE. 2013;8(7):e67221. doi: 10.1371/journal.pone.0067221.
15. Matochko W. L., Derda R. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing. Computational and Mathematical Methods in Medicine. 2013;2013:491612. doi: 10.1155/2013/491612.
16. Rieber N., Zapatka M., Lasitschka B., Jones D., Northcott P., Hutter B., Jäger N., Kool M., Taylor M., Lichter P., Pfister S., Wolf S., Brors B., Eils R. Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies. PLoS ONE. 2013;8(6):e66621. doi: 10.1371/journal.pone.0066621.
17. Ross M. G., Russ C., Costello M., et al. Characterizing and measuring bias in sequence data. Genome Biology. 2013;14, article R51. doi: 10.1186/gb-2013-14-5-r51.
18. Seguin-Orlando A. Ligation bias in Illumina next-generation DNA libraries: implications for sequencing ancient genomes. PLoS ONE. 2013;8(10):e78575. doi: 10.1371/journal.pone.0078575.
19. Solonenko S. A., Ignacio-Espinoza J. C., Alberti A., Cruaud C., Hallam S., Konstantinidis K., Tyson G., Wincker P., Sullivan M. B. Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics. 2013;14(1, article 320). doi: 10.1186/1471-2164-14-320.
20. Ahmed S. A., Awosika J., Baldwin C., et al. Genomic comparison of Escherichia coli O104:H4 isolates from 2009 and 2011 reveals plasmid, and prophage heterogeneity, including shiga toxin encoding phage stx2. PLoS ONE. 2012;7(11):e48228. doi: 10.1371/journal.pone.0048228.
21. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324.
22. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
23. Peng Y., Leung H. C. M., Yiu S. M., Chin F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28(11):1420–1428. doi: 10.1093/bioinformatics/bts174.
24. Butler J., MacCallum I., Kleber M., Shlyakhter I. A., Belmonte M. K., Lander E. S., Nusbaum C., Jaffe D. B. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research. 2008;18(5):810–820. doi: 10.1101/gr.7337908.

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

434575.f1.pdf^{(183.7KB, pdf)}

[B1] 1.Deshpande A., Gans J., Graves S. W., Green L., Taylor L., Kim H. B., Kunde Y. A., Leonard P. M., Li P.-E., Mark J., Song J., Vuyisich M., White P. S. A rapid multiplex assay for nucleic acid-based diagnostics. Journal of Microbiological Methods. 2010;80(2):155–163. doi: 10.1016/j.mimet.2009.12.001. [DOI] [PubMed] [Google Scholar]

[B2] 2.Dong J., Olano J. P., McBride J. W., Walker D. H. Emerging pathogens: challenges and successes of molecular diagnostics. Journal of Molecular Diagnostics. 2008;10(3):185–197. doi: 10.2353/jmoldx.2008.070063. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B3] 3.Giljohann D. A., Mirkin C. A. Drivers of biodiagnostic development. Nature. 2009;462(7272):461–464. doi: 10.1038/nature08605. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B4] 4.Hu B., Xie G., Lo C.-C., Starkenburg S. R., Chain P. S. G. Pathogen comparative genomics in the next-generation sequencing era: genome alignments, pangenomics and metagenomics. Briefings in Functional Genomics. 2011;10(6):322–333. doi: 10.1093/bfgp/elr042.elr042 [DOI] [PubMed] [Google Scholar]

[B5] 5.Gilmour M. W., Graham M., van Domselaar G., et al. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak. BMC Genomics. 2010;11, article 120 doi: 10.1186/1471-2164-11-120. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B6] 6.Grad Y. H., Lipsitch M., Feldgarden M., Arachchi H. M., Cerqueira G. C., FitzGerald M., Godfrey P., Haas B. J., Murphy C. I., Russ C., Sykes S., Walker B. J., Wortman J. R., Young S., Zeng Q., Abouelleil A., Bochicchio J., Chauvin S., DeSmet T., Gujja S., McCowan C., Montmayeur A., Steelman S., Frimodt-Møller J., Petersen A. M., Struve C., Krogfelt K. A., Bingen E., Weill F.-X., Lander E. S., Nusbaum C., Birren B. W., Hung D. T., Hanage W. P. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(14, article 3065) doi: 10.1073/pnas.1203955109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B7] 7.Sherry N. L., Porter J. L., Seemann T., Watkins A., Stinear T. P., Howden B. P. Outbreak investigation using high-throughput genome sequencing within a diagnostic microbiology laboratory. Journal of Clinical Microbiology. 2013;51(5):1396–1401. doi: 10.1128/JCM.03332-12. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B8] 8.Bischoff S. C. ‘Gut health’: a new objective in medicine? BMC Medicine. 2011;9, article 24 doi: 10.1186/1741-7015-9-24. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B9] 9.Eckburg P. B., Bik E. M., Bernstein C. N., et al. Microbiology: diversity of the human intestinal microbial flora. Science. 2005;308(5728):1635–1638. doi: 10.1126/science.1110591. [DOI] [PMC free article] [PubMed] [Google Scholar]

[B10] 10.Kovács Z., Benjamins E., Grau K., Ur Rehman A., Ebrahimi M., Czermak P. Recent developments in manufacturing oligosaccharides with prebiotic functions. (Advances in Biochemical Engineering/Biotechnology).Biotechnology of Food and Feed Additives. 2014;143:257–295. doi: 10.1007/10_2013_237. [DOI] [PubMed] [Google Scholar]

[B11] 11.Reetz M. T. Biocatalysis in organic chemistry and biotechnology: past, present, and future. Journal of the American Chemical Society. 2013;135(34):12480–12496. doi: 10.1021/ja405051f. [DOI] [PubMed] [Google Scholar]

[B12] 12.Dunbar J., Eichorst S. A., Gallegos-Graves L. V., et al. Common bacterial responses in six ecosystems exposed to 10 years of elevated atmospheric carbon dioxide. Environmental Microbiology. 2012;14(5):1145–1158. doi: 10.1111/j.1462-2920.2011.02695.x. [DOI] [PubMed] [Google Scholar]

13. MacCannell D. Bacterial strain typing. Clinics in Laboratory Medicine. 2013;33(3):630–650. doi: 10.1016/j.cll.2013.03.005.

14. Shtarkman Y. M., Koçer Z. A., Edgar R., Veerapaneni R. S., D'Elia T., Morris P. F., Rogers S. O. Subglacial Lake Vostok (Antarctica) accretion ice contains a diverse set of sequences from aquatic, marine and sediment-inhabiting bacteria and eukarya. PLoS ONE. 2013;8(7):e67221. doi: 10.1371/journal.pone.0067221.

15. Matochko W. L., Derda R. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing. Computational and Mathematical Methods in Medicine. 2013;2013:491612. doi: 10.1155/2013/491612.

16. Rieber N., Zapatka M., Lasitschka B., Jones D., Northcott P., Hutter B., Jäger N., Kool M., Taylor M., Lichter P., Pfister S., Wolf S., Brors B., Eils R. Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies. PLoS ONE. 2013;8(6):e66621. doi: 10.1371/journal.pone.0066621.

17. Ross M. G., Russ C., Costello M., et al. Characterizing and measuring bias in sequence data. Genome Biology. 2013;14, article R51. doi: 10.1186/gb-2013-14-5-r51.

18. Seguin-Orlando A. Ligation bias in Illumina next-generation DNA libraries: implications for sequencing ancient genomes. PLoS ONE. 2013;8(10):e78575. doi: 10.1371/journal.pone.0078575.

19. Solonenko S. A., Ignacio-Espinoza J. C., Alberti A., Cruaud C., Hallam S., Konstantinidis K., Tyson G., Wincker P., Sullivan M. B. Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics. 2013;14(1, article 320). doi: 10.1186/1471-2164-14-320.

20. Ahmed S. A., Awosika J., Baldwin C., et al. Genomic comparison of Escherichia coli O104:H4 isolates from 2009 and 2011 reveals plasmid, and prophage heterogeneity, including Shiga toxin encoding phage stx2. PLoS ONE. 2012;7(11):e48228. doi: 10.1371/journal.pone.0048228.

21. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324.

22. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.

23. Peng Y., Leung H. C. M., Yiu S. M., Chin F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28(11):1420–1428. doi: 10.1093/bioinformatics/bts174.

24. Butler J., MacCallum I., Kleber M., Shlyakhter I. A., Belmonte M. K., Lander E. S., Nusbaum C., Jaffe D. B. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research. 2008;18(5):810–820. doi: 10.1101/gr.7337908.
