International Journal of Genomics. 2014 Nov 13;2014:434575. doi: 10.1155/2014/434575

Facile, High Quality Sequencing of Bacterial Genomes from Small Amounts of DNA

Momchilo Vuyisich 1,*, Ayesha Arefin 1, Karen Davenport 1, Shihai Feng 1, Cheryl Gleasner 1, Kim McMurry 1, Beverly Parson-Quintana 1, Jennifer Price 2, Matthew Scholz 1, Patrick Chain 1
PMCID: PMC4247979  PMID: 25478564

Abstract

Sequencing bacterial genomes has traditionally required large amounts of genomic DNA (~1 μg). There have been few studies to determine the effects of the input DNA amount or library preparation method on the quality of sequencing data. Several new commercially available library preparation methods enable shotgun sequencing from as little as 1 ng of input DNA. In this study, we evaluated the NEBNext Ultra library preparation reagents for sequencing bacterial genomes. We have evaluated the utility of NEBNext Ultra for resequencing and de novo assembly of four bacterial genomes and compared its performance with the TruSeq library preparation kit. The NEBNext Ultra reagents enable high quality resequencing and de novo assembly of a variety of bacterial genomes when using 100 ng of input genomic DNA. For the two most challenging genomes (Burkholderia spp.), which have the highest GC content and are the longest, we also show that the quality of both resequencing and de novo assembly is not decreased when only 10 ng of input genomic DNA is used.

1. Introduction

The rapid improvement in quality, quantity, and cost of next generation sequencing (NGS) has resulted in commensurate improvements in analysis techniques. For bacteria, high throughput sequencing has become a routine task. The availability of kits for library preparation, rapid and high content sequencing, and mature data analysis pipelines for genome resequencing and assembly has drastically reduced costs and improved the reliability of results. The commoditization of bacterial genome sequencing has led to more complex applications: clinical and agricultural diagnostics [1–4], outbreak detection and monitoring [5–7], human health studies [8, 9], biocatalysis [10, 11], environmental studies [12], and many others [13, 14].

For NGS platforms, current sequencing technologies require that sequencing adapters be ligated to DNA fragments before sequencing is possible. Ligation of adapters to (typically small) DNA fragments is an inefficient process, generating ligated hybrids from only a small fraction of targeted DNA molecules. This limitation in turn increases the required DNA input, with the only goal being to generate sufficient numbers of ligated fragments to allow sequencing. Typical library preparation methods require large amounts (~1 μg) at high concentrations (>25 ng/mL) of DNA for successful library generation, limiting the types of samples that can be sequenced reliably.

Existing library preparation methods have several reported limitations. These include high variability of evenness and completeness of genome coverage as a function of %GC content, input DNA quantities, and sequencing technology [15–19]. These limitations affect the amount of sequencing data required and the quality of genome assembly and analysis.

Several library preparation kits that require 1–100 ng of input DNA are now available (New England Biolabs' NEBNext, Illumina's TruSeq Nano, Bioo Scientific's NEXTflex, NuGEN's Ovation Ultralow, etc.). This paper details the results of evaluation of the utility of the NEBNext Ultra library preparation kits for both resequencing and assembly of several bacterial genomes. We compare the evenness and completeness of coverage between NEBNext Ultra and Illumina TruSeq kits for bacterial genomes of varying size and %GC content. Our findings indicate that low DNA input amounts are sufficient to generate high quality sequencing data that can be used for genome resequencing or de novo assembly (if combined with long fragment data).

2. Materials and Methods

2.1. Overview

We sequenced three different bacterial species with various genome lengths (from 5.4 Mb to 6.7 Mb) and containing various %GC contents (from 35% to 68%). Standard input DNA amounts were 100 ng, approximately 10x lower than the required amount for the Illumina TruSeq kit and 10x higher than the minimum DNA inputs per NEBNext Ultra manual specifications. The most challenging (longest genome and highest GC content) bacterial genomes (Burkholderia A and B) were also sequenced with minimal DNA inputs (10 ng). All samples were sequenced on the Illumina HiSeq platform using 2 × 100 bp chemistry. Data analyses consisted of read-mapping the short fragment data to reference genomes using BWA (Burrows-Wheeler Alignment). These data were also combined with long insert mate pair data to evaluate their utility for de novo assembly of the bacterial genomes.

2.2. Bacterial Strains and Genomic DNA Preparation

Genomic DNA from Bacillus anthracis (strain Sterne 34F2) was isolated from a log phase culture using the MO-BIO UltraClean microbial DNA isolation kit. The Escherichia coli strain 2009EL-2050 and genomic DNA purification have been previously described [20]. Burkholderia thailandensis A (strain E254, accession numbers CP004381 and CP004382) and Burkholderia thailandensis B (strain USAMRU Malaysia #20, accession numbers CP004383 and CP004384) are previously reported strains, and DNA was provided by Dr. Paul Keim's group (sequences to be published in Spring 2014). The integrity of all genomic DNA samples was evaluated using agarose gels and their quantity measured with PicoGreen reagents on a Qubit 2.0 instrument.

2.3. Library Preparation (Figure S1)

The NEBNext Ultra library preparation protocol consists of several enzymatic and two purification steps, one of which is used for size selection of library fragments. Genomic DNA samples were sheared in 55 μL of TLE buffer (10 mM Tris, 0.1 mM EDTA, pH 8) using a Covaris E220 with the following settings: duty cycle 10%, intensity 5, cycle 200, and time 100 sec. After shearing, two enzymatic steps (end preparation and adapter ligation) are performed in the same tube, followed by size selection of the library fragments using a double AMPure cleanup. The first AMPure step used 0.4x sample volume of beads, and the supernatant was transferred to a clean tube. The second AMPure step used 0.2x sample volume of beads. Selected library fragments were amplified with barcoded primers (10–12 PCR cycles) and purified one more time with AMPure beads (0.5x bead volume) (see Supplementary Material available online at http://dx.doi.org/10.1155/2014/434575).

2.4. Library Quality Control, Quantification, and Sequencing

NEBNext libraries were analyzed using Bioanalyzer 2100 and DNA 1000 or DNA high sensitivity chips, to quantify the library size and assess the level of adapter-dimer and primer-dimer contamination. Libraries were quantified using Illumina library qPCR quantification kits from KAPA Biosystems and sequenced on either the Illumina MiSeq or Illumina HiSeq.

The Illumina data from this study were trimmed to remove any ambiguous bases; any reads shorter than 70 bp after trimming and the corresponding read pairs were discarded. The total number of reads per sample ranged from 6.2 million to 47.8 million before trimming. All data had read lengths of 151 bp with one exception which had read lengths of 101 bp. After trimming, the average read lengths were reduced by less than 3.5% for all samples. The data for each sample were normalized to 70x coverage of the genome after trimming. The average number of reads with a quality greater than Q20 after trimming and normalization ranged from 61% of the total reads to 97% of the total reads. The total number of reads, the number of reads with quality greater than Q20, and the average read lengths before and after trimming for each sample can be found in Table S1. The assemblies were compared to the reference genomes to consider insertion/deletion errors and rearrangements using an in-house Perl script.
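The filtering rule described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The text does not state how ambiguous bases were removed, so the sketch assumes that the longest stretch free of ambiguous bases (N) is kept. The function names are hypothetical, and quality strings, which would need to be trimmed in parallel, are omitted for brevity.

```python
import re

MIN_LEN = 70  # minimum read length after trimming, as stated in the text


def trim_ambiguous(seq: str) -> str:
    """Return the longest stretch of seq that contains only A, C, G, or T.

    Assumption: the text does not say how ambiguous bases were removed;
    keeping the longest N-free stretch is one plausible reading.
    """
    runs = re.findall(r"[ACGT]+", seq.upper())
    return max(runs, key=len, default="")


def filter_pairs(pairs, min_len=MIN_LEN):
    """Trim both mates and drop the whole pair if either mate is too short."""
    kept = []
    for read1, read2 in pairs:
        t1, t2 = trim_ambiguous(read1), trim_ambiguous(read2)
        if len(t1) >= min_len and len(t2) >= min_len:
            kept.append((t1, t2))
    return kept
```

Discarding the mate together with a short read keeps the data strictly paired, which the downstream insert size analysis and paired-read assembly both rely on.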

2.5. Mapping of Reads to Reference Genomes

For read-mapping, all trimmed reads from each preparation were used. The Burrows-Wheeler Alignment (BWA) mapping tool was used, combined with SAMtools and in-house Perl scripts for coverage and insert size analysis [21, 22]. For base coverage we used the BWA global alignment option with default parameters. BWA global alignment reports only the best alignment, based on a score calculated from a set of parameters. If a read has several equally good alignment positions, BWA randomly assigns the read to one of them. All reads mapped to contigs were used to calculate base coverage. For insert size calculation, only properly paired reads (read pair on the same contig and with correct orientation) were used. We report the mean, standard deviation, minimum, and maximum of the insert size distribution for all short fragment libraries. We utilized three thresholds for reporting coverage: 0%, 1%, and 10% of mean fold coverage.
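To make the three coverage thresholds concrete, the following Python sketch counts gaps in a per-base depth profile. It is an illustration only and does not reproduce the in-house Perl scripts. It assumes that a gap is a contiguous run of positions whose depth is at or below the given fraction of the mean depth; the text does not define a gap more precisely, and the function name is hypothetical.

```python
def count_gaps(depth, fraction):
    """Count contiguous runs of positions with depth <= fraction * mean depth.

    Assumption: this definition of a "gap" is illustrative; with
    fraction=0 it reduces to runs of completely uncovered positions.
    """
    if not depth:
        return 0
    cutoff = fraction * (sum(depth) / len(depth))
    gaps = 0
    in_gap = False
    for d in depth:
        low = d <= cutoff
        if low and not in_gap:
            gaps += 1  # a new low-coverage run starts here
        in_gap = low
    return gaps


# Toy depth profile: one uncovered run, one very low position, one low position.
profile = [200, 200, 0, 0, 200, 200, 10, 200, 200, 1, 200, 200]
for fraction in (0.0, 0.01, 0.10):
    print(fraction, count_gaps(profile, fraction))
```

Under this definition the gap count can only stay the same or change as the cutoff rises, because additional low positions either open new gaps or merge existing ones.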

2.6. Genome Coverage

Calculation of evenness of coverage was performed by calculating the average and standard deviation of coverage across nonoverlapping 10 kbp fragments of the finished genome. Evenness for each fragment was calculated as 1 − (standard deviation of coverage/coverage). All data points (genomic and plasmid coverage, where appropriate) were used to generate box and whisker plots in IBM's statistics program SPSS.
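The evenness calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script used for the figures: the window mean is used as the denominator, the population standard deviation is used (the text does not say whether the sample or population form was applied), and a trailing partial window is dropped. The function name is hypothetical.

```python
from statistics import mean, pstdev

WINDOW = 10_000  # nonoverlapping 10 kbp windows, as in the text


def window_evenness(depth, window=WINDOW):
    """Return 1 - (sd / mean) of per-base depth for each full window.

    Assumptions: population standard deviation, window mean as the
    denominator, and any trailing partial window is ignored.
    """
    scores = []
    for start in range(0, len(depth) - window + 1, window):
        chunk = depth[start:start + window]
        avg = mean(chunk)
        if avg == 0:
            scores.append(None)  # evenness is undefined for an uncovered window
            continue
        scores.append(1 - pstdev(chunk) / avg)
    return scores
```

A perfectly uniform window scores 1, and the score falls as the spread of depth grows relative to its mean, so values closer to 1 indicate more even coverage.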

2.7. Assembly Methods

Two de Bruijn graph assembly tools were used to evaluate the quality of the short fragment data for the purpose of assembling high quality genomes. IDBA uses only paired reads from short fragment Illumina libraries [23]. Paired reads were randomly selected (in silico) from each sample to generate libraries of approximately 70-fold genome coverage for each sample, in order to normalize the data. The only exception was the E. coli sample prepared with the TruSeq kit, for which only 61-fold coverage was available. Each data set was assembled with IDBA, version 1.1.0.
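The in silico normalization amounts to simple arithmetic followed by random sampling, as the Python sketch below illustrates. It is not the procedure used in this study. It assumes a fixed read length per sample and sampling without replacement, and the function names and the seed parameter are hypothetical.

```python
import random


def pairs_for_coverage(genome_len, read_len, fold=70):
    """Number of read pairs whose bases sum to about fold x genome_len."""
    return round(fold * genome_len / (2 * read_len))


def subsample_pairs(pairs, genome_len, read_len, fold=70, seed=0):
    """Randomly pick read pairs, without replacement, to reach ~fold coverage.

    Assumption: if fewer pairs are available than the target, all pairs
    are kept, as for a sample with less than 70-fold data.
    """
    target = pairs_for_coverage(genome_len, read_len, fold)
    if target >= len(pairs):
        return list(pairs)
    return random.Random(seed).sample(pairs, target)
```

For example, a 5,253,138 bp chromosome sequenced with 100 bp reads needs 70 × 5,253,138 / 200, or roughly 1.84 million read pairs, to reach 70-fold coverage.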

The 70-fold short fragment Illumina data were combined with previously sequenced long insert mate pair data generated by 454. The 454 data had an average insert size of 8 kbp and provided 7- to 8-fold base coverage, again with the exception of the E. coli samples, which had approximately 3.5-fold coverage. The combined data were assembled with Allpaths, version 44837 [24]. The 454 data were used without trimming or data reduction in the Allpaths assembly.

3. Results and Discussion

3.1. Library Preparation (Figure S1 and Table S1)

The library preparation protocol, as described in Section 2, yields average insert sizes of ~270 ± 15 bps (average library sizes of ~400 ± 15 bps) that are optimal for either 2 × 100 or 2 × 150 bp sequencing on Illumina platforms. Different insert sizes can easily be obtained by adjusting the size selection step (ratio of DNA solution to AMPure beads) as recommended by the manufacturer. It is not necessary to adjust the shearing step, as the sheared DNA produced by Covaris has a very broad size distribution. The NEBNext library process provides very consistent results in terms of library size and concentration, even when performed for the very first time.

Prior to normalization and sequencing, samples were analyzed using Qubit (PicoGreen-based method), Bioanalyzer 2100, and quantitative real-time PCR (qPCR, KAPA Biosystems). When the libraries were quantified by qPCR, accurate normalization and clustering were achieved. Unfortunately, this was not the case when molar library concentrations were obtained with Qubit and Bioanalyzer data only (without qPCR). Therefore, we recommend that qPCR library quantification be routinely performed. Sequencing was performed on either Illumina HiSeq (2 × 100 bp) or Illumina MiSeq (2 × 150 bp).

3.2. Evenness of Coverage

Figure 1 (B. anthracis and E. coli) and Figure 2 (B. thailandensis A and B) contain sliding window coverage plots that compare the coverage of each genome for different library preparation methods and DNA input amounts. The figures show that the genome coverage is remarkably similar regardless of the library preparation method (NEBNext or TruSeq). Of particular interest is that even the libraries prepared from only 10 ng of genomic DNA produced essentially the same evenness of genome coverage as the rest of the samples (Figures 2(a)–2(d), top panel). There are some differences among the data sets, however. As Table 1 shows, the number of true gaps in coverage (0%) is slightly higher for NEBNext than for TruSeq libraries, while the number of gaps is lower for NEBNext when using 1% or 10% average coverage thresholds. The data were not normalized among all samples prior to evenness of coverage comparisons. Instead, they were normalized within each sample relative to the average coverage.

Figure 1.

Evenness of coverage for B. anthracis (a) and E. coli (b) genome sequencing using NEBNext library preparation with 100 ng DNA input (top graph) and TruSeq library preparation with 1,000 ng DNA input (bottom graph). Plots are normalized by the average coverage in each figure.

Figure 2.

Evenness of coverage plots for B. thailandensis A ((a) is chromosome 1 and (b) is chromosome 2) and B. thailandensis B ((c) is chromosome 1 and (d) is chromosome 2) genome sequencing. Top graph within each panel shows NEBNext library preparation data with 10 ng or 100 ng DNA input (black color is for 10 ng samples and red color is for 100 ng samples). Bottom graph within each panel shows TruSeq library preparation data with 1,000 ng DNA input. Plots are normalized by the average coverage in each figure.

Table 1.

Comparison of gap counts by organism, replicon, and library preparation method. Input DNA amount for each sample follows the name of the library preparation method.

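Gap counts for E. coli genome (cutoffs are based on average coverage)

| Library preparation method-DNA input | Reference | Replicon length, bp | Cutoff 0 | Cutoff 1% | Cutoff 10% |
| --- | --- | --- | --- | --- | --- |
| NEBNext-100 ng | Plasmid 1 | 109,274 | 0 | 0 | 24 |
| NEBNext-100 ng | Plasmid 2 | 74,213 | 45 | 145 | 318 |
| NEBNext-100 ng | Plasmid 3 | 1,549 | 0 | 0 | 8 |
| NEBNext-100 ng | Chromosome | 5,253,138 | 26 | 206 | 1262 |
| TruSeq-1000 ng | Plasmid | 109,274 | 0 | 1 | 49 |
| TruSeq-1000 ng | Plasmid | 74,213 | 44 | 200 | 491 |
| TruSeq-1000 ng | Plasmid | 1,549 | 0 | 0 | 25 |
| TruSeq-1000 ng | Chromosome | 5,253,138 | 23 | 449 | 2833 |

Gap counts for B. thailandensis A genome (cutoffs are based on average coverage)

| Library preparation method-DNA input | Reference | Replicon length, bp | Cutoff 0 | Cutoff 1% | Cutoff 10% |
| --- | --- | --- | --- | --- | --- |
| NEBNext-10 ng | Chromosome 1 | 3,805,980 | 47 | 72 | 329 |
| NEBNext-10 ng | Chromosome 2 | 2,870,750 | 86 | 125 | 498 |
| NEBNext-100 ng | Chromosome 1 | 3,805,980 | 3 | 35 | 669 |
| NEBNext-100 ng | Chromosome 2 | 2,870,750 | 78 | 151 | 1625 |
| TruSeq-1000 ng | Chromosome 1 | 3,805,980 | 0 | 153 | 2747 |
| TruSeq-1000 ng | Chromosome 2 | 2,870,750 | 30 | 135 | 1937 |

Gap counts for B. thailandensis B genome (cutoffs are based on average coverage)

| Library preparation method-DNA input | Reference | Replicon length, bp | Cutoff 0 | Cutoff 1% | Cutoff 10% |
| --- | --- | --- | --- | --- | --- |
| NEBNext-10 ng | Chromosome 1 | 3,805,980 | 37 | 158 | 1297 |
| NEBNext-10 ng | Chromosome 2 | 2,870,750 | 19 | 38 | 717 |
| NEBNext-100 ng | Chromosome 1 | 3,805,980 | 0 | 0 | 0 |
| NEBNext-100 ng | Chromosome 2 | 2,870,750 | 6 | 158 | 2328 |
| TruSeq-1000 ng | Chromosome 1 | 3,805,980 | 0 | 114 | 1874 |
| TruSeq-1000 ng | Chromosome 2 | 2,870,750 | 0 | 28 | 1177 |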

Figure 3 shows box and whisker plots of the evenness of coverage of 10 kbp windows for each genome. In the case of E. coli, the evenness of coverage for NEBNext Ultra libraries prepared with 100 ng of input DNA is superior to that of TruSeq libraries produced with 1 μg of input DNA. For B. anthracis and B. thailandensis, there is more variation in coverage for the NEBNext preparations at both 100 ng and 10 ng. Further examination of this effect suggests that it is proportional to the amount of input DNA, supporting the theory that NEBNext Ultra kits either do not introduce bias or introduce similar bias to TruSeq kits, with a lower DNA requirement.

Figure 3.

Box and whisker plots of the variation of coverage when mapping reads to the reference genomes. Variation was calculated for nonoverlapping 10 kbp windows. Evenness of coverage is calculated as 1 − (standard deviation of coverage/median coverage).

3.3. Genome Assembly

Figure 4 shows the results of de novo genome assemblies generated using the short fragment data either alone (Figure 4(a), assembled with IDBA) or complemented with long insert mate pair data (Figure 4(b), assembled with Allpaths). IDBA assemblies show very similar results for low (B. anthracis) to medium (E. coli) %GC genomes. However, data obtained from NEBNext libraries show a dramatic reduction in the number of contigs for high %GC genomes from the two Burkholderia strains. Importantly, NEBNext libraries prepared from just 10 ng of genomic DNA maintain the high quality of genome assembly, producing similar numbers of contigs as with 100 ng input DNA samples. Allpaths assemblies show very similar results in terms of the number of contigs produced. The number of scaffolds does not seem to depend on the library preparation method. However, scaffolding mostly depends on the long insert mate pair data, which are the same for all assemblies. Comprehensive assembly statistics are shown in Tables S2a and S2b.

Figure 4.

(a) IDBA assemblies of bacterial genomes using only Illumina short insert paired data. (b) Allpaths assemblies of bacterial genomes using Illumina short insert paired data and 454 long insert paired data.

In conclusion, we have demonstrated that the quality of bacterial genome resequencing and de novo assembly is similar, regardless of the library preparation method and input DNA amount (from 10 to 1000 ng). The only significant difference was observed in the assemblies of the B. thailandensis genomes, where NEBNext library data produced dramatically more contiguous assemblies (Figure 4). In general, the assemblies from the TruSeq libraries were more prone to indels and rearrangements. The full results for the comparisons of our assemblies to the reference genomes are found in Table 2. These differences are likely due to the ability of the NEBNext reagents to amplify more effectively the high %GC regions that are very common in Burkholderia genomes. Modern library preparation methods for next generation sequencing technologies, as represented by NEBNext Ultra, are enabling bacterial genome sequencing from very small input amounts of genomic DNA. These methods likely do not require any additional improvement, since handling and quantifying DNA amounts smaller than 10 ng can become challenging.

Table 2.

Genome assembly statistics and comparison to reference genomes for insertions/deletions and rearrangements.

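Deletions, insertions, total indels, and rearrangements are relative to the reference genomes.

| Organism | Assembly method | Library preparation method | Starting DNA quantity, ng | # of contigs | Total bps | Deletions | Insertions | Total indels | Rearrangements |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| B. anthracis | Allpaths | NEBNext | 100 | 21 | 5,358,817 | 170 | 6,518 | 6,688 | 80 |
| B. anthracis | Allpaths | TruSeq | 1000 | 23 | 5,348,296 | 170 | 50,762 | 50,932 | 92 |
| B. anthracis | IDBA | NEBNext | 100 | 48 | 5,349,320 | 1,525 | 16,827 | 18,352 | 102 |
| B. anthracis | IDBA | TruSeq | 1000 | 41 | 5,483,709 | 87,019 | 1,838 | 88,857 | 104 |
| E. coli | Allpaths | NEBNext | 100 | 82 | 5,312,046 | 758 | 48,753 | 49,511 | 222 |
| E. coli | Allpaths | TruSeq | 1000 | 96 | 5,269,849 | 760 | 80,227 | 80,987 | 238 |
| E. coli | IDBA | NEBNext | 100 | 199 | 5,216,101 | 111 | 152,190 | 152,301 | 414 |
| E. coli | IDBA | TruSeq | 1000 | 207 | 5,242,710 | 224 | 75,926 | 76,150 | 446 |
| B. thailandensis A | Allpaths | NEBNext | 10 | 26 | 6,652,405 | 100 | 12,213 | 12,313 | 80 |
| B. thailandensis A | Allpaths | NEBNext | 100 | 27 | 6,650,888 | 143 | 13,042 | 13,185 | 154 |
| B. thailandensis A | Allpaths | TruSeq | 1000 | 77 | 6,603,636 | 6,294 | 27,237 | 33,531 | 252 |
| B. thailandensis A | IDBA | NEBNext | 10 | 119 | 6,575,406 | 206 | 38,419 | 38,625 | 276 |
| B. thailandensis A | IDBA | NEBNext | 100 | 117 | 6,575,500 | 351 | 38,382 | 38,733 | 280 |
| B. thailandensis A | IDBA | TruSeq | 1000 | 271 | 6,582,658 | 8,320 | 35,337 | 43,657 | 586 |
| B. thailandensis B | Allpaths | NEBNext | 10 | 23 | 6,660,010 | 144 | 12,576 | 12,720 | 82 |
| B. thailandensis B | Allpaths | NEBNext | 100 | 27 | 6,651,385 | 163 | 12,278 | 12,441 | 86 |
| B. thailandensis B | Allpaths | TruSeq | 1000 | 53 | 6,655,446 | 157 | 16,517 | 16,674 | 204 |
| B. thailandensis B | IDBA | NEBNext | 10 | 129 | 6,579,849 | 749 | 45,201 | 45,950 | 264 |
| B. thailandensis B | IDBA | NEBNext | 100 | 125 | 6,579,749 | 236 | 44,279 | 44,515 | 262 |
| B. thailandensis B | IDBA | TruSeq | 1000 | 281 | 6,587,931 | 8,694 | 45,997 | 54,691 | 612 |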

4. Availability

We have deposited the genomes of Burkholderia thailandensis A (strain E254, accession numbers CP004381 and CP004382) and Burkholderia thailandensis B (strain USAMRU Malaysia #20, accession numbers CP004383 and CP004384) into GenBank. The genome sequences will become available in the Spring of 2014.

All experimental and computational methods used during this work are publicly available or can be provided by the authors.

Supplementary Material

Supplementary data describe the NEBNext Ultra workflow (Figure S1), statistics for all NEBNext and TruSeq libraries (Table S1), and statistics for all assemblies generated with the IDBA (Table S2a) and Allpaths (Table S2b) algorithms.

434575.f1.pdf (183.7KB, pdf)

Acknowledgments

The authors thank Defense Threat Reduction Agency (DTRA) for funding. They also thank all colleagues within the Genome Science Programs at Los Alamos National Laboratory and research scientists from New England Biolabs.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

  • 1. Deshpande A., Gans J., Graves S. W., Green L., Taylor L., Kim H. B., Kunde Y. A., Leonard P. M., Li P.-E., Mark J., Song J., Vuyisich M., White P. S. A rapid multiplex assay for nucleic acid-based diagnostics. Journal of Microbiological Methods. 2010;80(2):155–163. doi: 10.1016/j.mimet.2009.12.001.
  • 2. Dong J., Olano J. P., McBride J. W., Walker D. H. Emerging pathogens: challenges and successes of molecular diagnostics. Journal of Molecular Diagnostics. 2008;10(3):185–197. doi: 10.2353/jmoldx.2008.070063.
  • 3. Giljohann D. A., Mirkin C. A. Drivers of biodiagnostic development. Nature. 2009;462(7272):461–464. doi: 10.1038/nature08605.
  • 4. Hu B., Xie G., Lo C.-C., Starkenburg S. R., Chain P. S. G. Pathogen comparative genomics in the next-generation sequencing era: genome alignments, pangenomics and metagenomics. Briefings in Functional Genomics. 2011;10(6):322–333. doi: 10.1093/bfgp/elr042.
  • 5. Gilmour M. W., Graham M., van Domselaar G., et al. High-throughput genome sequencing of two Listeria monocytogenes clinical isolates during a large foodborne outbreak. BMC Genomics. 2010;11, article 120. doi: 10.1186/1471-2164-11-120.
  • 6. Grad Y. H., Lipsitch M., Feldgarden M., Arachchi H. M., Cerqueira G. C., FitzGerald M., Godfrey P., Haas B. J., Murphy C. I., Russ C., Sykes S., Walker B. J., Wortman J. R., Young S., Zeng Q., Abouelleil A., Bochicchio J., Chauvin S., DeSmet T., Gujja S., McCowan C., Montmayeur A., Steelman S., Frimodt-Møller J., Petersen A. M., Struve C., Krogfelt K. A., Bingen E., Weill F.-X., Lander E. S., Nusbaum C., Birren B. W., Hung D. T., Hanage W. P. Genomic epidemiology of the Escherichia coli O104:H4 outbreaks in Europe. Proceedings of the National Academy of Sciences of the United States of America. 2012;109(14, article 3065). doi: 10.1073/pnas.1203955109.
  • 7. Sherry N. L., Porter J. L., Seemann T., Watkins A., Stinear T. P., Howden B. P. Outbreak investigation using high-throughput genome sequencing within a diagnostic microbiology laboratory. Journal of Clinical Microbiology. 2013;51(5):1396–1401. doi: 10.1128/JCM.03332-12.
  • 8. Bischoff S. C. ‘Gut health’: a new objective in medicine? BMC Medicine. 2011;9, article 24. doi: 10.1186/1741-7015-9-24.
  • 9. Eckburg P. B., Bik E. M., Bernstein C. N., et al. Microbiology: diversity of the human intestinal microbial flora. Science. 2005;308(5728):1635–1638. doi: 10.1126/science.1110591.
  • 10. Kovács Z., Benjamins E., Grau K., Ur Rehman A., Ebrahimi M., Czermak P. Recent developments in manufacturing oligosaccharides with prebiotic functions. (Advances in Biochemical Engineering/Biotechnology). Biotechnology of Food and Feed Additives. 2014;143:257–295. doi: 10.1007/10_2013_237.
  • 11. Reetz M. T. Biocatalysis in organic chemistry and biotechnology: past, present, and future. Journal of the American Chemical Society. 2013;135(34):12480–12496. doi: 10.1021/ja405051f.
  • 12. Dunbar J., Eichorst S. A., Gallegos-Graves L. V., et al. Common bacterial responses in six ecosystems exposed to 10 years of elevated atmospheric carbon dioxide. Environmental Microbiology. 2012;14(5):1145–1158. doi: 10.1111/j.1462-2920.2011.02695.x.
  • 13. MacCannell D. Bacterial strain typing. Clinics in Laboratory Medicine. 2013;33(3):630–650. doi: 10.1016/j.cll.2013.03.005.
  • 14. Shtarkman Y. M., Koçer Z. A., Edgar R., Veerapaneni R. S., D'Elia T., Morris P. F., Rogers S. O. Subglacial Lake Vostok (Antarctica) accretion ice contains a diverse set of sequences from aquatic, marine and sediment-inhabiting bacteria and eukarya. PLoS ONE. 2013;8(7):e67221. doi: 10.1371/journal.pone.0067221.
  • 15. Matochko W. L., Derda R. Error analysis of deep sequencing of phage libraries: peptides censored in sequencing. Computational and Mathematical Methods in Medicine. 2013;2013:491612. doi: 10.1155/2013/491612.
  • 16. Rieber N., Zapatka M., Lasitschka B., Jones D., Northcott P., Hutter B., Jäger N., Kool M., Taylor M., Lichter P., Pfister S., Wolf S., Brors B., Eils R. Coverage bias and sensitivity of variant calling for four whole-genome sequencing technologies. PLoS ONE. 2013;8(6):e66621. doi: 10.1371/journal.pone.0066621.
  • 17. Ross M. G., Russ C., Costello M., et al. Characterizing and measuring bias in sequence data. Genome Biology. 2013;14, article R51. doi: 10.1186/gb-2013-14-5-r51.
  • 18. Seguin-Orlando A. Ligation bias in Illumina next-generation DNA libraries: implications for sequencing ancient genomes. PLoS ONE. 2013;8(10):e78575. doi: 10.1371/journal.pone.0078575.
  • 19. Solonenko S. A., Ignacio-Espinoza J. C., Alberti A., Cruaud C., Hallam S., Konstantinidis K., Tyson G., Wincker P., Sullivan M. B. Sequencing platform and library preparation choices impact viral metagenomes. BMC Genomics. 2013;14(1, article 320). doi: 10.1186/1471-2164-14-320.
  • 20. Ahmed S. A., Awosika J., Baldwin C., et al. Genomic comparison of Escherichia coli O104:H4 isolates from 2009 and 2011 reveals plasmid, and prophage heterogeneity, including shiga toxin encoding phage stx2. PLoS ONE. 2012;7(11):e48228. doi: 10.1371/journal.pone.0048228.
  • 21. Li H., Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–1760. doi: 10.1093/bioinformatics/btp324.
  • 22. Li H., Handsaker B., Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. The Sequence Alignment/Map (SAM) format and SAMtools. Bioinformatics. 2009;25(16):2078–2079. doi: 10.1093/bioinformatics/btp352.
  • 23. Peng Y., Leung H. C. M., Yiu S. M., Chin F. Y. L. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics. 2012;28(11):1420–1428. doi: 10.1093/bioinformatics/bts174.
  • 24. Butler J., MacCallum I., Kleber M., Shlyakhter I. A., Belmonte M. K., Lander E. S., Nusbaum C., Jaffe D. B. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Research. 2008;18(5):810–820. doi: 10.1101/gr.7337908.

Articles from International Journal of Genomics are provided here courtesy of Wiley
