Abstract
Massively parallel DNA sequencing offers many benefits, but major inhibitory cost factors include: (1) start-up (i.e., purchasing initial reagents and equipment); (2) buy-in (i.e., getting the smallest possible amount of data from a run); and (3) sample preparation. Reducing sample preparation costs is commonly addressed, but start-up and buy-in costs are rarely addressed. We present dual-indexing systems to address all three of these issues. By breaking the library construction process into universal, re-usable, combinatorial components, we reduce all costs, while increasing the number of samples and the variety of library types that can be combined within runs. We accomplish this by extending the Illumina TruSeq dual-indexing approach to 768 (384 + 384) indexed primers that produce 384 unique dual-indexes or 147,456 (384 × 384) unique combinations. We maintain eight nucleotide indexes, with many that are compatible with Illumina index sequences. We synthesized these indexing primers, purifying them with only standard desalting and placing small aliquots in replicate plates. In qPCR validation tests, 206 of 208 primers tested passed (99% success). We then created hundreds of libraries in various scenarios. Our approach reduces start-up and per-sample costs by requiring only one universal adapter that works with indexed PCR primers to uniquely identify samples. Our approach reduces buy-in costs because: (1) relatively few oligonucleotides are needed to produce a large number of indexed libraries; and (2) the large number of possible primers allows researchers to use unique primer sets for different projects, which facilitates pooling of samples during sequencing. Our libraries make use of standard Illumina sequencing primers and index sequence length and are demultiplexed with standard Illumina software, thereby minimizing customization headaches. In subsequent Adapterama papers, we use these same primers with different adapter stubs to construct amplicon and restriction-site associated DNA libraries, but their use can be expanded to any type of library sequenced on Illumina platforms.
Keywords: Illumina, Next Generation Sequencing, NovaSeq, Sample Preparation, Pooling, Multiplexing, Adapters, Primers
Introduction
Massively parallel sequencing, more commonly known as next-generation sequencing (NGS), has transformed the life sciences. The unprecedented amount of sequence data generated by NGS platforms facilitates advances in approaches, techniques, and discoveries (Ansorge, 2009; Tautz, Ellegren & Weigel, 2010; Goodwin, McPherson & McCombie, 2016). Reduced costs (Glenn, 2011; Glenn, 2016) are a major component of NGS success because cost reduction enables many studies that were previously infeasible. Although NGS costs per read have dropped tremendously, the minimum cost to obtain any amount of NGS data (i.e., the minimum buy-in cost) remains high, particularly when researchers want to collect small amounts of DNA sequence data from large numbers of individual samples in a single run. These buy-in costs are largely driven by the money required to purchase adapters containing unique identifying sequences that allow tagging and tracking of samples sequenced in multiplex (see Glossary). For example, the purchase price for a subset of 96, single-index, TruSeq-equivalent adapters described in Faircloth & Glenn (2012) would require an initial investment of at least $3,161 (US; $11,321 with TruGrade® purification), and this investment is exclusive of the additional costs to purchase other necessary library preparation reagents and consumables. A second problem for researchers wishing to collect smaller amounts of sequence data from many samples sequenced in multiplex is the relatively limited number of indexed adapters that are available. Although several publications (e.g., Meyer & Kircher, 2010; Faircloth & Glenn, 2012; Rohland & Reich, 2012) and commercial products (e.g., Illumina Nextera, Illumina, San Diego, CA, USA; Bioo Scientific NEXTflex-HT, Bioo Scientific, Austin, TX, USA) provide schemes for indexing hundreds of individuals sequenced in multiplex, most of these approaches do not facilitate individually tagging many thousands of samples at low cost so that samples can be pooled into a single sequencing run. Given the increasing capacity of high-end Illumina instruments (e.g., Illumina NovaSeq), this is a significant and growing issue. A third constraint that has long been known (Kircher, Sawyer & Meyer, 2012) is that Illumina instruments can mismatch the read(s) and index sequence(s) by hopping or swapping indexes (Sinha et al., 2017; Costello et al., 2018), causing sequence misidentification and other problems. Uniquely tagging each index position significantly reduces these problems (Kircher, Sawyer & Meyer, 2012; Illumina, 2017; Costello et al., 2018). As a result, library preparation methods that reduce costs while simultaneously increasing the number of samples that can be tagged and sequenced together would benefit many types of research.
In this first paper of the Adapterama series, we present the key components of an integrated system for producing 384 uniquely dual-indexed (or 147,456 combinatorially-indexed) Illumina libraries at low cost (Fig. 1, Figs. S1). We build this integrated system on top of previous developments introduced by Illumina (2018a) and others (e.g., Meyer & Kircher, 2010; Fisher et al., 2011), and we show that it is possible to significantly reduce library preparation costs by changing from full-length adapters that incorporate tags in the Illumina TruSeq strategy to shorter universal adapter stubs and indexing primers (hereafter referred to as the iTru strategy; which is similar to the original Illumina indexing strategy Illumina, 2018a). Simply moving from a TruSeq indexing strategy to the iTru indexing strategy, while maintaining a single indexing position, can reduce costs by more than 50% (Table 1). When taking advantage of the dual-indexing offered by our iTru strategy, researchers can reduce costs by at least an order of magnitude relative to TruSeq (Table 1). This method is also extensible to the Illumina Nextera adapter sequences (Syed, Grunenwald & Caruccio, 2009; Adey et al., 2010), hereafter referred to as the iNext approach (Figs. S1–S2; File S1). We focus on describing the iTru system because TruSeq is more commonly used than Nextera and to simplify presentation of the system (details of the iNext system are generally given in the supplemental figures and files). In subsequent Adapterama manuscripts, we extend the system presented here for a variety of applications (e.g., amplicon sequencing and RADseq), but we use our iTru or iNext indexing primers throughout (Fig. S1).
Table 1. Comparison of oligonucleotide numbers and costs when using varying numbers of independent tags.
Uniquely indexed libraries | Librarytype | Index positions | Stub adapter Oligos | Long adapter Oligos | Indexed primers | Adapter cost + Primer cost (US $) |
---|---|---|---|---|---|---|
96 | TruSeq* | 1 | 0 | 1 + 96 | 0 [2#] | $4,019 + $18 |
96 | TruSeq Nano HT | 2 | 0 | 8 + 12 | 0# | $4,560$+ $0 |
96 | iTrua | 1 | 2 | 0 | 1 + 96 | $45 + $1,617 |
96 | iTrub | 2 | 2 | 0 | 8 + 12 | $45 + $344 |
384 | TruSeq* | 1 | 0 | 1 + 384 | 0 [2#] | $16,029 + $18 |
384 | iTrua | 1 | 2 | 0 | 1 + 384 | $45 + $6,416 |
384 | iTrub | 2 | 2 | 0 | 16 + 24 | $45 + $689 |
9216 | TruSeq* | 1 | 0 | 1 + 9216e | 0 [2#] | $392,049 + $18 |
9216 | iTrua | 1 | 2 | 0 | 1 + 9216c | $45 + $153,539 |
9216 | iTrub | 2 | 2 | 0 | 96 + 96 | $45 + $3,333 |
147,456 | iTrub | 2 | 2 | 0 | 384 + 384 | $45 + $13,332 |
Notes.
Original TruSeq approach with custom adapters (cf. Faircloth & Glenn, 2012); kits are no longer available, but the method can be home-brewed (cf. Fisher et al., 2010), or the adapters can be used with reagents from TruSeq Nano kits.
P5 and P7 primers are used.
Price includes all library preparation reagents, not just adapters; P5 and P7 primers are included in kit.
Libraries contain both i5 and i7 tags, but only one iTru5 primer is used for all samples, thus only the i7 tags are informative and are sequenced (cost efficient with old versions of HiSeq ≤2,500 kits). This method is no longer recommended, but illustrates cost differences.
Both the i5 and i7 indexes are informative and are sequenced.
Tags of 11 nucleotides are required for 9216 tags of edit distance ≥3.
Here we outline the ideas underlying genomic library construction for Illumina sequencers, and we provide some historical perspective on Illumina library preparation for researchers new to Illumina sequencing. Following this introduction, we describe our iTru design, which modifies Illumina’s original library construction method and extends the approach to include indexes on both primers (i.e., double-indexing; c.f., Kircher, Sawyer & Meyer, 2012). The iTru method (Figs. 1, 2 and 3) produces: (1) libraries that are compatible with all Illumina sequencing instruments and reagents; (2) libraries that can be pooled (i.e., multiplexed) with other Illumina libraries; (3) libraries that can be sequenced using standard Illumina sequencing primers and protocols; and (4) data that can be demultiplexed with standard Illumina software packages and pipelines.
Illumina libraries
DNA molecules that can be sequenced on Illumina instruments require specific primer-binding sites (i.e., adapters; Glossary) on each end. The procedure to incorporate the adapters to the DNA insert is generally referred to as “library preparation”. Library preparation of genomic DNA, in its most common form, involves randomly shearing DNA to a desired size range (e.g., 200–600 bp); end-repairing and adenylating the sheared DNA; adding synthetic, double-stranded adapters onto each end of the adenylated DNA molecules using T/A ligation; and using limited-cycle PCR amplification to increase the copy number of valid constructs (Figs. 1, 2 and 3, S3; c.f. Fig. S2; Fig. S4).
Illumina library preparations differed from their early competitors (chiefly 454) because their double-stranded adapters used a Y-yoke design to increase library construction efficiency (Bentley et al., 2008; Greigite, 2009). The Y-yoke structure of the adapters allows each starting DNA molecule to serve as two templates, requiring ≥3 cycles of PCR to produce complete double-stranded library molecules (Fig. S3). The DNA molecules resulting from these preparations (Figs. 1, 2 and 3; Fig. S4) contain: (1) outer primer-binding sites (P5 and P7) used to capture individual DNA molecules on the surface of Illumina flow cells and clonally amplify them; (2) separate primer-binding sites (Read 1 and Read 2), located internal to the P5 and P7 sites, that allow directional sequencing of both DNA strands; and (3) short DNA sequences, known as indexes (Glossary; see below), inserted into the P7 side of the adapter molecule (Illumina, 2018a; Fig. 4, i7 index, sequence obtained from Index Read 1; the i5 index was added subsequently, see below).
Indexing
Indexing strategies are generally meant to individually identify different DNA samples by incorporating unique DNA sequences into the library constructs (Shoemaker et al., 1996; Binladen et al., 2007; Glenn et al., 2007; Hoffmann et al., 2007; Meyer et al., 2007; Craig et al., 2008). Indexed libraries can then be pooled together (multiplexed) in a single sequencing lane. During sequencing, individual molecules are captured on the surface of the Illumina flow cells, the individual molecules are clonally amplified, and up to four separate sequencing reactions take place sequentially, each creating a separate sequencing read (Fig. 4). After sequencing, computer software matches the observed index sequence for each molecule to a list of samples with expected indexes (i.e., using a sample sheet; File S2) and parses the bulk data back into its component parts (i.e., demultiplexing, e.g., using bcl2fastq Illumina, 2017).
In practice, the history and current status of Illumina indexing strategies is quite complicated (e.g., Illumina, 2018a), with several transitions among different adapter systems that resulted from changing capabilities of sequencing instruments. Illumina originally created 12 different i7 indexes (Figs. 1; 3 and 4) to allow pooling of up to 12 samples, and the company later increased the number of i7 indexes for certain applications to 48. The original Illumina i7 indexes had a length of six nucleotides (nt) and were constructed such that ≥2 substitution errors were needed to turn one index into another—an effort to minimize sample confusion as a result of sequencing error. Sequencing errors on Illumina instruments are primarily substitutions; thus, Illumina’s initial indexes were designed to be robust to substitution sequencing errors. Deletions, however, are the primary errors of oligonucleotide synthesis (i.e., synthesis of the adapters and/or primers used to make the indexed libraries). It is, therefore, desirable to have indexes that are robust to insertions and deletions (indels) as well as substitutions, thus conforming to an edit-distance metric and limiting the assignment of sequences to the wrong sample (Faircloth & Glenn, 2012). When index sets have edit-distances ≥3, then error correction can be employed, but this distance criterion is frequently violated (Faircloth & Glenn, 2012).
Building upon earlier in-house and external efforts, Illumina introduced a product (Nextera kits) that used an i5 index and an i7 index (i.e., dual-indexing; see Glossary, Fig. 1, and below) each of which were longer (8 nt) and, at that time, conformed to the edit-distance metric. Nextera adapters use the same sequences for interaction with the flow-cell (i.e., P5 and P7; Fig. 1), but have unique Read 1 and Read 2 sequences relative to TruSeq (Figs. S2; S4). Thus, Illumina does not recommend combining Nextera and TruSeq libraries within a single sequencing lane (Illumina, 2012; but see below). Illumina subsequently incorporated 8 nt, dual indexes into the TruSeq system with their release of TruSeqHT. Although the Illumina TruSeqHT indexes are robust to insertion, deletion, and substitution errors, the updated TruSeqHT i7 indexes do not maintain an edit-distance ≥3, when compared to other TruSeq HT i7 indexes in the same set or when combined with all previous Illumina i7 indexes, and so do not allow proper error correction (Fig. S5; File S3). Regardless, the TruSeqHT indexing system is more robust, accurate, and flexible than previous approaches, and researchers can index template DNA molecules using the i7 indexes alone (single-indexing) or in combination with i5 indexes (dual-indexing).
Dual indexing on the Illumina platform means that indexes can be used combinatorially (Kircher, Sawyer & Meyer, 2012; Faircloth & Glenn, 2012). Major advantages of the dual-indexing strategy include: (1) the need for fewer oligonucleotides to index the same number of samples in multiplex (e.g., 8 + 12 = 20 primers produce 8 ×12 = 96 unique tag combinations); (2) concomitantly reducing the cost of production, inventory, and quality control (QC) (i.e., it is less expensive to produce, maintain stocks of, and do QC on 20 primers than 96); and (3) the universality of the approach—dual-indexing is compatible with both full-length adapters (e.g., TruSeqHT libraries) or universal adapter stubs and primers (e.g., Nextera, iNext, or iTru). As noted above, combinatorial indexing is susceptible to index hopping which results in sequences being assigned to the incorrect samples, whereas using unique sequences at multiple index positions (e.g., unique dual-indexes) significantly reduces these problems (Kircher, Sawyer & Meyer, 2012; Illumina, 2017; Costello et al., 2018).
Illumina-compatible libraries
Illumina’s libraries have been the industry’s gold standard for sequence quality on Illumina platforms, but their library preparation kits are among the most expensive available. The number of indexes offered by Illumina was limited to ≤48 and the number of dual-index combinations ≤96, until subsequent releases of additional indexes for the Nextera system, which can dual-index up to 384 samples (Illumina, 2018b). Most recently, Illumina has partnered with Integrated DNA Technologies, Inc. (IDT, Coralville, IA, USA) to develop a set of 192 (96 + 96) indexed adapters that also contain unique molecular identifiers (https://support.illumina.com/downloads/idt-illumina-truseq-ud-indexes-sample-sheet-templates.html; UMIs, Glossary) to improve multiplexing, mitigate sample misassignment due to index hopping, and detect PCR duplicates (IDT, 2018; MacConaill et al., 2018). Alternative commercial kits have been produced to increase efficiency, reduce GC bias (Aird et al., 2011; Kozerewa et al., 2009), and/or increase the number of indexes, but costs remain high and the total number of commercially available indexes still generally remains ≤384.
A variety of library preparation methods have also been described by research groups that reduce per-sample costs relative to most commercial kits (e.g., Meyer & Kircher, 2010 [MK-2010]; Fisher et al., 2011 [F-2011]; see Head et al., 2014 for others). The MK-2010 and F-2011 methods are in widespread use, but they do have some shortcomings. For example, the MK-2010 method: (1) specifies HPLC purification of adapter oligonucleotides, which increases start-up costs dramatically and can lead to contamination from previous oligonucleotides that were purified on the same HPLC columns; (2) relies on hairpin suppression of molecules with identical adapter ends (instead of using a Y-yoke adapter) which is efficient with smaller inserts (e.g., <200 bp) but loses efficiency with increasing insert length; and (3) relies on blunt-ended ligation, which allows the formation of chimeric inserts. The F-2011 method introduced the idea of “on-bead” library preparation, which increases efficiency and reduces costs; thus, many commercial kits have subsequently incorporated similar on-bead library preparation approaches. Limitations of the F-2011 method include use of: (1) custom NEB reagents, not in the standard catalog or available in small quantities; (2) large volumes of enzymes; and (3) Illumina adapters and primers, which increase costs and limit the number of samples that can be pooled.
Our approach builds upon many of the previous approaches introduced by Illumina, MK-2010, F-2011, Rohland & Reich (2012), and others to develop library preparation methods for genomic DNA that overcome many of these limitations. We describe adapters, primers, and library construction methods that produce DNA molecules equivalent to and compatible with Illumina’s TruSeqHT libraries (and, separately, Nextera libraries, see File S1; Table 2). Our method extends the number of available index combinations from 96 × 96 to 384 × 384, while maintaining a minimum edit-distance of ≥3 between all indexes. We demonstrate the effectiveness of our combinatorial indexing primers by controlled quantitative PCR experiments, and we demonstrate the utility of our system by preparing and sequencing iTru libraries from organisms with varying genome size and DNA quality.
Table 2. Comparison of Nextera, iNext, iTru, and TruSeq Nano HT library preparation methods.
Library Type | Nextera | iNext | iTru | TruSeq Nano HT |
---|---|---|---|---|
Input DNA (ng) | Intact (≥50) | Sheared (≥100b) | Sheared (≥100c) | Sheared (≥100) |
Repair ends | N/A | Yes | Yes | Yes |
Add DNA overhang | N/A | C | A | A |
Ligate adapter | Tagmentation | iNext stub | iTru stub | TruSeq |
Limited cycle PCR primers | Nextera or iNext* | Nextera or iNext | iTru | P5 and P7 |
Advantages | Least time | Lower cost, high diversity | Lower cost, high diversity | Industry standard |
Disadvantages | Higher cost, lower diversity, less randomnessb | More prep. time than Nextera | More prep. time than Nextera | Higher cost, more input DNA, more prep. time; not for sequence capture |
Notes.
Note, iNext primers are not specified as biotinylated, and thus will not work interchangeably with Nextera libraries that use streptavidin beads to capture/normalize/purify libraries unless biotins are added. Using unmodified iNext primers requires other purification and normalization procedures.
Tagmentation does not insert adapters into the genome as randomly as shearing the DNA.
Hyper Prep Plus Kits (KapaBioSciences) allow input as low as one ng of intact DNA.
Materials & Methods
Adapter and primer design
We modified the Illumina TruSeq system by dividing the adapter components into two parts: (1) a universal Y-yoke adapter “stub” that comprises parts of the Read 1 and Read 2 primer binding sites plus the Y-yoke; and (2) a set of amplification primers (iTru5, iTru7), parts of which are complementary to the Y-yoke stub and which also contain custom sequence tag(s) for sample indexing (Fig. 1; Fig. 3; Table 3; File S4) as well as the sequences (P5, P7) necessary for clonal amplification on Illumina flow cells. The iTru Y-yoke adapter has a single 5′ thymidine (T) overhang and can be used in standard library preparations that produce insert DNA with single 3′ adenosine (A) overhangs. We designed a large set of indexed amplification primers (iTru5, iTru7; File S4) that contain a subset of our custom 8 nt sequence tags (from (Faircloth & Glenn, 2012), as well as an initial set that incorporated the TruSeq HT indexes (i.e., D5xx for iTru5 and D7xx for iTru7) which could serve as controls. All iTru5 indexes are compatible with Illumina indexes. Some of the iTru7 indexes are not compatible with Illumina indexes (i.e., edit-distance is ≤2). We grouped the iTru primers with our sequence tags into clearly identifiable, numbered sets (100 and 300 series) that are compatible with 8 nt indexes in the standard Illumina TruSeqHT primers, as well as Illumina v2 8 nt indexes (including the 6 nt indexes converted to 8 nt via addition of invariant bases from the adapter). We also created several additional numbered sets (200 and 400 series) of iTru primers that are compatible with all other primers and sequence tags in our iTru system, but which are not compatible with all Illumina indexes. We then balanced the base composition of all iTru primers in all numbered sets in groups of eight for iTru5 and groups of 12 for iTru7, because balanced base composition is critical for successful index sequencing (Illumina, 2016; see Discussion for additional information on combining small numbers of libraries).
Table 3. iTru and iNext adapter stub oligonucleotides and tagged primer sequences.
iTru | |||||
---|---|---|---|---|---|
Adapter | Stub name | Stub sequence | |||
iTru_R2_stub_RCp | /5Phos/GATCGGAAGAGCACACGTCTGAACTCCAGTCAC | ||||
iTru_R1_stub | ACACTCTTTCCCTACACGACGCTCTTCCGATCT | ||||
Primer name | 5′ end | Tag sequence | 3′ end | Tag number | |
i5 | iTru5_01_A | AATGATACGGCGACCACCGAGATCTACAC | ACCGACAA | ACACTCTTTCCCTA*C | tag063 |
iTru5_01_B | AATGATACGGCGACCACCGAGATCTACAC | AGTGGCAA | ACACTCTTTCCCTA*C | tag134 | |
i7 | iTru7_01_01 | CAAGCAGAAGACGGCATACGAGAT | AGTGACCT | GTGACTGGAGTTCA*G | tag132 |
iTru7_01_02 | CAAGCAGAAGACGGCATACGAGAT | AACAGTCC | GTGACTGGAGTTCA*G | tag008 |
We ordered the components of our Y-yoke adapter stubs and iTru primers from IDT, with standard desalting purification only. We modified the adapter stub sequence by phosphorylating the 5′ end of iTru_R2_stub_RCp oligonucleotide (Fig. 1; Table 3), and we modified each of the iTru primer sequences by adding a phosphorothioate bond (Eckstein, 1985) before the 3′ nucleotide of each sequence to inhibit degradation due to the exonuclease activity of proof-reading polymerases (Skerra, 1992), which are commonly used in library preparation. Following initial small-scale orders, we ordered sets of iTru primers, placing the iTru5 and iTru7 primers into every other column (iTru5) or row (iTru7) of 96-well plates, with 0.625 or 1.25 nmol aliquots in replicate plates (Files S4–S5). We hydrated newly synthesized primers to 10 µM in the plate and 5 µM prior to use File S6). Subsequently, we ordered the complete set of 384 iTru5 and 384 iTru7 primers in 96-well plates with 1.25 nmol aliquots (Files S4–S5).
Validation of iTru primers by quantitative PCR (qPCR)
To determine whether our indexed iTru5 and iTru7 primers were biasing amplification, we selected a subset of iTru7 (n = 160) and iTru5 (n = 48) primers for qPCR validation. To validate the iTru primers, we prepared a pool of adapter-ligated chicken DNA using an inexpensive, double-digest RAD approach (3RAD; Graham et al., 2015; Bayona-Vásquez et al., 2019) that produces a DNA construct having 5′ and 3′ ends identical to our Y-yoke adapter. We then set up quantitative PCR reactions with 5 µL GoTaq qPCR Master Mix (Promega, Madison, WI, USA), 1 µL each forward and reverse primer at 5 µM, 2 µL adapter-ligated DNA at 0.12 ng/µL, and 1 µL H2O. Working under the assumption that Illumina primers have been validated as unbiased by Illumina, we tested all forward (iTru5) primers with Illumina D701 as the reverse primer, and we tested all reverse (iTru7) primers with Illumina D501 as the forward primer. We ran all primer tests in duplicate on an Applied Biosystems StepOnePlus (Thermo Fisher Scientific, Waltham, MA, USA) using the following conditions: 95 °C for 2 min, then 40 cycles of 95 °C for 15 s, and 60 °C for 1 min. Because we needed to run multiple plates of qPCR to test all of the primers, we included the iTru5 set 2 primer A (iTru5_02_A) and the iTru7 set 2 primer 1 (iTru7_02_01) on all plates to provide a baseline reference for iTru5 or iTru7 primer performance. We determined the threshold cycle (CT) using the default settings of the StepOnePlus, we averaged CT values from replicate runs, and we calculated Delta CT for each iTru primer using two approaches. First, we evaluated the relative performance of all iTru5 and iTru7 primers by subtracting the CT of the iTru5 or iTru7 primer being tested from the average CT of all iTru5 or iTru7 primers. Second, we evaluated the performance of all iTru5 and iTru7 primers by subtracting the baseline reference CT of iTru5_02_A from the CT of the iTru5 primer being tested and by subtracting the baseline reference CT of iTru7_02_01 from the CT of the iTru7 primer being tested. We expected that unbiased primers would not deviate from the average and/or baseline performance by more than 1.5 PCR cycles (>1.5 CT), a value that should encompass the stochasticity seen between independent PCR reactions as a result of small, unavoidable primer concentration and other amplification performance differences.
Implementation in E. coli and eukaryote libraries: DNA source
To test the performance of both our Y-yoke adapters and the iTru system in a variety of library preparation scenarios, we prepared genomic libraries from DNA of various types and quality. As a simple, known source of control DNA, we used Escherichia coli k-12 strain MG1655 (hereafter E. coli; Roche, Basel, Switzerland), which has a high-quality genome sequence available (GenBank accession NC_000913; 4.6 Mb) and is commonly used for quality control of sequencing libraries. To examine how our iTru system performed with DNA of varying quality and complexity, we also prepared iTru libraries from DNA that we isolated from six samples from a diverse array of species (two sharks, one tarantula, one jellyfish, and a coral). We isolated each of these DNA sources using a variety of techniques commonly used in many labs, including commercial kits, salting out, or CTAB Phenol-Chloroform extraction (Table 4; also see File S1 for additional details about testing iNext). These samples represent the range of species, sampling conditions, and DNA isolation techniques that are commonly encountered in model and non-model organism studies, and the taxa we sampled included particularly challenging specimens (i.e., tarantula, coral and jellyfish) that have previously performed poorly with commercial library preparation kits. Before library preparation, we fragmented E. coli genomic DNA to 400–600 bp using a Covaris S2 (Covaris, Woburn, MA, USA), and we fragmented genomic DNA (normalized to 23 ng/µL) to 400–600 using the Bioruptor UCD-300 sonication device (Diagenode, Denville, NJ, USA).
Table 4. Results from initial iTru library preparation and sequencing tests of DNA from sharks and challenging non-model organisms.
Sample ID | Common name | Species | DNA extraction method | i7 index ID | Raw index count | Number of read pairs | Primary objective | Usable reads | putative mtDNA contig size in bp (mean coverage) | Microsats identifiedb |
---|---|---|---|---|---|---|---|---|---|---|
MaF 5 | white shark | Carcharodon carcharias | Protocol 1 | 705 | 1,930,539 | 1,805,638 | mtDNA | 1,722,562 | 17,103 (46×)c | – |
MaF 19 | white shark | Carcharodon carcharias | Protocol 2 | 707 | 2,075,236 | 1,927,792 | mtDNA | 2,003,858 | 17,138 (31×)c | – |
MaF 10 | silky shark | Carcharhinus falciformis | Protocol 1 | 706 | 1,438,468 | 1,358,550 | mtDNA | 1,800,534 | 17,285 (22×)d | – |
MaF 1 | tarantula | Brachypelma vagans | Protocol 1 | 701 | 985,171 | 934,406 | msats | 80,790 | – | 563 |
MaF 16 | cannonball jellyfish | Stomolophus spp. | Protocol 3 | 703 | 959,516 | 909,401 | msats | 591,608 | – | 92,668 |
MaF 9 | coral | Poritespanamensis | Protocol 1 | 702 | 3,449,711 | 3,298,155 | msats | 1,549,718 | 18,628 (50×)e | 7.322 |
Total | 10,838,641 | 10,233,942 |
Notes.
Only includes high quality reads with inserts of 250 bases; excluded reads generally due to short insert length due to degraded input DNA.
Identified using default parameters in PAL-finder (Castoe et al., 2012).
Implementation in E. coli and eukaryote libraries: library construction
Prior to library preparation, we annealed the iTru adapter sequences to form double-stranded, Y-yoke adapters by mixing equal volumes of the iTru_R1_stub and iTru_R2_stub_RCp oligos at 100 µM, supplementing the mixture with 100 mM NaCl, heating the solution to 98 °C for 2 min in a thermal cycler, and allowing the thermal cycler to slowly cool the mixture to room temperature (File S7).
We prepared genomic iTru libraries from E. coli using kits, reagents, and protocols from Kapa Biosystems (Roche, Basel, Switzerland), with minor modifications to the manufacturer’s instructions. The major change we made was to ligate the universal iTru adapter stubs (Table 3; File S4) to the 3′-adenylated (i.e., +A) DNA fragments, and then use the iTru5 and iTru7 primers with TruSeqHT indexes for limited-cycle amplification (Figs. 1, 2 and 3). For the eukaryotic libraries, we further modified the manufacturer’s instructions by using half-volume reaction sizes with the following two changes. We used an inexpensive alternative to commercial SPRI reagents (Sera-Mag SpeedBeads, Thermo-Scientific, Waltham, MA, USA; see File S8) in all cleanup steps. After adapter ligation, we performed a post-ligation cleanup followed by SPRI dual-size selection using first 0.55× PEG/NaCl and then an additional 0.16×SpeedBeads which also contains PEG/NaCl. We outline step-by-step methods for this approach in File S9.
Sequencing
We quantified libraries using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA) and KAPA qPCR, checked for index diversity (File S10), and then normalized and pooled all libraries at 10 nM (File S11). We also ensured the quality of library pools by running 1 µL on a Bioanalyzer High Sensitivity chip (Agilent Technologies, Santa Clara, CA, USA). We combined the iTru and iNext E. coli library pools (File S1) with samples from other experiments, and we sequenced the combined pools using a single run in Illumina MiSeq v2 500 cycle kit (PE250). We combined the eukaryotic libraries with additional TruSeq libraries from other experiments and sequenced these on a separate run of Illumina MiSeq v2 500 cycle kit to produce PE250 reads.
Sequence analysis
After sequencing, we demultiplexed reads using Illumina software (bcl2fastq v 1.8 –2.17; (Illumina, 2017). We then imported reads to Geneious 6.1.7–R9.0.4 and trimmed adapters and low-quality bases (<Q20). We removed reads with inserts of <125 bases prior to all downstream analyses. We mapped E. coli reads back to NC_000913 using the Geneious mapper (fastest setting, single iteration). We assembled reads from the eukaryotic libraries using the Geneious assembler (fastest setting), and we extracted contigs of 250 to 450 bp from eukaryotic libraries of tarantula, jellyfish, and coral for downstream microsatellite searches using msatCommander 1.0.8 (Faircloth, 2008). We also used PAL_FINDER v0.02.03 (Castoe et al., 2012) to enumerate microsatellites within read-pairs that had inserts ≥250 bases. Finally, we extracted contigs of approximately 17 kb from the shark libraries, and we used MEGA-BLAST searches to determine which of these contigs represented shark mtDNA genomes (Díaz-Jaimes et al., 2016). We did the same with approximately 18 kb fragments from the coral (Del Rio Portilla et al., 2016).
Larger-scale tests
Following initial validation of the iTru primers and the utility of the iTru library preparation approach, we placed the iTru system into an extensive test phase in which we routinely used this approach for library construction within our own labs while we also made all components of the iTru system available to dozens of other labs. To demonstrate the utility of our approach across a variety of projects, we analyzed read count data from four of these studies (n = 576 libraries) that used the iTru system as part of a workflow for target enrichment of ultraconserved elements (UCEs; Faircloth et al., 2012). These included 90 iTru libraries prepared by our group from cichlid fishes (McGee et al., 2016), 183 iTru libraries prepared by a second group from carangimorph fishes (Harrington et al., 2016), 100 iTru libraries prepared by a third group from ants (Faircloth et al., 2015; Blaimer et al., 2016), and 203 iTru libraries prepared by our group from birds. For the bird libraries, we prepared one batch of standard Illumina libraries (n = 10) and 2 batches of iTru libraries (n = 203), which allowed us to look at sample-to-sample differences in read counts returned from standard Illumina libraries relative to our iTru libraries. One of the two batches of iTru libraries (n = 92) combined standard Illumina primers (D5xx; which we used on E. coli) on the P5 side with iTru7 primers on the P7 side. The second batch (n = 111) combined iTru5 primers on the P5 side with iTru7 primers on the P7 side. The first batch allowed us to assess iTru7 performance separate from that of iTru5, while the iTru5+iTru7 libraries allowed us to assess performance of the full iTru system relative to all other combinations. For all remaining libraries within the other projects, each group followed the protocols for iTru library preparation described above using combinations of only iTru5 and iTru7 primers.
Following library preparation and PCR amplification, each laboratory combined all libraries into equimolar pools containing 8–12 libraries and followed a standardized protocol for target enrichment of UCE loci (http://ultraconserved.org/; Faircloth et al., 2012). After enrichment, each group used a Bioanalyzer to determine the insert size of enriched libraries and, to reduce the variance in number of reads sequenced from each pool, quantified pools using a commercially available KAPA qPCR kit. Prior to sequencing, all research groups used the average fragment size distribution and qPCR concentration of each pool to produce an equimolar, project-specific pool-of-pooled-libraries for sequencing with a final concentration of 10 nM. We sequenced the enriched cichlid and carangimorph libraries using different, partial runs of PE150 sequencing on an Illumina NextSeq, the ant libraries using one lane of PE125 sequencing on an Illumina HiSeq 2500, and the bird libraries using two lanes of PE150 sequencing on an Illumina HiSeq 1500 (Rapid Run Mode). For the carangimorph fish libraries, we wanted each sample to receive 0.5% of the total number of reads in the NextSeq run. For all other libraries, we wanted each library to receive 1% of the total number of reads. After sequencing, we computed the average number of raw reads returned per sample, the 95% confidence interval (95 CI) of reads returned per sample, and the percentage of reads returned per sample.
Results
Validation of iTru primers by quantitative PCR (qPCR)
Almost all iTru primers (158/160 iTru7 and 48/48 iTru5) had average CT values within 1.5 cycles of both the average Δ CT and the baseline Δ CT (Fig. S8; File S12), suggesting that our iTru indexed amplification primers amplify successfully (98.7% success for iTru7; 100% success for iTru5) and perform similarly to one another. There were two iTru7 primers that failed to amplify during their initial tests, iTru7_401_07 and iTru7_209_04. We rehydrated a new plate of primers and retested iTru7_401_07, which amplified normally (CT = 19.4, Δ CT (average) = − 0.7; Δ CT (baseline) = 1.1) during the retest.
E. coli iTru libraries
The iTru libraries we prepared from E. coli returned similar numbers of reads from each iTru library, averaging 973,008 reads per sample (95 CI: 161,044; Fig. S9; File S13). Each library contained >400,000 high quality reads that covered >99.99% of the known E. coli genome sequence. These results suggest that our genomic iTru library preparation process produces valid constructs for Illumina sequencing, and that iTru dual-indexed libraries pooled at equimolar ratios return roughly similar amounts of sequence data (Fig. S9), although we combined libraries at equimolar ratios prior to sequencing using fluorometry which can result in some variation around the targeted read number for each library.
Eukaryote iTru libraries
We successfully sequenced all eukaryotic genomic libraries prepared using the iTru system and the libraries returned an average of 1,806,440 reads per sample (95 CI: 743,337; Table 4). Using a genome skimming approach, we sequenced the mitogenomes of the shark and coral samples to an average coverage of 33× and 50×, respectively. We used the contig assemblies from our tarantula, jellyfish, and coral samples to design primer pairs targeting >100 microsatellite loci in each taxon. Although the variance in the number of sequencing reads returned per library was higher among these samples than the E. coli libraries, these results demonstrate that the iTru system can be used to prepare libraries from DNA of different organisms extracted using different purification approaches, including DNA that produced very poor results with commercial kits (data not shown).
Larger-scale tests
Our beta test allowed us to collect sequence data from many different iTru5 and iTru7 primers used to index a variety of iTru libraries from fishes (McGee et al., 2016; Harrington et al., 2016), ants (Faircloth et al., 2015; Blaimer et al., 2016), and birds. Few of the libraries that we or others prepared using the iTru system showed large differences in the desired number of reads sequenced when compared to libraries having Illumina-only adapters/index sequences when viewed in aggregate (Fig. S10) or on an index-by-index basis across projects (Figs. S11–S14; File S14). The iTru primer combinations that sometimes returned a lower number of reads for a particular library in a particular project did not show this behavior in other studies (e.g., compare iTru7_402_07 in Fig. S13 versus Fig. S14), suggesting that the reduction in read numbers results from particular library preparation, pooling, enrichment, and quantification practices for specific samples (i.e., specific experimental errors, library preparation methods, or sample-index interactions) rather than inherently bad iTru indexes/primers.
Discussion
Our results show that the iTru universal adapter stubs and iTru primers can be used to produce genomic libraries for a variety of purposes. The low variance in CT values among iTru5 and iTru7 primers demonstrates that the different index sequences have minimal effect on the libraries, and our results from real-world tests demonstrate that the iTru system works well with DNA from different extraction methods and of differing quality, quantity, and copy number. The results we present from DNA libraries prepared using the iTru system in our and others’ laboratories show that the approach easily scales to hundreds of libraries prepared, pooled, and sequenced in a single lane, ultimately producing information consistent with the variety of Illumina library techniques we have employed to obtain similar data (e.g., Crawford et al., 2012; McCormack et al., 2013; Smith et al., 2014; Kieran et al., 2019).
After testing the iTru system in several labs, we made several changes in our approach. The most significant of these were: (1) use a naming scheme that allows researchers to easily identify sets of iTru7 primers that are compatible or incompatible with TruSeq indexes; and (2) to increase the amount of iTru5 and iTru7 aliquoted into plates after oligo synthesis (from 0.625 nmol to 1.25 nmol), which reduced library amplification failures that resulted from improper hydration of low-quantity primers in specific wells of plates. The naming scheme and concentrations used in all supplemental files and the naming scheme we used in the Methods section reflect these changes to minimize confusion. After making these changes, we and others have successfully produced libraries and sequencing reads from all iTru5 and iTru7 primers, libraries for many of the primers are detailed in the supplemental files, and we have no evidence suggesting that any of the primer sequences will not work correctly. The original sets of iTru7 primers (sets 00–13) synthesized for beta testing have mixed compatibility with Illumina indexes, thus we encourage beta users to exhaust old stocks and adopt the new sets.
It is important to note that the iTru5 and iTru7 primers are grouped into “balanced” sets of 8 or 12 to minimize problems of index base diversity during sequencing. Index balance problems arise because of the way Illumina platforms detect bases during the sequencing run (Illumina, 2016), and the main issues associated with unbalanced base composition are experienced when relatively few samples are sequenced or when a small number of libraries with unbalanced sequence tags take up a large fraction of the sequencing run. We modeled the original four color-scheme used in HiSeq and MiSeq instruments. Using an entire group of eight iTru5 and 12 iTru7 indexed primers within a sequencing pool where each library is present in equal proportion ensures balanced base representation during the index sequence read(s). We also empirically validated this in the two-channel system used in NextSeq, MiniSeq and NovaSeq platforms. Generally, when researchers multiplex more than one group of eight iTru5 or 12 iTru7 indexed primers, base diversity is even more balanced, although it is always a good idea to check the balance of sequencing tags in all sequencing runs (i.e., use File S10). When less than a whole set of primers (i.e., <8 iTru5 primers or <12 iTru7 primers) are used, or if very few libraries will dominate the percentage of reads within a run, it becomes critical to ensure the tags are sufficiently diverse (i.e., use File S10, which includes separate calculations of base diversity for both color schemes). It is also possible to use the stub ligation products from one sample for multiple PCR reactions with different iTru5, iTru7 primers, or even to pool iTru5 and iTru7 primers, thus creating increased numbers of indexes in a pool from a limited number of samples.
All of the iTru oligonucleotides make use of a single phosphorothioate bond between the penultimate and 3′ base. Phosphorothioate linkages protect the 3′ end of oligonucleotides from some forms of nuclease activity (Eckstein, 1985; Skerra, 1992) such as those introduced by some DNA ligases and polymerases (exonuclease activity is a common contaminant of ligases and an intrinsic activity of proofreading polymerases), but phosphorothioate linkages add a modest cost to each primer (∼$3 USD per phosphorothioate linkage). Phosphorothioate linkages are also chiral, so only 50% of synthetic molecules receive protection per linkage, while the other 50% remain susceptible to nuclease activity (Eckstein, 1985). Adding a second phosphorothioate bond can reduce the proportion of unprotected molecules by 50% (thus 75% would be protected and 25% would remain susceptible). Illumina and other vendors often include three or more phosphorothioate linkages at the 3′ end of their oligonucleotides to ensure that a large fraction of the molecules are protected from nuclease activity. We include only a single phosphorothioate linkage in our iTru oligo designs because if we lose the 3′ base, we would rather lose the rest of the molecule instead of rescuing the remaining part of it, which may not function appropriately. This strategy also reduces costs associated with synthesizing the oligonucleotides, although others may prefer to incorporate additional phosphorothioate linkages (e.g., two phosphorotioate linkages would lead to 50% fully protected oligonucleotides and 25% that only lose a single 3′ base).
Who should adopt this method?
Today, there is great need to efficiently minimize cost per sample by scaling and increasing multiplexing flexibility, especially with the advent of platforms like the NovaSeq 6000 that can yield up to 3000 Gb in a single run. Researchers who need higher capacity to multiplex their Illumina library preparations or who have not yet invested heavily in any other method will likely find our approach attractive. It has a low cost of entry and significant flexibility (see below). The more types of libraries, projects, and samples researchers use, the quicker they will recoup the cost of switching and see savings. Additionally, researchers using MK-2010 to construct libraries with inserts >200 bp, particularly inserts ≥400 bp, are likely to benefit from using a Y-yoke adapter (though the upper size-limit of specific Illumina instruments and kits should be kept in mind). Our dual-indexed iTru/iNext libraries also reduce concerns over misassignment because, although index-switching occurs with low probability at both ends of sequences in a library, it rarely affects both ends of the same fragment (Larsson et al., 2018).
Researchers already invested in and using other methods with good success, such as the MK-2010 or F-2011 approaches, may wonder if it is worthwhile to switch. We suggest that it would be reasonable to continue using the MK-2010 and/or F-2011 methods if these are already being used successfully; for these labs, we simply provide some alternative adapters and primers that could be used once existing stocks of MK-2010 and/or F-2011 adapters and primers are exhausted or when new projects requiring unique or larger numbers of uniquely indexed samples are encountered.
iNext
In addition to the iTru adapters and primers we designed and tested, we have developed a universal adapter stub and sets of primers (iNext; File S1) that are compatible with the Illumina Nextera system and the original 8 ×12 Nextera indexes, though they are not compatible with all of the subsequent Nextera indexes. As noted in the methods, both iNext and iTru make use of slightly different subsets of the tags identified by Faircloth & Glenn (2012), and the indexed primer sets and numbering approaches are independent between iNext and iTru (e.g., iNext5_01_A does not have the same sequence tag as iTru5_01_A). Thus, researchers should use the tag sequence or tag number from Faircloth & Glenn (2012) or the tag sequences themselves to determine which indexes are equivalent (e.g., iNext7_07_06 uses tag 113 [AGCTAAGC] as does iTru7_203_10; these should not be combined into a single sequencing pool). Although we demonstrate it is possible to combine iNext and iTru libraries within the same MiSeq run (File S13; the iNext and iTru E. coli data come from a single MiSeq run, which works because TruSeq and Nextera sequencing primers are factory mixed together in those kits) and have subsequently added iNext or Nextera libraries in limited quantities to several of our iTru library pools run on the MiSeq, we are skeptical that other researchers should or will do this routinely. If researchers want to combine iNext and iTru libraries on a regular basis, it would be worthwhile to run additional experiments and to screen and sort the tags to compile sets with numbering that is consistent, thus facilitating pooling between the two systems.
Troubleshooting
Although all researchers endeavor to conduct mistake-free experiments, foul-ups are certain to occur. In addition to simple record-keeping errors, a very common mistake is flipping the orientation of one of the strip tubes containing iTru primer aliquots. Thus, it is critical to have the capacity to quickly and easily determine what index sequences and combinations are present within a sequencing run. We have developed a small and fast python program (File S15) that can count the indexes within a file of reads that were not assigned to specific samples during demultiplexing (i.e., the undetermined reads from bcl2fastq).
Other applications and future modifications
It is possible to use the iTru system for a variety purposes beyond what we describe here. For example, we have used the iTru system for making RNAseq libraries using KAPA library kits, as well as NEB Ultra II and Ultra II FS (New England Biolabs, Ipswich, MA, USA). Nearly any approach that yields double-stranded template molecules with a single adenosine can be used with no significant modifications to what we have described. One of the attractive features of our system is that it separates the primers and stubs into more manageable units. In other Adapterama papers, we use these same iTru primers with different adapter stubs to construct double- to quadruple-indexed amplicon libraries (Glenn et al., 2019), double-digest restriction-site associated DNA (3RAD; Bayona-Vásquez et al., 2019), and RADcap (Hoffberg et al., 2016) libraries. All of these extensions facilitate library preparation, sequencing, and bioinformatic processing of these types of data while also significantly reducing costs.
Having separate primers and adapter stubs simplifies and reduces costs associated with modification or swapping out of the universal Y-yoke adapters (Table 3; Files S4; S16), creating opportunities for further research and protocol development. For example, if researchers wanted to optimize library preparation for low levels of input DNA, then implementing an adapter stub in a stem-loop configuration (e.g., NEB Next Ultra; (New England Biolabs, Ipswich, MA, USA)) would be worth investigating. Similarly, adapters containing uracils that are broken at the uracil sites by USER (NEB M5505) or uracil-DNA-glycosylase (UDG; e.g., NEB M0280) plus APE 1 (e.g., NEB M0282) facilitate a variety of designs with potentially beneficial characteristics worth exploring, especially for mate-pair libraries. However, given recent advances in commercial kits that reduce buffer exchanges and increase efficiency (e.g., KAPA Hyper and HyperPlus and NEB Ultra II and UltraII FS, which require as little as 1 ng of input DNA), it is likely that the use of such high efficiency approaches combined with the iTru adapters and primers will be sufficient for the vast majority of applications where samples derive from ≥1,000 eukaryotic cells.
Conclusions
We describe an approach that uses a single universal adapter stub and relatively few PCR primers to produce many Illumina libraries. The approach allows multiple researchers to have unique primer sets so that libraries from individual researchers can be pooled without worrying about tag overlap. These primers can also be used with a variety of other application-specific adapters described in subsequent Adapterama papers for amplicon and RADseq libraries (Bayona-Vásquez et al., 2019; Glenn et al., 2019; Hoffberg et al., 2016). By modularizing library construction, researchers are free to focus on the development of new application-specific tags. Taking advantage of the many available tags also creates opportunities for low-cost experimental optimization attempts. Although the adapters and primers we describe are specific to Illumina, many of the ideas can easily be extended to Ion Torrent, Pacific Biosystems, Oxford Nanopore, and other sequencing platforms (Glenn et al., 2007).
Supplemental Information
Acknowledgments
Oligonucleotide sequences ©2007-2019 Illumina, Inc. All rights reserved. Derivative works created by Illumina customers are authorized for use with Illumina instruments and products only. All other uses are strictly prohibited. We thank Rigoberto Delgado Vega, Yann Henaut, Salima Machkour M’Rabet, Fausto Valenzuela, Eduardo Balart, David Paz, Carolina Galvan, Liza Gomez Daglio, and Pindaro Díaz Jaimes for providing samples, Erin Lipp for generously sharing laboratory space and equipment, Rahat Desai, Megan Beaudry, Julia Frederick and Will Thompson for helpful edits, and our colleagues at the Georgia Genomics and Bioinformatics Core and the Georgia Advanced Computing Resource Center. We thank Richard Harrington, Matt Friedman, and Thomas Near (carangimorph fishes), and Michael Branstetter, John Longino, and Phil Ward (ants) for allowing us to use read-count information from their respective studies. Finally, we acknowledge and thank Lisa Ortuno (deceased) for her enthusiastic support of this work; our world was enriched while she shared it with us. The information we present allows all researchers to synthesize the oligonucleotides at any vendor of their choice, follow or modify the library preparation techniques we have included, and freely publish results simply with proper attribution of this paper and Illumina®™.
Glossary
- adapters
oligonucleotides of known sequence that are ligated onto the ends of nucleic acids for the purpose of further manipulation or NGS library construction. In this paper we will make use of double-stranded DNA adapter stubs (see below).
- barcodes
see index or tag; this term is also used to mean a DNA sequence that can be used to identify the taxon from which a sample derives, thus we avoid using this ambiguous term.
- cluster
a group of molecules on an Illumina flow cell that have been clonally amplified via bridge PCR or newer approaches (i.e., all molecules in a cluster are replicates of a single starting molecule from an Illumina library).
- demultiplex
to separate pooled (multiplexed) sample information into their constituent parts (i.e., assign reads to specific samples)
- identifying sequences
see index or tag.
- index or tag
a short, unique sequence of DNA added to samples so they can be pooled and sequenced in parallel, with each resulting sequence containing information to identify the source sample or sometimes a pool of samples. Some authors and companies refer to such sequences as barcodes or molecular identifiers (MIDs). We use “Illumina index” when referring to specific sequences designed by Illumina, “tag” when specifically referring to sequences from Faircloth & Glenn (2012), and “index” when generically referring to identifying sequences in adapters and primers compatible with Illumina instruments.
- Index Read 1
the DNA sequence obtained from the 2nd Illumina sequencing reaction, yielding the i7 index sequence, which is placed into the header of Read 1 and Read 2 (if present)
- Index Read 2
the DNA sequence obtained from the 3rd Illumina sequencing reaction, yielding the i5 index sequence, which is placed into the header of Read 1 and Read 2 (if present)
- i5 index
the second indexing position introduced by Illumina, obtained by index Read 2, which is the 3rd read of a cluster made by Illumina instruments.
- i7 index
the original indexing position used in Illumina sequencing, obtained by index Read 1, which is the 2nd read of a cluster made by Illumina instruments.
- iNext
dual-index library preparation methods presented herein that are compatible with Illumina Nextera libraries.
- iTru
dual-index library preparation methods presented herein that are compatible with Illumina TruSeq libraries.
- library
a population of molecules with adapters on each end of each molecule to facilitate sequencing.
- MID
molecular identifier, term commonly used with 454 sequencing; see index or tag.
- multiplex
to combine samples together for further process or sequencing all at once.
- P5
an engineered DNA sequence that is: (1) incorporated into adapters of Illumina libraries for bulk amplification of library molecules and (2) manufactured as oligonucleotides grafted onto the surface of Illumina flow cells and used for clonal amplification of library molecules, and priming the 3rd sequencing reaction on MiSeq and HiSeq ≤2500 instruments.
- P7
an engineered DNA sequence that is: (1) incorporated into adapters of Illumina libraries for bulk amplification of library molecules and (2) manufactured as oligonucleotides grafted onto the surface of Illumina flow cells and used for clonal amplification of library molecules.
- paired-end reads
DNA sequences obtained from sequencing each strand of DNA templates within clusters (see Fig. 4).
- primers
single-stranded oligonucleotides used to initiate strand elongation for sequencing or amplification
- Read 1
the DNA sequence obtained from the 1st Illumina sequencing reaction, obtained as a fastq file with headers that contain data from indexing reads 1 and 2.
- Read 2
the DNA sequence obtained from the 4th Illumina sequencing reaction, obtained as a fastq file with headers that contain data from indexing reads 1 and 2.
- sequence diversity
the base composition of nucleotides across all clusters being sequenced at any given base position. Illumina sequencing requires sequence diversity for successful determination of a base call.
- stubs
short universal adapters that are formed by annealing two complimentary oligonucleotides together (Illumina Read1 and Read2 sequences) and attaching that double-stranded product to template DNA via ligation. In the iTru strategy, y-yoke adapter stubs are comprised of oligonucleotides with the Read1 and Read2 sequences.
- TruSeq
trademark associated with many Illumina sequencing products
- UMI
Unique Molecular Indices, similar to tag, is a short sequence to uniquely tag each molecule in a given sample library.
- y-yoke
an adapter that is formed from two oligonucleotides that are complementary on only one end to form a product that is double-stranded at one end, but single-stranded at the other end.
Funding Statement
This work was supported by: DEB-1242241, DEB-1242260, Dimensions of Biodiversity DEB-1136626, DEB-1146440, Graduate Research Fellowships DGE-0903734 and DGE-1452154, and Partnerships for International Research and Education (PIRE) OISE 0730218 from the U.S. National Science Foundation; CICESE (internal project O0F092), SAGARPA-FIRCO (grant RGA-BCS-12-000003); UCMEXUS (grant CN-13-617); and CB-CONACYT (grants 106925 and 157993). There was no additional external funding received for this study. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Contributor Information
Travis C. Glenn, Email: travisg@uga.edu.
Brant C. Faircloth, Email: brant@faircloth-lab.org.
Additional Information and Declarations
Competing Interests
Brant C. Faircloth is an Academic Editor for PeerJ.
The EHS DNA lab (baddna.uga.edu) and Georgia Genomics Bioinformatics Core (dna.uga.edu) provide oligonucleotide aliquots and library preparation services at cost, including some oligonucleotides and services that make use of the adapters and primers presented in this manuscript.
Author Contributions
Travis C. Glenn conceived and designed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Roger A. Nilsen conceived and designed the experiments, performed the experiments, analyzed the data, approved the final draft.
Troy J. Kieran and Todd W. Pierson conceived and designed the experiments, performed the experiments, analyzed the data, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Jon G. Sanders performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, approved the final draft.
Natalia J. Bayona-Vásquez performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
John W. Finger and Kerin E. Bentley performed the experiments, prepared figures and/or tables, approved the final draft.
Sandra L Hoffberg performed the experiments, analyzed the data, prepared figures and/or tables, approved the final draft.
Swarnali Louha analyzed the data, approved the final draft.
Francisco J. Garcia-De Leon, Miguel Angel del Rio Portilla, Kurt D. Reed, Jennifer K. Meece, Samuel E. Aggrey, Romdhane Rekaya, Magdy Alabady, Myriam Belanger and Kevin Winker contributed reagents/materials/analysis tools, approved the final draft.
Jennifer L. Anderson performed the experiments, contributed reagents/materials/analysis tools, approved the final draft.
Brant C. Faircloth conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, authored or reviewed drafts of the paper, approved the final draft.
Animal Ethics
The following information was supplied relating to ethical approvals (i.e., approving body and any reference numbers):
None of the samples were collected for this study. All samples discussed in this paper (except the E. coli) were collected for and used in the papers cited within this manuscript. Use of the sequences for this work is ancillary to their initial purpose, thus it doesn’t seem appropriate to suggest otherwise by including this information here - we simply cite the relevant studies.
DNA Deposition
The following information was supplied regarding the deposition of DNA sequences:
Data is available at NCBI SRA. The BioProject ID is PRJNA541504.
Data Availability
The following information was supplied regarding data availability:
Summary tables of data are provided as in Files S12, S13, and S14.
The raw data has been published for other purposes and references to it have been provided. These include data related to fishes (McGee et al., 2016; Harrington et al., 2016), and ants (Faircloth et al., 2015; Blaimer et al., 2016). E. coli data are deposited in SRA under the BioProject ID PRJNA541504.
References
- Adey et al. (2010).Adey A, Morrison HG, Asan, Xun X, Kitzman JO, Turner EH, Stackhouse B, MacKenzie AP, Caruccio NC, Zhang X, Shendure J. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biology. 2010;11 doi: 10.1186/gb-2010-11-12-r119. Article R119. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aird et al. (2011).Aird D, Ross MG, Chen WS, Danielsson M, Fennell T, Russ C, Jaffe DB, Nusbaum C, Gnirke A. Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biology. 2011;12 doi: 10.1186/gb-2011-12-2-r18. Article R18. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Aljanabi & Martínez (1997).Aljanabi SM, Martínez I. Universal and rapid salt-extraction of high quality genomic DNA for PCR-based techniques. Nucleic Acids Research. 1997;25:4692–4693. doi: 10.1093/nar/25.22.4692. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ansorge (2009).Ansorge WJ. Next-generation DNA sequencing techniques. Nature Biotechnology. 2009;25:195–203. doi: 10.1016/j.nbt.2008.12.009. [DOI] [PubMed] [Google Scholar]
- Bayona-Vásquez et al. (2019).Bayona-Vásquez NJ, Glenn TC, Kieran TJ, Pierson TW, Hoffberg SL, Scott PA, Bentley KE, Finger Jr JW, Louha S, Troendle N, Diaz-Jaimes P, Mauricio R, Faircloth BC. Adapterama III: quadruple-indexed, double/triple-enzyme RADseq libraries (2RAD/3RAD) PeerJ. 2019;7:e7724. doi: 10.7717/peerj.7724. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Bentley et al. (2008).Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, Boutell JM, Bryant J, Carter RJ, Keira Cheetham R, Cox AJ, Ellis DJ, Flatbush MR, Gormley NA, Humphray SJ, Irving LJ, Karbelashvili MS, Kirk SM, Li H, Liu X, Maisinger KS, Murray LJ, Obradovic B, Ost T, Parkinson ML, Pratt MR, Rasolonjatovo IM, Reed MT, Rigatti R, Rodighiero C, Ross MT, Sabot A, Sankar SV, Scally A, Schroth GP, Smith ME, Smith VP, Spiridou A, Torrance PE, Tzonev SS, Vermaas EH, Walter K, Wu X, Zhang L, Alam MD, Anastasi C, Aniebo IC, Bailey DM, Bancarz IR, Banerjee S, Barbour SG, Baybayan PA, Benoit VA, Benson KF, Bevis C, Black PJ, Boodhun A, Brennan JS, Bridgham JA, Brown RC, Brown AA, Buermann DH, Bundu AA, Burrows JC, Carter NP, Castillo N, Chiara E Catenazzi M, Chang S, Neil Cooley R, Crake NR, Dada OO, Diakoumakos KD, Dominguez-Fernandez B, Earnshaw DJ, Egbujor UC, Elmore DW, Etchin SS, Ewan MR, Fedurco M, Fraser LJ, Fuentes Fajardo KV, Scott Furey W, George D, Gietzen KJ, Goddard CP, Golda GS, Granieri PA, Green DE, Gustafson DL, Hansen NF, Harnish K, Haudenschild CD, Heyer NI, Hims MM, Ho JT, Horgan AM, Hoschler K, Hurwitz S, Ivanov DV, Johnson MQ, James T, Huw Jones TA, Kang GD, Kerelska TH, Kersey AD, Khrebtukova I, Kindwall AP, Kingsbury Z, Kokko-Gonzales PI, Kumar A, Laurent MA, Lawley CT, Lee SE, Lee X, Liao AK, Loch JA, Lok M, Luo S, Mammen RM, Martin JW, McCauley PG, McNitt P, Mehta P, Moon KW, Mullens JW, Newington T, Ning Z, Ling Ng B, Novo SM, O’Neill MJ, Osborne MA, Osnowski A, Ostadan O, Paraschos LL, Pickering L, Pike AC, Pike AC, Chris Pinkard D, Pliskin DP, Podhasky J, Quijano VJ, Raczy C, Rae VH, Rawlings SR, Chiva Rodriguez A, Roe PM, Rogers J, Rogert Bacigalupo MC, Romanov N, Romieu A, Roth RK, Rourke NJ, Ruediger ST, Rusman E, Sanches-Kuiper RM, Schenker MR, Seoane JM, Shaw RJ, Shiver MK, Short SW, Sizto NL, Sluis JP, Smith MA, Ernest Sohna Sohna J, Spence EJ, Stevens K, Sutton N, Szajkowski L, Tregidgo CL, Turcatti G, Vandevondele S, Verhovsky Y, Virk SM, Wakelin S, Walcott GC, Wang J, Worsley GJ, Yan J, Yau L, Zuerlein M, Rogers J, Mullikin JC, Hurles ME, McCooke NJ, West JS, Oaks FL, Lundberg PL, Klenerman D, Durbin R, Smith AJ. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53–59. doi: 10.1038/nature07517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Binladen et al. (2007).Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLOS ONE. 2007;2:e197. doi: 10.1371/journal.pone.0000197. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Blaimer et al. (2016).Blaimer BB, LaPolla JS, Branstetter MG, Lloyd MW, Brady SG. Phylogenomics, biogeography and diversification of obligate mealybug-tending ants in the genus Acropyga. Molecular Phylogenetics and Evolution. 2016;102:20–29. doi: 10.1016/j.ympev.2016.05.030. [DOI] [PubMed] [Google Scholar]
- Castoe et al. (2012).Castoe TA, Poole AW, De Koning APJ, Jones KL, Tomback DF, Oyler-McCance SJ, Fike JA, Lance SL, Streicher JW, Smith EN, Pollock DD. Rapid microsatellite identification from Illumina paired-end genomic sequencing in two birds and a snake. PLOS ONE. 2012;7:e30953. doi: 10.1371/journal.pone.0030953. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Costello et al. (2018).Costello M, Fleharty M, Abreu J, Farjoun Y, Ferriera S, Holmes L, Granger B, Green L, Howd T, Mason T, Vicente G, Dasilva M, Brodeur W, DeSmet T, Dodge S, Lennon NJ, Gabriel S. Characterization and remediation of sample index swaps by non-redundant dual indexing on massively parallel sequencing platforms. BMC Genomics. 2018;19:332. doi: 10.1186/s12864-018-4703-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Craig et al. (2008).Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, Corneveaux JJ, Pawlowski TL, Laub T, Nunn G, Stephan DA, Homer N, Huentelman MJ. Identification of genetic variants using bar-coded multiplex sequencing. Nature Methods. 2008;5:887–893. doi: 10.1038/nmeth.1251. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Crawford et al. (2012).Crawford NG, Faircloth BC, McCormack JE, Brumfield RT, Winker K, Glenn TC. More than 1000 ultraconserved elements provide evidence that turtles are the sister group of archosaurs. Biology Letters. 2012;8:783–786. doi: 10.1098/rsbl.2012.0331. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Del Rio Portilla et al. (2016).Del Rio Portilla MA, Vargas Peralta CE, Paz García DA, Lafarga De La Cruz F, Balart Páez E, García De León FJ. The complete mitochondrial DNA of endemic Eastern Pacific coral (Porites panamensis) Mitochondrial DNA. 2016;27:738–739. doi: 10.3109/19401736.2014.913166. [DOI] [PubMed] [Google Scholar]
- Díaz-Jaimes et al. (2016).Díaz-Jaimes P, Hinojosa-Alvarez S, Sánchez-Hernández X, Hoyos-Padilla M, García-De-León FJ. The complete mitochondrial DNA of white shark (Carcharodon carcharias) from Isla Guadalupe, Mexico. Mitochondrial DNA. 2016;27:1281–1282. doi: 10.3109/19401736.2014.945556. [DOI] [PubMed] [Google Scholar]
- Eckstein (1985).Eckstein F. Nucleoside phosphorothioates. Annual Review of Biochemistry. 1985;54:367–402. doi: 10.1146/annurev.bi.54.070185.002055. [DOI] [PubMed] [Google Scholar]
- Faircloth (2008).Faircloth BC. MSATCOMMANDER: detection of microsatellite repeat arrays and automated, locus-specific primer design. Molecular Ecology Resources. 2008;8:92–94. doi: 10.1111/j.1471-8286.2007.01884.x. [DOI] [PubMed] [Google Scholar]
- Faircloth et al. (2015).Faircloth BC, Branstetter MG, White ND, Brady SG. Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera. Molecular Ecology Resources. 2015;15:489–501. doi: 10.1111/1755-0998.12328. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Faircloth & Glenn (2012).Faircloth BC, Glenn TC. Not all sequence tags are created equal: designing and validating sequence identification tags robust to indels. PLOS ONE. 2012;7:e42543. doi: 10.1371/journal.pone.0042543. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Faircloth et al. (2012).Faircloth BC, McCormack JE, Crawford NG, Harvey MG, Brumfield RT, Glenn TC. Ultraconserved elements anchor thousands of genetic markers for target enrichment spanning multiple evolutionary timescales. Systematic Biology. 2012;61:717–26. doi: 10.1093/sysbio/sys004. [DOI] [PubMed] [Google Scholar]
- Fisher et al. (2011).Fisher S, Barry A, Abreu J, Minie B, Nolan J, Delorey TM, Young G, Fennell TJ, Allen A, Ambrogio L, Berlin AM, Blumenstiel B, Cibulskis K, Friedrich D, Johnson R, Juhn F, Reilly B, Shammas R, Stalker J, Sykes SM, Thompson J, Walsh J, Zimmer A, Zwirko Z, Gabriel S, Nicol R, Nusbaum C. A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries. Genome Biology. 2011;12 doi: 10.1186/gb-2011-12-1-r1. Article R1. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Galván-Tirado et al. (2016).Galván-Tirado C, Hinojosa-Alvarez S, Diaz-Jaimes P, Marcet-Houben M, García-De-León FJ. The complete mitochondrial DNA of the silky shark (Carcharhinus falciformis) Mitochondrial DNA. 2016;27:157–158. doi: 10.3109/19401736.2013.878922. [DOI] [PubMed] [Google Scholar]
- Glenn (2011).Glenn TC. Field guide to next-generation DNA sequencers. Molecular Ecology Resources. 2011;11:759–769. doi: 10.1111/j.1755-0998.2011.03024.x. [DOI] [PubMed] [Google Scholar]
- Glenn (2016).Glenn TC. Update to the field guide to next-generation DNA sequencers. 2016. http://www.molecularecologist.com/next-gen-fieldguide-2016/ [15 April 2016]. http://www.molecularecologist.com/next-gen-fieldguide-2016/ [DOI] [PubMed]
- Glenn et al. (2007).Glenn TC, Jones KL, Lance SL, Szalai GJ, Felder MR. Source tagging and normalization of DNA for parallel DNA sequencing. 2007. US Provisional Patent 60/909, 010 filed March 30, 2007.
- Glenn et al. (2019).Glenn TC, Pierson TW, Bayona-Vásquez NJ, Kieran TJ, Hoffberg SL, Thomas IVJC, Lefever DE, Finger Jr JW, Gao B, Bian X, Louha S, Kolli RT, Bentley K, Rushmore J, Wong K, Shaw TI, Rothrock Jr MJ, McKee AM, Guo TL, Mauricio R, Molina M, Cummings BS, Lash LH, Lu K, Gilbert GS, Hubbell SP, Faircloth BC. Adapterama II: universal amplicon sequencing on Illumina platforms (TaggiMatrix) PeerJ. 2019;7:e7786. doi: 10.7717/peerj.7786. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Goodwin, McPherson & McCombie (2016).Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics. 2016;17:333–351. doi: 10.1038/nrg.2016.49. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Graham et al. (2015).Graham CF, Glenn TC, McArthur A, Boreham DR, Kieran T, Lance S, Manzon RG, Martino JA, Pierson T, Rogers SM, Wilson JY, Somers CM. Impacts of degraded DNA on restriction enzyme associated DNA sequencing (RADSeq) Molecular Ecology Resources. 2015;15:1304–1315. doi: 10.1111/1755-0998.12404. [DOI] [PubMed] [Google Scholar]
- Greigite (2009).Greigite Adapter illustration PDF. 2009. http://seqanswers.com/forums/showpost.php?p=7629&postcount=36. [25 February 2019]. http://seqanswers.com/forums/showpost.php?p=7629&postcount=36
- Harrington et al. (2016).Harrington RC, Faircloth BC, Eytan RI, Smith WL, Near TJ, Alfaro ME, Friedman M. Phylogenomic analysis of carangimorph fishes reveals flatfish asymmetry arose in a blink of the evolutionary eye. BMC Evolutionary Biology. 2016;16:224. doi: 10.1186/s12862-016-0786-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Head et al. (2014).Head SR, Komori HK, LaMere SA, Whisenant T, Nieuwerburgh F, Salomon DR, Ordoukhanian P. Library construction for next-generation sequencing: overviews and challenges. BioTechniques. 2014;56:61–77. doi: 10.2144/000114133. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hoffberg et al. (2016).Hoffberg SL, Kieran TJ, Catchen JM, Devault A, Faircloth BC, Mauricio R, Glenn TC. RADcap: sequence capture of dual-digest RADseq libraries with identifiable duplicates and reduced missing data. Molecular Ecology Resources. 2016;16:1264–1278. doi: 10.1111/1755-0998.12566. [DOI] [PubMed] [Google Scholar]
- Hoffmann et al. (2007).Hoffmann C, Minkah N, Leipzig J, Wang G, Arens MQ, Tebas P, Bushman FD. DNA bar coding and pyrosequencing to identify rare HIV drug resistance mutations. Nucleic Acids Research. 2007;13:e91. doi: 10.1093/nar/gkm435. [DOI] [PMC free article] [PubMed] [Google Scholar]
- IDT (2018).IDT Adapters for next generation sequencing. 2018. http://idtdna.com/pages/products/next-generation-sequencing/adapters. [2 August 2018]. http://idtdna.com/pages/products/next-generation-sequencing/adapters
- Illumina (2018a).Illumina Multiplexed sequencing with the Illumina Genome Analyzer system. 2018a. [28 April 2016]. Pub. No. 770-2008-011. http://www.illumina.com/documents/products/datasheets/datasheet_sequencing_multiplex.pdf .
- Illumina (2012).Illumina Frequently asked questions; Sequencing; Can I run TruSeq HT libraries with Nextera libraries on the same lane or flow cell? 2012. http://support.illumina.com/sequencing/sequencing_kits/truseq_dna_ht_sample_prep_kit/questions.ilmn. [28 April 2016]. http://support.illumina.com/sequencing/sequencing_kits/truseq_dna_ht_sample_prep_kit/questions.ilmn
- Illumina (2016).Illumina Nextera low plex pooling guidelines. 2016. [28 April 2016]. Pub. No. 770-2011-044, current as of 24 March 2016. http://www.illumina.com/documents/products/technotes/technote_nextera_low_plex_pooling_guidelines.pdf .
- Illumina (2017).Illumina bcl2fastq2 conversion software v.2.19 user guide. 2017. [1st August 2018]. Illumina Proprietary Part #15051736v2, 2017. https://support.illumina.com/content/dam/illumina-support/documents/documentation/software_documentation/bcl2fastq/bcl2fastq2_guide_15051736_v2.pdf .
- Illumina (2018a).Illumina Indexed sequencing overview guide. 2018a. [2 August 2018]. Illumina Proprietary Document #15057455v04, 2018. http://support.illumina.com/content/dam/illumina-support/documents/documentation/system_documentation/miseq/indexed-sequencing-overview-guide-15057455-04.pdf .
- Illumina (2018b).Illumina Nextera XT DNA library prep kit reference guide. 2018b. [2 August 2018]. Illumina Proprietary Part #15031942v03, 2018. https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/samplepreps_nextera/nextera-xt/nextera-xt-library-prep-reference-guide-15031942-03.pdf .
- Kieran et al. (2019).Kieran TJ, Gordon ERL, Forthman M, Hoey-Chamberlain R, Kimball RT, Faircloth BC, Weirauch C, Glenn TC. Insight from an ultraconserved element bait set designed for hemipteran phylogenetics integrated with genomic resources. Molecular Phylogeny and Evolution. 2019;130:297–303. doi: 10.1016/j.ympev.2018.10.026. [DOI] [PubMed] [Google Scholar]
- Kircher, Sawyer & Meyer (2012).Kircher M, Sawyer S, Meyer M. Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Research. 2012;40:e3. doi: 10.1093/nar/gkr771. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Kozerewa et al. (2009).Kozerewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ. Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nature Methods. 2009;6:291–295. doi: 10.1038/nmeth.1311. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Larsson et al. (2018).Larsson AJ, Stanley G, Sinha R, Weissman IL, Sandberg R. Computational correction of index switching in multiplexed sequencing libraries. Nature Methods. 2018;15:305–307. doi: 10.1038/nmeth.4666. [DOI] [PubMed] [Google Scholar]
- MacConaill et al. (2018).MacConaill LE, Burns RT, Nag A, Coleman HA, Slevin MK, Giorda K, Light M, Lai K, Jarosz M, McNeill MS, Ducar MD, Meyerson M, Thorner AR. Unique, dual-indexed sequencing adapters with UMIs effectively eliminate index cross-talk and significantly improve sensitivity of massively parallel sequencing. BMC Genomics. 2018;19:30. doi: 10.1186/s12864-017-4428-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McCormack et al. (2013).McCormack JE, Harvey MG, Faircloth BC, Crawford NG, Glenn TC, Brumfield RT. A phylogeny of birds based on over 1, 500 loci collected by target enrichment and high-throughput sequencing. PLOS ONE. 2013;8:e54848. doi: 10.1371/journal.pone.0054848. [DOI] [PMC free article] [PubMed] [Google Scholar]
- McGee et al. (2016).McGee MD, Faircloth BC, Borstein SR, Zheng J, Hulsey CD, Wainwright PC, Alfaro ME. Replicated divergence in cichlid radiations mirrors a major vertebrate innovation. Proceedings of the Royal Society of London: Biological Sciences. 2016;283 doi: 10.1098/rspb.2015.1413. Article 201514. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Meyer & Kircher (2010).Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harbor Protocol. 2010;2010 doi: 10.1101/pdb.prot5448. Article pdb prot5448. [DOI] [PubMed] [Google Scholar]
- Meyer et al. (2007).Meyer M, Stenzel U, Myles S, Prufer K, Mofreiter M. Targeted high- throughput sequencing of tagged nucleic acid samples. Nucleic Acids Research. 2007;35:e97. doi: 10.1093/nar/gkm566. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rohland & Reich (2012).Rohland N, Reich D. Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture. Genome Research. 2012;22:939–946. doi: 10.1101/gr.128124.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Shoemaker et al. (1996).Shoemaker DD, Lashkari DA, Morris D, Mittmann M, Davis RW. Quantitative phenotypic analysis of yeast deletion mutants using a highly parallel molecular bar-coding strategy. Nature Genetics. 1996;14:450–456. doi: 10.1038/ng1296-450. [DOI] [PubMed] [Google Scholar]
- Sinha et al. (2017).Sinha R, Abu-Ali G, Vogtmann E, Fodor AA, Ren B, Amir A, Schwager E, Crabtree J, Ma S, The Microbiome Quality Control Project Consortium. Abnet CC, Knight R, White O, Huttenhower C. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nature Biotechnology. 2017;35:1077–1086. doi: 10.1038/nbt.3981. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Skerra (1992).Skerra A. Phosphorothioate primers improve the amplification of DNA sequences by DNA polymerases with proofreading activity. Nucleic Acids Research. 1992;20:3551–3554. doi: 10.1093/nar/20.14.3551. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Smith et al. (2014).Smith BT, Harvey MG, Faircloth BC, Glenn TC, Brumfield RT. Target capture and massively parallel sequencing of ultraconserved elements (UCEs) for comparative studies at shallow evolutionary time scales. Systematic Biology. 2014;63:83–95. doi: 10.1093/sysbio/syt061. [DOI] [PubMed] [Google Scholar]
- Syed, Grunenwald & Caruccio (2009).Syed F, Grunenwald H, Caruccio N. Next-generation sequencing library preparation: simultaneous fragmentation and tagging using in vitro transposition. Nature Methods Application Note. 2009;6:856. doi: 10.1038/nmeth.f.272. [DOI] [Google Scholar]
- Tautz, Ellegren & Weigel (2010).Tautz D, Ellegren H, Weigel D. Next generation molecular ecology. Molecular Ecology. 2010;19(s1):1–3. doi: 10.1111/j.1365-294X.2009.04489.x. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
The following information was supplied regarding data availability:
Summary tables of data are provided as in Files S12, S13, and S14.
The raw data has been published for other purposes and references to it have been provided. These include data related to fishes (McGee et al., 2016; Harrington et al., 2016), and ants (Faircloth et al., 2015; Blaimer et al., 2016). E. coli data are deposited in SRA under the BioProject ID PRJNA541504.