Journal of Biomolecular Techniques (JBT). 2008 Dec;19(5):335–341.

Effect of Primer Proximity to a Difficult-to-Sequence Region on Read Length and Sequence Quality

Jan Kieleczawa
PMCID: PMC2628072  PMID: 19183797

Abstract

Anecdotal evidence suggests that the proximity of a primer to a difficult region may affect read length and sequence quality. In this paper we sequenced many different categories of difficult regions, with primers located at various distances from each region, and found only a weak correlation, if any, between primer proximity and read length or sequence quality. The occasional improvements observed in some studies could instead be related to better primers or better-quality DNA. We suggest that, instead of designing primers at varying distances from a difficult region, sequence finishers concentrate on applying modified chemistries appropriate to the given difficult region.

Keywords: difficult template region, DNA sequencing, modified sequencing chemistry, primer proximity

INTRODUCTION

Despite tremendous advances using next-generation sequencing technologies,1–3 the elucidation of DNA sequence by the Sanger protocol4 is still the method of choice in most DNA core facilities as well as in small and large sequencing centers. Advances in sequencing chemistries,5–8 optimization of auxiliary protocols,9–12 and improvements in instrumentation have made this technology flexible, reliable, and easy to use. Assuming that the quality and the quantity of the DNA preparation are acceptable, one can easily obtain over 900 bases of good quality for most nondifficult templates. However, if the DNA template is difficult (defined as one that cannot be sequenced using the standard ABI-like protocol5), more advanced protocols are needed to obtain a clean read through it. In the last few years, significant progress has been made in sequencing through many kinds of difficult templates.13–22 However, one almost unexplored aspect of sequencing through any difficult region is the effect of primer proximity to that region on the read length. Currently, to the best of our knowledge, only anecdotal evidence exists (e.g., Ref. 23 and personal communications) that primer proximity to a difficult region could affect read length and sequence quality. In this paper, we systematically explore the effect of primer proximity to a number of difficult regions on the ability to obtain clean and long reads through such regions.

MATERIALS AND METHODS

The fourteen DNA templates used throughout this study contained a variety of difficult-to-sequence regions and were primarily collected through standard submission of sequencing requests to the DNA sequencing group at Wyeth, Cambridge. All of these templates were prepared using Marligen’s PowerPrep HP Plasmid Maxiprep System (Ijamsville, MD), and some DNAs were also prepared using the Sequence Resolver Kit.11

DNA sequencing (in triplicate for each primer), cleanup of sequencing reactions, and electrophoresis were carried out as described before.14,15 Modifications to the standard DNA sequencing protocol are described in the legend to Figure 6. All dye-terminator mixes were purchased from Applied Biosystems (Foster City, CA), and betaine was from Sigma (Sigma-Aldrich, St. Louis, MO). Data were analyzed using the Sequencher program (Gene Codes, Ann Arbor, MI). For assembly into contigs, only the trace with the median read length of the three replicates was used for each primer.

FIGURE 6.


Effect of various modifications of the sequencing protocol and different methods of template preparation on read length. A and B show Q ≥ 20 read length in the forward and reverse directions, respectively. Note that in A no data were obtained for DNA 2 using the standard ABI protocol. DNAs 2, 5, and 13 were prepared using either a standard plasmid purification method or a modified TempliPhi protocol as described in the literature.11 These DNAs were then sequenced using the following protocols: (1) standard ABI protocol5; (2) protocol with heat denaturation step14,15; (3) as in protocol 2 but 1 M betaine was included in the heat denaturation step; (4) the sequencing mix (per one 10-μl reaction) was composed of 1.5 μl of undiluted BigDye v3.1 dye-terminator mix, 0.5 μl of undiluted dGTP v3.0 dye-terminator mix, and 2 μl of 5 M betaine; (5) as in protocol 4 but both dye-terminator mixes were diluted 8-fold, with the buffer strength and magnesium maintained at the same level as in the undiluted reaction mix; (6) DNA was prepared using the TempliPhi method and the sequencing protocol was as in protocol 2; and (7) DNA was prepared as in protocol 6 but the sequencing mix was as in protocol 4.
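As a quick arithmetic check on protocol 4 in the legend above, the final betaine concentration in the 10-μl reaction follows from simple dilution of the 5 M stock. The sketch below is illustrative only; the helper name `final_concentration` is hypothetical and not part of any published protocol.

```python
def final_concentration(stock_molar: float, stock_volume_ul: float,
                        total_volume_ul: float) -> float:
    """Return the final molar concentration after diluting a stock into a reaction."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_molar * stock_volume_ul / total_volume_ul


# Protocol 4: 2 ul of 5 M betaine in a 10-ul reaction.
betaine_final = final_concentration(5.0, 2.0, 10.0)
print(f"{betaine_final:.1f} M")  # prints "1.0 M"
```

Under these volumes, protocol 4 therefore contains 1 M betaine in the final reaction.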

Primer selection for this study was greatly facilitated by the Find Primer algorithm, which is part of the DNA sequencing LIMS developed at the Wyeth core facility.24–26 Briefly, this algorithm matches all primers available in our library against a reference sequence and displays the positions and orientations of the primers found. If needed, new primers can be designed at specified intervals by using another algorithm developed at the Wyeth core facility. An example of such a primer match is shown in Figure 1. In each case presented in this paper, several primers (from 3 to 28) were selected on both sides of a difficult region (Table 1). To predict various potentially difficult-to-sequence regions in templates, we developed the “Examine Repeats” algorithm,26 which can predict up to seven different types of structures. In addition, the GC module calculates GC content in a reference sequence at specified intervals (Fig. 1). Examples of such predictions are shown in Figures 2 and 3.
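The matching step described above can be illustrated with a minimal sketch. This is not the Wyeth LIMS code, which is not shown in this paper; it assumes exact string matching of each primer on both strands, and the function names and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_primer_matches(reference: str, primers: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (primer name, 0-based start on the forward strand, orientation).

    Orientation is "F" when the primer matches the reference as written and
    "R" when its reverse complement matches (assumed convention).
    """
    reference = reference.upper()
    hits = []
    for name, primer in primers.items():
        primer = primer.upper()
        for orientation, query in (("F", primer), ("R", reverse_complement(primer))):
            start = reference.find(query)
            while start != -1:
                hits.append((name, start, orientation))
                start = reference.find(query, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical reference and primer library, for illustration only.
reference = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
library = {"p1": "GCCATTGTAATGG", "p2": "CGGGCACCCTTTC"}
for hit in find_primer_matches(reference, library):
    print(hit)
```

In this example `p1` matches the forward strand and `p2` matches only as its reverse complement, so the two primers would be reported with opposite orientations.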

FIGURE 1.


Matching primers to a reference sequence using the “Find Primer” module. The database finds perfect matches of all existing primers against the provided reference sequence. The matching can occur over the entire length or any portion of a reference sequence. To expand (or limit) the number of primers found, the length of desired primers can be changed. The positions and orientations of matched primers (in relation to a reference sequence) are displayed in the box at the left side. To select a primer, an analyst simply checks it off and the primer is automatically scheduled for a run. The default parameters for primer design are shown at the lower right corner. The “Examine Repeat” module is used to predict various potentially difficult-to-sequence regions.26 The small window below the “Select a Region to hilite its Primers” description shows the GC content of the reference sequence within selected intervals (the default is 500 bases but it can be changed).
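The interval-based GC calculation mentioned in this legend can be sketched as follows. The paper does not say whether the intervals overlap, so this example assumes consecutive non-overlapping windows; the function name is hypothetical, and only the 500-base default is taken from the text.

```python
def gc_content_by_interval(sequence: str, interval: int = 500) -> list[tuple[int, int, float]]:
    """Return (start, end, GC percent) for consecutive non-overlapping windows.

    Coordinates are 0-based and end-exclusive; the last window may be shorter.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    sequence = sequence.upper()
    results = []
    for start in range(0, len(sequence), interval):
        window = sequence[start:start + interval]
        gc_count = sum(base in "GC" for base in window)
        results.append((start, start + len(window), 100.0 * gc_count / len(window)))
    return results


# Toy sequence with a 4-base interval so the output is easy to verify by eye.
print(gc_content_by_interval("GGCCATATGCAT", interval=4))
# prints [(0, 4, 100.0), (4, 8, 0.0), (8, 12, 50.0)]
```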

TABLE 1.

DNA Templates: Characteristics of Difficult Regionsᵃ

| DNA # | Characteristics of a Difficult Region (Forward Direction Only) | No. of Primers: F Direction | Forward Range | Average Q ≥ 20: F Direction | No. of Primers: R Direction | Reverse Range | Average Q ≥ 20: R Direction |
|---|---|---|---|---|---|---|---|
| DNA 1 | 87% GC-rich over 100-base region | 8 | 40–300 | 964±11 | 13 | 40–630 | 965±18 |
| DNA 2 | 94% GC over 200 bases/101-base nonrepeat G/C | 3 | 140–700 | N/A | 5 | 150–200 | N/A |
| DNA 3 | 18 As and 17 Ts separated by 380 bases | 6 | 122–527; 541–761ᵇ | 911±25 | 8 | 102–522; 182–652ᵇ | 899±25 |
| DNA 4 | 18 Cs/10 Cs separated by 7 bases | 3 | 130–540 | N/A | 4 | 330–450 | N/A |
| DNA 5 | 24-bp hairpin with Tm >95°C | 12 | 246–328 | N/A | 10 | 272–638 | N/A |
| DNA 6 | 19-base inverted repeat, 19-base homopolymer (G), and 41-base T/A nonrepeat (all within 300-base region) | 6 | 348–783 | N/A | 28 | 43–785 | N/A |
| DNA 7 | 20 A/T dinucleotide repeats | 7 | 52–532 | 878±46 | 9 | 59–511 | 782±90 |
| DNA 8 | 26 C/A dinucleotide repeats | 9 | 72–492 | 791±67 | 8 | 32–494 | 641±94 |
| DNA 9 | 40-base nonrepeat T/C | 7 | 73–247 | 868±43 | 8 | 68–391 | 799±125 |
| DNA 10 | 81-base nonrepeat A/T including 28-base poly T | 12 | 368–751 | 894±72 | 19 | 124–498 | 935±37 |
| DNA 11 | 95-base nonrepeat A/G including 34-base poly A | 8 | 71–838 | 859±54 | 9 | 10–875 | 878±35 |
| DNA 12 | 147-base nonrepeat T/C | 12 | 212–652 | 728±88 | 12 | 136–441 | 901±17 |
| DNA 13 | 456-base nonrepeat T/C | 4 | 98–530 | N/A | 7 | 79–490 | N/A |
| DNA 14 | Difficult region not detected by 4D module | 10 | 109–448 | 807±46 | 10 | 202–821 | 782±105 |

a. Forward and reverse range refers to the distance, in bases, of primers relative to a difficult region. This table also shows the number of primers for each DNA and direction as well as the average Q ≥ 20 read length. All Q ≥ 20 values in this table were calculated when DNAs were sequenced using protocol 2 as described in the legend to Figure 6.

b. Indicates distance to poly A (first range) and to poly T (second range).

FIGURE 2.


Various Potentially Difficult-to-Sequence Motifs in DNA 2a

FIGURE 3.


Various Potentially Difficult-to-Sequence Motifs in DNA 6a

Note: All primers used in this study passed primer design criteria as specified by Primer Designer software from Scientific and Educational Software (Cary, NC); Tm 54–70°C, GC% = 55±10, stability > 1.3 kcal/mole (3′ vs 5′), matches at 3′ end < 3, hairpin separation < 7, base runs < 4, adjacent homologous bases < 7, and repeats:dinucleotide pairs < 3.
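Two of the criteria listed in this note, GC content and base runs, are simple enough to check directly. The sketch below is not the Primer Designer implementation: it assumes that "GC% = 55±10" means 45–65% inclusive and that "base runs < 4" means the longest run of identical bases must be shorter than four. Tm, stability, and the other criteria depend on models the note does not specify and are left out. All names and example primers are hypothetical.

```python
from itertools import groupby


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    if not primer:
        raise ValueError("primer must not be empty")
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)


def longest_base_run(primer: str) -> int:
    """Length of the longest run of identical consecutive bases."""
    if not primer:
        raise ValueError("primer must not be empty")
    return max(len(list(group)) for _, group in groupby(primer.upper()))


def passes_basic_checks(primer: str, gc_target: float = 55.0,
                        gc_tolerance: float = 10.0, run_limit: int = 4) -> bool:
    """Apply only the GC% and base-run criteria (assumed interpretations)."""
    gc_ok = abs(gc_percent(primer) - gc_target) <= gc_tolerance
    run_ok = longest_base_run(primer) < run_limit
    return gc_ok and run_ok


print(passes_basic_checks("GCTAGCCATGGATCCGTACG"))  # 60% GC, longest run 2
print(passes_basic_checks("GGGGCTAGCCATGGATCCGT"))  # run of four Gs
```

The first example primer passes both checks; the second fails because of its four-base G run.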

RESULTS AND DISCUSSION

The characteristics of the difficult regions in each of the 14 templates used in this study, as well as the number of primers used in the forward and reverse directions, are presented in Table 1. The forward/reverse range indicates the distance (in bases) from the 3′ end of the sequencing primers to the beginning of a difficult region.
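The distance definition above can be written out explicitly. The sketch below assumes 0-based coordinates on the forward strand and takes the distance as the simple coordinate difference between the primer's 3′ base and the first base of the region that the read encounters; the paper does not state its exact counting convention, so both the convention and the function names are assumptions.

```python
def forward_primer_distance(primer_start: int, primer_length: int, region_start: int) -> int:
    """Distance from the 3' base of a forward primer to the first base of the region."""
    three_prime_base = primer_start + primer_length - 1  # rightmost primer base
    return region_start - three_prime_base


def reverse_primer_distance(primer_start: int, region_end: int) -> int:
    """Distance from the 3' base of a reverse primer to the region.

    For a reverse primer the 3' base is its leftmost forward-strand coordinate
    (primer_start), and the read first meets the last base of the region
    (region_end, inclusive).
    """
    return primer_start - region_end


# Hypothetical coordinates: a 20-base forward primer starting at 100 and a
# difficult region spanning positions 419-600.
print(forward_primer_distance(100, 20, 419))  # prints 300
print(reverse_primer_distance(900, 600))      # prints 300
```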

There are two general cases observed in this study. Case 1: The forward and reverse reads stop at the beginning of a difficult motif (DNAs 4–6) or at some distance into such a region (DNAs 2, 13) without completely getting through, with the consequence that there is no assembly into a single contig. It is obvious that in this case read length depends on the distance of a primer from the difficult region, but in no situation was it possible to sequence through such a region. Case 2: The forward and reverse primers read through a difficult region and assemble into a single contig (all other DNAs in this study). Reads are somewhat shorter (with few exceptions) compared with typical read lengths of over 900 bases, and the relatively small standard deviations (1–15%, with a median of about 4.5%) for reads in either the forward or reverse direction indicate the lack of a significant effect of primer position on the ability to obtain better-quality and longer reads. Figure 4 shows an example of a Sequencher assembly for DNA template 5, which contains a strong 24-base hairpin. All 11 sequences, regardless of the distance to the hairpin, terminated at the beginning of the hairpin and did not overlap with sequences generated using reverse primers (not shown here). In Figure 5 (DNA 8) the forward and reverse reads assemble into a single contig, but there is no significant effect of primer proximity to the CA/GT dinucleotide repeats on the read length. Table 2 shows individual Q ≥ 20 read-length values corresponding to the data presented in Figure 5.
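The percentages quoted for case 2 are relative standard deviations, that is, the standard deviation divided by the mean read length. As a sketch, assuming the ± values in Table 1 are standard deviations, the calculation for two rows near the ends of the quoted range looks like this (the function name is hypothetical):

```python
def relative_sd_percent(mean_read_length: float, sd: float) -> float:
    """Standard deviation expressed as a percentage of the mean."""
    if mean_read_length <= 0:
        raise ValueError("mean read length must be positive")
    return 100.0 * sd / mean_read_length


# Mean and SD pairs taken from Table 1.
examples = {
    "DNA 1, forward": (964, 11),
    "DNA 9, reverse": (799, 125),
}
for label, (mean, sd) in examples.items():
    print(f"{label}: {relative_sd_percent(mean, sd):.1f}%")
# prints:
# DNA 1, forward: 1.1%
# DNA 9, reverse: 15.6%
```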

FIGURE 4.


Termination of sequencing traces at the beginning of a hairpin in DNA 5. A: This vector was purchased from Invitrogen; any new vector is routinely re-sequenced. Sequencing verification revealed the presence of a 27-base-pair insertion that was not included in Invitrogen’s original reference sequence. This insertion generated a 24-base perfect-match hairpin, and the blue arrow indicates the termination position for primers in the forward direction. B: The top part of the figure shows an overview of the trace alignment. Above each line is the trace description, with the last part indicating the position of a trace with respect to the entire length (consensus sequence). The bottom part shows chromatograms in the area delineated by the broken-line box above. Under the described experimental conditions,10 the first correctly called base is about 35–40 bases away from the 3′ ends of primers. Hence this figure also shows the distribution of primers in relation to the beginning of the hairpin.

FIGURE 5.


Assembly of sequencing traces from the forward and reverse directions in DNA 8. The top part of the figure shows an overview of the trace alignment. Above each line is the trace description, with the last part indicating the position of a trace with respect to the entire length (consensus sequence). The bottom part of the figure shows chromatograms in the CA region (nucleotides 630–682 on the consensus sequence of 1380 bases) corresponding to the broken-line boxed area in the top part of the figure.

TABLE 2.

Individual Q ≥ 20 Read-Length Values Corresponding to Chromatograms Shown in Figure 5ᵃ

| Primer # | Direction | Q ≥ 20 | Direction | Q ≥ 20 |
|---|---|---|---|---|
| 1 | F | 803 | R | 690 |
| 2 | F | 782 | R | 714 |
| 3 | F | 771 | R | 729 |
| 4 | F | 786 | R | 709 |
| 5 | F | 783 | R | 750 |
| 6 | F | 841 | R | 666 |

a. Read-length values in Figure 5 represent auto-trimmed values obtained using Sequencher (default trimming parameters) and are different from Q ≥ 20 values.

In all cases presented in this work (107 forward and 150 reverse primers tested on 14 different difficult templates), we did not observe any significant effect of primer proximity to a difficult region, either on the ability to read through the region (case 1 DNAs) or on read length and sequence quality (case 2 DNAs). A much better option for successfully sequencing through any kind of difficult template is to use modified chemistry,14,15 as shown in Figure 6A,B, or a template that was prepared with a different preparation method.11,27 The data in this figure show that, for the same primer, read length varies significantly with the type of chemistry used. It is also evident that the optimal type of chemistry depends on the direction of sequencing. This phenomenon is explored more deeply in an upcoming paper based on the interlaboratory study conducted by the DNA Sequencing Research Group on a much larger set of difficult templates (J. Kieleczawa et al., accepted for publication in JBT).

ACKNOWLEDGMENTS

I wish to thank Drs. L. Bloom and B. Ulmer for critical reading and numerous suggestions during the preparation of this manuscript.

REFERENCES

  • 1. Margulies M, Egholm M, Altman WE, et al. Genome-sequencing in micro-fabricated high-density picolitre reactors. Nature. 2005;437:376–380. doi: 10.1038/nature03959.
  • 2. Bentley DR. Whole genome re-sequencing. Curr Opin Genet Dev. 2006;16:545–552. doi: 10.1016/j.gde.2006.10.009.
  • 3. McLaughlin SF, Peckham HE, Zhang ZH, et al. Whole-genome resequencing with short reads: Accurate mutation discovery with mate pairs and quality values. 2007 AGBT Conference; Marco Island, FL. Poster 2620.
  • 4. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA. 1977;74:5463–5467. doi: 10.1073/pnas.74.12.5463.
  • 5. ABI PRISM® BigDye Terminator v3.1 Cycle Sequencing Kit. 2002 Protocol. Part number 4337035 Rev. A. Applied Biosystems, Foster City, CA.
  • 6. Automated DNA Sequencing. Chemistry Guide. Document number 4305080B. 2000. Applied Biosystems, Foster City, CA.
  • 7. Azadan RJ, Fogleman JC, Danielson PB. Capillary electrophoresis sequencing: Maximum read length at minimal cost. BioTechniques. 2002;32:24–28. doi: 10.2144/02321bm01.
  • 8. Brandis J, Bloom C, Richards JH. DNA polymerases having improved labeled nucleotide incorporation properties. 2001. US Patent 6,265,193.
  • 9. Kieleczawa J, editor. DNA Sequencing: Optimizing the Process and Analysis. Sudbury, MA: Jones & Bartlett; 2005.
  • 10. Kieleczawa J, editor. DNA Sequencing II: Optimizing Preparation and Cleanup. Sudbury, MA: Jones & Bartlett; 2006.
  • 11. GE Healthcare. Sequence Finishing Kit. Product Code 25-6401-01, 2003.
  • 12. Murray V. Improved double-stranded DNA sequencing using the linear polymerase chain reaction. Nucleic Acids Res. 1989;17:8889. doi: 10.1093/nar/17.21.8889.
  • 13. Adams PS, Dolejsi MK, Hardin S, et al. DNA sequencing of a moderately difficult template: Evaluation of the results from a Thermus thermophilus unknown test sample. BioTechniques. 1996;21:678.
  • 14. Kieleczawa J. Simple modifications of the standard DNA sequencing protocol allow for sequencing through siRNA hairpins and other repeats. J Biomol Tech. 2005;16:220–223.
  • 15. Kieleczawa J. Fundamentals of sequencing of difficult templates-an overview. J Biomol Tech. 2006;17:207–217.
  • 16. Gerstner A, Sasvari-Szekely M, Kalasz H, Guttman A. Sequencing difficult DNA templates using membrane-mediated loading with hot sample application. BioTechniques. 2000;28:628–630.
  • 17. Hawes JW, et al. Sequencing through difficult repetitive sequence. Results from the ABRF DNA Sequence Research Group Study. J Biomol Tech. 2003 www.abrf.org/Research-Groups/DNASequencing/DSRG2003Study.
  • 18. Ducat DC, Herrera FJ, Triezenberg SJ. Overcoming obstacles in DNA sequencing of expression plasmids for short interfering RNAs. BioTechniques. 2003;34:1140–1144. doi: 10.2144/03346bm04.
  • 19. Esposito D, Gillette W, Hartley JL. Blocking oligonucleotides improve sequencing through inverted repeats. BioTechniques. 2003;35:914–920. doi: 10.2144/03355bm02.
  • 20. Langan JE, Rowbottom L, Liloglou T, Field JK, Risk JM. Sequencing of difficult templates containing poly (A/T) tracts: Closure of sequencing gaps. BioTechniques. 2002;33:276–280. doi: 10.2144/02332bm04.
  • 21. Thomas MG, Hesse SA, McKie AT, Farzaneh F. Sequencing of cDNA using anchored oligo dT primers. Nucleic Acids Res. 1993;21:3915–3916. doi: 10.1093/nar/21.16.3915.
  • 22. Zhao X, Haqqi T, Yadav SP. Sequencing telomeric DNA templates with short tandem repeats using dye terminator cycle sequencing. J Biomol Tech. 2000;11:111–121.
  • 23. Yang A. Solutions for sequencing difficult regions. In: Kieleczawa J, editor. DNA Sequencing III: Dealing with Difficult Templates. Sudbury, MA: Jones & Bartlett; 2008. pp. 65–90.
  • 24. Koffman D, Sookdeo H. DNA sequencing database: A flexible LIMS for DNA sequencing analysis. In: Kieleczawa J, editor. DNA Sequencing: Optimizing the Process and Analysis. Sudbury, MA: Jones & Bartlett; 2005. pp. 143–156.
  • 25. Kieleczawa J, Atnoor D, Carmical M, et al. Essential software and other tools used in modern biology laboratories. In: Kieleczawa J, editor. DNA Sequencing II: Optimizing Preparation and Cleanup. Sudbury, MA: Jones & Bartlett; 2006. pp. 313–353.
  • 26. Kieleczawa J, Lakshmanan B, Koffman D, Kitzmiller A. Bio-informatics tools to aid sequencing of difficult templates. In: Kieleczawa J, editor. DNA Sequencing III: Dealing with Difficult Templates. Sudbury, MA: Jones & Bartlett; 2008. pp. 163–177.
  • 27. Kieleczawa J, Wu P. Preparation of difficult DNA templates using seven different commercial methods. In: Kieleczawa J, editor. DNA Sequencing II: Optimizing Preparation and Cleanup. Sudbury, MA: Jones & Bartlett; 2006. pp. 1–14.

