Proceedings of the National Academy of Sciences of the United States of America. 2002 Jul 1;99(14):9346–9351. doi: 10.1073/pnas.132218699

An approach for global scanning of single nucleotide variations

Xinghua Pan, Sherman M Weissman
PMCID: PMC123143  PMID: 12093903

Abstract

Efficient global scanning of single nucleotide variations in DNA sequences between related, complex DNA samples remains a challenge. Here we present an approach to this problem. We used immobilized thymine DNA glycosylases (TDGs) to capture and enrich DNA fragments containing internal mismatched base pairs and to separate these fragments as one pool from perfectly base-paired fragments as another pool. Enrichments of up to several hundredfold were obtained with one cycle of treatment, and all four groups of single nucleotide mismatches were covered by the combined use of the two TDGs generated here. We used a heterohybrid-orientating strategy for selective amplification of duplexes with one strand derived from each of two input DNA samples; the same strategy can be used for selective amplification of duplexes with both strands derived from one of the two input samples when desired. By combining these methods, the single nucleotide variations either between two DNA pools or within one DNA pool can be obtained in one process. This approach has been applied to the total cDNA from a human cell line and has several potential applications in mapping genetic variations, particularly global scanning of cDNA single nucleotide variations or polymorphisms and, ultimately, high-throughput mapping of complex genetic traits.


The association of specific phenotypes with variations in base sequence has been a major achievement of modern molecular biology (1–5). As an important part of this process, a number of methods have been developed either for rapid detection of known points of variation in DNA sequence or for detection of previously unappreciated sequence variation (6). The latter category of approaches has been difficult to apply to large genomes on a global scale. Although MutHLS enzymes have been used to study identity by descent in the Saccharomyces cerevisiae, mouse, and human genomes (7–10), we and others have not obtained sufficient specificity with MutHLS to permit global detection of single nucleotide variations in a complex DNA pool. Representative differential analysis has been successfully used in scanning short or long fragment deletions (11, 12), but it is generally not sensitive to subtle nucleotide variation.

In this article we present results obtained by using a different group of enzymes, the thymine (thymidine) DNA glycosylases (TDGs) (13–18), to separately enrich perfectly matched (Pm) and mismatch-containing (Mm) DNA duplexes from complex mixtures. While this study was in progress, DNA glycosylases were used for detection of DNA mutation (19, 20), but those modes were not suitable for global-scale scanning. DNA glycosylases are a group of DNA repair enzymes whose primary action is to hydrolyze the bond between deoxyribose and one of the bases in DNA, generating an abasic site without necessarily cleaving the sugar phosphate backbone of DNA. Many glycosylases are specific for abnormal bases generated by chemical damage to DNA including cytidine deamination. However, two types of enzymes, MutY and TDG, will hydrolyze certain normal bases in a DNA duplex if they are not part of a Watson–Crick base pair. In addition, TDG might have a more useful range of specificities.

We show that the method presented in this article can, with one cycle of treatment, obtain more than 100-fold physical enrichment of either pool of DNA duplexes (Pm or Mm) from the other pool. We also describe a method for comparing two pools of DNA fragments and separately recovering fragments in which one strand is derived from each of the two pools (heterohybrids) and fragments with both strands derived from the same pool (homohybrids). We present evidence proving the principle of this approach for global screening of single nucleotide variations in a complex cDNA pool. Finally, we discuss the potential applications of this approach.

Materials and Methods

Cloning, Expression, and Purification of TDG.

Two TDGs were prepared. Human TDG (hTDG) cDNA was amplified from cDNA prepared by reverse transcription of poly(A) RNA from the human JY lymphoid cell line with a pair of primers that were designed according to the reported sequences (16): upstream primer (5′-GCGGATCCGAAGCGGAGAACGCGGGCAGCTATTC) and downstream primer (5′-ACCGGTCGACAGCAGAGCTGAGAAGCACCATTCT). The amplicon was cloned into vector pGEX-4T3 (Amersham Pharmacia) between the restriction sites BamHI and SalI (pGEX-4T3-hTDG). The fused glutathione S-transferase (GST)-hTDG was highly expressed in Escherichia coli BL21 after induction with 1 mM isopropyl β-D-thiogalactoside at 25°C for 3 h, and was purified to homogeneity with glutathione Sepharose 4B beads (Amersham Pharmacia). Alternatively, the enzyme was kept on the beads in an immobilized form. These soluble and bead-immobilized hTDGs were redissolved or resuspended in a storage buffer (2 mM 2-BME/5 mM EDTA/100 mM NaCl/50 mM KCl/200 μg/ml BSA/25 mM Hepes, pH 7.6/50% glycerol) and stored at −80°C. The gene of the second thymidine glycosylase, the archaeon Methanobacterium thermoautotrophicum DNA mismatch N-glycosylase (Mig.Mth or mTDG) (13), was PCR-amplified directly from this microorganism with the following primers: upstream primer (5′-GCGGATCCCATCACCATCACCATCACTTGGATGATGCTACTAATAAAAAAAG) and downstream primer (5′-ACCGGTCGACAGTACTACACTTCTCATAGTAGCTAC). The His-6 codons in the upstream primer were originally designed for potential alternative purification or immobilization. The vector construction, expression, purification, and storage of GST-mTDG were the same as those for GST-hTDG.

Oligodeoxynucleotides for Model Templates' Construction.

The oligonucleotides used here were synthesized at the Oligonucleotide Synthesis Laboratory, Department of Pathology, Yale School of Medicine, and were gel-purified.

The oligodeoxynucleotides S76C1G1 (5′-GATCAGGAATCGGTCTACGTGCGTGAATCCACACCGAGCTATTCTCTCGATGTAAGTACCACGGTGCAGGTCGACG) and S76G2T2 (5′-GATCCGTCGACCTGCACCGTGGTACTTACATCGAGAGAATAGCTTGGTGTGGATTCACGCACGTAGACCGATTCCT) were annealed to form a duplex 76 Mm CG/GT, which contained an internal CG/GT pairing (the bold letters indicate the mispaired nucleotides).

Oligonucleotides Bo60 (5′-CCGTGGTACTTACATCGAGAGATCC₁G₁CTTGGTGTGGATTCACGCACGTAGACCGATTCCT) and To60 (5′-AGGAATCGGTCTACGTGCGTGAATCCACACCAAGC₂G₂GATCTCTCGATGTAAGTACCACGG) were annealed to form a 60-bp Pm oligonucleotide duplex, 60 Pm. The substitutions of one or two nucleotides led us to construct the most favorable and the least favorable G/T Mm duplexes with different 5′ neighbor nucleotides (60 Mm CG/GT, 60 Mm AG/TT) and a duplex with two adjacent mismatches (60 Mm AG/GT).

A second series of 60-bp bottom strands, with different nucleotide substitutes in the middle part of the second basic bottom strand (Bs60: 5′-CCGTGGTACTTACATCGAGAGATCCACTTGGTGTGGATTCACGCACGTAGACCGATTCCT, with bold letter indicating the nucleotide different from Bo60) were also made to anneal with the consensus top strand (To60T35, with a T substituting for the C at the position 35 of To60 above) for construction of a series of other nucleotide mismatch(es)-containing duplexes, including: 60 Mm C/A, 60 Mm C/T, 60 Mm G/A, 60 Mm T/T, 60 Mm A/A, 60 Mm G/G, 60 Mm C/C, 60 Mm GG/TG, 60 Mm GGA/TGA, and duplexes with different insertions: 60 Ins.T, 60 Ins.A, 60 Ins.C, 60 Ins.G, 60 Ins.GT, and 60 Ins.GTG.
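The way these model duplexes are built can be checked with a few lines of code. The following is a minimal sketch, not the authors' procedure: it pairs two equal-length strands antiparallel and lists every position that is not a Watson–Crick pair. The function name `find_mismatches` is hypothetical, and the sketch assumes that the position labels printed inside the Bo60 and To60 sequences (C1G1, C2G2) are markers rather than nucleotides, so they are omitted here.

```python
# Watson-Crick partners for the four normal DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_mismatches(bottom, top):
    """Pair two strands (both written 5'->3') antiparallel.

    Base i of `bottom` (1-based) faces base len(top) - i + 1 of `top`.
    Returns a list of (bottom_position, bottom_base, top_base) tuples for
    every pair that is not a Watson-Crick pair.
    """
    if len(bottom) != len(top):
        raise ValueError("this sketch only handles blunt duplexes of equal length")
    hits = []
    for i, base in enumerate(bottom):
        partner = top[len(top) - 1 - i]
        if COMPLEMENT[base] != partner:
            hits.append((i + 1, base, partner))
    return hits

# Sequences from the text, split into 10-mers for readability.
# Assumption: the position labels inside Bo60/To60 are not part of the sequence.
BO60 = ("CCGTGGTACT" "TACATCGAGA" "GATCCGCTTG"
        "GTGTGGATTC" "ACGCACGTAG" "ACCGATTCCT")
BS60 = ("CCGTGGTACT" "TACATCGAGA" "GATCCACTTG"
        "GTGTGGATTC" "ACGCACGTAG" "ACCGATTCCT")
TO60 = ("AGGAATCGGT" "CTACGTGCGT" "GAATCCACAC"
        "CAAGCGGATC" "TCTCGATGTA" "AGTACCACGG")

# To60T35: a T replaces the C at position 35 of To60.
TO60_T35 = TO60[:34] + "T" + TO60[35:]

print(find_mismatches(BO60, TO60))      # [] : the 60 Pm duplex
print(find_mismatches(BO60, TO60_T35))  # [(26, 'G', 'T')] : a single G/T mismatch
print(find_mismatches(BS60, TO60_T35))  # [] : Bs60 is the perfect partner of To60T35
```

Under these assumptions, Bo60 annealed to To60T35 carries one G/T mismatch at position 26 of Bo60, with a C as its 5′ neighbor, which is consistent with the 60 Mm CG/GT designation.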

TDG Gel-Shift Assay and TDG-Mediated Cleavage Assay.

A TDG gel-shift assay was designed and optimized for detecting the binding of TDG with DNA. To make the template duplex, one oligonucleotide was 5′ P32-labeled with T4 polynucleotide kinase and purified with the Qiaquick nucleotide removal kit (Qiagen, Chatsworth, CA). It was then annealed with its complementary oligonucleotide in an annealing buffer (10 mM Tris⋅HCl, pH 7.5/60 mM NaCl) by heating to 95°C for 10 min and cooling to room temperature overnight to form an oligonucleotide duplex. The labeled oligonucleotide usually was the strand that would not be attacked by TDGs. A mixture of 0.2 pmol duplex and an appropriate amount of purified mTDG (1.5 μl × 1.4 μg/μl) or hTDG (1.5 μl × 2.0 μg/μl), except as otherwise specified, was incubated in 20 μl TDG gel-shift buffer [25 mM Tris⋅HCl pH 8.3/5 mM EDTA/200 mM NaCl (except when otherwise specified)] at 37°C (hTDG) or 60°C (mTDG) or 25°C (both enzymes) for 4–6 h, followed by addition of 5 μl 20% Ficoll buffer. Eight microliters of each sample was loaded onto a 12% sequencing gel, and the dried gel was exposed to x-ray film as described (21).

For the TDG-mediated cleavage assay, the labeled oligonucleotide usually was the strand that would be attacked by TDGs. The labeled and unlabeled oligonucleotides were annealed to form a duplex, as described above. Either 1.5 μl × 1.4 μg/μl mTDG or 1.5 μl × 2.0 μg/μl hTDG was incubated with 0.2 pmol DNA duplex in 20 μl optimized TDG cleavage buffer (25 mM Tris⋅HCl, pH 7.5/5 mM MgCl2) at 37°C (hTDG) or 60°C (mTDG) or 25°C (both enzymes) for 4–6 h, followed by treatment in 100 mM NaOH at 99°C for 10 min, addition of 15 μl sequencing stop buffer, and denaturation at 96°C for 5 min; 10 μl of each sample was then loaded onto an 8% sequencing gel. The dried gel was exposed to x-ray film as described (21).

Separation and Enrichment of Internal Nucleotide Mm DNA Fragments from Pm Fragments with Immobilized TDG.

The input DNA sample was a mixture of DNA duplexes composed of Pm and Mm duplex(es). Up to 10 μg DNA mixture was loaded in an Eppendorf tube containing 80 μl pre-equilibrated immobilized hTDG or mTDG on beads with 500 μl TDG binding buffer (50 mM Tris⋅HCl, pH 8.3/50 mM KCl/5 mM EDTA/0.2 mM ZnCl2/1 mM DTT/0.25 mg/ml BSA), and rotated gently at room temperature for 6 h or overnight. Then 2–3 cycles of washing were carried out with 1,000 μl washing buffer (50 mM Tris⋅HCl, pH 8.3/50 mM KCl/5 mM EDTA/0.2 mM ZnCl2/1 mM DTT/0.25 mg/ml BSA with NaCl in concentrations of 50–300 mM). Each cycle of washing included 4 h of incubation at room temperature and removal of the supernatant after centrifugation at 1,000 rpm × 2 min. Elution was accomplished by application of 1,000 μl elution buffer (50 mM Tris⋅HCl, pH 7.5/50 mM KCl/30–50 mM MgCl2/0.2 mM ZnCl2/1 mM DTT/0.25 mg/ml BSA) at room temperature for 4 h and centrifugation at 1,000 rpm × 2 min. This elution step was repeated for 2–3 cycles; the eluates were collected as the portion of the DNA duplex(es) with internal mismatches.

For further enrichment of the Pm DNA duplex(es), the initial flow-through portion (supernatant) obtained above was purified by phenol-chloroform extraction and ethanol precipitation, applied to a tube of 100 μl pre-equilibrated fresh immobilized TDG bead suspension with 500 μl TDG Pm enrichment buffer (50 mM Tris⋅HCl, pH 7.5/50 mM KCl/10 mM EDTA/0.2 mM ZnCl2/1 mM DTT/0.25 mg/ml BSA), rotated gently at room temperature for 6 h or overnight, and centrifuged at 1,000 rpm × 2 min. The supernatant contained the Pm DNA duplexes.

Fractions from different steps were phenol chloroform-extracted, ethanol-precipitated, and dissolved in 10 mM Tris·HCl, pH 7.5/1 mM EDTA for further analysis, or for repeat of this enrichment process when necessary.

Specific Recovery of the Heterohybrids from the Mixture of Heterohybrids and Homohybrids.

A strategy based on the PCR suppression effect (22, 23) was adopted and modified here for the specific recovery of the heterohybrids from a mixture of heterohybrids and homohybrids and for further subdividing of the amplicon when required. Two partly duplex adapters were synthesized. The two oligonucleotides HeA1 (5′-GTAATACGACTCACTATAGGGCTCGAGCGGCCGCCCGGGCAGGT) and HeA2 (5′-GATCACCTGCCC) were annealed to constitute the first heterohybrid-orientating adapter (AdHeA). Two other oligonucleotides, HeB1 (5′-TGTAGCGTGAAGACGACAGAAAGGGCGTGGTGCGGAGGGCGGT) and HeB2 (5′-GATCACCGCCCTCCG), were annealed to construct the second adapter (AdHeB). There was no phosphate at the 5′ end of oligonucleotides HeA2 or HeB2. Each adapter had a 5′ GATC overhang on the inner end for linking with the Sau3AI cohesive end of a DNA fragment. Primers HeP1 (5′-GTAATACGACTCACTATAGGGC) and HeP2 (5′-TGTAGCGTGAAGACGACAGAA), with sequences derived from the outer parts of HeA1 and HeB1, respectively, were synthesized for the recovery and amplification of the constructs. A set of primers HeP1N (5′-GCGGCCGCCCGGGCAGGTGATCN), where N is one nucleotide or a combination of two or three nucleotides, was used to amplify a subset of the DNA fragments that were ligated to the adapter AdHeA on one end and had the N nucleotide(s) adjacent to the adapter. Another set of similar primers, HeP2N, was also synthesized (5′-CGTGGTGCGGAGGGCGGTGATCN).
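The relationship between the adapter strands and the primers can be made explicit in code. The sketch below is illustrative only: the helper name `subset_primers` and the constant `INNER_LEN` are hypothetical, and the 18-nt length of the inner adapter segment is inferred from the printed HeP1N and HeP2N sequences rather than stated in the text. It checks that HeP1 and HeP2 are the outer (5′) parts of HeA1 and HeB1 and builds the subsetting primers as inner adapter segment + GATC + N.

```python
from itertools import product

# Adapter long strands and outer primers, copied from the text.
HEA1 = "GTAATACGACTCACTATAGGGCTCGAGCGGCCGCCCGGGCAGGT"
HEB1 = "TGTAGCGTGAAGACGACAGAAAGGGCGTGGTGCGGAGGGCGGT"
HEP1 = "GTAATACGACTCACTATAGGGC"
HEP2 = "TGTAGCGTGAAGACGACAGAA"

SAU3AI_SITE = "GATC"
INNER_LEN = 18  # assumption: inferred from the printed HeP1N/HeP2N sequences

def subset_primers(long_strand, max_extension=2):
    """Build subsetting primers: inner adapter part + GATC + 1..max_extension bases.

    Returns a dict mapping each 3' extension (e.g. "C", "CA") to its primer.
    """
    core = long_strand[-INNER_LEN:] + SAU3AI_SITE
    primers = {}
    for k in range(1, max_extension + 1):
        for bases in product("ACGT", repeat=k):
            extension = "".join(bases)
            primers[extension] = core + extension
    return primers

hep1n = subset_primers(HEA1)
hep2n = subset_primers(HEB1)

print(HEA1.startswith(HEP1), HEB1.startswith(HEP2))  # outer primers are 5' prefixes
print(hep1n["C"])    # HeP1C
print(hep1n["CA"])   # HeP1CA
print(len(subset_primers(HEA1, max_extension=3)))    # 4 + 16 + 64 primers
```

If the four bases were equally frequent next to each Sau3AI site (an idealization), a one-base extension would select roughly one quarter of the AdHeA-linked ends and a two-base extension roughly one sixteenth.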

Test DNA sample (Sau3AI-digested human lymphoblastoid cDNA or adenovirus vector pAdEasy-1 DNA) was divided into two halves, forming pools a and b, which were separately ligated to the adapters AdHeA and AdHeB with T4 DNA ligase to form the constructs AdHeA-DNA-AdHeA and AdHeB-DNA-AdHeB. To form a mixture containing the heterohybrid AdHeA-DNA-AdHeB, these two constructs were mixed together, purified with phenol chloroform, precipitated with ethanol, suspended in 4 μl of 3× EE buffer [30 mM [N-(2-hydroxyethyl) piperazine-N′-(3-propanesulfonic acid), pH 8.0] (Sigma) and 3 mM EDTA], overlaid with 1 drop of mineral oil, heated at 98°C × 4 min and cooled to 85°C. One microliter of preheated 5 M NaCl was added and carefully mixed by pipetting. This mixture was slowly cooled (in 5 h) to 67°C and held at 67°C for 20 h. It was then diluted to 200 μl, and 2 μl was taken and supplied with PCR components including 1× buffer, primers HeP1 and HeP2, and four dNTPs, but not Taq DNA polymerase, in 25 μl, which was heated at 72°C for 15 min in a PCR machine with the preheated Taq DNA polymerase added in the fifth minute to fill in the ends. For recovery of the general pool of heterohybrids, this reaction was directly connected to a thermal cycle profile (94°C × 40 s, 68°C × 50 s, and 72°C × 1 min for 14–24 cycles). To reduce the complexity of a DNA pool, the PCR product obtained was diluted 100- to 1,000-fold, and different combinations of primers [HeP1N (or HeP1) and HeP2N (or HeP2)] were used in a second set of PCRs for subdivision.

cDNA Single Nucleotide Variation Scanning.

We prepared a pool of oligo(dT)-primed cDNA from the oligo(dT) cellulose-purified human lymphoblastoid cell mRNA according to the protocol described in the GIBCO/BRL kit. The cDNA was then digested with Sau3AI in the provided buffer at 37°C for 3 h, purified with phenol-chloroform, and precipitated with 1/5 vol 3 M NaAc, pH 5.0 and 2.5 vol ethanol.

A 5-μg digested cDNA pool was divided into two halves, which were separately ligated to the adapters AdHeA and AdHeB. The two constructs were mixed, denatured, and reannealed, and the ends were filled in as described above. The filling-in reaction was stopped by adding 20 mM EDTA to the mixture. After preincubating 20 μg of HaeIII-digested human genomic DNA with beads immobilizing hTDG or mTDG in TDG binding buffer at room temperature for 4 h, the cDNA sample treated above was added to the mixture. The subsequent separation of Mm fragments from Pm fragments was accomplished as described. The whole pool or specific subsets of the heterohybrids of both Mm and Pm fragments were separately amplified and displayed in parallel on an 8% sequencing gel (21).

The candidate Mm fragment bands were recovered by needle puncture and amplified with the same primers used to generate the sample for gel display. The PCR products were digested with Sau3AI and cloned into pBluescript II SK(−). Inserts of single clones were PCR-amplified with the flanking primers derived from the vectors and transferred to duplicate nylon membranes (Hybond-N+). The pools of Mm and Pm output cDNA fragments from the treatment were separately α-P32-labeled via PCR and separately hybridized to the membranes.

Results

Outline of the Strategy.

If two pools of DNA are digested with a restriction endonuclease, mixed, denatured, and reannealed, the result will be a mixture of Pm DNA duplexes and duplexes containing one or more mismatched base pairs (Mm). Separation in bulk of the Pm from the Mm fragments, followed by selective recovery of heterohybrids or one of two homohybrids would permit detection both of DNA sequence variants in each pool and sequence variants that distinguished the sequences in one pool from those in the other (Fig. 1). Initial efforts to accomplish this separation with the MutS and MutL components were unsatisfactory. We therefore explored the use of TDGs for this purpose.

Figure 1.


Outline of the strategy for global scanning of single nucleotide variations between two pools of DNA duplex. Two pools of DNA duplexes, a and b, were ligated to two different sets of adapters. AA or BB represent DNA duplexes in which both strands come from the same sources, or homohybrids. AB represents DNA duplexes in which the two strands came from different pools, or heterohybrids. Pm represents Pm DNA duplexes. Mm indicates DNA duplexes with one or more mismatched base pairs. ▵ indicates a mismatched base pair.

Producing Two TDGs and Testing Their Activity and Specificity.

Both hTDG and mTDG were prepared as GST fusion proteins and purified. Both enzymes were stable for up to 2 years when stored at −80°C in either a soluble form or attached to glutathione beads in the storage buffer given in Materials and Methods.

Initially we measured the activity and specificity of both enzymes in the soluble form. DNA cleavage assays showed that, under our optimized conditions, either enzyme could be used to quantitatively remove synthetic DNA duplexes containing a single mismatched G/T base pair, regardless of the flanking base pair (an mTDG cleavage assay is shown in Fig. 2A). Also, the TDGs could largely destroy duplexes containing an internal loop of two or three mismatched base pairs containing G/T and a duplex containing an inserted thymidine (Ins.T). Further studies showed that the enzymes, when used in sufficient amounts and under optimized conditions, could attack a variety of other mismatched base pairs but did not attack any Pm DNA. The substrate specificities of the two enzymes overlapped in some cases but not in all (Table 1). Furthermore, neither enzyme was found to recognize the 2- or 3-nt insertion tested (i.e., Ins.GT or Ins.GTG), or any other single nucleotide insertion (Ins.A, Ins.G, or Ins.C). Further experiments showed that the two enzymes could be used to selectively bind Mm DNA fragments in gel mobility-shift assays under optimized conditions (an hTDG binding assay is shown in Fig. 2B). These results encouraged us to explore using immobilized enzyme to physically separate Mm from Pm fragments in a complex mixture.

Figure 2.


Soluble TDGs specifically recognized DNA duplex containing mismatched base pairs. Some 0.2 pmol 5′ P32-labeled Bo60 was annealed with 1.0 pmol To60 to form 60-bp Pm or 60-bp Mm with various mismatched base pairs. (A) mTDG-mediated cleavage of mismatch duplexes with different 5′ neighbor nucleotides or a duplex containing two successive mismatches. Sixty-base pair Pm (60 Pm) and three 60-bp Mm duplexes containing respectively CG/GT, AG/TT, and AG/GT were incubated separately with purified mTDG (1.5 μl × 1.4 μg/μl) and assayed by cleavage of DNA at the abasic sites. (B) hTDG specifically bound mismatch duplex in optimized conditions. Sixty-base pair Pm (60 Pm) and 60-bp Mm (60 Mm CG/GT) duplexes were incubated separately with purified hTDG (1.5 μl × 2.0 μg/μl) in the TDG gel-shift buffer. The highest ratio of Mm/Pm was reached at 200 mM NaCl.

Table 1.

The four groups of single nucleotide mismatches were fully covered by using two TDGs in combination and some other subtle mismatches were also recognized

Mismatches hTDG mTDG Frequencies
G/T and A/C G/T*** G/T*** 62.5%
C/T and A/G C/T** A/G*, C/T* 20.1%
G/G and C/C G/G** 10.9%
T/T and A/A T/T** T/T* 6.5%
Ins. T Ins. T**
AG/GT * *
GT/TG *** ***
GGA/TGA *** ***
Relative sensitivity of TDG-mediated cleavage: *** > ** > *; —, no recognition.

Frequencies are those of each group of nucleotide variation in the coding regions of human genes, based on the data reported by M. Krawczak (24). No recognition of Ins. A, Ins. G, Ins. C, Ins. GT, or Ins. GTG was found, and these insertions are not listed.

Immobilized TDG Mediated Specific Enrichment of Mm DNA Duplexes and Pm DNA Duplexes.

A synthetic DNA duplex containing a single internal G/T mismatch was labeled on the DNA strand containing the G, that is, the strand that would not be attacked by TDGs. This duplex was mixed with a labeled Pm duplex, and the mixture was subjected to the process of separation and enrichment of Mm and Pm fragments. As shown in Fig. 3, several hundredfold enrichment of Mm or Pm DNA duplexes could be obtained with appropriate conditions. The estimated fold enrichment is only approximate because of the difficulty in quantitating the very small amounts of radioisotope signal of residual Mm or Pm duplex in the background.

Figure 3.


Immobilized TDGs specifically enriched Mm fragments or Pm fragments under different conditions. (A) Gel display. C, The input mixture control containing a 76-bp mismatch oligonucleotide duplex (76 Mm CG/GT) and a 60-bp perfect oligonucleotide duplex (60 Pm); F, the initial flow-through or supernatant portion; W1 and W2, the first and second washing supernatants off the beads with buffer containing 100 mM and 300 mM NaCl, respectively; E1 and E2, the first and second eluates, with the elution buffer containing 30 mM and 50 mM MgCl2, respectively. See detailed procedure in Materials and Methods. (B) Pm enrichment with hTDG or mTDG. (C) Mm enrichment with hTDG.

Larger DNA fragments showed some nonspecific affinity for the beads. To test the effectiveness of the enrichment in the presence of larger fragments, we labeled an enzymatic digest of a plasmid, MspI-digested pBR322, containing fragments up to 622 bp in length. This digest was mixed with a labeled synthetic 60-bp duplex containing a mismatched base pair (G/T) and applied to the immobilized mTDG beads. The mismatch-enriched eluate showed 10- to 30-fold enrichment of the shorter Mm fragment compared with the larger fragments from the plasmid. Conversely, the Mm fragment was relatively depleted from the initial flow-through portion, so that the Mm-to-Pm ratio in the eluate was 100- to 900-fold higher than the ratio in the flow-through (Fig. 4). Additional cycles of treatment of the Pm and/or Mm output could further increase the fold enrichment (data not shown). In several experiments, the addition of a 25-fold excess of unlabeled digested genomic DNA as a carrier did not affect the capture of Mm fragments and, if anything, improved the specificity.
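The comparison of eluate and flow-through described above is a ratio of ratios. As a minimal sketch of that arithmetic, the function below divides the Mm/Pm signal ratio in the eluate by the Mm/Pm ratio in the flow-through. The function name and the band intensities are hypothetical illustration values, not measurements from this study.

```python
def fold_enrichment(mm_eluate, pm_eluate, mm_flowthrough, pm_flowthrough):
    """Ratio of the Mm/Pm signal ratio in the eluate to that in the flow-through."""
    for value in (pm_eluate, mm_flowthrough, pm_flowthrough):
        if value <= 0:
            raise ValueError("signals used as denominators must be positive")
    eluate_ratio = mm_eluate / pm_eluate
    flowthrough_ratio = mm_flowthrough / pm_flowthrough
    return eluate_ratio / flowthrough_ratio

# Hypothetical band intensities (arbitrary units), for illustration only.
example = fold_enrichment(mm_eluate=800, pm_eluate=40,
                          mm_flowthrough=25, pm_flowthrough=200)
print(example)  # (800/40) / (25/200) = 20 / 0.125 = 160.0
```

Because the residual band in the depleted fraction is the denominator of one of the two ratios, small errors in quantitating a faint signal translate into large changes in the computed fold enrichment, which is why such estimates are only approximate.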

Figure 4.


Enrichment of a 60-bp Mm DNA duplex from a mixture containing a wide range of sizes of DNA duplexes. MspI-restricted pBR322 DNA was 3′ P32-labeled with Taq polymerase at 72°C for 5 min in the presence of α-P32 dCTP and dGTP. A 60-bp oligonucleotide duplex, 60 Mm CG/GT, was 5′ P32-labeled with T4 polynucleotide kinase on the mismatched G-containing strand (Bo60). The mixture containing these labeled DNA fragments was loaded onto the mTDG beads, washed, and eluted as described in Materials and Methods. C, The input mixture control. The description of F, W1, W2, E1, and E2 is the same as that in Fig. 3.

Specific Recovery of the Heterohybrid DNA Duplexes from the Mixture of Two Original DNA Sources That Are Mixed and Denatured/Reannealed.

For several applications it would be desirable to selectively recover DNA heterohybrids or homohybrids. We explored several methods to do this. The principle and selectivity of the current design are shown in Fig. 5. Only the duplexes with different adapters on the two ends, which are filled in before PCR, are amplifiable, whereas DNA strands with both ends from the same component of the mixture cannot be amplified because of competitive head-to-tail self-annealing of the DNA during PCR. In addition, the recovered fragments showed a satisfactory size range, from 100 to 800 bp.
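The selection rule stated above can be written as a small model. The sketch below rests on idealized assumptions that are not measurements from this work: strands from the two adapter-tagged pools are taken to reanneal at random, and the suppression effect is treated as all-or-none. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def reannealed_classes(frac_a=Fraction(1, 2)):
    """Expected shares of AA, AB, and BB duplexes under random strand pairing.

    `frac_a` is the share of strands carrying adapter AdHeA; the rest carry
    AdHeB. Random pairing is an idealization, not a result from the text.
    """
    share = {"A": frac_a, "B": 1 - frac_a}
    classes = {"AA": Fraction(0), "AB": Fraction(0), "BB": Fraction(0)}
    for s1, s2 in product("AB", repeat=2):
        key = "".join(sorted(s1 + s2))  # "BA" and "AB" are the same duplex class
        classes[key] += share[s1] * share[s2]
    return classes

def exponentially_amplified(duplex):
    """Rule from the text: after fill-in, only duplexes carrying different
    adapters on their two ends are amplified with primers HeP1 + HeP2."""
    return duplex[0] != duplex[1]

classes = reannealed_classes()
for name, share in classes.items():
    print(name, share, exponentially_amplified(name))
```

With equal input from the two pools, this idealized model puts half of the reannealed duplexes in the heterohybrid class (AB), and only that class passes the amplification rule.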

Figure 5.


Selective amplification of heterohybrids with the heterohybrid-orientating strategy. (A) Schematic diagram of the strategy. The fragments with only one end linked to an adapter or neither of the two ends ligated to an adapter are not illustrated here, because they are not exponentially amplifiable. Dotted pattern represents the outer part of adapter AdHeA, from which the primer HeP1 sequence is derived; dotted lines represent the inner part of adapter AdHeA, from which the subsetting primer HeP1N sequence is derived. Solid rectangles represent the outer part of adapter AdHeB, from which the primer HeP2 sequence is derived; open rectangles represent the inner part of adapter AdHeB, from which the subsetting primer HeP2N sequence is derived. (B) The adenovirus vector pAdEasy-1 and a human lymphoblastoid cell cDNA pool were separately digested with Sau3AI and ligated to adapter AdHeA (lanes 1) and AdHeB (lanes 2). The two constructs ligated with different adapters were mixed together, denatured, and reannealed to form heterohybrids (lanes 3). After filling in the ends and purification, the samples were recovered by PCR with primers HeP1 and HeP2. The products were displayed on an 8% sequencing gel.

Enrichment and Gel Display Analysis of Heterozygous Loci of a Human Lymphoblastoid Cell cDNA Pool.

We enriched and analyzed the heterozygous loci of a human lymphoblastoid cell cDNA pool. Allelic variations in the coding sequences of this diploid cell would show up as Mm fragments. The cDNA was digested with Sau3AI, and heterohybrid-orientating adapters were added as described above. After treatment with immobilized hTDG, the enriched pools of Pm and Mm fragments were amplified with a radioactive DNA primer, and the resulting fragments were fractionated by electrophoresis on an 8% acrylamide sequencing gel (Fig. 6A1). The complexity of the mixture of fragments was also reduced in test runs by using DNA primers whose 3′ end extended beyond the Sau3AI recognition sequence by one or two specifically chosen bases (Fig. 6A2). Because Mm and Pm DNA pools were amplified to the same final yield, whereas the Mm fragments are fewer in number than the Pm fragments, the relatively high intensity of bands in the Mm lanes does not reflect an absolute enrichment for these fragments.

Figure 6.


Gel display and Southern blot analysis of Mm fragments of a human cDNA pool. After an immobilized hTDG-mediated separation of Mm and Pm from a human lymphoblastoid cell cDNA pool, the resulting pools of Mm and Pm were subjected to selective heterohybrid amplification. (A1) The cDNA pool recovered by PCR with primers HeP1 × HeP2 (20 cycles). (A2) Two subsets with decreasing coverage. The output of the first PCR with HeP1 × HeP2 (14 cycles) was diluted 100-fold, and 2 μl was taken as the input for the second PCR amplification with subdividing primers HeP1C × HeP2 or HeP1CA × HeP2 (20 cycles). Numbers (1, 2, 3, 4, 5) are different candidate Mm bands. (B) PCR-Southern blot test of the candidate Mm gene fragments indicated in A2. The marker (*) shows that the clone's hybridization signal is significantly stronger on the membrane hybridized with the Mm probe than on that hybridized with the Pm probe. The marker (+) refers to the clone (217 bp) derived from band 3. The (1) and (2) after Actin β and UbC indicate two different fragments of a gene.

Approximately 40 bands that appeared to be enriched in the Mm portion of the cDNA fragments were recovered from the gels by PCR. Because these fragments had mirror orientation to adapters, they could not be sequenced directly but had to be subcloned. On average, four clones from each band were sequenced. The resulting fragments included DNA sequences from the MHC class I and class II alleles, a number of ribosomal proteins, mitochondrial DNAs, ubiquitins, and actins (70%), and a variety of single copy genes (30%). Several different types of fragments were often recovered from a single band, both because of the complexity of the cDNA mixtures and the impurity of the gel-excised bands. To analyze the enrichment of Mm fragments, individual cloned fragments were recovered by PCR and subjected to Southern blotting with probes prepared from either the total Mm or total Pm DNA pools (Fig. 6B). The result showed that a large fraction of the fragments were enriched in the Mm pool, although the degree of enrichment varied over a fairly wide range.

Discussion

In the present work, we demonstrate the ability of our approach to globally screen an entire cDNA pool with a quick and high-throughput procedure for detecting single nucleotide variations that may affect the function of, or associate with, the gene(s) responsible for a phenotype. This process promises to be a powerful tool for global screening of single nucleotide variations in any cDNA pool. The cDNA pool used can be derived from either a single person or a group of persons. The single nucleotide variation information obtained can be from either a single pool or between two pools. Similarly, the identical DNA fragments shared within one pool or between two pools can also be obtained. This approach may be useful in analyzing the genetic components of a multifactorial disorder in which the affected cells or tissues are localized. Further, it provides a conceptual pathway to reach the final goal of whole genome screening for a multifactorial disorder in humans.

The TDGs have certain properties that make them especially suitable for large-scale detection of sequence variation (13–18, 25). After purification they show a high degree of specificity for mismatched bases and can hydrolyze a variety of different mismatched normal DNA bases. We found that all four groups of single nucleotide mismatches and a few other subtle nucleotide mismatches are covered by using mTDG and hTDG in combination (Table 1). In addition, our data showed that after base hydrolysis the enzymes exhibit a high affinity for abasic sites in the absence of magnesium, and specifically and efficiently release the abasic sites in the presence of enough magnesium (Fig. 3), although the dissociation ratio may vary among different bases opposite the abasic site within a limited concentration of magnesium (17). Finally, the enzymes are sufficiently stable for long-term storage as immobilized products. While this work was in progress, Chakrabarti et al. (26) reported that a different class of glycosylase (MutY), when combined with the treatment of aldehydes, could enrich A/G mismatch.

Distinguishing genetic variations between two populations from the variations within one population is essential for dissecting the genetic components of any phenotype of an organism exhibiting a high level of intrapopulation polymorphism. Among a number of possible methods we designed, the strategy we demonstrate here offers a simple procedure with satisfactory selectivity for separating heterohybrids from homohybrids after the two DNA pools are denatured and reannealed. When this method is applied to two halves of one common DNA pool as presented here, the recovered products actually represent homohybrids. When four different adapters are separately ligated to four halves of two original DNA pools a and b (two halves for original pool a and two halves for original pool b) followed by denaturing and reannealing, either heterohybrids (AB) or one of the two homohybrids (AA or BB) can be selectively recovered from a DNA mixture by selective use of the PCR primer pair corresponding to the related set of adapters. In addition, the strategy we used here for subdividing an entire DNA mixture into subsets with reduced complexity will enable us to more efficiently analyze a complex DNA resource; a full set of subsets may cover all of the components of the DNA pool analyzed.

Although we fragmented the long DNA fragments here with Sau3AI, other restriction enzymes may be used. However, restriction with any single endonuclease would generate some fragments too small for reannealing and others too large for effective PCR amplification in a competitive situation; thus, maximum coverage should be obtained by using several enzymes.
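The fragment-size argument can be made concrete with a small in silico digest. The following Python sketch reports which positions of a sequence fall in fragments within a usable size window, for one enzyme or for several separate digests combined. Sau3AI cuts immediately 5' of its GATC site; the second enzyme (AluI, AG^CT), the size limits, the toy sequence, and the function names are illustrative assumptions, not parameters taken from this work.

```python
# Each enzyme: (recognition site, cut offset on the top strand).
# Both sites are palindromic, so scanning the top strand is sufficient.
SAU3AI = ("GATC", 0)  # ^GATC
ALUI = ("AGCT", 2)    # AG^CT; illustrative second enzyme, not used in the paper


def digest(seq, enzyme):
    """Return half-open (start, end) intervals of fragments of a linear sequence."""
    site, offset = enzyme
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop empty fragments that arise when a cut falls at either end.
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


def covered_positions(seq, enzymes, min_len, max_len):
    """Positions lying in at least one fragment of usable length.

    Each enzyme is applied in a separate digest; min_len and max_len are
    hypothetical limits standing in for 'too small to reanneal' and
    'too large to amplify'.
    """
    covered = set()
    for enzyme in enzymes:
        for start, end in digest(seq, enzyme):
            if min_len <= end - start <= max_len:
                covered.update(range(start, end))
    return covered


if __name__ == "__main__":
    toy = "TTAGCTTTGATCTTTTTTTTGATCAGCTTT"  # toy sequence for illustration only
    one = covered_positions(toy, [SAU3AI], 10, 25)
    both = covered_positions(toy, [SAU3AI, ALUI], 10, 25)
    print(len(one), len(both), len(toy))
```

In this toy case the Sau3AI digest alone leaves its first fragment below the size window, and the second digest recovers part of that region, which is the reason for combining enzymes.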

In this article, we used gel display and Southern blot hybridization to analyze the results. For large-scale application, tiling arrays of unique genomic fragments or cDNA fragments combined with the presented technique are expected to detect the single nucleotide variations in cDNA or the whole genome in a high-throughput fashion.

When this approach is applied to human genomic DNA, some further development may be required. For many fragments in a cDNA pool, individual species of DNA fragments are present at high enough concentration for efficient reannealing. For application to unique sequences of mammalian genomic DNA, some prior enrichment of DNA subsets would be desirable, whether by physical separation or enzymatic subdivision of groups of DNA fragments, or by selection by annealing to cloned segments of genomic DNA such as yeast artificial chromosome or bacterial artificial chromosome clones.

We noticed that DNA duplexes formed by annealing of strands from different members of a family of closely related genes, such as the MHC class I genes, tended to be recovered. Ongoing improvements to the method may considerably reduce this complication.

We envision several other applications for approaches of the type we describe here. In studies of mutations in malignant tissues, areas of loss of heterozygosity would appear as regions in which there is no detectable polymorphism. Similarly, this type of method might be used to detect somatic single base mutations arising in malignancies. In these cases no simple genetic method is applicable for mutation detection.

In principle, this approach could also be very useful for detection of mutations or polymorphisms arising in inbred model organisms including mice. One limitation in applying it to mice generated in current large-scale mutagenesis studies is that the mutagenesis, for example with N-ethyl-N-nitrosourea, may generate too many irrelevant mutations to permit screening of first-generation mice. In addition, when mouse hybrids are mutated or generated by breeding, it will be necessary to distinguish strain variations from relevant induced mutations.

Finally, the type of analyses suggested here is particularly attractive for certain human genetic variants. Sporadic dominant mutations that prevent fertility, for example, cannot be detected by genetic or positional cloning methods, but at least in principle might be detectable by procedures permitting whole-genome scanning for mutations. Familial recessive mutations might also be identified by detecting regions of loss of heterozygosity. Detection of recurring single nucleotide polymorphisms with affected individuals might assist in mapping familial dominant mutations. Combined with quantitative microarray hybridization and DNA pooling strategies, which can be used for allele-frequency estimation, this approach could be used to screen the single nucleotide variations in terms of allele frequency distortion for a given phenotype. Therefore, this approach might be developed into a valuable tool for mapping complex human diseases.

Acknowledgments

We thank Dr. David C. Ward (Yale School of Medicine) for critical reading and insightful comments about the manuscript. We thank Dr. Roger Lasken (Molecular Staging Inc.) for management throughout this project. We also thank Chenhao Fan for his assistance in gel display and sequence data analysis, Hai Huang and Wesley Bonds for preparing the human cDNA pool, and Barbara Gramenos for helping edit the manuscript. This work received financial support from National Cancer Institute Grants R21 CA088326 and R33 CA88326.

Abbreviations

TDG, thymine (thymidine) DNA glycosylase

hTDG, human TDG

mTDG, Methanobacterium thermoautotrophicum DNA mismatch N-glycosylase

Pm, perfectly matched

Mm, mismatch-containing

GST, glutathione S-transferase

References

1. Gray I C, Campbell D A, Spurr N K. Hum Mol Genet. 2000;9:2403–2408. doi: 10.1093/hmg/9.16.2403.
2. Hirschhorn J N, Lohmueller K, Byrne E, Hirschhorn K. Genet Med. 2002;4:45–61. doi: 10.1097/00125817-200203000-00002.
3. Jonsson J J, Weissman S M. Proc Natl Acad Sci USA. 1995;92:83–85. doi: 10.1073/pnas.92.1.83.
4. Risch N, Merikangas K. Science. 1996;273:1516–1517. doi: 10.1126/science.273.5281.1516.
5. Taylor J G, Choi E H, Foster C B, Chanock S J. Trends Mol Med. 2001;7:507–512. doi: 10.1016/s1471-4914(01)02183-9.
6. Syvanen A C. Nat Rev Genet. 2001;2:930–942. doi: 10.1038/35103535.
7. Cheung V G, Gregg J P, Gogolin-Ewens K J, Bandong J, Stanley C A, Baker L, Higgins M J, Nowak N J, Shows T B, Ewens W J, et al. Nat Genet. 1998;18:225–230. doi: 10.1038/ng0398-225.
8. McAllister L, Penland L, Brown P O. Genomics. 1998;47:7–11. doi: 10.1006/geno.1997.5083.
9. Nelson S F, McCusker J H, Sander M A, Kee Y, Modrich P, Brown P O. Nat Genet. 1993;4:11–18. doi: 10.1038/ng0593-11.
10. Mirzayans F, Walter M A. Methods Mol Biol. 2001;175:37–46. doi: 10.1385/1-59259-235-X:037.
11. Lisitsyn N, Wigler M. Science. 1993;259:946–951. doi: 10.1126/science.8438152.
12. Wallrapp C, Gress T M. Methods Mol Biol. 2001;175:279–294. doi: 10.1385/1-59259-235-X:279.
13. Horst J P, Fritz H J. EMBO J. 1996;15:5459–5469.
14. Hardeland U, Bentele M, Lettieri T, Steinacher R, Jiricny J, Schar P. Prog Nucleic Acid Res Mol Biol. 2001;68:235–253. doi: 10.1016/s0079-6603(01)68103-0.
15. Neddermann P, Jiricny J. J Biol Chem. 1993;268:21218–21224.
16. Neddermann P, Gallinari P, Lettieri T, Schmid D, Truong O, Hsuan J J, Wiebauer K, Jiricny J. J Biol Chem. 1996;271:12767–12774. doi: 10.1074/jbc.271.22.12767.
17. Waters T R, Gallinari P, Jiricny J, Swann P F. J Biol Chem. 1999;274:67–74. doi: 10.1074/jbc.274.1.67.
18. Waters T R, Swann P F. J Biol Chem. 1998;273:20007–20014. doi: 10.1074/jbc.273.32.20007.
19. Vaughan P, McCarthy T V. Genet Anal. 1999;14:169–175. doi: 10.1016/s1050-3862(98)00025-4.
20. Bazar L S, Collier G B, Vanek P G, Siles B A, Kow Y W, Doetsch P W, Cunningham R P, Chirikjian J G. Electrophoresis. 1999;20:1141–1148. doi: 10.1002/(SICI)1522-2683(19990101)20:6<1141::AID-ELPS1141>3.0.CO;2-7.
21. Prashar Y, Weissman S M. Methods Enzymol. 1999;303:258–272. doi: 10.1016/s0076-6879(99)03017-7.
22. Matz M, Usman N, Shagin D, Bogdanova E, Lukyanov S. Nucleic Acids Res. 1997;25:2541–2542. doi: 10.1093/nar/25.12.2541.
23. Diatchenko L, Lau Y F, Campbell A P, Chenchik A, Moqadam F, Huang B, Lukyanov S, Lukyanov K, Gurskaya N, Sverdlov E D, Siebert P D. Proc Natl Acad Sci USA. 1996;93:6025–6030. doi: 10.1073/pnas.93.12.6025.
24. Krawczak M, Ball E V, Cooper D N. Am J Hum Genet. 1998;63:474–488. doi: 10.1086/301965.
25. Mol C D, Parikh S S, Putnam C D, Lo T P, Tainer J A. Annu Rev Biophys Biomol Struct. 1999;28:101–128. doi: 10.1146/annurev.biophys.28.1.101.
26. Chakrabarti S, Price B D, Tetradis S, Fox E A, Zhang Y, Maulik G, Makrigiorgos G M. Cancer Res. 2000;60:3732–3737.

Articles from Proceedings of the National Academy of Sciences of the United States of America are provided here courtesy of National Academy of Sciences
