Abstract
Allele-specific silencing using siRNAs targeting heterozygous single-nucleotide polymorphisms (SNPs) is a promising therapy for trinucleotide repeat diseases, such as Huntington’s disease (HD). Linking SNP identities to the two huntingtin (HTT) alleles—normal and disease-causing—is a prerequisite for allele-specific RNAi. Here, we describe a method, SNP linkage by circularization (SLiC), to identify the linkage between CAG repeat length and the nucleotide identity of heterozygous SNPs using HD patient peripheral blood samples.
At least ten dominant neurodegenerative diseases are caused by expansion of CAG trinucleotide repeats, creating proteins containing long, toxic polyglutamine repeats. In theory, all of these diseases could be treated by using siRNAs to selectively silence the allele containing the expanded CAG repeat. Huntington’s disease (HD) is caused by expansion of CAG trinucleotide repeats within the first exon of the HTT gene. Most HD patients carry one normal allele, bearing 6 to ~35 CAG triplets (average, 18), and a mutant, disease-causing allele containing >36 CAG triplets (average, 42)1,2. The resulting mutant HTT protein causes selective loss of neurons, whereas the normal allele is essential for both development and neurogenesis. Half the wild-type amount of HTT protein suffices for normal development and neuronal function in an adult3–6. Because siRNAs targeting the CAG repeat cannot discriminate between the mutant and normal alleles7, we have proposed to silence the mutant allele using siRNAs that can discriminate between alleles differing at a single nucleotide, that is, at a single-nucleotide polymorphism (SNP)8,9,10. Such an siRNA would fully match the disease allele, reducing its expression, but would mismatch with the nucleotide present at the SNP site in the normal HTT mRNA9,11.
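The repeat-length ranges quoted above can be illustrated with a short sketch. It is not part of the published method: the function names are hypothetical, and the cutoffs simply restate the ranges given in the text (up to ~35 triplets for normal alleles, more than 36 for disease-causing alleles). A count of exactly 36, which the text does not assign to either class, is labeled "borderline" here as an assumption.

```python
import re


def longest_cag_run(seq: str) -> int:
    """Return the number of triplets in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_allele(n_cag: int) -> str:
    """Classify an allele using the ranges quoted in the text (illustrative only)."""
    if n_cag > 36:
        return "expanded"
    if n_cag <= 35:
        return "normal"
    return "borderline"  # exactly 36: not assigned in the text, assumption here


# Toy sequences built around the average repeat lengths quoted in the text.
normal_like = "ATG" + "CAG" * 18 + "CCG"
mutant_like = "ATG" + "CAG" * 42 + "CCG"
print(longest_cag_run(normal_like), classify_allele(longest_cag_run(normal_like)))
print(longest_cag_run(mutant_like), classify_allele(longest_cag_run(mutant_like)))
```

In practice repeat length is sized experimentally; the sketch only makes the numeric ranges concrete.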
Linkage of specific isoforms of a heterozygous SNP to the expanded CAG repeat is the foundation for the design of disease-allele-specific siRNAs for an individual HD patient. The standard method for linking SNPs to inherited, dominant disease-causing genes is pedigree analysis, which requires molecular testing of close relatives of the patient. For HD, these are likely to be the unaffected parent and at-risk siblings or children. In the United States, fears of loss of health insurance by at-risk relatives14 and uncertainty about paternity complicate pedigree testing. The method described here—SNP linkage by circularization (SLiC)—avoids these complications because only the HD patient need be genetically tested.
The HTT locus is large, spanning 180 kb on the short arm of chromosome 4, and produces two mRNA isoforms, 10 and 13 kb, by alternative splicing4. There are 16 SNP sites in the normal HTT mRNA with significant heterozygosity (Supplementary Table 1). If the rates of heterozygosity are similar in HD patients, the vast majority will have at least one heterozygous SNP and could be treated by at least one of the pairs of siRNAs. However, most candidate SNP sites are in exons 25, 29, 39, 47, 48, 50, 57, 60, 61 and 67, thousands of base pairs distal to the CAG expansion (Fig. 1a). Thus, the challenge to identify linkage between a particular SNP isoform and the expanded CAG repeat is the length of the HTT mRNA.
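The claim that most patients will carry at least one heterozygous SNP can be made concrete with a small calculation. The sketch below assumes, for simplicity, that heterozygosity at each site is independent; SNPs within one gene are often in linkage disequilibrium, so this is an upper-bound style estimate rather than the authors' analysis. The rates in the example are hypothetical placeholders, not the values from Supplementary Table 1.

```python
from math import prod


def prob_at_least_one_het(het_rates):
    """P(at least one heterozygous site) = 1 - product of (1 - h_i).

    Assumes the sites are independent, which is a simplification.
    """
    return 1.0 - prod(1.0 - h for h in het_rates)


# Hypothetical heterozygosity rates for a panel of 16 SNP sites.
example_rates = [0.3] * 16
print(f"{prob_at_least_one_het(example_rates):.4f}")

# Two sites, each heterozygous in half of individuals.
print(prob_at_least_one_het([0.5, 0.5]))  # 0.75
```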
Figure 1.
A strategy for linking SNP identity to CAG repeat length, (CAG)n. (a) The structure of the HTT mRNA, with the positions of SNP sites and their distance to the CAG repeats. E, exon. (b) The SLiC strategy to identify the linkage of distant SNPs to CAG repeat length. Red, CAG repeats; blue, SNP sites; green, KasI restriction sites; arrows, PCR primers. (c) PCR amplification of the full-length cDNA generated by reverse transcription using primers in exon 1. This amplicon spans the CAG repeats. The mutant alleles (upper bands) migrate more slowly than the normal allele. Lanes 1 to 7 correspond to PCR products from different HD patient samples. (d) The full-length cDNA was amplified by long-range PCR using a common 5′ primer flanking the CAG repeats and a 3′ primer distal to each SNP site interrogated. Lane 1, exon 1 to exon 25; lane 2, exon 1 to exon 29; lane 3, exon 1 to exon 39; lane 4, exon 1 to exon 47; lane 5, exon 1 to exon 48; lane 6, exon 1 to exon 50; lane 7, exon 1 to exon 57; lane 8, exon 1 to exon 60; lane 9, exon 1 to exon 61; lanes 10 and 11, exon 1 to the 3′ UTR in exon 67. Arrowheads indicate the expected long-range PCR products. The products from the two different CAG length alleles co-migrate. (e) Ligation products of KasI-digested cDNA spanning exon 1 and exon 29. Ø, the cDNA incubated without E. coli ligase. L1, the ligation products of a reaction using 10 ng/μl KasI-digested cDNA; L2, 1 ng/μl; L3, 0.1 ng/μl. Arrowhead indicates circularized cDNA that migrates faster than linear cDNA of the same length. (f) Inverse PCR products using as template the circular ligation products from a two-fold dilution series of KasI-digested cDNA. The products fuse exon 29 (containing the SNP site) to exon 1 (containing the CAG repeats) and were readily separated by agarose gel electrophoresis. Lanes 1 to 7 correspond to decreasing amounts of cDNA used during the circularization step, ranging from 1 ng/μl to 0.016 ng/μl. Note that 0.03 ng/μl was the lowest concentration of KasI-digested cDNA that permitted subsequent inverse PCR. (g) Representative inverse PCR products generated from HD patient samples.
Our strategy juxtaposes the CAG repeats and the distant SNP site by first circularizing the DNA to produce a single PCR amplicon that can be readily sequenced (Fig. 1b). Total RNA was extracted from HD patient tissue samples; reverse-transcribed using a primer flanking the SNP site to be interrogated; and then amplified as cDNA by long-range PCR using a pair of primers, one 5′ to the CAG repeats and one 3′ to the SNP. The long, rare cDNA generated by PCR was then circularized by intramolecular ligation, and a small amplicon comprising the CAG repeat juxtaposed to the SNP site generated by inverse PCR. This final amplicon is sufficiently small that the PCR products from the normal and disease-causing alleles could be readily separated by agarose gel electrophoresis according to their CAG repeat lengths. Direct sequencing of each product identified the SNP–(CAG)n linkage.
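The logic of the strategy can be followed in a toy in silico model. The sketch below is only an illustration under stated assumptions: the primer-site sequences, the spacer, and all function names are hypothetical, the KasI junction is omitted, and "separation by gel" is modeled simply as sorting products by length. It shows why circularization followed by inverse PCR yields a short product in which the SNP base and the CAG repeat sit side by side.

```python
FWD_SITE = "ACGTTGCA"  # hypothetical primer site just upstream of the SNP
REV_SITE = "TTGACCGT"  # hypothetical primer site just downstream of the CAG repeat


def linear_cdna(n_cag: int, snp_base: str, spacer_len: int = 3000) -> str:
    """Toy long-range PCR product: CAG repeat near the 5' end, SNP near the 3' end."""
    exon1 = "CC" + "CAG" * n_cag + REV_SITE
    distal = FWD_SITE + "G" + snp_base + "G"
    return exon1 + "AT" * (spacer_len // 2) + distal


def inverse_pcr(linear: str) -> str:
    """Product amplified from the circle formed by joining the two ends.

    Reading from the forward primer across the ligation junction to the
    reverse primer skips the long spacer entirely.
    """
    start = linear.index(FWD_SITE)
    end = linear.index(REV_SITE) + len(REV_SITE)
    return linear[start:] + linear[:end]


def snp_linkage(alleles):
    """alleles: two (n_cag, snp_base) pairs. Shorter product = normal allele."""
    products = sorted((inverse_pcr(linear_cdna(n, b)) for n, b in alleles), key=len)
    snp_index = len(FWD_SITE) + 1  # position of the SNP base in the toy product
    return {"normal": products[0][snp_index], "mutant": products[1][snp_index]}


print(snp_linkage([(42, "C"), (18, "T")]))
```

In this model the two products differ in length only by their CAG repeats (3 nucleotides per triplet), which is the property the method relies on to resolve the alleles before sequencing.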
HTT is ubiquitously expressed in human tissues, including peripheral blood lymphocytes12. We used HD patient blood samples to validate the SLiC method. Lymphocytes were purified from HD patient blood, cultured for three to four days, and then full-length HTT first-strand cDNA was synthesized from the total RNA extracted from the amplified lymphocytes. To verify full-length cDNA production, we used the first-strand cDNA to PCR amplify a 180 bp region of exon 1 encompassing the CAG repeats. Exon 1 was successfully amplified from both alleles (Fig. 1c), establishing that our protocol yields full-length cDNA for both mutant and normal HTT mRNA. We also used the first-strand cDNA to PCR amplify across each candidate SNP site to establish which SNPs were heterozygous in each patient assayed. Direct sequencing of the PCR products established that the SNP heterozygosity separately deduced from cDNA and genomic DNA matched for all 11 sites examined (Supplementary Table 2).
Next, we performed long-range PCR to amplify a region of the cDNA beginning 5′ to the CAG repeats in exon 1 and ending 3′ to the candidate SNP site, a distance spanning 3.3 to 10.9 kbp for the SNP sites to be interrogated. The CAG repeats and the SNP site lie near the ends of this amplicon. The PCR primers used in this reaction introduce KasI restriction sites, which were subsequently used to join the ends of the PCR product. The PCR products were purified together from a 0.9% agarose gel (Fig. 1d).
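Using primer-introduced KasI sites to join the ends only works if the enzyme does not also cut inside the amplicon. A check of that kind might look like the sketch below, which scans a template region for the KasI recognition sequence GGCGCC. The function names and test sequences are hypothetical; because the site is palindromic, scanning one strand is sufficient.

```python
KASI_SITE = "GGCGCC"  # KasI recognition sequence (palindromic)


def find_sites(seq: str, site: str = KASI_SITE) -> list:
    """Return 0-based start positions of every occurrence of the site."""
    seq = seq.upper()
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def is_compatible(insert_seq: str) -> bool:
    """True if the region between the primers contains no internal KasI site."""
    return not find_sites(insert_seq)


# Hypothetical sequences for illustration only.
print(find_sites("AAGGCGCCTT"))          # [2]
print(is_compatible("ATGCAGCAGCAGCCG"))  # True
```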
Intramolecular circularization of the long-range PCR products is central to our method. KasI sites were used because no KasI restriction site lies in any of the large amplicons spanning exon 1 to the downstream SNP sites we interrogated. KasI digestion of the long-range PCR products created cohesive ends that were then joined in dilute solution using E. coli ligase. To find the optimal DNA concentration for circularization, KasI-digested cDNA (from exon 1 to exon 29 in Fig. 1d; 0.1 μg) was first diluted to 10 ng/μl, 1 ng/μl and 0.1 ng/μl and ligated using E. coli ligase under conditions that favor the formation of circles13. At 0.1 ng/μl, most of the cDNA migrated faster than expected for a linear monomer, indicating it was circular (Fig. 1e). A second round of optimization examined concentrations from 1 ng/μl to 0.016 ng/μl. The resulting intramolecular ligation products served as template for inverse PCR using a forward primer complementary to exon 29, upstream of the heterozygous SNP site, and a reverse primer in exon 1. The products from the two alleles were readily separated; as little as 0.03 ng/μl of KasI-digested cDNA sufficed to template inverse PCR (Fig. 1f). The inverse PCR products were gel purified and directly sequenced, providing the desired SNP-(CAG)n linkage information (Fig. 2). As a final improvement to the protocol, the KasI-digested, E. coli ligase-treated cDNA was incubated with Exonuclease V to degrade linear DNA, including undesirable intermolecular ligation products. To test the optimized SLiC protocol, we interrogated eight different SNP sites in nine HD patients (Fig. 1g, Supplementary Table 1, and Supplementary Table 3), establishing their linkage to the disease allele of HTT.
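The concentrations in the second round of optimization are consistent with a two-fold dilution series of seven steps starting at 1 ng/μl, as described in the legend to Fig. 1f. The sketch below reproduces that arithmetic; it assumes exact halving at each step, which is an inference from the reported endpoints rather than a stated protocol detail, and the function name is hypothetical.

```python
def twofold_series(start: float, steps: int) -> list:
    """Concentrations of a two-fold dilution series, highest first."""
    return [start / 2 ** i for i in range(steps)]


series = twofold_series(1.0, 7)  # ng/ul, seven lanes
for lane, conc in enumerate(series, start=1):
    print(f"lane {lane}: {conc:.3f} ng/ul")
```

Under this assumption the last two values are 0.03125 and 0.015625 ng/μl, which round to the 0.03 and 0.016 ng/μl quoted in the text.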
Figure 2.
Representative sequencing traces for inverse PCR products joining the (CAG)n to a SNP site in exon 29 or exon 67 (3′ UTR). At left, mutant HTT alleles from four HD patients. At right, normal HTT alleles for the corresponding patients.
Because our method requires only peripheral blood, it can be used as a diagnostic technique for patient-specific therapeutic RNAi, but the method is applicable to any tissue sample in which HTT is expressed. For example, we used the method on two frozen post-mortem brain samples to identify their SNP-(CAG)n linkages. SLiC avoids traditional cloning techniques, which are cumbersome for large mRNAs such as HTT and unlikely to prove robust in a clinical setting, and thereby reduces cost and effort. The method is rapid, allowing identification of allelic linkage between the CAG repeats and heterozygous SNP sites in less than two weeks (Supplementary Table 3). Pre-screening for heterozygosity by direct sequencing of PCR amplicons spanning a panel of SNPs should further streamline the process15. For patient tissue samples with multiple heterozygous SNP sites, the method can identify linkages among the heterozygous SNPs, providing haplotype information. The SNP-(CAG)n linkage for a patient also makes it possible to examine the individual expression of the two alleles. Thus, one could test whether the mutant, disease-causing mRNA accumulates differentially or whether the expression ratio of mutant to normal mRNA affects the age of HD onset or the speed of disease progression.
At least nine other neurodegenerative diseases are caused by CAG repeat expansion. Our method can be used to investigate (CAG)n-SNP linkage for these trinucleotide repeat diseases or for any two alleles of a gene that differ by insertion or deletion in which SNP sites of interest are distant from the site of mutation.
Supplementary Material
Acknowledgments
We thank Melvin Chan for advice on lymphocyte isolation and culture; Jürg Straubhauer and NIH Diabetes and Endocrinology Research Grant 5P30DK32520–25 for assistance with data analysis; and members of the Zamore and Aronin labs for helpful discussions, assistance, and comments on the manuscript. This work was supported by funding from the National Institutes of Health (NS38194 to NA, PDZ, MD; 1P01NS058793 to HDR, SH; and NINDS NS042861 to HDR), and CHDI (NA, PDZ).
References
- 1. MacDonald ME, Ambrose CM, Duyao MP, et al. Cell. 1993;72:971.
- 2. Snell RG, MacMillan JC, Cheadle JP, et al. Nat Genet. 1993;4:393. doi: 10.1038/ng0893-393.
- 3. Dixon KT, Cearley JA, Hunter JM, et al. Gene Expr. 2004;11:221. doi: 10.3727/000000003783992234.
- 4. Ambrose CM, Duyao MP, Barnes G, et al. Somat Cell Mol Genet. 1994;20:27. doi: 10.1007/BF02290678.
- 5. Duyao MP, Auerbach AB, Ryan A, et al. Science. 1995;269:407. doi: 10.1126/science.7618107.
- 6. Zeitlin S, Liu JP, Chapman DL, et al. Nat Genet. 1995;11:155. doi: 10.1038/ng1095-155.
- 7. Caplen NJ, Taylor JP, Statham VS, et al. Hum Mol Genet. 2002;11:175. doi: 10.1093/hmg/11.2.175.
- 8. DiFiglia M, Sena-Esteves M, Chase K, et al. Proc Natl Acad Sci U S A. 2007;104:17204. doi: 10.1073/pnas.0708285104.
- 9. Schwarz DS, Ding H, Kennington L, et al. PLoS Genet. 2006;2:e140. doi: 10.1371/journal.pgen.0020140.
- 10. Ding H, Schwarz DS, Keene A, et al. Aging Cell. 2003;2:209. doi: 10.1046/j.1474-9728.2003.00054.x.
- 11. Miller VM, Xia H, Marrs GL, et al. Proc Natl Acad Sci U S A. 2003;100:7195. doi: 10.1073/pnas.1231012100.
- 12. Landwehrmeyer GB, McNeil SM, Dure LS 4th, et al. Ann Neurol. 1995;37:218. doi: 10.1002/ana.410370213.
- 13. Ochman H, Gerber AS, Hartl DL. Genetics. 1988;120:621. doi: 10.1093/genetics/120.3.621.
- 14. Oster E, et al. Am J Med Genet A. 2008;146A:2070–2077. doi: 10.1002/ajmg.a.32422.
- 15. van Bilsen PH, et al. Hum Gene Ther. 2008;19:710–719. doi: 10.1089/hum.2007.116.


