Abstract
CANVAS is a recently characterized repeat expansion disease, most commonly caused by homozygous expansions of an intronic (A2G3)n repeat in the RFC1 gene. A multitude of repeat motifs are found at this locus in the human population, some pathogenic and others benign. In this study, we conducted structure-function analyses of the main pathogenic (A2G3)n and the main nonpathogenic (A4G)n repeats. We found that the pathogenic, but not the nonpathogenic, repeat presents a potent, orientation-dependent impediment to DNA polymerization in vitro. The pattern of the polymerization blockage is consistent with triplex or quadruplex formation in the presence of magnesium or potassium ions, respectively. Chemical probing of both repeats in supercoiled DNA reveals triplex H-DNA formation by the pathogenic repeat. Consistently, bioinformatic analysis of S1-END-seq data from human cell lines shows preferential H-DNA formation genome-wide by (A2G3)n motifs over (A4G)n motifs in vivo. Finally, the pathogenic, but not the nonpathogenic, repeat stalls replication fork progression in yeast and human cells. We hypothesize that the CANVAS-causing (A2G3)n repeat challenges genome stability by folding into alternative DNA structures that stall DNA replication.
INTRODUCTION
Biallelic expansions of (A2G3)n repeats cause a newly discovered repeat expansion disease (RED) named CANVAS (cerebellar ataxia, neuropathy, vestibular areflexia syndrome) 1,2. It is an autosomal recessive disease with a carrier frequency ranging from 0.7% to 4% in the populations studied, corresponding to a prevalence of 1:20,000 to 1:625, respectively 1,3. At this frequency, RFC1-related ataxia is likely the most common cause of hereditary late-onset ataxia 1,2,4. CANVAS is a progressive neurodegenerative disease with a mean age of onset of 52 years and a broad spectrum of clinical features, including but not limited to imbalance, peripheral sensory symptoms, oscillopsia, dry cough, autonomic dysfunction, dysarthria, and dysphagia 4,5.
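The quoted prevalence range can be reproduced with a Hardy-Weinberg calculation. The following is a minimal sketch, assuming random mating and reading 0.7% and 4% as the frequency q of the expanded allele, so that the expected frequency of biallelic individuals is q². The function names are illustrative only.

```python
def recessive_prevalence(allele_freq: float) -> float:
    """Expected frequency of biallelic individuals (q**2) under Hardy-Weinberg equilibrium."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return allele_freq ** 2


def as_one_in(prevalence: float) -> int:
    """Express a prevalence as '1 in N', with N rounded to the nearest integer."""
    return round(1 / prevalence)


# Assumption: the quoted 0.7% and 4% are read as expanded-allele frequencies q.
for q in (0.007, 0.04):
    print(f"q = {q:.3f} -> prevalence about 1:{as_one_in(recessive_prevalence(q)):,}")
```

With q = 0.007 and q = 0.04, this gives roughly 1:20,400 and 1:625, matching the range quoted above. If the same values were instead heterozygous carrier frequencies c, then q ≈ c/2, and the prevalence would fall to roughly 1:81,600 to 1:2,500.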
The expandable (A2G3)n repeat resides in the poly-A tail of an AluSx3 element within the second intron of the RFC1 gene 1. RFC1 encodes the largest subunit of replication factor C, the complex responsible for loading PCNA onto DNA during DNA replication and repair 6,7. Given this essential role, it is perhaps unsurprising that the repeat expansion is the first RFC1 mutation found to cause human disease. While the pathogenic mechanism of CANVAS remains uncertain, RFC1 loss of function is suspected because of (1) its recessive inheritance and (2) the discovery of CANVAS patients who are heterozygous for both the repeat expansion and an RFC1 truncating mutation 3,8–11. CANVAS pathogenesis is an area of intense biomedical research, and the repeats themselves lie at the heart of the pathogenesis and genetics of every RED.
Most expandable repeats can adopt alternative, non-B DNA secondary structures that are integral to their propensity to expand and cause disease (reviewed in 12). These secondary structures are formed during processes involving B-DNA unwinding, such as replication, transcription, and repair, and, at the same time, they can become an obstacle for these processes. Notably, DNA replication and repair have been identified as major sources of repeat instability across various repeats, which has been attributed to their structure-prone nature (reviewed in 12).
CANVAS differs from most other REDs in that its pathogenic allele is distinguished from the nonpathogenic one not only by repeat size but also by base composition. While most healthy individuals harbor (A4G)11–100 repeats at the RFC1 locus, CANVAS patients largely carry (A2G3)250–2000 repeats 1,13. Other repeat variants exist, and their pathogenicity is still being resolved. Repeat variants are often impure, containing expanded (A2G3)n or other interruptions, which complicates assigning an alternative repeat to the pathogenic or benign category 13. Alternative repeat motifs include (A3G2)n, (ACAGG)n, (AGGGC)n, (AAGGC)n, (AAGAG)n, (AGAGG)n, (AAAGGG)n, (ACGGG)n, (AG4)n, and (AACGG)n 1,13–18. Interestingly, the propensity of the most frequently found repeats to expand correlates with the number of guanine residues in the repeat unit: A4G < A3G2 < A2G3 1. Given this composition, we hypothesized that the pathogenic repeats can form stable triplex H-DNA or G-quadruplex DNA, since they are simultaneously homopurine/homopyrimidine (hPu/hPy) mirror repeats 19 and contain evenly spaced G3 runs 20.
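The two sequence features invoked here can be checked mechanically. The following is a minimal sketch. It assumes that the common heuristic of four runs of three or more Gs separated by 1 to 7 nt loops is an adequate stand-in for G4 potential; this is not necessarily the criterion used in 20. The function names are hypothetical.

```python
import re

# Heuristic G4 motif: four runs of >=3 G separated by loops of 1-7 nt (an assumption of this sketch).
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def tandem_repeat(unit: str, copies: int) -> str:
    """Build a pure tandem repeat tract, e.g. ('AAGGG', 10) -> (A2G3)10."""
    return unit * copies


def is_homopurine(seq: str) -> bool:
    """True if the strand contains only purines, i.e. the duplex is hPu/hPy."""
    return set(seq) <= {"A", "G"}


def g_fraction(unit: str) -> float:
    """Fraction of guanines in the repeat unit."""
    return unit.count("G") / len(unit)


def has_g4_motif(seq: str) -> bool:
    """True if the tract contains at least one heuristic G4 motif."""
    return G4_PATTERN.search(seq) is not None


for unit in ("AAAAG", "AAAGG", "AAGGG"):
    tract = tandem_repeat(unit, 10)
    print(unit, is_homopurine(tract), g_fraction(unit), has_g4_motif(tract))
```

For 10-unit tracts, all three motifs are homopurine, and their G fraction rises from 0.2 to 0.6. Only (A2G3)n contains a G4 motif under this heuristic, because the other two motifs have no G3 runs.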
Thus, we set out to determine whether the pathogenic (A2G3)n repeats form alternative DNA secondary structures, what these structures are, and whether they impede replication, which a priori may lead to repeat instability. We found that pathogenic (A2G3)10 repeats, but not nonpathogenic (A4G)10 repeats, strongly stall polymerization by DNA Pol I and by bacteriophage T7 DNA polymerase in vitro. The stalling is orientation-dependent, occurring only when (A2G3)n serves as the template strand. Ionic conditions have a profound effect on the position of the stall, with patterns suggestive of G-quadruplex formation in the presence of potassium or triplex formation in the presence of magnesium. Using chemical probing, we identified the formation of H-r triplex DNA (pyrimidine-purine-purine triplex) 19 by the pathogenic (A2G3)n repeat in vitro. Analysis of S1-END-seq peaks at genome-wide H-motifs in human cells revealed that (A2G3)n tracts have a higher propensity than (A4G)n tracts to form triplexes in vivo. Using two-dimensional electrophoretic analysis of replication intermediates, we show that the pathogenic repeat causes orientation-dependent replication stalling in both yeast and human cells when the (A2G3)n run is in the lagging strand template. We therefore suggest that the non-B DNA-forming potential of the pathogenic repeat during DNA replication results in fork stalling, which may contribute to its instability.
MATERIALS AND METHODS
Details of the construction of plasmids and strains used in this study are described in the Supplemental Materials. Specific PCR programs, plasmids, primers, and strains are listed in Supplemental Tables 1-6.
Cloning and amplification of (A2G3)n repeats
Because of the secondary structure-forming potential of these repeats, PCR reaction mixes and cycling programs were adjusted. Thermo Scientific Phusion High-Fidelity DNA Polymerase (Cat# F530L) was used to amplify fragments for Gibson Assembly, cloning, or yeast transformation. The manufacturer’s Phusion PCR reaction was altered as follows: the master mix included 1M betaine and no DMSO. The program listed in Supplemental Table 1 was followed with these general changes: the extension time was lengthened to allow progression through the entire repeat sequence, and the extension temperature was raised to 75°C or 80°C when necessary. For some PCRs, the annealing temperature was also raised, and longer primers were designed accordingly to ensure annealing. When a product intended for cloning showed multiple bands, the properly sized fragment was gel extracted, and its repeat length was confirmed with the Taq repeat PCR programs before cloning. Specific protocols are listed in Supplemental Table 2. Phusion was also used to amplify the (A4G)n repeats. GenScript Taq polymerase (Cat# E00007) was used to amplify the (A2G3)n repeats in the following reaction: 1X Taq Buffer with (NH4)2SO4 (Thermo Scientific Cat# B33), 1mM primers, 100mM dNTP, 1mM MgCl2, 1M betaine, 0.625 units Taq polymerase, and 1ng DNA. The PCR programs used for the different repeats are listed in Supplemental Table 3.
Supercoiled plasmids used for two-dimensional electrophoresis or chemical probing were run alongside the linearized plasmid on a 0.8% agarose gel with 0.5µg/mL EtBr (for the 7.8kb plasmid) or on a 0.6% agarose gel with 1µg/mL EtBr (for the 12.4kb plasmid) to confirm their monomeric state. E. coli strains carrying (A2G3)n-bearing plasmids were grown at 23°C to reduce repeat instability during plasmid replication.
DNA polymerization
Double-stranded plasmids pJH2 and pJH9 (ordered from GenScript), carrying (A2G3)10 or (A4G)10 repeats, respectively, were used as DNA templates for in vitro polymerization experiments. Primers JH271 and JH272 were used to polymerize through the repeats with either the pyrimidine or the purine repeat strand as the template, respectively. Reactions using ThermoSequenase were carried out with the USBio Thermo Sequenase Cycle Sequencing Kit (Cat# 78500) according to the manufacturer’s 3’-dNTP internal label cycle sequencing instructions, with the following alterations: instead of multiple rounds of labeling and extension, 5µg of each plasmid and 0.5pmol of primer were used in a single cycle. The primer was pre-annealed to the plasmid in 11µl water at 95°C for 2 minutes, and the tube was immediately submerged in an ice-water bath. The labeling components were then added, and the primer was labeled at 60°C for 30 seconds with 0.5µl of 1.25mM [α-32P]dATP. After this reaction was aliquoted into the four pre-aliquoted termination mixes, termination was carried out at 72°C for 5 minutes. Then 4µl of stop solution was added to each reaction, and the reactions were mixed, incubated at 95°C for 5 minutes, and immediately submerged in an ice-water bath. 8µl of each reaction was loaded onto a 6% polyacrylamide gel with 7.5M urea prepared according to the manufacturer’s instructions (National Diagnostics SEQUAGEL SEQUENCING SYSTEM 2.2, Cat# EC-833). Reactions using Vent (exo-) DNA polymerase (New England Biolabs Cat# M0257S) followed the primer extension experiments described in 21, with the following exceptions: 0.5pmol of primer and 5µg of DNA were pre-annealed as described above, the ddNTP:dNTP ratios and concentrations were as described in 22, labeling was carried out at 60°C for 30 seconds, termination was carried out at 90°C for 10 minutes, and all steps after termination were identical to those with ThermoSequenase.
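Because the primer-to-template ratio depends on plasmid size, it can help to convert the 5µg of plasmid into moles. The following is a minimal sketch, assuming an average of about 650 g/mol per base pair of double-stranded DNA and a hypothetical 3,000-bp plasmid (the sizes of pJH2 and pJH9 are not given here). The function names are illustrative.

```python
# Approximate average mass of one base pair of double-stranded DNA, in g/mol (assumption).
AVG_BP_MASS = 650.0


def dsdna_pmol(mass_ug: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (µg) into picomoles of molecules."""
    grams = mass_ug * 1e-6
    moles = grams / (length_bp * AVG_BP_MASS)
    return moles * 1e12


def primer_to_template_ratio(primer_pmol: float, mass_ug: float, length_bp: int) -> float:
    """Molar ratio of primer to plasmid; each denatured plasmid offers one target strand."""
    return primer_pmol / dsdna_pmol(mass_ug, length_bp)


# Hypothetical 3,000-bp plasmid, 5 µg, 0.5 pmol primer as in the protocol above.
print(f"template ~ {dsdna_pmol(5.0, 3000):.2f} pmol")
print(f"primer:template ~ {primer_to_template_ratio(0.5, 5.0, 3000):.3f}")
```

Under these assumptions, 5µg corresponds to about 2.6pmol of template, so the 0.5pmol of primer is substoichiometric. For a 12.4kb plasmid, the same mass is about 0.62pmol.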
Primer/templates for in vitro extension by T7 DNA polymerase were formed by pre-annealing 2.5µM 5’-32P-labeled primer JH270 with 3.125µM single-stranded oligonucleotides bearing (A2G3)10 (JH267), (T2C3)10 (JH268), or (A4G)10 (JH269) repeats in 10 mM Tris-HCl, pH 7.5, 0.1 mM EDTA. The buffer was used either alone or supplemented with 50mM K+ and 10mM Mg2+, or with 50mM NaCl, KCl, or LiCl. Annealing was by incubation at 95°C for 5 minutes, followed by gradual cooling to room temperature. Primer extension reactions consisted of 10nM primer/template and 1 µM T7 DNA polymerase in 40 mM Tris-HCl, pH 7.5, 5mM dithiothreitol (DTT), and 55mM of the indicated monovalent metal chloride salt (Li+, K+, or Na+). DNA polymerization was initiated by the addition of 10mM MgCl2, followed by incubation at room temperature for 1 minute, and reactions were quenched with formamide/EDTA loading dye. Primer extension products were resolved by electrophoresis on a denaturing 10% polyacrylamide gel.
Chemical probing
2µg of the repeat-containing pJH2 or pJH9 plasmid was incubated in 10mM Tris-HCl pH 7.5, 2mM MgCl2 buffer with either 6mM KMnO4 or an equal volume of water for 2 minutes at 37°C. The reaction was quenched with 1M β-mercaptoethanol. The DNA was then precipitated with ethanol, rinsed with 70% ethanol, and resuspended in 5µl water. Primer extension was carried out as described for the ThermoSequenase reactions in the in vitro polymerization methods, with the following changes: the chemically probed DNA was pre-annealed with 0.5pmol of primer JH271 in a final volume of 5.5µl, the labeling reaction’s final volume was 8.75µl with the same component ratios as in the manufacturer’s instructions, and dNTPs were added to a final concentration of 75mM each for extension.
Analysis of S1-END-seq data
S1-END-seq data from five cell lines were obtained from a previous study 23 as BED files representing the output of its peak-calling stage. Custom Python scripts were written to perform the following analysis. Overlapping genomic coordinates from the five cell lines were combined into non-overlapping peaks, and coordinates were converted to GRCh38 using Liftover. A set of control coordinates was randomly generated, matching the S1-END-seq peaks in total number, in the length of each peak, and in the proportion located on each chromosome. S1-END-seq peaks and control coordinates were compared to a previously generated database of repetitive sequences 24. For each category or subtype of repeat, the proportion of repeats falling within 100 nucleotides of the S1-END-seq peaks or control coordinates was calculated for each repeat length. Distributions across repeat lengths were compared by Wilcoxon signed-rank test, restricted to length bins containing at least 3 repeats. Linear best-fit lines for each distribution were weighted by the total number of repeats in each length bin and were likewise restricted to bins containing at least 3 repeats.
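The custom scripts are not reproduced here. The following is a minimal sketch of the interval logic described above, not the authors' code. It makes these assumptions:

- Coordinates are BED-style: half-open and 0-based.
- Repeats are supplied as hypothetical (chromosome, start, end, repeat units) tuples.
- "Within 100 nucleotides" means a gap of at most 100 nt.

The control generator draws each interval uniformly along its chromosome. It does not mask assembly gaps or avoid overlaps, which a real analysis might need to do.

```python
import bisect
import random
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals, e.g. peaks pooled from several cell lines."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            _, last_start, last_end = merged[-1]
            merged[-1] = (chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def random_controls(peaks, chrom_sizes, seed=0):
    """One random interval per peak, on the same chromosome and with the same length."""
    rng = random.Random(seed)
    controls = []
    for chrom, start, end in peaks:
        length = end - start
        new_start = rng.randrange(chrom_sizes[chrom] - length + 1)
        controls.append((chrom, new_start, new_start + length))
    return controls


def fraction_near_by_length(repeats, intervals, window=100):
    """Map repeat length -> (fraction within `window` nt of an interval, number of repeats).

    `repeats` holds (chrom, start, end, n_units) tuples; a repeat counts as near when the
    gap between it and some interval is at most `window` nt (overlaps included).
    """
    index = defaultdict(lambda: ([], []))  # chrom -> (sorted starts, sorted ends)
    for chrom, start, end in merge_intervals(intervals):
        index[chrom][0].append(start)
        index[chrom][1].append(end)
    hits, totals = defaultdict(int), defaultdict(int)
    for chrom, start, end, n_units in repeats:
        totals[n_units] += 1
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        # First merged interval whose end lies no more than `window` nt before the repeat start.
        i = bisect.bisect_left(ends, start - window)
        if i < len(ends) and starts[i] <= end + window:
            hits[n_units] += 1
    return {n: (hits[n] / totals[n], totals[n]) for n in sorted(totals)}
```

The same function is applied once to the merged peaks and once to the random controls. This gives the observed and background curves for each repeat length.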
Analysis of replication intermediates by two-dimensional (2D) gel electrophoresis in yeast and human cells
Yeast cells:
The plasmids used for yeast two-dimensional gel electrophoresis contain the repeats, the yeast 2µ origin of replication, and an ampicillin resistance gene. Plasmids pJH1, pJH26, and pJH4 carry (A2G3)60, (C3T2)60, or (A4G)60, respectively, in the lagging strand template relative to the yeast 2µ origin. Plasmids were transformed into yeast strain JAH231, and replication intermediates were extracted, restriction digested, separated by 2D gel electrophoresis, and analyzed, all according to 25. Details of plasmid construction are in the Supplemental Materials.
Human cells:
The plasmids used for human cell two-dimensional gel electrophoresis contain the repeats, the SV40 origin of replication, the gene encoding the large T antigen, and an ampicillin resistance gene. Plasmids pJH5, pJH6, and pJH11 carry (A2G3)60, (C3T2)60, or (A4G)60, respectively, in the lagging strand template relative to the SV40 origin of replication. Following the methods outlined in 26, plasmids were transfected into HEK293T cells and incubated for 48 hours. Replication intermediates were then extracted and separated on a 2D gel. Details of plasmid construction are in the Supplemental Materials.
RESULTS
Pathogenic (A2G3)n repeats strongly stall DNA polymerization in vitro in an orientation-dependent manner
Structure-forming repeats pose a significant obstacle to DNA polymerases during DNA replication and repair 27–33. This phenomenon is believed to be a driver of repeat instability (reviewed in 12,34,35). During replication, the lagging strand template in the Okazaki initiation zone remains transiently single-stranded, which favors structure formation on the lagging rather than the leading strand template. Accordingly, replication problems are more severe for many repeats when their structure-prone strand is in the lagging strand template, and these repeats are particularly unstable in this orientation (reviewed in 12). To the best of our knowledge, no experimental data are available on either the replication or the instability of the expandable CANVAS (A2G3)n repeat.
To investigate polymerization through the repeats in either orientation in vitro, we used ThermoSequenase, a mutated DNA polymerase I (exo-) 36. It was combined with a repeat-containing double-stranded plasmid template and a primer that anneals upstream or downstream of the repeats. The polymerization reaction was carried out at 72°C in the presence of 3 mM Mg2+ as described in Materials and Methods. Note that upon denaturation and quick primer annealing, the plasmid template becomes a coil of intertwined single-stranded DNA segments rather than a plain circular double-stranded DNA (Figure 1B). When the (A2G3)10 run serves as the template, polymerization stalls profoundly at its center (at the fifth and sixth repeats), and almost no products extend beyond this point (Figure 1A). In contrast, when (C3T2)10 serves as the template, polymerization progresses smoothly through the repeats (Figure 1A). For the nonpathogenic repeats, when the (A4G)10 run is in the template, polymerization stalls only mildly at the sixth and seventh repeats, and most polymerase molecules progress through the repeats. The pyrimidine (CT4)n run in the template does not pose an obstacle for the DNA polymerase (Figure 1A).
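Reading a stall position in Figure 1A amounts to simple bookkeeping between product length and template position. The following is a minimal sketch, assuming a hypothetical 20-nt primer. It treats the 98-nt distance from the legend as the gap between the primer's 3’ end and the first repeat nucleotide; the legend does not specify which primer end that distance is measured from. In practice, positions are read from the adjacent sequencing ladder. The function name is illustrative.

```python
import math

UNIT_LEN = 5  # (A2G3)n, (A4G)n and their complements all have 5-nt repeat units


def repeat_unit_reached(product_len: int, primer_len: int, gap: int, n_units: int = 10):
    """Return the repeat unit (1-based) containing the 3' end of an extension product.

    0 means the polymerase stopped before the repeat; None means it extended past it.
    `gap` is the number of nucleotides between the primer 3' end and the first repeat nucleotide.
    """
    copied = product_len - primer_len - gap  # repeat nucleotides incorporated
    if copied <= 0:
        return 0
    if copied > n_units * UNIT_LEN:
        return None
    return math.ceil(copied / UNIT_LEN)


# Hypothetical 20-nt primer, 98-nt gap: a product that copied 23 repeat nucleotides.
print(repeat_unit_reached(20 + 98 + 23, primer_len=20, gap=98))
```

With 5-nt units, the fifth and sixth repeats span nucleotides 21 to 30 of the 50-nt tract, which is its center.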
Figure 1. In vitro polymerization through pathogenic and nonpathogenic repeats.
(A) Polyacrylamide gel electrophoresis separation of ThermoSequenase sequencing reactions performed as described in Materials and Methods. Briefly, 5µg of each plasmid was pre-annealed with 0.5pmol of primer, and the USBio Thermo Sequenase Cycle Sequencing Kit’s 3’-dNTP internal label cycle sequencing instructions were followed with several modifications detailed in Materials and Methods. (B) Schematic of denatured double-stranded plasmid with primers annealed to allow ThermoSequenase polymerization with the purine- or pyrimidine-rich strand of (A2G3)10 or (A4G)10 as the template. Primers anneal 98 bp or 75 bp from the repeats for the purine-rich or pyrimidine-rich template, respectively. Created with BioRender. (C) Model for triplex formation as the polymerase progresses through the repeats with the purine-rich strand as the template. Created with BioRender. (D) Polyacrylamide gel electrophoresis separation of in vitro T7 DNA polymerase primer extension reactions. 3.125µM single-stranded oligonucleotides bearing (A2G3)10 (JH267), (T2C3)10 (JH268), or (A4G)10 (JH269) repeats were pre-annealed with 2.5µM 5’-32P-labeled primer (JH270) in 10 mM Tris-HCl, pH 7.5, 0.1 mM EDTA by incubation at 95°C for 5 minutes, followed by gradual cooling to room temperature. 50mM K+ and 10mM Mg2+ were added to the annealing buffer in the left panel. Primer extension reactions consisted of 10nM primer/template and 1µM T7 DNA polymerase in 40 mM Tris-HCl, pH 7.5, 5mM dithiothreitol (DTT), and 55mM K+ (left panel). Primer extension was initiated by the addition of 10mM MgCl2, followed by incubation at room temperature for 1 minute and quenching with formamide/EDTA loading dye. Primer extension products were resolved by electrophoresis on a denaturing 10% polyacrylamide gel. NT = no template (primer added without oligonucleotide template).
(E) The same general protocol was followed as in (D), with the following changes: 50mM LiCl, KCl, or NaCl was added to the annealing buffer, and 55 mM of the indicated monovalent metal chloride salt was added to the primer extension buffer. Sequencing reactions run alongside the primer extension reactions used the same templates and primers. They followed the protocol detailed in Materials and Methods for the sequencing reactions in Figure 1A. NR = no reaction (no addition of MgCl2). (F) Model for G-quadruplex formation as T7 DNA polymerase progresses along the purine-rich template in the presence of K+ in the annealing and primer extension buffers.
Strikingly, even the relatively short (A2G3)10 template essentially halts the potent ThermoSequenase at 72°C. DNA polymerization progressed through the repeat only under reaction conditions similar to those in 21: with Vent (exo-) DNA polymerase at an extension temperature of 85°C and with the Mg2+ concentration decreased to 1mM (Figure S1A). This indicates that an extremely stable secondary structure forms during the polymerization process.
A priori, such a structure could be either (1) an H-r DNA triplex formed when the DNA polymerase reaches the center of the template, or (2) a G-quadruplex formed by the G-rich template strand. The polymerase stalling at the center of the (A2G3)10 strand, combined with the Mg2+ dependence of the stalling, strongly implicates H-r DNA triplex formation. Given the repeat’s base composition, this would be among the strongest triplexes possible (Figure 1C). Because the optimal ThermoSequenase reaction conditions include magnesium but no potassium, G-quadruplex formation is less likely 37,38. Furthermore, G-quadruplex-forming sequences stall the polymerase directly at the 3’ end of the sequence 38,39, rather than at its center. The minor stall within the nonpathogenic (A4G)10 repeat template is likely caused by a much weaker H-r DNA triplex. Indeed, A-rich H-r triplexes have been found to be stabilized by Zn2+ rather than Mg2+ cations 40.
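The importance of mirror symmetry, and the reason a G-rich mirror repeat should give a particularly stable H-r triplex, can be illustrated by folding a purine strand at a candidate center and pairing each base with its mirror partner. This minimal sketch ignores the short loop that real H-DNA requires, and it ignores all thermodynamics. It only counts how many G·G-C versus A·A-T reverse-Hoogsteen triplets a perfect fold could form. The function names are hypothetical.

```python
def mirror_arm(purine_strand: str, center: int) -> str:
    """Longest perfect mirror arm around a fold point between positions center-1 and center.

    Each matched pair (purine_strand[center-1-i], purine_strand[center+i]) stands for one
    purine.purine-pyrimidine triplet if that half of the strand folds back (loop ignored).
    """
    left, right = center - 1, center
    arm = []
    while left >= 0 and right < len(purine_strand) and purine_strand[left] == purine_strand[right]:
        arm.append(purine_strand[right])
        left -= 1
        right += 1
    return "".join(arm)


def best_hr_fold(purine_strand: str):
    """Fold point (center, arm) giving the longest run of matched triplets."""
    return max(
        ((c, mirror_arm(purine_strand, c)) for c in range(1, len(purine_strand))),
        key=lambda item: len(item[1]),
    )


def ggc_fraction(arm: str) -> float:
    """Fraction of triplets that would be G.G-C; the remainder would be A.A-T."""
    return arm.count("G") / len(arm) if arm else 0.0


for unit in ("AAGGG", "AAAAG"):
    center, arm = best_hr_fold(unit * 10)
    print(unit, center, len(arm), round(ggc_fraction(arm), 3))
```

For (A2G3)10, the best fold gives a 24-triplet arm in which 62.5% of triplets are G·G-C. For (A4G)10, the corresponding figure is about 22%.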
We then investigated DNA polymerization by bacteriophage T7 DNA polymerase through single-stranded templates containing CANVAS repeats. T7 DNA polymerase is a robust mesophilic enzyme that is often used as an in vitro model system for processive DNA polymerases and replication forks 41. Unlike ThermoSequenase, it is active at 25°C in the presence of both K+ and Mg2+ ions. For this initial experiment, we annealed a primer to a single-stranded template upstream of 10 repeat units, either (A2G3)10, (C3T2)10, or (A4G)10, and then carried out DNA polymerization. T7 DNA polymerase stalls only when (A2G3)10 serves as the template strand and does not stall when (C3T2)10 or the nonpathogenic repeat serves as the template strand (Figure 1D). Notably, when both potassium and magnesium are present in the annealing and primer extension buffers, the stall is at the beginning of the repeats. In the absence of potassium, the polymerase progresses further, but multiple weaker stalls are observed within the repeat. Altogether, this is indicative of G-quadruplex formation stabilized by potassium ions.
Altering the annealing and primer extension buffers to include various monovalent ions revealed that the polymerase stalling pattern with (A2G3)10 in the template strand changes depending on the ionic conditions (Figure 1E). A significant stall occurs at the first repeat in the presence of potassium, while a weaker stall occurs just past the midpoint of the repeat tract when no additional salt is added to the annealing buffer (Figure 1E). The stall at the beginning of the repeats in high [K+] is likely caused by a G-quadruplex (Figure 1E, F). Meanwhile, the stall just past the midpoint of the repeats that occurs with no additional ions is likely caused by an H-r triplex, formed in a manner similar to the ThermoSequenase stall we observed. No stalling is observed when (C3T2)10 or (A4G)10 is in the template strand, regardless of the ions present (Figure S1B).
Altogether, we conclude that, depending on the exact ionic conditions, either a G-quadruplex or an H-r triplex formed during polymerization through the (A2G3)10 template blocks DNA polymerase progression in vitro.
(A2G3)n repeats form triplex DNA in supercoiled DNA
While both the pathogenic (A2G3)n and nonpathogenic (A4G)n repeats are hPu/hPy mirror repeats, the former is G-rich and the latter is A-rich. A priori, both can form triplex H-DNA, but the (A2G3)n repeat would most likely form an H-r triplex structure at physiological pH given the cytosine protonation required for the H-y isoform, while the (A4G)n repeat could form either the H-r or H-y isoform, if it is able to form a triplex 19,42. In addition, the (A2G3)n repeat has the potential to form G4-DNA as it has regularly spaced G3 blocks. The (A4G)n repeat, in contrast, can convert into the so-called propeller DNA (P-DNA) 43 or be a DNA unwinding element (DUE) as it has multiple An-runs 43,44.
To distinguish which of these structures the repeats form in supercoiled DNA, we mapped single-stranded portions within the repeats in vitro. We used potassium permanganate (KMnO4), which preferentially modifies single-stranded thymines 45. This modification blocks Watson-Crick hydrogen bonding with a complementary strand and allows the modified thymines to be detected as polymerization termination sites in a primer extension assay. This approach could not be applied to the (A2G3)n strand, which stalls the polymerase and, being purely purine, contains no thymines to modify. Interrogating the (C3T2)n strand, however, allows us to distinguish between the candidate structures.
Upon potassium permanganate modification of plasmids carrying (A2G3)10 repeats, ThermoSequenase terminates across a wide area between the sixth and ninth repeats (Figure 2A), whereas the unmodified plasmid shows almost undetectable polymerase termination (Figure S2). These data show that the 5’ half of the polypyrimidine strand of the pathogenic repeat is single-stranded (Figure 2A), consistent with an H-r3 DNA triplex 19 (Figure 2B). In contrast, KMnO4 modification of the (A4G)10 repeat in supercoiled DNA produces evident termination signals at the beginning of the repeat combined with weak termination signals throughout the repeat (Figure 2A). This pattern differs starkly from that of the pathogenic repeat. It suggests that the (A4G)10 repeats in supercoiled DNA are transiently unwound, consistent with the chemical modification pattern of a DUE 44 (Figure 2C).
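The logic of the permanganate read-out can be sketched as follows: only thymines in unpaired regions are reactive, so a structural model predicts which template positions should give termination signals. This minimal sketch makes these assumptions:

- The pyrimidine strand is written 5’→3’.
- The primer anneals 3’ of the repeat, so repeats counted from the primer side are counted from the 3’ end.
- The H-r3 model leaves the 5’ half of the (C3T2)10 strand single-stranded, as stated above.

The function names are hypothetical.

```python
def kmno4_reactive_positions(pyrimidine_strand, single_stranded):
    """Sorted 0-based positions of thymines inside single-stranded intervals [start, end)."""
    return sorted({
        i
        for start, end in single_stranded
        for i in range(max(start, 0), min(end, len(pyrimidine_strand)))
        if pyrimidine_strand[i] == "T"
    })


def repeat_number_from_3prime(position, n_units, unit_len=5):
    """Repeat unit containing `position`, counted from the 3' end of a strand written 5'->3'."""
    return n_units - position // unit_len


strand = "CCCTT" * 10  # (C3T2)10, written 5'->3'
h_r3 = kmno4_reactive_positions(strand, [(0, 25)])  # model: 5' half single-stranded
fully_unwound = kmno4_reactive_positions(strand, [(0, 50)])
print(sorted({repeat_number_from_3prime(i, 10) for i in h_r3}), len(fully_unwound))
```

Under these assumptions, the H-r3 model predicts reactive thymines only in repeats 6 to 10, counted from the primer. Complete unwinding, by contrast, would expose all 20 thymines of the tract.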
Figure 2. Potassium permanganate probing of pathogenic and nonpathogenic repeats.
(A) Polyacrylamide gel electrophoresis separation of sequencing reactions and primer extension reactions on potassium permanganate- or water-treated repeat-containing plasmids using the pyrimidine-rich template. Repeat-containing supercoiled DNA was incubated with potassium permanganate or water, precipitated, and used for a primer extension reaction as described in Materials and Methods. (B) H-r3 triplex predicted from chemical probing for (A2G3)10 repeats. (C) DNA unwinding element (DUE) predicted from chemical probing for (A4G)10 repeats. Purple stars in (B) and (C) represent possible KMnO4 modification sites.
(A2G3)n repeats have a higher propensity to form triplex DNA in vivo than (A4G)n repeats
To examine the triplex-forming potential of (A2G3)n and (A4G)n repeats genome-wide in vivo, we reanalyzed an S1-END-seq dataset from a previous study 23 that mapped triplex structures genome-wide in human cells. The S1-END-seq technique uses S1 nuclease to convert single-stranded DNA into double-strand breaks, which then serve as substrates for the attachment of high-throughput sequencing adapters 23. This technique highlights triplex-forming regions because of the extensive single-stranded regions in triplex H-DNA 23. To evaluate the triplex-forming potential of (A2G3)n and (A4G)n repeats, we therefore compared the genomic coordinates of all such repeats in the human genome with the coordinates of peak regions identified by S1-END-seq. For both (A2G3)n and (A4G)n repeats, we see frequent overlap with the S1-END-seq peaks, reaching as much as 90% for very long repeat tracts. Overlap with randomly generated genomic coordinates, by contrast, is in line with the 1.3% of the genome contained within peaks (Figure 3A). G4-DNA motifs are not highly enriched in S1-END-seq peaks, with only 2–3% appearing in peaks regardless of motif length (Figure S3A). Other non-B-forming motifs are similarly not enriched in S1-END-seq peaks (Figure S3B), with the sole exception of (AT)n repeats (Figure S3C). This indicates that the S1-END-seq assay is primarily specific for triplex-forming DNA. Thus, it is highly likely that both (A2G3)n and (A4G)n repeats form DNA triplexes in vivo. Note that these data cannot rule out G-quadruplex formation by the pathogenic repeats, since S1-END-seq does not detect these structures as readily.
Figure 3. Bioinformatic analysis of S1-END-seq peaks to determine the triplex-forming potential of various repeats using data from 23.
For each graph: Top: Graph depicting the percentage of repeats found within 100 nucleotides of S1-END-seq peaks as the repeat length increases. Bottom: Graph depicting the number of repeats as the repeat length increases. (A) Comparison of pathogenic (A2G3)n and (A4G)n motifs genome-wide. (B) Comparison of pentanucleotide motifs with increasing guanine:adenine ratio genome-wide.
Strikingly, (A2G3)n repeats show consistently higher in vivo triplex-forming potential than (A4G)n repeats across repeat lengths (one-sided Wilcoxon test, p = 2.7×10⁻⁷) (Figure 3A). Furthermore, other repeats known to be pathogenic, (AG4)n and (A3G2)n, also show higher in vivo triplex-forming potential than (A4G)n repeats, all following a trend similar to that of (A2G3)n (Figure 3B). Across a variety of hPu/hPy motifs with increasingly long adenine runs, a pattern emerges in which runs of four or more adenines become detrimental to triplex stability (Figure S3D). Among hexanucleotide motifs, although the analysis is power-limited, (A4G2)n and (A5G1)n show few signs of triplex formation, unlike all other hPu/hPy hexanucleotides (Figure S3E). Although (A4G2)n, (A3GAG)n, and (A2G)n motifs all have the same A:G ratio, only (A4G2)n motifs are inhibited in triplex formation (Figure S3F). We hypothesize that longer adenine runs in double-stranded DNA favor stiff propeller DNA (P-DNA) 43, making triplex nucleation problematic. (A)n repeats clearly do not form triplexes in vivo, while (G)n repeats show a slight elevation in overlap with S1-END-seq peaks (Figure S3G), consistent with other G4-DNA motifs.
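The statistics described in Materials and Methods can be sketched with the standard library alone, under these simplifications:

- The two motifs are paired by repeat length.
- Bins with fewer than 3 repeats for either motif are dropped.
- The one-sided signed-rank p-value is computed exactly by counting rank subsets.

Unlike library implementations such as scipy.stats.wilcoxon, this sketch drops zero differences and rejects tied |differences| rather than correcting for them. Weighting squared residuals by bin counts is one possible reading of "weighted by the total number of repeats". The function names are hypothetical.

```python
def paired_bins(a, b, min_count=3):
    """Pair two {length: (fraction, count)} tables on lengths with >= min_count repeats in both."""
    shared = sorted(n for n in a.keys() & b.keys() if a[n][1] >= min_count and b[n][1] >= min_count)
    return [a[n][0] for n in shared], [b[n][0] for n in shared]


def wilcoxon_signed_rank_greater(x, y):
    """Exact one-sided Wilcoxon signed-rank test of H1: x tends to exceed y.

    Simplified: zero differences are dropped and tied |differences| are rejected, so ranks
    are the integers 1..n and the exact null distribution can be counted directly.
    Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    magnitudes = [abs(d) for d in diffs]
    if len(set(magnitudes)) != len(magnitudes):
        raise ValueError("tied |differences| are not handled by this sketch")
    n = len(diffs)
    ranks = [0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda i: magnitudes[i]), start=1):
        ranks[i] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # counts[s] = number of subsets of ranks {1..n} summing to s (each equally likely under H0).
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return w_plus, sum(counts[w_plus:]) / 2 ** n


def weighted_line(xs, ys, weights):
    """Weighted least-squares fit y = slope * x + intercept, minimizing sum(w * residual**2)."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx
```

In use, the per-length tables for (A2G3)n and (A4G)n are paired first, the paired fractions are passed to the test, and each motif's fractions are fitted against repeat length with the bin counts as weights.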
Pathogenic (A2G3)n repeats stall replication in yeast in an orientation-dependent manner
Our in vitro polymerization experiments suggest stable H-r triplex or G-quadruplex formation as the polymerase progresses through the pathogenic repeats. It is not clear which structure would prevail during DNA replication in vivo, given two major factors: (1) far more components are at play in the replication fork than in a reaction with DNA polymerase alone, and (2) intranuclear conditions, including chromatin, DNA supercoiling, and ion concentrations, differ from those in the polymerization reactions. Based on our chemical probing and bioinformatic analysis, we hypothesized that the pathogenic (A2G3)n repeats form H-r triplex DNA in vivo and would therefore stall replication only when the purine-rich strand resides in the lagging strand template. We also hypothesized that the nonpathogenic (A4G)n repeats would not stall the replication fork to the same extent as the pathogenic repeats.
To study whether the expandable CANVAS repeats stall DNA replication in vivo, we cloned a longer (A2G3)60 repeat tract into the multicopy yeast plasmid pRS425 in both orientations with respect to the replication direction. We then analyzed its replication by 2D electrophoretic analysis of replication intermediates (Figure 4) as described in 46.
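Which insert strand becomes the lagging strand template depends only on the insert's orientation relative to the incoming fork. The lagging strand is synthesized discontinuously away from the fork, so its template is the strand whose 5’→3’ direction points the same way the fork moves. The following is a minimal sketch, with hypothetical function names and the top strand written 5’→3’ left to right.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def lagging_template(top_strand: str, fork_direction: str) -> str:
    """Return the lagging strand template (5'->3') for a fork moving through `top_strand`.

    `top_strand` is written 5'->3' left to right; the lagging strand template is the strand
    whose 5'->3' direction matches the direction of fork movement.
    """
    if fork_direction == "right":
        return top_strand
    if fork_direction == "left":
        return reverse_complement(top_strand)
    raise ValueError("fork_direction must be 'right' or 'left'")


insert = "AAGGG" * 2
print(lagging_template(insert, "right"), lagging_template(insert, "left"))
```

Flipping the insert relative to the origin is equivalent to flipping the fork direction. This is why the two cloning orientations place either (A2G3)60 or (C3T2)60 in the lagging strand template.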
Figure 4. Analysis of yeast replication intermediates using two-dimensional gel electrophoresis.
(A) Representative gels of the no-repeat control, (A2G3)60, (C3T2)60, and (A4G)60 in the lagging strand template of replication in yeast from the yeast 2µ origin of replication. The red arrow indicates replication fork stalling. (B) Densitometry profiles along the arc from the 1.5n spot to the 2n spot. These profiles were used for quantification, as described in 25,46. (C) Quantification of replication fork slowing by area analysis, expressed as fold change relative to the no-repeat control arc. Error bars represent standard error of the mean; non-overlapping error bars were used to determine significance. Created with BioRender and Prism.
There is no observable fork stall in the no-repeat control plasmid (Figure 4A,B). In contrast, the pathogenic repeat stalls replication fork progression only when its homopurine run is in the lagging strand template (Figure 4A,B), as is evident from the bulge on the otherwise smooth descending half of the Y-arc. The presence of the (C3T2)60 run in the lagging strand template does not result in a defined stall site but rather leads to a slight widening of the Y-arc downstream of the repeat. Although the reasons for this arc widening are not clear, it might indicate minor slowing of replication fork progression through the repeat (Figure 4A,B). Regardless, quantification (Figure 4B,C) reveals a significant increase in replication fork stalling with (A2G3)60 in the lagging strand template relative to both the no-repeat control and the opposite orientation (Figure 4C). Similar orientation dependence was previously observed in many systems for the Friedreich’s ataxia (GAA)n repeat 33,47,48, which forms H-r triplexes in vitro 49–52 and in vivo 23. This suggests that the pathogenic CANVAS repeats may also form an H-r triplex in vivo. Importantly, the replication fork does not stall significantly more than in the no-repeat control when (A4G)60 is in the lagging strand template. This is in line with our hypothesis, which was based on the weak in vitro polymerase stalling at (A4G)10 repeats (Figure 1) and on the absence of a defined structure upon chemical probing of (A4G)10 (Figure 2). Altogether, these in vivo data mirror our in vitro results in their dependence on repeat orientation and on the pathogenic repeat.
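The exact quantification follows 25,46 and is not reproduced here. As a rough illustration only, one way to turn densitometry profiles into a fold change takes three steps:

1. Take the fraction of arc signal that falls inside an analyst-chosen stall window.
2. Normalize that fraction to the mean value for the no-repeat control.
3. Compare mean ± SEM across replicates using the non-overlapping error bar criterion stated in the legend.

The window bounds and function names are hypothetical.

```python
import math
import statistics


def trapezoid_area(values, spacing=1.0):
    """Area under a densitometry profile sampled at evenly spaced points along the arc."""
    if len(values) < 2:
        return 0.0
    return spacing * (sum(values) - (values[0] + values[-1]) / 2)


def stall_fraction(profile, stall_start, stall_end):
    """Fraction of total arc signal inside the window profile[stall_start:stall_end]."""
    return trapezoid_area(profile[stall_start:stall_end]) / trapezoid_area(profile)


def fold_change(sample_fractions, control_fractions):
    """Normalize each replicate's stall fraction to the mean no-repeat-control value."""
    baseline = statistics.fmean(control_fractions)
    return [f / baseline for f in sample_fractions]


def mean_sem(values):
    """Mean and standard error of the mean of replicate values."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else 0.0
    return mean, sem


def error_bars_overlap(a, b):
    """True if the mean +/- SEM intervals of two replicate sets overlap."""
    (ma, sa), (mb, sb) = mean_sem(a), mean_sem(b)
    return ma - sa <= mb + sb and mb - sb <= ma + sa
```

Replicate fold changes for each construct would then be compared with the control replicates using error_bars_overlap.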
Pathogenic (A2G3)n repeats stall replication in human cells in an orientation-dependent manner
Does the replication fork stalling by (A2G3)n repeats observed in yeast hold true in human cells? To answer this question, we used an episome that replicates in human HEK293T cells, which we described previously 26. Briefly, this episome carries both the SV40 origin of replication and the gene for SV40 T antigen, which drives replication initiation and elongation. Thus, the more the episome replicates, the more T antigen is produced, driving subsequent rounds of replication. As a result, this system generates large amounts of replication intermediates, making their electrophoretic analysis feasible and relatively easy. The (A2G3)60 repeat was cloned into this plasmid in both orientations relative to the SV40 origin.
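Which strand of the repeat ends up in the lagging strand template depends only on the insert's orientation relative to the direction in which the fork moves through it. Because the lagging strand is synthesized 5'→3' away from the fork, its template runs 5'→3' in the direction of fork movement. The sketch below encodes that rule; it assumes a single fork passing through the insert in a known direction, and the function names are hypothetical helpers for illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def lagging_template(top_strand, fork_direction):
    """Return the lagging strand template, written 5'->3'.

    top_strand is the insert's top strand, written 5'->3' from left to right.
    The lagging strand template runs 5'->3' in the direction of fork movement,
    so a rightward-moving fork uses the top strand as its lagging template and
    a leftward-moving fork uses the bottom strand (the reverse complement).
    """
    if fork_direction == "rightward":
        return top_strand
    if fork_direction == "leftward":
        return reverse_complement(top_strand)
    raise ValueError("fork_direction must be 'rightward' or 'leftward'")


def purine_fraction(seq):
    """Fraction of A and G bases in seq."""
    return sum(base in "AG" for base in seq) / len(seq)


insert = "AAGGG" * 4  # toy stand-in for the (A2G3)60 insert
for direction in ("rightward", "leftward"):
    template = lagging_template(insert, direction)
    print(direction, template[:10], purine_fraction(template))
```

Flipping the insert relative to the origin is equivalent to flipping the fork direction here: the same (A2G3)n insert places the homopurine (A2G3)n run in the lagging strand template in one orientation and the homopyrimidine (C3T2)n run there in the other, which is how the two constructs are named in Figures 4 and 5.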
Replication fork stalling patterns caused by the pathogenic (A2G3)n repeats in human cells appear fundamentally similar to those in yeast (Figure 5). When in the lagging strand template, the (A2G3)60 run causes a very prominent replication stall on the ascending half of the Y-arc, whereas no stall is detected in the no repeat control (Figure 5A,B). In the opposite orientation, the (C3T2)60 run in the lagging strand template does not cause prominent fork stalling (Figure 5A,B). As in yeast, the nonpathogenic (A4G)60 repeat does not cause significant fork stalling (Figure 5).
Figure 5. Analysis of human cell replication intermediates using two-dimensional gel electrophoresis.
(A) Representative two-dimensional gels of the no repeat control and of (A2G3)60, (C3T2)60, and (A4G)60 in the lagging strand template, replicated in HEK293T cells from the SV40 origin of replication. The red arrow indicates replication fork stalling. (B) Densitometry profiles along the arc from the 1n spot to the 1.5n spot. These profiles were used for quantification as described in 25,46. (C) Quantification of replication fork slowing by area analysis, expressed as fold increase relative to the no repeat control arc. Error bars represent the standard error of the mean; non-overlapping error bars were used to determine significance. Created with BioRender and Prism.
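The consistent throughline in the in vitro and in vivo replication data is that the pathogenic (A2G3)n repeats are an obstacle to DNA polymerases and replication forks, particularly when the purine-rich strand is more likely to be single-stranded: when it serves as the template for primer extension in vitro, or when it lies in the lagging strand template in vivo.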
DISCUSSION
CANVAS is a recently discovered neurodegenerative RED characterized by a spectrum of clinical manifestations, including but not limited to cerebellar ataxia, neuropathy, and vestibular areflexia 1,2. Though it is estimated to be the most common cause of hereditary late-onset ataxia 1,4, very little is known about its genetics and pathogenesis. An unusual feature of this RED is that CANVAS results from a change in repeat sequence, from the nonpathogenic (A4G)11–100 to the pathogenic (A2G3)250–2,000, rather than from expansion of a repeat alone 1. The recent discovery of various pathogenic and benign repeat motifs poses several interesting questions: (1) Do the repeats form a non-B DNA structure that is integral to their instability, as other RED repeats do? (2) Is this non-B DNA structure the same for different pathogenic repeats? (3) If only the pathogenic repeats form such a structure, is this the underlying reason for their pathogenicity? (4) How do the repeats expand and cause disease?
No model systems have so far been developed to study CANVAS genetics or pathogenesis. Clues to RED pathogenesis often lie within the repeats themselves, the non-B DNA structures they form, and their interactions with cellular machinery. For example, the (GAA)exp repeats implicated in FRDA are at the heart of both the expansion mechanism and disease pathogenesis: (GAA)n-related problems with replication and DNA repair contribute to the expansion of (GAA)n repeats (reviewed in 34), and expanded (GAA)n repeats block transcription, contributing to the loss-of-function pathogenesis of FRDA (reviewed in 12). Triplex formation by the (GAA)exp allele has thus been shown to be central to both FRDA genetics and pathogenesis. It is therefore crucial to study the pathogenic (A2G3)n repeat in terms of its propensity to form a non-B DNA structure, its ability to impede cellular machinery such as replication, and its mechanisms of instability.
Using an arsenal of tools developed by our lab and others, we examined the CANVAS-causing (A2G3)n repeats and established key DNA-centric characteristics that may help unravel how these repeats expand and cause disease.
Using a chemical probe specific to single-stranded thymines, we found that (A2G3)10, but not (A4G)10, forms an H-r DNA triplex in supercoiled DNA (Figure 2A). Recent S1-END-seq and permanganate footprinting data have shown that many hPu/hPy mirror repeats form triplexes in vivo 23,53. In fact, S1-END-seq detected triplex formation in lymphoblasts derived from a (GAA)exp-harboring FRDA patient but not in those from an unaffected sibling 23. We suspect the same may be true for CANVAS patients, and we wondered whether we could glean information about the secondary structure of the CANVAS repeats by determining whether these repeat motifs form secondary structures elsewhere in the genome. Using the S1-END-seq data 23, we found that long tracts of both (A2G3)n and (A4G)n motifs overlap with S1-END-seq peaks genome-wide, indicative of their triplex formation in vivo (Figure 3). Remarkably, the propensity for triplex formation in vivo is higher for (A2G3)n than for (A4G)n motifs, consistent with our hypothesis that the pathogenic repeats form a more stable triplex than the nonpathogenic repeats, which could in turn lead to repeat instability.
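The genome-wide comparison above involves two steps: locating long (A2G3)n and (A4G)n tracts in any phase and on either strand, and asking what fraction of those tracts overlap S1-END-seq peaks. The sketch below illustrates both steps on a toy sequence. It is not necessarily the pipeline used for Figure 3: the minimum tract length (`min_units`), the helper names, the toy sequence, and the peak interval are all hypothetical, and a real analysis would work on genome coordinates and account for tract length and background.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def rotations(unit):
    """All cyclic phases of a repeat unit, e.g. AAGGG, AGGGA, GGGAA, ..."""
    return {unit[i:] + unit[:i] for i in range(len(unit))}


def find_tracts(seq, unit, min_units):
    """Merged half-open (start, end) intervals of tandem runs of `unit`,
    in any phase and on either strand, with at least `min_units` copies."""
    phases = rotations(unit) | rotations(reverse_complement(unit))
    hits = []
    for phase in phases:
        pattern = "(?:%s){%d,}" % (phase, min_units)
        hits.extend((m.start(), m.end()) for m in re.finditer(pattern, seq))
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlaps(a, b):
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_in_peaks(tracts, peaks):
    """Fraction of tracts that overlap at least one peak interval."""
    if not tracts:
        return 0.0
    hit = sum(any(overlaps(t, p) for p in peaks) for t in tracts)
    return hit / len(tracts)


# Toy sequence with one tract of each motif and one hypothetical peak.
flank = "ACGT" * 5
toy = flank + "AAGGG" * 6 + flank + "AAAAG" * 6 + flank
peaks = [(15, 60)]
for unit in ("AAGGG", "AAAAG"):
    tracts = find_tracts(toy, unit, min_units=4)
    print(unit, tracts, fraction_in_peaks(tracts, peaks))
```

Searching all rotations of the unit and of its reverse complement means that, for example, a (C3T2)n run on the top strand is counted as an (A2G3)n tract, which matches how a duplex tract is defined regardless of which strand a reference genome reports.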
In vitro, transient non-B DNA structure formation by the pathogenic repeat blocks DNA polymerization at the center of the repeat in an orientation-dependent manner, i.e., when the homopurine run is in the template strand (Figure 1A). This is not the case for the nonpathogenic (A4G)n repeat. Many triplex-forming repeats have been shown to stall DNA polymerases in vitro, particularly strongly when the homopurine strand is the template strand 21,32,54–57. In fact, the pattern we observed in vitro, with prominent stalling once the polymerase reaches the middle of the repeat, is almost identical to that seen with another H-r triplex-forming sequence 54,55. Interestingly, the pathogenic CANVAS repeat in ssDNA can adopt alternative structures depending on solution conditions. In vitro extension experiments using T7 DNA polymerase revealed stalling patterns consistent with G-quadruplex formation (Figure 1D,E): solution conditions that stabilize G-quadruplexes result in a strong stall immediately upstream of the (A2G3)10 repeat. Ambient cellular conditions, as well as additional replication proteins, likely play a major role in determining which secondary structure the CANVAS-causing repeats form during replication in vivo.
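Stalling at the center of the repeat fits the model commonly used for triplex-caused arrest 54,55: once the polymerase has copied about half of a homopurine-homopyrimidine mirror repeat, the downstream half of the template can fold back onto the newly formed duplex. The sketch below checks only the sequence side of that argument, namely whether a tract is homopurine or homopyrimidine and how long its mirror-symmetric arms are around a between-base center. The helper names are hypothetical, only even-length mirrors are considered, and sequence symmetry alone does not establish that a triplex forms.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def is_homopurine_homopyrimidine(seq):
    """True if this strand is all purines or all pyrimidines."""
    bases = set(seq)
    return bases <= PURINES or bases <= PYRIMIDINES


def mirror_arm_length(seq, center):
    """Longest k with seq[center - 1 - i] == seq[center + i] for all i < k.

    `center` is a position between bases (0..len(seq)), so only
    even-length mirror repeats are detected in this sketch.
    """
    k = 0
    while (center - k - 1 >= 0 and center + k < len(seq)
           and seq[center - k - 1] == seq[center + k]):
        k += 1
    return k


def best_mirror_center(seq):
    """Between-base center with the longest mirror arms (first one on ties)."""
    return max(range(len(seq) + 1), key=lambda c: mirror_arm_length(seq, c))


for repeat in ("AAGGG" * 10, "AAAAG" * 10):  # toy (A2G3)10 and (A4G)10
    center = best_mirror_center(repeat)
    print(repeat[:10], is_homopurine_homopyrimidine(repeat),
          center, mirror_arm_length(repeat, center))
```

Both toy tracts come out as homopurine mirror repeats with arms spanning nearly half of the tract, so this check alone does not distinguish (A2G3)n from (A4G)n; any difference between them has to come from something other than mirror symmetry, such as the triplex stability discussed above.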
Would such a potent repeat-mediated block to DNA polymerization in vitro manifest itself in vivo? To study the effect of this repeat on replication in yeast, we used a repeat-bearing plasmid with a yeast 2µ origin of replication, which relies on the host replisome to replicate through the repeats. For replication in human cells, we used a repeat-bearing plasmid with an SV40 origin of replication that expresses T antigen. Remarkably, the pathogenic (A2G3)n repeat stalls DNA replication in both yeast and human cells in an orientation-dependent manner, i.e., when the (A2G3)n run is in the lagging strand template (Figures 4 and 5).
H-motifs have been shown to stall replication when the homopurine strand is in the lagging strand template in bacteria 58, yeast 33,47,48, human cells, and human cell extracts 32. This replication fork stalling pattern is in stark contrast to the orientation-independent stalling caused by hairpin-forming sequences 59,60. The pattern of replication fork stalling for G-quadruplexes is more complicated, with evidence of G-quadruplexes impeding the replication fork when in either the leading 61–63 or the lagging 64 strand template; often, a G-quadruplex stabilizer or the knockout of a G-quadruplex unwinder is necessary to cause stalling 61,64. Therefore, the CANVAS repeat replication stalling pattern, which is conserved from yeast to human cells, is more consistent with triplex-caused polymerization arrest in vivo, though G-quadruplex formation cannot be ruled out.
The dynamic nature of non-B DNA structure formation may allow the pathogenic repeats to form both a triplex and a G-quadruplex, depending on cellular conditions. Experiments to determine the genetic controls of this replication fork stalling are underway and should shed more light on the non-B DNA structure(s) responsible for the orientation-dependent replication fork stalling observed at the pathogenic repeats.
Overall, we have found that the pathogenic (A2G3)n repeat forms a triplex in vitro and stalls replication in an orientation-dependent manner both in vitro and in vivo. In each experiment, we juxtaposed the pathogenic (A2G3)n repeat with the nonpathogenic (A4G)n repeat and found that the nonpathogenic repeat largely behaves like a nonrepetitive sequence and does not impede replication to the extent that (A2G3)n does. We therefore believe we have uncovered features of the pathogenic allele that are integral to its instability and pathogenicity. This points to an important next step: studying the genetics of CANVAS and how the repeats expand and cause disease. To this end, work to establish model systems for studying repeat instability (contractions and expansions) is currently underway and is a crucial start to understanding this recently characterized disease. Similarly, conducting these structure-function analyses with additional pathogenic and nonpathogenic alleles may reveal a pattern suggestive of one structure over another, though it is possible that different repeats form different structures that lead to the same disease.
ACKNOWLEDGEMENTS
We thank Catherine Freudenreich, Mitch McVey, Claire Moore, Ralph Scully and members of the Mirkin laboratory for their integral input to this project.
Funding Statement
The work in the Mirkin laboratory is supported by the National Institute of General Medical Sciences (R35GM130322) and the National Science Foundation (2153071).
DATA AVAILABILITY
The data underlying this article are available in the article and in its online supplementary material.
SUPPLEMENTARY DATA
Supplementary data is available in a separate PDF.
REFERENCES
1. Cortese A. et al. Biallelic expansion of an intronic repeat in RFC1 is a common cause of late-onset ataxia. Nat Genet 51, 649–658 (2019).
2. Rafehi H. et al. Bioinformatics-Based Identification of Expanded Repeats: A Non-reference Intronic Pentamer Expansion in RFC1 Causes CANVAS. Am J Hum Genet 105, 151–165 (2019).
3. Arteche-López A. et al. New Cerebellar Ataxia, Neuropathy, Vestibular Areflexia Syndrome cases are caused by the presence of a nonsense variant in compound heterozygosity with the pathogenic repeat expansion in the RFC1 gene. Clin Genet 103, 236–241 (2023).
4. Cortese A. et al. Cerebellar ataxia, neuropathy and vestibular areflexia syndrome (CANVAS): genetic and clinical aspects. Pract Neurol 22, 14–18 (2022).
5. Cortese A. et al. Cerebellar ataxia, neuropathy, vestibular areflexia syndrome due to RFC1 repeat expansion. Brain 143, 480–490 (2020).
6. Ogi T. et al. Three DNA polymerases, recruited by different mechanisms, carry out NER repair synthesis in human cells. Mol Cell 37, 714–727 (2010).
7. Iyama T. & Wilson D. M. DNA repair mechanisms in dividing and non-dividing cells. DNA Repair (Amst) 12, 620–636 (2013).
8. Benkirane M. et al. RFC1 nonsense and frameshift variants cause CANVAS: clues for an unsolved pathophysiology. Brain 145, 3770–3775 (2022).
9. Ronco R. et al. Truncating Variants in RFC1 in Cerebellar Ataxia, Neuropathy, and Vestibular Areflexia Syndrome. Neurology 100, e543–e554 (2023).
10. King K. A. et al. Whole-Genome and Long-Read Sequencing Identify a Novel Mechanism in RFC1 Resulting in CANVAS Syndrome. Neurol Genet 8, e200036 (2022).
11. Weber S. et al. Two RFC1 splicing variants in CANVAS. Brain awac466 (2022) doi: 10.1093/brain/awac466.
12. Khristich A. N. & Mirkin S. M. On the wrong DNA track: Molecular mechanisms of repeat-mediated genome instability. J Biol Chem 295, 4134–4170 (2020).
13. Dominik N. et al. Normal and pathogenic variation of RFC1 repeat expansions: implications for clinical diagnosis. Brain awad240 (2023) doi: 10.1093/brain/awad240.
14. Barghigiani M. et al. Screening for RFC-1 pathological expansion in late-onset ataxias: a contribution to the differential diagnosis. J Neurol 269, 5431–5435 (2022).
15. Akçimen F. et al. Investigation of the RFC1 Repeat Expansion in a Canadian and a Brazilian Ataxia Cohort: Identification of Novel Conformations. Front Genet 10, 1219 (2019).
16. Abramzon Y. et al. Investigating RFC1 expansions in sporadic amyotrophic lateral sclerosis. J Neurol Sci 430, 118061 (2021).
17. Scriba C. K. et al. A novel RFC1 repeat motif (ACAGG) in two Asia-Pacific CANVAS families. Brain 143, 2904–2910 (2020).
18. Erdmann H. et al. Parallel in-depth analysis of repeat expansions in ataxia patients by long-read sequencing. Brain awac377 (2022) doi: 10.1093/brain/awac377.
19. Mirkin S. M. & Frank-Kamenetskii M. D. H-DNA and Related Structures. Annu Rev Biophys Biomol Struct 23, 541–576 (1994).
20. Ding Y., Fleming A. M. & Burrows C. J. Case studies on potential G-quadruplex-forming sequences from the bacterial orders Deinococcales and Thermales derived from a survey of published genomes. Sci Rep 8, 15679 (2018).
21. Krasilnikov A. S. et al. Mechanisms of Triplex-Caused Polymerization Arrest. Nucleic Acids Res 25, 1339–1346 (1997).
22. Gardner A. F. & Jack W. E. Determinants of nucleotide sugar recognition in an archaeon DNA polymerase. Nucleic Acids Res 27, 2545–2553 (1999).
23. Matos-Rodrigues G. et al. S1-END-seq reveals DNA secondary structures in human cells. Mol Cell 82, 3538–3552.e5 (2022).
24. McGinty R. J. & Sunyaev S. R. Revisiting mutagenesis at non-B DNA motifs in the human genome. Nat Struct Mol Biol (2023) doi: 10.1038/s41594-023-00936-6.
25. Radchenko E. A. et al. Partners in crime: Tbf1 and Vid22 promote expansions of long human telomeric repeats at an interstitial chromosome position in yeast. PNAS Nexus 1, pgac080 (2022).
26. Rastokina A. et al. Large-scale expansions of Friedreich’s ataxia GAA•TTC repeats in an experimental human system: role of DNA replication and prevention by LNA-DNA oligonucleotides and PNA oligomers. Nucleic Acids Res gkad441 (2023) doi: 10.1093/nar/gkad441.
27. Voineagu I., Freudenreich C. H. & Mirkin S. M. Checkpoint Responses to Unusual Structures Formed by DNA Repeats. Mol Carcinog 48, 309–318 (2009).
28. Samadashwily G. M., Raca G. & Mirkin S. M. Trinucleotide repeats affect DNA replication in vivo. Nat Genet 17, 298–304 (1997).
29. Hile S. E. & Eckert K. A. Positive correlation between DNA polymerase alpha-primase pausing and mutagenesis within polypyrimidine/polypurine microsatellite sequences. J Mol Biol 335, 745–759 (2004).
30. Anand R. P. et al. Overcoming natural replication barriers: differential helicase requirements. Nucleic Acids Res 40, 1091–1105 (2012).
31. Patel H. P., Lu L., Blaszak R. T. & Bissler J. J. PKD1 intron 21: triplex DNA formation and effect on replication. Nucleic Acids Res 32, 1460–1468 (2004).
32. Liu G. et al. Replication fork stalling and checkpoint activation by a PKD1 locus mirror repeat polypurine-polypyrimidine (Pu-Py) tract. J Biol Chem 287, 33412–33423 (2012).
33. Krasilnikova M. M. & Mirkin S. M. Replication stalling at Friedreich’s ataxia (GAA)n repeats in vivo. Mol Cell Biol 24, 2286–2295 (2004).
34. Masnovo C., Lobo A. F. & Mirkin S. M. Replication dependent and independent mechanisms of GAA repeat instability. DNA Repair (Amst) 118, 103385 (2022).
35. Wang G. & Vasquez K. M. Dynamic alternative DNA structures in biology and disease. Nat Rev Genet (2022) doi: 10.1038/s41576-022-00539-9.
36. Vander Horn P. B. et al. Thermo Sequenase DNA polymerase and T. acidophilum pyrophosphatase: new thermostable enzymes for DNA sequencing. Biotechniques 22, 758–762, 764–765 (1997).
37. Bhattacharyya D., Mirihana Arachchilage G. & Basu S. Metal Cations in G-Quadruplex Folding and Stability. Front Chem 4, 38 (2016).
38. Chashchina G. V., Beniaminov A. D. & Kaluzhny D. N. Stable G-Quadruplex Structures of Oncogene Promoters Induce Potassium-Dependent Stops of Thermostable DNA Polymerase. Biochemistry (Mosc) 84, 562–569 (2019).
39. Castillo Bosch P. et al. FANCJ promotes DNA synthesis through G-quadruplex structures. EMBO J 33, 2521–2533 (2014).
40. Malkov V. A., Voloshin O. N., Soyfer V. N. & Frank-Kamenetskii M. D. Cation and sequence effects on stability of intermolecular pyrimidine-purine-purine triplex. Nucleic Acids Res 21, 585–591 (1993).
41. Lee S.-J. & Richardson C. C. Choreography of bacteriophage T7 DNA replication. Curr Opin Chem Biol 15, 580–586 (2011).
42. Frank-Kamenetskii M. D. & Mirkin S. M. Triplex DNA Structures. Annu Rev Biochem 64, 65–95 (1995).
43. Aymami J., Coll M., Frederick C. A., Wang A. H. & Rich A. The propeller DNA conformation of poly(dA).poly(dT). Nucleic Acids Res 17, 3229–3245 (1989).
44. Kowalski D. & Eddy M. J. The DNA unwinding element: a novel, cis-acting component that facilitates opening of the Escherichia coli replication origin. EMBO J 8, 4335–4344 (1989).
45. Rubin C. M. & Schmid C. W. Pyrimidine-specific chemical reactions useful for DNA sequencing. Nucleic Acids Res 8, 4613–4619 (1980).
46. Krasilnikova M. M. & Mirkin S. M. Analysis of Triplet Repeat Replication by Two-Dimensional Gel Electrophoresis. in Trinucleotide Repeat Protocols (ed. Kohwi Y.) 19–28 (Humana Press, 2004). doi: 10.1385/1-59259-804-8:019.
47. Shishkin A. A. et al. Large-Scale Expansions of Friedreich’s Ataxia GAA Repeats in Yeast. Mol Cell 35, 82–92 (2009).
48. Kim H.-M. et al. Chromosome fragility at GAA tracts in yeast depends on repeat orientation and requires mismatch repair. EMBO J 27, 2896–2906 (2008).
49. Gacy A. M. et al. GAA instability in Friedreich’s Ataxia shares a common, DNA-directed and intraallelic mechanism with other trinucleotide diseases. Mol Cell 1, 583–593 (1998).
50. Mariappan S. V., Catasti P., Silks L. A., Bradbury E. M. & Gupta G. The high-resolution structure of the triplex formed by the GAA/TTC triplet repeat associated with Friedreich’s ataxia. J Mol Biol 285, 2035–2052 (1999).
51. Sakamoto N. et al. Sticky DNA: Self-Association Properties of Long GAA·TTC Repeats in R·R·Y Triplex Structures from Friedreich’s Ataxia. Mol Cell 3, 465–475 (1999).
52. Vetcher A. A. et al. Sticky DNA, a long GAA.GAA.TTC triplex that is formed intramolecularly, in the sequence of intron 1 of the frataxin gene. J Biol Chem 277, 39217–39227 (2002).
53. Kouzine F. et al. Permanganate/S1 Nuclease Footprinting Reveals Non-B DNA Structures with Regulatory Potential across a Mammalian Genome. Cell Syst 4, 344–356.e7 (2017).
54. Lapidot A., Baran N. & Manor H. (dT-dC)n and (dG-dA)n tracts arrest single stranded DNA replication in vitro. Nucleic Acids Res 17, 883–900 (1989).
55. Baran N., Lapidot A. & Manor H. Formation of DNA triplexes accounts for arrests of DNA synthesis at d(TC)n and d(GA)n tracts. Proc Natl Acad Sci U S A 88, 507–511 (1991).
56. Dayn A., Samadashwily G. M. & Mirkin S. M. Intramolecular DNA triplexes: unusual sequence requirements and influence on DNA polymerization. Proc Natl Acad Sci U S A 89, 11406–11410 (1992).
57. Samadashwily G. M., Dayn A. & Mirkin S. M. Suicidal nucleotide sequences for DNA polymerization. EMBO J 12, 4975–4983 (1993).
58. Pollard L. M. et al. Replication-mediated instability of the GAA triplet repeat mutation in Friedreich ataxia. Nucleic Acids Res 32, 5962–5971 (2004).
59. Pelletier R., Krasilnikova M. M., Samadashwily G. M., Lahue R. & Mirkin S. M. Replication and Expansion of Trinucleotide Repeats in Yeast. Mol Cell Biol 23, 1349–1357 (2003).
60. Liu G. et al. Altered replication in human cells promotes DMPK (CTG)(n) · (CAG)(n) repeat instability. Mol Cell Biol 32, 1618–1632 (2012).
61. Lopes J. et al. G-quadruplex-induced instability during leading-strand replication. EMBO J 30, 4033–4046 (2011).
62. Guilbaud G. et al. Local epigenetic reprogramming induced by G-quadruplex ligands. Nat Chem 9, 1110–1117 (2017).
63. Sarkies P. et al. FANCJ coordinates two pathways that maintain epigenetic stability at G-quadruplex DNA. Nucleic Acids Res 40, 1485–1498 (2012).
64. Dahan D. et al. Pif1 is essential for efficient replisome progression through lagging strand G-quadruplex DNA secondary structures. Nucleic Acids Res 46, 11847–11857 (2018).