Supporting Text
Supporting Methods
E. coli Strains. KMBL1001 (no known mutations) and CS5429:KMBL1001 (ΔuvrB::Cam) were obtained from Nora Goosen (Leiden Institute of Chemistry, Leiden, The Netherlands). AB1157 [thr-1, araC14, leuB6, Δ(gpt–proA)62, lacY1, tsx-33, qsr'−, glnV44(AS), galK2(Oc), λ−, Rac-0, hisG4(Oc), rfbD1, mgl-51, rpoS396(Am), rpsL31(strR), kdgK51, xylA5, mtl-1, argE3(Oc), thi-1] was obtained from the E. coli Genetic Stock Center (Yale University, New Haven, CT); JC10287:AB1157 [Δ(srlR–recA)304] was a gift from Susan M. Rosenberg (Baylor College of Medicine, Houston). KM52:AB1157 (ΔmutL460::Cam), KM55:AB1157 (ΔmutH461::Cam), and ES1574:AB1157 (ΔmutS260::Tn5) were gifts from Martin G. Marinus (University of Massachusetts, Worcester). RW118 [thi-1, thr-1, araD139, lacY1, argE3(Oc), Δ(gpt–proA)62, mtl-1, xyl-5, rpsL31(strR), tsx-33, supE44, galK2(Oc), hisG4(Oc), rfbD1, kdgK51, sulA211], RW630:RW118 (ΔumuDC595::Cat, ΔpolB::Spc, ΔdinB::Kan), RW110 [uvrA6, thi-1, thr-1, araD-14, leuB6, lacY1, ilv323ts, Δ(gpt–proA)62, mtl-1, xyl-5, rpsL31(strR), tsx-33, supE44, galK2(Oc), hisG4(Oc), rfbD1, kdgK51, sulA211], and DV44:RW110 (ΔumuDC595::ErmGT, ΔpolB::Spc, ΔdinB61::Ble) were a kind gift of Roger Woodgate (National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD). Table 4 shows the relevant features of the strains and the acronyms used in this study.
Plasmids. We constructed pRW3619 from pGFPuv (Clontech) by cloning a transcription terminator cassette (Ter) with the rrnBT1 and rrnBT2 terminator sites (1) of pEH1 (GenBank accession no. GI:3560095) in the SapI site. The oligonucleotides TERT (AGCATAAAACGAAAGGCTCAGTCGAAAGACTGGGCCTTTCGTTTTATCTGTTGTTTGAGAAGGCCATCCTGACGGATGGCCTTTTT) and TERB (GCTAAAAAGGCCATCCGTCAGGCTGGCCTTCTCAAACAACAGATAAAACGAAAGGCCCAGTCTTTCGACTGAGCCTTTCGTTTTAT) formed single-stranded AGC and GCT at each end after annealing, which were compatible with SapI-cleaved pGFPuv. Ligation reconstituted only one SapI site. Recombinants were confirmed by BsrBI cleavage, which released a 241-nt fragment from pGFPuv and an ≈320-nt fragment from pRW3619. Polyacrylamide gel electrophoresis analyses revealed a band at ≈360 bp instead of 320, which we attributed to a correct insert migrating more slowly than expected from its size because of DNA bending at A/T tracts (2) (Fig. 1). In all cases, plasmids were purified by CsCl density gradient ultracentrifugation (3).
pRW3620 contained a 2.5-kb poly(R•Y) tract from intron 21 of the PKD1 gene. The tract was excised from pBS4.0 (4) by cleavage with PpuMI, which released an ≈2,750-bp insert with the polypyrimidine-rich strand flanked by ≈80 nt on the 5' side and ≈180 nt on the 3' side. The fragment was ligated to EcoRI-linearized pRW3619 through two linkers: linker A (oligonucleotide 1 + oligonucleotide 2) and linker B (oligonucleotide 1 + oligonucleotide 3). Oligonucleotide 1 was AATTCGACCGTGCG, oligonucleotide 2 was GTCCGCACGGTCG, and oligonucleotide 3 was GACCGCACGGTCG. Recombinant clones were selected by transformation in E. coli strain HB101 (Fig. 1).
pRW3621 was found during the screening for the recombinant pRW3620. In pRW3621, the poly(R•Y) tract underwent an ≈1.5-kb internal deletion that included the unique BbsI recognition site (Fig. 1). pIQ-kan (5) was a gift from Richard P. Bowater (University of East Anglia, UK).
Media and Chemicals. M9 salts contained 0.03 M Na2HPO4, 0.01 M NaCl, 0.02 M NH4Cl, 0.025 M KH2PO4, and 0.4% Casamino acids. Bacterial cells were grown on minimal media containing M9 salts, 0.4% glucose, 10 μg/ml thiamin, and 0.005 M MgSO4. Green fluorescent and white CFUs were visualized on agar plates supplemented with 1 mM IPTG by using a model 3-6000 Fotodyne hand-held UV lamp. We used ampicillin at 100 μg/ml, chloramphenicol at 34 μg/ml, and kanamycin at 50 μg/ml. Restriction enzymes were from New England Biolabs (Beverly, MA) and other enzymes were from USB (Cleveland, OH).
Bacterial Lysates. Bacterial cultures were grown in 100 ml of minimal medium at 37°C with shaking at 200 rpm; at the end of the growth, the OD600 was measured on a 1:10 dilution. Cells were harvested, washed twice in PBS (3) at 4°C, resuspended in lysis buffer (50 mM NaH2PO4/10 mM Tris•HCl, pH 8.0/200 mM NaCl) containing 0.5 mg/ml lysozyme at 2 ml/OD600 and then kept on ice for 30 min. Cells were disrupted by five freeze-thaw cycles and forced nine times through an 18-gauge needle with a 10-ml syringe. The lysate was collected by centrifugation at 23,000 × g for 30 min at 4°C. In later experiments, we performed an additional centrifugation at 400,000 × g for 30 min at 4°C to decrease the background fluorescence. Standard curves were made by adding rGFPuv (Clontech) to lysates prepared from untransformed E. coli cells.
Fluorometric Measurements of GFP and Control of Transcription. For the commercial vector pGFPuv, expression of the GFP gene was under the control of the lacZ promoter/operator, and activation of the promoter was expected to occur after induction by IPTG. However, E. coli cells transformed with pGFPuv were fluorescent in the absence of IPTG, suggesting that the concentration of lac repressor was insufficient, and/or that transcription from the ampicillin gene continued into the GFP gene. To obtain no fluorescence in the absence of IPTG, we cloned a transcriptional terminator before the GFP gene (Fig. 1) and cotransformed cells with pIQ-kan, which expressed a variant lacI allele overproducing the repressor.
We analyzed the transcriptional activation of GFP by quantitative determinations of the protein in bacterial lysates. Fluorescence measurements were taken on a Turner Quantech fluorimeter using an NB405 excitation filter and a SC500 emission filter. Readings were taken in the "raw fluorescence" mode. To standardize the fluorescence assays, we performed the following experiments. First, standard curves were made by supplementing GFP-free lysates with known amounts of rGFPuv, which gave the expected linear relationships (Fig. 5A). Because the fluorescence in the blanks from different lysates had variable values, large errors were obtained when the concentrations of lysate in the samples and in the standard curve were different. Fig. 5B shows that diluting a lysate sample with lysate buffer rather than with GFP-free lysate yielded underestimated values. To correct for this underestimation, bacterial pellets were resuspended in amounts of lysis buffer proportional to their optical density. Repetitive measurements of the same sample also gave erroneous readings (Fig. 5C).
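The standard-curve step can be sketched as a least-squares line through readings of GFP-free lysate spiked with known amounts of rGFPuv, which is then inverted to convert a sample reading into a concentration. The numbers below are hypothetical placeholders, not values from Fig. 5, and the function names are illustrative only; the sketch assumes that sample and standards share the same lysate concentration, as the text requires.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def gfp_from_reading(reading, slope, intercept):
    """Invert the standard curve: raw fluorescence -> GFP concentration."""
    return (reading - intercept) / slope


# Hypothetical standards: rGFPuv added to GFP-free lysate (ug/ml)
# and the corresponding raw fluorescence readings (arbitrary units).
standards_ug_per_ml = [0.0, 0.5, 1.0, 2.0, 4.0]
raw_fluorescence = [12.0, 62.0, 112.0, 212.0, 412.0]

slope, intercept = fit_line(standards_ug_per_ml, raw_fluorescence)
sample_gfp = gfp_from_reading(262.0, slope, intercept)
print(f"slope={slope:.1f}, blank={intercept:.1f}, sample={sample_gfp:.2f} ug/ml")
```

The intercept plays the role of the lysate blank, which is why a curve built in one lysate concentration should not be applied to samples at another.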
The amount of GFP in wt1 and ΔuvrB cells harboring both pIQ-kan and pRW3619 at the end of a period of logarithmic growth was below the detection limit. In the absence of pIQ-kan, between 0.050 and 0.100 μg/ml GFP was detected. When IPTG was added to wt1 and ΔuvrB cells harboring pIQ-kan and pRW3619 at the beginning of the logarithmic growth for a 3-h period, the concentrations of GFP were as follows: for 0.005 and 0.05 mM IPTG, not detectable; for 0.5 and 5 mM IPTG, 0.2 and 0.7 μg/ml, respectively. Induction with 2 mM IPTG for 3, 6, and 21.5 h in wt1 yielded 2.5, 23, and 80 μg/ml GFP, respectively (Fig. 5D). We conclude that the two-plasmid system in conjunction with millimolar concentrations of IPTG provided an effective method for controlling transcription of the GFP gene. Therefore, we used a concentration of 2 mM IPTG and an induction period of 6 h to routinely stimulate transcription.
Determination of the Frequency of White CFUs. We first transformed wt1, ΔUvrB, wt2, ΔMutL, and ΔRecA E. coli strains (part A of Table 4) with pIQ-kan and subsequently with pRW3619, pRW3620, or pRW3621. For each clone, we made a glycerol stock after confirming the plasmids by restriction analyses. Glycerol stocks were used to start overnight 10-ml cultures in the presence of ampicillin and kanamycin. One-milliliter aliquots were then transferred to two 500-ml flasks containing 100 ml of medium with ampicillin and kanamycin prewarmed to 37°C, and growth was continued for 3 h at 200 rpm, after which one flask was supplemented with 2 mM IPTG whereas both flasks received a second addition of ampicillin and kanamycin (10 μg/ml). Growth was continued for 6 h. The cultures were stopped on ice, OD600 was measured, and plasmid DNA was isolated as described above.
These plasmid preparations were then used to transform ΔUvrB cells. Transformants were plated in the presence of ampicillin and 1 mM IPTG, and green and white CFUs were counted. We streaked white CFUs on plates (with ampicillin and IPTG) and determined their fluorescence again under conditions where no individual colonies were apparent. We termed the colonies that appeared white at this stage "white CFUs". To determine which of the white CFUs had a mutation in the GFP gene, we performed an additional restreaking to analyze individual colonies. In most cases, these turned out to be fluorescent. However, when plates showed that all, or most, individual CFUs retained their white phenotype, we restreaked them twice more. CFUs with a persistently white phenotype were transferred to liquid medium, from which we prepared a glycerol stock and plasmid DNA. This plasmid DNA was used for restriction analyses, DNA sequencing, and retransformation into new competent ΔUvrB cells to verify their fluorescence. The white transformants that showed instabilities were named "mutant white CFUs".
Amplification Primers. Primers for cycle sequencing were obtained from MWG-Biotech (High Point, NC) or from Sigma Genosys (The Woodlands, TX). The forward primers (and their positions relative to the pRW3620 map) were as follows: primer 1 at position 157 (TTCCGGCTCGTATGTTGTGTGG), primer 2 at position 552 (GCCCGAAGGTTATGTACAG), primer 3 at position 791 (GCCACAACATTGAAGATGGATCCG), primer 4 at position 5069 (ACACGACGGGGAGTCAGGC), and primer 5 at position 5905 (TGGAAAAACGCCAGCAACGCGGCC). The reverse primers were as follows: primer 6 at position 570 (CTGTACATAACCTTCGGGCATGGC), primer 7 at position 1093 (GAGAGTGGAGGGCACAGAGCAGC), primer 8 at position 3719 (GCTGGAGAGCCCACTTGAC), primer 9 at position 4041 (GCATATGGTGCACTCTCAG), and primer 10 at position 4271 (TATTGAAGCATTTATCAGGG).
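As a quick consistency check on the list above, forward primer 2 and reverse primer 6 appear to cover the same site on opposite strands: the reverse complement of primer 2 is the first 19 nt of primer 6. The short sketch below verifies this from the sequences as printed. The position arithmetic in the comment (552 + 19 − 1 = 570) assumes that each reverse primer is mapped by the coordinate of its 5' end, which the text does not state explicitly.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


PRIMER_2 = "GCCCGAAGGTTATGTACAG"        # forward, position 552
PRIMER_6 = "CTGTACATAACCTTCGGGCATGGC"   # reverse, position 570

rc_primer_2 = reverse_complement(PRIMER_2)
# Primer 2 spans 552..570 if it is 19 nt long (552 + 19 - 1 = 570).
shares_site = PRIMER_6.startswith(rc_primer_2)
print(rc_primer_2, shares_site)
```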
DNA Sequencing. We sequenced clones on both strands using the primers described above. DNA isolated from mutant white CFUs was analyzed by cycle sequencing using an ABI Prism Big Dye Terminator Cycle Sequencing Kit (P/N 4303152, Applied Biosystems) and the ABI 310 Genetic Analyzer. Approximately 100–500 ng of DNA was mixed with 10 μM primer and 6 μl of enzyme mix and was then diluted to a final volume of 20 μl in water. Optimization of signal required two different thermal cycling conditions. Low-temperature cycle sequencing conditions consisted of heating to 98°C for 30 s, followed by 40 cycles of DNA annealing (50°C, 2 min) and elongation (62°C, 4 min). High-temperature cycle sequencing conditions consisted of heating to 98°C for 30 s, followed by 40 cycles of DNA annealing (58.3°C, 2 min) and elongation (67°C, 4 min). The reactions were purified by using Centrisep filter columns (Princeton Separations, Adelphia, NJ). The samples were dried, combined with HiDi formamide (Applied Biosystems), and then separated on the Genetic Analyzer with the POP-6 polymer system. The exact nature of the recombinational repair events was determined by using DNAStar sequence alignment software.
Statistical Tests. Two different statistical tests were used. For the analyses of white CFUs, the differences between groups of data were evaluated by the z test containing the Yates correction for continuity (6) with the program SigmaStat 2.0 (SPSS, Chicago). We used the z values to obtain P(α), the probability of incurring a type I error. To be significant, the P(α) value at P < 0.05 [P(α)0.05] or P < 0.01 [P(α)0.01] had to exceed 0.800. For the human database GRaBD, the distances of (R•Y)n and (RY•RY)n tracts from the breakpoint junctions were compared with the distances of (R•Y)n and (RY•RY)n tracts from random "control" locations, as specified later, using the survival log-rank test.
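For orientation, a two-proportion z test with the Yates continuity correction can be sketched as follows. This is a common textbook form and is only assumed to match what SigmaStat computes; the text does not give the formula, and the power criterion mentioned above is not reproduced here. The counts in the usage line are hypothetical, not data from this study.

```python
import math


def z_test_yates(x1, n1, x2, n2):
    """Two-proportion z test with Yates continuity correction.

    x1/n1 and x2/n2 are event counts over totals (e.g. white CFUs over
    all CFUs). Returns (z, two-sided P). Assumes the pooled proportion
    is strictly between 0 and 1, otherwise the standard error is zero.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    correction = 0.5 * (1 / n1 + 1 / n2)
    # The corrected difference is not allowed to become negative.
    diff = max(abs(p1 - p2) - correction, 0.0)
    z = diff / se
    p_two_sided = math.erfc(z / math.sqrt(2))
    return z, p_two_sided


# Hypothetical counts: 30 white CFUs of 1,000 versus 10 of 1,000.
z, p = z_test_yates(30, 1000, 10, 1000)
print(f"z = {z:.2f}, P = {p:.4f}")
```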
Recultivation Assays. Cells (part B of Table 4) were transformed by electroporation and ≈2,000–5,000 CFUs were used to start 10-ml cultures in LB medium with appropriate antibiotics. Cells were grown overnight to stationary phase conditions (1st recultivation). One microliter (≈10^6 CFUs) was transferred to new medium (2nd recultivation) for a new overnight growth, and the procedure was repeated 5–7 times. Plasmid DNA was isolated after each recultivation and cleaved with AlwNI at its unique restriction site located in the origin of replication region, and the DNA was analyzed by 1% agarose gel electrophoresis. The instabilities were quantitated by ethidium bromide staining using the AlphaDigidoc 1000 (Alpha Innotech, San Leandro, CA) software.
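A rough estimate of the number of cell generations per recultivation follows from the dilution at each transfer. The sketch below assumes that every recultivation uses a 10-ml culture (stated only for the first one) and that each culture regrows to the same stationary-phase density; both are assumptions, so the result is an order-of-magnitude guide rather than a measured value.

```python
import math


def generations_per_transfer(inoculum_ml, culture_ml):
    """Doublings needed to regrow to the pre-transfer density."""
    return math.log2(culture_ml / inoculum_ml)


# 1 microliter transferred into an assumed 10-ml culture (1:10,000 dilution).
per_transfer = generations_per_transfer(0.001, 10.0)
print(f"about {per_transfer:.1f} generations per recultivation")
```

Under these assumptions, each overnight recultivation corresponds to roughly 13 doublings.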
Complexity Analysis. Complexity analysis (7, 8) was used to assess the likelihood of DNA breakage through the adoption of supercoil-dependent alternative DNA structures and the potential of illegitimate DNA end-joining through either secondary structure formation or slipped mispairing. Complexity analysis is based on the assessment of the occurrences of four different types of repetitive element, namely direct repeats, inverted repeats, symmetric elements, and inversions of inverted repeats. Because sequence complexity is strongly influenced by the occurrence of certain types of repeat within the DNA sequence, it represents an appropriate measure to assess the potential of a given sequence to form secondary structure or to exhibit sequence homology. The decomposition of a DNA fragment (S = AAGATCTTGA) with reference to the different types of repeats is exemplified in Fig. 7. Henceforth, repeated fragments are underlined, whereas repeats capable of DNA secondary structure formation are marked by arrows. Decompositions were determined for 50-nt windows centered at the deletion breakpoints, which comprised the AˆB, AˆC, AˆD, CˆB, BˆD, and CˆD segments of DNA sequences. The length of the fragments to be considered as potentially involved in DNA structure formation was set at 5 bp. This length was selected on the basis of the statistical result of Zubkov and Michailov (9), who showed that the estimated average length of the longest repeat in a random text that has the same length and the same letter composition as a given text tends to $2 \ln N / \left| \ln \sum_{r=1}^{n} p_r^{2} \right|$, where N is the length of the sequence and $p_r$ ($1 \le r \le n$) is the probability of the occurrence of the rth nucleotide in the sequence. In addition, decompositions for ≈300 bp on either side of the breakpoints were computed.
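The choice of a 5-bp threshold can be illustrated numerically. The sketch below assumes that the expected longest-repeat length takes the form 2 ln N / |ln Σ p_r²|, with the sum running over the nucleotide frequencies; this is our reading of the Zubkov and Michailov result and should be checked against ref. 9. The function names are illustrative.

```python
import math
from collections import Counter


def expected_longest_repeat(n_length, probs):
    """Assumed form: 2 ln N / |ln sum(p_r^2)| for letter frequencies p_r.

    Undefined when a single letter has probability 1 (the log is zero).
    """
    sum_sq = sum(p * p for p in probs)
    return 2 * math.log(n_length) / abs(math.log(sum_sq))


def expected_longest_repeat_for(seq):
    """Same estimate, using the observed composition of a sequence."""
    counts = Counter(seq.upper())
    probs = [c / len(seq) for c in counts.values()]
    return expected_longest_repeat(len(seq), probs)


# A 50-nt window with equal base frequencies.
print(round(expected_longest_repeat(50, [0.25] * 4), 2))
```

For a 50-nt window with equal base frequencies the estimate is between 5 and 6 nt, so repeats of 5 bp or more sit at about the length expected by chance in such a window.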
Analysis of the "Nearest (RY•RY)n and (R•Y)n Tract to Rearrangement Breakpoint" in the GRaBD. The Gene Rearrangement Breakpoint Database (GRaBD) represents a collection of 397 published germ-line and somatic DNA breakpoint junctions derived from 219 different chromosomal rearrangements, most of which cause either human inherited disease or cancer (10). The distance from the nearest (RY•RY)n or (R•Y)n tract to breakpoint (n = 5–25) junction was analyzed for 126 gross deletions and 96 translocations. Two values, representing the distance from the breakpoint to the nearest 5' or 3' tract within 125 bp, respectively, were calculated for each breakpoint junction. These distance values were calculated for a series of motifs from 5 to 25 bp in length. Tracts that overlapped the breakpoint were classed as either 5' or 3' depending upon which side of the breakpoint the bulk of the motif fell (to ensure that this classification could be performed unambiguously, only motifs of an odd-numbered length were analyzed). Breakpoint-overlapping motifs were assigned the distance value of 1, motifs immediately adjacent to the breakpoint were assigned a score of 2, and motifs more distant from the breakpoint were assigned a score of 2 plus the number of intervening base pairs between the breakpoint and the motif. If the motif was absent upstream (or downstream) of the breakpoint, the result was treated as a censored observation. As a control, a collection of genomic location-matched human "reference DNA sequences" comprising 596,093 bp was retrieved from the GenBank and European Molecular Biology Laboratory (EMBL) databases. For each GRaBD entry, a control set of 10 DNA sequences matching the entry in terms of both sequence length and genomic location was sampled from this collection. Because the data were treated as censored, a statistical approach based on the survival log-rank test was adopted. 
By using this test, the distance scores calculated from the case and control sets were compared. Similar analyses using larger control sets (100, 1,000, and 7,500 control sequences per breakpoint junction) generated virtually identical results (not shown). This approach was applied to the 37 deletion and 53 translocation breakpoint junctions whose breakpoint positions could be identified unambiguously.
The breakpoint locations of the remaining 89 deletion and 43 translocation junction sequences were ambiguous, since the sites of DNA breakage occurred within regions of sequence homology. In these cases, the breakpoint junctions were surveyed for the presence of (RY•RY)n or oligo(R•Y)n tracts of at least 5 bp occurring within either the regions of ambiguity or within the flanking 1–20 bp.
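The distance scoring used for unambiguous breakpoints (1 for an overlapping motif, 2 for an adjacent one, 2 plus the intervening base pairs otherwise, and a censored observation when no motif is found within 125 bp) can be sketched as below. The coordinate convention, the function names, and the use of None for censored values are assumptions made for illustration; sequences are assumed to be uppercase and to contain only A, C, G, and T.

```python
PURINES = set("AG")


def is_ry_tract(window):
    """(R.Y)n: the window is all purines or all pyrimidines."""
    return len({base in PURINES for base in window}) == 1


def is_alternating_tract(window):
    """(RY.RY)n: purines and pyrimidines strictly alternate."""
    return all((a in PURINES) != (b in PURINES)
               for a, b in zip(window, window[1:]))


def score_window(start, n, breakpoint):
    """Score a motif at seq[start:start+n]; the break lies between
    seq[breakpoint-1] and seq[breakpoint]. Returns (side, score)."""
    end = start + n
    if start < breakpoint < end:
        # Overlap: side holding the bulk of the motif (n odd, so no ties).
        side = "5'" if (breakpoint - start) > (end - breakpoint) else "3'"
        return side, 1
    if end <= breakpoint:
        return "5'", 2 + (breakpoint - end)
    return "3'", 2 + (start - breakpoint)


def nearest_tract_scores(seq, breakpoint, n, is_tract, max_bp=125):
    """Smallest score on each side; None marks a censored observation."""
    best = {"5'": None, "3'": None}
    for start in range(len(seq) - n + 1):
        if not is_tract(seq[start:start + n]):
            continue
        side, score = score_window(start, n, breakpoint)
        if score - 2 > max_bp:
            continue  # motif lies beyond the 125-bp search limit
        if best[side] is None or score < best[side]:
            best[side] = score
    return best


# Hypothetical sequence with one purine run (GAGGAG) and a break at index 12.
print(nearest_tract_scores("CATGAGGAGCATGCATG", 12, 5, is_ry_tract))
```

Returning None for a side without a motif keeps censored observations distinct from real scores, which is what a log-rank comparison of case and control sets requires.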
Supporting Results
Other Mutations. To determine the mechanisms responsible for the PKD1-induced loss of fluorescence, we replated each of the 1,434 white CFUs on ampicillin and IPTG-containing plates at least three consecutive times. In ≈80% of cases, the nonfluorescent phenotype was limited to confluent cells. When restreaked, either fluorescent or very weakly fluorescent individual colonies appeared. We conclude that the white phenotype was caused by a decrease in GFP synthesis, rather than by mutations affecting the gene product. In the other 20% of cases, there was no recovery of fluorescence. These cells grew poorly, did not form individual colonies, and eventually ceased growing. To determine whether these results were caused by the exposure to UV light or by the PKD1 tract, we performed the following control. Ten plates containing confluent fluorescent wt1 cells harboring pRW3620 were exposed to UV for 15 s and then restreaked. The procedure was repeated 10 times for each plate. Small patches of fluorescence loss appeared after the third UV treatment in two plates, and then in others after subsequent treatments. In two plates, however, loss of fluorescence was never observed. Cells restreaked from the nonfluorescent patches always grew and again formed both fluorescent individual CFUs and small nonfluorescent patches. These data show that exposure to UV did induce loss of fluorescence. However, the effect was less dramatic than that observed for the white CFUs and did not cause growth arrest. Therefore, we conclude that the plasmids, rather than UV light, were responsible for the white CFU phenotype.
To determine the reason for the loss of fluorescence, we isolated pRW3620 from 200 white CFUs. We found that only dimer plasmid was detected in 10% of the colonies. Because plasmid multimerization inhibits recombinant protein expression (11, 12), this result suggested that the PKD1 tract increased loss of fluorescence by inducing plasmid dimerization. Therefore, we measured the monomer/dimer (M/D) ratio for wt2, ΔRecA, and ΔMutL cells transformed with pRW3620 after 1, 3, 5, and 7 one-day recultivations in the absence of IPTG (Fig. 6). The values decreased from ≈2 in the DNA used for transformation to 0.45 ± 0.14 in wt2, 0.38 ± 0.09 in ΔRecA, and 1.42 ± 0.34 in ΔMutL. These data show that the PKD1 tract increased the fraction of dimer plasmid, mostly in wt2 and ΔRecA, thereby confirming the conclusion that loss of fluorescence was caused by plasmid multimerization. The data further confirm the results in Table 2, which showed that the MMR protein was required to elicit the PKD1 response.
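To make the M/D ratios easier to compare, they can be converted into the fraction of the measured signal that is dimer, 1 / (1 + M/D). The sketch below assumes that only monomer and dimer species contribute to the ratio and that the ratio reflects relative band signal; neither point is specified beyond the M/D values themselves.

```python
def dimer_fraction(m_over_d):
    """Fraction of dimer given a monomer/dimer ratio (two species only)."""
    return 1.0 / (1.0 + m_over_d)


# M/D values quoted in the text (means only; errors are not propagated).
for label, ratio in [("input DNA", 2.0), ("wt2", 0.45),
                     ("dRecA", 0.38), ("dMutL", 1.42)]:
    print(f"{label}: {dimer_fraction(ratio):.2f}")
```

On this reading, dimers rise from about one-third of the input DNA to roughly 70% in wt2 and ΔRecA, but only to about 40% in ΔMutL.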
To determine the cause for the loss of cell viability, we counted the number of CFUs for wt2, ΔRecA, and ΔMutL transformed with pRW3620 at the end of the fifth recultivation in the presence and absence of IPTG (Table 5). The number of CFUs on agar plates without ampicillin (ampicillin sensitive) was lower in the presence of transcription than in its absence, suggesting longer doubling times or increased cell death. Most CFUs grew on ampicillin-containing agar plates, indicating that pRW3620 was retained. However, for wt2 cells grown with active transcription, the number of CFUs dropped by two orders of magnitude in the presence of ampicillin, indicating that >99% of cells lost the plasmid. Since such a drop was not observed with pRW3621 (not shown), we conclude that (i) loss of cell viability was caused by plasmid loss, (ii) the full-length PKD1 tract together with active transcription was responsible, and (iii) both the MutL and RecA proteins were involved in the process.
Analysis of DNA Sequences Flanking the Breakpoints in Clones 1–3 and 5–11. Clones with breakpoints in the vector and the PKD1 tract. Clone 6 underwent deletion and inversion reactions. A 3-kbp deletion occurred between positions 643 in the GFP gene and 3567 (PKD1 tract) (Fig. 8A and Table 1), which were not joined together. Position 643 was followed by base pair 3985 in an inverted orientation (Fig. 8A Center). A homologous CCC•GGG tract was present at both breakpoints after the inversion. The tract at 643 was located 3 bp upstream from two GTTAA direct repeats (Fig. 8A Top Left). The tract at 3985 was 1 bp downstream from a slipped structure formed by two GCGGGTGT motifs (Middle left), and abutting two tandem GGCT. In addition, a TTAAC tract, homologous to the direct repeats at 643 after the inversion, was located at position 3994. The endpoint of the inverted fragment (base pair 3567) continued with base pair 3902 (base pairs 3986–3901 were lost), at sites where the inversion revealed a homologous C•G pair (Fig. 8B). The breakpoint at C•G 3567 was in the PKD1 tract, and was preceded by direct and mirror repeats. The four GGGAG tracts on the bottom strand would also form tetraplex structures. Alternatively, slippage between 21-bp direct repeats positions the breakpoint at the junction with one of the loops. The breakpoint at G•C 3902 is located 3 bp upstream from an imperfect inverted repeat (cruciform in Fig. 8A Lower Right). Finally, a GGGAG motif, located at the junction with duplex DNA, was homologous, after the inversion, to the tetraplex-forming units at the PKD1 breakpoint. It is possible that these additional homologies near the breakpoints also played a role in the deletion reactions.
In clone 10, the breakpoint in the PKD1 tract (CC•GG at position 2423) was preceded by several direct repeats (Fig. 8C Right), the most marked being the repetition of four GGAGG pentamers, shown in a tetraplex configuration. The location of the breakpoint is 1 bp from the tetraplex. In the vector (Fig. 8C Left), the breakpoint at the CC•GG (position 862) was followed by two direct CCTGT and ACAA repeats and two TTACC mirror repeats. Slippage between the CCTGT repeats serves to position the breakpoint at the junction of the B-helix duplex with the single-stranded loop.
In clone 8, the deletion occurred between two 8-bp homologous tracts, CTCTCCCC (Fig. 8D). At the breakpoint in the PKD1 region, the tract was located within a 13-bp region separating two 15-bp CCCTCTCCTCTCCCC direct repeats. Slippage would position the homologous nucleotides within loops, which may interact (Fig. 8D Lower Left), since they oppose each other because they are separated by one helical turn. The base pair composition and the local topological constraints are compatible with the formation of a short triplex involving the homologous nucleotides. An alternative triplex is also possible (Fig. 8D Lower Right), with the homologous nucleotides wrapping around the duplex DNA to form Y•R•Y triads. Interestingly, the breakpoint would be located between unpaired and paired bases in both conformations. The purine-rich loop may be partially ordered (the configuration shown at the Lower Right is not intended to indicate a minimum energy state) by hydrogen bonds and stacking interactions between purines (13). In the vector (Upper Left), the eight homologous base pairs were followed by three CG dinucleotide repeats. The ability of repeating CG to alternate between right-handed B and left-handed Z configurations, and the susceptibility of B–Z junctions to nuclease cleavage in vitro have been described (14–16).
In clone 5 (Fig. 8E), the deletion occurred between two CCTT homologous base pairs. At the PKD1 breakpoint (Right), the homology was preceded by two 10-bp CCTTTCCCCT direct repeats separated by 17 bp. Slippage locates the breakpoint at the junction between a single-stranded loop and duplex DNA. A similar architecture was present in the vector (Left), where slippage between two GTTAA direct repeats (separated by 9 bp) positioned the breakpoint at the junction between duplex DNA and a single-stranded loop.
Clones with breakpoints in the vector alone. In clone 1 (Fig. 8F), the deletion occurred between two homologous TGA repeats. Two CAACAT motifs were present, one at each breakpoint, and we propose that these were involved in mediating the mutational event. One mechanism would involve slippage between the two CAACAT motifs and the formation of an ≈400-nt loop (Right), which could occur during replication on the lagging strand template. Alternatively, the repeats could interact after the looping of duplex DNA.
In clone 2 (Fig. 8G), the deletion occurred between two homologous A•T pairs. That at position 6087 abutted an imperfect mirror repeat, which we show in a triplex structure. The one at position 3828 was followed by an imperfect inverted repeat, and therefore a cruciform structure. For the cruciforms, we extended the base pairing beyond the program (17) (www.bioinfo.rpi.edu/applications/mfold/old/dna/form1.cgi), to include G•G, G•A (18–20), and T•T (21, 22) pairs.
In clones 3 and 9, the deletions occurred between two homologous T•A pairs, which were external to homologous TAC repeats (Fig. 8H). The breakpoint at position 5931 was followed by two tandem CTGGCCTTTTG repeats, shown as a slipped structure. The breakpoint at 4024 was also preceded by direct repeats (two CAGA repeats) that may form a slipped structure. In addition, a cruciform may form, containing the breakpoint within the stem.
In clone 7 (Fig. 8I), the deletion occurred between homologous AA•TT pairs. Several short direct repeats, expected to form a variety of small slipped structures, were found internal to the breakpoints.
In clone 11 (Fig. 8J), the deletion occurred between inverted CA•TG pairs. The breakpoint at position 6086 was identical to clone 2, whereas the breakpoint at position 4201 was flanked by short direct repeats embedded in an A+T-rich environment. The model shows a slipped structure between two ATAAA repeats.
Analysis of the DNA Sequence at Breakpoints of Clones Isolated from Recultivation Assays in PolII-, PolIV-, and PolV-Deficient Strains. We wished to determine whether the deletions observed in the recultivation assays were also characterized by non-B DNA structures at breakpoints, and whether the DNA damage-induced DNA polymerases (PolII, PolIV, and PolV) played a role in the PKD1-induced instabilities. Replication pausing is increased by non-B DNA-forming triplet repeats associated with neurological diseases (23), and such an effect may stimulate the SOS pathway; these DNA polymerases are a part of this pathway (24, 25). We found that in the presence of transcription, the deletion of the DNA damage-induced polymerases did not affect the stability of pRW3620 (not shown). Furthermore, restriction analyses of 86 clones indicated that the lengths of deletions, from 0.3 to 3.7 kbp, were comparable to those of wt strains. Thus, we conclude that the polymerases did not play a role in the transcription-induced, PKD1-dependent deletions. We selected five clones, three not fluorescent and two fluorescent (Fig. 9), for further sequence analyses.
In clone 12 (Fig. 10A), the deletion took place at homologous T•A pairs between positions 128 in the lacZ promoter and 3611 within the PKD1 insert (30 bp downstream of the poly(R•Y) tract). Two pairs of inverted repeats flanked the breakpoints, which may form cruciforms and locate the homologous T•A pairs at junctions between the structures and regular duplex DNA.
In clone 13 (Fig. 10B), the deletion occurred between two inverted AGC•GCT trinucleotides, one at position 1, just downstream of the transcriptional terminator, and the other at position 3720 in the PKD1 insert, 40 bp from its end. Both locations may fold into cruciforms and, interestingly, the breakpoints are found at sites of distortion in the duplex DNA within the structures.
In clone 14 (Fig. 10C), the deletion occurred between two inverted TTGG•CCAA pairs, one at position 852 in the GFP gene and the other at position 3769. Two mismatched cruciforms may form at both locations, which locate the breakpoints either in the loops or at the distorted sites of duplex DNA in the hairpin stems.
In clone 15 (Fig. 10D), the deletion took place between two inverted GA•TC dinucleotides, one at position 953 in the GFP gene, the other at position 1733 in the poly(R•Y) tract of PKD1. Because this clone was fluorescent, deletion in the GFP gene (50 bp from the end with the loss of the translation terminator site) had not abrogated the fluorescent properties of the polypeptide. The breakpoint in the GFP gene was followed by an inverted repeat, and the cruciform would locate the homologous pairs at the junction between regular duplex and the non-B structure. In the PKD1 tract, the breakpoint was followed by two TCTCCTCCCTCCC direct repeats separated by 4 bp, which are shown in a slipped configuration. Here, again, the homologous pairs were located at the junction between regular duplex DNA and the alternative structure.
In clone 16 (Fig. 10E), the deletion occurred within 12 homologous CTCCCCTCCCCT bp located at positions 1161 and 3484, respectively. Several direct repeats were present at both locations. Between base pair 1113 and 1348, a consensus TCCCCTCCCCTAGCCCTTCCCCTCC was repeated seven times, whereas between base pair 3245 and 3500 a consensus CCCCTTCTCTCCCCTCCCCTCTC was repeated at least seven times. Both repeat blocks contained the 12-bp homologous tract. The model shows one of the possible slipped structures whose formation could be mediated by the repeated units and emphasizes the potential of sequence motifs of the PKD1 tract to fold into a variety of stable non-B DNA structures.
From this analysis, we conclude that the deletion breakpoints coincided with DNA motifs that could adopt non-B DNA conformations in all five clones isolated from the recultivation assays.
Analysis of the DNA Sequence at Deletion Breakpoints in Selected Human Diseases. We wished to determine whether gross deletions in human chromosomes leading to inherited disease also share the characteristics of the PKD1-induced deletions in pRW3620, i.e., short homologies and sequence motifs that form non-B DNA structures at breakpoints. We therefore analyzed 11 mutations linked to 6 different inherited diseases, one in the PKD1 gene (16p13) associated with autosomal dominant polycystic kidney disease (26), one in the PARK7 gene (1p36) associated with autosomal recessive early-onset Parkinsonism (27), three in the ATP7A gene (Xq12–q13) associated with recessive lethal Menkes syndrome (28), one in the α-globin (HBA) locus (16pter–13.3) causing α+-thalassemia (29), three in the ALD gene leading to adrenoleukodystrophy, and two involving the L1CAM gene associated with hydrocephalus, both on chromosome Xq28 (30). Except for the PKD1 mutation, all others were derived from studies essentially selected at random. Analyses of the DNA sequence involved the criteria used above for the plasmid deletions and, in addition, the search for interspersed elements by the RepeatMasker program (http://www.repeatmasker.org/) version 07/07/2001 and the RepBase database version 6.3.
The first case contained an ≈3-kbp deletion in the PKD1 gene (26). Fig. 12A shows the DNA sequence flanking the breakpoints; the deleted nucleotides are shown in parentheses and bold letters. Several base pairs with inverted homology were found at the breakpoints (wavy underline). Several sequence motifs were present at both locations that may adopt supercoil-dependent non-B DNA structures. In intron 1, an oligo(R•Y) tract that may form triplex DNA was identified (doubly underlined tract of capital letters), whereas in intron 5 several short direct repeats occurred (green and blue highlight), some compatible with the formation of tetraplex structures (blue highlight).
The second case (Fig. 12B) contained an ≈14-kb deletion in the PARK7 gene (27). The telomeric breakpoint occurred within the first of two repeated AluSx sequences (green highlight), separated by 34 bp containing a repeated GAA motif and a run of 13 Gs (italics in green highlight). The centromeric breakpoint was located within an AluJo sequence, followed by 2 bp and then by an AluSx repeat (yellow highlight). AluJo and AluSx shared a 35-bp motif (wavy underline), and the breakpoints occurred within this conserved sequence (TGYARTCCCAGCTACTCWRGAGGCTGAGGTGGGAG). The centromeric breakpoint was preceded by a tandem TGGTG direct repeat (green highlight), a 20-bp A+T-rich tract (gray highlight), and a 7-bp oligo(C•G) (green highlight). We note that the breakpoints occurred between two 35-bp homologous tracts, rather than between the >300-bp homology of the AluSx repeats.
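The 35-bp conserved sequence is written with IUPAC ambiguity codes (Y = C or T, R = A or G, W = A or T). As a minimal sketch of how such a consensus can be located in a sequence, the code can be translated into a regular expression. The helper names and the candidate sequence below are hypothetical and serve only to illustrate the matching; they are not taken from the AluJo or AluSx elements analyzed here.

```python
import re

# Standard IUPAC nucleotide codes (subset sufficient for this consensus,
# plus the other two-base codes and N for completeness).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> re.Pattern:
    """Translate an IUPAC consensus string into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def find_consensus(consensus: str, sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping consensus matches."""
    return [m.start() for m in iupac_to_regex(consensus).finditer(sequence.upper())]


CONSENSUS = "TGYARTCCCAGCTACTCWRGAGGCTGAGGTGGGAG"

# Hypothetical instance of the consensus, flanked by arbitrary bases.
candidate = "TGTAATCCCAGCTACTCAGGAGGCTGAGGTGGGAG"
print(find_consensus(CONSENSUS, "AAAA" + candidate + "TTTT"))
```

Only the top strand is scanned in this sketch; searching the reverse complement as well would be needed to detect elements in inverted orientation.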
The third case (Fig. 12C) contained a 13.7-kbp deletion between introns 9 and 5, with a 13-bp insertion (GGAACATGGTTGT) in the copper-binding P-type ATPase (ATP7A) gene (28). The breakpoints occurred at homologous CAA trinucleotides (wavy underline). The one in intron 9 was located ≈100 bp upstream from a FLAM_A SINE element, whereas the one in intron 5 was located 18 bp after an AluSx element. Extensive homologies existed between the two DNA elements (underline in yellow highlight) when aligned in inverted orientation. Additionally, the FLAM_A element (intron 9) initiated within an A+T-rich region (gray highlight) and the breakpoint was flanked by additional CAA repeats and preceded by three CTTT direct repeats (green highlight). The breakpoint in intron 5 was flanked by four A+T-rich sequences (20-, 8-, 11-, and 13-bp long, gray highlight); it was preceded by two tandem CCACT motifs (green highlight), and followed by several spaced AAA/TTT (red underline) motifs (compatible with static bends), and by an 11-bp motif (double wavy underline) complementary to the inserted sequence. Altogether, the deletion is consistent with the formation of a large loop stabilized by extensive base pairing between the inverted homologous sequences, as described for the PKD1 case (Fig. 4A). Similarly, the presence of nearby A+T-rich regions, the potential formation of non-B DNA structures (tetraplex and slipped structures), and torsional strains associated with static bends may have facilitated the melting of large areas of duplex DNA, thereby promoting the extensive interactions between the inverted homologous regions.
The fourth case involved a 13.7-kbp deletion between introns 14 and 7 of the ATP7A gene (Fig. 12D). The breakpoints occurred at an inverted homologous 5-bp (A)5 tract in intron 14 and TTATT in intron 7 (wavy underline), which may form four A•T pairs and one T•T/A•A pair. The breakpoints were located within an L1M2_orf2 LINE element in intron 14 and an AluSg sequence in intron 7, both rich in static bends (red and underline) due to phased AAA/TTT tracts. The breakpoint in intron 14 was located in the first of two direct repeats (AAAAAGAA), and was preceded (≈50 bp) by six CTA trinucleotides and followed (≈200 bp) by three ACAA repeats (green highlight). The breakpoint in intron 7 was preceded by two CATA direct repeats, three imperfect 14-bp direct repeats (green highlight), and an (A•T)26 tract (gray highlight). Thus, this deletion is characterized by the presence of homologies at the breakpoints and by DNA motifs compatible with the formation of supercoil-dependent slipped structures and melting of duplex DNA.
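The pairing count quoted for the (A)5 and TTATT tracts can be checked by aligning the two tracts in antiparallel orientation and scoring Watson-Crick pairs. The sketch below is illustrative only; the function name is hypothetical, and the assumption is that both tracts are written 5'→3' and that one is read in reverse to pair with the other.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def antiparallel_pairs(top: str, bottom: str) -> list[tuple[str, str, bool]]:
    """Pair `top` (5'->3') against `bottom` read 3'->5'.

    Each tuple holds the two opposed bases and whether they form a
    Watson-Crick pair.
    """
    if len(top) != len(bottom):
        raise ValueError("tracts must have the same length")
    return [(x, y, (x, y) in WATSON_CRICK) for x, y in zip(top, reversed(bottom))]


pairs = antiparallel_pairs("AAAAA", "TTATT")
matched = sum(ok for _, _, ok in pairs)
mismatched = [(x, y) for x, y, ok in pairs if not ok]
print(matched, mismatched)
```

Under these assumptions the alignment gives four A•T pairs and a single A•A opposition, which corresponds to T•T when the complementary strands are considered.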
The fifth case involved a 15-kbp deletion between introns 15 and 11 of the ATP7A gene (Fig. 12E). The breakpoints occurred at 4-bp homologous tracts [either CACT (italic) or TTCA (wavy underline)]. The breakpoint in intron 15 was located within an AluSx element and was part of 42-bp direct repeats (green highlight). The breakpoint in intron 11 was located at the end of two ≈20-bp direct repeats (green highlight); it was followed by an A+T-rich tract (gray highlight) and preceded by two CTGGAA direct repeats (green highlight). Therefore, the breakpoints of this deletion shared a short region of homology and were located within potential slipped structures.
The sixth case was a 7.9-kbp deletion, found in two independent families, that involved the ψα1 and α2 (HBA2) genes of the globin locus, causing α+-thalassemia (29). The breakpoints occurred at homologous CACAGGG (wavy underline) tracts (Fig. 12F). No Alu sequences were identified in the vicinity. However, the breakpoint upstream from the ψα1 pseudogene was preceded (8 bp) by two ATTTTTGAAAT direct repeats (green highlight) and it was part of three GGGT direct repeats (green highlight). This was followed by two ACCCC, two CCCAGC, and two GGGAA tandem repeats (green highlight) prone to form slipped structures, and by four CCCAGG direct repeats (blue highlight), which may form either slipped or tetraplex structures. The breakpoint downstream of the α2 gene was preceded by six interspersed AAA/TTT tracts (red and underline), two GCCGA and two AGGGCCTG direct repeats (green highlight), and two CCCCC(C) and four AGGG tetraplex-forming direct repeats (blue highlight). This breakpoint was followed by five CACCC tracts (blue highlight), thus by additional tetraplex-forming tracts. Within 200 bp, two oligo(R•Y) tracts (double underline and capital letters) were also present. Therefore, the breakpoints of these deletions occurred at homologous sequences and were embedded in regions of DNA with the potential to form slipped, tri-, and tetrastranded structures.
The seventh case contained an 8.4-kbp deletion (with a CAGTTGC insertion) from exon 1 to intron 2 in the ALD gene (30). The breakpoints occurred at homologous tracts that contained a GCAAN(N)TGGA motif (Fig. 12G, wavy underline). The GCAA sequence was present twice on either side of the exon 1 breakpoint (green highlight) as a GCAAC direct repeat, and the proximal repeat [GCAAC(A)TG] was complementary to the insertion. To explain this insertion, the authors proposed an elegant model that involved strand annealing, DNA synthesis, and repair. Sequence dot-plot analyses (17) of ≈500 bp indicated that a cluster of hairpin loop structures (short inverted repeats interacting in various combinations) may form within ≈150 bp encompassing the breakpoint. The breakpoint in intron 2 was located within an AluSx element and between two ≈60-bp direct repeats (green highlight), upstream from two (19- and 28-bp) A+T-rich tracts (dotted underline). Thus, features of these deletion breakpoints include short (4- to 8-bp) homologies, the duplication of a 7-bp insert, sites of hairpin and slipped structures, and tracts of low-melting base pairs.
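The complementarity between the 7-bp insertion and the proximal repeat can be verified by computing the reverse complement of the insertion. This is a minimal sketch with a hypothetical helper name; reading the parenthesized A in GCAAC(A)TG as a base that is skipped in the pairing is an assumption made here for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


INSERTION = "CAGTTGC"
PROXIMAL_REPEAT = "GCAACATG"  # GCAAC(A)TG with the parenthesized A included

rc = reverse_complement(INSERTION)

# Assumption: the parenthesized A (0-based index 5) is skipped in the pairing.
repeat_without_a = PROXIMAL_REPEAT[:5] + PROXIMAL_REPEAT[6:]
print(rc, repeat_without_a, rc == repeat_without_a)
```

With the parenthesized base omitted, the reverse complement of the insertion and the proximal repeat are both GCAACTG.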
The eighth case was a 4.6-kbp deletion from intron 2 to intron 5 in the ALD gene (Fig. 12H), with an insertion of ≈130 bp (an AluYb8 sequence). The breakpoints occurred at a CACAA homologous tract (wavy underline). In intron 2, the breakpoint occurred at the end of the second direct repeat described earlier (Fig. 12G). The breakpoint in intron 5 occurred within an AluJb tract and was located between two ≈40-bp direct repeats followed by two GAAA tandem repeats (green highlight). The breakpoint was also immediately preceded by an A+T-rich tract (gray highlight). An ≈40-bp tract (double underline) homologous to the direct repeats in intron 2 (double underline in green highlight) was also present. Thus, the breakpoints in this deletion were characterized by 5-bp homologies, the presence of potential slipped structures, low-melting-point regions, and the ability to form looped structures.
The ninth case contained a 4.0-kbp deletion in the ALD gene between introns 2 and 5 (Fig. 12I). The breakpoints shared a 2-bp homology (CC). However, other short homologies were also present (CCCCTCA, GGAT, TCGGG) within ≈20 bp (wavy underline). Spaced C-rich tracts (blue highlight), with the potential to form tetraplex structures, and two CCCCCAG tandem repeats (green highlight) were present at the breakpoint in intron 2. The breakpoint in intron 5 was preceded by two oligo(R•Y) tracts (double underline and capital letters) and by several short direct repeats (green highlight). In addition, a stable hairpin, whose stem terminated precisely at the breakpoint, was identified.
The tenth case contained a 1.9-kbp deletion between introns 27 and 20 of the L1CAM gene (Fig. 12J). The breakpoints occurred at a common GCAG tract (wavy underline), with an additional CCAGG motif (italics and underlined) also present at both locations. The breakpoint in intron 27 was preceded by two 24-bp and two GGGGAG direct repeats (green highlight), and was followed by two oligo(R•Y) tracts (double underline and capital letters). The breakpoint in intron 20 was flanked by short direct repeats, including TGCCCC, GGGGA, TGGGGTGC, GGT, CCC/GGG, and two longer imperfect repeats (green highlight). Thus, the deletion breakpoints shared a 4-bp homology and were close to slipped, triplex-forming, and tetraplex-forming sequences.
The eleventh case involved a 50-kbp deletion that ablated both the ALD and the adjacent L1CAM gene. The breakpoints shared a common AGG repeat (Fig. 12K, wavy underline). The AGG upstream of L1CAM was embedded in a potential slipped structure formed by two ≈40-bp imperfect direct repeats (green highlight), which was preceded by tetraplex-forming G-rich repeats (blue highlight). The one downstream of the ALD gene was part of a LINE 1 (L1MB7) element and was also located in a potential slipped structure (two GGGTCAGGACA direct repeats) (green highlight). Additional non-B-DNA-forming motifs included two oligo(R•Y) tracts (double underline and capital letters), two CARCCCCCTG, and two ≈70-bp direct repeats (green highlight), all upstream of the breakpoint.
In summary, we found that the breakpoints of deletions leading to human diseases occurred in all cases at homologous sequences, and that they invariably coincided with sites able to adopt supercoil-dependent non-B DNA structures. These features are identical to those discovered for the PKD1-induced long deletions in pRW3620, suggesting that long deletions occur by evolutionarily conserved mechanisms that involve the formation of alternative DNA conformations, DNA looping, and base pairing at breakpoints.
1. Hashemzadeh-Bonehi, L., Mehraein-Ghomi, F., Mitsopoulos, C., Jacob, J. P., Hennessey, E. S. & Broome-Smith, J. K. (1998) Mol. Microbiol. 30, 676-678.
2. Koo, H. S., Wu, H. M. & Crothers, D. (1986) Nature 320, 501-506.
3. Sambrook, J. & Russell, D. W. (2001) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab. Press, Plainview, NY), 3rd Ed.
4. Bacolla, A., Jaworski, A., Connors, T. D. & Wells, R. D. (2001) J. Biol. Chem. 276, 18597-18604.
5. Bowater, R. P., Chen, D. & Lilley, D. M. (1994) Biochemistry 33, 9266-9275.
6. Glantz, S. A. (1981) Primer of Biostatistics (McGraw-Hill, New York).
7. Gusev, V. D., Nemytikova, L. A. & Chuzhanova, N. A. (1999) Bioinformatics 15, 994-999.
8. Chuzhanova, N., Abeysinghe, S. S., Krawczak, M. & Cooper, D. N. (2003) Hum. Mutat. 22, 245-251.
9. Zubkov, A. M. & Michailov, V. G. (1974) in Theory of Probability and Its Applications, ed. Prokhorov, Y. V. (Russian Acad. Sci., Moscow), Vol. 19, pp. 173-181 (in Russian).
10. Abeysinghe, S. S., Chuzhanova, N., Krawczak, M., Ball, E. V. & Cooper, D. N. (2003) Hum. Mutat. 22, 229-244.
11. Summers, D. K., Beton, C. W. & Withers, H. L. (1993) Mol. Microbiol. 8, 1031-1038.
12. Saraswat, V., Kim, D. Y., Lee, J. & Park, Y. (1999) FEMS Microbiol. Lett. 179, 367-373.
13. Kettani, A., Gorin, A., Majumdar, A., Hermann, T., Skripkin, E., Zhao, H., Jones, R. & Patel, D. J. (2000) J. Mol. Biol. 297, 627-644.
14. Herbert, A. & Rich, A. (1999) Genetica 106, 37-47.
15. Zacharias, W., Jaworski, A., Larson, J. E. & Wells, R. D. (1988) Proc. Natl. Acad. Sci. USA 85, 7069-7073.
16. Zheng, G. X., Kochel, T., Hoepfner, R. W., Timmons, S. E. & Sinden, R. R. (1991) J. Mol. Biol. 221, 107-122.
17. SantaLucia, J., Jr. (1998) Proc. Natl. Acad. Sci. USA 95, 1460-1465.
18. Majumdar, A., Gosser, Y. & Patel, D. J. (2001) J. Biomol. NMR 21, 289-306.
19. Majumdar, A., Kettani, A., Skripkin, E. & Patel, D. J. (2001) J. Biomol. NMR 19, 103-113.
20. Peyret, N., Seneviratne, P. A., Allawi, H. T. & SantaLucia, J., Jr. (1999) Biochemistry 38, 3468-3477.
21. Smith, G. K., Jie, J., Fox, G. E. & Gao, X. (1995) Nucleic Acids Res. 23, 4303-4311.
22. Zheng, M., Huang, X., Smith, G. K., Yang, X. & Gao, X. (1996) J. Mol. Biol. 264, 323-336.
23. Samadashwily, G. M., Raca, G. & Mirkin, S. M. (1997) Nat. Genet. 17, 298-304.
24. Pham, P., Rangarajan, S., Woodgate, R. & Goodman, M. F. (2001) Proc. Natl. Acad. Sci. USA 98, 8350-8354.
25. Tang, M., Pham, P., Shen, X., Taylor, J. S., O’Donnell, M., Woodgate, R. & Goodman, M. F. (2000) Nature 404, 1014-1018.
26. Thomas, R., McConnell, R., Whittacker, J., Kirkpatrick, P., Bradley, J. & Sandford, R. (1999) Am. J. Hum. Genet. 65, 39-49.
27. Bonifati, V., Rizzu, P., van Baren, M. J., Schaap, O., Breedveld, G. J., Krieger, E., Dekker, M. C., Squitieri, F., Ibanez, P., Joosse, M., et al. (2003) Science 299, 256-259.
28. Poulsen, L., Horn, N., Heilstrup, H., Lund, C., Tumer, Z. & Moller, L. B. (2002) Clin. Genet. 62, 449-457.
29. Harteveld, C. L., Van Delft, P., Wijermans, P. W., Kappers-Klunne, M. C., Weegenaar, J., Losekoot, M. & Giordano, P. C. (2003) Br. J. Haematol. 120, 364-366.
30. Kutsche, K., Ressler, B., Katzera, H. G., Orth, U., Gillessen-Kaesbach, G., Morlot, S., Schwinger, E. & Gal, A. (2002) Hum. Mutat. 19, 526-535.