Proceedings of the National Academy of Sciences of the United States of America. 1997 Dec 23;94(26):14584–14589. doi: 10.1073/pnas.94.26.14584

Sequence of the FRA3B common fragile region: Implications for the mechanism of FHIT deletion

Hiroshi Inoue 1, Hideshi Ishii 1, Hansjuerg Alder 1, Eric Snyder 1, Teresa Druck 1, Kay Huebner 1, Carlo M Croce 1
PMCID: PMC25062  PMID: 9405656

Abstract

The hypothesis that chromosomal fragile sites may be “weak links” that result in hot spots for cancer-specific chromosome rearrangements was supported by the discovery that numerous cancer cell homozygous deletions and a familial translocation map within the FHIT gene, which encompasses the common fragile site, FRA3B. Sequence analysis of 276 kb of the FRA3B/FHIT locus and 22 associated cancer cell deletion endpoints shows that this locus is a frequent target of homologous recombination between long interspersed nuclear element sequences resulting in FHIT gene internal deletions, probably as a result of carcinogen-induced damage at FRA3B fragile sites.


Fragile sites are chromosome regions that reveal cytogenetically detectable gaps after the exposure of cells to specific reagents (1); several folate-sensitive, heritable fragile sites have been localized to unstable CGG repeats, and one of these sites, the FRA11B at 11q23.3, is associated with Jacobsen (11q-) syndrome that shows a direct link between a fragile site and in vivo chromosome breakage (2). Another heritable fragile site, the distamycin-A sensitive FRA16B, recently was found to be caused by an expanded 33-bp AT-rich minisatellite repeat (3). Thus, repeat expansion is a property of trinucleotide and minisatellite repeats, and these dynamic mutations are sometimes detected as fragile sites.

After induction of fragile sites in somatic cell hybrids, increased frequencies of translocations, deletions (4–6), and plasmid DNA integration (7) at fragile sites have been observed. Also, because the chromosomal positions of fragile sites apparently coincide cytogenetically with the regions of similar chromosomal aberrations in cancer cells, it has been postulated that fragile sites, which are susceptible to carcinogen-induced alterations (8), could play a role in cancer cell-specific chromosomal rearrangements (9–12), but direct proof of the involvement of fragile sites in cancer has been lacking.

Recently, the most inducible common fragile site, FRA3B at 3p14.2, was shown to exhibit gaps or fragility over a broad region (13–15), much of which has been cloned (7, 13, 14, 16, 17) and partially sequenced (7, 14, 17, 18). A papillomavirus insertion site (14), plasmid integration sites (7), and cancer-specific translocations and deletions (17, 19, 20) have been mapped within the FHIT gene, which encompasses FRA3B (16). FHIT alleles are frequently inactivated in common human cancers (16, 20–24), and the replacement of Fhit expression in cancer cells suppresses their tumorigenicity (25). Thus, chromosome rearrangement at FRA3B, probably following carcinogen damage, is associated with development of major human cancers.

To elucidate mechanisms involved in FHIT alterations and FRA3B/FHIT fragility, we sequenced a ≈200-kb region surrounding FHIT exon 5, which included the majority of induced gaps (15) and overlaps the reported 110-kb partial intron 4 sequence (17), resulting in a total sequenced region of 276 kbp. This region also showed hemizygous loss in numerous cancers (19, 26–31). We also sequenced a region encompassing the familial kidney cancer-associated translocation (32) break between FHIT exons 3 and 4.

MATERIALS AND METHODS

DNA Sequencing Templates.

As shown in Fig. 1, six overlapping cosmid clones, covering ≈200 kbp of the FHIT locus from intron 4 through a large part of intron 5 (16), were sequenced completely. BAC clones 1O12 and 358N7, overlapping the centromeric and telomeric ends of the cosmid contig, were obtained from Research Genetics, Huntsville, AL. For sequencing the t(3;8) translocation region, a bacteriophage clone from a YAC 850A6 phage library was selected by using the D3S1480 amplified product. Phage ends were sequenced, and primer pairs were prepared. PCR amplification of hybrid DNAs, including DNA from a hybrid carrying the der 3 chromosome from a t(3;8) family member, showed that this phage spanned the break, as did an 8.4-kb subclone (33).

Figure 1.


The normal FHIT/FRA3B locus. The top line represents the locus with positions of reported markers and FHIT exons. The long hatched bar represents the 210-kb sequenced region, which overlapped the reported U66722 sequence (solid bar) (17). Genomic YAC, BAC, phage and cosmid clones used in the analysis are shown. The small hatched bar represents the sequence of chromosome 3 at the t(3;8) break.

Cell Lines.

Cancer cell lines KATO III (stomach carcinoma), MDA-MB436 (breast carcinoma), LoVo (colon adenocarcinoma), and LS180 (colon adenocarcinoma) were obtained from the American Type Culture Collection and HK1 (nasopharyngeal carcinoma) was provided by Dolly Huang (Chinese University of Hong Kong). Deletion in the FHIT locus and absence of Fhit protein were reported for these cell lines (16, 19, 20).

Shotgun Sequencing.

Cosmid and BAC DNA were prepared by Qiagen Mini or Maxi prep kit (Qiagen, Chatsworth, CA). For each clone, 30-μl aliquots (30 μg) of DNA were sonicated at the lowest energy setting for 5–10 sec at 0°C with the 3-mm probe. Aliquots were electrophoresed on 0.8% agarose gels, and the preparations for which the peak fragment size was 2–3 kb were analyzed further. DNA fragments (50 μl) for each clone were digested for 10 min at 30°C with BAL-31 nuclease (New England Biolabs), extracted with phenol, precipitated, dissolved, and fractionated on a 0.8% low melting agarose gel. The 1.5- to 2.0-kb fraction was excised, extracted with glassmilk with a Geneclean III kit (Bio 101), and dissolved in 10 μl of distilled water. DNA concentration was measured with Picogreen fluorescent dye (Molecular Probes) on a FluorImager SI (Molecular Dynamics). A two-step ligation procedure was used to produce plasmid libraries essentially as described previously (34), for each cosmid or BAC clone. Recombinant plasmids were isolated by Qiagen BioRobot 9600, and DNA concentration measured on the FluorImager SI analyzer. Plasmid DNAs were digested with EcoRI and HindIII (Boehringer Mannheim) and were gel separated to confirm the presence of insert DNA. Sequencing reactions and analysis were performed by using dyedeoxy-terminator reaction chemistry on a Perkin–Elmer/Cetus DNA Thermal Cycler 9600 and the Applied Biosystems Model 377 DNA sequencing systems.

Sequence data were edited, assembled, and analyzed with the Sequencher 3.0 program (Gene Codes, Ann Arbor, MI). Strategies implemented for sequence completion included primer walking to extend the sequence read of a given subclone, to close a gap, or to confirm sequence that was obtained in the opposite orientation. The sequence of the regions not covered at least twice in one orientation and once in the other was obtained by PCR amplification and sequencing of the region from a cosmid or plasmid clone. After proofreading, the final sequence was analyzed with the BLAST (35), RepeatMasker (36), GRAIL II (37), GENEFINDER (FGENEH) (38), and GENESCAN (39) programs available at http://gc.bcm.tmc.edu:8088/search-launcher/launcher.html.
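The completion rule stated above (every position covered at least twice in one orientation and once in the other) can be expressed as a simple depth check. The following is a minimal sketch, not the procedure implemented in the assembly software; the function name `needs_finishing` and the read representation (0-based, half-open intervals with a strand label) are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical read representation: (start, end, strand), 0-based half-open,
# with strand given as "+" or "-".
Read = Tuple[int, int, str]


def needs_finishing(region_len: int, reads: List[Read]) -> List[Tuple[int, int]]:
    """Return half-open intervals that fail the 2 + 1 coverage rule.

    A position passes when one orientation covers it at least twice
    and the other orientation covers it at least once.
    """
    fwd = [0] * region_len
    rev = [0] * region_len
    for start, end, strand in reads:
        depth = fwd if strand == "+" else rev
        for pos in range(max(start, 0), min(end, region_len)):
            depth[pos] += 1

    weak: List[Tuple[int, int]] = []
    run_start = None
    for pos in range(region_len):
        high = max(fwd[pos], rev[pos])
        low = min(fwd[pos], rev[pos])
        ok = high >= 2 and low >= 1
        if not ok and run_start is None:
            run_start = pos            # a weakly covered run begins
        elif ok and run_start is not None:
            weak.append((run_start, pos))
            run_start = None
    if run_start is not None:
        weak.append((run_start, region_len))
    return weak
```

Intervals returned by such a check would correspond to the regions that were resequenced from PCR products of cosmid or plasmid clones.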

Deletion Analysis.

To define cancer cell homozygous deletions in the FRA3B region, 150 pairs of PCR oligonucleotide primers in unique sequences were generated. PCR reactions (20 μl) containing 100 ng of DNA, 20–40 ng of primers, 200 μM dNTPs, 10 mM Tris⋅Cl (pH 8.3), 50 mM KCl, 0.1 mg/ml gelatin, 1.5 mM MgCl2, and 0.5–1.5 units of Taq polymerase were performed in a Perkin–Elmer/Cetus DNA Thermal Cycler 9600 for 35 cycles of 95°C denaturation for 30 sec, 57°C annealing (varied for specific primer pairs) for 30 sec, and extending at 72°C for 30–45 sec. The amplified products were visualized in ethidium bromide-stained agarose gels.
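The logic of mapping homozygous deletions with ordered PCR markers can be sketched as follows: a run of consecutive markers that fail to amplify defines the minimal deleted interval, and the flanking positive markers bound it. This is an illustrative sketch only; the function name `deletion_intervals` and the marker positions in the usage example are hypothetical and are not data from this study.

```python
from typing import List, Tuple


def deletion_intervals(markers: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (first_absent_kb, last_absent_kb) for each run of absent markers.

    `markers` holds (position_kb, product_present) pairs; they are sorted
    by position before runs of missing PCR products are collected.
    """
    runs: List[Tuple[float, float]] = []
    current = None
    for position, present in sorted(markers):
        if not present:
            if current is None:
                current = [position, position]   # start a new absent run
            else:
                current[1] = position            # extend the current run
        elif current is not None:
            runs.append((current[0], current[1]))
            current = None
    if current is not None:
        runs.append((current[0], current[1]))
    return runs


# Hypothetical marker results for one cell line
example = [(10, True), (20, False), (30, False), (40, True), (50, False), (60, True)]
print(deletion_intervals(example))
```

With denser marker spacing, the gap between the last positive and first negative marker narrows, which is the purpose of designing many primer pairs across the region.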

Reverse Transcription–PCR (RT-PCR) and cDNA Library Screening.

RT-PCR was performed by using primers Z13–44/f (5′-GTCCGTAGATCTTGTTATGG-3′) and Z13–44/r (5′-TGGCTTTCAGGCATGTTGAGC-3′) and fetal kidney cDNA. A fetal kidney cDNA library was purchased (CLONTECH) and 3 × 10^6 plaques were screened.

Inverse PCR.

Inverse PCR amplifications were performed by using a modification of the method described in ref. 40. DNA (3 μg) was digested with 50 units of restriction enzymes with four or five base pair recognition sites. The incompatible cohesive ends were filled in with T4 DNA polymerase. For circularization, the digested DNA was ligated in a 50-μl reaction with 4 units of T4 DNA ligase and ligation buffer (50 mM Tris⋅Cl, pH 7.4/10 mM MgCl2/10 mM ATP) at 16°C overnight. Circular products served as PCR templates under the following conditions: 5 min at 95°C followed with 35 cycles of 15 sec at 95°C, 30 sec at 60°C (first 10 cycles), 59°C (second 10 cycles), and 58°C (last 15 cycles), and then 3 min at 72°C.
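The stepwise reduction in annealing temperature described above can be written as a small lookup over the 35 cycles. This is a sketch for clarity only; the function name `annealing_temp_c` is an assumption, and cycles are numbered from 1.

```python
def annealing_temp_c(cycle: int) -> int:
    """Annealing temperature (degrees C) for a given cycle of the 35-cycle program.

    Cycles 1-10 anneal at 60, cycles 11-20 at 59, and cycles 21-35 at 58,
    following the schedule stated in the text.
    """
    if not 1 <= cycle <= 35:
        raise ValueError("cycle must be between 1 and 35")
    if cycle <= 10:
        return 60
    if cycle <= 20:
        return 59
    return 58


# Full schedule as a list, one entry per cycle
schedule = [annealing_temp_c(c) for c in range(1, 36)]
```

Starting with the more stringent annealing temperature favors specific priming in the early cycles, before the temperature is relaxed.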

RESULTS

DNA Sequence Analysis.

Shotgun sequencing was performed for the 210-kb region encompassing FHIT exon 5; 1,427,400 bp were sequenced for a coverage ratio of 6.9. Sequence fidelity was confirmed by primer walking, by amplification of identical sequence fragments from human DNA templates and by using Southern blot detection of expected size fragments in human DNAs. The complete sequence, numbered through 206,881 beginning at the telomeric end, was submitted to GenBank (accession no. AF020503). Fig. 1 shows the location of the sequenced region relative to FRA3B/FHIT landmarks. The GC content of the 210-kb region was 38.39 ± 0.50% with minimal deviation from this mean over the region, as shown for the 30-kb subregions in Table 1; this value was similar to that of the adjacent 110-kb sequence (38.9%) within intron 4 (17). There were no apparent CpG islands in the sequence.
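The coverage ratio and GC content reported above follow from two simple calculations, sketched here. The function names are illustrative assumptions. Note that the stated coverage of 6.9 is reproduced when the total bases read are divided by the final numbered sequence length (206,881 bp).

```python
def coverage_ratio(total_bases_read: int, region_length: int) -> float:
    """Average sequencing redundancy: bases read divided by region length."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return total_bases_read / region_length


def gc_percent(sequence: str) -> float:
    """Percent G+C among unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


# Values taken from the text: 1,427,400 bp read over 206,881 bp
print(round(coverage_ratio(1_427_400, 206_881), 1))
```

Applying `gc_percent` to successive windows of a sequence would give the kind of per-subregion values listed in Table 1.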

Table 1.

Repetitive elements in the FRA3B/FHIT locus

Repeats Subregions, kb
1–30 30–60 60–90 90–120 120–150 150–180 180–210 210–240 240–270
SINEs
 Alus 1328  (4.4)* 1560  (5.2) 2063  (6.9) 1127  (3.8) 2292  (7.6) 1200  (4.0) 1456  (4.9) 1903  (6.3) 1990  (6.6)
 MIRs 1066  (3.4) 1301  (4.3) 980  (3.3) 579  (1.9) 704  (2.4) 460  (1.5) 858  (2.9) 678  (2.3) 492  (1.6)
LINEs
 L1 303  (1.0) 4651  (15.5) 9278  (30.9) 6295  (20.9) 2152  (7.2) 5079  (16.9) 3882  (12.9) 3320  (11.1) 3368  (11.2)
 L2 475  (1.6) 236  (0.8) 0  (0) 299  (1.0) 1068  (3.6) 1197  (4.0) 54  (0.2) 295  (1.0) 359  (1.2)
LTR elements
 MalRVs 781  (2.6) 394  (1.3) 0  (0) 1115  (3.7) 883  (2.9) 0  (0) 0  (0) 839  (2.8) 421  (1.4)
 HERVs 340  (1.1) 177  (0.6) 0  (0) 0  (0) 566  (1.9) 131  (0.4) 0  (0) 0  (0) 0  (0)
DNA elements
 MER1 655  (2.2) 434  (1.5) 0  (0) 1008  (3.4) 354  (1.2) 588  (2.0) 811  (2.7) 386  (1.3) 235  (0.8)
 MER2 952  (3.2) 0  (0) 654  (2.2) 269  (0.9) 904  (3.0) 2497  (8.3) 344  (1.2) 0  (0) 0  (0)
 Mariners 1287  (4.3) 0  (0) 0  (0) 81  (0.3) 0  (0) 0  (0) 0  (0) 66  (0.2) 0  (0)
Unclassified 1028  (3.4) 0  (0) 0  (0) 0  (0) 0  (0) 779  (2.6) 0  (0) 0  (0) 0  (0)


 Total repeats 8598  (28.7) 8753  (29.2) 12975  (43.3) 10737  (35.8) 8923  (29.7) 11931  (39.8) 7622  (25.4) 7487  (25.0) 6855  (22.9)


GC content (%) 38.9 37.7 38.4 37.3 39.2 38.6 39.0 38.8 39.1
*Total and percent (in parentheses) nucleotides represented by the repeat clone within the specific subregion. 

Repetitive element analysis included the 110-kb sequence reported by Boldog et al. (1997). 

The BLAST analysis did not reveal putative genes except for the FHIT exon 5 in the 210-kb region, and no other matching sequences were found in the EST database. After the masking of known repetitive sequences, the potential exons were detected by using the GRAIL, GENEFINDER, and GENESCAN programs; 28, 15, and 7 exon candidates were found, respectively. One of the sequences, Z4/34, at position 92 kb, also was detected by using RT-PCR and Northern blotting, which detected a 9- to 10-kb transcript in fetal kidney mRNA. The screening of a fetal kidney cDNA library, by using the RT-PCR product as probe, detected six overlapping cDNA clones. Sequence analysis revealed that the cDNA was at least 9 kb long and colinear with genomic DNA, suggesting that this cDNA represented an unprocessed transcript or resulted from false priming of contaminating DNA. A sequence with homology to human MTERF cDNA (GenBank accession no. Y09615) was found at position 231 kb. The sequence was colinear with genomic DNA, was interrupted with L1 repeats, and is probably a pseudogene.

Repetitive Element Analysis.

Long interspersed nuclear elements (LINE) (L1 and L2), short interspersed nuclear elements (Alu and MIR), elements with long terminal repeats (HERVs and MalRVs), and DNA transposons (mariner and MER) were distributed throughout the region (summarized in Table 1); 32.4% of the sequences in the region were repetitive elements. The distribution pattern of repetitive elements varied across the region, representing 43% of sequences in region 60–90 kb, 40% at 150–180 kb, and 36% at 90–120 kb, in contrast to 25% in the region 180–210 kb. Forty Alu and 41 MIR repetitive elements, totaling 10.7 kb (5.1% of total) and 5.9 kb (2.8%) in length, respectively, were recognized. Thirty-nine MER repeats occupied 4.4% of the region. Smit (36), in a robust analysis for human repetitive sequences in several megabases of DNA retrieved from GenBank, concluded that Alu sequences were less frequent in regions of low GC content, comprising 5.7% of the human genome with GC content of <43%, 17.9% of regions with GC content of 43–52% and 20.2% of sequences in regions with GC content of >52%, consistent with our findings. A comparison of repeat class frequencies in the Smit sequences, the Boldog et al. sequence (17), and our 210-kb sequence is shown in Fig. 2. LINE elements comprised ≈15% of the genome; in this 210-kb region, there were 43 LINEs with a length of 35.0 kb (16.7%), near the mean value for the whole genome. Most LINE elements were of the L1 subtype (total length 31.6 kb, 15.1%) and were concentrated at 50–100 kb and 150–180 kb, with 8 of the 11 L1 elements of >1 kb in these two subregions (see Table 1 and Fig. 3). Consensus topoisomerase II recognition sites also were distributed throughout the region, and 24 dinucleotide repeat motifs, including D3S1300, were identified. Four AT-rich trinucleotide repeats also were found: the polymorphic (TAA)11–22 (D3S4103) at 61 kb, (TAA)8 at 72 kb, (TGA)5 at 92 kb, and an imperfect (CAA)8 at 121 kb, as illustrated in Fig. 4.
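Perfect trinucleotide runs such as (TAA)8 or (TGA)5 can be located with a regular expression. The sketch below is an illustration only and is not the method used in this analysis, which relied on RepeatMasker; the function name `find_tandem_repeats` and the toy sequence are assumptions. It finds only uninterrupted repeats, so an imperfect run such as the (CAA)8 noted above would be reported as shorter fragments or missed.

```python
import re
from typing import List, Tuple


def find_tandem_repeats(sequence: str, unit: str, min_copies: int) -> List[Tuple[int, int]]:
    """Return (start_index, copy_number) for perfect tandem runs of `unit`.

    Indices are 0-based; only runs with at least `min_copies` copies are kept.
    """
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit.upper()), min_copies))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        hits.append((match.start(), len(match.group()) // len(unit)))
    return hits


# Toy sequence, not taken from the FRA3B locus
toy = "GG" + "TAA" * 8 + "CC" + "TGA" * 5 + "AT"
print(find_tandem_repeats(toy, "TAA", 5))
print(find_tandem_repeats(toy, "TGA", 5))
```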

Figure 2.


Repeats in the FHIT/FRA3B region. In this modification of Smit’s original figure (36), the whole genome was divided into three groups according to GC content, <43%, 43–52%, and >52%. MIR and L2 elements are combined. The ordinate shows the fraction of the respective sequenced region occupied by specific repetitive classes.

Figure 3.


Deletions in cancer cell lines. Depiction of 21 breakpoints (not including the LoVo breakpoint in BAC clone 358N7) with locations relative to L1 elements. L1 sequences of >1 kb, bold arrows; L1 sequences of <1 kb, small arrowheads

Figure 4.


Positions of FRA3B/FHIT rearrangements. Illustration of the 276-kb sequenced region summarizing positions of deletion endpoints, hybrid breaks, plasmid or viral flanking sequences, and trinucleotide repeats.

Insertion Elements.

pSV2neo-integration site flanking sequences were positioned on the 210-kb sequence (Fig. 4) at 109–110 kb (GenBank accession no. U06118) and at 146–147 kb (GenBank accession no. U40597) near the FHIT exon 5 at 149 kb. An inserted sequence was found in P4 cosmid subclones, which matched an Escherichia coli insertion-IS1 element (GenBank accession no. M13369). The insertion probably occurred during amplification of the cosmid DNA in E. coli; the sequence of the insertion site (Fig. 3) and element was deposited in GenBank (accession no. AF020504).

Hybrid Breakpoints.

Aphidicolin-induced breakpoints in somatic cell hybrids retaining human chromosome 3 were reported to cluster in the FRA3B region (13, 17, 18), and the distal breakpoint cluster reported by Wang et al. (18) was located at position 203 kb (see Figs. 3 and 4). A spontaneous break in hybrid cl3 (41) was mapped to 230 kb by PCR amplification with primers derived from the FRA3B sequence (Fig. 4). Interestingly, the cl3 break is near or in the L1 element at the 5′ end of the MTERF pseudogene.

Homozygous Deletions.

We (16, 19, 20) and others (17, 30) have mapped cancer cell homozygous deletions in the FRA3B region. Deletion breakpoints were located precisely within the sequenced region by primer “walking” to the endpoints, using multiple PCR reactions on DNA templates from five cancer cell lines (unpublished data). Three discontinuous, homozygous deletions were found in MB436 and HK1; other cell lines (KATO III and LoVo) were shown to have one homozygously deleted region. For LS180, with three deletions (20), we fine-mapped only the large central deletion. Inverse PCRs performed at both sides (when possible) of the deletion endpoints revealed structural rearrangements of both FHIT alleles in these cell lines (Fig. 3; Table 2).

Table 2.

Summary of deletions and breakpoints in cancer cell lines

Cell line Allele Deletion, kb Length, kb Breakpoint, kb Repetitive sequence
LS180 a 22–144 122 22.024/143.708 (−)/(−)
b 71–180 109 70.784/180.720 L1/L1
KATO III a 1–61 >61 61.369/BAC358N7 L1
LoVo a 1–179 >179 179.243/BAC358N7 L1/L1
HK1 a 81–154 73 80.561/153.878 L1/L1
154–178 24 153.930/178.288 L1/L1
b 1–83 >83 83≈83.5 L1*
83.5–≈180 <97 83.5≈84 L1*
MB436 a 3–70 67 2.832/70.494 MIR/(−)
71–152 81 70.784/152.126 L1/L1
152–? >80 152≈153 L1*
b 3–70 67 2.832/70.494 MIR/(−)
94–232 138 93.880/231.609 L1/L1

(−), breakpoint not in repetitive elements. 

*Breakpoints located by PCR analysis of homozygous deletion; sequences not determined. 

Breakpoint at position 232 kb was in the sequence reported by Boldog et al. (1997). 

In LS180 cells, independent deletions of 122 kb and 109 kb occurred on the two alleles at positions 22–144 kb for allele a and 71–180 kb for allele b, with a 73-kb overlapping region of deletion at 71–144 kb, which was observed as a homozygous deletion. LS180 cells also exhibited homozygous deletions in other regions of the FHIT locus (20) that were not sequenced. In KATO III and LoVo cells, the entire 210 kb was missing on allele b; one breakpoint was found in the KATO III allele a at position 61 kb and at position 179 kb of the LoVo a allele. In these cell lines, the sequence adjacent to the identified breakpoint was not present in the 210-kb region, indicating a deletion that ended outside the sequenced region. In both cell lines, the sequences adjacent to the break in allele a were in the partially sequenced BAC 358N7, telomeric to cosmid 63. In HK1 cells, deletions of 73 kb and 24 kb were found at 81–153.8 kb and at 153.9–178 kb on allele a, whereas the entire second allele except for a 0.5-kb segment (position 83–83.5 kb) was absent. In the MB436 DNA, we found four breakpoints for each allele as shown in Fig. 3. Interestingly, this cell line with the most complex FRA3B alterations was derived from a breast cancer that was treated with radiation and cytotoxic therapy before establishment of the cell line. Among the 22 breakpoints sequenced in these five cell lines, 16 were within L1-rich subregions; there were two “hot spots”, 50–100 kb and 150–180 kb, associated with a deletion in the 210-kb region (Fig. 3). In the HK1 line, four breakpoints on allele a, at 81, 153.8, 153.9, and 178 kb, were located within two large L1 sequences. This is true for four other break/rejoins: in the LS180 line (71 kb and 180 kb on allele b), and in the MB436 line (71 kb and 152 kb on allele a). 
In the HK1 cell line, two breakpoints on allele b were in L1 sequences at positions 83 kb and 83.5 kb, but the joining sequences were outside the 210-kb sequence, presumably in flanking regions. The same applied to a break on allele a in the MB436 line at 152 kb. In addition, four breaks were within 2 kb of an L1 sequence: one at 61 kb in the KATO III line, two in LoVo at 179 kb and in BAC 358N7, and one at 178 kb on allele a in the HK1 line. The previously reported U60203 lung cancer breakpoint sequence, which is not included in the 16 breaks discussed, also was not far from a large L1 sequence at 50 kb (Figs. 3 and 4). The breakpoints in LS180 and MB436 at 71 kb were identical, and although the joining sequences for the two breakpoints were derived from different sites, position 180 kb for LS180 and 152 kb for MB436, the joining sequences were 90% homologous over 1,500 bp, as shown in Fig. 5. Another MB436 breakpoint at 232 kb is near L1 sequences at the 3′ end of the MTERF pseudogene, not far from the cl3 hybrid break. Although other breaks showed no direct association with repetitive elements, a total of 16 of 22 cancer cell breaks were located in or near L1 sequences (GenBank accession nos. AF020609–AF020615), demonstrating that the deletion machinery in this FRA3B region was influenced by DNA structure, particularly by L1 repetitive sequences.
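The relationship between independent deletions on two alleles and the region observed as homozygously deleted is an interval intersection. The sketch below reproduces the LS180 arithmetic from the text (22–144 kb on allele a, 71–180 kb on allele b); the function name `interval_overlap` is an assumption made for illustration.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # (start_kb, end_kb)


def interval_overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the region deleted on both alleles, or None if the deletions do not overlap."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None


# LS180 deletions as given in the text (positions in kb)
allele_a = (22, 144)
allele_b = (71, 180)
shared = interval_overlap(allele_a, allele_b)
print(shared, shared[1] - shared[0])  # overlap and its length in kb
```

Only the shared interval is detectable as loss of PCR product in both alleles, which is why mapping by homozygous deletion alone underestimates the extent of each allelic deletion.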

Figure 5.


L1 associated breakpoints and sequences. (A) Six L1 sequences in the 210-kb region are aligned to the active LINE-1 (L1.2) prototype sequence (46). These L1 elements showed >80% sequence homology to the prototype over 1,000 bp. Note that all L1s were truncated, retaining only a 3′ portion of the sequence, and most breakpoints were located at the 5′ end of the retained sequence. EN, endonuclease; RT, reverse transcriptase; C, cysteine-rich region. (B) Rearrangements and sequence of breakpoints in MB436 and LS180; note sequence homology to the L1 prototype.

Sequence of the t(3;8) Break.

The sequence of the normal chromosome 3 region in which the translocation occurred was determined by sequencing of an 8.4-kb phage subclone from the D3S1480 phage (shown in Fig. 1) that crossed the 3p14.2 breakpoint. The position of the breakpoint was determined by amplification of products from a der 3 (8qter → 8q24∷3p14.2 → 3qter) hybrid (41) with primer pairs designed from the 8,392-bp sequence (numbered 1–8,392 from telomeric end; GenBank accession no. AF019967). The breakpoint was between nucleotides 1,086 and 1,608, which includes an Alu from 1,287 to 1,608. Thus, the breakpoint is in an Alu or within 200 bp of the Alu sequence.
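The statement that the breakpoint lies in the Alu or within about 200 bp of it follows from the coordinates given. A minimal sketch of that arithmetic, with a hypothetical helper name, is shown here; by the coordinates as stated, the largest possible distance from the Alu is 201 bp.

```python
from typing import Tuple


def max_distance_outside(window: Tuple[int, int], element: Tuple[int, int]) -> int:
    """Largest distance (bp) at which a point inside `window` can lie outside `element`.

    Returns 0 when the window is entirely contained in the element.
    """
    window_start, window_end = window
    element_start, element_end = element
    left = max(0, element_start - window_start)   # window extends before the element
    right = max(0, window_end - element_end)      # window extends past the element
    return max(left, right)


# Coordinates from the text: breakpoint window 1,086-1,608; Alu 1,287-1,608
print(max_distance_outside((1086, 1608), (1287, 1608)))
```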

DISCUSSION

The Fragile Region.

The structural basis for the cytogenetically visible chromosomal gaps that define fragile sites is not known, although the heritable fragile sites thus far defined somehow transmit expanded repeats at the nucleotide level to an aberrant chromatin structure that appears as a gap in metaphase chromosomes. At least one heritable fragile site can be associated with a deletion, with one deletion endpoint located within 20 kb of the expanded triplet repeat. It may be that the fragile site or gap corresponds to a single or double strand break within the triplet caused by the interference with DNA replication and that the deletion is then caused by incorrect repair of the break; thus, the actual site of the break or fragility is lost in the repaired chromosome. Not all expanded triplet sequences have been identified as fragile. Thus, there must be other components of chromatin that affect fragility, possibly including surrounding sequence, time of DNA replication, and level and tissue specificity of expression, although no characteristics aside from expanded polymorphic repeat sequences have been definitively associated with fragile site detection.

The basis for fragility at common fragile sites is not known, but it has been shown previously that the FRA3B “site” at 3p14.2 represents a broad region of fragility (13, 14) that may extend at least a megabase, with an epicenter of a few hundred kilobases surrounding the FHIT exon 5 (15), in which more than 60% of induced gaps occur. Nearly 300 kb of the fragile region has now been sequenced, and the basis for fragility across the region is not obvious. Because all individuals express this fragile region, specific polymorphisms would not be expected to be associated with fragility, although they might affect the degree of fragility. Without DNA from genetically unaffected individuals, the fragile sequence(s) will be difficult to identify. Numerous molecular genetic landmarks, such as translocations, insertion sites, somatic cell hybrid breakpoints, and cancer cell deletion breakpoints, have been placed precisely within the ≈300-kb context and certain conclusions can be inferred.

The Sequence.

In our analysis of positions of chromosomal fragile sites relative to FHIT exons, we found that an exon 4 cosmid was centromeric to ≈70% of the gaps and found that cosmid 63 was telomeric to ≈84% of the gaps (15). Thus, we chose to sequence a 210-kb region starting at cosmid 63 in midintron 5 and ending in intron 4; in combination with the published partial sequence of intron 4 (17), a 276-kb sequence encompassing the majority of fragile “breaks,” is now available. The total region, although in a Giemsa-light chromosome band, is AT rich and Alu depleted relative to the whole genome. There are numerous short and long repeats distributed over the region, including some triplet repeats, but neither the overall structure nor the specific regions within the sequence provided clues to the exact points within the sequence that are responsible for induced gaps. To pinpoint sites of fragile breaks, others have used aphidicolin-induced hybrid clones with breaks or plasmid insertions in 3p14.2 (7, 13, 18) to isolate sequences involved in breakage. The hybrid breaks and plasmid integration flanking sites are distributed over the region, as shown in Fig. 4, and their positions do not suggest sequence specificity. Presumably, the hybrid breakpoints are breaks that have been repaired so they might not be useful in identifying the actual fragile sites. Possibly one of the pSV2neo-integration site flanking sequences in intron 5 represents an actual fragile site, but the integration also involved deletion of a large intervening sequence. The integration site flanking sequences are in unique sequence falling between the major L1 rich clusters, as summarized in Fig. 4. The comparison of sequences around the integration site to each other, to the 3′ HPV integration site flanking sequences, and to the hybrid break cluster did not yield clues to FRA3B break mechanisms. 
Another type of breakpoint in the fragile region, but centromeric to the map shown, is the t(3;8) translocation, between FHIT exons 3 and 4. This break occurred within or near an Alu sequence, perhaps similarly to the centromeric side of the HPV16 integration (14), and could represent a sequence susceptible to gap formation. These observations suggest that the fragility in the FRA3B region is not dependent on specific sequences, but perhaps on long-range structural aspects associated with DNA replication, possibly involving transcriptional activity.

Cancer Cell Deletions.

The deletions within this region in cancer cells, as well as in carcinogen-exposed tissues (28, 42), preneoplasias and neoplasias, frequently involving both FHIT alleles, are no doubt associated with the fragile nature of the locus. A possible scenario is that the gaps seen after aphidicolin treatment represent single or double strand breaks that occur during replication. If the breaks are not repaired, the cell may be programmed to die; if the breaks are repaired properly, there is no consequence; if the breaks are repaired improperly, a FHIT exon could be deleted, imparting an in vivo advantage to the cell, especially when both FHIT alleles are inactivated. Thus, we examined the deleted FHIT alleles in several types of cancer cells for clues to fragility, as well as clues to mechanism of repair to breaks and deletions. We first fine-mapped positions of deletion endpoints on the sequenced region, and then we sequenced the deletion endpoints by inverse PCR. As shown in Fig. 3, the structure of cancer cell deleted alleles was complex, with all cell lines exhibiting independent deletions of both FHIT alleles, as predicted (20). Most interestingly, 16 of 22 break or deletion repairs involved L1 sequences of >1 kb. Thus, the deletion repair mechanism apparently involved recombination between L1 elements. Recently, the sequence of a 112-kb region of the human dystrophin gene intron 7 was reported to contain mutational “hot spots” including translocation breakpoints (43). The sequence contained only nine Alu elements (2.4% of the region) and 22 L1s (25.8% of the 112-kb region). This 112-kb region was similar to our sequenced region, regarding features such as the length of intron size, Alu depletion, richness in LINE sequences, and rearrangements, although it is not known to be fragile. 
It also has been noted (44) that intrachromosomal recombination between identical sequences is an active mechanism for generation of deletions, and the rate of intrachromosomal recombination is a direct function of the homology between the two elements, dropping off when the homology is below 200 bp. L1 repetitive elements are thought to be derived from the active L1 prototype with transposon activity. Recently, seven active L1 elements were identified in the human genome and a consensus sequence was reported (45, 46). Active elements were not among the 43 L1 elements identified in our sequenced region, but some of the elements showed high sequence homology with the prototype L1. These elements, including the L1s at 56.3 kb (1,444 bp), at 57.8 kb (1,060 bp), 70.8 kb (1,616 bp), 152 kb (1,817 bp), and 180.1 kb (2,249 bp), also showed 90% homology to each other over 1,000 bp and were associated with DNA deletion. In particular, MB436 and LS180 cell lines showed the same breakpoint at position 71 kb and, although the joining breakpoint sequences were at different sites (position 152.1 kb in MB436 and 180.7 kb in LS180), they showed high sequence homology (see Fig. 5), indicating that the deletions involved common mechanisms.
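A figure such as "90% homology over 1,000 bp" is a percent identity computed over aligned positions. The sketch below shows one common way to compute it from two pre-aligned sequences; it assumes the alignment has already been made, that gap columns are excluded from the comparison, and it is not necessarily the calculation used for the values reported here. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical bases among ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue                      # skip columns containing a gap
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 9 of 10 columns identical
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Whether gaps count as mismatches changes the value, so the convention should be stated when identities between L1 elements are compared.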

L1-mediated gene rearrangements have been reported in various disorders: a 300-kb deletion in intron 10 of the PAX6 gene in familial aniridia (47), a translocation within COL5A intron 24 in Ehlers–Danlos syndrome (48), and an insertional mutation of a 7.1-kb L1 element within intron 6 of the β subunit of the glycine receptor in congenital myoclonia (49). The association of L1 elements with DNA deletion in cancer cells had not previously been observed. Loss of tumor suppressor genes is a primary step in carcinogenesis and our results suggest that the frequent deletions in the FRA3B/FHIT locus are mediated by L1 sequences and result in inactivation of the FHIT gene.

The distal, aphidicolin-induced breakpoint cluster (17, 18, 50) was located at position 203 kb; we found no carcinoma-associated deletion breakpoints in or around this position. The aphidicolin-induced break in the A5–4 hybrid, probably located ≈70 kb telomeric of FHIT exon 5 (17), could fall near an L1 in the deletion breakpoint “hot spot” within subregion 50–100 kb. The spontaneous break or break repair in hybrid cl3 was within 1 kb of an L1 sequence and a cancer cell deletion endpoint (Fig. 4).

Perspective.

It will be important to determine whether there is polymorphism in the human population at fragile breaks (possibly integration or translocation sequences) or at break recombination sequences such as the deletion endpoints. The answers to some of these questions can be determined by PCR amplification, but the repetitive nature and size of the repeat clusters will require investigation by using Southern blot to determine whether some individuals exhibit fewer L1-sequences in the region. Perhaps, if this is the case, the locus would still be fragile as expected, with less likelihood of L1 mediated repair. An immediate possibility is to use primer pairs flanking the characterized deletions to determine whether similar deletions occur frequently in primary cancers. Thus, we may be able to determine whether environmental insult to the common fragile region is more likely to result in FHIT deletion in individuals, depending on specific configurations of L1 loci for example, and if said FHIT deletions can be useful in following the natural history of the disease in important cancers.

Acknowledgments

This research was supported by U.S. Public Health Service Grants CA39860, CA51083, CA21124, and CA56336 from the National Cancer Institute and by a gift from Mr. R. R. M. Carpenter III and Mrs. Mary Carpenter. Genomic clones covering the region of the t(3;8) break were isolated by Kumar Kastury.

ABBREVIATIONS

RT, reverse transcription

LINE, long interspersed nuclear element

Footnotes

Data deposition: The sequences reported in this paper have been deposited in the GenBank database (accession nos. AF020503, AF020504, AF020609-AF020615, and AF019967).

References

1. Sutherland G R, Richards R I. Curr Opin Genet Dev. 1995;5:323–327. doi: 10.1016/0959-437x(95)80046-8.
2. Jones C, Penny L, Mattina T, Yu S, Baker E, Voullaire L, Langdon W Y, Sutherland G R, Richards R I, Tunnacliffe A. Nature (London). 1995;376:145–149. doi: 10.1038/376145a0.
3. Yu S, Mangelsdorf M, Hewett D, Hobson L, Baker E, Eyre H J, Lapsys N, Paslier D L, Doggett N A, Sutherland G R, Richards R I. Cell. 1997;88:367–374. doi: 10.1016/s0092-8674(00)81875-9.
4. Glover T W, Stein C K. Am J Hum Genet. 1988;43:265–273.
5. Warren S T, Zhang F, Licambi G R, Peters J F. Science. 1987;237:420–423. doi: 10.1126/science.3603029.
6. Wang N D, Testa J R, Smith D I. Genomics. 1993;17:341–347. doi: 10.1006/geno.1993.1330.
7. Rassool F V, Le Beau M M, Shen M L, Neilly M E, Espinosa R R, Ong S T, Boldog F, Drabkin H, McCarroll R, McKeithan T W. Genomics. 1996;35:109–117. doi: 10.1006/geno.1996.0329.
8. Yunis J J, Soreng A L, Bowe A E. Oncogene. 1987;1:59–69.
9. Yunis J J, Soreng A L. Science. 1984;226:1199–1204. doi: 10.1126/science.6239375.
10. Le Beau M M, Rowley J D. Science. 1984;226:1199–1204.
11. Cannizzaro L A, Durst M, Mendez M J, Hecht B K, Hecht F. Cancer Genet Cytogenet. 1988;33:93–98. doi: 10.1016/0165-4608(88)90054-4.
12. Popescu N C, DiPaolo J A. Cancer Genet Cytogenet. 1989;42:157–171. doi: 10.1016/0165-4608(89)90084-8.
13. Paradee W, Wilke C M, Wang L, Shridhar R, Mullins C M, Hoge A, Glover T W, Smith D I. Genomics. 1996;35:87–93. doi: 10.1006/geno.1996.0326.
14. Wilke C M, Hall B K, Hoge A, Paradee W, Smith D I, Glover T W. Hum Mol Genet. 1996;5:187–195. doi: 10.1093/hmg/5.2.187.
15. Zimonjic D B, Druck T, Ohta M, Kastury K, Croce C M, Popescu N C, Huebner K. Cancer Res. 1997;57:1166–1170.
16. Ohta M, Inoue H, Cotticelli M G, Kastury K, Baffa R, Palazzo J, Siprashvili Z, Mori M, McCue P, Druck T, Croce C M, Huebner K. Cell. 1996;84:587–597. doi: 10.1016/s0092-8674(00)81034-x.
17. Boldog F, Gemmill R M, West J, Robinson M, Robinson L, Li E, Roche J, Todd S, Waggoner B, Lundstrom R, Jacobson J, Mullokandov M R, Klinger H, Drabkin H A. Hum Mol Genet. 1997;6:193–203. doi: 10.1093/hmg/6.2.193.
18. Wang L, Paradee W, Mullins C, Shridhar R, Rosati R, Wilke C M, Glover T W, Smith D I. Genomics. 1997;41:485–488. doi: 10.1006/geno.1997.4690.
19. Kastury K, Baffa R, Druck T, Ohta M, Cotticelli M G, Inoue H, Negrini M, Rugge M, Huang D, Croce C M, Palazzo J, Huebner K. Cancer Res. 1996;56:978–983.
20. Druck T, Hadaczek P, Fu T B, Ohta M, Siprashvili Z, Baffa R, Negrini M, Kastury K, Veronese M L, Rosen D, Rothstein J, McCue P, Cotticelli M G, Inoue H, Croce C M, Huebner K. Cancer Res. 1997;57:504–512.
21. Virgilio L, Shuster M, Gollin S M, Veronese M L, Ohta M, Huebner K, Croce C M. Proc Natl Acad Sci USA. 1996;93:9770–9775. doi: 10.1073/pnas.93.18.9770.
22. Sozzi G, Sard L, De Gregorio L, Marchetti A, Musso K, Buttitta F, Tornielli S, Pellegrini S, Veronese M L, Manenti G, Incarbone M, Chella A, Angeletti C A, Pastorino U, Huebner K, Bevilaqua G, Pilotti S, Croce C M, Pierotti M A. Cancer Res. 1997;57:2121–2123.
23. Greenspan D L, Connolly D C, Wu R, Lei R Y, Vogelstein J T C, Kim Y-T, Mok J E, Munoz N, Bosch X, Shah K, Cho K R. Cancer Res. 1997;57:4692–4698.
24. Huebner K, Hadaczek P, Siprashvili Z, Druck T, Croce C M. Biochim Biophys Acta. 1997;1332:M65–M70. doi: 10.1016/s0304-419x(97)00009-7.
25. Siprashvili Z, Sozzi G, Barnes L D, McCue P, Robinson A K, Eryomin V, Sard L, Tagliabue E, Greco A, Fusetti L, Schwartz G, Pierotti M A, Croce C M, Huebner K. Proc Natl Acad Sci USA. 1997;94:13771–13776. doi: 10.1073/pnas.94.25.13771.
26. Druck T, Kastury K, Hadaczek P, Podolski J, Toloczko A, Sikorski A, Ohta M, LaForgia S, Lasota J, McCue P, Lubinski J, Huebner K. Cancer Res. 1995;55:5348–5353.
27. Shridhar R, Shridhar V, Wang X, Paradee W, Dugan M, Sarkar F, Wilke C, Glover T W, Vaitkevicius V K, Smith D I. Cancer Res. 1996;56:4347–4350.
28. Sozzi G, Huebner K, Croce C M. Adv Cancer Res. 1997; in press.
29. Ahmadian M, Wistuba I I, Fong K M, Behrens C, Kodagoda D R, Saboorian M H, Shay J, Tomlinson G E, Blum J, Minna J D, Gazdar A F. Cancer Res. 1997;57:3664–3668.
30. Fong K M, Biesterveld E J, Virmani A, Wistuba I, Sekido Y, Bader S A, Ahmadian M, Ong S T, Rassool F V, Zimmerman P V, Giaccone G, Gazdar A F, Minna J D. Cancer Res. 1997;57:2256–2267.
31. Larson A A, Kern S, Curtiss S, Gordon R, Cavenee W K, Hampton G M. Cancer Res. 1997;57:4082–4090.
32. Cohen A J, Li F P, Berg S, Marchetto D J, Tsai S, Jacobs S C, Brown R S. N Engl J Med. 1979;301:592–595. doi: 10.1056/NEJM197909133011107.
33. Kastury K, Ohta M, Lasota J, Moir D, Dorman T, LaForgia S, Druck T, Huebner K. Genomics. 1996;32:225–235. doi: 10.1006/geno.1996.0109.
34. Fleischmann R D, Adams M D, White O, Clayton R A, Kirkness E F, Kerlavage A R, Bult C J, Tomb J F, Dougherty B A, Merrick J M. Science. 1995;269:496–512. doi: 10.1126/science.7542800.
35. Altschul S F, Gish W, Myers E W, Lipman D J. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
36. Smit A F A. Curr Opin Genet Dev. 1996;6:743–748. doi: 10.1016/s0959-437x(96)80030-x.
37. Xu Y, Einstein J R, Mural R J, Shah M, Uberbacher E C. An Improved System for Exon Recognition and Gene Modeling in Human DNA Sequences. Menlo Park, CA: AAAI Press; 1994. pp. 376–384.
38. Solovyev V V, Salamov A A, Lawrence C B. Nucleic Acids Res. 1994;22:5156–5163. doi: 10.1093/nar/22.24.5156.
39. Burge C, Karlin S. J Mol Biol. 1997;268:78–94. doi: 10.1006/jmbi.1997.0951.
40. Ochman H, Gerber A S, Hartl D L. Genetics. 1988;120:621–623. doi: 10.1093/genetics/120.3.621.
41. LaForgia S, Morse B, Levy J, Barnea G, Cannizzaro L A, Li F, Nowell P C, Boghosian-Sell L, Glick J, Weston A, Harris C C, Drabkin H, Patterson D, Croce C M, Schlessinger J, Huebner K. Proc Natl Acad Sci USA. 1991;88:5036–5040. doi: 10.1073/pnas.88.11.5036.
42. Mao L, Lee J S, Kurie J M, Fan Y H, Lippman S M, Lee J J, Ro J Y, Broxson A, Yu R, Morice R C, Kemp B L, Khuri F R, Walsh G L, Hittelman W N, Hong W K. J Natl Cancer Inst. 1997;89:857–862. doi: 10.1093/jnci/89.12.857.
43. McNaughton J C, Hughes G, Jones W A, Stockwell P A, Klamut H J, Petersen G B. Genomics. 1997;40:294–304. doi: 10.1006/geno.1996.4543.
44. Waldman A S, Liskay R M. Proc Natl Acad Sci USA. 1987;84:5340–5344. doi: 10.1073/pnas.84.15.5340.
45. Feng Q H, Moran J V, Kazazian H H, Boeke J D. Cell. 1996;87:905–916. doi: 10.1016/s0092-8674(00)81997-2.
46. Sassaman D M, Dombroski B A, Moran J V, Kimberland M L, Naas T P, Deberardinis R J, Gabriel A, Swergold G D, Kazazian H H. Nat Genet. 1997;16:37–43. doi: 10.1038/ng0597-37.
47. Drechsler M, Royer-Pokora B. Hum Genet. 1996;98:297–303. doi: 10.1007/s004390050210.
48. Toriello H V, Glover T W, Takahara K, Byers P H, Miller D E, Higgins J V, Greenspan D S. Nat Genet. 1996;13:361–365. doi: 10.1038/ng0796-361.
49. Kingsmore S F, Giros B, Suh D, Bieniarz M, Caron M G, Seldin M F. Nat Genet. 1994;7:136–141. doi: 10.1038/ng0694-136.
50. Paradee W, Mullins C, He Z, Glover T, Wilke C, Opalka B, Schutte J, Smith D I. Genomics. 1995;27:358–361. doi: 10.1006/geno.1995.1057.

