Abstract
Treacher Collins Syndrome (TCS) is the most common of the human mandibulofacial dysostosis disorders. Recently, a partial TCOF1 cDNA was identified and shown to contain mutations in TCS families. Here we present the entire exon/intron genomic structure and the complete coding sequence of TCOF1. TCOF1 encodes a low complexity protein of 1,411 amino acids, whose predicted protein structure reveals repeated motifs that mirror the organization of its exons. These motifs are shared with nucleolar trafficking proteins in other species and are predicted to be highly phosphorylated by casein kinase. Consistent with this, the full-length TCOF1 protein sequence also contains putative nuclear and nucleolar localization signals. Throughout the open reading frame, we detected an additional eight mutations in TCS families and several polymorphisms. We postulate that TCS results from defects in a nucleolar trafficking protein that is critically required during human craniofacial development.
Treacher Collins Syndrome (TCS) is the most common human mandibulofacial dysostosis disorder. It shows autosomal dominant inheritance and has an estimated incidence of 1 in 50,000 live births, with approximately 60% of cases arising from new mutations (1, 2). Although thought to be fully penetrant, TCS shows wide variability in expression, with some cases so mildly affected that they may go undiagnosed. The major diagnostic criteria of this disease include bilaterally symmetric midface hypoplasia, downward slant of the palpebral fissures and colobomata of the lower lids, micrognathia, microtia, and other deformities of the ear often leading to conductive hearing loss. From the structures affected, and from studies in mice exposed to the teratogens cis- and trans-retinoic acid, it has been inferred that the disease results from interference with the development of the first and second branchial arches (1, 3–5).
The TCS gene, TCOF1 (MIM no. 154500), was initially localized to chromosome 5q31–33.3 by genetic linkage analysis (6, 7). The interval was then refined to the region between the colony stimulating factor 1 receptor gene (CSF1R) (proximally) and the osteonectin gene (SPARC) (distally) (8, 9), and subsequently further refined to the region between marker D5S519, which is distal to CSF1R, and SPARC (10). Physical and transcription maps of these regions were constructed (11–13). A candidate gene for TCS was recently identified (14) that, confusingly in light of previous studies (10, 12), lies proximal to D5S519, between it and CSF1R. Five mutations, presumably leading to premature termination of translation, were detected in three exons from affected individuals. Five more TCOF1 mutations have recently been reported in TCS-affected individuals (15), bringing the total number of identified mutations to 10. All of these are nonsense mutations, insertions, deletions, or splicing mutations that apparently lead to premature termination of translation. Moreover, all appear to be unique to each family. A partial TCOF1 cDNA was identified and shown to contain an open reading frame for a putative protein product called “Treacle.” How Treacle functions normally or in the etiology of TCS was not clear, since no obvious similarity to any known proteins was described in these studies.
We present the genomic structure and complete coding sequence of TCOF1. Our analysis of the predicted TCOF1 protein has revealed conserved motifs, each corresponding to an individual exon, that are organized as repeated domains. By alignment with related motifs in other organisms, we predict that TCOF1 encodes a highly phosphorylated nucleolar protein. We have detected an additional eight mutations within this open reading frame in nine individuals (one insertion and seven deletions). Each of these appears to lead to premature termination of translation. These and other findings (14, 15) support a model in which haploinsufficiency of a nucleolar phosphoprotein during a critical stage of human mandibulofacial development leads to the pathogenesis of TCS.
MATERIALS AND METHODS
TCOF1 Clone Extension.
The TCOF1 cDNA (14) was extended by 5′ and 3′ rapid amplification of cDNA ends (BRL) as suggested by the manufacturer. The 3′ end was extended in placental cDNA using gene-specific primer (GSP) 1: 5′-GAG CCA GAA GAG GAG CTT CA-3′; the product was amplified with GSP2: 5′-(CUA)4 CGG TTG AAG GTG GAG ATC AA AG-3′ (primers derived from exon 23). The 5′ end of TCOF1 was also extended in placental cDNA, using GSP1A: 5′-AGG AGT TGG TAG ATG CTA GT-3′ in exon 4. The product was deoxyadenosine-tailed to allow second-strand synthesis and was then amplified with GSP2A1: 5′-TTC GGC TTC TGC TTC TTC CT-3′ in exon 3 and reamplified with GSP2A: 5′-(CUA)4 TTA CGG GCT GAG CCA GGA A-3′ in exon 2. All PCR amplifications included 35 cycles of 94°C for 30 sec, 50°C for 30 sec, and 72°C for 30 sec. The DNA was isolated by gel purification using the Wizard PCR Preps DNA Purification System (Promega). The PCR products were sequenced in both directions using an Applied Biosystems 377 automated DNA sequencer.
Sequence Analysis.
The repeating motif within TCOF1 was initially detected by aligning the sequence with itself using the blastn and blastx programs (16). Putative phosphorylation sites and nuclear and nucleolar localization signals were identified by visual inspection of the sequence. Using the newer 2.6.2 versions of blastx and blastn, alignments of TCOF1 with genes of similar structure were generated; because of the repetitive nature of these genes, multiple alternative alignments were possible.
Definition of TCOF1 Exon/Intron Boundaries.
Two genomic cosmid clones were sequenced to define exon/intron boundaries. Cosmid 245H1 from the Los Alamos chromosome 5 cosmid library (17) was used as a template for exons 5′ of exon 16, and cosmid 40A6 was used to sequence the 3′ end of TCOF1. The cosmid DNA was isolated using the Plasmid Midi kit (Qiagen, Chatsworth, CA) and sequenced by The Johns Hopkins Genetic Resources Core Facility using an Applied Biosystems 373 automated DNA sequencer. PCR primer pairs were designed from intron sequences to amplify each exon in 20–50 μl reactions containing 100–500 ng genomic DNA, 10 mM Tris·HCl (pH 8.3), 1.5 mM MgCl₂, 50 mM KCl, 0.2 mM dNTPs, 0.5 μM each primer, and 1–2 units Taq DNA polymerase (Boehringer Mannheim). The PCR included 35 cycles of 94°C for 60 sec, the exon-specific annealing temperature (Table 1) for 60 sec, and 72°C for 60 sec. Sequencing the PCR products in both directions confirmed the exon/intron sequences derived from the cosmids.
Table 1.
| Exon | Position in cDNA, bp | Splice acceptor* 5′ to 3′ | Splice donor* 5′ to 3′ | PCR primer pair† 5′ to 3′ | PCR size, bp‡ | Tm, °C‡ | Ref. source |
|---|---|---|---|---|---|---|---|
| 1 | 5′ UTR–108 | ggtcgcgggt ATG GCC GAG G§ | G AGC GGC CAG gtaagcgttc | AGGCGGGGCGTGCAGGTA / CCGCTGATCTCCACATCTTG | 205 | 65 | This study |
| 2 | 109–164 | ctctctgcag AAG TGT TTC C | AC TGG CAA CA gtaagtggtg | CCTCCCAAAGTGCTGAGA / GTGTCCGTCCCTACTCCA | 222 | 59 | This study |
| 3 | 165–304 | tgtcctgcag A ACC TCA GAG | GCC AAA GCC A gtaagagcct | ATTCTTGTGGAGTTGTTC / CCCCAGGGTCTTTTAGGT | 308 | 50 | This study |
| 4 | 305–378 | tttcttgcag CC CCA AGA CT | A AAA GCC AAG gtgagtggga | TCATCTGGCTCCTTTAGCAG / TAGGCAATAGCTTGGAAGGC | 141 | 63 | 15 |
| 5 | 379–565 | ttctctgtag GCA GAG ACA G | GCC AAG CCT G gtaagaagtc | TTGGGTTCAGATGCAAGTGG / AAGTTCTGGGGACTAGGTTC | 297 | 59 | 15 |
| 6 | 566–639 | cgatcctcag GG ATG GTG TC | A GAC GTG GAG gtaattgcca | TGGAAAGGGAGTCCCTCAGT / GTTCCTGGAAGGGTTAGAGG | 167 | 62 | 15 |
| 7 | 640–852 | ttttcaccag GTA AAG GCC T | C CTG CTT CAG gtgaggcctg | GTTTTCACAAGCAGGAGAGC / AGAAGGCCTTCTGGGGATG | 313 | 60 | This study, 14 |
| 8 | 853–1047 | gtttctccag GCG AAG GCC T | G CCT GCT CAG gtgaggcaga | GTGTCCTGTGTCTCCTCAC / TTTAGGCATGGGGCTACTCT | 298 | 63 | 15 |
| 9 | 1048–1257 | ctcactccag GCG AAG CCT T | T GCA GCT CAG gtgaggctgg | ACCTTTGCCACATCCAGCTC / TCTTTTGAGGCAGGGCACAG | 328 | 65 | 15 |
| 10 | 1258–1473 | tgtctcccag GTG AAG CCC T | C CCG GCT CAG gtgaggcccc | ACTCCCTCCCTAATCTTGTC / GAAAGAGCCTTACAGGAAGG | 278 | 60 | This study, 14 |
| 11 | 1474–1662 | ctcactccag GAA AAG TCC T | T GCA GCT CAG gtgaggcctg | TACCCTGGGCTCCCTCTC / CCGGGGGTGCTGACTGTG | 295 | 64 | This study |
| 12 | 1663–1911 | gtcccctcag GCA AAA CCA G | C GTG GGA CAG gtgaggcctg | GTGGGGCAGAACAGATGG / GGGATGACAAGGGGAAGA | 408 | 62 | This study |
| 13 | 1912–2109 | gtcatcccag GCA AAG TCT G | T CCA GCA CAG gtgaggccta | GGAGACACCTCTCTTCCCCT / GGATGGGCCTGCTCCTTCTA | 257 | 58 | This study, 14 |
| 14 | 2110–2247 | ctccactcag GTG AAA ACC T | T CCA TCC AAG gcaagtgggg | CAATCTCACCTTCTCCCTCCT / AACCCTCCACACCTCCTGTG | 219 | 62 | 15 |
| 15 | 2248–2427 | tgcaattcag GTG AAG CCA C | G CTG GCT CAG gtgaggggga | GGGAGTGGGACCTGAAAGAA / CCCATGTAGGGGATGATCTC | 277 | 56 | 15 |
| 16 | 2428–2628 | ctccctccag GCG AAG CCT T | C TCT GCC CAG gtaagacttg | AGATCATCCCCTACATGG / CCCTATACCCCCGTTCTG | 360 | 55 | This study |
| 17 | 2629–2815 | gtttttcaag GTG ATT AAA C | TTG ACT CCT G gtgagcgcag | CCCCATCACCTCCTTTCC / CCAGTGTCCTGTCCCTTCTG | 335 | 64 | This study |
| 18 | 2816–2952 | tccatttcag GC ATC AGA AC | A GCC ACT CAG gtacctggtg | GCACAGGCCGGTAAATTG / TTGCAGGCCATCCCATCA | 315 | 65 | This study |
| 19 | 2953–3066 | ccacccacag GTG TCA AAG A | T TCA AGT GGG gtgagcttgg | CCCCAGCCAGACAGCATC / AGGGAGGCAAACCAAGTG | 320 | 64 | This study |
| 20 | 3067–3286 | accgaattag GTT GAC AGT G | CAC ACG CTG G gtgagggtgc | ACTTGCCCTAATTTTTCC / CCACACAACACCCTCTTC | 320 | 55 | This study |
| 21 | 3287–3369 | tctccagtag GT CCC ACC CC | G CCA TCC CAG gtaactgcaa | ATGGGGGTGAGGGACCTG / CTGAGGGATCGGGTAGAC | 275 | 60 | This study |
| 22 | 3370–3550 | gcttcttcag TCT CTC CTC T | CCT AAA ACA G gtaagttaag | CTCTGTGCCTTGTTGTCC / CACTGCCCTGTCCCTCTG | 387 | 63 | This study |
| 23 | 3551–4111 | ctctccatag GT GGA AAA GA | TCC GAC AAG A gtgagtgacc | ACTCCCTGCACCCTCTTC / TGGTCTCCCGATAGCTTC | 364 | 60 | This study |
| | | | | AGAAAGAAGGTGGTGGAC / TCTACATGGGAGGAATGA | 466 | 50 | |
| 24 | 4112–4209 | cttcccttag GA AAA AAA GA | G AAG AAA AAG gtagagagtt | ATTGACCCCAGCACTTAG / GAAGGGGGCAGGAACCAG | 248 | 60 | This study |
| 25 | 4210–4258 (extends into 3′ UTR) | ctcctcacag AAG AAG ACA G | ccaggcacag gtacgcttcc | GCAGTGGGTGGGGAAAAG / CCACAGGGGACACCAGAG | 168 | 60 | This study |
| 26 | 3′ UTR, 4259–4741 | cttcccctag ggatttccta | gaggaagggt polyA¶ | ATGCCAGATTTCATTTTC / CTGTGGAGCAAGGTGGTG | 406 | 50 | This study |
| | | | | AGTGACCTCCTCTCCTTC / TAGTTTAGATGCCACCTC | 411 | 52 | |
UTR, untranslated region.
*Uppercase letters are coding sequences; lowercase letters are noncoding sequences. Spaces in sequences indicate exon/intron boundaries or reading frames.
†Primer pair sequences are given for PCR amplification of each exon. Two primer pairs are required to amplify the entire length of exons 23 and 26.
‡The PCR product size and the annealing temperature used for PCR amplification are given.
§This is the sequence preceding and including the translation start site. The A nucleotide of the ATG is numbered 1.
¶This is the sequence immediately preceding the site where the poly(A) tract is added.
Patient Population.
The probands and their relatives were clinically examined. Genomic DNA was isolated from blood samples and lymphoblast cultures of members of 55 unrelated TCS families and of 60 controls.
Heteroduplex Analysis.
Heteroduplex analysis was performed to scan a total of 13 TCOF1 exons (4–16) for mutations. Five microliters of each PCR product was denatured at 98°C for 10 min and annealed at 68°C for 1 hr. The heteroduplexes were analyzed on a 1× Mutation Detection Enhancement gel (FMC) at 580 V for 16 hr. The gel was stained with ethidium bromide to visualize heteroduplexes. PCR products with mobility shifts on the Mutation Detection Enhancement gel were run on 2.0% NuSieve (FMC) agarose gels.
Mutation and Polymorphism Detection.
Isolated DNA was sequenced in both directions. Sequence data were used to establish allele-specific oligonucleotide (ASO) hybridization and restriction enzyme-based screening methods that distinguish between mutant and normal sequences or between polymorphic alleles. For ASO hybridizations, PCR products were dotted onto Hybond-N+ (Amersham) filters, prehybridized for 30 min, and hybridized for 1 hr in Rapid-hyb buffer (Amersham). The ASO was ³²P-end-labeled with polynucleotide kinase. The filters were washed once at room temperature for 20 min in 5× SSC/0.1% SDS and then twice for 15 min at the ASO-specific wash temperature (Tables 2 and 3) in 0.2× SSC/0.1% SDS. The same filter was used for both the mutant and the corresponding normal ASO probes. Restriction enzyme digests were performed according to the manufacturer’s instructions (New England Biolabs), using 10 μl of the PCR reaction in a total volume of 20 μl.
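Restriction-based screening rests on a simple test: a mutation or polymorphism either creates or destroys an enzyme recognition site. As a hedged illustration, the Python sketch below checks whether a sequence carries a recognition site written in IUPAC notation, searching both strands. The amplicon strings are hypothetical toy sequences, not TCOF1 sequence, and the helper names (`site_regex`, `has_site`) are our own; only the DdeI (CTNAG), SmaI (CCCGGG), and SfaNI (GCATC) recognition sequences are standard.

```python
import re

# Subset of IUPAC nucleotide codes, enough for common recognition sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def site_regex(recognition: str) -> re.Pattern:
    """Compile an IUPAC recognition sequence (e.g. 'CTNAG') into a regex."""
    return re.compile("".join(IUPAC[base] for base in recognition.upper()))


def reverse_complement(seq: str) -> str:
    """Reverse complement of a plain A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_site(seq: str, recognition: str) -> bool:
    """True if the recognition site occurs on either strand of `seq`."""
    pattern = site_regex(recognition)
    top = seq.upper()
    return bool(pattern.search(top) or pattern.search(reverse_complement(top)))


# Hypothetical amplicons: deleting 'AG' destroys a DdeI (CTNAG) site, so the
# enzyme digests the normal but not the mutant allele, an "(nl)" pattern.
normal = "AAGCTGAGCAA"
mutant = normal[:6] + normal[8:]  # 'AAGCTGCAA'
print(has_site(normal, "CTNAG"), has_site(mutant, "CTNAG"))  # True False
```

Because the DdeI site is palindromic, one strand would suffice for it; the two-strand search matters for asymmetric sites such as SfaNI (GCATC).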
Table 2.
| Exon | Nucleotide | Amino acid | Screening method | Case* |
|---|---|---|---|---|
| 5 | 422insA | H141Q†; STOP 33 aa downstream | ASO‡: ATGCCACAACCCTGCCAC (62) / ATGCCACACCCTGCCAC (60) | 1F |
| 5 | 497delATAC | N166I†; STOP 48 aa downstream | ASO‡: TCAGCAATACGTTGGTCT (48) / TCAGCAAATACTACGTTG (45) | 1U |
| 10 | 1408delAG | S470Q†; STOP 1 aa downstream | ASO‡: CAGAGCAGTAGTGAGGAG (48) / CAGAGAGCAGTAGTGAGGAG (53) | 2F |
| 16 | 2490delC | PR830–831PG§; STOP 42 aa downstream | ASO‡: CCAAGGAGTCCCCAGGAA (53) / CCAAGGAGTCCCCCAGGAA (55) | 1F |
| 16 | 2526delAG | TG842–843TA§; STOP 11 aa downstream | EcoO109I (nl)¶ | 1F |
| 16 | 2552delA and 2561delA | K851S‖; STOP 13 aa downstream | ASO‡ for 2552delA: CAGGCAGGGAGCAGGA (48) / CAGGCAGGGAAGCAGG (48); SfaNI for 2561delA | 1S |
| 16 | 2565delAG | SG855–856SE§; STOP 9 aa downstream | DdeI (nl)¶ | 1F, 1S |
ins, insertion; del, deletion; STOP, termination codon; aa, amino acid; nl, normal.
*The number of familial (F) and sporadic (S) cases in which each mutation occurred. Cases for which it is unknown whether the mutation is familial or sporadic, because relatives were unavailable, are designated U.
†The amino acid change indicates the first amino acid that is substituted. All subsequent amino acids are changed as a result of the frameshift.
‡ASO sequences (5′ to 3′) are listed with the temperature (°C) used to wash membranes. The first oligonucleotide listed hybridizes to the mutant sequence and the second to the normal sequence.
§The amino acid change indicates (i) the amino acid at which the frameshift occurs without substitution and (ii) the next amino acid, which is substituted. All subsequent amino acids are changed as a result of the frameshift.
¶Restriction enzymes used to detect mutant sequences are listed. (nl) indicates that the enzyme digests the normal, not the mutant, sequence.
‖The amino acid change indicates the combined result of both mutations in the same allele.
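Because nucleotides are numbered from the A of the ATG (nucleotide 1), each position maps to a codon by integer arithmetic. The helper below (`codon_of`, a hypothetical name, not part of the original analysis) is a minimal Python sketch of that mapping; it reproduces, for example, codon 141 for 422insA and codon 470 for 1408delAG in Table 2.

```python
from __future__ import annotations


def codon_of(nucleotide: int) -> tuple[int, int]:
    """Map a cDNA position (A of the ATG = 1) to (codon number, position in codon).

    Positions upstream of the ATG (5' UTR) are rejected.
    """
    if nucleotide < 1:
        raise ValueError("position lies upstream of the translation start site")
    return (nucleotide - 1) // 3 + 1, (nucleotide - 1) % 3 + 1


# First nucleotide named for each Table 2 mutation.
for label, pos in [("422insA", 422), ("497delATAC", 497), ("1408delAG", 1408),
                   ("2490delC", 2490), ("2526delAG", 2526), ("2565delAG", 2565)]:
    codon, frame_pos = codon_of(pos)
    print(f"{label}: codon {codon}, position {frame_pos}")
```

The same arithmetic places the last sense codon (1,411) at nucleotides 4231–4233, immediately before the stop codon at 4234–4236.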
TA Cloning.
To obtain each allele with insertions or deletions, the PCR product was cloned into vector pCR2.1 using the TA cloning kit (Invitrogen) according to the manufacturer’s instructions. Clones containing the mutant or polymorphic alleles were detected by ASO hybridization or restriction enzyme digestion. PCR products were prepared for automated sequencing as stated above.
RESULTS
Deriving the Full-Length Coding Region of the TCOF1 Gene.
The published partial cDNA sequence of TCOF1 was extended in both the 5′ and 3′ directions by rapid amplification of cDNA ends. 3′ rapid amplification of cDNA ends extended the sequence by 548 nucleotides to the poly(A) tail, which lies 17 nucleotides downstream of a consensus polyadenylylation signal and 505 nucleotides downstream of the translational stop codon. 5′ rapid amplification of cDNA ends extended the sequence by 126 nucleotides, through an in-frame translational start site (an ATG within a Kozak initiation consensus sequence) to an upstream in-frame stop codon (TAA). This gave a composite cDNA of 4,840 nucleotides. The TCOF1 transcript is approximately 5.3 kb, as measured by our Northern blot analysis of mRNA from fetal brain, lung, liver, and kidney [CLONTECH, data not shown; previous estimates of transcript size are 5.8 and 6.3 kb (14)]. Subtracting the predicted length of the poly(A) tract from this value, we estimate that at most a few hundred nucleotides of the 5′ untranslated region remain unidentified. Translation of the 4,233-nucleotide open reading frame predicts a 1,411-amino acid protein of approximately 155 kDa. The cDNA sequence can be accessed in GenBank (accession no. U76366).
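The size estimates above follow from simple arithmetic, which the Python sketch below reproduces. The mean residue mass of 110 Da and the poly(A) tract length of 200 nucleotides are assumptions chosen for illustration, not values measured in this study, and the function names are hypothetical.

```python
def protein_length(orf_nt: int) -> int:
    """Number of codons in an ORF given its length in nucleotides (stop excluded)."""
    if orf_nt % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_nt // 3


def approx_mass_kda(n_residues: int, mean_residue_da: float = 110.0) -> float:
    """Rough molecular mass, assuming an average residue mass (default 110 Da)."""
    return n_residues * mean_residue_da / 1000.0


def unaccounted_nt(transcript_nt: int, cdna_nt: int, polya_nt: int) -> int:
    """Transcript length not covered by the composite cDNA or the poly(A) tract."""
    return transcript_nt - cdna_nt - polya_nt


residues = protein_length(4233)          # 4,233-nt ORF from the text
mass = approx_mass_kda(residues)         # about 155 kDa
gap = unaccounted_nt(5300, 4840, 200)    # ASSUMED 200-nt poly(A) tract
print(residues, round(mass), gap)        # 1411 155 260
```

Any assumed poly(A) length between roughly 100 and 300 nucleotides leaves 160 to 360 nucleotides unaccounted for, consistent with "at most a few hundred."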
Sequence Analysis.
The predicted TCOF1 protein is of very low complexity. Alanine is the most abundant amino acid (14.7%), followed by serine (13.6%), lysine (11.1%), glutamic acid (9.1%), and proline (9.1%). Close visual inspection revealed a repeated motif that mirrors the exon/intron boundaries within the gene (Fig. 1). Each of these repeated motifs contains highly acidic clusters of amino acids that include the consensus casein kinase phosphorylation site (S/T-X-X-D/E) (18). Ten of the exons, 7–16, are most similar to each other, with exons 8 and 16 sharing the highest protein identity (78%). However, exons 3, 6, 17, 20, and 21, although less well conserved, also contain several potential casein kinase phosphorylation sites. Virtually every serine within the TCOF1 repeats could be phosphorylated, and only a few exons, such as the first exon, lack obvious potential phosphorylation sites. The carboxyl terminus of TCOF1 is extremely lysine rich (Fig. 1). This region contains several potential nuclear localization signals (K-K/R-X-R/K) (19, 20), at amino acids 1,285 (KKKKK), 1,314 (KRDK), 1,325 (KKGK), 1,362 (KKEK), 1,370 (KRKKDK), 1,377 (KKEKKKKAKK), and 1,398 (KKKKKKKK). An additional potential nuclear localization signal lies at the amino-terminal end, at position 74 (KKTR).
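The two consensus patterns named above translate directly into regular expressions. The Python sketch below scans a protein string for the casein kinase site (S/T-X-X-D/E) and the K-K/R-X-R/K nuclear localization pattern, reporting overlapping matches with 1-based positions. It is an illustration that assumes these simple consensus patterns only; it is not the procedure used here, which relied on visual inspection, and the peptides are toy strings.

```python
from __future__ import annotations

import re

# Lookaheads make overlapping matches visible (e.g. runs of lysines).
CK_SITE = re.compile(r"(?=([ST]..[DE]))")   # S/T-X-X-D/E
NLS = re.compile(r"(?=(K[KR].[RK]))")       # K-K/R-X-R/K


def find_motifs(protein: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every match, overlaps allowed."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


print(find_motifs("AASEEDA", CK_SITE))   # [(3, 'SEED')]
print(find_motifs("AAKKTRAA", NLS))      # [(3, 'KKTR')]
print(find_motifs("KKKKK", NLS))         # [(1, 'KKKK'), (2, 'KKKK')]
```

A scan of this kind only flags candidates; whether a site is actually phosphorylated, or a basic stretch actually acts as a localization signal, needs experimental support.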
The entire TCOF1 sequence, as well as its individual repeats, was searched against the Saccharomyces cerevisiae and Caenorhabditis elegans genomes to identify potential homologs. blast searches (16) detected a large number of similar regions, most of which are predicted to encode serine- or lysine-rich regions. It is not yet clear which, if any, of these genomic regions might encode a TCOF1 homolog. However, a search of the literature revealed a number of proteins from various species that contain multiple casein kinase phosphorylation sites. One of these, Nopp140 (for nucleolar phosphoprotein), shows striking similarity to the TCOF1 protein. This protein, identified in rat, has been shown to shuttle between the nucleolus and the cytoplasm (21). Nopp140, like TCOF1, is a low complexity protein (17.2% serine/16.1% lysine/14.3% alanine/9.7% glutamic acid/9.2% proline) that contains a 10-fold repeat of alternating acidic and basic regions, with each acidic domain containing numerous casein kinase phosphorylation sites. Nopp140 also contains seven potential nuclear localization signals. It is difficult to assign a numerical value to the degree of similarity between Nopp140 and TCOF1, because the repetitive nature of each protein allows multiple alignments to be generated. A comparison of the predicted protein sequences of the two most similar regions (amino acids 891–1117 of TCOF1 and amino acids 302–528 of Nopp140) reveals 21% identity, and 35% similarity when conservative changes are included in the calculation.
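Percent identity and similarity depend on how an alignment is scored. As a hedged illustration only, the Python sketch below computes both values for a given pairwise alignment, counting gap columns toward the length but never as matches, and treating substitutions within a few residue groups as conservative. These groups are a hypothetical simplification; blast scores similarity with a substitution matrix (such as BLOSUM62) instead, and the toy alignment is not TCOF1 or Nopp140 sequence.

```python
from __future__ import annotations

# Hypothetical "conservative" residue groups, for illustration only.
CONSERVATIVE_GROUPS = [set("ST"), set("DE"), set("KR"), set("NQ"),
                       set("ILVM"), set("FYW"), set("AG")]


def is_conservative(a: str, b: str) -> bool:
    """True if both residues fall in the same (assumed) conservative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_similarity(seq_a: str, seq_b: str) -> tuple[float, float]:
    """Percent identity and similarity over all alignment columns ('-' = gap)."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if "-" in (a, b):
            continue  # gap columns count toward length but never match
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1
    n = len(seq_a)
    return 100 * identical / n, 100 * similar / n


print(identity_similarity("SKDE", "SRDA"))  # (50.0, 75.0)
```

Because the result depends on both the chosen alignment and the scoring scheme, repetitive proteins can yield several defensible figures, which is the difficulty noted above.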
Genomic Structure and Mutation Analysis.
Locations of exon/intron boundaries were determined by comparing genomic sequence with the extended cDNA sequence. Genomic sequence was derived from cosmids identified from a chromosome 5-specific genomic cosmid library (17) and confirmed by sequencing PCR products of each exon amplified with intronic primer pairs. The gene comprises 26 exons (49–561 bp in size), with the termination codon in exon 25 (nucleotides 4234–4236). Individual exons plus flanking intron sequences have been deposited in GenBank (accession nos. U84640–U84665) (Table 1). Southern blot analysis showed that the gene spans more than 20 kb of genomic DNA (data not shown). PCR primer sets were used to amplify 13 of the 26 TCOF1 exons from affected individuals as well as unaffected controls. These PCR products were scanned for mutations by heteroduplex analysis, and mutations were confirmed by direct sequencing. For five of the mutations, ASOs were designed to the mutant and control sequences, and hybridization conditions were established that discriminated between the two. The three other mutations resulted in a gain or loss of a restriction enzyme recognition site (Fig. 2). In all, eight mutations were detected in nine individuals: six familial cases, two sporadic cases, and one of unknown origin. In two instances, a mutation (deletion of AG) was shared by two unrelated families, one at nucleotide 1408 and the other at nucleotide 2565. All other mutations were detected in only a single family. In addition, one individual was found to have two mutations in a single TCOF1 allele, and neither parent carried either mutation (Fig. 2 and Table 2). Mutations segregated with the disease in familial cases and were detected neither in unaffected family members nor in 60 controls.
This study brings the total number of mutations to 18, scattered among 8 different exons and at potential phosphorylation sites in TCOF1 (14, 15). In each case, the mutations are predicted to result in premature termination of translation.
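Why an insertion or deletion leads to premature termination can be seen by re-reading the sequence in codons after the frameshift. The Python sketch below applies a deletion or insertion to a coding sequence and reports the first in-frame stop codon. The 18-nucleotide sequence is a hypothetical toy, not TCOF1, and the helper names are our own; the numbering convention (first base of the ATG = 1) follows Tables 1–3.

```python
from __future__ import annotations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def apply_deletion(cds: str, position: int, length: int) -> str:
    """Delete `length` bases starting at 1-based `position`."""
    return cds[:position - 1] + cds[position - 1 + length:]


def apply_insertion(cds: str, after_position: int, bases: str) -> str:
    """Insert `bases` immediately after 1-based `after_position`."""
    return cds[:after_position] + bases + cds[after_position:]


def first_stop_codon(cds: str) -> int | None:
    """1-based number of the first stop codon read in frame from base 1, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


normal = "ATGAAGCTAGCCGCCTAA"            # ATG AAG CTA GCC GCC TAA
mutant = apply_deletion(normal, 7, 1)    # hypothetical "7delC"
print(first_stop_codon(normal), first_stop_codon(mutant))  # 6 3
```

Deleting one base shifts every downstream codon, so a stop codon that was out of frame (here TAG) becomes in frame and truncates the toy protein at codon 3 instead of codon 6.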
An unexpectedly large number of polymorphisms was detected within TCOF1 in this study: five different polymorphisms were found while screening for mutations by heteroduplex analysis (Table 3). All of these changes were classified as polymorphisms because they were detected in unaffected relatives and controls; moreover, the silent changes do not alter the amino acid, and the noncoding change does not create a sequence resembling a splice site. Only one polymorphism, a 5-bp insertion in intron 15, occurred in a noncoding region. Three of the four remaining polymorphisms are predicted to produce amino acid changes: valine-518 to isoleucine, alanine-810 to valine, and serine-845 to leucine. The latter two changes occur within CpG dinucleotides, which have an increased propensity for C-to-T transitions. Thus far, six polymorphisms have been detected (14) in three different exons and at potential phosphorylation sites of TCOF1.
Table 3.
| Exon | Nucleotide | Amino acid | Screening method | Frequency* |
|---|---|---|---|---|
| 10 | 1316C/T† | P/L439 | EcoO109I‡ (P1) | 1% (1/104) |
| 10 | 1347T/C† | P/P449 | SmaI‡ (P2) | 77% (92/120) |
| 11 | 1552G/A | V/I518 | AlwNI‡ (P1) | 3% (6/198) |
| 11 | 1611G/A | S/S537 | ASO§: AGGAGTCGTCGGACAGTG (53) / AGGAGTCATCGGACAGTG (51) | 52% (61/118) |
| Intron 15 | 2428-20insCTCTC (P2) | None | ASO§: CCTCCAGGCTCTCTCATCC (58) / GGCTCTCTCCTCTCATCC (53) | 88% (151/172) |
| 16 | 2429C/T | A/V810 | HphI‡ (P2) | 82% (106/130) |
| 16 | 2534C/T | S/L845 | ASO§: AGGGCCTTCGGCTGC (47) / AGGGCCTTTGGCTGC (45) | 1% (1/122) |
*Percentage and number of P2 alleles among the total number of alleles studied in unrelated individuals, primarily Caucasians.
†These polymorphisms were originally reported in ref. 14.
‡Enzymes used to detect polymorphic sequences are listed. (P1) indicates that the enzyme cuts the first allele (e.g., 1316C), not the second allele (P2) (e.g., 1316T); (P2) indicates the reverse.
§ASO sequences (5′ to 3′) are listed with the temperature (°C) used to wash membranes. The first oligonucleotide listed hybridizes to P1 and the second to P2.
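The frequency column of Table 3 is the count of P2 alleles divided by the total number of alleles typed, expressed as a rounded percentage. The short Python sketch below recomputes the column from the counts in the table as a consistency check; the dictionary keys simply reuse the table's nucleotide labels, and `p2_percent` is a hypothetical helper name.

```python
# (P2 alleles, total alleles) as listed in Table 3.
TABLE3_COUNTS = {
    "1316C/T": (1, 104),
    "1347T/C": (92, 120),
    "1552G/A": (6, 198),
    "1611G/A": (61, 118),
    "2428-20insCTCTC": (151, 172),
    "2429C/T": (106, 130),
    "2534C/T": (1, 122),
}


def p2_percent(p2_alleles: int, total_alleles: int) -> int:
    """P2 allele frequency as a whole-number percentage.

    round() rounds halves to even; no Table 3 value falls exactly on .5.
    """
    return round(100 * p2_alleles / total_alleles)


for site, (p2, total) in TABLE3_COUNTS.items():
    print(f"{site}: {p2_percent(p2, total)}% ({p2}/{total})")
```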
DISCUSSION
The emerging picture is that TCS is caused by a wide spectrum of private mutations in TCOF1, which has immediate implications for genetic counseling. Since no common mutation has been found to date, mutation data will most likely need to be established for each individual family. This will be facilitated now that the complete genomic structure of the TCOF1 coding region has been established. However, the possibility still exists that TCS can be caused by more than one gene, because recombination in affected individuals has been reported to preclude TCOF1 as the disease-causing gene (10, 14). Although the majority of families appeared to be linked to this region of chromosome 5, heterogeneity would not be detected if two such disease-causing genes were near each other.
It is interesting that so many polymorphisms have been detected in TCOF1. The expected rate is approximately one per 1–3 kb of coding sequence (22), yet thus far seven independent polymorphisms, about half of them with coding changes, have been detected in only a portion of the coding region (exons 4–16, 2,325 bp). This frequency approaches that found in noncoding sequence (22). Redundancy in the structure may reflect redundancy in the function of the repeated motifs in the protein, so these variations may be better tolerated and selection against them may be less strict. However, this does not fully explain why some 60% of TCS cases appear to arise de novo. The mechanism driving such a high rate of independent de novo mutations is not yet clear; one study suggested a slight increase in paternal age in sporadic TCS (2). It will be interesting to investigate whether mutations in TCOF1 can be correlated with parental age or with male versus female germ-line mosaicism, or whether TCOF1 represents a mutational hot spot.
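The comparison with the expected polymorphism rate can be made explicit. The Python sketch below uses only the numbers in the preceding paragraph (seven polymorphisms in 2,325 bp; an expected rate of one per 1–3 kb of coding sequence) to compute the observed spacing and how many-fold it exceeds the expected density; `bp_per_variant` is a hypothetical helper name.

```python
def bp_per_variant(n_variants: int, region_bp: int) -> float:
    """Average number of base pairs per observed variant."""
    return region_bp / n_variants


observed = bp_per_variant(7, 2325)   # about one variant per 332 bp
fold_vs_1kb = 1000 / observed        # relative to one per 1 kb
fold_vs_3kb = 3000 / observed        # relative to one per 3 kb
print(f"one per {observed:.0f} bp; {fold_vs_1kb:.1f}x to {fold_vs_3kb:.1f}x expected")
# one per 332 bp; 3.0x to 9.0x expected
```

On these figures, the screened region carries roughly 3- to 9-fold more variants than expected for coding sequence.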
The intriguing sequence and structure of TCOF1 suggest its possible function. First, it contains several potential nuclear localization signals. Upon closer examination, TCOF1 also contains two regions, at amino acids 1,362–1,365 and 1,370–1,385, that resemble previously identified nucleolar localization signals (19, 23–25). In those studies, two short basic regions separated by 1–4 amino acids, at least one of which was acidic, were sufficient to target a fusion protein to the nucleolus of COS7 cells. Second, TCOF1 contains numerous casein kinase phosphorylation sites, suggesting a possible regulatory mechanism. Casein kinase is a serine/threonine kinase that requires acidic residues (“determinants”) on the carboxyl-terminal side of the phosphorylated residue; its activity is enhanced by acidic residues on the amino-terminal side as well. It phosphorylates serine in preference to threonine, and aspartate serves as a better determinant than glutamate (18). The acidic residue at position +3 is especially critical. Phosphoserine can replace glutamate or aspartate as a determinant. Thus, phosphorylation of one serine can enhance phosphorylation of an upstream serine, and so on, leading to an overall cooperative mechanism of phosphorylation (26). Third, TCOF1 bears a particularly striking resemblance to the Nopp (“nucleolar phosphoprotein”) family of proteins identified in human (27), rat (21), and Xenopus (28). The Nopp proteins, in turn, resemble other proteins, including the yeast proteins NSR1 (29, 30) and NPI46 (31) and human nucleolin (32). Each of these proteins contains repeated domains with potential sites of casein kinase phosphorylation, and each has been localized to the nucleolus. Primarily by virtue of their localization, these proteins have been proposed to function in some aspect of ribosome assembly.
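The cooperative mechanism described above can be pictured as an iteration to a fixed point. The Python sketch below is a deliberately simplified toy model, an assumption rather than a kinetic description of casein kinase: a serine or threonine is marked as phosphorylated when the residue at position +3 is aspartate, glutamate, or an already phosphorylated serine, and the scan repeats until nothing changes. It ignores amino-terminal determinants and the serine-over-threonine and aspartate-over-glutamate preferences, and `hierarchical_sites` is a hypothetical name.

```python
from __future__ import annotations


def hierarchical_sites(protein: str) -> list[int]:
    """1-based positions of S/T residues phosphorylated in a toy +3-determinant model."""
    seq = protein.upper()
    phospho: set[int] = set()
    changed = True
    while changed:  # repeat until no new site is added (fixed point)
        changed = False
        for i, residue in enumerate(seq):
            if residue not in "ST" or i in phospho:
                continue
            j = i + 3  # the critical +3 determinant position
            if j < len(seq) and (seq[j] in "DE" or (seq[j] == "S" and j in phospho)):
                phospho.add(i)
                changed = True
    return sorted(i + 1 for i in phospho)


print(hierarchical_sites("SAASAASAAD"))  # [1, 4, 7]
print(hierarchical_sites("SAASAASAAA"))  # []
```

In the first toy peptide only the serine at position 7 has an acidic +3 determinant; once it is phosphorylated it serves as the determinant for position 4, which in turn primes position 1. Without the aspartate, no site qualifies.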
In the case of rat Nopp140, perhaps the best studied of these proteins, the protein has been visualized shuttling between the cytoplasm and the dense fibrillar component of the nucleolus along fibrillar tracks. Moreover, it binds peptides containing nuclear localization signals in vitro. It has therefore been proposed that this protein may function as a chaperone, shuttling proteins, presumably ribosomal proteins or preribosomes, into and/or out of the nucleolus (21).
With the elucidation of the genomic structure of TCOF1 it is clear that the repeated structure is a reflection of the exon structure of the gene, suggesting that it arose evolutionarily by a gene duplication mechanism. These repeated regions are rich in serine, lysine, alanine, glutamic acid, and proline; in rat Nopp140 these are also the most abundant amino acids and are organized into repeating acidic and basic regions. Serine residues found within these repeats in rat Nopp140 are heavily phosphorylated by casein kinase in vivo, and it is this form of the protein, and not the unphosphorylated form, which binds nuclear localization signals in vitro, implying that phosphorylation could be a means of regulating the activity of this protein. Perhaps the TCOF1 protein is subject to the same regulation.
The mutations we and others (14, 15) have detected in TCOF1 would cause either decreased mRNA levels or truncated proteins, and thus presumably either haploinsufficiency or a nonfunctional protein lacking the nuclear localization signals required for its shuttling capacity. How could haploinsufficiency of a widely expressed protein, which perhaps functions in ribosome assembly, cause the specific features of TCS? One possibility is that TCOF1 functions at a critical rate-limiting step during development, when high levels of translational activity are essential. For example, Minute strains in Drosophila melanogaster harbor mutations in different components of the translational machinery. These mutations are lethal when homozygous, yet heterozygotes display a common structural phenotype of small bristles and small body size (33). Alternatively, deficiency of a TCOF1-shuttled protein involved specifically in craniofacial embryogenesis may lead to the restricted pattern of anomalies seen in TCS. Another open question is why TCS displays such a variable phenotype. Perhaps other cellular proteins modify or partially replace the activity of TCOF1. Although a loss of function is likely, a dominant negative mechanism cannot be ruled out and could also explain the variability in expression (34). These speculations await cellular localization and functional studies of the TCOF1 protein.
Acknowledgments
We thank Roxann Ashworth, A. F. Scott, and T. D. Howard for generous assistance. This work was supported by National Institutes of Health Grants DE10180 and DE11131 (E.W.J.), Mental Research Center Grant HD24061, Outpatient General Clinical Research Center Grant RR00722, and Pediatric Clinical Research Center Grant RR00052.
ABBREVIATIONS
- TCS: Treacher Collins Syndrome
- GSP: gene-specific primer
- ASO: allele-specific oligonucleotide
References
- 1. Gorlin R J, Cohen M M, Levin L S. Syndromes of the Head and Neck. Oxford: Oxford Univ. Press; 1990.
- 2. Jones K L, Smith D W, Harvey M A S, Hall B D, Quan L. J Pediatr. 1975;86:84–88. doi: 10.1016/s0022-3476(75)80709-8.
- 3. Sulik K K, Johnston M C, Smiley S J, Speight H S, Jarvis B E. Am J Med Genet. 1987;27:359–372. doi: 10.1002/ajmg.1320270214.
- 4. Poswillo D. Br J Oral Surg. 1975;13:1–26. doi: 10.1016/0007-117x(75)90019-0.
- 5. Wiley M J, Cauwenbergs P, Taylor I M. Acta Anat. 1983;116:180–192. doi: 10.1159/000145741.
- 6. Dixon M J, Read A P, Donnai D, Colley A, Dixon J, Williamson R. Am J Hum Genet. 1991;49:17–22.
- 7. Jabs E W, Li X, Coss C A, Taylor E W, Meyers D A, Weber J L. Genomics. 1991;11:193–198.
- 8. Dixon M J, Dixon J, Raskova D, Le Beau M M, Williamson R, Klinger K, Landes G M. Hum Mol Genet. 1992;1:249–253. doi: 10.1093/hmg/1.4.249.
- 9. Jabs E W, Li X, Lovett M, Yamaoka L H, Taylor E, Speer M C, Coss C, Cadle R, Hall B, Brown K, Kidd K K, Dolganov G, Polymeropoulos M H, Meyers D A. Genomics. 1993;18:7–13. doi: 10.1006/geno.1993.1420.
- 10. Dixon M J, Dixon J, Houseal T, Bhatt M, Ward D C, Klinger K, Landes G M. Am J Hum Genet. 1993;52:907–914.
- 11. Li X, Wise C A, Le Paslier D, Hawkins A L, Griffin C A, Pittler S J, Lovett M, Jabs E W. Genomics. 1994;19:470–477. doi: 10.1006/geno.1994.1096.
- 12. Dixon J, Gladwin A J, Loftus S K, Riley J, Perveen R, Wasmuth J J, Anand R, Dixon J J. Am J Hum Genet. 1994;55:372–378.
- 13. Loftus S K, Dixon J, Koprivnikar K, Dixon J J, Wasmuth J J. Genome Res. 1996;6:26–34. doi: 10.1101/gr.6.1.26.
- 14. The Treacher Collins Syndrome Collaborative Group. Nat Genet. 1996;12:130–136. doi: 10.1038/ng0296-130.
- 15. Gladwin A J, Dixon J, Loftus S K, Edwards S, Wasmuth J J, Hennekam R C M, Dixon M J. Hum Mol Genet. 1996;5:1533–1538. doi: 10.1093/hmg/5.10.1533.
- 16. Altschul S, Gish W, Miller W, Myers E, Lipman D. J Mol Biol. 1990;215:403–410. doi: 10.1016/S0022-2836(05)80360-2.
- 17. Longmire J L, Brown N C, Meincke L J, Campbell M L, Albright K L, Fawcett J J, Campbell E W, Moyzis R K, Hildebrand C E, Evans G A, Deaven L L. Genet Anal Tech Appl. 1993;10:69–76. doi: 10.1016/1050-3862(93)90037-j.
- 18. Kuenzel E A, Mulligan J A, Sommercorn J, Krebs E G. J Biol Chem. 1987;262:9136–9140.
- 19. Dang C V, Lee W M F. J Biol Chem. 1989;264:18019–18023.
- 20. Chelsky D, Ralph R, Jonak G. Mol Cell Biol. 1989;9:2487–2492. doi: 10.1128/mcb.9.6.2487.
- 21. Meier U T, Blobel G. Cell. 1992;70:127–138. doi: 10.1016/0092-8674(92)90539-o.
- 22. Bowcock A M, Cavalli-Sforza L L. Genomics. 1991;11:491–498. doi: 10.1016/0888-7543(91)90170-j.
- 23. Garcia J A, Harrick D, Pearson L, Mitsuyasu R, Gaynor R B. EMBO J. 1988;7:3143–3147. doi: 10.1002/j.1460-2075.1988.tb03181.x.
- 24. Dang C V, Lee W M F. Mol Cell Biol. 1988;8:4048–4054. doi: 10.1128/mcb.8.10.4048.
- 25. Hunt C, Morimoto R I. Proc Natl Acad Sci USA. 1985;82:6455–6459. doi: 10.1073/pnas.82.19.6455.
- 26. Meggio F, Pinna L A. Biochim Biophys Acta. 1988;971:227–231. doi: 10.1016/0167-4889(88)90196-6.
- 27. Pai C-Y, Chen H-K, Sheu H-L, Yeh N-H. J Cell Sci. 1995;108:1911–1920. doi: 10.1242/jcs.108.5.1911.
- 28. Cairns C, McStay B. J Cell Sci. 1995;108:3339–3347. doi: 10.1242/jcs.108.10.3339.
- 29. Lee W-C, Xue Z, Melese T. J Cell Biol. 1991;113:1–12. doi: 10.1083/jcb.113.1.1.
- 30. Lee W-C, Zabetakis D, Melese T. Mol Cell Biol. 1992;12:3865–3871. doi: 10.1128/mcb.12.9.3865.
- 31. Shan X, Xue Z, Melese T. J Cell Biol. 1994;126:853–862. doi: 10.1083/jcb.126.4.853.
- 32. Srivastava M, McBride O W, Fleming P J, Pollard H B, Burns A L. J Biol Chem. 1990;265:14922–14931.
- 33. Boring L F, Sinervo B, Schubiger G. Dev Biol. 1989;132:343–354. doi: 10.1016/0012-1606(89)90231-5.
- 34. Kern S E, Pietenpol J A, Thiagalingam S, Seymour A, Kinzler K W, Vogelstein B. Science. 1992;256:827–830. doi: 10.1126/science.1589764.