Abstract
We originally showed that the protocadherin 15 gene (Pcdh15) is necessary for hearing and balance function; mutations in Pcdh15 affect hair cell development in Ames waltzer (av) mice. Here we extend that study to better understand how Pcdh15 operates in a cell. The original report identified 33 exons in Pcdh15, with exon 1 being noncoding; additional exons of Pcdh15 have since been reported. The 33 exons of Pcdh15 described originally are embedded in 409 kb of mouse genomic sequence, whereas the corresponding exons of human PCDH15 are spread over 980 kb of genomic DNA; the exons in Pcdh15/PCDH15 range in size from 9 to ∼2000 bp. The genomic organization of Pcdh15/PCDH15 bears similarity to that of Cadherin 23 but differs significantly from that of other protocadherin genes, such as Pcdhα, β, or γ. A CpG island is located ∼2900 bp upstream of the PCDH15 transcriptional start site. The Pcdh15/PCDH15 promoter lacks TATAA or CAAT sequences within 100 bases upstream of the transcription start site; deletion mapping showed that Pcdh15 harbors suppressor and enhancer elements. Preliminary searches for alternatively spliced transcripts of Pcdh15 identified novel splice variants not reported previously. Results from our study show that both mouse and human protocadherin 15 genes have complex genomic structures and transcription control mechanisms.
Keywords: PCDH15, genomic structure, promoter elements, alternative splicing
Introduction
Protocadherins are non-classic cadherins that are structurally and functionally divergent from classic cadherins [1; 2]. Mutation of mouse protocadherin 15 (Pcdh15) affects normal development of sensory hair cells in the inner ear, causing deafness and balance problems in av mice [3]. Mutations in its human ortholog, PCDH15, cause Usher syndrome 1F (USH1F), a form of sensory impairment in which deafness and blindness cosegregate [4; 5]. Mutations in PCDH15 are also associated with nonsyndromic deafness [6]. Elucidation of the cis-acting promoter elements, alternative splicing and genomic structure of Pcdh15 would help us better understand how protocadherins function in the inner ear and eye of mammals.
Promoters play a pivotal role in gene regulation. The murine Pcdh15 is expressed in hair cells and other epithelial cells and it is important to identify DNA sequence elements that control expression of Pcdh15/PCDH15. For example, identification of cis-acting elements of Pcdh15 would be valuable for the generation of conditional knockouts in mice and identification of cis-acting elements of PCDH15 would be a useful tool for gene therapy. Though mouse and human genome sequences are available, there are several reasons why dedicated analysis is required. First, unlike an exon, a promoter is an ill-defined unit. Typically, a sequence upstream of the transcription start site, including the start site, is considered the putative promoter. Second, in general, the putative promoter does not share extensive homology even with functionally correlated genes. The latter prevents detection by commonly used search programs such as BLAST or FASTA. Therefore, we applied a combination of in silico prediction tools coupled with in vitro verification protocols to identify promoter elements of Pcdh15. Results of this effort are presented here.
Alternative splicing regulates gene expression in a different way and it is a significant component of the complexity of the mammalian genome. The Cadherin 23 (CDH23) gene is associated with USH1D and DFNB12 (nonsyndromic autosomal recessive deafness) [7; 8]; mutations in the murine Cdh23 are associated with the deaf-circling mouse mutation, waltzer [9]. It was reported that different spliced variants were expressed in the ear and the eye, leading to different protein-protein interactions between CDH23 and other proteins [10]. In separate reports, it was shown that mutations in the Harmonin gene caused Usher syndrome 1C [11], while a mutation in alternatively spliced exon D of the Harmonin gene is linked to non-syndromic deafness [12]. Similarly, identification of spliced products may help to explain certain cases of non-syndromic mutations in PCDH15 and may also help explain how the gene mediates its function at the protein level.
Various members of the Protocadherin family are also known to express alternatively spliced gene transcripts [13]. The first evidence for alternative splicing in Pcdh15 came from Northern blot analysis, which demonstrated three large transcripts (10-12 kb) [4]. Isoforms consisting mostly of the cytoplasmic domain (CP) [6] and the extracellular domain (EC) [14] have also been described, and most recently, several other spliced products of Pcdh15 have been reported [15]. Here we report novel alternatively spliced forms of Pcdh15. Though the functional relevance of these isoforms needs to be determined in separate experiments, it is clear that the mechanism regulating Pcdh15/PCDH15 expression is complex.
Protocadherins are distinct from classical cadherins, the other subfamily of cadherins. Classical cadherins are defined by having five cadherin repeats in the EC domain, a single transmembrane domain, and a conserved cytoplasmic (CP) domain which interacts with β-catenin [1; 2]. While protocadherins share the cadherin repeat in the EC domain, they have a higher number of repeats. Additionally, protocadherins have greater variability in the CP domain, showing no homology to other members of the cadherin superfamily. The overall 3D structure of the cadherin repeat motif is quite similar to the immunoglobulin motif and it has been speculated that there may be some functional similarity between the two families [17; 18; 19]. Like immunoglobulins, protocadherins may play various roles within multicellular organisms. In this report, we compare the genomic structure and exon-to-domain organization of Pcdh15 to those of various members of the cadherin family. Our report shows that the genomic structure of Pcdh15/PCDH15 is unique among cadherins. The characterization of Pcdh15/PCDH15 described in this report is useful for investigations involving the regulation, function or mutational analysis of these genes.
Materials and Methods
Human and mouse protocadherin genes referenced in this report, and the numbering of the exons in Pcdh15 are based on the original cDNA sequence submitted to GenBank (PCDH15 cDNA, accession no. AY029205; mouse Pcdh15, AF281899) [3; 4]. Additional exons of Pcdh15 reported [6; 15] since the original publication of Pcdh15 [3] have not been included in this study.
Determination of genomic sequence and alignment
Portions of the PCDH15 cDNA sequence were compared with sequences catalogued by the Human Genome Project using the BLAST search engine [20]. Matching sequences with a high level of agreement were aligned using the Sequencher program (GeneCodes, Michigan).
The presence of AG/GT acceptor/donor splice sites coinciding with abrupt changes from matching bases to mismatched bases was used in determining intron/exon boundaries. Because of the high homology between the human and mouse sequences, close agreement in the intron/exon boundaries was inferred and then confirmed by using the SSAHA search engine to find matching trace sequences (http://trace.ensembl.org). The boundary was assumed to be correct if an AG/GT site was found that coincided with the sudden occurrence of mismatching regions. Sequence analysis of putative exons and flanking splice junctions compared to the sequence of RT-PCR products confirmed the boundaries.
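The AG/GT criterion described above can be illustrated with a minimal sketch. The function names are hypothetical and are not part of the original analysis, which relied on sequence alignment and trace searches; the sketch only checks that an upstream intronic flank ends in the AG acceptor dinucleotide and that a downstream flank begins with the GT donor dinucleotide. The example flanks are those reported for human exon 2 in Table 1.

```python
def ends_with_acceptor(upstream_flank: str) -> bool:
    """True if the intronic sequence preceding an exon ends in the AG acceptor dinucleotide."""
    return upstream_flank.strip().lower().endswith("ag")


def starts_with_donor(downstream_flank: str) -> bool:
    """True if the intronic sequence following an exon starts with the GT donor dinucleotide."""
    return downstream_flank.strip().lower().startswith("gt")


def boundary_is_canonical(upstream_flank: str, downstream_flank: str) -> bool:
    """Both flanks of a candidate exon carry the canonical AG/GT splice signals."""
    return ends_with_acceptor(upstream_flank) and starts_with_donor(downstream_flank)


# Intronic flanks of human exon 2 as listed in Table 1.
acceptor_flank = "aatgtttctttctaccttttcttatttcag"
donor_flank = "gtaaggttgcttttagcttgacctttagag"
print(boundary_is_canonical(acceptor_flank, donor_flank))
```

A check of this kind only screens candidates; in the study, boundaries were accepted after comparison with the sequence of RT-PCR products.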
Determination of potential CpG island
The following criteria were used to identify a potential CpG island in the region from 135 kb upstream to 90 kb downstream of exon 1: an observed/expected ratio of CpG >0.60, where the expected number of CpG per 200 nt was calculated by dividing the product of the number of C nt and the number of G nt by the overall number of nt in the window; a percent C + percent G >40.00%; and a window length >200 nt. The CpG Plot/CpG Report programs of the EMBOSS suite were used for the prediction (http://www.ebi.ac.uk/Tools/).
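The thresholds above can be written out as a short calculation. This is a minimal sketch of the stated criteria only, not a reimplementation of the EMBOSS programs (which scan with sliding windows and have their own parameters); the function names and the synthetic test sequence are illustrative assumptions.

```python
def cpg_window_stats(window: str) -> dict:
    """Compute the observed/expected CpG ratio and percent C + G for one window."""
    seq = window.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    observed_cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    # Expected CpG = (number of C * number of G) / window length, as described in the text.
    expected_cpg = (c_count * g_count) / n if n else 0.0
    obs_exp = observed_cpg / expected_cpg if expected_cpg else 0.0
    gc_percent = 100.0 * (c_count + g_count) / n if n else 0.0
    return {"length": n, "obs_exp": obs_exp, "gc_percent": gc_percent}


def passes_cpg_criteria(window: str, min_ratio: float = 0.60,
                        min_gc_percent: float = 40.0, min_length: int = 200) -> bool:
    """Apply the three thresholds stated in the Methods to a single window."""
    stats = cpg_window_stats(window)
    return (stats["length"] > min_length
            and stats["obs_exp"] > min_ratio
            and stats["gc_percent"] > min_gc_percent)


# Synthetic sequence for illustration only (not PCDH15 sequence).
synthetic = "CG" * 150
print(cpg_window_stats(synthetic), passes_cpg_criteria(synthetic))
```

Keeping the statistics and the threshold test in separate functions makes it easy to report the ratio and GC content for windows that fail one criterion.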
Prediction of exon-flanking primers
The exon-flanking primers were predicted using the Primer 3 program (http://www-genome.wi.mit.edu/cgibin/primer/primer3_www.cgi). Primers for exon 33 were designed so that the PCR products would overlap and cover the entire exon.
Determination of alternatively spliced products
RNA isolated from adult wild-type mouse brain, eye, or neonate mouse cochlea was used in reverse transcription reactions with Superscript reverse transcriptase (Gibco BRL) following the manufacturer's protocol. In general, PCR conditions were: 94°C for 2 min, followed by 30 cycles of 94°C for 30 s, 55°C for 30 s and 72°C for 1 min. The annealing temperature was adjusted based on the Tm of the primers. We designed primers complementary to Pcdh15 cDNA spaced approximately 0.5 kb, 1 kb, 1.5 kb and 2 kb apart to scan the entirety of the genes. The amplified cDNA was analyzed by agarose gel electrophoresis. For mouse Pcdh15 only, when multiple products or products of unexpected size resulted, these products were sequenced. Sequencing was done in both directions using the ABI 373A Dye terminator cycle sequencing system (Perkin Elmer) with the primers used for PCR. Sequences were compared with the Pcdh15 sequence in GenBank.
Promoter analysis
The 10 kb segment upstream of exon 1 in Pcdh15 was run through several promoter prediction programs (Signal Scan, Promoter 2.0, ProScan, BLAST, MAR Finder, Promoter Inspector); none predicted any traditional promoter elements, such as a TATAA box or CAAT boxes. We then used PCR to create a series of truncated versions of the putative promoter region from the 10 kb segment upstream of exon 1. These truncated segments, which included exon 1, were fused with a lacZ reporter gene, and the resulting plasmids were transfected into mouse 3T3-L1 fibroblast cells and assayed for β-galactosidase activity. Thirteen different plasmids (lettered A-M) were created using progressively smaller fragments from the 10 kb segment (Figure 6A).
Figure 6.
A. Diagram showing the various fragments used for transfection and reporter assays. The encompassed region includes exon 1 and 10 kb upstream. The plasmids (pBlue TOPO® vector, Invitrogen; lettered A-M) contain a lacZ reporter gene downstream of the fragments shown. B. Results of the transfection and reporter assays of the fragments shown in panel A. Numerical data were normalized by using an in situ plate assay to determine the transfection efficiency. The plasmid alone showed essentially the same activity as plasmids G through M (data not shown). C. The 35 kb upstream of exon 1 of mouse Pcdh15 and human PCDH15 was analyzed for regulatory elements with Signal Scan, Promoter 2.0, ProScan, BLAST, MAR Finder, and Promoter Inspector. Rat PCDH15 is also shown for comparison. Areas of high homology exist between the rat and mouse. The mouse gene contains many potential regulatory sequences not found in the human gene. No TATA box or CAAT box sequences were found for either species.
Results
Intron/exon boundaries and intron sizes
We determined the intron/exon boundaries for human PCDH15 and identified 33 exons that ranged in size from 9 to 2069 base pairs (bp) (Table 1). The exon structure of mouse Pcdh15 is similar to that of the human gene (Table 2). Intron sizes vary substantially, ranging from 1016 to 144,560 bp (Figure 1A and 1B). To determine the size of both PCDH15 and Pcdh15 we used the NCBI map viewer (http://www.ncbi.nlm.nih.gov/mapview/). The human protocadherin 15 gene, including the intronic sequences, spans approximately 980,000 bp. The mouse homolog is less than half as long, at just over 409,000 bp. Exon-flanking sequences have the potential to be of significant value in mutation detection in humans and will make it possible to evaluate the contribution of PCDH15 to hereditary deafness, vestibular dysfunction and retinitis pigmentosa (Tables 1 and 2).
Table 1.
Exon-intron boundaries of the human PCDH15 gene
Exon | 3′ Acceptor sequence | 5′ Exon | Size(bp) | 3′ Exon | 5′ Donor sequence | Intron (bp) |
---|---|---|---|---|---|---|
1a | cttcagtgccagtgaaatcaacgtgctgag | CTAATA......... | 355 | .........CCGCAA | gtaagttcttcccccgcccttctcttattt | 144560 |
2 | aatgtttctttctaccttttcttatttcag | ATCAGC......... | 119b | .........ATGATG | gtaaggttgcttttagcttgacctttagag | 138731d |
3 | ttattacttttggttattttctctctttag | ATTGCA......... | 66 | .........GGAATG | gtaagtgagttttttattttacaaatgtct | 122041d |
4 | tgttttcaagtacctttctttggtttgcag | GTACAA......... | 161 | .........AGAGAT | gttagtgattttttttctttgcccctgtaa | 9707 |
5 | attttaaaagatgcaatttctacattgcag | CCACCG......... | 156 | .........AATGAG | gtcagtttctatttactctctctctgtcct | 22635 |
6 | attgatatttaaaatcttttttcctttcag | CTCACT......... | 120 | .........GATCCG | gtatgtaaatacctatttgcaatgtaattt | 16658 |
7 | atctgaagtctgtatctttaatatttgcag | ACATCC......... | 111 | .........GCTAAT | gtgagtattcaatttcttttaaatttcttc | 12154 |
8 | ctacctgttgtttgtttttctccctaacag | GACCGT......... | 171 | .........ACTCCG | gtaaatatactcttatctttctccatttat | 80339 |
9 | tgaatgaaactctcttaatttttttttcag | GAAGAA......... | 109 | .........TTGTTG | gtttgtatctgataacattttacactatca | 22774 |
10 | attatgtactgaaatctcaatgtcttccag | GGACTC......... | 113 | .........ATTAAG | gtaaattaagctaatatcttatttccttgt | 18046 |
11 | ctaaatattttcttttcttttcaattttag | GCTGAA......... | 207 | .........GAAGAT | gtaagttaaatacatattttgctcttgttt | 10414 |
12 | cgtttgaaacgatgtcattttttttcaaag | ACAAAA......... | 135 | .........TTTTCG | gtaaagagcttttgtgtaaatatacatatg | 1540 |
13 | tctactaagtgtatttttctcctcccaaag | ATAACA......... | 150 | .........ATACAG | gtaactactcacataatataagtgtacact | 47374 |
14 | gaggatttatgttgtgattttcaattccag | CTCACT......... | 194 | .........GCGAAG | gtaatttatattagaaccaatgcagtcttt | 20092 |
15 | accatcttcttttcctctctttctctgcag | GAACTC......... | 133 | .........CTACAG | gtatgttccttggtgtgtgtgtgtgtgtgt | 42811 |
16 | tgataattgaattatgttttctctctgaag | GCAACT......... | 80 | .........AGAAAC | gtaagttaaaaatgatttcctactttaaaa | 10559 |
17 | gtcttatttgtttgtttgttttgtcactag | CACGGG......... | 94 | .........GATGGG | gtgagtatgctccctgtgaaggggttttaa | 12445 |
18 | tcacatttaatgcttgtgtttttctttaag | ACCTCA......... | 129 | .........GTAAAA | gtaagtggaatatttatatagatcttatat | 43566 |
19 | atttttttgtttttttgttttatttaatag | GCAACA......... | 306 | .........ATAGAG | gtgtgtgggaaaaagcatttattacatgaa | 2475 |
20 | ctaataaaactattcttctgcttattacag | GCCAAA......... | 225 | .........GTAAAG | gtaagggctagagatgacttctggttgcct | 24354 |
21 | aagtactaaactcctaattgtgttttacag | GATATG......... | 117 | .........CCTCCT | gtaagtagaaggcattgattaatattctga | 33756 |
22 | gcaaattaataattaataattttattttag | GGATTA......... | 141 | .........TTTAAG | gtcagtgtctcttttcattatgtggtctat | 1907 |
23 | ctttaacatgaatttttattgctttttaag | TTGGTG......... | 113 | .........ATATAG | gtattgatttctgtccgattttgaaataca | 18756 |
24 | ataaatacatcatcttctctaaacctgcag | ACCTCC......... | 112 | .........AAGAAG | gtgagtagacttttagtttgtgttctcacc | 1910 |
25 | atgaccaggtaattgtattttaatttctag | ATACAT......... | 139 | .........CAAAAA | gtaagttatactacttaaagcttttaccag | 35444 |
26 | ataaaaaatgcatcgtttcttgtcctccag | GCAATA......... | 128 | .........GTGAAG | gtatgcggaattatgcttaatatacagaac | 36378d |
27 | ataatctctttttttttcccctccgtctag | GCTACT......... | 216 | .........GTACTC | gtaagtagataaaacttcaggaaagggaag | 9378 |
28 | ttaaatgaattctgttttttcttctttcag | GTCTCC......... | 86 | .........TACAGA | gtaattgattttgtctccttatatatttat | 16678 |
29 | tgtcatgtttgtactttgtttgatttatag | GATCTT......... | 177 | .........TTTTAA | gtaagtgctttttgttatctgtaacagctc | 8788 |
30 | attaattttactatttatctaacaataaag | ATTTTT......... | 219 | .........CAGACA | gtaagtattcaaggatggtaaacctggata | 2741 |
31 | tatttttcttactcttgctttcccaaacag | GTTT> | 9 | <AAAGT | gtaagtataattaacattttacacatcctt | 1016 |
32 | gttttacaagccatccttactttatccaag | ACGTCA......... | 156 | .........CTCAAT | gtaagtagaccgttcaggaactctgagaag | 4034 |
33 | ttacatttcattatcttcttttctttcaag | TCTTTT......... | 2069c | |||
Total | | | 6816 | | | 974061 |
a. Exon 1 is untranslated.
b. Exon 2 has a 5′-untranslated region of 28 bp followed by a 91 bp coding region.
c. Exon 33 has a 1501 bp coding region followed by a 568 bp 3′-untranslated region.
d. These intron sizes could not be counted exactly, but could be accurately estimated from contig NT_024078.5.
Table 2.
Exon-intron organization of the mouse Pcdh15 gene
Exon # | 3′ Acceptor sequence | 5′ Exon | Exon sz. (bp) | 3′ Exon | 5′ Donor sequence |
---|---|---|---|---|---|
1a | ggagcgctaattgcttttcagccagctgat | CAACGT......... | 377 | .........TTAAAG | gcaagtaagttccttgctcccagcttttgc |
2 | aatgtgctttcctgcttcctgttatttcag | ATCAGC......... | 119c | .........ACGATG | gtaagtccacctttactctgatctctgcag |
3 | aattttaacctttggttatcttttctctag | ATTGCA......... | 66 | .........GAAACG | gtgagtgtgttctgtatttcccagtatttt |
4 | tggttacaaatagctgtgtttggtttgcag | GTACAA......... | 161 | .........AGAGAC | gtcagtagctttatttttctctgaccctgt |
5 | tgtctaaaacgatgtcatttcttcccgcag | CCACCA......... | 156 | .........AATGAG | gtctgtgatcctgcgctgatatgtgtgttc |
6 | cagacagtttctctttcatgttctttccag | CTCACT......... | 120 | .........GATCCG | gtaagtactcagggtggtaattgttgtgag |
7 | gcctgacttgtgtctacttaacacttgcag | ACATCC......... | 111 | .........GCAAAT | gtgagtgcttgccttctcttaccttctgtt |
8b | tccgttaccctcatcag | GACCGT......... | 171 | .........ACTCCG | gtaagtacaatttcttcttctttctttcct |
9 | cacagctaaacactctgaattttcttccag | GAAGAA......... | 109 | .........TTGTCG | gtttgtatttgccaaaattttatcctgtta |
10 | cagttctgaactgagtctcttgtctcacag | GCACCC......... | 113 | .........ATTAAG | gtaaattaaactaaaatcttatttcctttt |
11 | atcatagctaatttttctttaatttttcag | GCTGAG......... | 207 | .........GAAGAC | gtaagttccatgactatcttagcaattatt |
12 | ttttaagtaatgtatttttttctttcaaag | ACAAAA......... | 135 | .........TTTCTG | gtaaaatatttaatttatgtgtatattagt |
13 | gcctctctactgcgttctgtttctccgcag | ATAACA......... | 150 | .........ATTCAG | gtaaggccactggatggcaggtgtgtttgc |
14 | aagatttgtattgtgccttttcaattccag | CTGACA......... | 194 | .........AAGAAG | gtgactccaccaggccctgtagttctatag |
15 | taaacacccatctctcttctccctgtgtag | GCACTC......... | 133 | .........CTACAG | gtgggcctgggatatggatctgtgtgtctg |
16 | tataattgaattatgttttcttctctgaag | GCAACT......... | 80 | .........AGAAAC | gtaagttacaagggaatgatttatacttga |
17 | ttcttttgttttgtttctttctgcctctag | CACAGG......... | 94 | .........GATGGA | gtaagtgttaccacaggggcggggatgcta |
18 | acactgaatcttctcttctcttcccttaag | ACCTCA......... | 129 | .........GTCCGG | gtaagtggaatcatcgtttagcttatgtgt |
19 | gttgtgcgaacttttgtttcttccatttag | GCAACA......... | 306 | .........ATAGAG | gtgtgtagacaggcattaactccatgacat |
20 | gtaaccatcctcctctatgacttgtcatag | GCCAAG......... | 225 | .........GTGAAG | gtaaggaaccttttctctctatatgccaca |
21 | acagcactaaatctcaattctacttcacag | GACATG......... | 117 | .........CCACCT | gtaagtagagatgctggccaatgctgtggg |
22 | catcccgatcgatcaattatcctttttcag | GGGATG......... | 141 | .........TTCAAG | gtcagtgtgccattttgtttgagacatcta |
23 | atctgtatgcggattctgtcacttttttag | CTGGTG......... | 113 | .........ATACAG | gtatgaattcccatcaggctttaaaacacc |
24 | agaaaatgaacaccccttttaaccctgcag | ACCTCC......... | 112 | .........AGGAAG | gtaggtaacgagttttgcctttgttttctt |
25 | gaatctaatgagttgtatgctggattccag | ACAAGT......... | 139 | .........CAAAAA | gtaagttacattgcttaaaaggttgacagt |
26 | tcaccaactaagatatgtcttgtcttccag | GCAATA......... | 128 | .........GTGAAG | gtatgaggagagaatctttgatgggaaact |
27 | atgatttgtcatgtggttttctcttcgtag | GCCACC......... | 216 | .........GTACTG | gtaagcgcatgtgtgaggggcatgagacac |
28 | aattaaatacattctattttcacttttcag | GTCTCC......... | 86 | .........TACAGA | gtaaaggattttgtttgttaatagattttc |
29 | tgtcatatttatgtgttattttatatttag | GATTTT......... | 177 | .........TTTTAA | gtaagtggtttttattgcacttcacctcca |
30 | ttagtttttcaatctgtcacacaacaaaag | GTTCCT......... | 219 | .........CCGACA | gtgagtatgcagttgctgggtggatattct |
31 | cttcttcttgcacactgctttccaaaacag | GTTT> | 9 | <AAAGT | gtaagtatgattacaaatattttgcacatg |
32 | gtcttacaagacaaccttactctactcaag | ACGCCA......... | 156 | .........CGCAAT | gtgagtaatgagggaggataactgtgtggg |
33 | tcacaacttgtctccttcttctctttcaag | TCTTTT......... | 2154d |
a. Exon 1 is untranslated.
b. Exon 8 did not have any additional high-quality shotgun sequence to extend the 3′ intron sequence.
c. Exon 2 has a 5′-untranslated region of 28 bp followed by a 91 bp coding region.
d. Exon 33 has a 1516 bp coding region followed by a 638 bp 3′-untranslated region.
Figure 1.
Genomic structure of human PCDH15. A. Green boxes represent individual exons 1-33 (drawn proportional to one another, but not to the scale of the diagram). Dotted lines indicate gaps where genomic sequence could not be aligned, but where sizes could be inferred from contig NT_024078.5. B. Scaled-down figure of the PCDH15 gene showing all exons in a single line and their positions along the ∼981 kb of genomic DNA. C. Spliced exons aligned with the predicted amino acid sequence and predicted domains.
By aligning the predicted domains with the amino acid sequence as determined from the nucleotide sequence of the gene we determined the exons responsible for encoding the different domains (Figure 1c). Exons 2 through 28 encode the cadherin repeats, with an average of 2.5 exons per repeat.
CpG island and 5′ UTR analysis
A CpG island 215 bp long was found approximately 2.9 kb upstream of the transcriptional start site. The only other potential upstream CpG islands are at least 60 kb away (Figure 2). Interestingly, an AG acceptor splice site was present immediately upstream of exon 1 of human protocadherin 15 (Table 1), which suggests the existence of another exon upstream of exon 1.
Figure 2.
CpG island: 225 kb of human PCDH15 genomic sequence analyzed for CG content and putative CpG islands. The test thresholds for a reported island were an observed/expected ratio >0.60, a percent C + percent G >40.00, and a length >200 bp. The putative island directly upstream of exon 1 has a length of 215 bp.
Sequence analysis of the mouse Pcdh15 and human PCDH15
Mouse Pcdh15 and human PCDH15 are highly similar in their cDNA sequences and predicted amino acid sequences (Figure 3). In the EC domain, the cDNA sequences are approximately 85% similar at the nucleotide level and 94% similar at the peptide sequence level. Similarity drops significantly in the CP domain with only 53% of the predicted amino acid sequences matching between the two species. Despite this dissimilarity between the amino acid sequences, within the CP domain, the two proline-rich domains are almost identical between the two species, suggesting that these domains are functionally significant.
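As a rough illustration of how a pairwise percentage such as those above can be obtained, the sketch below computes percent identity over two pre-aligned sequences. It is an assumption that the reported values correspond to position-by-position identity; the text does not state which program or scoring scheme was used, and the function name is hypothetical. Columns containing a gap are skipped, which is one common convention among several. The toy input is the first six bases of exon 5 as listed for human (Table 1) and mouse (Table 2).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned, ungapped positions at which two sequences are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (a convention chosen for this sketch)
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# First six bases of exon 5: human (Table 1) versus mouse (Table 2).
human_exon5_start = "CCACCG"
mouse_exon5_start = "CCACCA"
print(round(percent_identity(human_exon5_start, mouse_exon5_start), 1))
```

The same function works on aligned peptide strings, which is how a domain-level comparison (EC versus CP) could be set up once the alignment is split at the domain boundaries.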
Figure 3.
Comparison of similarities in predicted amino acid sequence between mouse and human. TM = transmembrane domain. SS = Signal Sequence
Comparison of the genomic structure of PCDH15 to other protocadherins in the database showed an important feature that may be unique to Pcdh15/PCDH15. Other protocadherin genes, such as those of the PCDHα, PCDHγ and δ protocadherin subfamilies, are characterized as having one large exon that encodes all of the cadherin repeats [19; 21]. PCDH15, in contrast, has 27 exons encoding its 11 cadherin repeats. Comparison to the PCDHα and PCDHγ gene clusters shows this stark difference in genomic organization (Figure 4). Studies of the genomic organization of a number of classic cadherin genes have revealed that the DNA sequences encoding individual extracellular domains are interrupted by two introns [22; 23; 24; 25; 26]. The number of introns per cadherin repeat is not constant in PCDH15 but is typically two or three.
Figure 4.
Comparison of the genomic organization of (A) the Pcdh alpha/gamma clusters, (B) Pcdh15 and (C) Cdh23. In A, the upper orange bars represent the coding genomic regions. The lower bar represents the generic protein structure of members of the Pcdh alpha/gamma clusters. Notice that one large region of DNA codes for all of the cadherin repeats in the cytoplasmic domain (yellow). In B and C, the upper orange and black bars represent coding genomic regions. Exons are the orange hash marks and the black regions represent introns. The lower bars for B and C represent the protein structure for Pcdh15 and Cdh23, respectively. Notice that the cadherin repeats in both Cdh23 and Pcdh15 are coded for by multiple exons.
Determination of Alternatively Spliced Products
RT-PCR analysis with forward primers (94 and/or 108) on exons 1 and 3, and reverse primers (95 and/or 109) on exons 6 and 10, showed one major product and two minor products (Figure 5A). The major product was the expected one, which included all of the exons internal to the primers. The minor products included one transcript which skipped exon 4, and another which skipped exons 2 and 4 (Figure 5B and 5C). Exon 2 codes for amino acids MFLQFAVWKCLPHGILIASLLVVSWGQYDDD, and exon 4 codes for TILVDNMLIKGTAGGPDPTIELSLKDNVDYWVLLDPVKQMLFLNSTGRVLDRD. The second region of alternative splicing was contained completely within exon 33. A portion with amino acids LFLLY…HFEQS is spliced out in a minor product. The major product is, as expected, the entire exon 33 (Figure 5D).
Figure 5.
Analysis of the various transcripts produced by alternative splicing in Pcdh15. A. Alternatively spliced forms of Pcdh15. This schematic (not to scale) represents the exons in the protocadherin 15 gene with the forward and reverse primers (94, 95, 108, 109) used in the PCR reaction. The major and minor products are shown with all included exons attached by solid lines. Note: The numbering of the exons in this figure is based on the original description of Pcdh15 [3; 4]. To use the criteria established by Haywood-Watson et al. [15], one exon should come before the first exon shown and all exon numbers in the figure would increase by one. B. Gel analysis of RT-PCR products. This PCR reaction includes PhiX174 markers on the left and, in lane 1, primer set 94 & 95; in lane 2, primer set 108 & 109; and in lane 3, primer set 94 & 109. Both the major and minor alternatively spliced products are present in this PCR reaction. C. Amino acid sequence of exons 2, 3, 4 and 5 of Pcdh15. D. Various transcripts produced by alternative splicing within exon 33. (Upper) The 3′ end of the protocadherin 15 gene, with the minor expressed protein product shown, which splices out the 33.b segment. (Lower) The segment of exons present in the 3′ region with amino acid sequences included. The boldface amino acid sequence represents the sequence spliced out in the minor product of this gene.
Promoter analysis
In the lacZ reporter assay, the various plasmid constructs displayed both up- and down-regulation of reporter expression driven by Pcdh15 upstream sequence in vitro (Figure 6A, B). Plasmid D (approximately 5.5 kb upstream through exon 1) showed the highest activity. Plasmids E and F contained, in addition to the same 5.5 kb segment as plasmid D, even more of the upstream segment. These plasmids demonstrated much lower expression, indicating the presence of suppressor elements in this region. This was further substantiated by plasmids G-J, which had only upstream elements in the range of 10 kb to 5.5 kb and displayed minimal expression. The smaller plasmids (A-C) had increased expression compared to plasmids E-J but less expression than plasmid D. See Figure 6 for the makeup of all of the constructs as well as their effects on expression.
Discussion
In the original description, PCDH15 consists of 33 exons [3; 4]. Twenty-seven of the exons code for the 11 cadherin repeats in its EC domain and one large exon codes for ∼90% of the cytoplasmic domain, whereas typical protocadherins appear to have one large exon coding for the EC repeats and multiple exons coding for the cytoplasmic domain. A comparison of the genomic structure of mouse Pcdh15 to mouse Cdh23 (Cadherin 23) is noteworthy in this regard. The 27 cadherin repeat domains of mouse Cdh23 are encoded by 59 exons and the CP domain is encoded by 6 exons, one of which is quite large (927 bp) [27]. The genomic structure of mouse Cdh23 shows a pattern similar to mouse Pcdh15 in that both have an EC domain containing multiple cadherin repeats encoded by multiple exons and both have a CP domain of which a major portion is encoded by one exon. The genomic structure of Pcdh15 bears more resemblance to Cdh23 than it does to other members of the protocadherin family (Figure 4). While the significance of the genomic organization of protocadherins remains to be elucidated, a single exon coding for most of the cytoplasmic domain within a gene that undergoes a significant amount of splicing in both the mouse and the human suggests that this stretch of amino acids is functionally significant. The first evidence that more than one transcript is expressed from the Pcdh15 locus came from Northern blot analysis [3; 16]. Recently, several alternatively spliced transcripts of Pcdh15 were reported to be expressed in the mouse inner ear by Friedman and colleagues [15; 16]. Four main classes of alternatively spliced transcripts were defined; two of these transcripts (CD-2 and CD-3) encode new cytoplasmic domains and increase the number of exons associated with the Pcdh15 locus to 39 (these additional exons are not shown in Figure 1; our numbering is based on the original report of 33 exons).
Based on immunolocalization studies of C-terminal isoforms in hair cells, it has been reported that a Pcdh15 C-terminal isoform, labeled CD-3, is a tip-link antigen [16]. It is also reported that the temporal and spatial expression patterns of the transcripts with different cytoplasmic domains vary in developing hair cells compared to mature hair cells; thus it is speculated that different isoforms may have different functions [16]. CD-2 and CD-3 were also reported to be expressed in the human retina [16].
Ahmed et al. [6] describe the protein product of an mRNA transcript that has a novel start site in the middle of exon 21, resulting in an N-terminally truncated protein. Haywood-Watson et al. reported the existence of several new splice isoforms, including mRNA transcripts which skipped exons 3, 4, or 7, and one which is devoid of exons 3 through 15 [15]. In comparing the findings of this report to previous reports [15; 16], minor product B in Figure 5, which skips exons 2 and 4, is consistent with one of the isoforms described by Ahmed et al. (exons 3 and 5 in their numbering); however, minor product A, which skips only exon 4 (exon 5 in Ahmed et al.), and the alternative splicing which occurs internal to exon 33 (exon 34 in Ahmed et al.) are unique. Clearly Pcdh15 is capable of forming multiple alternatively spliced products with fewer cadherin repeats, allowing for the formation of multiple protein isoforms from one gene. All of the observations cited above demonstrate that regulation of protocadherin 15 is complex. Mouse mutants lacking specific splice variants and/or knockdown experiments in other animal models, such as zebrafish, are necessary to prove the biological relevance of splice variants of Pcdh15/PCDH15 to hair cell development and function.
To determine how transcription of Pcdh15 is controlled, both in silico and in vitro analyses were carried out. To identify CpG islands in the 5′ UTR of PCDH15, we used a high stringency standard as recommended by Jones and Takai [30]. Our analysis showed a potential CpG island located ∼2.9 kb upstream of exon 1. There is also a potential CpG island located ∼17 kb downstream of the last exon found under the same conditions (data not shown). Since it has been estimated that ∼50% of mammalian promoters contain CpG islands [31], this finding suggests that we have identified a putative promoter region of Pcdh15. The 5′ UTR region also was searched for promoter regions using PromoterInspector (www.genomatix.de), but no consensus signal for transcription initiation such as a CAAT or TATAA box was present within 100 bases upstream of the transcription start site. However, a recent report by Carninci et al. shows that the classical TATA box promoter architecture represents only a minority of the set of mammalian promoters in mice and humans [32].
To better understand the molecular mechanisms that determine the expression of Pcdh15, deletion mapping of the sequence encompassing the putative promoter region was undertaken to identify any cis-acting elements (Figure 6). A lacZ reporter assay was used because it is more sensitive than other reporter assays (e.g., green fluorescent protein) and is therefore more likely to detect weak expression or small changes in promoter activity. The results were novel for this family of protocadherins and indicate that, in addition to a probable promoter region upstream of Pcdh15, there are likely a number of suppressor sites as well. As previously noted, cells transfected with plasmid D had the highest level of expression. This expression decreased substantially in the presence of any upstream segments beyond the 5.5 kb upstream sequence that comprised plasmid D, indicating probable suppressor elements in that region. Their existence was further substantiated by plasmids G-J, which contained only the putative upstream suppressor region (10 kb to 5.5 kb upstream of exon 1) and produced only low-level expression. It is also notable that as the segments decreased in size from the 5′ end relative to plasmid D (plasmids A-C), Pcdh15 promoter activity also decreased. It is possible that the upstream region unique to plasmid D contains the major promoter sequence, and that the regions found in plasmids A-C carry only secondary promoter activity.
The findings in this paper show that both mouse and human protocadherin 15 genes have complex genomic structures and transcription control mechanisms. PCDH15 is ∼980,000 bps long and comprises 33 exons ranging in size from 9 to 2069 bps. Pcdh15 is 409,000 bps long and also comprises 33 exons of similar length. Additional exons were reported a few weeks before this paper was submitted [16], which further attests to the complexity of protocadherin 15 genes. Nevertheless, it is intriguing that roughly the same number of exons in human PCDH15 are embedded in a genomic sequence much larger than that of its mouse counterpart. The significance of this divergence is not clear. These findings raise several important questions. What is the utility of having this gene spread out over such a large segment of the genome? Both syndromic and non-syndromic mutations of PCDH15 have been reported: does human PCDH15 show a different splice pattern in eye versus ear? Can splicing account for non-syndromic mutations (e.g., if an exon containing the mutation were skipped in one tissue but not in the other)? What is the biological significance of the multiple isoforms of Pcdh15/PCDH15 that result from variable exon splicing? For example, Ahmed et al. reported that some of the spliced forms of mouse Pcdh15 may code for proteins associated with the tip-link antigen, while other spliced forms are expressed in the hair cell body [16]. Finally, does a promoter that potentially contains a CpG island and lacks TATA box sequences lend the gene evolutionary plasticity, as suggested by Carninci et al. [32], and if so, why? The elucidation of its genomic structure and regulation of RNA expression sets the stage for further molecular analysis of Pcdh15/PCDH15.
Acknowledgements
This research was supported by NIDCD grant DC05385 to KNA. NDM was supported by funds from Miami University Honors program. RJHS is partially supported by NIDCD (R01-DC02842). We thank P. James, D. Pennock, and S. Hoffman for critical reading of the manuscript.
Database Information
World Wide Web addresses of the programs and sequence databases used in this article are listed below.
EMBOSS, CpG Plot/CpG Report, http://www.ebi.ac.uk/Tools/ - Web program to determine CpG islands
Ensembl Trace Server, http://trace.ensembl.org/ - Search engine to find matching mouse shotgun sequences
Genomatix, http://genomatix.gsf.de/ - Web program for promoter prediction
ISREC, Profile Scan, http://www.isrec.isb-sib.ch/ - Web program to predict protein motifs
Primer 3, Steve Rozen, Helen J. Skaletsky (1998), http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi - Web program to design primers
Standard Nucleotide Blast, http://www.ncbi.nlm.nih.gov/BLAST/ - Search engine to find matching genomic sequence
References
- 1. Suzuki ST. Protocadherins and diversity of the cadherin superfamily. J Cell Sci. 1996;109(Pt 11):2609–11. doi: 10.1242/jcs.109.11.2609.
- 2. Suzuki ST. Recent progress in protocadherin research. Exp Cell Res. 2000;261:13–8. doi: 10.1006/excr.2000.5039.
- 3. Alagramam KN, Murcia CL, Kwon HY, Pawlowski KS, Wright CG, Woychik RP. The mouse Ames waltzer hearing-loss mutant is caused by mutation of Pcdh15, a novel protocadherin gene. Nat Genet. 2001;27:99–102. doi: 10.1038/83837.
- 4. Alagramam KN, Yuan H, Kuehn MH, Murcia CL, Wayne S, Srisailpathy CR, Lowry RB, Knaus R, Van Laer L, Bernier FP, Schwartz S, Lee C, Morton CC, Mullins RF, Ramesh A, Van Camp G, Hageman GS, Woychik RP, Smith RJ. Mutations in the novel protocadherin PCDH15 cause Usher syndrome type 1F. Hum Mol Genet. 2001;10:1709–18. doi: 10.1093/hmg/10.16.1709.
- 5. Ahmed ZM, Riazuddin S, Bernstein SL, Ahmed Z, Khan S, Griffith AJ, Morell RJ, Friedman TB, Riazuddin S, Wilcox ER. Mutations of the protocadherin gene PCDH15 cause Usher syndrome type 1F. Am J Hum Genet. 2001;69:25–34. doi: 10.1086/321277.
- 6. Ahmed ZM, Riazuddin S, Ahmad J, Bernstein SL, Guo Y, Sabar MF, Sieving P, Riazuddin S, Griffith AJ, Friedman TB, Belyantseva IA, Wilcox ER. PCDH15 is expressed in the neurosensory epithelium of the eye and ear and mutant alleles are responsible for both USH1F and DFNB23. Hum Mol Genet. 2003;12:3215–23. doi: 10.1093/hmg/ddg358.
- 7. Bolz H, von Brederlow B, Ramirez A, Bryda EC, Kutsche K, Nothwang HG, Seeliger M, del CSCM, Vila MC, Molina OP, Gal A, Kubisch C. Mutation of CDH23, encoding a new member of the cadherin gene family, causes Usher syndrome type 1D. Nat Genet. 2001;27:108–12. doi: 10.1038/83667.
- 8. Bork JM, Peters LM, Riazuddin S, Bernstein SL, Ahmed ZM, Ness SL, Polomeno R, Ramesh A, Schloss M, Srisailpathy CR, Wayne S, Bellman S, Desmukh D, Ahmed Z, Khan SN, Kaloustian VM, Li XC, Lalwani A, Riazuddin S, Bitner-Glindzicz M, Nance WE, Liu XZ, Wistow G, Smith RJ, Griffith AJ, Wilcox ER, Friedman TB, Morell RJ. Usher syndrome 1D and nonsyndromic autosomal recessive deafness DFNB12 are caused by allelic mutations of the novel cadherin-like gene CDH23. Am J Hum Genet. 2001;68:26–37. doi: 10.1086/316954.
- 9. Di Palma F, Holme RH, Bryda EC, Belyantseva IA, Pellegrino R, Kachar B, Steel KP, Noben-Trauth K. Mutations in Cdh23, encoding a new type of cadherin, cause stereocilia disorganization in waltzer, the mouse model for Usher syndrome type 1D. Nat Genet. 2001;27:103–7. doi: 10.1038/83660.
- 10. Siemens J, Kazmierczak P, Reynolds A, Sticker M, Littlewood-Evans A, Muller U. The Usher syndrome proteins cadherin 23 and harmonin form a complex by means of PDZ-domain interactions. Proc Natl Acad Sci U S A. 2002;99:14946–51. doi: 10.1073/pnas.232579599.
- 11. Verpy E, Leibovici M, Zwaenepoel I, Liu XZ, Gal A, Salem N, Mansour A, Blanchard S, Kobayashi I, Keats BJ, Slim R, Petit C. A defect in harmonin, a PDZ domain-containing protein expressed in the inner ear sensory hair cells, underlies Usher syndrome type 1C. Nat Genet. 2000;26:51–5. doi: 10.1038/79171.
- 12. Ouyang XM, Xia XJ, Verpy E, Du LL, Pandya A, Petit C, Balkany T, Nance WE, Liu XZ. Mutations in the alternatively spliced exons of USH1C cause non-syndromic recessive deafness. Hum Genet. 2002;111:26–30. doi: 10.1007/s00439-002-0736-0.
- 13. Frank M, Kemler R. Protocadherins. Curr Opin Cell Biol. 2002;14:557–62. doi: 10.1016/s0955-0674(02)00365-4.
- 14. Rouget-Quermalet V, Giustiniani J, Marie-Cardine A, Beaud G, Besnard F, Loyaux D, Ferrara P, Leroy K, Shimizu N, Gaulard P, Bensussan A, Schmitt C. Protocadherin 15 (PCDH15): a new secreted isoform and a potential marker for NK/T cell lymphomas. Oncogene. 2006;25:2807–11. doi: 10.1038/sj.onc.1209301.
- 15. Haywood-Watson RJ, 2nd, Ahmed ZM, Kjellstrom S, Bush RA, Takada Y, Hampton LL, Battey JF, Sieving PA, Friedman TB. Ames Waltzer deaf mice have reduced electroretinogram amplitudes and complex alternative splicing of Pcdh15 transcripts. Invest Ophthalmol Vis Sci. 2006;47:3074–84. doi: 10.1167/iovs.06-0108.
- 16. Ahmed ZM, Goodyear R, Riazuddin S, Lagziel A, Legan PK, Behra M, Burgess SM, Lilley KS, Wilcox ER, Riazuddin S, Griffith AJ, Frolenkov GI, Belyantseva IA, Richardson GP, Friedman TB. The tip-link antigen, a protein associated with the transduction complex of sensory hair cells, is protocadherin-15. J Neurosci. 2006;26:7022–34. doi: 10.1523/JNEUROSCI.1163-06.2006.
- 17. Overduin M, Harvey TS, Bagby S, Tong KI, Yau P, Takeichi M, Ikura M. Solution structure of the epithelial cadherin domain responsible for selective cell adhesion. Science. 1995;267:386–9. doi: 10.1126/science.7824937.
- 18. Shapiro L, Fannon AM, Kwong PD, Thompson A, Lehmann MS, Grubel G, Legrand JF, Als-Nielsen J, Colman DR, Hendrickson WA. Structural basis of cell-cell adhesion by cadherins. Nature. 1995;374:327–37. doi: 10.1038/374327a0.
- 19. Wu Q, Zhang T, Cheng JF, Kim Y, Grimwood J, Schmutz J, Dickson M, Noonan JP, Zhang MQ, Myers RM, Maniatis T. Comparative DNA sequence analysis of mouse and human protocadherin gene clusters. Genome Res. 2001;11:389–404. doi: 10.1101/gr.167301.
- 20. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402. doi: 10.1093/nar/25.17.3389.
- 21. Redies C, Vanhalst K, Roy F. delta-Protocadherins: unique structures and functions. Cell Mol Life Sci. 2005;62:2840–52. doi: 10.1007/s00018-005-5320-z.
- 22. Sorkin BC, Gallin WJ, Edelman GM, Cunningham BA. Genes for two calcium-dependent cell adhesion molecules have similar structures and are arranged in tandem in the chicken genome. Proc Natl Acad Sci U S A. 1991;88:11545–9. doi: 10.1073/pnas.88.24.11545.
- 23. Ringwald M, Baribault H, Schmidt C, Kemler R. The structure of the gene coding for the mouse cell adhesion molecule uvomorulin. Nucleic Acids Res. 1991;19:6533–9. doi: 10.1093/nar/19.23.6533.
- 24. Miyatani S, Copeland NG, Gilbert DJ, Jenkins NA, Takeichi M. Genomic structure and chromosomal mapping of the mouse N-cadherin gene. Proc Natl Acad Sci U S A. 1992;89:8443–7. doi: 10.1073/pnas.89.18.8443.
- 25. Hatta M, Miyatani S, Copeland NG, Gilbert DJ, Jenkins NA, Takeichi M. Genomic organization and chromosomal mapping of the mouse P-cadherin gene. Nucleic Acids Res. 1991;19:4437–41. doi: 10.1093/nar/19.16.4437.
- 26. Huber P, Dalmon J, Engiles J, Breviario F, Gory S, Siracusa LD, Buchberg AM, Dejana E. Genomic structure and chromosomal mapping of the mouse VE-cadherin gene (Cdh5). Genomics. 1996;32:21–8. doi: 10.1006/geno.1996.0072.
- 27. Di Palma F, Pellegrino R, Noben-Trauth K. Genomic structure, alternative splice forms and normal and mutant alleles of cadherin 23 (Cdh23). Gene. 2001;281:31–41. doi: 10.1016/s0378-1119(01)00761-2.
- 28. Ben-Yosef T, Ness SL, Madeo AC, Bar-Lev A, Wolfman JH, Ahmed ZM, Desnick RJ, Willner JP, Avraham KB, Ostrer H, Oddoux C, Griffith AJ, Friedman TB. A mutation of PCDH15 among Ashkenazi Jews with the type 1 Usher syndrome. N Engl J Med. 2003;348:1664–70. doi: 10.1056/NEJMoa021502.
- 29. Ouyang XM, Yan D, Du LL, Hejtmancik JF, Jacobson SG, Nance WE, Li AR, Angeli S, Kaiser M, Newton V, Brown SD, Balkany T, Liu XZ. Characterization of Usher syndrome type I gene mutations in an Usher syndrome patient population. Hum Genet. 2005;116:292–9. doi: 10.1007/s00439-004-1227-2.
- 30. Jones PA, Takai D. The role of DNA methylation in mammalian epigenetics. Science. 2001;293:1068–70. doi: 10.1126/science.1063852.
- 31. Antequera F, Bird A. Number of CpG islands and genes in human and mouse. Proc Natl Acad Sci U S A. 1993;90:11995–9. doi: 10.1073/pnas.90.24.11995.
- 32. Carninci P, Sandelin A, Lenhard B, Katayama S, Shimokawa K, Ponjavic J, Semple CA, Taylor MS, Engstrom PG, Frith MC, Forrest AR, Alkema WB, Tan SL, Plessy C, Kodzius R, Ravasi T, Kasukawa T, Fukuda S, Kanamori-Katayama M, Kitazume Y, Kawaji H, Kai C, Nakamura M, Konno H, Nakano K, Mottagui-Tabar S, Arner P, Chesi A, Gustincich S, Persichetti F, Suzuki H, Grimmond SM, Wells CA, Orlando V, Wahlestedt C, Liu ET, Harbers M, Kawai J, Bajic VB, Hume DA, Hayashizaki Y. Genome-wide analysis of mammalian promoter architecture and evolution. Nat Genet. 2006;38:626–35. doi: 10.1038/ng1789.