American Journal of Human Genetics. 2005 Dec 22;78(2):291–302. doi: 10.1086/500151

Single-Nucleotide Polymorphisms in NAGNAG Acceptors Are Highly Predictive for Variations of Alternative Splicing

Michael Hiller 1,*, Klaus Huse 2,*, Karol Szafranski 2, Niels Jahn 2, Jochen Hampe 3, Stefan Schreiber 3, Rolf Backofen 1, Matthias Platzer 2
PMCID: PMC1380236  PMID: 16400609

Abstract

Aberrant or modified splicing patterns of genes are causative for many human diseases. Therefore, the identification of genetic variations that cause changes in the splicing pattern of a gene is important. Elsewhere, we described the widespread occurrence of alternative splicing at NAGNAG acceptors. Here, we report a genomewide screen for single-nucleotide polymorphisms (SNPs) that affect such tandem acceptors. From 121 SNPs identified, we extracted 64 SNPs that most likely affect alternative NAGNAG splicing. We demonstrate that the NAGNAG motif is necessary and sufficient for this type of alternative splicing. The evolutionarily young NAGNAG alleles, as determined by the comparison with the chimpanzee genome, exhibit the same biases toward intron phase 1 and single–amino acid insertion/deletions that were already observed for all human NAGNAG acceptors. Since 28% of the NAGNAG SNPs occur in known disease genes, they represent preferable candidates for a more-detailed functional analysis, especially since the splice relevance for some of the coding SNPs is overlooked. Against the background of a general lack of methods for identifying splice-relevant SNPs, the presented approach is highly effective in the prediction of polymorphisms that are causal for variations in alternative splicing.


SNPs, as the most abundant form of genetic variation, contribute significantly to phenotypic individuality and disease susceptibility. SNPs are mostly biallelic and are therefore easy to assay once they are described. Given their abundance in the human genome (∼1 SNP every 300 bp [Ke et al. 2004]) and their ease of high-throughput typing, SNPs progressively replace microsatellites as first-choice genetic markers in association and linkage studies.

Much interest focuses on SNPs that are located in coding regions, since those SNPs may alter the protein sequence. However, SNPs can also influence splicing, which usually has a greater effect on the resulting protein than does the alteration of a single codon. Recently, splicing mutations have been suspected to be the most frequent cause of hereditary diseases (Lopez-Bigas et al. 2005). Accordingly, an increasing number of SNPs have been described that cause diseases by a change or disruption of the normal splicing pattern (for review, see Cartegni et al. [2002] and Garcia-Blanco et al. [2004]). These splice-relevant SNPs affect donor and acceptor splice sites, branch points, and exonic as well as intronic splicing enhancers and silencers, or they alter important mRNA secondary structures. For example, the G allele of the silent coding SNP rs17612648 in the PTPRC gene that is associated with multiple sclerosis destroys an exonic splicing silencer and abolishes the skipping of exon 4 (Lynch and Weiss 2001), and the SNP rs2076530 in BTNL2 that is associated with sarcoidosis leads to the use of a cryptic donor splice site 4 nt upstream (Valentonyte et al. 2005). Since the impact of SNPs on splicing is hard to predict in silico and is difficult to analyze experimentally, silent or intronic SNPs that may cause a phenotype or a disease by changing splicing patterns are often not investigated (Pagani and Baralle 2004). Thus, novel approaches are urgently needed to identify splice-relevant SNPs.

Recently, we reported the widespread occurrence of subtle alternative splice events that insert or delete the sequence NAG (N denotes A, C, G, or T) in mRNA (Hiller et al. 2004). This happens if both AGs of a NAGNAG acceptor can be chosen by the spliceosome. We termed the upstream acceptor in this tandem motif the “E acceptor” and the downstream one the “I acceptor.” The products that arise from the use of E and I acceptors are called “E and I transcripts and proteins,” respectively. The consequences of NAG insertion/deletions (indels) in mRNAs for the respective protein sequences are highly diverse and comprise eight different single–amino acid (aa) indel events, the exchange of a dipeptide and an unrelated aa, or the creation/destruction of a stop codon. Tandem acceptors are conserved between human and mouse, and the use of E or I acceptors can be controlled in a tissue-specific manner. Our results concerning the frequency and tissue specificity were confirmed by others (Tadokoro et al. 2005). Furthermore, E/I protein isoforms have functional differences (Condorelli et al. 1994; Tadokoro et al. 2005), and the SNP rs1650232 within a NAGNAG acceptor is associated with respiratory-distress syndrome (Karinch et al. 1997).
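Why the same 3-nt indel has such diverse protein-level consequences becomes clearer when the E and I transcripts are translated side by side. The following Python sketch is illustrative only and is not part of the original analysis: the function names `translate` and `ei_peptides`, their arguments, and the upstream nucleotide `G` in the phase 1 example are assumptions. It uses the standard genetic code and treats the intron phase as the number of nucleotides of an incomplete codon at the end of the upstream exon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate complete codons only; a trailing 1-2 nt are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def ei_peptides(upstream_tail, nag, downstream):
    """Return (E peptide, I peptide) around a tandem acceptor.

    upstream_tail: the 0, 1, or 2 nt of an incomplete codon at the end of the
                   upstream exon (its length equals the intron phase).
    nag:           the 3 nt that are present only in the E transcript.
    downstream:    exonic nucleotides that follow the I acceptor.
    """
    e_seq = upstream_tail + nag + downstream   # upstream (E) acceptor used
    i_seq = upstream_tail + downstream         # downstream (I) acceptor used
    return translate(e_seq), translate(i_seq)


# Phase 0: the extra CAG forms a codon of its own, so Q is inserted.
print(ei_peptides("", "CAG", "AAA"))
# Phase 1 with a hypothetical upstream G: GCA GTG versus GTG, so A is inserted.
print(ei_peptides("G", "CAG", "TG"))
```

In phase 1 and phase 2 the NAG is split across two codons, which is why the outcome depends on the flanking nucleotides and can be a single-aa indel, a dipeptide exchange, or a stop codon.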

Since NAGNAG acceptors occur in ∼30% of human genes, we were interested in finding SNPs that may affect this type of alternative splicing. By scanning the SNP annotation of the human reference sequence, we identified those SNPs and provide experimental evidence of respective variations in the alternative splicing patterns. In addition, we introduce a classification for NAGNAG acceptors, with respect to their splicing plausibility, to bring forward a highly effective approach for predicting splice-relevant SNPs.

Methods

Identification of SNPs Affecting NAGNAG Acceptors

We downloaded the human genome assembly (hg17, May 2004), as well as RefSeq transcript annotations (refGene.txt.gz, January 12, 2005) and SNP annotations (snp.txt.gz, January 9, 2005), from the UCSC Human Genome Browser. From the transcripts, we extracted a list of unique genomic positions of acceptor sites. We used the genomic position of the acceptors to select those SNPs that overlap the first 3 nt of an exon or the last 6 nt of an intron. Then, we evaluated whether one of the two AGs or one of the two Ns in the NAGNAG pattern is polymorphic. SNPs are the only type of polymorphism that was considered.
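The selection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `classify_snp` and the input format (two 9-nt contexts, optionally with a `|` at the intron-exon boundary, as in the nucleotide patterns of table 2) are hypothetical. A tandem is searched at context positions 1-6 and 4-9, and the SNP is labeled "AG" or "N" according to which part of the motif it hits.

```python
import re

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")
# A tandem can occupy context positions 1-6 (both AGs intronic) or 4-9
# (first AG intronic, second AG exonic); the 0-based window starts are 0 and 3.
WINDOW_STARTS = (0, 3)


def classify_snp(ref_context, alt_context):
    """Classify a SNP in a 9-nt acceptor context (last 6 intronic + first 3 exonic nt).

    Returns (position, variation) with position in 1..9 and variation "AG" or
    "N", or None if neither allele forms a NAGNAG that covers the SNP.
    """
    ref = ref_context.replace("|", "").upper()
    alt = alt_context.replace("|", "").upper()
    if len(ref) != 9 or len(alt) != 9:
        raise ValueError("expected 9-nt contexts")
    diffs = [i for i in range(9) if ref[i] != alt[i]]
    if len(diffs) != 1:
        raise ValueError("contexts must differ at exactly one position")
    pos = diffs[0]
    for context in (ref, alt):
        for start in WINDOW_STARTS:
            covers_snp = start <= pos < start + 6
            if covers_snp and NAGNAG.fullmatch(context[start:start + 6]):
                # Motif offsets 0 and 3 are the N positions; the rest are A or G.
                variation = "N" if (pos - start) % 3 == 0 else "AG"
                return pos + 1, variation
    return None


print(classify_snp("TCATAG|CAG", "TCATGG|CAG"))  # SNP in the upstream AG
print(classify_snp("TCGCAG|GAG", "TCGCAG|AAG"))  # SNP at the second N
```

Checking both alleles matters because the reference allele may lack the tandem that the alternative allele creates, and vice versa.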

To check whether a tandem acceptor is EST confirmed, we used BLAST with a search string of 30 nt from the upstream exon and 30 nt from the downstream exon—taking the nonannotated acceptor into account—against the human fraction of the dbEST database (December 2004) and against the mRNA sequences downloaded from GenBank (December 2004). At most, one mismatch or one gap was allowed.
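One way to read "taking the nonannotated acceptor into account" is that the 60-nt search string represents the splice form that the RefSeq transcript does not show. The Python sketch below follows that reading; the function name `est_query`, its parameters, and the toy sequences are assumptions, and the BLAST search itself is not shown.

```python
def est_query(upstream_exon, intron, downstream_exon, annotated="E", flank=30):
    """Build the 2*flank-nt query for the splice form that is NOT annotated.

    annotated="E": the annotated acceptor is the upstream AG, so the alternative
                   (I) transcript lacks the first 3 nt of the downstream exon.
    annotated="I": the annotated acceptor is the downstream AG, so the
                   alternative (E) transcript gains the last 3 nt of the intron.
    """
    left = upstream_exon[-flank:]
    if annotated == "E":
        right = downstream_exon[3:3 + flank]
    elif annotated == "I":
        right = (intron[-3:] + downstream_exon)[:flank]
    else:
        raise ValueError("annotated must be 'E' or 'I'")
    if len(left) < flank or len(right) < flank:
        raise ValueError("exon sequence shorter than the requested flank")
    return left + right


# Toy sequences (hypothetical): the intron ends in TAG, the exon starts with CAG.
upstream = "A" * 40
intron = "GT" + "C" * 20 + "TAG"
exon = "CAG" + "T" * 40
print(est_query(upstream, intron, exon, annotated="E"))
```

A hit to this query with at most one mismatch or gap would then count as EST or mRNA support for the alternative acceptor.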

Comparison with the Chimpanzee Genome

We downloaded the chimpanzee genome working draft assembly from UCSC Genome Browser (UCSC Chimpanzee Genome Browser, panTro1, November 2003). We compared human polymorphic sites with the chimpanzee sequence, using BLAST, with 101-nt queries consisting of one of the SNP alleles, as well as 50 nt upstream and 50 nt downstream. Only hits with at least 95% identity and no other mismatch in the −5…+5 context of the SNP were considered.
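The hit filter described above can be written down compactly. The sketch below is a simplified illustration rather than the pipeline that was actually run: it assumes an ungapped alignment over the full 101-nt query, passed as two equal-length strings, and the function name `accept_hit` and its parameters are hypothetical.

```python
def accept_hit(query, subject, snp_index=50, min_identity=0.95, flank=5):
    """Decide whether an ungapped chimpanzee hit is usable.

    query, subject: aligned sequences of equal length (101 nt in the text).
    snp_index:      0-based index of the SNP (50 nt upstream, 50 nt downstream).
    The SNP position itself may differ; its -5...+5 neighbors may not.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(q == s for q, s in zip(query, subject))
    if matches / len(query) < min_identity:
        return False
    lo = max(0, snp_index - flank)
    hi = min(len(query), snp_index + flank + 1)
    return all(query[i] == subject[i] for i in range(lo, hi) if i != snp_index)


query = "ACGT" * 25 + "A"                      # toy 101-nt query
subject = query[:50] + "T" + query[51:]        # differs only at the SNP
print(accept_hit(query, subject))
```

The allele found at the SNP position in an accepted hit is then taken as the ancestral variant.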

Null Model for Gain of NAGNAG Acceptors

Briefly described, we determined the ancestral allele variant for 2,439 SNPs that overlap an acceptor in the 9-nt context by comparing the genomic sequence context with the chimpanzee genome. In addition, we selected a set of 8,082 acceptor sites not affected by known SNPs. Then, the 2,439 SNPs were randomly assigned to one of those acceptors, given that the ancestral allele variant is present at the respective position. This position was replaced by the nonancestral allele, and we evaluated and counted the possible impact on a NAGNAG acceptor. More details are given in appendix A.
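A single randomization round of such a null model might look as follows. This is a minimal sketch under simplifying assumptions and does not reproduce appendix A: the names `has_tandem` and `simulate`, the tuple format of the SNPs, and the toy inputs are hypothetical, and "impact" is reduced to gain or loss of any NAGNAG in the 9-nt context.

```python
import random
import re

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def has_tandem(context):
    """True if the 9-nt context holds a NAGNAG at positions 1-6 or 4-9."""
    return any(NAGNAG.fullmatch(context[s:s + 6]) for s in (0, 3))


def simulate(snps, acceptors, rng):
    """Run one randomization round.

    snps:      iterable of (position 1..9, ancestral allele, derived allele).
    acceptors: list of 9-nt contexts of acceptors without known SNPs.
    Returns counts of NAGNAG gains, losses, neutral placements, and SNPs for
    which no acceptor carries the ancestral allele at that position.
    """
    counts = {"gain": 0, "loss": 0, "none": 0, "unplaced": 0}
    for pos, ancestral, derived in snps:
        candidates = [a for a in acceptors if a[pos - 1] == ancestral]
        if not candidates:
            counts["unplaced"] += 1
            continue
        before = rng.choice(candidates)
        after = before[:pos - 1] + derived + before[pos:]
        if has_tandem(after) and not has_tandem(before):
            counts["gain"] += 1
        elif has_tandem(before) and not has_tandem(after):
            counts["loss"] += 1
        else:
            counts["none"] += 1
    return counts


toy_snps = [(2, "T", "A"), (9, "C", "G")]
toy_acceptors = ["TTGCAGGTC"]
print(simulate(toy_snps, toy_acceptors, random.Random(0)))
```

Repeating the round many times with different seeds would give the distribution of gains expected by chance, against which the observed number can be compared.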

Experimental Verification of Alternative Splicing at Polymorphic NAGNAG Acceptors

Genomic DNA and cDNA from 12 whites were kindly provided by Gerd Birkenmeier (Leipzig) and were purified from whole blood by standard methods. First-strand cDNA was derived from oligo-dT primed reverse transcription.

For determination of the respective genotypes, ∼20 ng of genomic DNA was used to PCR amplify the regions of the respective SNP through use of Ready-To-Go PCR beads (Amersham). PCR conditions were 1 cycle of denaturation at 95°C for 30 s; followed by 38 cycles of denaturing at 92°C for 30 s, annealing at 59°C for 30 s, and extension at 72°C for 60 s; and 1 cycle of final extension at 72°C for 5 min. PCR products were purified by precipitation and were sequenced with the same primers used for PCR amplification by the dye terminator method by use of BigDye v3.1 (Applied Biosystems). To identify E and I transcripts, cDNA from the genotyped individuals was amplified using the same PCR conditions with transcript-specific primers.

For amplification of genomic DNA and subsequent sequencing of the resulting amplicons that correspond to SNPs listed in table 1, we used primers 5′-CAGCTACGGTTTGCTGAGAA-3′ and 5′-ACAGAGGGGACAGGGAGATT-3′ for genotyping rs2245425, 5′-GATTTTCCTGGAGGAGAGGG-3′ and 5′-CAAGTTCAAAGCAAGCCTCC-3′ for rs1558876, 5′-AGGAGGCGTGCTATCTGGTA-3′ and 5′-GTAGGAAGCCCTGGAGGAAG-3′ for rs2290647, 5′-GCCATTGAGTTGTCATCACC-3′ and 5′-ACCCATTAGCTTGGCAACAG-3′ for rs2275992, 5′-AAGAATGGCGTCCATTTCAC-3′ and 5′-TTTCTGATCCTTGGTGAGGG-3′ for rs4590242, and 5′-CCTTCAACCTCAATGACGAAA-3′ and 5′-CACAAAGGACTTGTCAGGGA-3′ for rs1152522.

RT-PCR for transcript amplification was done with primers 5′-GAAAGCGCGTACTACCTTCG-3′ and 5′-AATCCCTGGATCTGGCCTTA-3′ for TOR1AIP1, 5′-AGGCTACAACCACCCTCCTT-3′ and 5′-ACTTCCCCCTTGACGAGTTT-3′ for KIAA1001, 5′-AGAGGAGGACAAGGAGGAGC-3′ and 5′-GAACAGCGTCTGTGTCTCCA-3′ for KIAA1533, 5′-GGACATCTGTTTCTCGCCAT-3′ and 5′-ATCCTTCCATCTCACAACGG-3′ for ZFP91 (GenBank accession number NM_170768), 5′-TCTTTCTTTTGTGGTGGGGA-3′ and 5′-TGTCAGGGACCCAGATCTTC-3′ for GABRR1, and 5′-TGCAGGACCAGAATAAAGCC-3′ and 5′-TATGGTCCCTTGGACTTTGC-3′ for C14orf105.

For ZFP91 and TOR1AIP1, the amplicons obtained by RT-PCR from individuals with each of the possible genotypes were cloned into pCR2.1-TOPO (Invitrogen) and were propagated in Escherichia coli TOP10 cells. Plasmids were isolated from several clones, and their inserts were sequenced using plasmid primers.

SNPs exhibiting nonancestral plausible NAGNAGs without EST evidence were selected for high frequencies of the minor alleles: rs1638152 (DTX2), rs5248 (CMA1), and rs17105087 (SLC25A21). Genomic primers were used for DTX2 (5′-TTTCCTCCTGGCAGCTTAGA-3′ and 5′-GCTGGGAGATGAAACCAAAG-3′), CMA1 (5′-GGCTCCAAGGGTGACTGTTA-3′ and 5′-CCCCACTTTCCCGTTTAACT-3′), and SLC25A21 (5′-AACTCCATGTCGTCCCAAAG-3′ and 5′-CAAAATCGTTTGTTCTTTGCC-3′). Transcript-specific primers were used for DTX2 (5′-CAGGCATGACGAGTGTTCTG-3′ and 5′-CACAGCTAGGGACCCGAT-3′) and CMA1 (5′-CCCTGCTGCTCTTTCTCTTG-3′ and 5′-ACACACCTGTTCTTCCCCAG-3′).

Table 1.

Correlation between Acceptor Genotypes and the Appearance of E and I Transcripts[Note]

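| dbSNP ID | Gene Symbol | Homozygous NAGNAG: No. of Probands | Homozygous NAGNAG: cDNA Transcripts | Heterozygous: No. of Probands | Heterozygous: cDNA Transcripts | Homozygous Non-NAGNAG: No. of Probands | Homozygous Non-NAGNAG: cDNA Transcripts |
| --- | --- | --- | --- | --- | --- | --- | --- |
| rs2245425 | TOR1AIP1(a) | 3 | E+I | 6 | E+I | 2 | I |
| rs2275992 | ZFP91(a) | 1 | E+I | 7 | E+I | 4 | E |
| rs1558876 | KIAA1001 | 0 | | 6 | E+I | 6 | E |
| rs2290647 | KIAA1533 | 0 | | 4 | E+I | 8 | E |
| rs4590242 | GABRR1 | 11 | E+I | 1 | E+I | 0 | |
| rs1152522 | C14orf105 | 0 | | 0 | | 12 | I |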

Note.— E+I indicates presence of both E and I transcripts; E indicates only E transcripts; I indicates only I transcripts.

a See also figure 2.

Results

SNPs in NAGNAG Acceptors Influence Alternative Splicing

We extracted from the UCSC Human Genome Browser (hg17, May 2004) all annotated SNPs that are located within the last 6 nt of an intron or within the first 3 nt of an exon, given intron-exon boundaries from RefSeq transcripts. From these SNPs, we selected those that affect a NAGNAG acceptor. With respect to the human reference genome sequence, the alternative SNP allele can create or destroy a NAGNAG acceptor by affecting one of the two AGs (fig. 1A and 1B). Since the nucleotide upstream of any acceptor AG is usually C or T (Stamm et al. 2000) and a change at this position is likely to alter alternative splicing at a tandem acceptor, we also considered SNPs at the N positions in an existing tandem (fig. 1C). We found a total of 137 NAGNAG-affecting SNPs (table 2). Aware of the uncertainty about the true nature of SNPs in segmental duplications (Fredman et al. 2004; Taudien et al. 2004), we excluded seven (5%) of the variations from further analysis. Our precaution was justified by genotyping SNP rs1638152 in 12 whites; we consistently found both alleles and both transcripts (DTX2 [GenBank accession numbers DQ082728 and DQ082730]), which is a strong indication of paralogous sequence variants and/or multisite variations (combinatorial P=.0003). Since dbSNP entries are sometimes the result of sequencing errors, we manually examined the trace data (if available) and excluded a further nine SNPs (7%). Thus, we considered a total of 121 bona fide SNPs affecting NAGNAG acceptors.

Figure 1.


Schematic illustration of how SNPs affect splicing at NAGNAG acceptors. A, SNP alleles at position −2, −1, +2, or +3 of a NAGNAG acceptor destroy this motif by affecting the E (left) or I (right) acceptor, thus preventing alternative splicing. B, SNP alleles at intron positions −5 and −4 can create a novel E acceptor (left) and, at exon positions +2 and +3, a novel I acceptor (right), thus yielding a NAGNAG motif. Acceptors at these alleles may allow alternative splicing, as indicated by the two transcripts (E transcript above; I transcript below). C, SNP alleles at position −3 or +1 of a NAGNAG acceptor can convert a plausible NAGNAG that allows alternative splicing (left) to an implausible one that allows only the expression of one transcript (right), or vice versa. Positions refer to a standard intron-exon boundary. H denotes A, C, or T; upper- and lowercase letters indicate exonic and intronic nucleotides, respectively; exonic nucleotides are boxed.

Table 2.

SNPs That Affect NAGNAG Acceptors[Note]

SNP Protein Impact of
dbSNP IDa Chromosome Position Heterozygosity Gene Symbol Gene Nameb RefSeq Exon Nucleotide Patternc Variationd Positione Intron Phasef NAGNAG Splicingg Coding SNPh EST/mRNAi
rs2297988 10 99108372 .456 KIAA0690 KIAA0690 NM_015179 33 AAGCAG|AAA→GAGCAG|AAA N 1 0 Insert Q
rs4149853 1 238338449 .044 EXO1 Exonuclease 1 NM_130398 3 CAGCAG|AAC→TAGCAG|AAC N 1 5′ UTR
rs2071558 12 52105734 .255 AMHR2 Anti-Mullerian hormone receptor, type II NM_020547 6 CAGCAG|GTA→TAGCAG|GTA N 1 0 Insert Q
rs16960071 15 45846061 .055 SEMA6D Sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6D NM_020858 16 ATGAAG|GCT→AAGAAG|GCT AG 2 2 Insert R
rs9621415 22 30953580 SLC5A4 Solute carrier family 5 (low affinity glucose cotransporter), member 4 NM_014227 9 CGGCAG|GTC→CAGCAG|GTC AG 2 0 Insert Q
rs12042060k 1 182663302 FIBL-6 Hemicentin NM_031935 12 TTGCAG|AAC→TAGCAG|AAC AG 2 1 Insert A
rs2287800 16 25136015 AQP8 Aquaporin 8 NM_001169 2 CGGCAG|ATA→CAGCAG|ATA AG 2 0 Insert Q 1:0
rs1650232j 10 81309218 SFTPA2 Surfactant, pulmonary-associated protein A2 NM_006926 3 CTGCAG|GAG→CAGCAG|GAG AG 2 5′ UTR 16:0
rs3997775j 10 81361506 SFTPA1 Surfactant, pulmonary-associated protein A1 NM_005411 3 CTGCAG|GAG→CAGCAG|GAG AG 2 5′ UTR 2:0
rs2298847 16 55258795 .196 MT1G Metallothionein 1G NM_005950 2 TAGCAG|GTG→TTGCAG|GTG AG 2 1 Insert A 59:0
rs1622213j 1 1506424 ATAD3B ATPase family, AAA domain containing 3B NM_031921 9 CGGCAG|GTC→CAGCAG|GTC AG 2 0 Insert Q 1:0
rs4590242 6 89969928 .441 GABRR1 Gamma-aminobutyric acid (GABA) receptor, rho 1 NM_002042 2 TGGTAG|GCC→TAGTAG|GCC AG 2 2 Insert S 2:0
rs2521612 17 39686270 SLC4A1 Solute carrier family 4, anion exchanger, member 1 (erythrocyte membrane protein band 3, Diego blood group) NM_000342 17 CCGTAG|GCT→CAGTAG|GCT AG 2 2 Insert R
rs2271959 17 38978266 ETV4 ETS variant gene 4 (E1A enhancer binding protein, E1AF) NM_001986 3 TCGCAG|AAA→TAGCAG|AAA AG 2 0 Insert Q 3:0
rs3020724 10 104580737 CYP17A1 Cytochrome P450, family 17, subfamily A, polypeptide 1 NM_000102 8 CTGCAG|AGC→CAGCAG|AGC AG 2 1 Insert A
rs6285 4 46875357 GABRB1 Gamma-aminobutyric acid (GABA) A receptor, beta 1 NM_000812 3 CGGCAG|GGC→CAGCAG|GGC AG 2 1 Insert A
rs5248 14 24045654 .242 CMA1 Chymase 1, mast cell NM_001836 3 CAACAG|GTC→CAGCAG|GTC AG 3 2 Insert S
rs3025420 9 133551742 DBH Dopamine beta-hydroxylase (dopamine beta-monooxygenase) NM_000787 11 CACCAG|GTT→CAGCAG|GTT AG 3 2 Insert S
rs11466221 2 70653703 .021 TGFA Transforming growth factor, alpha NM_003236 2 CAACAG|GTA→CAGCAG|GTA AG 3 1 Insert A
rs363209 16 65412967 APPBP1 Amyloid beta precursor protein binding protein 1, 59kDa NM_003905 7 AAACAG|CAC→AAGCAG|CAC AG 3 2 Insert S
rs9644946 9 124765060 .178 GOLGA1 Golgi autoantigen, golgin subfamily a, 1 NM_002077 8 AAATAG|GAG→AAGTAG|GAG AG 3 0 Insert stop
rs2245425 1 176590101 .464 TOR1AIP1 Torsin A interacting protein 1 NM_015602 3 TAGCAG|TGA→TAACAG|TGA AG 3 1 Insert A 13:0
rs200925 10 38166650 ZNF248 Zinc finger protein 248 NM_021045 5 TAACAG|GGT→TAGCAG|GGT AG 3 1 Insert A
rs1438073 2 182954651 PDE1A Phosphodiesterase 1A, calmodulin-dependent NM_001003683 3 AAATAG|ACT→AAGTAG|ACT AG 3 2 Insert R
rs13251099 8 12853939 FLJ36980 NA NM_182598 2 AAATAG|GTC→AAGTAG|GTC AG 3 5′ UTR
rs1076555k 1 152043527 SCAMP3 Secretory carrier membrane protein 3 NM_005698 3 CAACAG|CCA→CAGCAG|CCA AG 3 0 Insert Q
rs2290609 3 3119508 IL5RA Interleukin 5 receptor, alpha NM_000564 5 CAACAG|TTT→CAGCAG|TTT AG 3 1 I versus TV
rs3816280 18 9578855 PPP4R1 Protein phosphatase 4, regulatory subunit 1 NM_005134 5 CAATAG|AAC→CAGTAG|AAC AG 3 1 Insert V
rs8176139 17 38505426 .011 BRCA1 Breast cancer 1, early onset NM_007304 8 GTTTAG|CAG→GTTGAG|CAG N 4 0 Delete Q 7:0
rs6591368 11 68591544 TPCN2 Two pore segment channel 2 NM_139075 8 TCTTAG|CAG→TCTCAG|CAG N 4 0 Delete Q
rs11042902 11 10612199 MRVI1 Murine retrovirus integration site 1 homolog NM_006069 2 AACCGG|CAG→AACCAG|CAG AG 5 2 NR versus K 1:1
rs1152522 14 57018133 .441 C14orf105 Chromosome 14 ORF 105 NM_018168 4 TCATAG|CAG→TCATGG|CAG AG 5 0 Delete Q 6:0
rs2307130 1 100028610 .475 AGL Amylo-1, 6-glucosidase, 4-alpha-glucanotransferase (glycogen debranching enzyme, glycogen storage disease type III) NM_000644 2 CTCTAG|AAG→CTCTGG|AAG AG 5 5′ UTR
rs9463545 6 49927803 CRISP1 Cysteine-rich secretory protein 1 NM_001131 3 TAACAG|AAG→TAACCG|AAG AG 5 0 Delete K
rs11567804 12 8104059 .109 C3AR1 Complement component 3a receptor 1 NM_004054 2 TTGCAG|AAG→TTGCAA|AAG AG 6 5′ UTR
rs13228988 7 69674388 AUTS2 Autism susceptibility candidate 2 NM_015570 8 CGATAG|CAG→CGATAA|CAG AG 6 2 Delete S 2:0
rs479984 8 109039673 MGC35555 NA NM_178565 5 TACTAG|AAG→TACTAA|AAG AG 6 1 Delete E
rs1044833 12 30754672 C1QDC1 C1q domain containing 1 NM_001002259 18 AAACAG|CAG→AAACAT|CAG AG 6 1 Delete A
rs2243187 1 203402743 .041 IL19 Interleukin 19 NM_153758 5 TCACAG|CAG→TCACAA|CAG AG 6 0 Delete Q
rs1833783k 19 54161349 FTL Ferritin, light polypeptide NM_000146 3 ATATAG|AAG→ATATAC|AAG AG 6 0 Delete K 11:0
rs12974798 19 7532374 NTE Neuropathy target esterase NM_006702 35 TCGCAG|GAG→TCGCAG|AAG N 7 0 Delete K E→K
rs17173698 10 89458933 .011 PAPSS2 3′-Phosphoadenosine 5′-phosphosulfate synthase 2 NM_004670 2 TTATAG|GAG→TTATAG|AAG N 7 0 Delete K E→K
rs2261015j 7 66007678 .448 RSAFD1 Radical S-adenosyl methionine and flavodoxin domains 1 NM_018264 12 CTCCAG|GAG→CTCCAG|TAG N 7 1 Delete V G→V
rs1804783 19 13186422 CACNA1A Calcium channel, voltage-dependent, P/Q type, alpha 1A subunit NM_023035 39 TTGCAG|GAG→TTGCAG|TAG N 7 1 Delete V G→V
rs3842776 22 42906589 PARVG Parvin, gamma NM_022141 4 TTCCAG|GAG→TTCCAG|CAG N 7 1 Delete A G→A
rs879022 2 79276118 REGL Regenerating islet-derived-like, pancreatic stone protein-like, pancreatic thread protein-like (rat) NM_006508 3 GGACAG|GAG→GGACAG|AAG N 7 1 Delete E G→E
rs12944821 17 33030504 AP1GBP1 AP1 gamma subunit binding protein 1 NM_007247 3 TTTCAG|CAG→TTTCAG|GAG N 7 1 Delete G A→G 5:0
rs3751353 13 23793134 .083 MGC48915 NA NM_178540 4 ATTTAG|GAG→ATTTAG|CAG N 7 1 Delete A G→A
rs4565430k 8 97376502 PTDSS1 Phosphatidylserine synthase 1 NM_014754 5 AAATAG|GAG→AAATAG|CAG N 7 0 Delete Q E→Q
rs2409496 21 33825733 GART Phosphoribosylglycinamide formyltransferase, phosphoribosylglycinamide synthetase, phosphoribosylaminoimidazole synthetase NM_175085 6 AATCAG|GAG→AATCAG|CAG N 7 0 Delete Q E→Q
rs1638152j 7 75774409 .476 DTX2 Deltex homolog 2 (Drosophila) NM_020892 8 TTCTAG|GAG→TTCTAG|AAG N 7 1 Delete E G→E
rs3775296 4 187372916 .268 TLR3 Toll-like receptor 3 NM_003265 2 CTACAG|CAG→CTACAG|AAG N 7 5′ UTR
rs1127307 19 14450378 RGS19IP1 Regulator of G-protein signaling 19 interacting protein 1 NM_202494 6 CAATAG|CGG→CAATAG|CAG AG 8 1 Delete A 8:0
rs1071716 9 35675135 TPM2 Tropomyosin 2 (beta) NM_213674 6 CCCCAG|CCG→CCCCAG|CAG AG 8 2 Delete S
rs11660370 18 19028411 CABLES1 Cdk5 and Abl enzyme substrate 1 NM_138375 3 TTTCAG|ATG→TTTCAG|AAG AG 8 2 Delete R C→S
rs2010657 22 23328476 GGT1 Gamma-glutamyltransferase 1 NM_013421 2 CCCCAG|CGG→CCCCAG|CAG AG 8 5′ UTR
rs2275992 11 58135000 .29 ZFP91 Zinc finger protein 91 homolog (mouse) NM_170768 5 TTTTAG|TAG→TTTTAG|TGG AG 8 2 Delete S S→G 7:0
rs10914468 1 31833299 .497 COL16A1 Collagen, type XVI, alpha 1 NM_001856 5 CTCCAG|AAG→CTCCAG|ACG AG 8 2 Delete R
rs4434604 8 67227167 TRIM55 Tripartite motif-containing 55 NM_033058 8 TACCAG|AAG→TACCAG|AGG AG 8 1 Delete E
rs751517 22 36345477 GGA1 Golgi associated, gamma adaptin ear containing, ARF binding protein 1 NM_013365 10 TTCCAG|CGG→TTCCAG|CAG AG 8 1 Delete A
rs11661706 18 5409875 EPB41L3 Erythrocyte membrane protein band 4.1-like 3 NM_012307 12 CTGCAG|AGG→CTGCAG|AAG AG 8 1 Delete E
rs2273431 14 51566157 .178 NID2 Nidogen 2 (osteonidogen) NM_007361 10 ATGCAG|AGG→ATGCAG|AAG AG 8 1 Delete E
rs3746373 20 61409288 .113 KIAA1510 NA NM_020882 6 CCCCAG|CCG→CCCCAG|CAG AG 8 1 Delete A
rs7862221 9 132811775 TSC1 Tuberous sclerosis 1 NM_000368 14 CTTCAG|AAG→CTTCAG|AGG AG 8 1 Delete E
rs2290647 19 40198569 .371 KIAA1533 KIAA1533 NM_020895 11 CTCCAG|CGG→CTCCAG|CAG AG 8 1 Delete A 35:0
rs11553436 22 41232623 SERHL Serine hydrolase-like NM_170694 11 CTCCAG|CGG→CTCCAG|CAG AG 8 3′ UTR
rs12760076 1 44354572 DMAP1 DNA methyltransferase 1 associated protein 1 NM_019100 8 TTGCAG|ATG→TTGCAG|AAG AG 8 0 Delete K M→K
rs2243603 20 1494911 .364 SIRPB1 Signal-regulatory protein beta 1 NM_006065 5 TTCCAG|AAG→TTCCAG|AAC AG 9 1 Delete E A→P
rs3765018 1 54880319 .235 FLJ46354 NA NM_198547 23 TCCCAG|AAG→TCCCAG|AAA AG 9 3′ UTR 1:0
rs1558876 17 63876286 .455 KIAA1001 NA NM_014960 6 TTTCAG|CAC→TTTCAG|CAG AG 9 2 Delete S T→S 2:0
rs9611697k 22 40713170 SEPT3 Septin 3 NM_145733 8 GTGCAG|CAG→GTGCAG|CAC AG 9 0 Delete Q Q→H
rs3014960 13 44975382 .221 COG3 Component of oligomeric golgi complex 3 NM_031431 14 ATACAG|CAG→ATACAG|CAA AG 9 0 Delete Q
rs11597439 10 104174943 CUEDC2 CUE domain containing 2 NM_024040 2 CTTCAG|AAG→CTTCAG|AAC AG 9 5′ UTR 2:0
rs11574323 8 31101968 .011 WRN Werner syndrome NM_000553 23 GGGTAG|AAT→GGGTAG|AAG AG 9 2 QS versus H I→S
rs2425068 20 33678137 CPNE1 Copine I NM_152927 16 CCCCAG|CAA→CCCCAG|CAG AG 9 0 Delete Q 7:0
rs3738833 1 36528845 LSM10 LSM10, U7 small nuclear RNA associated NM_032881 2 CCACAG|CAA→CCACAG|CAG AG 9 5′ UTR
rs17105087 14 36250436 .378 SLC25A21 Solute carrier family 25 (mitochondrial oxodicarboxylate carrier), member 21 NM_030631 7 CTGCAG|CAA→CTGCAG|CAG AG 9 0 Delete Q
rs9606756 22 29331414 .131 TCN2 Transcobalamin II, macrocytic anemia NM_000355 2 TCTAAG|AAA→TCTAAG|AAG AG 9 1 Delete E I→V 2:0
rs2292402 3 142435751 .299 ACPL2 Acid phosphatase-like 2 NM_152282 2 GAGCAG|TGA→GTGCAG|TGA AG 2 5′ UTR
rs6670368k 1 89138996 KAT3 Kynurenine aminotransferase III NM_001008661 8 GTGTAG|GTG→GAGTAG|GTG AG 2 0 Insert stop
rs10152092 14 23103212 AP1G2 Adaptor-related protein complex 1, gamma 2 subunit NM_003917 11 GACCAG|GTA→GAGCAG|GTA AG 3 2 Insert S
rs12905385 15 40808855 .262 CDAN1 Congenital dyserythropoietic anemia, type I NM_138477 18 ATACAG|GAG→ATATAG|GAG N 4 1 Delete G
rs2250205 20 33331338 .389 ITGB4BP Integrin beta 4 binding protein NM_181467 4 TGACAG|GAG→TGATAG|GAG N 4 1 Delete G
rs5703 1 71040104 .068 PTGER3 Prostaglandin E receptor 3 (subtype EP3) NM_198713 4 TAATAG|GAG→TAACAG|GAG N 4 1 RE versus K
rs9635649j X 169594 GTPBP6 GTP binding protein 6 (putative) NM_012227 2 TTTTAG|GAG→TTTCAG|GAG N 4 2 Delete R
rs3087402 2 99510924 .011 REV1L REV1-like (yeast) NM_016316 7 TTCTAG|GAG→TTCCAG|GAG N 4 1 Delete G
rs762605 1 9032117 SLC2A5 Solute carrier family 2 (facilitated glucose/fructose transporter), member 5 NM_003039 12 CCGCAG|GAG→CCGTAG|GAG N 4 0 Delete E
rs3765166 2 172519287 .406 SLC25A12 Solute carrier family 25 (mitochondrial carrier, Aralar), member 12 NM_003705 6 TCACAG|GAG→TCATAG|GAG N 4 0 Delete E
rs2174769 1 37675452 .378 SNIP1 Smad nuclear interacting protein NM_024700 3 TTTTAG|GAG→TTTCAG|GAG N 4 0 Delete E
rs9866111 3 101857430 .469 GPR128 G protein-coupled receptor 128 NM_032787 13 ACATAG|GAG→ACACAG|GAG N 4 1 Delete G
rs10169344 2 113048652 .341 POLR1B Polymerase (RNA) I polypeptide B, 128 kDa NM_019014 15 TCCTAG|GAG→TCCCAG|GAG N 4 2 KS versus N
rs4933199 10 92621451 RPP30 Ribonuclease P/MRP 30 kDa subunit NM_006413 2 TTATAG|GAG→TTACAG|GAG N 4 5′ UTR
rs3821010 2 178508162 .375 PDE11A Phosphodiesterase 11A NM_016953 8 CCATAG|GAG→CCACAG|GAG N 4 1 Delete G
rs1046617 20 61844130 SLC2A4RG SLC2A4 regulator NM_020062 6 CCGCAG|GAG→CCGGAG|GAG N 4 2 Delete R
rs7279250 21 45539120 C21orf86 Chromosome 21 ORF 86 NM_153454 2 GACGAG|GAG→GACGTG|GAG AG 5 5′ UTR
rs12883949 14 102465744 AMN Amnionless homolog (mouse) NM_030943 8 CCGCAG|GAG→CCGCAA|GAG AG 6 1 Delete G
rs2306949 4 71520483 MUC7 Mucin 7, salivary NM_152291 2 TCCCAG|GAG→TCCCAA|GAG AG 6 5′ UTR 1:0
rs10981449k 9 112415891 KIAA1958 KIAA1958 NM_133465 2 CTTTAG|GAG→CTTTAA|GAG AG 6 5′ UTR
rs11591994 10 98072403 DNTT Deoxynucleotidyltransferase, terminal NM_004088 5 CATTAG|GAG→CATTAC|GAG AG 6 0 Delete E
rs3966262 22 23330878 GGT1 Gamma-glutamyltransferase 1 NM_013421 5 CTTCAG|GAG→CTTCAA|GAG AG 6 5′ UTR
rs6111953k 20 18355804 C20orf12 Chromosome 20 ORF 12 NM_018152 9 GTTAAG|GAG→GTTAAA|GAG AG 6 0 Delete E
rs11752742 6 64049085 GLULD1 Glutamate-ammonia ligase (glutamine synthase) domain containing 1 NM_016571 4 CTTTAG|GAG→CTTTAA|GAG AG 6 0 Delete E
rs193227 15 32421602 NOLA3 Nucleolar protein family A, member 3 (H/ACA small nucleolar RNPs) NM_018648 2 GAGCAG|AAA→GAGCAA|AAA AG 6 0 Insert Q
rs1509545 15 79206987 C15orf26 Chromosome 15 ORF 26 NM_173528 2 ATTAAG|GTG→ATTAAG|GAG AG 8 5′ UTR
rs1140420 15 61804764 HERC1 HECT (homologous to the E6-AP (UBE3A) carboxyl terminus) domain and RCC1 (CHC1)-like domain (RLD) 1 NM_003922 18 CTTTAG|GAG→CTTTAG|GGG AG 8 1 Delete G
rs2272238 12 32751569 .017 DNM1L Dynamin 1-like NM_012062 3 TTCCAG|GGG→TTCCAG|GAG AG 8 1 Delete G
rs11781386 8 141484978 T1 NA NM_031466 6 TTGCAG|GTG→TTGCAG|GAG AG 8 1 Delete G
rs1130329 7 4885795 .478 LOC389458 NA NM_203393 3 CCCCAG|GAG→CCCCAG|GCG AG 8 0 Delete E E→A
rs13405053 2 54274242 ACYP2 Acylphosphatase 2, muscle type NM_138448 2 ATACAG|GTG→ATACAG|GAG AG 8 1 Delete G
rs9272863 6 32718708 HLA-DQA1 Major histocompatibility complex, class II, DQ alpha 1 NM_002122 5 TTGCAG|GTG→TTGCAG|GAG AG 8 3′ UTR
rs4049844 22 23340850 GGT1 Gamma-glutamyltransferase 1 NM_005265 6 CCCCAG|GGG→CCCCAG|GAG AG 8 1 Delete G
rs4024419 7 151373558 MLL3 Myeloid/lymphoid or mixed-lineage leukemia 3 NM_170606 15 TCCTAG|GGG→TCCTAG|GAG AG 8 0 Delete E G→E
rs2236215 1 15637721 .463 KIAA0962 NA NM_015291 12 TTTTAG|GAG→TTTTAG|GCG AG 8 2 Delete R
rs5744948 12 131829664 .012 POLE Polymerase (DNA directed), epsilon NM_006231 37 ACCCAG|GAG→ACCCAG|GCG AG 8 0 Delete E E→A
rs4994616 1 148360461 TUFT1 Tuftelin 1 NM_020127 9 ACCCAG|GAG→ACCCAG|GCG AG 8 0 Delete E E→A
rs1136881j 6_hla_hap1 61968 HLA-DRB3 Major histocompatibility complex, class II, DR beta 3 NM_022555 4 TCTCAG|GAG→TCTCAG|GTG AG 8 1 RA versus T R→S
rs3209388 9 19366385 RPS6 Ribosomal protein S6 NM_001010 6 GTTTAG|GAG→GTTTAG|GGG AG 8 0 Delete E E→G
rs12816732 12 128219086 KIAA1944 NA NM_133448 5 TTCCAG|GAG→TTCCAG|GAA AG 9 0 Delete E
rs17036879 3 12535560 .055 TSEN2 tRNA splicing endonuclease 2 homolog (SEN2, S. cerevisiae) NM_025265 8 CTTTAG|GAG→CTTTAG|GAA AG 9 0 Delete E
rs1132591 19 59514796 LILRA5 Leukocyte immunoglobulin-like receptor, subfamily A (with TM domain), member 5 NM_181879 5 TCCTAG|GAT→TCCTAG|GAG AG 9 1 Delete G F→V
rs7182782 15 61729175 HERC1 HECT (homologous to the E6-AP (UBE3A) carboxyl terminus) domain and RCC1 (CHC1)-like domain (RLD) 1 NM_003922 54 TTTTAG|GAG→TTTTAG|GAA AG 9 1 Delete G G→R
rs3734334 6 36898850 .012 CPNE5 Copine V NM_020939 2 TTTCAG|GAA→TTTCAG|GAG AG 9 2 Delete R N→S
rs3816208 8 97683801 .115 SDC2 Syndecan 2 (heparan sulfate proteoglycan 1, cell surface-associated, fibroglycan) NM_002998 3 TTCCAG|GAG→TTCCAG|GAA AG 9 1 Delete G A→T
rs4150167 16 82771185 .026 TAF1C TATA box binding protein (TBP)-associated factor, RNA polymerase I, C, 110 kDa NM_139353 14 CAGGAG|AAG→CAGGAG|AAA AG 9 1 Delete G / Delete GE G→R
rs6836935 4 187918076 FAT FAT tumor suppressor homolog 1 (Drosophila) NM_005245 11 TTCCAG|GAA→TTCCAG|GAG AG 9 1 Delete G N→D
rs2843414 1 199126353 PPP1R12B Protein phosphatase 1, regulatory (inhibitor) subunit 12B NM_002481 4 CTGTAG|GAG→CTGTAG|GAA AG 9 1 Delete G V→I
rs9474034 6 51605478 PKHD1 Polycystic kidney and hepatic disease 1 (autosomal recessive) NM_138694 65 CCACAG|GAG→CCACAG|GAA AG 9 1 Delete G V→I
rs1040310 6 32115304 CYP21A2 Cytochrome P450, family 21, subfamily A, polypeptide 2 NM_000500 5 TGGCAG|GAC→TGGCAG|GAG AG 9 0 Delete E D→E
rs2854741 22 40861902 .246 CYP2D7P1 Cytochrome P450, family 2, subfamily D, polypeptide 7 pseudogene 1 NM_001002910 7 ACACAG|GAC→ACACAG|GAG AG 9 2 Delete R T→R
rs2156634 11 120281211 .191 GRIK4 Glutamate receptor, ionotropic, kainate 4 NM_014619 11 CTGCAG|GAG→CTGCAG|GAA AG 9 0 Delete E
rs2228112 5 127497758 .449 SLC12A2 Solute carrier family 12 (sodium/potassium/chloride transporters), member 2 NM_001046 6 TTCTAG|GAA→TTCTAG|GAG AG 9 0 Delete E
rs1152888 12 64891495 .327 IRAK3 Interleukin-1 receptor-associated kinase 3 NM_007199 5 CCTAAG|GAA→CCTAAG|GAG AG 9 1 Delete G I→V
rs4822258 22 41780029 .242 TTLL1 Tubulin tyrosine ligase-like family, member 1 NM_001008572 9 CACCAG|GAG→CACCAG|GAA AG 9 0 Delete E
rs1654551 19 56104480 KLK4 Kallikrein 4 (prostase, enamel matrix, prostate) NM_004917 2 CCTCAG|GAT→CCTCAG|GAG AG 9 1 Delete G S→A
rs7344530k 20 20119988 C20orf26 Chromosome 20 ORF 26 NM_015585 15 TTTCAG|GAT→TTTCAG|GAG AG 9 0 Delete E D→E
rs3179757 19 18190846 PDE4C Phosphodiesterase 4C, cAMP-specific (phosphodiesterase E1 dunce homolog, Drosophila) NM_000923 9 TCTTAG|GAG→TCTTAG|GAA AG 9 0 Delete E
rs3815354 17 9763745 GAS7 Growth arrest-specific 7 NM_201433 12 TTCCAG|GAG→TTCCAG|GAC AG 9 1 Delete G D→H

Note.— Data are from the UCSC Human Genome Browser (SNP source download January 9, 2005).

a Plausible NAGNAG acceptors are shown in bold italics; implausible NAGNAG acceptors are shown in italics.

b NA = no approved gene name.

c 9-nt acceptor context (−6…+3); reference genome sequence is on the left, and alternative SNP allele is on the right.

d AG = one of the two AGs is affected; N = one N in the NAGNAG pattern is affected.

e Position of the SNP in the 9-nt acceptor context (from 1 = intron position −6 to 9 = exon position +3).

f Intron phase of the intron or 5′/3′ UTR.

g Impact of alternative splicing on the protein sequence.

h aa change of missense SNPs.

i Number of ESTs/mRNAs confirming the NAGNAG splice event alternative to the given RefSeq.

j UCSC Human Genome Browser paralog warning; therefore, excluded from further analysis and table 6.

k dbSNP entry is based on a sequencing error; therefore, excluded from further analysis and table 6.

Searching dbEST (December 2004), we obtained confirmation for alternative splicing at 16% (19 of 121) of these tandem acceptors. However, this percentage must be considered a lower bound. In addition to the general limitations of an EST-based evaluation of alternative splicing (insufficient EST coverage, especially for tandem acceptors that are spliced in a tissue-specific manner), the allele frequencies of the NAGNAG alleles and population biases in EST sampling introduce further constraints. Notably, 18 (95%) of the 19 confirmed tandem acceptors match the consensus HAGHAG (H denotes A, C, or T). Thus, 26% (18 of 68) of the polymorphic HAGHAGs are EST confirmed, whereas only 1.9% (1 of 53) of the acceptors carrying G at one or both variable positions of the NAGNAG motif are EST supported. This is in line with our previous genomewide analysis, in which 31% of the HAGHAGs and only 1.7% of the remaining NAGNAGs were found to be experimentally confirmed (see table 1 of Hiller et al. [2004]). On the basis of these differences in the degree of confirmation by mRNA and EST data, we propose to subdivide all tandem acceptors into “plausible” (HAGHAG) and “implausible” (GAGHAG, HAGGAG, or GAGGAG) acceptors. Further support for this classification comes from the genomewide observation that all plausible NAGNAGs have the same bias toward intron phase 1, as described elsewhere (Hiller et al. 2004) for experimentally confirmed NAGNAGs, whereas the introns with implausible tandem acceptors are not biased toward phase 1 (table 3).
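The proposed subdivision amounts to a simple pattern test on the 6-nt motif. A minimal Python sketch follows; the function name `classify_tandem` and the returned labels are illustrative choices, not terminology fixed by the text beyond "plausible" and "implausible".

```python
import re

ANY_TANDEM = re.compile(r"[ACGT]AG[ACGT]AG")   # NAGNAG
PLAUSIBLE = re.compile(r"[ACT]AG[ACT]AG")      # HAGHAG, H = A, C, or T


def classify_tandem(motif):
    """Label a 6-nt acceptor motif as plausible, implausible, or no tandem."""
    motif = motif.upper()
    if not ANY_TANDEM.fullmatch(motif):
        return "no tandem"
    # Any G at one of the two N positions makes the tandem implausible.
    return "plausible" if PLAUSIBLE.fullmatch(motif) else "implausible"


for motif in ("CAGCAG", "GAGCAG", "CAGGAG", "GAGGAG", "CAACAG"):
    print(motif, classify_tandem(motif))
```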

Table 3.

Phase Distribution of Human Introns and NAGNAG Acceptors

Intron Class | No. (%) of Introns, Phase 0 | Phase 1 | Phase 2
Confirmed NAGNAGs (a, b) | 349 (39.8) | 379 (43.2) | 150 (17.0)
Plausible NAGNAGs (b) | 1,111 (42.5) | 1,099 (42.0) | 405 (15.5)
Implausible NAGNAGs (b) | 2,568 (54.5) | 1,466 (31.1) | 677 (14.4)
All introns (c) | (46) | (33) | (21)

Note.— Only NAGNAGs that are located upstream of a coding exon are considered.

a

EST/mRNA confirmed.

b

From table 1 of Hiller et al. (2004).

c

Genomewide frequencies (Long and Deutsch 1999).

Accordingly, 68 (56%) of the 121 SNPs affect a plausible NAGNAG. However, four of those convert a plausible into another plausible NAGNAG, which has presumably no drastic consequence for NAGNAG splicing, even though we cannot exclude the possibility of changes in the ratio of E to I transcripts or of changes in tissue specificity. Thus, we consider the remaining 64 (53%) SNPs as relevant for NAGNAG splicing (table 4).

Table 4.

SNPs Affecting Plausible NAGNAG Acceptors

dbSNP ID Gene Symbol RefSeq ID Exon Nucleotide Patterna mRNA/ESTb Intron Phasec Protein Impactd Chimpanzeee Human versus Chimpanzeef
rs2307130 AGL NM_000644 2 CTCTAG|AAG→CTCTGG|AAG 5′ UTR TAGAAG  Loss 
rs12944821 AP1GBP1 NM_007247 3 TTTCAG|CAG→TTTCAG|GAG 12:5 1 Delete G CAGCAG Loss
rs363209 APPBP1 NM_003905 7 AAACAG|CAC→AAGCAG|CAC 2 Insert S AAACAG Gain
rs2287800 AQP8 NM_001169 2 CGGCAG|ATA→CAGCAG|ATA 1:14 0 Insert Q CGGCAG Gain
rs13228988 AUTS2 NM_015570 8 CGATAG|CAG→CGATAA|CAG 4:2 2 Delete S TAGCAG Loss
rs8176139 BRCA1h NM_007304 8 GTTTAG|CAG→GTTGAG|CAG 26:7 0 Delete Q TAGCAG Loss
rs1152522 C14orf105 NM_018168 4 TCATAG|CAG→TCATGG|CAG 3:6 0 Delete Q TAGCAG Loss
rs1044833 C1QDC1h NM_001002259 18 AAACAG|CAG→AAACAT|CAG 1 Delete A CAGCAG Loss
rs11567804 C3AR1 NM_004054 2 TTGCAG|AAG→TTGCAA|AAG 5′ UTR CAGAAG Loss
rs11660370 CABLES1 NM_138375 3 TTTCAG|ATG→TTTCAG|AAG 2 Delete R CAGATG Gain
rs1804783 CACNA1A NM_023035 39 TTGCAG|GAG→TTGCAG|TAG 1 Delete V CAGGAG Gain
rs5248 CMA1 NM_001836 3 CAACAG|GTC→CAGCAG|GTC 15:21g 2 Insert S CAACAG Gain
rs3014960 COG3 NM_031431 14 ATACAG|CAG→ATACAG|CAA 0 Delete Q CAGCAA Gain
rs10914468 COL16A1 NM_001856 5 CTCCAG|AAG→CTCCAG|ACG 2 Delete R CAGACG Gain
rs2425068 CPNE1 NM_152927 16 CCCCAG|CAA→CCCCAG|CAG 79:7 0 Delete Q CAGCAA Gain
rs9463545 CRISP1 NM_001131 3 TAACAG|AAG→TAACCG|AAG 0 Delete K CAGAAG Loss
rs11597439 CUEDC2 NM_024040 2 CTTCAG|AAG→CTTCAG|AAC 49:2 5′ UTR CAGAAG Loss
rs3020724 CYP17A1 NM_000102 8 CTGCAG|AGC→CAGCAG|AGC 1 Insert A CTGCAG Gain
rs3025420 DBH NM_000787 11 CACCAG|GTT→CAGCAG|GTT 2 Insert S CACCAG Gain
rs12760076 DMAP1 NM_019100 8 TTGCAG|ATG→TTGCAG|AAG 0 Delete K CAGATG Gain
rs11661706 EPB41L3 NM_012307 12 CTGCAG|AGG→CTGCAG|AAG 1 Delete E CAGAGG Gain
rs2271959 ETV4 NM_001986 3 TCGCAG|AAA→TAGCAG|AAA 3:17 0 Insert Q TCGCAG Gain
rs13251099 FLJ36980 NM_182598 2 AAATAG|GTC→AAGTAG|GTC 5′ UTR AAATAG Gain
rs3765018 FLJ46354 NM_198547 23 TCCCAG|AAG→TCCCAG|AAA 11:1 3′ UTR CAGAAA Gain
rs6285 GABRB1 NM_000812 3 CGGCAG|GGC→CAGCAG|GGC 1 Insert A NA NA
rs4590242 GABRR1 NM_002042 2 TGGTAG|GCC→TAGTAG|GCC 2:2 2 Insert S TAGTAG Loss
rs2409496 GART NM_175085 6 AATCAG|GAG→AATCAG|CAG 0 Delete Q CAGGAG Gain
rs751517 GGA1 NM_013365 10 TTCCAG|CGG→TTCCAG|CAG 1 Delete A NA NA
rs2010657 GGT1h NM_013421 2 CCCCAG|CGG→CCCCAG|CAG 5′ UTR CAGCGG Gain
rs9644946 GOLGA1 NM_002077 8 AAATAG|GAG→AAGTAG|GAG 0 Insert stop AAATAG Gain
rs2243187 IL19 NM_153758 5 TCACAG|CAG→TCACAA|CAG 0 Delete Q CAGCAG Loss
rs2290609 IL5RA NM_000564 5 CAACAG|TTT→CAGCAG|TTT 1 I versus TV CAACAG Gain
rs2297988 KIAA0690 NM_015179 33 AAGCAG|AAA→GAGCAG|AAA 0 Insert Q GAGCAG Gain
rs1558876 KIAA1001 NM_014960 6 TTTCAG|CAC→TTTCAG|CAG 10:2 2 Delete S CAGCAG Loss
rs3746373 KIAA1510 NM_020882 6 CCCCAG|CCG→CCCCAG|CAG 1 Delete A CAGCCG Gain
rs2290647 KIAA1533 NM_020895 11 CTCCAG|CGG→CTCCAG|CAG 34:35 1 Delete A CAGCGG Gain
rs3738833 LSM10 NM_032881 2 CCACAG|CAA→CCACAG|CAG 5′ UTR CAGCAA Gain
rs479984 MGC35555 NM_178565 5 TACTAG|AAG→TACTAA|AAG 1 Delete E TAGAAG Loss
rs3751353 MGC48915 NM_178540 4 ATTTAG|GAG→ATTTAG|CAG 1 Delete A TAGGAG Gain
rs11042902 MRVI1 NM_006069 2 AACCGG|CAG→AACCAG|CAG 2:2 2 NR versus K CAGCAG Loss
rs2298847 MT1G NM_005950 2 TAGCAG|GTG→TTGCAG|GTG 59:14 1 Insert A TTGCAG Gain
rs2273431 NID2 NM_007361 10 ATGCAG|AGG→ATGCAG|AAG 1 Delete E CAGAGG Gain
rs12974798 NTE NM_006702 35 TCGCAG|GAG→TCGCAG|AAG 0 Delete K CAGGAG Gain
rs17173698 PAPSS2 NM_004670 2 TTATAG|GAG→TTATAG|AAG 0 Delete K TAGGAG Gain
rs3842776 PARVG NM_022141 4 TTCCAG|GAG→TTCCAG|CAG 1 Delete A CAGGAG Gain
rs1438073 PDE1A NM_001003683 3 AAATAG|ACT→AAGTAG|ACT 2 Insert R AAATAG Gain
rs3816280 PPP4R1 NM_005134 5 CAATAG|AAC→CAGTAG|AAC 1 Insert V CAATAG Gain
rs879022 REGL NM_006508 3 GGACAG|GAG→GGACAG|AAG 1 Delete E CAGGAG Gain
rs1127307 RGS19IP1 NM_202494 6 CAATAG|CGG→CAATAG|CAG 87:8 1 Delete A NA NA
rs16960071 SEMA6D NM_020858 16 ATGAAG|GCT→AAGAAG|GCT 2 Insert R ATGAAG Gain
rs11553436 SERHL NM_170694 11 CTCCAG|CGG→CTCCAG|CAG 3′ UTR CAGCGG Gain
rs2243603 SIRPB1 NM_006065 5 TTCCAG|AAG→TTCCAG|AAC 1 Delete E CAGAAC Gain
rs17105087 SLC25A21 NM_030631 7 CTGCAG|CAA→CTGCAG|CAG 0 Delete Q CAGCAA Gain
rs2521612 SLC4A1 NM_000342 17 CCGTAG|GCT→CAGTAG|GCT 2 Insert R CCGTAG Gain
rs9621415 SLC5A4 NM_014227 9 CGGCAG|GTC→CAGCAG|GTC 0 Insert Q CGGCAG Gain
rs9606756 TCN2 NM_000355 2 TCTAAG|AAA→TCTAAG|AAG 62:2 1 Delete E AAGAAA Gain
rs11466221 TGFAh NM_003236 2 CAACAG|GTA→CAGCAG|GTA 1 Insert A CAACAG Gain
rs2245425 TOR1AIP1h NM_015602 3 TAGCAG|TGA→TAACAG|TGA 13:41 1 Insert A TAGCAG Loss
rs1071716 TPM2 NM_213674 6 CCCCAG|CCG→CCCCAG|CAG 2 Delete S CAGTCG Gain
rs4434604 TRIM55 NM_033058 8 TACCAG|AAG→TACCAG|AGG 1 Delete E CAGAAG Loss
rs7862221 TSC1 NM_000368 14 CTTCAG|AAG→CTTCAG|AGG 1 Delete E CAGAAG Loss
rs11574323 WRN NM_000553 23 GGGTAG|AAT→GGGTAG|AAG 2 QS versus H TAGAAT Gain
rs2275992 ZFP91h NM_170768 5 TTTTAG|TAG→TTTTAG|TGG 31:7 2 Delete S TAGTAG Loss
rs200925 ZNF248 NM_021045 5 TAACAG|GGT→TAGCAG|GGT 1 Insert A TAACAG Gain

Note.— NA = sequence context not available in panTro1.

a

9-nt acceptor context; reference genome sequence is on the left, and SNP allele is on the right. Polymorphic position is shown in bold italics.

b

Number of mRNA and ESTs that match the E:I transcripts (shown only if both transcripts are EST confirmed).

c

Phase of the intron or 5′/3′ UTR.

d

Impact of alternative NAGNAG splicing on the protein sequence.

e

Chimpanzee sequence orthologous to human NAGNAG.

f

Gain = plausible NAGNAG in one of the human alleles and no or implausible NAGNAG in chimpanzee; loss = no or implausible NAGNAG in one of the human alleles and plausible NAGNAG in chimpanzee.

g

Experimentally confirmed within the present study.

h

Experimentally confirmed elsewhere (Tadokoro et al. 2005).
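The gain/loss assignment defined in footnote f can be sketched as follows. This is an illustrative reading of the table, not the authors' pipeline: it assumes that a tandem acceptor in the 9-nt context occupies either positions 1 to 6 (NAGNAG|NNN) or positions 4 to 9 (NNNNAG|NAG), and the function names and the "not relevant" label are hypothetical.

```python
import re

HAGHAG = re.compile(r"[ACT]AG[ACT]AG")
NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")

def classify_hexamer(hexamer):
    """Classify a 6-nt motif as plausible (HAGHAG), implausible, or none."""
    if HAGHAG.fullmatch(hexamer):
        return "plausible"
    if NAGNAG.fullmatch(hexamer):
        return "implausible"
    return "none"

def classify_context(context):
    """Classify a 9-nt acceptor context such as 'TTTCAG|CAG'."""
    seq = context.replace("|", "").upper()
    if len(seq) != 9:
        raise ValueError("expected a 9-nt acceptor context")
    # Assumption: the tandem sits at positions 1-6 or 4-9 of the context.
    classes = {classify_hexamer(seq[0:6]), classify_hexamer(seq[3:9])}
    for label in ("plausible", "implausible"):
        if label in classes:
            return label
    return "none"

def gain_or_loss(pattern, chimp_hexamer):
    """Apply footnote f to a 'reference→SNP allele' pattern and the chimpanzee motif."""
    ref_context, snp_context = pattern.split("→")
    ref_plausible = classify_context(ref_context) == "plausible"
    snp_plausible = classify_context(snp_context) == "plausible"
    if ref_plausible == snp_plausible:
        return "not relevant"   # both or neither allele is a plausible NAGNAG
    if chimp_hexamer == "NA":
        return "NA"             # no chimpanzee sequence context available
    chimp_class = classify_hexamer(chimp_hexamer.upper())
    return "loss" if chimp_class == "plausible" else "gain"

print(gain_or_loss("TTTCAG|CAG→TTTCAG|GAG", "CAGCAG"))  # rs12944821
print(gain_or_loss("CGGCAG|ATA→CAGCAG|ATA", "CGGCAG"))  # rs2287800
```

For the two rows shown, the sketch returns "loss" and "gain", in agreement with the last column of the table.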

Cases of SNPs that comprise NAGNAG-acceptor and non-NAGNAG–acceptor alleles represent knockout experiments made by nature. We took this opportunity to investigate the assumed correlation between NAGNAG-acceptor genotypes and the appearance of E and I transcripts. Such a study seemed reasonable, since, so far, it has been performed in artificial splicing systems only (Tadokoro et al. 2005). We selected six SNPs with a heterozygosity of >0.2 that affect EST-confirmed HAGHAG acceptors for genotyping and detection of transcript forms. In two cases, we did not find either genotypes with at least one NAGNAG allele or genotypes that are homozygous for the non-NAGNAG allele. In the remaining four cases, we consistently observed E and I transcripts in cells with at least one HAGHAG allele, whereas cells that do not have a HAGHAG acceptor allele produced only one transcript (table 1). This strict correlation between NAGNAG alleles and alternative splicing is illustrated for ZFP91 and TOR1AIP1 in figure 2. These results confirm that NAGNAG motifs are necessary for this type of alternative splicing.

Figure 2.

SNPs that affect plausible NAGNAG acceptors as knockout experiments made by nature. A, Schematic representation of the nomenclature of NAGNAG acceptors (left) and transcripts (right). B, SNP rs2245425 affecting the E acceptor of TOR1AIP1 exon 3 leads to the exclusive expression of the I transcript from the A allele (NAGNAG position −4; for numbering scheme, refer to fig. 1). C, SNP rs2275992 affecting the I acceptor of ZFP91 exon 5 leads to the exclusive expression of the E transcript from the G allele (position −2). Homozygous NAGNAG allele (a), heterozygous (b), and homozygous non-NAGNAG allele (c) are shown as genomic with genotypes (left); cDNA with E:I transcript ratio determined by counting subcloned and sequenced RT-PCR fragments (right). The asterisk (*) denotes E transcripts that can be assigned to the SNP alleles in the I acceptor (A=15; G=72).

Next, we asked whether NAGNAG motifs created by the nonancestral SNP alleles are also sufficient for alternative splicing. With regard to the human reference sequence, in 36 (56%) of 64 cases, a novel NAGNAG is created; in 18 (28%), a known NAGNAG is destroyed by affecting an AG; and in 10 (16%), the N positions are changed. Since the appearance of a SNP allele in the current human genome build is rather random and does not reflect either the relative allele frequency in a defined population or its evolutionary history, the best reference for the question of gain versus loss of NAGNAG acceptors is the UCSC Chimpanzee Genome Browser (panTro1, November 2003). When the sequence context of the 64 plausible NAGNAG-affecting SNPs is compared, for 61 (95%), the orthologous chimpanzee nucleotide is identical to one of the two human alleles, which we therefore consider the ancestral one (Watanabe et al. 2004). In 43 cases, the plausible NAGNAG is gained (nonancestral), and, in 18 cases, it is lost (ancestral). Consistent with our assumption that novel plausible NAGNAGs are very likely functional, we found EST evidence of alternative splicing in 16% (7 of 43) (table 4). To provide further experimental support that respective SNP alleles enable alternative NAGNAG splicing, we selected two nonancestral plausible NAGNAGs without EST evidence. As expected, in leukocytes of individuals heterozygous or homozygous for the respective tandem allele of rs5248, we observed the expression of E and I transcripts (GenBank accession numbers DQ082727 and DQ082729) in the ratios 4:14 and 11:7, respectively (table 4). In the case of rs17105087, we were unable to identify the nonancestral allele in our white population sample. By analyzing the human-chimpanzee genomic sequence context of the eight confirmed nonancestral NAGNAG alleles, we found three cases in which both genomes are identical in a long range (rs2287800 [−140/+123 identical nucleotides], rs3765018 [−130/+95 nt], and rs2290647 [−105/+70 nt]). Since most splice enhancers function only within a distance of <100 nt from the affected splice site (Schaal and Maniatis 1999), these findings suggest that NAGNAG motifs are sufficient for alternative splicing in the context of a previously non-NAGNAG acceptor.

Evolutionary Aspects of SNPs in NAGNAG Acceptors

At first glance, surprisingly, the large majority (43 [70%] of 61) of the plausible NAGNAGs are created (35 novel tandem AG alleles and 8 conversions of implausible into plausible), whereas only 18 are destroyed (16 AG destructions and 2 conversions of plausible into implausible). Therefore, we questioned whether there is a trend toward gain-of-NAGNAG acceptors in the human lineage. To test this, we used a null model that maps SNPs to randomly chosen acceptors (see appendix A) and found nearly the same relation for gain and loss of plausible NAGNAG acceptors. Thus, the high number of nonancestral plausible NAGNAGs is presumably a consequence of the fact that NAGNAG motifs represent only 5% of all human acceptors (Hiller et al. 2004). In consequence, in recent primate genomes, a constant bias seems to exist toward the accumulation of NAGNAG acceptors, which leads to an increased complexity of the transcriptome and proteome, antagonized by purifying selection. The question of whether the currently observed NAGNAG fraction among human acceptors represents the saturation level has to be addressed by further comparative genomewide analyses.

Furthermore, we observed striking differences in the numbers of SNPs that affect the AG of the E or I acceptor in ancestral plausible and implausible NAGNAGs, respectively. For the 16 ancestral plausible HAGHAGs, the E acceptor is affected in 11 cases and the I acceptor in 5. In contrast, for 22 implausible HAGGAGs (one ancestral GAGGAG and two GAGHAGs were omitted), we found 5 and 17 cases, respectively (Fisher’s exact test P=.00766). Interestingly, we observed the same trend by comparing all 138 human NAGNAGs that are not conserved in the chimpanzee genome (one GAGGAG and seven GAGHAGs were omitted). The I acceptors of 79 HAGHAGs are affected in 56% (44), whereas the GAG of 59 HAGGAGs is affected in 83% (49) (Fisher’s exact test P=.0009). Implausible GAGGAG and GAGHAG motifs were not considered, since the number of cases is too small.
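The two comparisons above are 2×2 contingency tables and can be recomputed with a few lines of code. The sketch below implements a two-sided Fisher's exact test from the hypergeometric distribution; that the reported P values are two-sided is an assumption made here, and the function name is illustrative.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance so that ties are not lost to rounding.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Ancestral NAGNAGs with an affected AG: rows = HAGHAG, HAGGAG; columns = E, I acceptor.
p_ancestral = fisher_exact_two_sided(11, 5, 5, 17)
print(f"{p_ancestral:.5f}")  # 0.00766

# Human NAGNAGs not conserved in chimpanzee: 79 HAGHAGs (44 I affected), 59 HAGGAGs (49 GAG affected).
p_nonconserved = fisher_exact_two_sided(35, 44, 10, 49)
print(f"{p_nonconserved:.4f}")
```

The first table reproduces the reported P=.00766 under the two-sided assumption.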

Since tandem acceptors are nonrandomly distributed in the human genome, with a bias toward intron phase 1 and toward single-aa indels in phase 1 and 2, we questioned whether the nonancestral plausible NAGNAGs are also biased. Indeed, these NAGNAGs show the same bias toward intron phase 1, and they also have a strong tendency to result in single-aa indels (table 5). Thus, the process of establishing SNPs that are relevant for alternative NAGNAG splicing in the human population seems to be a nonrandom process that is subject to the same evolutionary forces as the maintenance of the tandem acceptors themselves.

Table 5.

Intron Phase Distribution and Single aa Events of Nonancestral Plausible NAGNAG Acceptors

Acceptor | No. (%) of Introns, Phase 0 | Phase 1 | Phase 2 | No. (%) of Single-aa Events, Phases 1 and 2
Nonancestral NAGNAG alleles (a) | 12 (31.6) | 16 (42.1) | 10 (26.3) | 24 (92.3)
Nonpolymorphic confirmed NAGNAGs (b) | 349 (39.8) | 379 (43.2) | 150 (17.0) | 487 (92.1)

Note.— Only NAGNAGs that are located upstream of a coding exon are considered.

a

Plausible polymorphic NAGNAGs for which the chimpanzee acceptor has no NAGNAG.

b

EST/mRNA-confirmed NAGNAGs (Hiller et al. 2004).

Discussion

Since splicing variations are coming more and more into the research focus of human molecular genetics (Lopez-Bigas et al. 2005), novel approaches are needed to identify splice-relevant SNPs. By data mining the SNP annotation of the UCSC Human Genome Browser, we identified 121 variations that may affect alternative splicing by creation, destruction, or changing of NAGNAG acceptors. To improve the specificity of our prediction, we classified NAGNAG acceptors into “plausible” (HAGHAG) and “implausible” (GAGHAG, HAGGAG, or GAGGAG) ones. This subdivision of the tandem acceptors, primarily based on the degree of confirmation by mRNA and EST data, is further supported by (1) the fact that GAG acceptors are very rare (Stamm et al. 2000), (2) our genomewide observation that only plausible and not implausible NAGNAGs have the same bias toward intron phase 1 as experimentally confirmed NAGNAGs (Hiller et al. 2004), and (3) the observed differences in the numbers of SNPs that affect the AG of the E or I acceptor in ancestral plausible and implausible NAGNAGs, respectively. The last indicates that the selection pressure to maintain the E acceptor for HAGGAGs is higher than the pressure to preserve the coding sequence, since destruction of the HAG acceptor will leave a GAG that is unlikely to act as an acceptor site. In contrast, for plausible HAGHAGs, destruction of either AG is much less deleterious, since the other will still function as an acceptor. Thus, the identified 64 SNPs in plausible NAGNAGs are highly predictive of variations in alternative splicing. Nevertheless, it represents an experimental and bioinformatic challenge for future research to elucidate what makes the rare cases of confirmed implausible NAGNAG acceptors.

Although it seems obvious that the disruption of a plausible NAGNAG acceptor abolishes the formation of alternative transcripts, SNPs in these motifs provide us with unique knockout experiments by nature to confirm this hypothesis experimentally. Analyzing the expression of E and I transcripts in cells with at least one HAGHAG allele or without HAGHAG alleles, we have shown that the NAGNAG motif is necessary for this type of alternative splicing. In a subsequent analysis, we asked whether NAGNAG motifs created by the nonancestral SNP alleles allow alternative splicing. Usually, the introduction of an AG anywhere in the pre-mRNA does not create a functional acceptor site, since a polypyrimidine tract upstream and possibly enhancer sequences are required for recognition by the spliceosome. However, we suppose that the creation of a second AG 3 bases up- or downstream of an existing acceptor is very likely to result in a functional tandem acceptor, since the splice-relevant sequence context is already present.

Referring to the chimpanzee genome as the reference for ancestral SNP alleles, we found EST and RT-PCR evidence that novel plausible NAGNAGs are most likely functional. This implies that a change of a normal acceptor to a plausible NAGNAG acceptor by a single mutation is sufficient to enable alternative splicing. Although the mechanism of NAGNAG splicing is not understood in detail, our findings argue against a general involvement of signals other than the NAGNAG itself. Thus, we conclude that SNPs in plausible NAGNAGs have an influence on NAGNAG splicing, regardless of whether the NAGNAG is ancestral. However, additional signals might be necessary for regulation of alternative splicing at tandem acceptors.

Most interestingly, 23% (15 of 64) of SNPs in plausible NAGNAGs are translationally nonsilent and, thus, introduce a novel dimension of variability on the protein level by changing the I acceptor and the aa sequence of the E protein. Whereas homozygotes express either one or two isoforms, heterozygosity results in three different proteins (fig. 3). As listed in the Human Gene Mutation Database, the aa change can be dramatic—for example, as from Glu to the oppositely charged Lys in PAPSS2 (rs17173698), which leads to a decrease in immunoreactive protein (Xu et al. 2002). However, the third isoform of the protein generated by alternative NAGNAG splicing had not been taken into consideration. Moreover, it is conceivable that some of the SNPs in NAGNAG acceptors that allow the formation of three protein isoforms in heterozygotes may confer a heterozygous advantage.

Figure 3.

SNP affecting the I acceptor and the aa sequence of the E protein (rs2275992 in ZFP91). Homozygosity of the G allele without a NAGNAG results in the expression of one protein (A), homozygosity of the A allele with the NAGNAG results in two (B), and heterozygosity results in three isoforms (C). All three transcripts are confirmed by at least four ESTs/mRNAs. The two allele variants are highlighted in light and dark gray. Amino acids are shown below the second codon position. Upper- and lowercase letters indicate exonic and intronic nucleotides, respectively. Exons are boxed.
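The count of one, two, or three protein isoforms can be made explicit with a small model of the ZFP91 case. This sketch assumes the situation shown in the figure: the SNP lies in the I acceptor, the A allele carries the NAGNAG, and the I transcript does not contain the polymorphic position because the second NAG is spliced out. The function name and the tuple encoding are hypothetical.

```python
def protein_isoforms(genotype, nagnag_allele="A"):
    """Return the set of protein isoforms expected for a genotype such as 'AG'.

    Each isoform is a (transcript, allele) pair. The E protein differs between
    alleles because the SNP is translationally nonsilent; the I protein does
    not carry the SNP position, so its allele is recorded as None.
    """
    isoforms = set()
    for allele in set(genotype):
        isoforms.add(("E", allele))       # every allele yields an E transcript
        if allele == nagnag_allele:
            isoforms.add(("I", None))     # only the NAGNAG allele yields an I transcript
    return isoforms

for genotype in ("GG", "AA", "AG"):
    print(genotype, len(protein_isoforms(genotype)))
```

The three genotypes give 1, 2, and 3 isoforms, respectively, as in panels A, B, and C.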

Alternative splicing at tandem acceptors can result in the gain/loss of a premature stop codon in the mRNA. Among SNPs affecting plausible NAGNAGs, the G allele of SNP rs9644946 changes the acceptor context of GOLGA1 exon 8 from AAATAG to AAGTAG. Since intron 7 resides in phase 0, an in-frame TAG insertion would be the consequence if the novel E acceptor is used. Interestingly, the gene codes for an autoantigen associated with Sjogren syndrome (MIM 270150). Since the E acceptor is preferred in alternative NAGNAG splicing (Hiller et al. 2004), the novel AAG acceptor is likely to be functional. The resulting E transcript is a candidate for nonsense-mediated mRNA decay (Maquat 2004). Thus, the AAGTAG allele would result in a lower protein expression. Alternatively, it is possible that the mRNA containing the premature stop codon escapes degradation, and the truncated protein may exhibit autoantigenic properties. It remains to be elucidated, in populations with a sufficiently high allele frequency (e.g., 0.099 in the PERLEGEN panel that contains 24 samples of Chinese descent), whether alternative splicing at the AAGTAG acceptor contributes to the disease.
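The reasoning for phase-0 introns can be sketched in code: because the intron lies between two codons, the second NAG of the tandem is a complete codon that is present in the E transcript and absent from the I transcript. The function below is an illustration restricted to phase 0 and uses the standard genetic code for the four possible NAG codons; its name is hypothetical.

```python
# Standard genetic code for the four NAG codons ('*' marks a stop codon).
NAG_CODONS = {"AAG": "K", "CAG": "Q", "GAG": "E", "TAG": "*"}

def phase0_indel(nagnag):
    """Return the residue (or '*') encoded by the second NAG of a tandem acceptor.

    Valid only for acceptors of phase-0 introns, where use of the E acceptor
    adds exactly this codon to the transcript.
    """
    motif = nagnag.upper()
    if len(motif) != 6 or motif[1:3] != "AG" or motif[4:6] != "AG":
        raise ValueError("expected a 6-nt NAGNAG motif")
    return NAG_CODONS[motif[3:6]]

print(phase0_indel("AAGTAG"))  # GOLGA1, G allele of rs9644946: '*'
print(phase0_indel("CAGCAG"))  # AQP8, rs2287800: 'Q'
```

For phase-1 and phase-2 introns the inserted nucleotides span two codons, so the flanking exon sequence would be needed and this shortcut does not apply.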

A second example of potential disease relevance is the SNP rs363209, the G allele of which creates a novel plausible AAGCAG acceptor of intron 6 in APPBP1 (GenBank accession number NM_003905). The APP-BP1 protein binds to the carboxyl-terminal region of the amyloid precursor protein (APP) and interacts with the ubiquitin-activating enzyme E1C (UBE1C [homolog to yeast Uba3]) in the process of neddylation (Walden et al. 2003). APP plays a central role in Alzheimer disease and Down syndrome. Dysfunction of the APP-BP1 interaction with APP has been suggested to be one cause of Alzheimer disease (Chen 2004). The protein-protein interactions of the APP-BP1 E and I isoforms may be different and modulate the respective processes. It should be mentioned that the UBE1C gene (GenBank accession number NM_003968) itself contains a tandem acceptor (CAGAAG in front of exon 11). This may further increase the flexibility of the neddylation process by all four combinations of the E/I protein isoforms from two genes each.

The disease relevance of a NAGNAG SNP is demonstrated for the ABCA4 gene (Maugeri et al. 1999). Maugeri et al. (1999) describe a NAGNAG mutation (2588G→C, changing the acceptor site TAGGAG→TAGCAG) that has a much higher frequency in patients with Stargardt disease 1 (STGD1 [MIM 248200]) and that is assumed to be a mild mutation that causes STGD1 in combination with a severe ABCA4 mutation. By experimental analysis of the splice patterns of two patients with STGD1 who carry the mutation and one control individual, they found that only the alleles with the TAGCAG produce two splice forms. Our study exactly predicts this mutation outcome.

In general, most of the SNPs that are described in the present study, in particular those in plausible NAGNAGs, affect the E:I transcript ratio, depending on the cell’s genotype. SNP alleles with a destroyed E acceptor cause the exclusive expression of the I transcript. Alleles that destroy an I acceptor result in an exclusive expression of the longer E transcript. SNPs that comprise a plausible and an implausible NAGNAG allele will seriously hamper or disable splicing at the GAG acceptor. It has already been shown that a change in the ratio of alternative splice forms can cause diseases. For example, the change in the ratio of the alternative MAPT transcripts containing three or four microtubule-binding repeats may be causal for frontotemporal dementia (MIM 600274) (Spillantini et al. 1998). Another example is the WT1 gene, in which alternative donor usage results in two protein isoforms that differ in 3 aa (+KTS/−KTS isoforms) and function (Englert et al. 1995). The altered ratio of +KTS/−KTS leads to Frasier syndrome (MIM 136680) (Barbaux et al. 1997). This situation is similar to that of NAGNAG acceptors, since E/I protein isoforms are observed that have functional differences (Condorelli et al. 1994; Tadokoro et al. 2005).
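These rules can be written as a small predictor of the transcripts expected from a single allele. The sketch below is a deliberate simplification: it treats every GAG as nonfunctional, although the text says only that splicing at a GAG acceptor is seriously hampered or disabled, and it ignores transcript ratios. The function name is hypothetical.

```python
def expected_transcripts(hexamer):
    """Predict which transcripts ('E', 'I') an allele produces from its 6-nt motif.

    Positions 1-3 form the E acceptor and positions 4-6 the I acceptor.
    Simplifying assumption: an acceptor is usable only if it is a HAG.
    """
    motif = hexamer.upper()
    if len(motif) != 6:
        raise ValueError("expected a 6-nt motif")
    transcripts = set()
    if motif[1:3] == "AG" and motif[0] != "G":
        transcripts.add("E")
    if motif[4:6] == "AG" and motif[3] != "G":
        transcripts.add("I")
    return transcripts

print(expected_transcripts("TAGCAG"))  # TOR1AIP1, NAGNAG allele
print(expected_transcripts("TAACAG"))  # TOR1AIP1, A allele of rs2245425
print(expected_transcripts("TAGTGG"))  # ZFP91, G allele of rs2275992
```

The three calls return both transcripts, only the I transcript, and only the E transcript, in line with the observations reported for TOR1AIP1 and ZFP91.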

Altogether, 28% (18 of 64) of the plausible NAGNAG SNPs occur in known disease genes (table 6). Thus, they are preferable candidates for more-detailed functional analysis and association studies to link alternative splicing with diseases. Currently, there are no general methods that allow the prediction of splice-relevant SNPs. Focusing on SNPs that affect NAGNAG acceptors, we present a highly effective approach for the identification of SNPs that result in variations in alternative splicing patterns.

Table 6.

Human Disease Genes with SNPs Affecting Plausible NAGNAG Acceptors

dbSNP ID Gene Symbol RefSeq ID Disease MIM Number(s) PubMed ID(s)
rs3020724 CYP17A1 NM_000102 Adrenal hyperplasia, congenital #202110, *609300 4303304
rs12042060 FIBL-6 NM_031935 Age-related macular degeneration #603075, *608548 14570714
rs2243187 IL19 NM_153758 Asthma *605687 15557163
rs8176139 BRCA1 NM_007304 Breast cancer *113705, #114480 9167459
rs11567804 C3AR1 NM_004054 Bronchial asthma *605246 15278436
rs3025420 DBH NM_000787 Congenital dopamine-beta-hydroxylase deficiency #223360, *609312 14991826
rs2409496 GART NM_175085 Down syndrome *138440 9328467
rs1804783 CACNA1A NM_023035 Episodic ataxia-2, familial hemiplegic migraine, spinocerebellar ataxia-6, idiopathic generalized epilepsy #183086, #141500, #108500, *601011 8988170, 8898206, 9302278
rs2010657 GGT1 NM_013421 Glutathionuria +231950 238530, 7623451
rs2307130 AGL NM_000644 Glycogen storage disease type III +232400 9032647, 10925384
rs1833783 FTL NM_000146 Hyperferritinemia-cataract syndrome #600886, *134790 7493028, 12199804
rs11661706 EPB41L3 NM_012307 Meningioma, lung cancer *605331 10888600, 9892180
rs2275992 ZFP91 NM_170768 Acute myeloid leukemia #601626 12738986,
rs1071716 TPM2 NM_213674 Nemaline myopathy-4, distal arthrogryposis 1 #609285, #108120, *190990 11738357, 12592607
rs2521612 SLC4A1 NM_000342 Renal tubular acidosis, ovalocytosis, spherocytosis #179800, 166900, +109270 9600966, 1737855, 9973643
rs9644946 GOLGA1 NM_002077 Sjogren syndrome 270150, *602502 9324025
rs17173698 PAPSS2 NM_004670 Spondyloepimetaphyseal dysplasia *603005 9714015
rs9606756 TCN2 NM_000355 Transcobalamin II deficiency +275350 14632784
rs7862221 TSC1 NM_000368 Tuberous sclerosis #191100, *605284 12773162, 14551205
rs11574323 WRN NM_000553 Werner syndrome #277700, *604611 9012406, 8968742

Acknowledgments

The skillful technical assistance of Ivonne Görlich is gratefully acknowledged. This work was supported by German Ministry of Education and Research grants 01GS0426 (to S.S.) and 01GR0105 and 0312704E (to M.P.) as well as Deutsche Forschungsgemeinschaft grant SFB604-02 (to M.P.).

Appendix A: Randomization Null Model for NAGNAG SNPs

To assess whether there is a preference for creating plausible NAGNAGs, we used a simulation that assigns a new acceptor to the 2,896 SNPs that overlap an acceptor in the 9-nt context and evaluates a possible NAGNAG-relevant outcome. For the 2,896 SNPs, we blasted the 101-nt genomic context (50 nt upstream and 50 nt downstream of the SNP) against the chimpanzee genome to determine the ancestral allele variant. We kept alignments with at least 95% identity and no mismatches in a ±5-nt context around the SNP position. This yielded a total of 2,439 SNPs. Then, we blasted the 103-nt contexts (50 nt up- and downstream of the acceptor NAG) of 10,000 human acceptor sites (excluding the acceptors that are overlapped by a known SNP) against the chimpanzee genome and kept 8,082 for which we found an alignment (95% identity and no mismatch ±10 nt around the NAG). Then, we assigned a new acceptor (randomly chosen from the 8,082) to a given SNP. We chose an acceptor with the ancestral allele variant at the respective position (e.g., if a SNP changes a C→G at position 4 of the 9-nt context, the new acceptor must also have a C at position 4). Since a methylated C in a CG context frequently mutates to a T, we assigned a new acceptor with the same sequence context at this position if the SNP represents a C→T mutation in a CG context (or a G→A mutation in a CG context, which corresponds to a C→T mutation on the opposite strand). This assures that context-dependent mutations are simulated in the same context. If a new acceptor is assigned to a SNP, we evaluated the possible impact on a NAGNAG acceptor. For each of the 2,439 SNPs, we successively assigned 10 randomly chosen acceptors (avoiding duplicate assignments).
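The alignment filter described above can be illustrated with a short function. This is a sketch under explicit assumptions that are not stated in the text: the human and chimpanzee sequences are given as gapless, equal-length aligned strings; identity is computed over all columns; and the SNP position itself is exempt from the flank check, because the chimpanzee base is expected to match only one of the two human alleles. The function and parameter names are hypothetical.

```python
def keep_alignment(human, chimp, snp_index, min_identity=0.95, flank=5):
    """Decide whether a human-chimpanzee alignment passes the filter.

    human, chimp: aligned sequences of equal length (no gaps assumed).
    snp_index: 0-based position of the SNP in the alignment.
    """
    if len(human) != len(chimp):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(h == c for h, c in zip(human, chimp))
    if matches / len(human) < min_identity:
        return False
    start = max(0, snp_index - flank)
    end = min(len(human), snp_index + flank + 1)
    # No mismatch is allowed in the flank, except at the SNP position itself.
    return all(human[i] == chimp[i] for i in range(start, end) if i != snp_index)

human = "ACGT" * 25 + "A"                    # 101-nt context, SNP at index 50
chimp = human[:52] + "A" + human[53:]        # one mismatch 2 nt from the SNP
print(keep_alignment(human, human, 50))
print(keep_alignment(human, chimp, 50))
```

For the acceptor contexts, the same function could be called with `flank=10` around the NAG, again as an approximation of the procedure described.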

The whole procedure was repeated 10 times, with different starts of the random-number generator. We calculated the following statistics from the 10 runs: (1) minimum and maximum percentage of creation versus destruction of a plausible NAGNAG, (2) minimum and maximum percentage of changes from a plausible to an implausible NAGNAG versus changes from an implausible to a plausible NAGNAG, and (3) minimum and maximum percentage of “gain of plausible NAGNAG” versus “loss of plausible NAGNAG.” “Gain of plausible NAGNAG” is the sum of created, plausible NAGNAGs and changes from implausible to plausible. “Loss of plausible NAGNAG” is the sum of destroyed, plausible NAGNAGs and changes from plausible to implausible. These values were compared with the observed values by Fisher’s exact test. For (1), we obtained P values between .52 and .75, for (2), P values between .72 and 1, and, for (3), P values between .66 and .88. Thus, the observed bias toward “gain of plausible NAGNAG” is comparable to the expectation.
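The core of the randomization can be sketched as follows. This is a simplified illustration, not the original implementation: it omits the BLAST-based filtering and the CpG context rule, it tallies only the combined "gain" and "loss" of plausible NAGNAGs (statistic 3), and the data structures and function names are hypothetical. SNPs are given as (position in the 9-nt context, ancestral nucleotide, derived nucleotide), and acceptors as 9-nt strings.

```python
import random
import re

HAGHAG = re.compile(r"[ACT]AG[ACT]AG")

def is_plausible(context9):
    """True if the 9-nt context holds a HAGHAG at positions 1-6 or 4-9."""
    return bool(HAGHAG.fullmatch(context9[0:6]) or HAGHAG.fullmatch(context9[3:9]))

def outcome(before, after):
    """Compare the ancestral and the mutated acceptor context."""
    was, now = is_plausible(before), is_plausible(after)
    if now and not was:
        return "gain"
    if was and not now:
        return "loss"
    return "none"

def simulate(snps, acceptor_pool, n_assign=10, seed=0):
    """Assign up to n_assign distinct random acceptors to each SNP and tally outcomes."""
    rng = random.Random(seed)
    counts = {"gain": 0, "loss": 0, "none": 0}
    for position, ancestral, derived in snps:
        idx = position - 1
        # The new acceptor must carry the ancestral nucleotide at the SNP position.
        candidates = [acc for acc in acceptor_pool if acc[idx] == ancestral]
        # rng.sample draws without replacement, which avoids duplicate assignments.
        for acc in rng.sample(candidates, min(n_assign, len(candidates))):
            mutated = acc[:idx] + derived + acc[idx + 1:]
            counts[outcome(acc, mutated)] += 1
    return counts

pool = ["TTTCAGCAG", "TTTCAGCCG"]
snps = [(8, "C", "A"), (8, "A", "C")]
print(simulate(snps, pool))
```

Repeating the run with different seeds, for example `[simulate(snps, pool, seed=s) for s in range(10)]`, mirrors the 10 repetitions described above; the resulting gain and loss counts would then be compared with the observed counts by Fisher's exact test.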

Web Resources

Accession numbers and URLs for data presented herein are as follows:

  1. dbEST, ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/est_human.gz (for the human portion of dbEST)
  2. GenBank, http://www.ncbi.nlm.nih.gov/Genbank/ (for the human mRNA download, ZFP91 [accession number NM_170768], DTX2 [accession numbers DQ082728 and DQ082730], CMA1 [accession numbers DQ082727 and DQ082729], APPBP1 [accession number NM_003905], and UBE1C [accession number NM_003968])
  3. Human Gene Mutation Database, http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html
  4. Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/ (for Sjogren syndrome, STGD1, frontotemporal dementia, and Frasier syndrome)
  5. UCSC Chimpanzee Genome Browser, http://hgdownload.cse.ucsc.edu/goldenPath/panTro1/bigZips/ (for source download panTro1 [November 2003])
  6. UCSC Human Genome Browser, http://hgdownload.cse.ucsc.edu/goldenPath/hg17/bigZips/ (for source download hg17)

References

  1. Barbaux S, Niaudet P, Gubler MC, Grunfeld JP, Jaubert F, Kuttenn F, Fekete CN, Souleyreau-Therville N, Thibaud E, Fellous M, McElreavey K (1997) Donor splice-site mutations in WT1 are responsible for Frasier syndrome. Nat Genet 17:467–470. doi:10.1038/ng1297-467
  2. Cartegni L, Chew SL, Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3:285–298. doi:10.1038/nrg775
  3. Chen YZ (2004) APP induces neuronal apoptosis through APP-BP1-mediated downregulation of β-catenin. Apoptosis 9:415–422. doi:10.1023/B:APPT.0000031447.05354.9f
  4. Condorelli G, Bueno R, Smith RJ (1994) Two alternatively spliced forms of the human insulin-like growth factor I receptor have distinct biological activities and internalization kinetics. J Biol Chem 269:8510–8516
  5. Englert C, Vidal M, Maheswaran S, Ge Y, Ezzell RM, Isselbacher KJ, Haber DA (1995) Truncated WT1 mutants alter the subnuclear localization of the wild-type protein. Proc Natl Acad Sci USA 92:11960–11964
  6. Fredman D, White SJ, Potter S, Eichler EE, Den Dunnen JT, Brookes AJ (2004) Complex SNP-related sequence variation in segmental genome duplications. Nat Genet 36:861–866. doi:10.1038/ng1401
  7. Garcia-Blanco MA, Baraniak AP, Lasda EL (2004) Alternative splicing in disease and therapy. Nat Biotechnol 22:535–546. doi:10.1038/nbt964
  8. Hiller M, Huse K, Szafranski K, Jahn N, Hampe J, Schreiber S, Backofen R, Platzer M (2004) Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity. Nat Genet 36:1255–1257. doi:10.1038/ng1469
  9. Karinch AM, deMello DE, Floros J (1997) Effect of genotype on the levels of surfactant protein A mRNA and on the SP-A2 splice variants in adult humans. Biochem J 321:39–47
  10. Ke X, Hunt S, Tapper W, Lawrence R, Stavrides G, Ghori J, Whittaker P, Collins A, Morris AP, Bentley D, Cardon LR, Deloukas P (2004) The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet 13:577–588. doi:10.1093/hmg/ddh060
  11. Long M, Deutsch M (1999) Association of intron phases with conservation at splice site sequences and evolution of spliceosomal introns. Mol Biol Evol 16:1528–1534
  12. Lopez-Bigas N, Audit B, Ouzounis C, Parra G, Guigo R (2005) Are splicing mutations the most frequent cause of hereditary disease? FEBS Lett 579:1900–1903. doi:10.1016/j.febslet.2005.02.047
  13. Lynch KW, Weiss A (2001) A CD45 polymorphism associated with multiple sclerosis disrupts an exonic splicing silencer. J Biol Chem 276:24341–24347. doi:10.1074/jbc.M102175200
  14. Maquat LE (2004) Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol 5:89–99. doi:10.1038/nrm1310
  15. Maugeri A, van Driel MA, van de Pol DJR, Klevering BJ, van Haren FJJ, Tijmes N, Bergen AAB, Rohrschneider K, Blankenagel A, Pinckers AJLG, Dahl N, Brunner HG, Deutman AF, Hoyng CB, Cremers FPM (1999) The 2588G→C mutation in the ABCR gene is a mild frequent founder mutation in the Western European population and allows the classification of ABCR mutations in patients with Stargardt disease. Am J Hum Genet 64:1024–1035
  16. Pagani F, Baralle FE (2004) Genomic variants in exons and introns: identifying the splicing spoilers. Nat Rev Genet 5:389–396. doi:10.1038/nrg1327
  17. Schaal TD, Maniatis T (1999) Multiple distinct splicing enhancers in the protein-coding sequences of a constitutively spliced pre-mRNA. Mol Cell Biol 19:261–273
  18. Spillantini MG, Murrell JR, Goedert M, Farlow MR, Klug A, Ghetti B (1998) Mutation in the tau gene in familial multiple system tauopathy with presenile dementia. Proc Natl Acad Sci USA 95:7737–7741. doi:10.1073/pnas.95.13.7737
  19. Stamm S, Zhu J, Nakai K, Stoilov P, Stoss O, Zhang MQ (2000) An alternative-exon database and its statistical analysis. DNA Cell Biol 19:739–756. doi:10.1089/104454900750058107
  20. Tadokoro K, Yamazaki-Inoue M, Tachibana M, Fujishiro M, Nagao K, Toyoda M, Ozaki M, Ono M, Miki N, Miyashita T, Yamada M (2005) Frequent occurrence of protein isoforms with or without a single amino acid residue by subtle alternative splicing: the case of Gln in DRPLA affects subcellular localization of the products. J Hum Genet 50:382–394 10.1007/s10038-005-0261-9 [DOI] [PubMed] [Google Scholar]
  21. Taudien S, Galgoczy P, Huse K, Reichwald K, Schilhabel M, Szafranski K, Shimizu A, Asakawa S, Frankish A, Loncarevic IF, Shimizu N, Siddiqui R, Platzer M (2004) Polymorphic segmental duplications at 8p23.1 challenge the determination of individual defensin gene repertoires and the assembly of a contiguous human reference sequence. BMC Genomics 5:92 10.1186/1471-2164-5-92 [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Valentonyte R, Hampe J, Huse K, Rosenstiel P, Albrecht M, Stenzel A, Nagy M, Gaede KI, Franke A, Haesler R, Koch A, Lengauer T, Seegert D, Reiling N, Ehlers S, Schwinger E, Platzer M, Krawczak M, Muller-Quernheim J, Schurmann M, Schreiber S (2005) Sarcoidosis is associated with a truncating splice site mutation in BTNL2. Nat Genet 37:357–364 10.1038/ng1519 [DOI] [PubMed] [Google Scholar]
  23. Walden H, Podgorski MS, Schulman BA (2003) Insights into the ubiquitin transfer cascade from the structure of the activating enzyme for NEDD8. Nature 422:330–334 10.1038/nature01456 [DOI] [PubMed] [Google Scholar]
  24. Watanabe H, Fujiyama A, Hattori M, Taylor TD, Toyoda A, Kuroki Y, Noguchi H, et al (2004) DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature 429:382–388 10.1038/nature02564 [DOI] [PubMed] [Google Scholar]
  25. Xu ZH, Freimuth RR, Eckloff B, Wieben E, Weinshilboum RM (2002) Human 3′-phosphoadenosine 5′-phosphosulfate synthetase 2 (PAPSS2) pharmacogenetics: gene resequencing, genetic polymorphisms and functional characterization of variant allozymes. Pharmacogenetics 12:11–21 10.1097/00008571-200201000-00003 [DOI] [PubMed] [Google Scholar]

Articles from American Journal of Human Genetics are provided here courtesy of American Society of Human Genetics
