Single-Nucleotide Polymorphisms in NAGNAG Acceptors Are Highly Predictive for Variations of Alternative Splicing

Michael Hiller; Klaus Huse; Karol Szafranski; Niels Jahn; Jochen Hampe; Stefan Schreiber; Rolf Backofen; Matthias Platzer

doi:10.1086/500151

2005 Dec 22;78(2):291–302. doi: 10.1086/500151

PMCID: PMC1380236 PMID: 16400609

Abstract

Aberrant or modified splicing patterns of genes are causative for many human diseases. Therefore, the identification of genetic variations that cause changes in the splicing pattern of a gene is important. Elsewhere, we described the widespread occurrence of alternative splicing at NAGNAG acceptors. Here, we report a genomewide screen for single-nucleotide polymorphisms (SNPs) that affect such tandem acceptors. From 121 SNPs identified, we extracted 64 SNPs that most likely affect alternative NAGNAG splicing. We demonstrate that the NAGNAG motif is necessary and sufficient for this type of alternative splicing. The evolutionarily young NAGNAG alleles, as determined by the comparison with the chimpanzee genome, exhibit the same biases toward intron phase 1 and single–amino acid insertion/deletions that were already observed for all human NAGNAG acceptors. Since 28% of the NAGNAG SNPs occur in known disease genes, they represent preferable candidates for a more-detailed functional analysis, especially since the splice relevance for some of the coding SNPs is overlooked. Against the background of a general lack of methods for identifying splice-relevant SNPs, the presented approach is highly effective in the prediction of polymorphisms that are causal for variations in alternative splicing.

SNPs, as the most abundant form of genetic variation, contribute significantly to phenotypic individuality and disease susceptibility. SNPs are mostly biallelic and are therefore easy to assay once they are described. Given their abundance in the human genome (∼1 SNP every 300 bp [Ke et al. 2004]) and their ease of high-throughput typing, SNPs progressively replace microsatellites as first-choice genetic markers in association and linkage studies.

Much interest focuses on SNPs that are located in coding regions, since those SNPs may alter the protein sequence. However, SNPs can also influence splicing, which usually has a greater effect on the resulting protein than does the alteration of a single codon. Recently, splicing mutations have been suspected to be the most frequent cause of hereditary diseases (Lopez-Bigas et al. 2005). Accordingly, an increasing number of SNPs have been described that cause diseases by a change or disruption of the normal splicing pattern (for review, see Cartegni et al. [2002] and Garcia-Blanco et al. [2004]). These splice-relevant SNPs affect donor and acceptor splice sites, branch points, and exonic as well as intronic splicing enhancers and silencers, or they alter important mRNA secondary structures. For example, the G allele of the silent coding SNP rs17612648 in the PTPRC gene, which is associated with multiple sclerosis, destroys an exonic splicing silencer and abolishes the skipping of exon 4 (Lynch and Weiss 2001), and the SNP rs2076530 in BTNL2, which is associated with sarcoidosis, leads to the activation of a cryptic donor splice site 4 nt upstream (Valentonyte et al. 2005). Since the impact of SNPs on splicing is hard to predict in silico and is difficult to analyze experimentally, silent or intronic SNPs that may cause a phenotype or a disease by changing splicing patterns are often not investigated (Pagani and Baralle 2004). Thus, novel approaches are urgently needed to identify splice-relevant SNPs.

Recently, we reported the widespread occurrence of subtle alternative splice events that insert or delete the sequence NAG (N denotes A, C, G, or T) in mRNA (Hiller et al. 2004). This happens if both AG dinucleotides of a NAGNAG acceptor can be chosen by the spliceosome. We termed the upstream acceptor in this tandem motif the “E acceptor” and the downstream one the “I acceptor.” The products that arise from the use of E and I acceptors are called “E and I transcripts and proteins,” respectively. The consequences of NAG insertion/deletions (indels) in mRNAs for the respective protein sequences are highly diverse and comprise eight different single–amino acid (aa) indel events, the exchange of a dipeptide and an unrelated aa, or the creation/destruction of a stop codon. Tandem acceptors are conserved between human and mouse, and the use of E or I acceptors can be controlled in a tissue-specific manner. Our results concerning the frequency and tissue specificity were confirmed by others (Tadokoro et al. 2005). Furthermore, E/I protein isoforms have functional differences (Condorelli et al. 1994; Tadokoro et al. 2005), and the SNP rs1650232 within a NAGNAG acceptor is associated with respiratory-distress syndrome (Karinch et al. 1997).
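How a 3-nt NAG indel translates into the protein-level outcomes listed above can be illustrated with a minimal sketch. It assumes the standard genetic code and a simplified junction model in which the intron phase equals the number of codon nucleotides contributed by the upstream exon. The function and parameter names (`nag_indel_impact`, `upstream_partial`, `downstream`) are hypothetical and are not taken from the study.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame nucleotide string (length divisible by 3)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def nag_indel_impact(upstream_partial, nag, downstream):
    """Return (I peptide, E peptide) around the exon-exon junction.

    upstream_partial: codon nucleotides contributed by the upstream exon
        (length 0-2, taken here as the intron phase; an assumption).
    nag: the 3 nt (NAG) that are present only in the E transcript.
    downstream: exon nucleotides that follow the I acceptor.
    """
    phase = len(upstream_partial)
    needed = (3 - phase) % 3  # nucleotides that complete the split codon
    i_peptide = translate(upstream_partial + downstream[:needed])
    e_peptide = translate(upstream_partial + nag + downstream[:needed])
    return i_peptide, e_peptide
```

For a phase 0 intron the E transcript simply gains the amino acid encoded by NAG (or a stop codon for TAG). For phases 1 and 2 the two peptides have to be compared: if one residue of the E dipeptide equals the I residue, the event is a single-aa indel; otherwise a dipeptide replaces an unrelated aa.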

Since NAGNAG acceptors occur in ∼30% of human genes, we were interested in finding SNPs that may affect this type of alternative splicing. By scanning the SNP annotation of the human reference sequence, we identified those SNPs and provide experimental evidence of respective variations in the alternative splicing patterns. In addition, we introduce a classification for NAGNAG acceptors, with respect to their splicing plausibility, to bring forward a highly effective approach for predicting splice-relevant SNPs.

Methods

Identification of SNPs Affecting NAGNAG Acceptors

We downloaded the human genome assembly (hg17, May 2004), RefSeq transcript annotations (refGene.txt.gz, January 12, 2005), and SNP annotations (snp.txt.gz, January 9, 2005) from the UCSC Human Genome Browser. From the transcripts, we extracted a list of unique genomic positions of acceptor sites. We used the genomic position of the acceptors to select those SNPs that overlap the first 3 nt of an exon or the last 6 nt of an intron. Then, we evaluated whether one of the two AG dinucleotides or one of the two Ns in the NAGNAG pattern is polymorphic. SNPs are the only type of polymorphisms that were considered.
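The evaluation step can be sketched as follows. This is an illustrative reading of the procedure and not the code used in the study: it assumes the 9-nt acceptor context (last 6 intron nt plus first 3 exon nt) is already available as an uppercase string, and the names `nagnag_offsets` and `classify_snp` are hypothetical.

```python
import re

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def nagnag_offsets(context):
    """Offsets (0 or 3) at which a NAGNAG lies in a 9-nt context (-6..+3).

    Offset 0: the motif fills intron positions -6..-1.
    Offset 3: the motif spans intron -3..-1 and exon +1..+3.
    """
    return [s for s in (0, 3) if NAGNAG.fullmatch(context[s:s + 6])]


def classify_snp(ref_context, position, alt_base):
    """Classify a SNP; position is 1-based (1 = intron -6, 9 = exon +3)."""
    i = position - 1
    alt_context = ref_context[:i] + alt_base + ref_context[i + 1:]
    ref_hits = nagnag_offsets(ref_context)
    alt_hits = nagnag_offsets(alt_context)
    if alt_hits and not ref_hits:
        return "creates NAGNAG"
    if ref_hits and not alt_hits:
        return "destroys NAGNAG"
    # Motif present in both alleles: is the SNP at one of its N positions?
    if any(i in (s, s + 3) for s in ref_hits if s in alt_hits):
        return "changes N position"
    return "no effect on NAGNAG"
```

For example, the context CGGCAG|GTC with an A allele at position 2 gains a tandem acceptor, whereas TCATAG|CAG with a G allele at position 5 loses one.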

To check whether a tandem acceptor is EST confirmed, we used BLAST with a search string of 30 nt from the upstream exon and 30 nt from the downstream exon—taking the nonannotated acceptor into account—against the human fraction of the dbEST database (December 2004) and against the mRNA sequences downloaded from GenBank (December 2004). At most, one mismatch or one gap was allowed.
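A minimal sketch of how such junction search strings could be assembled is given below. It assumes that the upstream exon, the intron tail, and the downstream exon are available as strings; the function name `junction_queries` and its parameters are hypothetical. The BLAST search itself and the one-mismatch or one-gap filter are not shown.

```python
import re

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def junction_queries(upstream_exon, intron_tail, downstream_exon, flank=30):
    """Return (annotated, alternative) exon-exon junction strings.

    intron_tail: at least the last 6 nt of the intron.
    The alternative string is None when no NAGNAG spans the acceptor.
    """
    left = upstream_exon[-flank:]
    annotated = left + downstream_exon[:flank]
    if NAGNAG.fullmatch(intron_tail[-6:]):
        # Annotated site is the I acceptor; the E transcript keeps 3 more nt.
        alt_exon = intron_tail[-3:] + downstream_exon
    elif NAGNAG.fullmatch(intron_tail[-3:] + downstream_exon[:3]):
        # Annotated site is the E acceptor; the I transcript lacks 3 nt.
        alt_exon = downstream_exon[3:]
    else:
        return annotated, None
    return annotated, left + alt_exon[:flank]
```

With the default `flank=30`, each returned string is 60 nt long, provided that both exons are long enough.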

Comparison with the Chimpanzee Genome

We downloaded the chimpanzee genome working draft assembly from UCSC Genome Browser (UCSC Chimpanzee Genome Browser, panTro1, November 2003). We compared human polymorphic sites with the chimpanzee sequence, using BLAST, with 101-nt queries consisting of one of the SNP alleles, as well as 50 nt upstream and 50 nt downstream. Only hits with at least 95% identity and no other mismatch in the −5…+5 context of the SNP were considered.
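The hit filter can be illustrated with a short sketch. It simplifies the BLAST step to a gapless, full-length comparison of the 101-nt query with an already extracted chimpanzee segment, and it takes the chimpanzee base at the SNP position as the ancestral allele. Both simplifications and the names `accept_hit` and `ancestral_allele` are assumptions made for illustration.

```python
def accept_hit(query, subject, snp_index, min_identity=0.95, window=5):
    """Apply the identity and SNP-context filters to a gapless alignment."""
    if len(query) != len(subject):
        return False
    matches = sum(q == s for q, s in zip(query, subject))
    if matches / len(query) < min_identity:
        return False
    lo = max(0, snp_index - window)
    hi = min(len(query), snp_index + window + 1)
    # No mismatch is tolerated in the -5..+5 context, except at the SNP itself.
    return all(query[i] == subject[i] for i in range(lo, hi) if i != snp_index)


def ancestral_allele(upstream, alleles, downstream, chimp_segment):
    """Return the SNP allele shared with the chimpanzee segment, or None."""
    snp_index = len(upstream)
    for allele in alleles:
        query = upstream + allele + downstream
        if (accept_hit(query, chimp_segment, snp_index)
                and chimp_segment[snp_index] == allele):
            return allele
    return None
```

A SNP for which neither allele passes the filter remains unassigned (`None`) in this sketch.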

Null Model for Gain of NAGNAG Acceptors

Briefly described, we determined the ancestral allele variant for 2,439 SNPs that overlap an acceptor in the 9-nt context by comparing the genomic sequence context with the chimpanzee genome. In addition, we selected a set of 8,082 acceptor sites not affected by known SNPs. Then, the 2,439 SNPs were randomly assigned to one of those acceptors, given that the ancestral allele variant is present at the respective position. This position was replaced by the nonancestral allele, and we evaluated and counted the possible impact on a NAGNAG acceptor. More details are given in appendix A.
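The random assignment can be sketched as a small simulation. This is a simplified illustration under stated assumptions and not the procedure of appendix A: each SNP is given as a (position, ancestral base, derived base) tuple with a 1-based position in the 9-nt context, acceptors are 9-nt strings, and the names `has_nagnag` and `simulate_null_model` are hypothetical.

```python
import random
import re
from collections import Counter

NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")


def has_nagnag(context):
    """True if a NAGNAG lies at offset 0 or 3 of a 9-nt acceptor context."""
    return any(NAGNAG.fullmatch(context[s:s + 6]) for s in (0, 3))


def simulate_null_model(snps, acceptors, rng):
    """Assign each SNP to a random compatible acceptor and tally the effect."""
    counts = Counter()
    for position, ancestral, derived in snps:
        i = position - 1
        # Only acceptors that carry the ancestral base at this position qualify.
        candidates = [c for c in acceptors if c[i] == ancestral]
        if not candidates:
            counts["unassigned"] += 1
            continue
        context = rng.choice(candidates)
        mutated = context[:i] + derived + context[i + 1:]
        before, after = has_nagnag(context), has_nagnag(mutated)
        if after and not before:
            counts["gain"] += 1
        elif before and not after:
            counts["loss"] += 1
        else:
            counts["unchanged"] += 1
    return counts
```

Repeating the simulation with different seeds, for example `simulate_null_model(snps, acceptors, random.Random(seed))`, would give a distribution of gain counts against which an observed number can be compared.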

Experimental Verification of Alternative Splicing at Polymorphic NAGNAG Acceptors

Genomic DNA and cDNA from 12 whites were kindly provided by Gerd Birkenmeier (Leipzig) and were purified from whole blood by standard methods. First-strand cDNA was derived from oligo-dT primed reverse transcription.

For determination of the respective genotypes, ∼20 ng of genomic DNA was used to PCR amplify the regions of the respective SNP through use of Ready-To-Go PCR beads (Amersham). PCR conditions were 1 cycle of denaturation at 95°C for 30 s; followed by 38 cycles of denaturing at 92°C for 30 s, annealing at 59°C for 30 s, and extension at 72°C for 60 s; and 1 cycle of final extension at 72°C for 5 min. PCR products were purified by precipitation and were sequenced with the same primers used for PCR amplification by the dye terminator method by use of BigDye v3.1 (Applied Biosystems). To identify E and I transcripts, cDNA from the genotyped individuals was amplified using the same PCR conditions with transcript-specific primers.

For amplification of genomic DNA and subsequent sequencing of the resulting amplicons that correspond to SNPs listed in table 1, we used primers 5′-CAGCTACGGTTTGCTGAGAA-3′ and 5′-ACAGAGGGGACAGGGAGATT-3′ for genotyping rs2245425, 5′-GATTTTCCTGGAGGAGAGGG-3′ and 5′-CAAGTTCAAAGCAAGCCTCC-3′ for rs1558876, 5′-AGGAGGCGTGCTATCTGGTA-3′ and 5′-GTAGGAAGCCCTGGAGGAAG-3′ for rs2290647, 5′-GCCATTGAGTTGTCATCACC-3′ and 5′-ACCCATTAGCTTGGCAACAG-3′ for rs2275992, 5′-AAGAATGGCGTCCATTTCAC-3′ and 5′-TTTCTGATCCTTGGTGAGGG-3′ for rs4590242, and 5′-CCTTCAACCTCAATGACGAAA-3′ and 5′-CACAAAGGACTTGTCAGGGA-3′ for rs1152522.

RT-PCR for transcript amplification was done with primers 5′-GAAAGCGCGTACTACCTTCG-3′ and 5′-AATCCCTGGATCTGGCCTTA-3′ for TOR1AIP1, 5′-AGGCTACAACCACCCTCCTT-3′ and 5′-ACTTCCCCCTTGACGAGTTT-3′ for KIAA1001, 5′-AGAGGAGGACAAGGAGGAGC-3′ and 5′-GAACAGCGTCTGTGTCTCCA-3′ for KIAA1533, 5′-GGACATCTGTTTCTCGCCAT-3′ and 5′-ATCCTTCCATCTCACAACGG-3′ for ZFP91 (GenBank accession number NM_170768), 5′-TCTTTCTTTTGTGGTGGGGA-3′ and 5′-TGTCAGGGACCCAGATCTTC-3′ for GABRR1, and 5′-TGCAGGACCAGAATAAAGCC-3′ and 5′-TATGGTCCCTTGGACTTTGC-3′ for C14orf105. For ZFP91 and TOR1AIP1, the amplicons obtained by RT-PCR from individuals with each of the possible genotypes were cloned into PCR2.1-TOPO (Invitrogen) and were propagated in Escherichia coli TOP10 cells. Plasmids were isolated from several clones, and their inserts were sequenced using plasmid primers.

SNPs exhibiting nonancestral plausible NAGNAGs without EST evidence were selected by high frequencies of the minor alleles: rs1638152 (DTX2), rs5248 (CMA1), and rs17105087 (SLC25A21). Genomic primers used were for DTX2 (5′-TTTCCTCCTGGCAGCTTAGA-3′ and 5′-GCTGGGAGATGAAACCAAAG-3′), CMA1 (5′-GGCTCCAAGGGTGACTGTTA-3′ and 5′-CCCCACTTTCCCGTTTAACT-3′), and SLC25A21 (5′-AACTCCATGTCGTCCCAAAG-3′ and 5′-CAAAATCGTTTGTTCTTTGCC-3′). Transcript-specific primers were used for DTX2 (5′-CAGGCATGACGAGTGTTCTG-3′ and 5′-CACAGCTAGGGACCCGAT-3′) and CMA1 (5′-CCCTGCTGCTCTTTCTCTTG-3′ and 5′-ACACACCTGTTCTTCCCCAG-3′).

Table 1.

Correlation between Acceptor Genotypes and the Appearance of E and I Transcripts

dbSNP ID	Gene Symbol	Homozygous NAGNAG: No. of Probands	Homozygous NAGNAG: cDNA Transcripts	Heterozygous: No. of Probands	Heterozygous: cDNA Transcripts	Homozygous Non-NAGNAG: No. of Probands	Homozygous Non-NAGNAG: cDNA Transcripts
rs2245425	TOR1AIP1^a	3	E+I	6	E+I	2	I
rs2275992	ZFP91^a	1	E+I	7	E+I	4	E
rs1558876	KIAA1001	0	…	6	E+I	6	E
rs2290647	KIAA1533	0	…	4	E+I	8	E
rs4590242	GABRR1	11	E+I	1	E+I	0	…
rs1152522	C14orf105	0	…	0	…	12	I

Note.— E+I indicates presence of both E and I transcripts; E indicates only E transcripts; I indicates only I transcripts.

Results

SNPs in NAGNAG Acceptors Influence Alternative Splicing

We extracted from the UCSC Human Genome Browser (hg17, May 2004) all annotated SNPs that are located within the last 6 nt of an intron or within the first 3 nt of an exon, given intron-exon boundaries from RefSeq transcripts. From these SNPs, we selected those that affect a NAGNAG acceptor. With respect to the human reference genome sequence, the alternative SNP allele can create or destroy a NAGNAG acceptor by affecting one of the two AG dinucleotides (fig. 1A and 1B). Since the nucleotide upstream of any acceptor AG is usually C or T (Stamm et al. 2000) and a change at this position is likely to alter alternative splicing at a tandem acceptor, we also considered SNPs at the N positions in an existing tandem (fig. 1C). We found a total of 137 NAGNAG-affecting SNPs (table 2). Aware of the uncertainty about the true nature of SNPs in segmental duplications (Fredman et al. 2004; Taudien et al. 2004), we excluded seven (5%) of the variations from further analysis. Our precaution was justified by genotyping SNP rs1638152 in 12 whites; we consistently found both alleles and both transcripts (DTX2 [GenBank accession numbers DQ082728 and DQ082730]), which is a strong indication for paralogous sequence variants and/or multisite variations (combinatorial P=.0003). Since dbSNP entries sometimes are the result of sequencing errors, we manually examined the trace data (if available) and excluded a further nine SNPs (7%). Thus, we considered a total of 121 bona fide SNPs affecting NAGNAG acceptors.

Figure 1. Schematic illustration of how SNPs affect splicing at NAGNAG acceptors. A, SNP alleles at position −2, −1, +2, or +3 of a NAGNAG acceptor destroy this motif by affecting the E (left) or I (right) acceptor, thus preventing alternative splicing. B, SNP alleles at intron positions −5 and −4 can create a novel E acceptor (left) and, at exon positions +2 and +3, a novel I acceptor (right), thus yielding a NAGNAG motif. Acceptors at these alleles may allow alternative splicing, as indicated by the two transcripts (E transcript above; I transcript below). C, SNP alleles at position −3 or +1 of a NAGNAG acceptor can convert a plausible NAGNAG that allows alternative splicing (left) to an implausible one that allows only the expression of one transcript (right), or vice versa. Positions refer to a standard intron-exon boundary. H denotes A, C, or T; upper- and lowercase letters indicate exonic and intronic nucleotides, respectively; exonic nucleotides are boxed.

Table 2.

SNPs That Affect NAGNAG Acceptors

dbSNP ID^a	Position	Heterozygosity	Gene Symbol	Gene Name^b	RefSeq	Nucleotide Pattern^c	Protein Impact of NAGNAG Splicing^g	Protein Impact of Coding SNP^h	EST/mRNA^i
rs2297988	99108372	.456	KIAA0690	KIAA0690	NM_015179	AAGCAG|AAA→GAGCAG|AAA	Insert Q	…
rs4149853	238338449	.044	EXO1	Exonuclease 1	NM_130398	CAGCAG|AAC→TAGCAG|AAC	5′ UTR	…
rs2071558	52105734	.255	AMHR2	Anti-Mullerian hormone receptor, type II	NM_020547	CAGCAG|GTA→TAGCAG|GTA	Insert Q	…
rs16960071	45846061	.055	SEMA6D	Sema domain, transmembrane domain (TM), and cytoplasmic domain, (semaphorin) 6D	NM_020858	ATGAAG|GCT→AAGAAG|GCT	Insert R	…
rs9621415	30953580	…	SLC5A4	Solute carrier family 5 (low affinity glucose cotransporter), member 4	NM_014227	CGGCAG|GTC→CAGCAG|GTC	Insert Q	…
rs12042060^k	182663302	…	FIBL-6	Hemicentin	NM_031935	TTGCAG|AAC→TAGCAG|AAC	Insert A	…
rs2287800	25136015	…	AQP8	Aquaporin 8	NM_001169	CGGCAG|ATA→CAGCAG|ATA	Insert Q	…	1:0
rs1650232^j	81309218	…	SFTPA2	Surfactant, pulmonary-associated protein A2	NM_006926	CTGCAG|GAG→CAGCAG|GAG	5′ UTR	…	16:0
rs3997775^j	81361506	…	SFTPA1	Surfactant, pulmonary-associated protein A1	NM_005411	CTGCAG|GAG→CAGCAG|GAG	5′ UTR	…	2:0
rs2298847	55258795	.196	MT1G	Metallothionein 1G	NM_005950	TAGCAG|GTG→TTGCAG|GTG	Insert A	…	59:0
rs1622213^j	1506424	…	ATAD3B	ATPase family, AAA domain containing 3B	NM_031921	CGGCAG|GTC→CAGCAG|GTC	Insert Q	…	1:0
rs4590242	89969928	.441	GABRR1	Gamma-aminobutyric acid (GABA) receptor, rho 1	NM_002042	TGGTAG|GCC→TAGTAG|GCC	Insert S	…	2:0
rs2521612	39686270	…	SLC4A1	Solute carrier family 4, anion exchanger, member 1 (erythrocyte membrane protein band 3, Diego blood group)	NM_000342	CCGTAG|GCT→CAGTAG|GCT	Insert R	…
rs2271959	38978266	…	ETV4	ETS variant gene 4 (E1A enhancer binding protein, E1AF)	NM_001986	TCGCAG|AAA→TAGCAG|AAA	Insert Q	…	3:0
rs3020724	104580737	…	CYP17A1	Cytochrome P450, family 17, subfamily A, polypeptide 1	NM_000102	CTGCAG|AGC→CAGCAG|AGC	Insert A	…
rs6285	46875357	…	GABRB1	Gamma-aminobutyric acid (GABA) A receptor, beta 1	NM_000812	CGGCAG|GGC→CAGCAG|GGC	Insert A	…
rs5248	24045654	.242	CMA1	Chymase 1, mast cell	NM_001836	CAACAG|GTC→CAGCAG|GTC	Insert S	…
rs3025420	133551742	…	DBH	Dopamine beta-hydroxylase (dopamine beta-monooxygenase)	NM_000787	CACCAG|GTT→CAGCAG|GTT	Insert S	…
rs11466221	70653703	.021	TGFA	Transforming growth factor, alpha	NM_003236	CAACAG|GTA→CAGCAG|GTA	Insert A	…
rs363209	65412967	…	APPBP1	Amyloid beta precursor protein binding protein 1, 59kDa	NM_003905	AAACAG|CAC→AAGCAG|CAC	Insert S	…
rs9644946	124765060	.178	GOLGA1	Golgi autoantigen, golgin subfamily a, 1	NM_002077	AAATAG|GAG→AAGTAG|GAG	Insert stop	…
rs2245425	176590101	.464	TOR1AIP1	Torsin A interacting protein 1	NM_015602	TAGCAG|TGA→TAACAG|TGA	Insert A	…	13:0
rs200925	38166650	…	ZNF248	Zinc finger protein 248	NM_021045	TAACAG|GGT→TAGCAG|GGT	Insert A	…
rs1438073	182954651	…	PDE1A	Phosphodiesterase 1A, calmodulin-dependent	NM_001003683	AAATAG|ACT→AAGTAG|ACT	Insert R	…
rs13251099	12853939	…	FLJ36980		NM_182598	AAATAG|GTC→AAGTAG|GTC	5′ UTR	…
rs1076555^k	152043527	…	SCAMP3	Secretory carrier membrane protein 3	NM_005698	CAACAG|CCA→CAGCAG|CCA	Insert Q	…
rs2290609	3119508	…	IL5RA	Interleukin 5 receptor, alpha	NM_000564	CAACAG|TTT→CAGCAG|TTT	I versus TV	…
rs3816280	9578855	…	PPP4R1	Protein phosphatase 4, regulatory subunit 1	NM_005134	CAATAG|AAC→CAGTAG|AAC	Insert V	…
rs8176139	38505426	.011	BRCA1	Breast cancer 1, early onset	NM_007304	GTTTAG|CAG→GTTGAG|CAG	Delete Q	…	7:0
rs6591368	68591544	…	TPCN2	Two pore segment channel 2	NM_139075	TCTTAG|CAG→TCTCAG|CAG	Delete Q	…
rs11042902	10612199	…	MRVI1	Murine retrovirus integration site 1 homolog	NM_006069	AACCGG|CAG→AACCAG|CAG	NR versus K	…	1:1
rs1152522	57018133	.441	C14orf105	Chromosome 14 ORF 105	NM_018168	TCATAG|CAG→TCATGG|CAG	Delete Q	…	6:0
rs2307130	100028610	.475	AGL	Amylo-1, 6-glucosidase, 4-alpha-glucanotransferase (glycogen debranching enzyme, glycogen storage disease type III)	NM_000644	CTCTAG|AAG→CTCTGG|AAG	5′ UTR	…
rs9463545	49927803	…	CRISP1	Cysteine-rich secretory protein 1	NM_001131	TAACAG|AAG→TAACCG|AAG	Delete K	…
rs11567804	8104059	.109	C3AR1	Complement component 3a receptor 1	NM_004054	TTGCAG|AAG→TTGCAA|AAG	5′ UTR	…
rs13228988	69674388	…	AUTS2	Autism susceptibility candidate 2	NM_015570	CGATAG|CAG→CGATAA|CAG	Delete S	…	2:0
rs479984	109039673	…	MGC35555		NM_178565	TACTAG|AAG→TACTAA|AAG	Delete E	…
rs1044833	30754672	…	C1QDC1	C1q domain containing 1	NM_001002259	AAACAG|CAG→AAACAT|CAG	Delete A	…
rs2243187	203402743	.041	IL19	Interleukin 19	NM_153758	TCACAG|CAG→TCACAA|CAG	Delete Q	…
rs1833783^k	54161349	…	FTL	Ferritin, light polypeptide	NM_000146	ATATAG|AAG→ATATAC|AAG	Delete K	…	11:0
rs12974798	7532374	…	NTE	Neuropathy target esterase	NM_006702	TCGCAG|GAG→TCGCAG|AAG	Delete K	E→K	…
rs17173698	89458933	.011	PAPSS2	3′-Phosphoadenosine 5′-phosphosulfate synthase 2	NM_004670	TTATAG|GAG→TTATAG|AAG	Delete K	E→K	…
rs2261015^j	66007678	.448	RSAFD1	Radical S-adenosyl methionine and flavodoxin domains 1	NM_018264	CTCCAG|GAG→CTCCAG|TAG	Delete V	G→V	…
rs1804783	13186422	…	CACNA1A	Calcium channel, voltage-dependent, P/Q type, alpha 1A subunit	NM_023035	TTGCAG|GAG→TTGCAG|TAG	Delete V	G→V	…
rs3842776	42906589	…	PARVG	Parvin, gamma	NM_022141	TTCCAG|GAG→TTCCAG|CAG	Delete A	G→A	…
rs879022	79276118	…	REGL	Regenerating islet-derived-like, pancreatic stone protein-like, pancreatic thread protein-like (rat)	NM_006508	GGACAG|GAG→GGACAG|AAG	Delete E	G→E	…
rs12944821	33030504	…	AP1GBP1	AP1 gamma subunit binding protein 1	NM_007247	TTTCAG|CAG→TTTCAG|GAG	Delete G	A→G	5:0
rs3751353	23793134	.083	MGC48915		NM_178540	ATTTAG|GAG→ATTTAG|CAG	Delete A	G→A	…
rs4565430^k	97376502	…	PTDSS1	Phosphatidylserine synthase 1	NM_014754	AAATAG|GAG→AAATAG|CAG	Delete Q	E→Q	…
rs2409496	33825733	…	GART	Phosphoribosylglycinamide formyltransferase, phosphoribosylglycinamide synthetase, phosphoribosylaminoimidazole synthetase	NM_175085	AATCAG|GAG→AATCAG|CAG	Delete Q	E→Q	…
rs1638152^j	75774409	.476	DTX2	Deltex homolog 2 (Drosophila)	NM_020892	TTCTAG|GAG→TTCTAG|AAG	Delete E	G→E	…
rs3775296	187372916	.268	TLR3	Toll-like receptor 3	NM_003265	CTACAG|CAG→CTACAG|AAG	5′ UTR	…
rs1127307	14450378	…	RGS19IP1	Regulator of G-protein signaling 19 interacting protein 1	NM_202494	CAATAG|CGG→CAATAG|CAG	Delete A	…	8:0
rs1071716	35675135	…	TPM2	Tropomyosin 2 (beta)	NM_213674	CCCCAG|CCG→CCCCAG|CAG	Delete S	…
rs11660370	19028411	…	CABLES1	Cdk5 and Abl enzyme substrate 1	NM_138375	TTTCAG|ATG→TTTCAG|AAG	Delete R	C→S	…
rs2010657	23328476	…	GGT1	Gamma-glutamyltransferase 1	NM_013421	CCCCAG|CGG→CCCCAG|CAG	5′ UTR	…
rs2275992	58135000	.29	ZFP91	Zinc finger protein 91 homolog (mouse)	NM_170768	TTTTAG|TAG→TTTTAG|TGG	Delete S	S→G	7:0
rs10914468	31833299	.497	COL16A1	Collagen, type XVI, alpha 1	NM_001856	CTCCAG|AAG→CTCCAG|ACG	Delete R	…
rs4434604	67227167	…	TRIM55	Tripartite motif-containing 55	NM_033058	TACCAG|AAG→TACCAG|AGG	Delete E	…
rs751517	36345477	…	GGA1	Golgi associated, gamma adaptin ear containing, ARF binding protein 1	NM_013365	TTCCAG|CGG→TTCCAG|CAG	Delete A	…
rs11661706	5409875	…	EPB41L3	Erythrocyte membrane protein band 4.1-like 3	NM_012307	CTGCAG|AGG→CTGCAG|AAG	Delete E	…
rs2273431	51566157	.178	NID2	Nidogen 2 (osteonidogen)	NM_007361	ATGCAG|AGG→ATGCAG|AAG	Delete E	…
rs3746373	61409288	.113	KIAA1510		NM_020882	CCCCAG|CCG→CCCCAG|CAG	Delete A	…
rs7862221	132811775	…	TSC1	Tuberous sclerosis 1	NM_000368	CTTCAG|AAG→CTTCAG|AGG	Delete E	…
rs2290647	40198569	.371	KIAA1533	KIAA1533	NM_020895	CTCCAG|CGG→CTCCAG|CAG	Delete A	…	35:0
rs11553436	41232623	…	SERHL	Serine hydrolase-like	NM_170694	CTCCAG|CGG→CTCCAG|CAG	3′ UTR	…
rs12760076	44354572	…	DMAP1	DNA methyltransferase 1 associated protein 1	NM_019100	TTGCAG|ATG→TTGCAG|AAG	Delete K	M→K	…
rs2243603	1494911	.364	SIRPB1	Signal-regulatory protein beta 1	NM_006065	TTCCAG|AAG→TTCCAG|AAC	Delete E	A→P	…
rs3765018	54880319	.235	FLJ46354		NM_198547	TCCCAG|AAG→TCCCAG|AAA	3′ UTR	…	1:0
rs1558876	63876286	.455	KIAA1001		NM_014960	TTTCAG|CAC→TTTCAG|CAG	Delete S	T→S	2:0
rs9611697^k	40713170	…	SEPT3	Septin 3	NM_145733	GTGCAG|CAG→GTGCAG|CAC	Delete Q	Q→H	…
rs3014960	44975382	.221	COG3	Component of oligomeric golgi complex 3	NM_031431	ATACAG|CAG→ATACAG|CAA	Delete Q	…
rs11597439	104174943	…	CUEDC2	CUE domain containing 2	NM_024040	CTTCAG|AAG→CTTCAG|AAC	5′ UTR	…	2:0
rs11574323	31101968	.011	WRN	Werner syndrome	NM_000553	GGGTAG|AAT→GGGTAG|AAG	QS versus H	I→S	…
rs2425068	33678137	…	CPNE1	Copine I	NM_152927	CCCCAG|CAA→CCCCAG|CAG	Delete Q	…	7:0
rs3738833	36528845	…	LSM10	LSM10, U7 small nuclear RNA associated	NM_032881	CCACAG|CAA→CCACAG|CAG	5′ UTR	…
rs17105087	36250436	.378	SLC25A21	Solute carrier family 25 (mitochondrial oxodicarboxylate carrier), member 21	NM_030631	CTGCAG|CAA→CTGCAG|CAG	Delete Q	…
rs9606756	29331414	.131	TCN2	Transcobalamin II, macrocytic anemia	NM_000355	TCTAAG|AAA→TCTAAG|AAG	Delete E	I→V	2:0
rs2292402	142435751	.299	ACPL2	Acid phosphatase-like 2	NM_152282	GAGCAG|TGA→GTGCAG|TGA	5′ UTR	…
rs6670368^k	89138996	…	KAT3	Kynurenine aminotransferase III	NM_001008661	GTGTAG|GTG→GAGTAG|GTG	Insert stop	…
rs10152092	23103212	…	AP1G2	Adaptor-related protein complex 1, gamma 2 subunit	NM_003917	GACCAG|GTA→GAGCAG|GTA	Insert S	…
rs12905385	40808855	.262	CDAN1	Congenital dyserythropoietic anemia, type I	NM_138477	ATACAG|GAG→ATATAG|GAG	Delete G	…
rs2250205	33331338	.389	ITGB4BP	Integrin beta 4 binding protein	NM_181467	TGACAG|GAG→TGATAG|GAG	Delete G	…
rs5703	71040104	.068	PTGER3	Prostaglandin E receptor 3 (subtype EP3)	NM_198713	TAATAG|GAG→TAACAG|GAG	RE versus K	…
rs9635649^j	169594	…	GTPBP6	GTP binding protein 6 (putative)	NM_012227	TTTTAG|GAG→TTTCAG|GAG	Delete R	…
rs3087402	99510924	.011	REV1L	REV1-like (yeast)	NM_016316	TTCTAG|GAG→TTCCAG|GAG	Delete G	…
rs762605	9032117	…	SLC2A5	Solute carrier family 2 (facilitated glucose/fructose transporter), member 5	NM_003039	CCGCAG|GAG→CCGTAG|GAG	Delete E	…
rs3765166	172519287	.406	SLC25A12	Solute carrier family 25 (mitochondrial carrier, Aralar), member 12	NM_003705	TCACAG|GAG→TCATAG|GAG	Delete E	…
rs2174769	37675452	.378	SNIP1	Smad nuclear interacting protein	NM_024700	TTTTAG|GAG→TTTCAG|GAG	Delete E	…
rs9866111	101857430	.469	GPR128	G protein-coupled receptor 128	NM_032787	ACATAG|GAG→ACACAG|GAG	Delete G	…
rs10169344	113048652	.341	POLR1B	Polymerase (RNA) I polypeptide B, 128 kDa	NM_019014	TCCTAG|GAG→TCCCAG|GAG	KS versus N	…
rs4933199	92621451	…	RPP30	Ribonuclease P/MRP 30 kDa subunit	NM_006413	TTATAG|GAG→TTACAG|GAG	5′ UTR	…
rs3821010	178508162	.375	PDE11A	Phosphodiesterase 11A	NM_016953	CCATAG|GAG→CCACAG|GAG	Delete G	…
rs1046617	61844130	…	SLC2A4RG	SLC2A4 regulator	NM_020062	CCGCAG|GAG→CCGGAG|GAG	Delete R	…
rs7279250	45539120	…	C21orf86	Chromosome 21 ORF 86	NM_153454	GACGAG|GAG→GACGTG|GAG	5′ UTR	…
rs12883949	102465744	…	AMN	Amnionless homolog (mouse)	NM_030943	CCGCAG|GAG→CCGCAA|GAG	Delete G	…
rs2306949	71520483	…	MUC7	Mucin 7, salivary	NM_152291	TCCCAG|GAG→TCCCAA|GAG	5′ UTR	…	1:0
rs10981449^k	112415891	…	KIAA1958	KIAA1958	NM_133465	CTTTAG|GAG→CTTTAA|GAG	5′ UTR	…
rs11591994	98072403	…	DNTT	Deoxynucleotidyltransferase, terminal	NM_004088	CATTAG|GAG→CATTAC|GAG	Delete E	…
rs3966262	23330878	…	GGT1	Gamma-glutamyltransferase 1	NM_013421	CTTCAG|GAG→CTTCAA|GAG	5′ UTR	…
rs6111953^k	18355804	…	C20orf12	Chromosome 20 ORF 12	NM_018152	GTTAAG|GAG→GTTAAA|GAG	Delete E	…
rs11752742	64049085	…	GLULD1	Glutamate-ammonia ligase (glutamine synthase) domain containing 1	NM_016571	CTTTAG|GAG→CTTTAA|GAG	Delete E	…
rs193227	32421602	…	NOLA3	Nucleolar protein family A, member 3 (H/ACA small nucleolar RNPs)	NM_018648	GAGCAG|AAA→GAGCAA|AAA	Insert Q	…
rs1509545	79206987	…	C15orf26	Chromosome 15 ORF 26	NM_173528	ATTAAG|GTG→ATTAAG|GAG	5′ UTR	…
rs1140420	61804764	…	HERC1	HECT (homologous to the E6-AP (UBE3A) carboxyl terminus) domain and RCC1 (CHC1)-like domain (RLD) 1	NM_003922	CTTTAG|GAG→CTTTAG|GGG	Delete G	…
rs2272238	32751569	.017	DNM1L	Dynamin 1-like	NM_012062	TTCCAG|GGG→TTCCAG|GAG	Delete G	…
rs11781386	141484978	…			NM_031466	TTGCAG|GTG→TTGCAG|GAG	Delete G	…
rs1130329	4885795	.478	LOC389458		NM_203393	CCCCAG|GAG→CCCCAG|GCG	Delete E	E→A	…
rs13405053	54274242	…	ACYP2	Acylphosphatase 2, muscle type	NM_138448	ATACAG|GTG→ATACAG|GAG	Delete G	…
rs9272863	32718708	…	HLA-DQA1	Major histocompatibility complex, class II, DQ alpha 1	NM_002122	TTGCAG|GTG→TTGCAG|GAG	3′ UTR	…
rs4049844	23340850	…	GGT1	Gamma-glutamyltransferase 1	NM_005265	CCCCAG|GGG→CCCCAG|GAG	Delete G	…
rs4024419	151373558	…	MLL3	Myeloid/lymphoid or mixed-lineage leukemia 3	NM_170606	TCCTAG|GGG→TCCTAG|GAG	Delete E	G→E	…
rs2236215	15637721	.463	KIAA0962		NM_015291	TTTTAG|GAG→TTTTAG|GCG	Delete R	…
rs5744948	131829664	.012	POLE	Polymerase (DNA directed), epsilon	NM_006231	ACCCAG|GAG→ACCCAG|GCG	Delete E	E→A	…
rs4994616	148360461	…	TUFT1	Tuftelin 1	NM_020127	ACCCAG|GAG→ACCCAG|GCG	Delete E	E→A	…
rs1136881^j	6_hla_hap1:61968	…	HLA-DRB3	Major histocompatibility complex, class II, DR beta 3	NM_022555	TCTCAG|GAG→TCTCAG|GTG	RA versus T	R→S	…
rs3209388	19366385	…	RPS6	Ribosomal protein S6	NM_001010	GTTTAG|GAG→GTTTAG|GGG	Delete E	E→G	…
rs12816732	128219086	…	KIAA1944		NM_133448	TTCCAG|GAG→TTCCAG|GAA	Delete E	…
rs17036879	12535560	.055	TSEN2	tRNA splicing endonuclease 2 homolog (SEN2, S. cerevisiae)	NM_025265	CTTTAG|GAG→CTTTAG|GAA	Delete E	…
rs1132591	59514796	…	LILRA5	Leukocyte immunoglobulin-like receptor, subfamily A (with TM domain), member 5	NM_181879	TCCTAG|GAT→TCCTAG|GAG	Delete G	F→V	…
rs7182782	61729175	…	HERC1	HECT (homologous to the E6-AP (UBE3A) carboxyl terminus) domain and RCC1 (CHC1)-like domain (RLD) 1	NM_003922	TTTTAG|GAG→TTTTAG|GAA	Delete G	G→R	…
rs3734334	36898850	.012	CPNE5	Copine V	NM_020939	TTTCAG|GAA→TTTCAG|GAG	Delete R	N→S	…
rs3816208	97683801	.115	SDC2	Syndecan 2 (heparan sulfate proteoglycan 1, cell surface-associated, fibroglycan)	NM_002998	TTCCAG|GAG→TTCCAG|GAA	Delete G	A→T	…
rs4150167	82771185	.026	TAF1C	TATA box binding protein (TBP)-associated factor, RNA polymerase I, C, 110 kDa	NM_139353	CAGGAG|AAG→CAGGAG|AAA	Delete G / Delete GE	G→R	…
rs6836935	187918076	…	FAT	FAT tumor suppressor homolog 1 (Drosophila)	NM_005245	TTCCAG|GAA→TTCCAG|GAG	Delete G	N→D	…
rs2843414	199126353	…	PPP1R12B	Protein phosphatase 1, regulatory (inhibitor) subunit 12B	NM_002481	CTGTAG|GAG→CTGTAG|GAA	Delete G	V→I	…
rs9474034	51605478	…	PKHD1	Polycystic kidney and hepatic disease 1 (autosomal recessive)	NM_138694	CCACAG|GAG→CCACAG|GAA	Delete G	V→I	…
rs1040310	32115304	…	CYP21A2	Cytochrome P450, family 21, subfamily A, polypeptide 2	NM_000500	TGGCAG|GAC→TGGCAG|GAG	Delete E	D→E	…
rs2854741	40861902	.246	CYP2D7P1	Cytochrome P450, family 2, subfamily D, polypeptide 7 pseudogene 1	NM_001002910	ACACAG|GAC→ACACAG|GAG	Delete R	T→R	…
rs2156634	120281211	.191	GRIK4	Glutamate receptor, ionotropic, kainate 4	NM_014619	CTGCAG|GAG→CTGCAG|GAA	Delete E	…
rs2228112	127497758	.449	SLC12A2	Solute carrier family 12 (sodium/potassium/chloride transporters), member 2	NM_001046	TTCTAG|GAA→TTCTAG|GAG	Delete E	…
rs1152888	64891495	.327	IRAK3	Interleukin-1 receptor-associated kinase 3	NM_007199	CCTAAG|GAA→CCTAAG|GAG	Delete G	I→V	…
rs4822258	41780029	.242	TTLL1	Tubulin tyrosine ligase-like family, member 1	NM_001008572	CACCAG|GAG→CACCAG|GAA	Delete E	…
rs1654551	56104480	…	KLK4	Kallikrein 4 (prostase, enamel matrix, prostate)	NM_004917	CCTCAG|GAT→CCTCAG|GAG	Delete G	S→A	…
rs7344530^k	20119988	…	C20orf26	Chromosome 20 ORF 26	NM_015585	TTTCAG|GAT→TTTCAG|GAG	Delete E	D→E	…
rs3179757	18190846	…	PDE4C	Phosphodiesterase 4C, cAMP-specific (phosphodiesterase E1 dunce homolog, Drosophila)	NM_000923	TCTTAG|GAG→TCTTAG|GAA	Delete E	…
rs3815354	9763745	…	GAS7	Growth arrest-specific 7	NM_201433	TTCCAG|GAG→TTCCAG|GAC	Delete G	D→H	…

Note.— Data are from the UCSC Human Genome Browser (SNP source download January 9, 2005).

^a Plausible NAGNAG acceptors are shown in bold italics; implausible NAGNAG acceptors are shown in italics.

^b NA = no approved gene name.

^c 9-nt acceptor context (−6…+3); reference genome sequence is on the left, and alternative SNP allele is on the right.

^d AG = one of the two AG dinucleotides is affected; N = one N in the NAGNAG pattern is affected.

^e Position of the SNP in the 9-nt acceptor context (from 1 = intron position −6 to 9 = exon position +3).

^f Intron phase of the intron or 5′/3′ UTR.

^g Impact of alternative splicing on the protein sequence.

^h aa change of missense SNPs.

^i Number of ESTs/mRNAs confirming the NAGNAG splice event alternative to the given RefSeq.

^j UCSC Human Genome Browser paralog warning; therefore, excluded from further analysis and table 6.

^k dbSNP entry is based on a sequencing error; therefore, excluded from further analysis and table 6.

Searching dbEST (December 2004), we obtained confirmation for alternative splicing at 16% (19 of 121) of these tandem acceptors. However, this percentage must be considered a lower bound. In addition to the general limitations of an EST-based evaluation of alternative splicing (insufficient EST coverage, especially for tandem acceptors that are spliced in a tissue-specific manner), the allele frequencies of the NAGNAG alleles and population biases in EST sampling introduce further constraints. Notably, 18 (95%) of the 19 confirmed tandem acceptors match the consensus HAGHAG (H denotes A, C, or T). Thus, 26% (18) of the 68 polymorphic HAGHAGs are EST confirmed, whereas only 1.9% (1) of the 53 acceptors carrying G at one or both variable positions of the NAGNAG motif are EST supported. This is in line with our previous genomewide analysis, in which 31% of the HAGHAGs and only 1.7% of the remaining NAGNAGs were found to be experimentally confirmed (see table 1 of Hiller et al. [2004]). On the basis of these differences in the degree of confirmation by mRNA and EST data, we propose to subdivide all tandem acceptors into “plausible” (HAGHAG) and “implausible” (GAGHAG, HAGGAG, or GAGGAG) acceptors. Further support for this classification comes from the genomewide observation that all plausible NAGNAGs have the same bias toward intron phase 1, as described elsewhere (Hiller et al. 2004) for experimentally confirmed NAGNAGs, whereas the introns with implausible tandem acceptors are not biased toward phase 1 (table 3).
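
The plausible/implausible subdivision is a pure pattern rule on the six nucleotides of a tandem acceptor, so it can be written down compactly. The following Python sketch is illustrative only: the function name `classify_nagnag` and the returned labels are hypothetical and are not part of the original analysis pipeline.

```python
import re

# Any tandem acceptor: N-A-G-N-A-G
NAGNAG = re.compile(r"[ACGT]AG[ACGT]AG")
# Plausible tandem acceptor: H-A-G-H-A-G, where H is A, C, or T (never G)
HAGHAG = re.compile(r"[ACT]AG[ACT]AG")


def classify_nagnag(motif: str) -> str:
    """Return 'plausible', 'implausible', or 'none' for a 6-nt motif."""
    motif = motif.upper()
    if HAGHAG.fullmatch(motif):
        return "plausible"
    if NAGNAG.fullmatch(motif):
        return "implausible"  # GAGHAG, HAGGAG, or GAGGAG
    return "none"


for m in ("CAGCAG", "TAGGAG", "GAGCAG", "GAGGAG", "CTGCAG"):
    print(m, classify_nagnag(m))
```

Testing for HAGHAG before the more general NAGNAG pattern means that every remaining NAGNAG match necessarily carries a G at one or both variable positions.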

Table 3.

Phase Distribution of Human Introns and NAGNAG Acceptors

	No. (%) of Introns by Phase
Intron Class	0	1	2
Confirmed NAGNAGs^a^,^b	349 (39.8)	379 (43.2)	150 (17.0)
Plausible NAGNAGs^b	1,111 (42.5)	1,099 (42.0)	405 (15.5)
Implausible NAGNAGs^b	2,568 (54.5)	1,466 (31.1)	677 (14.4)
All introns^c	(46)	(33)	(21)

Note.— Only NAGNAGs that are located upstream of a coding exon are considered.

^a EST/mRNA confirmed.

^b From table 1 of Hiller et al. (2004).

^c Genomewide frequencies (Long and Deutsch 1999).

Accordingly, 68 (56%) of the 121 SNPs affect a plausible NAGNAG. However, four of those convert a plausible into another plausible NAGNAG, which has presumably no drastic consequence for NAGNAG splicing, even though we cannot exclude the possibility of changes in the ratio of E to I transcripts or of changes in tissue specificity. Thus, we consider the remaining 64 (53%) SNPs as relevant for NAGNAG splicing (table 4).

Table 4.

SNPs Affecting Plausible NAGNAG Acceptors

dbSNP ID	Gene Symbol	RefSeq ID	Exon	Nucleotide Pattern^a	mRNA/EST^b	Intron Phase^c	Protein Impact^d	Chimpanzee^e	Human versus Chimpanzee^f
rs2307130	AGL	NM_000644	2	CTCTAG|AAG→CTCTGG|AAG		…	5′ UTR	TAGAAG	Loss
rs12944821	AP1GBP1	NM_007247	3	TTTCAG|CAG→TTTCAG|GAG	12:5	1	Delete G	CAGCAG	Loss
rs363209	APPBP1	NM_003905	7	AAACAG|CAC→AAGCAG|CAC		2	Insert S	AAACAG	Gain
rs2287800	AQP8	NM_001169	2	CGGCAG|ATA→CAGCAG|ATA	1:14	0	Insert Q	CGGCAG	Gain
rs13228988	AUTS2	NM_015570	8	CGATAG|CAG→CGATAA|CAG	4:2	2	Delete S	TAGCAG	Loss
rs8176139	BRCA1^h	NM_007304	8	GTTTAG|CAG→GTTGAG|CAG	26:7	0	Delete Q	TAGCAG	Loss
rs1152522	C14orf105	NM_018168	4	TCATAG|CAG→TCATGG|CAG	3:6	0	Delete Q	TAGCAG	Loss
rs1044833	C1QDC1^h	NM_001002259	18	AAACAG|CAG→AAACAT|CAG		1	Delete A	CAGCAG	Loss
rs11567804	C3AR1	NM_004054	2	TTGCAG|AAG→TTGCAA|AAG		…	5′ UTR	CAGAAG	Loss
rs11660370	CABLES1	NM_138375	3	TTTCAG|ATG→TTTCAG|AAG		2	Delete R	CAGATG	Gain
rs1804783	CACNA1A	NM_023035	39	TTGCAG|GAG→TTGCAG|TAG		1	Delete V	CAGGAG	Gain
rs5248	CMA1	NM_001836	3	CAACAG|GTC→CAGCAG|GTC	15:21^g	2	Insert S	CAACAG	Gain
rs3014960	COG3	NM_031431	14	ATACAG|CAG→ATACAG|CAA		0	Delete Q	CAGCAA	Gain
rs10914468	COL16A1	NM_001856	5	CTCCAG|AAG→CTCCAG|ACG		2	Delete R	CAGACG	Gain
rs2425068	CPNE1	NM_152927	16	CCCCAG|CAA→CCCCAG|CAG	79:7	0	Delete Q	CAGCAA	Gain
rs9463545	CRISP1	NM_001131	3	TAACAG|AAG→TAACCG|AAG		0	Delete K	CAGAAG	Loss
rs11597439	CUEDC2	NM_024040	2	CTTCAG|AAG→CTTCAG|AAC	49:2	…	5′ UTR	CAGAAG	Loss
rs3020724	CYP17A1	NM_000102	8	CTGCAG|AGC→CAGCAG|AGC		1	Insert A	CTGCAG	Gain
rs3025420	DBH	NM_000787	11	CACCAG|GTT→CAGCAG|GTT		2	Insert S	CACCAG	Gain
rs12760076	DMAP1	NM_019100	8	TTGCAG|ATG→TTGCAG|AAG		0	Delete K	CAGATG	Gain
rs11661706	EPB41L3	NM_012307	12	CTGCAG|AGG→CTGCAG|AAG		1	Delete E	CAGAGG	Gain
rs2271959	ETV4	NM_001986	3	TCGCAG|AAA→TAGCAG|AAA	3:17	0	Insert Q	TCGCAG	Gain
rs13251099	FLJ36980	NM_182598	2	AAATAG|GTC→AAGTAG|GTC		…	5′ UTR	AAATAG	Gain
rs3765018	FLJ46354	NM_198547	23	TCCCAG|AAG→TCCCAG|AAA	11:1	…	3′ UTR	CAGAAA	Gain
rs6285	GABRB1	NM_000812	3	CGGCAG|GGC→CAGCAG|GGC		1	Insert A	NA	NA
rs4590242	GABRR1	NM_002042	2	TGGTAG|GCC→TAGTAG|GCC	2:2	2	Insert S	TAGTAG	Loss
rs2409496	GART	NM_175085	6	AATCAG|GAG→AATCAG|CAG		0	Delete Q	CAGGAG	Gain
rs751517	GGA1	NM_013365	10	TTCCAG|CGG→TTCCAG|CAG		1	Delete A	NA	NA
rs2010657	GGT1^h	NM_013421	2	CCCCAG|CGG→CCCCAG|CAG		…	5′ UTR	CAGCGG	Gain
rs9644946	GOLGA1	NM_002077	8	AAATAG|GAG→AAGTAG|GAG		0	Insert stop	AAATAG	Gain
rs2243187	IL19	NM_153758	5	TCACAG|CAG→TCACAA|CAG		0	Delete Q	CAGCAG	Loss
rs2290609	IL5RA	NM_000564	5	CAACAG|TTT→CAGCAG|TTT		1	I versus TV	CAACAG	Gain
rs2297988	KIAA0690	NM_015179	33	AAGCAG|AAA→GAGCAG|AAA		0	Insert Q	GAGCAG	Gain
rs1558876	KIAA1001	NM_014960	6	TTTCAG|CAC→TTTCAG|CAG	10:2	2	Delete S	CAGCAG	Loss
rs3746373	KIAA1510	NM_020882	6	CCCCAG|CCG→CCCCAG|CAG		1	Delete A	CAGCCG	Gain
rs2290647	KIAA1533	NM_020895	11	CTCCAG|CGG→CTCCAG|CAG	34:35	1	Delete A	CAGCGG	Gain
rs3738833	LSM10	NM_032881	2	CCACAG|CAA→CCACAG|CAG		…	5′ UTR	CAGCAA	Gain
rs479984	MGC35555	NM_178565	5	TACTAG|AAG→TACTAA|AAG		1	Delete E	TAGAAG	Loss
rs3751353	MGC48915	NM_178540	4	ATTTAG|GAG→ATTTAG|CAG		1	Delete A	TAGGAG	Gain
rs11042902	MRVI1	NM_006069	2	AACCGG|CAG→AACCAG|CAG	2:2	2	NR versus K	CAGCAG	Loss
rs2298847	MT1G	NM_005950	2	TAGCAG|GTG→TTGCAG|GTG	59:14	1	Insert A	TTGCAG	Gain
rs2273431	NID2	NM_007361	10	ATGCAG|AGG→ATGCAG|AAG		1	Delete E	CAGAGG	Gain
rs12974798	NTE	NM_006702	35	TCGCAG|GAG→TCGCAG|AAG		0	Delete K	CAGGAG	Gain
rs17173698	PAPSS2	NM_004670	2	TTATAG|GAG→TTATAG|AAG		0	Delete K	TAGGAG	Gain
rs3842776	PARVG	NM_022141	4	TTCCAG|GAG→TTCCAG|CAG		1	Delete A	CAGGAG	Gain
rs1438073	PDE1A	NM_001003683	3	AAATAG|ACT→AAGTAG|ACT		2	Insert R	AAATAG	Gain
rs3816280	PPP4R1	NM_005134	5	CAATAG|AAC→CAGTAG|AAC		1	Insert V	CAATAG	Gain
rs879022	REGL	NM_006508	3	GGACAG|GAG→GGACAG|AAG		1	Delete E	CAGGAG	Gain
rs1127307	RGS19IP1	NM_202494	6	CAATAG|CGG→CAATAG|CAG	87:8	1	Delete A	NA	NA
rs16960071	SEMA6D	NM_020858	16	ATGAAG|GCT→AAGAAG|GCT		2	Insert R	ATGAAG	Gain
rs11553436	SERHL	NM_170694	11	CTCCAG|CGG→CTCCAG|CAG		…	3′ UTR	CAGCGG	Gain
rs2243603	SIRPB1	NM_006065	5	TTCCAG|AAG→TTCCAG|AAC		1	Delete E	CAGAAC	Gain
rs17105087	SLC25A21	NM_030631	7	CTGCAG|CAA→CTGCAG|CAG		0	Delete Q	CAGCAA	Gain
rs2521612	SLC4A1	NM_000342	17	CCGTAG|GCT→CAGTAG|GCT		2	Insert R	CCGTAG	Gain
rs9621415	SLC5A4	NM_014227	9	CGGCAG|GTC→CAGCAG|GTC		0	Insert Q	CGGCAG	Gain
rs9606756	TCN2	NM_000355	2	TCTAAG|AAA→TCTAAG|AAG	62:2	1	Delete E	AAGAAA	Gain
rs11466221	TGFA^h	NM_003236	2	CAACAG|GTA→CAGCAG|GTA		1	Insert A	CAACAG	Gain
rs2245425	TOR1AIP1^h	NM_015602	3	TAGCAG|TGA→TAACAG|TGA	13:41	1	Insert A	TAGCAG	Loss
rs1071716	TPM2	NM_213674	6	CCCCAG|CCG→CCCCAG|CAG		2	Delete S	CAGTCG	Gain
rs4434604	TRIM55	NM_033058	8	TACCAG|AAG→TACCAG|AGG		1	Delete E	CAGAAG	Loss
rs7862221	TSC1	NM_000368	14	CTTCAG|AAG→CTTCAG|AGG		1	Delete E	CAGAAG	Loss
rs11574323	WRN	NM_000553	23	GGGTAG|AAT→GGGTAG|AAG		2	QS versus H	TAGAAT	Gain
rs2275992	ZFP91^h	NM_170768	5	TTTTAG|TAG→TTTTAG|TGG	31:7	2	Delete S	TAGTAG	Loss
rs200925	ZNF248	NM_021045	5	TAACAG|GGT→TAGCAG|GGT		1	Insert A	TAACAG	Gain

Note.— NA = sequence context not available in panTro1.

^a 9-nt acceptor context; reference genome sequence is on the left, and SNP allele is on the right. Polymorphic position is shown in bold italics.

^b Number of mRNA and ESTs that match the E:I transcripts (shown only if both transcripts are EST confirmed).

^c Phase of the intron or 5′/3′ UTR.

^d Impact of alternative NAGNAG splicing on the protein sequence.

^e Chimpanzee sequence orthologous to human NAGNAG.

^f Gain = plausible NAGNAG in one of the human alleles and no or implausible NAGNAG in chimpanzee; loss = no or implausible NAGNAG in one of the human alleles and plausible NAGNAG in chimpanzee.

^g Experimentally confirmed within the present study.

^h Experimentally confirmed elsewhere (Tadokoro et al. 2005).

Cases of SNPs that comprise NAGNAG-acceptor and non-NAGNAG–acceptor alleles represent knockout experiments made by nature. We took this opportunity to investigate the assumed correlation between NAGNAG-acceptor genotypes and the appearance of E and I transcripts. Such a study seemed reasonable, since, so far, it has been performed in artificial splicing systems only (Tadokoro et al. 2005). We selected six SNPs with a heterozygosity of >0.2 that affect EST-confirmed HAGHAG acceptors for genotyping and detection of transcript forms. In two cases, we did not find either genotypes with at least one NAGNAG allele or genotypes that are homozygous for the non-NAGNAG allele. In the remaining four cases, we consistently observed E and I transcripts in cells with at least one HAGHAG allele, whereas cells that do not have a HAGHAG acceptor allele produced only one transcript (table 1). This strict correlation between NAGNAG alleles and alternative splicing is illustrated for ZFP91 and TOR1AIP1 in figure 2. These results confirm that NAGNAG motifs are necessary for this type of alternative splicing.

Figure 2.

SNPs that affect plausible NAGNAG acceptors as knockout experiments made by nature. A, Schematic representation of the nomenclature of NAGNAG acceptors (*left*) and transcripts (*right*). B, SNP *rs2245425* affecting the E acceptor of *TOR1AIP1* exon 3 leads to the exclusive expression of the I transcript from the A allele (NAGNAG position −4; for numbering scheme, refer to fig. 1). C, SNP *rs2275992* affecting the I acceptor of *ZFP91* exon 5 leads to the exclusive expression of the E transcript from the G allele (position −2). Homozygous NAGNAG allele (a), heterozygous (b), and homozygous non-NAGNAG allele (c) are shown as genomic with genotypes (*left*); cDNA with E:I transcript ratio determined by counting subcloned and sequenced RT-PCR fragments (*right*). The asterisk (*) denotes E transcripts that can be assigned to the SNP alleles in the I acceptor (A=15; G=72).

Next, we asked whether NAGNAG motifs created by the nonancestral SNP alleles are also sufficient for alternative splicing. With regard to the human reference sequence, in 36 (56%) of 64 cases, a novel NAGNAG is created; in 18 (28%), a known NAGNAG is destroyed by affecting an AG; and in 10 (16%), the N positions are changed. Since the appearance of a SNP allele in the current human genome build is rather random and does not reflect either the relative allele frequency in a defined population or its evolutionary history, the best reference for the question of gain versus loss of NAGNAG acceptors is the UCSC Chimpanzee Genome Browser (panTro1, November 2003). When the sequence context of the 64 plausible NAGNAG-affecting SNPs is compared, for 61 (95%), the orthologous chimpanzee nucleotide is identical to one of the two human alleles, which we therefore consider the ancestral one (Watanabe et al. 2004). In 43 cases, the plausible NAGNAG is gained (nonancestral), and, in 18 cases, it is lost (ancestral). Consistent with our assumption that novel plausible NAGNAGs are very likely functional, we found EST evidence of alternative splicing in 16% (7 of 43) (table 4). To provide further experimental support that respective SNP alleles enable alternative NAGNAG splicing, we selected two nonancestral plausible NAGNAGs without EST evidence. As expected, in leukocytes of individuals heterozygous or homozygous for the respective tandem allele of rs5248, we observed the expression of E and I transcripts (GenBank accession numbers DQ082727 and DQ082729) in the ratios 4:14 and 11:7, respectively (table 4). In the case of rs17105087, we were unable to identify the nonancestral allele in our white population sample. By analyzing the human-chimpanzee genomic sequence context of the eight confirmed nonancestral NAGNAG alleles, we found three cases in which both genomes are identical in a long range (rs2287800 [−140/+123 identical nucleotides], rs3765018 [−130/+95 nt], and rs2290647 [−105/+70 nt]). Since most splice enhancers function only within a distance of <100 nt from the affected splice site (Schaal and Maniatis 1999), these findings suggest that NAGNAG motifs are sufficient for alternative splicing in the context of a previously non-NAGNAG acceptor.
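
The gain/loss assignment relative to the chimpanzee sequence can be expressed as a small rule. The sketch below is a hedged illustration, not the code used in the study: the function names are hypothetical, the 9-nt contexts are written as in table 4 (with `|` marking the intron-exon border), and the chimpanzee 6-mer is simply assumed to represent the ancestral state.

```python
import re

HAGHAG = re.compile(r"[ACT]AG[ACT]AG")


def is_plausible(motif: str) -> bool:
    """True for a plausible tandem acceptor (HAGHAG, H = A, C, or T)."""
    return HAGHAG.fullmatch(motif.upper()) is not None


def plausible_in_context(context9: str) -> bool:
    """Check both 6-nt windows of a 9-nt acceptor context (-6..+3).

    Window 1 covers a tandem formed with an upstream AG, window 2 a tandem
    formed with a downstream AG.
    """
    s = context9.replace("|", "").upper()
    return is_plausible(s[0:6]) or is_plausible(s[3:9])


def gain_or_loss(ref9: str, alt9: str, chimp6: str) -> str:
    """Classify a SNP as 'Gain' or 'Loss' of a plausible NAGNAG."""
    ref_ok, alt_ok = plausible_in_context(ref9), plausible_in_context(alt9)
    if ref_ok == alt_ok:
        # Plausible in both alleles or in neither: no gain/loss call is made.
        return "not relevant"
    return "Loss" if is_plausible(chimp6) else "Gain"


print(gain_or_loss("CTCTAG|AAG", "CTCTGG|AAG", "TAGAAG"))  # rs2307130
print(gain_or_loss("AAACAG|CAC", "AAGCAG|CAC", "AAACAG"))  # rs363209
```

Applied to the rows of table 4 shown in the example, the rule returns the entries of the last column ("Loss" for rs2307130 and "Gain" for rs363209). An implausible chimpanzee motif such as GAGCAG also counts as "no plausible NAGNAG", in line with the table definition of gain.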

Evolutionary Aspects of SNPs in NAGNAG Acceptors

At first glance, surprisingly, the large majority (43 [70%] of 61) of the plausible NAGNAGs are created (35 novel tandem AG alleles and 8 conversions of implausible into plausible), whereas only 18 are destroyed (16 AG destructions and 2 conversions of plausible into implausible). Therefore, we questioned whether there is a trend toward gain-of-NAGNAG acceptors in the human lineage. To test this, we used a null model that maps SNPs to randomly chosen acceptors (see appendix A) and found nearly the same relation for gain and loss of plausible NAGNAG acceptors. Thus, the high number of nonancestral plausible NAGNAGs is presumably a consequence of the fact that NAGNAG motifs represent only 5% of all human acceptors (Hiller et al. 2004). In consequence, in recent primate genomes, a constant bias seems to exist toward the accumulation of NAGNAG acceptors, which leads to an increased complexity of the transcriptome and proteome, antagonized by purifying selection. The question of whether the currently observed NAGNAG fraction among human acceptors represents the saturation level has to be addressed by further comparative genomewide analyses.

Furthermore, we observed striking differences in the numbers of SNPs that affect the AG of the E or I acceptor in ancestral plausible and implausible NAGNAGs, respectively. For the 16 ancestral plausible HAGHAGs, the E acceptor is affected in 11 cases and the I acceptor in 5. In contrast, for 22 implausible HAGGAGs (one ancestral GAGGAG and two GAGHAGs were omitted), we found 5 and 17 cases, respectively (Fisher’s exact test P=.00766). Interestingly, we observed the same trend by comparing all 138 human NAGNAGs that are not conserved in the chimpanzee genome (one GAGGAG and seven GAGHAGs were omitted). The I acceptors of 79 HAGHAGs are affected in 56% (44), whereas the GAG of 59 HAGGAGs is affected in 83% (49) (Fisher’s exact test P=.0009). Implausible GAGGAG and GAGHAG motifs were not considered, since the number of cases is too small.
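
The two comparisons above are 2×2 contingency tables evaluated with Fisher's exact test. As a minimal sketch, the test can be computed with the Python standard library alone; the function name and the table orientation are assumptions made for illustration, and the two-sided P value is defined here as the summed probability of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Unnormalised hypergeometric weights kept as integers to avoid
    # floating-point problems when comparing equally likely tables.
    weight = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    observed = weight[a]
    extreme = sum(w for w in weight.values() if w <= observed)
    return extreme / comb(n, col1)


# Ancestral tandem acceptors: rows HAGHAG and HAGGAG,
# columns "E acceptor affected" and "I acceptor affected".
p_ancestral = fisher_exact_two_sided(11, 5, 5, 17)

# Nonconserved human NAGNAGs: second AG affected versus not affected,
# for 79 HAGHAGs (44 affected) and 59 HAGGAGs (49 affected).
p_nonconserved = fisher_exact_two_sided(44, 79 - 44, 49, 59 - 49)

print(f"{p_ancestral:.5f}")
print(f"{p_nonconserved:.5f}")
```

For the first table, this definition gives 0.00766, matching the value reported above.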

Since tandem acceptors are nonrandomly distributed in the human genome, with a bias toward intron phase 1 and toward single-aa indels in phase 1 and 2, we questioned whether the nonancestral plausible NAGNAGs are also biased. Indeed, these NAGNAGs show the same bias toward intron phase 1, and they also have a strong tendency to result in single-aa indels (table 5). Thus, the process of establishing SNPs that are relevant for alternative NAGNAG splicing in the human population seems to be a nonrandom process that is subject to the same evolutionary forces as the maintenance of the tandem acceptors themselves.

Table 5.

Intron Phase Distribution and Single aa Events of Nonancestral Plausible NAGNAG Acceptors

	No. (%) of Introns by Phase
Acceptor	0	1	2	No. (%) of Single-aa Events, Phases 1 and 2
Nonancestral NAGNAG alleles^a	12 (31.6)	16 (42.1)	10 (26.3)	24 (92.3)
Nonpolymorphic confirmed NAGNAGs^b	349 (39.8)	379 (43.2)	150 (17.0)	487 (92.1)

Note.— Only NAGNAGs that are located upstream of a coding exon are considered.

^a Plausible polymorphic NAGNAGs for which the chimpanzee acceptor has no NAGNAG.

^b EST/mRNA-confirmed NAGNAGs (Hiller et al. 2004).

Discussion

Since splicing variations are coming more and more into the research focus of human molecular genetics (Lopez-Bigas et al. 2005), novel approaches are needed to identify splice-relevant SNPs. By data mining the SNP annotation of the UCSC Human Genome Browser, we identified 121 variations that may affect alternative splicing by creation, destruction, or changing of NAGNAG acceptors. To improve the specificity of our prediction, we classified NAGNAG acceptors into “plausible” (HAGHAG) and “implausible” (GAGHAG, HAGGAG, or GAGGAG) ones. This subdivision of the tandem acceptors, primarily based on the degree of confirmation by mRNA and EST data, is further supported by (1) the fact that GAG acceptors are very rare (Stamm et al. 2000), (2) our genomewide observation that only plausible and not implausible NAGNAGs have the same bias toward intron phase 1 as experimentally confirmed NAGNAGs (Hiller et al. 2004), and (3) the observed differences in the numbers of SNPs that affect the AG of the E or I acceptor in ancestral plausible and implausible NAGNAGs, respectively. The last indicates that the selection pressure to maintain the E acceptor for HAGGAGs is higher than the pressure to preserve the coding sequence, since destruction of the HAG acceptor will leave a GAG that is unlikely to act as an acceptor site. In contrast, for plausible HAGHAGs, destruction of either AG is much less deleterious, since the other will still function as an acceptor. Thus, the identified 64 SNPs in plausible NAGNAGs are highly predictive of variations in alternative splicing. Nevertheless, it remains an experimental and bioinformatic challenge for future research to elucidate what makes the rare confirmed implausible NAGNAG acceptors functional.

Although it seems obvious that the disruption of a plausible NAGNAG acceptor abolishes the formation of alternative transcripts, SNPs in these motifs provide us with unique knockout experiments by nature to confirm this hypothesis experimentally. Analyzing the expression of E and I transcripts in cells with at least one HAGHAG allele or without HAGHAG alleles, we have shown that the NAGNAG motif is necessary for this type of alternative splicing. In a subsequent analysis, we asked whether NAGNAG motifs created by the nonancestral SNP alleles allow alternative splicing. Usually, the introduction of an AG anywhere in the pre-mRNA does not create a functional acceptor site, since a polypyrimidine tract upstream and possibly enhancer sequences are required for recognition by the spliceosome. However, we suppose that the creation of a second AG 3 bases up or downstream of an existing acceptor is very likely to result in a functional tandem acceptor, since the splice-relevant sequence context is already present.

Referring to the chimpanzee genome as the reference for ancestral SNP alleles, we found EST and RT-PCR evidence that novel plausible NAGNAGs are most likely functional. This implies that a change of a normal acceptor to a plausible NAGNAG acceptor by a single mutation is sufficient to enable alternative splicing. Although the mechanism of NAGNAG splicing is not understood in detail, our findings argue against a general involvement of signals other than the NAGNAG itself. Thus, we conclude that SNPs in plausible NAGNAGs have an influence on NAGNAG splicing, regardless of whether the NAGNAG is ancestral. However, additional signals might be necessary for regulation of alternative splicing at tandem acceptors.

Most interestingly, 23% (15 of 64) of SNPs in plausible NAGNAGs are translationally nonsilent and, thus, introduce a novel dimension of variability on the protein level by changing the I acceptor and the aa sequence of the E protein. Whereas homozygotes express either one or two isoforms, heterozygosity results in three different proteins (fig. 3). As listed in the Human Gene Mutation Database, the aa change can be dramatic—for example, as from Glu to the oppositely charged Lys in PAPSS2 (rs17173698), which leads to a decrease in immunoreactive protein (Xu et al. 2002). However, the third isoform of the protein generated by alternative NAGNAG splicing had not been taken into consideration. Moreover, it is conceivable that some of the SNPs in NAGNAG acceptors that allow the formation of three protein isoforms in heterozygotes may confer a heterozygous advantage.

Figure 3.

SNP affecting the I acceptor and the aa sequence of the E protein (*rs2275992* in *ZFP91*). Homozygosity of the G allele without a NAGNAG results in the expression of one protein (A), homozygosity of the A allele with the NAGNAG results in two (B), and heterozygosity results in three isoforms (C). All three transcripts are confirmed by at least four ESTs/mRNAs. The two allele variants are highlighted in light and dark gray. Amino acids are shown below the second codon position. Upper- and lowercase letters indicate exonic and intronic nucleotides, respectively. Exons are boxed.
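
The count of one, two, or three isoforms per genotype follows from two simple rules, sketched below for a SNP such as rs2275992 that lies in the I acceptor and, at the same time, in the coding sequence of the E transcript. The function name and the isoform labels are hypothetical and serve only to make the reasoning explicit.

```python
def protein_isoforms(genotype: str, tandem_allele: str = "A") -> set:
    """Isoforms expected for a genotype such as 'AA', 'AG', or 'GG'."""
    isoforms = set()
    for allele in genotype:
        # Rule 1: every allele yields an E transcript, and this transcript
        # carries the SNP nucleotide of that allele.
        isoforms.add(f"E({allele})")
        # Rule 2: only the allele with the intact tandem can also yield the
        # I transcript, from which the SNP position is spliced out.
        if allele == tandem_allele:
            isoforms.add("I")
    return isoforms


for genotype in ("GG", "AA", "AG"):
    print(genotype, sorted(protein_isoforms(genotype)))
```

The heterozygote combines the E isoform of the non-NAGNAG allele with both isoforms of the NAGNAG allele, which is why it is the only genotype that reaches three.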

Alternative splicing at tandem acceptors can result in the gain/loss of a premature stop codon in the mRNA. Among SNPs affecting plausible NAGNAGs, the G allele of SNP rs9644946 changes the acceptor context of GOLGA1 exon 8 from AAATAG to AAGTAG. Since intron 7 resides in phase 0, an in-frame TAG insertion would be the consequence if the novel E acceptor is used. Interestingly, the gene codes for an autoantigen associated with Sjogren syndrome (MIM 270150). Since the E acceptor is preferred in alternative NAGNAG splicing (Hiller et al. 2004), the novel AAG acceptor is likely to be functional. The resulting E transcript is a candidate for nonsense-mediated mRNA decay (Maquat 2004). Thus, the AAGTAG allele would result in a lower protein expression. Alternatively, it is possible that the mRNA containing the premature stop codon escapes degradation, and the truncated protein may exhibit autoantigenic properties. It remains to be elucidated, in populations with a sufficiently high allele frequency (e.g., 0.099 in the PERLEGEN panel that contains 24 samples of Chinese descent), whether alternative splicing at the AAGTAG acceptor contributes to the disease.
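
For phase-0 introns, the consequence of using the E acceptor can be read directly from the motif, because the downstream NAG is then retained as a complete codon. The following sketch illustrates this for the phase-0 case only; the function name is hypothetical, and phases 1 and 2 are deliberately not handled, since they require the flanking exon nucleotides.

```python
# Translation of the four possible NAG triplets (standard genetic code).
NAG_CODONS = {"AAG": "K", "CAG": "Q", "GAG": "E", "TAG": "stop"}


def phase0_indel(tandem: str) -> str:
    """Codon that the E transcript carries in addition to the I transcript.

    Valid only for a NAGNAG acceptor of a phase-0 intron: splicing at the
    upstream AG keeps the downstream NAG in the mRNA as one in-frame codon.
    """
    tandem = tandem.upper()
    if len(tandem) != 6 or tandem[1:3] != "AG" or tandem[4:6] != "AG":
        raise ValueError("expected a 6-nt NAGNAG motif")
    return NAG_CODONS[tandem[3:6]]


print(phase0_indel("AAGTAG"))  # G allele of rs9644946 in GOLGA1
print(phase0_indel("CAGCAG"))  # e.g., the tandem allele of rs2287800 in AQP8
```

The two calls return "stop" and "Q", in agreement with the "Insert stop" and "Insert Q" entries for these phase-0 acceptors in table 4.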

A second example of potential disease relevance is the SNP rs363209, the G allele of which creates a novel plausible AAGCAG acceptor of intron 6 in APPBP1 (GenBank accession number NM_003905). The APP-BP1 protein binds to the carboxyl-terminal region of the amyloid precursor protein (APP) and interacts with the ubiquitin-activating enzyme E1C (UBE1C [homolog to yeast Uba3]) in the process of neddylation (Walden et al. 2003). APP plays a central role in Alzheimer disease and Down syndrome. Dysfunction of the APP-BP1 interaction with APP has been suggested to be one cause of Alzheimer disease (Chen 2004). The protein-protein interactions of the APP-BP1 E and I isoforms may be different and modulate the respective processes. It should be mentioned that the UBE1C gene (GenBank accession number NM_003968) itself contains a tandem acceptor (CAGAAG in front of exon 11). This may further increase the flexibility of the neddylation process by all four combinations of the E/I protein isoforms from two genes each.

The disease relevance of a NAGNAG SNP is demonstrated for the ABCA4 gene (Maugeri et al. 1999). Maugeri et al. (1999) describe a NAGNAG mutation (2588G→C, changing the acceptor site TAGGAG→TAGCAG) that has a much higher frequency in patients with Stargardt disease 1 (STGD1 [MIM 248200]) and that is assumed to be a mild mutation that causes STGD1 in combination with a severe ABCA4 mutation. By experimental analysis of the splice patterns of two patients with STGD1 who carry the mutation and one control individual, they found that only the alleles with the TAGCAG produce two splice forms. Our study exactly predicts this mutation outcome.

In general, most of the SNPs that are described in the present study, in particular those in plausible NAGNAGs, affect the E:I transcript ratio, depending on the cell’s genotype. SNP alleles with a destroyed E acceptor cause the exclusive expression of the I transcript. Alleles that destroy an I acceptor result in an exclusive expression of the longer E transcript. SNPs that comprise a plausible and an implausible NAGNAG allele will seriously hamper or disable splicing at the GAG acceptor. It has already been shown that a change in the ratio of alternative splice forms can cause diseases. For example, the change in the ratio of the alternative MAPT transcripts containing three or four microtubule-binding repeats may be causal for frontotemporal dementia (MIM 600274) (Spillantini et al. 1998). Another example is the WT1 gene, in which alternative donor usage results in two protein isoforms that differ in 3 aa (+KTS/−KTS isoforms) and function (Englert et al. 1995). The altered ratio of +KTS/−KTS leads to Frasier syndrome (MIM 136680) (Barbaux et al. 1997). This situation is similar to that of NAGNAG acceptors, since E/I protein isoforms are observed that have functional differences (Condorelli et al. 1994; Tadokoro et al. 2005).

Altogether, 28% (18 of 64) of the plausible NAGNAG SNPs occur in known disease genes (table 6). Thus, they are preferable candidates for more-detailed functional analysis and association studies to link alternative splicing with diseases. Currently, there are no general methods that allow the prediction of splice-relevant SNPs. Focusing on SNPs that affect NAGNAG acceptors, we present a highly effective approach for the identification of SNPs that result in variations in alternative splicing patterns.

Table 6.

Human Disease Genes with SNPs Affecting Plausible NAGNAG Acceptors

dbSNP ID	Gene Symbol	RefSeq ID	Disease	MIM Number(s)	PubMed ID(s)
rs3020724	CYP17A1	NM_000102	Adrenal hyperplasia, congenital	#202110, *609300	4303304
rs12042060	FIBL-6	NM_031935	Age-related macular degeneration	#603075, *608548	14570714
rs2243187	IL19	NM_153758	Asthma	*605687	15557163
rs8176139	BRCA1	NM_007304	Breast cancer	*113705, #114480	9167459
rs11567804	C3AR1	NM_004054	Bronchial asthma	*605246	15278436
rs3025420	DBH	NM_000787	Congenital dopamine-beta-hydroxylase deficiency	#223360, *609312	14991826
rs2409496	GART	NM_175085	Down syndrome	*138440	9328467
rs1804783	CACNA1A	NM_023035	Episodic ataxia-2, familial hemiplegic migraine, spinocerebellar ataxia-6, idiopathic generalized epilepsy	#183086, #141500, #108500, *601011	8988170, 8898206, 9302278
rs2010657	GGT1	NM_013421	Glutathionuria	+231950	238530, 7623451
rs2307130	AGL	NM_000644	Glycogen storage disease type III	+232400	9032647, 10925384
rs1833783	FTL	NM_000146	Hyperferritinemia-cataract syndrome	#600886, *134790	7493028, 12199804
rs11661706	EPB41L3	NM_012307	Meningioma, lung cancer	*605331	10888600, 9892180
rs2275992	ZFP91	NM_170768	Acute myeloid leukemia	#601626	12738986,
rs1071716	TPM2	NM_213674	Nemaline myopathy-4, distal arthrogryposis 1	#609285, #108120, *190990	11738357, 12592607
rs2521612	SLC4A1	NM_000342	Renal tubular acidosis, ovalocytosis, spherocytosis	#179800, 166900, +109270	9600966, 1737855, 9973643
rs9644946	GOLGA1	NM_002077	Sjogren syndrome	270150, *602502	9324025
rs17173698	PAPSS2	NM_004670	Spondyloepimetaphyseal dysplasia	*603005	9714015
rs9606756	TCN2	NM_000355	Transcobalamin II deficiency	+275350	14632784
rs7862221	TSC1	NM_000368	Tuberous sclerosis	#191100, *605284	12773162, 14551205
rs11574323	WRN	NM_000553	Werner syndrome	#277700, *604611	9012406, 8968742

Acknowledgments

The skillful technical assistance of Ivonne Görlich is gratefully acknowledged. This work was supported by German Ministry of Education and Research grants 01GS0426 (to S.S.) and 01GR0105 and 0312704E (to M.P.) as well as Deutsche Forschungsgemeinschaft grant SFB604-02 (to M.P.).

Appendix A: Randomization Null Model for NAGNAG SNPs

To assess whether there is a preference for creating plausible NAGNAGs, we used a simulation that assigns a new acceptor to the 2,896 SNPs that overlap an acceptor in the 9-nt context and evaluates a possible NAGNAG-relevant outcome. For the 2,896 SNPs, we blasted the 101-nt genomic context (50 nt upstream and 50 nt downstream of the SNP) against the chimpanzee genome to determine the ancestral allele variant. We kept alignments with at least 95% identity and no mismatches in a ±5-nt context around the SNP position. This yielded a total of 2,439 SNPs. Then, we blasted the 103-nt contexts (50 nt up- and downstream of the acceptor NAG) of 10,000 human acceptor sites (excluding the acceptors that are overlapped by a known SNP) against the chimpanzee genome and kept 8,082 for which we found an alignment (95% identity and no mismatch ±10 nt around the NAG). Then, we assigned a new acceptor (randomly chosen from the 8,082) to a given SNP. We chose an acceptor with the ancestral allele variant at the respective position (e.g., if a SNP changes a C→G at position 4 of the 9-nt context, the new acceptor must also have a C at position 4). Since a methylated C in a CG context frequently mutates to a T, we assigned a new acceptor with the same sequence context at this position if the SNP represents a C→T mutation in a CG context (or a G→A mutation in a GC context on the opposite strand). This assures that context-dependent mutations are simulated in the same context. If a new acceptor is assigned to a SNP, we evaluated the possible impact on a NAGNAG acceptor. For each of the 2,439 SNPs, we successively assigned 10 randomly chosen acceptors (avoiding duplicate assignments).

The whole procedure was repeated 10 times, with different starts of the random-number generator. We calculated the following statistics from the 10 runs: (1) minimum and maximum percentage of creation versus destruction of a plausible NAGNAG, (2) minimum and maximum percentage of changes from a plausible to an implausible NAGNAG versus changes from an implausible to a plausible NAGNAG, and (3) minimum and maximum percentage of “gain of plausible NAGNAG” versus “loss of plausible NAGNAG.” “Gain of plausible NAGNAG” is the sum of created, plausible NAGNAGs and changes from implausible to plausible. “Loss of plausible NAGNAG” is the sum of destroyed, plausible NAGNAGs and changes from plausible to implausible. These values were compared with the observed values by Fisher’s exact test. For (1), we obtained P values between .52 and .75, for (2), P values between .72 and 1, and, for (3), P values between .66 and .88. Thus, the observed bias toward “gain of plausible NAGNAG” is comparable to the expectation.
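
The core of the randomization can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the original implementation: all names are hypothetical, the CpG context matching and the final Fisher's exact tests are omitted, only gain and loss of a plausible NAGNAG are counted, and the toy input is invented solely to exercise the code.

```python
import random
import re

HAGHAG = re.compile(r"[ACT]AG[ACT]AG")


def has_plausible(context9: str) -> bool:
    """True if either 6-nt window of the 9-nt context is a HAGHAG."""
    return bool(HAGHAG.fullmatch(context9[0:6]) or HAGHAG.fullmatch(context9[3:9]))


def event(context9: str, pos: int, derived: str) -> str:
    """Outcome of mutating position `pos` (1..9) of an ancestral context."""
    mutated = context9[:pos - 1] + derived + context9[pos:]
    before, after = has_plausible(context9), has_plausible(mutated)
    if after and not before:
        return "gain"
    if before and not after:
        return "loss"
    return "none"


def simulate(snps, acceptors, n_assign=10, seed=0):
    """Map each SNP onto randomly chosen acceptors and count the outcomes.

    snps: (position, ancestral nucleotide, derived nucleotide) triples.
    acceptors: 9-nt acceptor contexts (-6..+3) without a known SNP.
    """
    rng = random.Random(seed)
    counts = {"gain": 0, "loss": 0, "none": 0}
    for pos, ancestral, derived in snps:
        # Only acceptors with the ancestral nucleotide at this position qualify.
        candidates = [a for a in acceptors if a[pos - 1] == ancestral]
        # Sampling without replacement avoids duplicate assignments.
        for acc in rng.sample(candidates, min(n_assign, len(candidates))):
            counts[event(acc, pos, derived)] += 1
    return counts


# Toy input, invented for illustration (not data from the study).
toy_acceptors = ["TTTCAGGTA", "CTGCAGAAG", "AAACAGCAC", "CAACAGGTC", "TTCCAGCGG"]
toy_snps = [(3, "A", "G"), (8, "G", "A"), (5, "A", "G")]
for seed in range(3):  # repeated runs with different random-number seeds
    print(seed, simulate(toy_snps, toy_acceptors, n_assign=2, seed=seed))
```

Repeating the run with different seeds, as in the loop at the end, corresponds to the repetition with different starts of the random-number generator; the minimum and maximum gain-to-loss proportions over the runs would then be compared with the observed counts.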

Web Resources

Accession numbers and URLs for data presented herein are as follows:

dbEST, ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/est_human.gz (for the human portion of dbEST)
GenBank, http://www.ncbi.nlm.nih.gov/Genbank/ (for the human mRNA download, ZFP91 [accession number NM_170768], DTX2 [accession numbers DQ082728 and DQ082730], CMA1 [accession numbers DQ082727 and DQ082729], APPBP1 [accession number NM_003905], and UBE1C [accession number NM_003968])
Human Gene Mutation Database, http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html
Online Mendelian Inheritance in Man (OMIM), http://www.ncbi.nlm.nih.gov/Omim/ (for Sjogren syndrome, STGD1, frontotemporal dementia, and Frasier syndrome)
UCSC Chimpanzee Genome Browser, http://hgdownload.cse.ucsc.edu/goldenPath/panTro1/bigZips/ (for source download panTro1 [November 2003])
UCSC Human Genome Browser, http://hgdownload.cse.ucsc.edu/goldenPath/hg17/bigZips/ (for source download hg17)

References

Barbaux S, Niaudet P, Gubler MC, Grunfeld JP, Jaubert F, Kuttenn F, Fekete CN, Souleyreau-Therville N, Thibaud E, Fellous M, McElreavey K (1997) Donor splice-site mutations in WT1 are responsible for Frasier syndrome. Nat Genet 17:467–470. doi:10.1038/ng1297-467
Cartegni L, Chew SL, Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3:285–298. doi:10.1038/nrg775
Chen YZ (2004) APP induces neuronal apoptosis through APP-BP1-mediated downregulation of β-catenin. Apoptosis 9:415–422. doi:10.1023/B:APPT.0000031447.05354.9f
Condorelli G, Bueno R, Smith RJ (1994) Two alternatively spliced forms of the human insulin-like growth factor I receptor have distinct biological activities and internalization kinetics. J Biol Chem 269:8510–8516
Englert C, Vidal M, Maheswaran S, Ge Y, Ezzell RM, Isselbacher KJ, Haber DA (1995) Truncated WT1 mutants alter the subnuclear localization of the wild-type protein. Proc Natl Acad Sci USA 92:11960–11964
Fredman D, White SJ, Potter S, Eichler EE, Den Dunnen JT, Brookes AJ (2004) Complex SNP-related sequence variation in segmental genome duplications. Nat Genet 36:861–866. doi:10.1038/ng1401
Garcia-Blanco MA, Baraniak AP, Lasda EL (2004) Alternative splicing in disease and therapy. Nat Biotechnol 22:535–546. doi:10.1038/nbt964
Hiller M, Huse K, Szafranski K, Jahn N, Hampe J, Schreiber S, Backofen R, Platzer M (2004) Widespread occurrence of alternative splicing at NAGNAG acceptors contributes to proteome plasticity. Nat Genet 36:1255–1257. doi:10.1038/ng1469
Karinch AM, deMello DE, Floros J (1997) Effect of genotype on the levels of surfactant protein A mRNA and on the SP-A2 splice variants in adult humans. Biochem J 321:39–47
Ke X, Hunt S, Tapper W, Lawrence R, Stavrides G, Ghori J, Whittaker P, Collins A, Morris AP, Bentley D, Cardon LR, Deloukas P (2004) The impact of SNP density on fine-scale patterns of linkage disequilibrium. Hum Mol Genet 13:577–588. doi:10.1093/hmg/ddh060
Long M, Deutsch M (1999) Association of intron phases with conservation at splice site sequences and evolution of spliceosomal introns. Mol Biol Evol 16:1528–1534
Lopez-Bigas N, Audit B, Ouzounis C, Parra G, Guigo R (2005) Are splicing mutations the most frequent cause of hereditary disease? FEBS Lett 579:1900–1903. doi:10.1016/j.febslet.2005.02.047
Lynch KW, Weiss A (2001) A CD45 polymorphism associated with multiple sclerosis disrupts an exonic splicing silencer. J Biol Chem 276:24341–24347. doi:10.1074/jbc.M102175200
Maquat LE (2004) Nonsense-mediated mRNA decay: splicing, translation and mRNP dynamics. Nat Rev Mol Cell Biol 5:89–99. doi:10.1038/nrm1310
Maugeri A, van Driel MA, van de Pol DJR, Klevering BJ, van Haren FJJ, Tijmes N, Bergen AAB, Rohrschneider K, Blankenagel A, Pinckers AJLG, Dahl N, Brunner HG, Deutman AF, Hoyng CB, Cremers FPM (1999) The 2588G→C mutation in the ABCR gene is a mild frequent founder mutation in the Western European population and allows the classification of ABCR mutations in patients with Stargardt disease. Am J Hum Genet 64:1024–1035
Pagani F, Baralle FE (2004) Genomic variants in exons and introns: identifying the splicing spoilers. Nat Rev Genet 5:389–396. doi:10.1038/nrg1327
Schaal TD, Maniatis T (1999) Multiple distinct splicing enhancers in the protein-coding sequences of a constitutively spliced pre-mRNA. Mol Cell Biol 19:261–273
Spillantini MG, Murrell JR, Goedert M, Farlow MR, Klug A, Ghetti B (1998) Mutation in the tau gene in familial multiple system tauopathy with presenile dementia. Proc Natl Acad Sci USA 95:7737–7741. doi:10.1073/pnas.95.13.7737
Stamm S, Zhu J, Nakai K, Stoilov P, Stoss O, Zhang MQ (2000) An alternative-exon database and its statistical analysis. DNA Cell Biol 19:739–756. doi:10.1089/104454900750058107
Tadokoro K, Yamazaki-Inoue M, Tachibana M, Fujishiro M, Nagao K, Toyoda M, Ozaki M, Ono M, Miki N, Miyashita T, Yamada M (2005) Frequent occurrence of protein isoforms with or without a single amino acid residue by subtle alternative splicing: the case of Gln in DRPLA affects subcellular localization of the products. J Hum Genet 50:382–394. doi:10.1007/s10038-005-0261-9
Taudien S, Galgoczy P, Huse K, Reichwald K, Schilhabel M, Szafranski K, Shimizu A, Asakawa S, Frankish A, Loncarevic IF, Shimizu N, Siddiqui R, Platzer M (2004) Polymorphic segmental duplications at 8p23.1 challenge the determination of individual defensin gene repertoires and the assembly of a contiguous human reference sequence. BMC Genomics 5:92. doi:10.1186/1471-2164-5-92
Valentonyte R, Hampe J, Huse K, Rosenstiel P, Albrecht M, Stenzel A, Nagy M, Gaede KI, Franke A, Haesler R, Koch A, Lengauer T, Seegert D, Reiling N, Ehlers S, Schwinger E, Platzer M, Krawczak M, Muller-Quernheim J, Schurmann M, Schreiber S (2005) Sarcoidosis is associated with a truncating splice site mutation in BTNL2. Nat Genet 37:357–364. doi:10.1038/ng1519
Walden H, Podgorski MS, Schulman BA (2003) Insights into the ubiquitin transfer cascade from the structure of the activating enzyme for NEDD8. Nature 422:330–334. doi:10.1038/nature01456
Watanabe H, Fujiyama A, Hattori M, Taylor TD, Toyoda A, Kuroki Y, Noguchi H, et al (2004) DNA sequence and comparative analysis of chimpanzee chromosome 22. Nature 429:382–388. doi:10.1038/nature02564
Xu ZH, Freimuth RR, Eckloff B, Wieben E, Weinshilboum RM (2002) Human 3′-phosphoadenosine 5′-phosphosulfate synthetase 2 (PAPSS2) pharmacogenetics: gene resequencing, genetic polymorphisms and functional characterization of variant allozymes. Pharmacogenetics 12:11–21. doi:10.1097/00008571-200201000-00003
