Yeo et al. 10.1073/pnas.0409742102.

Supporting Information

Files in this Data Supplement:

Supporting Text
Supporting Table 1
Supporting Table 2
Supporting Figure 5
Supporting Table 3
Supporting Table 4
Supporting Figure 6
Supporting Figure 7
Supporting Table 5
Supporting Figure 8
Supporting Table 6
Supporting Table 7




Supporting Figure 5

Fig. 5. Performance of models that differ in the features used and in the number of oligonucleotide features included. (A) Average area under the curve (AUC) values obtained from cross-validation for the different models when using various numbers of top-ranking oligonucleotide features. (B) Feature combinations used by each model (a to i); a check indicates that a feature class was included and an x that it was not. The feature classes are core features (5′ and 3′ splice site scores, exon length, and upstream and downstream flanking intron lengths); upstream and downstream intron alignment scores; exon alignment or similarity scores; and oligonucleotide features from aligned regions (exon alignment features and upstream and downstream intron alignment features) and from unaligned regions (exon features and flanking intron features).
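The AUC values in panel A summarize how well each model ranks the positive training exons above the negative ones on held-out data. As a hedged illustration (not the authors' code), the AUC for one set of held-out scores can be computed from the Mann–Whitney rank statistic and then averaged over cross-validation folds. The helper `top_k_features` and its `ranking` input are hypothetical stand-ins for the feature-ranking step, whose criterion is not stated in this legend.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_cv_auc(folds: Sequence[tuple[Sequence[float], Sequence[int]]]) -> float:
    """Average AUC over cross-validation folds, each given as (scores, labels)."""
    return sum(auc(s, y) for s, y in folds) / len(folds)


def top_k_features(ranking: dict[str, float], k: int) -> list[str]:
    """Hypothetical helper: names of the k highest-ranked features."""
    return sorted(ranking, key=ranking.get, reverse=True)[:k]
```

For example, `auc([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0])` returns 0.75, because three of the four positive/negative pairs are ordered correctly. Repeating `mean_cv_auc` for each model and each value of k would trace out curves of the kind summarized in panel A.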





Supporting Figure 6

Fig. 6. ACESCAN scores for all orthologous human–mouse exons identified by our automated procedure in sample genes. Known alternative exons are marked with asterisks. The following known alternatively spliced (AS) exons are illustrated. (A) Exons 14 (69 bp) and 26 (54 bp) of the human voltage-dependent T-type calcium channel α1G subunit gene (CACNA1G, Ensembl Gene ID ENSG00000006283). Electrophysiological studies have shown that skipping exon 26 of CACNA1G, which encodes α1G (a human brain T-type Ca2+ channel α1 subunit), alters the kinetics of channel deactivation and of recovery from inactivation (1). In addition to exon 26, this gene has three other cassette (skipped) exons, numbered 14, 34, and 35 (2). Exons 34 and 35 were not paired with annotated mouse exons by our automated exon orthology script and therefore did not receive ACESCAN scores, a reminder that incomplete annotation or orthology assignment limits the sensitivity of comparative methods such as ACESCAN. (B) Exon 11 (34 bp) of the human polypyrimidine tract-binding protein gene (PTB, Ensembl Gene ID ENSG00000011304). Skipping this exon introduces a premature stop codon in the downstream exon, generating a substrate for nonsense-mediated mRNA decay as part of an autoregulatory negative feedback loop (3). This PTB exon illustrates a possible function for the ~32% of predicted alternative conserved exons (ACEs) that disrupt the reading frame.
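To a first approximation, whether skipping an exon disrupts the reading frame depends on whether its length is a multiple of three. The short sketch below applies that rule to the exon lengths quoted in this legend. It is an illustrative check, not part of the ACESCAN pipeline, and it deliberately ignores stop codons that can arise without a frameshift (for example, at the new exon–exon junction).

```python
def preserves_frame(exon_length_bp: int) -> bool:
    """True if skipping an exon of this length keeps the downstream reading
    frame, i.e. the length is a multiple of 3.

    Simplification: only the length is checked; a stop codon can also arise
    at the new exon-exon junction or lie within the exon itself.
    """
    return exon_length_bp % 3 == 0


# Exon lengths quoted in the legend for Fig. 6.
exons = {
    "CACNA1G exon 14": 69,
    "CACNA1G exon 26": 54,
    "PTB exon 11": 34,
}

for name, length in exons.items():
    status = "frame-preserving" if preserves_frame(length) else "frameshifting"
    print(f"{name} ({length} bp): {status}")
# CACNA1G exon 14 (69 bp): frame-preserving
# CACNA1G exon 26 (54 bp): frame-preserving
# PTB exon 11 (34 bp): frameshifting
```

The 34-bp PTB exon leaves a remainder of 1 when divided by 3, which is consistent with its skipping shifting the frame and exposing a premature stop codon downstream.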

1. Chemin, J., Monteil, A., Bourinet, E., Nargeot, J. & Lory, P. (2001) Biophys. J. 80, 1238–1250.

2. Mittman, S., Guo, J. & Agnew, W. S. (1999) Neurosci. Lett. 274, 143–146.

3. Wollerton, M. C., Gooding, C., Wagner, E. J., Garcia-Blanco, M. A. & Smith, C. W. (2004) Mol. Cell 13, 91–100.





Supporting Figure 7

Fig. 7. Fraction of genes annotated with each of various Gene Ontology (GO) terms, shown for genes containing the SH,M exons used in training, for genes containing predicted ACEs, and for all Ensembl-annotated genes.
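Each value in this figure is a simple proportion: the number of genes in a set that carry a given GO annotation, divided by the size of the set. The minimal sketch below assumes that GO annotations are available as a mapping from term to gene IDs (a hypothetical input format) and that unannotated genes still count in the denominator. It computes these fractions for any one of the three gene sets.

```python
def go_term_fractions(
    gene_set: set[str], annotations: dict[str, set[str]]
) -> dict[str, float]:
    """For each GO term, the fraction of genes in gene_set annotated with it.

    annotations maps a GO term to the set of gene IDs annotated with that term
    (hypothetical format). Genes without any annotation still count in the
    denominator.
    """
    if not gene_set:
        raise ValueError("gene_set must be non-empty")
    return {
        term: len(gene_set & genes) / len(gene_set)
        for term, genes in annotations.items()
    }


# Toy data with made-up gene IDs and term names.
fractions = go_term_fractions(
    {"g1", "g2", "g3", "g4"},
    {"term_A": {"g1", "g2"}, "term_B": {"g3", "g9"}},
)
print(fractions)  # {'term_A': 0.5, 'term_B': 0.25}
```

A gene annotated with several terms contributes to each of them, so the fractions across terms need not sum to 1.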





Supporting Figure 8

Fig. 8. Expression patterns of genes with predicted ACEs. Shown are tissues in which genes containing predicted ACEs are significantly overrepresented or underrepresented (P < 0.05) among tissue-specifically expressed genes. A gene was considered tissue-specifically expressed in a tissue if its expression there was >2-fold higher than its median value across all tissues in an HG-U95A microarray study. Tissues are ordered from right to left by increasingly significant bias toward genes containing predicted ACEs.
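The legend states the tissue-specificity criterion but does not name the statistical test behind the P < 0.05 threshold. The hedged sketch below first applies the stated rule (>2-fold above the gene's median across tissues). It then scores over- and underrepresentation of ACE-containing genes using one-sided hypergeometric tail probabilities. This test is a common choice for this kind of enrichment question, but its use here is an assumption. The input dictionaries are a hypothetical format, not the HG-U95A data files.

```python
import math
from statistics import median


def tissue_specific_genes(
    expression: dict[str, dict[str, float]], tissue: str, fold: float = 2.0
) -> set[str]:
    """Genes whose level in `tissue` is more than `fold` times their median
    level across all tissues. `expression` maps gene ID -> {tissue: level}
    (hypothetical input format)."""
    return {
        gene
        for gene, levels in expression.items()
        if levels[tissue] > fold * median(levels.values())
    }


def _ways(i: int, population: int, successes: int, draws: int) -> int:
    # Number of ways to draw exactly i ACE-containing genes in `draws` picks.
    return math.comb(successes, i) * math.comb(population - successes, draws - i)


def hypergeom_sf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) for a hypergeometric X: one-sided test for overrepresentation."""
    total = math.comb(population, draws)
    top = min(successes, draws)
    return sum(_ways(i, population, successes, draws) for i in range(k, top + 1)) / total


def hypergeom_cdf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X <= k) for a hypergeometric X: one-sided test for underrepresentation."""
    total = math.comb(population, draws)
    return sum(
        _ways(i, population, successes, draws) for i in range(0, min(k, draws) + 1)
    ) / total


def ace_bias(
    all_genes: set[str], ace_genes: set[str], specific: set[str]
) -> tuple[float, float]:
    """One-sided P-values (over, under) for ACE-containing genes among the
    tissue-specifically expressed genes, relative to all genes."""
    population = len(all_genes)
    successes = len(ace_genes & all_genes)
    draws = len(specific & all_genes)
    k = len(ace_genes & specific & all_genes)
    return (
        hypergeom_sf(k, population, successes, draws),
        hypergeom_cdf(k, population, successes, draws),
    )
```

Calling `tissue_specific_genes` for each tissue in turn and passing the result to `ace_bias` yields one pair of P-values per tissue. The legend does not state whether, or how, the original analysis corrected for testing multiple tissues.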