DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes. 2009 Oct 30;16(6):371–383. doi: 10.1093/dnares/dsp022

Identification and Functional Analyses of 11 769 Full-length Human cDNAs Focused on Alternative Splicing

Ai Wakamatsu 1, Kouichi Kimura 2, Jun-ichi Yamamoto 3, Tetsuo Nishikawa 3, Nobuo Nomura 4, Sumio Sugano 5, Takao Isogai 1,3,*
PMCID: PMC2780955  PMID: 19880432

Abstract

We analyzed the diversity of mRNAs produced by alternative splicing in order to evaluate gene function. First, we predicted the number of human genes transcribed into protein-coding mRNAs by using the sequence information of full-length cDNAs and 5′-ESTs, and obtained 23 241 such genes. Next, using these genes, we analyzed mRNA diversity and consequently sequenced and identified 11 769 human full-length cDNAs whose predicted open reading frames differed from those of other known full-length cDNAs. Notably, 30% of the cDNAs we identified contained variation in the transcription start site (TSS). Our analysis, which focused particularly on multiple variable first exons (FEVs) formed by the alternative utilization of TSSs, led to the identification of 261 FEVs expressed in a tissue-specific manner. Quantification of the expression profiles of 13 genes by real-time PCR further confirmed the tissue-specific expression of FEVs; for example, OXR1 had specific TSSs in brain and tumor tissues. Finally, based on the results of our mRNA diversity analysis, we created the FLJ Human cDNA Database. Our results shed light on the mechanisms by which a single gene produces protein-coding transcripts suited to the situation and the environment.

Keywords: full-length cDNA, alternative splicing, alternative transcription start site, mRNA diversity, tissue-specific expression

1. Introduction

One of the most interesting findings revealed by the Human Genome Project is that the human genome contains only 20 000–25 000 protein-coding genes.1 This number is unexpectedly small. To explain this result and to understand the functions of genes, it is necessary to analyze mRNA diversity.

Biologically, multiple transcripts can be generated from a single gene by alternative splicing (AS). According to several reports on genome research, AS occurs in 30–60% of human genes.2–5 It has been reported that AS of a single gene can produce transcripts coding for multiple proteins, each exhibiting different biochemical properties, including binding, intracellular localization and regulation of enzymatic activity.6 AS is also of interest to pharmaceutical research because aberrant AS of genes can lead to various genetic diseases and cancers.7 We have particularly focused on the analysis of AS patterns that are produced by utilizing alternative transcription start sites (TSSs). Indeed, multiple transcripts have been shown to be produced from a single gene by utilizing variable TSSs.8,9 For example, the Pcdh gene, which contains variable TSSs, was shown to produce different transcripts;10 similarly, the UGTs (UDP-glucuronosyltransferases) contain more than 10 TSSs.11 From these findings, it is clear that to elucidate gene function, we must further our understanding of all transcripts made from each gene, particularly the protein-coding transcripts. However, identification of all protein-coding transcripts has so far been difficult because much of the EST data accumulated in the databases consists of 3′-EST data, which were obtained by sequencing cDNAs from the polyA end. Thus, even though the sequences of a large number of mRNAs are already known, our understanding of these mRNAs has remained incomplete because of the fragmentary nature and 3′-end bias of their sequences. Because of this lack of sequence information, it has been difficult to predict TSSs and to identify all the open reading frame (ORF) regions. Although next-generation sequencers have helped advance the analysis of TSSs, it remains extremely difficult to evaluate the diversity of mRNAs transcribed from each gene because these sequencers accumulate only short sequences (less than 50 bases) of cDNA clones.12,13

We sequenced ∼55 000 human full-length cDNAs, including the 11 769 newly identified cDNAs described in this paper, and also obtained ∼1.45 million 5′-end one-pass sequences (5′-ESTs).14–17 We believe that these cDNA sequences are very useful for analyzing the diversity of protein-coding transcripts and will contribute to our understanding of mRNA. First, our cDNA clones were isolated from full-length human cDNA libraries constructed by an optimized oligo-capping method; therefore, by utilizing their sequence information, we were able to identify TSSs with 90% or better accuracy.14,18–20 Thus, we could easily and accurately identify the TSSs of even low-expressing genes, which until now required the comparison of a large amount of data.17 Second, our 5′-EST data contained, on average, ∼500 bases of sequence per cDNA clone, which covered two or more exons. Since the average length of the 5′-untranslated region is believed to be 125 bases,21 it was possible to predict ORF regions using our 5′-EST data. Finally, and most importantly, all of our resources were obtained from full-length cDNAs, which include both the TSS and the polyA site. Moreover, we could obtain various findings on protein expression from our full-length cDNAs.16 Such findings cannot be obtained from the sequences of short mRNA fragments. Since AS of genes can potentially create a large number of protein-coding transcripts, analyzing full-length cDNAs might be immensely valuable for understanding gene function.

Here, we report our analysis of 11 769 full-length cDNAs that were identified from our full-length cDNA libraries and contained ORFs altered as a result of AS. We also present our analysis of the splice patterns and expression profiles of the identified cDNAs to explore the correlation between mRNA diversity and gene function. Furthermore, we describe 261 full-length cDNAs with unique TSSs, known as multiple variable first exons (FEVs), and report their expression profiles. Finally, we report the establishment of the FLJ Human cDNA Database, based on the results of our analysis of the variable protein-coding transcripts generated from each gene by AS.

2. Materials and methods

2.1. Construction of full-length cDNA libraries

Most total RNAs isolated from various tissues and cells were purchased from Clontech and Ambion. Cells were cultured following established protocols, and cytoplasmic total RNAs were extracted from these cultured cells following a standard RNA purification method. The total RNAs used in this study are listed in Supplementary Table S1. We constructed cDNA libraries from total RNAs by an optimized oligo-capping method (a detailed description is provided in Supplementary Method 1).18,19 Briefly, total RNAs were treated with bacterial alkaline phosphatase (TaKaRa) and tobacco acid pyrophosphatase and then ligated to the oligo-RNA using RNA ligase (TaKaRa). Oligo-capped polyA(+) RNAs were then isolated using oligo-dT columns. First-strand cDNAs were synthesized using Superscript II reverse transcriptase (Invitrogen), the synthesized cDNAs were amplified using the Gene Amp XL PCR kit (ABI), and the amplified product was digested with the restriction enzyme SfiI. Fragments longer than 2 kb were selected and purified by agarose gel electrophoresis and cloned into the DraIII-digested pME18SFL3 vector following standard methods. The 5′-end one-pass sequences of the cloned cDNAs were determined using ABI 377 and 3700 sequencers (ABI). The 5′-end fullness rate of the constructed oligo-capped cDNA libraries was evaluated as described previously,22,23 and the detailed method for determining the 5′-end fullness rate is provided in Supplementary Method 2.

2.2. Genome mapping and clustering

The 5′- and 3′-end sequences of cDNAs and the full-length cDNA sequences (Supplementary Table S2) were mapped onto the human genome (UCSC hg18, NCBI Build 36.1). Possible local alignments between the cDNA and genome sequences were identified using the NCBI Mega BLAST program (ftp://ftp.ncbi.nih.gov/blast/). For each cDNA, the best mapping of the sequence was determined from these local alignments using a dynamic programming technique that optimized the identity, coverage and topology of exons. The junctions between consecutive local alignments were refined so as to restore the consensus sequences of canonical splice sites. On the basis of the mapping results, clustering of the cDNA sequences was performed as follows: two cDNA sequences were grouped into the same cluster if their mapped positions shared at least one base on the genome. In general, each cluster corresponded to a single gene locus.
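The clustering rule stated above (two cDNAs join the same cluster if their mapped positions share at least one genomic base) can be illustrated with a minimal sketch. This is not the authors' implementation. The input layout, the function name `cluster_by_shared_base`, the use of exon blocks as the "mapped positions", and the decision to ignore strand are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_by_shared_base(mappings):
    """Group cDNAs whose mapped exon blocks share at least one genomic base.

    mappings: dict of cDNA id -> (chromosome, list of (start, end) blocks),
    with inclusive coordinates (a hypothetical input layout).
    Returns a list of sets of cDNA ids, one set per cluster.
    """
    parent = {cid: cid for cid in mappings}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect every block per chromosome, remembering which cDNA owns it.
    blocks_by_chrom = defaultdict(list)
    for cid, (chrom, blocks) in mappings.items():
        for start, end in blocks:
            blocks_by_chrom[chrom].append((start, end, cid))

    # Sweep the blocks in start order; a block that begins at or before the
    # end of the current run overlaps some block already in that run.
    for blocks in blocks_by_chrom.values():
        blocks.sort()
        run_end, run_owner = None, None
        for start, end, cid in blocks:
            if run_end is not None and start <= run_end:
                union(run_owner, cid)
                run_end = max(run_end, end)
            else:
                run_end, run_owner = end, cid

    clusters = defaultdict(set)
    for cid in mappings:
        clusters[find(cid)].add(cid)
    return list(clusters.values())


example = {
    "cDNA_a": ("chr1", [(100, 200), (400, 500)]),
    "cDNA_b": ("chr1", [(450, 600)]),   # overlaps the second exon of cDNA_a
    "cDNA_c": ("chr1", [(250, 300)]),   # lies in an intron of cDNA_a
    "cDNA_d": ("chr2", [(100, 200)]),   # different chromosome
}
clusters = cluster_by_shared_base(example)
```

In this toy input, cDNA_a and cDNA_b fall into one cluster, whereas cDNA_c stays separate because it shares no base with any exon block of cDNA_a. If whole transcript spans rather than exon blocks were compared, cDNA_c would join that cluster, so the choice of what counts as a mapped position matters.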

2.3. Identification of alternatively spliced variants of mRNAs

On the basis of the results of the genome mapping and clustering analysis, ESTs containing regions that differed from known full-length cDNAs as a result of AS were selected using Intris, a viewer for cDNA-genome alignments used for the analysis of splicing variants and expression profiles.24 To exclude cDNA fragments derived from immature mRNA and genomic DNA, the reliability of each mRNA was evaluated by using not only the human EST data but also sequence conservation data from other animals (Phastcons; obtained from the UCSC Genome Browser). We predicted the ORF regions of the selected ESTs from the 5′-end sequences of the full-length cDNAs by using ATGpr (http://flj.lifesciencedb.jp/top/).25 Next, we excluded from the analytical targets those ESTs whose predicted ORF regions were the same as the ORF regions of known full-length cDNAs. In addition, even if the predicted ORF regions were different from those of known full-length cDNAs, we excluded cDNA clones containing ORF regions that were extremely short (mostly 60 amino acids or less) compared with those of the other full-length cDNAs mapped to the same locus of the human genome. The selected cDNAs were further sequenced by the primer walking method using an ABI3700 sequencer (ABI) to obtain information on 500 additional bases, and the ORF regions were predicted again by using ATGpr.25 We also evaluated the predicted ORF regions by using TRis,26 a translated region inspector, and examined the novelty of their amino acid sequences by using ALVISION,27 which aligns two cDNA sequences that are splicing variants while allowing large gaps. When the reliability of the predicted ORF region was insufficient, we excluded the cDNA from our list of analytical targets. When the predicted ORF regions of the selected cDNAs were judged to be reliable and different from those of the known full-length cDNAs, we sequenced the full-length cDNA clone all the way to the stop codon. Consequently, we completely sequenced 11 769 full-length FLJ cDNAs and analyzed their tissue-specific expression. A detailed method for the analysis of the tissue-specific expression of the cDNAs is provided in Supplementary Method 3. We have also constructed the FLJ Human cDNA Database (http://flj.lifesciencedb.jp), which contains this sequence information. A detailed method for the analysis of AS using the information available in the FLJ Human cDNA Database is provided in Supplementary Method 4. The sequences of our 11 769 full-length cDNAs were also deposited in the DDBJ/GenBank/EMBL databases (AK293122–AK304890).
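The two ORF-based exclusion rules described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the function name `keep_candidate` is hypothetical, ORFs are compared as exact amino acid strings, and the 60-residue limit is applied as a hard cutoff, whereas the text says the excluded ORFs were "mostly" 60 amino acids or less and were judged relative to other cDNAs at the same locus.

```python
def keep_candidate(predicted_orf, known_orfs, max_excluded_length=60):
    """Return True if a predicted ORF passes both exclusion rules.

    predicted_orf: amino acid sequence predicted for the candidate clone.
    known_orfs: amino acid sequences of known full-length cDNAs mapped to
        the same locus.
    max_excluded_length: assumed hard cutoff for "extremely short" ORFs.
    """
    # Rule 1: drop candidates whose ORF matches a known full-length cDNA.
    if predicted_orf in known_orfs:
        return False
    # Rule 2: drop candidates with extremely short ORFs.
    if len(predicted_orf) <= max_excluded_length:
        return False
    return True


known = {"M" + "A" * 199}
novel_long = keep_candidate("M" + "K" * 149, known)    # new ORF, 150 residues
same_as_known = keep_candidate("M" + "A" * 199, known)  # identical to a known ORF
too_short = keep_candidate("M" + "K" * 39, known)       # new ORF, 40 residues
```

Only the first candidate is kept. The later steps in the text (re-prediction after primer walking, inspection with TRis and ALVISION) involve manual judgment and are not modeled here.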

2.4. Functional analysis of full-length cDNAs in silico

The cDNA sequences were analyzed for signal sequences, transmembrane domains and motifs in the encoded proteins by using SignalP ver. 3.0 (http://www.cbs.dtu.dk/services/SignalP/), SOSUI ver. 1.5 (Mitsui Knowledge Industry) and Pfam 19.0 (November 2005; http://pfam.sanger.ac.uk/), respectively. We obtained information on motifs showing E-values of e-30 or more from the Pfam analysis, and based on these results, we categorized each cDNA and its corresponding gene according to its gene ontology (GO) (http://www.geneontology.org/) classification by using InterPro (http://www.ebi.ac.uk/interpro/).

2.5. Quantitative real-time PCR analysis

Total RNAs derived from various tissues were purchased from Clontech, Ambion and STRATAGENE (listed in Supplementary Table S4). From 10 µg of each total RNA, first-strand cDNAs were synthesized using random primers and Superscript III reverse transcriptase (Invitrogen) following the manufacturer's instructions. Real-time PCR was performed using TaqMan Universal Master Mix (ABI) or SYBR Master Mix (ABI) on an ABI Fast7500 System (ABI) according to the manufacturer's instructions. Approximately 300 ng of template cDNA was used in each PCR reaction. Probes and primers were designed using Primer Express3.0 (ABI) (see Supplementary Table S5 for the list of primers). The expression levels of genes were normalized to that of human GAPDH, and the expression values of individual genes were calculated by comparing their Ct values with that of the control using the RQ software (ABI). Expression levels are presented on a log10 scale. Samples were run in duplicate, and the data shown are the averages of two experiments.
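The normalization described above resembles the widely used comparative Ct calculation, which can be sketched as follows. This is not the RQ software's implementation. It assumes the standard 2^(−ΔΔCt) formula with roughly 100% amplification efficiency, and the function name and argument layout are hypothetical.

```python
import math


def log10_relative_expression(ct_target, ct_gapdh, ct_target_ref, ct_gapdh_ref):
    """log10 of the relative quantity by the comparative Ct method.

    Each argument is a list of replicate Ct values: the target gene and
    GAPDH in the sample of interest, then the target gene and GAPDH in the
    reference (control) sample. Assumes the product doubles every cycle.
    """
    def mean(values):
        return sum(values) / len(values)

    # Normalize the target to GAPDH within each sample.
    delta_sample = mean(ct_target) - mean(ct_gapdh)
    delta_reference = mean(ct_target_ref) - mean(ct_gapdh_ref)
    # Compare the sample with the reference.
    delta_delta_ct = delta_sample - delta_reference
    # Relative quantity is 2 ** (-delta_delta_ct); return its log10.
    return -delta_delta_ct * math.log10(2)


# Duplicate wells, matching the duplicate runs mentioned in the text.
value = log10_relative_expression(
    ct_target=[24.0, 24.0], ct_gapdh=[18.0, 18.0],
    ct_target_ref=[27.0, 27.0], ct_gapdh_ref=[18.0, 18.0],
)
```

Here the target crosses the threshold three cycles earlier in the sample than in the reference after GAPDH normalization, which corresponds to an eightfold higher level, or about 0.9 on the log10 scale.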

3. Results and discussion

3.1. Identification of human genes

It is known that AS can produce mRNA diversity.2–6 However, to analyze mRNA diversity, it is first necessary to identify human genes (i.e. the genome loci from which protein-coding mRNAs are transcribed). We obtained 1.45 million human full-length cDNAs and sequenced their 5′-ends. We previously selected ∼30 000 cDNAs from these full-length cDNAs on the basis of novelty analysis and completely sequenced them.14–16 Later, we selected a further ∼25 000 cDNAs on the basis of mRNA diversity and also sequenced them completely. To identify human genes, we used the sequence information of these 55 000 full-length human cDNAs, including the 11 769 cDNAs reported in this paper (Supplementary Table S2). Furthermore, we used not only our own data but also 52 000 full-length human cDNA sequences available from public databases, 30 000 human RefSeq sequences (NCBI Reference Sequences; http://www.ncbi.nlm.nih.gov/RefSeq/) and 48 000 Ensembl human gene transcripts (http://www.ensembl.org/index.html). In addition, we used EST sequences obtained by us and from other public databases (Supplementary Table S2). All the sequence data we collected were mapped onto the human genome and clustered. We then examined the reliability of each full-length cDNA with Intris24 using the sequences of all full-length cDNAs and ESTs mapped to the same locus of the genome, and based on this analysis, we selected only the reliable cDNAs for the gene identification analysis. We determined the genome locus of each of the selected reliable cDNAs and manually checked them one by one to identify the corresponding gene. As a result, we identified 23 241 human genes (Fig. 1A). Each gene cluster was classified into one of three categories based on its reliability score. The number of genes in the high-reliability category (high category) was 16 754, accounting for 72% of the total number of genes. Genes in the high category were regarded as already analyzed because their genome loci were covered by sequence information available from all three types of databases: the human full-length cDNAs, RefSeq and Ensembl. The number of genes with intermediate reliability (medium category) was 2854. In the medium category, the genome locus was covered by sequence information available from only the human full-length cDNAs or from two of the three above-mentioned databases. The number of genes with low reliability (low category) was 3633. In the low category, the gene locus was covered by sequence information available only from RefSeq or Ensembl.
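The three reliability categories can be restated as a small decision rule. This is a sketch of the database-coverage criteria as worded above. The function name is hypothetical, and the authors' actual scoring also involved manual inspection, which is not captured here.

```python
def reliability_category(has_full_length_cdna, has_refseq, has_ensembl):
    """Assign a locus to a reliability category from its database coverage."""
    n_sources = sum([has_full_length_cdna, has_refseq, has_ensembl])
    if n_sources == 3:
        return "high"      # covered by all three types of databases
    if n_sources == 2 or has_full_length_cdna:
        return "medium"    # two of three, or full-length cDNAs alone
    if n_sources == 1:
        return "low"       # RefSeq alone or Ensembl alone
    return None            # no supporting sequence at this locus


# Gene counts reported in the text for each category.
counts = {"high": 16754, "medium": 2854, "low": 3633}
total_genes = sum(counts.values())
high_share_percent = 100 * counts["high"] / total_genes
```

The three counts sum to the 23 241 genes reported, and the high category share works out to about 72%, in agreement with the text.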

Figure 1.


Clustering of human cDNA sequences. (A) Estimation of the number of human genes from full-length cDNAs and ESTs. Outline of our gene prediction method from the human full-length cDNAs and ESTs mapped to human genome is schematically shown. For each one of the predicted genes, classification reliability was evaluated manually. (B) Cover rate of FLJ EST sequences and (C) cover rate of FLJ full-length sequenced cDNAs. Results of reliability analysis according to the category based on the cover rates of 1.45 million of ESTs (B) and 55 000 full-length cDNAs (C).

To further assess these reliability categories, we next calculated the cover rate of genes by our cDNAs. First, the cover rate was calculated using our 1.45 million FLJ ESTs, and we found a positive correlation between reliability and the cover rate of FLJ ESTs (Fig. 1B). Next, we calculated the cover rate of genes using our 55 000 FLJ human full-length cDNA sequences. In this case, we also found a positive correlation between reliability and the cover rate, similar to that observed for the ESTs (Fig. 1C). Thus, we were able to verify the reliability classification irrespective of whether we used the sequences of our ESTs or those of our full-length cDNAs.

3.2. Analysis of AS and functional classification of sequenced full-length cDNAs by GO

We selected 25 000 full-length cDNAs from among the identified genes by focusing on AS and subsequently sequenced them. From these cDNAs, we selected 11 769 human full-length cDNAs in which the ORF regions were predicted to be different from those of the known full-length cDNAs, and then classified them by GO according to their predicted functions. First, ESTs exhibiting a splicing pattern different from that of the known full-length cDNAs were selected and completely sequenced. From the sequence analysis, we were able to predict the ORF regions in only 30% of them (results not shown). Interestingly, a number of the cDNAs for which we were unable to predict the ORF region were thought to be produced by AS. However, because our aim was to predict the function of a gene from the sequence of its transcript, it was necessary to select protein-coding transcripts efficiently. It is difficult to predict the ORF region correctly from EST sequences lacking the TSS. However, our 5′-EST sequences not only contained the TSS but also contained, on average, 500 bases of sequence information from the TSS. Therefore, we were able to correctly predict the ORF regions of our 5′-ESTs by using ATGpr.25 As a result, the proportion of clones containing unpredictable ORF regions decreased to ∼10%. Moreover, by using tools such as TRins26 for inspecting the translated region and ALVISION27 for evaluating the novelty of amino acid sequences, we succeeded in identifying the ORF regions with high accuracy. Consequently, we obtained 11 769 human full-length cDNAs in which the ORF regions were predicted to be different from those of the known full-length cDNAs (Supplementary Table S3). Ninety-six percent of these cDNAs encoded proteins that differed by at least 10 amino acids from those encoded by their respective known full-length cDNAs, mainly because we selected them on the basis of ORF regions altered by AS. These full-length cDNAs covered 7025 of the 23 241 genes that we had originally identified.

Once it was established that human genes could produce multiple protein-coding transcripts, it was important to analyze their putative functions. The GO classification analysis was performed for all 11 769 of our full-length cDNAs using Pfam, and the predicted functions obtained from this analysis are summarized in Table 1. The classification results revealed that a large number of our cDNA clones were listed under the GO molecular function categories ‘nucleotide binding’, ‘nucleic acid binding’, ‘protein binding’, ‘hydrolase activity’, ‘transferase activity’ and ‘oxidoreductase activity’. Because our 11 769 full-length cDNAs had ORF regions different from those of the known full-length cDNAs, we also analyzed their functions by predicting domains and motifs using Pfam, SOSUI and SignalP (Supplementary Table S3). Consequently, we discovered full-length cDNAs that encoded proteins with altered functional domains and signal sequences as a result of AS.

Table 1.

Functional classification of the 11 769 full-length cDNAs based on the molecular function hierarchy of GO

Functional categorization (GO: molecular function) Number of matched cDNAs
Binding
 Nucleotide binding 681
 Nucleic acid binding 341
 Protein binding 202
 Ion binding 149
 Lipid binding 28
 Tetrapyrrole binding 27
 Neurotransmitter binding 24
 Carbohydrate binding 22
 Other bindings 57
Catalytic activity
 Hydrolase activity 506
 Transferase activity 479
 Oxidoreductase activity 207
 Ligase activity 85
 Lyase activity 47
 Helicase activity 38
 Isomerase activity 26
 Other catalytic activities 106
Enzyme regulator activity
 GTPase regulator activity 45
 Enzyme inhibitor activity 44
 Other enzyme regulator activities 21
Motor activity
 Microtubule motor activity 24
 Other motor activities 20
Signal transducer activity
 Receptor activity 124
 Receptor binding 25
 Other signal transducer activities 40
Structural molecule activity
 Structural constituent of ribosome 25
 Other structural molecule activities 56
Transcription regulator activity
 Transcription factor activity 138
 Other transcription regulator activities 39
Translation regulator activity
 Translation factor activity, nucleic acid binding 25
Transporter activity
 Ion transporter activity 169
 Carrier activity 90
 Channel or pore class transporter activity 79
 ATPase activity, coupled to movement of substances 39
 Other transporter activities 131
Others 2
Molecular function unknown 45

If a protein was predicted to belong to two or more categories, all categories were included for counting.

3.3. Classification of splicing patterns of full-length cDNAs

Until now, the majority of the ESTs entered in the public databases have been 3′-ESTs. We succeeded in constructing full-length cDNA libraries efficiently by using the optimized oligo-capping method and obtained ∼1.4 million 5′-ESTs from the full-length cDNAs constructed by this method.18,19 Our 5′-EST sequences were especially useful for the analysis of TSSs because 90% or more of our cDNAs contained the TSS. We analyzed the splicing patterns of the 11 769 cDNAs by using the 5′-EST sequence data (Fig. 2). This analysis revealed that 3403 cDNAs, corresponding to ∼30% of all the cDNAs, were transcribed using alternative TSSs (Type A); thus, the predicted proteins contained new amino acid sequences at their N-terminal ends. In addition, 1962 of the Type A cDNAs (designated Type A1) contained an FEV, arising either from a TSS that had previously been overlooked because it mapped to an intron region of the genome or from a TSS that mapped upstream of the one analyzed before. Taken together, these results led to the discovery of new exons. We analyzed the expression profiles of the genes containing multiple TSSs and found that the same gene could code for proteins with diverse functions in different tissues through the use of alternative TSSs. There were 8277 cDNAs (i.e. ∼70% of all the full-length cDNAs) that were transcribed from previously identified TSSs but contained different ORF regions because of AS; these were designated Type B. Because we used our 5′-EST data for the selection, many Type B cDNAs were predicted to contain N-terminal sequences different from those of the known cDNAs, except for a portion of the cDNAs that were either selected by PCR or found during sequencing analysis. To assess whether AS or the use of alternative TSSs could alter the function of the predicted protein, we compared the GO functional categories of Type A and Type B (Table 2). Our results showed that Type A cDNAs were over-represented in the GO molecular function categories ‘neurotransmitter binding’, ‘enzyme activator activity’, ‘cyclase activity’, ‘ATPase activity, coupled to movement of substances’ and ‘GTPase regulator activity’. Thus, by using our 5′-EST data, we obtained a great deal of valuable information on the diversity of TSSs and of the amino acid sequences at the N-terminal ends of proteins. However, since only a portion of the full-length cDNAs was selected for this analysis, information on sequence diversity in regions beyond 500 bases from the TSSs was not obtained. We believe that additional alternatively spliced transcripts remain to be analyzed in future studies.

Figure 2.


Classifications of the 11 769 full-length cDNAs based on splicing patterns. The 11 769 human full-length cDNAs were classified according to their TSS utilization. Type A: these cDNAs were derived from transcripts which were generated utilizing a TSS different than the previously analyzed TSS of the gene. Type A1: cDNAs contained a sequence variation known as FEV. Type A2: this class of cDNAs did not have the FEV feature. Type B: these cDNAs were derived from transcripts that were generated utilizing the same TSS as the previously analyzed TSS, but were found to be alternatively spliced. We could not classify 89 cDNAs because they coded for newly identified proteins.
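The classification in Figure 2 can be summarized as a two-question decision, and the reported counts can be checked against the total. This is a sketch only. The function name and its boolean inputs are hypothetical simplifications of what was a curated, alignment-based classification.

```python
def splicing_type(uses_known_tss, has_fev):
    """Classify a cDNA by TSS usage, following the Figure 2 definitions."""
    if uses_known_tss:
        return "B"                       # same TSS as before, alternatively spliced
    return "A1" if has_fev else "A2"     # alternative TSS, with or without an FEV


# Counts reported in the text and in the Figure 2 caption.
counts = {"Type A": 3403, "Type B": 8277, "unclassified": 89}
total = sum(counts.values())
percent_type_a = 100 * counts["Type A"] / total
percent_type_b = 100 * counts["Type B"] / total
```

The three counts add up to 11 769, and the shares are about 29% for Type A and 70% for Type B, which matches the rounded figures of ∼30% and ∼70% given in the text.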

Table 2.

Functional classification of two types of splicing patterns of 11 769 full-length cDNAs based on GO category analysis

Functional categorization (GO: molecular function); number of matched cDNAs: Type A (%), Type B (%), Type A + B
Binding
 Lipid binding 4 (14.3) 24 (85.7) 28
 Tetrapyrrole binding 5 (18.5) 22 (81.5) 27
 Neurotransmitter binding 12 (50.0)* 12 (50.0) 24
 Carbohydrate binding 4 (18.2) 18 (81.8) 22
 Cofactor binding 3 (16.7) 15 (83.3) 18
 Steroid binding 1 (10.0) 9 (90.0) 10
Catalytic activity
 Helicase activity 4 (10.5) 34 (89.5) 38
 Small protein activating enzyme activity 2 (18.2) 9 (81.8) 11
 Cyclase activity 6 (54.5)* 5 (45.5) 11
Enzyme regulator activity
 GTPase regulator activity 31 (68.9)* 14 (31.1) 45
 Enzyme activator activity 6 (50.0)* 6 (50.0) 12
Structural molecule activity
 Structural constituent of ribosome 1 (4.0) 24 (96.0) 25
Transporter activity
 ATPase activity, coupled to movement of substances 23 (59.0)* 16 (41.0) 39
 Electron transporter activity 2 (13.3) 13 (86.7) 15
Total 1344 (32.0) 2862 (68.0) 4206

As the totals show, the ratio of Type A to Type B is approximately 3:7. The total includes all classification results in the molecular function category. If a protein was predicted to belong to two or more categories, all categories were included in the count.

*Functional categories biased to Type A.
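The percentages in Table 2 can be recomputed from the raw counts, as in the sketch below. The counts are transcribed from the table. The 50% cutoff used to flag a category as biased to Type A is an assumption that happens to reproduce the asterisks; the source does not state the criterion the authors applied.

```python
# (Type A, Type B) counts transcribed from Table 2.
TABLE2 = {
    "Lipid binding": (4, 24),
    "Tetrapyrrole binding": (5, 22),
    "Neurotransmitter binding": (12, 12),
    "Carbohydrate binding": (4, 18),
    "Cofactor binding": (3, 15),
    "Steroid binding": (1, 9),
    "Helicase activity": (4, 34),
    "Small protein activating enzyme activity": (2, 9),
    "Cyclase activity": (6, 5),
    "GTPase regulator activity": (31, 14),
    "Enzyme activator activity": (6, 6),
    "Structural constituent of ribosome": (1, 24),
    "ATPase activity, coupled to movement of substances": (23, 16),
    "Electron transporter activity": (2, 13),
}


def type_a_share(type_a, type_b):
    """Percentage of cDNAs in a category that are Type A."""
    return 100.0 * type_a / (type_a + type_b)


def biased_to_type_a(table, cutoff_percent=50.0):
    """Categories whose Type A share reaches the (assumed) cutoff."""
    return sorted(
        name for name, (a, b) in table.items()
        if type_a_share(a, b) >= cutoff_percent
    )


flagged = biased_to_type_a(TABLE2)
overall_share = type_a_share(1344, 2862)  # totals row of Table 2
```

With this cutoff, the flagged categories are the five marked with an asterisk, each well above the overall Type A share of about 32%.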

3.4. Analysis of genes showing tissue-specific expression

We analyzed the expression of genes producing multiple protein-coding transcripts by AS and found that many of these transcripts were expressed in specific tissues or cells, suggesting that genes likely use this diversity according to need and situation. We analyzed the expression profiles of 10 069 cDNAs, corresponding to 5542 genes, out of the 11 769 full-length cDNAs identified in this study. Because our cDNA libraries were constructed using RNAs derived from more than 100 different types of tissues and cells, we used the 5′-EST data for analyzing gene expression. We then analyzed the gene expression profiles of Type A1 cDNAs containing FEV diversity and found that the FEVs of 261 cDNAs, corresponding to 155 genes, showed specific expression patterns that were different from those already obtained for the genes with alternative TSSs (Table 3). Thus, like the genes with alternative TSSs, the expression patterns of the genes with FEVs likely depend on the tissue and condition. Consequently, we found genes producing multiple protein-coding transcripts by AS.

Table 3.

Expressions of a selected list of 261 FEV-containing cDNAs (155 genes)

FLJ ID | Specific expression | Gene symbol | FLJ ID | Specific expression | Gene symbol | FLJ ID | Specific expression | Gene symbol | FLJ ID | Specific expression | Gene symbol
FLJ50079 | Brain | NRK | FLJ52319 | Trachea | GNE | FLJ55043 | FB, NT | PDZRN3 | FLJ57051 | Brain | Pld5
FLJ50162 | Brain | LARGE1 | FLJ52354 | Brain, NT | CHRNB1_pre | FLJ55050 | Brain | EPS15 | FLJ57068 | FB | FGF13
FLJ50199 | Brain | ARHGEF6 | FLJ52356 | Testis | ARMC4 | FLJ55194 | Brain | Unknown | FLJ57107 | Brain, NT | CHRNB1_pre
FLJ50365 | Trachea | CRISPLD1 | FLJ52358 | Testis | TP73 | FLJ55226 | FB | CHST10 | FLJ57108 | Brain | SNAP91
FLJ50390 | Brain | GRIA1_pre | FLJ52367 | Testis | IQGAP2 | FLJ55256 | Synovial | TFEC | FLJ57207 | Im | Unknown
FLJ50398 | Testis | IQGAP2 | FLJ52368 | Testis, Trachea | ARMC4 | FLJ55265 | Im | Unknown | FLJ57232 | Testis | PRCP_pre
FLJ50459 | Brain | ETV1 | FLJ52384 | Im | PTPN3 | FLJ55281 | Heart, Fetal heart | SLC5A1 | FLJ57269 | Brain | BTBD10
FLJ50460 | Brain | DLG4 | FLJ52407 | Testis | CRB1_pre | FLJ55284 | FB, NT | MAGI2 | FLJ57290 | Trachea | CRISPLD1
FLJ50484 | Brain | SLC26A4 | FLJ52427 | Brain | AMPD3 | FLJ55338 | FB | CLASP1 | FLJ57298 | Brain | RAPGEF4
FLJ50494 | Brain | ETV1 | FLJ52435 | Testis | MARCH7 | FLJ55344 | Brain | DYSF | FLJ57302 | Brain | RAPGEF4
FLJ50523 | Brain | PEX5L | FLJ52438 | Brain | RIMS1 | FLJ55381 | FB | SLC44A5 | FLJ57330 | Brain | APBB1
FLJ50526 | Brain | PEX5L | FLJ52453 | Testis | AMPD3 | FLJ55423 | Placenta | NRK | FLJ57521 | Tu | PPFIBP2
FLJ50533 | Brain | SLC6A9 | FLJ52496 | Brain | TSPAN5 | FLJ55434 | Testis | POMGNT1 | FLJ57884 | FB | FGF13
FLJ50539 | Brain, NT | DCAMKL1 | FLJ52520 | FB | EOMES | FLJ55460 | Brain | SEMA5B_pre | FLJ57888 | Brain | SGCB
FLJ50557 | Brain | MAP7 | FLJ52731 | Brain | SPRED2 | FLJ55461 | NT | KLHL13 | FLJ57953 | Brain | STAU
FLJ50577 | FB | DLG4 | FLJ52750 | Brain | ARHGEF7 | FLJ55481 | NT | RGMA_pre | FLJ58008 | Brain | PPP2R2B
FLJ50619 | NT | ELAVL4 | FLJ52810 | Testis | GABRB3_pre | FLJ55495 | Testis | PCYT2 | FLJ58099 | Brain | CLTCL1
FLJ50623 | Brain, NT | DCAMKL1 | FLJ53109 | Testis | PPP2R5E | FLJ55504 | Testis | KLHL13 | FLJ58366 | Brain | RIMS1
FLJ50641 | Brain | ETV1 | FLJ53114 | Testis | NCAM2_pre | FLJ55514 | Brain, Tu | EGFR_pre | FLJ58368 | Brain | RAPGEF4
FLJ50646 | FB | DLG4 | FLJ53167 | NT | CUL4B | FLJ55516 | Tu | LIMS1 | FLJ58494 | Brain | Unknown
FLJ50725 | Testis | ATPAF1 | FLJ53184 | Brain | PPFIA2 | FLJ55607 | Brain, Trachea | HDAC9 | FLJ58753 | Brain | ARHGEF3
FLJ50745 | Testis | CCNA1 | FLJ53222 | FB | MLLT3 | FLJ55622 | Testis | MMRN1_pre | FLJ58755 | Brain | CHN2
FLJ50761 | Brain | LRIG1_pre | FLJ53242 | Testis | CLASP1 | FLJ55627 | Testis | MOV10L1 | FLJ58966 | Im | RAB37
FLJ50773 | Brain | CALB1 | FLJ53247 | Testis | IDE | FLJ55628 | Testis | LOXHD1 | FLJ59303 | Brain | DOCK4
FLJ50776 | Brain | ARHGEF6 | FLJ53252 | Testis | CDH2_pre | FLJ55641 | Brain, NT | JARID2 | FLJ59333 | Tu | RARG
FLJ50810 | FB, NT | MAGI2 | FLJ53320 | Brain | DLGAP1 | FLJ55662 | Im | FGR | FLJ59338 | Tu | RARG
FLJ50844 | Brain | WARS2_pre | FLJ53324 | Brain | TJP2 | FLJ55664 | Testis | NTRK3_pre | FLJ59345 | Brain | PPFIA2
FLJ50917 | Testis | PCCB_pre | FLJ53330 | Brain, NT | EXOC4 | FLJ55778 | Brain | CLASP1 | FLJ59425 | Placenta | SH3KBP1
FLJ50956 | Brain | RAPGEF4 | FLJ53518 | Testis | POMGNT1 | FLJ55834 | Brain, NT | FGF11 | FLJ59496 | Brain | CHN2
FLJ50959 | Brain | RAPGEF4 | FLJ53578 | Brain | Rims1 | FLJ55856 | Testis | ARHGEF3 | FLJ59502 | Brain | PPFIA2
FLJ50961 | Brain | TMEM16C | FLJ53606 | NT | AKT1 | FLJ55859 | Testis | ST7L | FLJ59511 | Brain | GRIA1_pre
FLJ50989 | FB | EOMES | FLJ53680 | Testis | KIF2C | FLJ55865 | Im | SLC43A2 | FLJ59545 | Brain | EML2
FLJ51025 | Kidney | NOX4 | FLJ53829 | Brain | APBB1 | FLJ55903 | FB | GPR161 | FLJ59625 | Brain | ARHGEF7
FLJ51027 | Kidney | NOX4 | FLJ53875 | Brain | APBB1 | FLJ55905 | Im | FGD4 | FLJ59641 | Testis | PPFIA2
FLJ51073 | FB | EOMES | FLJ53929 | Im | PTPN4 | FLJ55906 | Testis | KIFC3 | FLJ59648 | Im | DYSF
FLJ51155 | Testis | Unknown | FLJ53980 | Brain | PPM1F | FLJ55918 | Brain | EML2 | FLJ59678 | Brain | PEX5L
FLJ51157 | Testis | HDAC4 | FLJ53990 | Brain | GABRB3_pre | FLJ55961 | Brain | GRM4_pre | FLJ59684 | Brain | PLEKHG5
FLJ51174 | Im | HDAC4 | FLJ53997 | Brain | CTNNA2 | FLJ55997 | Brain | CPNE6 | FLJ59710 | Brain | MCF2
FLJ51177 | Im | HDAC4 | FLJ53999 | Brain | GAB1 | FLJ56033 | Testis | Unknown | FLJ59717 | FB | TBR1
FLJ51210 | Brain | KIFC3 | FLJ54008 | Brain | TPCN1 | FLJ56036 | Tu | KIFC3 | FLJ59769 | Im | PLEKHG5
FLJ51383 | Testis | PPP2R5A | FLJ54011 | Brain | PPFIA2 | FLJ56037 | Testis, Prostate | CUL2 | FLJ59799 | Testis | CTNNA2
FLJ51528 | Im | BTNL8_pre | FLJ54016 | Testis | DIP13B | FLJ56038 | Small intestine | Unknown | FLJ59802 | Testis | ADCY5
FLJ51566 | Brain | PDK1 | FLJ54093 | Brain | GPHN | FLJ56044 | Brain | OXR1 | FLJ59806 | Im | HDAC4
FLJ51606 | Trachea | HABP2_pre | FLJ54100 | Brain | CHN2 | FLJ56093 | Brain | PTPRR_pre | FLJ60503 | Brain | LARGE1
FLJ51663 | Testis | CPS1_pre | FLJ54331 | Brain, Osteoclast | Unknown | FLJ56095 | Brain | KLHL13 | FLJ60665 | Tu | SLC44A5
FLJ51675 | Brain | ETV1 | FLJ54394 | Testis | CRB1_pre | FLJ56110 | FB | GOLSYN | FLJ60667 | Tu | SLC44A5
FLJ51685 | Testis | MCF2 | FLJ54513 | Testis | WDR59 | FLJ56116 | FB | APLP1 | FLJ60693 | FB | PHF21B
FLJ51695 | Im | TP74 | FLJ54541 | FB | EXOC4 | FLJ56136 | NT | SLC2A14 | FLJ60998 | Testis | INPP4B
FLJ51706 | Testis | RAPGEF4 | FLJ54577 | NT | HDAC9 | FLJ56137 | Im | Unknown | FLJ61124 | Brain | RAB37
FLJ51734 | Uterus | TMEM16C | FLJ54580 | NT | HDAC9 | FLJ56142 | NT | AMOTL2 | FLJ61133 | FB | EXOC4
FLJ51737 | Brain | ARHGEF6 | FLJ54612 | Brain | SH3KBP1 | FLJ56148 | Brain | PLEKHG5 | FLJ61370 | FB | SNCAIP
FLJ51769 | Testis | IQGAP2 | FLJ54642 | Brain | APBB1 | FLJ56167 | Testis | KLHL12 | FLJ61443 | Testis | LARGE1
FLJ51805 | Brain | RIMS2 | FLJ54658 | Brain | LSAMP_pre | FLJ56226 | NT | SNCAIP | FLJ61560 | Trachea | TJP2
FLJ51859 | Brain | APBB1 | FLJ54672 | Brain | DOCK4 | FLJ56370 | Testis, Prostate | FKBP8 | FLJ61674 | Brain | PEX5L
FLJ51873 | Brain, NT | AGPS_pre | FLJ54673 | Brain | Unknown | FLJ56376 | Brain | MTMR1 | FLJ61679 | Brain | APBB1
FLJ51910 | FB | GTPBP3 | FLJ54674 | Brain | TPCN1 | FLJ56411 | Brain | GRIA2_pre | FLJ53199 | Brain ↓ | NEDD4L
FLJ51934 | Im | AOAH_pre | FLJ54690 | Brain | BACE1_pre | FLJ56420 | Testis | DNPEP | FLJ59993 | Brain ↓ | RIMS1
FLJ51957 | NT | ELAVL4 | FLJ54693 | Brain | BACE1_pre | FLJ56452 | Brain | EML2 | FLJ55591 | Brain ↓ | ARHGEF3
FLJ51977 | Brain | Unknown | FLJ54702 | Brain | DLGAP1 | FLJ56634 | Brain | GRM4_pre | FLJ56152 | Brain ↓ | ARHGEF7
FLJ52027 | Testis | ATPAF1 | FLJ54724 | FB | DLG2 | FLJ56895 | Testis | EML2 | FLJ58411 | FB ↓ | CACNB3
FLJ52034 | Im | Unknown | FLJ54738 | Brain | PDZRN3 | FLJ56912 | Uterus | FBLN2_pre | FLJ58949 | FB ↓ | CACNB3
FLJ52037 | Im | GRAP2 | FLJ54742 | Testis | Slmap | FLJ56913 | Placenta, Uterus | FBLN2 | FLJ57810 | Tu ↓ | A2ML1
FLJ52039 | Im | GRAP2 | FLJ54746 | NT | PDZRN3 | FLJ56957 | Brain | TMEM16C | FLJ53545 | Tu ↓ | RARG
FLJ52041 | Im | Unknown | FLJ54751 | NT | SUV420H1 | FLJ56961 | Brain | CLTCL1
FLJ52042 | Im | GRAP2 | FLJ54906 | Trachea | TMC5 | FLJ56973 | Brain | TMEM16C
FLJ52288 | Testis | ARMC4 | FLJ54987 | FB | PHF21B | FLJ56979 | Brain | MYRIP

We analyzed the expression profiles of the first exons using ∼1.5 million 5'-ESTs constructed by the oligo-capping method. From this analysis, we selected 261 full-length cDNAs based on the expression levels of their FEVs in specific tissues. cDNAs shown without a label were highly expressed in the indicated tissues, whereas those marked with a ‘↓’ label were expressed at low levels in the indicated tissues.

*NT, NT2 cells induced by retinoic acid; FB, fetal brain; Im, immune tissues; Tu, tumor tissues; pre, precursor; Unknown, function unknown.
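The selection described in the table note can be illustrated with a small sketch that counts the 5'-ESTs supporting one first exon per tissue and reports tissues holding most of them. The function names, the `min_ests` and `min_fraction` thresholds, and the input layout are hypothetical and are not the criteria used in this study; the sketch also ignores differences in library size.

```python
from collections import Counter


def tissue_fractions(est_tissues):
    """Return the fraction of 5'-ESTs per tissue for a single first exon."""
    counts = Counter(est_tissues)
    total = sum(counts.values())
    return {tissue: n / total for tissue, n in counts.items()}


def enriched_tissues(est_tissues, min_ests=5, min_fraction=0.8):
    """List tissues that hold at least `min_fraction` of the ESTs.

    Both thresholds are illustrative assumptions, not values from the paper.
    First exons supported by fewer than `min_ests` ESTs are not called.
    """
    if len(est_tissues) < min_ests:
        return []
    fractions = tissue_fractions(est_tissues)
    return sorted(t for t, f in fractions.items() if f >= min_fraction)


if __name__ == "__main__":
    # Hypothetical first exon supported by nine brain ESTs and one testis EST.
    print(enriched_tissues(["brain"] * 9 + ["testis"]))
```

Lowering `min_fraction` would allow a first exon to be assigned to two tissues at once, as in table entries such as "Brain, NT".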

3.5. Analysis of expression patterns of tissue-specific expressed genes

We quantified the tissue-specific expression of 13 of the 261 selected cDNAs by real-time PCR (Fig. 3). The results suggested a strong relationship between tissue-specific expression and the diversity of gene function or disease. For each gene, we compared the expression profile of the TSS identified in this study with that of the previously identified TSS of the same gene. These results are summarized in Supplementary Table S6 and are discussed below in more detail.

Figure 3.

Quantitative evaluation of selected genes by real-time PCR. Expression levels of the first exon regions of the selected genes were analyzed by real-time PCR. The data were normalized to human GAPDH as described in the Materials and methods section. Expression levels are shown on a log10 scale. cDNAs labeled ‘$$’ had very low or undetectable expression. (A) FGF13, (B) OXR1, (C) C6orf142, (D) PLD5, (E) FGD4, (F) C6orf32. BW, brain, whole; BC, brain, cerebellum; BF, fetal brain; SP, spleen; BM, bone marrow; TH, thymus; OV, ovary; PR, prostate; UT, uterus; MT, mixture of tumor human tissues; MN, control, mixture of normal human tissues; KT, kidney tumor; LT, lung tumor.
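As an illustration of the normalization and log10 scaling mentioned in the legend, the following sketch converts threshold-cycle (Ct) values into a GAPDH-normalized log10 expression level. It assumes a delta-Ct style calculation with a fixed amplification efficiency, which may differ from the procedure given in the Materials and methods section; the function name, the `efficiency` parameter and the use of `None` for undetected samples are hypothetical.

```python
import math


def log10_relative_expression(ct_target, ct_gapdh, efficiency=2.0):
    """Return log10 of target abundance relative to GAPDH.

    Assumes delta-Ct quantification: each cycle multiplies the product by
    `efficiency` (2.0 means perfect doubling). This is an illustrative
    assumption, not necessarily the method used in the study.
    Returns None when the target was not detected (no Ct value).
    """
    if ct_target is None:
        return None
    delta_ct = ct_target - ct_gapdh
    relative = efficiency ** (-delta_ct)
    return math.log10(relative)


if __name__ == "__main__":
    # Hypothetical Ct values: the target crosses the threshold 5 cycles after GAPDH.
    print(log10_relative_expression(25.0, 20.0))
```

A target detected five cycles later than GAPDH is about 32-fold less abundant under these assumptions, which corresponds to roughly -1.5 on the log10 scale.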

As a first example, FGF13 belongs to the FGF family and is believed to play roles in cell proliferation and differentiation, including neuronal differentiation.28,29 The FLJ57884 and FLJ57068 cDNAs exhibited different ORF regions as a result of FEV and were splicing variants of the known FGF13 cDNA. The TSS we found in each of them was located upstream of the known TSS of FGF13. Whereas the known TSS of FGF13 was highly expressed in both fetal and adult brains, the TSSs of the FLJ57884 and FLJ57068 cDNAs were highly expressed only in the fetal brain. Moreover, the TSS of the FLJ57068 cDNA was also highly expressed in kidney cancer (Fig. 3A). As a second example, OXR1 is an oxidation resistance gene whose product is localized in mitochondria.30 The known TSS of OXR1 was expressed at similar levels in various tissues, whereas the TSS we identified in the FLJ56044 cDNA was located upstream of the known TSS of OXR1 and was highly expressed in brain, kidney cancer and lung cancer (Fig. 3B). These results suggest that these two genes use different TSSs to regulate their expression levels in the brain. They also suggest that, for both genes, only one of the TSSs was preferentially recognized by the transcription machinery in cancerous tissue.

As a third example, C6orf142 (chromosome 6 ORF 142) is a gene of unknown function. The known TSS of C6orf142 was highly expressed in the heart. However, the TSS we identified in the FLJ58494 cDNA, which was located downstream of the previously identified TSS of C6orf142, was highly expressed in both fetal and adult brains (Fig. 3C). As a fourth example, PLD5 belongs to the phospholipase D family of phospholipid-hydrolyzing enzymes, which are presumably involved in intracellular signaling.31 Although the known TSS of PLD5 was expressed at similar levels in various tissues, the TSS we identified in the FLJ57051 cDNA, which was located downstream of the previously identified TSS of PLD5, was highly expressed in the brain (Fig. 3D). As a fifth example, SPRED2 is a Ras inhibitory factor belonging to the Sprouty/Spred family.32 The TSS we identified in the FLJ52731 cDNA, which was located downstream of the known TSS of SPRED2, was highly expressed in the brain (Supplementary Table S6). As a sixth example, SEMA5B is a nerve guidance factor involved in organogenesis, angiogenesis and oncogenesis.33 The TSS we identified in the FLJ55460 cDNA, which was located downstream of the known TSS of SEMA5B, was also highly expressed in the brain (Supplementary Table S6). As a seventh example, CACNB3 encodes the calcium channel beta-3 subunit, which is involved in modulating the sympathetic nervous system, olfaction and the control of blood pressure.34 Although the known TSS of CACNB3 was highly expressed in both fetal and adult brains, the newly identified TSSs of the FLJ58949 and FLJ58411 cDNAs, both of which were located downstream of the known TSS of CACNB3, were expressed at a low level in the brain (Supplementary Table S6). These cDNAs exhibited different ORF regions as a result of AS. As an eighth example, BACE1 is a peptide hydrolase that cleaves the amyloid precursor protein and is one of the factors involved in Alzheimer's disease.35 The known TSS of BACE1 was expressed at similar levels in various tissues. However, the TSS we identified in the FLJ54690 cDNA, which was located downstream of the known TSS of BACE1, was highly expressed in the brain (Supplementary Table S6). Thus, each of these six genes appears to regulate its expression level in the brain by using a specific TSS.

As a ninth example, FGD4 appears to be involved in regulating the actin cytoskeleton and cell shape and also to have various roles in proliferation, differentiation, transcriptional regulation and development.36 The known TSS of FGD4 was highly expressed in nervous system tissues such as brain and spinal cord, as well as in testis. However, the TSS we identified in the FLJ55905 cDNA, which was located downstream of the known TSS of FGD4, was highly expressed in immune system tissues such as bone marrow and spleen (Fig. 3E). As a tenth example, C6orf32 is a gene of unknown function whose expression level increases during myoblast differentiation in the embryo.37 The FLJ56038 and FLJ56137 cDNAs exhibited different ORF regions as a result of FEV and were splicing variants of the known C6orf32 cDNA. The known TSS of C6orf32 was expressed at similar levels in various tissues. However, the TSSs we found in the FLJ56038 and FLJ56137 cDNAs were located upstream of the known TSS of C6orf32, and both of these newly identified TSSs were highly expressed in immune system tissues such as bone marrow, spleen and thymus (Fig. 3F). As an eleventh example, PTPN4 belongs to the protein tyrosine phosphatase (PTP) family, whose members act as signaling molecules that control various cellular processes such as cell proliferation, differentiation, the mitotic cycle and oncogenesis.38 The known TSS of PTPN4 was highly expressed in the brain, but the TSS we identified in the FLJ53929 cDNA, which was located downstream of the known TSS of PTPN4, was highly expressed in immune system tissues such as bone marrow and spleen (Supplementary Table S6). As a twelfth example, BTNL8 is one of the butyrophilin-like proteins and appears to be involved in conferring immunity.39 The known TSS of BTNL8 was expressed at similar levels in various tissues. However, the TSS we identified in the FLJ51528 cDNA, which was located downstream of the known TSS of BTNL8, was highly expressed in the lung and thymus (Supplementary Table S6). Thus, these four genes appear to regulate their expression levels in immune system tissues by using specific TSSs.

As a thirteenth example, AKT1 is a gene involved in apoptosis and neuronal differentiation that may also have a role in schizophrenia, especially in the neurotransmission system.40 The TSS we identified in the FLJ53606 cDNA, which was located downstream of the known TSS of AKT1, was highly expressed in retinoic acid-induced NT2 cells (Supplementary Table S6). Thus, this gene appears to use a specific TSS during neuronal differentiation.

In summary, among the genes analyzed in this study, the newly identified TSSs of a number of genes showed specific expression patterns. These results suggest that a single gene can use alternative TSSs for tissue-specific transcription. We also found a close relationship between the predicted function of a gene and its tissue-specific expression. Thus, our results suggest a strong correlation between the mRNA diversity and the function of a gene.
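The upstream and downstream relationships reported in this section depend on the strand of the gene, because "upstream" means 5' of the known TSS in the direction of transcription. A minimal sketch of that classification is given below; the function name and the coordinate convention (genomic positions on the forward strand) are assumptions made for illustration and do not describe the software used in this study.

```python
def relative_tss_position(known_tss, new_tss, strand):
    """Classify a newly identified TSS relative to the known TSS of a gene.

    `known_tss` and `new_tss` are genomic coordinates on the forward strand;
    `strand` is "+" or "-". Returns "upstream", "downstream" or "same".
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if new_tss == known_tss:
        return "same"
    offset = new_tss - known_tss
    if strand == "-":
        # On the reverse strand, transcription runs toward smaller coordinates.
        offset = -offset
    return "upstream" if offset < 0 else "downstream"


if __name__ == "__main__":
    # Hypothetical coordinates: the same pair of positions flips
    # classification when the gene lies on the reverse strand.
    print(relative_tss_position(1000, 800, "+"))
    print(relative_tss_position(1000, 800, "-"))
```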

3.6. Construction and use of the FLJ Human cDNA Database

We constructed the FLJ Human cDNA Database ver. 3.0 (http://flj.lifesciencedb.jp) based on the results of our analysis of variable protein-coding transcripts produced from a gene by AS. A detailed description of the DB is available at the DB website. The DB graphically displays the mapping of all the full-length cDNAs on the human genome together with their ORF regions and thus provides useful information on mRNA diversity. Moreover, the DB contains sequence information not only on full-length human cDNAs but also on a large number of human ESTs generated using the oligo-capping method, which makes it possible to examine the ESTs mapped to the same genome locus. Because the average length of our EST sequences was ∼500 bases, the diversity of mRNAs produced as a result of AS can be analyzed efficiently using this information. Because we were able to identify TSSs accurately using our 5′-EST data, we believe that these data can be used to understand the relationship between the variable utilization of TSSs and the biological functions of genes. Moreover, one can analyze the expression profiles of the transcribed regions of genes using our high-accuracy 5′-EST sequences, although in some cases the results might differ from those obtained using 3′-EST data.

Despite these useful features, our database specializes in 5′-end sequences, and therefore these data are not suitable for predicting AS at the C-terminal end. In addition, a large amount of AS-related information still remains to be extracted from our 1.4 million cDNA resources because not all of them have been sequenced to completion. Because our cDNA resources are mostly full-length cDNAs that include the TSS and the polyA site, complete sequencing of these cDNA clones will add to our understanding of mRNA diversity. Every fully sequenced FLJ cDNA is available from the National Institute of Technology and Evaluation (http://www.nite.go.jp/). We will continue to add new information on our resources to our database, and these resources will be very useful in the analysis of gene functions.

Because our interest was in mRNAs with ORF regions different from those of already known mRNAs, we stopped sequencing a cDNA once we found that the predicted ORF region of the transcript did not differ from that of the known mRNA (for instance, where the alternative TSS existed only in the 5′-untranslated region). We found, however, that the expression patterns of genes in which the variation in TSS existed in the 5′-untranslated region also showed tissue specificity (results not shown). Collectively, these results suggest that, depending on the situation and environment, the transcription machinery utilizes alternative TSSs to regulate the expression of a transcript, even when the translated protein is the same. These results are also included in our DB. We also did not complete the sequencing of clones for which we were unable to predict the ORF regions of their mRNAs. However, we have included these clones in the DB in the belief that new and useful information could be obtained by analyzing them.

We found that many genes had mRNA diversity due to, for example, FEVs. We also found many tissue-specific splicing patterns. In particular, in the case of the FEVs that we analyzed, genes used different regions of the genome locus as the first exon, and this usage seemed to depend on the tissue and its condition. We also found genes whose TSSs were located far apart on the same genome locus. In these cases, it is highly possible that transcription from each TSS is controlled by individual transcription factors. Because the mechanisms that control transcription are closely related to function, understanding these mechanisms may make it possible to artificially control the expression of an appropriate transcript in the future.

In this study, we have identified genes that produce multiple transcripts, and we believe that each of these genes is transcribed into an appropriate transcript according to need and circumstance. It will now be important to know whether there is any correlation between the expression of one of the transcripts produced by a gene and a disease. For example, in the case of the transcripts containing FEVs, which we analyzed in detail, only the first exon regions differed from those of the previously characterized transcripts. Because the first exon regions of these transcripts are unique, they can be distinguished easily from the other transcripts. It may therefore be possible to control the expression of a specific mRNA from a group of mRNAs transcribed from a gene by targeting the first exon. As we accumulate more information on the mRNA diversity of genes using approaches similar to those described in this study, we might be able to identify candidate genes as novel targets for the development of drugs with fewer side effects.

Supplementary data

Supplementary data are available at www.dnaresearch.oxfordjournals.org.

Funding

This work was partly supported by a grant from the New Energy and Industrial Technology Development Organization (NEDO) project of the Ministry of Economy, Trade and Industry of Japan.

Supplementary Material

[Supplementary Data]
dsp022_index.html (841B, html)

Acknowledgements

We thank the members of the NEDO functional analysis of protein and research application project for cDNA sequencing and clone stock, with special thanks to M. Yamazaki, K. Watanabe, A. Sugiyama, Y. Ono, T. Takayama (Japan Biological Informatics Consortium) and K. Fujita (National Institute of Technology and Evaluation, Japan). We also thank the members of the NEDO human full-length cDNA sequencing project for EST sequencing.

References

  • 1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature. 2004;431:931–45. doi: 10.1038/nature03001.
  • 2. Lander E.S., Linton L.M., Birren B., et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
  • 3. Lopez A.J. Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet. 1998;32:279–305. doi: 10.1146/annurev.genet.32.1.279.
  • 4. Black D.L. Protein diversity from alternative splicing: a challenge for bioinformatics and post-genome biology. Cell. 2000;103:367–70. doi: 10.1016/s0092-8674(00)00128-8.
  • 5. Modrek B., Lee C. A genomic view of alternative splicing. Nat. Genet. 2002;30:13–9. doi: 10.1038/ng0102-13.
  • 6. Stamm S. Signals and their transduction pathways regulating alternative splicing: a new dimension of the human genome. Hum. Mol. Genet. 2002;11:2409–16. doi: 10.1093/hmg/11.20.2409.
  • 7. Bracco L., Kearsey J. The relevance of alternative RNA splicing to pharmacogenomics. Trends Biotechnol. 2003;21:346–53. doi: 10.1016/S0167-7799(03)00146-X.
  • 8. Landry J.R., Mager D.L., Wilhelm B.T. Complex controls: the role of alternative promoters in mammalian genomes. Trends Genet. 2003;19:640–8. doi: 10.1016/j.tig.2003.09.014.
  • 9. Zhang T., Haws P., Wu Q. Multiple variable first exons: a mechanism for cell- and tissue-specific gene regulation. Genome Res. 2004;14:79–89. doi: 10.1101/gr.1225204.
  • 10. Wu Q., Maniatis T. Large exons encoding multiple ectodomains are a characteristic feature of protocadherin genes. Proc. Natl Acad. Sci. USA. 2000;97:3124–9. doi: 10.1073/pnas.060027397.
  • 11. Strassburg C.P., Oldhafer K., Manns M.P., Tukey R.H. Differential expression of the UGT1A locus in human liver, biliary, and gastric tissue: identification of UGT1A7 and UGT1A10 transcripts in extrahepatic tissue. Mol. Pharmacol. 1997;52:212–20. doi: 10.1124/mol.52.2.212.
  • 12. Wang E.T., Sandberg R., Luo S., et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456:470–6. doi: 10.1038/nature07509.
  • 13. Licatalosi D.D., Mele A., Fak J.J., et al. HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature. 2008;456:464–9. doi: 10.1038/nature07488.
  • 14. Ota T., Suzuki Y., Nishikawa T., et al. Complete sequencing and characterization of 21,243 full-length human cDNAs. Nat. Genet. 2004;36:40–5. doi: 10.1038/ng1285.
  • 15. Otsuki T., Ota T., Nishikawa T., et al. Signal sequence and keyword trap in silico for selection of full-length human cDNAs encoding secretion or membrane proteins from oligo-capped cDNA libraries. DNA Res. 2005;12:117–26. doi: 10.1093/dnares/12.2.117.
  • 16. Goshima N., Kawamura Y., Fukumoto A., et al. Human protein factory for converting the transcriptome into an in vitro-expressed proteome. Nat. Methods. 2008;5:1011–7. doi: 10.1038/nmeth.1273.
  • 17. Kimura K., Wakamatsu A., Suzuki Y., et al. Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes. Genome Res. 2006;16:55–65. doi: 10.1101/gr.4039406.
  • 18. Maruyama K., Sugano S. Oligo-capping: a simple method to replace the cap structure of eukaryotic mRNAs with oligoribonucleotides. Gene. 1994;138:171–4. doi: 10.1016/0378-1119(94)90802-8.
  • 19. Suzuki Y., Sugano S. Construction of a full-length enriched and a 5'-end enriched cDNA library using the oligo-capping method. Methods Mol. Biol. 2003;221:73–91. doi: 10.1385/1-59259-359-3:73.
  • 20. Suzuki Y., Yoshitomo-Nakagawa K., Maruyama K., Suyama A., Sugano S. Construction and characterization of a full length-enriched and a 5'-end-enriched cDNA library. Gene. 1997;200:149–56. doi: 10.1016/s0378-1119(97)00411-3.
  • 21. Suzuki Y., Ishihara D., Sasaki M., et al. Statistical analysis of the 5' untranslated region of human mRNA using ‘Oligo-Capped’ cDNA libraries. Genomics. 2000;64:286–97. doi: 10.1006/geno.2000.6076.
  • 22. Nishikawa T., Ota T., Kawai Y., et al. Database and analysis system for cDNA clones obtained from full-length enriched cDNA libraries. In Silico Biol. 2002;2:5–18.
  • 23. Nishikawa T., Ota T., Kawai Y., et al. Comparison of sequences of cDNA clones obtained from oligo-capping cDNA libraries with those from unigene. DNA Res. 2001;8:255–62. doi: 10.1093/dnares/8.6.255.
  • 24. Kimura K., Nishikawa T., Nagai K., Sugano S., Isogai T. Intris: A viewer for cDNA-genome alignments enabling efficient detection of splicing variants and expression profiles. Genome Inform. 2002;13:548–50.
  • 25. Salamov A.A., Nishikawa T., Swindells M.B. Assessing protein coding region integrity in cDNA sequencing projects. Bioinformatics. 1998;14:384–90. doi: 10.1093/bioinformatics/14.5.384.
  • 26. Kimura K., Nishikawa T., Nagai K., Sugano S., Nomura N., Isogai T. The translated region inspector for cDNA sequences. Genome Inform. 2003;14:456–7.
  • 27. Yamamoto J., Hatano N., Araki H., et al. A cDNA evaluation system for highly efficient sequencing of splicing variant cDNAs. Genome Inform. 2003;14:430–1.
  • 28. Facchiano A., Russo K., Facchiano A.M., et al. Identification of a novel domain of fibroblast growth factor 2 controlling its angiogenic properties. J. Biol. Chem. 2003;278:8751–60. doi: 10.1074/jbc.M209936200.
  • 29. Greene J.M., Li Y.L., Yourey P.A., et al. Identification and characterization of a novel member of the fibroblast growth factor family. Eur. J. Neurosci. 1998;10:1911–25. doi: 10.1046/j.1460-9568.1998.00211.x.
  • 30. Durand M., Kolpak A., Farrell T., et al. The OXR domain defines a conserved family of eukaryotic oxidation resistance proteins. BMC Cell Biol. 2007;8:13. doi: 10.1186/1471-2121-8-13.
  • 31. Foster D.A., Xu L. Phospholipase D in cell proliferation and cancer. Mol. Cancer Res. 2003;1:789–800.
  • 32. Nonami A., Kato R., Taniguchi K., et al. Spred-1 negatively regulates interleukin-3-mediated ERK/mitogen-activated protein (MAP) kinase activation in hematopoietic cells. J. Biol. Chem. 2004;279:52543–51. doi: 10.1074/jbc.M405189200.
  • 33. Adams R.H., Betz H., Puschel A.W. A novel class of murine semaphorins with homology to thrombospondin is differentially expressed during early embryogenesis. Mech. Dev. 1996;57:33–45. doi: 10.1016/0925-4773(96)00525-4.
  • 34. Yamada Y., Masuda K., Li Q., et al. The structures of the human calcium channel alpha 1 subunit (CACNL1A2) and beta subunit (CACNLB3) genes. Genomics. 1995;27:312–9. doi: 10.1006/geno.1995.1048.
  • 35. De Pietri Tonelli D., Mihailovich M., Di Cesare A., Codazzi F., Grohovaz F., Zacchetti D. Translational regulation of BACE-1 expression in neuronal and non-neuronal cells. Nucleic Acids Res. 2004;32:1808–17. doi: 10.1093/nar/gkh348.
  • 36. Chen X.M., Splinter P.L., Tietz P.S., Huang B.Q., Billadeau D.D., LaRusso N.F. Phosphatidylinositol 3-kinase and frabin mediate Cryptosporidium parvum cellular invasion via activation of Cdc42. J. Biol. Chem. 2004;279:31671–8. doi: 10.1074/jbc.M401592200.
  • 37. Yoon S., Molloy M.J., Wu M.P., Cowan D.B., Gussoni E. C6ORF32 is upregulated during muscle cell differentiation and induces the formation of cellular filopodia. Dev. Biol. 2007;301:70–81. doi: 10.1016/j.ydbio.2006.11.002.
  • 38. Young J.A., Becker A.M., Medeiros J.J., et al. The protein tyrosine phosphatase PTPN4/PTP-MEG1, an enzyme capable of dephosphorylating the TCR ITAMs and regulating NF-kappaB, is dispensable for T cell development and/or T cell effector functions. Mol. Immunol. 2008;45:3756–66. doi: 10.1016/j.molimm.2008.05.023.
  • 39. Rhodes D.A., Stammers M., Malcherek G., Beck S., Trowsdale J. The cluster of BTN genes in the extended major histocompatibility complex. Genomics. 2001;71:351–62. doi: 10.1006/geno.2000.6406.
  • 40. Brugge J., Hung M.C., Mills G.B. A new mutational AKTivation in the PI3K pathway. Cancer Cell. 2007;12:104–7. doi: 10.1016/j.ccr.2007.07.014.


Articles from DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes are provided here courtesy of Oxford University Press
