Nucleic Acids Research. 2004 May 20;32(9):2912–2924. doi: 10.1093/nar/gkh604

Activation of cryptic 3′ splice sites within introns of cellular genes following gene entrapment

Anna B Osipovich 1, Erica K White-Grindley 1, Geoffrey G Hicks 1,a, Michael J Roshon 1,a, Christian Shaffer 2, Jason H Moore 2, H Earl Ruley 1,*
PMCID: PMC419606  PMID: 15155860

Abstract

Gene trap vectors developed for genome-wide mutagenesis can be used to study factors governing the expression of exons inserted throughout the genome. For example, entrapment vectors consisting of a partial 3′-terminal exon [i.e. a neomycin resistance gene (Neo), a poly(A) site, but no 3′ splice site] were typically expressed following insertion into introns, from cellular transcripts that spliced to cryptic 3′ splice sites present either within the targeting vector or in the adjacent intron. A vector (U3NeoSV1) containing the wild-type Neo sequence preferentially disrupted genes that spliced in-frame to a cryptic 3′ splice site in the Neo coding sequence and expressed functional neomycin phosphotransferase fusion proteins. Removal of the cryptic Neo 3′ splice site did not reduce the proportion of clones with inserts in introns; rather, the fusion transcripts utilized cryptic 3′ splice sites present in the adjacent intron or generated by virus integration. However, gene entrapment with U3NeoSV2 was considerably more random than with U3NeoSV1, consistent with the widespread occurrence of potential 3′ splice site sequences in the introns of cellular genes. These results clarify the mechanisms of gene entrapment by U3 gene trap vectors and illustrate features of exon definition required for 3′ processing and polyadenylation of cellular transcripts.

INTRODUCTION

The sequencing and annotation phases of the mammalian genome projects are nearly complete, with the assembly of human and mouse genome sequences and the placement of over 32 000 genes on the physical maps (1–4). This structural information is having a great impact on the concepts and strategies used to understand gene function, including practical methods to characterize gene functions on a genome-wide level. For example, tagged sequence mutagenesis uses gene trap vectors to disrupt genes in murine embryonic stem (ES) cells combined with rapid, sequence-based screens to characterize the genes disrupted in each ES cell clone (5–12). The resulting libraries of mutant stem cell clones provide relatively large numbers of characterized mutations that can be transmitted into the mouse germline for studies of gene function (13).

Gene entrapment vectors developed in our laboratory contain a selectable marker in the U3 region of the long terminal repeat (LTR) of a replication-defective Moloney murine leukemia virus. Selection for U3 gene expression generates clones in which the provirus is positioned within actively transcribed genes and is expressed on transcripts originating in the flanking cellular DNA (14). The vectors have proven to be effective mutagens, as mutation frequencies at selected genes are 100- to 1000-fold higher in cells isolated after gene trap selection than in cells containing randomly integrated retroviruses (15). Moreover, 40–78% of mutations selected in ES cells have been reported to produce discernable phenotypes after transmission into the mouse germline (12,16–18).

The U3 gene traps insert a partial, 3′-terminal exon containing a selectable marker and a poly(A) site, but no 3′ splice site (SS). The vectors were designed to select for clones with proviruses inserted into exons, since poly(A) sites are not efficiently utilized unless they are positioned downstream of a splice acceptor site. Thus, polyadenylation and 3′ end formation are inefficient when poly(A) sites are placed between 5′ and 3′ SSs, as in an intron (19,20), and insertion of a 5′ SS into a 3′-terminal exon suppresses polyadenylation (21,22). However, the genes disrupted by U3 gene traps frequently contain proviruses inserted into introns, with variable effects on cellular gene expression. For example, proviruses inserted into introns of genes encoding hnRNP C, EphA2 and Prmt1 induced essentially null mutations (23–25), whereas insertion of the U3Neo vector in an intron of the gene encoding hnRNP A2/B1 (26) caused a hypomorphic mutation. In this latter case, some transcripts extending into the provirus spliced to a cryptic 3′ SS located within the neomycin phosphotransferase (NPT) coding sequence, while other transcripts spliced around the provirus, allowing continued expression of the occupied gene.

These issues highlighted the need to define more clearly the mechanisms by which a partial 3′ exon, consisting of a protein coding sequence and poly(A) site but without a 3′ SS, is expressed following insertion into introns. We first analyzed 400 entrapment loci for which genomic DNA sequences flanking the U3NeoSV1 provirus had been previously cloned and sequenced (27), utilizing the annotated mouse genome sequence (28), which was not available at the time of the original study. Most sites of provirus integration could be located on the mouse genome, providing information about the identity and structure of the disrupted genes, including the locations of exons and coding sequences. We report that ∼95% of genes disrupted by U3NeoSV1 contained proviruses inserted into introns. Based on the structures of the occupied genes and molecular analysis of individual clones, the NPT enzyme was typically expressed as a fusion protein from cellular transcripts that spliced in-frame to the cryptic 3′ SS within the NPT coding sequence. This mechanism of gene entrapment thus appeared to preferentially target genes with 5′ exons capable of splicing in-frame to produce functional NPT fusion proteins. To test this prediction, a second gene trap vector was constructed (U3NeoSV2) in which the cryptic Neo splice site was removed by site-directed mutagenesis. As expected, gene entrapment with U3NeoSV2 was considerably more random than with U3NeoSV1. However, elimination of the cryptic Neo 3′ SS did not reduce the proportion of entrapment clones that contained proviruses inserted into introns. Instead, the fusion transcripts utilized cryptic 3′ SSs present in the adjacent intron or generated by viral and cellular sequences brought together by virus integration. These results clarify the mechanisms of gene entrapment by U3 gene trap vectors and illustrate features of exon definition that govern the activity of 3′ processing and polyadenylation signals inserted into expressed cellular genes.

MATERIALS AND METHODS

Embryonic stem cell mutagenesis and cell culture

Cell lines with mutations in expressed cellular genes were generated by tagged sequence mutagenesis, a large-scale functional genomics strategy for disrupting genes in ES cells (27). ES cells were maintained in DMEM supplemented with 15% preselected and heat-inactivated fetal bovine serum, 100 mM non-essential amino acids, 0.1 mM β-mercaptoethanol and 1000 U/ml leukemia inhibitory factor (Esgro®; Gibco).

Analysis of cell–virus fusion transcripts

Total cellular RNA was isolated using the RNAgent RNA isolation system (Promega). An aliquot of 20 µg of each RNA was denatured with formamide, electrophoretically separated on a 1% formaldehyde agarose gel and transferred to a HyBond-N+ filter (Amersham). DNA probes for neomycin phosphotransferase (Neo) and β-lactamase were labeled with [α-32P]dCTP using random primer synthesis. Hybridization was performed at 42°C in 50% formamide, 10% dextran sulfate, 5× SSCPE, 1× Denhardt’s and 10 µg/µl single-stranded (ss)DNA. Blots were washed twice in 1× SSC/0.1% SDS at 65°C, followed by two washes in 0.2× SSC/0.1% SDS at 65°C, and exposed overnight.

RT–PCR was performed as described (26). Briefly, 20 µg of total cellular RNA was treated with 5 U RNase-free DNase (Gibco BRL), extracted with phenol/chloroform, precipitated in ethanol and resuspended in 20 µl of H2O. For cell lines containing the U3NeoSV1 and U3NeoSV2 gene trap vectors, the NeoA primer (5′-ATTGTCTGTTGTGCCCAGTCATA) and 5 µl of each RNA sample were used to produce first strand cDNA. For the Hnrpc mutant cell line containing the U3His gene trap vector, the HisA primer (5′-CGCTGTATTCACGCAGGGCATCG) was used. PCRs (50 µl) contained 2 µl of the RT reaction, 1× buffer, 1.5 mM MgCl2, 0.2 mM dNTP, 0.2 µM gene trap-specific reverse primer, 0.2 µM gene-specific forward primer and 5 U Amplitaq (Perkin-Elmer/Cetus). Following initial denaturation at 95°C for 1 min, the reactions proceeded through 35 cycles of 94°C for 1 min, 55°C for 1 min and 72°C for 1.5 min. For cell lines containing U3NeoSV gene trap vectors, PCRs used a primer complementary to the provirus, NeoB (5′-CGAATAGCCTCTCCACCCAA), together with a primer complementary to the proximal exon of the disrupted cellular gene, as follows: Lamr1, 5′-AGTTCACTGCTGCTCAGCCT; Ddx19, 5′-CAATGAGACCTACCTGCACC; Mrps30, 5′-GCATTGCCCCCGAAGATCAA; Rps19, 5′-CAGGAGTTCGTCAGAGCTTC; Plk3, 5′-TGCCAGCTTGATAGAGACAG; Gas5, 5′-TTTCGGAGCTGTCCGGCATT; Plk, 5′-CATCCCCATCTTCTGGGTCA; Dhx9, 5′-TCGCCGTTCTCGTGGAAG; Uhg, 5′-TTTTGAGGTGCCACCTTAC; Tk1, 5′-TTCCAGATCGCCCAGTACAAG; Eef1a1, 5′-CTTTTTCGCAACGGGTTTGCCG. For the cell line containing the U3His provirus, the PCRs used HisB (5′-GCTGTTCAGGGCTACAGCTGTTCC) and 4A4 (5′-ACGGTCGTGTGGTTCTTCGCTG) as primers complementary to the provirus and disrupted gene, respectively. An aliquot of 2 µg of genomic DNA (14) from each cell line was amplified in a separate reaction using the same combinations of provirus and gene-specific primers (GSPs).
PCR products were separated on a 1.2% agarose gel, transferred to Hybond N membrane and probed with a U3 oligonucleotide (5′-GCTTGCCAAACCTACAGGTG) or in the case of Hnrpc the U1 oligonucleotide (5′-TGAGACCCAAGTCCCCAC) labeled with [α-32P]dCTP using Terminal Transferase (Promega). Hybridization was performed at 48°C in 2× Denhardt’s, 0.6 M NaCl, 10 mM Tris (pH 7.6), 10 mM HEPES (pH 7.9), 5 mM EDTA and 0.2% SDS. Blots were washed twice at 37°C in 2× SSC for 15 min and exposed for several hours. In all cases, duplicate gels were run and the products were gel purified (QIAquick Gel Extraction Kit; Qiagen) and TA subcloned into the pGEM-T Easy Vector (Promega). PCR products were sequenced using a dye terminator cycle sequencing system and an ABI 377 sequencer.

Immunoprecipitation and western analysis

Extracts of cell proteins were prepared from confluent 10 cm plates by detergent lysis in 1 ml of buffer (50 mM Tris–HCl pH 7.5, 100 mM NaCl, 0.5% NP40, 10 µg/ml CLAP and 0.5 PMSF). Fresh protein extract was used for immunoprecipitation as described (29). The lysates were precleared with 25 µl of normal rabbit serum, 1 µl of rabbit polyclonal anti-MDM antibody (Santa Cruz Biotechnology), which was later used as the control antibody for immunoprecipitation, and 25 µl of a 1:1 protein A/Sepharose slurry (Sigma). Precleared lysates were split into two samples and incubated (3 h with gentle rocking at 4°C) with either 2 µl of rabbit polyclonal NPT-specific antibody (5Prime/3Prime) or 1 µl of control antibody (rabbit polyclonal anti-MDM). Immune complexes were recovered 30 min after addition of the protein A/Sepharose. They were then washed three times in NET buffer (50 mM Tris pH 7.6, 150 mM NaCl and 0.1% NP40) and once with final wash buffer (10 mM Tris–HCl pH 7.6, and 0.1% NP40). Fusion proteins were detected using AuroProbe BLplus immunogold reagents (Amersham) as directed by the manufacturer using a 1:500 dilution of rabbit polyclonal NPT-specific antibody.

Construction and gene entrapment by U3NeoSV2

The cryptic 3′ SS in the Neo gene of the U3NeoSV1 gene trap shuttle vector (7) was removed using a Chameleon double-stranded site-directed mutagenesis kit (Stratagene) following the manufacturer’s protocol. NeoMut (5′-GGATTGCACGCTGGTTCTCCGGCCGC-3′) and Sca1Mut (5′-CTGTGACTGGTGACGCGTCAACCAAGT-3′) were used as mutagenic and selective primers, respectively, and mutant templates were verified by DNA sequencing. Gene entrapment with U3NeoSV2 was performed in ES cells as described for U3NeoSV1 (7). Cellular sequences flanking the integrated proviruses were cloned by plasmid rescue and were sequenced with a U3Neo primer (5′-CAGCTAGCTTGCCAAACC-3′). The flanking sequences from all of the entrapment clones have been deposited in the GenBank survey sequence database (dbGSS) under accession nos CL440621–CL441145.

RESULTS

Tagged sequence mutagenesis with U3NeoSV1

Four hundred flanking sequence tags from a library of genes disrupted by the U3NeoSV1 gene trap vector provirus described in an earlier study (27) were re-analyzed against the annotated mouse genome sequence using MultiBlaster, software for performing automated searches of nucleic acid databases (C.Shaffer, E.Ruley and J.Moore, unpublished results). MultiBlaster is constructed around a relational database from which the sequence tags can be retrieved for analysis by the BLASTN (30) and other search engines. Formatted Blast results and information about each mutation are also maintained in the database and both software and data access are controlled via a web browser interface. Blast results formatted by MultiBlaster include hyperlinks to the Entrez map viewer, allowing the user to locate sites of provirus integration on the annotated mouse genome sequence. In most cases, the annotation provided information about the identity and structure of the occupied genes, including the locations of exons and coding sequences. Altogether, >80% of the flanking sequence tags matched unique regions on the murine genome sequence, including 34 inserts that were not contained within an annotated transcription unit. These may represent inserts into previously uncharacterized genes or into regions that would extend the boundaries of nearby transcription units.

In a previous study, 50% of the flanking sequence tags matched cDNA or EST clones up to the site of virus integration, consistent with insertion into an exon (27). However, matches with the annotated mouse genome sequence suggest that ∼95% of U3NeoSV1 proviruses were actually inserted into introns. The discrepancy is due to several factors. First, very little genomic DNA sequence was available at the time of the original study. Therefore, sequences in the public databases that did match provirus-flanking DNA almost always came from cDNAs and ESTs, whereas the majority of flanking sequences derived from introns produced no significant matches. Second, some matches appear to involve intron sequences present in cDNA libraries (e.g. from unspliced transcripts or contaminating genomic DNA) and therefore do not reflect inserts in exons. However, we cannot exclude the possibility that some of these matching cDNAs were derived from less abundant, alternatively spliced transcripts that have not been incorporated into the gene structural annotations.

Genes disrupted by U3NeoSV1 appeared to be randomly distributed on all chromosomes (data not shown). However, gene entrapment was clearly not random. Of the 281 inserts that disrupted previously characterized transcription units, 92 (33%) involved genes targeted multiple times in the library, including the gene encoding platelet endothelial cell adhesion molecule (Pecam), which was independently targeted nine times.

Cryptic 3′ SS in NPT predicts the formation of fusion proteins

Previous studies characterized a mutation in the gene encoding hnRNP A2/B1, induced by insertion of the U3Neo gene trap vector into an intron (26). Fusion transcripts initiating in the flanking cellular DNA spliced to a cryptic 3′ SS located within the Neo gene (Fig. 1A). The cryptic 3′ SS contained the requisite AG and potential branch point sequences, but lacked a polypyrimidine tract (Fig. 1B). Splicing to the cryptic site deleted the first nine amino acids of NPT and the appended N-terminal sequences of hnRNP U and hnRNP A2/B1 were predicted to splice in-frame with the NPT coding sequence.
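The features named above (an AG dinucleotide, with or without an upstream polypyrimidine tract) can be illustrated with a minimal sketch. It is not the method used in this study: the function name, the 15-nucleotide window and the toy sequence are assumptions chosen only for illustration, and branch point sequences are ignored.

```python
def candidate_3ss(seq, window=15):
    """List (index_of_AG, upstream_pyrimidine_fraction) for each AG in seq.

    Hypothetical helper: `window` (15 nt) is an illustrative choice,
    not a value taken from the paper.
    """
    seq = seq.upper()
    hits = []
    # Start at `window` so every AG has a full upstream window to score.
    for i in range(window, len(seq) - 1):
        if seq[i:i + 2] == "AG":
            upstream = seq[i - window:i]
            py_fraction = sum(base in "CT" for base in upstream) / window
            hits.append((i, py_fraction))
    return hits


# Toy sequence (made up): a pyrimidine-rich stretch followed by AG.
toy_intron = "TTTTCTCTTTCCTTTCAGGT"
for position, py_fraction in candidate_3ss(toy_intron):
    print(position, round(py_fraction, 2))
```

A site like the cryptic Neo 3′ SS, which lacks a polypyrimidine tract, would score a low pyrimidine fraction in such a scan while still being reported as an AG candidate.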

Figure 1.

Use of a cryptic 3′ SS in NPT is predicted when U3NeoSV1 inserts in an intron of an expressed gene. (A) Schematic diagramming the U3Neo provirus in an intron. Fusion transcripts are predicted to splice to a cryptic 3′ SS located in the coding region of NPT. (B) Sequence of 3′ SS in NPT compared with the consensus sequence. Upstream exons are predicted to end in the first nucleotide of the codon to maintain the open reading frame. (C) ES cell clones containing a provirus inserted in an intron that were chosen for further analysis.

To determine if splicing to the cryptic Neo 3′ SS is a common mechanism allowing U3Neo vectors to be expressed from within introns, we analyzed additional inserts from the library of ES cell clones generated by the U3NeoSV1 gene trap vector. We predicted that if splicing occurred between the proximal exon of the disrupted gene and the cryptic Neo 3′ SS, then the protein encoded by the occupied gene should splice in-frame with the NPT enzyme. This prediction proved correct; in 93% of cases the exon proximal to the provirus ended after the first nucleotide of a codon (+1 reading frame), as is required to splice in-frame with the NPT coding sequence.
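The frame requirement described above reduces to simple modular arithmetic, sketched here under stated assumptions. The function names and the exon lengths in the example are hypothetical; the only rule taken from the text is that the proximal exon must end after the first nucleotide of a codon to splice in-frame to the cryptic Neo 3′ SS.

```python
def exon_end_phase(coding_exon_lengths):
    """Number of nucleotides of an incomplete codon left at the end of the
    proximal exon (0, 1 or 2), given the coding lengths of all upstream
    exons starting at the initiation codon."""
    return sum(coding_exon_lengths) % 3


def splices_in_frame_with_npt(coding_exon_lengths, required_phase=1):
    """True when the proximal exon ends after the first nucleotide of a
    codon, the phase the text states is needed for an NPT fusion."""
    return exon_end_phase(coding_exon_lengths) == required_phase


# Hypothetical coding lengths (nt) of the exons upstream of a provirus.
print(splices_in_frame_with_npt([100, 51]))  # 151 nt of coding sequence
print(splices_in_frame_with_npt([99]))       # ends on a complete codon
```

Applied to annotated gene structures, a check of this kind would separate inserts expected to yield NPT fusion proteins from those, like Rps19, that require another mechanism.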

Analysis of cell–virus fusion transcripts

To verify the structures of the cell–virus fusion transcripts, five inserts located in introns were selected for further analysis (Fig. 1C). Lamr1 (laminin receptor), Ddx19 [DEAD (Asp-Glu-Ala-Asp) box polypeptide 19], Mrps30 (mitochondrial ribosomal protein S30) and Plk3 (polo-like kinase 3) were chosen randomly from among inserts with well-characterized cDNA sequences. All of the clones contained a single, intact provirus and, taken together, they provided a range of distances between the proximal exon and the integrated provirus. The Rps19 (ribosomal protein S19) mutant clone was chosen because it contained an insert in which the proximal exon could not splice in-frame and raised the possibility of producing functional NPT by an alternative mechanism. Finally, two clones in which the virus had integrated into exons of known genes, Gas5 (growth arrest-specific gene 5) and Plk (polo-like kinase 1), were also analyzed to assess whether the Neo cryptic 3′ SS competes with the splice acceptor of the occupied exon.

Fusion transcripts in the clones listed above were examined by northern blot hybridization using a Neo-specific probe (Fig. 2). Transcripts of varying size were observed, presumably because different amounts of cellular RNA are appended to the inserted Neo genes. Ssrp1 is an ES cell clone containing a Neo gene introduced into the Ssrp1 gene by gene targeting (31). Neo transcripts, expressed from a phosphoglycerate kinase (PGK) promoter in Ssrp1 cells, are ∼1.1 kb. As expected, all provirus-containing clones expressed transcripts >1.2 kb, the size of U3Neo sequences incorporated into fusion transcripts.

Figure 2.

Northern blot analysis of gene trapped cell lines. Total RNA was isolated from D3 and from Ssrp1, Plk, Gas5, Lamr1, Ddx19, Mrps30, Rps19 and Plk3 mutant cell lines and hybridized to a Neo-specific probe. The D3 cell line is an uninfected ES cell line and the Ssrp1 mutant ES cell line contains a PGK-driven Neo gene. Plk and Gas5 are examples where U3NeoSV1 inserted into an exon while Lamr1, Ddx19, Mrps30, Rps19 and Plk3 all contain a provirus located in an intron. LTR: long terminal repeat.

Table 1 compares expected and observed sizes of cell–U3Neo fusion transcripts. Expected sizes reflect the amount of cellular RNA that could be appended to U3Neo, based on the published cDNA sequences (32–38). In general, the size of the major fusion transcript was as expected or larger. Since our predictions assumed that the published cDNAs are full-length, larger transcripts may reflect additional 5′ sequences not present in the original cDNA clones. The one exception was Plk, where the observed fusion transcript was smaller than expected. This could result from alternative splicing or the use of an alternative promoter in ES cells.

Table 1. Fusion transcripts and proteins expressed following gene entrapment.

| Gene | Expected transcript size (kb) | Observed transcript size (kb) | Expected protein mol. wt (kDa) | Apparent protein mol. wt (kDa) |
|---|---|---|---|---|
| Plk | 2.6 | 2.2 | 29 | 32 |
| Gas5 | 1.3 | 1.3 | 29 | 31 |
| Lamr1 | 2.1 | 2.6 | 57 | 62 |
| Ddx19 | 2.7 | 3.3 | 80 | 72 |
| Mrps30 | 1.5 | 1.6 | 36 | 44, 28 |
| Rps19 | 1.3 | 1.5 | 29 | 30 |
| Plk3 | 2.4 | 3.0 | N/A | N/A |
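
The transcript comparison in Table 1 can be checked with a short sketch. The values are the expected and observed transcript sizes listed in the table; the variable and function names are illustrative only.

```python
# (expected kb, observed kb) for each gene, transcribed from Table 1.
TABLE_1_TRANSCRIPTS_KB = {
    "Plk": (2.6, 2.2),
    "Gas5": (1.3, 1.3),
    "Lamr1": (2.1, 2.6),
    "Ddx19": (2.7, 3.3),
    "Mrps30": (1.5, 1.6),
    "Rps19": (1.3, 1.5),
    "Plk3": (2.4, 3.0),
}


def smaller_than_expected(table):
    """Genes whose observed fusion transcript is shorter than predicted."""
    return [gene for gene, (expected, observed) in table.items()
            if observed < expected]


print(smaller_than_expected(TABLE_1_TRANSCRIPTS_KB))
```

With these values only Plk is returned; every other transcript is as large as or larger than predicted.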

Overall, the northern data suggest that polyadenylation and 3′ end formation occur primarily in the 5′ LTR of the U3 gene trap vector. However, larger NPT-specific transcripts were observed in all the U3NeoSV1 clones. Based on their size and hybridization (data not shown) to the ampicillin resistance gene (Amp, Fig. 1A) present in the body of the provirus, these less-abundant, larger transcripts appear to terminate in the 3′ LTR.

Splicing to a cryptic 3′ SS is necessary for U3Neo expression when the provirus has inserted into an intron

The structures of fusion transcripts across the cell–U3Neo junction were analyzed by RT–PCR (Fig. 3A). A Neo-specific primer, NeoA, was used to produce first strand cDNA. A second, nested Neo-specific primer, NeoB, was used in conjunction with a GSP for PCR. Each GSP was complementary to the upstream exon most proximal to the provirus. Both Neo-specific primers matched sequences downstream of the cryptic 3′ SS. For comparison, regions of genomic DNA were also amplified with the same GSP and NeoB primers.

Figure 3.

Fusion transcripts use cryptic Neo 3′ SS in Lamr1, Ddx19, Mrps30 and Plk3 cell lines. (A) Strategy used to amplify fusion transcripts. Primers used for RT–PCR are indicated along with the U3 probe used for Southern blot analysis, which is upstream of the identified Neo 3′ SS. (B) Ethidium bromide-stained products of RT–PCR and U3 probed Southern blots. Arrows indicate spliced transcripts that use the Neo 3′ SS confirmed by sequence analysis shown on the left with the exon and Neo boundaries labeled. The sizes of the major PCR products (nucleotides) are indicated.

In four cases (Lamr1, Ddx19, Mrps30 and Plk3), the amplified cDNAs included products shorter (marked with arrows in Fig. 3B) than were derived from genomic DNA. These cDNAs did not hybridize to a U3 sequence located upstream of the cryptic splice site (Fig. 3) and presumably reflect splicing between the proximal exon and the Neo 3′ SS. This was confirmed by DNA sequence analysis, as shown in Figure 3B. Larger cDNAs were also amplified from cell lines containing inserts in Lamr1, Ddx19 and Mrps30. These products co-migrated with amplified genomic DNA fragments and hybridized to the U3 probe. These cDNAs were sequenced (data not shown) and were colinear with the genomic DNA.
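
The logic used to tell spliced from unspliced products can be sketched as follows. All distances in the example are hypothetical, and the function and parameter names are illustrative; the sketch only encodes the reasoning that a transcript spliced to the Neo 3′ SS loses both the intron and the U3 sequence upstream of the cryptic site, so it is shorter than the genomic product and does not hybridize to the U3 probe.

```python
def rt_pcr_products(gsp_to_exon_end, intron_to_provirus,
                    u3_before_3ss, neo_3ss_to_neob):
    """Predict product sizes (nt) and U3-probe status for two transcripts.

    gsp_to_exon_end    -- from the GSP to the end of the proximal exon
    intron_to_provirus -- intron sequence between that exon and the provirus
    u3_before_3ss      -- provirus sequence upstream of the cryptic Neo 3' SS
    neo_3ss_to_neob    -- from the cryptic Neo 3' SS to the end of NeoB
    """
    unspliced = (gsp_to_exon_end + intron_to_provirus
                 + u3_before_3ss + neo_3ss_to_neob)
    spliced = gsp_to_exon_end + neo_3ss_to_neob
    return {
        "unspliced": {"size": unspliced, "hybridizes_to_u3": True},
        "spliced_to_neo_3ss": {"size": spliced, "hybridizes_to_u3": False},
    }


# Hypothetical distances, chosen only to illustrate the comparison.
for name, product in rt_pcr_products(80, 400, 60, 150).items():
    print(name, product["size"], product["hybridizes_to_u3"])
```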

In principle, NPT could be expressed as a fusion protein from transcripts that splice to the Neo 3′ SS. Alternatively, NPT could be expressed from the unspliced transcripts, but as the upstream sequences have multiple in-frame stop codons, this would require translational reinitiation. To distinguish between these possibilities, NPT proteins were visualized by western blot analysis (Fig. 4). Cells where splicing to the Neo 3′ SS was observed (Lamr1, Ddx19, Mrps30 and Plk3; data not shown) all expressed fusion proteins larger than wild-type NPT (cf. the protein expressed in the Ssrp1 cell line). The apparent molecular weights of the fusion proteins correlated closely with predictions based on gene structure (Table 1). Moreover, none of the lines expressed wild-type NPT, even those (Lamr1 and Ddx19) that expressed higher levels of unspliced transcripts (Fig. 3B). This may reflect the fact that the second open reading frame in a polycistronic mRNA is translated inefficiently in mammalian cells (15,39–42). Mrps30 cells also expressed a smaller NPT-related protein, presumably a degradation product of the larger fusion protein.

Figure 4.

Use of the Neo 3′ SS leads to production of NPT fusion proteins. Whole cell protein extracts were immunoprecipitated with anti-NPT (N) specific antibody or anti-MDM (C) as a control. Immunoprecipitated proteins were subjected to SDS–PAGE and western blot analysis with anti-NPT antibody. The RSB cell line was used as a positive control and a marker for native NPT protein. Wild-type ES cells, D3, were used as a negative control. Arrows identify the NPT-specific proteins in Lamr1, Ddx19, Mrps30 and Rps19 mutant cell lines. Proteins larger than native NPT, indicative of fusion proteins, are observed in Lamr1, Ddx19 and Mrps30 mutant cell lines.

Fusion transcripts expressed in cells with the Rps19 insert were also analyzed by RT–PCR. This line represents the occasional clone in which the proximal exon did not end in the +1 reading frame (Fig. 1C) and, therefore, utilization of the Neo 3′ SS was not expected. The cDNA product was significantly smaller than the corresponding genomic DNA; however, it hybridized to the U3 probe (Fig. 5A). This suggested that the proximal exon splices to sequences upstream of the Neo 3′ SS. Indeed, comparison of the cDNA and genomic sequences (Fig. 5B) identified a cryptic 3′ SS located in the intron of the Rps19 gene. This splice site appears to be a better match to the consensus sequence than the Neo 3′ SS. As predicted from the presence of in-frame stop codons in the U3 region upstream of Neo, the Rps19 cells appeared to express wild-type NPT (Fig. 4).

Figure 5.

Cell–U3Neo fusion transcripts in Rps19 cells use a cryptic 3′ SS located in the intron. (A) Ethidium bromide-stained products of RT–PCR and U3 probed Southern blots. Spliced transcript hybridized to the U3 probe indicating that the Neo 3′ SS is not utilized. (B) Diagram of splice junction between U3NeoSV1 provirus (dark gray box) and the Rps19 5′ proximal exon (light gray box). The small white box indicates the single intron nucleotide (G) that is incorporated into the mature transcript. The sequence of the spliced product and the genomic sequence are shown underneath. Consensus splice sites (nearly invariant nucleotides in bold) are also given and compared with the genomic sequence, along with a putative branch point sequence. The sizes of the major PCR products (nucleotides) are indicated.

Splicing to a cryptic 3′ SS is not required for U3Neo expression when the provirus has inserted into an exon

Cell–U3Neo fusion transcripts were also analyzed in two clones (Gas5 and Plk) where the provirus had inserted into an exon. Neither provirus inserted into the first exon and the initiation codons of the disrupted genes were 5′ and 3′ of the Plk and Gas5 inserts, respectively. The proximal Plk exon also ended in the +1 reading frame and, if spliced to the Neo 3′ SS, could generate transcripts that express an NPT fusion protein.

Only one RT–PCR product (B) was observed in Gas5 cells, which also hybridized to the U3 probe (Fig. 6B). DNA sequence analysis revealed that this cDNA resulted from splicing between the proximal exon and the exon occupied by the provirus (data not shown). In contrast, three RT–PCR products were observed in the Plk clone (Fig. 6B), which were sequenced (data not shown). The major product (C) resulted from splicing between the proximal exon and the cryptic Neo 3′ SS. The other products were derived from unspliced transcripts (A) and from splicing between the proximal and occupied exons (B). Both Gas5 and Plk cells apparently expressed only wild-type NPT (Fig. 4). Although the major transcript in Plk cells has the potential to express a Plk–NPT fusion protein, this protein was not observed. The results with both Gas5 and Plk inserts indicate that splicing to a cryptic 3′ SS is not required for U3Neo expression when the provirus has inserted into an exon.

Figure 6.

RT–PCR analysis of U3Neo fusion transcripts in Gas5 and Plk cell lines where U3NeoSV1 inserted in exons. (A) Diagram of RT–PCR strategy and possible transcripts. The Neo-specific primers (NeoA and NeoB) and the U3 probe are the same as shown in Figure 3, but the GSP is designed for the 5′ exon proximal to the occupied exon. (B) Ethidium bromide stain and U3 Southern blot of PCR products. Labeled arrows indicate the type of products observed referred to in (A). The sizes of the major PCR products (nucleotides) are indicated.

Tagged sequence mutagenesis with U3NeoSV2

Gene entrapment by U3NeoSV1 preferentially targeted genes with 5′ exons that could splice in-frame to produce functional NPT fusion proteins. To eliminate this mechanism of gene entrapment, a second gene trap vector was constructed (U3NeoSV2) in which the cryptic Neo acceptor site was removed by changing the invariant A of the 3′ SS to a T (Fig. 7). The U3NeoSV2 vector was introduced into the Phoenix retrovirus packaging line (43) and culture medium from the stable virus-producing clones was used to infect ES cells at a low multiplicity of infection. After 7–10 days of neomycin selection, Neo-resistant clones were expanded for cryopreservation and for isolation of genomic DNA. Cellular DNAs were digested with either EcoRI or BamHI to clone sequences 5′ or 3′ of the provirus, respectively. Aliquots of the digested DNAs were used for Southern blot analysis using Neo- and Amp-specific probes to assess the number and integrity of provirus inserts and to verify the sizes of rescued plasmids (data not shown). Approximately 25% of the clones contained a single LTR, presumably generated by homologous recombination, as previously observed with the U3NeoSV1 vector. DNAs from clones containing single, intact proviruses were used for plasmid rescue. The rescued plasmids typically ranged from 4 to 7 kb in size and corresponded to the size of flanking fragments as assessed by Southern blot hybridization (data not shown).

Figure 7.

Generation of the U3NeoSV2 gene trap retrovirus shuttle vector. A cryptic 3′ SS located downstream of the start codon (AUG) of the Neo gene in U3NeoSV1 was removed by changing an A (shown in bold italic) to a T. This changes an invariant nucleotide in the consensus 3′ SS, shown above the U3NeoSV1 sequence. The schematic shows the structure of U3NeoSV transcripts expressed in packaging cell lines and includes a bacterial plasmid origin of replication (Ori), ampicillin resistance gene (AmpR) and BamHI and EcoRI restriction enzyme cleavage sites. The neomycin resistance gene (Neo), its initiation codon (AUG) and cryptic 3′ SS in the 3′ LTR are duplicated in the integrated provirus.

Regions of 5′ flanking DNA were sequenced using a Neo-specific primer, generating a unique sequence tag of ∼500 bp for each insert, and were compared with the non-redundant, EST and mouse genome databases using the BLASTN program (30) via MultiBlaster. Search results were evaluated by bit scores, with scores >100 considered significant. Flanking sequences containing repetitive elements were trimmed and used for further database searches.
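
The bit-score criterion can be illustrated with a minimal sketch. It assumes BLAST results in the standard 12-column tabular layout, in which the bit score is the last column; how MultiBlaster stores results internally is not described here, and the tag names and hit values below are made up.

```python
import csv
import io

BIT_SCORE_CUTOFF = 100.0  # scores above this were considered significant


def significant_hits(tabular_text, cutoff=BIT_SCORE_CUTOFF):
    """Return (query, subject, bit_score) for hits above the cutoff.

    Assumes tab-separated rows with the bit score in column 12.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        bit_score = float(row[11])
        if bit_score > cutoff:
            hits.append((row[0], row[1], bit_score))
    return hits


# Two made-up hits: one well above the cutoff, one below it.
example = "\n".join([
    "\t".join(["tag_001", "chr_hypothetical", "98.5", "480", "7", "0",
               "1", "480", "1000", "1479", "1e-150", "850"]),
    "\t".join(["tag_002", "chr_hypothetical", "90.0", "40", "4", "0",
               "1", "40", "500", "539", "0.5", "48.1"]),
])
print(significant_hits(example))
```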

Starting with 183 flanking sequences, 62 were excluded from further analysis because their size and/or repetitive DNA content precluded meaningful sequence comparisons or because they were derived from sister clones. The remaining 121 sequence tags were informative and included 84 (69%) that matched gene sequences annotated on the mouse genome. Thirty-nine matched known genes for which functional information was available or could be inferred. Forty-five matched transcription units derived from anonymous cDNAs or ESTs. Altogether, 88% of the provirus flanking sequences matched mapped regions of genomic DNA, consistent with the extent to which the mouse genome sequence has been completed. Genes disrupted by U3NeoSV2 appeared to be randomly distributed on all chromosomes with the possible exception of chromosome 8, which had fewer than the expected number of inserts (data not shown).
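
The tallies in this paragraph follow from simple arithmetic, reproduced below as a check. The counts are those stated in the text; the variable names are illustrative.

```python
total_tags = 183          # flanking sequences at the start
excluded = 62             # too short, too repetitive, or sister clones
informative = total_tags - excluded

known_genes = 39          # genes with known or inferable function
anonymous_units = 45      # units derived from anonymous cDNAs or ESTs
annotated = known_genes + anonymous_units

percent_annotated = round(100 * annotated / informative)
print(informative, annotated, percent_annotated)
```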

The targeted genes encode proteins from different functional groups, for example, transcription factors (Stat5b), splicing factors (Zfp162), kinases (Tk1 and Ick), translation factors (Eef1a1), RNA-binding proteins (Bat1a, Dhx9 and Hnrpc), ligands involved in cell signaling (Blnk), cell receptors (Tnfrsf1a) and metabolic enzymes (Cyp26a1). A complete list of targeted genes is available on request and the sequence tags have been deposited in dbGSS. No gene appeared to be independently disrupted in more than one clone, for a repeat frequency of <1%. This is in striking contrast to gene entrapment by the U3NeoSV1 vector, in which 92 of 281 informative sequence tags (33%) involved mutations in genes targeted multiple times in the library. Moreover, with the exception of an insert in the Hnrpc gene, none of the genes disrupted by U3NeoSV2 was among the genes disrupted by U3NeoSV1. Thus, U3NeoSV2 appears to target a more random set of genes than U3NeoSV1.

Based on information obtained using the Entrez map viewer, 78 of the 84 annotated genes disrupted by U3NeoSV2 (93%) contained proviruses inserted into introns, although in some of these cases the flanking DNA also matched one or more ESTs up to the site of virus integration. As discussed above, many of these matching ESTs are probably artifacts resulting from cDNAs cloned from unspliced transcripts or from cloned contaminating genomic DNA. Alternatively, some of the ESTs may be derived from less abundant, alternatively spliced transcripts that have not been incorporated into the gene structural annotations. In either event, gene entrapment by U3NeoSV2, like U3NeoSV1, preferentially targets introns.

Cryptic splice sites are used for expression of U3NeoSV2 vector when inserted in introns

To understand mechanisms that facilitate Neo expression following insertion of the U3NeoSV2 vector into introns, four clones with intron inserts were selected for analysis of cell–Neo fusion transcripts. Total RNA from Dhx9 [DEAH (Asp-Glu-Ala-His) box polypeptide 9] (44), Uhg (U22 snoRNA host gene) (45), Tk1 (thymidine kinase 1) (46) and Eef1a1 (elongation factor 1α) (47) clones was purified and analyzed by RT–PCR. All of the clones contained a single intact integrated provirus (data not shown) and sequences of the upstream, proximal exons were used to design GSPs for RT–PCR. RT reactions were primed using a Neo-specific primer (NeoA) and cDNAs were amplified using a second, nested Neo-specific primer (NeoB) together with the GSP complementary to the upstream exon closest to the U3NeoSV2 provirus (Fig. 8A). The Neo primers were located downstream of the mutated cryptic 3′ SS. Genomic DNAs from each clone were also amplified with NeoB and GSPs to provide size standards for RT–PCR products that would be generated from transcripts colinear with the flanking genomic DNA, i.e. transcripts that are not spliced in the region between the GSP and Neo primer. As shown in Figure 8B (left), RT–PCR products amplified from all four clones included fragments shorter than the intervening genomic DNA. The shorter fragments also hybridized to an oligonucleotide probe corresponding to the end of the viral LTR (Fig. 8B, right). These data suggest that gene–U3Neo fusion transcripts are spliced, presumably between the 5′ SS of the upstream exon and cryptic 3′ SSs adjacent to the provirus. Cell–U3Neo fusion transcripts appeared to be exclusively spliced in the Dhx9, Eef1a1 and Tk1 inserts as determined by the size of RT–PCR products. However, the predominant fragment amplified from the Uhg clone was similar in size to the PCR product derived from genomic DNA and thus appeared to originate from unspliced transcripts extending through the proximal intron.
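The size comparison used to interpret the RT–PCR products can be expressed as a small helper. This is a minimal sketch under stated assumptions: the function name `classify_product`, the 20 bp tolerance and the example sizes are hypothetical and are not measurements from the study.

```python
def classify_product(rt_pcr_bp, genomic_bp, tolerance_bp=20):
    """Label an RT-PCR product by comparing its size with the genomic PCR product.

    tolerance_bp is an assumed allowance for gel sizing error, not a value from the paper.
    """
    if rt_pcr_bp < genomic_bp - tolerance_bp:
        return "spliced"
    if abs(rt_pcr_bp - genomic_bp) <= tolerance_bp:
        return "unspliced (colinear with genomic DNA)"
    return "unexpected (larger than genomic product)"

# Hypothetical product sizes in bp, for illustration only.
print(classify_product(300, 900))
print(classify_product(890, 900))
```

A product clearly shorter than the genomic standard is taken as evidence of splicing between the GSP and the Neo primer, which mirrors the reasoning applied to the Dhx9, Eef1a1 and Tk1 inserts.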

Figure 8.

Analysis of fusion transcripts expressed in cells following gene trap mutagenesis with U3NeoSV2. (A) RT–PCR strategy for amplification of cell–virus fusion transcripts. Primer NeoA was used for RT followed by amplification with NeoB and GSPs. Genomic DNA amplified with NeoB/GSP primer pair provides a PCR control and a size standard for unspliced fusion transcripts that extend from the proximal exon of the disrupted gene into the provirus. A 30 nt U3 probe was used for Southern blot analysis of PCR products. (B) PCR products amplified from fusion transcripts expressed by inserts in the Dhx9, Uhg, Tk1 and Eef1a1 genes were fractionated on 1% agarose gels and stained with ethidium bromide (RT–PCR, left) and then analyzed by Southern blot hybridization (right), probing with the U3 probe. PCR reactions were performed using total cellular RNA with (RT) and without (–RT) reverse transcription or using genomic DNA (DNA). With the exception of Uhg, the amplified fusion transcripts were smaller than the corresponding genomic DNA sequence, indicative of splicing between the upstream exon and cryptic splice sites upstream of the U3 sequence. Sequences amplified from Uhg fusion transcripts appeared to be derived mainly from transcripts extending from the proximal exon into the provirus, although some spliced products were also detected. The sizes of the major PCR products (nucleotides) are indicated.

To further characterize cell–U3NeoSV2 fusion transcripts expressed following insertion into introns, four independent RT–PCR products from the Dhx9, Eef1a1, Tk1 and Uhg (short product) clones were cloned and sequenced (Fig. 9). Fusion transcripts in all four lines spliced from the proximal exons to cryptic 3′ SSs close to the 5′ end of the U3NeoSV2 provirus. The cryptic 3′ SSs utilized by Dhx9 and Uhg transcripts were derived entirely from intron sequences (Fig. 9A). However, the 3′ SSs utilized by Eef1a1 and Tk1 transcripts were derived from a combination of cellular and viral sequences brought together by virus integration. Specifically, branch point and polypyrimidine sequences were located in the adjacent introns, but the Eef1a1 and Tk1 fusion transcripts spliced after an AG dinucleotide located 6 nt downstream of the cell–virus junction (Fig. 9).

Figure 9.

Cryptic splice sites used for expression of fusion transcripts in Dhx9, Uhg, Tk1 and Eef1a1 mutant ES cells. RT–PCR products amplified from cell–virus fusion transcripts expressed by inserts in the Dhx9, Uhg, Tk1 and Eef1a1 genes were cloned and sequenced. RT–PCR sequences (gray boxes) are highlighted on the genomic DNA sequence, leaving intron sequences unshaded. Provirus sequences are enclosed on three sides by lines. In each case, the provirus-proximal exon of the disrupted gene (left-most gray box) spliced to a cryptic 3′ SS located completely (Dhx9 and Uhg) or partially (Tk1 and Eef1a1) in the intron sequence adjacent to the U3NeoSV2 provirus. Potential branch point sequences are underlined and gaps in the sequence (…) are indicated in nucleotides. Consensus splice site sequences are shown at the bottom for comparison.
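The sequence features of a cryptic 3′ SS described above (a pyrimidine-rich tract followed by an AG dinucleotide) can be searched for with a simple scan. The sketch below is a deliberately crude illustration, not the analysis performed in the study: the function `candidate_3ss`, the 12 nt window, the 0.75 pyrimidine threshold and the example sequence are all assumptions, and branch point sequences are not modeled.

```python
def candidate_3ss(seq, tract_len=12, min_pyr_fraction=0.75):
    """Find AG dinucleotides preceded by a pyrimidine-rich tract.

    Returns a list of (exon_start, pyrimidine_fraction) tuples, where exon_start
    is the 0-based index of the first nucleotide after the AG.
    tract_len and min_pyr_fraction are assumed, illustrative thresholds.
    """
    seq = seq.upper()
    hits = []
    for i in range(tract_len, len(seq) - 1):
        if seq[i:i + 2] != "AG":
            continue
        window = seq[i - tract_len:i]          # nucleotides immediately upstream of AG
        pyr_fraction = sum(base in "CT" for base in window) / tract_len
        if pyr_fraction >= min_pyr_fraction:
            hits.append((i + 2, pyr_fraction))
    return hits

# Made-up sequence: 10 nt of filler, a 12 nt pyrimidine tract, AG, then exon-like sequence.
example = "GAGACATGAC" + "TCTTTCCTTTTC" + "AG" + "GCTGAAT"
print(candidate_3ss(example))  # [(24, 1.0)]
```

Because such loosely defined motifs occur often in any long sequence, a scan of this kind is expected to return many candidates in real introns, which is in line with the inference that potential 3′ SS sequences are widespread.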

Among disrupted genes for which annotated protein coding sequences were available, 36 contained proviruses inserted between protein coding exons and 37 contained proviruses positioned upstream of the protein coding sequence. The translational reading frame of the upstream proximal exons appeared to be random, with similar numbers of 5′ SSs occurring after the first, second and last nucleotide of a codon (10, 14 and 12 instances, respectively). This is in striking contrast to gene entrapment by U3NeoSV1, where 93% of the proximal coding exons were predicted to splice after the first codon nucleotide, allowing the disrupted genes to encode NPT fusion proteins.
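Whether the observed reading frames (10, 14 and 12 instances) are compatible with a uniform distribution can be illustrated with a chi-square goodness-of-fit calculation. This test is our illustration and is not reported in the study; the counts are from the text, and the closed-form p-value relies on the fact that the chi-square survival function with 2 degrees of freedom is exp(−x/2).

```python
import math

# 5' SSs occurring after codon positions 1, 2 and 3 (counts from the text).
observed = [10, 14, 12]
n = sum(observed)
expected = n / len(observed)                   # uniform expectation per position

chi2 = sum((o - expected) ** 2 / expected for o in observed)

# For 2 degrees of freedom, P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")  # chi2 = 0.667, p = 0.717
```

A p-value of this size gives no indication of a departure from uniformity, consistent with the statement that the reading frame of the upstream exons appeared to be random.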

Activation of a cryptic 3′ SS in an intron of the gene encoding hnRNP C

We previously characterized a null mutation in the Hnrpc gene induced by the U3His gene trap retrovirus (24). Given the location of the provirus in the first intron upstream of the start codon of hnRNP C we were interested in factors that allowed expression of the inserted HisD-poly(A) sequence. As before, fusion transcripts across the cell–U3His junction were analyzed by RT–PCR, using a combination of vector-specific primers and GSPs (Fig. 10A). Only one cDNA was amplified and it was significantly smaller than the corresponding region of genomic DNA (Fig. 10B). This cDNA did not hybridize to an intron probe (U1, Fig. 10A) and thus appeared to be derived from a spliced transcript. Analysis of the RT–PCR product and genomic sequences indicated that the proximal Hnrpc exon splices to a cryptic 3′ SS located 32 nt upstream of the provirus in the adjacent intron. This splice site closely matches the 3′ SS consensus sequence (Fig. 10C) and it appears to be used efficiently, given the absence of unspliced fusion transcripts.

Figure 10.

Cell–U3His transcripts utilize a cryptic splice site in an intron of the hnRNP C gene trapped cell line. (A) Schematic of U3His insertion in the hnRNP C gene showing primers used for RT–PCR and the U1 intron-specific probe used for Southern blot analysis of PCR products. The location of the cryptic 3′ SS is also indicated. (B) Spliced transcript observed on an ethidium bromide-stained gel that is not detected with the U1 oligonucleotide probe. (C) Genomic sequence of the cryptic 3′ SS located in the hnRNP C intron compared with the consensus sequence. The putative branch point sequence is also given. Bold indicates nearly invariant nucleotides. The sizes of the major PCR products (nucleotides) are indicated.

DISCUSSION

The present study examined the mechanisms by which gene entrapment vectors consisting of an NPT coding sequence and poly(A) site are expressed following insertion throughout the genome and selection for neomycin-resistant clones. Over 93% of the inserted Neo-poly(A) sequences were located in introns of cellular genes and were expressed from transcripts that spliced to cryptic 3′ SSs present either in the vector or in the adjacent intron sequence. The wild-type Neo coding sequence was found to contain a cryptic 3′ SS located 23 nt downstream of the initiation codon and, consequently, a vector (U3NeoSV1) containing the wild-type Neo sequence typically disrupted genes that can splice in-frame to express enzymatically active NPT fusion proteins. Elimination of the cryptic Neo 3′ SS in the otherwise identical U3NeoSV2 vector did not reduce the proportion of entrapment clones that contained proviruses inserted into introns. Instead, the fusion transcripts utilized cryptic 3′ SSs present in the adjacent intron or generated by viral and cellular sequences brought together by virus integration.

The mechanism of gene entrapment by U3NeoSV1 predicted that gene targeting would be biased toward genes capable of splicing in-frame to express enzymatically active NPT fusion proteins and, in fact, mutagenesis with U3NeoSV2 appeared to be considerably more random than with U3NeoSV1. First, 93% of coding exons upstream of U3NeoSV1 spliced after the first codon nucleotide, as required to express NPT fusion proteins, whereas the translational reading frames of exons upstream of U3NeoSV2 appeared to be random. Second, with the exception of an insert in Hnrpc, none of the 84 characterized genes disrupted by U3NeoSV2 was represented among 281 mutations induced by U3NeoSV1, including Pecam (platelet endothelial cell adhesion molecule), which was disrupted nine times in the U3NeoSV1 mutant library. Finally, none of the genes disrupted by U3NeoSV2 was targeted more than once, whereas 33% of the genes disrupted by U3NeoSV1 were targeted multiple times.

The greater randomness of gene targeting by U3NeoSV2 as compared to U3NeoSV1 presumably reflects differences in the number of sites in the genome that can allow U3Neo gene expression. By inference, sequences that can function as 3′ SSs appear to be widely distributed within introns. The results using U3NeoSV1 also suggest that a variety of protein sequences can be appended to the N-terminus of the NPT enzyme while maintaining enzymatic activity. This is consistent with previous studies of NPT fusion proteins (48–50), although in some cases (e.g. in Mrps30 cells) proteolytic degradation may release active enzymes from larger, possibly inactive, fusion proteins (51).

The behavior of U3 gene trap vectors is consistent with an exon definition model of splice site selection. According to the model, factors that bind to the ends of exons, including the 5′-terminal cap, splice sites and poly(A) signal, interact across the exon and establish functional units that can be acted on by the splicing machinery (26–28). The definition and polyadenylation of 3′-terminal exons appear to occur co-transcriptionally by factors recruited through RNA polymerase II and by spliceosome assembly (29–32). Several observations suggest that polyadenylation is coupled to the activity of a 3′ SS. For example, poly(A) sites are not efficiently recognized when positioned within introns (34,35) and insertion of a 5′ SS into a 3′-terminal exon suppresses polyadenylation (36,37). Conversely, upstream 3′ SSs enhance polyadenylation (33,38–40). Finally, mutations in the 3′ SSs of terminal exons result in the activation of cryptic 3′ SSs upstream or downstream of the mutation (41,42). The requirement for exon definition explains why sequences resembling splice sites are frequently found in introns but are not recognized by the splicing machinery (33). However, if the context of these sequences is altered by the insertion of a poly(A) site, as illustrated in the present study by inserts in introns of Rps19, Dhx9, Uhg and Hnrpc, then these splice sites can function efficiently enough for the cells to express drug resistance. The U3 gene traps select for clones in which the polyadenylation signal within the retrovirus LTR is active. In most cases, this involves activation of cryptic 3′ SSs that allow the U3-poly(A) sequence to be defined as a 3′-terminal exon. Exons proximal to the provirus also maintained at least some autonomy, splicing to cryptic sites upstream of the provirus. However, some fusion transcripts extended through the proximal exon and intervening intron sequences into the provirus, as illustrated by inserts disrupting Lamr1, Mrps30 and Uhg. In general, the efficiency of splicing between the proximal exon and cryptic 3′ SS was reduced as the size of the excised intron decreased.

A U3NeoSV1 insert in the gene encoding the S19 ribosomal subunit illustrates the relatively infrequent case in which splicing between the proximal exon and cryptic Neo 3′ SS was predicted to be out-of-frame and, in fact, splicing to the Neo 3′ SS was not observed. Instead, the Rps19 transcripts spliced to a cryptic splice site located in the intron just upstream of the provirus. The resulting Rps19–U3Neo fusion transcripts preserve the NPT initiation codon and express native NPT protein, apparently translated as a downstream open reading frame. Rps19 transcripts spliced exclusively to the intron site, suggesting that the intron sequence provided a more efficient 3′ SS than the Neo sequence. The Rps19 intron site also more closely matched the 3′ SS consensus sequence than the cryptic Neo 3′ SS. Thus, while splicing to the cryptic Neo 3′ SS is efficient enough to confer neomycin resistance, the provirus may also utilize stronger cryptic splice sites in the flanking cellular DNA, should these be available.

Two cell lines with mutations in Gas5 and Plk were studied to assess how U3 gene expression occurs when U3NeoSV1 inserts into exons. In both instances, fusion transcripts extending into the provirus spliced to the acceptor site of the occupied exon. However, most Plk–U3Neo transcripts were generated by splicing between the exon proximal to the occupied exon and the cryptic Neo splice site. Although the proximal Plk exon is in the correct reading frame to produce a fusion protein, the NPT expressed by Plk cells had approximately the molecular weight of the native enzyme. It is possible that this NPT is a breakdown product of a larger fusion protein.

Approximately 69% of the flanking sequences matched annotated genes in the mouse genome sequence, providing an independent assessment of the number of genes in the mouse genome. The composite MGSCv3 and HTGS Phase 3 sequence contained 33 958 protein coding genes based on transcript alignment and 49 838 genes predicted by the GenomeScan program (http://www.ncbi.nlm.nih.gov/genome/guide/mouse/MmStats.html). Estimates by the Mouse Genome Sequencing Consortium (50) and others (51,52) range between 30 000 and 45 000 protein coding genes. Inserts that did not appear to disrupt annotated genes may reflect the presence either of new genes or of additional exons that would extend the boundaries of existing transcription units. Transcriptionally active regions not associated with protein coding sequences could theoretically activate provirus gene expression, although there are no data to support this conjecture. In any event, the gene entrapment data are consistent with current estimates of gene number.

Gene entrapment provides an effective strategy to generate and characterize large numbers of mutations in ES cells, providing mutant alleles for transmission into the mouse germline (22–25). Insertion of U3 gene trap vectors into introns has had variable effects on the expression of the occupied genes, as illustrated by essentially null mutations in Rangap1 (Ran GTPase activating protein) (43), EphA2 (Eph receptor A2) (44), Hnrpc (45) and Prmt1 (arginine methyltransferase 1) (46), as well as a hypomorphic allele involving Hnrpa2b1 (hnRNP A2/B1) (47). The severity of each mutation appears to depend on the efficiency of splicing between 5′ exons of the occupied gene and cryptic 3′ SSs in or upstream of the entrapment vector. Thus, a hypomorphic mutation in Hnrpa2b1 results from transcripts that skip the relatively weak cryptic 3′ SS in the U3Neo gene, whereas a null mutation in Hnrpc results from the efficient utilization of a cryptic 3′ SS in the adjacent intron sequence.

Other strategies for gene entrapment have encountered issues that affect the randomness of gene targeting and the mutagenic potentials of the targeting vectors (12,48,49). However, these issues are offset by the relative ease with which large numbers of insertion mutations can be characterized and by using a variety of entrapment vectors. In addition, the availability of multiple, independent mutations in the same gene (alleles) may be beneficial in the analysis of gene function.

ACKNOWLEDGEMENTS

We thank Tracy Moore-Jarrett and Abudi Nashabi for technical assistance. This work was supported by Public Health Service Grants (R01NS043952, R01RR13166 and P01HL68744 to H.E.R.). Additional support was provided by Cancer Center (Core) grant P30CA68485. E.K.W. and M.J.R. were supported by NRSA (2T32 CA009385) and MSTP (5T32GM07347) training grants, respectively.

DDBJ/EMBL/GenBank accession nos CL440621–CL441145

REFERENCES

  • 1. Lander E.S., Linton,L.M., Birren,B., Nusbaum,C., Zody,M.C., Baldwin,J., Devon,K., Dewar,K., Doyle,M., FitzHugh,W. et al. (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.
  • 2. Venter J.C., Adams,M.D., Myers,E.W., Li,P.W., Mural,R.J., Sutton,G.G., Smith,H.O., Yandell,M., Evans,C.A., Holt,R.A. et al. (2001) The sequence of the human genome. Science, 291, 1304–1351.
  • 3. Jackson I.J. (2001) Mouse genomics: making sense of the sequence. Curr. Biol., 11, R311–R314.
  • 4. Galas D.J. (2001) Sequence interpretation. Making sense of the sequence. Science, 291, 1257–1260.
  • 5. Skarnes W.C., Auerbach,B.A. and Joyner,A.L. (1992) A gene trap approach in mouse embryonic stem cells: the lacZ reporter is activated by splicing, reflects endogenous gene expression and is mutagenic in mice. Genes Dev., 6, 903–918.
  • 6. Skarnes W.C., Moss,J.E., Hurtley,S.M. and Beddington,R.S. (1995) Capturing genes encoding membrane and secreted proteins important for mouse development. Proc. Natl Acad. Sci. USA, 92, 6592–6596.
  • 7. Hicks G.G., Shi,E.G., Li,X.M., Li,C.H., Pawlak,M. and Ruley,H.E. (1997) Functional genomics in mice by tagged sequence mutagenesis. Nature Genet., 16, 338–344.
  • 8. Chowdhury K., Bonaldo,P., Torres,M., Stoykova,A. and Gruss,P. (1997) Evidence for the stochastic integration of gene trap vectors into the mouse germline. Nucleic Acids Res., 25, 1531–1536.
  • 9. Zambrowicz B.P., Friedrich,G.A., Buxton,E.C., Lilleberg,S.L., Person,C. and Sands,A.T. (1998) Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature, 392, 608–611.
  • 10. Wiles M.V., Vauti,F., Otte,J., Fuchtbauer,E.M., Ruiz,P., Fuchtbauer,A., Arnold,H.H., Lehrach,H., Metz,T., von Melchner,H. et al. (2000) Establishment of a gene-trap sequence tag library to generate mutant mice from embryonic stem cells. Nature Genet., 24, 13–14.
  • 11. Stryke D., Kawamoto,M., Huang,C.C., Johns,S.J., King,L.A., Harper,C.A., Meng,E.C., Lee,R.E., Yee,A., l’Italien,L. et al. (2003) BayGenomics: a resource of insertional mutations in mouse embryonic stem cells. Nucleic Acids Res., 31, 278–281.
  • 12. Hansen J., Floss,T., Van Sloun,P., Fuchtbauer,E.M., Vauti,F., Arnold,H.H., Schnutgen,F., Wurst,W., von Melchner,H. and Ruiz,P. (2003) A large-scale, gene-driven mutagenesis approach for the functional analysis of the mouse genome. Proc. Natl Acad. Sci. USA, 100, 9918–9922.
  • 13. Nadeau J.H., Balling,R., Barsh,G., Beier,D., Brown,S.D., Bucan,M., Camper,S., Carlson,G., Copeland,N., Eppig,J. et al. (2001) Sequence interpretation. Functional annotation of mouse genome sequences. Science, 291, 1251–1255.
  • 14. Hicks G.G., Shi,E.G., Chen,J., Roshon,M., Williamson,D., Scherer,C. and Ruley,H.E. (1995) Retrovirus gene traps. Methods Enzymol., 254, 263–275.
  • 15. Chang W., Hubbard,S.C., Friedel,C. and Ruley,H.E. (1993) Enrichment of insertional mutants following retrovirus gene trap selection. Virology, 193, 737–747.
  • 16. Reddy S., Rayburn,H., von Melchner,H. and Ruley,H.E. (1992) Fluorescence-activated sorting of totipotent embryonic stem cells expressing developmentally regulated lacZ fusion genes. Proc. Natl Acad. Sci. USA, 89, 6721–6725.
  • 17. von Melchner H., DeGregori,J.V., Rayburn,H., Reddy,S., Friedel,C. and Ruley,H.E. (1992) Selective disruption of genes expressed in totipotent embryonal stem cells. Genes Dev., 6, 919–927.
  • 18. Scherer C.A., Chen,J., Nachabeh,A., Hopkins,N. and Ruley,H.E. (1996) Transcriptional specificity of the pluripotent embryonic stem cell. Cell Growth Differ., 7, 1393–1401.
  • 19. Levitt N., Briggs,D., Gil,A. and Proudfoot,N.J. (1989) Definition of an efficient synthetic poly(A) site. Genes Dev., 3, 1019–1025.
  • 20. Adami G. and Nevins,J.R. (1988) Splice site selection dominates over poly(A) site choice in RNA production from complex adenovirus transcription units. EMBO J., 7, 2107–2116.
  • 21. Furth P.A., Choe,W.T., Rex,J.H., Byrne,J.C. and Baker,C.C. (1994) Sequences homologous to 5′ splice sites are required for the inhibitory activity of papillomavirus late 3′ untranslated regions. Mol. Cell. Biol., 14, 5278–5289.
  • 22. Niwa M., MacDonald,C.C. and Berget,S.M. (1992) Are vertebrate exons scanned during splice-site selection? Nature, 360, 277–280.
  • 23. Chen J., Nachabah,A., Scherer,C., Ganju,P., Reith,A., Bronson,R. and Ruley,H.E. (1996) Germ-line inactivation of the murine Eck receptor tyrosine kinase by gene trap retroviral insertion. Oncogene, 12, 979–988.
  • 24. Williamson D.J., Banik-Maiti,S., DeGregori,J. and Ruley,H.E. (2000) hnRNP C is required for postimplantation mouse development but is dispensable for cell viability. Mol. Cell. Biol., 20, 4094–4105.
  • 25. Pawlak M.R., Scherer,C.A., Chen,J., Roshon,M.J. and Ruley,H.E. (2000) Arginine N-methyltransferase 1 is required for early postimplantation mouse development, but cells deficient in the enzyme are viable. Mol. Cell. Biol., 20, 4859–4869.
  • 26. Roshon M., DeGregori,J.V. and Ruley,H.E. (2003) Gene trap mutagenesis of hnRNP A2/B1: a cryptic 3′ splice site in the neomycin resistance gene allows continued expression of the disrupted cellular gene. BMC Genomics, 4, 2.
  • 27. Hicks G.G., Shi,E.G., Li,X.M., Li,C.H., Pawlak,M. and Ruley,H.E. (1997) Functional genomics in mice by tagged sequence mutagenesis. Nature Genet., 16, 338–344.
  • 28. Waterston R.H., Lindblad-Toh,K., Birney,E., Rogers,J., Abril,J.F., Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520–562.
  • 29. Harlow E. and Lane,D. (1988) Antibodies: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
  • 30. Altschul S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.
  • 31. Cao S., Bendall,H., Hicks,G.G., Nashabi,A., Sakano,H., Shinkai,Y., Gariglio,M., Oltz,E.M. and Ruley,H.E. (2003) The high-mobility-group box protein SSRP1/T160 is essential for cell viability in day 3.5 mouse embryos. Mol. Cell. Biol., 23, 5301–5307.
  • 32. Coccia E.M., Cicala,C., Charlesworth,A., Ciccarelli,C., Rossi,G.B., Philipson,L. and Sorrentino,V. (1992) Regulation and expression of a growth arrest-specific gene (gas5) during growth, differentiation and development. Mol. Cell. Biol., 12, 3514–3521.
  • 33. Clay F.J., McEwen,S.J., Bertoncello,I., Wilks,A.F. and Dunn,A.R. (1993) Identification and cloning of a protein kinase-encoding mouse gene, Plk, related to the polo gene of Drosophila. Proc. Natl Acad. Sci. USA, 90, 4882–4886.
  • 34. Donohue P.J., Alberts,G.F., Guo,Y. and Winkles,J.A. (1995) Identification by targeted differential display of an immediate early gene encoding a putative serine/threonine kinase. J. Biol. Chem., 270, 10351–10357.
  • 35. Gee S.L. and Conboy,J.G. (1994) Mouse erythroid cells express multiple putative RNA helicase genes exhibiting high sequence conservation from yeast to mammals. Gene, 140, 171–177.
  • 36. Michiels L., Van der Rauwelaert,E., Van Hasselt,F., Kas,K. and Merregaert,J. (1993) fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as an antisense sequence in the Finkel-Biskis-Reilly murine sarcoma virus. Oncogene, 8, 2537–2546.
  • 37. Rao C.N., Castronovo,V., Schmitt,M.C., Wewer,U.M., Claysmith,A.P., Liotta,L.A. and Sobel,M.E. (1989) Evidence for a precursor of the high-affinity metastasis-associated murine laminin receptor. Biochemistry, 28, 7476–7486.
  • 38. Suzuki K., Olvera,J. and Wool,I.G. (1990) The primary structure of rat ribosomal protein S19. Biochimie, 72, 299–302.
  • 39. Kaufman R.J., Murtha,P. and Davies,M.V. (1987) Translational efficiency of polycistronic mRNAs and their utilization to express heterologous genes in mammalian cells. EMBO J., 6, 187–193.
  • 40. Kozak M. (1987) An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res., 15, 8125–8148.
  • 41. Kozak M. (1987) Effects of intercistronic length on the efficiency of reinitiation by eucaryotic ribosomes. Mol. Cell. Biol., 7, 3438–3445.
  • 42. Peabody D.S., Subramani,S. and Berg,P. (1986) Effect of upstream reading frames on translation efficiency in simian virus 40 recombinants. Mol. Cell. Biol., 6, 2704–2711.
  • 43. Grignani F., Kinsella,T., Mencarelli,A., Valtieri,M., Riganelli,D., Lanfrancone,L., Peschle,C., Nolan,G.P. and Pelicci,P.G. (1998) High-efficiency gene transfer and selection of human hematopoietic progenitor cells with a hybrid EBV/retroviral vector expressing the green fluorescence protein. Cancer Res., 58, 14–19.
  • 44. Lee C.G., Eki,T., Okumura,K., da Costa Soares,V. and Hurwitz,J. (1998) Molecular analysis of the cDNA and genomic DNA encoding mouse RNA helicase A. Genomics, 47, 365–371.
  • 45. Tycowski K.T., Shu,M.D. and Steitz,J.A. (1996) A mammalian gene with introns instead of exons generating stable RNA products. Nature, 379, 464–466.
  • 46. Sutterluety H., Bartl,S., Doetzlhofer,A., Khier,H., Wintersberger,E. and Seiser,C. (1998) Growth-regulated antisense transcription of the mouse thymidine kinase gene. Nucleic Acids Res., 26, 4989–4995.
  • 47. Lee S., Ann,D.K. and Wang,E. (1994) Cloning of human and mouse brain cDNAs coding for S1, the second member of the mammalian elongation factor-1 alpha gene family: analysis of a possible evolutionary pathway. Biochem. Biophys. Res. Commun., 203, 1371–1377.
  • 48. Sedivy J.M. and Sharp,P.A. (1989) Positive genetic selection for gene disruption in mammalian cells by homologous recombination. Proc. Natl Acad. Sci. USA, 86, 227–231.
  • 49. Schwartz F., Maeda,N., Smithies,O., Hickey,R., Edelmann,W., Skoultchi,A. and Kucherlapati,R. (1991) A dominant positive and negative selectable gene for use in mammalian cells. Proc. Natl Acad. Sci. USA, 88, 10416–10420.
  • 50. Chui Y.L., Lozano,F., Jarvis,J.M., Pannell,R. and Milstein,C. (1995) A reporter gene to analyse the hypermutation of immunoglobulin genes. J. Mol. Biol., 249, 555–563.
  • 51. Reiss B., Sprengel,R. and Schaller,H. (1984) Protein fusions with the kanamycin resistance gene from transposon Tn5. EMBO J., 3, 3317–3322.

Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press
