ABSTRACT
Gene trapping is a high-throughput approach that has been used to introduce insertional mutations into the genome of mouse embryonic stem (ES) cells. It is performed with generic gene trap vectors that simultaneously mutate and report the expression of the endogenous gene at the site of insertion and provide a DNA sequence tag for the rapid identification of the disrupted gene. Large-scale international efforts assembled a gene trap library of 566,554 ES cell lines with single gene trap integrations distributed throughout the genome. Here, we re-investigated this unique library and identified mutations in 2202 non-coding RNA (ncRNA) genes, in addition to mutations in 12,078 distinct protein-coding genes. Moreover, we found certain types of gene trap vectors preferentially integrating into genes expressing specific long non-coding RNA (lncRNA) biotypes. Together with all other gene-trapped ES cell lines, lncRNA gene-trapped ES cell lines are readily available for functional in vitro and in vivo studies.
KEY WORDS: Long non-coding RNA, Mutation, Gene trap, Mouse, Mus musculus, Mutagenesis
Summary: Analysis of a large library of mouse embryonic stem cell lines with gene trap insertions revealed mutations in 2202 unique non-coding RNA genes, which will significantly contribute to the functional annotation of non-coding RNA genes.
INTRODUCTION
The comprehensive annotation of the mouse genome has identified over 21,000 protein-coding genes (PCGs), along with more than 15,000 non-coding RNA (ncRNA) genes. To address their function, platforms for large-scale mutagenesis in embryonic stem (ES) cells have been implemented, with the ultimate goal of converting all mutant ES cell lines into mice for subsequent phenotyping. Using high-throughput gene trapping and targeting, the International Knockout Mouse Consortium (IKMC) and the International Mouse Phenotyping Consortium (IMPC) have created an unprecedented resource comprising mutant ES cell lines harboring mutations in ∼18,500 unique PCGs. Of these, over 5000 have been converted into mice and subjected to high-throughput phenotyping (www.mousephenotype.org) (Bradley et al., 2012; Collins et al., 2007; Kaloff et al., 2016; Lloyd et al., 2020; Rosen et al., 2015; Skarnes et al., 2011, 2004). Moreover, genes thus far inaccessible by targeting or trapping are now being addressed individually using CRISPR/Cas9 technology (Brandl et al., 2015; Wefers et al., 2017).
Unlike gene targeting, gene trap strategies rely on generic vectors capable of simultaneously mutating and reporting gene expression at the insertion site as well as providing a sequence tag for seamless gene identification (Friedel and Soriano, 2010). Multiple gene trap vectors have been developed and used in high-throughput screens to generate large libraries of mutant ES cell lines. The vast majority of the ES cell lines assembled by the international consortia were produced with promoter trap vectors, most of which comprise a promoterless reporter and/or selectable marker gene flanked by a 5′ splice acceptor (SA) and a 3′ polyadenylation (pA) sequence (Table S1). Integration of such a vector into an intron of an expressed gene elicits splicing of upstream exons to the reporter gene, resulting in a fusion transcript that terminates at the gene trap's pA site and thus truncates the endogenous transcript (Friedrich and Soriano, 1991; Gossler et al., 1989; Skarnes et al., 1992; Wiles et al., 2000; Wurst et al., 1995; Zambrowicz et al., 2003, 1998). Variants thereof either contain type II transmembrane domains fused to the reporter for trapping secretory pathway genes (De-Zolt et al., 2006) or lack a splice acceptor for trapping exons, in which case the reporter is translated from in-frame read-through fusion transcripts (Hicks et al., 1997; von Melchner et al., 1992). Although in theory the latter vectors (also referred to as ‘exon traps’) should be activated exclusively by in-frame integrations into exons, in practice a large proportion of them are activated from integrations into introns via adjacent cryptic splice sites (Osipovich et al., 2004). A considerably smaller number of ES cell lines were produced with vectors referred to as ‘polyA traps’, in which the reporter genes are flanked by a 5′ constitutive promoter and a 3′ splice donor site, enabling splicing to downstream exons.
PolyA trap integrations into introns are expressed from their exogenous promoter and, therefore, unlike most other gene trap vectors, are activated independently of target gene expression (Ishida and Leder, 1999; Niwa et al., 1993; Salminen et al., 1998; Stanford et al., 2006; Yoshida et al., 1995). In a further application, ES cell lines were also generated with gene trap vectors containing both promoter and polyA trap modules, although selection overwhelmingly relied on the promoter trap cassettes (Zambrowicz et al., 2003). Finally, to enable conditional mutagenesis, a significant proportion of ES cell lines were produced with promoter traps equipped with site-specific recombination systems (Schnütgen, 2006; Schnütgen et al., 2005). Overall, 566,554 gene-trapped ES cell lines have been produced by the IKMC and can be accessed via the Mouse Genome Informatics (MGI) website (www.informatics.jax.org) (Ringwald et al., 2011). The database covers gene trap integrations into protein-coding and non-coding genes, including long and small non-coding RNA genes.
Long non-coding RNAs (lncRNAs) are defined as non-coding transcripts longer than 200 nucleotides; 9072 lncRNA genes are annotated in the Ensembl 83 database (genome build GRCm38). Based on their position relative to PCGs, lncRNA genes were subdivided by the GENCODE consortium into five major classes: (1) long intergenic non-coding RNAs (lincRNAs) located between two protein-coding genes (n=3579); (2) antisense lncRNAs transcribed from the opposite strand of protein-coding genes (n=2189); (3) ‘sense overlapping’ lncRNAs transcribed from the same strand of protein-coding genes (n=23); (4) ‘sense intronic’ lncRNAs transcribed from the introns of protein-coding genes (n=253); and (5) ‘bidirectional promoter’ lncRNAs transcribed from the opposite strand within the promoter region of a protein-coding gene (n=12) (Frankish et al., 2019; Harrow et al., 2012). In addition, a numerically minor set of lncRNA genes is distributed among the following biotypes: (1) ‘processed transcript’, defined by non-coding transcripts without an open reading frame; (2) ‘3′ overlapping’, defined as short non-coding transcripts transcribed from the 3′UTR; (3) ‘macro’, defined by unspliced lncRNAs several kb in size; and (4) ‘to-be-experimentally-confirmed’ (TEC), defined by non-spliced polyadenylated transcripts with an open reading frame, which, pending further experimental validation, presumably encode novel proteins (Frankish et al., 2019; Harrow et al., 2012).
As key regulators of global gene expression, lncRNAs are involved in the regulation of nearly all fundamental biological processes, including development, cell cycle, differentiation, pluripotency, apoptosis, autophagy and cell migration (Fritah et al., 2014). Hence, it is not surprising that deregulation of lncRNA expression can lead to a wide spectrum of diseases (Rinn and Chang, 2012). However, only a minority of lncRNAs have been functionally validated thus far in tissue culture experiments and knockout mice (Bond et al., 2009; Gomez et al., 2013; Grote and Herrmann, 2015; Li et al., 2013; Liu et al., 2014; Nakagawa et al., 2014; Oliver et al., 2015; Sauvageau et al., 2013; Zhang et al., 2013). Given their biological significance, a large-scale analysis of individual lncRNA function(s) seems highly desirable. To facilitate this endeavor, we re-analyzed the existing gene trap libraries and identified 31,069 ES cell lines with gene trap insertions in 2202 unique ncRNA genes (Tables S4 and S5). This freely available resource should significantly support the functional lncRNA annotation effort.
RESULTS
The international gene trap resource
The MGI web portal provides the largest data set of gene trap sequence tags (GTSTs) from mutant murine ES cells generated worldwide by the consortia, institutions and corporations listed with their respective contributions in Table 1. MGI periodically updates vector integration sites by mapping existing GTSTs to the latest mouse genome sequence build (Ringwald et al., 2011). Presently, the database contains 854,155 GTSTs, of which 566,554 are unique. Systematic in-depth analysis of the unique GTSTs revealed 339,779 (60%) mapping to annotated genes and 226,773 (40%) mapping to intergenic regions. For ease of access, gene trap clones for a specific gene can be retrieved from the MGI web portal by gene symbol or identifier. All trapped alleles are listed together with information about the vector, the insertion point, the sequence tags and the available mouse lines. Alternatively, users can search a specified genomic region for gene trap integrations with the MGI genome browser displaying the gene trap tracks (see the ‘Search’ tab and follow the link ‘Mouse Genome Browsers’).
Table 1.
Distribution of gene trap integrations between major gene biotypes
According to their predicted function, the GENCODE consortium (www.gencodegenes.org) subdivides genes into PCGs, lncRNA genes, short non-coding RNA (sncRNA) genes and pseudogenes. Based on this classification, we identified gene trap integrations in 12,078 unique PCGs, 2060 lncRNA genes, 142 sncRNA genes and 426 pseudogenes, corresponding to 82.1%, 14.0%, 1.0% and 2.9%, respectively, of all trapped genes (Table 2). These trapped genes cover 55.1% of annotated PCGs and 22.7% of annotated lncRNA genes (Table 2; Tables S4 and S5). Gene trap integrations were significantly enriched in multiple-exon PCGs and processed lncRNA genes, consistent with the fact that activation of the vast majority of gene trap vectors depends on upstream splicing (Tables S2 and S3). Regarding the position of insertion sites relative to transcription start sites, the majority of vectors with SA sites preferentially integrated near the 5′ ends of both PCGs and lncRNA genes, because the longer the 5′ sequence fused to the reporter, the less likely the reporter is to remain functional. By contrast, polyA trap vectors overwhelmingly selected for integrations near the 3′ ends of both PCGs and lncRNA genes, as the fusion transcripts from more upstream integrations are generally lost to nonsense-mediated decay (NMD) (Shigeoka et al., 2005; Stanford et al., 2006) (Fig. 1).
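The percentages given for Table 2 are shares of all trapped genes (12,078 + 2060 + 142 + 426 = 14,706), not of individual integrations, and they can be recomputed directly from the gene counts in the text. The short Python check below does this; it assumes that the 9072 lncRNA genes annotated in Ensembl 83 are the denominator for the lncRNA coverage figure, which reproduces the reported 22.7%. The helper names are illustrative only.

```python
# Unique trapped genes per GENCODE biotype, as given in the text.
TRAPPED_GENES = {
    "protein-coding": 12078,
    "lncRNA": 2060,
    "sncRNA": 142,
    "pseudogene": 426,
}

def biotype_shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of each biotype among all trapped genes, rounded to 0.1%."""
    total = sum(counts.values())
    return {biotype: round(100 * n / total, 1) for biotype, n in counts.items()}

def coverage(trapped: int, annotated: int) -> float:
    """Percentage of annotated genes of one biotype hit by at least one trap."""
    return round(100 * trapped / annotated, 1)

print(sum(TRAPPED_GENES.values()))  # 14706 trapped genes in total
print(biotype_shares(TRAPPED_GENES))
# {'protein-coding': 82.1, 'lncRNA': 14.0, 'sncRNA': 1.0, 'pseudogene': 2.9}
print(TRAPPED_GENES["lncRNA"] + TRAPPED_GENES["sncRNA"])  # 2202 ncRNA genes
print(coverage(TRAPPED_GENES["lncRNA"], 9072))            # 22.7
```

The sum of trapped lncRNA and sncRNA genes gives the 2202 ncRNA genes quoted in the Abstract.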
Table 2.
Distribution of gene trap insertions between specific ncRNA biotypes
Seventy-one percent of the trapped lncRNA genes (1455 of 2060) belonged to the lincRNA (806) and antisense RNA (649) biotypes, which together are the most prevalent lncRNAs in the mouse genome (Table 3). Consistent with the general preference of gene traps for larger, multiple-exon genes, only between 1% and 4% of sncRNAs were trapped, primarily by vectors lacking SA sites (Table 3). Whereas only ∼0.1% to 0.8% of gene trap insertions in PCGs occurred in non-spliced genes, insertions into non-spliced lncRNA genes were up to 100 times more frequent (1-13%), reflecting the much higher proportion of non-spliced genes among lncRNAs (Table 4; Table S2).
Table 3.
Table 4.
Regarding gene trap integrations into specific lncRNA biotypes, we found a significant relationship between vector type and lncRNA biotype. Fig. 2 shows that the retroviral promoter trap vectors VICTR74 and VICTR76 used for creating the OmniBankII library integrated into lincRNA and TEC genes at a much higher frequency than other, similarly structured vectors. Although the reasons for this preference remain unknown, it is likely that the somewhat more sensitive ES cell culture and selection protocols employed for OmniBankII (Hansen et al., 2008) enabled more efficient isolation of clones with integrations in these rather weakly expressed genes (Derrien et al., 2012; Djebali et al., 2012). As TEC genes represent genomic regions presumably encoding novel proteins, the gene trap libraries also provide a useful resource for characterizing novel PCGs. Unlike promoter traps, polyA trap vectors, which are activated independently of gene expression, captured lncRNA genes at a much higher rate than any other vectors. For example, the polyA trap vectors GepNMDi3, Gen-SD5, pGTNMDf, pGTR1.3 and Gep-SD5 (To et al., 2004) were all found with high frequency in antisense and lincRNA genes, most of which are either weakly expressed or not expressed at all in ES cells (Ghosal et al., 2013; Jia et al., 2013; Loewer et al., 2010) (Fig. 2).
Gene trap activation mechanisms in ncRNA genes
Depending on the trapped lncRNA biotype, gene trap integrations were activated by different mechanisms. For example, intron integrations in multiple-exon lincRNAs such as growth arrest-specific transcript 5 (Gas5) were activated from the sense strand, similar to the activations seen in PCGs (Fig. 3A). By contrast, an integration into the first intron of the 1110002L01Rik antisense lncRNA, which overlaps the 3′ end of the kinesin family member 3C (Kif3c) PCG and the 5′UTR of the additional sex combs-like 2 (Asxl2) PCG, was transcribed in the antisense direction relative to Asxl2 and Kif3c (Fig. 3B). Neither of the PCGs was physically affected by the integration, although mutation of the antisense 1110002L01Rik transcript could, in principle, interfere with the expression of either gene. A promoter trap integration into the D0830050J10Rik bidirectional promoter lncRNA, encoded from the opposite strand of the v-raf-leukemia viral oncogene 1 (Raf1) PCG, was transcribed from the shared bidirectional promoter (Fig. 3C), and an integration into the Gm12971 sense intronic lncRNA was transcribed from its own promoter located in the 14th intron of the Pum1 PCG (Fig. 3D). Fig. 3E shows an integration into a sense overlapping lncRNA, exemplified by Sox1 overlapping transcript (Sox1ot), which hosts the SRY (sex determining region Y)-box 1 (Sox1) PCG in its first intron. In this arrangement, the fusion transcript initiating at the Sox1ot promoter terminates at the gene trap pA site residing in the seventh Sox1ot exon (Fig. 3E). Finally, Fig. 3F shows a polyA trap activated from an integration into the last intron of the 4932443L11Rik processed transcript lncRNA gene, with the gene trap included as a portable exon.
DISCUSSION
In this study, we re-analyzed a library of 566,554 mutant mouse ES cell lines produced in multiple large-scale gene trap mutagenesis projects. Although the library was originally produced to study the function of PCGs, the present analysis revealed that, in addition to the ES cell lines with mutations in 12,078 unique PCGs, it contains 31,069 ES cell lines with mutations in 2202 unique ncRNA genes, providing a useful resource for the functional characterization of many ncRNAs. The cell lines can be used in vitro to explore the role of ncRNAs in controlling ES cell pluripotency and differentiation (Chakraborty et al., 2012; Dinger et al., 2008; Fisher et al., 2017; Guttman et al., 2011; Sheik Mohamed et al., 2010) and can be readily converted into mutant mice for functional studies at the organismal level. It is also worth noting that all traps contain a LacZ reporter, enabling straightforward in vivo analysis of lncRNA activity at the cellular level, which is particularly useful in mutant mouse embryo phenotyping (Dickinson et al., 2016).
The GENCODE reference human and mouse genome annotation database contains three major functional categories of genes: PCGs, non-coding genes and pseudogenes (Harrow et al., 2012). Although gene trap insertions have been found in all these gene classes, a significant proportion involved intergenic regions (Table 2). Considering that 75% of the human genome is covered by primary transcripts and 62% by processed transcripts (Djebali et al., 2012), it is not surprising that 40% of all gene trap integrations were activated from non-annotated genomic regions, thus reflecting the high untapped potential of the gene trap approach for novel gene discovery. In line with this, the existing gene trap resource provides a unique means for resolving the biological significance of not yet annotated genes (Chi, 2016).
Comparison of the integration targets of the different types of gene trap vectors revealed that, owing to fusion transcript size constraints, promoter trap vectors preferentially integrated near the 5′ ends of both PCGs and multiple-exon lncRNAs (Fig. 1A). By contrast, polyA trap vectors overwhelmingly inserted near the 3′ ends of PCGs and lncRNA genes, producing relatively short fusion transcripts that escape NMD (Fig. 1B) (Shigeoka et al., 2005; Stanford et al., 2006). Confirming previous observations that gene expression is an important determinant of trappability for both promoterless and polyA trap vectors (Nord et al., 2007), we found that 90% of the lncRNA genes trapped with promoterless or polyA trap vectors are expressed in ES cells (data not shown).
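A 5′ versus 3′ preference can be quantified by scaling each insertion to its relative position along the gene body (0 at the transcription start, 1 at the 3′ end, mirrored for minus-strand genes) and comparing binned counts with a uniform expectation. The Methods state that a G-test of goodness-of-fit was applied to integration frequencies over gene length; the Python sketch below is one possible setup, assuming equal-width bins of relative position and a uniform null. The bin count and the closed-form chi-square tail (exact only for even degrees of freedom, hence an odd number of bins) are illustrative choices, not the authors' settings, and the coordinates in the example are hypothetical.

```python
import math
from collections import Counter

def relative_position(pos: int, start: int, end: int, strand: str) -> float:
    """Insertion position along a gene: 0.0 = 5' end, 1.0 = 3' end.

    Genomic coordinates have start <= end; on the minus strand the transcript
    runs from `end` to `start`, so the scale is mirrored.
    """
    if not start <= pos <= end:
        raise ValueError("insertion lies outside the gene")
    span = end - start
    if span == 0:
        return 0.0
    offset = pos - start if strand == "+" else end - pos
    return offset / span

def bin_counts(rel_positions: list[float], n_bins: int = 5) -> list[int]:
    """Count insertions in equal-width bins along the gene body (5' to 3')."""
    counts = Counter(min(int(r * n_bins), n_bins - 1) for r in rel_positions)
    return [counts.get(i, 0) for i in range(n_bins)]

def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square upper tail P(X > x); this closed form is valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def g_test_uniform(counts: list[int]) -> tuple[float, float]:
    """G-test of goodness-of-fit against equal expected counts per bin.

    Returns (G, P) with df = number of bins - 1 (so use an odd number of bins).
    """
    n, k = sum(counts), len(counts)
    expected = n / k
    # Empty bins contribute nothing (o * ln(o / e) -> 0 as o -> 0).
    g = 2.0 * sum(o * math.log(o / expected) for o in counts if o > 0)
    return g, chi2_sf_even_df(g, k - 1)

# Hypothetical insertions clustered near the 5' end of a minus-strand gene.
rel = [relative_position(p, 1000, 2000, "-") for p in (1990, 1950, 1900, 1880, 1700)]
print(bin_counts(rel))  # [4, 1, 0, 0, 0]
```

With realistic data, the counts per vector would be pooled over many genes before testing; the chi-square approximation behind the G-test is unreliable when expected counts per bin are very small, so a toy example like the one above illustrates the mechanics only.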
As ∼900 lncRNAs harbored multiple gene trap integrations at different locations, the ES cell library also provides allelic series for a multitude of lncRNA genes, which are extremely useful for dissecting distinct functional domains. For example, trapping different regions of the Gas5 gene resulted in a series of Gas5 truncation alleles affecting different molecular functions (Fig. 3). Gas5 is a tumor suppressor gene involved in several types of cancer and encodes several molecular functions over its length (Ma et al., 2016), including (1) a glucocorticoid response element (GRE), formed by a stem-loop structure within Gas5 exon 12, that competes with DNA for binding to the DNA-binding domain of the glucocorticoid receptor (Kino et al., 2010); (2) a miR-21-binding function in exon 4, acting as a miRNA sponge that regulates miR-21 levels, which are important in development, cancer, cardiovascular disease and inflammation (Zhang et al., 2013); and (3) a binding function for eIF4E, a key factor of the translation initiation complex (Hu et al., 2014). As shown in Fig. 3A, each of these functions can be addressed by selecting the appropriate gene trap clones for in vitro and in vivo studies. In support of the in vivo value of the lncRNA gene trap lines, Miard et al. (2017) recently published a Malat1 lncRNA knockout mouse produced from an OmniBankII gene trap clone carrying the VICTR74 vector (IST14461G11). The Malat1 lncRNA is overexpressed in many types of cancer, including hepatocellular carcinoma, and induces cell proliferation in several cell lines in vitro. Although its inactivation had no effect on liver carcinogenesis in mice treated with the genotoxic agent diethylnitrosamine (DEN), DEN-treated knockout mice developed robust hypercholesterolemia, implicating Malat1 in the regulation of cholesterol metabolism (Miard et al., 2017).
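Choosing clones from an allelic series amounts to asking which exon-encoded elements survive in each truncated transcript. The Python sketch below assumes that a promoter trap in intron k splices exons 1 to k to the reporter and terminates at the trap's pA site, so only elements encoded in exons 1 to k are retained; it does not apply to polyA traps. The exon assignments (miR-21 binding in exon 4, GRE stem-loop in exon 12) are taken from the text, the eIF4E-binding element is omitted because its exon is not stated, and the intron numbers in the example are hypothetical rather than actual clone positions.

```python
# Exon locations of two Gas5 elements as stated in the text. The eIF4E-binding
# element is omitted because its exon is not given.
GAS5_ELEMENTS = {"miR-21 binding": 4, "GRE stem-loop": 12}

def retained_elements(trapped_intron: int, elements: dict[str, int]) -> dict[str, bool]:
    """Which exon-encoded elements a promoter-trap fusion transcript keeps.

    Assumes an insertion in intron k splices exons 1..k to the reporter and the
    transcript ends at the trap's pA site, so only exons <= k are retained.
    """
    return {name: exon <= trapped_intron for name, exon in elements.items()}

# Hypothetical insertion sites, not actual clone positions.
for intron in (2, 6):
    print(intron, retained_elements(intron, GAS5_ELEMENTS))
# 2 {'miR-21 binding': False, 'GRE stem-loop': False}
# 6 {'miR-21 binding': True, 'GRE stem-loop': False}
```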
Finally, mutant alleles of lncRNAs containing a reporter gene can nowadays be established de novo using CRISPR/Cas9 knock-in strategies in mouse ES cells or mouse zygotes (Wefers et al., 2017; Yao et al., 2018). However, notwithstanding the simplicity of the technology, generating allelic series with proper quality controls is still quite time consuming, requiring rigorous genotyping to exclude frequently occurring unintended on-target mutations such as large deletions, insertions, inversions and translocations (Boroviak et al., 2017; Kosicki et al., 2018).
While the functional characterization of PCGs is well underway, with ∼5000 mouse mutants already phenotyped, the next big challenge will be the functional dissection of non-coding genes, for which the existing mutant lncRNA ES cell library provides an unprecedented resource.
MATERIALS AND METHODS
Gene trap data
Gene trap sequence tags and their mouse genome coordinates were downloaded from the MGI web portal (www.informatics.jax.org) on 19 January 2016. We filtered the data set to retain, for each vector integration, a single representative sequence tag with a high-quality alignment that mapped unequivocally to the genome. First, we discarded sequence tags that did not yield a unique high-quality alignment, i.e. tags with multiple high-quality alignments and tags that could not be mapped. In a final step, all high-quality alignments flagged as ‘non-representative’ were removed.
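The filtering steps can be expressed as a single pass over the alignment records. In the Python sketch below, the record layout (`TagAlignment` and its fields) is hypothetical and does not reflect the column names of the MGI download; the rule of keeping the first surviving tag when several remain for one cell line is also an assumption, since the text only states the goal of one representative tag per integration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TagAlignment:
    """Hypothetical record for one gene trap sequence tag (GTST) alignment.

    Field names are illustrative, not the columns of the MGI download.
    """
    tag_id: str
    cell_line: str           # one ES cell line carries one vector integration
    high_quality_hits: int   # number of high-quality genome alignments
    representative: bool     # False if the alignment is flagged 'non-representative'

def filter_tags(records: list[TagAlignment]) -> dict[str, TagAlignment]:
    """Return one uniquely mapped, representative tag per cell line."""
    kept: dict[str, TagAlignment] = {}
    for rec in records:
        # Drop unmapped tags (0 hits) and ambiguous tags (more than 1 hit).
        if rec.high_quality_hits != 1:
            continue
        # Drop alignments flagged as non-representative.
        if not rec.representative:
            continue
        # Assumption: if several tags survive for one cell line, keep the first.
        kept.setdefault(rec.cell_line, rec)
    return kept

records = [
    TagAlignment("t1", "A", 1, True),
    TagAlignment("t2", "A", 1, True),   # second tag for A, ignored
    TagAlignment("t3", "B", 2, True),   # multiple high-quality hits
    TagAlignment("t4", "C", 0, True),   # not mapped
    TagAlignment("t5", "D", 1, False),  # non-representative
]
print({line: rec.tag_id for line, rec in filter_tags(records).items()})  # {'A': 't1'}
```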
Genome data
Software to identify the genomic locus of each gene trap vector insertion site was written in Perl 5.8.8 using the BioPerl libraries. Genome features at each locus mutated by a gene trap integration were retrieved from the Ensembl database (Yates et al., 2016) using the Ensembl application programming interface (API) (Release 83; www.ensembl.org; genome build GRCm38). Gene models were categorized into biotypes according to the mouse reference gene set published by the GENCODE consortium (version M8, August 2015) (Harrow et al., 2012).
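The locus assignment itself was done in Perl with BioPerl and the Ensembl API. As an illustration of the underlying operation only, the Python sketch below looks up which annotated genes overlap an insertion coordinate and falls back to ‘intergenic’ when none does. The `Gene` record, the biotype strings and the in-memory index are assumptions made for this example; they do not reproduce the Ensembl API, and the gene coordinates are hypothetical.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    gene_id: str
    start: int     # 1-based, inclusive
    end: int       # inclusive
    strand: str    # "+" or "-"
    biotype: str   # e.g. "protein_coding", "lincRNA", "antisense"

class GeneIndex:
    """Point-overlap lookup over genes, one start-sorted list per chromosome.

    Any gene overlapping `pos` must start within `max_len` bases upstream of it,
    where `max_len` is the longest gene on that chromosome, so only that window
    of the sorted list has to be scanned.
    """

    def __init__(self, genes_by_chrom: dict[str, list[Gene]]):
        self._genes: dict[str, list[Gene]] = {}
        self._starts: dict[str, list[int]] = {}
        self._max_len: dict[str, int] = {}
        for chrom, genes in genes_by_chrom.items():
            ordered = sorted(genes, key=lambda g: g.start)
            self._genes[chrom] = ordered
            self._starts[chrom] = [g.start for g in ordered]
            self._max_len[chrom] = max((g.end - g.start for g in ordered), default=0)

    def overlapping(self, chrom: str, pos: int) -> list[Gene]:
        genes = self._genes.get(chrom, [])
        starts = self._starts.get(chrom, [])
        lo = bisect.bisect_left(starts, pos - self._max_len.get(chrom, 0))
        hi = bisect.bisect_right(starts, pos)  # genes starting at or before pos
        return [g for g in genes[lo:hi] if g.end >= pos]

def classify(index: GeneIndex, chrom: str, pos: int) -> list[str]:
    """Biotypes of all genes hit by an insertion, or ['intergenic']."""
    return sorted({g.biotype for g in index.overlapping(chrom, pos)}) or ["intergenic"]

index = GeneIndex({
    "chr1": [
        Gene("g1", 100, 500, "+", "protein_coding"),
        Gene("g2", 300, 350, "-", "antisense"),
        Gene("g3", 1000, 1200, "+", "lincRNA"),
    ]
})
print(classify(index, "chr1", 320))  # ['antisense', 'protein_coding']
print(classify(index, "chr1", 50))   # ['intergenic']
```

An insertion that overlaps several genes, such as a PCG and its antisense lncRNA, returns all of their biotypes, so a separate rule is still needed to assign each integration to a single gene; that rule is not specified here.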
Statistical testing
To assess the significance of gene trap vector integration frequencies over gene length, we used a G-test of goodness-of-fit. To determine whether gene trap insertions with a specific vector are over-represented in a given gene biotype, i.e. whether more integrations are present in genes of that biotype than expected by chance, a two-by-two contingency table was constructed and Fisher's exact test was performed. The procedure was repeated for each gene trap vector, and P-values were adjusted to control the false discovery rate (FDR) (Benjamini and Hochberg, 1995). Categories whose P-value did not exceed the corresponding Benjamini-Hochberg critical value were considered significant, with the FDR set to 0.01. All statistical analyses were performed with R statistical software (R v3.3.1; www.r-project.org), using the packages stats, RVAideMemoire, gplots and graphics.
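The per-vector enrichment test and the FDR correction can be sketched in a few lines. The Python code below computes a one-sided Fisher's exact P-value (over-representation) from the hypergeometric distribution with exact integer arithmetic, and Benjamini-Hochberg adjusted P-values; an adjusted value at or below the FDR level is equivalent to the raw P-value passing the step-up criterion p(i) ≤ (i/m)·FDR. This is an illustrative re-implementation, not the authors' R code; in R the same calculations are available through `fisher.test(x, alternative = "greater")` and `p.adjust(p, method = "BH")`, and in SciPy through `scipy.stats.fisher_exact(table, alternative="greater")`. The example tables and P-values are made up.

```python
from math import comb

def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    Table layout:         in biotype   other biotypes
        this vector           a              b
        other vectors         c              d
    Returns P(X >= a) under the hypergeometric distribution with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    # math.comb returns 0 when k > n, which handles impossible cell values.
    numerator = sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(n, row1)

def benjamini_hochberg(pvalues: list[float], fdr: float = 0.01) -> tuple[list[float], list[bool]]:
    """Benjamini-Hochberg adjusted P-values and significance calls at `fdr`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted, [q <= fdr for q in adjusted]

print(fisher_greater(3, 1, 1, 3))  # 17/70, about 0.243
adjusted, significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.005], fdr=0.03)
print(significant)  # [True, False, False, True]
```

Exact integer arithmetic keeps the tail sum precise but becomes slow for tables with hundreds of thousands of insertions; library implementations are the practical choice at that scale.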
Supplementary Material
Acknowledgements
We thank all our colleagues for generating these large-scale gene trap resources. The excellent technical assistance and system administration of Bernd Lentes is gratefully acknowledged.
Footnotes
Competing interests
The authors declare no competing or financial interests.
Author contributions
Conceptualization: J.H., H.v.M., W.W.; Methodology: J.H.; Software: J.H.; Validation: J.H., H.v.M., W.W.; Formal analysis: J.H.; Investigation: J.H., H.v.M., W.W.; Resources: J.H., W.W.; Data curation: J.H.; Writing - original draft: J.H., W.W.; Writing - review & editing: J.H., H.v.M., W.W.; Visualization: J.H., H.v.M., W.W.; Supervision: W.W.; Project administration: W.W.; Funding acquisition: W.W.
Funding
This work was supported by National Genome Research Network Plus Project ‘From Disease Genes to Protein Pathways’ [FKZ 01GS0858] by Bundesministerium für Bildung, Wissenschaft und Forschung, and the projects ‘EUCOMM’ [FP6 grant number LSHM-CT-2005-01893] and ‘I-DCC: The International Data Coordination Centre’ [FP7-HEALTH-2007-2.1.2-6-223592], by the European Commission.
Supplementary information
Supplementary information available online at https://dmm.biologists.org/lookup/doi/10.1242/dmm.047803.supplemental
References
- Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B (Methodological) 57, 289-300. 10.1111/j.2517-6161.1995.tb02031.x
- Bond, A. M., Vangompel, M. J. W., Sametsky, E. A., Clark, M. F., Savage, J. C., Disterhoft, J. F. and Kohtz, J. D. (2009). Balanced gene regulation by an embryonic brain ncRNA is critical for adult hippocampal GABA circuitry. Nat. Neurosci. 12, 1020-1027. 10.1038/nn.2371
- Boroviak, K., Fu, B., Yang, F., Doe, B. and Bradley, A. (2017). Revealing hidden complexities of genomic rearrangements generated with Cas9. Sci. Rep. 7, 12867. 10.1038/s41598-017-12740-6
- Bradley, A., Anastassiadis, K., Ayadi, A., Battey, J. F., Bell, C., Birling, M.-C., Bottomley, J., Brown, S. D., Bürger, A., Bult, C. J. et al. (2012). The mammalian gene function resource: the international knockout mouse consortium. Mamm. Genome 23, 580-586. 10.1007/s00335-012-9422-2
- Brandl, C., Ortiz, O., Röttig, B., Wefers, B., Wurst, W. and Kühn, R. (2015). Creation of targeted genomic deletions using TALEN or CRISPR/Cas nuclease pairs in one-cell mouse embryos. FEBS Open Bio 5, 26-35. 10.1016/j.fob.2014.11.009
- Chakraborty, D., Kappei, D., Theis, M., Nitzsche, A., Ding, L., Paszkowski-Rogacz, M., Surendranath, V., Berger, N., Schulz, H., Saar, K. et al. (2012). Combined RNAi and localization for functionally dissecting long noncoding RNAs. Nat. Methods 9, 360-362. 10.1038/nmeth.1894
- Chi, K. R. (2016). The dark side of the human genome. Nature 538, 275-277. 10.1038/538275a
- Cobellis, G., Nicolaus, G., Iovino, M., Romito, A., Marra, E., Barbarisi, M., Sardiello, M., Di Giorgio, F. P., Iovino, N., Zollo, M. et al. (2005). Tagging genes with cassette-exchange sites. Nucleic Acids Res. 33, e44. 10.1093/nar/gni045
- Collins, F. S., Rossant, J. and Wurst, W. (2007). A mouse for all reasons. Cell 128, 9-13. 10.1016/j.cell.2006.12.018
- De-Zolt, S., Schnütgen, F., Seisenberger, C., Hansen, J., Hollatz, M., Floss, T., Ruiz, P., Wurst, W. and von Melchner, H. (2006). High-throughput trapping of secretory pathway genes in mouse embryonic stem cells. Nucleic Acids Res. 34, e25. 10.1093/nar/gnj026
- Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., Guernec, G., Martin, D., Merkel, A., Knowles, D. G. et al. (2012). The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 22, 1775-1789. 10.1101/gr.132159.111
- Dickinson, M. E., Flenniken, A. M., Ji, X., Teboul, L., Wong, M. D., White, J. K., Meehan, T. F., Weninger, W. J., Westerberg, H., Adissu, H. et al. (2016). High-throughput discovery of novel developmental phenotypes. Nature 537, 508-514. 10.1038/nature19356
- Dinger, M. E., Amaral, P. P., Mercer, T. R., Pang, K. C., Bruce, S. J., Gardiner, B. B., Askarian-Amiri, M. E., Ru, K., Solda, G., Simons, C. et al. (2008). Long noncoding RNAs in mouse embryonic stem cell pluripotency and differentiation. Genome Res. 18, 1433-1445. 10.1101/gr.078378.108
- Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F. et al. (2012). Landscape of transcription in human cells. Nature 489, 101-108. 10.1038/nature11233
- Fisher, C. L., Marks, H., Cho, L. T.-Y., Andrews, R., Wormald, S., Carroll, T., Iyer, V., Tate, P., Rosen, B., Stunnenberg, H. G. et al. (2017). An efficient method for generation of bi-allelic null mutant mouse embryonic stem cells and its application for investigating epigenetic modifiers. Nucleic Acids Res. 45, e174. 10.1093/nar/gkx811
- Frankish, A., Diekhans, M., Ferreira, A.-M., Johnson, R., Jungreis, I., Loveland, J., Mudge, J. M., Sisu, C., Wright, J., Armstrong, J. et al. (2019). GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766-D773. 10.1093/nar/gky955
- Friedel, R. H. and Soriano, P. (2010). Gene trap mutagenesis in the mouse. Methods Enzymol. 477, 243-269. 10.1016/S0076-6879(10)77013-0
- Friedel, R. H., Seisenberger, C., Kaloff, C. and Wurst, W. (2007). EUCOMM the European conditional mouse mutagenesis program. Brief Funct. Genomic. Proteomic. 6, 180-185. 10.1093/bfgp/elm022
- Friedrich, G. and Soriano, P. (1991). Promoter traps in embryonic stem cells: a genetic screen to identify and mutate developmental genes in mice. Genes Dev. 5, 1513-1523. 10.1101/gad.5.9.1513
- Fritah, S., Niclou, S. P. and Azuaje, F. (2014). Databases for lncRNAs: a comparative evaluation of emerging tools. RNA 20, 1655-1665. 10.1261/rna.044040.113
- Ghosal, S., Das, S. and Chakrabarti, J. (2013). Long noncoding RNAs: new players in the molecular mechanism for maintenance and differentiation of pluripotent stem cells. Stem Cells Dev. 22, 2240-2253. 10.1089/scd.2013.0014
- Gomez, J. A., Wapinski, O. L., Yang, Y. W., Bureau, J.-F., Gopinath, S., Monack, D. M., Chang, H. Y., Brahic, M. and Kirkegaard, K. (2013). The NeST long ncRNA controls microbial susceptibility and epigenetic activation of the interferon-γ locus. Cell 152, 743-754. 10.1016/j.cell.2013.01.015
- Gossler, A., Joyner, A. L., Rossant, J. and Skarnes, W. C. (1989). Mouse embryonic stem cells and reporter constructs to detect developmentally regulated genes. Science 244, 463-465. 10.1126/science.2497519
- Grote, P. and Herrmann, B. G. (2015). Long noncoding RNAs in organogenesis: Making the difference. Trends Genet. 31, 329-335. 10.1016/j.tig.2015.02.002
- Guttman, M., Donaghey, J., Carey, B. W., Garber, M., Grenier, J. K., Munson, G., Young, G., Lucas, A. B., Ach, R., Bruhn, L. et al. (2011). lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295-300. 10.1038/nature10398
- Hansen, J., Floss, T., Van Sloun, P., Füchtbauer, E.-M., Vauti, F., Arnold, H.-H., Schnütgen, F., Wurst, W., von Melchner, H. and Ruiz, P. (2003). A large-scale, gene-driven mutagenesis approach for the functional analysis of the mouse genome. Proc. Natl. Acad. Sci. USA 100, 9918-9922. 10.1073/pnas.1633296100
- Hansen, G. M., Markesich, D. C., Burnett, M. B., Zhu, Q., Dionne, K. M., Richter, L. J., Finnell, R. H., Sands, A. T., Zambrowicz, B. P. and Abuin, A. (2008). Large-scale gene trapping in C57BL/6N mouse embryonic stem cells. Genome Res. 18, 1670-1679. 10.1101/gr.078352.108
- Harrow, J., Frankish, A., Gonzalez, J. M., Tapanari, E., Diekhans, M., Kokocinski, F., Aken, B. L., Barrell, D., Zadissa, A., Searle, S. et al. (2012). GENCODE: the reference human genome annotation for the ENCODE project. Genome Res. 22, 1760-1774. 10.1101/gr.135350.111
- Hicks, G. G., Shi, E.-G., Li, X.-M., Li, C.-H., Pawlak, M. and Ruley, H. E. (1997). Functional genomics in mice by tagged sequence mutagenesis. Nat. Genet. 16, 338-344. 10.1038/ng0897-338
- Hu, G., Lou, Z. and Gupta, M. (2014). The long non-coding RNA GAS5 cooperates with the Eukaryotic translation initiation factor 4E to regulate c-Myc translation. PLoS ONE 9, e107016. 10.1371/journal.pone.0107016
- Ishida, Y. and Leder, P. (1999). RET: a poly A-trap retrovirus vector for reversible disruption and expression monitoring of genes in living cells. Nucleic Acids Res. 27, e35-e42. 10.1093/nar/27.24.e35
- Jia, W., Chen, W. and Kang, J. (2013). The functions of microRNAs and long non-coding RNAs in embryonic and induced pluripotent stem cells. Genomics Proteomics Bioinformatics 11, 275-283. 10.1016/j.gpb.2013.09.004
- Kaloff, C., Anastassiadis, K., Ayadi, A., Baldock, R., Beig, J., Birling, M.-C., Bradley, A., Brown, S. D. M., Bürger, A., Bushell, W. et al. (2016). Genome wide conditional mouse knockout resources. Drug Discov. Today: Dis. Models 20, 3-12. 10.1016/j.ddmod.2017.08.002
- Kino, T., Hurt, D. E., Ichijo, T., Nader, N. and Chrousos, G. P. (2010). Noncoding RNA gas5 is a growth arrest– and starvation-associated repressor of the glucocorticoid receptor. Sci. Signal. 3, ra8. 10.1126/scisignal.2000568
- Kosicki, M., Tomberg, K. and Bradley, A. (2018). Repair of double-strand breaks induced by CRISPR-Cas9 leads to large deletions and complex rearrangements. Nat. Biotechnol. 36, 765-771. 10.1038/nbt.4192
- Li, L., Liu, B., Wapinski, O. L., Tsai, M.-C., Qu, K., Zhang, J., Carlson, J. C., Lin, M., Fang, F., Gupta, R. A. et al. (2013). Targeted disruption of Hotair leads to homeotic transformation and gene derepression. Cell Rep. 5, 3-12. 10.1016/j.celrep.2013.09.003
- Liu, F.-L., Liu, T.-Y. and Kung, F.-L. (2014). FKBP12 regulates the localization and processing of amyloid precursor protein in human cell lines. J. Biosci. 39, 85-95. 10.1007/s12038-013-9400-1
- Lloyd, K. C. K. (2003). The mutant mouse regional resource center program. Breast Cancer Res. 5, 7. 10.1186/bcr666
- Lloyd, K. C. K., Adams, D. J., Baynam, G., Beaudet, A. L., Bosch, F., Boycott, K. M., Braun, R. E., Caulfield, M., Cohn, R., Dickinson, M. E. et al. (2020). The deep genome project. Genome Biol. 21, 18. 10.1186/s13059-020-1931-9
- Loewer, S., Cabili, M. N., Guttman, M., Loh, Y.-H., Thomas, K., Park, I. H., Garber, M., Curran, M., Onder, T., Agarwal, S. et al. (2010). Large intergenic non-coding RNA-RoR modulates reprogramming of human induced pluripotent stem cells. Nat. Genet. 42, 1113-1117. 10.1038/ng.710
- Ma, C., Shi, X., Zhu, Q., Li, Q., Liu, Y., Yao, Y. and Song, Y. (2016). The growth arrest-specific transcript 5 (GAS5): a pivotal tumor suppressor long noncoding RNA in human cancers. Tumor Biol. 37, 1437-1444. 10.1007/s13277-015-4521-9 [DOI] [PubMed] [Google Scholar]
- Miard, S., Girard, M.-J., Joubert, P., Carter, S., Gonzales, A., Guo, H., Morpurgo, B., Boivin, L., Golovko, A. and Picard, F. (2017). Absence of Malat1 does not prevent DEN-induced hepatocarcinoma in mice. Oncol. Rep. 37, 2153-2160. 10.3892/or.2017.5468 [DOI] [PubMed] [Google Scholar]
- Nakagawa, S., Shimada, M., Yanaka, K., Mito, M., Arai, T., Takahashi, E., Fujita, Y., Fujimori, T., Standaert, L., Marine, J.-C.et al. (2014). The lncRNA Neat1 is required for corpus luteum formation and the establishment of pregnancy in a subpopulation of mice. Development 141, 4618-4627. 10.1242/dev.110544 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Niwa, H., Araki, K., Kimura, S., Taniguchi, S., Wakasugi, S. and Yamamura, K. (1993). An efficient gene-trap method using poly a trap vectors and characterization of gene-trap events. J. Biochem. 113, 343-349. 10.1093/oxfordjournals.jbchem.a124049 [DOI] [PubMed] [Google Scholar]
- Nord, A. S., Vranizan, K., Tingley, W., Zambon, A. C., Hanspers, K., Fong, L. G., Hu, Y., Bacchetti, P., Ferrin, T. E., Babbitt, P. C.et al. (2007). Modeling insertional mutagenesis using gene length and expression in murine embryonic stem cells. PloS ONE 2, e617 10.1371/journal.pone.0000617 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Oliver, P. L., Chodroff, R. A., Gosal, A., Edwards, B., Cheung, A. F. P., Gomez-Rodriguez, J., Elliot, G., Garrett, L. J., Lickiss, T., Szele, F.et al. (2015). Disruption of Visc-2, a brain-expressed conserved long noncoding RNA, does not elicit an overt anatomical or behavioral phenotype. Cereb. Cortex 25, 3572-3585. 10.1093/cercor/bhu196 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Osipovich, A. B., White-Grindley, E. K., Hicks, G. G., Roshon, M. J., Shaffer, C., Moore, J. H. and Ruley, H. E. (2004). Activation of cryptic 3′ splice sites within introns of cellular genes following gene entrapment. Nucleic Acids Res. 32, 2912-2924. 10.1093/nar/gkh604 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Osipovich, A. B., Singh, A. and Ruley, H. E. (2005). Post-entrapment genome engineering: first exon size does not affect the expression of fusion transcripts generated by gene entrapment. Genome Res. 15, 428-435. 10.1101/gr.3258105 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ringwald, M., Iyer, V., Mason, J. C., Stone, K. R., Tadepally, H. D., Kadin, J. A., Bult, C. J., Eppig, J. T., Oakley, D. J., Briois, S.et al. (2011). The IKMC web portal: a central point of entry to data and resources from the international knockout mouse consortium. Nucleic Acids Res. 39, D849-D855. 10.1093/nar/gkq879 [DOI] [PMC free article] [PubMed] [Google Scholar]
- Rinn, J. L. and Chang, H. Y. (2012). Genome regulation by long noncoding RNAs. Annu. Rev. Biochem. 81, 145-166. 10.1146/annurev-biochem-051410-092902
- Rosen, B., Schick, J. and Wurst, W. (2015). Beyond knockouts: the international knockout mouse consortium delivers modular and evolving tools for investigating mammalian genes. Mamm. Genome 26, 456-466. 10.1007/s00335-015-9598-3
- Salminen, M., Meyer, B. I. and Gruss, P. (1998). Efficient poly A trap approach allows the capture of genes specifically active in differentiated embryonic stem cells and in mouse embryos. Dev. Dyn. 212, 326-333.
- Sauvageau, M., Goff, L. A., Lodato, S., Bonev, B., Groff, A. F., Gerhardinger, C., Sanchez-Gomez, D. B., Hacisuleyman, E., Li, E., Spence, M. et al. (2013). Multiple knockout mouse models reveal lincRNAs are required for life and brain development. eLife 2, e01749. 10.7554/eLife.01749
- Schnütgen, F. (2006). Generation of multipurpose alleles for the functional analysis of the mouse genome. Brief Funct. Genomic. Proteomic. 5, 15-18. 10.1093/bfgp/ell009
- Schnütgen, F., De-Zolt, S., van Sloun, P., Hollatz, M., Floss, T., Hansen, J., Altschmied, J., Seisenberger, C., Ghyselinck, N. B., Ruiz, P. et al. (2005). Genomewide production of multipurpose alleles for the functional analysis of the mouse genome. Proc. Natl. Acad. Sci. USA 102, 7221-7226. 10.1073/pnas.0502273102
- Sheik Mohamed, J., Gaughwin, P. M., Lim, B., Robson, P. and Lipovich, L. (2010). Conserved long noncoding RNAs transcriptionally regulated by Oct4 and Nanog modulate pluripotency in mouse embryonic stem cells. RNA 16, 324-337. 10.1261/rna.1441510
- Shigeoka, T., Kawaichi, M. and Ishida, Y. (2005). Suppression of nonsense-mediated mRNA decay permits unbiased gene trapping in mouse embryonic stem cells. Nucleic Acids Res. 33, e20. 10.1093/nar/gni022
- Skarnes, W. C., Auerbach, B. A. and Joyner, A. L. (1992). A gene trap approach in mouse embryonic stem cells: the lacZ reported is activated by splicing, reflects endogenous gene expression, and is mutagenic in mice. Genes Dev. 6, 903-918. 10.1101/gad.6.6.903
- Skarnes, W. C., von Melchner, H., Wurst, W., Hicks, G., Nord, A. S., Cox, T., Young, S. G., Ruiz, P., Soriano, P., Tessier-Lavigne, M. et al. (2004). A public gene trap resource for mouse functional genomics. Nat. Genet. 36, 543-544. 10.1038/ng0604-543
- Skarnes, W. C., Rosen, B., West, A. P., Koutsourakis, M., Bushell, W., Iyer, V., Mujica, A. O., Thomas, M., Harrow, J., Cox, T. et al. (2011). A conditional knockout resource for the genome-wide study of mouse gene function. Nature 474, 337-342. 10.1038/nature10163
- Stanford, W. L., Epp, T., Reid, T. and Rossant, J. (2006). Gene trapping in embryonic stem cells. Methods Enzymol. 420, 136-162. 10.1016/S0076-6879(06)20008-9
- Stryke, D., Kawamoto, M., Huang, C. C., Johns, S. J., King, L. A., Harper, C. A., Meng, E. C., Lee, R. E., Yee, A., L'Italien, L. et al. (2003). BayGenomics: a resource of insertional mutations in mouse embryonic stem cells. Nucleic Acids Res. 31, 278-281. 10.1093/nar/gkg064
- To, C., Epp, T., Reid, T., Lan, Q., Yu, M., Li, C. Y. J., Ohishi, M., Hant, P., Tsao, N., Casallo, G. et al. (2004). The centre for modeling human disease gene trap resource. Nucleic Acids Res. 32, D557-D559. 10.1093/nar/gkh106
- von Melchner, H., DeGregori, J. V., Rayburn, H., Reddy, S., Friedel, C. and Ruley, H. E. (1992). Selective disruption of genes expressed in totipotent embryonal stem cells. Genes Dev. 6, 919-927. 10.1101/gad.6.6.919
- Wefers, B., Bashir, S., Rossius, J., Wurst, W. and Kühn, R. (2017). Gene editing in mouse zygotes using the CRISPR/Cas9 system. Methods 121-122, 55-67. 10.1016/j.ymeth.2017.02.008
- Wiles, M. V., Vauti, F., Otte, J., Füchtbauer, E.-M., Ruiz, P., Füchtbauer, A., Arnold, H.-H., Lehrach, H., Metz, T., von Melchner, H. et al. (2000). Establishment of a gene-trap sequence tag library to generate mutant mice from embryonic stem cells. Nat. Genet. 24, 13-14. 10.1038/71622
- Wurst, W., Rossant, J., Prideaux, V., Kownacka, M., Joyner, A., Hill, D. P., Guillemot, F., Gasca, S., Cado, D., Auerbach, A. et al. (1995). A large-scale gene-trap screen for insertional mutations in developmentally regulated genes in mice. Genetics 139, 889-899.
- Yao, X., Zhang, M., Wang, X., Ying, W., Hu, X., Dai, P., Meng, F., Shi, L., Sun, Y., Yao, N. et al. (2018). Tild-CRISPR allows for efficient and precise gene knockin in mouse and human cells. Dev. Cell 45, 526-536.e5. 10.1016/j.devcel.2018.04.021
- Yates, A., Akanni, W., Amode, M. R., Barrell, D., Billis, K., Carvalho-Silva, D., Cummins, C., Clapham, P., Fitzgerald, S., Gil, L. et al. (2016). Ensembl 2016. Nucleic Acids Res. 44, D710-D716. 10.1093/nar/gkv1157
- Yoshida, M., Yagi, T., Furuta, Y., Takayanagi, K., Kominami, R., Takeda, N., Tokunaga, T., Chiba, J., Ikawa, Y. and Aizawa, S. (1995). A new strategy of gene trapping in ES cells using 3'RACE. Transgenic Res. 4, 277-287. 10.1007/BF01969122
- Zambrowicz, B. P., Friedrich, G. A., Buxton, E. C., Lilleberg, S. L., Person, C. and Sands, A. T. (1998). Disruption and sequence identification of 2,000 genes in mouse embryonic stem cells. Nature 392, 608-611. 10.1038/33423
- Zambrowicz, B. P., Abuin, A., Ramirez-Solis, R., Richter, L. J., Piggott, J., BeltrandelRio, H., Buxton, E. C., Edwards, J., Finch, R. A., Friddle, C. J. et al. (2003). Wnk1 kinase deficiency lowers blood pressure in mice: a gene-trap screen to identify potential targets for therapeutic intervention. Proc. Natl. Acad. Sci. USA 100, 14109-14114. 10.1073/pnas.2336103100
- Zhang, Z., Zhu, Z., Watabe, K., Zhang, X., Bai, C., Xu, M., Wu, F. and Mo, Y.-Y. (2013). Negative regulation of lncRNA GAS5 by miR-21. Cell Death Differ. 20, 1558-1568. 10.1038/cdd.2013.110