Skip to main content
Nucleic Acids Research logoLink to Nucleic Acids Research
. 2023 Feb 6;51(5):2033–2045. doi: 10.1093/nar/gkad049

LINE-1 retrotransposon expression in cancerous, epithelial and neuronal cells revealed by 5′ single-cell RNA-Seq

Wilson McKerrow 1,2, Larisa Kagermazova 3,4, Nicole Doudican 5, Nicholas Frazzette 6, Efiyenia Ismini Kaparos 7,8, Shane A Evans 9,10, Azucena Rocha 11,12, John M Sedivy 13,14, Nicola Neretti 15,16, John Carucci 17, Jef D Boeke 18,19,20, David Fenyö 21,22,
PMCID: PMC10018344  PMID: 36744437

Abstract

LINE-1 retrotransposons are sequences capable of copying themselves to new genomic loci via an RNA intermediate. New studies implicate LINE-1 in a range of diseases, especially in the context of aging, but without an accurate understanding of where and when LINE-1 is expressed, a full accounting of its role in health and disease is not possible. We therefore developed a method—5′ scL1seq—that makes use of a widely available library preparation method (10x Genomics 5′ single cell RNA-seq) to measure LINE-1 expression in tens of thousands of single cells. We recapitulated the known pattern of LINE-1 expression in tumors—present in cancer cells, absent from immune cells—and identified hitherto undescribed LINE-1 expression in human epithelial cells and mouse hippocampal neurons. In both cases, we saw a modest increase with age, supporting recent research connecting LINE-1 to age related diseases.

INTRODUCTION

In the human genome, LINE-1 is the only family of retrotransposons—sequences capable of copying themselves to new genomic loci via an RNA intermediate—that remains active and autonomous (1). LINE-1 element expression is driven by a promoter that lies within its 5′UTR sequence. It is hence an unusual promoter in that it contains 100% ‘downsteam elements’ (2), which is necessitated by its mode of retrotransposition that requires it to ‘take its promoter with it’. Once expressed and translated, LINE-1 mRNAs complex with their two potentially cis-acting (3) protein products, the RNA binding ORF1p (4) and the endonuclease/reverse transcriptase ORF2p (5,6), to form LINE-1 RNPs that are primarily cytoplasmic but can enter the nucleus during mitosis (7). In the nucleus, ORF2p reverse transcribes a new LINE-1 DNA copy at a site of genomic insertion through a process called target primed reverse transcription (TPRT) (8). In addition to its potential to insert into and disrupt protein coding genes (9–12), LINE-1 expression can contribute to DNA damage (13,14) and induce an interferon response (14–17). LINE-1 is highly active in cancer, with some tumors bearing more than 100 new LINE-1 insertions (18–23). It is also active in the germline where it generates insertions that are polymorphic in the human population (24) and occasionally contributes to heritable genetic disorders (9).

The extent of LINE-1 activity outside these two contexts, namely in non-tumor somatic tissues, has been a subject of vigorous debate for decades (25–31). Recently, several studies have suggested that the epigenetic changes that occur during aging provide an opportunity for LINE-1 and other normally silenced genomic regions to become active and contribute to the aging phenotype and related diseases (16,32,33). We therefore sought a highly sensitive method that would use standard techniques to identify LINE-1 RNA expression and to ascertain what counts as normal somatic LINE-1 expression and how that changes in disease and aging contexts. Quantifying transcription from repetitive sequences poses a number of challenges (34,35). Notably, the very high copy number of LINE-1 sequences means that methods based on simple read counts of LINE-1 RNA are confounded by the fact that the hundreds of thousands of LINE-1 copies include many that lie within cellular transcription units and thus those transcripts are not driven by the LINE-1 promoter, but instead are ‘passively co-transcribed’. We therefore leveraged 5′ targeted single cell RNA-seq data sequenced with reads that are at least 100 base pairs long to develop 5′ scL1seq. Because 5′ scL1seq is a straightforward method to extract information about LINE-1 expression from reads sequenced from a standard library prep (5′ targeted 10× genomics single cell RNA-seq), it can easily be deployed by researchers interested in whether LINE-1 expression is present in particular disease models or samples.

We applied this method to publicly available data from 15 healthy somatic tissues, and found that LINE-1 is indeed expressed in normal epithelial cells. We also analyzed skin samples from two younger (under 40) and two older (over 80) non-melanoma skin cancer patients and found that epithelial LINE-1 expression was much higher in malignant cells and somewhat higher in the older patients. Finally, we looked at hippocampal cells from two young (4 month old) and two old (24 month old) mice, and found LINE-1 to be expressed in mouse hippocampal neurons, with modestly increased expression in the older neurons.

Several methods for quantifying TEs from single cell RNA-seq do exist (36–38). In this paper, we additionally argue that for short read 10x genomics sc-RNA-seq, only the 5′ targeted method accurately identifies LINE-1 expression. The challenges of quantifying TE transcripts from 3′ biased single cell data has been noted (39), and a previous paper did note that 5′ targeted single cell methods can be used to detect transposable elements expressed from their own promoters, but did not explicitly dive into LINE-1 expression (40). In this paper, we develop a strategy that is specifically targeted the overcome the challenges of quantifying LINE-1. Long reads have also been shown to be an effective method for the detection of transposable element transcripts in single cell data (41). Long reads provide an even clearer picture of the relevant transcript than 5′ aligning short reads do, but currently, short read sequencing remains the most cost-effective way to do high throughput single cell RNA-seq, and thus methods are needed to quantify LINE-1 expression in this kind of data.

MATERIALS AND METHODS

Construction of the cellranger index

To build the custom reference genome, L1Hs and L1PAx family repeats were masked in hg38 using bedtools maskfasta. L1Hs and L1PAx annotations were downloaded in bed format from the UCSC genome table browser (https://genome.ucsc.edu/cgi-bin/hgTables). Then this masked reference was concatenated with the human L1Hs consensus sequence and all available L1PA consensus sequences from Dfam (https://www.Dfam.org/browse?name_accession = L1PA&clade_descendants = true). For transcript annotation, we used the RefSeq GRCh38 annotation in gtf format. A cellranger index was then built with cellranger mkref.

The mouse version of L1-sc was generated in the same manner as human. We first used bedtools maskfasta to mask L1Md sequences (downloaded from the UCSC genome table browser in bed format) from the mm39 mouse genome. We then added L1Md consensus sequences in Dfam (https://Dfam.org/browse?name_accession=L1Md&classification=root;Interspersed_Repeat;Transposable_Element;Class_I_Retrotransposition;LINE;Group-II;Group-1;L1-like;L1-group;L1&clade=10088&clade_descendants=true) as decoy chromosomes.

L1MdTf_II/III, L1MdGf_II and L1MdA_II/III were excluded as they are highly similar to the L1MdTf_I, L1MdGf_I and L1MdA_I consensuses.

Quantification of LINE-1 UMIs in humans

After running cellranger using the custom reference described above, we looked at the position sorted STAR aligned cellranger output (file name = possorted_bam.bam) to identify LINE-1 UMIs in reads aligning to the L1Hs consensus ‘decoy chromosome’. Briefly, reads aligning in the appropriate location (5′ end or 3′ depending on the chemistry) and orientation were filtered for excessive mismatches and clipping. The barcodes were extracted from the remaining reads to assign LINE-1 expression to specific cells. The specifics of the read filtering strategy are as follows: for the 3′ targeted data, unclipped reads aligning in the final 1kb (but in not the poly A stretch at the very 3′ end) of the L1Hs consensus with two or fewer mismatches were checked for the presence of the cell barcode (CB) and UMI barcode (UB) flags. If both were present the UB was added to a python ‘set’ object within a python dictionary with the CB being the key. Once the whole bam file has been read, the number of L1Hs UBs associated with each CB is reported. For the 5′ targeted data from HeLa cells, quantification was performed in the same manner except reads were instead collected from either the first 150 or first 500 bases of the L1Hs consensus. For the 5′ data with 100 or 150 bp paired end reads, UMIs were only counted if they have a proper (sam flag 2 is present) read 1/read 2 alignment that meets the following criteria: For read 1, the first aligned base must in the first 20 bases of the L1Hs consensus, and the read must have less than 20 bases clipped from 5′ end, no clipping at the 3′ end, and no more than 2 mismatches. (Allowing for clipped bases at the 5′ is necessary because while cellranger will trim off the barcode sequences, the TSO sequence and a variable number of G bases added during the template switching process will remain.) For read 2, neither end can be clipped and no more than 2 mismatches are allowed. To recreate the behavior of cellranger's aggr function, when combining samples, reads were downsampled to assure even coverage between samples. Reads must also reflect the strand-specificity of 10× genomics. For the 100/150 base paired reads, read 1 is aligned on the + strand and read 2 is aligned on the—strand.

Pseudobulk L1EM analysis

To turn 10x genomics single cell data into pseudobulk, the first 46 bases were removed from read 1 using trimmomatic (58) (HEADCROP:46). This removes the barcodes, the TSO sequence and any extra G bases added during the template switching. To make L1EM (https://github.com/FenyoLab/L1EM) able to handle 5′ data, we shortened the LINE-1 annotation in L1EM.400.bed to only cover the first 100 bases of the element. Thus, to be counted as active expression, the first base of aligned read must align within the first 100 bases of a 5′ UTR intact LINE-1 element. Because 10x genomics data includes many more reads than a standard RNA-seq experiment, the following changes were made to control memory usage by L1EM: threads = 8, realignNM = 2, L1EM_NM = 2, NMdiff = 1.

Quantification of LINE-1 in mouse

The mouse L1 promoter has a somewhat different structure from the human version, making it more challenging to quantify 5′ targeted LINE-1 reads in mice. Rather than using the downstream promoter method that allows human LINE-1 to carry its promoter to a new location, the mouse 5′ UTR contains multiple copies of its promoter (i.e. a series of tandem repeats of ∼212 bp), with additional copies potentially being generated during insertion (presumably by slippage during reverse transcription). Given this repetitive 5′ UTR structure, it was not obvious where the true ‘start’ site was located in the monomer. We analyzed the monomer sequence from the Orleans Reeler L1MdTf insertion and looked for a match to the initiator consensus sequence Inr (59,60). We found a single perfect match to Inr, and used this as our landmark to the inferred start site of L1MdTf initiation. When we aligned reads to this modified consensus, we found a read 1 peak at the initiator sequence (Figure 5A). However, the exact mouse LINE-1 TSS was not as precise as it was in human LINE-1, so we included all read pairs with a read 1 fall within a tandem repeat in our mouse LINE-1 quantification. The same clipping and mismatch filters were applied: no more than 2 mismatches and no clipping except up 20 bases at 5′ end of read 1. Reads must also reflect the strand-specificity of 10× genomics: read 1 is aligned on the + strand and read 2 is aligned on the—strand.

Figure 5.

Figure 5.

LINE-1 (L1MdTf) expression in the mouse hippocampus by single nucleus RNA-seq. (A) Read coverage across the L1MdTf_I consensus. (B) UMAP embedding of mouse hippocampal cells colored by log normalized L1MdTf expression. (C) LINE-1 expression by cell type. ***P < 0.001 for enrichment in this cell type, **0.001 < P < 0.01, *0.01 < P < 0.05. Red indicates comparison to all other cells, gray indicates comparison excludes CA neurons. (D) Comparison of neuronal LINE-1 expression in 4-month-old mice (young) to 24-month-old mice (old).

Single cell RNA-seq clustering and cell identification

A count matrix was built using cellranger count and the index described above (default parameters). Cellranger aggr was used to merge related runs. Clustering and UMAP embedding was performed in Seurat(61–64) according to the PBMC 3k tutorial (https://satijalab.org/seurat/v3.2/pbmc3k_tutorial.html). Cells with >25% MT-RNA, or <1000 RNA molecules detected were removed. Clusters showing markers for disparate cell types were taken to be doublets and removed. Human cell types were then identified using marker genes: CD3E for T cells, MS4A1 for B cells, MZB1 for plasma cells, GNLY for NK cells, LYZ for macrophages, KIT for mast cells, COL1A2 for fibroblasts, ACTA2 for muscle, PLVAP for endothelial cells, MLANA for melanocytes, and keratin (KRT) genes for epithelial cells (Supplemental Figures S4-S7). For keratinocyte clusters we used: KRT5 for basal cells, KRTDAP for suprabasal cells, KRT28/19 for hair follicle associated cells, PPARG for sebocytes and KRT77 for sweat duct cells (Supplemental Figure S8). Malignant cells were identified as CNV+ cells by inferCNV (Trinity CTAT project: https://github.com/broadinstitute/inferCNV) (Supplemental Figure S9). The following markers were used in the mouse hippocampus: Gfap for astrocytes, Tmem119 for microglia, Myrf for oligodendrocytes, Rbfox3 for neurons, Cspg4 for OPCs, Cldn5 for ependymal cells, and Slc6a13 for vascular cells. Interneurons were distinguished by Slc6a1 expression, DG neurons by C1ql2, and CA neurons by Neurod6 and Dkk3 (65) (Supplemental Figure S10). UMAP plots were made using the DimPlot and FeaturePlot functions in Seurat.

5′ scRNA-seq from HeLa ± L1RP.

Cell line and plasmid: HeLa-M2 cells (HeLa cells with a reverse tetracycline-controlled transactivator gene coding for rtTA2-M2, a gift from Gerald

Schumann, Paul Ehrlich Institute) were cultured in DMEM supplemented with 10% FBS (Gemini #100–106) and 1× Penicillin–Streptomycin–Glutamine (ThermoFisher #10378016). The cells were routinely tested for Mycoplasma by PCR detection of conditioned medium. The L1rp reporter construct—pLAK002 (tet-inducible promoter-L1rp-GFP-AI) plasmid was introduced into HeLa-M2 cells by plating 0.2 × 106 cells per well in a 6-well plate the day before transfection. The following day, cells were transfected with pLAK002 using FuGENE-HD Transfection Reagent (Promega #E2311), following the manufacturer's protocol. 48 h post-transfection, 1 μg/ml of puromycin was added. The following day, cells were trypsinized and replated in a 10 cm plate with 1 μg/ml of puromycin added. Puromycin was replenished every other day for 14 days until no cell death was observed.

Retrotransposition assay in 6-well plates: Once selection was complete, cells were trypsinized and plated at 2.0 × 105 cells per well in a 6-well plate. 24 h post cell plating, doxycycline (1 μg/ml) was added to induce L1 expression. After 72 h, the percentage of GFP + cells was measured by flow cytometry (FACS buffer: PBS + 2% FBS) to confirm L1 retrotransposition and samples were submitted for scRNAseq.

Single-cell RNAseq: Cells were detached from the plate with TrypLE™ Express Enzyme (ThermoFisher #12604013) following the manufacturer's protocol and then resuspended in DMEM at 0.015 × 106 cells per 38.7 μl and submitted to the NYULH Genome Technology Core Facility. scRNA-seq was performed using the 10x Genomics system using 5′ gene expression library.

Negative binomial generalized linear models

Because LINE-1 expression is low in normal cells, there are usually only a few LINE-1 UMIs detected in a single cell. As a result, LINE-1 detection is biased toward cells with a greater number of UMIs, causing the popular wilcoxon rank sum test on log normalized expression to, in some cases, yield biased results. We therefore chose to estimate LINE-1 expression using a negative binomial generalized linear model with log link function:

graphic file with name M0001.gif

where Inline graphic is the number of LINE-1 UMIs in a cell, Inline graphic is the total number of UMIs in that cell, Inline graphic is a predictor (such as cell type or age), Inline graphic, Inline graphic are parameters to be learned and Inline graphic is the negative binomial error term. These models were fit using the glm.nb function in the MASS package for R, using the ‘offset’ function to ensure that the coefficient on Inline graphic has a fixed value of 1. P values were estimated by anova. Under this model, transcripts per million (TPM) can be estimated as:

Inline graphic

Single cell suspension derived from patient tumors and matched normal

Patient tumors were processed as described previously (66). Adjustments are described: tumors were obtained on the day of Mohs micrographic surgery and washed in cold DMEM [Gibco 11995-065] supplemented with 10% FBS [Thermo Scientific SH30910.03]. Any fat was removed from tumor samples using a scalpel. Then, tumors were then cut into small pieces using a razor and resuspended in 10 mL of DMEM/10% FBS, 10 mg/mL collagenase II [Sigma C2674] and 10 U DNase I [Sigma 00453869] for 20 min at 37°C. After incubation, the suspension was vortexed 1× for 30 s followed by pipetting sequentially through 25, 10 and 5 ml pipettes for 1 min each. Next, the cell suspension was filtered through a 70-μm filter [Fisher 22363548] and spun at 300 g for 5 min. After spinning, the supernatant was removed, and ACK red blood cell lysis was performed according to the manufacturer's protocol as needed [Gibco a10492-01].

Single-cell library construction and 5′ sequencing from patient tumors and matched normal

The cellular suspensions were loaded on a 10× Genomics Chromium instrument to generate single-cell gel beads in emulsion (GEMs). Approximately 10–12 × 103 cells were loaded per channel. Single-cell RNA-Seq libraries were prepared using the following single cell 5′ reagent kits: Chromium™ single cell 5′ Library & Gel Bead Kit, PN-1000006 and chromium single cell VDJ enrichment kit for human T cells (PN-100000). Libraries were run on a NovaSeq 6000 SP (SCC) or S1 (BCC) flow cell (depending on the number of samples per run).

Mouse hippocampus single nucleus RNA sequencing

For snRNA-seq the hippocampus was dissected from the brains of 4-month old and 24-month old C57BL/6 mice. A total of four animals were used in each age group. The hippocampi from two mice were pooled together into one sample, and the other two mice were pooled in another sample. Nuclei were isolated from minced hippocampi tissue using the Nuclei PURE Prep Nuclei Isolation Kit with a Dounce B homogenizer. Samples were subjected to a sucrose gradient, and nuclei were further purified and counted. We targeted 5000 nuclei per sample to load onto a 10x Chromium chip using VD(J) chemistry. We targeted 50 000 sequence reads per nuclei on an Illumina HiSeq device.

Analysis of GTEx data

LINE-1 quantification in GTEx data were performed on the Cancer Genomics Cloud, using L1EM, which was made available as public tool on the platform. For each sample, ORF1 and ORF2 intact loci expressed at least 2 FPM were combined into a single value of LINE-1 expression.

RESULTS

3′ Targeted scRNA-seq is not appropriate for LINE-1 quantitation

We tested three scRNA-seq methods for their ability to capture LINE-1 expression: the standard 3′ targeted 10× Genomics method (Figure 1A), the standard 10× Genomics 5′ method (Figure 2A) and a 5′ library prep followed by 150 base pair (bp) paired end sequencing (Figure 1B). LINE-1 was quantified through a simple method based on our MapRRCon (42) pipeline (Figure 1C): Reads are fed into a custom cellranger index that includes a masked hg38 genome with all L1HS and L1PA sequences removed and replaced by consensus sequences available from Dfam. Reads aligned to the L1HS consensus are filtered for alignment quality and then unique molecular identifiers (UMIs) are counted. The default STAR parameters used by cellranger require alignment seeds that match at most 50 loci, meaning that many LINE-1 reads will go unmapped. Replacing the specific loci with a single consensus allows those reads to map uniquely and collects them in a single location for easy identification and quantification. Nevertheless, reliance on the consensus sequence may bias our method toward loci that have 5′ ends that are more similar to the consensus sequence.

Figure 1.

Figure 1.

5′ but not 3′ single cell RNA-seq can capture LINE-1 expression. (A) 3′ targeted single cell RNA-seq fails to capture LINE-1 expression in a lung adenocarcinoma (LUAD) tumor. Top: Schematic of the 3′ 10x Genomics sequencing strategy. mRNAs are captured at the 3′ end by bead bound poly dT primers attached to a cell specific barcode (CB, 16bp) and a molecule specific barcode (UB, 10bp). The cDNA is fragmented and a read pair is sequenced with a 90bp read 1 (R1) falling at the 5′ end of 3′ fragment and a 26 bp read 2 (R2) covering only the barcodes. Middle: Coverage of reads aligned to the LINE-1 consensus. A lack of difference between tumor/met and normal immediately suggests that this data fails to accurately capture LINE-1 expression. Bottom: UMAP embedding of all cells colored by LINE-1 ‘expression.’ The presence of large numbers of LINE-1 aligning reads in each cell type is further evidence that 3′ targeted single cell RNA-seq is ineffective for LINE-1. (B) 5′ targeted single cell RNA-seq with 150 bp paired end reads successfully captures LINE-1 expression in lung squamous cell carcinoma (LSCC). Top: Schematic of the 5′ 10× Genomics sequencing strategy (with 150 bp paired reads). mRNAs are captured at the 5′ end by a bead bound template switch oligo (TSO) attached to a cell specific barcode (CB, 16 bp) and a molecule specific barcode (UB, 10 bp). The cDNA is fragmented and a read pair is sequenced with a 150 bp read 1 (R1) that covers the bar codes (26 bp), the TSO and, crucially, the exact 5′ end of the transcript sequence (purple section) and a 150 bp read 2 that falls at the 3′ end of the 5′ fragment. Middle: Coverage of reads aligned to the LINE-1 consensus. Transcripts from the LINE-1 promoter is represented by the read 1 peak at the 5′ end. Bottom: UMAP embedding of all cells colored by LINE-1 expression from the 5′ read 1 peak. The known LINE-1 expression pattern—specific to cancer cells—is recapitulated. (C) 5′ sc-L1 pipeline. Single cell rna-seq reads are aligned to custom Cell Ranger index. Reads are filtered for alignment quality and a read 1 starting within 20bp of the 5′ end of L1Hs. LINE-1 unique molecular identifiers (UMIs) are identified from the unique barcodes (UB) and assigned to cell based on the cell barcode (CB).

Figure 2.

Figure 2.

Cell line validation data for the 5′ method. (A) Schematic of sequencing used for HeLa ± L1RP. Exactly as in Figure 1A, except that read 1 is only 26 bp and thus only covers the cell (CB) and unique (UB) barcodes, and read 2 is 90 bp long. (B) Coverage of reads along the L1Hs consensus. Collecting reads aligning with the first 150 bp (blue dashed line) will filter out almost all L1Hs aligning read 2s in the L1RP- control. Allowing reads to fall anywhere in the first 500 bp (solid blue LINE-1 will recover all LINE-1 expression in the L1RP + cells. (C) Violin plots comparing LINE-1 expression in L1RP + versus—HeLa cells using either the 150 or the 500 bp cutoff. (D) UMAP of HCT116 and MCF7 cells colored by LINE-1 expression. (E) Coverage across the L1Hs consensus for MCF-7 and HCT116. Antisense reads and those with more than two mismatches were filtered out.

LINE-1 expression (i.e. RNA transcript accumulation driven by the internal.

LINE-1 5′ promoter), was vastly overestimated when we reanalyzed normal lung, tumor, and metastases (met) samples from lung adenocarcinoma (LUAD) patients sequenced by the 3′ method (43). In this data, only the 90 bp read 1 carries any information about the captured RNA molecules (Figure 1A, top). Read 1 sequences aligned to the L1HS consensus in two peaks (Figure 1A, middle). The smaller peak likely reflects a cryptic polyA site in the LINE-1 sequence (an AAUAA motif appears at the peak's 3′ end.) The second, larger peak occurred as expected at the 3′ end of LINE-1. However, there was no difference between normal, tumor and met, despite considerable evidence showing that LINE-1 is upregulated in cancer cells (44).

We then mapped reads from this second peak into a UMAP embedding and found high LINE-1 ‘expression’ in nearly every cell type (Figure 1A, bottom), further indicating that this method does not reflect the known pattern of tumor LINE-1 expression. LINE-1 is present in the human genome at a very high copy number, but most of these copies are inactivated by mutation and/or 5′ truncation. However, LINE-1s, including the half million or so copies that lack promoter activity, are frequently transcribed into non-LINE-1 transcripts, and these non-LINE-1 specific ‘passive’ transcripts often vastly exceed ‘active’ LINE-1 expression (45,46). Any of these passive transcripts that are poly-adenylated at LINE-1’s 3′ polyadenylation site will be indistinguishable from active LINE-1 expression in 3′ targeted datasets. Furthermore, there may be additional sources of 3′ LINE-1 RNA that is not derived from the LINE-1 promoter activity, such as cryptic promoter activity in the 3′ UTR (47). Therefore, it is not entirely unexpected that a 3′ targeted sequencing method is not effective at measuring LINE-1 expression.

5′ targeted 10x genomics sc-RNA (with an extended read 1 length) captures LINE-1 expression

Because passive LINE-1 transcripts initiate upstream of the element, they can be filtered out by using a 5′ targeted method. 10× Genomics methods use a transcript switch oligo (TSO) to identify RNA molecules that have been fully reverse transcribed into cDNA. Polymerase reaching the 5′ end of the RNA molecule will change templates at the TSO, yielding a molecule where the TSO sequence is affixed to the 5′ end of the fully reverse transcribed cDNA (Figure 1B, top). Thus, if we perform sequencing that is 100 or 150 bp paired end, we will sequence the TSO/cDNA junction, allowing us to precisely identify LINE-1 RNAs that start within a few bases of the 5′ end (i.e. result from LINE-1 promoter activity; see purple segment in Figure 1B top). To test this assertion, we aligned reads from a 10x genomics example dataset that includes 150 bp paired end reads from a different—lung squamous cell carcinoma (LSCC)—tumor (https://support.10xgenomics.com/single-cell-vdj/datasets/2.2.0/vdj_v1_hs_nsclc_5gex). This yielded a large peak of read 1 sequences mapping within 20 bases of the 5′ end of LINE-1 (shaded peak in Figure 1B, middle) as well as a few peaks that are likely not representative of LINE-1 promoter activity (gray arrows in Figure 1B, middle). When we only count UMIs from read pairs that have a read 1 in this peak—a method we call 5′ scL1seq—LINE-1 expression is almost entirely restricted to the epithelial/cancer cluster, reflecting abundant evidence that LINE-1 is upregulated in cancer cells (14,21,44).

Validation in cell lines

To validate the method of identifying LINE-1 expression from 10x genomics 5′ scRNA-seq, we sequenced RNA from a HeLa cell that has been engineered to overexpress LINE-1 (L1RP) alongside the parental (WT) HeLa cell line. We also looked at endogenous expression in two cells: one (MCF-7) that has been shown to express a high level of LINE-1, and one (HCT116) that expresses a low level of LINE-1 (48).

The HeLa cell lines were sequenced using the newer v2 chemistry from 10x genomics. By default, this method saves on sequencing cost by sequencing a 26 bp read 1 that only covers the barcode sequences (Figure 2A). Therefore, only the 90 bp read 2 is aligns to the transcript. Without a TSS defined by read, it is less clear which reads to count as LINE-1 (Figure 2B). We therefore quantified LINE-1 in two ways: one only counting reads in the first 150 bp of L1Hs and one counting reads in the first 500bp of L1Hs. Nevertheless, both methods show a clear overexpression of L1NE-1 in the L1RP cell compared to the parental (WT) HeLa (Figure 2C).

To test how the method handles endogenous LINE-1 expression, we sequenced MCF-7, previously shown to express a high level of LINE-1, and HCT116, which expresses a low level of LINE-1 (48). Consistent with that result, we see much higher expression in MCF-7 compared to HCT116 (Figure 2D). This difference is also reflected in the coverage profile across the L1Hs consensus in these two cell lines (Figure 2E). To further validate this result, we checked whether we could trace the LINE-1 expression back to the specific loci expressed in these cell lines. To that end, we trimmed the barcode and TSO sequences from read 1 to create pseudobulk RNA-seq reads, and quantified expression at the single locus-specific level using a version of L1EM that was modified for 5′ data (see Materials and Methods). For MCF-7, which has high enough LINE-1 expression to clearly identify specific expressed loci, we found that our L1EM analysis agreed with Philippe et al. (2016) (48) about the top locus and three of the fie most highly expressed full length L1Hs loci (non-reference loci were excluded from the top 5 since they are not considered by L1EM). Together, these results suggest that reads aligning at the 5′ end of the L1Hs consensus indeed reflect bona fide 5′UTR promoted LINE-1 expression.

LINE-1 is expressed in normal human epithelial cells

Having developed a high throughput method to measure LINE-1 RNA expression in single cells, we returned to the primary question motivating this study: how much LINE-1 expression occurs in normal, adult somatic cells? To that end, we reanalyzed 5′ targeted single cell RNA-seq data sequenced using 150 bp paired end reads from 15 healthy human tissues: bile duct, bladder, blood, esophagus, heart, liver, lymph node, bone marrow, muscle, rectum, skin, small intestine, spleen, stomach and trachea (49). Fibroblasts, endothelial, muscle and immune cells clustered by cell type regardless of tissue of origin and showed almost no LINE-1 expression. However, epithelial cells, which clustered by tissue of origin, showed clear evidence for LINE-1 expression (Figure 3A, Supplemental Figure S2). The presence of LINE-1 expression in normal epithelial cells is reminiscent of the fact that frequent retrotransposition events have been identified in epithelial cell derived tumors (carcinomas), but retrotransposition is infrequent or absent from cancers derived from blood and brain (21,47,50).

Figure 3.

Figure 3.

LINE-1 is expressed in normal epithelial cells. (A) UMAP of LINE-1 RNA expression in single cells from 15 healthy tissues of a single healthy donor. Epithelial cells are labeled by tissue of origin. (B) LINE-1 expression by cell type. ***P < 0.001 for enrichment in this cell type, *0.01 < P < 0.05. Red indicates comparison to all other cells, gray indicates comparison to other non-epithelial cells. (C) Epithelial cell LINE-1 expression by tissue of origin. ***P < 0.001 for enrichment in this tissue, *0.01 < P < 0.05. (D) LINE-1 RNA expression (estimated by L1EM) in bulk tissue samples from GTEx. (E) LINE-1 ORF1p detection in healthy tissues. (F) Comparison of LINE-1 expression in LSCC versus tracheal epithelial cells to bulk tumor/normal comparisons from bulk LSCC samples. Blue and red LINE-1s show the distribution of tumor sample LINE-1 enrichment in RNA and protein respectively. Black LINE-1 indicates enrichment of LINE-1 RNA expression in LSCC epithelial cells compared to tracheal epithelial cells.

To quantify the LINE-1 enrichment in epithelial cells, we fit a negative binomial generalized linear model (nb glm) to the number of LINE-1 UMIs in each cell, using the total number of UMIs per cell and whether a cell is epithelial as predictors. Using this method, we found that LINE-1 expression is ∼17× higher in epithelial cells compared to others in this dataset (P < 10−16, Figure 3B, red stars). We then fit similar models to non-epithelial cells, separately considering each cell type as a predictor and found that LINE-1 expression is about twice as high in T cells compared to other non-epithelial cells (P = 7.2 × 10−5) and about 73% higher in endothelial cells (P = 0.01, Figure 3B, gray stars). We also used the nb glm to compare epithelial cells from different individual tissues samples. The esophagus sample was most significantly enriched for LINE-1 expression (P < 10−16), but we also identified increased LINE-1 expression in epithelial cells from the bladder (P = 0.01) and trachea (P = 0.02).

Because the overall signal of LINE-1 expression is low in normal epithelial cells and there is a fair amount of background (Supplemental Figure S1), we wanted to map reads back to specific loci to see if they actually come from actively expressed LINE-1 loci. To that end, we applied the 5′ pseudobulk L1EM method (see above and methods) to the esophageal sample, where LINE-1 expression was predicted to be highest. We found the top expressed locus to be an intact L1Hs locus antisense to an intron in the TT28 gene, located in the 22q12.1 cytoband. This locus has previously been shown to be a highly active locus that generates a large number of 3′ transductions in cancer (20,21,23). Looking at the pattern of reads aligning upstream of a putatively expressed LINE-1 locus has been previously used to validate its expression (46,51). We therefore plotted the read coverage for 50 kb upstream of the three most highly expressed loci (see Supplemental Figure S2). Because the data is biased toward the 5′ end, it is difficult to be absolutely certain where the transcripts fall. Nevertheless, we do not find reads piling up immediately upstream of these loci and for two of the three top loci, the upstream read peaks are far smaller than the number of reads pairs predicted to be derived from the 5′ end of the element by L1EM. (Note that reads within the L1 element multimap. The L1EM expected read counts are much higher because they include reads that miss map to other highly similar loci.) Together these observations build confidence that the 5′ read alignments really do reflect proper LINE-1 expression.

We then wanted to know whether the normal epithelial cell LINE-1 expression that we found in single cells was also detectable in bulk samples. A recent study showed that, indeed, LINE-1 expression can be seen in normal tissues from GTEx (52), including high expression in skin, a tissue with a large epithelial cell component. To corroborate these results, we applied our L1EM algorithm (45) to healthy tissue data from GTEx and found that LINE-1 was most often detected in tissues with a significant epithelial cell component. We detected LINE-1 expression (at least one L1Hs locus with >2 fragments per million/FPM) in at least 25% percent of samples from: esophageal mucosa, kidney, lung, ovary, salivary gland, skin, testis and vagina. On the other hand, LINE-1 expression was rarely detected in tissues lacking in epithelial cells such as adipose (fat), brain, muscle and blood (Figure 3D).

We then searched a dataset of particularly deep mass spectrometry proteomics from 29 healthy tissues for spectra matching peptides in the LINE-1 ORF1 protein (53), following our previously described protocol (54). At least 4 peptide spectral matches (PSMs) were identified in lung (51), ovary (23), placenta (10) and prostate (4), but ORF1p was not detected in tissues lacking epithelial cells, including bone marrow, brain, fat, lymph node and smooth muscle (Figure 3E). This suggests that LINE-1 ORF1p is present in these tissues, but typically lies near or below the detection limit for mass spectrometry proteomics. While LINE-1 RNA expression is present in both tumor and normal epithelial cells, it is still about 3.3× higher in the LSCC epithelial/cancer cells, analyzed above, than in the normal trachea epithelial cells. This enrichment is consistent with tumor/normal comparisons of LINE-1 RNA and protein expression in bulk LSCC samples from CPTAC(55) (Figure 3F).

LINE-1 expression in malignant vs normal cells in younger and older individuals

Because we found LINE-1 expression to be highest in epithelial cells and other studies have found LINE-1 to be most active in epithelial derived tumors, we wanted to perform a direct comparison of LINE-1 expression in matched tumor/normal samples. To that end, we performed matched tumor/adjacent normal 5′ single cell RNA-seq with 100 bp paired end reads from four patients: a younger (35-year-old) squamous cell carcinoma (SCC) patient, an older (88-year-old) SCC patient, a younger (34-year-old) basal cell carcinoma (BCC) patient, and an older (84-year-old) BCC patient. Consistent with the previous datasets, keratinocytes (keratin producing epithelial cells of the epidermis and hair follicle—KCs) from both the tumor (up 84x over non-KCs, P < 10−16) and the normal samples (up 7.6× over non-KCs, P < 10−16) most frequently expressed LINE-1 (Figure 4A, B). Excluding KCs, we also saw an enrichment in T cells (up 3.6× over other non-KCs, P < 10−16) and endothelial cells (up 2.3× over other non-KCs, P < 10−16).

Figure 4.

Figure 4.

LINE-1 expression in tumor and matched skin from younger and older non-melanoma skin cancer patients. (A) UMAP embedding of all sequenced cells colored by log normalized LINE-1 expression. (B) LINE-1 expression by cell type. ***P < 0.001 for enrichment in this cell type, *0.01 < P < 0.05. Red indicates comparison to all other cells, gray indicates comparison to other non-tumor KCs. (C) UMAP embedding of reclustered KCs, colored by log normalized LINE-1 expression. (D) KC LINE-1 expression by subtype. ***P < 0.001 for enrichment in this cell type, *:0.01 < P < 0.05. Red indicates comparison to all other cells, gray indicates comparison to other non-BCC KCs. (E) Comparison of normal sample KC LINE-1 expression in the under 40 (younger) and over 80 (older) samples.

To get greater detail on LINE-1 expression in KCs, we reclustered the epithelial cells separately (Figure 4C). LINE-1 expression was by far the highest in malignant BCC cells (up 15× over other KCs, P < 10−16), but was also elevated in SCC cells (up 2.3x over other non-malignant KCs, P = 4.6 × 10−6). Unexpectedly, we also found LINE-1 expression to be elevated in sebocytes (up 2.2× over non-BCC KCs, P = 8.8 × 10−7, Figure 4D).

LINE-1 has been hypothesized to be de-repressed as a result of heterochromatin loss during aging (56). We therefore wanted to know whether the epithelial cell LINE-1 expression identified here was affected by age. We then used an nb glm model to test a binary age variable (>80 years versus < 40 years) as a predictor of LINE-1 expression in our data. We found that advanced age predicted a modest 35% increase in normal KC LINE-1 expression (P = 0.04, Figure 4E). However, with only two patients, this analysis lacked the power to fully account for differences in sample cell type make-up. In particular, the older SCC patient included many hair follicle associated KCs that were absent from other samples (Supplemental Figure S3, bottom right).

LINE-1 is expressed in mouse hippocampal neurons

While we did not see clear evidence for LINE-1 expression in the bulk human brain tissue samples we analyzed above, a number of studies have suggested a role for LINE-1 in normal and diseased brains. We therefore wanted to know whether LINE-1 expression in the brain can be detected at the single cell/nucleus level. While we were not able to find appropriate data from human brain cells, we did identify 150 bp paired-end read 5′ targeted single nucleus RNA-seq data from the hippocampi of two young (4-month-old) and two old (24-month-old) mice. Because mice have three active LINE-1 families (Tf, Gf and A), we aligned reads to the L1MdTf_I, L1MdGf_I and L1MdA_I consensus sequences (57). After filtering low quality alignments, we found L1MdTf_I to be most promising for LINE-1 expression (Figure 5A, Supplemental Figure S11): a read 1 peak occurs in the tandem repeat region (at the position where transcription is expected to initiate) and is followed by a broader read 2 peak. A large number of reads that align near the 3′ end of the sequence are likely a form of background and were filtered out from further analyses. Few reads were found to align to L1MdA_I and read alignments to L1MdGf_I did not form as clear a peak at the 5′ end. We then counted L1MdTf UMIs and found that LINE-1 expression was highest in neurons, particularly those of the cornu ammonis (CA/hippocampus proper, Figure 5B). Overall, neuronal LINE-1 expression was about twice glial LINE-1 expression (P = 2.2 × 10−8), and neurons of the cornu ammonis expressed about 50% more LINE-1 than those of the dentate gyrus (P = 4.3 × 10−4, Figure 5C). We then used the nb glm model to test whether advanced age was predictive of increased LINE-1 expression in neurons. This yielded a modest effect similar in scale to what we observed in the skin samples: LINE-1 expression was 25% higher in older neurons (P = 0.039, Figure 5D).

DISCUSSION

While very strong evidence from both RNA and protein (44,67,68) shows that LINE-1 is highly expressed in many tumors, the extent of its expression in normal cellular contexts is much less well understood. New evidence has linked retrotransposons broadly and LINE-1 specifically to a variety of non-cancer diseases, including age-related inflammation (16,32), neurodegenerative diseases (69–71) and autoimmune disorders (72,73). This makes it all the more important that we have a clear and accurate understanding of where and when LINE-1 expression is driven from its own promoter in human cells. To that end, we developed 5′ scL1seq, a method to quantify LNE-1 expression in single cell RNA-seq data generated on the 10× Genomics platform, and found that the 5′ targeted method combined with 100 + bp paired end reads is particularly effective for accurately identifying LINE-1 expression.

When we analyzed data generated by this method for LINE-1, we were surprised to find clear evidence for LINE-1 expression in normal epithelial cells in multiple tissue types. This reflects the fact that LINE-1 activity in cancer is largely limited to tumors of epithelial origin (18–23,47,50). This relationship between normal and tumor cell LINE-1 expression suggests that LINE-1 de-repression in cancer may not so much be a switch from ‘repressed’ to ‘expressed’ as much as the widening of a pre-existing window of LINE-1 expression. Somatic LINE-1 expression in primary cells prior to tumor development may also have important implications for how cancer develops—particularly in the acquisition of p53 mutations (14). It is, however, important to note that the quantifications here are of LINE-1 RNA. Because there are many post-transcriptional mechanisms for LINE-1 regulation (74), future research is needed to show whether the expression we observe here leads to the translation of LINE-1 proteins and to new retrotransposition events.

We then looked at direct tumor/normal comparisons from non-melanoma skin cancer patients to see how normal LINE-1 expression relates to tumor LINE-1 expression. Despite seeing similar levels of normal LINE-1 expression in basal and suprabasal cells—the normal equivalent for basal (BCC) and squamous cell carcinomas (SCC) respectively—we saw much higher LINE-1 expression in the cancer cells from BCC than SCC. This suggests that LINE-1 expression in tumors is not simply a reflection of the background expression in relevant cell types of origin, but also involves tumor-specific factors, yet to be elucidated.

Because aging is associated with a loss of heterochromatin that can lead to expression of retrotransposons, including LINE-1 (16,32,33,56,75), particularly in senescent cells (16), we wanted to know whether the epithelial LINE-1 we observe here is an age-related phenomenon. While we did find that tumor adjacent normal skin from two patients over 80 had higher LINE-1 expression than tumor adjacent normal skin from two patients under 40, the effect was modest. Thus, it may be that the modest increase with age is due to the increased prevalence of senescent cells, but that there is also another source of epithelial cell LINE-1 expression that is present in both young and old tissues. A much larger sample size would be required to fully determine the effect of age on LINE-1 expression in epithelial cells. Nevertheless, our results suggest that there is agreement between our method and strategies for LINE-1 expression quantification that show an increase with age.

Finally, we analyzed nuclei from the mouse hippocampus for LINE-1 expression, where we found it to be expressed in neurons, especially those of cornu ammonis (hippocampus proper.) In cancer cell lines, LINE-1 RNPs are primarily cytoplasmic7, so it is possible that the single nucleus RNA-seq used to evaluate neuronal expression is measuring a distinct population of LINE-1 mRNAs from those present in malignant and normal epithelial cells. It is also important to note that mouse and human LINE-1 are not identical. Mice have 3 active LINE-1 families whereas humans have 1, and mouse and human LINE-1 may be regulated in different ways due to the differing structures of their 5′ UTRs.

The strength of the 5′ targeted method is that it pinpoints the transcription start site (TSS) of a LINE-1 transcript, allowing us to distinguish proper LINE-1 transcripts from other transcripts that just happen to include some LINE-1 sequence. This is by no means a problem unique to LINE-1 or unique to human and mouse. What transcript gives rise to transposable element (TE) RNA is an important question for analysis of TE expression, regardless of the element under consideration. Knowing the TSS provides key information about the relevant transcripts. Other TEs in other organisms could be analyzed similarly by replacing individual loci with consensus sequences and then counting only high-quality alignments at the relevant TSS. Alternatively, for TEs that are not as highly repetitive as LINE-1 in human and mouse, cellranger will likely be able to align TEs reads to the genome, making it possible to count reads directly from the standard genome alignment.

In this study, we developed a method to accurately quantify LINE-1 expression in tens of thousands of single cells and were able to make significant progress toward answering the question of what constitutes normal LINE-1 expression and how it compares to tumor LINE-1 expression. However, our results also raise the question as to why LINE-1 is more highly expressed in epithelial cells and neurons compared to other cells. The answer likely involves complex locus specific regulation at specific LINE-1 elements (51,76,77).

DATA AVAILABILITY

3′ lung adenocarcinoma data was accessed from the SRA database: PRJNA510251. 5′ lung squamous cell data is provided as example data from 10x genomics (https://www.10xgenomics.com/resources/datasets/nsclc-tumor-1-standard-5-0-0). SRA bioproject accession numbers for the other datasets are as follows: PRJNA510251 for 3′ lung adenocarcinoma patient data; PRJNA670909 for 5′ normal tissues data; PRJNA781454 for the cell line and non-melanoma skin cancer patient data; PRJNA677926 for 5′ mouse hippocampus data. A full list of data and accession is found in Supplementary Table S1. Scripts and readme necessary to run 5′ scL1seq on other datasets can be found on github (https://github.com/wmckerrow/5pL1sc) and Zenodo (https://doi.org/10.5281/zenodo.7541408).

Supplementary Material

gkad049_Supplemental_Files

ACKNOWLEDGEMENTS

We would like to thank Martin S Taylor for his helpful suggestions to improve figure clarity.

Contributor Information

Wilson McKerrow, Institute for Systems Genetics, NYU Langone Health, New York, NY, USA; Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA.

Larisa Kagermazova, Institute for Systems Genetics, NYU Langone Health, New York, NY, USA; Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA.

Nicole Doudican, Ronald O. Perelman Department of Dermatology, NYU Langone Health, New York, NY, USA.

Nicholas Frazzette, Ronald O. Perelman Department of Dermatology, NYU Langone Health, New York, NY, USA.

Efiyenia Ismini Kaparos, Institute for Systems Genetics, NYU Langone Health, New York, NY, USA; Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA.

Shane A Evans, Center for Computational Molecular Biology, Brown University, Providence, RI, USA; Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, USA.

Azucena Rocha, Center for Computational Molecular Biology, Brown University, Providence, RI, USA; Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, USA.

John M Sedivy, Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, USA; Center on the Biology of Aging, Brown University, Providence, RI, USA.

Nicola Neretti, Center for Computational Molecular Biology, Brown University, Providence, RI, USA; Department of Molecular Biology, Cell Biology, and Biochemistry, Brown University, Providence, RI, USA.

John Carucci, Ronald O. Perelman Department of Dermatology, NYU Langone Health, New York, NY, USA.

Jef D Boeke, Institute for Systems Genetics, NYU Langone Health, New York, NY, USA; Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA; Department of Biomedical Engineering, NYU Tandon School of Engineering, Brooklyn, NY11201, USA.

David Fenyö, Institute for Systems Genetics, NYU Langone Health, New York, NY, USA; Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY, USA.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

NIH [P01AG051449 (NIA to J.S. subcontracts to J.D.B. and D.F.), U24CA210972 (NCI to D.F.), 1R21CA235521 (NCI to J.D.B.)]; N.N. lab was supported by IDeA [P20GM109035 (Center for Computational Biology of Human Disease)] from NIH NIGMS; NIH NIA [1R01AG050582-01A1]; National Cancer Institute, National Institutes of Health [HHSN261200800001E subcontract to W.M. via Seven Bridges Genomics]. Funding for open access charge: NIH [P01AG051449].

Conflict of interest statement. David Fenyö is a Founder and President of The Informatics Factory, and serves or served on the Scientific Advisory Board or consults for: Spectragen Informatics, Protein Metrics, Preverna. Jef Boeke is a Founder and Director of CDI Labs, Inc., a Founder of and consultant to Neochromosome, Inc., a Founder, SAB member of and consultant to ReOpen Diagnostics, LLC and serves or served on the Scientific Advisory Board of the following: Sangamo, Inc., Logomix Inc., Modern Meadow, Inc., Rome Therapeutics, Inc., Sample6, Inc., Tessera Therapeutics, Inc. and the Wyss Institute. John Sedivy is a cofounder of Transposon Therapeutics, Inc., serves as Chair of its Scientific Advisory Board, and consults for Astellas Innovation Management LLC, Atropos Therapeutics, Inc. and Gilead Sciences, Inc. John Carucci receives research funding from Regeneron, is a clinical investigator for Regeneron and Incyte, and prepared educational materials for Genentech.

REFERENCES

  • 1. Burns K.H., Boeke J.D.. Human transposon tectonics. Cell. 2012; 149:740–752. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Swergold G.D. Identification, characterization, and cell specificity of a human LINE-1 promoter. Mol. Cell. Biol. 1990; 10:6718–6729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3. Wei W., Gilbert N., Ooi S.L., Lawler J.F., Ostertag E.M., Kazazian H.H., Boeke J.D., Moran J.V.. Human L1 retrotransposition: cis preference versus trans complementation. Mol. Cell. Biol. 2001; 21:1429–1439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4. Martin S.L. The ORF1 protein encoded by LINE-1: structure and function during L1 retrotransposition. J. Biomed. Biotechnol. 2006; 2006:45621. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5. Feng Q., Moran J.V., Kazazian H.H., Boeke J.D.. Human L1 retrotransposon encodes a conserved endonuclease required for retrotransposition. Cell. 1996; 87:905–916. [DOI] [PubMed] [Google Scholar]
  • 6. Mathias S.L., Scott A.F., Kazazian H.H., Boeke J.D., Gabriel A.. Reverse transcriptase encoded by a human transposable element. Science. 1991; 254:1808–1810. [DOI] [PubMed] [Google Scholar]
  • 7. Mita P., Wudzinska A., Sun X., Andrade J., Nayak S., Kahler D.J., Badri S., LaCava J., Ueberheide B., Yun C.Y.et al.. LINE-1 protein localization and functional dynamics during the cell cycle. Elife. 2018; 7:e30058. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. Cost G.J., Feng Q., Jacquier A., Boeke J.D.. Human L1 element target-primed reverse transcription in vitro. EMBO J. 2002; 21:5899–5910. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Hancks D.C., Kazazian H.H.. Roles for retrotransposon insertions in human disease. Mob. DNA. 2016; 7:9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Miki Y., Nishisho I., Horii A., Miyoshi Y., Utsunomiya J., Kinzler K.W., Vogelstein B., Nakamura Y.. Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 1992; 52:643–645. [PubMed] [Google Scholar]
  • 11. Scott E.C., Gardner E.J., Masood A., Chuang N.T., Vertino P.M., Devine S.E.. A hot L1 retrotransposon evades somatic repression and initiates human colorectal cancer. Genome Res. 2016; 26:745–755. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Cajuso T., Sulo P., Tanskanen T., Katainen R., Taira A., Hänninen U.A., Kondelin J., Forsström L., Välimäki N., Aavikko M.et al.. Retrotransposon insertions can initiate colorectal cancer and are associated with poor survival. Nat. Commun. 2019; 10:4022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Gasior S.L., Wakeman T.P., Xu B., Deininger P.L.. The Human LINE-1 retrotransposon creates DNA double-strand breaks. J. Mol. Biol. 2006; 357:1383–1393. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14. Ardeljan D., Steranka J.P., Liu C., Li Z., Taylor M.S., Payer L.M., Gorbounov M., Sarnecki J.S., Deshpande V., Hruban R.H.et al.. Cell fitness screens reveal a conflict between LINE-1 retrotransposition and DNA replication. Nat. Struct. Mol. Biol. 2020; 27:168–178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Thomas C.A., Tejwani L., Trujillo C.A., Negraes P.D., Herai R.H., Mesci P., Macia A., Crow Y.J., Muotri A.R.. Modeling of TREX1-dependent autoimmune disease using human stem cells highlights L1 accumulation as a source of neuroinflammation. Cell Stem Cell. 2017; 21:319–331. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16. Cecco M.D., Ito T., Petrashen A.P., Elias A.E., Skvir N.J., Criscione S.W., Caligiana A., Brocculi G., Adney E.M., Boeke J.D.et al.. L1 drives IFN in senescent cells and promotes age-associated inflammation. Nature. 2019; 566:73. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Tunbak H., Enriquez-Gasca R., Tie C.H.C., Gould P.A., Mlcochova P., Gupta R.K., Fernandes L., Holt J., van der Veen A.G., Giampazolias E.et al.. The HUSH complex is a gatekeeper of type I interferon through epigenetic regulation of LINE-1s. Nat. Commun. 2020; 11:5387. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Tang Z., Steranka J.P., Ma S., Grivainis M., Rodić N., Huang C.R.L., Shih I.-M., Wang T.-L., Boeke J.D., Fenyö D.et al.. Human transposon insertion profiling: analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer. Proc. Natl. Acad. Sci. U.S.A. 2017; 114:E733–E740. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Rodić N., Steranka J.P., Makohon-Moore A., Moyer A., Shen P., Sharma R., Kohutek Z.A., Huang C.R., Ahn D., Mita P.et al.. Retrotransposon insertions in the clonal evolution of pancreatic ductal adenocarcinoma. Nat. Med. 2015; 21:1060–1064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Tubio J.M.C., Li Y., Ju Y.S., Martincorena I., Cooke S.L., Tojo M., Gundem G., Pipinikas C.P., Zamora J., Raine K.et al.. Mobile DNA in cancer. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science. 2014; 345:1251343. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 21. Rodriguez-Martin B., Alvarez E.G., Baez-Ortega A., Zamora J., Supek F., Demeulemeester J., Santamarina M., Ju Y.S., Temes J., Garcia-Souto D.et al.. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 2020; 52:306–319. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22. Helman E., Lawrence M.S., Stewart C., Sougnez C., Getz G., Meyerson M.. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 2014; 24:1053–1063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Jung H., Choi J.K., Lee E.A.. Immune signatures correlate with L1 retrotransposition in gastrointestinal cancers. Genome Res. 2018; 28:1136–1146. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Gardner E.J., Lam V.K., Harris D.N., Chuang N.T., Scott E.C., Pittard W.S., Mills R.E.1000 Genomes Project Consortium 1000 Genomes Project Consortium Devine S.E.. The Mobile element locator tool (MELT): population-scale mobile element discovery and biology. Genome Res. 2017; 11:1916–1929. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Skowronski J., Singer M.F.. Expression of a cytoplasmic LINE-1 transcript is regulated in a human teratocarcinoma cell line. Proc. Natl. Acad. Sci. U.S.A. 1985; 82:6050–6054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26. Martin S.L. Ribonucleoprotein particles with LINE-1 RNA in mouse embryonal carcinoma cells. Mol. Cell. Biol. 1991; 11:4804–4807. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Hickey D.A. Selfish DNA: a sexually-transmitted nuclear parasite. Genetics. 1982; 101:519–531. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Packer A.I., Manova K., Bachvarova R.F.. A discrete LINE-1 transcript in mouse blastocysts. Dev. Biol. 1993; 157:281–283. [DOI] [PubMed] [Google Scholar]
  • 29. Muotri A.R., Chu V.T., Marchetto M.C.N., Deng W., Moran J.V., Gage F.H.. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature. 2005; 435:903–910. [DOI] [PubMed] [Google Scholar]
  • 30. Bestor T.H. Cytosine methylation mediates sexual conflict. Trends Genet. TIG. 2003; 19:185–190. [DOI] [PubMed] [Google Scholar]
  • 31. Kano H., Godoy I., Courtney C., Vetter M.R., Gerton G.L., Ostertag E.M., Kazazian H.H.. L1 retrotransposition occurs mainly in embryogenesis and creates somatic mosaicism. Genes Dev. 2009; 23:1303–1312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Simon M., Van Meter M., Ablaeva J., Ke Z., Gonzalez R.S., Taguchi T., De Cecco M., Leonova K.I., Kogan V., Helfand S.L.et al.. LINE1 Derepression in aged wild-type and SIRT6-deficient mice drives inflammation. Cell Metab. 2019; 29:871–885. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. De Cecco M., Criscione S.W., Peterson A.L., Neretti N., Sedivy J.M., Kreiling J.A.. Transposable elements become active and mobile in the genomes of aging mammalian somatic tissues. Aging. 2013; 5:867–883. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. Lanciano S., Cristofari G.. Measuring and interpreting transposable element expression. Nat. Rev. Genet. 2020; 21:721–736. [DOI] [PubMed] [Google Scholar]
  • 35. O’Neill K., Brocks D., Hammell M.G.. Mobile genomics: tools and techniques for tackling transposons. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 2020; 375:20190345. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. He J., Babarinde I.A., Sun L., Xu S., Chen R., Shi J., Wei Y., Li Y., Ma G., Zhuang Q.et al.. Identifying transposable element expression dynamics and heterogeneity during development at the single-cell level with a processing pipeline scTE. Nat. Commun. 2021; 12:1456. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37. Stow E.C., Baddoo M., LaRosa A.J., LaCoste D., Deininger P., Belancio V.. SCIFER: approach for analysis of LINE-1 mRNA expression in single cells at a single locus resolution. Mob. DNA. 2022; 13:21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 38. Rodríguez-Quiroz R., Valdebenito-Maturana B.. SoloTE for improved analysis of transposable elements in single-cell RNA-seq data using locus-specific expression. Commun. Biol. 2022; 5:1063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39. Shao W., Wang T.. Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data. Genome Res. 2021; 31:88–100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 40. Linker S.B., Randolph-Moore L., Kottilil K., Qiu F., Jaeger B.N., Barron J., Gage F.H.. Identification of bona fide B2 SINE retrotransposon transcription through single-nucleus RNA-seq of the mouse hippocampus. Genome Res. 2020; 11:1643–1654. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 41. Berrens R.V., Yang A., Laumer C.E., Lun A.T.L., Bieberich F., Law C.-T., Lan G., Imaz M., Bowness J.S., Brockdorff N.et al.. Locus-specific expression of transposable elements in single cells with CELLO-seq. Nat. Biotechnol. 2022; 40:546–554. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42. Sun X., Wang X., Tang Z., Grivainis M., Kahler D., Yun C., Mita P., Fenyö D., Boeke J.D.. Transcription factor profiling reveals molecular choreography and key regulators of human retrotransposon expression. Proc. Natl. Acad. Sci. U.S.A. 2018; 115:E5526–E5535. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 43. Laughney A.M., Hu J., Campbell N.R., Bakhoum S.F., Setty M., Lavallée V.-P., Xie Y., Masilionis I., Carr A.J., Kottapalli S.et al.. Regenerative lineages and immune-mediated pruning in lung cancer metastasis. Nat. Med. 2020; 26:259–269. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 44. Rodić N., Sharma R., Sharma R., Zampella J., Dai L., Taylor M.S., Hruban R.H., Iacobuzio-Donahue C.A., Maitra A., Torbenson M.S.et al.. Long interspersed element-1 protein expression is a hallmark of many human cancers. Am. J. Pathol. 2014; 184:1280–1286. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. McKerrow W., Fenyö D.. L1EM: a tool for accurate locus specific LINE-1 RNA quantification. Bioinforma. Oxf. Engl. 2020; 36:1167–1173. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46. Deininger P., Morales M.E., White T.B., Baddoo M., Hedges D.J., Servant G., Srivastav S., Smither M.E., Concha M., DeHaro D.L.et al.. A comprehensive approach to expression of L1 loci. Nucleic Acids Res. 2017; 45:e31. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Carreira P.E., Ewing A.D., Li G., Schauer S.N., Upton K.R., Fagg A.C., Morell S., Kindlova M., Gerdes P., Richardson S.R.et al.. Evidence for L1-associated DNA rearrangements and negligible L1 retrotransposition in glioblastoma multiforme. Mob. DNA. 2016; 7:21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Philippe C., Vargas-Landin D.B., Doucet A.J., van Essen D., Vera-Otarola J., Kuciak M., Corbin A., Nigumann P., Cristofari G.. Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci. Elife. 2016; 5:e13926. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 49. He S., Wang L.-H., Liu Y., Li Y.-Q., Chen H.-T., Xu J.-H., Peng W., Lin G.-W., Wei P.-P., Li B.et al.. Single-cell transcriptome profiling of an adult human cell atlas of 15 major organs. Genome Biol. 2020; 21:294. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 50. Achanta P., Steranka J.P., Tang Z., Rodić N., Sharma R., Yang W.R., Ma S., Grivainis M., Huang C.R.L., Schneider A.M.et al.. Somatic retrotransposition is infrequent in glioblastomas. Mob. DNA. 2016; 7:22. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51. Stow E.C., Kaul T., deHaro D.L., Dem M.R., Beletsky A.G., Morales M.E., Du Q., LaRosa A.J., Yang H., Smither E.et al.. Organ-, sex- and age-dependent patterns of endogenous L1 mRNA expression at a single locus resolution. Nucleic Acids Res. 2021; 49:5813–5831. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 52. Navarro F.C., Hoops J., Bellfy L., Cerveira E., Zhu Q., Zhang C., Lee C., Gerstein M.B.. TeXP: deconvolving the effects of pervasive and autonomous transcription of transposable elements. PLoS Comput. Biol. 2019; 15:e1007293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53. Wang D., Eraslan B., Wieland T., Hallström B., Hopf T., Zolg D.P., Zecha J., Asplund A., Li L., Meng C.et al.. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol. Syst. Biol. 2019; 15:e8503. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 54. McKerrow W., Wang X., Mendez-Dorantes C., Mita P., Cao S., Grivainis M., Ding L., LaCava J., Burns K.H., Boeke J.D.et al.. LINE-1 expression in cancer correlates with p53 mutation, copy number alteration, and S phase checkpoint. Proc. Natl. Acad. Sci. U.S.A. 2022; 119:e2115999119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 55. Satpathy S., Krug K., Beltran P.M.J., Savage S.R., Petralia F., Kumar-Sinha C., Dou Y., Reva B., Kane M.H., Avanessian S.C.et al.. A proteogenomic portrait of lung squamous cell carcinoma. Cell. 2021; 184:4348–4371. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Wood J., Helfand S.. Chromatin structure and transposable elements in organismal aging. Front. Genet. 2013; 4:274. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 57. Sookdeo A., Hepp C.M., McClure M.A., Boissinot S.. Revisiting the evolution of mouse LINE-1 in the genomic era. Mob. DNA. 2013; 4:3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 58. Bolger A.M., Lohse M., Usadel B.. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014; 30:2114–2120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 59. Javahery R., Khachi A., Lo K., Zenzie-Gregory B., Smale S.T.. DNA sequence requirements for transcriptional initiator activity in mammalian cells. Mol. Cell. Biol. 1994; 14:116–127. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 60. Smale S.T., Kadonaga J.T.. The RNA polymerase II core promoter. Annu. Rev. Biochem. 2003; 72:449–479. [DOI] [PubMed] [Google Scholar]
  • 61. Hao Y., Hao S., Andersen-Nissen E., Mauck W.M., Zheng S., Butler A., Lee M.J., Wilk A.J., Darby C., Zager M.et al.. Integrated analysis of multimodal single-cell data. Cell. 2021; 184:3573–3587. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 62. Stuart T., Butler A., Hoffman P., Hafemeister C., Papalexi E., Mauck W.M., Hao Y., Stoeckius M., Smibert P., Satija R.. Comprehensive integration of single-cell data. 2019; 177:1888–1902. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Butler A., Hoffman P., Smibert P., Papalexi E., Satija R.. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 2018; 36:411–420. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Satija R., Farrell J.A., Gennert D., Schier A.F., Regev A.. Spatial reconstruction of single-cell gene expression data. Nat. Biotechnol. 2015; 33:495–502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Cembrowski M.S., Wang L., Sugino K., Shields B.C., Spruston N.. Hipposeq: a comprehensive RNA-seq database of gene expression in hippocampal principal neurons. Elife. 2016; 5:e14997. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66. Frazzette N., Khodadadi-Jamayran A., Doudican N., Santana A., Felsen D., Pavlick A.C., Tsirigos A., Carucci J.A.. Decreased cytotoxic T cells and TCR clonality in organ transplant recipients with squamous cell carcinoma. Npj Precis. Oncol. 2020; 4:13. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. McKerrow W., Wang X., Mendez-Dorantes C., Mita P., Cao S., Grivainis M., Ding L., LaCava J., Burns K.H., Boeke J.et al.. LINE-1 expression in cancer correlates with DNA damage response, copy number variation, and cell cycle progression. Proc. Natl. Acad. Sci. U.S.A. 2022; 8:e2115999119. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68. Ardeljan D., Wang X., Oghbaie M., Taylor M.S., Husband D., Deshpande V., Steranka J.P., Gorbounov M., Yang W.R., Sie B.et al.. LINE-1 ORF2p expression is nearly imperceptible in human cancers. Mob. DNA. 2020; 11:1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69. Krug L., Chatterjee N., Borges-Monroy R., Hearn S., Liao W.-W., Morrill K., Prazak L., Rozhkov N., Theodorou D., Hammell M.et al.. Retrotransposon activation contributes to neurodegeneration in a Drosophila TDP-43 model of ALS. PLoS Genet. 2017; 13:e1006635. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Tam O.H., Rozhkov N.V., Shaw R., Kim D., Hubbard I., Fennessey S., Propp N.NYGC ALS Consortium NYGC ALS Consortium Fagegaltier D., Harris B.T.et al.. Postmortem cortex samples identify distinct molecular subtypes of ALS: retrotransposon activation, oxidative stress, and activated glia. Cell Rep. 2019; 29:1164–1177. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 71. Guo C., Jeong H.-H., Hsieh Y.-C., Klein H.-U., Bennett D.A., De Jager P.L., Liu Z., Shulman J.M.. Tau activates transposable elements in Alzheimer's disease. Cell Rep. 2018; 23:2874–2880. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 72. Carter V., LaCava J., Taylor M.S., Liang S.Y., Mustelin C., Ukadike K.C., Bengtsson A., Lood C., Mustelin T.. High prevalence and disease correlation of autoantibodies against p40 encoded by long interspersed nuclear elements (LINE-1) in systemic lupus erythematosus. Arthritis Rheumatol. 2019; 1:89–99. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Crow M.K. Long interspersed nuclear elements (LINE-1): potential triggers of systemic autoimmune disease. Autoimmunity. 2010; 43:7–16. [DOI] [PubMed] [Google Scholar]
  • 74. Pizarro J.G., Cristofari G.. Post-transcriptional control of LINE-1 retrotransposition by cellular host factors in somatic cells. Front. Cell Dev. Biol. 2016; 4:14. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 75. Gorbunova V., Seluanov A., Mita P., McKerrow W., Fenyö D., Boeke J.D., Linker S.B., Gage F.H., Kreiling J.A., Petrashen A.P.et al.. The role of retrotransposable elements in ageing and age-associated diseases. Nature. 2021; 596:43–53. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 76. Taylor D., Lowe R., Philippe C., Cheng K.C.L., Grant O.A., Zabet N.R., Cristofari G., Branco M.R.. Locus-specific chromatin profiling of evolutionarily young transposable elements. Nucleic Acids Res. 2022; 50:e33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 77. Freeman B., White T., Kaul T., Stow E.C., Baddoo M., Ungerleider N., Morales M., Yang H., Deharo D., Deininger P.et al.. Analysis of epigenetic features characteristic of L1 loci expressed in human cells. Nucleic Acids Res. 2022; 50:1888–1907. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

gkad049_Supplemental_Files

Data Availability Statement

3′ lung adenocarcinoma data was accessed from the SRA database: PRJNA510251. 5′ lung squamous cell data is provided as example data from 10x genomics (https://www.10xgenomics.com/resources/datasets/nsclc-tumor-1-standard-5-0-0). SRA bioproject accession numbers for the other datasets are as follows: PRJNA510251 for 3′ lung adenocarcinoma patient data; PRJNA670909 for 5′ normal tissues data; PRJNA781454 for the cell line and non-melanoma skin cancer patient data; PRJNA677926 for 5′ mouse hippocampus data. A full list of data and accession is found in Supplementary Table S1. Scripts and readme necessary to run 5′ scL1seq on other datasets can be found on github (https://github.com/wmckerrow/5pL1sc) and Zenodo (https://doi.org/10.5281/zenodo.7541408).


Articles from Nucleic Acids Research are provided here courtesy of Oxford University Press

RESOURCES