Summary
Transposable elements (TEs) are now recognized not only as parasitic DNA, whose spread in the genome must be controlled by the host, but also as major players in genome evolution and regulation1,2,3,4,5,6. Long INterspersed Element-1 (LINE-1 or L1), the only currently autonomous mobile transposon in humans, occupies 17% of the genome and continues to generate inter- and intra-individual genetic variation, in some cases resulting in disease1,2,3,4,5,6,7. Nonetheless, how L1 activity is controlled and what function L1s play in host gene regulation remain incompletely understood. Here, we use CRISPR/Cas9 screening strategies in two distinct human cell lines to provide the first genome-wide survey of genes involved in L1 retrotransposition control. We identified functionally diverse genes that either promote or restrict L1 retrotransposition. These genes, often associated with human diseases, control the L1 lifecycle at transcriptional or post-transcriptional levels and in a manner that can depend on the endogenous L1 sequence, underscoring the complexity of L1 regulation. We further investigated L1 restriction by MORC2 and human silencing hub (HUSH) complex subunits MPP8 and TASOR8. HUSH/MORC2 selectively bind evolutionarily young, full-length L1s located within a transcriptionally permissive euchromatic environment, and promote H3K9me3 deposition for transcriptional silencing. Interestingly, these silencing events often occur within introns of transcriptionally active genes and lead to down-regulation of host gene expression in a HUSH/MORC2-dependent manner. Together, we provide a rich resource for studies of L1 retrotransposition, elucidate a novel L1 restriction pathway, and illustrate how epigenetic silencing of TEs rewires host gene expression programs.
Most of our knowledge about L1 retrotransposition control comes from studies examining individual candidate genes2,3,4,5,6. To systematically identify genes regulating L1 retrotransposition, we performed a genome-wide CRISPR/Cas9 screen in human chronic myeloid leukemia K562 cells using an L1-G418R retrotransposition reporter9 (Fig. 1a,b). Importantly, the L1-G418R reporter was modified to be driven by a doxycycline (dox)-responsive promoter, as opposed to the native L1 5’UTR, to avoid leaky retrotransposition ahead of the functional screen (Extended Data Fig. 1a–c). The cells become G418R antibiotic resistant only when the L1-G418R reporter undergoes a successful retrotransposition event following dox-induction (Fig. 1b). For the screen, we transduced clonal L1-G418R cells with a lentiviral genome-wide sgRNA library such that each cell expressed a single sgRNA10. We then dox-induced the cells to turn on the L1-G418R reporter for retrotransposition, and split the cells into G418-selected conditions and unselected conditions, which served to eliminate cell growth bias in the screen analysis. The frequencies of sgRNAs in the two populations were measured by deep sequencing (Fig. 1a) and analyzed using Cas9 high-Throughput maximum Likelihood Estimator (CasTLE)11. Consequently, cells transduced with sgRNAs targeting L1 suppressors would have more retrotransposition events than negative control cells and would be enriched through the G418 selection; conversely, cells transduced with sgRNAs targeting L1 activators would be depleted.
Using the above strategy, we identified 25 putative L1 regulators at a 10% FDR cutoff, and 150 genes at a 30% FDR cutoff (Fig. 1c and Extended Data Fig. 1d; see Table S1 for full list). Despite low statistical confidence, many of the 30% FDR cutoff genes overlapped previously characterized L1 regulators (e.g. ALKBH1, SETDB1) and genes functioning in complexes with our top 10% FDR hits (e.g. Fanconi Anemia pathway, HUSH complex), suggesting that they likely encompassed biologically relevant hits. To increase statistical power in distinguishing bona fide L1 regulators among these, we performed a high-coverage secondary screen targeting the 30% FDR hits (150 genes) and an additional 100 genes that were either functionally related to our top hits or which were otherwise previously known to regulate L1 but fell outside of the 30% FDR cutoff threshold (See Table S2 for full list). This secondary screen validated 90 genes out of the top 150 genome-wide screen hits, a fraction close to expected with the 30% FDR cutoff (Fig. 1d and Extended Data Fig. 2a–c).
Altogether, our two-tier screening approach identified 142 human genes that either activate or repress L1 retrotransposition in K562 cells, encompassing over 20 previously known L1 regulators (Extended Data Fig. 2d). Novel candidates are involved in functionally diverse pathways, such as chromatin/transcriptional regulation, DNA damage/repair, and RNA processing (Extended Data Fig. 2e,f). While many DNA damage/repair factors, particularly the Fanconi Anemia (FA) factors, suppress L1 activity, genes implicated in the Non-Homologous End Joining (NHEJ) repair pathway promote L1 retrotransposition (Extended Data Fig. 2f). In agreement, mutations in some of the identified NHEJ factors were previously found to result in decreased retrotransposition frequencies12. Intriguingly, many hits uncovered by our screen (e.g. FA factors, MORC2 and SETX) are associated with human disorders13,14,15,16,17.
To extend our survey of L1 regulators to another cell type, we performed both a genome-wide and a secondary screen in HeLa cells (Extended Data Fig. 1b, 1e) with the same sgRNA libraries used in the K562 screens. Importantly, top hits identified in the K562 genome-wide screen were recapitulated in the HeLa screen (e.g. MORC2, TASOR, SETX, MOV10) (Extended Data Fig. 3a). Furthermore, secondary screens in both K562 and HeLa cells showed concordant effects for groups of genes, for example, the suppressive effects of the FA complex genes, and activating effects of the NHEJ pathway genes (Extended Data Fig. 3b–e). Interestingly, however, a subset of genes showed cell-line selective effects (Extended Data Fig. 3c). At the same time, some of the previously known L1 regulators did not come up as hits in our screen. Several factors could have limited our ability to identify all genes controlling L1 retrotransposition to saturation, such as: (i) a subset of regulators may function in a cell-type specific manner not captured by either K562 or HeLa screens, (ii) essential genes with strong negative effects on cell growth may have dropped out, (iii) regulators that strictly require native L1 UTR sequences may have been missed due to our reporter design. Nonetheless, our combined screens identify many novel candidates for L1 retrotransposition control in human cells and provide a rich resource for mechanistic studies of TEs.
Select screen hits were further validated in K562 cells using a well-characterized L1-GFP reporter18 (Extended Data Fig. 1a), confirming 13 suppressors and 1 activator (SLTM) out of 16 examined genes (Fig. 1e). Interestingly, chromatin regulators (TASOR, MORC2, MPP8, SAFB and SETDB1) suppress the retrotransposition of L1-GFP reporter, but not that of a previously described codon-optimized L1-GFP reporter (hereinafter referred to as (opt)-L1-GFP)19,20, indicating that these factors regulate L1 retrotransposition in a manner dependent upon the native L1 ORF nucleotide sequence (Extended Data Fig. 3f,g). An additional secondary screen against the codon-optimized (opt)-L1-G418R reporter in K562 cells confirmed the sequence-dependent feature of these L1 regulators, and systematically partitioned our top screen hits into native L1 sequence-dependent and –independent candidates (Extended Data Fig. 3h, see Table S2 for full list).
We next examined whether the identified regulators influence the expression of endogenous L1Hs, the youngest and only retrotransposition-competent L1 subfamily in humans. CRISPR-deletion of some genes (TASOR, MPP8, SAFB and MORC2) significantly increased expression of endogenous L1Hs, whereas deletion of other genes, such as SETX, RAD51 or FA complex components, had little effect (Fig. 1f). Since all interrogated genes restrict L1-GFP retrotransposition into the genome (Fig. 1e and Extended Data Fig. 4a), our results suggest that identified suppressors can function at either transcriptional or posttranscriptional level.
We further investigated three candidate transcriptional regulators of L1: MORC2, TASOR and MPP8. TASOR and MPP8 (along with PPHLN1), comprise the HUSH complex and recruit the H3K9me3 methyltransferase SETDB1 to repress genes8. Notably, PPHLN1 and SETDB1 also came up as L1 suppressors in our screen (Fig. 1d and Extended Data Fig. 3b). MORC2, which has recently been shown to biochemically and functionally interact with HUSH21, is a member of the microrchidia (MORC) protein family that has been implicated in transposon silencing in plants and mice22,23. While MORC2/HUSH have been previously implicated in heterochromatin formation, most heterochromatin factors had no impact on L1 retrotransposition, suggesting a selective effect (Fig. 2a and Extended Data Fig. 4b).
Several independent experiments in clonal knockout (KO) K562 lines confirmed that HUSH and MORC2 suppress the retrotransposition of the L1-GFP reporter by silencing its transcription (Fig. 2b,c and Extended Data Fig. 4c–f). Additionally, HUSH/MORC2 repressed endogenous (non-reporter) L1Hs RNA and protein expression in both K562 and human embryonic stem cells24 (hESC, H9) (Fig. 2d and Extended Data Fig. 4g–k). PolyA-selected RNA sequencing (RNA-seq) experiments revealed up-regulated expression of evolutionarily younger L1PA families (including L1Hs) upon HUSH or MORC2 KO in K562 cells (Fig. 2e). Taken together, these data demonstrate that HUSH/MORC2 silence both the reporter transgene as well as endogenous evolutionarily young L1s.
Chromatin immunoprecipitation followed by sequencing (ChIP-seq) from K562 cells and hESCs demonstrated that MORC2, MPP8 and TASOR co-bind genomic regions characterized by specific L1 instances. Elements from the primate-specific L1P family showed higher enrichment than the older L1M family elements (Fig. 3a,b and Extended Data Fig. 5a,b, 7a,b), consistent with the preferential derepression of the former upon HUSH or MORC2 KO (Fig. 2e). Moreover, this enrichment was specific to L1s, as other major repeat classes were not enriched (Fig. 3b and Extended Data Fig. 7b), although all three proteins also targeted expressed KRAB-ZNF genes (Extended Data Fig. 5c,d). HUSH KO in K562 cells almost completely abrogated MORC2 binding at L1s (consistent with recently published observations that HUSH recruits MORC2 for transcriptional repression21), whereas MORC2 deletion led to a modest, but appreciable decrease of HUSH subunit binding (Extended Data Fig. 6). In mouse ESCs, MPP8 bound retrotransposition-competent L1Md-A and L1Md-T, as well as IAP elements, a class of murine endogenous retroviruses that remain currently mobile in the mouse genome (Extended Data Fig. 7c,d), suggesting that regulators uncovered by our study in human cells may in other species target additional active transposons beyond L1s.
Interestingly, even within younger human L1Ps only a subset is bound by HUSH/MORC2 in either K562 cells or hESCs, and we sought to identify genomic or epigenomic features that could explain this selectivity. We found that HUSH/MORC2 selectively target young full-length L1s, particularly the L1PA1-5 in human cells (Fig. 3c,d) and L1Md-A/T in mice (Extended Data Fig. 7e). Both MPP8 and MORC2 bind broadly across the L1: while MORC2 binding is skewed towards the 5’ end, MPP8 shows higher enrichments within the body and at 3’ end of L1PAs, including the L1Hs (L1PA1) elements (Extended Data Fig. 7f,g).
Nonetheless, preference for the full-length, evolutionarily younger L1PAs can only partially explain observed HUSH/MORC2 selectivity, as only a subset of such elements is targeted by the complex (Fig. 3d). We found that the additional layer of selectivity can be explained by the state of surrounding chromatin, with HUSH/MORC2-occupied L1s preferentially immersed within the transcriptionally permissive euchromatic environment marked by modifications such as H3K4me3 and H3K27ac (Fig. 3e). In agreement, HUSH/MORC2-bound L1s are enriched within introns of actively transcribed genes (Extended Data Fig. 8a,b). Furthermore, although most HUSH/MORC2-bound L1s are concordant between K562 and hESCs, those that are bound in a cell type-specific manner tend to be associated with genes that are differentially active between the two cell types (Extended Data Fig. 8c). To understand the role of transcription in HUSH/MORC2 targeting of L1s, we investigated MORC2 and MPP8 occupancy at the inducible L1 transgene. We observed increased binding of these factors upon transcriptional induction (Extended Data Fig. 8d), suggesting that transcription through L1 sequences facilitates HUSH/MORC2 binding. Taken together, HUSH/MORC2 selectively target young, full-length L1s located within transcriptionally permissive euchromatic regions, which are precisely the elements that pose the highest threat to genome integrity, as a subset of them remains mobile and transcription is the first step of L1 mobilization.
Despite their immersion within the euchromatic environment, HUSH/MORC2-bound L1s themselves are heavily decorated with the transcriptionally repressive H3K9me3 (Fig. 3e), consistent with the role of HUSH in facilitating H3K9me3 deposition at target sites8. HUSH/MORC2 KO decreased H3K9me3 level preferentially at L1 versus non-L1 HUSH/MORC2 genomic targets, and at bound versus unbound L1s (Fig. 4a and Extended Data Fig. 9a,b). Since HUSH/MORC2-bound L1s are significantly enriched within introns of transcriptionally active genes (Extended Data Fig. 8a–c), we examined whether HUSH/MORC2 recruitment and its associated H3K9me3 deposition can influence chromatin modification and expression of the host genes. Despite the transcriptionally active status (Extended Data Fig. 8a,b), promoters and especially bodies of genes harboring MORC2/HUSH-bound L1s show appreciable levels of H3K9me3. This enrichment is substantially diminished in the KO lines (Extended Data Fig. 9c) with the concomitant upregulation of genes harboring MORC2/HUSH-bound L1s, but not those with unbound intronic L1s (Fig. 4b). Thus, HUSH/MORC2 binding at intronic L1s leads to a modest, but significant down-regulation of the active genes that harbor them (Fig. 4c and Extended Data Fig. 9d–g, 10a).
Inserting L1 sequences into a transcript decreases RNA expression through inadequate transcript elongation25, and this effect has been attributed to the A/T enrichment of L1s. However, our results argue that transcriptional attenuation of host gene expression could be a consequence of epigenetic silencing by HUSH/MORC2 (Fig. 4b,c and Extended Data Fig. 9d–g, 10a), and this possibility is consistent with the described role of genic H3K9me3 in decreasing Pol II elongation rate, leading to its accumulation over the H3K9me3 region26. If such a mechanism is at play, then HUSH KO should decrease accumulation of the elongating Pol II over L1 bodies, and this is indeed what we observe in Pol II ChIP-seq experiments (though interestingly, at 5′ UTRs of L1s, Pol II levels are relatively elevated in the KOs) (Extended Data Fig. 10b).
Importantly, host gene regulation is directly dependent on the presence of the intronic L1, as deletion of select MORC2/HUSH-bound L1s from the intron led to the upregulation of host mRNA to a level commensurate with the magnitude of changes caused by HUSH/MORC2 KO (Fig. 4d,e and Extended Data Fig. 10c,d). Thus, dampening expression levels of an active gene can be a by-product of a retrotransposition event and associated HUSH/MORC2-mediated L1 silencing (Fig. 4f). Although observed effects on active host genes are only modulatory, they occur to various extents at hundreds of human genes, illustrating how TE activity can rewire host gene expression patterns.
METHODS
Cell culture and antibodies
K562 cells (ATCC) were grown in Roswell Park Memorial Institute (RPMI) 1640 Medium (11875093, Life Technologies) supplemented with 10% Fetal Bovine Serum (Fisher, Cat# SH30910), 2 mM L-glutamine (Fisher, Cat# SH3003401) and 1% penicillin-streptomycin (Fisher, Cat#SV30010), and cultured at 37 °C with 5% CO2. HeLa cells (ATCC) were grown in Dulbecco’s Modified Eagle’s Medium (Life Technologies, Cat# 11995073) supplemented with 10% FBS, 2 mM L-glutamine, and 1% penicillin-streptomycin, and cultured at 37 °C with 5% CO2. H9 human ES cells were expanded in feeder-free, serum-free medium mTeSR-1 from StemCell technologies, passaged 1:6 every 5–6 days using accutase (Invitrogen) and re-plated on tissue culture dishes coated overnight with growth-factor-reduced matrigel (BD Biosciences). Male mouse embryonic stem cells (R1) were grown as described28. Cell cultures were routinely tested and found negative for mycoplasma infection (MycoAlert, Lonza).
Rabbit MORC2 antibody (A300-149A, Bethyl Laboratories), Rabbit MPP8 antibody (16796-1-AP, Protein Technologies Inc), Rabbit TASOR antibody (HPA006735, Atlas Antibodies) were used in Western blots (1:1000 dilution) and ChIP assays. Mouse anti-LINE-1 ORF1p antibody (MABC1152, Millipore)29, Rabbit HSP90 (C45G5, Cell Signalling, #4877), Beta actin antibody (ab49900, Abcam) were used in Western blots. Histone H3 (tri-methyl K9) antibody (ab8898, Abcam) and RNA Pol II (Santa Cruz Biotechnology, N-20 sc-899) were used in ChIP assays.
L1 reporters
The L1-ORF1-ORF2 sequence is derived from the LRE-GFP30, a gift from John Moran. To make the L1-GFP reporter, we used Gibson assembly to clone the L1_ORF1/2 fragment and a GFP-B-globin-intron cassette driven by the mammalian promoter EF1a into the pB transgene using a dox inducible promoter (modified from PBQM812A-1, System Biosciences) to drive the L1 sequence and a UBC-RTTA3-ires Blast as a selectable marker for reporter integration. To make the L1-G418R reporter, we replaced the GFP-B-globin-intron fragment in the L1-GFP reporter with a NEO-intron-NEO cassette driven by the mammalian promoter EF1a. The codon-optimized L1-ORF1-ORF2 sequence in our (opt)-L1 reporter is derived from the SynL1_optORF1_neo, a gift from Astrid Engel31. We replaced the self-splicing Tetrahymena NEO-intron-NEO cassette with the neo-B-globin-intron-neo cassette driven by the EF1a promoter or the GFP-B-globin-intron-GFP cassette driven by the EF1a promoter. This L1-syn-ORF1-ORF2-indicator cassette was inserted into the pB transgene using a dox inducible promoter and a UBC-RTTA3-ires Blast, as described above.
Genome-wide screen in K562 cells
The K562 cell line (with a BFP-Cas9 lentiviral transgene) was nucleofected with the pB-tetO-L1-G418R/Blast construct and the piggyBac transposase (PB210PA-1, System Biosciences) following the manufacturer’s instructions (Lonza 2b nucleofector, T-016 program). The nucleofected cells were sorted using limiting dilution in 96-well plates, and positive clones were screened first for sensitivity to Blast, and then for the ability to generate G418-resistant cells after dox induction. The Cas9/L1-G418R cells were lentivirally infected with a genome-wide sgRNA library as described10, containing ~200,000 sgRNAs targeting 20,549 protein-coding genes and 13,500 negative control sgRNAs, at an MOI of 0.3-0.4 (as measured by the mCherry fluorescence from the lentiviral vector), and selected for lentiviral integration using puromycin (1 μg/ml) for 3 days as the cultures were expanded for the screens. In duplicate, 200 × 10^6 library-infected cells were dox-induced (1 μg/ml) for 10 consecutive days, with logarithmic growth (500k cells/ml) maintained each day of the dox-induction. After dox-induction, the cells were recovered in normal RPMI complete media for 24 hours, and then split into the G418-selection condition (300 μg/ml G418, Life Technologies, Cat# 11811031) and non-selection conditions. After 7 days of maintaining cells at 500k/ml, 200 M cells under each condition were recovered in normal RPMI media for 24 hours, before they were pelleted by centrifugation for genomic DNA extraction using Qiagen DNA Blood Maxi kit (Cat# 51194) as described32. The sgRNA-encoding constructs were PCR-amplified using Agilent Herculase II Fusion DNA Polymerase (Cat# 600675) (See Table S4 for the primer sequences used). These libraries were then sequenced across two Illumina NextSeq flow cells (~40 M reads per condition; ~200× coverage per library element).
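The coverage figures quoted for the genome-wide screen follow from simple division of cell or read numbers by library size. The sketch below reproduces that arithmetic under the assumption that the library holds roughly 200,000 elements in total; the function name `coverage` is illustrative and not part of the original analysis.

```python
def coverage(n_units: int, library_size: int) -> float:
    """Average number of cells (or reads) per library element."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return n_units / library_size


# Assumption: ~200,000 library elements, as quoted in the text above.
LIBRARY_SIZE = 200_000

cells_cov = coverage(200_000_000, LIBRARY_SIZE)  # cells carried per condition
reads_cov = coverage(40_000_000, LIBRARY_SIZE)   # sequencing reads per condition

print(f"cell coverage ~{cells_cov:.0f}x, read coverage ~{reads_cov:.0f}x")
```

If the 13,500 negative control sgRNAs are counted in addition to the ~200,000 targeting sgRNAs, the read coverage drops only slightly and still rounds to roughly 200×.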
Computational analysis of genome-wide screen was performed as previously described10,11 using CasTLE, which is a maximum likelihood estimator that uses a background of negative control sgRNAs as a null model to estimate gene effect sizes. See Table S1 for the K562 genome-wide screen results.
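To make the enrichment logic concrete, the sketch below computes per-sgRNA log2 ratios between selected and unselected populations and summarizes them per gene relative to negative control guides. This is a deliberately simplified illustration and is not the CasTLE maximum likelihood model; the function names, the pseudocount, the median-based summary, and the toy gene labels (`GENE_S`, `GENE_A`) are all assumptions made for the example.

```python
import math
from statistics import median


def log2_enrichment(selected, unselected, pseudocount=1.0):
    """Per-sgRNA log2 ratio of depth-normalized counts (selected / unselected)."""
    sel_total = sum(selected.values())
    unsel_total = sum(unselected.values())
    ratios = {}
    for guide, count in selected.items():
        sel = (count + pseudocount) / sel_total
        unsel = (unselected.get(guide, 0) + pseudocount) / unsel_total
        ratios[guide] = math.log2(sel / unsel)
    return ratios


def gene_scores(ratios, guide_to_gene, control_label="control"):
    """Median log2 ratio per gene, centered on the median of control guides."""
    controls = [r for g, r in ratios.items() if guide_to_gene[g] == control_label]
    center = median(controls)
    by_gene = {}
    for guide, ratio in ratios.items():
        gene = guide_to_gene[guide]
        if gene != control_label:
            by_gene.setdefault(gene, []).append(ratio - center)
    return {gene: median(values) for gene, values in by_gene.items()}


# Toy counts: guides against a hypothetical suppressor are enriched after
# G418 selection, guides against a hypothetical activator are depleted.
selected = {"sup_1": 400, "sup_2": 300, "act_1": 20, "act_2": 30,
            "ctrl_1": 100, "ctrl_2": 100}
unselected = {guide: 100 for guide in selected}
guide_to_gene = {"sup_1": "GENE_S", "sup_2": "GENE_S",
                 "act_1": "GENE_A", "act_2": "GENE_A",
                 "ctrl_1": "control", "ctrl_2": "control"}

scores = gene_scores(log2_enrichment(selected, unselected), guide_to_gene)
print(scores)
```

A positive score corresponds to a candidate L1 suppressor and a negative score to a candidate activator, mirroring the direction of enrichment described for the screen.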
Secondary screen in K562 cells
The secondary screen library included the following, non-comprehensive sets of genes (253 genes in total, ~10 sgRNAs per gene, plus 2500 negative control sgRNAs): all genes falling within ~30% FDR from the K562 genome-wide screen (~150 genes), genes known to be functionally related to the 30% FDR genes, genes previously implicated in L1 biology, and genes involved in epigenetic regulation or position effect variegation (see Table S2 for a complete list). The library oligos were synthesized by Agilent Technologies and cloned into pMCB320 using BstXI/BlpI overhangs after PCR amplification. The Cas9/L1-G418R (or Cas9/(opt)-L1-G418R) K562 cell line was lentivirally infected with the secondary library (~4,500 elements) at an MOI of 0.3-0.4 as described previously33. After puromycin selection (1 μg/ml for 3 days) and expansion, 40 M cells (~9,000× coverage per library element) were dox-induced for 10 days in replicate, recovered for 1 day, and split for 7-day G418-selection and non-selection conditions, with logarithmic growth (500k cells/ml) maintained as in the K562 genome-wide screen. 10 M cells under each condition were used for genomic extractions, sequenced (~6-10 M reads per condition; ~1000-2000× coverage per library element) and analyzed using CasTLE as described above10,11. See Table S2 for the K562 secondary screen results with L1-G418R and (opt)-L1-G418R.
Genome-wide screen and Secondary screen in HeLa cells
The pB-tetO-L1-G418R/Blast construct was integrated into Cas9 expressing HeLa cells with piggyBac transposase via nucleofection (Lonza 2b nucleofector, I-013 program) following the manufacturer’s instructions. The Cas9/L1-G418R HeLa cells were blasticidin (10 μg/ml) selected, screened for sensitivity to G418 and the ability to generate G418-resistant cells after dox induction, and lentivirally infected with the genome-wide sgRNA library or with the secondary sgRNA library. Infected cells were then puromycin selected (1 μg/ml) for 5 days and expanded for the screens.
For the genome-wide screen, ~200 × 10^6 Cas9/L1-G418R HeLa cells (~1,000× coverage of sgRNA library) were dox-induced for 10 days in replicate, recovered for 1 day, and split for 8-day G418-selection and non-selection conditions, with cells being split every other day to maintain the sgRNA library at a minimum of ~350× coverage. ~200 M (1,000× coverage) cells per condition were used for genomic extractions and sequencing as described above for the K562 screens. See Table S1 for the HeLa genome-wide screen results.
For the secondary screen, ~1 × 10^7 Cas9/L1-G418R HeLa cells (~2,000× coverage of sgRNA library) were dox-induced for 10 days in replicate, recovered for 1 day, and split for 8-day G418-selection and non-selection conditions, with cells being split every other day to maintain ~400× coverage. ~5 million (1,000× coverage) cells per condition were used for genomic extractions and sequencing as described above. See Table S2 for the HeLa secondary screen results.
Validation of individual candidates using the L1-GFP retrotransposition assay
To validate the genome-wide screen hits, we infected clonal Cas9/L1-GFP K562 cells with individual sgRNAs as previously described32, generating 3 independent mutant cell lines per gene, each with a different sgRNA (cloned into pMCB320 using BstXI/BlpI overhangs; mU6:sgRNA; EF1a:Puromycin-t2a-mCherry). See Table S3 for sgRNA sequences. The infected cells were selected with puromycin (1 μg/ml) for 3 days, recovered in fresh RPMI medium for 1 day, and dox-induced for 10 days. Then, the percentage of GFP(+) cells was measured on a BD Accuri C6 Flow Cytometer (GFP fluorescence detected in FL1 using 488 nm laser) after gating for live mCherry(+) cells.
CRISPR-mediated deletion of individual genes and intronic L1s
To delete genes in H9 ESCs, we cloned target sgRNAs in pSpCas9(BB)-2A-GFP (PX458) as described34. The sgRNA plasmids were prepared with the Nucleospin plasmid kit (Macherey Nagel) and transfected into H9 ESCs using Fugene following the manufacturer’s instructions. After 48-72 hrs, GFP-positive transfected cells were sorted and expanded. Gene depletion effects were validated by western blots.
To delete the L1 from the host gene intron, we designed sgRNAs targeting both upstream and downstream side of the L1 within the intron; one was cloned into pSpCas9(BB)-2A-BFP, while the other into pSpCas9(BB)-2A-GFP. The two sgRNA plasmids were mixed at 1:1 ratio and nucleofected into K562 cells via electroporation following the manufacturer’s instructions. After 48-72 hours, BFP/GFP-positive transfected cells were single-cell sorted and expanded. The genetic deletion effects were validated by PCR assay.
Western blotting
Live cells were lysed for 30 min at 4°C in protein extraction buffer (300 mM NaCl, 100 mM Tris pH 8, 0.2 mM EDTA, 0.1% NP40, 10% glycerol) with protease inhibitors and centrifuged to collect the supernatant lysate. The cell lysate was measured with Bradford reagent (Biorad), separated on SDS-PAGE gels and transferred to nitrocellulose membranes. The L1-reporter containing K562 cells had not been dox-induced when used for western blot assays characterizing endogenous L1_ORF1p levels (Fig. 2d and Extended Data Fig. 4k).
PCR and gel electrophoresis
PCR experiments characterizing the L1-G418R retrotransposition and the deletion of intronic L1s were performed with Phusion High-Fidelity DNA Polymerase (M0530S, NEB), following the manufacturer’s instructions. In general, 30 cycles of PCR reactions were performed at an annealing temperature 5 °C below the Tm of the primer. No ‘spliced’ PCR products can be detected without dox-induction, even with 40 PCR cycles. PCR reaction products were separated on 1% agarose gels with ethidium bromide. Primer sequences are in Table S4.
qRT-PCR and PspGI-assisted qPCR
Total RNA was isolated from live cells using the RNeasy kit (74104, Qiagen) and treated with RNase-Free DNase Set (79254, Qiagen) to remove genomic DNA, according to the manufacturer’s instructions. 500 ng total RNA was reverse transcribed with SuperScript III First-Strand Synthesis System (18080051, Life Technologies) following the manufacturer’s instructions. Beta-actin mRNA was used as internal control within each RNA sample (Fig. 1f and 4d,e). The sequences of PCR primers, including the one targeting the 5′UTR of L1Hs35,36,37, are summarized in Table S4.
Genomic DNA was isolated using PureLink Genomic DNA Mini Kit (K182001, Life Technologies) with RNase A digestion to remove contaminant RNA, according to the manufacturer’s instructions. 300 ng genomic DNA per sample was digested with 50 units PspGI (R0611S, New England Biolabs) in 1× smart buffer (NEB) at 75 °C for 1hr, to cut uniquely at the intron of the GFP cassette. The reaction mixture was then used in qPCR experiments with primers flanking the intron in the GFP cassette (Table S4). Due to the PspGI digestion, the original unspliced L1-GFP reporter will not be amplified by PCR. Only newly integrated GFP cassettes, where the intron was removed during the retrotransposition process, can be PCR amplified. qPCR runs and analysis were performed on the Light Cycler 480II machine (Roche).
Northern Blotting
Northern blotting was conducted as previously described38. Briefly, 15 μg of total RNA from K562 cells or H9 ESCs was separated on a 0.7% formaldehyde agarose gel, capillary transferred overnight in 20× SSC to the Hybond N membrane (GE Healthcare), crosslinked with a Stratalinker (Stratagene), and hybridized with 32P-labeled single-stranded DNA probes (10^6 cpm/ml) in ULTRAhyb-Oligo Hybridization Buffer (AM8663, Life Technologies) following the manufacturer’s instructions. Blots were washed two times with wash buffer (2× SSC, 0.5% SDS), and then exposed to film overnight to several days at –80 °C with an intensifying screen. The sequence of oligonucleotide probes is in Table S3.
Single molecule FISH
Single molecule FISH (smFISH) assays were performed following the Affymetrix Quantigene ViewRNA ISH Cell Assay user manual. 2.5-5 million live K562 cells were fixed in 4% formaldehyde in 1× PBS for 60 mins at RT, resuspended in 1× PBS, pipetted onto poly-L-lysine coated glass cover slips (~20,000 total cells/spot; spread out with a pipette tip), and baked in a dry oven at 50±1 °C for 30 minutes to fix the cells onto the glass slip, followed by digestion with Protease QS (1:4000) in 1× PBS for 10 minutes at RT. Cells were hybridized with smFISH probes, designed to target beta actin mRNA (FITC channel) and the L1-GFP reporter mRNA (Cy3 channel), DAPI stained for 5 mins, and mounted with Prolong Gold Antifade Reagent (10 ml/sample). Images were taken on a spinning disk confocal microscope equipped with a 60× 1.27NA water immersion objective with an effective pixel size of 108×108 nm. Specifically, for each field of view, a z-series of 8 μm was taken with 0.5 μm/z-step for all 3 channels. For quantitation, maximum-projected images from the z-series were analyzed with a custom-written MATLAB script. In brief, all images were first background-subtracted, with the background determined by the Otsu method39 from the log-transformed image after pillbox blurring with a radius of 3 pixels. mRNA puncta were segmented with a tophat filter on the background-subtracted images, and only puncta above the 25th percentile intensity of all segmented puncta were taken for downstream analysis. Each punctum was then assigned to the nuclear mask identified by image areas above the previously determined background. For each single cell, the assigned pixel area of L1-GFP mRNA was then normalized to the assigned pixel area of beta-actin mRNA per cell.
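The background-determination step can be illustrated with a minimal Otsu threshold applied to log-transformed intensities. This sketch is written in Python rather than MATLAB, works on a flat list of pixel values instead of an image, omits the pillbox blur and tophat segmentation, and uses an exhaustive search over observed values; it is an assumption-laden illustration, not the custom script used in the study.

```python
import math


def otsu_threshold(values):
    """Return the value t that maximizes between-class variance when
    splitting the data into {v <= t} and {v > t} (exhaustive search)."""
    if not values:
        raise ValueError("values must not be empty")
    vals = sorted(values)
    n = len(vals)
    total = sum(vals)
    best_t, best_var = vals[0], -1.0
    cum = 0.0
    for i in range(1, n):
        cum += vals[i - 1]
        if vals[i] == vals[i - 1]:
            continue  # only split between distinct values
        w0 = i / n                      # weight of the lower class
        w1 = 1.0 - w0                   # weight of the upper class
        m0 = cum / i                    # mean of the lower class
        m1 = (total - cum) / (n - i)    # mean of the upper class
        between_var = w0 * w1 * (m0 - m1) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, vals[i - 1]
    return best_t


# Toy pixel intensities: a dim background population and a bright signal.
intensities = [1, 1, 2, 2, 10, 11, 12]
threshold = otsu_threshold([math.log(v) for v in intensities])
foreground = [v for v in intensities if math.log(v) > threshold]
print(threshold, foreground)
```

Pixels above the threshold are treated as signal; in the described workflow this background estimate is also reused to define the mask to which puncta are assigned.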
RNA-seq
DNA-free total RNA was isolated from two independent biological replicates of cultured K562 cells, using the RNeasy kit (74104, Qiagen) combined with the RNase-Free DNase Set (79254, Qiagen). PolyA-selected RNA was isolated using ‘Dynabeads mRNA Purification Kit for mRNA Purification from Total RNA preps’ (610-06, Life Technologies) following the manufacturer’s instructions. 100 ng polyA-selected RNA was fragmented with NEBNext Magnesium RNA Fragmentation Module (E6150S, New England Biolabs), and used for first strand cDNA synthesis with SuperScriptII (18064-014, Invitrogen) and random hexamers, followed by second strand cDNA synthesis with RNAseH (18021-014, Invitrogen) and DNA PolI (18010-025, Invitrogen). The cDNA was purified, quantified, multiplexed and sequenced with 2× 75 bp paired-end reads on an Illumina NextSeq (Stanford Functional Genomics Facility).
RNA-seq reads were aligned to the hg38 reference genome with hisat2 (--no-mixed --no-discordant) without constraining to the known transcriptome. Known (gencode 25) and de novo transcript coverages were quantified with featureCounts. RepeatMasker coverage was quantified with bedtools coverage. Reads mapping to the same repeat family were then tabulated together, since read coverage of individual elements was too low to obtain meaningful results. Differential expression analysis of the joint gene-repeat data was performed with DESeq240.
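The family-level tabulation can be pictured as a simple group-and-sum over per-element read counts. The following sketch assumes per-element counts and an element-to-family annotation are already available as dictionaries; the identifiers and toy values are hypothetical and do not come from the original pipeline.

```python
from collections import defaultdict


def counts_by_family(element_counts, element_to_family):
    """Sum read counts of individual repeat elements into family totals."""
    totals = defaultdict(int)
    for element_id, count in element_counts.items():
        totals[element_to_family[element_id]] += count
    return dict(totals)


# Toy input: three annotated repeat instances with low individual coverage.
element_counts = {"rep_0001": 3, "rep_0002": 5, "rep_0003": 2}
element_to_family = {"rep_0001": "L1HS", "rep_0002": "L1HS", "rep_0003": "L1PA2"}

family_counts = counts_by_family(element_counts, element_to_family)
print(family_counts)
```

The resulting family totals can then be appended to the gene count table so that genes and repeat families are tested together.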
ChIP-seq
Two replicates of ChIP experiments per sample were performed as previously described41,42. Approximately 0.5–1 × 10^7 cultured cells per sample were crosslinked with 1% paraformaldehyde (PFA) for 10 min at room temperature (RT) and quenched with 0.125 M glycine for 10 min at RT. Chromatin was sonicated to an average size of 0.2–0.7 kb using a Covaris (E220 evolution). Sonicated chromatin was incubated overnight at 4 °C with 5–10 μg antibody bound to 100 μl protein G Dynabeads (Invitrogen), with 5% kept as input DNA. After five washes (50 mM Hepes, 500 mM LiCl, 1 mM EDTA, 1% NP-40, 0.7% Na-deoxycholate), chromatin was eluted from the Dynabeads and incubated in a 65 °C water bath overnight (12–16 h) to reverse crosslinks. ChIP DNA was subjected to end repair, A-tailing, adaptor ligation and cleavage with USER enzyme, followed by size selection to 250–500 bp and amplification with NEBNext sequencing primers. Libraries were purified, quantified, multiplexed (with the NEBNext Multiplex Oligos for Illumina kit, E7335S) and sequenced with 2 × 75 bp paired-end reads on an Illumina NextSeq (Stanford Functional Genomics Facility).
ChIP-seq reads were trimmed with cutadapt (-m 50 -q 10) and aligned to the hg38 reference genome with bowtie2 (version 2.2.9; --no-mixed --no-discordant --end-to-end --maxins 500). ChIP peaks were called with the macs2 (version 2.1.1.20160309) callpeak function, using the broad peak option and the human genome effective size, with reads from the corresponding loss-of-gene lines as the background model. Visualization tracks were generated with bedtools genomecov (-bg -scale), using a scaling factor of 10^6/(number of aligned reads), and converted to bigWig with bedGraphToBigWig (Kent tools). BigWigs were plotted with the IGV browser. Individual alignments were inspected with the IGB browser.
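As an illustration of the track normalization, the sketch below computes the scaling factor 10^6/(number of aligned reads) and applies it to bedGraph-style rows. The function names and the toy numbers are hypothetical; in the actual pipeline the factor is passed to bedtools genomecov through its -scale option.

```python
def scale_factor(n_aligned_reads):
    """Return the per-million scaling factor, 10^6 / number of aligned reads."""
    if n_aligned_reads <= 0:
        raise ValueError("number of aligned reads must be positive")
    return 1e6 / n_aligned_reads

def scale_bedgraph(rows, n_aligned_reads):
    """Multiply the depth column of (chrom, start, end, depth) rows by the factor."""
    factor = scale_factor(n_aligned_reads)
    return [(chrom, start, end, depth * factor) for chrom, start, end, depth in rows]

# Hypothetical example: 20 million aligned reads and two coverage intervals.
raw_rows = [("chr1", 100, 200, 40), ("chr1", 200, 300, 10)]
scaled_rows = scale_bedgraph(raw_rows, 20_000_000)
```

Scaling to reads per million makes tracks from libraries of different sequencing depth directly comparable.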
Heatmaps were generated by intersecting BAM alignment files with intervals of interest (bedtools v2.25.0), followed by tabulation of the distances of the reads relative to the center of each interval and scaling to account for the total number of aligned reads (10^6/number of aligned reads). Heatmaps were plotted using a custom R function. Aggregate plots were generated by averaging the rows of the heatmap matrix. For ChIPs in Ctrl and KO K562 clones, ChIP-seq signals in the corresponding KO cells were used as the null reference.
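The heatmap construction can be sketched as follows, under stated assumptions: each read is represented by a single position, distances to the interval center are binned at a fixed width, and each read contributes 10^6/(number of aligned reads). The window size, bin size, function names and toy data are hypothetical, and this Python sketch is not the custom R function used for the figures.

```python
def heatmap_row(read_positions, center, half_width, bin_size, n_aligned_reads):
    """Bin reads by distance from the interval center, scaled per million reads."""
    n_bins = (2 * half_width) // bin_size
    row = [0.0] * n_bins
    scale = 1e6 / n_aligned_reads
    for pos in read_positions:
        offset = pos - center
        # Keep only reads inside the window [-half_width, +half_width).
        if -half_width <= offset < half_width:
            row[(offset + half_width) // bin_size] += scale
    return row

def aggregate_profile(matrix):
    """Average the heatmap rows column by column to obtain the aggregate plot."""
    return [sum(column) / len(matrix) for column in zip(*matrix)]

# Hypothetical toy data: two intervals, 1 million aligned reads in total.
matrix = [
    heatmap_row([920, 990, 1010, 1010, 1200], center=1000,
                half_width=100, bin_size=50, n_aligned_reads=1_000_000),
    heatmap_row([4990, 5060], center=5000,
                half_width=100, bin_size=50, n_aligned_reads=1_000_000),
]
profile = aggregate_profile(matrix)
```

Each row of the matrix corresponds to one interval of interest, and the aggregate profile is the column-wise mean across intervals.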
For analysis of the relationship between ChIP-seq signal and repetitive sequences, RepeatMasker annotations were intersected with ChIP-seq peak calls to classify each RepeatMasker entry as MPP8-bound, MORC2-bound or unbound. Enriched families of repeats were identified with R fisher.test() followed by FDR correction with qvalue(). The distribution of sizes of occupied versus non-occupied L1s was plotted using R density(), with sizes taken from RepeatMasker. ks.test() was used to test the null hypothesis that the size distributions of bound and unbound L1s are the same. To investigate the relationship between L1 age, length and occupancy, logistic regression was performed with R glm().
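For readers unfamiliar with the enrichment test, the sketch below implements a two-sided Fisher's exact test on a 2 × 2 table (entries of a given repeat family versus all other entries, bound versus unbound). It is a stdlib-only illustration rather than the R fisher.test() call used in this work; the function name and the toy table are hypothetical, and the subsequent FDR correction with qvalue() is not shown.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: bound / unbound entries. Columns: in family / not in family.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x family members among bound entries.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Sum all tables as extreme as, or more extreme than, the observed one;
    # the small relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)

# Hypothetical toy table: 3 bound and 1 unbound entries in the family,
# 1 bound and 3 unbound entries outside it.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

In practice one such test would be run per repeat family, and the resulting p-values corrected jointly for multiple testing.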
Quantitative analysis of H3K9me3 changes was performed by first identifying regions of significant enrichment in each sample relative to the corresponding input sample (macs2 callpeak) and then merging the intervals into a common superset. This superset was joined with a decoy set of randomized intervals (bedtools shuffle), twice the size of the experimental interval set and with the same size distribution. Next, read coverage was determined for each sample (bedtools coverage), and regions with significant changes, together with their fold changes, were identified using DESeq2 (ref. 40). H3K9me3 regions were classified as bound or unbound by intersecting them with MORC2 and MPP8 ChIP peak calls.
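The decoy set can be pictured with the simplified sketch below, which places two randomly positioned copies of every experimental interval while preserving each interval's length. It is a stand-in for illustration only and does not reproduce the options or placement rules of bedtools shuffle; the function name, the chromosome sizes and the seed are hypothetical.

```python
import random

def make_decoys(intervals, chrom_sizes, copies=2, seed=0):
    """Return `copies` randomly placed decoys per interval, preserving lengths."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    chroms = sorted(chrom_sizes)
    decoys = []
    for _ in range(copies):
        for _chrom, start, end in intervals:
            length = end - start
            # Only chromosomes long enough to hold the interval are eligible;
            # the toy data below always leaves at least one candidate.
            candidates = [c for c in chroms if chrom_sizes[c] >= length]
            new_chrom = rng.choice(candidates)
            new_start = rng.randint(0, chrom_sizes[new_chrom] - length)
            decoys.append((new_chrom, new_start, new_start + length))
    return decoys

# Hypothetical toy data: three enriched regions and two small "chromosomes".
intervals = [("chrA", 100, 600), ("chrA", 2000, 2300), ("chrB", 50, 1050)]
chrom_sizes = {"chrA": 10_000, "chrB": 5_000}
decoys = make_decoys(intervals, chrom_sizes)
```

Including such decoy intervals alongside the experimental superset gives the differential analysis a set of background regions in which no specific change is expected.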
Data availability
All sequencing data generated in this work have been deposited at GEO under accession number GSE95374. H3K4me3 and H3K27ac K562 ChIP-seq datasets in Fig. 3e are from BioProject (accession number PRJEB8620). hESC RNA-seq datasets in Extended Data Fig. 8c are from SRA run entries SRR2043329 and SRR2043330. The complete results of the genome-wide screens in K562 and HeLa cells are in Table S1; the complete results of the secondary screens in K562 and HeLa cells are in Table S2. The sequences of gRNAs and oligonucleotides used in this work are in Table S3 and Table S4. The uncropped scans with size marker indications are summarized in the Supplementary Figure. All data are available from the corresponding author upon reasonable request.
Code availability
Detailed data and further code information are available from the authors on request.
Acknowledgments
We thank J. Moran for the LRE-GFP plasmid and Astrid Engel for the codon-optimized L1 construct. We thank D. Fuentes, A. Spencley, R. Srinivasan, J. Mohammed, V. Bajpai, K. Tsui, G. Hess, D. Morgens and G. Cornelis for assistance and discussions. We thank K. Cimprich, A. Fire and A. Urban for comments on the manuscript. This work was funded by the Jane Coffin Childs Memorial Fund for Medical Research (N.L.), NSF DGE-114747 (C.H.L.), NIH R01HG008150 (M.C.B.), NIH 1DP2HD084069-01 (M.C.B.), and NIH R01 GM112720 and Howard Hughes Medical Institute grants (J.W.).
Footnotes
ACCESSION NUMBERS
The accession number of all sequencing samples reported is GEO: GSE95374.
SUPPLEMENTAL INFORMATION is available in the online version of the paper.
CONTRIBUTIONS
N.L., C.L., T.S., J.W. and M.B. designed and performed experiments, analyzed data and wrote the manuscript. E.G., C.L., J.W. and M.B. initiated the K562 genome-wide screen. B.G. analyzed smFISH data. J.W. and M.B. supervised the entire work.
The authors declare no competing financial interests.
References
- 1. Lander ES, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- 2. Levin HL, Moran JV. Dynamic interactions between transposable elements and their hosts. Nat Rev Genet. 2011;12:615–627. doi: 10.1038/nrg3030.
- 3. Beck CR, Garcia-Perez JL, Badge RM, Moran JV. LINE-1 Elements in Structural Variation and Disease. Annu Rev Genomics Hum Genet. 2011;12:187–215. doi: 10.1146/annurev-genom-082509-141802.
- 4. Mita P, Boeke JD. How retrotransposons shape genome regulation. Curr Opin Genet Dev. 2016;37:90–100. doi: 10.1016/j.gde.2016.01.001.
- 5. Goodier JL. Restricting retrotransposons: a review. Mob DNA. 2016;7:16. doi: 10.1186/s13100-016-0070-z.
- 6. Chuong EB, Elde NC, Feschotte C. Regulatory activities of transposable elements: from conflicts to benefits. Nat Rev Genet. 2017;18:71–86. doi: 10.1038/nrg.2016.139.
- 7. Philippe C, et al. Activation of individual L1 retrotransposon instances is restricted to cell-type dependent permissive loci. eLife. 2016;5:e13926. doi: 10.7554/eLife.13926.
- 8. Tchasovnikarova IA, et al. Epigenetic silencing by the HUSH complex mediates position-effect variegation in human cells. Science. 2015;348:1481–1485. doi: 10.1126/science.aaa7227.
- 9. Moran JV, et al. High Frequency Retrotransposition in Cultured Mammalian Cells. Cell. 1996;87:917–927. doi: 10.1016/s0092-8674(00)81998-4.
- 10. Morgens DW, et al. Genome-scale measurement of off-target activity using Cas9 toxicity in high-throughput screens. Nat Commun. 2017;8:15178. doi: 10.1038/ncomms15178.
- 11. Morgens DW, Deans RM, Li A, Bassik MC. Systematic comparison of CRISPR-Cas9 and RNAi screens for essential genes. Nat Biotechnol. 2016;34:634–636. doi: 10.1038/nbt.3567.
- 12. Suzuki J, et al. Genetic Evidence That the Non-Homologous End-Joining Repair Pathway Is Involved in LINE Retrotransposition. PLoS Genet. 2009;5 doi: 10.1371/journal.pgen.1000461.
- 13. Chance PF, et al. Linkage of the gene for an autosomal dominant form of juvenile amyotrophic lateral sclerosis to chromosome 9q34. Am J Hum Genet. 1998;62:633–640. doi: 10.1086/301769.
- 14. Németh AH, et al. Autosomal Recessive Cerebellar Ataxia with Oculomotor Apraxia (Ataxia-Telangiectasia–Like Syndrome) Is Linked to Chromosome 9q34. Am J Hum Genet. 2000;67:1320–1326. doi: 10.1016/s0002-9297(07)62962-0.
- 15. Albulym OM, et al. MORC2 mutations cause axonal Charcot–Marie–Tooth disease with pyramidal signs. Ann Neurol. 2016;79:419–427. doi: 10.1002/ana.24575.
- 16. Schottmann G, Wagner C, Seifert F, Stenzel W, Schuelke M. MORC2 mutation causes severe spinal muscular atrophy-phenotype, cerebellar atrophy, and diaphragmatic paralysis. Brain. 2016;139:e70–e70. doi: 10.1093/brain/aww252.
- 17. Brégnard C, et al. Upregulated LINE-1 Activity in the Fanconi Anemia Cancer Susceptibility Syndrome Leads to Spontaneous Pro-inflammatory Cytokine Production. EBioMedicine. 2016;8:184–194. doi: 10.1016/j.ebiom.2016.05.005.
- 18. Ostertag EM, Luning Prak ET, DeBerardinis RJ, Moran JV, Kazazian HH. Determination of L1 retrotransposition kinetics in cultured cells. Nucleic Acids Res. 2000;28:1418–1423. doi: 10.1093/nar/28.6.1418.
- 19. Han JS, Boeke JD. A highly active synthetic mammalian retrotransposon. Nature. 2004;429:314–318. doi: 10.1038/nature02535.
- 20. Wagstaff BJ, Barnerßoi M, Roy-Engel AM. Evolutionary Conservation of the Functional Modularity of Primate and Murine LINE-1 Elements. PLOS ONE. 2011;6:e19672. doi: 10.1371/journal.pone.0019672.
- 21. Tchasovnikarova IA, et al. Hyperactivation of HUSH complex function by Charcot-Marie-Tooth disease mutation in MORC2. Nat Genet. 2017;49:1035–1044. doi: 10.1038/ng.3878.
- 22. Moissiard G, et al. MORC Family ATPases Required for Heterochromatin Condensation and Gene Silencing. Science. 2012;336:1448–1451. doi: 10.1126/science.1221472.
- 23. Pastor WA, et al. MORC1 represses transposable elements in the mouse male germline. Nat Commun. 2014;5:5795. doi: 10.1038/ncomms6795.
- 24. Garcia-Perez JL, et al. LINE-1 retrotransposition in human embryonic stem cells. Hum Mol Genet. 2007;16:1569–1577. doi: 10.1093/hmg/ddm105.
- 25. Han JS, Szak ST, Boeke JD. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature. 2004;429:268–274. doi: 10.1038/nature02536.
- 26. Saint-André V, Batsché E, Rachez C, Muchardt C. Histone H3 lysine 9 trimethylation and HP1γ favor inclusion of alternative exons. Nat Struct Mol Biol. 2011;18:337–344. doi: 10.1038/nsmb.1995.
- 27. Khan H, Smit A, Boissinot S. Molecular evolution and tempo of amplification of human LINE-1 retrotransposons since the origin of primates. Genome Res. 2006;16:78–87. doi: 10.1101/gr.4001406.
- 28. Buecker C, et al. Reorganization of Enhancer Patterns in Transition from Naive to Primed Pluripotency. Cell Stem Cell. 2014;14:838–853. doi: 10.1016/j.stem.2014.04.003.
- 29. Taylor MS, et al. Affinity proteomics reveals human host factors implicated in discrete stages of LINE-1 retrotransposition. Cell. 2013;155:1034–1048. doi: 10.1016/j.cell.2013.10.021.
- 30. Brouha B, et al. Evidence Consistent with Human L1 Retrotransposition in Maternal Meiosis I. Am J Hum Genet. 2002;71:327–336. doi: 10.1086/341722.
- 31. Gasior SL, Roy-Engel AM, Deininger PL. ERCC1/XPF limits L1 retrotransposition. DNA Repair. 2008;7:983–989. doi: 10.1016/j.dnarep.2008.02.006.
- 32. Deans RM, et al. Parallel shRNA and CRISPR-Cas9 screens enable antiviral drug target identification. Nat Chem Biol. 2016;12:361–366. doi: 10.1038/nchembio.2050.
- 33. Bassik MC, et al. A Systematic Mammalian Genetic Interaction Map Reveals Pathways Underlying Ricin Susceptibility. Cell. 2013;152:909–922. doi: 10.1016/j.cell.2013.01.030.
- 34. Cong L, et al. Multiplex Genome Engineering Using CRISPR/Cas Systems. Science. 2013;339:819–823. doi: 10.1126/science.1231143.
- 35. Coufal NG, et al. L1 Retrotransposition in Human Neural Progenitor Cells. Nature. 2009;460:1127–1131. doi: 10.1038/nature08248.
- 36. Shukla R, et al. Endogenous Retrotransposition Activates Oncogenic Pathways in Hepatocellular Carcinoma. Cell. 2013;153:101–111. doi: 10.1016/j.cell.2013.02.032.
- 37. Carreira PE, et al. Evidence for L1-associated DNA rearrangements and negligible L1 retrotransposition in glioblastoma multiforme. Mob DNA. 2016;7 doi: 10.1186/s13100-016-0076-6.
- 38. Doucet AJ, Wilusz JE, Miyoshi T, Liu Y, Moran JV. A 3′ Poly(A) Tract Is Required for LINE-1 Retrotransposition. Mol Cell. 2015;60:728–741. doi: 10.1016/j.molcel.2015.10.012.
- 39. Otsu N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans Syst Man Cybern. 1979;9:62–66.
- 40. Love M, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 41. Bajpai R, et al. CHD7 cooperates with PBAF to control multipotent neural crest formation. Nature. 2010;463:958–962. doi: 10.1038/nature08733.
- 42. Rada-Iglesias A, et al. A unique chromatin signature uncovers early developmental enhancers in humans. Nature. 2011;470:279–283. doi: 10.1038/nature09692.