Fundamental Research. 2025 Mar 8;6(1):11–19. doi: 10.1016/j.fmre.2025.03.001

Jumping in the human brain: A review on somatic transposition

Yufei Zhang a,b,1, Yanyan Guo a,b,1, Hangxing Jia a, Huijing Ma a, Shengjun Tan a, Yong E Zhang a,b
PMCID: PMC12869766  PMID: 41647578

Abstract

As a Nobel Prize-winning discovery, transposable elements, or “jumping genes”, have attracted significant interest due to their roles in providing functional coding and regulatory sequences. A longstanding hypothesis suggests that somatic transposition may preferentially occur in the mammalian brain, contributing to neuronal diversity. Here, we aim to provide the latest overview of somatic transposition studies in the human brain. We first introduce the historical context and the limited studies on the functionality of somatic transposition, indicating its pathogenic role. We then highlight the wide variability in somatic transposition rate estimates across studies, discussing the complexities—such as artificial chimeras and the multicopy nature—that contribute to false positive and negative results. We also review the evolving experimental and computational methods designed to mitigate these challenges and briefly cover studies estimating germline transposition rate. Finally, we suggest that advances in single-cell genome amplification methods, coupled with deep learning-based software, could pave the way for more definitive studies on the prevalence and functional role of somatic transposition in the human brain.

Keywords: Transposable elements, Somatic transposition, Human brain, Deep learning, Single-cell whole genome sequencing

1. Introduction

Transposable elements (TEs), often termed “jumping genes”, are selfish genetic elements that increase their copy numbers by changing locations within the host genome. Since Dr. Barbara McClintock’s pioneering discovery of TEs in the 1940s [1,2], research interest has grown steadily owing to their prevalence in eukaryotic and prokaryotic genomes, their mutagenic properties, and their roles in functional evolution through the provision of coding and regulatory sequences [[3], [4], [5], [6]]. As in many other species, a substantial proportion (∼46%) of the human reference genome is derived from TEs [7]. These TEs, resulting from germline transposition events accumulated over long-term human evolution, may have been domesticated into functional roles shaped by natural selection [[8], [9], [10]]. Compared with reference TEs, non-reference TEs are less studied; these include recently originated and de novo germline TE insertions as well as somatic TE insertions that occur post-zygotically. Unlike germline transposition, somatic transposition is particularly challenging to study because only a subset of cells harbors the corresponding insertions. In humans, three types of TEs can contribute to somatic transposition owing to their mobility: the autonomous long interspersed element 1 (L1) and the non-autonomous Alu and SINE-VNTR-Alu (SVA), whose transposition depends on L1 [11]. In 2005, the Gage lab proposed that somatic transposition may preferentially occur in mammalian neurons, contributing to their functional diversity [12]. Despite extensive efforts over the past two decades, the spatiotemporal mode and functional consequences of somatic transposition remain largely unclear, with limited studies supporting its pathogenic role. To summarize the discoveries in this field, an excellent review was published in 2018, comprehensively covering historical background, studies in various species, functional consequences, and technical challenges [13].
Herein, we provide an updated and focused overview of somatic transposition studies in the human brain, addressing potential functionality, conflicting prevalence estimates, experimental and computational advances, and promising research directions. For comparison, we briefly describe studies estimating germline transposition rates.

2. The enigmatic functionality of somatic transposition

Germline and somatic transposition have different functional outcomes. Both types of mutations are subject to natural selection: germline transposition at the organismal level and somatic transposition at the cellular level. Germline TE insertions, especially those long fixed in humans, have undergone extensive positive and negative natural selection. As a result, the insertions that persist are usually neutral, nearly neutral, or beneficial. Beneficial insertions become domesticated, providing promoters, enhancers, insulators, and protein sequences. In contrast, somatic TE insertions are subject to selection over a much shorter timescale, potentially resulting in a relatively higher proportion of deleterious or pathogenic insertions.

Consistently, limited studies indicate that somatic transposition tends to be pathogenic. On the one hand, transposition events are often considered deleterious and have been observed in various cancers [14], contributing to both driver and passenger mutations [[15], [16], [17]]. Similarly, increased transposition rates have been noted in neurodevelopmental disorders such as schizophrenia [18] and Rett syndrome [19]. However, neurological disorders may also result from alternative mechanisms, such as the accumulation of TE-derived transcripts or DNA [[20], [21], [22]]. On the other hand, in 10 cases, a single somatic TE insertion has been implicated in directly causing diseases, such as hereditary tumors or immune diseases [[23], [24], [25], [26], [27], [28], [29], [30], [31], [32]].

Nonetheless, some studies suggest a beneficial role for somatic transposition. The most influential is the aforementioned Gage study, which proposed that transposition preferentially occurs during normal brain development, contributing to neuronal diversity [12,[33], [34], [35]]. To date, however, there is no direct evidence supporting a beneficial role of transposition in brain development.

3. The uncertain rate of somatic transposition

The functional impact of somatic transposition in the brain remains unclear, partly due to the uncertainty in transposition rates. Extensive efforts have been made to identify somatic transposition events in humans, particularly in the brain (Table 1). Transposition has been detected in various regions, including the cerebral cortex and hippocampus, and across various cell types, such as neurons and glia [36,37]. However, the reported quantities and rates of transposition vary widely, ranging from 0 to over 80,000 insertions per bulk study, and from 0.04 to 80 insertions per single cell (Table 1). Even within the same brain region and cell type, discrepancies are notable; for instance, the per-cell rate for neurons in the cerebral cortex of healthy individuals ranges from 0.07 [38] to 16.3 [36]. This large contrast may stem from differences in study design, e.g., sequencing depth (especially across bulk samples), or in analytical frameworks (see following sections). Notably, even when neurons and glia were evaluated within the same study, the results conflict. Some studies showed similar rates (0.1–0.2 insertions per cell in [39], ∼1 insertion per cell in [37]), while others found more insertions in neurons than in glia (13.7 vs. 6.5 insertions per cell in [36]).

Table 1.

Estimates of human somatic transposition rates.

Studya Sample ageb Sequencing strategy Computational method Tissuec Estimated somatic transposition ratesd
L1 Alu SVA
Coufal et al. [42] postnatal bulk qPCR HIP 80/cell
Baillie et al. [43] postnatal bulk RC-seqe customized pipeline HIP, CN 7743/7 samples 0.04/cellg [44] 13,692/7 samples 1350/7 samples
Bundo et al. [18] postnatal bulk WGS customized pipeline PFC 2600/3 samples
PFC from SZ patients 4213/3 samples
Kurnosov et al. [45] postnatal bulk targeted sequencing customized pipeline CB 1651/1 sample 1317/1 sample
FC 462/1 sample 2138/1 sample
SVZ 1133/1 sample 1308/1 sample
DG 3100/1 sample 2984/1 sample
myocardium 1151/1 sample 1243/1 sample
Upton et al. [36] postnatal bulk RC-seqe customized pipeline liver 175/4 samples
Jacob-Hirsch et al. [46] bulk WGS customized pipeline HIP, CB, OC 1911/5 samples
CB, OC, FC, SEGAs from AT, NSA, Rett, SEGA, and TSC patients 84,495/15 samples
Muñoz-Lopez et al. [47] embryonic and newborn bulk (MDA)f RC-seqe, ATLAS-seqe customized pipeline ICM of blastocysts 1/2 samplesh
RC-seqe placenta 6/10 samplesh
Zhao et al. [19] postnatal bulk HAT-seqe customized pipeline PFC (neurons) 3170/5 samples 1.22/cellg
PFC (neurons) from Rett patients 3291/5 samples 1.36/cellg
heart 1170/5 samples 0.54/cellg
heart from Rett patients 580/2 samples 0.69/cellg
eye from Rett patients 563/2 samples 0.61/cellg
fibroblast from Rett patients 411/1 sample 0.69/cellg
Zhu et al. [41] fetal bulk WGS RetroSom [41] cortical tissues (neurons) 0/1 sample 0/1 sample 0/1 sample
cortical tissues (astrocytes) 0/1 sample 0/1 sample 0/1 sample
heart 0/1 sample 0/1 sample 0/1 sample
postnatal STG (neurons) 0/3 samples 0/3 samples 0/3 samples
STG (glia) 0/3 samples 0/3 samples 0/3 samples
STG (neurons) from SZ patients 2/2 samplesh 0/2 samples 0/2 samples
STG (glia) from SZ patients 0/2 samples 0/2 samples 0/2 samples
heart 0/1 sample 0/1 sample 0/1 sample
fibroblast 0/2 samples
0/2 samples
0/2 samples 0/2 samples
fibroblast from SZ patients 0/2 samples 0/2 samples
Berteli et al. [48] germline bulk (MDA)f TIPseqe TIPseqHunter [49] sperm 17/10 samplesh
Möhner et al. [50] postnatal bulk RDAe customized pipeline PFC 5149/2 samples
HIP 4168/2 samples
CF 2614/2 samples
OB 6765/2 samples
CB 4378/2 samples
Ramirez et al. [51] postnatal bulk WGS (ONT) TLDR [52] FC from healthy individuals and AD patients 163/18 samples (L1+Alu+SVA)
Wallace et al. [53] newborn bulk WGS MELT [54] placenta from healthy live births 2/6 samples 0/6 samples 0/6 samples
placenta from live births with FGR 1/4 samples 0/4 samples 0/4 samples
placenta from stillbirths with FGR 2/7 samples 0/7 samples 0/7 samples
Evrony et al. [38] postnatal single-cell (MDA) L1-IPe customized pipeline 300 neurons from FC and CN 0.07/cellh
Evrony et al. [40] postnatal single-cell (MDA) WGS scTea [40] 16 neurons from PFC 0.18/cellh [39] 0/cell 0/cell
Upton et al. [36] postnatal single-cell (MALBAC) RC-seqe customized pipeline 92 neurons from HIP 13.7/cellh 0.18/cellg [39]
22 glia from HIP 6.5/cellh 0.20/cellg [39]
35 neurons from FC 16.3/cellh 0.25/cellg [39]
21 neurons from HIP of AGS patients 8/cellh 0.14/cellg [39]
Erwin et al. [37] postnatal single-cell (MDA) SLAV-seqe customized pipeline 40 neurons from HIP 0.91/cellh
23 glia from HIP 1.66/cellh
15 neurons from FC 0.83/cellh
11 glia from FC 0.78/cellh
Muñoz-Lopez et al. [47] embryonic and newborn single-cell (MDA) RC-seqe, ATLAS-seqe customized pipeline 6 cells from ICM of blastocysts 0/cellh
Sanchez-Luque et al. [55] postnatal single-cell (MDA) WGS, RC-seqe, L1-IPe TEBreak [56] 24 neurons from HIP 0.04/cellh,i 0/cell 0/cell
Nam et al. [57] postnatal clones from single cells WGS MELT [54], TraFiC-mem [58], DELLY [59] and xTea [60] 140 HSC clones 0.007/celli 0/cell 0/cell
341 fibroblast clones 0.04/celli 0.02/celli 0/cell
406 normal colorectal clones from colorectal cancer patients 3.04/celli 0.005/celli 0/cell
a. Studies are first ordered by bulk vs. single-cell and then by publication year

b. To simplify, we used “postnatal” to refer to children or adult samples

c. HIP, hippocampus; CN, caudate nucleus; PFC, prefrontal cortex; CB, cerebellum; FC, frontal cortex; SVZ, subventricular zone; DG, dentate gyrus; OC, occipital cortex; ICM, inner cell mass; STG, superior temporal gyrus; CF, calcarine fissure; OB, olfactory bulb; HSC, hematopoietic stem and progenitor cells; SZ, schizophrenia; AT, ataxia-telangiectasia; NSA, non-syndromic autism; Rett, Rett syndrome; SEGA, subependymal giant cell astrocytoma; TSC, tuberous sclerosis complex; AD, Alzheimer's disease; FGR, fetal growth restriction; AGS, Aicardi-Goutières syndrome

d. “-” represents that the sequencing strategy or the computational method could not detect this type of TE

e. A targeted sequencing strategy was used

f. Multiple displacement amplification (MDA) was used for amplifying DNAs from bulk samples. In addition, MALBAC stands for multiple annealing and looping-based amplification cycles

g. Two estimates were available in the original analysis or the reanalysis

h. The somatic transposition rate was corrected by PCR validation

i. The original work did not provide an estimate, so we calculated it as the total number divided by cell counts.

The high uncertainty in transposition rates means that the temporal pattern of somatic transposition, including whether events occur preferentially at particular developmental stages, is also unknown. Indeed, as of 2024, only five somatic TE insertions have been timed based on their frequencies and distributions in the brain: one during the morula stage of embryogenesis [19], three in neuroepithelial cells during initial brain organogenesis [40,41], and one in a neocortical progenitor at a relatively late stage [40]. Although the sample is small, these data suggest that somatic transposition may preferentially occur during early brain development.

Transposition rate studies in somatic tissues other than the brain are scarce (Table 1). However, heart, liver, and fibroblast samples have been used as controls in brain or cancer studies [14,19]. Only one study estimated transposition rates across the brain and other tissues, finding that the per-cell rate in prefrontal cortex neurons (1.22–1.36) was higher than in heart, eye, and fibroblasts (0.54–0.69) [19]. Whether this higher neuronal activity holds across studies or in a broader tissue panel requires further investigation.

4. The overall framework for somatic TE insertion identification

Detecting somatic TE insertions involves identifying sequence features of transposition. For L1, transposition relies on target-primed reverse transcription (TPRT), resulting in target site duplications (TSDs) flanking the insertion site and the incorporation of a polyA tail (Fig. 1a). These hallmark sequence features enable the identification of TE insertions in paired-end short-read sequencing by generating two types of supporting read pairs: clipped/split read pairs and discordant read pairs (Fig. 1a). A clipped read pair includes at least one clipped read, where one segment comes from the genomic sequence and the other from the inserted TE sequence. A discordant read pair comprises two reads with conflicting alignments: one read originates solely from the genomic sequence adjacent to the insertion site, while the other read entirely comes from the TE. Additionally, some clipped short-read pairs can capture both sides of TE insertions, especially for small TEs like Alu or severely truncated L1. In long-read sequencing, supporting reads include clipped/split reads and spanning reads capable of capturing entire TE insertions (Fig. 1a).
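To make the two short-read signals concrete, the following minimal Python sketch labels a read pair as clipped or discordant. The `Read` class, the `classify_pair` function, and the 10-bp `min_clip` threshold are hypothetical simplifications introduced here for illustration and are not taken from any published tool; real pipelines additionally realign the clipped segment to TE consensus sequences before counting it as support.

```python
import re
from dataclasses import dataclass


@dataclass
class Read:
    """Illustrative stand-in for one aligned read (not a real library type)."""
    target: str  # "genome" or "TE": where the aligned part of the read maps
    cigar: str   # CIGAR string of that alignment, e.g. "60M40S"


def clipped_length(cigar: str) -> int:
    """Return the longest soft-clipped (S) segment at either end of a CIGAR."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    ends = [ops[0], ops[-1]] if ops else []
    return max((int(n) for n, op in ends if op == "S"), default=0)


def classify_pair(r1: Read, r2: Read, min_clip: int = 10) -> str:
    """Label a read pair by the kind of insertion support it could provide.

    Assumption: a soft clip of at least `min_clip` bases is treated as a
    candidate genome/TE junction without checking what the clip contains.
    """
    if max(clipped_length(r1.cigar), clipped_length(r2.cigar)) >= min_clip:
        return "clipped"
    # One mate maps to unique genomic sequence, the other entirely to a TE.
    if {r1.target, r2.target} == {"genome", "TE"}:
        return "discordant"
    return "uninformative"
```

For example, `classify_pair(Read("genome", "100M"), Read("TE", "100M"))` falls into the discordant category, whereas a mate with CIGAR `60M40S` makes the pair a clipped one.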

Fig. 1.

Identification of somatic TE insertions. (a) Schematic of a typical non-reference TE insertion. Flanking TSDs, the TE sequence, a polyA tail, and corresponding supporting read pairs/reads are shown. 1: clipped/split read pairs from short-read sequencing; 2: discordant read pairs from short-read sequencing; 3: a clipped/split read from long-read sequencing; 4: a spanning read from long-read sequencing. (b) Two typical challenges in somatic TE insertion analyses: multi-mapping issues caused by the multicopy nature of TEs, and artificial chimeras emerging during DNA amplification and library preparation. (c) Sequencing strategies across three dimensions. (d) Overview of read-based and assembly-based computational methods. Note that read-based methods for long-read sequencing are analogous to those employed for short-read sequencing and are therefore omitted from this figure. (e) A random forest model. The model, integrating predictions from multiple decision trees, is used for the classification of supporting read pairs/reads or candidate insertions. (f) A convolutional neural network (CNN) model. The features of a potential insertion are encoded and sent to a CNN model for classification.

Although the framework to identify somatic TE insertions seems straightforward, challenges exist. First, TEs are notoriously difficult to analyze due to their multicopy nature (Fig. 1b). Reads from TE insertions close to preexisting reference TEs are typically unmappable due to alignment ambiguity [61]. Second, artificial chimeras can emerge when two genomic fragments are fused due to template switching during various steps of sequencing library preparation (Fig. 1b), such as PCR in bulk libraries or amplification cycles in single-cell libraries [39]. If one fragment involves a TE, the chimera can mimic signals like clipped/split reads or discordant read pairs. Third, by definition, somatic TE insertions are present in only a subset of cells. In bulk sequencing data from many cells, true positive signals can be hard to identify due to limited supporting reads or read pairs, which may be further obscured by mapping or chimera issues.
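The third challenge can be illustrated with a simple binomial calculation. The sketch below estimates the probability of sampling at least a minimum number of supporting reads for an insertion carried by a given fraction of cells. It rests on assumptions that are ours rather than the reviewed studies': reads are sampled independently, coverage is uniform, the insertion is heterozygous in a diploid genome (so the expected read fraction is half the cell fraction), and mapping or chimera problems are ignored. The function name and the default of two supporting reads are hypothetical.

```python
from math import comb


def detection_probability(depth: int, cell_fraction: float, min_reads: int = 2) -> float:
    """P(at least `min_reads` supporting reads) at one insertion site.

    Assumption: a heterozygous insertion present in `cell_fraction` of the
    cells contributes an expected read fraction of cell_fraction / 2.
    """
    p = cell_fraction / 2
    # Probability of seeing fewer than min_reads supporting reads.
    p_miss = sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k) for k in range(min_reads)
    )
    return 1 - p_miss


# Same rare insertion (1% of cells) at two sequencing depths.
shallow = detection_probability(depth=30, cell_fraction=0.01)
deep = detection_probability(depth=300, cell_fraction=0.01)
```

Under these assumptions, `deep` is much larger than `shallow`, which is one reason why bulk estimates are sensitive to sequencing depth.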

5. Experimental advancement to identify somatic TE insertions

To address these challenges, numerous efforts have been made to improve experimental methods for detecting somatic transposition. The initial report by the Gage lab detected somatic transposition in rodent brains using an engineered human L1 reporter system (GFP, [12]), which may not reflect in vivo transposition in humans. Subsequent work implemented quantitative PCR (qPCR) to quantify somatic transposition rates in human brains [42]. However, this approach could be confounded by the accumulated TE sequences which are not integrated into the genome, leading to an overestimation of the rate [13,62]. Consistently, the highest estimate of 80 insertions per cell was generated by qPCR (Table 1). Therefore, subsequent studies have generally taken sequencing-based approaches to detect supporting reads (e.g., clipped/split reads) and identify somatic transposition, evolving along three dimensions (Fig. 1c).

First, compared to bulk sequencing, single-cell sequencing is gaining popularity. A number of studies have applied this approach to neurons or glial cells in brain regions such as the hippocampus [36,37,55], cerebral cortex [[36], [37], [38],40], and caudate nucleus [38] (Table 1). This approach increases the likelihood of detecting TE insertions shared by a small proportion of cells, especially when a large number of cells are sequenced. Moreover, reads derived from chimeras and TE insertions are often distinguishable, as chimeras typically do not reach the expected variant allele frequency or fraction (VAF) of 0.5, unless they are generated during the early amplification cycles. Single-cell sequencing also comes with tradeoffs, such as high cost and the tendency of whole-genome amplification methods (e.g., MDA and MALBAC, Table 1) to under-amplify TE regions, which causes false negative calls of TE insertions [63]. To control for chimeras and uneven DNA amplification, studies on mouse neurons and human colorectal epithelial cells have converted single-cell sequencing to bulk sequencing by sequencing clones developed from single cells [57,64]. This strategy is generally unsuitable for non-dividing cells, including mature adult neurons, but it could be applicable to reprogrammed neurons [64]. Additionally, generating enough clones is labor-intensive, and somatic transposition may occur during clone development.

Second, various targeted sequencing techniques have been developed to enrich fragments containing TE sequences and capture more signals associated with somatic transposition in brain regions like the hippocampus [36,37,43,50,55] and cerebral cortex [[36], [37], [38],45,50] (Table 1). Techniques like RC-seq [43] use probes to capture fragments from active TE subfamilies and amplify them for sequencing. Other techniques, such as L1-IP [38], SLAV-seq [37], and HAT-seq [19], design specific primers to enrich the insertions via PCR. Although the enrichment substantially lowers costs by removing unrelated sequences, it generally cannot target all active TEs from L1, Alu, or SVA, leading to false negative calls. Thus, whole-genome sequencing approaches are still actively used.

Third, short- and long-read sequencing approaches have been co-developed. Most somatic transposition studies, including those in Table 1, have been performed with short-read sequencing due to its low cost. However, long reads can easily span the whole TE insertion, addressing multi-mapping ambiguity and differentiating chimeras based on the absence of TSDs. Long-read sequencing is therefore widely regarded as more suitable for detecting various structural variations, including TE insertions [[65], [66], [67], [68], [69]]. With the continuous drop in sequencing prices, whole-genome long-read sequencing has recently been employed to identify somatic TE insertions in the human frontal cortex for the first time [51]. To harness the advantages of long reads while controlling costs, several studies have developed targeted long-read sequencing approaches. In one study, specific gRNAs were designed to cleave target TE sequences with Cas9, followed by bulk sequencing, achieving a read length N50 ranging from 14.9 to 32.3 kb, which substantially improved the resolution of insertion structures when identifying germline TE insertions from human cell lines [70]. Another team combined TE enrichment with single-cell long-read sequencing to detect somatic L1 insertions in mouse breast cancer cells [71]. These techniques should also be applicable for detecting somatic transposition in the brain.
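As an illustration of how a spanning read can separate a genuine insertion from a chimera, the sketch below looks for a TSD by comparing the genomic sequence immediately upstream of the inserted element with the sequence immediately downstream of it. The function name, the exact-match requirement, and the `max_len` bound are assumptions made for this example; real long reads contain sequencing errors, so practical tools would need an error-tolerant comparison.

```python
def tsd_length(left_flank: str, right_flank: str, max_len: int = 30) -> int:
    """Length of the longest exact match between the end of the upstream
    flank and the start of the downstream flank (0 if there is none).

    Assumption: `left_flank` ends right before the inserted TE and
    `right_flank` starts right after its polyA tail, both on the same strand.
    """
    longest = min(max_len, len(left_flank), len(right_flank))
    for k in range(longest, 0, -1):
        if left_flank[-k:] == right_flank[:k]:
            return k
    return 0


# A 6-bp duplicated target site ("TGACCA") flanks this hypothetical insertion.
with_tsd = tsd_length("ACGTTTGACCA", "TGACCAGGCT")
# No shared sequence at the junctions, as expected for many chimeras.
without_tsd = tsd_length("AAAA", "CCCC")
```

A candidate with a TSD of plausible length and a polyA tail is consistent with TPRT, whereas a junction lacking both is more likely to be an artifact.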

Notably, these three dimensions (bulk vs. single-cell, whole-genome vs. targeted, and short-read vs. long-read) can be combined as needed, providing a versatile approach to study transposition in somatic tissues, including the brain.

6. Computational advancement to identify somatic TE insertions

In parallel with the blossoming experimental developments dedicated to somatic transposition, computational methods for identifying non-reference TE insertions are also under active development, although they often do not directly differentiate between germline and somatic transposition. Table 2 summarizes 25 methods, developed or optimized in the past 5 years, for identifying non-reference TE insertions. Among them, TEBreak [56], TLDR [52], and RetroSom [41] have been used to detect somatic transposition in the human brain (Table 1), while other brain-related studies employed customized pipelines. These tools or pipelines are also suitable for non-brain tissues. Additionally, targeted sequencing studies that capture only one side of TE insertions limit the use of tools requiring evidence from both sides. Since many tools have been extensively reviewed [60,72,73], we focus only on their core strategies and the role of machine learning algorithms.

Table 2.

Computational methods for identifying non-reference TE insertions in the human genome.

Tool | Study^a | Latest update | Type | Read type | Strategy | Genotyping | Notes
Mobster | Thung et al. [81] | 2022 | TE | Short-read | Read-based | No |
TEBreak | Carreira et al. [56] | 2023 | TE | Short-read | Read-based | VAF^b |
MELT | Gardner et al. [54] | 2020 | TE | Short-read | Read-based | Yes |
RelocaTE2 | Chen et al. [82] | 2020 | TE | Short-read | Read-based | Yes |
STEAK | Santander et al. [83] | 2019 | TE | Short-read | Read-based | No |
AluMine | Puurand et al. [84] | 2021 | Alu | Short-read | Read-based | Yes |
ERVcaller | Chen and Li [85] | 2024 | TE | Short-read | Read-based | Yes |
MEScanner | Loh et al. [86] | 2019 | TE | Short-read | Read-based | -^c |
TIP_finder | Orozco-Arias et al. [87] | 2021 | TE | Short-read | Read-based | No |
TypeTE | Goubert et al. [88] | 2021 | TE | Short-read | Read-based | Yes |
RetroSom | Zhu et al. [41] | 2019 | TE | Short-read | Read-based | -^c | Random forest is used for extracting supporting read pairs.
TEMP2 | Yu et al. [89] | 2024 | TE | Short-read | Read-based | VAF^b |
xTea | Chu et al. [60] | 2023 | TE | Short-read and long-read | Read-based | VAF^b | Random forest is used for genotyping.
DeepMEI | Xu et al. [80] | 2024 | TE | Short-read | Read-based | Yes | CNN is used for identifying insertions.
INSurVeyor | Rajaby et al. [90] | 2024 | SV^d | Short-read | Read-based | Yes |
nanomonsv | Shiraishi et al. [91] | 2024 | SV^d | Short-read and long-read | Read-based | No |
Total ReCall | Solovyov et al. [92] | -^e | TE | Short-read | Read-based | -^e |
McClintock 2 | Chen et al. [93] | 2024 | TE | Short-read | Read-based | VAF^b | McClintock is a meta-pipeline integrating 12 tools.
rMETL | Jiang et al. [94] | 2024 | TE | Long-read | Read-based | Yes |
PALMER | Zhou et al. [95] | 2023 | TE | Long-read | Read-based | No |
TLDR | Ewing et al. [52] | 2023 | TE | Long-read | Read-based | No |
MEIGA-PAV | Ebert et al. [96] | 2022 | TE | Long-read | -^f | No |
ricME | Ma et al. [97] | 2023 | TE | Long-read | Read-based | -^c |
somrit | D'Costa and Simpson [98] | 2023 | TE | Long-read | Read-based | No |
GraffiTE | Groza et al. [99] | 2024 | TE | Long-read | Read-based and assembly-based | VAF^b |
a. Studies are first ordered by short- vs. long-read and then by publication year

b. VAF is also given when genotyping

c. The paper does not explicitly state whether the method can perform genotyping

d. This tool could identify various SVs including TE insertions

e. The source code has not been released

f. MEIGA-PAV relies on several upstream tools.

Tools designed for short-read sequencing data generally rely on the aforementioned clipped and discordant read pairs (Fig. 1d). These supporting read pairs are extracted based on the read-to-genome alignments, followed by realignment to consensus sequences of active TE subfamilies. Properly aligned reads are extracted and clustered by insertion sites as candidate non-reference TE insertions. Two strategies have been developed to further identify somatic transposition among non-reference TE insertions. First, a few tools such as xTea offer a somatic mode that processes both experimental and control bulk samples to retain insertions not shared as candidate somatic TE insertions [60]. If the somatic mode is unavailable, both samples can be analyzed separately, and shared insertions can be manually removed. This strategy could be extended to single-cell sequencing data, with insertions specific to a proportion of cells identified as somatic TE insertions. Second, for bulk whole-genome sequencing data, TE insertions with a VAF significantly lower than 0.5 (expected for heterozygous germline mutations) are deemed candidate somatic TE insertions. Notably, for both strategies, the somatic TE calls can be examined for overlaps with databases collecting non-reference germline TE insertions to exclude polymorphic germline insertions (Fig. 1d, [54,[74], [75], [76], [77]]).
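The two filtering strategies described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the somatic mode of xTea or of any other tool: insertion sites are matched by exact coordinates (real callers tolerate small breakpoint offsets), the VAF test is a one-sided binomial test against 0.5, and the function names and the `alpha` threshold are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def private_to_case(case_calls: dict, excluded_sites: set) -> list:
    """Strategy 1: keep insertions absent from the control sample and from
    known germline polymorphisms (both passed in `excluded_sites`)."""
    return sorted(site for site in case_calls if site not in excluded_sites)


def low_vaf(case_calls: dict, alpha: float = 0.01) -> list:
    """Strategy 2: keep insertions whose supporting-read fraction is
    significantly below the 0.5 expected for a heterozygous germline variant."""
    return sorted(
        site
        for site, (alt_reads, total_reads) in case_calls.items()
        if binom_cdf(alt_reads, total_reads) < alpha
    )


# Hypothetical calls: site -> (supporting reads, total reads at the site).
calls = {
    ("chr1", 1000, "L1"): (3, 60),
    ("chr2", 500, "Alu"): (28, 60),
    ("chr3", 42, "L1"): (2, 50),
}
control = {("chr3", 42, "L1")}

strategy1 = private_to_case(calls, control)
strategy2 = low_vaf(calls)
```

In this toy input the two strategies disagree on two of the three sites, which illustrates why their outputs are usually cross-checked against germline databases and validated experimentally.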

Tools designed for long-read sequencing data rely on either reads or assemblies. Read-based tools are similar in design to those for short-read sequencing data. Assembly-based tools compare assemblies generated from sequencing data with the reference genome to identify TE insertions (Fig. 1d). However, these methods often cannot detect low-frequency somatic TE insertions in bulk data, as these insertions are less likely to be assembled.

The application of machine learning, especially deep learning techniques, has shown superior performance in single nucleotide variant (SNV) and structural variation (SV) detection, as evidenced by tools like DeepVariant [78] and SVision-pro [79]. Similarly, despite limited studies applying machine learning to TE insertion detection, its efficacy in supporting read pair identification, genotyping, and insertion detection is evident. Only three tools have directly applied machine learning to detect TE insertions in the human genome, utilizing random forests and convolutional neural network (CNN) models (Table 2). Specifically, RetroSom employed random forests to extract supporting read pairs, generating a collection of decision trees based on features such as sequence alignment to TE consensus sequences, thereby reducing false positives (Fig. 1e, [41]). Similarly, xTea utilized random forests for insertion genotyping, achieving 99.7% accuracy in testing data (Fig. 1e, [60]). Since CNNs are ideal for image-like data, DeepMEI encoded nucleotide bases, base quality, and mapping quality as pileup images to detect TE insertions (Fig. 1f, [80]).
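To give a sense of what an image-like encoding means, the sketch below turns a small pileup into a reads × positions × channels array holding base identity, base quality, and mapping quality. It is a hypothetical encoding written for this overview and does not reproduce the actual scheme of DeepMEI; the numeric base codes, the quality cap of 60, and the zero padding are all assumptions.

```python
# Hypothetical numeric codes for the four bases; 0.0 marks padding or "N".
BASE_CODE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}
QUAL_CAP = 60  # assumed cap used to scale qualities into [0, 1]


def encode_pileup(reads, width):
    """Encode reads as a nested list of shape (n_reads, width, 3).

    `reads` is a list of (sequence, base_qualities, mapping_quality) tuples,
    each assumed to start at the left edge of the window.
    Channels: base code, scaled base quality, scaled mapping quality.
    """
    image = []
    for seq, quals, mapq in reads:
        row = []
        for i in range(width):
            if i < len(seq):
                row.append((
                    BASE_CODE.get(seq[i], 0.0),
                    min(quals[i], QUAL_CAP) / QUAL_CAP,
                    min(mapq, QUAL_CAP) / QUAL_CAP,
                ))
            else:
                row.append((0.0, 0.0, 0.0))  # pad short reads
        image.append(row)
    return image


image = encode_pileup([("ACG", [30, 30, 60], 60), ("T", [15], 0)], width=4)
```

An array of this kind can be passed to a CNN in the same way as a small multi-channel image, with one row per read and one column per position.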

The development of these computational methods largely reflects advances in experimental techniques. For instance, the emergence of long-read sequencing datasets [51] has driven the need for specialized computational tools (Table 2).

7. The rate of germline transposition

De novo germline transposition, occurring across generations, was studied extensively before somatic transposition and is recognized for its pathogenic potential [100]. In humans, germline transposition rates have been estimated using either conventional evolutionary methods or the recently developed trio-sequencing approach (Table 3). Evolutionary methods rely on parameters such as the mutation rate, transposition proportion, neutral molecular clock, and evolutionary time [[101], [102], [103]]. In contrast, trio sequencing is more direct: it identifies mutations present only in the offspring. However, this method has two limitations: (1) trio cohorts often come from disease pedigrees, introducing potential sampling bias; and (2) early somatic transposition events at high frequency may be misclassified as germline insertions [104].
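The logic of the trio-sequencing approach can be sketched as a set difference followed by a division. The example assumes that each individual's insertions are already available as a set of site labels and that sites are matched exactly; the labels, function names, and toy data are hypothetical. As noted above, such a filter cannot by itself distinguish a true de novo germline event from a high-frequency early somatic event in the child.

```python
from fractions import Fraction


def de_novo(child: set, mother: set, father: set) -> set:
    """Insertions observed in the child but in neither parent."""
    return child - (mother | father)


def rate_per_birth(trios) -> Fraction:
    """De novo insertions per birth across a list of (child, mother, father)."""
    events = sum(len(de_novo(c, m, f)) for c, m, f in trios)
    return Fraction(events, len(trios))


# Four hypothetical trios; only the first child carries a de novo insertion.
trios = [
    ({"chr1:100:Alu", "chr2:5:L1"}, {"chr2:5:L1"}, set()),
    ({"chr3:7:SVA"}, {"chr3:7:SVA"}, set()),
    (set(), set(), set()),
    ({"chr9:1:Alu"}, set(), {"chr9:1:Alu"}),
]
rate = rate_per_birth(trios)  # one event in four births
```

Reporting the result as a fraction mirrors the "1 in N births" convention used in Table 3.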

Table 3.

Estimates of human de novo germline transposition rates.

Study Estimation strategya Computational methodb Estimated germline transposition rates (insertion/birth)c
L1 Alu SVA
Deininger et al. [105,106] evolutionary methods - 1/100
Kazazian [102] evolutionary methods 1/100–1/8
Li et al. [107] evolutionary methods 1/28–1/2.4
Brouha et al. [108] transposition activity assay analysis 1/33–1/2
Cordaux et al. [101] evolutionary methods 1/20
Xing et al. [109] evolutionary methods 1/212 1/21 1/916
Ewing et al. [103] evolutionary methods 1/270–1/95
Huang et al. [110] evolutionary methods 1/108
Feusier et al. [111] trio data analysis of healthy individuals MELT [54], RUFUS [112], and TranSurVeyor [113] 1/63 1/40 1/63
Gardner et al. [114] evolutionary methods and trio data analysis of patients with developmental disorders MELT [54] 1/14–1/12
Belyeu et al. [115] trio data analysis of healthy individuals and ASD patients Lumpy [116], Manta [117], Delly [59], Whamg [118], MELT [54], GATK-SV [76,119] 1/231 1/42 1/309
Borges-Monroy et al. [30] trio data analysis of healthy individuals and ASD patients xTea [60] 1/117 1/29 1/206
Niu et al. [120] evolutionary methods MELT [54] 1/17–1/16
Chu et al. [104] trio data analysis of birth defect and childhood cancer patients xTea [60] 1/108 1/34 1/93
a. ASD, autism spectrum disorder

b. “-” represents that the estimation did not utilize computational methods for sequencing data

c. “-” represents that the estimation did not include this type of TE.

Studies on germline transposition rates differ from those on somatic transposition in two key ways (Tables 1, 3). First, while somatic transposition rate estimates vary by several orders of magnitude (Table 1), germline rates show less variation, likely because germline insertions are present at high allele frequency and are therefore easier to detect, or because the experimental and computational frameworks used are more consistent with one another. Both evolutionary and trio-sequencing methods estimate the total germline transposition rate of L1, Alu, and SVA at roughly one event per tens of births (Table 3). Second, Alu transposes more frequently than L1 in the germline (Table 3), consistent with its high genomic copy number, whereas L1 shows higher transposition rates in most somatic studies (Table 1). This discrepancy may reflect distinct regulatory mechanisms between germline and somatic tissues or experimental/computational differences, warranting further investigation.
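The combined rate can be checked with simple arithmetic on the trio-based rows of Table 3. The sketch below sums the per-family rates of each study and inverts the total to obtain the number of births per insertion event. It assumes that the three values in each row correspond to L1, Alu, and SVA in the order of the table header and that the per-family rates are additive; the dictionary and function names are introduced here for illustration.

```python
from fractions import Fraction

# Denominators of the per-birth rates (L1, Alu, SVA) as listed in Table 3.
TRIO_RATES = {
    "Feusier et al. [111]": (63, 40, 63),
    "Belyeu et al. [115]": (231, 42, 309),
    "Borges-Monroy et al. [30]": (117, 29, 206),
    "Chu et al. [104]": (108, 34, 93),
}


def births_per_event(denominators) -> float:
    """Invert the summed per-family rates: births per insertion of any family."""
    total_rate = sum(Fraction(1, d) for d in denominators)
    return float(1 / total_rate)


combined = {study: births_per_event(d) for study, d in TRIO_RATES.items()}

for study, births in combined.items():
    print(f"{study}: about one insertion per {births:.0f} births")
```

Under these assumptions, all four trio-based studies give a combined rate on the order of one insertion per a few tens of births, in line with the statement above.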

8. Conclusion and future perspectives

As early as 2005, the Gage group implemented the L1-GFP reporter system and first hypothesized preferential transposition in the brain and its potential benefits [12]. Over the following two decades, advances in experimental and computational methods have enhanced the understanding of somatic transposition in three ways (Table 1). First, short-read sequencing reproduced TE insertions in the normal human brain [43] and confirmed earlier correlations between somatic transposition and neurological disorders (e.g., L1-GFP or PCR [19,121]). Second, single-cell short-read sequencing provided a relatively more accurate transposition rate estimate, much lower than the initial 80 insertions per cell (Table 1). The previously mentioned hypothesis was also moderated: while brain transposition rates may be low, insertions in a few neurons could still substantially impact function due to neuronal circuitry [22]. Third, single-cell short-read sequencing enabled high-resolution temporal analysis, showing that three out of five timed insertions occurred during early brain organogenesis [19,40,41]. Additionally, the recent application of long-read sequencing has resolved somatic TE insertion sequences in the human brain [51]. Both temporal and full-sequence data facilitate interpreting functional impacts. Despite these three lines of progress, the precise rate and function of transposition remain unclear. Discrepancies in reported rates arise from sample heterogeneity (Table 1) and differences in experimental and computational methods. Based on previous work, we identify two promising directions for future exploration.

On the experimental front, further development of low-bias, low template-switching single-cell whole-genome amplification methods is crucial. As mentioned earlier, somatic transposition events present in a subset of cells are difficult to detect in bulk sequencing data, making single-cell sequencing the preferred approach. However, amplification methods used in previous single-cell studies, such as MDA and MALBAC, are associated with underrepresentation of TE regions and with template-switching chimeras [39,63]. Linear Amplification via Transposon Insertion (LIANTI) performs relatively better than these methods [122], but it is intrinsically complex, involving whole-genome transcription and reverse transcription, and has not been used in somatic transposition studies. In principle, LIANTI could be streamlined, as has often been done in the development of sequencing library methods (e.g., [68]). A simplified version of LIANTI or other new low-bias, low-switching methods would enable the generation of high-quality single-cell whole-genome data.

On the computational side, a wave of deep learning-based methods dedicated to somatic transposition detection is on the horizon. Many previous methods have essentially combined sequence alignment with empirical rules, without incorporating machine learning algorithms. As shown by the rapid progress in structural variation detection software, deep learning or neural network-based approaches offer superior performance [79,123]. Since tools specifically designed for TE insertions generally outperform general structural variation detection tools, deep learning-based TE insertion detectors like DeepMEI [80] warrant further development, especially in the following four directions. First, current tools apply machine learning to only one step of the identification process. New tools could benefit from integrating machine learning across multiple steps. Second, the lack of standardized training and benchmark datasets hinders effective model training and evaluation. An ideal dataset would include sufficient artificial chimeras to enhance performance in distinguishing true signals from overwhelming false positives. Third, challenges in TE alignment necessitate the development of novel aligners, more complete reference genomes, and improved TE consensus sequences. Fourth, end-to-end models [124], which directly identify TE insertions from raw sequencing data and minimize preprocessing and reliance on sequence alignment, offer great potential.
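To make concrete what "sequence alignment with empirical rules" can look like, the following toy sketch filters candidate insertions with fixed thresholds. It does not reproduce any published tool: the `Candidate` fields, the `passes_rules` function, and every threshold are hypothetical and chosen only for illustration. The features used (clipped reads, discordant pairs, a poly-A tail, and a target site duplication) are commonly cited hallmarks of retrotransposon insertions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Alignment-derived summary of one putative TE insertion (hypothetical)."""
    clipped_reads: int     # reads soft-clipped at the putative breakpoint
    discordant_pairs: int  # read pairs with one mate aligned to a TE consensus
    has_polya: bool        # poly-A tail observed at the 3' end of the insert
    tsd_length: int        # target site duplication length in bp, 0 if absent


# Hypothetical thresholds; real tools tune such values empirically.
MIN_CLIPPED = 2
MIN_DISCORDANT = 2
TSD_RANGE = range(5, 31)  # accept duplications of 5 to 30 bp


def passes_rules(c: Candidate) -> bool:
    """Apply fixed empirical rules; no learning step is involved."""
    enough_support = (
        c.clipped_reads >= MIN_CLIPPED and c.discordant_pairs >= MIN_DISCORDANT
    )
    has_hallmarks = c.has_polya and c.tsd_length in TSD_RANGE
    return enough_support and has_hallmarks


calls = [
    Candidate(clipped_reads=3, discordant_pairs=4, has_polya=True, tsd_length=15),
    Candidate(clipped_reads=3, discordant_pairs=4, has_polya=False, tsd_length=0),
]
kept = [c for c in calls if passes_rules(c)]
```

A learned model would replace the hand-set constants with parameters fitted to labeled true insertions and artificial chimeras, which is where the standardized training datasets discussed above become important.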

With these evolving experimental and computational methodologies and the rapidly declining costs of short- and long-read sequencing, the landscape of somatic transposition in the brain and other human organs is likely to be revealed soon. Precise detection of transposition will pave the way for subsequent functional studies. Additionally, the study of somatic mutations, especially SNVs, has already become an active field, often referred to as somatic mosaicism. Significant insights have been gained by studying somatic SNVs, such as the early cellular division asymmetry of the brain [125] or the rescue effect of somatic mutations for preexisting germline mutations [126]. Novel insights are expected from an in-depth exploration of somatic transposition. One particularly relevant question is whether transposition could sometimes be beneficial, as hypothesized two decades ago [12]. By studying both small mutations like SNVs and large mutations like TE insertions, we can gain a more complete understanding of how development occurs despite inevitable mutational perturbations.

Declaration of competing interest

The authors declare that they have no conflicts of interest in this work.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (32430021, 32325014), the Ministry of Agriculture and Rural Affairs of China, Institute of Zoology, Chinese Academy of Sciences (2023IOZ0204, 2024IOZ0202), and Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park (Z221100007922017).

Biographies

Yufei Zhang obtained his master’s degree in genomics from the University of Chinese Academy of Sciences. His research interests focus on identifying TE insertions using deep learning methods.

Yanyan Guo is a PhD student at the Institute of Zoology, Chinese Academy of Sciences. Her research interests are transposon insertion and related techniques.

Yong E. Zhang is a researcher at the Institute of Zoology, Chinese Academy of Sciences. From 1997 to 2006, he obtained his bachelor’s and doctoral degrees from Peking University, and from 2007 to 2011, he conducted postdoctoral research at the University of Chicago. He joined the Institute of Zoology in 2011. He focuses on the study of evolutionarily new genes and transposons, with expertise in the mutation mechanisms (e.g., transposition) of gene origin, the impact of mutation mechanisms on the functional evolution of new genes, and the translational applications related to transposons.

References

1. McClintock B. Mutable Loci in Maize. Carnegie Institution of Washington Year Book; Washington, D.C: 1948. pp. 155–169.
2. McClintock B. The origin and behavior of mutable loci in maize. Proc. Natl. Acad. Sci. U.S.A. 1950;36:344–355. doi: 10.1073/pnas.36.6.344.
3. Bourque G., Burns K.H., Gehring M., et al. Ten things you should know about transposable elements. Genome Biol. 2018;19:199. doi: 10.1186/s13059-018-1577-z.
4. Wells J.N., Feschotte C. A field guide to eukaryotic transposable elements. Annu. Rev. Genet. 2020;54:539–561. doi: 10.1146/annurev-genet-040620-022145.
5. Tan S., Ma H., Wang J., et al. DNA transposons mediate duplications via transposition-independent and -dependent mechanisms in metazoans. Nat. Commun. 2021;12:4280. doi: 10.1038/s41467-021-24585-9.
6. Tan S., Cardoso-Moreira M., Shi W., et al. LTR-mediated retroposition as a mechanism of RNA-based duplication in metazoans. Genome Res. 2016;26:1663–1675. doi: 10.1101/gr.204925.116.
7. Hoyt S.J., Storer J.M., Hartley G.A., et al. From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science. 2022;376:eabk3112. doi: 10.1126/science.abk3112.
8. Kapusta A., Kronenberg Z., Lynch V.J., et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet. 2013;9 doi: 10.1371/journal.pgen.1003470.
9. Cosby R.L., Judd J., Zhang R., et al. Recurrent evolution of vertebrate transcription factors by transposase capture. Science. 2021;371:eabc6405. doi: 10.1126/science.abc6405.
10. Playfoot C.J., Duc J., Sheppard S., et al. Transposable elements and their KZFP controllers are drivers of transcriptional innovation in the developing human brain. Genome Res. 2021;31:1531–1545. doi: 10.1101/gr.275133.120.
11. Mills R.E., Bennett E.A., Iskow R.C., et al. Which transposable elements are active in the human genome? Trends Genet. 2007;23:183–191. doi: 10.1016/j.tig.2007.02.006.
12. Muotri A.R., Chu V.T., Marchetto M.C.N., et al. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature. 2005;435:903–910. doi: 10.1038/nature03663.
13. Faulkner G.J., Billon V. L1 retrotransposition in the soma: A field jumping ahead. Mob. DNA. 2018;9:22. doi: 10.1186/s13100-018-0128-1.
14. Rodriguez-Martin B., Alvarez E.G., Baez-Ortega A., et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 2020;52:306–319. doi: 10.1038/s41588-019-0562-0.
15. Ardeljan D., Burns K.H. In: Retrotransposons and Human Disease. Gabriel A., editor. World Scientific; Singapore: 2021. LINE-1 mobilization in cancers: More the rule than the exception; pp. 221–243.
16. Burns K.H. Repetitive DNA in disease. Science. 2022;376:353–354. doi: 10.1126/science.abl7399.
17. Burns K.H. Our conflict with transposable elements and its implications for human disease. Annu. Rev. Pathol. 2020;15:51–70. doi: 10.1146/annurev-pathmechdis-012419-032633.
18. Bundo M., Toyoshima M., Okada Y., et al. Increased L1 retrotransposition in the neuronal genome in schizophrenia. Neuron. 2014;81:306–313. doi: 10.1016/j.neuron.2013.10.053.
19. Zhao B., Wu Q., Ye A.Y., et al. Somatic LINE-1 retrotransposition in cortical neurons and non-brain tissues of Rett patients and healthy individuals. PLoS Genet. 2019;15 doi: 10.1371/journal.pgen.1008043.
20. Gorbunova V., Seluanov A., Mita P., et al. The role of retrotransposable elements in ageing and age-associated diseases. Nature. 2021;596:43–53. doi: 10.1038/s41586-021-03542-y.
21. Benitez-Guijarro M., Benkaddour-Boumzaouad M., Garcia-Perez J.L. In: Retrotransposons and Human Disease. Gabriel A., editor. World Scientific; Singapore: 2021. LINE-1 retrotransposons, stem cells, and human neurodevelopmental disorders; pp. 129–161.
22. Bedrosian T.A., Linker S.B., Gage F.H. In: Retrotransposons and Human Disease. Gabriel A., editor. World Scientific; Singapore: 2021. Retrotransposons in the mammalian brain; pp. 199–220.
23. Morse B., Rotherg P.G., South V.J., et al. Insertional mutagenesis of the myc locus by a LINE-1 sequence in a human breast carcinoma. Nature. 1988;333:87–90. doi: 10.1038/333087a0.
24. Miki Y., Nishisho I., Horii A., et al. Disruption of the APC gene by a retrotransposal insertion of L1 sequence in a colon cancer. Cancer Res. 1992;52:643–645.
25. van den Hurk J.A.J.M., Meij I.C., del Carmen Seleme M., et al. L1 retrotransposition can occur early in human embryonic development. Hum. Mol. Genet. 2007;16:1587–1592. doi: 10.1093/hmg/ddm108.
26. de Boer M., van Leeuwen K., Geissler J., et al. Primary immunodeficiency caused by an exonized retroposed gene copy inserted in the CYBB gene. Hum. Mutat. 2014;35:486–496. doi: 10.1002/humu.22519.
27. Helman E., Lawrence M.S., Stewart C., et al. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 2014;24:1053–1063. doi: 10.1101/gr.163659.113.
28. Vogt J., Bengesser K., Claes K.B.M., et al. SVA retrotransposon insertion-associated deletion represents a novel mutational mechanism underlying large genomic copy number changes with non-recurrent breakpoints. Genome Biol. 2014;15:R80. doi: 10.1186/gb-2014-15-6-r80.
29. Ardeljan D., Steranka J.P., Liu C., et al. Cell fitness screens reveal a conflict between LINE-1 retrotransposition and DNA replication. Nat. Struct. Mol. Biol. 2020;27:168–178. doi: 10.1038/s41594-020-0372-1.
30. Borges-Monroy R., Chu C., Dias C., et al. Whole-genome analysis reveals the contribution of non-coding de novo transposon insertions to autism spectrum disorder. Mob. DNA. 2021;12:28. doi: 10.1186/s13100-021-00256-w.
31. Grundy E.E., Diab N., Chiappinelli K.B. Transposable element regulation and expression in cancer. FEBS J. 2022;289:1160–1179. doi: 10.1111/febs.15722.
32. Yu L., Li W., Lv G., et al. De novo somatic mosaicism of CYBB caused by intronic LINE-1 element insertion resulting in chronic granulomatous disease. J. Clin. Immunol. 2023;43:88–100. doi: 10.1007/s10875-022-01347-w.
33. Marchetto M.C.N., Gage F.H., Muotri A.R. In: Perspectives of Stem Cells: From Tools for Studying Mechanisms of Neuronal Differentiation Towards Therapy. Ulrich H., editor. Springer Netherlands; Dordrecht: 2010. Retrotransposition and neuronal diversity; pp. 87–96.
34. Singer T., McConnell M.J., Marchetto M.C.N., et al. LINE-1 retrotransposons: Mediators of somatic variation in neuronal genomes? Trends Neurosci. 2010;33:345–354. doi: 10.1016/j.tins.2010.04.001.
35. Erwin J.A., Marchetto M.C., Gage F.H. Mobile DNA elements in the generation of diversity and complexity in the brain. Nat. Rev. Neurosci. 2014;15:497–506. doi: 10.1038/nrn3730.
36. Upton K.R., Gerhardt D.J., Jesuadian J.S., et al. Ubiquitous L1 mosaicism in hippocampal neurons. Cell. 2015;161:228–239. doi: 10.1016/j.cell.2015.03.026.
37. Erwin J.A., Paquola A.C.M., Singer T., et al. L1-associated genomic regions are deleted in somatic cells of the healthy human brain. Nat. Neurosci. 2016;19:1583–1591. doi: 10.1038/nn.4388.
38. Evrony G.D., Cai X., Lee E., et al. Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell. 2012;151:483–496. doi: 10.1016/j.cell.2012.09.035.
39. Evrony G.D., Lee E., Park P.J., et al. Resolving rates of mutation in the brain using single-neuron genomics. eLife. 2016;5:e12966. doi: 10.7554/eLife.12966.
40. Evrony G.D., Lee E., Mehta B.K., et al. Cell lineage analysis in human brain using endogenous retroelements. Neuron. 2015;85:49–59. doi: 10.1016/j.neuron.2014.12.028.
41. Zhu X., Zhou B., Pattni R., et al. Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nat. Neurosci. 2021;24:186–196. doi: 10.1038/s41593-020-00767-4.
42. Coufal N.G., Garcia-Perez J.L., Peng G.E., et al. L1 retrotransposition in human neural progenitor cells. Nature. 2009;460:1127–1131. doi: 10.1038/nature08248.
43. Baillie J.K., Barnett M.W., Upton K.R., et al. Somatic retrotransposition alters the genetic landscape of the human brain. Nature. 2011;479:534–537. doi: 10.1038/nature10531.
44. Richardson S.R., Morell S., Faulkner G.J. L1 retrotransposons and somatic mosaicism in the brain. Annu. Rev. Genet. 2014;48:1–27. doi: 10.1146/annurev-genet-120213-092412.
45. Kurnosov A.A., Ustyugova S.V., Nazarov V.I., et al. The evidence for increased L1 activity in the site of human adult brain neurogenesis. PLoS One. 2015;10 doi: 10.1371/journal.pone.0117854.
46. Jacob-Hirsch J., Eyal E., Knisbacher B.A., et al. Whole-genome sequencing reveals principles of brain retrotransposition in neurodevelopmental disorders. Cell Res. 2018;28:187–203. doi: 10.1038/cr.2018.8.
47. Muñoz-Lopez M., Vilar R., Philippe C., et al. LINE-1 retrotransposition impacts the genome of human pre-implantation embryos and extraembryonic tissues. bioRxiv. 2019 doi: 10.1101/522623.
48. Berteli T.S., Wang F., McKerrow W., et al. Transposon insertion profiling by sequencing (TIPseq) identifies novel LINE-1 insertions in human sperm. J. Assist. Reprod. Genet. 2023;40:1835–1843. doi: 10.1007/s10815-023-02852-6.
49. Tang Z., Steranka J.P., Ma S., et al. Human transposon insertion profiling: Analysis, visualization and identification of somatic LINE-1 insertions in ovarian cancer. Proc. Natl. Acad. Sci. U.S.A. 2017;114:E733–E740. doi: 10.1073/pnas.1619797114.
50. Möhner J., Scheuren M., Woronzow V., et al. RDA coupled with deep sequencing detects somatic SVA-retrotranspositions and mosaicism in the human brain. Front. Cell Dev. Biol. 2023;11 doi: 10.3389/fcell.2023.1201258.
51. Ramirez P., Sun W., Kazempour Dehkordi S., et al. Nanopore-based DNA long-read sequencing analysis of the aged human brain. bioRxiv. 2024 https://www.biorxiv.org/content/10.1101/2024.02.01.578450v1
52. Ewing A.D., Smits N., Sanchez-Luque F.J., et al. Nanopore sequencing enables comprehensive transposable element epigenomic profiling. Mol. Cell. 2020;80:915–928. doi: 10.1016/j.molcel.2020.10.024.
53. Wallace A.D., Blue N.R., Morgan T., et al. Placental somatic mutation in human stillbirth and live birth: A pilot case-control study of paired placental, fetal, and maternal whole genomes. Placenta. 2024;154:137–144. doi: 10.1016/j.placenta.2024.06.017.
54. Gardner E.J., Lam V.K., Harris D.N., et al. The Mobile Element Locator Tool (MELT): Population-scale mobile element discovery and biology. Genome Res. 2017;27:1916–1929. doi: 10.1101/gr.218032.116.
55. Sanchez-Luque F.J., Kempen M.-J.H.C., Gerdes P., et al. LINE-1 evasion of epigenetic repression in humans. Mol. Cell. 2019;75:590–604. doi: 10.1016/j.molcel.2019.05.024.
56. Carreira P.E., Ewing A.D., Li G., et al. Evidence for L1-associated DNA rearrangements and negligible L1 retrotransposition in glioblastoma multiforme. Mob. DNA. 2016;7:21. doi: 10.1186/s13100-016-0076-6.
57. Nam C.H., Youk J., Kim J.Y., et al. Widespread somatic L1 retrotransposition in normal colorectal epithelium. Nature. 2023;617:540–547. doi: 10.1038/s41586-023-06046-z.
58. Tubio J.M.C., Li Y., Ju Y.S., et al. Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science. 2014;345 doi: 10.1126/science.1251343.
59. Rausch T., Zichner T., Schlattl A., et al. DELLY: Structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–i339. doi: 10.1093/bioinformatics/bts378.
60. Chu C., Borges-Monroy R., Viswanadham V.V., et al. Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nat. Commun. 2021;12:3836. doi: 10.1038/s41467-021-24041-8.
61. Goerner-Potvin P., Bourque G. Computational tools to unmask transposable elements. Nat. Rev. Genet. 2018;19:688–704. doi: 10.1038/s41576-018-0050-x.
62. Goodier J.L. Retrotransposition in tumors and brains. Mob. DNA. 2014;5:11. doi: 10.1186/1759-8753-5-11.
63. Hou Y., Wu K., Shi X., et al. Comparison of variations detection between whole-genome amplification methods used in single-cell resequencing. GigaScience. 2015;4:37. doi: 10.1186/s13742-015-0068-3.
64. Hazen J.L., Faust G.G., Rodriguez A.R., et al. The complete genome sequences, unique mutational spectra, and developmental potency of adult neurons revealed by cloning. Neuron. 2016;89:1223–1236. doi: 10.1016/j.neuron.2016.02.004.
65. Chaisson M.J.P., Huddleston J., Dennis M.Y., et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature. 2015;517:608–611. doi: 10.1038/nature13907.
66. Audano P.A., Sulovari A., Graves-Lindsay T.A., et al. Characterizing the major structural variant alleles of the human genome. Cell. 2019;176:663–675. doi: 10.1016/j.cell.2018.12.019.
67. Chaisson M.J.P., Sanders A.D., Zhao X., et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 2019;10:1784. doi: 10.1038/s41467-018-08148-z.
68. Jia H., Tan S., Cai Y., et al. Low-input PacBio sequencing generates high-quality individual fly genomes and characterizes mutational processes. Nat. Commun. 2024;15:5644. doi: 10.1038/s41467-024-49992-6.
69. Jia H., Tan S., Zhang Y.E. Chasing sequencing perfection: Marching toward higher accuracy and lower costs. Genom. Proteom. Bioinformat. 2024;22:qzae024. doi: 10.1093/gpbjnl/qzae024.
70. McDonald T.L., Zhou W., Castro C.P., et al. Cas9 targeted enrichment of mobile elements using nanopore sequencing. Nat. Commun. 2021;12:3586. doi: 10.1038/s41467-021-23918-y.
71. Zhang X., Celic I., Mitchell H., et al. Comprehensive profiling of L1 retrotransposons in mouse. Nucleic Acids Res. 2024;52:5166–5178. doi: 10.1093/nar/gkae273.
72. Chu C., Zhao B., Park P.J., et al. Identification and genotyping of transposable element insertions from genome sequencing data. Curr. Protoc. Hum. Genet. 2020;107:e102. doi: 10.1002/cphg.102.
73. Rishishwar L., Mariño-Ramírez L., Jordan I.K. Benchmarking computational tools for polymorphic transposable element detection. Brief. Bioinform. 2016;18:908–918. doi: 10.1093/bib/bbw072.
74. Wang J., Song L., Grover D., et al. dbRIP: A highly integrated database of retrotransposon insertion polymorphisms in humans. Hum. Mutat. 2006;27:323–329. doi: 10.1002/humu.20307.
75. Mir A.A., Philippe C., Cristofari G. euL1db: The European database of L1HS retrotransposon insertions in humans. Nucleic Acids Res. 2015;43:D43–D47. doi: 10.1093/nar/gku1043.
76. Collins R.L., Brand H., Karczewski K.J., et al. A structural variation reference for medical and population genetics. Nature. 2020;581:444–451. doi: 10.1038/s41586-020-2287-8.
77. Prakrithi P., Singhal K., Sharma D., et al. An Alu insertion map of the Indian population: Identification and analysis in 1021 genomes of the IndiGen project. NAR Genom. Bioinform. 2022;4:lqac009. doi: 10.1093/nargab/lqac009.
78. Poplin R., Chang P.-C., Alexander D., et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 2018;36:983–987. doi: 10.1038/nbt.4235.
79. Wang S., Lin J., Jia P., et al. De novo and somatic structural variant discovery with SVision-pro. Nat. Biotechnol. 2024;43:181–185. doi: 10.1038/s41587-024-02190-7.
80. Xu X., Huang Y., Wang X., et al. Identification of mobile element insertion from whole genome sequencing data using deep neural network model. bioRxiv. 2023 doi: 10.1101/2023.03.07.531451.
81. Thung D.T., de Ligt J., Vissers L.E.M., et al. Mobster: Accurate detection of mobile element insertions in next generation sequencing data. Genome Biol. 2014;15:488. doi: 10.1186/s13059-014-0488-x.
82. Chen J., Wrightsman T.R., Wessler S.R., et al. RelocaTE2: A high resolution transposable element insertion site mapping tool for population resequencing. PeerJ. 2017;5:e2942. doi: 10.7717/peerj.2942.
83. Santander C.G., Gambron P., Marchi E., et al. STEAK: A specific tool for transposable elements and retrovirus detection in high-throughput sequencing data. Virus Evol. 2017;3:vex023. doi: 10.1093/ve/vex023.
84. Puurand T., Kukuškina V., Pajuste F.-D., et al. AluMine: Alignment-free method for the discovery of polymorphic Alu element insertions. Mob. DNA. 2019;10:31. doi: 10.1186/s13100-019-0174-3.
85. Chen X., Li D. ERVcaller: Identifying polymorphic endogenous retrovirus and other transposable element insertions using whole-genome sequencing data. Bioinformatics. 2019;35:3913–3922. doi: 10.1093/bioinformatics/btz205.
86. Loh J.W., Ha H., Lin T., et al. Integrated mobile element scanning (ME-Scan) method for identifying multiple types of polymorphic mobile element insertions. Mob. DNA. 2020;11:12. doi: 10.1186/s13100-020-00207-x.
87. Orozco-Arias S., Tobon-Orozco N., Piña J.S., et al. TIP_finder: An HPC software to detect transposable element insertion polymorphisms in large genomic datasets. Biology (Basel) 2020;9:281. doi: 10.3390/biology9090281.
88. Goubert C., Thomas J., Payer L.M., et al. TypeTE: A tool to genotype mobile element insertions from whole genome resequencing data. Nucleic Acids Res. 2020;48:e36. doi: 10.1093/nar/gkaa074.
89. Yu T., Huang X., Dou S., et al. A benchmark and an algorithm for detecting germline transposon insertions and measuring de novo transposon insertion frequencies. Nucleic Acids Res. 2021;49:e44. doi: 10.1093/nar/gkab010.
90. Rajaby R., Liu D.-X., Au C.H., et al. INSurVeyor: Improving insertion calling from short read sequencing data. Nat. Commun. 2023;14:3243. doi: 10.1038/s41467-023-38870-2.
91. Shiraishi Y., Koya J., Chiba K., et al. Precise characterization of somatic complex structural variations from tumor/control paired long-read sequencing data with nanomonsv. Nucleic Acids Res. 2023;51:e74. doi: 10.1093/nar/gkad526.
92. Solovyov A., Behr J.M., Hoyos D., et al. Mechanism-guided quantification of LINE-1 reveals p53 regulation of both retrotransposition and transcription. bioRxiv. 2023 doi: 10.1101/2023.05.11.539471.
93. Chen J., Basting P.J., Han S., et al. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob. DNA. 2023;14:8. doi: 10.1186/s13100-023-00296-4.
94. Jiang T., Liu B., Li J., et al. rMETL: Sensitive mobile element insertion detection with long read realignment. Bioinformatics. 2019;35:3484–3486. doi: 10.1093/bioinformatics/btz106.
95. Zhou W., Emery S.B., Flasch D.A., et al. Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology. Nucleic Acids Res. 2020;48:1146–1163. doi: 10.1093/nar/gkz1173.
96. Ebert P., Audano P.A., Zhu Q., et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021;372:eabf7117. doi: 10.1126/science.abf7117.
97. Ma H., Zhong C., Sun H., et al. ricME: Long-read based mobile element variant detection using sequence realignment and identity calculation. Bioinform. Res. Appl. 2023:165–177.
98. D’Costa A.V., Simpson J.T. Somrit: The somatic retrotransposon insertion toolkit. bioRxiv. 2023 doi: 10.1101/2023.08.06.552193.
99. Groza C., Chen X., Wheeler T.J., et al. A unified framework to analyze transposable element insertion polymorphisms using graph genomes. Nat. Commun. 2024;15:8915. doi: 10.1038/s41467-024-53294-2.
100. Kazazian H.H. In: Retrotransposons and Human Disease. Gabriel A., editor. World Scientific; Singapore: 2021. Retrotransposition as a cause of human disease: An update; pp. 115–127.
  • 101.Cordaux R., Hedges D.J., Herke S.W., et al. Estimating the retrotransposition rate of human Alu elements. Gene. 2006;373:134–137. doi: 10.1016/j.gene.2006.01.019. [DOI] [PubMed] [Google Scholar]
  • 102.Kazazian H.H. An estimated frequency of endogenous insertional mutations in humans. Nat. Genet. 1999;22:130. doi: 10.1038/9638. [DOI] [PubMed] [Google Scholar]
  • 103.Ewing A.D., Kazazian H.H. High-throughput sequencing reveals extensive variation in human-specific L1 content in individual human genomes. Genome Res. 2010;20:1262–1270. doi: 10.1101/gr.106419.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 104.Chu C., Ljungström V., Tran A., et al. Contribution of de novo retroelements to birth defects and childhood cancers. medRxiv. 2024 doi: 10.1101/2024.04.15.24305733. [DOI] [Google Scholar]
  • 105.Deininger P.L., Batzer M.A. Alu repeats and Human disease. Mol. Genet. Metab. 1999;67:183–193. doi: 10.1006/mgme.1999.2864. [DOI] [PubMed] [Google Scholar]
  • 106.Deininger P.L., Batzer M.A. In: The Impact of Short Interspersed Elements (SINEs) on the host genome. Maraia R., editor. Landes; Georgetown, Texas: 1995. SINE master genes and population biology; pp. 43–60. [Google Scholar]
  • 107.Li X., Scaringe W.A., Hill K.A., et al. Frequency of recent retrotransposition events in the human factor IX gene. Hum. Mutat. 2001;17:511–519. doi: 10.1002/humu.1134. [DOI] [PubMed] [Google Scholar]
  • 108.Brouha B., Schustak J., Badge R.M., et al. Hot L1s account for the bulk of retrotransposition in the human population. Proc. Natl. Acad. Sci. U.S.A. 2003;100:5280–5285. doi: 10.1073/pnas.0831042100. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 109.Xing J., Zhang Y., Han K., et al. Mobile elements create structural variation: Analysis of a complete human genome. Genome Res. 2009;19:1516–1526. doi: 10.1101/gr.091827.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 110.Huang C.R.L., Schneider A.M., Lu Y., et al. Mobile interspersed repeats are major structural variants in the human genome. Cell. 2010;141:1171–1182. doi: 10.1016/j.cell.2010.05.026. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 111.Feusier J., Watkins W.S., Thomas J., et al. Pedigree-based estimation of human mobile element retrotransposition rates. Genome Res. 2019;29:1567–1577. doi: 10.1101/gr.247965.118. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112.Ostrander B.E.P., Butterfield R.J., Pedersen B.S., et al. Whole-genome analysis for effective clinical diagnosis and gene discovery in early infantile epileptic encephalopathy. npj Genom. Med. 2018;3:22. doi: 10.1038/s41525-018-0061-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113.Rajaby R., Sung W.-K. TranSurVeyor: An improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data. Nucleic Acids Res. 2018;46:e122. doi: 10.1093/nar/gky685. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 114.Gardner E.J., Prigmore E., Gallone G., et al. Contribution of retrotransposition to developmental disorders. Nat. Commun. 2019;10:4630. doi: 10.1038/s41467-019-12520-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115.Belyeu J.R., Brand H., Wang H., et al. De novo structural mutation rates and gamete-of-origin biases revealed through genome sequencing of 2,396 families. Am. J. Hum. Genet. 2021;108:597–607. doi: 10.1016/j.ajhg.2021.02.012. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116.Layer R.M., Chiang C., Quinlan A.R., et al. LUMPY: A probabilistic framework for structural variant discovery. Genome Biol. 2014;15:R84. doi: 10.1186/gb-2014-15-6-r84. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 117.Chen X., Schulz-Trieglaff O., Shaw R., et al. Manta: Rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics. 2015;32:1220–1222. doi: 10.1093/bioinformatics/btv710. [DOI] [PubMed] [Google Scholar]
  • 118.Kronenberg Z.N., Osborne E.J., Cone K.R., et al. Wham: Identifying structural variants of biological consequence. PLoS Comput. Biol. 2015;11 doi: 10.1371/journal.pcbi.1004572. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119.Werling D.M., Brand H., An J.-Y., et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 2018;50:727–736. doi: 10.1038/s41588-018-0107-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 120.Niu Y., Teng X., Zhou H., et al. Characterizing mobile element insertions in 5675 genomes. Nucleic Acids Res. 2022;50:2493–2508. doi: 10.1093/nar/gkac128. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 121.Muotri A.R., Marchetto M.C.N., Coufal N.G., et al. L1 retrotransposition in neurons is modulated by MeCP2. Nature. 2010;468:443–446. doi: 10.1038/nature09544. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 122.Chen C., Xing D., Tan L., et al. Single-cell whole-genome analyses by Linear Amplification via Transposon insertion (LIANTI). Science. 2017;356:189–194. doi: 10.1126/science.aak9787. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 123.Lin J., Wang S., Audano P.A., et al. SVision: A deep learning approach to resolve complex structural variants. Nat. Methods. 2022;19:1230–1233. doi: 10.1038/s41592-022-01609-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 124.Peng C., Liang F., Xia Y., et al. Recent advances and challenges in protein structure prediction. J. Chem. Inf. Model. 2024;64:76–95. doi: 10.1021/acs.jcim.3c01324. [DOI] [PubMed] [Google Scholar]
  • 125.Rodin R.E., Dou Y., Kwon M., et al. The landscape of somatic mutation in cerebral cortex of autistic and neurotypical individuals revealed by ultra-deep whole-genome sequencing. Nat. Neurosci. 2021;24:176–185. doi: 10.1038/s41593-020-00765-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 126.Yu Z., Coorens T.H.H., Uddin M.M., et al. Genetic variation across and within individuals. Nat. Rev. Genet. 2024;25:548–562. doi: 10.1038/s41576-024-00709-x. [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from Fundamental Research are provided here courtesy of The Science Foundation of China Publication Department, The National Natural Science Foundation of China
