Abstract
While mobile elements are largely inactive in healthy somatic tissues, increased activity has been found in cancer tissues, with significant variation among different cancer types. In addition to insertion events, mobile elements have also been found to mediate many structural variation events in the genome. Here, to better understand the timing and impact of mobile element insertions and associated structural variants in cancer, we examined their activity in longitudinal samples of four metastatic breast cancer patients. We identified 11 mobile element insertions or associated structural variants and found that the majority of these occurred early in tumor progression. Most of the variants impact intergenic regions; however, we identified a translocation interrupting MAP2K4 involving Alu elements and a deletion in YTHDF2 involving mobile elements that likely inactivate reported tumor suppressor genes. The high variant allele fraction of the translocation, the loss of the other copy of MAP2K4, the recurrent loss-of-function mutations found in this gene in other cancers, and the important function of MAP2K4 indicate that this translocation is potentially a driver mutation. Overall, using a unique longitudinal dataset, we find that most variants are likely passenger mutations in the four patients we examined, but some variants impact tumor progression.
Subject terms: Genetics, Diseases, Cancer, Cancer genetics, Cancer genomics
Introduction
Mobile elements, or transposable elements, are segments of DNA that are capable of mobilizing from one genomic location to another. These elements compose a significant portion of the human genome, with estimates ranging from nearly 50 to 66%1,2. In humans, retrotransposons represent a class of active mobile elements, inserting new elements through a “copy and paste” mechanism3. While most of these elements have become transcriptionally inactive over time4,5, three mobile element families remain active in humans, including LINE-1 (L1), Alu elements, and SVA. These three mobile element families compose nearly one-third of the human genome2,6. With only a small fraction of these elements retaining the ability to retrotranspose, recent work has shown that germline mobile element insertions in humans are quite rare7. The majority of these insertions seem to have no adverse impact; however, some insertions have been found to cause disease8. Mobile element activity in human disease became a subject of interest after two hemophilia A patients were found to have de novo L1 insertions in the F8 gene9. Since this initial discovery, mobile element insertions have been found to be associated with more than 130 disease cases10.
Somatic mobile element insertions have not been identified in many healthy tissues, though detection of these low-frequency events is difficult without sufficient coverage. An exception to this is in neurons, where somatic mosaicism of L1 insertions has been identified11–14. This increase in activity may be due to slight changes in methylation at L1 loci11. Recently, multiple studies have noted varying degrees of activity of mobile elements in numerous cancer tissues15–22 with some analyses including metastatic samples20,23,24. The level of activity has been found to be quite variable in patients with the same type of cancer and also shows a high degree of variability among cancer types17,20. Regardless of cancer type, roughly half of all tumors have at least one somatic L1 insertion. Breast cancer patients with L1 activity were found to often have a single L1 insertion, with a small number of patients showing up to five insertions20.
The impact of mobile elements on the cancer genome is not limited to somatic insertions but also includes the structural variation (SV) events associated with existing mobile elements (reviewed in25). Notably, mobile elements mediate roughly 10% of all SV events larger than 100 base pairs in the human genome26. Mobile elements have been found to mediate SV events that can lead to cancer development27,28. For example, approximately 42% of the BRCA1 gene is composed of Alu elements29, making this gene a target for non-allelic homologous recombination, specifically Alu-Alu associated events30–33. Additional mobile element events have been classified in breast cancer as well34,35. L1 transduction events have also been examined in a variety of cancers20.
Multiple studies have shown that L1s, in particular, demonstrate increased retrotransposition activity in cancer tissues. However, the timing of these insertions or mobile element associated SVs has not been thoroughly investigated. Additionally, L1s and the structural variants that they mediate have been the focus of most previous studies, though Alu and SVA insertions or the SVs they mediate have been targeted in some studies19,20,22. In this study, we utilize longitudinally sampled breast cancer tissues to better understand the timing and significance of increased mobile element activity on the cancer genome.
Results
To analyze mobile element activity during tumor progression, we used longitudinal whole-genome sequencing (WGS) data from four metastatic breast cancer patients. Tumor samples were obtained from these patients over 2–15 years with varying WGS and bulk RNA-seq timepoints (2–6) available for our analyses. Many of the tumor samples were collected from ascites and the surrounding pleural fluid. Tumor cells in ascites are descended from the primary tumor, though they may contain additional mutations and are likely to be polyclonal. Using ascites to analyze tumors may lead to improved characterization of the tumor and of the clonal changes that occur36,37. Germline DNA was sequenced from the blood of each patient. The WGS timepoints for each patient, and a summary of treatment information over time, are shown in Supplemental Figure 1. More detailed treatment information for each patient can be found in the supplementary information of Brady et al. 201738. All four patients were estrogen receptor (ER)+. Patients 1 and 2 were human epidermal growth factor receptor 2 (HER2)+, while Patients 3 and 4 were HER2−.
We analyzed each of the four patients for mobile element insertions utilizing three tools (MELT, TranSurVeyor, and RUFUS, further described in the “Methods” section). After identifying and filtering mobile element insertions and SVs (see “Methods” section), we generated a list of potential insertion sites. These potential insertion sites were compared to the matched germline sample for each patient to ensure that the identified insertion was a somatic event. Timepoints for each patient were analyzed individually and compared to the matched germline sample. Following our identification and filtering steps, we were able to identify mobile elements and associated variants at both very high Variant Allele Fraction (VAF) (100%) and very low VAF (~ 5%). While the three tools used in this study generally identified the same insertion events, MELT and RUFUS excelled at identifying Alu elements, and TranSurVeyor found an additional L1 insertion that was not identified by the other tools.
Collectively, we identified seven mobile element insertions (either classical or non-classical) and four structural variants involving mobile elements (Table 1). Supplemental Table 1 shows which tools detected each variant. These variants all appear to be somatic mutations acquired during tumorigenesis or during tumor progression because these variants are not present in the corresponding germline samples. IGV images with schematics are shown for three of the events that we identified in Fig. 1, with schematics of the remaining events included as Supplemental Figures 2, 3, and 4. We find that the majority of insertions and variants associated with mobile elements (nine of the eleven) occur before the first sampling timepoint because these were already present in the first and then all subsequent samples. The two variants that were not present in the first sampling timepoint were both SVA insertions. Both SVA insertions identified were present at low frequency in these patients (IGV images shown in Supplemental Figure 5). We also find high variability in the number of insertions and variants that were identified in each of these patients, with three of the patients (Patients 1, 2, and 4) showing multiple insertions and variants and Patient 3 showing only a single structural variant associated with mobile elements.
Table 1.
Patient | Locus | Variant | Region | Genes affected | Present before first tumor timepoint |
---|---|---|---|---|---|
Patient 1 | 1:186,755,400 | Alu (non-classical insertion) | Intergenic | | Yes |
Patient 1 | 1:29,089,030 | Alu-Alu associated deletion (23 kb) | Exon | YTHDF2 | Yes |
Patient 1 | 2:62,111,770 | SVA insertion | Intronic | CCT4 | No |
Patient 1 | 17:11,974,341/22:48,343,831 | Alu-associated translocation | Exon | MAP2K4 | Yes |
Patient 2 | 2:15,168,000 | Alu (non-classical insertion) | Intergenic | | Yes |
Patient 2 | 8:99,227,400 | L1 insertion | Exon | NIPAL2 | Yes |
Patient 2 | 13:19,866,150 | L1 insertion | Intronic | ANKRPD26P3 | Yes |
Patient 3 | 6:105,643,750 | Alu-Alu associated deletion | Intergenic | | Yes |
Patient 4 | 10:67,431,020 | Alu insertion | Intergenic | | Yes |
Patient 4 | 11:81,149,160 | L1-associated deletion | Intergenic | | Yes |
Patient 4 | 19:23,124,500 | SVA insertion | Intergenic | | No |
Insertion or variant sites found within genes (exonic or intronic) are indicated in the “Genes affected” column. Insertion events described as “non-classical insertions” do not show any hallmarks of L1-mediated insertion events.
The use of multiple tools and visual inspection of the identified insertions excluded a large number of false positive calls. In total, MELT identified 8825 potential mobile element variants, and TranSurVeyor identified 69,394 potential mobile element variants. For MELT, we analyzed every polymorphic variant (regardless of genotype) to determine if it was unique to the tumor sample. For TranSurVeyor, we did not use the built-in filtering mechanism to ensure that low frequency variants would be included for further examination. Following this filtering, we retained only those variants that passed visual inspection (see “Methods” section). These 11 variants are shown in Table 1. Counts for the images produced by RUFUS were not included here as RUFUS is designed to detect many types of variants, not exclusively mobile elements.
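The tumor-versus-germline comparison described above can be pictured as a simple position-based filter. The following is a minimal sketch only, not the authors' pipeline: the `Call` record, the `somatic_candidates` function, and the 100 bp matching window are all assumptions made for illustration, and the chromosome 5 coordinates are hypothetical (the SVA locus is taken from Table 1).

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical record for one candidate mobile element call."""
    chrom: str
    pos: int
    family: str  # e.g. "ALU", "L1", "SVA"


def somatic_candidates(tumor_calls, germline_calls, window=100):
    """Keep tumor calls with no same-family germline call within `window` bp.

    The window size is an illustrative assumption, not a value from the study.
    """
    # Index germline positions by (chromosome, family) for quick lookup.
    germline_by_key = {}
    for g in germline_calls:
        germline_by_key.setdefault((g.chrom, g.family), []).append(g.pos)

    kept = []
    for t in tumor_calls:
        nearby = germline_by_key.get((t.chrom, t.family), [])
        if not any(abs(t.pos - p) <= window for p in nearby):
            kept.append(t)
    return kept


# The SVA locus is from Table 1; the chromosome 5 calls are hypothetical.
tumor = [Call("2", 62111770, "SVA"), Call("5", 1000000, "ALU")]
germline = [Call("5", 1000040, "ALU")]
print(somatic_candidates(tumor, germline))
```

A filter of this kind only narrows the candidate list; in the study, every retained call still had to pass visual inspection.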
Most of the identified insertions and variants were found in intergenic regions and are unlikely to impact gene expression as they do not overlap known regulatory or ultra-conserved regions. However, we identified multiple insertions that appeared to impact intronic regions: an SVA insertion in CCT4 in Patient 1, and L1 insertions in NIPAL2 and ANKRPD26P3, both in Patient 2. We also identified two Alu-associated somatic structural variation events that impact exons in two genes: an Alu-Alu associated deletion in YTHDF2 (with ~ 90% homology between the two Alu elements, excluding the poly(A) tail) and a translocation involving Alu elements that impacted MAP2K4. Both of these Alu-associated structural variants occurred in Patient 1 and occurred before the first sampling timepoint but were not present in the germline sequencing data.
To account for contamination from non-tumor cells, we estimated the tumor purity (or cancer cell fraction) of each timepoint. The purity estimates, as well as estimates for genome-wide copy number for each timepoint in each patient, are included in Supplemental Table 2. There are many changes in genomic copy number, with genome-wide estimates ranging from 1.86, just below normal cellular copy number, up to 4.12, more than twice that seen in most normal cells. In these patients, tumor purity ranged from 30.91 to 94.5%, with a mean value of 76.25%. The only formalin-fixed, paraffin-embedded (FFPE) sample in the dataset had the lowest tumor purity value.
We calculated the VAF for each of the mobile element insertions or SVs identified after adjusting for tumor purity (Fig. 2). The adjusted VAFs are slightly higher than, but very similar to, the unadjusted VAFs (Supplemental Figure 6). Of the four insertions and SVs identified in Patient 1 (Fig. 2A), the deletion in YTHDF2 involving Alu elements and the translocation in MAP2K4 were present at very high frequency (> 80% for the deletion; 100% for the translocation) at the first timepoint, with decreased frequency over time. The other two identified events were insertions (an Alu and an SVA) and were present at much lower frequencies. The SVA identified in this patient was not present at the first two sampling timepoints but was present by the third and remained in the tumor cells in the fourth sampling timepoint. Each of these events shows a slight decrease over sampling timepoints. We identified three somatic insertions in Patient 2 (Fig. 2B): two L1 insertions and one Alu insertion. The L1 insertion on Chromosome 13 was present at a higher frequency than the other two insertions found in this patient. The frequency of these variants closely resembles the frequency of the mobile element insertions seen in Patient 1, but does not reach the high frequency seen in the translocation or deletion of Patient 1.
Of the four individuals sampled, Patient 3 (Fig. 2C) had the fewest variants, a single deletion involving Alu elements (~ 98% homology between the short sequences involved). This deletion was present at a high frequency in the first sampling timepoint (about 75% of reads), but decreased to approximately 50% by the second sampling timepoint. In Patient 4 (Fig. 2D), we identified two insertions, an Alu and an SVA, and a single mobile element associated SV, an L1–L1 associated deletion. The first sequenced sampling timepoint for Patient 4 has been excluded from the VAF plot as it was an FFPE sample, decreasing our ability to accurately calculate a VAF for this timepoint. The SVA insertion identified in this patient was not present for the first sampling timepoint but was present for the next two sampling timepoints. By the final sampling timepoint, the SVA was no longer observable in the sequencing data. The L1–L1 associated deletion trends upward in VAF before sharply falling at the final timepoint. The Alu insertion in this patient maintains a steady VAF of ~ 20%.
After identifying the mobile element insertions and the SVs that they mediate in these four patients (Table 1), we analyzed the impact of these variants on the cancer genome. Because both variants that impacted exons were found at high frequency in Patient 1, we examined the genomic copy number present in these regions. Specific genome-wide copy number estimates for the first timepoint in the genome of Patient 1 are shown in Fig. 3. Both the deletion involving Alu elements and the Alu associated translocation appear to have been reduced to only a single copy, contributing to the high frequency for both.
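To illustrate why loss of one copy raises the observed frequency of a variant on the remaining copy, the sketch below uses a standard mixture model for the expected raw read fraction in a sample containing tumor and normal cells. This is not the calculation used in the study; the function name, the assumption of two copies in normal cells, and the purity value of 0.8 are illustrative only.

```python
def expected_vaf(purity, tumor_copies, variant_copies, normal_copies=2):
    """Expected raw VAF of a clonal variant in a tumor/normal mixture.

    purity         -- fraction of cells in the sample that are tumor cells
    tumor_copies   -- total copy number at the locus in tumor cells
    variant_copies -- copies carrying the variant in tumor cells
    normal_copies  -- copy number in non-tumor cells (assumed diploid)
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if variant_copies > tumor_copies:
        raise ValueError("variant_copies cannot exceed tumor_copies")
    variant_alleles = purity * variant_copies
    all_alleles = purity * tumor_copies + (1.0 - purity) * normal_copies
    return variant_alleles / all_alleles


# Hypothetical purity of 0.8: same variant, with and without loss of the other copy.
print(expected_vaf(0.8, tumor_copies=2, variant_copies=1))  # heterozygous, two copies
print(expected_vaf(0.8, tumor_copies=1, variant_copies=1))  # only the variant copy remains
```

Under these assumptions the variant fraction is noticeably higher when the other copy has been lost, which is the effect described for the YTHDF2 deletion and the MAP2K4 translocation.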
Due to decreased copy number at the MAP2K4 locus, and the translocation interrupting the one remaining copy of MAP2K4 (shown in Fig. 4A), we validated absence of the complete transcript and protein in the cancer cells of Patient 1. Creating a de novo transcript assembly of RNA-seq data for each timepoint in Patient 1 (see “Methods” section), we find only truncated and hybrid transcripts of MAP2K4 (Fig. 4A; sequences for these transcripts are shown in Supplemental Table 3). To confirm that there was evidence of this translocation in the RNA-seq reads, we manually reviewed reads in the region surrounding the translocation. Multiple RNA-seq reads mapped to either Chromosome 17 or 22, and had a mate that mapped to the other chromosome, with some split reads surrounding the breakpoint. To further validate a lack of protein production by MAP2K4 in the tumor cells of Patient 1, we performed a Western blot using MCF7 cells as a control and tumor cells taken from an ascites sample from Patient 1 (Fig. 4B; uncropped image is shown as Supplemental Figure 7). While the control cells show clear production of MAP2K4, there appears to be a decrease in MAP2K4 in the tumor cells of Patient 1. The remaining production of MAP2K4 may be due to non-tumor cells in the sample. Further, mapping the transcripts from the RNA-seq data in Patient 1 (see “Methods” section), we find an average of only 2.92 TPM that map to MAP2K4. The number of transcripts aligning to MAP2K4 in Patient 1 is much lower than the median value found by GTEx (median 18.05 TPM; the GTEx Portal on 08/28/2020). Additionally, GEPIA39, which uses data from both GTEx and TCGA, shows an average of 13.3 TPM for MAP2K4 in cancer and 11.43 TPM for control samples (http://gepia.cancer-pku.cn/detail.php?gene=MAP2K4). Analyzing RNA-seq data from another patient that was not suspected to have any disruption to MAP2K4, we found an average of 14.80 TPM.
Mobile elements can be divided into subfamilies, classified by diagnostic mutations. As different subfamilies may have different expression patterns in cancer, we examined these subfamilies in our patients. Three of the patients in this study had longitudinal bulk RNA-seq data available for further analysis (Supplemental Figure 1). There was no germline RNA-seq data available for these patients, but using SQuIRE we were able to analyze mobile element expression on both subfamily (examining all mobile element transcripts from one mobile element subfamily) and locus-based (specific mobile element loci) levels throughout tumor progression. We saw no subfamily-level change in expression in these three patients for L1, SVA, or Alu elements. At the single-locus level, by comparing the four earliest timepoints of Patient 2 to the latter two timepoints (1016 days; nearly 3 years between the fourth timepoint and the fifth), we see significant expression changes (adjusted p value < 0.05), both increases and decreases, for 337 loci. Of these loci, 125 are Alu elements, 123 are L1s, and 16 are SVA insertions. The remaining loci largely belong to older mobile elements and LTR families. The complete list of mobile element loci that show significantly different expression over time is shown in Supplemental Table 4. Approximately half of the differentially expressed mobile elements overlapped with genes that were also significantly differentially expressed. The locations of these mobile elements that overlap with genes are shown in Supplemental Table 5.
The 264 L1s, Alu elements, and SVAs that were found to be differentially expressed between the first set of timepoints and second set of timepoints in Patient 2 were intersected with a list of regulatory regions in mammary epithelial tissue. Of these 264 mobile elements, 72 overlapped with a total of 87 regulatory regions (Fisher’s exact, p > 0.012). These 87 regulatory regions include a number of CCCTC-binding factor (CTCF) binding sites, open chromatin regions, promoter flanking regions, enhancers, promoters, and transcription factor binding sites. These overlaps are shown in Supplemental Table 6.
Discussion
Using a trio of mobile element and de novo variant detection tools, we identified mobile element insertions and variants in longitudinal WGS data from four breast cancer patients (Supplemental Figure 1). We find that the visual validation step (IGV or other similar tools) reduces the risk of false positive calls and significantly improves the overall accuracy of variant calls (similar to7) (Fig. 1). This is particularly important for cancer genomes because most mobile element detection software is not designed for their complexity. This increased complexity makes the visual inspection step necessary, as thousands of potential mobile element insertions suggested by these programs were not unique to the somatic cells, or were the result of mis-mapping in low complexity or mobile element-rich regions. However, the calls from these programs that passed visual inspection spanned both high variant allele fraction and very low variant allele fraction (Fig. 2), showing that the insertions or variants did not have to be present at a particularly high frequency to be detected. The SV events identified with this pipeline are largely the result of recombination occurring between mobile elements that are already present in the reference genome26,40,41 and not a product of somatic retrotransposition42,43.
Large changes in VAF shown in the current study generally correlate with bottleneck events or large shifts in subclone frequency found previously38. The tumor subclones also change frequency in response to treatment38, which can lead to less stable VAF, a finding shared here when examining mobile element-associated variants over time. The insertions that share similar VAF may be from the same tumor subclone (see the Alu insertion and L1 insertion on Chromosome 8 in Patient 2) (Fig. 2B), and, after adjusting for tumor purity, those that still show a sharp decline in VAF are likely part of a tumor subclone that showed decreased frequency. This decrease in frequency may be due to changes in the number of unique subclones present at later timepoints. The insertions and variants that are present at very high frequency (near 100%) may have occurred early in tumor progression, but these events could also be the result of a bottleneck event or a selective clonal advantage. The identification of these early events may also be influenced by decreased tumor purity in the samples during later timepoints (Supplemental Table 2). In cancers with a greater number of these events, mobile elements may be valuable markers for identifying tumor subclones, similar to their role in population genetics and evolutionary studies44–47. Using these markers in conjunction with SNPs may increase the level of resolution for tumor subclones.
From the summary of the insertions and SVs involving mobile elements identified in four patients (Table 1), we show activity for not only L1s, which we expect based on previous studies16–20,22, but also activity from Alu and SVA elements. Most previous studies have largely focused on L1s and the structural variants that they mediate17,20, though others have identified a small number of Alu insertions22. SVA has been found in a previous study17 and recent work also supports post-zygotic insertions in normal tissue for this mobile element family7. The two SVA insertions identified here are present at low frequency and may have been missed by different filtering methods. We also see variation among patients, with most individuals showing multiple insertions or variants but one showing only a single event that appears to involve an interaction between existing Alu elements. The number of identified insertions is relatively low compared to some cancers that show high mobile element activity, but our results are similar to those seen for L1s in breast cancer20.
The majority of the variants we identified appear to occur early in our sampling timeframe (Table 1, Fig. 2) and likely occur early in tumor progression or development. This suggests that mobile elements may be more active early in cancer and could play a role in tumorigenesis. This supports previous work done in metastatic tumors from multiple patients showing that most insertions that occurred in the primary tumor were reflected in metastases23. However, ascertainment bias may allow for the increased possibility of identifying high frequency mobile elements and SVs. Further, given the lower sequencing depth of the germline in Patient 2 and Patient 4, it is possible that some of the identified events in these patients could be mosaic. In addition to the early insertions and SVs we identified, we find two insertions, both SVAs, that insert later in tumor progression. It is unclear if something unique about SVA insertions causes them to be more active later in cancer, or if this is just a result of our small sample size. Future studies with a larger sample size of patients should attempt to identify SVA insertions to determine if the trend of insertions at later timepoints remains.
Our analysis of RNA-seq data in these patients showed very few subfamily-level expression changes for mobile elements through time. While we did not have control RNA-seq data from the germline, we were able to examine the course of expression change through tumor progression. Of the three patients with RNA-seq data, we did not observe any instances of subfamily expression changes for Alu, SVA, or L1, and only one patient showed a family-level change in expression. This was linked to an increase in HERV activity, which previous studies have reported for HERV-K in various cancers48,49. We were also able to examine the expression of specific mobile element loci in which there were informative reads. Here, using RNA-seq data for Patient 2, which spanned multiple years, we showed that many loci did change expression levels over time. Approximately half of the mobile element loci that showed statistically significant differential expression overlapped genes that were also significantly differentially expressed. Many of the remaining elements were in non-coding regions, though some were found in regulatory regions. Previous work has shown that changes in mobile element methylation and expression can impact nearby gene expression50,51. Some of these expression changes are likely tied to methylation changes due to the high CpG content of mobile elements52. However, differentiating between transposable element transcripts and other transcripts can be challenging, and the patterns shown here may be more indicative of general gene expression change than changes in mobile element expression. We were only able to demonstrate expression changes in Patient 2, who had the most RNA-seq timepoints over the longest course of time. Future studies should examine locus-level expression over time, with germline control comparisons to determine how these expression levels compare with normal cells over longer sampling timepoints.
The early SV events identified in Patient 1 both appear to be associated with Alu elements and to affect the coding sequence. The affected genes (MAP2K4 and YTHDF2) have both been implicated in cancer development53–57. The deletion we uncovered in YTHDF2 removes the final exon that would be present in the primary transcript. Though there have been reports of this gene acting as a tumor suppressor, it is not listed in the Catalogue of Somatic Mutations in Cancer (COSMIC)58, and further work likely needs to be completed to validate its role in cancer. The translocation that we identified in Patient 1 interrupted MAP2K4, a gene with far more supporting evidence that suggests it plays a role in catalyzing tumor development or metastasis53,59,60. MAP2K4 is expressed in mammary tissue (median 18.05 TPM; the GTEx Portal on 08/28/2020), and missense and nonsense mutations have been identified at low frequency as part of The Cancer Genome Atlas (TCGA Research Network: https://www.cancer.gov/tcga). The translocation breakpoint on Chromosome 17 occurs after the third exon of MAP2K4 (Fig. 4A), with the split reads in this region showing evidence of an Alu element. The breakpoint on Chromosome 22 disrupts a reference Alu element, and it is likely that this translocation was associated with an Alu interaction. There have also been previous examples of translocations and recombination events associated with mobile elements in cancer61,62. MAP2K4 is listed in COSMIC as potentially having a role in both tumor suppression and as an oncogene, depending on its expression level.
We find decreased copy number along multiple regions of Chromosome 17, including the region that contains MAP2K4 (Fig. 3). Our findings support a two-hit model, where a copy of MAP2K4 was lost, and the remaining allele was disrupted by the Alu-associated translocation described here. With only a single copy of this gene remaining, the translocation renders MAP2K4 inactive, leading to the decrease in production of the protein shown in the Western blot (Fig. 4B). Further analysis of the RNA-seq data for this patient showed that there were no complete transcripts for MAP2K4. MAP2K4 is found in the JNK pathway, which is responsible for numerous cellular functions63. Previous work in breast cancer tissue has identified multiple mutations along this pathway (including MAP2K4 and MAP2K7)54 and suggests that this mutation could act as a driver by altering function of the JNK pathway. MAP3K1 is another commonly mutated gene in this pathway and is responsible for the signaling step prior to MAP2K454. Mutations in these crucial genes can lead to changes in cell proliferation and the ability to escape from apoptosis, both of which are commonly seen in cancer. Brady et al. first identified this structural variant in MAP2K4 in this patient, but by analyzing mobile elements in these patients we have been able to better understand the cause of the mutation.
Overall, we find that most mobile element insertions and the structural variation events (between reference mobile elements) they mediate appear to occur early in tumor development, and most of these early events appear to be passenger mutations. In addition to these passenger mutations, we find SV events involving mobile elements that disrupt the coding sequence of known (MAP2K4) and suspected (YTHDF2) driver genes in breast cancer. We identified a number of L1, Alu, and SVA insertions occurring during tumor progression. As most studies attempt to identify only L1 insertions and structural variants involving L1s, we may be underestimating the impact that mobile elements have on mediating driver mutations in cancer. However, our sample size is small and may not be representative of the activity of these mobile elements in a larger sample size. Future studies should examine other types of cancer for the patterns and impact of mobile element insertions to determine which cancers have an increase in Alu and SVA activity, as others have done with L1 insertions. Improving our understanding of mobile element insertions and structural variation events in cancer could enhance our ability to identify tumor subclones and our understanding of the mutational landscape in cancer.
Methods
Sequencing data
Information regarding the acquisition of patient samples, sequencing data, as well as quality control and alignment information, can be found in Brady et al. 201738. Blood-derived DNA was sequenced and used as the germline DNA sample. Tumor samples were sequenced from ascites and pleural fluid surrounding the breast tumor. DNA sequencing data had previously been aligned to hg19. Each patient had multiple tumor samples from different timepoints throughout their treatment: Patient 1 had four samples, Patient 2 had six samples, Patient 3 had two samples, and Patient 4 had three samples that were examined. Data from the Brady et al. 2017 publication are available with controlled access on the European Genome-phenome Archive (EGA) under accession EGAS00001002436. Informed consent was obtained from all patients in the original study. Protocols were approved by the University of Utah Institutional Review Board. Data used in this study were publicly available, and experiments were performed in accordance with relevant guidelines and regulations. Coverage for each WGS timepoint was calculated using covstats from the goleft package (https://github.com/brentp/goleft).
Mobile element insertions and variants identification
We used the Mobile Element Locator Tool (MELT) (Version 2.1.5)64, RUFUS (https://github.com/jandrewrfarrell/RUFUS), where possible, and TranSurVeyor (Version 1.0)65 for the identification of mobile elements and related structural variants in our longitudinal breast cancer samples. Each of these three tools incorporates different methods for identifying variants. TranSurVeyor and MELT are tools used to detect mobile element insertions, but these tools use discordant and split reads as part of their algorithm and returned a number of SVs that were identified during our visual inspection step. RUFUS is an alignment-free, k-mer based algorithm that compares reads between the control samples (germline DNA) and the sample of interest (tumor DNA). Through this approach, RUFUS is capable of detecting many types of variants, including SNVs, SVs, and small indels (described in66). We considered any structural variant (deletions, complex events, and other rearrangements) that included a reference transposable element for further examination. Because these tools use different methods, we required only a single tool to identify a potential insertion. Insertions and variants identified by these programs were filtered to include only those that were called as absent in the germline sequencing data. MELT identified 8825 variants, while TranSurVeyor identified a total of 69,394 variants in these four patients. The total number of variants identified with RUFUS was not calculated because it is designed to look for many types of variants, not only mobile elements. Visual inspection was performed to ensure that the sequences matched a known mobile element sequence using the Integrative Genomics Viewer (IGV, Version 2.4.13)67. Through this visual inspection step, > 70,000 IGV images (as there were multiple individuals included in the MELT analysis) were examined from MELT and TranSurVeyor.
This step helped to prevent the inclusion of reads that had simply mis-mapped to an incorrect region of the genome. Insertions that showed any evidence of being present in the germline DNA sample, and loci that did not show any signs of mobile element activity (low complexity repeats or poorly sequenced regions), were discarded. Loci with discordant and/or split reads that mapped to a mobile element were retained for further validation. To confirm that the potential mobile element insertions or mobile element-associated structural variants did, in fact, involve mobile element sequences, we used both BLAT68 and RepeatMasker69. Where possible, we used classic hallmarks of retrotransposition events (target site duplications and poly(A) tails) to confirm that we had identified an insertion. We validated candidate structural variation events that appeared to involve mobile elements with Lumpy (Version 0.2.13)70, and we used IGV to identify signs of mobile element involvement (breakpoints in or near existing mobile elements, and small regions of microhomology). While the limit of detection of this pipeline is largely determined by the mobile element detection tools, we were able to identify mobile elements and SVs with variant allele fractions (VAFs) of ~ 5%. BEDTools71 intersect was used to determine whether the identified variants overlapped with ultraconserved non-coding regions or regulatory elements. The locations of ultraconserved elements and regulatory blocks were obtained from UCNEbase (https://ccg.epfl.ch/UCNEbase/)72. The UCSC Genome Browser (ENCODE Regulation track) was manually checked to ensure that no other regulatory elements overlapped the insertion site.
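The overlap test behind an interval intersection of this kind can be sketched in a few lines. This is a simplified stand-in for illustration, not the BEDTools implementation; the function names and all coordinates are hypothetical, and intervals are assumed to follow the BED convention (0-based start, end-exclusive).

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def intersect(variants, features):
    """Return the variants that overlap at least one feature interval."""
    return [v for v in variants if any(overlaps(v, f) for f in features)]


# Hypothetical coordinates, for illustration only.
variants = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
conserved = [("chr1", 150, 160), ("chr1", 600, 700)]
hits = intersect(variants, conserved)
```

Note that with half-open intervals, a variant ending at position 600 and a feature starting at 600 are adjacent but do not overlap.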
Variant allele fraction
Variant allele fraction was calculated by counting the number of reads showing evidence of the insertion or variant in IGV and dividing this by the total number of reads at the breakpoint of the insertion or variant. The variant allele fraction was then adjusted for tumor purity: the total number of reads at the breakpoint was multiplied by the tumor purity (or cancer cell fraction) value to estimate the number of reads derived from tumor cells, and the number of reads containing the variant was divided by this adjusted total. The tumor purity estimates were calculated using FACETS73, which utilizes SNPs in germline and tumor samples, as well as copy number changes, to provide an estimate of the proportion of tumor cells in the sample. FACETS was also used to determine the location of copy number changes throughout the genome, and particularly at the loci at which we identified variants.
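As a worked illustration of the purity adjustment described above, the following minimal sketch computes the raw and adjusted values. The function name and the read counts are hypothetical; only the arithmetic (variant reads divided by total reads multiplied by purity) follows the text.

```python
def purity_adjusted_vaf(variant_reads, total_reads, purity):
    """Return (raw_vaf, adjusted_vaf) for one breakpoint.

    variant_reads: reads supporting the insertion or variant
    total_reads:   all reads spanning the breakpoint
    purity:        estimated tumor cell fraction, in (0, 1]
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    raw_vaf = variant_reads / total_reads
    tumor_reads = total_reads * purity       # reads estimated to come from tumor cells
    adjusted_vaf = variant_reads / tumor_reads
    return raw_vaf, adjusted_vaf


# Hypothetical counts: 12 supporting reads out of 60, with 80% tumor purity.
raw, adjusted = purity_adjusted_vaf(12, 60, 0.8)
# raw is 0.20; adjusted is about 0.25 (12 / 48)
```

Because the denominator shrinks as purity falls, noisy purity estimates at low-coverage sites can push the adjusted value above 1, so such values are best read as approximate.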
RNA-seq analysis and mobile element expression
Mobile element expression was measured at both the subfamily-specific and locus-specific levels using SQuIRE (Version 0.9.9.9a-beta)74. We compared the first timepoint for each patient to later timepoints to understand how expression patterns changed throughout cancer progression for both subfamily-level and locus-level expression. Additionally, where possible, we compared multiple early sampling timepoints with later sampling timepoints to determine whether there was a change during cancer progression. SQuIRE was run according to the documentation provided. Briefly, RNA-seq data were aligned to hg38 using STAR (Version 2.5.3a)75; counts of gene expression and mobile element expression were then generated to quantify expression. DESeq2 (Version 1.16.1)76 was then used to generate calls for differentially expressed subfamilies and loci. For the locus-specific expression changes at mobile element loci, we intersected these loci with mammary epithelial tissue regulatory regions from Ensembl77 using BEDTools.
Trinity (Version 2.8.5)78 was run on bulk RNA-seq data from the first timepoint of Patient 1 to create a library of de novo transcripts. The resulting transcripts were aligned to the transcript of MAP2K4 using BLAT. Transcripts that aligned to MAP2K4 were further examined to determine which portions of the MAP2K4 transcript were being produced. Transcripts that aligned to multiple regions of the genome with > 95% accuracy were not included.
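The multi-mapping filter described above can be sketched as follows, assuming that "accuracy" corresponds to percent identity of the alignment and that each hit is available as a simple tuple. The tuple layout, transcript IDs, and locus labels are hypothetical and do not reflect the BLAT output format.

```python
from collections import defaultdict


def drop_multimappers(hits, min_identity=95.0):
    """Keep transcripts with exactly one locus aligned above `min_identity` percent.

    hits: iterable of (transcript_id, locus, percent_identity) tuples.
    Transcripts with no hit above the threshold are also dropped.
    """
    loci = defaultdict(set)
    for transcript_id, locus, identity in hits:
        if identity > min_identity:
            loci[transcript_id].add(locus)
    return sorted(t for t, found in loci.items() if len(found) == 1)


# Hypothetical hits, for illustration only.
hits = [
    ("t1", "chr17:MAP2K4", 99.1),
    ("t2", "chr17:MAP2K4", 98.0),
    ("t2", "chrX:elsewhere", 97.5),  # second high-identity locus, so t2 is excluded
    ("t3", "chr17:MAP2K4", 96.2),
    ("t3", "chr5:elsewhere", 80.0),  # below threshold, ignored
]
kept = drop_multimappers(hits)
```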
Salmon (Version 1.4.0)79 was run on RNA-seq timepoints from Patient 1 and Patient 2. The data were converted to gene-level annotations, and the transcripts per million (TPM) for MAP2K4 were counted. The TPM values for each timepoint for Patient 1 were averaged, and the process was repeated for three timepoints in Patient 2.
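The gene-level summary and averaging step can be illustrated with a short sketch. It assumes that gene-level TPM is obtained by summing the TPMs of a gene's transcripts, which is a common convention but is not stated in the text; the transcript IDs, mapping, and values below are hypothetical.

```python
from statistics import mean


def gene_tpm(transcript_tpm, tx2gene, gene):
    """Sum transcript-level TPMs for all transcripts assigned to `gene`."""
    return sum(tpm for tx, tpm in transcript_tpm.items() if tx2gene.get(tx) == gene)


def mean_gene_tpm(timepoints, tx2gene, gene):
    """Average the gene-level TPM across a list of per-timepoint quantifications."""
    return mean(gene_tpm(tp, tx2gene, gene) for tp in timepoints)


# Hypothetical transcript-to-gene map and per-timepoint TPM values.
tx2gene = {"tx1": "MAP2K4", "tx2": "MAP2K4", "tx3": "OTHER"}
timepoints = [
    {"tx1": 3.0, "tx2": 1.0, "tx3": 10.0},
    {"tx1": 1.0, "tx2": 1.0, "tx3": 7.0},
]
avg = mean_gene_tpm(timepoints, tx2gene, "MAP2K4")
```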
DNA extraction and PCR validation
For validation of our detection methods, PCR amplification was run on a potential L1 and a potential Alu insertion in Patient 2, the only patient for whom we had blood-derived germline DNA and were able to extract DNA from cancer cells (ascites). DNA extraction was performed using Qiagen DNeasy Blood and Tissue Kit (50) (Cat No./ID: 69504). PCR amplifications of 25 ng of germline DNA and 25 ng of tumor DNA were performed in 25-µL reactions using Phusion Hot Start Flex DNA polymerase. Initial denaturation was performed for 30 s at 98 °C, with 40 cycles of denaturation for 10 s at 98 °C, the optimal annealing temperature of 60 °C for 30 s, followed by a 2-min extension at 72 °C, and a final extension for 5 min at 72 °C. The reaction was performed with a negative control (water), the tumor DNA, and the matched germline DNA. Amplicons were run on a 2% agarose gel with ethidium bromide for approximately 90 min at 100 V. The gel was imaged using a Fotodyne Analyst Investigator Eclipse imager. The primer sets used for these reactions are listed in Supplemental Table 7, and the corresponding gel images are shown in Supplemental Figures 3 and 4.
Immunoprecipitation and immunoblotting
Patient ascites samples were grown in Human Breast Epithelial Cell Culture Complete Media (Celprogen M36056-01S); MCF7 whole cell lysate was purchased (Novus). Cells were lysed with ThermoFisher Co-IP lysis buffer. MKK4 (1:100 CST) and GAPDH (1:300 ProteinTech) antibodies were incubated overnight at 4 °C with 1000 µg of protein lysate. Immunoprecipitation was performed according to the Pierce™ Classic Magnetic IP/Co-IP Kit protocol. Twenty-eight µg of immunoprecipitated protein was loaded per well of NuPAGE 10% Bis–Tris protein gels. After transfer to PVDF membrane, membranes were blocked with 5% milk:PBST, then incubated with primary antibody (MKK4 1:500, GAPDH 1:000 ThermoFisher) overnight at 4 °C. Blots were washed with PBST and incubated with HRP-conjugated goat anti-rabbit secondary antibody for 60 min at room temperature. Blots were then incubated in chemiluminescent substrate enhancer (BioRad) and visualized using X-ray film. Intensity quantification was performed with Fiji, with the GAPDH band for MCF7 quantified as 78 pixels and the GAPDH band for the patient as 93 pixels.
Uncropped images of the Western blot are shown in Supplemental Figure 7. The Western blot image shown in the main paper has been cropped to show important bands. GAPDH for the patient sample and MCF7 cells were run on the same gel, but not run adjacent to one another. The image has been edited to show them directly adjacent. Membrane edges in the figures are not visible due to the X-ray film used for imaging. The membranes were cut post-antibody incubation to get a clearer image of the bands.
Acknowledgements
This work was supported by the NIH R35GM118335 (to LBJ) and the NIH/NRSA T32HG008962 (to CJS) from the NHGRI. We would like to thank the Biorepository and Molecular Pathology resource at the Huntsman Cancer Institute for their assistance in accessing samples from these patients. We would also like to thank members of both the Jorde and Marth labs for helpful feedback and discussion.
Author contributions
C.J.S., G.M., and L.B.J. designed the research; Y.Q. assisted with data acquisition and interpretation; C.J.S. and J.E.F. identified and reviewed mobile element calls; K.L.R. performed Western blot validation; C.J.S. performed PCR validation; S.V.T. contributed to experimental design and manuscript editing; C.J.S. wrote the first draft of the manuscript with all authors editing and contributing to the final version.
Data availability
Longitudinal data from the Brady et al. 2017 publication are available with controlled access on the European Genome-phenome Archive (EGA) under accession EGAS00001002436.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-021-92444-0.
References
- 1. de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet. 2011;7:e1002384. doi: 10.1371/journal.pgen.1002384.
- 2. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
- 3. Boeke JD, Garfinkel DJ, Styles CA, Fink GR. Ty elements transpose through an RNA intermediate. Cell. 1985;40:491–500. doi: 10.1016/0092-8674(85)90197-7.
- 4. Beck CR, Collier P, Macfarlane C, Malig M, Kidd JM, Eichler EE, Badge RM, Moran JV. LINE-1 retrotransposition activity in human genomes. Cell. 2010;141:1159–1170. doi: 10.1016/j.cell.2010.05.021.
- 5. Brouha B, Schustak J, Badge RM, Lutz-Prigge S, Farley AH, Moran JV, Kazazian HH., Jr Hot L1s account for the bulk of retrotransposition in the human population. Proc. Natl. Acad. Sci. U. S. A. 2003;100:5280–5285. doi: 10.1073/pnas.0831042100.
- 6. Xing J, Witherspoon DJ, Jorde LB. Mobile element biology: New possibilities with high-throughput sequencing. Trends Genet. 2013;29:280–289. doi: 10.1016/j.tig.2012.12.002.
- 7. Feusier J, Watkins WS, Thomas J, Farrell A, Witherspoon DJ, Baird L, Ha H, Xing J, Jorde LB. Pedigree-based estimation of human mobile element retrotransposition rates. Genome Res. 2019;29:1567–1577. doi: 10.1101/gr.247965.118.
- 8. Hancks DC, Kazazian HH. Roles for retrotransposon insertions in human disease. Mob. DNA. 2016;7:9. doi: 10.1186/s13100-016-0065-9.
- 9. Kazazian HH, Jr, Wong C, Youssoufian H, Scott AF, Phillips DG, Antonarakis SE. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature. 1988;332:164–166. doi: 10.1038/332164a0.
- 10. Kazazian HH, Jr, Moran JV. Mobile DNA in health and disease. N. Engl. J. Med. 2017;377:361–370. doi: 10.1056/NEJMra1510092.
- 11. Coufal NG, Garcia-Perez JL, Peng GE, Yeo GW, Mu Y, Lovci MT, Morell M, O'Shea KS, Moran JV, Gage FH. L1 retrotransposition in human neural progenitor cells. Nature. 2009;460:1127–1131. doi: 10.1038/nature08248.
- 12. Muotri AR, Chu VT, Marchetto MCN, Deng W, Moran JV, Gage FH. Somatic mosaicism in neuronal precursor cells mediated by L1 retrotransposition. Nature. 2005;435:903–910. doi: 10.1038/nature03663.
- 13. Erwin JA, Marchetto MC, Gage FH. Mobile DNA elements in the generation of diversity and complexity in the brain. Nat. Rev. Neurosci. 2014;15:497–506. doi: 10.1038/nrn3730.
- 14. Richardson SR, Morell S, Faulkner GJ. L1 retrotransposons and somatic mosaicism in the brain. Annu. Rev. Genet. 2014;48:1–27. doi: 10.1146/annurev-genet-120213-092412.
- 15. Doucet-O'Hare TT, Rodić N, Sharma R, Darbari I, Abril G, Choi JA, Young Ahn J, Cheng Y, Anders RA, et al. LINE-1 expression and retrotransposition in Barrett’s esophagus and esophageal carcinoma. Proc. Natl. Acad. Sci. 2015;112:E4894. doi: 10.1073/pnas.1502474112.
- 16. Doucet-O'Hare TT, Sharma R, Rodić N, Anders RA, Burns KH, Kazazian HH., Jr Somatically acquired LINE-1 insertions in normal esophagus undergo clonal expansion in esophageal squamous cell carcinoma. Hum. Mutat. 2016;37:942–954. doi: 10.1002/humu.23027.
- 17. Helman E, Lawrence MS, Stewart C, Sougnez C, Getz G, Meyerson M. Somatic retrotransposition in human cancer revealed by whole-genome and exome sequencing. Genome Res. 2014;24:1053–1063. doi: 10.1101/gr.163659.113.
- 18. Li Y, Roberts ND, Wala JA, Shapira O, Schumacher SE, Kumar K, Khurana E, Waszak S, Korbel JO, Haber JE, et al. Patterns of somatic structural variation in human cancer genomes. Nature. 2020;578:112–121. doi: 10.1038/s41586-019-1913-9.
- 19. Rodriguez-Martin B, Alvarez EG, Baez-Ortega A, Zamora J, Supek F, Demeulemeester J, Santamarina M, Ju YS, Temes J, Garcia-Souto D, et al. Pan-cancer analysis of whole genomes identifies driver rearrangements promoted by LINE-1 retrotransposition. Nat. Genet. 2020;52:306–319. doi: 10.1038/s41588-019-0562-0.
- 20. Tubio JMC, Li Y, Ju YS, Martincorena I, Cooke SL, Tojo M, Gundem G, Pipinikas CP, Zamora J, Raine K, et al. Mobile DNA in cancer: Extensive transduction of nonrepetitive DNA mediated by L1 retrotransposition in cancer genomes. Science. 2014;345:1251343. doi: 10.1126/science.1251343.
- 21. Burns KH. Transposable elements in cancer. Nat. Rev. Cancer. 2017;17:415–424. doi: 10.1038/nrc.2017.35.
- 22. Lee E, Iskow R, Yang L, Gokcumen O, Haseley P, Luquette LJ, 3rd, Lohr JG, Harris CC, Ding L, Wilson RK, et al. Landscape of somatic retrotransposition in human cancers. Science. 2012;337:967–971. doi: 10.1126/science.1222077.
- 23. Ewing AD, Gacita A, Wood LD, Ma F, Xing D, Kim M-S, Manda SS, Abril G, Pereira G, Makohon-Moore A, et al. Widespread somatic L1 retrotransposition occurs early during gastrointestinal cancer evolution. Genome Res. 2015;25:1536–1545. doi: 10.1101/gr.196238.115.
- 24. Rodić N, Sharma R, Sharma R, Zampella J, Dai L, Taylor MS, Hruban RH, Iacobuzio-Donahue CA, Maitra A, Torbenson MS, et al. Long interspersed element-1 protein expression is a hallmark of many human cancers. Am. J. Pathol. 2014;184:1280–1286. doi: 10.1016/j.ajpath.2014.01.007.
- 25. Cordaux R, Batzer MA. The impact of retrotransposons on human genome evolution. Nat. Rev. Genet. 2009;10:691–703. doi: 10.1038/nrg2640.
- 26. Xing J, Zhang Y, Han K, Salem AH, Sen SK, Huff CD, Zhou Q, Kirkness EF, Levy S, Batzer MA, Jorde LB. Mobile elements create structural variation: Analysis of a complete human genome. Genome Res. 2009;19:1516–1526. doi: 10.1101/gr.091827.109.
- 27. Hsieh S-Y, Chen W-Y, Yeh T-S, Sheen IS, Huang S-F. High-frequency Alu-mediated genomic recombination/deletion within the caspase-activated DNase gene in human hepatoma. Oncogene. 2005;24:6584–6589. doi: 10.1038/sj.onc.1208803.
- 28. Mauillon JL, Michel P, Limacher J-M, Latouche J-B, Dechelotte P, Charbonnier F, Martin C, Moreau V, Metayer J, Paillot B, Frebourg T. Identification of novel germline hMLH1 mutations including a 22 kb Alu-mediated deletion in patients with familial colorectal cancer. Can. Res. 1996;56:5728.
- 29. Welcsh PL, King M-C. BRCA1 and BRCA2 and the genetics of breast and ovarian cancer. Hum. Mol. Genet. 2001;10:705–713. doi: 10.1093/hmg/10.7.705.
- 30. Peixoto A, Pinheiro M, Massena L, Santos C, Pinto P, Rocha P, Pinto C, Teixeira MR. Genomic characterization of two large Alu-mediated rearrangements of the BRCA1 gene. J. Hum. Genet. 2013;58:78–83. doi: 10.1038/jhg.2012.137.
- 31. Petrij-Bosch A, Peelen T, van Vliet M, van Eijk R, Olmer R, Drusedau M, Hogervorst FB, Hageman S, Arts PJ, Ligtenberg MJ, et al. BRCA1 genomic deletions are major founder mutations in Dutch breast cancer patients. Nat. Genet. 1997;17:341–345. doi: 10.1038/ng1197-341.
- 32. Puget N, Torchard D, Serova-Sinilnikova OM, Lynch HT, Feunteun J, Lenoir GM, Mazoyer S. A 1-kb Alu-mediated germ-line deletion removing BRCA1 exon 17. Can. Res. 1997;57:828.
- 33. Rohlfs EM, Puget N, Graham ML, Weber BL, Garber JE, Skrzynia C, Halperin JL, Lenoir GM, Silverman LM, Mazoyer S. An Alu-mediated 7.1 kb deletion of BRCA1 exons 8 and 9 in breast and ovarian cancer families that results in alternative splicing of exon 10. Genes Chromosomes Cancer. 2000;28:300–307. doi: 10.1002/1098-2264(200007)28:3<300::AID-GCC8>3.0.CO;2-1.
- 34. Morse B, Rothberg PG, South VJ, Spandorfer JM, Astrin SM. Insertional mutagenesis of the myc locus by a LINE-1 sequence in a human breast carcinoma. Nature. 1988;333:87–90. doi: 10.1038/333087a0.
- 35. Walsh T, Casadei S, Coats KH, Swisher E, Stray SM, Higgins J, Roach KC, Mandell J, Lee MK, Ciernikova S, et al. Spectrum of mutations in BRCA1, BRCA2, CHEK2, and TP53 in families at high risk of breast cancer. JAMA. 2006;295:1379–1388. doi: 10.1001/jama.295.12.1379.
- 36. Choi YJ, Rhee JK, Hur SY, Kim MS, Lee SH, Chung YJ, Kim TM, Lee SH. Intraindividual genomic heterogeneity of high-grade serous carcinoma of the ovary and clinical utility of ascitic cancer cells for mutation profiling. J. Pathol. 2017;241:57–66. doi: 10.1002/path.4819.
- 37. Husain H, Nykin D, Bui N, Quan D, Gomez G, Woodward B, Venkatapathy S, Duttagupta R, Fung E, Lippman SM, Kurzrock R. Cell-free DNA from ascites and pleural effusions: Molecular insights into genomic aberrations and disease biology. Mol. Cancer Therap. 2017;16:948–955. doi: 10.1158/1535-7163.MCT-16-0436.
- 38. Brady SW, McQuerry JA, Qiao Y, Piccolo SR, Shrestha G, Jenkins DF, Layer RM, Pedersen BS, Miller RH, Esch A, et al. Combating subclonal evolution of resistant cancer phenotypes. Nat. Commun. 2017;8:1231. doi: 10.1038/s41467-017-01174-3.
- 39. Tang Z, Li C, Kang B, Gao G, Li C, Zhang Z. GEPIA: A web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res. 2017;45:W98–W102. doi: 10.1093/nar/gkx247.
- 40. Gu W, Zhang F, Lupski JR. Mechanisms for human genomic rearrangements. PathoGenetics. 2008;1:4. doi: 10.1186/1755-8417-1-4.
- 41. Kolomietz E, Meyn MS, Pandita A, Squire JA. The role of Alu repeat clusters as mediators of recurrent chromosomal aberrations in tumors. Genes Chromosomes Cancer. 2002;35:97–112. doi: 10.1002/gcc.10111.
- 42. Gilbert N, Lutz S, Morrish TA, Moran JV. Multiple fates of L1 retrotransposition intermediates in cultured human cells. Mol. Cell Biol. 2005;25:7780–7795. doi: 10.1128/MCB.25.17.7780-7795.2005.
- 43. Symer DE, Connelly C, Szak ST, Caputo EM, Cost GJ, Parmigiani G, Boeke JD. Human L1 retrotransposition is associated with genetic instability in vivo. Cell. 2002;110:327–338. doi: 10.1016/S0092-8674(02)00839-5.
- 44. Steely CJ, Walker JA, Jordan VE, Beckstrom TO, McDaniel CL, St Romain CP, Bennett EC, Robichaux A, Clement BN, Raveendran M, et al. Alu insertion polymorphisms as evidence for population structure in baboons. Genome Biol. Evol. 2017;9:2418–2427. doi: 10.1093/gbe/evx184.
- 45. Watkins WS, Feusier JE, Thomas J, Goubert C, Mallick S, Jorde LB. The Simons Genome Diversity Project: A global analysis of mobile element diversity. Genome Biol. Evol. 2020;12:779–794. doi: 10.1093/gbe/evaa086.
- 46. Watkins WS, Rogers AR, Ostler CT, Wooding S, Bamshad MJ, Brassington AM, Carroll ML, Nguyen SV, Walker JA, Prasad BV, et al. Genetic variation among world populations: Inferences from 100 Alu insertion polymorphisms. Genome Res. 2003;13:1607–1618. doi: 10.1101/gr.894603.
- 47. Witherspoon DJ, Zhang Y, Xing J, Watkins WS, Ha H, Batzer MA, Jorde LB. Mobile element scanning (ME-Scan) identifies thousands of novel Alu insertions in diverse human populations. Genome Res. 2013;23:1170–1181. doi: 10.1101/gr.148973.112.
- 48. Ma W, Hong Z, Liu H, Chen X, Ding L, Liu Z, Zhou F, Yuan Y. Human endogenous retroviruses-K (HML-2) expression is correlated with prognosis and progress of hepatocellular carcinoma. Biomed. Res. Int. 2016;2016:8201642–8201642. doi: 10.1155/2016/8201642.
- 49. Wallace TA, Downey RF, Seufert CJ, Schetter A, Dorsey TH, Johnson CA, Goldman R, Loffredo CA, Yan P, Sullivan FJ, et al. Elevated HERV-K mRNA expression in PBMC is associated with a prostate cancer diagnosis particularly in older men and smokers. Carcinogenesis. 2014;35:2074–2083. doi: 10.1093/carcin/bgu114.
- 50. Jang HS, Shah NM, Du AY, Dailey ZZ, Pehrsson EC, Godoy PM, Zhang D, Li D, Xing X, Kim S, et al. Transposable elements drive widespread expression of oncogenes in human cancers. Nat. Genet. 2019;51:611–617. doi: 10.1038/s41588-019-0373-3.
- 51. Xie M, Hong C, Zhang B, Lowdon RF, Xing X, Li D, Zhou X, Lee HJ, Maire CL, Ligon KL, et al. DNA hypomethylation within specific transposable element families associates with tissue-specific enhancer landscape. Nat. Genet. 2013;45:836–841. doi: 10.1038/ng.2649.
- 52. Yoder JA, Walsh CP, Bestor TH. Cytosine methylation and the ecology of intragenomic parasites. Trends Genet. 1997;13:335–340. doi: 10.1016/S0168-9525(97)01181-5.
- 53. Ahn Y-H, Yang Y, Gibbons DL, Creighton CJ, Yang F, Wistuba II, Lin W, Thilaganathan N, Alvarez CA, Roybal J, et al. Map2k4 functions as a tumor suppressor in lung adenocarcinoma and inhibits tumor cell invasion by decreasing peroxisome proliferator-activated receptor γ2 expression. Mol. Cell. Biol. 2011;31:4270–4285. doi: 10.1128/MCB.05562-11.
- 54. Koboldt DC, Fulton RS, McLellan MD, Schmidt H, Kalicki-Veizer J, McMichael JF, Fulton LL, Dooling DJ, Ding L, Mardis ER, et al. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490:61–70. doi: 10.1038/nature11412.
- 55. Su GH, Song JJ, Repasky EA, Schutte M, Kern SE. Mutation rate of MAP2K4/MKK4 in breast carcinoma. Hum. Mutat. 2002;19:81. doi: 10.1002/humu.9002.
- 56. Xue Z, Vis DJ, Bruna A, Sustic T, van Wageningen S, Batra AS, Rueda OM, Bosdriesz E, Caldas C, Wessels LFA, Bernards R. MAP3K1 and MAP2K4 mutations are associated with sensitivity to MEK inhibitors in multiple cancer models. Cell Res. 2018;28:719–729. doi: 10.1038/s41422-018-0044-4.
- 57. Zhong L, Liao D, Zhang M, Zeng C, Li X, Zhang R, Ma H, Kang T. YTHDF2 suppresses cell proliferation and growth via destabilizing the EGFR mRNA in hepatocellular carcinoma. Cancer Lett. 2019;442:252–261. doi: 10.1016/j.canlet.2018.11.006.
- 58. Tate JG, Bamford S, Jubb HC, Sondka Z, Beare DM, Bindal N, Boutselakis H, Cole CG, Creatore C, Dawson E, et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res. 2018;47:D941–D947. doi: 10.1093/nar/gky1015.
- 59. Teng DH, Perry WL, 3rd, Hogan JK, Baumgard M, Bell R, Berry S, Davis T, Frank D, Frye C, Hattier T, et al. Human mitogen-activated protein kinase kinase 4 as a candidate tumor suppressor. Cancer Res. 1997;57:4177–4182.
- 60. Pavese JM, Ogden IM, Voll EA, Huang X, Xu L, Jovanovic B, Bergan RC. Mitogen-activated protein kinase kinase 4 (MAP2K4) promotes human prostate cancer metastasis. PLoS ONE. 2014;9:e102289. doi: 10.1371/journal.pone.0102289.
- 61. Elliott B, Richardson C, Jasin M. Chromosomal translocation mechanisms at intronic Alu elements in mammalian cells. Mol. Cell. 2005;17:885–894. doi: 10.1016/j.molcel.2005.02.028.
- 62. Onno M, Nakamura T, Hillova J, Hill M. Rearrangement of the human tre oncogene by homologous recombination between Alu repeats of nucleotide sequences from two different chromosomes. Oncogene. 1992;7:2519–2523.
- 63. Johnson GL, Lapadat R. Mitogen-activated protein kinase pathways mediated by ERK, JNK, and p38 protein kinases. Science. 2002;298:1911–1912. doi: 10.1126/science.1072682.
- 64. Gardner EJ, Lam VK, Harris DN, Chuang NT, Scott EC, Pittard WS, Mills RE, Devine SE. The mobile element locator tool (MELT): Population-scale mobile element discovery and biology. Genome Res. 2017;27:1916–1929. doi: 10.1101/gr.218032.116.
- 65. Rajaby R, Sung WK. TranSurVeyor: An improved database-free algorithm for finding non-reference transpositions in high-throughput sequencing data. Nucleic Acids Res. 2018;46:e122. doi: 10.1093/nar/gky685.
- 66. Ostrander BEP, Butterfield RJ, Pedersen BS, Farrell AJ, Layer RM, Ward A, Miller C, DiSera T, Filloux FM, Candee MS, et al. Whole-genome analysis for effective clinical diagnosis and gene discovery in early infantile epileptic encephalopathy. NPJ Genom. Med. 2018;3:22. doi: 10.1038/s41525-018-0061-8.
- 67. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative genomics viewer. Nat. Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
- 68. Kent WJ. BLAT—The BLAST-like alignment tool. Genome Res. 2002;12:656–664. doi: 10.1101/gr.229202.
- 69. RepeatMasker Open-4.0. http://www.repeatmasker.org
- 70. Layer RM, Chiang C, Quinlan AR, Hall IM. LUMPY: A probabilistic framework for structural variant discovery. Genome Biol. 2014;15:R84. doi: 10.1186/gb-2014-15-6-r84.
- 71. Quinlan AR, Hall IM. BEDTools: A flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 72. Dimitrieva S, Bucher P. UCNEbase—A database of ultraconserved non-coding elements and genomic regulatory blocks. Nucleic Acids Res. 2013;41:D101–D109. doi: 10.1093/nar/gks1092.
- 73. Shen R, Seshan VE. FACETS: Allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 2016;44:e131. doi: 10.1093/nar/gkw520.
- 74. Yang WR, Ardeljan D, Pacyna CN, Payer LM, Burns KH. SQuIRE reveals locus-specific regulation of interspersed repeat expression. Nucleic Acids Res. 2019;47:e27–e27. doi: 10.1093/nar/gky1301.
- 75. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics (Oxford, England) 2013;29:15–21. doi: 10.1093/bioinformatics/bts635.
- 76. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 77. Aken BL, Achuthan P, Akanni W, Amode MR, Bernsdorff F, Bhai J, Billis K, Carvalho-Silva D, Cummins C, Clapham P, et al. Ensembl 2017. Nucleic Acids Res. 2016;45:D635–D642. doi: 10.1093/nar/gkw1104.
- 78. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 2011;29:644–652. doi: 10.1038/nbt.1883.
- 79. Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods. 2017;14:417–419. doi: 10.1038/nmeth.4197.