Briefings in Bioinformatics. 2023 Nov 4;24(6):bbad388. doi: 10.1093/bib/bbad388

FLED: a full-length eccDNA detector for long-reads sequencing data

Fuyu Li 1,#, Wenlong Ming 2,#, Wenxiang Lu 3, Ying Wang 4, Xiaohan Li 5, Xianjun Dong 6,7,8, Yunfei Bai 9
PMCID: PMC10632013  PMID: 37930031

Abstract

Reconstructing the full-length sequence of extrachromosomal circular DNA (eccDNA) from short sequencing reads has proved challenging given the similarity between eccDNAs and their corresponding linear DNAs. Previous sequencing methods were unable to achieve high-throughput detection of full-length eccDNAs. Herein, a novel algorithm called Full-Length eccDNA Detection (FLED) was developed to reconstruct eccDNA sequences based on a strategy that combines rolling circle amplification with nanopore long-read sequencing. Seven human epithelial and cancer cell line samples were analyzed by FLED, and an average of over 5000 full-length eccDNAs were identified per sample. The structures of identified eccDNAs were validated by both polymerase chain reaction (PCR) and Sanger sequencing. Compared to other published nanopore-based eccDNA detectors, FLED exhibited higher sensitivity. In cancer cell lines, the genes overlapping eccDNA regions were enriched in cancer-related pathways, cis-regulatory elements could be predicted upstream or downstream of intact genes on eccDNA molecules, and the expression of these cancer-related genes was dysregulated in tumor cell lines, indicating the regulatory potency of eccDNAs in biological processes. The proposed method takes advantage of nanopore long reads and enables unbiased reconstruction of full-length eccDNA sequences. FLED is implemented in Python3 and is freely available on GitHub (https://github.com/FuyuLi/FLED).

Keywords: extrachromosomal circular DNA, full-length detection, long-read sequencing technology

INTRODUCTION

Extrachromosomal circular DNA (eccDNA) is a class of chromosome-derived DNA with a covalently closed circular structure that is common and highly conserved in eukaryotes ranging from yeast to human [1–3], with lengths ranging from tens to millions of base pairs [4, 5]. eccDNAs arising from unique non-repetitive genomic regions have been discovered in normal and malignant cells [6, 7] and can carry intact genes and specific regulatory elements [6–8]. The existence of eccDNAs reflects complex genomic instability, and the accumulation of eccDNAs promotes the aging of cells [9]. Due to the lack of mitotic granules, eccDNAs are inherited unequally, leading to significant intercellular genetic heterogeneity in various tumors [10]. By amplifying the oncogenes [11–13] or therapeutic resistance genes [14, 15] located on them, eccDNAs can drive genome remodeling [16] and distal regulation [7, 17].

To effectively purify and amplify eccDNA for de novo detection, a sensitive method called Circle-seq [18, 19] has been applied, followed by high-throughput sequencing. Briefly, this approach encompasses column purification, removal of remaining linear chromosomal DNA by exonuclease and rolling-circle amplification (RCA). Several computational methods were developed to identify eccDNA from discordantly mapped paired-end reads as well as soft-clipped reads around eccDNA breakpoints [3, 20]. However, limited by the relatively short length of Illumina sequencing reads, these methods can only detect reads that span the eccDNA breakpoints and therefore cannot determine the internal structure of the eccDNA, especially for eccDNAs that include parts from different chromosomes [11, 21]. The information contained in the full-length sequences of eccDNAs, such as internal structures, mutations and inter-breakpoint insertion fragments, will facilitate inferences about the mechanisms of eccDNA biogenesis and transcriptional regulatory functions [3, 22–26]. Although the sequence of eccDNA can be inferred by reconstructing the internal structure of the DNA amplifications [11, 21], assembling short-read data inevitably introduces errors, especially for homologous or similar eccDNA molecules. Long-read sequencing methods such as Oxford Nanopore [27] have the advantage of real-time ultra-long read sequencing, allowing the sequence of a whole molecule to be covered in one read. To date, only a few bioinformatics tools exist to identify eccDNA from RCA nanopore data: CReSIL [28], cyrcular-calling-workflow [29], Nanocircle [30], eccDNA_RCA_nanopore [31, 32] and ecc_finder [33].
Similar to short-reads-based methods, Nanocircle determined the joined breakpoint coordinates of genomic regions during eccDNA fragment formation by exploiting split reads while CReSIL reconstructed the sequence of eccDNA fragments by de novo assembly of sequence reads. The cyrcular-calling-workflow constructed a directed graph based on read depth information and split reads, and then called eccDNAs by calculating the posterior probability for each plausible circular path [29]. The eccDNA_RCA_nanopore method detected full-length eccDNAs by bootstrapping successive aligned subreads in each RCA long read [31], while ecc_finder identified candidate reads with a tandem repeat pattern and divided each read into repeat subread first and then determined the fragment composition of full-length eccDNA by the genomic mapping locations of the repeat subread [33]. Therefore, ecc_finder and eccDNA_RCA_nanopore could obtain the full-length sequences of eccDNAs directly from sequencing reads without assembly or splicing, but both required supporting reads spanning across the joint breakpoint of the original eccDNA at least twice to ensure the repeat patterns in RCA reads, which is severely limited by the size of the eccDNAs and the amplification enzyme activity, resulting in a poor utility of the sequencing data.

To address this challenge, a new bioinformatics method has been developed, called Full Length EccDNA Detection (FLED), taking advantage of the nanopore long-read sequencing technology and rolling-circle amplification method, to profile full-length eccDNAs effectively. The FLED process includes three steps. Candidate eccDNA detection was designed to construct two sets of weighted directed graphs based on the split alignment of each RCA long read: the first one aimed to find the optimal successive alignment of each read according to the location of every subread on it, and the second one was constructed based on the genomic coordinates of every subread in the optimal alignment and the candidate eccDNAs were identified by closed-cycle detection. For a given eccDNA candidate, all supporting reads were cut at the breakpoints, and one high-quality consensus sequence was produced as the full-length sequence for each candidate. Finally, the number of supporting reads and the variation of sequencing depth around the joined breakpoint were calculated and used to filter eccDNAs with high confidence. The adapted Circle-seq and FLED workflow is summarized in Figure 1.

Figure 1. Workflow of adapted Circle-seq and FLED. (A) Construction of nanopore sequencing libraries. The genomic DNA was extracted and subjected to Plasmid-Safe DNase digestion to degrade linear dsDNA. The remaining DNA was subjected to random rolling circle amplification by Phi29. Product DNA was de-branched and sequenced using the MinION platform. (B) Diagram showing the different types of RCA products. For eccDNA from a single genomic locus, the ideal RCA long reads should cross the joined breakpoints at least twice, so that the continuous subreads are aligned to a single locus (left panel). Reads that cover the joined breakpoint once can be classified as full-length reads or partial reads based on the overlap of subreads. Partial reads covering breakpoints miss the internal structure, and partial reads without breakpoints miss the genomic start and end positions of the eccDNA. (C) Workflow of eccDNA identification for aligned high-quality nanopore reads. STEP 1: Candidate eccDNA detection. STEP 2: Full-length sequences correction. STEP 3: False-positive reduction.

MATERIALS AND METHODS

EccDNA extraction and sequencing

In this study, seven human cell lines were used for eccDNA detection: gastric cancer cells BGC823, gastric cancer cells SGC7901, gastric epithelial cells GES1, hepatocellular carcinoma cells HepG2, liver cells HL7702, breast carcinoma cells MDA-MB-453 (MB453) and breast epithelial cell line MCF12A. Each sample was subjected to Plasmid-Safe ATP-dependent DNase (Epicentre) digestion and rolling circle amplification by Phi29 DNA polymerase with exo-resistant random primers, followed by Nanopore sequencing according to the manufacturer’s protocol (Figure 1A; detailed in Supplementary Methods available online at http://bib.oxfordjournals.org/). High-quality base-called reads were aligned to the human reference genome (GRCh38) using the long-read splice-aware mapper minimap2 [34, 35] and then used as input for eccDNA detection with the proposed FLED method.

Full-length eccDNA detection method

FLED takes long-read sequencing data or splice-aware alignment as input and identifies full-length eccDNAs in three main steps (Figure 1B and C).

STEP 1, candidate eccDNA detection

As minimap2 does split-read alignment, nanopore ultra-long reads with high alignment quality were split into multiple aligned subreads [34, 35]. To remove misalignments of subreads, the aligned subreads are transformed into a weighted directed alignment graph (‘Alignment Graph’ in STEP 1 of Figure 1C) with the python library networkX [36], where the nodes represent aligned subreads and the edges indicate the adjacency of two subreads on raw read. The length of the path in the alignment graph was calculated as the total length of all subreads on this path and the set of subreads on the longest path represented the optimal successive alignment of this RCA long read. To reduce the impact of possible DNA contamination during the RCA process, FLED set up a hard filter on the alignments, where the total length of consecutive aligned subreads exceeded 70% of the read length. For filtered reads with only two subreads, FLED identified eccDNAs and classified them according to the overlap of genomic mapping positions (‘locus’ in Figure 1B and C) of these two subreads (Figure 1B). For filtered read with more subreads, FLED builds a splice graph (‘Splice Graph’ in STEP 1 of Figure 1C) based on the genomic coordinates of each subread to identify eccDNAs. Each splice graph is a directed graph in which nodes represent genomic fragments while the weighted directed edges represent the number of connections between two such nodes based on their adjacent parts. Theoretically, the splice graph should be a closed circuit owing to the nature of eccDNAs and the closed traversal path represents the composition of a candidate full-length eccDNA. FLED finds a closed traversal path with the maximum total weight as the composition of the eccDNA by depth-first search (DFS) algorithm [37]. 
Similarly, FLED discards chimeric read with lower confidence, where the total length of all subreads within the loop is <50% of raw read, due to possible template-dependent polymerase jumping or switching events during RCA [38, 39].
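The longest-path selection described above can be sketched in a few lines. This is a minimal illustration and not FLED's actual implementation: it uses plain Python rather than networkX, represents each subread only by a hypothetical (start, end) interval on the raw read, and the function names are invented for this example. The 50 bp adjacency offset and the 70% length filter are taken from the text.

```python
from typing import List, Tuple

# (start, end) of an aligned segment on the raw read, end exclusive (assumed convention)
Subread = Tuple[int, int]


def longest_successive_alignment(subreads: List[Subread], offset: int = 50) -> List[Subread]:
    """Return the chain of adjacent subreads with the greatest total length."""
    order = sorted(subreads)
    n = len(order)
    if n == 0:
        return []
    best = [end - start for start, end in order]  # best[i]: longest chain ending at i
    prev = [-1] * n
    for i in range(n):
        si, ei = order[i]
        for j in range(i):
            sj, ej = order[j]
            # Edge j -> i exists when subread i starts where j ends, within the offset.
            if sj < si and abs(si - ej) <= offset and best[j] + (ei - si) > best[i]:
                best[i] = best[j] + (ei - si)
                prev[i] = j
    i = max(range(n), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(order[i])
        i = prev[i]
    return chain[::-1]


def passes_length_filter(chain: List[Subread], read_length: int,
                         min_fraction: float = 0.7) -> bool:
    """Hard filter: the chained subreads must exceed 70% of the read length."""
    return sum(end - start for start, end in chain) > min_fraction * read_length


# Hypothetical read of 3100 bp with one spurious alignment at (990, 1200).
subreads = [(0, 1000), (1010, 2000), (990, 1200), (2005, 3000)]
chain = longest_successive_alignment(subreads)
```

Because every edge points from an earlier subread to a later one on the read, the graph is acyclic, so the longest path can be found by dynamic programming over the subreads sorted by read position.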

Furthermore, owing to the large size of some eccDNAs and the low efficiency of RCA, over 90% of nanopore reads contained only partial sequences of eccDNAs. To improve data utilization, FLED retained eccDNA breakpoints identified from non-full-length reads that crossed joined breakpoints (Figure 1B). The algorithm details are shown in STEP 1.
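To make the distinction between full-length and partial reads concrete, the sketch below classifies a read with exactly two subreads from their genomic coordinates. It is an illustration under stated assumptions rather than FLED's code: both subreads are assumed to map to the same chromosome and strand, the function name and return labels are hypothetical, and a Break-type read is interpreted here as one whose two subreads do not overlap on the genome (the overlap rule follows Figure 1B).

```python
from typing import Optional, Tuple

Locus = Tuple[int, int]  # (genomic start, genomic end) of one subread


def classify_two_subread_read(first: Locus, second: Locus) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Classify a read whose optimal alignment has two subreads.

    `first` is the subread that comes first on the raw read, `second` follows it.
    """
    ls1, le1 = first
    ls2, le2 = second
    if le1 < ls2:
        # The jump goes forward on the genome, so no circular junction is implied.
        return "unresolved", None
    junction = (ls2, le1)  # candidate eccDNA breakpoint (start, end)
    if ls2 <= ls1 <= le2:
        # The subreads overlap, so together they cover the whole circle.
        return "Full", junction
    if ls1 > le2:
        # The junction is crossed but part of the circle is missing.
        return "Break", junction
    return "unresolved", None


# Hypothetical eccDNA spanning positions 1000-5000.
partial = classify_two_subread_read((3000, 5000), (1000, 2000))
full = classify_two_subread_read((3000, 5000), (1000, 3500))
```

In both examples the read runs to the end of the circle at 5000 and restarts at 1000, so the same breakpoint pair is recovered. Only the second read also reveals the internal sequence.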

STEP 1. Candidate eccDNA detection

Input: The coordinate-sorted compressed binary Sequence Alignment/Map Format (BAM) file
Output: The types and candidate eccDNAs of reads.
1. For each read j whose primary alignment passes the mapping-quality filter (recommended cutoff: MAPQ ≥ 10)
a. Extract all canonical alignments from primary alignment and SA tag [40].
b. Construct a directed alignment graph G1j = (V1j, E1j) with networkX [36], where node set V1j represents the aligned subreads and the edge set E1j carries additional information about the mean length of these subreads. An edge <vj, wj>, where vj, wj ∈ V1j, exists if and only if the corresponding subreads are neighbors (with a 50 bp offset allowed) in read j.
c. Calculate the length of each path in graph G1j as the total length of all subreads on the path. Obtain the set of subreads SRj = {SRjk, k ∈ N*} on the longest directed path (SRj ⊆ V1j), where k is the number of subreads.
d. If k = 1, label read j as a partial read without eccDNA breakpoint (BP), which fails to designate the origin of genomic location (locus) of breakpoint.
e. If k = 2, obtain locus coordinates SRj1 = (LSj1, LEj1) and SRj2 = (LSj2, LEj2).
i) If LEj1 < LSj2, read j fails to designate the origin of locus of eccDNA breakpoint.
ii) If LSj1 > LEj2 (the two subreads do not overlap), label read j as a Break-type read (partial read with BP) and obtain candidate eccDNA breakpoint (LSj2, LEj1).
iii) If LSj2 ≤ LSj1 ≤ LEj2 (the two subreads overlap), label read j as a Full-type read (full-length read with 1 BP) and obtain candidate eccDNA breakpoint (LSj2, LEj1).
f. If k ≥ 3,
i) Construct a directed splice graph G2j= (V2j, E2j), where node set V2j represents the loci of subreads in SRj. The edges between nodes represent that the corresponding subreads are neighbors (with a 50 bp offset allowed) in the read j.
ii) Merge edges with the same starting and ending vertices into a single edge with a weight equal to the number of merged edges.
iii) Find all closed traversal paths by DFS method [37]
iv) Pick the closed path with the maximum weight and label read j as a Full-type read. Obtain all loci on this path as the composition of candidate eccDNA.
2. All reads are done. Obtain the meta information of each read (type and supported candidate eccDNA).
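Step f of the procedure above can be illustrated with a small depth-first search. The sketch below is a simplified stand-in for FLED's splice graph, written in plain Python instead of networkX. It assumes that subreads mapping to the same locus (within the 50 bp offset) have already been given the same hashable label, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable]


def build_splice_graph(loci_in_read_order: Sequence[Hashable]) -> Dict[Edge, int]:
    """Merge parallel edges: weight = number of times one locus is followed by another."""
    return Counter(zip(loci_in_read_order, loci_in_read_order[1:]))


def heaviest_closed_path(weights: Dict[Edge, int]) -> Tuple[List[Hashable], int]:
    """Enumerate simple closed paths by DFS and return the one with maximum total weight."""
    adjacency: Dict[Hashable, List[Hashable]] = {}
    for (u, v) in weights:
        adjacency.setdefault(u, []).append(v)
    best_path: List[Hashable] = []
    best_weight = 0

    def dfs(start: Hashable, node: Hashable, path: List[Hashable], total: int) -> None:
        nonlocal best_path, best_weight
        for nxt in adjacency.get(node, []):
            weight = total + weights[(node, nxt)]
            if nxt == start:
                # Closed the loop back to the starting locus.
                if weight > best_weight:
                    best_path, best_weight = list(path), weight
            elif nxt not in path:
                dfs(start, nxt, path + [nxt], weight)

    for start in adjacency:
        dfs(start, start, [start], 0)
    return best_path, best_weight


# Hypothetical two-fragment eccDNA read through two and a half times.
A = ("chr1", 1000, 2000)
B = ("chr5", 500, 900)
graph = build_splice_graph([A, B, A, B, A])
composition, weight = heaviest_closed_path(graph)
```

Enumerating closed paths from every start node revisits each cycle once per rotation, which is acceptable for the handful of loci found on a single read. A single-fragment eccDNA read through several times appears as a self-loop on one node.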

STEP 2, full-length sequences correction

The sequence of eccDNA can be completely preserved in the RCA long read, but at the same time, the inaccuracy of nanopore sequencing technology also leads to the failure of using the raw sequence of the subread directly as the sequence of eccDNA. To avoid the uncertainty of assembly, the correction was performed on the full-length eccDNAs only, which was supported by at least one full-length read. As illustrated in STEP 2 of Figure 1C, FLED truncates uncorrected full-length reads into subreads according to the closed-loop detected in the splice graph and groups all subreads based on the putative eccDNAs they support. To improve the accuracy, FLED incorporates high-quality non-full-length reads mapped to the same genomic region and generates a consensus sequence for each group as the accurate full-length sequence of each Full eccDNA by partial order alignment algorithm SPOA [41, 42]. FLED considers the total number of all high-quality reads spanning across the joined breakpoint, including full-length reads with at least 2 breakpoints, full-length reads with 1 breakpoint and partial reads with breakpoints (Figure 1B), as the supporting read numbers for candidate eccDNAs. The algorithm details are shown in STEP 2.

STEP 2. Full-length sequences correction

Input: Fastq file and the meta information of reads (output from STEP 1)
Output: Refined eccDNA and sequence of Full eccDNA
1. Divide reads with the same eccDNA breakpoints (with a 50 bp offset allowed) into one block Bi (i ≥ 1).
2. For each block Bi, (LSj, LEj) represents the eccDNA breakpoint coordinates of supporting read j (1 ≤ j ≤ Ji), where Ji represents the total number of reads in Bi.
a. Calculate the refined coordinates (LSi, LEi) = (Mode (LSj), Mode (LEj)).
b. Record the supporting read number Ji of (LSi, LEi).
c. Determine the type of Bi (Full or Break) based on the type of supporting read j.
d. If Bi is Full-type, truncate all reads in Bi into subreads according to (LSi, LEi), and generate a consensus sequence using the SPOA method [41, 42]. Obtain the full-length sequence of eccDNA (LSi, LEi).
3. All blocks are done. Obtain the refined eccDNAs, supporting read numbers and sequences of Full eccDNAs.
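The blocking and coordinate refinement in steps 1 and 2a-b can be sketched as follows. This is an assumption-laden illustration, not the published implementation: the greedy clustering against the first read of each block is a choice made for this example, breakpoints are hypothetical (chromosome, start, end) tuples, and consensus generation with SPOA is left out.

```python
from collections import Counter
from typing import List, Tuple

Breakpoint = Tuple[str, int, int]  # (chromosome, start, end) supported by one read


def group_breakpoints(breakpoints: List[Breakpoint], offset: int = 50) -> List[List[Breakpoint]]:
    """Put reads whose breakpoints agree within `offset` bp on both ends into one block."""
    blocks: List[List[Breakpoint]] = []
    for chrom, start, end in sorted(breakpoints):
        for block in blocks:
            seed_chrom, seed_start, seed_end = block[0]
            if (seed_chrom == chrom and abs(seed_start - start) <= offset
                    and abs(seed_end - end) <= offset):
                block.append((chrom, start, end))
                break
        else:
            blocks.append([(chrom, start, end)])
    return blocks


def refine(block: List[Breakpoint]) -> Tuple[str, int, int, int]:
    """Refined coordinates are the modes of start and end; also return the read count."""
    chrom = block[0][0]
    start = Counter(s for _, s, _ in block).most_common(1)[0][0]
    end = Counter(e for _, _, e in block).most_common(1)[0][0]
    return chrom, start, end, len(block)


# Hypothetical breakpoints reported by five reads.
reads = [("chr1", 1000, 5000), ("chr1", 1002, 5000), ("chr1", 1000, 4998),
         ("chr1", 1000, 5000), ("chr2", 300, 900)]
refined = [refine(block) for block in group_breakpoints(reads)]
```

Taking the mode rather than the mean keeps the refined breakpoint on a coordinate that at least one read actually reported, which suits the small alignment jitter expected around a junction.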

STEP 3, false-positive reduction

FLED implements hard filters at the alignment and eccDNA level to reduce eccDNA identification errors. At the alignment level, FLED examines the mapping qualities and alignment length as described while at the eccDNA level, FLED applies three filters to the detected eccDNAs. Real eccDNAs are resistant to exonuclease digestion and are supposed to show high sequencing depth inside the junction. Therefore, the candidate eccDNA supported by at least two distinct nanopore reads was further validated by the sequencing depth adjacent to its joined breakpoints (STEP 3 of Figure 1C). For each candidate eccDNA, FLED calculates the sequencing depth of 50 bp upstream and 50 bp downstream around the eccDNA breakpoint sites. The candidate is then outputted when two criteria are met: firstly, the variation in sequencing depth between 50 bp region overlapped by the eccDNA and 50 bp from eccDNA flanking region should be statistically significant based on a Wilcoxon test (P < 0.05); secondly, the ratio of the sequencing depth of 50 bp region overlapped by eccDNA to the combined sequencing depth of both overlapped and flanking regions should exceed 0.5. All the parameters for reducing false positives are adjustable to adapt to various data and requirements. The algorithm details are shown in STEP 3.

STEP 3. False-positive reduction

Input: BAM file and the meta information of the refined eccDNAs (output from STEP 2)
Output: The filtered confident eccDNAs
1. For each refined eccDNA i (recommended cutoff: supporting read number ≥ 2)
a. Calculate the sequencing depth of 50 bp upstream and downstream of the eccDNA breakpoint site, Din represents sites overlapped by eccDNA region and Dout represents sites from eccDNA flanking region, respectively.
b. Filter using Wilcoxon test on Din and Dout (recommended cutoff: P-value <0.05).
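c. Calculate Ratio_i as $\mathrm{Ratio}_i = \dfrac{D_{in}}{D_{in} + D_{out}}$, where $D_{in}$ and $D_{out}$ are the sequencing depths of the 50 bp regions defined in step a, and filter (recommended cutoff: Ratio > 0.5).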
2. All eccDNAs are done. Obtain the meta information of all filtered confident eccDNAs, including supporting read numbers, P-values of Wilcoxon test and Ratios.
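The eccDNA-level filters can be combined as in the sketch below. It is a hedged illustration only: the per-site depth lists and the function names are hypothetical, the depth of a region is taken as the sum of its per-site depths, and the Wilcoxon P-value is assumed to be computed beforehand with a statistics library, since the exact test variant is not restated here.

```python
from typing import Sequence


def depth_ratio(d_in: Sequence[float], d_out: Sequence[float]) -> float:
    """Share of the combined depth that falls inside the eccDNA region."""
    inside, outside = sum(d_in), sum(d_out)
    total = inside + outside
    return inside / total if total else 0.0


def passes_filters(support: int, d_in: Sequence[float], d_out: Sequence[float],
                   p_value: float, min_support: int = 2, alpha: float = 0.05,
                   min_ratio: float = 0.5) -> bool:
    """Apply the three recommended cutoffs: read support, Wilcoxon P-value and depth ratio."""
    return (support >= min_support
            and p_value < alpha
            and depth_ratio(d_in, d_out) > min_ratio)


# Hypothetical candidate: depth 30 inside the junction, 10 in the flanking 50 bp.
d_in = [30] * 50
d_out = [10] * 50
ratio = depth_ratio(d_in, d_out)  # 1500 / 2000
```

All three thresholds are exposed as parameters because the text notes that the false-positive filters are adjustable.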

Time complexity of FLED

Suppose there are m reads with an average of s subreads per read. The overall time complexity of STEP 1 is $O(m(s + s^2) + m(k + k^2))$, where $O(s + s^2)$ is the time complexity of the directed alignment graph for each read [43] and $O(k + k^2)$ is the time complexity of the DFS algorithm in the directed splice graph, with k the average number of loci per read (k ≤ s) [37]. Suppose there are n eccDNAs with an average length of l and an average supporting read number of v. The overall time complexity of STEP 2 is then $O(n \times v^2 \times l^2)$, where $O(v^2 \times l^2)$ is the time complexity of the SPOA algorithm [41, 42]. The time complexity of STEP 3 is $O(n)$.

EccDNA validation by PCR and sanger sequencing

To verify the junction sequences of eccDNAs, a pair of PCR primers specific for each breakpoint junction was designed using Primer-Premier 5 (Premier Biosoft International, Palo Alto, CA). The outward PCR experiments were performed using NEBNext High-Fidelity 2X PCR Master Mix (NEB). For a 50 μl reaction, 10 ng of the MDA products were used, with an initial denaturing step of 98 °C for 30 s; 35 cycles of 98 °C for 10 s, 60 °C for 20 s, 72 °C for 20 s; and a final extension of 72 °C for 2 min. The PCR products were analyzed by agarose gel electrophoresis and subjected to Sanger sequencing. Primer sequences and expected PCR product sizes of the split junction sequences are detailed in Supplementary Table 1 available online at http://bib.oxfordjournals.org/.

RESULTS

EccDNA detection and validation in simulated datasets

FLED was first evaluated on simulated datasets. Both high- and low-coverage datasets were simulated by NanoSim [44] (detailed in Supplementary Methods available online at http://bib.oxfordjournals.org/), resulting in 4297 and 4153 simulated eccDNAs at sequencing depths of 30× and 10×, respectively. The FLED-identified eccDNAs were then compared with the simulated eccDNAs to evaluate the overall performance of FLED using the F1 score. On high-coverage data, FLED attained an F1 score of 0.97 by detecting 4058 of the 4297 simulated eccDNAs, while moderate sensitivity and precision were found on the low-coverage dataset, with an F1 score of 0.77. With increased sequencing depth, the proportion of full-length eccDNA rose from 53.81% to 72.95%. The lengths of the full-length sequences obtained by FLED were highly correlated with the simulated lengths (Pearson’s r = 0.9, P-value = $2.2 \times 10^{-26}$) (Supplementary Figure 1 available online at http://bib.oxfordjournals.org/). These results indicated that sequencing depth plays a significant role in eccDNA detection, especially for full-length eccDNA. More importantly, FLED still performed well on low-depth data.
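As a reminder of how the F1 score relates to the reported counts, the sketch below computes precision, recall and F1 from true positives, false positives and false negatives. The true-positive and false-negative counts follow from the text (4058 detected of 4297 simulated). The number of false positives is not stated in this section, so the value of zero used here is purely an assumption that gives an upper bound.

```python
from typing import Tuple


def f1_score(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# High-coverage simulation: 4058 of 4297 simulated eccDNAs detected.
# fp=0 is a hypothetical value used only to bound the F1 score from above.
precision, recall, f1 = f1_score(tp=4058, fp=0, fn=4297 - 4058)
```

Under that assumption recall is about 0.944 and F1 about 0.971, in line with the reported F1 of 0.97. Any false positives would lower both precision and F1.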

The running time on the simulated dataset was also evaluated, and the results are available in Supplementary Figure 2 available online at http://bib.oxfordjournals.org/. Although the generation of consensus sequences is known to be time-consuming, FLED is still comparable in terms of running speed and space consumption (Supplementary Table 2 available online at http://bib.oxfordjournals.org/). Based on the assessment of parameter effects in simulated data, detected eccDNAs with at least two supporting reads were considered plausible (Supplementary Table 3 available online at http://bib.oxfordjournals.org/).

Verifying eccDNAs by PCR, sanger sequencing and comparison

To validate the accuracy of eccDNA detection by FLED, RCA nanopore sequencing was applied to the GES1 cell line, and 5780 eccDNAs were detected by FLED. Eleven eccDNAs of different sizes, ranging from 600 to 9600 bp, were randomly selected, and outward primers were designed to amplify the junction regions (Supplementary Table 1 available online at http://bib.oxfordjournals.org/). Ten of the 11 selected eccDNAs (91%) were successfully validated by both PCR and Sanger sequencing (Figure 2A).

Figure 2. eccDNA validation and comparison in the GES1 cell line. (A) Gel images for a validated subset (n = 10) of eccDNAs by outward PCR and the Sanger sequencing of the outward PCR product of a validated eccDNA example. EccDNAs are named according to the coordinates of breakpoints. The list of validated eccDNAs and corresponding primers is summarized in Supplementary Table 1 available online at http://bib.oxfordjournals.org/. (B) The eccDNAs detected by Circle-Map (2522 in total) for NGS data and FLED (5780 in total) for Nanopore data, and identical eccDNAs (1538) detected by both methods. (C) The eccDNAs detected by Nanocircle (6558 in total) and FLED (5780 in total) for Nanopore data, and identical eccDNAs (4148) detected by both methods. (D, E) Comparison of three nanopore-based eccDNA detectors. (D) The number of full-length eccDNAs detected by ecc_finder (1198 in total), eccDNA_RCA_nanopore (795) and FLED (3016). (E) The size distribution of detected eccDNAs. eccfinder: eccDNAs detected by ecc_finder; eccRCAnanopore: eccDNAs detected by eccDNA_RCA_nanopore; FLED Fullwith2BP: FLED-detected Full eccDNAs supported by reads spanning across the joined breakpoint at least twice; FLED Fullwith1BP: FLED-detected Full eccDNAs supported by reads spanning across the joined breakpoint only once; FLED Break: Break eccDNAs detected by FLED. P-values are determined using the Wilcoxon test. Significance: (***) P-value < 0.001. (F) Gel images for two multiple-fragment eccDNAs by PCR. The list of validated eccDNAs and corresponding primer sequences is summarized in Supplementary Table 1 available online at http://bib.oxfordjournals.org/.

An Illumina library was constructed for the GES1 cell line and sequenced for comparison and validation. Circle-Map was deployed for eccDNA detection on Next-Generation Sequencing (NGS) data, and the eccDNAs detected by Circle-Map were filtered under the same criteria as used for FLED. Finally, a total of 2522 eccDNA breakpoints were detected by Circle-Map, while 5780 were detected by FLED, including 4170 Full eccDNAs. A total of 1538 eccDNAs were found in both results, accounting for 62% and 27% of the Circle-Map and FLED results, respectively (Figure 2B). Moreover, repeat elements may be misidentified as eccDNAs by the NGS-based method Circle-Map (36 of 2522, 1.43%), whereas the long-read-based method FLED (3 of 5780, 0.05%) can largely avoid misidentifying interspersed repeats and tandem repeats as eccDNAs (Supplementary Table 4 available online at http://bib.oxfordjournals.org/).

The breakpoint-aware long-read-based method Nanocircle identified a total of 6558 eccDNAs of which 4148 were also detected by FLED, accounting for 97% and 93% of the total eccDNAs detected by Nanocircle and FLED, respectively, indicating high consistency in the detection of eccDNA breakpoints (Figure 2C). In the simulated dataset, FLED exhibited fewer false positives than Nanocircle owing to its use of high-quality full-length reads (Supplementary Figure 3A available online at http://bib.oxfordjournals.org/). Furthermore, a notable increase in recall (~4%) was observed with FLED, which could be attributed to its robust detection capability for low-abundance eccDNAs (P-value = 0.003, Supplementary Figure 3B available online at http://bib.oxfordjournals.org/). The performance of FLED was further compared with two well-known nanopore-based full-length eccDNA detectors, both of which identify full-length eccDNAs from full-length reads crossing the joined breakpoints at least twice. In the GES1 cell line, 795, 1198 and 3016 eccDNAs with at least 3 supporting reads were identified by eccDNA_RCA_nanopore, ecc_finder and FLED, respectively (Supplementary Table 5 available online at http://bib.oxfordjournals.org/). A total of 91% of the eccDNAs detected by eccDNA_RCA_nanopore and 63% of those detected by ecc_finder overlapped with FLED-detected full-length eccDNAs, showing the reliability of FLED (Figure 2D, Supplementary Figure 4 available online at http://bib.oxfordjournals.org/). FLED displayed a significant advantage in the number of detected full eccDNAs, and 481 eccDNAs were found by all tools.

To further reveal the reasons for the better performance of FLED, the effect of different types of reads on the identification and filtration of eccDNAs was analyzed. A total of 2018 Full eccDNAs were exclusively retained by FLED and missed by eccDNA_RCA_nanopore or ecc_finder. Considering that a candidate eccDNA requires a minimum of three supporting reads to pass the filtration, the contribution of different types of reads was calculated. A total of 1351 eccDNAs (66.95%) required additional evidence from full-length reads with one breakpoint or partial reads to pass the filtration. These two types of reads were often discarded by eccDNA_RCA_nanopore and ecc_finder. Meanwhile, 667 eccDNAs (33.05%) were identified from full-length reads with one breakpoint (Figure 2D, Supplementary Table 6 available online at http://bib.oxfordjournals.org/). Moreover, under a more lenient filtration condition, i.e. a candidate eccDNA supported by a minimum of one read, 3119 eccDNAs were exclusively detected by FLED (Supplementary Figure 4B available online at http://bib.oxfordjournals.org/). Among these, only 728 eccDNAs (23.34%) were identified from ideal full-length reads with at least two breakpoints, while the majority of eccDNAs (2391, 76.66%) were detected from full-length reads with one breakpoint (Supplementary Table 6 available online at http://bib.oxfordjournals.org/). Additionally, to quantify the utilization of sequencing data by FLED, the percentage of each type of read was calculated (Supplementary Table 6 available online at http://bib.oxfordjournals.org/). For instance, in the comparison with Nanocircle, FLED identified a total of 5780 eccDNAs (4170 Full eccDNAs and 1610 Break eccDNAs) with a total of 46 919 supporting reads. The full-length reads with at least two breakpoints, full-length reads with one breakpoint, and partial reads with breakpoints account for 6496 (13.84%), 4975 (10.60%) and 35 450 (75.56%) of these reads, respectively (Supplementary Table 6 available online at http://bib.oxfordjournals.org/). Altogether, the ability of FLED to use full-length reads with one breakpoint and partial reads with breakpoints, which might be discarded by other full-length methods, significantly enhances data utilization and, more importantly, increases the sensitivity of eccDNA detection.

Furthermore, the comparison of the length distributions of full-length eccDNAs detected by the three tools showed that FLED is more sensitive to larger eccDNAs than ecc_finder and eccDNA_RCA_nanopore, benefiting from full-length reads that cover the breakpoint only once (Figure 2E). Similarly, on the high-coverage simulated dataset, FLED showed higher sensitivity and accuracy than the other long-read-based methods (Supplementary Figure 5 available online at http://bib.oxfordjournals.org/). Collectively, because FLED makes full use of reads with more than one but fewer than two tandem copies of the eccDNA sequence, its sensitivity, especially for large eccDNAs, was greatly improved.

In addition, eccDNAs composed of multiple, non-contiguous genomic segments were verified. Two multiple-fragment eccDNAs were selected and three pairs of primers were designed for each eccDNA (Figure 2F). The junctions between fragments and the breakpoints of eccDNAs were validated by PCR, and the internal structures, including the origins, arrangements, and even mutations within the fragments comprising the eccDNAs, were validated by Sanger Sequencing (Figure 2F), demonstrating the unique structure of eccDNA different from the linear genome, while only the breakpoints were determined by Circle-Map. Taken together, these results of experimental validation demonstrated the high reliability and efficiency of FLED for eccDNA detection based on Nanopore data.

EccDNA detection in cell lines using nanopore sequencing datasets

Seven human cell lines from normal and tumor tissue were processed by an adapted Circle-seq protocol and sequenced using the Nanopore MinION platform, generating an average of 1 M reads per sample (detailed in Supplementary Table 7 available online at http://bib.oxfordjournals.org/). FLED was applied to identify eccDNAs in each sample after pre-processing, including base-calling, read quality control and alignment (detailed in Supplementary Methods available online at http://bib.oxfordjournals.org/). The number of eccDNAs detected by FLED in the seven cell lines spanned a wide range, from 521 to 19 326, showing the heterogeneity of cell types (detailed in Table 1).

Table 1. Nanopore sequencing and eccDNAs detected by FLED in different cell lines

Cell line                 BGC823           SGC7901          GES1             HepG2            HL7702           MB453            MCF12A
Phenotype                 GC               GC               GE               HCC              Liver            BC               BE
Read number               1 224 574        730 346          666 873          520 421          2 096 323        1 250 482        864 510
Mapping rate              98.36%           98.24%           97.93%           98.10%           94.76%           92.24%           98.40%
Full-length reads         25 867 (2.12%)   25 240 (3.47%)   17 944 (2.71%)   47 630 (9.30%)   40 098 (2.00%)   93 839 (7.96%)   2240 (0.26%)
Partial reads with BP     9.765%           9.407%           8.615%           11.389%          4.022%           7.991%           0.725%
eccDNA number             4589             4428             5780             5030             2817             19 326           521
Full eccDNA number        3625             3651             4170             4490             2356             17 388           323
Full eccDNA with ≥ 2 BP   2799             2955             3077             3944             1951             14 969           215
Full eccDNA with 1 BP     826              696              1093             546              405              2419             108
Break eccDNA              964              777              1610             540              461              1938             198
Full eccDNA rate          78.993%          82.453%          72.145%          89.264%          83.635%          89.972%          61.996%

Note: BP: breakpoint; GC: gastric cancer; GE: gastric epithelial cells; HCC: hepatocellular carcinoma; BC: breast cancer; BE: breast epithelial cells.

Reads containing at least one complete copy of an eccDNA were considered full-length reads. The proportion of full-length reads per library ranged from 0.26% to 9.30% (Table 1). Based on these reads, the full-length sequences of an average of 75% of detected eccDNAs in the seven cell lines could be obtained without assembly, providing direct evidence of the internal structure of eccDNAs (detailed in Table 1, Supplementary Table 8 available online at http://bib.oxfordjournals.org/). These full-length sequences are necessary for subsequent sequence-based analyses such as structural variation and gene transcription potential analysis. In addition, up to about 11% of RCA long reads (Table 1) contained only partial sequences of eccDNAs but crossed the joined junctions; hence, the joined breakpoint coordinates of genomic regions can be determined accurately from their split alignments. However, the internal structure of an eccDNA could not be determined from the breakpoint information alone. In this study, eccDNAs with full-length sequences were labeled as ‘full eccDNA’, while those with only breakpoint positions were labeled as ‘break eccDNA’. The numbers of full eccDNAs detected by FLED in the seven cell lines were 3625, 3651, 4170, 4490, 2356, 17 388 and 323, respectively. The size of an eccDNA was defined as the sequence length for full eccDNAs and as the distance between breakpoints for break eccDNAs. The size distributions of eccDNAs were similar among all seven cell lines, ranging from 0.06 kb to 7.7 Mb with a median of 2.64 kb. However, the size distributions of full and break eccDNAs differed significantly. The full eccDNAs were smaller, with a median of 2.1 kb, while the length of break eccDNAs was heavily overestimated due to the lack of internal sequences (Figure 3A), suggesting that the RCA method might perform better on small eccDNAs, for which full-length sequences are easier to obtain.

Previous studies reported that 99% of eccDNAs are shorter than 25 kb [1]. The longest full-length eccDNA detected by ecc_finder and eccDNA_RCA_nanopore was 16 kb, whereas the longest full-length eccDNA detected by FLED reached up to 30 kb, and FLED identified 10 eccDNAs longer than 16 kb, indicating the improved performance of FLED for full-length eccDNA detection. However, for larger eccDNAs, direct detection of the full-length sequence still faces challenges due to the inefficiency of RCA.
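
The labels and size definitions above can be illustrated with a minimal Python sketch. It is not taken from the FLED source code: the `EccDNA` class, its field names and the half-open coordinate convention (size = end - start) are assumptions made for illustration only. The `full_rate` helper simply reproduces the 'Full eccDNA rate' row of Table 1 from the counts in the same table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EccDNA:
    """Hypothetical record for one detected eccDNA (illustration only)."""
    chrom: str
    start: int                       # 5' breakpoint, assumed 0-based
    end: int                         # 3' breakpoint, assumed exclusive
    sequence: Optional[str] = None   # full-length sequence, if recovered

    @property
    def label(self) -> str:
        # 'full eccDNA' when a full-length sequence is available,
        # 'break eccDNA' when only breakpoint positions are known.
        return "full eccDNA" if self.sequence is not None else "break eccDNA"

    @property
    def size(self) -> int:
        # Sequence length for full eccDNAs, breakpoint distance otherwise.
        if self.sequence is not None:
            return len(self.sequence)
        return self.end - self.start


def full_rate(full: int, total: int) -> float:
    """Percentage of detected eccDNAs that have a full-length sequence."""
    return 100.0 * full / total


# Counts taken from Table 1 (BGC823 and MCF12A).
print(round(full_rate(3625, 4589), 3))  # 78.993
print(round(full_rate(323, 521), 3))    # 61.996
```

Because a break eccDNA has no internal sequence, its size in this sketch is only the genomic span between breakpoints, which is consistent with the overestimation discussed above.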

Figure 3.

Characteristics and genomic annotation of eccDNAs. (A) The size distribution of Full eccDNAs and Break eccDNAs in seven cell lines. (B) The distribution of Full eccDNA breakpoints around the exon-intron boundary. (C) Genomic annotation overlap of detected Full eccDNA in GES1 cell line. (D) KEGG pathway enriched by annotated genes in gastric-related cell lines (BGC823, SGC7901 and GES1). (E) Display of chromosome 1 at TXNIP gene and detected eccDNA chr1:145991770-145997924, and predicted promoter, corresponding TFBS, and highly likely TSS region.

Although most detected eccDNAs are derived from a single genomic locus, FLED also found eccDNAs consisting of multiple discontinuous fragments in the seven cell lines. Of the 159 detected multiple-fragment eccDNAs, 133 included two or more fragments from different chromosomes, while 26 were intra-chromosomal. Considering that these multiple-fragment eccDNAs may have independent biogenesis, they were not included in subsequent analysis.

Characteristics and functional analysis of eccDNAs

The distribution and potential function of eccDNAs were further analyzed at the chromosome and gene levels. EccDNAs originated from all chromosomes (Supplementary Figure 6A available online at http://bib.oxfordjournals.org/), and, interestingly, higher eccDNA densities appeared on chromosomes 5 and 20 in multiple cell lines (Supplementary Figure 6B available online at http://bib.oxfordjournals.org/). Because of the reliability of Full eccDNAs detected by FLED, subsequent analysis at the gene level was limited to Full eccDNAs. EccDNA breakpoints were enriched around exon-intron boundaries and occurred most frequently in the regions flanking exons (Figure 3B). More than half of the Full eccDNAs could be annotated to genes, and some contained complete genes and enhancer-like signatures, which indicates that eccDNAs might affect gene regulation by disturbing these complete genes and enhancers (Table 2 and Figure 3C). Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were applied to the annotated genes (N = 828, 825 and 953) of Full eccDNAs in three gastric-related cell lines (BGC823, SGC7901 and GES1). Although the numbers of genes enriched in BGC823 and SGC7901 (both gastric cancer cell lines) were smaller than that in the human gastric epithelial cell line GES1, more significant pathways were enriched, such as the Rap1 signaling pathway, calcium signaling pathway and cAMP signaling pathway (Figure 3D, Supplementary Figure 7 available online at http://bib.oxfordjournals.org/), which were reported to be involved in tumor cell migration, invasion and metastasis in many kinds of cancers [45–47]. These pathways, enriched among the genes annotated to eccDNAs, are associated with tumor activities, suggesting that eccDNAs might be involved in the development and progression of tumors.

Table 2.

  Annotation of Full eccDNAs in cell lines. The numbers of detected Full eccDNAs, of annotated Full eccDNAs that overlap with gene regions, and of Full eccDNAs that contain a full gene or a full enhancer in the seven cell lines

Cell line | BGC823 | SGC7901 | GES1 | HepG2 | HL7702 | MB453 | MCF12A
Phenotype | GC | GC | GE | HCC | Liver | BC | BE
Full eccDNA | 3625 | 3651 | 4170 | 4490 | 2356 | 17 388 | 323
Annotated Full eccDNA | 2486 | 2591 | 2892 | 3065 | 1751 | 11 566 | 236
Annotated gene | 1970 | 2035 | 2317 | 2551 | 1510 | 6941 | 232
Full eccDNA with full gene | 82 | 58 | 117 | 102 | 44 | 277 | 14
Full eccDNA with full enhancer | 1315 | 1352 | 1732 | 1605 | 926 | 5748 | 171

Note: GC: gastric cancer; GE: gastric epithelial cells; HCC: hepatocellular carcinoma; BC: breast cancer; BE: breast epithelial cells.

Previous studies have shown that intact genes carried by eccDNA can be expressed uncontrollably, which in turn affects transcription levels [7, 12, 13]. To investigate the impact on transcription, differential gene expression analysis between gastric cancer cells and the gastric epithelial cell line GES1 was performed based on RNA-seq data, revealing that genes encoded on eccDNAs were significantly overexpressed. An eccDNA named chr1:145991770-145997924, which contains the complete TXNIP gene, was identified in the gastric epithelial cell line GES1 (Figure 3E). TXNIP is known as a tumor suppressor that inhibits tumor cell proliferation and cell cycle progression [48–50]. The functional elements of TXNIP were predicted on eccDNA chr1:145991770-145997924, including the corresponding promoter, transcription factor binding site (TFBS) and transcription start site (TSS). Furthermore, the upregulation of TXNIP in the gastric epithelial cell line (P-value = 0.003) was consistent with the expression pattern of tumor suppressors, indicating the transcriptional potential of eccDNAs. However, additional biological experiments are necessary to distinguish whether the overexpressed mRNAs were transcribed from eccDNAs or from the chromosomal TXNIP gene. Similarly, eccDNA chr8:127400294-127404673 (Supplementary Figure 8 available online at http://bib.oxfordjournals.org/) was identified, on which the gene CCAT2 was fully encoded. CCAT2 is a well-recognized long non-coding RNA associated with gastric cancer that actively participates in gastric cancer differentiation and invasion and also has notable prognostic value [51, 52]. Although the predicted promoters, TFBS and TSS of CCAT2 were located downstream of the gene, these regulatory elements could still be functional due to the circular nature of eccDNAs.

DISCUSSION

This work reported FLED, an experimental and bioinformatics approach for the detection of full-length eccDNA based on nanopore sequencing technology. Benefiting from the unique closed circular structure of eccDNA, eccDNA was amplified through RCA and sequenced, resulting in the multiple tandem copies of template eccDNA sequences that can be captured in ultra-long nanopore reads. FLED takes both full-length and partial reads and converts alignment information into graphs to identify eccDNAs with full-length sequences.

The main contributions of FLED reside in three aspects: first, it constructs two distinct directed graphs (alignment graph and splice graph) for each read to identify eccDNA. This approach effectively addresses the challenges posed by similar reads originating from homologous eccDNAs, leading to a significant reduction in false positives. Second, before constructing the splice graph for eccDNA identification, FLED also creates an alignment graph based on the arrangement of mapped subreads on this read, in order to obtain the optimal successive alignment of each read. Consequently, this strategy not only simplifies the subsequent splice graph but also facilitates the filtration of low-quality chimeric reads arising from template-dependent polymerase jumping or switching events during the rolling circle amplification process. Finally, FLED not only utilizes full-length reads as the most reliable evidence for identifying eccDNA to ensure accuracy, but also incorporates partial reads as additional evidence to assist candidate eccDNAs in passing filtration, thereby enhancing its sensitivity.
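
The idea of selecting an optimal successive alignment for a read can be sketched as a highest-scoring path in a directed acyclic graph. The Python below is only an illustration under stated assumptions and is not FLED's implementation: the `Segment` fields, the use of the alignment score as node weight, and the rule that one segment may follow another only if it starts on the read at or after the previous one ends are all hypothetical choices.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """Hypothetical mapped subread of one nanopore read."""
    q_start: int   # start position on the read
    q_end: int     # end position on the read (exclusive)
    ref: str       # label of the mapped genomic locus
    score: int     # alignment score (assumed node weight)


def best_chain(segments: List[Segment]) -> List[Segment]:
    """Return the highest-scoring chain of non-overlapping segments.

    Nodes are segments; an edge a -> b exists when b starts on the read
    at or after a ends. Sorting by read position gives a topological
    order, so a single dynamic-programming pass finds the best path.
    """
    order = sorted(segments, key=lambda s: (s.q_start, s.q_end))
    n = len(order)
    if n == 0:
        return []
    best = [s.score for s in order]   # best chain score ending at each node
    prev = [-1] * n                   # back-pointers for path recovery
    for j in range(n):
        for i in range(j):
            if order[i].q_end <= order[j].q_start:
                candidate = best[i] + order[j].score
                if candidate > best[j]:
                    best[j] = candidate
                    prev[j] = i
    # Backtrack from the node with the highest chain score.
    j = max(range(n), key=lambda k: best[k])
    chain = []
    while j != -1:
        chain.append(order[j])
        j = prev[j]
    return chain[::-1]


segs = [
    Segment(0, 1000, "chr1:100-1100", 1000),
    Segment(900, 2000, "chr5:50-1150", 500),    # conflicting, lower score
    Segment(1000, 2000, "chr1:100-1100", 950),
    Segment(2000, 2500, "chr1:100-600", 400),
]
print([s.ref for s in best_chain(segs)])
```

In this toy read, the low-scoring segment that overlaps its neighbours is left out of the chain, which mirrors how an optimal successive alignment can set aside conflicting or chimeric mappings before any circle inference is attempted.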

FLED was assessed using simulated datasets, corresponding Illumina data, and experimental validation, demonstrating the high efficiency and reliability of this method in full-length eccDNA detection from nanopore data. FLED can also be applied for eccDNA identification in other species (Supplementary Table 9 available online at http://bib.oxfordjournals.org/), indicating its adaptability [31, 53]. FLED achieved a high F1 score (0.97) for eccDNA detection in the 30X simulated dataset, and 90% of 11 randomly selected eccDNAs were verified by PCR and Sanger sequencing in real data. These results indicate that FLED is reliable and accurate in eccDNA detection. Although short-read-based approaches like Circle-Map can identify eccDNA breakpoints from discordantly mapped paired-end reads and soft-clipped reads, FLED detected more full-length eccDNAs with accurate sequences, and the average size of full-length eccDNAs is 2 kb, showing the advantage of FLED over NGS-based methods.

Other nanopore-based eccDNA detectors still have limitations in obtaining accurate full-length sequences of eccDNAs. Nanocircle underutilized the ultra-long nanopore reads, as it identified only the origins of eccDNAs without capturing essential internal structural details, specifically the arrangement of fragments and the mutations on the eccDNAs. The assembly-based method CReSIL merged genomic regions of aligned reads as the representative of the chromosomal origins of eccDNAs. Consequently, distinguishing similar reads that derive from different eccDNAs of the same genomic region could pose a challenge for CReSIL. On the other hand, cyrcular-calling-workflow determined eccDNAs according to the posterior probability. Both eccDNA_RCA_nanopore and ecc_finder identified full-length eccDNAs directly from nanopore RCA reads. The former excelled in detecting eccDNAs with multiple fragments, while ecc_finder was versatile, being applicable to both long-read and short-read data. However, eccDNA_RCA_nanopore required reads that hit the genome more than once, and ecc_finder retained only reads with at least two tandem repeats of the original eccDNA sequence, which not only wasted data but also made it challenging for both methods to detect large eccDNAs. In contrast, the approach introduced in this study, FLED, identified eccDNAs from as many informative reads as possible, including both ideal full-length reads with at least two breakpoints and full-length reads with one breakpoint. This strategy held an advantage in enhancing the sensitivity of eccDNA detection, especially for larger eccDNAs. FLED also reduced false positives based on the sequencing depth distribution, allowing it to strike a balance between detection sensitivity and false positives.
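
Why a two-copy requirement discards reads from larger eccDNAs can be seen with simple arithmetic on an idealized RCA read. The sketch below assumes an error-free, non-chimeric read that starts at a given offset on the circle and runs continuously around it. The function names and the example lengths are hypothetical, and real tools make these decisions from alignments rather than from lengths alone.

```python
def tandem_copies(read_len: int, circle_size: int) -> float:
    """Number of circle copies spanned by an idealized RCA read."""
    return read_len / circle_size


def junction_crossings(offset: int, read_len: int, circle_size: int) -> int:
    """How many times the read crosses the circle junction (breakpoint).

    The read covers unrolled positions offset .. offset + read_len - 1,
    and junctions sit between positions k*size - 1 and k*size (k >= 1).
    """
    if not 0 <= offset < circle_size:
        raise ValueError("offset must lie on the circle")
    return (offset + read_len - 1) // circle_size


size, read_len = 16_000, 20_000            # hypothetical example values
print(tandem_copies(read_len, size))       # 1.25 copies: full-length read
print(junction_crossings(2_000, read_len, size))   # 1 breakpoint crossing
print(junction_crossings(14_000, read_len, size))  # 2 breakpoint crossings
```

Under these assumptions the same 20 kb read is a full-length read with either one or two breakpoint crossings depending on where amplification started, yet it never reaches two complete copies, so a filter that requires two tandem repeats would drop it.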

FLED uses the consensus sequences, rather than raw subreads, as the full-length sequences of eccDNAs. Thus, the mutations and internal structures of detected eccDNAs can be preserved in the output of FLED, and sequencing errors can be eliminated as much as possible (Supplementary Figure 9 available online at http://bib.oxfordjournals.org/). The mutations on eccDNAs allow a wide range of mutation-based applications and analyses, such as cell genotyping and lineage tracing [23–25]. Assessment of eccDNA structural heterogeneity can also facilitate inference of eccDNA structural dynamics [22, 23]. In addition, inserted fragments of non-genomic origin and short repeats of microhomology (2–15 bp) between eccDNA breakpoints are available from full-length eccDNA sequences, and the analysis of these short sequence elements will contribute to uncovering the mechanism of eccDNA biogenesis and function [3, 26]. The annotation of eccDNAs at the gene level also showed that eccDNAs may play an important role in the oncogenesis, metastasis, and recurrence of tumors. Some cancer-associated pathways were enriched in both gastric cancer cell lines (BGC823 and SGC7901), and the cAMP, calcium and Rap1 signaling pathways have been reported to be related to the progression of multiple kinds of cancers, including gastric cancer. Moreover, the eccDNAs that contain the complete CCAT2 and TXNIP genes also suggested a regulatory role of eccDNAs in cancer cell growth, migration, and invasion.
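
One way to look for microhomology at a pair of breakpoints is to compare the reference sequence at the start of the circularized segment with the sequence immediately after its end. The sketch below is an illustrative assumption rather than a procedure described in this work: the function name, the 0-based half-open coordinates and the choice of comparing these two particular flanks are hypothetical, and only the 15 bp upper bound comes from the range quoted above.

```python
def microhomology(ref: str, start: int, end: int, max_len: int = 15) -> str:
    """Longest shared prefix of ref[start:] and ref[end:], up to max_len.

    `start` and `end` are the assumed 0-based, half-open coordinates of
    the genomic segment that forms the circle. A non-empty result means
    the bases at the start of the segment are repeated directly after
    its end, i.e. a short direct repeat flanks the junction.
    """
    ref = ref.upper()
    k = 0
    while (k < max_len
           and start + k < end
           and end + k < len(ref)
           and ref[start + k] == ref[end + k]):
        k += 1
    return ref[start:start + k]


# Toy reference: the segment [4, 16) starts with ACGG, and ACGG also
# follows the segment end, giving a 4 bp microhomology.
toy_ref = "TTTT" + "ACGGATCCATTA" + "ACGGCCCC"
print(microhomology(toy_ref, 4, 16))  # ACGG
```

A result of length 2 or more would fall within the 2–15 bp range mentioned above; shorter matches are expected by chance and would need to be interpreted with care.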

However, there is still room for improvement in both experimental and computational methods. First, for larger eccDNAs, Phi29 DNA polymerase faces challenges in achieving rolling circle amplification for more than one complete loop, making it difficult to generate reads containing the full-length sequences of eccDNA templates. As a result, acquiring the accurate full-length sequences of these eccDNAs without assembly becomes a challenge. By optimizing experimental conditions or size selection, it may be possible to enhance the performance of Phi29 to achieve a wider range of eccDNA detection and a higher proportion of full-length reads. Second, FLED integrates partial reads to assist in eccDNA identification and filtration. While this strategy significantly increases FLED’s recall and sensitivity, it also introduces false positives (the false positives caused by partial reads were 2.70 times those caused by full-length reads). Additionally, even though FLED applies stringent filtering and statistical tests to minimize false positives, structural variations may still be misidentified as eccDNAs. Third, according to the Sanger sequences of 11 randomly selected eccDNAs, the similarity between FLED and Sanger sequences was 94.36%, which leaves room for improvement. The main reason for the inconsistency might be the short repetitive sequences around eccDNA breakpoints, which affect the truncation of subreads and the generation of consensus sequences. In future work, the integration of short-read NGS data for hybrid correction of nanopore reads could improve the accuracy of FLED sequences. Moreover, we anticipate that ongoing advancements in nanopore sequencing technology will further improve sequencing accuracy. Finally, limited by the current sequencing depth, FLED detection is far from saturation, and a more complete profile of eccDNAs could be obtained with deeper sequencing.

CONCLUSION

This work presents FLED, a new eccDNA detection tool for long-read sequencing data. Based on biological validation, experiments on simulated datasets, and comparison with other tools, FLED demonstrates superior reliability. The advantage of FLED in detecting accurate full-length sequences of eccDNAs shows its potential for revealing eccDNA biogenesis and functions, demonstrating its application value in the field of cancer research.

Key Points

  • A novel method named FLED was developed to identify eccDNA and its accurate full-length sequence for long-read sequencing data.

  • FLED constructed directed graphs for each read to ensure identification accuracy and incorporated high-quality non-full-length reads to correct sequencing errors.

  • FLED could effectively identify eccDNA and its accurate full-length sequence, providing new insight into an in-depth understanding of eccDNA biogenesis and functions.

Supplementary Material

Supplementary_Materials_bbad388
Supplementary_Tables_bbad388

ACKNOWLEDGEMENTS

Not applicable.

Fuyu Li is a Ph.D. candidate in School of Biological Science and Medical Engineering, Southeast University. Her research interests focus on bioinformatics.

Wenlong Ming is a Lecturer in School of Artificial Intelligence, Nanjing University of Information Science and Technology. He specialized in bioinformatics.

Wenxiang Lu is a Ph.D. candidate in School of Biological Science and Medical Engineering, Southeast University. He majors in molecular cellular biology.

Ying Wang is a Ph.D. candidate in School of Biological Science and Medical Engineering, Southeast University. Her research interests focus on bioinformatics.

Xiaohan Li is a Ph.D. candidate in School of Biological Science and Medical Engineering, Southeast University. She majors in molecular cellular biology.

Xianjun Dong is an Assistant Professor in the Department of Neurology at Harvard Medical School, a Faculty member of the HMS Initiative for RNA Medicine, and an Associate Member of the Broad Institute. He specialized in developing and applying computational methods.

Yunfei Bai is a Professor in School of Biological Science and Medical Engineering, Southeast University. His research interests focus on biomedical engineering and bioinformatics.

Contributor Information

Fuyu Li, State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, P. R. China.

Wenlong Ming, Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing, 210044, P. R. China.

Wenxiang Lu, State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, P. R. China.

Ying Wang, State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, P. R. China.

Xiaohan Li, State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, P. R. China.

Xianjun Dong, Genomics and Bioinformatics Hub, Brigham and Women's Hospital, Boston, MA 02115, USA; Precision Neurology Program, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Neurology, Harvard Medical School, Boston, MA 02115, USA.

Yunfei Bai, State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing, 210096, P. R. China.

AUTHOR CONTRIBUTIONS

This study was conceptualized and designed by Y.B., with co-supervision and consultancy from X.D. The experiment and Nanopore sequencing were performed by W.L. The tool development and data analysis were completed by F.L. F.L., W.M., W.L., and X.D. wrote the manuscript with input from all other authors. All authors read and approved the submission and publication.

FUNDING

National Natural Science Foundation of China (grant number: 61871121) to Y.B. American Parkinson’s Disease Association, Aligning Science Across Parkinson’s (ASAP) and National Institutes of Health (grant U01NS120637) to X.D. This research was funded in whole or in part by ASAP-000301 through the Michael J. Fox Foundation for Parkinson’s Research (MJFF). For the purpose of open access, the author has applied a CC BY public copyright license to all Author Accepted Manuscripts arising from this submission. The funders had no role in the study design, data collection, analysis, the decision to publish or the preparation of the manuscript.

DATA AVAILABILITY STATEMENT

The data underlying this article have been deposited at the National Genomics Data Center (https://ngdc.cncb.ac.cn/) with the accession number HRA002605.

FLED is implemented using Python3 and is freely accessible at https://github.com/FuyuLi/FLED. The scripts to generate the simulation data, along with the simulated Circle-seq datasets and the reference genomic sequences are also available on GitHub.

References

  • 1. Møller HD, Mohiyuddin M, Prada-Luengo I, et al. Circular DNA elements of chromosomal origin are common in healthy human somatic tissue. Nat Commun 2018;9:1069.
  • 2. Møller HD, Parsons L, Jørgensen TS, et al. Extrachromosomal circular DNA is common in yeast. Proc Natl Acad Sci U S A 2015;112:E3114–22.
  • 3. Shibata Y, Kumar P, Layer R, et al. Extrachromosomal microDNAs and chromosomal microdeletions in normal tissues. Science 2012;336:82–6.
  • 4. Barreto SC, Uppalapati M, Ray A. Small circular DNAs in human pathology. Malays J Med Sci 2014;21:4–18.
  • 5. Liao Z, Jiang W, Ye L, et al. Classification of extrachromosomal circular DNA with a focus on the role of extrachromosomal DNA (ecDNA) in tumor heterogeneity and progression. Biochim Biophys Acta Rev Cancer 2020;1874:188392.
  • 6. Paulsen T, Kumar P, Koseoglu MM, et al. Discoveries of extrachromosomal circles of DNA in normal and tumor cells. Trends Genet 2018;34:270–8.
  • 7. Wu S, Turner KM, Nguyen N, et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature 2019;575:699–703.
  • 8. Paulsen T, Shibata Y, Kumar P, et al. Small extrachromosomal circular DNAs, microDNA, produce short regulatory RNAs that suppress gene expression independent of canonical promoters. Nucleic Acids Res 2019;47:4586–96.
  • 9. Ain Q, Schmeer C, Wengerodt D, et al. Extrachromosomal circular DNA: current knowledge and implications for CNS aging and neurodegeneration. Int J Mol Sci 2020;21.
  • 10. deCarvalho AC, Kim H, Poisson LM, et al. Discordant inheritance of chromosomal and extrachromosomal DNA elements contributes to dynamic disease evolution in glioblastoma. Nat Genet 2018;50:708–17.
  • 11. Deshpande V, Luebeck J, Nguyen ND, et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat Commun 2019;10:392.
  • 12. Turner KM, Deshpande V, Beyter D, et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 2017;543:122–5.
  • 13. Verhaak RGW, Bafna V, Mischel PS. Extrachromosomal oncogene amplification in tumour pathogenesis and evolution. Nat Rev Cancer 2019;19:283–8.
  • 14. Nathanson DA, Gini B, Mottahedeh J, et al. Targeted therapy resistance mediated by dynamic regulation of extrachromosomal mutant EGFR DNA. Science 2014;343:72–6.
  • 15. Yan Y, Guo G, Huang J, et al. Current understanding of extrachromosomal circular DNA in cancer pathogenesis and therapeutic resistance. J Hematol Oncol 2020;13:124.
  • 16. Koche RP, Rodriguez-Fos E, Helmsauer K, et al. Extrachromosomal circular DNA drives oncogenic genome remodeling in neuroblastoma. Nat Genet 2020;52:29–34.
  • 17. Morton AR, Dogan-Artun N, Faber ZJ, et al. Functional enhancers shape extrachromosomal oncogene amplifications. Cell 2019;179:1330–1341.e1313.
  • 18. Møller HD. Circle-Seq: isolation and sequencing of chromosome-derived circular DNA elements in cells. Methods Mol Biol 2020;2119:165–81.
  • 19. Møller HD, Bojsen RK, Tachibana C, et al. Genome-wide purification of extrachromosomal circular DNA from eukaryotic cells. J Vis Exp 2016;e54239.
  • 20. Prada-Luengo I, Krogh A, Maretty L, et al. Sensitive detection of circular DNAs at single-nucleotide resolution using guided realignment of partially aligned reads. BMC Bioinformatics 2019;20:663.
  • 21. Xu K, Ding L, Chang TC, et al. Structure and evolution of double minutes in diagnosis and relapse brain tumors. Acta Neuropathol 2019;137:123–37.
  • 22. Rosswog C, Bartenhagen C, Welte A, et al. Chromothripsis followed by circular recombination drives oncogene amplification in human cancer. Nat Genet 2021;53:1673–85.
  • 23. Chamorro González R, Conrad T, Stöber MC, et al. Parallel sequencing of extrachromosomal circular DNAs and transcriptomes in single cancer cells. Nat Genet 2023;55:880–90.
  • 24. Ludwig LS, Lareau CA, Ulirsch JC, et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 2019;176:1325–1339.e1322.
  • 25. Hung KL, Luebeck J, Dehkordi SR, et al. Targeted profiling of human extrachromosomal DNA by CRISPR-CATCH. Nat Genet 2022;54:1746–54.
  • 26. Cohen S, Mechali M. A novel cell-free system reveals a mechanism of circular DNA formation from tandem repeats. Nucleic Acids Res 2001;29:2542–8.
  • 27. Ashton PM, Nair S, Dallman T, et al. MinION nanopore sequencing identifies the position and structure of a bacterial antibiotic resistance island. Nat Biotechnol 2015;33:296–300.
  • 28. Wanchai V, Jenjaroenpun P, Leangapichart T, et al. CReSIL: accurate identification of extrachromosomal circular DNA from long-read sequences. Brief Bioinform 2022;23.
  • 29. Tüns AI, Hartmann T, Magin S, et al. Detection and validation of circular DNA fragments using Nanopore sequencing. Front Genet 2022;13:867018.
  • 30. Henriksen RA, Jenjaroenpun P, Sjøstrøm IB, et al. Circular DNA in the human germline and its association with recombination. Mol Cell 2022;82:209–217.e207.
  • 31. Wang Y, Wang M, Djekidel MN, et al. eccDNAs are apoptotic products with high innate immunostimulatory activity. Nature 2021;599:308–14.
  • 32. Wang Y, Wang M, Zhang Y. Purification, full-length sequencing and genomic origin mapping of eccDNA. Nat Protoc 2023;18:683–99.
  • 33. Zhang P, Peng H, Llauro C, et al. ecc_finder: a robust and accurate tool for detecting extrachromosomal circular DNA from sequencing data. Front Plant Sci 2021;12:743742.
  • 34. Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 2018;34:3094–100.
  • 35. Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 2021;37:4572–4.
  • 36. Hagberg AA, Schult DA, Swart PJ. Exploring network structure, dynamics, and function using NetworkX. Proceedings of the 7th Python in Science Conference 2008; pp.11–5.
  • 37. Rathi G, Goel S. Applications of depth first search: a survey. Int J Eng Res Technol 2013;02:1341–7.
  • 38. Ordóñez CD, Redrejo-Rodríguez M. DNA polymerases for whole genome amplification: considerations and future directions. Int J Mol Sci 2023;24:9331.
  • 39. Lu N, Qiao Y, An P, et al. Exploration of whole genome amplification generated chimeric sequences in long-read sequencing data. Brief Bioinform 2023;24.
  • 40. Li H, Handsaker B, Wysoker A, et al. The sequence alignment/map format and SAMtools. Bioinformatics 2009;25:2078–9.
  • 41. Lee C. Generating consensus sequences from partial order multiple sequence alignment graphs. Bioinformatics 2003;19:999–1008.
  • 42. Lee C, Grasso C, Sharlow MF. Multiple sequence alignment using partial order graphs. Bioinformatics 2002;18:452–64.
  • 43. Madraki G, Judd RP. Recalculating the length of the longest path in perturbed directed acyclic graph. IFAC-PapersOnLine 2019;52:1560–5.
  • 44. Yang C, Chu J, Warren RL, et al. NanoSim: nanopore sequence read simulator based on statistical characterization. Gigascience 2017;6:1–6.
  • 45. Xiang T, Yuan C, Guo X, et al. The novel ZEB1-upregulated protein PRTG induced by Helicobacter pylori infection promotes gastric carcinogenesis through the cGMP/PKG signaling pathway. Cell Death Dis 2021;12:150.
  • 46. Xie R, Xu J, Xiao Y, et al. Calcium promotes human gastric cancer via a novel coupling of calcium-sensing receptor and TRPV4 channel. Cancer Res 2017;77:6499–512.
  • 47. Zhang H, Kong Q, Wang J, et al. Complex roles of cAMP-PKA-CREB signaling in cancer. Exp Hematol Oncol 2020;9:32.
  • 48. Kwon HJ, Won YS, Nam KT, et al. Vitamin D₃ upregulated protein 1 deficiency promotes N-methyl-N-nitrosourea and Helicobacter pylori-induced gastric carcinogenesis in mice. Gut 2012;61:53–63.
  • 49. Lim JY, Yoon SO, Hong SW, et al. Thioredoxin and thioredoxin-interacting protein as prognostic markers for gastric cancer recurrence. World J Gastroenterol 2012;18:5581–8.
  • 50. Morrison JA, Pike LA, Sams SB, et al. Thioredoxin interacting protein (TXNIP) is a novel tumor suppressor in thyroid cancer. Mol Cancer 2014;13:62.
  • 51. Wang CY, Hua L, Yao KH, et al. Long non-coding RNA CCAT2 is up-regulated in gastric cancer and associated with poor prognosis. Int J Clin Exp Pathol 2015;8:779–85.
  • 52. Wu SW, Hao YP, Qiu JH, et al. High expression of long non-coding RNA CCAT2 indicates poor prognosis of gastric cancer and promotes cell proliferation and invasion. Minerva Med 2017;108:317–23.
  • 53. Merkulov P, Egorova E, Kirov I. Composition and structure of Arabidopsis thaliana extrachromosomal circular DNAs revealed by Nanopore sequencing. Plants (Basel) 2023;12:2178.

Articles from Briefings in Bioinformatics are provided here courtesy of Oxford University Press
