Abstract
Extrachromosomal DNA (ecDNA) is a common mode of oncogene amplification but is challenging to analyze. Here, we adapt CRISPR-CATCH, in vitro CRISPR-Cas9 treatment and pulsed field gel electrophoresis of agarose-entrapped genomic DNA, previously developed for bacterial chromosome segments, to isolate megabase-sized human ecDNAs. We demonstrate strong enrichment of ecDNA molecules containing EGFR, FGFR2 and MYC from human cancer cells and NRAS ecDNA from human metastatic melanoma with acquired therapeutic resistance. Targeted enrichment of ecDNA versus chromosomal DNA enabled phasing of genetic variants, identified the presence of an EGFRvIII mutation exclusively on ecDNAs and supported an excision model of ecDNA genesis in a glioblastoma model. CRISPR-CATCH followed by nanopore sequencing enabled single-molecule ecDNA methylation profiling and revealed hypomethylation of the EGFR promoter on ecDNAs. We distinguished heterogeneous ecDNA species within the same sample by size and sequence with base-pair resolution and discovered functionally specialized ecDNAs that amplify select enhancers or oncogene-coding sequences.
Subject terms: Electrophoresis, Epigenetics, Oncogenes
CRISPR-CATCH is used to isolate extrachromosomal DNA (ecDNA) molecules containing oncogenes from human cancer cells. CRISPR-CATCH followed by nanopore sequencing allows for methylation profiling, highlighting differences from the native chromosomal loci.
Main
Oncogene amplification is a key cancer-driving mechanism and frequently occurs on circular ecDNA. ecDNA oncogene amplifications are present in half of human cancer types and up to one-third of tumor samples and are associated with poor patient outcomes1–3. Given the prevalence of ecDNA in cancer, there is an urgent need for better characterization of unique genetic and epigenetic features of ecDNA to understand how it may differ from chromosomal DNA and obtain clues about how it is formed and maintained in tumors. However, isolation and targeted profiling of megabase-sized, clonal ecDNAs is currently challenging due to their large sizes and sequence complexity, in contrast to small kilobase- and subkilobase-sized DNA circles known as extrachromosomal circular DNA elements (eccDNAs) observed also in non-cancer cells and apoptotic byproducts4,5.
There are currently three main approaches to analyzing sequences of ecDNAs in cancer cells: (1) DNA fluorescence in situ hybridization (FISH), (2) bulk whole-genome sequencing (WGS) and (3) exonuclease digestion of linear DNA followed by DNA amplification. The first method, DNA FISH, involves arresting cells in metaphase followed by chromosome spreading and hybridization of a DNA probe on a microscope slide. This method provides excellent separation of ecDNA and chromosomal DNA signals and has been used to confirm the presence of oncogenes and drug resistance genes on ecDNA. However, this method is low throughput (tens of cells) and provides limited, binary sequence information (a probe either binds or does not bind to DNA). The second method, bulk short- or long-read sequencing, provides much higher sequence resolution. However, sequencing signal represents a combination of all DNA material in a sample, including ecDNA and chromosomal DNA. In addition to the ambiguous origin of sequencing reads, rearranged ecDNA sequences are computationally inferred1,6 but difficult to validate, as sequencing reads are far too short to span the entire length of an ecDNA molecule (typically several megabases). Optical mapping (OM) allows analysis of longer DNA molecules (up to several hundred kilobases) by compromising nucleotide-level information, but each individual OM molecule is typically shorter than an ecDNA circle7,8. Sequence segments can be computationally ‘stitched’ together to form a list of candidate reconstructed paths, though empirically proving the true ecDNA structure, when possible, is very time-consuming and labor-intensive. The third method, exonuclease treatment combined with DNA amplification, is effective for small DNA circles (up to tens of kilobases; Circle-seq4,9) and was recently applied to ecDNA in cancer cells10. It entails magnetic-bead-based DNA isolation, treatment with an exonuclease to deplete linear DNA, followed by multiple displacement amplification. This method requires intact DNA circles and is therefore highly limited by ecDNA size, as megabase-sized DNA molecules are extremely fragile in solution and prone to breakage. Further, this method requires DNA amplification and, therefore, cannot be used for epigenetic analyses. Phi29, the processive multiple displacement amplification polymerase, produces amplicons that are tens of kilobases and thus amplifies small circles via rolling-circle amplification; however, this is currently challenging for megabase-sized ecDNA. Finally, analysis of these enriched ecDNAs by short- or long-read sequencing also suffers from the same read length limitations for amplicon reconstruction.
Here, we adapt a previously developed method, termed CRISPR-CATCH11 (Cas9-assisted targeting of chromosome segments), to specifically enrich for megabase-sized ecDNA from cancer cells and archival patient tumor tissues. DNA amplification is not required; thus, this method allows targeted analyses of both the genetic sequence and epigenomic landscape of isolated ecDNA. We also provide an analytical pipeline for reconstructing amplicon structures de novo with high confidence using sequence information of ecDNA species separated by size.
Results
Enrichment and visualization of ecDNA by CRISPR-CATCH
Analysis of tumor samples in The Cancer Genome Atlas (TCGA) showed that most ecDNA sequences predicted were above 200 kb, a larger size range than that obtained from standard high-molecular-weight (HMW) DNA extraction and exonuclease-based circular DNA enrichment (Extended Data Fig. 1a–c)4,5. To preserve large intact circular ecDNA, we encapsulated genomic DNA of GBM39 cells (patient-derived glioblastoma neurosphere model containing EGFR ecDNA) in agarose plugs (Methods). Fragment size distribution analysis by pulsed field gel electrophoresis (PFGE) showed that virtually all agarose-entrapped genomic DNA containing ecDNA was restricted to either the loading well or the upper compression zone (CZ; region of large DNA molecules; Extended Data Fig. 1a,d). ecDNA was not detectable in the resolution window, indicating that intact circular ecDNA does not migrate freely in PFGE (Extended Data Fig. 1d). This finding is in agreement with previous Southern blot studies12–14. To selectively pull ecDNA into the resolution window of the gel, we preincubated GBM39 genomic DNA in vitro with CRISPR-Cas9 and a single guide RNA (sgRNA) targeting the EGFR locus, an amplified sequence on ecDNA. We reasoned that a single cut would linearize ecDNA, resulting in differential migration in PFGE (Fig. 1a). We further reasoned that the same single cut in the corresponding chromosomal locus would result in two much larger chromosomal DNA pieces that migrate much more slowly than ecDNA and therefore would not be coenriched. Cas9 digestion of EGFR ecDNA resulted in a prominent band of 1.2–1.37 Mb, concordant with the 1.258-Mb amplicon predicted by bulk WGS and extrachromosomal amplification of the targeted EGFR sequence (Fig. 1b–d and Extended Data Fig. 2a)7,8. Short-read sequencing of the gel-extracted band confirmed strong enrichment of the expected ecDNA sequence (Fig. 1e,f), demonstrating that a single cut is sufficient to allow enrichment of ecDNA by PFGE. We refer to this method as CRISPR-CATCH (a term previously coined for a two-cut Cas9 treatment followed by gel extraction for isolating and cloning bacterial chromosomal fragments11,15). CRISPR-CATCH enabled a 30-fold enrichment of the targeted ecDNA (60% of all sequencing reads versus 2% in WGS), resulting in ultrahigh (~200× normalized) sequencing coverage (Fig. 1e,f and ecDNA in Extended Data Fig. 2b). Simultaneous cleavage of two sgRNA target sites 20 kb away from each other led to loss of the sequence segment between the cut sites, as would be expected given a circular structure and end-to-end junction of the amplified region (Fig. 1g; ecDNA guides A + B). A single cut in the normal diploid chromosomal EGFR locus did not result in a DNA band (as shown in Jurkat cells; Fig. 1d), further supporting enrichment of ecDNAs in GBM39 cancer cells. To isolate the chromosomal EGFR locus, we performed CRISPR-CATCH using two sgRNAs targeting just outside of the amplified region (upstream and downstream; Fig. 1a,c). This dual-cut strategy resulted in a linear fragment of roughly the same size as the ecDNA molecule and successfully enriched for the chromosomal EGFR sequence as demonstrated by increased sequencing coverage around the chromosome-targeting guides (Fig. 1d,g; chromosomal DNA, Extended Data Fig. 2b). Chromosomal gel bands appeared much fainter than ecDNA bands (Fig. 1d), consistent with the fact that ecDNAs exist in higher copy numbers than the chromosomal locus in GBM39 cells. Sequencing coverage analysis further validated enrichment of ecDNA versus chromosomal DNA alleles (Extended Data Fig. 2c,d). Together, these results showed that CRISPR-CATCH can be used to isolate megabase-sized ecDNA molecules and corresponding chromosomal locus from the same cancer cell sample. Although PFGE was previously used in Southern blot studies to visualize ecDNA sizes12,13, CRISPR-CATCH provides an empirical pairing of ecDNA amplicon size (by molecular separation) to structure with base-pair resolution (by sequencing).
To expand the capabilities of CRISPR-CATCH, we further optimized a tumor processing protocol for applying CRISPR-CATCH on flash-frozen patient tumor specimens as demonstrated in an instructive case of metastatic melanoma (Fig. 2a and Methods). As tumor specimens can have large amounts of fragmented DNA interfering with CRISPR-CATCH, we introduce electrodepletion, a sequential electrophoretic strategy to remove fragmented DNA from patient tumor samples (Fig. 2b and Methods). This strategy effectively removes DNA fragments and traps intact genomic DNA as well as intact circular ecDNA, as evidenced by removal of DNA size markers as well as successful fractionation of known FGFR2 ecDNAs from stomach cancer SNU16 cells by CRISPR-CATCH after applying electrodepletion (Extended Data Fig. 3a,b). For our clinical tumor sample, DNA bands were not visible after PFGE due to low amounts of DNA; nonetheless, CRISPR-CATCH still successfully enriched for ecDNAs and confirmed the amplicon size, as shown by strong agreement between the molecular size on the gel and the length of the enriched amplified region in sequencing (Fig. 2c and Extended Data Fig. 3c,d). This clinical tumor sample was obtained from a patient with BRAF V600-mutated melanoma who was treated with BRAF and MEK inhibitors and developed a metastatic lesion with acquired resistance coincident with the acquisition of ecDNA (Fig. 2a). CRISPR-CATCH and AmpliconArchitect confirmed the amplification of an 890-kb ecDNA encompassing NRAS, a gene known to confer acquired resistance to BRAF inhibition16 as well as combined BRAF and MEK inhibition when amplified17 (Fig. 2c and Extended Data Fig. 3c). The NRAS amplicon breakpoints coincided with boundaries of topologically associating domains in a melanoma cell line (Extended Data Fig. 3e); the 3′ portion of the amplicon region encompasses a topologically associating domain containing multiple peaks of histone H3 lysine 27 acetylation (H3K27ac) in at least one of seven human cell types (Extended Data Fig. 3e,f), pointing to potential enhancers that may be rewired to the 5′ located NRAS gene via ecDNA circularization. An NRAS G12R missense mutation, which locks NRAS in the GTP-bound active conformation and previously linked to melanoma18, was identified on ecDNAs with an allele frequency of 100%, suggesting strong selection for the mutated allele on ecDNAs (Fig. 2d). Notably, this metastatic tumor sample was 10 years old at the time of ecDNA isolation (biopsy in October 2012), showing that CRISPR-CATCH is fully feasible on archival human tumor specimens. These data further validate an ecDNA mechanism for acquired resistance to MAP kinase pathway inhibitors in authentic human cancer.
Phasing of oncogenic variants on ecDNA and identification of the chromosomal origin of ecDNA
Next, we performed targeted analysis of the genetic sequences of ecDNA and chromosomal DNA containing the EGFR locus in GBM39 cells (Fig. 3a). From ecDNA and chromosomal DNA molecules containing the EGFR locus isolated using CRISPR-CATCH, we first identified structural variants (SVs) in short-read sequencing data. GBM39 cells were previously shown to harbor the EGFRvIII deletion, an activating EGFR mutation7,8,19. Importantly, sequencing coverage combined with breakpoint analysis of CRISPR-CATCH data revealed that the EGFRvIII mutation is predominantly found on ecDNA, while the chromosomal locus mainly contains full-length EGFR (Fig. 3b). Wild-type EGFR appeared at ~75% in the chromosomal fraction, consistent with the level of chromosomal DNA enrichment and suggesting that the remaining ~25% EGFRvIII comes from carryover ecDNAs (Fig. 3b, Extended Data Fig. 2d). This observation suggests selection and amplification of the EGFRvIII mutation and supports previous studies suggesting that ecDNA may help cancer cells adapt to selective pressure and harbor unique genetic alterations6,20,21.
We then assessed the frequencies of SNVs found on enriched ecDNA and chromosomal DNA. Notably, we observed strong divergence of SNVs on ecDNA compared to those on chromosomal DNA, suggesting that they were haplotype-specific germline variants originating from different parental alleles (Extended Data Fig. 4a,b). Similar to the EGFRvIII analysis, unique SNVs located in the chromosomal fraction exhibited allele frequencies of 70–75%, consistent with the level of chromosomal DNA enrichment (Extended Data Fig. 4a,c). CRISPR-CATCH also identified low-frequency subclonal mutations on ecDNA and chromosomal DNA (Extended Data Fig. 4c,d). Importantly, these subclonal mutations on ecDNA are indistinguishable from chromosomal SNVs in bulk WGS data based on variant allele frequencies (VAFs) alone but can be clearly phased using CRISPR-CATCH (Fig. 3c and Extended Data Fig. 4b–d). The divergent ecDNA and chromosomal haplotypes strongly suggest that ecDNA arose from a single chromosomal allele (allele 1), whereas the second allele (allele 2) containing wild-type EGFR is still present on chromosomal DNA. Based on this finding, we asked whether the chromosomal allele from which ecDNA originated (allele 1) can still be detected. Although there are six copies of chromosome 7 (native location of EGFR) in GBM39 cells (Fig. 1b and Extended Data Fig. 2a), quantification of VAFs in the chromosomal arm upstream and downstream of the EGFR-amplified region showed that one haplotype corresponded to one copy of chromosome 7, whereas a second haplotype corresponded to five copies (Fig. 3d). We further identified an SV resulting from deletion of the amplified region corresponding to one copy of chromosome 7, suggesting that it was an excision scar left behind during the formation of ecDNA (Fig. 3d). Together, this analysis shows the sequence of genomic events that preceded the formation of ecDNA and provides strong evidence for an excision model of ecDNA genesis (Fig. 3e). From the two original parental alleles, there was a DNA rearrangement event on allele 1 that led to the excision and circularization of the EGFR ecDNA. The gain of the EGFRvIII mutation and ecDNA amplification led to the major ecDNA allele we observed. In addition, there was a gain of four additional copies of allele 2 of chromosome 7. These data suggest that the allele that served as the original template for the ecDNA no longer contains the sequence harboring EGFR and provide strong evidence for the ‘episome model’, a model of ecDNA formation in which a genomic locus is excised from chromosomal DNA as an episome and circularized to form an ecDNA (Fig. 3e) rather than duplication of sequences22–25.
Single-molecule DNA methylation profile of isolated ecDNA revealed hypomethylation of gene promoters
We then examined the feasibility of analyzing epigenomic profiles of ecDNA using CRISPR-CATCH. After ecDNA isolation as before, we performed nanopore sequencing to obtain single-molecule sequence information and DNA cytosine methylation (5mC) profiles. We analyzed 5mC-CpG methylation of isolated ecDNA as a proof of concept and observed a strong anti-correlation of 5mC with chromatin accessibility based on bulk assay for transposase-accessible chromatin using sequencing (ATAC-seq), validating the identification of regulatory elements (Methods, Fig. 4a,b and Extended Data Fig. 5a). We also isolated the corresponding EGFR chromosomal locus in GBM39 cells and analyzed its DNA methylation profile (Fig. 4a,b). We observed reduced DNA methylation at regulatory elements on ecDNA compared to the same elements on chromosomal DNA, suggesting altered gene regulation (top 50 ATAC-seq peaks; Fig. 4c). The four regions that lost 5mC on ecDNA compared to its chromosomal locus in the same cells were all gene promoters, including that of the EGFR oncogene (Methods, Fig. 4d,e and Extended Data Fig. 5b–d). The pattern of hypomethylation corresponded to nucleosome positions shown by micrococcal nuclease digestion with deep sequencing (MNase-seq), implying a more active chromatin state on ecDNA (Fig. 4e and Extended Data Fig. 5d)26,27. These hypomethylated sites are located outside the EGFR deletion on ecDNAs and therefore cannot be explained by the SV. Finally, single-molecule analysis of enriched ecDNA at the EGFR promoter showed hypomethylation at the EGFR promoter and co-occurrence of methylation spanning hundreds of CpG sites around the region on the same molecules (285 CpG sites; Fig. 4f). Together, these data show that gene promoters on ecDNA may have increased activities compared to the corresponding chromosomal locus on a single-molecule level and demonstrate that CRISPR-CATCH can be used to measure epigenomic features of ecDNA.
Mapping of ecDNA amplicon structures resolved heterogeneous SVs and an altered enhancer landscape
Many cancer cells contain ecDNAs with more complex, heterogeneous structures, including multiple sequence rearrangements and more than one circle species6. We reasoned that CRISPR-CATCH may provide direct evidence of molecule size and amplicon-phased structural information for these complex amplicons and that this information can be used to computationally reconstruct ecDNA with higher confidence. To this end, we developed an analytical pipeline for amplicon reconstruction from CRISPR-CATCH data (Methods and Fig. 5a). We modified and adopted AmpliconArchitect6 for generating a copy-number-aware breakpoint graph for each isolated amplicon. Next, we implemented a method for extracting ecDNA candidate paths from the graph, called candidate amplicon path enumerator (CAMPER). Candidate ecDNA structures were generated from the breakpoint graph, estimated multiplicity of genomic segments and molecular size based on PFGE using a depth-first search approach (Methods). Finally, quality estimates of resulting structures were produced for filtering out any low-confidence reconstructions in the case of low-quality gel extractions (for example, incompletely separated ecDNA species) or undetectable breakpoints from sequencing, etc. As validation, we reconstructed the 1.258-Mb circular ecDNA circle encoding EGFR in GBM39 cells using this workflow, yielding a structure fully consistent with previous reports using WGS and OM7,8 (Extended Data Fig. 6). To further demonstrate the utility of this tool, we applied this pipeline to a stomach cancer cell line, SNU16, which contains multiple ecDNA species with MYC, FGFR2 and additional sequences connected by complex structural rearrangements (Extended Data Fig. 7a)28. CRISPR-CATCH using guides targeting the MYC or FGFR2 amplicon resulted in multiple visible bands in PFGE (Fig. 5b), revealing extensive molecular heterogeneity of ecDNAs. Gel-extracted ecDNAs were multiplexed for sequencing. Breakpoint graphs of ecDNA species were greatly simplified by CRISPR-CATCH because each amplicon could be separately reconstructed and was not intermixed with all other amplicons (Extended Data Fig. 7b). In 4 of 23 libraries (bands d,i,m,p; Fig. 5b), short-read sequencing of the CRISPR-CATCH-isolated band was sufficient to enable end-to-end, megabase-scale reconstruction of the ecDNA sequence. Five libraries corresponded to the CZ and showed very low levels of ecDNA enrichment, suggesting that the true ecDNA sizes are smaller than 2.2 Mb (bands a,e,h,o,r; Fig. 5b and Extended Data Fig. 8a). In the remaining cases, large amplicon sequences were enriched, but one or more missing edges prevented unambiguous amplicon resolution (Fig. 5b, c, Extended Data Fig. 8a). From these data, we reconstructed three unique ecDNAs containing MYC or FGFR2: a 1.604-Mb FGFR2 ecDNA that was reconstructed from two independent CRISPR-CATCH treatments (using sgRNAs with cut sites >300 kb apart), a smaller FGFR2 ecDNA species that was 278 kb, and a 622-kb MYC ecDNA containing sequences originating from chromosomes 8 and 11 (Fig. 5d–f and Extended Data Fig. 8b). All reconstructions from CRISPR-CATCH data passing quality filters were supported by contigs assembled from OM data (N50 50 Mb) provided to AmpliconReconstructor7, further validating their structures (Methods, Fig. 5d–f and Extended Data Fig. 8b).
Gene expression is regulated by chromatin interactions between gene promoters and non-coding regulatory elements such as enhancers. Recent studies showed that functional enhancers interacting with oncogenes in cis (on the same ecDNA molecule) and in trans (between different ecDNA molecules within an ecDNA hub, or between ecDNA and chromosomal loci) shape ecDNA amplicon structure and oncogene expression28–31. To identify ecDNA structures containing these enhancers, we performed CRISPR-CATCH using sgRNAs targeting various enhancers on SNU16 ecDNAs marked by active H3K27ac, BRD4 binding and chromatin accessibility by ATAC-seq and previously identified to modulate MYC or FGFR2 expression via CRISPR interference28 (Fig. 5c and Extended Data Fig. 8c,d). CRISPR-CATCH enrichment analysis revealed additional ecDNA species showing focal enhancer amplification as well as amplicons containing rearranged enhancers in association with MYC and FGFR2 (Fig. 5c,g,h and Extended Data Fig. 9a). We independently verified instances of FGFR2 enhancer amplicons lacking the FGFR2 oncogene-coding sequence using DNA FISH, further supporting our CRISPR-CATCH results (Extended Data Fig. 9b). These findings suggest that extrachromosomal amplification and rearrangement events may be shaped by both enhancer proximity to oncogenes on an ecDNA molecule as well as overall abundance of enhancer sequences in a pool of ecDNA molecules. These focal enhancer amplification events (Fig. 5c,g and Extended Data Fig. 9b), as well as the small ecDNA species containing the FGFR2 coding sequence but missing its 5′ cognate enhancers (Fig. 5c,f and Extended Data Fig. 9b), suggest ecDNA specialization (Fig. 5h). As ecDNAs can interact with one another in trans within a hub28, amplification of enhancer sequences in a pool of ecDNAs may facilitate intermolecular enhancer-promoter interactions and further increase oncogene expression.
To validate our ecDNA mapping, we compared connected ecDNA segments identified by CRISPR-CATCH with unnormalized background signals in chromatin conformation capture (H3K27ac HiChIP, a protein-directed chromatin conformation capture assay; covalently connected DNA segments have higher frequencies of background interactions than unconnected segments; Methods) and observed a high degree of concordance (Extended Data Fig. 9c,d). In contrast, bulk WGS poorly predicts these ecDNA structures, as shown by low concordance with chromatin conformation capture background signals, demonstrating that WGS provides a collapsed and limited picture of the true diversity of ecDNA structures (Extended Data Fig. 9c,d). Finally, to orthogonally validate ecDNA maps generated by CRISPR-CATCH, we performed dual-color DNA FISH targeting pairs of loci originating from chromosomes 8 and 10 segments on metaphase spreads and confirmed that colocalization of the targeted loci strongly correlated with connected ecDNA segments identified by CRISPR-CATCH (Extended Data Fig. 9c,e,f). Together, these data demonstrate the utility of CRISPR-CATCH as a method for disambiguating ecDNA structures, particularly when a diverse mixture of ecDNAs is present. This method aids in accurate amplicon mapping and reconstruction orthogonal to contig assembly from bulk DNA and provides insights into the ecDNA structural and regulatory landscape.
Discussion
By exploiting the distinctive PFGE migration pattern of large circular ecDNA, we show that ecDNA can be isolated from human cancer cells, including archival patient tumor specimens, and separated by size using CRISPR-CATCH. This method enables targeted analyses of ecDNA sequences and epigenomic features that were previously challenging (Extended Data Fig. 10). CRISPR-CATCH also makes it possible to directly compare ecDNA and the corresponding chromosomal locus in the same cell sample by physically separating them. It is now possible to obtain allele-specific information of ecDNA versus chromosomal DNA without solely relying on SNVs. Furthermore, although ecDNA sequences represent copy-number-amplified genomic regions, we show that VAFs in bulk WGS alone do not accurately reflect the locations of SNVs (for example, a low-frequency SNV can be located on either non-amplified chromosomal DNA or a small subset of ecDNA molecules). In contrast, the ability to phase SNVs by CRISPR-CATCH enables accurate identification of sequencing signal originating from ecDNA to obtain allele-specific information (for example, in bulk ATAC-seq, RNA-sequencing or ChIP-seq data8,32). In addition, allele phasing using CRISPR-CATCH led to our discovery of the chromosomal allelic origin of ecDNAs and direct evidence of an excision site. Future systematic examination of allelic origins of ecDNAs across different cancers may provide clues about the mechanism of ecDNA genesis.
The scope and challenge of ecDNA isoforms were not fully appreciated in the past. As bulk WGS represents the aggregation of sequencing reads originating from multiple ecDNA species as well as chromosomal DNA, it provides a collapsed and limited picture of the true diversity of ecDNA structures. On the other hand, CRISPR-CATCH enables separation of ecDNAs from the rest of the genome and accurate reconstruction of diverse amplicon structures. Thus, CRISPR-CATCH may be applied to future studies on cancer cells during early formation of ecDNA, to cells evolving under chemotherapeutic or other selective pressures and in other settings where changes in genetic and chromatin features of ecDNA are hypothesized to contribute to cancer cell evolution. As ecDNA often exhibits tremendous structural heterogeneity, CRISPR-CATCH opens up a new window into deciphering intratumoral genetic heterogeneity in cancer. The ability to separate ecDNAs by size may provide increased structural resolution to other types of analysis, such as single-cell sequencing, in which heterogeneous mixes of ecDNA structures are computationally inferred but difficult to resolve confidently. These future applications of CRISPR-CATCH may also address how ecDNA and chromosomal DNA diverge as they evolve separately and under different kinetics. We note that tandem duplications on chromosomal DNA (for example, homogeneously staining regions) can also be isolated by CRISPR-CATCH with a single guide. In addition, CRISPR-CATCH requires prior knowledge of the ecDNA-amplified genomic locus and therefore should be used to complement additional methods like WGS to identify the amplified genomic locus and/or metaphase FISH to verify the source of isolated DNA. Delivery of multiple sgRNAs targeting various loci may allow multiplexing in the future in cases in which there are multiple distinct ecDNA-amplified loci and/or sample materials are limited.
We demonstrate that CpG methylation can be measured from enriched ecDNA molecules. Past studies have shown that cells containing ecDNA express amplified genes at higher levels than cells containing linear amplifications, and that the ecDNA oncogene locus is more accessible than other loci on linear DNA by bulk ATAC-seq1,8. Our comparison of ecDNA versus chromosomal DNA encoding the same gene loci from the same cells showed that gene promoters on circular ecDNA are less methylated than the same promoters on linear chromosomal DNA, suggesting that ecDNA enables more active transcription. In principle, CRISPR-CATCH may be coupled to several genomic assays to understand key chromatin-templated processes on ecDNA such as transcription, DNA replication, and repair33–35.
Together, we show that ecDNA profiling using CRISPR-CATCH can provide insights into ecDNA structure, diversity, origin and epigenomic landscape. As such, CRISPR-CATCH presents an opportunity for a multitude of molecular studies that will help elucidate how ecDNA oncogene amplifications are regulated in cancer cells.
Methods
Tissue sample collection
Patient tissue sample used in this study was obtained with informed consent and approval by the institutional review boards at the University of California, Los Angeles.
Cell culture
GBM39 neurospheres were derived from patient tissue as previously described8 and were authenticated using metaphase DNA FISH with probes hybridizing to EGFR as well as a chromosome 7 centromeric probe to confirm ecDNA amplification status. SNU16 cells were obtained from ATCC (CRL-5974). GBM39 cells were maintained in DMEM/Nutrient Mixture F-12 (DMEM/F12 1:1; Gibco, 11320-082), B-27 Supplement (Gibco, 17504044), 1% penicillin-streptomycin (Thermo Fisher, 15140-122), human epidermal growth factor (20 ng ml−1; Sigma-Aldrich, E9644), human fibroblast growth factor (20 ng ml−1; Peprotech) and heparin (5 µg ml−1; Sigma-Aldrich, H3149-500KU). SNU16 cells were maintained in DMEM/F12 supplemented with 10% FBS and 1% penicillin-streptomycin. All cells were cultured at 37 °C with 5% CO2. All cell lines tested negative for mycoplasma contamination.
WGS
WGS data from bulk GBM39 cells were previously published8 and raw fastq reads obtained from the National Center for Biotechnology Information (NCBI) Sequence Read Archive under BioProject accession PRJNA506071. Reads were trimmed of adapter content with Trimmomatic36 (version 0.39), aligned to the hg19 genome using BWA MEM37 (0.7.17-r1188), and PCR duplicates were removed using Picard’s MarkDuplicates (version 2.25.3). WGS data from bulk SNU16 cells were previously generated (SRR530826, Genome Research Foundation).
Analysis of TCGA ecDNA amplicon sizes
To obtain ecDNA intervals for TCGA tumors, we ran AmpliconClassifier (version 0.4.6; https://github.com/jluebeck/AmpliconClassifier) on AmpliconArchitect outputs published previously using WGS data2. ecDNA amplicon sizes were estimated by summing ecDNA amplicon interval sizes for each tumor.
ecDNA isolation by CRISPR-CATCH
Genomic DNA was embedded in agarose plugs using a modified protocol based on guidelines from the manufacturer of the CHEF Mapper XA System (Bio-Rad Laboratories) as previously described38. Briefly, molten 1% certified low-melt agarose (Bio-Rad, 1613112) in PBS was equilibrated to 45 °C. One million cells were pelleted per condition, washed twice with cold 1× PBS, resuspended in 30 µl PBS and briefly heated to 37 °C. Then, 30 µl agarose solution was added to cells, mixed, transferred to a plug mold (Bio-Rad Laboratories, 1703713) and incubated on ice for 10 min. Solid agarose plugs containing cells were ejected into 1.5-ml Eppendorf tubes, suspended in buffer SDE (1% SDS, 25 mM EDTA at pH 8.0) and placed on shaker for 10 min. The buffer was removed and buffer ES (1% N-laurolsarcosine sodium salt solution, 25 mM EDTA at pH 8.0, 50 µg ml−1 proteinase K) was added. Agarose plugs were incubated in buffer ES at 50 °C overnight. On the following day, proteinase K was inactivated with 25 mM EDTA with 1 mM PMSF for 1 h at room temperature with shaking. Plugs were then treated with RNase A (1 mg ml−1) in 25 mM EDTA for 30 min at 37 °C, and washed with 25 mM EDTA with a 5-min incubation. Plugs not directly used for ecDNA enrichment were stored in 25 mM EDTA at 4 °C.
To perform in vitro Cas9 digestion, agarose plugs containing DNA were washed three times with 1× NEBuffer 3.1 (New England BioLabs) with 5-min incubations. Next, DNA was digested in a reaction with 30 nM sgRNA (Synthego) and 30 nM spCas9 (New England BioLabs, M0386S) after pre-incubation of the reaction mix at room temperature for 10 min. To make two cuts on the native chromosomal locus, 15 nM of each sgRNA was added to the reaction. Cas9 digestion was performed at 37 °C for 4 h, followed by overnight digestion with 3 µl proteinase K (20 mg ml−1) in a 200 µl reaction. On the following day, proteinase K was inactivated with 1 mM PMSF for 1 h with shaking. Plugs were then washed with 0.5× TAE buffer three times with 5-min incubations. Plugs were loaded into a 1% certified low-melt agarose gel (Bio-Rad, 1613112) in 0.5× TAE buffer with ladders (CHEF DNA Size Marker, 0.2–2.2 Mb, Saccharomyces cerevisiae Ladder: Bio-Rad, 1703605; CHEF DNA Size Marker, 1–3.1 Mb, Hansenula wingei Ladder: Bio-Rad, 1703667) and PFGE was performed using the CHEF Mapper XA System (Bio-Rad) according to the manufacturer’s instructions and using the following settings: 0.5× TAE running buffer, 14 °C, two-state mode, run time duration of 16 h 39 min, initial switch time of 20.16 s, final switch time of 2 min 55.12 s, gradient of 6 V cm−1, included angle of 120° and linear ramping. Gel was stained with 3× Gelred (Biotium) with 0.1 M NaCl on a rocker for 30 min covered from light and imaged. Bands were then extracted and DNA was isolated from agarose blocks using beta-Agarase I (New England BioLabs, M0392L) following the manufacturer’s instructions.
To perform CRISPR-CATCH on flash-frozen patient tumor tissues, we removed frozen tissues from −80 °C and incubated them at −20 °C overnight. The tissues were thawed on ice, rinsed with MEM, Hanks’ Balanced Salts (Gibco, 11575032) and cut into approximately 5 mm × 5 mm pieces using microdissection scissors. Molten 0.5% certified low-melt agarose (Bio-Rad, 1613112) in 1× PBS was equilibrated to 45 °C, and 50 µl was added to each plug mold (Bio-Rad, 1703713). Each piece of tissue was then suspended into the molten agarose in the plug mold and minced using microdissection scissors. The agarose plug molds were allowed to solidify on ice for 10 min. To dissociate the tissues, agarose-embedded tumors were treated with a mix of 0.1826–1.826 U collagenase (Sigma-Aldrich, C9891), 49.92–124.8 U hyaluronidase (Sigma-Aldrich, H3506) and 1 U dispase (Stem Cell, 07913) in 1 ml MEM at 37 °C for 1 h. Agarose plugs containing tumors were treated with buffer SDE for 10 min as above and buffer ES for 48 h at 50 °C. Plugs were treated PMSF and RNase A and washed with 25 mM EDTA as above. To remove fragmented DNA background in tumor samples via electrodepletion, plugs were loaded into a 1% certified low-melt agarose gel in 0.5× TAE buffer and run in the CHEF Mapper XA System at 14 °C using the following settings: multi-state mode, block 1 with 3 h of constant voltage of 5.2 V cm−1 (3 h initial and final switch times, linear ramping, state 1) and included angle of 0°, block 2 with 2 min of constant voltage of 5.2 V cm−1 (2 min initial and final switch times, linear ramping, state 1) and included angle of 180°. The gel was removed from the chamber, and agarose plugs trapping intact DNA were carefully removed from the loading wells to avoid breakage. The resulting agarose plugs were then subjected to CRISPR-Cas9 in vitro digestion, PFGE and DNA extraction as described above. All guide sequences are provided in Supplementary Table 1. Unprocessed PFGE images are provided as Source Data.
In-solution HMW DNA isolation and exonuclease treatment
For comparison between agarose-embedded DNA and in-solution HMW DNA, we performed HMW DNA extraction using the Qiagen MagAttract HMW DNA Kit (67563) following the manufacturer’s protocol. To digest linear DNA, we used Plasmid-Safe ATP-Dependent DNase (Biosearch Technologies, E3110K) and performed the reaction according to the manufacturer’s protocol over 5 days at 37 °C (1 μl Plasmid-Safe ATP-Dependent DNase, 2 μl 25 mM ATP, 800 ng HMW DNA and 5 μl Plasmid-Safe 10× Reaction Buffer with nuclease-free water to bring up the total reaction volume to 50 μl). After every 24 h, additional enzyme and ATP was added (1 μl Plasmid-Safe ATP-Dependent DNase, 2 μl 25 mM ATP and 0.3 μl Plasmid-Safe 10× Reaction Buffer). After 5 days, DNase was inactivated by a 30-min incubation at 70 °C. To visualize DNA by PFGE, samples were mixed with 1% certified low-melt agarose (Bio-Rad, 1613112) in 0.5× TAE buffer, mixed, transferred to a plug mold (Bio-Rad, 1703713) and incubated on ice for 10 min. Solid agarose plugs were loaded into a 1% certified low-melt agarose gel (Bio-Rad, 1613112) in 0.5× TAE buffer with ladders (CHEF DNA Size Marker, 0.2–2.2 Mb, S. cerevisiae Ladder: Bio-Rad, 1703605; CHEF DNA Size Marker, 1–3.1 Mb, H. wingei Ladder: Bio-Rad, 1703667), and PFGE was performed using the CHEF Mapper XA System (Bio-Rad) using the same settings as those used in CRISPR-CATCH experiments described above.
Hi-C visualization
Hi-C data from the SK-MEL-5 melanoma cell line were obtained from ENCODE (generated by Dekker laboratory) and visualized using the 3D Genome Browser (3dgenome.fsm.northwestern.edu; hg19, raw-rep1)39,40.
Metaphase DNA FISH
Cells were arrested at mitosis with 30 ng ml−1 KaryoMAX Colcemid Solution in PBS (Gibco) for 18 h. Cells were washed once with PBS and resuspended in 0.075 M KCl at 37 °C for 15–20 min and then fixed in an equal volume of freshly prepared Carnoy’s fixative (3:1 methanol/glacial acetic acid, v/v) at room temperature. The cells were washed another three times with fixative, resuspended and dropped onto humidified glass slides. Air-dried samples were washed briefly in 2× SSC buffer (Promega) and then dehydrated in ascending ethanol series (70%, 85% and 100%) each for 2 min. For GBM39 cells, Cytocell EGFR amplification Probe (OGT) targeting both EGFR and D7Z1 (centromeric probe as a control for chromosome 7) was added to the slide and a coverslip was applied. For SNU16 cells, probes targeting MYC (chromosome 8 segment; Empire Genomics, MYC-20-RE), FGFR2 (Empire Genomics, FGFR2-20-GR), and various chromosome 10 segments from Empire Genomics were used (enhancer region in Extended Data Fig. 9b: WI2-2170K5; probes in Extended Data Fig. 9e targeting region 1: RP11-257O17; region 2: RP11-95I16; region 3: RP11-57H2; region 4: RP11-1024G22). The probes were mixed with the provided hybridization buffer in 1:10 ratio and applied onto the sample. The sample was denatured at 75 °C in a slide moat for 3 min and hybridized overnight at 37 °C in a humidified chamber. The sample was washed in 0.4× SSC for 2 min, followed by another 2-min wash with 2× SSC with 0.1% Tween-20. The sample was stained with 4,6-diamidino-2-phenylindole and washed once in ddH2O before mounted onto a glass slide with ProLong Diamond Antifade Mountant (Invitrogen). Images were acquired on a Leica DMi8 widefield microscope with a ×63 objective.
Metaphase DNA FISH image analysis
Colocalization analysis for two-color metaphase FISH data for ecDNAs in SNU16 cells described in Extended Data Fig. 9f was performed using Fiji (version 2.1.0/1.53c)41. Images were split into the two FISH colors + 4,6-diamidino-2-phenylindole channels, and signal threshold set manually to remove background fluorescence. Overlapping FISH signals were segmented using watershed segmentation. Colocalization was quantified using the ImageJ-Colocalization Threshold program and individual and colocalized FISH signals were counted using particle analysis.
Short-read sequencing of DNA isolated by CRISPR-CATCH
To perform short-read sequencing on DNA isolated by CRISPR-CATCH, we first transposed it with Tn5 transposase produced as previously described42 in a 50-µl reaction with TD buffer43, 10 ng DNA and 1 µl transposase. The reaction was performed at 37 °C for 5 min, and transposed DNA was purified using MinElute PCR Purification Kit (Qiagen, 28006). Libraries were generated by seven to nine rounds of PCR amplification using NEBNext High-Fidelity 2× PCR Master Mix (NEB, M0541L), purified using SPRIselect reagent kit (Beckman Coulter, B23317) with double size selection (0.8× right, 1.2× left) and sequenced on the Illumina Miseq, the Illumina Nextseq 550 or the Illumina NovaSeq 6000 platform. For GBM39 enrichment and mutation analyses in Figs. 1 and 2, a 1.2× left-side selection was performed using SPRIselect. Sequencing data were processed as described above for WGS.
Genetic variant analyses
SVs from short-read sequencing were identified with DELLY44 (version 0.8.7; using Boost version 1.74.0 and HTSlib version 1.12) using the delly call command. BCF files were converted to VCF using bcftools view in Samtools45. VAFs were calculated using both imprecise and precise variants. Read alignment was visualized using Gviz in R.
SNVs were identified using GATK (version 4.2.0.0)46 from short-read sequencing data as follows. First, base quality score recalibration was performed on bam files (generated as described above) using gatk BaseRecalibrator followed by gatk ApplyBQSR. Covariates were analyzed using gatk AnalyzeCovariates. SNVs were called using gatk Mutect2 from the recalibrated bam files, and SNVs were filtered using gatk FilterMutectCalls. Finally, VCF files were converted to table format using gatk VariantsToTable with the following parameters: ‘-F CHROM -F POS -F REF -F ALT -F QUAL -F TYPE -GF AD -GF GQ -GF PL -GF GT’. Mutation VAFs were calculated by dividing alternate allele occurrences by the sum of reference and alternate allele occurrences. SNVs that had coverage depth of 5 or less or were not detected in WGS were filtered out. Read alignment was visualized using Gviz in R. To classify ecDNA-specific SNVs in GBM39 cells, we identified all SNVs with VAFs higher than 0.03 in ecDNAs isolated by CRISPR-CATCH using guide A, B or A + B (given chromosome contamination levels of 0.01–0.02; Extended Data Fig. 2d) and with VAFs in WGS lower than 0.997 (nonhomozygous variants). Chromosome-specific SNVs were defined as non-ecDNA SNVs with VAFs in WGS lower than 0.1. Homozygous SNVs were defined as non-ecDNA-specific and non-chromosome-specific SNVs with VAFs in WGS above 0.99.
Nanopore sequencing and 5mC methylation calling
DNA isolated by CRISPR-CATCH was directly used without amplification for nanopore sequencing. Sequencing libraries were prepared using the Rapid Sequencing Kit (Oxford Nanopore Technologies, SQK-RAD004) according to the manufacturer’s instructions. Sequencing was performed on a MinION (Oxford Nanopore Technologies).
Bases were called from fast5 files using guppy (Oxford Nanopore Technologies, version 5.0.16) within Megalodon (version 2.3.3) and DNA methylation status was determined using Rerio basecalling models with the configuration file ‘res_dna_r941_min_modbases-all-context_v001.cfg’ and the following parameters: ‘–outputs basecalls mod_basecalls mappings mod_mappings mods per_read_mods –mod-motif Z CG 0 –write-mods-text –mod-output-formats bedmethyl wiggle –mod-map-emulate-bisulfite –mod-map-base-conv C T –mod-map-base-conv Z C’. Methylation calls on single molecules were visualized using Integrative Genome Viewer (IGV, version 2.11.1) in bisulfite mode.
To quantify 5mC-CpG methylation levels across an entire locus, rolling averages of CpG methylation percentages were calculated using a window of 100 bp sliding every 10 bp (unless otherwise specified). Rolling averages of ecDNA and the native chromosomal locus were linearly regressed using the lm function in R. Standardized residual for the linear regression for each window was calculated using the rstandard function to represent relative methylation frequencies on ecDNA compared to chromosomal DNA. To identify accessible regions which are differentially methylated on ecDNA, we first filtered on ATAC-seq peaks which had log-normalized coverage above 9 (calculated by DESeq2 as described in the ATAC-seq section below; normalized coverage for each peak was divided by peak width after adding 1, scaled to 500 and log2 transformed). Next, methylation sites with coverage above 5 for both the isolated ecDNA and chromosomal locus, and overlapping filtered ATAC-seq peaks were linearly regressed using the lm function in R. Standardized residual for the linear regression for each CpG site was calculated using the rstandard function. For each ATAC-seq peak, a z score was calculated using the formula z = (x − m)/s.e., where x is the mean CpG residual within the peak, m is the mean residual of all CpG sites and s.e. is the standard error calculated from the standard deviation of all CpG sites divided by the square root of the number of CpG sites within the peak. z scores were used to compute two-sided P values using the normal distribution function, which were adjusted with p.adjust in R (version 3.6.1) using the Benjamini–Hochberg procedure.
To quantify co-occurrence of methylated or unmethylated CpGs on single molecules, methylation calls on the ‘+’ strand were offset by 1 bp to match the locations of the corresponding CpG sites on the ‘−’ strand. CpG sites where the base probabilities of methylation were above 0.7 were categorized as methylated, and sites where the base probabilities of unmodified CpG were above 0.7 were categorized as unmethylated. For each pair of CpG sites, co-occurrence was calculated by number of co-occurrences of methylated or unmethylated CpGs on the same nanopore sequencing reads divided by total number of occurrences in which the two CpG sites can be successfully categorized as either methylated or unmethylated.
ATAC-seq
ATAC-seq data for GBM39 were previously published8 and raw fastq reads obtained from the NCBI Sequence Read Archive, under BioProject accession PRJNA506071. ATAC-seq data for SNU16 were previously published under Gene Expression Omnibus accession GSE159986 (ref. 28). Adapter-trimmed reads were aligned to the hg19 genome using Bowtie2 (2.1.0). Aligned reads were filtered for quality using samtools (version 1.9)45, duplicate fragments were removed using Picard’s MarkDuplicates (version 2.25.3) and peaks were called using MACS2 (version 2.2.7.1)47 with a q-value cut-off of 0.01 and a no-shift model. Peaks from replicates were merged, and read counts were obtained using bedtools (version 2.30.0)48 and normalized using DESeq2 (using the ‘counts’ function in DESeq2 with normalized = TRUE; version 1.26.0)49.
MNase-seq
MNase-seq data for GBM39 were previously published8 and raw fastq reads obtained from the NCBI Sequence Read Archive under BioProject accession PRJNA506071. Reads were trimmed of adapter content with Trimmomatic36 (version 0.39), aligned to the hg19 genome using BWA MEM37 (0.7.17-r1188), and PCR duplicates removed using Picard’s MarkDuplicates (version 2.25.3). Coverage of nucleosome midpoints was obtained using bamCoverage from deepTools (version 3.5.1) with the following parameters: ‘–MNase –binSize 1’.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41588-022-01190-0.
Supplementary information
Acknowledgements
We thank members of the Chang and Bafna laboratories for discussions. This work was supported by National Institutes of Health (NIH) grants R35-CA209919 (H.Y.C.) and RM1-HG007735 (H.Y.C., W.J.G. and C.C.); Cancer Grand Challenges grant CGCSDF-2021\100007, with support from Cancer Research UK and the National Cancer Institute (H.Y.C., V.B., P.S.M., M.J.-H. and A.G.H.); NIH grants U24CA264379, R01GM114362, OT2CA278635 and Cancer Grand Challenges grant CGCATF-2021/100025 with support from Cancer Research UK and the National Cancer Institute (V.B.); NIH grants 1R01CA176111A1 and 1P01CA168585 (R.S.L.); the Melanoma Research Alliance (R.S.L. and G.M.); and a Jonsson Comprehensive Cancer Center postdoctoral fellowship (P.D.). K.L.H. was supported by a Stanford Graduate Fellowship and an NCI Predoctoral to Postdoctoral Fellow Transition Award (NIH F99CA274692). H.Y.C. is an Investigator of the Howard Hughes Medical Institute. A.G.H. is supported by Deutsche Forschungsgemeinschaft (German Research Foundation) grant 398299703 and the European Research Council under the European Union’s Horizon 2020 Research and Innovation Programme (grant agreement 949172). K.E.H. was supported by a Canadian Institutes of Health Research Banting postdoctoral fellowship.
Extended data
Source data
Author contributions
K.L.H. and H.Y.C. conceived the project. K.L.H. performed experiments for CRISPR-CATCH method development for ecDNA enrichment and analyses of genetic variants and epigenomic features from short-read sequencing and nanopore sequencing data. J.L. and S.R.D. analyzed short-read sequencing and OM data for amplicon reconstruction. K.L.H., C.I.C. and R.L. performed experiments for CRISPR-CATCH optimization for human tumor processing. I.T.L.W. performed DNA FISH validation experiments. P.D., S.H.L., N.E.W., G.M., X.Z., C.B., K.E.H., W.Y., R.C., C.S., C.C., M.J.-H., A.G.H. and R.S.L. provided human tumor specimens and patient-derived samples for CRISPR-CATCH optimization. P.D. and R.S.L. analyzed ecDNA amplicons in human tumor specimens. C.C. and J.A.L. generated OM data and provided de novo assembly and rare variant analysis results. W.J.G. advised on single-molecule sequencing. P.M., V.B. and H.Y.C. guided data analysis and provided feedback on experimental design. K.L.H. and H.Y.C. wrote the manuscript with input from all authors.
Peer review
Peer review information
Nature Genetics thanks Peter Scacheri and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Data availability
Sequencing data generated in this study are deposited in the Sequence Read Archive under BioProject accession PRJNA777710. WGS data from bulk GBM39 cells were obtained from the NCBI Sequence Read Archive under BioProject accession PRJNA506071. WGS data from bulk SNU16 cells were previously generated (SRR530826, Genome Research Foundation). ATAC-seq and MNase-seq data for GBM39 were obtained from the NCBI Sequence Read Archive under BioProject accession PRJNA506071. ChIP-seq data for SNU16 were previously published under Gene Expression Omnibus accession GSE15998628. Sequencing reads were mapped to the hg19 human reference genome. Source data are provided with this paper.
Code availability
Custom code to perform reconstructions of candidate ecDNA structures from CRISPR-CATCH data is available at https://github.com/siavashre/CRISPRCATCH.
Competing interests
H.Y.C. is a co-founder of Accent Therapeutics, Boundless Bio, Cartography Biosciences and Orbital Therapeutics, and an advisor of 10x Genomics, Arsenal Biosciences and Spring Discovery. V.B. is a co-founder, paid consultant and science advisory board member and has equity interest in Boundless Bio and Abterra. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict-of-interest policies. P.M. is a co-founder and advisor of Boundless Bio. R.S.L. reports research and clinical trial support from Merck, Pfizer, BMS and OncoSec. J.L. receives compensation as a consultant for Boundless Bio. The remaining authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Jens Luebeck, Siavash R. Dehkordi
Extended data
is available for this paper at 10.1038/s41588-022-01190-0.
Supplementary information
The online version contains supplementary material available at 10.1038/s41588-022-01190-0.
References
- 1.Turner KM, et al. Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature. 2017;543:122–125. doi: 10.1038/nature21356. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Kim H, et al. Extrachromosomal DNA is associated with oncogene amplification and poor outcome across multiple cancers. Nat. Genet. 2020;52:891–897. doi: 10.1038/s41588-020-0678-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Verhaak RGW, Bafna V, Mischel PS. Extrachromosomal oncogene amplification in tumour pathogenesis and evolution. Nat. Rev. Cancer. 2019;19:283. doi: 10.1038/s41568-019-0128-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Møller HD, et al. Circular DNA elements of chromosomal origin are common in healthy human somatic tissue. Nat. Commun. 2018;9:1069. doi: 10.1038/s41467-018-03369-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Wang Y, et al. eccDNAs are apoptotic products with high innate immunostimulatory activity. Nature. 2021;599:308–314. doi: 10.1038/s41586-021-04009-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Deshpande V, et al. Exploring the landscape of focal amplifications in cancer using AmpliconArchitect. Nat. Commun. 2019;10:392. doi: 10.1038/s41467-018-08200-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Luebeck J, et al. AmpliconReconstructor integrates NGS and optical mapping to resolve the complex structures of focal amplifications. Nat. Commun. 2020;11:4374. doi: 10.1038/s41467-020-18099-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wu S, et al. Circular ecDNA promotes accessible chromatin and high oncogene expression. Nature. 2019;575:699–703. doi: 10.1038/s41586-019-1763-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Møller HD, Parsons L, Jørgensen TS, Botstein D, Regenberg B. Extrachromosomal circular DNA is common in yeast. PNAS. 2015;112:E3114–E3122. doi: 10.1073/pnas.1508825112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Koche RP, et al. Extrachromosomal circular DNA drives oncogenic genome remodeling in neuroblastoma. Nat. Genet. 2020;52:29–34. doi: 10.1038/s41588-019-0547-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Jiang W, et al. Cas9-Assisted Targeting of CHromosome segments CATCH enables one-step targeted cloning of large gene clusters. Nat. Commun. 2015;6:1–8. doi: 10.1038/ncomms9101. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.van der Bliek AM, Lincke CR, Borst P. Circular DNA of 3T6R50 double minute chromosomes. Nucleic Acids Res. 1988;16:4841–4851. doi: 10.1093/nar/16.11.4841. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Borst P, Van Der Bliek AM, Van Der Velde-Koerts T, Hes E. Structure of amplified DNA, analyzed by pulsed field gradient gel electrophoresis. J. Cell. Biochem. 1987;34:247–258. doi: 10.1002/jcb.240340404. [DOI] [PubMed] [Google Scholar]
- 14.Nassonova ES. Pulsed field gel electrophoresis: theory, instruments and application. Cell Tiss. Biol. 2008;2:557. [PubMed] [Google Scholar]
- 15.Gabrieli T, et al. Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH) Nucleic Acids Res. 2018;46:e87. doi: 10.1093/nar/gky411. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Nazarian R, et al. Melanomas acquire resistance to B-RAF(V600E) inhibition by RTK or N-RAS upregulation. Nature. 2010;468:973–977. doi: 10.1038/nature09626. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Moriceau G, et al. Tunable-combinatorial mechanisms of acquired resistance limit the efficacy of BRAF/MEK cotargeting but result in melanoma drug addiction. Cancer Cell. 2015;27:240–256. doi: 10.1016/j.ccell.2014.11.018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Ren M, et al. BRAF, C-KIT, and NRAS mutations correlated with different clinicopathological features: an analysis of 691 melanoma patients from a single center. Ann. Transl. Med. 2022;10:31. doi: 10.21037/atm-21-4235. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Sarkaria JN, et al. Identification of molecular characteristics correlated with glioblastoma sensitivity to EGFR kinase inhibition through use of an intracranial xenograft test panel. Mol. Cancer Ther. 2007;6:1167–1174. doi: 10.1158/1535-7163.MCT-06-0691. [DOI] [PubMed] [Google Scholar]
- 20.Nathanson DA, et al. Targeted therapy resistance mediated by dynamic regulation of extrachromosomal mutant EGFR DNA. Science. 2014;343:72–76. doi: 10.1126/science.1241328. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.Nikolaev S, et al. Extrachromosomal driver mutations in glioblastoma and low-grade glioma. Nat. Commun. 2014;5:5690. doi: 10.1038/ncomms6690. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22.Storlazzi CT, et al. MYC-containing double minutes in hematologic malignancies: evidence in favor of the episome model and exclusion of MYC as the target gene. Hum. Mol. Genet. 2006;15:933–942. doi: 10.1093/hmg/ddl010. [DOI] [PubMed] [Google Scholar]
- 23.Storlazzi CT, et al. Gene amplification as double minutes or homogeneously staining regions in solid tumors: Origin and structure. Genome Res. 2010;20:1198–1206. doi: 10.1101/gr.106252.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24.Carroll SM, et al. Double minute chromosomes can be produced from precursors derived from a chromosomal deletion. Mol. Cell. Biol. 1988;8:1525–1533. doi: 10.1128/mcb.8.4.1525. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.Bailey C, Shoura MJ, Mischel PS, Swanton C. Extrachromosomal DNA—relieving heredity constraints, accelerating tumour evolution. Ann. Oncol. 2020;31:884–893. doi: 10.1016/j.annonc.2020.03.303. [DOI] [PubMed] [Google Scholar]
- 26.Lövkvist C, Sneppen K, Haerter JO. Exploring the link between nucleosome occupancy and DNA methylation. Front. Genet. 2018;8:232. doi: 10.3389/fgene.2017.00232. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Kelly TK, et al. Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res. 2012;22:2497–2506. doi: 10.1101/gr.143008.112. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Hung KL, et al. ecDNA hubs drive cooperative intermolecular oncogene expression. Nature. 2021;600:731–736. doi: 10.1038/s41586-021-04116-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Zhu Y, et al. Oncogenic extrachromosomal DNA functions as mobile enhancers to globally amplify chromosomal transcription. Cancer Cell. 2021;39:694–707. doi: 10.1016/j.ccell.2021.03.006. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Helmsauer K, et al. Enhancer hijacking determines extrachromosomal circular MYCN amplicon architecture in neuroblastoma. Nat. Commun. 2020;11:5823. doi: 10.1038/s41467-020-19452-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Morton AR, et al. Functional enhancers shape extrachromosomal oncogene amplifications. Cell. 2019;179:1330–1341. doi: 10.1016/j.cell.2019.10.039. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Abramov S, et al. Landscape of allele-specific transcription factor binding in the human genome. Nat. Commun. 2021;12:2751. doi: 10.1038/s41467-021-23007-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Müller CA, et al. Capturing the dynamics of genome replication on individual ultra-long nanopore sequence reads. Nat. Methods. 2019;16:429–436. doi: 10.1038/s41592-019-0394-y. [DOI] [PubMed] [Google Scholar]
- 34.Shipony Z, et al. Long-range single-molecule mapping of chromatin accessibility in eukaryotes. Nat. Methods. 2020;17:319–327. doi: 10.1038/s41592-019-0730-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Stergachis AB, Debo BM, Haugen E, Churchman LS, Stamatoyannopoulos JA. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science. 2020;368:1449–1454. doi: 10.1126/science.aaz1646. [DOI] [PubMed] [Google Scholar]
- 36.Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114. doi: 10.1093/bioinformatics/btu170. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 37.Li H, Durbin R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 38.Overhauser, J. Encapsulation of Cells in Agarose Beads. in Pulsed-Field Gel Electrophoresis: Protocols, Methods, and Theories (eds. Burmeister, M. & Ulanovsky, L.) 129–134 (Humana Press, 1992). [DOI] [PubMed]
- 39.Dunham I, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489:57–74. doi: 10.1038/nature11247. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 40.Wang Y, et al. The 3D Genome Browser: a web-based browser for visualizing 3D genome organization and long-range chromatin interactions. Genome Biol. 2018;19:151. doi: 10.1186/s13059-018-1519-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 41.Schindelin J, et al. Fiji: an open-source platform for biological-image analysis. Nat. Methods. 2012;9:676–682. doi: 10.1038/nmeth.2019. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 42.Picelli S, et al. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res. 2014;24:2033–2040. doi: 10.1101/gr.177881.114. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 43.Corces MR, et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat. Methods. 2017;14:959–962. doi: 10.1038/nmeth.4396. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 44.Rausch T, et al. DELLY: structural variant discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–i339. doi: 10.1093/bioinformatics/bts378. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 45.Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 46.McKenna A, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 47.Zhang Y, et al. Model-based analysis of ChIP-seq (MACS) Genome Biol. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 48.Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 49.Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Data Availability Statement
Sequencing data generated in this study are deposited in the Sequence Read Archive under BioProject accession PRJNA777710. WGS data from bulk GBM39 cells were obtained from the NCBI Sequence Read Archive under BioProject accession PRJNA506071. WGS data from bulk SNU16 cells were previously generated (SRR530826, Genome Research Foundation). ATAC-seq and MNase-seq data for GBM39 were obtained from the NCBI Sequence Read Archive under BioProject accession PRJNA506071. ChIP-seq data for SNU16 were previously published under Gene Expression Omnibus accession GSE15998628. Sequencing reads were mapped to the hg19 human reference genome. Source data are provided with this paper.
Custom code to perform reconstructions of candidate ecDNA structures from CRISPR-CATCH data is available at https://github.com/siavashre/CRISPRCATCH.