Mol Cell. Author manuscript; available in PMC: 2023 Dec 8.
Published in final edited form as: Mol Cell. 2021 Dec 23;82(1):209–217.e7. doi: 10.1016/j.molcel.2021.11.027

Circular DNA in the human germline and its association with recombination

Rasmus Amund Henriksen 1,4, Piroon Jenjaroenpun 2,5, Ida Borup Sjøstrøm 1, Kristian Reveles Jensen 3, Iñigo Prada-Luengo 1, Thidathip Wongsurawat 2,5, Intawat Nookaew 2, Birgitte Regenberg 1,6,*
PMCID: PMC10707452  NIHMSID: NIHMS1938162  PMID: 34951964

SUMMARY

Extrachromosomal circular DNA (eccDNA) is common in somatic tissue, but its existence and effects in the human germline are unexplored. We used microscopy, long-read DNA sequencing, and new analytic methods to document thousands of eccDNAs from human sperm. EccDNAs derived from all genomic regions and mostly contained a single DNA fragment, although some consisted of multiple fragments. The generation of eccDNA inversely correlates with the meiotic recombination rate, and chromosomes with high coding-gene density and Alu element abundance form the least eccDNA. Analysis of insertions in human genomes further indicates that eccDNA can persist in the human germline when the circular molecules reinsert themselves into the chromosomes. Our results suggest that eccDNA has transient and permanent effects on the germline. They explain how differences in the physical and genetic map might arise and offer an explanation of how Alu elements co-evolved with genes to protect genome integrity against deleterious mutations producing eccDNA.

In brief

We demonstrate the existence of circular DNA from chromosomes in human germline cells. Our results indicate that coevolution of Alu elements and genes has allowed the human genome to counteract deleterious effects of DNA circularization and reveal how this is connected to the recombination rate of different chromosomes.

Graphical abstract


INTRODUCTION

Human congenital disorders are often linked to structural variation within the germline caused by amplifications and deletions (Turner et al., 2008; Dennis and Eichler, 2016). However, the fate of the deleted DNA in 50%–80% of recurrent germline deletions is not explained (Turner et al., 2008). Work on somatic cells shows that deleted DNA can circularize and exist as self-replicating elements (Gresham et al., 2010; Møller et al., 2015). Such extrachromosomal circular DNA (eccDNA) is common in several eukaryotic species, derives from all parts of genomes, and may contain entire genes (Gresham et al., 2010; Møller et al., 2015, 2018, 2020; Turner et al., 2017; Koche et al., 2020; Morton et al., 2019; Von Hoff et al., 1988). Phenotypic effects of eccDNA include tumors: 35%–50% of all cancers carry eccDNA with oncogenes that appear to drive tumorigenesis (Turner et al., 2017; Koche et al., 2020; Morton et al., 2019; Von Hoff et al., 1988). Although cancers are somatic disorders of individuals, it is unclear whether and to what extent eccDNA occurs in animal germline cells and whether it is passed on across generations. Circular DNA can integrate into the genome of domestic cattle and affect pigmentation patterns (Durkin et al., 2012). In humans, a ring form of chromosome 22 can be transmitted across generations (Stoll and Roth, 1983). These findings indicate that megabase-sized ring-shaped DNA fragments can pass through meiosis, but the existence and inheritance of smaller eccDNAs and their phenotypic effects remain unknown. To address this question, we adapted the Circle-Seq method (Møller et al., 2015, 2018; Koche et al., 2020) (Figure 1A) and purified eccDNA from 1 million sperm cells from each of 29 donors, removing all linear DNA by exonuclease V (exoV) treatment. We combined image visualization with long-read sequencing to systematically identify and quantify eccDNA in human semen.

Figure 1. DNA measurements and circular visualization using confocal and structured illumination microscopy.


(A) Human semen samples (n = 29) purified using the Circle-Seq method for long-read sequencing. EccDNA coordinates were based on split reads identified using the in-house tools NanoCircle and CReSIL (see STAR Methods and Figure S3 for a description of the pipeline).

(B) Mean signal of DNA content (n = 3) obtained from confocal microscopy images using YOYO-1 dye. DNA content was scaled to eccDNA without linear DNA (second bar). *p ≤ 0.05 and **p ≤ 0.005 against Tris-buffer negative controls (first bar) and against samples with no apparent DNA after linearization of circular DNA with AluI and digestion with a second exoV treatment (third bar). Fourth bar, sample treated with exoV and the nicking enzyme Nb.BsmI to relax circular DNA for optimal microscopic imaging.

(C) SIM of Nb.BsmI-treated uncoiled eccDNA dyed with SiR-Hoechst.

(D) SIM of eccDNAs of indicated sizes.

(E) Density plot of eccDNA sizes from a single sample identified by NanoCircle (n = 100) and SIM (n = 143).

(F) Sizes of an unknown eccDNA and two plasmids by SIM of YOYO-1-dyed DNA after enhancing light intensity to infer perimeters and convert to kilobases, assuming B-form DNA.

See also Figure S1.

RESULTS

Circular DNA is common in human germline cells

We measured DNA content using confocal microscopy with fluorescent staining and compared controls without DNA (Figures 1A and 1B, first bar) with exoV-treated samples (Figure 1B, second bar) to confirm that DNA was present after exonuclease treatment of linear DNA (p = 0.0072, one-tailed t test). This purified product appeared to be circular DNA because most of the signal was removed by exonuclease after the DNA had been linearized with the endonuclease AluI (Figure 1B, compare second and third bars; p = 0.0013, one-tailed t test). The signal appeared to be independent of eccDNA structure because relaxation with the single-strand nicking enzyme Nb.BsmI produced DNA content similar to that of untreated samples (Figure 1B, second and fourth bars; p = 0.5960, two-tailed t test). High-resolution structured illumination microscopy (SIM) with fluorescent DNA dye showed that purified eccDNA consisted of continuous covalent structures (Figures 1C and 1D). Although the circles were not always entirely round, we estimated their size range as 3–50 kb from their perimeters (Figures 1D and 1E). The eccDNA appeared structurally diverse, with kinks and writhes that were independent of the DNA dye used (Figures 1C, 1D, and S1). These structures could represent the different conformations proposed for minicircles (Mitchell et al., 2011), producing size variations also seen for plasmids of known size (Figures 1F and S1). Electron microscopy of exonuclease-treated DNA from sperm confirmed the existence of circular DNA and supported the removal of linear DNA (Figure S2).
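The perimeter-to-size conversion used for the SIM images (Figure 1F) can be sketched as follows. This is a minimal illustration, not the image-analysis code used in the study; it assumes an idealized B-form rise of 0.34 nm per base pair and ignores any lengthening of the contour caused by intercalating dyes such as YOYO-1. The function names are hypothetical.

```python
"""Convert SIM contour lengths of circular DNA to size in kilobases.

Sketch only: assumes an idealized B-form rise of 0.34 nm per bp and
ignores dye-induced lengthening (e.g., by intercalators such as YOYO-1).
"""

B_FORM_RISE_NM_PER_BP = 0.34  # assumed helical rise per base pair


def perimeter_um_to_kb(perimeter_um: float) -> float:
    """Return the size (kb) of a circle whose measured perimeter is given in µm."""
    if perimeter_um < 0:
        raise ValueError("perimeter must be non-negative")
    base_pairs = perimeter_um * 1000.0 / B_FORM_RISE_NM_PER_BP  # µm -> nm -> bp
    return base_pairs / 1000.0  # bp -> kb


def kb_to_perimeter_um(size_kb: float) -> float:
    """Return the expected perimeter (µm) of a relaxed circle of the given size."""
    if size_kb < 0:
        raise ValueError("size must be non-negative")
    return size_kb * 1000.0 * B_FORM_RISE_NM_PER_BP / 1000.0  # kb -> bp -> nm -> µm


if __name__ == "__main__":
    for kb in (3, 50):
        print(f"{kb} kb circle -> ~{kb_to_perimeter_um(kb):.2f} µm perimeter")
```

Under these assumptions, the reported 3–50 kb range corresponds to perimeters of roughly 1 to 17 µm; kinked or writhed circles (Figure 1C) would make a measured perimeter underestimate the true contour length.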

Long-read sequencing identifies the chromosomal coordinates that led to eccDNA

To identify the chromosomal origin of the eccDNA fragments, we amplified them with φ29 polymerase for long-read sequencing (Figure 1A). We developed a new bioinformatics pipeline that incorporates multiple tools (Figures 1A and S3). The first is a new in-house alignment method, construction-based rolling-circle amplification for eccDNA sequence identification and location (CReSIL), which reconstructed the sequences of 310 eccDNA fragments by de novo assembly of sequence reads that aligned to genomic regions, had high sequence coverage, and spanned the region at least once. We also exploited split reads using another new method that we developed, NanoCircle (Figures 1A and S3), which determines the breakpoint coordinates of the genomic regions that are joined during eccDNA formation. Split reads are sequences that, upon alignment to a reference genome, are divided between two non-contiguous regions; they represent reads that span the eccDNA formation site (Figures 1A and 2A–2D). This analysis identified the coordinates of 8,346 eccDNAs derived from a single chromosomal fragment (Table S1; Figure 2E). Because de novo assembly of eccDNA requires high sequence coverage, the CReSIL pipeline identified fewer eccDNAs than the NanoCircle analysis. Nonetheless, all CReSIL-identified eccDNAs were also found by NanoCircle, indicating that the two techniques were complementary. The false-positive background from reads not originating from eccDNA appeared to be low. We analyzed two Oxford Nanopore whole-genome sequencing datasets (Helmsauer et al., 2020; Jain et al., 2018) with our bioinformatic pipeline to identify split reads that would indicate eccDNA. An average of 0.58% of all reads from the whole-genome sequencing datasets could support an eccDNA (range 0.03%–1.56%), whereas an average of 12.79% of all reads in our eccDNA-enriched sequences supported an eccDNA (range 7.72%–17.09%).
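The split-read idea can be illustrated with the "backward jump" signature of a circular junction: a read that runs off the end of a circle continues at its start, so the piece read second maps upstream of the piece read first. The sketch below is not NanoCircle's implementation. The `Segment` record and `circle_from_split_read` are hypothetical names, and it handles only a read split into two pieces from one chromosome (a simple eccDNA), assuming each piece's offset is expressed along the read as sequenced.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a long read (hypothetical, simplified record).

    Reference coordinates are 0-based and half-open; `read_offset` is where
    the piece starts along the read as it was sequenced.
    """
    chrom: str
    ref_start: int
    ref_end: int
    strand: str  # "+" or "-"
    read_offset: int


def circle_from_split_read(a: Segment, b: Segment) -> Optional[Tuple[str, int, int]]:
    """Infer (chrom, start, end) of a simple circle from a two-piece split read.

    Returns None when the split does not look like a circular junction.
    """
    if a.chrom != b.chrom or a.strand != b.strand:
        return None  # pieces on different chromosomes/strands: not a simple circle
    # Put the pieces in the order in which they were read.
    first, second = sorted((a, b), key=lambda s: s.read_offset)
    if first.strand == "-":
        # A reverse-strand read walks the reference backwards, so swap the
        # pieces to express the walk in forward reference orientation.
        first, second = second, first
    # Circular signature: after reaching the end of the circle, the read
    # jumps back upstream to its start, so the second piece begins earlier.
    if second.ref_start >= first.ref_start:
        return None  # collinear jump forward, e.g., a deletion
    return first.chrom, second.ref_start, max(first.ref_end, second.ref_end)


if __name__ == "__main__":
    # A read starting inside a circle at chr1:1,000-2,000 on the forward strand.
    tail = Segment("chr1", 1800, 2000, "+", read_offset=0)
    head = Segment("chr1", 1000, 1300, "+", read_offset=200)
    print(circle_from_split_read(tail, head))
```

The same rule covers reads longer than the circle, where the two pieces overlap on the reference, because only the start of the upstream piece and the furthest end are used.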

Figure 2. Chromosomal origins of eccDNA correlate with meiotic recombination rates.


(A) Identified eccDNA containing parts of LATS2. Chromosomal coordinates below the circle are from CReSIL and NanoCircle on the basis of supporting sequence reads, white; split reads, orange.

(B) A self-comparing DNA dot plot with the eccDNA sequence from the LATS2 on both the x and y axes. A nucleotide match at a given position in the sequence between the two axes is represented by a dot. The main diagonal shows the self-alignment and its full match across the sequence. Parallel lines to the main diagonal represent direct repeats, and orthogonal lines to the main diagonal represent inverted repeats.

(C) Schematic diagram of eccDNA of three separate α-satellites.

(D) Another self-comparing DNA dot plot with all the parallel lines capturing the repetitive nature of the α-satellite showing all the homology within the α-satellite eccDNA.

(E) Origins of eccDNAs in the human genome, shown for a random subset of 2,000 eccDNAs as black lines above the chromosomes (example, small chromosomal band at 1q41).

(F and G) Negative correlation between number of eccDNAs (n = 8,582) for all autosomes normalized for chromosome size against meiotic recombination rate estimated from centimorgan values between chiasmata normalized for chromosome size. (F) Data from Morton (1991); (G) data from Kong et al. (2002).

See also Figure S6.

We also sequenced 12 of the 29 human samples with paired-end short-read technology, confirming 752 eccDNAs and identifying 579 additional eccDNAs using Circle-Map (Prada-Luengo et al., 2019) (Table S2). This result suggested that a majority of eccDNAs discovered by long-read sequencing were also detected by short-read methods, but that factors such as subsampling and the lower sequence coverage of long-read sequencing limited detection. Size distributions estimated bioinformatically and microscopically matched well (Figures 1E and S1D). These size distributions were independent of the purification method, suggesting that the recorded sizes represented the natural distribution (Figures 1E and S1D). We next determined which eccDNAs were found in two or more samples, requiring a 90% reciprocal overlap between the circular coordinates. The most common circular DNA molecule detected across samples was mitochondrial DNA, which we identified 17 times in these comparisons. For non-mitochondrial circular DNA, we recorded only 13 eccDNAs that were found in two samples. We also examined how donor age influenced the number of eccDNAs and found a tendency for some younger men to have higher levels of eccDNA (Figure S4). However, high levels of eccDNA also corresponded to high sperm motility, and high levels of eccDNA are therefore not necessarily a function of donor age.
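The 90% reciprocal-overlap criterion can be made concrete with a short sketch. It assumes 0-based, half-open coordinates and a simple all-pairs comparison between samples; the study does not describe its matching procedure beyond the 90% threshold, and `reciprocal_overlap` and `shared_circles` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def reciprocal_overlap(a: Interval, b: Interval, min_fraction: float = 0.9) -> bool:
    """True if the overlap covers at least `min_fraction` of *both* intervals."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return (overlap / (a[2] - a[1]) >= min_fraction
            and overlap / (b[2] - b[1]) >= min_fraction)


def shared_circles(samples: Dict[str, List[Interval]],
                   min_fraction: float = 0.9) -> Set[Tuple[str, Interval]]:
    """Return (sample, circle) pairs whose circle is matched in at least one other sample."""
    shared: Set[Tuple[str, Interval]] = set()
    for (name_a, circles_a), (name_b, circles_b) in combinations(samples.items(), 2):
        for ca in circles_a:
            for cb in circles_b:
                if reciprocal_overlap(ca, cb, min_fraction):
                    shared.add((name_a, ca))
                    shared.add((name_b, cb))
    return shared
```

Requiring the threshold on both intervals matters: a 1 kb circle nested inside a 10 kb circle overlaps the smaller one completely but covers only 10% of the larger one, so the pair is not counted as the same eccDNA. For many thousands of circles, an interval tree or sorted sweep would replace the nested loops.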

EccDNA are created from the entire genome, including repeat regions and exons

We found that 2% of the identified eccDNAs contained entire genes and 4% contained promoter regions and one or more initial exons (Figure 2A). This suggests that, if expressed, eccDNA carrying genes could have phenotypic effects. Furthermore, eccDNA with exons could provide a means of DNA mobilization across the genome and a mechanism for exon shuffling (Patthy, 1999). We also identified eccDNA from repetitive regions, although the repetitive nature of elements such as α-satellites (Figure 2C) makes determining their coordinates challenging (Figure 2D). Most eccDNAs contained both repetitive and unique elements, and only 2% consisted solely of a repeat element. A majority of solo-repeat eccDNAs originated from long interspersed nuclear elements, with some from satellite repeats (Table S3; Figures 2C and 2D). α-Satellite eccDNA could, upon formation and reintegration into the genome, affect the composition of centromeres. α-Satellite repeats vary substantially among individuals in the centromere region (Miga, 2019), and human centromeres consist primarily of interchromosomal duplications (Lander et al., 2001), suggesting that eccDNAs could be involved in generating centromere variation.

We searched for evidence of recent eccDNA insertions in more than 4,000 sequenced human genomes and found seven cases that supported permanent alterations in the human germline from putative eccDNA insertions (Figure S5; Table S4). One circle, created from region 10q22.1 and containing the first intron of UNC5B, was inserted into a long terminal repeat (LTR) at region 17q12. Another circle was created from the mitochondrial gene MT-ND5 and inserted into an LTR at region 5p13.3. On the basis of these examples of eccDNA reinsertion into the human genome, eccDNA carrying exons and introns from across the genome might, depending on its content, provide a means for genes to acquire new exons, for gene duplications, and for changes in the order in which genes are organized in the genome.

Meiotic recombination rates correlate negatively with the creation of eccDNA

In spermatogenesis, primordial diploid germ cells develop into haploid sperm cells through meiosis (de Kretser et al., 1998), generating genetic variation through random segregation and homologous recombination. To understand how meiotic recombination may affect eccDNA in the germline, we compared male per-chromosome recombination rates (Morton, 1991; Kong et al., 2002) with the number of simple eccDNAs produced across all 29 human samples (Figure 2E). Recombination rates from the first dataset (Morton, 1991) correlated significantly and negatively with eccDNA numbers across the chromosomes (p = 2.29E-4, r = −0.7078; Figure 2F). Chromosomes 17, 19, and 22, which have high meiotic recombination rates, generated small numbers of DNA circles, whereas the similarly sized chromosomes 18 and 21, with low meiotic recombination, formed more circles (Figure 2F). The most extreme example was chromosome 4, which had more than 2-fold lower meiotic recombination than chromosome 19 and formed 11.4 times as many eccDNAs per Mb. A similar negative correlation between meiotic recombination and eccDNA formation was found using male per-chromosome recombination rates from three other studies (Kong et al., 2002, Figure 2G, p = 1.32E-3, r = −0.640; Wang et al., 2012, Figure S6A; and Halldorsson et al., 2019, Figure S6B). On the basis of these results, we conjecture that canonical meiotic homologous recombination suppresses the formation of eccDNA, as modeled in Figure S6C.
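The per-chromosome comparison reduces to normalizing counts by chromosome size and computing a Pearson correlation. The sketch below is a plain-Python illustration, not the study's analysis code; the helper names are hypothetical, and it assumes the reported p values come from the standard t test for a Pearson correlation with n − 2 degrees of freedom.

```python
import math
from typing import List, Sequence


def per_mb(counts: Sequence[float], sizes_bp: Sequence[int]) -> List[float]:
    """Normalize per-chromosome eccDNA counts by chromosome size (counts per Mb)."""
    if len(counts) != len(sizes_bp):
        raise ValueError("counts and sizes must have the same length")
    return [c / (s / 1e6) for c, s in zip(counts, sizes_bp)]


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

For the 22 autosomes and r = −0.7078, `t_statistic(-0.7078, 22)` gives t ≈ −4.48 on 20 degrees of freedom. The p value follows from the t distribution; `scipy.stats.pearsonr(x, y)` returns both r and its p value directly if SciPy is available.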

Creation of eccDNA is associated with the density of coding genes and Alu elements

We next examined whether genetic factors in cis could explain why autosomes such as chromosome 19 formed so few eccDNAs compared with similarly sized chromosomes (Figure 3A). Chromosome 19 is particularly enriched in coding genes and Alu elements compared with other chromosomes (Lander et al., 2001; Grimwood et al., 2004; Grover et al., 2004) (Figure 3A). We hypothesized that these cis factors influenced eccDNA formation. We found negative correlations between the number of coding genes per Mb and the frequency of eccDNA (Figure 3B; p = 2.13E-4, r = −0.71) and between the number of Alu elements per Mb and the frequency of eccDNA (Figure 3C; p = 1.38E-4, r = −0.72). Another study showed the opposite correlation in healthy, mitotically dividing tissues, where chromosomes rich in coding genes produced large numbers of eccDNAs (Møller et al., 2018). We now find the same to be true for Alu elements, which suggests that the factors driving meiotic and mitotic DNA circularization are fundamentally different. Other repetitive elements, including retrotransposons and transposons, as well as ribosomal RNA genes and simple repeats, showed no significant negative correlations with eccDNA (Figure S7). To further corroborate our model (Figure S7G), we examined how many germline eccDNAs contained a full Alu element. In accordance with our model, fewer eccDNAs overlapped an entire Alu element (1,603 for AluJ, 2,743 for AluS, and 955 for AluY) than in an in silico dataset (2,986 for AluJ, 4,944 for AluS, and 1,890 for AluY), supporting the conclusion that Alu elements do not drive circularization in male gametogenesis (Figure S8).
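The containment comparison can be made concrete by counting circles that fully contain at least one element and expressing the observed count relative to the in silico count. This is a sketch under stated assumptions, not the study's code: the helper names are hypothetical, coordinates are 0-based and half-open, and the in silico dataset is assumed to have the same number and sizes of circles as the observed set, so that raw counts are comparable.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def index_by_chrom(elements: Iterable[Interval]) -> Dict[str, List[Tuple[int, int]]]:
    """Group element intervals by chromosome, sorted by start."""
    index: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in elements:
        index[chrom].append((start, end))
    for chrom in index:
        index[chrom].sort()
    return index


def count_containing(circles: Iterable[Interval], elements: Iterable[Interval]) -> int:
    """Count circles that fully contain at least one element."""
    index = index_by_chrom(elements)
    count = 0
    for chrom, start, end in circles:
        elems = index.get(chrom, [])
        # First element starting at or after the circle start.
        i = bisect.bisect_left(elems, (start, -1))
        while i < len(elems) and elems[i][0] < end:
            if elems[i][1] <= end:  # element ends inside the circle: fully contained
                count += 1
                break
            i += 1
    return count


def depletion(observed: int, expected: int) -> float:
    """Fractional depletion of observed counts relative to an equally sized random set."""
    return 1.0 - observed / expected
```

With the counts reported above, `depletion` gives roughly 46% (AluJ), 45% (AluS), and 49% (AluY), in line with the "almost 50%" figure discussed later.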

Figure 3. Frequency of eccDNA per mega-base correlates negatively with coding genes and Alu elements.


(A) Schematic illustration of differences in density of coding genes, red; Alu, orange; and eccDNA (black lines).

(B) Significant negative correlation between the relative number of eccDNAs (total n = 8,883) identified per chromosome (Y chromosome and mitochondria not included) and the number of coding genes in sperm samples (black) from this study (p = 2.13E-4, r = −0.71), compared with a significant positive correlation (gray line; p = 2.34E-11, r = 0.95) in a dataset of somatic eccDNAs (n = 10,000) (Møller et al., 2018).

(C) Significant negative correlation between eccDNA and Alu elements in sperm samples from this study (p = 1.38E-4, r = −0.72) compared with positive correlation for somatic eccDNA (p = 4.35E-14, r = 0.97).

We compared the germline eccDNA with eccDNA detected in somatic tissue (Møller et al., 2018) and found that only 22 of 138,681 somatic circles overlapped the germline circles, supporting the idea that different factors influence eccDNA formation during meiosis in male gametes and during mitosis in human somatic cells. We also examined whether other factors correlated with eccDNA formation rates in germline cells. The G+C content was slightly lower in eccDNA (mean 37.76%) than in an in silico-created dataset representing a random sample of the genome (mean 40.83%), and eccDNA formation correlated positively with LTR elements (Figure S7).
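The G+C comparison against a random genomic background can be sketched in two parts: computing the G+C fraction of a sequence, and drawing a size-matched random control set. The text does not describe how the in silico dataset was built; this sketch assumes one random interval per eccDNA, placed on the same chromosome with the same length (a common choice), and does not exclude assembly gaps. Retrieving sequences from a reference FASTA is left out, and the function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def gc_fraction(seq: str) -> float:
    """G+C fraction among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return gc / acgt


def size_matched_random(circles: List[Interval], chrom_sizes: Dict[str, int],
                        seed: int = 0) -> List[Interval]:
    """Draw one random interval per circle on the same chromosome with the same length."""
    rng = random.Random(seed)  # seeded for a reproducible control set
    control: List[Interval] = []
    for chrom, start, end in circles:
        length = end - start
        limit = chrom_sizes[chrom] - length
        if limit < 0:
            raise ValueError(f"circle longer than {chrom}")
        new_start = rng.randint(0, limit)  # randint bounds are inclusive
        control.append((chrom, new_start, new_start + length))
    return control
```

Matching chromosome and length keeps the control from differing from the eccDNA set simply because of size or per-chromosome base composition.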

EccDNAs form from single fragments and multiple DNA fragments

Apart from identifying eccDNA with simple origins, long-read sequencing also allowed us to identify chimeric eccDNAs consisting of multiple DNA fragments from a single chromosome or several chromosomes (Figure 4). Using NanoCircle, we identified 3,090 chimeric eccDNAs, the most complex containing more than ten fragments (Table S5). We assembled 65 chimeric sequences using CReSIL and visualized them to identify the order and origin of the included fragments (Figure 4). To test whether chimeric eccDNAs were a purification artifact, we took advantage of microbial contamination observed in several samples: if chimeras formed artificially during purification, microbial DNA present in the same samples would be expected to appear in some of them. We found zero microbial DNA fragments in the chimeric eccDNAs, strongly suggesting that chimeric eccDNAs were not a procedural artifact but a genuine product of normal cellular processes. Across all 29 human samples, chimeric eccDNAs were less common than simple eccDNAs (mean proportion of chimeric eccDNA among all eccDNA, 22.23%; Figure S9).

Figure 4. Examples of chimeric circles.


(A–C) Left: plot of assembled sequence from CReSIL against chromosomal regions. Black diagonal lines, dot plots for chromosomal regions illustrating fragment order and orientation with respective identified coordinates below. Right: corresponding chimeric circle identified using NanoCircle. Chimeric eccDNA with (A) two, (B) three, and (C) four fragments.

DISCUSSION

We report multiple lines of evidence that simple and chimeric eccDNAs commonly occur in the male human germline. We show that these eccDNAs are large enough to carry entire genes or gene fragments. Genes on eccDNA are not expected to be expressed in the sperm cell but could potentially be transcribed and have at least transient phenotypic effects in the next generation, as reported for ring chromosomes (Stoll and Roth, 1983). We suspect that circles larger than the ones we identified exist, on the basis of indirect evidence from megabase deletions that would be expected to form eccDNAs of corresponding size (Turner et al., 2008). However, we were unable to detect these, as split reads for larger eccDNA did not exist and the read coverage was too low for assembly. A small part of the signal we recorded may derive from amplification of direct repeats on chromosomal fragments, yielding false positives. However, we confirmed linear DNA removal using SIM (Figure 1C), electron microscopy (Figure S2), confocal microscopy (Figure 1B), and PCR (not shown), and we therefore expect our results to reflect true eccDNA. We expect that our experimental pipeline underestimates the number of eccDNAs, especially large circles, because our bioinformatics method requires a read spanning the breakpoint, which is less likely to occur for larger circles. Furthermore, larger circles are more susceptible to exonuclease removal because they are more likely to carry a backbone nick that might expose them to degradation. Also, sperm chromatin is more tightly packed than other chromatin, and we cannot exclude that some circular DNAs were lost during DNA purification because of the tight packing of eccDNA chromatin. We expect germline eccDNA to have the largest phenotypic effects if it derives from gene deletions and/or if it reinserts into chromosomes (Durkin et al., 2012), which might affect variation within human populations. For example, reinsertions may be associated with exon shuffling (Patthy, 1999), gene duplications (Prada-Luengo et al., 2020), and possibly genetic disorders. We found several examples of circle reinsertions (Figure S5). Yet we suspect that insertion rates are underestimated because insertions are difficult to detect in short-read sequencing data.

We show that eccDNA does not form randomly but has chromosome-specific frequencies that correlate negatively with chromosome-specific canonical recombination rates and with the chromosomal density of coding genes and Alu elements. Alu elements are associated with cohesins (Hakimi et al., 2002), which have fundamental roles in meiotic recombination, mediating sister chromatid cohesion, chromosome pairing, and synaptonemal complex assembly (Brar et al., 2009). We propose a model in which a greater number of Alu elements facilitates correct alignment between homologous chromosomes, particularly in gene-rich regions, while suppressing illegitimate intrachromatid recombination that would lead to eccDNA (Figure S7). This model could answer long-standing questions in human genetics. First, it suggests that the abundance of Alu elements (Lander et al., 2001) has adaptive significance in maintaining genome integrity by reducing the rate of illegitimate recombination, leading to fewer eccDNAs from gene-rich regions. Second, the model explains how evolution led to the abundance of Alu elements in gene-rich regions (Lander et al., 2001): if Alu elements reduce illegitimate recombination events, they would be gradually lost from gene-poor regions, where deletions have low costs, and conserved in gene-rich regions. We found that the three types of Alu elements in the human genome (AluJ, AluS, and AluY) were almost 50% less abundant in the eccDNA than in a random dataset, supporting the idea that Alu elements might protect some regions of the human genome while making others more prone to change. We are presently unable to gain further insight into how DNA circularizes during human meiosis and how this process relates to Alu elements. To our knowledge, there is currently no experimental model of human male gametogenesis that lends itself to genetic modification and can be used to elucidate the role of homologous recombination (HR) and the primate-specific Alu elements (Deininger, 2011).

Factors other than Alu elements might also contribute to the uneven formation of eccDNA across human chromosomes. We found slightly lower GC content in the eccDNA (not shown) and a correlation between formation rate and LTR elements (Figure S7), suggesting a possible connection between these elements and eccDNA. Somatic chromatin is organized in topologically associating domains (TADs) that could potentially direct the circularization and formation of somatic eccDNA. However, human sperm does not contain TADs (Chen et al., 2019), so tertiary chromatin structures seem less likely to be important for eccDNA formation in meiotic cells. Our results agree with a hypothesized positive selection for Alu elements within gene-rich regions that implies a beneficial function (Lander et al., 2001), and we find that chromosomes with many Alu elements and genes have higher canonical recombination rates, increasing genetic distances. Our results suggest that the mechanism regulating eccDNA production also regulates recombination, partially explaining why genetic distances (centimorgans) differ from physical distances (megabases) across human chromosomes.

Formation of eccDNA in germline cells follows the opposite pattern to that in somatic tissue (Figure 3), where the chromosomes rich in genes and Alu elements form the most eccDNA. If this trend also holds for somatic tissue in the testis, somatic eccDNA mutations must somehow be excluded during spermatogenesis. Baker's yeast sequesters eccDNA from the rDNA locus during meiosis (King et al., 2019), so that meiotic cells are rejuvenated and free of rDNA eccDNA. The process depends on remodeling of the nuclear envelope, which drives the formation of a membranous compartment between the germline cells that contains the sequestered material. Many of the nuclear proteins in yeast have homologs in human cells, and it seems plausible that human germline cells have mechanisms for excluding somatic eccDNA similar to those in yeast.

In summary, our findings address what happens to DNA deleted in the germline and contribute to our understanding of how the human genome evolved to prevent deleterious meiotic mutations such as DNA circularization and deletions.

STAR★METHODS

RESOURCE AVAILABILITY

Lead contact

Requests for further information should be directed to and will be fulfilled by the lead contact, Birgitte Regenberg (bregenberg@bio.ku.dk).

Materials availability

No unique reagent was generated.

Data and code availability

  • All raw sequencing data are publicly available as of the date of publication at the SRA database (BioProject PRJNA655819).

  • The NanoCircle and CReSIL workflows are available at Zenodo - https://doi.org/10.5281/zenodo.5720805

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

EXPERIMENTAL MODEL AND SUBJECT DETAILS

Human samples and ethical permission

All participants were human males donating semen samples to the European Sperm Bank (Struenseegade 9A, 2; 2200 Copenhagen N – Denmark; europeanspermbank.com). Before donation, the donors consented to their semen being used for scientific research purposes. We received a total of 29 human samples. All of these semen samples were of the intracervical insemination (ICI) donor type, which contains all natural ejaculate fluids and sperm cells. For a subset of 17 of the 29 samples, we recorded the age of the donor at the time of donation (sample IDs HS20-HS23, HS25-HS27, HS29-HS35, and HS37-HS39). On receipt, the samples were transferred from the sperm bank tubes into new tubes with randomly assigned numbers. The original tubes and the identification key linking donor tubes to the newly assigned numbers were destroyed to fully anonymize the semen samples, so that neither the samples nor the results can be linked to specific individuals. We identified simple and chimeric eccDNA specific to each of these samples as described below. These 29 samples served for the main analysis of eccDNA and were denoted HS1-HS12, HS20-HS23, HS25-HS27, HS29-HS35, and HS37-HS39.

Additionally, 3 fully anonymized human samples with random IDs (HS68, HS72, and HS73) were collected from the European Sperm Bank. These samples were of the intrauterine insemination (IUI) donor type, which the sperm bank processes by washing and separating sperm cells from the seminal fluid prior to cryopreservation. These samples were not used for the main data analysis, only to identify chimeric eccDNA in purified sperm cells and thereby corroborate the evidence that chimeric eccDNA is present in the human germline. Before the analysis began, the “videnskablige Komiteer” (scientific committees) (Regionsgården; Kongens Vænge 2; 3400 Hillerød) decided that, because the samples are fully anonymized, the project could proceed without further permission from “De videnskabsetiske komiteer for Region Hovedstaden” (the scientific ethics committees for the Capital Region). This decision (journal number H-15008683) was made in accordance with §1, section 4 and §14, section 3 of the “komitélovens” (committee law).

METHOD DETAILS

Purification of male gamete eccDNA

eccDNA was purified by the Circle-Seq method (Møller et al., 2018), with the following modifications.

Circle-Seq, cell lysis

Samples of sperm cell liquid in 2 mL Eppendorf tubes were mixed with 0.61 mL L1 solution (Plasmid Mini AX; A&A Biotechnology), and 15 μl proteinase K (> 0.1 U/μl, Life Technologies) was added before incubation at 50°C for 16 hours at 500 rpm (Eppendorf Thermomixer). After this step, 40 μl was sampled to assess DNA concentration by quantitative PCR (qPCR).

Circle-Seq, purification of total DNA

Twenty-nine human samples were prepared using the eccDNA enrichment described previously (Møller et al., 2018), with column chromatography on an ion exchange membrane column (Plasmid Mini AX; A&A Biotechnology). Precipitated DNA was dissolved in 50 μl water, yielding 7–21 ng (HS1-HS12), 15–54 ng (HS20-HS23, HS25-HS27, HS29), and 23–54 ng (HS30-HS35, HS37-HS39) total DNA as measured by the Qubit dsDNA High Sensitivity assay. DNA from the ICI and IUI samples HS50, HS54, HS68, HS72, and HS73 was also purified with the Plasmid Mini AX kit (A&A Biotechnology). For HS50, HS54, and HS68, total DNA was additionally purified with the MagAttract HMW DNA Kit (QIAGEN, 67563) according to the manufacturer's instructions, to investigate whether the DNA extraction method affected the purification of eccDNA.

Circle-Seq, removal of linear DNA

To remove only linear chromosomal DNA and enrich for eccDNA, 44 μl of each sample (150 μl total) was treated with exonuclease V (RecBCD; 30 U, from a 10,000 U/ml stock) in enzymatic reactions similar to those described in Møller et al. (2018). After 144 hours of chromosomal DNA digestion, removal of linear DNA was confirmed by PCR with primers specific for a gene expected to be absent from eccDNA (COX5b): forward primer 5′ GGGCACCATTTTCCTTGATCAT 3′ and reverse primer 5′ AGTCGCCTGCTCTTCATCAG 3′, with an expected PCR product of approximately 100 bp. If linear DNA did not appear to have been fully removed, we continued exonuclease treatment for up to 186 hours, confirming complete removal of linear DNA by PCR against COX5b.

Circle-Seq, rolling-circle amplification of eccDNA

Purified DNA (15 μl of a total of 50 μl) was used for φ29 DNA amplification (SYGNIS TruePrime RCA Kit) for 48 hours at 30°C. Amplification was terminated by heat inactivation at 65°C for 10 min.

Long-read sequencing with Oxford Nanopore and data acquisition

The φ29-amplified DNA samples were sequenced by Oxford Nanopore long-read sequencing using the Rapid Barcoding Kit SQK-RBK004, following the manufacturer's protocol. Five to six samples of purified eccDNA were each tagged with a different barcode by the transposase reaction of the SQK-RBK004 kit; ~100 ng of each library was then pooled, and the pooled libraries were loaded onto R9.5 flow cells for MinION Mk1B sequencers. MinKNOW software v18.12.6 was used for DNA sequencing and data acquisition to generate .fast5 files, which were base called using Guppy v2.3.1 (Wick et al., 2019); Guppy is available only to Oxford Nanopore Technologies customers via their community at https://nanoporetech.com/community. Each sequencing run on a flow cell yielded ~1 Gb of data on average. We demultiplexed and trimmed adapters using guppy_barcoder with the parameters '-r --compress_fastq -q 0 --barcode_kits SQK-RBK004 --trim_barcodes' and porechop with the parameter 'b', producing fastq files for further computational analysis. All raw sequencing data are publicly available at the SRA database (BioProject PRJNA655819).

Identification of eccDNA from long reads with NanoCircle

Long reads were aligned to the human reference genome (Feb. 2009, GRCh37/hg19) using minimap2 version 2.15 (Li, 2018) with the preset ‘map-ont’, which is intended for Oxford Nanopore reads. Using bedtools version 2.29.2 (Quinlan and Hall, 2010), we selected genomic regions with continuous read coverage as circle candidate regions: the subcommand ‘genomecov’ computed read coverage across the entire genome, and ‘merge’ combined overlapping or closely spaced read alignments into regions. For each candidate region, reads spanning the circular breakpoint (soft-clipped reads) were extracted, and their chromosomal coordinates were identified from the read orientation and the placement of the primary and supplementary alignments. These soft-clipped reads arise in two scenarios, depending on whether the long read is shorter or longer than the candidate region. For reads shorter than the candidate region, the soft-clipped part of the read aligns to the reference genome as a supplementary alignment either upstream or downstream of the primary alignment. Reads longer than the candidate region span the circle more than once, so the primary and supplementary alignments overlap.
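The candidate-region step can be illustrated with a short Python sketch. It is not NanoCircle's code: it assumes read alignments are already available as (chromosome, start, end) tuples in 0-based half-open coordinates, and the hypothetical `max_gap` parameter stands in for the distance used by `bedtools merge -d`, which is not stated above.

```python
from collections import defaultdict


def merge_read_intervals(alignments, max_gap=0):
    """Merge read alignments into candidate circle regions.

    alignments: iterable of (chrom, start, end), 0-based half-open.
    Intervals on the same chromosome that overlap, touch, or lie within
    max_gap bp of each other are merged, similar in spirit to
    `bedtools merge -d max_gap` on coordinate-sorted input.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in alignments:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + max_gap:
                # Overlapping or nearby read: extend the current region.
                cur_end = max(cur_end, end)
            else:
                # Gap too large: close the current region and open a new one.
                regions.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        regions.append((chrom, cur_start, cur_end))
    return regions


reads = [("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 310, 400), ("chr2", 5, 10)]
print(merge_read_intervals(reads))              # [('chr1', 100, 300), ('chr1', 310, 400), ('chr2', 5, 10)]
print(merge_read_intervals(reads, max_gap=20))  # [('chr1', 100, 400), ('chr2', 5, 10)]
```

A larger `max_gap` tolerates small coverage dropouts inside a circle at the cost of fusing neighbouring circles.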

In either scenario, the start and end coordinates of the primary alignment can be read directly from the alignment; for the supplementary alignment, the start coordinate was likewise identified, and the end coordinate was calculated by adding the length of the supplementary alignment to this start coordinate. Based on the coordinates obtained from all reads within a candidate region, circle coordinates were classified as 1) high confidence, 2) confident, 3) low confidence, or 4) chimeric. High-confidence coordinates were supported by multiple reads with the same coordinate set, giving the circle a breakpoint at nucleotide resolution. Confident coordinates were supported by multiple split reads with different coordinates within the same region; these could vary by a few base pairs owing to sequencing errors, or by several hundred base pairs owing to alignment artifacts in repeat elements. Low-confidence coordinates were supported by only a single split read, owing to low sequencing coverage in the breakpoint region.
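A minimal sketch of the coordinate arithmetic and the first three confidence classes, assuming each split read has already been reduced to its primary start/end and its supplementary start and aligned length. The function names are hypothetical, and the text does not say how a single coordinate is reported for the "confident" class; the median is used here purely as an assumption.

```python
from collections import Counter
from statistics import median_low


def circle_coordinates(prim_start, prim_end, supp_start, supp_len):
    """Circle span implied by one split read.

    The supplementary end is computed as start + aligned length; the circle
    is taken as the outermost span covered by both alignments.
    """
    supp_end = supp_start + supp_len
    return (min(prim_start, supp_start), max(prim_end, supp_end))


def classify_region(coords):
    """Assign a confidence class to the split-read coordinates of one region.

    coords: list of (start, end) tuples from split reads in one region.
    Returns (label, (start, end)).
    """
    if len(coords) == 1:
        return "low confidence", coords[0]
    best, support = Counter(coords).most_common(1)[0]
    if support >= 2:
        # Several reads agree on exactly the same breakpoints.
        return "high confidence", best
    # Several reads, but breakpoints disagree (sequencing errors, repeats).
    # Assumption: summarize with the median start and median end.
    starts = [s for s, _ in coords]
    ends = [e for _, e in coords]
    return "confident", (median_low(starts), median_low(ends))


print(circle_coordinates(1000, 1500, 800, 200))                     # (800, 1500)
print(classify_region([(800, 1500), (800, 1500), (805, 1498)]))    # ('high confidence', (800, 1500))
```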

For chimeric coordinates, the primary and supplementary alignments were located on different chromosomes, indicating that the eccDNA was chimeric. The primary alignment always aligned to a single chromosome, whereas the supplementary alignment could align to multiple chromosomal regions, depending on the read and region length. Simple and chimeric eccDNAs were identified with NanoCircle version 1.0.0 using the subcommand ‘Circles’. The output consists of two tab-separated .bed files: one containing the chromosomal coordinates of single-source eccDNAs and the other containing the coordinates of each chromosomal fragment in the chimeric eccDNAs. Multiple lines in the chimeric eccDNA output can represent different configurations of the same chimeric eccDNA; these configurations were combined into a single chimeric eccDNA using the NanoCircle subcommand ‘Merge’. To visualize the eccDNAs, we used Gepard (Krumsiek et al., 2007) to create dot plots (Figures 2C and 2E), and Ribbon (Nattestad et al., 2016) and Circos (Krzywinski et al., 2009) to visualize the order and orientation of the assembled sequences of the chimeric eccDNAs.
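The chimeric test and the merging of configurations are described only at a high level, so the following Python sketch is an assumption about one way they could work, not NanoCircle's implementation. A circle read from a different starting fragment is a rotation of the same fragment list, and reading it in the opposite orientation reverses the list and flips each strand; reducing every configuration to the smallest such form lets equivalent configurations collapse. The (chrom, start, end, strand) tuples and function names are hypothetical.

```python
def is_chimeric(primary_chrom, supplementary_chroms):
    """A read is chimeric if any supplementary alignment lies on another chromosome."""
    return any(chrom != primary_chrom for chrom in supplementary_chroms)


_FLIP = {"+": "-", "-": "+"}


def canonical_configuration(fragments):
    """Canonical form of a circular fragment order.

    fragments: list of (chrom, start, end, strand). All rotations describe
    the same circle, as does the reversed list with flipped strands. The
    smallest candidate tuple is returned so equivalent orders compare equal.
    """
    reverse = [(c, s, e, _FLIP[st]) for c, s, e, st in reversed(fragments)]
    candidates = []
    for order in (list(fragments), reverse):
        for i in range(len(order)):
            candidates.append(tuple(order[i:] + order[:i]))
    return min(candidates)


def merge_configurations(configurations):
    """Collapse configurations that describe the same chimeric circle."""
    return {canonical_configuration(cfg) for cfg in configurations}


a = [("chr1", 0, 100, "+"), ("chr5", 200, 300, "+")]
b = [("chr5", 200, 300, "+"), ("chr1", 0, 100, "+")]   # same circle, other start
c = [("chr5", 200, 300, "-"), ("chr1", 0, 100, "-")]   # same circle, other strand
print(len(merge_configurations([a, b, c])))  # 1
```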

Identification of eccDNA from short sequence reads with Circle-Map

We resequenced 12 samples (HS1-HS12) on an Illumina HiSeq 2500 platform (100-bp paired-end reads, Rapid run mode). We trimmed the data with Trimmomatic v0.39 (Bolger et al., 2014) in paired-end mode to remove adapters and low-quality or N bases, and discarded reads shorter than the default minimum length of 36 nucleotides. We aligned the reads to the hg19 version of the human genome using BWA-MEM v0.7.17-r1188 (Li, 2013) with the -q option activated. All downstream BAM/SAM file processing was performed with Samtools v1.9 (Li et al., 2009). We used Circle-Map v1.1.2 (Prada-Luengo et al., 2019) to detect eccDNA in these 12 samples, first extracting all soft-clipped, hard-clipped, and discordant reads to a new BAM file with the ‘ReadExtractor’ module and then running the ‘realignment’ module with all parameters at their default settings. To obtain a robust set of circles from the short-read dataset, we removed all circles that were not supported by at least 2 split reads and 80% coverage.
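The final filtering step can be sketched as a simple predicate over circle calls. The `CircleCall` record and its field names are hypothetical; which Circle-Map output column corresponds to the "80% coverage" criterion is an assumption, represented here as a covered fraction between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class CircleCall:
    """Hypothetical record for one Circle-Map call; field names are illustrative."""
    chrom: str
    start: int
    end: int
    split_reads: int
    covered_fraction: float  # fraction of the circle covered by reads (0 to 1)


def robust_circles(calls, min_split_reads=2, min_covered_fraction=0.8):
    """Keep calls with at least 2 split reads and at least 80% coverage."""
    return [
        c for c in calls
        if c.split_reads >= min_split_reads and c.covered_fraction >= min_covered_fraction
    ]


calls = [
    CircleCall("chr1", 0, 500, 3, 0.95),
    CircleCall("chr1", 0, 500, 1, 0.95),  # too few split reads
    CircleCall("chr2", 0, 500, 5, 0.50),  # insufficient coverage
    CircleCall("chr3", 0, 500, 2, 0.80),  # exactly at both thresholds
]
print([c.chrom for c in robust_circles(calls)])  # ['chr1', 'chr3']
```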

De novo assembly of eccDNA from long reads with CReSIL

High-quality long reads were aligned to the human reference genome (Feb. 2009, GRCh37/hg19) using minimap2 version 2.17 (Li, 2018) through the mappy Python library to identify the origin of individual eccDNA molecules from their primary, secondary, and supplementary alignment coordinates. We filtered out replicate sequences produced by rolling-circle amplification by discarding secondary alignment regions and keeping only the primary and supplementary alignment regions of each read for further analysis. We then identified representative regions on each read (similar to the breakpoint analysis with NanoCircle), based on the continuity of the primary and supplementary alignments along the read coordinates. Alignment coordinates on the reference genome were merged, and the sequencing depth of each combined region was calculated using the Python library pybedtools version 0.8.0 (Dale et al., 2011). We refined the merged regions by keeping only category 1 (high-confidence) positions supported by a sequencing depth of at least five. We extracted the sequence of individual reads corresponding to each refined region to generate a representative sequence with pysam version 0.15.3 (Li et al., 2009). Representative sequences and their reference genome alignment locations were kept for breakpoint identification. We also constructed an adjacency matrix representing the inter- and intraconnections of merged regions based on the alignment information. In the resulting graph model, nodes represent merged regions, and two nodes are connected when a representative sequence spans both. Each subgraph represented a potential eccDNA molecule, and its representative sequences were combined. Finally, we performed de novo assembly on the combined reads of each subgraph using Flye version 2.5 (Kolmogorov et al., 2019) to reconstruct the eccDNA sequences at their genomic origin.
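The grouping of merged regions into subgraphs amounts to finding connected components. A minimal union-find sketch, assuming each read has already been reduced to the ordered list of merged-region IDs it spans; the input layout and function name are hypothetical and not taken from CReSIL.

```python
from collections import defaultdict


def eccdna_subgraphs(read_regions):
    """Group merged regions into connected components (potential eccDNAs).

    read_regions: dict mapping a read ID to the ordered list of merged-region
    IDs that its representative sequence spans. Regions spanned by the same
    read are connected; each connected component is one candidate eccDNA.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for regions in read_regions.values():
        for region in regions:
            find(region)  # register regions seen in single-region reads too
        for a, b in zip(regions, regions[1:]):
            union(a, b)

    components = defaultdict(set)
    for region in list(parent):
        components[find(region)].add(region)
    return sorted(sorted(c) for c in components.values())


reads = {"r1": ["A", "B"], "r2": ["B", "C"], "r3": ["D"], "r4": ["E", "F"]}
print(eccdna_subgraphs(reads))  # [['A', 'B', 'C'], ['D'], ['E', 'F']]
```

Each component's reads would then be passed together to the assembler, one subgraph at a time.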

Circular insertion identification

We used a previously established database of known human variation, including SNPs, indels, and structural variants, generated for the human genome assembly hg38 (GRCh38) in a previous study (Sibbesen et al., 2018). The database includes variants from 150 Danish individuals (50 trios) (Maretty et al., 2017), 2504 individuals from 26 human populations (Sudmant et al., 2015), 769 individuals from 250 Dutch families (Hehir-Kwa et al., 2016), 569 individuals from the Genotype-Tissue Expression Project (Chiang et al., 2017), and dbSNP (Sherry et al., 2001), which also contains indels. We extracted the alternative sequences of variants longer than 150 bp that were categorized as insertions and aligned them back to the human genome using minimap2 (Li, 2018). Because these sequences varied in length, we processed them like long reads, using the same alignment preset as described above. Split-read alignments with a mapping quality below 20 were excluded from downstream analysis. To identify circular DNA-mediated insertions, we examined the alignments for split reads with non-colinear mappings and identified their chromosomal origin from the read direction and position, as described for NanoCircle.
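One way to express the non-colinear test is sketched below, under stated assumptions: the insertion sequence has exactly two alignments (after the mapping-quality filter of 20), and the `Segment` record and function name are hypothetical. Read along the query, a linear sequence moves forward on the reference (on the + strand), whereas a sequence that crosses a circle junction jumps back upstream; on the − strand the expected direction is reversed. Tandem duplications produce the same pattern, so this sketch does not distinguish them from circle-mediated insertions.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One alignment of an insertion sequence (hypothetical record)."""
    query_start: int
    chrom: str
    ref_start: int
    ref_end: int
    strand: str  # "+" or "-"
    mapq: int


def has_circular_junction(segments, min_mapq=20):
    """True if two split alignments are non-colinear in a back-splice pattern."""
    kept = [s for s in segments if s.mapq >= min_mapq]
    if len(kept) != 2:
        return False
    first, second = sorted(kept, key=lambda s: s.query_start)
    if first.chrom != second.chrom or first.strand != second.strand:
        return False
    if first.strand == "+":
        # Later part of the query maps upstream: the sequence wrapped around.
        return second.ref_start < first.ref_start
    # Minus strand: query order runs opposite to reference order.
    return second.ref_start > first.ref_start


circle = [Segment(0, "chr1", 5000, 5400, "+", 60), Segment(400, "chr1", 4800, 5000, "+", 60)]
linear = [Segment(0, "chr1", 1000, 1200, "+", 60), Segment(200, "chr1", 3000, 3300, "+", 60)]
print(has_circular_junction(circle), has_circular_junction(linear))  # True False
```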

Identification of bacterial DNA contamination

Three human samples (HS6, HS7, HS9) were identified as contaminated on the basis of their extremely low alignment rate to the human reference genome. We obtained taxonomic information on the contaminants from raw sequence reads and unmapped reads using two methods. First, we extracted 10,000 random sequence reads from these samples and performed a BLAST search against the NCBI Nucleotide collection (nt) database, which contained 59,745,330 sequences of mixed origin, using blastall version 2.2.26 (Altschul et al., 1990) with the parameters ‘-p blastn -d nt’. Next, we ran Centrifuge version 1.0.3 (Kim et al., 2016) with the NCBI nonredundant reference database provided with the software on the unaligned reads to identify sources of contaminating DNA. The identified contamination sources were added to the reference genome used to align these three samples.

Plasmids and reference genomes

The plasmids used for microscopic visualization were p4339 w. TA::MX4-natR (5315 bp), obtained from Charles Boone, University of Toronto, and YGPM25009 (17,836 bp), obtained from Open Biosystems. The main reference genome for alignment was the Feb. 2009 assembly of the human genome (hg19, GRCh37). Additional reference genomes of the contamination sources were used to align reads from the contaminated samples. The identified contamination sources were Meyerozyma guilliermondii strain CBS 2030 mitochondrion (NC_022154.1), Pythium ultimum (DAOM BR144), Pythium ultimum mitochondrion (NC_014280.1), Staphylococcus warneri (NZ_LR134269.1), Staphylococcus equorum strain:KS1039 (NZ_CP013114.1), Bacillus cereus (NC_004722.1), and Bacillus cellulosilyticus (NC_014829.1).

EccDNA annotation

Coordinates of eccDNAs obtained from NanoCircle were intersected, using bedtools intersect version 2.29.2, with two genomic files containing the coordinates of genes and repetitive elements. The first file contained gene coordinates for the Feb. 2009 assembly of the human genome (hg19, GRCh37), obtained from the Ensembl Biomart Release 99 database (Yates et al., 2020) and filtered with the Chromosome/Scaffold option to include the autosomes, sex chromosomes, and mitochondrial genome, with the attributes Gene start (bp), Gene end (bp), and Gene name. The second file contained the coordinates of all interspersed repeats and low-complexity DNA sequences, downloaded from the RepeatMasker annotation track of the Genome Browser (Smit et al., 2015).

EccDNA correlations

For all correlations presented, we used the number of circles identified across 29 samples (HS1-HS12, HS20-HS23, HS25-HS27, HS29-HS35, and HS37-HS39). The number of coding genes and the chromosome lengths in megabases were obtained from the chromosome statistics of the Ensembl GRCh37 Release 99 database (Yates et al., 2020). The meiotic recombination rates used in Figures 2F and 2G were estimated by dividing the centimorgan values of the male chiasmata maps (Morton, 1991 and Kong et al., 2002, respectively) for each chromosome by the chromosome length obtained above (cM/Mb). A third set of meiotic recombination rates was calculated from Table S3 of Wang et al. (2012) and normalized in the same way (Figure S6A), and a fourth set was calculated differently, by averaging the already normalized centimorgan values (cM/Mb) for each chromosome in the dataset aau1043_datas1.gz (Halldorsson et al., 2019) (Figure S6B). The number of each specific repeat element used in Figure S7 was counted for each chromosome using RepeatMasker (Smit et al., 2015) and normalized to chromosome length. The correlations were computed for three datasets. The main dataset contained a total of 8926 simple eccDNAs identified with NanoCircle and Circle-Map (Prada-Luengo et al., 2019) from the 29 human semen samples. A control dataset consisted of 10,000 eccDNAs randomly extracted from those identified in a previous study of somatic muscle tissue (Møller et al., 2018). The third dataset was created in silico to simulate randomly formed eccDNA across the genome, without any potential influence from the cis factors that we propose negatively affect eccDNA formation in male gametogenesis.
This dataset consisted of 10,000 chromosomal regions which were extracted from across the genome with each chromosome having the same likelihood of being sampled, with lengths ranging from 200 nt to 11,000 nt to better emulate the size range of our identified eccDNA.
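The in silico dataset can be sketched as follows. Choosing each chromosome with equal probability and restricting lengths to 200 to 11,000 nt follow the text; drawing lengths uniformly within that range and placing starts uniformly along the chromosome are assumptions, as are the function name and seed.

```python
import random


def simulate_random_eccdna(chrom_lengths, n=10_000, min_len=200, max_len=11_000, seed=0):
    """Draw n random intervals as a null eccDNA dataset.

    chrom_lengths: dict of chromosome name -> length in bp.
    Each chromosome is equally likely to be chosen, the interval length is
    drawn uniformly from [min_len, max_len] (assumption), and the start is
    uniform along the chosen chromosome.
    """
    rng = random.Random(seed)
    chroms = sorted(chrom_lengths)
    intervals = []
    for _ in range(n):
        chrom = rng.choice(chroms)
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, chrom_lengths[chrom] - length)
        intervals.append((chrom, start, start + length))
    return intervals


# hg19 lengths of two chromosomes, used only to illustrate the call.
lengths = {"chr1": 249_250_621, "chr21": 48_129_895}
example = simulate_random_eccdna(lengths, n=1000)
print(example[:3])
```

Because sampling is per chromosome rather than per base, small chromosomes receive as many simulated circles as large ones, which matches the equal-likelihood design described above.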

Donor age and eccDNA content

For a subset of 17 of the 29 samples (HS20-HS23, HS25-HS27, HS29-HS35, and HS37-HS39), we recorded the age of the donor at the time of donation. We separated these 17 individuals into two groups by age, those below and those above 30 years, and correlated each individual’s age with the corresponding number of identified circles to determine whether donor age at donation could influence the number of circles formed in spermatozoa.

Overrepresentation of genomic content within our eccDNA

The coordinates of all 8926 eccDNAs identified with NanoCircle and Circle-Map were intersected, using bedtools intersect version 2.29.2, with several genomic files containing the coordinates of CpG islands and Alu elements; the Alu elements were further separated into three files containing the coordinates of the AluJ, AluS, and AluY subfamilies. These coordinates were extracted from the RepeatMasker (Smit et al., 2015) file. Each intersection was performed twice, requiring a minimum overlap, as a fraction of each genomic element (-F), of either 0.3 or 1.0.
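The effect of the `-F` threshold can be illustrated with a naive Python sketch that checks every eccDNA against every element; bedtools itself uses a much faster sweep over sorted input, and the function names here are hypothetical. An element counts as hit only if its overlap with an eccDNA covers at least the given fraction of the element's own length, so `-F 1.0` requires the element to lie entirely within the eccDNA.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of overlap between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def elements_hit(eccdnas, elements, min_fraction_of_element):
    """Return (eccDNA, element) pairs whose overlap covers at least the given
    fraction of the genomic element, as requested with -F."""
    hits = []
    for ecc in eccdnas:
        for elem in elements:
            if ecc[0] != elem[0]:
                continue
            ov = overlap(ecc[1], ecc[2], elem[1], elem[2])
            if ov > 0 and ov >= min_fraction_of_element * (elem[2] - elem[1]):
                hits.append((ecc, elem))
    return hits


eccdnas = [("chr1", 1000, 2000)]
elements = [("chr1", 1900, 2200), ("chr1", 1200, 1500), ("chr2", 1000, 2000)]
print(len(elements_hit(eccdnas, elements, 0.3)))  # 2: partial element passes at -F 0.3
print(len(elements_hit(eccdnas, elements, 1.0)))  # 1: only the fully contained element
```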

To determine whether the eccDNAs identified in the sperm samples were formed uniquely during gametogenesis, we intersected them with a previously published dataset of 138,681 eccDNAs detected in human somatic tissue (Møller et al., 2018), using bedtools intersect version 2.29.2 with a reciprocal overlap of 80% between the eccDNAs from semen and somatic tissue (-r -f 0.8).
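The reciprocal criterion (`-r -f 0.8`) requires the shared stretch to cover at least 80% of both intervals, so a short semen eccDNA nested inside a much larger somatic circle does not count as a match. A naive sketch with hypothetical function names:

```python
def reciprocal_overlap(a, b, fraction=0.8):
    """True if intervals a and b, each (chrom, start, end), overlap by at least
    `fraction` of both lengths, in the sense of bedtools intersect -r -f."""
    if a[0] != b[0]:
        return False
    ov = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return ov > 0 and ov >= fraction * (a[2] - a[1]) and ov >= fraction * (b[2] - b[1])


def sperm_specific(sperm, somatic, fraction=0.8):
    """eccDNAs from sperm without a reciprocal-overlap match in the somatic set."""
    return [s for s in sperm if not any(reciprocal_overlap(s, t, fraction) for t in somatic)]


print(reciprocal_overlap(("chr1", 100, 1100), ("chr1", 150, 1150)))  # True: 95% of each
print(reciprocal_overlap(("chr1", 100, 1100), ("chr1", 100, 3000)))  # False: ~34% of the larger
```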

G+C content in eccDNA

We calculated the G+C content from the reference sequence of all detected eccDNA intervals and of the in silico dataset representing a random sample of the genome (see EccDNA correlations). Assuming that G+C content followed a Gaussian distribution, we calculated the mean G+C content.
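Under the Gaussian assumption, the maximum-likelihood estimate of the mean is simply the sample mean of the per-interval G+C fractions. A minimal sketch, assuming the reference sequence of each interval has already been extracted; ignoring N and other non-ACGT characters is an assumption not stated in the text.

```python
from statistics import fmean


def gc_fraction(seq):
    """G+C fraction of a sequence, ignoring N and other non-ACGT characters."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt


def mean_gc(sequences):
    """Mean G+C fraction across intervals (the mean of the fitted Gaussian)."""
    return fmean(gc_fraction(s) for s in sequences)


print(mean_gc(["GGCCAATT", "gcNN"]))  # 0.75
```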

Microscopy sample preparation

Linear DNA was removed from three human sperm samples (HS1, HS3, HS12) using exonuclease V, and the samples were mixed with 10 mM Tris buffer for microscopy of supercoiled eccDNA. To relax the supercoiled eccDNA, it was nicked at a single site with Nb.BsmI: exonuclease-treated DNA was incubated with 10 U/ml Nb.BsmI in NEBuffer 3.1 and ultrapure water at 37°C for 30 min before purification on AMPure XP beads (Agencourt). The circular nature of the eccDNA was verified by complete DNA digestion with endo- and exonuclease: exonuclease-treated DNA was incubated with the endonuclease AluI in AluI buffer (with bovine serum albumin) and ultrapure water at 37°C for 30 min. After the endonuclease step, NEBuffer 4 (10X), 10 mM ATP, and exonuclease V (RecBCD) were added, and the mixture was incubated at 37°C for 6 hours. Exonucleases were thermally inactivated at 70°C for 30 min. DNA was purified as above and collected in 10 mM Tris buffer. The negative control was 10 mM Tris buffer. The plasmid controls were p4339 w. TA::MX4-natR and YGPM25009, as described above.

Confocal microscopy

Samples of 12 ng/μL (168 ng) DNA in 10 mM Tris buffer pH 8 were mixed with YOYO-1 (Thermo Fisher Scientific) at a 1:10 dye-to-base pair ratio, and 2 μL (1.5 ng) of DNA was placed on a StarFrost glass slide under a 1.5H cover glass. In addition, 14 μl (10.5 ng) of DNA was placed into wells of μ-Slide VI 1.5H glass-bottom or ibiTreat polymer-bottom slides (ibidi) and allowed to sediment for 1–2 hours before imaging. Before loading, uncoated μ-Slide VI 1.5 glass-bottom wells were coated with 60 μg/ml poly-L-lysine (P4832, Sigma-Aldrich) and washed with ultrapure water. Images were acquired on a Zeiss LSM780 confocal microscope with a Plan-Apochromat 63x/1.40 Oil M27 objective, a 25 mW argon laser at 0.5%, and a 32-channel gallium arsenide phosphide (GaAsP) detector array set at 485–547 nm, with a pinhole size of 1.0–1.2 Airy units.

SIM microscopy

Samples of 12 ng/μL DNA in 10 mM Tris buffer pH 8 were mixed either with YOYO-1 at a 1:10 dye-to-base pair ratio and 0.5% v/v 2-mercaptoethanol, or with SiR-Hoechst (Spirochrome) at a 1:2 dye-to-base pair ratio without 2-mercaptoethanol. DNA (2 μl, or 1.5 ng) was placed on a StarFrost glass slide under a 1.5H cover glass. Images were obtained on a Zeiss ELYRA PS.1 super-resolution microscope with a 488 nm HR 200 mW diode laser at 10%–15%, a band-pass 495–575 nm/long-pass 750 nm filter cube, and a Plan-Apochromat 63x/1.40 Oil M27 objective. Three rotations and a Z stack of 11–15 images were used for structured illumination. The operating resolution was confirmed with green fluorescent beads (Figures S1E and S1F).

Electron microscopy

Samples (20 μL) were mixed with 1 μL of 5% 2,4,6-tris(dimethylaminomethyl)phenol (DMP-30) and incubated for 10 min on a piece of Parafilm. A glow-discharged 400-mesh carbon-coated grid (Quantifoil Micro Tools GmbH, Germany) was placed on top of the droplet and incubated for 2 min. The grids were stained with 4 μL of 2% w/v uranyl acetate and then blotted with filter paper. Negative-stain EM micrographs were taken on a Philips CM100 transmission electron microscope (Philips, UK) operated at 100 kV.

Image processing

Confocal images were processed in FIJI (Schindelin et al., 2012), using the ‘Analyze Particles’ tool to count the number of DNA particles in each image. The individual images of each Z stack were combined with the Z-projection function (average intensity), and the MaxEntropy auto-threshold algorithm established the signal intensity threshold for each image. The ‘Analyze Particles’ tool then counted the DNA particles with intensities above this threshold, using a particle size setting of > 0.01 μm² for each image.

SIM images were reconstructed in Zen 3.1 (Zeiss). For each circular DNA, the most intense image from the Z stack or a weighted average Z-projection was used, depending on the background and peak intensity of the fluorescent signal, and the contrast was enhanced accordingly (Figures S1G and S1H). The length of each circular DNA was measured by tracing its circumference in Zen, and the trace was independently verified with another microscope. DNA size was calculated from the measured length using an average of 2941 bp per micron.
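The size conversion is a single multiplication by the stated factor of 2941 bp per micron. As a worked check, the sketch below also runs the conversion in reverse for the two control plasmids described earlier (5315 bp and 17,836 bp), giving the circumferences one would expect to trace; the function names are hypothetical.

```python
BP_PER_MICRON = 2941  # average conversion factor stated above


def contour_length_to_bp(length_um):
    """Estimate circle size in bp from a traced circumference in micrometres."""
    return length_um * BP_PER_MICRON


def bp_to_contour_length(size_bp):
    """Expected circumference in micrometres for a circle of size_bp."""
    return size_bp / BP_PER_MICRON


print(round(bp_to_contour_length(5315), 2))   # p4339 w. TA::MX4-natR -> 1.81
print(round(bp_to_contour_length(17836), 2))  # YGPM25009 -> 6.06
print(contour_length_to_bp(2.0))              # a 2 μm trace -> 5882.0 bp
```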

QUANTIFICATION AND STATISTICAL ANALYSIS

Statistical test

The DNA content measured by confocal microscopy with fluorescent staining (Figure 1B) was determined as the mean of the DNA measurements from three human samples (HS1, HS3, HS12). To determine whether the means of the DNA groups differed significantly, we used t tests. First, we compared the exoV-treated samples (Figure 1B, 2nd bar) with the control without DNA (Figure 1B, 1st bar) and with the sample in which circular DNA was linearized with AluI and digested with exoV and was therefore expected to be DNA free (Figure 1B, 3rd bar). In both cases, we used a one-tailed t test to test whether the mean DNA content of the 2nd bar was significantly greater than that of the 1st and 3rd bars. When comparing the exoV-treated samples (Figure 1B, 2nd bar) with the sample in which circular DNA molecules were relaxed with the single-strand nicking enzyme Nb.BsmI (Figure 1B, 4th bar), we used a two-tailed t test to test whether the 2nd bar was either significantly greater or significantly less than the 4th bar. To measure the linear relationship between the number of eccDNAs and the recombination rate (Figure 2F), the number of coding genes (Figure 3B), and the number of Alu elements (Figure 3C), we used the Pearson correlation. An r value of +1 indicates a total positive linear correlation and an r value of −1 a total negative linear correlation; the p value indicates whether the correlation coefficient differs significantly from 0 (no linear relationship).
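The test statistics above can be sketched with the Python standard library, together with the cM/Mb normalization used for the recombination rates. This is an illustrative sketch, not the authors' analysis code: which t-test variant was used is not stated, so the pooled-variance (equal-variance) form is an assumption, and p-values are omitted. In SciPy, `scipy.stats.ttest_ind(a, b, alternative="greater")` gives a one-tailed test and `scipy.stats.pearsonr` returns both r and its p value.

```python
import math
from statistics import mean, stdev


def cm_per_mb(centimorgans, length_bp):
    """Normalize a chromosome's genetic length to a recombination rate (cM/Mb)."""
    return centimorgans / (length_bp / 1e6)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def student_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption).

    Positive values mean mean(a) > mean(b); compare against a t distribution
    with len(a) + len(b) - 2 degrees of freedom for a p value.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


print(cm_per_mb(100, 50_000_000))                # 2.0
print(pearson_r([1, 2, 3], [3, 2, 1]))           # -1.0 (perfect negative correlation)
print(round(student_t([5, 6, 7], [1, 2, 3]), 3)) # 4.899
```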

Supplementary Material

Supplemental information
Tables S1, S2, S3, S4, and S5

KEY RESOURCES TABLE

REAGENT or RESOURCE SOURCE IDENTIFIER
Biological samples

Human Sperm Cells, fully anonymized European Sperm bank (Struenseegade 9A, 2; 2200 Copenhagen N – Denmark) europeanspermbank.com

Chemicals, peptides, and recombinant proteins

exonuclease V RecBCD New England Biolabs (BioNordika Denmark) NEB-M0345L
proteinase K ThermoFisher #EO0491

Critical commercial assays

Oxford Nanopore Rapid Barcoding Kit Oxford Nanopore SQK-RBK004
φ29 DNA amplification, TruePrime RCA KIT 4basebio; https://www.4basebio.com/ SKU: 390100
L solutions from the Plasmid Mini AX kit A&A Biotechnology #01050
Plasmid Mini AX A&A Biotechnology #01050
MagAttract HMW DNA Kit QIAGEN 67563

Deposited data

Raw and analyzed data This paper Sequence Read Archive database, BioProject number PRJNA655819
Human reference genome NCBI build 37, GRCh37 Genome Reference Consortium https://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/
Database with known human variations, including SNPs, indels and structural variations Sibbesen et al., 2018 variation prior database, https://github.com/bioinformatics-centre/BayesTyper
Genomic file containing the coordinates of the genes, for the Feb. 2009 assembly of the human genome (hg19, GRCh37) The Ensembl Biomart Release 99 database https://www.ensembl.org/index.html
Genomic file contained the coordinates of all interspersed repeats and low complexity DNA sequences Genome Browser annotation tracks Repeat Masker https://www.repeatmasker.org/RepeatMasker/
Centimorgan values of the male chiasmata map used in Figure 2F Morton, 1991 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC52322/
Centimorgan values of the male chiasmata map used in Figure 2G Kong et al., 2002 https://pubmed.ncbi.nlm.nih.gov/12053178/
Meiotic recombination rate used in Figure S6A Wang et al., 2012 Table S3, https://pubmed.ncbi.nlm.nih.gov/22817899/
The normalized centimorgan values (cM/Mb) used in Figure S6B Halldorsson et al., 2019 aau1043_datas1.gz, https://pubmed.ncbi.nlm.nih.gov/30679340/

Oligonucleotides

Forward primer 5′ GGGCACCATTTTCCTTGATCAT 3′ for gene absent from eccDNA (COX5b) TAG Copenhagen http://tagc.dk/
Reverse primer 5′ AGTCGCCTGCTCTTCATCAG 3′ for gene absent from eccDNA (COX5b) TAG Copenhagen http://tagc.dk/

Software and algorithms

NanoCircle This paper https://doi.org/10.5281/zenodo.5720805
CReSIL This paper https://doi.org/10.5281/zenodo.5720805
Circle-Map v.1.1.2 Prada-Luengo et al., 2019 https://github.com/iprada/Circle-Map
MinKNOW software v18.12.6 Oxford Nanopore https://nanoporetech.com/community
guppy_barcoder v 2.3.1 Wick et al., 2019 https://nanoporetech.com/community
minimap2 v. 2.15 Li, 2018 https://github.com/lh3/minimap2
BWA mem v.0.7.17-r1188 Li, 2013 http://bio-bwa.sourceforge.net/bwa.shtml
Bedtools v. 2.29.2 Quinlan and Hall, 2010 https://bedtools.readthedocs.io/en/latest/
Samtools v. 1.9 Li et al., 2009 http://www.htslib.org/
Trimmomatic v. 0.39 Bolger et al., 2014 http://www.usadellab.org/cms/?page=trimmomatic

Highlights.

  • Extrachromosomal circular DNA (eccDNA) is common in the human male germline

  • Chromosomes with high gene and Alu-element densities form the least eccDNA

  • Human meiotic recombination rate is inversely correlated to DNA circularization

ACKNOWLEDGMENTS

We thank Jacobus Jan “Koos” Boomsma and Guojie Zhang, Department of Biology, Section for Ecology and Evolution at the University of Copenhagen for critical reading of the manuscript. For technical assistance, we thank Sefa Alizadeh for purifying eccDNA and Pablo Hernandez-Varas for assisting with microscopy. This work is supported by Independent Research Fund Denmark (FNU 6108-00171B to the project, I.P.-L. and B.R.), the VILLUM Foundation (00023247 to R.A.H. and B.R.), and the National Institute of General Medical Sciences of the National Institutes of Health (P20GM125503 to I.N.).

Footnotes

DECLARATION OF INTERESTS

The authors declare no competing interests.

SUPPLEMENTAL INFORMATION

Supplemental information can be found online at https://doi.org/10.1016/j.molcel.2021.11.027.

REFERENCES

  1. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. [DOI] [PubMed] [Google Scholar]
  2. Bolger AM, Lohse M, and Usadel B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Brar GA, Hochwagen A, Ee LSS, and Amon A. (2009). The multiple roles of cohesin in meiotic chromosome morphogenesis and pairing. Mol. Biol. Cell 20, 1030–1047. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Chen X, Ke Y, Wu K, Zhao H, Sun Y, Gao L, Liu Z, Zhang J, Tao W, Hou Z, et al. (2019). Key role for CTCF in establishing chromatin structure in human embryos. Nature 576, 306–310. [DOI] [PubMed] [Google Scholar]
  5. Chiang C, Scott AJ, Davis JR, Tsang EK, Li X, Kim Y, Hadzic T, Damani FN, Ganel L, Montgomery SB, et al. ; GTEx Consortium (2017). The impact of structural variation on human gene expression. Nat. Genet.49, 692–699. [DOI] [PMC free article] [PubMed] [Google Scholar]
  6. Dale RK, Pedersen BS, and Quinlan AR (2011). Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. de Kretser DM, Loveland KL, Meinhardt A, Simorangkir D, and Wreford N. (1998). Spermatogenesis. Hum. Reprod. 13 (Suppl 1), 1–8. [DOI] [PubMed] [Google Scholar]
  8. Deininger P. (2011). Alu elements: know the SINEs. Genome Biol. 12, 236. [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Dennis MY, and Eichler EE (2016). Human adaptation and evolution by segmental duplication. Curr. Opin. Genet. Dev. 41, 44–52. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Durkin K, Coppieters W, Drö gemu€ller C, Ahariz N, Cambisano N, Druet T, Fasquelle C, Haile A, Horin P, Huang L, et al. (2012). Serial translocation by means of circular intermediates underlies colour sidedness in cattle. Nature 482, 81–84. [DOI] [PubMed] [Google Scholar]
  11. Gresham D, Usaite R, Germann SM, Lisby M, Botstein D, and Regenberg B. (2010). Adaptation to diverse nitrogen-limited environments by deletion or extrachromosomal element formation of the GAP1 locus. Proc. Natl. Acad. Sci. U S A 107, 18551–18556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Grimwood J, Gordon LA, Olsen A, Terry A, Schmutz J, Lamerdin J, Hellsten U, Goodstein D, Couronne O, Tran-Gyamfi M, et al. (2004). The DNA sequence and biology of human chromosome 19. Nature 428, 529–535. [DOI] [PubMed] [Google Scholar]
  13. Grover D, Mukerji M, Bhatnagar P, Kannan K, and Brahmachari SK (2004). Alu repeat analysis in the complete human genome: trends and variations with respect to genomic composition. Bioinformatics 20, 813–817. [DOI] [PubMed] [Google Scholar]
  14. Hakimi MA, Bochar DA, Schmiesing JA, Dong Y, Barak OG, Speicher DW, Yokomori K, and Shiekhattar R. (2002). A chromatin remodelling complex that loads cohesin onto human chromosomes. Nature 418, 994–998. [DOI] [PubMed] [Google Scholar]
  15. Halldorsson BV, Palsson G, Stefansson OA, Jonsson H, Hardarson MT, Eggertsson HP, Gunnarsson B, Oddsson A, Halldorsson GH, Zink F, et al. (2019). Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363, eaau1043. [DOI] [PubMed] [Google Scholar]
  16. Hehir-Kwa JY, Marschall T, Kloosterman WP, Francioli LC, Baaijens JA, Dijkstra LJ, Abdellaoui A, Koval V, Thung DT, Wardenaar R, et al. ; Genome of the Netherlands Consortium (2016). A high-quality human reference panel reveals the complexity and distribution of genomic structural variants. Nat. Commun. 7, 12989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Helmsauer K, Valieva ME, Ali S, Chamorro González R, Schöpflin R, Röefzaad C, Bei Y, Dorado Garcia H, Rodriguez-Fos E, Puiggròs M, et al. (2020). Enhancer hijacking determines extrachromosomal circular MYCN amplicon architecture in neuroblastoma. Nat. Commun. 11, 5823. [DOI] [PMC free article] [PubMed] [Google Scholar]
  18. Jain M, Koren S, Miga KH, Quick J, Rand AC, Sasani TA, Tyson JR, Beggs AD, Dilthey AT, Fiddes IT, et al. (2018). Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Kim D, Song L, Breitwieser FP, and Salzberg SL (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. King GA, Goodman JS, Schick JG, Chetlapalli K, Jorgens DM, McDonald KL, and Ünal E. (2019). Meiotic cellular rejuvenation is coupled to nuclear remodeling in budding yeast. eLife 8, e47156. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Koche RP, Rodriguez-Fos E, Helmsauer K, Burkert M, MacArthur IC, Maag J, Chamorro R, Munoz-Perez N, Puiggròs M, Dorado Garcia H, et al. (2020). Extrachromosomal circular DNA drives oncogenic genome re-modeling in neuroblastoma. Nat. Genet. 52, 29–34. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Kolmogorov M, Yuan J, Lin Y, and Pevzner PA (2019). Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546. [DOI] [PubMed] [Google Scholar]
  23. Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, Sigurdardottir S, Barnard J, Hallbeck B, Masson G, et al. (2002). A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247. [DOI] [PubMed] [Google Scholar]
  24. Krumsiek J, Arnold R, and Rattei T. (2007). Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23, 1026–1028. [DOI] [PubMed] [Google Scholar]
  25. Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, and Marra MA (2009). Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645.
  26. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al. ; International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.
  27. Li H. (2013). Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv, arXiv:1303.3997 https://arxiv.org/abs/1303.3997.
  28. Li H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100.
  29. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, and Durbin R; 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
  30. Maretty L, Jensen JM, Petersen B, Sibbesen JA, Liu S, Villesen P, Skov L, Belling K, Theil Have C, Izarzugaza JMG, et al. (2017). Sequencing and de novo assembly of 150 genomes from Denmark as a population reference. Nature 548, 87–91.
  31. Miga KH (2019). Centromeric satellite DNAs: hidden sequence variation in the human population. Genes (Basel) 10, 352.
  32. Mitchell JS, Laughton CA, and Harris SA (2011). Atomistic simulations reveal bubbles, kinks and wrinkles in supercoiled DNA. Nucleic Acids Res. 39, 3928–3938.
  33. Møller HD, Parsons L, Jørgensen TS, Botstein D, and Regenberg B. (2015). Extrachromosomal circular DNA is common in yeast. Proc. Natl. Acad. Sci. U S A 112, E3114–E3122.
  34. Møller HD, Mohiyuddin M, Prada-Luengo I, Sailani MR, Halling JF, Plomgaard P, Maretty L, Hansen AJ, Snyder MP, Pilegaard H, et al. (2018). Circular DNA elements of chromosomal origin are common in healthy human somatic tissue. Nat. Commun. 9, 1069.
  35. Møller HD, Ramos-Madrigal J, Prada-Luengo I, Gilbert MTP, and Regenberg B. (2020). Near-Random distribution of chromosome-derived circular DNA in the condensed genome of pigeons and the larger, more repeat-rich human genome. Genome Biol. Evol. 12, 3762–3777.
  36. Morton NE (1991). Parameters of the human genome. Proc. Natl. Acad. Sci. U S A 88, 7474–7476.
  37. Morton AR, Dogan-Artun N, Faber ZJ, MacLeod G, Bartels CF, Piazza MS, Allan KC, Mack SC, Wang X, Gimple RC, et al. (2019). Functional enhancers shape extrachromosomal oncogene amplifications. Cell 179, 1330–1341.e13.
  38. Nattestad M, Chin CS, and Schatz MC (2016). Ribbon: visualizing complex genome alignments and structural variation. bioRxiv, 082123.
  39. Patthy L. (1999). Genome evolution and the evolution of exon-shuffling—a review. Gene 238, 103–114.
  40. Prada-Luengo I, Krogh A, Maretty L, and Regenberg B. (2019). Sensitive detection of circular DNAs at single-nucleotide resolution using guided realignment of partially aligned reads. BMC Bioinformatics 20, 663.
  41. Prada-Luengo I, Møller HD, Henriksen RA, Gao Q, Larsen CE, Alizadeh S, Maretty L, Houseley J, and Regenberg B. (2020). Replicative aging is associated with loss of genetic heterogeneity from extrachromosomal circular DNA in Saccharomyces cerevisiae. Nucleic Acids Res. 48, 7883–7898.
  42. Quinlan AR, and Hall IM (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842.
  43. Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M, Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B, et al. (2012). Fiji: an open-source platform for biological-image analysis. Nat. Methods 9, 676–682.
  44. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, and Sirotkin K. (2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311.
  45. Sibbesen JA, Maretty L, and Krogh A; Danish Pan-Genome Consortium (2018). Accurate genotyping across variant classes and lengths using variant graphs. Nat. Genet. 50, 1054–1059.
  46. Smit AFA, Hubley R, and Green P. (2015). RepeatMasker. http://www.repeatmasker.org/RMDownload.html.
  47. Stoll C, and Roth MP (1983). Segregation of a 22 ring chromosome in three generations. Hum. Genet. 63, 294–296.
  48. Sudmant PH, Rausch T, Gardner EJ, Handsaker RE, Abyzov A, Huddleston J, Zhang Y, Ye K, Jun G, Fritz MHY, et al. ; 1000 Genomes Project Consortium (2015). An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81.
  49. Turner DJ, Miretti M, Rajan D, Fiegler H, Carter NP, Blayney ML, Beck S, and Hurles ME (2008). Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nat. Genet. 40, 90–95.
  50. Turner KM, Deshpande V, Beyter D, Koga T, Rusert J, Lee C, Li B, Arden K, Ren B, Nathanson DA, et al. (2017). Extrachromosomal oncogene amplification drives tumour evolution and genetic heterogeneity. Nature 543, 122–125.
  51. Von Hoff DD, Needham-VanDevanter DR, Yucel J, Windle BE, and Wahl GM (1988). Amplified human MYC oncogenes localized to replicating submicroscopic circular DNA molecules. Proc. Natl. Acad. Sci. U S A 85, 4804–4808.
  52. Wang J, Fan HC, Behr B, and Quake SR (2012). Genome-wide single-cell analysis of recombination activity and de novo mutation rates in human sperm. Cell 150, 402–412.
  53. Wick RR, Judd LM, and Holt KE (2019). Performance of neural network basecalling tools for Oxford Nanopore sequencing. Genome Biol. 20, 129.
  54. Yates AD, Achuthan P, Akanni W, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, et al. (2020). Ensembl 2020. Nucleic Acids Res. 48 (D1), D682–D688.

Supplementary Materials

  • Supplemental information
  • Table S1
  • Table S2
  • Table S3
  • Table S4
  • Table S5

Data Availability Statement

  • All raw sequencing data are publicly available as of the date of publication in the SRA database under BioProject PRJNA655819.

  • The NanoCircle and CReSIL workflows are available at Zenodo: https://doi.org/10.5281/zenodo.5720805.

  • Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

RESOURCES