Abstract
Land plant organellar genomes have extremely low rates of point mutation yet also experience high rates of recombination and genome instability. Characterizing the molecular machinery responsible for these patterns is critical for understanding the evolution of these genomes. While much progress has been made toward understanding recombination activity in land plant organellar genomes, the relationship between recombination pathways and point mutation rates remains uncertain. The organellar-targeted mutS homolog MSH1 has previously been shown to suppress point mutations as well as non-allelic recombination between short repeats in Arabidopsis thaliana. We therefore implemented high-fidelity Duplex Sequencing to test if other genes that function in recombination and maintenance of genome stability also affect point mutation rates. We found small to moderate increases in the frequency of single nucleotide variants (SNVs) and indels in mitochondrial and/or plastid genomes of A. thaliana mutant lines lacking radA, recA1, or recA3. In contrast, osb2 and why2 mutants did not exhibit an increase in point mutations compared to wild-type (WT) controls. In addition, we analyzed the distribution of SNVs in previously generated Duplex Sequencing data from A. thaliana organellar genomes and found unexpected strand asymmetries and large effects of flanking nucleotides on mutation rates in WT plants and msh1 mutants. Finally, using long-read Oxford Nanopore sequencing, we characterized structural variants in organellar genomes of the mutant lines and show that different short repeat sequences become recombinationally active in different mutant backgrounds. Together, these complementary sequencing approaches shed light on how recombination may impact the extraordinarily low point mutation rates in plant organellar genomes.
Keywords: organelle mutation, Duplex Sequencing, mitochondria, plastid, recombination, single nucleotide variant, indel, repair, structural variant
Introduction
Nearly all eukaryotes rely on genes encoded in endosymbiotically derived mitochondrial genomes (mtDNAs) for cellular respiration. Plants and algae additionally rely on the endosymbiotically derived plastid genome (cpDNA) for photosynthesis. In several regards, land plant organellar genome evolution is atypical compared to mtDNA evolution in other eukaryotes (Smith and Keeling 2015). For one, plant organellar genomes have low nucleotide substitution rates relative to those in plant nuclear genomes and to those of many other eukaryotic mtDNAs. The low substitution rates of plant organellar genomes extend even to synonymous sites, which likely experience very little purifying selection, suggesting that the cause of the low evolutionary rates is a low underlying point mutation rate (Wolfe et al. 1987; Drouin et al. 2008).
Compared to the small mtDNAs typical in metazoans (generally below 20 kb) and in algae and fungi (with sizes ranging from approximately 13 to 96 kb and ∼20 to 235 kb, respectively), land plant mtDNAs are much larger with sequenced mtDNAs averaging 395 kb (Wu et al. 2022) and a known range extending from 70 kb to over 10 Mb (Boore 1999; Sloan et al. 2012; Gualberto and Newton 2017; Skippington et al. 2017; Sandor et al. 2018; Chen et al. 2019). Very little of this size variation stems from differences in coding capacity, as plant mtDNAs generally contain a subset of the same 41 protein-coding genes (Mower et al. 2012). Instead, the fluctuations in total mtDNA size primarily result from the acquisition and loss of noncoding DNA. Even closely related species possess very little shared noncoding sequence (Kubo and Newton 2008; Skippington et al. 2017). For example, a comparative analysis of the mtDNAs of 2 species within the Brassicaceae, Arabidopsis thaliana (367 kb) and Brassica napus (222 kb), revealed a mere 78 kb of shared sequence, most of which is coding (Handa 2003). Though size variation of cpDNAs is less extreme than in plant mtDNAs, variation still exists with 98.7% of sequenced land plant cpDNAs ranging from 100 to 200 kb in size (Xiao-Ming et al. 2017).
Plant organellar genomes also experience exceptionally high rates of structural mutation and rearrangement (Palmer and Herbon 1988). As a result, there is virtually no conservation of synteny between plant mtDNAs, as evidenced by the extensive rearrangements in alignments of mtDNAs from Col-0 and Ler ecotypes of A. thaliana (Stupar et al. 2001; Huang et al. 2005; Davila et al. 2011; Pucker et al. 2019; Zou et al. 2022). The structural instability in plant mtDNAs is partly explained by the presence of repeats of various lengths, which recombine frequently and give rise to multiple isomeric subgenomes with circular, linear, and/or branched structures (Palmer and Herbon 1988; Alverson et al. 2011; Wynn and Christensen 2019). In fact, plant mtDNAs lack origins of replication, which help coordinate genome replication in many other eukaryotes, and are instead thought to replicate through break-induced recombination (Gualberto and Newton 2017; Chevigny et al. 2020). Land plant cpDNAs are also recombinationally active but usually remain structurally conserved, albeit with some significant exceptions (Smith and Keeling 2015).
The seemingly disparate features of plant organellar evolution (i.e. high rates of recombination and low rates of point mutation) may be unified through a DNA repair mechanism reliant on recombination (Christensen 2014). This hypothesized mechanism hinges on the activity of the mutS homolog MSH1 (Abdelnoor et al. 2003), which, like all plant organellar DNA maintenance proteins, is encoded in the nuclear genome. Upon translation, MSH1 is dual-targeted to mitochondria and plastids and has long been known to suppress non-allelic recombination between intermediate-sized repeats (50–600 bps) in the A. thaliana mtDNA (Martínez-Zapater et al. 1992; Arrieta-Montiel et al. 2009; Davila et al. 2011; Zou et al. 2022). Plant MSH1 is a chimeric fusion of a mutS gene with a GIY-YIG endonuclease domain (Abdelnoor et al. 2006) that has been proposed to introduce double-stranded breaks (DSBs) in organellar DNA at the site of mismatches, which would then be repaired through homologous recombination (HR) (Christensen 2014, 2018; Ayala-García et al. 2018; Broz et al. 2022). Assays conducted on purified MSH1 in vitro have found that it has DNA binding and endonuclease activity with affinity for displacement loops (D-loops) (Peñafiel-Ayala et al. 2023).
We previously found support for a MSH1-mediated link between recombination and point mutations by using a high-fidelity Duplex Sequencing technique (Kennedy et al. 2014) to screen for single nucleotide variants (SNVs) and indels in msh1 mutants (Wu et al. 2020). In that study, we also included a panel of mutants lacking functional copies of other genes involved in organellar DNA replication, recombination, and/or repair, including the recombination protein RECA3, the paralogous organellar DNA polymerases POLIA and POLIB, and the glycosylases UNG, FPG, and OGG (Wu et al. 2020). Compared to wild-type (WT) lines, msh1 mutants incurred SNVs at a ∼10-fold increase in mtDNA and a ∼100-fold increase in cpDNA, and increases in indel frequencies were even greater. In contrast, recA3 mutants showed only a small (and marginally significant) increase in mtDNA mutation, and none of the other lines in the mutant panel showed a significant increase in SNVs or indels compared to WT plants (Wu et al. 2020). Thus, in contrast to the many genes that have been implicated in mediating recombinational activity and structural rearrangements in plant organellar genomes (Gualberto and Newton 2017), very little progress has been made in identifying factors that determine point mutation rates. Because the effect of MSH1 activity on lowering point mutation rate is thought to proceed via recombinational repair, we hypothesized that genes regulating downstream recombination in plant organellar genomes may also affect point mutation rates. In particular, if DSBs can be repaired by alternative recombination pathways that differ in fidelity, disruption of 1 pathway could lead to reliance on more error-prone repair mechanisms.
Here, we investigate additional organellar genome repair proteins (WHY2, RADA, RECA1, OSB2) known to play a role in the suppression of non-allelic recombination in the A. thaliana organellar genomes. WHY2 is a mitochondrially targeted whirly protein that binds single-stranded DNA to inhibit recombination between small repeated sequences via microhomology-mediated end joining (MMEJ) (Cappadocia et al. 2010) and is also the most abundant protein in mitochondrial nucleoids (as measured in A. thaliana cell culture; Fuchs et al. 2020). RADA is a dual-targeted DNA helicase, which has been shown to accelerate the processing of recombination intermediates and promote mtDNA stability in A. thaliana (Chevigny et al. 2022). RECA1 is a plastid-targeted protein that has been proposed to act synergistically with plastid whirly proteins to promote plastid genome integrity either by facilitating polymerase lesion bypass or by reversing stalled replication forks (Rowan et al. 2010; Zampini et al. 2015). OSB2 is a plastid-targeted single-stranded DNA binding protein that has been shown to hamper MMEJ in vitro (García-Medel et al. 2021). Given that we previously saw a weak signal of increased mtDNA mutation in recA3 mutants (Wu et al. 2020), we included another recA3 mutant allele in this study. In addition to these newly generated mutant lines, we also present an extended analysis of Duplex Sequencing data from Wu et al. (2020) to understand how SNVs are distributed among genomic regions, strand (template vs non-template) of genic regions, and trinucleotide contexts. Finally, we also performed long-read Oxford Nanopore sequencing on the mutant lines, allowing us to study structural mutations and rearrangements. Collectively, these analyses provide a detailed characterization of the effects of numerous recombination-related genes on point mutations and structural variants in plant organellar genomes.
Methods
Generation and analysis of Duplex Sequencing libraries for SNV and indel detection
Like all plant organellar DNA maintenance genes, the genes of interest in this study are encoded in the nuclear genome and targeted to the organelles after they are translated. We obtained seeds for A. thaliana osb2, radA, recA1, recA3, and why2 mutants from the Arabidopsis Biological Resource Center (Supplementary Table 1). The generation of Duplex Sequencing data from mutants and matched WT controls (including crossing, plant growth, organelle isolation, DNA extraction, and library preparation) closely followed our previously described protocols (Wu et al. 2020). For each gene of interest, homozygous mutants were used as the paternal pollinators in crosses against WT maternal plants, which introduced “clean” organellar genomes (i.e. never exposed to a mutant background) into the resulting heterozygous F1s. The presence of 1 WT allele in the F1 heterozygotes should be sufficient for WT-like organelle genome maintenance since the mutant alleles of the repair genes of interest are thought to act recessively (Shedge et al. 2007; Cappadocia et al. 2010; Rowan et al. 2010; Zampini et al. 2015; Wu et al. 2020; García-Medel et al. 2021; Chevigny et al. 2022). The heterozygous F1s were then allowed to self-cross, and we identified 3 homozygous mutant and 3 homozygous WT F2s, which were also allowed to self-cross. Families of F3 seeds were grown together to obtain sufficient leaf tissue for organelle isolation and mutation detection via Duplex Sequencing.
The only notable differences between the methods in this study compared to Wu et al. (2020) were as follows: (1) we only isolated organelles for which the protein of interest is targeted (plastid: OSB2, RADA, and RECA1; mitochondrial: RADA, RECA3, and WHY2), whereas in Wu et al. (2020), we isolated both organelles regardless of targeting. (2) We adjusted our Duplex Sequencing library construction protocol to obtain larger inserts by ultrasonicating the DNA for only 60 s (3 bouts of 20 s, with 15-s pauses between each) and size selecting libraries with a 2% gel on a BluePippin (Sage Science), using a specified target range of 400–700 bp. (3) We implemented a new approach to filter spurious variant calls resulting from nuclear insertions of mtDNA and cpDNA (NUMTs and NUPTs) by comparing putative mutations directly against the A. thaliana nuclear genome (TAIR 10.2; Berardini et al. 2015) and the new assembly of the large NUMT on chromosome 2 (Fields et al. 2022), replacing the k-mer-based NUMT/NUPT filtering approach described in Wu et al. (2020). (4) We performed trinucleotide and strand asymmetry mutation frequency analyses (also described in Waneka et al. 2021) to understand the distribution of de novo mutations among trinucleotide contexts and between template vs non-template strands of genic sequences, respectively. These analyses both rely on Duplex Sequencing coverage of the specific trinucleotide or strand as the denominator of the mutation frequency calculation, so they are not biased by the enrichment of some trinucleotide sequences or differences in nucleotide composition between template vs non-template strands (https://github.com/dbsloan/recomb_mutant_seq). (5) Finally, we added an analysis of dinucleotide mutation frequencies, i.e. sites with 2 adjacent SNVs (Waneka et al. 2021; https://github.com/dbsloan/recomb_mutant_seq).
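The coverage-normalized trinucleotide frequencies described in point (4) can be illustrated with a minimal sketch. This is not the code from the linked repository; the function names, the input layout (a list of `(trinucleotide, alt_base)` calls and a dictionary of Duplex Sequencing coverage per trinucleotide), and the choice to collapse complementary contexts onto the pyrimidine-centered strand are all assumptions made for illustration.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical_context(tri, alt):
    """Express a substitution with a pyrimidine (C or T) as the central reference base.

    Assumption: complementary contexts are pooled, so TGA>A is counted as TCA>T.
    """
    if tri[1] in "AG":
        return revcomp(tri), alt.translate(COMPLEMENT)
    return tri, alt


def trinucleotide_frequencies(snvs, coverage):
    """Mutation frequency per trinucleotide context, normalized by coverage.

    snvs: iterable of (trinucleotide, alt_base) on the reference strand.
    coverage: dict mapping each trinucleotide to the summed duplex coverage
              of all sites with that context (hypothetical input format).
    """
    counts = Counter(canonical_context(tri, alt) for tri, alt in snvs)
    pooled_cov = Counter()
    for tri, depth in coverage.items():
        key = tri if tri[1] in "CT" else revcomp(tri)
        pooled_cov[key] += depth
    # The denominator is the coverage of that specific context, so common
    # trinucleotides do not receive inflated frequencies.
    return {
        (tri, alt): n / pooled_cov[tri]
        for (tri, alt), n in counts.items()
        if pooled_cov[tri] > 0
    }
```

Because each context is divided by its own coverage, a context that is abundant in the genome does not appear more mutable simply because it offers more sites.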
Generation and analysis of nanopore sequencing libraries for structural variant detection
Nanopore libraries were produced from the same DNA samples that were used for Duplex Sequencing. Sequencing libraries were created following the protocol outlined in the Oxford Nanopore Technologies Rapid Barcoding Kit 96 (SQK-RBK110-96) manual (v110 Mar 24, 2021 revision) and were sequenced on MinION flow cells (FLO-MIN106) under the control of MinKNOW software v22.08.4 or 22.08.9. Multiplexed libraries from cpDNA samples were pooled and run on a single flow cell, whereas pooled mtDNA libraries were run on 2 flow cells. All runs were conducted for 72 h with a minimum read length of 200 bp. Data were processed using the Guppy Basecalling Software v6.3.4+cfaa134.
We sequenced 3 mutant replicates and 1 matched WT control for each gene of interest. Mutant lines for the cpDNA samples included msh1 (CS3246), osb2, recA1, and radA (only 2 radA mutants were sequenced due to a lack of DNA in mutant replicate 2), while mutant lines for the mtDNA samples included msh1 (CS3246), recA3, why2, and radA. The total sequencing yield (3.72 Gb) in our initial run of 15 cpDNA samples was an order of magnitude higher than our subsequent run with the 16 mtDNA samples (0.33 Gb). To increase mtDNA coverage, we re-sequenced 12 of those mtDNA samples (all but the msh1 mutants and matched WT control) in a third run, which had a similar low yield (0.42 Gb) to the second run. In all cases, samples were run on fresh flow cells as opposed to flow cells that had been washed for a second run. Because the msh1 and radA mtDNA samples produced very little data (Supplementary Table 2), we used the mtDNA contamination in the msh1 and radA cpDNA samples in downstream analyses of the nanopore data.
To calculate mitochondrial and plastid read depth, we aligned the nanopore reads to the organellar genomes with minimap2 (version 2.24; Li 2018) and tabulated depth at each position with bedtools (version 2.30.0; Quinlan and Hall 2010). We calculated the average depth in 1,000-bp sliding windows tiling the organellar genomes and plotted depth as a normalized mutant:WT ratio.
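As a rough illustration of the windowed depth comparison, the sketch below takes per-position depth values (such as those tabulated by bedtools) as plain lists. Two details are assumptions rather than statements of the method used here: windows are treated as non-overlapping 1,000-bp tiles, and each sample is normalized by its own genome-wide mean depth before the mutant:WT ratio is taken. The function names are hypothetical.

```python
def window_means(depth, window=1000):
    """Mean depth in consecutive, non-overlapping windows (the last may be shorter)."""
    means = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def normalized_ratio(mutant_depth, wt_depth, window=1000):
    """Mutant:WT depth ratio per window.

    Assumption: each sample is first scaled by its own genome-wide mean depth,
    so that differences in sequencing yield do not shift the ratio.
    """
    mutant_mean = sum(mutant_depth) / len(mutant_depth)
    wt_mean = sum(wt_depth) / len(wt_depth)
    ratios = []
    for m, w in zip(window_means(mutant_depth, window),
                    window_means(wt_depth, window)):
        if w == 0 or mutant_mean == 0:
            ratios.append(float("nan"))  # no usable depth in this window
        else:
            ratios.append((m / mutant_mean) / (w / wt_mean))
    return ratios
```

Under this normalization, a ratio near 1 indicates that a region is represented in the mutant in the same proportion as in WT, while peaks and valleys indicate relative gains and losses in copy number.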
The nanopore reads were analyzed with HiFi-SR (https://github.com/zouyinstein/hifisr), a software tool developed to identify structural variants using BLASTn alignments of long reads in plant organellar genomes (Zou et al. 2022). Because the tool was originally developed for PacBio HiFi reads, which are more accurate than nanopore reads, we required at least 2 independent nanopore reads to support putative indels. In addition, we constrained our analysis to reads with only 1 or 2 BLASTn hits, disregarding the reads with 3 or more BLASTn hits (which may originate from reads that span 2 or more recombined repeats). For reads with 2 BLASTn hits, we compared the breakpoints of putative recombination events with the repeats in the A. thaliana organellar genomes, which are reported in Tables S10 (mtDNA) and S28 (cpDNA) by Zou et al. (2022). We calculated recombination frequencies for each repeat pair as the number of recombined reads divided by the total number of repeat-spanning reads. To compute genome-wide repeat frequencies, we restricted the analyses to repeats that showed a total of at least 10 mtDNA recombination reads across all replicates. Because cpDNA recombination events were much less common, we lowered the threshold to a minimum of 3 recombining reads per repeat for calculating recombination frequencies. All of the matched WT controls were averaged for comparisons against the mutant variant frequencies because we only sequenced 1 WT control for each gene of interest.
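The per-repeat recombination frequency and the read-count thresholds can be sketched as follows. This is not part of HiFi-SR; the function name and input layout are hypothetical, and the sketch assumes that the count of repeat-spanning reads includes both parental and recombined configurations.

```python
def recombination_frequencies(counts, min_recombined=10):
    """Recombination frequency per repeat pair and replicate.

    counts: dict mapping a repeat identifier to a list of
            (recombined_reads, spanning_reads) tuples, one per replicate.
            Assumption: spanning_reads includes the recombined reads.
    min_recombined: minimum number of recombined reads summed across
            replicates for a repeat to be retained (10 for mtDNA and
            3 for cpDNA in the text above).
    """
    result = {}
    for repeat, replicates in counts.items():
        total_recombined = sum(rec for rec, _ in replicates)
        if total_recombined < min_recombined:
            continue  # too few recombined reads to estimate a frequency
        result[repeat] = [
            rec / span if span else float("nan")
            for rec, span in replicates
        ]
    return result
```

Calling the function with `min_recombined=3` corresponds to the lower threshold applied to the cpDNA data.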
Results
Duplex Sequencing coverage
We generated Duplex Sequencing libraries from DNA extracted from isolated organelles to test if genes involved in recombination suppression also impact accumulation of SNVs and short indels in A. thaliana organellar genomes. Duplex Sequencing libraries were sequenced on a NovaSeq 6000 to produce between 30.6 and 139.1 million paired-end reads (2 × 150 nt) per library (Supplementary Table 3). Processing the Duplex Sequencing libraries to collapse Illumina reads into consensus sequences and map them to organellar genomes resulted in coverage of 94.2–816.3× in the mitochondrial libraries (radA, recA3, and why2) and 234.2–1176.6× in the plastid libraries (radA, recA1, and osb2; Supplementary Table 3).
Increased SNV and indel frequency in radA, recA1, and recA3 mutants
We compared variant frequencies of each mutant to the matched WT controls (2-tailed t-test) and found significant increases in SNV and indel frequencies in the radA mutants (P-values reported in Fig. 1). We observed a trend toward increased mtDNA indels and SNVs in the recA3 mutants, though these differences were not significant at the P = 0.05 threshold (P = 0.058 and P = 0.136, respectively; Fig. 1). Similarly, there was a trend toward increased cpDNA indels and SNVs in the recA1 mutants, though again these increases were not statistically significant at a P = 0.05 threshold (P = 0.064 and P = 0.119, respectively). We analyzed our previously generated recA3 mutant from Wu et al. (2020), which represents an independent mutant allele of recA3, and similarly found significant indel and weakly significant SNV increases in mtDNA (Supplementary Fig. 1). In total, we detected 204 SNVs and 123 indels in the newly generated Duplex Sequencing libraries (Supplementary File 1). Dinucleotide mutations involve neighboring sites both experiencing a substitution at the same time and are increasingly being recognized as an important type of mutation (Kaplanis et al. 2019). We assessed whether these mutations increase in frequency in any of the analyzed mutant backgrounds but found no significant differences relative to WT controls (Wilcoxon signed rank test, P > 0.05; Supplementary Fig. 2).
Fig. 1.
De novo point mutations measured with Duplex Sequencing. For each gene of interest (x-axis), mutant lines are plotted in red and matched WT controls are plotted in black. The individual biological replicates are plotted as circles, and group averages are plotted as dashes. Panels separate the data by genome (left column: mitochondria and right column: plastid) and by point mutation type (top row: SNVs and bottom row: indels). Variant frequencies (y-axis) were calculated as the total number of variants (SNVs or indels) divided by the total Duplex Sequencing coverage. P-values show the result of a 2-tailed t-test comparing WT vs mutant mutation frequencies for each gene of interest.
Decreased frequency of CG→TA transitions in the mtDNA of newly generated WT lines
The mutant lines assayed in both this study and in Wu et al. (2020) were sequenced with matched WT controls. Surprisingly, pooled WT SNV frequencies generated in the current study were lower than the pooled WT SNV frequencies from the Wu et al. (2020) data set (2.8 × 10−8 vs 1.7 × 10−7, t-test, P = 8.9 × 10−12), driven by a decrease in CG→TA transitions (t-test, P = 2.2 × 10−10; Fig. 2; Supplementary File 1). To understand if the decreased SNV rate in the newly generated WT libraries (Fig. 2) resulted from the changes we made to our library preparation protocol, we created a Duplex Sequencing library following our new protocol using one of the original WT DNA samples from Wu et al. (2020). This new library had an SNV rate of 1.57 × 10−7 (Supplementary Table 4), which is in line with the SNV rates observed in the WT libraries from the 2020 study (Fig. 2; Supplementary Table 4). In fact, the new SNV rate for this DNA sample was slightly higher than that of the library generated and sequenced in Wu et al. (2020) from the same DNA sample (1.39 × 10−7; Supplementary Table 4). Given that the newly created libraries were all size selected on a BluePippin, which involves mixing the libraries with fluorescein-labeled DNA as an internal standard for gauging DNA migration speed, we re-sequenced 2 stored libraries from Wu et al. (2020) with and without size selection on the BluePippin. The inclusion of the sample without size selection on the BluePippin served as a control for the sample processed on the BluePippin and also as an independent test to understand if changes in the sequencing platform could be responsible (all samples were sequenced on a NovaSeq 6000, but the chemistry of the flow cells has been updated). These re-sequenced libraries had SNV rates typical of the old WT libraries of 1.97 × 10−7 (size selected library) and 1.47 × 10−7 (not size selected) (Supplementary Table 4). Again, these values were slightly higher than the SNV rates from the original round of sequencing (1.36 × 10−7 and 1.39 × 10−7, respectively; Supplementary Table 4). Therefore, it seems highly unlikely that the decreased SNV rate in the new WT libraries is associated with the changes we made to our library preparation protocol. Instead, these appear to be genuine differences in the DNA samples, perhaps due to unknown variation in the growth conditions or DNA extraction procedures between the 2 batches.
Fig. 2.
Comparison of the mutational spectrum of pooled WT controls from the current study vs the WT controls from Wu et al. (2020). The 2 panels show the mitochondrial and plastid data, and the x-axis separates substitution types into transversions vs transitions and further into the 6 types of substitutions. Individual biological replicates are plotted as circles, while group averages are plotted as dashes. Only CG→TA transitions showed a significant increase in the old data set (2-tailed t-test; P = 2.2 × 10−10).
SNV frequencies are similar among different genomic regions
To gain a deeper understanding of mutational process in the organellar genomes, we next turned our attention to the distribution of SNVs, focusing primarily on the msh1 mutants and the pooled WT libraries from the Wu et al. (2020) study, given the larger number of mutations in those data sets. First, we assessed if the SNVs in msh1 mutants and pooled WT libraries from Wu et al. (2020) are evenly distributed between intergenic, protein-coding (CDS), intronic, rRNA, and tRNA regions (Fig. 3) and found no significant differences among genomic regions (Kruskal–Wallis test, P > 0.05; Supplementary Table 5) except in the WT plastid comparison, which is likely not biologically meaningful, given the small number of observed WT plastid SNVs (Fig. 2). Given that the vast majority of mtDNA SNVs in the Wu et al. (2020) WT data set are CG→TA transitions, we separately tested if this class of substitutions is evenly distributed across regions and found significant differences (Kruskal–Wallis test, P = 0.0295), driven by a decrease in tRNA genes compared to intergenic sequences (pairwise comparisons with Wilcoxon rank sum test, P = 0.0013). However, tRNA genes make up a small fraction of the genome and, thus, are subject to higher sampling variance, precluding any confident conclusions about whether they actually accumulate fewer CG→TA transitions than intergenic sequence.
Fig. 3.
Distribution of WT (bottom) and msh1 (top) SNVs (from Wu et al. 2020) across genomic region. The individual biological replicates are plotted as circles, and group averages are plotted as dashes. Panels separate the data by genome (left column: mitochondria and right column: plastid) and by genotype with msh1 mutants on top and WT on the bottom. Note the difference in y-axis scale for msh1 mutants and WT. For each of the 4 panels, we performed a Kruskal–Wallis test and found no significant difference between genomic regions except the WT plastid panel (P = 0.022) where comparisons between regions are likely not biologically meaningful given the low number of WT plastid mutations. Note that for this and subsequent analyses of the msh1 Duplex Sequencing data, we pooled the 2 null msh1 alleles to increase statistical power.
C→T substitutions are more common on the template strand in genic regions
Next, we performed a strand asymmetry analysis to understand if the SNVs in these data sets are evenly distributed on template vs non-template (i.e. sense or coding) strands in the CDS, intronic, rRNA, and tRNA regions of the organellar genomes. The analysis of the CG→TA transitions from the Wu et al. (2020) WT data set revealed that G→A substitutions are significantly enriched on the non-template strand of the DNA (paired Wilcoxon signed-rank test; P < 0.05 for CDS, rRNA, and tRNA genes). Therefore, C→T substitutions predominantly occur on the template strand, which is read by RNA polymerases during transcription (Fig. 4). This asymmetry is most striking in rRNA and tRNA genes, where every C→T substitution occurred on the template strand (25 in rRNA and 7 in tRNA). CG→TA transitions were also asymmetrically distributed between strands in genic regions of the Wu et al. (2020) msh1 mutants (Fig. 5), though only in certain regions of the mtDNA (Fig. 5, top right panel), and not in the cpDNA (Fig. 5, bottom right panel). We also investigated strand asymmetries in the AT→GC transitions of the Wu et al. (2020) msh1 mutants and found a trend toward more T→C substitutions on the template strand of plastid genes (Fig. 5, left panels). We did not investigate strand asymmetries for AT→GC in WT or for any of the other substitution classes in either WT or msh1 mutants because the small number of data points precludes meaningful comparisons between strands (see Fig. 5 of Wu et al. 2020).
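The strand bookkeeping behind this comparison can be made concrete with a small sketch. It assumes that SNVs are called on the reference (forward) strand and that each gene is annotated as "+" or "-"; for a "+" gene the reference strand is the non-template strand, whereas for a "-" gene the reference strand is the template. The function names and the input layout are hypothetical and do not reproduce the analysis scripts used for this study.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def non_template_substitution(ref, alt, gene_strand):
    """Re-express a reference-strand substitution as seen on the non-template strand."""
    if gene_strand == "+":
        return ref, alt
    if gene_strand == "-":
        return COMPLEMENT[ref], COMPLEMENT[alt]
    raise ValueError(f"unknown strand: {gene_strand!r}")


def strand_frequencies(snvs, non_template_coverage):
    """Substitution frequencies on the non-template strand.

    snvs: iterable of (ref, alt, gene_strand) on the reference strand.
    non_template_coverage: dict mapping each base to the duplex coverage of
        that base on the non-template strand (hypothetical input format).
    """
    counts = Counter(non_template_substitution(r, a, s) for r, a, s in snvs)
    return {
        sub: n / non_template_coverage[sub[0]]
        for sub, n in counts.items()
        if non_template_coverage.get(sub[0], 0) > 0
    }
```

In this representation, a G→A substitution on the non-template strand is the same event as a C→T substitution on the template strand, so comparing the C→T and G→A frequencies returned for a region is equivalent to comparing the two strands.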
Fig. 4.
Strand asymmetry analysis of CG→TA transitions in the WT mtDNA Duplex Sequencing data from Wu et al. (2020). Shown are the log-transformed SNV frequencies (y-axis) of C→T vs G→A mutations on the non-template strand of all genes, separated by genomic region (x-axis). The individual biological replicates are plotted as circles, and group averages are plotted as dashes. P-values show the result of paired Wilcoxon tests comparing the complementary substitution classes in each genomic region. In all but intronic regions, G→A substitutions are significantly higher on the non-template strand (conversely, C→T substitutions are significantly higher on the template strand). Strikingly, in all of the observed CG→TA transitions in the rRNA and tRNA genes, the C→T substitution occurred on the template strand (i.e. all the G→A substitutions occurred on the non-template strand).
Fig. 5.
Strand asymmetry analysis of CG→TA and AT→GC transitions in the msh1 Duplex Sequencing data from Wu et al. (2020). Shown are the log-transformed SNV frequencies (y-axis) of mutations on the non-template strands of all genes (separated by complementary substitution types). The individual biological replicates are plotted as circles, and group averages are plotted as dashes. The panels divide the data by transition type, with AT→GC transitions on the left and CG→TA transitions shown on the right, and by genome, with mitochondrial data on the top and plastid data on the bottom. Transversions were not analyzed because there were relatively few observed mutations of this type in the msh1 duplex data. P-values show the result of paired t-tests comparing the complementary substitution classes in each genomic region.
CG→TA transition frequencies vary depending on trinucleotide context
To understand how surrounding nucleotides impact SNV accumulation in plant organellar genomes, we performed a trinucleotide analysis, again focusing on CG→TA transitions in WT and both transition types in msh1 mutants, due to a lack of data in other substitution classes. In the WT data set (Wu et al. 2020), we found that CG→TA transitions are 8.4-fold more common in the mtDNA and 3.7-fold more common in the cpDNA when the C is 3′ of a pyrimidine (Fig. 6). Interestingly, this same trinucleotide context (5′ pyrimidine) is not enriched for CG→TA transitions in the msh1 mutant data. Instead, CG→TA transitions are 3.0-fold more common when the C is 5′ of a G in the msh1 mutants (Fig. 7, right panels). Meanwhile, AT→GC transitions are 1.8-fold more common when the A is 5′ of a C (Fig. 7, left panels). In all cases, these trinucleotide mutation frequencies are normalized by the total coverage of a given trinucleotide context so that the values are not inflated in trinucleotides that are relatively common in the mtDNA.
Fig. 6.
Analysis of the effect of surrounding nucleotides on C→T transition frequencies in the WT Duplex Sequencing data from Wu et al. (2020). The panels divide the data based on genome, with mitochondrial data on the top and plastid data on the bottom. Note the difference in the y-axis scale, as CG→TA transitions were less frequent in the plastid. The x-axis captures the trinucleotide context, with downstream nucleotides displayed next to the 3′ and upstream nucleotides displayed next to the 5′. The data suggest that trinucleotide contexts with upstream pyrimidines (5′ CCN 3′ and 5′ TCN 3′, where N is any nucleotide) have increased frequencies of C→T substitutions.
Fig. 7.
Analysis of the effect of surrounding nucleotides on A→G and C→T transition frequencies in the msh1 Duplex Sequencing data from Wu et al. (2020). The panels divide the data based on substitution type (A→G substitutions on the left and C→T substitutions on the right) and by genome (mitochondrial data on the top and plastid data on the bottom). The x-axis captures the trinucleotide context, with downstream nucleotides displayed next to the 3′ and upstream nucleotides displayed next to the 5′. The A→G data suggest that trinucleotide contexts with downstream Cs (5′ NAC 3′) have increased frequencies of A→G substitutions. The C→T data suggest that trinucleotide contexts with downstream Gs (5′ NCG 3′) have increased frequencies of C→T substitutions.
Chloroplast extractions produced an order of magnitude more nanopore sequencing data than mitochondrial extractions
We next generated long-read Oxford Nanopore libraries to gain a deeper understanding of how the genes in our panel impact plant organellar genome stability. Unexpectedly, the libraries produced from the mitochondrial isolations sequenced poorly compared to the plastid-derived libraries (see Methods), so we investigated cross-organelle contamination (mtDNA molecules in the plastid-derived samples and cpDNA molecules in mitochondrially derived samples) to understand if poor mtDNA sequencing performance was inherent to the mtDNA or associated with differences in the organellar isolation methods. The level of mtDNA contamination in the plastid-derived nanopore libraries is similar to the level of contamination in the Duplex Sequencing libraries (Supplementary Fig. 3). The average median read length of the mitochondrially derived nanopore libraries is about 2.5-fold higher than the average median read length of the plastid-derived libraries (2.48 kb vs 1.08 kb, respectively). In the plastid-derived nanopore libraries, the median lengths of the contaminating mtDNA reads tend to be slightly longer than the median lengths of native cpDNA reads (average median lengths of 1.17 kb vs 0.98 kb, respectively), though there is substantial variation between samples (Supplementary Fig. 4). In the mitochondrially derived libraries, the median read lengths of the contaminating cpDNA and the native mtDNA are more similar to each other (average median lengths of 2.41 kb and 2.56 kb, respectively; Supplementary Fig. 4).
These analyses suggest that the difference in yields for the different nanopore runs is likely related to differences in the organellar isolation methods. One unique feature of the mitochondrial isolation protocol is the use of a DNase I treatment to remove contaminating nuclear and plastid DNA molecules (Wu et al. 2020). It is possible that this treatment results in nicking of the mtDNA that interrupts the molecules as they are threaded through the nanopore in a single-stranded fashion. Such nicking would not be expected to disrupt Duplex Sequencing library creation since the first step of making Duplex Sequencing libraries is to break DNA into small fragments via ultrasonication. However, this explanation is somewhat inconsistent with the 2.5-fold greater median read length in the mitochondrially derived nanopore libraries. Fortunately, the contaminating mtDNA-derived reads in the msh1 and radA cpDNA sequenced samples provided sufficient mtDNA coverage for analyzing structural variation in the mtDNA (Supplementary Table 2 and Fig. 3, left panel).
Repeat-mediated recombination drives distinct patterns of mtDNA instability in msh1, radA, and recA3 mutants
Given the known role of recombination-related genes in maintaining organellar genome copy number and structural stability (Arrieta-Montiel et al. 2009; Davila et al. 2011; Miller-Messmer et al. 2012; Chevigny et al. 2022; Zou et al. 2022), we analyzed the ratio of mutant coverage to WT coverage to characterize structural perturbations on a genome-wide level (Fig. 8). We see distinct variation patterns in the mtDNA coverage in msh1, radA, and recA3 mutants, consistent with the expected structural effects of these genes (Fig. 8) and similar to previously documented coverage patterns (Wu et al. 2020; Chevigny et al. 2022). In contrast, the why2 coverage does not deviate from WT coverage, suggesting there is no substantial and consistent structural effect of losing why2. In recA3, the nanopore and Duplex Sequencing lines are tightly correlated, while the nanopore data tend to show greater variance in the msh1, radA, and why2 plots, perhaps because of the lower nanopore coverage in those samples (Supplementary Table 6 and Figs. 6 and 7). Interestingly, radA and recA3 share many major coverage peaks and valleys, suggesting genome structure is perturbed in similar ways in these mutants (Fig. 8; Supplementary Figs. 6 and 7). Compared to the mitochondrial samples, the cpDNA samples display much less coverage variation (Supplementary Fig. 5), with a notable exception in the recA1 nanopore data. However, inspection of the coverage in the individual cpDNA replicates (Supplementary Fig. 8) reveals depth irregularities in the WT control compared to the other WT samples. Regardless, the recA1 Duplex Sequencing data do not show any depth variation along the cpDNA, so the nanopore result does not appear to reflect a biological effect on cpDNA structure. One other intriguing pattern in the cpDNA plots is an apparent correlation in peaks and valleys in radA and osb2 in the Duplex Sequencing data (most notable is the shared valley at 112 kb).
However, inspection of the individual recA1 mutant and matched WT control replicates (Supplementary Fig. 9) reveals that all samples have a dip at 112 kb and the dip is more pronounced in 1 or more of the osb2 and radA mutants. Given the large number of PCR cycles used to amplify the Duplex Sequencing libraries (19 cycles), the unified movement of all replicates is likely explained in part by amplification bias in AT- or GC-rich regions. Therefore, variation in amplification bias may result in lower coverage of AT- or GC-rich regions, so these patterns are likely not biological.
Fig. 8.
Normalized coverage of mitochondrial genomes in mutant lines of interest. Coverage of each Duplex Sequencing (red) or nanopore (blue) library was calculated in 1,000-bp windows. Mutant coverage was pooled and divided by WT coverage, and the resulting ratios were normalized to 1 for plotting. The total amount of sequencing data used to generate each plot is shown in the top left corner of each panel (red, Duplex Sequencing; blue, nanopore) and is included to highlight the instances where disagreement between the Duplex Sequencing and nanopore lines may be explained by increased variance in the nanopore sample due to lower mtDNA coverage. Repeats that are likely important for driving coverage variation across the mtDNA are plotted above (also see Table 1) according to Fig. 6 of Chevigny et al. (2022). Regions with altered stoichiometry and flanked by repeats are shown as colored blocks, as in Fig. 6 of Chevigny et al. (2022).
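The windowed coverage ratio described in this caption can be sketched as follows. This is a minimal illustration and not the code used to draw the figure: the function names and the per-base depth inputs are hypothetical, and "normalized to 1" is interpreted here as rescaling the ratios so that they average to 1, which is an assumption.

```python
from statistics import mean

def window_coverage(depths, window=1000):
    """Mean depth in consecutive, non-overlapping windows of `window` bp."""
    return [mean(depths[i:i + window]) for i in range(0, len(depths), window)]

def normalized_ratio(mutant_windows, wt_windows):
    """Mutant/WT ratio per window, rescaled so the ratios average to 1.

    Assumption: "normalized to 1" is read as dividing by the mean ratio.
    Windows with zero WT coverage are reported as None.
    """
    ratios = [m / w if w > 0 else None for m, w in zip(mutant_windows, wt_windows)]
    center = mean(r for r in ratios if r is not None)
    return [r / center if r is not None else None for r in ratios]

# Hypothetical pooled per-base depths for a 3-kb toy genome (not real data).
mutant = [20] * 1000 + [60] * 1000 + [40] * 1000
wt = [10] * 1000 + [10] * 1000 + [10] * 1000
ratios = normalized_ratio(window_coverage(mutant), window_coverage(wt))
print(ratios)
```

In this toy case the middle window is overrepresented in the mutant relative to WT and the first window is underrepresented, which is the kind of peak and valley pattern plotted in the figure.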
We analyzed the nanopore reads for evidence of repeat-mediated recombination. To do so, we calculated recombination frequencies for each repeat pair as the count of nanopore reads that recombined at a given repeat [according to the BLASTn alignments generated by HiFi-SR (Zou et al. 2022)] divided by the total number of reads that mapped to the repeat. Table 1 shows the 5 repeats with the highest recombination frequency for each mutant genotype and the matched WT controls. Figure 9 shows examples of how the long nanopore reads map to the mitochondrial genome following recombination at inverted (Fig. 9a) or direct repeats (Fig. 9, b and c).
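The per-repeat calculation described above amounts to a simple ratio, sketched below for a single library. The function name and the example counts are hypothetical (they are not taken from Table 1), and the sketch does not address how frequencies are combined across replicate libraries.

```python
def recombination_frequency(recombined_reads, total_reads):
    """Fraction of repeat-mapping reads that show recombination at that repeat."""
    if total_reads == 0:
        return None  # no reads map to this repeat, so no estimate is possible
    return recombined_reads / total_reads

# Hypothetical counts for two repeat pairs: (recombined reads, total reads).
counts = {"repeat_1": (30, 120), "repeat_2": (4, 400)}
freqs = {name: recombination_frequency(rec, tot) for name, (rec, tot) in counts.items()}

# Rank repeats from most to least recombinationally active.
top = sorted(freqs, key=freqs.get, reverse=True)
print(freqs, top)
```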
Table 1.
Repeat-specific recombination frequencies at the 5 most recombinationally active mtDNA repeats for each genotype.
| Genotype | Recombined reads | Total repeat-spanning reads | Recomb. freq. | Repeat name | Repeat pair coordinates | Percent ID | Length |
|---|---|---|---|---|---|---|---|
| msh1 | 49 | 178 | 0.284 | B | 41,464–41,999, 321,967–321,431 | 99.81 | 537 |
| msh1 | 51 | 199 | 0.268 | A* | 19,682–20,237, 346,208–346,763 | 99.82 | 556 |
| msh1 | 40 | 157 | 0.256 | G | 30,938–31,272, 271,395–271,061 | 99.40 | 335 |
| msh1 | 36 | 171 | 0.242 | MMJS | 134,427–135,193, 257,452–258,143 | 88.66 | 767 |
| msh1 | 48 | 203 | 0.222 | D | 6,118–6,569, 84,540–84,089 | 97.79 | 452 |
| radA | 94 | 125 | 0.692 | L* | 270,775–271,023, 331,877–332,125 | 100 | 249 |
| radA | 135 | 262 | 0.476 | A* | 19,682–20,237, 346,208–346,763 | 99.82 | 556 |
| radA | 201 | 529 | 0.4 | EE* | 65,547–65,673, 73,611–73,737 | 99.21 | 127 |
| radA | 124 | 284 | 0.357 | F* | 206,095–206,444, 246,766–247,115 | 100 | 350 |
| radA | 43 | 258 | 0.144 | X | 288,315–288,518, 306,969–307,174 | 97.57 | 206 |
| recA3 | 198 | 907 | 0.227 | L* | 270,775–271,023, 331,877–332,125 | 100 | 249 |
| recA3 | 210 | 1384 | 0.168 | EE* | 65,547–65,673, 73,611–73,737 | 99.21 | 127 |
| recA3 | 159 | 1019 | 0.149 | F* | 206,095–206,444, 246,766–247,115 | 100 | 350 |
| recA3 | 88 | 770 | 0.116 | A* | 19,682–20,237, 346,208–346,763 | 99.82 | 556 |
| recA3 | 67 | 1111 | 0.06 | I* | 30,442–30,722, 255,122–254,842 | 99.64 | 281 |
| why2 | 1 | 274 | 0.042 | Unnamed | 239,143–239,268, 263,789–263,905 | 91.27 | 126 |
| why2 | 5 | 256 | 0.007 | A* | 19,682–20,237, 346,208–346,763 | 99.82 | 556 |
| why2 | 5 | 272 | 0.007 | F* | 206,095–206,444, 246,766–247,115 | 100 | 350 |
| why2 | 5 | 260 | 0.007 | L* | 270,775–271,023, 331,877–332,125 | 100 | 249 |
| why2 | 3 | 219 | 0.005 | D | 6,118–6,569, 84,540–84,089 | 97.79 | 452 |
| WT | 23 | 902 | 0.093 | A* | 19,682–20,237, 346,208–346,763 | 99.82 | 556 |
| WT | 10 | 858 | 0.057 | L* | 270,775–271,023, 331,877–332,125 | 100 | 249 |
| WT | 13 | 931 | 0.055 | B | 41,464–41,999, 321,967–321,431 | 99.81 | 537 |
| WT | 6 | 1050 | 0.041 | C | 36,362–36,824, 144,409–143,947 | 99.57 | 463 |
| WT | 11 | 933 | 0.04 | MMJS | 134,427–135,193, 257,452–258,143 | 88.66 | 767 |
Listed are the 5 most active repeats for each genotype, ordered by the recombination frequency within each genotype. Repeat names were sourced from Table S11 of Zou et al. (2022). For the msh1 mtDNA analysis, we relied exclusively on the plastid-derived msh1 samples, and for the radA mtDNA analysis, we used a combination of the low coverage radA mitochondrial samples and the plastid radA samples (see main text). For the WT comparison, we took the average across the single matched WT libraries that were sequenced with each mutant line, including msh1 and radA WT plastid samples (Supplementary Table 2). The repeats that are also plotted in Fig. 8 are denoted with an asterisk. Repeats which are among the top 5 most active repeats in more than 1 genotype are in bold. Repeat-specific recombination frequencies that exceed 0.1 are shown in bold, and note that none of the WT or why2 repeat-specific recombination frequencies meet this threshold.
Fig. 9.
Examples of 3 nanopore reads from radA mitochondrial replicate 1 that capture repeat-mediated recombination. Nanopore reads that derive from recombination between inverted repeats map with 2 hits, one in the forward orientation and the other in the reverse orientation, both flanked by the sequence of a repeat, as shown in a) where the 29-kb read is flanked by repeats I-1 and I-2. Recombination between direct repeats results in 2 hits in the same orientation with a deletion of the intervening sequence b). The alternative product of recombination between direct repeats is the production of a small circular molecule. We identified a number of putative circular molecules or tandem duplications mediated by recombination between repeats EE-1 and EE-2, which map with 2 hits in the same orientation, but with a section of the end of the read mapping in front of the end of the read c).
We calculated genome-wide recombination frequencies for the mtDNA by summing across repeats with at least 10 recombining reads (Supplementary File 2). The threshold was lowered to repeats with at least 3 recombining reads in the cpDNA given the smaller number of recombining reads observed in the cpDNA (Supplementary File 3). We found significant differences in the frequency of mtDNA rearrangements among the WT and mutant lines (1-way ANOVA, P = 1.5 × 10⁻⁸; Fig. 10), which were driven by increases in recombination frequency in msh1, radA, and recA3 compared to WT (Tukey pairwise comparison, P = 3.0 × 10⁻⁷, 2.0 × 10⁻⁷, and 0.02, respectively). In contrast, there was no mtDNA recombination frequency difference between why2 mutants and WT samples (Tukey pairwise comparison, P = 0.99).
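One possible reading of the genome-wide calculation is sketched below. The text does not state whether read counts or per-repeat frequencies are summed, so pooling read counts before dividing is an assumption, as are the function name and the example counts.

```python
def genome_wide_frequency(repeat_counts, min_recombined=10):
    """Pooled recombination frequency over repeats passing a read-count threshold.

    repeat_counts: iterable of (recombined_reads, total_reads), one per repeat pair.
    Assumption: counts are summed across the retained repeats before dividing.
    """
    kept = [(rec, tot) for rec, tot in repeat_counts if rec >= min_recombined]
    total = sum(tot for _, tot in kept)
    if total == 0:
        return None  # no repeat passed the threshold
    return sum(rec for rec, _ in kept) / total

# Hypothetical per-repeat counts for one library.
library = [(40, 200), (12, 300), (3, 250)]
mt_freq = genome_wide_frequency(library)                    # threshold of 10, as for mtDNA
cp_freq = genome_wide_frequency(library, min_recombined=3)  # threshold of 3, as for cpDNA
print(mt_freq, cp_freq)
```

Lowering the threshold admits the third repeat, which has few recombining reads relative to its coverage, so the pooled frequency drops in this toy example.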
Fig. 10.
Frequency of repeat-mediated structural variants in the nanopore data. The individual biological replicates are plotted as circles with the size of the circle scaled by the number of repeats that are covered in the nanopore alignments. Closed circles are the libraries from mitochondrial extractions, while the open circles are libraries from the plastid extractions. In some cases, cpDNA extractions were used to harvest contaminating mtDNA-mapping reads because of low yield from direct sequencing of the mtDNA extractions. Group averages are plotted as dashes. Mutants are plotted in red, while WT samples are plotted in black. Letters represent statistically significant groupings according to Tukey pairwise comparisons on a 1-way ANOVA (P < 0.001). There were no differences among plastid genotypes.
We found that different repeats apparently become active in different mutant backgrounds as evidenced by a 2-way ANOVA with a significant interaction between genotype and repeat (P < 2.0 × 10⁻¹⁶). Because our analysis focuses on reads with 2 or fewer BLASTn hits, we may have underestimated global recombination frequencies, especially in mutant backgrounds, as a PacBio HiFi study found that reads with 3 or more BLASTn hits (which arise when reads span 2 or more repeats that have recombined) comprise 0.34 and 8.69% of all reads in WT and msh1, respectively (Zou et al. 2022). Consistent with previous characterization of repeat-mediated recombination in plant mtDNAs (Arrieta-Montiel et al. 2009; Davila et al. 2011; Miller-Messmer et al. 2012; Chevigny et al. 2022; Zou et al. 2022), we found that repeat length and percent identity are also predictive of recombination frequency through a 3-way ANCOVA with repeat length and percent identity as continuous variables (P = 1.8 × 10⁻¹² and 1.4 × 10⁻⁶, respectively) and genotype as a categorical variable (P = 2.0 × 10⁻²²).
There were no significant differences in repeat-mediated recombination between any of the cpDNA mutants (msh1, radA, osb2, and recA1) compared to the WT samples (1-way ANOVA, P = 0.849; Fig. 10), even though disruption of some of these genes is known to lead to destabilization of the plastid genome (Rowan et al. 2010; Xu et al. 2011; Zampini et al. 2015; Zou et al. 2022). The inability to detect such effects here likely reflects the fact that the frequencies of plastid structural variants are much lower than in mtDNAs because of a general lack of intermediate-sized repeats in Arabidopsis cpDNA. We identified no insertions above 10 nts in the HiFi-SR variant calls (after requiring at least 2 nanopore reads to support a putative insertion) and only a single cpDNA deletion of 106 bp in msh1 mutant replicate 2, which was supported by 18 independent nanopore reads (cpDNA position 148490–148596). The paucity of inferred indels in the HiFi-SR calls likely stems from the high error rate of nanopore sequencing (including rampant artifactual insertions and deletions) in combination with our criteria of 2 (or more) nanopore reads supporting the exact same indel (same position and length).
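The support criterion described above (at least 2 reads agreeing on the same position and length) can be illustrated with a small filter. This is a sketch and not the HiFi-SR implementation: the input tuple layout and function name are hypothetical, and applying the size cutoff to deletions as well as insertions is an assumption.

```python
from collections import defaultdict

def supported_indels(calls, min_reads=2, min_length=10):
    """Return indels supported by at least `min_reads` distinct reads.

    calls: iterable of (read_id, kind, position, length), kind is "ins" or "del".
    Indels are keyed by (kind, position, length), so reads must agree exactly.
    Assumption: the size cutoff (length > min_length) applies to both kinds.
    """
    reads_by_indel = defaultdict(set)
    for read_id, kind, position, length in calls:
        if length > min_length:
            reads_by_indel[(kind, position, length)].add(read_id)
    return {key: len(reads) for key, reads in reads_by_indel.items()
            if len(reads) >= min_reads}

# Hypothetical per-read indel calls.
calls = [
    ("read1", "del", 5000, 106),
    ("read2", "del", 5000, 106),
    ("read3", "del", 5001, 106),  # shifted by 1 bp, so counted as a different indel
    ("read4", "ins", 7000, 12),   # only one supporting read
    ("read5", "ins", 9000, 4),    # below the size cutoff
    ("read6", "ins", 9000, 4),
]
kept = supported_indels(calls)
print(kept)
```

The third call shows why exact matching is conservative with error-prone reads: a 1-bp shift in the inferred breakpoint is enough to split support for what may be the same underlying variant.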
Discussion
Potential causes of elevated organellar mutation rates in lines with disrupted recombination machinery
By utilizing highly accurate Duplex Sequencing for point mutation detection and long-read Oxford Nanopore sequencing for structural variant detection, we have characterized the overall organellar mutational dynamics in A. thaliana lines lacking genes with roles in organellar genome recombination. The increases in point mutations we observed in radA, recA3, and recA1 are much smaller than the effects previously observed in msh1 mutants (Wu et al. 2020) where mutants experience 6.0-fold and 116.5-fold increases in SNVs (in mtDNA and cpDNA, respectively) and 86.6-fold and 790.6-fold increases in indels (in mtDNA and cpDNA, respectively). In contrast, radA mutants incurred 2.6-fold and 12.6-fold more mtDNA and cpDNA SNVs (respectively) and 5.1-fold and 3.1-fold more mtDNA and cpDNA indels (respectively) than the matched WT controls. The point mutation increases in recA3 and recA1 were even smaller than in the radA mutants. One complication with directly comparing the mutant vs WT fold changes across the newly generated mutant lines compared to those generated in Wu et al. (2020) is the decrease in WT mutation rates in the new genes (Fig. 2). Because of the shift in the baseline WT rates, the numbers cited above may actually underestimate the gap in effect size between msh1 and the newly analyzed genes.
The point mutation increases in msh1 mutants have clear mechanistic explanations which were first predicted based on the MSH1 mismatch recognition and GIY-YIG endonuclease domains (Christensen 2014; Wu et al. 2020). In contrast, given that RADA, RECA3, and RECA1 are all thought to function in the resolution of recombination intermediates, it is more difficult to explain the mechanisms responsible for increased point mutations in these lines. One possibility is that in the absence of 1 recombination pathway, recombining molecules are shuttled into an alternative, less faithful recombination pathway. For example, in mutant lines deficient in HR, DSBs may be repaired via error-prone MMEJ, which could drive increases in indels and SNVs (Waters et al. 2014; García-Medel et al. 2019). Evidence suggests that RADA functions as the principal branch migration factor in a primary mtDNA and cpDNA HR pathway, while RECA3 may fill the same role as RADA in a partially redundant and less utilized mtDNA-specific HR pathway (Chevigny et al. 2022). Interestingly, RECA2 is thought to initiate recombination in both pathways and is essential in plants (Miller-Messmer et al. 2012; Chevigny et al. 2022). The larger SNV and indel increases in the radA mutants than in the recA3 mutants may reflect the relative utilization (and importance) of these 2 partially redundant HR pathways (Chevigny et al. 2022). Similarly, previous studies have documented increased MMEJ in cpDNA of recA1 mutants (Zampini et al. 2015), which is consistent with the significant increase in indels and marginally significant increase in SNVs reported here (Fig. 1). One avenue for dissecting these complex relationships among pathways will be to perform similar studies on higher-order mutants with disruptions in multiple pathways.
Another possibility is that the rise in point mutations is an indirect effect of increased repeat-mediated recombination and its associated harm to organelle function. Increased recombination between short repeat sequences may disrupt genes, organellar genome stoichiometry, and organellar genome replication, which is recombination-dependent in plants (Shedge et al. 2007; Rowan et al. 2010; Chevigny et al. 2020). Plant organellar genomes encode proteins necessary for the electron transport chains of respiration and photosynthesis and disruption of these pathways can result in the excess production of DNA damaging reactive oxygen species (ROS; Liu et al. 2021). Although a direct link between ROS-mediated damage to DNA and mutation rates remains contentious (Kennedy et al. 2013; Itsara et al. 2014; Broz et al. 2021; Sanchez-Contreras et al. 2021; Waneka et al. 2021), ROS molecules have been shown to indirectly affect point mutation rates by impairing proofreading capabilities via damage to the metazoan mtDNA polymerase (Pol γ; Anderson et al. 2020). Impairment of organellar function is also consistent with phenotypic growth defects in radA, which include retarded development and distorted leaves with chlorotic sectors (Chevigny et al. 2022).
Potential explanations of mutational biases based on DNA strand asymmetry and flanking nucleotides
We found that SNVs in the msh1 mutants and WT plants from Wu et al. (2020) had biased distributions in terms of strand (non-template vs template) and trinucleotide context. Such patterns are useful for understanding the underlying mechanisms driving mutation formation (Haradhvala et al. 2016; Sun et al. 2018; Moeckel et al. 2023). For example, CG→TA strand asymmetries documented in diverse metazoan mtDNAs have been proposed to result from the 2 DNA strands experiencing unequal time in single-stranded states during mtDNA replication, since single-stranded DNA is more vulnerable to cytosine deamination (a primary driver of CG→TA transitions) (Kennedy et al. 2013; Itsara et al. 2014; Arbeithuber et al. 2020; Sanchez-Contreras et al. 2021; Waneka et al. 2021). In mammals, C→T substitutions are ∼10-fold more common than G→A substitutions on the mtDNA heavy strand (H-strand), which likely spends more time in a single-stranded state as the mtDNA is copied via a strand-asynchronous replication mechanism (Kennedy et al. 2013; Arbeithuber et al. 2020). Further, the C→T substitutions form 2 gradients starting at the 2 H-strand origins of replication, consistent with the regions closest to the origin being single stranded for longer (Sanchez-Contreras et al. 2021).
The substantial CG→TA strand asymmetries we observed in the mtDNA of the Wu et al. (2020) WT libraries are unlikely to be explained by replication mechanisms given that plant mtDNAs lack discrete origins of replication or dedicated “leading and lagging” strands (alternatively referred to as light and heavy strands, respectively, in some systems) and instead rely on recombination-mediated replication (Gualberto and Newton 2017; Brieba 2019; Chevigny et al. 2020). Instead, our strand asymmetry analysis focused on genic regions, motivated by well-established patterns of more C→T than G→A substitutions on non-template strands which spend more time in exposed single-stranded states during transcription (Haradhvala et al. 2016; Vöhringer et al. 2021; Moeckel et al. 2023). Surprisingly, we found an opposite pattern with template strands exhibiting far more C→T than G→A substitutions (Fig. 4). This effect was especially pronounced in rRNA and tRNA genes where the C→T substitutions occurred on the template strand in all 32 observed CG→TA transitions. An enrichment of C→T substitutions on template strands also occurred in the mtDNA (but not the cpDNA) of the msh1 mutants, though there was less power for detecting statistically significant effects (Fig. 5). The overabundance of A→G compared to T→C substitutions in msh1 mutant cpDNA template strands also occurs in the opposite direction of predicted effects given that the non-template strand is again expected to experience increased adenine deamination (which leads to A→G substitutions; Mugal et al. 2009; Sanchez-Contreras et al. 2021).
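To make the template vs non-template bookkeeping concrete, the sketch below assigns each CG→TA transition to the strand that carries the C→T change. It is an illustration of the general logic and not the analysis code: the function names and input layout are hypothetical, and substitutions are assumed to be reported on the forward (reference) strand together with the orientation of the overlapping gene.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def template_strand_change(ref, alt, gene_strand):
    """Express a forward-strand substitution as it appears on the template strand.

    gene_strand is "+" when the gene's coding (non-template) strand is the
    forward strand and "-" when it is the reverse strand.
    """
    if gene_strand == "+":
        # The template is the reverse strand, so complement both bases.
        return COMPLEMENT[ref], COMPLEMENT[alt]
    if gene_strand == "-":
        # The template is the forward strand, so nothing needs to change.
        return ref, alt
    raise ValueError("gene_strand must be '+' or '-'")

def count_ct_by_strand(snvs):
    """Count CG->TA transitions by the strand carrying the C->T change."""
    counts = {"template": 0, "non_template": 0}
    for ref, alt, gene_strand in snvs:
        change = template_strand_change(ref, alt, gene_strand)
        if change == ("C", "T"):
            counts["template"] += 1
        elif change == ("G", "A"):
            # G->A on the template strand is C->T on the non-template strand.
            counts["non_template"] += 1
    return counts

# Hypothetical SNVs: (reference base, alternate base, gene strand).
snvs = [("G", "A", "+"), ("C", "T", "-"), ("C", "T", "+"), ("A", "G", "+")]
strand_counts = count_ct_by_strand(snvs)
print(strand_counts)
```

Under the transcription-based expectation, the non-template count would exceed the template count; the pattern reported here is the reverse.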
Enrichment of C→T and A→G substitutions on template strands is puzzling, and to our knowledge, there are no other instances where this widespread transcriptional asymmetry has been reversed (Mugal et al. 2009; Moeckel et al. 2023). Reversals in strand asymmetries have been reported in metazoan mitochondrial genomes, but in these cases, the asymmetries are replication based, and the reversals are preceded by an inversion of the origin of replication, effectively switching the leading and lagging strands (Wei et al. 2010). It is notable that the WT CG→TA asymmetries are most pronounced in the rRNA and tRNA genes (Fig. 4), which are likely more highly expressed than the protein-coding genes. Increases in transcription have been shown to drive genomic instability in the A. thaliana cpDNA due to the increased formation of R-loops (RNA/DNA hybrids formed by displacement of the other DNA strand), which stall replication forks and lead to DSBs (Pérez Di Giorgio et al. 2019). It is possible that increased mtDNA expression also leads to the formation of R-loops and DSBs which may then be repaired through error-prone MMEJ. However, it is not clear how this would drive strand asymmetric mutation. Further, such a mechanism is not consistent with the relatively even distribution of SNVs across intergenic vs transcribed regions of the genome (Fig. 3). The magnitude of the CG→TA asymmetries is decreased in the msh1 mutants (roughly 2-fold averaging across all genic sequences) compared to in the WT controls (roughly 6-fold). This shift may reflect a larger proportional contribution of mutations from simple DNA polymerase misincorporation errors (which are not expected to be strand-biased) in the absence of MSH1 activity.
The CG→TA transitions in the WT lines and both transitions in the msh1 mutants were also impacted by the identity of neighboring nucleotides (Figs. 6 and 7). Trinucleotide effects have previously been implicated in biasing the distribution of mutations in the A. thaliana nuclear genome (Lu et al. 2021) as well as in the mtDNAs of various metazoans (Itsara et al. 2014; Arbeithuber et al. 2020; Sanchez-Contreras et al. 2021; Waneka et al. 2021). It is noteworthy that the specific trinucleotides associated with CG→TA transitions differ between WT and msh1 mutants. The 5′ YCN signature (where Y is any pyrimidine and N is any nucleotide) in the WT lines is similar to that induced by APOBEC3-mediated cytosine deamination in human cell lines (Carpenter et al. 2023), though plants lack APOBEC enzymes so the relevance of this shared pattern is unclear. Meanwhile, the 5′ NCG signature in the msh1 mutants is consistent with spontaneous water-mediated cytosine deamination (Carpenter et al. 2023).
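The trinucleotide contexts discussed above (for example 5′ NCG 3′ or 5′ NAC 3′) can be extracted from a reference sequence as sketched below. The function names and toy sequence are hypothetical, and orienting every context so that the central base is A or C (by reverse-complementing sites with a central G or T) is an assumed convention chosen to match how the contexts are written in the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def trinucleotide_context(reference, pos):
    """5'->3' trinucleotide around 0-based `pos`, oriented so the center is A or C.

    Returns None at the sequence ends, where one flanking base is missing
    (circular genomes are not handled in this sketch).
    """
    if pos < 1 or pos > len(reference) - 2:
        return None
    tri = reference[pos - 1:pos + 2].upper()
    if tri[1] in "GT":
        # Report the site from the opposite strand so the center is C or A.
        tri = reverse_complement(tri)
    return tri

reference = "TTACGGA"  # toy sequence, not real data
contexts = [trinucleotide_context(reference, i) for i in range(len(reference))]
print(contexts)
```

Here position 3 (a C followed by G) yields ACG, an instance of the 5′ NCG 3′ context, and position 2 (an A followed by C) yields TAC, an instance of 5′ NAC 3′.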
Patterns of repeat-mediated recombination differ among mutant lines
The repeat-mediated mtDNA recombination activity we documented in the msh1, radA, and recA3 mutants is consistent with the previously documented recombination increases of these mutant backgrounds (Shedge et al. 2007; Arrieta-Montiel et al. 2009; Rowan et al. 2010; Davila et al. 2011; Miller-Messmer et al. 2012; Zampini et al. 2015; Wu et al. 2020; Chevigny et al. 2022; Zou et al. 2022). The absence of an effect in the why2 mutants is interesting given that why2 is the most abundant protein in mitochondrial nucleoids (Fuchs et al. 2020) and plants lacking why2 display aberrant mitochondrial morphology (Golin et al. 2020; Negroni et al. 2024). On the other hand, this result is consistent with a previous study that showed why2 mutants become more recombinationally active than WT under increased genotoxic stress (ciprofloxacin treatment) but showed no recombinational difference from WT under “normal” growth conditions (Cappadocia et al. 2010; Negroni et al. 2024).
Though msh1, radA, and recA3 are all required for the suppression of repeat-mediated recombination in mtDNA, these proteins likely function either in independent HR pathways (radA, recA3) or in different ways (msh1). As noted, RECA3 is thought to facilitate branch migration in an HR pathway that may be relatively minor compared to the one in which RADA functions (Chevigny et al. 2022). Previous studies of recA3/msh1 and recA3/radA double mutants have shown that the double mutants are more recombinationally active than recA3 single mutants (Shedge et al. 2007), supporting the hypothesis that RECA3-mediated HR is at least partially independent of RADA-mediated HR (Miller-Messmer et al. 2012; Chevigny et al. 2022). This model is supported by the greater increase in global recombination frequency in radA compared to recA3 (Fig. 10). We might also expect different repeats to become active in recA3 compared to radA mutants. However, as seen in Table 1, there is substantial overlap in the repeats with increased recombination frequencies in these mutants, though the extremely high recombination frequency at repeat L in radA is one major difference. Meanwhile, MSH1 has been proposed to suppress non-allelic recombination by recognizing and rejecting mismatches in the invading strand during heteroduplex formation (Christensen 2018; Broz et al. 2022), which could be a shared feature in both RADA- and RECA3-dependent HR pathways. Supporting this idea, there is an increased number of repeats that become active in msh1 mutants compared to radA and recA3 mutants. Specifically, there are 12 repeat pairs with a recombination frequency greater than 0.1 in msh1 mutants but only 4 and 9 repeat pairs that meet this threshold in recA3 and radA mutants, respectively (Supplementary File 2).
Given that recombination is activated differently between the mutants (Fig. 8), the high degree of repeatability between replicates is fascinating (Supplementary Figs. 5–8). These repeatable patterns rely on consistent activation of distinct repeat pairs and/or consistent maintenance/replication of certain recombination products. Understanding why different repeats become active and how these patterns relate to the increase in point mutations reported here remains an important unanswered question in the field of plant organellar genome maintenance.
Supplementary Material
Contributor Information
Gus Waneka, Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Amanda K Broz, Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Forrest Wold-McGimsey, Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Yi Zou, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, GD 518120, China.
Zhiqiang Wu, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, GD 518120, China.
Daniel B Sloan, Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
Data availability
The Duplex Sequencing and Oxford Nanopore reads were deposited to the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1113549. The Duplex Sequencing data were analyzed with our previously published pipeline (Wu et al. 2020; https://github.com/dbsloan/duplexseq) with additional analyses described using code available via https://github.com/dbsloan/recomb_mutant_seq, which also contains the R code used to create the figures in this paper.
Supplemental material available at G3 online.
Funding
This work was supported by the National Institutes of Health (NIGMS R35GM148134).
Literature cited
- Abdelnoor RV, Christensen AC, Mohammed S, Munoz-Castillo B, Moriyama H, Mackenzie SA. 2006. Mitochondrial genome dynamics in plants and animals: convergent gene fusions of a MutS homologue. J Mol Evol. 63(2):165–173. doi: 10.1007/s00239-005-0226-9. [DOI] [PubMed] [Google Scholar]
- Abdelnoor RV, Yule R, Elo A, Christensen AC, Meyer-Gauen G, Mackenzie SA. 2003. Substoichiometric shifting in the plant mitochondrial genome is influenced by a gene homologous to MutS. Proc Natl Acad Sci U S A. 100(10):5968–5973. doi: 10.1073/pnas.1037651100. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Alverson AJ, Zhuo S, Rice DW, Sloan DB, Palmer JD. 2011. The mitochondrial genome of the legume Vigna radiata and the analysis of recombination across short mitochondrial repeats. PLoS One. 6(1):e16404. doi: 10.1371/journal.pone.0016404. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Anderson AP, Luo X, Russell W, Yin YW. 2020. Oxidative damage diminishes mitochondrial DNA polymerase replication fidelity. Nucleic Acids Res. 48(2):817–829. doi: 10.1093/nar/gkz1018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Arbeithuber B, Hester J, Cremona MA, Stoler N, Zaidi A, Higgins B, Anthony K, Chiaromonte F, Diaz FJ, Makova KD. 2020. Age-related accumulation of de novo mitochondrial mutations in mammalian oocytes and somatic tissues. PLoS Biol. 18(7):e3000745. doi: 10.1371/journal.pbio.3000745. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Arrieta-Montiel MP, Shedge V, Davila J, Christensen AC, Mackenzie SA. 2009. Diversity of the Arabidopsis mitochondrial genome occurs via nuclear-controlled recombination activity. Genetics. 183(4):1261–1268. doi: 10.1534/genetics.109.108514.
- Ayala-García VM, Baruch-Torres N, García-Medel PL, Brieba LG. 2018. Plant organellar DNA polymerases paralogs exhibit dissimilar nucleotide incorporation fidelity. FEBS J. 285(21):4005–4018. doi: 10.1111/febs.14645.
- Berardini TZ, Reiser L, Li D, Mezheritsky Y, Muller R, Strait E, Huala E. 2015. The Arabidopsis information resource: making and mining the “gold standard” annotated reference plant genome. Genesis. 53(8):474–485. doi: 10.1002/dvg.22877.
- Boore JL. 1999. Animal mitochondrial genomes. Nucleic Acids Res. 27(8):1767–1780. doi: 10.1093/nar/27.8.1767.
- Brieba LG. 2019. Structure-function analysis reveals the singularity of plant mitochondrial DNA replication components: a mosaic and redundant system. Plants. 8(12):533. doi: 10.3390/plants8120533.
- Broz AK, Keene A, Fernandes Gyorfy M, Hodous M, Johnston IG, Sloan DB. 2022. Sorting of mitochondrial and plastid heteroplasmy in Arabidopsis is extremely rapid and depends on MSH1 activity. Proc Natl Acad Sci U S A. 119(34):e2206973119. doi: 10.1073/pnas.2206973119.
- Broz AK, Waneka G, Wu Z, Fernandes Gyorfy M, Sloan DB. 2021. Detecting de novo mitochondrial mutations in angiosperms with highly divergent evolutionary rates. Genetics. 218(1):iyab039. doi: 10.1093/genetics/iyab039.
- Cappadocia L, Maréchal A, Parent J-S, Lepage E, Sygusch J, Brisson N. 2010. Crystal structures of DNA-Whirly complexes and their role in Arabidopsis organelle genome repair. Plant Cell. 22(6):1849–1867. doi: 10.1105/tpc.109.071399.
- Carpenter MA, Temiz NA, Ibrahim MA, Jarvis MC, Brown MR, Argyris PP, Brown WL, Starrett GJ, Yee D, Harris RS. 2023. Mutational impact of APOBEC3A and APOBEC3B in a human cell line and comparisons to breast cancer. PLoS Genet. 19(11):e1011043. doi: 10.1371/journal.pgen.1011043.
- Chen C, Li Q, Fu R, Wang J, Xiong C, Fan Z, Hu R, Zhang H, Lu D. 2019. Characterization of the mitochondrial genome of the pathogenic fungus Scytalidium auriculariicola (Leotiomycetes) and insights into its phylogenetics. Sci Rep. 9(1):17447. doi: 10.1038/s41598-019-53941-5.
- Chevigny N, Schatz-Daas D, Lotfi F, Gualberto JM. 2020. DNA repair and the stability of the plant mitochondrial genome. Int J Mol Sci. 21(1):328. doi: 10.3390/ijms21010328.
- Chevigny N, Weber-Lotfi F, Le Blevenec A, Nadiras C, Fertet A, Bichara M, Erhardt M, Dietrich A, Raynaud C, Gualberto JM. 2022. RADA-dependent branch migration has a predominant role in plant mitochondria and its defect leads to mtDNA instability and cell cycle arrest. PLoS Genet. 18(5):e1010202. doi: 10.1371/journal.pgen.1010202.
- Christensen AC. 2014. Genes and junk in plant mitochondria—repair mechanisms and selection. Genome Biol Evol. 6(6):1448–1453. doi: 10.1093/gbe/evu115.
- Christensen AC. 2018. Mitochondrial DNA repair and genome evolution. Ann Plant Rev Online. 50:11–32. doi: 10.1002/9781119312994.apr0544.
- Davila JI, Arrieta-Montiel MP, Wamboldt Y, Cao J, Hagmann J, Shedge V, Xu Y-Z, Weigel D, Mackenzie SA. 2011. Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis. BMC Biol. 9(1):64. doi: 10.1186/1741-7007-9-64.
- Drouin G, Daoud H, Xia J. 2008. Relative rates of synonymous substitutions in the mitochondrial, chloroplast and nuclear genomes of seed plants. Mol Phylogenet Evol. 49(3):827–831. doi: 10.1016/j.ympev.2008.09.009.
- Fields PD, Waneka G, Naish M, Schatz MC, Henderson IR, Sloan DB. 2022. Complete sequence of a 641-kb insertion of mitochondrial DNA in the Arabidopsis thaliana nuclear genome. Genome Biol Evol. 14(5):evac059. doi: 10.1093/gbe/evac059.
- Fuchs P, Rugen N, Carrie C, Elsässer M, Finkemeier I, Giese J, Hildebrandt TM, Kühn K, Maurino VG, Ruberti C, et al. 2020. Single organelle function and organization as estimated from Arabidopsis mitochondrial proteomics. Plant J. 101(2):420–441. doi: 10.1111/tpj.14534.
- García-Medel PL, Baruch-Torres N, Peralta-Castro A, Trasviña-Arenas CH, Torres-Larios A, Brieba LG. 2019. Plant organellar DNA polymerases repair double-stranded breaks by microhomology-mediated end-joining. Nucleic Acids Res. 47(6):3028–3044. doi: 10.1093/nar/gkz039.
- García-Medel PL, Peralta-Castro A, Baruch-Torres N, Fuentes-Pascacio A, Pedroza-García JA, Cruz-Ramirez A, Brieba LG. 2021. Arabidopsis thaliana PrimPol is a primase and lesion bypass DNA polymerase with the biochemical characteristics to cope with DNA damage in the nucleus, mitochondria, and chloroplast. Sci Rep. 11(1):20582. doi: 10.1038/s41598-021-00151-7.
- Golin S, Negroni YL, Bennewitz B, Klösgen RB, Mulisch M, La Rocca N, Cantele F, Vigani G, Lo Schiavo F, Krupinska K, et al. 2020. WHIRLY2 plays a key role in mitochondria morphology, dynamics, and functionality in Arabidopsis thaliana. Plant Direct. 4(5):e00229. doi: 10.1002/pld3.229.
- Gualberto JM, Newton KJ. 2017. Plant mitochondrial genomes: dynamics and mechanisms of mutation. Annu Rev Plant Biol. 68(1):225–252. doi: 10.1146/annurev-arplant-043015-112232.
- Handa H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucleic Acids Res. 31(20):5907–5916. doi: 10.1093/nar/gkg795.
- Haradhvala NJ, Polak P, Stojanov P, Covington KR, Shinbrot E, Hess JM, Rheinbay E, Kim J, Maruvka YE, Braunstein LZ, et al. 2016. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell. 164(3):538–549. doi: 10.1016/j.cell.2015.12.050.
- Huang CY, Grünheit N, Ahmadinejad N, Timmis JN, Martin W. 2005. Mutational decay and age of chloroplast and mitochondrial genomes transferred recently to angiosperm nuclear chromosomes. Plant Physiol. 138(3):1723–1733. doi: 10.1104/pp.105.060327.
- Itsara LS, Kennedy SR, Fox EJ, Yu S, Hewitt JJ, Sanchez-Contreras M, Cardozo-Pelaez F, Pallanck LJ. 2014. Oxidative stress is not a major contributor to somatic mitochondrial DNA mutations. PLoS Genet. 10(2):e1003974. doi: 10.1371/journal.pgen.1003974.
- Kaplanis J, Akawi N, Gallone G, McRae JF, Prigmore E, Wright CF, Fitzpatrick DR, Firth HV, Barrett JC, Hurles ME. 2019. Exome-wide assessment of the functional impact and pathogenicity of multinucleotide mutations. Genome Res. 29(7):1047–1056. doi: 10.1101/gr.239756.118.
- Kennedy SR, Salk JJ, Schmitt MW, Loeb LA. 2013. Ultra-sensitive sequencing reveals an age-related increase in somatic mitochondrial mutations that are inconsistent with oxidative damage. PLoS Genet. 9(9):e1003794. doi: 10.1371/journal.pgen.1003794.
- Kennedy SR, Schmitt MW, Fox EJ, Kohrn BF, Salk JJ, Ahn EH, Prindle MJ, Kuong KJ, Shen J-C, Risques R-A, et al. 2014. Detecting ultralow-frequency mutations by Duplex Sequencing. Nat Protoc. 9(11):2586–2606. doi: 10.1038/nprot.2014.170.
- Kubo T, Newton KJ. 2008. Angiosperm mitochondrial genomes and mutations. Mitochondrion. 8(1):5–14. doi: 10.1016/j.mito.2007.10.006.
- Li H. 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 34(18):3094–3100. doi: 10.1093/bioinformatics/bty191.
- Liu Y, Zhou B, Khan A, Zheng J, Dawar FU, Akhtar K, Zhou R. 2021. Reactive oxygen species accumulation strongly allied with genetic male sterility convertible to cytoplasmic male sterility in kenaf. Int J Mol Sci. 22:1107. doi: 10.3390/ijms22031107.
- Lu Z, Cui J, Wang L, Teng N, Zhang S, Lam H-M, Zhu Y, Xiao S, Ke W, Lin J, et al. 2021. Genome-wide DNA mutations in Arabidopsis plants after multigenerational exposure to high temperatures. Genome Biol. 22(1):160. doi: 10.1186/s13059-021-02381-4.
- Martínez-Zapater JM, Gil P, Capel J, Somerville CR. 1992. Mutations at the Arabidopsis CHM locus promote rearrangements of the mitochondrial genome. Plant Cell. 4(8):889–899. doi: 10.1105/tpc.4.8.889.
- Miller-Messmer M, Kühn K, Bichara M, Le Ret M, Imbault P, Gualberto JM. 2012. RecA-dependent DNA repair results in increased heteroplasmy of the Arabidopsis mitochondrial genome. Plant Physiol. 159(1):211–226. doi: 10.1104/pp.112.194720.
- Moeckel C, Zaravinos A, Georgakopoulos-Soares I. 2023. Strand asymmetries across genomic processes. Comput Struct Biotechnol J. 21:2036–2047. doi: 10.1016/j.csbj.2023.03.007.
- Mower JP, Sloan DB, Alverson AJ. 2012. Plant mitochondrial genome diversity: the genomics revolution. In: Wendel JF, Greilhuber J, Dolezel J, Leitch IJ, editors. Plant Genome Diversity Volume 1: Plant Genomes, Their Residents, and Their Evolutionary Dynamics. Vienna: Springer Vienna. p. 123–144.
- Mugal CF, von Grünberg H-H, Peifer M. 2009. Transcription-induced mutational strand bias and its effect on substitution rates in human genes. Mol Biol Evol. 26(1):131–142. doi: 10.1093/molbev/msn245.
- Negroni YL, Doro I, Tamborrino A, Luzzi I, Fortunato S, Hensel G, Khosravi S, Maretto L, Stevanato P, Lo Schiavo F, et al. 2024. The Arabidopsis mitochondrial nucleoid-associated protein WHIRLY2 is required for a proper response to salt stress. Plant Cell Physiol. 65(4):576–589. doi: 10.1093/pcp/pcae025.
- Palmer JD, Herbon LA. 1988. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J Mol Evol. 28(1–2):87–97. doi: 10.1007/BF02143500.
- Peñafiel-Ayala A, Peralta-Castro A, Mora-Garduño J, García-Medel P, Zambrano-Pereira AG, Díaz-Quezada C, Abraham-Juárez MJ, Benítez-Cardoza CG, Sloan DB, Brieba LG. 2023. Plant organellar MSH1 is a displacement loop specific endonuclease. Plant Cell Physiol. 65:560–575. doi: 10.1093/pcp/pcad112.
- Pérez Di Giorgio JA, Lepage É, Tremblay-Belzile S, Truche S, Loubert-Hudon A, Brisson N. 2019. Transcription is a major driving force for plastid genome instability in Arabidopsis. PLoS One. 14(4):e0214552. doi: 10.1371/journal.pone.0214552.
- Pucker B, Holtgräwe D, Stadermann KB, Frey K, Huettel B, Reinhardt R, Weisshaar B. 2019. A chromosome-level sequence assembly reveals the structure of the Arabidopsis thaliana Nd-1 genome and its gene set. PLoS One. 14(5):e0216233. doi: 10.1371/journal.pone.0216233.
- Quinlan AR, Hall IM. 2010. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 26(6):841–842. doi: 10.1093/bioinformatics/btq033.
- Rowan BA, Oldenburg DJ, Bendich AJ. 2010. RecA maintains the integrity of chloroplast DNA molecules in Arabidopsis. J Exp Bot. 61(10):2575–2588. doi: 10.1093/jxb/erq088.
- Sanchez-Contreras M, Sweetwyne MT, Kohrn BF, Tsantilas KA, Hipp MJ, Schmidt EK, Fredrickson J, Whitson JA, Campbell MD, Rabinovitch PS, et al. 2021. A replication-linked mutational gradient drives somatic mutation accumulation and influences germline polymorphisms and genome composition in mitochondrial DNA. Nucleic Acids Res. 49(19):11103–11118. doi: 10.1093/nar/gkab901.
- Sandor S, Zhang Y, Xu J. 2018. Fungal mitochondrial genomes and genetic polymorphisms. Appl Microbiol Biotechnol. 102(22):9433–9448. doi: 10.1007/s00253-018-9350-5.
- Shedge V, Arrieta-Montiel M, Christensen AC, Mackenzie SA. 2007. Plant mitochondrial recombination surveillance requires unusual RecA and MutS homologs. Plant Cell. 19(4):1251–1264. doi: 10.1105/tpc.106.048355.
- Skippington E, Barkman TJ, Rice DW, Palmer JD. 2017. Comparative mitogenomics indicates respiratory competence in parasitic Viscum despite loss of complex I and extreme sequence divergence, and reveals horizontal gene transfer and remarkable variation in genome size. BMC Plant Biol. 17(1):1–12. doi: 10.1186/s12870-017-0992-8.
- Sloan DB, Alverson AJ, Chuckalovcak JP, Wu M, McCauley DE, Palmer JD, Taylor DR. 2012. Rapid evolution of enormous, multichromosomal genomes in flowering plant mitochondria with exceptionally high mutation rates. PLoS Biol. 10(1):e1001241. doi: 10.1371/journal.pbio.1001241.
- Smith DR, Keeling PJ. 2015. Mitochondrial and plastid genome architecture: reoccurring themes, but significant differences at the extremes. Proc Natl Acad Sci U S A. 112(33):10177–10184. doi: 10.1073/pnas.1422049112.
- Stupar RM, Lilly JW, Town CD, Cheng Z, Kaul S, Buell CR, Jiang J. 2001. Complex mtDNA constitutes an approximate 620-kb insertion on Arabidopsis thaliana chromosome 2: implication of potential sequencing errors caused by large-unit repeats. Proc Natl Acad Sci U S A. 98(9):5099–5103. doi: 10.1073/pnas.091110398.
- Sun S, Li Q, Kong L, Yu H. 2018. Multiple reversals of strand asymmetry in molluscs mitochondrial genomes, and consequences for phylogenetic inferences. Mol Phylogenet Evol. 118:222–231. doi: 10.1016/j.ympev.2017.10.009.
- Vöhringer H, Hoeck AV, Cuppen E, Gerstung M. 2021. Learning mutational signatures and their multidimensional genomic properties with TensorSignatures. Nat Commun. 12(1):3628. doi: 10.1038/s41467-021-23551-9.
- Waneka G, Svendsen JM, Havird JC, Sloan DB. 2021. Mitochondrial mutations in Caenorhabditis elegans show signatures of oxidative damage and an AT-bias. Genetics. 219(2):iyab116. doi: 10.1093/genetics/iyab116.
- Waters CA, Strande NT, Pryor JM, Strom CN, Mieczkowski P, Burkhalter MD, Oh S, Qaqish BF, Moore DT, Hendrickson EA, et al. 2014. The fidelity of the ligation step determines how ends are resolved during nonhomologous end joining. Nat Commun. 5(1):4286. doi: 10.1038/ncomms5286.
- Wei S-J, Shi M, Chen X-X, Sharkey MJ, van Achterberg C, Ye G-Y, He J-H. 2010. New views on strand asymmetry in insect mitochondrial genomes. PLoS One. 5(9):e12708. doi: 10.1371/journal.pone.0012708.
- Wolfe KH, Li WH, Sharp PM. 1987. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc Natl Acad Sci U S A. 84(24):9054–9058. doi: 10.1073/pnas.84.24.9054.
- Wu Z-Q, Liao X-Z, Zhang X-N, Tembrock LR, Broz A. 2022. Genomic architectural variation of plant mitochondria—a review of multichromosomal structuring. J Syst Evol. 60(1):160–168. doi: 10.1111/jse.12655.
- Wu Z, Waneka G, Broz AK, King CR, Sloan DB. 2020. MSH1 is required for maintenance of the low mutation rates in plant mitochondrial and plastid genomes. Proc Natl Acad Sci U S A. 117(28):16448–16455. doi: 10.1073/pnas.2001998117.
- Wynn EL, Christensen AC. 2019. Repeats of unusual size in plant mitochondrial genomes: identification, incidence and evolution. G3 (Bethesda). 9(2):549–559. doi: 10.1534/g3.118.200948.
- Xiao-Ming Z, Junrui W, Li F, Sha L, Hongbo P, Lan Q, Jing L, Yan S, Weihua Q, Lifang Z, et al. 2017. Inferring the evolutionary mechanism of the chloroplast genome size by comparing whole-chloroplast genome sequences in seed plants. Sci Rep. 7(1):1555. doi: 10.1038/s41598-017-01518-5.
- Xu Y-Z, Arrieta-Montiel MP, Virdi KS, de Paula WBM, Widhalm JR, Basset GJ, Davila JI, Elthon TE, Elowsky CG, Sato SJ, et al. 2011. MutS HOMOLOG1 is a nucleoid protein that alters mitochondrial and plastid properties and plant response to high light. Plant Cell. 23(9):3428–3441. doi: 10.1105/tpc.111.089136.
- Zampini É, Lepage É, Tremblay-Belzile S, Truche S, Brisson N. 2015. Organelle DNA rearrangement mapping reveals U-turn-like inversions as a major source of genomic instability in Arabidopsis and humans. Genome Res. 25(5):645–654. doi: 10.1101/gr.188573.114.
- Zou Y, Zhu W, Sloan DB, Wu Z. 2022. Long-read sequencing characterizes mitochondrial and plastid genome variants in Arabidopsis msh1 mutants. Plant J. 112(3):738–755. doi: 10.1111/tpj.15976.
Data Availability Statement
The Duplex Sequencing and Oxford Nanopore reads were deposited in the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1113549. The Duplex Sequencing data were analyzed with our previously published pipeline (Wu et al. 2020; https://github.com/dbsloan/duplexseq). Additional analyses used code available at https://github.com/dbsloan/recomb_mutant_seq, which also contains the R code used to create the figures in this paper.
Supplemental material available at G3 online.