eLife. 2024 Oct 23;12:RP88456. doi: 10.7554/eLife.88456

Somatic mutation rates scale with time not growth rate in long-lived tropical trees

Akiko Satake 1,†, Ryosuke Imai 1, Takeshi Fujino 2, Sou Tomimoto 1, Kayoko Ohta 1, Mohammad Na'iem 3, Sapto Indrioko 3, Widiyatno Widiyatno 3, Susilo Purnomo 4, Almudena Molla Morales 5, Viktoria Nizhynska 5, Naoki Tani 6,7, Yoshihisa Suyama 8, Eriko Sasaki 1, Masahiro Kasahara 2
Editors: Wenfeng Qian 9, Detlef Weigel 10
PMCID: PMC11498935  PMID: 39441734

Abstract

The rates of appearance of new mutations play a central role in evolution. However, mutational processes in natural environments and their relationship with growth rates are largely unknown, particularly in tropical ecosystems with high biodiversity. Here, we examined the somatic mutation landscapes of two tropical trees, Shorea laevis (slow-growing) and S. leprosula (fast-growing), in central Borneo, Indonesia. Using newly constructed genomes, we identified a greater number of somatic mutations in tropical trees than in temperate trees. In both species, we observed a linear increase in the number of somatic mutations with physical distance between branches. However, we found that the rate of somatic mutation accumulation per meter of growth was 3.7-fold higher in S. laevis than in S. leprosula. This difference in the somatic mutation rate scaled with the slower growth rate of S. laevis compared to S. leprosula, resulting in a constant somatic mutation rate per year between the two species. We also found that somatic mutations are neutral within an individual, but those mutations transmitted to the next generation are subject to purifying selection. These findings suggest that somatic mutations accumulate with absolute time and that older trees make a greater contribution towards generating genetic variation.

Research organism: Other

Introduction

Biodiversity ultimately results from mutations that provide genetic variation for organisms to adapt to their environments. However, how and when mutations occur in natural environments is poorly understood (Whitham and Slobodchikoff, 1981; Gill et al., 1995; Schoen and Schultz, 2019). Recent genomic data from long-lived multicellular species have begun to uncover the somatic genetic variation and the rate of naturally occurring mutations (Yu et al., 2020; Reusch et al., 2021). The rate of somatic mutations per year in a 234-year-old oak tree has been found to be surprisingly low (Schmid-Siegert et al., 2017) compared to the rate in an annual herb (Ossowski et al., 2010). Similar analyses in other long-lived trees have also shown low mutation rates in both broadleaf trees (Plomion et al., 2018; Wang et al., 2019; Orr et al., 2020; Hofmeister et al., 2020; Duan et al., 2022) and conifers (Hanlon et al., 2019). Despite the growing body of knowledge of somatic mutation landscapes in temperate regions, there is currently no knowledge of the somatic mutation landscapes in organisms living in tropical ecosystems, which are among the most diverse biomes on Earth.

Mutations can arise from errors during replication (Reijns et al., 2015), or from DNA damage caused by exogenous mutagens or endogenous reactions at any time during cell growth (Gao et al., 2016). While DNA replication errors have long been assumed to be major sources of mutations (Makova and Li, 2002; Tomasetti and Vogelstein, 2015), a modeling study that relates the mutation rate to rates of DNA damage, repair and cell division (Gao et al., 2016) and experimental studies in yeast (Liu and Zhang, 2019), human (Abascal et al., 2021), and other animals (de Manuel et al., 2022) have shown the importance of mutagenic processes that do not depend on cell division. Consequently, it remains largely unknown which source of mutations, whether replicative or non-replicative, predominates in naturally growing organisms.

To investigate the rates and patterns of somatic mutation and their relation to growth rates in tropical organisms, we studied the somatic mutation landscapes of slow- and fast-growing tropical trees in a humid tropical rain forest of Southeast Asia. By comparing the somatic mutation landscape between slow- and fast-growing species in a tropical ecosystem, we can gain insights into the mutagenesis that occurs in a natural setting. This comparison provides a unique opportunity to understand the impact of growth rate on somatic mutations and its potential role in driving evolutionary processes.

Results

Detecting somatic mutations in slow- and fast-growing tropical trees

The humid tropical rainforests of Southeast Asia are characterized by a preponderance of trees of the Dipterocarpaceae family (Ghazoul, 2016). Dipterocarp trees are highly valued for both their contribution to forest diversity and their use in timber production. For the purposes of this study, we selected Shorea laevis and S. leprosula, both native hardwood species of the Dipterocarpaceae family (Figure 1—figure supplement 1a). S. laevis is a slow-growing species (Widiyatno et al., 2014), with a mean annual increment (MAI) of diameter at breast height (DBH) of 0.38 cm/year (as measured over a 20-year period in n=2 individuals; Supplementary file 1a). In contrast, S. leprosula exhibits a faster growth rate (Widiyatno et al., 2014), with an MAI of 1.21 cm/year (n=18; Supplementary file 1a), which is 3.2 times greater than that of S. laevis. We selected the two largest individuals of each species (S1 and S2 for S. laevis and F1 and F2 for S. leprosula; Figure 1a) at the study site, located just below the equator in central Borneo, Indonesia (Figure 1—figure supplement 1b). We collected leaves from the apices of seven branches and a cambium sample from the base of the stem from each tree (Figure 1a; Figure 1—figure supplement 2), resulting in a total of 32 samples. To determine the physical distance between the sampling positions, we measured the length of each branch (Supplementary file 1b) and DBH (Supplementary file 1c). The average heights of the slow- and fast-growing species were 44.1 m and 43.9 m, respectively (Figure 1a; Supplementary file 1c). While it is challenging to accurately estimate the age of tropical trees due to the absence of annual rings, we used the DBH/MAI to approximate the average age of the slow-growing species to be 256 years and the fast-growing species to be 66 years (Supplementary file 1c).
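The DBH/MAI age approximation described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and the example DBH are hypothetical (the measured DBH values are in Supplementary file 1c and are not reproduced here), while the two MAI values are those reported in the text.

```python
def approximate_age_years(dbh_cm: float, mai_cm_per_year: float) -> float:
    """Approximate tree age as DBH divided by the mean annual increment (MAI)."""
    if mai_cm_per_year <= 0:
        raise ValueError("MAI must be positive")
    return dbh_cm / mai_cm_per_year


# MAI of DBH reported in the text (cm/year).
MAI_SLOW = 0.38  # S. laevis
MAI_FAST = 1.21  # S. leprosula

# Ratio of growth rates; the text rounds this to 3.2.
growth_ratio = MAI_FAST / MAI_SLOW

# Hypothetical DBH used only to illustrate the calculation.
example_dbh_cm = 80.0
age_slow = approximate_age_years(example_dbh_cm, MAI_SLOW)
age_fast = approximate_age_years(example_dbh_cm, MAI_FAST)
```

For the same trunk diameter, the slow-growing species is estimated to be about 3.2 times older, which is why trees of similar height can differ so much in approximate age.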

Figure 1. Physical tree structures and phylogenetic trees constructed from somatic mutations.

(a) Comparisons of physical tree structures (left, branch length in meters) and neighbor-joining (NJ) trees (right, branch length in the number of nucleotide substitutions) in two tropical tree species: S. laevis, a slow-growing species (S1 and S2), and S. leprosula, a fast-growing species (F1 and F2). IDs are assigned to each sample from which genome sequencing data were generated. Vertical lines represent tree heights. (b) Distribution of somatic mutations within tree architecture. A white and gray panel indicates the presence (gray) and absence (white) of somatic mutation in each of eight samples compared to the genotype of sample 0. Sample IDs are the same between panels (a) and (b). The distribution pattern of somatic mutations is categorized as Single, Double, and More depending on the number of samples possessing the focal somatic mutations. Among 2⁷−1 possible distribution patterns, the patterns observed in at least one of the four individuals are shown.

Figure 1—figure supplement 1. Target tropical trees and location of study site.

(a) Images of S. laevis (S1), a slow-growing species, and S. leprosula (F1), a fast-growing species. (b) Location of the study site in central Borneo, Indonesia.

Figure 1—figure supplement 2. Workflow for identifying de novo somatic SNVs.

Eight samples (seven leaves and one cambium) were collected from each of four trees (two trees per species). DNA was extracted twice independently from each sample and sequenced independently. Reads were mapped to the reference genome and used for SNV calling and filtering. SNVs across the eight samples were called using GATK HaplotypeCaller (GATK) and Bcftools mpileup (BCF tools) for each set of biological replicates independently, generating potential SNVs for each replicate set and each SNP caller (G1 and G2 for GATK, B1 and B2 for BCF tools). For BCF tools, we set three thresholds (T40, T30, and T20) with different base quality (BQ) and mapping quality (MQ). SNVs detected in both replicates were extracted for each SNP caller, yielding SNV_GATK for GATK and SNV_BCF for BCF tools at each of the three thresholds. These were then filtered by retaining SNVs detected by both SNP callers, giving potential SNVs for each threshold: SNV_T40, SNV_T30, and SNV_T20. Finally, SNVs detected at any of the three thresholds were extracted to obtain candidate SNVs. We checked the candidate SNVs manually and obtained a final set of SNVs, SNV_Final.

Figure 1—figure supplement 3. Synteny relationship between S. laevis and S. leprosula.

The collinear blocks within the genomes of S. leprosula and S. laevis are displayed as gray lines, with orange objects representing the contigs of the S. leprosula genome and green objects denoting the contigs of the S. laevis genome. Where the direction of a contig in S. laevis partly differs from that in S. leprosula, the S. laevis contig is colored red; otherwise it is colored green.

To identify somatic mutations, we constructed new reference genomes of the slow- and fast-growing species. We generated sequence data using long-read PacBio RS II and short-read Illumina sequencing and assembled the genome using DNA extracted from the apical leaf at branch 1–1 of the tallest individual of each species (S1 and F1). The genomes were estimated to contain 52,935 and 40,665 protein-coding genes, covering 97.9% and 97.8% of complete BUSCO genes (eudicots_odb10) for the slow- and fast-growing species, respectively (Supplementary file 1d). Genome sizes estimated using k-mer distribution were 347 and 376 Mb for the slow- and fast-growing species, respectively. The synteny relationship between S. laevis and S. leprosula exhibited a high level of conservation overall (Figure 1—figure supplement 3).

To accurately identify somatic mutations, we extracted DNA from each sample twice to generate two biological replicates (Figure 1—figure supplement 2). A total of 64 DNA samples were sequenced, yielding an average coverage of 69.3× and 56.5× per sample for the slow- and fast-growing species, respectively (Supplementary file 1e). We identified single nucleotide variants (SNVs) within the same individual by retaining those that were identical between the two biological replicates of each sample (Figure 1—figure supplement 2). We identified 728 and 234 SNVs in S1 and S2, and 106 and 68 SNVs in F1 and F2, respectively (Figure 1—figure supplement 2; Supplementary file 1f). All somatic mutations were unique and did not overlap between individuals. We conducted an independent evaluation of a subset of the inferred SNVs using amplicon sequencing. Our analysis demonstrated accurate annotation for 31 out of 33 mutations (94% overall), with 22 out of 24 mutations on S1 and all 9 mutations on S2 (Supplementary file 1g).
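The replicate-and-caller consensus logic of the SNV workflow can be illustrated with plain set operations. The sketch below is a simplification under stated assumptions, not the authors' pipeline: each call is keyed by a hypothetical (contig, position, alternate base) tuple, the function name is invented for illustration, and the manual curation step that produces the final set is not modeled.

```python
from typing import Dict, Set, Tuple

# Hypothetical key for one SNV call: (contig, position, alternate base).
Call = Tuple[str, int, str]


def consensus_candidates(
    gatk_rep1: Set[Call],
    gatk_rep2: Set[Call],
    bcf_rep1: Dict[str, Set[Call]],
    bcf_rep2: Dict[str, Set[Call]],
) -> Set[Call]:
    """Keep calls seen in both replicates and by both callers, at any threshold."""
    # Calls reproduced in both biological replicates by the first caller.
    snv_gatk = gatk_rep1 & gatk_rep2
    candidates: Set[Call] = set()
    for threshold, calls_rep1 in bcf_rep1.items():
        # Calls reproduced in both replicates by the second caller at this threshold.
        snv_bcf = calls_rep1 & bcf_rep2.get(threshold, set())
        # Require agreement between callers, then pool across thresholds.
        candidates |= snv_gatk & snv_bcf
    return candidates


# Toy data, for illustration only.
a, b, c = ("contig1", 100, "T"), ("contig1", 250, "A"), ("contig2", 40, "G")
gatk1, gatk2 = {a, b, c}, {a, b}
bcf1 = {"T40": {a}, "T30": {a, c}, "T20": {a, b, c}}
bcf2 = {"T40": {a}, "T30": {a}, "T20": {a, b}}
toy_candidates = consensus_candidates(gatk1, gatk2, bcf1, bcf2)
```

In this toy case call `c` is dropped because neither caller reproduces it in the second replicate, while `b` is recovered only at the most permissive threshold.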

Somatic mutation rate per year is independent of growth rate

Phylogenetic trees constructed using somatic mutations were almost perfectly congruent with the physical tree structures (Figure 1a), even though we did not incorporate knowledge of the branching topology of the tree in the SNV discovery process. The majority of somatic mutations were present at a single branch, but we also identified somatic mutations present in multiple branches (Figure 1b) which are likely transmitted to new branches during growth. We also observed somatic mutations that did not conform to the branching topology (Figure 1b), as theoretically predicted due to the stochastic loss of somatic mutations during branching (Tomimoto and Satake, 2023).

Our analysis revealed that the number of SNVs increases linearly as the physical distance between branch tips increases (Figure 2a). The somatic mutation rate per site per meter was determined by dividing the slope of the linear regression of the number of SNVs against the physical distance between branch tips by the number of callable sites from the diploid genome of each tree (Figure 2b; Supplementary file 1h). The somatic mutation rate per nucleotide per meter was 7.08×10⁻⁹ (95% CI: 6.41–7.74×10⁻⁹) and 4.27×10⁻⁹ (95% CI: 3.99–4.55×10⁻⁹) for S1 and S2, and 1.77×10⁻⁹ (95% CI: 1.64–1.91×10⁻⁹) and 1.29×10⁻⁹ (95% CI: 1.05–1.53×10⁻⁹) for F1 and F2, respectively. The average rate of somatic mutation for the slow-growing species was 5.67×10⁻⁹ nucleotide⁻¹ m⁻¹, which is 3.7-fold higher than the average rate of 1.53×10⁻⁹ nucleotide⁻¹ m⁻¹ observed in the fast-growing species (Figure 2b; Supplementary file 1h). This result indicates that the slow-growing tree accumulates more somatic mutations per unit length of growth than the fast-growing tree. This cannot be explained by differences in the number of cell divisions, as the length and diameter of fiber cells are not substantially different between the two species: 1.29 mm and 19.0 μm for the slow-growing species (Usami, 1978) and 0.91 mm and 22.7 μm for the fast-growing species (Praptoyo and Mayaningsih, 2012).
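As a rough illustration of the rate calculation described above, the following sketch fits a regression line through the origin and divides its slope by the number of callable sites. The function names, the toy distances and counts, and the callable-site figure are all hypothetical; the actual regression coefficients are listed in Supplementary file 1h.

```python
from typing import Sequence


def slope_through_origin(distances_m: Sequence[float], snv_counts: Sequence[float]) -> float:
    """Least-squares slope of y = b * x (intercept fixed at 0)."""
    if len(distances_m) != len(snv_counts) or not distances_m:
        raise ValueError("inputs must be non-empty and of equal length")
    sxy = sum(x * y for x, y in zip(distances_m, snv_counts))
    sxx = sum(x * x for x in distances_m)
    if sxx == 0:
        raise ValueError("at least one distance must be non-zero")
    return sxy / sxx


def rate_per_site_per_meter(slope_snvs_per_m: float, callable_sites: float) -> float:
    """Convert SNVs per meter into a per-site, per-meter rate."""
    return slope_snvs_per_m / callable_sites


# Toy pairwise distances (m) and SNV counts, for illustration only.
toy_distances = [10.0, 20.0, 30.0]
toy_counts = [5.0, 10.0, 15.0]
toy_slope = slope_through_origin(toy_distances, toy_counts)

# Hypothetical number of callable sites in the diploid genome.
toy_callable_sites = 5.0e8
toy_rate = rate_per_site_per_meter(toy_slope, toy_callable_sites)
```

Fixing the intercept at zero reflects the expectation that two samples separated by no growth share the same genotype.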

Figure 2. The relationship between the physical distance and the numbers of SNVs.

(a) Linear regression of the number of SNVs against the pair-wise distance between branch tips with an intercept of 0 for each tree (S1: blue, S2: light blue, F1: red, and F2: orange). Shaded areas represent 95% confidence intervals of regression lines. Regression coefficients are listed in Supplementary file 1h. (b) Comparison of somatic mutation rates per nucleotide per meter of growth and per year across four tropical trees. Bars indicate 95% confidence intervals.

Based on the estimated age of each tree, the somatic mutation rate per nucleotide per year was calculated for each tree. On average, the resultant values were largely similar between the two species, with 7.71×10⁻¹⁰ and 8.05×10⁻¹⁰ nucleotide⁻¹ year⁻¹ for the slow- and fast-growing species, respectively (Figure 2b; Supplementary file 1h). This result suggests that somatic mutations accumulate in a clock-like manner as trees age, regardless of tree growth. Our estimates of somatic mutation rates per nucleotide per year in Shorea are higher than those previously reported in other long-lived trees such as Quercus robur (Schmid-Siegert et al., 2017), Populus trichocarpa (Hofmeister et al., 2020), Eucalyptus melliodora (Orr et al., 2020), and Picea sitchensis (Hanlon et al., 2019). This might suggest that long-lived trees in the tropics do not necessarily suppress somatic mutation rates to the same extent as their temperate counterparts. To validate this assertion, additional studies are required to compare somatic mutation rates among trees in tropical, temperate, and boreal regions, employing standardized methodologies.
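The contrast between the two ways of expressing the rate can be checked with simple arithmetic on the species averages reported in the text. The variable names below are illustrative only.

```python
# Species-average somatic mutation rates reported in the text.
per_meter = {"slow": 5.67e-9, "fast": 1.53e-9}   # nucleotide^-1 m^-1
per_year = {"slow": 7.71e-10, "fast": 8.05e-10}  # nucleotide^-1 year^-1

# Fold difference (slow / fast) under each scaling.
fold_per_meter = per_meter["slow"] / per_meter["fast"]
fold_per_year = per_year["slow"] / per_year["fast"]

print(f"per meter: {fold_per_meter:.2f}-fold, per year: {fold_per_year:.2f}-fold")
```

Per meter the slow-growing species is about 3.7-fold higher, whereas per year the ratio is close to 1.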

Mutational spectra are similar between slow- and fast-growing trees

Somatic mutations may be caused by exogenous factors such as ultraviolet and ionizing radiation, or endogenous factors such as oxidative respiration and errors in DNA replication. To identify characteristic mutational signatures caused by different mutagenic factors, we characterized mutational spectra by calculating the relative frequency of mutations at the 96 triplets defined by the mutated base and its flanking 5' and 3' bases (Figure 3; Figure 3—figure supplement 1). Across species, the mutational spectra showed a dominance of cytosine-to-thymine (C>T and G>A on the other strand, noted as C:G>T:A) substitutions at CpG sites (Figure 3a and b). This is believed to result from the spontaneous deamination of 5-methylcytosine (Coulondre et al., 1978; Duncan and Miller, 1980). Methylated CpG sites spontaneously deaminate, leading to TpG sites and increasing the number of C>T substitutions (Cooper and Krawczak, 1989). Compared to the proportion of CpG sites in the reference genomes, the proportion of somatic mutations at CpG sites showed a 3.38-fold and 2.56-fold increase for F1 and F2, and a 4.54-fold and 3.53-fold increase for S1 and S2, respectively.
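A common way to build such a 96-category spectrum is to fold each substitution onto the pyrimidine strand and label it by its trinucleotide context. The sketch below follows that widely used convention; it is an assumption that the authors used the same folding, and the function names and label format are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def mutation_type(ref_triplet: str, alt_base: str) -> str:
    """Label a substitution as e.g. 'A[C>T]G', with a pyrimidine (C or T) as reference."""
    triplet, alt = ref_triplet.upper(), alt_base.upper()
    if len(triplet) != 3 or any(b not in "ACGT" for b in triplet) or alt not in "ACGT":
        raise ValueError("expected an A/C/G/T triplet and a single alternate base")
    if triplet[1] == alt:
        raise ValueError("alternate base equals the reference base")
    if triplet[1] in "AG":
        # Purine reference: describe the change on the complementary strand.
        triplet = reverse_complement(triplet)
        alt = alt.translate(COMPLEMENT)
    five, ref, three = triplet
    return f"{five}[{ref}>{alt}]{three}"


def all_96_types() -> List[str]:
    """Enumerate the 6 substitution classes x 16 flanking contexts."""
    return [
        f"{five}[{ref}>{alt}]{three}"
        for ref in "CT"
        for alt in "ACGT" if alt != ref
        for five in "ACGT"
        for three in "ACGT"
    ]


def spectrum(mutations: Iterable[Tuple[str, str]]) -> List[float]:
    """Relative frequency of each of the 96 types for (ref_triplet, alt_base) pairs."""
    counts = Counter(mutation_type(t, a) for t, a in mutations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mutations supplied")
    return [counts[k] / total for k in all_96_types()]
```

For example, a G>A change in the context CGT is reported as `A[C>T]G`, the same category as a C>T change in the context ACG, so both strands of a CpG deamination event land in one bin.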

Figure 3. Mutational spectra of somatic SNVs.

Somatic mutation spectra in S. laevis (upper panel) and S. leprosula (lower panel). The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type. Different colours within each bar indicate complementary bases. For each species, the data from two trees (S1 and S2 for S. laevis and F1 and F2 for S. leprosula) were pooled to calculate the fraction of each mutated triplet.

Figure 3—figure supplement 1. Mutational spectra of somatic and inter-individual substitutions.

(a) Somatic mutation spectra for S1 and S2 individuals in S. laevis. (b) Somatic mutation spectra for F1 and F2 individuals in S. leprosula. (c) Inter-individual SNVs between S1 and S2 (upper panel) and between F1 and F2 (lower panel). The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type. Different colours in each bar indicate complementary bases.

Figure 3—figure supplement 2. Manual confirmation of candidate SNVs.

(a) SNVs that passed manual confirmation. (b) SNVs that were removed due to their fixed heterozygote pattern. (c) SNVs that were removed due to the difference between the observed pattern and the genotyping call. (d) SNVs that were removed due to the presence of another allele with multiple reads.

Figure 3—figure supplement 3. Proportion of potential false positive SNVs for S. laevis (S1, S2) and S. leprosula (F1, F2).

Potential false positive SNVs were identified as the subset of candidate SNVs that were not included in the final set for each threshold (T40, T30, and T20). This subset was then divided by the total number of potential SNVs at each threshold to determine the proportion.

Figure 3—figure supplement 4. Proportion of potential false negative SNVs for S. laevis (S1, S2) and S. leprosula (F1, F2).

Potential false negative SNVs were identified as the subset of potential SNVs present in the final set but excluded from the candidate SNVs for each threshold (T40, T30, and T20). This subset was then divided by the total number of potential SNVs at each threshold to calculate the proportion.

We compared the mutational spectra of our tropical trees to single-base substitution (SBS) signatures in human cancers using the Catalogue Of Somatic Mutations In Cancer (COSMIC) compendium of mutation signatures (COSMIC v.2; Alexandrov et al., 2013; Nik-Zainal et al., 2016; Alexandrov et al., 2020). The mutational spectra were largely similar to the dominant mutation signature in humans known as SBS1 (cosine similarity = 0.789 and 0.597 for the slow- and fast-growing species; Supplementary file 1i). SBS1 is believed to result from the spontaneous deamination of 5-methylcytosine. The mutational spectra were also comparable to another dominant signature in all human cancers, SBS5 (cosine similarity = 0.577 and 0.558 for the slow- and fast-growing species; Supplementary file 1i), the origin of which remains unknown. Our finding that somatic mutations in tropical trees accumulate in a clock-like manner (Figure 2a) is consistent with the clock-like mutational process observed in SBS1 and SBS5 in human somatic cells (Alexandrov et al., 2015; Lee-Six et al., 2019). This suggests that the mutational processes in plants and animals are conserved, despite the variation in their life forms and environmental conditions.
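Cosine similarity between two 96-element spectra is the dot product of the vectors divided by the product of their lengths. A minimal sketch is given below; the function name is illustrative, and the short vectors in the usage lines are toy values rather than real spectra or COSMIC signatures.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Toy vectors: identical profiles score 1, non-overlapping profiles score 0.
same = cosine_similarity([0.5, 0.5, 0.0], [0.5, 0.5, 0.0])
disjoint = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
partial = cosine_similarity([1.0, 1.0], [1.0, 0.0])
```

Because spectra are non-negative, the value always lies between 0 and 1, and it does not depend on the total number of mutations in either spectrum.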

Somatic mutations are neutral but inter-individual SNVs are subject to selection

We tested whether the somatic mutations and inter-individual SNVs are subject to selection (Figure 4a). The observed rate of non-synonymous somatic mutations did not deviate significantly from the rate expected under the null hypothesis of neutrality in either the slow- (binomial test: p=0.71) or fast-growing (binomial test: p=1.0) species (Figure 4b; Supplementary file 1j). In contrast, the number of non-synonymous inter-individual SNVs was significantly smaller than expected (p<10⁻¹⁵ for both species; Figure 4c). These results indicate that somatic mutations are largely neutral within an individual, but mutations passed to the next generation are subject to strong purifying selection during embryogenesis, seed germination, and growth.
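A binomial test of this kind asks how surprising the observed count of non-synonymous mutations is, given the number of coding mutations and the expected non-synonymous fraction under neutrality. The sketch below implements an exact two-sided test in pure Python. Whether the authors used a two-sided test is an assumption, and the counts in the usage lines are invented for illustration, not taken from Supplementary file 1j.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    if not 0 <= k <= n or not 0 <= p <= 1:
        raise ValueError("require 0 <= k <= n and 0 <= p <= 1")
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so that ties are not lost to floating-point error.
    cutoff = observed * (1 + 1e-7)
    total = sum(pmf for i in range(n + 1) if (pmf := binom_pmf(i, n, p)) <= cutoff)
    return min(1.0, total)


# Hypothetical counts: 5 non-synonymous out of 10 coding SNVs, expected fraction 0.5.
p_typical = binomial_test_two_sided(5, 10, 0.5)
# Hypothetical extreme case: 0 non-synonymous out of 10.
p_extreme = binomial_test_two_sided(0, 10, 0.5)
```

A p-value near 1, as in the first toy case, corresponds to an observed count that matches the neutral expectation; a very small p-value indicates a deficit or excess of non-synonymous changes.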

Figure 4. Detecting selection on somatic and inter-individual SNVs.

(a) An illustration of somatic and inter-individual SNVs. Different colours indicate different genotypes. (b) Expected (Exp.) and observed (Obs.) rates of somatic non-synonymous substitutions. (c) Expected (Exp.) and observed (Obs.) rates of inter-individual non-synonymous substitutions. (d) The difference between the fractions of inter-individual and somatic substitutions spectra in S. laevis (upper panel) and S. leprosula (lower panel). The positive and negative values are plotted in different colours. The horizontal axis shows 96 mutation types on a trinucleotide context, coloured by base substitution type.

Figure 4—figure supplement 1. A calculation scheme for the expected rate of non-synonymous mutation.

The possible numbers of synonymous (N_S), missense (N_M), and nonsense (N_Non) mutations were counted for each of six base substitution classes from all possible mutations in CDS of length L_cds and used for the calculation of the expected rate of non-synonymous mutation. For non-synonymous mutation, we pooled the numbers for missense and nonsense mutations. The background mutation rate for each substitution class i (r_i) is calculated from the observed somatic substitutions in intergenic regions.
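One plausible reading of this scheme is that the expected non-synonymous fraction is a rate-weighted share of the possible non-synonymous changes across the six substitution classes. The sketch below encodes that reading. The weighting formula is an assumption rather than a formula quoted from the paper, and the function name and toy numbers are hypothetical.

```python
from typing import Dict, Tuple

# Per substitution class: (possible synonymous, missense, nonsense) mutation counts.
ClassCounts = Dict[str, Tuple[int, int, int]]


def expected_nonsyn_fraction(counts: ClassCounts, rates: Dict[str, float]) -> float:
    """Rate-weighted fraction of possible coding mutations that are non-synonymous."""
    nonsyn = 0.0
    total = 0.0
    for cls, (n_syn, n_mis, n_non) in counts.items():
        r = rates[cls]  # background rate r_i for this class
        nonsyn += r * (n_mis + n_non)  # missense and nonsense are pooled
        total += r * (n_syn + n_mis + n_non)
    if total == 0:
        raise ValueError("no weighted mutation opportunities")
    return nonsyn / total


# Toy example with two classes only, for illustration.
toy_counts = {"C>T": (1, 3, 0), "T>C": (2, 2, 0)}
toy_rates = {"C>T": 2.0, "T>C": 1.0}
toy_expected = expected_nonsyn_fraction(toy_counts, toy_rates)
```

Weighting by class-specific background rates matters because substitution classes differ both in how often they occur and in how often they change the encoded amino acid.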

Overall, the mutational spectra were similar between somatic and inter-individual SNVs (Figure 3—figure supplement 1). However, the fraction of C>T substitutions, in particular at CpG sites, was lower in inter-individual SNVs compared to somatic SNVs (Figure 4d). This observation may be indicative of the potential influence of GC-biased gene conversion during meiosis (Duret and Galtier, 2009) or biased purifying selection for C>T inter-individual nucleotide substitutions.

Discussion

Our study demonstrates that while the somatic mutation rate per meter is higher in the slow- than in the fast-growing species, the somatic mutation rate per year is independent of growth rate. To gain a deeper understanding of these findings, we developed a simple model that decomposes the mutation rate per site per cell division (μ) into two components: DNA replication-dependent (α) and replication-independent (β) mutagenesis. This can be represented as μ = α + βτ, where τ is the duration of the cell cycle measured in years. The replication-dependent mutation emanates from errors that occur during DNA replication, such as the misincorporation of a nucleotide during DNA synthesis. The replication-independent mutation arises from DNA damage caused by endogenous reactions or exogenous mutagens at any time during the cell cycle. Since the number of cell divisions per year is given as r = 1/τ, the mutation rate per year becomes rμ = α/τ + β. From this relationship, the number of nucleotide substitutions per site accumulated over t years, denoted as m(t), is given by m(t) = (α/τ + β)t. The formula indicates that when β is significantly greater than α, somatic mutations accumulate with tree age rather than with tree growth.
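The model above can be written out directly. In this sketch the function names are illustrative and the parameter values are hypothetical, chosen only so that the replication-independent term dominates; they are not estimates from the study.

```python
def rate_per_division(alpha: float, beta: float, tau: float) -> float:
    """mu = alpha + beta * tau (per site per cell division)."""
    return alpha + beta * tau


def rate_per_year(alpha: float, beta: float, tau: float) -> float:
    """r * mu with r = 1 / tau, i.e. alpha / tau + beta (per site per year)."""
    return alpha / tau + beta


def substitutions_per_site(alpha: float, beta: float, tau: float, t_years: float) -> float:
    """m(t) = (alpha / tau + beta) * t."""
    return rate_per_year(alpha, beta, tau) * t_years


# Hypothetical parameters: beta dominates, and cell cycles differ 3.2-fold.
ALPHA = 1e-12    # per site per division
BETA = 8e-10     # per site per year
TAU_FAST = 0.10  # years per cell cycle, fast-growing species
TAU_SLOW = 0.32  # years per cell cycle, slow-growing species

yearly_fast = rate_per_year(ALPHA, BETA, TAU_FAST)
yearly_slow = rate_per_year(ALPHA, BETA, TAU_SLOW)
relative_gap = abs(yearly_fast - yearly_slow) / yearly_fast
```

With these values the per-year rates of the two hypothetical species differ by less than 1%, even though their cell cycle durations differ more than threefold, which is the behavior the formula predicts when β is much larger than α.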

We estimated the relative magnitudes of α and β by using the results obtained from our study. Given that the cell cycle duration is likely inversely proportional to MAI, we have τ_S/τ_F = 3.2 (Supplementary file 1a), where τ_S and τ_F denote the cell cycle duration for the slow- and fast-growing species, respectively. It is also reasonable to assume that the same number of cell divisions is required to achieve 1 m of growth in both species, as the cell size is similar between the two species. Based on our estimates of the somatic mutation rate per site per meter for the slow- (μ_S) and fast-growing species (μ_F), we have μ_S/μ_F = (α + βτ_S)/(α + βτ_F) = 3.7, which is close to the ratio of cell cycle duration τ_S/τ_F. This consistency can be explained by the substantial contribution of replication-independent mutagenesis to the somatic mutation rate (i.e. β ≫ α), as long as the magnitudes of α and β are similar between the two species. The time required for a unit length to grow can vary even within the same species, depending on microenvironmental conditions such as the availability of light and nutrients. These variations could explain the differences in somatic mutation rates per unit growth between two individuals within the same species (Figure 2).
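The behavior of the ratio (α + βτ_S)/(α + βτ_F) at its two extremes can be checked numerically. In this sketch the function name is illustrative, τ_F is set to 1 in arbitrary units, and the intermediate α and β values are hypothetical.

```python
def per_meter_rate_ratio(alpha: float, beta: float, tau_s: float, tau_f: float) -> float:
    """mu_S / mu_F = (alpha + beta * tau_S) / (alpha + beta * tau_F)."""
    return (alpha + beta * tau_s) / (alpha + beta * tau_f)


TAU_F = 1.0  # arbitrary unit
TAU_S = 3.2  # tau_S / tau_F = 3.2, from the MAI ratio in the text

# Purely replication-dependent mutagenesis (beta = 0): no difference per meter.
ratio_replicative = per_meter_rate_ratio(1.0, 0.0, TAU_S, TAU_F)
# Purely replication-independent mutagenesis (alpha = 0): ratio equals tau_S / tau_F.
ratio_time_dependent = per_meter_rate_ratio(0.0, 1.0, TAU_S, TAU_F)
# Hypothetical intermediate case with equal contributions in the fast-growing species.
ratio_mixed = per_meter_rate_ratio(1.0, 1.0, TAU_S, TAU_F)
```

For non-negative α and β the ratio ranges from 1 to τ_S/τ_F, so the observed 3.7-fold difference sits at the time-dependent end of that range, slightly above 3.2, consistent with the text's reading that β dominates.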

This argument is consistent with previous studies in human and other animals, which showed the presence of mutations that do not track cell division (Abascal et al., 2021; de Manuel et al., 2022). This study contributes to understanding the importance of non-replicative mutagenesis in naturally grown trees by decoupling the impacts of growth and time on the rate of somatic mutation. The preponderance of the non-replicative mutational process can be attributed to its distinct molecular origin, the accumulation of spontaneous CpG mutations with absolute time. The neutral nature of newly arising somatic mutations within the tree results in a molecular clock, a constant rate of molecular evolution (Zuckerkandl and Pauling, 1965; Kimura and Ota, 1971; Kimura, 1983). For our argument, we made an intuitive assumption that the number of stem cell divisions increases with distance regardless of species when cell size is similar. However, to further validate this assumption, we require mathematical models that consider the asymmetric division of stem cells within the meristem (Watson et al., 2016; Lanfear, 2018) and complex stem cell population dynamics during elongation and branching in tree growth (Tomimoto and Satake, 2023; Iwasa et al., 2023). Moreover, understanding the timing of germline establishment during development is crucial in addressing the impact of somatic mutation on the next generation (Lanfear, 2018). The model we have presented here is based on the assumption that genetic drift is prominent within a stem cell population, and that a single stem cell lineage becomes fixed within a meristem. However, future studies could explore relaxing this assumption to consider the contribution of multiple stem cell lineages. By doing so, we can gain insights into how the relationship between pairwise genetic differences and the distance between branch tips is influenced by the branching architecture of the tree and the strength of genetic drift. Furthermore, the accuracy of our argument, as derived from the model, could be improved through future investigations that directly estimate the cell cycle duration for each individual tree.

The relative importance of replication-independent mutagenesis, represented as the relative magnitude of β compared to α, can vary through evolution, possibly through selection on DNA repair pathways. Selection pressure that leads to different magnitudes of α, β, or both may explain the differential somatic mutation rate per year in mammals with different lifespans (Cagan et al., 2022). Conversely, in plants, the selection pressure to constrain somatic mutation rates to lower levels in long-lived trees might be less significant. A definitive answer to this question awaits the accumulation of additional data on somatic mutation rates in closely related plant species inhabiting the same environment but exhibiting different growth rates.

Materials and methods

Study site and sampling methods

The study site is in a humid tropical rain forest in Central Borneo, Indonesia (00°49′ 45.7′′ S, 112°00′ 09.5′′ E; Figure 1—figure supplement 1b). The forest is characterized by a prevalence of trees of the Dipterocarpaceae family and is managed through a combination of selective logging and line planting (Tebang Pilih Tanam Jalur, TPTJ). From 2001 to 2009, the mean annual temperature ranged from 22 to 28°C at night and from 30 to 33°C during the day, with an average annual precipitation of 3376 mm.

The study focuses on two native Dipterocarpaceae species, S. laevis and S. leprosula (Figure 1—figure supplement 1a). We logged two individuals from each species (S1 and S2 for S. laevis and F1 and F2 for S. leprosula; Figure 1—figure supplement 1a) on July 17–18, 2018 and collected samples prior to their transportation for timber production. Approximately 0.4–1.0 g of leaf tissue was collected from each of the apices of seven branches and approximately 5 g of cambium tissue was taken from the base of the stem per individual (Figure 1—figure supplement 2). To calculate the physical distance between sampling positions within the tree architecture, we measured the length of each branch (Supplementary file 1b). Samples were promptly preserved in a plastic bag with silica gel following harvest and transported to the laboratory within 4 days of sampling. During transportation, samples were kept in a cooler box with ice to maintain a low temperature. Once in the laboratory, samples were stored at −80 °C until DNA and RNA extraction.

DBH has been recorded every two years since 1998 for trees with DBH greater than 10 cm within three census plots of 1 hectare (100×100 m) located near the target trees. The mean growth was calculated by taking the average of MAI of DBH for 2 and 18 trees for the slow- and fast-growing species, respectively (Supplementary file 1a).

DNA extraction

For short-read sequencing, DNA extraction was performed using a modified version of the method described previously (Doyle and Doyle, 1987) as follows: Frozen leaves were ground in liquid nitrogen and washed up to five times with 1 mL buffer (including 100 mM HEPES pH 8.0, 1% PVP, 50 mM Ascorbic acid, 2% (v/v) β-mercaptoethanol) (Toyama et al., 2015). DNA was treated with Ribonuclease (Nippongene, Tokyo, Japan) according to the manufacturer’s instructions. DNA was extracted twice independently from each sample for two biological replicates. The DNA yield was measured on a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) and Qubit4 Fluorometer (Thermo Fisher Scientific). For long-read sequencing, we extracted high-molecular-weight genomic DNA from branch 1–1 leaf materials of S1 and F1 individuals using a modified CTAB method (Doyle, 1991).

RNA extraction and sequencing

For genome annotation, total RNA was extracted from the cambium sample of the S1 individual of S. laevis in accordance with the method described in a previous study (Yeoh et al., 2017). RNA integrity was measured using the Agilent RNA 6000 Nano kit on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), and the RNA yield was determined using a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific). The extracted RNA was sent to Pacific Alliance Lab (Singapore), where a cDNA library was prepared with a NEBNext Ultra RNA Library Prep Kit for Illumina (New England BioLabs, Ipswich, MA, USA) and 150-bp paired-end transcriptome sequencing was conducted using an Illumina NovaSeq6000 sequencer (Illumina, San Diego, CA, USA). For S. leprosula, we used published RNA-seq data (Ng et al., 2021).

Illumina short-read sequencing and library preparation

For Illumina short-read sequencing, the DNA sample from the first replicate of the S1 individual of S. laevis was sent to the Next Generation Sequencing Facility at Vienna BioCenter Core Facilities (VBCF), a member of the Vienna BioCenter (VBC) in Austria, for library preparation and sequencing on the Illumina HiSeq2500 platform (Illumina). The library was prepared using the on-bead tagmentation library prep method according to the manufacturer’s protocol and was individually indexed with the Nextera index Kit (Illumina) by PCR. Insert size was adjusted to around 450 bp. The quantity and quality of each amplified library were analyzed using the Fragment Analyzer (Agilent Technologies) and the HS NGS Fragment Kit (Agilent Technologies).

The DNA sample from the second replicate of the S1 individual and two replicates from the S2, F1, and F2 individuals were sent to Macrogen Inc (Republic of Korea) for sequencing on the Illumina HiseqX platform (Illumina). DNA was sheared to around 500 bp fragments in size using dsDNA fragmentase (New England BioLabs). Library preparation was performed using the NEBNext Ultra II DNA Library Prep Kit (New England BioLabs) according to the manufacturer’s protocol, and the libraries were individually indexed with the NEBNext Multiplex Oligos for Illumina (New England BioLabs) by PCR. The quality and quantity of each amplified library were analyzed using the Bioanalyzer 2100 (Agilent Technologies), the High Sensitivity DNA kit (Agilent Technologies), and the NEBNext Library Quant Kit for Illumina (New England BioLabs). In total, 64 samples (16 samples per individual) were used for short-read sequencing.

PacBio long-read sequencing and library preparation

To construct the reference genomes of S. laevis and S. leprosula, high-molecular-weight DNA samples were extracted from branch 1–1 leaf materials of the S1 and F1 individuals, respectively, and sequenced using PacBio platforms. For S. laevis, library preparation and sequencing were performed at VBCF. The library was prepared using the SMRTbell express Kit (PacBio, Menlo Park, CA, USA), and sequenced on the Sequel platform with six SMRT cells (PacBio). For S. leprosula, library preparation and sequencing were performed by Macrogen Inc (Republic of Korea). The library for S. leprosula was prepared using the HiFi SMRTbell library preparation system (PacBio) according to the manufacturer’s protocol, and was sequenced on the Sequel II platform (PacBio) with one SMRT cell.

Genome assembly

The PacBio continuous long reads of S. laevis were assembled using Flye 2.7-b1587 (Kolmogorov et al., 2019) with 12 threads and with an estimated genome size of 350 Mbp. We subsequently used HyPo v1.0.3 (Kundu et al., 2019) for polishing the contigs. The Illumina read alignments provided to HyPo were created using Bowtie v2.3.4.3 (Langmead and Salzberg, 2012) with --very-sensitive option and using 32 threads. We used the Illumina reads from all branches of the individual S1 rather than utilizing exclusively those of branch 1–1, in order to capitalize on the increased aggregate sequencing depth.

The PacBio HiFi reads of S. leprosula with an average Quality Value (QV) 20 or higher were extracted, and subsequently assembled using Hifiasm 0.16.1-r375 (Cheng et al., 2021), with -z10 option and using 40 threads. The primary assembly of S. leprosula was used for further analysis. The quality and completeness of the genome assembly were assessed by searching for a set of 2326 core genes from eudicots_odb10 using BUSCO v5.3.0 (Manni et al., 2021) for each species (Supplementary file 1d).

Genome annotation

We constructed repeat libraries of S. laevis and S. leprosula using EDTA v2.0.0 (Ou et al., 2019). Using the libraries, we ran RepeatMasker 4.1.2-p1 (Smit et al., 2021) with the -s option and with Cross_match as the search engine, to perform soft-masking of repetitive sequences in the genomes. The estimated percentages of the repetitive sequences were 42.4% for S. laevis and 39.5% for S. leprosula (Supplementary file 1d).

We ran BRAKER 2.1.6 (Brůna et al., 2021) to perform gene prediction by first incorporating RNA-seq data and subsequently utilizing a protein database, resulting in the generation of two sets of gene predictions for each species. To perform RNA-seq-based prediction, we mapped the RNA-seq reads (see RNA extraction in Methods section) to the genomes using HISAT 2.2.1 (Kim et al., 2019), with the alignments subsequently being employed as training data for BRAKER. For protein-based prediction, we used proteins from the Viridiplantae level of OrthoDB v10 (Zdobnov et al., 2021) as the training data.

The two sets of gene predictions were merged using TSEBRA (commit 0e6c9bf in the GitHub repository, Hoff, 2022; Gabriel et al., 2021) to select reliable gene predictions for each species. Although in principle TSEBRA groups overlapping transcripts and considers them as alternative spliced isoforms of the same gene, we identified instances where one transcript in a gene overlapped with another transcript in a separate gene. In such cases, we manually clustered these transcripts into the same gene.

We used EnTAP 0.10.8 (Hart et al., 2020) with default parameters for functional annotation. The databases employed were: UniProtKB release 2022_05 (Bateman, 2021), NCBI RefSeq plant proteins release 215 (O’Leary et al., 2016), EnTAP Binary Database v0.10.8 (Hart et al., 2020) and EggNOG 4.1 (Powell et al., 2014). We constructed the standard gene model by utilizing the gene predictions of each species, eliminating any gene structures that lacked a complete ORF. Transcripts containing Ns were also excluded. Following the filtering process, the splice variant displaying the longest coding sequence (CDS) was selected as the primary isoform for each gene. The set of primary isoforms was used as the standard gene model.

Genome size estimation

We estimated the genome sizes of the two species using GenomeScope (Vurture et al., 2017). We counted k-mers from the forward sequence data of branch 1–1 from the S1 and F1 individuals using KMC 3 (Kokot et al., 2017) (k=21). The genome size and heterozygous ratio were estimated by best model fitting. Estimated genome sizes were 347 Mb for the slow-growing species and 376 Mb for the fast-growing species. These estimates were 8% and 7% smaller than the estimates obtained through flow cytometry (Ng et al., 2016), respectively. The genome size of the fast-growing species was nearly identical to that previously reported for S. leprosula in peninsular Malaysia (Ng et al., 2021).

Genome synteny analysis

To investigate the syntenic relationship between S. laevis and S. leprosula, synteny analysis was performed using MCScanX in TBtools-II (Toolbox for Biologists) v1.120 (https://github.com/CJ-Chen/TBtools/releases; Chen, 2023) with default parameters. For the synteny analysis, we selected 20 contigs from S. leprosula because these were the only ones that exhibited synteny blocks between the two species. These 20 contigs cover more than 99.5% of the S. leprosula genome. The syntenic blocks spanning more than 30 genes were displayed in the synteny map (Figure 1—figure supplement 3).

Somatic (intra-individual) SNV discovery

We filtered out low-quality reads and trimmed adapters using fastp v22.0 (Chen et al., 2018) with the following options: -q 20 -n 10 -t 1 -T 1 -l 20 -w 16. The cleaned reads were mapped to the reference genome using bwa-mem2 22.1 (Vasimuddin et al., 2019) with default parameters. We removed PCR duplicates using the fixmate and markdup functions of samtools 1.13 (Li et al., 2009). The sequence reads were mapped to the reference genome, yielding average mapping rates of 91.61% and 89.5% for the slow- and fast-growing species, respectively. To identify reliable SNVs, we used two SNP callers, bcftools mpileup (Li et al., 2009; Li, 2011) and GATK (4.2.4.0) HaplotypeCaller (McKenna et al., 2010), and extracted SNVs detected by both (Figure 1—figure supplement 2).

We first called SNVs with BCFtools 1.13 (Danecek et al., 2021) mpileup at three different thresholds; threshold 1 (T40): mapping quality (MQ)=40, base quality (BQ)=40; threshold 2 (T30): MQ = 30, BQ = 30; threshold 3 (T20): MQ = 20, BQ = 20. SNVs detected under each threshold were pooled for further analyses, with duplicates removed. We normalized indels using bcftools norm for vcf files. We removed indels and missing data using vcftools 0.1.16 (Danecek et al., 2011).

Second, we called SNVs using GATK (4.2.4.0) HaplotypeCaller and merged the individual gvcfs into a vcf file containing only variant sites. We removed indels from the vcf using the GATK SelectVariants. We filtered out unreliable SNVs using GATK VariantFiltration with the following filters: QD (Qual By Depth)<2.0, QUAL (Base Quality)<30.0, SOR (Strand Odds Ratio)>4.0, FS (Fisher Strand)>60.0, MQ (RMS Mapping Quality)<40.0, MQRankSum (Mapping Quality Rank Sum Test)<–12.5, ReadPosRankSum (Read Pos Rank Sum Test)<–8.0. After performing independent SNV calling for each biological replicate using each SNP caller, we extracted SNVs that were detected in both replicates for each SNP caller. We further extracted SNVs that were detected by both bcftools mpileup and GATK HaplotypeCaller (Figure 1—figure supplement 2) using Tassel5 (Bradbury et al., 2007) and a custom python script, generating potential SNVs for each threshold. Finally, SNVs detected at any of the three thresholds were extracted to obtain candidate SNVs. The number of SNVs at each filtering step can be found in Supplementary file 1f.
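The consensus logic described above (agreement between replicates, agreement between callers, then pooling across thresholds) can be illustrated with plain set operations. The following is a minimal sketch and not the authors' custom script: the function names, the site representation, and the toy data are all hypothetical, and real input would be parsed from the filtered VCF files.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int, str]  # (contig, 1-based position, alternative allele)
Calls = Dict[str, Dict[str, Set[Site]]]  # caller -> replicate -> called sites


def consensus_sites(calls: Calls) -> Set[Site]:
    """Sites found in every replicate of every caller (one threshold)."""
    per_caller = [set.intersection(*replicates.values()) for replicates in calls.values()]
    return set.intersection(*per_caller)


def candidate_snvs(calls_by_threshold: Dict[str, Calls]) -> Set[Site]:
    """Union of the consensus sets obtained at each threshold."""
    candidates: Set[Site] = set()
    for calls in calls_by_threshold.values():
        candidates |= consensus_sites(calls)
    return candidates


# Toy data (hypothetical sites, not taken from the study).
a, b, c = ("ctg1", 100, "T"), ("ctg1", 250, "G"), ("ctg2", 42, "A")
gatk = {"rep1": {a, c}, "rep2": {a, c}}
toy = {
    "T40": {"bcftools": {"rep1": {a, b}, "rep2": {a}}, "gatk": gatk},
    "T20": {"bcftools": {"rep1": {a, b, c}, "rep2": {a, c}}, "gatk": gatk},
}
candidates = candidate_snvs(toy)
```

In this toy case, site b is dropped because it is seen in only one replicate, whereas site c enters the candidate list through the more permissive T20 threshold.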

The candidate SNV calls were manually confirmed by two independent researchers using the IGV browser (Robinson et al., 2017). We removed sites from the list of candidates if there were fewer than five high-quality reads (MQ >20) in at least one branch sample among the 16 samples. After labeling branches carrying the called variant as somatic mutations, we compared the observed pattern with the genotyping call and extracted SNVs that were supported by more than one read in both biological replicates (Figure 3—figure supplement 2a). We illustrated three types of false positive SNVs that were removed from the list of candidates in Figure 3—figure supplement 2b–d. The final set of SNVs can be found in Supplementary file 1k. The proportions of potential false positive and false negative SNVs for each threshold are illustrated in Figure 3—figure supplements 3 and 4.

The NJ tree for each individual was generated using MEGA11 (Tamura et al., 2021) based on the matrix of the number of sites with somatic SNVs present between each pair of branches and edited using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). Most of the somatic SNVs were heterozygous, whereas 4% of the total SNVs (46/1136) were homozygous (Supplementary file 1k). The homozygous sites were treated as a single mutation due to the likelihood of a genotyping error being higher than the probability of two mutations occurring at the same site.

Inter-individual SNV discovery

We also identified SNVs between pairs of individuals within each species as inter-individual SNVs. The method for calling inter-individual SNVs was the same as for intra-individual SNVs, except that only threshold 2 (MQ = 30, BQ = 30) for BCFtools 1.13 (Danecek et al., 2021) was used. We extracted SNVs that are present in all branches within an individual using Tassel5 (Bradbury et al., 2007). To exclude ambiguous SNV calls, we removed SNVs within 151 bp of indels that were called with BCFtools 1.13 (Danecek et al., 2021) with the option of threshold 2. We eliminated SNVs within 151 bp of sites with a depth value of zero that occur in more than ten consecutive sites. We also removed SNVs that had a depth smaller than five or larger than d + 3√d, where d represents the mean depth of all sites (Li, 2014). Due to the large number of candidates for inter-individual SNVs, the manual checking process was skipped.
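The depth filter can be sketched as follows. This is an illustration only, assuming that the upper bound follows the d + 3√d heuristic of Li (2014) and that sites exactly at either bound are kept; the function names and the toy depths are hypothetical.

```python
import math
from statistics import mean


def depth_bounds(all_site_depths, min_depth=5, k=3.0):
    """Return (lower, upper) depth bounds; upper = d + k * sqrt(d), d = mean depth."""
    d = mean(all_site_depths)
    return min_depth, d + k * math.sqrt(d)


def passes_depth_filter(depth, lower, upper):
    """Keep a site unless its depth is below the lower or above the upper bound."""
    return lower <= depth <= upper


# Toy depths (hypothetical): mean depth 100, so the upper bound is 100 + 3 * 10 = 130.
lower, upper = depth_bounds([90, 100, 110])
kept = [d for d in (4, 5, 100, 130, 131) if passes_depth_filter(d, lower, upper)]
```

Sites with unusually high depth are excluded because they often reflect collapsed repeats or paralogous regions rather than true single-copy sequence.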

Somatic SNVs confirmation by amplicon sequencing

We verified the reliability of the final set of somatic SNVs by amplicon sequencing approximately 5% of the SNVs in S. laevis (31 and 10 SNVs for S1 and S2, respectively). We used multiplexed phylogenetic marker sequencing method MPM-seq (Suyama et al., 2022) with modifications to the protocol as follows: to amplify 152–280 bp fragments, the first PCR primers comprising tail sequences for the second PCR primers were designed on the flanking regions of each SNV. The first PCR was conducted using the Fast PCR cycling kit (Qiagen, Düsseldorf, Germany) under the following conditions: an initial activation step at 95 °C for 5 min, followed by 30 cycles of denaturation at 96 °C for 5 s, annealing at 50/54/56 °C for 5 s, and extension at 68 °C for 10 s. This was followed by a final incubation at 72 °C for 1 min. Subsequent next-generation sequencing was performed on an Illumina MiSeq platform using the MiSeq Reagent Kit v2 (300 cycles: Illumina).

Amplicon sequencing reads were mapped to the reference genome using bwa-mem2 22.1 (Vasimuddin et al., 2019) with default parameters. Using bcftools mpileup (Danecek et al., 2021), we called the genotypes of all sites on target regions and eliminated candidate sequences with MQ and BQ less than 10. The final set of sites selected for confirmation consisted of 24 for the S1 individual and 9 for the S2 individual. We manually confirmed the polymorphic patterns at the target sites using the IGV browser (Robinson et al., 2017). If the alternative allele was present or absent in all eight branches in the amplicon sequence, the site was determined as fixed. The site was determined as mismatch if the difference of polymorphic patterns between the somatic SNV calls and amplicon sequence was supported by more than four reads per branch. The sites that were neither fixed nor mismatched were determined as true. 94% (31/33) of SNVs at the final target sites, with 22 out of 24 mutations on S1 and all 9 mutations on S2, were confirmed to exhibit a polymorphic pattern that exactly matched between the somatic SNV calls and amplicon sequence (Supplementary file 1g). It is important to note that the SNVs that were not matched with amplicon sequencing data could potentially represent true somatic mutations. This discrepancy could be attributed to a low allele frequency, where the call is not identified as heterozygous despite the presence of a true mutation.

Somatic mutation rates per growth and per year

To estimate the somatic mutation rate per nucleotide per growth (μg), a linear regression analysis of the number of somatic SNVs against the physical distance between sampling positions within an individual was conducted using the lm function, with an intercept of zero, in R version 3.6.2. The somatic mutation rate per nucleotide per growth was estimated as:

$$\mu_g = \frac{b}{2 \times R},$$

where b indicates the slope of linear regression and R denotes the number of callable sites, respectively. Note that the denominator includes a factor of two due to diploidy. A site was considered callable when it passed the filters as the polymorphic sites, that is, a mapping quality of at least 40 using GATK, a mapping quality of at least 20 using BCFtools, and a depth greater than or equal to 5. This resulted in 388,801,756 and 320,739,335 base pairs for S1 and S2 and 327,435,618 and 263,488,812 base pairs for F1 and F2, respectively.
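As a worked illustration of this estimate, the slope of a regression through the origin has the closed form b = Σxy / Σx², and μg then follows by dividing by twice the number of callable sites. The sketch below is written in Python rather than R and uses hypothetical distances, SNV counts, and a hypothetical number of callable sites; it is not the authors' analysis script.

```python
import math


def slope_through_origin(x, y):
    """Least-squares slope of y = b * x (intercept fixed at zero)."""
    sxx = sum(xi * xi for xi in x)
    if sxx == 0:
        raise ValueError("all distances are zero; slope is undefined")
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def mutation_rate_per_growth(distances_m, snv_counts, callable_sites):
    """mu_g = b / (2 * R): per nucleotide per meter, with 2 accounting for diploidy."""
    b = slope_through_origin(distances_m, snv_counts)
    return b / (2 * callable_sites)


# Toy values (hypothetical): 2 SNVs per meter over 1e8 callable sites.
distances_m = [10.0, 20.0, 40.0]
snv_counts = [20, 40, 80]
mu_g = mutation_rate_per_growth(distances_m, snv_counts, callable_sites=100_000_000)
```

With these toy inputs the slope is 2 SNVs per meter, giving μg = 1 × 10⁻⁸ per nucleotide per meter.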

The somatic mutation rate per nucleotide per year (μy) was estimated as:

$$\mu_y = \frac{M}{2 \times R \times A}$$

Here, M indicates the total number of SNVs accumulated from the base (ID 0 in Figure 1a; Supplementary file 1b) to the branch tip and A represents tree age. R denotes the number of callable sites, as used to estimate μg. Because there are seven branch tips for each tree (Figure 1a), we estimated μy for each branch tip and then calculated the mean and 95% confidence interval for each tree (Supplementary file 1h).
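The per-year estimate can be illustrated in the same way. The sketch below computes μy for each of seven branch tips and summarizes them by their mean and a 95% confidence interval. How the interval was computed is not stated here, so a t-based interval is assumed; the critical value 2.447 is the two-sided 95% quantile of the t distribution with 6 degrees of freedom. The SNV counts, number of callable sites, and tree age are hypothetical.

```python
import math
from statistics import mean, stdev

T_CRIT_DF6 = 2.447  # two-sided 95% t critical value for 7 - 1 = 6 degrees of freedom


def rate_per_year(snv_count, callable_sites, age_years):
    """mu_y = M / (2 * R * A): per nucleotide per year."""
    return snv_count / (2 * callable_sites * age_years)


def mean_with_ci(values, t_crit):
    """Mean and t-based confidence interval (an assumed method, see lead-in)."""
    centre = mean(values)
    half_width = t_crit * stdev(values) / math.sqrt(len(values))
    return centre, (centre - half_width, centre + half_width)


# Toy values (hypothetical): SNVs from base to each of seven branch tips.
tip_counts = [48, 50, 52, 49, 51, 50, 50]
rates = [rate_per_year(m, callable_sites=100_000_000, age_years=250) for m in tip_counts]
mu_y, (ci_low, ci_high) = mean_with_ci(rates, T_CRIT_DF6)
```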

Mutational spectrum

Mutational spectra were derived directly from the reference genome and alternative alleles at each variant site. There are a total of six possible classes of base substitutions at each variant site: A:T>G:C (T>C), G:C>A:T (C>T), A:T>T:A (T>A), G:C>T:A (C>A), A:T>C:G (T>G), and G:C>C:G (C>G). By considering the bases immediately 5′ and 3′ to each mutated base, there are a total of 96 possible mutation classes, referred to as triplets, in this classification. We used seqkit (Shen et al., 2016) to extract the triplets for each variant site. To count the number of each triplet, we used the Wordcount tool in the EMBOSS web service (https://www.bioinformatics.nl/cgi-bin/emboss/wordcount). We calculated the fraction of each mutated triplet by dividing the number of mutated triplets by the total number of triplets in the reference genome.
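The collapse from 192 strand-specific changes to 96 triplet classes can be made concrete with a short sketch. It assumes the common convention of reporting each substitution with the pyrimidine (C or T) as the reference base, reverse-complementing the triplet when the reference base is a purine. The function names and the bracket notation are illustrative choices, not the output format of seqkit or EMBOSS.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def classify(triplet, alt):
    """Return (substitution class, triplet class), e.g. ('C>T', 'A[C>T]G')."""
    triplet, alt = triplet.upper(), alt.upper()
    if triplet[1] in "AG":  # purine reference: switch to the opposite strand
        triplet, alt = revcomp(triplet), revcomp(alt)
    ref = triplet[1]
    return f"{ref}>{alt}", f"{triplet[0]}[{ref}>{alt}]{triplet[2]}"


# Enumerate every triplet and every alternative base to count distinct classes.
all_classes = {
    classify("".join(t), alt)[1]
    for t in product("ACGT", repeat=3)
    for alt in "ACGT"
    if alt != t[1]
}
```

For example, a G>A change in the triplet TGA is reported as T[C>T]A on the opposite strand.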

We compared the mutational signatures of our tropical trees to those of single-base substitution (SBS) signatures in human cancers using the Catalogue Of Somatic Mutations In Cancer (COSMIC) compendium of mutation signatures (COSMIC v2; Alexandrov et al., 2013; Nik-Zainal et al., 2016; Alexandrov et al., 2020; Greenman et al., 2006; Martincorena et al., 2017; available at https://cancer.sanger.ac.uk/cosmic/signatures_v2). Cosine similarity was calculated between each tropical tree species and each SBS signature in human cancers.
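Cosine similarity between two spectra is the dot product of the two vectors divided by the product of their Euclidean norms. A minimal sketch is given below; in practice each vector would hold the 96 triplet fractions in a fixed order, whereas the short vectors used here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Toy spectra (hypothetical): proportional vectors are maximally similar.
same_shape = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the measure depends only on the direction of the vectors, it compares the shape of two spectra regardless of the total number of mutations in each.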

Testing selection of somatic and inter-individual SNVs

To test whether somatic and inter-individual SNVs are subject to selection, we calculated the expected rate of non-synonymous mutation. For a CDS of length Lcds, there are 3Lcds possible single-base mutations, because each site can mutate to any of the three other bases (Figure 4—figure supplement 1). We classified all possible mutations into three types based on the codon table: synonymous, missense, and nonsense (Figure 4—figure supplement 1). Each type of mutation was counted for each of the six base substitution classes (Figure 4—figure supplement 1). We generated count tables based on two distinct categories of CDS: those that included all isoforms and those that only encompassed primary isoforms (Supplementary file 1l). As the two tables were largely congruent, we employed the version which included all isoforms of CDS.
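The construction of such a count table can be sketched as follows, assuming the standard genetic code. Every base of every codon is mutated to each of the three other bases, and the resulting change is tallied by substitution class and by effect. The function names are hypothetical, the handling of changes away from a stop codon (counted here as missense) is a simplification, and the short sequence is a toy example rather than a real CDS.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def substitution_class(ref, alt):
    """Collapse to the six classes with a pyrimidine reference, e.g. G>A -> C>T."""
    if ref in "AG":
        ref, alt = _COMPLEMENT[ref], _COMPLEMENT[alt]
    return f"{ref}>{alt}"


def count_possible_mutations(cds):
    """Tally all 3 * L single-base changes of a CDS by (class, effect)."""
    counts = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        for pos, ref in enumerate(codon):
            for alt in "ACGT":
                if alt == ref:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    effect = "synonymous"
                elif CODON_TABLE[mutant] == "*":
                    effect = "nonsense"
                else:
                    effect = "missense"
                counts[(substitution_class(ref, alt), effect)] += 1
    return counts


# Toy CDS (hypothetical): Met-Trp, two codons without synonymous alternatives.
table = count_possible_mutations("ATGTGG")
```

For this six-base toy sequence there are 18 possible mutations: 16 missense and 2 nonsense (TGG to TAG or TGA), with no synonymous changes.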

Using the count table and background mutation rate for each category of substitution class, we calculated the expected number of synonymous (λS) and non-synonymous mutations (λN) (Figure 4—figure supplement 1). As a background mutation rate, we adopted the observed somatic mutation rates in the six substitution classes in the intergenic region (Supplementary file 1m), assuming that the intergenic region is nearly neutral to selection. Because the number of nonsense somatic mutations is small, we combined missense and nonsense mutations as non-synonymous. The intergenic regions were identified as the regions situated between 1 kbp upstream of the start codon and 500 bp downstream of the stop codon. The expected rate of non-synonymous mutation (pN) is given as λN/(λS+λN). Given the observed number of non-synonymous and synonymous mutations, we tested the null hypothesis of neutrality using a binomial test with a significance level of 5% (Supplementary file 1j). We used the binom.test function in R v3.6.2.
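The calculation of pN and the subsequent test can be sketched as below. The expected counts are obtained by weighting each entry of the count table by the background rate of its substitution class, and the two-sided exact binomial P value sums the probabilities of all outcomes that are no more likely than the observed one, which is the convention used by R's binom.test. The count table, the background rates, and the function names are hypothetical.

```python
import math
from math import comb


def expected_counts(count_table, background_rates):
    """Return (lambda_S, lambda_N) from {(class, effect): n} and {class: rate}."""
    lam_s = lam_n = 0.0
    for (sub_class, effect), n in count_table.items():
        weight = background_rates[sub_class] * n
        if effect == "synonymous":
            lam_s += weight
        else:  # missense and nonsense are pooled as non-synonymous
            lam_n += weight
    return lam_s, lam_n


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial P value for k successes out of n trials."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= cutoff))


# Toy inputs (hypothetical counts and relative background rates).
toy_table = {
    ("C>T", "synonymous"): 10, ("C>T", "missense"): 30,
    ("T>C", "synonymous"): 20, ("T>C", "nonsense"): 20,
}
toy_rates = {"C>T": 2.0, "T>C": 1.0}
lam_s, lam_n = expected_counts(toy_table, toy_rates)
p_n = lam_n / (lam_s + lam_n)
# Observed: 12 non-synonymous out of 30 coding SNVs (hypothetical).
p_value = binom_test_two_sided(12, 30, p_n)
```

An observed non-synonymous fraction significantly below pN would be consistent with purifying selection, whereas a fraction close to pN would be consistent with neutrality.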

We also used the observed somatic mutation rate in the whole genome (Supplementary file 1m), including genic and intergenic regions, as the background mutation rate and confirmed the robustness of our conclusion (Supplementary file 1j). The somatic mutation rates in the intergenic region and the whole genome were calculated for each species by pooling the data from two individuals (Supplementary file 1m). While cancer genomics studies have accounted for more detailed context-dependent mutations, such as the high rate of C>T at CpG dinucleotides (Greenman et al., 2006) or comprehensive analysis of 96 possible substitution classes in triplet context (Martincorena et al., 2017), the number of SNVs in our tropical trees is too small to perform such a comprehensive analysis. Therefore, we used the relatively simple six base substitution classes. The genes with somatic SNVs can be found in Supplementary file 1n.

Acknowledgements

The authors thank M Ohno for her insightful discussion, M Seki for his assistance with statistical analysis, S K Hirota for his technical support in molecular experiments, and Y Ikezaki for her support in synteny analyses. We also thank Y Iwasa, H Tachida, M M Manuel, N Spisak, M Przeworski and M Nordborg for their very insightful comments on the initial draft of our manuscript.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Akiko Satake, Email: akiko.satake@kyudai.jp.

Wenfeng Qian, Chinese Academy of Sciences, China.

Detlef Weigel, Max Planck Institute for Biology Tübingen, Germany.

Funding Information

This paper was supported by the following grants:

  • Japan Society for the Promotion of Science JP17H06478 to Akiko Satake.

  • Japan Society for the Promotion of Science JP22H04925 to Masahiro Kasahara.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Resources, Formal analysis, Supervision, Funding acquisition, Validation, Methodology, Writing – original draft, Project administration, Writing – review and editing.

Data curation, Software, Formal analysis, Validation, Investigation, Visualization, Writing – review and editing.

Data curation, Software, Writing – review and editing.

Formal analysis, Validation, Visualization.

Investigation.

Resources.

Resources.

Resources.

Resources.

Investigation.

Investigation.

Resources.

Resources, Investigation, Writing – review and editing.

Formal analysis, Validation, Visualization, Writing – review and editing.

Data curation, Software, Funding acquisition, Writing – review and editing.

Additional files

Supplementary file 1. Supplementary tables.

(a) Mean annual increment (MAI) of diameter at breast height (DBH). (b) Matrix of physical distances (m) between sampling positions and the number of SNVs indicated in parentheses. (c) Summary statistics of the studied trees. Height and DBH were directly measured for two individuals of S. laevis and S. leprosula. Age was estimated as DBH divided by a mean annual increment (MAI). (d) Summary statistics of genome assemblies for S. laevis and S. leprosula. We assembled the genome using DNA extracted from the apical leaf at branch 1–1 of the tallest individual of each species (S1 and F1). Summary statistics of genome assemblies are listed here. (e) Summary statistics of whole genome sequencing. (f) The number of candidate SNVs during each step of the filtering process. (g) Assessment of candidate SNVs using amplicon sequencing. (h) Somatic mutation rates. The somatic mutation rate per nucleotide per meter was estimated as μg = b/(2 × R), where b indicates the slope of linear regression. The somatic mutation per nucleotide per year (μy) was estimated as μy = M/(2 × R × A), where M indicates the total number of SNVs accumulated from the base to the branch tip and A represents tree age, respectively. R denotes the number of callable sites. (i) Cosine similarity of mutation spectra between Shorea trees and humans. (j) Results of the binomial test for selection on somatic and inter-individual SNVs. To test whether somatic and inter-individual SNVs are subject to selection, we calculated the expected rate of non-synonymous mutation. Given the observed number of non-synonymous and synonymous mutations, we rejected the null hypothesis of neutral selection using a binomial test with the significance level of 5%. pN_expected and pN_observed represent the expected and observed rate of non-synonymous substitutions. (k) The final set of SNVs. (l) Fractions of synonymous, missense, and nonsense substitutions. (m) Somatic mutation rates for six substitution classes.
Somatic mutation rates for six substitution classes were calculated based on the observed number of SNVs both from the intergenic region and the whole genome. S1 +S2 and F1 +F2 represent the use of pooled data from two individuals for each species: S. laevis (S1, S2) and S. leprosula (F1, F2). The values based on the pooled data (indicated in bold type) were used to calculate the expected rate of non-synonymous mutation. (n) List of genes with somatic SNVs.

elife-88456-supp1.xlsx (201.3KB, xlsx)
MDAR checklist

Data availability

The raw sequencing data, the genome assembly, and the gene annotation are available at DDBJ under accessions PRJDB14538 for S. laevis and PRJDB15012 for S. leprosula. The code for the bioinformatics pipeline to process whole genome sequencing data is available from https://github.com/ku-biomath/Shorea_mutation_detection (copy archived at ku-biomath, 2023).

The following datasets were generated:

Satake A, Imai R, Fujino T, Tomimoto S, Ohta K, Na'iem M, Indrioko S, Widiyatno PS, Mollá-Morales A, Nizhynska V, Tani N, Suyama Y, Sasaki E, Kasahara M. 2023. Genetic mosaicism and somatic mutation rate in tropical trees. DNA Data Bank of Japan. PRJDB14538

Satake A, Imai R, Fujino T, Tomimoto S, Ohta K, Na'iem M, Indrioko S, Widiyatno PS, Mollá-Morales A, Nizhynska V, Tani N, Suyama Y, Sasaki E, Kasahara M. 2023. Genetic mosaicism and somatic mutation rate in tropical trees. DNA Data Bank of Japan. PRJDB15012

References

  1. Abascal F, Harvey LMR, Mitchell E, Lawson ARJ, Lensing SV, Ellis P, Russell AJC, Alcantara RE, Baez-Ortega A, Wang Y, Kwa EJ, Lee-Six H, Cagan A, Coorens THH, Chapman MS, Olafsson S, Leonard S, Jones D, Machado HE, Davies M, Øbro NF, Mahubani KT, Allinson K, Gerstung M, Saeb-Parsy K, Kent DG, Laurenti E, Stratton MR, Rahbari R, Campbell PJ, Osborne RJ, Martincorena I. Somatic mutation landscapes at single-molecule resolution. Nature. 2021;593:405–410. doi: 10.1038/s41586-021-03477-4.
  2. Alexandrov LB, Nik-Zainal S, Wedge DC, Campbell PJ, Stratton MR. Deciphering signatures of mutational processes operative in human cancer. Cell Reports. 2013;3:246–259. doi: 10.1016/j.celrep.2012.12.008.
  3. Alexandrov LB, Jones PH, Wedge DC, Sale JE, Campbell PJ, Nik-Zainal S, Stratton MR. Clock-like mutational processes in human somatic cells. Nature Genetics. 2015;47:1402–1407. doi: 10.1038/ng.3441.
  4. Alexandrov LB, Kim J, Haradhvala NJ, Huang MN, Tian Ng AW, Wu Y, Boot A, Covington KR, Gordenin DA, Bergstrom EN, Islam SMA, Lopez-Bigas N, Klimczak LJ, McPherson JR, Morganella S, Sabarinathan R, Wheeler DA, Mustonen V, Getz G, Rozen SG, Stratton MR. The repertoire of mutational signatures in human cancer. Nature. 2020;578:94–101. doi: 10.1038/s41586-020-1943-3.
  5. Bateman A. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Research. 2021;49:D480–D489. doi: 10.1093/nar/gkaa1100.
  6. Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–2635. doi: 10.1093/bioinformatics/btm308.
  7. Brůna T, Hoff KJ, Lomsadze A, Stanke M, Borodovsky M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics and Bioinformatics. 2021;3:lqaa108. doi: 10.1093/nargab/lqaa108.
  8. Cagan A, Baez-Ortega A, Brzozowska N, Abascal F, Coorens THH, Sanders MA, Lawson ARJ, Harvey LMR, Bhosle S, Jones D, Alcantara RE, Butler TM, Hooks Y, Roberts K, Anderson E, Lunn S, Flach E, Spiro S, Januszczak I, Wrigglesworth E, Jenkins H, Dallas T, Masters N, Perkins MW, Deaville R, Druce M, Bogeska R, Milsom MD, Neumann B, Gorman F, Constantino-Casas F, Peachey L, Bochynska D, Smith ESJ, Gerstung M, Campbell PJ, Murchison EP, Stratton MR, Martincorena I. Somatic mutation rates scale with lifespan across mammals. Nature. 2022;604:517–524. doi: 10.1038/s41586-022-04618-z.
  9. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
  10. Chen CJ. TBtools-II. v1.120. GitHub. 2023. https://github.com/CJ-Chen/TBtools-II
  11. Cheng H, Concepcion GT, Feng X, Zhang H, Li H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nature Methods. 2021;18:170–175. doi: 10.1038/s41592-020-01056-5.
  12. Cooper DN, Krawczak M. Cytosine methylation and the fate of CpG dinucleotides in vertebrate genomes. Human Genetics. 1989;83:181–188. doi: 10.1007/BF00286715.
  13. Coulondre C, Miller JH, Farabaugh PJ, Gilbert W. Molecular basis of base substitution hotspots in Escherichia coli. Nature. 1978;274:775–780. doi: 10.1038/274775a0.
  14. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
  15. Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10:giab008. doi: 10.1093/gigascience/giab008.
  16. de Manuel M, Wu FL, Przeworski M. A paternal bias in germline mutation is widespread in amniotes and can arise independently of cell division numbers. eLife. 2022;11:e80008. doi: 10.7554/eLife.80008.
  17. Doyle JJ, Doyle JL. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin. 1987;19:11–15.
  18. Doyle J. DNA Protocols for Plants in Molecular Techniques in Taxonomy. Springer; 1991.
  19. Duan Y, Yan J, Zhu Y, Zhang C, Tao X, Ji H, Zhang M, Wang X, Wang L. Limited accumulation of high-frequency somatic mutations in a 1700-year-old Osmanthus fragrans tree. Tree Physiology. 2022;42:2040–2049. doi: 10.1093/treephys/tpac058.
  20. Duncan BK, Miller JH. Mutagenic deamination of cytosine residues in DNA. Nature. 1980;287:560–561. doi: 10.1038/287560a0.
  21. Duret L, Galtier N. Biased gene conversion and the evolution of mammalian genomic landscapes. Annual Review of Genomics and Human Genetics. 2009;10:285–311. doi: 10.1146/annurev-genom-082908-150001.
  22. Gabriel L, Hoff KJ, Brůna T, Borodovsky M, Stanke M. TSEBRA: transcript selector for BRAKER. BMC Bioinformatics. 2021;22:566. doi: 10.1186/s12859-021-04482-0.
  23. Gao Z, Wyman MJ, Sella G, Przeworski M. Interpreting the dependence of mutation rates on age and time. PLOS Biology. 2016;14:e1002355. doi: 10.1371/journal.pbio.1002355.
  24. Ghazoul J. Dipterocarp Biology, Ecology, and Conservation. Oxford University Press; 2016.
  25. Gill DE, Chao L, Perkins SL, Wolf JB. Genetic mosaicism in plants and clonal animals. Annual Review of Ecology and Systematics. 1995;26:423–444. doi: 10.1146/annurev.es.26.110195.002231.
  26. Greenman C, Wooster R, Futreal PA, Stratton MR, Easton DF. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics. 2006;173:2187–2198. doi: 10.1534/genetics.105.044677.
  27. Hanlon VCT, Otto SP, Aitken SN. Somatic mutations substantially increase the per-generation mutation rate in the conifer Picea sitchensis. Evolution Letters. 2019;3:348–358. doi: 10.1002/evl3.121.
  28. Hart AJ, Ginzburg S, Xu MS, Fisher CR, Rahmatpour N, Mitton JB, Paul R, Wegrzyn JL. EnTAP: Bringing faster and smarter functional annotation to non-model eukaryotic transcriptomes. Molecular Ecology Resources. 2020;20:591–604. doi: 10.1111/1755-0998.13106.
  29. Hoff K. TSEBRA: transcript selector for BRAKER. 0e6c9bf. GitHub. 2022. https://github.com/Gaius-Augustus/TSEBRA
  30. Hofmeister BT, Denkena J, Colomé-Tatché M, Shahryary Y, Hazarika R, Grimwood J, Mamidi S, Jenkins J, Grabowski PP, Sreedasyam A, Shu S, Barry K, Lail K, Adam C, Lipzen A, Sorek R, Kudrna D, Talag J, Wing R, Hall DW, Jacobsen D, Tuskan GA, Schmutz J, Johannes F, Schmitz RJ. A genome assembly and the somatic genetic and epigenetic mutation rate in a wild long-lived perennial Populus trichocarpa. Genome Biology. 2020;21:259. doi: 10.1186/s13059-020-02162-5.
  31. Iwasa Y, Tomimoto S, Satake A. The genetic structure within a single tree is determined by the behavior of the stem cells in the meristem. Genetics. 2023;223:iyad020. doi: 10.1093/genetics/iyad020. [DOI] [PubMed] [Google Scholar]
  32. Kim D, Paggi JM, Park C, Bennett C, Salzberg SL. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nature Biotechnology. 2019;37:907–915. doi: 10.1038/s41587-019-0201-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Kimura M, Ota T. On the rate of molecular evolution. Journal of Molecular Evolution. 1971;1:1–17. doi: 10.1007/BF01659390. [DOI] [PubMed] [Google Scholar]
  34. Kimura M. The Neutral Theory of Molecular Evolution. Cambridge University Press; 1983. [DOI] [Google Scholar]
  35. Kokot M, Dlugosz M, Deorowicz S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics. 2017;33:2759–2761. doi: 10.1093/bioinformatics/btx304. [DOI] [PubMed] [Google Scholar]
  36. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nature Biotechnology. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8. [DOI] [PubMed] [Google Scholar]
  37. ku-biomath. Shorea_mutation_detection. swh:1:rev:5bdd43519ddf9ecdb529076b0def492f4b6295cd. Software Heritage. 2023. https://archive.softwareheritage.org/swh:1:dir:ecc02195497b690d48207d9862c66308cafadbc4;origin=https://github.com/ku-biomath/Shorea_mutation_detection;visit=swh:1:snp:bc71973e4a1cb0832a554fa2db9b283c21b155e5;anchor=swh:1:rev:5bdd43519ddf9ecdb529076b0def492f4b6295cd
  38. Kundu R, Casey J, Sung WK. HyPo: super fast & accurate polisher for long read genome assemblies. Bioinformatics. 2019;1:e2506. doi: 10.1101/2019.12.19.882506. [DOI] [Google Scholar]
  39. Lanfear R. Do plants have a segregated germline? PLOS Biology. 2018;16:e2005439. doi: 10.1371/journal.pbio.2005439. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Lee-Six H, Olafsson S, Ellis P, Osborne RJ, Sanders MA, Moore L, Georgakopoulos N, Torrente F, Noorani A, Goddard M, Robinson P, Coorens THH, O’Neill L, Alder C, Wang J, Fitzgerald RC, Zilbauer M, Coleman N, Saeb-Parsy K, Martincorena I, Campbell PJ, Stratton MR. The landscape of somatic mutation in normal colorectal epithelial cells. Nature. 2019;574:532–537. doi: 10.1038/s41586-019-1672-7. [DOI] [PubMed] [Google Scholar]
  42. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352. [DOI] [PMC free article] [PubMed] [Google Scholar]
  43. Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–2993. doi: 10.1093/bioinformatics/btr509. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Li H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics. 2014;30:2843–2851. doi: 10.1093/bioinformatics/btu356. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Liu H, Zhang J. Yeast spontaneous mutation rate and spectrum vary with environment. Current Biology. 2019;29:1584–1591. doi: 10.1016/j.cub.2019.03.054. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Makova KD, Li WH. Strong male-driven evolution of DNA sequences in humans and apes. Nature. 2002;416:624–626. doi: 10.1038/416624a. [DOI] [PubMed] [Google Scholar]
  47. Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO Update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Molecular Biology and Evolution. 2021;38:4647–4654. doi: 10.1093/molbev/msab199. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. Martincorena I, Raine KM, Gerstung M, Dawson KJ, Haase K, Van Loo P, Davies H, Stratton MR, Campbell PJ. Universal patterns of selection in cancer and somatic tissues. Cell. 2017;171:1029–1041. doi: 10.1016/j.cell.2017.09.042. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010;20:1297–1303. doi: 10.1101/gr.107524.110. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Ng CH, Lee SL, Tnah LH, Ng KKS, Lee CT, Madon M. Genome size variation and evolution in Dipterocarpaceae. Plant Ecology & Diversity. 2016;9:437–446. doi: 10.1080/17550874.2016.1267274. [DOI] [Google Scholar]
  51. Ng KKS, Kobayashi MJ, Fawcett JA, Hatakeyama M, Paape T, Ng CH, Ang CC, Tnah LH, Lee CT, Nishiyama T, Sese J, O’Brien MJ, Copetti D, Isa MNM, Ong RC, Putra M, Siregar IZ, Indrioko S, Kosugi Y, Izuno A, Isagi Y, Lee SL, Shimizu KK. The genome of Shorea leprosula (Dipterocarpaceae) highlights the ecological relevance of drought in aseasonal tropical rainforests. Communications Biology. 2021;4:1166. doi: 10.1038/s42003-021-02682-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Nik-Zainal S, Davies H, Staaf J, Ramakrishna M, Glodzik D, Zou X, Martincorena I, Alexandrov LB, Martin S, Wedge DC, Van Loo P, Ju YS, Smid M, Brinkman AB, Morganella S, Aure MR, Lingjærde OC, Langerød A, Ringnér M, Ahn SM, Boyault S, Brock JE, Broeks A, Butler A, Desmedt C, Dirix L, Dronov S, Fatima A, Foekens JA, Gerstung M, Hooijer GKJ, Jang SJ, Jones DR, Kim HY, King TA, Krishnamurthy S, Lee HJ, Lee JY, Li Y, McLaren S, Menzies A, Mustonen V, O’Meara S, Pauporté I, Pivot X, Purdie CA, Raine K, Ramakrishnan K, Rodríguez-González FG, Romieu G, Sieuwerts AM, Simpson PT, Shepherd R, Stebbings L, Stefansson OA, Teague J, Tommasi S, Treilleux I, Van den Eynden GG, Vermeulen P, Vincent-Salomon A, Yates L, Caldas C, van’t Veer L, Tutt A, Knappskog S, Tan BKT, Jonkers J, Borg Å, Ueno NT, Sotiriou C, Viari A, Futreal PA, Campbell PJ, Span PN, Van Laere S, Lakhani SR, Eyfjord JE, Thompson AM, Birney E, Stunnenberg HG, van de Vijver MJ, Martens JWM, Børresen-Dale AL, Richardson AL, Kong G, Thomas G, Stratton MR. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature. 2016;534:47–54. doi: 10.1038/nature17676. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, Astashyn A, Badretdin A, Bao Y, Blinkova O, Brover V, Chetvernin V, Choi J, Cox E, Ermolaeva O, Farrell CM, Goldfarb T, Gupta T, Haft D, Hatcher E, Hlavina W, Joardar VS, Kodali VK, Li W, Maglott D, Masterson P, McGarvey KM, Murphy MR, O’Neill K, Pujar S, Rangwala SH, Rausch D, Riddick LD, Schoch C, Shkeda A, Storz SS, Sun H, Thibaud-Nissen F, Tolstoy I, Tully RE, Vatsan AR, Wallin C, Webb D, Wu W, Landrum MJ, Kimchi A, Tatusova T, DiCuccio M, Kitts P, Murphy TD, Pruitt KD. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Research. 2016;44:D733–D745. doi: 10.1093/nar/gkv1189. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Orr AJ, Padovan A, Kainer D, Külheim C, Bromham L, Bustos-Segura C, Foley W, Haff T, Hsieh J-F, Morales-Suarez A, Cartwright RA, Lanfear R. A phylogenomic approach reveals a low somatic mutation rate in a long-lived plant. Proceedings of the Royal Society B. 2020;287:20192364. doi: 10.1098/rspb.2019.2364. [DOI] [PMC free article] [PubMed] [Google Scholar]
  55. Ossowski S, Schneeberger K, Lucas-Lledó JI, Warthmann N, Clark RM, Shaw RG, Weigel D, Lynch M. The rate and molecular spectrum of spontaneous mutations in Arabidopsis thaliana. Science. 2010;327:92–94. doi: 10.1126/science.1180677. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, Jiang N, Hirsch CN, Hufford MB. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biology. 2019;20:275. doi: 10.1186/s13059-019-1905-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Plomion C, Aury JM, Amselem J, Leroy T, Murat F, Duplessis S, Faye S, Francillonne N, Labadie K, Le Provost G, Lesur I, Bartholomé J, Faivre-Rampant P, Kohler A, Leplé JC, Chantret N, Chen J, Diévart A, Alaeitabar T, Barbe V, Belser C, Bergès H, Bodénès C, Bogeat-Triboulot MB, Bouffaud ML, Brachi B, Chancerel E, Cohen D, Couloux A, Da Silva C, Dossat C, Ehrenmann F, Gaspin C, Grima-Pettenati J, Guichoux E, Hecker A, Herrmann S, Hugueney P, Hummel I, Klopp C, Lalanne C, Lascoux M, Lasserre E, Lemainque A, Desprez-Loustau ML, Luyten I, Madoui MA, Mangenot S, Marchal C, Maumus F, Mercier J, Michotey C, Panaud O, Picault N, Rouhier N, Rué O, Rustenholz C, Salin F, Soler M, Tarkka M, Velt A, Zanne AE, Martin F, Wincker P, Quesneville H, Kremer A, Salse J. Oak genome reveals facets of long lifespan. Nature Plants. 2018;4:440–452. doi: 10.1038/s41477-018-0172-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  58. Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Research. 2014;42:D231–D239. doi: 10.1093/nar/gkt1253. [DOI] [PMC free article] [PubMed] [Google Scholar]
  59. Praptoyo H, Mayaningsih R. Anatomical features of wood from some fast growing red meranti. Proceeding of the 4th International Symposium of IWoRs. 2012. [Google Scholar]
  60. Reijns MAM, Kemp H, Ding J, de Procé SM, Jackson AP, Taylor MS. Lagging-strand replication shapes the mutational landscape of the genome. Nature. 2015;518:502–506. doi: 10.1038/nature14183. [DOI] [PMC free article] [PubMed] [Google Scholar]
  61. Reusch TBH, Baums IB, Werner B. Evolution via somatic genetic variation in modular species. Trends in Ecology & Evolution. 2021;36:1083–1092. doi: 10.1016/j.tree.2021.08.011. [DOI] [PubMed] [Google Scholar]
  62. Robinson JT, Thorvaldsdóttir H, Wenger AM, Zehir A, Mesirov JP. Variant review with the integrative genomics viewer. Cancer Research. 2017;77:e31–e34. doi: 10.1158/0008-5472.CAN-17-0337. [DOI] [PMC free article] [PubMed] [Google Scholar]
  63. Schmid-Siegert E, Sarkar N, Iseli C, Calderon S, Gouhier-Darimont C, Chrast J, Cattaneo P, Schütz F, Farinelli L, Pagni M, Schneider M, Voumard J, Jaboyedoff M, Fankhauser C, Hardtke CS, Keller L, Pannell JR, Reymond A, Robinson-Rechavi M, Xenarios I, Reymond P. Low number of fixed somatic mutations in a long-lived oak tree. Nature Plants. 2017;3:926–929. doi: 10.1038/s41477-017-0066-9. [DOI] [PubMed] [Google Scholar]
  64. Schoen DJ, Schultz ST. Somatic mutation and evolution in plants. Annual Review of Ecology, Evolution, and Systematics. 2019;50:49–73. doi: 10.1146/annurev-ecolsys-110218-024955. [DOI] [Google Scholar]
  65. Shen W, Le S, Li Y, Hu F. SeqKit: a cross-platform and ultrafast toolkit for fasta/q file manipulation. PLOS ONE. 2016;11:e0163962. doi: 10.1371/journal.pone.0163962. [DOI] [PMC free article] [PubMed] [Google Scholar]
  66. Smit A, Hubley R, Green P. Institute for Systems Biology; 2021. http://www.repeatmasker.org [Google Scholar]
  67. Suyama Y, Hirota SK, Matsuo A, Tsunamoto Y, Mitsuyuki C, Shimura A, Okano K. Complementary combination of multiplex high‐throughput DNA sequencing for molecular phylogeny. Ecological Research. 2022;37:171–181. doi: 10.1111/1440-1703.12270. [DOI] [Google Scholar]
  68. Tamura K, Stecher G, Kumar S. MEGA11: molecular evolutionary genetics analysis version 11. Molecular Biology and Evolution. 2021;38:3022–3027. doi: 10.1093/molbev/msab120. [DOI] [PMC free article] [PubMed] [Google Scholar]
  69. Tomasetti C, Vogelstein B. Variation in cancer risk among tissues can be explained by the number of stem cell divisions. Science. 2015;347:78–81. doi: 10.1126/science.1260825. [DOI] [PMC free article] [PubMed] [Google Scholar]
  70. Tomimoto S, Satake A. Modelling somatic mutation accumulation and expansion in a long-lived tree with hierarchical modular architecture. Journal of Theoretical Biology. 2023;565:111465. doi: 10.1016/j.jtbi.2023.111465. [DOI] [PubMed] [Google Scholar]
  71. Toyama H, Kajisa T, Tagane S, Mase K, Chhang P, Samreth V, Ma V, Sokh H, Ichihashi R, Onoda Y, Mizoue N, Yahara T. Effects of logging and recruitment on community phylogenetic structure in 32 permanent forest plots of Kampong Thom, Cambodia. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences. 2015;370:20140008. doi: 10.1098/rstb.2014.0008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  72. Usami K. Tropical woods as pulp stuffs. Journal of Agricultural Research Quarterly. 1978;12:109–114. [Google Scholar]
  73. Vasimuddin M, Misra S, Li H, Aluru S. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. Proceedings - 2019 IEEE 33rd International Parallel and Distributed Processing Symposium, IPDPS 2019; 2019. pp. 314–324. [DOI] [Google Scholar]
  74. Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics. 2017;33:2202–2204. doi: 10.1093/bioinformatics/btx153. [DOI] [PMC free article] [PubMed] [Google Scholar]
  75. Wang L, Ji Y, Hu Y, Hu H, Jia X, Jiang M, Zhang X, Zhao L, Zhang Y, Jia Y, Qin C, Yu L, Huang J, Yang S, Hurst LD, Tian D. The architecture of intra-organism mutation rate variation in plants. PLOS Biology. 2019;17:e3000191. doi: 10.1371/journal.pbio.3000191. [DOI] [PMC free article] [PubMed] [Google Scholar]
  76. Watson JM, Platzer A, Kazda A, Akimcheva S, Valuchova S, Nizhynska V, Nordborg M, Riha K. Germline replications and somatic mutation accumulation are independent of vegetative life span in Arabidopsis. PNAS. 2016;113:12226–12231. doi: 10.1073/pnas.1609686113. [DOI] [PMC free article] [PubMed] [Google Scholar]
  77. Whitham TG, Slobodchikoff CN. Evolution by individuals, plant-herbivore interactions, and mosaics of genetic variability: the adaptive significance of somatic mutations in plants. Oecologia. 1981;49:287–292. doi: 10.1007/BF00347587. [DOI] [PubMed] [Google Scholar]
  78. Widiyatno W, Soekotjo S, Naiem M, Purnomo S, Setiyanto PE. Early performance of 23 dipterocarp species planted in logged-over rainforest. Journal of Tropical Forest Science. 2014;26:259–266. [Google Scholar]
  79. Yeoh SH, Satake A, Numata S, Ichie T, Lee SL, Basherudin N, Muhammad N, Kondo T, Otani T, Hashim M, Tani N. Unravelling proximate cues of mass flowering in the tropical forests of South-East Asia from gene expression analyses. Molecular Ecology. 2017;26:5074–5085. doi: 10.1111/mec.14257. [DOI] [PubMed] [Google Scholar]
  80. Yu L, Boström C, Franzenburg S, Bayer T, Dagan T, Reusch TBH. Somatic genetic drift and multilevel selection in a clonal seagrass. Nature Ecology & Evolution. 2020;4:952–962. doi: 10.1038/s41559-020-1196-4. [DOI] [PubMed] [Google Scholar]
  81. Zdobnov EM, Kuznetsov D, Tegenfeldt F, Manni M, Berkeley M, Kriventseva EV. OrthoDB in 2020: evolutionary and functional annotations of orthologs. Nucleic Acids Research. 2021;49:D389–D393. doi: 10.1093/nar/gkaa1009. [DOI] [PMC free article] [PubMed] [Google Scholar]
  82. Zuckerkandl E, Pauling L. Evolving Genes and Proteins. Academic Press; 1965. [DOI] [Google Scholar]

eLife assessment

Wenfeng Qian 1

Satake and colleagues' important study elucidates somatic mutation processes in plants, demonstrating that in two tropical trees, mutation rates correlate with age, not growth rates. Their convincing evidence shows that many mutations do not align with cell divisions, suggesting many somatic mutations are generated in a replication-independent manner. This study represents a significant step towards advancing our understanding of plant development and the patterns and inheritance of mutations. This significant research is poised to engage a diverse array of scholars in plant evolution and development.

Reviewer #1 (Public review):

Anonymous

In this study, Satake and colleagues endeavored to explore the rates and patterns of somatic mutations in wild plants, with a focus on their relationship to longevity. The researchers examined slow- and fast-growing tropical tree species, demonstrating that slow-growing species exhibited five times more mutations than their fast-growing counterparts. The number of somatic mutations was found to increase linearly with branch length. Interestingly, the somatic mutation rate per meter was higher in slow-growing species, but the rate per year remained consistent across both species. A closer inspection revealed a prevalence of clock-like spontaneous mutations, specifically cytosine-to-thymine substitutions at CpG sites. The authors suggested that somatic mutations are neutral within an individual but subject to purifying selection when transmitted to subsequent generations. The authors developed a model to assess the influence of cell division on mutational processes, suggesting that cell-division independent mutagenesis is the primary mechanism.

The authors have gathered valuable data on somatic mutations, particularly regarding differences in growth rates among trees. Their meticulous computational analysis led to fascinating conclusions, primarily that most somatic mutations accumulate in a cell-division independent manner. The discovery of a molecular clock in somatic mutations significantly advances our comprehension of mutational processes that may generate genetic diversity in tropical ecosystems. The interpretation of the data appears to be based on the assumption that somatic mutations can be effectively transmitted to the next generation unless negative selection intervenes. However, accumulating evidence suggests that plants may also possess "effective germlines," which could render the somatic mutations detected in this study non-transmittable to progeny. Incorporating additional analyses/discussion in the context of plant developmental biology, particularly recent studies on cell lineage, could further enhance this study.

Specifically, several recent studies address the topics of effective germline in plants. For instance, Robert Lanfear published an article in PLoS Biology exploring the fundamental question, "Do plants have a segregated germline?" A study in PNAS posited that "germline replications and somatic mutation accumulation are independent of vegetative life span in Arabidopsis." A phylogenetic-based analysis titled "Rates of Molecular Evolution Are Linked to Life History in Flowering Plants" discovered that "rates of molecular evolution are consistently low in trees and shrubs, with relatively long generation times, as compared with related herbaceous plants, which generally have shorter generation times." Another compelling study, "The architecture of intra-organism mutation rate variation in plants," published in PLoS Biology, detected somatic mutations in peach trees and strawberries. Although some of these studies are cited in the current work, a deeper examination of the findings in relation to the existing literature would strengthen the interpretation of the data.

Reviewer #2 (Public review):

Anonymous

In this manuscript, the authors used an original empirical design to test whether somatic mutation rates differ depending on plant growth rates. They detected somatic mutations along the growth axes of four trees (two individuals per species for two dipterocarp tree species growing at different rates). They found that plant somatic mutations accumulate at a relatively constant rate per year in the two species, suggesting that somatic mutation rates correlate with time rather than with growth, i.e. the number of cell divisions. The authors then suggest that this result is consistent with a low relative contribution of DNA replication errors (referred to as α in the manuscript) to the somatic mutation rates as compared to the other sources of mutations (β). Given that plants, in particular trees, are generally assumed to deviate from August Weismann's theory (a part of the somatic variation is expected to be transmitted to the next generation), this work could be of interest to a large readership interested in mutation rates as a whole, since it also has implications for heritable mutation rates. In addition, even if this is not discussed, the putatively low contribution of DNA replication errors could help to understand the apparent paradox associated with trees. Indeed, trees exhibit clear signatures of lower rates of molecular evolution (Lanfear et al. 2013), therefore suggesting lower mutation rates per unit of time. Trees could partly keep somatic mutations under control thanks to a long-term evolution towards low α values, resulting in low α/β ratios as compared to short-lived species. I therefore consider that the paper tackles a fundamental albeit complex question in the field.
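The reasoning in the paragraph above can be written out as a short worked derivation. This is a minimal sketch under stated assumptions, not necessarily the formulation used in the manuscript: only α and β come from the review text, while the symbols $r$ (cell divisions per meter of growth), $g$ (elongation rate in meters per year), and the additive form of the model are assumed here for illustration.

```latex
% Assumed additive model: replication-dependent (alpha) plus time-dependent (beta) mutations.
% r = cell divisions per meter (assumed symbol), g = growth rate in m/yr (assumed symbol).
\begin{align}
  \mu_{\text{meter}} &= \alpha r + \frac{\beta}{g}, \\
  \mu_{\text{year}}  &= g\,\mu_{\text{meter}} = \alpha r g + \beta .
\end{align}
% If replication errors contribute little (alpha r g << beta), then
\begin{align}
  \mu_{\text{year}} \approx \beta
  \quad\text{and}\quad
  \frac{\mu_{\text{meter}}^{\text{slow}}}{\mu_{\text{meter}}^{\text{fast}}}
  \approx \frac{g^{\text{fast}}}{g^{\text{slow}}} .
\end{align}
```

Under these assumptions, a per-year rate that is similar between species, together with a per-meter rate that is higher in the slow-growing species, is the pattern expected when β dominates. If α dominated instead, the per-meter rate would be similar between species and the per-year rate would scale with growth rate.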

Overall, I consider that the authors should clearly indicate the weaknesses of the study. For instance, because of the bioinformatic tools used, they have probably detected only a small part of the somatic mutations, those that have reached a high allele frequency in tissues. Mutation counts are known to be highly dependent on the experimental design and the methods used. Consequently, (i) this should be explicit and (ii) a particular effort should be made to demonstrate that the observed differences in mutation counts are robust to the potential experimental biases. This is important since, empirically, we know how mutation counts can vary depending on the experimental designs. For instance, a difference of an order of magnitude has been observed between the two papers focusing on oaks (Schmid-Siegert et al. 2017 and Plomion et al. 2018) and this difference is now known to be due to the differences in the experimental designs, in particular the sequencing effort (Schmitt et al. 2022).

Having said that, my overall opinion is that (i) the authors have worked on an interesting design and generated unique data, (ii) the results are probably robust to some biases and therefore strong enough (but see my comments regarding possible improvements), (iii) the interpretations are reasonable and (iv) the discussion regarding the source of somatic mutations is valuable (even if I made some suggestions here as well).

Reviewer #3 (Public review):

Anonymous

In animals, several recent studies have revealed a substantial role for non-replicative mutagenic processes such as DNA damage and repair rather than replicative error as was previously believed. Much less is known about how mutation operates in plants, with only a handful of studies devoted to the topic. Authors Satake et al. aimed to address this gap in our understanding by comparing the rates and patterns of somatic mutation in a pair of tropical tree species, slow-growing Shorea laevis and fast-growing S. leprosula. They find that the yearly somatic mutation rates in the two species are highly similar despite their difference in growth rates. The authors further find that the mutation spectrum is enriched for signatures of spontaneous mutation and that a model of mutation arising from different sources is consistent with a large input of mutation from sources uncorrelated with cell division. The authors conclude that somatic mutation rates in these plants appear to be dictated by time, not cell division numbers, a finding that is in line with other eukaryotes studied so far.

In general, this work shows careful consideration and study design, and the multiple lines of evidence presented provide good support for the authors' conclusions. In particular, they use a sound approach to identify rare somatic mutations in the sampled trees including biological replicates, multiple SNP-callers and thresholds, and without presumption of a branching pattern.

Inter-species comparisons of absolute mutation rates are challenging. This is largely due to differences in SNP-calling methods and reference genome quality leading to variable sensitivity and specificity in identifying mutations. By applying their pipeline consistently across both species, the authors provide confidence in the comparative mutation rate results. Moreover, the presented false negative and false positive rate estimates for each species would apparently have minimal impact on the overall findings.

Despite the overall elegance of the authors' experimental setup, one methodological wrinkle warrants consideration. The authors compare the mutation rate per meter of growth, demonstrating that the rate is higher in slow-growing S. laevis: a key piece of evidence in favor of the authors' conclusion that somatic mutations track absolute time rather than cell division. To estimate the mutation rate per unit distance, they regress the per base-pair rate of mutations found between all pairwise branch tips against the physical distance separating the tips (Fig. 2a). While a regression approach is appropriate, the narrowness of the confidence interval is overstated as the points are not statistically independent: internal branches are represented multiple times. (For example, all pairwise comparisons involving a cambium sample will include the mutations arising along the lower trunk.) Regressing rates and lengths of distinct branches might be more appropriate. Judging from the data presented, however, the point estimates seem unlikely to change much.
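The non-independence described above can be illustrated with a small sketch. The tree below is entirely hypothetical (node names, topology, and branch lengths are invented for illustration and are not data from the study); the code simply counts how many pairwise tip comparisons each physical branch contributes to.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tree: child -> (parent, branch length in meters). Not data from the study.
PARENT = {
    "A": ("base", 20.0),  # lower trunk, from the basal (cambium-like) sample up to node A
    "t1": ("A", 5.0),
    "B": ("A", 4.0),
    "t2": ("B", 3.0),
    "t3": ("B", 2.0),
}
SAMPLES = ["base", "t1", "t2", "t3"]  # sampled positions (hypothetical)


def edges_to_root(node):
    """Return the branches (named by their child node) between a node and the root."""
    edges = set()
    while node in PARENT:
        edges.add(node)
        node = PARENT[node][0]
    return edges


def path_edges(a, b):
    """Branches on the path between two samples (symmetric difference of root paths)."""
    return edges_to_root(a) ^ edges_to_root(b)


def pairwise_distance(a, b):
    """Physical distance between two samples along the tree."""
    return sum(PARENT[edge][1] for edge in path_edges(a, b))


def branch_reuse(samples):
    """Count how many pairwise comparisons include each branch."""
    counts = Counter()
    for a, b in combinations(samples, 2):
        counts.update(path_edges(a, b))
    return counts


if __name__ == "__main__":
    n_pairs = len(list(combinations(SAMPLES, 2)))
    for branch, n in sorted(branch_reuse(SAMPLES).items()):
        print(f"branch ending at {branch}: appears in {n} of {n_pairs} pairwise comparisons")
```

In this toy example, six pairwise points are built from only five distinct branches, and every branch (including the lower trunk) enters three or four of them, which is why a confidence interval computed as if the points were independent would be too narrow.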

This work deepens our understanding of how mutation operates at the cellular level by adding plants to the list of eukaryotes in which many mutations appear to derive from non-replicative sources. Given these results, it is intriguing to consider whether there is a fundamental mechanism linking mutation across distantly related species. Plants, generally, present a unique opportunity in the study of mutation as the germline is not sequestered, as it is in animals, and thus the forces of both mutation and selection acting throughout an individual plant's life could in principle affect the mutations transmitted to seed. The authors touch on this aspect, finding no evidence for a reduction in non-synonymous somatic mutations relative to the background rate, but more work, both experimental and observational, is needed to understand the dynamics of mutation and cell-competition within an individual plant. Overall, these results open the door to several intriguing questions in plant mutation. For example, is somatic mutation age-dependent in other species, and do other tropical plants harbor a high mutation rate relative to temperate genera? Any future inquiries on this topic would benefit from modeling their approach for identifying somatic mutations on the methods laid out here.

eLife. 2024 Oct 23;12:RP88456. doi: 10.7554/eLife.88456.3.sa4

Author response

Akiko Satake 1, Ryosuke Imai 2, Takeshi Fujino 3, Sou Tomimoto 4, Kayoko Ohta 5, Mohammad Na'iem 6, Sapto Indrioko 7, Widiyatno Widiyatno 8, Susilo Purnomo 9, Almudena Molla Morales 10, Viktoria Nizhynska 11, Naoki Tani 12, Yoshihisa Suyama 13, Eriko Sasaki 14, Masahiro Kasahara 15

The following is the authors’ response to the original reviews.

Reviewer #1

1. Here are a few sentences that could potentially benefit from further discussion, particularly in the context of the plant developmental framework of an effective germline. It is important to note that the idea of an effective germline is supported by many, but not all, scientists. Nevertheless, as long as this concept remains relevant, a discussion based on it may be appropriate.

The early establishment of germlines during development is crucial in addressing the impact of somatic mutation on the next generation. To emphasize this aspect, we have included an additional sentence addressing this point in ll. 242–244.

2. Lines 161-163: The suggestion that long-lived tropical trees do not necessarily suppress somatic mutation rates to the same extent as their temperate counterparts might warrant additional examination.

We have revised our statement to present a more balanced perspective, and we have also included a sentence to emphasize the importance of conducting further studies in future.

3. Lines 200-202: The observation of potential influences of GC-biased gene conversion during meiosis or biased purifying selection for C>T inter-individual nucleotide substitutions could be further elaborated upon.

Our data does not provide enough information to delve into a more detailed discussion regarding GC-biased gene conversion during meiosis or biased purifying selection for C>T substitution. However, future studies that obtain genome sequences from somatic cells, male or female gametophytes, and offspring (such as seeds or seedlings) would offer opportunities to assess these phenomena.

4. Line 245: The statement "somatic mutations can be transmitted to seeds" might be correct, but it would be helpful to explore the extent to which this occurs.

In response to the comments from Reviewer 1 (#4) and Reviewer 2 (#16), we have decided to remove the discussion about the heritability of somatic mutations in the next generation. We have completely rewritten the final paragraph to discuss the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals.

Reviewer #2

1. l. 108-115: The authors seem to have done really great work in assembling and annotating two reference genomes. Even if this does not represent the main result of the manuscript, these genomic resources are a plus for the community, especially given that reference genomes from tropical trees are known to be underrepresented in the literature (e.g. Plomion et al. 2016). The authors have made the particular effort of generating two high-quality reference genome assemblies for two species of the same genus, including one with an excellent contiguity. Even if they do not explicitly indicate the divergence time between the two species, it is clear that the cheapest solution would have been to map the reads of the two species against a single assembly, but this could have generated some biases. So by generating two de novo assemblies, the authors have used here the best design possible to control for some potential biases for the detection of somatic mutations. However, given the interest these two assemblies hold in themselves, I consider that a couple of additional investigations could have been made on local synteny and orthologous genes in particular. Thanks to whole-genome alignments and orthology (e.g. Lovell et al. 2022), they could have generated more general information regarding the two assemblies and investigated additional questions regarding mutations, e.g. mutations in collinear / non-collinear (if any) segments, intensity of purifying selection (or neutral evolution) at single vs. multiple copies or between shared vs. private genes, etc.

To address the comment by Reviewer 2, we performed synteny analysis using MCScanX in TBtools-II and added Supplementary Figure 3 to illustrate the conserved synteny relationship between S. laevis and S. leprosula. Detecting selection in the genome is left for a future study, as our current data are not sufficient for this aim because of the limited number of individuals (n = 2 for each species).

1. l. 123-124. Here, the authors indicate that they have "validated" 93.9% of the mutations. It would be more accurate to indicate that they have "validated" 31/33 mutations (94%), 22/24 mutations on S1 and 9/9 on S2 (Table S5). Can the authors indicate why no somatic mutations from the F1 and F2 were tested? In my view, the use of the word "validation" is not totally accurate (see also Schmitt et al. 2022): amplicon sequencing can be viewed as a kind of validation, but not a complete one, since it represents new sequencing data that are mapped against the same reference assembly, so we could always imagine that the same biases are at play, leading to a similarly false positive call. Reciprocally, a "non-validated" mutation could correspond to a mutation that is at too low an allele frequency, at least after amplification, so that the call is not heterozygous despite the fact that the mutation is real. I think that another terminology than "validated" could be used, plus one or two sentences explaining this degree of complexity.

To improve the clarity of the statement, we have modified the sentence as follows: “We conducted an independent evaluation of a subset of the inferred single nucleotide variants (SNVs) using amplicon sequencing. Our analysis demonstrated accurate annotation for 31 out of 33 mutations (94% overall), with 22 out of 24 mutations on S1 and all 9 mutations on S2 (Supplementary Table 5).”

While we did not conduct additional assessments using F1 and F2, we anticipate a similarly high level of agreement between the somatic SNV calls and amplicon sequencing in these trees. We have included sentences in the Materials and Methods section to explain the challenges involved in validating true somatic mutations.

1. l. 135-137 the reasoning appears to be quite circular to me. As indicated by the authors in the line just before, an incongruent pattern could also be explained biologically, so the overall congruency between the phylogenetic tree and the tree architecture cannot be considered as a way to prove the reliability of the detection. In some species, it seems clear that the phylogenetic tree does not follow the plant architecture (Zahradnikova et al. 2020), so I would argue that plant architecture should not be relied on in the design, nor regarded as a way to validate either the mutations or the methodological framework. I suggest removing this sentence.

We have removed the sentence as suggested by Reviewer 2.

1. l. 150. It seems that the differences in length and diameter between the two species come from two different studies and therefore that no statistical test has been performed to test its significance.

We agree with Reviewer 2. To clarify this point, we have replaced “significantly” with “substantially” in the revised text.

1. l. 156-159: the same sentence is repeated twice.

We have removed the repeated sentence.

1. l. 159-161: Comparing somatic mutation rates between studies is difficult. It is too sensitive to the methodology used, here again see Schmitt et al. 2022. I propose to remove these two sentences. It represents an interesting working hypothesis but would require a better design, or at least, to reanalyze all the data with the same pipeline.

We have toned down our statement and added a sentence stating that additional studies employing standardized methodologies are required to compare somatic mutation rates among trees in tropical, temperate, and boreal regions.

1. l. 171-175: Here I am wondering if the authors could provide more information regarding the enrichment at CpG sites? I suggest first estimating the proportion of CpG sites thanks to the two genome assemblies and then using this information as a way to weight the results and therefore to estimate the level of enrichment of mutations at CpG sites.

In response to the comment by Reviewer 2, we first determined the proportion of CpG sites as 0.030 and 0.028 for S. laevis and S. leprosula, respectively, based on the triplet matrix using the reference genome of each species. Subsequently, we estimated the proportion of somatic mutations at CpG sites. The results revealed a 4.54-fold and 3.53-fold increase in somatic mutations at CpG sites for S1 and S2, and a 3.38-fold and 2.56-fold increase for F1 and F2, respectively. We have incorporated this finding into ll. 172–175.
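As an illustration of the weighting described above, the fold enrichment can be read as the observed fraction of somatic mutations falling at CpG sites divided by the genome-wide fraction of CpG sites. The sketch below assumes that simple ratio; the function name and the mutation counts are hypothetical, and only the CpG site proportion of 0.030 is taken from the response. The authors' own calculation, based on a triplet matrix, may differ in detail.

```python
def cpg_fold_enrichment(n_mut_at_cpg, n_mut_total, cpg_site_fraction):
    """Return observed CpG mutation fraction divided by genomic CpG site fraction.

    Assumption: enrichment is a plain ratio of two proportions. This is an
    illustrative reading of the response, not the authors' pipeline.
    """
    if n_mut_total <= 0:
        raise ValueError("n_mut_total must be positive")
    if not 0 < cpg_site_fraction <= 1:
        raise ValueError("cpg_site_fraction must be in (0, 1]")
    if not 0 <= n_mut_at_cpg <= n_mut_total:
        raise ValueError("n_mut_at_cpg must be between 0 and n_mut_total")
    observed_fraction = n_mut_at_cpg / n_mut_total
    return observed_fraction / cpg_site_fraction


# Hypothetical counts for one tree; 0.030 is the CpG proportion reported
# for S. laevis in the response.
fold = cpg_fold_enrichment(n_mut_at_cpg=30, n_mut_total=300, cpg_site_fraction=0.030)
print(f"CpG fold enrichment: {fold:.2f}")
```

A value above 1 indicates more mutations at CpG sites than expected from their share of the genome.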

1. l. 176-187. Interesting comparison and insights. You could also indicate that SBS5 is detected in all human cancers too. So the detection of SBS1 and SBS5 signatures indeed suggests some shared mutation biases. Note that in humans, a specific signature of UV is associated with TCG -> TTG mutations (Martincorena & Campbell, 2015). It seems that there is a substantial difference in the mutation spectra between the two trees for this specific category; I am not sure if this difference could be associated with UV.

We slightly modified the sentence to indicate that SBS5 is also detected in all human cancers. We are very interested in the potential impact of UV on somatic mutations in tropical trees, considering the high levels of UVR in the tropics. Conducting a comparative analysis of the mutational spectrum among trees inhabiting diverse UVR environments would provide valuable insights to substantiate this hypothesis.

1. l. 206: I rather suggest "the somatic mutation rate per year is roughly the same, suggesting that somatic mutation rates are independent of growth rate".

In response to the suggestion from Reviewer 2, we have revised the sentence as follows: "The somatic mutation rate per year remains largely consistent, indicating that somatic mutation rates are independent of the growth rate."

1. l. 207-232: Here, the section looks like a mixture of a result and a discussion. I guess the authors consider that it remains a verbal model at this stage and that it therefore represents more of a discussion. If so, I agree, but it would be good to discuss this part more, in particular how this model could be improved and empirically tested.

The argument based on the model will be more accurate when the cell cycle duration can be directly estimated for each tree. We have added this explanation in the revised text.

1. l. 238-239: The parallel drawn with the molecular clock is interesting but according to me, it remains a working hypothesis at this stage, since it is not validated outside the two focal species. I encourage the readers to continue to work on this question and to investigate also some annual plants for instance in the future (assuming that they have a higher α) in order to be able to derive a global model. In addition, even if I consider that the authors use and interpret this parallel wisely, I consider that the use of this terminology could be misleading for some readers. That's why I also suggest removing "molecular clock" from the title and using a more explicit one, e.g. "Somatic mutation rates scale with time not growth rate in dipterocarp trees".

We agree with Reviewer 2. We have changed the title to “Somatic mutation rates scale with time not growth rate in long-lived tropical trees.”

1. l. 245-249: The results rather suggest that (i) there is little diversity due to somatic mutations and that (ii) most heritable non-synonymous mutations are deleterious and therefore purged from the population. So rather than this last section of this discussion that has little interest and could be quite debatable, I consider that the authors could extend their discussion, e.g. the differences with somatic mutations in mammals (recently, Cagan and coauthors (2022) demonstrated that somatic mutation rates are inversely correlated with lifespan in mammals) or the overall low rate of molecular evolution in trees could be some directions. But there are many others.

We have completely rewritten the final paragraph to propose the possibility of a disparity in the relationship between lifespan and somatic mutation rates between plants and animals, rather than discussing the heritability of somatic mutations in the next generation.

1. l. 570-571: I guess, the reader should understand here "fixed at the heterozygous state"

To avoid confusion, we have modified the text as follows: “If the alternative allele was present or absent in all eight branches in the amplicon sequence, the site was determined as fixed within an individual tree.” We have also removed “heterozygote” in Supplementary Figure 5.

1. Fig. 4d. the y-axis would be easier to interpret by writing "Delta Inter-individual vs. Somatic SNPs" and/or by adding arrows on the right margin of the plot to indicate the directions with some short sentences such as "more somatic mutations observed than expected assuming the inter-individual comparison", "less somatic mutation than expected". According to me, some statistical tests are lacking here. Are the differences in the mutation spectra significant given the relatively limited amount of somatic mutations detected?

We have added short sentences explaining the directions.

1. Supplementary Tables (excel file): please correct the typos. There are many on these supplementary tables.

We carefully checked supplementary tables and corrected the typos.

Reviewer #3

1. To estimate false negative rates, the authors might consider using mutation insertion tools such as Bamsurgeon (https://github.com/adamewing/bamsurgeon) to create simulated mutations. Alternatively, one could assess the calling rate of high-confidence SNPs that differ between individuals of the same species to get at the FNR.

We agree with Reviewer 3. To calibrate our pipeline, we previously performed simulations to estimate the false negative and false positive rates in a different tree species (Betula platyphylla) using wgsim v0.1.11 (https://github.com/lh3/wgsim). Based on our simulations, we found that the false negative and false positive rates were very low, averaging 0.050 and 0.046, respectively. It is important to note that the estimated false positive rate obtained from the simulation data was substantially lower than the proportion of potential false positive SNVs (as shown in Supplementary Fig. 5). This observation suggests that simulation-based evaluation of the false positive rate is not reliable, at least for the tree species we studied. The same argument could be applied to the false negative rate. Therefore, we conclude that the simulation-based analysis for estimating false positive and false negative rates is not informative for our study.

The rate of true-positive or false-negative mutation calls can be estimated only when the true mutational status is known, but the data are not currently available. However, under the assumption that the final set of SNVs represents true somatic mutations, we were able to calculate the potential false negative rate. Our findings indicate that this rate is low, specifically less than 10%, when using less stringent filtering thresholds such as BQ20 and MQ20. While these estimated values may not precisely represent the true false negative rate, we included them as potential false negative rates in Supplementary Figure 7 of the revised manuscript. This information provides additional insights into the performance of our pipeline under different filtering thresholds and contributes to the overall assessment of our study.

1. It may be interesting to examine the mutation trees for constancy (or not) in mutation rate per meter. Examining Figure 1, it appears that the number of mutations near the crown "4" node is consistently higher than in nearby nodes (3-1 and 3-2).

We calculated the branch-level increment of SNVs per meter by dividing the number of single nucleotide variations (SNVs) by the physical distance. Our analysis revealed a slight increase in the number of SNVs per meter as the branch position became higher in S. laevis, as shown in Author response table 1. However, this trend was not clearly observed in S. leprosula. We found this observation in S. laevis intriguing, particularly because our recent analysis (Tomimoto et al., in preparation) demonstrated that genetic distance increases in branch pairs located in the upper part of a tree. This was elucidated through a mathematical model that describes the dynamics of the stem cell population during elongation and branching. We opted not to delve further into the findings in the current manuscript, as this topic will be extensively investigated in a future study.

Author response table 1. The branch-level increment of SNVs per meter.

Species       Tree  Branch  0      1-1    1-2    2-1    2-2    3-1    3-2
S. laevis     S1    1-1     3.23
S. laevis     S1    1-2     5.18   13.32
S. laevis     S1    2-1     4.01   4.67   6.42
S. laevis     S1    2-2     3.78   4.42   6.19   7.18
S. laevis     S1    3-1     4.33   5.06   6.89   6.81   6.43
S. laevis     S1    3-2     4.30   5.02   6.85   6.75   6.38   10.82
S. laevis     S1    4       4.14   4.77   6.43   5.95   5.86   7.43   7.41
S. laevis     S2    1-1     2.37
S. laevis     S2    1-2     2.36   1.96
S. laevis     S2    2-1     3.11   2.86   2.84
S. laevis     S2    2-2     3.05   2.79   2.77   3.91
S. laevis     S2    3-1     2.61   2.23   1.86   3.45   3.36
S. laevis     S2    3-2     2.59   2.20   1.84   3.40   3.31   3.92
S. laevis     S2    4       2.77   2.43   2.41   3.73   3.63   3.74   3.67
S. leprosula  F1    1-1     1.21
S. leprosula  F1    1-2     1.11   1.18
S. leprosula  F1    2-1     1.50   1.35   1.19
S. leprosula  F1    2-2     1.57   1.46   1.28   2.22
S. leprosula  F1    3-1     1.27   1.05   0.96   1.41   1.52
S. leprosula  F1    3-2     1.19   0.95   0.87   1.27   1.38   2.41
S. leprosula  F1    4       1.07   0.83   0.78   1.05   1.14   0.97   0.82
S. leprosula  F2    1-1     0.83
S. leprosula  F2    1-2     0.55   2.07
S. leprosula  F2    2-1     0.34   1.10   0.54
S. leprosula  F2    2-2     0.39   1.19   0.62   1.25
S. leprosula  F2    3-1     0.44   1.24   0.68   0.46   0.53
S. leprosula  F2    3-2     0.47   1.32   0.74   0.50   0.58   1.15
S. leprosula  F2    4       0.60   1.57   0.95   0.71   0.78   0.97   1.05
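The entries of Author response table 1 are described as the number of SNVs divided by the physical distance. A minimal sketch of that per-pair calculation follows; the function name, the dictionary layout, and the example counts and distances are hypothetical and are not taken from the supplementary data.

```python
def snvs_per_meter(snv_counts, distances_m):
    """Divide the SNV count of each sampling-position pair by its distance in meters.

    Both arguments are dicts keyed by (position_a, position_b) tuples.
    Assumption: every pair in snv_counts has a positive distance entry.
    """
    rates = {}
    for pair, n_snv in snv_counts.items():
        distance = distances_m[pair]
        if distance <= 0:
            raise ValueError(f"non-positive distance for pair {pair}")
        rates[pair] = n_snv / distance
    return rates


# Hypothetical inputs for two pairs of sampling positions.
example_counts = {("0", "1-1"): 10, ("1-1", "1-2"): 24}
example_distances = {("0", "1-1"): 4.0, ("1-1", "1-2"): 3.0}

for pair, rate in snvs_per_meter(example_counts, example_distances).items():
    print(pair, f"{rate:.2f} SNVs per meter")
```

Comparing these values across pairs located lower and higher in the crown is one way to look for the trend with branch position mentioned in the response.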

1. Line 150: Use of "significantly different" is confusing as the phrase is usually reserved for statistical significance. Consider replacing with "substantially different."

We have replaced “significantly” with “substantially” in the revised text.

1. In the Discussion, a clearer explanation of the assumptions that underlie the authors' reasoning would be welcome: e.g., constancy in mutation rate per meter within an individual tree. In particular, the authors assume that mutations that are seen in one leaf and not in another cannot have predated the most recent common meristematic node linking the two leaves. Is this a reasonable assumption? Since the meristem is multicellular, is it possible for a mutation to have arisen earlier in development and "assorted" into one cell lineage but not another?

We greatly appreciate this important comment. It is true that when the meristem is multicellular and the stem cell lines are retained during mutation accumulation (e.g. a structured meristem analyzed in Tomimoto and Satake 2023), it is possible for a mutation to have arisen before the bifurcation. Using a mathematical model, we have proved that the intercept and slope of the linear regression between the pairwise genetic distance and physical distance are influenced by the type of meristem (strength of somatic genetic drift in a meristem) as well as the branching architecture of the tree. We have included an explanation of this point in the revised manuscript (ll. 244–249).
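For readers who want to see where the intercept and slope come from, the regression of pairwise genetic distance on physical distance can be sketched with ordinary least squares. This is a generic fit written for illustration, with a hypothetical function name and made-up data points; it is not the mathematical model referred to above, which additionally accounts for meristem type and branching architecture.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x.

    x: physical distances between branch pairs (m)
    y: pairwise genetic distances (e.g. number of SNVs)
    Returns (intercept, slope).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data: distances in meters and SNV counts for four branch pairs.
intercept, slope = fit_line([5.0, 10.0, 20.0, 40.0], [12.0, 21.0, 44.0, 83.0])
print(f"intercept = {intercept:.2f}, slope = {slope:.2f} SNVs per meter")
```

A non-zero intercept in such a fit is the kind of signal that could reflect mutations predating the most recent common node, which is the concern raised by the reviewer.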

1. Supplementary Data 7: Column J should be "2_2"

We corrected the typo.

Associated Data


    Data Citations

    1. Satake A, Imai R, Fujino T, Tomimoto S, Ohta K, Na'iem M, Indrioko S, Widiyatno PS, Mollá-Morales A, Nizhynska V, Tani N, Suyama Y, Sasaki E, Kasahara M. 2023. Genetic mosaicism and somatic mutation rate in tropical trees. DNA Data Bank of Japan. PRJDB14538
    2. Satake A, Imai R, Fujino T, Tomimoto S, Ohta K, Na'iem M, Indrioko S, Widiyatno PS, Mollá-Morales A, Nizhynska V, Tani N, Suyama Y, Sasaki E, Kasahara M. 2023. Genetic mosaicism and somatic mutation rate in tropical trees. DNA Data Bank of Japan. PRJDB15012

    Supplementary Materials

    Supplementary file 1. Supplementary tables.

    (a) Mean annual increment (MAI) of diameter at breast height (DBH). (b) Matrix of physical distances (m) between sampling positions and the number of SNVs indicated in parentheses. (c) Summary statistics of the studied trees. Height and DBH were directly measured for two individuals of S. laevis and S. leprosula. Age was estimated as DBH divided by the mean annual increment (MAI). (d) Summary statistics of genome assemblies for S. laevis and S. leprosula. We assembled the genome using DNA extracted from the apical leaf at branch 1–1 of the tallest individual of each species (S1 and F1). (e) Summary statistics of whole genome sequencing. (f) The number of candidate SNVs during each step of the filtering process. (g) Assessment of candidate SNVs using amplicon sequencing. (h) Somatic mutation rates. The somatic mutation rate per nucleotide per meter was estimated as μg = b/(2 × R), where b indicates the slope of linear regression. The somatic mutation rate per nucleotide per year (μy) was estimated as μy = M/(2 × R × A), where M indicates the total number of SNVs accumulated from the base to the branch tip and A represents tree age. R denotes the number of callable sites. (i) Cosine similarity of mutation spectra between Shorea trees and humans. (j) Results of the binomial test for selection on somatic and inter-individual SNVs. To test whether somatic and inter-individual SNVs are subject to selection, we calculated the expected rate of non-synonymous mutation. Given the observed number of non-synonymous and synonymous mutations, we rejected the null hypothesis of neutral selection using a binomial test with the significance level of 5%. pN_expected and pN_observed represent the expected and observed rate of non-synonymous substitutions. (k) The final set of SNVs. (l) Fractions of synonymous, missense, and nonsense substitutions. (m) Somatic mutation rates for six substitution classes. Somatic mutation rates for six substitution classes were calculated based on the observed number of SNVs both from the intergenic region and the whole genome. S1+S2 and F1+F2 represent the use of pooled data from two individuals for each species: S. laevis (S1, S2) and S. leprosula (F1, F2). The values based on the pooled data (indicated in bold type) were used to calculate the expected rate of non-synonymous mutation. (n) List of genes with somatic SNVs.
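The two rate definitions in item (h) can be written out as a short calculation. The sketch below reads the expressions as b/(2 × R) and M/(2 × R × A); that reading of the flattened formulas, the interpretation of the factor 2 as accounting for a diploid genome, the function names, and all numeric inputs are assumptions made for illustration only.

```python
def rate_per_site_per_meter(slope_b, callable_sites_r):
    """mu_g = b / (2 * R): SNVs per meter (regression slope) per callable site.

    Assumption: the factor 2 reflects two genome copies (diploidy).
    """
    if callable_sites_r <= 0:
        raise ValueError("callable_sites_r must be positive")
    return slope_b / (2 * callable_sites_r)


def rate_per_site_per_year(total_snvs_m, callable_sites_r, age_years_a):
    """mu_y = M / (2 * R * A): SNVs from base to branch tip per site per year."""
    if callable_sites_r <= 0 or age_years_a <= 0:
        raise ValueError("callable_sites_r and age_years_a must be positive")
    return total_snvs_m / (2 * callable_sites_r * age_years_a)


# Hypothetical inputs, not values from the supplementary tables.
mu_g = rate_per_site_per_meter(slope_b=2.0, callable_sites_r=1.0e8)
mu_y = rate_per_site_per_year(total_snvs_m=100, callable_sites_r=1.0e8, age_years_a=250)
print(f"mu_g = {mu_g:.2e} per site per meter")
print(f"mu_y = {mu_y:.2e} per site per year")
```

Because tree age A is itself estimated as DBH divided by MAI, any uncertainty in MAI carries over directly into the per-year rate.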

    elife-88456-supp1.xlsx (201.3KB, xlsx)
    MDAR checklist

    Data Availability Statement

    The raw sequencing data, the genome assembly, and the gene annotation are available at DDBJ under accessions PRJDB14538 for S. laevis and PRJDB15012 for S. leprosula. The code for the bioinformatics pipeline used to process whole genome sequencing data is available from https://github.com/ku-biomath/Shorea_mutation_detection (copy archived at ku-biomath, 2023).



    Articles from eLife are provided here courtesy of eLife Sciences Publications, Ltd
