Significance
“A fundamental tenet of evolutionary biology is that mutations are random events,” where this randomness means “that the likelihood of any particular mutational event is independent of its specific value to the organism,” according to a famous paper [Lenski and Mittler, Science 259, 188–194 (1993)]. Yet, the likelihood of any particular DNA mutation had not been directly measured. Recently, a method enabling such measurement was applied to the human HbS mutation, showing that its de novo origination rate is significantly higher in the population and gene case where it is adaptive. Here, we investigated another conditionally adaptive variant—the 1024A→G mutation in the APOL1 gene. Results are highly congruent with the HbS findings, underscoring the importance of high-resolution mutation rate studies.
Keywords: de novo mutation rates, maximum depth sequencing, human African sleeping sickness, APOL1 G1 allele, nonrandom mutation
Abstract
Mutation rates have long been measured as averages across many genomic positions. Recently, a method to measure the rates of individual mutations was applied to a narrow region in the human hemoglobin subunit beta (HBB) gene containing the site of the hemoglobin S (HbS) mutation as well as to a paralogous hemoglobin subunit delta (HBD) region, in sperm samples from sub-Saharan African and northern European donors [Melamed et al., Genome Res. 32, 488–498 (2022)]. The HbS mutation, which protects against malaria while causing sickle-cell anemia in homozygotes, originated de novo significantly more frequently in the HBB gene in Africans compared to the other three test cases combined (the European HBB gene and the European and African HBD gene). Here, we apply this approach to the human apolipoprotein L1 (APOL1) gene containing the site of the G1 1024A→G mutation, which protects against African sleeping sickness caused by Trypanosoma brucei gambiense while causing a substantially increased risk of chronic kidney disease in homozygotes. We find that the 1024A→G mutation is the mutation of highest de novo origination rate and deviates most from the genome-wide average rate for its type (A→G) compared to all other observable mutations in the region and that it originates de novo significantly more frequently in Africans than in Europeans—i.e., in the population where it is of adaptive significance. The results are unexpected given the notion that the probability of a specific mutational event is independent of its value to the organism and underscore the importance of studying mutation rates at the individual-mutation resolution.
Mutations provide the raw material for evolution and are implicated in genetic disease; therefore, understanding their origination is of fundamental importance (1–4). Their origination rates have long been measured as averages across many genomic positions—for example, across the roughly 3 billion base pairs of the entire haploid human genome (5–7), across many instances of any given short motif (7–9) [e.g., the A→G mutation rate in the middle of the AAG motif, estimated from the frequency of extremely rare variants averaged across nearly 29,000 instances of this motif per diploid human genome and across nearly 4,000 individuals (9)], and across the stretches of genes in certain cases (4, 10, 11) [e.g., averaged across the 3,657 bp of the locus implicated in Alagille syndrome, AGS (11)]. Such averages combine observations on many positions in the genome into one summary statistic, which, while informative, precludes a far more precise assessment of the rates of individual mutations at individual base positions. However, measuring the origination rate of a target individual mutation is challenging: It requires a large sample of DNA fragments containing the site of that mutation and an ultra-accurate method to detect the minute fraction of de novo instances of that particular mutation in the sample.
One ultra-accurate method is Duplex Sequencing (DS) (12), where duplex barcodes are attached to double-stranded DNA fragments to enable consensus sequencing based on both strands following amplification (12, 13). The optimal target size for DS is constrained by sequencing depth and target-capture efficiency and falls roughly between 1 Mb and 50 kb (13, 14). Combined with additional enrichment steps, it has been applied to regions as small as a few thousand bases in size (14–17). For example, when applied to the 16.6 kb of mitochondrial DNA from macaque oocytes and somatic tissues (18) and to a 4.5 kb region of the FGFR3 gene in human sperm cells (17), it enabled observing multiple mutations per base position in a small fraction of positions (17, 18). In ref. 18, single base positions were declared as hotspots if they were mutated in a number of animals considered high (5) given the average mutation rate observed across the region.
To focus on individual target mutations, however, greater coverage depth is needed. For example, in the mtDNA and FGFR3 studies, although recurrent mutations were observed a posteriori at a small fraction of scattered positions (17, 18), for any particular target base of interest to have had even a single expected mutation instance across all samples combined, its mutation rate would have had to be considerably higher than typical. This depth limitation of standard DS stems from the hybridization-based capture it normally employs to target a small region in a genome: The smaller the region is, the less frequently it appears in the captured library (13).
Unlike standard DS, amplicon-based methods (19, 20) are potentially applicable to arbitrarily narrow regions. One such method is a DS variant that preserves the duplex barcode by using a specific primer on one end and a generic primer on the other (20). However, it has so far been applied only to small amounts of DNA (21, 22). Another such method, which has been practiced more broadly (23–26), is Maximum Depth Sequencing (MDS) (19). MDS is a single strand–based method that differs from DS by barcoding only one of the two strands of a target DNA molecule and applying multiple rounds of linear amplification to the barcoded strand for consensus sequencing. After excluding mutation types arising during library amplification from single-strand lesions (especially G→T and C→T), MDS can potentially reach an ultra-high accuracy of 5 × 10−8 [extrapolated false positive rate with a high-fidelity polymerase (19)], similar to the DS error rate noted by ref. 13 based on unpublished data. Yet MDS’s accuracy and yield still make it challenging to apply to measuring the rates of natural individual target de novo mutations at individual base positions, such as those of the human genome, with its average point mutation rate on the order of 10−8.
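To make this depth and accuracy constraint concrete, consider a back-of-the-envelope calculation (an illustration added here, not a calculation from the cited studies). Let N be the number of scanned copies of one base, μ ≈ 10−8 the average per-base point mutation rate quoted above, and ε ≈ 5 × 10−8 the extrapolated MDS false positive rate; the symbols N, μ, and ε are introduced only for this sketch:

```latex
\begin{aligned}
\mathbb{E}[\text{true calls}] &= N\mu, \qquad \mathbb{E}[\text{false calls}] = N\varepsilon,\\
\frac{\mathbb{E}[\text{false calls}]}{\mathbb{E}[\text{true calls}]} &= \frac{\varepsilon}{\mu} \approx \frac{5\times 10^{-8}}{10^{-8}} = 5,\\
N\mu \ge 1 &\;\Longrightarrow\; N \ge \mu^{-1} \approx 10^{8}.
\end{aligned}
```

Under these assumptions, false calls would outnumber true calls about fivefold, and on the order of 10^8 copies of the base would have to be scanned before even one true instance is expected. A specific substitution, being only one of the three possible at a base, is rarer still.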
Recently, Melamed et al. combined MDS with controlled mutation enrichment by a restriction enzyme (27). This combination (Mutation Enrichment followed by upscaled Maximum Depth Sequencing—MEMDS) increased both accuracy and yield by 100-fold from the MDS baseline, enabling ultra-high depth, accuracy, and yield and thus allowing focus on rates of individual target natural mutations at individual base positions across a very narrow region of interest (ROI) that is defined by the restriction enzyme recognition site (typically 6 bp) (27). Therefore, while DS remains a method of choice for measuring average mutation rates in regions of size down to a few kb and enables focusing on mutations occurring at extremely high rates, MEMDS offers much higher coverage in tiny regions for detection of natural individual target mutations.
In the first study of its kind, Melamed et al. applied MEMDS to a 6 bp ROI in the human hemoglobin subunit beta gene (HBB) and to the identical paralogous region of the hemoglobin subunit delta gene (HBD) (27). The HBB region includes the site of the 20A→T mutation—commonly referred to as the hemoglobin S (HbS) mutation—which protects against malaria while causing sickle cell anemia in homozygotes (28–32), whereas HBD mutations have not been implicated in malaria resistance (33). These genomic regions were examined in human sperm samples from both sub-Saharan African and northern European donors (27). Results showed, first, that the rates of the same mutations varied substantially between the two genes and the two populations even though these mutations appeared on the same local genetic background in these genes and populations (27), and that the population-level difference in the point mutation rate averaged across the 6 bp HBB region was two orders of magnitude larger than previously measured population-level differences in mutation rates averaged across many loci (27, 34, 35). Given that the mutation rate variation obtained at the level of individual mutations was notably higher than that of mutation rate averages across many positions obtained in previous studies (34, 35), could it be that an important part of the adaptively meaningful variation in mutation rates is to be found at the resolution of individual mutations?
In addition to demonstrating high mutation rate variation at the individual-mutation resolution, results showed that the 20A→T mutation originated de novo significantly more frequently in the HBB gene in sub-Saharan African donors, where it is adaptively significant, compared to the three cases where it is not, combined: the HBB gene in European donors and the HBD gene in both African and European donors (27). That is, given that malaria is common in sub-Saharan Africa and that the 20A→T mutation is protective against it when it occurs in HBB but not in HBD, this mutation was found to originate de novo significantly more frequently in the gene and in the population where it is of adaptive significance (27). This finding was unexpected given the central tenet of neo-Darwinism that mutations are random, where this randomness “refers to the supposition that the likelihood of any particular mutational event is independent of its specific value to the organism” (36). Given that previous studies had not measured the probabilities of target individual mutational events in the DNA, could it be that there actually is a relationship between the likelihood of a particular mutational event and its specific value to the organism, which could not have been systematically and effectively uncovered with previous methods?
It is tempting to answer with a resounding “no” to this question, given that there could hardly be a more fundamental assumption in evolutionary theory that data could violate (37). Yet the only way of actually knowing the answer is to carry out more studies of the probabilities of particular mutational events. Here, we examined a different gene and mutation that has also been directly implicated in adaptation: the human apolipoprotein L1 (APOL1) G1 1024A→G mutation.
APOL1 is the only known member of the apolipoprotein L family of innate immunity gene products that is secreted from the liver into the bloodstream, where it associates with serum high-density lipoprotein particles (38, 39). Human APOL1 has also evolved to confer protection against an extended spectrum of Trypanosoma brucei species, spread by the Glossina species of the tsetse fly vector, which cause African sleeping sickness—a devastating disease of wide historical prevalence and impact in Africa (40). In what appears to be an evolutionary arms race, two T. brucei species—Trypanosoma brucei gambiense in West Africa and Trypanosoma brucei rhodesiense in East Africa (41–45)—evolved mechanisms of resistance to the ancestral human APOL1, and two sets of human APOL1 variants evolved in Africa, each providing protection against one of these species, respectively (43, 46, 47). The G1 variants comprise two nonsynonymous mutations in nearly perfect linkage disequilibrium: rs73885319 (1024A→G), altering codon 342 from Serine to Glycine (S342G), and rs60910145 (1152T→G), altering codon 384 from Isoleucine to Methionine (I384M), and evidence indicates that both homozygotes and heterozygotes for the first of these mutations (1024A→G) are asymptomatic for Trypanosoma brucei gambiense (43, 44, 48–50). The G2 variant (rs71785313)—consisting of a 6-nucleotide in-frame deletion of two adjacent codons for Asparagine and Tyrosine at positions 388 and 389—protects against Trypanosoma brucei rhodesiense: neither homozygotes nor heterozygotes become infected (43). Besides these protective abilities, however, the G1/G1 and G2/G2 homozygotes as well as the G1/G2 compound heterozygote carry a substantially increased risk of chronic kidney disease (46, 47)—itself a devastating condition affecting millions of people of African descent worldwide (51), though there has been rapidly emerging progress in developing precision therapeutics (52).
The G1 and G2 variants are prevalent in populations that live in or trace their recent ancestry to sub-Saharan Africa (53, 54) and provide protection against African sleeping sickness, which is endemic to sub-Saharan Africa. Therefore, it has been assumed that they are prevalent in these populations due to their heterozygote advantage alone, much as in the case of the HbS mutation with regard to malaria protection (43, 44, 46, 47). Thus, examination of the de novo origination rate of the mutations involved can provide empirical test cases similar to that of the HbS mutation in distinguishing whether the prevalence of an adaptive mutation in a population is due to random mutation and natural selection (rm/ns) (plus subsidiary factors such as random genetic drift) or whether such mutations are more likely to arise de novo in the populations where they are adaptive.
The MEMDS method has enabled measurement of mutation rates of individual mutations by focusing on narrow genomic ROIs and combining high-throughput consensus sequencing with a mutation enrichment step (27). A key element of it is the controlled removal of the vast majority of wild-type (WT) gene-copy fragments prior to sequencing using a restriction enzyme (RE) that digests specifically WT molecules while leaving RE-protected mutant sequences intact (27). This removal of WT greatly reduces the number of WT molecules that undergo sequencing and which could have become false positive mutation calls. The molecules removed are known to carry the WT sequence in the ROI and therefore need not be sequenced. This step reduces both sequencing runs and the false positive rate due to PCR amplification or high-throughput sequencing errors by the same factor. Next, randomized barcodes are attached directly to the remaining ROI-carrying molecules, allowing linearly amplified copies that originate from the same DNA fragment to be grouped together and consensus-sequenced to remove the remaining PCR and high-throughput sequencing errors at the computational stage (19). At the same time, the novel experimental design described in ref. 27 is used to keep track of the number of WT molecules removed and thereby calculate the frequencies of mutations in the original sample. For this purpose, two DNA samples are prepared, each with the addition of mock DNA resistant to RE digestion, and both are subjected to the same protocol, except for the RE digestion step, which is applied to only one of them. Following accurate barcode-based consensus sequencing, one obtains the ratios of mock RE-resistant to RE-sensitive sequences in each sample, and, based on them, calculates the enrichment factor of RE-resistant mutations, the number of WT molecules removed by digestion, and thus the frequency in the original sample of each RE-resistant mutation (Fig. 1A). 
This method reduces the number of sequencing runs and the error rate sufficiently to identify and count de novo mutations in human sperm samples at single-digit resolution (27).
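The bookkeeping behind the two-mixture design can be sketched as follows. This is a simplified reconstruction under stated assumptions, not the full derivation of ref. 27: it assumes that genomic and artificial (mock) APOL1 molecules are captured and consensus-called with equal efficiency, that HindIII never cuts the artificial molecules, and that the volumes drawn from each source tube are known exactly. The class, function, and field names, as well as all numbers, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Mixture:
    """Known input volumes and consensus-called molecule counts for one mixture."""
    genomic_volume: float      # volume drawn from the sperm genomic DNA tube
    artificial_volume: float   # volume drawn from the artificial (HindIII-resistant) tube
    wt_molecules: int          # consensus-called WT (HindIII-sensitive) ROI molecules
    artificial_molecules: int  # consensus-called artificial ROI molecules


def wt_to_artificial_ratio(untreated: Mixture) -> float:
    """WT-to-artificial molecule ratio per unit volume of each source tube.

    Estimated from the HindIII-untreated mixture, in which nothing was digested.
    """
    wt_per_volume = untreated.wt_molecules / untreated.genomic_volume
    artificial_per_volume = untreated.artificial_molecules / untreated.artificial_volume
    return wt_per_volume / artificial_per_volume


def enrichment(treated: Mixture, untreated: Mixture) -> tuple[float, float]:
    """Return (enrichment factor, total WT molecules scanned) for the treated mixture."""
    ratio = wt_to_artificial_ratio(untreated)
    # Artificial molecules survive digestion, so they reveal how many WT molecules
    # the treated mixture would have yielded had HindIII not removed them.
    wt_scanned = (treated.artificial_molecules * ratio
                  * treated.genomic_volume / treated.artificial_volume)
    # (resistant : sensitive) after digestion divided by (resistant : sensitive) before it
    enrichment_factor = wt_scanned / treated.wt_molecules
    return enrichment_factor, wt_scanned


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
untreated = Mixture(genomic_volume=1.0, artificial_volume=1.0,
                    wt_molecules=90_000, artificial_molecules=10_000)
treated = Mixture(genomic_volume=50.0, artificial_volume=1.0,
                  wt_molecules=45_000, artificial_molecules=10_000)
factor, scanned = enrichment(treated, untreated)
mutant_count = 9                    # hypothetical count of one RE-resistant mutation
frequency = mutant_count / scanned  # frequency per scanned WT copy
```

With these invented inputs the enrichment factor is 100 and 4.5 × 10^6 WT copies count as scanned, so 9 mutant molecules correspond to a frequency of 2 × 10^-6. The artificial molecules act as an internal standard: because they pass through digestion untouched, they anchor the estimate of how many WT molecules were removed.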
Fig. 1.

MEMDS experimental setup. (A) Schematic illustration of the MEMDS experimental design enabling calculation of the HindIII-enrichment factor and the number of target-APOL1 WT molecules that were scanned, modified from ref. 27. The experiment begins with two tubes, one containing sperm genomic DNA that carries mostly HindIII-sensitive APOL1 ROI sequences and one containing artificial APOL1 molecules, which carry a known 6 bp sequence different from the HindIII recognition sequence and are therefore resistant to HindIII digestion. Volumes are drawn in known amounts from each source tube to create two mixtures designated “HindIII-treated” and “HindIII-untreated.” These two samples undergo the full MEMDS protocol (SI Appendix, Fig. S1), except that the former is treated with HindIII to enrich for mutations in the ROI while the latter is not. Note that the volumes taken for the two mixtures are deliberately unequal, to allow efficient identification of both genomic and artificial APOL1 sequences following the different treatments and to scan a large amount of genomic APOL1 sequences for de novo mutations in the HindIII-treated sample (detailed in ref. 27, SI Appendix, Text 2). Following high-throughput sequencing, variants are identified by the MEMDS computational pipeline (27), and the numbers of HindIII-sensitive ROI variants (i.e., WT ROIs) and artificial RE-resistant ROI variants are determined for each sample. These numbers, together with the known volumes taken from the source tubes to create the input mixtures, are used to calculate the HindIII enrichment factor and the total number of WT sequences that were either digested by HindIII or evaded digestion and were sequenced and identified, as shown in the figure (for full derivation and assumptions, see ref. 27). (B) A schematic illustration of the APOL1 gene fragment analyzed and variant enrichment. The 6 bp region of interest (ROI) that matches the HindIII-restriction site is shown in the yellow box.
HindIII digestion of human sperm DNA removes APOL1 gene fragments carrying the WT sequence and enriches for APOL1 variants with mutations at the ROI site, while the experimental setup described enables keeping track of the number of WT molecules removed. PvuII digestion is used for the addition of a unique barcode sequence near the ROI for the subsequent consensus sequencing. See SI Appendix, Fig. S1 for the complete MEMDS protocol.
In the case of mutations that are expected to affect the fitness of the organisms carrying them but not the viability or fertility of sperm cells of healthy noncarriers in which they appeared de novo, such as the HbS or the APOL1 G1 or G2 mutations, the frequency of the mutation in sperm samples is equivalent to the probability that it will be transmitted to the offspring and appear as a de novo mutation in it. Therefore, it is equivalent to the evolutionarily relevant mutation-specific de novo origination rate. Importantly, this equivalence holds regardless of whether different instances of such a mutation in the sample have all arisen independently or via replications of a cell or cells that mutated early in spermatogenesis, a possibility referred to in ref. 27 as “clonal dependence.” Although clonal dependence, if it exists in the data, would increase the variance in the mutation rate estimate, this has been taken into account statistically both in ref. 27 and here.
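Why clonal dependence leaves the frequency estimate unbiased but inflates its variance can be seen in a simplified model, assumed here for illustration only and not taken from ref. 27. Suppose that, among the progenitors of N scanned cells, mutational events occur as a Poisson process, that each event yields a clone of c mutant cells, and that events occur at rate μ/c per cell so that the per-cell mutation frequency is μ:

```latex
\begin{aligned}
K &\sim \operatorname{Poisson}\!\left(\frac{N\mu}{c}\right), \qquad X = cK,\\
\mathbb{E}\!\left[\frac{X}{N}\right] &= \frac{c}{N}\cdot\frac{N\mu}{c} = \mu,\\
\operatorname{Var}\!\left[\frac{X}{N}\right] &= \frac{c^{2}}{N^{2}}\cdot\frac{N\mu}{c} = \frac{c\,\mu}{N}.
\end{aligned}
```

With c = 1 (fully independent events) this reduces to the ordinary Poisson variance μ/N; larger clones leave the estimated frequency centered on μ but widen its uncertainty, which is why per-donor tests are used alongside pooled tests in the analyses that follow.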
For its WT-reduction procedure, MEMDS requires both the ROI to consist of an RE recognition sequence and the mutations of interest to disrupt this sequence. Therefore, we examined the G1 and G2 mutations and found that the G1 1024A→G mutation satisfies these conditions: It disrupts a sequence that, in the WT, constitutes the recognition site of HindIII (AAGCTT), a commercially available RE (Fig. 1B). Accordingly, we obtained sperm samples from 7 Ghanaian and 8 northern European donors and examined a total of more than half a billion APOL1 ROI-carrying gene-copy fragments individually to measure the de novo origination rates of the G1 1024A→G mutation as well as nearby point mutations and indels in or overlapping with the APOL1 ROI. See SI Appendix for a complete methods description and sequence library properties (SI Appendix, Figs. S1–S10 and Tables S1–S4).
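The selection logic for which ROI substitutions MEMDS can score can be written out explicitly. The sketch below only encodes rules stated in the text: the WT ROI is the HindIII site AAGCTT starting at position 1023, any single-base change inside the site makes the fragment HindIII-resistant, G→T and C→T changes are excluded as lesion artifacts, and 1026C→A, 1027T→A, and 1028T→A are excluded as library-preparation noise (both exclusions are described in the Results). The constant and function names are hypothetical.

```python
from __future__ import annotations

HINDIII_SITE = "AAGCTT"   # HindIII recognition sequence = WT APOL1 ROI
ROI_START = 1023          # coordinate of the first ROI base, as in Table 1
BASES = "ACGT"

# Substitution types excluded as single-strand lesion artifacts (G->T, C->T).
LESION_TYPES = {("G", "T"), ("C", "T")}
# ROI substitutions also seen at similar normalized frequency in untreated samples.
LIBRARY_NOISE = {(1026, "C", "A"), (1027, "T", "A"), (1028, "T", "A")}


def is_hindiii_resistant(roi: str) -> bool:
    """True if the 6 bp ROI no longer matches the HindIII site, so it escapes digestion."""
    return roi != HINDIII_SITE


def observable_point_mutations() -> list[tuple[int, str, str]]:
    """Enumerate (position, ref, alt) substitutions in the ROI that remain scorable."""
    observable = []
    for offset, ref in enumerate(HINDIII_SITE):
        position = ROI_START + offset
        for alt in BASES:
            if alt == ref:
                continue
            mutant = HINDIII_SITE[:offset] + alt + HINDIII_SITE[offset + 1:]
            if not is_hindiii_resistant(mutant):
                continue  # would be digested, hence never enriched
            if (ref, alt) in LESION_TYPES or (position, ref, alt) in LIBRARY_NOISE:
                continue  # excluded as artifact or noise
            observable.append((position, ref, alt))
    return observable
```

Of the 18 possible substitutions in the site, all are HindIII-resistant, and the two exclusion rules leave 13, matching the number of observable point mutations listed in Table 1.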
Results
Previous studies utilizing single-strand DNA sequencing consider G→T and C→T changes to represent the experimental disruption of an ongoing in vivo process of base damage and repair due to guanine oxidation and cytosine deamination as well as corresponding in vitro damage rather than durable mutations which might be propagated (19, 27, 55). Indeed, in accord with this, the vast majority of DNA sequencing variations observed here were of these two types (SI Appendix, Fig. S6) and were excluded from further analysis.
In addition, no natural de novo mutations are expected to be observed at notable frequencies in the ROI in the HindIII-untreated samples, and, if they are observed there, they should appear at far lower frequencies than in the HindIII-treated sample due to lack of enrichment. However, three mutations, 1026C→A, 1027T→A, and 1028T→A, were observed in the ROI in the untreated samples (no other mutations, including the G1 1024A→G mutation, were observed in these samples), and their frequencies, normalized to the average noise level in the ROI-flanking regions, were very similar between the treated and untreated samples (SI Appendix, Fig. S10). These observations are inconsistent with the possibility that these mutations originated naturally during gametogenesis and were enriched with MEMDS, but they are consistent with the possibility that they represent noise introduced during library preparation. Therefore, we excluded these mutations from further analysis. Following these exclusions, 13 point mutations and potentially many indels remain observable by MEMDS in this ROI (Table 1).
Table 1.
Counts of de novo mutations observable by MEMDS in the APOL1 ROI in sperm samples, 7 from sub-Saharan Africans and 8 from northern Europeans
| Positions | Point mutations | Indels | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (or center of indels)→ | 1023 | 1024 | 1025 | 1026 | 1027 | 1028 | 1024 | 1025 | 1026 | 1028 | ||||||||
| Cells scanned (10⁶)↓ | A>G | A>C | A>T | A>G | A>C | A>T | G>A | G>C | C>G | T>C | T>G | T>C | T>G | del | del | del | del | |
| AFR | 54.0 | 2 | 9 | 1 | ||||||||||||||
| 39.4 | 5 | 1 | 1 | 1 | ||||||||||||||
| 50.3 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | |||||||||
| 33.1 | 4 | 4 | 1 | 1 | 3 | |||||||||||||
| 38.5 | 1 | 11 | 2 | 2 | ||||||||||||||
| 30.8 | 4 | 7 | 1 | 5 | 2 | |||||||||||||
| 45.0 | 3 | 12 | 7 | 1 | ||||||||||||||
| EUR | 47.4 | 1 | 1 | |||||||||||||||
| 54.9 | 2 | 1 | ||||||||||||||||
| 43.7 | 4 | 1 | 2 | 1 | 3 | 1 | 1 | |||||||||||
| 36.8 | 2 | 5 | 2 | 1 | 1 | |||||||||||||
| 20.2 | ||||||||||||||||||
| 45.8 | 2 | 3 | 2 | |||||||||||||||
| 35.3 | 1 | 1 | 7 | 3 | 3 | 1 | ||||||||||||
| 38.6 | 1 | 2 | 1 | 1 | ||||||||||||||
The number of haploid genomes scanned by MEMDS in the ROI per donor is given in the second column in units of 10⁶. The base positions of observable mutations and the particular observable mutations at those positions are given in the second and third rows for both point mutations and indels (all 4 observed indels are single-base deletions). The remaining cells give the counts of de novo instances per donor for each mutation (empty cells represent zero counts). The rate of the 1024A→G mutation is notably higher in Africans than in Europeans and notably higher than that of all other observable mutations in the region.
The mutations observed in the flanking regions outside of the ROI are not enriched. Although some of these mutations may be real, categorizing all of them as noise and dividing the average mutation rate across the flanking regions by the enrichment factor sets an upper bound on the noise level (the false positive rate, FPR) in the ROI (Fig. 2). Whereas in ref. 27 this upper-bound FPR was itself negligible, for the current ROI it was not (Fig. 2B). In principle, this noise level could make a study of mutation rates at the single-mutation resolution challenging, given that the average human genome-wide point mutation rate is on the order of 10−8; here, however, it did not, because of the elevated mutation rates observed in this ROI. Nevertheless, to take this noise level into account and obtain lower-bound (i.e., most conservative) de novo mutation rates, we subtracted the upper-bound FPR of each mutation from the observed rate of that mutation calculated from the values in Table 1, obtaining the mutation rates in Fig. 3. All statistical tests and results reported below are based on these noise-reduced numbers, and in all count-based tests the expected mutation numbers following noise reduction were first summed and then rounded to the nearest integer. Following this upper-bound noise reduction, the total numbers of point mutations in Africans and Europeans were 88 and 47, respectively (compared to 97 and 53 in Table 1), showing that even this maximum possible noise reduction has a limited effect on the data.
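The noise-reduction step can be sketched in a few lines. This is a minimal illustration assuming, as in Fig. 2B, one flank-derived error rate and one enrichment factor per donor; the function names and all numbers are hypothetical, and the real per-mutation FPRs come from the authors' pipeline.

```python
from __future__ import annotations


def roi_fpr_upper_bound(flank_rate: float, enrichment_factor: float) -> float:
    """Upper-bound per-base FPR in the ROI for one donor.

    Treats every (non-G->T, non-C->T) variant in the unenriched flanks as noise,
    then divides that rate by the donor's HindIII enrichment factor.
    """
    return flank_rate / enrichment_factor


def noise_reduced_count(per_donor: list[tuple[int, float, float]]) -> int:
    """Sum per-donor (observed - expected false positives), then round once.

    per_donor holds (observed_count, cells_scanned, fpr_upper_bound) tuples.
    """
    expected_true = sum(observed - fpr * cells for observed, cells, fpr in per_donor)
    return round(expected_true)


# Hypothetical example: two donors sharing the same upper-bound FPR.
fpr = roi_fpr_upper_bound(flank_rate=2e-7, enrichment_factor=100.0)  # 2e-9
total = noise_reduced_count([(10, 40e6, fpr), (5, 30e6, fpr)])      # 14.86 -> 15
```

Summing before rounding, as the text specifies, avoids compounding a rounding error per donor. Note that Python's built-in `round` resolves exact .5 ties to the nearest even integer, a detail any reimplementation would need to match to the authors' convention.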
Fig. 2.
Error rates and yield of MEMDS as applied to the APOL1 ROI. (A) Rates per mutation type in the ROI-flanking sequences of APOL1 fragments for the HindIII-treated and HindIII-untreated samples combined show a similar pattern for the African (AFR, dark blue) and European (EUR, light blue) groups. (B) Under the stringent assumption that all of the mutations in the flanking regions are due to sequencing and/or library generation errors, these mutations, with the exception of G→T and C→T, are used to calculate the average FPR per base in the flanking sequences (blue). To calculate the average FPR for the 6 bp ROI (red), the per-donor FPRs for the flanking sequences are divided by the corresponding HindIII-enrichment factors. (C) Depletion of APOL1 sequences carrying the WT ROI by HindIII digestion increases the yield (i.e., the number of called consensus bases per sequenced base) for the 6 bp ROI sequence. The yield can become larger than one because the composition of the sequences removed by digestion is known.
Fig. 3.
Boxplot of de novo mutation rates of observable point mutations in the APOL1 ROI in Africans (AFR, red) and Europeans (EUR, turquoise). The Trypanosoma brucei gambiense–protective 1024A→G mutation exhibits a striking difference in de novo rate between the two populations.
Average ROI Mutation Rates.
To compare against well-accepted quantities, we used a commonly cited value as grossly representative of the human genome-wide average point mutation rate (6, 7, 56–59) and a lower estimate for the human genome-wide average indel rate (11, 27, 60, 61). We found that, as measured by MEMDS for the two ethnic groups combined, the average point mutation and indel rates in the APOL1 ROI were, respectively, significantly higher by 4.1-fold and marginally significantly higher by 2.0-fold than these genome-wide average point mutation and indel rates (two-sided exact binomial tests).
Population Differences in ROI Rates.
To test for a population-level difference in the overall point mutation rates per person across the APOL1 ROI while excluding the possibility that such a difference is merely due to sample-level variation (e.g., due to clonal dependence) and/or individual-level variation alone, we used a two-sided Wilcoxon rank-sum test. We found that these rates were marginally significantly higher in the African than in the European group. When the G1 1024A→G mutation was excluded from the count, the rates did not differ significantly between the groups (two-sided Wilcoxon rank-sum test). Next, pooling cells from all donors within each group while ignoring individual- and sample-level variation to obtain the average point mutation rate across the ROI, including the G1 1024A→G mutation, showed this average rate to be significantly higher, by 2.1-fold, in Africans than in Europeans (two-sided Fisher exact test). The average pooled indel rate did not differ significantly between the groups (two-sided Fisher exact test).
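With only 7 and 8 donors, a two-sided Wilcoxon rank-sum test can be computed exactly by enumerating all assignments of the pooled ranks to the two groups. The sketch below is a hypothetical pure-Python helper (SciPy's `scipy.stats.mannwhitneyu` is the usual off-the-shelf equivalent); it uses midranks for ties and one common two-sided convention, and the per-donor rates are invented for illustration, not taken from the study.

```python
from itertools import combinations
from math import comb


def midranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_exact_p(x: list[float], y: list[float]) -> float:
    """Two-sided exact Wilcoxon rank-sum (permutation) test on per-donor rates.

    Enumerates every way of assigning len(x) of the pooled ranks to group x and
    counts assignments whose rank sum is at least as far from its null mean.
    """
    ranks = midranks(list(x) + list(y))
    n, total = len(x), len(ranks)
    null_mean = n * (total + 1) / 2
    observed_dev = abs(sum(ranks[:n]) - null_mean)
    extreme = sum(
        1 for idx in combinations(range(total), n)
        if abs(sum(ranks[i] for i in idx) - null_mean) >= observed_dev - 1e-9
    )
    return extreme / comb(total, n)


# Hypothetical per-donor rates (units of 10^-8), invented for illustration only.
afr = [9.1, 12.4, 8.7, 15.0, 11.2, 10.3, 13.8]
eur = [3.2, 4.5, 2.8, 6.1, 0.0, 5.0, 3.9, 4.4]
p_value = rank_sum_exact_p(afr, eur)
```

Because each donor contributes a single rate, a donor with a large clonal cluster can shift only one rank, which is why this per-person test guards against clonal dependence in a way that pooled counts do not. With complete separation of 7 versus 8 donors, the smallest attainable two-sided P value is 2/6435 ≈ 3.1 × 10^-4.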
Mutation-Specific Origination Rates.
While the above measurements concern population comparisons for the ROI as a whole, analysis at the finer resolution of individual mutations yielded a clear picture. First, following upper-bound noise reduction, nearly 46 G1 1024A→G mutation instances were observed in the African group among a total of 291 × 10⁶ cells, whereas 19 G1 1024A→G mutation instances were observed in the European group among a total of 323 × 10⁶ cells. To examine whether the de novo origination rates of the G1 1024A→G mutation differed significantly between the groups while excluding the possibility that this difference is due to sample-level variation (e.g., due to clonal dependence) and/or individual-level variation alone, we compared the per-person G1 1024A→G mutation rates between the two groups using a two-sided Wilcoxon rank-sum test. These rates were significantly higher in the African than in the European group. Next, pooling cells from all donors within each group while ignoring individual- and sample-level variation to obtain the average mutation-specific origination rate of G1 1024A→G across individuals showed this rate to be significantly higher, by 2.7-fold, in Africans than in Europeans (two-sided Fisher exact test).
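The pooled comparison can be sketched with a two-sided Fisher exact test on the counts reported above (46 of 291 × 10⁶ cells versus 19 of 323 × 10⁶). The helper below is a hypothetical pure-Python implementation (SciPy's `scipy.stats.fisher_exact` is the usual off-the-shelf choice); it assumes the common two-sided convention of summing all tables no more probable than the observed one, which may differ in edge cases from the authors' software. Because the events are so rare, the odds ratio reported in the text is numerically almost identical to the simple rate ratio computed here.

```python
from math import exp, log


def fisher_exact_two_sided(a: int, n1: int, b: int, n2: int) -> float:
    """Two-sided Fisher exact test: a mutants among n1 cells vs b among n2 cells.

    Conditioning on m = a + b mutants, the group-1 count is hypergeometric.
    Its weights are built in log space from the ratio P(k)/P(k-1), which stays
    accurate when n1 and n2 are in the hundreds of millions.
    """
    m = a + b
    lo, hi = max(0, m - n2), min(m, n1)
    log_w = [0.0]  # unnormalized log P(lo)
    for k in range(lo + 1, hi + 1):
        log_w.append(log_w[-1] + log(m - k + 1) + log(n1 - k + 1)
                     - log(k) - log(n2 - m + k))
    peak = max(log_w)
    w = [exp(v - peak) for v in log_w]
    p_obs = w[a - lo]
    # Sum the probabilities of all tables no more likely than the observed one.
    tail = sum(v for v in w if v <= p_obs * (1 + 1e-7))
    return min(1.0, tail / sum(w))


# Noise-reduced pooled counts for G1 1024A->G reported in the text.
afr_mut, afr_cells = 46, 291_000_000
eur_mut, eur_cells = 19, 323_000_000
rate_ratio = (afr_mut / afr_cells) / (eur_mut / eur_cells)
p_value = fisher_exact_two_sided(afr_mut, afr_cells, eur_mut, eur_cells)
```

The rate ratio evaluates to about 2.69, consistent with the 2.7-fold difference in the text. Pooling treats every scanned cell as an independent trial, which is exactly the assumption that clonal dependence could violate; hence the per-donor Wilcoxon test is reported alongside it.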
Additionally, we found that the G1 1024A→G mutation is the point mutation of highest mutation-specific rate in the ROI (Fig. 3), and that it deviates by far the most from the genome-wide average mutation rate for its type (A→G) compared to all other 12 point mutations studied in the ROI (Table 2). This deviation increases when expanding the local genetic context (the size of the motif) based on which the genome-wide average mutation rates are calculated, from 1-mer to 7-mer (Table 2).
Table 2.
De novo mutation rate fold changes from the corresponding genome-wide average (GWA) rates for observable point mutations in the African APOL1 ROI
| Variant | De novo mutation–based GWA rate (10−8) | Fold-change | ERV-based, 3-mer GWA rate (10−8) | Fold-change | ERV-based, 5-mer GWA rate (10−8) | Fold-change | ERV-based, 7-mer GWA rate (10−8) | Fold-change |
|---|---|---|---|---|---|---|---|---|
| 1023A>G | 0.593 | 8.11∗∗∗∗ | 0.452 | 10.65∗∗∗∗ | 0.531 | 9.05∗∗∗∗ | 0.443 | 10.85∗∗∗∗ |
| 1023A>C | 0.170 | 2.03 | 0.146 | 2.35 | 0.121 | 2.84 | 0.125 | 2.74 |
| 1023A>T† | 0.138 | 0.00 | 0.176 | 0.00 | 0.108 | 0.00 | 0.116 | 0.00 |
| 1024A>G | 0.593 | 26.64∗∗∗∗∗ | 0.431 | 36.64∗∗∗∗∗ | 0.314 | 50.37∗∗∗∗∗ | 0.278 | 56.93∗∗∗∗∗ |
| 1024A>C† | 0.170 | 0.00 | 0.157 | 0.00 | 0.150 | 0.00 | 0.125 | 0.00 |
| 1024A>T† | 0.138 | 0.00 | 0.105 | 0.00 | 0.099 | 0.00 | 0.079 | 0.00 |
| 1025G>A | 0.732 | 3.76∗∗ | 0.829 | 3.32∗∗ | 0.802 | 3.43∗∗ | 0.840 | 3.27∗∗ |
| 1025G>C | 0.213 | 4.83∗ | 0.238 | 4.34∗ | 0.300 | 3.44 | 0.264 | 3.90∗ |
| 1026C>G | 0.213 | 4.83∗ | 0.238 | 4.34∗ | 0.300 | 3.44 | 0.206 | 5.00∗ |
| 1027T>C | 0.593 | 1.74 | 0.431 | 2.39 | 0.402 | 2.56 | 0.353 | 2.92 |
| 1027T>G† | 0.170 | 0.00 | 0.157 | 0.00 | 0.147 | 0.00 | 0.184 | 0.00 |
| 1028T>C | 0.593 | 6.37∗∗∗ | 0.322 | 11.74∗∗∗∗ | 0.329 | 11.50∗∗∗∗ | 0.252 | 15.01∗∗∗∗ |
| 1028T>G† | 0.170 | 0.00 | 0.135 | 0.00 | 0.155 | 0.00 | 0.741 | 0.00 |
GWA rates were calculated for the 1-mer context based on genome-wide family sequencing studies (7) and for the 3-, 5-, and 7-mer contexts based on the relative frequencies of Extremely Rare Variants (ERVs) (9). The mutation-specific origination rate of the 1024A→G mutation deviates by far the most from the GWA for its type compared to all other 12 observable point mutations in the ROI (over 26-fold vs. 8- and 6-fold for the second and third largest deviations in the 1-mer context, and nearly 57-fold vs. 15- and 11-fold in the 7-mer context). The deviations are not accounted for by expanding the local genetic context taken into account in the GWAs. Adjustments to the GWA calculation are minor compared to the variation in mutation-specific origination rates, and the deviations of the three mutations of highest rate from their corresponding GWAs only increase overall as we move from the 1-mer to the 7-mer context. Fold-change P values were based on a two-sided exact binomial test.
†Variant was not found in the analyzed samples of African donors.
∗.
∗∗.
∗∗∗.
∗∗∗∗.
∗∗∗∗∗.
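The fold changes and P values in the table compare an observed mutation-specific rate with the GWA for its type. Below is a minimal sketch of that comparison. It assumes k mutant molecules among n interrogated molecules, with the GWA rate p taken as the null per-molecule rate. The two-sided P value uses the minimum-likelihood definition (the one R's `binom.test` uses). Whether the study used exactly this definition is an assumption, and the counts in the usage lines are hypothetical, not data from this study.

```python
import math


def binom_logpmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), assuming 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial P value (minimum-likelihood definition):
    the total probability of all outcomes no more likely than k."""
    log_obs = binom_logpmf(k, n, p)
    log_cutoff = log_obs + 1e-7  # small relative tolerance so exact ties are counted
    mode = int((n + 1) * p)
    total = 0.0
    for i in range(n + 1):
        lp = binom_logpmf(i, n, p)
        if lp <= log_cutoff:
            total += math.exp(lp)
        # Beyond both the mode and k the pmf keeps falling; stop once terms are
        # negligible. This keeps the loop short when p is tiny and n is huge.
        if i > max(mode, k) and lp < log_obs - 50:
            break
    return min(1.0, total)


def fold_change(k: int, n: int, gwa_rate: float) -> float:
    """Observed rate k / n divided by the genome-wide average (GWA) rate."""
    return (k / n) / gwa_rate


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    k, n, gwa = 30, 10_000_000, 1e-7
    print(f"fold change = {fold_change(k, n, gwa):.1f}")
    print(f"two-sided exact binomial P = {binom_test_two_sided(k, n, gwa):.3g}")
```

Summing probabilities in log space via `lgamma` avoids overflow for the very large n typical of deep sequencing. The early exit assumes a small p, as for per-site mutation rates.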
We also found that the lower-frequency 1023A→G mutation differs significantly in de novo rate between the populations (, two-tailed Wilcoxon rank-sum test), although its rates are much lower.
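The population comparison above uses a two-tailed Wilcoxon rank-sum test. With few donors per group, the exact null distribution can be obtained by enumerating every way of assigning the pooled ranks to one group. The sketch below does this, using midranks for ties. It is an illustration under that assumption, not the study's analysis code, and the per-donor rates in the usage lines are hypothetical.

```python
from itertools import combinations


def midranks(values):
    """Rank values 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum_exact(x, y):
    """Exact two-sided Wilcoxon rank-sum test by full enumeration.
    Returns (rank sum of x, P value). Feasible only for small samples."""
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    m, n = len(x), len(y)
    w_obs = sum(ranks[:m])
    expected = m * (m + n + 1) / 2  # null mean of the rank sum of x
    d_obs = abs(w_obs - expected)
    extreme = total = 0
    for idx in combinations(range(m + n), m):
        w = sum(ranks[i] for i in idx)
        total += 1
        if abs(w - expected) >= d_obs - 1e-9:  # at least as far from the mean
            extreme += 1
    return w_obs, extreme / total


if __name__ == "__main__":
    # Hypothetical per-donor rates, not study data.
    afr = [4.1, 3.8, 5.0, 4.6, 3.9, 4.4, 5.2]
    eur = [2.0, 2.7, 1.9, 3.1, 2.4, 2.2, 3.5]
    w, p = wilcoxon_rank_sum_exact(afr, eur)
    print(f"W = {w}, two-sided exact P = {p:.4f}")
```

With seven donors per group there are only C(14, 7) = 3,432 rank assignments, so full enumeration is cheap.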
Discussion
Results showed that the G1 1024A→G mutation, which protects against Trypanosoma brucei gambiense (the most common cause of human African sleeping sickness), originated de novo significantly more frequently in the African than in the European group. That is, the G1 1024A→G de novo rate is significantly higher in the population where this mutation is of adaptive significance. Furthermore, the pattern held even when controlling for possible clonal dependence among the observed instances of the mutation. Adding to the uniqueness of the G1 1024A→G mutation, not only is its rate by far higher than that of any other point mutation in the region, but it also deviates by far the most from the genome-wide average for its type (A→G). This deviation only increases as the local genetic context is expanded, further highlighting that local genetic context alone does not explain mutation rate variation at the single-mutation resolution.
The patterns observed here are strikingly similar to those observed for the malaria-protective HbS mutation in the African HBB gene (27). In that study, Melamed et al. found that the de novo origination rate of the HbS mutation was higher specifically in HBB in Africans, i.e., in the gene and population case where it protects against malaria, than in the other three groups combined, while accounting for possible clonal dependence (27). The other three groups were the HBB gene in individuals of European origin, whose ancestors have not been exposed to intense malarial pressure, and the paralogous HBD gene, which is not implicated in malaria resistance, in individuals of both African and European origin. They also addressed whether the effect existed at the population level independently of a gene-level effect. They noted two independent, statistically significant patterns (27). First, both with and without counting the HbS mutation, the overall point mutation rate in the HBB ROI was significantly higher in Africans than in Europeans. Second, the de novo mutation rates observed in the study corresponded significantly to allele frequencies in populations. For the HbS mutation rate not to be higher in Africans than in Europeans, it would have had to violate both patterns, whereas it was in fact aligned with both more strongly than the rate of any other point mutation in the region (27). In other words, triangulating from different statistically significant tests suggested that the population-level difference held independently of a gene-level difference (27). In the current study, by contrast, the APOL1 ROI had a higher mutation rate than the HBB ROI, and the sample size was larger. Together, these yielded enough observations to establish with a single, simple statistical test that the G1 1024A→G mutation rate is higher in Africans than in Europeans.
It is thus notable that the first two studies to examine the de novo origination rates of specific individual mutations both targeted mutations of known adaptive value: the well-known malaria-protective HbS mutation and the G1 1024A→G mutation. In both, the focal mutation originated de novo significantly more frequently specifically where it is of adaptive value. Both cases thus add a level of complexity to what was previously known about mutation rates (e.g., refs. 7, 9, 56, 57, 62, and 63). Indeed, while it has been assumed that “the likelihood of any particular mutational event is independent of its specific value to the organism” (36), both studies showed a pattern opposite to that expected. It is worth noting that common knowledge of mutation rates had been based on averages across genomic positions (4, 7–11, 64). Therefore, unless the new findings are refuted experimentally, they require a reexamination of the aforementioned expectation and its consequences (65).
Melamed et al. (27) considered three possible explanations for the HbS mutation origination pattern, which are also relevant here. Attributing the results to coincidence leaves them anomalous, and modifier theory does not easily address changes in individual DNA mutation rates (62, 66–70). We therefore focus on the third explanation. We start with a concrete example from structural variation mutation, then apply the insights to point mutations, next propose possible mechanisms of origination for the HbS and APOL1 1024A→G mutations, and finally discuss consequences.
According to the mutational replacement hypothesis, evolved, complex regulatory phenomena can directly and mechanistically lead to specific mutations that replace and simplify these phenomena (71). For example, interacting genes are more likely to undergo a fusion mutation in evolution (72). DNA loops bring remote interacting genes to the same place at the same time in the nucleus, with their chromatin open (73–75). It has therefore been proposed that reverse transcription, recombination-based mechanisms, and other processes preferentially fuse such genes rather than others (71, 72, 76, 77). The gene fusion effect applies to both germline and somatic genes (72). To explain this without resorting to Lamarckism, it has been proposed that the transcriptional promiscuity of the germline (78–80) exposes close somatic regulatory connections. That promiscuity refers to the fact that somatic genes are regularly transcribed in the germline. It allows somatic as well as germline genes to participate, in a heritable mode, in mutational mechanisms such as the one described (65, 71, 72, 76).
The gene fusion findings imply that each pair of genes has its own fusion mutation probability, which is itself influenced by complex information in the genome (71, 72). This information includes all of the promoters, enhancers, transcription factors, and epigenetic marks of the two focal loci and of other loci regulating them, which together determine the strength of the interaction between the genes. The findings also connect the causes and consequences of a fusion mutation: An interaction that previously required two separate activations is simplified and replaced with a “ready-made” genetic unit (71, 72, 76). Similarly, interaction-based mutational translocation has been offered to shed light on two phenomena (72, 76). Translocation of genes into the same neighborhood may explain the correlation between genomic proximity and functional interaction, and translocation of exons into the same gene may explain exon shuffling. Relatedly, transposable elements (TEs) are active in the germline (81, 82). It has therefore been proposed that multiple genes that have been co-opted into a novel network are more likely to be invaded over evolutionary time by copies of the same TE in a self-accelerating manner, because the germline's transcriptional promiscuity makes them active there at the same time. This offers an explanation (71) for the Britten–Davidson model, whereby a TE proliferates and becomes a master coordinator of a gene regulatory network (83, 84). In all of the above, evolved regulatory phenomena condition the genome for their own replacement and simplification via mutational mechanisms (71).
The same mutational replacement principle has been applied to point mutations. It has been suggested that evolved, strong RNA A→I (equivalent to A→G) editing increases the probability of an A→G mutation at the corresponding DNA position directly and mechanistically (71) (see also refs. 85–89). This could occur either via reverse transcription or via ADAR acting directly, at a lower rate, on the transcribed DNA. In addition to implicating RNA editing in the mechanisms of evolution (90), this could explain some otherwise puzzling observations (91–97). A→I RNA-edited sites in one species correspond to A→G substitution sites in related species and to A/G DNA polymorphisms in the same species, in both coding and noncoding regions and at both synonymous and nonsynonymous sites. Much as in the structural variation examples above, such point mutations would hardwire the phenotype into the genome in a simple local form (71). Also relevant to our conclusions below regarding the HbS and APOL1 1024A→G mutations, evidence suggests that gene expression in the soma correlates with transcription-associated point mutations (TAMs) in the germline (98). Evolved excessive expression of a gene likely follows recent evolutionary pressure due to environmental change. It has therefore been proposed that germline TAMs are more likely than randomly occurring point mutations to be relevant to the evolution of adaptations (71), and that they are further narrowed down to specific mutations and positions by additional regulatory processes (71).
The structural variation and point mutation examples above taken together suggest that the genome’s functional activity influences the probabilities of origination of individual mutations (65, 71, 72, 76, 99). These probabilities are determined by complex information in the genome, including information about which genes interact with each other, which genes are expressed excessively, which nucleotides are repeatedly edited, and more (71, 76, 99). According to this view, the causes and consequences of individual mutations are meaningfully related without involving Lamarckism: The internal information affecting mutation probabilities itself has accumulated under past selection pressures and reflects current structures and functions, and therefore the distribution of individual mutation rates across the genome at any one time in evolution is specifically relevant to the adaptations evolving at that time (65, 71, 99).
In light of the above, we can now raise possible mechanisms of origination for the HbS (20A→T in HBB) and APOL1 1024A→G mutations. Pan et al. found that in saker falcons, highly expressed hemoglobin genes exhibit high TAMs in the blood (100). This was especially so in a population living at higher altitude and therefore subject to relative hypoxia. The TAMs generated mutations, especially A→T transversions, at particular hemoglobin loci, possibly related to regulatory phenomena and to increased RNA diversity (100). This raises the question of whether human populations that have been under intense malarial pressure for many generations, which affects the hemoglobin genes, likewise exhibit high hemoglobin RNA diversity. Such diversity could stem from evolved regulatory activity from which the HbS and other specific malaria-protective mutations follow (27, 71, 99). Furthermore, strong A→I (A→G) RNA editing has been observed in the 3′ UTR of APOL1, close to the APOL1 ROI (101–103). No such activity was found at position 1024 itself, but the relevant tissues and populations (germline and liver cells in sub-Saharan Africans) have not been examined. The details remain unknown. Nevertheless, the mutational replacement hypothesis shows in principle how the gradual evolution of regulation can lead to punctuated, large-effect mutational change (71). A complex regulatory phenomenon first evolves through allele frequency changes and small-effect changes across loci, and these then condition the genome for the phenomenon's own replacement and simplification. This does not entail that the genome “knows” in advance whether each mutation will be beneficial. Genomic friction points have been hypothesized (65). At such points, long-term selection pressures lead to mutational pressures on specific loci. Because a complex whole is difficult to change without cost, the result is recurrent genetic disease. This continues until the mutational pressure is channeled to another route forward or compensatory mutations arise. The HbS and APOL1 1024A→G mutations may be prominent examples of such friction points.
Adaptive mutations such as HbS and APOL1 1024A→G may arise from preexisting interactions. This raises the question of how new heritable adaptive information originates, and it brings out the consequences of the above. The origin of such information has often been attributed to accidental, point-wise change. According to the theory of interaction-based evolution (IBE), however, new information arises at the system level from emergent interactions between mechanistically driven heritable changes (71, 76). Elements that are both simple and effective tend to become useful across contexts. IBE therefore holds that mutational simplification under selection generates, from preexisting interactions, new elements that have the inherent capacity to come together with other such elements into novel interactions (71, 76). This connects simplification under selection to co-option and the emergence of complexity. Under this view, simplification itself is not attributed to random mutation and natural selection (rm/ns) but is taken to be a basic element, like natural selection: Evolution involves both an internal force of natural simplification and an external force of natural selection (71, 76). This view enhances the commonly made analogy between evolution and trial and error (76). It allows the “trials” to be influenced by prior information, and it engages the well-known power of the pressures of parsimony and fit when they are applied in combination to the same object. Alongside these conceptual consequences, multiple future research directions arise. These include the mirroring of close somatic regulatory connections by regulatory activity in the germline (65, 71) and testing IBE with evolutionary experiments (27, 71, 99). They also include mapping the rates of individual mutations across the genomes of different organisms and investigating their potential relation to the evolution of adaptations.
Materials and Methods
The MEMDS method as applied to the APOL1 G1 mutation was adapted from Melamed et al. (27). A detailed description of the MEMDS method is available in the SI Appendix, Fig. S1.
Reagents.
All oligos were obtained from Integrated DNA Technologies with standard desalting purity. Full descriptions of the oligos used are available in SI Appendix, Table S4. All enzymes were obtained from New England Biolabs. DNA purifications were carried out using QIAGEN kits in accordance with the manufacturer’s instructions unless otherwise mentioned.
Preparation of Spike-In Plasmids.
Two pUC19-based plasmids, ALP23 and ALP24, were constructed to include a sequence identical to the studied APOL1 gene fragment between positions 764 and 1,255 relative to the mRNA translation start site. The only difference was that the HindIII restriction site (AAGCTT) was replaced with TTGAAA and CTCTAG, respectively. To make the spike-in mixture, the two plasmids were linearized with BamHI, mixed in equal amounts, and diluted to 10 fg/μL.
Collection of Sperm Samples.
Sperm samples were obtained from healthy donors between the ages of 18 and 39 with no history of cancer or infertility and no high fever in the 3 mo prior to donation. European samples were purchased from Fairfax Cryobank. African samples were collected in the Assisted Conception Unit of the Lister Hospital & Fertility Centre in Accra, Ghana, following clinical standards. Collection was approved by the Institutional Review Board of the Noguchi Memorial Institute for Medical Research (NMIMR-IRB 081/16-17) at the University of Ghana, Legon, the Rambam Health Care Center Helsinki Committee, Haifa (0312-16-RMB), and the Israel Ministry of Health (20188768). Informed consent was obtained from all participants, and samples were deidentified prior to use in this study.
Extraction of Sperm DNA.
DNA extraction from sperm samples was carried out as described previously by Melamed et al. (27). Briefly, 500 μL aliquots from each donor’s sperm sample were washed twice with 70% ethanol to remove seminal plasma. Cells were rotated overnight at 50 °C in 700 μL of lysis buffer (50 mM Tris-HCl pH 8.0, 100 mM NaCl, 50 mM EDTA, 1% SDS). The buffer contained 0.5% Triton X-100 (Fisher BioReagents BP151-100), 50 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP; Sigma-Aldrich 646547), and 1.75 mg/mL Proteinase K (Fisher BioReagents BP1700-100). Lysates were centrifuged at 21,000 × g for 10 min at room temperature, and the supernatants were combined in a single tube. The cleared lysate was supplemented with 15 mL of buffer G2 (QIAGEN Blood & Cell Culture DNA Maxi Kit, 13362), and purification continued according to the manufacturer’s instructions. The eluted DNA was allowed to dissolve overnight at room temperature. For each donor, a small aliquot of the extracted DNA was PCR-amplified and Sanger-sequenced. This confirmed the absence of the G1 and G2 mutations in the donor’s genetic background. It also confirmed that the target region was compatible with library generation by the MEMDS protocol, i.e., that it carried no other somatic mutations that could inhibit HindIII or PvuII digestion or interfere with primer binding. Additionally, the haplotype background at positions 150, 228, and 255 of the APOL1 protein was verified for all donors.
Enzymatic Digestion of Sperm DNA.
For the HindIII-treated sample, DNA amounts equivalent to 75 to 90 million haploid cells were mixed with 0.2 pg of the plasmid spike-in mixture. The mixture was divided among the wells of a 96-well plate with 10 U HindIII-HF (R3104L) per well and incubated overnight at 37 °C according to the manufacturer’s instructions. The genomic DNA and spike-in mixture volumes used, which are needed for calculating the HindIII enrichment factor, are given in SI Appendix, Table S3. Then, each well was supplemented with 20 U PvuII-HF (R3151L) and 20 U NdeI (R0111L) and incubated for an additional 3 h at 37 °C.
For the HindIII-untreated reaction, sperm DNA in an amount equivalent to 5% of that used for the HindIII-treated reaction was mixed with 1.2 pg of the plasmid spike-in mixture and 10 U SalI-HF (R3138L). The mixture was aliquoted into 5 tubes and incubated overnight at 37 °C according to the manufacturer’s instructions. This provided similar DNA digestion conditions without affecting the APOL1 target sequences. Each tube was then supplemented with 20 U PvuII-HF (R3151L) and 20 U NdeI (R0111L) and incubated at 37 °C for an additional 3 h. DNA purification was carried out with the QIAGEN PCR purification kit.
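The spike-in plasmids carry an altered HindIII site, so they are expected to survive HindIII digestion while wild-type genomic APOL1 molecules are cut. The untreated reaction then measures how many genomic molecules accompany each spike-in molecule. Below is a sketch of one way these measurements could be turned into a per-molecule mutation rate. The normalization, the function names, and all family counts are hypothetical, and this is not the MEMDS pipeline's enrichment-factor formula (see SI Appendix, Table S3). Only the 0.2 pg and 1.2 pg spike-in amounts and the 5% genomic input of the untreated reaction come from the text.

```python
def genomic_equivalents(treated_spike_families: int,
                        untreated_genomic_families: int,
                        untreated_spike_families: int,
                        treated_dna: float = 1.0,
                        untreated_dna: float = 0.05,
                        treated_spike_pg: float = 0.2,
                        untreated_spike_pg: float = 1.2) -> float:
    """Hypothetical estimate of how many genomic APOL1 families the
    HindIII-treated reaction would have yielded without HindIII.
    Assumes family yield is proportional to input for genomic and spike-in DNA."""
    # Genomic families per spike-in family, measured where HindIII was absent.
    per_spike = untreated_genomic_families / untreated_spike_families
    # The reactions differ in their genomic-to-spike-in input ratio
    # (here (1.0 / 0.2) / (0.05 / 1.2) = 120).
    input_scale = (treated_dna / treated_spike_pg) / (untreated_dna / untreated_spike_pg)
    return treated_spike_families * per_spike * input_scale


def mutation_rate(mutant_families: int, treated_spike_families: int,
                  untreated_genomic_families: int,
                  untreated_spike_families: int) -> float:
    """Mutant families in the treated reaction per estimated genomic family."""
    return mutant_families / genomic_equivalents(
        treated_spike_families, untreated_genomic_families, untreated_spike_families)


if __name__ == "__main__":
    # Made-up family counts for illustration only.
    print(genomic_equivalents(300, 1000, 500))
    print(mutation_rate(36, 300, 1000, 500))
```

The design choice here is to let the undigested spike-ins carry the count of input molecules from the untreated to the treated reaction, so that depletion of wild-type genomic molecules by HindIII does not bias the denominator.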
Primary Barcode Labeling and Linear Amplification.
Direct barcode labeling followed by linear amplification of the digested APOL1 strands was carried out in a single reaction in 96-well plates. Each well contained about 1 μg of digested DNA, 0.1 μM primary barcode oligo (Oligo APA; see SI Appendix, Table S4), and 1 μM of a 5′-phosphorothioate-protected primer for linear amplification (Oligo APB). The reaction was carried out with Q5 high-fidelity polymerase according to the manufacturer’s instructions, using the following thermocycler parameters: initial denaturation at 98 °C for 20 s, followed by 16 cycles of 98 °C for 5 s, 67 °C for 15 s, and 72 °C for 20 s. For each donor and each treatment, a different APA oligo with a unique Donor Identifier-1 (ID-1) sequence was used, allowing any post-processing contamination to be eliminated at the sequence analysis step.
5′-Exonuclease Treatment.
Following purification, ~15 μg DNA aliquots from the linearly amplified product of the HindIII-treated sample were each incubated at 37 °C for 3.5 h with 20 U of Lambda exonuclease, 40 U of T7 exonuclease, and 120 U of RecJf exonuclease in 1× CutSmart buffer. The linearly amplified product of the HindIII-untreated sample was incubated at 37 °C for 3.5 h with 10 U of Lambda exonuclease, 20 U of T7 exonuclease, and 60 U of RecJf exonuclease.
Secondary Barcode Labeling and 3′-Exonuclease Treatment.
Following purification, the DNA was aliquoted into a 96-well plate (~0.75 μg per well). A single primer extension reaction was carried out using 0.5 μM of the secondary barcode primer (Oligo APC) and Q5 high-fidelity polymerase according to the manufacturer’s instructions. The thermocycler parameters were initial denaturation at 98 °C for 20 s, followed by a single cycle of 98 °C for 5 s, 70 °C for 15 s, and 72 °C for 40 s. Immediately after the thermocycler temperature dropped to 16 °C, 20 U of thermolabile Exo I was added directly to each well to remove free Oligo APC, together with the relabeling control primer (Oligo APD). Oligo APD was added in a known amount equivalent to 0.66% of the secondary barcode primer (6.6% for AFR3 and AFR7). After 1 h of incubation at 37 °C, Exo I was heat-inactivated for 1 min at 80 °C, and the DNA was purified. For each donor, the HindIII-treated and untreated samples were each labeled with an Oligo APC carrying a different Donor Identifier-2 (ID-2) sequence. These sequences were also not shared with samples from other donors, so that each donor and condition had a unique ID-2 sequence.
PCR Amplification and Sequencing.
The first PCR of the barcoded product was carried out using primers APE and APF1 and Q5 high-fidelity polymerase according to the manufacturer’s instructions. The thermocycler parameters were initial denaturation at 98 °C for 30 s, followed by 10 cycles of 98 °C for 5 s, 72 °C for 15 s, and 72 °C for 30 s, and a final extension at 72 °C for 30 s. Amplification products were purified, and the second PCR was carried out using 25% of the first PCR product as a template, primers APE and APF2, and Q5 high-fidelity polymerase according to the manufacturer’s instructions. Different F2 primers were used to add a unique Illumina index sequence to each HindIII-treated and untreated sample. The thermocycler parameters were initial denaturation at 98 °C for 30 s, followed by 20 cycles (AFR1–4 and EUR1–4) or 16 cycles (AFR5–7 and EUR5–7) of 98 °C for 5 s, 70 °C for 15 s, and 72 °C for 30 s, and a final extension at 72 °C for 1 min. PCR products were agarose-gel-purified and further concentrated with a DNA Clean & Concentrator kit (Zymo Research). DNA libraries prepared from the HindIII-treated and untreated samples of the same donor were mixed in a 3:2 ratio, respectively. HindIII-treated and untreated mixtures from three donors were combined and paired-end sequenced with a 50% control yeast genomic DNA library on an Illumina NextSeq 550 (Mid Output, 300 cycles). Sequencing took place at the Technion Genome Center and at the Genomic Center of the Azrieli Faculty of Medicine, Bar-Ilan University.
Sequence Analysis.
Paired-end (PE) sequences were merged with PEAR (104) using the default options. Merged sequences were trimmed of Illumina adapters using Cutadapt (105) and quality-filtered with Trimmomatic (106), using a sliding window size of 3 bp, a Phred quality threshold of 30, and a minimum read length of 90 bp. Quality-filtered sequences were trimmed at their ends to retrieve the barcode and sample ID data from each read. The 5′ end was trimmed up to position 18, removing the 14 bases of the primary barcode and the 4 bases of sample ID-1. Similarly, the last 9 bp at the 3′ end were trimmed, removing the 5 bases of the secondary barcode and the 4 bases of sample ID-2. The barcode and ID information was added to the read header. Only sequences with the correct ID-1, ID-2, and first two bases of the APOL1 library sequence (consisting of the partial PvuII recognition sequence) were retained. Trimmed sequences with the correct sample IDs were confirmed to carry the APOL1 gene fragment based on the bases occupying positions 37 to 41 (ATTCG for the APOL1 sequence). This check allowed two mismatches and shifts of up to 3 bp upstream or downstream of these sorting positions. Approved sequences were mapped to the APOL1 reference sequence using BWA (107) with parameters -M -t. The reference was obtained by Sanger-sequencing aliquots from the matching donor samples. Query sequences whose first position did not align with the first position of the reference sequence were excluded from further analyses. Mapped reads were scanned for mutations using a custom script, and high-quality mutations (Phred score ≥ 28) were noted. To confirm observed mutations as true variants, reads were grouped by their primary barcodes into “families” and processed according to the MEMDS pipeline (see SI Appendix, Fig. S9 in ref. 27). Briefly, only families containing at least four reads were included in the analysis; those not passing this cutoff were discarded.
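The trimming and sorting rules above can be expressed compactly. The sketch below splits a merged read into its tags and insert. It then checks for the ATTCG signature at positions 37 to 41, allowing up to two mismatches and a shift of up to 3 bp in either direction. It assumes that the primary barcode precedes ID-1 at the 5′ end, that the secondary barcode precedes ID-2 at the 3′ end, and that the sorting positions refer to the trimmed insert. The function names are hypothetical, and this is not the MEMDS pipeline code.

```python
PRIMARY_BC_LEN = 14    # primary (molecule) barcode at the 5' end
ID1_LEN = 4            # sample identifier 1
SECONDARY_BC_LEN = 5   # secondary barcode at the 3' end
ID2_LEN = 4            # sample identifier 2


def split_read(seq: str) -> dict:
    """Split a merged, quality-filtered read into barcodes, IDs, and insert.
    The order of tags within each end is an assumption."""
    head = PRIMARY_BC_LEN + ID1_LEN        # 18 bases removed from the 5' end
    tail = SECONDARY_BC_LEN + ID2_LEN      # 9 bases removed from the 3' end
    end = len(seq)
    return {
        "primary_bc": seq[:PRIMARY_BC_LEN],
        "id1": seq[PRIMARY_BC_LEN:head],
        "insert": seq[head:end - tail],
        "secondary_bc": seq[end - tail:end - ID2_LEN],
        "id2": seq[end - ID2_LEN:],
    }


def has_signature(insert: str, motif: str = "ATTCG", start: int = 36,
                  max_mismatches: int = 2, max_shift: int = 3) -> bool:
    """True if `motif` is found at 1-based positions 37-41 (0-based start 36),
    allowing up to `max_mismatches` mismatches and a shift of up to
    `max_shift` bases upstream or downstream."""
    for shift in range(-max_shift, max_shift + 1):
        s = start + shift
        if s < 0 or s + len(motif) > len(insert):
            continue  # window would fall outside the read
        window = insert[s:s + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            return True
    return False
```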
In the approved families, only mutations passing a mutation frequency cutoff of 0.7 and a secondary barcode cutoff of 3 were accepted. If a mutation failed these thresholds but the WT base at the same position passed the same criteria, the base was considered WT. If neither the mutation nor the WT base was accepted at a given position, the position was considered ambiguous and marked “N.” Read families with ambiguous positions were excluded from further analyses.
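The family-level rules can be sketched as a per-position consensus call. Because the frequency cutoff of 0.7 exceeds one half, at most one base can pass at any position. Checking the mutant first and then the WT base, as described above, therefore gives the same result as accepting whichever base passes. The sketch makes three assumptions: “secondary barcode cutoff of 3” means at least three distinct secondary barcodes among the reads carrying the base, both cutoffs are inclusive, and reads are already aligned to equal length. All names are hypothetical.

```python
from collections import Counter, defaultdict

MIN_FAMILY_SIZE = 4        # families with fewer reads are discarded
MIN_BASE_FREQUENCY = 0.7   # fraction of family reads that must carry the base
MIN_SECONDARY_BCS = 3      # distinct secondary barcodes that must carry the base


def call_position(bases, secondary_bcs):
    """Consensus for one position of one family: the base (mutant or WT)
    passing both cutoffs, or 'N' if none does."""
    for base, count in Counter(bases).items():
        supporting = {bc for b, bc in zip(bases, secondary_bcs) if b == base}
        if count / len(bases) >= MIN_BASE_FREQUENCY and len(supporting) >= MIN_SECONDARY_BCS:
            return base
    return "N"


def family_consensus(reads):
    """reads: iterable of (primary_bc, secondary_bc, aligned_seq).
    Returns {primary_bc: consensus} for families that pass the size cutoff
    and contain no ambiguous ('N') position."""
    families = defaultdict(list)
    for primary_bc, secondary_bc, seq in reads:
        families[primary_bc].append((secondary_bc, seq))
    consensus = {}
    for primary_bc, members in families.items():
        if len(members) < MIN_FAMILY_SIZE:
            continue
        secondary = [bc for bc, _ in members]
        calls = [call_position([seq[i] for _, seq in members], secondary)
                 for i in range(len(members[0][1]))]
        if "N" in calls:
            continue  # ambiguous families are excluded
        consensus[primary_bc] = "".join(calls)
    return consensus
```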
Supplementary Material
Appendix 01 (PDF)
Acknowledgments
We thank Mary Otoo and Joshua Adoboe for help with sample collection and Kim Weaver for extensive help. This publication was made possible through the support of grant 61129 from the John Templeton Foundation to K.L.S. and A.L., ISF grant 3757/20 to K.L.S., grant 62220 from the John Templeton Foundation to A.L. (subaward), and through the support of the Sagol Network. The opinions expressed in this publication are those of the authors and do not necessarily reflect the views of the John Templeton Foundation.
Author contributions
D.M. and A.L. designed research; D.M., R.S., M.B.Y., and D.F.-B. performed research; D.M., E.B., and A.L. analyzed data; M.B.Y. and E.K.H. coordinated and carried out sample collection; M.B.Y., E.K.H., K.L.S., and A.L. obtained Helsinki and IRB approvals; D.M., E.B., and A.L. wrote the paper; D.M., R.S., E.B., K.L.S., and A.L. revised the paper; K.L.S. provided general advice; K.L.S. and A.L. acquired funding; D.M. and A.L. conceived of the project; and A.L. supervised the project.
Competing interests
The University of Haifa has filed a patent application regarding the method described here, No. PCT/IL2022/050502.
Footnotes
This article is a PNAS Direct Submission.
Preprint server: https://doi.org/10.1101/2024.10.10.617206.
Data, Materials, and Software Availability
All raw sequencing data generated in this study are available in the NCBI database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/) under accession number phs002391.v2.p1 (108). The MEMDS pipeline is available at GitHub: https://github.com/livnat-lab/MEMDS_analysis_pipeline_v1.2 (109).
Supporting Information
References
- 1.Darwin C., The Variation of Animals and Plants under Domestication (John Murray, London, ed. 1, 1868). [Google Scholar]
- 2.Morgan T. H., Evolution and Adaptation (The Macmillan Company, New York, 1903). [Google Scholar]
- 3.Luria S. E., Delbrück M., Mutations of bacteria from virus sensitivity to virus resistance. Genetics 28, 491 (1943). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Vogel F., Motulsky A., Human Genetics: Problems and Approaches (Springer-Verlag, Berlin, 1997). [Google Scholar]
- 5.Roach J. C., et al. , Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Kong A., et al. , Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Rahbari R., et al. , Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Hwang D. G., Green P., Bayesian Markov chain Monte Carlo sequence analysis reveals varying neutral substitution patterns in mammalian evolution. Proc. Natl. Acad. Sci. U.S.A. 101, 13994–14001 (2004). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9.Carlson J., et al. , Extremely rare variants reveal patterns of germline mutation rate heterogeneity in humans. Nat. Commun. 9, 3753 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Haldane J. B. S., The rate of mutation of human genes. Hereditas 35, 267–273 (1949). [Google Scholar]
- 11.Kondrashov A. S., Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum. Mutat. 21, 12–27 (2003). [DOI] [PubMed] [Google Scholar]
- 12.Schmitt M. W., et al. , Detection of ultra-rare mutations by Next-Generation Sequencing. Proc. Natl. Acad. Sci. U.S.A. 109, 14508–14513 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Kennedy S. R., et al. , Detecting ultralow-frequency mutations by Duplex Sequencing. Nat. Protoc. 9, 2586 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Schmitt M. W., et al. , Sequencing small genomic targets with high efficiency and extreme accuracy. Nat. Methods 12, 423–425 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Nachmanson D., et al. , Targeted genome fragmentation with CRISPR/Cas9 enables fast and efficient enrichment of small genomic regions and ultra-accurate sequencing with low DNA input (CRISPR-DS). Genome Res. 28, 1589–1599 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Short N. J., et al. , Ultra-accurate Duplex Sequencing for the assessment of pretreatment ABL1 kinase domain mutations in Ph+ ALL. Blood Cancer J. 10, 61 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Salazar R., et al. , Discovery of an unusually high number of de novo mutations in sperm of older men using duplex sequencing. Genome Res. 32, 499–511 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Arbeithuber B., et al. , Advanced age increases frequencies of de novo mitochondrial mutations in macaque oocytes and somatic tissues. Proc. Natl. Acad. Sci. U.S.A. 119, e2118740119 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Jee J., et al. , Rates and mechanisms of bacterial mutagenesis from maximum-depth sequencing. Nature 534, 693 (2016). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20. Cohen J. D., et al., Detection of low-frequency DNA variants by targeted sequencing of the Watson and Crick strands. Nat. Biotechnol. 39, 1220–1227 (2021).
- 21. Mattox A. K., et al., The origin of highly elevated cell-free DNA in healthy individuals and patients with pancreatic, colorectal, lung, or ovarian cancer. Cancer Discov. 13, 2166–2179 (2023).
- 22. Wang Y., et al., Detection of rare mutations, copy number alterations, and methylation in the same template DNA molecules. Proc. Natl. Acad. Sci. U.S.A. 120, e2220704120 (2023).
- 23. Pham P., et al., AID-RNA polymerase II transcription-dependent deamination of IgV DNA. Nucleic Acids Res. 47, 10815–10829 (2019).
- 24. Li S., MacAlpine D. M., Counter C. M., Capturing the primordial Kras mutation initiating urethane carcinogenesis. Nat. Commun. 11, 1800 (2020).
- 25. Meissner M. E., et al., Development of a user-friendly pipeline for mutational analyses of HIV using ultra-accurate maximum-depth sequencing. Viruses 13, 1338 (2021).
- 26. Tomkova M., et al., Human DNA polymerase ε is a source of C>T mutations at CpG dinucleotides. Nat. Genet. 56, 2506–2516 (2024).
- 27. Melamed D., et al., De novo mutation rates at the single-mutation resolution in a human HBB gene-region associated with adaptation and genetic disease. Genome Res. 32, 488–498 (2022).
- 28. Allison A. C., Protection afforded by sickle-cell trait against subtertian malarial infection. Br. Med. J. 1, 290–294 (1954).
- 29. Flint J., Harding R. M., Boyce A. J., Clegg J. B., The population genetics of the haemoglobinopathies. Baillière’s Clin. Haematol. 11, 1–51 (1998).
- 30. Feng Z., Smith D., McKenzie F., Levin S., Coupling ecology and evolution: Malaria and the S-gene across time scales. Math. Biosci. 189, 1–19 (2004).
- 31. Kwiatkowski D. P., How malaria has affected the human genome and what human genetics can teach us about malaria. Am. J. Hum. Genet. 77, 171–192 (2005).
- 32. Piel F. B., et al., Global distribution of the sickle cell gene and geographical confirmation of the malaria hypothesis. Nat. Commun. 1, 104 (2010).
- 33. Steinberg M., Adams J. I., Hemoglobin A2: Origin, evolution, and aftermath. Blood 78, 2165–2177 (1991).
- 34. Harris K., Evidence for recent, population-specific evolution of the human mutation rate. Proc. Natl. Acad. Sci. U.S.A. 112, 3439–3444 (2015).
- 35. Harris K., Pritchard J. K., Rapid evolution of the human mutation spectrum. eLife 6, e24284 (2017).
- 36. Lenski R. E., Mittler J. E., The directed mutation controversy and neo-Darwinism. Science 259, 188–194 (1993).
- 37. Futuyma D. J., Evolution (Sinauer Associates, Sunderland, MA, ed. 3, 1998).
- 38. Weckerle A., et al., Characterization of circulating APOL1 protein complexes in African Americans. J. Lipid Res. 57, 120–130 (2016).
- 39. Shukha K., et al., Most ApoL1 is secreted by the liver. J. Am. Soc. Nephrol. 28, 1079–1083 (2017).
- 40. Steverding D., The history of African trypanosomiasis. Parasit. Vectors 1, 1–8 (2008).
- 41. Uzureau P., et al., Mechanism of Trypanosoma brucei gambiense resistance to human serum. Nature 501, 430 (2013).
- 42. Pays E., Vanhollebeke B., Uzureau P., Lecordier L., Pérez-Morga D., The molecular arms race between African trypanosomes and humans. Nat. Rev. Microbiol. 12, 575 (2014).
- 43. Cooper A., et al., APOL1 renal risk variants have contrasting resistance and susceptibility associations with African trypanosomiasis. eLife 6, e25461 (2017).
- 44. Kruzel-Davila E., Skorecki K., The double-edged sword of evolution. eLife 6, e29056 (2017).
- 45. Gbadegesin R., et al., APOL1 genotyping is incomplete without testing for the protective M1 modifier p. N264K variant. Glomerular Dis. 4, 43–48 (2024).
- 46. Tzur S., et al., Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Hum. Genet. 128, 345–350 (2010).
- 47. Genovese G., et al., Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science 329, 841–845 (2010).
- 48. Kopp J. B., et al., APOL1 genetic variants in focal segmental glomerulosclerosis and HIV-associated nephropathy. J. Am. Soc. Nephrol. 22, 2129–2137 (2011).
- 49. Thomson R., et al., Evolution of the primate trypanolytic factor APOL1. Proc. Natl. Acad. Sci. U.S.A. 111, E2130–E2139 (2014).
- 50. Limou S., Nelson G. W., Kopp J. B., Winkler C. A., APOL1 kidney risk alleles: Population genetics and disease associations. Adv. Chronic Kidney Dis. 21, 426–433 (2014).
- 51. Wang H., et al., Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: A systematic analysis for the global burden of disease study 2015. Lancet 388, 1459–1544 (2016).
- 52. Tabachnikov O., Skorecki K., Kruzel-Davila E., APOL1 nephropathy—a population genetics success story. Curr. Opin. Nephrol. Hypertens. 33, 447–455 (2024).
- 53. Shlush L. I., et al., Admixture mapping of end stage kidney disease genetic susceptibility using estimated mutual information ancestry informative markers. BMC Med. Genom. 3, 47 (2010).
- 54. Ulasi I. I., et al., High population frequencies of APOL1 risk variants are associated with increased prevalence of non-diabetic chronic kidney disease in the Igbo people from south-eastern Nigeria. Nephron Clin. Pract. 123, 123–128 (2013).
- 55. Arbeithuber B., Makova K. D., Tiemann-Boege I., Artifactual mutations resulting from DNA lesions limit detection levels in ultrasensitive sequencing applications. DNA Res. 23, 547–559 (2016).
- 56. Ségurel L., Wyman M. J., Przeworski M., Determinants of mutation rate variation in the human germline. Annu. Rev. Genom. Hum. Genet. 15, 47–70 (2014).
- 57. Campbell C. D., Eichler E. E., Properties and rates of germline mutations in humans. Trends Genet. 29, 575–584 (2013).
- 58. Francioli L. C., et al., Genome-wide patterns and properties of de novo mutations in humans. Nat. Genet. 47, 822 (2015).
- 59. Goldmann J. M., et al., Parent-of-origin-specific signatures of de novo mutations. Nat. Genet. 48, 935 (2016).
- 60. Lynch M., Rate, molecular spectrum, and consequences of human mutation. Proc. Natl. Acad. Sci. U.S.A. 107, 961–968 (2010).
- 61. Turner T. N., et al., Genomic patterns of de novo mutation in simplex autism. Cell 171, 710–722 (2017).
- 62. Hodgkinson A., Eyre-Walker A., Variation in the mutation rate across mammalian genomes. Nat. Rev. Genet. 12, 756–766 (2011).
- 63. Shendure J., Akey J. M., The origins, determinants, and consequences of human mutations. Science 349, 1478–1483 (2015).
- 64. Nachman M. W., Crowell S. L., Estimate of the mutation rate per nucleotide in humans. Genetics 156, 297–304 (2000).
- 65. Livnat A., Interaction-based evolution: How natural selection and nonrandom mutation work together. Biol. Direct 8, 24 (2013).
- 66. Feldman M. W., Liberman U., An evolutionary reduction principle for genetic modifiers. Proc. Natl. Acad. Sci. U.S.A. 83, 4824–4827 (1986).
- 67. Altenberg L., Feldman M. W., Selection, generalized transmission and the evolution of modifier genes. I. The reduction principle. Genetics 117, 559–572 (1987).
- 68. Martincorena I., Luscombe N. M., Non-random mutation: The evolution of targeted hypermutation and hypomutation. BioEssays 35, 123–130 (2013).
- 69. Altenberg L., Liberman U., Feldman M. W., Unified reduction principle for the evolution of mutation, migration, and recombination. Proc. Natl. Acad. Sci. U.S.A. 114, E2392–E2400 (2017).
- 70. Walsh B., Lynch M., Evolution and Selection of Quantitative Traits (Oxford University Press, Oxford, UK, 2018).
- 71. Livnat A., Melamed D., Evolutionary honing in and mutational replacement: How long-term directed mutational responses to specific environmental pressures are possible. Theory Biosci. 142, 87–105 (2023).
- 72. Bolotin E., Melamed D., Livnat A., Genes that are used together are more likely to be fused together in evolution by mutational mechanisms: A bioinformatic test of the used-fused hypothesis. Evol. Biol. 50, 30–55 (2023).
- 73. Jackson D. A., Hassan A. B., Errington R. J., Cook P. R., Visualization of focal sites of transcription within human nuclei. EMBO J. 12, 1059 (1993).
- 74. Dixon J. R., et al., Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376–380 (2012).
- 75. Soler-Oliva M. E., Guerrero-Martínez J. A., Bachetti V., Reyes J. C., Analysis of the relationship between coexpression domains and chromatin 3D organization. PLoS Comput. Biol. 13, e1005708 (2017).
- 76. Livnat A., Simplification, innateness, and the absorption of meaning from context: How novelty arises from gradual network evolution. Evol. Biol. 44, 145–189 (2017).
- 77. Livnat A., Papadimitriou C., Evolution and learning: Used together, fused together. A response to Watson and Szathmáry. Trends Ecol. Evol. 31, 894–896 (2016).
- 78. Kleene K. C., Sexual selection, genetic conflict, selfish genes, and the atypical patterns of gene expression in spermatogenic cells. Dev. Biol. 277, 16–26 (2005).
- 79. Melé M., et al., The human transcriptome across tissues and individuals. Science 348, 660–665 (2015).
- 80. Xia B., et al., Widespread transcriptional scanning in the testis modulates gene evolution rates. Cell 180, 248–262 (2020).
- 81. Miller D., Brinkworth M., Iles D., The testis as a conduit for genomic plasticity: An advanced interdisciplinary workshop. Biochem. Soc. Trans. 35, 605–608 (2007).
- 82. Tracy L., Zhang Z., Transposon persistence and control in germ cells. Curr. Opin. Genet. Dev. 93, 102370 (2025).
- 83. Britten R. J., Davidson E. H., Gene regulation for higher cells: A theory: New facts regarding the organization of the genome provide clues to the nature of gene regulation. Science 165, 349–357 (1969).
- 84. Lynch V. J., Leclerc R. D., May G., Wagner G. P., Transposon-mediated rewiring of gene regulatory networks contributed to the evolution of pregnancy in mammals. Nat. Genet. 43, 1154–1159 (2011).
- 85. Zheng Y., Lorenzo C., Beal P. A., DNA editing in DNA/RNA hybrids by adenosine deaminases that act on RNA. Nucleic Acids Res. 45, 3369–3377 (2017).
- 88.Su Y., et al. , Human DNA polymerase η has reverse transcriptase activity in cellular environments. J. Biol. Chem. 294, 6073–6081 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Jalan M., et al. , RNA transcripts serve as a template for double-strand break repair in human cells. Nat. Commun. 16, 1–16 (2025). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Mattick J., Amaral P., RNA, the Epicenter of Genetic Information (Taylor & Francis, 2023). [PubMed] [Google Scholar]
- 91.Grauso M., Reenan R., Culetto E., Sattelle D., Novel putative nicotinic acetylcholine receptor subunit genes, Dα5, Dα6 and Dα7, in Drosophila melanogaster identify a new and highly conserved target of adenosine deaminase acting on RNA-mediated A-to-I pre-mRNA editing. Genetics 160, 1519–1533 (2002). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 92.Ohlson J., Pedersen J. S., Haussler D., Öhman M., Editing modifies the GABAA receptor subunit α3. RNA 13, 698–703 (2007). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Tian N., Wu X., Zhang Y., Jin Y., A-to-I editing sites are a genomically encoded G: Implications for the evolutionary significance and identification of novel editing sites. RNA 14, 211–216 (2008). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Danecek P., et al. , High levels of RNA-editing site conservation amongst 15 laboratory mouse strains. Genome Biol. 13, 1–12 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Xu G., Zhang J., Human coding RNA editing is generally nonadaptive. Proc. Natl. Acad. Sci. U.S.A. 111, 3769–3774 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Chen J. Y., et al. , RNA editome in rhesus macaque shaped by purifying selection. PLoS Genet. 10, e1004274 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 97.Popitsch N., et al. , A-to-I RNA editing uncovers hidden signals of adaptive genome evolution in animals. Genome Biol. Evol. 12, 345–357 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 98.Park C., Qian W., Zhang J., Genomic evidence for elevated mutation rates in highly expressed genes. EMBO Rep. 13, 1123–1129 (2012). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 99.Livnat A., Love A. C., Mutation and evolution: Conceptual possibilities. BioEssays 46, 2300025 (2024). [DOI] [PubMed] [Google Scholar]
- 100.Pan S., et al. , Transcription-associated mutation promotes RNA complexity in highly expressed genes—a major new source of selectable variation. Mol. Biol. Evol. 35, 1104–1119 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 101.Wang I. X., et al. , ADAR regulates RNA editing, transcript stability, and gene expression. Cell Rep. 5, 849–860 (2013). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 102.Sharpnack M. F., et al. , Global transcriptome analysis of RNA abundance regulation by ADAR in lung adenocarcinoma. EBioMedicine 27, 167–175 (2018). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 103.Liu Y., et al. , Research progress of N1-methyladenosine RNA modification in cancer. Cell Commun. Signal 22, 79 (2024). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 104.Zhang J., Kobert K., Flouri T., Stamatakis A., PEAR: A fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 105.Martin M., Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet 17, 10–12 (2011). [Google Scholar]
- 106.Bolger A. M., Lohse M., Usadel B., Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
- 107.Li H., Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [Preprint] (2013). 10.48550/arXiv.1303.3997 (Accessed 1 February 2023). [DOI]
- 108.Melamed D., et al. , De novo mutation rates at the single-mutation resolution in the human genome. dbGaP. https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002391.v2.p1. Deposited 3 December 2024.
- 109.Bolotin E., MEMDS_analysis_pipeline_v1.2. GitHub. https://github.com/livnat-lab/MEMDS_analysis_pipeline_v1.2. Deposited 6 March 2024.
Supplementary Materials
Appendix 01 (PDF)
Data Availability Statement
All raw sequencing data generated in this study are available in the NCBI database of Genotypes and Phenotypes (https://www.ncbi.nlm.nih.gov/gap/) under accession number phs002391.v2.p1 (108). The MEMDS pipeline is available at GitHub: https://github.com/livnat-lab/MEMDS_analysis_pipeline_v1.2 (109).


