Cell Genomics. 2023 Jul 12;3(7):100358. doi: 10.1016/j.xgen.2023.100358

Sampling a wide swathe of primate genetic diversity

Evan E. Eichler1,2
PMCID: PMC10363911  PMID: 37492108

Abstract

Two studies published in Science report the deepest survey of primate genetic diversity using short-read sequencing to sample ∼47% of extant species. Kuderna et al.1 investigate genetic diversity, mutation rates, and our primate phylogeny, while Gao et al.2 use the data to better classify disease-causing mutations.



Main text

Following the publication of the draft human genomes,3,4 calls were made for the reconstruction of the evolutionary history of each of the ∼3 billion base pairs of our genetic code. This was motivated by the desire to better identify the genetic differences that make us uniquely human and to improve our understanding of disease-causing mutations. Two recent papers1,2 make significant strides toward meeting this grand challenge.

To take advantage of the evolutionary diversity across the primate phylogeny (Figure 1), the investigators generated Illumina whole-genome sequence data from 703 individuals representing 211 distinct species. These data were combined with data from previous primate population studies5 to yield 809 genomes from 233 primate species, approximately half of the 521 estimated extant species.

Figure 1. Primate sequence diversity

Phylogenetic tree of 233 sequenced primate genomes (adapted from data generated by Kuderna et al.1)

Kuderna et al.1 explored primate evolution, including genome-wide patterns of genetic diversity within and between primate groups. By mapping the data to 36 higher-quality reference genomes and using 27 fossil calibration points, they constructed a comprehensive primate phylogenetic tree (Figure 1). They estimated that prosimians and simians diverged 58–63 million years ago (mya), and their estimates push the divergence of chimpanzees and humans back to between 6.9 and 9 mya.
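
Kuderna et al. dated their tree using 27 fossil calibrations and far more elaborate models than can be shown here. The basic idea of fossil calibration can still be sketched with a strict molecular clock, which is a deliberate simplification. A node whose age is fixed by a fossil sets the substitution rate, and that rate then converts other genetic distances into divergence times. The distances and the calibration age below are hypothetical placeholders and do not come from either paper.

```python
def calibrate_rate(distance: float, calibration_age_my: float) -> float:
    """Substitution rate (per site per million years) from one fossil-dated node.

    `distance` is the pairwise divergence in substitutions per site. Under a
    strict clock, both lineages accumulate changes after the split, which is
    why the factor of 2 appears.
    """
    return distance / (2.0 * calibration_age_my)


def divergence_time(distance: float, rate_per_my: float) -> float:
    """Age of a split (million years) given a pairwise distance and a clock rate."""
    return distance / (2.0 * rate_per_my)


# Hypothetical inputs, chosen only so that the arithmetic is easy to follow.
calib_distance = 0.060   # subs/site between two lineages with a fossil-dated split
calib_age_my = 30.0      # assumed fossil age of that split, in million years
rate = calibrate_rate(calib_distance, calib_age_my)   # 0.001 subs/site/My

query_distance = 0.016   # subs/site between two other lineages
print(f"rate = {rate:.4f} subs/site/My")                               # rate = 0.0010 subs/site/My
print(f"estimated split = {divergence_time(query_distance, rate):.1f} My ago")  # estimated split = 8.0 My ago
```

Real analyses let the rate vary across branches and treat the calibrations as probabilistic constraints. For that reason they report intervals, such as 6.9–9 mya, rather than single point estimates.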

This dataset suggests that mutation rates vary ∼6-fold among primate lineages (0.25–1.62 × 10⁻⁸ substitutions per site per generation). Lemurs show the lowest rates and great apes the highest. Humans exhibit low levels of genetic diversity, only slightly greater than those of several endangered primate species. The positive correlation between mutation rate and generation time is mostly attributed to lower mutation rates in species with larger effective population sizes. Surprisingly, 63% of the putatively human-specific variants are observed in at least one other primate species. These are variants at high frequency in humans that are absent in Neanderthals and Denisovans. This finding points to recurrent mutation and cautions against defining lineage-specific events from a limited number of species, although only a small number of archaic genomes are available for comparison. Using this broader survey of primate genomes and requiring an allele frequency of 99.9% in humans, they highlight 124 Homo sapiens missense mutations for more in-depth functional characterization.
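
The logic behind the 63% recurrence figure and the high-frequency human candidates can be sketched as simple filters over variant records. This is a minimal sketch and not the authors' pipeline. The `Variant` record, its fields, and the toy records are hypothetical, and the species names serve only as labels. The sketch also assumes that variants seen in other primates are excluded from the lineage-specific set, which the paragraph implies but does not state for the 124 candidates.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    """A hypothetical human missense variant record (fields are illustrative)."""
    variant_id: str
    human_af: float                 # allele frequency in present-day humans
    in_archaic: bool                # seen in any Neanderthal or Denisovan genome
    other_primates: set[str] = field(default_factory=set)  # other species carrying it


def high_frequency_human_derived(variants, min_af=0.999):
    """Variants at or above `min_af` in humans and absent from archaic genomes."""
    return [v for v in variants if v.human_af >= min_af and not v.in_archaic]


def recurrent_fraction(candidates):
    """Share of candidates that also occur in at least one other primate species."""
    if not candidates:
        return 0.0
    return sum(1 for v in candidates if v.other_primates) / len(candidates)


def lineage_specific(candidates):
    """Candidates not explained by recurrence elsewhere in the primate tree."""
    return [v for v in candidates if not v.other_primates]


# Toy records; none of these come from Kuderna et al.
variants = [
    Variant("v1", 0.9995, False, {"Macaca mulatta"}),
    Variant("v2", 1.0000, False),
    Variant("v3", 0.9990, True),
    Variant("v4", 0.9999, False, {"Pan troglodytes", "Gorilla gorilla"}),
    Variant("v5", 0.9500, False),
]
cands = high_frequency_human_derived(variants)
print([v.variant_id for v in cands])                    # ['v1', 'v2', 'v4']
print(round(recurrent_fraction(cands), 2))              # 0.67
print([v.variant_id for v in lineage_specific(cands)])  # ['v2']
```

The final filter is the one that depends on broad taxonomic sampling. With only a handful of species, v1 and v4 would look human-specific.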

Gao and colleagues2 focused specifically on protein-coding variation, using primate variation data to better distinguish benign from disease-causing protein-altering mutations. Because primates are closely related, most genes and gene models can be easily tracked across species. Each branch in the primate phylogeny accumulates its own potentially benign missense mutations, in effect an independent run of natural selection. Surveying additional branches therefore increases the power to detect naturally occurring protein-coding changes. In this way, the authors created a primate population variant database of 4.3 million common missense variants, 50 times larger than the clinical variant database (ClinVar) generated from human data.
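
Each newly surveyed species adds its own set of common missense variants, so the catalog grows with the number of branches sampled. The sketch below shows only the merging step and assumes that variants have already been mapped to human protein coordinates, which is the hard part in practice. The gene names, positions, species labels, and the `looks_tolerated` rule are hypothetical simplifications, not Gao et al.'s pipeline.

```python
from collections import defaultdict

# Hypothetical per-species sets of common missense variants, already expressed
# in human protein coordinates as (gene, residue_position, alternate_amino_acid).
species_variants = {
    "species_A": {("GENE1", 12, "V"), ("GENE1", 40, "S"), ("GENE2", 7, "K")},
    "species_B": {("GENE1", 12, "V"), ("GENE3", 101, "A")},
    "species_C": {("GENE2", 7, "K"), ("GENE1", 12, "V")},
}


def build_variant_database(per_species):
    """Merge species-level variant sets into one table: variant -> carrier species."""
    db = defaultdict(set)
    for species, variants in per_species.items():
        for key in variants:
            db[key].add(species)
    return db


def looks_tolerated(db, gene, position, alt):
    """Toy rule: call a human missense change likely benign if any primate carries it commonly."""
    return bool(db.get((gene, position, alt)))  # .get does not create new entries


db = build_variant_database(species_variants)
print(len(db))                               # 4
print(sorted(db[("GENE1", 12, "V")]))        # ['species_A', 'species_B', 'species_C']
print(looks_tolerated(db, "GENE2", 7, "K"))  # True
print(looks_tolerated(db, "GENE2", 7, "R"))  # False
```

Keying on the exact amino-acid substitution, and not only on the position, keeps a tolerated change at a residue from vouching for a different change at the same site.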

Based on ClinVar annotations, an estimated 98.7% of these common primate protein-altering variants are benign. Building on this, Gao et al. train a semi-supervised 3D convolutional neural network, PrimateAI-3D. The model incorporates AlphaFold2-predicted protein structures and the distribution of common missense variants from the primate multiple sequence alignment to better distinguish benign from pathogenic variants. PrimateAI-3D outperforms 15 other machine-learning classifiers. The authors estimate that it increases the yield of pathogenic variant discovery 1.36- to 2.0-fold, depending on the thresholds applied. Such gains matter for disorders like neurodevelopmental delay and autism. In these disorders, most de novo missense mutations are still classified as variants of unknown significance, even though they likely contribute to disease in a greater proportion of patients than loss-of-function mutations.6
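
The 98.7% figure comes from comparing primate variants against ClinVar labels. The exact estimator Gao et al. used is not described here. The sketch below shows the simplest version and assumes the estimate reduces to the benign share among overlapping variants that have a definitive benign or pathogenic classification. The label strings follow ClinVar's clinical-significance vocabulary, and the counts are invented for illustration.

```python
from collections import Counter

BENIGN = {"Benign", "Likely benign"}
PATHOGENIC = {"Pathogenic", "Likely pathogenic"}


def benign_fraction(labels):
    """Fraction benign among variants with a definitive ClinVar classification.

    Variants of uncertain significance (and any other label) are ignored,
    because they say nothing about which side of the benign/pathogenic line
    a variant falls on.
    """
    counts = Counter(labels)
    benign = sum(counts[label] for label in BENIGN)
    pathogenic = sum(counts[label] for label in PATHOGENIC)
    classified = benign + pathogenic
    if classified == 0:
        raise ValueError("no benign or pathogenic labels to compare")
    return benign / classified


# Toy labels for primate variants that also appear in ClinVar (not real data).
labels = (["Benign"] * 70 + ["Likely benign"] * 8
          + ["Pathogenic"] * 1 + ["Likely pathogenic"] * 1
          + ["Uncertain significance"] * 20)
print(f"{benign_fraction(labels):.3f}")  # 0.975
```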

While the two studies make important contributions, the authors acknowledge several limitations and future directions. The analyses were based on short reads, and gene characterization was limited to genes with an unambiguous 1:1 mapping between human and nonhuman primates. There has been considerable duplication during the evolution of the primate lineage, including of biomedically relevant genes and of regions implicated in unique features of human brain development.7,8 Although it is unclear how many human protein-coding genes were excluded, the authors indicate that ∼20% of the genome (0.6 Gbp) could not be considered in this analysis. Relatedly, the use of incomplete reference genomes for both human (GRCh38) and nonhuman primates may have introduced errors from incorrect sequence comparisons and mapping. Given the modest number of primate species, generating a complete telomere-to-telomere genome from long-read sequencing data for each species will be a critical step toward understanding the evolution of all genes.9,10
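
The 1:1 restriction can be pictured as a filter over ortholog copy-number calls. Any gene that is duplicated or missing in some lineage drops out, and this is exactly how duplicated, biomedically relevant genes end up excluded. The gene identifiers, species labels, and copy numbers below are hypothetical, and the sketch does not reproduce how either study actually defined orthology. The last line checks the ∼20% figure against a ∼3 Gbp genome.

```python
# Hypothetical ortholog calls: human gene -> {species: number of copies found}.
ortholog_copies = {
    "GENE_A": {"chimpanzee": 1, "macaque": 1, "mouse_lemur": 1},
    "GENE_B": {"chimpanzee": 3, "macaque": 1, "mouse_lemur": 1},  # expanded in one lineage
    "GENE_C": {"chimpanzee": 1, "macaque": 0, "mouse_lemur": 1},  # lost or unassembled
}


def one_to_one_genes(copies_by_gene, human_copies=None):
    """Keep genes present as exactly one copy in human and in every other species."""
    human_copies = human_copies or {}
    kept = []
    for gene, per_species in copies_by_gene.items():
        if human_copies.get(gene, 1) != 1:
            continue  # duplicated in the human reference itself
        if all(n == 1 for n in per_species.values()):
            kept.append(gene)
    return kept


print(one_to_one_genes(ortholog_copies))                 # ['GENE_A']
print(one_to_one_genes(ortholog_copies, {"GENE_A": 2}))  # []

# Share of a ~3 Gbp genome represented by the 0.6 Gbp left out of the analysis.
print(f"{0.6 / 3.0:.0%}")                                # 20%
```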

In addition, most species (133/233) were represented by a single individual of uncertain disease status. The average of 3.5 individuals per species is skewed by data from earlier studies, such as those of the deeply surveyed ape lineages.5 Sequencing multiple individuals per species should clarify which species-specific variants are common versus rare and improve the predictive value of the algorithm. Finally, many more functional tests of PrimateAI-3D's predictions are needed, including comparisons with deep mutational scanning data and functional assays in human tissues. Differences in selective pressure, even among closely related species, may also affect which mutations are tolerated.
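
The gap between the average and the typical species can be checked using only the counts given in the text: 809 genomes, 233 species, and 133 species sampled once. No other data are assumed, apart from treating each genome as one individual.

```python
import math

n_species = 233
n_genomes = 809
n_single = 133  # species represented by exactly one individual

mean_per_species = n_genomes / n_species
print(f"mean = {mean_per_species:.2f} individuals per species")  # mean = 3.47 individuals per species

# With species sorted by sample size, the median of 233 values is the 117th.
# More than 117 species have a single individual, so the median is 1.
median_rank = math.ceil(n_species / 2)
median_is_one = n_single >= median_rank
print(median_rank, median_is_one)  # 117 True

# The remaining 100 species hold the other 676 genomes between them.
multi_species = n_species - n_single
multi_genomes = n_genomes - n_single
print(f"{multi_genomes / multi_species:.2f}")  # 6.76
```

The mean of ∼3.5 therefore describes almost no actual species. The median species has one genome, while the multiply sampled minority averages nearly seven.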

In summary, both papers make a compelling case for how more complete genetic information across primates improves our understanding of human genetic variation, including functional changes unique to our species. For disease gene discovery, sequencing additional primate genomes complements large-scale surveys of human population controls, because genetic diversity within our own species is relatively low. These analyses are particularly timely given that 60% of primates are endangered. It is ironic that the genetic information carried by nonhuman primates, which face extinction due to human interference, may improve our understanding of our own species and the health of our children.

Acknowledgments

This work was supported, in part, by US National Institutes of Health (NIH) grant R01HG002385 to E.E.E. E.E.E. is an investigator of the Howard Hughes Medical Institute.

Declaration of interests

E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc.

References

1. Kuderna L.F.K., Gao H., Janiak M.C., Kuhlwilm M., Orkin J.D., Bataillon T., Manu S., Valenzuela A., Bergman J., Rousselle M., et al. A global catalog of whole-genome diversity from 233 primate species. Science. 2023;380:906–913. doi: 10.1126/science.abn7829.
2. Gao H., Hamp T., Ede J., Schraiber J.G., McRae J., Singer-Berk M., Yang Y., Dietrich A.S.D., Fiziev P.P., Kuderna L.F.K., et al. The landscape of tolerated genetic variation in humans and primates. Science. 2023;380:eabn8153. doi: 10.1126/science.abn8153.
3. IHGSC. Initial sequencing and analysis of the human genome. Nature. 2001;409:860–921. doi: 10.1038/35057062.
4. Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., Smith H.O., Yandell M., Evans C.A., Holt R.A., et al. The sequence of the human genome. Science. 2001;291:1304–1351. doi: 10.1126/science.1058040.
5. Prado-Martinez J., Sudmant P.H., Kidd J.M., Li H., Kelley J.L., Lorente-Galdos B., Veeramah K.R., Woerner A.E., O'Connor T.D., Santpere G., et al. Great ape genetic diversity and population history. Nature. 2013;499:471–475. doi: 10.1038/nature12228.
6. Iossifov I., O'Roak B.J., Sanders S.J., Ronemus M., Krumm N., Levy D., Stessman H.A., Witherspoon K.T., Vives L., Patterson K.E., et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature. 2014;515:216–221. doi: 10.1038/nature13908.
7. Sudmant P.H., Huddleston J., Catacchio C.R., Malig M., Hillier L.W., Baker C., Mohajeri K., Kondova I., Bontrop R.E., Persengiev S., et al. Evolution and diversity of copy number variation in the great ape lineage. Genome Res. 2013;23:1373–1382. doi: 10.1101/gr.158543.113.
8. Dennis M.Y., Harshman L., Nelson B.J., Penn O., Cantsilieris S., Huddleston J., Antonacci F., Penewit K., Denman L., Raja A., et al. The evolution and population diversity of human-specific segmental duplications. Nat. Ecol. Evol. 2017;1:1–10. doi: 10.1038/s41559-016-0069.
9. Kronenberg Z.N., Fiddes I.T., Gordon D., Murali S., Cantsilieris S., Meyerson O.S., Underwood J.G., Nelson B.J., Chaisson M.J.P., Dougherty M.L., et al. High-resolution comparative analysis of great ape genomes. Science. 2018;360:eaar6343. doi: 10.1126/science.aar6343.
10. Nurk S., Koren S., Rhie A., Rautiainen M., Bzikadze A.V., Mikheenko A., Vollger M.R., Altemose N., Uralsky L., Gershman A., et al. The complete sequence of a human genome. Science. 2022;376:44–53. doi: 10.1126/science.abj6987.
