
This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2025 Mar 14:2023.11.27.23299062. Originally published 2023 Nov 28. [Version 2] doi: 10.1101/2023.11.27.23299062

Proteome-wide model for human disease genetics

Rose Orenbuch, Courtney A Shearer, Aaron W Kollasch, Hansen D Spinner, Thomas A Hopf, Lood van Niekerk, Dinko Franceschi, Mafalda Dias, Jonathan Frazer, Debora S Marks
PMCID: PMC10705666  PMID: 38076790

Abstract

Identifying variants driving disease accelerates both genetic diagnosis and therapeutic development, but missense variants still present a bottleneck as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are sufficiently accurate to be of clinical value for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome 1–6. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data 7 and achieves state-of-the-art performance on a suite of proteome-wide prediction tasks, without overestimating the prevalence of deleterious variants in the population. popEVE identifies 442 genes in a developmental disorder cohort 8, including evidence of 123 novel candidates, many without the need for cohort-wide enrichment. Candidate genes are functionally similar to known developmental disorder genes, and case variants tend to fall in functionally important regions of these genes. Finally, we show that these findings can be reproduced from analysis of the patient exomes alone, demonstrating that popEVE provides a new avenue for genetic analysis in situations where traditional methods fail, including genetic diagnosis of rare-as-one diseases, even in the absence of parent sequencing.

Full Text

The Full Text of this preprint is available as a PDF (992.7 KB). The Web version will be available soon.


Articles from medRxiv are provided here courtesy of Cold Spring Harbor Laboratory Preprints
