Abstract
Identifying variants driving disease accelerates both genetic diagnosis and therapeutic development, but missense variants still present a bottleneck as their effects are less straightforward than truncations or nonsense mutations. While computational prediction methods are sufficiently accurate to be of clinical value for variants in known disease genes, they do not generalize well to other genes as the scores are not calibrated across the proteome [1–6]. To address this, we developed a deep generative model, popEVE, that combines evolutionary information with population sequence data [7] and achieves state-of-the-art performance on a suite of proteome-wide prediction tasks, without overestimating the prevalence of deleterious variants in the population. popEVE identifies 442 genes in a developmental disorder cohort [8], including evidence of 123 novel candidates, many without the need for cohort-wide enrichment. Candidate genes are functionally similar to known developmental disorder genes and case variants tend to fall in functionally important regions of these genes. Finally, we show that these findings can be reproduced from analysis of the patient exomes alone, demonstrating that popEVE provides a new avenue for genetic analysis in situations where traditional methods fail, including genetic diagnosis of rare-as-one diseases, even in the absence of parent sequencing.
