Abstract
Evidence from human genetics supporting the therapeutic hypothesis increases the likelihood that a drug will succeed in clinical trials. Rare and common disease genetics yield a wide array of alleles with a range of effect sizes that can proxy for the effect of a drug in disease. Recent advances in large scale population collections and whole genome sequencing approaches have provided a rich resource of human genetic evidence to support drug target selection. As the range of phenotypes profiled increases and ever more alleles are discovered across worldwide populations, these approaches will increasingly influence multiple stages across the lifespan of a drug discovery programme.
Keywords: Genetics, Genomics, Drug Discovery, GWAS, Sequencing
Introduction
Over the past 75 years, the development of new therapeutic drugs for serious illness has had a major impact on improving lifespan and quality of life. Early successes were mostly characterised by chance observations and the application of emerging chemistry to successful folk medicines [1]. Over time, drug discovery has evolved into a targeted endeavour, with mostly protein targets selected on the basis of a growing understanding of disease biology, and with cellular and animal models providing most of the insights to support clinical translation. Unfortunately, this period has also seen research and development (R&D) productivity decline dramatically. In the 1970s, about 10 new drugs were approved for every billion US dollars of R&D spending. By 2000, the same investment yielded fewer than one approved drug [2]. New technologies increased the scale and cost of R&D capabilities, but low success rates in clinical development, largely driven by poor effects on disease outcomes, resulted in ever lower productivity. However, over the past decade we have seen evidence that this decline may be stabilising, and perhaps even reversing [2].
These productivity gains are increasingly being driven by improvements in understanding of the causes of rare disease, cancer and, more recently, common complex disease provided by genetics and genomics. Identifying genes underlying rare genetic disease has opened many of them to effective therapeutic development, with much higher average development success rates than seen in most other clinical areas [3]. Diagnosis of cancer at the causal genetic and genomic level has resolved diseases that appear similar at the histological scale into what is often a highly heterogeneous set of diseases from the perspective of the underlying mutations. Designing therapies with these differences in mind, and including them in patient screening, improves development success rates at least twofold [4]. In addition to these “simple” conditions, at least from a causal genetics perspective, it is equally remarkable that drug mechanisms supported by evidence from genes with subtle effects on the risk of more complex, multifactorial diseases are at least twice as likely to succeed in clinical development as drugs lacking such support. Insights into these more marginal causal genetic factors can be leveraged to identify and prioritise more effective drug discovery opportunities [5][6]. Moreover, a recent retrospective analysis found that 2 out of 3 of the 2021 US Food and Drug Administration (FDA)-approved drugs are supported by human genetics evidence [7]. Taken together, integration of genetic evidence into drug discovery can dramatically improve overall R&D productivity.
Rare Disease Genetics
Rare genetic diseases are usually attributed to protein-coding mutations affecting a single gene (monogenic). Their pathological mechanisms tend to be better understood, facilitating translation of these insights into drug development. Examples include mutations in the CFTR gene in cystic fibrosis, in the beta-globin gene in sickle cell anemia, and in the huntingtin gene (HTT) in Huntington’s disease. Advances in technology, including access to whole exome and genome sequencing, have revolutionised our understanding of the genetic basis of a broad spectrum of rare diseases and the interpretation of their clinical consequences [8].
Often, knowledge of the underlying causal gene in rare disease can be directly utilised for drug development. Duchenne muscular dystrophy (DMD) is a lethal disease caused by deletions in one or more exons of the dystrophin gene (mainly exons 45, 48 and 51) that shift the reading frame and cause premature termination, resulting in a truncated, dysfunctional protein and consequently muscle degeneration [9]. This understanding has enabled the development of targeted exon-skipping therapies. Casimersen is an antisense oligonucleotide drug designed to bind to exon 45 of the dystrophin pre-mRNA, leading to removal of that exon during splicing and thereby restoring the reading frame to give a shorter but functional dystrophin protein in treated patients [10]. Casimersen received its first approval in 2021 [11].
Rare disease genetics can also inform drug discovery in common disease. Identification of a spectrum of rare protein-coding mutations within a gene, ranging from loss of function (LoF) to gain of function (GoF), combined with phenotypic and clinical information, can inform the potential efficacy and toxicity of therapeutically modulating that protein in humans [12]. Given that the majority of drugs are inhibitors, attention has turned towards rare LoF mutations that protect against disease and GoF mutations that increase disease risk (Figure 1). A well-known example is the proprotein convertase PCSK9. LoF mutations in PCSK9 reduce serum LDL cholesterol levels and protect against coronary heart disease, while GoF mutations increase the risk of hyperlipidemia, a condition that significantly increases early-onset cardiovascular disease risk [13]. Functional experiments demonstrated that, after entering the circulation, PCSK9 binds LDL receptors, promoting their endocytosis and degradation and reducing cholesterol removal from the blood [14]. Together, this evidence led to the development of PCSK9 inhibitors, and by 2017 two (alirocumab and evolocumab) had been approved by the FDA to treat patients with high cholesterol unresponsive to statins or diet [15] [16] [17].
Figure 1. Framework used to derive the desired therapeutic target modulation in the context of a phenotype.
Loss of function mutations associated with protective phenotypes, or gain of function mutations associated with risk phenotypes, make the gene an attractive target for inhibitor drugs. Conversely, loss of function mutations associated with risk phenotypes, or gain of function mutations associated with protective phenotypes, favour drugs with an activator mechanism of action.
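The framework in Figure 1 amounts to a small lookup table. The following Python sketch is one way to write it down; the function name and the string labels are illustrative assumptions and do not come from the figure itself.

```python
# Illustrative encoding of the Figure 1 framework.
# All names and labels here are hypothetical, chosen for readability.
MODULATION = {
    ("LoF", "protective"): "inhibitor",
    ("GoF", "risk"): "inhibitor",
    ("LoF", "risk"): "activator",
    ("GoF", "protective"): "activator",
}


def desired_modulation(variant_effect: str, phenotype: str) -> str:
    """Return the drug mechanism suggested by a variant/phenotype pairing.

    variant_effect: "LoF" (loss of function) or "GoF" (gain of function)
    phenotype: "protective" or "risk"
    """
    key = (variant_effect, phenotype)
    if key not in MODULATION:
        raise ValueError(f"unsupported combination: {key}")
    return MODULATION[key]


if __name__ == "__main__":
    # PCSK9-style case from the text: LoF alleles are protective.
    print(desired_modulation("LoF", "protective"))  # inhibitor
```

The table deliberately ignores effect size and penetrance, so it captures only the direction of the desired modulation and not how strong a drug effect would need to be.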
Common Disease Genetics
Many complex diseases, including neurodegeneration, autoimmunity and type 2 diabetes, are partially driven by many common genetic variants, each having a modest individual effect on the phenotype [18]. Genome-wide association study (GWAS) approaches were designed to map this polygenic architecture by scanning thousands or millions of common variants in the genome for the subset present at significantly higher frequency in patients than in controls. Unlike rare disease-causing mutations, most GWAS-associated variants fall in the non-coding part of the genome (mainly intergenic and intronic regions), suggesting that they affect gene expression rather than protein structure. Connecting associated variants to their likely target genes is laborious, requiring integration of a range of transcriptomic, proteomic and epigenomic data across different cell types and tissues [19]. Functional validation is often conducted to pinpoint the likely causal variant and the actual causal gene. Other statistical approaches, such as Mendelian randomisation [20][21], can be used post-GWAS to causally implicate a protein or a gene in an outcome [22].
The first GWAS for any common human disease was published in 2005 for age-related macular degeneration (AMD), a condition with complex etiology characterised by gradual loss of central vision [23]. A few years earlier, genetic linkage analysis in families had implicated a region of chromosome 1 in the disease [24]. That same region was identified in the GWAS, and a common coding missense variant in the complement factor H gene (CFH) was strongly associated with increased AMD risk [23]. Over the next decade, additional common variants were identified in or near other complement genes including CFB, C2, C3, C9 and CFI, further implicating the complement pathway in AMD [25]. Complement inhibitors are either currently under development or being repurposed to treat AMD [26].
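The core case-control comparison behind a GWAS can be sketched for a single variant as a chi-square test on allele counts. The Python example below is a simplified illustration only: the counts are invented, the function name is hypothetical, and real analyses typically use regression models with covariates such as ancestry components rather than this bare 2x2 test.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value, odds_ratio). All four counts must be positive.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical allele counts for one variant (not real data).
    chi2, p, oratio = allelic_chi_square(300, 700, 200, 800)
    print(f"chi2={chi2:.2f} p={p:.2e} OR={oratio:.2f}")
```

In a genome-wide scan this test is repeated for every variant, which is why a stringent significance threshold is applied before a variant is reported as associated.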
Allelic series
Recent analysis of rich genotypic and phenotypic datasets suggests that the historical separation of monogenic and polygenic diseases may be too simplistic. In reality, most genetic diseases and complex traits have risk alleles distributed across a continuous spectrum of frequencies and effect sizes [27] [28].
An allelic series refers to a set of naturally occurring rare and common variations within the same gene with different effects on phenotype severity. This can help estimate the quantitative relationship between gene function and the clinical phenotype generating a dose-response curve approximating that of a potential drug [12] (Figure 2). For instance, loss of function mutations identified through exome/whole genome sequencing could proxy for the effect of complete inhibition of a gene product, while milder common regulatory variants downregulating the gene identified through GWAS could be equivalent to partial inhibition.
Figure 2. Schematic dose-response curve.
A range of alleles perturbing target function is used to estimate the impact on clinically relevant phenotypes. In this case, LoF alleles have a protective effect on disease risk, as for PCSK9, where LoF reduces serum LDL cholesterol levels and protects against coronary heart disease. For the more common case where LoF increases disease risk, the y-axis would be reversed.
By combining rare and common variant signals from whole exome sequencing (WES) and GWAS data in half a million UK Biobank participants, Regeneron scientists recently found that rare variant associations were often located near a GWAS signal for the same trait and that the majority remained significant after conditioning on the common signal [29], suggesting that many novel allelic series could be found. Leucine-rich repeat kinase 2 (LRRK2) is a serine/threonine-protein kinase that phosphorylates a range of proteins and affects neuronal plasticity and autophagy. Rare GoF mutations in LRRK2 with high and moderate penetrance, and common regulatory variants affecting LRRK2 gene expression, significantly increase Parkinson’s disease risk [30], suggesting that inhibiting LRRK2 kinase activity may be therapeutic. Substantial pre-clinical and clinical effort has been undertaken to reduce the toxicity linked to LRRK2 hyperactivity, yielding several inhibitor drugs currently in clinical trials [31].
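The idea of reading a dose-response relationship from an allelic series can be sketched as a simple curve fit. In the Python example below, each allele is summarised by an assumed fractional reduction in gene function and an effect on disease risk (log odds ratio); a straight line is fitted and then evaluated at the degree of inhibition a hypothetical drug might achieve. The numbers are invented for illustration, and the linear form is an assumption, since a real relationship may well be non-linear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_effect(slope, intercept, fraction_inhibited):
    """Predicted log odds ratio at a given fractional loss of gene function."""
    return slope * fraction_inhibited + intercept


if __name__ == "__main__":
    # Hypothetical allelic series (not real data):
    # x = fractional reduction in gene function, y = log odds ratio for disease.
    function_loss = [0.0, 0.2, 0.5, 1.0]
    log_odds = [0.0, -0.1, -0.25, -0.5]

    slope, intercept = fit_line(function_loss, log_odds)
    # Effect expected from a hypothetical drug achieving 80% inhibition.
    print(predict_effect(slope, intercept, 0.8))
```

With a negative slope, as here, greater loss of function is protective, which corresponds to the PCSK9-like case drawn in Figure 2.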
Future Prospects
Growing evidence for genetic support as a positive prognostic indicator for clinical drug trials, together with increased investment by pharmaceutical companies in genetics (and geneticists) [32], suggests that the value of genetic evidence will continue to grow. Increasingly, evidence will come from systematic analysis in populations such as UK Biobank [33], FinnGen [34] or 23andMe [35] and other national biobanks, with increased use of whole exome and genome sequencing integrated with electronic health records (EHRs). Other notable initiatives among many include the NIH’s All of Us programme, the Danish Genome project, the Million Veteran Program and expanded genome sequencing in the UK’s National Health Service. With larger populations and broad phenotyping, ever smaller but significant effects on relevant traits in disease will be revealed, as well as drug repurposing opportunities or potential safety liabilities from genetic analysis across traits (PheWAS) [36]. Furthermore, with increased population diversity, population-specific effects will be detected that are absent from the primarily European populations studied so far. Differences in linkage disequilibrium structure between populations can also help to resolve the causal gene [37]. It will become possible to generate gene function-clinical phenotype combinations for nearly every gene in the genome. In fact, sequencing ~5 million individuals is predicted to identify more than 500 heterozygous LoF carriers for around 15,000 genes [29]. Furthermore, accessing consanguineous populations where homozygous LoF variants are prevalent, with recall by genotype for deep carrier phenotyping, will speed up our understanding of gene function-clinical phenotype relationships [38][39]. Each of these associations is a potential therapeutic hypothesis to test.
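As a back-of-envelope reading of the carrier numbers quoted above, the expected number of heterozygous LoF carriers can be related to cohort size and the cumulative LoF allele frequency of a gene. The Python sketch below assumes Hardy-Weinberg proportions and treats all LoF alleles in a gene as a single pooled allele; both are simplifying assumptions, the function names are hypothetical, and the allele frequency used is an illustrative value rather than one taken from the cited study.

```python
def expected_het_carriers(n_individuals, lof_allele_freq):
    """Expected heterozygous carriers under Hardy-Weinberg proportions.

    The heterozygote frequency is 2 * q * (1 - q) for allele frequency q.
    """
    q = lof_allele_freq
    return n_individuals * 2.0 * q * (1.0 - q)


def min_cohort_size(target_carriers, lof_allele_freq):
    """Cohort size at which the expected carrier count reaches the target."""
    q = lof_allele_freq
    return target_carriers / (2.0 * q * (1.0 - q))


if __name__ == "__main__":
    # Illustrative cumulative LoF allele frequency of 5 in 100,000.
    q = 5e-5
    print(expected_het_carriers(5_000_000, q))
    print(min_cohort_size(500, q))
```

Under these assumptions, a gene needs a cumulative LoF allele frequency of roughly 5 in 100,000 for about 500 carriers to be expected among 5 million sequenced individuals.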
As this map of the genetic burden in disease improves, drug discovery scientists will also look to network analysis to identify suitably tractable targets connected to but not directly implicated by genetic evidence [40].
The increased number of hypotheses to test will require higher throughput approaches to target validation. Genetics and genomics approaches can assist here with increasingly sophisticated techniques including high throughput gene editing and single cell analysis to unpick the relevant biological mechanisms. Initiatives such as the Human Cell Atlas [41] and the Atlas of Variant Effects [42] are just some of the resources providing this information.
We anticipate continued advancements in machine learning technology, particularly applications of deep neural networks, to have an increasing impact on our ability to integrate vast, multi-modal genetic and genomic data to better understand how genes and the molecular pathways they encode affect disease [43]. Just as AlphaFold [44] sparked a variety of entirely new structural biology research applications, new approaches to predicting and understanding the functional impact of genetic variants will advance translational genetics. Examples include deep neural networks trained to discriminate pathogenic from benign variants, such as PrimateAI, SpliceAI, DeepSEA, and DanQ, as well as large language models such as ESM1b that learn the language of proteins to recognize the functional impact of protein changes [45].
Current genetics studies primarily examine the onset of disease. However, future studies will focus on disease progression, either through collections of specific patient populations or through the availability of longitudinal imaging datasets such as that recently announced by UK Biobank. It is very likely that clinical trials will increasingly collect genetic information, particularly if there is a genetic biomarker for patient stratification. However, the size of clinical trials is such that the power to detect an effect in a single trial is low. More likely, identification of genetic effects on drug efficacy and response will come from large-scale analysis of the EHRs in the large biobanks [46].
In addition to implicating targets, genetics can stratify the disease population. In the future we will see many more diseases treated on the basis of single biomarkers or, more likely, collections of genetic markers derived from polygenic risk analysis [47]. This vision of personalised medicine based on genetic stratification of the disease is already possible in oncology but will become more prevalent elsewhere as molecular signatures of disease are collected. Eventually these signatures may even be derived from machine learning applied to a combination of genotype with clinical phenotypes and cellular and molecular profiling of the patient and disease tissue. In a future of near-universal clinical genome sequencing applied across the health services, genetics can drive changes in clinical care, discovery research, and drug development.
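A polygenic risk score of the kind referred to above is, in its simplest additive form, a weighted sum of risk-allele counts. The Python sketch below illustrates that calculation only; the variant identifiers, weights and threshold are hypothetical, and practical scores involve additional steps such as variant selection, weight shrinkage and calibration in the target population.

```python
def polygenic_score(dosages, weights):
    """Additive polygenic score: sum over variants of weight * allele dosage.

    dosages: mapping of variant id -> number of effect alleles (0, 1 or 2)
    weights: mapping of variant id -> per-allele effect size (e.g. log odds ratio)
    Variants with a weight but no genotype are reported rather than silently skipped.
    """
    missing = [variant for variant in weights if variant not in dosages]
    if missing:
        raise KeyError(f"missing genotypes for: {missing}")
    return sum(weights[variant] * dosages[variant] for variant in weights)


def stratify(score, threshold):
    """Assign a patient to a stratum using a single illustrative cut-off."""
    return "high" if score >= threshold else "standard"


if __name__ == "__main__":
    # Hypothetical variants and effect sizes (not real data).
    weights = {"var1": 0.10, "var2": 0.05, "var3": -0.20}
    patient = {"var1": 2, "var2": 1, "var3": 0}

    score = polygenic_score(patient, weights)
    print(score, stratify(score, threshold=0.2))
```

A single cut-off is used here for simplicity; in practice strata are usually defined relative to the score distribution in a reference population.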
Acknowledgements
This work was funded by Open Targets. This research was funded in part by a Wellcome Trust grant [Grant number 206194]. For the purpose of Open Access, the authors have applied a CC-BY public copyright licence to any author-accepted manuscript version arising from this submission. We thank Helena Cornu for assistance with preparation of Figure 1.
Footnotes
Declaration of Interests
The authors declare they have no conflicts of interest. MRN is an employee of Deerfield.
References
- 1. The Practice of Medicinal Chemistry. Academic Press; 2008. A History of Drug Discovery: From first steps of chemistry to achievements in molecular pharmacology; pp. 1–62.
- 2. Ringel MS, Scannell JW, Baedeker M, Schulze U. Breaking Eroom’s Law. Nat Rev Drug Discov. 2020;19:833–834. doi: 10.1038/d41573-020-00059-3.
- 3. Blin O, Lefebvre M-N, Rascol O, Micallef J. Orphan drug clinical development. Therapie. 2020;75:141–147. doi: 10.1016/j.therap.2020.02.004.
- 4. Wong CH, Siah KW, Lo AW. Estimation of clinical trial success rates and related parameters. Biostatistics. 2019;20:273–286. doi: 10.1093/biostatistics/kxx069.
- 5. Nelson MR, Tipney H, Painter JL, Shen J, Nicoletti P, Shen Y, Floratos A, Sham PC, Li MJ, Wang J, et al. The support of human genetic evidence for approved drug indications. Nat Genet. 2015;47:856–860. doi: 10.1038/ng.3314.
- 6. King EA, Davis JW, Degner JF. Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLoS Genet. 2019;15(12):e1008489. doi: 10.1371/journal.pgen.1008489. [* King et al revisit and find support for the contribution of genetic evidence to supporting progress through clinical trials.]
- 7. Ochoa D, Karim M, Ghoussaini M, Hulcoop DG, McDonagh EM, Dunham I. Human genetics evidence supports two-thirds of the 2021 FDA-approved drugs. Nat Rev Drug Discov. 2022;21:551. doi: 10.1038/d41573-022-00120-3. [* To reinforce previous suggestions on the utility of human genetics for drug discovery, this retrospective analysis investigated the fraction of the 2021 FDA-approved drugs that are supported by human genetics. 33 out of the 50 approved drugs were found to have primary targets associated with the clinical indication or a closely related phenotype or to physically interact with a protein associated with the indication.]
- 8. 100,000 Genomes Project Pilot Investigators. Smedley D, Smith KR, Martin A, Thomas EA, McDonagh EM, Cipriani V, Ellingford JM, Arno G, Tucci A, et al. 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care - Preliminary Report. N Engl J Med. 2021;385:1868–1880. doi: 10.1056/NEJMoa2035790. [** This paper highlights how the routine use of whole genome sequencing (WGS) in clinical care is likely to transform the life of patients with rare genetic disorders through genetic diagnosis and better care. As part of the 100,000 Genomes Project and using WGS in the National Health Service (NHS) system in the United Kingdom, this pilot study sequenced ~4,700 patients from ~2,200 families covering a broad spectrum of rare diseases. Diagnostic yield was much higher in family trios, larger pedigrees and monogenic disorders and reached 40-55% for disorders such as intellectual disability, hearing disorders, and vision disorders. The findings will hopefully assist other health systems in considering routine clinical use of WGS in the care of patients with rare diseases.]
- 9. Duchenne muscular dystrophy. Nat Rev Dis Primers. 2021;7:14. doi: 10.1038/s41572-021-00255-4.
- 10. Zakeri SE, Pradeep SP, Kasina V, Laddha AP, Manautou JE, Bahal R. Casimersen for the treatment of Duchenne muscular dystrophy. Trends Pharmacol Sci. 2022;43:607–608. doi: 10.1016/j.tips.2022.04.009.
- 11. Deng J, Zhang J, Shi K, Liu Z. Drug development progress in duchenne muscular dystrophy. Front Pharmacol. 2022;13:950651. doi: 10.3389/fphar.2022.950651.
- 12. Plenge RM, Scolnick EM, Altshuler D. Validating therapeutic targets through human genetics. Nat Rev Drug Discov. 2013;12:581–594. doi: 10.1038/nrd4051. [* Seminal paper describing the concept of a dose-response curve for a prospective drug based on genetic allele effects.]
- 13. Cohen JC, Boerwinkle E, Mosley TH, Jr, Hobbs HH. Sequence variations in PCSK9, low LDL, and protection against coronary heart disease. N Engl J Med. 2006;354:1264–1272. doi: 10.1056/NEJMoa054013.
- 14. Qian Y-W, Schmidt RJ, Zhang Y, Chu S, Lin A, Wang H, Wang X, Beyer TP, Bensch WR, Li W, et al. Secreted PCSK9 downregulates low density lipoprotein receptor through receptor-mediated endocytosis. J Lipid Res. 2007;48:1488–1498. doi: 10.1194/jlr.M700071-JLR200.
- 15. Hao Q, Aertgeerts B, Guyatt G, Bekkering GE, Vandvik PO, Khan SU, Rodondi N, Jackson R, Reny J-L, Al Ansary L, et al. PCSK9 inhibitors and ezetimibe for the reduction of cardiovascular events: a clinical practice guideline with risk-stratified recommendations. BMJ. 2022;377:e069066. doi: 10.1136/bmj-2021-069066.
- 16. Sabatine MS, Giugliano RP, Pedersen TR. Evolocumab in Patients with Cardiovascular Disease. N Engl J Med. 2017;377:787–788. doi: 10.1056/NEJMc1708587.
- 17. Schwartz GG, Steg PG, Szarek M, Bhatt DL, Bittner VA, Diaz R, Edelberg JM, Goodman SG, Hanotin C, Harrington RA, et al. Alirocumab and Cardiovascular Outcomes after Acute Coronary Syndrome. N Engl J Med. 2018;379:2097–2107. doi: 10.1056/NEJMoa1801174.
- 18. Boyle EA, Li YI, Pritchard JK. An Expanded View of Complex Traits: From Polygenic to Omnigenic. Cell. 2017;169:1177–1186. doi: 10.1016/j.cell.2017.05.038. [* This paper proposes an omnigenic model of heritability in common disease such that most or even all genes contribute to disease traits.]
- 19. Lichou F, Trynka G. Functional studies of GWAS variants are gaining momentum. Nat Commun. 2020;11:6283. doi: 10.1038/s41467-020-20188-y.
- 20. Burgess S, Thompson SG. Mendelian Randomization: Methods for Causal Inference Using Genetic Variants. CRC Press; 2021.
- 21. Zuber V, Grinberg NF, Gill D, Manipur I, Slob EAW, Patel A, Wallace C, Burgess S. Combining evidence from Mendelian randomization and colocalization: Review and comparison of approaches. Am J Hum Genet. 2022;109:767–782. doi: 10.1016/j.ajhg.2022.04.001.
- 22. Ference BA, Ray KK, Nicholls SJ. Mendelian Randomization Study of ACLY and Cardiovascular Disease. Reply. N Engl J Med. 2020;383:e50. doi: 10.1056/NEJMc1908496.
- 23. Klein RJ, Zeiss C, Chew EY, Tsai J-Y, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST, et al. Complement factor H polymorphism in age-related macular degeneration. Science. 2005;308:385–389. doi: 10.1126/science.1109557.
- 24. Weeks DE, Conley YP, Tsai HJ, Mah TS, Rosenfeld PJ, Paul TO, Eller AW, Morse LS, Dailey JP, Ferrell RE, et al. Age-related maculopathy: an expanded genome-wide scan with evidence of susceptibility loci within the 1q31 and 17q25 regions. Am J Ophthalmol. 2001;132:682–692. doi: 10.1016/s0002-9394(01)01214-4.
- 25. Geerlings MJ, de Jong EK, den Hollander AI. The complement system in age-related macular degeneration: A review of rare genetic variants and implications for personalized treatment. Mol Immunol. 2017;84:65–76. doi: 10.1016/j.molimm.2016.11.016.
- 26. Patel PN, Patel PA, Land MR, Bakerkhatib-Taha I, Ahmed H, Sheth V. Targeting the Complement Cascade for Treatment of Dry Age-Related Macular Degeneration. Biomedicines. 2022;10. doi: 10.3390/biomedicines10081884.
- 27. Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA, et al. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell. 2016;167:1415–1429.e19. doi: 10.1016/j.cell.2016.10.042. [* GWAS in the UK Biobank and INTERVAL studies testing nearly 30 million genetic variants for association with blood cell traits found hundreds of low frequency (<5%) and rare (<1%) alleles in addition to the common variants.]
- 28. Raychaudhuri S. Mapping rare and common causal alleles for complex human diseases. Cell. 2011;147:57–69. doi: 10.1016/j.cell.2011.09.011.
- 29. Backman JD, Li AH, Marcketta A, Sun D, Mbatchou J, Kessler MD, Benner C, Liu D, Locke AE, Balasubramanian S, et al. Exome sequencing and analysis of 454,787 UK Biobank participants. Nature. 2021;599:628–634. doi: 10.1038/s41586-021-04103-z. [** This is one of the largest UK Biobank exome data analyses conducted to date. Performed on nearly half a million individuals, the paper provides a unique catalogue of protein-coding variation which, when combined with the large sample size and in-depth phenotypes, provides opportunities to elucidate gene function at scale. It also identifies 564 strong target-disease associations. This paper also shows that common variant signals from GWAS and rare variant signals from exome data often overlap and affect the same phenotype independently, suggesting that rare variant burden signals can be leveraged to pinpoint effector genes at GWAS loci at scale.]
- 30. Nalls MA, Blauwendraat C, Vallerga CL, Heilbron K, Bandres-Ciga S, Chang D, Tan M, Kia DA, Noyce AJ, Xue A, et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson’s disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 2019;18:1091–1102. doi: 10.1016/S1474-4422(19)30320-5.
- 31. Patel A, Patel S, Mehta M, Patel Y, Langaliya D, Bhalodiya S, Bambharoliya T. Recent Update on the Development of Leucine-Rich Repeat Kinase 2 (LRRK2) Inhibitors: A Promising Target for the Treatment of Parkinson’s Disease. Med Chem. 2022;18:757–771. doi: 10.2174/1573406418666220215122136.
- 32. Bell J. Drug companies pay for exclusive access to UK genetic data. BioPharma Dive. 2019.
- 33. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. doi: 10.1038/s41586-018-0579-z.
- 34. Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner K, Reeve MP, Laivuori H, Aavikko M, Kaunisto MA, et al. FinnGen: Unique genetic insights from combining isolated population and national health register data. medRxiv. 2022. doi: 10.1101/2022.03.03.22271360.
- 35. Heilbron K, Mozaffari SV, Vacic V, Yue P, Wang W, Shi J, Jubb AM, Pitts SJ, Wang X. Advancing drug discovery using the power of the human genome. J Pathol. 2021;254:418–429. doi: 10.1002/path.5664.
- 36. Pendergrass SA, Brown-Gentry K, Dudek SM, Torstenson ES, Ambite JL, Avery CL, Buyske S, Cai C, Fesinmeyer MD, Haiman C, et al. The use of phenome-wide association studies (PheWAS) for exploration of novel genotype-phenotype relationships and pleiotropy discovery. Genet Epidemiol. 2011;35:410–422. doi: 10.1002/gepi.20589.
- 37. Rosenberg NA, Huang L, Jewett EM, Szpiech ZA, Jankovic I, Boehnke M. Genome-wide association studies in diverse populations. Nat Rev Genet. 2010;11:356–366. doi: 10.1038/nrg2760.
- 38. Finer S, Martin HC, Khan A, Hunt KA, MacLaughlin B, Ahmed Z, Ashcroft R, Durham C, MacArthur DG, McCarthy MI, et al. Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people. Int J Epidemiol. 2020;49:20–21i. doi: 10.1093/ije/dyz174.
- 39. Narasimhan VM, Hunt KA, Mason D, Baker CL, Karczewski KJ, Barnes MR, Barnett AH, Bates C, Bellary S, Bockett NA, et al. Health and population effects of rare gene knockouts in adult humans with related parents. Science. 2016;352:474–477. doi: 10.1126/science.aac8624.
- 40. Barrio-Hernandez I, Beltrao P. Network analysis of genome-wide association studies for drug target prioritisation. Curr Opin Chem Biol. 2022;71:102206. doi: 10.1016/j.cbpa.2022.102206.
- 41. Osumi-Sutherland D, Xu C, Keays M, Levine AP, Kharchenko PV, Regev A, Lein E, Teichmann SA. Cell type ontologies of the Human Cell Atlas. Nat Cell Biol. 2021;23:1129–1135. doi: 10.1038/s41556-021-00787-7.
- 42. Esposito D, Weile J, Shendure J, Starita LM, Papenfuss AT, Roth FP, Fowler DM, Rubin AF. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 2019;20:223. doi: 10.1186/s13059-019-1845-6.
- 43. Eraslan G, Avsec Z, Gagneur J, Theis FJ. Deep learning: new computational modelling techniques for genomics. Nat Rev Genet. 2019;20:389–403. doi: 10.1038/s41576-019-0122-6.
- 44. Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Zídek A, Potapenko A, et al. Highly accurate protein structure prediction with AlphaFold. Nature. 2021;596:583–589. doi: 10.1038/s41586-021-03819-2.
- 45. Brandes N, Goldman G, Wang CH, Ye CJ, Ntranos V. Genome-wide prediction of disease variants with a deep protein language model. bioRxiv. 2022. doi: 10.1101/2022.08.25.505311.
- 46. Wu Y, Byrne EM, Zheng Z, Kemper KE, Yengo L, Mallett AJ, Yang J, Visscher PM, Wray NR. Genome-wide association study of medication-use and associated disease in the UK Biobank. Nat Commun. 2019;10:1891. doi: 10.1038/s41467-019-09572-5.
- 47. Chatterjee N, Shi J, García-Closas M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat Rev Genet. 2016;17:392–406. doi: 10.1038/nrg.2016.27.


