Gene Portals: A Framework for Integrating Clinical, Functional, and Structural Evidence into Rare Disease Variant Classification

Tobias Brünger; Ilona Krey; Suyeon Kim; Chiara Klöckner; Scott J Myers; Katrine M Johannesen; Arthur Stefanski; Gary Taylor; Eduardo Perez-Palma; Marie Macnee; Stephanie Schorge; Rebekka S Dahl; Hongjie Yuan; Riley E Perszyk; Sukhan Kim; Sunanjay Bajaj; Ingo Helbig; Jen Q Pan; Mark Farrant; Lonnie Wollmuth; David J A Wyllie; Erkin Kurganov; David Baez; Sameer Zuberi; Christian M Boßelmann; Holger Lerche; Massimo Mantegazza; Sandrine Cestèle; Patrick May; Alina Ivaniuk; Mary Anne Meskis; Veronica Hood; Leah Schust; Kimberly Goodspeed; Jing-Qiong Kang; Amber Freed; Cornelius Gati; Ludovica Montanucci; Arthur Wuster; Marena Trinidad; Steven Froelich; Alexander T Deng; Ángel Aledo Serrano; Artem Borovikov; Artem Sharkov; Arjan Bouman; MJ Hajianpour; Deb K Pal; Leslie Danvoye; Damien Lederer; Tugce R Balci; Eveline E O Hagebeuk; Alexis Heidlebaugh; Kathryn Oetjens; Trevor L Hoffman; Pasquale Striano; Sarah Drewes Williams; Kalene van Engelen; Katherine B Howell; Jean Khoury; Tim A Benke; Vincent Strehlow; Konrad Platzer; Amy Ramsey; Lisa Manaster; Sunitha Malepati; Pangkong Fox; Jeffrey Noebels; Wendy Chung; Annapurna Poduri; Laina Lusk Stripe; Sarah M Ruggiero; Stacey Cohen; Lacey Smith; Sylvia Boesch; Olivia Wilmarth; Anna Jenne Prentice; Esther Cha; Nikita Budnik; Marina P Hommersom; Audra Kramer; Carlos G Vanoye; Guo-Qiang Zhang; Michael Nothnagel; Aarno Palotie; Mark J Daly; Alfred L George, Jr; Yuri A Zarate; Andreas Brunklaus; Stephen F Traynelis; Rikke S Møller; Johannes R Lemke; Dennis Lal

doi:10.64898/2026.03.05.26347086

This is a preprint.

It has not yet been peer reviewed by a journal.

[Preprint]. 2026 Mar 6:2026.03.05.26347086. [Version 1] doi: 10.64898/2026.03.05.26347086

Tobias Brünger ¹,²,⁺, Ilona Krey ³,⁺, Suyeon Kim ¹, Chiara Klöckner ³, Scott J Myers ⁴, Katrine M Johannesen ⁵,⁶, Arthur Stefanski ⁷, Gary Taylor ¹, Eduardo Perez-Palma ⁸, Marie Macnee ², Stephanie Schorge ⁹, Rebekka S Dahl ¹,⁵,¹⁰, Hongjie Yuan ⁴, Riley E Perszyk ⁴, Sukhan Kim ⁴, Sunanjay Bajaj ¹, Ingo Helbig ¹¹,¹²,¹³,¹⁴, Jen Q Pan ¹⁵, Mark Farrant ⁹, Lonnie Wollmuth ¹⁶, David J A Wyllie ¹⁷, Erkin Kurganov ¹⁵, David Baez ¹⁵, Sameer Zuberi ¹⁸, Christian M Boßelmann ¹⁹, Holger Lerche ¹⁹, Massimo Mantegazza ²⁰, Sandrine Cestèle ²⁰, Patrick May ²¹, Alina Ivaniuk ²², Mary Anne Meskis ²³, Veronica Hood ²³, Leah Schust ²⁴, Kimberly Goodspeed ²⁵, Jing-Qiong Kang ²⁶, Amber Freed ²⁷, Cornelius Gati ²⁸, Ludovica Montanucci ¹, Arthur Wuster ²⁹, Marena Trinidad ³⁰, Steven Froelich ²⁹, Alexander T Deng ³¹, Ángel Aledo Serrano ³², Artem Borovikov ³³, Artem Sharkov ³⁴, Arjan Bouman ³⁵, MJ Hajianpour ³⁶, Deb K Pal ³⁷, Leslie Danvoye ³⁸, Damien Lederer ³⁹, Tugce R Balci ⁴⁰,⁴¹, Eveline E O Hagebeuk ⁴², Alexis Heidlebaugh ⁴³,⁴⁴, Kathryn Oetjens ⁴³, Trevor L Hoffman ⁴⁵, Pasquale Striano ⁴⁶, Sarah Drewes Williams ⁴⁷, Kalene van Engelen ⁴⁸, Katherine B Howell ⁴⁹, Jean Khoury ⁷, Tim A Benke ⁵⁰, Vincent Strehlow ³, Konrad Platzer ³, Amy Ramsey ⁵¹, Lisa Manaster ⁵², Sunitha Malepati ⁵², Pangkong Fox ⁵², Jeffrey Noebels ⁵³, Wendy Chung ⁵⁴,⁵⁵, Annapurna Poduri ⁵⁴,⁵⁶, Laina Lusk Stripe ¹¹,¹²,¹³, Sarah M Ruggiero ¹¹,¹²,¹³, Stacey Cohen ¹¹,¹²,¹³, Lacey Smith ⁵⁶, Sylvia Boesch ⁵⁷, Olivia Wilmarth ¹¹,¹³,⁵⁸, Anna Jenne Prentice ¹¹,¹²,¹³, Esther Cha ⁵⁹, Nikita Budnik ⁶⁰, Marina P Hommersom ⁶¹, Audra Kramer ⁶², Carlos G Vanoye ⁵⁹, Guo-Qiang Zhang ¹, Michael Nothnagel ²,⁶³, Aarno Palotie ⁵⁹,⁶⁴,⁶⁵, Mark J Daly ¹⁵, Alfred L George Jr ⁶², Yuri A Zarate ⁶⁶,⁶⁷, Andreas Brunklaus ¹⁸, Stephen F Traynelis ⁴, Rikke S Møller ⁶⁸,⁶, Johannes R Lemke ³,⁶⁹, Dennis Lal ²,¹⁵,⁷⁰,⁷¹,⁷²

¹Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, USA.

²Cologne Center for Genomics (CCG), University of Cologne, Cologne, 50931, Germany.

³Institute of Human Genetics, University of Leipzig Medical Center, Leipzig, Germany.

⁴Department of Pharmacology and Chemical Biology, and the Center for Functional Evaluation of Rare Variants (CFERV), Emory University School of Medicine, Atlanta, GA, USA

⁵Department of Epilepsy Genetics and Personalized Treatment, Danish Epilepsy Centre, Dianalund, Denmark.

⁶Department of Genetics, University Hospital of Copenhagen, Rigshospitalet, Copenhagen, Denmark

⁷Genomic Medicine Institute and Epilepsy Center, Cleveland Clinic, Cleveland, OH 44195, USA.

⁸Universidad del Desarrollo, Centro de Genética y Genómica, Instituto de Ciencias e Innovación en Medicina, Facultad de Medicina Clínica Alemana, Santiago de Chile 7610658, Chile.

⁹Department of Neuroscience, Physiology and Pharmacology, University College London, London, UK.

¹⁰Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark

¹¹Division of Neurology, Children's Hospital of Philadelphia, Philadelphia, PA, USA.

¹²Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, PA.

¹³The Epilepsy NeuroGenetics Initiative (ENGIN), Children's Hospital of Philadelphia, PA.

¹⁴Department of Neurology, University of Pennsylvania Perelman School of Medicine, Philadelphia

¹⁵Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA.

¹⁶Department of Neurobiology & Behavior and Biochemistry & Cell Biology, Center for Nervous System Disorders, Stony Brook University, Stony Brook, NY, USA.

¹⁷Institute for Neuroscience and Cardiovascular Research, University of Edinburgh, Edinburgh, UK.

¹⁸School of Health and Wellbeing and Royal Hospital for Children University of Glasgow, Glasgow, UK.

¹⁹Department of Neurology and Epileptology, Hertie Institute for Clinical Brain Research, University of Tübingen, 72076, Tübingen, Germany.

²⁰Université Côte D'azur, CNRS UMR7275, Inserm U1323, Institute of Molecular and Cellular Pharmacology, Valbonne - Sophia Antipolis, France.

²¹Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg.

²²Department of Neurology, Mayo Clinic in Florida, Jacksonville, Fl, 32224, USA.

²³Dravet Syndrome Foundation, Cherry Hill, NJ, USA.

²⁴FamilieSCN2A Foundation 501(c)(3), Gettysburg, PA, USA.

²⁵Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX, USA.

²⁶Vanderbilt Brain Institute, Vanderbilt University, Nashville, TN 37232, USA.

²⁷SLC6A1 Connect, 1939 Temperence Hill Drive, Frisco, TX 75034, USA.

²⁸Department of Biological Sciences, Bridge Institute, USC Michelson Center for Convergent Bioscience, University of Southern California, Los Angeles, CA 90089, USA.

²⁹BioMarin Pharmaceutical Inc., Novato, CA, USA.

³⁰Innovative Genomics Institute, University of California, Berkeley, CA, USA, 94720

³¹NHS South East Genomic Medicine Service, Guy’s and St Thomas’s NHS Foundation Trust, London SE1 9RT, UK

³²Epilepsy Program, Neurology Department, Hospital Ruber Internacional, Madrid 28034, Spain.

³³Research and Counseling Department, Research Centre for Medical Genetics, Moscow 115478, Russia.

³⁴Veltischev Research and Clinical Institute for Pediatrics and Pediatric Surgery of the Pirogov Russian National Research Medical University, Russia.

³⁵Department of Clinical Genetics, Erasmus MC, University Medical Center Rotterdam, PO Box 2040, Rotterdam 3000 CA, the Netherlands

³⁶Department of Pediatrics, Division of Medical Genetics and Genomics, Albany Medical College, Albany Med Health System, Albany, NY 12208, USA.

³⁷Department of Basic and Clinical Neurosciences, Institute of Psychiatry, Psychology and Neuroscience, King's College, London SE58AF, UK.

³⁸Department of Neurology, Université catholique de Louvain, Cliniques universitaires Saint-Luc, Brussels 1200, Belgium.

³⁹Centre de Génétique Humaine, Institut de Pathologie et de Génétique, 6041 Charleroi, Belgium.

⁴⁰Department of Pediatrics, Division of Medical Genetics, Western University, London, ON N6A3K7, Canada.

⁴¹Department of Pediatrics, Division of Medical Genetics, Schulich School of Medicine and Dentistry, Western University, London, ON N6A3K7, Canada

⁴²Stichting Epilepsie Instellingen Nederland (SEIN), Department of Pediatric Neurology, Heemstede, The Netherlands.

⁴³Department of Developmental Medicine, Geisinger, Danville, PA 17837, USA

⁴⁴Department of Neurology and Developmental Medicine, Kennedy Krieger Institute, Baltimore, Maryland 21205, USA

⁴⁵Department of Regional Genetics, Anaheim, Southern California Kaiser Permanente Medical Group, CA 92806, USA.

⁴⁶Pediatric Neurology and Muscular Diseases Unit, IRCCS Istituto Giannina Gaslini, Genoa 16147, Italy

⁴⁷Division of Genetic and Genomic Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA 15224, USA.

⁴⁸Medical Genetics Program of Southwestern Ontario, London Health Sciences Centre, London, ON N6A5W9, Canada.

⁴⁹Department of Neurology, Royal Children's Hospital, Melbourne, VIC 3052, Australia.

⁵⁰Department of Pediatrics, Neurology and Pharmacology, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, CO, USA.

⁵¹Department of Pharmacology & Toxicology, University of Toronto, Toronto, Ontario, Canada.

⁵²CACNA1A Foundation, Inc., 31 Point Rd, Norwalk, CT 06854, USA.

⁵³Blue Bird Circle Developmental Neurogenetics Laboratory, Department of Neurology, Baylor College of Medicine, One Baylor Plaza, Houston, TX, USA.

⁵⁴Department of Neurology, Harvard Medical School, Boston, MA, USA.

⁵⁵Department of Pediatrics, Boston Children's Hospital and Harvard Medical School, Boston, MA, USA.

⁵⁶Epilepsy Genetics Program, Department of Neurology, Boston Children's Hospital, Boston, MA, USA.

⁵⁷Center for rare Movement Disorders Innsbruck, Department of Neurology, Medical University Innsbruck, Innsbruck, Austria

⁵⁸Inova Health System, Falls Church, VA, USA.

⁵⁹Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.

⁶⁰The Stanley Center for Psychiatric Research and Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA.

⁶¹Department of Human Genetics, Radboud University Medical Center, Donders Institute for Brain, Cognition, and Behaviour, Nijmegen 6500 HB, The Netherlands.

⁶²Department of Pharmacology and Physiology, University of Maryland School of Medicine, Baltimore, MD 20201, USA

⁶³University Hospital Cologne, Medical Faculty, University of Cologne, Cologne, Germany.

⁶⁴Analytic and Translational Genetics Unit, Department of Medicine, Department of Neurology, and Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA.

⁶⁵Institute for Molecular Medicine Finland (FIMM), HiLIFE, University of Helsinki, Helsinki, Finland.

⁶⁶Division of Genetics and Metabolism, University of Kentucky, Lexington, KY, USA.

⁶⁷Section of Genetics and Metabolism, University of Arkansas for Medical Sciences, Little Rock, AR, USA

⁶⁸Department of Epilepsy Genetics and Personalized Treatment, Danish Epilepsy Centre, member of ERN EpiCARE, Dianalund, Denmark.

⁶⁹Center for Rare Diseases, University of Leipzig Medical Center, Leipzig, Germany.

⁷⁰Center for Neurogenetics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA.

⁷¹Center for Innovation in Health Informatics, Cook Children’s Health Care System, Fort Worth, TX, USA

⁷²Department of Bioengineering and Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, USA

Corresponding author. Dennis.lal@cookchildrens.org

⁺Authors contributed equally to this work.

Authors’ Contributions

Data collection and curation: I.K., Suy.K., S.J.M., K.M.J., S.S., H.Y., R.E.P., Suk.K., I.H., J.Q.P., M.F., L.W., D.J.A.W., E.K., D.B., S.Z., C.M.B., Ma.Man., Sa.C., A.I., K.G., S.D.W., K.V.E., K.H., J.Q.K., C.G., A.W., M.T., S.F., A.T.D., A.A.S., Art.B., A.Sh., Arj.B., MJ.H., D.K.P., L.D., D.L., T.R.B., E.E.O.H., A.H., K.O., T.L.H., P.S., T.A.B., V.S., K.P., A.R., W.C., A.P., L.L.S., S.M.R., St.C., L.S., Sy.B., O.W., A.J.P., E.C., N.B., M.P.H., A.K., C.G.V., G.Q.Z., And.B., S.F.T., R.S.M., J.R.L. Front and backend development: T.B., C.K., Ma.Mac., Suy.K., A.St., G.T., R.S.D; GP design (feedback): T.B., D.L., C.K., Ma.Mac., S.J.M., I.K., R.S.D., I.H., P.M., M.A.M., V.H., L.S.M., A.F., L.M., S.M., P.F., J.N., M.N., Y.A.Z., And.B., S.F.T., R.S.M., I.K.

Roles

Tobias Brünger: Conceptualization, Writing-original draft, Writing-editing

Ilona Krey: Writing-editing

Suyeon Kim: Writing-editing

Chiara Klöckner: Conceptualization

Gary Taylor: Writing-editing

Eduardo Perez-Palma: Conceptualization, Writing-editing

Marie Macnee: Conceptualization

Rebekka S Dahl: Writing-editing

Hongjie Yuan: Writing-editing

Riley E Perszyk: Writing-editing

Sukhan Kim: Writing-editing

Sunanjay Bajaj: Writing-editing

Ingo Helbig: Writing-editing

Jen Q Pan: Writing-editing

Mark Farrant: Writing-editing

Lonnie Wollmuth: Writing-editing

David J A Wyllie: Writing-editing

Erkin Kurganov: Writing-editing

Christian M Boßelmann: Writing-editing

Holger Lerche: Writing-editing

Patrick May: Writing-editing

Alina Ivaniuk: Writing-editing

Ludovica Montanucci: Writing-editing

Artem Sharkov: Writing-editing

MJ Hajianpour: Writing-editing

Damien Lederer: Conceptualization, Writing-original draft

Tugce R Balci: Writing-editing

Trevor L Hoffman: Writing-editing

Tim A Benke: Writing-editing

Wendy Chung: Writing-editing

Annapurna Poduri: Writing-editing

Sylvia Boesch: Writing-editing

Audra Kramer: Writing-editing

Carlos G Vanoye: Writing-editing

Michael Nothnagel: Writing-editing

Aarno Palotie: Conceptualization

Mark J Daly: Conceptualization

Alfred L George Jr: Writing-editing

Yuri A Zarate: Conceptualization

Stephen F Traynelis: Writing-editing

Rikke S Møller: Writing-editing

Johannes R Lemke: Conceptualization, Writing-editing

Dennis Lal: Writing-editing

PMCID: PMC12976905 PMID: 41822692

Abstract

Rare Mendelian disorders affect 300–400 million people globally. Although genetic testing has become widely adopted, gene-specific evidence for tailored variant interpretation remains scattered across resources. We present Gene Portals, a framework for gene-centered multimodal knowledge bases that co-localize expert-harmonized clinical data, functional assays, population variation, structural annotations, and gene-specific ACMG/AMP specifications within a single resource. A modular interface integrates this unified evidence with VCEP-refined ACMG specifications to enable automated gene-specific variant classification, infer molecular mechanisms, and support cross-gene analyses. We demonstrate the framework's utility across five Gene Portals spanning eleven neurodevelopmental disorder-associated genes, integrating data from 4,423 individuals with 2,838 unique variants, 36,149 ClinVar submissions, and 1,044 expert-curated molecular readouts. By organizing evidence that is otherwise dispersed across multiple sources into a unified, queryable framework, the SCN, GRIN, CACNA1A, SATB2, and SLC6A1 Gene Portals have become widely used community resources and provide an extensible template for standardized rare-disease variant interpretation and mechanism-aware discovery.

More than 7,000 rare Mendelian disorders affect 300–400 million individuals globally. However, even when the causal gene is known, allelic heterogeneity and pleiotropy generate substantial clinical and mechanistic diversity: different pathogenic variants in the same gene can alter dosage, biophysical properties, protein stability, trafficking, protein-protein interactions, or cell-type–specific function in distinct ways. In genes encoding ion channels, receptors, and transcriptional regulators, these diverse molecular effects, such as loss, gain, or mixed function, are associated with broad phenotypic spectra that often blur classical syndromic labels^1-3. As a result, accurate variant classification increasingly requires integrating clinical evidence with variant-specific functional and protein-specific structural data, yet these tailored data remain fragmented and difficult to access for affected individuals, clinicians, genetic counselors, and researchers.

Current resources for variant classification are characterized by two related limitations. First, clinically well-characterized patient cohorts and targeted functional assay data are typically generated, curated, and disseminated in gene- or study-specific efforts, often using heterogeneous formats and vocabularies, with limited harmonization across datasets. Second, even when high-quality data are available, clinical, population, functional, structural, and in silico evidence are typically organized in isolation rather than interconnected within a unified gene-centered framework. As a result, existing platforms capture only isolated components of the broader evidence landscape and rarely support systematic integration of phenotypes linked to individual variants, mechanistic functional evidence, structural context, and gene family-aware information^4-8. Genome-wide tools^9,10 implementing the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG/AMP)¹¹ guidelines provide standardized and scalable support for variant classification across large datasets. However, their reliance on generic, large-scale evidence resources and their limited support for gene-tailored datasets and annotations may be insufficient for rare-disease genes in which variant pathogenicity and clinical presentation are tightly coupled to positional context and molecular disease mechanisms¹².

Together, these constraints highlight the need for a gene-centric framework that integrates clinical, functional, structural, and population-level evidence within a unified resource to enable tailored data interpretation at single-variant resolution. To address this gap, we developed the Gene Portal (GP) framework, which aggregates and harmonizes multimodal datasets, links variants across transcripts and gene families, and contextualizes evidence along protein sequence and three-dimensional structure. The framework supports both expert-guided and automated ACMG classification and is designed to incorporate continuously updated clinical and experimental data within a centralized infrastructure.

As proof of concept of the framework, we implemented it across five GPs encompassing eleven neurodevelopmental disorder (NDD) associated genes. By transforming previously fragmented datasets into interoperable, mechanism-aware knowledge bases, this implementation establishes a scalable architecture for standardized variant interpretation and cross-gene genotype–phenotype analyses.

Results

A multimodal platform integrating clinical, genetic, structural, and functional evidence

Each GP is built on a gene-centered knowledge base that provides synchronized access to aggregated and harmonized multidomain datasets at single-variant resolution (Fig. 1). For each gene, this knowledge base includes expert-curated clinical phenotypes and functional assay results from peer-reviewed literature. Collaborator-contributed datasets are also integrated and, when unpublished, undergo independent review by multiple external domain experts to confirm data quality and interpretation before inclusion. For a given gene or gene family, variant-level data are represented as unified entities mapped across transcripts, protein isoforms, paralogous genes, and three-dimensional structures, enabling evidence from multiple domains to be interpreted in a shared positional and mechanistic context. Users can interact with the knowledge base through three main modules. The Clinical Overview module allows users to examine aggregated cohort characteristics, including gene-specific phenotype spectra and comorbidities. The Variant Classification module enables users to evaluate individual variants using gene- and variant-tailored ACMG/AMP¹² classification while assessing experimentally derived and/or predicted effects on protein function. The Research module allows users to explore linked clinical and functional evidence across genes and to visualize variant distributions along linear protein sequences and within three-dimensional protein assemblies.

Figure 1 | The GP unifies curated clinical, population, functional, and *in silico* data into a standardized, gene-specific knowledge base. **Panel 1** summarizes data aggregation across three evidence domains: (A) clinical and population data comprising expert-curated patient cohorts, literature-derived cases, registry datasets, and variant submissions from ClinVar and gnomAD; (B) functional data capturing electrophysiological and other molecular readouts extracted from published and unpublished experiments (see Methods for review details); and (C) computational and structural annotations integrating protein features, AlphaFold and PDB structures, genome-wide prediction scores, and gene-specific functional predictors. **Panel 2** depicts the unified knowledge base backend that links variant, phenotype, functional, and structural information at single-variant resolution. The backend is generated through a standardization and annotation workflow performing VCF and HGVS normalization, isoform and paralog mapping, domain and motif alignment, three-dimensional structure integration, application of computational evidence, and construction of a comprehensive annotated coding SNV backbone. **Panel 3** presents the interactive GP modules: The *Clinical Overview* module summarizes gene-specific phenotypes, comorbidities, and cohort characteristics. The *Variant Classification* module integrates clinical, functional, structural, and population evidence to support automated ACMG classification using gene-tailored criteria. The *Research* module enables cross-gene exploration of variant–phenotype–function relationships, residue-level mutational patterns, and structural context. Together, these components provide a reproducible and extensible system for mechanism-aware variant classification across rare-disease genes.
**Abbreviations:** ACMG: American College of Medical Genetics and Genomics; API: Application programming interface; PDB: Protein Data Bank; HGVS: Human Genome Variation Society nomenclature; VCF: Variant Call Format; SNV: Single Nucleotide Variant.
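The HGVS normalization step named in Panel 2 can be illustrated for the simplest case, a protein-level missense substitution. The following is a minimal sketch, not the portals' implementation: the function name `normalize_protein_hgvs` and the regular expression are assumptions made for illustration, and only three-letter missense notation is handled.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Accepts "p.Gly820Ala", "p. Gly820Ala", and "p.(Gly820Ala)".
_PATTERN = re.compile(r"^p\.\s*\(?\s*([A-Za-z]{3})(\d+)([A-Za-z]{3})\s*\)?$")


def normalize_protein_hgvs(text):
    """Return (canonical three-letter form, compact one-letter form).

    Hypothetical helper: handles missense substitutions only and raises
    ValueError for anything it cannot parse.
    """
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple missense description: {text!r}")
    ref, position, alt = match.groups()
    ref, alt = ref.capitalize(), alt.capitalize()
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {text!r}")
    canonical = f"p.{ref}{position}{alt}"
    compact = f"{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"
    return canonical, compact
```

With this sketch, spelling variants of the same substitution collapse to one key, so records from different sources can be joined on the canonical string.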

This framework was first applied to N-methyl-D-aspartate receptor (NMDAR) encoding genes, creating the GRIN portal. At present, there are five independently deployed GPs across eleven clinically and mechanistically heterogeneous NDD-associated genes: the SCN (encoding voltage-gated sodium channels), GRIN (encoding N-methyl-D-aspartate receptors, NMDAR), CACNA1A (encoding the CaV2.1 calcium channel), SATB2 (encoding DNA-binding protein SATB2), and SLC6A1 (encoding the GABA transporter protein type 1) GPs, which are all publicly accessible at https://lalresearchgroup.org. User surveys show that clinicians and researchers report accelerated variant review and additional decision-relevant context, and web analytics demonstrate rapid community uptake with an average of >600 monthly active users globally over a six-month period (Supplementary Data ‘User Feedback and Global Reach’, Supplementary Figure 1). Regular, versioned updates integrate new clinical reports and functional studies to expand coverage over time.

Resources in the Gene Portals

Integrated data landscape.

In their current deployment, the GPs integrate expert-curated clinical information from 4,423 affected individuals, 2,838 unique variants, 36,149 ClinVar submissions, and functional readouts for 1,044 variants across 11 genes. Although ClinVar represents the largest single publicly available collection of patient variants for these genes, only 6,574 variants (18.2%) are classified as likely pathogenic or pathogenic across all genes, whereas 24–58% of ClinVar submissions are variants of uncertain significance (VUS) (Fig. 2B). Against this background, data from clinical research networks substantially expand the spectrum of represented variants; for example, 36.2% (N = 603) of missense variants observed in the curated patient cohorts are absent from ClinVar's list of likely pathogenic and pathogenic variants. The list of expert-curated variants reveals marked gene-specific profiles: missense variants dominate in SLC6A1, most voltage-gated sodium channel genes (SCN genes), and most N-methyl-D-aspartate-type glutamate receptor genes (GRIN genes), whereas null variants are more frequent among individuals with SATB2 variants (Fig. 2A). A central principle of the GP framework is that variant evidence is contextualized at the level of conserved protein positions, a concept already implemented in the ACMG/AMP variant curation criteria for epilepsy-related sodium channels¹³. For the GP framework, we have applied this concept more broadly across families of related (paralogous) genes: (likely) pathogenic variants at a given amino acid residue can provide evidence for the classification of other substitutions at the same position, and homologous residues in paralogous genes provide evidence when sequence and structural constraints are conserved^1,11,12. Incorporating residue-level and cross-paralog mappings increases the fraction of missense SNVs that can be informed by established patient variants from 2% to as much as ~30% (Fig. 2C).
Similarly, functional assays are available for only 0.2–3.8% of possible missense SNVs, yet positional and paralog-based mapping extends the fraction of all missense SNVs that available functional assays can support to as much as 23% (Fig. 2D). In addition, to capture broader sequence patterns from curated patient variants, we computed pathogenic variant–enriched regions (PERs)¹⁴, which span 1.5–18.5% of residues across genes (Fig. 2E) and highlight domains enriched for curated and ClinVar likely pathogenic and pathogenic missense variants relative to gnomAD.
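The three coverage categories used here (same amino-acid exchange, same residue, paralog-aligned position) can be sketched as a lookup. This is an illustrative sketch only: the alignment index, gene names, and variants below are invented placeholders, and a real alignment index would be derived from a multiple sequence alignment of the paralogs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissenseVariant:
    gene: str
    position: int  # residue number in the reference isoform
    ref: str       # reference amino acid (one-letter code)
    alt: str       # substituted amino acid (one-letter code)


# Hypothetical alignment index: (gene, residue) -> shared alignment column.
ALIGNMENT_INDEX = {
    ("GENE_A", 100): 57,
    ("GENE_B", 98): 57,
    ("GENE_A", 101): 58,
}

# Hypothetical established (likely) pathogenic patient variants.
KNOWN_PATHOGENIC = [
    MissenseVariant("GENE_A", 100, "G", "A"),
    MissenseVariant("GENE_B", 200, "R", "W"),
]


def evidence_level(query, known=KNOWN_PATHOGENIC, index=ALIGNMENT_INDEX):
    """Return the closest relationship between a query and known variants."""
    found = set()
    query_column = index.get((query.gene, query.position))
    for variant in known:
        if variant.gene == query.gene and variant.position == query.position:
            # Same residue in the same gene: identical or different exchange.
            found.add("same_exchange" if variant.alt == query.alt else "same_residue")
        elif (query_column is not None
              and index.get((variant.gene, variant.position)) == query_column):
            # Different residue that shares the query's alignment column.
            found.add("paralog_position")
    for level in ("same_exchange", "same_residue", "paralog_position"):
        if level in found:
            return level
    return "none"
```

Applying such a lookup to every possible missense SNV in a gene and counting the non-"none" results would give a coverage fraction of the kind reported in Fig. 2C and 2D.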

Figure 2 | **(A)** Variant-type distributions are shown for expert-curated individuals across 11 genes. **(B)** Clinical significance of 36,149 ClinVar submissions harmonized in the shared schema, showing 35–60% uncertain/conflicting classifications across genes. **(C)** Fraction of all possible missense SNVs located at positions represented by curated patient variants, shown as coverage of the same residue, same amino-acid exchange, or paralog-aligned position. **(D)** Equivalent coverage based on functionally tested variants highlights the smaller but complementary sequence space informed by experimental assays. **(E)** Considering the MANE transcript-associated isoform, we calculated the fraction of residues located in pathogenic variant–enriched regions (PERs). PERs cover 1.5–18.5% of coding residues across genes. N indicates the number of residues per gene that fall within PERs. **Abbreviations**: P: Pathogenic; LP: Likely-pathogenic; VUS: Variant of uncertain significance; LB: Likely-benign; B: Benign; Null: nonsense, frameshift, canonical splice-site, and start-loss variants; Other: non-missense and non-null variants, including in-frame indels and multigene copy-number variants.

Integrated pathogenicity and functional evidence.

The harmonized GP backend also enables systematic evaluation of pathogenicity metrics and gene-level functional prediction models across gene families. The distributions of six widely used variant pathogenicity classification scores (AlphaMissense¹⁵, REVEL¹⁶, CADD¹⁷, EVE, PARA-Z¹⁸, and MTR¹⁹) differ between CACNA1A, disease-associated GRIN genes (GRIN1, GRIN2A, GRIN2B, GRIN2D), SCN genes (SCN1A, SCN2A, SCN3A, SCN8A), SATB2, and SLC6A1, and across expert-curated patient, ClinVar, and gnomAD variants (Fig. 3). These patterns highlight strong gene- and score-specific variation in the ability to distinguish presumably pathogenic and benign variants. In each GP, the distributions of prediction scores for patient-derived and population reference variants within the same gene are displayed alongside the user-selected variant, providing visual context to guide the selection and interpretation of computational evidence for variant classification.
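One way to quantify how well a score separates patient-derived from population variants within a gene is the area under the ROC curve. The text does not state which statistic, if any, the portals compute, so this is only an illustrative sketch. The scores and gene names below are invented.

```python
def auroc(pathogenic_scores, benign_scores):
    """Probability that a random pathogenic score exceeds a random benign one.

    Ties count as 0.5. Computed by direct pairwise comparison, which is
    adequate for small per-gene variant sets.
    """
    if not pathogenic_scores or not benign_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pathogenic_scores:
        for b in benign_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(pathogenic_scores) * len(benign_scores))


# Invented example scores for two hypothetical genes.
scores = {
    "GENE_A": {"patient": [0.9, 0.8, 0.95], "population": [0.1, 0.3, 0.2]},
    "GENE_B": {"patient": [0.6, 0.4, 0.7], "population": [0.5, 0.65, 0.3]},
}

separability = {
    gene: auroc(groups["patient"], groups["population"])
    for gene, groups in scores.items()
}
```

In this toy data the same score separates the two groups perfectly in one gene and only weakly in the other, which is the kind of gene-specific behavior the per-gene distribution plots are meant to expose.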

Figure 3 | Distributions of six widely used pathogenicity and constraint scores (AlphaMissense, REVEL, CADD, EVE, PARA-Z and MTR) across curated patient variants, ClinVar pathogenic/likely pathogenic and benign/likely benign variants, and control variants from gnomAD for the four gene families represented in the GPs. The patterns reveal gene- and score-specific separability between presumably pathogenic and benign variants. **Abbreviations**: P: Pathogenic; LP: Likely-pathogenic; LB: Likely-benign; B: Benign.

Gene Portal modules

All GPs implement a common interface architecture with three core analytic modules: Clinical Overview (CO), Variant Classification (VC), and Research (R). This design ensures consistency and comparability across genes while allowing gene-specific customization. In addition, selected GPs include two optional modules focused on education and prospective data collection, the Educational Resources module and the Registry module. Layout, visualizations, and available filters are adapted to the structure and completeness of the underlying datasets and refined in collaboration with scientific, clinical, and family communities. This combination of a standardized backbone layer with disease community-informed customization ensures methodological coherence across GPs while allowing each GP to remain responsive to the needs of its respective disease community. The GPs are already endorsed by 55 patient advocacy groups (PAGs) (Supplementary Table 1), underscoring broad support for their flexible application.

Module I – Clinical Overview (CO).

The CO module illustrates how gene-specific clinical knowledge is synthesized into an interpretable summary for both experts and non-experts. For example, in the CACNA1A portal, a timeline aggregates key milestones in CACNA1A research from the earliest gene–disease association studies implicating CACNA1A in familial hemiplegic migraine and episodic ataxia to more recent genotype–phenotype and natural-history studies that delineate clinical subtypes (Fig. 4A). The module further provides a structured overview of CACNA1A-related phenotypes (Fig. 4B). All reported features from the curated cohort and literature are standardized to Human Phenotype Ontology (HPO) terms²⁰, enabling quantitative comparisons across variant classes. In the registry-based clinical summary (Fig. 4C), missense variant carriers are enriched for epilepsy and global developmental delay, whereas protein-truncating variants are more frequently associated with ataxia and nystagmus. Together, these views demonstrate how the CO module converts heterogeneous registry and literature data into a gene-specific, variant-type–aware disease profile that can be rapidly inspected in clinical practice.
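The comparison of HPO term frequencies between variant classes can be sketched as a simple tabulation. This is an illustrative sketch with an invented four-person cohort, not registry data; the record layout and the function name `term_frequencies` are assumptions.

```python
from collections import Counter, defaultdict

# Invented records. HPO identifiers used: HP:0001250 (Seizure),
# HP:0001263 (Global developmental delay), HP:0001251 (Ataxia),
# HP:0000639 (Nystagmus).
cohort = [
    {"variant_class": "missense", "hpo_terms": ["HP:0001250", "HP:0001263"]},
    {"variant_class": "missense", "hpo_terms": ["HP:0001250"]},
    {"variant_class": "null", "hpo_terms": ["HP:0001251", "HP:0000639"]},
    {"variant_class": "null", "hpo_terms": ["HP:0001251"]},
]


def term_frequencies(individuals):
    """Fraction of individuals in each variant class annotated with each term."""
    totals = Counter()
    term_counts = defaultdict(Counter)
    for person in individuals:
        variant_class = person["variant_class"]
        totals[variant_class] += 1
        # A set ensures each term is counted once per individual.
        for term in set(person["hpo_terms"]):
            term_counts[variant_class][term] += 1
    return {
        variant_class: {
            term: count / totals[variant_class]
            for term, count in term_counts[variant_class].items()
        }
        for variant_class in totals
    }


frequencies = term_frequencies(cohort)
```

Because every reported feature is mapped to a standard HPO identifier first, the per-class fractions are directly comparable; a formal enrichment test would be a natural next step but is not shown here.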

Figure 4 | **(A)** Timeline summarizing major milestones in *CACNA1A* research, from early gene-disease association studies to recent genotype-phenotype analyses refining clinical subtypes. **(B)** Overview of *CACNA1A*-related disorders outlining the major neurological and developmental phenotypes, including episodic ataxia, developmental and epileptic encephalopathy, hemiplegic migraine, and autism spectrum disorder. **(C)** Clinical summary displaying the most frequent Human Phenotype Ontology (HPO) terms observed among individuals with *CACNA1A* variants, illustrating differences in phenotype prevalence between missense and null variant carriers.

Module II – Variant Classification (VC).

To demonstrate the functionality of the VC module, we highlight a representative example from the GRIN portal (GRIN2B, NM_000834.5:p.Gly820Ala). Users start by specifying gene, transcript, and variant at the cDNA or protein level (Fig. 5A). The GRIN portal then returns a variant summary including transcript context (NM_000834.5) and an automated ACMG/AMP classification, which is pathogenic in this case (Fig. 5B). This classification is generated directly from co-localized, expert-harmonized clinical and functional datasets within the GRIN portal that are otherwise unavailable, eliminating the need to retrieve evidence from disparate external sources. Notably, several classification criteria have been refined for the GRIN genes according to the current preliminary specifications of the ClinGen GRIN Variant Curation Expert Panel (VCEP)²¹, which will be released soon. Each applied criterion is explicitly listed with its assigned weight, and users can immediately export a structured HTML report for documentation or multidisciplinary review. The evidence supporting this classification can be explored across several coordinated panels. In the example shown in Figure 5, a cohort table summarizes six individuals from the Global GRI Registry carrying p.(Gly820Ala) and lists their associated phenotype summaries, while cross-gene sequence alignments identify additional patients with variants at paralogous conserved alignment index positions in GRIN1, supporting application of PS1 and related criteria¹² (Fig. 5C). Functional data visualizations aggregate six experimentally measured molecular parameters that have been selected for clinical interpretation by the ClinGen GRIN VCEP²², indicating a likely loss-of-function (LoF) effect consistent with reduced open probability and altered kinetic properties (Fig. 5D).
These results align with the predictions from a GRIN-specific variant function prediction model²³, providing orthogonal support for the ACMG/AMP computational evidence criterion (PP3). In silico scores, including a high REVEL value, separate patient and population variants and support PP3, while the variant's location within a pre-computed mutational hotspot, displayed in both the linear protein sequence and the 3D receptor structure, contributes to the mutational hotspot criterion (PM1) (Fig. 5E). In addition to mutational hotspots, the GRIN portal displays three-dimensional missense intolerance (3D-MTR), highlighting structurally constrained regions of the protein. By integrating these data into a single interactive view and mapping them directly onto ACMG/AMP criteria in accordance with the most recent VCEP specifications, the VC module allows users to move seamlessly from variant query to transparent, reproducible, gene-tailored classification. The utility of this module for ongoing research efforts is highlighted by a recently published model that predicts the functional effects of missense variants in GRIN genes²³, which was built using data from the GRIN portal. The GRIN functional predictions are also accessible through the GRIN portal.

Figure 5 ∣ **(A)** Variant input panel where users can search by gene, transcript, and variant at the cDNA or protein level. **(B)** Automated ACMG classification summary for the *GRIN2B* variant (NM_000834.5:p.Gly820Ala) with an option to export a detailed HTML report (see Supplementary data “GRIN Portal: Variant Analysis Report”). **(C)** Cohort exploration table showing six expert-curated individuals from the Global GRI Registry carrying the same variant and their associated clinical features. **(D)** Functional data visualization summarizing experimentally tested molecular parameters used for classification, indicating a likely loss-of-function (LoF) effect of the N-methyl-D-aspartate receptor (NMDAR). **(E)** Interactive ACMG criterion exploration with data-supported evidence, including pathogenicity predictor plots (PP3), mutational hotspot mapping (PM1), and integrated variant distributions on the linear protein sequence and protein structure alongside tested and predicted functional outcomes.

Module III – Research (R).

The R module demonstrates how the GPs support hypothesis generation using linear protein sequences, 3D structures, and biomedical data visualizations. In the GRIN portal, users can filter by gene, variant type, location within a mutational hotspot, functional consequence, and phenotype using a flexible query interface (Fig. 6A). For example, selecting variants annotated as “likely loss-of-function (LoF)” across the GRIN1, GRIN2A, and GRIN2B cohorts reveals distinct clustering patterns along the linear protein sequence, with most LoF variants localizing in mutational hotspots (Fig. 6B). The same variants are projected onto the NMDAR complex structure, enabling visual assessment of whether LoF variants preferentially affect specific domains, interfaces, or subunits. The phenotype interface links these molecular patterns to clinical outcomes (Fig. 6C). Individuals with missense LoF variants in GRIN1 predominantly exhibit severe intellectual disability, whereas those with missense LoF variants in GRIN2A or GRIN2B span a broader range of cognitive severity. Seizure frequency also differs by gene: more than 80% of individuals with missense LoF variants in GRIN2A have seizures, compared with approximately 20% of those with missense LoF variants in GRIN2B. These cross-gene views illustrate how the R module connects variant location, known functional consequences, structural context, and phenotype, enabling domain-level hypotheses about mechanisms and their correlations with clinical phenotypes and function.

Figure 6 ∣ **(A)** Filtering panel allowing selection by gene, variant type, mutational hotspot, functional consequence, or phenotype. **(B)** Distribution of likely loss-of-function (LoF) variants mapped onto the linear protein sequence with pathogenic variant–enriched regions (PERs) and visualized on the GluN1-GluN2A (*GRIN1, GRIN2A*) receptor complex structure (PDB ID: 6MMB). **(C)** Phenotype interface showing degree of intellectual disability and seizure frequency among individuals with LoF variants across *GRIN1, GRIN2A*, and *GRIN2B*.

Module IV – Educational Resources

Most GPs include an educational resource module designed to lower the barrier to entry for clinicians, researchers, trainees, affected individuals, and families. For families, these resources facilitate the return and contextualization of genetic findings by translating complex molecular and clinical concepts into accessible explanations, thereby supporting informed decision-making and sustained engagement. This module provides expert-designed educational videos and tutorials that explain the genetic architecture of each disorder, the principles of functional assays, and key aspects of variant classification. Subtitles in multiple languages enhance accessibility for international users and patient communities. GPs also include, if available, curated links to relevant patient advocacy groups, facilitating bidirectional communication between clinicians, researchers, and affected families.

Module V – Registry

Both the GRIN and SATB2 GPs include an integrated registry module that lets families enroll and participate in research directly through the GPs. As an illustrative example, the GRIN portal implements a REDCap-based Global GRIN Registry that captures harmonized genetic and phenotypic information across a broad range of disease features. After enrollment, data are curated and validated, and families are re-contacted for additional information if questions or inconsistencies arise. The registry currently includes data from 773 affected individuals, and participants may be re-contacted for eligibility screening in ongoing research studies, natural history efforts, or interventional trials. Thus, the GP not only serves as a repository in which newly generated clinical and functional data are aggregated, curated, and used in research projects, but also supports clinical trial readiness through registries maintained by patient advocacy organizations. After additional curation for quality assurance, the registry data of Module V feed back into the clinical overview displayed in Module I.

In particular, Modules IV and V extend the framework from a static knowledge base to a living ecosystem that supports education, community engagement, and prospective data collection.

Discussion

Genomic resources such as ClinVar⁵, HGMD⁴, and DECIPHER⁷ catalog disease-associated variants, while others, such as MaveDB⁸, list variant consequences quantified in high-throughput assays. These resources provide indispensable references at population or genome-wide resolution. However, for most rare-disease genes, the underlying clinical and experimental evidence remains fragmented across case reports, patient registries, and isolated functional studies, which are inconsistently formatted and often difficult to access and integrate. Patient registries are a particularly valuable, yet underutilized, source of variants not uploaded to ClinVar by diagnostic laboratories (36.2% of missense variants in our resources), as well as rich longitudinal and deeply phenotyped clinical data. GPs address this gap through community-driven, gene-centered knowledge bases that interconnect expert-curated clinical phenotypes, functional assay data, and structural annotations, alongside population references, within unified, variant-centric, modular interfaces. This integration enables gene-tailored variant classification grounded in mechanism-specific evidence and supports systematic exploration of genotype–phenotype–function relationships that are not accessible through genome-wide resources alone. By combining these analytical interfaces with educational resources and direct links to patient registries, the GPs function not only as centralized gene-specific knowledge bases but also as integrative platforms that connect clinicians, researchers, families, and patient advocacy organizations.

Accurate variant classification increasingly requires gene-tailored evidence models augmented by expert-guided criteria. Cross-gene analyses consistently show that the weights assigned to pathogenicity prediction algorithms in variant classification cannot be reliably inferred from genome-wide calibrations alone²⁴: even closely related neurodevelopmental genes differ in constraint patterns, functional parameter space, and phenotype–mechanism coupling, such that cross-gene thresholds for these prediction algorithms obscure crucial biological heterogeneity²²,²⁵. This recognition underlies ClinGen’s move toward gene- and disease-specific ACMG/AMP adaptations, where expert panels define criteria calibrated to known mechanisms, paralog structure, and phenotypic spectrum (ClinGen VCEP Protocol, Version 11, 2023²⁶). The GPs operationalize this principle systematically and at scale: they aggregate and harmonize clinical and molecular datasets that are not available in genome-wide resources to complement population reference datasets, and, when VCEP specifications exist, embed them directly within gene-specific ACMG/AMP workflows. For the GRIN and SCN GPs, we implemented the published SCN and the preliminary (soon to be released) GRIN-specific refinements¹³ in the automated classifiers. By embedding VCEP-refined criteria within interfaces that organize all gene-relevant evidence, the GPs support interpretation grounded in gene-specific biology rather than generic rule application. They complement expert review by consolidating the information required for multidisciplinary discussions, a format increasingly used in rare-disease genetics to improve VUS adjudication and clinical decision-making²⁷.

The GPs provide a research environment in which genotype and phenotype data are harmonized at single-variant resolution, enabling systematic exploration of variant-type- and variant-location-specific clinical patterns and scalable genotype–phenotype modeling. For example, the framework enables the identification of variant-type-specific phenotypic features, including a narrowly constrained age range for seizure onset and offset as well as co-occurrence of schizophrenia in carriers of GRIN2A null variants²⁸,²⁹. Within the same variant-type–stratified framework, mechanistically informed therapeutic stratification has been suggested, including the selective benefit of L-serine in individuals with GRIN LoF variants and memantine in gain-of-function variants²⁸,³⁰. In the SCN portal, the harmonized variant–phenotype backbone further enabled the detection of in silico and location-dependent patterns that stratify Dravet syndrome and GEFS+ trajectories in individuals with SCN1A variants, the two main disease subgroups. This information has already supported the development of early-diagnosis prediction models for SCN1A-related epilepsies³¹,³². Extending beyond single genes, gene family-based (“paralog-aware”) mappings across the voltage-gated sodium channel family allowed quantitative incorporation of evidence from related genes. This information revealed gene-family–wide phenotype correlations that are inaccessible to isolated gene-centric analyses¹².

Beyond genotype–phenotype correlations, the GPs enable systematic interrogation of how variant location and structural context relate to molecular effect and clinical severity. In GRIN genes, residue-level mapping demonstrated that spatial distance to agonist and antagonist ligand-binding sites is a strong predictor of both pathogenicity and NMDAR functional impact, providing a basis for NMDAR gene-specific machine-learning models that infer variant pathogenicity and effect direction directly from structural features²³. Complementary location-based analyses in SLC6A1 revealed enrichment of loss-of-function variants within specific protein domains and showed that complete loss of GAT1 uptake, encoded by SLC6A1, is associated with more severe clinical phenotypes, directly linking residue position, transporter dysfunction, and disease severity³³. These examples highlight how structural and positional annotation within the GP framework enables mechanistic interpretation of variants and supports quantitative models that connect molecular disruption to patient-level outcomes.

Limitations.

Although the GP framework is designed to be generalizable and has been deployed across diverse gene classes, its depth and interpretability depend on the maturity of gene-specific knowledge and community engagement. GPs developed for genes with limited functional characterization or mechanistic understanding will necessarily provide fewer evidence layers, even though the underlying architecture remains applicable. In addition, certain visualization strategies and evidence metrics reflect assumptions derived from specific molecular contexts and may require validation when extended to other gene classes. While the framework supports the incorporation of emerging variant-relevant dimensions, such as trafficking dynamics or protein–protein interactions, effective expansion requires continued domain expertise and adaptation of the underlying templates. In addition, although the GPs substantially streamline expert review and centralize fragmented data, they are intended as research resources and are not validated for clinical decision-making. Prospective evaluation within diagnostic workflows remains necessary to assess effects on time-to-classification, concordance, reclassification trajectories, and downstream management. Finally, sustained completeness and currency depend on ongoing community contributions, positioning the GPs as collaborative, evolving infrastructures rather than static repositories.

Conclusion.

GPs are a unified, extensible architecture that can transform fragmented clinical, functional, and structural data into interoperable resources for variant classification and mechanistic analysis in a research context. The framework offers a practical model for developing community-driven, gene-centric knowledge bases at scale. As functional assays, population cohorts, and natural history datasets grow, and as computational predictors mature, the GP infrastructure provides a flexible, centralized, and open-access foundation for increasingly automated and comprehensive gene-resolved resources that further strengthen the translation of patient-identified variants into clinically actionable and mechanistically informative insights into rare diseases.

Methods

Gene Portal implementation

The Gene Portals (GPs) were implemented in R (v.4.4.0) using the R Shiny framework (https://shiny.posit.com/) (v.1.9.1), enabling R code to be deployed as interactive web applications accessible across all major browsers, including mobile devices. To ensure stable and portable builds, datasets and Shiny application code were packaged into Ubuntu 20.04 LTS Docker³⁴ images, deployed on ChromeOS “Lakitu” milestone 113 virtual machines from Google’s stable release channel. All visualizations are generated with ggplot2³⁵(v.3.5.2) and plotly³⁶(v.4.10.4) for interactive 2D graphics, while protein structures are displayed in 3D using the r3dmol³⁷(v.0.1.2) R library. The GPs integrate curated clinical cohorts, functional electrophysiology datasets, and the annotated coding reference set into three user-facing modules (Clinical Overview, Variant Classification, and Research; Fig. 1). These modules provide interactive access to the underlying data resources within a standardized annotation framework. All GPs are publicly available via dedicated domains at https://lalresearchgroup.org.

Clinical and functional data aggregation and harmonization

Curated clinical and functional data form the foundation of the GPs, informing all three user-facing modules: they provide gene-level clinical summaries in the Clinical Overview module, enable phenotype lookups and support ACMG-based interpretation in the Variant Classification module and allow phenotype- and function-based filtering across the Research module.

Curated patient cohorts.

Patient-level data were aggregated from peer-reviewed publications, international registries, clinician-led case series, and institutional databases under formal data-sharing agreements. All datasets were de-identified before integration and were approved by the UTHealth institutional review boards (IRB number: HSC-MS-23-1129). Only cases in which a genetic variant in a target NDD gene and the corresponding clinical phenotypes were documented for the same individual were included. At a minimum, cases included a diagnostic label (e.g., Dravet syndrome). Where available, clinical symptoms such as epilepsy, developmental delay, or movement disorder were annotated, together with quantitative measures such as age at seizure onset. These data elements were captured in standardized formats following established Common Data Elements to ensure comparability across sources. Data were acquired through two models: in some GPs (GRIN, CACNA1A, SLC6A1, SATB2), a designated lead collaborator consolidated case-level information prior to transfer; in others (SCN), multiple collaborators contributed datasets directly, and the GP team performed post hoc aggregation. Across sources, potential duplicate cases were systematically identified and excluded. Newly published cases, participants enrolled through disease-specific registries linked to the GPs, and datasets shared by collaborating groups are incorporated on a rolling basis, reflecting the asynchronous availability of curated data from diverse sources. Once retrieved and quality-controlled, contributed datasets are integrated into the GP infrastructure typically within 1-2 days. Public reference resources, including ClinVar and gnomAD, are updated at fixed 6-month intervals. A detailed overview of the clinical cohorts included in each GP is provided in Supplementary Table 2.

ClinVar submissions.

To complement the curated patient cohorts, which contain detailed harmonized phenotype data, we also incorporated patient-level variants submitted to the public ClinVar database³⁸. ClinVar aggregates variant classifications with a phenotype label from clinical laboratories and researchers but typically provides limited or no standardized phenotype information. Variants (accessed May 2024) were downloaded in tabular format from the public FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/) and restricted to those mapped to the genes represented in the GPs. Each variant was harmonized into standardized VCF format and re-annotated using the unified pipeline described below, ensuring direct comparability with cohort-derived data.

Functional datasets.

Curated functional datasets complement the clinical cohorts and provide experimental insights into the molecular effects of variants. Functional data were aggregated from systematic literature reviews, published studies, and dedicated experimental consortia, and include electrophysiological or molecular readouts from variants tested in heterologous expression systems. Only variants for which the raw experimental readouts could be reviewed were included. Reported assays encompassed patch-clamp electrophysiology for sodium and calcium channels, transporter activity assays for GAT1 encoded by SLC6A1, and a panel of electrophysiological and expression assays for NMDARs encoded by GRIN1, GRIN2A, GRIN2B and GRIN2D. Across all GPs, results were curated into harmonized formats and variants classified into functional categories such as gain-of-function, loss-of-function, mixed, or wild-type–like. These functional datasets represent the largest curated resources of their kind for the included genes and are updated on a rolling basis as new experimental results are generated and shared by collaborators. For variants not yet described in the peer-reviewed literature, a panel of experts outside the institution conducting the functional experiments reviewed the raw electrophysiological data to ensure quality. A detailed overview of the functional data included in each GP is provided in Supplementary Table 3.

Genetic variant standardization and annotation pipeline

To ensure that curated patient cohorts, functionally tested variants, and the full set of all possible missense substitutions can be cross-referenced and jointly analyzed, we developed a unified genetic standardization and annotation pipeline (Fig. 1). Variants not already in VCF format³⁹ were first converted using GeneBe⁴⁰. Each variant was then normalized and annotated using our custom ANNOVAR(v.2023 August)⁴¹ pipeline, which provided HGVS nomenclature at transcript and protein levels, genomic coordinates, exon boundaries, and variant type (e.g., missense, frameshift). Coding sequence alignments (MUSCLE⁴², default parameters) were applied to map variants across isoforms and to equivalent positions in paralogous genes from the same gene family, enabling systematic cross-gene comparisons. Paralogs were defined using HGNC gene-family assignments⁴³ together with downstream subdividing as previously described in Lal et al. 2020¹⁸. The harmonization step provided a standardized backbone for downstream analyses, including cross-cohort comparisons, paralogous residue mapping, and application of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) criteria¹¹.
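The paralogous residue mapping described above can be illustrated with a minimal sketch. It assumes two sequences that have already been aligned (for example by MUSCLE) and are represented as gapped strings; the function names and the toy sequences are hypothetical and are not taken from the GP code base, which is implemented in R.

```python
def residue_to_column(aligned_seq):
    """Map 1-based residue positions to 0-based alignment columns."""
    mapping = {}
    pos = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":          # gaps do not consume a residue position
            pos += 1
            mapping[pos] = col
    return mapping


def map_position(aligned_src, aligned_dst, src_pos):
    """Return (position, residue) in the paralog aligned to src_pos.

    Returns None when the source residue faces a gap in the paralog.
    """
    col = residue_to_column(aligned_src)[src_pos]
    if aligned_dst[col] == "-":
        return None
    # Count non-gap characters up to and including the shared column.
    dst_pos = sum(1 for ch in aligned_dst[: col + 1] if ch != "-")
    return dst_pos, aligned_dst[col]


# Toy aligned fragments (hypothetical, not real GRIN or SCN sequences).
gene_a = "MKT-AGLV"
gene_b = "M-TQAGLV"

print(map_position(gene_a, gene_b, 3))  # residue 3 of gene_a in gene_b
print(map_position(gene_a, gene_b, 2))  # residue 2 faces a gap in gene_b
```

Working through alignment columns rather than raw residue numbers is what allows a variant observed in one gene to be compared with variants at the equivalent position of a paralog, which is the kind of evidence used for criteria such as PS1 and PM5.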

Genome-wide and gene-specific pathogenicity and functional prediction scores.

Each variant was annotated with widely used genome-wide in silico pathogenicity predictors, including CADD(v1.4)¹⁷, REVEL(v1.4)¹⁶, AlphaMissense (Science 2023 release)¹⁵, EVE (Nature 2022 release)⁴⁴, MutPred2 (Nat. Comm. 2020 release)⁴⁵, and SpliceAI(v1.4)⁴⁶. These annotations support the application of computational evidence ACMG/AMP criteria PP3 and BP4, providing in silico predictions indicative of deleterious or benign effects. While four of the five GPs use REVEL as the default predictor for supporting PP3 and BP4, the GRIN portal uses MutPred2 for these criteria in accordance with gene-specific recommendations from the GRIN Variant Curation Expert Panel (VCEP) guidelines⁴⁷. In addition, we annotated a position-level population constraint metric, the missense tolerance ratio⁴⁸, and a paralog conservation score to quantify amino acid conservation within the same gene family¹⁸. Where available, gene-specific prediction scores were also annotated. We annotated the FuncIon⁴⁹ and SCION⁵⁰ prediction scores, which estimate the functional effect of a variant (gain- versus loss-of-function) on voltage-gated sodium channels in the SCN portal. Additionally, we annotated the GRIN portal with NMDAR gene-specific pathogenicity and functional prediction scores²³.

Population frequencies and mutational hotspots.

Population allele frequencies were obtained from gnomAD(v4.1.0)⁶, along with gene-level intolerance metrics such as missense Z-score and pLI score⁵¹. To identify protein regions significantly enriched for pathogenic versus control variants, we applied the established approach to calculate pathogenic variant–enriched regions (PERs) using curated patient missense variants as the pathogenic set and all gnomAD missense variants as controls, with parameters set as described in Pérez-Palma et al., 2019¹⁴. PERs were calculated on the MANE select (Matched Annotation from NCBI and EMBL-EBI)⁵² transcripts for each protein and then mapped to alternative isoforms using the isoform alignments described above. Briefly, missense burden was evaluated in sliding windows of nine amino acids with 50% overlap. For each window, counts of patient and population variants inside versus outside the window were compared using a one-sided Fisher’s exact test (R v4.4.0). Multiple testing correction was performed using a Bonferroni adjustment, accounting for the number of windows across the alignment. Significant windows (adjusted P < 0.05) were merged into contiguous PERs. Identified PERs were used to support the application of the mutational hotspot ACMG/AMP criterion (PM1) across all GPs except the GRIN portal. In addition, for the GRIN portal we calculated structure-informed 3D missense tolerance ratio (3D-MTR) scores by comparing observed and expected missense versus synonymous variation across spatially neighboring residues defined by protein structural coordinates, thereby quantifying local constraint in three-dimensional space⁵³. Intolerant microdomains (iMDs) were defined as contiguous clusters of residues with significantly reduced 3D-MTR values within resolved protein structures and were used to delineate spatial mutational hotspot regions. 
Identified iMDs were incorporated in the GRIN portal, replacing PER-based hotspot definitions for applying the ACMG/AMP PM1 criterion in accordance with gene-specific VCEP guidelines⁴⁷.
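The sliding-window PER calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the analysis was run in R, whereas this sketch uses Python; a step of four residues is assumed as an approximation of 50% overlap for a nine-residue window; the function names are hypothetical; and the variant positions are toy values rather than portal data.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact P for enrichment of the top-left cell.

    Table layout: [[patient_in, patient_out], [control_in, control_out]].
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, k_max + 1))
    return tail / comb(n, col1)


def find_pers(patient_pos, control_pos, protein_len,
              window=9, step=4, alpha=0.05):
    """Return merged (start, end) regions enriched for patient variants."""
    starts = range(1, protein_len - window + 2, step)
    n_tests = len(starts)              # Bonferroni factor: number of windows
    hits = []
    for s in starts:
        e = s + window - 1
        a = sum(s <= p <= e for p in patient_pos)
        c = sum(s <= p <= e for p in control_pos)
        b = len(patient_pos) - a
        d = len(control_pos) - c
        p_adj = min(1.0, fisher_greater(a, b, c, d) * n_tests)
        if p_adj < alpha:
            hits.append((s, e))
    merged = []                        # join overlapping or adjacent windows
    for s, e in hits:
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


# Toy data: patient variants cluster around residues 21-27 of a
# 60-residue protein; control variants are spread evenly.
patient_pos = [5, 21, 22, 22, 23, 24, 24, 25, 26, 26, 27, 50]
control_pos = list(range(1, 61, 2))
pers = find_pers(patient_pos, control_pos, protein_len=60)
print(pers)
```

In this sketch the same variant list serves as both the in-window and out-of-window count, so each window is tested against the rest of the protein, matching the inside-versus-outside comparison described in the text.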

Variant and domain mapping on the protein sequence and structures.

Protein domain annotations were retrieved from the UniProt database⁵⁴ and mapped to the UniProt canonical isoform. Domain positions were cross-referenced with MANE⁵² transcripts to ensure consistent representation across resources. Protein structures were obtained from the Protein Data Bank⁵⁵ (PDB) and the AlphaFold structure server(v2)⁵⁶ for the MANE isoform; for PDB entries, structural coordinates were aligned to the MANE transcript using SIFTS⁵⁷. For visualization within the GP, all variants, including curated patient variants, population variants, and functionally tested variants, were mapped to the MANE Select transcript. This framework enables consistent mapping of variants across protein sequences and three-dimensional structural models.

Reference set of all possible coding SNVs.

As part of the GP design, we created a comprehensive reference set of all possible single-nucleotide variants (SNVs) in the coding regions of each covered gene, ensuring that any potential variant can be queried and annotated within the GPs. For each gene, all protein-coding transcripts (RefSeq NM, GRCh38; Ensembl BioMart Release 114⁵⁸) were retrieved using the biomaRt (v.2.62.1) R package⁵⁹, with the MANE canonical isoform designated and flagged. From these sequences, we systematically generated all possible codon substitutions, yielding the full set of potential synonymous, missense, and nonsense SNVs. Each variant was processed through the unified annotation pipeline (see above), which added HGVS annotations, population frequencies, constraint metrics, functional domains, structural coordinates, and in silico prediction scores. This produced a standardized reference dataset of annotated coding SNVs, enabling consistent variant-level interrogation, comparison across cohorts, and alignment of functional and clinical observations within the broader mutational landscape. Within the VC module, this resource allows users to query any possible SNV and immediately obtain integrated annotations, including semi-automated ACMG/AMP classification (described below). In addition to this SNV reference backbone, the GPs support recognition and interpretation of selected non-SNV variant types at query time. Insertions and deletions within protein-coding regions are parsed to determine coding consequences, including in-frame versus frameshift effects, and exon-level deletions are detected, enabling the automated application of ACMG/AMP criteria such as PVS1. For non-SNV variants queried at the cDNA level, population allele frequencies are retrieved from gnomAD where available, and key attributes, including affected exons and predicted loss-of-function status, are reported. Noncoding variants and splice-site variants outside the canonical ±5 bp window are currently not considered. 
The framework is designed to support backbone-level annotation of additional non-SNV variant types in future releases.
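Generating the SNV reference set can be sketched as an enumeration over codons. The example below assumes the standard genetic code and an in-frame coding sequence; the function name, the output tuple layout, and the nine-base toy sequence are hypothetical. Changes at a stop codon are labeled `stop_lost` here, a category the text does not name, and start-codon changes are simply reported as missense.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def enumerate_snvs(cds):
    """Yield every single-nucleotide substitution in an in-frame CDS.

    Each item is (cdna_pos, ref, alt, codon_number, ref_aa, alt_aa,
    consequence), with 1-based cDNA and codon numbering.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        ref_aa = CODON_TABLE[codon]
        for offset in range(3):
            for alt in "ACGT":
                if alt == codon[offset]:
                    continue
                alt_codon = codon[:offset] + alt + codon[offset + 1:]
                alt_aa = CODON_TABLE[alt_codon]
                if alt_aa == ref_aa:
                    consequence = "synonymous"
                elif alt_aa == "*":
                    consequence = "nonsense"
                elif ref_aa == "*":
                    consequence = "stop_lost"
                else:
                    consequence = "missense"
                yield (i + offset + 1, codon[offset], alt,
                       i // 3 + 1, ref_aa, alt_aa, consequence)


# Toy coding sequence (Met-Gly-Trp), not a real transcript.
snvs = list(enumerate_snvs("ATGGGCTGG"))
counts = Counter(record[-1] for record in snvs)
print(len(snvs), dict(counts))
```

Each of the three positions in a codon admits three alternative bases, so a coding sequence of n bases yields 3n candidate SNVs, all of which can then be passed through the annotation pipeline.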

Development of an automated ACMG/AMP classification

To operationalize variant classification within the GP, we implemented automated ACMG/AMP criteria assignment with explicit transparency and user control. Criteria were either automatically applied based on available annotations, assigned by user input, or left modifiable. Up to 19 ACMG/AMP criteria (PVS1, PS1, PS2, PS3, PS4, PM1, PM2, PM4, PM5, PM6, PP1, PP2, PP3, PP4, BA1, BS1, BS3, BP4, BP5) are applied based on available data and user input. Quantitative and database-driven criteria were prefilled directly from the harmonized annotations, as detailed in Supplementary Table 4. In brief, PM1 was derived from pathogenic variant–enriched regions (PERs) or intolerant 3D microdomains (GRIN portal); PM2, BA1, and BS1 from gnomAD allele frequencies; PP3 and BP4 from computational prediction scores; PS3 and BS3 from curated functional data; PS1, PS4 and PM5 from curated patient variants and aggregate variant classifications in ClinVar; PVS1 from the annotated variant type and predicted effect (e.g., premature stop, frameshift, canonical splice site); PM4 from variant consequence (e.g., in-frame insertion/deletion); and PP2 from missense constraint at the gene or region level (see the GRIN VCEP specifications). Criteria requiring user input (e.g., PS2/PM6 for de novo status, PP4 for phenotype specificity, or segregation evidence) were user-selectable, with embedded guidance provided to support consistent decision-making. For all other non–data-driven or non–user-prompted criteria, users could still manually adjust application and strength levels. Where available, ClinGen VCEP⁶⁰ gene-specific specifications were incorporated to refine evidence thresholds and criteria application. The application of VCEPs is documented in each GP where applicable. Outputs are generated in real time as a five-tier classification (Pathogenic, Likely Pathogenic, VUS, Likely Benign, Benign), with full visibility of applied criteria, rationales, and thresholds.
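How a set of applied criteria could be combined into a five-tier call is sketched below. The text does not state which combining scheme the GPs use, so this example assumes the combining rules of the original 2015 ACMG/AMP guideline and takes each criterion at the default strength implied by its code prefix; it ignores user-modified strengths and VCEP-specific adjustments. The function name and the example criteria sets are hypothetical.

```python
import re
from collections import Counter


def classify(criteria):
    """Combine ACMG/AMP criteria codes (e.g. 'PS3', 'PM2') into a tier.

    Assumes default strengths by prefix: PVS very strong, PS strong,
    PM moderate, PP supporting, BA stand-alone, BS strong, BP supporting.
    """
    n = Counter(re.match(r"[A-Z]+", code).group() for code in criteria)
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2)
                         or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Simplification: any contradictory pathogenic and benign evidence
    # is reported as VUS, including cases with a stand-alone criterion.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "VUS"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely Pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely Benign"
    return "VUS"


# Hypothetical criteria sets, not taken from any portal report.
for example in (["PS2", "PS3", "PM2"], ["PS3", "PM1", "PM2", "PP3"],
                ["PM2", "PP3"], ["BS1", "BP4"], ["BA1"]):
    print(example, "->", classify(example))
```

Keeping the combination step separate from criterion assignment mirrors the design described in the text: criteria can be prefilled from data or adjusted by the user, and the tier is recomputed from whatever set is currently applied.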

Gene Portal-specific customizations

In addition to the core framework implemented across all GPs, each GP was customized to reflect the available data types and community-specific resources. Visualizations in the Clinical Overview and Research sections were adapted to reflect the underlying data structure and the completeness of clinical and functional datasets for each gene. Educational materials were integrated into the GPs in collaboration with the respective family foundations. Short explanatory videos describing the gene or gene family–specific disorders were produced using whiteboard-style animation (VideoScribe v3.9.5, Sparkol 2012; https://www.videoscribe.co/en/download/) and embedded directly within each GP. To improve accessibility and community engagement, each GP also provides direct links to corresponding patient organizations and, where applicable, to active patient registries (e.g., the GRIN portal).

User Survey

To evaluate the usability, functionality, and impact of the GPs on variant classification, we conducted a user survey using the REDCap platform⁶¹. The survey assessed the utility of GPs in interpreting variants in voltage-gated sodium channel genes, NMDAR-encoding genes, SLC6A1, and CACNA1A, particularly for non-experts. Participation was voluntary, and individuals retained the right to withdraw at any time. Researchers and clinicians with prior genetic knowledge were recruited through in-person outreach at scientific conferences and via email invitations sent to collaborating research centers and department heads. Survey responses were collected between July 6 and August 31, 2024. The survey consisted of five parts: (1) eleven questions assessing demographics and prior experience in variant classification; (2) an initial task to classify two randomly assigned variants (from a pool of four) without using the GP, including documentation of the applied ACMG/AMP criteria and final classification for each variant; (3) viewing a tutorial video introducing the GPs (https://www.youtube.com/watch?v=BObzR8qzeE4); (4) reevaluation of the same two variants using the respective GP; and (5) a follow-up questionnaire comprising 16 questions on usability, accessibility, perceived utility, and suggestions for improvement. The REDCap dictionary of the full survey is available as Supplementary Table 5. The study protocol (IRB: HSC-MS-24-0309) adhered to ethical research standards, including obtaining informed consent, maintaining confidentiality, and providing the right to withdraw.

Supplementary Material

Supplement 1

media-1.xlsx (19.1 KB, xlsx)

Supplement 2

media-2.xlsx (13 KB, xlsx)

Supplement 3

media-3.xlsx (11.1 KB, xlsx)

Supplement 4

media-4.xlsx (14.1 KB, xlsx)

Supplement 5

media-5.xlsx (31.9 KB, xlsx)

Supplement 6

media-6.pdf (1.1 MB, pdf)

Acknowledgement

We gratefully acknowledge Felicia Mermer, Maina Kava, Thomas Balslev, Marc Engelen, Marwan Shinawi, Katherine A. Bosanko, Anne Ducros, Kristin Baranano, Elisabetta Indelicato, Julia Koh, Sooyeon Jo, Anna Abuli Vidal, and Line Futtrup for their valuable contributions to the development, refinement, and community integration of the GPs. We thank the individuals and families who participated in research studies and generously provided feedback that informed the design and functionality of the GPs. We further acknowledge the support of all patient advocacy organizations (Supplementary Table 1) and community partners whose sustained collaboration enabled cross-GP dissemination and implementation of this framework.

Funding

Funding for this work was provided by the German Federal Ministry for Education and Research (BMBF, Treat-ION, 01GM1907D) to D.L., T.B., and P.M.; by the BMBF (Treat-Ion2, 01GM2210B, 01GM2210A) to P.M. and H.L.; by the Fonds National de la Recherche in Luxembourg (FNR, Research Unit FOR-2715, INTER/DFG/21/16394868 MechEPI2) to P.M.; by the Chilean National Agency for Research and Development (ANID, Fondecyt grant 1221464) to E.P.P.; by the Dravet Syndrome Foundation (grant number 272016) to D.L.; by the NIH NINDS (Channelopathy-Associated Epilepsy Research Center, U54-NS108874) to A.L.G., J.Q.P., C.G.V., I.H., and D.L.; by the Agence Nationale de la Recherche, France (Initiative of Excellence Université Côte d’Azur, ANR-15-IDEX-01) to S.C. and M.M.; by the MRC (MR/T002506/1) to M.F.; and by the CureGRIN Foundation to M.F. Further support was provided by the NIH-NINDS (NS111619 to S.F.T.), the NIH-NIMH (MH127404 to H.Y.), the NICHD (HD082373 to H.Y.), the GRIN2B Foundation (H.Y.), GRIN Therapeutics (S.F.T. and S.J.M.), Austin’s Purpose (S.F.T.), SFARI (732132 to S.F.T.), the University Research Committee (Emory URC to H.Y.), Imagine, Innovate and Impact (I³) Awards from the Emory University School of Medicine and the Georgia CTSA NIH award (UL1-TR002378; H.Y.), the National Institutes of Health grants S10MH133644 (J.Q.P.), NS108874 (J.Q.P.), MH131719 (J.Q.P.), and MH129722-02 (M.D.), the Stanley Center for Psychiatric Research (J.Q.P. and M.D.), the CACNA1A Foundation (J.Q.P.), and the Ladders to Cures Scientific Accelerator of the Broad Institute of MIT and Harvard (J.Q.P.). The CACNA1A portal development was funded in part by the Chan Zuckerberg Initiative Rare as One grant. A.P. received support for Simons Searchlight from the Simons Foundation.

Footnotes

Availability of data and materials

Data can be accessed within each GP; downloadable materials are described in the repository, while patient-level data are viewable in the GPs and available upon request from the corresponding author. The code to reproduce the GPs is available in the GitLab repository (https://gitlab.com/neurogenetics/geneportals).

Conflicts of interest

A. Brunklaus has received honoraria for presenting at educational events, advisory boards, and consultancy work for Biocodex, Encoded Therapeutics, Jazz Pharma, Servier, Stoke Therapeutics, and UCB. S. Boesch has served as a consultant for VICO Therapeutics, Reata Pharmaceuticals, and Biogen; has participated on advisory boards for Biogen, Reata Pharmaceuticals, and Biohaven; and has received honoraria from Ipsen, Merz, Reata Pharmaceuticals, and Biogen. A.W., M.T., and S.F. are current or past employees of BioMarin Pharmaceutical Inc. K. Johannesen is on the advisory board of SLC6A1 Connect Europe. SY.B. is a member of the European Reference Network for Rare Neurological Diseases (Project ID No. 739510).

Ethics approval and consent to participate

Use of de-identified patient-level data integrated from institutional sources was approved by the UTHealth Institutional Review Board (IRB HSC-MS-23-1129). The survey evaluating the impact of the Gene Portals among medical professionals was approved under a separate UTHealth IRB protocol (HSC-MS-24-0309). Published and registry-derived data were used in accordance with the originating source terms and applicable governance.

References

1. Brunklaus A. et al. Gene variant effects across sodium channelopathies predict function and guide precision therapy. Brain awac006 (2022) doi: 10.1093/brain/awac006.
2. Backwell L. & Marsh J. A. Diverse Molecular Mechanisms Underlying Pathogenic Protein Mutations: Beyond the Loss-of-Function Paradigm. Annu Rev Genomics Hum Genet 23, 475–498 (2022).
3. Strehlow V. et al. GRIN2A-related disorders: genotype and functional consequence predict phenotype. Brain 142, 80–92 (2019).
4. Stenson P. D. et al. HGMD Professional Variant Dataset, Release 2024.2. Dataset. QIAGEN https://www.hgmd.cf.ac.uk/ (2024).
5. Landrum M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–985 (2014).
6. Karczewski K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
7. Firth H. V. et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am. J. Hum. Genet. 84, 524–533 (2009).
8. Esposito D. et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biology 20, 223 (2019).
9. Kopanos C. et al. VarSome: the human genomic variant search engine. Bioinformatics 35, 1978–1980 (2019).
10. Franklin. https://franklin.genoox.com/clinical-db/home.
11. Richards S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17, 405–423 (2015).
12. Brünger T. et al. Conserved missense variant pathogenicity and correlated phenotypes across paralogous genes. Genome Biol 26, 197 (2025).
13. Sodium channel VCEP. Epilepsy Sodium Channel Variant Curation Expert Panel. (2024).
14. Pérez-Palma E. et al. Identification of Pathogenic Variant Enriched Regions across Genes and Gene Families. http://biorxiv.org/lookup/doi/10.1101/641043 (2019) doi: 10.1101/641043.
15. Cheng J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
16. Ioannidis N. M. et al. REVEL: An Ensemble Method for Predicting the Pathogenicity of Rare Missense Variants. Am J Hum Genet 99, 877–885 (2016).
17. Rentzsch P., Witten D., Cooper G. M., Shendure J. & Kircher M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Research 47, D886–D894 (2019).
18. Lal D. et al. Gene family information facilitates variant interpretation and identification of disease-associated genes in neurodevelopmental disorders. Genome Med 12, 28 (2020).
19. Traynelis J. et al. Optimizing genomic medicine in epilepsy through a gene-customized approach to missense variant interpretation. Genome Res. 27, 1715–1729 (2017).
20. Köhler S. et al. The Human Phenotype Ontology in 2017. Nucleic Acids Res 45, D865–D876 (2017).
21. GRIN Disorders Variant Curation Expert Panel - ClinGen | Clinical Genome Resource. https://clinicalgenome.org/affiliation/50078/.
22. Myers S. J. et al. Classification of missense variants in the N-methyl-d-aspartate receptor GRIN gene family as gain- or loss-of-function. Hum Mol Genet 32, 2857–2871 (2023).
23. Montanucci L. et al. Ligand distances as key predictors of pathogenicity and function in NMDA receptors. Hum Mol Genet 34, 128–139 (2025).
24. Tejura M. et al. Calibration of variant effect predictors on genome-wide data masks heterogeneous performance across genes. The American Journal of Human Genetics 111, 2031–2043 (2024).
25. Berg A. T. et al. Expanded clinical phenotype spectrum correlates with variant function in SCN2A-related disorders. Brain 147, 2761–2774 (2024).
26. VCEP Group. ClinGen Variant Curation Expert Panel (VCEP) Protocol v.11. (2023).
27. Horowitz K. et al. Enhancing variant of uncertain significance (VUS) interpretation in neurogenetics: collaborative experiences from a tertiary care centre. J Med Genet 62, 37–45 (2024).
28. Lemke J. R. et al. GRIN2A null variants confer a high risk for early-onset schizophrenia and other mental disorders and potentially enable precision therapy. Mol Psychiatry 31, 374–382 (2026).
29. Camp C. R. et al. Loss of Grin2a causes a transient delay in the electrophysiological maturation of hippocampal parvalbumin interneurons. Commun Biol 6, 952 (2023).
30. Krey I. et al. L-Serine Treatment is Associated with Improvements in Behavior, EEG, and Seizure Frequency in Individuals with GRIN-Related Disorders Due to Null Variants. Neurotherapeutics 19, 334–341 (2022).
31. Brunklaus A. et al. Development and Validation of a Prediction Model for Early Diagnosis of SCN1A-Related Epilepsies. Neurology 98, e1163–e1174 (2022).
32. Gallagher D. et al. Genotype-phenotype associations in 1018 individuals with SCN1A-related epilepsies. Epilepsia 65, 1046–1059 (2024).
33. Stefanski A. et al. SLC6A1 variant pathogenicity, molecular function and phenotype: a genetic and clinical analysis. Brain 146, 5198–5208 (2023).
34. Merkel D. Docker: lightweight linux containers for consistent development and deployment. Linux Journal 2014, (2014).
35. Wickham H. Ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag, New York, 2016).
36. Sievert C. Interactive Web-Based Data Visualization with R, Plotly, and Shiny. (Chapman and Hall/CRC, New York, 2020).
37. Rego N. & Koes D. 3Dmol.js: molecular visualization with WebGL. Bioinformatics 31, 1322–1324 (2015).
38. Landrum M. J. & Kattman B. L. ClinVar at five years: Delivering on the promise. Human Mutation 39, 1623–1630 (2018).
39. Danecek P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
40. Stawiński P. & Płoski R. Genebe.net: Implementation and validation of an automatic ACMG variant pathogenicity criteria assignment. Clin Genet 106, 119–126 (2024).
41. Wang K., Li M. & Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
42. Edgar R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32, 1792–1797 (2004).
43. Seal R. L. et al. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res 51, D1003–D1009 (2023).
44. Frazer J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
45. Pejaver V. et al. Inferring the molecular and phenotypic impact of amino acid variants with MutPred2. Nature Communications 11, 5918 (2020).
46. Jaganathan K. et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell 176, 535–548.e24 (2019).
47. GRIN Disorders Variant Curation Expert Panel - ClinGen | Clinical Genome Resource. https://clinicalgenome.org/affiliation/50078/.
48. Silk M., Petrovski S. & Ascher D. B. MTR-Viewer: identifying regions within genes under purifying selection. Nucleic Acids Res 47, W121–W126 (2019).
49. Heyne H. O. et al. Predicting functional effects of missense variants in voltage-gated sodium and calcium channels. Sci Transl Med 12, (2020).
50. Boßelmann C. M., Hedrich U. B. S., Lerche H. & Pfeifer N. Predicting functional effects of ion channel variants using new phenotypic machine learning methods. PLoS Comput Biol 19, e1010959 (2023).
51. Lek M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
52. Morales J. et al. Matched Annotation from NCBI and EMBL-EBI (MANE) Select & Plus Clinical Transcript Set, Release v1.4. Dataset. NCBI/EMBL-EBI https://ftp.ncbi.nlm.nih.gov/refseq/MANE/MANE_human_release_v1.2 (2024).
53. Perszyk R. E., Kristensen A. S., Lyuboslavsky P. & Traynelis S. F. Three-dimensional missense tolerance ratio analysis. Genome Res 31, 1447–1461 (2021).
54. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Research 47, D506–D515 (2019).
55. Berman H. M. et al. The Protein Data Bank. Nucleic Acids Res 28, 235–242 (2000).
56. Varadi M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research 50, D439–D444 (2022).
57. Dana J. M. et al. SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins. Nucleic Acids Research 47, D482–D489 (2019).
58. Kinsella R. J. et al. Ensembl BioMart: Human Gene Paralogues (GRCh38) Dataset, Release 114. Dataset. EMBL-EBI https://www.ensembl.org/biomart/ (2025).
59. Durinck S., Spellman P. T., Birney E. & Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 4, 1184–1191 (2009).
60. Preston C. G. et al. ClinGen Variant Curation Interface: a variant classification platform for the application of evidence criteria from ACMG/AMP guidelines. Genome Med 14, 6 (2022).
61. Harris P. A. et al. Research electronic data capture (REDCap)--a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform 42, 377–381 (2009).
