Author manuscript; available in PMC: 2023 Mar 1.
Published in final edited form as: Am J Med Genet A. 2021 Nov 23;188(3):735–750. doi: 10.1002/ajmg.a.62565

Quantitative dissection of multilocus pathogenic variation in an Egyptian infant with severe neurodevelopmental disorder resulting from multiple molecular diagnoses

Isabella Herman 1,2,3,*, Angad Jolly 2,4,*, Haowei Du 2, Moez Dawood 2,4,5, Ghada M H Abdel-Salam 6, Dana Marafi 2,8,7, Tadahiro Mitani 2, Daniel G Calame 1,2,3, Zeynep Coban-Akdemir 2,8, Jawid M Fatih 2, Ibrahim Hegazy 6, Shalini N Jhangiani 2,5, Richard A Gibbs 2,5, Davut Pehlivan 1,2,3, Jennifer E Posey 2, James R Lupski 2,3,5,9
PMCID: PMC8837671  NIHMSID: NIHMS1753700  PMID: 34816580

Abstract

Genomic sequencing and clinical genomics have demonstrated that substantial subsets of atypical and/or severe disease presentations result from multilocus pathogenic variation (MPV) causing blended phenotypes. Using the Human Phenotype Ontology (HPO), we dissected the blended phenotype of an infant with severe neurodevelopmental disorder, brain malformation, dysmorphism, and hypotonia found by exome sequencing (ES) to have four distinct molecular diagnoses. ES revealed variants in CAPN3 (c.259C>G:p.L87V), MUSK (c.1781C>T:p.A594V), NAV2 (c.1996G>A:p.G666R), and ZC4H2 (c.595A>C:p.N199H). CAPN3, MUSK, and ZC4H2 are established disease genes linked to limb-girdle muscular dystrophy (OMIM# 253600), congenital myasthenia (OMIM# 616325), and Wieacker-Wolff syndrome (OMIM# 314580), respectively. NAV2 is a retinoic-acid responsive novel disease gene candidate with biological roles in neurite outgrowth and cerebellar dysgenesis in mouse models. Using semantic similarity, we show that no gene identified by ES individually explains the proband phenotype, but rather the totality of the clinically observed disease is explained by the combination of disease-contributing effects of the identified genes. These data reveal that multilocus pathogenic variation can result in a blended phenotype with each gene affecting a different part of the nervous system and nervous system-muscle connection. We provide evidence from this n =1 study that in patients with MPV and complex blended phenotypes resulting from multiple molecular diagnoses, quantitative HPO analysis can allow for dissection of phenotypic contribution of both established disease genes and novel disease gene candidates not yet proven to cause human disease.

Keywords: multilocus pathogenic variation, blended phenotype, multiple molecular diagnosis, HPO, neurogenetic

INTRODUCTION

Clinical genomics and genomic sequencing have allowed for unprecedented rates of molecular diagnoses in even the most complex patients (Lupski et al., 2020). Novel disease gene discovery is accelerating rapidly with no end in sight, given that pathogenic variation in only ~5,000 of the predicted ~20,000 computationally annotated protein-coding genes in the haploid human reference genome has been demonstrated to cause a human disease trait to date (Amberger et al., 2019). Therefore, characterizing potential human disease traits due to pathogenic variation in each of the remaining ~15,000 genes could tremendously illuminate disease biology (Lupski et al., 2020; Posey et al., 2019).

With increasing knowledge about molecular and genetic causes of human disease traits, including novel disease gene candidates and distinct genetic mechanisms within single genes, came the unanticipated realization that some complex clinical patient phenotypes can be due to multiple molecular diagnoses originating from multilocus pathogenic variation (MPV, Figure 1a) (Balci et al., 2017; Karaca et al., 2018; Posey et al., 2016; Posey et al., 2017; Posey et al., 2019; Yang et al., 2013). Approximately 5% of cases with a molecular diagnosis identified by clinical exome sequencing have been found to have more than one actionable molecular diagnosis, suggesting that the contribution of MPV to clinically relevant disease phenotypes is significant.

Figure 1. Quantitative analysis using the Human Phenotype Ontology (HPO) allows dissection of complex disease-gene and gene-disease associations.

a) Example of multilocus pathogenic variation of two genes A and B in asymptomatic carrier parents that results in dual molecular diagnosis and two autosomal recessive diseases in the proband. These two disease phenotypes can be expressed concurrently or occur sequentially over time. b) Pathogenic variation in separate genes can result in the same disease. c) Pathogenic variation in a single gene can result in multiple distinct disease traits. d) Use of HPO terms to describe a patient’s phenotype allows for standardized terminology and database annotation for genes and diseases. e) Phenotypic similarity score using HPO terms is used to analyze patient phenotype-gene associations. Checkmark-presence of phenotypic feature, X-absence of phenotypic feature. f) Similarly, phenotypic similarity scores using HPO terms can be used for analysis of patient phenotype-disease associations.

MPV, a form of mutational burden, is an emerging concept explaining how two or more distinct or overlapping Mendelian disorders can occur either concurrently or develop sequentially over time, leading to blended phenotypes (Jehee et al., 2017; Posey et al., 2016; Posey et al., 2017; Posey et al., 2019; Potocki et al., 1999). MPV may also underlie digenic inheritance, wherein pathogenic variation at two loci is required for manifestation of a single disease trait. In each case, MPV confers high penetrance of one (digenic inheritance), two (dual molecular diagnoses), or more (multiple molecular diagnoses) conditions. This contrasts with other multilocus, or oligogenic, disease models wherein a small number of minor-effect alleles might aggregate to influence phenotypic expression.

Moreover, the discovery of MPV demonstrated that phenotypic expansion, i.e., expansion of the phenotype beyond the core constituent features defining an established Mendelian disease trait, may actually originate from MPV at a combination of two or more distinct gene loci (Balci et al., 2017; Karaca et al., 2018; Posey et al., 2017; Posey et al., 2019). Additionally, the degree of rare variant mutational burden within genes has implications for phenotypic variability among affected individuals, and has been shown to explain differences in phenotypic severity between affected siblings (Gonzaga-Jauregui et al., 2015).

Multiple molecular diagnoses occur in at least one out of every 20 diagnostically informative exomes (Posey et al., 2017), with higher MPV rates for particular traits in distinct populations (Karaca et al., 2018; Pehlivan et al., 2019; Tarailo-Graovac et al., 2016). The clinical distinction between phenotypic expansion and MPV is frequently difficult to discern given the phenotypic overlap of many established Mendelian disease traits and the nonspecific clinical characteristics, e.g., developmental delay/intellectual disability (DD/ID), observed in some patients (Posey et al., 2019; Smith et al., 2019). Additionally, one disease phenotype, such as Charcot-Marie-Tooth neuropathy, can result from pathogenic variation in multiple genes (i.e., genetic or locus heterogeneity) (Lupski et al., 2010), and variation in one gene can lead to different disease traits (allelic affinity) (Figure 1b, c) (Inoue et al., 2004). Similarly, it is challenging to determine the phenotypic contribution of individual gene variants in patients with MPV, and even more so when pathogenic variation is identified in both known disease genes and novel disease gene candidates.

To facilitate objective phenotypic analysis, the Human Phenotype Ontology (HPO) provides a structured language and comprehensive bioinformatic resource to bridge genome biology (i.e. genotype) and clinical medicine (i.e. phenotype; Figure 1d) (Kohler et al., 2017). Here, we perform phenotypic dissection using semantic similarity in an n=1 study of a patient with a severe neurodevelopmental disorder found by exome sequencing (ES) to harbor MPV in three known disease genes (CAPN3, MUSK, ZC4H2) and one novel disease gene candidate (NAV2). These data provide evidence that the crosstalk between robust clinical phenotyping and unbiased bioinformatics approaches has the potential to ascertain a more comprehensive picture of disease traits stemming from MPV.

METHODS

This study was approved by the Institutional Review Board at Baylor College of Medicine (Protocol # H-29697) with informed consent obtained from the family, including broad consent for re-contacting, specimen studies, public database data sharing, publishing in a scientific journal, and publishing photographs relevant to phenotypes.

Exome sequencing

Trio ES was performed at the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC) as previously described (Eldomery et al., 2017; Karaca et al., 2018). Genomic DNA was obtained from peripheral blood from the affected proband and both parents. Sequencing was performed using an Illumina dual indexed, paired-end, pre-capture library per manufacturer protocol with modifications as described in the BCM-HGSC protocol (https://www.hgsc.bcm.edu/content/protocols-sequencing-library-construction). Libraries were pooled and hybridized to the HGSC VCRome 2.1 (Bainbridge et al., 2011) plus custom Spike-In design according to the manufacturer’s protocol (NimbleGen) with minor revisions. Paired-end sequencing was performed with the Illumina NovaSeq6000 platform with a sequencing yield of 12.2 Gb. Ninety-eight percent of targeted exome bases were sequenced to a depth of 20x or greater in all samples. The average depth of coverage for all three samples was 119x. Illumina sequence analysis was performed using the HGSC HgV analysis pipeline (Challis et al., 2012; Reid et al., 2014), which moves data through various analysis tools from initial sequence generation on the instrument to annotated variant calls (SNPs and intra-read in/dels). In parallel to the exome workflow, a SNP Trace panel was generated for final quality assessment. This included orthogonal confirmation of sample identity and purity using the Error Rate In Sequencing (ERIS) pipeline developed at the HGSC. Using an “e-GenoTyping” approach, ERIS screens all sequence reads for exact matches to probe sequences defined by the variant and position of interest. A successfully sequenced sample must meet quality control metrics of ERIS SNP array concordance (>90%).

Rare variant family-based exome analysis was performed as previously described to identify potentially causative variants based on predetermined criteria including minor allele frequencies, evolutionary conservation, in silico prediction tools (e.g., CADD, REVEL, SIFT, MutationTaster, among others), protein modeling using RaptorX (Kallberg et al., 2012), gene interactions, as well as deep phenotyping data using the proband’s clinical history and exam findings (Eldomery et al., 2017) as characterized in phenoDB (Wohler et al., 2021). Identified variants were orthogonally confirmed and segregated via Sanger dideoxy DNA sequencing.

Absence of heterozygosity (AOH)

BafCalculator (https://github.com/BCM-Lupskilab/BafCalculator) (Karaca et al., 2018), an in-house developed bioinformatic tool that extracts the calculated B-allele frequency (ratio of variant reads/total reads) from unphased exome data, was used to calculate genomic intervals and total genomic content of absence of heterozygosity (AOH) intervals as a surrogate measure for runs of homozygosity (ROH) reflecting identity-by-descent (IBD). B-allele frequency was transformed by subtracting 0.5 and taking the absolute value for each data point before being processed by circular binary segmentation using the DNAcopy R Bioconductor package. The estimated coefficient of inbreeding values from ROH were calculated as the ratio of the sum of AOH genomic intervals >1.5 Mb in size to the length of the autosomal genome (3,100 Mbp) (Genome Reference Consortium).
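
The two arithmetic steps described above (folding the B-allele frequency around 0.5, and dividing summed AOH length by the autosomal genome length) can be illustrated with a minimal Python sketch. This is not the BafCalculator implementation: the function names are hypothetical, the interval representation in Mb is an assumption, and the circular binary segmentation step performed with DNAcopy in R is deliberately omitted.

```python
from typing import Iterable, List, Tuple

AUTOSOMAL_GENOME_MB = 3100.0  # denominator stated in the text
MIN_AOH_MB = 1.5              # interval size threshold stated in the text


def transform_baf(baf_values: Iterable[float]) -> List[float]:
    """Fold B-allele frequencies around 0.5, i.e. |BAF - 0.5|."""
    return [abs(b - 0.5) for b in baf_values]


def inbreeding_coefficient(aoh_intervals_mb: Iterable[Tuple[float, float]]) -> float:
    """Sum AOH intervals longer than MIN_AOH_MB and divide by genome length.

    Each interval is a hypothetical (start_mb, end_mb) pair, assumed to come
    from a segmentation step that is not shown here.
    """
    total = sum(
        end - start
        for start, end in aoh_intervals_mb
        if (end - start) > MIN_AOH_MB
    )
    return total / AUTOSOMAL_GENOME_MB
```

Heterozygous sites (BAF near 0.5) map to values near 0 after the transform, whereas homozygous sites (BAF near 0 or 1) map to values near 0.5, which is what allows segmentation to separate AOH from non-AOH regions.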

Human Phenotype Ontology-based analysis of phenotypic overlap

After identification of potentially causative variant alleles by family-based genomics with rare variant analysis, BH14188’s phenotype was annotated with HPO terms for comparison against known disease phenotypes (n = 12,045) and disease-associated gene phenotypes (n = 4,460) (Figure 1d) (Liu et al., 2019; Robinson et al., 2008). Online Mendelian Inheritance in Man (OMIM) and Orphanet HPO annotated phenotypes for congenital diseases and disease-associated genes were downloaded via links provided at https://hpo.jax.org/app/download/annotation. The OntologyX suite (Greene et al., 2017) of R packages was used to compare the proband HPO term set to OMIM (https://omim.org/; Amberger et al., 2015; Hamosh et al., 2005; McKusick 2007) and Orphanet (Orphanet 1997) annotated gene and disease HPO sets, excluding ancestral terms. The patient HPO x gene HPO matrix and the patient HPO x disease HPO matrix were generated using Lin's semantic similarity score with the average method and rank-ordered by Lin phenotypic similarity score (He and Lin, 2016). These parallel analyses were performed because disease-associated genes may represent HPO term sets for multiple diseases (e.g., nonsynonymous, truncating, and null variants may have different phenotypic spectra). Parallel analyses therefore allow the characterization of previously undescribed allelic types in a known disease-associated gene, or the match of a proband phenotype to a known disease caused by a particular allelic type rather than to the combination of diseases that may be associated with a gene. This is done by comparing the match results from the parallel disease and disease-associated gene HPO term set analyses to the proband phenotype. These analyses may also help tease out other factors relevant to phenotypic expression, such as sex-limited traits, because the sex-specific spectra are only represented in disease HPO term sets, whereas a known disease-associated gene HPO term set would include traits for both sexes.
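
To make the scoring step concrete, the following Python sketch computes Lin similarity between two terms, 2 x IC(most informative common ancestor) / (IC(a) + IC(b)), and combines term-level scores into a set-level score. It is an illustration only: the study used the OntologyX R packages, the toy ontology and annotations below are hypothetical and are not HPO content, and the symmetric best-match average shown here is one plausible reading of the "average method", not a confirmed reproduction of it.

```python
import math
from typing import Dict, Set

# Hypothetical toy ontology (child -> parents); not real HPO structure.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "neuro": {"root"},
    "muscle": {"root"},
    "hypotonia": {"neuro"},
    "seizure": {"neuro"},
    "ptosis": {"muscle"},
}

# Hypothetical annotations (entity -> terms), used only to estimate frequencies.
ANNOTATIONS: Dict[str, Set[str]] = {
    "d1": {"hypotonia"},
    "d2": {"seizure"},
    "d3": {"ptosis"},
    "d4": {"seizure", "ptosis"},
}


def ancestors(term: str) -> Set[str]:
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content() -> Dict[str, float]:
    """IC(t) = log(N / n_t), with n_t entities annotated to t or a descendant."""
    counts = {t: 0 for t in PARENTS}
    for terms in ANNOTATIONS.values():
        closure: Set[str] = set()
        for t in terms:
            closure |= ancestors(t)
        for t in closure:
            counts[t] += 1
    n = len(ANNOTATIONS)
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


IC = information_content()


def lin(a: str, b: str) -> float:
    """Lin similarity between two terms (both assumed to have a defined IC)."""
    mica_ic = max(IC[t] for t in ancestors(a) & ancestors(b))
    denom = IC[a] + IC[b]
    return 2.0 * mica_ic / denom if denom > 0 else 0.0


def set_similarity(xs: Set[str], ys: Set[str]) -> float:
    """Symmetric best-match average between two non-empty term sets."""
    def directed(src: Set[str], dst: Set[str]) -> float:
        return sum(max(lin(s, d) for d in dst) for s in src) / len(src)
    return 0.5 * (directed(xs, ys) + directed(ys, xs))
```

In this toy example, two terms that share only the root score 0, identical terms score 1, and terms sharing an intermediate ancestor fall in between.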

The top scoring disease (p < 0.005) and disease-associated gene (p < 0.01) phenotypic matches were used to visualize the described phenotypic spectra most like that of the proband. The clustering of these similar phenotypes represents sub-clusters of phenotypically similar diseases within which the proband phenotype best matches. The clustering of the proband to a particular category suggests that among this related group of diseases (i.e., a molecular differential) the proband phenotype fits best within that cluster. This analysis is focused on identifying the disease-associated gene phenotype that is most impactful or similar to the proband phenotype, and is supplemented by analyses for multilocus pathogenic variation. It is also supported by cross reference to parsed rare, predicted deleterious variants present in the proband variant call file to identify molecular diagnos(is/es). Analysis considering multilocus pathogenic variation (MPV), described above, then assists in determining whether the category represented by a single disease trait/entity associated with a gene/variant allele, rather than a combination of disease-associated genes/blended phenotypes, is potentially a more evidence-based representation of the proband phenotype.

Patient similarity scores were visualized using the ComplexHeatmap package, and statistical analysis of patient groups was done using the OntologyX suite (Gu et al., 2016). Two distinct comparisons were performed: a patient-to-disease phenotype comparison, and a patient-to-gene phenotype comparison (Figure 1e-f). The top 200 phenotypic similarity score matches were tested for p-value by comparison of the similarity score between the proband and the match with 100,000 randomly selected groups of 2 from the disease term sets (disease match analysis) or disease-associated gene phenotype term sets (gene match analysis). P-values were used to define a set of related diseases (p < 0.005) or disease-associated genes (p < 0.01), which were visualized by hierarchical agglomerative clustering (HAC) and heatmap generation. HAC was performed on distance matrices using the Ward method (Ward 1963). Gap statistic curves were used to define the number of clusters for subdivision of the matches and proband. This allowed for identification of diseases or disease-associated genes with the best phenotypic match and visualization of the clustering within a group of related disorders.
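
The permutation-style test described above can be sketched as follows, under stated assumptions: the null distribution is taken to be the similarity of randomly drawn pairs of term sets, the p-value is the fraction of null scores at least as large as the observed score with an add-one correction (the correction is an assumption, not stated in the text), and a Jaccard index stands in for the Lin-based score. All function names are hypothetical, and this is not the OntologyX implementation.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

TermSet = FrozenSet[str]


def jaccard(a: TermSet, b: TermSet) -> float:
    """Stand-in similarity; the study used a Lin-based score instead."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def null_distribution(
    term_sets: Sequence[TermSet],
    similarity: Callable[[TermSet, TermSet], float],
    n_draws: int,
    seed: int = 0,
) -> List[float]:
    """Score n_draws randomly selected pairs ("groups of 2") of term sets."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_draws):
        a, b = rng.sample(list(term_sets), 2)
        scores.append(similarity(a, b))
    return scores


def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of null scores >= observed, with an add-one correction."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)
```

With 100,000 draws, as in the text, the smallest p-value attainable under this add-one convention is 1/100,001.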

When considering possible pathogenicity of > 1 gene, the proband’s variant call file was parsed for gnomAD frequency < 0.001, Baylor Hopkins Center for Mendelian Genomics database frequency < 0.001, CADD score >15, variant read count > 3, presence of a homozygous variant within a region of absence of heterozygosity, lack of the same variant in homozygosity in gnomAD, requirement that the variant was coding (nonsynonymous, frameshift, stop-gain, splicing included), and segregation of the variant with disease. The parsed variants (n = 4) were then cross referenced to the patient disease associated gene matches. All combinations of the HPO term sets for the 4 identified genes were tested for phenotypic similarity to the proband’s HPO annotated phenotype. Support for multilocus pathogenic variation was assessed by the change of phenotypic similarity to the proband phenotype from the highest scoring single gene HPO term set.
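
As an illustration of the parsing criteria and the combination testing described above, the sketch below applies the listed thresholds as a simple conjunction and enumerates the union of term sets for every non-empty combination of genes. The `Variant` fields, the function names, and the placeholder gene and term labels are hypothetical; treating every criterion as a strict requirement is an assumption and may not reflect how the authors handled, for example, hemizygous X-linked calls.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Variant:
    gene: str
    gnomad_af: float        # gnomAD allele frequency
    cmg_af: float           # Baylor Hopkins CMG database frequency
    cadd: float             # CADD score
    variant_reads: int      # variant read count
    in_aoh: bool            # homozygous call lying within an AOH interval
    gnomad_hom_count: int   # homozygotes for the same variant in gnomAD
    coding: bool            # nonsynonymous, frameshift, stop-gain, or splicing
    segregates: bool        # segregates with disease in the family


def passes_filters(v: Variant) -> bool:
    """Apply the thresholds listed in the text as a conjunction."""
    return (
        v.gnomad_af < 0.001
        and v.cmg_af < 0.001
        and v.cadd > 15
        and v.variant_reads > 3
        and v.in_aoh
        and v.gnomad_hom_count == 0
        and v.coding
        and v.segregates
    )


def term_set_combinations(
    gene_terms: Dict[str, Set[str]]
) -> Dict[Tuple[str, ...], Set[str]]:
    """Union of term sets for every non-empty combination of genes."""
    genes = sorted(gene_terms)
    combined: Dict[Tuple[str, ...], Set[str]] = {}
    for size in range(1, len(genes) + 1):
        for combo in combinations(genes, size):
            merged: Set[str] = set()
            for gene in combo:
                merged |= gene_terms[gene]
            combined[combo] = merged
    return combined
```

For four genes this yields 15 combined term sets (4 single, 6 pairs, 4 triples, 1 quadruple), each of which can then be scored against the proband term set and compared with the best single-gene score.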

RESULTS

Proband’s clinical presentation and phenotype

The proband (BH14188) is a now deceased nine-month-old male born to first-cousin Egyptian consanguineous parents. A five-generation pedigree is shown in Fig. 2a. He was conceived via in vitro fertilization (IVF) after eight years of infertility of unknown etiology. There were no known miscarriages. Pregnancy was complicated by decreased fetal movement and intrauterine growth restriction. The patient was born via cesarean section at term, weighed 2 kg (Z score −2.5), and required intubation for one week due to respiratory failure. At 9 months, growth parameters were as follows: weight 7 kg (Z score −2.4), length 68 cm (9th percentile), and head circumference 41.5 cm (Z score −2.9). Clinical features (Fig. 2b-e) included failure to thrive, facial dysmorphism (hypotonic face, retrognathia, long and smooth philtrum, low-set ears, ptosis, downturned corners of the mouth), epilepsy, profound hypotonia with minimal spontaneous movement, musculoskeletal abnormalities (bilateral dislocated hips, congenital vertical talus, camptodactyly, and brachydactyly), micropenis, hypospadias, and hydrocele. Examination at nine months of age revealed a minimally responsive infant with respiratory distress and secretions. Reflexes were intact. Electromyography without repetitive nerve stimulation was unremarkable. Brain MRI obtained at 9 months of age revealed ventriculomegaly, increased extra-axial spaces with prominent cortical sulci, dysgenesis and thinning of the corpus callosum, cerebellar vermian hypoplasia, and right hemicerebellar dysgenesis (Fig. 2f-j). The proband died in infancy with the cause of death determined to be respiratory failure in the setting of pneumonia.

Figure 2. Pedigree and clinical phenotype of proband.

a) Five-generation family pedigree. b-e) Patient images showing severe hypotonia, facial and neck weakness, dysmorphic features (upturned nose, carp-shaped mouth, high forehead, low-set ears, retrognathia), and musculoskeletal features (rocker-bottom feet, camptodactyly, and brachydactyly). f-g) Sagittal T1 MRI brain obtained at 9 months of age showing frontal cerebral atrophy with increased extra-axial spaces, thinning of the corpus callosum, and ventriculomegaly. h-j) Coronal T2 MRI images showing ventriculomegaly, cerebellar vermian dysgenesis, and right hemicerebellar dysgenesis. k-v) Sanger sequencing results, amino acid residue conservation, and absence of heterozygosity (AOH) block of CAPN3, MUSK, NAV2, and ZC4H2, respectively.

Genetic and genomic evaluation

Previous genetic workup of the proband included a normal G-banded karyotype. He did not have any other clinical genetic workup. Research ES data and AOH analyses revealed evidence for consanguinity with genomic intervals demonstrating IBD; the observed total AOH was 267.5 Mb. The calculated coefficient of consanguinity was 0.074, which is consistent with the historical report of consanguinity and the expected coefficient of consanguinity for the product of a first-cousin mating (0.0625) (Hamamy 2012; Purcell et al., 2007). In parental exomes, the total calculated AOH was 300 Mb in the mother and 57.3 Mb in the father, consistent with the mother also being the product of a first-cousin mating. Trio ES did not reveal any large deletions or duplications, any potentially causative or contributory de novo variant alleles, or any single gene variant to explain the totality of the proband’s phenotype, but showed the following four rare single nucleotide variants (SNVs) (Table 1 and Fig. 2k-v): CAPN3 (c.259C>G:p.L87V; previously published in one other patient and subsequently reported in ClinVar in three additional patients with limb-girdle muscular dystrophy) (Piluso et al., 2005; Stenson et al., 2003), MUSK (c.1781C>T:p.A594V; novel), NAV2 (c.1996G>A:p.G666R; novel), and ZC4H2 (c.595A>C:p.N199H; novel). CAPN3, MUSK, and NAV2 map within AOH intervals of 2.6 Mb, 10.4 Mb, and 6.5 Mb, respectively. ZC4H2 is located on the X chromosome, is hemizygous in the male proband, and, in accordance with Mendelian expectations for an X-linked recessive trait, the variant was maternally inherited. All other variants are homozygous with both parents being carriers, in accordance with Mendelian expectations for an autosomal recessive (AR) trait.

CAPN3, MUSK, and ZC4H2 are established disease genes linked to limb-girdle muscular dystrophy (LGMD, OMIM# 253600) (Piluso et al., 2005; Richard et al., 1995), congenital myasthenic syndrome (OMIM# 616325) (Mihaylova et al., 2009), and Wieacker-Wolff syndrome (WWS, OMIM# 314580) (Hirata et al., 2013), respectively. NAV2 is a novel retinoic acid-responsive disease gene candidate implicated in cerebellar dysgenesis in mouse models (McNeill et al., 2011), but no associated human disease phenotype has been reported to date. All variants are either ultra-rare [< 1/10,000 in population database controls] or private with minor allele frequencies (MAF) <0.001 in gnomAD.
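
The expected value of 0.0625 cited above for the offspring of first cousins follows from Wright's path-counting formula, F = sum over shared ancestors of (1/2)^(n1 + n2 + 1), where n1 and n2 are the numbers of generations from each parent to the shared ancestor. The short Python sketch below works through that arithmetic; the function name is hypothetical and the shared ancestors are assumed not to be inbred themselves.

```python
from fractions import Fraction
from typing import Iterable, Tuple


def path_inbreeding_coefficient(paths: Iterable[Tuple[int, int]]) -> Fraction:
    """Sum (1/2)^(n1 + n2 + 1) over shared ancestors.

    Each (n1, n2) pair gives the generations from each parent up to one
    shared ancestor, who is assumed to be non-inbred.
    """
    return sum(
        (Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths),
        Fraction(0),
    )


# First cousins share two grandparents, each two generations above both parents.
FIRST_COUSIN_F = path_inbreeding_coefficient([(2, 2), (2, 2)])  # 1/16 = 0.0625
```

An observed estimate above 1/16, such as the 0.074 reported here, is in line with the text's observation that the mother's own AOH suggests additional consanguinity in earlier generations.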

Table 1.

Summary of Multilocus Variant Alleles

Gene   Position (hg19)     Nucleotide Change  Protein Change  Zygosity  gnomAD Allele Count  REVEL Score  CADD Score (v.1.6)
CAPN3  Chr15:42652262:C>G  c.259C>G           p.L87V          hmz       37 htz, 0 hmz        0.724        26
MUSK   Chr9:113549972:C>T  c.1781C>T          p.A594V         hmz       0 htz, 0 hmz         0.629        27
NAV2   Chr11:19955717:G>A  c.1996G>A          p.G666R         hmz       17 htz, 0 hmz        0.105        23
ZC4H2  ChrX:64137743:T>G   c.595A>C           p.N199H         hemi      0 htz, 0 hmz         0.797        29

Abbreviations: hemi-hemizygous, hmz-homozygous, htz-heterozygous, REVEL-rare exome variant ensemble learner, CADD-Combined Annotation-Dependent Depletion

Further exploration of the proband’s personal genome rare variant data revealed that 92% of non-VUS missense/in-frame/non-synonymous variants in CAPN3 have been found to be pathogenic based on UniProt (https://www.uniprot.org/), and the variant identified in our study is predicted pathogenic by multiple in silico prediction tools. The variant in MUSK is a private variant, absent from gnomAD, and predicted pathogenic by 8 different prediction tools; based on UniProt, 73% of variants identified in MUSK are considered pathogenic, with 40% of these variants being nonsynonymous missense variants. The variant identified in ZC4H2 is a private variant, absent from gnomAD in the hemizygous or homozygous state; the affected amino acid is fully conserved across species, the variant is predicted pathogenic by 9 in silico prediction tools, and 87% of identified missense variants are considered pathogenic. The variant identified in NAV2 has a MAF <0.001 and is absent from gnomAD in the homozygous state, and the affected amino acid residue is very well conserved across species. However, given that NAV2 has not yet been linked to human disease, there are no clinically reported pathogenic variants, and only a total of 13 variants have been clinically reported across the gene, all of which are considered benign. To determine potential effects on protein structure and function of all identified missense variants, we performed molecular modeling, which demonstrated deleterious impacts on secondary structure and hydrogen bonding by the variant proteins (Fig. S1).

Phenotypic dissection using HPO analyses

Given the complex and potentially blended phenotype of BH14188 resulting from MPV, and identification of potentially causative variants in three known disease trait genes (CAPN3, MUSK, ZC4H2), and one novel candidate disease contributory gene/variant allele (NAV2) by exome analysis and review of the existing literature, HPO analysis was implemented to ascertain individual variant impact and to assess for the presence of individual gene associated traits in the proband’s blended phenotype. First, the proband’s phenotype was converted to HPO terms and compared to that of known disease-associated genes reported in OMIM (www.omim.org) and Orphanet (www.orpha.net) databases using the Lin method (He and Lin, 2016) (Figure 3). To determine the optimal number of clusters for grouping into statistically significant (p<0.01) gene matches, a gap statistic was calculated for different numbers of clusters. A local maximum in the gap statistic curve showed that the transition from three to four clusters increased gap statistic to a greater extent than subsequent increases to number of clusters. Therefore, gene matches from OMIM and Orphanet were grouped into four distinct clusters (Figure 3a). The same method was applied to match proband phenotype to diseases reported in OMIM and Orphanet databases.
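
The cluster-number choice described above can be illustrated with a small helper that, given gap statistic values that have already been computed for each candidate k, returns the first local maximum. This is a sketch only: the function name is hypothetical, the selection rule is one simple reading of "local maximum" and may differ from the authors' visual inspection of the curve, and computing the gap statistic itself (with bootstrap reference data) is not shown.

```python
from typing import Dict


def choose_k_first_local_max(gap_by_k: Dict[int, float]) -> int:
    """Return the smallest k whose gap value is a local maximum.

    Falls back to the k with the largest gap value if the curve has no
    interior local maximum (for example, if it rises monotonically).
    """
    ks = sorted(gap_by_k)
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if gap_by_k[k] > gap_by_k[prev_k] and gap_by_k[k] >= gap_by_k[next_k]:
            return k
    return max(ks, key=lambda k: gap_by_k[k])
```

For a hypothetical curve that rises steeply from k = 3 to k = 4 and then flattens or dips at k = 5, this rule selects k = 4.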

Figure 3. Quantitative analysis of phenotypic overlap with known disease genes and known disease phenotypes using semantic similarity.

a) For significant matches (p < 0.01) from the Gene-HPO Annotated set to BH14188, gap statistic for cluster size 1-10 was calculated using 500 bootstrap iterations. A local maximum or point where the slope changes from a steep to more flattened curve was used to determine number of clusters to group matches. Gene matches were clustered into 4 groups. b) Gap statistics for disease matches as was completed for gene matches in a, with a p cut-off of 0.0037 due to the larger size of the disease HPO annotated database. Disease matches were clustered into 4 groups as a local maximum of gap statistic value was observed for k = 4. c) Genes with significant HPO semantic similarity to BH14188 (p < 0.01) are shown along with results of HAC analysis. The phenotype of BH14188 was converted to HPO terms and compared to disease associated genes annotated with HPO terms via OMIM and Orphanet databases. Clusters are labeled on the dendrogram according to the disease category represented by boxed genes within that cluster, and color coded. Gene and proband IDs are shown along the right of the dendrogram. A legend for color intensity correlated with similarity score is shown. d) Dendrogram and heatmap showing diseases with significant HPO semantic similarity to BH14188 (p < 0.0037). 
WWS = Wieacker-Wolff Syndrome; WWS-FR = Wieacker-Wolff Syndrome, Female Restricted; -ON = Orphanet; IHPRF2 = Infantile Hypotonia with Psychomotor Retardation and Characteristic Facies; MRXS33/35 = Mental Retardation, X-linked, Syndromic, 33/35; HLD10/14 = Leukodystrophy, Hypomyelinating, 10/14; IDDFSDA = Intellectual Developmental Disorder with Dysmorphic Facies, Seizures, and Distal Limb Anomalies; MRD22 = Mental Retardation, Autosomal Dominant, 22; PFCRD = Peroxisomal Fatty Acyl-CoA Reductase 1 Disorder; PSPHD = Phosphoserine Phosphatase Deficiency; CDG1AA/2E/1W = Congenital Disorder of Glycosylation Type Iaa/IIe/Iw; MIGSB = Microcephaly, Growth Deficiency, Seizures, and Brain Malformations; NDMSBA = Neurodevelopmental Disorder with Progressive Microcephaly, Spasticity, and Brain Anomalies; CdLS = Cornelia de Lange Syndrome; CdLS5 = Cornelia de Lange Syndrome 5; GUST = Gustavson Syndrome (Mental Retardation with Optic Atrophy, Deafness, and Seizures); AMRS = Arthrogryposis, Mental Retardation, and Seizures; NEDSEBA = Neurodevelopmental Disorder with Seizures and Brain Atrophy; Kondoh = Kondoh Syndrome (Mental Retardation, Microcephaly, Growth Retardation, Joint Contractures, and Facial Dysmorphism); KFS4 = Klippel-Feil Syndrome 4; COFS1 = Cerebrooculofacioskeletal Syndrome 1; NEDMABA = Neurodevelopmental Disorder with Microcephaly, Arthrogryposis, and Structural Brain Anomalies; MWKS = Marden-Walker Syndrome; WITKOS = Witteveen-Kolk Syndrome; MRT36 = Mental Retardation, Autosomal Recessive, 36; NEDDFSA = Neurodevelopmental Disorder with Dysmorphic Facies and Distal Skeletal Anomalies; BRWS1 = Baraitser-Winter Syndrome 1; 2p15-p16.1del = Chromosome 2p15-p16.1 Deletion Syndrome; MRSXT = Mental Retardation, X-Linked, Syndromic, Turner Type; 6q24-q25del = 6q24-q25 Deletion Syndrome; BOPS = Bohring-Opitz Syndrome; 10q26del = 10q26 Deletion Syndrome; MRD28 = Mental Retardation, Autosomal Dominant, 28; PRPTS = Pierpont Syndrome; NDDFDSA = Neurodevelopmental Disorder with Dysmorphic Facies and Distal Skeletal Anomalies; CSS1 = Coffin-Siris Syndrome 1; MSSP = Microcephaly, Short Stature, and Polymicrogyria with or without Seizures; Micro = Warburg Micro Syndrome; GAMOS1 = Galloway-Mowat Syndrome 1; LISX2 = Lissencephaly, X-linked, 2; PCH1C/1B = Pontocerebellar Hypoplasia Type 1C/1B; STT3A-CDG = STT3A associated Congenital Disorder of Glycosylation; COG5-CDG = COG5 associated Congenital Disorder of Glycosylation

Significant disease matches were clustered into four groups using the same gap statistic methods employed for semantic similarity score gene matches, with a lower p-value cutoff reflecting the difference in the number of HPO annotated diseases versus genes (Figure 3b). Analyses of phenotypic overlap with disease-associated genes and disease phenotypes are shown in Figure 3c and 3d, respectively. The four clusters represented Cornelia de Lange Syndrome (CdLS; purple), Warburg Micro Syndrome (WMS; green), syndromic brain malformation disorders (red), and a group of syndromic intellectual disability genes with phenotypic similarity (gold) (Figure 3d). The known disease gene with the highest rank ordered similarity score and lowest p-value was ZC4H2; however, it did not phenotypically cluster with the proband. A complete list of rank ordered similarity scores is available in Table S1. The data suggest that the clinical synopsis of the ZC4H2-associated disease trait does not fully represent the proband phenotype, which could represent a limitation of the annotated disease-trait phenotype or be a result of an inherent limitation of the experimental approach. MUSK did not have a significant phenotype match using a cutoff value of p<0.01 but did with a cutoff of p<0.05. Notably, MUSK was the second highest scoring disease-associated gene in which homozygous rare variation was found by ES. These data suggest that of the known genes with putative pathogenic variants found in proband BH14188, ZC4H2 demonstrates the best phenotypic match. As shown by the gap statistic for a single cluster (Figure 3b), the group of matching diseases was much more similar, and while syndromes including cerebellar atrophy along with additional neurologic disorder are separated out (purple), the other three clusters represent syndromic intellectual disability disorders with brain malformation. The split of this group into three clusters is due to specific additional features present for that cluster. Here, Wieacker-Wolff syndrome (X-linked dominant) was the top match and the proband phenotype clustered with the disease phenotype (red).

Interestingly, ZC4H2 and its associated disease, WWS, were the top matches to the proband phenotype in both analyses. However, there was discordance in whether the proband clustered with the phenotypic spectrum associated with ZC4H2 and WWS (Figure 3c, d). The discordance is partially due to the annotation of ZC4H2 with both X-linked dominant and X-linked recessive (female restricted) phenotypes, but may also suggest that other loci contribute to the phenotype in addition to ZC4H2. To better visualize the proband’s phenotype and disease/gene associated phenotype matches, an annotation grid was generated which shows the proband’s phenotypic features along with any shared features of rank ordered matches (Figure 4a, b). Of all known disease genes with a significant (p < 0.01) phenotypic match, ZC4H2 was the only one in which predicted pathogenic variation was found by proband ES. Importantly, no single disease or disease associated gene phenotypic spectrum identified through HPO similarity score analysis explained the totality of the proband’s complex clinically observed phenotype. This more firmly supports the conclusion that MPV, including variation in ZC4H2, may contribute to the proband phenotype.
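The annotation grid described above is, in essence, a presence/absence matrix of proband phenotypic features against the term sets of rank ordered matches. The following minimal Python sketch illustrates the idea only; the term labels and match names (GENE_A, GENE_B) are hypothetical placeholders, not the study data, and the functions are not the authors' implementation.

```python
def annotation_grid(proband_terms, matches):
    """Return {match name: [True/False per proband term]}.

    proband_terms: ordered list of proband phenotype labels (grid columns).
    matches: dict mapping a gene or disease name to its annotated term set,
             assumed to be already rank ordered by similarity score.
    """
    return {
        name: [term in terms for term in proband_terms]
        for name, terms in matches.items()
    }


def unexplained_terms(proband_terms, grid):
    """Proband terms that are not shared with any match in the grid."""
    return [
        term
        for column, term in enumerate(proband_terms)
        if not any(row[column] for row in grid.values())
    ]


def render(grid):
    """Render rows as text: 'X' = shared with the proband, '.' = not shared."""
    return "\n".join(
        f"{name:<8}" + "".join("X" if hit else "." for hit in row)
        for name, row in grid.items()
    )


# Hypothetical annotations, for illustration only.
proband = ["hypotonia", "seizures", "microcephaly", "cerebellar atrophy"]
matches = {
    "GENE_A": {"hypotonia", "seizures", "microcephaly"},
    "GENE_B": {"hypotonia", "ptosis"},
}
grid = annotation_grid(proband, matches)
print(render(grid))
print("unexplained:", unexplained_terms(proband, grid))
```

In this toy grid one proband feature is shared with neither match, which mirrors the situation in which no single gene or disease accounts for the whole phenotype.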

Figure 4. Annotation grids showing overlap of proband phenotypic traits to known disease traits and known disease gene-associated traits used to explore multilocus pathogenic variation (MPV).

a) Known Mendelian Inheritance in Man (MIM) disease match (p < 0.0037) annotation grid showing phenotypic overlap (red) with proband BH14188. Y-axis shows known diseases from OMIM/Orphanet. X-axis depicts phenotypes using HPO terminology. Frequency of disease-associated HPO terms is shown in black. b) Proband phenotypic overlap within the top (p < 0.01) matching HPO-term annotated gene-set. Matches are rank ordered by semantic similarity. Frequency of disease gene-associated HPO terms is shown in black. c) HPO analysis of terms from OMIM/Orphanet annotation to proband phenotype with known disease genes (ZC4H2, MUSK, and CAPN3) and novel candidate disease gene (NAV2) identified by research exome analysis. ZC4H2, MUSK, and CAPN3 were initially analyzed individually, followed by the combined unique HPO term set of ZC4H2, MUSK, and CAPN3. NAV2 was initially analyzed individually using HPO2GO predicted terms, followed by the analysis of the totality of all 4 genes identified by research exome analysis (ZC4H2, MUSK, CAPN3, NAV2). Novel disease gene candidate NAV2 HPO2GO predicted phenotypic traits present in the proband are shown in orange. Asterisks represent significance according to standard convention (*** p < 0.01, * p < 0.05).

Next, given that, of the four distinct genes identified by rare variant ES analysis (CAPN3, MUSK, NAV2, ZC4H2), only ZC4H2 and MUSK had statistically significant phenotypic similarity scores using cutoffs of p<0.01 and p<0.05, respectively, we assessed the phenotypic contribution of all four identified genes to the proband’s phenotype (Figure 4c). Patient similarity score matches were cross referenced with rare variants present within the exome to identify candidate genes. While not all genes identified by research exome analysis met a cut-off value of p < 0.01, they were among the top quartile of matches. Our approach used the individual phenotypes, i.e. HPO terms, reported in association with ZC4H2, MUSK, and CAPN3 as a clinical synopsis of the gene associated trait, and compared them to the proband’s total clinically observed phenotype. This was followed by a combination of phenotypes from these three genes as a unique set of phenotypes; a ‘synthetic blended phenotype’ designated as “MPV”. This unique set of phenotypes resulting from a combination of disease traits for the known disease genes ZC4H2, CAPN3, and MUSK showed the greatest similarity score; this was taken as objective evidence that the combined set is a better match to the proband’s phenotype than any individual identified disease gene.
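The construction of the ‘synthetic blended phenotype’ can be sketched in a few lines of Python: the unique term set is the union of the per-gene term sets, and that union is scored against the proband in the same way as each individual gene. The sketch below rests on stated assumptions: the gene names and phenotype annotations are hypothetical, and the Jaccard index is used only as a simple stand-in for the semantic similarity score used in the study.

```python
def jaccard(a, b):
    """Jaccard index of two term sets (a stand-in for a semantic similarity score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def blended_term_set(gene_terms, genes):
    """Unique set of terms annotated to any of the given genes."""
    blended = set()
    for gene in genes:
        blended |= gene_terms[gene]
    return blended


# Hypothetical annotations, for illustration only.
proband = {
    "hypotonia", "seizures", "microcephaly",
    "contractures", "ptosis", "muscle weakness",
}
gene_terms = {
    "GENE_X": {"seizures", "microcephaly", "contractures", "short stature"},
    "GENE_Y": {"hypotonia", "ptosis"},
    "GENE_Z": {"muscle weakness", "scapular winging", "elevated creatine kinase"},
}

# Score each gene alone, then the blended ("MPV") term set.
individual = {gene: jaccard(proband, terms) for gene, terms in gene_terms.items()}
blended = jaccard(proband, blended_term_set(gene_terms, gene_terms))

for gene, score in individual.items():
    print(f"{gene}: {score:.3f}")
print(f"MPV (blended): {blended:.3f}")
```

With these toy sets the blended term set scores higher than any single gene. That outcome is not guaranteed in general, because a union also adds terms the proband does not have.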

As a control, we performed similarity score analysis for all combinations of known disease-associated genes with rare variants in proband AOH regions to determine the combination of known disease genes with the highest similarity score for the proband phenotype (Table 2). A total of four known disease-associated genes, CAPN3, MUSK, PGAP2, and ZC4H2, passed control analysis parsing filters, i.e. are homozygous in the proband only, located within an AOH region, and were identified to have rare variants. PGAP2 is associated with hyperphosphatasia with mental retardation syndrome 3 (OMIM #614207) with cardinal features of elevated alkaline phosphatase, hyperphosphatemia, developmental delay, intellectual disability, and hypotonia. The proband described in this report did not have any of the metabolic derangements considered a hallmark of this disease and therefore, PGAP2 was not considered contributory to the phenotype. Data showed that the MUSK-ZC4H2 combination had a higher similarity score (0.738) than ZC4H2 alone (0.730).

Table 2.

Similarity scores for known disease-associated gene combinations within proband AOH regions and individual known disease-associated genes.

Gene combination            | Sim. score | Gene name | Sim. score
MUSK, ZC4H2                 | 0.738      | ZC4H2     | 0.730
CAPN3, MUSK, ZC4H2          | 0.737      | MUSK      | 0.649
CAPN3, MUSK, PGAP2, ZC4H2   | 0.732      | PGAP2     | 0.584
MUSK, PGAP2, ZC4H2          | 0.730      | CAPN3     | 0.354
CAPN3, ZC4H2                | 0.721      |           |
CAPN3, PGAP2, ZC4H2         | 0.707      |           |
PGAP2, ZC4H2                | 0.707      |           |
CAPN3, PGAP2, MUSK          | 0.689      |           |
PGAP2, MUSK                 | 0.685      |           |
CAPN3, MUSK                 | 0.653      |           |
CAPN3, PGAP2                | 0.612      |           |
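The control analysis summarized in Table 2 amounts to enumerating every combination of two or more candidate genes and scoring the blended term set of each. A minimal Python sketch of that enumeration follows. The scoring function, placeholder genes (G1 to G3), and term sets are assumptions made for illustration; they are not the study's similarity measure or data.

```python
from itertools import combinations


def gene_combinations(genes, min_size=2):
    """Yield every combination of at least `min_size` genes."""
    for size in range(min_size, len(genes) + 1):
        yield from combinations(genes, size)


def rank_combinations(genes, gene_terms, proband_terms, score):
    """Score the blended term set of each combination; best score first."""
    ranked = []
    for combo in gene_combinations(genes):
        blended = set().union(*(gene_terms[gene] for gene in combo))
        ranked.append((score(proband_terms, blended), combo))
    return sorted(ranked, reverse=True)


def overlap(a, b):
    """Simple set-overlap score, used here only as a stand-in."""
    return len(a & b) / len(a | b)


# Four genes passing the control filters yield 6 pairs + 4 triples + 1 quadruple.
genes = ["CAPN3", "MUSK", "PGAP2", "ZC4H2"]
combos = list(gene_combinations(genes))
print(len(combos))

# Toy demonstration with placeholder genes and terms.
toy_terms = {"G1": {"a", "b"}, "G2": {"c"}, "G3": {"x", "y", "z"}}
toy_proband = {"a", "b", "c"}
ranked = rank_combinations(list(toy_terms), toy_terms, toy_proband, overlap)
best_score, best_combo = ranked[0]
print(best_combo, best_score)
```

The count of 11 combinations matches the number of gene combination rows in Table 2.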

Despite a high similarity score for a combination of MPV in known disease genes MUSK and ZC4H2, several clinical features, including cerebellar atrophy, remained unexplained. Therefore, the contribution of the novel candidate disease gene NAV2 was explored. Given that no human data about the role of NAV2 in disease have been published, phenotypic data from model organisms were applied to predict the phenotypic effects of deleterious NAV2 variation in humans.

Towards this end, we turned to the Gene Ontology (Ashburner et al., 2000; The Gene Ontology Consortium, 2019), which, like the Human Phenotype Ontology, is a structured language database; it includes annotations of molecular functions and pathways for genes from multiple model organisms based on previous publications. This integration of biological data across multiple organisms aids in annotation of novel candidate disease genes not yet associated with HPO terms, such as NAV2. Instead of HPO terms, “GO terms” are used in the Gene Ontology database and enable generation of derivative HPO terms based on the model organism data and human disease-gene association data available. HPO2GO (Dogan, 2018) is an automated cross-ontology mapping of GO to HPO via statistical resampling of co-occurrence similarity distributions. Each HPO mapping indicates a phenotypic abnormality due to loss of the corresponding GO term(s) function(s). Using this list, the GO terms for Nav2 were converted to equivalent NAV2 HPO terms. Phenotypic overlap between the NAV2 annotated HPO terms and the proband’s phenotype was determined (Figure 4c) and showed that NAV2 overlapped markedly with ZC4H2 in several phenotypes, including seizures, microcephaly, and global developmental delay, but additionally was associated with a unique set of phenotypes observed in NAV2 and the proband that were not fully explained by the other three known disease genes (e.g. cerebellar atrophy, hypospadias, sparse hair; highlighted green in Figure 4c). Notably, ZC4H2-MUSK associated phenotypes combined represented phenotypic features not predicted to be associated with NAV2 (highlighted purple, Figure 4c). These analyses reveal a clinically observed pathognomonic signature for this patient’s blended phenotype and disease process and provide compelling evidence for contributions from all four genes.
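The conversion of GO terms to predicted HPO terms, and the subsequent partition of the proband's features by which genes account for them, can be sketched as a lookup followed by set operations. The Python sketch below is illustrative only: every identifier is a placeholder, and the mapping table is a hypothetical stand-in rather than the actual HPO2GO mapping file or its format.

```python
def predict_hpo_terms(go_terms, go_to_hpo):
    """Collect the HPO terms mapped from a gene's GO terms.

    GO terms with no entry in the mapping contribute nothing.
    """
    predicted = set()
    for go_term in go_terms:
        predicted |= go_to_hpo.get(go_term, set())
    return predicted


def partition_proband_terms(proband, candidate_terms, known_terms):
    """Split proband terms by which source accounts for them."""
    return {
        "shared": proband & candidate_terms & known_terms,
        "candidate_only": (proband & candidate_terms) - known_terms,
        "known_only": (proband & known_terms) - candidate_terms,
        "unexplained": proband - candidate_terms - known_terms,
    }


# Placeholder identifiers only; not real GO or HPO accessions.
go_to_hpo = {
    "GO:placeholder_1": {"HP:placeholder_A", "HP:placeholder_B"},
    "GO:placeholder_2": {"HP:placeholder_C"},
}
candidate_go = ["GO:placeholder_1", "GO:placeholder_2", "GO:placeholder_3"]
candidate_hpo = predict_hpo_terms(candidate_go, go_to_hpo)

known_hpo = {"HP:placeholder_A", "HP:placeholder_D"}
proband = {
    "HP:placeholder_A", "HP:placeholder_C",
    "HP:placeholder_D", "HP:placeholder_E",
}
parts = partition_proband_terms(proband, candidate_hpo, known_hpo)
for label, terms in parts.items():
    print(label, sorted(terms))
```

Here "candidate_only" corresponds to features predicted only for the candidate gene, and "known_only" to features accounted for only by the known disease genes.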

DISCUSSION

Clinical genomics has allowed genome-wide assessment of pathogenic variation for an unbiased analysis, i.e. an analysis not targeted to a single gene or genetic locus, of molecular etiologies. It can be informative for the most complex disease phenotypes and can illuminate disease that potentially represents blended phenotypes due to MPV. This genomics approach is distinct from a single gene test approach or a disease gene panel. The ES clinical genomics approach has led to the previously underrecognized understanding that some blended phenotypes arise from pathogenic variant alleles at more than one gene locus; i.e. MPV (Posey et al., 2017; Yang et al., 2013; Yang et al., 2014). Dual or multiple molecular diagnoses arise when pathogenic variation occurs at two or more discrete gene loci, leading to distinct or overlapping expression of Mendelian disease traits (Figure 1a) (Posey et al., 2017; Yang et al., 2013; Yang et al., 2014). In recent years, ES has identified multiple molecular diagnoses in 1.4 to 22% of diagnostically informative exomes; such diagnoses are therefore more common than anticipated, or than mutational theory may have ‘predicted’ (Farwell et al., 2015; Gonzaga-Jauregui et al., 2015; Karaca et al., 2018; Mitani et al., 2021; Pehlivan et al., 2019; Posey et al., 2017; Retterer et al., 2016; Yang et al., 2013; Yang et al., 2014). This wide range is due to several factors, including patient population, phenotypes studied, and rate of consanguinity within a population (Balci et al., 2017; Mitani et al., 2021; Pehlivan et al., 2019).

The inheritance pattern for individual genes and traits involved in MPV depends on the substructure of the populations studied. Two or more autosomal dominant (AD) inheritance patterns are the most common among non-consanguineous populations, with ~44.7% of cases with two monoallelic variants having de novo mutation at both loci (AD + AD trait, AD + XL trait) and only 9.3% of multiple molecular diagnoses originating from AR + AR combinations (Posey et al., 2017). However, in consanguineous populations, the rate of two or more autosomal recessive homozygous variant gene loci (i.e. trait loci under the AR + AR designation) is markedly higher due to identity-by-descent and high AOH burden, reaching over 80% for some phenotypes in MPV cases (Mitani et al., 2021; Pehlivan et al., 2019). Remarkably, recent carrier screening for recessive disease traits in consanguineous couples has shown that 28% of those couples tested via ES carried pathogenic or likely pathogenic variants for diseases not known to the couple to occur in the family (Sallevelt et al., 2021). Therefore, this population is at high risk of having offspring with AR diseases, with a 25% chance per pregnancy when both partners carry a deleterious variant for the same recessive disease trait, and preconception counseling is suggested.

We have used a comprehensive and well-established approach of family-based genomics with rare variant analysis to select the variants identified in this study. Only after this first and very important ‘ascertainment/selection step’ was the complex strategy of HPO analysis used as a proof of principle to provide additional evidence towards the contribution of the variants to the proband’s phenotype and to show that this approach can be used in situations where multilocus pathogenic variation (MPV) is the likely underlying cause of a complex blended phenotype. The proband described in this report is the offspring of first cousin Egyptian parents and was conceived via in vitro fertilization after eight years of infertility of unknown etiology. Although a cryptic balanced translocation in one of the parents is possible, it is also possible that a high degree of mutational burden and multilocus variation within the family resulted in infertility or early unrecognized miscarriages.

Family-based ES with rare variant analysis discerned distinct potentially causative variants in three previously reported disease genes (CAPN3:c.259C>G:p.L87V, MUSK:c.1781C>T:p.A594V, ZC4H2:c.595A>C:p.N199H) and one novel candidate disease gene (NAV2:c.1996G>A:p.G666R, Table 1). These genes, CAPN3, MUSK, NAV2, and ZC4H2, map to four distinct chromosomes: Ch15q15.1, Ch9q31.3, Ch11p15.1, and ChXq11.2, respectively, and each affects a different part of the nervous system and nervous system-muscle connection (Figure 5). All variants are rare and predicted to be deleterious, and CAPN3:c.259C>G:p.L87V has been previously reported as a causative mutation in a patient with LGMD (Piluso et al., 2005). The results of this study suggest that this degree of multilocus variation has resulted in a complex blended phenotype of severe neurodevelopmental disorder with brain malformation, epilepsy, profound hypotonia, and musculoskeletal abnormalities not explained by a single known disease gene, nor by a single disease syndrome (Figure 3). CAPN3 is associated with autosomal dominant and autosomal recessive limb-girdle muscular dystrophy (OMIM# 253600) and causes facial and proximal muscle weakness, contractures, and proximal muscle atrophy (Richard et al., 1995). Although the disease trait of pathogenic variation in CAPN3 arises in later childhood or adulthood, a combination of MPV in other disease genes may influence the phenotypic spectrum of disease and result in earlier disease manifestation. Because of the very young age of the patient at presentation, as well as his premature death, the full phenotypic effect of this homozygous variant could not be assessed, which plausibly explains why we did not observe any additional phenotypic contribution from CAPN3 in quantitative HPO analysis. MUSK is a well-established gene linked to autosomal recessive congenital myasthenic syndrome type 9 (OMIM# 616325) with profound weakness and hypotonia, facial and neck muscle involvement, and respiratory insufficiency (Mihaylova et al., 2009). Wieacker-Wolff syndrome (OMIM# 314580) is characterized by dysmorphic features, musculoskeletal abnormalities, including camptodactyly and club feet, and neurological symptoms with global developmental delay, seizures, and delayed myelination (Hirata et al., 2013).

Figure 5. Multilocus variation in known disease genes ZC4H2, MUSK, and CAPN3, and novel candidate disease gene NAV2 affects different parts of the nervous system and nerve-muscle connection.

ZC4H2 is linked to Wieacker-Wolff syndrome (OMIM #314580) causing microcephaly, delayed myelination, and cerebral atrophy. Pathogenic variation in MUSK causes congenital myasthenic syndrome (OMIM #616325) with decreased acetylcholine receptor (AChR) and reduced miniature endplate potential responsible for hypotonia. CAPN3 causes limb-girdle muscular dystrophy (OMIM #253600) and muscle atrophy with weakness and hypotonia. NAV2 is a novel candidate disease gene linked to cerebellar dysgenesis in mouse models. Figure was modified from https://www.merckmanuals.com/home/brain,-spinal-cord,-and-nerve-disorders/symptoms-of-brain-spinal-cord-and-nerve-disorders/weakness.

NAV2 is a retinoic acid responsive gene that encodes the protein NAV2, neuron navigator 2, with function in neuronal development and neurite outgrowth (Merrill et al., 2002). Interestingly, NAV2 (also known as RAINB1) was identified from a retinoic acid screen of neuroblastoma cells (Merrill et al., 2002); an approach similar to that used to isolate RAI1, the gene responsible for another neurodevelopmental disorder (Smith-Magenis Syndrome abbreviated SMS: OMIM# 182290). NAV2 has not been previously reported to cause a human phenotype, but homozygous Nav2−/− loss of function mice manifest profound defects in cerebellar development, neuronal cell proliferation, and axonal and migration defects (McNeill et al., 2011).

Given the complexity of the proband’s phenotype, and observed MPV in four distinct genes, we performed HPO analysis which allowed semantic similarity calculation of the proband’s clinically observed phenotypic features to the disease traits for both known disease genes and for known diseases reported in OMIM (Figures 3 and 4). These analyses strongly suggest that the complex blended phenotype of the proband is due to a combination of pathogenic variation in known disease genes MUSK and ZC4H2 with contribution of the novel disease gene candidate NAV2, and potentially later-onset disease-contributing effects or combinatorial disease-contributing effects of CAPN3 not currently statistically observed. HPO-based ontologic similarity analysis methods have been used as an adjuvant tool and in an effective reanalysis pipeline in the diagnostic testing setting for rare congenital disorders, as well as to assess dual molecular diagnosis rates from clinical exome studies (Liu, Meng et al., 2019; Posey et al., 2017).

Although phenotypic analysis of MPV and blended phenotypes using HPO and semantic similarity allows for objective dissection of gene contribution, it has several limitations. First, analysis improves markedly as more phenotypic features (HPO terms) are clinically annotated for a patient and given disease trait or gene. Therefore, diseases with unclear phenotypic spectrums are more challenging to evaluate. It is always possible during ES analysis to miss modifying or driver variants, or to have insufficient knowledge of some ultra-rare disease. Next, it is difficult to analyze novel disease gene candidate contribution to a proband’s phenotype due to lack of human data, i.e. HPO annotation for the novel gene. Additional challenges arise from lack of specificity resulting in phenotypic overlaps for certain HPO terms. For example, neonatal hypotonia (HP:0001319), generalized hypotonia (HP:0001290), and muscular or central hypotonia (HP:0001252; HP:0011398) could all be reported as simply hypotonia. General practice guidelines for proband HPO term annotation will be needed to overcome this semantic challenge. For genes that are not yet associated with human disease, HPO annotation can be completed by manual annotation using data from model organisms. However, because HPO2GO often over annotates phenotypes (i.e., false positives are often present), NAV2’s predicted term set was not used in Lin similarity score analysis. Nevertheless, as predicted phenotypes still represent a more specific subset of the HPO term set likely to be associated with deleterious variation in a particular gene and have been shown to predict biologically relevant phenotype associations (Dogan, 2018), NAV2’s predicted term set was used in comparison of phenotypes by phenotype annotation grid. 
These analyses seek to represent model organism studies of genes that are not currently associated with human disease in a more consistent manner, as these studies are often used in support of candidate disease-gene association. Further characterization of rare, deleterious variation in NAV2 in association with phenotypic development is required to definitively assess its contribution to phenotype alone or in combination with known disease associated genes. Another challenge for quantitative analysis arises when considering disease traits demonstrating age dependent penetrance, rather than disease traits that present concurrently at birth, as has been previously reported in multiple molecular diagnoses (Potocki et al., 1999). Therefore, at the time of clinical evaluation, an evaluated proband may have not yet developed all phenotypic traits caused by her/his MPV.
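For readers unfamiliar with the Lin similarity score mentioned above, the following Python sketch shows how it is commonly computed on an ontology: the similarity of two terms is twice the information content (IC) of their most informative common ancestor divided by the sum of their individual ICs. The toy ontology, term labels, and annotation frequencies below are hypothetical, and the best-match-average used to compare term sets is one common choice rather than necessarily the one used in this study.

```python
import math

# Toy ontology (child -> parents) with hypothetical annotation frequencies.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "muscle": ["root"],
    "seizure": ["neuro"],
    "ataxia": ["neuro"],
    "weakness": ["muscle"],
}
FREQ = {
    "root": 1.0, "neuro": 0.5, "muscle": 0.5,
    "seizure": 0.25, "ataxia": 0.125, "weakness": 0.25,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = -log(frequency); rarer terms are more informative."""
    return -math.log(FREQ[term])


def lin(t1, t2):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(t1) + IC(t2))."""
    shared = ancestors(t1) & ancestors(t2)
    mica_ic = max(information_content(t) for t in shared)
    total = information_content(t1) + information_content(t2)
    return 2 * mica_ic / total if total > 0 else 1.0


def best_match_average(query_terms, target_terms):
    """Average, over query terms, of the best Lin score against the target set."""
    best = [max(lin(q, t) for t in target_terms) for q in query_terms]
    return sum(best) / len(best)


print(round(lin("seizure", "ataxia"), 3))  # siblings under "neuro"
print(round(lin("seizure", "neuro"), 3))   # specific term vs. its parent
print(round(best_match_average(["seizure", "ataxia"], ["seizure", "weakness"]), 3))
```

The second comparison illustrates the specificity issue raised above: annotating a general term in place of a more specific one still earns partial credit, but less than an exact match.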

Even with comprehensive phenotypic dissection, there were still some phenotypic features not fully explained by the combination of genes discussed here. There are several possible explanations for this. First, this analysis, in its current form, does not account for the possibility that gene-gene or disease-disease combinations might result in ‘new’ phenotypic features that are not associated with either gene or either disease alone, i.e. in synthetic phenotypes (Lupski, 2021; Yuan et al., 2015). Thus, it is possible that the combination of MPV does not simply have an additive phenotypic effect or result in increased severity, but instead generates new phenotypic features derived only from the MPV itself via genetic epistasis. Second, it is possible that additional pathogenic variation exists in other genes not discovered by family-based ES with rare variant analysis. Advances in i) allelic series for annotated human genes on the haploid reference build, ii) disease trait association gene ‘discoveries’, and iii) model organism mechanistic studies will each result in improved accuracy of this approach due to a larger HPO annotated gene set and more refined GO term annotation based on new model organism investigations. Additionally, this quantitative phenotyping approach will need to be applied to larger cohorts of patients with MPV in order to substantiate validity and reproducibility.

In conclusion, MPV has been increasingly recognized as the cause for complex phenotypic presentations, intrafamilial variability, and phenotypic expansion of established disease genes (Mitani et al., 2019; Mitani et al., 2021) with impact on the expression of phenotypic traits. Structured phenotypic analysis of complex blended phenotypes using HPO analysis with semantic similarity may facilitate systematically dissecting the phenotypic contribution of gene variants in MPV and multiple molecular diagnoses, as highlighted in a patient with severe neurodevelopmental disorder likely resulting from MPV in four distinct genes. However, it is important to note that the method used herein has not yet been proven and as such, may not fully explain the proband’s clinical picture. Nevertheless, this approach has the potential to further empower both the clinical neurologist and the clinical geneticist and may help in resolving previously unidentified gene-phenotype relationships, thus contributing to novel disease gene discovery and functional annotation of the human genome.

Supplementary Material

fS1

Figure S1. Protein modeling of missense variant alleles in CAPN3, MUSK, NAV2, and ZC4H2 shows abnormal secondary and tertiary structure formation.

Protein FASTA sequences were obtained from UniProt for use as reference protein sequence. Variant FASTAs were made by making the appropriate amino acid change to the reference sequence. RaptorX was used to model protein structure using reference and variant FASTA sequences as input. The best model provided by RaptorX was then used to view detailed changes to protein structure at the site of amino acid change with the Research Collaboratory for Structural Bioinformatics Protein Data Bank tool (rcsb.org). Note that the following discussion is based on these predicted structures. a) LEU87 forms noncovalent bonds with GLU84, THR85, and PHE86 important to alpha helix secondary structure in CAPN3. The L87V change causes disruption of these noncovalent bonds and a shift in alpha helical structure. b) Within MUSK, ALA594 forms noncovalent bonds with GLU576 and THR604 that stabilize beta sheets. Change of ALA594 to VAL594 causes a change in noncovalent bonding, allowing PRO595 to form a noncovalent interaction and causing a change in beta sheet structure. c) The change from GLY666, a flexible (low steric hindrance) residue, to ARG666 causes the widening and increased tortuosity of a loop within NAV2. d) ASN199 forms hydrogen bonds with ASP223 and LYS219 important to the structure of a loop near a more central alpha helix in ZC4H2. Change of ASN199 to HIS199 disrupts both bonds and causes poor interaction between the same loop and alpha helix, impacting tertiary structure.

tS1

Acknowledgements

This study was supported in part by the U.S. National Human Genome Research Institute (NHGRI) and National Heart Lung and Blood Institute (NHLBI) to the Baylor-Hopkins Center for Mendelian Genomics (BHCMG, UM1 HG006542); NHGRI Mendelian Genomics Research Center (MGRC, U01 HG011758), NHGRI grant to Baylor College of Medicine Human Genome Sequencing Center (U54HG003273 to R.A.G.), U.S. National Institute of Neurological Disorders and Stroke (NINDS) (R35NS105078 to J.R.L.) and Muscular Dystrophy Association (MDA) (512848 to J.R.L.). D.M. is supported by a Medical Genetics Research Fellowship Program through the United States National Institutes of Health (T32 GM007526-42). T.M. is supported by the Uehara Memorial Foundation. D.P. is supported by International Rett Syndrome Foundation (IRSF grant #3701-1). J.E.P. was supported by NHGRI K08 HG008986. D.G.C. is supported by the Multidisciplinary Training in Brain Disorders and Development (NINDS 5T32NS043124-20) and Muscular Dystrophy Association Development Grant 873841. A.J. and M.D. are in the Baylor College of Medicine Medical Scientist Training Program. License for HGMDpro was obtained through the Human Genome Sequencing Center at Baylor College of Medicine.

Footnotes

Potential Conflict of Interest

J.R.L. has stock ownership in 23andMe, is a paid consultant for the Regeneron Genetics Center, and is a co-inventor on multiple United States and European patents related to molecular diagnostics for inherited neuropathies, eye diseases, and bacterial genomic fingerprinting. The Department of Molecular and Human Genetics at Baylor College of Medicine receives revenue from clinical genetic testing conducted at Baylor Genetics (BG) Laboratories; J.R.L. is a member of the Scientific Advisory Board of BG. Other authors have no potential conflicts to report.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

REFERENCES

  1. Amberger JS, Bocchini CA, Schiettecatte F, et al. (2015). OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an Online catalog of human genes and genetic disorders. Nucleic Acids Research. 43(D1):D789–D798. doi: 10.1093/nar/gku1205. [DOI] [PMC free article] [PubMed] [Google Scholar]
  2. Amberger JS, Boccini CA, Scott AF, and Hamosh A (2019). OMIM.org: Leveraging knowledge across phenotype-gene relationships. Nucleic Acids Research. 47(D1):D1038–D1043. doi: 10.1093/nar/gky1151. [DOI] [PMC free article] [PubMed] [Google Scholar]
  3. Ashburner M, Ball CA, Blake JA, et al. , (2000). Gene ontology: tool for the unification of biology. Nature Genetics. 25(1):25–9. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  4. Bainbridge MN, Wang M, Wu Y, et al. (2011). Targeted enrichment beyond the consensus coding DNA sequence exome reveals exons with higher variant densities. Genome Biology. 12(7):R68. doi: 10.1186/gb-2011-12-7-r68. [DOI] [PMC free article] [PubMed] [Google Scholar]
  5. Balci TB, Hartley T, Xi Y, Dyment DA, Beaulieu CL, Bernier FP, et al. (2017). Debunking Occam's razor: Diagnosing multiple genetic diseases in families by whole-exome sequencing. Clin Genet, 92(3):281–289. doi: 10.1111/cge.12987. [DOI] [PubMed] [Google Scholar]
  6. Challis D, Yu J, Evani US, et al. (2012). An integrative variant analysis suite for whole exome next-generation sequencing data. BMC Bioinformatics. 13:8. doi: 10.1186/1471-2105-13-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  7. Doğan T (2018). HPO2GO: Prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences. PeerJ. 6:e5298. doi: 10.7717/peerj.5298. [DOI] [PMC free article] [PubMed] [Google Scholar]
  8. Eldomery MK, Coban-Akdemir Z, Harel T, et al. (2017). Lessons learned from additional research analyses of unsolved clinical exome cases. Genome Medicine. 9(1):26. doi: 10.1186/s13073-017-0412-6 [DOI] [PMC free article] [PubMed] [Google Scholar]
  9. Gonzaga-Jauregui C, Harel T, Gambin T, et al. (2015). Exome sequence analysis suggests that genetic burden contributes to phenotypic variability and complex neuropathy. Cell Reports.12:1169–1183. doi: 10.1016/j.celrep.2015.07.023. [DOI] [PMC free article] [PubMed] [Google Scholar]
  10. Farwell KD, Shahmirzadi L, El-Khechen D, et al. (2015). Enhanced utility of family-centered diagnostic exome sequencing with inheritance model-based analysis: results from 500 unselected families with undiagnosed genetic conditions. Genetics in Medicine.17(7):578–586. doi: 10.1038/gim.2014.154. [DOI] [PubMed] [Google Scholar]
  11. Greene D, Richardson S, Turro E (2017). OntologyX: A suite of R packages for working with ontological data. Bioinformatics. 33(7):1104–1106. doi: 10.1093/bioinformatics/btw763. [DOI] [PMC free article] [PubMed] [Google Scholar]
  12. Gu Z, Eils R, Schlesner M (2016). Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics. 32(18):2847–2849. doi: 10.1093/bioinformatics/btw313. [DOI] [PubMed] [Google Scholar]
  13. Hamamy H (2012). Consanguineous marriages. Preconception consultation in primary health care settings. Journal of Community Genetics.3(3):185–192. doi: 10.1007/s12687-011-0072-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  14. Hamosh A, Scott AF, Amberger JS, et al. (2005). Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Research. 33(Database issue):D514–7. doi: 10.1093/nar/gki033. [DOI] [PMC free article] [PubMed] [Google Scholar]
  15. He H, Lin J (2016). Pairwise word interaction modeling with deep neural networks for semantic similarity measurement. Proceedings of the 2016 conference of the North American Chapter of the Association for Computational Linguistics Conference.p. 937–948. [Google Scholar]
  16. Hirata H, Nanda I, van Riesen A, McMichael G, et al. (2013). ZC4H2 mutations are associated with arthrogryposis multiplex congenita and intellectual disability through impairment of central and peripheral synaptic plasticity. American Journal of Human Genetics. 92: 681–695. doi: 10.1016/j.ajhg.2013.03.021. [DOI] [PMC free article] [PubMed] [Google Scholar]
  17. Inoue K, Khajavi M, Ohjama T, et al. (2004). Molecular mechanisms for distinct neurological phenotypes conveyed by allelic truncating mutations. Nature Genetics.36(4):361–369. doi: 10.1038/ng1322. [DOI] [PubMed] [Google Scholar]
  18. Jehee FS, de Oliveira VT, Gurgel-Giannetti J, et al. (2017). Dual molecular diagnosis contributes to atypical Prader–Willi phenotype in monozygotic twins. American Journal of Medical Genetics Part A. 173A:2451–2455. doi: 10.1002/ajmg.a.38315. [DOI] [PMC free article] [PubMed] [Google Scholar]
  19. Källberg M, Wang H, Wang S, et al. (2012). Template-based protein structure modeling using the RaptorX web server. Nature Protocols 7, 1511–1522. [DOI] [PMC free article] [PubMed] [Google Scholar]
  20. Karaca E, Posey JE, Coban Akdemir Z, et al. (2018). Phenotypic expansion illuminates multilocus pathogenic variation. Genetics in Medicine, 20:1528–1537. doi: 10.1038/gim.2018.33. [DOI] [PMC free article] [PubMed] [Google Scholar]
  21. Kohler S, Vasilevski NA, Engelstad M, et al. (2017). The Human Phenotype Ontology in 2017. Nucleic Acids Research. 45(D1):865–876. doi: 10.1093/nar/gkw1039. [DOI] [PMC free article] [PubMed] [Google Scholar]
  22. Liu C, Peres Kury FS, Li Z, et al. (2019). Doc2Hpo: a web application for efficient and accurate HPO concept curation. Nucleic Acids Research. 47(W1):W566–W570. doi: 10.1093/nar/gkz386. [DOI] [PMC free article] [PubMed] [Google Scholar]
  23. Liu P, Meng L, Normand EA, et al. (2019). Renalysis of clinical exome sequencing data. New England Journal of Medicine, 380(25):2478–2480. [DOI] [PMC free article] [PubMed] [Google Scholar]
  24. Lupski JR (2021). Clan Genomics: from OMIM phenotypic traits to genes and biology. American Journal of Medical Genetics. Aug 18. doi: 10.1002/ajmg.a.62434. [DOI] [PMC free article] [PubMed] [Google Scholar]
  25. Lupski JR, Lui P, Stankiewicz P, Carvalho C, and Posey JE (2020). Clinical genomics and contextualizing genome variation in the diagnostic laboratory. Expert Review Molecular Diagnostics., 20(10):995–1002. doi: 10.1080/14737159.2020.1826312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  26. Lupski JR, Reid J, Gonzaga-Jauregui C, et al. (2010). Whole genome sequencing in a patient with Charcot-Marie Tooth Neuropathy. New England Journal of Medicine. 362:1181–1191. doi: 10.1056/NEJMoa0908094. [DOI] [PMC free article] [PubMed] [Google Scholar]
  27. McKusick VA (2007). Mendelian Inheritance in Man and its online version, OMIM. American Journal of Human Genetics. 80(4):588–604. doi: 10.1086/514346. [DOI] [PMC free article] [PubMed] [Google Scholar]
  28. McNeill EM, Klöckner-Bormann M, Roesler EC, Talton LE, et al. (2011). Nav2 hypomorphic mutant mice are ataxic and exhibit abnormalities in cerebellar development. Developmental Biology. 353(2):331–43. doi: 10.1016/j.ydbio.2011.03.008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  29. Merrill RA, Plum LA, Kaiser ME, Clagett-Dame M (2002). A mammalian homolog of unc-53 is regulated by all-trans retinoic acid in neuroblastoma cells and embryos. Proceedings of the National Academy of Sciences. 99: 3422–3427. doi: 10.1073/pnas.052017399. [DOI] [PMC free article] [PubMed] [Google Scholar]
  30. Mihaylova V, Salih MAM, Mukhtar MM, Abuzeid HA, El-Sadig SM, et al. (2009). Refinement of the clinical phenotype in musk-related congenital myasthenic syndromes. Neurology. 73: 1926–1928. doi: 10.1212/WNL.0b013e3181c3fce9. [DOI] [PubMed] [Google Scholar]
  31. Mitani T, Punetha J, Akalin I, et al. (2019). Bi-allelic pathogenic variants in TUBGCP2 cause microcephaly and lissencephaly spectrum disorders. American Journal of Human Genetics. 105(5):1005–1015. doi: 10.1016/j.ajhg.2019.09.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  32. Mitani T, Isikay S, Gezdirici A, et al. (2021). Evidence for a higher prevalence of oligogenic inheritance in neurodevelopmental disorders in the Turkish population. American Journal of Human Genetics. S0002-9297(21)00308–6. doi: 10.1016/j.ajhg.2021.08.009. [DOI] [Google Scholar]
  33. Pehlivan D, Bayram Y, Gunes N, et al. (2019). The genomics of arthrogryposis, a complex trait: candidate genes and further evidence for oligogenic inheritance. American Journal of Human Genetics. 105(1):132–150. doi: 10.1016/j.ajhg.2019.05.015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  34. Piluso G, Politano L, Aurino S, et al. (2005). Extensive scanning of the calpain-3 gene broadens the spectrum of LGMD2A phenotypes. Journal of Medical Genetics. 42(9):686–693. doi: 10.1136/jmg.2004.028738. [DOI] [PMC free article] [PubMed] [Google Scholar]
  35. Posey JE, O’Donnell-Luria AH, Chong JX, et al. (2019). Insights into genetics, human biology and disease gleaned from family based genomic studies. Genet Med, 21(4):798–812. doi: 10.1038/s41436-018-0408-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  36. Posey JE, Harel T, Liu P, et al. (2017). Resolution of disease phenotypes resulting from multilocus genomic variation. New England Journal of Medicine. 376(1):21–31. doi: 10.1056/NEJMoa1516767. [DOI] [PMC free article] [PubMed] [Google Scholar]
  37. Posey JE, Rosenfeld JA, James RA, et al. (2016). Molecular diagnostic experience of whole-exome sequencing in adult patients. Genet Med. 18(7):678–685. doi: 10.1038/gim.2015.142. [DOI] [PMC free article] [PubMed] [Google Scholar]
  38. Potocki L, Chen KS, Koeuth T, et al. (1999). DNA rearrangements on both homologues of chromosome 17 in a mildly delayed individual with a family history of autosomal dominant carpal tunnel syndrome. American Journal of Human Genetics. 64:471–478. doi: 10.1086/302240. [DOI] [PMC free article] [PubMed] [Google Scholar]
  39. Purcell S, Neale B, Todd-Brown K, et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 81(3):559–575. doi: 10.1086/519795. [DOI] [PMC free article] [PubMed] [Google Scholar]
  40. Reid JG, Carroll A, Veeraraghavan N, et al. (2014). Launching genomics into the cloud: deployment of Mercury, a next generation sequence analysis pipeline. BMC Bioinformatics. 15:30. doi: 10.1186/1471-2105-15-30. [DOI] [PMC free article] [PubMed] [Google Scholar]
  41. Retterer K, Juusola J, Cho MT, et al. (2016). Clinical application of whole exome sequencing across clinical indications. Genet Med. 18(7):696–704. doi: 10.1038/gim.2015.148. [DOI] [PubMed] [Google Scholar]
  42. Richard I, Broux O, Allamand V, Fougerousse F, et al. (1995). Mutations in the proteolytic enzyme calpain 3 cause limb-girdle muscular dystrophy type 2A. Cell. 81: 27–40. doi: 10.1016/0092-8674(95)90368-2. [DOI] [PubMed] [Google Scholar]
  43. Robinson PN, Köhler S, Bauer S, et al. (2008). The Human Phenotype Ontology: A Tool for Annotating and Analyzing Human Hereditary Disease. American Journal of Human Genetics. 83(5):610–615. doi: 10.1016/j.ajhg.2008.09.017. [DOI] [PMC free article] [PubMed] [Google Scholar]
  44. Sallevelt SC, Stegmann AP, de Koning B, et al. (2021). Diagnostic exome-based preconception carrier testing in consanguineous couples: results from the first 100 couples in clinical practice. Genet Med. doi: 10.1038/s41436-021-01116-x. Online ahead of print. [DOI] [PMC free article] [PubMed] [Google Scholar]
  45. Smith ED, Blanco K, Sajan SA, et al. (2019). A retrospective review of multiple findings in diagnostic exome sequencing: half are distinct and half are overlapping diagnoses. Genet Med. 21(10):2199–2207. doi: 10.1038/s41436-019-0477-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  46. Stenson PD, Ball EV, Mort M, et al. (2003). Human Gene Mutation Database (HGMD): 2003 update. Human Mutation. 21(6):577–581. doi: 10.1002/humu.10212. [DOI] [PubMed] [Google Scholar]
  47. Tarailo-Graovac M, Shyr C, Ross CJ, et al. (2016). Exome Sequencing and the Management of Neurometabolic Disorders. New England Journal of Medicine. 374(23):2246–2255. doi: 10.1056/NEJMoa1515792. [DOI] [PMC free article] [PubMed] [Google Scholar]
  48. The Gene Ontology Consortium. (2019). The Gene Ontology Resource: 20 years and still going strong. Nucleic Acids Research. 47(D1):D330–D338. doi: 10.1093/nar/gky1055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Ward JH (1963). Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association. 58:236–244. [Google Scholar]
  50. Wohler E, Martin R, Griffith S, et al. (2021). PhenoDB, GeneMatcher, and VariantMatcher, tools for analysis and sharing of sequence data. Orphanet J Rare Dis. doi: 10.1186/s13023-021-01916-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Yang Y, Muzny DM, Xia F, et al. (2014). Molecular findings among patients referred for clinical whole-exome sequencing. Journal of the American Medical Association. 312:1870–1879. doi: 10.1001/jama.2014.14601. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Yang Y, Muzny DM, Reid JG, et al. (2013). Clinical whole-exome sequencing for the diagnosis of mendelian disorders. New England Journal of Medicine. 369:1502–1511. doi: 10.1056/NEJMoa1306555. [DOI] [PMC free article] [PubMed] [Google Scholar]
  53. Yuan B, Harel T, Gu S, et al. (2015). Nonrecurrent 17p11.2p12 Rearrangement Events that Result in Two Concomitant Genomic Disorders: The PMP22-RAI1 Contiguous Gene Duplication Syndrome. American Journal of Human Genetics. 97:691–707. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

Supplementary Materials

Figure S1. Protein modeling of missense variant alleles in CAPN3, MUSK, NAV2, and ZC4H2 shows abnormal secondary and tertiary structure formation.

Protein FASTA sequences were obtained from UniProt for use as the reference protein sequences. Variant FASTA sequences were made by introducing the appropriate amino acid change into each reference sequence. RaptorX was used to model protein structure with the reference and variant FASTA sequences as input. The best model provided by RaptorX was then used to view detailed changes to protein structure at the site of the amino acid change with the Research Collaboratory for Structural Bioinformatics Protein Data Bank tool (rcsb.org). Note that the following discussion is based on these predicted structures.

a) In CAPN3, LEU87 forms noncovalent bonds with GLU84, THR85, and PHE86 that are important to alpha helix secondary structure. The L87V change disrupts these noncovalent bonds and shifts the alpha helical structure.

b) In MUSK, ALA594 forms noncovalent bonds with GLU576 and THR604 that stabilize beta sheets. The change of ALA594 to VAL594 alters this noncovalent bonding, allowing PRO595 to form a noncovalent interaction and changing the beta sheet structure.

c) In NAV2, the change from GLY666, a flexible (low steric hindrance) residue, to ARG666 causes widening and increased tortuosity of a loop.

d) In ZC4H2, ASN199 forms hydrogen bonds with ASP223 and LYS219 that are important to the structure of a loop near a more central alpha helix. The change of ASN199 to HIS199 disrupts both bonds and causes poor interaction between the same loop and alpha helix, impacting tertiary structure.
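The step of deriving a variant FASTA from a reference sequence can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `apply_missense` and `to_fasta` are hypothetical, and the 10-residue sequence is a toy example rather than any of the UniProt sequences used in the study. The sketch assumes 1-based residue numbering, as in p.L87V, and checks that the reference residue matches before substituting.

```python
def apply_missense(sequence: str, ref_aa: str, position: int, alt_aa: str) -> str:
    """Return `sequence` with the residue at 1-based `position` changed
    from `ref_aa` to `alt_aa` (one-letter amino acid codes)."""
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    observed = sequence[position - 1]
    if observed != ref_aa:
        # Guards against numbering against the wrong isoform or sequence.
        raise ValueError(
            f"expected {ref_aa} at position {position}, found {observed}"
        )
    return sequence[: position - 1] + alt_aa + sequence[position:]


def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record wrapped at `width` characters."""
    lines = [sequence[i : i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Toy sequence for illustration only (hypothetical, not a real protein).
toy_reference = "MKTLEFGANS"
toy_variant = apply_missense(toy_reference, "L", 4, "V")
toy_record = to_fasta("toy_protein|p.L4V", toy_variant)
```

For a real analysis, the reference string would be the UniProt sequence for the relevant protein, and the resulting record would be supplied to the structure prediction tool in place of the toy record.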

tS1

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
