Skip to main content
. 2024 Jul 22;56(8):1644–1653. doi: 10.1038/s41588-024-01836-1

Fig. 5. Machine learning identifies features relevant to the diagnostic yield and can support variant prioritization.

Fig. 5

a, The coefficient paths of regression analysis using the LASSO are shown. Only features that are included in the final model and are present in at least 5% of the cases that were used for training are depicted. The more to the left [lower ln(λ)] a coefficient path starts to deviate from the x axis, the more informative the corresponding feature is in terms of predicting the diagnostic yield. Features with positive coefficients increase the diagnostic yield. In contrast, features with negative coefficients render a monogenic cause less likely. For example, dysfunction of higher cognitive abilities and ataxia are associated with a higher diagnostic yield (clinical features are colored according to their higher-order HPO groups; for details, see Supplementary Note). An algorithm to predict the diagnostic yield (YieldPred) was developed on the basis of these data and can be found online (https://translate-namse.de). b, The performances of variant prioritization approaches were compared. All disease-associated genes were ranked using the respective variant prioritization method. Subsequently, the proportion of cases detected with the correct disease-associated gene (sensitivity) was shown as a function of the number of disease-associated genes considered, beginning at the top score. The following four approaches for variant prioritization were tested in solved cases from the PEDIA cohort (n = 94): (1) only a molecular pathogenicity score (CADD68) with top-10 accuracy of 48%; (2) feature-based score (CADA69) in addition to CADD with top-10 accuracy of 68%; and (3 and 4) a gestalt score from facial image analysis (GestaltMatcher40) alone or in addition to both CADD and CADA referred to as PEDIA score41 with top-10 accuracy of 82%. Note that the bold lines indicate the observed top-k accuracy and bootstrapped 95% CIs are indicated by the lighter shading around the lines. MRI, magnetic resonance imaging; abn., abnormality; con., congenital; dysf., dysfunction; psych., psychiatric; sym., symptoms; sec, secondary.