Fig. 5. Machine learning identifies features relevant to the diagnostic yield and can support variant prioritization.
a, The coefficient paths of regression analysis using the LASSO are shown. Only features that are included in the final model and are present in at least 5% of the cases that were used for training are depicted. The more to the left [lower ln(λ)] a coefficient path starts to deviate from the x axis, the more informative the corresponding feature is in terms of predicting the diagnostic yield. Features with positive coefficients increase the diagnostic yield. In contrast, features with negative coefficients render a monogenic cause less likely. For example, dysfunction of higher cognitive abilities and ataxia are associated with a higher diagnostic yield (clinical features are colored according to their higher-order HPO groups; for details, see Supplementary Note). An algorithm to predict the diagnostic yield (YieldPred) was developed on the basis of these data and can be found online (https://translate-namse.de). b, The performances of variant prioritization approaches were compared. All disease-associated genes were ranked using the respective variant prioritization method. Subsequently, the proportion of cases detected with the correct disease-associated gene (sensitivity) was shown as a function of the number of disease-associated genes considered, beginning at the top score. The following four approaches for variant prioritization were tested in solved cases from the PEDIA cohort (n = 94): (1) only a molecular pathogenicity score (CADD68) with top-10 accuracy of 48%; (2) feature-based score (CADA69) in addition to CADD with top-10 accuracy of 68%; and (3 and 4) a gestalt score from facial image analysis (GestaltMatcher40) alone or in addition to both CADD and CADA referred to as PEDIA score41 with top-10 accuracy of 82%. Note that the bold lines indicate the observed top-k accuracy and bootstrapped 95% CIs are indicated by the lighter shading around the lines. MRI, magnetic resonance imaging; abn., abnormality; con., congenital; dysf., dysfunction; psych., psychiatric; sym., symptoms; sec, secondary.