Table 1.
Feature label | RF (sklearn) | BRF (imblearn) |
---|---|---|
HPO-cosine | 0.2895 | 0.2471 |
PyxisMap | 0.2207 | 0.2079 |
CADD Scaled | 0.1031 | 0.1007 |
phylop100 conservation | 0.0712 | 0.0817 |
phylop conservation | 0.0641 | 0.0810 |
phastcon100 conservation | 0.0572 | 0.0628 |
GERP rsScore | 0.0357 | 0.0416 |
HGMD assessment type_DM | 0.0373 | 0.0344 |
HGMD association confidence_High | 0.0309 | 0.0311 |
Gnomad Genome total allele count | 0.0192 | 0.0322 |
ClinVar Classification_Pathogenic | 0.0228 | 0.0200 |
ADA Boost Splice Prediction | 0.0081 | 0.0109 |
Random Forest Splice Prediction | 0.0077 | 0.0105 |
Meta Svm Prediction_D | 0.0088 | 0.0092 |
PolyPhen HV Prediction_D | 0.0075 | 0.0071 |
Effects_Premature stop | 0.0049 | 0.0057 |
SIFT Prediction_D | 0.0026 | 0.0056 |
PolyPhen HD Prediction_D | 0.0025 | 0.0049 |
Effects_Possible splicing modifier | 0.0029 | 0.0035 |
ClinVar Classification_Likely Pathogenic | 0.0034 | 0.0020 |
This table shows the top 20 of the features used to train the classifiers, ordered from most to least important. After training, each of the two random forest classifiers reports the importance of every feature (the importances sum to 1.00 per classifier). We average the two importance values for each feature and order the features by that average. Feature labels containing an ‘_’ represent a single category of a multi-category feature (e.g. “HGMD assessment type_DM” is the “DM” bin-count feature from the “HGMD assessment type” annotation in Codicem).
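
The ordering described in the caption can be illustrated with a minimal sketch. It assumes the per-classifier importances have already been read out (in scikit-learn and imbalanced-learn, fitted forest classifiers expose them through the `feature_importances_` attribute) and uses five rows copied from Table 1. The function name `rank_by_mean_importance` and the dictionary layout are hypothetical and chosen only for illustration; this is not the authors' code.

```python
# Importance values copied from Table 1 for a handful of features,
# stored as (RF importance, BRF importance). The layout is an assumption.
importances = {
    "HPO-cosine": (0.2895, 0.2471),
    "PyxisMap": (0.2207, 0.2079),
    "GERP rsScore": (0.0357, 0.0416),
    "HGMD assessment type_DM": (0.0373, 0.0344),
    "CADD Scaled": (0.1031, 0.1007),
}


def rank_by_mean_importance(importances):
    """Return (label, mean importance) pairs, most important first."""
    # Average the two classifiers' importance values for each feature.
    means = {label: (rf + brf) / 2 for label, (rf, brf) in importances.items()}
    # Sort by the averaged importance in descending order.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_mean_importance(importances)
for label, mean in ranked:
    print(f"{label}: {mean:.4f}")
```

Ranking by the average explains why “GERP rsScore” is listed above “HGMD assessment type_DM” even though the latter has the higher RF importance: the mean over both classifiers is larger for GERP.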