Table 3.
Features in the “hero” optimized cost-sensitive RF classifier ranked by importance
Model's feature | MDI | Model's feature | MDI |
---|---|---|---|
other_lipid_lowering_drugs_duration_yrs | 0.52 | alcohol_current_consumption | 0.2 |
surgery_type | 0.41 | smoking_time_since_quitting_yrs | 0.2 |
radio_bolus | 0.4 | radio_imrt | 0.19 |
chemotherapy | 0.36 | radio_photon_boostdose_Gy | 0.19 |
boost | 0.35 | other_antihypertensive_drug | 0.19 |
radio_photon_dose_MV | 0.34 | household_members | 0.19 |
epirubicin_chemo_drug | 0.34 | radio_breast_fractions_dose_per_fraction_Gy | 0.19 |
blood_pressure | 0.33 | radio_elec_boost_field_y_cm | 0.19 |
Bra_band_size | 0.3 | radio_photon_2nd | 0.19 |
radio_treated_breast | 0.3 | bra_cup_size | 0.19 |
tumour_size_mm | 0.29 | radio_breast_fractions | 0.19 |
paclitaxel_chemo_drug | 0.29 | n_stage | 0.18 |
grade_invasive | 0.28 | hypertension_duration_yrs | 0.18 |
breast_separation | 0.28 | radio_supraclavicular_fossa | 0.18 |
smoking | 0.27 | education_profession | 0.18 |
radio_elec_energy_MeV | 0.27 | radio_axillary_levels | 0.18 |
BED_boost | 0.27 | hypertension | 0.18 |
docetaxel_chemo_drug | 0.27 | radio_photon_boost_fractions_per_week | 0.17 |
BED_Total | 0.27 | smoker | 0.17 |
radio_elec_boost_dose_Gy | 0.27 | depression | 0.17 |
On_tamoxifen | 0.26 | menopausal_status | 0.17 |
radio_heart_mean_dose_Gy | 0.26 | radio_boost_diameter_cm | 0.16 |
t_stage | 0.26 | 5-fluorouracil (5-FU)_chemo_drug | 0.16 |
radio_hot_spots_107 | 0.25 | radio_photon_boost_dose_per_fraction_Gy | 0.16 |
BED_Breast | 0.25 | antidepressant_duration_yrs | 0.16 |
tobacco_products_per_day | 0.25 | radio_breast_fractions_per_week | 0.15 |
age_at_radiotherapy_start_yrs | 0.25 | radio_boost_type | 0.15 |
radio_breast_ct_volume_cm3 | 0.25 | Carboplatin_chemo_drug | 0.15 |
hormone_replacement_therapy | 0.24 | radio_boost_sequence | 0.15 |
radio_photon_boost_volume_cm3 | 0.24 | radio_photon_boost_fractions | 0.15 |
antidepressant | 0.24 | household_income | 0.15 |
height_cm | 0.24 | methotrexate_chemo_drug | 0.15 |
radio_photon_2nd_energy_MV | 0.24 | other_lipid_lowering_drugs | 0.14 |
radio_ipsilateral_lung_mean_Gy | 0.24 | radio_photon_energy_MV or kV | 0.14 |
alcohol_previous_consumption | 0.24 | ace_inhibitor | 0.13 |
radio_photon_2nd_dose_fractions_per_week | 0.23 | analgesics_duration_yrs | 0.13 |
radio_skin_max_dose_Gy | 0.23 | radio_photon_2nd_dose_per_fraction_Gy | 0.13 |
histology | 0.23 | antidiabetic_duration_yrs | 0.13 |
monopause_age_yrs | 0.23 | depression_duration_yrs | 0.13 |
other_antihypertensive_drug_duration_yrs | 0.23 | on_statin_duration_yrs | 0.12 |
weight_at_cancer_diagnosis_kg | 0.23 | antidiabetic | 0.12 |
tobacco_product | 0.23 | diabetes | 0.11 |
cyclophosphamide_chemo_drug | 0.22 | ace_inhibitor_duration_yrs | 0.11 |
combined_chemo_drugs | 0.22 | on_statin | 0.11 |
boost_frac | 0.22 | doxorubicin_chemo_drug | 0.11 |
analgesics | 0.22 | history_of_heart_disease | 0.09 |
breast_cancer_family_history_1st_degree | 0.22 | radio_axillary_other | 0.09 |
smoking_duration_yrs | 0.21 | ethnicity | 0.09 |
radio_photon_boostdose_precise_Gy | 0.21 | radio_interrupted | 0.08 |
radio_elec_boost_field_x_cm | 0.21 | pegfilgrastim_chemo_drug | 0.07 |
radio_photon_2nd_fractions | 0.21 | history_of_heart_disease_duration_yrs | 0.06 |
radio_boost_fractions | 0.21 | radiotherapy_toxicity_family_history | 0.06 |
alcohol_intake | 0.21 | diabetes_duration_yrs | 0.05 |
radio_type_imrt | 0.21 | radio_interrupted_days | 0.05 |
radio_treatment_pos | 0.21 | trastuzumab_chemo_drug | 0.04 |
radio_breast_dose_Gy | 0.2 | other_collagen_vascular_disease | 0.03 |
rheumatoid arthritis_duration_yrs | 0.2 | rheumatoid arthritis | 0.02 |
Abbreviations: BED = biologically effective dose; IMRT = intensity-modulated radiation therapy; MDI = mean decrease in impurity; MeV = megaelectronvolt; MV = megavolt; RF = random forest.
Feature importance is calculated as the decrease in node impurity weighted by the probability of reaching that node, where the node probability is the number of samples that reach the node divided by the total number of samples. The higher the value, the more important the feature.
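The calculation described in this note can be illustrated with a minimal sketch. It assumes Gini impurity as the impurity measure and unweighted sample counts; neither is stated in the table note, and the cost-sensitive weighting of the actual model is not reproduced. The function names, the toy split counts, and the choice of the two features are hypothetical and serve only to show the arithmetic for a single tree. In a forest, the per-tree values would then be averaged over all trees.

```python
def gini(counts):
    """Gini impurity of a node given its per-class sample counts (assumed impurity measure)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def node_importance(parent, left, right, n_total):
    """Impurity decrease at one split, weighted by the probability of reaching the node."""
    n_parent = sum(parent)
    n_left, n_right = sum(left), sum(right)
    # Node probability: samples reaching the node / total samples.
    p_node = n_parent / n_total
    decrease = (
        gini(parent)
        - (n_left / n_parent) * gini(left)
        - (n_right / n_parent) * gini(right)
    )
    return p_node * decrease


def tree_mdi(splits, n_total):
    """Sum the weighted impurity decreases per feature over all splits of one tree."""
    importance = {}
    for feature, parent, left, right in splits:
        contribution = node_importance(parent, left, right, n_total)
        importance[feature] = importance.get(feature, 0.0) + contribution
    return importance


# Hypothetical toy tree on 100 samples with two classes.
# Each split is (feature, parent counts, left child counts, right child counts).
splits = [
    ("surgery_type", [50, 50], [40, 10], [10, 40]),  # root split, reached by all samples
    ("chemotherapy", [40, 10], [40, 0], [0, 10]),    # split of the left child, reached by half
]
mdi = tree_mdi(splits, n_total=100)
for feature, value in sorted(mdi.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{feature}: {value:.2f}")
```

In this toy example the second split removes more impurity locally (0.32 versus 0.18), but it is reached by only half of the samples, so its weighted contribution (0.16) is smaller than that of the root split (0.18). The sketch applies no normalization or rescaling to the values.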