Table 2.
Impact of removing feature clusters (selected by their respective linkage distances) on the predictive performance of the LGBM model.
| No. of features | Mean absolute error (n = 10) | Standard deviation (n = 10) | Ward's linkage distance |
|---|---|---|---|
| 17 | 0.116 | 0.018 | 0.00 |
| 15 | 0.116 | 0.017 | 0.06 |
| 13 | 0.142 | 0.017 | 0.12 |
| 12 | 0.143 | 0.017 | 0.24 |
| 11 | 0.143 | 0.018 | 0.29 |
| 10 | 0.143 | 0.019 | 0.35 |
| 9 | 0.139 | 0.022 | 0.53 |
| 8 | 0.150 | 0.021 | 0.76 |
| 5 | 0.296 | 0.023 | 0.82 |
| 4 | 0.296 | 0.024 | 0.88 |
The performance of the LGBM model with each number of input features was assessed by comparing the mean and standard deviation of the absolute error (AE) values obtained from a series of trials (n = 10), each of which randomly assigned 20% of the drug–polymer combinations to a holdout test set.
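The evaluation procedure described above can be illustrated with a minimal sketch. This is not the authors' code: the function name `repeated_holdout_mae`, the `(group, features, target)` row layout, the `fit`/`predict` callables, and the seed are all hypothetical. It also assumes that a mean absolute error is computed per trial and then summarized across trials with a sample standard deviation, which the text does not state explicitly.

```python
import random
import statistics


def repeated_holdout_mae(rows, fit, predict, n_trials=10, test_fraction=0.2, seed=0):
    """Repeated group-wise holdout evaluation (illustrative sketch).

    rows: list of (group, features, target) tuples, where `group` identifies
          a drug-polymer combination (assumed layout, not from the source).
    fit: callable taking a list of (features, target) pairs, returning a model.
    predict: callable taking (model, features), returning a prediction.
    Returns the mean and sample standard deviation of the per-trial MAE.
    """
    rng = random.Random(seed)
    groups = sorted({group for group, _, _ in rows})
    n_test = max(1, round(test_fraction * len(groups)))

    trial_maes = []
    for _ in range(n_trials):
        # Hold out whole combinations so none appears in both train and test.
        test_groups = set(rng.sample(groups, n_test))
        train = [(x, y) for g, x, y in rows if g not in test_groups]
        test = [(x, y) for g, x, y in rows if g in test_groups]

        model = fit(train)
        abs_errors = [abs(predict(model, x) - y) for x, y in test]
        trial_maes.append(statistics.mean(abs_errors))

    return statistics.mean(trial_maes), statistics.stdev(trial_maes)


# Hypothetical usage with a trivial baseline that predicts the training mean.
def fit_mean(train):
    return statistics.mean(y for _, y in train)


def predict_mean(model, features):
    return model
```

In practice, `fit` and `predict` would wrap the LGBM model trained on the reduced feature set, and the function would be called once per row of the table. Holding out entire drug–polymer combinations, rather than individual samples, is what keeps the test set free of combinations seen during training.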