Table 1.
Study Type | Species | Total number of effect level values | Number of unique chemicals with curated structure and descriptors | Training-Test set split | Theoretical lower bound on RMSE (mean standard deviation of effect level values, Figure 2) | Random forest, internal training set (5-fold cross-validation): RMSE | Random forest, internal training set (5-fold cross-validation): RMSE/σ | Random forest, internal training set (5-fold cross-validation): R² | Random forest, external test set: RMSE | Random forest, external test set: RMSE/σ | Random forest, external test set: R²
---|---|---|---|---|---|---|---|---|---|---|---
Chronic (CHR) | Rat | 7172 | 1129 | 903-226 | 0.51 | 0.93 | 0.76 | 0.43 | 0.93 | 0.82 | 0.33
Chronic (CHR) | Mouse | 4029 | 720 | 576-144 | 0.50 | 0.86 | 0.84 | 0.29 | 0.96 | 0.82 | 0.33
Chronic (CHR) | Rat, Mouse | 11201 | 1236 | 988-248 | 0.55 | 0.94 | 0.78 | 0.39 | 0.92 | 0.77 | 0.40
Subchronic (SUB) | Rat | 36017 | 3199 | 2559-640 | 0.48 | 0.81 | 0.82 | 0.33 | 0.86 | 0.85 | 0.28
Subchronic (SUB) | Mouse | 5030 | 723 | 578-145 | 0.50 | 0.88 | 0.83 | 0.31 | 0.86 | 0.80 | 0.37
Subchronic (SUB) | Rabbit | 1516 | 415 | 332-83 | 0.49 | 0.96 | 0.96 | 0.08 | 0.83 | 0.90 | 0.20
Subchronic (SUB) | Rat, Mouse, Rabbit | 42563 | 3306 | 2644-662 | 0.50 | 0.83 | 0.83 | 0.31 | 0.80 | 0.82 | 0.33
Reproductive (REP) | Rat | 5446 | 841 | 672-169 | 0.49 | 0.79 | 0.80 | 0.36 | 0.76 | 0.84 | 0.30
Reproductive (REP) | Mouse | 505 | 87 | 69-18 | 0.36 | 1.80 | 1.09 | −0.19 | 0.88 | 0.71 | 0.50
Reproductive (REP) | Rat, Mouse | 5951 | 889 | 711-178 | 0.49 | 0.91 | 0.86 | 0.26 | 0.79 | 0.83 | 0.31
Developmental (DEV) | Rat | 6021 | 930 | 744-186 | 0.41 | 0.80 | 0.92 | 0.16 | 0.78 | 0.89 | 0.20
Developmental (DEV) | Mouse | 704 | 116 | 92-24 | 0.34 | 1.05 | 0.96 | 0.09 | 0.80 | 1.05 | −0.09
Developmental (DEV) | Rabbit | 3220 | 491 | 392-99 | 0.38 | 0.76 | 0.89 | 0.20 | 0.88 | 0.95 | 0.10
Developmental (DEV) | Rat, Mouse, Rabbit | 9945 | 1004 | 803-201 | 0.42 | 0.76 | 0.89 | 0.29 | 0.80 | 0.86 | 0.26
Subacute (SAC) | Rat | 946 | 160 | 128-32 | 0.58 | 0.92 | 0.98 | 0.05 | 1.04 | 1.05 | −0.10
ALL (CHR, SUB, REP, DEV, SAC) | All (Rat, Mouse, Rabbit) | 71020 | 3632 | 2905-727 | 0.53 | 0.82 | 0.82 | 0.32 | 0.81 | 0.80 | 0.36
*ALL (CHR, SUB, REP, DEV, SAC) | All (Rat, Mouse, Rabbit) | 71020 | 3632 | 8349-2088 | - | 0.67 | 0.65 | 0.54 | 0.70 | 0.64 | 0.57
*The numbers in this row reflect the final model (Figure 5), which uses study type and species as additional descriptors. Each chemical is therefore used more than once in the training dataset, as separate instances, which accounts for the much larger training and test sets.
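
The three metrics reported for each set are closely related. The following is a minimal sketch of one way they could be computed. It assumes that σ is the population standard deviation of the observed values in the set being scored, in which case R² = 1 − (RMSE/σ)². The table does not state how σ was computed, so this is an assumption, although the rounded values are broadly consistent with it: the subchronic rat external test set, for example, reports RMSE/σ = 0.85 and R² = 0.28, and 1 − 0.85² ≈ 0.28. The function name `metrics` and the sample values are hypothetical and are not taken from the study.

```python
import math


def metrics(y_true, y_pred):
    """Return (RMSE, RMSE/sigma, R2) for paired observed and predicted values."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    rmse = math.sqrt(ss_res / n)
    # Assumption: sigma is the population standard deviation of the observed values.
    sigma = math.sqrt(ss_tot / n)
    r2 = 1.0 - ss_res / ss_tot
    return rmse, rmse / sigma, r2


# Hypothetical effect level values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
rmse, ratio, r2 = metrics(observed, predicted)
```

Under this definition, RMSE/σ above 1 corresponds to a negative R², meaning the model does worse than always predicting the mean of the observed values, as in the reproductive mouse cross-validation row (RMSE/σ = 1.09, R² = −0.19).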