Comput Toxicol. Author manuscript; available in PMC: 2021 Nov 1.
Published in final edited form as: Comput Toxicol. 2020 Nov 1;16(November 2020):10.1016/j.comtox.2020.100139. doi: 10.1016/j.comtox.2020.100139

Table 1.

Summary of the dataset and random forest model performance metrics for the study type and species combinations used in this study. Only combinations with more than 50 unique chemicals were used in this analysis. The mean standard deviation of effect level values for each combination reflects the underlying variability in the data and can be thought of as a theoretical lower bound on RMSE; that is, a model's RMSE can be expected to be higher than this value.

| Study type | Species | Total number of effect level values | Number of unique chemicals with curated structure and descriptors | Training-test split | Theoretical lower bound on RMSE (mean standard deviation of effect level values, Figure 2) | Internal training set (5-fold CV): RMSE | Internal training set (5-fold CV): RMSE/σ | Internal training set (5-fold CV): R² | External test set: RMSE | External test set: RMSE/σ | External test set: R² |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Chronic (CHR) | Rat | 7172 | 1129 | 903-226 | 0.51 | 0.93 | 0.76 | 0.43 | 0.93 | 0.82 | 0.33 |
| | Mouse | 4029 | 720 | 576-144 | 0.50 | 0.86 | 0.84 | 0.29 | 0.96 | 0.82 | 0.33 |
| | Rat, Mouse | 11201 | 1236 | 988-248 | 0.55 | 0.94 | 0.78 | 0.39 | 0.92 | 0.77 | 0.40 |
| Subchronic (SUB) | Rat | 36017 | 3199 | 2559-640 | 0.48 | 0.81 | 0.82 | 0.33 | 0.86 | 0.85 | 0.28 |
| | Mouse | 5030 | 723 | 578-145 | 0.50 | 0.88 | 0.83 | 0.31 | 0.86 | 0.80 | 0.37 |
| | Rabbit | 1516 | 415 | 332-83 | 0.49 | 0.96 | 0.96 | 0.08 | 0.83 | 0.90 | 0.20 |
| | Rat, Mouse, Rabbit | 42563 | 3306 | 2644-662 | 0.50 | 0.83 | 0.83 | 0.31 | 0.80 | 0.82 | 0.33 |
| Reproductive (REP) | Rat | 5446 | 841 | 672-169 | 0.49 | 0.79 | 0.80 | 0.36 | 0.76 | 0.84 | 0.30 |
| | Mouse | 505 | 87 | 69-18 | 0.36 | 1.80 | 1.09 | −0.19 | 0.88 | 0.71 | 0.50 |
| | Rat, Mouse | 5951 | 889 | 711-178 | 0.49 | 0.91 | 0.86 | 0.26 | 0.79 | 0.83 | 0.31 |
| Developmental (DEV) | Rat | 6021 | 930 | 744-186 | 0.41 | 0.80 | 0.92 | 0.16 | 0.78 | 0.89 | 0.20 |
| | Mouse | 704 | 116 | 92-24 | 0.34 | 1.05 | 0.96 | 0.09 | 0.80 | 1.05 | −0.09 |
| | Rabbit | 3220 | 491 | 392-99 | 0.38 | 0.76 | 0.89 | 0.20 | 0.88 | 0.95 | 0.10 |
| | Rat, Mouse, Rabbit | 9945 | 1004 | 803-201 | 0.42 | 0.76 | 0.89 | 0.29 | 0.80 | 0.86 | 0.26 |
| Subacute (SAC) | Rat | 946 | 160 | 128-32 | 0.58 | 0.92 | 0.98 | 0.05 | 1.04 | 1.05 | −0.10 |
| ALL (CHR, SUB, REP, DEV, SAC) | All (Rat, Mouse, Rabbit) | 71020 | 3632 | 2905-727 | 0.53 | 0.82 | 0.82 | 0.32 | 0.81 | 0.80 | 0.36 |
| *ALL (CHR, SUB, REP, DEV, SAC) | All (Rat, Mouse, Rabbit) | 71020 | 3632 | 8349-2088 | - | 0.67 | 0.65 | 0.54 | 0.70 | 0.64 | 0.57 |
*The numbers in this row reflect the final model (Figure 5), which uses study type and species as additional descriptors. Chemicals are therefore used more than once in the dataset, as separate instances, which is why the training and test sets are much larger.
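The table reports RMSE/σ alongside R². Assuming σ is the population standard deviation of the observed effect level values in the same set (the table does not define σ explicitly), the two are linked by R² = 1 − (RMSE/σ)². A model that does no better than predicting the mean then has RMSE/σ = 1 and R² = 0, and RMSE/σ above 1 gives a negative R². This roughly matches, for example, the reproductive mouse training row (1.09, −0.19) and the developmental mouse and subacute test rows (1.05, with −0.09 and −0.10); most other rows agree closely too, with small discrepancies that may come from rounding or from averaging across cross-validation folds. The hypothetical helper below is a minimal standard-library sketch of the three metrics, not the authors' code.

```python
import math
from statistics import fmean, pstdev


def regression_metrics(observed, predicted):
    """Return (RMSE, RMSE/sigma, R^2) for paired observed/predicted values.

    Assumption: sigma is the population standard deviation of the observed
    values in the same set; the source table does not define it explicitly.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty sequences")
    n = len(observed)
    mean_obs = fmean(observed)
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    rmse = math.sqrt(ss_res / n)
    sigma = pstdev(observed)
    # R^2 = 1 - SS_res/SS_tot, which equals 1 - (RMSE/sigma)^2 for population sigma.
    r2 = 1.0 - ss_res / ss_tot
    return rmse, rmse / sigma, r2


# Hypothetical values, only to illustrate the calculation.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 1.5, 3.5, 3.5]
rmse, ratio, r2 = regression_metrics(observed, predicted)
print(round(rmse, 3), round(ratio, 3), round(r2, 3))  # 0.5 0.447 0.8
```

Reporting RMSE/σ makes errors comparable across rows whose effect level values have different spreads, which raw RMSE alone does not.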
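The caption does not say exactly how the mean standard deviation is computed. One plausible reading, assumed here, is: within a study type and species combination, take every chemical with at least two effect level values, compute the sample standard deviation of its values, and average those standard deviations across chemicals. Because a model that predicts a single value per chemical cannot reproduce the scatter among that chemical's own values, this average acts as a rough floor on achievable RMSE. The function name and the (chemical, value) record layout are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, stdev


def mean_within_chemical_sd(records):
    """Average, over chemicals with >= 2 values, of the sample SD of their values.

    records: iterable of (chemical_id, effect_level_value) pairs for one
    study type and species combination (hypothetical layout).
    """
    values_by_chemical = defaultdict(list)
    for chemical_id, value in records:
        values_by_chemical[chemical_id].append(value)
    # Chemicals with a single value carry no information about replicate scatter.
    sds = [stdev(vals) for vals in values_by_chemical.values() if len(vals) >= 2]
    if not sds:
        raise ValueError("no chemical has more than one effect level value")
    return fmean(sds)


# Hypothetical records: chemical C has one value and is skipped.
records = [("A", 1.0), ("A", 2.0), ("B", 3.0), ("B", 3.0), ("B", 6.0), ("C", 4.0)]
print(round(mean_within_chemical_sd(records), 4))  # 1.2196
```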
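Every training-test split in the table is consistent with holding out 20% of the chemicals and rounding the test-set count up, for example 1129 → 903-226 and 1236 → 988-248 (247.2 rounded up to 248). scikit-learn's `train_test_split` with `test_size=0.2` rounds the same way, but the source does not say which tool was used. The sketch below, with hypothetical helper names and an assumed random, chemical-level split, reproduces those counts with exact integer arithmetic and then assigns the training chemicals to five cross-validation folds; fitting the random forest itself is left out.

```python
import random


def split_sizes(n_chemicals, test_percent=20):
    """Return (n_train, n_test) with the test count rounded up."""
    # Negating, floor-dividing, and negating again gives an exact integer
    # ceiling of n * p / 100, avoiding floating-point rounding.
    n_test = -(-n_chemicals * test_percent // 100)
    return n_chemicals - n_test, n_test


def split_and_fold(chemical_ids, test_percent=20, k=5, seed=0):
    """Randomly split chemicals into train/test, then split train into k CV folds.

    Splitting by chemical keeps each chemical on one side of the split.
    """
    ids = list(chemical_ids)
    random.Random(seed).shuffle(ids)
    n_train, _ = split_sizes(len(ids), test_percent)
    train, test = ids[:n_train], ids[n_train:]
    # Round-robin assignment gives folds whose sizes differ by at most one.
    folds = [train[i::k] for i in range(k)]
    return train, test, folds


for total in (1129, 1236, 3632):
    print(total, split_sizes(total))
# 1129 (903, 226)
# 1236 (988, 248)
# 3632 (2905, 727)
```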
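The final model in the footnote adds study type and species as descriptors, so a chemical tested in several combinations contributes several instances. The table does not describe the encoding; a minimal sketch, assuming one-hot indicator columns appended to each chemical's structural descriptors and one instance per chemical, study type, and species combination, could look like this. All names and the record layout are hypothetical.

```python
STUDY_TYPES = ("CHR", "SUB", "REP", "DEV", "SAC")
SPECIES = ("Rat", "Mouse", "Rabbit")


def one_hot(value, categories):
    """Return a 0/1 indicator vector for value over the given categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def build_instances(descriptors, effect_levels):
    """Build (X, y) with one row per (chemical, study type, species) record.

    descriptors: dict mapping chemical id -> list of structural descriptors.
    effect_levels: iterable of (chemical_id, study_type, species, value),
    assumed already aggregated to one value per combination.
    """
    X, y = [], []
    for chemical_id, study_type, species, value in effect_levels:
        row = list(descriptors[chemical_id])
        row += one_hot(study_type, STUDY_TYPES)
        row += one_hot(species, SPECIES)
        X.append(row)
        y.append(value)
    return X, y


# Hypothetical data: chemical "A" appears in two combinations, so it yields two rows.
descriptors = {"A": [0.1, 2.0], "B": [0.3, 1.0]}
records = [("A", "CHR", "Rat", 1.2), ("A", "SUB", "Mouse", 0.8), ("B", "DEV", "Rabbit", 2.1)]
X, y = build_instances(descriptors, records)
print(len(X), len(X[0]))  # 3 10
```

With this layout, a chemical-level split would need to keep all rows of a chemical on the same side to avoid leakage between training and test sets; whether the authors did so is not stated in the table.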