2024 Sep 10;11:985. doi: 10.1038/s41597-024-03793-0

Table 3.

Summary of Datasets in PharmaBench: ‘Property Name’ refers to the name of the dataset.

| Category | Property Name | Entries After Data Processing | Final Entries for AI Modeling | Unit | Mission Type |
|---|---|---|---|---|---|
| Physicochemical | LogD | 14,141 | 13,068 | | regression |
| | Water Solubility | 14,818 | 11,701 | log10 nM | regression |
| Absorption | BBB | 12,486 | 8,301 | | classification |
| Distribution | PPB | 1,310 | 1,262 | % | regression |
| Metabolism | CYP 2C9 | 4,507 | 999 | log10 µM | regression |
| | CYP 2D6 | | 1,214 | log10 µM | regression |
| | CYP 3A4 | | 1,980 | log10 µM | regression |
| Clearance | HLMC | 5,252 | 2,286 | log10(mL·min⁻¹·g⁻¹) | regression |
| | RLMC | | 1,129 | log10(mL·min⁻¹·g⁻¹) | regression |
| | MLMC | | 1,403 | log10(mL·min⁻¹·g⁻¹) | regression |
| Toxicity | AMES | 24,780 | 9,139 | | classification |
| Total | | 77,294 | 52,482 | | |

‘Entries After Data Processing’ indicates the number of entries remaining after the data processing workflow. ‘Final Entries for AI Modeling’ denotes the number of entries used in the final AI modeling process. ‘Unit’ specifies the measurement unit for regression tasks. ‘Mission Type’ encompasses two categories: regression and classification.
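The totals in the last row can be checked directly from the individual rows. The following is a minimal sketch, not part of the original table: the variable and function names are hypothetical, and it assumes that the single 'Entries After Data Processing' value listed for the CYP rows (4,507) and for the clearance rows (5,252) is shared by each group of three properties. That reading is consistent with the reported total of 77,294.

```python
# Final entries per property, transcribed from Table 3:
# (category, property name, final entries for AI modeling, mission type)
FINAL_ENTRIES = [
    ("Physicochemical", "LogD", 13068, "regression"),
    ("Physicochemical", "Water Solubility", 11701, "regression"),
    ("Absorption", "BBB", 8301, "classification"),
    ("Distribution", "PPB", 1262, "regression"),
    ("Metabolism", "CYP 2C9", 999, "regression"),
    ("Metabolism", "CYP 2D6", 1214, "regression"),
    ("Metabolism", "CYP 3A4", 1980, "regression"),
    ("Clearance", "HLMC", 2286, "regression"),
    ("Clearance", "RLMC", 1129, "regression"),
    ("Clearance", "MLMC", 1403, "regression"),
    ("Toxicity", "AMES", 9139, "classification"),
]

# Entries after data processing. ASSUMPTION: the CYP and clearance values
# are each reported once and shared by the three properties in the group.
PROCESSED_ENTRIES = {
    "LogD": 14141,
    "Water Solubility": 14818,
    "BBB": 12486,
    "PPB": 1310,
    "CYP (2C9, 2D6, 3A4)": 4507,
    "Clearance (HLMC, RLMC, MLMC)": 5252,
    "AMES": 24780,
}


def summarize():
    """Return (total processed, total final, final entries per mission type)."""
    total_processed = sum(PROCESSED_ENTRIES.values())
    total_final = sum(count for _, _, count, _ in FINAL_ENTRIES)
    by_task = {}
    for _, _, count, task in FINAL_ENTRIES:
        by_task[task] = by_task.get(task, 0) + count
    return total_processed, total_final, by_task


if __name__ == "__main__":
    processed, final, by_task = summarize()
    print(f"Entries after data processing: {processed:,}")
    print(f"Final entries for AI modeling: {final:,}")
    for task, count in sorted(by_task.items()):
        print(f"  {task}: {count:,}")
```

Under that assumption both sums match the 'Total' row (77,294 and 52,482), and the per-task breakdown shows how the final entries divide between regression and classification.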