Table 1. Overview of the descriptor sets from the chemical, protein target, and cytotoxicity domains used, in all possible combinations, to model toxicity data. In each modelling repeat, the feature selection and pre-processing procedure was applied to the data in the respective modelling set to select an optimal, similarly sized subset of descriptors from each domain.
Data domain | Details | Source | Information encoded |
Chemical | 192 2D descriptors | MOE | Chemical structure and physicochemical properties |
Protein target | 477 human target-affinity descriptors | In silico algorithm trained on a dataset extracted from ChEMBL version 14 | Translation of chemical space into biological space; likelihood of interaction with a subset of the human proteome |
Cytotoxicity | 182 dose–response data points (14 concentrations × 13 human, rat and mouse cell lines), scaled so that the maximum response of each curve equals 1 | Original data extracted from PubChem and processed to remove noise following Sedykh et al. (2011) | Experimental cell-viability outcomes of compound exposure |
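
As an illustration of the caption, the following minimal Python sketch enumerates the domain combinations and rescales a single dose–response curve. It rests on two assumptions that Table 1 does not state: that "all possible combinations" means every non-empty subset of the three domains, and that scaling is done by dividing each curve by its maximum. The names `DOMAINS`, `domain_combinations` and `scale_curve` are hypothetical, and the descriptor counts are the raw sizes from Table 1, before feature selection.

```python
from itertools import combinations

# Raw descriptor counts per domain, taken from Table 1 (before feature selection).
DOMAINS = {"chemical": 192, "protein_target": 477, "cytotoxicity": 182}


def domain_combinations(domains):
    """Return every non-empty combination of domain names.

    Assumption: 'all possible combinations' means all non-empty subsets.
    """
    names = list(domains)
    return [
        combo
        for size in range(1, len(names) + 1)
        for combo in combinations(names, size)
    ]


def scale_curve(responses):
    """Scale one dose-response curve so that its maximum equals 1.

    Assumption: scaling is a division by the curve maximum. Curves whose
    maximum is not positive are returned unchanged to avoid dividing by zero.
    """
    peak = max(responses)
    if peak <= 0:
        return list(responses)
    return [value / peak for value in responses]


if __name__ == "__main__":
    # List each combination with its total number of raw descriptors.
    for combo in domain_combinations(DOMAINS):
        total = sum(DOMAINS[name] for name in combo)
        print(" + ".join(combo), total)

    # Rescale an invented example curve (values are illustrative only).
    print(scale_curve([0.2, 0.5, 0.4]))
```

Under these assumptions three domains give seven modelling sets: three single-domain, three pairwise and one containing all three domains.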