Author manuscript; available in PMC: 2019 Mar 18.
Published in final edited form as: Toxicology. 2017 Jun 23;389:139–145. doi: 10.1016/j.tox.2017.06.003

Table 2.

Description of the sources from which the training set was built. In the "Number of compounds" column, “+” denotes the number of DILI-positive compounds and “−” the number of DILI-negative compounds. These counts are the compounds remaining after data curation, on a source-by-source basis.

| Source name | Type of data | Number of compounds | Label choice |
| --- | --- | --- | --- |
| O’Brien et al. (2006) | In vitro cell-based assay | 132 (100+/32−) | “severely” and “moderately” toxic are considered positives |
| Rodgers et al. (2010) | FDA reports database | 382 (75+/307−) | Authors’ classification |
| Fourches et al. (2010) | Text mining | 902 (620+/282−) | Authors’ classification |
| Greene et al. (2010) | Compilation of published data | 385 (252+/133−) | Authors’ classification |
| Ekins et al. (2010) | Clinical data for hepatotoxicity | 499 (294+/205−) | Authors’ classification |
| Chen et al. (2011) | FDA-approved labels | 279 (218+/61−) | “most DILI concern” and “less DILI concern” are considered positives |
| Liu et al. (2011) | SIDER_2 database | 835 (188+/647−) | Authors’ classification |
| Zhu and Kruhlak (2014) | Post-marketing safety data | 1948 (651+/1297−) | Authors’ classification, keeping only highest class certainty |
| Liu et al. (2015b) | LiverTox database | 583 (409+/174−) | “hepatotoxic” and “possible hepatotoxic” are considered positives |
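
The count notation used in the table, total followed by (positives+/negatives−), can be read programmatically. The following is a minimal sketch, not part of the original study: the names `COUNTS`, `parse_counts`, and `positive_fraction` are hypothetical, and only the count strings are taken from the table. It parses each entry, checks that positives and negatives add up to the stated total, and reports the share of DILI-positive compounds per source.

```python
import re

# Count strings copied from the "Number of compounds" column of Table 2.
# The dictionary name and layout are illustrative assumptions.
COUNTS = {
    "O'Brien et al. (2006)": "132 (100+/32−)",
    "Rodgers et al. (2010)": "382 (75+/307−)",
    "Fourches et al. (2010)": "902 (620+/282−)",
    "Greene et al. (2010)": "385 (252+/133−)",
    "Ekins et al. (2010)": "499 (294+/205−)",
    "Chen et al. (2011)": "279 (218+/61−)",
    "Liu et al. (2011)": "835 (188+/647−)",
    "Zhu and Kruhlak (2014)": "1948 (651+/1297−)",
    "Liu et al. (2015b)": "583 (409+/174−)",
}

# Accept either the Unicode minus sign used in the table or an ASCII hyphen.
PATTERN = re.compile(r"(\d+) \((\d+)\+/(\d+)[−-]\)")


def parse_counts(cell):
    """Return (total, positives, negatives) from a cell such as '132 (100+/32−)'."""
    match = PATTERN.fullmatch(cell)
    if match is None:
        raise ValueError(f"unrecognized count format: {cell!r}")
    total, positives, negatives = (int(group) for group in match.groups())
    return total, positives, negatives


def positive_fraction(cell):
    """Return the share of DILI-positive compounds, after a consistency check."""
    total, positives, negatives = parse_counts(cell)
    if positives + negatives != total:
        raise ValueError(f"positives and negatives do not sum to total: {cell!r}")
    return positives / total


if __name__ == "__main__":
    for source, cell in COUNTS.items():
        print(f"{source}: {positive_fraction(cell):.1%} DILI-positive")
```

The per-source counts are deliberately not summed: the caption states that curation was done source by source, and the table does not say whether the same compound appears in more than one source.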