2024 Dec 18;11:1379. doi: 10.1038/s41597-024-04190-3

Table 1.

Structure of entries in the dataset.

| Section | Column | Description | Data type |
| --- | --- | --- | --- |
| Chemical | Compound name | Generated in ChemBioDraw | string |
| Chemical | Empirical formula | Generated in ChemBioDraw | string |
| Chemical | CAS | Chemical Abstracts Service number (where available) | string |
| Chemical | SMILES | Simplified Molecular Input Line Entry System, a chemical format that encodes the structure of a chemical as a string of symbols. Regular SMILES can have multiple valid representations for the same molecule. | string |
| Chemical | Canonical SMILES | A unique, standardized SMILES representation that assigns each molecule a single distinct SMILES string. The specific output depends on the canonicalization algorithm employed. | string |
| Chemical | Mw | Molecular weight | numeric |
| Biological | IC50, EC50, CC50* | Half-maximal inhibitory, effective, or cytotoxic concentration | numeric |
| Biological | Range of IC50, EC50, or CC50 | Confidence intervals, standard deviations, standard errors of the mean, etc., for the cytotoxicity values provided in the previous field | numeric |
| Biological | Incubation time | Time period for which cells were exposed to a given IL | numeric |
| Biological | Cell line | Name of the cell culture used in a particular study | string |
| Biological | Method | Assay used for measuring IC50, EC50, or CC50 | string |
| Biological | Notes | Additional information on the biological activity of a particular IL, if provided | string |
| Bibliographic | Reference | Author, year, and journal | text |
| Bibliographic | DOI | Digital Object Identifier | text |

* In the papers annotated for the dataset, the designation of the value (IC50, EC50, or CC50) depended mainly on the authors of a particular study. Here, the assay determines the type of value, so values obtained by the same assay can be compared across studies. Thus, in most cases, we did not record in the dataset the particular designation used by the authors.
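
As an illustration of how entries with this structure might be handled, the following minimal Python sketch reads rows, converts the numeric columns, and groups entries by assay so that only values obtained with the same method are compared. The column headers (in particular the single "Value" column standing in for IC50, EC50, or CC50), the CSV layout, and all sample rows are placeholders assumed for this example; they are not taken from the dataset file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column headers modeled on Table 1; the real file may differ.
NUMERIC_COLUMNS = ("Mw", "Value", "Incubation time")

# Placeholder rows for illustration only (not real compounds or measurements).
SAMPLE_CSV = """Compound name,Canonical SMILES,Mw,Value,Incubation time,Cell line,Method
Compound A,SMILES_A,100.0,12.5,24,Cell line X,Assay 1
Compound B,SMILES_B,200.0,,48,Cell line X,Assay 1
Compound C,SMILES_C,300.0,7.0,24,Cell line Y,Assay 2
"""


def to_float(text):
    """Return a float, or None when the cell is empty or not a plain number."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def load_entries(handle):
    """Read rows as dicts and convert the numeric columns to float or None."""
    entries = []
    for row in csv.DictReader(handle):
        for column in NUMERIC_COLUMNS:
            row[column] = to_float(row.get(column))
        entries.append(row)
    return entries


def group_by_method(entries):
    """Group entries that have a numeric value by the assay in 'Method'."""
    groups = defaultdict(list)
    for entry in entries:
        if entry["Value"] is not None:
            groups[entry["Method"]].append(entry)
    return dict(groups)


entries = load_entries(io.StringIO(SAMPLE_CSV))
groups = group_by_method(entries)
for method, rows in sorted(groups.items()):
    print(method, [row["Value"] for row in rows])
```

Grouping on the assay rather than on the author-chosen label follows the footnote above: the assay, not the IC50/EC50/CC50 designation, determines which values are comparable.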