2022 Mar 7;14:10. doi: 10.1186/s13321-022-00590-y

Table 1.

Data source summary. MolData was created using 9 data sources; the number of bioassays within each data source is shown in the AID count column. Each molecule in a given source can have bioactivity for multiple bioassays and can therefore contribute multiple data points. Unique active molecules are defined as molecules that demonstrate bioactivity in at least one bioassay.

| PubChem source | AID count | Active data points | Total data points (m = million) | % Active data points | Unique active molecules | Total unique molecules | % Unique active molecules |
|---|---|---|---|---|---|---|---|
| Broad Institute | 67 | 125,627 | 22.2 m | 0.56% | 85,579 | 472,858 | 18.1% |
| Burnham Center for Chemical Genomics | 67 | 139,021 | 21.9 m | 0.63% | 77,159 | 381,794 | 20.21% |
| Emory University Molecular Libraries Screening Center | 12 | 24,195 | 2.47 m | 0.98% | 20,964 | 348,231 | 6.02% |
| ICCB-Longwood Screening Facility, Harvard Medical School | 11 | 8358 | 2.1 m | 0.39% | 6656 | 564,021 | 1.18% |
| Johns Hopkins Ion Channel Center | 22 | 48,545 | 6.8 m | 0.71% | 35,487 | 344,497 | 10.30% |
| NMMLSC | 42 | 48,186 | 11.5 m | 0.42% | 37,949 | 369,431 | 10.27% |
| National Center for Advancing Translational Sciences (NCATS) | 174 | 720,319 | 53.4 m | 1.35% | 240,096 | 592,616 | 40.51% |
| The Scripps Research Institute Molecular Screening Center | 148 | 275,224 | 47.6 m | 0.58% | 142,055 | 920,418 | 15.43% |
| Tox21 | 57 | 21,475 | 0.47 m | 5.67% | 4183 | 8743 | 47.84% |
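The percentage columns are simple ratios. One is active data points over total data points. The other is unique active molecules (active in at least one bioassay) over total unique molecules. The Python sketch below computes every column for one source. It assumes a hypothetical `DataPoint` record (molecule ID, AID, active flag) rather than MolData's actual file format. `summarize_source` and `percent` are illustrative names and are not part of MolData.

```python
from typing import Iterable, NamedTuple


class DataPoint(NamedTuple):
    """One molecule's outcome in one bioassay (hypothetical record layout)."""
    molecule_id: str
    aid: int      # PubChem bioassay ID (AID)
    active: bool  # True if the molecule showed bioactivity in this bioassay


def percent(part: float, whole: float) -> float:
    """Share of `part` in `whole` as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


def summarize_source(points: Iterable[DataPoint]) -> dict:
    """Compute the Table 1 columns for the data points of a single source."""
    points = list(points)
    if not points:
        raise ValueError("a source needs at least one data point")
    all_molecules = {p.molecule_id for p in points}
    # "Unique active": the molecule is active in at least one bioassay.
    active_molecules = {p.molecule_id for p in points if p.active}
    active_points = sum(p.active for p in points)  # bools sum as 0/1
    return {
        "aid_count": len({p.aid for p in points}),
        "active_data_points": active_points,
        "total_data_points": len(points),
        "pct_active_data_points": percent(active_points, len(points)),
        "unique_active_molecules": len(active_molecules),
        "total_unique_molecules": len(all_molecules),
        "pct_unique_active_molecules": percent(len(active_molecules), len(all_molecules)),
    }


# Toy source (made-up data): 3 molecules tested in 2 bioassays, i.e. 5 data points.
toy = [
    DataPoint("mol_a", 1, True),
    DataPoint("mol_a", 2, False),
    DataPoint("mol_b", 1, False),
    DataPoint("mol_b", 2, False),
    DataPoint("mol_c", 2, True),
]
summary = summarize_source(toy)
print(summary)

# The same ratios applied to the reported NCATS counts. Total data points are
# given to 0.1 million ("m"), so that percentage is only approximate.
print(percent(720_319, 53.4e6), percent(240_096, 592_616))
```

In the toy source, `mol_a` is active in AID 1 and counts once as a unique active molecule, even though it is inactive in AID 2. Applied to the NCATS row, `percent` returns 1.35 and 40.51, which match the reported values.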