Abstract
Motivation
The evaluation of chemicals for their carcinogenic hazard requires the analysis of a wide range of data and the characterization of these results relative to the key characteristics of carcinogens. The workflow used historically requires many manual steps that are labor-intensive and can introduce errors, bias and inconsistencies.
Results
The automation of parts of the evaluation workflow using the kc-hits software has led to significant improvements in process efficiency, as well as more consistent and comprehensive results.
Availability and implementation
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
The evaluation of chemicals for their carcinogenic hazard requires the careful analysis of a wide range of data, including results from epidemiological cancer studies in humans, bioassays in animals and various types of in vivo and in vitro experiments relevant to metabolism and mechanisms (International Agency for Research on Cancer, 2019). One source for the latter type of data is the United States Environmental Protection Agency’s Toxicity Forecaster/Toxicity Testing in the 21st Century (ToxCast/Tox21) database (Richard et al., 2016), which contains the detailed results and summary analyses of over 1800 high-throughput assays related to molecular targets and cellular responses for over 8700 chemicals. These assay data are available in downloadable ‘raw’ form (US Environmental Protection Agency, 2021b) and, in processed form, are accessible through the CompTox Chemicals Dashboard web-based front end (Williams et al., 2017).
One of the important evaluation criteria in categorizing chemicals for carcinogenicity is related to the extent to which these agents display one or more of the key characteristics (KCs) of carcinogens (Smith et al., 2016), which are a consistent and systematic framework for unifying, harmonizing and categorizing the myriad molecular- and cellular-biological mechanisms underlying chemical carcinogenesis.
Thus, current best practices for evaluation and classification of chemicals for their carcinogenic hazard require linking multiple types of data relevant to the KCs to support mechanistic conclusions (Samet et al., 2020). Unfortunately, making these linkages is a labor-intensive process, necessitating mining, interpreting and synthesizing information from disparate sources and often requiring the expertise of scientists from a range of disciplines, including toxicology, molecular biology, cancer biology, biochemistry and data science. Considering the ToxCast/Tox21 data stream alone, the conventional workflow involves conducting a search using a web portal for each agent of interest, filtering the results, downloading or screen-capturing the data, manually conducting an assay-to-KC mapping using a curated list, examining the data for relevance, coherence and consistency, performing statistical analyses on the results and preparing tables and plots. This workflow facilitates consideration and scrutiny of in vitro test results, including the suitability of the test article (i.e. the physical and chemical characteristics), the concentration range and the endpoint, as well as any limitations of the test system (e.g. metabolic capabilities) (Chiu et al., 2018; Samet et al., 2020).
As a significant step toward automating this workflow, we developed the software kc-hits (KCs of carcinogens—high-throughput screening discovery tool) with the overarching aims of providing a user-friendly, productivity-enhancing tool for the diverse scientists and agencies involved in classifying chemicals for their carcinogenic potential, while improving the accuracy and consistency of the processed results compared with those produced manually from this data stream. The software has other potential applications, including the prioritization of chemicals for testing and for indicating gaps in coverage in the high-throughput screening databases.
2 Materials and methods
The base set of data comprised three elements: (i) the assay data (US Environmental Protection Agency, 2021b), (ii) detailed descriptions of the assay methodologies (US Environmental Protection Agency, 2021a) and (iii) the curated mapping of assays to KCs (International Agency for Research on Cancer, 2020).
A cleaning and preparation pipeline was written to take these raw data, resolve inconsistencies (e.g. replicate assay results showing different outcomes), remove unnecessary information and condense to a format ingestible by downstream analysis and presentation tools. This pipeline can be reapplied as new versions of the underlying assay data are published.
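As a rough illustration of the kind of replicate reconciliation such a pipeline performs, the following minimal sketch collapses repeated (chemical, assay) records into a single call. The record layout, the placeholder assay names, the function name `resolve_replicates` and the majority-vote rule (with ties marked as inconclusive) are all assumptions made for illustration; they are not taken from kc-hits or from the procedures used by the EPA.

```python
from collections import defaultdict

# Hypothetical raw records: (chemical, assay, hit call), 1 = active, 0 = inactive.
# The assay names are placeholders, not real ToxCast/Tox21 identifiers.
RAW_RECORDS = [
    ("isophorone", "ASSAY_A", 1),
    ("isophorone", "ASSAY_A", 1),
    ("isophorone", "ASSAY_A", 0),
    ("isophorone", "ASSAY_B", 0),
    ("diphenylamine", "ASSAY_A", 1),
    ("diphenylamine", "ASSAY_A", 0),
]


def resolve_replicates(records):
    """Collapse replicate results into one call per (chemical, assay) pair.

    Assumed rule (illustrative only): majority vote; ties become "inconclusive".
    """
    grouped = defaultdict(list)
    for chemical, assay, hit in records:
        grouped[(chemical, assay)].append(hit)

    resolved = {}
    for key, hits in grouped.items():
        n_active = sum(hits)
        n_inactive = len(hits) - n_active
        if n_active > n_inactive:
            resolved[key] = "active"
        elif n_active < n_inactive:
            resolved[key] = "inactive"
        else:
            resolved[key] = "inconclusive"
    return resolved


if __name__ == "__main__":
    # Print one condensed row per (chemical, assay) pair.
    for (chemical, assay), call in sorted(resolve_replicates(RAW_RECORDS).items()):
        print(chemical, assay, call)
```

Keeping the resolution rule in one small function means it could be swapped for a different rule and reapplied unchanged when a new version of the assay data is released.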
Code was written to query and aggregate the preprocessed results and then to analyze and present them in various formats useful in the carcinogenicity characterization process. These steps can be conducted through a graphical user interface (GUI) or application programming interfaces. Functionality was included to save results to a familiar multi-worksheet spreadsheet format for printing and further data processing.
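The query-and-aggregate step can be pictured with a short sketch that, for one chemical, counts active and tested assays per KC. The mapping, the assay names, the KC labels and the function `summarize_by_kc` are hypothetical; the actual kc-hits interfaces and the curated assay-to-KC mapping are not reproduced here.

```python
from collections import defaultdict

# Hypothetical assay-to-KC mapping; an assay may map to more than one KC.
ASSAY_TO_KCS = {
    "ASSAY_A": ["KC2"],
    "ASSAY_B": ["KC5"],
    "ASSAY_C": ["KC8", "KC10"],
    "ASSAY_D": ["KC8"],
}

# Hypothetical preprocessed calls: (chemical, assay) -> True if active.
CALLS = {
    ("isophorone", "ASSAY_A"): False,
    ("isophorone", "ASSAY_B"): True,
    ("isophorone", "ASSAY_C"): True,
    ("isophorone", "ASSAY_D"): False,
    ("diphenylamine", "ASSAY_C"): False,
}


def summarize_by_kc(chemical, calls, assay_to_kcs):
    """Return {kc: (n_active, n_tested)} for a single chemical."""
    counts = defaultdict(lambda: [0, 0])
    for (chem, assay), active in calls.items():
        if chem != chemical:
            continue
        # Assays without a KC mapping are skipped.
        for kc in assay_to_kcs.get(assay, []):
            counts[kc][1] += 1
            if active:
                counts[kc][0] += 1
    return {kc: tuple(pair) for kc, pair in counts.items()}


if __name__ == "__main__":
    summary = summarize_by_kc("isophorone", CALLS, ASSAY_TO_KCS)
    # Sort numerically so that KC10 follows KC8 rather than preceding KC2.
    for kc in sorted(summary, key=lambda name: int(name[2:])):
        n_active, n_tested = summary[kc]
        print(f"{kc}: {n_active} active of {n_tested} tested")
```

A per-KC table of this shape is the sort of summary that could then be written to one worksheet per chemical.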
The underlying code was structured to enable more sophisticated analyses, such as the development of quantitative structure–activity relationships or biologically based models.
The GUI was designed to be familiar and require minimal training of the user. It comprises a single screen with a chemical selection pane and a multi-tabbed results pane (Fig. 1).
Fig. 1. GUI with a chemical selected
The inventory of chemicals can be easily narrowed through an interactive filtering mechanism. The application itself can be operated using a mouse or the keyboard. A simple help screen is also included that summarizes operation and lists the keyboard shortcuts. An additional menu item contains information about the version of the application, underlying data and mapping.
3 Results
Further details of the application, including screenshots of the GUI, are given in the Supplementary Materials for this manuscript and in the code repository.
The authors conducted several tests comparing the results output by kc-hits with (i) those produced manually and (ii) the assay lists produced from the CompTox Chemicals Dashboard (Williams et al., 2017). The software passed these tests: all comparisons either showed identical results or showed differences that could be explained by historical changes in methodologies and/or data.
As a more practical test, the software was used in a recent evaluation of the carcinogenicity of five chemicals (1,1,1-trichloroethane, 1,2-diphenylhydrazine, diphenylamine, N-methylolacrylamide and isophorone) conducted by the International Agency for Research on Cancer (Belpoggi et al., 2021). Information on all except one of these chemicals (N-methylolacrylamide) was available in the ToxCast/Tox21 database. The total number of assays for the four remaining compounds was 2962 (1,1,1-trichloroethane: 235 assays; 1,2-diphenylhydrazine: 882 assays; diphenylamine: 968 assays; isophorone: 877 assays). Although this was not a rigorous analysis, polling of individuals involved in the process indicated that the manual workflow described earlier would normally have taken about 2 h per chemical for someone familiar with the procedure. Using kc-hits, the entire process for all four chemicals was completed in <15 min and yielded a consistent and comprehensive set of results across all chemicals.
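The figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the implied speed-up is a lower bound because the kc-hits time is reported as an upper bound, and it inherits the informal nature of the polled 2 h estimate.

```python
# Assay counts per chemical, as reported in the text.
ASSAYS_PER_CHEMICAL = {
    "1,1,1-trichloroethane": 235,
    "1,2-diphenylhydrazine": 882,
    "diphenylamine": 968,
    "isophorone": 877,
}

MANUAL_HOURS_PER_CHEMICAL = 2  # approximate, from informal polling
KC_HITS_MINUTES_TOTAL = 15     # reported upper bound for all four chemicals

total_assays = sum(ASSAYS_PER_CHEMICAL.values())
manual_minutes = MANUAL_HOURS_PER_CHEMICAL * 60 * len(ASSAYS_PER_CHEMICAL)
min_speedup = manual_minutes / KC_HITS_MINUTES_TOTAL

print(total_assays)    # 2962
print(manual_minutes)  # 480
print(min_speedup)     # 32.0
```

On these numbers, the manual workflow would have taken about 8 h in total, so the reported kc-hits run corresponds to a speed-up of at least roughly 32-fold.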
Data availability
The code for the software is available under an Open Source (MIT) license and may be downloaded from https://gitlab.com/i1650/kc-hits.git. An installable package is available through the Python packaging index (PyPI) and a standalone MS Windows version is available at https://doi.org/10.5281/zenodo.5846990.
Acknowledgements
The authors thank Dr. Antony Williams and Dr. Katie Paul Friedman of the US Environmental Protection Agency for providing counsel on the procedures used by the EPA to resolve inconsistencies in ToxCast/Tox21 assay results.
Disclaimer
Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization.
Funding
This work was supported by the National Institutes of Health and National Cancer Institute [NIH-NCI U01CA033193] and was conducted for the IARC Monographs programme.
Conflict of Interest: None.
Contributor Information
Brad Reisfeld, Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, France.
Aline de Conti, Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, France.
Fatiha El Ghissassi, Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, France.
Lamia Benbrahim-Tallaa, Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, France.
William Gwinn, Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, France.
Yann Grosse, Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, France.
Mary Schubauer-Berigan, Evidence Synthesis and Classification Branch, International Agency for Research on Cancer, World Health Organization, Lyon 69372 CEDEX 08, France.
References
- Belpoggi F. et al. (2021) Carcinogenicity of 1,1,1-trichloroethane and four other industrial chemicals. Lancet Oncol., 22, 1661–1662.
- Chiu W.A. et al. (2018) Use of high-throughput in vitro toxicity screening data in cancer hazard evaluations by IARC Monograph Working Groups. ALTEX, 35, 51–64.
- International Agency for Research on Cancer (2019) IARC Monographs on the Identification of Carcinogenic Hazards to Humans: Preamble. International Agency for Research on Cancer, Lyon, France.
- International Agency for Research on Cancer (2020) Some Nitrobenzenes and Other Industrial Chemicals. Annex 1. Supplementary Material for ToxCast/Tox21 Section 4.4. International Agency for Research on Cancer, Lyon, France.
- Richard A.M. et al. (2016) ToxCast chemical landscape: paving the road to 21st century toxicology. Chem. Res. Toxicol., 29, 1225–1251.
- Samet J.M. et al. (2020) The IARC monographs: updated procedures for modern and transparent evidence synthesis in cancer hazard identification. J. Natl. Cancer Inst., 112, 30–37.
- Smith M.T. et al. (2016) Key characteristics of carcinogens as a basis for organizing data on mechanisms of carcinogenesis. Environ. Health Perspect., 124, 713–721.
- US Environmental Protection Agency (2021a) ToxCast & Tox21 High-Throughput Assay Documentation. https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data (19 November 2021, date last accessed).
- US Environmental Protection Agency (2021b) ToxCast & Tox21 Summary Files for invitroDBv3.4. https://www.epa.gov/chemical-research/exploring-toxcast-data-downloadable-data (19 November 2021, date last accessed).
- Williams A.J. et al. (2017) The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J. Cheminform., 9, 61.
