Author manuscript; available in PMC: 2022 Feb 22.
Published in final edited form as: J Chem Inf Model. 2021 Jan 22;61(2):565–570. doi: 10.1021/acs.jcim.0c01273

Enabling High-Throughput Searches for Multiple Chemical Data using the US-EPA CompTox Chemicals Dashboard

Charles N Lowe 1,**, Antony J Williams 1,*
PMCID: PMC8630643  NIHMSID: NIHMS1747950  PMID: 33481596

Abstract

The core goal of cheminformatics is to efficiently store robust and accurate chemical information and make it accessible for drug discovery, environmental analysis, and the development of prediction models including quantitative structure-activity relationships (QSAR). The US Environmental Protection Agency (EPA) has developed a web-based application, the CompTox Chemicals Dashboard, which provides access to a compilation of data generated within the agency and sourced from public databases and literature, and to utilities for real-time QSAR prediction and chemical read-across. While the vast majority of online tools only allow interrogation of chemicals one at a time, the Dashboard provides a batch search feature that allows for the sourcing of data based on thousands of chemical inputs at one time, by chemical identifier (e.g., names, Chemical Abstracts Service Registry Numbers, or InChIKeys) or by mass or molecular formula. Chemical information that can then be sourced via the batch search includes chemical identifiers and structures; intrinsic, physicochemical, and fate and transport properties; in vitro and in vivo toxicity data; and presence in environmentally relevant lists. We outline how to use the batch search feature and provide an overview of the type of information that can be sourced by considering a series of typical-use questions.

Graphical Abstract

A number of public chemical databases are available to researchers, although these databases may contain structural inconsistencies or lack key information necessary for environmental research. The US-EPA CompTox Chemicals Dashboard provides a publicly accessible web-interface to a number of the EPA’s databases and tools from the Center for Computational Toxicology and Exposure including the Distributed Structure-Searchable Toxicity (DSSTox) database, the ToxVal database and the invitrodb bioactivity database, as well as a number of environmental research-relevant tools. The Dashboard’s batch search feature allows researchers to search multiple chemical identifiers via a single query, allowing for high-throughput access to these data and other associated information.

Introduction

The US-EPA CompTox Chemicals Dashboard1, from here on referred to as the Dashboard, is a web-based application that provides access to various types of data for ~900,000 chemicals registered in the underlying Distributed Structure-Searchable Toxicity (DSSTox) Database2. These data include chemical identifiers, physicochemical properties, in vivo hazard and in vitro bioactivity data, and exposure data describing which products may contain a chemical and at what concentration. The Dashboard also integrates a generalized read-across module (GenRA)3, an automated approach to make reproducible read-across predictions of toxicity outcomes from in vivo studies, and the Abstract Sifter module4 that performs real-time searches against the National Library of Medicine’s PubMed database5.

For substances registered in the DSSTox database, unique generic substance identifiers, DTXSIDs, are generated.2 Chemical identifier-structure associations are controlled to prevent conflicts and ensure one-to-one mappings. In addition, a combination of automated and manual curation is deployed to ensure data quality, e.g. correct representation of stereochemistry and of Markush structures.

While there exist a number of other public domain databases for chemical information [e.g. PubChem, ChemSpider, ChEMBL and DrugBank]6–9, the Dashboard sets itself apart by providing high-quality data relevant to the environmental sciences specifically, especially computational toxicology and exposure. Other defining features of the Dashboard include lists of chemicals associated with a specific topic (e.g. pesticides or drugs), the ability to search based on molecular masses and formulas, and real-time structure-based predictions of toxicological and physical properties. A major feature which is unavailable in the vast majority of other public databases is a batch search capability (i.e., the ability to obtain information for many chemicals at one time). Herein, we discuss the utility of this tool and show how it can benefit a researcher with respect to consistency and efficiency in data acquisition.

Basic Overview of the Batch Search

The batch search can be accessed via the navigation bar at the top of the Dashboard interface or directly via the URL: https://comptox.epa.gov/dashboard/dsstoxdb/batch_search. A number of inputs are available for searching (vide infra), and exports can be generated in Excel, comma- and tab-separated values (CSV and TSV), and structure data file (SDF)10 formats. At the time of writing, searches are limited to ~5000 input identifiers. However, for formula and monoisotopic mass inputs it is recommended that searches be limited to ~100 inputs, because a single formula or mass can map to hundreds of chemicals and the result set can grow very quickly: a formula search for C15H16O2 alone returns >250 results. Using the batch search is as simple as selecting the type of input, entering the list of inputs into the search box (commonly by copy and paste), selecting the export format, selecting the data to include in the export, and clicking download. Note that the user can search against all identifier types by default, but for large collections of input identifiers performance is greatly enhanced if the user specifies the input type, so that thousands of identifiers do not each have to be resolved against every identifier type. The detailed operation of the batch search will become evident through the example questions and use cases below.
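The input limits above can be respected by preparing the paste-ready text offline. The following is a minimal sketch, not part of the Dashboard itself: the function names and the two constants are illustrative assumptions, and the limits are the approximate values quoted in the text rather than values enforced by any API.

```python
from typing import Iterable, List

# Approximate limits quoted in the text (assumptions, not enforced constants).
MAX_IDENTIFIERS = 5000
MAX_FORMULAS_OR_MASSES = 100


def chunk_inputs(values: Iterable[str], limit: int = MAX_IDENTIFIERS) -> List[List[str]]:
    """Drop blanks and duplicates, keep input order, and split into chunks of at most `limit`."""
    seen = set()
    cleaned = []
    for value in values:
        item = value.strip()
        if item and item not in seen:
            seen.add(item)
            cleaned.append(item)
    return [cleaned[i:i + limit] for i in range(0, len(cleaned), limit)]


def as_paste_block(chunk: List[str]) -> str:
    """Join one chunk into one-input-per-line text for the search box."""
    return "\n".join(chunk)
```

Each chunk returned by `chunk_inputs` can be pasted as a separate search; for formula or mass inputs, `MAX_FORMULAS_OR_MASSES` would be passed as the limit.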

I have a list of different types of chemical names and identifiers and would like to download available data. How do I do it?

The batch search offers the ability to search by six different identifiers and retrieve data in a number of formats. The identifiers are: chemical names, Chemical Abstracts Service Registry Numbers (CASRN)11, InChIKeys and the associated InChIKey skeleton12, and DTXSID and DTXCID (i.e., the DSSTox substance and structure identifiers)2. Other input types, such as molecular formula and mass, are covered in detail later.

In order to execute a batch search, a user simply inputs the list of identifiers, one per line, into the search box and selects the type of identifier that is being searched. As shown in Figure 1, substance identifiers (chemical names, CASRN, InChIKey and DTXSID) are grouped and it is possible to select a single identifier type or all types for inclusion in the input list.

Figure 1.

A set of substance identifiers pasted into the batch search box on the batch search page, https://comptox.epa.gov/dashboard/dsstoxdb/batch_search, with the relevant input types selected under “Select Input Type(s)”. Note that the inputs in this example are both CASRN and names.

Chemical names are the most general identifiers that can be used for searching in the Dashboard, and it should be noted that a single chemical can have many tens of names and other identifiers (e.g. CAS Registry Numbers (CASRNs), pesticide code numbers, European Community numbers), as evidenced by the synonyms list associated with aspirin (https://comptox.epa.gov/dashboard/dsstoxdb/results?search=aspirin#synonyms). A single chemical can be associated with a number of deleted or active CASRNs; one example is Bisphenol A [https://comptox.epa.gov/dashboard/DTXSID7020182], which has five deleted registry numbers. There can be hundreds of deleted CASRNs, as illustrated by the substance “Bisphenol A/Epichlorohydrin resin” [https://comptox.epa.gov/dashboard/DTXSID0050479], which is registered with the active CASRN 25068-38-6 and with 316 deleted CASRNs. All are included in the database for completeness and to support searches. The InChIKey12 is a structural identifier linked with substances that have associated chemical structures. The vast majority of online databases generate such keys for the chemicals they contain, and as a result InChIKeys are invaluable for performing structure searches using standard internet searches, for example a Google search using the key as input. InChIKeys are now commonly used for reporting structures in peer-reviewed publications and chemical databases. Pasting a list of InChIKeys taken from a publication or database into the batch search identifies which of those chemicals are present in the Dashboard, which supports cross-walking between data sets. It is also possible to search using the “InChIKey skeleton”, the first section of the InChIKey (e.g. SNGREZUHAYWORS, rather than SNGREZUHAYWORS-UHFFFAOYSA-N (PFOA)), which returns all substances with the associated molecular skeleton. Batch searching of the DTXSID substance identifier is also supported.
Identifier searches are currently based on exact matches only, so differences in spelling, hyphenation and so on may lead to failed matches (although alternative spellings, including spelling variations from other languages, may still match if they are included as synonyms). Future plans to support fuzzy matching of identifiers are discussed later.
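Because matching is exact, it can help to sort and sanity-check a mixed identifier list before pasting it. The sketch below is a local pre-check under stated assumptions, not the Dashboard's own resolution logic: the DTXSID/DTXCID pattern (prefix followed by digits) is inferred from the examples in the text, the InChIKey pattern is the standard 14-10-1 uppercase block layout, and the CASRN test is the standard CAS check-digit rule. All function names are hypothetical.

```python
import re

CASRN_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
SKELETON_RE = re.compile(r"^[A-Z]{14}$")
DTX_RE = re.compile(r"^DTX(SID|CID)\d+$")  # assumed pattern


def casrn_is_valid(casrn: str) -> bool:
    """Apply the CAS check-digit rule: weight digits 1, 2, 3... from the right."""
    match = CASRN_RE.match(casrn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def classify(identifier: str) -> str:
    """Heuristically label an input; anything unrecognized is treated as a name."""
    text = identifier.strip()
    if CASRN_RE.match(text):
        return "CASRN" if casrn_is_valid(text) else "CASRN (bad check digit)"
    if INCHIKEY_RE.match(text):
        return "InChIKey"
    if SKELETON_RE.match(text):
        return "InChIKey skeleton"
    dtx = DTX_RE.match(text)
    if dtx:
        return "DTX" + dtx.group(1)
    return "name"


def inchikey_skeleton(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]
```

Note that a chemical name written as exactly 14 uppercase letters would be mislabeled as a skeleton, so the labels are a convenience for choosing input types rather than a guarantee.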

Once the chemical identifiers are pasted into the search box and the relevant checkboxes are checked, as shown in Figure 1, the “Download Chemical Data” button is clicked. This presents the user with a large menu of data available for inclusion in the output file, such as other identifiers, text-based chemical structures, property data etc., all of which will be discussed in detail in later examples. Information about the data associated with each checkbox can be obtained by hovering over the information symbol at the end of each data set name. The selected data can then be downloaded to an output file by pressing the blue “Download” button.

I have a list of chemicals – how can I find out which have been identified as pesticides?

The concept of a “chemical list” on the dashboard defines a collection of chemicals associated with a project, publication, or family of chemicals. The currently available set of public chemical lists is accessed via the “Lists” link on the top Dashboard banner menu [https://comptox.epa.gov/dashboard/chemical_lists]. Each has the list title, the number of associated chemicals in the list, and a short summary describing the content and source(s) of the list. At the time of writing, ~290 lists were available ranging from a small list of mycotoxins [https://comptox.epa.gov/dashboard/chemical_lists/MYCOTOXINS] with a few dozen chemicals to the list of chemicals associated with the active non-confidential Toxic Substances Control Act chemical inventory [https://comptox.epa.gov/dashboard/chemical_lists/TSCA_ACTIVE_NCTI_0320] with over 33,000 chemicals. A more detailed list description, including a chemical tile-based view of substances in the list, is accessed by clicking on the blue box next to the list name.

The lists in which a chemical is present are displayed on an individual chemical’s “Details Page”. For example, as Figure S1 shows, the lists for Bifenthrin (a pyrethroid insecticide) are viewed by opening the “Presence in Lists” accordion. The lists are segregated into Federal, US State, International and “other” lists. Selecting any of the colored tiles will open the entire list for review.

One or more lists can also be selected from the batch search interface (see Figure S2 on the right-hand side of the batch interface for data export selection). Therefore, for a list of chemicals, selecting one or more lists will include a “Y” (yes) flag for that chemical in the output file if it is present on the selected list. There are currently nine lists specifically associated with pesticides available. The pesticide lists are selected, and the output file is then downloaded (Table S1). We can now view the output file and note which queried chemicals are flagged as present in each of the pesticide lists.
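Once the file is downloaded, the “Y” flags can be summarized per chemical. This is a minimal sketch that assumes a CSV export; the column names (`INPUT`, `LIST_A`, `LIST_B`) and the rows are placeholders invented for illustration, not the actual export headers or real results.

```python
import csv
import io
from typing import Dict, List


def lists_per_chemical(csv_text: str, id_column: str, list_columns: List[str]) -> Dict[str, List[str]]:
    """Map each queried chemical to the list columns that carry a 'Y' flag."""
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = {}
    for row in reader:
        present = [name for name in list_columns
                   if (row.get(name) or "").strip().upper() == "Y"]
        hits[row[id_column]] = present
    return hits


# Placeholder export content (hypothetical columns and values).
SAMPLE = """INPUT,LIST_A,LIST_B
Chemical A,Y,Y
Chemical B,,Y
Chemical C,,
"""

summary = lists_per_chemical(SAMPLE, "INPUT", ["LIST_A", "LIST_B"])
```

Any cell that is not “Y” is treated as absent, so the sketch does not depend on how the export marks chemicals that are not on a list.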

The Dashboard contains mappings between parent compounds and transformation products (i.e. metabolites or degradants). I have a list of chemicals and would like to identify which ones have associated transformation products. How do I do it?

Chemicals in the dashboard can have “related substances” mapped to them in the underlying DSSTox database. Such relationships can include predecessor and transformation products (e.g. for caffeine https://comptox.epa.gov/dashboard/dsstoxdb/results?search=caffeine#related-substances) or members of a chemical family (e.g. polybrominated diphenyl ethers (PBDEs), https://comptox.epa.gov/dashboard/dsstoxdb/results?search=PBDEs#related-substances). Other mappings define polymer-monomer and parent-salt relationships.

To obtain the related substances for a list of chemicals, the relevant identifiers are entered into the search box, the “Related Substance Relationships” option is selected under the “Enhanced Data Sheets” section, and the download button is clicked. Note that many Enhanced Data Sheets options will not be active unless Excel is selected as the output file format. The resulting Excel file will have a second worksheet containing the related substances for each searched identifier. This worksheet contains the identifiers of each related substance as well as its relationship to the searched identifier. An example of the enhanced data sheet is provided as Table S2.

I suspect an environmental sample contains PFAS (per- and polyfluoroalkyl substances) and would like to identify all chemical substances matching a formula I have identified through mass spectrometry.

Mass spectrometry (MS) is an analytical technique that helps to identify molecular components based on detection of their mass/charge ratio, which ultimately translates to one or more molecular formula(e) that, in terms of the Dashboard, can be used as a lookup against the database. Substances in the Dashboard can be multi-component chemicals, salts, stereovariants, etc., so a formula search for a single component detected via any of the multiple forms of mass spectrometry is complicated by the formulas of these more complex substances. To solve this issue the Dashboard has an option to use “MS-ready” structures13, which result from standardization approaches that process chemicals into forms that are desalted and neutralized, have had stereochemistry removed, have been split into individual single components if contained in a mixture and, for completeness, have had isotopes removed. Because many of the substances registered in DSSTox have an MS-ready structure, with an associated mass and formula, it is possible to identify substances containing a specific formula of interest. We will use this feature to answer this question.

Assume two molecular formulas have been identified in a sample, C8HF17O3S and C8HF15O2. To identify all possible single component matches for those formulas, select the input type “MS-Ready Formula(e)” on the batch search page and enter the two formulas into the search box (one on each line, as with identifiers). Before downloading the chemical data associated with one or more molecular formula(e) or masses, it is advisable to view how many chemicals match your search. Clicking the “Display All Chemicals” button takes the user to a “Search Results” page. For these two formulas, a total of 96 chemicals are identified. In the user interface, isotopically labeled chemicals and multicomponent chemicals can be filtered out if necessary, using the “Hide chemicals that are:” dropdown box to select the desired exclusion terms. Holding the “Ctrl” key allows multiple selections to be made.

The user now has two options to download the chemical data. Selecting the “Download” dropdown menu provides the option to download additional chemical information that has been found to be useful to mass spectrometrists (e.g. source counts to rank likely chemical identity), in any of the previously discussed file formats. This file contains information such as chemical identifiers, molecular masses, and counts of each chemical’s occurrence in databases and literature (as discussed below). The other option is to select the “Send to Batch Search” button, which returns the user to the batch search page and pre-populates the identifier box with the DTXSIDs of the chemicals associated with the formulas. The user can then choose which information to download.
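As a sanity check before pasting formulas into the batch search, the monoisotopic mass implied by each formula can be computed locally and compared with the measured value. The sketch below is an illustration only: it handles simple formulas without parentheses or charges, the function names are hypothetical, and the small isotope table covers just the elements listed (standard monoisotopic masses of the most abundant isotopes).

```python
import re
from typing import Dict

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307400,
    "O": 15.99491462,
    "F": 18.99840316,
    "S": 31.97207117,
    "Cl": 34.96885268,
}

TOKEN_RE = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Dict[str, int]:
    """Parse a simple formula such as 'C8HF17O3S' into element counts."""
    counts: Dict[str, int] = {}
    position = 0
    for match in TOKEN_RE.finditer(formula):
        if match.start() != position:
            raise ValueError("unexpected character in formula: " + formula)
        element, number = match.group(1), match.group(2)
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
        position = match.end()
    if position != len(formula) or not counts:
        raise ValueError("could not parse formula: " + formula)
    return counts


def monoisotopic_mass(formula: str) -> float:
    """Sum monoisotopic element masses; raises KeyError for elements not in the table."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in parse_formula(formula).items())


pfos_mass = monoisotopic_mass("C8HF17O3S")  # approximately 499.9375
pfoa_mass = monoisotopic_mass("C8HF15O2")   # approximately 413.9737
```

The computed values are neutral monoisotopic masses; any adduct or charge-state correction applied to the measured m/z is outside the scope of this sketch.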

Of particular interest to researchers performing suspect-screening analysis are the data available under the Metadata heading. Selecting the “Data Sources” option provides a count of how many chemical lists are associated with a chemical. This count can be considered evidence of presence in environmental samples, as demonstrated in previous research on the identification of “Known-Unknowns”14–17. If a sample is associated with a specific type of medium, a chemical’s presence in the lists relevant to that medium can be highlighted (e.g. the Blood Exposome Database and Chemicals in human blood (plasma and serum) lists would be particularly relevant for a blood sample). The user can select individual lists from the “Presence in Lists” section or can choose “Select all in Lists” under the “Customize Results” section.

Another form of metadata to include in the ranking of tentative identifications is the number of articles associated with the chemical in the National Library of Medicine’s PubMed search engine5 (checkbox “Number of PubMed Articles”). It should be noted that this count is from a snapshot in time rather than a real-time count and is a representative count only. Similarly, the PubChem database6, selected using the “PubChem Data Sources” checkbox, incorporates data from numerous sources (756 as of the time of writing: https://pubchem.ncbi.nlm.nih.gov/sources) and can be used as a likelihood metric. Finally, the number of consumer products containing the tentative identifications may be of interest. The Chemical and Products Database (CPDat)18 is a periodically updated EPA database of chemicals reported and measured in products. The batch search includes an option, “CPDat Product Occurrence Count”, which provides the number of products containing the chemical in the current version of CPDat. Using the metadata discussed here, an initial prioritization of tentative identifications for confirmation by chemical standard should be possible. In the case of C8HF17O3S and C8HF15O2, the likely identities are perfluorooctanesulfonic acid and perfluorooctanoic acid, based on high occurrence in both literature and databases. An output file with the information discussed here is provided as Table S3.
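The prioritization described above can be sketched as a simple sort over the exported counts. This is one possible ordering chosen for illustration, not the authors' ranking method: the field names are hypothetical stand-ins for the export columns, and the candidate identifiers and counts below are placeholders, not real Dashboard values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    identifier: str       # e.g. a DTXSID from the export
    data_sources: int     # "Data Sources" count
    pubmed_articles: int  # "Number of PubMed Articles"
    cpdat_products: int   # "CPDat Product Occurrence Count"


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by data sources, then PubMed articles, then product count (all descending)."""
    return sorted(
        candidates,
        key=lambda c: (c.data_sources, c.pubmed_articles, c.cpdat_products),
        reverse=True,
    )


# Placeholder candidates sharing one formula (values are invented).
candidates = [
    Candidate("CANDIDATE_1", data_sources=3, pubmed_articles=0, cpdat_products=0),
    Candidate("CANDIDATE_2", data_sources=40, pubmed_articles=900, cpdat_products=5),
    Candidate("CANDIDATE_3", data_sources=40, pubmed_articles=12, cpdat_products=9),
]
ranked = [c.identifier for c in rank_candidates(candidates)]
```

A lexicographic sort keeps the logic transparent; a weighted score across the three counts would be an equally reasonable alternative, and the top-ranked candidate still requires confirmation against a chemical standard.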

I am building a quantitative structure-activity relationship (QSAR) model and need a list of QSAR-ready SMILES for the chemicals in my training/test datasets, in order to calculate molecular descriptors.

Popular software solutions for molecular descriptor calculation, such as PaDEL19 and Mordred20, use SMILES strings, which describe molecular connectivity, as the primary input for descriptor generation. However, these descriptors are often only valid for organic molecules that are desalted, de-isotoped, and contain no stereochemistry. A common issue with datasets used for QSAR modeling is the presence of chemicals that are salts, are isotopically labeled, or have stereocenters. These structure-level features are absent in “QSAR-ready SMILES”, which are similar in nature to the MS-ready structures discussed earlier. The Dashboard makes these available via the batch search using the “QSAR-Ready SMILES” option under the “Structure” section. As before, a list of chemical identifiers is entered into the search box, the “QSAR-Ready SMILES” option is selected, and an output file is downloaded. Note that this option will not generate a result for all substances, e.g. inorganic salts. Because substances are desalted, a given QSAR-ready structure will most likely link to multiple substances. In the resulting output file, the QSAR-ready SMILES for each compound is available as a separate column.
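To see which entries in an existing dataset would be altered by QSAR-ready processing, the SMILES strings can be screened for the three features named above. The sketch below is a string-level heuristic only, written with the standard library; it does not perform the standardization itself (which the Dashboard supplies via the “QSAR-Ready SMILES” export), and the function name and issue labels are hypothetical.

```python
import re
from typing import List

# A bracket atom that starts with digits carries an isotope label, e.g. [13C].
ISOTOPE_RE = re.compile(r"\[\d+[A-Za-z]")


def qsar_ready_issues(smiles: str) -> List[str]:
    """List the features in a SMILES string that a QSAR-ready form would not contain."""
    issues = []
    if "." in smiles:
        # A dot separates disconnected components (salts, mixtures).
        issues.append("multicomponent")
    if "@" in smiles or "/" in smiles or "\\" in smiles:
        # '@' marks tetrahedral centers; '/' and '\' mark double-bond geometry.
        issues.append("stereochemistry")
    if ISOTOPE_RE.search(smiles):
        issues.append("isotope")
    return issues
```

An empty list means none of the three features was detected in the string; it does not guarantee that the structure matches the Dashboard's QSAR-ready form.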

I want to obtain a general review of the type of toxicity data available for a set of chemicals.

There are various types of toxicity data accessible via the Dashboard (e.g. in vivo data, in vitro bioactivity data and certain types of predicted toxicity data). To obtain an overview, the relevant checkboxes for TEST (Toxicity Estimation Software Tool) model predictions, ToxVal data availability, assay hit count, IRIS, PPRTV, and associated ToxCast assays are selected, as shown in Figure S4, and a file with this information is downloaded. In the remainder of this section, we discuss each of the toxicity data types in detail.

“TEST Model Predictions” include a combination of both toxicity and physicochemical property estimates. Among the available toxicity predictions are oral rat LD50 and Ames mutagenicity test values. Another option for accessing toxicity data is the “Include ToxVal Data Availability” checkbox, which outputs a flag denoting the availability of ToxVal data (though at present not how much and which data) for each queried chemical. ToxVal is an aggregated database containing in vivo toxicity data from >30 separate databases, for >50,000 chemical substances and ~70,000 literature articles. If a queried chemical has some form of associated toxicity data in ToxVal, a hyperlink to the relevant Dashboard page is inserted into the download file.

Another option provides access to bioactivity data from in vitro assays. Selection of the “Assay Hit Count” checkbox provides both a count and a percentage of active hit calls (assays where a chemical is active) relative to the total number of bioassays tested as part of the ToxCast/Tox21 program21, 22 (e.g. 51/214 indicates 51 active hit calls out of a total of 214 different assay endpoint measurements). To obtain the individual ToxCast assays measured for a particular chemical, the “Associated ToxCast Assays” option is selected. In addition to the standard output file (Table S4), this provides a separate spreadsheet identifying inactive assays with a blue-colored “0” entry and active assays with a red-colored “1” entry. Cells are populated with “-” when a chemical has not been tested with an assay (Table S5).
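The hit-count notation and the 0/1/- assay matrix are straightforward to post-process. The sketch below assumes a cell holds text of the form `active/total` (as in the 51/214 example) and that matrix cells hold exactly “1”, “0” or “-”; the exact cell layout of the export and the function names are assumptions.

```python
from typing import Dict, List, Tuple


def parse_hit_count(cell: str) -> Tuple[int, int, float]:
    """Parse 'active/total' into (active, total, percent active)."""
    active_text, total_text = cell.strip().split("/")
    active, total = int(active_text), int(total_text)
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("implausible hit count: " + cell)
    return active, total, 100.0 * active / total


def summarize_assay_row(cells: List[str]) -> Dict[str, int]:
    """Count active ('1'), inactive ('0') and untested ('-') assays for one chemical."""
    summary = {"active": 0, "inactive": 0, "untested": 0}
    labels = {"1": "active", "0": "inactive", "-": "untested"}
    for cell in cells:
        summary[labels[cell.strip()]] += 1
    return summary


active, total, percent = parse_hit_count("51/214")
```

For the example in the text, 51 active hit calls out of 214 endpoints corresponds to roughly 23.8% active.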

The final two options regarding the availability of toxicity data are the “IRIS” and “PPRTV” checkboxes that provide flags and URL links when data are available. The Integrated Risk Information System, IRIS, is the EPA’s preferred source of toxicity information and provides reference concentrations and cancer descriptors, among other toxicological values. The Provisional Peer Reviewed Toxicity Values, PPRTV, are taken from reports for the Superfund Program which contain toxicity values derived from scientific literature, when those values are not available in IRIS.

Future Work

A number of prototype projects that ultimately will be integrated into a future version of the Dashboard are already underway. These include the ability to perform structure, substructure and similarity searches based on structure input. Such searches will provide hit lists that will be passed to the batch search to harvest data en masse for a set of input structures. The ability to perform batch searches based on mass or formulas already supports our research efforts in mass spectrometry, specifically non-targeted analysis17, 23, and our present research includes spectral searches of experimental data against in silico generated mass spectra24. Integrating such searches with the availability of associated toxicity data will provide a platform for both structure identification and hazard- and risk-based prioritization of potential identifications for follow-up. As the Dashboard expands in terms of available data, the batch search, as well as a publicly available application programming interface presently in development, will allow for improved access to data for the community. It is already acknowledged that “fuzzy matching” of identifiers, to account for differences in spelling and hyphenation and for errors in CASRNs, together with potential automated corrections thereof, could greatly improve matching during searches.

Conclusion

In this Application Note, we have attempted to demonstrate the utility and versatility of the batch search feature of the CompTox Chemicals Dashboard. While many other public chemical databases restrict searching to one chemical at a time, batch search allows thousands of chemicals to be searched at once. We have shown how data can be quickly obtained from various types of chemical identifiers (e.g. CASRN, names, InChIKeys), including the presence of those chemicals in specific chemical lists (~280 at the time of writing), physicochemical and environmental fate and transport properties, and various types of toxicity data. While new functionality and data are introduced with each new version of the Dashboard (10 releases in the past 5 years), the batch search capabilities reported in this article are especially important for the community. The current version of the Dashboard will be extended to deliver additional capabilities as outlined in the future work section above.

Supplementary Material

Supplemental Figures
Supplemental Tables

Acknowledgements

The authors acknowledge the contributions of the various members of the information technology and software development teams that have contributed to the development, deployment and support of the Dashboard and related tools and databases since its inception. We also acknowledge the important contribution of the curation team, who work diligently to ensure both the addition of new chemical substances and the ongoing enhancement of data quality. We specifically acknowledge Ann Richard and Chris Grulke for their efforts in building the foundation technologies of the DSSTox project and overseeing the chemical curation efforts.

The information in this document has been funded wholly or in part by the US Environmental Protection Agency. It does not signify that the contents necessarily reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. The paper has been subjected to the Agency’s review process and approved for publication.

Footnotes

Declaration of Interests

The authors declare no competing interests.

Disclaimer

This article was reviewed in accordance with the policies of the Office of Research and Development, U.S. Environmental Protection Agency, and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

Data and Code Availability

This study did not generate/analyze any novel datasets or computer code. All data discussed herein are available via the CompTox Chemicals Dashboard, https://comptox.epa.gov/dashboard.

References

1. Williams AJ; Grulke CM; Edwards J; McEachran AD; Mansouri K; Baker NC; Patlewicz G; Shah I; Wambaugh JF; Judson RS, The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. Journal of cheminformatics 2017, 9, 61.
2. Grulke CM; Williams AJ; Thillainadarajah I; Richard AM, EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. Computational Toxicology 2019, 12, 100096.
3. Helman G; Shah I; Williams AJ; Edwards J; Dunne J; Patlewicz G, Generalised Read-Across (GenRA): A workflow implemented into the EPA CompTox Chemicals Dashboard. Altex 2019, 36, 462.
4. Baker N; Knudsen T; Williams A, Abstract Sifter: a comprehensive front-end system to PubMed. F1000Research 2017, 6.
5. PubMed web services. https://www.ncbi.nlm.nih.gov/pmc/tools/developers/
6. PubChem. https://pubchem.ncbi.nlm.nih.gov/
7. Pence HE; Williams A, ChemSpider: An Online Chemical Information Resource. Journal of Chemical Education 2010, 87, 1123–1124.
8. Gaulton A; Hersey A; Nowotka M; Bento AP; Chambers J; Mendez D; Mutowo P; Atkinson F; Bellis LJ; Cibrian-Uhalte E; Davies M; Dedman N; Karlsson A; Magarinos MP; Overington JP; Papadatos G; Smit I; Leach AR, The ChEMBL database in 2017. Nucleic Acids Res 2017, 45, D945–D954.
9. Wishart DS; Feunang YD; Guo AC; Lo EJ; Marcu A; Grant JR; Sajed T; Johnson D; Li C; Sayeeda Z, DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic acids research 2018, 46, D1074–D1082.
10. Dalby A; Nourse JG; Hounshell WD; Gushurst AKI; Grier DL; Leland BA; Laufer J, Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. Journal of Chemical Information and Modeling 1992, 32, 244–255.
11. CAS REGISTRY - The gold standard for chemical substance information. https://www.cas.org/support/documentation/chemical-substances
12. Heller SR; McNaught A; Pletnev I; Stein S; Tchekhovskoi D, InChI, the IUPAC international chemical identifier. Journal of cheminformatics 2015, 7, 23.
13. McEachran AD; Mansouri K; Grulke C; Schymanski EL; Ruttkies C; Williams AJ, “MS-Ready” structures for non-targeted high-resolution mass spectrometry screening studies. Journal of cheminformatics 2018, 10, 45.
14. McEachran AD; Sobus JR; Williams AJ, Identifying known unknowns using the US EPA’s CompTox Chemistry Dashboard. Analytical and bioanalytical chemistry 2017, 409, 1729–1735.
15. Schymanski EL; Williams AJ, Open science for identifying “known unknown” chemicals. Environmental science & technology 2017, 51, 5357.
16. Newton SR; McMahen RL; Sobus JR; Mansouri K; Williams AJ; McEachran AD; Strynar MJ, Suspect screening and non-targeted analysis of drinking water using point-of-use filters. Environmental pollution 2018, 234, 297–306.
17. Sobus JR; Wambaugh JF; Isaacs KK; Williams AJ; McEachran AD; Richard AM; Grulke CM; Ulrich EM; Rager JE; Strynar MJ, Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. Journal of exposure science & environmental epidemiology 2018, 28, 411–426.
18. Dionisio KL; Phillips K; Price PS; Grulke CM; Williams A; Biryol D; Hong T; Isaacs KK, The Chemical and Products Database, a resource for exposure-relevant data on chemicals in consumer products. Scientific data 2018, 5, 180125.
19. Yap CW, PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints. Journal of computational chemistry 2011, 32, 1466–1474.
20. Moriwaki H; Tian Y-S; Kawashita N; Takagi T, Mordred: a molecular descriptor calculator. Journal of cheminformatics 2018, 10, 4.
21. Richard AM; Judson RS; Houck KA; Grulke CM; Volarath P; Thillainadarajah I; Yang C; Rathman J; Martin MT; Wambaugh JF; Knudsen TB; Kancherla J; Mansouri K; Patlewicz G; Williams AJ; Little SB; Crofton KM; Thomas RS, ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology. Chem Res Toxicol 2016, 29, 1225–51.
22. Thomas RS; Paules RS; Simeonov A; Fitzpatrick SC; Crofton KM; Casey WM; Mendrick DL, The US Federal Tox21 Program: A strategic and operational plan for continued leadership. Altex 2018, 35, 163.
23. Ulrich EM; Sobus JR; Grulke CM; Richard AM; Newton SR; Strynar MJ; Mansouri K; Williams AJ, EPA’s non-targeted analysis collaborative trial (ENTACT): genesis, design, and initial findings. Analytical and bioanalytical chemistry 2019, 411, 853–866.
24. Chao A; Al-Ghoul H; McEachran AD; Balabin I; Transue T; Cathey T; Grossman JN; Singh RR; Ulrich EM; Williams AJ, In silico MS/MS spectra for identifying unknowns: a critical examination using CFM-ID algorithms and ENTACT mixture samples. Analytical and bioanalytical chemistry 2020, 412, 1303–1315.
