Abstract
Background:
Although evidence linking environmental chemicals to breast cancer is growing, mixtures-based exposure evaluations are lacking.
Objective:
This study aimed to identify environmental chemicals in use inventories that co-occur and share properties with chemicals associated with breast cancer, highlighting exposure combinations that may alter disease risk.
Methods:
The occurrence of chemicals within chemical use categories was characterized using the Chemical and Products Database. Co-exposure patterns were evaluated for chemicals that have an association with breast cancer (BC), no known association (NBC), and understudied chemicals (UC) identified through query of the Silent Spring Institute’s Mammary Carcinogens Review Database and the U.S. Environmental Protection Agency’s Toxicity Reference Database. UCs were ranked based on structure and physicochemical similarities and co-occurrence patterns with BCs within environmentally relevant exposure sources.
Results:
A total of 6,793 chemicals had data available for exposure source occurrence analyses. Fifty top-ranking UCs spanning five clusters of co-occurring chemicals were prioritized based on shared properties with co-occurring BCs; these included chemicals used in food production and consumer/personal care products, as well as potential endocrine system modulators.
Significance:
These results highlight co-exposure conditions that are likely prevalent in everyday environments and that warrant further evaluation for possible breast cancer risk.
Keywords: Cancer, Co-exposures, Environmental Chemicals, Mixtures, Informatics, ExpoCast
INTRODUCTION
Globally, breast cancer is the most commonly diagnosed cancer and the leading cause of cancer-related death in women [1]. Genetic risk factors are estimated to contribute to only 5–10% of breast cancer cases [2,3], leaving a substantial portion of cases attributable to other risk factors, including environmental exposures. The link between environmental chemicals and breast cancer has specifically been identified as a research priority by multiple organizations worldwide, including The Institute of Medicine of the National Academies and the World Cancer Research Fund [4,5]. Determining exposures in the environment that can impact breast cancer will build an evidence base needed to better identify sources of exposure that should be reduced and/or eliminated, with the goal of reducing the global burden of environmentally influenced disease.
Environmental chemical exposures have been previously related to breast cancer. Because breast cancer etiology is highly intertwined with reproductive status, including serum hormone levels of estrogen and progesterone, chemicals that modulate signaling relevant to estrogen/progesterone levels have been linked to this disease outcome [6–8]. These chemicals include endocrine modulating chemicals, such as bisphenol A, parabens, phthalates, and polybrominated diphenyl ethers [6–8]. Other environmentally relevant chemicals that have been linked to increased risk of breast cancer include air pollutants (e.g., polycyclic aromatic hydrocarbons [PAHs]), dioxins, metals (e.g., cadmium and lead), industrial chemicals (e.g., benzene, ethylene oxide, and 1,3-butadiene), perfluoroalkyl substances (e.g., perfluorooctanoic acid [PFOA] and perfluorooctane sulfonate [PFOS]), and pesticides [6–8]. These chemicals have largely been evaluated under individual exposure conditions, owing to a lack of information on how co-occurring environmental chemicals (i.e., mixtures) relate to breast cancer incidence. This is a critical research gap, as humans are commonly exposed to multiple potentially harmful chemicals simultaneously [9]; thus, potential joint toxicities resulting from co-occurring chemicals remain understudied in relation to breast cancer incidence. At the same time, such studies remain difficult to design, as data are lacking on which chemicals commonly co-occur as mixtures in everyday environments and may be associated with breast cancer incidence.
Humans are exposed to chemicals that originate from a variety of sources in their everyday environments. Sources include industrial, agricultural, and consumer uses of chemicals. Human exposure can occur via contact with a chemical source directly (e.g., a consumer product) or via contact with contaminated environmental media (e.g., air, soil, food, house dust). Characterization of the uses associated with the thousands of chemicals in commerce is needed to identify and prioritize chemicals and chemical combinations according to their exposure potential and ultimate human health impacts. To address this research goal, the curation of use information in chemical inventories has expanded in recent years, providing the foundation for much needed evaluation of understudied chemicals in the environment. For example, the U.S. Environmental Protection Agency’s (U.S. EPA’s) Chemical and Product Categories database (CPCat) was organized to capture data on consumer exposure pathways and patterns of chemical use in the environment [10]. This database has since been expanded and combined with additional product and chemical use data to form the Chemicals and Products Database (CPDat) [11]. These databases serve as an important foundation for a better understanding of the landscape of human exposures, and ongoing curation efforts will further increase their utility.
As described in this study, we recently expanded the data within CPDat to include additional source documents and improved descriptions of chemical use terms. We leveraged this resource to identify chemicals that are understudied with respect to breast cancer risk and that co-occur in the environment alongside chemicals shown to be associated with breast cancer; such exposure combinations could modify breast cancer risk. These understudied chemicals were then evaluated for physicochemical and structural similarities to chemicals associated with breast cancer. Chemicals that likely co-occurred with, and showed chemical properties similar to, breast cancer carcinogens were identified as priority contaminants for further study to inform possible breast cancer risk, both individually and as mixtures.
METHODS
Exposure Source Categories
This study relied on the general chemical use data contained within CPDat [11], which collates and curates data obtained from federal and state reports, academic journal articles, and publications from international government agencies. The resulting chemical records span worldwide chemical use and product information inventories, chemical safety guideline sheets, food inventories, pesticide use information, and water and soil contamination data. Examples of these inventory types include, respectively, the Danish Environmental Protection Agency’s Surveys on Chemicals in Consumer Products, Washington State’s Children’s Safe Products Act, USDA Annual Reports on Agricultural Chemical Usage for Field Crops and Fruit, State of Arizona’s Reported Pesticide Use Within Arizona, and the Minnesota Department of Agriculture’s Annual Water Quality Monitoring Report, among many others. CPDat aggregates chemical use information from hundreds of such documents into one organized dataset, harmonized to general descriptors that convey how the chemical is used, according to the original source. Here, we updated the CPDat descriptors to better align them with newly developed consumer product [12] and functional use categories [13]. These newly developed “CPDat Chemical List Presence keywords” are provided publicly in current versions of CPDat (EPA 2020a) and are further described in Supplemental Methods. A key feature of this updated dataset is the additional harmonization of the various chemical identifiers to Distributed Structure-Searchable Toxicity Database Substance Identifiers (DTXSID) to remove redundancies in chemical annotations, yielding unique substance identifiers [14,15].
The current study used Version 2 of the CPDat Chemical List Presence dataset [16]. These data included 73,573 chemical records derived from 1,543 chemical use reports (covering the years 1987–2019), reflecting 20,530 unique DTXSIDs and 129 unique use keywords, with multiple keywords (reflecting multiple uses) often assigned to a single chemical record. For the purposes of the current evaluation, related keywords were grouped into 32 broader “exposure source categories.” These exposure source categories served as higher-level descriptors of chemical use information, designed to aid user interpretation while allowing more effective clustering of chemical uses in subsequent analyses (a single chemical could still be assigned multiple categories). Keywords or terms that did not describe a specific exposure source (e.g., “nondetect”, “prohibited”, “restricted”) were excluded from this analysis.
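The keyword-to-category grouping and keyword exclusion described above can be sketched as a simple lookup. This is a minimal Python illustration under stated assumptions, not the authors' code: the keyword strings, category labels, and chemical identifiers below are hypothetical placeholders (the actual keyword mapping is provided in Table S1).

```python
# Minimal sketch: collapse (chemical, keyword) records into unique
# chemical -> exposure source category sets. The mapping below is a
# HYPOTHETICAL placeholder, not the published keyword-to-category table.
KEYWORD_TO_CATEGORY = {
    "shampoo": "personal care",
    "lotion": "personal care",
    "herbicide": "pesticides",
    "insecticide": "pesticides",
    "toy": "children's products and toys",
}

# Keywords that do not describe a specific exposure source are dropped.
EXCLUDED_KEYWORDS = {"nondetect", "prohibited", "restricted"}


def chemical_categories(records):
    """Return {chemical_id: set of categories} from (chemical_id, keyword) pairs.

    Duplicate chemical/category combinations collapse automatically because
    categories are stored in a set. Excluded keywords, and keywords absent
    from the (hypothetical) mapping, are skipped.
    """
    categories = {}
    for chemical_id, keyword in records:
        if keyword in EXCLUDED_KEYWORDS:
            continue
        category = KEYWORD_TO_CATEGORY.get(keyword)
        if category is None:
            continue
        categories.setdefault(chemical_id, set()).add(category)
    return categories


records = [
    ("chem_A", "shampoo"),
    ("chem_A", "lotion"),      # same category as "shampoo": collapses
    ("chem_A", "restricted"),  # excluded keyword
    ("chem_B", "herbicide"),
    ("chem_B", "toy"),
]
print(chemical_categories(records))
```

Because categories are held in sets, one chemical can still map to several categories (chem_B maps to two here), matching the note that multiple categories could result for a single chemical.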
Association with Breast Cancer
This study focused on chemicals and their co-occurrence patterns in relation to breast cancer. Chemicals were grouped into categories based on queries of two databases: Silent Spring Institute’s Mammary Carcinogens Review Database [17,18] and U.S. EPA’s Toxicity Reference Database (ToxRefDB) [19]. Data from both human and animal studies were leveraged here, as there is evidence supporting genetic similarities between rodent mammary tumors and breast carcinogenesis pathways in humans [20,21]. The Mammary Carcinogens Review Database includes information on 216 chemicals and breast cancer-relevant findings aggregated from the International Agency for Research on Cancer (IARC) Monographs, the Carcinogenic Potency Database, the U.S. National Toxicology Program (NTP), the NTP 11th Report on Carcinogens, and the Chemical Carcinogenesis Research Information System Database; this database thus draws on findings from both human and animal studies. ToxRefDB is one of the largest publicly available databases of curated in vivo study results, consisting largely of data from animal studies of pharmaceutical, agrochemical, and other industrial chemicals performed in accordance with, or similarly to, U.S. Environmental Protection Agency Health Effects Series Guideline Studies [19]. This database was used to identify chemicals for which repeat-dose studies in adult animals evaluated potential changes in mammary gland phenotype.
Chemicals were organized into the following three categories: (1) Breast cancer chemicals (BCs) associated with breast cancer in humans and/or mammary gland cancer in animals; (2) Non-breast cancer chemicals (NBCs) that have been tested but not found to cause mammary gland carcinogenicity in animals and currently have no known association with breast cancer; (3) Understudied chemicals (UCs) that remain understudied in relation to human breast cancer and/or mammary gland phenotype changes in animals.
To identify BCs, both the Mammary Carcinogens Review Database and ToxRefDB were queried. All 216 chemicals currently included in the Mammary Carcinogens Review Database were included as BCs, determined by Silent Spring Institute as chemicals that met at least one of the following criteria: (i) reported through IARC Monograph summaries to increase mammary gland tumors; (ii) included in the Carcinogenic Potency Database with at least one study that reported an increase in mammary gland tumors; (iii) reported on the NTP website as “chemicals associated with site-specific tumor induction in mammary gland,” (drawn from the collection of NTP Technical Reports) as well as chemicals that increased mammary tumors in the NTP Study Reports Collection: Abstracts and Target sites in two-year studies; (iv) reported in the NTP 11th Report on Carcinogens as associated with increased mammary tumors; and (v) reported through the Chemical Carcinogenesis Research Information System Database as having positive results in the “Carcinogenicity Studies” section after filtering for “mammary” [17]. All chemicals were linked to DTXSIDs and corresponding chemical names through a batch search (using CASRN) on the CompTox Chemicals Dashboard [22]. ToxRefDB was queried as an additional database to identify BCs, selecting chemicals that had recorded instances of a treatment-related mammary gland change that was cancer-related in adult animals exposed in subchronic or chronic studies [19]. These effects were either micro- or macroscopic pathology, and included findings described as fibroadenoma or fibroma, adenocarcinoma or adenoma, carcinoma, and mixed tumor (malignant or not otherwise specified).
To identify NBCs, ToxRefDB was queried for chemicals that were tested for pathological mammary gland changes, focusing on mouse and rat species to parallel the ToxRefDB BC query results. Records were confined to chronic two-year animal bioassays, to focus on exposure designs that allowed sufficient time for cancer endpoints to develop. Chemicals that were included in this list but were not identified as BCs were defined as NBCs for the purposes of the current investigation. As the Mammary Carcinogens Review Database lacked parallel “negative” data, the identification of NBCs was based solely on animal data contained in ToxRefDB. Notably, the final NBC list contained chemicals that were not present in the Mammary Carcinogens Review Database, supporting their lack of a currently known association with breast cancer.
To identify UCs, chemicals within the CPDat Chemical List Presence dataset that were not identified as BCs or NBCs were designated as UCs. These UCs represent chemicals with exposure source information relevant to human environmental exposure conditions that have yet to be tested for associations with breast cancer or mammary gland phenotype changes, or that generally lack information on their potential carcinogenicity.
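The three-way BC/NBC/UC assignment described above reduces to set operations on chemical identifiers. The following is a minimal Python sketch, assuming each source has already been resolved to DTXSID-like IDs; the function name and the toy identifiers are hypothetical. The final intersection with CPDat mirrors the requirement that chemicals have exposure source information.

```python
def classify_chemicals(mcrd_chemicals, toxref_positive, toxref_tested, cpdat_chemicals):
    """Assign BC / NBC / UC labels with set operations (illustrative sketch).

    mcrd_chemicals:  IDs from the Mammary Carcinogens Review Database
    toxref_positive: IDs with cancer-related mammary findings in ToxRefDB
    toxref_tested:   IDs tested for mammary pathology in chronic rodent bioassays
    cpdat_chemicals: IDs with exposure source information in CPDat
    """
    bc = set(mcrd_chemicals) | set(toxref_positive)
    nbc = set(toxref_tested) - bc  # tested, but not identified as BCs
    cpdat = set(cpdat_chemicals)
    return {
        "BC": bc & cpdat,           # keep only chemicals with use information
        "NBC": nbc & cpdat,
        "UC": cpdat - bc - nbc,     # everything else in CPDat
    }


groups = classify_chemicals(
    mcrd_chemicals={"c1", "c2"},
    toxref_positive={"c2", "c3"},
    toxref_tested={"c3", "c4", "c5"},
    cpdat_chemicals={"c1", "c3", "c4", "c6", "c7"},
)
print(groups)
```

Note that "c3" is both ToxRefDB-positive and ToxRefDB-tested; subtracting the full BC set before defining NBCs keeps the three groups mutually exclusive.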
Chemical Exposure Patterns
The chemicals represented in CPDat were hierarchically clustered based on their associated exposure source category linkages to characterize potential chemical co-occurrence patterns. We have demonstrated success with this approach in identifying trends in both chemical use and molecular response signatures [23–26] and have used it here to identify co-occurrence between BCs and UCs that could potentially alter disease risk.
CPDat data were first filtered for unique combinations of chemicals and exposure source categories. Additional filtering to include only those chemicals present in at least two categories to allow analysis of potential environmental co-occurrence patterns resulted in 6,793 unique chemicals. A summary table of the resulting chemicals and exposure source categories was produced, containing values of 1, indicating at least one association between the chemical and the exposure category, and values of 0, indicating no association. A distance matrix was derived from this summary table using the Jaccard distance, the complement of the Jaccard similarity, through the ‘vegan’ package (v2.5–7) in R (v4.0.3). The Jaccard similarity for two sets is defined as the size of intersection divided by the size of the union of the sets, thus the Jaccard distance can be calculated as follows (Eqn. 1):
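$$D(A,B) = 1 - \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} \qquad \text{(Eqn. 1)}$$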
For this equation, D(A,B) is the Jaccard distance for sets A and B [27], here representing a pair of chemicals (chemical A and chemical B). For the purposes of this evaluation, the sizes of the intersection and union reflect, respectively, the number of exposure source categories shared by the two chemicals and the total number of exposure source categories associated with either chemical. This metric was used to gauge the similarity of the exposure source categories associated with each possible pair of chemicals. The resulting distance ranged from 0 (low dissimilarity [i.e., high similarity]) to 1 (high dissimilarity [i.e., low similarity]).

To determine the optimal number of chemical clusters, the within-cluster sum of squares and the average silhouette width [28] (measures of cluster compactness and, for the silhouette width, separation) were calculated and visualized for 1 ≤ k ≤ 100 using the fviz_nbclust function in the R package ‘factoextra’ (v1.0.7). The minimum number of clusters that produced reasonable cluster compactness while still yielding interpretable clusters was selected as optimal. These clustering methods were selected for their recognized utility in exposure science [29–31] and for their computational efficiency, which was necessary given the size of our dataset. The data were grouped into the selected optimal number of clusters through divisive hierarchical clustering using the diana function in the R package ‘cluster’ (v2.1.1). The same methods were also used to derive clusters of similar exposure source categories, with the goal of enhancing the interpretability of chemical use patterns. A heatmap was produced to visualize the resulting chemical clusters using the pheatmap function in the R package ‘pheatmap’ (v1.0.12). The heatmap was color-coded so that associations between an exposure source category and BCs, NBCs, or UCs were each shown in a distinct color, aiding visual interpretation of the results.
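The authors computed distances with the R ‘vegan’ package; the Python sketch below only illustrates the filtering step (chemicals present in at least two categories) and Eqn. 1 applied to every chemical pair. It does not reproduce the DIANA clustering, and the chemical names and category labels are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Eqn. 1: D(A, B) = 1 - |A ∩ B| / |A ∪ B| for two category sets."""
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(chem_to_categories, min_categories=2):
    """Keep chemicals in >= min_categories categories, then compute the
    symmetric Jaccard distance for every pair of retained chemicals."""
    kept = sorted(
        chem for chem, cats in chem_to_categories.items()
        if len(cats) >= min_categories
    )
    dist = {(chem, chem): 0.0 for chem in kept}
    for x, y in combinations(kept, 2):
        d = jaccard_distance(chem_to_categories[x], chem_to_categories[y])
        dist[(x, y)] = dist[(y, x)] = d
    return kept, dist


example = {
    "chem_A": {"food", "personal care", "water"},
    "chem_B": {"food", "water"},
    "chem_C": {"building materials"},  # only one category: filtered out
    "chem_D": {"personal care", "cleaning products"},
}
kept, dist = pairwise_distances(example)
print(kept)                        # ['chem_A', 'chem_B', 'chem_D']
print(dist[("chem_A", "chem_B")])  # 1 - 2/3
print(dist[("chem_A", "chem_D")])  # 1 - 1/4 = 0.75
print(dist[("chem_B", "chem_D")])  # 1 - 0/4 = 1.0
```

Working from category sets is equivalent to the 0/1 summary table described above: the intersection counts columns where both chemicals have a 1, and the union counts columns where either does.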
Identifying Structural Features that are Enriched in BCs vs NBCs
Chemical structural features are known to influence (i) whether a chemical elicits toxicity and ultimately causes disease and (ii) how a chemical elicits toxicity (i.e., the underlying etiology of a chemical-induced disease outcome). Structural features are thus the primary data used to inform quantitative structure-activity relationship (QSAR) and read-across toxicity predictions, which are based entirely on in silico approaches [32]. Here, structural feature data were first compared between BCs and NBCs to identify structural attributes that are more common among BCs. This information was then used to identify UCs that contain these breast cancer-associated features and should be prioritized for further evaluation.
Structural feature descriptors (i.e., atom, bond, chain, and ring types) were acquired as ToxPrint fingerprint data, which record whether a specific structural feature is present (1) or absent (0) [33], downloaded from the CompTox Chemicals Dashboard through a batch search. This search yielded ToxPrint fingerprint data covering 729 chemical structural features. Features specifically enriched in BCs were identified using a previously published method [34] that tests whether a specific structural attribute is present in BCs at a higher rate than would be expected by chance, relative to NBCs. Significance was defined as a one-sided Fisher’s exact p-value ≤ 0.05 and an odds ratio ≥ 3, and at least three BCs were required to contain the feature. The converse was also evaluated, whereby the absence of structural features was analyzed for enrichment in BCs vs NBCs using parallel filters. Calculations were carried out in R using the ‘tidyverse’ (v1.3.0) and ‘janitor’ (v2.1.0) packages.
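The per-feature enrichment test above can be sketched with a 2×2 table (BC/NBC × feature present/absent) and the upper tail of the hypergeometric distribution, which is what a one-sided Fisher’s exact test computes. This is a minimal Python illustration, not the method of [34]: in particular, it uses the cross-product (sample) odds ratio, whereas R’s fisher.test reports a conditional maximum-likelihood estimate, so odds ratio values may differ. The counts in the usage line are hypothetical.

```python
from math import comb, inf


def fisher_greater(a, b, c, d):
    """One-sided (greater) Fisher's exact p-value for the 2x2 table

                  feature present   feature absent
        BCs             a                 b
        NBCs            c                 d

    computed as P(X >= a) under the hypergeometric null."""
    n_bc = a + b          # row total: BCs
    k_present = a + c     # column total: chemicals carrying the feature
    n = a + b + c + d
    upper = min(k_present, n_bc)
    tail = sum(
        comb(k_present, x) * comb(n - k_present, n_bc - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(n, n_bc)


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a*d)/(b*c); infinite if b*c == 0.
    Assumption: may differ from the conditional MLE used by R's fisher.test."""
    return inf if b * c == 0 else (a * d) / (b * c)


def is_enriched(a, b, c, d, p_max=0.05, or_min=3.0, min_bc=3):
    """Thresholds from the text: p <= 0.05, OR >= 3, at least 3 BCs with feature."""
    return (
        a >= min_bc
        and fisher_greater(a, b, c, d) <= p_max
        and sample_odds_ratio(a, b, c, d) >= or_min
    )


# Hypothetical feature: present in 5 of 78 BCs and 2 of 393 NBCs.
print(is_enriched(5, 73, 2, 391))
```

The converse analysis (feature absence enriched in BCs) can reuse the same functions by swapping the columns, e.g., `fisher_greater(b, a, d, c)`.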
Assessing Physicochemical Property Similarity across Chemicals
Physicochemical properties were evaluated to further inform which chemicals may induce toxicity, through similar etiologies, comparable to that of chemicals known to cause breast cancer; we have previously published evidence supporting the utility of physicochemical properties in computational predictions of in vivo toxicity [24,35]. Furthermore, physicochemical properties play a critical role in chemical fate and transport within the environment and thus further inform occurrence patterns of chemicals across environmental exposure sources [36]. Here, physicochemical property data were evaluated among BCs and then compared with UCs to identify pairs of BCs and UCs that co-occur in the environment and display similar physicochemical properties. Such chemicals may share toxicity and environmental fate/transport patterns and should likely be prioritized for further evaluation.
Physicochemical property data for all evaluated chemicals were obtained from the CompTox Chemicals Dashboard [22]. A batch search was performed for the DTXSIDs of interest, and OPEn structure-activity Relationship App (OPERA) predictions for physicochemical properties and environmental fate endpoints were downloaded [37]. These included atmospheric hydroxylation rate, bioconcentration factor, biodegradability half-life, boiling point, fish biotransformation half-life, Henry’s Law constant, melting point, octanol/air partition coefficient, octanol/water partition coefficient, soil adsorption coefficient, vapor pressure, and water solubility. Organized data were then z-score normalized by property within each cluster. The degree of physicochemical property similarity between pairs of chemicals (i.e., UC-BC pairs) based on scaled properties was evaluated using the Spearman Rank Correlation test (cor.test ‘stats’ [v4.0.3]). The resulting correlation coefficients (R) and p-value distributions were evaluated across all possible UC-BC chemical pairs per cluster, and the highest correlation result for each UC from all BC pairings was selected to inform the final chemical prioritization.
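The scaling and pairwise correlation steps above can be sketched as follows: z-score each property across the chemicals of one cluster, compute Spearman’s rho between a UC’s and a BC’s scaled property profiles (as the Pearson correlation of average ranks), and keep each UC’s best BC match. This is a minimal Python illustration assuming profiles list the same properties in the same order; it computes only the correlation coefficient, not the cor.test p-value, and all identifiers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def zscore_by_property(profiles):
    """Scale each property to mean 0, SD 1 across chemicals in one cluster.

    profiles: {chemical_id: [p1, p2, ...]} with properties in a fixed order.
    Uses the sample SD (n - 1); properties with zero spread would need
    special handling (not covered in this sketch).
    """
    ids = list(profiles)
    columns = list(zip(*(profiles[i] for i in ids)))
    stats = [(mean(col), stdev(col)) for col in columns]
    return {
        i: [(v - m) / s for v, (m, s) in zip(profiles[i], stats)]
        for i in ids
    }


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def best_bc_match(uc_id, bc_ids, scaled):
    """(BC id, rho) for the BC most correlated with this UC in the cluster."""
    return max(((bc, spearman(scaled[uc_id], scaled[bc])) for bc in bc_ids),
               key=lambda pair: pair[1])


scaled = {"UC_x": [1, 2, 3, 4], "BC_1": [4, 3, 2, 1], "BC_2": [10, 20, 30, 40]}
print(best_bc_match("UC_x", ["BC_1", "BC_2"], scaled))
```

Keeping only the maximum rho per UC corresponds to selecting, for each UC, its single most similar co-occurring BC, as described in the text.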
Prioritizing Chemical Mixtures for Breast Cancer Evaluations
Information on chemical exposure sources, structural similarity to BCs, and physicochemical similarity to BCs was integrated to identify UCs that co-occur alongside chemicals associated with elevated breast cancer risk and that, from a mixtures-based exposure and toxicity standpoint, should be prioritized for further evaluation. First, a series of filters was applied at the cluster level to prioritize clusters that included chemicals most likely to be prevalent in everyday environments and that showed some of the highest structural and physicochemical similarities to chemicals associated with breast cancer. These filters comprised the following criteria: (1) Clusters were required to include at least one UC with high structural similarity to BCs, defined as containing at least four structural features that were statistically enriched within BCs. (2) Clusters were required to include at least one UC that was physicochemically similar to co-occurring BCs, defined as having physicochemical properties correlated at R ≥ 0.8 with a BC in the same cluster. The structural and physicochemical similarity cut-offs were selected based on review of the resulting distributions of enriched features and correlations, as described in the Results. (3) Clusters were required to include chemicals that map to exposure source categories relevant to multiple common environmental exposures, such as those pertaining to personal care, the household environment, food, and water. Clusters meeting these filters were considered prioritized clusters of co-occurring chemicals and were selected for further examination in this analysis.
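The three cluster-level filters above can be sketched as a single predicate over per-chemical summaries. This is a minimal Python illustration under explicit assumptions: the category labels are hypothetical stand-ins for the everyday-exposure categories named in criterion (3), and “multiple” is interpreted here as at least two such categories, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Optional

# HYPOTHETICAL labels standing in for the everyday-exposure categories
# named in criterion (3); the actual category names are in Table S1.
EVERYDAY_CATEGORIES = {"personal care", "household", "food", "water"}


@dataclass
class ChemicalSummary:
    chem_id: str
    group: str                       # "BC", "NBC", or "UC"
    n_enriched: int = 0              # count of BC-enriched ToxPrint features
    best_r: Optional[float] = None   # highest rho with a BC in the cluster (UCs)
    categories: set = field(default_factory=set)


def cluster_passes(members, min_enriched=4, min_r=0.8, min_everyday=2):
    """Apply criteria (1)-(3) to one cluster's members."""
    ucs = [m for m in members if m.group == "UC"]
    structural = any(m.n_enriched >= min_enriched for m in ucs)            # (1)
    physchem = any(m.best_r is not None and m.best_r >= min_r for m in ucs)  # (2)
    categories = set().union(*(m.categories for m in members))
    everyday = len(categories & EVERYDAY_CATEGORIES) >= min_everyday       # (3), assumed ">= 2"
    return structural and physchem and everyday


cluster = [
    ChemicalSummary("BC_1", "BC", categories={"food"}),
    ChemicalSummary("UC_1", "UC", n_enriched=5, best_r=0.70, categories={"personal care"}),
    ChemicalSummary("UC_2", "UC", n_enriched=1, best_r=0.85, categories={"water"}),
]
print(cluster_passes(cluster))
```

As written in the text, criteria (1) and (2) may be satisfied by different UCs within the same cluster, which is how the predicate evaluates them.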
A ranking scheme was then applied at the individual chemical level (Figure 1), focusing on chemicals within each of the prioritized clusters. This ranking scheme was based on a score that combined information on chemical structural similarity and chemical physicochemical property similarity to BCs. This score parallels other algorithms used in chemical prioritization efforts [38–40]. For the structural similarity component, each UC in the cluster was given a structural similarity score (SS) based on enriched structural features (EFs) calculated as follows (Eqn. 2):
$$SS = \frac{EF_{UC} - EF_{cluster,\min}}{EF_{cluster,\max} - EF_{cluster,\min}} \qquad \text{(Eqn. 2)}$$
Figure 1. Schematic for ranking UCs based on their potential to influence breast cancer risk.

This analysis resulted in the ranking of UCs within specific chemical clusters based on use co-occurrence, structural similarity, and physicochemical property similarity to BCs. Chemical data were analyzed separately, here on a per-cluster basis, for clusters 1, 4, 5, 6, and 9. These clusters were selected based on environmental relevancy as well as other requirements described within Methods (see section Prioritizing Chemical Mixtures for Breast Cancer Evaluations).
Here, EF_UC reflects the number of enriched structural features present in the UC for which a score is being calculated. EF_cluster,min and EF_cluster,max are, respectively, the minimum and maximum number of enriched features identified in any UC within the cluster under evaluation. Therefore, if the number of enriched features present in the UC under consideration equaled the minimum for the cluster, the SS for the UC would be 0, indicating low concern (i.e., low priority). Conversely, if the number of enriched features in the UC under consideration equaled the maximum for the cluster, the SS for the UC would be 1, indicating elevated concern (i.e., high priority). An analogous physicochemical similarity (PS) score was calculated for each UC in a cluster based on the UC’s highest property correlation with a BC. Because a correlation value was calculated for each UC-BC pair in a cluster, there were often multiple correlation values for each UC. As the most conservative approach, we selected the highest correlation value (R) for each UC, and these values informed the PS scoring for the cluster, calculated as follows (Eqn. 3):
$$PS = \frac{R_{UC} - R_{cluster,\min}}{R_{cluster,\max} - R_{cluster,\min}} \qquad \text{(Eqn. 3)}$$
Here, R_UC reflects the highest correlation value for the UC being scored. R_cluster,min and R_cluster,max are, respectively, the minimum and maximum of the highest correlation values selected for any UC within the cluster under evaluation. As with the SS, the PS could range from 0 to 1, with 0 indicating the lowest concern (i.e., low priority) and 1 indicating the highest concern (i.e., high priority) based on degree of physicochemical correlation to a BC. The structural and physicochemical similarity scores were then summed to provide an overall score (OS = SS + PS, ranging from 0 to 2) for each UC within each previously identified cluster of interest. The overall scores were then used to inform the final UC ranking in each of the clusters of interest. Top-ranking UCs were also reported alongside their co-occurring BCs to provide examples of high-priority mixtures likely occurring in our everyday environment that require further evaluation for putative relationships to breast cancer risk. All calculations were carried out in R using base statistical packages, ‘tidyverse’ (v1.3.0), and ‘janitor’ (v2.1.0).
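The per-cluster scoring in Eqns. 2 and 3 is min-max scaling of two quantities followed by a sum. The Python sketch below illustrates it for one cluster; the UC identifiers and input values are hypothetical, and returning 0 when all UCs in a cluster share the same value (zero denominator) is an assumption, since the text does not address that case.

```python
def min_max(value, lo, hi):
    """Eqns. 2 and 3: rescale a value to [0, 1] within the cluster.
    Assumption: if every UC has the same value (hi == lo), return 0."""
    return 0.0 if hi == lo else (value - lo) / (hi - lo)


def rank_ucs(ucs):
    """Rank the UCs of one cluster by overall score OS = SS + PS.

    ucs: {uc_id: (n_enriched_features, highest_rho_with_any_BC)}
    Returns [(uc_id, OS, SS, PS), ...] sorted by OS, highest first.
    """
    ef = [n for n, _ in ucs.values()]
    rho = [r for _, r in ucs.values()]
    ef_min, ef_max = min(ef), max(ef)
    r_min, r_max = min(rho), max(rho)
    scored = []
    for uc_id, (n, r) in ucs.items():
        ss = min_max(n, ef_min, ef_max)  # structural similarity, Eqn. 2
        ps = min_max(r, r_min, r_max)    # physicochemical similarity, Eqn. 3
        scored.append((uc_id, ss + ps, ss, ps))
    return sorted(scored, key=lambda row: row[1], reverse=True)


cluster = {"UC_a": (8, 0.95), "UC_b": (4, 0.80), "UC_c": (0, 0.60)}
for uc_id, overall, ss, ps in rank_ucs(cluster):
    print(f"{uc_id}: OS={overall:.2f} (SS={ss:.2f}, PS={ps:.2f})")
```

Because both components are rescaled within each cluster, the OS ranks UCs relative to their own cluster; scores are not directly comparable across clusters.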
RESULTS
Exposure Source Categories for Describing Human Chemical Use Patterns
Patterns of human exposure to chemicals in the environment were evaluated using chemical use inventory information organized within CPDat. The updated CPDat Chemical List Presence dataset contained 140 unique keywords; these were mapped to 32 unique exposure source categories (Figure 2; and Table S1 available at [41]). After the filtering described in the Methods, a final list of 6,793 chemicals was carried forward in the current analysis (Table S2 available at [41]). Notably, the final exposure source categories captured sources that humans encounter in their everyday environments, including arts and crafts / office supplies, building materials, children’s products and toys, cleaning products, electronics, furniture, general consumer products, household care and cleaning products, personal care, and other common sources of exposure.
Figure 2. Translating chemical use inventory data to inform human exposure patterning.

Groups A-I illustrate the identified clusters of exposure source categories.
Categorizing Chemicals based on Association to Breast Cancer
Chemicals with associated chemical use information were binned into categories of BCs, NBCs, and UCs to describe their currently known (or unknown) association with breast cancer risk. The Mammary Carcinogens Review Database contained 216 chemicals, 208 of which had a CASRN and 199 of which mapped to DTXSIDs; these 199 were carried forward in the analysis (Table S3 available at [41]). Within ToxRefDB, a total of 53 unique chemicals were identified as showing mammary gland cancer-related effects, spanning results from mouse and rat chronic and subchronic bioassays (Table S4 available at [41]). In total, 228 unique chemicals were identified between the two sources as being associated with breast cancer, including human breast cancer and/or animal cancer-related mammary gland changes. These chemicals thus represented the full list of identified BCs (Table S5 available at [41]). Also within ToxRefDB, 535 unique chemicals had been tested, in general, for any mammary gland cancer-related effects in mouse and rat chronic bioassays (Table S6 available at [41]). Of these 535 unique chemicals, 53 had been identified as having an association with mammary gland cancer-related effects and were therefore already classified as BCs. As a result, the remaining 482 chemicals were identified as NBCs (Table S5 available at [41]), representing chemicals that currently lack a known association with breast cancer. Mapping these chemicals to those with the required chemical use information in CPDat resulted in the following counts of chemicals in each category: 78 BCs, 409 NBCs, and 6,306 UCs (Table S5 available at [41]).
Groups of Co-Occurring Chemicals based on Chemical Use Patterns
Chemicals were evaluated for common exposure source patterns through clustering based on the Jaccard distance metric. The number of clusters (k) was selected by examining the reduction in the proportion of within-cluster variance (sum of squares) relative to total variance, together with the average silhouette width computed from the resulting distance matrix (Supplemental Figure S1A). Based on these results, k = 19 clusters was selected as optimal, reducing within-cluster variability (i.e., improving compactness) while grouping the data into interpretable clusters. Exposure source categories were similarly grouped into k = 12 clusters (Supplemental Figure S1B). The resulting clusters represent 19 groups of chemicals that likely co-occur within the environment based on chemical use patterns (Figure 3). Cluster sizes ranged from 5 to 3,175 chemicals (mean = 358). Clusters also varied widely in their BC/NBC/UC composition, with some containing no BCs (e.g., cluster 14) and others containing up to 15% BCs (e.g., cluster 15). Other clusters consisted largely of untested chemicals, with up to 99% UCs. Specific clusters of interest are further detailed below.
Figure 3. Clusters of chemicals arranged by human use patterns.

Each row reflects a chemical while each column reflects an exposure source category. An association between an exposure source category and a chemical is shown as yellow (UCs), blue (NBCs), or red (BCs). Grey indicates chemicals that were not present in a particular exposure source category. Chemical clusters 1, 4, 5, 6 and 9 were prioritized for further characterization.
Structural Features Enriched in BCs
Chemical structural feature data were compared between BCs and NBCs to identify feature attributes that are significantly enriched within chemicals associated with carcinogenic changes in mammary glands in animals and/or breast cancer in humans. Here, ToxPrint data were organized across 78 BCs and 393 NBCs with available feature data (Table S7 available at [41]). Of the available features, 390 were present in at least one BC or NBC (614 when UCs were also included). Enrichment analyses identified 26 structural features that were more commonly present in BCs than in NBCs (Table 1), highlighting chemotypes that could be evaluated further in future studies for mechanistic involvement in potential cancer etiology. Enrichment analyses also identified 14 features that were more commonly absent in BCs than in NBCs (Table S8 available at [41]). Note that this does not provide evidence that the 14 features are more commonly present in NBCs. These features were not considered in further characterization steps or used to prioritize UCs. The maximum number of enriched features within an individual UC was 8 of the possible 26 (Figure S2), and the overall distribution of feature presence was right-skewed, with many UCs having either 0 or 1 enriched features. This information was carried forward to the chemical ranking step, in which UCs were evaluated for the presence of the 26 structural features associated with BCs, as detailed in the chemical prioritization results.
Table 1. ToxPrint chemotypes identified as being enriched in BCs.
Enrichment statistics are shown here, including the odds ratio and p-value for each structure in relation to its occurrence in chemicals associated with breast cancer. The number of true positives is also listed, indicating the number of BCs that contain each chemotype. Chemotypes reflect general structural fragments, which detail information surrounding each chemical’s atom, bond, chain, and ring types, as well as group-level information when available [60].
| Chemotype | Odds Ratio | Fisher p-value | True Positives |
|---|---|---|---|
| ring.fused_steroid_generic_.5_6_6_6. | 50.51 | 5.42E-07 | 9 |
| ring.fused_.6_6._tetralin | 26.56 | 5.82E-04 | 5 |
| chain.alkeneLinear_diene_1_3.butene | 15.54 | 1.55E-02 | 3 |
| bond.CX_halide_alkyl.X_ethyl | 12.71 | 2.11E-04 | 7 |
| bond.CX_halide_alkyl.Cl_ethyl | 10.75 | 9.59E-04 | 6 |
| bond.C.N_imine_C.connect_H_gt_0. | 7.77 | 3.41E-02 | 3 |
| ring.hetero_.3._Z_generic | 7.77 | 3.41E-02 | 3 |
| chain.aromaticAlkene_Ph.C2_acyclic_generic | 7.6 | 1.05E-03 | 7 |
| chain.alkeneLinear_mono.ene_ethylene_terminal | 5.47 | 1.68E-03 | 8 |
| bond.CN_amine_sec.NH_aromatic_aliphatic | 5.41 | 3.41E-03 | 7 |
| bond.CX_halide_alkenyl.Cl_dichloro_.1_1.. | 5.23 | 2.88E-02 | 4 |
| chain.aromaticAlkene_Ph.C2 | 5.23 | 2.88E-02 | 4 |
| chain.alkaneCyclic_pentyl_C5 | 4.51 | 2.15E-03 | 9 |
| ring.hetero_.6._N_triazine_.1_3_5.. | 4.36 | 4.29E-03 | 8 |
| bond.CX_halide_alkenyl.X_dihalo_.1_1.. | 4.18 | 4.54E-02 | 4 |
| chain.alkaneCyclic_hexyl_C6 | 3.85 | 1.79E-03 | 11 |
| bond.CN_amine_sec.NH_alkyl | 3.76 | 1.26E-02 | 7 |
| bond.CX_halide_alkenyl.X_acyclic_generic | 3.54 | 2.45E-02 | 6 |
| ring.hetero_.6._Z_1_3_5. | 3.52 | 6.77E-03 | 9 |
| bond.CX_halide_alkyl.X_ethyl_generic | 3.41 | 1.79E-02 | 7 |
| bond.CX_halide_alkyl.X_primary | 3.41 | 1.79E-02 | 7 |
| bond.CN_amine_aromatic_generic | 3.24 | 6.15E-04 | 17 |
| bond.CN_amine_pri.NH2_aromatic | 3.18 | 3.41E-02 | 6 |
| bond.CN_amine_sec.NH_aromatic | 3.12 | 2.45E-02 | 7 |
| chain.alkeneLinear_mono.ene_ethylene_generic | 3.11 | 1.17E-03 | 16 |
| bond.C.N_imine_N.connect_noZ. | *INF | 4.40E-03 | 3 |
*INF: infinite odds ratio. This occurs when a chemotype is absent from all NBCs, so the odds ratio has a zero denominator and cannot be estimated as a finite value.
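The enrichment statistics in Table 1 come from a 2×2 comparison of chemotype presence in BCs versus NBCs. As a minimal sketch of that kind of test, the function below computes a one-sided (enrichment) Fisher's exact p-value from the hypergeometric distribution and the sample cross-product odds ratio, returning infinity when the chemotype is absent from all NBCs. The one-sided alternative and the function name are assumptions; some software (e.g., R's `fisher.test`) reports a conditional maximum likelihood odds ratio instead, so this sketch is not expected to reproduce the Table 1 values exactly.

```python
from math import comb, inf, nan


def enrichment_test(a: int, b: int, c: int, d: int):
    """One-sided Fisher's exact test for enrichment of a chemotype in BCs.

    2x2 counts:  a = BCs with the chemotype,     b = NBCs with the chemotype
                 c = BCs without the chemotype,  d = NBCs without the chemotype
    Returns (sample odds ratio, upper-tail p-value).
    """
    n_bc = a + c           # total BCs
    n_feat = a + b         # chemicals carrying the chemotype
    total = a + b + c + d
    # Hypergeometric null: P(X >= a) BCs among the n_feat chemotype carriers
    hits = sum(
        comb(n_bc, x) * comb(total - n_bc, n_feat - x)
        for x in range(a, min(n_bc, n_feat) + 1)
    )
    p_value = hits / comb(total, n_feat)
    # Cross-product odds ratio; infinite when the denominator is zero
    if b * c == 0:
        odds = inf if a * d > 0 else nan
    else:
        odds = (a * d) / (b * c)
    return odds, p_value


# 9 of 78 BCs carry the chemotype (as for the steroid ring in Table 1);
# the NBC count of 1 out of 393 is hypothetical
print(enrichment_test(9, 1, 69, 392))
```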
Physicochemical Property Similarity Results
Physicochemical property data were used to compare property similarities between individual chemical pairs within clusters of chemicals likely to co-occur due to chemical use patterns. A total of 4,251 chemicals (78 BCs, 380 NBCs, and 3,793 UCs) had physicochemical data available (Table S9 available at [41]). Correlations of physicochemical property profiles between chemical pairs varied widely: some pairs were highly correlated, while others were not. To illustrate the resulting distributions, all correlation values across UC-BC pairs in each cluster were combined and visualized (Figure S3), showing a wide range of physicochemical property similarity between UC-BC pairs. These findings informed the chemical ranking, in which UCs with physicochemical properties highly correlated with those of BCs ranked highly in the overall mixtures-based prioritization, as detailed in the next section.
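Table 2 reports these pairwise similarities as Spearman rank correlations. The sketch below, a minimal plain-Python illustration, computes Spearman's rho between two chemicals' property profiles as the Pearson correlation of tie-averaged ranks. The property values are hypothetical, and the p-values shown in Table 2 are not reproduced here.

```python
def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list, y: list) -> float:
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length profiles with >= 2 properties")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    var_x = sum((p - mx) ** 2 for p in rx)
    var_y = sum((q - my) ** 2 for q in ry)
    return cov / (var_x * var_y) ** 0.5  # undefined if a profile is constant


# Hypothetical property profiles (same property order for both chemicals)
uc_profile = [2.1, 310.0, 0.004, 1.5, 88.0]
bc_profile = [3.0, 295.0, 0.010, 1.2, 120.0]
print(spearman(uc_profile, bc_profile))
```

Because raw properties span very different scales, the ranks of unstandardized values largely track which property is being measured; in the toy example above, the two profiles receive identical ranks. Whether properties were standardized before correlation is not stated in this section, so treat that preprocessing choice as open.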
Chemical Prioritization Results for Mixtures Evaluation in Breast Cancer Studies
UCs were characterized in this study to yield a list of top-ranking chemicals to test individually and in combination with co-occurring BCs in mixtures-based cancer evaluations. Chemicals were first prioritized by cluster, that is, by groups of chemicals that likely co-occur in the environment due to similar chemical use patterns. Clusters of interest were selected by applying two quantitative filters. First, a cluster was required to include at least one UC that was highly structurally similar to BCs, defined as containing at least four structural features statistically enriched within BCs. Second, a cluster was required to include at least one UC that was physicochemically similar to a co-occurring BC, defined as having physicochemical properties correlated at R ≥ 0.8 with a BC in the same cluster. These two filters were applied to prioritize UCs that could be involved in breast cancer initiation/propagation pathways similar to those of BCs and that may display similar fate and transport properties within the environment. This filtering strategy yielded ten chemical clusters of interest for further evaluation: clusters 1, 2, 3, 4, 5, 6, 7, 8, 9, and 11.
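The two cluster-level filters can be expressed as a short check over per-UC summaries. The sketch below is a hypothetical illustration: the `UCRecord` fields and UC identifiers are invented for the example, and, following the wording above, each filter only requires that some UC in the cluster pass it, not necessarily the same UC. The later filter on breadth of exposure source categories is not shown.

```python
from dataclasses import dataclass


@dataclass
class UCRecord:
    """Hypothetical per-UC summary used for cluster filtering."""
    dtxsid: str
    cluster: int
    n_enriched: int      # number of the 26 BC-enriched chemotypes present
    max_r_to_bc: float   # highest property correlation with any BC in its cluster


def clusters_of_interest(records, min_enriched=4, min_r=0.8):
    """Clusters with >= 1 structurally similar UC and >= 1 physicochemically similar UC."""
    by_cluster = {}
    for rec in records:
        by_cluster.setdefault(rec.cluster, []).append(rec)
    keep = []
    for cluster, members in sorted(by_cluster.items()):
        structural = any(m.n_enriched >= min_enriched for m in members)
        physchem = any(m.max_r_to_bc >= min_r for m in members)
        if structural and physchem:
            keep.append(cluster)
    return keep


records = [
    UCRecord("UC-A", 1, 5, 0.50),  # passes the structural filter only
    UCRecord("UC-B", 1, 2, 0.90),  # passes the physicochemical filter only
    UCRecord("UC-C", 2, 4, 0.70),  # cluster 2 lacks a UC with r >= 0.8
    UCRecord("UC-D", 3, 1, 0.95),  # cluster 3 lacks a UC with >= 4 features
]
print(clusters_of_interest(records))
```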
These ten chemical clusters were then investigated further, with a focus on those that included chemicals present in several exposure source categories relevant to the environment. This final prioritization filter yielded five top-ranking chemical clusters: clusters 1, 4, 5, 6 and 9 (Figure 3). UCs within these clusters were then ranked by overall score (OS), a combination of physicochemical and structural similarity to chemicals associated with breast cancer. Because these scores were produced by comparing chemicals within clusters, they should be interpreted on a per-cluster basis rather than compared across clusters. The 50 top-ranking UCs are summarized in Table 2 and Figure 4, with all results detailed in Table S10 (available at [41]). Notable data trends per cluster are summarized below:
Table 2. Top 50 ranking UCs, ten from each of the five prioritized chemical clusters, representing groups of chemicals that likely co-occur in the environment due to common chemical use patterns.
Chemicals are arranged, per cluster, by overall score, which is the sum of the physicochemical similarity score (similarity to the listed BC) and the structural similarity score (based on BC-enriched structural features). Each UC is listed alongside the BC in the same cluster with the most similar physicochemical properties (i.e., the most highly correlated BC).
| UC DTXSID | UC Chemical Name | Highest Correlated BC DTXSID | Highest Correlated BC Chemical Name | Spearman Rank Correlation | Spearman Rank p-value | Physicochemical Similarity Score | Total Enriched Features | Structural Similarity Score | Overall Score |
|---|---|---|---|---|---|---|---|---|---|
| Cluster 1 | |||||||||
| DTXSID90889705 | Methyl Blue | DTXSID7021441 | Benzyl Violet 4B | 0.993 | <0.001 | 0.992 | 5 | 0.714 | 1.706 |
| DTXSID2020189 | FD&C Blue No. 1 | DTXSID7021441 | Benzyl Violet 4B | 1 | <0.001 | 1 | 4 | 0.571 | 1.571 |
| DTXSID4034310 | C.I. Acid Blue 9 | DTXSID7021441 | Benzyl Violet 4B | 1 | <0.001 | 1 | 4 | 0.571 | 1.571 |
| DTXSID601015325 | C.I. Acid Blue 9, aluminum salt (3:2) | DTXSID7021441 | Benzyl Violet 4B | 1 | <0.001 | 1 | 4 | 0.571 | 1.571 |
| DTXSID10925937 | Sodium 4-[[4-(diethylamino)phenyl][4-(diethyliminio)cyclohexa-2,5-dien-1-ylidene]methyl]naphthalene-2,7-disulfonate | DTXSID7021441 | Benzyl Violet 4B | 0.993 | <0.001 | 0.992 | 4 | 0.571 | 1.564 |
| DTXSID3020673 | FD&C Green No. 3 | DTXSID7021441 | Benzyl Violet 4B | 0.993 | <0.001 | 0.992 | 4 | 0.571 | 1.564 |
| DTXSID3026065 | Sulfan blue | DTXSID7021441 | Benzyl Violet 4B | 0.979 | <0.001 | 0.977 | 4 | 0.571 | 1.548 |
| DTXSID0029264 | C.I. Fluorescent brightening agent 28 | DTXSID1020069 | 2-Amino-5-azotoluene | 0.594 | 4.58E-02 | 0.547 | 7 | 1 | 1.547 |
| DTXSID2027757 | Disodium 4,4’-bis-(2-sulfostyryl)biphenyl | DTXSID1020069 | 2-Amino-5-azotoluene | 0.594 | 4.58E-02 | 0.547 | 7 | 1 | 1.547 |
| DTXSID5038888 | Basic Blue 7 | DTXSID7021441 | Benzyl Violet 4B | 0.545 | 7.07E-02 | 0.492 | 7 | 1 | 1.492 |
| Cluster 4 | |||||||||
| DTXSID40974175 | Acid green 22 | DTXSID4022367 | Estrone | 0.671 | 2.04E-02 | 0.801 | 4 | 0.8 | 1.601 |
| DTXSID1036541 | Pregnenolone | DTXSID3022370 | Progesterone | 0.972 | <0.001 | 1 | 3 | 0.6 | 1.6 |
| DTXSID00873918 | Acid Blue 3 | DTXSID4022367 | Estrone | 0.629 | 3.24E-02 | 0.773 | 4 | 0.8 | 1.573 |
| DTXSID80192119 | 4-(3-((9,10-Dihydro-4-hydroxy-9,10-dioxo-1-anthracenyl)amino)propyl)-4-methylmorpholinium methyl sulfate | DTXSID0020573 | 17beta-Estradiol | 0.629 | 3.24E-02 | 0.773 | 4 | 0.8 | 1.573 |
| DTXSID2021236 | HC Red 3 | DTXSID0020573 | 17beta-Estradiol | 0.035 | 9.21E-01 | 0.38 | 5 | 1 | 1.38 |
| DTXSID90179613 | 2-(4-Amino-3-nitroanilino)ethanol | DTXSID0020573 | 17beta-Estradiol | 0.035 | 9.21E-01 | 0.38 | 5 | 1 | 1.38 |
| DTXSID9044532 | D&C Blue No. 9 | DTXSID0020573 | 17beta-Estradiol | 0.846 | <0.001 | 0.917 | 2 | 0.4 | 1.317 |
| DTXSID60874042 | Basic Blue 99 | DTXSID4022367 | Estrone | 0.238 | 4.57E-01 | 0.514 | 4 | 0.8 | 1.314 |
| DTXSID90885262 | 2-Propen-1-one, 1-[4-[[6-O-(6-deoxy-.alpha.-L-mannopyranosyl)-.beta.-D-glucopyranosyl]oxy]-2-hydroxy-6-methoxyphenyl]-3-(3-hydroxy-4-methoxyphenyl)-, (2E)- | DTXSID4022367 | Estrone | 0.469 | 1.27E-01 | 0.667 | 3 | 0.6 | 1.267 |
| DTXSID50868556 | 2-Nitro-5-glyceryl methylaniline | DTXSID0020573; DTXSID4022367 | 17beta-Estradiol; Estrone | 0.154 | 6.35E-01 | 0.458 | 4 | 0.8 | 1.258 |
| Cluster 5 | |||||||||
| DTXSID8037594 | Secbumeton | DTXSID1023869 | Ametryn | 1 | <0.001 | 1 | 6 | 0.857 | 1.857 |
| DTXSID90869542 | Ethametsulfuron | DTXSID9034868; DTXSID8024101 | Prosulfuron; Tribenuron-methyl | 0.993 | <0.001 | 0.991 | 6 | 0.857 | 1.848 |
| DTXSID1022057 | 1,3-Dichloropropene | DTXSID0020448 | 1,2-Dichloropropane | 0.979 | <0.001 | 0.972 | 6 | 0.857 | 1.829 |
| DTXSID1032305 | (Z)-Dichloropropene | DTXSID0020448 | 1,2-Dichloropropane | 0.979 | <0.001 | 0.972 | 6 | 0.857 | 1.829 |
| DTXSID2040286 | Methoprotryne | DTXSID1023869 | Ametryn | 0.972 | <0.001 | 0.963 | 6 | 0.857 | 1.82 |
| DTXSID3024318 | Terbutryn | DTXSID8034586 | Etoxazole | 0.972 | <0.001 | 0.963 | 6 | 0.857 | 1.82 |
| DTXSID2042488 | Trietazine | DTXSID8034586 | Etoxazole | 0.951 | <0.001 | 0.935 | 6 | 0.857 | 1.792 |
| DTXSID3032416 | Cybutryne | DTXSID8034586; DTXSID1023869 | Etoxazole; Ametryn | 0.951 | <0.001 | 0.935 | 6 | 0.857 | 1.792 |
| DTXSID3041615 | Aziprotryne | DTXSID8024238 | Oryzalin | 0.937 | <0.001 | 0.917 | 6 | 0.857 | 1.774 |
| DTXSID0058223 | Indaziflam | DTXSID4021218 | Quercetin | 0.797 | 3.16E-03 | 0.731 | 7 | 1 | 1.731 |
| Cluster 6 | |||||||||
| DTXSID5037494 | Deethylatrazine | DTXSID9020112 | Atrazine | 0.888 | <0.001 | 0.886 | 7 | 1 | 1.886 |
| DTXSID0037495 | Deisopropylatrazine | DTXSID9020112 | Atrazine | 0.867 | <0.001 | 0.862 | 7 | 1 | 1.862 |
| DTXSID1023990 | Cyanazine | DTXSID0032520 | Azoxystrobin | 0.832 | 1.44E-03 | 0.821 | 6 | 0.857 | 1.678 |
| DTXSID1037806 | 6-Chloro-1,3,5-triazine-2,4-diamine | DTXSID4021268 | Simazine | 0.979 | <0.001 | 0.992 | 4 | 0.571 | 1.563 |
| DTXSID30886374 | Cyclopropanecarboxylic acid, 3-(2,2-dichloroethenyl)-2,2-dimethyl-, methyl ester, (1R,3R)-rel- | DTXSID3020413 | 1,2-Dibromo-3-chloropropane | 0.965 | <0.001 | 0.976 | 4 | 0.571 | 1.547 |
| DTXSID90886375 | Cyclopropanecarboxylic acid, 3-(2,2-dichloroethenyl)-2,2-dimethyl-, methyl ester, (1R,3S)-rel- | DTXSID3020413 | 1,2-Dibromo-3-chloropropane | 0.965 | <0.001 | 0.976 | 4 | 0.571 | 1.547 |
| DTXSID5024344 | Tri-allate | DTXSID3020413 | 1,2-Dibromo-3-chloropropane | 0.804 | 2.75E-03 | 0.789 | 4 | 0.571 | 1.36 |
| DTXSID001017911 | Desulfinylfipronil amide | DTXSID0020446 | Diuron | 0.972 | <0.001 | 0.984 | 2 | 0.286 | 1.269 |
| DTXSID1024124 | Thifensulfuron methyl | DTXSID0024345 | Triasulfuron | 0.958 | <0.001 | 0.967 | 2 | 0.286 | 1.253 |
| DTXSID7027833 | 2-Ethyl-6-methylaniline | DTXSID5020449 | Dichlorvos | 0.951 | <0.001 | 0.959 | 2 | 0.286 | 1.245 |
| Cluster 9 | |||||||||
| DTXSID7052529 | 2-Methoxy-4-vinylphenol | DTXSID7022413 | Isoeugenol | 0.832 | 1.44E-03 | 0.868 | 4 | 1 | 1.868 |
| DTXSID5021625 | p-Isopropenylacetophenone | DTXSID7022413 | Isoeugenol | 0.944 | <0.001 | 0.96 | 3 | 0.75 | 1.71 |
| DTXSID40110056 | Cinnamic acid | DTXSID7022413 | Isoeugenol | 0.769 | 5.25E-03 | 0.816 | 3 | 0.75 | 1.566 |
| DTXSID7047647 | 3-Phenyl-2-propen-1-yl 3-phenylacrylate | DTXSID7022413 | Isoeugenol | 0.755 | 6.60E-03 | 0.805 | 3 | 0.75 | 1.555 |
| DTXSID4025587 | 2-Methylcinnamicaldehyde | DTXSID7022413 | Isoeugenol | 0.748 | 7.35E-03 | 0.799 | 3 | 0.75 | 1.549 |
| DTXSID201016569 | 3,5-Dimethoxy-4-hydroxycinnamaldehyde | DTXSID7022413 | Isoeugenol | 0.734 | 9.05E-03 | 0.787 | 3 | 0.75 | 1.537 |
| DTXSID3052143 | dl-Borneol | DTXSID7022413 | Isoeugenol | 0.979 | <0.001 | 0.989 | 2 | 0.5 | 1.489 |
| DTXSID40905045 | (+)-trans-4-Thujanol | DTXSID7022413 | Isoeugenol | 0.979 | <0.001 | 0.989 | 2 | 0.5 | 1.489 |
| DTXSID9021841 | N-Methylaniline | DTXSID7022413 | Isoeugenol | 0.364 | 2.46E-01 | 0.483 | 4 | 1 | 1.483 |
| DTXSID9027520 | Hexa(methoxymethyl)melamine | DTXSID7022413 | Isoeugenol | 0.615 | 3.73E-02 | 0.69 | 3 | 0.75 | 1.44 |
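The values in Table 2 appear consistent with a simple additive score: the structural similarity score equals a UC's enriched-feature count divided by the largest count among that cluster's UCs (e.g., 5/7 ≈ 0.714 for methyl blue in cluster 1), and the overall score equals the physicochemical similarity score plus the structural similarity score. The physicochemical similarity scores also look consistent with min-max rescaling of each UC's highest BC correlation within its cluster, but that step is inferred from the table rather than documented here, so `physchem_score` below is an assumption, as are all function names and the toy cluster.

```python
def structural_score(n_enriched: int, max_in_cluster: int) -> float:
    """Enriched-feature count scaled by the cluster maximum (inferred from Table 2)."""
    return n_enriched / max_in_cluster if max_in_cluster else 0.0


def physchem_score(r: float, r_min: float, r_max: float) -> float:
    """Min-max rescaled correlation within a cluster (assumed, not documented)."""
    return (r - r_min) / (r_max - r_min) if r_max > r_min else 1.0


def rank_cluster(ucs, top_n=10):
    """Rank (name, r_to_best_bc, n_enriched) tuples from ONE cluster by overall score."""
    r_values = [r for _, r, _ in ucs]
    r_min, r_max = min(r_values), max(r_values)
    max_feat = max(n for _, _, n in ucs)
    scored = [
        (name, physchem_score(r, r_min, r_max) + structural_score(n, max_feat))
        for name, r, n in ucs
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Worked check against Table 2 (methyl blue, cluster 1): 5 of a maximum 7
# enriched features, plus the listed physicochemical similarity score of 0.992
print(round(0.992 + structural_score(5, 7), 3))

# Hypothetical cluster of three UCs
print(rank_cluster([("X", 1.0, 1), ("Y", 0.5, 4), ("Z", 0.0, 2)]))
```

Because both component scores are rescaled within a cluster, an overall score of, say, 1.7 in cluster 5 is not directly comparable to 1.7 in cluster 9, which matches the per-cluster interpretation stated above.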
Figure 4.

The ten UCs with the highest overall scores in each of clusters 1, 4, 5, 6 and 9, along with the most similar BCs (based on physicochemical property correlations) in each respective cluster.
Cluster 1 results summary:
Cluster 1 contained the largest number of chemicals, including inert ingredients, pesticides, and chemicals labelled for nonfood use, although some cluster 1 chemicals have been detected in food or food contact substances. Notable top-ranking chemicals in this cluster included dyes, such as C.I. acid blue 9 and methyl blue, which co-occur with the known BC benzyl violet 4B. Benzyl violet 4B was used as a food and cosmetics additive in the U.S. until its delisting by the U.S. Food and Drug Administration in 1977 [42], though it may still be used in other countries. Regardless, the potential co-occurrence of similar dyes, or of one of these dyes with other chemicals of similar properties, within the same exposure sources remains a concern. The other primary co-occurring BC in this cluster was 2-amino-5-azotoluene.
Cluster 4 results summary:
Cluster 4 contained chemicals found in personal care products, children’s products and toys, and other product-relevant categories, and included three BCs used as pharmaceuticals. UCs in this cluster co-occurred with the BCs 17β-estradiol (also known as estradiol), estrone, and progesterone, which have notable associations with breast cancer in humans [7,8,43]. Noteworthy UCs in this cluster included 2-nitro-5-glyceryl methylaniline, 4-(3-((9,10-dihydro-4-hydroxy-9,10-dioxo-1-anthracenyl)amino)propyl)-4-methylmorpholinium methyl sulfate, acid green 22, and HC red 3. Exposure source categories observed in cluster 4 also included dyes, such as those used in hair products and cosmetics.
Cluster 5 results summary:
Cluster 5 contained chemicals with exposure sources related to pesticides, food, and drinking water. Top-ranking chemicals in this cluster contained many structural features that were enriched among BCs: all ten contained at least six such features, compared with a maximum of eight in any UC. In addition, Spearman correlations with the most similar BC ranged from 0.797 to 1, and nine of the ten top-ranking UCs had correlations above 0.9, indicating high physicochemical similarity to BCs. This cluster also contained a relatively high number of chemicals associated with breast cancer, with UCs co-occurring with seven unique BCs: 1,2-dichloropropane, ametryn, etoxazole, oryzalin, prosulfuron, quercetin, and tribenuron-methyl. Top-ranking chemicals in this cluster included 1,3-dichloropropene, ethametsulfuron, methoprotryne, secbumeton, and (Z)-dichloropropene. 1,3-Dichloropropene is of particular interest, as Cardona and Rudel recently identified it as one of 35 pesticides of concern for effects on the mammary gland [44]. That work reviewed EPA pesticide Reregistration Eligibility Decisions, examined whether mammary tumors were considered in the corresponding carcinogenicity classifications, and evaluated mechanistic data for biological activity relevant to in vivo outcomes. The BCs ametryn and oryzalin were also among these 35 pesticides.
Cluster 6 results summary:
Cluster 6 showed patterns similar to cluster 5, containing chemicals with exposure sources related to pesticides, food, and drinking water, as well as groundwater and surface water. Top-ranking UCs in cluster 6, including cyanazine, deethylatrazine, and deisopropylatrazine, co-occurred with seven BCs: 1,2-dibromo-3-chloropropane, atrazine, azoxystrobin, dichlorvos, diuron, simazine, and triasulfuron. Of these, atrazine, dichlorvos, diuron, and simazine were notably among the 35 pesticides of concern identified by Cardona and Rudel [44]. The number of enriched structural features among top-ranking chemicals in the cluster ranged from two to seven, and physicochemical similarity scores ranged from 0.789 to 0.992.
Cluster 9 results summary:
Cluster 9 contained chemicals from a few exposure source categories, particularly those relevant to consumer products and to household product use in the indoor environment. UCs in this cluster co-occurred with the BC isoeugenol. One example is 2-methoxy-4-vinylphenol, which, like other top-ranking chemicals in the cluster, may be used as a flavoring or fragrance additive.
DISCUSSION
This study aimed to characterize combinations of chemicals that likely occur in our everyday environment and that may affect breast cancer risk. We used informatics-based approaches to identify understudied chemicals that may co-occur with chemicals associated with cancer risk. Clustering-based analyses of the 6,793 chemicals with use information (78 BCs; 409 NBCs; 6,306 UCs) yielded 19 clusters of chemicals representing likely co-exposure patterns in humans. These results are of high translational relevance, as humans are commonly exposed to multiple chemicals and other stressors in their everyday environments [9]. Our finding that understudied chemicals in the environment share property similarities with cancer-associated chemicals is novel, and is particularly impactful when chemical combinations are prioritized based on likely co-occurrence in everyday exposure scenarios. We further identified 50 top-ranking chemicals in five clusters of environmental relevance, ranked based on co-occurrence patterns and on structural and physicochemical property similarities to BCs as indicators of potential carcinogenicity. Findings from this study yielded a novel list of understudied chemicals that warrant further testing through toxicological and human biomonitoring studies, both individually and in combination with co-occurring BCs in the context of environmental mixtures.
One of the prioritized chemical clusters (cluster 4) contained chemicals present in personal care products and children’s products and toys that co-occur with 17β-estradiol (also known as estradiol), estrone, and progesterone. Estradiol, estrone, and progesterone are major steroid hormones, produced largely by the ovaries in females, that regulate female reproductive cycling as well as a variety of cellular processes and functions [43,45,46]. Alterations in the levels of these hormones and associated changes in estrogen and progesterone receptor pathways have known implications in breast cancer etiology and often serve as targets of therapeutic intervention [46,47]. The observed exposure patterns in cluster 4 were interesting for two reasons. First, these chemicals are largely dyes, including dyes used in hair products and cosmetics (e.g., 4-(3-((9,10-dihydro-4-hydroxy-9,10-dioxo-1-anthracenyl)amino)propyl)-4-methylmorpholinium methyl sulfate, acid green 22, and HC red 3). The potential co-occurrence of these chemicals in such personal care products is concerning, as subsets of the population likely experience combined exposures to them. If these chemicals induce similar toxicological changes (e.g., changes in estrogen or progesterone receptor-related signaling), their combined exposures may alter associated disease risk. Second, the UCs in this cluster co-occurred with estradiol, estrone, and progesterone mainly within the ‘personal care’ exposure source category. However, these data suggest that these UCs may also be present in other exposure sources, particularly drinking water, surface water, and wastewater, that may not yet have been evaluated for these chemicals. Future biomonitoring efforts could examine whether these understudied chemicals are present in other exposure media.
Two separate priority chemical clusters (clusters 5 and 6) included chemicals with exposure sources of pesticides, food, and water. The exposure patterns observed in these clusters were interesting for three reasons. First, each cluster contains a high proportion of BCs, highlighting chemicals that already show an association with breast cancer and co-occur across environmental exposure sources. In cluster 5, chemicals that induced cancer-related changes in mammary tissue in animals and/or were associated with breast cancer in humans included ametryn, prosulfuron, and tribenuron-methyl; in cluster 6, these included atrazine, azoxystrobin, and simazine. These chemicals in themselves warrant further investigation as potential contributors to human disease through cumulative exposure. Second, the top-ranking UCs within these clusters had many structural features that were statistically enriched in BCs and showed distinctly high physicochemical correlations with BCs in the cluster. Chemicals with high structural and physicochemical similarity included 1,3-dichloropropene, ethametsulfuron, and secbumeton (cluster 5) and cyanazine, deethylatrazine, and deisopropylatrazine (cluster 6). Third, these chemicals showed co-occurrence patterns largely involving pesticides, food, and water, indicating that humans may experience combined exposures to them via ingestion. This shared exposure route may also amplify their combined effects, given that they may reach the same initial target tissues.
Exposure sources for the remaining two priority clusters spanned pesticide- and food-related categories (cluster 1) and indoor environment and household categories (cluster 9). Cluster 9 contained chemicals that may co-occur via multiple exposure routes; however, more data are needed on the presence of these chemicals in specific exposure scenarios and environmental media to develop testable hypotheses for future studies. Chemicals in cluster 1 included dyes (C.I. acid blue 9 and methyl blue), and chemicals in cluster 9 included flavorants and fragrances (e.g., 2-methoxy-4-vinylphenol), so such chemicals could be present in multiple types of products within household environments.
This study advances knowledge of exposure patterns relevant to environmental mixtures and their potential relevance to breast cancer, though future research could extend this line of investigation. The focus here was on identifying chemical co-occurrence based on use patterns alone, without explicit consideration of fate and transport or exposure mechanisms. Although some chemicals may co-occur with BCs in various exposure sources, their properties may make them more or less likely to result in actual exposure. These considerations were partly addressed here through prioritization based on property similarity. Future studies could expand on these findings by leveraging additional exposure-relevant resources, such as biological or environmental monitoring data (e.g., the National Health and Nutrition Examination Survey [NHANES] [48] or the National Water Information System [49]), and exposure models that incorporate mechanistic information (e.g., the consumer and ambient models included in the Systematic Empirical Evaluation of Models [SEEM] framework) [50].
In this work, the Mammary Carcinogens Review Database and ToxRefDB were used to identify chemicals with associations to breast cancer. These databases consist of findings from both human and animal studies, with the broad assumption that rodent models of mammary gland carcinogenesis are indicative of human breast carcinogenesis. This assumption also contributed to the identification of potential NBCs, the comparison set needed to determine which chemical structures were enriched among BCs. Because breast cancer is a multi-etiological disease, identifying chemicals associated with increased rates of breast cancer is challenging, given the lack of comprehensive datasets and of consensus supporting breast cancer classifications across the wide landscape of chemicals. As data on this topic continue to expand, these classifications will continue to improve. For instance, the final list of NBCs included chemicals without evidence of causing mammary gland tumors in animal models and chemicals that were not present in the Mammary Carcinogens Review Database. This list therefore represents chemicals that currently lack a known association with breast cancer (and thus have lower priority in terms of breast cancer concern), though these associations may change as studies continue to develop. Additional health databases could be queried, and more exhaustive literature reviews and/or text mining approaches applied [51], to inform the delineation of BCs, NBCs, and UCs. Other informatics approaches could also be leveraged as these efforts expand, including frequent itemset mining [52] and machine learning/predictive modeling approaches [35,53,54].
Additionally, leveraging the increasing chemical annotation in resources such as PubMed could expand upon this study’s findings; for instance, normalized pointwise mutual information approaches could be extended to find chemical:gene:disease associations in the published literature [55]. Using animal model data as an indicator of putative breast cancer risk avoids imputing relationships from published chemical:gene and gene:disease associations; however, as data on and confidence in these associations increase, they could revolutionize how the risk of exposure to mixtures is evaluated.
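For readers unfamiliar with normalized pointwise mutual information, the standard definition is NPMI(x, y) = ln[p(x, y) / (p(x) p(y))] / (-ln p(x, y)), which scales pointwise mutual information to the range [-1, 1]. The sketch below computes it from document co-occurrence counts, for example the number of abstracts mentioning a chemical, a gene, or both. How reference [55] estimates these probabilities is not described in this section; the counts and function name are hypothetical.

```python
from math import log


def npmi(n_xy: int, n_x: int, n_y: int, n_docs: int) -> float:
    """Normalized pointwise mutual information from document co-occurrence counts.

    n_xy: documents mentioning both terms (e.g., a chemical and a gene)
    n_x, n_y: documents mentioning each term; n_docs: corpus size.
    Returns a value in [-1, 1]: -1 never together, 0 independent, 1 always together.
    """
    if n_xy == 0:
        return -1.0  # limiting value when the terms never co-occur
    p_xy = n_xy / n_docs
    if p_xy == 1.0:
        return 1.0  # both terms in every document; avoid dividing by -log(1) = 0
    p_x, p_y = n_x / n_docs, n_y / n_docs
    return log(p_xy / (p_x * p_y)) / -log(p_xy)


# Hypothetical counts from a 100-document corpus
print(npmi(10, 10, 10, 100))  # the two terms always appear together
print(npmi(1, 10, 10, 100))   # co-occurrence at the rate expected by chance
```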
In conclusion, this study set out to identify which understudied chemicals in our everyday environment co-occur with BCs according to chemical usage and also show structural and physicochemical similarities to BCs. The resulting chemicals are of high interest in designing future epidemiological and toxicological investigations of the effects of exposure to individual chemicals and mixtures. We specifically highlighted 50 top-ranking chemicals that remain understudied in relation to their putative breast cancer risk. These chemicals may warrant further investigation on their own and, when co-occurring with BCs, may represent high-priority mixtures in the environment with the potential to affect breast cancer risk. Although there is recent momentum surrounding the evaluation of chemical mixtures in the environment [56–58], it is imperative that this focus continue to expand across exposure and health science fields.
Supplementary Material
IMPACT STATEMENT.
Most environmental studies on breast cancer have focused on evaluating relationships between individual, well-known chemicals and breast cancer risk. This study set out to expand this research field by identifying understudied chemicals and mixtures that may occur in everyday environments due to their patterns of commercial use. Using in silico chemical database querying and analysis, we focused on chemicals that co-occur with chemicals associated with breast cancer. Particularly when understudied chemicals share physicochemical properties and structural features with these carcinogens, the resulting chemical mixtures represent conditions that should be examined in future clinical, epidemiological, and toxicological studies.
Acknowledgements
The research described in this manuscript has been reviewed by the Center for Computational Toxicology and Exposure, U.S. EPA, and approved for publication. Approval does not signify that contents necessarily reflect the views and policies of the agency, nor does the mention of trade names or commercial products constitute endorsement or recommendation for use. The authors would like to thank Drs. Peter Egeghy and Chris Corton for providing internal technical review of this manuscript.
Funding
This study was supported by the Institute for Environmental Health Solutions (IEHS) at the Gillings School of Global Public Health, RFA-18-01, ‘Identifying solutions that optimize the health of cancer survivors’, and through the National Institutes of Health (NIH) from the National Institute of Environmental Health Sciences, including grant funds (P42ES031007). Support was also provided by the Intramural Research Program of the Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina.
Footnotes
Conflict of interest
The authors declare no competing interests.
Data Availability
All data used for these analyses are publicly available through CPDat [16], ToxRefDB [19], or the CompTox Chemicals Dashboard [22]. Scripts associated with these analyses are publicly available through the Ragerlab GitHub repository [59]. Data that were combined and analyzed to generate results for this study are provided as supplemental material (Supplemental Tables S1–S10, provided online through the Ragerlab-Dataverse repository [41]).
REFERENCES
- 1. IBCERCC. Breast Cancer and the Environment: Prioritizing Prevention. Report of the Interagency Breast Cancer and Environmental Research Coordinating Committee (IBCERCC); 2013. [cited 2021 Jun 1]. Available from: https://www.niehs.nih.gov/about/assets/docs/ibcercc_full_508.pdf.
- 2. Campeau PM, Foulkes WD, Tischkowitz MD. Hereditary breast cancer: new genetic developments, new therapeutic avenues. Hum Genet. 2008;124(1):31–42.
- 3. Apostolou P, Fostira F. Hereditary breast cancer: the era of new susceptibility genes. Biomed Res Int. 2013;2013:747318.
- 4. IOM. Breast Cancer and the Environment: A Life Course Approach. The Institute of Medicine (IOM) of the National Academies; 2012. [cited 2021 Jun 1]. Available from: https://www.nap.edu/catalog/13263/breast-cancer-and-the-environment-a-life-course-approach.
- 5. WCRF. Diet, nutrition, physical activity and breast cancer. World Cancer Research Fund (WCRF); 2018. [cited 2021 Jun 1]. Available from: https://www.wcrf.org/wp-content/uploads/2021/02/Breast-cancer-report.pdf.
- 6. Hiatt RA, Brody JG. Environmental Determinants of Breast Cancer. Annu Rev Public Health. 2018;39:113–33.
- 7. Gray JM, Rasanayagam S, Engel C, Rizzo J. State of the evidence 2017: an update on the connection between breast cancer and the environment. Environ Health. 2017;16(1):94.
- 8. Rodgers KM, Udesky JO, Rudel RA, Brody JG. Environmental chemicals and breast cancer: An updated review of epidemiological literature informed by biological mechanisms. Environ Res. 2018;160:152–82.
- 9. Carlin DJ, Rider CV, Woychik R, Birnbaum LS. Unraveling the health effects of environmental mixtures: an NIEHS priority. Environ Health Perspect. 2013;121(1):A6–8.
- 10. Dionisio KL, Frame AM, Goldsmith MR, Wambaugh JF, Liddell A, Cathey T, et al. Exploring consumer exposure pathways and patterns of use for chemicals in the environment. Toxicol Rep. 2015;2:228–37.
- 11. Dionisio KL, Phillips K, Price PS, Grulke CM, Williams A, Biryol D, et al. The Chemical and Products Database, a resource for exposure-relevant data on chemicals in consumer products. Sci Data. 2018;5:180125.
- 12. Isaacs KK, Dionisio K, Phillips K, Bevington C, Egeghy P, Price PS. Establishing a system of consumer product use categories to support rapid modeling of human exposure. J Expo Sci Environ Epidemiol. 2020;30(1):171–83.
- 13. OECD. Internationally harmonised functional product and article use categories. ENV/JM/MONO(2017)14. Organisation for Economic Co-operation and Development (OECD); 2017.
- 14. Williams AJ, Grulke CM, Edwards J, McEachran AD, Mansouri K, Baker NC, et al. The CompTox Chemistry Dashboard: a community data resource for environmental chemistry. J Cheminform. 2017;9(1):61.
- 15. Grulke CM, Williams AJ, Thillanadarajah I, Richard AM. EPA’s DSSTox database: History of development of a curated chemistry resource supporting computational toxicology research. Comput Toxicol. 2019;12.
- 16. U.S. EPA. The Chemical and Products Database (CPDat) MySQL Data File. Version 2; 2020. [cited 2020 Sep 23]. Available from: https://epa.figshare.com/articles/dataset/The_Chemical_and_Products_Database_CPDat_MySQL_Data_File/5352997/2.
- 17. SSI. Mammary Carcinogens Review Database. Silent Spring Institute (SSI); 2021. [cited 2021 Nov 1]. Available from: http://sciencereview.silentspring.org/mamm_about.cfm.
- 18. Rudel RA, Attfield KR, Schifano JN, Brody JG. Chemicals causing mammary gland tumors in animals signal new directions for epidemiology, chemicals testing, and risk assessment for breast cancer prevention. Cancer. 2007;109(12 Suppl):2635–66.
- 19. Watford S, Ly Pham L, Wignall J, Shin R, Martin MT, Friedman KP. ToxRefDB version 2.0: Improved utility for predictive and retrospective toxicology analyses. Reprod Toxicol. 2019;89:145–58.
- 20. Harvey JB, Hong HH, Bhusari S, Ton TV, Wang Y, Foley JF, et al. F344/NTac Rats Chronically Exposed to Bromodichloroacetic Acid Develop Mammary Adenocarcinomas With Mixed Luminal/Basal Phenotype and Tgfbeta Dysregulation. Vet Pathol. 2016;53(1):170–81.
- 21. Dunnick JK, Elwell MR, Huff J, Barrett JC. Chemically induced mammary gland cancer in the National Toxicology Program’s carcinogenesis bioassay. Carcinogenesis. 1995;16(2):173–9.
- 22. U.S. EPA. CompTox Chemicals Dashboard Batch Search. Version 3.5; 2020. [cited 2021]. Available from: https://comptox.epa.gov/dashboard/dsstoxdb/batch_search.
- 23. Rager JE, Clark J, Eaves LA, Avula V, Niehoff NM, Kim YH, et al. Mixtures modeling identifies chemical inducers versus repressors of toxicity associated with wildfire smoke. Sci Total Environ. 2021;775:145759.
- 24. Klaren WD, Ring C, Harris MA, Thompson CM, Borghoff S, Sipes NS, et al. Identifying Attributes That Influence In Vitro-to-In Vivo Concordance by Comparing In Vitro Tox21 Bioactivity Versus In Vivo DrugMatrix Transcriptomic Responses Across 130 Chemicals. Toxicol Sci. 2019;167(1):157–71.
- 25. Rager JE, Suh M, Chappell GA, Thompson CM, Proctor DM. Review of transcriptomic responses to hexavalent chromium exposure in lung cells supports a role of epigenetic mediators in carcinogenesis. Toxicol Lett. 2019;305:40–50.
- 26. Phillips KA, Wambaugh JF, Grulke CM, Dionisio KL, Isaacs KK. High-throughput screening of chemicals as functional substitutes using structure-based classification models. Green Chem. 2017;19(4):1063–74.
- 27. Leydesdorff L. On the normalization and visualization of author co-citation data: Salton’s Cosine versus the Jaccard index. Journal of the American Society for Information Science & Technology. 2008;59(1):77–85.
- 28. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics. 1987;20:53–65.
- 29. Krishna S, Berridge B, Kleinstreuer N. High-Throughput Screening to Identify Chemical Cardiotoxic Potential. Chem Res Toxicol. 2021;34(2):566–83.
- 30. Lowe CN, Phillips KA, Favela KA, Yau AY, Wambaugh JF, Sobus JR, et al. Chemical Characterization of Recycled Consumer Products Using Suspect Screening Analysis. Environ Sci Technol. 2021;55(16):11375–87.
- 31. Beckers LM, Busch W, Krauss M, Schulze T, Brack W. Characterization and risk assessment of seasonal and weather dynamics in organic pollutant mixtures from discharge of a separate sewer system. Water Res. 2018;135:122–33.
- 32. Patlewicz G, Ball N, Booth ED, Hulzebos E, Zvinavashe E, Hennes C. Use of category approaches, read-across and (Q)SAR: general considerations. Regul Toxicol Pharmacol. 2013;67(1):1–12.
- 33. Yang C, Tarkhov A, Marusczyk J, Bienfait B, Gasteiger J, Kleinoeder T, et al. New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling. J Chem Inf Model. 2015;55(3):510–28.
- 34. Wang J, Hallinger DR, Murr AS, Buckalew AR, Lougee RR, Richard AM, et al. High-throughput screening and chemotype-enrichment analysis of ToxCast phase II chemicals evaluated for human sodium-iodide symporter (NIS) inhibition. Environ Int. 2019;126:377–86.
- 35. Ring C, Sipes NS, Hsieh JH, Carberry C, Koval LE, Klaren WD, et al. Predictive modeling of biological responses in the rat liver using in vitro Tox21 bioactivity: Benefits from high-throughput toxicokinetics. Comput Toxicol. 2021;18.
- 36. Zhang Z, Wang S, Li L. Emerging investigator series: the role of chemical properties in human exposure to environmental chemicals. Environ Sci Process Impacts. 2021.
- 37. Mansouri K, Grulke CM, Judson RS, Williams AJ. OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform. 2018;10(1):10.
- 38. Rager JE, Strynar MJ, Liang S, McMahen RL, Richard AM, Grulke CM, et al. Linking high resolution mass spectrometry data with exposure and toxicity forecasts to advance high-throughput environmental monitoring. Environ Int. 2016;88:269–80.
- 39. Auerbach S, Filer D, Reif D, Walker V, Holloway AC, Schlezinger J, et al. Prioritizing Environmental Chemicals for Obesity and Diabetes Outcomes Research: A Screening Approach Using ToxCast High-Throughput Data. Environ Health Perspect. 2016;124(8):1141–54.
- 40. Reif DM, Martin MT, Tan SW, Houck KA, Judson RS, Richard AM, et al. Endocrine profiling and prioritization of environmental chemicals using ToxCast data. Environ Health Perspect. 2010;118(12):1714–20.
- 41. Koval LE, Dionisio KL, Friedman KP, Isaacs KK, Rager JE. Dataset for Environmental Mixtures and Breast Cancer: Identifying Co-Exposure Patterns between Understudied vs Breast Cancer-Associated Chemicals using Chemical Inventory Informatics; 2022. [cited 2022 May 27]. Available from: https://doi.org/10.15139/S3/UMPCKW.
- 42. CFR. Code of Federal Regulations Title 21, Sect. 81.10 (1977).
- 43. Samavat H, Kurzer MS. Estrogen metabolism and breast cancer. Cancer Lett. 2015;356(2 Pt A):231–43.
- 44. Cardona B, Rudel RA. US EPA’s regulatory pesticide evaluations need clearer guidelines for considering mammary gland tumors and other mammary gland effects. Mol Cell Endocrinol. 2020;518:110927.
- 45. Stillwell W. An Introduction to Biological Membranes: Composition, Structure and Function. 2nd ed. Elsevier Science; 2016 Jun 30.
- 46. Trabert B, Sherman ME, Kannan N, Stanczyk FZ. Progesterone and Breast Cancer. Endocr Rev. 2020;41(2).
- 47. Kulkoyluoglu-Cotul E, Arca A, Madak-Erdogan Z. Crosstalk between Estrogen Signaling and Breast Cancer Metabolism. Trends Endocrinol Metab. 2019;30(1):25–38.
- 48. CDC. Fourth National Report on Human Exposure to Environmental Chemicals. 2021.
- 49. USGS. USGS Water Data for USA; 2021. Available from: https://waterdata.usgs.gov/nwis?
- 50. Ring CL, Arnot JA, Bennett DH, Egeghy PP, Fantke P, Huang L, et al. Consensus Modeling of Median Chemical Intake for the U.S. Population Based on Predictions of Exposure Pathways. Environ Sci Technol. 2019;53(2):719–32.
- 51. Baker N, Knudsen T, Williams A. Abstract Sifter: a comprehensive front-end system to PubMed. F1000Res. 2017;6.
- 52. Kapraun DF, Wambaugh JF, Ring CL, Tornero-Velez R, Setzer RW. A Method for Identifying Prevalent Chemical Combinations in the U.S. Population. Environ Health Perspect. 2017;125(8):087017.
- 53. Clark J, Avula V, Ring C, Eaves LA, Howard T, Santos HP, et al. Comparing the Predictivity of Human Placental Gene, microRNA, and CpG Methylation Signatures in Relation to Perinatal Outcomes. Toxicol Sci. 2021.
- 54. Wambaugh JF, Wang A, Dionisio KL, Frame A, Egeghy P, Judson R, et al. High throughput heuristics for prioritizing human exposure to environmental chemicals. Environ Sci Technol. 2014;48(21):12760–7.
- 55. Watford SM, Grashow RG, De La Rosa VY, Rudel RA, Friedman KP, Martin MT. Novel application of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene sets associated with disease: use case in breast carcinogenesis. Comput Toxicol. 2018;7:46–57.
- 56. Taylor KW, Joubert BR, Braun JM, Dilworth C, Gennings C, Hauser R, et al. Statistical Approaches for Assessing Health Effects of Environmental Chemical Mixtures in Epidemiology: Lessons from an Innovative Workshop. Environ Health Perspect. 2016;124(12):A227–A9.
- 57. Drakvik E, Altenburger R, Aoki Y, Backhaus T, Bahadori T, Barouki R, et al. Statement on advancing the assessment of chemical mixtures and their risks for human health and the environment. Environ Int. 2020;134:105267.
- 58. Rider CV, McHale CM, Webster TF, Lowe L, Goodson WH 3rd, La Merrill MA, et al. Using the Key Characteristics of Carcinogens to Develop Research on Chemical Mixtures and Cancer. Environ Health Perspect. 2021;129(3):35003.
- 59. Ragerlab. Ragerlab GitHub; 2021. [cited 2021]. Available from: https://github.com/Ragerlab.
- 60. ToxPrint. ToxPrint. Altamira LLC; 2021. [cited 2021 Aug 6]. Available from: https://toxprint.org.
… child lead exposure for the plaintiffs in a public nuisance case related to childhood lead poisoning. None of these activities were directly related to the present study. The other authors declare they have no actual or potential competing financial interests.