Abstract
Cheminformatics-based applications to predict transformation pathways of environmental contaminants are useful to quickly prioritize contaminants with potentially toxic/persistent products. Direct photolysis can be an important degradation pathway for sunlight-absorbing compounds in aquatic systems. In this study, we developed the first freely available direct phototransformation pathway predictive tool, which uses a rule-based reaction library. Journal publications studying diverse contaminants (such as pesticides, pharmaceuticals, and energetic compounds) were systematically compiled to encode 155 reaction schemes into the reaction library. The predictions of this tool were internally evaluated against 390 compounds from the compiled journal publications and externally evaluated against 138 compounds from regulatory reports. The recall (sensitivity) and precision (selectivity) were 0.62 and 0.35, respectively, for internal evaluation, and 0.56 and 0.20, respectively, for external evaluation, when only the products formed from the first reaction step were counted. This predictive tool could help to narrow the data gaps in chemical registration/evaluation and inform future experimental studies.
Introduction
Knowledge about environmental transformation products of organic contaminants is important for chemical risk assessment performed by regulatory agencies, research scientists, and chemical manufacturers. Facing the challenge of an increasing number of synthetic organic compounds being released into the environment with limited degradation data, a number of cheminformatics-based tools have been created to predict the transformation products to address this knowledge gap. These commercial or free tools include pathway prediction for microbial degradation (e.g., University of Minnesota/Eawag-BBD Pathway Prediction System/enviPath1, 2), mammalian metabolism (e.g. Meteor,3 BioTransformer4), and abiotic hydrolysis and reduction [Chemical Transformation Simulator (CTS)5].
Aquatic phototransformation is an important elimination process for synthetic organic contaminants with the possibility of forming more toxic or more persistent photoproducts.6, 7 However, this process has received minimal attention in regard to in silico product prediction. To date, the Meta-PC expert system from MultiCASE Inc. has provided the only available phototransformation product prediction program in environmentally relevant aquatic systems since 2001.8, 9 Although this expert system is a pioneer in photoproduct prediction, it was only available through the purchase of a software license (currently discontinued), and its photodegradation library does not seem to have been updated since its first release. During the last two decades, a growing number of publications have focused on identifying direct phototransformation products of emerging as well as traditional contaminants using more sensitive analytical techniques (Figure S1 in the supporting information (SI)), necessitating a new tool to capture the current process science. Besides Meta-PC, Zeneth is a commercially available software system from Lhasa Limited to predict forced degradation pathways of mainly pharmaceuticals, including degradation under the impact of light stress.10, 11 However, light stress tests are often conducted using pure drugs and/or in simple (organic) solutions/suspensions,12 limiting the software’s applicability in aquatic systems.
Therefore, the goal of this study is to build and evaluate the first freely available in silico predictive tool for direct photolysis of organic contaminants in sunlit aquatic systems. This tool will be made available through the CTS website developed by the United States Environmental Protection Agency (U.S. EPA) shortly after the publication of this paper.13 We chose to start with direct photolysis (the contaminant itself absorbs light leading to its degradation) over indirect photolysis (other water constituents absorb light to induce contaminant degradation) because of the more abundant literature on the former subject. We pursued a knowledge-based approach by building a reaction library composed of a set of reaction schemes (also called rules or transformations by other predictive tools), which define how a structural fragment within the input chemical is changed to the corresponding structural fragment within the product.14 Prediction performance of the reaction library was evaluated internally and externally to provide prediction confidence and directions for future improvement. We also applied the reaction library to a large compound set of environmentally relevant contaminants to investigate the importance of different reaction schemes.
Methods
Database collection.
Experimental data were collected as three separate databases for the development and/or evaluation of the direct photolysis library. The first database was comprehensively compiled from peer-reviewed journal publications that identified direct phototransformation products in environmentally relevant experimental settings. These environmental settings were defined by the authors to specify that irradiation wavelengths (λ) were within the solar spectrum (≥ 280 nm if multichromatic and ≥ 290 nm if monochromatic) and the solute was dissolved/suspended in buffered/unbuffered deionized water with ≤ 10% nonabsorbing organic co-solvent such as acetonitrile and methanol. Depending on the type of lab instrumentation used and the availability of standards, some papers report numerous products, while others report only major products. While we tried to capture as many products as possible in our database, for literature that reported more than five products, usually only major products (i.e., of higher concentrations or larger detector-specific peak areas), products formed by one to several known transformation steps, or clearly identified products (i.e., verified by chemical standards) were included in the database. Regioisomers formed from the same reaction scheme were logged only once unless clearly differentiated in the referenced paper. Products formed from chirality/double-bond stereoisomerization, dimerization, and primary hydrolysis were not included. The formation of inorganic products (such as carbon dioxide, sulfuric acid, and ammonia) was usually omitted. The above-presented database is referred to as DB-J-ENV. Some photolysis studies conducted under conditions that were not environmentally relevant were also reviewed to support reaction mechanism elucidation. However, these data, stored in a second database referred to as DB-J-SUP, were used only to constrain reactivity of a few reaction schemes and not to evaluate the reaction library.
Another source of information was selected from the European Food Safety Authority (EFSA) reports on pesticides according to similar criteria used for DB-J-ENV.15 The database extracted from EFSA reports is referred to as DB-EFSA-ENV. The acceptable EFSA reports include the results of experiments conducted with a xenon lamp without clear confirmation if a λ ≥ 290 nm filter was used. While these reports do not strictly meet our criteria for inclusion in the database, we deemed them acceptable for external performance evaluation. These reports should comply with EFSA’s general guidance to perform the study under sunlight-relevant conditions, and few significant differences in the reported products were found with or without the mention of a solar filter in DB-EFSA-ENV.
Development of the direct photolysis reaction library.
We created reaction schemes for the direct photolysis reaction library according to DB-J-ENV. First, each transformation linking a parent compound to a product was named based on the referenced literature or the authors’ chemical intuition. When a transformation in DB-J-ENV was a known single reaction step or a combination of a few fast consecutive steps, and had enough supporting knowledge (from DB-J-ENV and DB-J-SUP), it was manually encoded in the reaction library as a reaction scheme. Our identification of a known/clear mechanism is based on our literature searches and experience. Similar reaction schemes were combined if allowed by the encoding language; however, in some cases, minor differences in the chemical structure were associated with significant reactivity differences, necessitating the development of multiple reaction schemes to capture these differences. Because unknown mechanisms impede reaction scheme generalization, it was not our goal to ensure that all observed products were predicted. Instead, we favored the first few transformations and those with well-known/clear mechanisms. Figure S2 provides an example illustrating the selection of transformations to be encoded as reaction schemes in our direct photolysis reaction library. A balance between precision (as defined below) and scheme generalization was carefully sought for reaction scheme development. Transformations that have low precision because of a large number of potential reaction sites, such as “aromatic photohydroxylation”, in which an OH group replaces an H atom attached to an aromatic carbon, were not included. Self-sensitized reactions occur when the parent compound is degraded by reactive species (such as singlet oxygen, hydroxyl radicals, and excited triplet states of the parent compound) induced by the light absorption of the parent compound.
Although this type of reaction could not be strictly defined as direct photolysis, it is hard to distinguish self-sensitized reactions from direct photolysis with simple experiments, and self-sensitized reactions are more likely to happen when the compound concentration is high, as is the case in most laboratory studies. Therefore, self-sensitization reactions were included in our direct photolysis reaction library.
We used ChemAxon’s Reactor software (ChemAxon LLC, Budapest, Hungary) to encode reaction schemes in the direct photolysis reaction library and Metabolizer software to execute the prediction of photolysis transformation products. Details can be found in ChemAxon’s publication14 and our earlier paper on building a hydrolysis reaction library.5 Briefly, each reaction scheme consists of a reactant fragment, a reaction arrow, and a product fragment(s) linked to the reactant by atom mapping (see examples in Figure 1). Reactivity, selectivity, and exclusion rules can be written using ChemAxon’s Chemical Terms language to constrain the reaction scheme. The photolysis reaction library is currently unranked at this development stage and we execute the direct photolysis library in the exhaustive mode where all products are predicted without prioritization. Considering the need for ranking the library, we are currently exploring several approaches, which is the subject of our next publication.
Figure 1.
Reaction scheme examples from the direct photolysis reaction library. The number of reaction schemes in each category is given in parentheses after the category name. The notation, which uses ChemAxon’s Chemical Terms language, and the associated rules are detailed in the documentation file.
Performance evaluation.
We used the two environmentally relevant databases as evaluation sets: DB-J-ENV composed of 390 compounds that served as an internal evaluation set aimed at assessing the reproducibility of our library prediction against the experimental data and DB-EFSA-ENV composed of 138 compounds that served as an external evaluation set.
Library prediction performance up to a selected reaction generation (step) was evaluated by comparing predicted product structures with experimental data. Products were matched by comparing the unique SMILES strings16 generated with ChemAxon’s Molconverter command line program after removing double-bond stereo and chiral stereo information and converting nitro groups to the charged form using ChemAxon’s Standardizer software.
Two types of compound sets were first generated for the compounds in the evaluation sets: the products that are experimentally observed up to a certain reaction generation i (CO,i) and the products that are predicted up to generation i (CP,i). CO,all represents all experimentally observed products regardless of the generation. When a product is observed in different publications for the same parent compound, this product is only included once in CO,i (or CO,all). Using the language of set theory in mathematics, we can define the compound set of experimentally observed products that are correctly predicted (COP,i) as the intersection (denoted by ∩) of CO,i (or CO,all) and CP,i.
Therefore, three types of numbers were counted: the number of experimentally observed products that are correctly predicted [NOP,i = ∣COP,i∣, which equals true positive (TP)], the number of predicted products [NP,i = ∣CP,i∣, which equals TP plus false positive (FP)], and the number of experimentally observed products [NO,i = ∣CO,i∣ or NO,all = ∣CO,all ∣, which equals TP plus false negative (FN)]. The relationship of the three product counts is shown as a schematic diagram in Figure S3. Two measures, recall (also referred to as sensitivity by other papers) and precision (also referred to as selectivity by other papers) were calculated to evaluate the prediction performance as detailed below.
To understand how many products of all experimentally observed ones were correctly predicted after prediction up to a certain generation i, we conducted an overall comparison where CO,all was used as the dataset of observed products:
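$$C_{OP,i} = C_{O,\mathrm{all}} \cap C_{P,i} \tag{1}$$
$$\mathrm{recall}_i = \frac{N_{OP,i}}{N_{O,\mathrm{all}}} = \frac{|C_{OP,i}|}{|C_{O,\mathrm{all}}|} \tag{2}$$
$$\mathrm{precision}_i = \frac{N_{OP,i}}{N_{P,i}} = \frac{|C_{OP,i}|}{|C_{P,i}|} \tag{3}$$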
To understand how many products of all observed first-generation products were correctly predicted in first-generation prediction, we conducted first-generation comparison where CO,1 is used as the dataset of observed products:
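$$C_{OP,1} = C_{O,1} \cap C_{P,1} \tag{4}$$
$$\mathrm{recall}_1 = \frac{N_{OP,1}}{N_{O,1}} = \frac{|C_{OP,1}|}{|C_{O,1}|} \tag{5}$$
$$\mathrm{precision}_1 = \frac{N_{OP,1}}{N_{P,1}} = \frac{|C_{OP,1}|}{|C_{P,1}|} \tag{6}$$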
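The two comparisons can be illustrated with a minimal Python sketch based on set intersection. The product identifiers below are hypothetical placeholders standing in for standardized SMILES strings, and the function name `recall_precision` is an assumption made for illustration; it is not part of CTS or the ChemAxon software.

```python
def recall_precision(observed, predicted):
    """Return (N_OP, recall, precision) for two sets of standardized product identifiers."""
    matched = observed & predicted  # C_OP = C_O ∩ C_P
    n_op = len(matched)
    recall = n_op / len(observed) if observed else float("nan")
    precision = n_op / len(predicted) if predicted else float("nan")
    return n_op, recall, precision


# Hypothetical placeholders for standardized SMILES strings of one parent compound.
observed_gen1 = {"P1", "P2", "P3"}           # C_O,1
observed_all = observed_gen1 | {"P4", "P5"}  # C_O,all
predicted_gen1 = {"P1", "P2", "P6", "P7"}    # C_P,1

# Overall comparison: first-generation predictions against all observed products.
overall = recall_precision(observed_all, predicted_gen1)
# First-generation comparison: first-generation predictions against C_O,1 only.
first_gen = recall_precision(observed_gen1, predicted_gen1)

print(overall)
print(first_gen)
```

In this toy case the same two matched products give a lower recall in the overall comparison than in the first-generation comparison, because the denominator includes later-generation observed products that a one-generation prediction cannot reach.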
Two additional evaluations were made in addition to the above compound-specific evaluation. First, we investigated the performance of individual reaction schemes in the library to reveal how many chemicals were subjected to transformation by the reaction scheme and to help diagnose over-prediction or under-prediction of products by each reaction scheme. To calculate the recall and precision for each reaction scheme in the library, similar first-generation comparisons were conducted using data associated with the same reaction scheme. In this case, the counts NO,1, NP,1, and NOP,1 become the counts for parent compounds instead of products. The generation notation of “1” in the subscript is dropped for simplification because we performed only first-generation comparisons in the reaction-specific evaluation and “r” is added in the subscript to indicate that the evaluation is based on the reaction scheme, and thus the notation becomes NO_r, NP_r, and NOP_r. Second, when information about the concentration of a photoproduct was available, we also evaluated the recall for predicting a major photoproduct which is defined to be formed at levels of ≥ 10 % of the degraded parent concentration at least once during the time course of the photolysis experiment. Figure S4 provides a simple example for calculating the three evaluation counts based on the compound-specific performance, reaction-scheme-specific performance, and major-product performance.
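The major-product criterion can be sketched as follows. This is a minimal illustration that assumes parent and product concentrations are reported in the same (molar) units at matching time points; the function name, the time-course values, and the product labels are hypothetical and not taken from any study in the databases.

```python
def is_major_product(parent_initial, parent_series, product_series, threshold=0.10):
    """Return True if the product reaches >= threshold of the degraded parent
    concentration at least once during the time course."""
    for parent_t, product_t in zip(parent_series, product_series):
        degraded = parent_initial - parent_t  # parent lost up to this time point
        if degraded > 0 and product_t >= threshold * degraded:
            return True
    return False


# Hypothetical time course (same units throughout, e.g. micromolar).
parent_initial = 10.0
parent_series = [10.0, 8.0, 5.0, 2.0]
product_a = [0.0, 0.1, 0.6, 0.7]   # reaches 12% of degraded parent at the third point
product_b = [0.0, 0.05, 0.2, 0.3]  # never reaches 10% of degraded parent

print(is_major_product(parent_initial, parent_series, product_a))
print(is_major_product(parent_initial, parent_series, product_b))
```

Time points at which no parent has yet degraded are skipped, since the ratio is undefined there.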
Results and Discussion
Direct photolysis reaction schemes.
The direct photolysis reaction library was created based on the compiled environmentally relevant peer-reviewed journal publication database DB-J-ENV. DB-J-ENV covers a wide variety of organic chemicals (390 in total), including pesticides, industrial compounds, pharmaceuticals, personal care products, and munitions chemicals. Among these, pesticides and pharmaceuticals constitute the majority of the chemicals. The complete list of compound names and standardized SMILES in DB-J-ENV is provided in List S1. About 21% of these compounds were studied in more than one paper.
The current library consists of 155 reaction schemes. Based on the structural differences between the reactant and product fragments, the reaction schemes were loosely classified into 10 different categories: photorearrangement, photodissociation, photoelimination, photocyclization, photochemical ring contraction, photohydrolysis, photohydration, photooxidation, photoreduction, and secondary dark reaction (Table S1). One example from each category is displayed in Figure 1. Cleavage type reactions including photodissociation, photoelimination, and photohydrolysis represent about half of the reaction schemes. All the nondark reaction schemes capture the chromophore as best as feasible, but substituents not shown in the reactant fragment are sometimes responsible for enhanced light absorption in the solar range. The term “secondary” in the category of “secondary dark reaction” means that the associated product was formed by a dark process from intermediate photoproducts. The categorization was not meant for mechanism elucidation. For instance: photodissociation reactions can result from oxidation; photohydrolysis reactions do not necessarily require water as a reactant and can result from homolytic cleavage of the associated bond or even oxidation; and photooxidation reactions can be caused by transient reactive species generated by self-sensitization. Instead, the categorization was used to capture the structural change leading to a transformation product. For example, the photohydrolysis reaction scheme indicates that a reaction will generate products that resemble the traditional dark hydrolysis product. 
The underlying mechanisms of a direct photolytic reaction are much more complicated and involve light absorption to form an electronically excited state, formation of unstable intermediates such as a radical pair or an ion pair, and a series of spontaneous reactions of the intermediates to form stable/metastable products in the ground state.17 A list of all the reaction schemes with example compounds is provided in the documentation file.
A total of 136 of the 155 reaction schemes were created based on the observed first-generation transformation products in DB-J-ENV (Table S1, NO_r > 0 and thus NOP_r > 0). These reaction schemes are also referred to as nonsecondary reaction schemes. The counts of example compounds that were observed to undergo these reaction schemes (NO_r in Table S1) range from 1 to 55, with 52 schemes supported by more than two example compounds (NO_r ≥ 3). The remaining 19 reaction schemes, including all 8 secondary dark reactions and 11 other schemes, were created based on products beyond the first generation and were therefore excluded from any reaction-scheme-specific evaluation of first-generation processes.
Examples and internal evaluation of the direct photolysis reaction library.
In order to evaluate how well we integrated the current process science of direct photolysis described by DB-J-ENV into the reaction library, the direct photolysis products of every compound in DB-J-ENV were predicted using the library up to the third generation. This serves as an internal evaluation. Prediction beyond the third generation was not performed because a few compounds generated too many products which exceeded the memory limit of the internal setting of Metabolizer. Two important measures in prediction evaluation were calculated for each compound: recall, which describes the portion of observed products that are correctly predicted, and precision, which describes the portion of predicted products that are experimentally observed.
As a first attempt at evaluating the library performance, we summed the product counts for all the compounds in DB-J-ENV in Table 1 to conduct an internal validation of the library. The overall comparison counted all the observed products regardless of reaction generations (NO,all), while the first-generation comparison used only the observed products formed within the first generation (NO,1). About 40% of all the observed products and 62% of the observed first-generation products were correctly predicted by executing the direct photolysis library for one generation. Increasing the number of predicted generations to two increased the overall recall to 48%, while a further increase of the number of predicted generations resulted in a limited improvement. This is because the experimental data focused on the formation of transformation products from the first few generations; however, the limited improvement with the prediction of subsequent generations is highly compound specific. The recall increased to 61-80% when only major products were considered, confirming our strategy of focusing on capturing the major products. All these recall values are less than 100% because we did not create schemes to capture the formation of every reported product in the DB-J-ENV database. Schemes were only created for parent-product transformations that are based on clearly characterized direct photolysis mechanisms. Additionally, in some cases, the DB-J-ENV database contained highly conflicting information about whether a particular transformation would occur; unless the structure-dependent reactivity could be elucidated based on the information in DB-J-ENV or DB-J-SUP, these transformations were not included in the library. Including every reported product in the DB-J-ENV database would force the recall to 100% for the internal validation; however, this exhaustiveness would also have created a greater number of extremely specific reaction schemes and exponentially lowered precision. 
Precision was 35% for first-generation prediction but decreased by approximately half with each increase in the number of predicted generations. The combination of insufficient detection/identification of multiple generation products (including isomers) and the lack of reaction rates and reactivity constraints (i.e. rules defining reactivity based on the structural difference in compounds that are predicted to undergo the same reaction scheme) largely account for the low precision.
Table 1.
Internal evaluation of the direct photolysis library against the journal publication database DB-J-ENV.
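product set | generation (up to i) | NO,i | NP,i | overall comparison: NOP,i | overall comparison: recall NOP,i/NO,all | overall comparison: precision NOP,i/NP,i | first-generation comparison: NOP,1 | first-generation comparison: recall NOP,1/NO,1 | first-generation comparison: precision NOP,1/NP,1
---|---|---|---|---|---|---|---|---|---
all products (390 compounds) | 1 | 891 | 1608 | 563 | 0.40 | 0.35 | 555 | 0.62 | 0.35
all products (390 compounds) | 1+2 | 1241 | 4341 | 676 | 0.48 | 0.16 | | |
all products (390 compounds) | 1+2+3 | 1353 | 7775 | 703 | 0.50 | 0.09 | | |
all products (390 compounds) | all | 1396 | | | | | | |
major products (134 compounds) | 1 | 177 | | 144 | 0.61 | | 142 | 0.80 |
major products (134 compounds) | 1+2 | 223 | | 171 | 0.73 | | | |
major products (134 compounds) | 1+2+3 | 229 | | 175 | 0.74 | | | |
major products (134 compounds) | all | 235 | | | | | | |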
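As a worked check of the arithmetic, the short Python sketch below recomputes the recall and precision values for the "all products" rows from the counts reported in Table 1. The variable names are illustrative only.

```python
# Counts copied from Table 1 (all products, 390 compounds).
N_O_ALL = 1396   # all observed products, regardless of generation
N_O_1 = 891      # observed first-generation products
N_P_1 = 1608     # predicted first-generation products
N_OP_1_FIRST_GEN = 555  # observed first-generation products that were predicted

rows = {  # generation label: (N_P,i, N_OP,i) for the overall comparison
    "1": (1608, 563),
    "1+2": (4341, 676),
    "1+2+3": (7775, 703),
}

# Overall comparison: (recall, precision) per generation, rounded as in the table.
overall = {
    gen: (round(n_op / N_O_ALL, 2), round(n_op / n_p, 2))
    for gen, (n_p, n_op) in rows.items()
}

# First-generation comparison.
first_gen = (round(N_OP_1_FIRST_GEN / N_O_1, 2), round(N_OP_1_FIRST_GEN / N_P_1, 2))

for gen, (recall, precision) in overall.items():
    print(f"up to generation {gen}: recall={recall:.2f}, precision={precision:.2f}")
print(f"first-generation: recall={first_gen[0]:.2f}, precision={first_gen[1]:.2f}")
```

The recomputed values agree with the tabulated ones and show the precision falling from 0.35 to 0.16 to 0.09 as the number of predicted products grows much faster than the number of matches.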
The precision and recall of each individual compound in DB-J-ENV were highly variable, as shown in Figure S5. The precision calculation is only applicable to compounds with a nonzero number of predicted products (NP,i). The mean and median recall and precision values of all compounds with nonzero NP,i are also plotted in Figure S5 and are similar to the values in Table 1 or higher than the values in Table 1 by up to 0.2. Therefore, evaluation measures in Table 1 are conservative (i.e., on the lower end) because they include chemicals with no predicted products. As expected, the increase in the number of predicted generations from one to three enhanced the recall and decreased the precision.
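The difference between the pooled measures (summed counts, as in Table 1) and the per-compound means can be illustrated with a small Python sketch. The four compounds and their counts are hypothetical and chosen only to make the effect visible; the gap in this toy example is larger than the difference of up to 0.2 reported above.

```python
from statistics import mean

# Hypothetical (N_O, N_P, N_OP) counts for four parent compounds; not from the paper.
compounds = {
    "A": (2, 2, 2),
    "B": (4, 10, 2),
    "C": (1, 1, 1),
    "D": (3, 0, 0),  # no predicted products
}

# Pooled measures: sum the counts over all compounds first, then take the ratio.
total_o = sum(n_o for n_o, _, _ in compounds.values())
total_p = sum(n_p for _, n_p, _ in compounds.values())
total_op = sum(n_op for _, _, n_op in compounds.values())
pooled_recall = total_op / total_o
pooled_precision = total_op / total_p

# Per-compound means, restricted to compounds with a nonzero number of predictions.
with_predictions = [c for c in compounds.values() if c[1] > 0]
mean_recall = mean(n_op / n_o for n_o, _, n_op in with_predictions)
mean_precision = mean(n_op / n_p for _, n_p, n_op in with_predictions)

print(f"pooled:       recall={pooled_recall:.2f}, precision={pooled_precision:.2f}")
print(f"per-compound: recall={mean_recall:.2f}, precision={mean_precision:.2f}")
```

Compound D lowers the pooled recall but is excluded from the per-compound means, and compound B, with many predicted products, dominates the pooled precision. Both effects make the pooled values the more conservative estimates.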
We selected two compounds from Figure S5 for the comparison of observed products with those predicted from our library. Figure 2 depicts the direct photolysis prediction of an organophosphate pesticide fenthion. This example was chosen because of a balanced recall and precision along with an adequate number of products for the purpose of illustration. All five experimentally observed products were correctly predicted through “organophosphorus ester photohydrolysis”, “organothiophosphorus ester photooxidation to oxon”, and “aromatic thioether photooxidation”. With the additional generations of predictions, however, the overprediction of products increased. The eight products that were predicted, but not observed might be below the detection limit in the experiment, not formed because of low reactivity, or quickly transformed to other products.
Figure 2.
Library prediction example for fenthion. Experimentally observed products were compiled from literature.20-24 The triggered reaction schemes in the first generation and the ones leading to observed products are labelled: 1. “organophosphorus ester photohydrolysis”; 2. “organothiophosphorus ester photooxidation to oxon”; 3. “organothiophosphorus ester photorearrangement”; 4. “aromatic thioether photooxidation”. All experimentally observed products (products without boxes) in DB-J-ENV were predicted in this example. Products a and b were observed in a study which did not clearly state whether they were formed in distilled water or natural water, and product b was also observed as a photoproduct of parent compound c (fenthion sulfoxide) in DB-J-ENV. Note that this manually created diagram is different from the output format of CTS.
Hexahydro-1,3,5-trinitro-1,3,5-triazine (RDX), a nitro-containing munitions compound, provides another prediction example with complexity beyond three generations (Figure S6). The prediction tree expanded only during the third generation, and six out of the nine observed products were correctly predicted after three generations. A number of reactions were predicted to occur beyond the third generation, resulting in the formation of only four new products, two of which were experimentally observed. One observed product, which was detected only in trace amounts,18 was not predicted at all because no reaction scheme was created for it. Six products were predicted but not observed, some of which were likely not detectable by the experimental methods.
External evaluation with regulatory registration data for pesticides.
Because of data scarcity, no data from peer-reviewed journal publications were withheld during the library development. Instead, we turned to the EFSA database of 138 compounds (DB-EFSA-ENV; a list of compound names and standardized SMILES is provided in List S2) for an external evaluation. The EFSA database is extracted from the regulatory registration reports for pesticides, and the chemicals are less diverse in terms of photolabile functional groups. The structural fragments present in the compounds in DB-EFSA-ENV were predicted to be transformed by only 70 reaction schemes within the library in the first generation. For 46 of these reaction schemes, the predicted products matched the observed products from the EFSA reports. These counts were only a half and one-third of the number of predicted (141) and observed (136) schemes for DB-J-ENV, respectively. Nevertheless, this is the only other publicly available data source we were able to find. A total of 33 of the 138 compounds in DB-EFSA-ENV overlap with DB-J-ENV, and the agreement in detected products between the two databases for these common compounds varies.
Table 2 shows that the recall was 30-37% depending on the number of prediction generations for the overall comparison and 56% for the first-generation comparison. Precision was 20-21% for first-generation prediction. Again, these measures based on the summation of all compounds are generally more conservative estimates compared to the mean and the median of the measures of individual compounds (Figure S7). Recall and precision of the external evaluation were slightly lower compared to the internal evaluation, which could be ascribed to a number of reasons such as differences in compound sets, experimental methods, data logging, and so forth. However, the trends are similar: prediction of two generations had the most benefit of increasing the recall with decreasing precision by a half; a higher recall of 64% was obtained when the evaluation was based on major products formed in the first generation.
Table 2.
External evaluation of the direct photolysis library against the EFSA database DB-EFSA-ENV.
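product set | generation (up to i) | NO,i | NP,i | overall comparison: NOP,i | overall comparison: recall NOP,i/NO,all | overall comparison: precision NOP,i/NP,i | first-generation comparison: NOP,1 | first-generation comparison: recall NOP,1/NO,1 | first-generation comparison: precision NOP,1/NP,1
---|---|---|---|---|---|---|---|---|---
all products (138 compounds) | 1 | 239 | 649 | 134 | 0.30 | 0.21 | 133 | 0.56 | 0.20
all products (138 compounds) | 1+2 | 391 | 1758 | 156 | 0.34 | 0.09 | | |
all products (138 compounds) | 1+2+3 | 438 | 3149 | 170 | 0.37 | 0.05 | | |
all products (138 compounds) | all | 454 | | | | | | |
major products (119 compounds) | 1 | 136 | | 88 | 0.35 | | 87 | 0.64 |
major products (119 compounds) | 1+2 | 210 | | 93 | 0.37 | | | |
major products (119 compounds) | 1+2+3 | 241 | | 104 | 0.41 | | | |
major products (119 compounds) | all | 254 | | | | | | |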
Reaction-scheme-specific performance.
Reaction-scheme-specific recall and precision were also analyzed for each reaction scheme in the library (Table S1) for the prediction of first-generation products for DB-J-ENV. We were able to correctly predict transformations for all the parent compounds that were observed to undergo a certain reaction scheme, with one exception: we did not predict the transformation of one compound that was observed to undergo “diphenyl ether photohydrolysis” because of conflicting information from the literature.
“Aromatic halide photohydrolysis” (55 compounds), “aromatic ether photohydrolysis” (21 compounds), and “aromatic photohydrodehalogenation” (41 compounds) were the most commonly observed reaction schemes. Figure 3 depicts the distribution of all nonsecondary reaction schemes (those created based on first-generation transformation products in DB-J-ENV) in a two-dimensional space of precision (NOP_r/NP_r) over correctly predicted parent compounds (NOP_r) in the first generation. Most of the reaction schemes are located in the upper left area, and hence associated with a few examples and high precision, suggesting the specificity of the corresponding reaction center. Reaction schemes in the upper right area, such as “dinitroaniline photochemical N-dealkylation” and “sulfonamide SO2 extrusion photorearrangement (6-6)”, can be applied with higher confidence to predict products of compounds not found in DB-J-ENV. Reaction schemes with low precision (e.g. < 0.5, 37 reaction schemes) and/or with fewer example compounds (e.g. <5, 109 reaction schemes) necessitate future experimental studies. Similar results of the common reaction schemes and specificity of most reaction centers are shown for evaluation using DB-EFSA-ENV in Table S1 and Figure S8.
Figure 3.
Characterization of the 136 nonsecondary reaction schemes observed in the first generation, evaluated for DB-J-ENV. Nonsecondary reaction schemes were created based on observed first-generation transformation products in DB-J-ENV (i.e., NO_r > 0 and thus NOP_r > 0). The size (and color) of each circle represents the number of reaction schemes with the same NOP_r (number of parent compounds correctly predicted to undergo a certain reaction scheme) and precision (NOP_r/NP_r).
For compounds with multiple reaction centers, the low precision of triggered reaction schemes often leads to a large number of predicted products after consecutive generations, known as the combinatorial explosion problem. Note that in DB-J-ENV, 24 of 31 compounds that were predicted to have more than 50 products after 3 generations have more than 2 halogens on an aromatic ring. The development of reactivity/selectivity rules constraining “aromatic photohydrodehalogenation” and “aromatic halide photohydrolysis” would greatly reduce the extent of combinatorial explosion.
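The growth in product counts for polyhalogenated aromatics can be illustrated with a deliberately simplified Python sketch. It assumes that every halogen position is distinguishable (no molecular symmetry), that each generation replaces exactly one remaining halogen with either H ("aromatic photohydrodehalogenation") or OH ("aromatic halide photohydrolysis"), and that no other reaction scheme is triggered. It is a toy model of exhaustive prediction, not the behavior of Metabolizer.

```python
def count_products(n_halogens, generations):
    """Count distinct products reachable within the given number of generations
    when each step replaces one remaining halogen (X) with H or OH."""
    parent = ("X",) * n_halogens
    seen = {parent}
    frontier = {parent}
    for _ in range(generations):
        next_frontier = set()
        for state in frontier:
            for pos, group in enumerate(state):
                if group != "X":
                    continue  # only remaining halogens can react in this toy model
                for new_group in ("H", "OH"):
                    product = state[:pos] + (new_group,) + state[pos + 1:]
                    if product not in seen:
                        seen.add(product)
                        next_frontier.add(product)
        frontier = next_frontier
    return len(seen) - 1  # exclude the parent compound itself


for n in (1, 2, 3, 4, 5):
    print(n, "halogens:", count_products(n, 3), "products after 3 generations")
```

Under these assumptions the count after g generations equals the sum over k = 1..g of C(n, k)·2^k, so each additional halogen roughly doubles the number of third-generation products. Real counts will differ because of symmetry and additional schemes triggered by the products.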
Application of direct photolysis prediction to environmental chemicals.
In order to test the relevance of the reaction schemes in the library for common environmental contaminants, we predicted the direct photolytic transformation products for a list of 32,464 compounds. This list is the prediction set curated by the Collaborative Estrogen Receptor Activity Prediction Project (CERAPP) to include a large portion of manufactured organic chemicals with environmental exposure potential.19 The counts of compounds that were predicted to undergo a given reaction scheme are tabulated in Table S1. Potential direct photolysis products were predicted for 40% of chemicals in the CERAPP list. The most common structural fragments in the listed chemicals were “aromatic halide” (14% of chemicals), “aromatic ether” (10%), “aromatic amine” (4%), “N-aryl amide” (3%), and “aromatic nitro” (4%), similar to those of DB-J-ENV and DB-EFSA-ENV.
On the other hand, about 23 of the library’s 136 nonsecondary reaction schemes (as defined above) were triggered by fewer than 5 contaminants in the CERAPP list in the first generation. These reaction schemes generally involve highly specific, mostly large structural fragments. Some of these fragments could be generalized if future experiments on structurally similar compounds become available. The importance of these reaction schemes, however, should not be determined merely by their predicted occurrence in the first generation of transformations of the original parent compound. Further reaction generations and other environmental transformation types, such as microbial degradation, should also be considered. Examples are the “phenoxyphenol dehalogenative photorearrangement” and “phenoxyphenol dehalogenative photocyclization to dioxin” reaction schemes, both of which were predicted for only one compound (triclosan) out of the more than thirty-two thousand compounds. However, when the second reaction generation was considered, another 36 compounds triggered the rearrangement reaction, and 18 of them triggered the cyclization reaction, forming potentially toxic dioxin products.
As discussed above, Figure 3 provides information about which reaction schemes require further improvement based on the number of observed examples and precision. Additionally, we plotted the ratio of the percentage of parent compounds predicted to be transformed by a given reaction scheme in the CERAPP compound list (%NP_r) to that in the DB-J-ENV list against the same x-axis as in Figure 3 (Figure 4). This plot provides a visual tool to identify which reaction schemes can potentially occur for a large number of uninvestigated contaminants found in the CERAPP list relative to DB-J-ENV, a third factor for identifying which reaction schemes should be prioritized for future investigation. The range of these ratios (y-axis in Figure 4, referred to as triggering ratios below) spans more than three orders of magnitude (~0.001 to ~1) for reaction schemes with fewer than 5 observed examples in DB-J-ENV and narrows to one order of magnitude (~0.1 to ~1) for those with at least 5 observed examples. As plotted in Figure 4, reaction schemes that were predicted to impact a larger fraction of compounds in the CERAPP list relative to the fraction in DB-J-ENV occur in the upper left area, and schemes with low precision in this area of the plot should be of higher priority for future studies. An example is “aromatic amine photohydrolysis”, which has a low precision of 0.05 and only 1 example compound but a high triggering ratio of 0.8.
Figure 4.
The difference in occurrence of nonsecondary reaction schemes between the CERAPP and DB-J-ENV lists. The precision of each reaction scheme was evaluated using DB-J-ENV. The reference line of “same %NP_r” indicates that the percentage of compounds predicted to undergo a given reaction scheme is the same for the two compound lists; in other words, %NP_r (CERAPP)/%NP_r (DB-J-ENV) = (NP_r (CERAPP)/32464)/(NP_r (DB-J-ENV)/390) = 1. The markers are 40% transparent, and thus overlapping markers appear darker.
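The triggering ratio defined in the caption of Figure 4 is a ratio of two fractions and can be computed as in the minimal Python sketch below. The list sizes (32,464 and 390) come from the text. The function name and the example counts are assumptions for illustration only and are not values taken from Table S1.

```python
N_CERAPP = 32464  # compounds in the CERAPP prediction set
N_DB_J_ENV = 390  # compounds in DB-J-ENV


def triggering_ratio(np_r_cerapp: int, np_r_db_j_env: int,
                     n_cerapp: int = N_CERAPP,
                     n_db_j_env: int = N_DB_J_ENV) -> float:
    """%NP_r (CERAPP) / %NP_r (DB-J-ENV) for one reaction scheme."""
    if np_r_db_j_env == 0:
        raise ValueError("scheme is not triggered in DB-J-ENV; ratio is undefined")
    return (np_r_cerapp / n_cerapp) / (np_r_db_j_env / n_db_j_env)


# Hypothetical counts: 1300 CERAPP compounds and 20 DB-J-ENV compounds
# predicted to undergo the same scheme.
ratio = triggering_ratio(1300, 20)
print(round(ratio, 2))  # 0.78

# A scheme triggered by the same fraction of both lists falls on the
# "same %NP_r" reference line.
print(triggering_ratio(N_CERAPP, N_DB_J_ENV))  # 1.0
```

A ratio near 1 means the scheme is triggered about as often, proportionally, in the CERAPP list as in DB-J-ENV; much smaller ratios indicate schemes tied to fragments that are rare among the CERAPP chemicals.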
In conclusion, this first freely available direct photolysis reaction library contains reaction schemes for a wide range of commonly occurring, as well as highly specific, functional groups that are likely to transform upon absorption of sunlight in aquatic systems. The execution of this library through the CTS website could provide useful supplementary information for experimentalists to identify the possible direct phototransformation products detected through targeted/nontargeted analysis, and for modelers to estimate the environmental persistence of contaminants and their potential to form toxic transformation products. Library predictions could be improved by adding a sunlight-absorption calculation module to check the light absorption ability of a contaminant, constraining reaction schemes with proper reactivity/selectivity/exclusion rules, ranking the likelihood/rates of the reaction schemes based on experimental data, and adding photo-mineralization reaction schemes to account for the incomplete mass balance observed in most photolysis experiments. Additionally, there will be an ongoing need to modify and expand the direct phototransformation library as data from future studies on direct photolysis of environmental organic contaminants become available.
Acknowledgements
This research was supported in part by an appointment to the Internship/Research Participation Program at the U.S. EPA Office of Research and Development administered by the Oak Ridge Institute for Science and Education (ORISE) through Interagency Agreement no. DW-89-92525701-0 between the United States Department of Energy and the U.S. EPA. C.Y. thanks Wei Li for the initial Python script for library evaluation. We thank Michaela Koopmans and John Olmstead for quality control of the databases.
Footnotes
Disclaimer: The views expressed in this paper are those of the authors and do not necessarily represent the views or policies of the U.S. EPA. Mention of trade names or products does not convey and should not be interpreted as conveying official U.S. EPA approval, endorsement, or recommendation.
Supporting information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.est.0c00484.
Complete list of compound names and standardized SMILES in DB-J-ENV and DB-EFSA-ENV; figure showing the increasing number of publications on identifying direct phototransformation products; example illustrating the selection of transformations to be encoded as reaction schemes in our direct photolysis reaction library; relationship and calculation example for the three evaluation counts; table providing the reaction-scheme-specific evaluation counts for all databases; example prediction flowchart for RDX; variability of the precision and recall of each individual compound in DB-J-ENV and DB-EFSA-ENV; and relationship between precision and count of correctly predicted examples for reaction schemes evaluated using DB-EFSA-ENV (PDF1). Complete documentation of the direct photolysis reaction library with environmentally relevant examples (PDF2).
References
- 1. Gao J; Ellis LBM; Wackett LP, The University of Minnesota pathway prediction system: Multi-level prediction and visualization. Nucleic Acids Res. 2011, 39 (suppl_2), W406–W411.
- 2. Wicker J; Lorsbach T; Gütlein M; Schmid E; Latino D; Kramer S; Fenner K, enviPath - the environmental contaminant biotransformation pathway resource. Nucleic Acids Res. 2016, 44 (D1), D502–D508.
- 3. Marchant CA; Briggs KA; Long A, In silico tools for sharing data and knowledge on toxicity and metabolism: Derek for Windows, Meteor, and Vitic. Toxicol. Mech. Methods 2008, 18 (2-3), 177–187.
- 4. Djoumbou-Feunang Y; Fiamoncini J; Gil-de-la-Fuente A; Greiner R; Manach C; Wishart DS, BioTransformer: A comprehensive computational tool for small molecule metabolism prediction and metabolite identification. J. Cheminform. 2019, 11 (1), 2.
- 5. Tebes-Stevens C; Patel JM; Jones WJ; Weber EJ, Prediction of hydrolysis products of organic chemicals under environmental pH conditions. Environ. Sci. Technol. 2017, 51 (9), 5008–5016.
- 6. Schwarzenbach RP; Gschwend PM; Imboden DM, Environmental organic chemistry. 3rd ed.; Wiley: 2016.
- 7. Cory WC; Welch AM; Ramirez JN; Rein LC, Naproxen and its phototransformation products: Persistence and ecotoxicity to toad tadpoles (Anaxyrus terrestris), individually and in mixtures. Environ. Toxicol. Chem. 2019, 38 (9), 2008–2019.
- 8. Sedykh A; Saiakhov R; Klopman G, META V. A model of photodegradation for the prediction of photoproducts of chemicals under natural-like conditions. Chemosphere 2001, 45 (6), 971–981.
- 9. Meta-PC. Expert rule based program to predict metabolic and degradation products of chemicals, http://multicase.com/meta-pc (accessed on Apr 27, 2020).
- 10. Kleinman MH; Baertschi SW; Alsante KM; Reid DL; Mowery MD; Shimanovich R; Foti C; Smith WK; Reynolds DW; Nefliu M; Ott MA, In silico prediction of pharmaceutical degradation pathways: A benchmarking study. Mol. Pharm. 2014, 11 (11), 4179–4188.
- 11. Parenty ADC; Button WG; Ott MA, An expert system to predict the forced degradation of organic molecules. Mol. Pharm. 2013, 10 (8), 2962–2974.
- 12. ICH harmonised tripartite guideline. Stability testing: Photostability testing of new drug substances and products Q1B. https://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Quality/Q1B/Step4/Q1B_Guideline.pdf (accessed on Oct 15, 2019).
- 13. Wolfe K; Pope N; Parmar R; Galvin M; Stevens C; Weber EJ; Flaishans J; Purucker T, Chemical transformation system: Cloud based cheminformatic services to support integrated environmental modeling. In Proceedings of the 8th International Congress on Environmental Modelling and Software, Toulouse, France, 2016.
- 14. Pirok G; Máté N; Varga J; Szegezdi J; Vargyas M; Dorant S; Csizmadia F, Making “real” molecules in virtual space. J. Chem. Inf. Model. 2006, 46 (2), 563–568.
- 15. EFSA provision of documents. Rapporteur member state assessment reports submitted for the EU peer review of active substances used in plant protection products (published before March 2015). http://dar.efsa.europa.eu/dar-web/provision/request (accessed on Oct 15, 2019).
- 16. Weininger D; Weininger A; Weininger JL, SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 1989, 29 (2), 97–101.
- 17. Turro NJ; Ramamurthy V; Scaiano JC, Modern molecular photochemistry of organic molecules. University Science Books: Sausalito, California, 2001.
- 18. Hawari J; Halasz A; Groom C; Deschamps S; Paquet L; Beaulieu C; Corriveau A, Photodegradation of RDX in aqueous solution: A mechanistic probe for biodegradation with Rhodococcus sp. Environ. Sci. Technol. 2002, 36 (23), 5117–5123.
- 19. Mansouri K; Abdelaziz A; Rybacka A; Roncaglioni A; Tropsha A; Varnek A; Zakharov A; Worth A; Richard AM; Grulke CM; Trisciuzzi D; Fourches D; Horvath D; Benfenati E; Muratov E; Wedebye EB; Grisoni F; Mangiatordi GF; Incisivo GM; Hong H; Ng HW; Tetko IV; Balabin I; Kancherla J; Shen J; Burton J; Nicklaus M; Cassotti M; Nikolov NG; Nicolotti O; Andersson PL; Zang Q; Politi R; Beger RD; Todeschini R; Huang R; Farag S; Rosenberg SA; Slavov S; Hu X; Judson RS, CERAPP: Collaborative estrogen receptor activity prediction project. Environ. Health Perspect. 2016, 124 (7), 1023–1033.
- 20. Yamada K; Terasaki M; Makino M, A novel estrogenic compound transformed from fenthion under UV-A irradiation. J. Hazard. Mater. 2010, 176 (1), 685–691.
- 21. Torrisi S; Sortino S, New insights into the photoreactivity of the organophosphorus pesticide fenthion: A σ aryl cation as a key intermediate in the photodecomposition. J. Agric. Food Chem. 2004, 52 (19), 5943–5949.
- 22. Hirahara Y; Ueno H; Nakamuro K, Aqueous photodegradation of fenthion by ultraviolet B irradiation: Contribution of singlet oxygen in photodegradation and photochemical hydrolysis. Water Res. 2003, 37 (2), 468–476.
- 23. Hirahara Y; Ueno H; Nakamuro K, Comparative photodegradation study of fenthion and disulfoton under irradiation of different light sources in liquid- and solid-phases. J. Health Sci. 2001, 47 (2), 129–135.
- 24. Huang J; Mabury SA, The role of carbonate radical in limiting the persistence of sulfur-containing chemicals in sunlit natural waters. Chemosphere 2000, 41 (11), 1775–1782.