Author manuscript; available in PMC: 2019 Mar 1.
Published in final edited form as: Chemosphere. 2017 Nov 23;194:94–106. doi: 10.1016/j.chemosphere.2017.11.137

Demonstration of a consensus approach for the calculation of physicochemical properties required for environmental fate assessments

Caroline Tebes-Stevens*, Jay M Patel, Michaela Koopmans§, John Olmstead§, Said H Hilal, Nick Pope¥, Eric J Weber, Kurt Wolfe
PMCID: PMC6146973  NIHMSID: NIHMS983252  PMID: 29197820

Abstract

Eight software applications are compared for their performance in estimating the octanol-water partition coefficient (Kow), melting point, vapor pressure and water solubility for a dataset of polychlorinated biphenyls, polybrominated diphenyl ethers, polychlorinated dibenzodioxins, and polycyclic aromatic hydrocarbons. The predicted property values are compared against a curated dataset of measured property values compiled from the scientific literature with careful consideration given to the analytical methods used for property measurements of these hydrophobic chemicals. The variability in the predicted values from different calculators generally increases for higher values of Kow and melting point and for lower values of water solubility and vapor pressure. For each property, no individual calculator outperforms the others for all four of the chemical classes included in the analysis. Because calculator performance varies based on chemical class and property value, the geometric mean and the median of the calculated values from multiple calculators that use different estimation algorithms are recommended as more reliable estimates of the property value than the value from any single calculator.

Keywords: Physicochemical property, QSAR, Predictive models, Consensus, Cheminformatics, Persistent Organic Pollutants (POPs)

1. Introduction

Environmental fate and transport models can be used to assess the likelihood of environmental exposure of vulnerable organisms to toxic organic chemicals dissolved in groundwater, surface water, and runoff. These models generally require the user to provide input parameters that characterize the effects of partitioning as well as abiotic and microbial transformation reactions on the fate of a chemical. Model input parameters may include physicochemical properties (e.g., water solubility), partition coefficients (e.g., sediment-water partition coefficients) and transformation rate constants.

Since the 1970s, numerous models of varying complexity have been developed to estimate physicochemical property values used to parameterize models for environmental fate and transport of organic chemicals. Many early Quantitative Structure-Property Relationships (QSPRs) estimated the value of a property from the summation of constants associated with the atoms and/or fragments present in the molecule (e.g., Hansch and Leo, 1979, Meylan and Howard, 1995). Other early approaches focused on the development of linear regression equations to estimate a property value from a measured value of another property (e.g., Hansch et al., 1968, Meylan et al., 1996). More recent quantitative structure-activity relationship (QSAR)-based approaches have advanced the use of sophisticated molecular descriptors and/or complex algorithms to improve predictive capability (e.g., Guha and Willighagen, 2012, Kim et al., 2016, Martin et al., 2008, Papa et al., 2009, Schüürmann et al., 2006, Svetnik et al., 2003, Tetko et al., 2001).

Other models have been developed to estimate physicochemical property values using mechanistic approaches. For example, SPARC utilizes a “toolbox” of mechanistic perturbation models to estimate physicochemical properties based on fundamental chemical structure theory (Hilal et al., 2003, Hilal et al., 2004). This toolbox includes models accounting for the effects of resonance, electrostatic interactions, solute-solvent interactions, solute-solute (self) interactions, dispersion, induction, dipole and H-bonding. COSMO-RS (COnductor-like Screening MOdel for Realistic Solvents) estimates property values based on chemical potential differences of molecules in liquids (Klamt et al., 1998). The model uses quantum chemically generated charge density surfaces to describe each molecule and its interactions with other molecules. This approach automatically incorporates electronic group effects such as inductive and mesomeric influences on the polarity, as well as intramolecular interactions such as hydrogen bonding and excess energies.

In addition to utilizing a variety of different computational approaches, existing physicochemical property calculators were developed using different training sets to calibrate the models. Commercially available property calculators typically do not disclose the training set of measured values that was used to derive their models; however, many public domain property calculators make this information available. The paucity of good measured data for some groups of chemicals has made it difficult to calibrate existing physicochemical property calculators to accurately predict the physicochemical property values for these chemicals. For example, low-solubility chemicals are not well-represented in the training sets for most calculators due to the inherent difficulty in measuring the aqueous-phase properties of these molecules (Kim et al., 2016). Comparisons of calculated physicochemical property values to measured values for select chemicals can reveal the capabilities and limitations of existing calculators.

A primary objective of this paper is to demonstrate the value of using consensus predictions from a variety of different physicochemical property calculators that take different approaches to calculating specific physicochemical properties (Tropsha, 2010, Tropsha and Golbraikh, 2010). This analysis was motivated by the requirement to calculate physicochemical property values for transformation products predicted by the Chemical Transformation Simulator (CTS), a web-based software tool under development at the U.S. EPA National Exposure Research Laboratory to predict environmental transformation pathways for organic chemicals (Tebes-Stevens et al., 2017, Wolfe et al., 2016). Depending on the molecule of interest, a substantial fraction of the predicted transformation products may not have assigned Chemical Abstracts Service Registry Numbers (CASRNs) or be available for purchase in the catalogs of chemical merchants. For these unrecognized transformation products, no measured property values will be available; therefore, it is necessary to estimate the property values based on molecular structure alone.

A curated dataset of measured property values was compiled for chemicals with low solubility and high hydrophobicity for the purpose of evaluating calculator performance. The properties selected for comparison include Kow, which can be used to characterize the likelihood of partitioning to sediment organic carbon (Karickhoff et al., 1979), as well as water solubility and vapor pressure. Taken together, these three physicochemical properties control partitioning of the chemical between the aqueous, solid and gaseous phases, and therefore largely determine the environmental fate and residence time of a chemical. Melting points were also included in the data compilation, because some of the calculators use the melting point to estimate the water solubility and vapor pressure of the chemical.

The dataset included 62 polychlorinated biphenyls (PCBs), 48 polybrominated diphenyl ethers (PBDEs), 34 polychlorinated dibenzodioxins (PCDDs), and 16 polycyclic aromatic hydrocarbons (PAHs) with available measured property data for at least one of the four properties of interest. Based on their physicochemical properties and their observed persistence in the environment, these chemical classes have been classified as Persistent Bioaccumulative Toxics (PBTs) (Weisbrod et al., 2007) and Persistent Organic Pollutants (POPs) (Wania and Mackay, 1996, Jones and de Voogt, 1999). These four chemical classes are of environmental concern because of their tendency to persist in soil and sediment and to bioaccumulate in fatty tissue of organisms due to their long half-lives and lipophilic tendencies. Additionally, PCBs, PBDEs, PCDDs and PAHs with moderate to high vapor pressure can be transported in the atmosphere over long distances due to their ability to volatilize at environmental temperatures (Wania and Mackay, 1996).

2. Materials and methods

2.1. Physicochemical property calculators

The physicochemical property calculators used in this analysis were selected for their diversity in terms of computational approaches, maturity and training sets. Additionally, first-principles and quantum chemistry based models were excluded from this comparison because their long run times make them impractical for batch estimation of properties for chemical lists and unsuitable for integration with other software tools. Table 1 lists the calculators included in this comparison, identifies which of the four properties considered in this analysis are available from each calculator, notes the software version used in this evaluation, and summarizes the calculation method used to estimate the property values of interest. Several of the software applications implement more than one algorithm for prediction of a property value. ACD/Percepta, ChemAxon Plugin Calculators, EPI Suite and Toxicity Estimation Software Tool (T.E.S.T.) are available as desktop applications, while the Online Chemical Modeling Environment (OCHEM) (Sushko et al., 2011) and SPARC are both web-based applications. The ChemAxon Plugin Calculators, SPARC and T.E.S.T. models are also available as webservices, facilitating their integration with other software tools (Wolfe et al., 2016). The developers of NICEATM and OPERA have made their source code freely available; however, users must have some familiarity with open source cheminformatics tools to install and execute the code. Fortunately, pre-calculated property values from NICEATM and OPERA have been made available for hundreds of thousands of organic chemicals on EPA’s CompTox Chemistry Dashboard (https://comptox.epa.gov/dashboard/). ACD/Percepta, ChemAxon Plugin Calculators and SPARC are commercial software packages; however, the remaining applications are freely available.

Table 1.

Summary of Calculators Used in this Comparison.

| Application or Website | Version | Model | Property | Calculation Method | Reference(s) |
|---|---|---|---|---|---|
| ACD/Percepta | 2012 | PhysChem Profiler | Kow, WS | Group Contribution: MLR with fragment counts as descriptors; based on CLOGP algorithm with improved correction factors | Petrauskas and Kolovanov (2000) |
| ChemAxon Plugin Calculators | 16.10.31.0 | KLOP | Kow | Group Contribution: MLR with fragment counts as descriptors | Klopman et al. (1994) |
| | | VG | Kow | Group Contribution: MLR with fragment counts as descriptors | Viswanadhan et al. (1989) |
| | | PHYS | Kow | Group Contribution: MLR with fragment counts as descriptors | Based on Viswanadhan et al. (1989) with PHYSPROP as training set |
| | | Solubility Predictor | WS | Group Contribution: MLR with atom counts as descriptors | Hou et al. (2004) |
| EPI Suite | 4.11 | KOWWIN™ | Kow | Group Contribution: MLR with fragment counts as descriptors | Meylan and Howard (1995) |
| | | WATERNT | WS | Group Contribution: MLR with fragment counts as descriptors | US EPA (2012) |
| | | WSKOW | WS | MLR with log Kow, MP and MW as descriptors | Meylan et al., 1996, US EPA, 2012 |
| | | MPBPVP | MP, VP | Group Contribution: MLR with fragment counts as descriptors for MP and BP; VP from nonlinear function of BP | US EPA (2012) |
| NICEATM | 2017 | | Kow, WS, MP, VP | QSPR based on support vector regression | Zang et al. (2017) |
| OCHEM | 2016-01-24 | ALOGPS 2.1 | Kow, WS | Associative Neural Network (ASNN) with electrotopological state indices as descriptors | Tetko and Tanchuk (2002) |
| | 2013-10-23 | ALOGPS 3.0 | Kow, WS | | |
| | 2015-11-23 | Best E-state (dan2097) | MP | ASNN with electrotopological state indices as descriptors | Tetko et al. (2016) |
| | 2014-08-2 | Best E-state (itetko) | MP | ASNN with electrotopological state indices as descriptors | Tetko et al. (2014) |
| OPERA | 1.02 | | Kow, WS, MP, VP | Distance-weighted k-nearest neighbors (kNN) using PaDEL molecular descriptors | Mansouri et al. (2016) |
| SPARC | 2017 | | Kow, WS, VP | Mechanistic perturbation and solute-solvent interaction models | Hilal et al., 2003, Hilal et al., 2004 |
| T.E.S.T. | 4.2 | FDA | MP, WS, VP | Hierarchical Clustering with similar chemicals | Contrera et al., 2003, Martin et al., 2008 |
| | | Group Contribution | MP, WS, VP | Group Contribution: MLR with fragment counts as descriptors | Martin and Young (2001) |
| | | Hierarchical Clustering | MP, WS, VP | Hierarchical Clustering | Martin et al. (2008) |
| | | Nearest Neighbor | MP, WS, VP | Average property value for 3 most similar molecules based on cosine similarity coefficient | Martin et al., 2008, U.S. EPA, 2016 |

As shown in Table 1, about half of the prediction models compared in this analysis are atom- or fragment-based group contribution models. These QSAR models estimate the property of interest as a summation of constants associated with the individual atoms or fragments present in the molecule. Many of these models include correction terms to account for effects such as interactions among the fragments or hydrogen bonding. Both the correction terms and the constants quantifying the atom or fragment contribution to the property value are derived through multiple linear regression (MLR) using a training set of measured property values.
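
The additive form of a group contribution model can be sketched in a few lines of Python. This is an illustration only: the function name, the fragment names, and every numeric constant below are hypothetical and are not taken from any of the calculators in Table 1, whose constants are fitted by MLR against their own training sets.

```python
from typing import Mapping


def group_contribution_estimate(
    fragment_counts: Mapping[str, int],
    fragment_constants: Mapping[str, float],
    correction_counts: Mapping[str, int],
    correction_constants: Mapping[str, float],
    intercept: float = 0.0,
) -> float:
    """Return intercept + sum(n_i * f_i) + sum(m_j * c_j)."""
    fragment_term = sum(
        count * fragment_constants[name]
        for name, count in fragment_counts.items()
    )
    correction_term = sum(
        count * correction_constants[name]
        for name, count in correction_counts.items()
    )
    return intercept + fragment_term + correction_term


# Hypothetical constants, invented for this example (not fitted values).
FRAGMENT_CONSTANTS = {"aromatic_C": 0.30, "aromatic_Cl": 0.65}
CORRECTION_CONSTANTS = {"ortho_Cl_pair": -0.10}

# A dichlorobiphenyl-like molecule: 12 aromatic carbons, 2 chlorines,
# and one occurrence of a hypothetical ortho-chlorine correction.
estimate = group_contribution_estimate(
    fragment_counts={"aromatic_C": 12, "aromatic_Cl": 2},
    fragment_constants=FRAGMENT_CONSTANTS,
    correction_counts={"ortho_Cl_pair": 1},
    correction_constants=CORRECTION_CONSTANTS,
    intercept=0.20,
)
print(round(estimate, 2))
```

In a real calculator, the constants and correction terms would be the regression coefficients obtained from the training set rather than hand-entered values.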

Other QSAR models considered in this evaluation use more complex descriptors, which are themselves calculated from the 2D molecular structure of the molecule. These include a series of models available on the OCHEM website, which estimate properties using an Associative Neural Network (ASNN) algorithm with electrotopological state indices as descriptors (Tetko and Tanchuk, 2002, Tetko et al., 2014, Tetko et al., 2016). Additionally, the descriptor set for the Hierarchical Clustering and FDA methods implemented in the T.E.S.T. application consists of over 800 descriptors, including indices for electrotopological state (E-state), connectivity, topology and geometry, as well as molecular properties and fragment counts (Martin et al., 2008, Martin et al., 2012, U.S. EPA, 2016).

The Nearest Neighbor algorithms implemented in T.E.S.T., NICEATM (Zang et al., 2017) and OPERA (Mansouri et al., 2016) estimate physicochemical property values from the available measured values for “similar” molecules. The T.E.S.T. Nearest Neighbor algorithm estimates the property value as the average of the values for the three most similar chemicals based on the cosine similarity coefficient for the descriptor pool of 816 descriptors (Martin et al., 2008, U.S. EPA, 2016). The OPERA command-line application implements a k-nearest neighbors (kNN) algorithm where k ranges between 3 and 7, with similarity defined by the Euclidean distance between the molecule of interest and the chemicals in the training set based on 766 descriptors calculated by the PaDEL software (Yap, 2011).
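
The two similarity-based schemes described above can be sketched as follows. This is a minimal illustration under stated assumptions: the two-element descriptor vectors and property values are invented, real descriptor pools contain hundreds of descriptors, and the inverse-distance weighting used in the second function is an assumed weighting scheme rather than the documented OPERA formula.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
TrainingSet = List[Tuple[Vector, float]]  # (descriptors, measured value)


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_nearest_neighbor_mean(
    query: Vector, training: TrainingSet, k: int = 3
) -> float:
    """Unweighted mean of the k most cosine-similar training chemicals."""
    ranked = sorted(
        training,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    nearest = ranked[:k]
    return sum(value for _, value in nearest) / len(nearest)


def distance_weighted_knn(
    query: Vector, training: TrainingSet, k: int = 5
) -> float:
    """kNN estimate using Euclidean distance with inverse-distance weights
    (the weighting scheme is an assumption made for this sketch)."""
    ranked = sorted(training, key=lambda item: math.dist(query, item[0]))
    nearest = ranked[:k]
    # A small epsilon avoids division by zero for an exact match.
    weights = [1.0 / (math.dist(query, d) + 1e-12) for d, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical training set: (descriptor vector, melting point in deg C).
TRAINING: TrainingSet = [
    ((1.0, 0.0), 100.0),
    ((0.9, 0.1), 110.0),
    ((0.8, 0.2), 120.0),
    ((0.0, 1.0), 300.0),
]

print(cosine_nearest_neighbor_mean((1.0, 0.05), TRAINING))
print(cosine_nearest_neighbor_mean((1.0, 0.30), TRAINING))
print(distance_weighted_knn((1.0, 0.05), TRAINING, k=3))
```

Both cosine-similarity queries select the same three neighbors and therefore receive an identical estimate, which shows how an unweighted nearest neighbor average cannot distinguish between query molecules that share a neighbor set.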

The physicochemical property estimation algorithms implemented in SPARC are based on mechanistic models describing intra/intermolecular interactions (Hilal et al., 2003, Hilal et al., 2004). Intramolecular effects are represented using a perturbation model with contribution terms from structural fragments in the molecule. Intermolecular effects are modeled using interaction terms for dispersion, induction, dipole-dipole, and hydrogen-bonding. SPARC calculates both vapor pressure and solute-solvent activity coefficients from summations of these interaction terms. Water solubility is then calculated as the infinite dilution activity coefficient in water and Kow as the ratio of the activities of the compound in water and octanol (Hilal et al., 2004).

2.2. Chemical structure representation and manipulation

Software applications for the estimation of physicochemical properties calculate the property values from the molecular structure of the chemical of interest. These applications generally allow the user to specify the chemical structure by drawing the structure using a graphical interface, by entering a string-based structural identifier (e.g., Simplified Molecular Input Line Entry System (SMILES) (Weininger, 1988) or the International Chemical Identifier (InChI or InChIKey) (Heller et al., 2013, Heller et al., 2015)), or by providing an input file with a description of the molecular structure in a standard format (e.g., mol or sdf (Dalby et al., 1992)). These applications may also allow the user to enter nonstructural identifiers (e.g., names or CASRNs) for chemicals, relying on databases that associate the structural representation of a known chemical with a set of nonstructural identifiers. The structure associated with a particular name or CASRN may differ slightly among the databases; for example, names or CASRNs may be associated with different isomers or tautomers in different databases. Furthermore, PHYSPROP (Syracuse Research Corporation, 1994) and other databases have been shown to contain various errors, including invalid CASRNs, erroneous structures and inconsistencies among the chemical names, CASRNs and structures (Young et al., 2008, Mansouri et al., 2016).

In this evaluation, chemicals were identified by a SMILES string, rather than a name or CASRN, to ensure that all of the physicochemical property calculators were estimating property values for the same molecular structures. For each chemical of interest, the SMILES string was first obtained from available databases (e.g., EPA’s CompTox Chemistry Dashboard). The SMILES strings were then processed to obtain the dominant tautomeric form of the chemical at neutral pH. ChemAxon Plugin Calculators were used to obtain the dominant tautomers; however, open source tools are also available for this purpose (O’Boyle et al., 2011, Kochev et al., 2013).

Additional manipulation of the SMILES strings is needed to make the strings amenable to batch-mode calculations with the SPARC model. SPARC includes functionality to modify SMILES strings as needed when calculating properties for one molecule at a time; however, this functionality is not available in batch mode. EPI Suite has similar constraints with respect to the format of SMILES strings when it is executed using command-line syntax, which is how the application is implemented in the CTS tool. When running EPI Suite in the command line mode or when using SPARC to perform calculations for a list of SMILES strings, it is necessary to remove stereochemistry characters (/, \, and @) and square brackets from the SMILES strings. Square brackets are often used to denote explicit hydrogens (e.g., [H] or [H+]) and to represent protonated (e.g., [NH+]) or deprotonated (e.g., [O-]) atoms in a SMILES string representation of a molecule; therefore, it is necessary to remove explicit hydrogens and to neutralize the SMILES strings for batch-mode SPARC calculations and command-line execution of EPI Suite. Additionally, for molecules containing a nitro functional group (NO2), it is necessary to convert the nitro group representation from the charged form ([N+](=O)[O-]) to the neutral form (N(=O)=O) in the SMILES string. All of these SMILES string manipulations were accomplished with ChemAxon’s Standardizer application; however, open source tools are also available for this purpose (O’Boyle et al., 2011).
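
The string-level part of these manipulations can be illustrated with a short Python sketch. It is not the ChemAxon Standardizer workflow used in this study: the function names are hypothetical, the code operates on plain text without any chemical perception, and it deliberately leaves charged atoms and explicit hydrogens for a real cheminformatics toolkit to neutralize, since doing that correctly requires valence bookkeeping.

```python
import re

NITRO_CHARGED = "[N+](=O)[O-]"
NITRO_NEUTRAL = "N(=O)=O"

# Uncharged bracket atoms from the SMILES organic subset, e.g. [CH] or [Cl].
# Aromatic atoms such as [nH] are intentionally not matched.
_SIMPLE_BRACKET_ATOM = re.compile(r"\[(Cl|Br|[BCNOPSFI])H?\d*\]")


def simplify_smiles(smiles: str) -> str:
    """Apply text-level simplifications to a SMILES string."""
    # 1. Charged nitro form -> neutral form.
    text = smiles.replace(NITRO_CHARGED, NITRO_NEUTRAL)
    # 2. Drop stereochemistry characters: '/', '\' and '@'.
    text = re.sub(r"[/\\@]", "", text)
    # 3. Unwrap simple uncharged bracket atoms left behind by step 2.
    text = _SIMPLE_BRACKET_ATOM.sub(r"\1", text)
    return text


def needs_further_standardization(smiles: str) -> bool:
    """True if bracket atoms (charges, explicit H, isotopes) remain."""
    return "[" in smiles


for raw in ["F/C=C\\F", "C[C@@H](N)C(=O)O", "c1ccccc1[N+](=O)[O-]", "CC(=O)[O-]"]:
    cleaned = simplify_smiles(raw)
    print(raw, "->", cleaned, needs_further_standardization(cleaned))
```

Strings for which `needs_further_standardization` returns `True`, such as the acetate anion in the last example, would still need to be neutralized with a structure-aware tool before batch submission.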

2.3. Compilation of measured property values

Measured property values were compiled to evaluate the performance of the selected property calculators for estimation of the octanol-water partition coefficient (Kow), water solubility, vapor pressure and melting point. The property values were compiled from the literature for a dataset of hydrophobic chemicals composed of PCBs, PBDEs, PCDDs, and PAHs.

Critical literature reviews have been published summarizing the available physicochemical property values for PCBs (Shiu and Mackay, 1986, Li et al., 2003), PBDEs (Wania and Dugani, 2003) and PCDDs (Åberg et al., 2008, Shiu et al., 1988). The reviews provide recommended values for log Kow, water solubility, and vapor pressure; however, some of the reviews are missing more recently published data. Additionally, the recommended value sometimes incorporates or is based on calculated property values. In particular, the reviews by Åberg et al. (2008) and Li et al. (2003) report a final adjusted value (FAV) for each property, obtained using a mathematical algorithm to derive internally consistent solubility values and partition coefficients. While these calculated FAVs may be considered reliable estimates of the property values, we elected to include only literature-reported measured values in our dataset. Recommended values from the literature reviews were included only if the recommendation was based on a thorough and up-to-date analysis of available measured data, with outliers identified and consideration given to the analytical methods used for the property measurements. If the available data were sparse for a given property, we consulted the reviews for promising sources of data and included the value in our dataset only after consulting the primary data source to obtain the value in its original reported units and to review the analytical procedures described in the publication.

Our objective was to assemble a dataset of reliable measurements of Kow, water solubility, vapor pressure and melting point; therefore, measured values were included in the testing data set only if the appropriate analytical methods were used (U.S. EPA, 1996, U.S. EPA, 1998a, U.S. EPA, 1998b, OECD, 1995, OECD, 2006a, OECD, 2006b). In particular, for chemicals with a log Kow of five or higher, the recommended measurement method for Kow is the slow-stirring method (OECD, 2006b, Tolls et al., 2003). The traditional shake-flask method is only considered reliable below a log Kow of 4, above which the formation of octanol emulsions in water results in underestimation of the Kow value (Danielsson and Zhang, 1996, Finizio et al., 1997, OECD, 2006a, OECD, 2006b, Tolls et al., 2003). Furthermore, the widely used RP-HPLC method, an indirect method requiring correlation against structurally analogous reference standards, is only considered reliable below a log Kow of 6 (Danielsson and Zhang, 1996, Finizio et al., 1997, OECD, 2004). If more than one reliable measured value was available for Kow, water solubility, or vapor pressure, the average of these values was used as the benchmark for calculator performance.
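
The method-based screening of Kow measurements can be illustrated with a small Python sketch. It encodes only the thresholds stated above and rests on several assumptions that are not claims about the actual curation procedure: the names are hypothetical, slow-stir values are accepted at any log Kow, the reported log Kow itself is used to apply the thresholds, and reliable values are averaged in log units.

```python
from statistics import mean
from typing import Iterable, NamedTuple, Optional


class KowMeasurement(NamedTuple):
    log_kow: float
    method: str  # "slow-stir", "shake-flask", or "rp-hplc"


def is_reliable(measurement: KowMeasurement) -> bool:
    """Apply the method-specific reliability limits quoted in the text."""
    if measurement.method == "slow-stir":
        return True  # assumption: accepted over the whole range
    if measurement.method == "shake-flask":
        return measurement.log_kow < 4.0
    if measurement.method == "rp-hplc":
        return measurement.log_kow < 6.0
    return False  # unknown methods are rejected in this sketch


def benchmark_log_kow(
    measurements: Iterable[KowMeasurement],
) -> Optional[float]:
    """Average the reliable values, or return None if none qualify."""
    reliable = [m.log_kow for m in measurements if is_reliable(m)]
    return mean(reliable) if reliable else None


# Hypothetical measurements for a single, highly hydrophobic chemical.
reported = [
    KowMeasurement(6.8, "slow-stir"),
    KowMeasurement(7.0, "slow-stir"),
    KowMeasurement(6.1, "shake-flask"),  # rejected: log Kow >= 4
    KowMeasurement(7.4, "rp-hplc"),      # rejected: log Kow >= 6
]
print(benchmark_log_kow(reported))
```

Because a shake-flask value for a very hydrophobic chemical is expected to be biased low, a threshold applied to the reported value is only a first screen; review of the primary publication, as described above, remains necessary.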

The preferred source of measured melting point values in our dataset was the NIST Chemistry Webbook (http://webbook.nist.gov/chemistry/). If no melting point was available from the NIST Chemistry Webbook, measured melting points were obtained from journal publications or the melting point dataset compiled by Bradley et al. (2014).

Calculator performance was assessed by means of the root mean square error (RMSE) between the predicted values from each calculator and the measured values in our curated dataset. This performance metric was selected because it is expressed in the same units as the physicochemical properties and it is normalized by the number of available measured property values, making it appropriate for comparing performance on datasets of different sizes. Calculator performance was assessed separately for each chemical class, as well as for the combined dataset. In addition to evaluating performance of the individual calculators, we assessed the predictive performance of the arithmetic mean, geometric mean and median of the estimated property values from all calculators.
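
The performance metric and the three consensus estimates can be written compactly. With n measured values, RMSE = sqrt((1/n) Σ (predicted_i − measured_i)²). The Python sketch below is illustrative: the function names and the example predictions are hypothetical, and it assumes that predictions are supplied as log10 values, so that the geometric mean of the property equals the arithmetic mean of its logarithms.

```python
import math
from statistics import mean, median
from typing import Dict, Sequence


def rmse(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Root mean square error between paired predictions and measurements."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(mean((p - m) ** 2 for p, m in zip(predicted, measured)))


def consensus_log_values(log_predictions: Sequence[float]) -> Dict[str, float]:
    """Consensus estimates, in log10 units, from several calculators."""
    linear = [10.0 ** value for value in log_predictions]
    return {
        # Arithmetic mean is taken on the linear scale, then logged.
        "arithmetic_mean": math.log10(mean(linear)),
        # Geometric mean of the property = mean of the log values.
        "geometric_mean": mean(log_predictions),
        "median": median(log_predictions),
    }


# Hypothetical log Kow predictions for one chemical from four calculators.
consensus = consensus_log_values([6.0, 6.5, 7.0, 8.0])
print(consensus)

# Hypothetical predicted and measured log Kow values for two chemicals.
print(rmse([6.0, 7.0], [6.5, 6.5]))
```

In this example the single high prediction of 8.0 pulls the arithmetic mean well above the geometric mean and the median, which illustrates why the latter two are less sensitive to one outlying calculator.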

3. Results and discussion

3.1. Composition of dataset of measured property values

The curated dataset of measured property values that was assembled to evaluate calculator performance is provided as Table SI-1 in the Supplementary data. This curated dataset differs from the training sets used to calibrate physicochemical property calculators in that it focuses on four specific chemical classes, while calculator training sets are typically assembled to include a wide diversity of chemical structures. Additionally, values are only included in the dataset if the measurements were made according to accepted methods of analysis for chemicals with low solubility and high hydrophobicity. In contrast, training sets are generally compiled with the goal of maximizing coverage of property values and chemicals in the dataset. These training sets are often assembled by pulling measured values from available compilations of data. For example, in the PHYSPROP database that served as a training set for several of the physicochemical property calculators considered in this evaluation, the citations provided for most of the Kow values are handbooks or databases with little information about how the Kow values were measured. Table 2 summarizes the number and range of measured property values compiled for the four chemical classes included in this analysis, and Table SI-2 in the Supplementary data provides the same information for the PHYSPROP database.

Table 2.

Composition of dataset of measured values.

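| Chemical Class | Log Kow (#) | Log Kow (Range) | Melting Point, °C (#) | Melting Point, °C (Range) | Vapor Pressure, torr (#) | Vapor Pressure, torr (Range) | Water Solubility, mg L−1 (#) | Water Solubility, mg L−1 (Range) |
|---|---|---|---|---|---|---|---|---|
| PCBs | 24 | 4.01 to 8.19 | 57 | 17.00 to 306.63 | 16 | 9.50e-9 to 1.56e-3 | 16 | 1.25e-4 to 1.66 |
| PBDEs | 9 | 5.74 to 8.27 | 36 | 18.75 to 307.25 | 31 | 6.77e-15 to 2.24e-3 | 11 | 8.70e-4 to 0.13 |
| PCDDs | 14 | 6.20 to 6.74 | 12 | 88.85 to 325.50 | 20 | 1.50e-12 to 4.48e-4 | 16 | 1.94e-7 to 1.06 |
| PAHs | 9 | 3.83 to 6.22 | 15 | 91.35 to 279.85 | 14 | 2.30e-11 to 0.010 | 12 | 1.37e-4 to 3.93 |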

Among the four chemical classes considered in this analysis, PCBs had the most available measured data for all four properties. In the curated dataset compiled for this analysis, the primary data sources for Kow of PCBs were two studies conducted with the slow-stir method, which is the recommended method for highly hydrophobic chemicals. The values from one of these studies, reported in De Bruijn et al. (1989), were included in the PHYSPROP database and two critical literature reviews for PCB property values (Shiu and Mackay, 1986, Li et al., 2003); however, the measured Kow values in the other study, reported by Jabusch and Swackhamer (2005), were not included in these earlier compilations. For the 23 congeners that were included in both our dataset and PHYSPROP, the Kow values were within 0.2 log units of each other with the exception of four congeners. For PCB-8, the value in our dataset was about 0.5 log units higher than the value in PHYSPROP; conversely, the Kow values for PCB-52, PCB-155, and PCB-194 were at least 0.5 log units lower than the PHYSPROP values. The primary sources for the vapor pressure and solubility values included in our dataset were the recommended values in the critical review by Li et al. (2003). A comparison of the 14 PCB congeners that had vapor pressure values reported in both PHYSPROP and Li et al. (2003) showed that the PHYSPROP values were consistently higher than the values recommended by Li et al. (2003). For water solubility, there were 16 PCB congeners with available values in both PHYSPROP and the review by Li et al. (2003); the PHYSPROP values were greater than the recommended values for 9 of the congeners and less than the recommended values for 7 of the congeners.

In the curated dataset compiled for this analysis, the primary data source for Kow of PBDEs was a study by Braekevelt et al. (2003), which reported slow-stir measurements for Kow of nine PBDE congeners. The primary data source for PBDE water solubility measurements was Tittlemier et al. (2002), and the vapor pressure values were obtained from Tittlemier et al. (2002), Wong et al. (2001) and Fu and Suuberg (2011). The PHYSPROP database contained very few PBDE property values, with only one Kow value (for PBDE-55), one water solubility value (PBDE-209), four melting points, and eight vapor pressure measurements. Comparing the vapor pressure values for the seven PBDE congeners that were included in both PHYSPROP and our dataset, the PHYSPROP values were lower than the values we compiled for PBDE congeners with four or fewer bromines; however, the PHYSPROP values were higher than the values in our dataset for PBDE-138 and PBDE-190.

For PCDDs, the majority of the Kow values in our dataset were obtained from slow-stir measurements by Sijm et al. (1989). Many of these values were also included in the PHYSPROP database, along with measurements made with other analytical procedures. The two primary sources of PCDD vapor pressure measurements in our dataset were measured values by Rordorf (1989) and Li et al. (2004). The Rordorf (1989) values were also included in the PHYSPROP database. Unfortunately, the PHYSPROP database also included predicted values reported by Rordorf (1989), which were incorrectly coded as experimental values in the database. For two of these molecules, 1,3,6,8-tetrachlorodibenzo-p-dioxin and 1,2,4,7,8-pentachlorodibenzo-p-dioxin, measured vapor pressures published by Li et al. (2004) were approximately an order of magnitude larger than the predicted values from Rordorf (1989). Additionally, for 2-chlorodibenzo-p-dioxin, the value attributed to Rordorf (1989) in PHYSPROP is an order of magnitude lower than the reported measured value in the original paper. Other than these three discrepancies, the PHYSPROP vapor pressure values were generally within a factor of two of the values in our dataset. Sources of the water solubility measurements in our dataset included Friesen et al. (1985), Shiu et al. (1988), Doucette and Andren (1988) and Oleszek-Kudlak et al. (2007). In general, the PCDD water solubility measurements in the PHYSPROP database were within 25% of the values in our dataset; however, the PHYSPROP water solubility for octachlorodibenzo-p-dioxin was approximately twice the value in our dataset.

For PAHs, the Kow, water solubility and vapor pressure measurements in our dataset were in good agreement with the values in PHYSPROP. The primary source of both Kow and water solubility values in our dataset was de Maagd et al. (1998), with additional water solubility values taken from Mackay and Shiu (1977). Both our dataset and PHYSPROP included vapor pressure values reported by Sonnefeld et al. (1983) for three of the PAHs; however, for eight PAHs, our dataset includes measured values from Fu and Suuberg (2011) and Odabasi et al. (2006), which were not available when the PHYSPROP database was assembled.

3.2. Evaluation of calculator performance for low solubility chemicals

Ten different calculation methods for estimating Kow were evaluated against the dataset of measured Kow values for chemicals with high hydrophobicity. Calculator performance varied based on chemical class, as is evidenced by the comparison of RMSE values in Table 3. Benfenati et al. (2003) found a similar dependence of calculator performance on chemical class in their evaluation of log Kow prediction software for a dataset of pesticides. In each column of Table 3, cells were shaded green for the RMSE values in the lowest third of the RMSE values for that chemical class and yellow for the RMSE values in the middle third. The calculators with the best performance (lowest RMSE) were ALOGPS 2.1 for PCBs, and OPERA for PBDEs, PCDDs, and PAHs. The ACD/Labs, ALOGPS 3.0, and OPERA calculators, as well as the arithmetic and geometric means and the median of all calculated Kow values, gave consistently good performance, with none of the RMSE values in the highest third of the RMSE values for the four chemical classes. Table 3 shows that these three calculators, as well as the geometric mean and median of the calculated values, had RMSE less than 0.4 for all four classes of chemicals.

Table 3.

RMSE for Comparison of Calculated to Measured Values of log Kow for Curated Dataset.

Calculator PCBs PBDEs PCDDs PAHs
ACD/Labs 0.374 0.162 0.202 0.232
ChemAxon: KLOP 0.572 0.387 0.530 0.759
ChemAxon: PHYS 0.593 0.327 0.554 0.854
ChemAxon: VG 0.499 0.322 1.253 0.671
EPI Suite 0.606 0.550 0.891 0.290
NICEATM 0.876 1.487 1.212 1.331
OCHEM: ALOGPS 2.1 0.268 0.679 0.472 0.329
OCHEM: ALOGPS 3.0 0.305 0.377 0.369 0.307
OPERA 0.299 0.115 0.184 0.209
SPARC 0.696 0.754 0.552 0.353
Mean Calculated Kow 0.490 0.293 0.364 0.247
Geometric Mean (Mean of log Kow) 0.366 0.187 0.243 0.347
Median Calculated Kow 0.309 0.134 0.183 0.235
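
The three consensus rows in Table 3 can be illustrated with a minimal sketch. It assumes that each calculator returns a log Kow value, that the arithmetic mean is taken over the Kow values and then converted back to log units, and that the geometric mean is the arithmetic mean of the log Kow values (as the row label indicates). The function names and the example predictions are hypothetical and are not taken from the curated dataset.

```python
import math
import statistics


def consensus_log_kow(log_kow_predictions):
    """Return consensus estimates (in log10 units) from several calculators."""
    kow_values = [10.0 ** v for v in log_kow_predictions]
    return {
        # Arithmetic mean of Kow, reported in log10 units; it is pulled
        # toward the largest prediction.
        "mean": math.log10(statistics.fmean(kow_values)),
        # The geometric mean of Kow equals the arithmetic mean of log Kow.
        "geometric_mean": statistics.fmean(log_kow_predictions),
        "median": statistics.median(log_kow_predictions),
    }


def rmse(calculated, measured):
    """Root mean square error between paired calculated and measured values."""
    if len(calculated) != len(measured):
        raise ValueError("calculated and measured must have the same length")
    squared = [(c - m) ** 2 for c, m in zip(calculated, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical log Kow predictions for one chemical from five calculators,
# one of which is a high outlier.
predictions = [6.2, 6.4, 6.5, 6.7, 8.0]
estimates = consensus_log_kow(predictions)
```

In this hypothetical case the single high prediction raises the arithmetic mean well above the geometric mean and the median, which is one plausible reason why the latter two behave more robustly when the calculators disagree.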

Plots of the calculated vs. measured log Kow values in Fig. 1 show that the spread in the predicted values from the different calculators increases as log Kow increases. Additionally, while the log Kow values are spread roughly evenly around the identity line (y = x) for lower values of log Kow, the predicted log Kow values are skewed higher than the measured values for PCBs with log Kow values above 7.5. For example, for decachlorobiphenyl, the PCB with the highest log Kow, all of the calculators except NICEATM overestimate the value of Kow. In Fig. 1c, there is limited variability in the calculated and measured values plotted for PCDDs, because slow-stir measurements were only available for a subset of the tetra- and penta-chlorodibenzo-p-dioxins, with measured values ranging between 6.2 and 6.7. The Åberg et al. (2008) review of PCDD property values includes additional Kow measurements made with the RP-HPLC method; however, our analysis of these values indicates that the RP-HPLC measurements are at least a half log unit higher than the values obtained with the slow-stir method. The limited availability of Kow values measured with the slow-stir method for the more highly chlorinated PCDDs constrains the development and evaluation of predictive models for the Kow of PCDDs.

fig. 1.

Comparison of calculated versus measured values of log Kow for PCBs, PBDEs, PCDDs and PAHs. Line delineates 1:1 relationship (y = x).

Among the nine methods compared for estimating the melting point, the T.E.S.T. Hierarchical method performed best for PCBs and PCDDs, while the Tetko et al. (2014) E-State model performed best for PBDEs, and T.E.S.T. FDA performed best for PAHs (Table 4). The melting point estimation methods that gave consistently good performance across all four chemical classes were NICEATM, the Tetko et al. (2014) E-State model, and the geometric mean and median of all calculated values. Fig. 2 shows that almost all of the calculators underpredict the melting point for those PCBs, PCDDs and PAHs with measured melting points above 200 °C. Fig. 2 also shows that most of the calculators overpredict the melting points of the PBDEs in our dataset. This observation and the relatively high RMSE values for PBDE predictions in Table 4 are likely due to the limited availability of melting points for PBDE congeners in the training sets of these models. The PBDE plot in Fig. 2 demonstrates a potential pitfall with nearest neighbor methods when limited measured data are included in the training set for a particular class. The T.E.S.T. Nearest Neighbor algorithm assigns the same melting point (349 °C) to all tetra-, penta-, hexa-, hepta-, octa- and nona-bromodiphenyl ethers, because this is the average of the melting points of the three molecules in the training set which are determined to be the most similar to all of these PBDEs.
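
The nearest neighbor pitfall described above can be reproduced with a small sketch. It does not use the T.E.S.T. descriptors or similarity measure, which are not given here; instead it assumes a Tanimoto coefficient computed on hypothetical feature sets, and every training value below is invented for illustration.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_neighbor_prediction(query, training_set, k=3):
    """Average the property values of the k most similar training molecules."""
    ranked = sorted(
        training_set,
        key=lambda item: tanimoto(query, item[0]),
        reverse=True,
    )
    neighbors = ranked[:k]
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical training set: (feature set, melting point in deg C).
# Only three entries resemble a brominated diphenyl ether.
training = [
    ({"aryl_ether", "Br", "biphenyl_like"}, 310.0),
    ({"aryl_ether", "Br"}, 350.0),
    ({"Br", "biphenyl_like"}, 390.0),
    ({"Cl", "alkane"}, -20.0),
    ({"OH", "alkane"}, 10.0),
]

# Hypothetical queries: congeners that differ only in bromine count.
tetra = {"aryl_ether", "Br", "biphenyl_like", "Br4"}
nona = {"aryl_ether", "Br", "biphenyl_like", "Br9"}

mp_tetra = nearest_neighbor_prediction(tetra, training)
mp_nona = nearest_neighbor_prediction(nona, training)
```

Because both queries select the same three neighbors, they receive the same predicted melting point even though the congeners differ in their degree of bromination.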

Table 4.

RMSE for comparison of calculated to measured values of melting point (°C) for curated dataset.

Calculator PCBs PBDEs PCDDs PAHs
EPI Suite 43.79 56.24 62.47 61.91
NICEATM 36.63 48.75 11.30 42.67
OCHEM: dan2097 35.62 30.96 39.44 52.59
OCHEM: itetko 34.87 29.92 33.18 48.64
OPERA 32.43 80.59 13.13 40.21
TEST: FDA 38.45 49.36 14.92 39.64
TEST: Group Contribution 45.59 59.32 47.78 55.24
TEST: Hierarchical 25.50 64.53 5.80 61.14
TEST: Nearest Neighbor 48.95 196.05 42.71 50.80
Mean Calculated MP 30.85 61.83 22.20 42.87
Geometric Mean of Calculated MP 31.01 54.19 23.92 43.72
Median Calculated MP 29.98 54.28 22.75 43.43

Note: Cells are shaded green for values in the lowest third of the RMSE values for the chemical class and yellow for values in the middle third of the RMSE values for the chemical class.

fig. 2.

Comparison of calculated versus measured values of melting point (in °C) for PCBs, PBDEs, PCDDs and PAHs. Line delineates 1:1 relationship (y = x).

Of the eight methods compared for estimating the vapor pressure, EPI Suite performed best for PCBs and PBDEs, and SPARC performed best for PAHs. For PCDDs, the average of the calculated vapor pressure values had the lowest RMSE, and among the individual calculators, NICEATM had the lowest RMSE. Across all four chemical classes, Table 5 indicates that the vapor pressure estimation methods that gave consistently good performance were SPARC, the geometric mean of all calculated values, and the median of all calculated values. For solids, both EPI Suite and SPARC estimate the solid vapor pressure from the supercooled liquid vapor pressure using a correction term based on the melting point supplied by the user. Kühne et al. (1995) previously demonstrated that the inclusion of such a term improves predictive performance for the estimation of water solubility of solids. The fact that EPI Suite and SPARC generally had lower RMSEs than the other algorithms for estimating vapor pressure suggests that this approach also gives more accurate prediction of vapor pressure, provided that a reliable melting point is available.
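
A melting point correction of this kind can be sketched as follows. The exact term used by EPI Suite or SPARC is not stated here, so this example assumes one commonly used form: the fugacity ratio with a constant entropy of fusion from Walden's rule (about 56.5 J mol⁻¹ K⁻¹), which gives ln(P_solid / P_liquid) = −6.79 (T_m / T − 1). The function name and the constant name are hypothetical.

```python
import math

# Assumed entropy of fusion divided by the gas constant (Walden's rule).
WALDEN_DS_OVER_R = 6.79


def solid_vapor_pressure(p_liquid, melting_point_c, temperature_c=25.0):
    """Estimate the solid vapor pressure from the supercooled liquid value.

    The result has the same units as p_liquid.
    """
    t_m = melting_point_c + 273.15
    t = temperature_c + 273.15
    if t_m <= t:
        # The chemical is a liquid at this temperature: no correction applies.
        return p_liquid
    fugacity_ratio = math.exp(-WALDEN_DS_OVER_R * (t_m / t - 1.0))
    return p_liquid * fugacity_ratio


# Hypothetical supercooled liquid vapor pressure (torr) and melting points.
p_solid_125 = solid_vapor_pressure(1e-5, 125.0)
p_solid_175 = solid_vapor_pressure(1e-5, 175.0)
```

Under this assumed form, a 50 °C error in the supplied melting point changes the estimated solid vapor pressure at 25 °C by roughly a factor of three, which illustrates why a reliable melting point matters for this approach.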

Table 5.

RMSE for comparison of calculated to measured values of vapor pressure (torr) for curated dataset.

Calculator PCBs PBDEs PCDDs PAHs
EPI Suite 0.000171 0.000202 0.000051 0.002356
NICEATM 0.029109 0.000892 0.000038 0.000334
OPERA 0.000830 0.000300 0.000083 0.001937
SPARC 0.000197 0.000303 0.000053 0.000254
TEST: FDA 0.001458 0.000557 0.000217 0.011250
TEST: Group Contribution 0.002540 0.000325 0.000088 0.002453
TEST: Hierarchical 0.001417 0.000448 0.000202 0.002116
TEST: Nearest Neighbor 0.000313 0.000349 0.000080 0.001997
Mean Calculated VP 0.004395 0.000252 0.000028 0.002136
Geometric Mean of Calculated VP 0.000963 0.000278 0.000042 0.00199
Median Calculated VP 0.001113 0.000262 0.000055 0.002025

Note: Cells are shaded green for values in the lowest third of the RMSE values for the chemical class and yellow for values in the middle third of the RMSE values for the chemical class.

Fig. 3 shows that the spread in the predicted values from the different calculators increases as vapor pressure decreases, particularly for vapor pressures of approximately 10⁻⁹ torr or lower. For PCBs and PBDEs with very low measured vapor pressure, all of the calculators except SPARC overestimate the vapor pressure. This can be explained by the fact that the more recent vapor pressure measurements in our curated dataset were lower than the corresponding vapor pressure values in the PHYSPROP database, which was used as a training set for most of the vapor pressure calculators in this analysis.

fig. 3.

Comparison of calculated versus measured values of vapor pressure (in torr) for PCBs, PBDEs, PCDDs and PAHs. Line delineates 1:1 relationship (y = x).

The plots in Fig. 3 further reveal that some of the calculators that perform well in the higher range of vapor pressure do not perform as well in the low range of vapor pressure, particularly for PBDEs and PAHs. The RMSE values in Table 5 do not fully capture this variability in calculator performance as a function of the magnitude of the vapor pressure, because the RMSE was calculated on the basis of absolute errors rather than relative errors. To more fully capture calculator performance across the entire range of vapor pressure values, the RMSE values were recomputed on the basis of differences in the logarithm of vapor pressure values (Table SI-3). This comparison confirmed that the median and the average of the logarithm of the calculated vapor pressure (i.e., the geometric mean) gave consistently good predictive performance across all four chemical classes. The individual calculators with the best performance based on differences in the logarithms of the calculated versus measured vapor pressure values were EPI Suite, the T.E.S.T. Group Contribution method, NICEATM and OPERA for PCBs, PBDEs, PCDDs and PAHs, respectively.
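
The difference between the two RMSE definitions can be shown with a short sketch. The vapor pressures and the two calculators below are hypothetical and were chosen only to show how the ranking can reverse; they are not values from Table 5 or Table SI-3.

```python
import math


def rmse(calculated, measured):
    """RMSE on absolute differences; dominated by the largest values."""
    squared = [(c - m) ** 2 for c, m in zip(calculated, measured)]
    return math.sqrt(sum(squared) / len(squared))


def log_rmse(calculated, measured):
    """RMSE on log10 values; each error is a ratio rather than a difference."""
    return rmse(
        [math.log10(c) for c in calculated],
        [math.log10(m) for m in measured],
    )


# Hypothetical measured vapor pressures (torr) spanning six orders of magnitude.
measured = [1e-3, 1e-6, 1e-9]
# Calculator A: 10% high at the top of the range, 100-fold high at the bottom.
calc_a = [1.1e-3, 1e-6, 1e-7]
# Calculator B: 50% high for every chemical.
calc_b = [1.5e-3, 1.5e-6, 1.5e-9]

absolute_a, absolute_b = rmse(calc_a, measured), rmse(calc_b, measured)
log_a, log_b = log_rmse(calc_a, measured), log_rmse(calc_b, measured)
```

In this example calculator A has the lower absolute RMSE despite a 100-fold error for the least volatile chemical, while the log-based RMSE ranks calculator B as the better of the two.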

Fourteen different methods were compared with respect to their predictive performance for water solubility. Table 6 indicates that the T.E.S.T. FDA method had the lowest RMSE for prediction of the water solubility of PCBs, and the T.E.S.T. Hierarchical method had the lowest RMSE for PCDDs and PAHs. SPARC had the best predictive performance for the water solubility of the PBDEs in our dataset. The water solubility estimation methods that gave consistently good performance across all four chemical classes were the WATERNT model in EPI Suite, the ALOGPS 3.0 model, SPARC, the T.E.S.T. Hierarchical model, and the geometric mean and median of all calculated values.

Table 6.

RMSE for comparison of calculated to measured values of water solubility (mg L⁻¹) for curated dataset.

Calculator PCBs PBDEs PCDDs PAHs
ACD/Labs 0.317 0.468 0.281 2.928
ChemAxon 0.947 0.240 7.704 1.926
EPI Suite: WATERNT 0.317 0.350 0.066 0.257
EPI Suite: WSKOW 0.437 0.029 0.375 1.835
NICEATM 0.998 1.427 2.046 2.423
OCHEM: ALOGPS 2.1 0.377 0.354 48.721 0.331
OCHEM: ALOGPS 3.0 0.340 0.040 0.175 1.204
OCHEM: dan2097 0.420 0.290 1.135 1.369
OPERA 0.171 2.611 0.814 0.546
SPARC 0.176 0.024 0.645 0.790
TEST: FDA 0.071 7.695 0.264 7.143
TEST: Group Contribution 2.360 0.353 0.450 7.196
TEST: Hierarchical 0.197 0.083 0.066 0.168
TEST: Nearest Neighbor 0.394 0.046 0.238 0.851
Mean Calculated WS 0.432 0.980 4.395 1.869
Geometric Mean of Calculated WS 0.220 0.157 0.450 1.114
Median Calculated WS 0.312 0.310 0.375 1.261

Note: Cells are shaded green for values in the lowest third of the RMSE values for the chemical class and yellow for values in the middle third of the RMSE values for the chemical class.

As was the case for vapor pressure, Fig. 4 shows that the spread in the predicted values from the different calculators increases as water solubility decreases. Almost all of the calculators overpredicted the water solubility for the PCDD congeners in our dataset and for PAHs with a measured water solubility less than about 0.01 mg L⁻¹; however, the predicted values are spread roughly evenly around the identity line (y = x) for PCBs and PBDEs. The RMSE values were recomputed on the basis of differences in the logarithms of calculated and measured water solubility values (Table SI-4) to better evaluate calculator performance across the entire range of water solubility values. The comparison based on log-based RMSE showed that the ALOGPS 3.0 model had the lowest RMSE for PBDEs and PCDDs and had the second and third lowest RMSE values for PCBs and PAHs, respectively. Other calculators with good performance based on differences in logarithms of calculated versus measured water solubility values included the T.E.S.T. Hierarchical model for PCBs and OPERA for PAHs. Once again, the median and the average of the logarithm of the calculated water solubility (i.e., the geometric mean) gave consistently good predictive performance across all four chemical classes.

fig. 4.

Comparison of calculated versus measured values of water solubility (in mg L⁻¹) for PCBs, PBDEs, PCDDs and PAHs. Line delineates 1:1 relationship (y = x).

The comparison of the RMSE values in Tables 3–6 reveals that while individual calculators may perform very well for one of the chemical classes, no single calculator outperforms the other calculators for all four classes. When all four chemical classes are combined in a single dataset, the lowest RMSE values for predictions of log Kow, melting point, vapor pressure and water solubility were obtained from OPERA, the Tetko et al. (2014) E-State model, SPARC and the T.E.S.T. Hierarchical model, respectively (Table SI-5 in the Supplementary Data). As was the case when the classes were considered separately, both the geometric mean and the median of the calculated values can be considered robust estimates of the property values for the combined dataset. The RMSE for the median calculated property values ranks as the second and fourth lowest RMSE for Kow and melting point, respectively, while the RMSE for the geometric mean of the calculated property values ranks as the fourth and fifth lowest RMSE for vapor pressure and water solubility, respectively.

3.3. Model applicability domain

The OPERA and T.E.S.T. calculators include algorithms to assess whether or not the predicted property value lies in the applicability domain of the model. These assessments are based on the similarity of the molecule of interest to the chemicals included in the model training set (Tropsha and Golbraikh, 2010). When these calculators are run for one chemical at a time, the output reports provide both quantitative and qualitative metrics to assess whether or not the prediction lies within the applicability domain of the model. The OPERA model report provides a Boolean answer (yes/no) as to whether or not the prediction lies within the global applicability domain, as well as a numeric local applicability domain index. The global applicability domain metric is based on the distance between the chemical of interest and the centroid of the training set in multidimensional chemical space, while the local index is based on the distance between the chemical of interest and its five nearest neighbors. To assess whether or not the prediction is within the applicability domain of each method implemented in the T.E.S.T. application, the T.E.S.T. model report compares the mean absolute error (MAE) of the entire training set to the MAE of the training set chemicals with a similarity coefficient greater than 0.5 when compared to the chemical of interest. If the MAE for the set of similar chemicals is less than the MAE for the entire dataset, then the prediction is considered to be within the applicability domain. Conversely, if the MAE for the similar chemicals is larger than that of the entire dataset, the prediction is considered less reliable. Unfortunately, T.E.S.T. does not provide an applicability domain assessment in batch mode, and the OPERA applicability domain results are not available through batch download from the CompTox Chemistry Dashboard.
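
The MAE comparison described above for T.E.S.T. can be written as a minimal sketch. It assumes that a similarity coefficient to the chemical of interest and a prediction error are already available for each training set chemical; the function names and the input layout are hypothetical and do not reflect the T.E.S.T. implementation.

```python
def mean_absolute_error(errors):
    """Mean of the absolute values of a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)


def within_applicability_domain(training, similarity_threshold=0.5):
    """Apply the MAE comparison described in the text.

    `training` is a list of (similarity_to_query, prediction_error) pairs.
    Returns None when no training chemical exceeds the threshold. The text
    does not say how a tie is handled, so equal MAEs are treated as outside.
    """
    similar = [err for sim, err in training if sim > similarity_threshold]
    if not similar:
        return None
    mae_all = mean_absolute_error([err for _, err in training])
    mae_similar = mean_absolute_error(similar)
    return mae_similar < mae_all


# Hypothetical training sets: similar chemicals are predicted well in the
# first case and poorly in the second.
well_covered = [(0.9, 0.1), (0.8, -0.2), (0.3, 1.0), (0.2, -0.9)]
poorly_covered = [(0.9, 1.5), (0.7, -1.2), (0.1, 0.1), (0.2, 0.2)]

inside = within_applicability_domain(well_covered)
outside = within_applicability_domain(poorly_covered)
```

A check of this form depends only on how well the model performs for similar training chemicals, so a prediction can pass it and still deviate substantially from the measured value.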

While we were not able to readily obtain the applicability domain assessment for all molecules in our dataset, we ran individual predictions for the molecules with the largest deviation between the logarithms of the predicted and measured values for each combination of property and chemical class. For this particular dataset of molecules, the applicability domain algorithms implemented in OPERA and T.E.S.T. indicate most of the predictions are within the applicability domain for these models, even for the molecules with the largest logarithmic difference between the calculated and measured property values (Table SI-6). The output from the OPERA model indicates that the predictions are within the applicability domain for all of the 16 property/chemical class pairs except for the vapor pressure predictions for the PBDE and PAH molecules with the largest logarithmic deviation between the predicted and measured vapor pressure values. For the four methods implemented in the T.E.S.T. application, the predictions for the molecules with the largest logarithmic deviation between the predicted and measured values are identified as being outside the applicability domain for 42% of the molecules and within the applicability domain for 58% of the molecules. For the T.E.S.T. model, the property with the largest fraction of molecules outside of the applicability domain is the melting point. The T.E.S.T. model output indicates that melting point predictions are outside of the applicability domain for all four calculation methods for the PBDE, PCDD and PAH molecules with the largest logarithmic deviation between predicted and measured melting points.

4. Conclusions

The results of this comparison of calculator performance demonstrate the value of applying a variety of calculators that use different estimation algorithms to estimate the physicochemical property values for a chemical of interest. For a list of structurally related chemicals, a comparison of calculator performance against available measured property values can be used to select the calculator with the best performance for a particular property. Without adequate data to perform a rigorous comparison of individual calculator performance against measured data, the geometric mean and the median of the predicted values from multiple calculators provide more robust estimates of the property value than any individual calculator. Web-based calculator platforms (e.g., OCHEM) and the availability of pre-calculated property values through web-based data platforms (e.g., EPA’s CompTox Chemistry Dashboard) can facilitate the retrieval of predicted property values from multiple calculators. The implementation of calculators as web services can further facilitate the use of property estimations from multiple calculators by enabling software applications to obtain estimated property values directly from the calculator without user involvement.

In this analysis, the deviation between calculated and measured property values was observed to increase for very high Kow and melting point and for very low vapor pressure and water solubility. In these ranges of the property values, measurement of the properties becomes more challenging. This suggests that the predictive capability of physicochemical property calculators is largely constrained by the limited availability of accurate measured values in these marginal ranges of the property values. Many of the software applications considered in this analysis are still being refined and improved; therefore, it is likely that calculator performance will improve as the models are re-trained on curated datasets of property values for chemicals of low solubility and high hydrophobicity.

Acknowledgements

The views expressed in this paper are those of the authors and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency (EPA). Mention of trade names or products does not convey, and should not be interpreted as conveying, official EPA approval, endorsement or recommendation. This research was supported in part by an appointment to the Internship/Research Participation Program at the National Exposure Research Laboratory administered by the Oak Ridge Institute for Science and Education through Interagency Agreement No. DW-922983301–01 between the U.S. Department of Energy and the U.S. EPA.

References

  1. Åberg A, MacLeod M, Wiberg K 2008. Physical-Chemical property data for dibenzo-p-dioxin (DD), dibenzofuran (DF), and chlorinated DD/Fs: A critical review and recommended values. J. Phys. Chem. Ref. Data 37(4): 1997–2008.
  2. Bradley J-C, Williams A, Lang A 2014. Jean-Claude Bradley open melting point dataset. figshare. 10.6084/m9.figshare.1031637.v2 Retrieved: 20:55, August 15, 2017 (GMT).
  3. Braekevelt E, Tittlemeier SA, Tomy GT 2003. Direct measurement of octanol–water partition coefficients of some environmentally relevant brominated diphenyl ether congeners. Chemosphere 51: 563–567.
  4. Contrera JF, Matthews EJ, Benz RD 2003. Predicting the carcinogenic potential of pharmaceuticals in rodents using molecular structural similarity and E-state indices. Reg. Toxicol. Pharmacol 38: 243–259.
  5. Dalby A, Nourse JG, Hounshell WD, Gushurst AKI, Grier DL, Leland BA, Laufer J 1992. Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited. J. Chem. Inf. Comput. Sci 32(3): 244–255.
  6. De Bruijn J, Busser F, Seinen W, Hermens J 1989. Determination of octanol/water partition coefficients for hydrophobic organic chemicals with the “slow-stirring” method. Environ. Toxicol. Chem 8(6): 499–512.
  7. de Maagd PG-J, ten Hulscher DTEM, van den Heuvel H, Opperhuizen A, Sijm DTHM 1998. Physicochemical properties of polycyclic aromatic hydrocarbons: Aqueous solubilities, n-octanol/water partition coefficients, and Henry’s law constants. Environ. Toxicol. Chem 17(2): 251–257.
  8. Doucette WJ, Andren AW 1988. Aqueous solubility of selected biphenyl, furan, and dioxin congeners. Chemosphere 17: 243–252.
  9. Finizio A, Vighi M, Sandroni D 1997. Determination of n-octanol-water partition coefficient (Kow) of pesticide: Critical review and comparison of methods. Chemosphere 34: 131–161.
  10. Friesen KJ, Sarna LP, Webster GRB 1985. Aqueous solubility of polychlorinated dibenzo-p-dioxins determined by high pressure liquid chromatography. Chemosphere 14: 1267–1274.
  11. Fu J, Suuberg EM 2011. Vapor pressure of solid polybrominated diphenyl ethers determined via Knudsen effusion method. Environ. Toxicol. Chem 30(10): 2216–2219.
  12. Guha R, Willighagen E 2012. A survey of quantitative descriptions of molecular structure. Curr. Top. Med. Chem 12: 1946–1956.
  13. Hansch C, Quinlan JE, Lawrence GL 1968. The linear free-energy relationship between partition coefficients and the aqueous solubility of organic liquids. J. Org. Chem 33: 347–350.
  14. Hansch C, Leo AJ 1979. Substituent Constants for Correlation Analysis in Chemistry and Biology; Wiley: New York.
  15. Heller S, McNaught A, Stein S, Tchekhovskoi D, Pletnev I 2013. InChI - the worldwide chemical structure identifier standard. J. Cheminform 5: 7.
  16. Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D 2015. InChI, the IUPAC international chemical identifier. J. Cheminform 7: 23.
  17. Hilal SH, Carreira LA, Karickhoff SW 2003. Prediction of the vapor pressure, boiling point, heat of vaporization and diffusion coefficient of organic compounds. QSAR & Combinatorial Science 22: 565–574.
  18. Hilal SH, Carreira LA, Karickhoff SW 2004. Prediction of the solubility, activity coefficient, gas/liquid and liquid/liquid distribution coefficients of organic compounds. QSAR & Combinatorial Science 23: 709–720.
  19. Hou TJ, Xia K, Zhang W, Xu XJ 2004. ADME evaluation in drug discovery. 4. prediction of aqueous solubility based on atom contribution approach. J. Chem. Inf. Comput. Sci 44: 266–275.
  20. Jabusch TW, Swackhamer DL 2005. Partitioning of polychlorinated biphenyls in octanol/water, triolein/water, and membrane/water systems. Chemosphere 60(9): 1270–1278.
  21. Jones KC, de Voogt P 1999. Persistent organic pollutants (POPs): state of the science. Environ. Pollut 100: 209–221.
  22. Karickhoff SW, Brown DS, Scott TA 1979. Sorption of hydrophobic pollutants on natural sediments. Water Res 13: 241–248.
  23. Klamt A, Jonas V, Bürger T, Lohrenz JC 1998. Refinement and parametrization of COSMO-RS. J. Phys. Chem. A 102(26): 5074–5085.
  24. Klopman G, Li J-Y, Wang S, Dimayuga M 1994. Computer automated log P calculations based on an extended group contribution approach. J. Chem. Inf. Comput. Sci 34: 752–781.
  25. Kochev NT, Paskaleva VH, Jeliazkova N 2013. Ambit-Tautomer: An open source tool for tautomer generation. Mol. Inform 32: 481–504.
  26. Li N, Wania F, Lei YD, Daly GL 2003. A comprehensive and critical compilation, evaluation, and selection of physical–chemical property data for selected polychlorinated biphenyls. J. Phys. Chem. Ref. Data 32(4): 1545–1590.
  27. Li X-W, Shibata E, Kasai E, Nakamura T 2004. Vapor pressures and enthalpies of sublimation of 17 polychlorinated dibenzo-p-dioxins and five polychlorinated dibenzofurans. Environ. Toxicol. Chem 23(2): 348–354.
  28. Mackay D, Shiu WY 1977. Aqueous solubility of polynuclear aromatic hydrocarbons. J. Chem. Eng. Data 22: 399–402.
  29. Mansouri K, Grulke CM, Richard AM, Judson RS, Williams AJ 2016. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR and QSAR in Environmental Research 27: 911–937.
  30. Martin TM, Harten P, Venkatapathy R, Das S, Young DM 2008. A hierarchical clustering methodology for the estimation of toxicity. Toxicology Mechanisms and Methods 18: 251–266.
  31. Martin TM, Harten P, Young DM, Muratov EN, Golbraikh A, Zhu H, Tropsha A 2012. Does rational selection of training and test sets improve the outcome of QSAR modeling? J. Chem. Inf. Model 52(10): 2570–2578.
  32. Martin TM, Young DM 2001. Prediction of the acute toxicity (96-h LC50) of organic compounds in the fathead minnow (Pimephales Promelas) using a group contribution method. Chem. Res. Toxicol 14: 1378–1385.
  33. Meylan WM, Howard PH 1995. Atom/Fragment contribution method for estimating octanol-water partition coefficients. Journal of Pharmaceutical Sciences 84: 83–92.
  34. Meylan WM, Howard PH, Boethling RS 1996. Improved method for estimating water solubility from octanol/water partition coefficient. Environ. Toxicol. Chem 15: 100–106.
  35. O’Boyle NM, Guha R, Willighagen EL, Adams SE, Alvarsson J, Bradley JC, Filippov IV, Hanson RM, Hanwell MD, Hutchison GR, James CA, Jeliazkova N, Lang AS, Langner KM, Lonie DC, Lowe DM, Pansanel J, Pavlov D, Spjuth O, Steinbeck C, Tenderholt AL, Theisen KJ, Murray-Rust P 2011. Open data, open source and open standards in chemistry: The blue obelisk five years on. J. Cheminform 3(1): 37.
  36. Odabasi M, Cetin E, Sofuoglu A 2006. Determination of octanol–air partition coefficients and supercooled liquid vapor pressures of PAHs as a function of temperature: Application to gas–particle partitioning in an urban atmosphere. Atmos. Environ 40: 6615–6625.
  37. OECD. 2006. OECD guidelines for the testing of chemicals: 123 partition coefficient (1-Octanol/Water): Slow-Stirring Method. Paris.
  38. Oleszek-Kudlak S, Shibata E, Nakamura T 2007. Solubilities of selected PCDDs and PCDFs in water and various chloride solutions. J. Chem. Eng. Data 52: 1824–1829.
  39. Petrauskas AA, Kolovanov EA 2000. ACD/Log P method description. Perspect. Drug Discov 19(1): 99–116.
  40. Richardson SD, Kimura SY 2016. Water analysis: emerging contaminants and current issues. Anal. Chem 88(1): 546–582.
  41. Rordorf BF 1989. Prediction of vapour pressures, boiling points and enthalpies of fusion for twenty-nine halogenated dibenzo-p-dioxins and fifty-five dibenzofurans by a vapor pressure correlation method. Chemosphere 18: 783–788.
  42. Shiu WY, Doucette W, Gobas FAPC, Andren A, Mackay D 1988. Physical-chemical properties of chlorinated dibenzo-p-dioxins. Environ. Sci. Technol 22: 651–658.
  43. Shiu WY, Mackay D 1986. A critical review of aqueous solubilities, vapor pressures, Henry’s Law [Google Scholar]
  44. Constants, and octanol-water partition coefficients of the polychlorinated biphenyls. J. Phys. Chem. Ref. Data 15(2): 911–929. [Google Scholar]
  45. Sijm DTHM, Wever H, de Vries PJ, Opperhuizen A 1989. Octan-1-ol / water partition coefficients of polychlorinated dibenzo-p-dioxins and dibenzofurans: Experimental values determined with a stirring method. Chemosphere. 19(1–6): 263–266. [Google Scholar]
  46. Sonnefeld WJ, Zoller WH, May WE 1983. Dynamic coupled-column liquid chromatographic determination of ambient temperature vapor pressures of polynuclear aromatic hydrocarbons. Anal. Chem 55: 275–280. [Google Scholar]
  47. Svetnik V Liaw A. Tong C. Culberson JC, Sheridan RP, Feuston BP. 2003. Random Forest: A classification and regression tool for compound classification and QSAR modeling. J. Chem. Inf. Comput. Sci 43: 1947–1958. [DOI] [PubMed] [Google Scholar]
  48. Syracuse Research Corporation. Physical/Chemical Property Database (PHYSPROP); SRC Environmental Science Center: Syracuse, NY, 1994. http://esc.syrres.com/interkow/pp1357.htm [Google Scholar]
  49. Tebes-Stevens CL, Patel JM, Jones WJ, Weber EJ 2017. Prediction of hydrolysis products of organic chemicals under environmental pH conditions. Environ. Sci. Technol 51: 5008–5016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  50. Tetko IV, Sushko Y, Novotarskyi S, Patiny L, Kondratov I, Petrenko AE, Charochkina L, Asiri AM 2014. How accurately can we predict the melting points of drug-like compounds? J. Chem. Inf. Model, 54: 3320–3329. [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Tetko I, Tanchuk VY, Villa AEP 2001. Prediction of n-octanol/water partition coefficients from PHYSPROP database using artificial neural networks and E-state indices, J. Chem. Inf. Comput. Sci 41: 1407–1421. [DOI] [PubMed] [Google Scholar]
  52. Tetko IV, Tanchuk VY 2002. Application of associative neural networks for prediction of lipophilicity in ALOGPS 2.1 program. J. Chem. Inf. Comput. Sci 42: 1136–1145. [DOI] [PubMed] [Google Scholar]
  53. Tetko IV, Williams AJ, Lowe D 2016. The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS, J. Cheminform 8:2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  54. Tittlemeier SA, Halldorson T, Stern GA, Tomy GT 2002. Vapor pressures, aqueous solubilities, and Henry’s Law Constants of some brominated flame retardants. Environ. Toxicol. and Chem 21(9): 1804–1810. [PubMed] [Google Scholar]
  55. Tolls J, Bodo K, De Felip E, Dujardin R, Kim YH, Moeller-Jensen L, Mullee D, Nakajima A, Paschke A, Pawliczek J-B, Schneider J, Tadeo J-L, Tognucci AC, Webb J, Zwijzen AC 2003. Slow-stirring method for determining the n-octanol/water partition coefficient (Pow) for highly hydrophobic chemicals: performance evaluation in a ring test. Environ. Toxicol. Chem 22: 1051–1057. [PubMed] [Google Scholar]
  56. Tropsha A 2010. Best practices for QSAR model development, validation, and exploitation. Mol. Inform 29(6‐7): 476–488. [DOI] [PubMed] [Google Scholar]
  57. Tropsha A, Golbraikh A 2010. “Predictive quantitative structure-activity relationships modeling.” Handbook of chemoinformatics algorithms 33: 211. [Google Scholar]
  58. U.S. EPA. 1996. Product Properties Test Guidelines 830.7950: Vapor Pressure, EPA/712/C-96/043. United States Environmental Protection Agency, Washington, DC, USA. [Google Scholar]
  59. U.S. EPA. 1998a. Product Properties Test Guidelines 830.7840 Water Solubility: Column Elution Method; Shake Flask Method, EPA/712/C-98/041. United States Environmental Protection Agency, Washington, DC, USA. [Google Scholar]
  60. EPA US. 1998b. Product Properties Test Guidelines 830.7860. Water Solubility, Generator Column Method, EPA/712/C-98/042. United States Environmental Protection Agency, Washington, DC, USA. [Google Scholar]
  61. U.S. EPA. 2012. Estimation Programs Interface Suite™ for Microsoft® Windows, v 4.11. United States Environmental Protection Agency, Washington, DC, USA. [Google Scholar]
  62. U.S. EPA. 2016. User’s Guide for T.E.S.T. (version 4.2) (Toxicity Estimation Software Tool): A Program to Estimate Toxicity from Molecular Structure, EPA/600/R-16/058. United States Environmental Protection Agency, Washington, DC, USA.
  63. Viswanadhan VN, Ghose AK, Revankar GR, Robins RK 1989. Atomic physicochemical parameters for three dimensional structure directed quantitative structure-activity relationships. 4. Additional parameters for hydrophobic and dispersive interactions and their application for an automated superposition of certain naturally occurring nucleoside antibiotics. J. Chem. Inf. Comput. Sci 29: 163–172.
  64. Wania F, Dugani CB 2003. Assessing the long-range transport potential of polybrominated diphenyl ethers: A comparison of four multimedia models. Environ. Toxicol. Chem 22(6): 1252–1261.
  65. Wania F, Mackay D 1996. Tracking the distribution of persistent organic pollutants. Environ. Sci. Technol 30(9): 390A–396A.
  66. Weininger D 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci 28: 31–36.
  67. Weisbrod AV, Burkhard LP, Arnot J, Mekenyan O, Howard PH, Russom C, Boethling R, Sakuratani Y, Traas T, Bridges T, Lutz C, Bonnell M, Woodburn K, Parkerton T 2007. Workgroup report: review of fish bioaccumulation databases used to identify persistent, bioaccumulative, toxic substances. Environ. Health Perspect 115: 255–261.
  68. Wolfe K, Pope N, Parmar R, Galvin M, Stevens C, Weber E, Flaishans J, Purucker T 2016. Chemical Transformation System: Cloud based cheminformatic services to support integrated environmental modeling. Proceedings of the 8th International Congress on Environmental Modelling and Software; Toulouse, France, July 10-14, 2016; ISBN: 978-88-9035-745-9.
  69. Wong A, Lei YD, Alaee M, Wania F 2001. Vapor pressures of the polybrominated diphenyl ethers. J. Chem. Eng. Data 46: 239–242.
  70. Yap CW 2011. PaDEL-descriptor: an open source software to calculate molecular descriptors and fingerprints. J. Comput. Chem 32(7): 1466–1474.
  71. Young D, Martin T, Venkatapathy R, Harten P 2008. Are the chemical structures in your QSAR correct? Mol. Inform 27(11–12): 1337–1345.
  72. Zang Q, Mansouri K, Williams A, Judson R, Allen D, Casey W, Kleinstreuer N 2017. In silico prediction of physicochemical properties of environmental chemicals using molecular fingerprints and machine learning. J. Chem. Inf. Model 57: 36–49.
