EPA Author Manuscripts
Regul Toxicol Pharmacol. Author manuscript; available in PMC: 2020 Feb 1.
Published in final edited form as: Regul Toxicol Pharmacol. 2018 Oct 22;101:12–23. doi: 10.1016/j.yrtph.2018.10.013

Integrating Data Gap Filling Techniques: A Case Study Predicting TEFs for Neurotoxicity TEQs to Facilitate the Hazard Assessment of Polychlorinated Biphenyls

Prachi Pradeep a,b, Laura M Carlson c, Richard Judson b, Geniece M Lehmann c, Grace Patlewicz b
PMCID: PMC6756469  NIHMSID: NIHMS1531137  PMID: 30359698

Abstract

The application of toxic equivalency factors (TEFs) or toxic units to estimate the toxic potency of mixtures of chemicals that contribute to a biological effect through a common mechanism is one approach for filling data gaps. Toxic equivalents (TEQs) have been used to express the toxicity of dioxin-like compounds (i.e., dioxins, furans, and dioxin-like polychlorinated biphenyls (PCBs)) in terms of the most toxic form of dioxin, 2,3,7,8-tetrachlorodibenzo-p-dioxin (2,3,7,8-TCDD). This study sought to integrate two data gap filling techniques, quantitative structure–activity relationships (QSARs) and TEFs, to predict neurotoxicity TEQs for PCBs. Simon et al. (2007) previously derived neurotoxic equivalent (NEQ) values for a dataset of 87 PCB congeners, of which 83 had experimental data. These data were taken from four studies measuring different effects related to neurotoxicity, each of which tested an overlapping subset of the 83 congeners. The goals of the current study were to: (i) evaluate alternative neurotoxic equivalent factor (NEF) derivations from an expanded dataset, relative to those derived by Simon et al., and (ii) develop QSAR models to provide NEF estimates for the large number of untested PCB congeners. The models used multiple linear regression, support vector regression, k-nearest neighbor and random forest algorithms within a 5-fold cross-validation scheme, with position-specific chlorine substitution patterns on the biphenyl scaffold as descriptors. Alternative NEF values were derived, but the resulting QSAR models had relatively low predictivity (RMSE ~0.24), driven mostly by the large uncertainties in the underlying data and NEF values. The derived NEFs, and the QSAR-predicted NEFs used to fill data gaps, should be applied with caution.

Keywords: PCB congeners, neurotoxicity, Toxic equivalency factors (TEFs), QSAR

1. Introduction

Grouping similar compounds together to form chemical categories is a well-established practice in regulatory science, particularly in the context of read-across or other data gap filling techniques. If a grouping approach is applied, endpoint or property data for tested chemicals can be used to estimate the corresponding endpoints or properties for untested chemicals. There are three main data gap filling techniques: read-across, trend analysis and quantitative structure–activity relationships (QSARs) [1–3]. These techniques are described in more detail in associated regulatory technical guidance documents such as the OECD grouping guidance [4]. One of the less common data gap filling techniques is the use of toxic equivalency factors (TEFs). The principal requirement for deriving and using these factors is that the chemicals of interest act via a common mechanism, so the approach is strictly applicable only to mixtures of chemicals that have been formally grouped based on mechanistic considerations. Furthermore, toxicity data relevant to the common mechanism must be available for each component chemical in the mixture, a data requirement that is often not met.

Development of TEFs and the TEF methodology dates back to the 1980s [5]. The TEF approach was introduced to facilitate the hazard assessment of polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs) and dioxin-like polychlorinated biphenyls (PCBs), a group of persistent environmental chemicals that exist in the environment as complex mixtures of congeners [6]. Several PCDDs, PCDFs and PCBs have been shown to cause toxic responses similar to those caused by 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD), the most potent chemical within this group of compounds [5–8]. These dioxin-like responses, including dermal toxicity, immunotoxicity, carcinogenicity in rodents, and adverse effects on reproduction, development and endocrine function, are mediated through the aryl hydrocarbon receptor (AhR), a cytosolic receptor protein present in most vertebrate tissues that has high affinity for some PCDD, PCDF and PCB congeners [9–11]. Data from many studies with mixtures of these compounds were consistent with an additive model (see references cited in [7]), and as a result the toxic equivalency concept was developed. TEFs reflect the relative effect potency (REP) of individual congeners relative to a reference compound; for PCDDs, PCDFs and dioxin-like PCBs, the reference compound is 2,3,7,8-TCDD. Chemicals assessed using the 2,3,7,8-TCDD-based TEFs must meet several criteria: a compound must show a structural relationship to PCDDs and PCDFs, bind to the AhR, and elicit AhR-mediated biochemical and toxic responses. The total toxic equivalence (TEQ) of a mixture is operationally defined as the sum, over all congeners, of each congener's concentration multiplied by its TEF, and is an estimate of the total 2,3,7,8-TCDD-like activity of the mixture (see equations 1 and 2 below and Table 1) [4]. TEF values range from 0 to 1 [5–8].

Table 1:

Illustrating the TEF and TEQ equations 1 and 2

Dioxin-like component Toxic Equivalency Factor (TEF)
PCDD-1 1 (reference compound)
PCDD-2 0.76
PCDD-3 0.1
PCDD-4 0.04
Sample = 20% PCDD-1 + 60% PCDD-2 + 10% PCDD-3 + 10% PCDD-4
TEQ (sample) = (0.2 × 1) + (0.6 × 0.76) + (0.1 × 0.1) + (0.1 × 0.04) ⇨ 0.67
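$$\mathrm{TEF}_{A} = \frac{\text{Reference effect value}}{\text{Component } A \text{ effect value}} \qquad (1)$$
$$\mathrm{TEQ} = \sum_{i} \left( \text{concentration}_{i} \times \mathrm{TEF}_{i} \right) \qquad (2)$$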
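Equations (1) and (2) can be checked against the Table 1 example with a few lines of Python. This is a minimal sketch: the function names `tef` and `teq` are illustrative, and it assumes the effect values in equation (1) are potency metrics such as EC50s, for which a lower value means a more potent compound.

```python
def tef(reference_effect, component_effect):
    """Equation (1): reference effect value / component effect value.

    Assumes effect values are potencies such as EC50s (lower = more potent),
    so a component exactly as potent as the reference gets a TEF of 1.
    """
    return reference_effect / component_effect


def teq(concentrations, tefs):
    """Equation (2): sum over components of concentration x TEF."""
    return sum(conc * tefs[name] for name, conc in concentrations.items())


# Illustrative TEFs and mixture fractions taken from Table 1.
TEFS = {"PCDD-1": 1.0, "PCDD-2": 0.76, "PCDD-3": 0.1, "PCDD-4": 0.04}
SAMPLE = {"PCDD-1": 0.2, "PCDD-2": 0.6, "PCDD-3": 0.1, "PCDD-4": 0.1}

print(round(teq(SAMPLE, TEFS), 2))  # 0.67
# A component whose EC50 is 10x that of the reference:
print(tef(reference_effect=1.0, component_effect=10.0))  # 0.1
```

The mixture's TEQ is dominated by PCDD-2 (0.6 × 0.76 = 0.456), even though PCDD-1 is the more potent component, because PCDD-2 makes up most of the sample.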

Of interest in this case study were PCBs, especially those that do not meet the criteria required for assessment using the 2,3,7,8-TCDD-based TEFs. PCBs consist of a biphenyl scaffold substituted with varying numbers of chlorine atoms, with the chemical formula C12H(10−x)Clx, where x ranges from 1 to 10 [12, 13]. There are 209 unique PCB congeners [12–14], each of which can be uniquely identified by the number and position of chlorine substitutions on the biphenyl scaffold (Figure 1(a)). For example, 2,3’,4’,5-tetrachlorobiphenyl is a congener with four chlorine substitutions on the biphenyl scaffold: chlorines at positions 3’ and 4’ on phenyl ring one and at positions 2 and 5 on phenyl ring two (Figure 1(b)).

Figure 1:

(a) Polychlorinated biphenyl scaffold; positions marked 1–5 and 1’–5’ represent the ortho, meta and para substitution positions on the scaffold. (b) Structure of 2,3’,4’,5-tetrachlorobiphenyl (PCB-70).

PCBs produce adverse effects by several mechanisms associated with the position and number of chlorine substitutions on each congener [15–19]. The best-known mechanism involves binding to the Ah receptor by a set of 12 PCBs with zero or one chlorine substitution at the “ortho” positions (i.e., 2, 2’, 6, and 6’). These congeners are all structurally capable of forming the planar conformation required for docking with the AhR [15–19], and this dioxin-like behavior makes them amenable to the 2,3,7,8-TCDD-based TEF approach. However, PCB congeners with two or more ortho substitutions do not interact with the Ah receptor, because the ortho chlorines create a rotational barrier that prevents the molecule from assuming a planar conformation [15–19]. Several studies have linked these non-planar congeners to neurotoxic outcomes despite their inability to interact with the Ah receptor [16, 17, 20], an observation supported by structure–activity relationship (SAR) and QSAR models [21–23]. Meta (i.e., positions 3, 3’, 5, and 5’) and para (i.e., positions 4 and 4’) substitutions have also been shown to affect the neurotoxic potential of PCBs [20, 21, 24, 25]. Simon et al. [26] proposed an initial neurotoxic equivalence (NEQ) scheme for non-dioxin-like PCB congeners based on various in vitro neurotoxicity data and used it to derive NEQ values, ranging from 0 to 1, for 87 congeners. To be consistent with the terminology of TEFs and TEQs and how these are derived, we refer to the NEQs derived by Simon et al. [26] as neurotoxic equivalent factors (NEFs) throughout the remainder of this article. The experimental data come from four in vitro neurotoxicity assays that measure endpoints including alterations in protein kinase C translocation, changes in dopamine (DA) uptake, and formation of reactive oxygen species (ROS) [17, 22, 28, 29].
In addition to the data compiled by Simon et al. [26], we included data related to dopamine uptake and signaling from Stenberg et al. [18] and Wigestrand et al. [27]. According to Simon et al. [26], the endpoints used in the derivation of NEF values may be related by “similar cellular or biochemical mechanisms, or the endpoints may be separate but occur in parallel fashion and appear to be related on an organismal level”. The derivation of PCB NEFs in this work therefore relies on the rationale of Simon et al. [26] that these endpoints are important mediators of PCB congener neurotoxicity.

The goals of this study were to: (i) evaluate whether alternative NEF values could be derived from an expanded dataset, relative to those derived by Simon et al. [26]; and (ii) evaluate the development of QSAR models, using the alternative NEF values derived in (i) as training data, to predict NEF values for the remaining 122 untested PCB congeners with novel structure-based fingerprints that encode the number and position of chlorine substitutions.

2. Methods

2.1. Dataset

The experimental data were taken from Simon et al. [26], who compiled potency data for effects related to neurotoxicity from four experimental datasets, and from Stenberg et al. [18] and Wigestrand et al. [27]. Potency was measured as EC50 (µM) or IC50 values for all effects except the Stenberg data, which were expressed as a percentage of control uptake at each tested concentration.

  1. Twenty-seven congeners were tested for the first effect, protein kinase C (PKC) translocation measured by [3H]phorbol ester binding (PEB) in rat cerebellar granule cells [28]. PCB congeners can perturb cellular Ca2+ homeostasis and PKC translocation in vitro, and PKC translocation can result from several factors, including a rise in intracellular Ca2+. The authors hypothesized that the in vitro effects of PCB congeners may depend on chlorine substitution at specific positions. To test this, translocation of PKC from the cytosol to the plasma membrane was measured by PEB, which indicates PKC association with the plasma membrane and increases with the concentration of free intracellular Ca2+ [28]. This dataset is referred to as “Kodavanti PEB” in Figure 2.

  2. Thirty-seven congeners were tested for the second effect, microsomal and mitochondrial Ca2+ sequestration in rat cerebellum [29]. An increase in free cytosolic Ca2+ can activate PKC, whose activity depends on intracellular Ca2+. This study postulated that PCB congeners can interfere with intracellular Ca2+ sequestration by inhibiting Ca2+ uptake into mitochondria and microsomes; to test this hypothesis, Ca2+ uptake was measured in isolated rat mitochondria and microsomes [29]. These datasets are referred to as “Kodavanti Mito” and “Kodavanti Micro” in Figure 2.

  3. Twenty-four congeners were tested for the third effect, reduction in dopamine content in PC-12 cells in vitro. PCBs are known to cause decreased dopamine function and altered cognitive function in nonhuman primates [17]. This study used PC-12 cell cultures as an alternative to animal testing to examine the effect of PCBs on dopamine function. The results led to the hypothesis that PCBs exert their neurotoxic effect through regulation of brain dopamine content, in a manner related to chlorine substitution at specific positions [17]. This dataset is referred to as “Shain PC-12 dop” in Figure 2.

  4. Thirty-six congeners were tested for the fourth effect, enhanced ryanodine receptor type 1 (RyR1) activity, using [3H]ryanodine ([3H]Ry) binding analysis [22]. The authors hypothesized that non-dioxin-like PCBs enhance the activity of ryanodine-sensitive Ca2+ channels embedded in the sarco/endoplasmic reticulum (ryanodine receptors, RyRs). Two isoforms, RyR1 and RyR2, predominate in the brain, and toxicity data support a role for RyRs in the neurotoxicity of non-dioxin-like PCBs. This study tested whether PCB congeners could alter RyR activity in [3H]ryanodine binding assays and compared those effects with the functional behavior of the RyRs themselves [22]. After the Simon et al. [26] publication, Holland et al. [24] generated enhanced RyR1 activity data for 14 previously untested congeners using the same experimental protocol as Pessah et al. [22]. These data were also included in our study, except for PCB 208, which had a very low EC50 value in Holland et al. [24] and was considered a potential outlier. This dataset was combined with the Pessah et al. dataset described above and is referred to as “Pessah/Holland RyR1” in Figure 2.

  5. Seventeen PCB congeners were tested for the fifth effect, inhibition of vesicular transport-mediated (v) uptake of dopamine and glutamate into synaptosomes [18, 30], with data provided in the supplemental file of [18]. For the analysis in this work, data for dopamine uptake inhibition at 20 µM were used. This dataset is referred to as “Stenberg Dop” in Figure 2.

  6. Seventeen PCB congeners were tested for the sixth effect, inhibition of membrane transport-mediated uptake of dopamine (DAT) measured in striatum or whole brain (br) [18, 30], with data provided in the supplemental file of [18]. For the analysis in this work, DAT data at 40 µM were used. This dataset is referred to as “Stenberg DAT” in Figure 2.

  7. Thirteen PCB congeners were tested for the seventh effect, PCB interference with [3H]WIN-35,428 binding at DAT in rat striatal synaptosomes (DAT IC50 in the supplemental data file) [27]. The authors noted that uptake of dopamine into synaptosomes is sensitive to PCB exposure. This dataset is referred to as “Wigestrand DAT” in Figure 2.

Figure 2:

Scatter plot matrix showing the correlation of the neurotoxic relative potency (REP) estimates from each experimental source with each other and with the NEF values derived by Simon et al. [26] and in this work. The first eight columns represent the REP values derived from experimental data (Kodavanti PEB [28], Shain PC-12 dop [17], Kodavanti Micro [29], Kodavanti Mito [29], Pessah/Holland RyR1 [22, 24], Stenberg Dop [18], Stenberg DAT [18], and Wigestrand DAT [27]), and the last two columns are the NEF values derived by Simon et al. [26] and in this work. The Pearson correlation coefficient for each pair of values is shown in the inset. The diagonal shows a histogram of the REP/NEF values for each source. Some experiments are poorly correlated in their measurement of neurotoxic effects for the PCBs.

2.2. Simon et al. Neurotoxic equivalent factors

Using the datasets described above, Simon et al. [26] derived a neurotoxic relative potency (REP) value for each tested congener as the ratio of the lowest EC50 value in each study to the EC50 value for that congener. Thus, the most potent congener (lowest EC50) has a REP value of 1, whereas less potent congeners have REP values between 0 and 1. These REP values were then used to calculate a NEF value as follows: (i) for a congener with two or more REP values, the NEF value was the average of the REP values; (ii) for a congener with a single REP value, either that value or the empirical Bayes estimate derived using the technique described in Svensgaard et al. [31] was taken as the NEF value; and (iii) for a congener with no experimentally derived REP value, either a statistical estimate of the neurotoxic REP developed from the PKC translocation data using the Bayes technique described in Svensgaard et al. [31] or a value based on a structurally similar congener was used. Like REP values, NEF values range from 0 to 1.
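The core of the Simon et al. scheme, scaling each study by its lowest EC50 and averaging the resulting REPs, can be sketched as follows. The congener names and EC50 values are hypothetical, and the empirical Bayes estimate of Svensgaard et al. [31] and the structural-analog rule in (iii) are deliberately not implemented; `simon_reps` and `simon_nef` are illustrative names only.

```python
from statistics import mean


def simon_reps(ec50_by_congener):
    """REP = lowest EC50 in the study / EC50 of the congener (most potent -> 1)."""
    lowest = min(ec50_by_congener.values())
    return {c: lowest / ec50 for c, ec50 in ec50_by_congener.items()}


def simon_nef(reps):
    """Rules (i)-(ii): mean of the REPs, or the single REP when only one exists.

    Simplification: the empirical Bayes alternative in (ii) and rule (iii)
    are not implemented; a congener with no REPs returns None.
    """
    if not reps:
        return None
    return mean(reps)


# Hypothetical EC50 values (µM) from two studies; not data from the paper.
study_a = simon_reps({"PCB-X": 2.0, "PCB-Y": 4.0, "PCB-Z": 8.0})
study_b = simon_reps({"PCB-X": 5.0, "PCB-Y": 5.0})
print(study_a)  # {'PCB-X': 1.0, 'PCB-Y': 0.5, 'PCB-Z': 0.25}
print(simon_nef([study_a["PCB-Y"], study_b["PCB-Y"]]))  # 0.75
```

Note that study_b assigns both of its congeners a REP of 1 simply because they tie for the lowest EC50 within that study, which illustrates the dependence on which congeners happen to be tested together.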

2.3. Derivation of alternative NEF values

Scaling potency values by the lowest EC50 value is prone to introducing bias because only a fraction of congeners were tested in each experiment: the most potent congener in that fraction is assigned a REP value of 1, ignoring potentially more potent untested congeners. In addition, using an empirical estimate or a value from a structurally similar congener when experimental data are lacking adds another layer of uncertainty to the NEF predictions. We therefore used a different approach for deriving NEF values. As in the Simon et al. [26] methodology, a REP value was derived for each congener in each study, and the REP values were then averaged to give a NEF value for each congener. For data obtained from Simon et al. [26] and Wigestrand et al. [27], the REP and NEF values were derived as follows: (i) all congeners that produced no effect observed (NEO), i.e., showed no effect up to the highest tested concentration, that could not be measured because they were above the solubility limit (ASL), or that were inactive in the individual studies were assigned a REP value of 0 (inactive); (ii) each congener with a quantitative EC50/IC50 value was assigned a REP value equal to the ratio of the median EC50/IC50 value in the study to the EC50/IC50 value for that congener; (iii) because REP (and NEF) values should lie between 0 and 1, these ratios were scaled by dividing by the maximum ratio from step (ii) within each study; (iv) data from Stenberg et al. [18], provided as a percentage of control uptake at each tested concentration, were converted into REP values as (100 − percentage uptake)/100; and (v) the NEF value for each congener (listed in Table 2) was calculated as the average of its REP values across all seven studies. The original experimental EC50 values and the NEF values derived by Simon et al. [26] are provided as part of the supplemental data.
Including the additional data used in this study, 87 congeners had measured potency data in at least one experiment; their alternative NEF values were used as training data for QSAR model development.
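Steps (i)–(v) can be expressed as a short Python sketch. It is not the authors' supplementary code: it assumes the median in step (ii) is taken only over congeners with quantitative EC50/IC50 values and that the average in step (v) runs over the studies in which a congener was tested. The congener names, values and helper names are hypothetical.

```python
from statistics import mean, median

INACTIVE = None  # marker for NEO, ASL or inactive results (step i)


def alternative_reps(potency_by_congener):
    """Steps (i)-(iii) for one study with EC50/IC50 values (µM).

    Assumption: the median is computed over quantitative values only.
    """
    measured = {c: v for c, v in potency_by_congener.items() if v is not INACTIVE}
    median_potency = median(measured.values())
    ratios = {c: median_potency / v for c, v in measured.items()}  # step (ii)
    top = max(ratios.values())
    reps = {c: r / top for c, r in ratios.items()}  # step (iii): scale to 0-1
    # Step (i): inactive congeners get a REP of 0.
    reps.update({c: 0.0 for c, v in potency_by_congener.items() if v is INACTIVE})
    return reps


def stenberg_rep(percent_of_control_uptake):
    """Step (iv): (100 - % of control uptake) / 100.

    Uptake above 100% of control would give a negative value; the text does
    not say how such cases were handled, so none is applied here.
    """
    return (100.0 - percent_of_control_uptake) / 100.0


def alternative_nef(reps):
    """Step (v): mean of a congener's REPs over the studies that tested it."""
    return mean(reps)


reps = alternative_reps({"PCB-X": 2.0, "PCB-Y": 4.0, "PCB-Z": 8.0, "PCB-W": INACTIVE})
print(reps)  # {'PCB-X': 1.0, 'PCB-Y': 0.5, 'PCB-Z': 0.25, 'PCB-W': 0.0}
print(alternative_nef([reps["PCB-X"], stenberg_rep(35.0)]))
```

Keeping inactive congeners at an explicit REP of 0, rather than dropping them, is what lets them pull a congener's averaged NEF downwards in step (v).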

Table 2:

Alternative neurotoxic equivalent factors (NEF) derived for 87 congeners using the experimental data from four sources compiled by Simon et al [26], as well as data from Holland et al. [24], Stenberg et al. [18], and Wigestrand et al. [27].

Congener Number Derived NEF
1 0.352
2 0.000
3 0.000
4 0.506
5 0.370
6 0.370
7 0.320
9 0.158
10 0.604
11 0.336
12 0.379
13 0.000
14 0.182
15 0.000
18 0.445
19 0.344
20 0.287
21 0.000
24 0.244
25 0.000
26 0.239
27 0.095
28 0.394
30 0.228
31 0.364
33 0.346
39 0.085
41 0.068
44 0.368
47 0.451
49 0.401
50 0.497
51 0.533
52 0.592
53 0.377
54 0.000
56 0.356
66 0.100
69 0.821
70 0.141
74 0.405
75 0.271
77 0.025
80 0.130
84 0.172
95 0.611
96 0.403
99 0.241
100 0.512
101 0.514
103 0.275
104 0.564
105 0.487
110 0.591
111 0.000
118 0.275
122 0.140
123 0.000
126 0.007
127 0.000
128 0.297
132 0.172
133 0.260
136 0.502
138 0.324
149 0.443
151 0.201
153 0.162
155 0.410
156 0.363
157 0.000
159 0.000
163 0.161
169 0.000
170 0.245
171 0.478
176 0.378
179 0.214
180 0.293
181 0.000
183 0.237
187 0.169
190 0.346
194 0.660
197 0.225
202 1.000

2.4. Molecular descriptors

PCBs share a common biphenyl scaffold and differ only in the position and number of chlorine substitutions (Figure 1(a)). A PCB congener can therefore be adequately represented by a custom fingerprint in which each bit encodes the number of chlorine substitutions at a defined position or set of positions. In this study, three custom fingerprints were developed based on positional equivalency on the biphenyl scaffold, as summarized in Table 3. The process was automated using KNIME software [32], which decomposed each PCB into R-groups based on the positions of substitution, using biphenyl as the scaffold; each R-group is a chlorine atom substituted at one of the ten positions on the scaffold. This output was converted into a data matrix in which each row represented a congener and each column a substitution position, with the presence or absence of a chlorine atom at each of the ten positions stored as 1 (present) or 0 (absent). The matrix was then used to construct three custom fingerprints (Table 3), each of which treats one or more substitution positions as equivalent. Custom fingerprint 1 treats each substitution position as independent and comprises 11 bits: bits 1–10 represent the number of substitutions at positions 1–5 and 1’–5’ (Figure 1(a)), and bit 11 represents the total number of chlorine substitutions. Custom fingerprint 2 comprises 4 bits and treats all ortho (1, 1’, 5, 5’), meta (2, 2’, 4, 4’) and para (3, 3’) positions on both rings as equivalent (Figure 1(a)): bits 1–3 represent the total number of chlorine substitutions at the ortho, meta and para positions, respectively, and bit 4 represents the total number of chlorine substitutions.
Custom fingerprint 3 comprises 6 bits and treats the ortho (1, 1’, 5, 5’) and meta (2, 2’, 4, 4’) positions on the same ring, and the para (3, 3’) positions on both rings, as equivalent (Figure 1(a)). Bits 1–4 represent the total number of chlorine substitutions at the ortho positions of ring 1 and ring 2 and the meta positions of ring 1 and ring 2, respectively (Table 3); bit 5 represents the total number of chlorine substitutions at the para positions; and bit 6 represents the total number of chlorine substitutions. For example, PCB-70 (Figure 1(b)) would be represented as the bit string ‘01100010014’ using custom fingerprint 1, ‘1214’ using custom fingerprint 2 and ‘011114’ using custom fingerprint 3. Selected whole-molecule descriptors were also calculated to evaluate whether they improved the performance of the QSAR models. Nine molecular descriptors spanning physicochemical, steric and electronic properties (listed with brief descriptions in Table 4) were calculated using the Molecular Operating Environment (MOE) software [33].

Table 3:

Summary of custom fingerprints for PCBs based on positional and total number of chlorine substitutions.

Fingerprint Number Description Number of bits Bit representation details
1 Each substitution position 11 1–10: Positions 1–5 and 1’-5’
11: Total no. of Cl substitutions
2 Ortho, meta, and para positional equivalency 4 1 (Ortho): Positions 1, 1’, 5, 5’
2 (Meta): Positions 2, 2’, 4, 4’
3 (Para): Positions 3, 3’
4: Total no. of Cl substitutions
3 Same ring positional equivalency 6 1 (Ortho ring 1): 1, 5
2 (Ortho ring 2): 1’, 5’
3 (Meta ring 1): 2, 4
4 (Meta ring 2): 2’, 4’
5 (Para): 3, 3’
6: Total no. of Cl substitutions
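The mapping from per-position chlorine flags to the three fingerprints in Table 3 can be sketched as below. It assumes the ten presence/absence flags (which the study obtained by R-group decomposition in KNIME) are already available, ordered as positions 1–5 and 1’–5’ of Figure 1(a); `fingerprints` and `as_string` are hypothetical helper names. The PCB-70 flags are read off the fingerprint 1 string given in the text, and the output reproduces all three strings quoted there.

```python
def fingerprints(ring1, ring2):
    """Build custom fingerprints 1-3 (Table 3) from chlorine presence flags.

    ring1, ring2: five 0/1 flags for positions 1-5 and 1'-5' of Figure 1(a),
    where positions 1 and 5 are ortho, 2 and 4 are meta, and 3 is para.
    """
    total = sum(ring1) + sum(ring2)
    fp1 = list(ring1) + list(ring2) + [total]
    ortho = ring1[0] + ring1[4] + ring2[0] + ring2[4]
    meta = ring1[1] + ring1[3] + ring2[1] + ring2[3]
    para = ring1[2] + ring2[2]
    fp2 = [ortho, meta, para, total]
    fp3 = [
        ring1[0] + ring1[4],  # ortho, ring 1
        ring2[0] + ring2[4],  # ortho, ring 2
        ring1[1] + ring1[3],  # meta, ring 1
        ring2[1] + ring2[3],  # meta, ring 2
        para,                 # para, both rings
        total,                # total number of Cl substitutions
    ]
    return fp1, fp2, fp3


def as_string(fp):
    """Concatenate counts into the string notation used in the text."""
    return "".join(str(v) for v in fp)


# PCB-70, with flags taken from its fingerprint 1 string in the text.
fp1, fp2, fp3 = fingerprints([0, 1, 1, 0, 0], [0, 1, 0, 0, 1])
print(as_string(fp1), as_string(fp2), as_string(fp3))  # 01100010014 1214 011114
```

The concatenated string only mirrors the notation in the text; it becomes ambiguous once the total count reaches 10, so a model would consume the numeric vectors directly.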

Table 4:

Nine molecular properties, spanning physicochemical, steric and electronic properties, were calculated and used as additional descriptors for the development of predictive models. However, the performance metrics of the models (not shown in the manuscript) did not improve.

Descriptor Number Descriptor Name Description
1 balabanJ Balaban’s connectivity topological index [Balaban 1982].
2 logP(o/w) Log of the octanol/water partition coefficient (including implicit hydrogens). This property is calculated from a linear atom type model [LOGP 1998] with r2 = 0.931, RMSE=0.393 on 1,847 molecules.
3 SlogP Log of the octanol/water partition coefficient (including implicit hydrogens), calculated from the given structure, i.e., in its correct protonation state (washed structures), using an atomic contribution model [Crippen 1999]. Results may vary from the logP(o/w) descriptor. The training set for SlogP was ~7000 structures.
4 PM3_Eele The electronic energy (kcal/mol) calculated using the PM3 Hamiltonian [MOPAC].
5 PM3_HOMO The energy (eV) of the Highest Occupied Molecular Orbital calculated using the PM3 Hamiltonian [MOPAC].
6 PM3_IP The ionization potential (kcal/mol) calculated using the PM3 Hamiltonian [MOPAC].
7 PM3_LUMO The energy (eV) of the Lowest Unoccupied Molecular Orbital calculated using the PM3 Hamiltonian [MOPAC].
8 vol van der Waals volume calculated using a grid approximation (spacing 0.75 Å).
9 AM1_dipole The dipole moment calculated using the AM1 Hamiltonian [MOPAC].

2.5. QSAR model development

Four machine learning algorithms (ordinary least squares linear regression, support vector regression, k-nearest neighbor and random forests), together with a consensus ensemble, were used to develop QSAR models for predicting NEF values (response) for PCBs (instances), using the three custom fingerprints as descriptors (predictors).

  1. Linear regression (LR) is a supervised regression technique that models the response variable as a linear combination of the predictor variables. The sum of squared differences between the observed and predicted response values (the cost function) is minimized to obtain the optimal relationship between the response and predictor variables [34, 35].

  2. Support vector machines (SVMs) are non-parametric machine learning algorithms that calculate an optimal hyperplane in a high-dimensional space and can be used for classification and regression problems. For non-linear relationships, kernel functions are used to map the data implicitly into a higher-dimensional space. The version of SVMs for regression is called support vector regression (SVR) [34, 36, 37]: rather than separating classes, SVR seeks a function that deviates from the training responses by no more than a margin of tolerance (epsilon) while remaining as flat as possible, with deviations outside this margin penalized in the cost function. The algorithm hyper-parameters tuned for developing the models were: the kernel type (‘linear’ or ‘rbf’) used in the algorithm (kernel), the penalty parameter of the error term (C), the kernel coefficient for ‘rbf’ (gamma), and the width of the epsilon-tube within which no penalty is associated in the training loss function (epsilon).

  3. K-nearest neighbor (kNN) is an instance-based non-parametric algorithm that relies on the principle that similar instances within a certain proximity have similar responses. In kNN regression the predicted response for an instance is determined using the response values of its k nearest neighbors [38, 39]. The algorithm hyper-parameters tuned for developing the models were: weight function used in prediction (weights), number of neighbors (k) and the algorithm used to compute the nearest neighbors (algorithm).

  4. Random forest (RF) is an ensemble machine learning method that constructs modified bagging ensembles of random decision trees. Each tree gives a predicted response for an instance, and the final predicted response is the average prediction of all the trees in the ensemble [34, 40]. The algorithm hyper-parameters tuned for developing the models were: the number of trees in the forest (n_estimators), the number of descriptors to consider when looking for the best split (max_features), and the minimum number of samples required to split an internal node (min_samples_split). Every node in the decision trees of a random forest splits on the value of one of the descriptors used to model the response, so the impact of each descriptor can be assessed by averaging its contribution across all the trees in the forest. Such an analysis yields an importance score for each descriptor used to build the model.
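To make the instance-based kNN description concrete, here is a dependency-free sketch of kNN regression over fingerprint vectors. The `weights` options mirror the ‘uniform’ and ‘distance’ settings of sklearn's KNeighborsRegressor, but this is a standalone illustration with hypothetical fingerprint 2 vectors and responses, not the sklearn implementation used in the study.

```python
import math


def knn_predict(train_X, train_y, query, k=3, weights="uniform"):
    """Predict a response for `query` from its k nearest training instances.

    weights="uniform": plain average of the k neighbors' responses.
    weights="distance": inverse-Euclidean-distance weighted average; if any
    neighbor coincides with the query, the mean of those exact matches is used.
    """
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    if weights == "uniform":
        return sum(y for _, y in neighbors) / len(neighbors)
    exact = [y for d, y in neighbors if d == 0.0]
    if exact:
        return sum(exact) / len(exact)
    inv = [1.0 / d for d, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(inv, neighbors)) / sum(inv)


# Hypothetical fingerprint 2 vectors (ortho, meta, para, total) and responses.
X = [[1, 2, 1, 4], [2, 2, 0, 4], [0, 1, 1, 2], [3, 3, 1, 7]]
y = [0.2, 0.4, 0.1, 0.6]
print(knn_predict(X, y, [2, 2, 1, 5], k=2, weights="distance"))
```

Because many congeners share identical low-resolution fingerprints (fingerprint 2 has only four counts), exact matches are common, which is why the distance-weighted branch handles zero distances explicitly.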

All QSAR models were developed with each of the custom fingerprints within a 5-fold cross-validation scheme, in which the dataset was split 80%/20% into training and test sets, respectively. The models were built on the 80% training set and evaluated using 5-fold internal cross-validation, and the 20% test set was used for external validation. The performance of each QSAR model was evaluated using the root mean squared error (RMSE). For each machine learning algorithm, models were then rebuilt on the entire training dataset using the hyper-parameters optimized in the internal cross-validation, and these models were used to predict NEF values for the congeners with unknown NEF values. The software code for data analysis and model development was written in Python 2.7 [41]. The models were developed using the sklearn package, and the hyper-parameters for each machine learning algorithm were tuned using sklearn's grid search (GridSearchCV). The code is available as part of the supplementary information.
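The validation scheme described above (80/20 split, grid search by internal 5-fold cross-validation on the training portion, RMSE on the held-out 20%) can be sketched without sklearn. kNN with a grid over k stands in for any of the four algorithms; the fold assignment, random seed, k grid and synthetic dataset are all assumptions for illustration, whereas the study itself used sklearn's grid search.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def knn_uniform(train, query, k):
    """Minimal kNN regressor: mean response of the k closest training points."""
    nearest = sorted(train, key=lambda xy: math.dist(xy[0], query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def k_folds(n, n_splits=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_splits disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]


def cv_rmse(data, k, n_splits=5):
    """Mean RMSE of kNN with a given k over internal cross-validation folds."""
    scores = []
    for fold in k_folds(len(data), n_splits):
        held_out = set(fold)
        train = [d for i, d in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        preds = [knn_uniform(train, x, k) for x, _ in test]
        scores.append(rmse([y for _, y in test], preds))
    return sum(scores) / len(scores)


def select_and_evaluate(data, k_grid=(1, 3, 5), test_frac=0.2, seed=0):
    """80/20 split, choose k by internal 5-fold CV, report external test RMSE."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(test_frac * len(shuffled)))
    test, train = shuffled[:n_test], shuffled[n_test:]
    best_k = min(k_grid, key=lambda k: cv_rmse(train, k))
    # kNN has no fitted parameters, so "refitting" with best_k simply means
    # predicting from the full training portion.
    preds = [knn_uniform(train, x, best_k) for x, _ in test]
    return best_k, rmse([y for _, y in test], preds)


# Synthetic placeholder data: fingerprint-2-like vectors, made-up response.
data = [([o, m, p, o + m + p], 0.1 * m) for o in range(3) for m in range(3) for p in range(3)]
best_k, test_rmse = select_and_evaluate(data)
print(best_k, round(test_rmse, 3))
```

With only about 87 labelled congeners, a 20% external set holds roughly 17 points, so the external RMSE from a single split is itself noisy; the seed is fixed here only to make the sketch reproducible.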

3. Results and Discussion

3.1. Neurotoxic equivalence (NEF) values

Alternative NEF values are listed in Table 2. Figure 2 shows a scatter plot matrix with Pearson correlation coefficients between the REP values from the 7 individual experimental datasets, the NEF values derived by Simon et al. [26], and the alternative NEF values derived in this study. PCBs with a derived NEF value of zero were excluded from this plot to avoid biasing the correlation coefficients with the large number of inactives. As shown in Figure 2, few congeners have data from more than one experimental source, and where data are available from multiple sources, the experimental outcomes are poorly correlated. For example, only 9 PCBs were measured for both protein kinase C translocation (Kodavanti PEB [42]) and enhanced RyR1 activity (Pessah RyR1 [22]), and the correlation coefficient between the two assays is 0.19, indicating very low concordance. The NEF values derived by Simon et al. [26] and in this work were compared with the REP values from each experiment and with each other. The correlations of the individual experiments with the two sets of NEF values were very similar, with the exception of Kodavanti PEB [43], Pessah/Holland RyR1 [22, 24], Stenberg Dop [18] and Stenberg DAT [18], which correlated better with the alternative NEF values. The correlation coefficients with the NEF values of Simon et al. [26] and of this work, respectively, are: (0.50, 0.66) for protein kinase C translocation [42]; (0.73, 0.76) for reduction in dopamine content [17]; (0.72, 0.69) and (0.82, 0.84) for microsomal and mitochondrial Ca2+ sequestration, respectively [29]; (0.40, 0.70) for enhanced RyR1 activity [22]; and, for inhibition of dopamine uptake, (0.52, 0.85) for Stenberg Dop [18], (0.57, 0.69) for Stenberg DAT [18] and (0.58, 0.49) for Wigestrand DAT [27]. Overall, we derived alternative NEF values for 87 congeners that were used to develop QSAR models (Table 2).
The NEF values derived by Simon et al. [26] are shifted towards higher values than those derived in this study, i.e., more PCBs receive a high NEF value under the Simon et al. scheme. This shift may arise because the distributions of values from the different experimental sources differ, including in the fraction of inactive chemicals. We also used a slightly different approach for deriving NEF values, such as scaling potencies by the median rather than the lowest EC50/IC50 value in each study and assigning inactive congeners a REP value of 0 (as described in Section 2.3). All of this contributes to data variability and will affect the predictivity of any QSAR model developed from these data.
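Each panel of Figure 2 amounts to a Pearson correlation computed over only the congeners that two sources have in common, after dropping congeners whose derived NEF is zero. A minimal sketch with hypothetical source dictionaries follows; the rule of reporting no coefficient when fewer than three congeners overlap is an assumption, not something stated in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pairwise_r(source_a, source_b, exclude=frozenset(), min_overlap=3):
    """Correlate two {congener: value} maps over the congeners they share.

    `exclude` holds congeners whose derived NEF is zero; `min_overlap` is an
    assumed cut-off below which no coefficient is reported.
    Returns (r or None, number of shared congeners).
    """
    shared = [c for c in source_a if c in source_b and c not in exclude]
    if len(shared) < min_overlap:
        return None, len(shared)
    r = pearson([source_a[c] for c in shared], [source_b[c] for c in shared])
    return r, len(shared)


# Hypothetical REP values from two assays.
assay_a = {"PCB-A": 0.1, "PCB-B": 0.2, "PCB-C": 0.3, "PCB-D": 0.5}
assay_b = {"PCB-A": 0.2, "PCB-B": 0.4, "PCB-C": 0.6, "PCB-E": 0.9}
print(pairwise_r(assay_a, assay_b))
print(pairwise_r(assay_a, assay_b, exclude={"PCB-C"}))  # (None, 2)
```

Reporting the overlap count alongside r matters here: a coefficient such as 0.19 computed from only 9 shared congeners carries far more uncertainty than one computed from dozens.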

3.2. QSAR model

QSAR models were developed using four machine learning algorithms (LR, SVM, kNN and RF) with each of the three custom fingerprints and the alternative NEF values, using 5-fold cross-validation. Model performance was compared using the root mean squared error (RMSE) and the coefficient of determination (R2) as metrics. RMSE values were similar across all models, ranging from 0.21 to 0.24 (Figure 3(a)). No model significantly outperformed the others, although R2 for the SVR models was slightly larger than for the LR, kNN and RF models (Figure 3(b)). Custom fingerprint 3 gave the best predictive performance across all models based on RMSE and R2. Figure 4 plots derived versus predicted NEF values for all four models using fingerprint 3. Adding the physicochemical descriptors did not improve the models (results not shown). Even though the RMSE values are within the variability of REP values in the underlying dataset (Figure 5, discussed below), the zero and negative values of R2 indicate that the QSAR models have low predictivity. The failure to build robust and reliable QSAR models for PCB NEF prediction can be attributed to two major factors:

  1. The derivation and quality of NEF values: As discussed in the methods, the data used to derive the NEF values were taken from seven datasets obtained from several experimental sources. PCBs can exert neurotoxic effects through multiple mechanisms, and each individual experiment used here measured PCB neurotoxic potential via a different mechanism. As shown in Figure 2, the experimental data from the different sources do not agree closely with each other. Figure 5 shows a box plot of the REP values for each congener with experimental data, with the NEF value superimposed as a red dot. For most congeners with data from more than one source, the REP values are highly variable. High variance in the training data tends to limit the development of a predictive model and imposes an upper bound on model predictivity. The inset histogram in Figure 5 shows the range of REP values for PCBs with data from multiple sources; the mean of this range is 0.19, which illustrates the variance in the training data. The width of the REP range can be roughly interpreted as the lowest RMSE that even a perfect predictive model could achieve. PCBs are semi-volatile and poorly soluble in aqueous media, which may affect the accuracy of experimentally measured AC50 or other potency values. Additionally, PCBs occur as mixtures, and the purity of individually isolated congeners may vary from laboratory to laboratory, which may result in low concordance between experimental measurements. Finally, inherent measurement errors contribute to the variance in the experimental data.

  2. Quantity of available experimental data: Only a limited number of PCBs were tested in each of the experimental assays, and the small number of data points limits the ability of a machine learning algorithm to learn the structure-activity relationships.
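
The noise-floor argument in point 1 can be checked numerically. The sketch below (hypothetical congener labels and REP values) computes, for congeners with data from at least two sources, the range of REP values and the RMSE that an idealized model would still incur if it predicted each congener's mean REP exactly and were scored against the individual measurements. This "oracle" RMSE is one possible way to express the lower bound that the text describes more roughly through the REP range; note that it is framed against individual REPs rather than against the derived NEFs.

```python
import math
from statistics import mean


def rep_ranges(reps):
    """Range (max - min) of REP values for congeners with two or more sources."""
    return {c: max(v) - min(v) for c, v in reps.items() if len(v) >= 2}


def oracle_rmse(reps):
    """RMSE of predicting every individual REP by its congener's mean REP.

    No model that outputs one value per congener can do better than this
    against the individual measurements, so it acts as a noise floor.
    """
    squared = [
        (r - mean(v)) ** 2 for v in reps.values() if len(v) >= 2 for r in v
    ]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical REP values; PCB-C has a single source and is skipped
reps = {"PCB-A": [0.1, 0.5], "PCB-B": [0.2, 0.2, 0.5], "PCB-C": [0.3]}
ranges = rep_ranges(reps)
print(ranges, mean(list(ranges.values())))
print(oracle_rmse(reps))
```
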

Figure 3:


5-fold cross validation performance metrics of the QSAR models with the three fingerprints, measured using (a) root mean squared error (RMSE) and (b) coefficient of determination (R²). All methods result in poor models; fingerprint 3 gives the most stable metrics across methods.

Figure 4:


Comparison of the derived neurotoxic equivalent factors (NEFs) with the values predicted using 5-fold cross validation: (a) linear regression, (b) support vector regression, (c) k-nearest neighbor, and (d) random forest models. The solid black line denotes perfect agreement between derived and predicted values, and the dotted green lines denote the ±0.1 error interval. The legend at the top left of each panel shows the RMSE and R² values for that method.

Figure 5:


Box plot distributions of the neurotoxic relative potency (REP) values derived from experimental data for each PCB congener. The congeners (x-axis) are ordered by increasing number of chlorine substitutions from left to right. The variation in REP values is represented as mean (red line) ± standard deviation (error bars). The red dot superimposed on each box plot corresponds to the derived NEF value for that congener. The REP values show high variance for congeners with data from more than one experimental source. The inset shows the distribution of the range of REP values; the mean range is 0.28.

To make predictions for the 126 PCB congeners with no NEF value, the models were then retrained on the complete training dataset (87 PCBs with derived NEF values). Table 5 lists the predicted NEF values for all 209 congeners from all the machine learning models using fingerprint 3. The predictions for a given congener appear quite similar across modelling approaches and across congeners, which emphasizes the lack of robustness of the models owing to the quality and variability of the underlying experimental data. Figure 6 shows the relative importance of each fingerprint bit for predicting NEF with the RF algorithm. The total number of chlorine substitutions is the most important predictor of the neurotoxic effects of PCBs. A caveat of this descriptor importance measure is that when features are correlated, any one of them may be selected at a given split, so importance can be assigned disproportionately to one feature rather than shared among its correlated partners.
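
The 209 congeners for which Table 5 lists predictions form a small, fully enumerable chemical space. The standard-library Python 3 sketch below enumerates them from the biphenyl scaffold and then illustrates the correlated-feature caveat. It uses hypothetical ortho/meta/para chlorine-count descriptors, not the study's custom fingerprints (which are defined in the methods); with these descriptors, the total number of chlorines is positively correlated with the ortho count across all congeners, so a random forest may credit either feature at a split.

```python
import math
from collections import Counter
from itertools import product

# Ring bits are ordered for positions (2, 3, 4, 5, 6).


def canonical_ring(bits):
    """A phenyl ring can rotate about the 1-1' bond, so positions 2/6 and
    3/5 are interchangeable: keep the smaller of the pattern and its mirror."""
    return min(bits, bits[::-1])


def enumerate_congeners():
    """All chlorinated biphenyls as sorted pairs of canonical ring patterns."""
    rings = sorted({canonical_ring(b) for b in product((0, 1), repeat=5)})
    congeners = set()
    for r1 in rings:
        for r2 in rings:
            pair = tuple(sorted((r1, r2)))  # the two rings are interchangeable
            if sum(pair[0]) + sum(pair[1]) > 0:  # exclude unchlorinated biphenyl
                congeners.add(pair)
    return sorted(congeners)


def position_counts(pair):
    """Hypothetical descriptors: ortho (2,6), meta (3,5), para (4) chlorine counts."""
    ortho = sum(r[0] + r[4] for r in pair)
    meta = sum(r[1] + r[3] for r in pair)
    para = sum(r[2] for r in pair)
    return ortho, meta, para


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


congeners = enumerate_congeners()
homologues = Counter(sum(map(sum, pair)) for pair in congeners)
total_cl = [sum(map(sum, pair)) for pair in congeners]
ortho_cl = [position_counts(pair)[0] for pair in congeners]
print(len(congeners), sorted(homologues.items()))
print("corr(total Cl, ortho Cl) =", round(pearson(total_cl, ortho_cl), 2))
```

The enumeration reproduces the familiar homologue series (3 mono- through 1 decachlorinated congener), which is a quick sanity check that the symmetry rules are encoded correctly.
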

Table 5:

Predicted NEF values for all 209 congeners from the models developed with fingerprint 3 using the linear regression (LR), support vector regression (SVR), k-nearest neighbor (kNN) and random forest (RF) algorithms.

Congener Number Predicted NEF
Linear Regression Support Vector Regression k-nearest Neighbor Random Forest
1 0.240 0.352 0.286 0.272
2 0.250 0.225 0.245 0.133
3 0.231 0.232 0.169 0.110
4 0.252 0.399 0.286 0.418
5 0.241 0.370 0.123 0.338
6 0.262 0.370 0.415 0.354
7 0.276 0.320 0.140 0.308
8 0.327 0.294 0.325 0.350
9 0.339 0.209 0.114 0.240
10 0.295 0.377 0.372 0.481
11 0.288 0.312 0.239 0.286
12 0.339 0.376 0.264 0.337
13 0.258 0.192 0.250 0.108
14 0.266 0.182 0.155 0.108
15 0.249 0.196 0.369 0.075
16 0.286 0.258 0.345 0.203
17 0.279 0.181 0.000 0.065
18 0.291 0.399 0.302 0.360
19 0.322 0.344 0.305 0.338
20 0.278 0.287 0.159 0.248
21 0.249 0.161 0.054 0.086
22 0.290 0.388 0.387 0.520
23 0.306 0.292 0.628 0.405
24 0.279 0.244 0.278 0.220
25 0.259 0.194 0.117 0.118
26 0.238 0.239 0.201 0.232
27 0.303 0.196 0.229 0.128
28 0.282 0.385 0.249 0.312
29 0.291 0.265 0.238 0.213
30 0.270 0.281 0.197 0.293
31 0.270 0.281 0.197 0.293
32 0.282 0.385 0.249 0.312
33 0.294 0.346 0.325 0.337
34 0.219 0.274 0.238 0.179
35 0.271 0.313 0.286 0.317
36 0.229 0.294 0.245 0.280
37 0.284 0.252 0.184 0.189
38 0.294 0.315 0.228 0.321
39 0.315 0.201 0.212 0.133
40 0.327 0.273 0.200 0.294
41 0.273 0.280 0.290 0.288
42 0.252 0.313 0.121 0.285
43 0.252 0.313 0.121 0.285
44 0.295 0.368 0.414 0.369
45 0.293 0.297 0.299 0.262
46 0.282 0.327 0.245 0.356
47 0.273 0.280 0.290 0.288
48 0.324 0.287 0.304 0.249
49 0.261 0.401 0.378 0.379
50 0.274 0.466 0.363 0.502
51 0.285 0.404 0.375 0.449
52 0.274 0.466 0.363 0.502
53 0.303 0.377 0.323 0.353
54 0.243 0.213 0.107 0.089
55 0.303 0.316 0.442 0.383
56 0.201 0.356 0.119 0.240
57 0.297 0.200 0.231 0.172
58 0.276 0.320 0.140 0.308
59 0.297 0.200 0.231 0.172
60 0.317 0.280 0.232 0.243
61 0.285 0.303 0.285 0.267
62 0.306 0.298 0.171 0.341
63 0.306 0.298 0.171 0.341
64 0.318 0.332 0.337 0.376
65 0.327 0.314 0.358 0.330
66 0.297 0.200 0.231 0.172
67 0.348 0.271 0.114 0.222
68 0.359 0.283 0.193 0.254
69 0.264 0.287 0.471 0.439
70 0.264 0.287 0.471 0.439
71 0.276 0.276 0.292 0.353
72 0.285 0.300 0.352 0.425
73 0.288 0.246 0.143 0.278
74 0.243 0.388 0.289 0.342
75 0.286 0.271 0.415 0.351
76 0.284 0.299 0.289 0.324
77 0.279 0.207 0.112 0.133
78 0.284 0.299 0.289 0.324
79 0.274 0.277 0.096 0.244
80 0.234 0.182 0.178 0.145
81 0.255 0.172 0.057 0.207
82 0.295 0.377 0.372 0.481
83 0.267 0.277 0.150 0.225
84 0.255 0.172 0.057 0.207
85 0.315 0.311 0.374 0.354
86 0.285 0.300 0.352 0.425
87 0.257 0.278 0.033 0.218
88 0.265 0.272 0.257 0.271
89 0.277 0.295 0.320 0.304
90 0.223 0.265 0.170 0.249
91 0.213 0.277 0.166 0.243
92 0.244 0.298 0.292 0.349
93 0.234 0.291 0.169 0.355
94 0.254 0.306 0.337 0.326
95 0.309 0.384 0.284 0.465
96 0.321 0.402 0.220 0.391
97 0.267 0.288 0.285 0.221
98 0.319 0.320 0.495 0.368
99 0.288 0.312 0.239 0.286
100 0.330 0.423 0.357 0.424
101 0.300 0.503 0.475 0.532
102 0.319 0.320 0.495 0.368
103 0.308 0.275 0.370 0.285
104 0.278 0.380 0.355 0.419
105 0.351 0.406 0.262 0.443
106 0.290 0.319 0.503 0.383
107 0.329 0.314 0.371 0.274
108 0.341 0.339 0.503 0.346
109 0.288 0.312 0.239 0.286
110 0.300 0.503 0.475 0.532
111 0.298 0.192 0.251 0.068
112 0.309 0.319 0.468 0.285
113 0.288 0.312 0.239 0.286
114 0.300 0.503 0.475 0.532
115 0.350 0.337 0.539 0.344
116 0.255 0.248 0.455 0.164
117 0.268 0.276 0.240 0.248
118 0.268 0.276 0.240 0.248
119 0.276 0.276 0.522 0.290
120 0.279 0.207 0.112 0.133
121 0.297 0.261 0.401 0.294
122 0.300 0.212 0.384 0.168
123 0.317 0.195 0.261 0.156
124 0.225 0.285 0.198 0.230
125 0.276 0.281 0.198 0.193
126 0.266 0.182 0.155 0.108
127 0.247 0.141 0.101 0.041
128 0.258 0.297 0.314 0.250
129 0.276 0.276 0.522 0.290
130 0.276 0.276 0.522 0.290
131 0.276 0.276 0.522 0.290
132 0.327 0.201 0.385 0.231
133 0.235 0.260 0.178 0.200
134 0.225 0.276 0.178 0.221
135 0.300 0.298 0.347 0.319
136 0.300 0.298 0.347 0.319
137 0.312 0.429 0.430 0.401
138 0.320 0.324 0.326 0.315
139 0.290 0.293 0.242 0.334
140 0.333 0.363 0.350 0.376
141 0.341 0.306 0.175 0.285
142 0.353 0.336 0.344 0.324
143 0.249 0.271 0.202 0.215
144 0.279 0.181 0.000 0.065
145 0.322 0.295 0.171 0.186
146 0.300 0.298 0.347 0.319
147 0.290 0.269 0.155 0.244
148 0.281 0.252 0.169 0.264
149 0.312 0.429 0.430 0.401
150 0.363 0.336 0.405 0.312
151 0.300 0.298 0.347 0.319
152 0.290 0.269 0.155 0.244
153 0.281 0.252 0.169 0.264
154 0.320 0.324 0.326 0.315
155 0.332 0.410 0.282 0.383
156 0.333 0.363 0.350 0.376
157 0.279 0.181 0.000 0.065
158 0.291 0.399 0.302 0.360
159 0.342 0.221 0.285 0.089
160 0.259 0.250 0.146 0.186
161 0.268 0.184 0.226 0.098
162 0.271 0.279 0.108 0.227
163 0.249 0.161 0.054 0.086
164 0.261 0.250 0.062 0.111
165 0.278 0.287 0.159 0.248
166 0.309 0.279 0.067 0.254
167 0.282 0.258 0.054 0.122
168 0.257 0.268 0.159 0.192
169 0.268 0.184 0.226 0.098
170 0.319 0.245 0.139 0.237
171 0.217 0.354 0.213 0.337
172 0.261 0.251 0.146 0.169
173 0.303 0.293 0.296 0.247
174 0.282 0.284 0.121 0.305
175 0.324 0.306 0.355 0.294
176 0.333 0.378 0.301 0.349
177 0.293 0.286 0.317 0.333
178 0.344 0.320 0.392 0.344
179 0.302 0.214 0.223 0.209
180 0.323 0.293 0.389 0.306
181 0.365 0.215 0.247 0.149
182 0.261 0.251 0.146 0.169
183 0.281 0.237 0.265 0.212
184 0.272 0.283 0.318 0.293
185 0.303 0.293 0.296 0.247
186 0.354 0.283 0.256 0.139
187 0.302 0.214 0.223 0.209
188 0.314 0.346 0.284 0.318
189 0.302 0.214 0.223 0.209
190 0.314 0.346 0.284 0.318
191 0.249 0.300 0.379 0.299
192 0.269 0.282 0.387 0.280
193 0.252 0.278 0.183 0.230
194 0.290 0.388 0.387 0.520
195 0.273 0.283 0.628 0.307
196 0.293 0.276 0.479 0.391
197 0.285 0.225 0.524 0.383
198 0.336 0.294 0.524 0.466
199 0.314 0.312 0.624 0.519
200 0.306 0.292 0.479 0.487
201 0.326 0.372 0.524 0.721
202 0.326 0.372 0.524 0.721
203 0.347 0.291 0.431 0.567
204 0.293 0.276 0.479 0.391
205 0.305 0.294 0.524 0.447
206 0.265 0.281 0.129 0.301
207 0.317 0.294 0.524 0.489
208 0.338 0.303 0.479 0.640
209 0.330 0.291 0.479 0.444
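
Agreement between the four models can be summarized per congener by the mean prediction and the spread (maximum minus minimum) across models. The Python 3 sketch below parses rows in the layout of Table 5 (congener number followed by the LR, SVR, kNN and RF predictions); the three rows are copied from the table, and the function names are illustrative.

```python
from statistics import mean

MODELS = ("LR", "SVR", "kNN", "RF")  # column order of Table 5


def parse_rows(text):
    """Parse whitespace-separated 'congener LR SVR kNN RF' rows."""
    table = {}
    for line in text.strip().splitlines():
        fields = line.split()
        table[int(fields[0])] = dict(zip(MODELS, map(float, fields[1:5])))
    return table


def consensus(table):
    """Mean prediction and max-min spread across the models, per congener."""
    summary = {}
    for congener, preds in table.items():
        values = list(preds.values())
        summary[congener] = (mean(values), max(values) - min(values))
    return summary


rows = """
1 0.240 0.352 0.286 0.272
17 0.279 0.181 0.000 0.065
201 0.326 0.372 0.524 0.721
"""
for congener, (avg, spread) in consensus(parse_rows(rows)).items():
    print(f"PCB {congener}: mean {avg:.3f}, spread {spread:.3f}")
```

Applied to the full table, a large spread flags the congeners for which the choice of algorithm matters most.
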

Figure 6:


Descriptor importance (scale 0–1) calculated using the random forest model. The error bars indicate the variability across trees. The total number of chlorine substitutions is the most informative factor for predicting the neurotoxic effects of PCBs.

Conclusions

The toxic equivalency approach has facilitated the risk assessment of PCBs for health effects driven by aryl hydrocarbon receptor activity [68]. For non-dioxin-like PCBs with neurotoxic effects, Simon et al. [26] developed a comparable neurotoxic equivalence scheme that relied on in vitro potency data for 83 congeners. This study (i) builds on the NEF scheme proposed by Simon et al. by deriving, for an extended dataset, an updated set of NEF estimates that address experimental measurement bias at extreme concentrations, and (ii) uses these alternative NEFs, together with scaffold-based features for PCBs, to develop QSAR models that predict NEF values for PCB congeners with no experimental data. The QSAR models developed were not robust. Multiple machine learning models and chemical descriptor sets were used, and all yielded a similarly low level of predictivity. We believe that predictivity is limited by the quality and quantity of the underlying experimental data rather than by the modeling approaches used. Figures 2 and 5 illustrate the key issue: the lack of correspondence between different assays for the same chemical. Unfortunately, each individual assay covered too few chemicals to build practical models for each mode of action. One practical next step would be to use this modeling framework (scaffolds and machine learning) to prioritize a substantial number of PCB congeners for experimental testing in one or more of the current assays. Such a bootstrapping approach should provide an efficient way to map out the structure-activity relationship in this well-defined chemical space.
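
One simple way to pick untested congeners that cover the scaffold's chemical space is greedy max-min (farthest-point) selection. The Python 3 sketch below illustrates that general technique; it is not a procedure from this study. It assumes binary fingerprints, Hamming distance and a hypothetical set of already-tested congeners, and at each step picks the candidate farthest from everything tested or already selected.

```python
def hamming(a, b):
    """Number of fingerprint bits that differ."""
    return sum(x != y for x, y in zip(a, b))


def maxmin_select(fingerprints, tested, k):
    """Greedily choose up to k untested congeners for testing.

    fingerprints: {congener: tuple of 0/1 bits}
    tested: congeners that already have experimental data
    Each step picks the candidate whose nearest tested-or-selected congener
    is farthest away; ties go to the first candidate in sorted order.
    """
    tested = set(tested)
    reference = sorted(tested)
    candidates = [c for c in sorted(fingerprints) if c not in tested]
    picks = []

    def nearest_distance(c):
        return min(
            (hamming(fingerprints[c], fingerprints[r]) for r in reference),
            default=len(fingerprints[c]) + 1,  # nothing tested yet
        )

    while candidates and len(picks) < k:
        best = max(candidates, key=nearest_distance)
        picks.append(best)
        reference.append(best)
        candidates.remove(best)
    return picks


# Hypothetical 4-bit fingerprints; congener "A" is already tested
fps = {"A": (0, 0, 0, 0), "B": (1, 1, 1, 1), "C": (1, 1, 0, 0), "D": (0, 0, 0, 1)}
print(maxmin_select(fps, tested=["A"], k=2))  # ['B', 'C']
```
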

Supplementary Material

Data_analysis_Code
Prediction_Code
Supplement1
Supplement2
Supplement3

Acknowledgments:

This work was supported in part by an appointment to the ORISE participant research program supported by an interagency agreement between the US EPA and DOE. The authors wish to acknowledge Ted Simon, Jeff Gift and Prasada Rao Kodavanti for their helpful and insightful comments.

Footnotes

Disclaimer:

The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

References

  • 1. Patlewicz G, et al., Use of category approaches, read-across and (Q)SAR: General considerations. Regulatory Toxicology and Pharmacology, 2013. 67(1): p. 1–12.
  • 2. van Leeuwen K, et al., Using chemical categories to fill data gaps in hazard assessment. SAR and QSAR in Environmental Research, 2009. 20(3–4): p. 207–220.
  • 3. Patlewicz G, et al., Navigating through the minefield of read-across tools: A review of in silico tools for grouping. Computational Toxicology, 2017. 3: p. 1–18.
  • 4. OECD 2014. Guidance on grouping of chemicals. OECD Series on Testing and Assessment No. 194. Organisation for Economic Co-operation and Development, Paris, France: 2016.
  • 5. EPA. 2010. https://rais.ornl.gov/documents/dioxin_tef.pdf.
  • 6. Van den Berg M, et al., Toxic equivalency factors (TEFs) for PCBs, PCDDs, PCDFs for humans and wildlife. Environ Health Perspect, 1998. 106(12): p. 775–92.
  • 7. Van den Berg M, et al., The 2005 World Health Organization reevaluation of human and Mammalian toxic equivalency factors for dioxins and dioxin-like compounds. Toxicol Sci, 2006. 93(2): p. 223–41.
  • 8. Safe SH, Development validation and problems with the toxic equivalency factor approach for risk assessment of dioxins and related compounds. Journal of Animal Science, 1998. 76(1): p. 134.
  • 9. Safe S, et al., PCBs: structure-function relationships and mechanism of action. Environ Health Perspect, 1985. 60: p. 47–56.
  • 10. Safe SH, Comparative Toxicology and Mechanism of Action of Polychlorinated Dibenzo-P-Dioxins and Dibenzofurans. Annual Review of Pharmacology and Toxicology, 1986. 26(1): p. 371–399.
  • 11. Becker RA, et al., The adverse outcome pathway for rodent liver tumor promotion by sustained activation of the aryl hydrocarbon receptor. Regulatory Toxicology and Pharmacology, 2015. 73(1): p. 172–190.
  • 12. Mills SA III, Thal DI, and Barney J, A summary of the 209 PCB congener nomenclature. Chemosphere, 2007. 68(9): p. 1603–1612.
  • 13. Ahlborg UG, Hanberg A, and Kenne K, Risk assessment of polychlorinated biphenyls (PCBs). 1992: Nordic Council of Ministers.
  • 14. Mullins MD, et al., High-resolution PCB analysis: synthesis and chromatographic properties of all 209 PCB congeners. Environmental Science & Technology, 1984. 18(6): p. 468–476.
  • 15. IARC monographs on the evaluation of carcinogenic risks to humans, volume 103; bitumens and bitumen emissions, and some N- and S-heterocyclic aromatic hydrocarbons. IARC monographs on the evaluation of carcinogenic risks to humans. 2013, Lyon: IARC Press; 342.
  • 16. Seegal RF, Bush B, and Shain W, Neurotoxicology of ortho-substituted polychlorinated biphenyls. Chemosphere, 1991. 23(11–12): p. 1941–1949.
  • 17. Shain W, Bush B, and Seegal R, Neurotoxicity of polychlorinated biphenyls: Structure-activity relationship of individual congeners. Toxicology and Applied Pharmacology, 1991. 111(1): p. 33–42.
  • 18. Stenberg M, et al., Multivariate toxicity profiles and QSAR modeling of non-dioxin-like PCBs – An investigation of in vitro screening data from ultra-pure congeners. Chemosphere, 2011. 85(9): p. 1423–1429.
  • 19. Ruiz P, et al., Prediction of the health effects of polychlorinated biphenyls (PCBs) and their metabolites using quantitative structure–activity relationship (QSAR). Toxicology Letters, 2008. 181(1): p. 53–65.
  • 20. Seegal RF, Bush B, and Shain W, Lightly chlorinated ortho-substituted PCB congeners decrease dopamine in nonhuman primate brain and in tissue culture. Toxicol Appl Pharmacol, 1990. 106(1): p. 136–44.
  • 21. Kodavanti PRS and Tilson HA, Structure-activity relationships of potentially neurotoxic PCB congeners in the rat. Neurotoxicology, 1997. 18(2): p. 425–441.
  • 22. Pessah IN, et al., Structure–Activity Relationship for Noncoplanar Polychlorinated Biphenyl Congeners toward the Ryanodine Receptor-Ca2+ Channel Complex Type 1 (RyR1). Chemical Research in Toxicology, 2006. 19(1): p. 92–101.
  • 23. Rayne S and Forest K, Quantitative structure-activity relationship (QSAR) studies for predicting activation of the ryanodine receptor type 1 channel complex (RyR1) by polychlorinated biphenyl (PCB) congeners. Journal of Environmental Science and Health, Part A, 2010. 45(3): p. 355–362.
  • 24. Holland EB, et al., An Extended Structure–Activity Relationship of Nondioxin-Like PCBs Evaluates and Supports Modeling Predictions and Identifies Picomolar Potency of PCB 202 Towards Ryanodine Receptors. Toxicological Sciences, 2017. 155(1): p. 170–181.
  • 25. Seegal RF, Brosch KO, and Bush B, Polychlorinated biphenyls produce regional alterations of dopamine metabolism in rat brain. Toxicology Letters, 1986. 30(2): p. 197–202.
  • 26. Simon T, Britt JK, and James RC, Development of a neurotoxic equivalence scheme of relative potency for assessing the risk of PCB mixtures. Regulatory Toxicology and Pharmacology, 2007. 48(2): p. 148–170.
  • 27. Wigestrand MB, et al., Non-dioxin-like PCBs inhibit [3H]WIN-35,428 binding to the dopamine transporter: A structure–activity relationship study. NeuroToxicology, 2013. 39: p. 18–24.
  • 28. Kodavanti PR, et al., Increased [3H]phorbol ester binding in rat cerebellar granule cells and inhibition of 45Ca2+ buffering in rat cerebellum by hydroxylated polychlorinated biphenyls. Neurotoxicology, 2003. 24(2): p. 187–98.
  • 29. Kodavanti PRS, et al., Inhibition of microsomal and mitochondrial Ca2+ sequestration in rat cerebellum by polychlorinated biphenyl mixtures and congeners – Structure activity relationships. Archives of Toxicology, 1996. 70(3–4): p. 150–157.
  • 30. Mariussen E and Fonnum F, The effect of polychlorinated biphenyls on the high affinity uptake of the neurotransmitters, dopamine, serotonin, glutamate and GABA, into rat brain synaptosomes. Toxicology, 2001. 159(1–2): p. 11–21.
  • 31. Svendsgaard DJ, et al., Empirical modeling of an in vitro activity of polychlorinated biphenyl congeners and mixtures. Environ Health Perspect, 1997. 105(10): p. 1106–15.
  • 32. Berthold MR, et al., KNIME - the Konstanz information miner: version 2.0 and beyond. ACM SIGKDD Explorations Newsletter, 2009. 11(1): p. 26.
  • 33. Molecular Operating Environment (MOE), 2013.08; Chemical Computing Group Inc, 1010 Sherbrooke St. West, Suite #910, Montreal, QC, Canada, H3A 2R7, 2016.
  • 34. Schapire RE and Freund Y, Foundations of Machine Learning. Boosting: Foundations and Algorithms, 2012: p. 23–52.
  • 35. Mendenhall W, Sincich T, and Boudreau NS, A second course in statistics: regression analysis. Vol. 5. 1996: Prentice Hall, Upper Saddle River, New Jersey.
  • 36. Cortes C and Vapnik V, Support-Vector Networks. Machine Learning, 1995. 20(3): p. 273–297.
  • 37. Smola AJ and Schölkopf B, A tutorial on support vector regression. Statistics and Computing, 2004. 14(3): p. 199–222.
  • 38. Kotsiantis SB, Zaharakis I, and Pintelas P, Supervised machine learning: A review of classification techniques. 2007.
  • 39. Altman NS, An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. The American Statistician, 1992. 46(3): p. 175–185.
  • 40. Breiman L, Random forests. Machine Learning, 2001. 45(1): p. 5–32.
  • 41. Python Software Foundation. Python Language Reference, version 2.7. Available at http://www.python.org.
  • 42. Kodavanti PR, et al., Increased [3H]phorbol ester binding in rat cerebellar granule cells by polychlorinated biphenyl mixtures and congeners: structure-activity relationships. Toxicol Appl Pharmacol, 1995. 130(1): p. 140–8.
  • 43. Kodavanti PR, et al., Increased [3H]phorbol ester binding in rat cerebellar granule cells and inhibition of 45Ca2+ sequestration in rat cerebellum by polychlorinated diphenyl ether congeners and analogs: structure-activity relationships. Toxicol Appl Pharmacol, 1996. 138(2): p. 251–61.
