Heliyon. 2024 Feb 28;10(5):e26808. doi: 10.1016/j.heliyon.2024.e26808

Quantitative structure-activity relationship model development for estimating the predicted No-effect concentration of petroleum hydrocarbon and derivatives in the ecological risk assessment

Jiajia Wei a,b,c, Lei Tian a,b,d, Fan Nie a, Zhiguo Shao a, Zhansheng Wang a,⁎⁎, Yu Xu a, Mei He a,b,c,
PMCID: PMC10925994  PMID: 38468969

Abstract

Quantitative structure-activity relationship (QSAR) modelling is a cost-effective way to estimate the environmental safety thresholds (ESTs) of pollutants directly in ecological risk assessment when toxicity data are lacking. In this study, QSAR models were developed for estimating the Predicted No-Effect Concentrations (PNECs) of petroleum hydrocarbons and their derivatives (PHDs) under dietary exposure, based on quantified molecular descriptors and the PNECs derived for 51 PHDs with available acute or chronic toxicity concentrations. Three highly reliable QSAR models were developed for PHDs, for aromatic hydrocarbons and their derivatives (AHDs), and for alkanes, alkenes and their derivatives (ALKDs), respectively. The models showed excellent fitting performance, evidenced by high correlation coefficients (0.89–0.95) and low root mean square errors (0.13–0.2 mg/kg), and high stability and predictive performance, reflected by high internal and external validation coefficients (Q2LOO, 0.66–0.89; Q2F1, 0.62–0.78; Q2F2, 0.60–0.73). The quantitative relationships between molecular structure and PNECs indicated that 18 autocorrelation descriptors, 3 information index descriptors, 4 Barysz matrix descriptors, 6 Burden modified eigenvalues descriptors, and 1 BCUT descriptor were the important molecular descriptors affecting the PNECs of PHDs. These results support that the PNECs of PHDs can be accurately estimated from the influential molecular descriptors and the quantitative relationships in the developed QSAR models, which provides a new feasible solution for EST derivation in ecological risk assessment.

Keywords: Petroleum hydrocarbons, Dietary exposure, Predicted No-effect concentrations, Quantitative structure-activity relationship, Toxicity

1. Introduction

During the long-term exploration of oilfields and the rapid development of the petroleum industry, a wide variety of high-risk PHDs are produced and released into the adjacent environments [[1], [2], [3]]. The concentration of total petroleum hydrocarbons (TPHs) in the soils and sediments around productive oilfields has been estimated at 1.83 × 10^5 mg/kg [4,5], which is about 221 times the Soil environmental quality-Risk control standard of TPHs for development land (826 mg/kg) issued by the Ministry of Ecology and Environment of China [6]. Many PHDs readily accumulate in organisms from polluted environments and pose potentially high risks to organisms at high trophic levels [7,8] and even to humans [[9], [10], [11]] through biomagnification [12], resulting in the decline of ecosystem service functions [13] and the disruption of ecological mechanisms [14]. However, accurate risk assessment for PHDs is limited by the lack of environmental quality limits for specific PHDs. At present, risk assessment of PHDs is mostly conducted on the basis of the environmental quality and risk control standard for TPHs rather than for specific PHDs.

Biotoxicity testing is widely used for evaluating the environmental safety thresholds of pollutants [15], but it is usually limited by labor-intensive, time-consuming, and costly testing procedures and by the ethical issues raised by animal testing [16,17]. QSAR models can act as a low-cost alternative to biotoxicity testing by directly estimating biotoxicity from the mathematical relationship between molecular structure and available toxicity concentrations [18,19]. In recent years, advances in computer technology have also greatly promoted QSAR modelling strategies, so that QSAR has been proposed as an effective technology for direct biotoxicity estimation by many authoritative environmental protection organizations such as REACH and OECD [[20], [21], [22]]. Previous studies have reported effective and reliable QSAR models for toxicity estimation of pesticides [18,23], 1,2,4-triazoles [24], pharmaceuticals and persistent organic pollutants [18], and halogen derivatives, ethers and tertiary amines [25]. Molecular descriptors that describe electronic, hydrophobic, and thermodynamic structural characteristics, such as the energy of the highest occupied molecular orbital (EHOMO), the octanol-water partition coefficient (logKow), and the overall or summation solute hydrogen bond basicity (MLFER_BO), were often found to be strongly correlated with the toxicity of chemicals [[26], [27], [28], [29], [30]]. However, little attention has been paid to systematically investigating the quantitative relationship between molecular structure and biotoxicity, or to developing QSAR models for biotoxicity estimation of PHDs. Only a small amount of work has been devoted to developing QSAR models for estimating the acute toxicity of aromatic hydrocarbons and their derivatives (AHDs), in particular polycyclic aromatic hydrocarbons (PAHs) [19,29,31].

Current QSAR models are mostly developed for estimating the acute or chronic toxicity concentrations towards individual species, whereas few QSAR models are developed to estimate the ESTs for the ecosystem directly. Little information is available on the ESTs of specific PHDs in risk assessment, probably because of the lack of sufficient toxicity data. So far, ESTs have been reported only for a few PAHs listed as priority pollutants by the U.S. Environmental Protection Agency (U.S. EPA) [[32], [33], [34]] and for aromatic hydrocarbons (AHs) [35]. The PNEC is one of the ESTs that characterize the magnitude of the risk posed by a pollutant. Adverse effects of a pollutant are likely to occur when its exposure concentration exceeds the PNEC, especially after chronic or long-term exposure [36,37]. In this study, the quantitative relationship between PNECs and molecular structure was investigated and QSAR models were developed for direct PNEC estimation of specific PHDs, which could greatly improve the ecological risk assessment of PHDs.

Dietary exposure through accidental ingestion is an important exposure route for PHDs, especially for workers engaged in the oil industry, and it can lead to severe bioconcentration and biomagnification in organisms at higher trophic levels [38,39]. The present study focused on the risk assessment of PHDs under dietary exposure and developed reliable QSAR models for the PNEC estimation of PHDs. The quantitative relationship between PNECs and molecular structure from the developed QSAR models was investigated to understand the underlying toxicity mechanisms of PHDs. The specific steps were as follows: (1) All the existing acute and chronic toxicity concentrations for multiple toxicity endpoints of PHDs were collected from the US EPA-ECOTOX database and the current literature; (2) All the collected toxicity concentrations were used for the PNEC derivation of PHDs, using the assessment factor (AF) approach [[40], [41], [42]]; (3) All the PHDs with derived PNECs were selected as the specific PHD datasets for QSAR model development; (4) The molecular structures of the selected PHDs were quantified by a series of molecular descriptors; (5) Reliable QSAR models were developed based on the PNECs and the molecular descriptors; (6) The quantitative relationships between the molecular structure and the PNECs from the developed QSAR models were investigated to understand the toxicity mechanisms of PHDs.

2. Materials and methods

2.1. Toxicity data collection and screening

All the existing toxicity concentrations (tested using OECD toxicity test methods) of PHDs (mainly AHDs and ALKDs) to plants, animals, and microorganisms through dietary exposure were collected from the US EPA-ECOTOX database (https://cfpub.epa.gov/ecotox/search.cfm) and the current literature. These included acute toxicity concentrations (e.g., median effective concentration EC50, median lethal dose LD50) and chronic toxicity concentrations (e.g., no observed effect concentration NOEC, lowest observed effect concentration LOEC) for multiple endpoints such as morphology, growth, histology, and reproduction. Detailed chemical information, including the chemical names, abbreviations, CAS numbers and chemical formulas of these PHDs, is given in Table S1, and their molecular structures are shown in Fig. 1.

Fig. 1. The chemical structures of the petroleum hydrocarbons and derivatives.

The collected toxicity concentrations were first subjected to a preliminary screening. Toxicity concentrations that met the following principles were selected: (1) the toxicity concentrations were obtained using toxicity testing methods proposed by internationally recognized standard experimental guidelines; (2) the toxicity concentrations had a clear exposure time and exposure route; (3) for chronic toxicity concentrations, NOEC and NOEL (no observed effect level) were preferred, but L(E)C10 (the concentration causing a 10% effect within a specified time interval) was also considered; (4) for toxicity concentrations reported as a range, the minimum, mean, and maximum values were preferred. Then, the units of all the selected toxicity concentrations were uniformly converted into mg/kg. The selected toxicity concentrations were used for the subsequent PNEC estimation and QSAR model development.

2.2. PNECs estimation

PNECs are frequently employed as ESTs in the ecological risk assessment of pollutants for both aquatic and terrestrial ecosystems [43]. In this study, the PNECs of PHDs were estimated by dividing the acute toxicity concentrations (LD50/EC50) or chronic toxicity concentrations (NOEC/NOEL) by an appropriate assessment factor (AF), as described by Okonski et al. [44]. If NOEC/NOEL values were lacking, the L(E)C10, half of the LOEC, or half of the LOEL (lowest observed effect level) was used instead, as recommended by the U.S. EPA [45]. The AF was determined by the number of acute and chronic toxicity concentrations available from different trophic levels (Table 1), according to the method described by Finizio et al. [37]. All the collected toxicity concentrations were used for the PNEC derivation of PHDs. PNECs were derived for the PHDs with acute toxicity concentrations across three trophic levels, or with chronic toxicity concentrations for at least one trophic level, according to the criteria for PNEC derivation proposed by the US EPA [45].

Table 1. The assessment factors used for deriving the PNECs of PHDs to the ecosystems.

Data set | Assessment factor
At least one short-term L(E)C50 from each of three taxonomic groups | 1000
One long-term EC10 or NOEC from species representing one taxonomic group | 100
Two long-term results (e.g., EC10 or NOECs) from species representing two taxonomic groups | 50
Long-term results (e.g., EC10 or NOECs) from at least three species representing three taxonomic groups | 10
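The assessment factor approach of Section 2.2 can be sketched in a few lines. This is a minimal illustration, not the authors' workflow: the dictionary keys, the function names, and the example NOEC and LOEC values are hypothetical, and only the four factors themselves come from Table 1.

```python
# Assessment factors from Table 1. The keys are hypothetical labels
# introduced here for illustration; only the numbers come from the table.
ASSESSMENT_FACTORS = {
    "acute_three_groups": 1000,   # short-term L(E)C50 from each of three taxonomic groups
    "chronic_one_group": 100,     # one long-term EC10/NOEC, one taxonomic group
    "chronic_two_groups": 50,     # two long-term results, two taxonomic groups
    "chronic_three_groups": 10,   # long-term results, three taxonomic groups
}


def derive_pnec(toxicity_mg_per_kg, data_situation):
    """Divide a selected toxicity concentration (mg/kg) by its assessment factor."""
    return toxicity_mg_per_kg / ASSESSMENT_FACTORS[data_situation]


def noec_surrogate_from_loec(loec_mg_per_kg):
    """Half of the LOEC (or LOEL), used when no NOEC/NOEL is available."""
    return loec_mg_per_kg / 2.0


# Illustrative values only (not taken from Tables S8-S9).
pnec_from_noec = derive_pnec(25.0, "chronic_one_group")
pnec_from_loec = derive_pnec(noec_surrogate_from_loec(10.0), "chronic_two_groups")
print(pnec_from_noec, pnec_from_loec)
```

Which toxicity value is passed in when several are available for a chemical is left to the caller, since the selection rule is not spelled out in this section.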

2.3. Quantification of molecular structure

The molecular structure of the PHDs was quantified by multiple molecular descriptors in this study. The PHD molecules were first drawn in ChemDraw 2D software and then optimized to their stable, minimum-energy three-dimensional structures in Chem3D software. The optimized molecular structures were finally used to obtain a variety of molecular descriptors that describe different aspects of the molecular structure, using PADEL software and ORCA software at the B3LYP/6-311++G(d,p) level based on Density Functional Theory. The octanol-water partition coefficient (LogKow) of the PHDs was obtained from EPIWEB4.1. As shown in Table S2, 1444 two-dimensional molecular descriptors and 27 three-dimensional molecular descriptors were obtained to characterize the molecular structure of the PHDs, including 1 hydrophobic descriptor (LogKow), 17 electronic descriptors (e.g., EHOMO, qH+, μ, αxx), 1 steric descriptor (Vm), 8 thermodynamic descriptors (e.g., Eth, CV, Gθ), 489 electrotopological state atom type descriptors (e.g., nHBint8, Shother, maxssssPb), 346 two-dimensional autocorrelation descriptors (e.g., AATSC0m, MATS1c, GATS1c), 96 Burden modified eigenvalues descriptors (e.g., SpMax2_Bhm, SpMin6_Bhv), 91 Barysz matrix descriptors (e.g., SpAbs_DzZ, SpMAD_Dzm, SpAbs_Dze), and 67 ring count descriptors (e.g., nF8HeteroRing, n4HeteroRing, n3Ring).

Table 2. Biological species involved in the PNECs estimation and the estimated PNECs.

Columns: PHDs group; Chemical name; Acute toxicity test species (Aquatic, Terrestrial); Chronic toxicity test species (Aquatic, Terrestrial); Assessment factor; PNECs (mg/kg)
PAHDs Naphthalene Colinus virginianus Colinus virginianus, Mus musculus, Oryctolagus cuniculus, Rattus norvegicus 10 0.25
2-Phenylphenol Anas platyrhynchos, Colinus virginianus Rattus norvegicus, Anas platyrhynchos, Colinus virginianus 100 1
9H-Fluoren-9-One Rattus norvegicus Rattus norvegicus 100 0.75
Phenanthrene Neanthes arenaceodentata Neanthes arenaceodentata, Platichthys flesus Porcellio scaber, Mus musculus, Mesocricetus auratus, Rattus norvegicus 10 0.0089
9H-Fluorene Oniscus asellus, Rattus norvegicus 50 0.0044
Fluoranthene Nereis virens, Porcellio scaber, Rattus norvegicus, Mus musculus 10 5
Chrysene Platichthys flesus Drosophila melanogaster 50 0.0018
Benz [a]anthracene Oniscus asellus, Porcellio scaber 100 0.0096
Pyrene Acheta domesticus, Orchesella cincta, Mus musculus, Rattus norvegicus 50 0.35
Benzo [a]pyrene Fundulus heteroclitus, Oniscus asellus, Rattus norvegicus, Gallus gallus, 10 0.074
MAHDs Benzene Drosophila melanogaster, Mus musculus 50 40
Phenol Oncorhynchus mykiss Rattus norvegicus 100 0.4
Resorcinol Mus musculus, Rattus norvegicus 100 0.5
Benzyl Alcohol Drosophila melanogaster, Mus musculus 50 25
Toluene Mus musculus 100 10
P-Cresol Mouse,Rat Drosophila melanogaster, Mus musculus, Oryctolagus cuniculus, Rattus norvegicus 100 0.05
O-cresol Mustela putorius, Neovison vison Drosophila melanogaster, Mustela putorius, Neovison vison 50 0.1
M-cresol Oryctolagus cuniculus, Rattus norvegicus 100 0.05
P-Xylene Rattus norvegicus 100 10
M-Xylene Rattus norvegicus 100 10
O-Xylene Rattus norvegicus 100 10
Butyl 4-Hydroxybenzoate Oncorhynchus mykiss Oncorhynchus mykiss Mus musculus 50 0.086
4-Tert-Octylphenol Oryzias latipes Platichthys flesus 100 0.5
ALKDs 1,2-Ethanediol Mus musculus 100 110.9
Acrolein Anas platyrhynchos, Colinus virginianus, Mus musculus Mus musculus, Oryctolagus cuniculus, Rattus norvegicus, Canis familiaris 50 0.001
Allyl Alcohol Mus musculus, Rattus norvegicus 100 0.03
Isopropyl Alcohol Rattus norvegicus 100 6.01
Acrylic Acid Rattus norvegicus 100 0.27
2-Methoxyethanol Drosophila melanogaster, Mus musculus, Rattus norvegicus 50 6
Ethyl acetate Ostrinia nubilalis 100 50
1-Butanol Mouse, Hamster, Bird, Dog Rattus norvegicus 100 1.25
Tert-Butanol Mus musculus, Rattus norvegicus 100 7.41
Methyl acrylate Drosophila melanogaster 100 5
1,2-Dimethoxyethane Mus musculus Mus musculus 100 20
2-Ethoxyethanol Drosophila melanogaster, Mus musculus 50 72.1
1-Pentanol Rattus norvegicus 100 1.4
Glutaraldehyde Anas platyrhynchos, Colinus virginianus Drosophila melanogaster, Anas platyrhynchos, Colinus virginianus 50 4.3
4-Methyl-2-pentanone Rattus norvegicus Rattus norvegicus 100 10
1-Hexanol Rat Rat 100 11.27
2-Ethylhexan-1-Ol Mice, Rabbits, Guinea pigs, Rattus norvegicus Rattus norvegicus 100 0.5
1-Octanol Rattus norvegicus Rattus norvegicus 100 13
Octanoic acid Heterobothrium okamotoi, Pagrus major, Takifugu rubripes Rattus norvegicus 10 3.75
Triglyme Mus musculus Mus musculus 100 35
Nonanoic acid Mice, Colinus virginianus Rattus norvegicus 100 15
1-Undecanol Rattus norvegicus Male rats 100 20
Undecane Rattus norvegicus 100 1
1-Dodecanol Rattus norvegicus, Male rats 100 20
1-Decanol Rattus norvegicus Rattus norvegicus 100 15.83
1-Eicosanol Rattus norvegicus 100 29.85
Heneicosane Wistar rats 100 10
Cis-9-Tricosene Colinus virginianus, Anas platyrhynchos, Oryctolagus cuniculus, Rattus norvegicus Anas platyrhynchos 100 0.001

2.4. QSAR model development

Before QSAR model development, a selection process was applied to all the obtained molecular descriptors to avoid over-fitting in the QSAR modelling. The selection was conducted as follows. First, the molecular descriptors with missing values were manually excluded. Then, the remaining molecular descriptors were imported into SPSS26 software to analyze the Pearson correlation coefficients between the molecular descriptors. Highly correlated molecular descriptors, with an absolute Pearson correlation coefficient higher than 0.95, were removed to eliminate multicollinearity [46,47]. After this selection process, a total of 488 molecular descriptors were left for subsequent QSAR modelling.
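The descriptor pre-selection described above was done in SPSS26; the following NumPy sketch only illustrates the same idea. The greedy rule (keep the first descriptor of a correlated pair) is an assumption, because the text does not state which member of a pair was removed, and the function name and toy data are hypothetical.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.95):
    """Drop descriptors with missing values, then greedily drop descriptors
    whose |Pearson r| with an already kept descriptor exceeds the threshold.
    Assumption: the first descriptor of a correlated pair is the one kept."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=0)          # columns without missing values
    X = X[:, complete]
    names = [n for n, ok in zip(names, complete) if ok]
    corr = np.abs(np.corrcoef(X, rowvar=False))  # descriptor-by-descriptor |r|
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in kept):
            kept.append(j)
    return X[:, kept], [names[j] for j in kept]


# Toy data: B is a linear function of A, D has a missing value.
rng = np.random.default_rng(0)
a = rng.normal(size=30)
b = 2.0 * a + 1.0
c = rng.normal(size=30)
d = rng.normal(size=30)
d[3] = np.nan
X_kept, kept_names = drop_correlated(np.column_stack([a, b, c, d]), ["A", "B", "C", "D"])
print(kept_names)
```

Constant descriptors have an undefined correlation and would need to be removed beforehand in a real descriptor table.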

The QSAR model was developed with -logPNEC as the dependent variable and the molecular descriptors as the independent variables, using multiple linear regression (MLR) in SPSS26 software, in line with the OECD QSAR principle of “an unambiguous algorithm”. MLR was performed stepwise until the significance tests were passed (P < 0.05), which identified the most important molecular descriptors for -logPNEC. Based on the results of the stepwise regression, a preliminary QSAR model was developed. Highly reliable QSAR models are characterized by a high adjusted multiple correlation coefficient (R2 > 0.6) and an acceptable multicollinearity diagnosis (variance inflation factor, VIF < 10). Ordinary least squares was used to eliminate insignificant molecular descriptors by means of the F-test and t-test. If the F-test or t-test was not passed (P > 0.05) or if R2 was small (<0.6), the regression was re-performed [48,49]. If VIF > 10, the principal components of the variables were extracted and the regression analyses were re-performed to eliminate collinearity. The specific formulas for the above parameters are shown in Table S3.
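The VIF < 10 criterion can be illustrated with a short NumPy sketch. It uses the standard identity that the VIF of descriptor j, 1/(1 − R_j²), equals the j-th diagonal element of the inverse correlation matrix; the formulas actually used by the authors are in Table S3 and are not reproduced here, and the toy descriptors below are synthetic.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), computed as the diagonal of the inverse
    of the descriptor correlation matrix (standard identity)."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + x2 + 0.05 * rng.normal(size=200)   # nearly a linear combination of x1 and x2

vif_ok = variance_inflation_factors(np.column_stack([x1, x2]))
vif_bad = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vif_ok, vif_bad)   # the second set violates the VIF < 10 criterion
```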

Double cross validation (internal and external validation) was performed on the preliminary QSAR model, in line with the OECD QSAR principle of “appropriate measures of goodness-of-fit, robustness and predictivity” [49]. The training sets and validation sets were randomly selected in an approximate ratio of 4:1 from the PHDs dataset, the AHDs dataset and the ALKDs dataset, respectively. Then, each training set was internally validated by leave-one-out (LOO) cross-validation to assess the internal robustness of the three separate QSAR models for PHDs, AHDs and ALKDs. An internal validation coefficient Q2LOO exceeding 0.5 indicates that the developed QSAR model has good robustness [50,51]. Q2LOO is calculated using formula (1). The external validation coefficients Q2F1 and Q2F2 of the validation set were used to assess the predictive performance of the model. Q2F1 and Q2F2 both exceeding 0.5 indicates that the developed QSAR model has good external prediction ability [51]. Q2F1 and Q2F2 are calculated using formulas (2) and (3), respectively.

$$Q^2_{LOO} = 1 - \frac{\sum_{i=1}^{n_{training}} \left(y_i^{exp} - y_i^{pred}\right)^2}{\sum_{i=1}^{n_{training}} \left(y_i^{exp} - \bar{y}\right)^2} \tag{1}$$

where $y^{exp}$ and $y^{pred}$ are the estimated and predicted -logPNEC of the training set, $\bar{y}$ is the average -logPNEC of the training set, and $n_{training}$ is the number of chemicals in the training set.
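A minimal sketch of how Eq. (1) can be evaluated for an MLR model is given below, assuming that each chemical is left out in turn, the regression is refitted on the remaining training chemicals, and the left-out value is predicted. The function name and the synthetic data are hypothetical; the study itself used SPSS26.

```python
import numpy as np


def q2_loo(X, y):
    """Leave-one-out Q2 of Eq. (1) for ordinary least squares with an intercept."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i                       # leave chemical i out
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2             # squared LOO prediction error
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


x = np.arange(10.0)
q2_exact = q2_loo(x, 2.0 * x + 1.0)                    # noise-free line
rng = np.random.default_rng(0)
q2_noisy = q2_loo(x, 2.0 * x + 1.0 + rng.normal(scale=0.5, size=10))
print(q2_exact, q2_noisy)
```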

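$$Q^2_{F1} = 1 - \frac{\sum_{i=1}^{n_{test}} \left(y_i^{exp} - y_i^{pred}\right)^2}{\sum_{i=1}^{n_{test}} \left(y_i^{exp} - \bar{y}_{training}\right)^2} \tag{2}$$

$$Q^2_{F2} = 1 - \frac{\sum_{i=1}^{n_{test}} \left(y_i^{exp} - y_i^{pred}\right)^2}{\sum_{i=1}^{n_{test}} \left(y_i^{exp} - \bar{y}_{test}\right)^2} \tag{3}$$

where $y^{exp}$ and $y^{pred}$ are the estimated and predicted -logPNEC of the validation set, $\bar{y}_{training}$ and $\bar{y}_{test}$ are the average -logPNEC of the training set and the validation set, and $n_{test}$ is the number of chemicals in the validation set.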
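Equations (2) and (3) differ only in the reference mean used in the denominator, which the following sketch makes explicit. The function names and the three-chemical example are hypothetical and chosen so the arithmetic can be checked by hand.

```python
import numpy as np


def q2_f1(y_test, y_pred, y_train_mean):
    """Eq. (2): the denominator uses the training-set mean of -logPNEC."""
    y_test, y_pred = np.asarray(y_test, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_train_mean) ** 2)


def q2_f2(y_test, y_pred):
    """Eq. (3): the denominator uses the validation-set mean of -logPNEC."""
    y_test, y_pred = np.asarray(y_test, float), np.asarray(y_pred, float)
    return 1.0 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)


# Illustrative values: squared errors sum to 0.06.
y_test = [1.0, 2.0, 3.0]
y_pred = [1.1, 1.9, 3.2]
f1 = q2_f1(y_test, y_pred, y_train_mean=1.5)   # 1 - 0.06 / 2.75
f2 = q2_f2(y_test, y_pred)                     # 1 - 0.06 / 2.00
print(f1, f2)
```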

The QSAR models that passed the internal and external validations were adopted for the PNEC estimation in this study. The -logPNEC (L) is described by the optimal combination of influential molecular descriptors (X1, X2, …, Xn) used as independent variables, as given in formula (4):

$$L = K_1 \cdot X_1 + K_2 \cdot X_2 + \dots + K_n \cdot X_n + K_0 \tag{4}$$

where L represents the dependent variable -logPNEC, X1, …, Xn denote the independent variables (the molecular descriptors), K1, …, Kn are the unstandardized coefficients of the independent variables, and K0 is the constant term.
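To make the use of formula (4) concrete, the sketch below evaluates L for one chemical and converts it back to a PNEC. The coefficient and descriptor values are invented for illustration (the fitted equations are in Fig. 3), the descriptor names are reused only as labels, and a base-10 logarithm is assumed, which matches the -logPNEC range reported for Table 2.

```python
def predict_pnec(descriptors, coefficients, intercept):
    """Evaluate L = K1*X1 + ... + Kn*Xn + K0 and back-transform to a PNEC.
    Assumes -logPNEC uses a base-10 logarithm, so PNEC = 10 ** (-L)."""
    L = intercept + sum(k * descriptors[name] for name, k in coefficients.items())
    return L, 10.0 ** (-L)


# Hypothetical coefficients and descriptor values, not the fitted model.
coefficients = {"AATSC2m": 0.5, "IC2": -1.0}
descriptors = {"AATSC2m": 2.0, "IC2": 0.5}
L_value, pnec = predict_pnec(descriptors, coefficients, intercept=0.5)
print(L_value, pnec)   # L = 0.5 + 1.0 - 0.5 = 1.0, so PNEC = 0.1 mg/kg
```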

2.5. Application domain analysis of the QSAR model

In general, QSAR model development has its own limitations owing to factors such as the restricted sample number and the model algorithm [52]. The application domain (AD), defined as the range of chemicals to which the developed models can be applied, is usually analyzed through outlier detection. In this study, a well-defined application domain for each QSAR model was determined by identifying structural outliers and predicted outliers. The developed QSAR models cannot reliably estimate the PNECs of chemicals outside the application domain. Structural outliers in the training set and validation set were identified by the hat value (h) using the leverage approach [53], following formula (5). The warning leverage (h*) was calculated using formula (6). Chemicals with a hat value (h) higher than the warning leverage (h*) were considered structural outliers.

$$h = x_i \left(X^{T} X\right)^{-1} x_i^{T} \tag{5}$$

$$h^{*} = 3(k + 1)/n \tag{6}$$

where $x_i$ is the molecular descriptor vector of the ith chemical, X represents the matrix of molecular descriptors, k is the number of molecular descriptors, and n is the number of chemicals in the training set.
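Formulas (5) and (6) translate directly into a few lines of NumPy. In this sketch X is taken as the training descriptor matrix exactly as written in formula (5); whether a column of ones for the intercept should be appended is not stated in the text and is left to the reader. The function names and random data are hypothetical.

```python
import numpy as np


def hat_values(X_train, X_query):
    """h = x (X^T X)^{-1} x^T for each row x of X_query, with X the training matrix."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    gram_inv = np.linalg.pinv(X_train.T @ X_train)
    return np.einsum("ij,jk,ik->i", X_query, gram_inv, X_query)


def warning_leverage(k, n):
    """h* = 3(k + 1)/n, with k descriptors and n training chemicals."""
    return 3.0 * (k + 1) / n


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))            # 20 chemicals, 3 descriptors (toy data)
h_train = hat_values(X_train, X_train)
h_star = warning_leverage(k=3, n=20)
structural_outliers = h_train > h_star
print(h_star, structural_outliers.sum())
```

For training chemicals the hat values sum to the number of columns of X, which gives a quick sanity check on an implementation.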

The chemicals with predicted outliers were identified according to the standardized residuals between the estimated and predicted -log PNEC of the chemicals. The standardized residual (δ) was calculated as formula (7). The chemicals that exceed the threshold of the standardized residuals (from −3 to 3) were considered as predicted outliers [46,49].

$$\delta = \frac{y - y_{pred}}{\sqrt{\dfrac{\sum_{i=1}^{n} \left(y - y_{pred}\right)^2}{n - k - 1}}} \tag{7}$$

where y and ypred are the estimated and predicted -log PNEC of the training set and validation set, n is the number of chemicals in the training set, k is the number of molecular descriptors.
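A possible reading of formula (7) is sketched below: the scale in the denominator is computed once from the training-set residuals and then applied to the residual of any chemical, from either the training or the validation set. This reading, the function name, and the residual values are assumptions made for illustration.

```python
import numpy as np


def standardized_residuals(residuals, train_residuals, k):
    """Formula (7): residual divided by sqrt(sum of squared training residuals / (n - k - 1))."""
    train_residuals = np.asarray(train_residuals, dtype=float)
    n = len(train_residuals)
    scale = np.sqrt(np.sum(train_residuals ** 2) / (n - k - 1))
    return np.asarray(residuals, dtype=float) / scale


# Toy residuals (estimated minus predicted -logPNEC) for a one-descriptor model.
train_res = np.array([0.1, -0.1, 0.2, -0.2, 0.0, 0.1])
delta = standardized_residuals(train_res, train_res, k=1)
predicted_outliers = np.abs(delta) > 3        # threshold from the text: -3 to 3
print(delta, predicted_outliers.any())
```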

Then, the standardized residuals were plotted against the hat values of the chemicals in both the training set and the validation set to visualize the outliers and establish the application domain of the developed QSAR models, according to the distance-based methods [54,55].

2.6. Quantitative relationship between PNECs and molecular structure

The quantitative relationship between PNECs and molecular structure was described by the standardized coefficients of the molecular descriptors involved in the developed QSAR models. The standardized coefficient of a molecular descriptor indicates the weight of its influence on the PNECs. The procedure was as follows. First, K1, …, Kn in formula (4), the unstandardized coefficients of the 1st to nth influential molecular descriptors, were converted to the corresponding standardized coefficients according to formula (8).

Ki* = Ki/(SL/SXi) (8)

where Ki* is the standardized coefficient of a molecular descriptor, Ki is the unstandardized coefficient of the molecular descriptor, SL is the standard deviation of the dependent variable L, and SXi is the standard deviation of the independent variable Xi.

Then, the influencing weights of the molecular descriptors on the PNECs (Wi) were calculated using formula (9), based on their standardized coefficients.

Wi (%) = Ki*/ (K1* + K2* + … + Kn*) *100% (9)

where K1*, …,Kn* represent the standardized coefficient of the 1st-nth molecular descriptor.
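Formulas (8) and (9) can be sketched as follows. One point is an assumption: formula (9) as printed sums the signed standardized coefficients, which can cancel when descriptors have opposite signs, so the sketch takes absolute values by default and exposes a flag to follow the printed form literally. The function names and toy data are hypothetical.

```python
import numpy as np


def standardized_coefficients(K, X, L):
    """Formula (8): Ki* = Ki / (S_L / S_Xi), using sample standard deviations."""
    s_x = np.std(np.asarray(X, dtype=float), axis=0, ddof=1)
    s_l = np.std(np.asarray(L, dtype=float), ddof=1)
    return np.asarray(K, dtype=float) / (s_l / s_x)


def influence_weights(K_std, use_abs=True):
    """Formula (9) in percent. use_abs=True is an assumption made here so that
    coefficients of opposite sign do not cancel; use_abs=False follows the
    formula exactly as printed."""
    v = np.abs(K_std) if use_abs else np.asarray(K_std, dtype=float)
    return v / v.sum() * 100.0


# Toy data: two descriptors with standard deviations 1 and 4, L with standard deviation 2.
X = np.array([[0.0, 0.0], [1.0, 4.0], [2.0, 8.0]])
L = np.array([0.0, 2.0, 4.0])
K_std = standardized_coefficients([2.0, -1.0], X, L)   # -> [1, -2]
weights = influence_weights(K_std)                      # -> about [33.3, 66.7] %
print(K_std, weights)
```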

3. Results and discussion

3.1. The molecular structural information of PHDs

The molecular structures of the PHDs used for QSAR model development are shown in Fig. 1, and additional chemical information is provided in Table S1. The PHDs comprised 45.1% AHDs, 47.1% alkenes and derivatives, and 7.8% alkanes and derivatives, indicating that the 51 PHDs selected in this study covered a wide range of molecular structures (Fig. 2a). Multiple molecular descriptors, including hydrophobic descriptors, electronic descriptors, steric descriptors, two-dimensional autocorrelation descriptors, and information index descriptors (Table S2), were used to characterize different aspects of the molecular structural information of these PHDs and to provide a detailed description of their structural features. The large variations in the molecular descriptors also supported that the 51 PHDs covered a wide variety of molecular structures with quite different structural properties (Tables S4–S7). For example, AHDs showed a stronger ability to gain and lose electrons than ALKDs, as reflected by the higher q (an electronic descriptor that describes the ability of chemicals to gain or lose electrons [56]) of AHDs (ranging from −0.16 to −2.04) than of ALKDs. AHDs have stronger hydrophobicity than ALKDs, as evidenced by the higher MLOGP (a two-dimensional molecular descriptor that describes chemical hydrophobicity [57]) of AHDs (ranging from 1.9 to 3.66) than of ALKDs. The variation of the information index descriptors (e.g., IC0–IC4, TIC0–TIC4, SIC0–SIC4, CIC0–CIC5, BIC0–BIC4, MIC0–MIC4, ZMIC0–ZMIC4) of AHDs (changed from 1.53 to 12.4 times) was also much larger than that of ALKDs (varied from 1.70 to 31.4 times).

Fig. 2. (a) The -logPNEC concentrations of different PHDs; (b) The distribution of the -logPNEC concentrations of the PHDs for the three QSAR models; (c) The -logPNEC concentrations of the PHDs in the individual QSAR model.

3.2. The PNECs of PHDs

As presented in Table 2, the PNECs of 51 PHDs were obtained based on the acute or chronic toxicity concentrations (e.g., LD50, NOEC) of PHDs to various vertebrates (e.g., rodents, rabbits, pigs, fish, and reptiles) and invertebrates (e.g., worms, arthropods). The toxicity concentrations used for the PNEC estimation are summarized in detail in Tables S8–S9. The -logPNEC values of the PHDs followed a normal distribution (Fig. 2b) and spanned about five logarithmic units (from −2.04 to 3, Fig. 2a), indicating a large difference in toxicity among these PHDs. Thus, the QSAR models based on these representative PHDs with wide-ranging toxicity can be better applied to the toxicity estimation of diverse PHDs.
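As a quick check of the quoted range, the extreme PNECs in Table 2 (110.9 mg/kg for 1,2-ethanediol; 0.001 mg/kg for acrolein and cis-9-tricosene) can be converted to -logPNEC, assuming a base-10 logarithm:

```latex
\begin{aligned}
-\log_{10}(110.9) &\approx -2.04,\\
-\log_{10}(0.001) &= 3,\\
3 - (-2.04) &\approx 5.04 \ \text{log units}.
\end{aligned}
```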

The results showed that the PNECs varied significantly (0.001–110.9 mg/kg) with the type of PHDs, with toxicity following the order of polycyclic aromatic hydrocarbons and derivatives (PAHDs) > monocyclic aromatic hydrocarbons and derivatives (MAHDs) > ALKDs. The PNECs of AHDs were significantly lower than those of ALKDs, indicating higher toxicity of AHDs than ALKDs, which was in agreement with the toxicity investigation of aromatic hydrocarbons and long-chain n-alkanes [58]. In this study, the 22 PHDs of higher toxicity (-logPNEC > 0) comprised 16 AHDs and 6 ALKDs, whereas the 29 PHDs of lower toxicity (-logPNEC < 0) comprised 7 AHDs and 22 ALKDs. Thus, a larger proportion of the investigated AHDs showed higher toxicity, and a larger proportion of the studied ALKDs showed lower toxicity (Fig. 2a). Among the AHDs, the PNECs of PAHDs (ranging from 0.001824 to 5) were significantly lower than those of MAHDs (ranging from 0.05 to 40), indicating a higher toxicity of PAHDs (Table 2), which was consistent with previous findings on the hazardous concentration for 5% of species (HC5) for AHDs [59].

3.3. The developed QSAR models

In the present study, three QSAR models were separately developed for all the PHDs (model 1, 51 chemicals), the AHDs (model 2, 23 chemicals), and the ALKDs (model 3, 28 chemicals), following the procedures of the OECD QSAR guidelines. The model equations and the performance of the three QSAR models are shown in Fig. 3, and the molecular descriptor descriptions are given in Table S10. All three models showed good fitting performance, as evidenced by high R2 (0.89, 0.91, 0.95) and low RMSE values (0.13, 0.19, 0.2). The closer R2 is to 1, the better the goodness of fit of the developed QSAR model. The three models are internally robust and stable, as reflected by the internal validation parameter Q2LOO (0.76, 0.66, 0.89), and showed good external prediction capability, as reflected by the external validation coefficients (Q2F1: 0.62, 0.78, 0.72; Q2F2: 0.6, 0.73, 0.66). All of the model parameters exceeded the acceptable thresholds of the OECD QSAR development requirements (R2 > 0.6; Q2LOO > 0.5; Q2F1 > 0.5; Q2F2 > 0.5) [48,51].

Fig. 3. The model equations and parameters of the three developed QSAR models and the influencing weight of the molecular descriptors on the PNECs of PHDs.

The predictive performance of the developed QSAR models was visualized by comparing the experimental and predicted -logPNEC of the PHDs in the training set and validation set (Fig. 4). The R2 of the fitted regression lines for the three models were as high as 0.93, 0.96 and 0.96, respectively, indicating a high degree of fit for the developed models. The experimental and predicted -logPNEC values in both the training set and the validation set were evenly distributed on both sides of the regression line and were in very good agreement with each other, supporting a high prediction accuracy of these models. Relatively small residuals between the predicted and experimental -logPNEC were observed for the three models (0.01–1.11, 0.02–0.83, and 0–0.96, respectively; Table S11), indicating no systematic errors in the three developed models. Comparatively, the sum of squared residuals in model 1 (6.38) was much larger than that in model 2 (2.563), suggesting that model 2 is the more accurate for estimating the PNECs of AHDs. Similarly, the sum of squared residuals in model 1 (3.82) was slightly larger than that in model 3 (3.40), suggesting that model 3 estimates the PNECs of ALKDs with higher accuracy.

Fig. 4. The application domain range analysis and the predictive performance of the three developed QSAR models.

The application domain of the developed QSAR models was defined and visualized via a Williams plot (hi < h*, −3 < standardized residuals < 3) based on the standardized residuals versus the hat values (h) of the PHDs in both the training set and the validation set (Fig. 4). The h values of the PHDs in both the training set and the validation set were below the respective warning leverages (h* = 1.256, 1.895 and 0.955) of the three models, indicating that no structural outliers existed in these models. The standardized residuals of all the PHDs in the three models did not exceed the standardized residual threshold (from −3 to 3), indicating no predicted outliers in any of the developed models. Therefore, all the PHDs involved in the three developed models are within the application domain, and the three QSAR models can be used reliably to estimate the PNECs of PHDs, AHDs, and ALKDs, respectively.

3.4. The QSAR model accuracy in the PNEC estimation

In this study, model 2 and model 3 were recommended for estimating the PNECs of AHDs and ALKDs, respectively. Three aromatic hydrocarbons (Naphthalene, 1-Methylnaphthalene, and Pyrene) and two alkanes (n-Decane and n-Heptane) that fell within the application domain (Fig. 5a) and were not included in the previous QSAR modelling were used to verify the accuracy of the two models, by comparing the PNECs estimated by the developed models with the regulatory limits of these PHDs published by international authoritative environmental protection organizations. The estimated PNECs of Naphthalene, 1-Methylnaphthalene, and Pyrene (0.455, 0.044, and 0.255 mg/kg) were significantly lower than those of n-Decane and n-Heptane (2.771 and 4.70 mg/kg). The estimated PNECs of the three aromatic hydrocarbons and two alkanes were consistent with their toxicity. As shown in Fig. 5b, the estimated PNEC of naphthalene was compared with the peer-reviewed toxicity concentrations published by the European Food Safety Authority (EFSA), and the estimated PNECs of 1-Methylnaphthalene, Pyrene, n-Decane and n-Heptane were compared with the proposed safety limits published by the Environmental Protection Agency (EPA). The PNECs of these PHDs estimated by the developed models differed from the published regulatory limits by 0.07–0.36 log units. These results supported that the developed models estimate PNECs with high accuracy for PHDs with diverse molecular structures.

Fig. 5.

(a) The defined application domain ranges of the models; (b) Comparison of the estimated PNECs and the proposed safety limits of the PHDs.

3.5. The quantitative structure-PNECs relationship

The quantitative relationships between the molecular structure and the PNECs were obtained directly from the developed QSAR models (Fig. 4). A total of 17, 11 and 6 molecular descriptors were related to the PNECs of all the PHDs (model 1), the AHDs (model 2), and the ALKDs (model 3), respectively. The influence of the molecular descriptors on the logPNEC varied among the PHD groups. For all the PHDs, eight molecular descriptors (ATSC2v, ATSC4e, ATSC4i, AATSC1e, MATS2c, VE2_Dzs, SpMax3_Bhs, and ZMIC4) were positively correlated with the PNECs, whereas nine molecular descriptors (ATSC2s, AATSC2c, MATS4v, GATS3c, GATS3i, SpMin1_Bhp, SpMin6_Bhs, IC2 and VR3_D) were negatively correlated with the PNECs. For the AHDs, nine molecular descriptors (ATSC4c, AATSC2m, AATSC4p, MATS1p, GATS4c, GATS1s, VE3_DzZ, VE3_Dzs and CIC5) were positively correlated with the logPNEC, and two descriptors (ATSC2c and SpMin6_Bhi) were negatively correlated with it. For the ALKDs, the molecular descriptor AATSC2m was positively correlated with the PNECs, and five molecular descriptors (AATSC4v, BCUTp-1l, SpMin6_Bhe, SpMin8_Bhp and IC2) were negatively correlated with the PNECs.

All the influencing molecular descriptors are two-dimensional molecular descriptors, including autocorrelation descriptors, information index descriptors, Burden modified eigenvalues descriptors, Barysz matrix descriptors, and BCUT descriptors. The effects of these molecular descriptors on biotoxicity have been reported previously, and QSAR models built on them have been used to estimate the toxicity concentrations of some chemicals [54,60,61]. For instance, the autocorrelation descriptors (GATS7p, MATS1p, ATSC5v, MATS8e, ATSC2p, ATSC1m), the Burden modified eigenvalues descriptors (SpMax2_Bhp, SpMin4_Bhe, SpMin2_Bhs, SpMin1_Bhs), and the BCUT descriptor (BCUTw-1h) were observed to significantly affect the interspecies toxicity of 1,2,4-triazole compounds to mice [24]. The BCUT descriptors and the information index descriptors were identified as important molecular descriptors for the toxicity of alcohol compounds to Rana temporaria [62]. Three autocorrelation parameters (GATS5s, GATS1p, and ATSC7v) and the Barysz matrix descriptor (VE3_DzZ) were useful in estimating the acute toxicity to freshwater invertebrates of emerging contaminants, such as active ingredients and their metabolites, ingredients of cosmetic and personal care products, and pesticides and their transformation products [25]. The autocorrelation parameters (ATSC2e, MATS2v, ATSc2, MATS6s), the BCUT descriptor (BCUTw-1l), and the Burden modified eigenvalues descriptor (SpMax5_Bhs) were used to estimate the acute oral toxicity of PAHs to mammals with a two-dimensional parametric model built using genetic algorithms and multiple linear regression [31]. However, the quantitative relationships between the molecular structure and PNECs, and the application of these relationships to estimate the PNECs, are rarely reported.

As shown in Fig. 4, the influencing weight that each molecular descriptor contributed to the PNECs in each model was used to characterize the influence of the molecular descriptors on the PNECs. Across the three developed models, the autocorrelation descriptors (e.g., ATSC4i, MATS2c, GATS3c), information index descriptors (e.g., IC2, ZMIC4, CIC5), Burden modified eigenvalues descriptors (e.g., SpMax3_Bhs, SpMin1_Bhp), Barysz matrix descriptors (e.g., VR3_D, VE3_DzZ), and the BCUT descriptor (BCUTp-1l) accounted for 45.67%, 22.13%, 16.67%, 8.17%, and 7.47% of the weight of all the influencing molecular descriptors, respectively.
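The descriptor classes named in this section can be tallied directly from the descriptor lists of the three models. The sketch below is an illustration only: the prefix rules used to assign a class are an assumption made here for convenience, not the authors' procedure, and the spelling VE2_Dzs follows the Barysz list given in Section 3.6. It reproduces only the counts; the influencing weights cannot be recomputed from the text.

```python
from collections import Counter

# Descriptor names as listed for each model in Section 3.5.
MODEL_DESCRIPTORS = {
    "model 1 (PHDs)": [
        "ATSC2v", "ATSC4e", "ATSC4i", "AATSC1e", "MATS2c", "VE2_Dzs",
        "SpMax3_Bhs", "ZMIC4", "ATSC2s", "AATSC2c", "MATS4v", "GATS3c",
        "GATS3i", "SpMin1_Bhp", "SpMin6_Bhs", "IC2", "VR3_D",
    ],
    "model 2 (AHDs)": [
        "ATSC4c", "AATSC2m", "AATSC4p", "MATS1p", "GATS4c", "GATS1s",
        "VE3_DzZ", "VE3_Dzs", "CIC5", "ATSC2c", "SpMin6_Bhi",
    ],
    "model 3 (ALKDs)": [
        "AATSC2m", "AATSC4v", "BCUTp-1l", "SpMin6_Bhe", "SpMin8_Bhp", "IC2",
    ],
}


def descriptor_class(name: str) -> str:
    """Assign a descriptor class from the name prefix (illustrative rule)."""
    if name.startswith("BCUT"):
        return "BCUT"
    if name.startswith(("SpMax", "SpMin")):
        return "Burden modified eigenvalues"
    if name.startswith(("VE", "VR")):
        return "Barysz matrix"
    if name.startswith(("IC", "CIC", "ZMIC")):
        return "information index"
    if name.startswith(("AATSC", "ATSC", "MATS", "GATS")):
        return "autocorrelation"
    raise ValueError(f"unrecognized descriptor name: {name}")


all_terms = [d for names in MODEL_DESCRIPTORS.values() for d in names]
unique_names = set(all_terms)
class_counts = Counter(descriptor_class(d) for d in unique_names)

print(len(all_terms), "model terms,", len(unique_names), "distinct names")
for cls, count in sorted(class_counts.items()):
    print(f"{cls}: {count}")
```

The three lists contain 34 model terms (17 + 11 + 6). Because IC2 and AATSC2m each appear in two models, these correspond to 32 distinct names, split into 18 autocorrelation, 3 information index, 6 Burden modified eigenvalues, 4 Barysz matrix and 1 BCUT descriptor.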

3.6. The mechanism underlying the quantitative relationships

The quantitative relationships between the molecular structure and PNECs indicated that 34 two-dimensional molecular descriptors, including autocorrelation descriptors, information index descriptors, burden modified eigenvalues descriptors, barysz matrix descriptors, and BCUT descriptors, were associated with the toxicity of PHDs. The obtained results can provide insights into the underlying mechanisms for the effects of these molecular descriptors on the toxicity and PNECs of PHDs.

Autocorrelation descriptors are important molecular descriptors affecting the PNECs and biotoxicity of AHDs in this study. Among the 34 influencing molecular descriptors, 18 (ATSC2c, ATSC2s, ATSC2v, ATSC4e, ATSC4i, ATSC4c, AATSC4v, AATSC2c, AATSC1e, AATSC2m, AATSC4p, MATS4v, MATS1p, MATS2c, GATS3c, GATS3i, GATS4c, and GATS1s) were autocorrelation descriptors. A high proportion of the autocorrelation descriptors showed significant effects on the PNECs of PHDs. The autocorrelation descriptors that affected the PNECs of PHDs were mainly the mass (m), polarizability (p), van der Waals volume (v), first ionization potential (i), and state (s) weightings of the Broto-Moreau (ATS), Geary (GATS), and Moran (MATS) descriptors. These three types of autocorrelation descriptors characterize the structural conformation of chemical molecules [49,63], which has been found to be highly relevant to the aquatic toxicity of cosmetics and personal care additives [49]. These autocorrelation descriptors may affect the PNECs and toxicity by influencing the spatial conformation of PHDs. The specific weightings of the autocorrelation descriptors, such as the mass (m), polarizability (p), van der Waals volume (v), first ionization potential (i), and state (s), which characterize the functions and properties of atoms in a molecule, were also found to be important factors for the PNECs and toxicity of PHDs.

The polarizability, mass, and first ionization potential weightings of the autocorrelation descriptors serve as examples. AATSC4p and MATS1p are autocorrelation descriptors weighted by atomic polarizability, which describes the overall mobility of electrons and the reactivity of a chemical; they accounted for 10.4% and 5.9% of the weight of all the influencing molecular descriptors, respectively (Fig. 4). The strong influence of AATSC4p and MATS1p on the PNECs and toxicity of PHDs is probably attributable to the atomic polarizability. A chemical with a high polarizability usually does not readily cross biological membranes to accumulate in biological tissues, which generally results in low toxicity [25]. This is consistent with the finding in this study that AATSC4p and MATS1p are positively related to the PNEC and negatively related to the toxicity of PHDs. AATSC2m is an autocorrelation descriptor weighted by atomic mass that measures the strength of the relationship between the relative atomic masses of atom pairs; it accounted for 9.6% of the weight in Model 2 and 4.2% in Model 3. The positive correlation between AATSC2m and PNEC might be influenced by the atomic mass. A chemical with a greater molecular mass usually enters organisms and acts on the active site with more difficulty [64] and thus produces less toxic effects [65]. GATS3i is a 2D Geary autocorrelation descriptor weighted by the first ionization potential of atom pairs, describing the ionization potential of molecules with several carbon-carbon bonds; it accounted for 3.8% of the weight of all the influencing molecular descriptors. A molecule with a lower GATS3i value usually has a higher carbon content, which might lead to a higher toxic effect [66].
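As an illustration of what the autocorrelation descriptors measure, the sketch below computes the centered Broto-Moreau (ATSC), averaged centered Broto-Moreau (AATSC), Moran (MATS) and Geary (GATS) autocorrelations at a given topological lag on a small hydrogen-suppressed molecular graph. It follows the common textbook definitions; descriptor software may scale the atomic weights or treat hydrogens differently, so the values are not expected to match any particular program. The four-atom chain and its atomic weights are hypothetical. In the usual naming, the digit is the lag and the final letter the weighting, so AATSC4p would be the averaged centered autocorrelation at lag 4 weighted by polarizability.

```python
from collections import deque


def topological_distances(adjacency):
    """All-pairs shortest path lengths (in bonds) by breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[start][v] is None:
                    dist[start][v] = dist[start][u] + 1
                    queue.append(v)
    return dist


def autocorrelations(adjacency, weights, lag):
    """Return ATSC, AATSC, MATS and GATS at one lag for one atomic property."""
    n = len(weights)
    dist = topological_distances(adjacency)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if dist[i][j] == lag]
    if not pairs:
        raise ValueError("no atom pairs at this lag")
    mean = sum(weights) / n
    dev = [w - mean for w in weights]
    sum_sq = sum(d * d for d in dev)
    if sum_sq == 0:
        raise ValueError("all atomic weights are identical")
    atsc = sum(dev[i] * dev[j] for i, j in pairs)
    aatsc = atsc / len(pairs)
    mats = aatsc / (sum_sq / n)
    geary_num = sum((weights[i] - weights[j]) ** 2 for i, j in pairs)
    gats = (geary_num / (2 * len(pairs))) / (sum_sq / (n - 1))
    return {"ATSC": atsc, "AATSC": aatsc, "MATS": mats, "GATS": gats}


# Hypothetical four-atom chain 0-1-2-3 with made-up atomic property values.
chain = [[1], [0, 2], [1, 3], [2]]
toy_weights = [1.0, 2.0, 3.0, 4.0]
print(autocorrelations(chain, toy_weights, lag=1))
```

A positive MATS value means that atoms separated by the chosen lag tend to have property values on the same side of the molecular mean, whereas a GATS value below 1 indicates that such atom pairs have similar property values.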

Individual information index descriptors were observed to be the most influential molecular descriptors on the PNECs of PHDs in the present study. The complementary information content index of the neighborhood symmetry of order-5 (CIC5) and the information content index of the neighborhood symmetry of order-2 (IC2) showed the maximum influencing weights (17.7% and 31.5%) on the PNECs of AHDs and ALKDs, respectively. IC2 also made a strong negative contribution to the PNECs of all the PHDs, with an influencing weight of 9.2%. Many studies have focused on the relationship between the information index descriptors and toxicity; however, the correlation between the information index descriptors and the PNECs is still not clear. Taking IC2 as an example, IC2 primarily represents the topological features and information transfer capabilities of chemical molecules [67]. A higher IC2 indicates stronger information transfer among the atoms within the molecule and a higher molecular connectivity of a chemical. As a result, the chemical appears to exhibit a higher diffusion coefficient and a stronger interaction, and thus potentially presents a greater toxic effect [68]. Accordingly, a positive correlation between IC2 and biotoxicity was clearly observed in the PHDs with longer molecular topological distances.
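For orientation, the information content index of order r is commonly defined as the Shannon entropy of the partition of a molecule's atoms into equivalence classes by their order-r neighborhoods, and the complementary index as the gap between that entropy and its maximum. The sketch below assumes the partition is already known and takes only the class sizes as input; determining the classes from a structure is not shown, and the example class sizes are hypothetical.

```python
import math


def information_content(class_sizes):
    """IC = -sum(p_g * log2(p_g)) over atom equivalence classes."""
    n = sum(class_sizes)
    return -sum((g / n) * math.log2(g / n) for g in class_sizes)


def complementary_information_content(class_sizes):
    """CIC = log2(n) - IC, the distance from the maximum possible IC."""
    n = sum(class_sizes)
    return math.log2(n) - information_content(class_sizes)


# Hypothetical partitions of a four-atom molecule.
for sizes in ([4], [2, 2], [1, 1, 1, 1]):
    ic = information_content(sizes)
    cic = complementary_information_content(sizes)
    print(sizes, "IC =", round(ic, 3), "CIC =", round(cic, 3))
```

Under these definitions IC grows as the atoms fall into more, and more evenly populated, neighborhood classes, while CIC shrinks correspondingly.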

Burden modified eigenvalues descriptors and Barysz matrix descriptors, derived from the Burden and Barysz matrices, were also important in influencing the PNECs and toxicity of PHDs in this study. Six Burden modified eigenvalues descriptors (SpMax3_Bhs, SpMin1_Bhp, SpMin6_Bhs, SpMin6_Bhi, SpMin8_Bhp, SpMin6_Bhe) and four Barysz matrix descriptors (VE2_Dzs, VR3_D, VE3_DzZ, VE3_Dzs) accounted for 16.67% and 8.17% of the weight of all the influencing molecular descriptors, respectively. The two types of molecular descriptors are related to the molecular topological characteristics of chemicals, which are associated with the molecular size, the atomic number, and the content of some specific heteroatoms with a role in toxicity [25,29], and thus affected the PNECs and toxicity of PHDs.

The first lowest eigenvalue of the Burden matrix weighted by polarizability (BCUTp-1l) was a significant BCUT descriptor affecting the PNECs and toxicity of ALKDs in this study. BCUTp-1l was negatively related to the PNECs of ALKDs, contributing 22.4% of the weight in Model 3. Previous studies have reported that a high BCUTp-1l value reflects a higher polarizability, which describes the electron mobility and reactivity of a chemical, and thus results in higher activity and toxic effects on organisms [69]. This agrees with the results of this study.

List of the abbreviations used in this study and their definitions.

Abbreviations Definition
ESTs Environmental safety thresholds
QSAR Quantitative structure-activity relationship
PNECs Predicted no-effect concentrations
PHDs Petroleum hydrocarbons and their derivatives
AHDs Aromatic hydrocarbons and their derivatives
ALKDs Alkanes, Alkenes, and their derivatives
TPHs Total petroleum hydrocarbons
SSD Species sensitivity distribution
AF Assessment factor
AHs Aromatic hydrocarbons
EC50 Median effective concentration
LD50 Median lethal dose
NOEC No observed effect concentration
LOEC Lowest observed effect concentration
NOEL No observed effect level
L(E)C10 The concentration causing a 10% effect within a specified time interval
MLR Multiple linear regression
VIF Variance inflation factor
AD Application domain

4. Conclusions

  • (1) Three validated QSAR models with high accuracy in the estimation of PNECs were separately developed for PHDs, AHDs and ALKDs. The separate models developed for AHDs and ALKDs showed better performance in estimating the PNECs.
  • (2) The developed QSAR models showed a wide application domain, supporting a new cost-effective and reliable approach for directly estimating the PNECs of PHDs in ecological risk assessment.
  • (3) 34 two-dimensional molecular descriptors were observed to influence the PNECs of PHDs. Most of the involved molecular descriptors were autocorrelation descriptors, and individual information index descriptors contributed the highest weights among all the molecular descriptors influencing the PNECs of PHDs.
  • (4) The quantitative relationships between the molecular descriptors and PNECs provide new insights into the mechanisms by which the associated molecular descriptors affect the toxicity and PNECs of PHDs.

Data availability statement

The authors confirm that the data supporting the findings of this study are available within the manuscript and its supplementary materials.

CRediT authorship contribution statement

Jiajia Wei: Writing – original draft, Validation, Investigation, Formal analysis, Data curation, Conceptualization. Lei Tian: Writing – review & editing, Validation, Methodology, Investigation, Funding acquisition, Formal analysis. Fan Nie: Validation, Project administration, Methodology, Formal analysis. Zhiguo Shao: Validation, Supervision, Project administration, Methodology. Zhansheng Wang: Validation, Supervision, Project administration. Yu Xu: Validation, Project administration, Methodology, Formal analysis. Mei He: Writing – review & editing, Validation, Supervision, Project administration, Investigation, Funding acquisition, Formal analysis.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was supported by the CNPC Scientific Research and Technology Development Programme (2021DJ6605), and the Yangtze Talents Fund [2020–2023].

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2024.e26808.

Contributor Information

Zhansheng Wang, Email: wangzs@cnpc.com.cn.

Mei He, Email: hemei-521@163.com.

Appendix A. Supplementary data

The following is the Supplementary data to this article:

Multimedia component 1
mmc1.docx (179.5KB, docx)

References

  • 1. Wu B., Guo S., Wang J. Spatial ecological risk assessment for contaminated soil in oiled fields. J. Hazard Mater. 2021;403 doi: 10.1016/j.jhazmat.2020.123984.
  • 2. Nishiwaki J., Kawabe Y., Sakamoto Y., et al. Volatilization properties of gasoline components in soils. Environ. Earth Sci. 2011;63:87–95. doi: 10.1007/s12665-010-0671-7.
  • 3. Wu B., Guo S., Zhang L., et al. Spatial variation of residual total petroleum hydrocarbons and ecological risk in oilfield soils. Chemosphere. 2021;291 doi: 10.1016/j.chemosphere.2021.132916.
  • 4. Liu Q., Xia C., Wang L., et al. Fingerprint analysis reveals sources of petroleum hydrocarbons in soils of different geographical oilfields of China and its ecological assessment. Sci. Rep. 2022;12(1):4808. doi: 10.1038/s41598-022-08906-6.
  • 5. Andrade-Couce M., Marcet P., Fernández-Feal L., et al. Impact of the Prestige oil spill marsh soils: relationship between heavy metal, sulfide and total petroleum hydrocarbon contents at the Villarrube and Lires marshes (Galicia, Spain) Cienc. Mar. 2004;30:477–487. doi: 10.7773/cm.v30i3.281.
  • 6. Standardization Administration of the People's Republic of China. 2015. Soil Environmental Quality-Risk Control Standard for Soil Contamination of Development Land, GB36600-2018. https://www.mee.gov.cn/ywgz/fgbz/bz/bzwb/trhj/201807/t20180703_446027.shtml Beijing, China.
  • 7. Alkio M., Tabuchi T.M., Wang X., et al. Stress responses to polycyclic aromatic hydrocarbons in Arabidopsis include growth inhibition and hypersensitive response-like symptoms. J. Exp. Bot. 2005;56(421):2983–2994. doi: 10.1093/jxb/eri295.
  • 8. Wang L., Zheng B., Meng W. Photo-induced toxicity of four polycyclic aromatic hydrocarbons, singly and in combination, to the marine diatom Phaeodactylum tricornutum. Ecotoxicol. Environ. Saf. 2008;71(2):465–472. doi: 10.1016/j.ecoenv.2007.12.019.
  • 9. Wen J., Pan L. Short-term exposure to benzo[a]pyrene causes oxidative damage and affects haemolymph steroid levels in female crab Portunus trituberculatus. Environ. Pollut. 2016;208:486–494. doi: 10.1016/j.envpol.2015.10.019.
  • 10. Yang L., Wang W.-C., Lung S.-C.C., et al. Polycyclic aromatic hydrocarbons are associated with increased risk of chronic obstructive pulmonary disease during haze events in China. Sci. Total Environ. 2017;574:1649–1658. doi: 10.1016/j.scitotenv.2016.08.211.
  • 11. Song J.A., Choi C.Y. Exposure to benzo[α]pyrene causes oxidative stress and cell damage in bay scallop Argopecten irradians. Aquac. Rep. 2021;21 doi: 10.1016/j.aqrep.2021.100860.
  • 12. Wang H., Huang X., Kuang Z., et al. Source apportionment and human health risk of PAHs accumulated in edible marine organisms: a perspective of “source-organism-human”. J. Hazard Mater. 2023;453 doi: 10.1016/j.jhazmat.2023.131372.
  • 13. Han H., Huang S., Liu S., et al. An assessment of marine ecosystem damage from the penglai 19-3 oil spill accident. J. Mar. Sci. Eng. 2021;9:732. doi: 10.3390/jmse9070732.
  • 14. Andres B. The exxon valdez oil spill disrupted the breeding of black oystercatchers. J Wildl. 1997;61:1322. doi: 10.2307/3802132.
  • 15. Khan M.I., Cheema S.A., Tang X., et al. A battery of bioassays for the evaluation of phenanthrene biotoxicity in soil. Arch. Environ. Contam. Toxicol. 2013;65(1):47–55. doi: 10.1007/s00244-013-9879-3.
  • 16. Russom C., Breton R., Walker J., et al. An overview of the use of quantitative structure-activity relationships for ranking and prioritizing large chemical inventories for environmental risk assessments. Environ. Toxicol. Chem. 2003;22:1810–1821. doi: 10.1897/01-194.
  • 17. Tao S., Xi X., Xu F., et al. A fragment constant QSAR model for evaluating the EC50 values of organic chemicals to Daphnia magna. Environ. Pollut. 2002;116(1):57–64. doi: 10.1016/S0269-7491(01)00119-1.
  • 18. Samanipour S., O'Brien J.W., Reid M.J., et al. From molecular descriptors to intrinsic fish toxicity of chemicals: an alternative approach to chemical prioritization. Environ. Sci. Technol. 2022 doi: 10.1021/acs.est.2c07353.
  • 19. Toropov A.A., Di Nicola M.R., Toropova A.P., et al. A regression-based QSAR-model to predict acute toxicity of aromatic chemicals in tadpoles of the Japanese brown frog (Rana japonica): calibration, validation, and future developments to support risk assessment of chemicals in amphibians. Sci. Total Environ. 2022;830 doi: 10.1016/j.scitotenv.2022.154795.
  • 20. Golbamaki A., Cassano A., Lombardo A., et al. Comparison of in silico models for prediction of Daphnia magna acute toxicity. SAR QSAR Environ. Res. 2014;25(8):673–694. doi: 10.1080/1062936x.2014.923041.
  • 21. Cassotti M., Consonni V., Mauri A., et al. Validation and extension of a similarity-based approach for prediction of acute aquatic toxicity towards Daphnia magna. SAR QSAR Environ. Res. 2014;25(12):1013–1036. doi: 10.1080/1062936x.2014.977818.
  • 22. Singh K.P., Gupta S., Kumar A., et al. Multispecies QSAR modeling for predicting the aquatic toxicity of diverse organic chemicals for regulatory toxicology. Chem. Res. Toxicol. 2014;27(5):741–753. doi: 10.1021/tx400371w.
  • 23. Yu X., Zeng Q. Random forest algorithm-based classification model of pesticide aquatic toxicity to fishes. Aquat. Toxicol. 2022;251 doi: 10.1016/j.aquatox.2022.106265.
  • 24. Liu Z., Dang K., Gao J., et al. Toxicity prediction of 1,2,4-triazoles compounds by QSTR and interspecies QSTTR models. Ecotoxicol. Environ. Saf. 2022;242 doi: 10.1016/j.ecoenv.2022.113839.
  • 25. Lavado G.J., Baderna D., Gadaleta D., et al. Ecotoxicological QSAR modeling of the acute toxicity of organic compounds to the freshwater crustacean Thamnocephalus platyurus. Chemosphere. 2021;280 doi: 10.1016/j.chemosphere.2021.130652.
  • 26. Zvinavashe E., Du T., Griff T., et al. Quantitative structure-activity relationship modeling of the toxicity of organothiophosphate pesticides to Daphnia magna and cyprinus carpio. Chemosphere. 2009;75(11):1531–1538. doi: 10.1016/j.chemosphere.2009.01.081.
  • 27. Li X., Zhang T., Min X., et al. Toxicity of aromatic compounds to Tetrahymena estimated by microcalorimetry and QSAR. Aquat. Toxicol. 2010;98(4):322–327. doi: 10.1016/j.aquatox.2010.03.002.
  • 28. Gu W., Li X., Du M., et al. Identification and regulation of ecotoxicity of polychlorinated naphthalenes to aquatic food Chain (green algae-Daphnia magna-fish) Aquat. Toxicol. 2021;233 doi: 10.1016/j.aquatox.2021.105774.
  • 29. Chen S., Sun G., Fan T., et al. Ecotoxicological QSAR study of fused/non-fused polycyclic aromatic hydrocarbons (FNFPAHs): assessment and priority ranking of the acute toxicity to Pimephales promelas by QSAR and consensus modeling methods. Sci. Total Environ. 2023;876 doi: 10.1016/j.scitotenv.2023.162736.
  • 30. Zhang Y., Zhu Y., Shao Y., et al. Toxicity of disinfection byproducts formed during the chlorination of sulfamethoxazole, norfloxacin, and 17β-estradiol in the presence of bromide. Environ. Sci. Pollut. Res. 2021;28(36):50718–50730. doi: 10.1007/s11356-021-14161-5.
  • 31. Sun G., Zhang Y., Pei L., et al. Chemometric QSAR modeling of acute oral toxicity of Polycyclic Aromatic Hydrocarbons (PAHs) to rat using simple 2D descriptors and interspecies toxicity modeling with mouse. Ecotoxicol. Environ. Saf. 2021;222 doi: 10.1016/j.ecoenv.2021.112525.
  • 32. Wang Y., Wang J., Mu J., et al. Aquatic predicted no-effect concentration for three polycyclic aromatic hydrocarbons and probabilistic ecological risk assessment in Liaodong Bay of the Bohai Sea, China. Environ. Sci. Pollut. Res. Int. 2014;21(1):148–158. doi: 10.1007/s11356-013-1597-x.
  • 33. Zeng L., Zeng S., Dong X., et al. Probabilistic ecological risk assessment of polycyclic aromatic hydrocarbons in southwestern catchments of the Bohai Sea, China. Ecotoxicology. 2013;22(8):1221–1231. doi: 10.1007/s10646-013-1110-9.
  • 34. Leung K.M.Y., Gray J.S., Li W.K., et al. Deriving sediment quality guidelines from field-based species sensitivity distributions. Environ. Sci. Technol. 2005;39(14):5148–5156. doi: 10.1021/es050450x.
  • 35. Im J.-K., Cho Y.-C., Noh H.-R., et al. Geographical distribution and risk assessment of volatile organic compounds in tributaries of the han river watershed. Agronomy. 2021;11:956. doi: 10.3390/agronomy11050956.
  • 36. Jin X., Zha J., Xu Y., et al. Derivation of predicted no effect concentrations (PNEC) for 2,4,6-trichlorophenol based on Chinese resident species. Chemosphere. 2012;86(1):17–23. doi: 10.1016/j.chemosphere.2011.08.040.
  • 37. Finizio A., Villa S., Vighi M. Predicted No effect concentration (PNEC) Reference module in biomedical Sciences. 2021 doi: 10.1016/B978-0-12-824315-2.00004-X.
  • 38. Abdel-Shafy H., Mansour M. A review on polycyclic aromatic hydrocarbons: source, environmental impact, effect on human health and remediation. Egypt J Pet. 2015;25:107–123. doi: 10.1016/j.ejpe.2015.03.011.
  • 39. Flores-Serrano R.M., Iturbe-Argüelles R., Pérez-Casimiro G., et al. Ecological risk assessment for small omnivorous mammals exposed to polycyclic aromatic hydrocarbons: a case study in northeastern Mexico. Sci. Total Environ. 2014;476–477:218–227. doi: 10.1016/j.scitotenv.2013.12.092.
  • 40. Fan H., Wang Y., Liu X., et al. Derivation of predicted no-effect concentrations for thirty-five pharmaceuticals and personal care products to freshwater ecosystem. Front. Mar. Sci. 2022;9 doi: 10.3389/fmars.2022.1043792.
  • 41. Salvito D.T., Senna R.J., Federle T.W. A framework for prioritizing fragrance materials for aquatic risk assessment. Environ. Toxicol. Chem. 2002;21(6):1301–1308. doi: 10.1002/etc.5620210627.
  • 42. Chen Y., Xi X., Yu G., et al. Pharmaceutical compounds in aquatic environment in China: locally screening and environmental risk assessment. Front. Environ. Sci. 2015;9(3):394–401. doi: 10.1007/s11783-014-0653-1.
  • 43. Sorgog K., Kamo M. Quantifying the precision of ecological risk: conventional assessment factor method vs. species sensitivity distribution method. Ecotoxicol. Environ. Saf. 2019;183 doi: 10.1016/j.ecoenv.2019.109494.
  • 44. Okonski A.I., MacDonald D.B., Potter K., et al. Deriving predicted no-effect concentrations (PNECs) using a novel assessment factor method. Hum. Ecol. Risk Assess. 2021;27(6):1613–1635. doi: 10.1080/10807039.2020.1865788.
  • 45. U.S. Environmental Protection Agency. 821-R-02-013. 2002. Short-term methods for estimating the chronic toxicity of effluents and receiving waters to freshwater organisms.
  • 46. Hamadache M., Benkortbi O., Hanini S., et al. A Quantitative structure activity relationship for acute oral toxicity of pesticides on rats: validation, domain of application and prediction. J. Hazard Mater. 2016;303:28–40. doi: 10.1016/j.jhazmat.2015.09.021.
  • 47. Cai Z., Zafferani M., Akande O.M., et al. Quantitative structure–activity relationship (QSAR) study predicts small-molecule binding to RNA structure. J. Med. Chem. 2022;65(10):7262–7277. doi: 10.1021/acs.jmedchem.2c00254.
  • 48. Golbraikh A., Shen M., Xiao Z., et al. Rational selection of training and test sets for the development of validated QSAR models. J. Comput. Aided Mol. Des. 2003;17(2–4):241–253. doi: 10.1023/a:1025386326946.
  • 49. Yang Y.-T., Ni H.-G. Predictive in silico models for aquatic toxicity of cosmetic and personal care additive mixtures. Water Res. 2023;236 doi: 10.1016/j.watres.2023.119981.
  • 50. Fang Z., Yu X., Zeng Q. Random forest algorithm-based accurate prediction of chemical toxicity to Tetrahymena pyriformis. Toxicology. 2022;480 doi: 10.1016/j.tox.2022.153325.
  • 51. Sharma B.K., Pilania P., Singh P., et al. CP-MLR directed QSAR study of carbonic anhydrase inhibitors: sulfonamide and sulfamate inhibitors. Cent. Eur. J. Chem. 2009;7(4):909–922. doi: 10.2478/s11532-009-0073-4.
  • 52. Roy K., Kar S., Ambure P. On a simple approach for determining applicability domain of QSAR models. Chemometr. Intell. Lab. Syst. 2015;145:22–29. doi: 10.1016/j.chemolab.2015.04.013.
  • 53. Wu X., Guo J., Dang G., et al. Prediction of acute toxicity to Daphnia magna and interspecific correlation: a global QSAR model and a Daphnia-minnow QTTR model. SAR QSAR Environ. Res. 2022;33(8):583–600. doi: 10.1080/1062936x.2022.2098814.
  • 54. Bo T., Lin Y., Han J., et al. Machine learning-assisted data filtering and QSAR models for prediction of chemical acute toxicity on rat and mouse. J. Hazard Mater. 2023;452 doi: 10.1016/j.jhazmat.2023.131344.
  • 55. Sahigara F., Mansouri K., Ballabio D., et al. Comparison of different approaches to define the applicability domain of QSAR models. Molecules. 2012:4791–4810. doi: 10.3390/molecules17054791.
  • 56. Grzonkowska M., Sosnowska A., Barycki M., et al. How the structure of ionic liquid affects its toxicity to Vibrio fischeri? Chemosphere. 2016;159:199–207. doi: 10.1016/j.chemosphere.2016.06.004.
  • 57. Liu H., Wei M., Yang X., et al. Development of TLSER model and QSAR model for predicting partition coefficients of hydrophobic organic chemicals between low density polyethylene film and water. Sci. Total Environ. 2016;574 doi: 10.1016/j.scitotenv.2016.08.051.
  • 58. Bornstein J.M., Adams J., Hollebone B., et al. Effects-driven chemical fractionation of heavy fuel oil to isolate compounds toxic to trout embryos. Environ. Toxicol. Chem. 2014;33(4):814–824. doi: 10.1002/etc.2492.
  • 59. McGrath J.A., Di Toro D.M. Validation of the target lipid model for toxicity assessment of residual petroleum constituents: monocyclic and polycyclic aromatic hydrocarbons. Environ. Toxicol. Chem. 2009;28(6):1130–1148. doi: 10.1897/08-271.1.
  • 60. Chen S., Sun G., Fan T., et al. Ecotoxicological QSAR study of fused/non-fused polycyclic aromatic hydrocarbons (FNFPAHs): assessment and priority ranking of the acute toxicity to Pimephales promelas by QSAR and consensus modeling methods. Sci. Total Environ. 2023;876 doi: 10.1016/j.scitotenv.2023.162736.
  • 61. Di Marzio W., Saenz M.E. Quantitative structure-activity relationship for aromatic hydrocarbons on freshwater fish. Ecotoxicol. Environ. Saf. 2004;59(2):256–262. doi: 10.1016/j.ecoenv.2003.11.006.
  • 62. Wang L., Xing P., Wang C., et al. Maximal information coefficient and support vector regression based nonlinear feature selection and QSAR modeling on toxicity of alcohol compounds to tadpoles of Rana temporaria. J. Braz. Chem. Soc. 2019;30:279–285. doi: 10.21577/0103-5053.20180176.
  • 63. Yang L., Tian R., Li Z., et al. Data driven toxicity assessment of organic chemicals against Gammarus species using QSAR approach. Chemosphere. 2023;328 doi: 10.1016/j.chemosphere.2023.138433.
  • 64. Rezić T., Vrsalović Presečki A., Kurtanjek Ž. New approach to the evaluation of lignocellulose derived by-products impact on lytic-polysaccharide monooxygenase activity by using molecular descriptor structural causality model. Bioresour. Technol. 2021;342 doi: 10.1016/j.biortech.2021.125990.
  • 65. Adawara S., Shallangwa G., Mamza P., et al. Molecular docking and QSAR theoretical model for prediction of phthalazinone derivatives as new class of potent dengue virus inhibitors. Beni-Suef Univ. J. Basic Appl. Sci. 2020;9 doi: 10.1186/s43088-020-00073-9.
  • 66. Cassotti M., Ballabio D., Todeschini R., et al. A similarity-based QSAR model for predicting acute toxicity towards the fathead minnow (Pimephales promelas) SAR QSAR Environ. Res. 2015;26(6):521. doi: 10.1080/1062936x.2015.1035056.
  • 67. Yijun R., Xiao-Ke X., Tao J. 2022. The Maximum Capability of a Topological Feature in Link Prediction.
  • 68. Zhu T., Jiang Y., Cheng H., et al. Development of pp-LFER and QSPR models for predicting the diffusion coefficients of hydrophobic organic compounds in LDPE. Ecotoxicol. Environ. Saf. 2020;190 doi: 10.1016/j.ecoenv.2020.110179.
  • 69. Pearlman R.S., Smith K.M. Novel software tools for chemical diversity. Perspect. Drug Discov. Des. 1998;9:339–353. doi: 10.1023/A:1027232610247.


Articles from Heliyon are provided here courtesy of Elsevier
