Scientific Reports. 2025 Jul 8;15:24557. doi: 10.1038/s41598-025-08002-5

Comparative QSPR study of food preservatives using topological indices and regression models

K B Gayathri 1, S Roy 1
PMCID: PMC12238366  PMID: 40628862

Abstract

Food preservatives play a crucial role in extending the shelf life of food products. Understanding their physicochemical properties can help in designing more effective and safer preservatives. In this study, we use a Quantitative Structure–Property Relationship (QSPR) approach based on topological indices to develop predictive models for selected physicochemical properties of food preservatives. We compare the performance of linear and curvilinear regression models to identify which provides the best predictions. Of all the models tested, the cubic regression model showed the best predictive performance, for example in its coefficients of determination (R²) for vapour density and molecular weight. To validate our findings, we use the developed model to estimate the properties of an existing food preservative, propionic acid. Our results offer insights that can aid in the development of new and improved food preservatives.

Keywords: Topological indices, QSPR study, Curvilinear regression models, Food preservatives

Subject terms: Chemical biology, Chemistry, Mathematics and computing

Introduction

Food additives are substances added to food to enhance its shelf life, nutritional value, flavour, or appearance. They can be classified as colorants, emulsifiers, flavour enhancers, antioxidants, preservatives, and sweeteners1,2. Natural or synthetic substances added to food to prolong its shelf life are known as food preservatives3,4. These compounds act as antibacterial agents, reduce oxidation, and inhibit the formation of toxins. Up to 40% of all food produced for human consumption globally is wasted due to microbial deterioration. The addition of preservatives helps ensure food safety and edible quality5. Demand for less hazardous preservatives is increasing, and the market for safe and effective preservatives is growing accordingly. With this increasing demand, developing and evaluating new preservatives is imperative6. Analyzing a compound's physicochemical properties is crucial for assessing its effectiveness. Predictive models use computational and mathematical techniques to estimate biological effects without extensive laboratory testing. Here, we develop models that help assess toxicity, allergenicity, and metabolic effects using structural and physicochemical properties7,8.

QSPR (Quantitative Structure–Property Relationship) and QSAR (Quantitative Structure–Activity Relationship) are computational modeling techniques that use a molecule's chemical structure to predict its physicochemical properties and biological activity, respectively9,10. Using molecular descriptors together with statistical or machine learning models, QSAR relates a compound's chemical structure to biological activities such as toxicity or medicinal efficacy. QSPR, in contrast, uses the structure to predict physical or chemical properties such as melting or boiling points. Both approaches use quantitative analysis of topological properties to build predictive models that support materials science, environmental chemistry, and drug design11–13.

Graph theory is the branch of mathematics that studies networks of points (vertices) connected by lines (edges)14, and it can help solve many real-world problems. Graph theory has been applied to chemistry since the nineteenth century, which led to the development of chemical graph theory, a subfield in which chemical structures are represented as chemical graphs15. Each atom is represented by a vertex, and each bond between atoms by an edge. Most of a compound's chemical information is linked to its chemical structure. Topological indices (TIs) are numerical graph invariants that characterize such a chemical graph. TIs are used in QSAR and QSPR research to build models that predict the physical, chemical, or biochemical properties of compounds16–19. This approach is widely applied in computational drug design and other kinds of structural analysis20–26.

Several studies have used topological indices to predict specific properties of compounds. In 2002, specific TIs were used to predict the biological activity of certain alkoxyphenols27. Ejma et al. combined ensemble learning with topological indices to predict the physical characteristics of drugs for mental disorders28. Huili Li et al. used regression models and topological indices to analyze the structural characteristics of amino acids and predict their chemical and physical characteristics29. More recently, Abubakar et al. predicted the physical characteristics of some antibacterial drugs using two regression models and specific neighborhood sum-based indices30.

Both linear and non-linear regression models can be used for QSAR and QSPR analysis31,13. The appropriate model type depends on the relationship between the molecular structural descriptor and the activity or property of interest. When the relationship between the dependent and independent variables is non-linear, curvilinear models are generally more appropriate. In this study, the physicochemical properties (P) of the food preservatives are the dependent variables, and their topological indices (TI) are the independent variables. We consider linear, quadratic and cubic regression models.

Linear: $P = a_1 + a_2\,TI$

Quadratic: $P = a_3 + a_4\,TI + a_5\,TI^2$

Cubic: $P = a_6 + a_7\,TI + a_8\,TI^2 + a_9\,TI^3$

where the $a_i$ ($i = 1, \dots, 9$) are constants.
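Each of these model forms is an ordinary polynomial least-squares fit in a single descriptor. The block below is a minimal standard-library Python sketch, not the authors' MATLAB code, that obtains the coefficients by solving the normal equations with Gaussian elimination. The helper names `fit_poly` and `predict` are hypothetical, and coefficients are returned lowest order first (intercept, then the TI, TI² and TI³ terms). The demo pairs the ABC index (Table 5) with molecular weight (Table 1) purely for illustration; its output is not claimed to reproduce any particular row of Tables 6–11.

```python
from typing import List, Sequence


def fit_poly(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit of y = c0 + c1*x + ... + c_degree*x**degree.

    Returns the coefficients lowest order first (intercept first).
    """
    m = degree + 1
    if len(x) != len(y) or len(x) < m:
        raise ValueError("need len(x) == len(y) >= degree + 1")
    # Augmented normal equations [X^T X | X^T y] with X[i][j] = x_i ** j.
    aug = [
        [sum(xi ** (j + k) for xi in x) for k in range(m)]
        + [sum(yi * xi ** j for xi, yi in zip(x, y))]
        for j in range(m)
    ]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * coef[c] for c in range(r + 1, m))
        coef[r] = (aug[r][m] - tail) / aug[r][r]
    return coef


def predict(coef: Sequence[float], ti: float) -> float:
    """Evaluate the fitted polynomial at a topological-index value ti."""
    return sum(c * ti ** j for j, c in enumerate(coef))


if __name__ == "__main__":
    # ABC index (Table 5) and molecular weight (Table 1), same compound order.
    abc = [2.4495, 8.7611, 20.8913, 6.5423, 9.6765, 12.2819, 9.2389,
           5.8756, 18.7819, 7.9565, 9.3707, 10.9228, 5.1685, 8.3717]
    mw = [60.05, 176.12, 414.5, 122.12, 360.494, 220.35, 192.12,
          134.09, 384.6, 152.98, 180.2, 212.199, 112.13, 166.217]
    for name, d in (("linear", 1), ("quadratic", 2), ("cubic", 3)):
        print(name, [round(c, 4) for c in fit_poly(abc, mw, d)])
```

The normal equations become ill-conditioned when the index takes large values (for example HM or F cubed); centring and scaling TI first, or using an orthogonal-decomposition solver such as `numpy.linalg.lstsq`, is the more robust choice in that case.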

Materials and methods

This study considers fourteen compounds that are currently used for food preservation. Acetic acid, the primary ingredient of vinegar, is a commonly used preservative because it disrupts bacterial and fungal cell membranes, preventing these organisms from surviving; it occurs naturally and is widely accepted32. Ascorbic acid is a naturally occurring antioxidant that, in contrast to other preservatives, is added mostly to fruits, vegetables, and drinks33. Ascorbyl palmitate, an ester derivative of ascorbic acid, is added to sunflower oil to extend its shelf life34. Benzoic acid is one of the earliest chemical preservatives still in use in the food industry and is effective at stopping the growth of yeast35. Butylated hydroxyanisole (BHA) and butylated hydroxytoluene (BHT) are phenolic antioxidants that are generally regarded as antibacterial agents36. As an anti-enzymatic preservative, citric acid slows enzymatic reactions in food long after harvest37. Dimethyl dicarbonate is a chemical preservative used to prolong the shelf life of products such as salsa38. Ethyl lauroyl arginate, a cationic surfactant, is used in food preservation to enhance the microbiological safety and quality of a variety of food products39. Methyl 4-hydroxybenzoate, often referred to as methylparaben, is a typical preservative in baked goods and jams40. Likewise, propyl 4-hydroxybenzoate, also known as propylparaben, is used as a preservative in processed meat and dairy products41. Propyl gallate is a common phenolic antioxidant in the food, cosmetics and pharmaceutical sectors42. Sorbic acid and its salts are the most widely used preservatives in the food industry because of their neutral flavour and physiological inertness43. tert-Butylhydroquinone (TBHQ) is also used to preserve sunflower oil44. Figure 1 shows the molecular structures of these food preservatives.

Fig. 1.

Molecular structures of food preservatives.

Figure 2 shows the molecular graph of benzoic acid. The chemical structures of all the other compounds listed above can be represented as molecular graphs in the same way.

Fig. 2.

Molecular graph of Benzoic acid.

The efficacy, stability, and safety of a food preservative depend strongly on several physical and chemical properties, including molecular weight (MW), boiling point (BP), melting point (MP), pKa, vapour density (VD), log P and the LD50 value. A higher LD50 value indicates lower toxicity and hence a safer preservative. Table 1 lists the physicochemical properties and LD50 values of the selected food preservatives; all data were collected from PubChem45.

Table 1.

Physicochemical properties and LD50-value of food preservatives.

Compound Molecular weight (g/mol) Melting point (°C) Boiling point at 760 mmHg (°C) pKa Vapour density log P LD50 (oral, rat) (mg/kg)
Acetic acid 60.05 16.6 117.9 2.07 −0.17 3310
Ascorbic acid 176.12 191 552.7 4.7 −1.85 11900
Ascorbyl Palmitate 414.5 116.5
Benzoic acid 122.12 122.4 249.2 4.207 4.21 1.87 1700
BHA 360.494 51 267
BHT 220.35 70.5 264.5 12.23 7.6 5.32 2930
Citric acid 192.12 156 310 2.79 −1.64 5500
Dimethyl Dicarbonate 134.09 17 175 260
Ethyl Lauroyl Arginate 384.6
Methyl 4-hydroxybenzoate 152.98 125.2 275 1.96 5600
Propyl 4-hydroxybenzoate 180.2 97.5 294.5 3.04
Propyl gallate (PG) 212.199 150 7.94 7.3 1.8 .
Sorbic acid 112.13 134.5 228 4.76 3.87 1.33 7360
TBHQ 166.217 128 273 10.8 .

The study’s methodology flow chart is presented in Fig. 3.

Fig. 3.

Methodological framework of the study.

Let G be a simple connected graph with vertex set V(G) and edge set E(G), and let u be any vertex in V(G). The degree of u, denoted $d_u$ (or $d_G(u)$), is the number of vertices adjacent to u. To characterize the molecular structures of the compounds above, we use sixteen degree-based indices. Table 2 lists the selected topological indices used in the study.
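To make the degree definition concrete, the sketch below encodes benzoic acid as a simple graph and counts the neighbours of each vertex. It assumes a hydrogen-suppressed graph (only heavy atoms are vertices), which is the convention consistent with the edge partitions in Table 3, and it treats the C=O double bond as a single edge. The atom labels `C1`…`O9` and the function name `vertex_degrees` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]

# Hydrogen-suppressed molecular graph of benzoic acid (hypothetical labels):
# ring carbons C1..C6, carboxyl carbon C7, carbonyl O8, hydroxyl O9.
BENZOIC_ACID: List[Edge] = [
    ("C1", "C2"), ("C2", "C3"), ("C3", "C4"),
    ("C4", "C5"), ("C5", "C6"), ("C6", "C1"),  # aromatic ring
    ("C1", "C7"),                              # ring to carboxyl carbon
    ("C7", "O8"), ("C7", "O9"),                # C=O and C-OH (one edge each)
]


def vertex_degrees(edges: Iterable[Edge]) -> Dict[str, int]:
    """Degree d_G(u): number of distinct vertices adjacent to u."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return {u: len(nbrs) for u, nbrs in neighbours.items()}


if __name__ == "__main__":
    for atom, d in sorted(vertex_degrees(BENZOIC_ACID).items()):
        print(atom, d)
```

Using a set of neighbours means that listing the same bond twice does not inflate a degree, which matches the simple-graph setting assumed in the text.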

Table 2.

Topological indices.

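Topological index | Notation | Formula | Reference
First Zagreb Index | $M_1(G)$ | $\sum_{uv \in E(G)} (d_u + d_v)$ | Gutman and Trinajstić46
Second Zagreb Index | $M_2(G)$ | $\sum_{uv \in E(G)} d_u d_v$ | Gutman and Trinajstić46
Reduced Second Zagreb Index | $RM_2(G)$ | $\sum_{uv \in E(G)} (d_u - 1)(d_v - 1)$ | Furtula et al.47
Hyper Zagreb Index | $HM(G)$ | $\sum_{uv \in E(G)} (d_u + d_v)^2$ | Shirdel et al.48
Augmented Zagreb Index | $AZ(G)$ | $\sum_{uv \in E(G)} \left(\frac{d_u d_v}{d_u + d_v - 2}\right)^3$ | Ali et al.49
Randić Index | $R(G)$ | $\sum_{uv \in E(G)} \frac{1}{\sqrt{d_u d_v}}$ | Randić50
Reciprocal Randić Index | $RR(G)$ | $\sum_{uv \in E(G)} \sqrt{d_u d_v}$ | Gutman et al.51
Reduced Reciprocal Randić Index | $RRR(G)$ | $\sum_{uv \in E(G)} \sqrt{(d_u - 1)(d_v - 1)}$ | Gutman et al.51
Harmonic Index | $H(G)$ | $\sum_{uv \in E(G)} \frac{2}{d_u + d_v}$ | Fajtlowicz52
Sum Connectivity Index | $SC(G)$ | $\sum_{uv \in E(G)} \frac{1}{\sqrt{d_u + d_v}}$ | Trinajstić53
Geometric Arithmetic Index | $GA(G)$ | $\sum_{uv \in E(G)} \frac{2\sqrt{d_u d_v}}{d_u + d_v}$ | Vukičević et al.54
Inverse Sum Index | $IS(G)$ | $\sum_{uv \in E(G)} \frac{d_u d_v}{d_u + d_v}$ | Vukičević et al.55
Forgotten Index | $F(G)$ | $\sum_{uv \in E(G)} (d_u^2 + d_v^2)$ | Gutman and Trinajstić46
Symmetric Division Deg Index | $SD(G)$ | $\sum_{uv \in E(G)} \left(\frac{d_u}{d_v} + \frac{d_v}{d_u}\right)$ | Vukičević et al.55
Atom Bond Connectivity Index | $ABC(G)$ | $\sum_{uv \in E(G)} \sqrt{\frac{d_u + d_v - 2}{d_u d_v}}$ | Estrada et al.56
Sombor Index | $So(G)$ | $\sum_{uv \in E(G)} \sqrt{d_u^2 + d_v^2}$ | Gutman57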

To calculate the topological indices, we use the edge-partition method, which groups edges according to the degrees of their end vertices. Let $E_{r,s}$ denote the set of edges uv with $d_u = r$ and $d_v = s$ ($r \le s$), and let $|E_{r,s}|$ denote its cardinality. Because every index in Table 2 has the form $\sum_{uv \in E(G)} f(d_u, d_v)$, it reduces to $\sum_{r \le s} |E_{r,s}|\, f(r, s)$. For chemical graphs, the maximum vertex degree is 4. In this study, the molecular graphs of the selected compounds have edges of types (1,2), (1,3), (1,4), (2,2), (2,3), (2,4), (3,3) and (3,4). Table 3 gives the edge partitions of the compounds in the dataset.
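The edge-partition step can be sketched in a few lines of Python: compute every vertex degree, then count edges by their unordered degree pair (r, s) with r ≤ s. This is an illustrative sketch, not the authors' MATLAB implementation; the function name `edge_partition` and the atom labels are hypothetical, and the graphs are assumed to be hydrogen-suppressed simple graphs given as duplicate-free edge lists. For benzoic acid and dimethyl dicarbonate the counts agree with the corresponding rows of Table 3.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def edge_partition(edges: Sequence[Edge]) -> Counter:
    """Return |E_{r,s}| for every degree pair (r, s) with r <= s.

    Assumes a simple graph: each bond appears exactly once in `edges`.
    """
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    deg = {u: len(n) for u, n in neighbours.items()}
    return Counter(tuple(sorted((deg[u], deg[v]))) for u, v in edges)


# Hypothetical atom labels; hydrogens omitted.
BENZOIC_ACID: List[Edge] = [
    ("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C5"), ("C5", "C6"),
    ("C6", "C1"), ("C1", "C7"), ("C7", "O8"), ("C7", "O9"),
]
# CH3-O-C(=O)-O-C(=O)-O-CH3
DIMETHYL_DICARBONATE: List[Edge] = [
    ("C1", "O2"), ("O2", "C3"), ("C3", "O4"), ("C3", "O5"),
    ("O5", "C6"), ("C6", "O7"), ("C6", "O8"), ("O8", "C9"),
]

if __name__ == "__main__":
    print(dict(edge_partition(BENZOIC_ACID)))
    print(dict(edge_partition(DIMETHYL_DICARBONATE)))
```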

Table 3.

Edge partitions of the compounds.

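Compound | $\lvert E_{1,2}\rvert$ | $\lvert E_{1,3}\rvert$ | $\lvert E_{1,4}\rvert$ | $\lvert E_{2,2}\rvert$ | $\lvert E_{2,3}\rvert$ | $\lvert E_{2,4}\rvert$ | $\lvert E_{3,3}\rvert$ | $\lvert E_{3,4}\rvert$
Acetic acid | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0
Ascorbic acid | 1 | 4 | 0 | 0 | 3 | 0 | 4 | 0
Ascorbyl Palmitate | 1 | 5 | 0 | 14 | 5 | 0 | 4 | 0
Benzoic acid | 0 | 2 | 0 | 4 | 2 | 0 | 1 | 0
BHA | 1 | 1 | 3 | 1 | 5 | 0 | 1 | 1
BHT | 0 | 2 | 6 | 0 | 4 | 0 | 2 | 2
Citric acid | 0 | 6 | 1 | 0 | 2 | 2 | 0 | 1
Dimethyl Dicarbonate | 2 | 2 | 0 | 0 | 4 | 0 | 0 | 0
Ethyl Lauroyl Arginate | 2 | 4 | 0 | 13 | 6 | 0 | 1 | 0
Methyl 4-hydroxybenzoate | 1 | 2 | 0 | 2 | 5 | 0 | 1 | 0
Propyl 4-hydroxybenzoate | 1 | 2 | 0 | 4 | 5 | 0 | 1 | 0
Propyl gallate (PG) | 1 | 4 | 0 | 2 | 5 | 0 | 3 | 0
Sorbic acid | 1 | 2 | 0 | 3 | 1 | 0 | 0 | 0
TBHQ | 0 | 2 | 3 | 1 | 3 | 0 | 1 | 1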

Results and discussion

In this section, we compute the topological indices of all the selected compounds. As a representative example, we consider benzoic acid; the results are summarized in the following theorem.

Theorem 3.1

Let G be the molecular graph of benzoic acid. Then (1) $M_1(G) = 40$, (2) $M_2(G) = 43$, (3) $RM_2(G) = 12$, (4) $HM(G) = 182$, (5) $AZ(G) \approx 66.1406$, (6) $R(G) \approx 4.3045$, (7) $RR(G) \approx 19.3631$, (8) $RRR(G) \approx 8.8284$, (9) $H(G) \approx 4.1333$, (10) $SC(G) \approx 4.3027$, (11) $GA(G) \approx 8.6916$, (12) $IS(G) = 9.4$, (13) $F(G) = 96$, (14) $SD(G) = 21$, (15) $ABC(G) \approx 6.5423$, (16) $So(G) \approx 29.0920$.

Proof

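Let G denote the molecular graph of benzoic acid. From Table 3, $|E_{1,3}| = 2$, $|E_{2,2}| = 4$, $|E_{2,3}| = 2$ and $|E_{3,3}| = 1$. Evaluating each index in Table 2 over this edge partition, we get:

  • $M_1(G) = 2(1+3) + 4(2+2) + 2(2+3) + (3+3) = 40$

  • $M_2(G) = 2(1\cdot 3) + 4(2\cdot 2) + 2(2\cdot 3) + (3\cdot 3) = 43$

  • $RM_2(G) = 2(0\cdot 2) + 4(1\cdot 1) + 2(1\cdot 2) + (2\cdot 2) = 12$

  • $HM(G) = 2(4)^2 + 4(4)^2 + 2(5)^2 + (6)^2 = 182$

  • $AZ(G) = 2\left(\frac{3}{2}\right)^3 + 4\left(\frac{4}{2}\right)^3 + 2\left(\frac{6}{3}\right)^3 + \left(\frac{9}{4}\right)^3 = 66.140625 \approx 66.1406$

  • $R(G) = \frac{2}{\sqrt{3}} + \frac{4}{\sqrt{4}} + \frac{2}{\sqrt{6}} + \frac{1}{\sqrt{9}} \approx 4.3045$

  • $RR(G) = 2\sqrt{3} + 4\sqrt{4} + 2\sqrt{6} + \sqrt{9} \approx 19.3631$

  • $RRR(G) = 2\sqrt{0\cdot 2} + 4\sqrt{1\cdot 1} + 2\sqrt{1\cdot 2} + \sqrt{2\cdot 2} = 6 + 2\sqrt{2} \approx 8.8284$

  • $H(G) = 2\cdot\frac{2}{4} + 4\cdot\frac{2}{4} + 2\cdot\frac{2}{5} + \frac{2}{6} \approx 4.1333$

  • $SC(G) = \frac{2}{\sqrt{4}} + \frac{4}{\sqrt{4}} + \frac{2}{\sqrt{5}} + \frac{1}{\sqrt{6}} \approx 4.3027$

  • $GA(G) = 2\cdot\frac{2\sqrt{3}}{4} + 4\cdot\frac{2\sqrt{4}}{4} + 2\cdot\frac{2\sqrt{6}}{5} + \frac{2\sqrt{9}}{6} \approx 8.6916$

  • $IS(G) = 2\cdot\frac{3}{4} + 4\cdot\frac{4}{4} + 2\cdot\frac{6}{5} + \frac{9}{6} = 9.4$

  • $F(G) = 2(1^2 + 3^2) + 4(2^2 + 2^2) + 2(2^2 + 3^2) + (3^2 + 3^2) = 96$

  • $SD(G) = 2\left(\frac{1}{3} + \frac{3}{1}\right) + 4\left(\frac{2}{2} + \frac{2}{2}\right) + 2\left(\frac{2}{3} + \frac{3}{2}\right) + \left(\frac{3}{3} + \frac{3}{3}\right) = 21$

  • $ABC(G) = 2\sqrt{\frac{2}{3}} + 4\sqrt{\frac{2}{4}} + 2\sqrt{\frac{3}{6}} + \sqrt{\frac{4}{9}} \approx 6.5423$

  • $So(G) = 2\sqrt{1^2 + 3^2} + 4\sqrt{2^2 + 2^2} + 2\sqrt{2^2 + 3^2} + \sqrt{3^2 + 3^2} \approx 29.0920$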

Using MATLAB and the same edge-partition approach as in Theorem 3.1, we computed the degree-based topological indices of each of the chosen compounds. The computed indices are shown in Tables 4 and 5.
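The same computation can be sketched in Python (an illustration only, not the MATLAB code used in the study). Each index is stored as its edge contribution f(d_u, d_v) following the standard definitions of the sixteen indices named in Table 2, and evaluated as a weighted sum over an edge partition {(r, s): |E_{r,s}|}. The dictionary `EDGE_TERMS` and the function `indices_from_partition` are hypothetical names. Run on the benzoic acid partition from Table 3, it returns the benzoic acid values of Tables 4 and 5, for example M1 = 40, AZ ≈ 66.1406 and ABC ≈ 6.5423.

```python
import math
from typing import Callable, Dict, Mapping, Tuple

# Edge contribution f(d_u, d_v) for each index; an index is the sum of f over
# all edges, i.e. sum over (r, s) of |E_{r,s}| * f(r, s).
EDGE_TERMS: Dict[str, Callable[[int, int], float]] = {
    "M1": lambda a, b: a + b,
    "M2": lambda a, b: a * b,
    "RM2": lambda a, b: (a - 1) * (b - 1),
    "HM": lambda a, b: (a + b) ** 2,
    # AZ divides by a + b - 2, so it is undefined for a (1,1) edge;
    # no such edge occurs in the compounds of Table 3.
    "AZ": lambda a, b: (a * b / (a + b - 2)) ** 3,
    "R": lambda a, b: 1 / math.sqrt(a * b),
    "RR": lambda a, b: math.sqrt(a * b),
    "RRR": lambda a, b: math.sqrt((a - 1) * (b - 1)),
    "H": lambda a, b: 2 / (a + b),
    "SC": lambda a, b: 1 / math.sqrt(a + b),
    "GA": lambda a, b: 2 * math.sqrt(a * b) / (a + b),
    "IS": lambda a, b: a * b / (a + b),
    "F": lambda a, b: a ** 2 + b ** 2,
    "SD": lambda a, b: a / b + b / a,
    "ABC": lambda a, b: math.sqrt((a + b - 2) / (a * b)),
    "So": lambda a, b: math.sqrt(a ** 2 + b ** 2),
}


def indices_from_partition(partition: Mapping[Tuple[int, int], int]) -> Dict[str, float]:
    """Evaluate all sixteen indices from an edge partition {(r, s): |E_{r,s}|}."""
    return {
        name: sum(count * f(r, s) for (r, s), count in partition.items())
        for name, f in EDGE_TERMS.items()
    }


if __name__ == "__main__":
    benzoic_acid = {(1, 3): 2, (2, 2): 4, (2, 3): 2, (3, 3): 1}  # Table 3 row
    for name, value in indices_from_partition(benzoic_acid).items():
        print(f"{name:>4}: {value:.4f}")
```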

Table 4.

Calculated topological indices of the compounds (Part I).

Compound M1 M2 RM2 HM AZ R RR RRR
Acetic acid 12 9 0 48 10.125 1.7321 5.1962 0
Ascorbic acid 58 68 22 292 91.0625 5.5746 27.6909 12.2426
Ascorbyl Palmitate 128 139 40 582 222.4375 13.9684 62.3219 29.0711
Benzoic acid 40 43 12 182 66.1406 4.3045 19.3631 8.8284
BHA 64 72 21 326 91.7007 5.9477 29.8576 12.5206
BHT 84 96 28 452 103.4015 7.0317 38.1903 14.5558
Citric acid 58 62 16 292 68.4444 5.7764 26.4122 8.7420
Dimethyl Dicarbonate 34 34 8 150 54.75 4.2019 16.0905 5.6569
Ethyl Lauroyl Arginate 110 113 29 476 192.8906 13.0064 53.4536 28.4853
Methyl 4-hydroxybenzoate 50 55 16 234 82.1406 5.2364 24.1258 11.0711
Propyl 4-hydroxybenzoate 58 63 18 266 98.1406 6.2364 28.1258 13.0711
Propyl gallate (PG) 70 79 24 338 111.6719 7.0577 33.5899 15.0711
Sorbic acid 28 26 5 114 46.7500 3.7701 13.3278 4.4142
TBHQ 55 61 17 283 71.0757 5.0015 25.2767 9.6921

Table 5.

Calculated topological indices of the compounds (Part II).

Compound H SC GA IS F SDI ABC So
Acetic acid 1.5 1.5 2.5981 2.25 30 10 2.4495 9.4868
Ascorbic acid 5.2 5.5520 11.3463 13.2667 156 30.3333 8.7611 42.6724
Ascorbyl Palmitate 13.5 13.9464 28.1719 30.4167 304 66 20.8913 92.6438
Benzoic acid 4.1333 4.3027 8.6916 9.4 96 21 6.5423 29.0920
BHA 5.4857 5.9413 12.0976 14.031 182 35.5 9.6765 47.8665
BHT 6.2381 7.0446 14.4307 17.5286 260 49 12.2819 63.9707
Citric acid 5.1524 5.5361 10.8311 12.081 168 35.6667 9.2389 44.2521
Dimethyl Dicarbonate 3.9333 3.9436 7.5369 7.6333 82 20.3333 5.8756 25.2189
Ethyl Lauroyl Arginate 12.5667 12.7462 25.2285 26.0333 250 59.3333 18.7819 79.7667
Methyl 4-hydroxybenzoate 5 5.2217 10.5738 11.6667 124 26 7.9565 36.4879
Propyl 4-hydroxybenzoate 6 6.2217 12.5738 13.6667 140 30 9.3707 42.1447
Propyl gallate (PG) 6.6667 7.0382 14.3059 16.1667 180 36.6667 10.9228 51.2977
Sorbic acid 3.5667 3.5246 6.6547 6.3667 62 17.3333 5.1685 20.6515
TBHQ 4.519 4.9695 10.0612 11.7143 161 32 8.3717 41.5816

Tables 6, 7, 8, 9, 10 and 11 show the linear, quadratic, and cubic regression models that use the degree-based topological indices as predictors of the physicochemical properties of the food preservatives (MW, MP, BP, pKa, VD, log P, and LD50). All regression models were developed in MATLAB. The following metrics are used for evaluation:

  • R (correlation coefficient): Measures the strength of the relationship between the actual values and the values predicted by the model. R ranges from −1 to 1. An R value above 0.9 indicates a strong positive correlation and a strong influence of the predictor variable on the dependent parameter.

  • R² (coefficient of determination): Indicates the proportion of the variance in the dependent variable that the model accounts for. A higher R² denotes a better fit, implying that most of the variation is explained by the model.

  • RMSE (root mean square error): Measures the typical magnitude of the prediction errors. A smaller RMSE indicates a better model.

  • F-statistic: Evaluates the overall significance of the model. A high F-statistic together with a low p value indicates that the predictors account for a large share of the variance in the dependent variable.

  • p value: Evaluates the model's statistical significance. The model is considered statistically significant if the p value is less than 0.05.
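The metrics listed above can be computed from the observed and fitted values as in the following sketch. It assumes a least-squares fit with an intercept and k predictor terms (k = 1, 2, 3 for the linear, quadratic and cubic models). It takes R as the square root of R², which for a linear fit equals the absolute Pearson correlation. RMSE is computed as √(SSE/(n − k − 1)), the residual-degrees-of-freedom convention that MATLAB's `fitlm` reports; whether the tables used exactly this convention is an assumption. The function name `regression_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y: Sequence[float], y_hat: Sequence[float], k: int) -> Dict[str, float]:
    """R, R^2, RMSE and overall F-statistic for a least-squares fit with intercept.

    k is the number of predictor terms (1 = linear, 2 = quadratic, 3 = cubic).
    """
    n = len(y)
    dfe = n - k - 1  # residual degrees of freedom
    if len(y_hat) != n or dfe <= 0:
        raise ValueError("need len(y_hat) == len(y) > k + 1")
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    r2 = 1.0 - sse / sst
    mse = sse / dfe
    # SST - SSE is the regression sum of squares for an OLS fit with intercept.
    f_stat = math.inf if sse == 0 else ((sst - sse) / k) / mse
    return {"R": math.sqrt(max(r2, 0.0)), "R2": r2, "RMSE": math.sqrt(mse), "F": f_stat}


if __name__ == "__main__":
    print(regression_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0], k=1))
```

The p value is the upper tail of the F distribution with (k, n − k − 1) degrees of freedom; with SciPy available it could be obtained as `scipy.stats.f.sf(F, k, n - k - 1)`.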

In the linear regression analysis, most of the computed indices are positively correlated with the physicochemical properties. The reduced reciprocal Randić index shows the highest correlation with the LD50 value and melting point. Vapour density and molecular weight correlate most strongly with the atom bond connectivity index. The Randić, hyper Zagreb and harmonic indices show high associations with boiling point, pKa and log P, respectively. Although they do not dominate, descriptors such as RM2, AZ, and GA show reasonable relationships with many properties. The correlation coefficient between each property and index is given in Table 12. Figure 4 shows the linear regression plots for the property–index pairs with the highest coefficient of determination; these plots illustrate the strong predictive relationship between specific topological indices and selected physicochemical properties.
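Identifying the index that correlates best with a given property amounts to ranking descriptors by the absolute Pearson correlation, as in the sketch below. The function names are hypothetical. The demo uses molecular weight (Table 1) and four indices from Tables 4 and 5 only as an illustration, and no particular ranking is claimed for it. For properties with missing entries in Table 1 (such as BP or pKa), compounds without a value would have to be dropped pairwise before computing r.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_indices(prop: Sequence[float],
                 indices: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank descriptors by |r| with a property, strongest first."""
    scores = [(name, pearson_r(vals, prop)) for name, vals in indices.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    mw = [60.05, 176.12, 414.5, 122.12, 360.494, 220.35, 192.12,
          134.09, 384.6, 152.98, 180.2, 212.199, 112.13, 166.217]
    table = {
        "M1": [12, 58, 128, 40, 64, 84, 58, 34, 110, 50, 58, 70, 28, 55],
        "R": [1.7321, 5.5746, 13.9684, 4.3045, 5.9477, 7.0317, 5.7764,
              4.2019, 13.0064, 5.2364, 6.2364, 7.0577, 3.7701, 5.0015],
        "GA": [2.5981, 11.3463, 28.1719, 8.6916, 12.0976, 14.4307, 10.8311,
               7.5369, 25.2285, 10.5738, 12.5738, 14.3059, 6.6547, 10.0612],
        "ABC": [2.4495, 8.7611, 20.8913, 6.5423, 9.6765, 12.2819, 9.2389,
                5.8756, 18.7819, 7.9565, 9.3707, 10.9228, 5.1685, 8.3717],
    }
    for name, r in rank_indices(mw, table):
        print(f"{name}: r = {r:.4f}")
```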

Table 6.

Linear regression analysis.

Equation R R² RMSE F-statistic p value
Inline graphic 0.9019 0.8135 44.3986 52.3308 0
Inline graphic 0.0266 0.0007 54.0850 0.0085 0.9280
Inline graphic 0.5211 0.2715 87.8094 3.3543 0.1003
Inline graphic 0.6406 0.4103 2.4482 4.1753 0.087
Inline graphic 0.9876 0.9754 0.3332 118.7533 0.0017
Inline graphic 0.4110 0.1689 1.9440 1.4228 0.2718
Inline graphic 0.1829 0.0334 3370.5546 0.2076 0.6647
Inline graphic 0.8896 0.7914 46.9468 45.5369 0.0000
Inline graphic 0.0773 0.0060 53.9424 0.0721 0.7929
Inline graphic 0.5540 0.3069 85.6467 3.9861 0.0770
Inline graphic 0.6262 0.3921 2.6053 3.2246 0.1325
Inline graphic 0.9847 0.9696 0.3704 95.5742 0.0023
Inline graphic 0.3938 0.1551 1.9601 1.2850 0.2943
Inline graphic 0.2128 0.0453 3349.8183 0.2846 0.6129
Inline graphic 0.7932 0.6292 62.5971 20.3630 0.0007
Inline graphic 0.2454 0.0602 52.4491 0.7693 0.3977
Inline graphic 0.5843 0.3415 83.4874 4.6665 0.0590
Inline graphic 0.2900 0.0841 3.1979 0.4589 0.5282
Inline graphic 0.8649 0.7480 1.0657 8.9061 0.0584
Inline graphic 0.2766 0.0765 2.0493 0.5799 0.4712
Inline graphic 0.1350 0.0182 3396.9600 0.1114 0.7499
Inline graphic 0.8758 0.7670 53.5982 39.4997 0.0000
Inline graphic 0.2202 0.0485 54.2124 0.5603 0.4698
Inline graphic 0.4810 0.2313 99.7177 2.7086 0.1342
Inline graphic 0.7680 0.5898 2.5321 7.1900 0.0437
Inline graphic 0.7582 0.5748 1.7942 5.4080 0.0806
Inline graphic 0.5536 0.3065 2.2337 1.7676 0.2544
Inline graphic 0.0041 0.0000 3972.8655 0.0001 0.9916
Inline graphic 0.8901 0.7923 47.5157 50.0762 0
Inline graphic 0.4063 0.1651 54.1512 2.3727 0.1479
Inline graphic 0.5875 0.3452 95.1462 3.7175 0.0805
Inline graphic 0.5394 0.2909 2.5915 2.8629 0.1235
Inline graphic 0.9872 0.9745 0.3411 152.0418 0.0003
Inline graphic 0.3287 0.1080 1.9427 0.9726 0.3487
Inline graphic 0.3365 0.1132 3095.1422 1.0211 0.3418
Inline graphic 0.8764 0.7681 50.0842 46.2338 0
Inline graphic 0.4373 0.1912 53.2048 2.8042 0.1266
Inline graphic 0.6982 0.4875 87.1746 9.5063 0.0115
Inline graphic 0.5874 0.3450 2.5132 3.7121 0.0807
Inline graphic 0.9845 0.9693 0.3676 125.2836 0.0004
Inline graphic 0.3461 0.1198 1.9024 1.0843 0.3238
Inline graphic 0.3744 0.1402 3050.1256 1.1392 0.3125
Inline graphic 0.8921 0.7969 46.6978 50.2742 0
Inline graphic 0.4312 0.1859 53.3772 2.7131 0.1317
Inline graphic 0.7108 0.5052 85.1373 10.3018 0.0101
Inline graphic 0.5932 0.3519 2.4874 3.8476 0.0781
Inline graphic 0.9886 0.9773 0.3216 169.4287 0.0002
Inline graphic 0.3567 0.1272 1.8953 1.1433 0.3106
Inline graphic 0.3842 0.1476 3031.2456 1.1986 0.3001
Inline graphic 0.9024 0.8144 44.7321 55.1387 0
Inline graphic 0.4378 0.1917 53.1954 2.7934 0.1268
Inline graphic 0.7172 0.5144 83.8572 10.7691 0.0092
Inline graphic 0.5771 0.3331 2.5107 3.7461 0.0821
Inline graphic 0.9898 0.9797 0.3079 180.6244 0.0002
Inline graphic 0.3723 0.1386 1.8731 1.2298 0.2943
Inline graphic 0.4086 0.1670 2995.3287 1.3315 0.2745

Table 7.

Linear regression analysis.

Equation R R² RMSE F-statistic p value
Inline graphic 0.8840 0.7814 51.9136 42.8963 0.0000
Inline graphic 0.2294 0.0526 51.8268 0.6667 0.4301
Inline graphic 0.0677 0.0046 103.8197 0.0552 0.8182
Inline graphic 0.4519 0.2042 3.5270 1.2828 0.3088
Inline graphic 0.9836 0.9674 0.4950 88.9777 0.0025
Inline graphic 0.4853 0.2355 1.9948 1.8484 0.2228
Inline graphic 0.2288 0.0523 3853.7237 0.3314 0.5858
Inline graphic 0.8940 0.799 49.761 47.749 0.00001
Inline graphic 0.2608 0.068 53.653 0.803 0.389
Inline graphic 0.5536 0.306 94.717 3.978 0.077
Inline graphic 0.5238 0.274 3.368 1.891 0.228
Inline graphic 0.9936 0.987 0.310 230.76 0.0006
Inline graphic 0.4011 0.161 2.215 1.342 0.285
Inline graphic 0.1165 0.0136 3462.3901 0.0825 0.7836
Inline graphic 0.8935 0.7984 46.1575 47.5214 0.0000
Inline graphic 0.2757 0.0760 49.1407 0.9052 0.3618
Inline graphic 0.5532 0.3060 85.7052 3.9684 0.0775
Inline graphic 0.6499 0.4223 2.5397 3.6553 0.1141
Inline graphic 0.9914 0.9828 0.2782 171.6443 0.0010
Inline graphic 0.4071 0.1657 1.9477 1.3906 0.2768
Inline graphic 0.1129 0.0127 3463.8341 0.0774 0.7901
Inline graphic 0.8904 0.7927 46.8009 45.8961 0.0000
Inline graphic 0.3746 0.1403 47.4005 1.7953 0.2073
Inline graphic 0.6058 0.3670 81.8504 5.2186 0.0482
Inline graphic 0.1019 0.0104 3.3240 0.0525 0.8279
Inline graphic 0.9714 0.9437 0.5038 50.2757 0.0058
Inline graphic 0.0640 0.0041 2.1281 0.0288 0.8701
Inline graphic 0.2232 0.0498 3398.2034 0.3145 0.5953
Inline graphic 0.8619 0.7429 52.1226 34.6773 0.0001
Inline graphic 0.2463 0.0606 49.5482 0.7101 0.4173
Inline graphic 0.4575 0.2093 91.4826 2.3821 0.1571
Inline graphic 0.6654 0.4428 2.4942 3.9734 0.1028
Inline graphic 0.9559 0.9137 0.6238 31.7523 0.0111
Inline graphic 0.3997 0.1598 1.9547 1.3311 0.2865
Inline graphic 0.1203 0.0145 3460.7911 0.0881 0.7766
Inline graphic 0.8951 0.8012 45.8347 48.3627 0.0000
Inline graphic 0.2301 0.0529 49.7510 0.6149 0.4495
Inline graphic 0.4307 0.1855 92.8499 2.0493 0.1861
Inline graphic 0.6364 0.4050 2.5774 3.4037 0.1244
Inline graphic 0.9674 0.9359 0.5373 43.8302 0.0070
Inline graphic 0.3969 0.1575 1.9573 1.3091 0.2902
Inline graphic 0.0962 0.0093 3469.9437 0.0561 0.8207
Inline graphic 0.9028 0.8151 44.2025 52.9029 0.0000
Inline graphic 0.2584 0.0668 49.3861 0.7871 0.3940
Inline graphic 0.5152 0.2655 88.1719 3.2529 0.1048
Inline graphic 0.5940 0.3528 2.6881 2.7259 0.1596
Inline graphic 0.9949 0.9898 0.2146 290.5792 0.0004
Inline graphic 0.4103 0.1684 1.9447 1.4171 0.2727
Inline graphic 0.1116 0.0125 3464.3409 0.0757 0.7925
Inline graphic 0.8998 0.8096 44.8602 51.0137 0.0000
Inline graphic 0.2621 0.0687 49.3354 0.8114 0.3870
Inline graphic 0.5007 0.2507 89.0574 3.0104 0.1168
Inline graphic 0.6369 0.4056 2.5761 3.4119 0.1240
Inline graphic 0.9827 0.9657 0.3931 84.4840 0.0027
Inline graphic 0.4086 0.1669 1.9464 1.4026 0.2749
Inline graphic 0.1222 0.0149 3459.9941 0.0909 0.7732

Table 8.

Quadratic regression analysis.

Equation R R² RMSE F-statistic p value
Inline graphic 0.9022 0.8139 44.3444 24.0571 0.0001
Inline graphic 0.4622 0.2136 45.3338 1.3584 0.3007
Inline graphic 0.6446 0.4155 78.6519 2.8438 0.1167
Inline graphic 0.7061 0.4985 2.3662 1.9883 0.2515
Inline graphic 0.9899 0.9799 0.3010 48.7421 0.0201
Inline graphic 0.5513 0.3039 1.7792 1.3097 0.3373
Inline graphic 0.2221 0.0493 3399.0454 0.1297 0.8812
Inline graphic 0.8919 0.7954 46.4962 21.3847 0.0002
Inline graphic 0.4787 0.2291 44.8847 1.4863 0.2722
Inline graphic 0.6551 0.4292 77.7285 3.0074 0.1062
Inline graphic 0.7104 0.5047 2.3516 2.0379 0.2453
Inline graphic 0.9870 0.9742 0.3409 37.7717 0.0258
Inline graphic 0.5323 0.2834 1.8052 1.1863 0.3680
Inline graphic 0.2122 0.0450 3406.7461 0.1178 0.8912
Inline graphic 0.8607 0.7409 52.3281 15.7261 0.0006
Inline graphic 0.4913 0.2413 44.5283 1.5906 0.2513
Inline graphic 0.6630 0.4396 77.0152 3.1378 0.0986
Inline graphic 0.6676 0.4457 2.4876 1.6085 0.3072
Inline graphic 0.9840 0.9683 0.3782 30.5181 0.0317
Inline graphic 0.4546 0.2067 1.8994 0.7815 0.4993
Inline graphic 0.2221 0.0493 3399.0239 0.1298 0.8812
Inline graphic 0.8817 0.7774 48.4989 19.2101 0.0003
Inline graphic 0.4883 0.2384 44.6139 1.5653 0.2562
Inline graphic 0.6616 0.4377 77.1479 3.1132 0.1000
Inline graphic 0.7223 0.5218 2.3107 2.1821 0.2287
Inline graphic 0.9858 0.9717 0.3569 34.3884 0.0283
Inline graphic 0.5480 0.3003 1.7838 1.2875 0.3426
Inline graphic 0.2479 0.0615 3377.2959 0.1637 0.8534
Inline graphic 0.8899 0.7920 46.8887 20.9364 0.0002
Inline graphic 0.4693 0.2202 45.1444 1.4119 0.2883
Inline graphic 0.6223 0.3872 80.5331 2.5278 0.1410
Inline graphic 0.5134 0.2636 2.8674 0.7160 0.5423
Inline graphic 0.9847 0.9696 0.3703 31.8779 0.0304
Inline graphic 0.4609 0.2124 1.8925 0.8091 0.4885
Inline graphic 0.1600 0.0256 3441.1868 0.0657 0.9372
Inline graphic 0.8972 0.8049 45.4067 22.6903 0.0001
Inline graphic 0.4580 0.2098 45.4456 1.3272 0.3082
Inline graphic 0.5949 0.3539 82.6973 2.1906 0.1743
Inline graphic 0.5418 0.2936 2.8084 0.8312 0.4990
Inline graphic 0.9984 0.9968 0.1199 312.2868 0.0032
Inline graphic 0.4660 0.2172 1.8868 0.8322 0.4797
Inline graphic 0.1922 0.0370 3421.0900 0.0959 0.9102
Inline graphic 0.9024 0.8143 44.3045 24.1103 0.0001
Inline graphic 0.4653 0.2165 45.2515 1.3816 0.2953
Inline graphic 0.6387 0.4080 79.1602 2.7562 0.1229
Inline graphic 0.6918 0.4786 2.4127 1.8361 0.2718
Inline graphic 0.9918 0.9837 0.2715 60.1651 0.0163
Inline graphic 0.5447 0.2967 1.7883 1.2659 0.3478
Inline graphic 0.2028 0.0411 3413.6492 0.1073 0.9003
Inline graphic 0.8910 0.7939 46.6689 21.1861 0.0002
Inline graphic 0.4775 0.2280 44.9183 1.4766 0.2742
Inline graphic 0.6277 0.3940 80.0849 2.6011 0.1348
Inline graphic 0.5752 0.3309 2.7332 0.9891 0.4477
Inline graphic 0.9816 0.9636 0.4051 26.4614 0.0364
Inline graphic 0.5030 0.2531 1.8430 1.0163 0.4167
Inline graphic 0.1545 0.0239 3444.2302 0.0612 0.9414

Table 9.

Quadratic regression analysis.

Equation R R² RMSE F-statistic p value
Inline graphic 0.8907 0.7934 46.7232 21.1240 0.0002
Inline graphic 0.4683 0.2193 45.1713 1.4043 0.2901
Inline graphic 0.5878 0.3455 83.2326 2.1112 0.1835
Inline graphic 0.4698 0.2207 2.9497 0.5664 0.6073
Inline graphic 0.9893 0.9786 0.3102 45.8277 0.0214
Inline graphic 0.4487 0.2014 1.9057 0.7565 0.5094
Inline graphic 0.1805 0.0326 3428.8522 0.0842 0.9205
Inline graphic 0.8980 0.8064 45.2267 22.9152 0.0001
Inline graphic 0.4635 0.2149 45.2989 1.3683 0.2984
Inline graphic 0.6028 0.3633 82.0902 2.2825 0.1643
Inline graphic 0.5733 0.3287 2.7377 0.9794 0.4506
Inline graphic 0.9974 0.9948 0.1524 193.0910 0.0052
Inline graphic 0.4979 0.2479 1.8493 0.9889 0.4254
Inline graphic 0.1907 0.0364 3422.1640 0.0943 0.9116
Inline graphic 0.8950 0.8010 45.8624 22.1329 0.0001
Inline graphic 0.4677 0.2188 45.1862 1.4001 0.2910
Inline graphic 0.6057 0.3668 81.8638 2.3173 0.1607
Inline graphic 0.7258 0.5268 2.2985 2.2266 0.2239
Inline graphic 0.9945 0.9889 0.2232 89.4665 0.0111
Inline graphic 0.5159 0.2661 1.8268 1.0879 0.3952
Inline graphic 0.1733 0.0300 3433.3551 0.0774 0.9266
Inline graphic 0.8904 0.7928 46.7905 21.0475 0.0002
Inline graphic 0.5403 0.2920 43.0172 2.0617 0.1779
Inline graphic 0.6340 0.4019 79.5616 2.6882 0.1279
Inline graphic 0.5632 0.3172 2.7610 0.9292 0.4662
Inline graphic 0.9719 0.9446 0.4996 17.0555 0.0554
Inline graphic 0.4339 0.1883 1.9212 0.6959 0.5348
Inline graphic 0.2338 0.0546 3389.5317 0.1445 0.8689
Inline graphic 0.8635 0.7457 51.8433 16.1249 0.0005
Inline graphic 0.5069 0.2569 44.0682 1.7289 0.2265
Inline graphic 0.6625 0.4390 77.0595 3.1296 0.0991
Inline graphic 0.7207 0.5194 2.3165 2.1612 0.2310
Inline graphic 0.9867 0.9736 0.3449 36.8963 0.0264
Inline graphic 0.5528 0.3056 1.7770 1.3203 0.3348
Inline graphic 0.2778 0.0772 3348.8623 0.2091 0.8180
Inline graphic 0.8952 0.8014 45.8124 22.1932 0.0001
Inline graphic 0.4445 0.1976 45.7952 1.2310 0.3327
Inline graphic 0.6440 0.4148 78.7032 2.8349 0.1173
Inline graphic 0.6839 0.4677 2.4379 1.7573 0.2833
Inline graphic 0.9937 0.9875 0.2378 78.6844 0.0125
Inline graphic 0.5340 0.2852 1.8029 1.1970 0.3652
Inline graphic 0.2819 0.0795 3344.7401 0.2158 0.8130
Inline graphic 0.9038 0.8170 43.9735 24.5578 0.0001
Inline graphic 0.4533 0.2055 45.5676 1.2934 0.3165
Inline graphic 0.6266 0.3927 80.1759 2.5861 0.1361
Inline graphic 0.6678 0.4459 2.4872 1.6097 0.3070
Inline graphic 0.9949 0.9899 0.2136 97.8067 0.0101
Inline graphic 0.5382 0.2897 1.7973 1.2234 0.3584
Inline graphic 0.2223 0.0494 3398.8996 0.1299 0.8810
Inline graphic 0.9003 0.8106 44.7418 23.5344 0.0001
Inline graphic 0.4618 0.2132 45.3456 1.3551 0.3015
Inline graphic 0.6483 0.4203 78.3304 2.9001 0.1129
Inline graphic 0.7084 0.5019 2.3583 2.0152 0.2481
Inline graphic 0.9892 0.9786 0.3109 45.6368 0.0214
Inline graphic 0.5504 0.3030 1.7803 1.3040 0.3386
Inline graphic 0.2364 0.0559 3387.3384 0.1479 0.8661

Table 10.

Cubic regression analysis.

Equation R R² RMSE F-statistic p value
Inline graphic 0.9022 0.8139 44.3423 14.5817 0.0006
Inline graphic 0.5681 0.3227 42.0728 1.4294 0.2973
Inline graphic 0.6759 0.4569 75.8189 1.9628 0.2082
Inline graphic 0.7142 0.5100 2.3390 1.0409 0.4873
Inline graphic 0.9918 0.9837 0.2711 20.1015 0.1622
Inline graphic 0.7385 0.5454 1.4378 1.9995 0.2327
Inline graphic 0.4046 0.1637 3188.0429 0.2610 0.8507
Inline graphic 0.8920 0.7957 46.4641 12.9829 0.0009
Inline graphic 0.5573 0.3106 42.4484 1.3513 0.3183
Inline graphic 0.6949 0.4829 73.9811 2.1789 0.1785
Inline graphic 0.7160 0.5127 2.3325 1.0521 0.4838
Inline graphic 0.9881 0.9763 0.3270 13.7198 0.1953
Inline graphic 0.7380 0.5446 1.4390 1.9933 0.2336
Inline graphic 0.4538 0.2059 3106.5286 0.3457 0.7953
Inline graphic 0.8611 0.7414 52.2726 9.5583 0.0028
Inline graphic 0.5286 0.2794 43.3958 1.1634 0.3762
Inline graphic 0.9071 0.8999 72.7515 2.3327 0.0005
Inline graphic 0.6896 0.4756 2.4197 0.9069 0.5311
Inline graphic 0.9842 0.9687 0.3758 10.3027 0.2242
Inline graphic 0.6887 0.4743 1.5462 1.5035 0.3214
Inline graphic 0.4656 0.2167 3085.2781 0.3690 0.7805
Inline graphic 0.8818 0.7775 48.4913 11.6472 0.0013
Inline graphic 0.5658 0.3201 42.1537 1.4124 0.3018
Inline graphic 0.6797 0.4620 75.4574 2.0041 0.2021
Inline graphic 0.7224 0.5218 2.3107 1.0912 0.4723
Inline graphic 0.9888 0.9778 0.3164 14.6756 0.1890
Inline graphic 0.7519 0.5654 1.4059 2.1680 0.2102
Inline graphic 0.4951 0.2451 3028.8685 0.4330 0.7412
Inline graphic 0.8903 0.7925 46.8217 12.7346 0.0009
Inline graphic 0.5189 0.2693 43.7000 1.1057 0.3963
Inline graphic 0.6790 0.4611 75.5263 1.9962 0.2033
Inline graphic 0.5223 0.2728 2.8495 0.3751 0.7790
Inline graphic 0.9848 0.9699 0.3685 10.7289 0.2199
Inline graphic 0.4892 0.2393 1.8598 0.5244 0.6843
Inline graphic 0.3162 0.1000 3307.2395 0.1481 0.9257
Inline graphic 0.8972 0.8050 45.3987 13.7577 0.0007
Inline graphic 0.5271 0.2779 43.4434 1.1543 0.3793
Inline graphic 0.6779 0.4596 75.6282 1.9845 0.2050
Inline graphic 0.7051 0.4972 2.3694 0.9888 0.5036
Inline graphic 0.9986 0.9973 0.1113 120.9979 0.0667
Inline graphic 0.6634 0.4401 1.5957 1.3099 0.3685
Inline graphic 0.4017 0.1613 3192.5405 0.2565 0.8537
Inline graphic 0.9026 0.8146 44.2580 14.6500 0.0005
Inline graphic 0.5608 0.3145 42.3260 1.3765 0.3114
Inline graphic 0.6827 0.4660 75.1783 2.0363 0.1975
Inline graphic 0.7131 0.5084 2.3427 1.0344 0.4892
Inline graphic 0.9920 0.9840 0.2686 20.4962 0.1606
Inline graphic 0.7361 0.5419 1.4433 1.9716 0.2367
Inline graphic 0.3845 0.1478 3218.1667 0.2313 0.8705
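Each row of the cubic regression tables summarises a least-squares fit of one property against one index. The sketch below shows one way such a row could be produced: it fits a polynomial of a chosen degree (1, 2 or 3 for the linear, quadratic and cubic models) by solving the normal equations, then reports R, R² and RMSE. It is a minimal, dependency-free illustration, not the authors' code; the function names are hypothetical, the toy data are invented, and dividing the residual sum of squares by n in the RMSE is an assumption (some statistics packages divide by n - k - 1 instead).

```python
import math
from typing import List, Sequence, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_polynomial(x: Sequence[float], y: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    k = degree + 1
    xtx = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def evaluate(coeffs: Sequence[float], x0: float) -> float:
    """Evaluate b0 + b1*x0 + b2*x0**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x0 + c
    return result


def fit_statistics(x, y, coeffs) -> Tuple[float, float, float]:
    """Return (R, R^2, RMSE); RMSE divides the residual sum of squares by n (assumed)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - evaluate(coeffs, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return math.sqrt(max(r2, 0.0)), r2, math.sqrt(ss_res / n)


if __name__ == "__main__":
    # Toy data (not from the paper): an exact cubic, so the cubic fit should be perfect.
    xs = [0, 1, 2, 3, 4, 5, 6]
    ys = [1 + 2 * t - 0.5 * t ** 2 + 0.1 * t ** 3 for t in xs]
    for degree in (1, 2, 3):
        coeffs = fit_polynomial(xs, ys, degree)
        r, r2, rmse = fit_statistics(xs, ys, coeffs)
        print(f"degree {degree}: R = {r:.4f}, R^2 = {r2:.4f}, RMSE = {rmse:.4f}")
```

The normal equations are adequate for small data sets like these; for larger or badly scaled data, an SVD-based solver such as `numpy.linalg.lstsq` would be the more robust choice.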

Table 11.

Cubic regression analysis.

Model (property vs index) R R² RMSE F-statistic p value
MW vs H 0.8908 0.7935 46.7183 12.8058 0.0009
MP vs H 0.5202 0.2706 43.6604 1.1131 0.3937
BP vs H 0.6951 0.4831 73.9624 2.1812 0.1782
pKa vs H 0.4709 0.2218 2.9477 0.2850 0.8350
VD vs H 0.9940 0.9881 0.2315 27.6890 0.1386
log P vs H 0.5289 0.2797 1.8098 0.6473 0.6176
LD50 vs H 0.4675 0.2185 3081.7808 0.3728 0.7781
MW vs SC 0.8981 0.8066 45.2057 13.9040 0.0007
MP vs SC 0.5367 0.2880 43.1374 1.2135 0.3597
BP vs SC 0.6826 0.4659 75.1846 2.0356 0.1976
pKa vs SC 0.6711 0.4504 2.4771 0.8195 0.5630
VD vs SC 0.9991 0.9982 0.0900 185.2779 0.0539
log P vs SC 0.6750 0.4557 1.5733 1.3953 0.3466
LD50 vs SC 0.3920 0.1537 3207.1254 0.2421 0.8633
MW vs GA 0.8951 0.8012 45.8322 13.4360 0.0008
MP vs GA 0.5351 0.2863 43.1877 1.2037 0.3629
BP vs GA 0.6626 0.4391 77.0527 1.8263 0.2302
pKa vs GA 0.9419 0.8854 2.2406 1.2240 0.0060
VD vs GA 0.9998 0.9996 0.0437 786.3431 0.0262
log P vs GA 0.6803 0.4628 1.5630 1.4356 0.3369
LD50 vs GA 0.3521 0.1240 3262.8268 0.1887 0.8990
MW vs IS 0.8904 0.7929 46.7865 12.7588 0.0009
MP vs IS 0.5427 0.2945 42.9390 1.2525 0.3474
BP vs IS 0.6531 0.4266 77.9033 1.7360 0.2464
pKa vs IS 0.7171 0.5143 2.3288 1.0587 0.4819
VD vs IS 0.9759 0.9523 0.4634 6.6616 0.2757
log P vs IS 0.4970 0.2470 1.8504 0.5468 0.6716
LD50 vs IS 0.2548 0.0649 3371.0144 0.0926 0.9602
MW vs F 0.8635 0.7457 51.8395 9.7746 0.0026
MP vs F 0.5669 0.3214 42.1138 1.4208 0.2996
BP vs F 0.6659 0.4435 76.7495 1.8592 0.2247
pKa vs F 0.7217 0.5209 2.3129 1.0871 0.4734
VD vs F 0.9897 0.9795 0.3038 15.9476 0.1816
log P vs F 0.7677 0.5894 1.3665 2.3923 0.1847
LD50 vs F 0.5101 0.2603 2998.3607 0.4691 0.7199
MW vs SDI 0.8987 0.8076 45.0872 13.9947 0.0007
MP vs SDI 0.5744 0.3300 41.8469 1.4773 0.2852
BP vs SDI 0.6455 0.4166 78.5783 1.6664 0.2598
pKa vs SDI 0.6987 0.4882 2.3904 0.9539 0.5150
VD vs SDI 0.9976 0.9951 0.1479 68.3555 0.0886
log P vs SDI 0.7426 0.5514 1.4283 2.0486 0.2258
LD50 vs SDI 0.4125 0.1702 3175.6327 0.2735 0.8424
MW vs ABC 0.9039 0.8170 43.9733 14.8837 0.0005
MP vs ABC 0.5634 0.3175 42.2353 1.3954 0.3063
BP vs ABC 0.6649 0.4421 76.8411 1.8493 0.2263
pKa vs ABC 0.7154 0.5118 2.3346 1.0485 0.4849
VD vs ABC 0.9958 0.9917 0.1933 39.8914 0.1157
log P vs ABC 0.7418 0.5502 1.4302 2.0388 0.2272
LD50 vs ABC 0.3680 0.1354 3241.5237 0.2088 0.8856
MW vs So 0.9004 0.8107 44.7272 14.2747 0.0006
MP vs So 0.5711 0.3261 41.9659 1.4520 0.2915
BP vs So 0.6698 0.4487 76.3899 1.8988 0.2182
pKa vs So 0.7125 0.5077 2.3446 1.0311 0.4902
VD vs So 0.9923 0.9846 0.2630 21.3801 0.1574
log P vs So 0.7393 0.5466 1.4359 2.0091 0.2313
LD50 vs So 0.4187 0.1753 3165.7712 0.2835 0.8358
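The F-statistic and p value columns test whether a model explains more variance than an intercept-only model. Assuming the standard overall F-test, with k = 3 polynomial terms for a cubic model and n - k - 1 residual degrees of freedom, both can be recovered from R² and n. The sketch below does this with only the Python standard library, evaluating the F-distribution tail through the regularized incomplete beta function (the modified Lentz continued fraction popularised by Numerical Recipes). The sample sizes in the usage lines are assumptions, chosen because they reproduce the tabulated values: n = 14 for the first row of Table 11 (R² = 0.7935) and n = 5 (one residual degree of freedom) for the vapour density row with F = 786.3431.

```python
import math


def f_statistic(r2: float, n: int, k: int) -> float:
    """Overall F for a model with k predictor terms (plus intercept) fitted to n points."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def _beta_cf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        # Each iteration applies one even and one odd term of the fraction.
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:  # last factor close to 1: converged
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fastest; otherwise apply symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_p_value(f: float, d1: int, d2: int) -> float:
    """Upper-tail probability P(F > f) for an F(d1, d2) distribution."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f))


if __name__ == "__main__":
    # First row of Table 11: R^2 = 0.7935; n = 14 is an assumption.
    f_row = f_statistic(0.7935, n=14, k=3)
    print(f"F = {f_row:.4f}, p = {f_p_value(f_row, 3, 14 - 3 - 1):.4f}")
    # Vapour density row with tabulated F = 786.3431; n = 5 is an assumption.
    print(f"p = {f_p_value(786.3431, 3, 1):.4f}")
```

With very few residual degrees of freedom (one, in the vapour density case), even a near-perfect fit can have a p value above 0.05, which is worth keeping in mind when reading the VD rows.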

Table 12.

Linear regression coefficients of properties with molecular descriptors.

Property M1 M2 RM2 HM AZ R RR RRR H SC GA IS F SDI ABC So
MW 0.9019 0.8896 0.7932 0.8758 0.8901 0.8764 0.8921 0.9024 0.8840 0.8940 0.8935 0.8904 0.8619 0.8951 0.9028 0.8998
MP 0.0266 0.0773 0.2454 0.2202 0.4063 0.4373 0.4312 0.4378 0.2294 0.2608 0.2757 0.3746 0.2463 0.2301 0.2584 0.2621
BP 0.5211 0.5540 0.5843 0.4810 0.5875 0.6982 0.7108 0.7172 0.0677 0.5536 0.5532 0.6058 0.4575 0.4307 0.5152 0.5007
pKa 0.6406 0.6262 0.2900 0.7680 0.5394 0.5874 0.5932 0.5771 0.4519 0.5238 0.6499 0.1019 0.6654 0.6364 0.5940 0.6369
VD 0.9876 0.9847 0.8649 0.7582 0.9872 0.9845 0.9886 0.9898 0.9836 0.9936 0.9914 0.9714 0.9559 0.9674 0.9949 0.9827
log P 0.4110 0.3938 0.2766 0.5536 0.3287 0.3461 0.3567 0.3723 0.4853 0.4011 0.4071 0.0640 0.3997 0.3964 0.4103 0.4086
LD50 0.1829 0.2128 0.1350 0.0041 0.3365 0.3744 0.3842 0.4086 0.2288 0.1165 0.1129 0.2232 0.1203 0.0964 0.1116 0.1222

Bold values indicate the highest correlation.

Fig. 4.

Linear regression plots.

In quadratic regression analysis, the atom bond connectivity index has the highest predictive ability (highest regression coefficient) for molecular weight. The geometric arithmetic index and the inverse sum connectivity index have the highest predictive ability for pKa and melting point, respectively. The symmetric division index shows the highest predictive ability for LD50. Unlike in linear regression, RM2 shows high predictive ability for boiling point. The forgotten index and the sum connectivity index have the highest predictive ability for log P and vapour density, respectively. The regression coefficient between each property and index is displayed in Table 13. The quadratic regression plots of the physicochemical property and index pairs with the highest coefficient of determination are displayed in Fig. 5.
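For reference, the three model families compared here can be written in a general form. The symbols are generic placeholders assumed for illustration: P is a physicochemical property, TI is a topological index, and a, b_1, b_2, b_3 are coefficients estimated by least squares (the fitted values belong to the individual regression equations).

```latex
\begin{aligned}
\text{Linear:}\quad    & P = a + b_1\,\mathrm{TI}\\
\text{Quadratic:}\quad & P = a + b_1\,\mathrm{TI} + b_2\,\mathrm{TI}^2\\
\text{Cubic:}\quad     & P = a + b_1\,\mathrm{TI} + b_2\,\mathrm{TI}^2 + b_3\,\mathrm{TI}^3
\end{aligned}
```

Because each form contains the one above it as a special case (b_3 = 0, or b_2 = b_3 = 0), an ordinary least-squares fit of higher degree to the same data cannot have a lower R² than a lower-degree fit. A higher R² alone therefore does not show that the extra terms are warranted, and the p values reported alongside R² remain relevant when comparing the models.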

Table 13.

Quadratic regression coefficients of properties with molecular descriptors.

Property M1 M2 RM2 HM AZ R RR RRR H SC GA IS F SDI ABC So
MW 0.9022 0.8919 0.8607 0.8817 0.8899 0.8972 0.9024 0.8910 0.8907 0.8980 0.8950 0.8904 0.8635 0.8952 0.9038 0.9003
MP 0.4622 0.4787 0.4913 0.4883 0.4693 0.4580 0.4653 0.4775 0.4683 0.4635 0.4677 0.5403 0.5069 0.4445 0.4533 0.4618
BP 0.6446 0.6551 0.8630 0.6616 0.6223 0.5949 0.6387 0.6277 0.5878 0.6028 0.6057 0.6340 0.6625 0.6440 0.6266 0.6483
pKa 0.7061 0.7104 0.6676 0.7223 0.5134 0.5418 0.6918 0.5752 0.4698 0.5733 0.7258 0.5632 0.7207 0.6839 0.6678 0.7084
VD 0.9899 0.9870 0.93840 0.9858 0.9847 0.9984 0.9918 0.9816 0.9893 0.9974 0.9945 0.9719 0.9867 0.9932 0.9949 0.9892
log P 0.5513 0.5323 0.4546 0.5480 0.4609 0.4660 0.5447 0.5030 0.4487 0.4979 0.5159 0.4339 0.5528 0.5340 0.5382 0.5504
LD50 0.2221 0.1222 0.2221 0.479 0.1600 0.192 0.2028 0.1545 0.1805 0.1907 0.1733 0.2338 0.2778 0.2817 0.2223 0.2364

Bold values indicate the highest correlation.

Fig. 5.

Quadratic regression plots.

In cubic regression analysis, the atom bond connectivity index exhibits the greatest predictive capacity for molecular weight, as indicated by the highest regression coefficient. The geometric arithmetic index is the best predictor of pKa and vapour density, and the symmetric division index is the best predictor of melting point. RM2 is the best predictor of boiling point, and the forgotten index is the best predictor of both log P (R = 0.7677) and LD50 (R = 0.5101). The regression coefficient between each property and index is displayed in Table 14. The cubic regression plots of the physicochemical property and index pairs with the highest coefficient of determination are displayed in Fig. 6.

Table 14.

Cubic regression coefficients of properties with molecular descriptors.

Property M1 M2 RM2 HM AZ R RR RRR H SC GA IS F SDI ABC So
MW 0.9022 0.8920 0.8611 0.8818 0.8908 0.8972 0.9026 0.8911 0.8908 0.8981 0.8951 0.8904 0.8635 0.8987 0.9039 0.9004
MP 0.5681 0.5573 0.5286 0.5658 0.5189 0.5272 0.5608 0.5371 0.5202 0.5367 0.5351 0.5427 0.5669 0.5744 0.5634 0.5711
BP 0.6759 0.6949 0.9071 0.6797 0.6790 0.6779 0.6827 0.6645 0.6951 0.6826 0.6626 0.6531 0.6659 0.6455 0.6649 0.6698
pKa 0.7142 0.7160 0.6896 0.7224 0.5223 0.7051 0.7131 0.5765 0.4709 0.6711 0.9419 0.7171 0.7217 0.6987 0.7154 0.7125
VD 0.9918 0.9881 0.9842 0.9888 0.9848 0.9986 0.9920 0.9919 0.9940 0.9991 0.9998 0.9759 0.9897 0.9976 0.9958 0.9923
log P 0.7385 0.7380 0.6887 0.7519 0.4892 0.6634 0.7361 0.5676 0.5289 0.6750 0.6803 0.4970 0.7677 0.7426 0.7418 0.7393
LD50 0.4046 0.4538 0.4656 0.4950 0.3162 0.4017 0.3845 0.2781 0.4675 0.3950 0.3521 0.2548 0.5101 0.4125 0.3680 0.4187

Bold values indicate the highest correlation.
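Choosing the best index for a property amounts to taking the largest entry in that property's row of Table 14. The sketch below transcribes the table (reading the ABC/VD and So/log P cells as 0.9958 and 0.7393, the values listed for those models in Table 11) and applies that rule; the variable and function names are illustrative only.

```python
from typing import Dict, List, Tuple

INDICES: List[str] = ["M1", "M2", "RM2", "HM", "AZ", "R", "RR", "RRR",
                      "H", "SC", "GA", "IS", "F", "SDI", "ABC", "So"]

# Correlation coefficients R of the cubic models, one row per property (Table 14).
CUBIC_R: Dict[str, List[float]] = {
    "MW":   [0.9022, 0.8920, 0.8611, 0.8818, 0.8908, 0.8972, 0.9026, 0.8911,
             0.8908, 0.8981, 0.8951, 0.8904, 0.8635, 0.8987, 0.9039, 0.9004],
    "MP":   [0.5681, 0.5573, 0.5286, 0.5658, 0.5189, 0.5272, 0.5608, 0.5371,
             0.5202, 0.5367, 0.5351, 0.5427, 0.5669, 0.5744, 0.5634, 0.5711],
    "BP":   [0.6759, 0.6949, 0.9071, 0.6797, 0.6790, 0.6779, 0.6827, 0.6645,
             0.6951, 0.6826, 0.6626, 0.6531, 0.6659, 0.6455, 0.6649, 0.6698],
    "pKa":  [0.7142, 0.7160, 0.6896, 0.7224, 0.5223, 0.7051, 0.7131, 0.5765,
             0.4709, 0.6711, 0.9419, 0.7171, 0.7217, 0.6987, 0.7154, 0.7125],
    "VD":   [0.9918, 0.9881, 0.9842, 0.9888, 0.9848, 0.9986, 0.9920, 0.9919,
             0.9940, 0.9991, 0.9998, 0.9759, 0.9897, 0.9976, 0.9958, 0.9923],
    "logP": [0.7385, 0.7380, 0.6887, 0.7519, 0.4892, 0.6634, 0.7361, 0.5676,
             0.5289, 0.6750, 0.6803, 0.4970, 0.7677, 0.7426, 0.7418, 0.7393],
    "LD50": [0.4046, 0.4538, 0.4656, 0.4950, 0.3162, 0.4017, 0.3845, 0.2781,
             0.4675, 0.3950, 0.3521, 0.2548, 0.5101, 0.4125, 0.3680, 0.4187],
}


def best_index(row: List[float]) -> Tuple[str, float]:
    """Return the name of the index with the largest R in a table row, and that R."""
    i = max(range(len(row)), key=row.__getitem__)
    return INDICES[i], row[i]


if __name__ == "__main__":
    for prop, row in CUBIC_R.items():
        name, r = best_index(row)
        print(f"{prop:>5}: {name} (R = {r:.4f})")
```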

Fig. 6.

Cubic regression plots.

For molecular weight, the cubic model with ABC as the predictor has the highest correlation coefficient, R = 0.9039 (R² = 0.8170), so it explains most of the variance. For melting point, the cubic model with SDI has the highest R (0.5744), but its p value of 0.2852 makes the model statistically weak. For boiling point, the cubic model with RM2 gives R = 0.9071 (R² = 0.8999) and p = 0.0005 < 0.05, making it the best model. For pKa, the cubic model with GA gives R = 0.9419 (R² = 0.8854, p = 0.0060), the best predictive ability among the tested models. For vapour density, the cubic model with GA is the best (R = 0.9998, R² = 0.9996) and has the lowest RMSE (0.0437). These four selected models are statistically significant (p < 0.05). For log P and LD50, even the best models are statistically insignificant because of their high p values (> 0.05).

Cubic regression gives the most accurate models for four of the properties, namely molecular weight (MW), boiling point (BP), pKa, and vapour density (VD), as indicated by their highest R² values. The models for MW, BP, pKa, and VD also have p values below 0.05, indicating that they are statistically significant and dependable for predicting these properties. In contrast, the models for melting point (MP), log P, and LD50 have weaker correlations, lower R² values, and higher p values, which makes them less accurate predictors. Table 15 lists the best predictive models.

Table 15.

Best predictive model.

Model (property vs index) R R² p value
MW vs ABC 0.9039 0.8170 0.0005
BP vs RM2 0.9071 0.8999 0.0005
pKa vs GA 0.9419 0.8854 0.0060
VD vs GA 0.9998 0.9996 0.0262
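The selection behind Table 15 can be read as a two-step rule: take the highest-R cubic model for each property, then keep it only if its p value is below 0.05. A sketch of that rule, with the (index, R, R², p) values transcribed from the cubic regression tables; treating the cut-off as a strict inequality at 0.05 is an assumption about how it was applied.

```python
from typing import Dict, Tuple

# Highest-R cubic model for each property: (index, R, R^2, p value),
# transcribed from the cubic regression tables.
BEST_CUBIC: Dict[str, Tuple[str, float, float, float]] = {
    "MW":   ("ABC", 0.9039, 0.8170, 0.0005),
    "MP":   ("SDI", 0.5744, 0.3300, 0.2852),
    "BP":   ("RM2", 0.9071, 0.8999, 0.0005),
    "pKa":  ("GA",  0.9419, 0.8854, 0.0060),
    "VD":   ("GA",  0.9998, 0.9996, 0.0262),
    "logP": ("F",   0.7677, 0.5894, 0.1847),
    "LD50": ("F",   0.5101, 0.2603, 0.7199),
}


def significant_models(models: Dict[str, Tuple[str, float, float, float]],
                       alpha: float = 0.05) -> Dict[str, Tuple[str, float, float, float]]:
    """Keep only the models whose p value is below alpha."""
    return {prop: m for prop, m in models.items() if m[3] < alpha}


if __name__ == "__main__":
    for prop, (index, r, r2, p) in significant_models(BEST_CUBIC).items():
        print(f"{prop}: {index}, R = {r:.4f}, R^2 = {r2:.4f}, p = {p:.4f}")
```

Tightening `alpha` to 0.01 would drop the vapour density model (p = 0.0262), which shows how sensitive the final list is to the chosen significance level.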

To validate the predictive model, we consider propionic acid, another food preservative. Propionic acid is used as a fungicide to prevent the growth of bacteria and fungi in grain stored for animal and poultry feed58. Its chemical structure is depicted in Fig. 7. We compute the degree-based topological indices of propionic acid, as in Theorem 3.1, and the results are shown in Table 16. The physicochemical properties of propionic acid are then estimated by substituting these index values into the predictive models. The experimental values are taken from PubChem45. Table 17 compares the experimental and predicted values; the predictions agree with the experimental values to within about 8%. These results suggest that the model can estimate the properties of new food preservative candidates. The technique can be used to screen a large number of chemicals and identify those with a high potential for use as food preservatives.

Fig. 7.

Propionic acid.

Table 16.

Topological indices of propionic acid.

M1 M2 RM2 HM AZ R RR RRR
16 14 2 66 22.75 2.2701 7.3278 1.4142
H SC GA IS F SDI ABC So
2.0667 2.0246 3.6547 3.3667 38 11.3333 3.0472 12.1662
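The index values in Table 16 can be checked programmatically. The sketch below assumes the hydrogen-suppressed molecular graph of propionic acid (CH3-CH2-COOH, with both oxygens as pendant vertices and the C=O double bond treated as a single edge) and the standard edge-sum definitions of the sixteen degree-based indices; under these assumptions it reproduces every entry of Table 16 to four decimal places. The vertex numbering and helper names are illustrative, not taken from Theorem 3.1.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hydrogen-suppressed graph: 0 = CH3 carbon, 1 = CH2 carbon, 2 = carboxyl carbon,
# 3 = carbonyl oxygen, 4 = hydroxyl oxygen (C=O counted as one edge, an assumption).
EDGES: List[Tuple[int, int]] = [(0, 1), (1, 2), (2, 3), (2, 4)]


def vertex_degrees(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Count how many edges meet at each vertex."""
    deg: Dict[int, int] = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


# Each index is a sum over edges uv of a function of the end-vertex degrees (a, b).
EDGE_FUNCTIONS: Dict[str, Callable[[int, int], float]] = {
    "M1":  lambda a, b: a + b,  # equals the sum of squared vertex degrees
    "M2":  lambda a, b: a * b,
    "RM2": lambda a, b: (a - 1) * (b - 1),
    "HM":  lambda a, b: (a + b) ** 2,
    "AZ":  lambda a, b: (a * b / (a + b - 2)) ** 3,
    "R":   lambda a, b: 1 / math.sqrt(a * b),
    "RR":  lambda a, b: math.sqrt(a * b),
    "RRR": lambda a, b: math.sqrt((a - 1) * (b - 1)),
    "H":   lambda a, b: 2 / (a + b),
    "SC":  lambda a, b: 1 / math.sqrt(a + b),
    "GA":  lambda a, b: 2 * math.sqrt(a * b) / (a + b),
    "IS":  lambda a, b: a * b / (a + b),
    "F":   lambda a, b: a ** 2 + b ** 2,  # equals the sum of cubed vertex degrees
    "SDI": lambda a, b: a / b + b / a,
    "ABC": lambda a, b: math.sqrt((a + b - 2) / (a * b)),
    "So":  lambda a, b: math.sqrt(a ** 2 + b ** 2),
}


def compute_indices(edges: List[Tuple[int, int]]) -> Dict[str, float]:
    """Evaluate every index in EDGE_FUNCTIONS for the given edge list."""
    deg = vertex_degrees(edges)
    return {name: sum(f(deg[u], deg[v]) for u, v in edges)
            for name, f in EDGE_FUNCTIONS.items()}


if __name__ == "__main__":
    for name, value in compute_indices(EDGES).items():
        print(f"{name:>4}: {value:.4f}")
```

The same `compute_indices` call works for any other candidate molecule once its hydrogen-suppressed edge list is written down, which is the step a screening workflow would repeat.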

Table 17.

Comparison between experimental and predicted values.

Property Experimental value Predicted value
Molecular weight (g/mol) 74.08 78.2329
Boiling point (°C) at 760 mmHg 141 142.3753
pKa 4.88 4.82
Vapour density 2.56 2.7514
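The agreement in Table 17 can be quantified with the signed relative error, 100 × (predicted - experimental) / experimental, for each property. A short sketch using the tabulated values (the dictionary keys are shorthand introduced here):

```python
from typing import Dict, Tuple

# (experimental, predicted) pairs for propionic acid, from Table 17.
TABLE_17: Dict[str, Tuple[float, float]] = {
    "MW": (74.08, 78.2329),   # molecular weight, g/mol
    "BP": (141.0, 142.3753),  # boiling point, deg C at 760 mmHg
    "pKa": (4.88, 4.82),
    "VD": (2.56, 2.7514),     # vapour density
}


def relative_error_percent(experimental: float, predicted: float) -> float:
    """Signed relative error of a prediction, in percent of the experimental value."""
    return 100.0 * (predicted - experimental) / experimental


if __name__ == "__main__":
    errors = {k: relative_error_percent(*v) for k, v in TABLE_17.items()}
    for prop, err in errors.items():
        print(f"{prop}: {err:+.2f}%")
    mean_abs = sum(abs(e) for e in errors.values()) / len(errors)
    print(f"mean absolute percentage error: {mean_abs:.2f}%")
```

With these values the errors are roughly +5.6% (MW), +1.0% (BP), -1.2% (pKa) and +7.5% (VD), a mean absolute error of about 3.8%.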

Conclusion

In this study, topological indices are used to build prediction models for the physicochemical properties of food preservatives. By comparing linear and curvilinear regression models, we identified the most reliable approach for predicting these properties. The findings indicate that cubic regression models outperform linear and quadratic models, giving correlation coefficients (R) as high as 0.9998 for vapour density and 0.9039 for molecular weight, which indicates high predictive capability. The best predictive model for each property is also identified. The applicability of the model is further supported by its validation on propionic acid, an existing food preservative. In future studies, this predictive model can be used to screen candidate food preservative compounds. This study demonstrates how computational approaches can help screen and design new food preservatives, offering an economical alternative to extensive experimental testing.

Author contributions

GKB: Conceptualization; writing (original draft); validation; software. RS: Supervision; validation; conceptualization.

Funding

Open access funding provided by Vellore Institute of Technology.

Data availability

All data generated or analysed during this study are included in this published article.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Branen, A. L. et al. (eds) Food Additives (CRC Press, 2001).
2. Lindsay, R. C. Food additives. In Fennema’s Food Chemistry 701–762 (CRC Press, 2007).
3. Russell, N. J. & Gould, G. W. (eds) Food Preservatives (Springer, 2003).
4. Awuchi, C. G., Twinomuhwezi, H., Igwe, V. S. & Amagwula, I. O. Food additives and food preservatives for domestic and industrial food applications. J. Anim. Health 2(1), 1–16 (2020).
5. Sharma, S. Food preservatives and their harmful effects. Int. J. Sci. Res. Publ. 5(4), 1–2 (2015).
6. Jurić, M., Bandić, L. M., Carullo, D. & Jurić, S. Technological advancements in edible coatings: Emerging trends and applications in sustainable food preservation. Food Biosci. 58, 103835 (2024).
7. Damayanti, S., Permana, J. & Tjahjono, D. H. The use of computational chemistry to predict toxicity of antioxidants food additives and its metabolites as a reference for food safety regulation. Pharma Chem J. 7, 174–181 (2015).
8. Mirza, N. et al. Health implications, toxicity, and safety assessment of functional food additives. In Food Additives-From Chemistry to Safety (IntechOpen, 2024).
9. Roy, K. Advances in QSAR modeling. In Applications in Pharmaceutical, Chemical, Food, Agricultural and Environmental Sciences 555 (Springer, 2017).
10. Roy, K., Kar, S. & Das, R. N. A Primer on QSAR/QSPR Modeling: Fundamental Concepts (Springer, 2015).
11. Katritzky, A. R., Lobanov, V. S. & Karelson, M. QSPR: The correlation and quantitative prediction of chemical and physical properties from structure. Chem. Soc. Rev. 24(4), 279–287 (1995).
12. Liu, P. & Long, W. Current mathematical methods used in QSAR/QSPR studies. Int. J. Mol. Sci. 10(5), 1978–1998 (2009).
13. Kiralj, R. & Ferreira, M. Basic validation procedures for regression models in QSAR and QSPR studies: Theory and application. J. Braz. Chem. Soc. 20, 770–787 (2009).
14. West, D. B. Introduction to Graph Theory Vol. 2 (Prentice Hall, 2001).
15. Trinajstic, N. Chemical Graph Theory (CRC Press, 2018).
16. Gutman, I. Degree-based topological indices. Croat. Chem. Acta 86(4), 351–361 (2013).
17. Estrada, E. & Uriarte, E. Recent advances on the role of topological indices in drug discovery research. Curr. Med. Chem. 8(13), 1573–1588 (2001).
18. Furtula, B. & Gutman, I. A forgotten topological index. J. Math. Chem. 53(4), 1184–1190 (2015).
19. Rouvray, D. H. The modeling of chemical phenomena using topological indices. J. Comput. Chem. 8(4), 470–480 (1987).
20. Jyothish, K. & Santiago, R. Quantitative structure-property relationship modeling with the prediction of physicochemical properties of some novel Duchenne muscular dystrophy drugs. ACS Omega 10(4), 3640 (2025).
21. Gayathri, K. B. & Roy, S. Quantitative structure-property relationship study of postpartum depression medications using topological indices and regression models. Ain Shams Eng. J. 16(1), 103194 (2025).
22. Renai, P. N. A. D., Roy, S. & Govardhan, S. QSPR analysis for certain bio-molecular architectures. Int. J. Quantum Chem. 124(11), e27423 (2024).
23. Govardhan, S., Roy, S., Prabhu, S. & Arulperumjothi, M. Topological characterization of cove-edged graphene nanoribbons with applications to NMR spectroscopies. J. Mol. Struct. 1303, 137492 (2024).
24. Jeyaraj, S. V. & Santiago, R. A study on efficient technique for generating vertex-based topological characterization of boric acid 2D structure. ACS Omega 8(25), 23089–23097 (2023).
25. Augustine, T. & Santiago, R. On neighborhood degree-based topological analysis over melamine-based TriCF structure. Symmetry 15(3), 635 (2023).
26. Hasani, M., Ghods, M., Mondal, S., Siddiqui, M. K. & Cheema, I. Z. Modeling QSPR for pyelonephritis drugs: A topological indices approach using MATLAB. J. Supercomput. 81(3), 479 (2025).
27. Pyka, A. Application of topological indices for prediction of the biological activity of selected alkoxyphenols. Acta Pol. Pharm. 59(5), 347–352 (2002).
28. Ejima, O., Abubakar, M. S., Pawa, S. S., Ibrahim, A. H. & Aremu, K. O. Ensemble learning and graph topological indices for predicting physical properties of mental disorder drugs. Phys. Scr. 99(10), 106009 (2024).
29. Li, H. et al. Topological analysis and predictive modeling of amino acid structures with implications for bioinformatics and structural biology. Sci. Rep. 15(1), 638 (2025).
30. Abubakar, M. S., Ejima, O., Sanusi, R. A., Ibrahim, A. H. & Aremu, K. O. Predicting antibacterial drugs properties using graph topological indices and machine learning. IEEE Access 12, 181420–181435 (2024).
31. Yap, C. W., Li, H., Ji, Z. L. & Chen, Y. Z. Regression methods for developing QSAR and QSPR models to predict compounds of specific pharmacodynamic, pharmacokinetic, and toxicological properties. Mini-Rev. Med. Chem. 7(11), 1097–1107 (2007).
32. Levine, A. S. & Fellers, C. R. Action of acetic acid on food spoilage microorganisms. J. Bacteriol. 39(5), 499–515 (1940).
33. Tewari, S., Sehrawat, R., Nema, P. K. & Kaur, B. P. Preservation effect of high pressure processing on ascorbic acid of fruits and vegetables: A review. J. Food Biochem. 41(1), e12319 (2017).
34. Imran, M. et al. Ascorbyl palmitate: A comprehensive review on its characteristics, synthesis, encapsulation, and applications. Process Biochem. 142, 68–80 (2024).
35. Del Olmo, A., Calzada, J. & Nunez, M. Benzoic acid and its derivatives as naturally occurring compounds in foods and as additives: Uses, exposure, and controversy. Crit. Rev. Food Sci. Nutr. 57(14), 3084–3103 (2017).
36. Goodman, D. L., McDonnel, J. T., Nelson, H. S., Vaughan, T. R. & Weber, R. Chronic urticaria exacerbated by the antioxidant food preservatives, butylated hydroxyanisole (BHA) and butylated hydroxytoluene (BHT). J. Allergy Clin. Immunol. 85, 570–575 (1990).
37. Zhang, W., Roy, S., Assadpour, E., Cong, X. & Jafari, S. M. Cross-linked biopolymeric films by citric acid for food packaging and preservation. Adv. Colloid Interface Sci. 314, 102886 (2023).
38. Liu, S. et al. Dimethyl dicarbonate as a food additive effectively inhibits Geotrichum citri-aurantii of citrus. Foods 11(15), 2328 (2022).
39. Nerin, C., Becerril, R. & Silva, F. Ethyl lauroyl arginate (LAE): Antimicrobial activity and applications in food systems. In Antimicrobial Food Packaging 405–414 (Academic Press, 2025).
40. Talevi, A., Bellera, C. L., Castro, E. A. & Bruno-Blanch, L. E. A successful virtual screening application: Prediction of anticonvulsant activity in MES test of widely used pharmaceutical and food preservatives methylparaben and propylparaben. J. Comput. Aided Mol. Des. 21, 527–538 (2007).
41. Vandenberg, L. N. & Bugos, J. Assessing the public health implications of the food preservative propylparaben: Has this chemical been safely used for decades? Curr. Environ. Health Rep. 8, 54–70 (2021).
42. Javaheri-Ghezeldizaj, F., Alizadeh, A. M., Dehghan, P. & Dolatabadi, J. E. N. Pharmacokinetic and toxicological overview of propyl gallate food additive. Food Chem. 423, 135219 (2023).
43. Stopforth, J. D., Sofos, J. N. & Busta, F. F. Sorbic acid and sorbates. Food Sci. Technol. 145, 49 (2005).
44. Khezerlou, A., Pouya Akhlaghi, A., Alizadeh, A. M., Dehghan, P. & Maleki, P. Alarming impact of the excessive use of tert-butylhydroquinone in food products: A narrative review. Toxicol. Rep. 9, 1066–1075 (2022).
45. National Center for Biotechnology Information. PubChem Compound Database. U.S. National Library of Medicine. https://pubchem.ncbi.nlm.nih.gov/. Accessed 11 Mar (2025).
46. Gutman, I. & Trinajstić, N. Graph theory and molecular orbitals. In New Concepts II 49–93 (Springer, 2005).
47. Furtula, B., Gutman, I. & Ediz, S. On difference of Zagreb indices. Discrete Appl. Math. 178, 83–88 (2014).
48. Shirdel, G. H., Rezapour, H. & Sayadi, A. M. The hyper-Zagreb index of graph operations (2013).
49. Ali, A., Raza, Z. & Bhatti, A. A. On the augmented Zagreb index. arXiv preprint arXiv:1402.3078 (2014).
50. Randic, M. Characterization of molecular branching. J. Am. Chem. Soc. 97(23), 6609–6615 (1975).
51. Gutman, I., Furtula, B. & Elphick, C. Three new/old vertex-degree-based topological indices. In MATCH Communications in Mathematical and in Computer Chemistry (2014).
52. Zhong, L. The harmonic index for graphs. Appl. Math. Lett. 25(3), 561–566 (2012).
53. Zhou, B. & Trinajstić, N. On a novel connectivity index. J. Math. Chem. 46, 1252–1270 (2009).
54. Vukičević, D. & Furtula, B. Topological index based on the ratios of geometrical and arithmetical means of end-vertex degrees of edges. J. Math. Chem. 46, 1369–1376 (2009).
55. Vukičević, D. & Gaśperov, M. Bond additive modeling 1. Adriatic indices. Croat. Chem. Acta 83, 243–260 (2010).
56. Estrada, E., Torres, L., Rodriguez, L. & Gutman, I. An atom-bond connectivity index: Modelling the enthalpy of formation of alkanes. Indian J. Chem. 37A, 849 (1998).
57. Gutman, I. Geometric approach to degree-based topological indices: Sombor indices. MATCH Commun. Math. Comput. Chem. 86(1), 11–16 (2021).
58. Ranaei, V., Pilevar, Z., Khaneghah, A. M. & Hosseini, H. Propionic acid: Method of production, current state, and perspectives. Food Technol. Biotechnol. 58(2), 115 (2020).
