Abstract
We define the odd log-logistic exponential Gaussian regression with two systematic components, which extends the heteroscedastic Gaussian regression and is suitable for bimodal data, which are quite common in agriculture. We estimate the parameters by the method of maximum likelihood. Some simulations indicate that the maximum-likelihood estimators are accurate. The model assumptions are checked through case deletion and quantile residuals. The usefulness of the new regression model is illustrated by means of three real data sets from different areas of agriculture, where the data present bimodality.
Keywords: Agriculture data, bimodal data, exponential Gaussian distribution, regression model, simulation study
1. Introduction
The normal (Gaussian) distribution is used to model many phenomena in almost all areas. It is adequate for real data when most of the data are near to the mean. On the other hand, the exponential is a continuous distribution with positive support. It is one of the simplest probabilistic models used to describe time to failure.
If $X \sim N(\mu, \sigma^2)$ and $Z \sim \text{Exp}(\nu)$, where $\nu > 0$ is the mean of $Z$, and $X$ and $Z$ are independent random variables, then the sum $W = X + Z$ has the exponential Gaussian (ExGa) distribution, say $W \sim \text{ExGa}(\mu, \sigma, \nu)$. Some results were reported for the ExGa distribution. For example, [28] implemented this distribution in R software (GAMLSS), and [7] proved that it may provide better fits for some classes of phenomena including intermitotic time and protein expression variability data. Further, [11] used the ExGa distribution for reconstruction of chromatographic peaks, [18] applied it in experiments to measure response item and [30] used this distribution for integrated extended time-lapse automated imaging to quantify the dynamics of cell proliferation. All these papers consider unimodal data, but in some situations this assumption does not hold. For example, we consider the following datasets:
The data are related to the index of germination speed of tomato seeds. The research was developed at the Central Seed Laboratory of the Federal University of Lavras, Lavras, MG, Brazil (see Figure 1(a)).
Another data set consists of the degrees Brix (a measure of the density or sugar concentration of solutions) of yacon (a tuber native to Peru) (see Figure 1(b)).
Figure 1.
Histograms.
Figure 1(a,b) displays the existence of a bimodal data distribution. These data sets are analyzed in this paper in the application section. Our first objective is to define a new distribution, called the odd log-logistic exponential Gaussian (OLLExGa), to model data with two modes (bimodal). In many practical situations, the response variable is affected by several explanatory variables, such as temperature, radiation, sulfurgran, ascorbic acid, germination index, among others. The regression model that provides a better fit tends to produce more precise estimates for the quantities of interest.
Recently, some studies of regressions have been published in different contexts. For example, [21] introduced the heteroscedastic odd log-logistic generalized gamma regression for censored data, [10] studied zero-spiked regression models generated by gamma random variables with application in resin oil production, [22] considered a generalized odd log-logistic flexible Weibull regression with applications in repairable systems and [27] proposed the odd log-logistic generalized inverse Gaussian regression with real estate data, among others. Further, [9] defined the G family of continuous distributions with mathematical properties, characterizations and regression modeling, [13] presented the odd power Lindley generator of probability distributions with properties, characterizations and regression modeling, [14] introduced the Weibull Marshall–Olkin family with regression and application to censored data and [12] proposed a new flexible lifetime model with log-location regression, properties and applications.
Based on these surveys, our second objective is to construct a regression based on the OLLExGa distribution to model bimodal data by considering a classic analysis and with different applications in agriculture. The inferential part is carried out using asymptotic maximum-likelihood estimators (MLEs). Some Monte Carlo simulation studies are performed to verify the accuracy of the OLLExGa regression by means of the variance and mean squared error. We check the model assumptions and detect possible influential or extreme observations that can cause distortions in the results of the fitted regression. An efficient way to detect these observations, called case deletion or global influence, was proposed by [2]. We introduce quantile residuals (qrs) to check the regression assumptions and carry out simulation studies to evaluate their empirical distribution when the data are bimodal. We draw envelope plots as a measure of the goodness-of-fit. Our research can be summarized in the following contributions:
First, we present the OLLExGa distribution to model bimodal data.
Second, based on the OLLExGa distribution, we propose a regression with two systematic components to model bimodal data. There are no classic models for bimodal data in the literature.
Third, we present diagnostic and residual analysis to verify all assumptions of the new regression.
Finally, we present three applications where the main motivation is the presence of bimodality in the data. In the first application, the researcher responsible for the experiment provided all final interpretations of the analyses and emphasized that the normal regression cannot be adopted for these data. Although we focus on agricultural applications, the proposed regression may also be used in other areas.
In Section 2, we define the OLLExGa distribution and display some plots. In Section 3, we propose the OLLExGa regression and investigate the accuracy of the MLEs from several simulations. In Section 4, we define qrs for the fitted regression and some diagnostic measures. We also provide a simulation study to check the normal approximation for these residuals. Three applications to real data in agriculture area in Section 5 confirm the flexibility of the OLLExGa distribution and its associated regression model. Section 6 ends with some conclusions.
2. The model definition
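It is important to have extended forms of classic distributions in many applied areas such as agricultural data modeling. We adopt the parametrization of the ExGa distribution used in the GAMLSS library [28] in R. The cumulative distribution function (cdf) and probability density function (pdf) of the ExGa distribution are

$$F_{\text{ExGa}}(y;\mu,\sigma,\nu) = \Phi\left(\frac{y-\mu}{\sigma}\right) - \Phi\left(\frac{y-\mu}{\sigma} - \frac{\sigma}{\nu}\right)\exp\left(-\frac{y-\mu}{\nu} + \frac{\sigma^2}{2\nu^2}\right) \tag{1}$$

and

$$f_{\text{ExGa}}(y;\mu,\sigma,\nu) = \frac{1}{\nu}\exp\left(-\frac{y-\mu}{\nu} + \frac{\sigma^2}{2\nu^2}\right)\Phi\left(\frac{y-\mu}{\sigma} - \frac{\sigma}{\nu}\right) \tag{2}$$

respectively, where $\mu \in \mathbb{R}$ and $\sigma > 0$ are the mean and standard deviation of the normal distribution, $\nu > 0$ is the mean of the exponential variable and $\Phi(\cdot)$ is the standard normal cumulative function.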
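As a minimal sketch of the ExGa cdf and pdf in this parametrization (normal mean `mu`, normal standard deviation `sigma`, exponential mean `nu`), the Python functions below use only the standard library. The function names are illustrative and are not part of GAMLSS; the formulas assume the usual closed form for the sum of an independent normal and exponential variable.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal cdf, written with erfc to stay accurate in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def exga_pdf(y: float, mu: float, sigma: float, nu: float) -> float:
    """Density of N(mu, sigma^2) plus an independent exponential with mean nu."""
    shifted = (y - mu) / sigma - sigma / nu
    weight = math.exp(-(y - mu) / nu + sigma**2 / (2.0 * nu**2))
    return weight * std_normal_cdf(shifted) / nu


def exga_cdf(y: float, mu: float, sigma: float, nu: float) -> float:
    """Cdf matching exga_pdf (its derivative in y is exga_pdf)."""
    shifted = (y - mu) / sigma - sigma / nu
    weight = math.exp(-(y - mu) / nu + sigma**2 / (2.0 * nu**2))
    return std_normal_cdf((y - mu) / sigma) - std_normal_cdf(shifted) * weight


# Illustrative parameter values (not taken from a fitted model).
print(exga_pdf(10.3, 10.0, 0.8, 0.4), exga_cdf(10.3, 10.0, 0.8, 0.4))
```

The exponential factor can overflow for values of `y` far below `mu` when `nu` is small, so a production implementation would work on the log scale.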
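Let $W$ be a random variable having density function (2). The moment generating function (mgf) of W is $M_W(t) = (1 - \nu t)^{-1}\exp(\mu t + \sigma^2 t^2/2)$ for $t < 1/\nu$. It can be checked from $M_W(t)$ that the ExGa distribution converges to the normal distribution when ν goes to zero. By differentiating $M_W(t)$, the mean, variance, skewness and kurtosis of W are

$$E(W) = \mu + \nu, \quad \text{Var}(W) = \sigma^2 + \nu^2, \quad \text{Skewness}(W) = \frac{2\nu^3}{(\sigma^2+\nu^2)^{3/2}}, \quad \text{Kurtosis}(W) = 3 + \frac{6\nu^4}{(\sigma^2+\nu^2)^2},$$

respectively.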
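For the sum of an independent normal variable (mean μ, standard deviation σ) and an exponential variable (mean ν), the mean is μ + ν and the variance is σ² + ν². The sketch below checks this by Monte Carlo. The parameter values and the seed are illustrative assumptions, not values taken from the fitted models.

```python
import random

random.seed(2024)

# Illustrative parameter values (assumptions for this sketch).
mu, sigma, nu = 10.0, 0.8, 0.4
n = 100_000

# W = X + Z with X ~ N(mu, sigma^2) and Z exponential with mean nu.
w = [random.gauss(mu, sigma) + random.expovariate(1.0 / nu) for _ in range(n)]

sample_mean = sum(w) / n
sample_var = sum((v - sample_mean) ** 2 for v in w) / n

print(f"sample mean     = {sample_mean:.4f}  (theory: mu + nu = {mu + nu:.4f})")
print(f"sample variance = {sample_var:.4f}  (theory: sigma^2 + nu^2 = {sigma**2 + nu**2:.4f})")
```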
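Based on the odd log-logistic generator (OLL-G) class [6], we define the OLLExGa cdf, say $F(y)$, by integrating the log-logistic density function with shape parameter $\tau > 0$, namely

$$F(y) = F(y;\mu,\sigma,\nu,\tau) = \int_0^{\frac{G(y)}{1-G(y)}} \frac{\tau x^{\tau-1}}{(1+x^{\tau})^2}\,dx = \frac{G(y)^{\tau}}{G(y)^{\tau} + [1-G(y)]^{\tau}} \tag{3}$$

where $G(y)$ is the ExGa cdf (1) with parameters $(\mu,\sigma,\nu)$. Hereafter, we assume that the random variable Y follows the cdf (3) with parameters $(\mu,\sigma,\nu,\tau)$, say $Y \sim \text{OLLExGa}(\mu,\sigma,\nu,\tau)$. The OLLExGa distribution includes as special cases the ExGa distribution when $\tau = 1$ and the normal distribution when $\tau = 1$ and $\nu \to 0$.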
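Consider $G(y)$ and $g(y)$, the ExGa cdf (1) and pdf (2) with parameters $(\mu,\sigma,\nu)$, to simplify the notation. The density function of Y has the form

$$f(y) = \frac{\tau\, g(y)\,\{G(y)[1-G(y)]\}^{\tau-1}}{\{G(y)^{\tau} + [1-G(y)]^{\tau}\}^2} \tag{4}$$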
The main motivation for the new distribution is to make its skewness more flexible (compared to the ExGa model) and allow bimodality. Equation (4) provides greater flexibility of the tails of the density and can be widely applied in many areas of engineering and biology.
Plots of the density (4) for selected parameter values are displayed in Figure 2. It is clear that the proposed distribution is much more flexible, especially in relation to bimodality (for $\tau < 1$), than the ExGa distribution, which does not have this characteristic.
Figure 2.
Plots of the OLLExGa density for some parameter values.
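The OLL-G construction behind the density (4) applies to any baseline cdf G and pdf g: the transformed cdf is G^τ / (G^τ + (1 − G)^τ) and the transformed pdf is τ g [G(1 − G)]^(τ−1) / [G^τ + (1 − G)^τ]². The sketch below writes both as functions of the baseline values. A normal baseline from the standard library is used as a stand-in (an assumption made to keep the example short); passing ExGa cdf and pdf values instead gives the OLLExGa case.

```python
from statistics import NormalDist


def oll_cdf(G: float, tau: float) -> float:
    """Odd log-logistic transform of a baseline cdf value G in (0, 1)."""
    return G**tau / (G**tau + (1.0 - G) ** tau)


def oll_pdf(g: float, G: float, tau: float) -> float:
    """Odd log-logistic density from baseline pdf value g and cdf value G."""
    numerator = tau * g * (G * (1.0 - G)) ** (tau - 1.0)
    denominator = (G**tau + (1.0 - G) ** tau) ** 2
    return numerator / denominator


# Stand-in baseline (assumption): normal with mean 10 and sd 0.8.
base = NormalDist(mu=10.0, sigma=0.8)
for y in (8.5, 10.0, 11.5):
    g, G = base.pdf(y), base.cdf(y)
    print(y, oll_pdf(g, G, tau=1.0), oll_pdf(g, G, tau=0.2))
```

With τ = 1 the transform returns the baseline unchanged, and at the baseline median (G = 0.5) the transformed density equals τ times the baseline density, which is why small τ lowers the centre relative to the shoulders.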
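The quantile function (qf) of the OLLExGa distribution can be expressed as

$$y = Q(u) = Q_{\text{ExGa}}\left(\frac{u^{1/\tau}}{u^{1/\tau} + (1-u)^{1/\tau}}\right), \quad u \in (0,1), \tag{5}$$

where $Q_{\text{ExGa}}(\cdot)$ is the qf of the ExGa distribution available in the GAMLSS package [28].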
This scheme is useful because of the existence of fast generators for the ExGa random variables in some statistical packages. The plots comparing the exact OLLExGa densities and the histograms from two simulated data sets with 100,000 replications for selected parameter values are displayed in Figure 3. These plots (and several others not shown here) reveal that the simulated values are consistent with the OLLExGa distribution.
Figure 3.
Histograms and plots of the OLLExGa densities.
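Sampling by inversion through the quantile function can be sketched as follows: a uniform draw u is mapped to u^(1/τ) / (u^(1/τ) + (1 − u)^(1/τ)) and then passed to the baseline quantile function. The helper `oll_quantile` is a hypothetical name, and the normal baseline is a stand-in assumption; for the OLLExGa case the baseline quantile would be the ExGa quantile function (for instance the one provided by GAMLSS in R).

```python
import random
from statistics import NormalDist


def oll_quantile(u: float, tau: float, base_quantile) -> float:
    """Quantile of the odd log-logistic transform of a baseline distribution.

    base_quantile is any callable returning the baseline quantile for a
    probability in (0, 1).
    """
    a = u ** (1.0 / tau)
    b = (1.0 - u) ** (1.0 / tau)
    return base_quantile(a / (a + b))


# Stand-in baseline (assumption): normal with mean 10 and sd 0.8.
base = NormalDist(mu=10.0, sigma=0.8)
tau = 0.5

random.seed(1)
# Draws of u extremely close to 0 or 1 could map to exactly 0 or 1 and
# would need clipping before calling the baseline quantile.
sample = [oll_quantile(random.random(), tau, base.inv_cdf) for _ in range(1000)]
print(min(sample), max(sample))
```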
In the Appendix, we derive some mathematical properties of the OLLExGa distribution including a linear representation for its density function.
3. The OLLExGa regression
In several problems of the medical, biological, industrial and chemical areas, among others, it is of great interest to verify if two or more variables are related in some way. To investigate this relationship, it is important to construct a regression model. Data collection makes it possible to understand the nature of the relationship between variables and to carry out studies capable of accommodating unexpected situations, such as variability in raw material, ambient temperature, machines and operators. Regression models are built with the following objectives: model formulation, parameter estimation, inference, diagnostic and residual analysis and prediction. In this research, we focus on these goals. In these terms, the OLLExGa regression is a very competitive alternative to the ExGa regression.
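The regression technique aims to choose the distribution of Y given the matrix of explanatory variables. The parameters μ and σ are related to the explanatory variables by the systematic components

$$\mu_i = x_i^{\top}\beta_1 \quad \text{and} \quad \sigma_i = \exp(x_i^{\top}\beta_2), \quad i = 1, \ldots, n, \tag{6}$$

respectively, where $x_i^{\top} = (x_{i1}, \ldots, x_{ip})$ is the vector of explanatory variables for the ith observation and $\beta_1 = (\beta_{11}, \ldots, \beta_{1p})^{\top}$ and $\beta_2 = (\beta_{21}, \ldots, \beta_{2p})^{\top}$ are the unknown vectors of coefficients.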
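The total log-likelihood function for the vector of parameters $\theta = (\beta_1^{\top}, \beta_2^{\top}, \nu, \tau)^{\top}$ from model (6) given n independent observations has the form

$$l(\theta) = n\log(\tau) + \sum_{i=1}^{n}\log(g_i) + (\tau - 1)\sum_{i=1}^{n}\log\{G_i(1-G_i)\} - 2\sum_{i=1}^{n}\log\{G_i^{\tau} + (1-G_i)^{\tau}\} \tag{7}$$

where $G_i$ and $g_i$ denote the ExGa cdf (1) and pdf (2) evaluated at $y_i$ with parameters $(\mu_i, \sigma_i, \nu)$.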
The log-likelihood (7) can be maximized numerically using the GAMLSS software to find the MLE $\hat{\theta}$ of $\theta$. Fitting the ExGa regression (with $\tau = 1$) yields initial values for $\beta_1$ and $\beta_2$. Some simulations of the fitted model (6) in Section 3.1 confirm the adequacy of this maximization.
The elements of the Hessian matrix can be determined numerically in the R software. The multivariate normal distribution can approximate the distribution of $\hat{\theta}$ since Equation (4) satisfies some standard regularity conditions. More importantly, it can be utilized to obtain approximate confidence intervals for the parameters in $\theta$. The adequacy of some special models of the OLLExGa regression can be verified via likelihood ratio (LR) statistics.
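Under the normal approximation, an approximate confidence interval for a single coefficient is the estimate plus or minus a standard normal quantile times its standard error. The sketch below applies this to the estimate and SE reported in Table 7 for the dose 0.008 effect in the μ component (−1.3167 and 0.3610). The helper name `wald_interval` is illustrative.

```python
from statistics import NormalDist


def wald_interval(estimate: float, se: float, level: float = 0.95):
    """Approximate confidence interval based on asymptotic normality of the MLE."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


# Estimate and standard error from Table 7 (dose 0.008 in the mu component).
lower, upper = wald_interval(-1.3167, 0.3610)
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```

An interval that excludes zero agrees with the conclusion that the corresponding effect is significant at the matching level.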
3.1. Two simulation studies
In this section, we provide two simulation studies: one to examine the adequacy of the MLEs in the OLLExGa distribution and the other to investigate the adequacy of the estimates in the regression model with systematic components for μ and σ.
- First simulation: the OLLExGa distribution
Some properties of the MLEs are examined using a classical analysis by means of a Monte Carlo simulation study. We simulate the OLLExGa distribution as follows: (i) Generate $u \sim U(0,1)$; (ii) Obtain OLLExGa observations from Equation (5).
We set $\mu = 10$, $\sigma = 0.8$, $\nu = 0.4$ and $\tau = 0.2$ to provide bimodality in the data as shown in Figure 3(b). We choose four scenarios (n = 50, 100, 500 and 1000) for the replications to calculate $\hat{\mu}$, $\hat{\sigma}$, $\hat{\nu}$ and $\hat{\tau}$. Then, we obtain the average estimates (AEs), biases and mean squared errors (MSEs) from 1000 Monte Carlo simulations via the GAMLSS software. The results listed in Table 1 confirm the accuracy of the estimates and that their MSEs decrease when n increases, in agreement with first-order asymptotic theory.
- Second simulation: the OLLExGa regression
We examine the performance of the MLEs in the OLLExGa regression by means of some simulations with n = 100, 300 and 500. We simulate 1,000 samples from two scenarios ($\tau = 0.5$ and $\tau = 1.3$). For both cases, we take $\beta_{10} = 2.1$, $\beta_{11} = -0.4$, $\beta_{12} = 0.3$, $\beta_{20} = -1.0$, $\beta_{21} = 0.2$, $\beta_{22} = -0.1$ and $\nu = 0.4$ under the systematic components $\mu_i = \beta_{10} + \beta_{11}x_{i1} + \beta_{12}x_{i2}$ and $\sigma_i = \exp(\beta_{20} + \beta_{21}x_{i1} + \beta_{22}x_{i2})$. The response variable and explanatory variables and are generated as follows: , and .
We calculate the AEs, biases and MSEs for each fitted regression. The figures in Table 2 reveal that the MSEs of the estimates tend to zero and the AEs converge to the true parameters when n increases. Both facts support the use of the asymptotic normal approximation for the finite-sample distribution of the estimates.
Table 1. AEs, biases and MSEs for the parameters of the OLLExGa distribution.
Scenario | n | Parameter | AE | Bias | MSE |
---|---|---|---|---|---|
1 | 50 | μ | 9.1779 | −0.8221 | 10.7552 |
1 | 50 | σ | 1.1141 | 0.3141 | 0.3622 |
1 | 50 | ν | 0.5415 | 0.1415 | 1.4680 |
1 | 50 | τ | 0.4363 | 0.2363 | 0.2081 |
2 | 100 | μ | 9.4233 | −0.5767 | 4.2744 |
2 | 100 | σ | 0.9858 | 0.1858 | 0.1772 |
2 | 100 | ν | 0.6357 | 0.2357 | 0.5249 |
2 | 100 | τ | 0.3335 | 0.1335 | 0.1446 |
3 | 500 | μ | 9.8203 | −0.1797 | 1.1707 |
3 | 500 | σ | 0.9151 | 0.1151 | 0.0376 |
3 | 500 | ν | 0.5886 | 0.1886 | 0.1332 |
3 | 500 | τ | 0.2508 | 0.0508 | 0.0083 |
4 | 1000 | μ | 9.8306 | −0.1694 | 0.5779 |
4 | 1000 | σ | 0.8952 | 0.0952 | 0.0214 |
4 | 1000 | ν | 0.5734 | 0.1734 | 0.0750 |
4 | 1000 | τ | 0.2391 | 0.0391 | 0.0038 |
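The AE, bias and MSE summaries reported in these tables can be computed from the Monte Carlo replicates as sketched below. To keep the example self-contained, the estimator is the sample mean of exponential data (the MLE of the exponential mean), which is a stand-in assumption; the paper's study fits the OLLExGa model in GAMLSS instead. The function name `mc_summary`, the seed and the sample sizes are illustrative.

```python
import random


def mc_summary(estimates, true_value):
    """Average estimate (AE), bias and mean squared error over replicates."""
    r = len(estimates)
    ae = sum(estimates) / r
    bias = ae - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / r
    return ae, bias, mse


random.seed(7)
true_nu = 0.4          # illustrative true value of the exponential mean
replications = 1000

results = {}
for n in (50, 500):
    # One estimate per replicate: the sample mean of n exponential draws.
    estimates = [
        sum(random.expovariate(1.0 / true_nu) for _ in range(n)) / n
        for _ in range(replications)
    ]
    results[n] = mc_summary(estimates, true_nu)
    ae, bias, mse = results[n]
    print(f"n = {n}: AE = {ae:.4f}, bias = {bias:.4f}, MSE = {mse:.6f}")
```

Note that the bias column is simply AE minus the true value, so the true parameter values of a study can be recovered from the AE and bias columns of its tables.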
Table 2. AEs, biases and MSEs for the OLLExGa regression under scenarios 1 ($\tau = 0.5$) and 2 ($\tau = 1.3$).
Scenario | Parameter | AE (n = 100) | Bias (n = 100) | MSE (n = 100) | AE (n = 300) | Bias (n = 300) | MSE (n = 300) | AE (n = 500) | Bias (n = 500) | MSE (n = 500) |
---|---|---|---|---|---|---|---|---|---|---|
1 | $\beta_{10}$ | 2.0729 | −0.0271 | 0.1386 | 2.0861 | −0.0139 | 0.0537 | 2.0793 | −0.0207 | 0.0332 |
1 | $\beta_{11}$ | −0.4115 | −0.0115 | 0.1359 | −0.4128 | −0.0128 | 0.0442 | −0.4007 | −0.0007 | 0.0262 |
1 | $\beta_{12}$ | 0.3035 | 0.0035 | 0.0251 | 0.3011 | 0.0011 | 0.0076 | 0.3031 | 0.0031 | 0.0046 |
1 | $\beta_{20}$ | −1.1173 | −0.1173 | 0.2861 | −1.0281 | −0.0281 | 0.0807 | −1.0163 | −0.0163 | 0.0447 |
1 | $\beta_{21}$ | 0.2438 | 0.0438 | 0.4549 | 0.1978 | −0.0022 | 0.0971 | 0.2107 | 0.0107 | 0.0525 |
1 | $\beta_{22}$ | −0.1085 | −0.0085 | 0.0706 | −0.1101 | −0.0101 | 0.0146 | −0.0998 | 0.0002 | 0.0068 |
1 | ν | 0.4499 | 0.0499 | 0.1129 | 0.4282 | 0.0282 | 0.0497 | 0.4254 | 0.0254 | 0.0329 |
1 | τ | 0.5468 | 0.0468 | 0.0948 | 0.5225 | 0.0225 | 0.0407 | 0.5225 | 0.0225 | 0.0282 |
2 | $\beta_{10}$ | 2.1881 | 0.0881 | 0.0534 | 2.1125 | 0.0125 | 0.0259 | 2.1021 | 0.0021 | 0.0186 |
2 | $\beta_{11}$ | −0.4042 | −0.0042 | 0.0216 | −0.4008 | −0.0008 | 0.0065 | −0.3979 | 0.0020 | 0.0037 |
2 | $\beta_{12}$ | 0.2997 | −0.0003 | 0.0034 | 0.3022 | 0.0022 | 0.0011 | 0.2999 | −0.0001 | 0.0007 |
2 | $\beta_{20}$ | −1.1240 | −0.1240 | 0.7935 | −1.0332 | −0.0332 | 0.2199 | −1.0255 | −0.0255 | 0.0919 |
2 | $\beta_{21}$ | 0.2087 | 0.0087 | 0.9323 | 0.2093 | 0.0093 | 0.0976 | 0.2032 | 0.0032 | 0.0417 |
2 | $\beta_{22}$ | −0.1009 | −0.0009 | 0.0713 | −0.1004 | −0.0004 | 0.0150 | −0.1038 | −0.0038 | 0.0079 |
2 | ν | 0.3060 | −0.0940 | 0.0677 | 0.3875 | −0.0125 | 0.0359 | 0.4001 | 0.0001 | 0.0270 |
2 | τ | 1.4524 | 0.1524 | 1.5819 | 1.4080 | 0.1080 | 0.9306 | 1.3395 | 0.0395 | 0.4188 |
4. Checking model
The assessment of robustness aspects of the parameter estimates in statistical models has been an important concern of various researchers in recent decades. Case deletion, which consists of studying the impact on the parameter estimates after dropping individual observations, is probably the most employed technique to detect influential observations; see, for example, [3]. A global influence measure considered by [31] is a generalization of the Cook distance defined as a standardized norm of $\hat{\theta}_{(i)} - \hat{\theta}$, expressed as

$$GD_i(\theta) = (\hat{\theta}_{(i)} - \hat{\theta})^{\top}\,[-\ddot{L}(\hat{\theta})]\,(\hat{\theta}_{(i)} - \hat{\theta}) \tag{8}$$

where $\hat{\theta}_{(i)}$ is the MLE of $\theta$ obtained without the ith observation and $-\ddot{L}(\hat{\theta})$ is the observed information matrix. Another measure to evaluate the influence is called the likelihood distance and considers the difference between $\hat{\theta}$ and $\hat{\theta}_{(i)}$. Thus, the likelihood distance has the form

$$LD_i(\theta) = 2\{l(\hat{\theta}) - l(\hat{\theta}_{(i)})\} \tag{9}$$

where $l(\hat{\theta})$ is the value of the logarithm of the likelihood function of the full sample and $l(\hat{\theta}_{(i)})$ is the value of the logarithm of the likelihood function of the sample excluding the ith observation.
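The case-deletion idea can be sketched with a model whose MLE has a closed form. The example below uses a normal model as a stand-in (an assumption made for brevity; the paper refits the OLLExGa regression instead) and computes a likelihood distance for each observation as twice the drop in the full-sample log-likelihood when the estimate is replaced by the one obtained without that observation. This is one common reading of the likelihood distance; the data values are hypothetical.

```python
import math


def normal_mle(y):
    """Closed-form MLE (mean and standard deviation) of a normal sample."""
    n = len(y)
    mu = sum(y) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in y) / n)
    return mu, sigma


def normal_loglik(y, mu, sigma):
    """Normal log-likelihood of the sample y at (mu, sigma)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma**2) - (v - mu) ** 2 / (2.0 * sigma**2)
        for v in y
    )


def likelihood_distances(y):
    """LD_i = 2 * {l(theta_hat) - l(theta_hat without case i)} for each case i."""
    mu_hat, sigma_hat = normal_mle(y)
    full = normal_loglik(y, mu_hat, sigma_hat)
    distances = []
    for i in range(len(y)):
        reduced = y[:i] + y[i + 1:]          # drop the ith observation
        mu_i, sigma_i = normal_mle(reduced)  # refit without it
        distances.append(2.0 * (full - normal_loglik(y, mu_i, sigma_i)))
    return distances


# Hypothetical data with one clearly atypical value in the last position.
y = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 15.0]
ld = likelihood_distances(y)
for index, value in enumerate(ld, start=1):
    print(index, round(value, 4))
```

An index plot of these values plays the same role as the influence plots in the applications: cases with distances far above the rest are flagged as possibly influential.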
The analysis of the residuals is an efficient method to check the model adequacy. Recently, [23] presented a discussion and application of the qrs for regression models. Here, we also use the qrs to check the adequacy of the OLLExGa regression. The residuals usually make it possible to check the local fit to each observation and whether the differences between the observed and fitted values occur randomly or are due to a systematic behavior. The qrs [5] for model (6) are defined by

$$qr_i = \Phi^{-1}(\hat{u}_i) \tag{10}$$

where $\hat{u}_i$ is the fitted OLLExGa cdf (3) evaluated at $y_i$ and $\Phi^{-1}(\cdot)$ is the inverse of the standard normal cumulative distribution.
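Quantile residuals only need the fitted cdf evaluated at each observation, followed by the standard normal quantile function. The sketch below illustrates this with an exponential model as a stand-in (an assumption for brevity, using the true parameter instead of a fitted one); for the OLLExGa regression the cdf values would come from the fitted cdf (3). When the model is correct, the residuals should behave like a standard normal sample.

```python
import math
import random
from statistics import NormalDist


def quantile_residuals(fitted_cdf_values):
    """Map fitted cdf values in (0, 1) to the standard normal scale."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(u) for u in fitted_cdf_values]


random.seed(11)
nu = 0.4  # illustrative exponential mean (stand-in model)
y = [random.expovariate(1.0 / nu) for _ in range(2000)]

# Exponential cdf 1 - exp(-y / nu), computed with expm1 for accuracy near zero.
u_hat = [-math.expm1(-v / nu) for v in y]
residuals = quantile_residuals(u_hat)

res_mean = sum(residuals) / len(residuals)
res_sd = math.sqrt(sum((r - res_mean) ** 2 for r in residuals) / (len(residuals) - 1))
print(f"mean = {res_mean:.3f}, sd = {res_sd:.3f}")
```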
The construction of simulated confidence bands to provide a better interpretation of the probability normal plot of the residuals was pioneered by [1]. The majority of points will be randomly distributed within these bands when the model is well-suited to the data.
Simulation study of the quantile residuals
The behavior of the empirical distribution of the qrs for the OLLExGa regression is investigated by generating 1,000 samples via the algorithm introduced in Section 3.1. We construct the normal probability plot to check the deviation from the normality hypothesis for the residuals. The plots in Figures 4 and 5, representing the first and second scenarios, respectively, indicate that the empirical distribution of these residuals agrees with the standard normal distribution. Also, this empirical distribution becomes closer to the standard normal distribution when n increases.
Figure 4.
Normal probability plots for the qrs in the OLLExGa regression for scenario 1 ($\tau = 0.5$): (a) n = 100. (b) n = 300. (c) n = 500.
Figure 5.
Normal probability plots for the qrs in the OLLExGa regression for scenario 2 ($\tau = 1.3$): (a) n = 100. (b) n = 300. (c) n = 500.
5. Applications
In this section, we present three real applications in the field of agriculture, where we show that the OLLExGa regression can be quite useful in this area. The calculations are performed with the R software.
5.1. Application 1: tomato seeds data
The data refer to the germination speed index of seeds of the Ozone tomato cultivar. The production of tomatoes for fresh consumption generally involves the germination of seeds in trays with subsequent transplanting of the seedlings. One of the main problems noted in this production system is the rapid vegetative growth of the aerial part (etiolation). This imbalance causes the formation of elongated and fragile seedlings with thin hypocotyls and few roots, making them more susceptible to biotic and abiotic stresses, with consequent death of these seedlings [26]. Some growth inhibitors, such as paclobutrazol (PBZ), are used to reduce this problem. PBZ is a growth regulator that belongs to the triazole group and acts by reducing biosynthesis of gibberellins (GAs). It therefore reduces the growth of the stem without impairing cell differentiation and without causing phytotoxicity [17]. The gibberellins are hormones responsible for regulating the height of plants, by promoting alteration of the juvenility and sexuality of the flowers and the establishment and growth of the fruits, besides affecting the activation of hydrolytic enzymes responsible for seed germination [29]. PBZ can be applied by foliar spraying, by soil applications or by seed treatment. The application by seed treatment is one of the safest options because it avoids the problem of residues in the fruits and environmental contamination [15]. However, since PBZ acts by reducing the synthesis of GAs, it can have deleterious effects on seed germination. Calculation of the germination speed index (GSI) proposed by [16] can be used as a test of the relative vigor of seeds in controlled laboratory germination experiments. There is a direct relationship between the germination speed and vigor of seeds [19]. Therefore, a reduction of the GSI serves as an indicator of a negative influence on seed germination.
Method for testing the seed germination index of tomatoes
The study was carried out in the Central Seed Laboratory of Lavras Federal University (UFLA), in the municipality of Lavras, Minas Gerais, Brazil. Seeds of the Ozone tomato cultivar were treated with four PBZ doses 0, 0.004, 0.008 and 0.016 mL/10 g of seeds, with the dose of 0.008 recommended by the manufacturer. So that all treatments receive the same volume of solution (0.214 mL), complementary water was applied. The volume of the solution was distributed as uniformly as possible in Petri dishes with a diameter of 25 cm, after which the seeds were added and the dishes were covered with lids and manually shaken for approximately 4 min. After the treatments, a portion of the seeds from each dish was submitted to analysis (period 1) and the other portion was placed in a paper bag and stored in a refrigerator (, 50% RH) for 5 months (period 2). The GSI was calculated daily by counting the number of germinated seeds (with radicle emergence) in the germination tests, using acrylic gerboxes containing blotter paper as substrate, moistened to 2.5 times the dry weight. The gerboxes were kept in BOD chambers at temperature of –C for 14 days, with 12:12 h photoperiod. The GSI values were estimated using the formula proposed by [16], namely
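$$GSI = \frac{G_1}{N_1} + \frac{G_2}{N_2} + \cdots + \frac{G_n}{N_n} \tag{11}$$
where
$G_1, G_2, \ldots, G_n$ denote the number of normal seedlings tallied on the first, second, …, last count, respectively;
$N_1, N_2, \ldots, N_n$ denote the number of days since sowing on the first, second, …, last count, respectively.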
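As a small worked example of this index, the sketch below sums the ratio of seedlings tallied at each count to the number of days since sowing at that count. The counts and days are hypothetical values chosen only for illustration, and the function name is not from the source.

```python
def germination_speed_index(counts, days):
    """Sum of (seedlings tallied at a count) / (days since sowing at that count)."""
    if len(counts) != len(days):
        raise ValueError("counts and days must have the same length")
    if any(d <= 0 for d in days):
        raise ValueError("days since sowing must be positive")
    return sum(g / d for g, d in zip(counts, days))


# Hypothetical tallies: 5 seedlings on day 3, 10 on day 4 and 3 on day 5.
counts = [5, 10, 3]
days = [3, 4, 5]
gsi = germination_speed_index(counts, days)
print(f"GSI = {gsi:.4f}")
```

Seedlings that appear earlier are divided by a smaller number of days, so faster germination gives a larger index.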
Thus, we adopt a regression based on the OLLExGa distribution to model these data considering two systematic components for μ and σ. The objective is to verify if any of the doses in different periods is relevant in determining the GSI. The variables under study are:
Response: germination speed index (GSI);
Dose: doses of PBZ (0, 0.004, 0.008 and 0.016 mL per 10 g of seeds), represented by three dummy variables;
Period: two periods (period 1 and period 2).
Table 3 provides the descriptive analysis of the response variable, where the mean and median are 11.0020 and 8.5650, respectively. The distribution of the data is asymmetric, so the OLLExGa distribution is an alternative for the analysis of these data.
Table 3. Descriptive statistics for the GSI response variable.
Mean | Median | SD | Skewness | Kurtosis | Min. | Max. |
---|---|---|---|---|---|---|
11.0020 | 8.5650 | 5.1852 | 0.5635 | −0.2438 | 3.9100 | 27.2500 |
In Table 4, we give the MLEs, their standard errors (SEs) (in parentheses) and some goodness-of-fit measures: Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC) and Global Deviance (GD) for three fitted distributions. The figures in this table indicate that the OLLExGa distribution has the lowest values of these statistics. In fact, it is the best model to fit the current data.
Table 4. MLEs, SEs and AIC, BIC and GD statistics for some models fitted to the tomato seeds data.
Model | μ (SE) | log(σ) (SE) | log(ν) (SE) | τ (SE) | AIC | BIC | GD |
---|---|---|---|---|---|---|---|
OLLExGa | 10.7499 (0.0057) | 0.2458 (0.1718) | −0.8386 (0.0002) | 0.1690 (0.0373) | 551.0679 | 561.3253 | 543.0679 |
ExGa | 4.5912 (0.2914) | −0.5605 (0.4579) | 1.8585 (0.1114) | 1 (–) | 569.9559 | 577.6489 | 563.9559 |
normal | 11.0022 (0.5265) | 1.6405 (0.0722) | – | – | 591.4290 | 596.5577 | 587.4290 |
The OLLExGa distribution includes as a special case the ExGa model, and then we can compare them using the LR statistic. This statistic for testing the hypotheses $H_0: \tau = 1$ versus $H_1: \tau \neq 1$, i.e. to compare the OLLExGa and ExGa distributions, is w = 20.8879 (p-value < 0.001). This result indicates that the OLLExGa distribution provides a better fit to these data than the ExGa distribution.
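The LR statistic for two nested models equals the difference between their global deviances, and with one restricted parameter its reference distribution is chi-square with one degree of freedom. The sketch below reproduces the statistic from the GD values in Table 4 and computes the p-value with the identity P(χ²₁ > w) = erfc(√(w/2)). The function name is illustrative.

```python
import math


def lr_test_one_df(gd_restricted: float, gd_full: float):
    """LR statistic from global deviances and its chi-square(1) p-value."""
    w = gd_restricted - gd_full
    if w < 0:
        raise ValueError("the restricted model cannot have the smaller deviance")
    # For one degree of freedom, P(chi2 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Global deviances from Table 4: ExGa (restricted) and OLLExGa (full).
w, p_value = lr_test_one_df(563.9559, 543.0679)
print(f"w = {w:.4f}, p-value = {p_value:.2e}")
```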
More information is addressed by a visual comparison of the histogram of the data and the estimated densities and cumulative functions. The plots of the fitted OLLExGa, ExGa and normal densities are displayed in Figure 6. The estimated OLLExGa density gives the closest fit to this histogram.
Figure 6.
Estimated densities of the OLLExGa, ExGa and normal models for tomato seeds data.
In addition, we note that the tomato seeds data have a bimodal shape in Figure 6, whereas the ExGa and normal distributions cannot cope with this shape. So, this plot indicates that the OLLExGa regression is a possible model to explain the current data. Based on this marginal and descriptive analysis, we consider the OLLExGa regression with two systematic components:
and
where , and refer to the dose–period interaction.
Table 5 gives the MLEs, their approximate SEs and p-values obtained from the fitted OLLExGa regression. We can note from Table 5 that the dummy variable for the dose 0.008 is significant at the 5% level, which indicates that there is a significant difference between the doses 0.000 and 0.008. In the systematic component for the dispersion parameter σ, the period factor is significant, and then there is a significant difference between periods 1 and 2. Other interpretations are reported at the end of this application.
Table 5. MLEs, SEs and p-values for the OLLExGa regression fitted to the tomato seeds data.
Component | Source of variation | Estimate | SE | p-value |
---|---|---|---|---|
μ | Intercept | 6.3362 | 2.2801 | 0.0068 |
μ | Dose 0.004 | −0.8545 | 0.4373 | 0.0542 |
μ | Dose 0.008 | −1.2836 | 0.3994 | 0.0019 |
μ | Dose 0.016 | 0.4649 | 0.3612 | 0.2018 |
μ | Period 2 | 9.7536 | 2.3479 | <0.001 |
μ | Dose 0.004 × Period 2 | −0.3778 | 2.3813 | 0.8744 |
μ | Dose 0.008 × Period 2 | −0.1861 | 0.9406 | 0.8437 |
μ | Dose 0.016 × Period 2 | −1.2384 | 0.9121 | 0.1785 |
σ | Intercept | 1.8998 | 0.9169 | 0.0416 |
σ | Dose 0.004 | 0.0011 | 0.0106 | 0.9160 |
σ | Dose 0.008 | −0.1719 | 0.2842 | 0.5470 |
σ | Dose 0.016 | −0.4708 | 0.2902 | 0.1088 |
σ | Period 2 | 0.7969 | 0.3192 | 0.0147 |
σ | Dose 0.004 × Period 2 | −0.7715 | 0.3595 | 0.0350 |
σ | Dose 0.008 × Period 2 | 0.1658 | 0.4678 | 0.7239 |
σ | Dose 0.016 × Period 2 | 0.4448 | 0.4728 | 0.3497 |
log(ν) | – | −0.8481 | 5.3061 | – |
τ | – | 6.9780 | 6.2670 | – |
Table 6 lists the AIC, BIC and GD statistics for some fitted regressions. The results indicate that the OLLExGa regression has the smallest values of these statistics among all of them. So, it could be chosen as the most suitable regression for these data. The LR statistic for testing the hypotheses $H_0: \tau = 1$ versus $H_1: \tau \neq 1$, i.e. to compare the OLLExGa and ExGa regressions, is w = 9.8310 (p-value ≈ 0.0017), which supports the OLLExGa regression.
Table 6. AIC, BIC and GD statistics for some fitted regressions to the tomato seeds data.
Model | AIC | BIC | GD |
---|---|---|---|
OLLExGa | 370.4092 | 416.5674 | 334.4092 |
normal | 376.1662 | 417.1958 | 344.1662 |
ExGa | 378.2402 | 421.8341 | 344.2402 |
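The AIC and BIC values in this table follow from the global deviance (GD) through AIC = GD + 2k and BIC = GD + k log(n), where k is the number of estimated parameters and n is the sample size. In the sketch below, the parameter counts (18, 17 and 16) and the sample size n = 96 are assumptions inferred from the differences AIC − GD and BIC − GD in the table; they are not stated explicitly in the text.

```python
import math


def aic(global_deviance: float, k: int) -> float:
    """Akaike Information Criterion from the global deviance (-2 log-likelihood)."""
    return global_deviance + 2 * k


def bic(global_deviance: float, k: int, n: int) -> float:
    """Bayesian Information Criterion from the global deviance."""
    return global_deviance + k * math.log(n)


n_obs = 96  # assumption: inferred from BIC - GD, not stated in the text

# (GD, assumed number of parameters) for each fitted regression in Table 6.
models = {
    "OLLExGa": (334.4092, 18),
    "normal": (344.1662, 16),
    "ExGa": (344.2402, 17),
}

for name, (gd, k) in models.items():
    print(f"{name}: AIC = {aic(gd, k):.4f}, BIC = {bic(gd, k, n_obs):.4f}")
```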
We use the software R to compute the case deletion measures (generalized Cook's distance and likelihood distance) defined in Section 4. The results of such influence measure index plots are displayed in Figure 7. These plots show that the cases , , and are possible influential observations.
Figure 7.
Index plots of the case deletion measures: (a) likelihood distance and (b) generalized Cook's distance.
We perform the residual analysis by plotting in Figure 8(a) the qrs (see Section 4) against the index of the observations. Figure 8(b) gives the normal probability plot with generated envelope. Figure 8(a) shows some large residuals (observations and ), although Figure 8(b) supports the hypothesis that the OLLExGa regression is suitable for these data.
Figure 8.
(a) Index plot of the qrs. (b) Normal probability plot with envelope for the qrs. (c) Estimated cdf from the OLLExGa regression for the tomato seeds and the empirical cdf for levels of the period variable.
After removing some non-significant explanatory variables, the final model has the form
and
Some interpretations for the final regression
The numbers in Table 7 indicate that the dose covariate is significant at 5%, meaning a significant difference between the doses 0.000 vs. 0.004 and 0.000 vs. 0.008 in relation to the GSI. The dose of 0.008 (mL i.e. ) is the one recommended by the manufacturer of the product (Syngenta) for tomatoes. Besides this, it has not yet been demonstrated that this dose is sufficient for control of etiolation, because even though this dose is recommended, it is very low ( commercial product (Pc) ) in relation to the volume of seeds. Tomato seeds are very small (1000 seeds weigh about 3 to 3.5 g), making it hard to perform uniform seed treatment. For this reason, we tested twice this dose and, by interpolation, an intermediate dose (0.012 mL i.a. ). But as observed in these results, the product reduces the physiological quality of the seeds even when applied in the recommended dose, in agreement with the results found in the conventional statistical analyses [25].
We also observed that the period covariate is significant (5%), indicating the existence of a significant difference between periods 1 and 2 in relation to the GSI. The GSI at all doses was higher in period 2 (Figure 8(c)), and this may be due to a lower reactivity of the product in period 2 associated with the biological factor (seed deterioration). The residual period (time when the active ingredient of the chemical or biological product continues to be effective in the environment where it is used) of the PBZ is 180 days. Since the seeds were treated and stored for 150 days (period 2), the potential harmful effect of the product on seed germination was reduced. In addition, a higher GSI in period 2 compared to period 1 may be due to a longer delay in structuring the seed membrane systems at the moment of germination, allowing a faster imbibition (absorption of a liquid by a solid), with consequent faster germination speed. In this case, the larger GSI does not necessarily mean better quality. Confirming this would require associating this result with others obtained by the same method in other tests (seed germination, seedling emergence, among others).
Note that the dose–period interaction is significant for the dose of 0.004 mL in period 2, meaning that the effect of this dose differs between the periods. This result is contrary to expectation, since this dose is half that recommended by the manufacturer. Therefore, our expectation was that there would be no damaging effect on the seed vigor even in period 1, when the product was fully active. In contrast, in period 2 we observed lower activity of the product.
Table 7. MLEs, SEs and p-values for the final OLLExGa regression fitted to the tomato seeds data.
Component | Source of variation | Estimate | SE | p-value |
---|---|---|---|---|
μ | Intercept | 5.9078 | 0.8288 | <0.001 |
μ | Dose 0.004 | −0.8002 | 0.3649 | 0.0311 |
μ | Dose 0.008 | −1.3167 | 0.3610 | 0.0005 |
μ | Dose 0.016 | 0.4241 | 0.3434 | 0.2204 |
μ | Period 2 | 9.2285 | 0.3235 | <0.001 |
σ | Intercept | 2.0641 | 1.4065 | 0.1461 |
σ | Period 2 | 0.8026 | 0.3664 | 0.0314 |
σ | Dose 0.004 × Period 1 | 0.0088 | 0.3319 | 0.9789 |
σ | Dose 0.004 × Period 2 | −0.7739 | 0.3632 | 0.0361 |
σ | Dose 0.008 × Period 1 | −0.1735 | 0.3386 | 0.6098 |
σ | Dose 0.008 × Period 2 | −0.0075 | 0.3781 | 0.9841 |
σ | Dose 0.016 × Period 1 | −0.4781 | 0.3408 | 0.1645 |
σ | Dose 0.016 × Period 2 | 0.0149 | 0.3879 | 0.9695 |
log(ν) | – | −0.1194 | 0.8552 | – |
τ | – | 8.3140 | 11.3920 | – |
Finally, a graphical comparison between the levels of the period variable is illustrated in Figure 8(c). These plots refer to the empirical cdf and the estimated cdf of the OLLExGa regression. They indicate that the OLLExGa regression provides a good fit. Also, we can note a clear difference between the periods in relation to the germination speed index of seeds of the Ozone tomato cultivar.
5.2. Application 2: weight of rat pups data
For the second application, we consider only the regression structure for the parameter μ. The data refer to the birth weights of rats, see [20]. The sample size is n = 322 and we consider the following variables:
Response: weight of newborn rats;
Sex: female or male;
Treatment: control (0), high (1) and low (2), represented by two dummy variables.
A brief descriptive analysis of the data in Table 8 reveals that the mean weight is 6.081 g and the median is 6.055 g, indicating that the data have an approximately symmetric shape. For this reason, we can compare the OLLExGa, ExGa and normal regression models. The OLLExGa regression is flexible enough to be used not only for bimodal asymmetric data, but also for symmetric data.
Table 8. Descriptive statistics for the weights of rat pups data.
Mean | Median | SD | Skewness | Kurtosis | Min. | Max. |
---|---|---|---|---|---|---|
6.081 | 6.055 | 0.6474 | 0.4970 | 1.1251 | 3.680 | 8.330 |
The OLLExGa regression with only a systematic component for μ relates μ to an intercept, sex and the two treatment dummy variables.
Table 9 provides the MLEs, their approximate SEs and p-values obtained from the fitted OLLExGa regression. We conclude that the two explanatory variables are significant at the 5% significance level. Thus, we can confirm (under this risk) that there is a significant difference between females and males in relation to the weights of rats. Similarly, at the same level, there is a significant difference between treatments [control (0) vs. high (1)] and [control (0) vs. low (2)].
Table 9. MLEs, SEs and p-values for the fitted OLLExGa regression to the weights of rat pups data.
| Parameter | Estimate | SE | p-value |
|---|---|---|---|
| | 5.9099 | 0.0611 | <0.001 |
| | 0.2023 | 0.0653 | 0.0021 |
| | −0.4608 | 0.0789 | <0.001 |
| | −0.3852 | 0.0785 | <0.001 |
| | 0.9236 | 0.0431 | |
| | −1.2171 | 0.1093 | |
| τ | 4.8450 | 0.2096 | |
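The p-values in Table 9 can be read as tests of each coefficient against zero. As a minimal sketch, assuming a Wald z-test with a standard normal reference distribution (the function name `wald_z_test` is hypothetical, and the fitting software may use a t reference instead, so small differences from the tabulated p-values are to be expected), the calculation is:

```python
import math

def wald_z_test(estimate, se):
    """Return (z, two-sided p-value) under a standard normal reference (assumption)."""
    z = estimate / se
    # Two-sided tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Second row of Table 9: estimate 0.2023 with SE 0.0653.
z, p = wald_z_test(0.2023, 0.0653)
print(f"z = {z:.3f}, p-value = {p:.4f}")
```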
Table 10 gives the AIC, BIC and GD statistics for the fitted regressions. The OLLExGa regression has the smallest values of these statistics among all fitted regressions, so it could be chosen as the most suitable model for these data. The LR statistic for testing H0: τ = 1 versus H1: τ ≠ 1, i.e. to compare the OLLExGa and ExGa regressions, is w = 9.6831. The corresponding p-value indicates that the OLLExGa regression yields the better fit to the weights of rat pups data.
Table 10. Goodness-of-fit measures for the weights of rat pups data.
Model | AIC | BIC | GD |
---|---|---|---|
OLLExGa | 585.4798 | 611.9016 | 571.4798 |
ExGa | 593.1629 | 615.8103 | 581.1629 |
normal | 598.2369 | 617.1097 | 588.2369 |
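The three criteria in Table 10 are linked through the global deviance. A minimal sketch, assuming the usual definitions GD = −2 × maximized log-likelihood, AIC = GD + 2k and BIC = GD + k log(n); the parameter counts k are not stated in the text and are inferred here from AIC − GD = 2k, and the function names are hypothetical:

```python
import math

def aic_from_gd(gd, k):
    """AIC from the global deviance and the number of parameters k."""
    return gd + 2 * k

def bic_from_gd(gd, k, n):
    """BIC from the global deviance, the number of parameters k and the sample size n."""
    return gd + k * math.log(n)

n = 322  # sample size of the rat pups data
# (GD from Table 10, k inferred from AIC - GD = 2k; the k values are an assumption)
models = {"OLLExGa": (571.4798, 7), "ExGa": (581.1629, 6), "normal": (588.2369, 5)}

for name, (gd, k) in models.items():
    print(f"{name}: AIC = {aic_from_gd(gd, k):.4f}, BIC = {bic_from_gd(gd, k, n):.4f}")
```

The recomputed values can be compared with the AIC and BIC columns of Table 10.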
We use the R software to compute the likelihood distance and the generalized Cook's distance in the diagnostic analysis discussed in Section 4. The index plots of these influence measures are displayed in Figure 9. The plots reveal a few cases that are possibly influential observations.
Figure 9.
Index plots of the influence measures: (a) likelihood distance and (b) generalized Cook's distance.
In addition, Figure 10(a) gives the index plot of the qrs for the fitted model. Only one observation falls outside the reference range. Hence, there is no evidence against the model assumptions. We present the normal plot for the qrs with a generated envelope in Figure 10(b) to detect possible departures from these assumptions and outliers. This plot shows that the fitted OLLExGa regression provides a good fit to the current data, since only two points are outside the envelope.
Figure 10.
(a) Index plot of the qrs. (b) Normal probability plot with envelope for the qrs. (c) Estimated cdf from the fitted OLLExGa regression to the weights of rat pups data and the empirical cdf.
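The quantile residuals used in these plots follow the construction of [5]: each response is mapped through the fitted cdf and then through the standard normal quantile function. A minimal sketch, in which a normal working model stands in for the fitted OLLExGa cdf and the cut-off of 3 in absolute value is an assumption made for illustration (all names and the synthetic data are hypothetical):

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the final quantile step

def quantile_residuals(y, fitted_cdf):
    """Map each response through the fitted cdf, then through the normal quantile function."""
    eps = 1e-12  # keep probabilities strictly inside (0, 1) so inv_cdf is defined
    residuals = []
    for value in y:
        u = min(max(fitted_cdf(value), eps), 1.0 - eps)
        residuals.append(STD_NORMAL.inv_cdf(u))
    return residuals

# Synthetic illustration only: the data and the normal working model are assumptions.
random.seed(1)
mu, sigma = 6.0, 0.65
y = [random.gauss(mu, sigma) for _ in range(200)]
working_model = NormalDist(mu, sigma)

qrs = quantile_residuals(y, working_model.cdf)
flagged = [i for i, r in enumerate(qrs) if abs(r) > 3.0]  # assumed cut-off
print(len(qrs), "residuals;", len(flagged), "outside [-3, 3]")
```

When the working model is correct, the residuals behave like a standard normal sample, which is what the index plot and the envelope plot are checking.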
In order to assess whether the model fits the data appropriately, the empirical cdf and estimated cdf of the OLLExGa regression are plotted in Figure 10(c) for the sex explanatory variable. We can note a significant difference between female and male individuals in relation to the weights of the rats.
5.3. Application 3: degrees brix of yacon data
The data for the third application (yacon data) were taken from the agricolae package [4] available in R, which gives as source: CIP, Experimental field, 2003. The data were kindly provided by Ivan Manrique and Carolina Tasso. The third application considers the OLLExGa regression with two systematic components, for μ and σ. The data (n = 432) refer to a plant native to the Peruvian Andes called yacon (Smallanthus sonchifolius), which is common in the country. We use the covariable location (Cajamarca, Lima, Oxapampa) to verify how much it explains the response variable degrees brix (a numerical scale that measures the density or sugar concentration of solutions). The data belong to the International Potato Center in Lima (Peru).
The variables for the regression analysis are:
response variable: degrees brix of yacon;
location (0 = Cajamarca, 1 = Lima, 2 = Oxapampa), represented by two dummy variables.
In the first part, we perform a univariate analysis considering only the response variable. Table 11 gives the descriptive analysis of the response variable, where the mean and median are 9.431 and 10.450, respectively. Then, the data have an asymmetric shape.
Table 11. Descriptive statistics for degrees brix of yacon.
Mean | Median | SD | Skewness | Kurtosis | Min. | Max. |
---|---|---|---|---|---|---|
9.4310 | 10.4500 | 3.6674 | −0.0492 | −1.5410 | 2.9000 | 16.1000 |
Table 12 provides the AIC, BIC and GD measures for the three fitted models. The OLLExGa model has the lowest BIC and GD values, so it could be chosen as the best model for these data.
Table 12. MLEs (SEs in parentheses) and AIC, BIC and GD values for the fitted models to degrees brix data.
| Model | μ | | | τ | AIC | BIC | GD |
|---|---|---|---|---|---|---|---|
| OLLExGa | 9.5262 (0.0448) | −0.1805 (0.0079) | −3.892 (4811.252) | 0.1194 (0.0065) | 2149.156 | 2165.652 | 2141.379 |
| ExGa | 9.1299 (0.9904) | 1.2943 (0.0400) | −1.212 (3.276) | 1 (-) | 2353.742 | 2365.950 | 2347.744 |
| normal | 9.4313 (0.1762) | 1.2983 (0.0340) | (-) | (-) | 2351.722 | 2359.859 | 2347.722 |
The LR statistic for testing the hypotheses H0: τ = 1 versus H1: τ ≠ 1, i.e. to compare the OLLExGa and ExGa regressions, clearly indicates that the proposed distribution outperforms the ExGa distribution. We display in Figure 11(a) the histogram of the data and the plots of the fitted OLLExGa, ExGa and normal densities. We give in Figure 11(b) the plots of the empirical cdf and the fitted OLLExGa, ExGa and normal cumulative distributions. The plots confirm that the OLLExGa distribution provides a better fit to these data. The plot in Figure 11(a) reveals that the degrees brix histogram is bimodal, a shape that the ExGa and normal distributions cannot take.
Figure 11.
(a) Estimated densities of the OLLExGa, ExGa and normal models for degrees brix data. (b) Estimated cumulative functions of the OLLExGa, ExGa and normal models and the empirical cdf for degrees brix data.
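The bimodal shape comes from the odd log-logistic transformation of the baseline cdf. A minimal sketch, assuming the standard odd log-logistic form F(y) = G(y)^τ / {G(y)^τ + [1 − G(y)]^τ} and using a standard normal baseline as a stand-in for the ExGa cdf (function names and the grid are hypothetical); it counts the local maxima of the resulting density on a grid:

```python
from statistics import NormalDist

def oll_pdf(y, tau, base):
    """Density of the odd log-logistic transform of a baseline distribution."""
    G = base.cdf(y)
    g = base.pdf(y)
    numerator = tau * g * (G * (1.0 - G)) ** (tau - 1.0)
    denominator = (G ** tau + (1.0 - G) ** tau) ** 2
    return numerator / denominator

def count_modes(values):
    """Count interior local maxima of a sequence of density values."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] >= values[i + 1]
    )

baseline = NormalDist(0.0, 1.0)  # stand-in baseline (assumption), not the ExGa cdf
grid = [i / 100 for i in range(-500, 501)]

for tau in (1.0, 0.2):
    density = [oll_pdf(y, tau, baseline) for y in grid]
    print(f"tau = {tau}: {count_modes(density)} mode(s)")
```

With τ = 1 the transformation returns the baseline density, while for a small τ the density in this normal-baseline example develops two modes.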
Regression analysis with two systematic components
We consider the OLLExGa regression with two systematic components, in which both μ and σ depend on the two location dummy variables.
Table 13 gives the MLEs, SEs and their p-values for this fitted regression. The figures in this table indicate that all covariables are significant in the two systematic components (all p-values are below 0.05). Thus, there is a significant difference between the localities [Cajamarca (0) vs Lima (1)] and [Cajamarca (0) vs Oxapampa (2)] in relation to the degrees brix of yacon.
Table 13. MLEs, SEs and p-values for the OLLExGa regression fitted to the degrees brix of yacon data.
| Parameter | Estimate | SE | p-value |
|---|---|---|---|
| | 9.7882 | 0.2093 | <0.001 |
| | −3.8355 | 0.1996 | 0.0021 |
| | 2.7567 | 0.2251 | <0.001 |
| | 0.3485 | 0.1530 | 0.0233 |
| | −0.8574 | 0.0639 | 0.0021 |
| | −1.2253 | 0.0623 | <0.001 |
| | −3.5055 | 0.1513 | |
| τ | 0.2890 | 0.0659 | |
Further, the figures in Table 14 indicate that the OLLExGa regression has the lowest AIC, BIC and GD values among the fitted regressions, so it could be chosen as the best model. We compare the OLLExGa and ExGa regressions using the LR statistic. This statistic for testing the hypotheses H0: τ = 1 versus H1: τ ≠ 1 is w = 26.7174 (p-value < 0.001), which favors the OLLExGa regression (see Table 15).
Table 14. Goodness-of-fit measures for degrees brix of yacon data.
Model | AIC | BIC | GD |
---|---|---|---|
OLLExGa | 1714.103 | 1746.651 | 1698.103 |
normal | 1736.773 | 1761.184 | 1724.773 |
ExGa | 1738.821 | 1767.300 | 1724.821 |
We use the R software to compute the likelihood distance and generalized Cook's distance measures in the diagnostic analysis presented in Section 4. The index plots of these influence measures are displayed in Figure 12. These plots flag several cases as possibly influential observations.
Table 15. LR tests for degrees brix of yacon data.
Models | Hypotheses | Statistic w | p-value |
---|---|---|---|
OLLExGa vs ExGa | H0: τ = 1 vs H1: τ ≠ 1 | 26.7174 | <0.001 |
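The LR statistic in Table 15 can be recovered from the global deviances in Table 14. A minimal sketch, assuming GD = −2 × maximized log-likelihood and a chi-square reference distribution with one degree of freedom for the single restriction τ = 1 (the helper name `lr_test_one_df` is hypothetical):

```python
import math

def lr_test_one_df(gd_null, gd_alt):
    """LR statistic and chi-square(1) p-value from two global deviances."""
    w = gd_null - gd_alt
    # For one degree of freedom, P(chi2_1 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(max(w, 0.0) / 2.0))
    return w, p_value

# GD values from Table 14: ExGa (null model) and OLLExGa (alternative model).
w, p = lr_test_one_df(gd_null=1724.821, gd_alt=1698.103)
print(f"w = {w:.3f}, p-value = {p:.2e}")
```

The small difference from the tabulated statistic is due to the rounding of the GD values.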
Figure 12.
Index plots of the influence measures: (a) likelihood distance and (b) generalized Cook's distance.
In addition, Figure 13(a) gives plots of the qrs for the fitted regression, which indicate a random behavior of these residuals, with only two cases outside the reference range. The normal plot for the qrs with a generated envelope is displayed in Figure 13(b). These plots confirm that the model assumptions hold and that the fitted regression explains the degrees brix of the yacon plant for the localities of Peru considered.
Figure 13.
(a) Index plot of the qrs. (b) Normal probability plot with envelope for the qrs. (c) Estimated cdf from the fitted OLLExGa regression model and empirical cdf (for the localities) for the degrees brix of yacon data.
A graphical comparison among the localities (Cajamarca, Lima, Oxapampa) is given in Figure 13(c). These plots provide the empirical cdf and the estimated cdf of the OLLExGa regression model. It is clear from these plots that the OLLExGa regression presents a good fit. We can also note that there is a significant difference among localities in relation to the degrees brix of yacon. In summary, the OLLExGa regression outperforms the ExGa and normal regressions irrespective of the criterion, and it can be effectively adopted to fit these data.
6. Concluding remarks
We propose a new four-parameter odd log-logistic exponential Gaussian (OLLExGa) distribution, which allows modeling of bimodal data without requiring mixtures of distributions. We also define a more realistic regression based on the new distribution for modeling real agricultural data. It is an important extension of some well-known regression models. Some Monte Carlo simulation studies investigate the accuracy of the normal approximation for the maximum-likelihood estimators of the model parameters. Diagnostic measures and quantile residuals are investigated to verify the sensitivity of these estimators. We show empirically the usefulness of the proposed models by means of three real data sets from agricultural experiments.
Appendix.
In this Appendix, we derive useful expansions for the cdf in (3) and the pdf in (4) to find structural properties of the proposed distribution. First, we consider the following power series, which holds for any real τ
(A1) |
where
For any real τ, we use the generalized binomial expansion
(A2) |
Inserting (A1) and (A2) in Equation (3), we have
where (for ). The ratio of these two power series is expressed as
(A3) |
where the coefficients 's (for ) are determined from the recurrence equation
In practical terms, we need only six terms in (A3) to achieve good approximations for the cdf. By differentiating (A3), we obtain the density of Y
(A4) |
where is the exponentiated exponential Gaussian (EEGa) density function with power parameter k + 1 (for ).
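For orientation, the steps leading to (A3) and (A4) can be sketched in the form commonly used for odd log-logistic families. The symbols below ($G$ and $g$ for the baseline ExGa cdf and pdf, and the coefficients $a_k$, $b_k$, $c_k$) are assumptions introduced for this sketch and may differ from the notation of (A1) to (A4):

```latex
\begin{align*}
F(y) &= \frac{G(y)^{\tau}}{G(y)^{\tau} + [1 - G(y)]^{\tau}}, \\
G(y)^{\tau} &= \sum_{k=0}^{\infty} a_k\, G(y)^{k},
  \qquad a_k = \sum_{j=k}^{\infty} (-1)^{k+j} \binom{\tau}{j} \binom{j}{k}, \\
[1 - G(y)]^{\tau} &= \sum_{k=0}^{\infty} (-1)^{k} \binom{\tau}{k}\, G(y)^{k}, \\
F(y) &= \frac{\sum_{k=0}^{\infty} a_k\, G(y)^{k}}{\sum_{k=0}^{\infty} b_k\, G(y)^{k}}
  = \sum_{k=0}^{\infty} c_k\, G(y)^{k},
  \qquad b_k = a_k + (-1)^{k} \binom{\tau}{k}, \\
c_0 &= \frac{a_0}{b_0},
  \qquad c_k = \frac{1}{b_0} \Bigl( a_k - \sum_{r=1}^{k} b_r\, c_{k-r} \Bigr), \quad k \ge 1, \\
f(y) &= \sum_{k=0}^{\infty} c_{k+1}\, h_{k+1}(y),
  \qquad h_{k+1}(y) = (k+1)\, G(y)^{k} g(y).
\end{align*}
```

The recurrence for $c_k$ follows from equating coefficients in $\bigl(\sum_k c_k x^k\bigr)\bigl(\sum_k b_k x^k\bigr) = \sum_k a_k x^k$, and the last line is the term-by-term derivative of the series for $F(y)$.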
Further, we can write the cdf of the ExGa distribution from (1) as
where , , , and is the error function.
By expanding the last integral (say I) in Taylor series using Mathematica at v = 0,
where
, , and , etc. Then, we can rewrite as
where and for . We can take at most four to six terms in this power series to approximate adequately .
A power series raised to a positive integer power k is given by [8], Section 0.314
(A5) |
where the coefficients (for ) are determined recursively from the equation
and . The coefficient can be given explicitly in terms of the coefficients 's, although it is not necessary for programming numerically our expansions in any algebraic or numerical software.
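The recurrence cited from [8] is straightforward to program. A minimal sketch, assuming the standard form of that result, c_{k,0} = a_0^k and c_{k,i} = (i a_0)^{-1} Σ_{m=1}^{i} [m(k + 1) − i] a_m c_{k,i−m} with a_0 ≠ 0 (function names and the example coefficients are hypothetical), checked against direct multiplication:

```python
def power_series_pow(a, k, n_terms):
    """First n_terms coefficients of (sum_i a[i] x^i)**k via the recurrence; needs a[0] != 0."""
    a = list(a) + [0.0] * max(0, n_terms - len(a))  # pad with zeros
    c = [a[0] ** k]
    for i in range(1, n_terms):
        s = sum((m * (k + 1) - i) * a[m] * c[i - m] for m in range(1, i + 1))
        c.append(s / (i * a[0]))
    return c

def truncated_product(p, q, n_terms):
    """First n_terms coefficients of the product of two power series."""
    out = [0.0] * n_terms
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j < n_terms:
                out[i + j] += pi * qj
    return out

coeffs = [1.0, 2.0, 3.0]  # example series 1 + 2x + 3x^2 (hypothetical)
k, n_terms = 3, 6

by_recurrence = power_series_pow(coeffs, k, n_terms)

direct = [1.0]
for _ in range(k):
    direct = truncated_product(direct, coeffs, n_terms)

print(by_recurrence)
print(direct)
```

Both routes give the same truncated coefficients, so the recurrence can replace repeated multiplication when programming the expansions.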
Hence, the density function of Y can be expressed from (A4) and (A5) as
(A6) |
where . In applications, the index k runs at most up to five.
We can obtain some mathematical properties of the OLLExGa distribution using (A6) from those of the ExGa distribution with small values for i such as four. For example, the nth ordinary moment of Y can be expressed as a linear combination of those ordinary moments of W with orders n, n + 1 up to n + 4.
Let be any integrable function on a real line and be the qf of the ExGa distribution. We can write
(A7) |
Hence, based on (A7), several mathematical quantities of the OLLExGa distribution can be computed numerically from linear combinations of integrals over (0, 1) of suitable functions involving only the qf of the ExGa distribution.
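The identity behind (A7) follows from the substitution $u = G(y)$ in each term of the density expansion. A sketch, assuming the density is written as the linear combination $f(y) = \sum_{k} c_{k+1} (k+1) G(y)^{k} g(y)$; the symbols $a(\cdot)$, $Q_G(u)$, $G$, $g$ and $c_{k+1}$ are notational assumptions that may differ from those of (A4) and (A7):

```latex
\begin{align*}
\int_{-\infty}^{\infty} a(y)\, f(y)\, dy
  &= \sum_{k=0}^{\infty} c_{k+1} \int_{-\infty}^{\infty} a(y)\, (k+1)\, G(y)^{k} g(y)\, dy \\
  &= \sum_{k=0}^{\infty} (k+1)\, c_{k+1} \int_{0}^{1} a\bigl(Q_G(u)\bigr)\, u^{k}\, du .
\end{align*}
```

Taking $a(y) = y^{n}$, for instance, expresses the $n$th ordinary moment through integrals of powers of the baseline qf.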
Funding Statement
This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil.
Disclosure statement
No potential conflict of interest was reported by the author(s).
ORCID
Gauss Moutinho Cordeiro http://orcid.org/0000-0002-3052-6551
Edwin Moises Marcos Ortega http://orcid.org/0000-0003-3999-7402
References
- 1. Atkinson A.C., Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis, Clarendon Press, Oxford, 1985.
- 2. Cook R.D., Assessment of local influence, J. R. Stat. Soc. Ser. B Methodol. 48 (1986), pp. 133–155.
- 3. Cook R.D. and Weisberg S., Residuals and Influence in Regression, Chapman and Hall, New York, 1982.
- 4. de Mendiburu F. and de Mendiburu M.F., Package ‘agricolae’. R Package, 2019, pp. 1–2.
- 5. Dunn P. and Smyth G., Randomized quantile residuals, J. Comput. Graph. Stat. 5 (1996), pp. 236–244.
- 6. Gleaton J.U. and Lynch J.D., Properties of generalized log-logistic families of lifetime distributions, J. Probab. Statist. Sci. 4 (2006), pp. 51–64.
- 7. Golubev A., Exponentially modified Gaussian (EMG) relevance to distributions related to cell proliferation and differentiation, J. Theor. Biol. 262 (2010), pp. 257–266. doi: 10.1016/j.jtbi.2009.10.005
- 8. Gradshteyn I.S. and Ryzhik I.M., Table of Integrals, Series, and Products, Academic Press, San Diego, CA, 2000.
- 9. Hamedani G.G., Altun E., Korkmaz M.Ç., Yousof H.M., and Butt N.S., A new extended G family of continuous distributions with mathematical properties, characterizations and regression modeling, Pakistan J. Statist. Oper. Res. 14 (2018), pp. 737–758. doi: 10.18187/pjsor.v14i3.2484
- 10. Hashimoto E.M., Ortega E.M.M., Cordeiro G.M., Cancho V.G., and Klauberg C., Zero-spiked regression models generated by gamma random variables with application in the resin oil production, J. Stat. Comput. Simul. 89 (2018), pp. 52–70. doi: 10.1080/00949655.2018.1534116
- 11. Kalambet Y., Kozmin Y., Mikhailova K., Nagaev I., and Tikhonov P., Reconstruction of chromatographic peaks using the exponentially modified Gaussian function, J. Chemom. 25 (2011), pp. 352–356. doi: 10.1002/cem.1343
- 12. Korkmaz M.Ç., Altun E., Alizadeh M., and Yousof H.M., A new flexible lifetime model with log-location regression modeling, properties and applications, J. Stat. Manag. Syst. 22 (2019), pp. 871–891. doi: 10.1080/09720510.2019.1572980
- 13. Korkmaz M.Ç., Altun E., Yousof H.M., and Hamedani G.G., The odd power Lindley generator of probability distributions: Properties, characterizations and regression modeling, Int. J. Statist. Probab. 8 (2019), pp. 70–89. doi: 10.5539/ijsp.v8n2p70
- 14. Korkmaz M.Ç., Cordeiro G.M., Yousof H.M., Pescim R.R., Afify A.Z., and Nadarajah S., The Weibull Marshall-Olkin family: Regression model and application to censored data, Comm. Stat. Theory Methods 48 (2019), pp. 4171–4194. doi: 10.1080/03610926.2018.1490430
- 15. Magnitskiy S.V., Pasian C.C., Bennett M.A., and Metzger J.D., Effects of soaking cucumber and tomato seeds in paclobutrazol solutions on fruit weight, fruit size, and paclobutrazol level in fruits, HortScience 41 (2006), pp. 1446–1448. doi: 10.21273/HORTSCI.41.6.1446
- 16. Maguire J.D., Speed of germination aid in selection and evaluation for seedling emergence and vigor, Crop Sci. 2 (1962), pp. 176–177. doi: 10.2135/cropsci1962.0011183X000200020033x
- 17. Oliveira H.T.B, Pereira E.C., Mendonça V., Silva R.M.S., Leite G.A., and Dantas L.L.G.R., Produção e qualidade de frutos de mangueira tommy aktins sob doses de paclobutrazol, Agropecuária Científica no Semiárido 10 (2015), pp. 89–92.
- 18. Palmer E.M., Horowitz T.S., Torralba A., and Wolfe J.M., What are the shapes of response time distributions in visual search, J. Exp. Psych. Hum. Percep. Perform. 37 (2011), pp. 58–71. doi: 10.1037/a0020747
- 19. Panobianco M., Vieira R.D., Krzyzanowski F.C., and Neto J.F., Electrical conductivity of soybean seed and correlation with seed coat lignin content, Seed Sci. Technol. 27(3) (1999), pp. 945–949.
- 20. Pinheiro J.C. and Bates D.M., Mixed-Effects Models in S and S-PLUS, Springer, New York, NY, 2006.
- 21. Prataviera F., Ortega E.M.M., Cordeiro G.M., and Braga A.S., The heteroscedastic odd log-logistic generalized gamma regression model for censored data, Comm. Statist. – Simulation Comput. 48 (2018a), pp. 1–25.
- 22. Prataviera F., Ortega E.M.M., Cordeiro G.M., Pescim R.R., and Verssani B.A.W, A new generalized odd log-logistic flexible Weibull regression model with applications in repairable systems, Reliab. Eng. Syst. Safety. 176 (2018b), pp. 13–26. doi: 10.1016/j.ress.2018.03.034
- 23. Prataviera F., Vasconcelos J.C.S., Cordeiro G.M., Hashimoto E.M., and Ortega E.M.M., The exponentiated power exponential regression model with different regression structures: application in nursing data, J. Appl. Statist. 46 (2019), pp. 1792–1821. doi: 10.1080/02664763.2019.1572719
- 25. Rezende É.M.D., Oliveira J.A., Carvalho E.R., Clemente A.D.C.S., and Oliveira G.E., Physiological quality of tomato seeds treated with polymers in combination with paclobutrazol, J. Seed Sci. 39 (2017), pp. 338–343. doi: 10.1590/2317-1545v39n4164432
- 26. Seleguini A., Faria Júnior M.J.A., Bennet K.S.S., Lemos O.L., and Seno S., Estratégias para produção de mudas de tomateiro utilizando paclobutrazol, Semina: Ciências Agrárias 34 (2013), pp. 539–548.
- 27. Souza Vasconcelos J.C., Cordeiro G.M., Ortega E.M., and Araújo E.G., The new odd log-Logistic generalized inverse Gaussian regression model, J. Probab. Statist. 2019 (2019), pp. 1–13. doi: 10.1155/2019/8575424
- 28. Stasinopoulos D.M. and Rigby R.A., Generalized additive models for location scale and shape (GAMLSS) in R, J. Statist. Softw. 23 (2007), pp. 1–46. doi: 10.18637/jss.v023.i07
- 29. Taiz L. and Zeiger E., Fisiologia Vegetal, 5th ed., Editora Artmed, Semina: Porto Alegre, 2013.
- 30. Tyson D.R., Garbett S.P., Frick P.L., and Quaranta V., Fractional proliferation: A method to deconvolve cell population dynamics from single-cell data, Nat. Methods 9 (2012), pp. 923–928. doi: 10.1038/nmeth.2138
- 31. Xie F.C. and Wei B.C., Diagnostics analysis in censored generalized poisson regression model, J. Stat. Comput. Simul. 77 (2007), pp. 695–708. doi: 10.1080/10629360600581316