A new regression model for bimodal data and applications in agriculture

Julio Cezar Souza Vasconcelos; Gauss Moutinho Cordeiro; Edwin Moises Marcos Ortega; Édila Maria de Rezende

doi:10.1080/02664763.2020.1723503

2020 Feb 5;48(2):349–372. doi: 10.1080/02664763.2020.1723503

PMCID: PMC9042034 PMID: 35707692

Abstract

We define the odd log-logistic exponential Gaussian regression with two systematic components, which extends the heteroscedastic Gaussian regression and is suitable for bimodal data, which are quite common in agriculture. We estimate the parameters by the method of maximum likelihood. Some simulations indicate that the maximum-likelihood estimators are accurate. The model assumptions are checked through case deletion and quantile residuals. The usefulness of the new regression model is illustrated by means of three real data sets in different areas of agriculture, where the data present bimodality.

Keywords: Agriculture data, bimodal data, exponential Gaussian distribution, regression model, simulation study

1. Introduction

The normal (Gaussian) distribution is used to model many phenomena in almost all areas. It is adequate for real data when most of the data are near to the mean. On the other hand, the exponential is a continuous distribution with positive support. It is one of the simplest probabilistic models used to describe time to failure.

If $Y_{1} \sim N (μ, σ^{2})$ and $Y_{2} \sim Exp (ν)$ , where $ν = E (Y_{2})$ , and $Y_{1}$ and $Y_{2}$ are independent random variables, then the sum $Y = Y_{1} + Y_{2}$ has the exponential Gaussian (ExGa) distribution, say $Y \sim ExGa (μ, σ^{2}, ν)$ . Some results were reported for the ExGa distribution. For example, [28] implemented this distribution in R software (GAMLSS) and [7] showed that it may provide better fits for some classes of phenomena including intermitotic time and protein expression variability data. Further, [11] used the ExGa distribution for reconstruction of chromatographic peaks, [18] applied it in experiments to measure response item and [30] used this distribution for integrated extended time-lapse automated imaging to quantify the dynamics of cell proliferation. All these papers consider unimodal data, but in some situations, this assumption does not hold. For example, we consider the following datasets:

The data are related to the index of germination speed of tomato seeds. The research was developed at the Central Seed Laboratory of the Federal University of Lavras, Lavras, MG, Brazil (see Figure 1(a)).
Another data set consists of the degrees Brix (a measure of the density or sugar concentration of solutions) of yacon (a tuber native to Peru) (see Figure 1(b)).

Figure 1(a,b) shows that both data sets have bimodal distributions. These data sets are analyzed in this paper in the application section. Our first objective is to define a new distribution, called the odd log-logistic exponential Gaussian (OLLExGa), to model data with two modes (bimodal). In many practical situations, the response variable is affected by several explanatory variables, such as temperature, radiation, sulfurgran, ascorbic acid, germination index, among others. The regression model that provides a better fit tends to produce more precise estimates for the quantities of interest.

Recently, some studies of regressions have been published in different contexts. For example, [21] introduced the heteroscedastic odd log-logistic generalized gamma regression for censored data, [10] studied zero-spiked regression models generated by gamma random variables with application to resin oil production, [22] considered a generalized odd log-logistic flexible Weibull regression with applications in repairable systems and [27] proposed the odd log-logistic generalized inverse Gaussian regression with application to real estate data, among others. Further, [9] defined the G family of continuous distributions with mathematical properties, characterizations and regression modeling, [13] presented the odd power Lindley generator of probability distributions with properties, characterizations and regression modeling, [14] introduced the Weibull Marshall–Olkin family with regression and application to censored data and [12] proposed a new flexible lifetime model with log-location regression, properties and applications.

Based on these surveys, our second objective is to construct a regression based on the OLLExGa distribution to model bimodal data by considering a classic analysis and with different applications in agriculture. The inferential part is carried out using asymptotic maximum-likelihood estimators (MLEs). Some Monte Carlo simulation studies are performed to verify the accuracy of the OLLExGa regression by means of the variance and mean squared error. We check the model assumptions and detect possible influential or extreme observations that can cause distortions in the results of the fitted regression. An efficient way to detect these observations, called case deletion or global influence, was proposed by [2]. We introduce quantile residuals (qrs) to check the regression assumptions and carry out simulation studies to evaluate their empirical distribution when the data are bimodal. We draw envelope plots as a measure of the goodness-of-fit. Our research can be summarized in the following contributions:

First, we present the OLLExGa distribution to model bimodal data.
Second, based on the OLLExGa distribution, we propose a regression with two systematic components to model bimodal data. There are no classic models for bimodal data in the literature.
Third, we present diagnostic and residual analysis to verify all assumptions of the new regression.
Finally, we present three applications where the main motivation is the presence of bimodality in the data. We emphasize that in the first application, the researcher responsible for the experiment provided all final interpretations of the analyses. She also emphasized that the normal regression cannot be adopted for these data. We are therefore confident that the proposed regression can be used not only in agriculture but also in other areas, although we focus on agricultural applications here.

In Section 2, we define the OLLExGa distribution and display some plots. In Section 3, we propose the OLLExGa regression and investigate the accuracy of the MLEs from several simulations. In Section 4, we define qrs for the fitted regression and some diagnostic measures. We also provide a simulation study to check the normal approximation for these residuals. Three applications to real agricultural data in Section 5 confirm the flexibility of the OLLExGa distribution and its associated regression model. Section 6 ends with some conclusions.

2. The model definition

It is important to have extended forms of classic distributions in many applied areas such as agriculture data modeling. We adopt the parametrization of the ExGa distribution used in the GAMLSS library [28] in R. The cumulative distribution function (cdf) and probability density function (pdf) of the ExGa distribution are

G_{μ, σ, ν} (y) = \frac{1}{ν} \int_{- \infty}^{y} \exp \left(\frac{μ - t}{ν} + \frac{σ^{2}}{2 ν^{2}}\right) Φ \left(\frac{t - μ}{σ} - \frac{σ}{ν}\right) d t, \quad y \in R,

(1)

and

g_{μ, σ, ν} (y) = \frac{1}{ν} \exp (\frac{μ - y}{ν} + \frac{σ^{2}}{2 ν^{2}}) Φ (\frac{y - μ}{σ} - \frac{σ}{ν}),

(2)

respectively, where $μ \in R$ and $σ > 0$ are the mean and standard deviation of the normal distribution, $ν > 0$ is the mean of the exponential variable and $Φ (\cdot)$ is the standard normal cumulative function.
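
The integral in Equation (1) also has a closed form, $G_{μ, σ, ν} (y) = Φ ((y - μ) / σ) - ν g_{μ, σ, ν} (y)$ , which follows from conditioning on the normal component of the sum. The following is a minimal Python sketch (standard library only, not the GAMLSS implementation) that evaluates Equation (2), the closed-form cdf, and a trapezoidal approximation of the integral for comparison. The function names and the lower integration bound $μ - 10 σ$ are assumptions made for illustration.

```python
import math

def std_normal_cdf(z):
    """Standard normal cdf; erfc keeps accuracy in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def exgauss_pdf(y, mu, sigma, nu):
    """ExGa density g(y) of Equation (2)."""
    z = (y - mu) / sigma
    expo = (mu - y) / nu + sigma**2 / (2.0 * nu**2)
    return math.exp(expo) * std_normal_cdf(z - sigma / nu) / nu

def exgauss_cdf(y, mu, sigma, nu):
    """Closed form of the ExGa cdf: Phi((y - mu) / sigma) - nu * g(y)."""
    return std_normal_cdf((y - mu) / sigma) - nu * exgauss_pdf(y, mu, sigma, nu)

def exgauss_cdf_trapezoid(y, mu, sigma, nu, steps=20000):
    """Trapezoidal integral of g from mu - 10*sigma (negligible mass below) up to y."""
    a = mu - 10.0 * sigma
    h = (y - a) / steps
    total = 0.5 * (exgauss_pdf(a, mu, sigma, nu) + exgauss_pdf(y, mu, sigma, nu))
    for k in range(1, steps):
        total += exgauss_pdf(a + k * h, mu, sigma, nu)
    return total * h

# Parameter values borrowed from the first simulation study of this article.
mu, sigma, nu = 10.0, 0.8, 0.4
closed = exgauss_cdf(10.4, mu, sigma, nu)
numeric = exgauss_cdf_trapezoid(10.4, mu, sigma, nu)
print(closed, numeric)
```

The exponential factor is evaluated directly, so this sketch is only suitable for moderate values of $σ / ν$ and of $(μ - y) / ν$ ; a production implementation would work on the log scale.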

Let $W \sim ExGa (μ, σ, ν)$ be a random variable having density function (2). The moment generating function (mgf) of W is $M_{W} (t) = (1 - ν t)^{- 1} \exp (μ t + σ^{2} t^{2} / 2)$ . It can be checked from $M_{W} (t)$ that the ExGa distribution converges to the normal distribution when ν goes to zero. By differentiating $M_{W} (t)$ , the mean, variance, skewness and kurtosis of W are

\begin{aligned} E (W) & = μ + ν, \quad V (W) = σ^{2} + ν^{2}, \\ S (W) & = 2 \left(1 + \frac{σ^{2}}{ν^{2}}\right)^{- 3 / 2} \quad \text{and} \quad K (W) = 6 \left(1 + \frac{σ^{2}}{ν^{2}}\right)^{- 2}, \end{aligned}

respectively.
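
These expressions can be checked by simulation, since W is the sum of independent normal and exponential variables. Note that $K (W)$ vanishes as ν goes to zero, so it is the excess kurtosis. The sketch below is an illustrative check only: the seed, the sample size and the function name are arbitrary assumptions, and the parameter values are those of the first simulation study.

```python
import random
import statistics

def simulate_exgauss(n, mu, sigma, nu, seed=2020):
    """Draw n ExGa values as normal + exponential (the exponential has mean nu)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) + rng.expovariate(1.0 / nu) for _ in range(n)]

mu, sigma, nu = 10.0, 0.8, 0.4
w = simulate_exgauss(100_000, mu, sigma, nu)

# Empirical mean, variance and skewness of the simulated sample.
mean_w = statistics.fmean(w)
var_w = statistics.fmean([(x - mean_w) ** 2 for x in w])
skew_w = statistics.fmean([(x - mean_w) ** 3 for x in w]) / var_w ** 1.5

# Theoretical values from the formulas for E(W), V(W) and S(W).
ratio = 1.0 + sigma**2 / nu**2
theory = {
    "mean": mu + nu,
    "variance": sigma**2 + nu**2,
    "skewness": 2.0 * ratio ** -1.5,
}
print(mean_w, var_w, skew_w)
print(theory)
```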

Based on the odd log-logistic generator (OLL-G) class [6], we define the OLLExGa cdf, say $F (y) = F (y; μ, σ, ν, τ)$ , by integrating the log-logistic density function with shape parameter $τ > 0$ , namely

F (y) = \int_{0}^{G_{μ, σ, ν} (y) / {\bar{G}}_{μ, σ, ν} (y)} \frac{τ x^{τ - 1}}{(1 + x^{τ})^{2}} d x = \frac{G_{μ, σ, ν} (y)^{τ}}{G_{μ, σ, ν} (y)^{τ} + {\bar{G}}_{μ, σ, ν} (y)^{τ}},

(3)

where ${\bar{G}}_{μ, σ, ν} (y) = 1 - G_{μ, σ, ν} (y)$ . Hereafter, we assume that the random variable Y follows the cdf (3) with parameters $(μ, σ, ν, τ)^{T}$ , say $Y \sim OLLExGa (μ, σ, ν, τ)$ . The OLLExGa distribution includes as special cases the ExGa distribution when $τ = 1$ and the normal distribution as the limiting case $ν \to 0$ with $τ = 1$ .

Consider $η (y) = G_{μ, σ, ν} (y)$ to simplify the notation. The density function of Y has the form

\begin{aligned} f (y) & = f (y; μ, σ, ν, τ) = \frac{τ}{ν} \exp \left(\frac{μ - y}{ν} + \frac{σ^{2}}{2 ν^{2}}\right) Φ \left(\frac{y - μ}{σ} - \frac{σ}{ν}\right) \\ & \quad \times \{η (y) [1 - η (y)]\}^{τ - 1} \{η (y)^{τ} + [1 - η (y)]^{τ}\}^{- 2} . \end{aligned}

(4)

The main motivation for the new distribution is to make its skewness more flexible (compared to the ExGa model) and allow bimodality. Equation (4) provides greater flexibility of the tails of the density and can be widely applied in many areas of engineering and biology.

Plots of the density (4) for selected parameter values are displayed in Figure 2. It is clear that the proposed distribution is much more flexible, especially in relation to bimodality (for $0 < τ < 0.5$ ) than the ExGa distribution, which does not have this characteristic.
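
As a rough numerical illustration of this bimodality, the sketch below evaluates density (4) on a grid and counts its local maxima. It is a plain Python sketch, not the authors' code: the function names, the grid on [6, 15] and the use of the closed-form ExGa cdf $Φ ((y - μ) / σ) - ν g_{μ, σ, ν} (y)$ for $η (y)$ are assumptions. The parameter values $μ = 10$ , $σ = 0.8$ , $ν = 0.4$ and $τ = 0.2$ are those of the first simulation study.

```python
import math

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def exgauss_pdf(y, mu, sigma, nu):
    """ExGa density, Equation (2)."""
    z = (y - mu) / sigma
    expo = (mu - y) / nu + sigma**2 / (2.0 * nu**2)
    return math.exp(expo) * std_normal_cdf(z - sigma / nu) / nu

def exgauss_cdf(y, mu, sigma, nu):
    """ExGa cdf in closed form."""
    return std_normal_cdf((y - mu) / sigma) - nu * exgauss_pdf(y, mu, sigma, nu)

def ollexgauss_pdf(y, mu, sigma, nu, tau):
    """OLLExGa density of Equation (4), with eta(y) the ExGa cdf."""
    eta = exgauss_cdf(y, mu, sigma, nu)
    g = exgauss_pdf(y, mu, sigma, nu)
    kernel = (eta * (1.0 - eta)) ** (tau - 1.0)
    return tau * g * kernel / (eta**tau + (1.0 - eta) ** tau) ** 2

def grid_modes(tau, mu=10.0, sigma=0.8, nu=0.4, start=6.0, step=0.01, points=901):
    """Return the grid points that are strict local maxima of the density."""
    ys = [start + step * i for i in range(points)]
    fs = [ollexgauss_pdf(y, mu, sigma, nu, tau) for y in ys]
    return [ys[i] for i in range(1, points - 1) if fs[i - 1] < fs[i] > fs[i + 1]]

modes_oll = grid_modes(tau=0.2)    # OLLExGa with a small shape parameter
modes_exga = grid_modes(tau=1.0)   # tau = 1 recovers the ExGa density
print(modes_oll, modes_exga)
```

At the median of the baseline, where $η (y) = 1 / 2$ , density (4) reduces to $τ g_{μ, σ, ν} (y)$ , so a small τ lowers the center of the density relative to its tails; this is the mechanism behind the two modes.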

The quantile function (qf) of the OLLExGa distribution can be expressed as

y = Q_{ExGa} (\frac{u^{1 / τ}}{u^{1 / τ} + [1 - u]^{1 / τ}}),

(5)

where $Q_{ExGa} (u) = G_{μ, σ, ν}^{- 1} (u)$ is the qf of the ExGa distribution available in the GAMLSS package [28].

This scheme is useful because of the existence of fast generators for the ExGa random variables in some statistical packages. The plots comparing the exact OLLExGa densities and the histograms from two simulated data sets with 100,000 replications for selected parameter values are displayed in Figure 3. These plots (and several others not shown here) reveal that the simulated values are consistent with the OLLExGa distribution.
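
The inversion scheme based on Equation (5) can be sketched as follows. This is only an illustration in plain Python: where the article relies on the ExGa quantile function from GAMLSS, the sketch substitutes a simple bisection on the closed-form ExGa cdf, and the bracket, iteration count and function names are assumptions.

```python
import math
import random

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def exgauss_cdf(y, mu, sigma, nu):
    """ExGa cdf in closed form: Phi(z) - exp(.) * Phi(z - sigma / nu)."""
    z = (y - mu) / sigma
    expo = (mu - y) / nu + sigma**2 / (2.0 * nu**2)
    return std_normal_cdf(z) - math.exp(expo) * std_normal_cdf(z - sigma / nu)

def exgauss_quantile(p, mu, sigma, nu):
    """Invert the ExGa cdf by bisection on a wide bracket (stand-in for GAMLSS)."""
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma + 40.0 * nu
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if exgauss_cdf(mid, mu, sigma, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ollexgauss_cdf(y, mu, sigma, nu, tau):
    """OLLExGa cdf, Equation (3)."""
    eta = exgauss_cdf(y, mu, sigma, nu)
    return eta**tau / (eta**tau + (1.0 - eta) ** tau)

def ollexgauss_quantile(u, mu, sigma, nu, tau):
    """OLLExGa quantile function, Equation (5)."""
    a, b = u ** (1.0 / tau), (1.0 - u) ** (1.0 / tau)
    return exgauss_quantile(a / (a + b), mu, sigma, nu)

def ollexgauss_sample(n, mu, sigma, nu, tau, seed=123):
    """Inverse-transform sampling: u ~ U(0, 1), then y = Q(u)."""
    rng = random.Random(seed)
    return [ollexgauss_quantile(rng.random(), mu, sigma, nu, tau) for _ in range(n)]

params = (10.0, 0.8, 0.4, 0.2)
round_trip = [ollexgauss_cdf(ollexgauss_quantile(u, *params), *params) for u in (0.1, 0.5, 0.9)]
sample = ollexgauss_sample(5, *params)
print(round_trip, sample)
```

Applying the cdf (3) to the output of the quantile function (5) should return the input probability, which gives a quick consistency check of both formulas.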

Figure 3. Histograms and plots of the OLLExGa densities.

In the Appendix, we derive some mathematical properties of the OLLExGa distribution including a linear representation for its density function.

3. The OLLExGa regression

In many problems in medicine, biology, industry and chemistry, among other areas, it is of great interest to verify whether two or more variables are related in some way. Constructing a regression model is an important way to investigate this relationship. Data collection reveals the nature of the relationship between the variables and supports studies that can accommodate unexpected situations, such as variability in raw material, ambient temperature, machines and operators. Regression models are built with the following objectives: model formulation, parameter estimation, inference, diagnostic and residual analysis, and prediction. In this research, we focus on these goals. In these terms, the OLLExGa regression is a very competitive alternative to the ExGa regression.

The regression technique aims to choose the distribution of Y given the matrix $X = (x_{1}, \dots, x_{n})^{T}$ of explanatory variables. The parameters μ and σ are related to the explanatory variables by the systematic components

μ_{i} = x_{i}^{T} β_{1} \quad \text{and} \quad σ_{i} = \exp (x_{i}^{T} β_{2}), \quad i = 1, \dots, n,

(6)

respectively, where $x_{i}^{T} = (x_{i 1}, \dots, x_{i p})$ and $β_{1} = (β_{11}, \dots, β_{1 p})^{T}$ and $β_{2} = (β_{21}, \dots, β_{2 p})^{T}$ are the unknown vectors of coefficients.
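
As a small worked example of the systematic components (6), the sketch below computes $μ_{i}$ with the identity link and $σ_{i}$ with the log link for one observation. The coefficient values are those used later in the second simulation study; the covariate vector, with a leading 1 for the intercept, and the function names are assumptions made for illustration.

```python
import math

def linear_predictor(x, beta):
    """Inner product x^T beta for one observation."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def systematic_components(x, beta1, beta2):
    """Equation (6): identity link for mu_i, log link for sigma_i."""
    mu_i = linear_predictor(x, beta1)
    sigma_i = math.exp(linear_predictor(x, beta2))
    return mu_i, sigma_i

beta1 = (2.1, -0.4, 0.3)    # beta_10, beta_11, beta_12
beta2 = (-1.0, 0.2, -0.1)   # beta_20, beta_21, beta_22
x_i = (1.0, 0.5, 1.0)       # hypothetical covariates: intercept, x_i1, x_i2

mu_i, sigma_i = systematic_components(x_i, beta1, beta2)
print(mu_i, sigma_i)
```

The exponential in the second component guarantees $σ_{i} > 0$ for any value of $β_{2}$ , which is why the dispersion is modeled on the log scale.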

The total log-likelihood function for the vector of parameters $θ = (β_{1}^{T}, β_{2}^{T}, ν, τ)^{T}$ from model (6) given n independent observations $(y_{1}, x_{1}), \dots, (y_{n}, x_{n})$ has the form

\begin{aligned} l (θ) & = n \log (τ) - n \log (ν) + \sum_{i = 1}^{n} \left(\frac{μ_{i} - y_{i}}{ν} + \frac{σ_{i}^{2}}{2 ν^{2}}\right) + \sum_{i = 1}^{n} \log Φ \left(\frac{y_{i} - μ_{i}}{σ_{i}} - \frac{σ_{i}}{ν}\right) \\ & \quad + (τ - 1) \sum_{i = 1}^{n} \log \{η (y_{i}) [1 - η (y_{i})]\} - 2 \sum_{i = 1}^{n} \log \{η (y_{i})^{τ} + [1 - η (y_{i})]^{τ}\} . \end{aligned}

(7)

The log-likelihood (7) can be maximized numerically using the GAMLSS software to find the MLE $\hat{θ}$ of $θ$ . Fitting the ExGa regression (with $τ = 1$ ) yields initial values for $β_{1}$ and $β_{2}$ . Some simulations of the fitted model (6) confirm the adequacy of this maximization in Section 3.1.
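
For readers who want to evaluate the log-likelihood outside GAMLSS, the sketch below writes it term by term as the sum of the logarithms of density (4) under the systematic components (6), and checks the result against a direct evaluation of the density. It only evaluates the function and does not maximize it. The toy data, the function names and the closed-form ExGa cdf used for $η (y_{i})$ are assumptions.

```python
import math

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def components(x, beta1, beta2):
    """Systematic components (6) for one observation."""
    mu_i = sum(a * b for a, b in zip(x, beta1))
    sigma_i = math.exp(sum(a * b for a, b in zip(x, beta2)))
    return mu_i, sigma_i

def ollexgauss_pdf(y, mu, sigma, nu, tau):
    """Density (4), used here only as an independent check."""
    z = (y - mu) / sigma
    g = math.exp((mu - y) / nu + sigma**2 / (2.0 * nu**2)) * std_normal_cdf(z - sigma / nu) / nu
    eta = std_normal_cdf(z) - nu * g
    return tau * g * (eta * (1.0 - eta)) ** (tau - 1.0) / (eta**tau + (1.0 - eta) ** tau) ** 2

def log_likelihood(y, X, beta1, beta2, nu, tau):
    """Total log-likelihood, accumulated term by term."""
    n = len(y)
    total = n * math.log(tau) - n * math.log(nu)
    for yi, xi in zip(y, X):
        mu_i, sigma_i = components(xi, beta1, beta2)
        z = (yi - mu_i) / sigma_i
        expo = (mu_i - yi) / nu + sigma_i**2 / (2.0 * nu**2)
        phi_term = std_normal_cdf(z - sigma_i / nu)
        eta = std_normal_cdf(z) - math.exp(expo) * phi_term
        total += expo + math.log(phi_term)
        total += (tau - 1.0) * math.log(eta * (1.0 - eta))
        total -= 2.0 * math.log(eta**tau + (1.0 - eta) ** tau)
    return total

# Hypothetical toy data: four responses with rows (1, x_i1, x_i2).
y = [2.0, 2.5, 1.8, 3.1]
X = [(1.0, 0.2, 0), (1.0, 0.7, 1), (1.0, 0.5, 2), (1.0, 0.9, 1)]
beta1, beta2, nu, tau = (2.1, -0.4, 0.3), (-1.0, 0.2, -0.1), 0.4, 0.5

ll = log_likelihood(y, X, beta1, beta2, nu, tau)
ll_direct = sum(
    math.log(ollexgauss_pdf(yi, *components(xi, beta1, beta2), nu, tau))
    for yi, xi in zip(y, X)
)
print(ll, ll_direct)
```

In practice this function would be handed to a numerical optimizer, with the ExGa fit ( $τ = 1$ ) supplying the starting values as described above.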

The elements of the $(2 p + 2) \times (2 p + 2)$ Hessian matrix $\ddot{L} (θ)$ can be determined numerically in the R software. The multivariate normal distribution $N_{2 p + 2} (0, - \ddot{L} (\hat{θ})^{- 1})$ can approximate the distribution of $\hat{θ} - θ$ since Equation (4) satisfies some standard regularity conditions. More importantly, it can be utilized to obtain approximate confidence intervals for the parameters in $θ$ . The adequacy of some special models of the OLLExGa regression can be verified via likelihood ratio (LR) statistics.
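
Both tools can be illustrated with numbers reported later in this article for the models without covariates: the estimate $\hat{τ} = 0.1690$ with standard error 0.0373, and the GD values 543.0679 (OLLExGa) and 563.9559 (ExGa). The sketch below assumes that GD denotes the global deviance, minus twice the maximized log-likelihood as in GAMLSS, and that the LR statistic for $τ = 1$ is referred to a chi-square distribution with one degree of freedom; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def wald_interval(estimate, se, level=0.95):
    """Approximate confidence interval from the asymptotic normal distribution."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se

def lr_test_1df(deviance_null, deviance_full):
    """LR statistic from two deviances and its chi-square(1) upper-tail p-value."""
    w = max(deviance_null - deviance_full, 0.0)
    p_value = math.erfc(math.sqrt(w / 2.0))  # P(chi2_1 > w) = erfc(sqrt(w / 2))
    return w, p_value

# Interval for tau from its reported estimate and standard error.
ci_low, ci_high = wald_interval(0.1690, 0.0373)

# ExGa (tau = 1) against OLLExGa, using the reported GD values.
w, p_value = lr_test_1df(563.9559, 543.0679)
print(ci_low, ci_high, w, p_value)
```

An interval for τ that excludes 1, together with a large LR statistic, points to the same conclusion: the extra shape parameter is needed for these data.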

3.1. Two simulation studies

In this section, we provide two simulation studies: one to examine the adequacy of the MLEs in the OLLExGa distribution and another to investigate the adequacy of the estimates in the regression model with systematic components for μ and σ.

First simulation: the OLLExGa distribution

Some properties of the MLEs are examined using a classical analysis by means of a Monte Carlo simulation study. We simulate the OLLExGa distribution as follows: (i) generate $u \sim U (0, 1)$ ; (ii) obtain the OLLExGa observation $y = Q_{ExGa} (u^{1 / τ} / \{u^{1 / τ} + [1 - u]^{1 / τ}\})$ from Equation (5).

We set $μ = 10$ , $σ = 0.8$ , $ν = 0.4$ and $τ = 0.2$ to provide bimodality in the data as shown in Figure 3(b). We choose four scenarios (n = 50, 100, 500 and 1000) for the replications to calculate $\hat{μ}$ , $\hat{σ}$ , $\hat{ν}$ and $\hat{τ}$ . Then, we obtain the average estimates (AEs), biases and mean squared errors (MSEs) from 1000 Monte Carlo simulations via the GAMLSS software. The results listed in Table 1 confirm the accuracy of the estimates and that their MSEs decrease when n increases in agreement with first-order asymptotic theory.

Second simulation: the OLLExGa regression

We examine the performance of the MLEs in the OLLExGa regression by means of some simulations with n = 100, 300 and 500. We simulate 1,000 samples from two scenarios ( $τ = 0.5$ and $τ = 1.3$ ). For both cases, we take $β_{10} = 2.1$ , $β_{11} = - 0.4$ , $β_{12} = 0.3$ , $β_{20} = - 1$ , $β_{21} = 0.2$ , $β_{22} = - 0.1$ and $ν = 0.4$ under the systematic components $μ_{i} = β_{10} + β_{11} x_{i 1} + β_{12} x_{i 2}$ and $σ_{i} = \exp (β_{20} + β_{21} x_{i 1} + β_{22} x_{i 2})$ , in agreement with Equation (6). The response variable $Y_{i}$ and explanatory variables $X_{i 1}$ and $X_{i 2}$ are generated as follows: $Y_{i} \sim OLLExGa (μ_{i}, σ_{i}, ν, τ)$ , $X_{i 1} \sim Uniform (0, 1)$ and $X_{i 2} \sim Binomial (2, 0.5)$ .
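
One replicate of this data-generating scheme might be sketched as follows. This is an illustration in plain Python, not the authors' R code: it assumes the log link of Equation (6) for $σ_{i}$ , draws the response by inverting the cdf with Equation (5), and replaces the GAMLSS quantile function with a bisection on the closed-form ExGa cdf. The function names and the seed are hypothetical.

```python
import math
import random

def std_normal_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def exgauss_cdf(y, mu, sigma, nu):
    z = (y - mu) / sigma
    expo = (mu - y) / nu + sigma**2 / (2.0 * nu**2)
    return std_normal_cdf(z) - math.exp(expo) * std_normal_cdf(z - sigma / nu)

def ollexgauss_quantile(u, mu, sigma, nu, tau):
    """Equation (5), with the ExGa quantile obtained by bisection."""
    a, b = u ** (1.0 / tau), (1.0 - u) ** (1.0 / tau)
    p = a / (a + b)
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma + 40.0 * nu
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if exgauss_cdf(mid, mu, sigma, nu) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate_regression_data(n, tau, seed=1):
    """Return rows (y_i, x_i1, x_i2, mu_i, sigma_i) for one simulated sample."""
    beta1 = (2.1, -0.4, 0.3)
    beta2 = (-1.0, 0.2, -0.1)
    nu = 0.4
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.random()                                 # Uniform(0, 1)
        x2 = (rng.random() < 0.5) + (rng.random() < 0.5)  # Binomial(2, 0.5)
        mu_i = beta1[0] + beta1[1] * x1 + beta1[2] * x2
        sigma_i = math.exp(beta2[0] + beta2[1] * x1 + beta2[2] * x2)
        y = ollexgauss_quantile(rng.random(), mu_i, sigma_i, nu, tau)
        rows.append((y, x1, x2, mu_i, sigma_i))
    return rows

rows = simulate_regression_data(100, tau=0.5)
print(rows[0])
```

With these coefficients the linear predictor for the dispersion is always negative, so a link that maps it to a positive $σ_{i}$ is needed for the scheme to be well defined.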

We calculate the AEs, biases and MSEs for each fitted regression. The figures in Table 2 reveal that the MSEs of the estimates tend to zero and the AEs converge to the true parameters when n increases. Both facts strongly support that the approximate normal distribution is adequate to the finite sample distribution of the estimates.
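
The three summaries can be computed from the replicated estimates as in the sketch below. The function name and the toy estimates are hypothetical; the definitions (AE as the average of the estimates, bias as AE minus the true value, MSE as the average squared deviation from the true value) are the usual ones and agree with the tables, where for example the bias for μ at n = 50 is 9.1779 − 10 = −0.8221.

```python
from statistics import fmean

def monte_carlo_summary(estimates, true_value):
    """Average estimate, bias and mean squared error over the replications."""
    ae = fmean(estimates)
    bias = ae - true_value
    mse = fmean([(e - true_value) ** 2 for e in estimates])
    return ae, bias, mse

# Hypothetical estimates of a parameter whose true value is 10.
estimates = [9.0, 9.5, 10.5]
ae, bias, mse = monte_carlo_summary(estimates, 10.0)

# The MSE splits into the variance of the estimates plus the squared bias.
variance = fmean([(e - ae) ** 2 for e in estimates])
print(ae, bias, mse, variance + bias**2)
```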

Table 1. AEs, biases and MSEs for the parameters of the OLLExGa distribution.

	Scenario 1				Scenario 2
	n = 50				n = 100
Parameter	AE	Bias	MSE	Parameter	AE	Bias	MSE
$μ$	9.1779	−0.8221	10.7552	$μ$	9.4233	−0.5767	4.2744
$σ$	1.1141	0.3141	0.3622	$σ$	0.9858	0.1858	0.1772
$ν$	0.5415	0.1415	1.4680	$ν$	0.6357	0.2357	0.5249
$τ$	0.4363	0.2363	0.2081	$τ$	0.3335	0.1335	0.1446
	Scenario 3				Scenario 4
	n = 500				n = 1000
Parameter	AE	Bias	MSE	Parameter	AE	Bias	MSE
$μ$	9.8203	−0.1797	1.1707	$μ$	9.8306	−0.1694	0.5779
$σ$	0.9151	0.1151	0.0376	$σ$	0.8952	0.0952	0.0214
$ν$	0.5886	0.1886	0.1332	$ν$	0.5734	0.1734	0.0750
$τ$	0.2508	0.0508	0.0083	$τ$	0.2391	0.0391	0.0038

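Table 2. AEs, biases and MSEs for the OLLExGa regression under scenarios 1 ( $τ = 0.5$ ) and 2 ( $τ = 1.3$ ).

Scenario 1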
	n = 100			n = 300			n = 500
Parameter	AE	Bias	MSE	AE	Bias	MSE	AE	Bias	MSE
$β_{10}$	2.0729	−0.0271	0.1386	2.0861	−0.0139	0.0537	2.0793	−0.0207	0.0332
$β_{11}$	−0.4115	−0.0115	0.1359	−0.4128	−0.0128	0.0442	−0.4007	−0.0007	0.0262
$β_{12}$	0.3035	0.0035	0.0251	0.3011	0.0011	0.0076	0.3031	0.0031	0.0046
$β_{20}$	−1.1173	−0.1173	0.2861	−1.0281	−0.0281	0.0807	−1.0163	−0.0163	0.0447
$β_{21}$	0.2438	0.0438	0.4549	0.1978	−0.0022	0.0971	0.2107	0.0107	0.0525
$β_{22}$	−0.1085	−0.0085	0.0706	−0.1101	−0.0101	0.0146	−0.0998	0.0002	0.0068
ν	0.4499	0.0499	0.1129	0.4282	0.0282	0.0497	0.4254	0.0254	0.0329
τ	0.5468	0.0468	0.0948	0.5225	0.0225	0.0407	0.5225	0.0225	0.0282
Scenario 2
	n = 100			n = 300			n = 500
Parameter	AE	Bias	MSE	AE	Bias	MSE	AE	Bias	MSE
$β_{10}$	2.1881	0.0881	0.0534	2.1125	0.0125	0.0259	2.1021	0.0021	0.0186
$β_{11}$	−0.4042	−0.0042	0.0216	−0.4008	−0.0008	0.0065	−0.3979	0.0020	0.0037
$β_{12}$	0.2997	−0.0003	0.0034	0.3022	0.0022	0.0011	0.2999	−0.0001	0.0007
$β_{20}$	−1.1240	−0.1240	0.7935	−1.0332	−0.0332	0.2199	−1.0255	−0.0255	0.0919
$β_{21}$	0.2087	0.0087	0.9323	0.2093	0.0093	0.0976	0.2032	0.0032	0.0417
$β_{22}$	−0.1009	−0.0009	0.0713	−0.1004	−0.0004	0.0150	−0.1038	−0.0038	0.0079
ν	0.3060	−0.0940	0.0677	0.3875	−0.0125	0.0359	0.4001	0.0001	0.0270
τ	1.4524	0.1524	1.5819	1.4080	0.1080	0.9306	1.3395	0.0395	0.4188

Table 4. MLEs, SEs and AIC, BIC and GD statistics for some models fitted to the tomato seeds data.

Model	μ	$\log (σ)$	$\log (ν)$	τ	AIC	BIC	GD
OLLExGa	10.7499	0.2458	−0.8386	0.1690	551.0679	561.3253	543.0679
	(0.0057)	(0.1718)	(0.0002)	(0.0373)
ExGa	4.5912	−0.5605	1.8585	1	569.9559	577.6489	563.9559
	(0.2914)	(0.4579)	(0.1114)	(–)
normal	11.0022	1.6405	(–)	(–)	591.4290	596.5577	587.4290
	(0.5265)	(0.0722)	(–)	(–)

Table 5. MLEs, SEs and p-values for the OLLExGa regression fitted to the tomato seeds data.

Component	Sources of variation	Parameter	Estimate	SE	p-value
μ	Intercept	$β_{10}$	6.3362	2.2801	0.0068
μ	Dose 0.004	$β_{111}$	−0.8545	0.4373	0.0542
μ	Dose 0.008	$β_{112}$	−1.2836	0.3994	0.0019
μ	Dose 0.016	$β_{113}$	0.4649	0.3612	0.2018
μ	Period 2	$β_{12}$	9.7536	2.3479	<0.001
μ	Dose 0.004 × Period 2	$β_{131}$	−0.3778	2.3813	0.8744
μ	Dose 0.008 × Period 2	$β_{132}$	−0.1861	0.9406	0.8437
μ	Dose 0.016 × Period 2	$β_{133}$	−1.2384	0.9121	0.1785
σ	Intercept	$β_{20}$	1.8998	0.9169	0.0416
σ	Dose 0.004	$β_{211}$	0.0011	0.0106	0.9160
σ	Dose 0.008	$β_{212}$	−0.1719	0.2842	0.5470
σ	Dose 0.016	$β_{213}$	−0.4708	0.2902	0.1088
σ	Period 2	$β_{22}$	0.7969	0.3192	0.0147
σ	Dose 0.004 × Period 2	$β_{231}$	−0.7715	0.3595	0.0350
σ	Dose 0.008 × Period 2	$β_{232}$	0.1658	0.4678	0.7239
σ	Dose 0.016 × Period 2	$β_{233}$	0.4448	0.4728	0.3497
		$\log (ν)$	−0.8481	5.3061
		τ	6.9780	6.2670

Table 6. AIC, BIC and GD statistics for some fitted regressions to the tomato seeds data.

Model	AIC	BIC	GD
OLLExGa	370.4092	416.5674	334.4092
normal	376.1662	417.1958	344.1662
ExGa	378.2402	421.8341	344.2402

Table 7. MLEs, SEs and p-values for the final OLLExGa regression fitted to the tomato seeds data.

Component	Sources of variation	Parameter	Estimate	SE	p-value
μ	Intercept	$β_{10}$	5.9078	0.8288	<0.001
μ	Dose 0.004	$β_{111}$	−0.8002	0.3649	0.0311
μ	Dose 0.008	$β_{112}$	−1.3167	0.3610	0.0005
μ	Dose 0.016	$β_{113}$	0.4241	0.3434	0.2204
μ	Period 2	$β_{12}$	9.2285	0.3235	<0.001
σ	Intercept	$β_{20}$	2.0641	1.4065	0.1461
σ	Period 2	$β_{22}$	0.8026	0.3664	0.0314
σ	Dose 0.004 × Period 1	$β_{231}$	0.0088	0.3319	0.9789
σ	Dose 0.004 × Period 2	$β_{232}$	−0.7739	0.3632	0.0361
σ	Dose 0.008 × Period 1	$β_{233}$	−0.1735	0.3386	0.6098
σ	Dose 0.008 × Period 2	$β_{234}$	−0.0075	0.3781	0.9841
σ	Dose 0.016 × Period 1	$β_{235}$	−0.4781	0.3408	0.1645
σ	Dose 0.016 × Period 2	$β_{236}$	0.0149	0.3879	0.9695
		$\log (ν)$	−0.1194	0.8552
		τ	8.3140	11.3920

Table 9. MLEs, SEs and p-values for the fitted OLLExGa regression to the weights of rat pups data.

Parameter	Estimate	SE	p-value
$β_{10}$	5.9099	0.0611	<0.001
$β_{11}$	0.2023	0.0653	0.0021
$β_{12}$	−0.4608	0.0789	<0.001
$β_{13}$	−0.3852	0.0785	<0.001
$\log (σ)$	0.9236	0.0431
$\log (ν)$	−1.2171	0.1093
τ	4.8450	0.2096

Table 10. Goodness-of-fit measures for the weights of rat pups data.

Model	AIC	BIC	GD
OLLExGa	585.4798	611.9016	571.4798
ExGa	593.1629	615.8103	581.1629
normal	598.2369	617.1097	588.2369

Table 12. MLEs, SEs and AIC, BIC and GD values for the fitted models to degrees brix data.

Model	μ	$\log (σ)$	$\log (ν)$	τ	AIC	BIC	GD
OLLExGa	9.5262	−0.1805	−3.892	0.1194	2149.156	2165.652	2141.379
	(0.0448)	(0.0079)	(4811.252)	(0.0065)
ExGa	9.1299	1.2943	−1.212	1	2353.742	2365.950	2347.744
	(0.9904)	(0.0400)	(3.276)	(-)
normal	9.4313	1.2983	(-)	(-)	2351.722	2359.859	2347.722
	(0.1762)	(0.0340)	(-)	(-)

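Table 13. MLEs, SEs and p-values for the OLLExGa regression fitted to the degrees brix of yacon data.

Parameter	Estimate	SE	p-value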
$β_{10}$	9.7882	0.2093	<0.001
$β_{11}$	−3.8355	0.1996	0.0021
$β_{12}$	2.7567	0.2251	<0.001
$β_{20}$	0.3485	0.1530	0.0233
$β_{21}$	−0.8574	0.0639	0.0021
$β_{22}$	−1.2253	0.0623	<0.001
$\log (ν)$	−3.5055	0.1513
τ	0.2890	0.0659

Table 14. Goodness-of-fit measures for degrees brix of yacon data.

Model	AIC	BIC	GD
OLLExGa	1714.103	1746.651	1698.103
normal	1736.773	1761.184	1724.773
ExGa	1738.821	1767.300	1724.821
