Abstract
While there has been considerable research on the analysis of extreme values and outliers using heavy-tailed distributions, little is known about the semi-heavy-tailed behavior of data when there are a few suspicious outliers. To address the situation where data are skewed and possess semi-heavy tails, we introduce two new skewed distribution families of the hyperbolic secant with appealing properties. We extend the semi-heavy-tailedness property of data to a linear regression model. In particular, we investigate the asymptotic properties of the ML estimators of the regression parameters when the error term has a semi-heavy-tailed distribution. We conduct simulation studies comparing the ML estimators of the regression parameters under various assumptions for the distribution of the error term. We also provide three real examples to show the advantage of a semi-heavy-tailed error term over a heavy-tailed one. Online supplementary materials for this article are available. All the newly proposed models in this work are implemented in the shs R package, which can be found on the GitHub webpage.
Keywords: Asymptotic properties, heavy-tailed distribution, ML estimators, skew hyperbolic secant distributions, semi-heavy-tailed distribution
1. Introduction
Practitioners usually use heavy-tailed distributions when robustness to potential outliers is a concern. For symmetric distributions, using robust estimation techniques is a standard approach for statistical inference. However, robust methods do not work well for asymmetric distributions, as they lead to strongly biased parameter estimates [32]. Hence, accounting for both the skewness and the thick tails of real data is an attractive benefit for statistical modeling and inference, especially in the context of a regression model. Several methods for obtaining skewed heavy-tailed distributions have been presented in the literature; e.g. Fernandez and Steel [12], Ma and Genton [24], Azzalini and Capitanio [4], Jones and Faddy [22], Ferreira and Steel [14], and many others.
In a broad variety of applied contexts, such as economics, finance, and hydrology, the tails of the distribution are not too heavy, but semi-heavy. In our experience, semi-heavy-tailed distributions often fit observed data quite well, and better than heavy-tailed alternatives, when there are a few suspicious outliers. Therefore, we consider a situation where data are skewed and possess semi-heavy tails. Indeed, we address the case where a skewed semi-heavy-tailed distribution for the error term in a linear regression is a better option than asymmetric heavy-tailed alternatives. To this end, we propose two new families of skewed distributions that have, in some sense, a semi-heavy tail. Furthermore, we carry out a simulation study to validate that the proposed distributions outperform the alternative skewed heavy-tailed distributions with respect to model fitting and information criteria, including the Akaike information criterion (AIC; [1]) and the Bayesian information criterion (BIC; [29]).
There are a few works in the literature on symmetric semi-heavy-tailed distributions; e.g. the hyperbolic secant (HS) family of distributions, which originates from Fisher [18], according to Fischer [15]. Vaughan [34] introduced a skewed version of the HS distribution, named the generalized secant hyperbolic (GSH) distribution, and Fischer and Vaughan [16] added an alternative one, called the skewed generalized secant hyperbolic (SGHS) distribution. Another skewed version of a semi-heavy-tailed distribution, the beta hyperbolic secant (BHS), was proposed by Fischer and Vaughan [17].
These skewed versions have complex probability density and cumulative distribution functions. Also, their moments are usually complicated to calculate due to the presence of infinite series. Furthermore, the asymptotic behavior of the maximum likelihood (ML) estimators of the relevant parameters in a more applicable linear regression model is not usually discussed for these distributions. In contrast, our proposed distributions (Section 2) have several advantages, including (1) simple formulas for the density functions, (2) easily calculated moments, (3) a better fit to real data, and (4) simple stochastic representations that lead to efficient algorithms for generating random numbers.
In Section 2, we present two new skewed semi-heavy-tailed distributions and their basic properties. We enter them into a linear regression model in Section 3 as the distribution of the errors, then verify the consistency and asymptotic normality of the corresponding ML estimators. In Section 4, we provide the results of a simulation study that compares the performance of the ML estimators in a linear regression model when the errors are generated from our skewed semi-heavy-tailed distributions and also from a heavy-tailed skew Student-t distribution [2]. Section 5 presents three real data analyses with (i) acoustic comfort evaluation data, (ii) pricing diamond stones data, and (iii) the excess rate of return for a company, named Martin Marietta, and an index of the excess rate of return for the New York stock exchange. The final section contains concluding remarks. All technical details are available online as supplementary materials. All methods presented in this work are implemented in the shs R package [27], which can be found on the GitHub webpage.
2. Proposed sampling models
A hyperbolic secant distribution is symmetric, and its density is bell-shaped like the Gaussian density but with slightly heavier tails. The probability density function of an HS distribution is given by

f(x) = (1/(2σ)) sech(π(x − μ)/(2σ)),  x ∈ ℝ,

where μ is the mean and σ is the scale parameter. The variance of the distribution is σ².
Fischer [15] showed that if V has a half-Cauchy density function, then Z = (2/π) log(V) has an HS distribution. Using the fact that the Cauchy distribution is the special case of a Student-t distribution with 1 degree of freedom, we can get a generalized HS distribution. Let Y have a half Student-t distribution with ν degrees of freedom; then Z = (2/π) log(Y) has a standard new asymmetric HS distribution. We refer to this distribution as the skew HS type 1 (SHS1) distribution. The probability density function of Z is given by
f(z) = (√π Γ((ν + 1)/2)) / (√ν Γ(ν/2)) e^{πz/2} (1 + e^{πz}/ν)^{−(ν+1)/2},  z ∈ ℝ. (1)
Figure 1 shows a variety of SHS1 densities for several values of ν. Given the density (1) for Z, it is easy to show that X = μ + σZ follows SHS1(μ, σ, ν). When ν = 1, SHS1 reduces to the HS distribution with location parameter μ and scale parameter σ. When ν > 1 or ν < 1, the SHS1 distribution is negatively or positively skewed, respectively. The cumulative distribution function of (1) is also given by

F(z) = 1 − I_{ν/(ν + e^{πz})}(ν/2, 1/2),

where I_x(a, b) is the regularized incomplete beta function.
Figure 1. SHS1 densities for different values of ν.
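The stochastic representation above suggests a direct sampler. The following Python sketch assumes the transformation Z = (2/π) log Y with Y a half Student-t variable, as in the construction above; it is an illustration only, not the shs package implementation:

```python
import numpy as np

def rshs1(n, nu, mu=0.0, sigma=1.0, rng=None):
    """Draw SHS1-type samples via the half Student-t representation:
    Z = (2/pi) * log(Y), Y ~ half Student-t(nu); then X = mu + sigma*Z.
    For nu = 1 this reduces to the symmetric hyperbolic secant law."""
    rng = np.random.default_rng(rng)
    y = np.abs(rng.standard_t(nu, size=n))   # half Student-t draws
    return mu + sigma * (2.0 / np.pi) * np.log(y)

# Sanity check: nu = 1 recovers the standard HS distribution,
# whose mean is 0 and variance is 1
x = rshs1(200_000, nu=1.0, rng=42)
print(abs(x.mean()) < 0.02, abs(x.var() - 1.0) < 0.05)
```

The half Student-t draw makes the sampler a two-line generator, which is one of the practical advantages of the stochastic representation.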
The following expression for the moment generating function of Z can be obtained (see the supplementary materials for the proof).
Proposition 2.1
For an SHS1 distribution with density (1), the moment generating function is given by

M(t) = ν^{t/π} Γ(1/2 + t/π) Γ(ν/2 − t/π) / (Γ(1/2) Γ(ν/2)),  −π/2 < t < νπ/2.
Therefore, we can easily calculate that

E(Z) = (1/π)(log ν + ψ(1/2) − ψ(ν/2)),  Var(Z) = (1/π²)(ψ′(1/2) + ψ′(ν/2)),

and higher-order moments follow similarly, in which ψ is the digamma function and ψ′, ψ″, and ψ‴ are its first, second, and third derivatives, respectively. We can check that for ν = 1, E(Z) = 0 and Var(Z) = 1, which are the mean and variance of the standard HS distribution, respectively.
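As an illustrative numerical check, the sketch below evaluates digamma- and trigamma-based mean and variance expressions (derived here by differentiating the log of the mgf form above; treat the exact formulas as assumptions of this sketch) against Monte Carlo draws from the half Student-t representation:

```python
import numpy as np
from scipy.special import digamma, polygamma

def shs1_mean_var(nu):
    # Mean and variance of the standard SHS1 law, obtained by
    # differentiating log M(t) at t = 0 for the mgf form
    # M(t) = nu^(t/pi) Gamma(1/2 + t/pi) Gamma(nu/2 - t/pi) / (Gamma(1/2) Gamma(nu/2))
    mean = (np.log(nu) + digamma(0.5) - digamma(nu / 2.0)) / np.pi
    var = (polygamma(1, 0.5) + polygamma(1, nu / 2.0)) / np.pi**2
    return mean, var

# Monte Carlo check via the half Student-t representation
rng = np.random.default_rng(0)
nu = 3.0
z = (2.0 / np.pi) * np.log(np.abs(rng.standard_t(nu, size=500_000)))
m, v = shs1_mean_var(nu)
print(abs(z.mean() - m) < 0.01, abs(z.var() - v) < 0.02)
```

At ν = 1 the same expressions return mean 0 and variance 1, consistent with the standard HS case.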
2.1. Second proposal
It could be attractive to have a skewness parameter whose negative or positive values indicate a negative or positive skewness of the data. The SHS1 distribution does not have such a feature. To achieve this, we introduce another generalization of the HS distribution and refer to it as SHS type 2 (SHS2). For this distribution, the probability density function is given by
| (2) |
If the random variable X has the density (2), we denote it as X ∼ SHS2. The following proposition confirms that SHS2 has the symmetry property in the skewness parameter.
Proposition 2.2
If X ∼ SHS2(α), then −X ∼ SHS2(−α).
When α = 0, the SHS2 distribution reduces to the HS distribution. When α > 0 or α < 0, the SHS2 distribution is positively or negatively skewed, respectively. Figure 2 shows a variety of SHS2 densities for several values of α. It is also easy to show that the cumulative distribution function of SHS2 is as follows:
Furthermore, there is a simple stochastic representation for the SHS2 distribution as well; indeed, a suitably transformed variable agrees with density (2).
Figure 2. SHS2 densities for different values of α.
For the standard case, the following expressions for the characteristic and moment generating functions can be obtained.
Proposition 2.3
The characteristic and moment generating functions of Z are given by
respectively.
By applying Proposition 2.3, we can show that, for example
2.2. Tail behavior
Figure 3 plots the tails of densities in (1) and (2) and compares them with normal and Cauchy densities. The Cauchy distribution has a thick tail, while the normal distribution has a thin tail. Intuitively, our proposed distributions have tails lighter than Cauchy (as a well-known heavy-tailed distribution) but heavier than normal, hence possess semi-heavy tails. This intuition can be formalized by studying the behavior of the tails.
Figure 3. Comparison of tails for Cauchy (solid), normal (dotted), SHS1 (dotted-dashed), and SHS2 (dashed) densities.
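A quick numerical comparison makes the ordering of the tails concrete for the symmetric HS member shared by both families: its log-density decays linearly in |x|, between the quadratic decay of the normal and the logarithmic decay of the Cauchy. A minimal Python sketch:

```python
import numpy as np

def hs_logpdf(x):
    # Log of the standard HS density f(x) = sech(pi*x/2)/2, written
    # stably for large |x| as -u - log(1 + exp(-2u)), with u = pi*|x|/2
    u = np.pi * np.abs(x) / 2.0
    return -u - np.log1p(np.exp(-2.0 * u))

for x in (5.0, 10.0, 20.0):
    normal = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)   # standard normal log-density
    cauchy = -np.log(np.pi * (1.0 + x**2))             # standard Cauchy log-density
    print(x, round(float(hs_logpdf(x)), 1), round(normal, 1), round(cauchy, 1))
# At each x above, the HS log-density sits between the normal (lighter)
# and the Cauchy (heavier) log-densities.
```

The exponential decay rate, combined with a non-Gaussian prefactor, is exactly the "semi-heavy" behavior formalized below.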
The following theorem demonstrates that the SHS distributions have semi-heavy tails. The theorem is proved for the standard case. First, formal definitions of heavy and semi-heavy tails are required. We consider the following definitions provided by Omey et al. [26].
Definition 2.4
The random variable X has a fat tail density if its density function satisfies
Definition 2.5
A density function is called a semi-heavy-tailed function if it is of the form , , where is a fat tail density function.
Theorem 2.6
Both the SHS1 and SHS2 distributions have semi-heavy tails.
3. Regression modeling
In linear regression modeling, the usual normality assumption for the error terms is not appropriate in cases where observations may be skewed and may have heavy or semi-heavy tails. Departures from normality of the errors may have adverse effects on inferential results [5]. Many remedies arise throughout the statistics literature on extending the classical normal linear regression model; e.g. Student-t regression [23], scale mixtures of normals [13], skew-normal regression [36], finite mixtures of normals [31], and skew-t components regression [9]. However, none of these models explicitly examines semi-heavy-tailed distributions for the errors. Furthermore, the identifiability and computational burden of the mixture-based approaches can be a challenge. Methods based on Student-t regression also have a limitation regarding the estimation of the degrees-of-freedom parameter, since its estimation is known to be complicated and computationally expensive.
Here, we address the situation where the error terms are skewed and have a semi-heavy tail. We extend a likelihood-based approach for inference when the errors follow SHS distributions. We assume the observations y_i, i = 1, …, n, to be generated from
y_i = x_iᵀβ + σε_i,  i = 1, …, n, (3)
where x_i is a vector of covariates, β is the vector of unknown regression parameters, and σ is a scale parameter. We assume that the error terms ε_i are i.i.d. SHS1 or SHS2. In the following, we study each model separately.
3.1. SHS1 error term
When the error terms follow the SHS1 distribution, the log-likelihood of the model (3) is given by
To maximize ℓ with respect to the parameters of the model, we proceed through a quasi-Newton optimization method, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, which was proposed by Broyden [6], Fletcher [19], Goldfarb [20], and Shanno [30], independently. Let denote the vector of the parameters. The components of the score vector
are
3.2. SHS2 error term
For errors with the SHS2 distribution, the log-likelihood of the model (3) is given by
For this log-likelihood function, the components of the score vector
are given by
To compute the ML estimators of the parameters in this model, we use a BFGS method as well.
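As an illustration of the BFGS-based ML fitting described above, the following Python sketch fits model (3) with symmetric HS errors (the simplest boundary member of both proposed families; a stand-alone sketch, not the shs package, and the parametrization is an assumption here). The scale is log-parametrized so the unconstrained BFGS iterates keep σ positive:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, X, y):
    # theta = (beta, log_sigma); HS error density f(e) = sech(pi*e/2)/2
    beta, log_sigma = theta[:-1], theta[-1]
    sigma = np.exp(log_sigma)                  # keeps sigma > 0 under BFGS
    u = np.pi * np.abs((y - X @ beta) / sigma) / 2.0
    log_f = -u - np.log1p(np.exp(-2.0 * u))    # stable log of sech(pi*e/2)/2
    return -(log_f.sum() - len(y) * log_sigma)

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
# HS errors simulated via the half-Cauchy representation (2/pi)*log|Cauchy|
eps = (2.0 / np.pi) * np.log(np.abs(rng.standard_t(1.0, size=n)))
y = X @ np.array([1.0, 2.0]) + 0.5 * eps

fit = minimize(neg_loglik, x0=np.zeros(X.shape[1] + 1), args=(X, y), method="BFGS")
beta_hat, sigma_hat = fit.x[:-1], np.exp(fit.x[-1])
print(np.round(beta_hat, 2), round(float(sigma_hat), 2))  # near (1, 2) and 0.5
```

Extending this sketch to the SHS1 or SHS2 densities only changes `neg_loglik`, with ν or α appended to the parameter vector.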
3.3. Asymptotic properties
Let X denote an n × k design matrix. We assume X to be of rank k. Let Θ be the parameter space for the SHS1 model, an open subset of ℝ^{k+2}. Also let Ψ be the parameter space for the SHS2 model, an open subset of ℝ^{k+2}. The second derivative matrix of the SHS1 log-likelihood can be written as
where its elements are given in the supplementary materials. Similarly, the second derivative matrix of the SHS2 log-likelihood (see the supplementary materials) can be written as
Using the SHS1 model for the error term, we obtain the Fisher information matrix as
and for the SHS2 model, we get
See the supplementary materials for the details.
To prove the asymptotic results, we need the following conditions:

C1. The true parameter vector, in the SHS1 model, is an interior point of Θ.
C2. The true parameter vector, in the SHS2 model, is an interior point of Ψ.
C3. The stated averages of the covariates converge, as n → ∞, for j = 1, …, k, to a finite real number.
C4. (1/n)XᵀX converges to a finite and positive definite matrix C, as n → ∞.
C5. The stated remainder terms vanish, as n → ∞, for j = 1, …, k.
The following two theorems reveal the consistency and asymptotic normality of the maximum likelihood estimators in our two proposed models.
Theorem 3.1
Given the conditions C1, C3, C4, and C5, the maximum likelihood estimators of the linear regression model (3) with the SHS1 model for the error term are weakly consistent and
Theorem 3.2
Given the conditions C2, C3, C4, and C5, the maximum likelihood estimators of the linear regression model (3) with the SHS2 model for the error term are weakly consistent and
4. Simulation study
We conducted a simulation study to assess and compare the performance of our two proposed regression models to one another, to the skew Student-t (SSt) regression as a heavy-tailed model, and to the skew-normal (SN; [3]) regression model. We simulated the responses from a linear model:
In this model, for errors with a skew-normal distribution, the density of ε is given by

f(ε) = 2φ(ε)Φ(αε),

where φ and Φ are the density and cumulative distribution functions of the standard normal distribution, respectively. Here, α is the skewness parameter of the density. For errors with a skew Student-t distribution, the density of the error term is given by

f(ε) = 2 t_ν(ε) T_{ν+1}(αε ((ν + 1)/(ν + ε²))^{1/2}),

in which t_ν and T_ν are the density and cumulative distribution functions of the standard Student-t distribution with the indicated degrees of freedom, respectively. Similar to the skew-normal model, α controls the skewness of the errors.
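For concreteness, a short sketch of the skew-normal ingredients used in the simulations: the standard Azzalini density 2φ(ε)Φ(αε) and its well-known additive sampling representation (both standard results; the function names are ours):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def sn_pdf(x, alpha):
    # Azzalini skew-normal density: 2 * phi(x) * Phi(alpha * x)
    return 2.0 * norm.pdf(x) * norm.cdf(alpha * x)

def rsn(n, alpha, rng=None):
    # Additive representation: X = delta*|U0| + sqrt(1 - delta^2)*U1,
    # with delta = alpha / sqrt(1 + alpha^2)
    rng = np.random.default_rng(rng)
    delta = alpha / np.sqrt(1.0 + alpha**2)
    u0, u1 = rng.standard_normal((2, n))
    return delta * np.abs(u0) + np.sqrt(1.0 - delta**2) * u1

alpha = 3.0
delta = alpha / np.sqrt(1.0 + alpha**2)
area, _ = quad(lambda t: sn_pdf(t, alpha), -np.inf, np.inf)
x = rsn(100_000, alpha, rng=7)
# The density integrates to one; the sample mean matches E[X] = delta*sqrt(2/pi)
print(round(float(area), 4), round(float(x.mean() - delta * np.sqrt(2.0 / np.pi)), 3))
```

The skew Student-t samples used in scenario (3) can be drawn analogously by replacing the normal pair with a scale mixture over a chi-squared variable.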
In our study, we examined the following configuration options:
the sample size is n = 100, n = 200, or n = 500.
the regression covariates x1 and x2 are generated from the standard normal distribution.
the real values of regression coefficients are set to . The real value for scale parameter is also specified by .
both the covariates and the response variable are centered on their means.
Toward developing an understanding of the performance of these models, we considered three simulation scenarios that vary in terms of the true underlying error distribution:
(1) ε ∼ SHS1; (2) ε ∼ SHS2; (3) ε ∼ SSt.
In the first and second scenarios, the error distribution is set to the SHS distributions so that the observations in the simulated data possess a semi-heavy tail. We chose values of ν and α for SHS1 and SHS2, respectively, that yield both positive and negative skewness. To evaluate the performance of the model when the errors do follow a heavy-tailed distribution, the third scenario takes the error distribution to be a skew Student-t distribution. Specifically, we are curious to assess the sensitivity of the proposed semi-heavy-tailed distributions to misspecification of the tail properties of the errors.
The above configurations imply a total of nine different models, and we simulated and estimated R = 1000 times for each. We also computed the bias and the mean squared error (MSE) of the parameter estimates for each model. To ease comparison, we calculated a relative version of each measure as the ratio of the measure obtained under a misspecified model for the error term to that obtained under the true model. Results for scenario (1) are summarized in Table 1, those for scenario (2) in Table 2, and those for scenario (3) in Table 3. In each scenario, the pure bias and MSE of the estimates under the true model are reported as well.
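The relative bias and MSE described above can be computed from the replicated estimates as simple ratios. A hypothetical Python sketch (the function name and the toy numbers are ours, not the simulation output):

```python
import numpy as np

def relative_measures(est_mis, est_true, beta_true):
    """Bias and MSE of replicated estimates under a misspecified error
    model, reported relative to those under the true model (ratios)."""
    est_mis = np.asarray(est_mis)
    est_true = np.asarray(est_true)
    bias_mis = est_mis.mean(axis=0) - beta_true
    bias_true = est_true.mean(axis=0) - beta_true
    mse_mis = ((est_mis - beta_true) ** 2).mean(axis=0)
    mse_true = ((est_true - beta_true) ** 2).mean(axis=0)
    return bias_mis / bias_true, mse_mis / mse_true

# Toy example with R = 1000 replications of two coefficients:
# the misspecified fit has a small extra bias and a larger spread.
rng = np.random.default_rng(3)
beta_true = np.array([1.0, -0.5])
true_fit = beta_true + 0.05 * rng.standard_normal((1000, 2))
mis_fit = beta_true + 0.01 + 0.06 * rng.standard_normal((1000, 2))
rel_bias, rel_mse = relative_measures(mis_fit, true_fit, beta_true)
print(rel_mse)  # both MSE ratios exceed 1: the misspecified model is less efficient
```

Note that the bias ratio can be erratic when the true-model bias is near zero, which explains the occasional large relative-bias entries in the tables.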
Table 1. Relative bias and MSE of the parameter estimates under SHS1 distributed errors.

| n | | 100 | | 200 | | 500 | |
|---|---|---|---|---|---|---|---|
| Model | | β̂₁ | β̂₂ | β̂₁ | β̂₂ | β̂₁ | β̂₂ |
| SHS1 | Bias | 0.0002 | −0.0012 | −0.0076 | −0.0038 | −0.0039 | −0.0000 |
| | MSE | 0.0117 | 0.0117 | 0.0052 | 0.0058 | 0.0025 | 0.0023 |
| SHS2 | Bias | 0.2031 | 1.1926 | 1.1041 | 1.0298 | 1.0769 | 4.8573 |
| | MSE | 1.0660 | 1.0659 | 1.0798 | 1.0843 | 1.0803 | 1.0914 |
| SSt | Bias | 1.8930 | 1.2208 | 0.9862 | 1.1240 | 0.9527 | 8.9018 |
| | MSE | 1.0334 | 1.0236 | 1.0202 | 1.0348 | 1.0035 | 1.0250 |
| SN | Bias | −0.3185 | 0.6290 | 1.2384 | 1.9219 | 0.8533 | 44.9954 |
| | MSE | 1.2076 | 1.2073 | 1.2146 | 1.2215 | 1.1709 | 1.2559 |

Note: For the SHS1 model, the pure bias and MSE of the estimates are reported.
Table 2. Relative bias and MSE of the parameter estimates under SHS2 distributed errors.

| n | | 100 | | 200 | | 500 | |
|---|---|---|---|---|---|---|---|
| Model | | β̂₁ | β̂₂ | β̂₁ | β̂₂ | β̂₁ | β̂₂ |
| SHS1 | Bias | 0.9419 | 0.9442 | 0.9925 | 1.0840 | 0.8363 | 1.0166 |
| | MSE | 0.9596 | 0.9655 | 0.9738 | 0.9664 | 0.9617 | 0.9612 |
| SHS2 | Bias | 0.0038 | 0.0044 | 0.0058 | 0.0023 | −0.0010 | −0.0022 |
| | MSE | 0.0427 | 0.0410 | 0.0224 | 0.0209 | 0.0079 | 0.0083 |
| SSt | Bias | 1.6202 | 0.8584 | 1.0212 | 1.3580 | 0.2751 | 1.0326 |
| | MSE | 0.9510 | 0.9504 | 0.9379 | 0.9430 | 0.9335 | 0.9282 |
| SN | Bias | 1.4444 | 0.9297 | 0.5085 | 0.9448 | 0.8340 | 2.0620 |
| | MSE | 1.2020 | 1.1738 | 1.1808 | 1.2000 | 1.2733 | 1.1664 |

Note: For the SHS2 model, the pure bias and MSE of the estimates are reported.
Table 3. Relative bias and MSE of the parameter estimates under skew Student-t distributed errors.

| n | | 100 | | 200 | | 500 | |
|---|---|---|---|---|---|---|---|
| Model | | β̂₁ | β̂₂ | β̂₁ | β̂₂ | β̂₁ | β̂₂ |
| SHS1 | Bias | 1.1092 | 1.1489 | 1.0177 | 1.0559 | −2.4400 | 1.2461 |
| | MSE | 1.0263 | 1.0112 | 1.0160 | 1.0172 | 1.0454 | 1.0119 |
| SHS2 | Bias | 1.1678 | 1.1670 | 1.0147 | 0.9766 | −1.7733 | 1.4539 |
| | MSE | 1.0680 | 1.0496 | 1.0515 | 1.0524 | 1.0722 | 1.0502 |
| SSt | Bias | 0.0027 | −0.0048 | −0.0033 | −0.0014 | 0.0001 | −0.0007 |
| | MSE | 0.0074 | 0.0084 | 0.0041 | 0.0039 | 0.0016 | 0.0016 |
| SN | Bias | 0.2147 | 0.9159 | 1.4184 | 1.4412 | −24.2274 | 1.9104 |
| | MSE | 1.5658 | 1.5123 | 1.4132 | 1.4089 | 1.5629 | 1.4771 |

Note: For the SSt model, the pure bias and MSE of the estimates are reported.
The results show that, for all true error distributions, both the bias and the MSE under the true model tend to zero as the sample size increases; hence the estimators of the regression coefficients appear consistent and asymptotically unbiased. Further, for all sample sizes and simulation scenarios, the estimators obtained by assuming the SN model for the errors are inefficient; their loss of efficiency compared to the SSt model is substantial. For scenario (3), when the error model is an SSt, the estimators obtained by the true model are uniformly more efficient; however, the differences from the SHS1 and SHS2 models are minor. The same results hold for scenario (1). Therefore, we can conclude the robustness of our proposed SHS models to misspecification of a heavy-tailed distribution like the SSt model in estimating regression parameters.
Figure 4 displays the average bias plus and minus the average standard error (SE) for the regression parameters under each scenario. The results in Figure 4 confirm those in Tables 1–3; i.e. both the average bias and the standard errors tend to zero as the sample size increases. The lengths of the intervals for all scenarios and sample sizes are almost the same under all models but the SN model, for which the interval is the longest.
Figure 4. Average bias and confidence intervals on a standardized scale for different error distributions by sample size and true error distribution: SHS1 (left), SHS2 (middle), and SSt (right). SHS1 (solid line), SHS2 (dashed line), SSt (dotted line), and SN (dotdash line).
For the regression parameters, we also computed empirical coverage rates under the different models for the errors. Figure 5 displays the results for all scenarios and sample sizes. The estimated coverage rates are close to the nominal level for all sample sizes and simulation scenarios, which supports the asymptotic normality of the ML estimators of the regression coefficients.
Figure 5. Coverage rate for each regression parameter for different error distributions by sample size and true error distribution: SHS1 (left), SHS2 (middle), and SSt (right). SHS1 (solid line), SHS2 (dashed line), SSt (dotted line), and SN (dotdash line).
It is also of interest to directly compare the goodness of fit of the models to the observed data. To this end, we estimated two model assessment measures: the AIC and the BIC. We do not compare the absolute values of the AICs or BICs directly, but consider their difference, which is defined (e.g. for AIC) as

DAIC = AIC_true − AIC_mis,

in which AIC_true is the AIC of the true model for the error term and AIC_mis is that of the misspecified one. Negative values of DAIC/DBIC are in favor of the correct model. Box plots of DAIC values for all simulation scenarios and sample sizes are shown in Figure 6, and box plots of DBIC values are displayed in Figure 7.
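Computing the differences is straightforward once the maximized log-likelihoods are available; the sketch below uses the standard definitions AIC = −2ℓ + 2p and BIC = −2ℓ + p log n (the numeric log-likelihood values are hypothetical):

```python
import numpy as np

def aic(loglik, n_params):
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    return -2.0 * loglik + n_params * np.log(n_obs)

# DAIC = AIC(true model) - AIC(misspecified model): negative values
# favor the true model, since smaller AIC is better.
ll_true, ll_mis, p, n = -152.3, -160.8, 5, 200
daic = aic(ll_true, p) - aic(ll_mis, p)
dbic = bic(ll_true, p, n) - bic(ll_mis, p, n)
print(daic, dbic)  # both negative here: the true model is preferred
```

When the compared models share the same number of parameters, as here, DAIC and DBIC coincide; BIC only changes the ranking across models of different complexity.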
Figure 6. Box plots of DAIC values for misspecified error models by sample size and true error distribution: SHS1 (left), SHS2 (middle), and SSt (right).
Figure 7. Box plots of DBIC values for misspecified error models by sample size and true error distribution: SHS1 (left), SHS2 (middle), and SSt (right).
For scenarios (1) and (3), the correct model shows superior performance, mainly when the sample size is increased. However, for scenario (2), using an incorrectly specified SSt or SHS1 model leads to better performance in model selection based on both AIC and BIC. Also, for all sample sizes and simulation scenarios, the SN model performs poorly.
To some extent, it can be said that when the SSt model is correct, the performance of the incorrectly specified SHS models is approximately equivalent to that of the correct one based on AIC/BIC. Again, this shows the robustness of the proposed SHS models in model selection for heavy-tailed data.
5. Real examples
In this section, we present three real data analyses with (i) acoustic comfort evaluation data, (ii) pricing diamond stones data, and (iii) Martin Marietta Company data. We compared eight different distributions for the error terms: SHS1, SHS2, SN, SSt, GSH, SGHS, BHS, and normal.
For testing the goodness of the regression fit under different models for the error terms, we consider a Kolmogorov–Smirnov test based on the empirical distribution of the residuals. Specifically, we assume that the regression model can be written in a location-scale form as in (3), with a parametric error distribution. If θ̂ denotes the ML estimates of all parameters, including both the regression coefficients and the parameters of the error distribution, then the estimated error distribution under this model is F(·; θ̂). Therefore, the linear regression model with a specific distribution for the error terms is true if and only if the true and estimated error distributions coincide. By this result, a Kolmogorov–Smirnov type test for the error distribution can be constructed by comparing the empirical distribution of the residuals, F̂_n, with the one estimated under the assumed distribution, F(·; θ̂), as

T = sup_x |F̂_n(x) − F(x; θ̂)|.
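The residual-based Kolmogorov–Smirnov check can be sketched as follows; for simplicity, the illustration uses a normal-error regression with scipy.stats.kstest standing in for the general statistic (the SHS error distributions would require their own fitted CDFs):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n = 300
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta, sigma = np.array([2.0, 1.5]), 0.7
y = X @ beta + sigma * rng.standard_normal(n)

# Fit by least squares (ML under normal errors), then standardize residuals
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma_hat = resid.std(ddof=X.shape[1])
stat, pvalue = stats.kstest(resid / sigma_hat, "norm")
print(round(float(stat), 3))  # statistic near zero for a well-specified model
```

Because the comparison distribution is fitted from the same data, the nominal p-value is conservative; the reported tests should be read with that in mind.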
5.1. Acoustic comfort evaluation data
Zhang et al. [37] used a linear regression framework with Gaussian error terms to model the acoustic comfort evaluation for engineering vehicles, a setting commonly affected by high environmental noise. This high-level noise pollutes the surrounding environment and endangers the driver's physical and mental health. As Zhang et al. [37] have noted, acoustic vehicle comfort will be an essential research topic in the future due to its close connection to daily life. Therefore, it is vital to address how to evaluate the acoustic comfort of engineering vehicles.
Most studies of acoustic comfort focus on subjective and objective acoustic evaluations and their mathematical mapping [10,25,35]. The objective assessment of acoustic comfort is based on psychoacoustic parameters, including loudness, sharpness, and so on. For subjective evaluation, annoyance can be a proper index. Linear regression is a standard model for establishing the mapping between the objective psychoacoustic parameters and the subjective assessment (Zhang et al. [37]).
The data in this example include an actual application case of 50 noise samples from forklifts. The subjective evaluation index (annoyance) was divided into ten grades, in which the noise annoyance was further subdivided into five parts from low to high, each part having two grades, as shown in Table 1 of Zhang et al. [37]. Similar to Zhang et al. [37], we selected nine objective parameters as the objective evaluation indexes: linear sound pressure level (LSPL), A-weighted sound pressure level (ASPL), loudness, sharpness, roughness, fluctuation, tonality, articulation index (AI), and impulsiveness. The details of the data are given in Table 2 of Zhang et al. [37]. These data include one (or even two) extreme observations (Figure 2 in Zhang et al. [37], not reported here), which suggests that it may not be appropriate to assume Gaussian errors for the regression model. Given this feature, we considered the proposed families for the error distribution.
We first performed multiple covariate analyses by using the ML approach to select the best subset of objective parameters. Based on our results under all considered error distributions (not shown here), our analysis is centered on the model with only loudness, sharpness, and impulsiveness as the covariates. We obtained the ML estimates and their standard errors for the selected covariates and computed the AICs and BICs under all models for the errors. The results are reported in Table 4. It shows that the normal model returns substantially different regression coefficient estimates in comparison with the other models. Generally, there is no consistent agreement in the regression coefficient estimates between the given error models, except for our proposed SHS models. All three covariates have a significant positive effect on the response; i.e. higher amounts of annoyance are associated with higher values of these covariates simultaneously.
Table 4. ML inferences under different models for errors for forklift data.
| | SHS1 | SHS2 | BHS | GSH | SGHS | SSt | SN | Normal |
|---|---|---|---|---|---|---|---|---|
| Regression parameters: | ||||||||
| constant | ||||||||
| se | ||||||||
| loudness | ||||||||
| se | ||||||||
| sharpness | ||||||||
| se | ||||||||
| impulsiveness | 2.0420 | |||||||
| se | 0.4005 | |||||||
| Distributional parameters: | ||||||||
| Scale | ||||||||
| se | ||||||||
| Shape1 | ||||||||
| se | ||||||||
| Shape2 | ||||||||
| se | ||||||||
| d.f. | ||||||||
| se | ||||||||
| AIC | 59.0161 | 68.5898 | 71.7423 | 74.6952 | 84.1990 | |||
| BIC | 70.4882 | 81.9739 | 85.1265 | 84.1673 | 93.7591 | |||
Note: Bold values indicate the best selected one.
Smaller values of AIC and BIC indicate a more appropriate model. Hence, for the models we considered here, the best-ranked model fits the data better according to both AIC and BIC, with the same complexity as the others. It also fits the data much better than the normal model for the errors, and the next-best model performs almost equivalently. Table 5 also shows the results of the Kolmogorov–Smirnov test corresponding to all error models. According to the results, we can verify the goodness of fit of all models, excluding the normal model, when the size of the test is 0.1.
Table 5. The Kolmogorov–Smirnov test for forklift data.
| | SHS1 | SHS2 | BHS | GSH | SGHS | SSt | SN | Normal |
|---|---|---|---|---|---|---|---|---|
| T | 0.0577 | 0.0628 | ||||||
| P-value | 0.9928 | 0.0656 |
Overall, the results of forklift data analysis provide strong arguments in favor of modeling the tail of the errors when it is semi-heavy, not heavy.
5.2. Diamond data
Chu [8] describes the development of a pricing model for diamond stones using data that appeared in an advertisement in Singapore's Business Times edition of 18 February 2000. A total of 308 diamond stones were included in the analysis. This dataset is available in the Ecdat R package [21]. We assumed a linear model with the logarithm of the price of diamond stones, as the response variable, and the covariates, weight (carat), clarity, color, and certification body. The weight of a diamond stone is given in terms of carat units. One carat is equal to 0.2 g.
The last three covariates are categorical. Clarity of a diamond stone is classified in descending order as internally flawless (IF), very very slightly imperfect (VVS1 or VVS2), and very slightly imperfect (VS1 or VS2). The most prized diamonds display color purity. Top color purity invites a grade of D. Subsequent degrees of color purity are graded E, F, G, and so on. For the present data, we are facing six different degrees of color purity. Also, the different certification bodies assay diamond stones and provide each of them with a certificate listing their caratage and their grades of clarity and color. Three certification bodies are GIA, IGI, and HRD [8].
We included the categorical covariates in the regression model as a set of dummy variables. We selected color I as the baseline category for the color purity and compared it to the other five colors. For the clarity, we selected VVS2 as the baseline category. Likewise, for the certification body, the IGI is chosen as the baseline category. According to the results of Chu [8], we also considered an interaction term between carat and certification bodies.
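The design matrix with the chosen baselines and the carat × certification interaction can be sketched as follows (the column names and toy rows are hypothetical, not the actual dataset):

```python
import pandas as pd

# Toy rows mimicking the diamond data layout (hypothetical values)
df = pd.DataFrame({
    "carat": [0.31, 0.52, 0.44, 0.90],
    "colour": ["D", "I", "F", "I"],
    "clarity": ["IF", "VVS2", "VS1", "VVS2"],
    "certification": ["GIA", "IGI", "HRD", "IGI"],
})

# Dummies for each categorical covariate, then drop the chosen baselines
# (colour I, clarity VVS2, certification IGI)
X = pd.get_dummies(df, columns=["colour", "clarity", "certification"])
X = X.drop(columns=["colour_I", "clarity_VVS2", "certification_IGI"])

# Interaction between carat and the remaining certification dummies
for cert in ("GIA", "HRD"):
    X[f"carat_x_{cert}"] = X["carat"] * X[f"certification_{cert}"]
print(sorted(X.columns))
```

Dropping the baseline column for each factor makes the remaining dummy coefficients contrasts against that baseline, matching the interpretation used in Table 6.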
Table 6 shows the ML estimates and their standard errors for the regression coefficients and the parameters of the error distribution under different models for the error terms. Notably, the result from the normal model differs from the rest, exhibiting more inflation in the standard errors of the estimated coefficients than the others. This inflation leads to different conclusions compared to the other models; for example, based on normal errors, the difference between VS1 and VVS2 is not significant, which is not in agreement with the rest. When the other models for the errors are applied, all effects are significant and consistent in the sign of the estimates. Table 6 also reports the corresponding values of AIC and BIC. Both criteria provide strong support for our proposed model and indicate that it is the most appropriate distribution for the error terms. For this example, the performance of our two proposed models is also almost equivalent.
Table 6. ML inferences under different models for errors for diamond data.
| | SHS1 | SHS2 | BHS | GSH | SGHS | SSt | SN | Normal |
|---|---|---|---|---|---|---|---|---|
| Regression parameters: | ||||||||
| constant | ||||||||
| se | ||||||||
| carat | ||||||||
| se | ||||||||
| D | ||||||||
| se | ||||||||
| E | ||||||||
| se | ||||||||
| F | ||||||||
| se | ||||||||
| G | ||||||||
| se | ||||||||
| H | ||||||||
| se | ||||||||
| IF | ||||||||
| se | ||||||||
| VS1 | ||||||||
| se | ||||||||
| VS2 | ||||||||
| se | ||||||||
| VVS1 | ||||||||
| se | ||||||||
| GIA | ||||||||
| se | ||||||||
| HRD | ||||||||
| se | 0.1298 | |||||||
| carat × GIA | ||||||||
| se | ||||||||
| carat × HRD | ||||||||
| se | ||||||||
| Distributional parameters: | ||||||||
| Scale | 0.1052 | |||||||
| se | 0.0073 | 0.0734 | ||||||
| Shape1 | ||||||||
| se | – | 0.0582 | 2.2730 | |||||
| Shape2 | 1.8586 | |||||||
| se | – | – | 0.3920 | 0.5829 | ||||
| d.f. | 9.3565 | 5.8957 | ||||||
| se | (3.4098) | – | – | 1.2959 | ||||
| AIC | −471.6554 | −493.3932 | −421.9113 | −462.4386 | −430.2067 | −448.6855 | −459.4364 | −322.4285 |
| BIC | −408.2437 | −429.9815 | −354.7695 | −399.0269 | −363.0649 | −381.5437 | −398.0247 | −262.7469 |
Table 7 shows the results of the Kolmogorov–Smirnov test for the different error models, indicating goodness of fit for all models when the size of the test is 0.05. However, the substantially better fit of the best-ranked model can be seen clearly in Figure 8.
Table 7. The Kolmogorov–Smirnov test for diamond data.
| | SHS1 | SHS2 | BHS | GSH | SGHS | SSt | SN | Normal |
|---|---|---|---|---|---|---|---|---|
| T | 0.0725 | |||||||
| P-value | 0.0788 |
Figure 8.
Histograms of the residuals and the corresponding estimated densities for different error models for diamond data.
5.3. Martin Marietta company data
Butler et al. [7] analyzed a set of data on the excess rate of return for the Martin Marietta Company. They considered a linear regression of Y, the excess rate of return for the Martin Marietta Company, on CRSP, an index of the excess rate of return for the New York Stock Exchange, as follows:
Y_i = β₀ + β₁ CRSP_i + ε_i,  i = 1, …, 60,  (4)
The data consist of 60 monthly observations collected over 5 years, from January 1982 to December 1986. This dataset has also been analyzed by Azzalini and Capitanio [4], Taylor and Verbyla [33], DiCiccio and Monti [11], and Salazar et al. [28]. These data include one very extreme observation (Figure 9); moreover, according to DiCiccio and Monti [11], some diagnostic measures show that two additional points are possible outliers. This indicates that assuming Gaussian errors for the regression model may not be appropriate.
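The effect of such outliers on ML regression estimates can be illustrated with a small sketch. Student-t errors are used here purely as a stand-in for a non-Gaussian error model (the paper's own families are not in scipy), and the data and variable names are simulated, not the Martin Marietta data:

```python
# Sketch: ML estimation of a simple linear regression under non-normal
# errors, as an alternative to Gaussian least squares when a few
# outliers are present.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
crsp = rng.normal(0.0, 0.05, 60)                     # mock market index
y = 0.001 + 1.25 * crsp + rng.normal(0.0, 0.03, 60)  # true slope 1.25
y[0] += 0.6                                          # one extreme observation

def negloglik(theta):
    """Negative log-likelihood of the linear model with t errors."""
    b0, b1, log_scale, log_df = theta
    resid = y - b0 - b1 * crsp
    return -stats.t.logpdf(resid, df=np.exp(log_df),
                           scale=np.exp(log_scale)).sum()

res = optimize.minimize(negloglik,
                        x0=[0.0, 1.0, np.log(0.03), np.log(4.0)],
                        method="Nelder-Mead")
b0_hat, b1_hat = res.x[:2]
# The heavier-tailed likelihood downweights the outlier, so b1_hat
# stays near the true slope; an OLS fit would be pulled toward y[0].
```

The log-transformed scale and degrees of freedom keep the optimization unconstrained, a common device when maximizing such likelihoods numerically.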
Figure 9.
Scatterplot of the Martin Marietta data and fitted regression lines with different distributions for error term.
We now consider the proposed families for the error distribution. The ML inferences are reported in Table 8, which shows that the normal model returns substantially different regression coefficient estimates from the other models. As Butler et al. [7] also note, the regression coefficients estimated under the normal model place the Martin Marietta Company in a different risk classification: the normal estimate of the slope rates the company as a very aggressive investment relative to the market portfolio, whereas the corresponding estimates under the other models classify it as moderately aggressive. Further, the estimates for the SSt and SN models differ somewhat from those of the semi-heavy-tailed families. The smaller slope estimates in our proposed semi-heavy-tailed models, relative to the SSt and SN models, further support classifying the company as moderately aggressive.
Table 8. ML inferences under different models for errors for the Martin Marietta Company data.
| BHS | GSH | SGHS | SN | SSt | Normal | |||
|---|---|---|---|---|---|---|---|---|
| Regression parameters: | ||||||||
| Intercept | −0.0275 | −0.0460 | −0.0492 | −0.0073 | −0.0075 | −0.0933 | −0.0755 | 0.0011 |
| (0.0125) | (0.0167) | (0.0332) | (0.0087) | (0.0089) | (0.0125) | (0.0207) | (0.0087) | |
| Slope | 1.2482 | 1.2643 | 1.2515 | 1.2456 | 1.2459 | 1.3790 | 1.3391 | 1.8024 |
| (0.1937) | (0.1979) | (0.2035) | (0.2029) | (0.1942) | (0.2415) | (0.2087) | (0.2015) | |
| Distributional parameters: | ||||||||
| Scale | 0.0313 | 0.0400 | 0.0368 | 0.0911 | 0.0849 | 0.1362 | 0.0937 | 0.0928 |
| (0.0076) | (0.0072) | (0.0232) | (0.0180) | (0.0155) | (0.0293) | (0.0042) | (0.0085) | |
| Shape1 | 0.4386 | 1.2657 | −2.5607 | 0.9019 | 3.9154 | 2.4587 | ||
| (0.1555) | (1.3370) | (0.3520) | (0.0752) | (1.4049) | (1.4575) | |||
| Shape2 | 0.5108 | −2.4090 | ||||||
| (0.3731) | (0.4474) | |||||||
| d.f. | 0.4320 | 6.0660 | ||||||
| (0.1488) | (5.4508) | |||||||
| AIC | −135.1765 | −135.3089 | −133.176 | −132.1232 | −124.0341 | −134.0729 | −124.0341 | −108.9898 |
| BIC | −126.7991 | −126.9316 | −122.7043 | −123.7458 | −115.6567 | −123.6012 | −115.6567 | −102.7068 |
Among the models considered here, the proposed model fits the data best according to both AIC and BIC; in particular, it fits much better than the normal model for the errors. As in the two previous examples, the two proposed models perform almost equivalently.
Figure 9 shows the fitted regression lines under the assumed distributions for the error term in model (4). For the normal fit, the regression line is pulled upward by the extreme observations. The semi-heavy-tailed models, in contrast, limit the influence of extreme observations and provide a robust alternative when a few outliers are present. On the other hand, the SN and SSt models, owing to their heavy tails, give high-leverage observations even less influence than the semi-heavy-tailed models, which pulls their regression lines further downward. Therefore, a heavy-tail assumption for the error distribution results in a poor model fit for these data.
For these data, the Kolmogorov–Smirnov test (Table 9) also verifies the goodness of the regression fit under all error models.
Table 9. The Kolmogorov–Smirnov test for Martin Marietta data.
| BHS | GSH | SGHS | SSt | SN | Normal | |||
|---|---|---|---|---|---|---|---|---|
| T | 0.0864 | 0.0852 | 0.0897 | 0.0761 | 0.0759 | 0.0768 | 0.1386 | 0.1474 |
| P-value | 0.7289 | 0.7436 | 0.6861 | 0.8513 | 0.8534 | 0.8445 | 0.1816 | 0.1334 |
6. Conclusions
Considering the semi-heavy-tailedness of data, in the presence of a few outliers, can provide more efficient inferences in real applications when the data do not possess heavy tails. We developed a regression methodology based on semi-heavy-tailed errors in a linear model under two new skewed versions of the hyperbolic secant distribution. We fitted our models by maximum likelihood, the current standard approach. We also examined the asymptotic properties of the ML estimators of the regression parameters in our proposed methodology, including consistency and asymptotic normality.
This article demonstrates the utility of the proposed methodology in applied situations where the tails of the data are not too heavy, using simulated data as well as data from real applications. Our suggested models are also robust to actual heavy-tailedness of the data.
Future work will include extensions of the models to multivariate and spatial data. An extension to multivariate spatial data would involve considering not only the spatial dependence of each variable but also the spatial dependence between each pair of variables.
Supplementary Material
Acknowledgments
The authors wish to thank two referees and the associate editor for their valuable comments and suggestions that improved the previous version of this paper.
Note
Install the package from GitHub with library(devtools); install_git("git://github.com/jamilownuk/shs.git").
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1.Akaike H., A new look at the statistical model identification, in Selected Papers of Hirotugu Akaike, Parzen, Emanuel, Tanabe, Kunio, and Kitagawa, Genshiro, eds., Springer, New York, 1974, pp. 215–222
- 2.Arellano-Valle R.B. and Azzalini A., The centred parameterization and related quantities of the skew-t distribution, J. Multivar. Anal. 113 (2013), pp. 73–90. doi: 10.1016/j.jmva.2011.05.016 [DOI] [Google Scholar]
- 3.Azzalini A., A class of distributions which includes the normal ones, Scand. J. Stat. 12 (1985), pp. 171–178. [Google Scholar]
- 4.Azzalini A. and Capitanio A., Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t distribution, J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 65 (2003), pp. 367–389. doi: 10.1111/1467-9868.00391 [DOI] [Google Scholar]
- 5.Bartolucci F. and Scaccia L., The use of mixtures for dealing with non-normal regression errors, Comput. Stat. Data. Anal. 48 (2005), pp. 821–834. doi: 10.1016/j.csda.2004.04.005 [DOI] [Google Scholar]
- 6.Broyden C.G., The convergence of a class of double-rank minimization algorithms 1. General considerations, IMA J. Appl. Math. 6 (1970), pp. 76–90. doi: 10.1093/imamat/6.1.76 [DOI] [Google Scholar]
- 7.Butler R.J., McDonald J.B., Nelson R.D., and White S.B., Robust and partially adaptive estimation of regression models, Rev. Econ. Stat. 72 (1990), pp. 321–327. doi: 10.2307/2109722 [DOI] [Google Scholar]
- 8.Chu S., Pricing the C's of diamond stones, J. Stat. Educ. 9 (2001), pp. 1–12. 10.1080/10691898.2001.11910659. [DOI] [Google Scholar]
- 9.da Silva N.B., Prates M.O., and Gonçalves F.B., Bayesian linear regression models with flexible error distributions, arXiv preprint arXiv:1711.04376.
- 10.Di G.Q., Chen X.W., Song K., Zhou B., and Pei C.M., Improvement of Zwicker's psychoacoustic annoyance model aiming at tonal noises, Appl. Acoust. 105 (2016), pp. 164–170. doi: 10.1016/j.apacoust.2015.12.006 [DOI] [Google Scholar]
- 11.DiCiccio T.J. and Monti A.C., Inferential aspects of the skew exponential power distribution, J. Am. Stat. Assoc. 99 (2004), pp. 439–450. doi: 10.1198/016214504000000359 [DOI] [Google Scholar]
- 12.Fernandez C. and Steel M.F., On Bayesian modeling of fat tails and skewness, J. Am. Stat. Assoc. 93 (1998), pp. 359–371. [Google Scholar]
- 13.Ferreira C.S. and Lachos V.H., Nonlinear regression models under skew scale mixtures of normal distributions, Stat. Methodol. 33 (2016), pp. 131–146. doi: 10.1016/j.stamet.2016.08.004 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Ferreira J.T.S. and Steel M.F.J., A constructive representation of univariate skewed distributions, J. Am. Stat. Assoc. 101 (2006), pp. 823–829. doi: 10.1198/016214505000001212 [DOI] [Google Scholar]
- 15.Fischer M.J., Generalized Hyperbolic Secant Distributions: with Applications to Finance, Springer Science & Business Media, Heidelberg, 2013. [Google Scholar]
- 16.Fischer M.J. and Vaughan D., Classes of skew generalized hyperbolic secant distributions, Diskussionspapiere, Friedrich-Alexander-Universität Erlangen-Nürnberg, Lehrstuhl für Statistik und Ökonometrie, 2002
- 17.Fischer M.J. and Vaughan D., The beta-hyperbolic secant distribution, Austrian J. Stat. 39 (2010), pp. 245–258. doi: 10.17713/ajs.v39i3.247 [DOI] [Google Scholar]
- 18.Fisher R.A., On the 'probable error' of a coefficient of correlation deduced from a small sample, Metron 1 (1921), pp. 1–32. [Google Scholar]
- 19.Fletcher R., A new approach to variable metric algorithms, Comput. J. 13 (1970), pp. 317–322. doi: 10.1093/comjnl/13.3.317 [DOI] [Google Scholar]
- 20.Goldfarb D., A family of variable-metric methods derived by variational means, Math. Comput. 24 (1970), pp. 23–26. doi: 10.1090/S0025-5718-1970-0258249-6 [DOI] [Google Scholar]
- 21.Graves S., Ecdat: Data Sets for Econometrics, R package (2019) version 0.3-3.
- 22.Jones M. and Faddy M., A skew extension of the t distribution, with applications, J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 65 (2003), pp. 159–174. doi: 10.1111/1467-9868.00378 [DOI] [Google Scholar]
- 23.Lange K.L., Little R.J., and Taylor J.M., Robust statistical modeling using the t-distribution, J. Am. Stat. Assoc. 84 (1989), pp. 881–896. [Google Scholar]
- 24.Ma Y. and Genton M.G., Flexible class of skew-symmetric distributions, Scand. J. Stat. 31 (2004), pp. 459–468. doi: 10.1111/j.1467-9469.2004.03_007.x [DOI] [Google Scholar]
- 25.Ma C., Ma C., Li Q., Liu Q., Wang D., Gau J., Tang H., and Sun Y., Sound quality evaluation of noise of hub permanent-magnet synchronous motors for electric vehicles, IEEE Trans. Ind. Electron. 63 (2016), pp. 5663–5673. doi: 10.1109/TIE.2016.2569067 [DOI] [Google Scholar]
- 26.Omey E., Van Gulck S., and Vesilo R., Semi-heavy tails, Lith. Math. J. 58 (2018), pp. 480–499. doi: 10.1007/s10986-018-9417-0 [DOI] [Google Scholar]
- 27.R Core Team , R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2019. Available at http://www.R-project.org/. [Google Scholar]
- 28.Salazar E., Ferreira M.A., and Migon H.S., Objective Bayesian analysis for exponential power regression models, Sankhya B 74 (2012), pp. 107–125. doi: 10.1007/s13571-012-0045-0 [DOI] [Google Scholar]
- 29.Schwarz G., Estimating the dimension of a model, Ann. Stat. 6 (1978), pp. 461–464. doi: 10.1214/aos/1176344136 [DOI] [Google Scholar]
- 30.Shanno D.F., Conditioning of quasi-Newton methods for function minimization, Math. Comput. 24 (1970), pp. 647–656. doi: 10.1090/S0025-5718-1970-0274029-X [DOI] [Google Scholar]
- 31.Soffritti G. and Galimberti G., Multivariate linear regression with non-normal errors: a solution based on mixture models, Stat. Comput. 21 (2011), pp. 523–536. doi: 10.1007/s11222-010-9190-3 [DOI] [Google Scholar]
- 32.Takeuchi I., Bengio Y., and Kanamori T., Robust regression with asymmetric heavy-tail noise distributions, Neural Comput. 14 (2002), pp. 2469–2496. doi: 10.1162/08997660260293300 [DOI] [PubMed] [Google Scholar]
- 33.Taylor J. and Verbyla A., Joint modelling of location and scale parameters of the t distribution, Stat. Modell., 4 (2004), pp. 91–112. doi: 10.1191/1471082X04st068oa [DOI] [Google Scholar]
- 34.Vaughan D.C., The generalized secant hyperbolic distribution and its properties, Commun. Stat. Theory Methods 31 (2002), pp. 219–238. doi: 10.1081/STA-120002647 [DOI] [Google Scholar]
- 35.Xu Z.M., Xia X.J., He Y.S., and Zhang F.Z., Analysis and evaluation of car engine starting sound quality, J. Vib. Shock 11 (2014), pp. 142–147. [Google Scholar]
- 36.Zeller C.B., Cabral C.R., and Lachos V.H., Robust mixture regression modeling based on scale mixtures of skew-normal distributions, Test 25 (2016), pp. 375–396. doi: 10.1007/s11749-015-0460-4 [DOI] [Google Scholar]
- 37.Zhang E., Zhang Q., Xiao J., Hou L., and Guo T., Acoustic comfort evaluation modeling and improvement test of a forklift based on rank score comparison and multiple linear regression, Appl. Acoust. 135 (2018), pp. 29–36. doi: 10.1016/j.apacoust.2018.01.026 [DOI] [Google Scholar]