ABSTRACT
The heteroscedastic nonlinear regression model (HNLM) is an important tool in data modeling. In this paper we propose a HNLM based on skew scale mixtures of normal (SSMN) distributions, which allows fitting asymmetric and heavy-tailed data simultaneously. Maximum likelihood (ML) estimation is performed via the expectation-maximization (EM) algorithm. The observed information matrix is derived analytically in order to obtain the standard errors. In addition, diagnostic analysis is developed using case-deletion measures and the local influence approach. Simulation studies are carried out to examine the empirical distribution of the likelihood ratio statistic, the power of the test for homogeneity of variances, and the effect of misspecification of the variance structure function. The proposed method is also illustrated by analyzing a real dataset.
KEYWORDS: EM algorithm, heteroscedastic nonlinear regression models, influence diagnostics, likelihood ratio test, skew scale mixtures of normal distributions
1. Introduction
Nonlinear regression models (NLM) are applied in many areas of science to model data for which nonlinear functions of unknown parameters are used to explain or describe the phenomena under study. For a broad discussion of nonlinear models, see, for instance, [4,30]. According to [5], a large degree of heteroscedasticity is more commonly seen with data that are best fit by a nonlinear regression model than with data that can be adequately fit by a linear model. In addition, it is well known that several phenomena exhibit asymmetry and/or heavy- or light-tailed behavior, so it is necessary to work with more flexible classes of distributions.
Some authors have proposed homoscedastic nonlinear regression models with an asymmetric structure in the error term. For instance, Cancho et al. [7] introduced the skew-normal nonlinear regression model (SN-NLM) and presented a complete likelihood-based analysis, including an efficient EM-type algorithm for ML estimation. Garay et al. [18] extended the SN-NLM by using the scale mixtures of skew-normal (SMSN) distributions, proposed by Branco and Dey [6], in the error structure, allowing the modeling of data with skewness and heavy tails simultaneously. More recently, Ferreira and Lachos [16] extended the SN-NLM using the skew scale mixtures of normal (SSMN) distributions [15]. This class of distributions provides a useful generalization of the symmetric and asymmetric NLM, since the error term distributions cover both asymmetric and heavy-tailed distributions, such as the skew-t-normal, skew-slash and skew-contaminated normal, among others. There are some important differences between the SSMN and SMSN classes of distributions (see, for example, [16]).
On the other hand, heteroscedastic nonlinear regression models (HNLM) have recently been studied by several authors. For example, Xie et al. [32,33] developed score statistics for testing homogeneity of variances in the SN-NLM. Lin et al. [24] developed diagnostic tools for skew-t-normal nonlinear models and investigated the properties of a score test statistic for homogeneity of the variance through Monte Carlo simulations. Louzada et al. [26] proposed a HNLM with a skew-normal structure in the presence of heteroscedasticity applied to growth curve modeling. Garay et al. [19] developed diagnostic analysis for HNLM under SMSN distributions (SMSN-HNLM) and presented a score test for checking the homogeneity of the scale parameter. More recently, Araújo et al. [2] addressed the issue of hypothesis testing for the dispersion parameter in symmetric HNLM using the likelihood ratio test.
The assessment of robustness aspects of the parameter estimates in statistical models has received special attention in recent decades. The identification of influential observations and other model departures may suggest ways to improve the model assumptions. The case-deletion measures [10] consist of studying the impact on the parameter estimates of dropping individual observations. The influence of small perturbations in the model/data on the parameter estimates can be assessed by performing local influence analysis [11]. Zhu et al. [34] proposed the selection of an appropriate perturbation scheme and the development of influence measures for objective functions at a point with a nonzero first derivative, based on the observed log-likelihood function. Chen et al. [9] extended Zhu et al. [34]'s approach to the analysis of complex latent variable models using the complete-data log-likelihood. Another approach to case-deletion measures and local influence analysis, based on the conditional expectation of the complete-data log-likelihood at the E-step of the EM algorithm, was developed by Zhu et al. [36] and Zhu and Lee [35], respectively. For some applications of Zhu and Lee [35]'s approach in the context of asymmetric models, we refer to [14,17,23,28], among others.
In this work, we propose the heteroscedastic nonlinear regression model with errors following a SSMN distribution (SSMN-HNLM), generalizing the SSMN-NLM proposed by Ferreira and Lachos [16]. The ML estimates of the model parameters are obtained via the EM algorithm and the observed information matrix is derived analytically. In addition, we develop influence diagnostic tools (case-deletion measures and local influence) for our model based on Zhu and Lee [35]'s well-known approach, which relies on the conditional expectation of the complete-data log-likelihood (the Q-function). We perform a simulation study to verify the asymptotic distribution of the likelihood ratio test statistic and its empirical power for testing homogeneity of variances. Further, a study of misspecification of the structure function is performed.
The rest of the paper is organized as follows. In Section 2, we present some properties of the univariate SSMN family. Section 3 outlines the SSMN-HNLM and the EM algorithm for maximum likelihood estimation. In Section 4, we discuss the log-likelihood ratio test for checking homogeneity of a scale parameter and investigate its properties through Monte Carlo simulations. The case deletion measures and local influence of three perturbation schemes are derived in Section 5. The proposed method is illustrated in Section 6 by analyzing a real dataset, and some concluding remarks are presented in Section 7.
2. Skew scale mixtures of normal distributions
In order to motivate our proposed methods, we present a brief introduction to the skew-normal (SN), the scale mixture of normal (SMN) and the SSMN class of distributions. For further details we refer to [15,17].
Let $\phi(x)$ and $\Phi(x)$ be the probability density function (pdf) and the cumulative distribution function (cdf), respectively, of the standard normal $\mathrm{N}(0,1)$ distribution evaluated at $x$. A random variable $Y$ follows a univariate skew-normal distribution [3] with location parameter $\mu \in \mathbb{R}$, scale parameter $\sigma^2 > 0$ and skewness parameter $\lambda \in \mathbb{R}$ if its pdf is given by:

$$f(y)=\frac{2}{\sigma}\,\phi\!\left(\frac{y-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right),\quad y\in\mathbb{R}. \qquad (1)$$

For a random variable with pdf as in (1), we use the notation $Y\sim \mathrm{SN}(\mu,\sigma^{2},\lambda)$. When $\lambda=0$, the skew-normal distribution reduces to the normal distribution ($\mathrm{N}(\mu,\sigma^{2})$). Its marginal stochastic representation [21], which can be used to derive several of its properties, is given by:

$$Y \stackrel{d}{=} \mu+\sigma\left(\delta\,|T_{0}|+\sqrt{1-\delta^{2}}\;T_{1}\right), \qquad (2)$$

where $T_{0}\sim \mathrm{N}(0,1)$ and $T_{1}\sim \mathrm{N}(0,1)$ are independent, $|T_{0}|$ denotes the absolute value of $T_{0}$ and '$\sim$' means 'distributed as'. The expectation and variance of $Y$ are given, respectively, by:

$$\mathrm{E}[Y]=\mu+\sqrt{\frac{2}{\pi}}\,\sigma\delta \quad\text{and}\quad \mathrm{Var}[Y]=\sigma^{2}\left(1-\frac{2}{\pi}\delta^{2}\right), \qquad (3)$$

where $\delta=\lambda/\sqrt{1+\lambda^{2}}$.
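A minimal R sketch of representation (2) and the moments in (3) is given below; the function name rsn_rep is ours, not from the paper.

```r
# Simulate SN(mu, sigma2, lambda) via the stochastic representation (2) (sketch).
rsn_rep <- function(n, mu = 0, sigma2 = 1, lambda = 0) {
  delta <- lambda / sqrt(1 + lambda^2)
  t0 <- rnorm(n)                                   # T0 ~ N(0,1)
  t1 <- rnorm(n)                                   # T1 ~ N(0,1), independent of T0
  mu + sqrt(sigma2) * (delta * abs(t0) + sqrt(1 - delta^2) * t1)
}

set.seed(1)
y <- rsn_rep(1e5, mu = 2, sigma2 = 4, lambda = 3)
delta <- 3 / sqrt(1 + 3^2)
c(mean(y), 2 + sqrt(2 / pi) * 2 * delta)           # empirical vs. theoretical mean, Eq. (3)
c(var(y), 4 * (1 - (2 / pi) * delta^2))            # empirical vs. theoretical variance, Eq. (3)
```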
A random variable $Y$ follows a SMN distribution [1] with location parameter $\mu\in\mathbb{R}$ and scale parameter $\sigma^{2}>0$ if its pdf assumes the form:

$$f_{0}(y)=\int_{0}^{\infty}\phi\big(y;\mu,\kappa(u)\sigma^{2}\big)\,\mathrm{d}H(u;\nu), \qquad (4)$$

where $H(\cdot;\nu)$ is the cdf of a positive random variable $U$ indexed by the parameter vector $\nu$, $\kappa(u)$ is a strictly positive function and $\phi(\cdot;\mu,\sigma^{2})$ denotes the $\mathrm{N}(\mu,\sigma^{2})$ pdf. For a random variable with a pdf as in (4), we use the notation $Y\sim \mathrm{SMN}(\mu,\sigma^{2};H)$.
A random variable $Y$ follows a SSMN distribution [15] with location parameter $\mu\in\mathbb{R}$, scale factor $\sigma^{2}>0$ and skewness parameter $\lambda\in\mathbb{R}$, if its pdf is given by:

$$f(y)=2\,f_{0}(y)\,\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right),\quad y\in\mathbb{R}, \qquad (5)$$

where $f_{0}(\cdot)$ is a SMN density as defined in (4). For a random variable with pdf as in (5), we use the notation $Y\sim \mathrm{SSMN}(\mu,\sigma^{2},\lambda;H)$. If $\mu=0$ and $\sigma^{2}=1$, we refer to it as the standard SSMN distribution and we denote it by $Y\sim \mathrm{SSMN}(\lambda;H)$. Clearly, when $\lambda=0$, we get the corresponding SMN distributions proposed by Andrews and Mallows [1].
For a SSMN random variable, a convenient hierarchical representation is given next, which can be used to quickly simulate realizations of Y and to implement the EM algorithm.
Let $Y\sim \mathrm{SSMN}(\mu,\sigma^{2},\lambda;H)$. Then its hierarchical representation is given by:

$$Y\mid U=u \;\sim\; \mathrm{SN}\big(\mu,\kappa(u)\sigma^{2},\kappa(u)^{1/2}\lambda\big), \qquad U\sim H(\cdot;\nu). \qquad (6)$$
Thus, the distributions in the SSMN class that will be considered in this work are:
- The skew Student-t-normal distribution (StN) [20], with $\nu>0$ degrees of freedom, denoted by $Y\sim \mathrm{StN}(\mu,\sigma^{2},\lambda;\nu)$, arises when $U\sim \mathrm{Gamma}(\nu/2,\nu/2)$ and $\kappa(u)=1/u$, and has pdf
$$f(y)=\frac{2\,\Gamma\!\left(\frac{\nu+1}{2}\right)}{\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}\,\sigma}\left(1+\frac{d}{\nu}\right)^{-\frac{\nu+1}{2}}\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right), \qquad (7)$$
where $d=(y-\mu)^{2}/\sigma^{2}$ and $\Gamma(\cdot)$ is the gamma function. When $\nu\uparrow\infty$, we obtain the SN distribution as the limiting case. Lastly, $U\mid Y=y \sim \mathrm{Gamma}\big(\tfrac{\nu+1}{2},\tfrac{\nu+d}{2}\big)$.
- The skew slash distribution (SSL), denoted by $Y\sim \mathrm{SSL}(\mu,\sigma^{2},\lambda;\nu)$, arises when $\kappa(u)=1/u$ and $U\sim \mathrm{Beta}(\nu,1)$, with $\nu>0$. Its pdf is given by:
$$f(y)=2\nu\int_{0}^{1}u^{\nu-1}\,\phi\big(y;\mu,\sigma^{2}/u\big)\,\mathrm{d}u\;\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right). \qquad (8)$$
The skew slash distribution reduces to the SN distribution when $\nu\uparrow\infty$. It is easy to see that $U\mid Y=y \sim \mathrm{TG}\big(\nu+\tfrac{1}{2},\tfrac{d}{2};(0,1)\big)$, where $\mathrm{TG}(a,b;(0,1))$ is the $\mathrm{Gamma}(a,b)$ distribution truncated to the interval $(0,1)$.
- The skew contaminated normal distribution (SCN), denoted by $Y\sim \mathrm{SCN}(\mu,\sigma^{2},\lambda;\nu,\gamma)$, with $0<\nu<1$ and $0<\gamma<1$. Here, $\kappa(u)=1/u$ and $U$ is a discrete random variable taking one of two states. The probability density function of $U$ is given by:
$$h(u;\nu,\gamma)=\nu\,\mathbb{I}_{(u=\gamma)}+(1-\nu)\,\mathbb{I}_{(u=1)}. \qquad (9)$$
The skew contaminated normal distribution reduces to the SN distribution when $\gamma=1$. The conditional distribution of $U$ given $Y=y$ is given by:
$$f(u\mid y)=\frac{1}{S}\left\{\nu\,\gamma^{1/2}\,\mathrm{e}^{-\gamma d/2}\,\mathbb{I}_{(u=\gamma)}+(1-\nu)\,\mathrm{e}^{-d/2}\,\mathbb{I}_{(u=1)}\right\},$$
where $S=\nu\,\gamma^{1/2}\,\mathrm{e}^{-\gamma d/2}+(1-\nu)\,\mathrm{e}^{-d/2}$.
- The skew power-exponential distribution (SPE), denoted by $Y\sim \mathrm{SPE}(\mu,\sigma^{2},\lambda,\nu)$, with $1/2<\nu\leq 1$ and $\kappa(u)=1/u$, has pdf given by:
$$f(y)=2\,f_{\mathrm{PE}}(y;\mu,\sigma^{2},\nu)\,\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right), \qquad (10)$$
which reduces to the SN distribution when $\nu=1$, where $f_{\mathrm{PE}}(\cdot;\mu,\sigma^{2},\nu)$ denotes the symmetric power-exponential density with shape parameter $\nu$. Although the conditional distribution of $U\mid Y=y$ is not known in closed form, Ferreira et al. [15] showed that:
$$\mathrm{E}[U\mid Y=y]=\nu\,d^{\,\nu-1}. \qquad (11)$$
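A minimal R sketch of how a SSMN variate can be simulated is given below for the StN case. It does not follow the hierarchical route in (6) literally; instead it uses the selection form implied by density (5): draw a value from the symmetric SMN part, accept it with probability $\Phi(\lambda(x-\mu)/\sigma)$, and otherwise reflect it about $\mu$. The function name rstn is ours.

```r
# Simulate StN(mu, sigma2, lambda; nu), a member of the SSMN class (sketch, selection form).
rstn <- function(n, mu = 0, sigma2 = 1, lambda = 0, nu = 4) {
  u <- rgamma(n, shape = nu / 2, rate = nu / 2)    # StN mixing: U ~ Gamma(nu/2, nu/2)
  x <- rnorm(n, mean = mu, sd = sqrt(sigma2 / u))  # X | U = u ~ N(mu, sigma2/u), kappa(u) = 1/u
  w <- rnorm(n)                                    # latent selection variable
  # accept x when w <= lambda * (x - mu)/sigma, otherwise reflect about mu
  ifelse(w <= lambda * (x - mu) / sqrt(sigma2), x, 2 * mu - x)
}

set.seed(2)
y <- rstn(5e4, mu = 1, sigma2 = 2, lambda = 2, nu = 5)
hist(y, breaks = 80, freq = FALSE, main = "Simulated StN sample")
```

Replacing the Gamma draw for U by the Beta, two-point or power-exponential mixing distributions listed above gives the SSL, SCN and SPE cases under the same selection scheme.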
3. The model and the EM algorithm for ML estimation
3.1. The model
The SSMN-HNLM is defined by:
$$Y_{i}=\eta(\beta,x_{i})+\varepsilon_{i},\qquad \varepsilon_{i}\stackrel{\mathrm{ind}}{\sim}\mathrm{SSMN}\big(0,\sigma_{i}^{2},\lambda;H\big),\qquad \sigma_{i}^{2}=\sigma^{2}m(\rho;z_{i}),\qquad i=1,\dots,n, \qquad (12)$$

where $Y_{i}$ is the response variable, $x_{i}$ is a known covariate vector, $\eta(\beta,x_{i})$ is the nonlinear predictor, where $\eta(\cdot)$ is an injective and twice continuously differentiable function with respect to the vector of unknown regression coefficients $\beta$, $m(\rho;z_{i})$ is a known positive continuously differentiable function, $z_{i}$ contains values of the explanatory variables, which constitute in general, although not necessarily, a subset of $x_{i}$, and $\rho$ is a $q\times 1$ vector of unknown parameters (see [19,32] for more details). We assume that there is a unique value $\rho_{0}$ such that $m(\rho_{0};z_{i})=1$ for all $z_{i}$.
Using Equations (4) and (5), it follows that the observed-data log-likelihood function of the parameter vector $\theta=(\beta^{\top},\sigma^{2},\lambda,\rho^{\top},\nu)^{\top}$ can be expressed as:

$$\ell(\theta\mid \mathbf{y})=\sum_{i=1}^{n}\log\left[2\int_{0}^{\infty}\phi\big(y_{i};\mu_{i},\kappa(u)\sigma_{i}^{2}\big)\,h(u;\nu)\,\mathrm{d}u\;\Phi\!\left(\lambda\,\frac{y_{i}-\mu_{i}}{\sigma_{i}}\right)\right], \qquad (13)$$

where $\mu_{i}=\eta(\beta,x_{i})$, $\sigma_{i}^{2}=\sigma^{2}m(\rho;z_{i})$, $h(\cdot;\nu)$ is the pdf of $U$ and $\mathbf{y}=(y_{1},\dots,y_{n})^{\top}$ is the vector of observed values of the response variable $Y$.
3.2. The ECME algorithm for the SSMN-HNLM model
Note that it is not possible to obtain an analytical solution for the ML estimates of $\theta$ by maximizing (13) directly. The hierarchical representation of the SSMN distributions, see Equations (2) and (6), enables the construction of an EM-type algorithm [13] for ML estimation of the SSMN-HNLM. When the M-step of the EM algorithm turns out to be analytically intractable, it can be replaced with a sequence of conditional maximization (CM) steps, yielding the ECM algorithm [27]. Another option is the ECME algorithm [25], a faster extension of the EM and ECM algorithms, obtained by maximizing the constrained Q-function (the expected complete-data log-likelihood) in some CM steps and the corresponding constrained actual marginal likelihood function in the remaining steps, called CML steps.
In this section, we demonstrate how to employ the ECME algorithm for ML estimation of the SSMN-HNLM model. From Equations (2) and (6), the following hierarchical representation for can be obtained:
(14) |
where $\mathrm{TN}(\mu,\sigma^{2};(a,b))$ denotes the univariate normal distribution $\mathrm{N}(\mu,\sigma^{2})$ truncated to the interval $(a,b)$.
Using Lemma 1 presented by Ferreira and Lachos [16], and after some algebraic manipulations, the joint distribution of $(Y_{i},T_{i},U_{i})$ can be written as:
Let $\mathbf{t}=(t_{1},\dots,t_{n})^{\top}$ and $\mathbf{u}=(u_{1},\dots,u_{n})^{\top}$. Considering $\mathbf{t}$ and $\mathbf{u}$ as missing data, it follows that the complete-data log-likelihood function associated with $\mathbf{y}_{c}=(\mathbf{y}^{\top},\mathbf{t}^{\top},\mathbf{u}^{\top})^{\top}$ is given by:
where $C$ is a constant not depending on the unknown parameters in $\theta$.
Given the current estimate $\widehat{\theta}^{(k)}$, the E-step calculates the function
(15) |
with and
It is important to note that computing these quantities requires expressions for $\widehat{u}_{i}=\mathrm{E}[U_{i}\mid y_{i},\widehat{\theta}^{(k)}]$, $\widehat{t}_{i}=\mathrm{E}[T_{i}\mid y_{i},\widehat{\theta}^{(k)}]$ and $\widehat{t_{i}^{2}}=\mathrm{E}[T_{i}^{2}\mid y_{i},\widehat{\theta}^{(k)}]$.
As presented by Ferreira and Lachos [16], $T_{i}\mid y_{i},\theta \sim \mathrm{TN}(\eta_{i},1;(0,\infty))$, so the expectations $\widehat{t}_{i}$ and $\widehat{t_{i}^{2}}$ can be readily evaluated by:

$$\widehat{t}_{i}=\widehat{\eta}_{i}+\frac{\phi(\widehat{\eta}_{i})}{\Phi(\widehat{\eta}_{i})}, \qquad (16)$$

$$\widehat{t_{i}^{2}}=\widehat{\eta}_{i}^{2}+1+\widehat{\eta}_{i}\,\frac{\phi(\widehat{\eta}_{i})}{\Phi(\widehat{\eta}_{i})}, \qquad (17)$$

where $\eta_{i}=\lambda(y_{i}-\mu_{i})/\sigma_{i}$ and $\widehat{\eta}_{i}$ denotes $\eta_{i}$ evaluated at $\widehat{\theta}^{(k)}$, for $i=1,\dots,n$.
Updating $\widehat{u}_{i}$, as in [15], we have computationally attractive expressions for $\widehat{u}_{i}$ for different SSMN distributions, as presented in Table 1.
Table 1. $\widehat{u}_{i}=\mathrm{E}[U_{i}\mid y_{i},\widehat{\theta}^{(k)}]$ for different SSMN distributions.

Distribution | $\widehat{u}_{i}$
---|---
SN | 1
StN | $\dfrac{\nu+1}{\nu+d_{i}}$
SSL | $\dfrac{2\nu+1}{d_{i}}\,\dfrac{P_{1}\!\big(\nu+\frac{3}{2},\,\frac{d_{i}}{2}\big)}{P_{1}\!\big(\nu+\frac{1}{2},\,\frac{d_{i}}{2}\big)}$
SCN | $\dfrac{1-\nu+\nu\gamma^{3/2}\mathrm{e}^{(1-\gamma)d_{i}/2}}{1-\nu+\nu\gamma^{1/2}\mathrm{e}^{(1-\gamma)d_{i}/2}}$
SPE | $\nu\,d_{i}^{\,\nu-1}$

$P_{x}(a,b)$ denotes the cdf of the $\mathrm{Gamma}(a,b)$ distribution evaluated at $x$, and $d_{i}=(y_{i}-\widehat{\mu}_{i})^{2}/\widehat{\sigma}_{i}^{2}$.
Thus, the CM-step conditionally maximizes $Q(\theta\mid\widehat{\theta}^{(k)})$ with respect to $\theta$, obtaining a new estimate $\widehat{\theta}^{(k+1)}$, as described below:
E-step: For $i=1,\dots,n$, compute $\widehat{t}_{i}$ and $\widehat{t_{i}^{2}}$ using Equations (16)–(17), and $\widehat{u}_{i}$ from Table 1.
- CM-step: Update $\widehat{\beta}^{(k+1)}$, $\widehat{\sigma}^{2(k+1)}$, $\widehat{\lambda}^{(k+1)}$ and $\widehat{\rho}^{(k+1)}$ by maximizing $Q(\theta\mid\widehat{\theta}^{(k)})$ over $\beta$, $\sigma^{2}$, $\lambda$ and $\rho$; the resulting expressions are written in terms of a diagonal weight matrix and a corrected observed response constructed from $\widehat{u}_{i}$, $\widehat{t}_{i}$ and $\widehat{t_{i}^{2}}$.
- CML-step: Given the updated values $\widehat{\beta}^{(k+1)}$, $\widehat{\sigma}^{2(k+1)}$, $\widehat{\lambda}^{(k+1)}$ and $\widehat{\rho}^{(k+1)}$, obtain $\widehat{\nu}^{(k+1)}$ by maximizing the constrained actual marginal log-likelihood function (18), where $f_{0}(\cdot)$ is the respective symmetric pdf as defined in (4).
The more efficient CML-step follows Liu and Rubin [25] (ECME) and is referred to as the conditional marginal likelihood (CML) step: the usual M-step is replaced by a step that maximizes the restricted actual log-likelihood function. Furthermore, this step requires a one-dimensional search for the StN, SSL and SPE models and a bidimensional search for the SCN model, which can be easily accomplished by using, for example, the 'optimize'/'optim' routines in R [29].
The iterations of the above algorithm are repeated until a suitable convergence rule is satisfied, e.g. until the change in the parameter estimates, $\lVert\widehat{\theta}^{(k+1)}-\widehat{\theta}^{(k)}\rVert$, or in the observed-data log-likelihood, $|\ell(\widehat{\theta}^{(k+1)}\mid\mathbf{y})-\ell(\widehat{\theta}^{(k)}\mid\mathbf{y})|$, is sufficiently small.
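A schematic sketch of this ECME iteration is given below. The helper functions e_step(), cm_update() and marginal_loglik() are hypothetical placeholders for the expressions in (15)–(18) and Table 1; only the control flow (E-step, CM-step, CML-step and the stopping rule) is illustrated, and the search interval for ν is an arbitrary choice.

```r
# Schematic ECME loop for the SSMN-HNLM (illustrative only; not the authors' code).
ecme_ssmn <- function(theta, y, x, z, tol = 1e-6, max_iter = 5000) {
  ll_old <- -Inf
  for (k in seq_len(max_iter)) {
    w     <- e_step(theta, y, x, z)          # E-step: u_i, t_i, t_i^2 (Eqs. (16)-(17), Table 1) [hypothetical helper]
    theta <- cm_update(theta, w, y, x, z)    # CM-step: update beta, sigma2, lambda, rho        [hypothetical helper]
    # CML-step: update nu by maximizing the actual marginal log-likelihood (Eq. (18));
    # a two-dimensional search via optim() would be used for the SCN model instead.
    obj <- function(nu) { th <- theta; th$nu <- nu; marginal_loglik(th, y, x, z) }  # [hypothetical helper]
    theta$nu <- optimize(obj, interval = c(1.01, 50), maximum = TRUE)$maximum
    ll_new <- marginal_loglik(theta, y, x, z)
    if (abs(ll_new - ll_old) < tol) break    # stopping rule on the log-likelihood change
    ll_old <- ll_new
  }
  theta
}
```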
3.3. Notes on implementation
Although the EM-type algorithm tends to be robust with respect to the choice of the starting values, it may not converge when the initial values are far from good ones. Thus, the choice of adequate starting values plays an important role in parameter estimation. A set of reasonable initial values can be obtained by computing $\widehat{\beta}^{(0)}$ and $\widehat{\sigma}^{2(0)}$ from a standard nonlinear least squares (NLS) fit, using the 'nls' routine in R [29], and then taking $\widehat{\lambda}^{(0)}$ as the sample skewness coefficient of the NLS residuals. The starting value of $\rho$ can be $\rho_{0}$, the value such that $m(\rho_{0};z_{i})=1$ for all $z_{i}$.
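A minimal sketch of this strategy is shown below; the function name get_starts and the assumption $\rho_{0}=0$ are ours, and form, data and start0 stand for the user's nonlinear formula, data frame and NLS starting list.

```r
# Starting values for the EM-type algorithm (sketch; names and rho0 = 0 are assumptions).
get_starts <- function(form, data, start0) {
  fit   <- nls(form, data = data, start = start0)   # standard nonlinear least squares
  res   <- residuals(fit)
  beta0 <- coef(fit)                                 # initial regression coefficients
  sig20 <- sum(res^2) / length(res)                  # initial scale estimate
  lam0  <- mean((res - mean(res))^3) / sd(res)^3     # sample skewness of the NLS residuals
  list(beta = beta0, sigma2 = sig20, lambda = lam0,
       rho = 0)                                      # assumes m(0; z_i) = 1 for all z_i
}
```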
4. Likelihood ratio test for homogeneity of variance
The SSMN-HNLM defined in Equation (12) supposes that the variance of the model is not constant, with scale parameter given by $\sigma_{i}^{2}=\sigma^{2}m(\rho;z_{i})$, $i=1,\dots,n$. Besides this, it is assumed that a unique value $\rho_{0}$ exists such that $m(\rho_{0};z_{i})=1$ for all $z_{i}$. Thus, the test for homogeneity of the scale parameter in model (12) can be expressed by the hypotheses $H_{0}:\rho=\rho_{0}$ versus $H_{1}:\rho\neq\rho_{0}$.
In this work, we use a likelihood ratio (LR) test statistic to test $H_{0}$, given by $\mathrm{LR}=2\{\ell(\widehat{\theta}\mid\mathbf{y})-\ell(\widetilde{\theta}\mid\mathbf{y})\}$, where $\ell(\cdot\mid\mathbf{y})$ denotes the observed-data log-likelihood function, given by Equation (13), and $\widehat{\theta}$ and $\widetilde{\theta}$ represent the ML estimates obtained using the ECME algorithm under $H_{1}$ and $H_{0}$, respectively. Under $H_{0}$, the LR test statistic has an asymptotic $\chi^{2}_{q}$ distribution, where $q$ is the length of $\rho$. Thus, in order to analyze the empirical distribution and power of the LR test, we develop two simulation studies.
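A minimal sketch of this test is shown below, assuming the maximized log-likelihood values under $H_{0}$ and $H_{1}$ have already been obtained from two ECME fits; the numbers in the usage line are illustrative only.

```r
# Likelihood ratio test for H0: rho = rho0 (sketch; log-likelihoods assumed available).
lr_test <- function(loglik_H1, loglik_H0, q) {
  LR <- 2 * (loglik_H1 - loglik_H0)                             # LR statistic
  c(LR = LR, p.value = pchisq(LR, df = q, lower.tail = FALSE))  # asymptotic chi^2_q p-value
}

lr_test(loglik_H1 = -100.2, loglik_H0 = -109.8, q = 1)          # illustrative values only
```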
4.1. Simulation studies
In this section, the performance of the asymptotic distribution and of the power of the likelihood ratio (LR) test statistic is examined. First, we compare the empirical distribution of the LR statistic with the theoretical $\chi^{2}_{q}$ distribution via Monte Carlo simulations. Second, we investigate the power of the LR test for a grid of values of ρ.
4.1.1. The empirical distributions of LR test statistics
The performance of the asymptotic distribution of the LR test statistic is examined following the procedure described in [19,33]. The model used in this simulation study is
(19) |
where the errors follow the SSMN distributions under study, with $\sigma_{i}^{2}=\sigma^{2}m(\rho;z_{i})$. The covariate is generated from a uniform distribution on the interval (0.2, 2). The true values of the model parameters are fixed in advance, and the values of ν are chosen to achieve heavy tails for the StN and SSL models.
We generate values of the response by model (19) with the true values of the parameters and $\rho=0$ (under $H_{0}$), repeating this procedure 2000 times (the covariate values are fixed across replications). Then, using the 2000 values of the LR statistic, we obtain the empirical distribution functions (edf). Figure 1 shows comparisons between the edf and the theoretical $\chi^{2}_{q}$ distribution for n = 30, 70 and 120 in the SN, StN and SSL models. It can be seen that as n increases, the edfs become very close to the theoretical distribution for all the distributions considered in our study.
Figure 1.
Simulated comparisons between the empirical distribution of the LR statistic and the $\chi^{2}_{q}$ distribution, using the SN (first row), StN (second row) and SSL (last row) models.
4.1.2. The power of the LR test
In order to study the power of the test, we use different values of n and ρ to obtain the simulated sizes and powers of the test statistic. We consider the values ρ = 0, 0.2, 0.4, 0.6, 0.8, 1 and n = 10, 20, 30, 50, 70, 90, 120 and 150. Each simulation is repeated 2000 times, so the proportion of times the null hypothesis is rejected is the simulated power value. All the statistics are compared with the $\chi^{2}_{q}$ critical value at the 5% level. Table 2 presents the rejection rate of the hypothesis $H_{0}:\rho=0$ based on the LR statistic for the SN, StN and SSL distributions. It can be seen that for ρ = 0 the rejection rate of the test approximates the true nominal level as n increases. When n and ρ increase, the power of the test approaches 1 for all models. Figure 2 presents the rejection rate when varying the parameter ρ in the interval [0, 1] and varying the sample size between 30 and 150 for each distribution.
Table 2. Rejection rate for $H_{0}:\rho=0$ at the nominal level of 5% from the LR statistic for the SN, StN and SSL distributions.
n | ρ = 0 | ρ = 0.2 | ρ = 0.4 | ρ = 0.6 | ρ = 0.8 | ρ = 1
---|---|---|---|---|---|---
SN-NLM | ||||||
10 | 0.0940 | 0.1025 | 0.0895 | 0.0895 | 0.0870 | 0.1220 |
20 | 0.0715 | 0.0680 | 0.0810 | 0.1050 | 0.1730 | 0.2340 |
30 | 0.0570 | 0.0610 | 0.0895 | 0.1590 | 0.2510 | 0.3920 |
50 | 0.0630 | 0.0790 | 0.1400 | 0.2675 | 0.4650 | 0.6475 |
70 | 0.0535 | 0.0810 | 0.1865 | 0.3655 | 0.5955 | 0.8140 |
90 | 0.0535 | 0.1075 | 0.2370 | 0.5085 | 0.7550 | 0.9105 |
120 | 0.0500 | 0.1175 | 0.3425 | 0.6440 | 0.8505 | 0.9585 |
150 | 0.0490 | 0.1230 | 0.4120 | 0.7305 | 0.9290 | 0.9925 |
StN-NLM | ||||||
10 | 0.1335 | 0.1340 | 0.1375 | 0.1175 | 0.1285 | 0.1560 |
20 | 0.0760 | 0.0925 | 0.1075 | 0.1250 | 0.1615 | 0.2255 |
30 | 0.0735 | 0.1070 | 0.1095 | 0.1500 | 0.2215 | 0.3170 |
50 | 0.0715 | 0.1105 | 0.1375 | 0.2245 | 0.3285 | 0.4695 |
70 | 0.0600 | 0.0985 | 0.1580 | 0.2710 | 0.4455 | 0.6185 |
90 | 0.0520 | 0.1065 | 0.2005 | 0.3660 | 0.5190 | 0.7100 |
120 | 0.0540 | 0.1190 | 0.2455 | 0.4385 | 0.6665 | 0.8455 |
150 | 0.0465 | 0.1420 | 0.3090 | 0.5275 | 0.7640 | 0.9050 |
SSL-NLM | ||||||
10 | 0.0805 | 0.0665 | 0.0630 | 0.0650 | 0.0615 | 0.0960 |
20 | 0.0545 | 0.0560 | 0.0425 | 0.0520 | 0.0760 | 0.0940 |
30 | 0.0540 | 0.0460 | 0.0380 | 0.1085 | 0.0890 | 0.2775 |
50 | 0.0415 | 0.0390 | 0.0795 | 0.1310 | 0.2410 | 0.4670 |
70 | 0.0445 | 0.0460 | 0.1160 | 0.2765 | 0.3060 | 0.6430 |
90 | 0.0450 | 0.0495 | 0.1180 | 0.2890 | 0.5245 | 0.7625 |
120 | 0.0430 | 0.0600 | 0.1490 | 0.4020 | 0.6310 | 0.9205 |
150 | 0.0375 | 0.0715 | 0.2190 | 0.5860 | 0.8070 | 0.9615 |
Figure 2.
Power of the LR test to detect heteroscedasticity over a range of possible ρ values and different sample sizes (n), considering the SN, StN and SSL error distributions.
4.1.3. Study of misspecification of the structure function
As suggested by a referee, we report here a simulation study to analyze the influence of misspecification of the structure function. We simulate from model (12); the true values of the parameters, including ν for the StN and SSL models, are fixed in advance. We use the following structure functions for $m(\rho;z_{i})$: (1) homoscedasticity, (2) a linear relation, (3) the true relation and (4) an alternative misspecified relation. We generate 2000 Monte Carlo samples of size n = 300 and compute the coverage rates (CR), given by the proportion of replications in which the confidence interval contains the true parameter value, and the bias, given by the difference between the mean of the estimates and the true value of the parameters. For CR, we expect a value close to the nominal level, and for the bias a value close to 0. According to Table 3, the true structure function meets our expectations in terms of CR and bias for all parameters and all the distributions taken into consideration. On the other hand, we note that the other specifications of $m(\rho;z_{i})$ present relatively large bias and distorted CR for at least one parameter.
Table 3. Coverage rates (CR) at the nominal level of 95% and bias for different structure functions; columns (1)–(4) correspond to the structure functions listed in the text (true values of the parameters are in parentheses).
Parameter | CR (1) | bias (1) | CR (2) | bias (2) | CR (3) | bias (3) | CR (4) | bias (4)
---|---|---|---|---|---|---|---|---
SN-HNLM | ||||||||
86.40 | −0.12 | 87.94 | 0.11 | 94.64 | −0.02 | 92.20 | −0.06 | |
90.08 | −0.86 | 87.57 | 0.83 | 94.11 | 0.15 | 94.78 | −0.06 | |
94.04 | −1.6 | 82.16 | 4.7 | 94.26 | 5.2 | 94.82 | 9.6 | |
5.02 | 0.61 | 81.97 | −0.12 | 92.76 | 1.4 | 90.50 | 0.10 | |
92.04 | −0.01 | 74.32 | −1.16 | 95.85 | −0.25 | 96.22 | −0.30 | |
– | – | 13.63 | 0.72 | 94.24 | −4.9 | 18.47 | 0.26 | |
StN-HNLM | ||||||||
93.04 | −0.09 | 98.38 | −0.04 | 92.13 | −0.01 | 90.51 | −0.04 | |
95.11 | −0.08 | 55.46 | 0.21 | 92.90 | 0.16 | 93.54 | 0.15 | |
94.75 | 1.4 | 0.00 | 1.2 | 93.26 | 5.7 | 93.24 | 1.1 | |
77.59 | 0.50 | 62.94 | −0.01 | 92.58 | 0.04 | 93.06 | 0.06 | |
93.56 | 0.04 | 27.73 | −0.42 | 93.46 | −0.49 | 93.71 | −0.46 | |
– | – | 85.65 | 0.35 | 94.38 | −2.0 | 45.51 | 0.31 | |
96.85 | −4.2 | 31.06 | 0.43 | 96.56 | 0.65 | 96.06 | 0.57 | |
SSL-HNLM | ||||||||
79.16 | −0.30 | 82.36 | −0.10 | 96.14 | −0.08 | 93.41 | −0.08 | |
82.23 | 1.07 | 71.59 | 1.91 | 95.34 | 0.19 | 98.12 | 0.29 | |
56.83 | 3.9 | 75.67 | 6.8 | 95.05 | 1.7 | 96.89 | 1.6 | |
4.67 | −0.05 | 0.00 | 41.68 | 95.54 | 0.04 | 94.91 | 0.04 | |
3.42 | 1.47 | 57.73 | 0.55 | 97.09 | −0.25 | 96.89 | −0.29 | |
– | – | 0.00 | −0.10 | 95.15 | 0.31 | 24.28 | 0.31 | |
0.00 | −1.90 | 3.38 | −0.96 | 92.62 | 3.89 | 90.21 | 4.63 |
4.1.4. Computational aspects
The simulation studies were run on a Linux server with two 2.4 GHz processors (12 cores, 24 threads) and 32 GB of RAM. All the computational procedures were coded and implemented in the statistical software R (R Core Team, 2018). For each procedure, based on 2000 replicates, we used the 'parallel' facilities of R. The run time of each simulation procedure varied between 7 and 40 minutes, depending on the sample size and the value of ρ. We did not observe convergence problems or out-of-boundary estimates in our simulation study. The computer programs are available from the first author upon request.
5. Diagnostic analysis
Diagnostic techniques are used to detect observations that seriously influence the results of a statistical analysis. In the literature, there are basically two approaches to detect influential observations. One approach is the case-deletion method [10], in which the impact of deleting an observation on the estimates is directly assessed by measures such as the likelihood distance and Cook distance. The second approach is a general statistical technique used to assess the stability of the estimation outputs with respect to the model inputs [11]. Inspired by the results of Zhu et al. [36], Zhu and Lee [35] and Lee and Xu [23], we study the case-deletion measures and the local influence diagnostics for nonlinear regression models on the basis of the Q-function. In the following subsections we describe the background and details of the classic diagnostic methods to detect influential observations.
5.1. The local influence approach
Let $\ell_{c}(\theta\mid\mathbf{y}_{c})$ and $\ell_{c}(\theta,\omega\mid\mathbf{y}_{c})$, with $\omega$ a perturbation vector varying in an open region $\Omega\subset\mathbb{R}^{g}$, be the complete-data log-likelihood functions of the postulated (unperturbed) and perturbed models, respectively. We assume that a vector $\omega_{0}\in\Omega$ exists such that $\ell_{c}(\theta,\omega_{0}\mid\mathbf{y}_{c})=\ell_{c}(\theta\mid\mathbf{y}_{c})$ for all $\theta$. To assess the influence of the perturbations on the ML estimate of $\theta$, one may consider the Q-displacement function, defined as:
$$f_{Q}(\omega)=2\left\{Q(\widehat{\theta}\mid\widehat{\theta})-Q(\widehat{\theta}(\omega)\mid\widehat{\theta})\right\},$$
where $\widehat{\theta}(\omega)$ denotes the ML estimate in the perturbed model, i.e. the maximizer of $Q(\theta,\omega\mid\widehat{\theta})=\mathrm{E}\{\ell_{c}(\theta,\omega\mid\mathbf{y}_{c})\mid\mathbf{y},\widehat{\theta}\}$.
Following the approach developed in [11,35], the normal curvature of $f_{Q}(\omega)$ at $\omega_{0}$ in the direction of some unit vector $\mathbf{d}$ is given by:
$$C_{f_{Q},\mathbf{d}}=-2\,\mathbf{d}^{\top}\ddot{Q}_{\omega_{0}}\mathbf{d},\qquad \ddot{Q}_{\omega_{0}}=\Delta_{\omega_{0}}^{\top}\big\{\ddot{Q}(\widehat{\theta})\big\}^{-1}\Delta_{\omega_{0}},$$
where $\ddot{Q}(\widehat{\theta})=\partial^{2}Q(\theta\mid\widehat{\theta})/\partial\theta\,\partial\theta^{\top}\big|_{\theta=\widehat{\theta}}$ and $\Delta_{\omega_{0}}=\partial^{2}Q(\theta,\omega\mid\widehat{\theta})/\partial\theta\,\partial\omega^{\top}\big|_{\theta=\widehat{\theta},\,\omega=\omega_{0}}$.
Let $\{(\zeta_{k},\mathbf{e}_{k}):k=1,\dots,g\}$ be the eigenvalue–eigenvector pairs of the matrix $-2\ddot{Q}_{\omega_{0}}$, with $\zeta_{1}\geq\dots\geq\zeta_{r}>0=\zeta_{r+1}=\dots=\zeta_{g}$ and orthonormal eigenvectors $\mathbf{e}_{k}=(e_{k1},\dots,e_{kg})^{\top}$. The aggregated contribution vector of all eigenvectors corresponding to nonzero eigenvalues has elements given by:
$$M(0)_{l}=\sum_{k=1}^{r}\tilde{\zeta}_{k}\,e_{kl}^{2},\qquad \tilde{\zeta}_{k}=\zeta_{k}\Big/\sum_{j=1}^{r}\zeta_{j}.$$
Following Lee and Xu [23], we use $M(0)_{l}>\overline{M(0)}+c\,\mathrm{SM}(0)$ as a benchmark to regard the lth case as influential, where $c$ is an arbitrary constant (depending on the real application), $\overline{M(0)}$ is the mean and $\mathrm{SM}(0)$ is the standard deviation of $\{M(0)_{l}:l=1,\dots,n\}$. Appendix 3 presents the Hessian matrix and some perturbation schemes used in this work.
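A minimal sketch of this computation is given below (our notation): the positive semidefinite curvature matrix $B=-2\ddot{Q}_{\omega_{0}}$ is assumed to have been assembled from the expressions in Appendix 3, and the constant c is chosen by the analyst.

```r
# Aggregated contribution M(0)_l and benchmark mean(M(0)) + c * SD(M(0)) (sketch).
m0_measure <- function(B, c) {
  eig  <- eigen((B + t(B)) / 2, symmetric = TRUE)   # symmetrize for numerical stability
  keep <- eig$values > 1e-10                        # keep nonzero eigenvalues only
  zeta <- eig$values[keep] / sum(eig$values[keep])  # normalized eigenvalues zeta_k
  E2   <- eig$vectors[, keep, drop = FALSE]^2       # squared eigenvector components e_{kl}^2
  M0   <- as.vector(E2 %*% zeta)                    # M(0)_l = sum_k zeta_k * e_{kl}^2
  list(M0 = M0, benchmark = mean(M0) + c * sd(M0))
}
```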
5.2. Case deletion measures
In the process of model validation, it is fundamental to verify whether there are observations with a disproportionate influence on the estimates of the model's parameters. Case deletion is a classic approach to study the effect of dropping the ith case from the dataset. Thus, considering the model in (12), we compare the ML estimate $\widehat{\theta}$ obtained with all observations with the ML estimate $\widehat{\theta}_{[i]}$ obtained when the ith observation has been deleted from the dataset. The SSMN-HNLM in (12) is rewritten as:
Let $\mathbf{y}_{c[i]}$ be the augmented dataset with the ith case deleted, where the subscript '$[i]$' denotes the original vector with the ith observation deleted. The complete-data log-likelihood function based on the data with the ith case deleted is denoted by $\ell_{c}(\theta\mid\mathbf{y}_{c[i]})$. Let $\widehat{\theta}_{[i]}$ be the maximizer of the function $Q_{[i]}(\theta\mid\widehat{\theta})=\mathrm{E}\{\ell_{c}(\theta\mid\mathbf{y}_{c[i]})\mid\mathbf{y},\widehat{\theta}\}$ of the proposed regression model, where the estimates are obtained by using the EM algorithm based on the remaining n−1 observations. If $\widehat{\theta}_{[i]}$ is far from $\widehat{\theta}$ in some sense, then the ith case is regarded as influential.
Similar to the classic case-deletion measures, Cook distance and the likelihood displacement, Zhu et al. [36] presented analogous measures based on the Q-function.
- Generalized Cook distance GD: This measure, similar to the usual Cook distance [10], determines the degree of influence of the ith observation on the estimate of $\theta$ and is defined by:
$$\mathrm{GD}_{i}=\big(\widehat{\theta}_{[i]}-\widehat{\theta}\big)^{\top}\big\{-\ddot{Q}(\widehat{\theta}\mid\widehat{\theta})\big\}\big(\widehat{\theta}_{[i]}-\widehat{\theta}\big).$$
- Q-distance QD: This measure of the influence of the ith case is similar to the likelihood distance discussed by Cook and Weisberg [12], and is defined by:
$$\mathrm{QD}_{i}=2\big\{Q(\widehat{\theta}\mid\widehat{\theta})-Q(\widehat{\theta}_{[i]}\mid\widehat{\theta})\big\}.$$
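A minimal sketch of these two measures is given below (our notation); the deleted-case estimate, the Q-function $Q(\cdot\mid\widehat{\theta})$ and its Hessian at $\widehat{\theta}$ are assumed to be available from the EM fit.

```r
# Case-deletion measures based on the Q-function (sketch; inputs assumed available).
gd_i <- function(theta_del, theta_hat, Qhess) {   # Qhess = Hessian of Q at theta_hat
  d <- theta_del - theta_hat
  as.numeric(t(d) %*% (-Qhess) %*% d)             # generalized Cook distance GD_i
}
qd_i <- function(theta_del, theta_hat, Qfun) {    # Qfun(theta) = Q(theta | theta_hat)
  2 * (Qfun(theta_hat) - Qfun(theta_del))         # Q-distance QD_i
}
```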
6. Application
In this section we consider the likelihood analysis of the dataset presented in [24], which comes from an ultrasonic calibration study. Labra et al. [22] analyzed this dataset with a heteroscedastic nonlinear model with SMSN errors and verified the presence of outliers. Here we reanalyze the dataset with the aim of showing the capacity of the SSMN distributions to fit real data presenting asymmetry and heavy tails in heteroscedastic nonlinear models. The data consist of 214 observations, where the response variable is the ultrasonic response Y and the predictor variable is the metal distance x. From the descriptive statistics presented in Table 4, we observe a large and positive sample skewness. The distance between the mean and the median suggests using an asymmetric distribution as an alternative to model the data. On the other hand, Figure 3 shows a nonlinear relationship between the metal distance and the ultrasonic response.
Table 4. Summary statistics for ultrasonic calibration data (SD is sample standard deviation).
Min | Max | Mean | Median | SD | Skewness | Kurtosis |
---|---|---|---|---|---|---|
3.75 | 92.90 | 30.26 | 21.11 | 23.68 | 0.91 | 2.56 |
Figure 3.
Scatter-plot of ultrasonic calibration data.
Following Lin et al. [24], we consider a SSMN-HNLM of the form:
(20) |
where for .
Table 5 contains the ML estimates of the parameters of the SN, StN, SSL, SPE and SCN models, together with their corresponding standard errors (SE) calculated via the observed information matrix (Appendix 1). Moreover, both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) indicate that the SSMN models with heavy tails (StN-NLM, SSL-NLM and SPE-NLM) present a better fit than the SN model, with the StN-NLM presenting the best fit. In addition, by comparing our results with those obtained by Labra et al. [22, Table 2, p. 2159], we can see that the StN and SSL fits have higher log-likelihood values and consequently lower AIC values, indicating better performance of the SSMN models for the ultrasonic calibration dataset.
Table 5. ML estimation results of fitting various mixture models to the ultrasonic calibration data. The SE values are the asymptotic standard errors based on the observed information matrix.
SN-NLM | StN-NLM | SSL-NLM | SPE-NLM | |||||
---|---|---|---|---|---|---|---|---|
Parameter | Estimate | SE | Estimate | SE | Estimate | SE | Estimate | SE |
0.188 | 0.019 | 0.190 | 0.014 | 0.196 | 0.016 | 0.190 | 0.027 | |
0.006 | 4.6 | 0.006 | 3.8 | 0.006 | 3.8 | 0.006 | 4.7 | |
0.013 | 9.1 | 0.012 | 8.6 | 0.012 | 6.9 | 0.013 | 8.7 | |
33.981 | 5.997 | 11.244 | 2.050 | 9.977 | 0.101 | 18.514 | 12.545 | |
λ | 2.088 | 0.428 | 0.651 | 0.125 | 0.824 | 0.158 | 1.448 | 0.784 |
ρ | −1.082 | 0.128 | −1.028 | 0.119 | −1.091 | 0.065 | −1.091 | 0.139 |
ν | – | – | 3.846 | 1.184 | 1.454 | 0.058 | 0.773 | 0.150 |
Log-likelihood | −520.305 | −514.764 | −515.108 | −518.008 |
AIC | 1052.609 | 1043.529 | 1044.215 | 1050.017 | ||||
BIC | 1072.806 | 1067.090 | 1067.778 | 1073.578 |
From (20), the model is homoscedastic when $\rho=0$. So, the test for heteroscedasticity is based on the hypotheses $H_{0}:\rho=0$ versus $H_{1}:\rho\neq 0$, and the likelihood ratio test statistic has an approximate $\chi^{2}_{1}$ distribution under $H_{0}$. The LR statistic for the StN model led to rejection of $H_{0}$ at the usual significance levels. This result is in accordance with that obtained by Lin et al. [24] using the score statistic and by Labra et al. [22] using the likelihood ratio test. Therefore the assumption of homogeneity of variance is not suitable for the ultrasonic calibration data.
In order to detect incorrect specification of the error distribution, we use the Mahalanobis distance $d_{i}=(y_{i}-\widehat{\mu}_{i})^{2}/\widehat{\sigma}_{i}^{2}$, for $i=1,\dots,n$, to construct simulated envelopes. In the skew-normal case, we have $d_{i}\sim\chi^{2}_{1}$, so we can use as cutoff points the quantiles $\xi_{\gamma}$ of the $\chi^{2}_{1}$ distribution, where $\gamma$ is the chosen envelope level. From Ferreira et al. [15], analogous distributional properties of the Mahalanobis distance are available for the StN, SSL and SPE distributions.
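A minimal sketch for the SN case is given below (our notation); the fitted values $\widehat{\mu}_{i}$ and $\widehat{\sigma}_{i}^{2}$ are assumed to be available from the fitted model.

```r
# Mahalanobis distances and the chi-square reference used for the SN envelope (sketch).
mahalanobis_d <- function(y, mu_hat, sig2_hat) (y - mu_hat)^2 / sig2_hat

qq_ref <- function(d, gamma = 0.975) {
  n <- length(d)
  list(theoretical = qchisq(ppoints(n), df = 1),   # chi^2_1 reference quantiles for a QQ-plot
       cutoff      = qchisq(gamma, df = 1))        # upper cutoff xi_gamma
}
```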
The QQ-plots and simulated envelopes for the Mahalanobis distance of the fitted SN-NLM, StN-NLM, SSL-NLM and SPE-NLM models are shown in Figure 4. The lines in these figures represent the 2.5th percentile, the mean, and the 97.5th percentile of 100 simulated points for each observation. It can be seen that the SN and SPE models contain some observations outside the confidence band, but the StN and SSL models present good fit to the dataset.
Figure 4.
Ultrasonic calibration data. QQ-plots and simulated envelopes for the Mahalanobis distance. (a) SN-NLM, (b) StN-NLM, (c) SSL-NLM and (d) SPE-NLM.
First, we identify influential observations in the fitted models based on the case-deletion measures, the generalized Cook distance (Figure 5) and the Q-distance (Figure 6), which show similar patterns for each model but on a smaller scale for the StN model. We note from these figures that, for both measures, a few observations are potentially influential on the parameter estimates in the SN model, while different subsets of cases are flagged as influential in the StN and SSL models.
Figure 5.
Ultrasonic calibration data. Index plot of the generalized Cook distance. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.
Figure 6.
Ultrasonic calibration data. Index plot of the Q-distance. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.
We used the same strategy presented by Cao et al. [8] to compare the robustness of the models. Thus, in order to reveal the impact on the parameter estimates of the four observations considered as potential outliers according to the Mahalanobis distance (see Figure 7), we refitted the models after eliminating these four cases.
Figure 7.
Ultrasonic calibration data. Mahalanobis distance for SN-HNLM.
In Table 6, we show the estimated values and their relative changes (RC), defined for the kth parameter as the percentage change of the full-data estimate $\widehat{\theta}_{k}$ when the set of potential outliers is removed. We observe that the RCs for the StN-NLM are smaller than those for the SN-NLM, which implies that the StN model is less sensitive than the SN model to the presence of potential outliers.
Table 6. Ultrasonic calibration data. Relative changes (RC) of , after deleting outliers.
SN-NLM | StN-NLM | SSL-NLM | SPE-NLM | |||||
---|---|---|---|---|---|---|---|---|
Parameter | Estimate | RC | Estimate | RC | Estimate | RC | Estimate | RC |
0.1790 | 13.7299 | 0.1751 | 4.6419 | 0.1870 | 5.5330 | 0.2040 | −7.2792 | |
0.0055 | 9.7201 | 0.0056 | 2.6319 | 0.0057 | 4.4269 | 0.0061 | −4.1235 | |
0.0131 | −5.0432 | 0.0123 | −3.0677 | 0.0125 | −6.0685 | 0.0118 | 5.4580 | |
27.2558 | 31.9260 | 13.4812 | −0.5992 | 20.7536 | −108.0115 | 26.3738 | −42.4551 | |
λ | 1.8818 | 46.7073 | 0.6055 | 0.5030 | 1.2628 | −53.2015 | 1.6482 | −13.8311 |
ρ | −1.1117 | −19.3049 | −1.2033 | −5.6935 | −1.2359 | −13.2407 | −1.1854 | −8.6415 |
ν | – | – | 7.8713 | 2.2535 | 6.4618 | −344.3733 | 0.9999 | −29.2878 |
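A minimal sketch of the relative changes reported in Table 6 (our notation; the sign convention is an assumption):

```r
# Relative change (in %) of each parameter estimate after deleting the potential outliers.
rc <- function(theta_full, theta_del) 100 * (theta_full - theta_del) / theta_full
```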
Figures 8 and 9 present the local influence diagnostic analysis using the case-weight and response perturbation schemes, respectively. For each perturbation scheme we obtained the values of $M(0)_{l}$, and the figures present the corresponding index plots. The horizontal lines delimit the benchmark for $M(0)_{l}$, following Lee and Xu [23]. A few observations stand out as influential under the case-weight perturbation in all models, but on a smaller scale in the StN and SSL models. The same three observations are influential under the response perturbation in the StN and SSL models, while the SN model flags other observations as influential.
Figure 8.
Ultrasonic calibration data. Index plot of in the case weight perturbation. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.
Figure 9.
Ultrasonic calibration data. Index plot of in the response perturbation. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.
As suggested by a referee, in order to evaluate whether the likelihood ratio (LR) statistic for testing $H_{0}:\rho=0$ is sensitive to the presence of the influential observations #82, #130, #145, #162, #163, #175 and one further case flagged in the diagnostic analysis, we removed each observation individually and all of them together from the full data, and obtained the LR statistics and their corresponding p-values. Table 7 shows that $H_{0}$ was rejected in all cases, so the conclusion of heteroscedasticity under the SSMN-HNLM remains unchanged when influential observations are removed individually or jointly.
Table 7. Ultrasonic calibration data. Likelihood ratio (LR) statistics when removing each observation individually and all simultaneously.
Removed Observations | ||||||||
---|---|---|---|---|---|---|---|---|
Model | All influential | |||||||
SN | 60.596 | 82.487 | 66.013 | 69.210 | 80.171 | 79.849 | 64.741 | 86.439 |
StN | 31.361 | 34.673 | 32.034 | 34.305 | 32.913 | 33.547 | 30.954 | 45.529 |
SSL | 32.788 | 42.236 | 39.588 | 41.848 | 41.588 | 41.425 | 38.382 | 52.032 |
SPE | 32.897 | 36.674 | 33.389 | 35.998 | 34.794 | 34.521 | 32.070 | 56.935 |
7. Conclusions
In this paper we developed an EM-type (ECME) algorithm for maximum likelihood estimation in the SSMN-HNLM, where closed-form expressions are obtained for the E and M steps, with the standard errors as byproducts of the observed information matrix. Furthermore, we applied Zhu and Lee [35]'s approach to obtain case-deletion measures and local influence diagnostics. A simulation study was developed to verify the asymptotic distribution of the likelihood ratio test statistic and the empirical power of the test. For ρ = 0, the rejection rate of the test approached the true nominal level as n increased, and when n and ρ increased, the power of the test approached 1. The diagnostic analysis showed that the influence of the observations declined when we considered distributions with heavier tails than the SN one. The models can be fitted using standard software routines available in R, and the program codes are available from us on request.
Finally, the proposed method can be extended to a more general framework, such as censored regression models [31], measurement error models and multivariate regression models, providing satisfactory results at the expense of additional complexity of implementation. An in-depth investigation of such extension is beyond the scope of the present paper, but is certainly an interesting topic for future research.
Appendices.
Appendix 1. The observed information matrix for SSMN heteroscedastic nonlinear regression models
Consider the SSMN-HNLM model given in (12), where the corresponding observed-data log-likelihood function of is of the form . In this section we write for simplification. We have that where is the log-likelihood function of the corresponding symmetric SMN distribution and . To simplify the text, we write and . Thus, the observed information matrix for can be written as:
where:
with , with .
The first-order derivatives of in relation to are given by:
The second-order derivatives of in relation to are given by:
The first and second-order derivatives of in relation to can be calculated for each considered SMN distribution as follows:
A.1. The normal distribution
with .
A.2. The Student-t distribution
where , is the digamma function and is the trigamma function.
A.3. The slash distribution
where
A.4. The contaminated normal distribution
with .
The first partial derivatives are given by
The second partial derivatives are given by
A.5. The power-exponential distribution
Appendix 2. Computation of the function and its derivatives
-
Skew-normal distribution:
.
-
Skew power-exponential distribution:
In this case, there is no explicit form for function and consequently cannot be calculated explicitly.
-
Skew Student-t normal distribution:
Since and , we have that: -
Skew slash distribution:
In this case, and . Thus: -
Skew contaminated-normal distribution:
Here . So,
with , , , and .
Appendix 3. Hessian matrix and perturbation schemes
To obtain the diagnostic measures of the SSMN-HNLM based on the approach proposed by Zhu and Lee [35], it is necessary to compute the Hessian matrix, which is defined by , where . It follows from (15) that the derivatives have elements given by:
where
A.6. Perturbation schemes
In this section we consider three different perturbation schemes for the SSMN-HNLM. For each case, we need to calculate the matrix $\Delta_{\omega_{0}}$.
A.6.1. Case weight perturbation
Let $\omega=(\omega_{1},\dots,\omega_{n})^{\top}$ be an $n\times 1$ vector with $\omega_{0}=(1,\dots,1)^{\top}$. Then the expected value of the perturbed complete-data log-likelihood function (perturbed Q-function) can be written as:
In this case the elements of , , are given by
A.6.2. Response variable perturbation
A perturbation of the response variables is introduced by $y_{i}(\omega)=y_{i}+\omega_{i}s_{y}$, where $s_{y}$ is the standard deviation of $\mathbf{y}$. In this case, $\omega_{0}=(0,\dots,0)^{\top}$ and
In this case the elements of , , are given by
A.6.3. Explanatory variable perturbation
A perturbation in a specific explanatory variable can be obtained as , for , where is a scale factor that can be the standard deviation of . To simplify the notation, we write and . In this case, and
In this case the elements of , , are given by:
Funding Statement
The first author thanks FAPEMIG (Minas Gerais State Research Support Foundation) [grant number CEX APQ 01944/17] for financial support. The research of Aldo M. Garay was supported by Grant 420082/2016-6 from CNPq-Brazil.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- 1. Andrews D.F. and Mallows C.L., Scale mixtures of normal distributions, J. R. Stat. Soc. Ser. B 36 (1974), pp. 99–102.
- 2. Araújo M.C., Cysneiros A.H.M.A. and Montenegro L.C., Improved heteroskedasticity likelihood ratio tests in symmetric nonlinear regression models, Stat. Papers (2017). doi: 10.1007/s00362-017-0933-5.
- 3. Azzalini A., A class of distributions which includes the normal ones, Scand. J. Statist. 12 (1985), pp. 171–178.
- 4. Bates D.M. and Watts D.G., Nonlinear Regression Analysis and Its Applications, John Wiley & Sons, New York, 1988.
- 5. Beale E.M.L. and Little R.J.A., Missing values in multivariate analysis, J. R. Stat. Soc. Ser. B 37 (1975), pp. 129–146.
- 6. Branco M.D. and Dey D.K., A general class of multivariate skew-elliptical distributions, J. Multivar. Anal. 79 (2001), pp. 99–113. doi: 10.1006/jmva.2000.1960.
- 7. Cancho V.C., Lachos V.H. and Ortega E.M.M., A nonlinear regression model with skew-normal errors, Stat. Papers 51 (2010), pp. 547–558. doi: 10.1007/s00362-008-0139-y.
- 8. Cao C., Wang Y., Shi J.Q. and Lin J., Measurement error models for replicated data under asymmetric heavy-tailed distributions, Comput. Econ. 52 (2018), pp. 531–553. doi: 10.1007/s10614-017-9702-8.
- 9. Chen F., Zhu H.T. and Lee S.Y., Perturbation selection and local influence analysis for nonlinear structural equation model, Psychometrika 74 (2009), pp. 493–516. doi: 10.1007/s11336-009-9114-3.
- 10. Cook R.D., Detection of influential observation in linear regression, Technometrics 19 (1977), pp. 15–18.
- 11. Cook R.D., Assessment of local influence, J. R. Stat. Soc. Ser. B 48 (1986), pp. 133–169.
- 12. Cook R.D. and Weisberg S., Residuals and Influence in Regression, Chapman & Hall/CRC, Boca Raton, FL, 1982.
- 13. Dempster A., Laird N. and Rubin D., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B 39 (1977), pp. 1–38.
- 14. Ferreira C.S. and Arellano-Valle R., Estimation and diagnostic analysis in skew-generalized-normal regression models, J. Stat. Comput. Simul. 88 (2018), pp. 1039–1059. doi: 10.1080/00949655.2017.1419351.
- 15. Ferreira C.S., Bolfarine H. and Lachos V.H., Skew scale mixtures of normal distributions: Properties and estimation, Stat. Methodol. 8 (2011), pp. 154–171. doi: 10.1016/j.stamet.2010.09.001.
- 16. Ferreira C.S. and Lachos V.H., Nonlinear regression models under skew scale mixtures of normal distributions, Stat. Methodol. 33 (2016), pp. 131–146. doi: 10.1016/j.stamet.2016.08.004.
- 17. Ferreira C.S., Lachos V.H. and Bolfarine H., Inference and diagnostics in skew scale mixtures of normal regression models, J. Stat. Comput. Simul. 85 (2015), pp. 517–537. doi: 10.1080/00949655.2013.828057.
- 18. Garay A.M., Lachos V.H. and Abanto-Valle C.A., Nonlinear regression models based on scale mixtures of skew-normal distributions, J. Korean Stat. Soc. 40 (2011), pp. 115–124. doi: 10.1016/j.jkss.2010.08.003.
- 19. Garay A.M., Lachos V.H., Labra F.V. and Ortega E.M.M., Statistical diagnostics for nonlinear regression models based on scale mixtures of skew-normal distributions, J. Stat. Comput. Simul. 84 (2014), pp. 1761–1778. doi: 10.1080/00949655.2013.766188.
- 20. Gómez H.W., Venegas O. and Bolfarine H., Skew-symmetric distributions generated by the distribution function of the normal distribution, Environmetrics 18 (2007), pp. 395–407. doi: 10.1002/env.817.
- 21. Henze N., A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271–275.
- 22. Labra F.V., Garay A.M., Lachos V.H. and Ortega E.M.M., Estimation and diagnostics for heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions, J. Stat. Plan. Inference 142 (2012), pp. 2149–2165. doi: 10.1016/j.jspi.2012.02.018.
- 23. Lee S.Y. and Xu L., Influence analyses of nonlinear mixed-effects models, Comput. Stat. Data Anal. 45 (2004), pp. 321–341. doi: 10.1016/S0167-9473(02)00303-1.
- 24. Lin J.G., Xie F.C. and Wei B., Statistical diagnostics for skew-t-normal nonlinear models, Comm. Stat. Simul. Comput. 38 (2009), pp. 2096–2110. doi: 10.1080/03610910903249502.
- 25. Liu C. and Rubin D.B., The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence, Biometrika 81 (1994), pp. 633–648.
- 26. Louzada F., Ferreira P.H. and Diniz C.A., Skew-normal distribution for growth curve models in presence of a heteroscedasticity structure, J. Appl. Stat. 41 (2014), pp. 1785–1798. doi: 10.1080/02664763.2014.891005.
- 27. Meng X.L. and Rubin D.B., Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika 80 (1993), pp. 267–278.
- 28. Montenegro L.C., Bolfarine H. and Lachos V.H., Influence diagnostics for a skew extension of the Grubbs measurement error model, Comm. Stat. Simul. Comput. 38 (2009), pp. 667–681. doi: 10.1080/03610910802618385.
- 29. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018. Available at http://www.R-project.org/.
- 30. Ratkowsky D.A., Handbook of Nonlinear Regression Models, Marcel Dekker, New York, 1990.
- 31. Vaida F. and Liu L., Fast implementation for normal mixed effects models with censored response, J. Comput. Graph. Stat. 18 (2009), pp. 797–817. doi: 10.1198/jcgs.2009.07130.
- 32. Xie F.C., Lin J.G. and Wei B.C., Diagnostics for skew-normal nonlinear regression models with AR(1) errors, Comput. Stat. Data Anal. 53 (2009a), pp. 4403–4416. doi: 10.1016/j.csda.2009.06.010.
- 33. Xie F.C., Wei B.C. and Lin J.G., Homogeneity diagnostics for skew-normal nonlinear regression models, Stat. Probab. Lett. 79 (2009b), pp. 821–827. doi: 10.1016/j.spl.2008.11.001.
- 34. Zhu H.T., Ibrahim J.G., Lee S.Y. and Zhang H.P., Perturbation selection and influence measures in local influence analysis, Ann. Statist. 35 (2007), pp. 2565–2588. doi: 10.1214/009053607000000343.
- 35. Zhu H. and Lee S., Local influence for incomplete-data models, J. R. Stat. Soc. Ser. B 63 (2001), pp. 111–126. doi: 10.1111/1467-9868.00279.
- 36. Zhu H., Lee S., Wei B. and Zhou J., Case-deletion measures for models with incomplete data, Biometrika 88 (2001), pp. 727–737. doi: 10.1093/biomet/88.3.727.