Journal of Applied Statistics. 2019 Nov 11;47(9):1690–1719. doi: 10.1080/02664763.2019.1691158

Inference and diagnostics for heteroscedastic nonlinear regression models under skew scale mixtures of normal distributions

Clécio da Silva Ferreira, Víctor H Lachos, Aldo M Garay
PMCID: PMC9041946  PMID: 35707586

ABSTRACT

The heteroscedastic nonlinear regression model (HNLM) is an important tool in data modeling. In this paper we propose an HNLM based on skew scale mixtures of normal (SSMN) distributions, which allows asymmetric and heavy-tailed data to be fitted simultaneously. Maximum likelihood (ML) estimation is performed via the expectation-maximization (EM) algorithm. The observed information matrix is derived analytically to obtain standard errors. In addition, diagnostic analysis is developed using case-deletion measures and the local influence approach. A simulation study is carried out to verify the empirical distribution of the likelihood ratio statistic and the power of the test of homogeneity of variances, and to study misspecification of the structure function. The proposed method is also illustrated by analyzing a real dataset.

KEYWORDS: EM algorithm, heteroscedastic nonlinear regression models, influence diagnostics, likelihood ratio test, skew scale mixtures of normal distributions

1. Introduction

Nonlinear regression models (NLM) are applied in many areas of science to model data for which nonlinear functions of unknown parameters are used to explain or describe the phenomena under study. For a broad discussion of nonlinear models, see, for instance, [4,30]. According to [5], a large degree of heteroscedasticity is more commonly seen in data that are best fit by a nonlinear regression model than in data that can be adequately fit by a linear model. However, it is well known that several phenomena exhibit asymmetry and/or heavy- or light-tailed behavior, so it is necessary to work with more flexible classes of distributions.

Some authors have proposed homoscedastic nonlinear regression models with an asymmetric structure in the error term. For instance, Cancho et al. [7] introduced the skew-normal nonlinear regression models (SN-NLM) and presented a complete likelihood-based analysis, including an efficient EM-type algorithm for ML estimation. Garay et al. [18] introduced an extension of the SN-NLM by using the scale mixtures of skew-normal (SMSN) distributions, proposed by Branco and Dey [6], in the error structure, allowing data with skewness and heavy tails to be modeled simultaneously. More recently, Ferreira and Lachos [16] extended the SN-NLM using the skew scale mixtures of normal (SSMN) distributions [15]. This novel class of distributions provides a useful generalization of the symmetrical and asymmetrical NLM, since the error distributions cover both asymmetric and heavy-tailed distributions, such as the skew-t-normal, skew-slash and skew-contaminated normal, among others. There are some important differences between the classes of SSMN and SMSN distributions (see, for example, [16]).

On the other hand, heteroscedastic nonlinear regression models (HNLM) have recently been studied by several authors. For example, Xie et al. [32,33] developed score statistics for testing homogeneity in the SN-NLM. Lin et al. [24] developed diagnostic tools for skew-t-normal nonlinear models and investigated the properties of a score test statistic for homogeneity of the variance through Monte Carlo simulations. Louzada et al. [26] proposed an HNLM with a skew-normal structure applied to growth curve modeling. Garay et al. [19] developed diagnostic analysis for HNLM under SMSN distributions (SMSN-HNLM) and presented a score test for checking the homogeneity of the scale parameter. More recently, Araújo et al. [2] addressed hypothesis testing of the dispersion parameter in the symmetric HNLM using the likelihood ratio test.

The assessment of robustness aspects of the parameter estimates in statistical models has received special attention in recent decades. Identifying problems caused by influential observations may suggest improvements to the model assumptions. Case-deletion measures [10] study the impact on the parameter estimates of dropping individual observations. The influence of small perturbations in the model/data on the parameter estimates can be assessed by local influence analysis [11]. Zhu et al. [34] proposed the selection of an appropriate perturbation scheme and the development of influence measures for objective functions at a point with a nonzero first derivative, based on the observed log-likelihood function. Chen et al. [9] extended the approach of Zhu et al. [34] to the analysis of complex latent variable models using the complete-data log-likelihood. Another approach to case-deletion measures and local influence analysis, based on the conditional expectation of the complete-data log-likelihood at the E-step of the EM algorithm, was developed by Zhu et al. [36] and Zhu and Lee [35], respectively. For some applications of the approach of Zhu and Lee [35] in the context of asymmetric models, we refer to [14,17,23,28], among others.

In this work, we propose the heteroscedastic nonlinear regression model with errors following an SSMN distribution (SSMN-HNLM), generalizing the SSMN-NLM proposed by Ferreira and Lachos [16]. The ML estimates of the model parameters are obtained via the EM algorithm, and the observed information matrix is obtained analytically. We also develop influence diagnostic tools (case deletion and local influence) for our model based on the well-known approach of Zhu and Lee [35], which relies on the complete-data log-likelihood function. We perform a simulation study to verify the asymptotic distribution of the likelihood ratio test statistic and its empirical power for testing homogeneity of variances. Further, a study of misspecification of the structure function is performed.

The rest of the paper is organized as follows. In Section 2, we present some properties of the univariate SSMN family. Section 3 outlines the SSMN-HNLM and the EM algorithm for maximum likelihood estimation. In Section 4, we discuss the likelihood ratio test for checking homogeneity of the scale parameter and investigate its properties through Monte Carlo simulations. The case-deletion measures and the local influence under three perturbation schemes are derived in Section 5. The proposed method is illustrated in Section 6 by analyzing a real dataset, and some concluding remarks are presented in Section 7.

2. Skew scale mixtures of normal distributions

In order to motivate our proposed methods, we present a brief introduction to the skew-normal (SN), the scale mixture of normal (SMN) and the SSMN class of distributions. For further details we refer to [15,17].

Let $\phi(x;\mu,\sigma^2)$ and $\Phi(x;\mu,\sigma^2)$ be the probability density function (pdf) and the cumulative distribution function (cdf), respectively, of the $\mathrm{N}(\mu,\sigma^2)$ distribution evaluated at x. A random variable Y follows a univariate skew-normal distribution [3] with location parameter μ, scale parameter $\sigma^2$ and skewness parameter λ if its pdf is given by:

$$f(y)=2\,\phi(y;\mu,\sigma^2)\,\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right),\qquad y\in\mathbb{R}. \tag{1}$$

For a random variable with pdf as in (1), we use the notation $Y\sim\mathrm{SN}(\mu,\sigma^2,\lambda)$. When $\lambda=0$, the skew-normal distribution reduces to the normal distribution ($Y\sim\mathrm{N}(\mu,\sigma^2)$). Its marginal stochastic representation [21], which can be used to derive several of its properties, is given by:

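$$Y\overset{d}{=}\mu+\sigma\left(\delta|T_0|+(1-\delta^2)^{1/2}T_1\right),\qquad\text{with }\ \delta=\frac{\lambda}{\sqrt{1+\lambda^2}}, \tag{2}$$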

where $T_0\sim\mathrm{N}(0,1)$ and $T_1\sim\mathrm{N}(0,1)$ are independent, $|T_0|$ denotes the absolute value of $T_0$ and ‘$\overset{d}{=}$’ means ‘distributed as’. The expectation and variance of Y are given, respectively, by:

$$E[Y]=\mu+b\sigma\delta,\qquad \operatorname{Var}[Y]=\sigma^2\left(1-b^2\delta^2\right), \tag{3}$$

where $b=\sqrt{2/\pi}$.
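A quick way to check Equations (2) and (3) numerically is to simulate from the stochastic representation and compare the sample moments with the closed forms. The Python sketch below does this; `rsn` and `sn_moments` are hypothetical helper names introduced for illustration, not code from the paper.

```python
import math
import random


def rsn(n, mu, sigma2, lam, rng):
    """Draw n SN(mu, sigma2, lam) variates via the representation (2)."""
    sigma = math.sqrt(sigma2)
    delta = lam / math.sqrt(1.0 + lam * lam)
    draws = []
    for _ in range(n):
        t0 = abs(rng.gauss(0.0, 1.0))  # |T0|, half-normal part
        t1 = rng.gauss(0.0, 1.0)       # independent N(0, 1) part
        draws.append(mu + sigma * (delta * t0 + math.sqrt(1.0 - delta**2) * t1))
    return draws


def sn_moments(mu, sigma2, lam):
    """Closed-form mean and variance from Equation (3), with b = sqrt(2/pi)."""
    b = math.sqrt(2.0 / math.pi)
    delta = lam / math.sqrt(1.0 + lam * lam)
    return mu + b * math.sqrt(sigma2) * delta, sigma2 * (1.0 - b**2 * delta**2)


ys = rsn(200_000, mu=1.0, sigma2=2.0, lam=3.0, rng=random.Random(1))
mean_hat = sum(ys) / len(ys)
var_hat = sum((y - mean_hat) ** 2 for y in ys) / (len(ys) - 1)
print("theory:", sn_moments(1.0, 2.0, 3.0), "simulated:", (mean_hat, var_hat))
```

With 200,000 draws the simulated moments should agree with (3) to about two decimal places.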

A random variable Y follows an SMN distribution [1] with location parameter $\mu\in\mathbb{R}$ and scale parameter $\sigma^2$ if its pdf assumes the form:

$$f_0(y)=\int_0^\infty\phi\left(y;\mu,\kappa(u)\sigma^2\right)dH(u;\tau), \tag{4}$$

where $H(u;\tau)$ is the cdf of a positive random variable U indexed by the parameter vector τ and $\kappa(\cdot)$ is a strictly positive function. For a random variable with a pdf as in (4), we use the notation $Y\sim\mathrm{SMN}(\mu,\sigma^2,H;\kappa)$.

A random variable Y follows an SSMN distribution [15] with location parameter $\mu\in\mathbb{R}$, scale factor $\sigma^2$ and skewness parameter $\lambda\in\mathbb{R}$ if its pdf is given by:

$$f(y)=2f_0(y)\,\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right), \tag{5}$$

where $f_0(y)$ is an SMN density as defined in (4). For a random variable with pdf as in (5), we use the notation $Y\sim\mathrm{SSMN}(\mu,\sigma^2,\lambda,H;\kappa)$. If $\mu=0$ and $\sigma^2=1$, we refer to it as the standard SSMN distribution and denote it by $\mathrm{SSMN}(\lambda,H;\kappa)$. Clearly, when $\lambda=0$, we obtain the corresponding SMN distributions proposed by Andrews and Mallows [1].

For an SSMN random variable, the following hierarchical representation is convenient: it can be used to simulate realizations of Y quickly and to implement the EM algorithm.

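Let $Y\sim\mathrm{SSMN}(\mu,\sigma^2,\lambda,H;\kappa)$. Then its hierarchical representation is given by:

$$Y\mid U=u\sim\mathrm{SN}\left(\mu,\sigma^2\kappa(u),\lambda\kappa^{1/2}(u)\right),\qquad U\sim H(\tau). \tag{6}$$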
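The hierarchy (6) translates directly into a sampler: draw U from H, then draw Y from the conditional SN law using representation (2). The sketch below, a hypothetical helper rather than the authors' code, does this for the StN member ($\kappa(u)=1/u$, $U\sim\mathrm{Gamma}(\nu/2,\nu/2)$ with rate $\nu/2$, as defined in the list that follows). Python's `random.gammavariate` is parameterized by shape and scale, so the rate $\nu/2$ becomes scale $2/\nu$.

```python
import math
import random


def rsn_one(rng, mu, sigma2, lam):
    """One SN(mu, sigma2, lam) draw via the representation (2)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    t0, t1 = abs(rng.gauss(0.0, 1.0)), rng.gauss(0.0, 1.0)
    return mu + math.sqrt(sigma2) * (delta * t0 + math.sqrt(1.0 - delta**2) * t1)


def rstn(n, mu, sigma2, lam, nu, rng):
    """Skew Student-t normal draws through the hierarchy (6).

    U ~ Gamma(shape nu/2, rate nu/2) and kappa(u) = 1/u, so that
    Y | U = u ~ SN(mu, sigma2 * kappa(u), lam * kappa(u) ** 0.5).
    """
    draws = []
    for _ in range(n):
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)  # (shape, scale) = (nu/2, 2/nu)
        kappa = 1.0 / u
        draws.append(rsn_one(rng, mu, sigma2 * kappa, lam * math.sqrt(kappa)))
    return draws


rng = random.Random(2024)
sym = rstn(100_000, mu=0.0, sigma2=1.0, lam=0.0, nu=3.0, rng=rng)
near_sn = rstn(100_000, mu=0.0, sigma2=1.0, lam=3.0, nu=1e6, rng=rng)
print(sum(y < 0 for y in sym) / len(sym))  # about 0.5 when lam = 0
print(sum(near_sn) / len(near_sn))         # about the SN mean when nu is huge
```

The two printed checks reflect properties stated in the text: with λ = 0 the draws are symmetric about μ, and as ν → ∞ the StN approaches the SN distribution, whose mean is given by (3).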

The distributions in the SSMN class considered in this work are:

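  • The skew Student-t normal distribution (StN) [20], denoted by $\mathrm{StN}(\mu,\sigma^2,\lambda,\nu)$, with $\tau=\nu$ degrees of freedom, $\kappa(u)=1/u$ and $U\sim\mathrm{Gamma}(\nu/2,\nu/2)$, $\nu>0$, where $\mathrm{Gamma}(a,b)$ denotes the gamma distribution with shape a and rate b. Its pdf is
    $$f(y)=\frac{2\,\Gamma\left(\frac{\nu+1}{2}\right)}{\sigma\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{d}{\nu}\right)^{-(\nu+1)/2}\Phi\!\left(\frac{\lambda(y-\mu)}{\sigma}\right), \tag{7}$$
    where $d=(y-\mu)^2/\sigma^2$ and $\Gamma(\cdot)$ is the gamma function. When $\nu\to\infty$, we obtain the SN distribution as the limiting case. Lastly, $U\mid Y=y\sim\mathrm{Gamma}\left((\nu+1)/2,(\nu+d)/2\right)$.
  • The skew slash distribution (SSL), denoted by $\mathrm{SSL}(\mu,\sigma^2,\lambda,\nu)$, arises when $\kappa(u)=1/u$ and $U\sim\mathrm{Beta}(\nu,1)$, with $\tau=\nu>0$. Its pdf is given by:
    $$f(y)=2\nu\int_0^1u^{\nu-1}\phi\left(y;\mu,\frac{\sigma^2}{u}\right)du\;\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right),\qquad y\in\mathbb{R}. \tag{8}$$
    The skew slash distribution reduces to the SN distribution when $\nu\to\infty$. It is easy to see that $U\mid Y=y\sim\mathrm{TGamma}_{(0,1)}(\nu+1/2,d/2)$, where $\mathrm{TGamma}_{(t_1,t_2)}(a,b)$ denotes the $\mathrm{Gamma}(a,b)$ distribution truncated to the interval $(t_1,t_2)$.
  • The skew contaminated normal distribution (SCN), denoted by $\mathrm{SCN}(\mu,\sigma^2,\lambda,\nu,\gamma)$, with $\tau=(\nu,\gamma)$, $0<\nu<1$ and $0<\gamma<1$. Here, $\kappa(u)=1/u$ and U is a discrete random variable taking one of two states. The probability function of U and the pdf of Y are given by:
    $$h(u;\tau)=\nu\,I(u=\gamma)+(1-\nu)\,I(u=1)\quad\text{and}\quad f(y)=2\left\{\nu\,\phi\left(y;\mu,\frac{\sigma^2}{\gamma}\right)+(1-\nu)\,\phi(y;\mu,\sigma^2)\right\}\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right). \tag{9}$$
    The skew contaminated normal distribution reduces to the SN distribution when $\gamma=1$. The conditional distribution of $U\mid Y=y$ is given by:
    $$f(u\mid Y=y)=\frac{1}{f_0(y)}\left\{\nu\,\phi\left(y;\mu,\frac{\sigma^2}{\gamma}\right)I(u=\gamma)+(1-\nu)\,\phi(y;\mu,\sigma^2)\,I(u=1)\right\},$$
    where $f_0(y)=\nu\,\phi(y;\mu,\sigma^2/\gamma)+(1-\nu)\,\phi(y;\mu,\sigma^2)$.
  • The skew power-exponential distribution (SPE), denoted by $\mathrm{SPE}(\mu,\sigma^2,\lambda,\nu)$, with $\tau=\nu$ and $0.5<\nu\leq1$, has pdf given by:
    $$f(y)=\frac{2\nu}{2^{1/(2\nu)}\,\sigma\,\Gamma\left(\frac{1}{2\nu}\right)}\,e^{-d^{\nu}/2}\,\Phi\!\left(\lambda\,\frac{y-\mu}{\sigma}\right), \tag{10}$$
    which reduces to the SN distribution when $\nu=1$. Although the conditional distribution of $U\mid Y=y$ is not known, Ferreira et al. [15] showed that:
    $$E\left[\kappa^{-1}(U)\mid Y=y\right]=\nu\,d^{\,\nu-1}. \tag{11}$$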
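As a sanity check on densities (7), (9) and (10), each should integrate to 1 for any λ, because $2f_0(y)\Phi(\lambda(y-\mu)/\sigma)$ is a proper density whenever $f_0$ is symmetric about μ. The sketch below evaluates them in plain Python and integrates them with a trapezoid rule on a wide finite grid. The function names and parameter values (λ = 3; ν = 3 for StN; (ν, γ) = (0.1, 0.2) for SCN; ν = 0.7 for SPE) are arbitrary illustrative choices. The SSL density (8) is left out because it needs an extra one-dimensional integral.

```python
import math


def norm_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def skew_factor(y, mu, sigma2, lam):
    """Phi(lam * (y - mu) / sigma), the factor shared by Equations (7)-(10)."""
    return 0.5 * math.erfc(-lam * (y - mu) / math.sqrt(2.0 * sigma2))


def dstn(y, mu, sigma2, lam, nu):  # Equation (7)
    d = (y - mu) ** 2 / sigma2
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi * sigma2))
    return 2.0 * math.exp(log_c) * (1.0 + d / nu) ** (-(nu + 1) / 2) * skew_factor(y, mu, sigma2, lam)


def dscn(y, mu, sigma2, lam, nu, gamma):  # Equation (9)
    f0 = nu * norm_pdf(y, mu, sigma2 / gamma) + (1 - nu) * norm_pdf(y, mu, sigma2)
    return 2.0 * f0 * skew_factor(y, mu, sigma2, lam)


def dspe(y, mu, sigma2, lam, nu):  # Equation (10), 0.5 < nu <= 1
    d = (y - mu) ** 2 / sigma2
    c = nu / (2 ** (1 / (2 * nu)) * math.sqrt(sigma2) * math.gamma(1 / (2 * nu)))
    return 2.0 * c * math.exp(-d**nu / 2) * skew_factor(y, mu, sigma2, lam)


def trapezoid(f, a, b, m=100_000):
    """Composite trapezoid rule for the integral of f over [a, b]."""
    h = (b - a) / m
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, m)))


mass_stn = trapezoid(lambda y: dstn(y, 0.0, 1.0, 3.0, 3.0), -200.0, 200.0)
mass_scn = trapezoid(lambda y: dscn(y, 0.0, 1.0, 3.0, 0.1, 0.2), -200.0, 200.0)
mass_spe = trapezoid(lambda y: dspe(y, 0.0, 1.0, 3.0, 0.7), -200.0, 200.0)
print(mass_stn, mass_scn, mass_spe)  # each should be very close to 1
```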

3. The model and the EM algorithm for ML estimation

3.1. The model

The SSMN-HNLM is defined by:

$$Y_i=\eta_i+\varepsilon_i,\quad i=1,\ldots,n,\qquad \varepsilon_i\sim\mathrm{SSMN}(0,\sigma^2m_i,\lambda;H,\kappa), \tag{12}$$

where $Y_i$ is the response variable, $x_i$ is a known $p\times1$ covariate vector, $\eta_i=\eta(\beta,x_i)$ is the nonlinear predictor, $\eta(\cdot)$ is an injective and twice continuously differentiable function with respect to the vector of unknown regression coefficients $\beta=(\beta_1,\ldots,\beta_p)^\top$, $m_i=m(\rho,z_i)$ is a known positive, continuously differentiable function, $z_i$ contains values of explanatory variables, which generally (although not necessarily) constitute a subset of $x_i$, and ρ is a vector of unknown parameters (see [19,32] for more details). We assume that there is a unique value $\rho=\rho_0$ such that $m_i(\rho_0,z_i)=1$ for all $i=1,\ldots,n$.

Using Equations (4) and (5), the observed-data log-likelihood function of the parameter vector $\theta=(\beta^\top,\sigma^2,\lambda,\rho^\top,\tau^\top)^\top$ can be expressed as:

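$$\ell(\theta\mid\mathbf{y})=\sum_{i=1}^{n}\log\left\{2\int_0^\infty\phi\left(y_i;\eta_i,\sigma^2m_i\kappa(u_i)\right)h(u_i;\tau)\,du_i\;\Phi\!\left(\frac{\lambda(y_i-\eta_i)}{\sigma m_i^{1/2}}\right)\right\}, \tag{13}$$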

where $h(\cdot;\tau)$ is the pdf of U and $\mathbf{y}=(y_1,\ldots,y_n)^\top$ is the vector of observed values of the response variable Y.
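For the StN member, the integral in (13) has the closed form (7), so the observed-data log-likelihood can be evaluated directly. The Python sketch below relies on that closed form. `loglik_stn_hnlm`, the callables `eta` and `m`, and the toy data are hypothetical; the example predictor and scale function are those of model (20) in Section 6, with made-up parameter and response values that are not the ultrasonic data.

```python
import math


def log_dstn(y, mu, sigma2, lam, nu):
    """Log of the StN density (7), i.e. the closed form of one term of (13)."""
    d = (y - mu) ** 2 / sigma2
    log_f0 = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
              - 0.5 * math.log(nu * math.pi * sigma2)
              - (nu + 1) / 2 * math.log1p(d / nu))
    skew = 0.5 * math.erfc(-lam * (y - mu) / math.sqrt(2.0 * sigma2))  # Phi(lam*(y-mu)/sigma)
    return math.log(2.0) + log_f0 + math.log(skew)


def loglik_stn_hnlm(beta, sigma2, lam, rho, nu, y, x, z, eta, m):
    """Observed-data log-likelihood (13) of an StN-HNLM.

    eta(beta, x_i) is the nonlinear predictor and m(rho, z_i) the scale
    function, so the i-th scale parameter is sigma2 * m(rho, z_i).
    """
    return sum(log_dstn(yi, eta(beta, xi), sigma2 * m(rho, zi), lam, nu)
               for yi, xi, zi in zip(y, x, z))


# Toy example: predictor of model (20) and m(rho, z) = z ** rho.
eta20 = lambda b, x: math.exp(-b[0] * x) / (b[1] + b[2] * x)
m_pow = lambda rho, z: z ** rho
x_toy = [0.5, 1.0, 2.0, 4.0]
y_toy = [80.0, 65.0, 42.0, 18.0]  # made-up responses
ll_toy = loglik_stn_hnlm((0.19, 0.006, 0.012), 11.2, 0.65, -1.03, 3.8,
                         y_toy, x_toy, x_toy, eta20, m_pow)
print(ll_toy)
```

Using the closed form avoids numerical integration; for the SSL member the inner integral of (13) would have to be evaluated numerically or through the incomplete gamma function.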

3.2. The ECME algorithm for the SSMN-HNLM model

Note that it is not possible to obtain an analytical solution for the ML estimates of θ by maximizing $\ell(\theta\mid\mathbf{y})$ directly. The hierarchical representation of the SSMN distributions (see Equations (2) and (6)) enables the construction of an EM-type algorithm [13] for ML estimation of the SSMN-HNLM. When the M-step of the EM algorithm turns out to be analytically intractable, it can be replaced with a sequence of conditional maximization (CM) steps, leading to the ECM algorithm [27]. Another option is the ECME algorithm [25], a faster extension of the EM and ECM algorithms, in which some CM steps maximize the constrained Q-function (the expected complete-data log-likelihood) and others, called CML steps, maximize the corresponding constrained actual marginal likelihood function.

In this section, we demonstrate how to employ the ECME algorithm for ML estimation of the SSMN-HNLM model. From Equations (2) and (6), the following hierarchical representation for Yi can be obtained:

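$$Y_i\mid U_i=u_i,T_i=t_i\overset{\text{ind}}{\sim}\mathrm{N}\left(\eta_i+\frac{\sigma\lambda m_i^{1/2}\kappa(u_i)}{\left(1+\lambda^2\kappa(u_i)\right)^{1/2}}\,t_i,\ \frac{\sigma^2m_i\kappa(u_i)}{1+\lambda^2\kappa(u_i)}\right),\qquad U_i\overset{\text{iid}}{\sim}H(\tau),\qquad T_i\overset{\text{iid}}{\sim}\mathrm{TN}_{(0,\infty)}(0,1),\quad i=1,\ldots,n, \tag{14}$$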

where $\mathrm{TN}_{(0,\infty)}(\mu,\sigma^2)$ denotes the $\mathrm{N}(\mu,\sigma^2)$ distribution truncated to the interval $(0,\infty)$.

Using Lemma 1 presented by Ferreira and Lachos [16], and after some algebraic manipulations, the joint distribution of (Yi,Ui,Ti) can be written as:

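$$f(y_i,u_i,t_i)=2\,\phi\left(y_i;\eta_i,\sigma^2m_i\kappa(u_i)\right)\phi\left(t_i;\lambda(y_i-\eta_i),\sigma^2m_i\right)h(u_i;\tau),\qquad t_i>0,\ u_i>0,\ y_i\in\mathbb{R}.$$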

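Let $\mathbf{u}=(u_1,\ldots,u_n)^\top$ and $\mathbf{t}=(t_1,\ldots,t_n)^\top$. Treating $\mathbf{u}$ and $\mathbf{t}$ as missing data, the complete-data log-likelihood function associated with $\mathbf{y}_c=(\mathbf{y}^\top,\mathbf{u}^\top,\mathbf{t}^\top)^\top$ is given by: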

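$$\ell_c(\theta;\mathbf{y}_c)=C+\sum_{i=1}^{n}\left\{\log h(u_i;\tau)-\log\sigma^2-\log m_i-\frac{1}{2\sigma^2m_i}\left[\left(\kappa^{-1}(u_i)+\lambda^2\right)(y_i-\eta_i)^2-2\lambda t_i(y_i-\eta_i)+t_i^2\right]\right\},$$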

where C is a constant not depending on unknown parameters θ.

Given the current estimate $\hat\theta^{(k)}=(\hat\beta^{(k)\top},\hat\sigma^{2(k)},\hat\lambda^{(k)},\hat\rho^{(k)\top},\hat\tau^{(k)\top})^\top$, the E-step calculates the function

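$$Q(\theta\mid\hat\theta^{(k)})=E\left[\ell_c(\theta;\mathbf{y}_c)\mid\mathbf{y},\hat\theta^{(k)}\right]=\sum_{i=1}^{n}Q_{1i}(\theta_1\mid\hat\theta^{(k)})+\sum_{i=1}^{n}Q_{2i}(\tau\mid\hat\theta^{(k)}), \tag{15}$$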

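with $\theta_1=(\beta^\top,\sigma^2,\lambda,\rho^\top)^\top$ and

$$\begin{aligned}Q_{1i}(\theta_1\mid\hat\theta^{(k)})&=C-\log\sigma^2-\log m_i-\frac{m_i^{-1}}{2\sigma^2}\left\{\left(\hat\kappa_i^{(k)}+\lambda^2\right)(y_i-\eta_i)^2-2\lambda\,\hat t_i^{(k)}(y_i-\eta_i)+\widehat{t^2_i}^{(k)}\right\},\\ Q_{2i}(\tau\mid\hat\theta^{(k)})&=E\left[\log h(U_i;\tau)\mid y_i,\hat\theta^{(k)}\right];\end{aligned}$$

see Appendix 2 for more details.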

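It is important to note that these functions require expressions for $\hat\kappa_i^{(k)}=E[\kappa^{-1}(U_i)\mid y_i,\hat\theta^{(k)}]$, $\hat t_i^{(k)}=E[T_i\mid y_i,\hat\theta^{(k)}]$ and $\widehat{t^2_i}^{(k)}=E[T_i^2\mid y_i,\hat\theta^{(k)}]$.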

As presented by Ferreira and Lachos [16], $T_i\mid Y_i=y_i\sim\mathrm{TN}_{(0,\infty)}\left(\lambda(y_i-\eta_i),\sigma^2m_i\right)$, and the expectations $\hat t_i^{(k)}$ and $\widehat{t^2_i}^{(k)}$ can be readily evaluated as:

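$$\hat t_i^{(k)}=\hat\lambda^{(k)}\hat e_i^{(k)}+\hat\sigma^{(k)}\left(\hat m_i^{(k)}\right)^{1/2}W_\Phi\!\left(\frac{\hat\lambda^{(k)}\hat e_i^{(k)}}{\hat\sigma^{(k)}(\hat m_i^{(k)})^{1/2}}\right), \tag{16}$$
$$\widehat{t^2_i}^{(k)}=\left(\hat\lambda^{(k)}\hat e_i^{(k)}\right)^2+\hat\sigma^{2(k)}\hat m_i^{(k)}+\hat\lambda^{(k)}\hat\sigma^{(k)}\left(\hat m_i^{(k)}\right)^{1/2}\hat e_i^{(k)}\,W_\Phi\!\left(\frac{\hat\lambda^{(k)}\hat e_i^{(k)}}{\hat\sigma^{(k)}(\hat m_i^{(k)})^{1/2}}\right), \tag{17}$$

where $W_\Phi(u)=\phi(u)/\Phi(u)$, $\hat m_i^{(k)}=m_i(\hat\rho^{(k)},z_i)$, $\hat\eta_i^{(k)}=\eta(\hat\beta^{(k)},x_i)$ and $\hat e_i^{(k)}=y_i-\hat\eta_i^{(k)}$ for $i=1,\ldots,n$.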
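Equations (16)–(17) are the first two moments of a normal distribution with mean $\lambda\hat e_i$ and variance $\hat\sigma^2\hat m_i$ truncated to $(0,\infty)$. Below is a minimal Python sketch for a single observation. `w_phi` and `e_step_t` are hypothetical names, and the asymptotic branch of `w_phi` is a numerical safeguard added here, not something stated in the paper.

```python
import math


def w_phi(u):
    """W_Phi(u) = phi(u) / Phi(u), the inverse Mills ratio."""
    if u < -30.0:
        # Asymptotic expansion; avoids 0/0 when Phi(u) underflows.
        return -u - 1.0 / u
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(-u / math.sqrt(2.0)))


def e_step_t(y, eta, sigma2, m, lam):
    """E[T_i | y_i] and E[T_i^2 | y_i] from Equations (16)-(17)."""
    e = y - eta
    s = math.sqrt(sigma2 * m)  # sigma * m_i ** 0.5
    mu_t = lam * e             # location of the truncated normal
    w = w_phi(mu_t / s)
    t1 = mu_t + s * w
    t2 = mu_t ** 2 + s ** 2 + mu_t * s * w
    return t1, t2


print(e_step_t(y=1.5, eta=0.0, sigma2=0.64, m=1.0, lam=1.0))
```

With λ = 0 the truncated normal becomes a half-normal, so the sketch returns $\hat t_i=\hat\sigma\hat m_i^{1/2}\sqrt{2/\pi}$ and $\widehat{t^2_i}=\hat\sigma^2\hat m_i$, a convenient special case for checking.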

Defining $\hat d_i^{(k)}=(y_i-\hat\eta_i^{(k)})^2/(\hat\sigma^{2(k)}\hat m_i^{(k)})$, as in [15], we obtain computationally attractive expressions for $\hat\kappa_i^{(k)}$ for the different SSMN distributions, as presented in Table 1.

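Table 1. $\hat\kappa_i^{(k)}=E[\kappa^{-1}(U_i)\mid y_i,\hat\theta^{(k)}]$ for different SSMN distributions.

Distribution $\hat\kappa_i^{(k)}$
SN $1$
StN $\dfrac{\hat\nu^{(k)}+1}{\hat\nu^{(k)}+\hat d_i^{(k)}}$
SSL $\dfrac{2\hat\nu^{(k)}+1}{\hat d_i^{(k)}}\,\dfrac{P_1(\hat\nu^{(k)}+3/2,\ \hat d_i^{(k)}/2)}{P_1(\hat\nu^{(k)}+1/2,\ \hat d_i^{(k)}/2)}$
SCN $\dfrac{1-\hat\nu^{(k)}+\hat\nu^{(k)}(\hat\gamma^{(k)})^{3/2}\exp\{(1-\hat\gamma^{(k)})\hat d_i^{(k)}/2\}}{1-\hat\nu^{(k)}+\hat\nu^{(k)}(\hat\gamma^{(k)})^{1/2}\exp\{(1-\hat\gamma^{(k)})\hat d_i^{(k)}/2\}}$
SPE $\hat\nu^{(k)}\,(\hat d_i^{(k)})^{\hat\nu^{(k)}-1}$

Note: $P_x(a,b)$ denotes the cdf of the $\mathrm{Gamma}(a,b)$ distribution evaluated at x.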
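The Table 1 weights are cheap to compute once $\hat d_i$ is available. The sketch below is a hypothetical Python implementation, not the authors' code. The SSL row needs the regularized lower incomplete gamma function, implemented here with its standard power series. The SCN row is rewritten by multiplying numerator and denominator by $\exp\{-(1-\gamma)d/2\}$, which leaves the value unchanged and only avoids overflow for large d. The sketch assumes d > 0, since the SSL and SPE (ν < 1) rows are undefined at d = 0.

```python
import math


def reg_lower_gamma(a, x, tol=1e-15, max_iter=100_000):
    """Regularized lower incomplete gamma P(a, x), via its power series.

    P(a, x) is the cdf at x of a Gamma(shape a, rate 1) variable, so the
    table's P_1(a, b) (cdf at 1 of Gamma(a, rate b)) equals P(a, b).
    """
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > tol * total and n < max_iter:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def kappa_hat(dist, d, nu=None, gamma=None):
    """E[kappa^{-1}(U_i) | y_i] from Table 1, given d = d_i > 0."""
    if dist == "SN":
        return 1.0
    if dist == "StN":
        return (nu + 1.0) / (nu + d)
    if dist == "SSL":
        return ((2.0 * nu + 1.0) / d) * (reg_lower_gamma(nu + 1.5, d / 2.0)
                                         / reg_lower_gamma(nu + 0.5, d / 2.0))
    if dist == "SCN":
        r = (1.0 - nu) * math.exp(-(1.0 - gamma) * d / 2.0)
        return (r + nu * gamma**1.5) / (r + nu * gamma**0.5)
    if dist == "SPE":
        return nu * d ** (nu - 1.0)
    raise ValueError(f"unknown distribution: {dist}")


for name, kw in [("StN", {"nu": 3.0}), ("SSL", {"nu": 1.5}),
                 ("SCN", {"nu": 0.1, "gamma": 0.2}), ("SPE", {"nu": 0.7})]:
    print(name, [round(kappa_hat(name, d, **kw), 4) for d in (0.5, 2.0, 10.0)])
```

All heavy-tailed rows decrease in d, which is how the algorithm down-weights observations with large standardized residuals.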

The CM-steps then conditionally maximize $Q(\theta\mid\hat\theta^{(k)})$ with respect to θ, yielding a new estimate $\hat\theta^{(k+1)}$. The algorithm is summarized as follows:

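  • E-step: For $i=1,\ldots,n$, compute $\hat t_i^{(k)}$ and $\widehat{t^2_i}^{(k)}$ using Equations (16)–(17), and $\hat\kappa_i^{(k)}$ from Table 1.

  • CM-step:

    Update $\hat\beta^{(k)}$, $\hat\sigma^{2(k)}$, $\hat\lambda^{(k)}$ and $\hat\rho^{(k)}$ as
    $$\begin{aligned}
    \hat\beta^{(k+1)}&=\arg\min_{\beta}\left\{\left(\hat{\mathbf b}^{(k)}-\boldsymbol\eta(\beta,\mathbf x)\right)^\top D(\hat{\mathbf a}^{(k)})\left(\hat{\mathbf b}^{(k)}-\boldsymbol\eta(\beta,\mathbf x)\right)\right\},\\
    \hat\sigma^{2(k+1)}&=\frac{1}{2n}\sum_{i=1}^{n}\frac{1}{\hat m_i^{(k)}}\left[\left(\hat\kappa_i^{(k)}+(\hat\lambda^{(k)})^2\right)(y_i-\hat\eta_i^{(k+1)})^2-2\hat\lambda^{(k)}\hat t_i^{(k)}(y_i-\hat\eta_i^{(k+1)})+\widehat{t^2_i}^{(k)}\right],\\
    \hat\lambda^{(k+1)}&=\frac{\sum_{i=1}^{n}(\hat m_i^{(k)})^{-1}\,\hat t_i^{(k)}(y_i-\hat\eta_i^{(k+1)})}{\sum_{i=1}^{n}(\hat m_i^{(k)})^{-1}(y_i-\hat\eta_i^{(k+1)})^2},\\
    \hat\rho^{(k+1)}&=\arg\min_{\rho}\sum_{i=1}^{n}\left\{\log m_i(\rho)+\frac{m_i^{-1}(\rho)}{2\hat\sigma^{2(k+1)}}\left[\left(\hat\kappa_i^{(k)}+(\hat\lambda^{(k+1)})^2\right)(y_i-\hat\eta_i^{(k+1)})^2-2\hat\lambda^{(k+1)}\hat t_i^{(k)}(y_i-\hat\eta_i^{(k+1)})+\widehat{t^2_i}^{(k)}\right]\right\},
    \end{aligned}$$
    where $\hat{\mathbf a}^{(k)}=(\hat a_1^{(k)},\ldots,\hat a_n^{(k)})^\top$ with $\hat a_i^{(k)}=(\hat m_i^{(k)})^{-1}\left(\hat\kappa_i^{(k)}+(\hat\lambda^{(k)})^2\right)$, $D(\cdot)$ denotes a diagonal matrix, $\hat{\mathbf b}^{(k)}=(\hat b_1^{(k)},\ldots,\hat b_n^{(k)})^\top$ is the corrected observed response with $\hat b_i^{(k)}=y_i-\hat\lambda^{(k)}\hat t_i^{(k)}/\left(\hat\kappa_i^{(k)}+(\hat\lambda^{(k)})^2\right)$, $\hat m_i^{(k)}=m_i(\hat\rho^{(k)},z_i)$, $\boldsymbol\eta(\beta,\mathbf x)=(\eta(\beta,x_1),\ldots,\eta(\beta,x_n))^\top$ and $\hat\eta_i^{(k)}=\eta(\hat\beta^{(k)},x_i)$.
  • CML-step: Given $\hat\beta^{(k+1)}$, $\hat\rho^{(k+1)}$, $\hat\sigma^{2(k+1)}$ and $\hat\lambda^{(k+1)}$, obtain $\hat\tau^{(k+1)}$ by maximizing the constrained log-likelihood function, i.e.
    $$\hat\tau^{(k+1)}=\arg\max_{\tau}\left\{\sum_{i=1}^{n}\log f_0\left(y_i;\hat\eta_i^{(k+1)},\hat\sigma^{2(k+1)}\hat m_i^{(k+1)},\tau\right)\right\}, \tag{18}$$
    where $f_0(\cdot)$ is the respective symmetric pdf as defined in (4).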
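The σ² and λ updates of the CM-step are closed-form maximizers of $\sum_i Q_{1i}$ for fixed β and ρ, and the β update is a weighted nonlinear least-squares problem in the corrected responses $\hat b_i$ with weights $\hat a_i$. The plain-Python sketch below, with hypothetical helper names, computes these quantities from E-step output; `q1_sum` evaluates $\sum_i Q_{1i}$ without the constant C so that the maximization property can be checked. The nonlinear β and ρ minimizations themselves would be handed to a generic optimizer and are not shown.

```python
import math


def cm_updates(e, m, k_hat, t1, t2, lam_old):
    """Closed-form sigma^2 and lambda updates of the CM-step.

    e[i] = y_i - eta_i^{(k+1)}, m[i] = m_i^{(k)}; k_hat, t1, t2 come from the E-step.
    """
    n = len(e)
    sigma2 = sum(((k + lam_old**2) * ei**2 - 2.0 * lam_old * ti * ei + t2i) / mi
                 for ei, mi, k, ti, t2i in zip(e, m, k_hat, t1, t2)) / (2.0 * n)
    lam = (sum(ti * ei / mi for ei, mi, ti in zip(e, m, t1))
           / sum(ei**2 / mi for ei, mi in zip(e, m)))
    return sigma2, lam


def pseudo_response(y, m, k_hat, t1, lam):
    """Weights a_i and corrected responses b_i used in the beta update."""
    a = [(k + lam**2) / mi for k, mi in zip(k_hat, m)]
    b = [yi - lam * ti / (k + lam**2) for yi, ti, k in zip(y, t1, k_hat)]
    return a, b


def q1_sum(sigma2, lam, e, m, k_hat, t1, t2):
    """sum_i Q_1i without the constant C, for fixed beta and rho."""
    return sum(-math.log(sigma2) - math.log(mi)
               - ((k + lam**2) * ei**2 - 2.0 * lam * ti * ei + t2i) / (2.0 * sigma2 * mi)
               for ei, mi, k, ti, t2i in zip(e, m, k_hat, t1, t2))


# Toy E-step output for four observations (made-up values with t2 >= t1**2).
e = [0.5, -1.0, 2.0, 0.3]
m = [1.0, 2.0, 0.5, 1.5]
k_hat = [1.0] * 4
t1 = [0.8, 0.3, 1.9, 0.6]
t2 = [1.0, 0.2, 4.0, 0.5]
s2_new, lam_new = cm_updates(e, m, k_hat, t1, t2, lam_old=1.0)
print(s2_new, lam_new)
```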

Following the ECME algorithm [25], the CML-step replaces the usual M-step for τ by a step that maximizes the restricted actual log-likelihood function. Since the skewing factor $\Phi(\cdot)$ in (5) does not depend on τ, this is equivalent to maximizing the log-likelihood of the symmetric part $f_0$, as in (18). This step requires a one-dimensional search for the StN, SSL and SPE models and a two-dimensional search for the SCN model, which can easily be carried out using, for example, the ‘optim/optimize’ routines in R [29].
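For the StN model, $f_0$ is a Student-t density with scale $\hat\sigma^2\hat m_i$, so in terms of the standardized residuals $r_i$ the CML-step (18) is a one-dimensional search over ν. The sketch below uses a golden-section search as a stand-in for R's `optimize`; this is an assumption about tooling, not the authors' code. The search interval [1, 100] and the simulated residuals are arbitrary illustrative choices.

```python
import math
import random


def t_logpdf(r, nu):
    """Log-density of a standard Student-t variable with nu degrees of freedom."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - (nu + 1) / 2 * math.log1p(r * r / nu))


def golden_max(f, lo, hi, tol=1e-5):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:              # maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                    # maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def cml_nu_stn(r, lo=1.0, hi=100.0):
    """CML-step (18) for the StN model.

    r[i] = (y_i - eta_i) / (sigma * m_i ** 0.5) are standardized residuals at
    the current CM-step values; the term -sum(log(sigma * m_i ** 0.5)) does
    not depend on nu and is dropped.
    """
    return golden_max(lambda nu: sum(t_logpdf(ri, nu) for ri in r), lo, hi)


rng = random.Random(11)
nu_true = 4.0
r = [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu_true / 2, 2 / nu_true))
     for _ in range(5000)]
nu_hat = cml_nu_stn(r)
print(round(nu_hat, 2))  # should land near nu_true
```

Golden-section search assumes the profile log-likelihood in ν is unimodal on the chosen interval; for the SCN model a two-dimensional optimizer would be needed instead.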

The iterations of the above algorithm are repeated until a suitable convergence rule is satisfied, e.g. until $\|\hat\theta^{(k+1)}-\hat\theta^{(k)}\|$ or $|\ell(\hat\theta^{(k+1)}\mid\mathbf{y})-\ell(\hat\theta^{(k)}\mid\mathbf{y})|$ is sufficiently small, say $10^{-6}$.

3.3. Notes on implementation

Although EM-type algorithms tend to be robust with respect to the choice of starting values, they may fail to converge when the initial values are far from good ones. Thus, the choice of adequate starting values plays an important role in parameter estimation. A set of reasonable values can be obtained by computing $\hat\beta^{(0)}$ and $\hat\sigma^{2(0)}$ from a standard nonlinear least squares (NLS) fit, using the ‘nls’ routine in R [29], and then taking $\hat\lambda^{(0)}$ as the skewness coefficient of the NLS residuals. For $\hat\rho^{(0)}$, one can use the value $\rho_0$ such that $m_i(\rho_0,z_i)=1$ for all $i=1,\ldots,n$.
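The paper does not specify which skewness estimator is used for $\hat\lambda^{(0)}$; the sketch below assumes the moment coefficient $g_1=m_3/m_2^{3/2}$ of the NLS residuals. The helper name is hypothetical.

```python
def sample_skewness(res):
    """Moment coefficient of skewness g1 = m3 / m2 ** 1.5 of a list of residuals."""
    n = len(res)
    mean = sum(res) / n
    m2 = sum((r - mean) ** 2 for r in res) / n
    m3 = sum((r - mean) ** 3 for r in res) / n
    return m3 / m2 ** 1.5


print(sample_skewness([0.0, 0.0, 0.0, 1.0]))  # right-skewed toy residuals
```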

4. Likelihood ratio test for homogeneity of variance

The SSMN-HNLM defined in Equation (12) assumes that the scale is not constant, with scale parameter $\sigma_i^2=\sigma^2m_i$, where $m_i=m(\rho,z_i)$. In addition, it is assumed that there is a unique value $\rho=\rho_0$ such that $m_i(\rho_0,z_i)=1$ for all $i=1,\ldots,n$. Thus, the test for homogeneity of the scale parameter in model (12) can be expressed as:

$$H_0:\rho=\rho_0\quad\text{vs}\quad H_1:\rho\neq\rho_0.$$

In this work, we use the likelihood ratio (LR) statistic $LR=2\{\ell(\hat\theta\mid\mathbf{y})-\ell(\hat\theta_0\mid\mathbf{y})\}$ to test $H_0$, where $\ell(\theta\mid\mathbf{y})$ denotes the observed-data log-likelihood function given by Equation (13), and $\hat\theta$ and $\hat\theta_0$ are the ML estimates obtained with the ECME algorithm under $H_1$ and $H_0$, respectively. Under $H_0$, the LR statistic has an asymptotic $\chi^2_q$ distribution, where q is the dimension of ρ. To analyze the empirical distribution and power of the LR test, we carry out two simulation studies.
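With a scalar ρ (q = 1), the p-value of the LR test follows from the standard identity $\Pr(\chi^2_1>x)=\operatorname{erfc}(\sqrt{x/2})$. A minimal Python sketch with hypothetical helper names; for q > 1 a general chi-square survival function would be needed instead.

```python
import math


def lr_statistic(loglik_h1, loglik_h0):
    """LR = 2 * (l(theta_hat | y) - l(theta_hat_0 | y))."""
    return 2.0 * (loglik_h1 - loglik_h0)


def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-square(1) statistic: erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


print(chi2_1_pvalue(3.841458820694124))  # the 5% critical value gives 0.05
print(chi2_1_pvalue(32.232))             # LR value reported in Section 6 for the StN model
```

The second call shows that the LR value reported for the ultrasonic data corresponds to a p-value far below 0.001.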

4.1. Simulation studies

In this section, we examine the adequacy of the asymptotic distribution of the LR test statistic and the power of the test. First, we compare the empirical distribution of the statistic with the theoretical $\chi^2$ distribution via Monte Carlo simulations. Second, we investigate the power of the LR test over a grid of values of ρ.

4.1.1. The empirical distributions of LR test statistics

The adequacy of the asymptotic distribution of the LR test statistic is examined following the procedure described in [19,33]. The model used in this simulation study is

$$Y_i=e^{\beta x_i}+\varepsilon_i,\qquad i=1,\ldots,n, \tag{19}$$

where $\varepsilon_i\overset{\text{ind}}{\sim}\mathrm{SSMN}(0,\sigma_i^2,\lambda;H,\kappa)$, with $\sigma_i^2=\sigma^2e^{\rho x_i}$. The covariate $x_i$ is generated from a uniform distribution on the interval (0.2, 2). The parameter values are set at $\beta=2$, $\sigma^2=0.5$ and $\lambda=3$. The values of ν are chosen to produce heavy tails, with $\nu=3$ for both StN and SSL.
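The data-generating step of this simulation can be sketched as follows for the SN case, using representation (2) for the errors. This is an illustrative Python helper with a hypothetical name; passing a fixed `x` reproduces the design in which the covariate values are held fixed across replications.

```python
import math
import random


def simulate_model19(n, beta=2.0, sigma2=0.5, lam=3.0, rho=0.0, x=None, rng=None):
    """Draw (x, y, sigma_i^2) from model (19) with SN errors.

    x_i ~ U(0.2, 2) unless x is given, sigma_i^2 = sigma2 * exp(rho * x_i) and
    eps_i ~ SN(0, sigma_i^2, lam), generated through representation (2).
    """
    rng = rng or random.Random()
    if x is None:
        x = [rng.uniform(0.2, 2.0) for _ in range(n)]
    delta = lam / math.sqrt(1.0 + lam * lam)
    s2 = [sigma2 * math.exp(rho * xi) for xi in x]
    y = []
    for xi, s2i in zip(x, s2):
        eps = math.sqrt(s2i) * (delta * abs(rng.gauss(0.0, 1.0))
                                + math.sqrt(1.0 - delta**2) * rng.gauss(0.0, 1.0))
        y.append(math.exp(beta * xi) + eps)
    return x, y, s2


x0, y0, s2_h0 = simulate_model19(30, rho=0.0, rng=random.Random(7))          # under H0
x1, y1, s2_h1 = simulate_model19(30, rho=1.0, x=x0, rng=random.Random(8))    # same x, rho = 1
```

Each replicate would then be fitted twice by the ECME algorithm (with ρ free and with ρ = 0) to obtain one LR value.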

We generate values of $Y_i$ from model (19) with the true parameter values and $\rho=0$ (under $H_0$), repeating this procedure 2000 times (the values of $x_i$ are kept fixed across replications). Then, using the 2000 values of the LR statistic, we obtain the empirical distribution functions (edf). Figure 1 compares the edf with the theoretical $\chi^2_1$ distribution for n = 30, 70 and 120 under the SN, StN and SSL models. As n increases, the edfs become very close to the theoretical distribution for all the distributions considered in our study.

Figure 1.

Simulated comparisons between the empirical distributions of the LR statistic and the $\chi^2_1$ distribution, using SN (first row), StN (second row) and SSL (last row).

4.1.2. The power of the LR test

To study the power of the test, we use different values of n and ρ to obtain the simulated sizes and powers of the test statistic. We consider ρ = 0, 0.2, 0.4, 0.6, 0.8, 1 and n = 10, 20, 30, 50, 70, 90, 120 and 150. Each simulation is repeated 2000 times, and the proportion of replications in which the null hypothesis is rejected gives the simulated power. All statistics are compared with the $\chi^2_1$ critical value at the α = 0.05 level. Table 2 presents the rejection rates for the hypothesis $H_0:\rho=0$ based on the LR statistic for the SN, StN and SSL distributions. For ρ = 0, the rejection rate of the test approaches the nominal level as n increases. As n and ρ increase, the power of the test approaches 1 for all models. Figure 2 presents the rejection rate as ρ varies in [0, 1] and the sample size varies between 30 and 150 for each distribution.
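When reading Table 2, it helps to know the Monte Carlo error of each entry. With 2000 replications, an estimated rejection rate has binomial standard error $\sqrt{p(1-p)/2000}$, which at the nominal p = 0.05 is about 0.0049. The short sketch below computes this and the corresponding approximate 95% band around 0.05; it is a standard binomial calculation, not something reported in the paper.

```python
import math


def mc_standard_error(p, replications=2000):
    """Binomial Monte Carlo standard error of an estimated rejection rate."""
    return math.sqrt(p * (1.0 - p) / replications)


se = mc_standard_error(0.05)                  # about 0.0049
band = (0.05 - 1.96 * se, 0.05 + 1.96 * se)   # about (0.040, 0.060)
print(round(se, 5), [round(v, 4) for v in band])
```

For example, the SN rates for n ≥ 70 in Table 2 (0.0535, 0.0535, 0.0500, 0.0490) fall inside this band, whereas the StN rate of 0.1335 for n = 10 lies well outside it.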

Table 2. Rejection rate for H0:ρ=0 at the nominal level of 5% from the LR statistic for the SN, StN and SSL distributions.
n ρ=0.0 ρ=0.2 ρ=0.4 ρ=0.6 ρ=0.8 ρ=1.0
SN-NLM
10 0.0940 0.1025 0.0895 0.0895 0.0870 0.1220
20 0.0715 0.0680 0.0810 0.1050 0.1730 0.2340
30 0.0570 0.0610 0.0895 0.1590 0.2510 0.3920
50 0.0630 0.0790 0.1400 0.2675 0.4650 0.6475
70 0.0535 0.0810 0.1865 0.3655 0.5955 0.8140
90 0.0535 0.1075 0.2370 0.5085 0.7550 0.9105
120 0.0500 0.1175 0.3425 0.6440 0.8505 0.9585
150 0.0490 0.1230 0.4120 0.7305 0.9290 0.9925
StN-NLM
10 0.1335 0.1340 0.1375 0.1175 0.1285 0.1560
20 0.0760 0.0925 0.1075 0.1250 0.1615 0.2255
30 0.0735 0.1070 0.1095 0.1500 0.2215 0.3170
50 0.0715 0.1105 0.1375 0.2245 0.3285 0.4695
70 0.0600 0.0985 0.1580 0.2710 0.4455 0.6185
90 0.0520 0.1065 0.2005 0.3660 0.5190 0.7100
120 0.0540 0.1190 0.2455 0.4385 0.6665 0.8455
150 0.0465 0.1420 0.3090 0.5275 0.7640 0.9050
SSL-NLM
10 0.0805 0.0665 0.0630 0.0650 0.0615 0.0960
20 0.0545 0.0560 0.0425 0.0520 0.0760 0.0940
30 0.0540 0.0460 0.0380 0.1085 0.0890 0.2775
50 0.0415 0.0390 0.0795 0.1310 0.2410 0.4670
70 0.0445 0.0460 0.1160 0.2765 0.3060 0.6430
90 0.0450 0.0495 0.1180 0.2890 0.5245 0.7625
120 0.0430 0.0600 0.1490 0.4020 0.6310 0.9205
150 0.0375 0.0715 0.2190 0.5860 0.8070 0.9615
Figure 2.

Power of the LR test to detect heteroscedasticity over a range of ρ values and different sample sizes (n), considering the SN, StN and SSL error distributions.

4.1.3. Study of misspecification of the structure function

As suggested by a referee, we report here a simulation study to analyze the influence of misspecification of the structure function. We simulate from model (12) with $\eta_i(\beta,x_i)=\beta_1/(1+\beta_2e^{-\beta_3x_i})$, $x_i\sim U(0.1,15)$ and $m_i(\rho,z_i)=e^{\rho z_i}$ with $z_i=x_i$, $i=1,\ldots,n$. The true parameter values are set at $\beta=(37,43,0.6)^\top$, $\sigma^2=0.5$, $\lambda=3$, $\rho=0.1$ and $\nu=3$ (for StN and SSL). We fit the following structure functions for $m_i$: (1) $m_i=1$ (homoscedasticity), (2) $m_i=\rho z_i$ (linear relation), (3) $m_i=e^{\rho z_i}$ (true relation) and (4) $m_i=z_i^{\rho}$. We generate 2000 Monte Carlo samples of size n = 300 and compute the coverage rate (CR), given by the proportion of 95% confidence intervals that contain the true parameter value, and the bias, given by the difference between the mean of the estimates and the true parameter value. CR values close to 95% and bias values close to 0 are expected. According to Table 3, the true structure function yields CRs close to the nominal 95% for all parameters and all distributions considered, with small bias for the regression parameters and σ². On the other hand, the other specifications of $m_i$ show relatively large bias and/or low CR for at least one parameter.
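Both summaries can be computed from the 2000 fitted values as sketched below. The paper does not state how the confidence intervals are built; this hypothetical helper assumes Wald intervals, estimate ± 1.96 × SE, with SEs taken from the observed information matrix.

```python
def coverage_and_bias(estimates, std_errors, true_value, z=1.959964):
    """Coverage rate (%) of Wald intervals est +/- z*SE, and Monte Carlo bias."""
    covered = sum(1 for est, se in zip(estimates, std_errors)
                  if est - z * se <= true_value <= est + z * se)
    cr = 100.0 * covered / len(estimates)
    bias = sum(estimates) / len(estimates) - true_value
    return cr, bias


# Toy input: three replicates of one parameter whose true value is 1.0.
cr, bias = coverage_and_bias([0.9, 1.1, 1.5], [0.1, 0.1, 0.1], true_value=1.0)
print(cr, bias)  # two of three intervals cover the true value
```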

Table 3. Coverage rates (CR, %) of nominal 95% confidence intervals and bias for different structure functions, when the true structure function is $m_i=e^{0.1z_i}$ (true values of the parameters are in parentheses).
  $m_i=1$ $m_i=\rho z_i$ $m_i=e^{\rho z_i}$ $m_i=z_i^{\rho}$
Parameter CR bias CR bias CR bias CR bias
SN-HNLM
β1 (37) 86.40 −0.12 87.94 0.11 94.64 −0.02 92.20 −0.06
β2 (43) 90.08 −0.86 87.57 0.83 94.11 0.15 94.78 −0.06
β3 (0.6) 94.04 −1.6e−3 82.16 4.7e−3 94.26 5.2e−4 94.82 9.6e−4
σ² (0.5) 5.02 0.61 81.97 −0.12 92.76 1.4e−3 90.50 0.10
λ (3) 92.04 −0.01 74.32 −1.16 95.85 −0.25 96.22 −0.30
ρ (0.1) – – 13.63 0.72 94.24 −4.9e−4 18.47 0.26
StN-HNLM
β1 (37) 93.04 −0.09 98.38 −0.04 92.13 −0.01 90.51 −0.04
β2 (43) 95.11 −0.08 55.46 0.21 92.90 0.16 93.54 0.15
β3 (0.6) 94.75 1.4e−4 0.00 1.2e−3 93.26 5.7e−4 93.24 1.1e−3
σ² (0.5) 77.59 0.50 62.94 −0.01 92.58 0.04 93.06 0.06
λ (3) 93.56 0.04 27.73 −0.42 93.46 −0.49 93.71 −0.46
ρ (0.1) – – 85.65 0.35 94.38 −2.0e−4 45.51 0.31
ν (3) 96.85 −4.2e−4 31.06 0.43 96.56 0.65 96.06 0.57
SSL-HNLM
β1 (37) 79.16 −0.30 82.36 −0.10 96.14 −0.08 93.41 −0.08
β2 (43) 82.23 1.07 71.59 1.91 95.34 0.19 98.12 0.29
β3 (0.6) 56.83 3.9e−3 75.67 6.8e−3 95.05 1.7e−3 96.89 1.6e−3
σ² (0.5) 4.67 −0.05 0.00 41.68 95.54 0.04 94.91 0.04
λ (3) 3.42 1.47 57.73 0.55 97.09 −0.25 96.89 −0.29
ρ (0.1) – – 0.00 −0.10 95.15 0.31 24.28 0.31
ν (3) 0.00 −1.90 3.38 −0.96 92.62 3.89 90.21 4.63

4.1.4. Computational aspects

The simulation studies were run on a Linux server with 2 processors of 2.4 GHz, 12 cores, 24 threads and 32 GB of RAM. All computational procedures were coded and implemented in the statistical software R (R Core Team, 2018). For each procedure based on 2000 replicates, we used the parallel package of R. The run time of each simulation procedure varied between 7 and 40 minutes, depending on the sample size $n\in\{10,20,\ldots,150\}$ and the value of ρ used. We did not observe convergence problems or out-of-boundary estimates in our simulation study. The computer programs are available from the first author upon request.

5. Diagnostic analysis

Diagnostic techniques are used to detect observations that seriously influence the results of a statistical analysis. In the literature, there are basically two approaches to detecting influential observations. One is the case-deletion method [10], in which the impact of deleting an observation on the estimates is assessed directly by measures such as the likelihood distance and Cook's distance. The other is the local influence approach [11], a general technique used to assess the stability of the estimation outputs with respect to perturbations of the model inputs. Inspired by the results of Zhu et al. [36], Zhu and Lee [35] and Lee and Xu [23], we study case-deletion measures and local influence diagnostics for nonlinear regression models on the basis of the Q-function. The following subsections describe the background and details of these diagnostic methods.

5.1. The local influence approach

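Let $\ell_c(\theta\mid\mathbf{y}_c)$ and $\ell_c(\theta,\omega\mid\mathbf{y}_c)$, with $\theta\in\mathbb{R}^p$, denote the complete-data log-likelihood functions of the postulated model and of the perturbed model, respectively, where the perturbation vector ω varies in an open region $\Omega\subset\mathbb{R}^q$. We assume that there exists a vector $\omega_0$ such that $\ell_c(\theta,\omega_0\mid\mathbf{y}_c)=\ell_c(\theta\mid\mathbf{y}_c)$ for all θ. To assess the influence of the perturbations on the ML estimate $\hat\theta$, one may consider the Q-displacement function, defined as: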

$$g_Q(\omega)=2\left\{Q(\hat\theta\mid\hat\theta)-Q(\hat\theta_\omega\mid\hat\theta)\right\},$$

where $\hat\theta_\omega$ denotes the ML estimator in the perturbed model.

Following the approach developed in [11,35], the normal curvature for θ in the direction of some unit vector d is given by:

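$$C_{g_Q,\mathbf d}(\theta)=-2\,\mathbf d^\top\Delta_{\omega_0}^\top\left\{\ddot Q(\hat\theta\mid\hat\theta)\right\}^{-1}\Delta_{\omega_0}\mathbf d,$$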

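where $\ddot Q(\hat\theta\mid\hat\theta)=\partial^2Q(\theta\mid\hat\theta)/\partial\theta\,\partial\theta^\top\big|_{\theta=\hat\theta}$ and $\Delta_\omega=\partial^2Q(\theta,\omega\mid\hat\theta)/\partial\theta\,\partial\omega^\top\big|_{\theta=\hat\theta_\omega}$.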

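Let $(\lambda_1,\mathbf h_1),\ldots,(\lambda_n,\mathbf h_n)$ be the eigenvalue–eigenvector pairs of the matrix $-2\ddot Q_{\omega_0}=-2\Delta_{\omega_0}^\top\{\ddot Q(\hat\theta\mid\hat\theta)\}^{-1}\Delta_{\omega_0}$, with $\lambda_1\geq\cdots\geq\lambda_q>0$ and $\lambda_{q+1}=\cdots=\lambda_n=0$, and let $\bar\lambda_k=\lambda_k/(\lambda_1+\cdots+\lambda_q)$ and $\mathbf h_k^2=(h_{k1}^2,\ldots,h_{kn}^2)^\top$. The aggregated contribution vector of all eigenvectors corresponding to nonzero eigenvalues is given by: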

$$M(0)=\sum_{k=1}^{q}\bar\lambda_k\,\mathbf h_k^2.$$

Following Lee and Xu [23], we use $1/n+c\,S_{M(0)}$ as the benchmark for regarding the lth case as influential, where $1/n$ is the mean of the components $M(0)_l$ (which sum to 1), c is an arbitrary constant (depending on the application) and $S_{M(0)}$ is the standard deviation of $\{M(0)_l,\ l=1,\ldots,n\}$. Appendix 3 presents the Hessian matrix $\ddot Q(\hat\theta\mid\hat\theta)$ and the perturbation schemes used in this work.
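Given $\Delta_{\omega_0}$ and $\ddot Q(\hat\theta\mid\hat\theta)$, the aggregated contribution vector and the benchmark reduce to one symmetric eigendecomposition. The NumPy sketch below is illustrative only: the function names, the toy Δ and Hessian, the choice c = 2 and the use of the population standard deviation for $S_{M(0)}$ are all assumptions.

```python
import numpy as np


def aggregated_contribution(delta, q_hess, tol=1e-10):
    """M(0) from the eigendecomposition of -2 Delta^T Qdd^{-1} Delta.

    delta:  (p, n) matrix Delta_{omega_0}
    q_hess: (p, p) Hessian Qdd(theta_hat | theta_hat), assumed negative definite
    """
    F = -2.0 * delta.T @ np.linalg.solve(q_hess, delta)
    F = 0.5 * (F + F.T)                       # remove round-off asymmetry
    eigval, eigvec = np.linalg.eigh(F)        # unit-norm eigenvectors in columns
    keep = eigval > tol * eigval.max()        # nonzero eigenvalues only
    lam_bar = eigval[keep] / eigval[keep].sum()
    return (eigvec[:, keep] ** 2) @ lam_bar   # M(0)_l = sum_k lam_bar_k h_kl^2


def influence_benchmark(m0, c=2.0):
    """Cut-off 1/n + c * S_M(0) (population SD assumed)."""
    return 1.0 / m0.size + c * m0.std()


rng = np.random.default_rng(0)
delta = rng.normal(size=(3, 8))               # toy Delta with p = 3, n = 8
A = rng.normal(size=(3, 3))
q_hess = -(A @ A.T + np.eye(3))               # toy negative definite Hessian
m0 = aggregated_contribution(delta, q_hess)
flagged = np.flatnonzero(m0 > influence_benchmark(m0))
print(np.round(m0, 3), flagged)
```

Because each eigenvector has unit norm and the $\bar\lambda_k$ sum to 1, the components of M(0) always sum to 1, which is why 1/n serves as the baseline in the benchmark.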

5.2. Case deletion measures

In the process of model validation, it is fundamental to verify whether there are observations with a disproportionate influence on the estimates of the model's parameters. Case deletion is a classic approach to study the effects of dropping the ith case from the dataset. Thus, considering the model in (12) and $\theta=(\beta^\top,\sigma^2,\lambda,\rho^\top,\tau^\top)^\top$, we compare the ML estimate $\hat\theta$ based on all observations with the ML estimate $\hat\theta_{[i]}$ obtained when the ith observation is deleted from the dataset. The SSMN-HNLM in (12) is then rewritten as:

$$Y_j=\eta_j+\varepsilon_j,\qquad j=1,\ldots,n,\ j\neq i.$$

Let $\mathbf y_{c[i]}=(\mathbf y_{[i]}^\top,\mathbf u_{[i]}^\top,\mathbf t_{[i]}^\top)^\top$ be the augmented dataset, where the subscript ‘[i]’ denotes the original vector with the ith observation deleted. The complete-data log-likelihood function based on the data with the ith case deleted is denoted by $\ell_c(\theta\mid\mathbf y_{c[i]})$. Let $\hat\theta_{[i]}=(\hat\beta_{[i]}^\top,\hat\sigma^2_{[i]},\hat\lambda_{[i]},\hat\rho_{[i]}^\top)^\top$ be the maximizer of the function $Q_{[i]}(\theta\mid\hat\theta)=E\{\ell_c(\theta\mid\mathbf Y_{c[i]})\mid\mathbf y_{[i]},\theta=\hat\theta\}$ of the proposed regression model, where the estimates $\hat\theta_{[i]}$ are obtained by using the EM algorithm based on the remaining n − 1 observations. If $\hat\theta_{[i]}$ is far from $\hat\theta$ in some sense, then the ith case is regarded as influential.

Similar to the classic case-deletion measures, Cook distance and the likelihood displacement, Zhu et al. [36] presented analogous measures based on the Q-function.

  1. Generalized Cook distance (GD): This measure, similar to the usual Cook distance $D_i$ [10], determines the degree of influence of the ith observation on the estimate of θ and is defined by:
    $$GD_i=(\hat\theta_{[i]}-\hat\theta)^\top\left\{-\ddot Q(\hat\theta\mid\hat\theta)\right\}(\hat\theta_{[i]}-\hat\theta).$$
  2. Q-distance (QD): This measure of the influence of the ith case, similar to the likelihood distance $LD_i$ discussed by Cook and Weisberg [12], is defined by:
    $$QD_i=2\left\{Q(\hat\theta\mid\hat\theta)-Q(\hat\theta_{[i]}\mid\hat\theta)\right\}.$$
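Both measures only need the full-data and case-deleted estimates and, for GD, the Hessian of the Q-function. Below is a NumPy sketch with hypothetical helper names and a toy quadratic Q-function. For an exactly quadratic Q the two measures coincide, which makes a convenient check; for the SSMN-HNLM they generally differ.

```python
import numpy as np


def generalized_cook(theta_del, theta_hat, q_hess):
    """GD_i = (theta_[i] - theta_hat)^T {-Qdd} (theta_[i] - theta_hat)."""
    diff = np.asarray(theta_del, dtype=float) - np.asarray(theta_hat, dtype=float)
    return float(diff @ (-q_hess) @ diff)


def q_distance(q_fun, theta_del, theta_hat):
    """QD_i = 2 {Q(theta_hat | theta_hat) - Q(theta_[i] | theta_hat)}.

    q_fun(theta) is a hypothetical callable returning Q(theta | theta_hat).
    """
    return 2.0 * (q_fun(np.asarray(theta_hat, dtype=float))
                  - q_fun(np.asarray(theta_del, dtype=float)))


H = np.array([[4.0, 1.0], [1.0, 2.0]])       # toy positive definite matrix
theta_hat = np.array([1.0, -0.5])
q_hess = -H                                  # Hessian of the quadratic Q below
q_fun = lambda th: -0.5 * (th - theta_hat) @ H @ (th - theta_hat)
theta_del = np.array([1.2, -0.4])            # toy case-deleted estimate
gd = generalized_cook(theta_del, theta_hat, q_hess)
qd = q_distance(q_fun, theta_del, theta_hat)
print(gd, qd)
```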

6. Application

In this section we consider the likelihood analysis of the dataset presented in [24], which comes from an ultrasonic calibration study. Labra et al. [22] analyzed this dataset using a heteroscedastic nonlinear model with SMSN errors and verified the presence of outliers. Here we reanalyze the dataset with the aim of showing the capacity of the SSMN distributions to fit real datasets in the presence of asymmetry and heavy tails in heteroscedastic nonlinear models. The data consist of 214 observations, where the response variable Y is the ultrasonic response and the predictor variable x is the metal distance. From the descriptive statistics presented in Table 4, we observe a large positive sample skewness, and the distance between the mean and the median suggests using an asymmetric distribution to model the data. In addition, Figure 3 shows a nonlinear relationship between the metal distance and the ultrasonic response.

Table 4. Summary statistics for ultrasonic calibration data (SD is sample standard deviation).

Min Max Mean Median SD Skewness Kurtosis
3.75 92.90 30.26 21.11 23.68 0.91 2.56

Figure 3.

Scatter-plot of ultrasonic calibration data.

Following Lin et al. [24], we consider a SSMN-HNLM of the form:

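$$Y_i=\frac{\exp(-\beta_1x_i)}{\beta_2+\beta_3x_i}+\varepsilon_i,\qquad\varepsilon_i\overset{\text{ind}}{\sim}\mathrm{SSMN}(0,\sigma_i^2,\lambda;H,\kappa), \tag{20}$$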

where $\sigma_i^2=\sigma^2x_i^{\rho}$ for $i=1,\ldots,214$.
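For reference, the predictor and scale function of model (20) can be evaluated at the StN-NLM estimates of Table 5 as below. The helper names are hypothetical, and because β2 and β3 are reported to only three decimals, the resulting values are approximate.

```python
import math


def eta20(beta, x):
    """Nonlinear predictor of model (20): exp(-beta1 x) / (beta2 + beta3 x)."""
    b1, b2, b3 = beta
    return math.exp(-b1 * x) / (b2 + b3 * x)


def scale20(sigma2, rho, x):
    """Heteroscedastic scale sigma_i^2 = sigma^2 * x_i^rho."""
    return sigma2 * x ** rho


# StN-NLM estimates from Table 5, rounded as reported.
beta_stn, sigma2_stn, rho_stn = (0.190, 0.006, 0.012), 11.244, -1.028
for x in (0.5, 1.0, 3.0, 6.0):
    print(x, round(eta20(beta_stn, x), 2), round(scale20(sigma2_stn, rho_stn, x), 2))
```

Since $\hat\rho<0$, the fitted scale decreases with metal distance, in line with the rejection of homogeneity reported below.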

Table 5 contains the ML estimates of the parameters from the SN, StN, SSL, SPE and SCN models, together with their standard errors (SE) calculated via the observed information matrix (Appendix 1). Both the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) indicate that the SSMN models with heavy tails (StN-NLM, SSL-NLM and SPE-NLM) fit better than the SN model, with the StN-NLM providing the best fit. In addition, comparing our results with those obtained by Labra et al. [22, Table 2, p. 2159], the StN and SSL models attain a higher log-likelihood $\ell(\hat\theta)$ and consequently lower AIC values, indicating a better performance of the SSMN models for the ultrasonic calibration dataset.

Table 5. ML estimation results of fitting various mixture models to the ultrasonic calibration data. The SE values are the asymptotic standard errors based on the observed information matrix.

  SN-NLM StN-NLM SSL-NLM SPE-NLM
Parameter Estimate SE Estimate SE Estimate SE Estimate SE
β1 0.188 0.019 0.190 0.014 0.196 0.016 0.190 0.027
β2 0.006 4.6×10⁻⁴ 0.006 3.8×10⁻⁴ 0.006 3.8×10⁻⁴ 0.006 4.7×10⁻⁴
β3 0.013 9.1×10⁻⁴ 0.012 8.6×10⁻⁴ 0.012 6.9×10⁻⁴ 0.013 8.7×10⁻⁴
σ2 33.981 5.997 11.244 2.050 9.977 0.101 18.514 12.545
λ 2.088 0.428 0.651 0.125 0.824 0.158 1.448 0.784
ρ −1.082 0.128 −1.028 0.119 −1.091 0.065 −1.091 0.139
ν n/a n/a 3.846 1.184 1.454 0.058 0.773 0.150
ℓ(θ̂) −520.305 −514.764 −515.108 −518.008
AIC 1052.609 1043.529 1044.215 1050.017
BIC 1072.806 1067.090 1067.778 1073.578

From (20), the model is homoscedastic when $\rho = 0$. Hence heteroscedasticity can be tested with the likelihood ratio (LR) statistic for $H_0: \rho = 0$ versus $H_1: \rho \neq 0$, which has an approximate $\chi^2_1$ distribution under $H_0$. For the StN model, $LR = 2\{\ell(\hat{\theta}) - \ell(\hat{\theta}_0)\} = 32.232$, with p-value $\approx 0$. This result agrees with those obtained by Lin et al. [24] using the score test and by Labra et al. [22] using the likelihood ratio test. Therefore, the assumption of homogeneous variance is not suitable for the ultrasonic calibration data.
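The p-value above comes from the $\chi^2_1$ reference distribution. A minimal standard-library Python sketch, relying on the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, which holds because $\chi^2_1 = Z^2$ with $Z \sim N(0,1)$; the function names are illustrative and are not part of the authors' code:

```python
import math


def lr_statistic(loglik_full: float, loglik_null: float) -> float:
    """LR = 2 * (l(theta_hat) - l(theta_hat_0))."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_1_pvalue(lr: float) -> float:
    """Upper-tail probability P(chi2_1 > lr).

    chi2_1 = Z**2 with Z ~ N(0, 1), so P(chi2_1 > x) = P(|Z| > sqrt(x))
    = erfc(sqrt(x / 2)).
    """
    if lr <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(lr / 2.0))


# StN model, H0: rho = 0, using the LR value reported in the text.
print(f"p-value = {chi2_1_pvalue(32.232):.1e}")

# The smallest LR statistic in Table 7 is still far beyond the 5% critical value (about 3.84).
print(chi2_1_pvalue(30.954) < 0.05)
```

The p-value printed for $LR = 32.232$ is of the order of $10^{-8}$, consistent with the "$\approx 0$" reported above.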

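To detect misspecification of the error distribution, we use the Mahalanobis distance $D_i = (Y_i - \eta(\beta, x_i))^2/\sigma_i^2$, $i = 1, \ldots, 214$, to construct simulated envelopes. In the skew-normal case $D_i \sim \chi^2_1$, so the quantile $\upsilon = \chi^2_1(\xi)$, $0 < \xi < 1$, can be used as a cutoff point. From Ferreira et al. [15], the Mahalanobis distance has the following properties: $D_i \sim F(1, \nu)$ for the StN distribution;

$$\Pr(D_i \le \upsilon) = \Pr(\chi^2_1 \le \upsilon) - \frac{2^{\nu}\,\Gamma(\nu + 1/2)}{\upsilon^{\nu}\,\Gamma(1/2)}\,\Pr(\chi^2_{2\nu+1} \le \upsilon)$$

for the SSL distribution; and

$$\Pr(D_i \le \upsilon) = \frac{\upsilon^{1/2}\, IG\big(1/(2\nu),\, \upsilon^{\nu}/2\big)}{\Gamma\big(1/(2\nu)\big)\, 2^{1/(2\nu)}}$$

for the SPE distribution.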
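The SSL expression can be checked numerically. The sketch below (standard-library Python) evaluates it with a power series for the regularized incomplete gamma function and compares it with direct integration over the slash mixing variable $U \sim \mathrm{Beta}(\nu, 1)$, assuming the representation $D_i = \chi^2_1/U$. The StN case would need an $F$ CDF (a regularized incomplete beta function) and is omitted to keep the sketch short; $\nu = 1.454$ is the SSL estimate from Table 5, and all function names are hypothetical.

```python
import math


def reg_lower_gamma(a: float, x: float, tol: float = 1e-15) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series.

    P(a, x) = x^a e^{-x} / Gamma(a) * sum_{n>=0} x^n / (a (a+1) ... (a+n)).
    Fine for the moderate arguments used here; not a production routine.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while term > tol * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def chi2_cdf(x: float, df: float) -> float:
    """P(chi2_df <= x)."""
    return reg_lower_gamma(df / 2.0, x / 2.0)


def ssl_distance_cdf(v: float, nu: float) -> float:
    """Pr(D_i <= v) for the SSL model, using the closed form quoted in the text."""
    coef = math.exp(nu * math.log(2.0) + math.lgamma(nu + 0.5)
                    - nu * math.log(v) - math.lgamma(0.5))
    return chi2_cdf(v, 1.0) - coef * chi2_cdf(v, 2.0 * nu + 1.0)


def ssl_distance_cdf_quadrature(v: float, nu: float, n: int = 5000) -> float:
    """Same probability from Pr(D <= v) = int_0^1 nu u^(nu-1) Pr(chi2_1 <= v u) du,
    i.e. D = chi2_1 / U with U ~ Beta(nu, 1), by the midpoint rule."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += nu * u ** (nu - 1.0) * chi2_cdf(v * u, 1.0)
    return total * h


nu_hat = 1.454  # SSL estimate of nu (Table 5)
for v in (0.5, 2.0, 6.0):
    print(v, round(ssl_distance_cdf(v, nu_hat), 6),
          round(ssl_distance_cdf_quadrature(v, nu_hat), 6))
```

Because the subtracted term is positive, the SSL probability is always below the $\chi^2_1$ one, which is what produces a larger cutoff for the heavier-tailed model.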

The QQ-plots and simulated envelopes for the Mahalanobis distance of the fitted SN-NLM, StN-NLM, SSL-NLM and SPE-NLM models are shown in Figure 4. The lines represent the 2.5th percentile, the mean and the 97.5th percentile of 100 simulated points for each observation. The SN and SPE models have some observations outside the envelopes, whereas the StN and SSL models fit the dataset well.
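A simulated envelope of this kind can be sketched in a few lines of standard-library Python. The sketch assumes the envelope is built by drawing $D_i$ directly from its reference distribution under the fitted model (for the SN model, $D_i \sim \chi^2_1$) rather than by refitting the model to each simulated dataset; the text does not say which variant was used. The 100 replicates and the 2.5%/mean/97.5% bands follow the description above; the function names and the seed are hypothetical.

```python
import random
import statistics


def simulated_envelope(n, n_sim=100, draw_distance=None, seed=2019):
    """Pointwise 2.5th percentile, mean and 97.5th percentile of ordered distances.

    draw_distance(rng) returns one draw from the reference distribution of D_i;
    by default D_i = Z**2 ~ chi2_1, the skew-normal case.
    """
    rng = random.Random(seed)
    if draw_distance is None:
        def draw_distance(r):
            return r.gauss(0.0, 1.0) ** 2
    # Sorting each simulated sample makes column j the j-th order statistic.
    sims = [sorted(draw_distance(rng) for _ in range(n)) for _ in range(n_sim)]
    lower, centre, upper = [], [], []
    for j in range(n):
        column = [s[j] for s in sims]
        cuts = statistics.quantiles(column, n=40, method="inclusive")
        lower.append(cuts[0])    # 1/40 = 2.5th percentile
        centre.append(statistics.fmean(column))
        upper.append(cuts[-1])   # 39/40 = 97.5th percentile
    return lower, centre, upper


def outside_envelope(observed, lower, upper):
    """Positions (in sorted order) of observed distances falling outside the band."""
    d_sorted = sorted(observed)
    return [j for j, d in enumerate(d_sorted) if d < lower[j] or d > upper[j]]


lower, centre, upper = simulated_envelope(214)
```

With the fitted distances $D_i$ of a model (not reproduced here), `outside_envelope(d, lower, upper)` lists the order positions that leave the band; for a heavier-tailed fit one would pass a `draw_distance` that samples from that model's distance law instead of $\chi^2_1$.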

Figure 4. Ultrasonic calibration data. QQ-plots and simulated envelopes for the Mahalanobis distance. (a) SN-NLM, (b) StN-NLM, (c) SSL-NLM and (d) SPE-NLM.

We first identify influential observations in the fitted models using the case-deletion measures, namely the generalized Cook distance (Figure 5) and the Q-distance (Figure 6). Both measures show similar patterns across models, but with smaller magnitudes for the StN model. These figures indicate that observations #142 and #176 are potentially influential on the parameter estimates under both measures in the SN model, whereas #146 and #147 are influential in the StN model, and #142, #146 and #147 in the SSL model.
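For reference, a minimal NumPy sketch of the one-step generalized Cook distance, assuming the form given by Zhu et al. [36] for incomplete-data models, $GD_i^1 = \dot{Q}_{[i]}^\top\{-\ddot{Q}(\hat{\theta})\}^{-1}\dot{Q}_{[i]}$, where $\dot{Q}_{[i]}$ is the gradient of the Q-function computed without case $i$. The per-case score vectors and the Hessian of Appendix 3 are assumed to be already available; the Q-distance would additionally require evaluating the Q-function at the one-step estimate and is not shown. The inputs in the example are hypothetical.

```python
import numpy as np


def generalized_cook_distance(scores, hessian):
    """One-step generalized Cook distances GD_i = s_i' {-Q''(theta_hat)}^{-1} s_i.

    scores  : (n, p) array; row i is dQ_i(theta | theta_hat)/dtheta at theta_hat
    hessian : (p, p) array; Q''(theta_hat) = d^2 Q(theta | theta_hat)/dtheta dtheta'
    At the EM solution the rows of `scores` sum to zero, so the score of the
    reduced-data Q-function without case i is -s_i and the sign drops out.
    """
    s = np.asarray(scores, dtype=float)
    neg_h = -np.asarray(hessian, dtype=float)
    solved = np.linalg.solve(neg_h, s.T)      # (p, n): {-Q''}^{-1} s_i for each i
    return np.einsum("ij,ji->i", s, solved)   # quadratic form for each row


# Toy inputs (hypothetical): p = 2 parameters, n = 3 cases.
scores = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, -2.0]])
print(generalized_cook_distance(scores, -np.diag([2.0, 4.0])))
```

Solving the linear system instead of forming $\{-\ddot{Q}\}^{-1}$ explicitly is the usual numerically safer choice.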

Figure 5. Ultrasonic calibration data. Index plot of the generalized Cook distance. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.

Figure 6. Ultrasonic calibration data. Index plot of the Q-distance. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.

We adopted the strategy of Cao et al. [8] to compare the robustness of the models. To reveal the impact on the parameter estimates of the four observations flagged as potential outliers by the Mahalanobis distance (Figure 7), we refitted the five models after removing these four cases.

Figure 7. Ultrasonic calibration data. Mahalanobis distance for SN-HNLM.

Table 6 shows the estimated values $\hat{\theta}$ and their relative changes, defined by $RC_k = (\hat{\theta}_{[j]k} - \hat{\theta}_k)/\hat{\theta}_k$, where $\hat{\theta}_{[j]k}$ denotes the $k$th parameter estimate obtained without the set $[j]$ of potential outliers. The RCs for the StN-NLM are smaller in absolute value than those for the SN-NLM, which implies that the StN model is less sensitive than the SN model to the presence of potential outliers.

Table 6. Ultrasonic calibration data. Relative changes (RC) of θ^, after deleting outliers.

  SN-NLM StN-NLM SSL-NLM SPE-NLM
Parameter Estimate RC Estimate RC Estimate RC Estimate RC
β1 0.1790 13.7299 0.1751 4.6419 0.1870 5.5330 0.2040 −7.2792
β2 0.0055 9.7201 0.0056 2.6319 0.0057 4.4269 0.0061 −4.1235
β3 0.0131 −5.0432 0.0123 −3.0677 0.0125 −6.0685 0.0118 5.4580
σ2 27.2558 31.9260 13.4812 −0.5992 20.7536 −108.0115 26.3738 −42.4551
λ 1.8818 46.7073 0.6055 0.5030 1.2628 −53.2015 1.6482 −13.8311
ρ −1.1117 −19.3049 −1.2033 −5.6935 −1.2359 −13.2407 −1.1854 −8.6415
ν n/a n/a 7.8713 2.2535 6.4618 −344.3733 0.9999 −29.2878

Figures 8 and 9 present the local influence analysis under the case-weight and response perturbation schemes, respectively. For each scheme we computed $M(0)$, and the figures show the index plots of $M(0)$. The horizontal lines mark the benchmark for $M(0)$, computed with $c = 4$ [23]. Observations #146 and #147 stand out as influential under case-weight perturbation in all models, but with smaller magnitudes in the StN and SSL models. Under response perturbation, the same three observations (#82, #145 and #193) are influential in the StN and SSL models, while the SN model flags other observations.
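A minimal NumPy sketch of the quantities plotted in Figures 8 and 9. It assumes the conformal normal curvature of Zhu and Lee [35], $\ddot{F} = 2\Delta^\top\{-\ddot{Q}(\hat{\theta})\}^{-1}\Delta$ with $M(0)_l = \ddot{F}_{ll}/\mathrm{tr}(\ddot{F})$, and it assumes the benchmark has the form $\bar{M}(0) + c\,\mathrm{SM}(0)$ (mean plus $c$ standard deviations of the $M(0)_l$), which we take to be the rule of Lee and Xu [23]. Using the sample standard deviation (`ddof=1`) is our choice; $\Delta$ and $\ddot{Q}$ would come from Appendix 3, and the example inputs are hypothetical.

```python
import numpy as np


def conformal_curvature_m0(delta, hessian):
    """M(0)_l = F_ll / tr(F), with F = 2 Delta' {-Q''(theta_hat)}^{-1} Delta.

    delta   : (p, n) array Delta_{omega_0} = d^2 Q(theta, omega | theta_hat)/dtheta domega'
    hessian : (p, p) array Q''(theta_hat)
    """
    d = np.asarray(delta, dtype=float)
    neg_h = -np.asarray(hessian, dtype=float)
    f = 2.0 * d.T @ np.linalg.solve(neg_h, d)   # (n, n)
    return np.diag(f) / np.trace(f)


def flag_influential(m0, c=4.0):
    """Cases with M(0)_l above mean(M(0)) + c * sd(M(0)); returns 1-based case numbers."""
    m0 = np.asarray(m0, dtype=float)
    cutoff = m0.mean() + c * m0.std(ddof=1)
    return [int(i) + 1 for i in np.flatnonzero(m0 > cutoff)], cutoff


# Hypothetical example: 30 cases, 2 parameters, case #7 given a large perturbation effect.
delta = np.full((2, 30), 0.1)
delta[:, 6] = 5.0
m0 = conformal_curvature_m0(delta, -np.eye(2))
print(flag_influential(m0))
```

Since the $M(0)_l$ sum to one, their mean is always $1/n$; only the spread term of the benchmark depends on the data.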

Figure 8. Ultrasonic calibration data. Index plot of $M(0)$ under the case-weight perturbation. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.

Figure 9. Ultrasonic calibration data. Index plot of $M(0)$ under the response perturbation. (a) SN-HNLM, (b) StN-HNLM, (c) SSL-HNLM.

As suggested by a referee, to evaluate whether the LR statistic for testing $H_0: \rho = 0$ is sensitive to the influential observations identified in the diagnostic analysis (#82, #130, #145, #162, #163, #175 and #193), we removed each observation individually, and then all of them together, from the full data and recomputed the LR statistics and their p-values. Table 7 shows that $H_0$ is rejected in all cases, so the heteroscedasticity detected by the SSMN-HNLM persists when the influential observations are removed individually or jointly.

Table 7. Ultrasonic calibration data. Likelihood ratio (LR) statistics when removing each observation individually and all simultaneously.

  Removed Observations
Model #82 #130 #145 #162 #163 #175 #193 All influential
SN 60.596 82.487 66.013 69.210 80.171 79.849 64.741 86.439
StN 31.361 34.673 32.034 34.305 32.913 33.547 30.954 45.529
SSL 32.788 42.236 39.588 41.848 41.588 41.425 38.382 52.032
SPE 32.897 36.674 33.389 35.998 34.794 34.521 32.070 56.935

7. Conclusions

In this paper we developed an EM algorithm for maximum likelihood estimation in the SSMN-HNLM, with closed-form expressions for the E and M steps and with the standard errors obtained as byproducts. Furthermore, we applied the approach of Zhu and Lee [35] to obtain case-deletion measures and local influence diagnostics. A simulation study was carried out to verify the asymptotic distribution of the likelihood ratio test statistic and the empirical power of the test. For $\rho = 0$, the rejection rate of the test approached the nominal level as $n$ increased, and as $n$ and $\rho$ increased, the power of the test approached 1. The diagnostic analysis showed that the influence of the observations declined when distributions with heavier tails than the SN were considered. The models can be fitted using standard software available in R, and the program code is available from the authors on request.

Finally, the proposed method can be extended to more general frameworks, such as censored regression models [31], measurement error models and multivariate regression models, which we expect to yield satisfactory results at the cost of additional implementation complexity. An in-depth investigation of such extensions is beyond the scope of the present paper, but is an interesting topic for future research.

Appendices.

Appendix 1. The observed information matrix for SSMN heteroscedastic nonlinear regression models

Consider the SSMN-HNLM given in (12), where the observed-data log-likelihood function of $\theta = (\beta^\top, \sigma^2, \lambda, \rho^\top, \tau^\top)^\top$ has the form $\ell(\theta \mid y) = \sum_{i=1}^{n} \ell_i(\theta \mid y_i)$. In this section we write $\ell_i(\theta \mid y_i) = \ell_i(\theta)$ for simplicity. We have $\ell_i(\theta) = \log 2 + \ell_{1i}(\theta) + \log\Phi(\ell_{2i}(\theta))$, where $\ell_{1i}(\theta)$ is the log-likelihood function of the corresponding symmetric SMN distribution and $\ell_{2i}(\theta) = \lambda(y_i - \eta(\beta, x_i))/(\sigma m_i^{1/2}(\rho, z_i))$. To simplify the text, we write $\eta_i = \eta(\beta, x_i)$ and $m_i = m_i(\rho, z_i)$. Thus, the observed information matrix for $\theta$ can be written as

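$$I(\theta) = I_1(\theta) + I_2(\theta),$$

where

$$I_1(\theta) = -\sum_{i=1}^{n} \frac{\partial^2 \ell_{1i}(\theta)}{\partial\theta\,\partial\theta^\top}, \qquad I_2(\theta) = -\sum_{i=1}^{n}\left[W_\Phi(\ell_{2i}(\theta))\,\frac{\partial^2 \ell_{2i}(\theta)}{\partial\theta\,\partial\theta^\top} + W'_\Phi(\ell_{2i}(\theta))\,\frac{\partial \ell_{2i}(\theta)}{\partial\theta}\frac{\partial \ell_{2i}(\theta)}{\partial\theta^\top}\right],$$

with $W'_\Phi(x) = -W_\Phi(x)\,(x + W_\Phi(x))$ and $W_\Phi(x) = \phi(x)/\Phi(x)$.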
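The expression for $W'_\Phi$ follows from differentiating $\phi(x)/\Phi(x)$ with $\phi'(x) = -x\phi(x)$, which gives $W'_\Phi(x) = -xW_\Phi(x) - W_\Phi(x)^2$. A quick numerical sanity check in standard-library Python (function names are illustrative):

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def big_phi(x: float) -> float:
    """Standard normal CDF; erfc keeps accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def w_phi(x: float) -> float:
    """W_Phi(x) = phi(x) / Phi(x) = d log Phi(x) / dx."""
    return phi(x) / big_phi(x)


def w_phi_prime(x: float) -> float:
    """Closed form W'_Phi(x) = -W_Phi(x) (x + W_Phi(x))."""
    w = w_phi(x)
    return -w * (x + w)


# Compare with a central finite difference at a few points.
h = 1e-5
for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    fd = (w_phi(x + h) - w_phi(x - h)) / (2.0 * h)
    print(x, w_phi_prime(x), fd)
```

Computing $\Phi$ through `erfc` rather than `1 + erf` avoids cancellation for negative arguments, where $W_\Phi$ is largest.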

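The first-order derivatives of $\ell_{2i}(\theta)$ with respect to $\theta$ are given by

$$\frac{\partial \ell_{2i}(\theta)}{\partial\beta} = -\frac{\lambda}{\sigma m_i^{1/2}}\frac{\partial\eta_i}{\partial\beta}, \qquad \frac{\partial \ell_{2i}(\theta)}{\partial\sigma^2} = -\frac{\lambda(y_i - \eta_i)}{2\sigma^3 m_i^{1/2}}, \qquad \frac{\partial \ell_{2i}(\theta)}{\partial\lambda} = \frac{y_i - \eta_i}{\sigma m_i^{1/2}},$$
$$\frac{\partial \ell_{2i}(\theta)}{\partial\rho} = -\frac{\lambda}{2\sigma}(y_i - \eta_i)\, m_i^{-3/2}\frac{\partial m_i}{\partial\rho}, \qquad \frac{\partial \ell_{2i}(\theta)}{\partial\tau} = 0.$$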

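The second-order derivatives of $\ell_{2i}(\theta)$ with respect to $\theta$ (writing $\ell_{2i} = \ell_{2i}(\theta)$) are given by

$$\frac{\partial^2 \ell_{2i}}{\partial\beta\,\partial\beta^\top} = -\frac{\lambda}{\sigma m_i^{1/2}}\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top}, \qquad \frac{\partial^2 \ell_{2i}}{\partial\sigma^2\,\partial\beta} = \frac{\lambda}{2\sigma^3 m_i^{1/2}}\frac{\partial\eta_i}{\partial\beta}, \qquad \frac{\partial^2 \ell_{2i}}{\partial\lambda\,\partial\beta} = -\frac{1}{\sigma m_i^{1/2}}\frac{\partial\eta_i}{\partial\beta},$$
$$\frac{\partial^2 \ell_{2i}}{\partial\rho\,\partial\beta^\top} = \frac{\lambda}{2\sigma}\, m_i^{-3/2}\frac{\partial m_i}{\partial\rho}\frac{\partial\eta_i}{\partial\beta^\top}, \qquad \frac{\partial^2 \ell_{2i}}{\partial(\sigma^2)^2} = \frac{3\lambda}{4\sigma^5}(y_i - \eta_i)\, m_i^{-1/2}, \qquad \frac{\partial^2 \ell_{2i}}{\partial\sigma^2\,\partial\lambda} = -\frac{1}{2\sigma^3}(y_i - \eta_i)\, m_i^{-1/2},$$
$$\frac{\partial^2 \ell_{2i}}{\partial\sigma^2\,\partial\rho} = \frac{\lambda}{4\sigma^3}(y_i - \eta_i)\, m_i^{-3/2}\frac{\partial m_i}{\partial\rho}, \qquad \frac{\partial^2 \ell_{2i}}{\partial\lambda\,\partial\rho} = -\frac{1}{2\sigma}(y_i - \eta_i)\, m_i^{-3/2}\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial^2 \ell_{2i}}{\partial\rho\,\partial\rho^\top} = -\frac{\lambda}{2\sigma}(y_i - \eta_i)\, m_i^{-3/2}\left[-\frac{3}{2m_i}\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top} + \frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top}\right], \qquad \frac{\partial^2 \ell_{2i}}{\partial\lambda^2} = 0, \qquad \frac{\partial^2 \ell_{2i}}{\partial\theta\,\partial\tau^\top} = 0.$$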
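The sign pattern of these derivatives can be checked numerically. The standard-library Python sketch below compares the $\sigma^2$, $\lambda$ and $\rho$ derivatives of $\ell_{2i}$ (first order, plus the second-order terms involving $\rho$ and $\sigma^2$) with central finite differences. It assumes $m_i = x_i^{\rho}$, as in the application (so $\partial m_i/\partial\rho = m_i\log x_i$), and holds $r_i = y_i - \eta_i$ fixed, so the $\beta$-derivatives are not exercised; the numerical values are hypothetical, chosen near the StN fit in Table 5.

```python
import math


def ell2(sigma2, lam, rho, r, x):
    """ell_2i = lambda * r / (sigma * m^{1/2}), with r = y_i - eta_i held fixed and m = x**rho."""
    return lam * r / (math.sqrt(sigma2) * math.sqrt(x ** rho))


def analytic(sigma2, lam, rho, r, x):
    """Derivative expressions of ell_2i in (sigma2, lambda, rho), with m = x**rho."""
    s, m = math.sqrt(sigma2), x ** rho
    dm = m * math.log(x)           # dm/drho
    d2m = m * math.log(x) ** 2     # d2m/drho2
    return {
        "sigma2": -lam * r / (2 * s**3 * math.sqrt(m)),
        "lam": r / (s * math.sqrt(m)),
        "rho": -lam / (2 * s) * r * m**-1.5 * dm,
        "sigma2,sigma2": 3 * lam / (4 * s**5) * r * m**-0.5,
        "sigma2,rho": lam / (4 * s**3) * r * m**-1.5 * dm,
        "lam,rho": -1 / (2 * s) * r * m**-1.5 * dm,
        "rho,rho": -lam / (2 * s) * r * m**-1.5 * (d2m - 1.5 / m * dm * dm),
    }


def numeric(sigma2, lam, rho, r, x, h=1e-4):
    """Central finite differences of ell2 in theta = (sigma2, lam, rho)."""
    theta = [sigma2, lam, rho]
    names = ["sigma2", "lam", "rho"]

    def f(*steps):
        t = list(theta)
        for k, step in steps:
            t[k] += step
        return ell2(t[0], t[1], t[2], r, x)

    out = {}
    for k, name in enumerate(names):
        out[name] = (f((k, h)) - f((k, -h))) / (2 * h)
    for j, k in [(0, 0), (0, 2), (1, 2), (2, 2)]:
        key = f"{names[j]},{names[k]}"
        if j == k:
            out[key] = (f((k, h)) - 2 * f() + f((k, -h))) / h**2
        else:
            out[key] = (f((j, h), (k, h)) - f((j, h), (k, -h))
                        - f((j, -h), (k, h)) + f((j, -h), (k, -h))) / (4 * h * h)
    return out


# Hypothetical values close to the StN fit in Table 5 (x = 2, residual r = 8).
a2 = analytic(11.244, 0.651, -1.028, 8.0, 2.0)
n2 = numeric(11.244, 0.651, -1.028, 8.0, 2.0)
for key in a2:
    print(key, a2[key], n2[key])
```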

The first- and second-order derivatives of $\ell_{1i}(\theta)$ with respect to $\theta$ are given below for each SMN distribution considered (writing $\ell_{1i} = \ell_{1i}(\theta)$).

A.1. The normal distribution

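$$\frac{\partial \ell_{1i}}{\partial\beta} = \frac{y_i - \eta_i}{\sigma^2 m_i}\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial \ell_{1i}}{\partial\sigma^2} = \frac{d_i - 1}{2\sigma^2}, \quad \frac{\partial \ell_{1i}}{\partial\lambda} = 0, \quad \frac{\partial \ell_{1i}}{\partial\rho} = \frac{d_i - 1}{2m_i}\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial \ell_{1i}}{\partial\tau} = 0,$$
$$\frac{\partial^2 \ell_{1i}}{\partial\beta\,\partial\beta^\top} = \frac{1}{\sigma^2 m_i}\left[(y_i - \eta_i)\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top} - \frac{\partial\eta_i}{\partial\beta}\frac{\partial\eta_i}{\partial\beta^\top}\right], \quad \frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\beta} = -\frac{y_i - \eta_i}{\sigma^4 m_i}\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\beta^\top} = -\frac{y_i - \eta_i}{\sigma^2 m_i^2}\frac{\partial m_i}{\partial\rho}\frac{\partial\eta_i}{\partial\beta^\top},$$
$$\frac{\partial^2 \ell_{1i}}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} - \frac{d_i}{\sigma^4}, \quad \frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\rho} = -\frac{d_i}{2\sigma^2 m_i}\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\rho^\top} = \frac{1 - 2d_i}{2m_i^2}\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top} + \frac{d_i - 1}{2m_i}\frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top}, \quad \frac{\partial^2 \ell_{1i}}{\partial\theta\,\partial\lambda} = \frac{\partial^2 \ell_{1i}}{\partial\theta\,\partial\tau^\top} = 0,$$

with $d_i = (y_i - \eta_i)^2/(\sigma^2 m_i)$.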

A.2. The Student-t distribution

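$$\frac{\partial \ell_{1i}}{\partial\beta} = \frac{(\nu+1)(y_i - \eta_i)V_i}{\nu\sigma^2 m_i}\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial \ell_{1i}}{\partial\sigma^2} = \frac{1}{2\sigma^2}\left[\frac{(\nu+1)V_i d_i}{\nu} - 1\right], \quad \frac{\partial \ell_{1i}}{\partial\lambda} = 0, \quad \frac{\partial \ell_{1i}}{\partial\rho} = \frac{1}{2m_i}\left[\frac{(\nu+1)V_i d_i}{\nu} - 1\right]\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial \ell_{1i}}{\partial\nu} = \frac{\log V_i}{2} + \frac{(\nu+1)V_i d_i}{2\nu^2} + \frac{1}{2}\left[\Psi\left(\frac{\nu+1}{2}\right) - \Psi\left(\frac{\nu}{2}\right) - \frac{1}{\nu}\right],$$
$$\frac{\partial^2 \ell_{1i}}{\partial\beta\,\partial\beta^\top} = \frac{(\nu+1)V_i}{\nu\sigma^2 m_i}\left[\left(\frac{2V_i d_i}{\nu} - 1\right)\frac{\partial\eta_i}{\partial\beta}\frac{\partial\eta_i}{\partial\beta^\top} + (y_i - \eta_i)\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top}\right], \quad \frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\beta} = \frac{\nu+1}{\nu\sigma^4 m_i}V_i(y_i - \eta_i)\left(\frac{V_i d_i}{\nu} - 1\right)\frac{\partial\eta_i}{\partial\beta},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\beta^\top} = \frac{\nu+1}{\nu\sigma^2 m_i^2}V_i(y_i - \eta_i)\left(\frac{V_i d_i}{\nu} - 1\right)\frac{\partial m_i}{\partial\rho}\frac{\partial\eta_i}{\partial\beta^\top}, \quad \frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\beta} = \frac{V_i(y_i - \eta_i)}{\nu^2\sigma^2 m_i}\left(\frac{(\nu+1)V_i d_i}{\nu} - 1\right)\frac{\partial\eta_i}{\partial\beta},$$
$$\frac{\partial^2 \ell_{1i}}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4} + \frac{\nu+1}{\nu\sigma^4}V_i d_i\left(\frac{V_i d_i}{2\nu} - 1\right), \quad \frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\rho} = \frac{\nu+1}{2\nu\sigma^2 m_i}V_i d_i\left(\frac{V_i d_i}{\nu} - 1\right)\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\sigma^2} = \frac{V_i d_i}{2\nu^2\sigma^2}\left(\frac{(\nu+1)V_i d_i}{\nu} - 1\right),$$
$$\frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\rho^\top} = \frac{1}{2m_i}\left[\frac{1}{m_i}\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top} - \frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top}\right] + \frac{(\nu+1)V_i d_i}{2\nu m_i}\left[\frac{1}{m_i}\left(\frac{V_i d_i}{\nu} - 2\right)\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top} + \frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top}\right], \quad \frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\rho} = \frac{V_i d_i}{2\nu^2 m_i}\left(\frac{(\nu+1)V_i d_i}{\nu} - 1\right)\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\nu^2} = \frac{1}{4}\left[\Psi_1\left(\frac{\nu+1}{2}\right) - \Psi_1\left(\frac{\nu}{2}\right) + \frac{2}{\nu^2}\right] + \frac{V_i d_i}{2\nu^2} + \frac{V_i d_i}{2\nu^4}\left[(\nu+1)V_i d_i - \nu(\nu+2)\right], \quad \frac{\partial^2 \ell_{1i}}{\partial\theta\,\partial\lambda} = 0,$$

where $V_i = (1 + d_i/\nu)^{-1}$, $\Psi(x)$ is the digamma function and $\Psi_1(x)$ is the trigamma function.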
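A finite-difference check of the $\rho$ and $\nu$ derivatives of the Student-t $\ell_{1i}$, in the form written in the code below: the $\rho$-derivatives include the $-\tfrac{1}{2}\log m_i$ contribution (which the normal limit $\nu \to \infty$ requires, cf. A.1), and the $\nu$-derivative contains the term $(\nu+1)V_i d_i/(2\nu^2)$. Assumptions: $m_i = x_i^{\rho}$, $r_i = y_i - \eta_i$ held fixed, hypothetical parameter values near the StN fit, and a digamma approximation obtained by differencing `math.lgamma`, since Python's standard library has no digamma function.

```python
import math


def loglik_t(sigma2, rho, nu, r, x):
    """ell_1i for the Student-t SMN model with m = x**rho and r = y_i - eta_i held fixed."""
    m = x ** rho
    d = r * r / (sigma2 * m)
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(math.pi * nu * sigma2 * m)
            - (nu + 1) / 2 * math.log1p(d / nu))


def digamma_approx(z, h=1e-5):
    """Psi(z) via a central difference of math.lgamma (the stdlib has no digamma)."""
    return (math.lgamma(z + h) - math.lgamma(z - h)) / (2 * h)


def analytic_t(sigma2, rho, nu, r, x):
    """Derivatives in rho and nu, including the -1/2 log m_i contribution."""
    m = x ** rho
    dm, d2m = m * math.log(x), m * math.log(x) ** 2
    d = r * r / (sigma2 * m)
    v = 1.0 / (1.0 + d / nu)
    return {
        "rho": ((nu + 1) * v * d / nu - 1) / (2 * m) * dm,
        "rho,rho": (dm * dm / m - d2m) / (2 * m)
        + (nu + 1) * v * d / (2 * nu * m) * ((v * d / nu - 2) * dm * dm / m + d2m),
        "nu": 0.5 * math.log(v) + (nu + 1) * v * d / (2 * nu**2)
        + 0.5 * (digamma_approx((nu + 1) / 2) - digamma_approx(nu / 2) - 1 / nu),
    }


def numeric_t(sigma2, rho, nu, r, x, h=1e-4):
    """Central finite differences of loglik_t."""
    def f(rh, nn):
        return loglik_t(sigma2, rh, nn, r, x)

    return {
        "rho": (f(rho + h, nu) - f(rho - h, nu)) / (2 * h),
        "rho,rho": (f(rho + h, nu) - 2 * f(rho, nu) + f(rho - h, nu)) / h**2,
        "nu": (f(rho, nu + h) - f(rho, nu - h)) / (2 * h),
    }


args = (11.244, -1.028, 3.846, 8.0, 2.0)  # hypothetical values near the StN fit
a_t = analytic_t(*args)
n_t = numeric_t(*args)
print(a_t)
print(n_t)
```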

A.3. The slash distribution

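Writing $IG_k = IG_k(d_i)$ and $E_{j,k} = E_{j,k}(d_i)$,

$$\frac{\partial \ell_{1i}}{\partial\beta} = \frac{(y_i - \eta_i)IG_3}{\sigma^2 m_i IG_1}\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial \ell_{1i}}{\partial\sigma^2} = -\frac{1}{2\sigma^2} + \frac{d_i IG_3}{2\sigma^2 IG_1}, \quad \frac{\partial \ell_{1i}}{\partial\lambda} = 0, \quad \frac{\partial \ell_{1i}}{\partial\rho} = \left[-\frac{1}{2m_i} + \frac{d_i IG_3}{2m_i IG_1}\right]\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial \ell_{1i}}{\partial\nu} = \frac{1}{\nu} + \frac{E_{1,1}}{IG_1},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\beta\,\partial\beta^\top} = \frac{1}{\sigma^2 m_i IG_1}\left\{(y_i - \eta_i)IG_3\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top} + \left[\frac{d_i}{IG_1}\left(IG_5 IG_1 - IG_3^2\right) - IG_3\right]\frac{\partial\eta_i}{\partial\beta}\frac{\partial\eta_i}{\partial\beta^\top}\right\},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\beta} = \frac{y_i - \eta_i}{\sigma^4 m_i IG_1}\left[\frac{d_i}{2}\left(IG_5 - \frac{IG_3^2}{IG_1}\right) - IG_3\right]\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\beta^\top} = \frac{y_i - \eta_i}{\sigma^2 m_i^2 IG_1}\left[\frac{d_i}{2}\left(IG_5 - \frac{IG_3^2}{IG_1}\right) - IG_3\right]\frac{\partial m_i}{\partial\rho}\frac{\partial\eta_i}{\partial\beta^\top},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\beta} = \frac{y_i - \eta_i}{\sigma^2 m_i IG_1^2}\left(IG_1 E_{1,3} - IG_3 E_{1,1}\right)\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 \ell_{1i}}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4}\left\{1 + \frac{d_i}{IG_1}\left[\frac{d_i}{2}\left(IG_5 - \frac{IG_3^2}{IG_1}\right) - 2IG_3\right]\right\},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\rho} = \frac{d_i}{2\sigma^2 m_i IG_1}\left[\frac{d_i}{2}\left(IG_5 - \frac{IG_3^2}{IG_1}\right) - IG_3\right]\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\sigma^2} = \frac{d_i}{2\sigma^2 IG_1^2}\left(IG_1 E_{1,3} - IG_3 E_{1,1}\right),$$
$$\frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\rho^\top} = \frac{1}{2m_i}\left(\frac{1}{m_i}\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top} - \frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top}\right) + \frac{d_i}{2m_i IG_1}\left\{IG_3\frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top} + \frac{1}{m_i}\left[\frac{d_i}{2IG_1}\left(IG_5 IG_1 - IG_3^2\right) - 2IG_3\right]\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top}\right\},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\rho} = \frac{d_i}{2m_i IG_1^2}\left(IG_1 E_{1,3} - IG_3 E_{1,1}\right)\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial^2 \ell_{1i}}{\partial\nu^2} = -\frac{1}{\nu^2} + \frac{1}{IG_1^2}\left(IG_1 E_{2,1} - E_{1,1}^2\right), \quad \frac{\partial^2 \ell_{1i}}{\partial\theta\,\partial\lambda} = 0,$$

where

$$IG_k(d_i) = \int_0^1 u^{\nu + k/2 - 1} e^{-u d_i/2}\,du = \frac{\Gamma(\nu + k/2)}{(d_i/2)^{\nu + k/2}}\,P_1(\nu + k/2,\, d_i/2), \qquad E_{j,k}(d_i) = \int_0^1 u^{\nu + k/2 - 1}[\ln(u)]^j e^{-u d_i/2}\,du.$$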
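The closed form for $IG_k(d_i)$ follows from the substitution $t = u d_i/2$. A standard-library sketch comparing it with direct quadrature, reading $P_1(a, x)$ as the regularized lower incomplete gamma function (our interpretation of the notation); $\nu = 1.454$ is the SSL estimate from Table 5, and the function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x, tol=1e-15):
    """Regularized lower incomplete gamma P(a, x) from its power series (sketch)."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > tol * total:
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def ig_closed_form(k, nu, d):
    """IG_k(d) = Gamma(nu + k/2) (d/2)^{-(nu + k/2)} P(nu + k/2, d/2)."""
    a = nu + k / 2.0
    return math.exp(math.lgamma(a) - a * math.log(d / 2.0)) * reg_lower_gamma(a, d / 2.0)


def ig_quadrature(k, nu, d, n=20000):
    """IG_k(d) = int_0^1 u^{nu + k/2 - 1} exp(-u d / 2) du by the midpoint rule."""
    a = nu + k / 2.0
    h = 1.0 / n
    return h * sum(((j + 0.5) * h) ** (a - 1.0) * math.exp(-(j + 0.5) * h * d / 2.0)
                   for j in range(n))


for k in (1, 3, 5):
    print(k, ig_closed_form(k, 1.454, 2.0), ig_quadrature(k, 1.454, 2.0))
```

Working on the log scale via `math.lgamma` avoids overflow of $\Gamma(\nu + k/2)$ and $(d_i/2)^{-(\nu+k/2)}$ for large arguments.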

A.4. The contaminated normal distribution

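$$U_1(\theta) = \sum_{i=1}^n \frac{\partial \ell_{1i}(\theta)}{\partial\theta} = \sum_{i=1}^n \frac{1}{f_s(y_i \mid \theta)}\frac{\partial f_s(y_i \mid \theta)}{\partial\theta}, \qquad I_1(\theta) = -\sum_{i=1}^n \frac{\partial^2 \ell_{1i}(\theta)}{\partial\theta\,\partial\theta^\top},$$

where

$$\frac{\partial^2 \ell_{1i}(\theta)}{\partial\theta_j\,\partial\theta_k} = -\frac{1}{f_s^2(y_i \mid \theta)}\frac{\partial f_s(y_i \mid \theta)}{\partial\theta_j}\frac{\partial f_s(y_i \mid \theta)}{\partial\theta_k} + \frac{1}{f_s(y_i \mid \theta)}\frac{\partial^2 f_s(y_i \mid \theta)}{\partial\theta_j\,\partial\theta_k},$$

with $f_s(y_i \mid \theta) = \nu\,\phi(y_i; \eta_i, \sigma^2 m_i/\gamma) + (1-\nu)\,\phi(y_i; \eta_i, \sigma^2 m_i)$. For brevity, write $\phi_{\gamma i} = \phi(y_i; \eta_i, \sigma^2 m_i/\gamma)$, $\phi_{1i} = \phi(y_i; \eta_i, \sigma^2 m_i)$ and $f_s = f_s(y_i \mid \theta)$.

The first partial derivatives are given by

$$\frac{\partial f_s}{\partial\beta} = \frac{1}{\sigma^2 m_i}\left[\nu\gamma\phi_{\gamma i} + (1-\nu)\phi_{1i}\right](y_i - \eta_i)\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial f_s}{\partial\sigma^2} = \frac{1}{2\sigma^2}\left\{d_i\left[\nu\gamma\phi_{\gamma i} + (1-\nu)\phi_{1i}\right] - f_s\right\},$$
$$\frac{\partial f_s}{\partial\rho} = \frac{1}{2m_i}\left\{d_i\left[\nu\gamma\phi_{\gamma i} + (1-\nu)\phi_{1i}\right] - f_s\right\}\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial f_s}{\partial\nu} = \phi_{\gamma i} - \phi_{1i}, \quad \frac{\partial f_s}{\partial\gamma} = \frac{\nu}{2\gamma}\phi_{\gamma i}(1 - \gamma d_i), \quad \frac{\partial f_s}{\partial\lambda} = 0.$$

The second partial derivatives are given by

$$\frac{\partial^2 f_s}{\partial\beta\,\partial\beta^\top} = \frac{1}{\sigma^2 m_i}\left\{\left[\nu\gamma\phi_{\gamma i} + (1-\nu)\phi_{1i}\right](y_i - \eta_i)\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top} + \left[d_i\left(\nu\gamma^2\phi_{\gamma i} + (1-\nu)\phi_{1i}\right) - \nu\gamma\phi_{\gamma i} - (1-\nu)\phi_{1i}\right]\frac{\partial\eta_i}{\partial\beta}\frac{\partial\eta_i}{\partial\beta^\top}\right\},$$
$$\frac{\partial^2 f_s}{\partial\sigma^2\,\partial\beta} = \frac{y_i - \eta_i}{2\sigma^4 m_i}\left[\nu\gamma\phi_{\gamma i}(\gamma d_i - 3) + (1-\nu)\phi_{1i}(d_i - 3)\right]\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 f_s}{\partial\rho\,\partial\beta^\top} = \frac{y_i - \eta_i}{2\sigma^2 m_i^2}\left[\nu\gamma\phi_{\gamma i}(\gamma d_i - 3) + (1-\nu)\phi_{1i}(d_i - 3)\right]\frac{\partial m_i}{\partial\rho}\frac{\partial\eta_i}{\partial\beta^\top},$$
$$\frac{\partial^2 f_s}{\partial\nu\,\partial\beta} = \frac{y_i - \eta_i}{\sigma^2 m_i}\left(\gamma\phi_{\gamma i} - \phi_{1i}\right)\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 f_s}{\partial\gamma\,\partial\beta} = \frac{\nu(y_i - \eta_i)}{2\sigma^2 m_i}\phi_{\gamma i}(3 - \gamma d_i)\frac{\partial\eta_i}{\partial\beta},$$
$$\frac{\partial^2 f_s}{\partial(\sigma^2)^2} = \frac{1}{4\sigma^4}\left[\nu\gamma d_i\phi_{\gamma i}(\gamma d_i - 6) + (1-\nu)d_i\phi_{1i}(d_i - 6) + 3f_s\right], \quad \frac{\partial^2 f_s}{\partial\sigma^2\,\partial\rho} = \frac{1}{4\sigma^2 m_i}\left[\nu\gamma d_i\phi_{\gamma i}(\gamma d_i - 4) + (1-\nu)d_i\phi_{1i}(d_i - 4) + f_s\right]\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial^2 f_s}{\partial\nu\,\partial\sigma^2} = \frac{1}{2\sigma^2}\left[\phi_{\gamma i}(\gamma d_i - 1) + \phi_{1i}(1 - d_i)\right], \quad \frac{\partial^2 f_s}{\partial\gamma\,\partial\sigma^2} = \frac{\nu}{4\sigma^2}\phi_{\gamma i}\left(-\gamma d_i^2 + 4d_i - \frac{1}{\gamma}\right),$$
$$\frac{\partial^2 f_s}{\partial\rho\,\partial\rho^\top} = \frac{1}{2m_i}\left\{d_i\left[\nu\gamma\phi_{\gamma i} + (1-\nu)\phi_{1i}\right] - f_s\right\}\frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top} + \frac{1}{4m_i^2}\left[\nu\gamma d_i\phi_{\gamma i}(\gamma d_i - 6) + (1-\nu)d_i\phi_{1i}(d_i - 6) + 3f_s\right]\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top},$$
$$\frac{\partial^2 f_s}{\partial\nu\,\partial\rho} = \frac{1}{2m_i}\left[\phi_{\gamma i}(\gamma d_i - 1) + \phi_{1i}(1 - d_i)\right]\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial^2 f_s}{\partial\gamma\,\partial\rho} = \frac{\nu}{4m_i}\phi_{\gamma i}\left(-\gamma d_i^2 + 4d_i - \frac{1}{\gamma}\right)\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial^2 f_s}{\partial\nu^2} = 0, \quad \frac{\partial^2 f_s}{\partial\nu\,\partial\gamma} = \frac{1}{2}\phi_{\gamma i}\left(\frac{1}{\gamma} - d_i\right), \quad \frac{\partial^2 f_s}{\partial\gamma^2} = \frac{\nu}{2}\phi_{\gamma i}\left[\frac{1}{2}\left(\frac{1}{\gamma} - d_i\right)^2 - \frac{1}{\gamma^2}\right], \quad \frac{\partial^2 f_s}{\partial\theta\,\partial\lambda} = 0.$$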
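The $\rho$-related second derivatives of $f_s$ can likewise be checked by finite differences. The standard-library sketch below assumes $m_i = x_i^{\rho}$, fixes $r_i = y_i - \eta_i$, and uses hypothetical values $\nu = 0.1$ and $\gamma = 0.3$; the expressions in `analytic_cn` are the ones being checked.

```python
import math


def npdf(z, var):
    """Normal density of a residual z with variance var."""
    return math.exp(-0.5 * z * z / var) / math.sqrt(2.0 * math.pi * var)


def f_cn(sigma2, rho, nu, gamma, r, x):
    """f_s = nu phi(y; eta, s2 m / gamma) + (1 - nu) phi(y; eta, s2 m), m = x**rho, r = y - eta."""
    m = x ** rho
    return nu * npdf(r, sigma2 * m / gamma) + (1 - nu) * npdf(r, sigma2 * m)


def analytic_cn(sigma2, rho, nu, gamma, r, x):
    """Second derivatives of f_s involving rho, as written in this sketch."""
    m = x ** rho
    dm, d2m = m * math.log(x), m * math.log(x) ** 2
    d = r * r / (sigma2 * m)
    pg, p1 = npdf(r, sigma2 * m / gamma), npdf(r, sigma2 * m)
    f = nu * pg + (1 - nu) * p1
    return {
        "sigma2,rho": (nu * gamma * d * pg * (gamma * d - 4) + (1 - nu) * d * p1 * (d - 4) + f)
        / (4 * sigma2 * m) * dm,
        "gamma,rho": nu / (4 * m) * pg * (-gamma * d**2 + 4 * d - 1 / gamma) * dm,
        "rho,rho": (d * (nu * gamma * pg + (1 - nu) * p1) - f) / (2 * m) * d2m
        + (nu * gamma * d * pg * (gamma * d - 6) + (1 - nu) * d * p1 * (d - 6) + 3 * f)
        / (4 * m * m) * dm * dm,
    }


def numeric_cn(sigma2, rho, nu, gamma, r, x, h=1e-4):
    """Central finite differences of f_cn."""
    def f(s2, rh, g):
        return f_cn(s2, rh, nu, g, r, x)

    def mixed(g):
        return (g(h, h) - g(h, -h) - g(-h, h) + g(-h, -h)) / (4 * h * h)

    return {
        "sigma2,rho": mixed(lambda a, b: f(sigma2 + a, rho + b, gamma)),
        "gamma,rho": mixed(lambda a, b: f(sigma2, rho + b, gamma + a)),
        "rho,rho": (f(sigma2, rho + h, gamma) - 2 * f(sigma2, rho, gamma)
                    + f(sigma2, rho - h, gamma)) / h**2,
    }


args = (11.244, -1.028, 0.1, 0.3, 2.0, 2.0)  # hypothetical: nu = 0.1, gamma = 0.3, r = 2, x = 2
a_cn = analytic_cn(*args)
n_cn = numeric_cn(*args)
print(a_cn)
print(n_cn)
```

Because $\sigma^2$ and $m_i$ enter $f_s$ only through the product $\sigma^2 m_i$, each $\rho$-derivative is the corresponding $\sigma^2$-derivative rescaled by $(\sigma^2/m_i)\,\partial m_i/\partial\rho$, a useful consistency check on the list above.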

A.5. The power-exponential distribution

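$$\frac{\partial \ell_{1i}}{\partial\beta} = \frac{\nu(y_i - \eta_i)d_i^{\nu-1}}{\sigma^2 m_i}\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial \ell_{1i}}{\partial\sigma^2} = \frac{\nu d_i^{\nu} - 1}{2\sigma^2}, \quad \frac{\partial \ell_{1i}}{\partial\lambda} = 0, \quad \frac{\partial \ell_{1i}}{\partial\rho} = \frac{\nu d_i^{\nu} - 1}{2m_i}\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial \ell_{1i}}{\partial\nu} = \frac{1}{2\nu^2}\left[2\nu + \log 2 + \Psi\left(\frac{1}{2\nu}\right)\right] - \frac{1}{2}d_i^{\nu}\log d_i,$$
$$\frac{\partial^2 \ell_{1i}}{\partial\beta\,\partial\beta^\top} = \frac{\nu d_i^{\nu-1}}{\sigma^2 m_i}\left[(1 - 2\nu)\frac{\partial\eta_i}{\partial\beta}\frac{\partial\eta_i}{\partial\beta^\top} + (y_i - \eta_i)\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top}\right], \quad \frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\beta} = -\frac{\nu^2(y_i - \eta_i)d_i^{\nu-1}}{\sigma^4 m_i}\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\beta^\top} = -\frac{\nu^2(y_i - \eta_i)d_i^{\nu-1}}{\sigma^2 m_i^2}\frac{\partial m_i}{\partial\rho}\frac{\partial\eta_i}{\partial\beta^\top},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\beta} = \frac{(y_i - \eta_i)d_i^{\nu-1}}{\sigma^2 m_i}(1 + \nu\ln d_i)\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 \ell_{1i}}{\partial(\sigma^2)^2} = \frac{1}{2\sigma^4}\left[1 - \nu(\nu+1)d_i^{\nu}\right], \quad \frac{\partial^2 \ell_{1i}}{\partial\sigma^2\,\partial\rho} = -\frac{\nu^2 d_i^{\nu}}{2\sigma^2 m_i}\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\sigma^2} = \frac{(1 + \nu\ln d_i)\,d_i^{\nu}}{2\sigma^2},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\rho\,\partial\rho^\top} = \frac{1}{2m_i}\left[\frac{1}{m_i}\left(1 - \nu(\nu+1)d_i^{\nu}\right)\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top} + (\nu d_i^{\nu} - 1)\frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top}\right], \quad \frac{\partial^2 \ell_{1i}}{\partial\nu\,\partial\rho} = \frac{d_i^{\nu}}{2m_i}(1 + \nu\log d_i)\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial^2 \ell_{1i}}{\partial\nu^2} = -\frac{1}{\nu^2}\left[1 + \frac{\ln 2}{\nu} + \frac{1}{4\nu^2}\left(4\nu\,\Psi\left(\frac{1}{2\nu}\right) + \Psi_1\left(\frac{1}{2\nu}\right)\right)\right] - \frac{1}{2}d_i^{\nu}[\log d_i]^2, \quad \frac{\partial^2 \ell_{1i}}{\partial\theta\,\partial\lambda} = 0.$$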

Appendix 2. Computation of the Q2i(τ|θ^(k)) function and its derivatives

  • Skew-normal distribution:

    $Q_{2i}(\tau \mid \hat{\theta}^{(k)}) = 0$.

  • Skew power-exponential distribution:

    In this case there is no explicit form for the function $h(\cdot \mid \tau)$, so $Q_{2i}(\tau \mid \hat{\theta}^{(k)})$ cannot be computed in closed form.

  • Skew Student-t normal distribution:

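    Since $U \sim \mathrm{Gamma}(\nu/2, \nu/2)$ and $\log h(u \mid \nu) = (\nu/2)\log(\nu/2) - \log\Gamma(\nu/2) + (\nu/2 - 1)\log u - \nu u/2$, we have
    $$Q_{2i}(\tau \mid \hat{\theta}^{(k)}) = \frac{\nu}{2}\log\frac{\nu}{2} - \log\Gamma\left(\frac{\nu}{2}\right) + \left(\frac{\nu}{2} - 1\right)\widehat{lu}_i^{(k)} - \frac{\nu}{2}\hat{u}_i^{(k)}, \quad \frac{\partial Q_{2i}(\tau \mid \hat{\theta}^{(k)})}{\partial\nu} = \frac{1}{2}\left[\log\frac{\nu}{2} + 1 - \Psi\left(\frac{\nu}{2}\right) + \widehat{lu}_i^{(k)} - \hat{u}_i^{(k)}\right], \quad \frac{\partial^2 Q_{2i}(\tau \mid \hat{\theta}^{(k)})}{\partial\nu^2} = \frac{1}{2\nu} - \frac{1}{4}\Psi_1\left(\frac{\nu}{2}\right),$$
    where $\widehat{lu}_i = \Psi\left(\frac{\hat{\nu} + 1}{2}\right) - \log\left(\frac{\hat{\nu} + \hat{d}_i}{2}\right)$.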
  • Skew slash distribution:

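    In this case, $U \sim \mathrm{Beta}(\nu, 1)$ and $\log h(u \mid \nu) = \log\nu + (\nu - 1)\log u$. Thus
    $$Q_{2i}(\tau \mid \hat{\theta}^{(k)}) = \log\nu + (\nu - 1)\widehat{lu}_i^{(k)}, \quad \frac{\partial Q_{2i}(\tau \mid \hat{\theta}^{(k)})}{\partial\nu} = \frac{1}{\nu} + \widehat{lu}_i^{(k)}, \quad \frac{\partial^2 Q_{2i}(\tau \mid \hat{\theta}^{(k)})}{\partial\nu^2} = -\frac{1}{\nu^2},$$
    where $\widehat{lu}_i = \dfrac{(\hat{d}_i/2)^{\hat{\nu} + 1/2}}{\Gamma(\hat{\nu} + 1/2)\,P_1(\hat{\nu} + 1/2,\, \hat{d}_i/2)}\displaystyle\int_0^1 u^{\hat{\nu} - 1/2}\log(u)\,e^{-u\hat{d}_i/2}\,du$.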
  • Skew contaminated-normal distribution:

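    Here $\log h(u \mid \tau) = \log(\nu)\,I_{\{\gamma\}}(u) + \log(1 - \nu)\,I_{\{1\}}(u)$. So
    $$Q_{2i}(\tau \mid \hat{\theta}^{(k)}) = \hat{u}_{\gamma i}^{(k)}\log\nu + \hat{u}_{1i}^{(k)}\log(1 - \nu), \quad \frac{\partial Q_{2i}(\tau \mid \hat{\theta})}{\partial\nu} = \frac{\hat{u}_{\gamma i}}{\nu} - \frac{\hat{u}_{1i}}{1 - \nu}, \quad \frac{\partial^2 Q_{2i}(\tau \mid \hat{\theta})}{\partial\nu^2} = -\frac{\hat{u}_{\gamma i}}{\nu^2} - \frac{\hat{u}_{1i}}{(1 - \nu)^2}, \quad \frac{\partial Q_{2i}(\tau \mid \hat{\theta})}{\partial\gamma} = \frac{\partial^2 Q_{2i}(\tau \mid \hat{\theta})}{\partial\gamma^2} = \frac{\partial^2 Q_{2i}(\tau \mid \hat{\theta})}{\partial\nu\,\partial\gamma} = 0,$$
    with $\hat{u}_{\gamma i} = \hat{\nu}\phi_{\gamma i}/f_s(\hat{\tau} \mid y_i, \hat{\theta}_1)$, $\hat{u}_{1i} = (1 - \hat{\nu})\phi_{1i}/f_s(\hat{\tau} \mid y_i, \hat{\theta}_1)$, $f_s(\tau \mid y_i, \hat{\theta}_1) = \nu\phi_{\gamma i} + (1 - \nu)\phi_{1i}$, $\phi_{\gamma i} = \phi(y_i; \hat{\eta}_i, \hat{\sigma}^2 m_i(\hat{\rho}, z_i)/\hat{\gamma})$ and $\phi_{1i} = \phi(y_i; \hat{\eta}_i, \hat{\sigma}^2 m_i(\hat{\rho}, z_i))$.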

Appendix 3. Hessian matrix and perturbation schemes

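To obtain the diagnostic measures of the SSMN-HNLM based on the approach of Zhu and Lee [35], we need the Hessian matrix $\ddot{Q}_\theta(\hat{\theta}) = \partial^2 Q(\theta \mid \hat{\theta})/\partial\theta\,\partial\theta^\top\big|_{\theta = \hat{\theta}}$, where $\theta = (\beta^\top, \sigma^2, \lambda, \rho^\top, \tau^\top)^\top$. It follows from (15) that the elements of $\partial^2 Q(\theta \mid \hat{\theta})/\partial\theta\,\partial\theta^\top$ (writing $Q = Q(\theta \mid \hat{\theta})$) are given by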

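$$\frac{\partial^2 Q}{\partial\beta\,\partial\beta^\top} = -\frac{1}{\sigma^2}\sum_{i=1}^n m_i^{-1}\left\{(\hat{\kappa}_i + \lambda^2)\left[\frac{\partial\eta_i}{\partial\beta}\frac{\partial\eta_i}{\partial\beta^\top} - (y_i - \eta_i)\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top}\right] + \lambda\hat{t}_i\frac{\partial^2\eta_i}{\partial\beta\,\partial\beta^\top}\right\},$$
$$\frac{\partial^2 Q}{\partial\rho\,\partial\beta^\top} = -\frac{1}{\sigma^2}\sum_{i=1}^n m_i^{-2}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i) - \lambda\hat{t}_i\right]\frac{\partial m_i}{\partial\rho}\frac{\partial\eta_i}{\partial\beta^\top}, \quad \frac{\partial^2 Q}{\partial\sigma^2\,\partial\beta} = -\frac{1}{\sigma^4}\sum_{i=1}^n m_i^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i) - \lambda\hat{t}_i\right]\frac{\partial\eta_i}{\partial\beta},$$
$$\frac{\partial^2 Q}{\partial\lambda\,\partial\beta} = \frac{1}{\sigma^2}\sum_{i=1}^n m_i^{-1}\left[2\lambda(y_i - \eta_i) - \hat{t}_i\right]\frac{\partial\eta_i}{\partial\beta}, \quad \frac{\partial^2 Q}{\partial\rho\,\partial\rho^\top} = -\frac{1}{2\sigma^2}\sum_{i=1}^n m_{i1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i)^2 - 2\lambda\hat{t}_i(y_i - \eta_i) + \widehat{t^2_i}\right] - \sum_{i=1}^n m_{i2},$$
$$\frac{\partial^2 Q}{\partial\sigma^2\,\partial\rho} = -\frac{1}{2\sigma^4}\sum_{i=1}^n m_i^{-2}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i)^2 - 2\lambda\hat{t}_i(y_i - \eta_i) + \widehat{t^2_i}\right]\frac{\partial m_i}{\partial\rho}, \quad \frac{\partial^2 Q}{\partial\lambda\,\partial\rho} = \frac{1}{\sigma^2}\sum_{i=1}^n m_i^{-2}\left[\lambda(y_i - \eta_i)^2 - \hat{t}_i(y_i - \eta_i)\right]\frac{\partial m_i}{\partial\rho},$$
$$\frac{\partial^2 Q}{\partial(\sigma^2)^2} = \frac{n}{\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^n m_i^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i)^2 - 2\lambda\hat{t}_i(y_i - \eta_i) + \widehat{t^2_i}\right], \quad \frac{\partial^2 Q}{\partial\lambda\,\partial\sigma^2} = \frac{1}{\sigma^4}\sum_{i=1}^n m_i^{-1}\left[\lambda(y_i - \eta_i)^2 - \hat{t}_i(y_i - \eta_i)\right],$$
$$\frac{\partial^2 Q}{\partial\lambda^2} = -\frac{1}{\sigma^2}\sum_{i=1}^n m_i^{-1}(y_i - \eta_i)^2, \quad \frac{\partial^2 Q}{\partial\tau\,\partial\theta_1^\top} = 0, \quad \frac{\partial^2 Q}{\partial\tau\,\partial\tau^\top} = \sum_{i=1}^n \frac{\partial^2 Q_{2i}(\tau \mid \hat{\theta})}{\partial\tau\,\partial\tau^\top} \ \text{(see Appendix 2)},$$

where

$$m_{i1} = \frac{1}{m_i^3}\left[2\frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top} - m_i\frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top}\right], \quad m_{i2} = \frac{1}{m_i^2}\left[m_i\frac{\partial^2 m_i}{\partial\rho\,\partial\rho^\top} - \frac{\partial m_i}{\partial\rho}\frac{\partial m_i}{\partial\rho^\top}\right] \quad \text{and} \quad \theta_1 = (\beta^\top, \sigma^2, \lambda, \rho^\top)^\top.$$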
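Both auxiliary quantities are second derivatives: $m_{i1} = \partial^2(m_i^{-1})/\partial\rho\,\partial\rho^\top$ and $m_{i2} = \partial^2 \log m_i/\partial\rho\,\partial\rho^\top$, which is how the $-\log m_i$ and $m_i^{-1}$ terms of the Q-function produce them. A scalar-$\rho$ check in standard-library Python, with a hypothetical variance function $m(\rho) = 1 + \rho^2 x$ used only to exercise the identities. For the power variance of the application, $m_i = x_i^{\rho}$ (taking $z_i = x_i$), $\log m_i$ is linear in $\rho$, so $m_{i2} = 0$.

```python
import math


def m_i1(m, dm, d2m):
    """m_i1 = (2 m'^2 - m m'') / m^3, the second derivative of 1/m_i (scalar rho)."""
    return (2.0 * dm * dm - m * d2m) / m**3


def m_i2(m, dm, d2m):
    """m_i2 = (m m'' - m'^2) / m^2, the second derivative of log m_i (scalar rho)."""
    return (m * d2m - dm * dm) / m**2


def second_derivative(g, rho, h=1e-4):
    """Central second difference of a scalar function g at rho."""
    return (g(rho + h) - 2.0 * g(rho) + g(rho - h)) / (h * h)


# Hypothetical variance function m(rho) = 1 + rho^2 x, used only to exercise the identities.
x, rho = 2.0, 0.7


def m_fun(r):
    return 1.0 + r * r * x


m, dm, d2m = m_fun(rho), 2.0 * rho * x, 2.0 * x
print(m_i1(m, dm, d2m), second_derivative(lambda r: 1.0 / m_fun(r), rho))
print(m_i2(m, dm, d2m), second_derivative(lambda r: math.log(m_fun(r)), rho))

# Power variance of the application, m = x^rho: log m is linear in rho, so m_i2 = 0.
mp = x ** rho
print(m_i2(mp, mp * math.log(x), mp * math.log(x) ** 2))
```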

A.6. Perturbation schemes

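In this section we consider the following perturbation schemes for the SSMN-HNLM. For each case, we need to compute the matrix $\Delta_{\omega_0} = \partial^2 Q(\theta, \omega \mid \hat{\theta})/\partial\theta\,\partial\omega^\top\big|_{\theta = \hat{\theta},\, \omega = \omega_0} = (\Delta_\beta^\top, \Delta_{\sigma^2}^\top, \Delta_\lambda^\top, \Delta_\rho^\top, \Delta_\tau^\top)^\top$.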

A.6.1. Case weight perturbation

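Let $\omega = (\omega_1, \ldots, \omega_n)^\top$ be an $n \times 1$ vector with $\omega_0 = (1, \ldots, 1)^\top = 1_n$. Then the expected value of the perturbed complete-data log-likelihood function (perturbed Q-function) can be written as

$$Q(\theta, \omega \mid \hat{\theta}) = E[\ell_c(\theta, \omega \mid y_c)] = \sum_{i=1}^n \omega_i E[\ell_{ci}(\theta \mid y_c)] = \sum_{i=1}^n \omega_i Q_i(\theta \mid \hat{\theta}).$$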

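In this case the elements of $\Delta_{\theta i}$, $i = 1, \ldots, n$, are given by

$$\Delta_{\beta i} = \frac{1}{\sigma^2}m_i^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i) - \lambda\hat{t}_i\right]\frac{\partial\eta_i}{\partial\beta}, \quad \Delta_{\sigma^2 i} = -\frac{1}{\sigma^2} + \frac{1}{2\sigma^4}m_i^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i)^2 - 2\lambda\hat{t}_i(y_i - \eta_i) + \widehat{t^2_i}\right], \quad \Delta_{\lambda i} = -\frac{1}{\sigma^2}m_i^{-1}\left[\lambda(y_i - \eta_i)^2 - \hat{t}_i(y_i - \eta_i)\right],$$
$$\Delta_{\rho i} = \left\{\frac{1}{2\sigma^2}m_i^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i)^2 - 2\lambda\hat{t}_i(y_i - \eta_i) + \widehat{t^2_i}\right] - 1\right\}m_i^{-1}\frac{\partial m_i}{\partial\rho}, \quad \Delta_{\tau i} = \frac{\partial Q_{2i}(\tau \mid \hat{\theta})}{\partial\tau} \ \text{(see Appendix 2)}.$$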
A.6.2. Response variable perturbation

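A perturbation of the response vector $y = (y_1, \ldots, y_n)^\top$ is introduced by $y_\omega = y + S_y\omega$, where $S_y$ is the standard deviation of $y$. In this case, $\omega_0 = 0_{n \times 1}$ and

$$Q(\theta, \omega \mid \hat{\theta}) \propto \sum_{i=1}^n Q_{2i}(\tau \mid \hat{\theta}) - n\log\sigma^2 - \sum_{i=1}^n \log m_i - \frac{1}{2\sigma^2}\sum_{i=1}^n m_i^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i + S_y\omega_i - \eta_i)^2 - 2\lambda\hat{t}_i(y_i + S_y\omega_i - \eta_i) + \widehat{t^2_i}\right].$$

In this case the elements of $\Delta_{\theta i}$, $i = 1, \ldots, n$, are given by

$$\Delta_{\beta i} = \frac{S_y}{\sigma^2}m_i^{-1}(\hat{\kappa}_i + \lambda^2)\frac{\partial\eta_i}{\partial\beta}, \quad \Delta_{\sigma^2 i} = \frac{S_y}{\sigma^4}m_i^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i) - \lambda\hat{t}_i\right], \quad \Delta_{\lambda i} = -\frac{S_y}{\sigma^2}m_i^{-1}\left[2\lambda(y_i - \eta_i) - \hat{t}_i\right], \quad \Delta_{\rho i} = \frac{S_y}{\sigma^2}m_i^{-2}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_i) - \lambda\hat{t}_i\right]\frac{\partial m_i}{\partial\rho}, \quad \Delta_{\tau i} = 0.$$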
A.6.3. Explanatory variable perturbation

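A perturbation of a specific explanatory variable $x_p$ can be introduced as $x_{i\omega p} = x_{ip} + S_x\omega_i$, $i = 1, \ldots, n$, where $S_x$ is a scale factor, for example the standard deviation of $x_p$. To simplify the notation, we write $m_{i\omega} = m_i(\rho, x_{i\omega p})$ and $\eta_{i\omega} = \eta(\beta, x_{i\omega p})$. In this case, $\omega_0 = 0_{n \times 1}$ and

$$Q(\theta, \omega \mid \hat{\theta}) \propto \sum_{i=1}^n Q_{2i}(\tau \mid \hat{\theta}) - n\log\sigma^2 - \sum_{i=1}^n \log m_{i\omega} - \frac{1}{2\sigma^2}\sum_{i=1}^n m_{i\omega}^{-1}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_{i\omega})^2 - 2\lambda\hat{t}_i(y_i - \eta_{i\omega}) + \widehat{t^2_i}\right].$$

In this case the elements of $\partial^2 Q(\theta, \omega \mid \hat{\theta})/\partial\theta\,\partial\omega_i$, $i = 1, \ldots, n$, are given by

$$\frac{\partial^2 Q}{\partial\beta\,\partial\omega_i} = \frac{1}{\sigma^2}\left\{-\frac{1}{m_{i\omega}^2}\frac{\partial m_{i\omega}}{\partial\omega_i}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_{i\omega}) - \lambda\hat{t}_i\right]\frac{\partial\eta_{i\omega}}{\partial\beta} + \frac{1}{m_{i\omega}}\left[\left((\hat{\kappa}_i + \lambda^2)(y_i - \eta_{i\omega}) - \lambda\hat{t}_i\right)\frac{\partial^2\eta_{i\omega}}{\partial\omega_i\,\partial\beta} - (\hat{\kappa}_i + \lambda^2)\frac{\partial\eta_{i\omega}}{\partial\omega_i}\frac{\partial\eta_{i\omega}}{\partial\beta}\right]\right\},$$
$$\frac{\partial^2 Q}{\partial\sigma^2\,\partial\omega_i} = -\frac{1}{2\sigma^4}\left\{\frac{1}{m_{i\omega}^2}\frac{\partial m_{i\omega}}{\partial\omega_i}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_{i\omega})^2 + \widehat{t^2_i} - 2\lambda\hat{t}_i(y_i - \eta_{i\omega})\right] + \frac{2}{m_{i\omega}}\frac{\partial\eta_{i\omega}}{\partial\omega_i}\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_{i\omega}) - \lambda\hat{t}_i\right]\right\},$$
$$\frac{\partial^2 Q}{\partial\lambda\,\partial\omega_i} = \frac{1}{\sigma^2}\left\{\frac{1}{m_{i\omega}^2}\frac{\partial m_{i\omega}}{\partial\omega_i}\left[\lambda(y_i - \eta_{i\omega})^2 - \hat{t}_i(y_i - \eta_{i\omega})\right] + \frac{1}{m_{i\omega}}\frac{\partial\eta_{i\omega}}{\partial\omega_i}\left[2\lambda(y_i - \eta_{i\omega}) - \hat{t}_i\right]\right\},$$
$$\frac{\partial^2 Q}{\partial\rho\,\partial\omega_i} = -\frac{1}{m_{i\omega}^2}\left[m_{i\omega}\frac{\partial^2 m_{i\omega}}{\partial\omega_i\,\partial\rho} - \frac{\partial m_{i\omega}}{\partial\omega_i}\frac{\partial m_{i\omega}}{\partial\rho}\right] + \frac{1}{2\sigma^2}\left\{\frac{1}{m_{i\omega}^3}\left[m_{i\omega}\frac{\partial^2 m_{i\omega}}{\partial\omega_i\,\partial\rho} - 2\frac{\partial m_{i\omega}}{\partial\omega_i}\frac{\partial m_{i\omega}}{\partial\rho}\right]\left[(\hat{\kappa}_i + \lambda^2)(y_i - \eta_{i\omega})^2 + \widehat{t^2_i} - 2\lambda\hat{t}_i(y_i - \eta_{i\omega})\right] - \frac{1}{m_{i\omega}^2}\frac{\partial m_{i\omega}}{\partial\rho}\frac{\partial\eta_{i\omega}}{\partial\omega_i}\left[2(\hat{\kappa}_i + \lambda^2)(y_i - \eta_{i\omega}) - 2\lambda\hat{t}_i\right]\right\},$$
$$\frac{\partial^2 Q}{\partial\tau\,\partial\omega_i} = 0.$$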

Funding Statement

The first author thanks FAPEMIG (Minas Gerais State Research Support Foundation) [grant number CEX APQ 01944/17] for financial support. The research of Aldo M. Garay was supported by Grant 420082/2016-6 from CNPq-Brazil.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

  • 1. Andrews D.F. and Mallows C.L., Scale mixtures of normal distributions, J. R. Stat. Soc. Ser. B 36 (1974), pp. 99–102.
  • 2. Araújo M.C., Cysneiros A.H.M.A., and Montenegro L.C., Improved heteroskedasticity likelihood ratio tests in symmetric nonlinear regression models, Stat. Papers (2017). doi: 10.1007/s00362-017-0933-5.
  • 3. Azzalini A., A class of distributions which includes the normal ones, Scand. J. Statist. 12 (1985), pp. 171–178.
  • 4. Bates D.M. and Watts D.G., Nonlinear Regression Analysis and its Applications, John Wiley & Sons, New York, 1988.
  • 5. Beale E.M.L. and Little R.J.A., Missing values in multivariate analysis, J. R. Stat. Soc. Ser. B 37 (1975), pp. 129–146.
  • 6. Branco M.D. and Dey D.K., A general class of multivariate skew-elliptical distributions, J. Multivar. Anal. 79 (2001), pp. 99–113. doi: 10.1006/jmva.2000.1960.
  • 7. Cancho V.C., Lachos V.H., and Ortega E.M.M., A nonlinear regression model with skew-normal errors, Stat. Papers 51 (2010), pp. 547–558. doi: 10.1007/s00362-008-0139-y.
  • 8. Cao C., Wang Y., Shi J.Q., and Lin J., Measurement error models for replicated data under asymmetric heavy-tailed distributions, Comput. Econ. 52 (2018), pp. 531–553. doi: 10.1007/s10614-017-9702-8.
  • 9. Chen F., Zhu H.T., and Lee S.Y., Perturbation selection and local influence analysis for nonlinear structural equation model, Psychometrika 74 (2009), pp. 493–516. doi: 10.1007/s11336-009-9114-3.
  • 10. Cook R.D., Detection of influential observation in linear regression, Technometrics 19 (1977), pp. 15–18.
  • 11. Cook R.D., Assessment of local influence, J. R. Stat. Soc. Ser. B 48 (1986), pp. 133–169.
  • 12. Cook R.D. and Weisberg S., Residuals and Influence in Regression, Chapman & Hall/CRC, Boca Raton, FL, 1982.
  • 13. Dempster A., Laird N., and Rubin D., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B 39 (1977), pp. 1–38.
  • 14. Ferreira C.S. and Arellano-Valle R., Estimation and diagnostic analysis in skew-generalized-normal regression models, J. Stat. Comput. Simul. 88 (2018), pp. 1039–1059. doi: 10.1080/00949655.2017.1419351.
  • 15. Ferreira C.S., Bolfarine H., and Lachos V.H., Skew scale mixtures of normal distributions: Properties and estimation, Stat. Methodol. 8 (2011), pp. 154–171. doi: 10.1016/j.stamet.2010.09.001.
  • 16. Ferreira C.S. and Lachos V.H., Nonlinear regression models under skew scale mixtures of normal distributions, Stat. Methodol. 33 (2016), pp. 131–146. doi: 10.1016/j.stamet.2016.08.004.
  • 17. Ferreira C.S., Lachos V.H., and Bolfarine H., Inference and diagnostics in skew scale mixtures of normal regression models, J. Stat. Comput. Simul. 85 (2015), pp. 517–537. doi: 10.1080/00949655.2013.828057.
  • 18. Garay A.M., Lachos V.H., and Abanto-Valle C.A., Nonlinear regression models based on scale mixtures of skew-normal distributions, J. Korean Stat. Soc. 40 (2011), pp. 115–124. doi: 10.1016/j.jkss.2010.08.003.
  • 19. Garay A.M., Lachos V.H., Labra F.V., and Ortega E.M.M., Statistical diagnostics for nonlinear regression models based on scale mixtures of skew-normal distributions, J. Stat. Comput. Simul. 84 (2014), pp. 1761–1778. doi: 10.1080/00949655.2013.766188.
  • 20. Gómez H.W., Venegas O., and Bolfarine H., Skew-symmetric distributions generated by the distribution function of the normal distribution, Environmetrics 18 (2007), pp. 395–407. doi: 10.1002/env.817.
  • 21. Henze N., A probabilistic representation of the skew-normal distribution, Scand. J. Statist. 13 (1986), pp. 271–275.
  • 22. Labra F.V., Garay A.M., Lachos V.H., and Ortega E.M.M., Estimation and diagnostics for heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions, J. Stat. Plan. Inference 142 (2012), pp. 2149–2165. doi: 10.1016/j.jspi.2012.02.018.
  • 23. Lee S.Y. and Xu L., Influence analyses of nonlinear mixed-effects models, Comput. Stat. Data Anal. 45 (2004), pp. 321–341. doi: 10.1016/S0167-9473(02)00303-1.
  • 24. Lin J.G., Xie F.C., and Wei B., Statistical diagnostics for skew-t-normal nonlinear models, Comm. Stat. Simul. Comput. 38 (2009), pp. 2096–2110. doi: 10.1080/03610910903249502.
  • 25. Liu C. and Rubin D.B., The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence, Biometrika 81 (1994), pp. 633–648.
  • 26. Louzada F., Ferreira P.H., and Diniz C.A., Skew-normal distribution for growth curve models in presence of a heteroscedasticity structure, J. Appl. Stat. 41 (2014), pp. 1785–1798. doi: 10.1080/02664763.2014.891005.
  • 27. Meng X.L. and Rubin D.B., Maximum likelihood estimation via the ECM algorithm: A general framework, Biometrika 80 (1993), pp. 267–278.
  • 28. Montenegro L.C., Bolfarine H., and Lachos V.H., Influence diagnostics for a skew extension of the Grubbs measurement error model, Comm. Stat. Simul. Comput. 38 (2009), pp. 667–681. doi: 10.1080/03610910802618385.
  • 29. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018. Available at http://www.R-project.org/.
  • 30. Ratkowsky D.A., Handbook of Nonlinear Regression Models, Marcel Dekker, New York, 1990.
  • 31. Vaida F. and Liu L., Fast implementation for normal mixed effects models with censored response, J. Comput. Graph. Stat. 18 (2009), pp. 797–817. doi: 10.1198/jcgs.2009.07130.
  • 32. Xie F.C., Lin J.G., and Wei B.C., Diagnostics for skew-normal nonlinear regression models with AR(1) errors, Comput. Stat. Data Anal. 53 (2009a), pp. 4403–4416. doi: 10.1016/j.csda.2009.06.010.
  • 33. Xie F.C., Wei B.C., and Lin J.G., Homogeneity diagnostics for skew-normal nonlinear regression models, Stat. Probab. Lett. 79 (2009b), pp. 821–827. doi: 10.1016/j.spl.2008.11.001.
  • 34. Zhu H.T., Ibrahim J.G., Lee S.Y., and Zhang H.P., Perturbation selection and influence measures in local influence analysis, Ann. Statist. 35 (2007), pp. 2565–2588. doi: 10.1214/009053607000000343.
  • 35. Zhu H. and Lee S., Local influence for incomplete-data models, J. R. Stat. Soc. Ser. B 63 (2001), pp. 111–126. doi: 10.1111/1467-9868.00279.
  • 36. Zhu H., Lee S., Wei B., and Zhou J., Case-deletion measures for models with incomplete data, Biometrika 88 (2001), pp. 727–737. doi: 10.1093/biomet/88.3.727.
