Influence diagnostics in the Heckman selection models based on EM algorithms

Marcos S Oliveira; Marcos O Prates; Christian E Galarza; Victor H Lachos

doi:10.1080/02664763.2025.2461715

2025 Feb 5;52(13):2384–2412.

PMCID: PMC12490367 PMID: 41048364

Abstract

This study presents diagnostic techniques for Heckman selection models estimated using the EM algorithm. The focus is on the selection t and normal models, based on the bivariate Student's-t and bivariate normal distributions, respectively. The Heckman selection model is a key econometric tool for estimating relationships while addressing selection bias. Relying on the EM-type algorithm, we develop global and local influence analyses based on the conditional expectation of the complete-data log-likelihood function, exploring four perturbation schemes for local influence analysis. To assess the effectiveness of the proposed diagnostic measures in identifying influential observations, we conducted a simulation study, complemented by two real-data applications that demonstrate how these techniques can effectively identify influential points. The proposed algorithms and methodologies are incorporated into the R package HeckmanEM.

Keywords: Case-deletion, Heckman selection model, local influence, model perturbation, multivariate Student's-t

1. Introduction

The Heckman selection model, introduced by Heckman [14], is a widely used method in econometrics and statistics to correct for selection bias in datasets where the outcomes are observed only for a non-random subset of the population. The model originally assumes bivariate normality (SLn), simplifying its mathematical formulation. However, real-world data often deviate from this assumption, exhibiting skewness or heavy tails, which can lead to biased estimates if not properly addressed. To address these limitations, Marchenko and Genton [24] extended the model to incorporate a bivariate Student's-t error distribution (SLt), offering greater flexibility for modeling heavier tails with the addition of the degrees of freedom parameter. This extension enhances resistance to outliers, a common issue in empirical datasets.

Numerous studies have advanced the understanding of Heckman selection models and their variations. For example, Lee [20] provided a generalized framework for selection models, while Bastos and Barreto-Souza [3] introduced a sample selection model based on the bivariate Birnbaum–Saunders distribution. Subsequently, Bastos et al. [4] generalized the Heckman model by allowing selection bias and dispersion parameters to depend on covariates. Lee [21] developed nonparametric methods for estimating treatment effects under selection bias, and more recently, Saulo et al. [33] extended the model to a broad class of symmetric distributions. Despite these advancements and the relevance of the area, there remains a significant gap in diagnostic methodologies specifically tailored to Heckman selection models, particularly regarding resistance and influence analysis.

Recognizing the need for reliable estimation methods, Zhao et al. [36] and Lachos et al. [19] proposed EM-type algorithms for the SLn and SLt models, respectively, which improve the computation of maximum likelihood estimates by using the first two moments of the corresponding truncated multivariate distribution in the E-step. This estimation methodology serves as the foundation for the diagnostic methods for Heckman models developed in this work.

Influence diagnostics are important for Heckman selection models because their estimates can be sensitive to influential observations: even a few such cases can disproportionately affect parameter estimates and predictions, leading to potentially misleading conclusions.

This study seeks to fill the existing gap by adapting diagnostic procedures to Heckman selection models and offering practical tools to enhance the validity of analyses in real-world applications. Although diagnostic methodologies for regression models have been extensively studied, influence diagnostics for Heckman selection models, particularly in local influence analysis, remain unexplored. The observed log-likelihood function of the SLn model includes intractable integrals, complicating the application of Cook's approach (see [7]). To address these challenges, Zhu and Lee [38] introduced local influence analysis using the Q-displacement function, aligning with the E-step of the EM algorithm; additionally, Zhu et al. [39] explored case-deletion methods for identifying influential observations.

Building on the work of Zhu and Lee [38], this study adapts their diagnostic procedures for both the SLn and SLt Heckman models and demonstrates their application with real-world data. Our framework identifies influential observations and assesses their impact on the stability and reliability of model estimates. These diagnostic tools enable practitioners to refine their models, ensuring that decisions are not unduly influenced by anomalous data points. The proposed methods are implemented in the R package HeckmanEM, providing researchers and analysts with accessible solutions for applying Heckman selection models.

The paper is structured as follows. Section 2 introduces the multivariate Student's-t distribution, its truncated version, and the extended multivariate skew-t and extended skew-normal distributions. Section 3 discusses the SLn and SLt models and the EM-type algorithm for maximum likelihood estimates. Section 4 derives diagnostic measures for global and local influence, considering four perturbation schemes. The simulation study and two real-data applications are presented in Sections 5 and 6, respectively, while Section 7 provides final remarks and conclusions. Proofs for the derived quantities are in the Appendix.

2. Background

2.1. The multivariate Student's-t distribution and its truncated version

A p-dimensional random variable $W$ following a multivariate Student's-t (MVT) distribution with location vector $μ$ , positive-definite scale-covariance matrix $Σ$ , and degrees of freedom ν is denoted by $W \sim t_{p} (μ, Σ, ν)$ , with its probability density function (pdf) denoted by $t_{p} (w ∣ μ, Σ, ν)$ . For $α = (α_{1}, \dots, α_{p})^{⊤}$ and $β = (β_{1}, \dots, β_{p})^{⊤}$ , the cumulative distribution function (cdf) is represented by:

\begin{aligned} T_{p} (α, β; μ, Σ, ν) = \int_{α}^{β} t_{p} (w ∣ μ, Σ, ν) d w . \end{aligned}

Special cases include $T_{p} (β; μ, Σ, ν)$ for $α = - \infty$ , and $T_{p} (β; ν)$ and $t_{p} (β ∣ ν)$ when $μ = 0$ and $Σ = I_{p}$ . When p = 1, the subscript p is omitted. As ν approaches infinity, $W$ converges in distribution to a multivariate normal (MVN) distribution $N_{p} (μ, Σ)$ . A key property of $W$ is its representation as a scale mixture of an MVN random vector and a positive random variable:

\begin{aligned} W = μ + U^{- 1 / 2} Z, \end{aligned}

(1)

where $Z \sim N_{p} (0, Σ)$ and is independent of $U \sim G (ν / 2, ν / 2)$ , with $G (α, β)$ being a gamma distribution with mean $α / β$ .
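Representation (1) also gives a direct way to simulate from the MVT distribution. The following Python sketch, restricted to p = 1 for brevity, draws $W = μ + U^{- 1 / 2} Z$ and compares the sample variance with $ν Σ / (ν - 2)$, the variance of $W$ for ν > 2. The function name is ours and is not part of HeckmanEM; note that Python's random.gammavariate takes a shape and a scale, so the rate ν/2 of $G (ν / 2, ν / 2)$ enters as scale 2/ν.

```python
import math
import random


def sample_t_via_mixture(mu, sigma2, nu, n, rng):
    """Draw n univariate t(mu, sigma2, nu) variates as W = mu + U**(-1/2) * Z."""
    draws = []
    for _ in range(n):
        # U ~ G(nu/2, nu/2) has rate nu/2; gammavariate expects (shape, scale).
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)
        # Z ~ N(0, sigma2), drawn independently of U.
        z = rng.gauss(0.0, math.sqrt(sigma2))
        draws.append(mu + z / math.sqrt(u))
    return draws


rng = random.Random(2025)
nu = 10.0
w = sample_t_via_mixture(mu=1.0, sigma2=1.0, nu=nu, n=100_000, rng=rng)
sample_mean = math.fsum(w) / len(w)
sample_var = math.fsum((x - sample_mean) ** 2 for x in w) / (len(w) - 1)
theoretical_var = nu / (nu - 2.0)  # Var(W) = nu * Sigma / (nu - 2) for nu > 2
print(f"mean {sample_mean:.3f}, variance {sample_var:.3f} vs {theoretical_var:.3f}")
```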

Considering the Borel set $B$ within $R^{p}$ :

\begin{aligned} B = \{(w_{1}, \dots, w_{p}) \in R^{p} : α_{1} \leq w_{1} \leq β_{1}, \dots, α_{p} \leq w_{p} \leq β_{p}\} = \{w \in R^{p} : α \leq w \leq β\}, \end{aligned}

(2)

a p-dimensional random vector $X$ following a doubly truncated multivariate Student's-t (TMVT) distribution within the truncation region $B$ , denoted by $X \sim T t_{p} (μ, Σ, ν; B)$ , has the pdf:

\begin{aligned} T t_{p} (x ∣ μ, Σ, ν; B) = \frac{t_{p} (x ∣ μ, Σ, ν)}{T_{p} (α, β; μ, Σ, ν)}, α \leq x \leq β . \end{aligned}

The cdf of $X$ within $α \leq x \leq β$ is:

\begin{aligned} T T_{p} (x ∣ μ, Σ, ν; B) = \frac{1}{T_{p} (α, β; μ, Σ, ν)} \int_{α}^{x} t_{p} (y ∣ μ, Σ, ν) d y = \frac{T_{p} (α, x; μ, Σ, ν)}{T_{p} (α, β; μ, Σ, ν)} . \end{aligned}
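Sampling from $T t_{p} (μ, Σ, ν; B)$ can be illustrated by rejection: draw from $t_{p} (μ, Σ, ν)$ via (1) and keep only the draws that fall in $B$, so the acceptance probability is $T_{p} (α, β; μ, Σ, ν)$. The Python sketch below does this for p = 1. It is a hypothetical helper for illustration only: it becomes very slow when $B$ has small probability, and it is not how the EM algorithm proceeds, since the E-step uses the first two moments of the truncated distribution.

```python
import math
import random


def sample_truncated_t(mu, sigma2, nu, lower, upper, n, rng, max_tries=10_000_000):
    """Rejection sampler for the univariate truncated t on [lower, upper].

    Proposals come from the untruncated t via the gamma-mixture representation;
    a proposal is accepted only if it lies in [lower, upper], so the accepted
    draws follow Tt(mu, sigma2, nu; [lower, upper]).
    """
    out, tries = [], 0
    while len(out) < n:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("acceptance rate too low for rejection sampling")
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)          # U ~ G(nu/2, nu/2)
        w = mu + rng.gauss(0.0, math.sqrt(sigma2)) / math.sqrt(u)
        if lower <= w <= upper:
            out.append(w)
    return out, n / tries                                   # draws and acceptance rate


rng = random.Random(7)
x, acc_rate = sample_truncated_t(0.0, 1.0, 4.0, lower=-1.0, upper=2.0, n=20_000, rng=rng)
x_mean = math.fsum(x) / len(x)
print(f"acceptance {acc_rate:.3f} (estimates T(-1, 2; 0, 1, 4)), mean {x_mean:.3f}")
```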

In the next subsection, we will introduce the multivariate extended skew-t and skew-normal distributions, which are pivotal for improving computational efficiency in moment calculations within the EM algorithm for Heckman selection models.

2.2. The multivariate extended skew-t distribution

The multivariate extended skew-t (EST) distribution for a p-dimensional random vector $X$ with location vector $μ$ , positive-definite dispersion matrix $Σ$ , skewness vector $λ$ , $τ \in R$ , and degrees of freedom ν is denoted as $X \sim ES T_{p} (μ, Σ, λ, τ, ν)$ . The pdf is:

\begin{aligned} EST_{p} (x ∣ μ, Σ, λ, τ, ν) & = \frac{t_{p} (x ∣ μ, Σ, ν)}{T (τ / \sqrt{1 + λ^{⊤} λ}; ν)} \, T \left( \sqrt{\frac{ν + p}{ν + δ (x)}} \, (τ + λ^{⊤} Σ^{- 1 / 2} (x - μ)); ν + p \right) . \end{aligned}

(3)

In (3), $δ (x) = (x - μ)^{⊤} Σ^{- 1} (x - μ)$ denotes the squared Mahalanobis distance. From Valeriano et al. [35], the mean vector and variance-covariance matrix of $X$ are given by:

\begin{aligned} E [X] = μ + η_{1} Σ^{1 / 2} Δ, Cov [X] = Σ^{1 / 2} [γ (I_{p} - Δ Δ^{⊤}) + (η_{2} - η_{1}^{2}) Δ Δ^{⊤}] Σ^{1 / 2}, \end{aligned}

(4)

where $Δ = λ / (1 + λ^{⊤} λ)^{1 / 2}$ , $γ = \frac{ν + η_{2}}{ν - 1}$ , $η_{1} = \frac{ν}{ν - 1} (1 + \frac{{\tilde{τ}}^{2}}{ν}) \frac{t (\tilde{τ} ∣ ν)}{T (\tilde{τ}; ν)}, ν > 1,$ and $η_{2} = \frac{ν (ν - 1)}{ν - 2} \frac{T (\sqrt{(ν - 2) / ν} \tilde{τ}; ν - 2)}{T (\tilde{τ}; ν)} - ν, ν > 2,$ with $\tilde{τ} = τ / (1 + λ^{⊤} λ)^{1 / 2}$ . When $τ = 0$ , the distribution simplifies to the skew-t distribution as described by Lachos et al. [18]. In the limits $τ \to + \infty$ and $ν \to + \infty$ , it converges to the Student's-t and multivariate extended skew-normal (ESN) distributions, respectively. The ESN pdf is given by:

\begin{aligned} ES N_{p} (x ∣ μ, Σ, λ, τ) = \frac{ϕ_{p} (x ∣ μ, Σ) Φ (τ + λ^{⊤} Σ^{- 1 / 2} (x - μ))}{Φ (τ / \sqrt{1 + λ^{⊤} λ})}, \end{aligned}

(5)

denoted as $X \sim ES N_{p} (μ, Σ, λ, τ)$ . From Galarza et al. [11], the mean vector and variance-covariance matrix of an ESN random vector are:

\begin{aligned} E [X] = μ + η Σ^{1 / 2} λ, \quad Cov [X] = Σ - η Σ^{1 / 2} λ {(η λ + \frac{τ}{1 + λ^{⊤} λ} λ)}^{⊤} Σ^{1 / 2}, \end{aligned}

(6)

where $η = ϕ (τ ∣ 0, 1 + λ^{⊤} λ) / Φ (τ / \sqrt{1 + λ^{⊤} λ})$ . Here, $ϕ_{p} (\cdot ∣ μ, Σ)$ and $Φ_{p} (\cdot ∣ μ, Σ)$ represent the pdf and cdf of $N_{p} (μ, Σ)$ , respectively, with the subscript p omitted for p = 1. Refer to Galarza et al. [10] and Galarza et al. [11] for more details on the EST and ESN distribution properties.
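The moment expressions (4) and (6) can be checked numerically in the univariate case (p = 1, $Σ = σ^{2}$). The Python sketch below is illustrative: since the standard library has no Student's-t cdf, $T (\cdot; ν)$ is approximated by composite Simpson quadrature (a library routine such as scipy.stats.t.cdf would normally be used), and all function names are ours. Two limiting cases serve as checks: τ = 0 must reproduce the skew-t mean, and a very large ν must reproduce the univariate ESN moments, whose variance is $σ^{2} \{1 - Δ^{2} ζ (\tilde{τ}) [ζ (\tilde{τ}) + \tilde{τ}]\}$ with $Δ = λ / \sqrt{1 + λ^{2}}$ and $ζ = ϕ / Φ$.

```python
import math


def t_pdf(x, nu):
    """Standard Student's-t density t(x | nu)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(x, nu, n_intervals=2000):
    """T(x; nu) = 1/2 +/- integral of t(.|nu) over [0, |x|], by composite Simpson."""
    if x == 0:
        return 0.5
    a = abs(x)
    h = a / n_intervals                      # n_intervals must be even
    s = t_pdf(0.0, nu) + t_pdf(a, nu)
    for k in range(1, n_intervals):
        s += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    half = s * h / 3
    return 0.5 + half if x > 0 else 0.5 - half


def est_moments(mu, sigma2, lam, tau, nu):
    """Mean and variance of the univariate EST(mu, sigma2, lam, tau, nu), eq. (4); nu > 2."""
    Delta = lam / math.sqrt(1 + lam * lam)
    tau_t = tau / math.sqrt(1 + lam * lam)          # tilde tau
    T0 = t_cdf(tau_t, nu)
    eta1 = (nu + tau_t ** 2) / (nu - 1) * t_pdf(tau_t, nu) / T0
    eta2 = nu * (nu - 1) / (nu - 2) * t_cdf(math.sqrt((nu - 2) / nu) * tau_t, nu - 2) / T0 - nu
    gam = (nu + eta2) / (nu - 1)
    mean = mu + eta1 * math.sqrt(sigma2) * Delta
    var = sigma2 * (gam * (1 - Delta ** 2) + (eta2 - eta1 ** 2) * Delta ** 2)
    return mean, var


def esn_moments(mu, sigma2, lam, tau):
    """Mean and variance of the univariate ESN(mu, sigma2, lam, tau)."""
    Delta = lam / math.sqrt(1 + lam * lam)
    tau_t = tau / math.sqrt(1 + lam * lam)
    phi = math.exp(-0.5 * tau_t ** 2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * math.erfc(-tau_t / math.sqrt(2))
    zeta = phi / Phi                                 # inverse Mills ratio at tilde tau
    mean = mu + math.sqrt(sigma2) * Delta * zeta
    var = sigma2 * (1 - Delta ** 2 * zeta * (zeta + tau_t))
    return mean, var


# tau = 0: skew-t mean is mu + sigma * Delta * sqrt(nu/pi) * Gamma((nu-1)/2) / Gamma(nu/2).
m_st, _ = est_moments(0.0, 1.0, 1.0, 0.0, 5.0)
# Very large nu: EST moments approach the ESN moments.
m_big, v_big = est_moments(0.5, 2.0, 1.5, -0.7, 1e5)
m_esn, v_esn = esn_moments(0.5, 2.0, 1.5, -0.7)
print(m_st, (m_big, v_big), (m_esn, v_esn))
```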

The moments in Equation (4) are essential for the E-step of the EM algorithm for the Heckman selection-t model. Although asymmetric, the EST and ESN distributions arise naturally in selection models; they belong to a broader class known as the multivariate selection elliptical family (see [1]).

3. The Heckman selection model

Sample selection bias and missing data often pose significant challenges in research. The SL model tackles these issues using two equations: a linear equation for the dependent variable and a Probit equation for the sample selection process. The linear equation describes the relationship between the independent and dependent variables, while the Probit equation estimates the probability of a sample being selected. The outcome equation is:

\begin{aligned} Y_{1 i} = x_{i}^{⊤} β + ϵ_{1 i}, \end{aligned}

(7)

and the sample selection mechanism is described by the latent linear equation:

\begin{aligned} Y_{2 i} = w_{i}^{⊤} γ + ϵ_{2 i}, \end{aligned}

(8)

for $i \in \{1, \dots, n\}$ . Here, $β \in R^{p}$ and $γ \in R^{q}$ are unknown parameters, and $x_{i}^{⊤} = (x_{i 1}, \dots, x_{ip})$ and $w_{i}^{⊤} = (w_{i 1}, \dots, w_{iq})$ are known characteristics. The covariates in $x_{i}$ and $w_{i}$ can overlap, and the exclusion restriction is met when at least one element of $w_{i}$ is not in $x_{i}$ . The sample selection indicator is $C_{i} = I (Y_{2 i} > 0)$ . We observe the outcome $V_{1 i}$ only if $C_{i} = 1$ ; that is, $Y_{1 i} = V_{1 i}$ if $C_{i} = 1$ , and $Y_{1 i} = NA$ (missing data) if $C_{i} = 0$ . Therefore, the observed data for the ith subject are $(V_{i}, C_{i})$ , where $V_{i}$ represents the vector of censored readings and $C_{i}$ is the censoring indicator.

3.1. The classical Heckman selection model

Heckman [15] assumes that the error terms are independently distributed according to a bivariate normal distribution:

\begin{aligned} (\begin{matrix} ϵ_{1 i} \\ ϵ_{2 i} \end{matrix}) \sim N_{2} (0, Σ), Σ = (\begin{array}{cc} σ^{2} & ρσ \\ ρσ & 1 \end{array}), \end{aligned}

(9)

where the second diagonal element equals one due to the probit link associated with the latent variable $Y_{2 i}$ , ensuring model identifiability. The model defined in (7)–(9) is referred to as the Heckman selection (SLn) model, with parameter vector $θ = (β^{⊤}, γ^{⊤}, σ^{2}, ρ)^{⊤}$ . When the selection effect is absent ( $ρ = 0$ ), it indicates that the unobserved outcomes are missing at random.

Using Bayes' rule, the conditional pdf of an observation $Y_{1 i} = V_{1 i} ∣ (C_{i} = 1)$ is (see, [24]):

\begin{aligned} f (Y_{1 i} ∣ C_{i} = 1, x_{i}, w_{i}; θ) = ϕ (V_{1 i} ∣ x_{i}^{⊤} β, σ^{2}) Φ (\frac{w_{i}^{⊤} γ + \frac{ρ}{σ} (V_{1 i} - x_{i}^{⊤} β)}{\sqrt{1 - ρ^{2}}}) / Φ (w_{i}^{⊤} γ), \end{aligned}

(10)

which belongs to the ESN family, as discussed in Subsection 2.2, i.e.

\begin{aligned} Y_{1 i} = V_{1 i} ∣ (C_{i} = 1) \sim ESN (μ = x_{i}^{⊤} β, Σ = σ^{2}, λ = \frac{ρ}{\sqrt{1 - ρ^{2}}}, τ = \frac{w_{i}^{⊤} γ}{\sqrt{1 - ρ^{2}}}) . \end{aligned}

From (6), we have $η = \sqrt{1 - ρ^{2}} ϕ (w_{i}^{⊤} γ) / Φ (w_{i}^{⊤} γ)$ and $Σ^{1 / 2} λ = σρ / \sqrt{1 - ρ^{2}}$ , thus the mean equation for the observed outcomes is:

\begin{aligned} E [Y_{1 i} ∣ C_{i} = 1, x_{i}, w_{i}; θ] = x_{i}^{⊤} β + ρσ λ^{N} (w_{i}^{⊤} γ), \end{aligned}

(11)

where $λ^{N} (a) = ϕ (a) / Φ (a)$ is the inverse Mills ratio. The SLn problem can be viewed as a case of model misspecification: the mean of the observed outcomes combines a linear component, $x_{i}^{⊤} β$ , with a nonlinear correction term, $ρσ λ^{N} (w_{i}^{⊤} γ)$ , which ordinary least squares on the selected sample omits. Heckman [15] proposed a two-step procedure to address this, which is less efficient than ML estimation but does not require full joint normality of the error terms. In the two-step procedure, the standard probit model $P (C_{i} = c_{i} ∣ w_{i}; γ) = (Φ (w_{i}^{⊤} γ))^{c_{i}} (1 - Φ (w_{i}^{⊤} γ))^{1 - c_{i}}$ provides the estimate $\hat{γ}$ ; then $λ^{N} (w_{i}^{⊤} \hat{γ})$ is included as an additional covariate in (11), and its least squares coefficient, computed from the selected observations, estimates $ρσ$ . This can be implemented in R using the sampleSelection package [16]. Alternatively, the ML estimate of $θ$ can be computed by maximizing the likelihood function given the observed data $(V, C)$ :

\begin{aligned} L (θ ∣ V, C) = \prod_{i = 1}^{n} \left[ ϕ (V_{1 i} ∣ x_{i}^{⊤} β, σ^{2}) \, Φ \left( \frac{w_{i}^{⊤} γ + \frac{ρ}{σ} (V_{1 i} - x_{i}^{⊤} β)}{\sqrt{1 - ρ^{2}}} \right) \right]^{c_{i}} \left[ Φ (- w_{i}^{⊤} γ) \right]^{1 - c_{i}}, \end{aligned}

(12)

or via the ECM algorithm discussed in Zhao et al. [36] and Lachos et al. [19], implemented in the R package HeckmanEM [19].
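The two-step procedure described above can be sketched in Python with NumPy. This is an illustrative re-implementation under our own naming, not the code of the sampleSelection or HeckmanEM packages: the probit step is fitted by Fisher scoring, and the second step regresses the selected outcomes on $x_{i}$ and the estimated inverse Mills ratio $λ^{N} (w_{i}^{⊤} \hat{γ})$ , whose coefficient estimates $ρσ$ . The data are simulated from (7)-(9) with an exclusion restriction (the covariate z enters $w_{i}$ only), and all parameter values are arbitrary.

```python
import math

import numpy as np

_erfc = np.vectorize(math.erfc, otypes=[float])


def norm_cdf(z):
    return 0.5 * _erfc(-np.asarray(z, dtype=float) / math.sqrt(2.0))


def norm_pdf(z):
    z = np.asarray(z, dtype=float)
    return np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_fisher_scoring(W, c, n_iter=100, tol=1e-10):
    """ML estimate of gamma in P(C_i = 1) = Phi(w_i' gamma) via Fisher scoring."""
    gamma = np.zeros(W.shape[1])
    for _ in range(n_iter):
        eta = W @ gamma
        p = np.clip(norm_cdf(eta), 1e-12, 1 - 1e-12)
        d = norm_pdf(eta)
        score = W.T @ ((c - p) * d / (p * (1 - p)))
        info = (W * (d * d / (p * (1 - p)))[:, None]).T @ W   # expected information
        step = np.linalg.solve(info, score)
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma


def heckman_two_step(X, W, v, c):
    """Step 1: probit for gamma. Step 2: OLS of selected V on [x_i, lambda^N(w_i' gamma_hat)]."""
    gamma_hat = probit_fisher_scoring(W, c)
    sel = c == 1
    eta = W[sel] @ gamma_hat
    imr = norm_pdf(eta) / norm_cdf(eta)                  # inverse Mills ratio
    Z = np.column_stack([X[sel], imr])
    coef, *_ = np.linalg.lstsq(Z, v[sel], rcond=None)
    return coef[:-1], gamma_hat, coef[-1]                # beta_hat, gamma_hat, (rho*sigma)_hat


# Simulated SLn data; z enters W only (exclusion restriction).
rng = np.random.default_rng(1)
n = 5000
x1, z = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
W = np.column_stack([np.ones(n), x1, z])
beta, gamma, sigma, rho = np.array([1.0, 2.0]), np.array([0.3, 0.5, 1.0]), 1.5, 0.6
Sigma = np.array([[sigma ** 2, rho * sigma], [rho * sigma, 1.0]])
eps = rng.multivariate_normal(np.zeros(2), Sigma, size=n)
c = (W @ gamma + eps[:, 1] > 0).astype(float)          # C_i = I(Y_2i > 0)
v = np.where(c == 1, X @ beta + eps[:, 0], np.nan)     # V_1i observed only if C_i = 1

beta_hat, gamma_hat, rho_sigma_hat = heckman_two_step(X, W, v, c)
print(beta_hat, gamma_hat, rho_sigma_hat, rho * sigma)
```

Only point estimates are returned; the usual OLS standard errors from the second step ignore the estimation of $γ$ and are not valid without correction.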

3.2. The Heckman selection-t model

Marchenko and Genton [24] introduced the selection-t (SLt) model to handle heavy-tailed distributions. This model assumes that the error terms in Equations (7)-(8) follow a bivariate Student's-t distribution with unknown degrees of freedom, ν:

\begin{aligned} (\begin{matrix} ϵ_{1 i} \\ ϵ_{2 i} \end{matrix}) \sim t_{2} (0, Σ, ν), Σ = (\begin{array}{cc} σ^{2} & ρσ \\ ρσ & 1 \end{array}) . \end{aligned}

(13)

The model, defined by (7)–(8) and (13), is known as the Heckman selection-t (SLt) model, with parameter vector $θ = (β^{⊤}, γ^{⊤}, σ^{2}, ρ, ν)^{⊤}$ . According to Miao et al. [28] (Theorem 5), all SLt model parameters are identifiable from the observed data $P (Y_{1 i}, C_{i} = 1), i = 1, \dots, n$ .

The conditional pdf of an observed outcome $Y_{1 i} = V_{1 i} ∣ (C_{i} = 1)$ (see, [24]) is given by:

\begin{aligned} f (Y_{1 i} ∣ C_{i} = 1, x_{i}, w_{i}; θ) & = t (V_{1 i} ∣ x_{i}^{⊤} β, σ^{2}, ν) \, T \left( \frac{w_{i}^{⊤} γ + \frac{ρ}{σ} (V_{1 i} - x_{i}^{⊤} β)}{\sqrt{1 - ρ^{2}}} \sqrt{\frac{ν + 1}{ν + δ (V_{1 i})}}; ν + 1 \right) / T (w_{i}^{⊤} γ; ν), \end{aligned}

(14)

where $δ (V_{1 i}) = (V_{1 i} - x_{i}^{⊤} β)^{2} / σ^{2}$ ; that is,

\begin{aligned} Y_{1 i} = V_{1 i} ∣ (C_{i} = 1) \sim EST (μ = x_{i}^{⊤} β, Σ = σ^{2}, λ = \frac{ρ}{\sqrt{1 - ρ^{2}}}, τ = \frac{w_{i}^{⊤} γ}{\sqrt{1 - ρ^{2}}}, ν) . \end{aligned}

Using Equation (4), the conditional expectation for the observed outcome simplifies to

\begin{aligned} E [Y_{1 i} ∣ C_{i} = 1, x_{i}, w_{i}] = x_{i}^{⊤} β + σρ λ_{ν} (w_{i}^{⊤} γ), ν > 1, \end{aligned}

(15)

by identifying the terms $\tilde{τ} = w_{i}^{⊤} γ$ , $η_{1} = \frac{ν + (w_{i}^{⊤} γ)^{2}}{ν - 1} \cdot \frac{t (w_{i}^{⊤} γ ∣ ν)}{T (w_{i}^{⊤} γ; ν)}$ , and $Σ^{1 / 2} Δ = σρ$ , and defining $λ_{ν} (a) = \frac{ν + a^{2}}{ν - 1} \cdot \frac{t (a ∣ ν)}{T (a; ν)}$ , which generalizes the inverse Mills ratio. As in the SLn model, ordinary least squares regression on the selected sample produces inconsistent estimates when $ρ \neq 0$ . Marchenko and Genton [24, Figure 1] found that, for negative values of the selection linear predictor $w_{i}^{⊤} γ$ , the SLn model typically underestimates the conditional expectation (15) when the degrees of freedom ν are moderate; this bias diminishes as ν increases.
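The function $λ_{ν}$ can be evaluated directly. The Python sketch below computes $λ_{ν} (a)$ and the conditional mean (15), approximating $T (\cdot; ν)$ by Simpson quadrature to stay self-contained (the function names are ours). Because $(ν + a^{2}) / (ν - 1) \to 1$ , $t (a ∣ ν) \to ϕ (a)$ and $T (a; ν) \to Φ (a)$ as $ν \to \infty$ , $λ_{ν} (a)$ approaches the inverse Mills ratio $λ^{N} (a)$ of (11), which the printed values illustrate.

```python
import math


def t_pdf(x, nu):
    """Standard Student's-t density t(x | nu)."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(x, nu, n_intervals=2000):
    """Simpson approximation of T(x; nu) using symmetry about 0 (n_intervals even)."""
    if x == 0:
        return 0.5
    a, h = abs(x), abs(x) / n_intervals
    s = t_pdf(0.0, nu) + t_pdf(a, nu)
    s += sum((4 if k % 2 else 2) * t_pdf(k * h, nu) for k in range(1, n_intervals))
    half = s * h / 3
    return 0.5 + half if x > 0 else 0.5 - half


def lambda_nu(a, nu):
    """lambda_nu(a) = (nu + a^2) / (nu - 1) * t(a | nu) / T(a; nu), for nu > 1."""
    return (nu + a * a) / (nu - 1) * t_pdf(a, nu) / t_cdf(a, nu)


def inverse_mills(a):
    """lambda^N(a) = phi(a) / Phi(a)."""
    return math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi) / (0.5 * math.erfc(-a / math.sqrt(2)))


def slt_conditional_mean(xb, wg, sigma, rho, nu):
    """E[Y_1i | C_i = 1] = x_i' beta + sigma * rho * lambda_nu(w_i' gamma), eq. (15)."""
    return xb + sigma * rho * lambda_nu(wg, nu)


for nu in (3.0, 10.0, 1e6):
    print(nu, lambda_nu(0.5, nu), inverse_mills(0.5))
```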

The likelihood function of $θ$ for the SLt model, given the observed sample $(V, C)$ , is expressed as:

\begin{aligned} L (θ ∣ V, C) & = \prod_{i = 1}^{n} \left[ t (V_{1 i} ∣ x_{i}^{⊤} β, σ^{2}, ν) \, T \left( \frac{w_{i}^{⊤} γ + \frac{ρ}{σ} (V_{1 i} - x_{i}^{⊤} β)}{\sqrt{1 - ρ^{2}}} \sqrt{\frac{ν + 1}{ν + δ (V_{1 i})}}; ν + 1 \right) \right]^{c_{i}} \\ & \quad \times \left[ T (- w_{i}^{⊤} γ; ν) \right]^{1 - c_{i}} . \end{aligned}

(16)

The likelihood (16) has the same structure as (12), with the probit selection model replaced by a binary regression model with a Student's-t link, $P (C_{i} = c_{i} ∣ w_{i}; γ) = (T (w_{i}^{⊤} γ; ν))^{c_{i}} (1 - T (w_{i}^{⊤} γ; ν))^{1 - c_{i}}$ . There are no closed-form expressions for the ML estimates of the parameters in (16); thus, the ML estimates are obtained numerically or via the ECM algorithm implemented in the R package HeckmanEM. We briefly outline the EM-type algorithm of Lachos et al. [19], in which all parameters are updated (M-step) by treating both the outcome ( $Y_{1 i}$ ) and the sample selection variable ( $Y_{2 i}$ ) as missing data [26,34]. Further technical details are given in Lachos et al. [19].

Disregarding censoring momentarily, consider observations for n independent individuals:

\begin{aligned} Y_{i} \sim t_{2} (μ_{i}, Σ, ν), \quad i \in \{1, \dots, n\} . \end{aligned}

Here, $Y_{i} = (Y_{1 i}, Y_{2 i})^{⊤}$ is the bivariate response vector for sample unit i, with responses independent across units, and

\begin{aligned} μ_{i} = X_{ic} β_{c}, X_{ic} = (\begin{array}{cc} x_{i}^{⊤} & 0 \\ 0 & w_{i}^{⊤} \end{array}), β_{c} = (\begin{matrix} β \\ γ \end{matrix}), \end{aligned}

and the dispersion matrix $Σ$ depends on an unknown parameter vector $(σ, ρ)$ , as defined in (9). Using representation (1) and temporarily disregarding censoring, the distribution of $Y_{i}$ can be hierarchically expressed as follows:

\begin{aligned} Y_{i} ∣ U_{i} = u_{i} \sim N_{2} (μ_{i}, u_{i}^{- 1} Σ), U_{i} \sim G (ν / 2, ν / 2) . \end{aligned}

(17)
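Representation (17) is what makes the E-step tractable, because $U_{i}$ given the data has a simple distribution. The Python sketch below, written for the simplified uncensored case considered here (both components of $Y_{i}$ observed), draws $Y_{i}$ hierarchically and evaluates $E (U_{i} ∣ y_{i}) = (ν + 2) / (ν + δ_{i})$ , where $δ_{i} = (y_{i} - μ_{i})^{⊤} Σ^{- 1} (y_{i} - μ_{i})$ ; this holds because $U_{i} ∣ y_{i} \sim G ((ν + 2) / 2, (ν + δ_{i}) / 2)$ . When $Y_{2 i}$ is censored, the E-step instead requires moments of truncated distributions, so this is not the HeckmanEM E-step, and the function names are hypothetical.

```python
import math
import random


def mahalanobis2(y, mu, sigma2, rho):
    """delta = (y - mu)' Sigma^{-1} (y - mu) for Sigma = [[sigma2, rho*sigma], [rho*sigma, 1]]."""
    s = math.sqrt(sigma2)
    a, b = sigma2, rho * s
    det = a - b * b                                  # |Sigma| = sigma2 * (1 - rho^2)
    r1, r2 = y[0] - mu[0], y[1] - mu[1]
    return (r1 * r1 - 2 * b * r1 * r2 + a * r2 * r2) / det


def draw_bivariate_t(mu, sigma2, rho, nu, rng):
    """One draw from t_2(mu, Sigma, nu) via (17): U ~ G(nu/2, nu/2), Y | U ~ N_2(mu, Sigma / U)."""
    u = rng.gammavariate(nu / 2.0, 2.0 / nu)         # shape nu/2, scale 2/nu
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    s = math.sqrt(sigma2)
    # Cholesky factor of Sigma: [[s, 0], [rho, sqrt(1 - rho^2)]].
    e1, e2 = s * z1, rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    return (mu[0] + e1 / math.sqrt(u), mu[1] + e2 / math.sqrt(u)), u


def posterior_mean_u(y, mu, sigma2, rho, nu):
    """E[U | Y = y] = (nu + 2) / (nu + delta) when both components of y are observed."""
    return (nu + 2.0) / (nu + mahalanobis2(y, mu, sigma2, rho))


rng = random.Random(3)
mu = (0.0, 0.2)
for _ in range(3):
    y, u = draw_bivariate_t(mu, 2.0, 0.5, 4.0, rng)
    print(f"y = ({y[0]:.2f}, {y[1]:.2f}), u = {u:.2f}, E[U | y] = {posterior_mean_u(y, mu, 2.0, 0.5, 4.0):.2f}")
print(posterior_mean_u((8.0, -5.0), mu, 2.0, 0.5, 4.0))  # an outlying point gets a small weight
```

Observations far from $μ_{i}$ (large $δ_{i}$ ) receive small weights, which is the mechanism behind the resistance of the SLt model to outliers.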

Consider $y = (y_{1}^{⊤}, \dots, y_{n}^{⊤})^{⊤}$ , $V = (V_{1}, \dots, V_{n})$ , $C = (C_{1}, \dots, C_{n})$ , $u = (U_{1}, \dots, U_{n})$ , where we observe $(V_{i}, C_{i})$ for the ith subject. In the estimation process, $y$ and $u$ are considered as hypothetical missing data and augmented with the observed dataset to form $y_{c} = (V, C, y, u)^{⊤}$ . Therefore, the EM-type algorithm is applied to the complete-data log-likelihood function:

\begin{aligned} ℓ_{c} (θ ∣ y_{c}) = \sum_{i = 1}^{n} ℓ_{ic} (θ), \end{aligned}

where

\begin{aligned} ℓ_{ic} (θ) = - \frac{1}{2} \left\{ \ln | Σ | + u_{i} (y_{i} - μ_{i})^{⊤} Σ^{- 1} (y_{i} - μ_{i}) \right\} + \ln h (u_{i} ∣ ν) + c . \end{aligned}

Here, c represents a constant that does not depend on $θ$ , and $h (u_{i} ∣ ν)$ denotes the pdf of the $G (ν / 2, ν / 2)$ distribution. The EM algorithm for the SLt model can be outlined in the following two steps:

E-step:
with the current estimate $θ = {\hat{θ}}^{(k)}$ at the kth stage of the algorithm, the E-step computes the conditional expectation of the complete-data log-likelihood function

\begin{aligned} Q (θ ∣ {\hat{θ}}^{(k)}) & = E [ℓ_{c} (θ ∣ y_{c}) ∣ V, C, {\hat{θ}}^{(k)}] = \sum_{i = 1}^{n} E [ℓ_{ic} (θ ∣ y_{c}) ∣ V_{i}, C_{i}, {\hat{θ}}^{(k)}] \\ & = \sum_{i = 1}^{n} Q_{i} (θ ∣ {\hat{θ}}^{(k)}), \end{aligned}

(18)

where

\begin{aligned} Q_{i} (θ ∣ {\hat{θ}}^{(k)}) & = Q_{i} (β^{⊤}, γ^{⊤}, σ^{2}, ρ, ν ∣ {\hat{θ}}^{(k)}) \\ & = - \frac{1}{2} \ln | Σ | - \frac{1}{2} tr \left[ \left( {\hat{u y_{i}^{2}}}^{(k)} - {\hat{u y_{i}}}^{(k)} μ_{i}^{⊤} - μ_{i} ({\hat{u y_{i}}}^{(k)})^{⊤} + {\hat{u_{i}}}^{(k)} μ_{i} μ_{i}^{⊤} \right) Σ^{- 1} \right], \end{aligned}

(19)

with ${\hat{u_{i}}}^{(k)} = E (U_{i} ∣ V_{i}, C_{i}, {\hat{θ}}^{(k)})$ , ${\hat{u y_{i}}}^{(k)} = E (U_{i} Y_{i} ∣ V_{i}, C_{i}, {\hat{θ}}^{(k)})$ , and ${\hat{u y_{i}^{2}}}^{(k)} = E (U_{i} Y_{i} Y_{i}^{⊤} ∣ V_{i}, C_{i}, {\hat{θ}}^{(k)})$ . Because the expectation ${\hat{κ_{i}}}^{(k)} = E \{\ln h (U_{i} ∣ ν) ∣ V_{i}, C_{i}, {\hat{θ}}^{(k)}\}$ is difficult to compute analytically, Lachos et al. [19] update ν through a constrained ML step (CML-step) instead, in which the actual (observed-data) log-likelihood function, rather than the Q-function, is maximized with respect to ν given the current estimates of the other parameters. The parameter transformations $ψ = σ^{2} (1 - ρ^{2})$ and $ρ_{*} = ρσ$ are used to derive closed-form expressions in the M-step.
M-step:
the conditional maximization of $Q (θ ∣ {\hat{θ}}^{(k)})$ is performed with respect to $β_{c}$ , $σ^{2}$ , and $ρ$ , yielding the updated estimates ${\hat{β_{c}}}^{(k + 1)}$ , ${\hat{σ}}^{2 (k + 1)}$ , and ${\hat{ρ}}^{(k + 1)}$ . The closed-form expressions for these estimates, along with a suitable approach for computing the standard errors of the ML estimates for the SLt model and a residual analysis, are detailed in Lachos et al. [19]. These methodologies are implemented in the R package HeckmanEM.
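Setting the $β_{c}$ -component of the gradient of (18) to zero gives, for fixed $Σ$ , the update $β_{c} = (\sum_{i} \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} X_{ic})^{- 1} \sum_{i} X_{ic}^{⊤} Σ^{- 1} \hat{u y_{i}}$ . The NumPy sketch below evaluates $Q_{i}$ in (19) and this conditional update. It is our own derivation from (19) for illustration, not a transcription of the closed forms in Lachos et al. [19], which use the $(ψ, ρ_{*})$ parameterization; the conditional moments are synthetic placeholders rather than E-step output, and all function names are hypothetical.

```python
import numpy as np


def design_block(x_i, w_i):
    """X_ic = [[x_i', 0], [0, w_i']], a 2 x (p + q) matrix."""
    p, q = len(x_i), len(w_i)
    Xic = np.zeros((2, p + q))
    Xic[0, :p] = x_i
    Xic[1, p:] = w_i
    return Xic


def q_i(beta_c, Sigma, Xic, u_hat, uy_hat, uy2_hat):
    """Q_i in (19) for given conditional moments (u_hat, uy_hat, uy2_hat)."""
    mu = Xic @ beta_c
    Gamma = uy2_hat - np.outer(uy_hat, mu) - np.outer(mu, uy_hat) + u_hat * np.outer(mu, mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * logdet - 0.5 * np.trace(Gamma @ np.linalg.inv(Sigma))


def update_beta_c(Sigma, Xics, u_hats, uy_hats):
    """Solve sum_i X_ic' S^-1 uy_i = (sum_i u_i X_ic' S^-1 X_ic) beta_c for fixed Sigma."""
    Sinv = np.linalg.inv(Sigma)
    lhs = sum(u * Xic.T @ Sinv @ Xic for Xic, u in zip(Xics, u_hats))
    rhs = sum(Xic.T @ Sinv @ uy for Xic, uy in zip(Xics, uy_hats))
    return np.linalg.solve(lhs, rhs)


# Synthetic placeholders for the E-step moments (not output of a real E-step).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -0.5, 0.2, 0.8])        # (beta', gamma')' with p = q = 2
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
Xics, u_hats, uy_hats, uy2_hats = [], [], [], []
for _ in range(50):
    Xic = design_block(rng.normal(size=2), rng.normal(size=2))
    u = rng.gamma(2.0, 0.5)
    y = Xic @ beta_true + rng.normal(size=2)
    Xics.append(Xic)
    u_hats.append(u)
    uy_hats.append(u * y)
    uy2_hats.append(u * np.outer(y, y))

beta_new = update_beta_c(Sigma, Xics, u_hats, uy_hats)
print(beta_new)
```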

4. Influence diagnostics

Influence diagnostics assess how sensitive a model's parameter estimates are to changes in the data or in the model's underlying assumptions. Two main approaches are used to identify influential observations. The first, the case-deletion technique, assesses the impact of excluding a particular observation by comparing the parameter estimates obtained with and without it. This requires refitting the model without the observation and evaluating the differences with metrics such as the likelihood distance or Cook's distance [6]. The second, the local influence method, examines how small perturbations of an observation or of the model, rather than its complete removal, affect the results (Cook [7]). Building on the work of Zhu and Lee [38], we introduce case-deletion and local influence measures for the Heckman selection-t model based on the Q-function obtained in the E-step of the EM algorithm; see also Zhu et al. [39]. We first discuss the case-deletion measures, then the local influence measures, and finally the perturbation schemes.

4.1. Case-deletion measures

The case-deletion approach examines the impact of removing the ith observation from the dataset. In this context, any quantity with the subscript ‘ $[i]$ ’ denotes the original quantity with the ith observation excluded. For example, $y_{c [i]} = (V_{[i]}, C_{[i]}, y_{[i]}, u_{[i]})^{⊤}$ refers to the complete data with the ith observation removed. Let ${\hat{θ}}_{[i]} = ({\hat{β_{c}}}_{[i]}^{⊤}, {\hat{σ}}_{[i]}^{2}, {\hat{ρ}}_{[i]})^{⊤}$ be the maximizer of the function $Q_{[i]} (θ ∣ \hat{θ}) = E [ℓ_{c} (θ ∣ y_{c [i]}) ∣ V, C, \hat{θ}]$ , where $\hat{θ}$ denotes the ML estimate of $θ$ . The effect of the ith observation on $\hat{θ}$ is assessed by comparing ${\hat{θ}}_{[i]}$ with $\hat{θ}$ : if ${\hat{θ}}_{[i]}$ differs markedly from $\hat{θ}$ , the ith observation may be considered influential. Since computing ${\hat{θ}}_{[i]}$ for every observation requires n refits and can be computationally intensive, the following one-step approximation ${\tilde{θ}}_{[i]}$ is used to reduce the computational load (see [8,39]):

\begin{aligned} {\tilde{θ}}_{[i]} = \hat{θ} + \{- \ddot{Q} (\hat{θ} ∣ \hat{θ})\}^{- 1} {\dot{Q}}_{[i]} (\hat{θ} ∣ \hat{θ}), \quad i \in \{1, \dots, n\}, \end{aligned}

(20)

where

\begin{aligned} {\dot{Q}}_{[i]} (\hat{θ} ∣ \hat{θ}) = {\frac{\partial Q_{[i]} (θ ∣ \hat{θ})}{\partial θ} \Big|}_{θ = \hat{θ}} \quad \text{and} \quad \ddot{Q} (\hat{θ} ∣ \hat{θ}) = {\frac{\partial^{2} Q (θ ∣ \hat{θ})}{\partial θ \partial θ^{⊤}} \Big|}_{θ = \hat{θ}}, \end{aligned}

(21)

represent the gradient vector and the Hessian matrix evaluated at $\hat{θ}$ , respectively. The Hessian matrix plays a central role in the method developed by Zhu et al. [39] (see also [37]), both for computing case-deletion diagnostic measures and for assessing the local influence of a given perturbation scheme. These derivatives follow directly from Equations (18) and (19). The elements of the gradient vector ${\dot{Q}}_{[i]} (\hat{θ} ∣ \hat{θ}) = ({\dot{Q}}_{[i] β_{c}} (\hat{θ} ∣ \hat{θ}), {\dot{Q}}_{[i] σ^{2}} (\hat{θ} ∣ \hat{θ}), {\dot{Q}}_{[i] ρ} (\hat{θ} ∣ \hat{θ}))$ are given by:

\begin{aligned} {\dot{Q}}_{[i] β_{c}} (\hat{θ} ∣ \hat{θ}) & = {\frac{\partial Q_{[i]} (θ ∣ \hat{θ})}{\partial β_{c}} |}_{θ = \hat{θ}} = \sum_{j \neq i} X_{jc}^{⊤} Σ^{- 1} \hat{u y_{j}} - \sum_{j \neq i} \hat{u_{j}} X_{jc}^{⊤} Σ^{- 1} X_{jc} β_{c}, \\ {\dot{Q}}_{[i] σ^{2}} (\hat{θ} ∣ \hat{θ}) & = {\frac{\partial Q_{[i]} (θ ∣ \hat{θ})}{\partial σ^{2}} |}_{θ = \hat{θ}} = - \frac{1}{2} \sum_{j \neq i} tr (Σ^{- 1} B) + \frac{1}{2} \sum_{j \neq i} tr ({\hat{Γ}}_{j} Σ^{- 1} B Σ^{- 1}), \\ {\dot{Q}}_{[i] ρ} (\hat{θ} ∣ \hat{θ}) & = {\frac{\partial Q_{[i]} (θ ∣ \hat{θ})}{\partial ρ} |}_{θ = \hat{θ}} = - \frac{1}{2} \sum_{j \neq i} tr (Σ^{- 1} D) + \frac{1}{2} \sum_{j \neq i} tr ({\hat{Γ}}_{j} Σ^{- 1} D Σ^{- 1}), \end{aligned}

where

\begin{aligned} {\hat{Γ}}_{j} & = \hat{u y_{j}^{2}} - \hat{u y_{j}} μ_{j}^{⊤} - μ_{j} (\hat{u y_{j}})^{⊤} + \hat{u_{j}} μ_{j} μ_{j}^{⊤}, \quad B = \frac{\partial Σ}{\partial σ^{2}} = (\begin{array}{cc} 1 & \frac{ρ}{2 σ} \\ \frac{ρ}{2 σ} & 0 \end{array}), \quad \text{and} \\ D & = \frac{\partial Σ}{\partial ρ} = (\begin{array}{cc} 0 & σ \\ σ & 0 \end{array}) . \end{aligned}

(22)
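Because each element of ${\dot{Q}}_{[i]}$ is a sum of per-observation terms, an implementation only needs the gradient of a single $Q_{i}$ , which is easy to validate by finite differences. The NumPy sketch below (hypothetical names and inputs, with p = q = 1) codes $Q_{i}$ from (19) and its analytic gradient using $B$ and $D$ from (22). At the ML estimate $\sum_{j} \partial Q_{j} (\hat{θ} ∣ \hat{θ}) / \partial θ = 0$ , so ${\dot{Q}}_{[i]} (\hat{θ} ∣ \hat{θ}) = - \partial Q_{i} (\hat{θ} ∣ \hat{θ}) / \partial θ$ and the per-observation gradient is all that is required.

```python
import numpy as np


def sigma_matrix(sigma2, rho):
    s = np.sqrt(sigma2)
    return np.array([[sigma2, rho * s], [rho * s, 1.0]])


def q_i(theta, Xic, u_hat, uy_hat, uy2_hat):
    """Q_i in (19) as a function of theta = (beta_c', sigma^2, rho)'."""
    beta_c, sigma2, rho = theta[:-2], theta[-2], theta[-1]
    Sigma = sigma_matrix(sigma2, rho)
    mu = Xic @ beta_c
    Gamma = uy2_hat - np.outer(uy_hat, mu) - np.outer(mu, uy_hat) + u_hat * np.outer(mu, mu)
    return -0.5 * np.linalg.slogdet(Sigma)[1] - 0.5 * np.trace(Gamma @ np.linalg.inv(Sigma))


def score_i(theta, Xic, u_hat, uy_hat, uy2_hat):
    """Analytic gradient of Q_i: the per-observation summands of the gradient formulas."""
    beta_c, sigma2, rho = theta[:-2], theta[-2], theta[-1]
    s = np.sqrt(sigma2)
    Sinv = np.linalg.inv(sigma_matrix(sigma2, rho))
    B = np.array([[1.0, rho / (2 * s)], [rho / (2 * s), 0.0]])   # dSigma / dsigma^2
    D = np.array([[0.0, s], [s, 0.0]])                            # dSigma / drho
    mu = Xic @ beta_c
    Gamma = uy2_hat - np.outer(uy_hat, mu) - np.outer(mu, uy_hat) + u_hat * np.outer(mu, mu)
    g_beta = Xic.T @ Sinv @ uy_hat - u_hat * Xic.T @ Sinv @ Xic @ beta_c
    g_s2 = -0.5 * np.trace(Sinv @ B) + 0.5 * np.trace(Gamma @ Sinv @ B @ Sinv)
    g_rho = -0.5 * np.trace(Sinv @ D) + 0.5 * np.trace(Gamma @ Sinv @ D @ Sinv)
    return np.concatenate([g_beta, [g_s2, g_rho]])


def numerical_grad(f, theta, h=1e-6):
    """Central finite differences, used only to validate score_i."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = h
        g[k] = (f(theta + e) - f(theta - e)) / (2 * h)
    return g


# Hypothetical single observation with p = q = 1, so beta_c = (beta, gamma)'.
Xic = np.array([[1.3, 0.0], [0.0, -0.4]])
y, u = np.array([0.7, 1.1]), 0.8
args = (Xic, u, u * y, u * np.outer(y, y))
theta = np.array([0.5, 1.2, 1.7, 0.3])                 # (beta, gamma, sigma^2, rho)
analytic = score_i(theta, *args)
numeric = numerical_grad(lambda t: q_i(t, *args), theta)
print(analytic, numeric)
```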

The elements of the Hessian matrix $\ddot{Q} (θ ∣ \hat{θ}) = \sum_{i = 1}^{n} \partial^{2} Q_{i} (θ ∣ \hat{θ}) / \partial θ \partial θ^{⊤}$ , where $θ = (β_{c}^{⊤}, σ^{2}, ρ)^{⊤}$ is the parameter vector, are given by:

\begin{aligned} \frac{\partial^{2} Q_{i} (θ ∣ \hat{θ})}{\partial β_{c} \partial {β_{c}}^{⊤}} & = - \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} X_{ic}, \\ \frac{\partial^{2} Q_{i} (θ ∣ \hat{θ})}{\partial β_{c} \partial σ^{2}} & = - X_{ic}^{⊤} Σ^{- 1} B Σ^{- 1} \hat{u y_{i}} + \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} B Σ^{- 1} X_{ic} β_{c}, \\ \frac{\partial^{2} Q_{i} (θ ∣ \hat{θ})}{\partial β_{c} \partial ρ} & = - X_{ic}^{⊤} Σ^{- 1} D Σ^{- 1} \hat{u y_{i}} + \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} D Σ^{- 1} X_{ic} β_{c}, \\ \frac{\partial^{2} Q_{i} (θ ∣ \hat{θ})}{\partial σ^{2} \partial σ^{2}} & = \frac{1}{2} tr (Σ^{- 1} B Σ^{- 1} B - Σ^{- 1} E) + \frac{1}{2} tr [{\hat{Γ}}_{i} (Σ^{- 1} E Σ^{- 1} - 2 Σ^{- 1} B Σ^{- 1} B Σ^{- 1})], \\ \frac{\partial^{2} Q_{i} (θ ∣ \hat{θ})}{\partial σ^{2} \partial ρ} & = \frac{1}{2} tr (Σ^{- 1} D Σ^{- 1} B - Σ^{- 1} F) + \frac{1}{2} tr [{\hat{Γ}}_{i} (Σ^{- 1} F Σ^{- 1} - 2 Σ^{- 1} B Σ^{- 1} D Σ^{- 1})], \\ \frac{\partial^{2} Q_{i} (θ ∣ \hat{θ})}{\partial ρ \partial ρ} & = \frac{1}{2} tr (Σ^{- 1} D Σ^{- 1} D) - tr ({\hat{Γ}}_{i} Σ^{- 1} D Σ^{- 1} D Σ^{- 1}), \end{aligned}

where ${\hat{Γ}}_{i}$ , $B$ , and $D$ are defined as in (22), $E = \frac{\partial^{2} Σ}{\partial (σ^{2})^{2}} = (\begin{matrix} 0 & - \frac{ρ}{4 σ^{3}} \\ - \frac{ρ}{4 σ^{3}} & 0 \end{matrix})$ , and $F = \frac{\partial^{2} Σ}{\partial σ^{2} \partial ρ} = (\begin{matrix} 0 & \frac{1}{2 σ} \\ \frac{1}{2 σ} & 0 \end{matrix})$ . Because $Σ$ is linear in $ρ$ , $\partial^{2} Σ / \partial ρ^{2} = 0$ , so no such matrix appears in the last derivative. We obtain the Hessian matrix $\ddot{Q} (\hat{θ} ∣ \hat{θ})$ by evaluating these second-order derivatives at $θ = \hat{θ}$ .
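Every second derivative above is assembled from $Σ^{- 1}$ and the matrices $B$ , $D$ , $E$ , and $F$ , so a finite-difference check of these four matrices is a cheap safeguard when implementing the Hessian. The NumPy sketch below performs that check; the function names are ours.

```python
import numpy as np


def sigma_matrix(s2, rho):
    s = np.sqrt(s2)
    return np.array([[s2, rho * s], [rho * s, 1.0]])


def derivative_matrices(s2, rho):
    """B, D, E, F: first and second derivatives of Sigma in (9) w.r.t. sigma^2 and rho."""
    s = np.sqrt(s2)
    B = np.array([[1.0, rho / (2 * s)], [rho / (2 * s), 0.0]])
    D = np.array([[0.0, s], [s, 0.0]])
    E = np.array([[0.0, -rho / (4 * s ** 3)], [-rho / (4 * s ** 3), 0.0]])
    F = np.array([[0.0, 1.0 / (2 * s)], [1.0 / (2 * s), 0.0]])
    return B, D, E, F


s2, rho, h = 1.7, 0.35, 1e-4
B, D, E, F = derivative_matrices(s2, rho)
fd_B = (sigma_matrix(s2 + h, rho) - sigma_matrix(s2 - h, rho)) / (2 * h)
fd_D = (sigma_matrix(s2, rho + h) - sigma_matrix(s2, rho - h)) / (2 * h)
fd_E = (sigma_matrix(s2 + h, rho) - 2 * sigma_matrix(s2, rho) + sigma_matrix(s2 - h, rho)) / h ** 2
# F = d^2 Sigma / (d sigma^2 d rho) = d B / d rho.
fd_F = (derivative_matrices(s2, rho + h)[0] - derivative_matrices(s2, rho - h)[0]) / (2 * h)
for name, exact, approx in (("B", B, fd_B), ("D", D, fd_D), ("E", E, fd_E), ("F", F, fd_F)):
    print(name, np.max(np.abs(exact - approx)))
```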

To assess influential observations, we can develop case-deletion measures such as the generalized Cook's distance and the likelihood distance. Using the distance metric proposed by Zhu et al. [39] to quantify the difference between ${\hat{θ}}_{[i]}$ and $\hat{θ}$ , we define the generalized Cook's distance as follows:

\begin{aligned} {GD}_{i} = ({\hat{θ}}_{[i]} - \hat{θ})^{⊤} \{- \ddot{Q} (\hat{θ} ∣ \hat{θ})\} ({\hat{θ}}_{[i]} - \hat{θ}), \quad i = 1, \dots, n . \end{aligned}

(23)

By substituting (20) into (23), we derive the following approximation of the generalized Cook's distance:

\begin{aligned} {GD}_{i}^{1} = {\dot{Q}}_{[i]} (\hat{θ} ∣ \hat{θ})^{⊤} \{- \ddot{Q} (\hat{θ} ∣ \hat{θ})\}^{- 1} {\dot{Q}}_{[i]} (\hat{θ} ∣ \hat{θ}) . \end{aligned}
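Given the per-observation gradients and the Hessian at $\hat{θ}$ , both the one-step approximation (20) and ${GD}_{i}^{1}$ require a single matrix inversion for all n cases. The NumPy sketch below (hypothetical function name; random placeholder inputs rather than HeckmanEM output) uses the fact that, at the ML estimate, $\sum_{j} \partial Q_{j} (\hat{θ} ∣ \hat{θ}) / \partial θ = 0$ , so ${\dot{Q}}_{[i]} (\hat{θ} ∣ \hat{θ}) = - \partial Q_{i} (\hat{θ} ∣ \hat{θ}) / \partial θ$ . Plugging ${\tilde{θ}}_{[i]}$ into (23) returns ${GD}_{i}^{1}$ exactly, which the final check confirms.

```python
import numpy as np


def one_step_and_gd(theta_hat, hessian, scores):
    """One-step approximations (20) and GD^1 for all cases.

    hessian : (d, d) array, Q-double-dot(theta_hat | theta_hat), negative definite.
    scores  : (n, d) array; row i is dQ_i/dtheta at theta_hat.
    """
    neg_h_inv = np.linalg.inv(-hessian)
    qdot_del = -scores                                  # rows: Q-dot_[i](theta_hat | theta_hat)
    theta_tilde = theta_hat + qdot_del @ neg_h_inv.T    # row i: theta_tilde_[i]
    gd1 = np.einsum("id,de,ie->i", qdot_del, neg_h_inv, qdot_del)
    return theta_tilde, gd1


# Random placeholders standing in for model output.
rng = np.random.default_rng(3)
d, n = 3, 8
scores = rng.normal(size=(n, d))
scores -= scores.mean(axis=0)           # mimic sum_i dQ_i/dtheta = 0 at the MLE
A = rng.normal(size=(d, d))
hessian = -(A @ A.T + d * np.eye(d))    # any negative-definite matrix
theta_hat = np.array([0.5, 1.0, 0.2])
theta_tilde, gd1 = one_step_and_gd(theta_hat, hessian, scores)

# GD_i from (23) evaluated at theta_tilde coincides with GD_i^1.
diff = theta_tilde - theta_hat
gd_from_23 = np.einsum("id,de,ie->i", diff, -hessian, diff)
print(np.round(gd1, 3), np.allclose(gd1, gd_from_23))
```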

4.2. Local influence

In this subsection, we calculate the normal curvature of local influence, following Cook [7], for several standard perturbation schemes applied to the model or data: case-weight perturbation, scale matrix perturbation, explanatory variable perturbation, and response perturbation. These schemes allow us to assess how sensitive the estimates are to small departures from the data or from the model assumptions, and thus to judge whether adjustments are needed.

Let $υ = (υ_{1}, \dots, υ_{g})^{⊤}$ be a perturbation vector varying within an open region $Υ \subset R^{g}$ , and let $ℓ_{c} (θ, υ ∣ y_{c})$ denote the complete-data log-likelihood function of the perturbed model. We assume there exists $υ_{0} \in Υ$ such that $ℓ_{c} (θ, υ_{0} ∣ y_{c}) = ℓ_{c} (θ ∣ y_{c})$ for all $θ$ ; that is, $υ_{0}$ represents no perturbation. Let us define

\begin{aligned} Q (θ, υ ∣ \hat{θ}) & = E [ℓ_{c} (θ, υ ∣ y_{c}) ∣ V, C, \hat{θ}] \quad \text{and} \\ \hat{θ} (υ) & = \arg\max_{θ} Q (θ, υ ∣ \hat{θ}) = (\hat{β_{c}} (υ)^{⊤}, {\hat{σ}}^{2} (υ), \hat{ρ} (υ))^{⊤} . \end{aligned}

The influence graph is defined as $α (υ) = (υ^{⊤}, f_{Q} (υ))^{⊤}$ , with $f_{Q} (υ)$ representing the Q-displacement function, given by:

\begin{aligned} f_{Q} (υ) = 2 [Q (\hat{θ} ∣ \hat{θ}) - Q (\hat{θ} (υ) ∣ \hat{θ})] . \end{aligned}

Building on the methodology of Cook [7] and Zhu and Lee [38], the normal curvature $C_{f_{Q}, h}$ of $α (υ)$ at $υ_{0}$ in the direction of a unit vector $h$ can be employed to analyze the local behavior of the Q-displacement function. We define

\begin{aligned} \nabla_{υ} = {\frac{\partial^{2} Q (θ, υ ∣ \hat{θ})}{\partial θ \partial υ^{⊤}} \Big|}_{θ = \hat{θ} (υ)} \quad \text{and} \quad {\ddot{Q}}_{υ_{0}} = {\frac{\partial^{2} Q (\hat{θ} (υ) ∣ \hat{θ})}{\partial υ \partial υ^{⊤}} \Big|}_{υ = υ_{0}} . \end{aligned}

Next, it can be demonstrated that

\begin{aligned} C_{f_{Q}, h} = - 2 h^{⊤} {\ddot{Q}}_{υ_{0}} h = 2 h^{⊤} \nabla_{υ_{0}}^{⊤} \{- \ddot{Q} (\hat{θ} ∣ \hat{θ})\}^{- 1} \nabla_{υ_{0}} h, \end{aligned}

with $\ddot{Q} (\hat{θ} ∣ \hat{θ})$ as defined in (21).
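The curvature formula only needs $\nabla_{υ_{0}}$ and the Hessian already computed for case deletion. The NumPy sketch below evaluates $C_{f_{Q}, h}$ for a given direction and finds the direction of maximum curvature, which is the eigenvector of $- 2 {\ddot{Q}}_{υ_{0}}$ associated with its largest eigenvalue (Cook's $h_{max}$ ). The inputs are random placeholders; in practice $\nabla_{υ_{0}}$ depends on the perturbation scheme (Section 4.3), and the function names are ours.

```python
import numpy as np


def curvature_matrix(nabla, hessian):
    """-2 * Q-double-dot_{v0} = 2 nabla' {-Q-double-dot(theta_hat | theta_hat)}^{-1} nabla (g x g)."""
    return 2.0 * nabla.T @ np.linalg.solve(-hessian, nabla)


def normal_curvature(nabla, hessian, h):
    """C_{f_Q, h} for a direction h, normalized to unit length first."""
    h = np.asarray(h, dtype=float)
    h = h / np.linalg.norm(h)
    return float(h @ curvature_matrix(nabla, hessian) @ h)


def max_curvature_direction(nabla, hessian):
    """Largest eigenvalue of -2 Q-double-dot_{v0} and its unit eigenvector (h_max)."""
    eigval, eigvec = np.linalg.eigh(curvature_matrix(nabla, hessian))   # ascending order
    return eigval[-1], eigvec[:, -1]


# Random placeholders: d = 3 parameters, g = 12 perturbation components.
rng = np.random.default_rng(5)
nabla = rng.normal(size=(3, 12))
A = rng.normal(size=(3, 3))
hessian = -(A @ A.T + 3.0 * np.eye(3))        # negative definite, like Q-double-dot at the MLE
c_max, h_max = max_curvature_direction(nabla, hessian)
print(c_max, np.argmax(np.abs(h_max)))        # largest curvature and its dominant component
```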

Adopting the approach described by Cook [7], the symmetric matrix $- {\ddot{Q}}_{υ_{0}}$ offers crucial insights for detecting influential observations. We begin by applying the spectral decomposition:

\begin{aligned} - 2 {\ddot{Q}}_{υ_{0}} = \sum_{k = 1}^{g} ξ_{k} ϵ_{k} ϵ_{k}^{⊤}, \end{aligned}

where $\{(ξ_{k}, ϵ_{k}), k = 1, \dots, g\}$ are the eigenvalue-eigenvector pairs of $- 2 {\ddot{Q}}_{υ_{0}}$ , with $ξ_{1} \geq \dots \geq ξ_{r} > ξ_{r + 1} = \dots = ξ_{g} = 0$ and orthonormal eigenvectors $ϵ_{k}$ , $k = 1, \dots, g$ . Zhu and Lee [38] proposed examining all eigenvectors corresponding to nonzero eigenvalues to extract additional information, using the following quantities:

\begin{aligned} {\tilde{ξ}}_{k} = \frac{ξ_{k}}{ξ_{1} + \dots + ξ_{r}}, \quad ϵ_{k}^{2} = (ϵ_{k 1}^{2}, \dots, ϵ_{kg}^{2})^{⊤}, \quad \text{and} \quad M (0) = \sum_{k = 1}^{r} {\tilde{ξ}}_{k} ϵ_{k}^{2} . \end{aligned}

Let $M (0)_{l} = \sum_{k = 1}^{r} \tilde{ξ}_{k} ϵ_{kl}^{2}$ denote the $l$th component of $M (0)$. Influential cases are assessed by plotting $M (0)_{l}$ against the index $l = 1, \dots, g$ and flagging case $l$ as influential if $M (0)_{l}$ exceeds a specified benchmark.
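A pure-Python sketch of $M (0)$ computed exactly as defined, through an eigendecomposition of $- 2 \ddot{Q}_{υ_{0}}$. The cyclic Jacobi method is our choice for a dependency-free example (a library eigensolver would normally be used), and the rank-one input $(2 / n) r r^{⊤}$ is a hypothetical toy matrix for which $M (0)_{l} = r_{l}^{2} / \sum_{j} r_{j}^{2}$.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenpairs of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, eigenvectors); eigenvectors[k] pairs with eigenvalues[k]."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J' A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], [[v[k][i] for k in range(n)] for i in range(n)]


def m0_spectral(neg2_qddot, rel_tol=1e-10):
    """M(0) = sum_k xi_tilde_k * eps_k^2 over the r nonzero eigenvalues of -2 Q''_{v0}."""
    vals, vecs = jacobi_eigen(neg2_qddot)
    top = max(vals)
    pairs = [(xi, e) for xi, e in zip(vals, vecs) if xi > rel_tol * top]
    total = sum(xi for xi, _ in pairs)  # xi_1 + ... + xi_r
    g = len(neg2_qddot)
    return [sum(xi / total * e[l] ** 2 for xi, e in pairs) for l in range(g)]


# Hypothetical rank-one toy: -2 Q''_{v0} = (2 / n) r r' with residuals r_i.
r = [-2.5, -1.5, 0.5, 3.5]
n = len(r)
a = [[2.0 / n * ri * rj for rj in r] for ri in r]
print([round(m, 4) for m in m0_spectral(a)])
```

For this input the printed values should match $r_{l}^{2} / 21$, and they sum to one because each eigenvector has unit norm.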

Using the normal curvature to assess influence can be problematic because $C_{f_{Q}, h}$ is not invariant under a uniform change of scale. To address this issue, Zhu and Lee [38] introduced the conformal normal curvature, inspired by Poon and Poon [32], defined as

\begin{aligned} B_{f_{Q}, h} = \frac{C_{f_{Q}, h}}{\operatorname{tr} [- 2 \ddot{Q}_{υ_{0}}]}, \end{aligned}

which is straightforward to compute and satisfies $0 \leq B_{f_{Q}, h} \leq 1$. Let $h_{l}$ be the basic perturbation vector whose $l$th entry is 1 and whose other entries are 0. Zhu and Lee [38] showed that $M (0)_{l} = B_{f_{Q}, h_{l}}$ for all $l$, so $M (0)_{l}$ can be computed directly as $B_{f_{Q}, h_{l}}$, that is, as the $l$th diagonal element of $- 2 \ddot{Q}_{υ_{0}}$ divided by its trace.
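Using the identity $M (0)_{l} = B_{f_{Q}, h_{l}}$, the sketch below computes $M (0)$ without any eigendecomposition and checks it on a hypothetical rank-one toy matrix $(2 / n) r r^{⊤}$, which arises from a case-weight model for a normal mean with residuals $r_{i}$. Function names are illustrative; $C_{f_{Q}, h} = h^{⊤} (- 2 \ddot{Q}_{υ_{0}}) h$ is the first expression for the normal curvature.

```python
import math


def conformal_curvature(neg2_qddot, h):
    """B_{f_Q,h} = C_{f_Q,h} / tr(-2 Q''_{v0}), with C_{f_Q,h} = h' (-2 Q''_{v0}) h."""
    norm = math.sqrt(sum(hi * hi for hi in h))
    h = [hi / norm for hi in h]  # the curvature is defined for unit directions
    g = len(h)
    c = sum(h[i] * neg2_qddot[i][j] * h[j] for i in range(g) for j in range(g))
    return c / sum(neg2_qddot[i][i] for i in range(g))


def m0_from_basis(neg2_qddot):
    """M(0)_l = B_{f_Q,h_l} with h_l the l-th basis vector, i.e. A_ll / tr(A)."""
    g = len(neg2_qddot)
    return [conformal_curvature(neg2_qddot, [float(i == l) for i in range(g)]) for l in range(g)]


r = [-2.5, -1.5, 0.5, 3.5]                      # hypothetical toy residuals, n = 4
a = [[0.5 * ri * rj for rj in r] for ri in r]   # (2 / n) r r'
print([round(m, 4) for m in m0_from_basis(a)])
print(conformal_curvature(a, [1.0, 1.0, 1.0, 1.0]))  # equal change of all weights
```

The zero curvature for $h \propto (1, 1, 1, 1)^{⊤}$ reflects that changing all case weights equally leaves the weighted mean of the toy model unchanged.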

At present, there is no standard guideline for judging the magnitude of a case's influence. Let $\bar{M (0)}$ and $SM (0)$ denote the mean and standard error of $M (0)_{l}$, $l = 1, \dots, g$. Since the vectors $ϵ_{k}$ are orthonormal, $\sum_{l = 1}^{g} ϵ_{kl}^{2} = 1$ for every $k$, so $\sum_{l = 1}^{g} M (0)_{l} = \sum_{k = 1}^{r} \tilde{ξ}_{k} = 1$ and hence $\bar{M (0)} = 1 / g$. Poon and Poon [32] suggested $2 \bar{M (0)}$ as a benchmark for $M (0)_{l}$, although other functions can be used. For example, Zhu and Lee [38] proposed $\bar{M (0)} + 2 SM (0)$ to account for the variability of $M (0)_{l}$. Lee and Xu [22] noted that any benchmark based on $\bar{M (0)}$ is subjective and recommended $\bar{M (0)} + c^{*} SM (0)$, where $c^{*}$ is a constant chosen for the application at hand. In this study, we use $c^{*} = 3.5$, a choice supported by Massuia et al. [25], who found it effective in empirical work.
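A minimal sketch of the benchmark rule $M (0)_{l} > \bar{M (0)} + c^{*} SM (0)$ with $c^{*} = 3.5$. It assumes that $SM (0)$ is the sample standard deviation of the $M (0)_{l}$ (Python's `statistics.stdev`); the input values are hypothetical.

```python
import statistics


def flag_influential(m0, c_star=3.5):
    """Return the cutoff mean(M(0)) + c* * SM(0) and the 1-based indices above it.
    Assumption: SM(0) is the sample standard deviation of the M(0)_l."""
    cut = statistics.fmean(m0) + c_star * statistics.stdev(m0)
    return cut, [l + 1 for l, m in enumerate(m0) if m > cut]


m0 = [0.03] * 19 + [0.43]   # hypothetical M(0) values for g = 20 cases (they sum to 1)
cut, flagged = flag_influential(m0)
print(round(cut, 4), flagged)
print(2 / len(m0))          # Poon and Poon's benchmark 2 * mean(M(0)) = 2 / g
```

Here only case 20 exceeds the cutoff; with $c^{*}$ exposed as a parameter, the Zhu and Lee rule is the special case `c_star=2`.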

4.3. Perturbation schemes

This section derives the matrix $\nabla_{υ_{0}}$ under several perturbation schemes for the Heckman selection-t model. Case-weight perturbation identifies observations that notably affect the log-likelihood function and may therefore exert substantial influence on the maximum likelihood estimates. Scale perturbation modifies the scale matrix $Σ$, highlighting individuals whose likelihood displacement within the scale structure is most pronounced. Response perturbation shifts the response values to identify observations that strongly affect their predicted outcomes. Explanatory variable perturbation pinpoints values of continuous explanatory variables to which the log-likelihood is most sensitive. For each scheme, $\nabla_{υ_{0}}$ is partitioned as

\begin{aligned} \nabla_{υ_{0}} = {(\nabla_{β_{c}}^{⊤}, \nabla_{σ^{2}}^{⊤}, \nabla_{ρ}^{⊤})}^{⊤}, \end{aligned}

where

\begin{aligned} \nabla_{β_{c}} & = \left. \frac{\partial^{2} Q (θ, υ ∣ \hat{θ})}{\partial β_{c} \, \partial υ^{⊤}} \right|_{θ = \hat{θ} (υ_{0})}, \quad \nabla_{σ^{2}} = \left. \frac{\partial^{2} Q (θ, υ ∣ \hat{θ})}{\partial σ^{2} \, \partial υ^{⊤}} \right|_{θ = \hat{θ} (υ_{0})} \quad \text{and} \\ \nabla_{ρ} & = \left. \frac{\partial^{2} Q (θ, υ ∣ \hat{θ})}{\partial ρ \, \partial υ^{⊤}} \right|_{θ = \hat{θ} (υ_{0})}, \end{aligned}

with $\nabla_{β_{c}} \in \mathbb{R}^{(p + q) \times g}$, $\nabla_{σ^{2}} \in \mathbb{R}^{1 \times g}$ and $\nabla_{ρ} \in \mathbb{R}^{1 \times g}$. Analytical expressions are given in the following four propositions, with proofs in the Appendix.
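When implementing Propositions 4.1 to 4.4, a finite-difference version of $\nabla_{υ_{0}}$ is a convenient cross-check. The sketch below approximates $\partial^{2} Q / \partial θ \, \partial υ^{⊤}$ by central differences for any user-supplied $Q (θ, υ)$; it is a generic helper (hypothetical name `nabla_fd`), not the HeckmanEM implementation, and the case-weighted least-squares Q-function used to test it is a hypothetical example whose exact mixed derivatives are $r_{i}$ and $x_{i} r_{i}$, with $r_{i}$ the residual. In practice θ would be set to $\hat{θ} (υ_{0})$.

```python
def nabla_fd(q, theta, v0, eps=1e-3):
    """Central-difference approximation of d^2 Q(theta, v) / d theta d v' at (theta, v0).
    q(theta, v) -> float; returns a len(theta) x len(v0) matrix (list of rows)."""
    def shift(x, k, s):
        x = list(x)
        x[k] += s
        return x

    rows = []
    for a in range(len(theta)):
        tp, tm = shift(theta, a, eps), shift(theta, a, -eps)
        row = []
        for b in range(len(v0)):
            vp, vm = shift(v0, b, eps), shift(v0, b, -eps)
            row.append((q(tp, vp) - q(tp, vm) - q(tm, vp) + q(tm, vm)) / (4 * eps * eps))
        rows.append(row)
    return rows


# Hypothetical check: case-weighted least squares for y_i = theta_0 + theta_1 x_i + e_i.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.5, 2.9, 4.2]


def q_ls(theta, v):
    """Q(theta, v) = -1/2 sum_i v_i (y_i - theta_0 - theta_1 x_i)^2."""
    return -0.5 * sum(vi * (yi - theta[0] - theta[1] * xi) ** 2 for vi, xi, yi in zip(v, x, y))


print(nabla_fd(q_ls, [1.0, 1.0], [1.0] * 4))
```

Because this Q is quadratic in θ and linear in υ, the central differences are exact up to rounding; for the Heckman Q-function a smaller `eps` and a tolerance would be needed.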

4.3.1. Case-weight perturbation

We assign arbitrary weights to the individual contributions to the expected complete-data log-likelihood function, which yields the perturbed Q-function and allows deviations in different directions to be examined:

\begin{aligned} Q (θ, υ ∣ \hat{θ}) & = E [ℓ_{c} (θ, υ ∣ y_{c}) ∣ V, C, \hat{θ}] = \sum_{i = 1}^{n} υ_{i} E [ℓ_{ic} (θ ∣ y_{c}) ∣ V, C, \hat{θ}] \\ & = \sum_{i = 1}^{n} υ_{i} Q_{i} (θ ∣ \hat{θ}) . \end{aligned}

(24)

Here, $υ = (υ_{1}, \dots, υ_{n})^{⊤}$ is an $n \times 1$ vector, and $υ_{0} = (1, \dots, 1)^{⊤}$ . Note that if $υ_{i} = 0$ and $υ_{j} = 1$ for $j \neq i$ , the ith observation is excluded from the complete-data log-likelihood function.
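The case-weight scheme (24) can be sketched in a few lines, assuming each case contributes a per-case term $Q_{i} (θ ∣ \hat{θ})$; the normal-mean term below is a hypothetical stand-in for the Heckman $Q_{i}$. The sketch checks that $υ_{3} = 0$ is equivalent to deleting case 3 and that, since $\partial Q / \partial υ_{i} = Q_{i}$, the $i$th column of $\nabla_{υ_{0}}$ is the gradient of $Q_{i}$ at $\hat{θ}$.

```python
def case_weight_q(q_i, theta, data, v):
    """Perturbed Q-function (24): sum_i v_i * Q_i(theta); q_i(theta, obs) is a per-case term."""
    return sum(vi * q_i(theta, obs) for vi, obs in zip(v, data))


def q_i_toy(theta, yi):
    """Hypothetical per-case term: normal mean with unit variance."""
    return -0.5 * (yi - theta) ** 2


y = [1.0, 2.0, 4.0, 7.0]
theta = 3.5  # theta_hat for the unperturbed toy model (the sample mean)
full = case_weight_q(q_i_toy, theta, y, [1.0, 1.0, 1.0, 1.0])          # v0: original model
drop3 = case_weight_q(q_i_toy, theta, y, [1.0, 1.0, 0.0, 1.0])         # v_3 = 0
deleted = case_weight_q(q_i_toy, theta, [1.0, 2.0, 7.0], [1.0, 1.0, 1.0])
print(full, drop3, deleted)

# Since dQ/dv_i = Q_i, column i of nabla is dQ_i/dtheta at theta_hat (here y_i - theta).
eps = 1e-6
col = [(q_i_toy(theta + eps, yi) - q_i_toy(theta - eps, yi)) / (2 * eps) for yi in y]
print(col)
```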

Proposition 4.1

Under the case-weight perturbation scheme defined in (24), the elements of the matrix $\nabla_{υ_{0}}$ are given by

$\begin{aligned} \nabla_{β_{c}} & = \sum_{i = 1}^{n} X_{ic}^{⊤} Σ^{- 1} \widehat{u y_{i}} - \sum_{i = 1}^{n} \hat{u}_{i} X_{ic}^{⊤} Σ^{- 1} X_{ic} β_{c}, \\ \nabla_{σ^{2}} & = - \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (Σ^{- 1} B) + \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i} Σ^{- 1} B Σ^{- 1}) \quad \text{and} \\ \nabla_{ρ} & = - \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (Σ^{- 1} D) + \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i} Σ^{- 1} D Σ^{- 1}), \end{aligned}$

where ${\hat{Γ}}_{i}$ , $B$ , and $D$ are defined as in (22).

Proof.

See the Appendix.

4.3.2. Scale perturbation

To assess deviations from the assumption about the scale matrix $Σ$ , we examine the perturbation given by

\begin{aligned} Σ (υ_{i}) = υ_{i}^{- 1} Σ, i = 1, \dots, n . \end{aligned}

(25)

In this perturbation scheme, the original model corresponds to $υ_{0} = (1, \dots, 1)^{⊤} \in \mathbb{R}^{n}$, and the perturbed Q-function is obtained from (18) by replacing $Σ$ with $Σ (υ_{i})$.
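For intuition about (25), consider a single bivariate normal contribution, a simplification of the SLt Q-function that assumes the usual Heckman parameterization $Σ = \begin{pmatrix} σ^{2} & ρσ \\ ρσ & 1 \end{pmatrix}$. Since $\log |Σ / υ_{i}| = \log |Σ| - 2 \log υ_{i}$, the derivative of the log-density with respect to $υ_{i}$ at $υ_{i} = 1$ is $1 - d_{i}^{2} / 2$, where $d_{i}^{2}$ is the Mahalanobis distance; the sketch verifies this numerically. All function names and data are hypothetical.

```python
import math


def mahalanobis2(y, mu, sigma):
    """d^2 = (y - mu)' sigma^{-1} (y - mu) for a 2x2 sigma, using the explicit inverse."""
    (a, b), (c, d) = sigma
    r0, r1 = y[0] - mu[0], y[1] - mu[1]
    return (d * r0 * r0 - (b + c) * r0 * r1 + a * r1 * r1) / (a * d - b * c)


def logpdf_bvn(y, mu, sigma):
    """Log-density of the bivariate normal N_2(mu, sigma)."""
    (a, b), (c, d) = sigma
    return (-math.log(2.0 * math.pi) - 0.5 * math.log(a * d - b * c)
            - 0.5 * mahalanobis2(y, mu, sigma))


def scaled(sigma, v):
    """Scale perturbation (25): Sigma(v_i) = Sigma / v_i."""
    return [[s / v for s in row] for row in sigma]


sig, rho = 1.0, 0.6  # sigma = 1 and rho = 0.6, as quoted in Section 5
sigma = [[sig ** 2, rho * sig], [rho * sig, 1.0]]
y, mu = [2.0, -1.0], [0.0, 0.0]  # hypothetical observation and mean
eps = 1e-5
slope = (logpdf_bvn(y, mu, scaled(sigma, 1.0 + eps))
         - logpdf_bvn(y, mu, scaled(sigma, 1.0 - eps))) / (2.0 * eps)
print(slope, 1.0 - mahalanobis2(y, mu, sigma) / 2.0)
```

A negative slope (here $d_{i}^{2} > 2$) means the contribution increases when $υ_{i} < 1$, that is, when that case's scale matrix is inflated, which suggests why outlying cases stand out under this scheme.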

Proposition 4.2

Under the scale perturbation scheme defined in (25), the elements of the matrix $\nabla_{υ_{0}}$ are given by:

$\begin{aligned} \nabla_{β_{c}} & = \sum_{i = 1}^{n} X_{ic}^{⊤} Σ^{- 1} \widehat{u y_{i}} - \sum_{i = 1}^{n} \hat{u}_{i} X_{ic}^{⊤} Σ^{- 1} X_{ic} β_{c}, \\ \nabla_{σ^{2}} & = \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i} Σ^{- 1} B Σ^{- 1}) \quad \text{and} \\ \nabla_{ρ} & = \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i} Σ^{- 1} D Σ^{- 1}), \end{aligned}$

where ${\hat{Γ}}_{i}$ , $B$ , and $D$ are defined as in (22).

Proof.

See the Appendix.

4.3.3. Response perturbation

To introduce a perturbation of the response variables $(y_{1}^{⊤}, \dots, y_{n}^{⊤})^{⊤}$ , we replace $y_{i}$ with

\begin{aligned} y_{i} (υ) = y_{i} + υ_{i} 1_{2}, \end{aligned}

(26)

where $1_{2}$ is a $2 \times 1$ vector of ones. The perturbed Q-function follows (18) with $y_{i}$ replaced by $y_{i} (υ)$, and $υ_{0} = 0 \in \mathbb{R}^{n}$ corresponds to no perturbation.

Proposition 4.3

Under the response perturbation scheme defined in (26), the elements of the matrix $\nabla_{υ_{0}}$ are as follows:

$\begin{aligned} \nabla_{β_{c}} & = - \sum_{i = 1}^{n} \hat{u}_{i} X_{ic}^{⊤} Σ^{- 1} 1_{2}, \\ \nabla_{σ^{2}} & = \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i}^{†} Σ^{- 1} B Σ^{- 1}) \quad \text{and} \\ \nabla_{ρ} & = \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i}^{†} Σ^{- 1} D Σ^{- 1}), \end{aligned}$

where $\hat{Γ}_{i}^{†} = - 2 (\widehat{u y_{i}})^{⊤} 1_{2} + 2 \hat{u}_{i} μ_{i}^{⊤} 1_{2}$, and $B$ and $D$ are defined as in (22).

Proof.

See the Appendix.

4.3.4. Explanatory variable perturbation

There are three potential methods for perturbing a specific continuous explanatory variable: as a covariate in the primary regression (outcome model), as a covariate in the selection equation (selection model), or simultaneously in both equations. Here, we will focus on the first scenario, while the other two can be addressed similarly.

In this scenario, we aim to perturb a specific continuous explanatory variable in the primary regression. Under this condition, the perturbed explanatory matrix is given by:

\begin{aligned} X_{ic} (υ) = \begin{pmatrix} x_{i}^{⊤} (υ_{i}) & 0 \\ 0 & w_{i}^{⊤} \end{pmatrix}, \end{aligned}

(27)

where $x_{i}^{⊤} (υ_{i}) = x_{i}^{⊤} + υ_{i} 1_{u}^{⊤}$ and $1_{u}^{⊤} = (0, \dots, 1, \dots, 0)$ is a $1 \times p$ vector with 1 in the $u$th position, $u = 1, \dots, p$. This scheme is suited to situations in which the continuous covariate is measured with error. The perturbed Q-function follows (18) with $X_{ic} (υ)$ in place of $X_{ic}$, and the unperturbed model corresponds to $υ_{0} = 0 \in \mathbb{R}^{n}$.

Proposition 4.4

Under the explanatory variable perturbation scheme defined in (27), $\nabla_{υ_{0}}$ has the following elements:

$\begin{aligned} \nabla_{β_{c}} & = \sum_{i = 1}^{n} G_{i}^{⊤} Σ^{- 1} \widehat{u y_{i}} - \sum_{i = 1}^{n} \hat{u}_{i} G_{i}^{⊤} Σ^{- 1} X_{ic} β_{c} - \sum_{i = 1}^{n} \hat{u}_{i} X_{ic}^{⊤} Σ^{- 1} G_{i} β_{c}, \\ \nabla_{σ^{2}} & = \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i}^{‡} Σ^{- 1} B Σ^{- 1}) \quad \text{and} \\ \nabla_{ρ} & = \frac{1}{2} \sum_{i = 1}^{n} \operatorname{tr} (\hat{Γ}_{i}^{‡} Σ^{- 1} D Σ^{- 1}), \end{aligned}$

where $\hat{Γ}_{i}^{‡} = - \widehat{u y_{i}}^{⊤} G_{i} β_{c} - (G_{i} β_{c})^{⊤} \widehat{u y_{i}} + \hat{u}_{i} (G_{i} β_{c})^{⊤} X_{ic} β_{c} + \hat{u}_{i} (X_{ic} β_{c})^{⊤} G_{i} β_{c}$ and $G_{i}$ is the $2 \times (p + q)$ matrix obtained by

$\begin{aligned} G_{i} = \frac{\partial X_{ic} (υ)}{\partial υ_{i}} = \begin{pmatrix} 0 \; \cdots \; 1 \; \cdots \; 0 & 0_{1 \times q} \\ 0_{1 \times p} & 0_{1 \times q} \end{pmatrix}, \end{aligned}$

with 1 in the first row of the uth column, $u = 1, \dots, p$ . $B$ and $D$ are defined as in (22).

Proof.

See the Appendix.
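A sketch of the perturbed design matrix (27) and of $G_{i}$, with a finite-difference check that $G_{i} = \partial X_{ic} (υ) / \partial υ_{i}$. Indices are 0-based in the code (the text uses $u = 1, \dots, p$), the function names and covariate values are hypothetical, and only the outcome-equation copy of the covariate is perturbed, as in the first scenario described above.

```python
def x_ic(x_i, w_i, v_i=0.0, u=None):
    """2 x (p+q) matrix X_ic(v) of (27), block-diagonal in x_i' and w_i'.
    If u (0-based) is given, v_i is added to the u-th outcome covariate."""
    p, q = len(x_i), len(w_i)
    x = [float(e) for e in x_i]
    if u is not None:
        x[u] += v_i
    return [x + [0.0] * q, [0.0] * p + [float(e) for e in w_i]]


def g_matrix(p, q, u):
    """G_i = dX_ic(v)/dv_i: zeros except a 1 in the first row, column u (0-based)."""
    g = [[0.0] * (p + q) for _ in range(2)]
    g[0][u] = 1.0
    return g


x_i, w_i = [1.0, 4.5], [1.0, 4.5, 2.3]  # hypothetical x_i = (1, age), w_i = (1, age, income)
step = 1e-3
plus, base = x_ic(x_i, w_i, step, u=1), x_ic(x_i, w_i)
fd = [[(pe - be) / step for pe, be in zip(pr, br)] for pr, br in zip(plus, base)]
print([[round(e, 6) for e in row] for row in fd])
print(g_matrix(len(x_i), len(w_i), 1))
```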

Although it is not feasible to cover every relevant perturbation scheme in detail, the key step is choosing an appropriate $υ$. As long as the perturbed complete-data log-likelihood function $ℓ_{c} (θ, υ ∣ y_{c})$ is smooth enough for all derivatives required by the diagnostic measures to exist, local influence analysis can be carried out without significant complications. Finally, note that our diagnostic techniques are based on $\ddot{Q} (\hat{θ} ∣ \hat{θ})$. In the context of linear mixed models, Pan and Foster [31] suggest using $E [\ddot{Q} (\hat{θ} ∣ \hat{θ})]$ instead of $\ddot{Q} (\hat{θ} ∣ \hat{θ})$ to detect potentially influential observations; they show that $E [\ddot{Q} (\hat{θ} ∣ \hat{θ})]$ is block-diagonal, which facilitates the interpretation of the diagnostic measures in that setting.

The next section provides a brief simulation study to evaluate the effectiveness of the proposed diagnostic measures in identifying outliers.

5. Simulation studies

To evaluate the effectiveness of the proposed diagnostic measures, we conducted a simulation study focusing on the SLn and SLt models. The Monte Carlo simulations were designed to assess the ability of the global diagnostic measure (GD), derived via the case-deletion technique (see Section 4.1), to detect influential points in the response variable. Although analogous analyses can be extended to local influence measures, this simulation study concentrates on the global approach for simplicity.

The simulations were based on the classical Heckman selection model defined in Equations (7) and (8), with sample size $n = 100$. The regression coefficients were set to $β = (1.0, 0.5)^{⊤}$ and the selection parameters to $γ = (1.28, 0.30, - 0.50)^{⊤}$. Covariates were specified as $X = (1, \mathrm{Unif} (- 1, 1))$ and $W = (X, N (0, σ^{2}))$, with $σ = 1$ and $ρ = 0.6$. The error terms followed the bivariate normal distribution in Equation (9) and were independent across observations.

Data were generated from the Heckman model using the rHeckman function of the HeckmanEM package in R. We considered three censoring levels (10%, 20%, and 40%) to examine whether the degree of censoring affects the detection of influential points.

To create influential points, we perturbed the minimum and maximum values of each sample by $k$ standard deviations, with $k = 0, 1, 2, 3$: the minimum was decreased and the maximum increased by $k$ standard deviations. Even without perturbation ($k = 0$), these extreme points were treated as candidate influential points because they lie in the tails of the distribution. Each sample was then fitted with the SLn and SLt models, and the GD measure was computed. For the SLt model, two fitting strategies were used: (1) SLt with fixed ν, in which the degrees of freedom were fixed at the value estimated from the data without synthetic influential observations; and (2) SLt with adjusted ν, in which the degrees of freedom were re-estimated from the data including the outliers. A replicate was counted as a detection when both the minimum and the maximum values exceeded the GD benchmark. This process was repeated over 500 Monte Carlo replicates.
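The contamination step and the joint detection rule of this simulation can be sketched as follows. The sketch assumes that sd(y) is the sample standard deviation of the simulated responses (as R's `sd` computes); fitting the SLn/SLt models and computing GD are not reproduced, so the GD values in the example are hypothetical.

```python
import statistics


def perturb_extremes(y, k):
    """Decrease the minimum and increase the maximum of y by k sample SDs (sd(y) in Table 1)."""
    sd = statistics.stdev(y)
    i_min = min(range(len(y)), key=y.__getitem__)
    i_max = max(range(len(y)), key=y.__getitem__)
    out = list(y)
    out[i_min] -= k * sd
    out[i_max] += k * sd
    return out, i_min, i_max


def jointly_detected(gd, i_min, i_max, benchmark):
    """A replicate counts as a detection only if BOTH extremes exceed the GD benchmark."""
    return gd[i_min] > benchmark and gd[i_max] > benchmark


y = [1.0, 2.0, 3.0, 4.0, 5.0]
for k in range(4):  # k = 0, 1, 2, 3
    print(k, perturb_extremes(y, k)[0])
print(jointly_detected([0.20, 0.01, 0.01, 0.01, 0.15], 0, 4, benchmark=0.14))
```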

Table 1 summarizes the results: the percentage of replicates in which the influential points were detected for the SLn and SLt models, the mean and standard deviation (SD) of the influence measure, and the benchmark value. As expected, for the SLn model the percentage of detections increased with $k$ at every censoring level, showing that the diagnostic detects influential observations more often as they become more severe. The SLt fit, in contrast, shows two different behaviors. When ν is fixed, the SLt model cannot adapt to the outliers, so the diagnostic flags the shifted observations as influential. When ν is re-estimated, however, the model accommodates the outliers by making its tails heavier, and their GD values stay below the benchmark. This explains why the SLt model is much more resistant than the SLn model.

Table 1.

Influence diagnostic analysis: case-deletion in SLn and SLt models under different censoring types and threshold variations ( $Y_{\min} = Y_{\min} - k \cdot sd (y), Y_{\max} = Y_{\max} + k \cdot sd (y)$ ).

	SLn				SLt with fixed ν				SLt with adjusted ν
	k				k				k
Statistic	0	1	2	3	0	1	2	3	0	1	2	3
	10% Censoring
% Influential¹	2.0	46.4	95.4	99.6	0.8	27.7	81.8	93.0	0.0	0.2	0.2	0.0
Mean measure	0.020	0.025	0.032	0.044	0.021	0.022	0.028	0.035	0.020	0.019	0.019	0.018
SD² measure	0.029	0.055	0.117	0.203	0.025	0.039	0.079	0.134	0.025	0.022	0.019	0.017
Benchmark	0.140				0.160				0.160
	20% Censoring
% Influential	1.8	44.8	91.2	99.8	0.2	28.1	81.6	93.5	0.0	0.2	0.2	0.0
Mean measure	0.021	0.024	0.033	0.043	0.021	0.023	0.029	0.036	0.020	0.019	0.019	0.018
SD measure	0.027	0.051	0.110	0.190	0.023	0.038	0.079	0.134	0.025	0.022	0.019	0.017
Benchmark	0.140				0.160				0.160
	40% Censoring
% Influential	1.2	31.6	83.0	97.6	0.0	18.6	69.3	88.8	0.0	0.2	0.0	0.0
Mean measure	0.020	0.023	0.029	0.036	0.020	0.021	0.026	0.031	0.020	0.020	0.019	0.019
SD measure	0.025	0.044	0.089	0.147	0.021	0.034	0.067	0.109	0.022	0.020	0.018	0.016
Benchmark	0.140				0.160				0.160


¹% Influential: represents the percentage of Monte Carlo replicates in which both the minimum and maximum observations were jointly detected as influential (exceeded the benchmark value).

²SD: Standard deviation.

Moreover, for the SLt model with adjusted ν, the mean and standard deviation of the GD measure remained stable or decreased slightly as $k$ increased, indicating resistance. This contrasts with the SLn model and the SLt model with fixed ν, for which both the mean and the standard deviation of GD increased with $k$. In addition, higher censoring lowered the detection rates for the SLn model and the SLt model with fixed ν, most visibly at 40% censoring, whereas the SLt model with adjusted ν was unaffected, underscoring its resistance to outliers at every censoring level when ν is estimated.

Figure 1 shows, for one Monte Carlo replicate with 10% censoring, how the GD measure changes with $k$. For the SLn model and the SLt model with fixed ν, the GD values of the perturbed observations increased with $k$, more markedly for the SLn model. In contrast, the GD measure remained stable for the SLt model with adjusted ν. In this replicate, the estimated ν decreased as $k$ increased, so the model accommodated the outliers through heavier tails, which illustrates the resistance of the SLt model.

These results show that the proposed diagnostic tools correctly identify whether an observation is influential under the fitted distribution. Heavy-tailed models are well known to be robust to outliers, and the diagnostic measures capture this robustness.

The next section presents two real-world applications that further illustrate the efficacy of the proposed methodology.

6. Applications

6.1. Ambulatory expenditures

To illustrate the methodologies discussed, we applied them to analyze ambulatory expenditures data originally from Cameron and Trivedi [5] and later re-examined by Marchenko and Genton [24] using ML estimation in Stata, by Ding [9] using Bayesian methods, and by Lachos et al. [19] utilizing an efficient EM-type algorithm.

For our analysis, we selected the same covariates as used by Marchenko and Genton [24], Ding [9], and Lachos et al. [19]. Specifically, we focused on log expenditures (lambexp) as the outcome variable. The covariates in the outcome equation included $x = (1,$ age, blhisp, educ, female, ins, totchr), representing age, ethnicity, education status, gender, insurance status, and number of chronic diseases, respectively. The income variable was included in the selection equation, $w = (x$ , income), to ensure the exclusion restriction. The dataset had 526 missing values out of a total of 3328 observations.

Using the HeckmanEM package in R, we fitted both SLn and SLt models. The covariates educ and ins were not significant in either model, so they were dropped and both models were refitted with $x = (1,$ age, blhisp, female, totchr) and $w = (x$, income). Table 2 reports the estimates. All covariates are significant in both the outcome and selection equations of the SLn and SLt models. In line with Marchenko and Genton [24], Ding [9], and Lachos et al. [19], the 95% confidence interval for ρ under the SLn model, $(- 0.563, 0.249)$, contains zero, indicating weak evidence of selection bias. In contrast, the 95% confidence interval for ρ under the SLt model, $(- 0.618, - 0.116)$, excludes zero, suggesting a significant selection bias effect.
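The intervals for ρ quoted above are reproduced by the Wald form $\hat{ρ} \pm z_{0.975} \, \mathrm{SE} (\hat{ρ})$ applied to the Table 2 estimates, which suggests (we have not confirmed this in HeckmanEM) that this is how they were obtained:

```python
from statistics import NormalDist


def wald_ci(estimate, se, level=0.95):
    """Wald interval: estimate +/- z_{(1 + level) / 2} * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


for model, rho, se in [("SLn", -0.157, 0.207), ("SLt", -0.367, 0.128)]:  # Table 2
    lo, hi = wald_ci(rho, se)
    print(model, round(lo, 3), round(hi, 3))
```

This prints $(- 0.563, 0.249)$ for SLn and $(- 0.618, - 0.116)$ for SLt.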

Table 2.

Ambulatory expenditures data: ML estimates, standard errors, and information criteria.

	SLn				SLt
Parameter	Estimate	Std. error	z	p	Estimate	Std. error	z	p
Outcome model
`intercept`	5.317	0.173	30.745	0.000	5.471	0.135	40.374	0.000
`age`	0.209	0.024	8.653	0.000	0.202	0.023	8.738	0.000
`blhisp`	−0.233	0.068	−3.411	0.001	−0.200	0.059	−3.367	0.001
`female`	0.342	0.071	4.831	0.000	0.295	0.059	4.973	0.000
`totchr`	0.534	0.051	10.413	0.000	0.505	0.041	12.438	0.000
Selection model
`intercept`	0.126	0.115	1.088	0.277	0.091	0.124	0.735	0.462
`age`	0.088	0.026	3.344	0.001	0.098	0.029	3.427	0.001
`blhisp`	−0.435	0.060	−7.189	0.000	−0.465	0.064	−7.235	0.000
`female`	0.687	0.060	11.404	0.000	0.748	0.066	11.308	0.000
`totchr`	0.780	0.068	11.457	0.000	0.869	0.083	10.437	0.000
`income`	0.005	0.001	4.512	0.000	0.006	0.001	4.492	0.000
σ	1.274	0.021			1.203	0.025
ρ	−0.157	0.207			−0.367	0.128
ν					13.089
AIC	11713.780				11686.060
BIC	11719.890				11692.170


Lachos et al. [19] performed a residual analysis and concluded that the SLt model fits better than the SLn model. We then examined the dataset for influential observations using the case-deletion measure (${GD}_{i}$), $M (0)$ obtained from the conformal curvature $B_{f_{Q}, h_{l}}$, and the perturbation schemes outlined in Section 4.3.

Figure 2 presents the approximate generalized Cook's distance ${GD}_{i}$ for the SLn (left panel) and SLt (right panel) fitted models. Higher ${GD}_{i}$ values indicate a greater impact of the $i$th observation on the ML parameter estimates. Adapting the suggestion of Barros et al. [2], we used $(2 \times npar) / n$ as a benchmark for ${GD}_{i}$, where $npar$ denotes the number of estimated model parameters, and highlighted observations with high ${GD}_{i}$ values. Fewer observations exceeded the benchmark in the SLt model (18 points, 0.54% of the data) than in the SLn model (55 points, 1.65% of the data), reflecting the resistance of the heavy-tailed model, as expected, and illustrating the ability of our diagnostic approach to identify influential observations. Heavy-tailed distributions are known to be resistant to influential observations (see, e.g., [12,25,27]), and Figure 2 confirms this behavior of the Student-t distribution in terms of ${GD}_{i}$. Moreover, all influential observations identified under the SLt model were also identified under the SLn model, although with lower ${GD}_{i}$ values under the SLt fit.
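The benchmark and the percentages quoted above follow from simple arithmetic. In this sketch, $npar$ is our own count from Table 2 (five outcome and six selection coefficients plus σ and ρ, and ν for SLt), which may differ from the count used internally by HeckmanEM.

```python
def gd_benchmark(npar, n):
    """Benchmark (2 * npar) / n for the generalized Cook's distance GD_i."""
    return 2 * npar / n


n = 3328
# npar counted from Table 2: 5 outcome + 6 selection coefficients + sigma + rho (+ nu for SLt)
for model, npar, n_flagged in [("SLn", 13, 55), ("SLt", 14, 18)]:
    print(model, round(gd_benchmark(npar, n), 5), f"{100 * n_flagged / n:.2f}%")
```

This gives benchmarks of about 0.0078 (SLn) and 0.0084 (SLt), together with the reported 1.65% and 0.54%.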

Next, we conducted a local influence study based on $M (0)$ , guided by Sections 4.2 and 4.3. Here, we used the criterion $M (0)_{l} > \bar{M (0)} + 3.5 SM (0)$ , for $l = 1, \dots, 3328$ , to identify influential observations.

Figure 3 presents the results for the SLn and SLt models under the case-weight, scale matrix, and response perturbation schemes, with points having high $M (0)_{l}$ values highlighted. Under case-weight and scale matrix perturbations, more influential points were detected in the SLn model than in the SLt model, consistent with Figure 2. In contrast, response perturbation flagged similar influential observations in both models.

Figure 4 illustrates the explanatory variable perturbation for the two continuous covariates included in the primary regression. As anticipated, the SLn model identifies several influential points when perturbing the age and totchr covariates. In contrast, the SLt model designates only a few observations as influential points when perturbing the totchr covariate (number of chronic diseases). Regarding the age covariate, the SLt model effectively accommodates the observations, with no additional influential points identified.

To further assess the proposed diagnostic measures, we refitted the SLn and SLt models after excluding specific sets of observations. For the SLn fit, we first excluded the 55 observations with the lowest $GD$ values (non-influential) and, for comparison, then removed the 55 observations with $GD$ values above the benchmark. Similarly, for the SLt fit, we excluded the 18 observations with the lowest $GD$ values and then the 18 observations with $GD$ values above the benchmark. Table 3 displays the relative percentage changes (RC) in the estimates, calculated as

\begin{aligned} \mathrm{RC}_{\hat{η}} = \left| \frac{\hat{η} - \hat{η}_{[i]}}{\hat{η}} \right| \times 100\%, \end{aligned}

(28)

where $η$ denotes any of $β_{0}, \dots, β_{4}, γ_{0}, \dots, γ_{5}, σ, ρ, ν$ and $\hat{η}_{[i]}$ denotes the ML estimate of $η$ after the set of observations has been removed. As expected, removing the points with low $GD$ values (considered non-influential) produces small relative changes (at most 6.35% in Table 3), so their removal does not materially affect the ML estimates of either model. Conversely, excluding the points flagged as influential by the $GD$ measure produces substantial relative changes in both models, several of them exceeding 10%. Hence, the proposed measure discriminates well between influential and non-influential points.
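A minimal sketch of (28) applied to a dictionary of estimates. The parameter names and values are hypothetical and only illustrate that the same absolute change produces a much larger RC when the full-data estimate is close to zero.

```python
def relative_change(full, reduced):
    """RC (28) in percent, per parameter: |(eta_hat - eta_hat_[i]) / eta_hat| * 100.
    Undefined when an estimate is exactly zero and inflated when it is near zero."""
    return {name: abs((full[name] - reduced[name]) / full[name]) * 100.0 for name in full}


# Hypothetical estimates, for illustration only (not the values behind Table 3).
full = {"beta_1": 2.0, "gamma_0": 0.1, "sigma": 1.25}
reduced = {"beta_1": 1.9, "gamma_0": 0.2, "sigma": 1.25}
print({k: round(v, 2) for k, v in relative_change(full, reduced).items()})
```

This is consistent with the large RC values of the selection intercept in Table 3, whose full-data estimates are small (0.126 for SLn and 0.091 for SLt in Table 2).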

Table 3.

Ambulatory expenditures data: relative changes (in %) of ML estimates.

	SLn		SLt
	Dropping 55 points	Dropping 55 points	Dropping 18 points	Dropping 18 points
Parameter	without influence^†	with influence^‡	without influence^†	with influence^‡
Outcome model
`intercept`	0.12	0.68	0.08	0.17
`age`	0.06	3.44	0.35	1.47
`blhisp`	0.44	6.00	0.26	5.48
`female`	0.97	4.66	0.42	4.04
`totchr`	0.03	4.10	0.18	2.40
Selection model
`intercept`	2.27	120.75	4.16	78.05
`age`	1.96	18.12	1.38	3.72
`blhisp`	0.96	7.75	0.47	3.06
`female`	0.19	13.14	0.71	1.23
`totchr`	0.69	22.55	0.61	6.11
`income`	3.78	51.65	0.55	45.02
σ	0.90	3.15	0.62	0.61
ρ	6.35	8.79	0.70	8.71
ν			3.80	0.33


^†Lower $GD$ values. ^‡ $GD$ values above the benchmark.

6.2. Mroz: labor supply data

In this second application, we reexamine the labor supply dataset originally introduced by Mroz [29], with the goal of estimating the wage offer function for married women using the proposed diagnostic tools. The dataset, referred to as the ‘Mroz data’, consists of observations on 753 married white women across 21 variables and is available in the R package AER [17] via the command data("PSID1976"). The variable of interest, female wage, is missing for 325 (43%) of the 753 women in the sample. To illustrate the diagnostic techniques, we adopt the same set of covariates as Ogundimu and Hutton [30]. Specifically, the logarithm of wage depends on education and city, represented as $x = (1, educ, city)$. The selection equation includes the husband's wage, the number of children aged 5 years or younger, the marginal tax rate of the wife, and the wife's father's educational attainment, in addition to the education and city variables. Thus, $w = (x, huswage, kids5, mtr, fatheduc)$.

We fitted both SLn and SLt models using the HeckmanEM package in R; Table 4 summarizes the parameter estimates and their p-values. Although the statistical significance of the covariates is similar in both models, the SLt model yields a small estimate, $ν = 3.001$, indicating heavy tails and suggesting that the SLn model is inadequate for the Mroz data. Moreover, the estimate of σ decreased from 0.800 in the SLn model to 0.501 in the SLt model. Both models give a strongly negative ρ (−0.780 and −0.733, respectively), indicating non-random sample selection.

Table 4.

Mroz data: ML estimates, standard errors, and information criteria.

	SLn				SLt
Parameter	Estimate	Std. error	z	p	Estimate	Std. error	z	p
Outcome model
`intercept`	0.669	0.239	2.798	0.005	0.332	0.170	1.959	0.051
`educ`	0.066	0.018	3.559	0.000	0.087	0.013	6.719	0.000
`city`	0.107	0.082	1.306	0.192	0.094	0.059	1.602	0.110
Selection model
`intercept`	3.802	0.764	4.975	0.000	5.934	0.953	6.228	0.000
`huswage`	−0.103	0.015	−6.812	0.000	−0.153	0.021	−7.387	0.000
`kids5`	−0.415	0.078	−5.345	0.000	−0.585	0.108	−5.438	0.000
`mtr`	−5.782	0.847	−6.825	0.000	−8.448	1.089	−7.756	0.000
`fatheduc`	−0.020	0.013	−1.617	0.106	−0.012	0.016	−0.793	0.428
`educ`	0.112	0.024	4.653	0.000	0.118	0.029	4.140	0.000
`city`	−0.040	0.107	−0.370	0.712	−0.097	0.123	−0.784	0.433
σ	0.800	0.028			0.501	0.030
ρ	−0.780	0.040			−0.733	0.061
ν					3.001
AIC	1765.604				1678.728
BIC	1770.228				1683.352


Figure 5 presents the normal probability plot of residuals generated by the HeckmanEM package, highlighting a better fit for the SLt model, corroborated by lower AIC and BIC values (as seen in Table 4). Additionally, Figure 6 shows the approximate generalized Cook's distance ( ${GD}_{i}$ ) for the Mroz data in both SLn and SLt models. Interestingly, the SLt model identified only 4 influential points (specifically 84, 176, 369, and 423), representing a 73.3% reduction from the 15 influential points identified by the SLn model.

Figure 6. — Mroz data: approximate generalized Cook's distance ( ${GD}_{i}$ ) for the SLn (left) and SLt (right) fitted models.

Furthermore, a local influence study based on $M (0)$ (Sections 4.2 and 4.3) was conducted. Figure 7 displays results for both SLn (left) and SLt (right) models under various perturbations (case-weight, scale matrix, response, and explanatory variables). Regarding the latter, we exclusively present the perturbation graph for the continuous covariate educ.

Figure 7 shows that several points identified in Figure 6 (${GD}_{i}$) reappear under the case-weight, scale matrix, and explanatory variable (educ) perturbations, but only in the SLn model fit; these points are not prominent in the SLt model fit. As previously discussed, these outcomes align with the inherent characteristics of the SLn and SLt models, and the proposed diagnostic techniques detect these differences. Finally, under response perturbation, the highlighted points follow similar patterns in both model fits.

The influence analyses reinforce the resistance of the SLt model to atypical observations. In particular, these diagnostic tools allow us to quantify how much the ML estimates of $θ$ are affected when a single observation $y_{i}$ is shifted by ξ units. Specifically, we replace $y_{i}$ with $y_{i} (ξ) = y_{i} + ξ$ and compute the relative change in the estimates, $(\hat{θ} (ξ) - \hat{θ}) / \hat{θ}$, where $\hat{θ}$ is the original estimate and $\hat{θ} (ξ)$ is the estimate from the contaminated data. Here, we shifted the observation of subject 369, varying ξ from $- 5$ to 5 in steps of 1. Figure 8 shows the relative changes in the estimate of $β = (β_{0}, β_{1}, β_{2})^{⊤}$, corresponding to (intercept, educ, city) in the outcome model, for each ξ under the SLn and SLt models. As anticipated, the SLt estimates fluctuate less than the SLn estimates as ξ varies.

Figure 8. — Mroz data: relative changes in ML estimates of $β_{0}$ , $β_{1}$ , and $β_{2}$ for SLn and SLt models under different contamination of ξ on subject 369. Percentage change = $100 \times ((\hat{θ} (ξ) - \hat{θ}) / \hat{θ})$ , where $\hat{θ}$ denotes the original estimate and $\hat{θ} (ξ)$ denotes the estimate for the contaminated data.
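The contamination experiment can be sketched in a simpler setting. This is not the Heckman fit: it swaps in a univariate location problem, comparing the normal ML estimate (the sample mean) with the location of a Student-t distribution with fixed ν = 3 fitted by EM, on hypothetical data. One observation is shifted by $ξ = - 5, \dots, 5$ and the relative change (in %) is recorded, mirroring the procedure described for subject 369.

```python
import statistics


def t_location(y, nu=3.0, iters=200):
    """EM estimate of the location of a univariate Student-t with fixed nu (scale also estimated)."""
    mu = statistics.median(y)
    s2 = statistics.pvariance(y)
    for _ in range(iters):
        # E-step: latent weights u_i = (nu + 1) / (nu + d_i^2), with d_i^2 = (y_i - mu)^2 / s2
        u = [(nu + 1.0) / (nu + (yi - mu) ** 2 / s2) for yi in y]
        # M-step: weighted location, then weighted scale
        mu = sum(ui * yi for ui, yi in zip(u, y)) / sum(u)
        s2 = sum(ui * (yi - mu) ** 2 for ui, yi in zip(u, y)) / len(y)
    return mu


def relative_changes(estimator, y, i, xis):
    """100 * (theta(xi) - theta) / theta after replacing y_i with y_i + xi."""
    base = estimator(y)
    out = []
    for xi in xis:
        yc = list(y)
        yc[i] += xi
        out.append(100.0 * (estimator(yc) - base) / base)
    return out


y = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 1.4, 0.6]  # hypothetical data
xis = list(range(-5, 6))
rc_normal = relative_changes(statistics.fmean, y, 0, xis)
rc_t = relative_changes(t_location, y, 0, xis)
for xi, a, b in zip(xis, rc_normal, rc_t):
    print(f"xi={xi:+d}  normal={a:7.2f}%  t={b:7.2f}%")
```

The heavy-tailed fit changes far less because the EM weights $(ν + 1) / (ν + d_{i}^{2})$ shrink as the shifted case moves away from the bulk of the data.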

Diagnostic techniques can also play a significant role in model inference. To illustrate this, we re-estimated the SLn and SLt models on the Mroz dataset after excluding all points identified as influential in the response perturbation analysis (see Figure 7). The revised estimates and p-values are presented in Table 5. A notable change occurred for the covariate fatheduc in the selection equation. In the original SLn fit, its p-value was 0.106 (see Table 4), so it was not significant at the 5% level; after re-estimation, the p-value dropped to 0.042, making the variable statistically significant and altering the corresponding inference. In the SLt model, removing the potentially influential observations did not change the significance of any covariate, which again indicates the resistance of the SLt model. These results underscore the importance of diagnostic tools for detecting potentially influential observations and of studying their effect on the model fit, since they can even change the conclusions of the model, as observed for SLn.

Table 5.

Mroz data: ML estimates, standard errors, and information criteria after excluding influential points identified in response perturbation.

	SLn				SLt
Parameter	Estimate	Std. error	z	p	Estimate	Std. error	z	p
Outcome model
`intercept`	0.749	0.229	3.273	0.001	0.343	0.170	2.016	0.044
`educ`	0.062	0.018	3.484	0.001	0.086	0.013	6.640	0.000
`city`	0.095	0.080	1.190	0.235	0.095	0.059	1.604	0.109
Selection model
`intercept`	3.768	0.743	5.075	0.000	5.910	0.953	6.201	0.000
`huswage`	−0.099	0.014	−6.993	0.000	−0.148	0.021	−7.136	0.000
`kids5`	−0.366	0.072	−5.056	0.000	−0.558	0.108	−5.144	0.000
`mtr`	−5.770	0.825	−6.996	0.000	−8.436	1.090	−7.738	0.000
`fatheduc`	−0.023	0.011	−2.040	0.042	−0.013	0.016	−0.862	0.389
`educ`	0.113	0.024	4.787	0.000	0.117	0.028	4.125	0.000
`city`	−0.056	0.105	−0.533	0.594	−0.103	0.123	−0.837	0.403
σ	0.814	0.027			0.503	0.030
ρ	−0.851	0.028			−0.740	0.060
ν					3.001
AIC	1721.049				1676.327
BIC	1725.661				1680.941


In conclusion, our diagnostic methodology effectively identified influential points in the real-data analyses. It also confirmed that the SLt model is more resistant, substantially reducing the number of influential observations compared with the SLn model.

7. Conclusions

To the best of our knowledge, this article is the first to introduce diagnostic tools designed to identify outliers and influential observations in Heckman selection models, filling a gap in the literature. Specifically, we develop these tools for the SLt and SLn models, which assume that the outcome and the sample-selection variables jointly follow a bivariate Student's-t or bivariate normal distribution, respectively. Our approach uses the Q-function of the EM algorithm for these models. Moreover, the techniques can be extended to any selection model estimated via an EM-type algorithm.

From the real-data analyses and simulation studies, we found that our diagnostic tools effectively distinguish between influential and non-influential observations. Additionally, our findings complement the resistant likelihood-based inference methods for SLt (and SLn) models pioneered by Lachos et al. [19], which are particularly suited to selection-bias scenarios. Further, the simulation study not only showed that our diagnostic tools correctly detect influential observations but also, as a side effect, helped explain why the SLt model is resistant to outliers. Our proposed methodology has been integrated into the R package HeckmanEM, offering practitioners a user-friendly tool for applying these diagnostics in various domains. The package also supports the reproducibility of our results, promoting transparency and reliability in subsequent applications.

Future work includes developing diagnostic tools for other Heckman selection models that rely on EM-type algorithms, e.g. the Heckman selection contaminated normal model (SLcn) introduced by Lim et al. [23], and studying the relationship between the SLn and SLt models and the broader family of extended skew-elliptical distributions [10,11] to build influence diagnostics from a broader perspective.

Appendix.

The following results of matrix differentiation will be used in the proofs of some propositions.

Lemma A.1

Let $A$ be an $n \times n$ symmetric matrix, and let $x$, $t$, and $a$ be $n \times 1$ vectors. Then

$\begin{aligned} \frac{\partial t^{⊤} A t}{\partial t} = 2 A t, \quad \frac{\partial a^{⊤} t}{\partial t} = a, \quad \frac{\partial A t}{\partial t^{⊤}} = A, \quad \frac{\partial A t}{\partial t} = vec (A) . \end{aligned}$

Proof.

The proof can be found in Graham [13].
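The first three identities of Lemma A.1 can be sanity-checked numerically. The following Python sketch compares central finite differences with the closed forms for an arbitrary symmetric $A$ and arbitrary vectors $t$ and $a$ (all hypothetical values). The $vec(A)$ identity depends on the arrangement convention of Graham [13] and is not checked here.

```python
# Finite-difference check of d(t'At)/dt = 2At, d(a't)/dt = a and d(At)/dt' = A.
# A, t and a are hypothetical illustrative values; A must be symmetric.


def mat_vec(A, t):
    """Return the product A t as a list."""
    return [sum(A[i][j] * t[j] for j in range(len(t))) for i in range(len(A))]


def quad_form(A, t):
    """Return the scalar t^T A t."""
    return sum(ti * yi for ti, yi in zip(t, mat_vec(A, t)))


def dot(a, t):
    """Return the scalar a^T t."""
    return sum(ai * ti for ai, ti in zip(a, t))


def num_jacobian(f, t, h=1e-6):
    """Central-difference Jacobian of a vector-valued f, with J[i][j] = d f_i / d t_j."""
    cols = []
    for j in range(len(t)):
        tp, tm = list(t), list(t)
        tp[j] += h
        tm[j] -= h
        cols.append([(p - m) / (2.0 * h) for p, m in zip(f(tp), f(tm))])
    # cols[j] holds d f / d t_j; transpose so that rows index the outputs f_i
    return [list(row) for row in zip(*cols)]


A = [[2.0, 0.5, -1.0], [0.5, 3.0, 0.25], [-1.0, 0.25, 1.5]]  # symmetric
t = [0.3, -1.2, 0.7]
a = [1.0, -2.0, 0.5]

# Gradient of the scalar t^T A t (first row of a 1 x n Jacobian) against 2 A t
grad_quad = num_jacobian(lambda v: [quad_form(A, v)], t)[0]
expected_quad = [2.0 * x for x in mat_vec(A, t)]

# Gradient of the scalar a^T t against a
grad_lin = num_jacobian(lambda v: [dot(a, v)], t)[0]

# Jacobian of the linear map t -> A t against A
jac_At = num_jacobian(lambda v: mat_vec(A, v), t)

print(max(abs(g - e) for g, e in zip(grad_quad, expected_quad)))
```

Central differences are exact for linear and quadratic functions up to floating-point rounding, so the printed discrepancy should be of order $10^{-9}$ or smaller.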

Lemma A.2

Let $A = A (t)$ be an $n \times n$ symmetric positive definite matrix whose entries are differentiable functions of a scalar $t$. Then,

$\begin{aligned} \frac{\partial A^{- 1}}{\partial t} & = - A^{- 1} \frac{\partial A}{\partial t} A^{- 1}, \frac{\partial tr (A)}{\partial t} = tr (\frac{\partial A}{\partial t}), \\ \frac{\partial | A |}{\partial t} & = | A | tr (A^{- 1} \frac{\partial A}{\partial t}), \frac{\partial \log | A |}{\partial t} = tr (A^{- 1} \frac{\partial A}{\partial t}) . \end{aligned}$

Proof.

The proof can be found in Graham [13].
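The identities of Lemma A.2 can be checked in the same way. This Python sketch uses a hypothetical $2 \times 2$ symmetric positive definite family $A(t) = \begin{pmatrix} 2 + t^{2} & 0.3t \\ 0.3t & 1 + t \end{pmatrix}$ at $t = 0.5$ and compares central differences with the closed forms for $\partial A^{-1}/\partial t$, $\partial tr(A)/\partial t$, $\partial |A|/\partial t$ and $\partial \log |A|/\partial t$.

```python
import math

# Finite-difference check of Lemma A.2 for a hypothetical symmetric
# positive definite A(t) = [[2 + t^2, 0.3 t], [0.3 t, 1 + t]].


def A_of(t):
    return [[2.0 + t * t, 0.3 * t], [0.3 * t, 1.0 + t]]


def dA_of(t):
    # Entrywise derivative dA/dt
    return [[2.0 * t, 0.3], [0.3, 1.0]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def matmul2(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def trace2(M):
    return M[0][0] + M[1][1]


def central_diff(f, t, h=1e-5):
    return (f(t + h) - f(t - h)) / (2.0 * h)


t = 0.5
Ainv, dA = inv2(A_of(t)), dA_of(t)
tr_Ainv_dA = trace2(matmul2(Ainv, dA))

# d log|A|/dt = tr(A^{-1} dA/dt) and d|A|/dt = |A| tr(A^{-1} dA/dt)
num_logdet = central_diff(lambda s: math.log(det2(A_of(s))), t)
num_det = central_diff(lambda s: det2(A_of(s)), t)

# d tr(A)/dt = tr(dA/dt)
num_trace = central_diff(lambda s: trace2(A_of(s)), t)

# dA^{-1}/dt = -A^{-1} (dA/dt) A^{-1}, checked entry by entry
num_dinv = [[central_diff(lambda s, i=i, j=j: inv2(A_of(s))[i][j], t) for j in range(2)]
            for i in range(2)]
ana_dinv = [[-x for x in row] for row in matmul2(matmul2(Ainv, dA), Ainv)]

print(num_logdet, tr_Ainv_dA)
```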

Proof of Proposition 4.1.

Starting from Equation (24), we can express $Q (θ, υ ∣ \hat{θ})$ as a summation: $Q (θ, υ ∣ \hat{θ}) = \sum_{i = 1}^{n} υ_{i} Q_{i} (θ ∣ \hat{θ})$. Substituting $Q_{i} (θ ∣ {\hat{θ}}^{(k)})$ as defined in (19) and omitting the superscript $(k)$ for simplicity, we obtain:

$\begin{aligned} Q (θ, υ ∣ \hat{θ}) = \sum_{i = 1}^{n} Q_{i} (θ, υ ∣ \hat{θ}) = \sum_{i = 1}^{n} [- \frac{1}{2} υ_{i} \ln | Σ | - \frac{1}{2} υ_{i} tr (\hat{Γ_{i}} Σ^{- 1})], \end{aligned}$

where $\hat{Γ_{i}} = \hat{u y_{i}^{2}} - \hat{u y_{i}} μ_{i}^{⊤} - μ_{i} (\hat{u y_{i}})^{⊤} + \hat{u_{i}} μ_{i} μ_{i}^{⊤}$ and $μ_{i} = X_{ic} β_{c}$. Applying Lemmas A.1 and A.2 and writing $B = \frac{\partial Σ}{\partial σ^{2}}$ and $D = \frac{\partial Σ}{\partial ρ}$, we have

$\frac{\partial Q_{i} (θ, υ ∣ \hat{θ})}{\partial β_{c}} = υ_{i} [X_{ic}^{⊤} Σ^{- 1} \hat{u y_{i}} - \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} X_{ic} β_{c}],$

$\frac{\partial Q_{i} (θ, υ ∣ \hat{θ})}{\partial σ^{2}} = - \frac{1}{2} υ_{i} tr (Σ^{- 1} B) + \frac{1}{2} υ_{i} tr (\hat{Γ_{i}} Σ^{- 1} B Σ^{- 1}),$ and

$\frac{\partial Q_{i} (θ, υ ∣ \hat{θ})}{\partial ρ} = - \frac{1}{2} υ_{i} tr (Σ^{- 1} D) + \frac{1}{2} υ_{i} tr (\hat{Γ_{i}} Σ^{- 1} D Σ^{- 1}) .$

Differentiating with respect to $υ^{⊤}$ and evaluating at $θ = \hat{θ} (υ_{0})$, we obtain:

$\begin{aligned} \nabla_{β_{c}} & = \left. \frac{\partial^{2} Q (θ, υ ∣ \hat{θ})}{\partial β_{c} \partial υ^{⊤}} \right|_{θ = \hat{θ} (υ_{0})} = \sum_{i = 1}^{n} \left. \frac{\partial^{2} Q_{i} (θ, υ ∣ \hat{θ})}{\partial β_{c} \partial υ^{⊤}} \right|_{θ = \hat{θ} (υ_{0})} = \sum_{i = 1}^{n} [X_{ic}^{⊤} Σ^{- 1} \hat{u y_{i}} - \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} X_{ic} β_{c}], \\ \nabla_{σ^{2}} & = \sum_{i = 1}^{n} [- \frac{1}{2} tr (Σ^{- 1} B) + \frac{1}{2} tr (\hat{Γ_{i}} Σ^{- 1} B Σ^{- 1})], \\ \nabla_{ρ} & = \sum_{i = 1}^{n} [- \frac{1}{2} tr (Σ^{- 1} D) + \frac{1}{2} tr (\hat{Γ_{i}} Σ^{- 1} D Σ^{- 1})] . \end{aligned}$
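The $σ^{2}$ and $ρ$ derivatives above can be verified numerically for a single observation. The Python sketch below ASSUMES the selection-model scale matrix $Σ = \begin{pmatrix} σ^{2} & ρσ \\ ρσ & 1 \end{pmatrix}$ (selection variance fixed at 1), so that $B = \begin{pmatrix} 1 & ρ/(2σ) \\ ρ/(2σ) & 0 \end{pmatrix}$ and $D = \begin{pmatrix} 0 & σ \\ σ & 0 \end{pmatrix}$. The E-step moments $\hat{u_{i}}$, $\hat{u y_{i}}$, $\hat{u y_{i}^{2}}$ and the mean $μ_{i}$ are hypothetical numbers, and $σ$, $ρ$ are set close to the SLn estimates in Table 5. It is a sketch for checking the algebra, not the HeckmanEM implementation.

```python
import math

# Check of the case-weight derivatives in the proof of Proposition 4.1 for one i.
# ASSUMPTION: Sigma = [[s2, rho*sigma], [rho*sigma, 1]]; E-step moments are hypothetical.


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def tr(M):
    return M[0][0] + M[1][1]


def sigma_mat(s2, rho):
    s = math.sqrt(s2)
    return [[s2, rho * s], [rho * s, 1.0]]


def gamma_hat(uy2, uy, u, mu):
    """Gamma_i = uy2 - uy mu^T - mu uy^T + u mu mu^T."""
    return [[uy2[i][j] - uy[i] * mu[j] - mu[i] * uy[j] + u * mu[i] * mu[j]
             for j in range(2)] for i in range(2)]


def q_i(s2, rho, v, G):
    """Case-weight perturbed Q_i = -v/2 ln|Sigma| - v/2 tr(Gamma_i Sigma^{-1})."""
    S = sigma_mat(s2, rho)
    return -0.5 * v * math.log(det2(S)) - 0.5 * v * tr(mm(G, inv2(S)))


def dq_analytic(s2, rho, v, G, wrt):
    """-v/2 tr(Si M) + v/2 tr(G Si M Si) with M = B (wrt='s2') or M = D (wrt='rho')."""
    Si = inv2(sigma_mat(s2, rho))
    s = math.sqrt(s2)
    if wrt == "s2":
        M = [[1.0, rho / (2.0 * s)], [rho / (2.0 * s), 0.0]]  # B = dSigma/ds2
    else:
        M = [[0.0, s], [s, 0.0]]                              # D = dSigma/drho
    return -0.5 * v * tr(mm(Si, M)) + 0.5 * v * tr(mm(mm(mm(G, Si), M), Si))


G = gamma_hat(uy2=[[2.0, 0.4], [0.4, 1.2]], uy=[0.8, 0.3], u=1.1, mu=[0.5, -0.2])
s2, rho, v, h = 0.814 ** 2, -0.851, 1.0, 1e-6

num_s2 = (q_i(s2 + h, rho, v, G) - q_i(s2 - h, rho, v, G)) / (2 * h)
num_rho = (q_i(s2, rho + h, v, G) - q_i(s2, rho - h, v, G)) / (2 * h)
# Q_i is linear in v, so d^2 Q_i / (ds2 dv) is the unweighted score (the v = 1 value)
mixed_s2 = (dq_analytic(s2, rho, v + h, G, "s2") - dq_analytic(s2, rho, v - h, G, "s2")) / (2 * h)

print(num_s2, dq_analytic(s2, rho, v, G, "s2"))
```

Because each $Q_{i}$ enters linearly in $υ_{i}$, the mixed derivative is simply the unweighted score term, which is why the summands of $\nabla_{σ^{2}}$ and $\nabla_{ρ}$ carry no $υ_{i}$.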

Proof of Proposition 4.2.

The perturbed Q-function is as defined in (18), where $Σ (υ_{i}) = υ_{i}^{- 1} Σ$ is used in place of $Σ$ . Therefore, we have:

$\begin{aligned} Q (θ, υ ∣ \hat{θ}) = \sum_{i = 1}^{n} Q_{i} (θ, υ ∣ \hat{θ}) = \sum_{i = 1}^{n} [- \frac{1}{2} \ln | υ_{i}^{- 1} Σ | - \frac{1}{2} tr (\hat{Γ_{i}} υ_{i} Σ^{- 1})] . \end{aligned}$

Using Lemmas A.1 and A.2 and following steps analogous to those in the proof of Proposition 4.1, the result follows.
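For the scale perturbation, the $υ_{i}$-dependent part of the log-determinant, $\ln | υ_{i}^{-1} Σ | = \ln | Σ | - 2 \ln υ_{i}$ for a $2 \times 2$ matrix, does not involve $θ$, so the mixed derivative with respect to $σ^{2}$ and $υ_{i}$ reduces to $\frac{1}{2} tr (\hat{Γ_{i}} Σ^{- 1} B Σ^{- 1})$. The Python sketch below checks this with a nested central difference at $υ_{i} = 1$, where $Σ(υ_{i}) = Σ$. It makes the same ASSUMED parameterisation $Σ = \begin{pmatrix} σ^{2} & ρσ \\ ρσ & 1 \end{pmatrix}$ as before and uses a hypothetical $\hat{Γ_{i}}$.

```python
import math

# Scale perturbation (Proposition 4.2) for one observation: Sigma(v) = Sigma / v.
# ASSUMPTION: Sigma = [[s2, rho*sigma], [rho*sigma, 1]]; Gamma_i is hypothetical.


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def inv2(M):
    d = det2(M)
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]


def mm(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def tr(M):
    return M[0][0] + M[1][1]


def sigma_mat(s2, rho):
    s = math.sqrt(s2)
    return [[s2, rho * s], [rho * s, 1.0]]


def q_scale(s2, rho, v, G):
    """Q_i with Sigma replaced by Sigma(v) = v^{-1} Sigma (log-det taken of the scaled matrix)."""
    S_v = [[x / v for x in row] for row in sigma_mat(s2, rho)]
    return -0.5 * math.log(det2(S_v)) - 0.5 * tr(mm(G, inv2(S_v)))


def mixed_fd(f, x, y, h=1e-4):
    """Central-difference estimate of d^2 f / (dx dy)."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4.0 * h * h)


G = [[1.5, 0.2], [0.2, 0.9]]           # hypothetical Gamma_i
s2, rho, v0 = 0.814 ** 2, -0.851, 1.0  # v0 = 1 leaves Sigma unchanged

numeric = mixed_fd(lambda a, b: q_scale(a, rho, b, G), s2, v0)

Si = inv2(sigma_mat(s2, rho))
s = math.sqrt(s2)
B = [[1.0, rho / (2.0 * s)], [rho / (2.0 * s), 0.0]]  # dSigma/ds2 under the assumed form
analytic = 0.5 * tr(mm(mm(mm(G, Si), B), Si))

print(numeric, analytic)
```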

Proof of Proposition 4.3.

The perturbed Q-function follows (18), with $y_{i}$ replaced by $y_{i} (υ) = y_{i} + υ_{i} 1_{2}$. Therefore, the perturbed Q-function is expressed as

$\begin{aligned} Q (θ, υ ∣ \hat{θ}) = \sum_{i = 1}^{n} Q_{i} (θ, υ ∣ \hat{θ}) = \sum_{i = 1}^{n} [- \frac{1}{2} \ln | Σ | - \frac{1}{2} tr (\hat{Γ_{i}} Σ^{- 1})], \end{aligned}$

where $\hat{Γ_{i}}$ is updated according to the proposed perturbation, namely $\hat{Γ_{i}} = \hat{u y_{i}^{2}} - \hat{u y_{i}} μ_{i}^{⊤} - μ_{i} (\hat{u y_{i}})^{⊤} + \hat{u_{i}} μ_{i} μ_{i}^{⊤} - 2 (\hat{u y_{i}})^{⊤} υ_{i} 1_{2} + 2 \hat{u_{i}} υ_{i}^{2} + 2 \hat{u_{i}} υ_{i} μ_{i}^{⊤} 1_{2}$. Applying Lemmas A.1 and A.2, with $B$ and $D$ as in the proof of Proposition 4.1, we obtain

$\frac{\partial Q_{i} (θ, υ ∣ \hat{θ})}{\partial β_{c}} = X_{ic}^{⊤} Σ^{- 1} \hat{u y_{i}} - \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} (υ_{i} 1_{2} + μ_{i}),$

$\frac{\partial Q_{i} (θ, υ ∣ \hat{θ})}{\partial σ^{2}} = - \frac{1}{2} tr (Σ^{- 1} B) + \frac{1}{2} tr (\hat{Γ_{i}} Σ^{- 1} B Σ^{- 1}),$ and

$\frac{\partial Q_{i} (θ, υ ∣ \hat{θ})}{\partial ρ} = - \frac{1}{2} tr (Σ^{- 1} D) + \frac{1}{2} tr (\hat{Γ_{i}} Σ^{- 1} D Σ^{- 1}),$

with the updated $\hat{Γ_{i}}$. Differentiating again with respect to $υ^{⊤}$ and evaluating at $θ = \hat{θ} (υ_{0})$, we obtain:

$\begin{aligned} \nabla_{β_{c}} & = \left. \frac{\partial^{2} Q (θ, υ ∣ \hat{θ})}{\partial β_{c} \partial υ^{⊤}} \right|_{θ = \hat{θ} (υ_{0})} = \sum_{i = 1}^{n} \left. \frac{\partial^{2} Q_{i} (θ, υ ∣ \hat{θ})}{\partial β_{c} \partial υ^{⊤}} \right|_{θ = \hat{θ} (υ_{0})} = - \sum_{i = 1}^{n} \hat{u_{i}} X_{ic}^{⊤} Σ^{- 1} 1_{2}, \\ \nabla_{σ^{2}} & = \sum_{i = 1}^{n} [\frac{1}{2} tr ({\hat{Γ}}_{i}^{†} Σ^{- 1} B Σ^{- 1})], \\ \nabla_{ρ} & = \sum_{i = 1}^{n} [\frac{1}{2} tr ({\hat{Γ}}_{i}^{†} Σ^{- 1} D Σ^{- 1})], \end{aligned}$

where ${\hat{Γ}}_{i}^{†} = {- 2 (\hat{u y_{i}})^{⊤} 1_{2} + 2 \hat{u_{i}} μ_{i}^{⊤} 1_{2}}$ .

Proof of Proposition 4.4.

Consider the perturbed explanatory matrix

$\begin{aligned} X_{ic} (υ) = (\begin{array}{cc} x_{i}^{⊤} (υ_{i}) & 0 \\ 0 & w_{i}^{⊤} \end{array}), \end{aligned}$

where $x_{i}^{⊤} (υ_{i}) = x_{i}^{⊤} + υ_{i} 1_{u}^{⊤}$. Here, $1_{u}^{⊤} = (0, \dots, 1, \dots, 0)$ is a $1 \times p$ vector with 1 in the $u$th position, $u = 1, \dots, p$. The perturbed Q-function is defined as in (18), with $X_{ic}$ replaced by $X_{ic} (υ)$. The unperturbed case corresponds to $υ_{0} = 0 \in \mathbb{R}^{n}$. After this substitution, ${\hat{Γ}}_{i}$ is updated accordingly, following the procedure outlined in the proof of Proposition 4.3. The conclusion follows by applying Lemmas A.1 and A.2 and repeating the steps of the preceding proofs.
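The perturbed explanatory matrix is easy to construct explicitly. The Python sketch below builds $X_{ic}(υ)$ for one observation, with the outcome covariates in the first row and the selection covariates in the second. It ASSUMES that $β_{c}$ stacks the outcome and selection coefficients in that order, so that $μ_{i} = X_{ic} β_{c}$. The helper name `perturbed_design`, the covariate values and the coefficients are all hypothetical. Note that the column index `u` is 0-based in the code, whereas $u = 1, \dots, p$ in the text.

```python
# Hypothetical helper building X_ic(v) for the explanatory variable perturbation.


def perturbed_design(x, w, v, u):
    """Return X_ic(v): row 1 = (x_i^T + v * 1_u^T, 0), row 2 = (0, w_i^T); u is 0-based."""
    p, q = len(x), len(w)
    x_pert = [xj + (v if j == u else 0.0) for j, xj in enumerate(x)]
    return [x_pert + [0.0] * q, [0.0] * p + list(w)]


def mat_vec(X, b):
    """Return X b for a list-of-rows matrix X."""
    return [sum(xij * bj for xij, bj in zip(row, b)) for row in X]


x = [1.0, 12.0]                          # outcome covariates (hypothetical)
w = [1.0, 3.2, 0.0]                      # selection covariates (hypothetical)
beta_c = [0.75, 0.06, 3.7, -0.10, -0.37]  # stacked (outcome, selection) coefficients (hypothetical)

X0 = perturbed_design(x, w, 0.0, 1)      # v = 0 reproduces the unperturbed X_ic
X1 = perturbed_design(x, w, 0.5, 1)      # shift the second outcome covariate by 0.5
mu0, mu1 = mat_vec(X0, beta_c), mat_vec(X1, beta_c)

# Only the outcome component of mu_i moves, by v times the matching coefficient
print([m1 - m0 for m0, m1 in zip(mu0, mu1)])
```

The perturbation affects only the outcome row, so the selection component of $μ_{i}$ is unchanged and the outcome component shifts by $υ_{i}$ times the $u$th outcome coefficient.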

Funding Statement

The research conducted by Marcos S. Oliveira was supported by Grant no. 401418/2022-7 from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). Marcos O. Prates acknowledges support from CNPq grant 309186/2021-8, Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) grant APQ-01837-22, and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). Christian E. Galarza acknowledges support from the ESPOL Dean of Research. Victor Lachos acknowledges partial financial support from the UConn CLAS Summer Research Funding Initiative 2023 and the UConn Research Excellence Program.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Arellano-Valle R., Branco M., and Genton M., A unified view on skewed distributions arising from selections, Can. J. Stat. 34 (2006), pp. 581–601.
2. Barros M., Galea M., González M., and Leiva V., Influence diagnostics in the Tobit censored response model, Stat. Methods Appl. 19 (2010), pp. 379–397.
3. Bastos F.S. and Barreto-Souza W., Birnbaum–Saunders sample selection model, J. Appl. Stat. 48 (2021), pp. 1896–1916.
4. Bastos F.S., Barreto-Souza W., and Genton M.G., A generalized Heckman model with varying sample selection bias and dispersion parameters, Stat. Sin. 32 (2022), pp. 1911–1938.
5. Cameron A.C. and Trivedi P.K., Microeconometrics Using Stata, Vol. 5, Stata Press, College Station, TX, 2009.
6. Cook R.D., Detection of influential observation in linear regression, Technometrics 19 (1977), pp. 15–18.
7. Cook R.D., Assessment of local influence, J. R. Stat. Soc. Ser. B 48 (1986), pp. 133–169.
8. Cook R.D. and Weisberg S., Residuals and Influence in Regression, Chapman & Hall/CRC, Boca Raton, FL, 1982.
9. Ding P., Bayesian robust inference of sample selection using selection-t models, J. Multivar. Anal. 124 (2014), pp. 451–464.
10. Galarza C.E., Matos L.A., Castro L.M., and Lachos V.H., Moments of the doubly truncated selection elliptical distributions with emphasis on the unified multivariate skew-t distribution, J. Multivar. Anal. 189 (2022), p. 104944.
11. Galarza C.E., Matos L.A., Dey D.K., and Lachos V.H., On moments of folded and doubly truncated multivariate extended skew-normal distributions, J. Comput. Graph. Stat. 31 (2022), pp. 455–465.
12. Garay A., Castro L., Leskow J., and Lachos V.H., Censored linear regression models for irregularly observed longitudinal data using the multivariate-t distribution, Stat. Methods Med. Res. 26 (2014), pp. 542–566.
13. Graham A., Kronecker Products and Matrix Calculus: With Applications, Ellis Horwood series in mathematics and its applications, Horwood, 1981.
14. Heckman J., Shadow prices, market wages, and labor supply, Econometrica 42 (1974), pp. 679–694.
15. Heckman J., Sample selection bias as a specification error, Econometrica 47 (1979), pp. 153–161.
16. Henningsen A., Toomet O., and Petersen S., sampleSelection: Sample selection models, R Package Version 1.2-0 https://cran.r-project.org/web/packages/sampleSelection/index.html (2019).
17. Kleiber C. and Zeileis A., Applied Econometrics with R, Springer-Verlag, New York, 2008.
18. Lachos V.H., Ghosh P., and Arellano-Valle R.B., Likelihood based inference for skew-normal independent linear mixed models, Stat. Sin. 20 (2010), pp. 303–322.
19. Lachos V.H., Prates M.O., and Dey D.K., Heckman selection-t model: Parameter estimation via the EM-algorithm, J. Multivar. Anal. 184 (2021), p. 104737.
20. Lee L.F., Generalized econometric models with selectivity, Econometrica 51 (1983), pp. 507–512.
21. Lee M.-j., Treatment effects in sample selection models and their nonparametric estimation, J. Econom. 167 (2012), pp. 317–329.
22. Lee S.Y. and Xu L., Influence analysis of nonlinear mixed-effects models, Comput. Stat. Data Anal. 45 (2004), pp. 321–341.
23. Lim H., Ordonez J.A., Lachos V.H., and Punzo A., Heckman selection contaminated normal model, arXiv preprint arXiv:2409.12348 (2024).
24. Marchenko Y.V. and Genton M.G., A Heckman selection-t model, J. Am. Stat. Assoc. 107 (2012), pp. 304–317.
25. Massuia M.B., Cabral C.R.B., Matos L.A., and Lachos V.H., Influence diagnostics for Student-t censored linear regression models, Statistics 49 (2015), pp. 1074–1094.
26. Matos L.A., Lachos V.H., Balakrishnan N., and Labra F.V., Influence diagnostics in linear and nonlinear mixed-effects models with censored data, Comput. Stat. Data Anal. 57 (2013), pp. 450–464.
27. Matos L.A., Prates M.O., Chen M.H., and Lachos V.H., Likelihood-based inference for mixed-effects models with censored response using the multivariate-t distribution, Stat. Sin. 23 (2013), pp. 1323–1342.
28. Miao W., Ding P., and Geng Z., Identifiability of normal and normal mixture models with nonignorable missing data, J. Am. Stat. Assoc. 111 (2016), pp. 1673–1683.
29. Mroz T.A., The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions, Econometrica 55 (1987), pp. 765–799.
30. Ogundimu E.O. and Hutton J.L., A sample selection model with skew-normal distribution, Scand. J. Stat. 43 (2016), pp. 172–190.
31. Pan J., Fei Y., and Foster P., Case-deletion diagnostics for linear mixed models, Technometrics 56 (2014), pp. 269–281.
32. Poon W.Y. and Poon Y.S., Conformal normal curvature and assessment of local influence, J. R. Stat. Soc. Ser. B 61 (1999), pp. 51–61.
33. Saulo H., Vila R., Cordeiro S.S., and Leiva V., Bivariate symmetric Heckman models and their characterization, J. Multivar. Anal. 193 (2023), p. 105097.
34. Vaida F. and Liu L., Fast implementation for normal mixed effects models with censored response, J. Comput. Graph. Stat. 18 (2009), pp. 797–817.
35. Valeriano K.A., Galarza C.E., Matos L.A., and Lachos V.H., Likelihood-based inference for the multivariate skew-t regression with censored or missing responses, J. Multivar. Anal. 196 (2023), p. 105174.
36. Zhao J., Kim H.-J., and Kim H.-M., New EM-type algorithms for the Heckman selection model, Comput. Stat. Data Anal. 146 (2020), p. 106930.
37. Zhu H., Ibrahim J.G., and Shi X., Diagnostic measures for generalized linear models with missing covariates, Scand. J. Stat. 36 (2009), pp. 686–712.
38. Zhu H. and Lee S., Local influence for incomplete-data models, J. R. Stat. Soc. Ser. B 63 (2001), pp. 111–126.
39. Zhu H., Lee S., Wei B., and Zhou J., Case-deletion measures for models with incomplete data, Biometrika 88 (2001), pp. 727–737.
