Linear censored regression models with skew scale mixtures of normal distributions

Daniel C F Guzmán; Clécio S Ferreira; Camila B Zeller

doi:10.1080/02664763.2020.1795814

2020 Jul 21;48(16):3060–3085. doi: 10.1080/02664763.2020.1795814

PMCID: PMC9041878 PMID: 35707255

Abstract

A particular source of difficulty in statistical analysis is the possibility that some subjects may not have a complete observation of the response variable. Such incomplete observation of the response variable is called censoring. Censoring can occur for a variety of reasons, including limitations of measurement equipment, design of the experiment, and non-occurrence of the event of interest until the end of the study. In the presence of censoring, the dependence of the response variable on the explanatory variables can be explored through regression analysis. In this paper, we examine the censoring problem in an asymmetric context: we propose a linear regression model with censored responses based on skew scale mixtures of normal distributions. We develop a Monte Carlo EM (MCEM) algorithm to perform maximum likelihood inference for the parameters of the proposed model. The MCEM algorithm is discussed with an emphasis on the skew-normal, skew Student-t-normal, skew-slash and skew-contaminated normal distributions. To examine the performance of the proposed method, we present some simulation studies and analyze a real dataset.

Keywords: Censoring, MCEM algorithm, linear regression models, skew scale mixtures of normal distributions

1. Introduction

Regression models with a censored dependent variable are applied in many fields; one example is censoring in astronomical data due to nondetections. According to Feigelson and Babu [10], due to limited sensitivities, some objects may be undetected, leading to upper limits in their derived luminosities. It is emphasized that, even when censored, all results from a study should be used in statistical analysis. Ignoring censoring can lead to misleading conclusions. The importance of censoring can be seen from the large number of articles that have been published in various journals. For instance, Arellano-Valle et al. [2] and Massuia et al. [20] proposed an extension of the CR model (regression model with censored dependent variable) with normal errors (N-CR model) to Student-t (T-CR model) errors. Garay et al. [15] proposed a CR model where the observational errors follow the scale mixtures of normal (SMN) distributions (SMN-CR models) introduced by Andrews and Mallows [1]. Although these models are attractive, we now turn our attention away from Gaussian (or symmetric) models and study other important models. In the asymmetric context, Massuia et al. [21] developed a Bayesian framework for CR models by assuming that the random errors follow SMSN distributions [6]. Recently, Mattos et al. [23] proposed CR models based on the SMSN class of distributions from a likelihood-based perspective (SMSN-CR models).

It is important to note that there is another family of distributions that takes into account asymmetry and heavy tails simultaneously, introduced by Ferreira et al. [11], called skew scale mixtures of normal distributions (SSMN). There are some important differences between the SSMN and SMSN classes of distributions. First, the mechanisms for generating random samples are slightly different, which produces different structures of distributions. Second, these classes present different coefficients of asymmetry and kurtosis. Thus, it is interesting to investigate the performance of the two classes under certain specific models, such as linear censored regression models. In this work, we propose linear regression models based on SSMN distributions for censored data, extending the works of Arellano-Valle et al. [2], Massuia et al. [20] and Garay et al. [15], and supplementing the work of Massuia et al. [21]. Therefore, in this article, we provide some additional results for the censoring problem in a context of asymmetry.

The rest of the paper is organized as follows. In Section 2, we introduce the linear censored regression models with skew scale mixtures of normal distributions, called SSMN-CR models, including an EM-type algorithm for maximum-likelihood (ML) estimation. In Section 3, we describe how to obtain the standard errors of the ML estimates of the parameters in the SSMN-CR models. In Section 4, we present the results of simulation studies that explore the proposed CR model in different contexts, i.e., performance of the ML estimates, parameter recovery and selection criteria, imputation of censored observations and influence of a single outlier. In Section 5, we show the applicability of our proposal by analyzing a real dataset. Finally, Section 6 presents some concluding remarks. All computations were carried out using the R software [24]. R code for analyzing the application may be downloaded from the website https://github.com/ClecioFerreira/SSMN-CR.

2. The proposed model

2.1. Skew scale mixtures of normal distributions

In this section, we present the SSMN distributions introduced by Ferreira et al. [11]. We start with the definition of the skew-normal (SN) distribution that will be used in this work; see Azzalini [4] for more details. A random variable Y has a skew-normal distribution, denoted $Y \sim SN (μ, σ^{2}, λ)$, if its probability density function (pdf) is given by

f (y | μ, σ^{2}, λ) = 2 φ (y | μ, σ^{2}) Φ (\frac{λ (y - μ)}{σ}), y \in R,

(1)

where $φ (x | μ, σ^{2})$ and $Φ (x | μ, σ^{2})$ are the pdf and the cumulative distribution function (cdf), respectively, of the $N (μ, σ^{2})$ distribution evaluated at x, and $Φ (\cdot)$ with a single argument denotes the standard normal cdf. The cdf of Y is given by

F_{S N} (y | μ, σ^{2}, λ) = 2 Φ_{2} ((\frac{y - μ}{σ}, 0), 0, Σ),

(2)

where $Σ = \begin{pmatrix} 1 & - δ \\ - δ & 1 \end{pmatrix}$ and $δ = \frac{λ}{(1 + λ^{2})^{1 / 2}}$. Its stochastic representation is given by

Y \overset{d}{=} μ + σ [δ ∣ T_{0} ∣ + (1 - δ^{2})^{1 / 2} T_{1}],

(3)

where $∣ T_{0} ∣$ denotes the absolute value of $T_{0}$ , $T_{0} \sim N (0, 1)$ and $T_{1} \sim N (0, 1)$ are independent, and $\overset{d}{=}$ means ‘distributed as’. One particular case of this distribution is the normal distribution $(Y \sim N (μ, σ^{2})),$ when $λ = 0$ .
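As an illustration of the stochastic representation (3), the following minimal Python sketch draws skew-normal variates using only the standard library. The function names `rsn` and `sn_mean` are hypothetical and are not taken from the authors' R code; the mean formula uses the standard fact that $E ∣ T_{0} ∣ = (2 / π)^{1 / 2}$.

```python
import math
import random


def rsn(n, mu=0.0, sigma2=1.0, lam=0.0, rng=None):
    """Draw n values from SN(mu, sigma2, lam) via representation (3).

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    sigma = math.sqrt(sigma2)
    delta = lam / math.sqrt(1.0 + lam * lam)
    out = []
    for _ in range(n):
        t0 = abs(rng.gauss(0.0, 1.0))  # |T0|, half-normal
        t1 = rng.gauss(0.0, 1.0)       # T1, independent of T0
        out.append(mu + sigma * (delta * t0 + math.sqrt(1.0 - delta * delta) * t1))
    return out


def sn_mean(mu, sigma2, lam):
    """Mean implied by (3): mu + sigma * delta * sqrt(2 / pi)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    return mu + math.sqrt(sigma2) * delta * math.sqrt(2.0 / math.pi)


if __name__ == "__main__":
    draws = rsn(50000, mu=1.0, sigma2=4.0, lam=3.0, rng=random.Random(1))
    print(sum(draws) / len(draws), sn_mean(1.0, 4.0, 3.0))
```

With $λ = 0$ the sketch reduces to ordinary normal sampling, which matches the particular case noted above.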

In a symmetric context, Lange and Sinsheimer [18] provided a group of thick-tailed distributions that also has the normal distribution as a particular case. A random variable Y has a scale mixture of normal (SMN) distribution, denoted $Y \sim SMN (μ, σ^{2}, H; κ)$, if its pdf assumes the form

f_{0} (y | μ, σ^{2}, H) = \int_{0}^{\infty} φ (y | μ, κ (u) σ^{2}) d H (u | τ),

(4)

where $H (u | τ)$ is the cdf of a positive random variable U indexed by the parameter vector $τ$ and $κ (.)$ is a strictly positive function. Moreover, when $μ = 0$ and $σ^{2} = 1$ , we denote $Y \sim S M N (H; κ)$ .

An asymmetric version of SMN distributions was introduced by Ferreira et al. [11] as a challenging family for statistical procedures with asymmetric data. This new family of distributions contains all the distributions studied by Lange and Sinsheimer [18] with an extra parameter, which regulates the skewness of the distribution.

Definition 2.1

A random variable Y follows an SSMN distribution with location parameter $μ \in R$ , scale factor $σ^{2}$ and skewness parameter $λ \in R$ , if its pdf is given by

$f_{S S M N} (y | μ, σ^{2}, λ, H) = 2 f_{0} (y | μ, σ^{2}, H) Φ (\frac{λ (y - μ)}{σ}),$ (5)

where $f_{0} (y | μ, σ^{2}, H)$ is as defined in Equation (4). For a random variable with pdf as in Equation (5), we use the notation $Y \sim SSMN (μ, σ^{2}, λ, H; κ)$. If $μ = 0$ and $σ^{2} = 1$, we refer to it as a standard SSMN distribution and denote it by $SSMN (λ, H; κ)$. If $λ = 0$, we recover the SMN distributions.

Note that if $Y \sim S S M N (μ, σ^{2}, λ, H; κ)$ , then $Z = (Y - μ) / σ \sim S S M N (λ, H; κ)$ . According to Ferreira et al. [12], a random variable $Y \sim S S M N (μ, σ^{2}, λ, H; κ)$ has the following stochastic representation

\begin{array}{rcl} Y & = & μ + σ [\frac{λ ∣ T_{0} ∣}{[U (U + λ^{2})]^{\frac{1}{2}}} + \frac{T_{1}}{(U + λ^{2})^{\frac{1}{2}}}], \end{array}

(6)

where $U \sim H (τ)$, $T_{0} \sim N (0, 1)$ and $T_{1} \sim N (0, 1)$ are mutually independent. The stochastic representation in Equation (6) makes it easy to generate random samples from the truncated SSMN distributions, whose definition is given below (see Definition 2.2).

An SSMN random variable Y with a pdf as in Equation (5) has a hierarchical representation given by Proposition 2.1. The proof can be found in Ferreira et al. [11].

Proposition 2.1

Let $Y \sim S S M N (μ, σ^{2}, λ, H; κ)$ . Then its hierarchical representation is given by

$\begin{aligned} Y | U = u & \sim S N (μ, σ^{2} κ (u), λ κ^{1 / 2} (u)), \\ U & \sim H (τ) . \end{aligned}$

From Proposition 2.1, this convenient hierarchical representation facilitates an EM-type implementation of maximum-likelihood estimation, and it can be used to simulate data. For example, the skew Student-t-normal distribution is derived from Proposition 2.1 by taking $κ (u) = u^{- 1}$ and $U \sim Gamma (τ / 2, τ / 2), τ > 0,$ and the conditional distribution $Y | U$ is obtained from the stochastic representation given in (3). This class of distributions also includes the skew-normal distribution when U = 1.
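The hierarchical construction can be sketched in a few lines of Python for the skew Student-t-normal case, combining $U \sim Gamma (τ / 2, τ / 2)$ with the representation in Equation (6). This is only an illustrative sketch: the function name `rstn` is hypothetical, and the Gamma distribution is assumed to be parameterized by shape and rate, so the rate $τ / 2$ is converted to the scale $2 / τ$ expected by `random.gammavariate`.

```python
import math
import random


def rstn(n, mu, sigma2, lam, tau, rng=None):
    """Draw n skew Student-t-normal variates via Equation (6).

    Assumption: Gamma(tau/2, tau/2) means shape tau/2 and rate tau/2.
    """
    rng = rng or random.Random()
    sigma = math.sqrt(sigma2)
    out = []
    for _ in range(n):
        u = rng.gammavariate(tau / 2.0, 2.0 / tau)  # (shape, scale)
        t0 = abs(rng.gauss(0.0, 1.0))               # |T0|
        t1 = rng.gauss(0.0, 1.0)                    # T1
        s = u + lam * lam
        out.append(mu + sigma * (lam * t0 / math.sqrt(u * s) + t1 / math.sqrt(s)))
    return out


if __name__ == "__main__":
    y = rstn(20000, mu=5.0, sigma2=1.0, lam=3.0, tau=3.0, rng=random.Random(11))
    # With lam > 0 most of the mass lies to the right of mu.
    print(sum(v > 5.0 for v in y) / len(y))
```

Setting `lam=0.0` gives a symmetric heavy-tailed sample, in line with the remark that $λ = 0$ recovers the SMN distributions.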

The form of an SSMN distribution is determined by the distribution of U, whose distribution is indexed by a vector of parameters or scalar $τ$ that controls the tails of the SSMN distributions. In this work we will concentrate on some special cases of the SSMN distributions, when $κ (u) = u^{- 1}$ , i.e. the skew Student-t-normal (ST), the skew-slash (SSL) and the skew-contaminated normal (SCN), whose properties have been widely discussed in Ferreira et al. [11]. Thus, to avoid excessive notation, from now on we will no longer use the κ variable.

A useful result is the cdf of the SSMN distributions. Using expression (2) and Proposition 2.1 with $κ (u) = u^{- 1}$, we have that

F_{S S M N} (y | μ, σ^{2}, λ) = 2 E_{U} [Φ_{2} ((\frac{(y - μ) U^{1 / 2}}{σ}, 0), 0, Σ_{U})],

(7)

where $Σ_{U} = \begin{pmatrix} 1 & - δ_{U} \\ - δ_{U} & 1 \end{pmatrix}$, with $δ_{U} = \frac{λ}{(U + λ^{2})^{1 / 2}}$.

Another important class of distributions, which will be useful for implementing the EM-type algorithm, is the class of truncated SSMN distributions, given by the following definition.

Definition 2.2

Let $X \sim S S M N (μ, σ^{2}, λ, H)$ and $P (a < X < b) > 0$ , with a<b. A random variable Y has a truncated SSMN distribution in the interval $[a, b]$ , denoted by $Y \sim T S S M N (μ, σ^{2}, λ, H; [a, b])$ , if it has the same distribution as $X | X \in [a, b] .$ Here [a, b] means that each extreme of the interval can be either open or closed. Thus, the pdf of the random variable $Y \sim T S S M N (μ, σ^{2}, λ, H; [a, b])$ is

$\begin{aligned} f_{T S S M N} (y | μ, σ^{2}, λ, H; [a, b]) \\ = \frac{f_{S S M N} (y | μ, σ^{2}, λ, H)}{F_{S S M N} (b | μ, σ^{2}, λ, H) - F_{S S M N} (a | μ, σ^{2}, λ, H)} I_{[a, b]} (y), \end{aligned}$

where $I_{A} (\cdot)$ denotes the indicator function of the set A, i.e. $I_{A} (y) = 1$ if $y \in A$ and $I_{A} (y) = 0$ otherwise, $f_{S S M N} (\cdot | μ, σ^{2}, λ, H)$ and $F_{S S M N} (\cdot | μ, σ^{2}, λ, H)$ represent the pdf and cdf of the SSMN distributions, respectively.
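To make Definition 2.2 concrete, the sketch below evaluates the truncated density in the simplest member of the class, the skew-normal case (U = 1), on a finite interval. It is an illustration under stated assumptions: the helper names `sn_pdf`, `simpson` and `tsn_pdf` are hypothetical, and the normalizing mass $F (b) - F (a)$ is obtained by Simpson integration of the pdf rather than from the closed form in Equation (2).

```python
import math


def sn_pdf(y, mu, sigma2, lam):
    """Skew-normal pdf of Equation (1), including the 1/sigma factor."""
    sigma = math.sqrt(sigma2)
    z = (y - mu) / sigma
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    Phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
    return 2.0 * phi * Phi / sigma


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number of intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def tsn_pdf(y, mu, sigma2, lam, a, b):
    """Truncated SN pdf on a finite interval [a, b] (Definition 2.2, U = 1)."""
    if not (a <= y <= b):
        return 0.0  # indicator I_[a,b](y)
    mass = simpson(lambda s: sn_pdf(s, mu, sigma2, lam), a, b)  # F(b) - F(a)
    return sn_pdf(y, mu, sigma2, lam) / mass


if __name__ == "__main__":
    print(tsn_pdf(0.3, mu=0.5, sigma2=2.0, lam=3.0, a=-1.0, b=2.0))
```

The sketch requires finite endpoints; for a one-sided interval such as $(- \infty, c]$, the lower limit would have to be replaced by a point far in the tail or by a closed-form cdf.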

2.2. Model specification and ML estimation via the EM algorithm

In this section, we define the linear regression model with a censored response variable and errors distributed in the family of SSMN distributions. First, consider the linear regression model, as defined by Ferreira et al. [13], given by

Y_{i} = x_{i}^{⊤} β + ξ_{i}, \quad ξ_{i} \overset{iid}{\sim} SSMN (0, σ^{2}, λ, H), \quad i = 1, \dots, n,

(8)

where $Y_{i}$ is an observed continuous response variable for individual i and $ξ_{i}$ is a random error. For individual i, we assume a known $p \times 1$ covariate vector $x_{i}$, which we use to specify the linear predictor $μ_{i} = x_{i}^{⊤} β$, where $β$ is a p-dimensional vector of unknown regression coefficients.

We extend the linear regression model defined in (8) with the assumption that the response variable is not fully observed for all subjects. Thus, for the ith subject and assuming left-censoring, $Y_{i}$ is a latent variable and the observed data take the form $(V_{i}, ρ_{i})$ , where

V_{i} = \begin{cases} c_{i}, & \text{if } ρ_{i} = 1 \ (\text{i.e. } Y_{i} \leq c_{i}), \\ Y_{i}, & \text{if } ρ_{i} = 0 \ (\text{i.e. } Y_{i} > c_{i}), \end{cases}

(9)

for some known threshold point $c_{i}, i = 1, \dots, n .$ The censoring indicator $ρ_{i} = 1$ (or $ρ_{i} = 0$) means that the ith observation is censored (or not censored). The extension of our results to right-censoring is immediate: it is enough to transform the response $Y_{i}$ and censoring level $c_{i}$ to $- Y_{i}$ and $- c_{i}$, respectively. We call the structure defined by (8) and (9) the SSMN-CR model (linear censored regression model with skew scale mixtures of normal distributions). We use specific notation for particular SSMN distributions, for example, SN-CR and ST-CR in the skew-normal and skew Student-t-normal cases, respectively.
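The censoring mechanism in Equation (9), together with the sign-flip argument for right-censoring, can be sketched as follows. The function names `left_censor` and `right_censor` are hypothetical and are used only for illustration.

```python
def left_censor(y, c):
    """Return (v, rho) as in Equation (9): rho_i = 1 when Y_i <= c_i."""
    v, rho = [], []
    for yi, ci in zip(y, c):
        if yi <= ci:
            v.append(ci)   # censored: only the threshold is observed
            rho.append(1)
        else:
            v.append(yi)   # fully observed response
            rho.append(0)
    return v, rho


def right_censor(y, c):
    """Right-censoring via the transformation (Y_i, c_i) -> (-Y_i, -c_i)."""
    v_neg, rho = left_censor([-yi for yi in y], [-ci for ci in c])
    return [-vi for vi in v_neg], rho


if __name__ == "__main__":
    print(left_censor([1.0, 3.0, 2.0], [2.0, 2.0, 2.0]))
    print(right_censor([1.0, 3.0], [2.0, 2.0]))
```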

The log-likelihood function of the SSMN-CR model is given by

ℓ (θ | v, ρ) = \sum_{i = 1}^{n} ρ_{i} \log [F_{j} (\frac{v_{i} - μ_{i}}{σ})] + \sum_{i = 1}^{n} (1 - ρ_{i}) \log [f_{S S M N} (v_{i} | θ, H)],

(10)

where $θ = (β^{⊤}, σ^{2}, λ, τ^{⊤})^{⊤}$ , $v = (v_{1}, v_{2}, \dots, v_{n})$ is the observed sample of $V = (V_{1}, V_{2}, \dots, V_{n}),$ $ρ = (ρ_{1}, ρ_{2}, \dots, ρ_{n})$ , and $F_{j} (\cdot)$ denotes the cdf of the $S S M N (0, 1, λ, H)$ distribution. Since the observed log-likelihood function involves complex expressions, it is very difficult to maximize directly $ℓ (θ | v, ρ)$ for ML estimation. To overcome this problem, we propose an EM-type algorithm based on an augmented data representation of the SSMN-CR model. To do so, observe that, given a sample of size n from the model, the vector of censored responses $Y = (Y_{1}, \dots, Y_{n})^{⊤}$ is seen as a latent (partially unobservable) random vector. From Proposition 2.1 and Equation (3), the complete data are given by

\begin{aligned} Y_{i} | U_{i} = u_{i}, T_{i} = t_{i} & \overset{ind}{\sim} N (μ_{i} + \frac{σ λ}{(u_{i} (u_{i} + λ^{2}))^{1 / 2}} t_{i}, \frac{σ^{2}}{u_{i} + λ^{2}}), \\ U_{i} & \overset{iid}{\sim} H (τ), \\ T_{i} & \overset{iid}{\sim} TN (0, 1; (0, + \infty)), \quad i = 1, \dots, n, \end{aligned}

(11)

all independent, where $T N (r, s; (a, b))$ denotes the univariate normal distribution $(N (r, s))$ , truncated on the interval $(a, b)$ .
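For reference, the observed log-likelihood in Equation (10) can be evaluated numerically in the skew-normal (SN-CR) case. The sketch below is not the authors' implementation: it assumes the standard identity $F_{S N} (z) = Φ (z) - 2 T (z, λ)$, where T is Owen's T function, and computes T by Simpson integration. The names `owens_t`, `sn_cdf_std` and `sn_cr_loglik` are hypothetical.

```python
import math


def _simpson(f, lo, hi, n=400):
    """Composite Simpson rule; works for hi < lo as well (h is negative)."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def owens_t(h, a):
    """Owen's T: (1 / 2pi) * integral_0^a exp(-h^2 (1 + x^2) / 2) / (1 + x^2) dx."""
    f = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return _simpson(f, 0.0, a) / (2.0 * math.pi)


def sn_cdf_std(z, lam):
    """cdf of the standard skew-normal SN(0, 1, lam)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))) - 2.0 * owens_t(z, lam)


def sn_cr_loglik(beta, sigma2, lam, X, v, rho):
    """Equation (10) for SN errors: cdf terms if censored, pdf terms otherwise."""
    sigma = math.sqrt(sigma2)
    ll = 0.0
    for xi, vi, ri in zip(X, v, rho):
        mu = sum(b * x for b, x in zip(beta, xi))
        z = (vi - mu) / sigma
        if ri == 1:
            ll += math.log(sn_cdf_std(z, lam))
        else:
            phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
            Phi = 0.5 * (1.0 + math.erf(lam * z / math.sqrt(2.0)))
            ll += math.log(2.0 * phi * Phi / sigma)
    return ll


if __name__ == "__main__":
    X = [[1.0, 0.0], [1.0, 1.0]]
    print(sn_cr_loglik([1.0, 1.0], 1.0, 2.0, X, [0.5, 2.0], [1, 0]))
```

With $λ = 0$ the function returns the usual normal censored (Tobit-type) log-likelihood. The fixed integration grid is adequate only for moderate values of λ, and the logarithm of the cdf can underflow for thresholds far in the lower tail.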

Defining the vectors $y = (y_{1}, \dots, y_{n})^{⊤},$ $t = (t_{1}, \dots, t_{n})^{⊤}$ and $u = (u_{1}, \dots, u_{n})^{⊤}$ , we have that the complete data log-likelihood associated with $y_{c} = (v^{⊤}, ρ^{⊤}, y^{⊤}, t^{⊤}, u^{⊤})$ is

\begin{aligned} ℓ_{c} (θ | y_{c}) & = c - n \log σ^{2} - \frac{1}{2 σ^{2}} \sum_{i = 1}^{n} [u_{i} y_{i}^{2} - 2 μ_{i} u_{i} y_{i} + μ_{i}^{2} u_{i} + t_{i}^{2} \\ - 2 λ t_{i} y_{i} + 2 λ μ_{i} t_{i} + λ^{2} (y_{i}^{2} - 2 μ_{i} y_{i} + μ_{i}^{2})], \end{aligned}

where c is a constant that does not depend on $θ$ . As in the original proposal of Dempster et al. [7], the E-step of our algorithm consists of taking the conditional expectation $Q (θ | {\hat{θ}}^{(k)}) = E [ℓ_{c} (θ | y_{c}) | v, ρ, {\hat{θ}}^{(k)}]$ , where ${\hat{θ}}^{(k)} = ({\hat{β}}^{(k) ⊤}, {\hat{σ^{2}}}^{(k)}, {\hat{λ}}^{(k)}, {\hat{τ}}^{(k) ⊤})^{⊤}$ is the current estimate of $θ$ at the kth iteration. For cases in which the E-step has no analytic form, Wei and Tanner [26] proposed the MCEM algorithm, in which the E-step is replaced by a Monte Carlo approximation based on a number of independent simulations of the missing data. The M-step consists of maximization of $Q (θ | {\hat{θ}}^{(k)})$ with respect to $θ$ . Thus, our MCEM algorithm for the SSMN-CR model can be summarized in the following steps:

E-step: Given the current estimate ${\hat{θ}}^{(k)}$ at the kth iteration, we obtain the conditional expectation of the complete data log-likelihood function given the observed $v$ and $ρ$ , named the Q-function, which is given by $Q (θ | {\hat{θ}}^{(k)}) = \sum_{i = 1}^{n} Q_{i} (θ | {\hat{θ}}^{(k)}),$ where, excluding unimportant constants,

\begin{aligned} Q_{i} (θ | {\hat{θ}}^{(k)}) & = - \log σ^{2} - \frac{1}{2 σ^{2}} [{\hat{u y^{2}}}_{i}^{(k)} - 2 μ_{i} {\hat{u y}}_{i}^{(k)} + μ_{i}^{2} {\hat{u}}_{i}^{(k)} + {\hat{t^{2}}}_{i}^{(k)} - 2 λ {\hat{t y}}_{i}^{(k)} \\ + 2 λ μ_{i} {\hat{t}}_{i}^{(k)} + λ^{2} ({\hat{y^{2}}}_{i}^{(k)} - 2 μ_{i} {\hat{y}}_{i}^{(k)} + μ_{i}^{2})], \end{aligned}

(12)

where ${\hat{u y}}_{i}^{(k)} = E [U_{i} Y_{i} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ , ${\hat{u y^{2}}}_{i}^{(k)} = E [U_{i} Y_{i}^{2} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ , ${\hat{u}}_{i}^{(k)} = E [U_{i} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ , ${\hat{t^{2}}}_{i}^{(k)} = E [T_{i}^{2} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ , ${\hat{t y}}_{i}^{(k)} = E [T_{i} Y_{i} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ , ${\hat{t}}_{i}^{(k)} = E [T_{i} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ , ${\hat{y}}_{i}^{(k)} = E [Y_{i} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ and ${\hat{y^{2}}}_{i}^{(k)} = E [Y_{i}^{2} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ . By using known properties of conditional expectation, we obtain

For an uncensored observation i: in this case, $ρ_{i} = 0$ , so $V_{i} = Y_{i}$ and thus, ${\hat{u y}}_{i}^{(k)} = y_{i} E [U_{i} | y_{i}, {\hat{θ}}^{(k)}], {\hat{u y^{2}}}_{i}^{(k)} = y_{i}^{2} E [U_{i} | y_{i}, {\hat{θ}}^{(k)}], {\hat{t y}}_{i}^{(k)} = y_{i} E [T_{i} | y_{i}, {\hat{θ}}^{(k)}],$ ${\hat{y}}_{i}^{(k)} = y_{i}$ and ${\hat{y^{2}}}_{i}^{(k)} = y_{i}^{2} .$ In this case, we have a closed-form expression for ${\hat{t^{2}}}_{i}^{(k)} = E [T_{i}^{2} | y_{i}, {\hat{θ}}^{(k)}],$ ${\hat{t}}_{i}^{(k)} = E [T_{i} | y_{i}, {\hat{θ}}^{(k)}]$ and ${\hat{u}}_{i}^{(k)} = E [U_{i} | y_{i}, {\hat{θ}}^{(k)}]$ for the SN, ST, SSL and SCN distributions, as can be found in Ferreira et al. [13].
For a censored observation i: in this case, we have $ρ_{i} = 1$, $Y_{i} \leq c_{i}$ and $V_{i} = c_{i}$. The conditional expectations
$E [U_{i}^{w} T_{i}^{q} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}],$ (13)
for $w, q, s = 0, 1, 2,$ have no closed form, so we introduce two intermediate steps that replace the E-step by a stochastic approximation using simulated data. Thus, iteration k consists of the following steps:
- Let $Y^{c} = (Y_{1}^{c}, \dots, Y_{n_{c}}^{c})^{⊤}$ be the vector of the $n_{c}$ censored cases, where $Y_{i}^{c}$ is generated from the $TSSMN (μ_{i}, σ^{2}, λ, H; (- \infty, c_{i}])$ distribution for $i = 1, \dots, n_{c}$. The new vector of observations $Y^{(l, k)} = (Y_{1}^{c (l, k)}, \dots, Y_{n_{c}}^{c (l, k)}, Y_{n_{c} + 1}, \dots, Y_{n})^{⊤}$ then combines the values generated for the $n_{c}$ censored cases with the observed values (uncensored cases), for $l = 1, \dots, m .$ Section 2.3 describes the details of the method used to generate the random vector $Y^{c} .$
- Given the sequence $Y^{(l, k)}$ at the kth iteration, and considering the conditional expectations ${\hat{t^{2}}}_{i}^{(k)},$ ${\hat{t}}_{i}^{(k)}$ and ${\hat{u}}_{i}^{(k)}$ given in Ferreira et al. [13] for the SN, ST, SSL and SCN distributions, we replace the conditional expectations in Equation (13) with the following stochastic approximations:
  $E [U_{i}^{w} T_{i}^{q} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] \approx \frac{1}{m} \sum_{l = 1}^{m} E [U_{i}^{w} T_{i}^{q} Y_{i}^{s (l, k)} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}],$ (14)
  for $w, q, s = 0, 1, 2.$ We chose $m = 300.$ See Appendix 1 for more details.

CM-step: Update ${\hat{β}}^{(k + 1)},$ ${\hat{λ}}^{(k + 1)}$ and ${\hat{σ^{2}}}^{(k + 1)}$ by maximizing $Q (θ | {\hat{θ}}^{(k)})$ over $θ$ , which leads to the following closed-form expressions

\begin{aligned} {\hat{β}}^{(k + 1)} & = [X^{T} D ({\hat{u}}^{(k)} + {\hat{λ}}^{2 (k)} I_{n}) X]^{- 1} X^{T} [{\hat{u y}}^{(k)} - {\hat{λ}}^{(k)} {\hat{t}}^{(k)} + {\hat{λ}}^{2 (k)} {\hat{y}}^{(k)}], \\ {\hat{λ}}^{(k + 1)} & = \frac{\sum_{i = 1}^{n} [{\hat{t y}}_{i}^{(k)} - {\hat{μ}}_{i}^{(k + 1)} {\hat{t}}_{i}^{(k)}]}{\sum_{i = 1}^{n} [{\hat{y^{2}}}_{i}^{(k)} - 2 {\hat{μ}}_{i}^{(k + 1)} {\hat{y}}_{i}^{(k)} + {\hat{μ}}_{i}^{2 (k + 1)}]} and \\ {\hat{σ^{2}}}^{(k + 1)} & = \frac{1}{2 n} \sum_{i = 1}^{n} [{\hat{u y^{2}}}_{i}^{(k)} - 2 {\hat{μ}}_{i}^{(k + 1)} {\hat{u y}}_{i}^{(k)} + {\hat{μ}}_{i}^{2 (k + 1)} {\hat{u}}_{i}^{(k)} + {\hat{t^{2}}}_{i}^{(k)} - 2 {\hat{λ}}^{(k + 1)} {\hat{t y}}_{i}^{(k)} \\ + 2 {\hat{λ}}^{(k + 1)} {\hat{μ}}_{i}^{(k + 1)} {\hat{t}}_{i}^{(k)} + {\hat{λ}}^{2 (k + 1)} ({\hat{y^{2}}}_{i}^{(k)} - 2 {\hat{μ}}_{i}^{(k + 1)} {\hat{y}}_{i}^{(k)} + {\hat{μ}}_{i}^{2 (k + 1)})] . \end{aligned}

CML-step: Update ${\hat{τ}}^{(k + 1)}$ by maximizing the actual marginal log-likelihood function, obtaining ${\hat{τ}}^{(k + 1)} = \arg max_{τ} ℓ ({\hat{θ}}^{(k)} | v, ρ),$ where $ℓ (θ | v, ρ)$ is defined in Equation (10).

The iterations are repeated until a suitable convergence rule is satisfied. We use the criterion $∥ {\hat{θ}}^{(k + 1)} - {\hat{θ}}^{(k)} ∥ < 10^{- 4} .$ Useful starting values are required to implement this algorithm. We start the MCEM algorithm with initial values ${\hat{β}}^{(0)}, {\hat{σ^{2}}}^{(0)}$ and ${\hat{λ}}^{(0)}$. First, we use the least-squares method to determine ${\hat{β}}^{(0)}$ and ${\hat{σ^{2}}}^{(0)} .$ Then, we set ${\hat{λ}}^{(0)} = \mathrm{sign} (\hat{r}),$ where $\hat{r}$ is the sample skewness coefficient of the residuals $e_{i} = y_{i} - {\hat{μ}}_{i}, i = 1, \dots, n$. However, in order to ensure that the true maximum-likelihood estimate is identified, we recommend running the algorithm from a range of different starting values. Note that when $λ = 0$ the M-step equations reduce to the equations obtained assuming SMN distributions; see Garay et al. [15]. In particular, this algorithm generalizes the results found in Arellano-Valle et al. [2] by taking $κ (u) = u^{- 1}$ and $U \sim Gamma (τ / 2, τ / 2) .$
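The closed-form CM-step updates can be sketched in plain Python once the E-step quantities are available. The code below is a hedged illustration, not the authors' R implementation: the function names `solve` and `cm_step`, and the dictionary `E` holding the conditional expectations (keys `u`, `uy`, `uy2`, `t`, `t2`, `ty`, `y`, `y2`), are hypothetical conventions introduced here.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def cm_step(X, lam_k, E):
    """One CM-step: returns (beta, lam, sigma2) given E-step expectations E."""
    n, p = len(X), len(X[0])
    w = [E["u"][i] + lam_k ** 2 for i in range(n)]  # diagonal of D(u + lam^2)
    r = [E["uy"][i] - lam_k * E["t"][i] + lam_k ** 2 * E["y"][i] for i in range(n)]
    A = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         for a in range(p)]
    rhs = [sum(X[i][a] * r[i] for i in range(n)) for a in range(p)]
    beta = solve(A, rhs)
    mu = [sum(bj * xj for bj, xj in zip(beta, X[i])) for i in range(n)]
    # Residual second moment: y2 - 2 mu y + mu^2
    q = [E["y2"][i] - 2.0 * mu[i] * E["y"][i] + mu[i] ** 2 for i in range(n)]
    lam = sum(E["ty"][i] - mu[i] * E["t"][i] for i in range(n)) / sum(q)
    sigma2 = sum(
        E["uy2"][i] - 2.0 * mu[i] * E["uy"][i] + mu[i] ** 2 * E["u"][i]
        + E["t2"][i] - 2.0 * lam * E["ty"][i] + 2.0 * lam * mu[i] * E["t"][i]
        + lam ** 2 * q[i]
        for i in range(n)
    ) / (2.0 * n)
    return beta, lam, sigma2


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [1.0, 3.0, 2.0, 4.0]
    E = {"u": [1.0] * 4, "uy": y, "uy2": [v * v for v in y], "t": [0.0] * 4,
         "t2": [0.0] * 4, "ty": [0.0] * 4, "y": y, "y2": [v * v for v in y]}
    print(cm_step([[1.0, xi] for xi in x], 0.0, E))
```

In the toy call, setting $\hat{u}_{i} = 1$, $\hat{t}_{i} = 0$ and $λ^{(k)} = 0$ makes the β update coincide with ordinary least squares, which is a convenient sanity check of the formulas.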

2.3. Computational aspects

In this section, we describe a simulation method to generate random samples from the random variable $Y \sim T S S M N (μ, σ^{2}, λ, H; [a, b]) .$ We concentrate on the truncated skew normal (TSN), truncated skew Student-t-normal (TST), truncated skew slash (TSSL) and truncated skew contaminated normal (TSCN) distributions. According to Ferreira et al. [12], a random variable $Y \sim S S M N (μ, σ^{2}, λ, H)$ has the stochastic representation given in (6) and considering $W = U^{- \frac{1}{2}} [U + λ^{2}]^{\frac{1}{2}} | T_{0} |$ , we have that

Y = μ + σ (\frac{λ W}{U + λ^{2}} + \frac{T_{1}}{(U + λ^{2})^{\frac{1}{2}}}) .

(15)

So, we obtain the following hierarchical representation

\begin{aligned} Y | W = w, U = u & \sim N (μ + \frac{σ λ w}{u + λ^{2}}, \frac{σ^{2}}{u + λ^{2}}), \\ W | U = u & \sim TN (0, \frac{u + λ^{2}}{u}; [0, \infty)), \\ U & \sim H (τ), \end{aligned}

(16)

where U and W are positive random variables. Then, we have that $a < Y < b,$ which implies

\underbrace{[\frac{a - μ}{σ} - \frac{λ w}{u + λ^{2}}] (u + λ^{2})^{\frac{1}{2}}}_{a_{1} (u, w)} < \underbrace{[\frac{Y - μ}{σ} - \frac{λ w}{u + λ^{2}}] (u + λ^{2})^{\frac{1}{2}}}_{T_{1}} < \underbrace{[\frac{b - μ}{σ} - \frac{λ w}{u + λ^{2}}] (u + λ^{2})^{\frac{1}{2}}}_{b_{1} (u, w)} .

Therefore, the algorithm to generate random samples of the TSSMN models is as follows:

(P1) Generate a random sample $U_{1}, \dots, U_{m}$ from $H (τ)$.
(P2) Generate a random sample $W_{1}, \dots, W_{m}$ from $W_{i} \sim TN (0, \frac{u_{i} + λ^{2}}{u_{i}}; [0, + \infty))$.
(P3) Generate a random sample $T_{1_{1}}, \dots, T_{1_{m}}$ from $T_{1_{i}} \sim TN (0, 1; [a_{1} (u_{i}, w_{i}), b_{1} (u_{i}, w_{i})])$.
(P4) Using the stochastic representation given in (15), set Y.

Consequently, we draw $y_{i}^{(k)}$ from $f (y_{i} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)})$ in the E-step.
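Steps (P1) to (P4) can be sketched for the truncated skew Student-t-normal case as follows. This is a minimal illustration that follows the steps as listed, under several assumptions: the names `rtrunc_std_normal` and `rtstn` are hypothetical, the Gamma distribution is taken with shape and rate $τ / 2$, and the truncated standard normal in (P3) is drawn by inverting the cdf, which loses accuracy when the interval lies far in a tail.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf and inverse cdf


def rtrunc_std_normal(lo, hi, rng):
    """Inverse-cdf draw from N(0, 1) truncated to [lo, hi]."""
    p = rng.uniform(_STD.cdf(lo), _STD.cdf(hi))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
    t = _STD.inv_cdf(p)
    return min(max(t, lo), hi)           # guard against rounding at the ends


def rtstn(m, mu, sigma2, lam, tau, a, b, rng=None):
    """Draw m values on [a, b] following (P1)-(P4) with U ~ Gamma(tau/2, tau/2)."""
    rng = rng or random.Random()
    sigma = math.sqrt(sigma2)
    out = []
    for _ in range(m):
        u = rng.gammavariate(tau / 2.0, 2.0 / tau)         # (P1)
        s = u + lam * lam
        w = math.sqrt(s / u) * abs(rng.gauss(0.0, 1.0))    # (P2) TN(0, s/u; [0, inf))
        shift = lam * w / s
        a1 = ((a - mu) / sigma - shift) * math.sqrt(s)     # a_1(u, w)
        b1 = ((b - mu) / sigma - shift) * math.sqrt(s)     # b_1(u, w)
        t1 = rtrunc_std_normal(a1, b1, rng)                # (P3)
        out.append(mu + sigma * (shift + t1 / math.sqrt(s)))  # (P4), Equation (15)
    return out


if __name__ == "__main__":
    draws = rtstn(5, 5.0, 1.0, 3.0, 3.0, float("-inf"), 4.5, random.Random(7))
    print(draws)
```

For a left-censored observation, the interval would be $(- \infty, c_{i}]$ with $μ = μ_{i}$, as in the first intermediate step of the E-step.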

3. Standard error approximation

In this section, we describe how to obtain the standard errors of the ML estimates of the parameters in the SSMN-CR model. Assuming the usual regularity conditions, the asymptotic covariance matrix of $\hat{θ}$ can be approximated by the inverse of the empirical information matrix, defined as $I_{o} = \sum_{i = 1}^{n} s (y_{i} | θ) s^{⊤} (y_{i} | θ) - n^{- 1} S (y | θ) S^{⊤} (y | θ),$ where $S (y | θ) = \sum_{i = 1}^{n} s (y_{i} | θ)$, with $s (y_{i} | θ) = \partial Q_{i} (θ | {\hat{θ}}^{(k)}) / \partial θ$ (see Equation (12) and [19]). Substituting $θ$ by the ML estimate $\hat{θ}$ in $I_{o}$, the total score $S (y | \hat{θ})$ is zero and we obtain the approximation ${\hat{I}}_{o} = \sum_{i = 1}^{n} {\hat{s}}_{i} {\hat{s}}_{i}^{⊤},$ where ${\hat{s}}_{i} = s (y_{i} | \hat{θ})$ is an individual score vector given by ${\hat{s}}_{i} = ({\hat{s}}_{i, β^{⊤}}, {\hat{s}}_{i, σ^{2}}, {\hat{s}}_{i, λ})^{⊤},$ with explicit expressions for the elements of ${\hat{s}}_{i}$ given by

\begin{aligned} {\hat{s}}_{i, β^{⊤}} & = - \frac{1}{2 \hat{σ^{2}}} [- 2 {\hat{u y}}_{i} + 2 {\hat{μ}}_{i} {\hat{u}}_{i} + 2 \hat{λ} {\hat{t}}_{i} - 2 {\hat{λ}}^{2} {\hat{y}}_{i} + 2 {\hat{λ}}^{2} {\hat{μ}}_{i}] x_{i}, \\ {\hat{s}}_{i, σ^{2}} & = - \frac{1}{\hat{σ^{2}}} + \frac{1}{2 {\hat{σ^{2}}}^{2}} [{\hat{u y^{2}}}_{i} - 2 x_{i}^{T} \hat{β} {\hat{u y}}_{i} + (x_{i}^{T} \hat{β})^{2} {\hat{u}}_{i} + {\hat{t^{2}}}_{i} \\ & \quad - 2 \hat{λ} {\hat{t y}}_{i} + 2 \hat{λ} x_{i}^{T} \hat{β} {\hat{t}}_{i} + {\hat{λ}}^{2} {\hat{y^{2}}}_{i} - 2 {\hat{λ}}^{2} x_{i}^{T} \hat{β} {\hat{y}}_{i} + {\hat{λ}}^{2} (x_{i}^{T} \hat{β})^{2}] \quad \text{and} \\ {\hat{s}}_{i, λ} & = - \frac{1}{\hat{σ^{2}}} [- {\hat{t y}}_{i} + x_{i}^{T} \hat{β} {\hat{t}}_{i} + \hat{λ} {\hat{y^{2}}}_{i} - 2 \hat{λ} x_{i}^{T} \hat{β} {\hat{y}}_{i} + \hat{λ} (x_{i}^{T} \hat{β})^{2}] . \end{aligned}

Standard errors of $\hat{θ}$ are extracted from the square root of the diagonal elements of the inverse of ${\hat{I}}_{o}$ . Following Mattos et al. [23], in our analysis, we focus solely on comparing the standard errors of $β$ , $σ^{2}$ and λ.
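Given the individual score vectors, the standard errors follow from a sum of outer products and a matrix inversion. The sketch below illustrates this step only; the function names `invert` and `standard_errors` are hypothetical, and the score vectors in the usage example are made-up numbers, not results from the paper.

```python
import math


def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("information matrix is numerically singular")
        M[col], M[piv] = M[piv], M[col]
        d = M[col][col]
        M[col] = [x / d for x in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]


def standard_errors(scores):
    """Square roots of the diagonal of (sum_i s_i s_i^T)^{-1}."""
    p = len(scores[0])
    info = [[sum(s[a] * s[b] for s in scores) for b in range(p)] for a in range(p)]
    inv = invert(info)
    return [math.sqrt(inv[j][j]) for j in range(p)]


if __name__ == "__main__":
    # Made-up score vectors, for illustration only.
    print(standard_errors([[1.0, 0.5], [-0.8, 0.2], [0.3, -0.9], [-0.5, 0.2]]))
```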

In the next sections, simulation studies and a real dataset are presented in order to illustrate the performance of the proposed method.

4. Simulation experiments

4.1. Experiment I: performance of the ML estimates

In this section, we use Monte Carlo simulations to evaluate the performance of the ML estimates of the parameters in the SSMN-CR model. The simulation study is designed to observe the changes in the estimates when varying sample sizes and right-censoring levels. The data were artificially generated from SSMN-CR models with $x_{i}^{T} = (1, x_{i}),$ such that $x_{i} \sim U (0, 1)$, $i = 1, \dots, n$. We generated 100 datasets from each of the SN-CR, ST-CR, SSL-CR and SCN-CR models with the following setup: $β = (β_{0}, β_{1})^{T} = (5, 1)^{T}$, $σ^{2} = 1, λ = 3, τ = 3$ for the ST-CR and SSL-CR models, and $τ = (0.1, 0.1)$ for the SCN-CR model. The scenarios are as follows. Scenario 1: a censoring proportion of 10% and different sample sizes, namely n = 50, 100, 200, 400 and 600. The goal of this scenario is to show the asymptotic behavior of the ML estimates obtained via the proposed MCEM algorithm. Scenario 2: a sample of size n = 100 and different censoring proportions, namely 0%, 5%, 10%, 20% and 30%. We aim to study the behavior of the SSMN-CR models under different censoring proportions. The desired level of censoring was obtained in the following way: the observations were placed in increasing order, and a threshold point was fixed in such a way that the number of observations above this point corresponded to the desired level of censoring.
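The thresholding rule used to reach a target right-censoring level can be sketched as below. The exact placement of the threshold is an assumption made for this illustration: the function `censor_at_level` (a hypothetical name) places it at an order statistic so that exactly the required number of observations lies strictly above it, which presumes no ties in the data.

```python
def censor_at_level(y, level):
    """Right-censor y so that a proportion `level` of observations is censored.

    Returns (v, rho, c): observed values, censoring indicators and threshold.
    Assumption: the threshold is the order statistic just below the k largest
    observations, where k = round(level * n).
    """
    n = len(y)
    k = int(round(level * n))  # number of observations to censor
    if k <= 0:
        return list(y), [0] * n, None
    if k >= n:
        raise ValueError("censoring level leaves no uncensored observation")
    c = sorted(y)[n - k - 1]   # exactly k observations lie above c (no ties)
    v = [min(yi, c) for yi in y]
    rho = [1 if yi > c else 0 for yi in y]
    return v, rho, c


if __name__ == "__main__":
    y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5, 8.0, 7.0]
    print(censor_at_level(y, 0.2))
```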

In both scenarios, each dataset generated from a given SSMN-CR model was fitted with the same SSMN-CR model. Note that, for scenarios 1 and 2, there are 20 different simulation settings with 100 simulated Monte Carlo datasets for each one. Then, for each simulation, the ML estimates were recorded.

Figures 1–4 show boxplots of the parameter estimates for the SN-CR, ST-CR, SSL-CR and the SCN-CR model, respectively, under scenario 1. In general, for a given censoring level, the bias and the variability of the parameter estimates decrease when the sample size increases. This essentially agrees with the asymptotic properties of the ML method.

Figure 2. — **Scenario 1**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the ST-CR model. Legend on panel (a).

Figure 3. — **Scenario 1**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the SSL-CR model. Legend on panel (a).

Figure 1. — **Scenario 1**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the SN-CR model. Legend on panel (a).

Figure 4. — **Scenario 1**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the SCN-CR model. Legend on panel (a).

As suggested by an anonymous referee, and in order to evaluate the performance of the ML estimates of the parameters in the SSMN-CR model, we computed the bias (BIAS) and the mean square error (MSE) for each parameter over the 100 replicates. The BIAS and MSE measures are defined as in Garay et al. [15] (see Section 5.2). Figures 5 and 6 show that, for the 10% censoring level, the BIAS and MSE tend to zero in all SSMN-CR models as n increases, i.e. the ML estimates of the parameters in the SSMN-CR model improve as the sample size increases. In addition, Tables 7–10 (see Appendix 2) present the summary statistics for parameter estimation under this scenario.
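The two summary measures can be computed over Monte Carlo replicates as in the sketch below. The definitions used here are the usual ones (average deviation from the true value, and average squared deviation); they are assumed to agree with those in Garay et al. [15], which are not reproduced in this paper. The function names and the example estimates are hypothetical.

```python
def mc_bias(estimates, true_value):
    """Monte Carlo bias: mean of (estimate - true value) over replicates."""
    return sum(e - true_value for e in estimates) / len(estimates)


def mc_mse(estimates, true_value):
    """Monte Carlo mean square error over replicates."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)


if __name__ == "__main__":
    # Made-up estimates of a parameter whose true value is 1.0.
    est = [0.9, 1.1, 1.3]
    print(mc_bias(est, 1.0), mc_mse(est, 1.0))
```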

Figure 5. — **Scenario 1**: Bias of parameters $β_{0}, β_{1}, σ^{2}$ and λ for SSMN models.

Figure 6. — **Scenario 1**: MSE of parameters $β_{0}, β_{1}, σ^{2}$ and λ for SSMN models.

Figures 7–10 show boxplots of the parameter estimates for the SN-CR, ST-CR, SSL-CR and SCN-CR models, respectively, under scenario 2. In general, when the sample size is fixed, we see that an increasing censoring level corresponds to increasing bias and variability of the parameter estimates.

Figure 8. — **Scenario 2**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the ST-CR model. Legend on panel (a).

Figure 9. — **Scenario 2**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the SSL-CR model. Legend on panel (a).

Figure 7. — **Scenario 2**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the SN-CR model. Legend on panel (a).

Figure 10. — **Scenario 2**: Boxplots of the estimates of $λ, σ^{2}, β_{0}$ and $β_{1}$ (line indicates the true value of the parameter) for the SCN-CR model. Legend on panel (a).

4.2. Experiment II: parameter recovery and selection criteria

The main objective of this experiment is to illustrate the capacity of the censored models with asymmetry and heavy tails to fit data generated from a different family of asymmetric distributions, and also to investigate the effects on parametric inference.

4.2.1. Study I

In this study, we consider 100 samples of size 100 from an SCN-CR model with right-censoring levels of 5%, 10%, 20% or 30%, $x_{i}^{T} = (1, x_{i 1}, x_{i 2})$, such that $x_{i 1} \sim U (1, 5)$ and $x_{i 2} \sim U (0, 1),$ and parameter values given by $β = (β_{0}, β_{1}, β_{2})^{⊤} = (1, - 1, - 4)^{⊤}$, $σ^{2} = 2, λ = - 4$ and $τ = (0.1, 0.1)$. To each sample, we fitted the SN-CR, ST-CR and SSL-CR models.

As in experiment I, summary statistics of the estimation of the $β$ parameters are presented in Table 11 (see Appendix 2). From these results, for a specific model, we see that an increasing censoring level corresponds to increasing bias and MSE of the parameter estimates. We also observe that an increasing censoring level corresponds to decreasing 95% coverage probability for the $β_{0}, β_{1}$ and $β_{2}$ parameter estimates. Additionally, we conclude that, for all levels of censoring, the heavy-tailed SSMN distributions outperform the skew-normal distribution, having smaller BIAS and MSE and greater 95% coverage probability for the $β_{0}, β_{1}$ and $β_{2}$ parameter estimates.

Figures 11–15 show the boxplots of the parameter estimates for the SN-CR, ST-CR and SSL-CR models under the various levels of censorship considered. The estimates of the scale parameters, from the models with distributions of heavy tails, present smaller bias and variability in relation to the SN-CR model for all levels of censorship. Furthermore, it is readily seen that the estimates of the scale parameters obtained from the heavy-tailed models are less sensitive to the variation in the censoring level. This indicates that these models are not only robust to model misspecification but also to different levels of censoring. As expected, censored models with heavy-tailed distributions perform better than the skew-normal one in recovering the true parameter values independently of censoring levels.

Figure 12. — Boxplots of $λ, σ^{2}, β_{1}$ and $β_{2}$ (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – $5 %$ censorship. Legend on panel (a).

Figure 13. — Boxplots of $λ, σ^{2}, β_{1}$ and $β_{2}$ (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – $10 %$ censorship. Legend on panel (a).

Figure 14. — Boxplots of $λ, σ^{2}, β_{1}$ and $β_{2}$ (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – $20 %$ censorship. Legend on panel (a).

Figure 11. — Boxplots of $λ, σ^{2}, β_{1}$ and $β_{2}$ (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – without censorship. Legend on panel (a).

Figure 15. — Boxplots of $λ, σ^{2}, β_{1}$ and $β_{2}$ (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – $30 %$ censorship. Legend on panel (a).

Finally, we compare the ability of some classical model selection criteria to select the appropriate model among different SSMN-CR models. Because there is no universal criterion for model selection, we chose two criteria to compare the proposed models, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), such that $A I C = - 2 ℓ (\hat{θ}) + 2 ξ$ and $B I C = - 2 ℓ (\hat{θ}) + ξ \log n,$ where $ℓ (θ)$ is the actual log-likelihood and ξ is the number of free parameters that have to be estimated in the model. Thus, for each simulation, the parameter estimates as well as the AIC and BIC criteria were recorded. Table 1 shows the percentages of samples in which the censored models with heavy-tailed distributions, specifically the ST-CR and SSL-CR models, are preferred to the fitted SN-CR model. Not surprisingly, under different levels of censoring, all criteria favor censored models based on heavy-tailed distributions.
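As a worked illustration of the two criteria, the following Python sketch computes AIC and BIC from a maximized log-likelihood, together with the percentage of replicates in which an alternative model attains the smaller criterion value. The function names are hypothetical and are not taken from the authors' R code. The usage lines take the ST-CR log-likelihood reported in Table 6 with n = 68 and assume ξ = 4 free parameters, a value chosen only because it is consistent with the AIC and BIC reported in that table.

```python
import math
from typing import Sequence


def aic(loglik: float, n_params: int) -> float:
    """AIC = -2 * loglik + 2 * xi, with xi the number of free parameters."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC = -2 * loglik + xi * log(n)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def percent_preferred(reference: Sequence[float], alternative: Sequence[float]) -> float:
    """Percentage of replicates in which the alternative model has the smaller criterion."""
    if len(reference) != len(alternative) or not reference:
        raise ValueError("criterion sequences must be non-empty and of equal length")
    wins = sum(1 for ref, alt in zip(reference, alternative) if alt < ref)
    return 100.0 * wins / len(reference)


# Usage with the ST-CR log-likelihood from Table 6 (xi = 4 is an assumption).
st_aic = aic(-1.7802, 4)
st_bic = bic(-1.7802, 4, 68)
```

Because both criteria share the term $- 2 ℓ (\hat{θ})$ , they differ only in the penalty, so BIC penalizes extra parameters more heavily than AIC whenever $\log n > 2$ .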

Table 1. Percentages of preferred models under the conditions examined.

Censoring levels	Examined conditions	AIC	BIC
$0 %$	SN vs ST	81	75
	SN vs SSL	86	76
$5 %$	SN vs ST	61	71
	SN vs SSL	83	76
$10 %$	SN vs ST	87	80
	SN vs SSL	92	81
$20 %$	SN vs ST	89	79
	SN vs SSL	93	82
$30 %$	SN vs ST	89	80
	SN vs SSL	90	81

4.2.2. Study II

In this simulation study, the aim is to show the flexibility of our proposed SSMN-CR models. As suggested by an anonymous reviewer, we generate artificial data from the proposed linear censored regression model, where the errors follow a distribution totally different in nature from the class of SSMN distributions studied in this paper, but that produces asymmetry and heavy tails. An appropriate example is the generalized hyperbolic (GH) distribution, which is a normal mean-variance mixture distribution. The random variable Y is said to have a normal mean-variance mixture distribution if

Y \overset{d}{=} μ + κ (V) γ + κ^{1 / 2} (V) W,

(17)

where $W \sim N (0, σ^{2})$ , V is a positive random variable with distribution $H (v | ν)$ , indexed by the parameter vector $ν$ and independent of W, whereas $κ (.)$ is a strictly positive function associated with the mixture variable V. If $κ (V) = V$ and the mixture variable V follows a generalized inverse Gaussian (GIG) distribution, then Y is said to have a GH distribution. More details on the GIG distribution can be found in Jørgensen [16].
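The representation in (17) can be sketched in Python for the special case in which the GIG index equals $- 1 / 2$ , where the GIG distribution reduces to the inverse Gaussian. This is a minimal sketch under stated assumptions: the GIG density is taken proportional to $v^{- 3 / 2} \exp \{- (χ / v + ψ v) / 2\}$ , the names `chi` and `psi` are hypothetical labels for its two remaining parameters, and the inverse Gaussian draw uses the Michael, Schucany and Haas transformation rather than whatever generator the authors used.

```python
import math
import random


def sample_inverse_gaussian(mean: float, shape: float, rng: random.Random) -> float:
    """One draw from IG(mean, shape) via the Michael-Schucany-Haas transformation."""
    z = rng.gauss(0.0, 1.0)
    y = z * z
    root = math.sqrt(4.0 * mean * shape * y + mean * mean * y * y)
    x = mean + (mean * mean * y) / (2.0 * shape) - (mean / (2.0 * shape)) * root
    # Accept the smaller root with probability mean / (mean + x), else its reflection.
    if rng.random() <= mean / (mean + x):
        return x
    return mean * mean / x


def sample_mean_variance_mixture(mu: float, gamma: float, sigma2: float,
                                 chi: float, psi: float, rng: random.Random) -> float:
    """Y = mu + V * gamma + sqrt(V) * W with V ~ GIG(-1/2, chi, psi), W ~ N(0, sigma2).

    Assumed mapping: GIG(-1/2, chi, psi) equals IG(mean=sqrt(chi/psi), shape=chi).
    """
    v = sample_inverse_gaussian(math.sqrt(chi / psi), chi, rng)
    w = rng.gauss(0.0, math.sqrt(sigma2))
    return mu + v * gamma + math.sqrt(v) * w


rng = random.Random(2020)
mixing = [sample_inverse_gaussian(1.0, 1.0, rng) for _ in range(20000)]
errors = [sample_mean_variance_mixture(0.0, -4.0, 1.0, 1.0, 1.0, rng) for _ in range(20000)]
```

Under these assumptions $E [Y] = μ + γ E [V]$ , so a negative γ shifts and skews the errors to the left, while smaller values of `chi` give a more dispersed mixing variable and hence heavier tails.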

In this study, we consider 100 samples of size 200 from a linear censored regression model, where the errors follow a GH distribution with right-censoring level $10 %$ , $x_{i}^{T} = (1, x_{i 1}, x_{i 2})$ , such that $x_{i 1} \sim U (1, 5)$ and $x_{i 2} \sim U (0, 1),$ and parameter values given by $β = (β_{0}, β_{1}, β_{2})^{⊤} = (1, - 1, - 4)^{⊤}$ , $σ^{2} = 1, γ = - 4$ and the following values for $ν$ : situation 1 – $ν = (- 0.5, 0.1, 0.1)$ or situation 2 – $ν = (- 0.5, 1, 1)$ , with the first situation considered for $ν$ providing a distribution with greater kurtosis. For each sample, we fitted the SN-CR, ST-CR, SCN-CR, ST $_{B}$ -CR and SCN $_{B}$ -CR models (sub-index B indicates the censored model based on the SMSN distributions proposed by Branco and Dey [6]).

Tables 2 and 3 show that the heavy-tailed models outperform the skew-normal one for the two situations considered in this study. In fact, those models have smaller standard deviations, BIAS and MSE of the $β = (β_{0}, β_{1}, β_{2})^{⊤}$ parameter estimates. The variance components are not comparable since they are on different scales. In addition, Monte Carlo means of the model comparison criteria (MC AIC and MC BIC) strongly favor the heavy-tailed ones. It can also be seen that, in general, the standard deviations, BIAS and MSE of $β$ under the ST-CR and SCN-CR models are smaller than under the ST $_{B}$ -CR and SCN $_{B}$ -CR models, indicating that our proposed models are capable of producing more accurate and precise estimates. According to the MC AIC (or MC BIC) values, the SSMN-CR models fit the data well compared with their competitors. See Tables 2 and 3, where the best fit is indicated by (1*), the second best by (2*) and the third best by (3*).

Table 2. Experiment II – Study II – $ν = (- 0.5, 0.1, 0.1)$ : MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting SN-CR, ST-CR, SCN-CR, ST $_{B}$ -CR and SCN $_{B}$ -CR models.

Fit		Statistics				Criteria
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
SN-CR	$β_{0}$	−1.4161	0.9674	2.4161	6.7639
	$β_{1}$	−1.8813	0.4768	0.8813	1.0018
	$β_{2}$	−7.3692	1.6299	3.3692	13.9816	1261.1480	1294.1310
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
ST-CR	$β_{0}$	1.2344	0.3708	0.2828	0.1910
	$β_{1}$	−1.0060	0.0567	0.0407	0.0032
	$β_{2}$	−4.0431	0.2307	0.1641	0.0546	859.0534 (1*)	875.5450 (1*)
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
SCN-CR	$β_{0}$	2.2601	0.5887	1.2601	1.9310
	$β_{1}$	−1.0568	0.0712	0.0713	0.0082
	$β_{2}$	−4.2575	0.2555	0.2920	0.1309	1038.280	1054.772
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
ST $_{B}$ -CR	$β_{0}$	−2.5602	0.8171	3.5602	13.3357
	$β_{1}$	−1.2343	0.1011	0.2343	0.0650
	$β_{2}$	−4.9221	0.3660	0.9221	0.9829	938.4083 (3*)	954.8999 (3*)
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
SCN $_{B}$ -CR	$β_{0}$	−2.2265	0.7843	3.2265	11.0193
	$β_{1}$	−1.2291	0.0948	0.2291	0.0614
	$β_{2}$	−4.9104	0.3668	0.9104	0.9620	927.4107 (2*)	950.4989 (2*)

Table 3. Experiment II – Study II – $ν = (- 0.5, 1, 1)$ : MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting SN-CR, ST-CR, SCN-CR, ST $_{B}$ -CR and SCN $_{B}$ -CR models.

Fit		Statistics				Criteria
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
SN-CR	$β_{0}$	−2.5447	0.5001	3.5447	12.8124
	$β_{1}$	−1.1524	0.1255	0.1640	0.0388
	$β_{2}$	−4.7063	0.4903	0.7236	0.7369	993.5653	1026.5484
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
ST-CR	$β_{0}$	−0.1049	0.3875	1.1049	1.3694
	$β_{1}$	−0.9752	0.1037	0.0862	0.0113
	$β_{2}$	−3.8768	0.4076	0.3396	0.1797	946.4920 (3*)	962.9836 (2*)
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
SCN-CR	$β_{0}$	0.0651	0.3882	0.9349	1.0232
	$β_{1}$	−0.9751	0.1025	0.0836	0.0110
	$β_{2}$	−3.8720	0.4031	0.3380	0.1772	947.6776	964.1692 (3*)
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
ST $_{B}$ -CR	$β_{0}$	−2.8953	0.4419	3.8953	15.3664
	$β_{1}$	−1.0503	0.0959	0.0850	0.0116
	$β_{2}$	−4.2305	0.3831	0.3568	0.1985	936.5145 (1*)	953.0061 (1*)
	$β$	MC Mean	MC SD	BIAS	MSE	MC AIC	MC BIC
SCN $_{B}$ -CR	$β_{0}$	−2.5667	0.4444	3.5667	12.9167
	$β_{1}$	−1.0828	0.1016	0.1043	0.0171
	$β_{2}$	−4.3755	0.4135	0.4456	0.3103	942.9701 (2*)	966.0584

4.3. Experiment III: imputation of censored observations

To deal with the censored values, we use the imputation procedure by replacing the censored values by $E [Y_{i} | v_{i}, ρ_{i}, {\hat{θ}}^{(k)}]$ obtained from the MCEM algorithm. Therefore, when the censored values are imputed, a complete dataset is obtained. The reason to use the imputation procedure is that it avoids computing truncated conditional expectations of the SSMN distributions originated by the censoring scheme.

In this section, we are interested in predicting the censored observations, denoted by $y_{i}^{c}$ . In the implementation of the MCEM algorithm, at the kth iteration, the predictions of the censored observations, denoted by ${\tilde{y}}_{i}^{c (k)}$ , are calculated as ${\tilde{y}}_{i}^{c (k)} = \frac{1}{m} \sum_{l = 1}^{m} y_{i}^{c (k, l)}, i = 1, \dots, n$ . It is important to remark that the components $y_{i}^{c (k, l)}$ are obtained without additional computational effort from the E-step of the proposed MCEM algorithm. Although we can obtain predicted values of the censored responses at every iteration of the algorithm, we consider only the values of these predictions at the last iteration of the MCEM algorithm.

In this experiment, we consider 100 samples of size 100 from a SCN-CR model with right-censoring levels $5 %, 10 %, 20 %$ or $30 %$ , $x_{i}^{T} = (1, x_{i 1}, x_{i 2})$ , such that $x_{i 1} \sim U (1, 5)$ and $x_{i 2} \sim U (0, 1),$ and the same configuration for the model parameters as previously defined in Experiment II (see Section 4.2). Then, for each sample, we fitted the SN-CR, ST-CR and SSL-CR models, and the predictions of the censored observations were recorded. In order to investigate the performance of the prediction when the model distribution is misspecified, we considered two empirical discrepancy measures, namely the MAE (mean absolute error) and MSE (mean square error); see Matos et al. [22] for more details. These measures are given by

\mathrm{MAE} = \frac{1}{100} \sum_{i, j} | y_{i, j} - {\tilde{y}}_{i, j}^{c} | \quad \text{and} \quad \mathrm{MSE} = \frac{1}{100} \sum_{i, j} {(y_{i, j} - {\tilde{y}}_{i, j}^{c})}^{2},

where $y_{i, j}$ is the original value and ${\tilde{y}}_{i, j}^{c}$ is the predicted value for the ith censored observation in the jth simulation, for $j = 1, \dots, 100$ and $i = 1, \dots, n_{c},$ such that $n_{c}$ is the number of censored observations $(5, 10, 20$ or $30)$ depending on the censoring level considered.
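The two discrepancy measures can be sketched in Python as follows. The data layout (one list of `(original, predicted)` pairs per simulated sample) and the function name are hypothetical choices made for illustration. As in the formulas above, the sums run over every censored observation of every sample and are divided by the number of samples, not by the total number of censored observations.

```python
from typing import List, Sequence, Tuple


def mae_mse(simulations: Sequence[Sequence[Tuple[float, float]]]) -> Tuple[float, float]:
    """Return (MAE, MSE) with the sums over all pairs divided by the number of samples.

    Each element of `simulations` holds the (original, predicted) pairs of the
    censored observations of one simulated sample.
    """
    n_sims = len(simulations)
    if n_sims == 0:
        raise ValueError("at least one simulated sample is required")
    abs_total = 0.0
    sq_total = 0.0
    for pairs in simulations:
        for original, predicted in pairs:
            diff = original - predicted
            abs_total += abs(diff)
            sq_total += diff * diff
    return abs_total / n_sims, sq_total / n_sims


# Toy usage with two samples (values are illustrative, not from the paper).
toy: List[List[Tuple[float, float]]] = [[(1.0, 0.5), (2.0, 2.5)], [(0.0, 1.0)]]
toy_mae, toy_mse = mae_mse(toy)
```

With this normalization both measures grow with $n_{c}$ , so values at different censoring levels are not on a per-observation scale.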

Table 4 shows the comparison between the predicted values and the real ones under the SN-CR, ST-CR and SSL-CR with different censoring levels. One can see from these results that the ST-CR and SSL-CR models generate predictive values close to the real ones. As expected, the MCEM algorithm provides a satisfactory imputation for these censored values when heavy-tailed distributions are used. Finally, there is a loss of accuracy in predicting the censored observations as the censoring level increases.

Table 4. Evaluation of the prediction accuracy for the SN-CR, ST-CR and SSL-CR models with different censoring levels.

Censoring levels	Measures	SN-CR	ST-CR	SSL-CR
$5 %$	MAE	1.5993	1.5778	1.5801
	MSE	0.9037	0.8342	0.8600
$10 %$	MAE	4.1277	3.9365	4.0320
	MSE	3.0452	2.6905	2.8763
$20 %$	MAE	10.2852	9.7301	9.9687
	MSE	9.25091	7.9992	8.5780
$30 %$	MAE	18.3699	17.2280	17.8197
	MSE	18.4904	16.1069	17.3085

4.4. Experiment IV: influence of a single outlier

The robustness aspects of the SSMN-CR models can be studied considering the influence of a single outlying observation on the estimates of $θ$ . Without loss of generality, we simulated one dataset from the skew-normal trigonometric regression model, such that

y_{i} = β_{0} + β_{1} x_{i} + β_{2} \sin (6 π x_{i}) + ξ_{i}, i = 1, \dots, 200,

$x_{i} \sim U (0, 1)$ , $ξ \sim S N (0, 0.1, - 4),$ where $β = (- 1, 1, 2)^{⊤},$ with right-censoring level $10 %$ ( $c_{i} = 0.0705$ ). For this sample, we fitted the SN-CR, ST-CR and SSL-CR models. We analyzed the influence of a change of δ units in a single observation $y_{i}$ on the estimates of $θ$ . First, we replaced the observation $y_{i}$ with the contaminated value $y_{i} (δ) = y_{i} - δ .$ In this example, we contaminated the uncensored observation i = 200 $(y_{200} = - 0.3955)$ and varied δ between 0 and 20. Figure 16 shows the scatter plot of the data and illustrates the contamination of the observation 200 (denoted by asterisk).
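The data-generating and contamination steps described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' code: $S N (0, 0.1, - 4)$ is read as location 0, squared scale 0.1 and shape −4, the skew-normal draw uses the standard stochastic representation $δ | U_{0} | + \sqrt{1 - δ^{2}} U_{1}$ with $δ = λ / \sqrt{1 + λ^{2}}$ , and the censoring cutoff is taken as the order statistic that leaves exactly the requested fraction of responses above it. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_skew_normal(loc: float, scale2: float, shape: float, rng: random.Random) -> float:
    """One draw from SN(loc, scale2, shape) via the half-normal representation."""
    delta = shape / math.sqrt(1.0 + shape * shape)
    u0 = abs(rng.gauss(0.0, 1.0))
    u1 = rng.gauss(0.0, 1.0)
    return loc + math.sqrt(scale2) * (delta * u0 + math.sqrt(1.0 - delta * delta) * u1)


def simulate_censored_trig(n: int, beta: Sequence[float], scale2: float, shape: float,
                           cens_frac: float, rng: random.Random
                           ) -> Tuple[List[float], List[float], List[float], List[bool], float]:
    """Simulate y_i = b0 + b1 x_i + b2 sin(6 pi x_i) + error and right-censor the top fraction."""
    x = [rng.random() for _ in range(n)]
    y = [beta[0] + beta[1] * xi + beta[2] * math.sin(6.0 * math.pi * xi)
         + sample_skew_normal(0.0, scale2, shape, rng) for xi in x]
    n_cens = round(cens_frac * n)
    if not 0 <= n_cens < n:
        raise ValueError("censoring fraction must leave at least one uncensored response")
    cutoff = sorted(y)[n - n_cens - 1]      # exactly n_cens responses lie above it
    observed = [min(yi, cutoff) for yi in y]
    censored = [yi > cutoff for yi in y]
    return x, y, observed, censored, cutoff


def contaminate(values: Sequence[float], index: int, delta: float) -> List[float]:
    """Return a copy of `values` with y_index replaced by y_index - delta."""
    out = list(values)
    out[index] -= delta
    return out
```

Refitting each model on `contaminate(observed, index, delta)` for a grid of `delta` values between 0 and 20 would then reproduce the structure of the experiment, with `index` chosen among the uncensored observations.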

Figure 16. — (a) Scatter plot of the data from the skew-normal trigonometric regression model with right-censoring level $10 %$ . (b) Contamination of observation 200 ranging δ from 0 to 20.

Following Fagundes et al. [8], the influence of a single outlier on the estimates can be assessed based on the mean magnitude of relative error (MMER), which is defined as follows: suppose that $φ = (φ_{1}, \dots, φ_{n_{p}})$ is a generic vector of parameters and that ${\hat{φ}}_{j} (δ)$ is the estimate of $φ_{j}$ after contamination of the data. Then, we define $M M E R (φ) = \sum_{j = 1}^{n_{p}} n_{p}^{- 1} | ({\hat{φ}}_{j} (δ) - {\hat{φ}}_{j}) / {\hat{φ}}_{j} |,$ where ${\hat{φ}}_{j}$ is the ML estimate of $φ_{j}$ obtained from the uncontaminated data. For example, we evaluate $M M E R (φ),$ where $φ = θ^{(1)}$ or $θ^{(2)},$ with $θ^{(1)} = (β_{0}, β_{1}, β_{2})^{⊤} = β^{⊤}$ and $θ^{(2)} = (σ^{2}, λ)^{⊤}$ .
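The MMER definition translates directly into a few lines of Python. The function name and the toy numbers are hypothetical; the only behaviour assumed beyond the formula is that a baseline estimate equal to zero is rejected, since the relative error is then undefined.

```python
from typing import Sequence


def mmer(baseline: Sequence[float], contaminated: Sequence[float]) -> float:
    """Mean magnitude of relative error between contaminated and baseline estimates."""
    if len(baseline) != len(contaminated) or not baseline:
        raise ValueError("estimate vectors must be non-empty and of equal length")
    if any(b == 0.0 for b in baseline):
        raise ValueError("relative error is undefined for a zero baseline estimate")
    total = sum(abs((c - b) / b) for b, c in zip(baseline, contaminated))
    return total / len(baseline)


# Toy usage: relative changes of 50% and 25% average to 0.375.
toy_mmer = mmer([2.0, -4.0], [3.0, -5.0])
```

The same function applies unchanged to a vector of predictions of the censored observations, with the uncontaminated predictions as the baseline.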

In Figure 17, we present the results of the MMERs for different contaminations δ. In Figures 17(a,b), as expected, the estimates in models with heavy tails are less affected by variations of δ than those in the SN-CR model. As typically considered in the literature, the relevance of using the SSMN distributions is related to their capability of down-weighting outlying observations. In addition, Figure 17(c) shows the BIC values for all fitted models, for each disturbed version of the original dataset. Clearly, as the observation becomes more atypical, the heavy-tailed models fit the data better.

In addition, this simulation study also evaluates the influence of a single aberrant observation on the prediction of censored components via the MCEM algorithm. In this context, suppose that ${\tilde{y}}^{c} = ({\tilde{y}}_{1}^{c}, \dots, {\tilde{y}}_{n_{c}}^{c})$ is a generic vector of predictions of the censored observations and thus, we consider $M M E R ({\tilde{y}}^{c}) = \sum_{j = 1}^{n_{c}} \frac{1}{n_{c}} | ({\tilde{y}}_{j}^{c} (δ) - {\tilde{y}}_{j}^{c}) / {\tilde{y}}_{j}^{c} |,$ where ${\tilde{y}}_{j}^{c} (δ)$ are the predictions of the censored observations after contamination of the data. In this example, we have 20 censored observations and Figure 17(d) shows the results of the MMERs for different contaminations δ. Not surprisingly, the models with heavy tails have better performance in predicting the censored observations, showing their robustness to discrepant observations.

5. Application: stellar abundances dataset

In this section, we illustrate our proposed methods with a dataset obtained from Santos et al. [25], which is available, e.g. in the R package astrodatR [9], under the name Stellar abundances. This dataset contains measurements for 68 solar-type stars and, for our analysis, following Mattos et al. [23], we considered $\log N (B e)$ as the response variable, which represents the log of the abundance of the light element beryllium (Be) in stars scaled to the Sun's abundance (i.e. the Sun has $\log N (B e) = 0.0$ ), and $T e f f / 1000$ as the explanatory variable, which represents the effective stellar surface temperature (in kelvin). According to Feigelson and Babu [10], due to limited sensitivities, some objects may be undetected, leading to upper limits in their derived luminosities. For this dataset we have 12 left-censored data points, i.e. 12 undetected beryllium measurements, which represent $19.35 %$ of the observations.

Mattos et al. [23] fitted various SMSN-CR models and concluded that the ST-CR $_{B}$ model (sub-index B indicates the censored model based on the SMSN distributions proposed by Branco and Dey [6]) seems to fit the Stellar abundances data better. Thus, we analyzed the Stellar abundances dataset with the aim of providing additional inferences by using SSMN distributions in the context of linear censored regression models. Table 5 contains the ML estimates for the parameters of the three models, i.e. ST-CR, SSL-CR and SCN-CR $(τ = (τ, γ)^{⊤})$ models, together with their corresponding standard errors calculated via the empirical information matrix. We refer the interested reader to Table 5 in Mattos et al. [23], which contains the ML estimates of the parameters from the SMSN-CR models, including the SN-CR model.

Table 5. Stellar abundances dataset: Estimated parameter values of the SSMN-CR models via the MCEM algorithm with corresponding approximate standard errors (SE).

	ST-CR		SSL-CR		SCN-CR
Parameter	Estimate	SE	Estimate	SE	Estimate	SE
$β_{0}$	−1.8713	0.0318	−1.8049	0.0233	−1.7203	0.0233
$β_{1}$	0.5245	0.0054	0.5167	0.0040	0.5029	0.0040
$σ^{2}$	0.0333	0.0091	0.0283	0.0070	0.0474	0.0070
λ	−1.9058	0.4156	−2.0179	0.4038	−2.6846	0.4038
τ	2.0101	–	1.0101	–	0.2979	–
γ	–	–	–	–	0.1000	–

The results of the fit in terms of log-likelihood, AIC and BIC are provided in Table 6. Note that the SN-CR model does not seem to fit the data well. We can see that the models with the highest log-likelihood are the ST-CR, the ST-CR $_{B}$ and the SSL-CR $_{B}$ models. We also see that both the AIC and BIC criteria favor the ST-CR model, and then the ST-CR $_{B}$ model closely followed by the SSL-CR $_{B}$ model, i.e. models that have heavy tails and asymmetric behavior.

Table 6. Stellar abundances dataset: Comparison of the maximized log-likelihood, AIC and BIC for the various models fitted to the stellar abundances data.

SSMN-CR models	log-likelihood	AIC	BIC
ST-CR	−1.7802	11.5605 (1*)	20.4385 (1*)
SCN-CR	−4.4375	16.8750	25.7531
SSL-CR	−3.2474	14.4949	23.3729
SMSN-CR models	log-likelihood	AIC	BIC
SN-CR	−18.2276	44.4553	53.3333
ST-CR $_{B}$	−2.1278	12.2556 (2*)	21.1336 (2*)
SCN-CR $_{B}$	−3.7473	15.4946	24.3372
SSL-CR $_{B}$	−2.7253	13.4506 (3*)	22.3287 (3*)

Note: Best fit indicated by (1*), second best by (2*), and third best by (3*).

6. Conclusions

In this work, we have proposed a linear regression model with censored responses based on skew scale mixtures of normal distributions, denoted by SSMN-CR models, as a replacement for the conventional choice of normal (or symmetric) distribution for censored linear models. Our results generalize the recent works by Arellano-Valle et al. [2], Massuia et al. [20] and Garay et al. [15] from a frequentist point of view. Also, the results of this paper are a necessary supplement to those presented in Mattos et al. [23] in the sense that both classes of asymmetric distributions, SMSN and SSMN, are special cases of the SSMSN family proposed by Arellano-Valle et al. [3].

An MCEM algorithm is developed by exploring the statistical properties of the class considered, which is implemented in the R software [24]. R code for analyzing the application may be downloaded from the website https://github.com/ClecioFerreira/SSMN-CR. The developed algorithm can be viewed as consisting of two parts, associated with the uncensored data and censored data, respectively. In the case of no censoring, the algorithm naturally reduces to the standard EM algorithm (see [13]). Simulation results under various scenarios and real data analysis indicate that the proposed method can be used to model data that present asymmetry and heavy tails with great flexibility.

Finally, the method proposed in this paper can be extended to multivariate settings and to diagnostic analysis in the SSMN-CR models. Besides, as pointed out by a referee, further research could focus on obtaining closed-form (or implementable) expressions for the conditional expectations in the E-step, such as the recent proposal of Lachos et al. [17] (see also [14]) in the context of SMSN distributions. Although relevant, a deeper investigation of these moments is beyond the scope of the present paper. We thank the anonymous referee for valuable suggestions for future research. We hope to report these findings in a future paper.

Acknowledgments

Camila Borelli Zeller was supported by CNPq and FAPEMIG.

Appendices.

Appendix 1. E-step in the MCEM algorithm.

E-step: The notation used is that of Section 2.2.

For a censored observation i: In this case, we have $ρ_{i} = 1$ , $Y_{i} \leq c_{i}$ and $V_{i} = c_{i}$ . At the kth iteration, considering the conditional expectations ${\hat{t^{2}}}_{i}^{(k)},$ ${\hat{t}}_{i}^{(k)}$ and ${\hat{u}}_{i}^{(k)}$ given in Ferreira et al. [13], for the SN, ST, SSL and SCN distributions, we have that

$E [U_{i} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = E [Y_{i}^{s} E [U_{i} | y_{i}, {\hat{θ}}^{(k)}] | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}],$ for $s = 0, 1, 2,$ where

$E [U_{i} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = \frac{1}{m} \sum_{l = 1}^{m} [y_{i}^{s (l, k)} (\frac{τ + 1}{τ + d_{i}^{(l, k)}})]$ for the ST-CR model,

$E [U_{i} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = \frac{1}{m} \sum_{l = 1}^{m} [y_{i}^{s (l, k)} (\frac{(2 τ + 1)}{d_{i}^{(l, k)}} \frac{P_{1} (τ + 3 / 2, d_{i}^{(l, k)} / 2)}{P_{1} (τ + 1 / 2, d_{i}^{(l, k)} / 2)})]$ for the SSL-CR model and

$E [U_{i} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = \frac{1}{m} \sum_{l = 1}^{m} [y_{i}^{s (l, k)} (\frac{1 - τ_{1} + τ_{1} τ_{2}^{3 / 2} \exp {(1 - τ_{2}) d_{i}^{(l, k)} / 2}}{1 - τ_{1} + τ_{1} τ_{2}^{1 / 2} \exp {(1 - τ_{2}) d_{i}^{(l, k)} / 2}})]$ for the SCN-CR model $(τ = (τ_{1}, τ_{2})^{⊤})$ , where $d_{i}^{(l, k)} = \frac{(y_{i}^{(l, k)} - {\hat{μ}}_{i}^{(k)})^{2}}{{\hat{σ^{2}}}^{(k)}}$ and $P_{x} (a, b)$ denotes the cdf of the Gamma $(a, b)$ distribution evaluated at x.
$E [T_{i} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = E [Y_{i}^{s} E [T_{i} | y_{i}, {\hat{θ}}^{(k)}] | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] .$ Then, $E [T_{i} Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = \frac{1}{m} \sum_{l = 1}^{m} [y_{i}^{s (l, k)} ({\hat{λ}}^{(k)} [y_{i}^{(l, k)} - {\hat{μ}}_{i}^{(k)}] + {\hat{σ}}^{(k)} W_{Φ} (\frac{{\hat{λ}}^{(k)} (y_{i}^{(l, k)} - {\hat{μ}}_{i}^{(k)})}{{\hat{σ}}^{(k)}}))],$ for $s = 0, 1,$ where $W_{Φ} (u) = \frac{φ_{1} (u)}{Φ (u)}$ .
$E [T_{i}^{2} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = E [E [T_{i}^{2} | y_{i}, {\hat{θ}}^{(k)}] | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}]$ . Then, $E [T_{i}^{2} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] =$

$\frac{1}{m} \sum_{l = 1}^{m} [{\hat{λ}}^{2 (k)} [y_{i}^{(l, k)} - {\hat{μ}}_{i}^{(k)}]^{2} + {\hat{σ^{2}}}^{(k)} + {\hat{σ}}^{(k)} {\hat{λ}}^{(k)} [y_{i}^{(l, k)} - {\hat{μ}}_{i}^{(k)}] W_{Φ} (\frac{{\hat{λ}}^{(k)} (y_{i}^{(l, k)} - {\hat{μ}}_{i}^{(k)})}{{\hat{σ}}^{(k)}})] .$
$E [Y_{i}^{s} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}] = \frac{1}{m} \sum_{l = 1}^{m} y_{i}^{s (l, k)},$ for $s = 1, 2.$
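The Monte Carlo averages of this appendix can be sketched in Python for the ST-CR and SCN-CR weights and for the term involving $W_{Φ}$ . This is a minimal sketch, not the authors' R implementation: the function names are hypothetical, the draws $y_{i}^{(l, k)}$ are assumed to be supplied by a separate sampler, the SCN weight is written after dividing numerator and denominator by the exponential term (algebraically equivalent, and it avoids overflow for large $d$ ), and the fallback $W_{Φ} (u) \approx - u$ for extremely negative $u$ is an approximation added here for numerical safety. The SSL-CR weight is omitted because it needs the Gamma cdf, which the Python standard library does not provide.

```python
import math
from typing import Callable, Sequence


def weight_st(d: float, tau: float) -> float:
    """E[U | y] for the ST-CR model: (tau + 1) / (tau + d)."""
    return (tau + 1.0) / (tau + d)


def weight_scn(d: float, tau1: float, tau2: float) -> float:
    """E[U | y] for the SCN-CR model, rescaled by exp(-(1 - tau2) d / 2) for stability."""
    decay = math.exp(-(1.0 - tau2) * d / 2.0)
    numerator = (1.0 - tau1) * decay + tau1 * tau2 ** 1.5
    denominator = (1.0 - tau1) * decay + tau1 * tau2 ** 0.5
    return numerator / denominator


def w_phi(u: float) -> float:
    """W_Phi(u) = phi(u) / Phi(u) for the standard normal pdf and cdf."""
    cdf = 0.5 * math.erfc(-u / math.sqrt(2.0))
    if cdf == 0.0:
        return -u                      # tail approximation when Phi(u) underflows
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi) / cdf


def cond_mean_t(y: float, mu: float, sigma2: float, lam: float) -> float:
    """E[T | y] = lam (y - mu) + sigma W_Phi(lam (y - mu) / sigma)."""
    sigma = math.sqrt(sigma2)
    resid = y - mu
    return lam * resid + sigma * w_phi(lam * resid / sigma)


def mc_expect_u_ys(draws: Sequence[float], s: int, mu: float, sigma2: float,
                   weight: Callable[[float], float]) -> float:
    """Monte Carlo average (1/m) sum_l y_l^s * weight(d_l), with d_l = (y_l - mu)^2 / sigma2."""
    total = 0.0
    for y in draws:
        d = (y - mu) ** 2 / sigma2
        total += (y ** s) * weight(d)
    return total / len(draws)
```

For example, `mc_expect_u_ys(draws, 1, mu, sigma2, lambda d: weight_st(d, tau))` approximates $E [U_{i} Y_{i} | v_{i}, Y_{i} \leq c_{i}, {\hat{θ}}^{(k)}]$ under the ST-CR model once `draws` holds the sampled values for observation i.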

Appendix 2. Complementary tables.

Table A1. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the SN-CR model.

			Parameters
Fit			$β_{0}$	$β_{1}$	$σ^{2}$	λ
SN-CR	n = 50	MC Mean	5.1117	1.0033	0.8764	4.0716
		MC SD	0.3700	0.0321	0.3172	3.3792
		BIAS	0.2628	0.0256	0.2772	2.5540
		MSE	0.1480	0.0010	0.1149	12.4533
	n = 100	MC Mean	5.0397	1.0010	0.9126	3.1606
		MC SD	0.1533	0.0218	0.2034	1.3715
		BIAS	0.1177	0.0168	0.1803	0.9664
		MSE	0.0248	0.0005	0.0486	1.8879
	n = 200	MC Mean	4.9934	1.0017	0.9686	3.1818
		MC SD	0.1091	0.0139	0.1579	0.9829
		BIAS	0.0882	0.0116	0.1295	0.7352
		MSE	0.0118	0.0002	0.0257	0.9895
	n = 400	MC Mean	5.0379	0.9971	0.9222	2.8531
		MC SD	0.0733	0.0108	0.0989	0.5434
		BIAS	0.0656	0.0089	0.1035	0.4633
		MSE	0.0068	0.0001	0.0157	0.3140
	n = 600	MC Mean	5.0367	0.9968	0.9234	2.8786
		MC SD	0.0483	0.0081	0.0895	0.4779
		BIAS	0.0496	0.0070	0.0958	0.4064
		MSE	3.6514e−03	7.5503e−05	1.3798e−02	2.4086e−01

Table A2. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the ST-CR model.

			Parameters
Fit			$β_{0}$	$β_{1}$	$σ^{2}$	λ
ST-CR	n = 50	MC Mean	5.1159	0.9979	0.8569	3.5389
		MC SD	0.2566	0.0329	0.3755	2.4125
		BIAS	0.2269	0.0269	0.3354	1.8752
		MSE	0.0786	0.0011	0.1601	6.0526
	n = 100	MC Mean	5.0498	1.0006	0.8687	3.0797
		MC SD	0.1545	0.0201	0.2704	1.5868
		BIAS	0.1241	0.0157	0.2436	1.1727
		MSE	0.0261	0.0004	0.0896	2.4991
	n = 200	MC Mean	5.0031	1.0005	0.9954	3.2745
		MC SD	0.1518	0.0172	0.1999	1.3447
		BIAS	0.1172	0.0130	0.1786	0.8941
		MSE	0.0228	0.0003	0.0410	1.8656
	n = 400	MC Mean	5.0539	1.0002	0.8382	2.5117
		MC SD	0.0794	0.0105	0.1224	0.5376
		BIAS	0.0768	0.0088	0.1546	0.6256
		MSE	0.0091	0.0001	0.0396	0.5246
	n = 600	MC Mean	5.0457	1.0003	0.8622	2.5864
		MC SD	0.0695	0.0090	0.1143	0.5060
		BIAS	0.0676	0.0072	0.1496	0.5394
		MSE	6.8751e−03	8.0643e−05	3.1923e−02	4.2452e−01

Table A3. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the SSL-CR model.

			Parameters
Fit			$β_{0}$	$β_{1}$	$σ^{2}$	λ
SSL-CR	n = 50	MC Mean	5.1150	0.9930	0.9192	4.0079
		MC SD	0.4000	0.0324	0.2857	2.9206
		BIAS	0.2354	0.0253	0.2376	2.1866
		MSE	0.1716	0.0011	0.0874	9.4603
	n = 100	MC Mean	5.0714	1.0001	0.8839	3.1415
		MC SD	0.2213	0.0237	0.2628	1.8025
		BIAS	0.1759	0.0189	0.2292	1.2252
		MSE	0.0536	0.0005	0.0819	3.2365
	n = 200	MC Mean	4.9934	1.00037	1.010	3.2875
		MC SD	0.1138	0.0168	0.1482	1.0242
		BIAS	0.0929	0.0136	0.1196	0.7542
		MSE	0.0129	0.0003	0.0218	1.1212
	n = 400	MC Mean	5.0493	0.9952	0.9031	2.8113
		MC SD	0.0842	0.0123	0.1113	0.4879
		BIAS	0.0786	0.0102	0.1206	0.4222
		MSE	0.0094	0.0002	0.0216	0.2713
	n = 600	MC Mean	5.0457	0.9958	0.9154	2.8626
		MC SD	0.0644	0.0090	0.0942	0.4728
		BIAS	0.0657	0.0081	0.1059	0.3980
		MSE	6.1938e−03	9.7448e−05	1.5947e−02	2.4017e−01

Table A4. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the SCN-CR model.

			Parameters
Fit			$β_{0}$	$β_{1}$	$σ^{2}$	λ
SCN-CR	n = 50	MC Mean	5.0916	1.0021	0.9308	3.5118
		MC SD	0.2524	0.0365	0.4163	2.4528
		BIAS	0.20968	0.0289	0.3548	1.8838
		MSE	0.0714	0.0013	0.1764	6.2180
	n = 100	MC Mean	5.0817	1.0014	0.8307	2.8386
		MC SD	0.1788	0.0223	0.2375	1.6371
		BIAS	0.1483	0.0178	0.2423	1.2717
		MSE	0.0383	0.0005	0.0845	2.6794
	n = 200	MC Mean	5.0049	0.9999	1.0096	3.3595
		MC SD	0.1071	0.0131	0.2015	1.1964
		BIAS	0.0838	0.0103	0.1609	0.9328
		MSE	0.0117	0.0001	0.0403	1.5463
	n = 400	MC Mean	5.0548	0.9972	0.8862	2.7467
		MC SD	0.0937	0.0114	0.1325	0.6159
		BIAS	0.0816	0.00954	0.1480	0.5433
		MSE	0.0114	0.0002	0.0303	0.4397
	n = 600	MC Mean	5.0367	0.9968	0.9234	2.8786
		MC SD	0.0483	0.0081	0.0895	0.4779
		BIAS	0.0496	0.0070	0.0958	0.4064
		MSE	3.6514e−03	7.5503e−05	1.3798e−02	2.4086e−01

Table A5. Experiment II – Study I – without censorship and $5 %, 10 %, 20 %$ and $30 %$ censorships: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting SSMN-CR models.

			Parameters
Cens.	Fit		$β_{0}$	$β_{1}$	$β_{2}$
$0 %$	SN-CR	MC Mean	1.2085	−1.0076	−4.0386
		MC SD	0.3412	0.0888	0.2686
		BIAS	0.3256	0.0629	0.2153
		MSE	0.1664	0.0066	0.0729
		MC CP $(%)$	91	91	91
	ST-CR	MC Mean	0.9348	−1.0051	−4.0446
		MC SD	0.3333	0.0776	0.2508
		BIAS	0.2425	0.0625	0.2086
		MSE	0.0941	0.0064	0.0642
		MC CP $(%)$	98	98	98
	SSL-CR	MC Mean	0.9963	−1.0059	−4.0473
		MC SD	0.3371	0.0807	0.2526
		BIAS	0.2470	0.0605	0.2098
		MSE	0.0969	0.0060	0.0654
		MC CP $(%)$	97	97	97
$5 %$	SN-CR	MC Mean	0.7449	−0.9869	−3.9911
		MC SD	0.3199	0.0804	0.3297
		BIAS	0.3285	0.0695	0.2618
		MSE	0.1588	0.0072	0.1077
		MC CP $(%)$	88	88	88
	ST-CR	MC Mean	1.0867	−0.9907	−4.0016
		MC SD	0.2956	0.0796	0.3058
		BIAS	0.2779	0.0632	0.2387
		MSE	0.1142	0.0060	0.0926
		MC CP $(%)$	96	96	96
	SSL-CR	MC Mean	0.9069	−0.9878	−3.9854
		MC SD	0.2985	0.0767	0.3036
		BIAS	0.2778	0.0648	0.2380
		MSE	0.1125	0.0065	0.0915
		MC CP $(%)$	95	95	95
$10 %$	SN-CR	MC Mean	0.9468	−0.9607	−3.9132
		MC SD	0.3507	0.0773	0.3951
		BIAS	0.4186	0.0708	0.3153
		MSE	0.2652	0.0079	0.1621
		MC CP $(%)$	78	78	78
	ST-CR	MC Mean	0.6126	−0.9655	−3.9511
		MC SD	0.3409	0.0759	0.3462
		BIAS	0.2808	0.0689	0.2805
		MSE	0.1246	0.0071	0.1211
		MC CP $(%)$	95	95	95
	SSL-CR	MC Mean	0.7741	−0.9635	−3.9353
		MC SD	0.3361	0.0733	0.3497
		BIAS	0.3166	0.0678	0.2832
		MSE	0.1628	0.0066	0.1253
		MC CP $(%)$	85	85	85
$20 %$	SN-CR	MC Mean	0.4142	−0.9465	−3.7297
		MC SD	0.4844	0.1016	0.3782
		BIAS	0.6136	0.0904	0.3773
		MSE	0.5411	0.0131	0.2146
		MC CP $(%)$	76	76	76
	ST-CR	MC Mean	0.7429	−0.9609	−3.7649
		MC SD	0.4471	0.0999	0.3583
		BIAS	0.4299	0.0839	0.3317
		MSE	0.2984	0.0114	0.1824
		MC CP $(%)$	89	89	89
	SSL-CR	MC Mean	0.5526	−0.9493	−3.7246
		MC SD	0.4284	0.0982	0.3461
		BIAS	0.5008	0.0860	0.3502
		MSE	0.3818	0.0121	0.1944
		MC CP $(%)$	82	82	82
$30 %$	SN-CR	MC Mean	0.2360	−0.9130	−3.7078
		MC SD	0.5684	0.1000	0.4731
		BIAS	0.8115	0.1142	0.4578
		MSE	0.8279	0.0179	0.3069
		MC CP $(%)$	66	66	66
	ST-CR	MC Mean	0.5699	−0.9260	−3.6838
		MC SD	0.4966	0.0982	0.3958
		BIAS	0.5953	0.1046	0.4263
		MSE	0.5047	0.0154	0.2551
		MC CP $(%)$	88	88	88
	SSL-CR	MC Mean	0.3360	−0.9076	−3.6342
		MC SD	0.5110	0.0973	0.4017
		BIAS	0.7316	0.1109	0.4512
		MSE	0.6995	0.0171	0.2936
		MC CP $(%)$	72	72	72

Note: MC CP is the coverage probability at $95 % .$

Funding Statement

This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico and Fundação de Amparo à Pesquisa do Estado de Minas Gerais.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

1. Andrews D.F. and Mallows C.L., Scale mixtures of normal distributions, J. Roy. Stat. Soc. B 36 (1974), pp. 99–102.
2. Arellano-Valle R.B., Castro L.M., González-Farías G., and Muñoz-Gajardo K.A., Student-t censored regression model: Properties and inference, Stat. Methods Appl. 21 (2012), pp. 453–473. doi: 10.1007/s10260-012-0199-y
3. Arellano-Valle R.B., Ferreira C.S., and Genton M.G., Scale and shape mixtures of multivariate skew-normal distributions, J. Multivariate Anal. 166 (2018), pp. 98–110. doi: 10.1016/j.jmva.2018.02.007
4. Azzalini A., A class of distributions which includes the normal ones, Scand. J. Stat. 12 (1985), pp. 171–178.
5. Berkane M., Kano Y., and Bentler P.M., Pseudo maximum likelihood estimation in elliptical theory: Effects of misspecification, Comput. Stat. Data Anal. 18 (1994), pp. 255–267. doi: 10.1016/0167-9473(94)90175-9
6. Branco M.D. and Dey D.K., A general class of multivariate skew elliptical distributions, J. Multivariate Anal. 79 (2001), pp. 99–113. doi: 10.1006/jmva.2000.1960
7. Dempster A., Laird N., and Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B 39 (1977), pp. 1–38.
8. Fagundes R.A., Souza R.M.C., and Cysneiros F.J.A., Robust regression with application to symbolic interval data, Eng. Appl. Artif. Intell. 26 (2013), pp. 564–573. doi: 10.1016/j.engappai.2012.05.004
9. Feigelson E.D., AstrodatR: Astronomical data, R package v. 0.1, 2014. Available at https://cran.r-project.org/web/packages/astrodatR/.
10. Feigelson E.D. and Babu G.J., Modern Statistical Methods for Astronomy: With R Applications, Cambridge University Press, Cambridge, 2012.
11. Ferreira C.S., Bolfarine H., and Lachos V.H., Skew scale mixture of normal distributions: Properties and estimation, Stat. Methodol. 8 (2011), pp. 154–171. doi: 10.1016/j.stamet.2010.09.001
12. Ferreira C.S., Bolfarine H., and Lachos V.H., Likelihood-based inference for multivariate skew scale mixtures of normal distributions, AStA Adv. Stat. Anal. 100 (2016), pp. 421–441. doi: 10.1007/s10182-016-0266-z
13. Ferreira C.S., Lachos V.H., and Bolfarine H., Inference and diagnostics in skew scale mixtures of normal regression models, J. Stat. Comput. Simul. 85 (2015), pp. 517–537. doi: 10.1080/00949655.2013.828057
14. Galarza C.E., Momentos de distribuições multivariadas duplamente truncadas, Tese de doutorado, Departamento de Estatística, IMECC-UNICAMP, 2020.
15. Garay A.M., Lachos V.H., Bolfarine H., and Cabral C.R.B., Linear censored regression models with scale mixtures of normal distributions, Stat. Papers 58 (2017), pp. 247–278. doi: 10.1007/s00362-015-0696-9
16. Jørgensen B., Statistical Properties of the Generalized Inverse Gaussian Distribution, Lecture Notes in Statistics, Springer, Heidelberg, 1982.
17. Lachos V.H., Garay A.M., and Cabral C.R.B., Moments of truncated skew-normal/independent distributions, Brazilian J. Probab. Stat. (in press), 2019. Available at https://www.imstat.org/wp-content/uploads/2019/03/BJPS438.pdf.
18. Lange K. and Sinsheimer J.S., Normal/independent distributions and their applications in robust regression, J. Comput. Graph. Stat. 2 (1993), pp. 175–198.
19. Louis T.A., Finding the observed information matrix when using the EM algorithm, J. Roy. Stat. Soc. B 44 (1982), pp. 226–233.
20. Massuia M.B., Cabral C.R.B., Matos L.A., and Lachos V.H., Influence diagnostics for Student-t censored linear regression models, Statistics 49 (2015), pp. 1074–1094. doi: 10.1080/02331888.2014.958489
21. Massuia M.B., Garay A.M., Lachos V.H., and Cabral C.R.B., Bayesian analysis of censored linear regression models with scale mixtures of skew-normal distributions, Stat. Interface 10 (2017), pp. 425–439. doi: 10.4310/SII.2017.v10.n3.a7
22. Matos L.A., Castro L.M., and Lachos V.H., Censored mixed-effects models for irregularly observed repeated measures with applications to HIV viral loads, Test 25 (2016), pp. 627–653. doi: 10.1007/s11749-016-0486-2
23. Mattos T.B., Garay A.M., and Lachos V.H., Likelihood-based inference for censored linear regression models with scale mixtures of skew-normal distributions, J. Appl. Stat. 45 (2018), pp. 2039–2066. doi: 10.1080/02664763.2017.1408788
24. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018. Available at http://www.R-project.org/.
25. Santos N., López R.G., Israelian G., Mayor M., Rebolo R., García-Gil A., de Taoro M.P., and Randich S., Beryllium abundances in stars hosting giant planets, Astron. Astrophys. 386 (2002), pp. 1028–1038. doi: 10.1051/0004-6361:20020280
26. Wei G.C.G. and Tanner M.A., A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, J. Am. Stat. Assoc. 85 (1990), pp. 699–704. doi: 10.1080/01621459.1990.10474930

