Abstract
A particular source of difficulty in statistical analysis is the possibility that the response variable is not completely observed for some subjects. Such incomplete observation of the response variable is called censoring. Censoring can occur for a variety of reasons, including limitations of measurement equipment, the design of the experiment, and non-occurrence of the event of interest before the end of the study. In the presence of censoring, the dependence of the response variable on the explanatory variables can be explored through regression analysis. In this paper, we examine the censoring problem in the context of asymmetric distributions; that is, we propose a linear regression model with censored responses based on skew scale mixtures of normal distributions. We develop a Monte Carlo EM (MCEM) algorithm to perform maximum likelihood inference of the parameters in the proposed linear censored regression models with skew scale mixtures of normal distributions. The MCEM algorithm is discussed with an emphasis on the skew-normal, skew Student-t-normal, skew-slash and skew-contaminated normal distributions. To examine the performance of the proposed method, we present some simulation studies and analyze a real dataset.
Keywords: Censoring, MCEM algorithm, linear regression models, skew scale mixtures of normal distributions
1. Introduction
Regression models with a censored dependent variable are applied in many fields; one example is censoring in astronomical data due to nondetections. According to Feigelson and Babu [10], owing to limited sensitivities, some objects may be undetected, leading to upper limits on their derived luminosities. All results from a study, even censored ones, should be used in the statistical analysis, since ignoring censoring can lead to misleading conclusions. The importance of censoring is reflected in the large number of articles that have been published on the topic in various journals. For instance, Arellano-Valle et al. [2] and Massuia et al. [20] proposed an extension of the CR model (regression model with censored dependent variable) with normal errors (N-CR model) to Student-t errors (T-CR model). Garay et al. [15] proposed a CR model in which the observational errors follow the scale mixtures of normal (SMN) distributions introduced by Andrews and Mallows [1] (SMN-CR models). Although these models are attractive, they are all symmetric, so we now turn our attention to asymmetric alternatives. In the asymmetric context, Massuia et al. [21] developed a Bayesian framework for CR models by assuming that the random errors follow SMSN distributions [6]. Recently, Mattos et al. [23] proposed CR models based on the SMSN class of distributions from a likelihood-based perspective (SMSN-CR models).
It is important to note that there is another family of distributions that takes into account asymmetry and heavy tails simultaneously, introduced by Ferreira et al. [11] and called skew scale mixtures of normal (SSMN) distributions. There are some important differences between the classes of SSMN and SMSN distributions. First, the mechanisms for generating random samples are slightly different, which produces different structures of distributions. Second, these classes present different coefficients of asymmetry and kurtosis. Thus, it is interesting to investigate the performance of the two classes under certain specific models, such as linear censored regression models. In this work, we propose linear regression models based on SSMN distributions for censored data, extending the works of Arellano-Valle et al. [2], Massuia et al. [20] and Garay et al. [15], and supplementing the work of Massuia et al. [21]. In this way, the article provides some additional results for the censoring problem in a context of asymmetry.
The rest of the paper is organized as follows. In Section 2, we introduce the linear censored regression models with skew scale mixtures of normal distributions, called SSMN-CR models, including an EM-type algorithm for maximum-likelihood (ML) estimation. In Section 3, we describe how to obtain the standard errors of the ML estimates of the parameters in the SSMN-CR models. In Section 4, we present the results of simulation studies that explore the proposed CR model in different contexts, i.e., performance of the ML estimates, parameter recovery and selection criteria, imputation of censored observations and influence of a single outlier. In Section 5, we show the applicability of our proposal by analyzing a real dataset. Finally, Section 6 presents some concluding remarks. All computations were carried out using the R software [24]. R code for analyzing the application may be downloaded from https://github.com/ClecioFerreira/SSMN-CR.
2. The proposed model
2.1. Skew scale mixtures of normal distributions
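In this section, we present the SSMN distributions introduced by Ferreira et al. [11]. We start with the definition of the skew-normal (SN) distribution that will be used in this work; see Azzalini [4] for more details. A random variable $Y$ has an SN distribution with location parameter $\mu$, scale parameter $\sigma^2$ and skewness parameter $\lambda$, denoted by $Y \sim \text{SN}(\mu, \sigma^2, \lambda)$, if its probability density function (pdf) is given by
(1) | $f(y) = \dfrac{2}{\sigma}\,\phi\!\left(\dfrac{y-\mu}{\sigma}\right)\Phi\!\left(\lambda\,\dfrac{y-\mu}{\sigma}\right), \quad y \in \mathbb{R},$
where $\phi(x)$ and $\Phi(x)$ are the pdf and the cumulative distribution function (cdf), respectively, of the standard normal distribution evaluated at x. The cdf of Y is given by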
(2) |
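where and . Its stochastic representation is given by
(3) | $Y \stackrel{d}{=} \mu + \sigma\left(\delta\,|T_0| + \sqrt{1-\delta^2}\,T_1\right), \quad \delta = \lambda/\sqrt{1+\lambda^2},$
where $|T_0|$ denotes the absolute value of $T_0$, $T_0 \sim N(0,1)$ and $T_1 \sim N(0,1)$ are independent, and $\stackrel{d}{=}$ means ‘distributed as’. One particular case of this distribution is the normal distribution, obtained when $\lambda = 0$.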
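As a hedged illustration, the stochastic representation of the SN distribution can be turned into a simple sampler. The sketch below is not taken from the authors' R code. It assumes the standard form $Y = \mu + \sigma(\delta|T_0| + \sqrt{1-\delta^2}\,T_1)$ with $\delta = \lambda/\sqrt{1+\lambda^2}$ and independent standard normal $T_0$ and $T_1$; the function names `rsn` and `sn_mean` are hypothetical.

```python
import math
import random


def rsn(n, mu=0.0, sigma2=1.0, lam=0.0, rng=None):
    """Draw n values from SN(mu, sigma2, lam) via the stochastic representation.

    Assumed form: Y = mu + sigma * (delta * |T0| + sqrt(1 - delta^2) * T1),
    with delta = lam / sqrt(1 + lam^2) and T0, T1 independent N(0, 1).
    """
    rng = rng or random.Random()
    sigma = math.sqrt(sigma2)
    delta = lam / math.sqrt(1.0 + lam * lam)
    draws = []
    for _ in range(n):
        t0 = abs(rng.gauss(0.0, 1.0))  # half-normal component
        t1 = rng.gauss(0.0, 1.0)       # symmetric component
        draws.append(mu + sigma * (delta * t0 + math.sqrt(1.0 - delta * delta) * t1))
    return draws


def sn_mean(mu, sigma2, lam):
    """Theoretical mean of SN(mu, sigma2, lam): mu + sigma * delta * sqrt(2 / pi)."""
    delta = lam / math.sqrt(1.0 + lam * lam)
    return mu + math.sqrt(sigma2) * delta * math.sqrt(2.0 / math.pi)
```

Setting `lam=0.0` gives $\delta = 0$, so the sampler reduces to ordinary normal draws, which matches the particular case mentioned above.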
In a symmetric context, Lange and Sinsheimer [18] provided a group of thick-tailed distributions that also has the normal distribution as a particular case. A random variable Y has a scale mixtures of normal (SMN) distribution if its pdf assumes the form
(4) |
where is the cdf of a positive random variable U indexed by the parameter vector and is a strictly positive function. Moreover, when and , we denote .
An asymmetric version of SMN distributions was introduced by Ferreira et al. [11] as a challenging family for statistical procedures with asymmetric data. This new family of distributions contains all the distributions studied by Lange and Sinsheimer [18] with an extra parameter, which regulates the skewness of the distribution.
Definition 2.1
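A random variable Y follows an SSMN distribution with location parameter $\mu$, scale factor $\sigma^2$ and skewness parameter $\lambda$ if its pdf is given by
(5) | $f(y) = 2 f_0(y)\,\Phi\!\left(\lambda\,\dfrac{y-\mu}{\sigma}\right), \quad y \in \mathbb{R},$
where $f_0$ is the SMN pdf defined in Equation (4). For a random variable with pdf as in Equation (5), we use the notation . If $\mu = 0$ and $\sigma^2 = 1$, we refer to it as a standard SSMN distribution and we denote it by . If $\lambda = 0$, we have the SMN distributions.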
Note that if , then . According to Ferreira et al. [12], a random variable has the following stochastic representation
(6) |
where are mutually independent. The stochastic representation in Equation (6) makes it easy to generate random samples from the truncated SSMN distributions, whose definition is given below (see Definition 2.2).
An SSMN random variable Y with a pdf as in Equation (5) has a hierarchical representation given by Proposition 2.1. The proof can be found in Ferreira et al. [11].
Proposition 2.1
Let . Then its hierarchical representation is given by
The hierarchical representation in Proposition 2.1 facilitates an EM-type implementation of maximum-likelihood estimation and can also be used to simulate data. For example, the skew Student-t-normal distribution is derived from Proposition 2.1 by taking , and the conditional distribution is obtained from the stochastic representation given in (3). This class of distributions also includes the skew-normal distribution when U = 1.
The form of an SSMN distribution is determined by the distribution of U, whose distribution is indexed by a vector of parameters or scalar that controls the tails of the SSMN distributions. In this work we will concentrate on some special cases of the SSMN distributions, when , i.e. the skew Student-t-normal (ST), the skew-slash (SSL) and the skew-contaminated normal (SCN), whose properties have been widely discussed in Ferreira et al. [11]. Thus, to avoid excessive notation, from now on we will no longer use the κ variable.
A useful result is the cdf of SSMN distributions. Using expression (2) and Proposition 2.1 with , we have that
(7) |
where , with .
Another important class of distributions, which will be useful for implementing the EM-type algorithm, is the class of truncated SSMN distributions, given by the following definition.
Definition 2.2
Let and , with a<b. A random variable Y has a truncated SSMN distribution in the interval , denoted by , if it has the same distribution as . Here, [a, b] means that each endpoint of the interval can be either open or closed. Thus, the pdf of the random variable is
where denotes the indicator function of the set A, i.e. if and otherwise, and represent the pdf and cdf of the SSMN distributions, respectively.
2.2. Model specification and ML estimation via the EM algorithm
In this section, we define the linear regression model with censored response variable and distributed errors in the family of SSMN distributions. First, consider the linear regression model, as defined by Ferreira et al. [13], given by
(8) |
where is an observed continuous response variable for individual i and is a random error. For individual i, we assume a known covariate vector , which we use to specify the linear predictor , where is a p-dimensional vector of unknown regression coefficients.
We extend the linear regression model defined in (8) with the assumption that the response variable is not fully observed for all subjects. Thus, for the ith subject and assuming left-censoring, is a latent variable and the observed data take the form , where
(9) |
for some known threshold point . The censoring indicator (or ) means that the ith observation is censored (or not censored). The extension of our results to right-censoring is immediate: it is enough to transform the response and censoring level to and , respectively. We call the structure defined by (8) and (9) the SSMN-CR model (linear censored regression model with skew scale mixtures of normal distributions). We use specific notation for particular SSMN distributions, for example, SN-CR and ST-CR in the skew-normal and skew Student-t-normal cases, respectively.
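The left-censoring mechanism and the reduction of right-censoring to left-censoring can be sketched in a few lines. This is only an illustration under stated assumptions: an observation is recorded at the threshold whenever the latent response does not exceed it, the indicator equals 1 for censored cases, and right-censoring is handled by negating both the response and the censoring level. The names `left_censor` and `right_censor_via_negation` are hypothetical.

```python
def left_censor(y, kappa):
    """Apply left-censoring to latent responses y at thresholds kappa.

    Returns (v, rho): v_i = kappa_i and rho_i = 1 when y_i <= kappa_i (censored),
    otherwise v_i = y_i and rho_i = 0 (fully observed).
    """
    v, rho = [], []
    for yi, ki in zip(y, kappa):
        if yi <= ki:
            v.append(ki)
            rho.append(1)
        else:
            v.append(yi)
            rho.append(0)
    return v, rho


def right_censor_via_negation(y, kappa):
    """Right-censor y at kappa by left-censoring (-y, -kappa) and negating back."""
    v_neg, rho = left_censor([-yi for yi in y], [-ki for ki in kappa])
    return [-vi for vi in v_neg], rho
```

For example, `left_censor([1, 5, 3], [2, 2, 3])` records the first and third observations at their thresholds, while `right_censor_via_negation([1, 5, 3], [2, 2, 3])` censors the second and third.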
The log-likelihood function of the SSMN-CR model is given by
(10) |
where , is the observed sample of , and denotes the cdf of the distribution. Since the observed log-likelihood function involves complex expressions, it is very difficult to maximize directly for ML estimation. To overcome this problem, we propose an EM-type algorithm based on an augmented data representation of the SSMN-CR model. To do so, observe that, given a sample of size n from the model, the vector of censored responses is seen as a latent (partially unobservable) random vector. From Proposition 2.1 and Equation (3), the complete data are given by
(11) |
all independent, where denotes the univariate normal distribution , truncated on the interval .
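Returning to the observed log-likelihood in Equation (10), its structure can be illustrated for the SN-CR special case: under left-censoring, each censored observation contributes the log of the cdf at its threshold and each uncensored observation contributes the log of the pdf. The sketch below is an assumption-laden illustration rather than the authors' implementation. It evaluates the SN cdf through the known identity $F(z) = \Phi(z) - 2T(z, \lambda)$, where $T$ is Owen's T function, computed here by Simpson's rule; all function names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def owens_t(h, a, n=400):
    """Owen's T(h, a) by Simpson's rule on [0, a]; n must be even."""
    if a == 0.0:
        return 0.0
    step = a / n
    total = 0.0
    for k in range(n + 1):
        x = k * step
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return total * step / 3.0 / (2.0 * math.pi)


def sn_logpdf(y, mu, sigma, lam):
    """Log pdf of SN(mu, sigma^2, lam)."""
    z = (y - mu) / sigma
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    return math.log(2.0 / sigma) + log_phi + math.log(norm_cdf(lam * z))


def sn_cdf(y, mu, sigma, lam):
    """Cdf of SN(mu, sigma^2, lam) via Phi(z) - 2 * T(z, lam)."""
    z = (y - mu) / sigma
    return norm_cdf(z) - 2.0 * owens_t(z, lam)


def sncr_loglik(v, rho, X, beta, sigma2, lam):
    """Observed log-likelihood of a left-censored SN regression (rho_i = 1: censored)."""
    sigma = math.sqrt(sigma2)
    loglik = 0.0
    for vi, ri, xi in zip(v, rho, X):
        mu_i = sum(b * x for b, x in zip(beta, xi))  # linear predictor
        if ri == 1:
            loglik += math.log(max(sn_cdf(vi, mu_i, sigma, lam), 1e-300))
        else:
            loglik += sn_logpdf(vi, mu_i, sigma, lam)
    return loglik
```

With `lam = 0.0` the function reduces to the log-likelihood of a normal censored regression, which gives a simple sanity check for any implementation.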
Defining the vectors and , we have that the complete data log-likelihood associated with is
where c is a constant that does not depend on . As in the original proposal of Dempster et al. [7], the E-step of our algorithm consists of taking the conditional expectation , where is the current estimate of at the kth iteration. For cases in which the E-step has no analytic form, Wei and Tanner [26] proposed the MCEM algorithm, in which the E-step is replaced by a Monte Carlo approximation based on a number of independent simulations of the missing data. The M-step consists of maximization of with respect to . Thus, our MCEM algorithm for the SSMN-CR model can be summarized in the following steps:
E-step: Given the current estimate at the kth iteration, we obtain the conditional expectation of the complete data log-likelihood function given the observed and , named the Q-function, which is given by where, excluding unimportant constants,
(12) |
where , , , , , , and . By using known properties of conditional expectation, we obtain
- For an uncensored observation i: in this case, , so and thus, and . We therefore have closed-form expressions for and for the SN, ST, SSL and SCN distributions, which can be found in Ferreira et al. [13].
- For a censored observation i: in this case, we have , and . The conditional expectations
(13) |
for , do not have a closed form, so we introduce two intermediate steps that replace the E-step by a stochastic approximation based on simulated data. Thus, iteration k consists of the following steps:
  - Let be the vector of censored cases, where is generated from the for . The new vector of observations thus consists of a random sample generated for the censored cases together with the observed values (uncensored cases), for . Section 2.3 describes the details of the methods used to generate from the random vector .
  - Given the sequence at the kth iteration, and considering the conditional expectations and given in Ferreira et al. [13] for the SN, ST, SSL and SCN distributions, we replace the conditional expectations in Equation (13) with the following stochastic approximations:
(14) |
for . We chose . See Appendix 1 for more details.
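The idea of replacing an intractable conditional expectation by an average over simulated values can be sketched on a toy case. The example below is an illustration only: it uses a plain normal distribution instead of an SSMN distribution, and naive rejection sampling instead of the generator described in Section 2.3, so that the Monte Carlo averages can be checked against known closed forms. The names `draw_truncated_upper` and `mc_conditional_moments` are hypothetical.

```python
import random


def draw_truncated_upper(sampler, upper, rng):
    """Draw one value from `sampler` conditioned on being <= upper (naive rejection)."""
    while True:
        y = sampler(rng)
        if y <= upper:
            return y


def mc_conditional_moments(sampler, upper, n_draws, rng):
    """Monte Carlo approximations of E[Y | Y <= upper] and E[Y^2 | Y <= upper]."""
    draws = [draw_truncated_upper(sampler, upper, rng) for _ in range(n_draws)]
    first = sum(draws) / n_draws
    second = sum(d * d for d in draws) / n_draws
    return first, second
```

For a standard normal variable truncated at zero, the exact values are $E[Y \mid Y \le 0] = -\sqrt{2/\pi} \approx -0.798$ and $E[Y^2 \mid Y \le 0] = 1$, so calling `mc_conditional_moments(lambda r: r.gauss(0.0, 1.0), 0.0, 50000, random.Random(1))` should return values close to these. Rejection sampling becomes inefficient when the truncation region has small probability, which is one reason a dedicated generator is preferable in practice.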
CM-step: Update and by maximizing over , which leads to the following closed-form expressions
CML-step: Update by maximizing the actual marginal log-likelihood function, obtaining where is defined in Equation (10).
The iterations are repeated until a suitable convergence rule is satisfied. We use the criterion . Good starting values are required to implement this algorithm. We start the MCEM algorithm with initial values and . First, we use the least-squares method to determine and . Then, , where is the sample skewness coefficient of the residuals . However, to ensure that the true maximum-likelihood estimates are identified, we recommend running the EM algorithm from a range of different starting values. Note that when the M-step equations reduce to the equations obtained assuming SMN distributions; see Garay et al. [15]. In particular, this algorithm generalizes the results of Arellano-Valle et al. [2] by taking and .
2.3. Computational aspects
In this section, we describe a simulation method to generate random samples from the random variable . We concentrate on the truncated skew-normal (TSN), truncated skew Student-t-normal (TST), truncated skew-slash (TSSL) and truncated skew-contaminated normal (TSCN) distributions. According to Ferreira et al. [12], a random variable has the stochastic representation given in (6) and, considering , we have that
(15) |
So, we obtain the following hierarchical representation
(16) |
where U and W are positive random variables. Then, we have that which implies
Therefore, the algorithm to generate random samples of the TSSMN models is as follows:
- (P1) Generate a random sample from .
- (P2) Generate a random sample from .
- (P3) Generate a random sample from .
- (P4) Using the stochastic representation given in (15), set Y.
Consequently, we draw from in the E-step.
3. Standard error approximation
In this section, we describe how to obtain the standard errors of the ML estimates of the parameters in the SSMN-CR model. Assuming the usual regularity conditions, we have that the asymptotic covariance matrix of can be approximated by the inverse of the empirical information matrix defined as where , with – see (12) [19]. Substituting by the ML estimates in , we obtain the approximation where is an individual score vector given by with explicit expressions for the elements of given by
The standard errors of are obtained as the square roots of the diagonal elements of the inverse of . Following Mattos et al. [23], in our analysis, we focus solely on comparing the standard errors of , and λ.
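The computation of standard errors from individual score vectors can be sketched as follows. This is a minimal illustration, assuming the empirical information matrix is approximated by the sum of outer products of the individual scores evaluated at the ML estimates; the score vectors themselves are taken as given, and the helper names are hypothetical.

```python
import math


def empirical_information(scores):
    """Sum of outer products s_i s_i^T over the individual score vectors."""
    p = len(scores[0])
    info = [[0.0] * p for _ in range(p)]
    for s in scores:
        for a in range(p):
            for b in range(p):
                info[a][b] += s[a] * s[b]
    return info


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standard_errors(scores):
    """Square roots of the diagonal of the inverse empirical information matrix."""
    cov = invert(empirical_information(scores))
    return [math.sqrt(cov[j][j]) for j in range(len(cov))]
```

In a real analysis a numerical linear algebra library would normally replace the hand-written inversion; it is written out here only to keep the sketch self-contained.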
In the next sections, simulation studies and a real dataset are presented in order to illustrate the performance of the proposed method.
4. Simulation experiments
4.1. Experiment I: performance of the ML estimates
In this section, we use Monte Carlo simulations to evaluate the performance of the ML estimates of the parameters in the SSMN-CR model. The simulation study is designed to observe the changes in estimates by varying sample sizes and right-censoring levels. The data were artificially generated from SSMN-CR models with such that , . We generated 100 datasets from each of the SN-CR, ST-CR, SSL-CR and SCN-CR models with the following setup: , for the ST-CR and SSL-CR, and for SCN-CR. The description of each scenario is as follows: Scenario 1: A censoring proportion of and different sample sizes, say, n = 50, 100, 200, 400 and 600. The goal in this study is to show the asymptotic behavior of the ML estimates obtained via the proposed MCEM algorithm. Scenario 2: A sample of size n = 100 and different censoring proportions, say, and We aim to study the behavior of the SSMN-CR models under different censoring proportions. The desired level of censoring was obtained in the following way: the observations were placed in increasing order, and a threshold point was fixed in such a way that the number of observations above this point corresponded to the desired level of censoring.
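The procedure for obtaining a desired right-censoring level can be sketched as follows. This is an illustrative reading of the description above, assuming that the threshold is the largest order statistic left uncensored and that censored responses are recorded at that threshold; the function name `right_censor_at_level` is hypothetical.

```python
def right_censor_at_level(y, level):
    """Right-censor the sample y so that a proportion `level` of it is censored.

    The observations are sorted, and the threshold is chosen so that the number of
    observations strictly above it matches the desired censoring level (no ties assumed).
    Returns (v, rho, threshold), with rho_i = 1 for censored observations.
    """
    if not 0.0 <= level < 1.0:
        raise ValueError("level must lie in [0, 1)")
    n = len(y)
    n_cens = min(int(round(level * n)), n - 1)
    if n_cens == 0:
        return list(y), [0] * n, None
    threshold = sorted(y)[n - n_cens - 1]  # largest value that stays uncensored
    v = [min(yi, threshold) for yi in y]
    rho = [1 if yi > threshold else 0 for yi in y]
    return v, rho, threshold
```

With ten distinct observations and `level=0.2`, the two largest values are replaced by the third largest, which becomes the censoring threshold.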
In both scenarios, each dataset generated from a given SSMN-CR model was fitted with the same SSMN-CR model. Note that, for scenarios 1 and 2, there are 20 different simulation settings, with 100 simulated Monte Carlo datasets for each one. For each simulation, the ML estimates were recorded.
Figures 1–4 show boxplots of the parameter estimates for the SN-CR, ST-CR, SSL-CR and the SCN-CR model, respectively, under scenario 1. In general, for a given censoring level, the bias and the variability of the parameter estimates decrease when the sample size increases. This essentially agrees with the asymptotic properties of the ML method.
Figure 1.
Scenario 1: Boxplots of the estimates of and (line indicates the true value of the parameter) for the SN-CR model. Legend on panel (a).
Figure 2.
Scenario 1: Boxplots of the estimates of and (line indicates the true value of the parameter) for the ST-CR model. Legend on panel (a).
Figure 3.
Scenario 1: Boxplots of the estimates of and (line indicates the true value of the parameter) for the SSL-CR model. Legend on panel (a).
Figure 4.
Scenario 1: Boxplots of the estimates of and (line indicates the true value of the parameter) for the SCN-CR model. Legend on panel (a).
Following a suggestion by an anonymous referee, and in order to evaluate the performance of the ML estimates of the parameters in the SSMN-CR model, we computed the bias (BIAS) and the mean square error (MSE) of each parameter over the 100 replicates. The BIAS and MSE measures are defined as in Garay et al. [15] (see Section 5.2). Figures 5 and 6 show that, for the censoring level , the BIAS and MSE tend to zero in all SSMN-CR models as n increases, i.e. the ML estimates of the parameters in the SSMN-CR model improve as the sample size increases. In addition, Tables 7–10 (see Appendix 2) present summary statistics for parameter estimation under this scenario.
Figure 5.
Scenario 1: Bias of parameters and λ for SSMN models.
Figure 6.
Scenario 1: MSE of parameters and λ for SSMN models.
Figures 7–10 show boxplots of the parameter estimates for the SN-CR, ST-CR, SSL-CR and SCN-CR models, respectively, under scenario 2. In general, when the sample size is fixed, an increasing censoring level corresponds to increasing bias and variability of the parameter estimates.
Figure 7.
Scenario 2: Boxplots of the estimates of and (line indicates the true value of the parameter) for the SN-CR model. Legend on panel (a).
Figure 8.
Scenario 2: Boxplots of the estimates of and (line indicates the true value of the parameter) for the ST-CR model. Legend on panel (a).
Figure 9.
Scenario 2: Boxplots of the estimates of and (line indicates the true value of the parameter) for the SSL-CR model. Legend on panel (a).
Figure 10.
Scenario 2: Boxplots of the estimates of and (line indicates the true value of the parameter) for the SCN-CR model. Legend on panel (a).
4.2. Experiment II: parameter recovery and selection criteria
The main objective of this experiment is to illustrate the capacity of censored models with asymmetry and heavy tails to fit data generated from a family of different asymmetric distributions, and to investigate the effects on parametric inference.
4.2.1. Study I
In this study, we consider 100 samples of size 100 from an SCN-CR model with right-censoring levels or , , such that and and parameter values given by , and . For each sample, we fitted the SN-CR, ST-CR and SSL-CR models.
As in Experiment I, summary statistics of parameter estimation are presented in Table 11 (see Appendix 2). From these results, for a specific model, we see that an increasing censoring level corresponds to increasing bias and MSE of the parameter estimates. We also observe that an increasing censoring level corresponds to decreasing coverage probability at of the and parameter estimates. Additionally, we conclude that, for all levels of censoring, the SSMN distributions with heavy tails outperform the skew-normal distribution, having smaller BIAS and MSE, and greater coverage probability at of the and parameter estimates.
Figures 11–15 show the boxplots of the parameter estimates for the SN-CR, ST-CR and SSL-CR models under the various levels of censorship considered. The estimates of the scale parameters, from the models with distributions of heavy tails, present smaller bias and variability in relation to the SN-CR model for all levels of censorship. Furthermore, it is readily seen that the estimates of the scale parameters obtained from the heavy-tailed models are less sensitive to the variation in the censoring level. This indicates that these models are not only robust to model misspecification but also to different levels of censoring. As expected, censored models with heavy-tailed distributions perform better than the skew-normal one in recovering the true parameter values independently of censoring levels.
Figure 11.
Boxplots of and (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – without censorship. Legend on panel (a).
Figure 12.
Boxplots of and (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – censorship. Legend on panel (a).
Figure 13.
Boxplots of and (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – censorship. Legend on panel (a).
Figure 14.
Boxplots of and (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – censorship. Legend on panel (a).
Figure 15.
Boxplots of and (line indicates the true value of the parameter) for the SN-CR, ST-CR and SSL-CR models – censorship. Legend on panel (a).
Finally, we compare the capacity of some classical model selection criteria to select the appropriate model among different SSMN-CR models. Because there is no universal criterion for model selection, we chose two criteria to compare the proposed models, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), defined as $\text{AIC} = -2\ell(\hat{\theta}) + 2\xi$ and $\text{BIC} = -2\ell(\hat{\theta}) + \xi \log n$, where $\ell(\hat{\theta})$ is the actual log-likelihood evaluated at the ML estimates, ξ is the number of free parameters that have to be estimated in the model and n is the sample size. Thus, for each simulation, the parameter estimates as well as the AIC and BIC criteria were recorded. Table 1 shows the percentages of samples in which the censored models with heavy-tailed distributions, specifically the ST-CR and SSL-CR models, are preferred to the fitted SN-CR model. Not surprisingly, under different levels of censoring, all criteria favor censored models based on heavy-tailed distributions.
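The two criteria and the percentages of preference can be computed as in the following sketch. It assumes the standard definitions AIC = −2ℓ + 2ξ and BIC = −2ℓ + ξ log n, with smaller values indicating a preferred model; the function names are hypothetical and the numbers in the usage note are purely illustrative, not results from the study.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 * loglik + 2 * n_params."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 * loglik + n_params * log(n_obs)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def percent_preferred(crit_candidate, crit_reference):
    """Percentage of replicates in which the candidate model has the smaller criterion."""
    wins = sum(1 for c, r in zip(crit_candidate, crit_reference) if c < r)
    return 100.0 * wins / len(crit_candidate)
```

For instance, `aic(-100.0, 4)` gives 208.0, and `percent_preferred([1, 2, 3, 4], [2, 1, 4, 5])` gives 75.0 because the candidate has the smaller value in three of the four replicates.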
Table 1. Percentages of preferred models under the conditions examined.
Censoring levels | Examined conditions | AIC | BIC |
---|---|---|---|
 | SN vs ST | 81 | 75 |
 | SN vs SSL | 86 | 76 |
 | SN vs ST | 61 | 71 |
 | SN vs SSL | 83 | 76 |
 | SN vs ST | 87 | 80 |
 | SN vs SSL | 92 | 81 |
 | SN vs ST | 89 | 79 |
 | SN vs SSL | 93 | 82 |
 | SN vs ST | 89 | 80 |
 | SN vs SSL | 90 | 81 |
4.2.2. Study II
In this simulation study, the aim is to show the flexibility of our proposed SSMN-CR models. As suggested by an anonymous reviewer, we generate artificial data from the proposed linear censored regression model, where the errors follow a distribution totally different in nature from the class of SSMN distributions studied in this paper, but that produces asymmetry and heavy tails. An appropriate example is the generalized hyperbolic (GH) distribution, which is a normal mean-variance mixture distribution. The random variable Y is said to have a normal mean-variance mixture distribution if
(17) |
where , V is a positive random variable, with distribution independent of W indexed by the parameter vector , whereas is a strictly positive function which is associated with the mixture variable V. If and the mixture variable V follows a generalized inverse Gaussian (GIG) distribution, then Y is said to have a GH distribution. More details on the GIG distribution can be found in Jørgensen [16].
In this study, we consider 100 samples of size 200 from a linear censored regression model, where the errors follow a GH distribution with right-censoring level , , such that and and parameter values given by , and the following values for : situation 1 – or situation 2 – , with the first situation chosen because it provides a distribution with greater kurtosis. For each sample, we fitted the SN-CR, ST-CR, SCN-CR, ST$_B$-CR and SCN$_B$-CR models (the subscript B indicates the censored model based on the SMSN distributions proposed by Branco and Dey [6]).
Tables 2 and 3 show that the heavy-tailed models outperform the skew-normal one in the two situations considered in this study. In fact, those models have smaller standard deviations, BIAS and MSE of the parameter estimates. The variance components are not comparable since they are on different scales. In addition, the Monte Carlo means of the model comparison criteria (MC AIC and MC BIC) strongly favor the heavy-tailed models. It can also be seen that, in general, the standard deviations, BIAS and MSE of under the ST-CR and SCN-CR models are smaller than under the ST$_B$-CR and SCN$_B$-CR models, indicating that our proposed models are capable of producing more accurate and precise estimates. According to the MC AIC (or MC BIC) values, the SSMN-CR models fit the data well relative to their competitors. See Tables 2 and 3, where the best fit is indicated by (1*), the second best by (2*) and the third best by (3*).
Table 2. Experiment II – Study II – : MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting SN-CR, ST-CR, SCN-CR, ST$_B$-CR and SCN$_B$-CR models.
Fit | MC Mean | MC SD | BIAS | MSE | MC AIC | MC BIC |
---|---|---|---|---|---|---|
SN-CR | −1.4161 | 0.9674 | 2.4161 | 6.7639 | | |
 | −1.8813 | 0.4768 | 0.8813 | 1.0018 | | |
 | −7.3692 | 1.6299 | 3.3692 | 13.9816 | 1261.1480 | 1294.1310 |
ST-CR | 1.2344 | 0.3708 | 0.2828 | 0.1910 | | |
 | −1.0060 | 0.0567 | 0.0407 | 0.0032 | | |
 | −4.0431 | 0.2307 | 0.1641 | 0.0546 | 859.0534 (1*) | 875.5450 (1*) |
SCN-CR | 2.2601 | 0.5887 | 1.2601 | 1.9310 | | |
 | −1.0568 | 0.0712 | 0.0713 | 0.0082 | | |
 | −4.2575 | 0.2555 | 0.2920 | 0.1309 | 1038.280 | 1054.772 |
ST$_B$-CR | −2.5602 | 0.8171 | 3.5602 | 13.3357 | | |
 | −1.2343 | 0.1011 | 0.2343 | 0.0650 | | |
 | −4.9221 | 0.3660 | 0.9221 | 0.9829 | 938.4083 (3*) | 954.8999 (3*) |
SCN$_B$-CR | −2.2265 | 0.7843 | 3.2265 | 11.0193 | | |
 | −1.2291 | 0.0948 | 0.2291 | 0.0614 | | |
 | −4.9104 | 0.3668 | 0.9104 | 0.9620 | 927.4107 (2*) | 950.4989 (2*) |
Table 3. Experiment II – Study II – : MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting SN-CR, ST-CR, SCN-CR, ST$_B$-CR and SCN$_B$-CR models.
Fit | MC Mean | MC SD | BIAS | MSE | MC AIC | MC BIC |
---|---|---|---|---|---|---|
SN-CR | −2.5447 | 0.5001 | 3.5447 | 12.8124 | | |
 | −1.1524 | 0.1255 | 0.1640 | 0.0388 | | |
 | −4.7063 | 0.4903 | 0.7236 | 0.7369 | 993.5653 | 1026.5484 |
ST-CR | −0.1049 | 0.3875 | 1.1049 | 1.3694 | | |
 | −0.9752 | 0.1037 | 0.0862 | 0.0113 | | |
 | −3.8768 | 0.4076 | 0.3396 | 0.1797 | 946.4920 (3*) | 962.9836 (2*) |
SCN-CR | 0.0651 | 0.3882 | 0.9349 | 1.0232 | | |
 | −0.9751 | 0.1025 | 0.0836 | 0.0110 | | |
 | −3.8720 | 0.4031 | 0.3380 | 0.1772 | 947.6776 | 964.1692 (3*) |
ST$_B$-CR | −2.8953 | 0.4419 | 3.8953 | 15.3664 | | |
 | −1.0503 | 0.0959 | 0.0850 | 0.0116 | | |
 | −4.2305 | 0.3831 | 0.3568 | 0.1985 | 936.5145 (1*) | 953.0061 (1*) |
SCN$_B$-CR | −2.5667 | 0.4444 | 3.5667 | 12.9167 | | |
 | −1.0828 | 0.1016 | 0.1043 | 0.0171 | | |
 | −4.3755 | 0.4135 | 0.4456 | 0.3103 | 942.9701 (2*) | 966.0584 |
4.3. Experiment III: imputation of censored observations
To deal with the censored values, we use an imputation procedure that replaces the censored values by obtained from the MCEM algorithm. Thus, once the censored values are imputed, a complete dataset is obtained. The reason for using the imputation procedure is that it avoids computing the truncated conditional expectations of the SSMN distributions arising from the censoring scheme.
In this section, we are interested in predicting the censored observations, denoted by . In the implementation of the MCEM algorithm, at the kth iteration, the predictions of the censored observations, denoted by , are calculated as . It is important to remark that the components are obtained without additional computational effort from the E-step of the proposed MCEM algorithm. Although we can obtain predicted values of the censored responses at every iteration of the algorithm, we consider only the values of these predictions at the last iteration of the MCEM algorithm.
In this experiment, we consider 100 samples of size 100 from an SCN-CR model with right-censoring levels or , , such that and and the same configuration for the model parameters, previously defined in study II (see Section 4.2). Then, for each sample, we fitted the SN-CR, ST-CR and SSL-CR models, and the predictions of the censored observations were recorded. In order to investigate the performance of the prediction when the model distribution is misspecified, we considered two empirical discrepancy measures, namely the MAE (mean absolute error) and MSE (mean square error); see Matos et al. [22] for more details. These measures are given by
where is the original value and is the predicted value for the ith censored observation in the jth simulation, for and such that is the number of censored observations or depending on the censoring level considered.
Table 4 shows the comparison between the predicted values and the real ones under the SN-CR, ST-CR and SSL-CR with different censoring levels. One can see from these results that the ST-CR and SSL-CR models generate predictive values close to the real ones. As expected, the MCEM algorithm provides a satisfactory imputation for these censored values when heavy-tailed distributions are used. Finally, there is a loss of accuracy in predicting the censored observations as the censoring level increases.
Table 4. Evaluation of the prediction accuracy for the SN-CR, ST-CR and SSL-CR models with different censoring levels.
Censoring levels | Measures | SN-CR | ST-CR | SSL-CR |
---|---|---|---|---|
 | MAE | 1.5993 | 1.5778 | 1.5801 |
 | MSE | 0.9037 | 0.8342 | 0.8600 |
 | MAE | 4.1277 | 3.9365 | 4.0320 |
 | MSE | 3.0452 | 2.6905 | 2.8763 |
 | MAE | 10.2852 | 9.7301 | 9.9687 |
 | MSE | 9.25091 | 7.9992 | 8.5780 |
 | MAE | 18.3699 | 17.2280 | 17.8197 |
 | MSE | 18.4904 | 16.1069 | 17.3085 |
4.4. Experiment IV: influence of a single outlier
The robustness aspects of the SSMN-CR models can be studied considering the influence of a single outlying observation on the estimates of . Without loss of generality, we simulated one dataset from the skew-normal trigonometric regression model, such that
, where with right-censoring level ( ). For this sample, we fitted the SN-CR, ST-CR and SSL-CR models. We analyzed the influence of a change of δ units in a single observation on the estimates of . First, we replaced the observation with the contaminated value In this example, we contaminated the uncensored observation i = 200 and varied δ between 0 and 20. Figure 16 shows the scatter plot of the data and illustrates the contamination of the observation 200 (denoted by asterisk).
Figure 16.
(a) Scatter plot of the data from the skew-normal trigonometric regression model with right-censoring level . (b) Contamination of observation 200, with δ ranging from 0 to 20.
Following Fagundes et al. [8], the influence of a single outlier on the estimates can be assessed based on the mean magnitude of relative error (MMER), which is defined as follows: suppose that is a generic vector of parameters and that is the estimate of after contamination of the data. Then, we define where is the ML estimate of . For example, we evaluate where or with and .
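A minimal sketch of a relative-error measure of this kind is given below. Since the defining formula is missing from this copy of the text, the code assumes that the MMER is the average, over the components of the parameter vector, of the absolute relative change between the estimate from the original data and the estimate after contamination. The function name and argument names are hypothetical.

```python
def mmer(theta_hat, theta_hat_delta):
    """Mean magnitude of relative error between two estimate vectors.

    theta_hat: estimates from the original data (components must be nonzero).
    theta_hat_delta: estimates after contaminating one observation by delta.
    Assumption: average of |(contaminated - original) / original|.
    """
    terms = [abs((d - t) / t) for t, d in zip(theta_hat, theta_hat_delta)]
    return sum(terms) / len(terms)
```

Under this definition a value of zero means that the contamination leaves the estimates unchanged, and larger values indicate a stronger influence of the outlier.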
In Figure 17, we present the MMERs for different contaminations δ. In Figures 17(a,b), as expected, the estimates in the heavy-tailed models are less affected by variations of δ than those in the SN-CR model. As typically noted in the literature, the relevance of the SSMN distributions is related to their capability of down-weighting outlying observations. In addition, Figure 17(c) shows the BIC values for all fitted models, for each perturbed version of the original dataset. As the observation becomes more atypical, the heavy-tailed models fit the data better.
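The down-weighting mechanism can be illustrated with the weight that a Student-t scale mixture assigns to an uncensored observation in the E-step. The sketch assumes a Gamma(ν/2, ν/2) mixing variable, for which the conditional expectation of the mixing variable given the observation is (ν + 1)/(ν + d²), with d the standardized residual. This is a standard result for Student-t mixtures and is shown here only for intuition; the E-step expressions actually used for the SSMN-CR models are those of Appendix 1. The function name is hypothetical.

```python
def student_t_weight(y, mu, sigma, nu):
    """E-step weight E[U | y] under a Student-t scale mixture (assumed form).

    d2 is the squared standardized residual; large residuals receive
    small weights, whereas a normal model gives every observation weight 1.
    """
    d2 = ((y - mu) / sigma) ** 2
    return (nu + 1.0) / (nu + d2)


# Weights for an observation shifted by delta units away from its mean
weights = [student_t_weight(delta, 0.0, 1.0, 3.0) for delta in (0, 5, 10, 20)]
```

The weights decrease monotonically as the shift grows, so a contaminated observation contributes less and less to the M-step updates.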
Figure 17.
MMERs of the EM estimates for (a) and (b) (c) BIC values and (d) MMERs of the MCEM predictions of the censored observations for different contaminations of δ for observation and right-censoring level .
In addition, this simulation study also evaluates the influence of a single aberrant observation on the prediction of the censored components via the MCEM algorithm. In this context, suppose that is a generic vector of predictions of the censored observations and thus, we consider where are the predictions of the censored observations after contamination of the data. In this example, we have 20 censored observations, and Figure 17(d) shows the MMERs for different contaminations δ. Not surprisingly, the heavy-tailed models perform better in predicting the censored observations, showing their robustness to discrepant observations.
5. Application: stellar abundances dataset
In this section, we illustrate our proposed methods with a dataset obtained from Santos et al. [25], which is available, e.g., in the R package astrodatR [9] under the name Stellar abundances. This dataset contains measurements for 68 solar-type stars. For our analysis, following Mattos et al. [23], we considered: as the response variable, which represents the log of the abundance of the light element beryllium (Be) in stars scaled to the Sun's abundance (i.e. the Sun has ), and as the explanatory variable, which represents the effective stellar surface temperature (in kelvin). According to Feigelson and Babu [10], due to limited sensitivities, some objects may be undetected, leading to upper limits in their derived luminosities. This dataset has 12 left-censored data points, i.e. 12 undetected beryllium measurements, which represent approximately 17.6% (12 of 68) of the observations.
Mattos et al. [23] fitted various SMSN-CR models and concluded that the ST$_B$-CR model (the sub-index B indicates a censored model based on the SMSN distributions proposed by Branco and Dey [6]) seems to fit the Stellar abundances data best. We therefore analyzed the Stellar abundances dataset with the aim of providing additional inferences by using SSMN distributions in the context of linear censored regression models. Table 5 contains the ML estimates for the parameters of the three models, i.e. the ST-CR, SSL-CR and SCN-CR models, together with their corresponding standard errors calculated via the empirical information matrix. We refer the interested reader to Table 5 in Mattos et al. [23], which contains the ML estimates of the parameters of the SMSN-CR models, including the SN-CR model.
Table 5. Stellar abundances dataset: Estimated parameter values of the SSMN-CR models via the MCEM algorithm with corresponding approximate standard errors (SE).
| Parameter | ST-CR Estimate | ST-CR SE | SSL-CR Estimate | SSL-CR SE | SCN-CR Estimate | SCN-CR SE |
|---|---|---|---|---|---|---|
| | −1.8713 | 0.0318 | −1.8049 | 0.0233 | −1.7203 | 0.0233 |
| | 0.5245 | 0.0054 | 0.5167 | 0.0040 | 0.5029 | 0.0040 |
| | 0.0333 | 0.0091 | 0.0283 | 0.0070 | 0.0474 | 0.0070 |
| λ | −1.9058 | 0.4156 | −2.0179 | 0.4038 | −2.6846 | 0.4038 |
| τ | 2.0101 | – | 1.0101 | – | 0.2979 | – |
| γ | – | – | – | – | 0.1000 | – |
The results of the fit in terms of log-likelihood, AIC and BIC are provided in Table 6. Note that the SN-CR model does not seem to fit the data well. The models with the highest log-likelihoods are the ST-CR, ST$_B$-CR and SSL$_B$-CR models. Both the AIC and BIC criteria favor the ST-CR model, followed by the ST$_B$-CR model and then closely by the SSL$_B$-CR model, i.e. models that have heavy tails and asymmetric behavior.
Table 6. Stellar abundances dataset: Comparison of the maximized log-likelihood, AIC and BIC for the various fitted models.
SSMN-CR models | log-likelihood | AIC | BIC |
---|---|---|---|
ST-CR | −1.7802 | 11.5605 (1*) | 20.4385 (1*) |
SCN-CR | −4.4375 | 16.8750 | 25.7531 |
SSL-CR | −3.2474 | 14.4949 | 23.3729 |
SMSN-CR models | log-likelihood | AIC | BIC |
SN-CR | −18.2276 | 44.4553 | 53.3333 |
ST$_B$-CR | −2.1278 | 12.2556 (2*) | 21.1336 (2*) |
SCN$_B$-CR | −3.7473 | 15.4946 | 24.3372 |
SSL$_B$-CR | −2.7253 | 13.4506 (3*) | 22.3287 (3*) |
Note: Best fit indicated by (1*), second best by (2*), and third best by (3*).
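The criteria in Table 6 can be recomputed from the reported log-likelihoods with the usual definitions AIC = −2ℓ + 2k and BIC = −2ℓ + k log n. The sketch below assumes n = 68 observations and k = 4 free parameters; the value k = 4 is not stated in the text and is inferred here because it reproduces the tabulated AIC values. The function name is hypothetical.

```python
import math


def aic_bic(loglik, n_params, n_obs):
    """Return (AIC, BIC) from a maximized log-likelihood."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * math.log(n_obs)
    return aic, bic


# Log-likelihoods of the SSMN-CR fits reported in Table 6
logliks = {"ST-CR": -1.7802, "SCN-CR": -4.4375, "SSL-CR": -3.2474}
criteria = {name: aic_bic(ll, n_params=4, n_obs=68) for name, ll in logliks.items()}
```

Because all fitted models are treated as having the same number of parameters under this assumption, ranking by AIC or BIC coincides with ranking by log-likelihood.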
6. Conclusions
In this work, we have proposed a linear regression model with censored responses based on skew scale mixtures of normal distributions, denoted by SSMN-CR models, as a replacement for the conventional choice of normal (or symmetric) distributions for censored linear models. Our results generalize the recent works by Arellano-Valle et al. [2], Massuia et al. [20] and Garay et al. [15] from a frequentist point of view. Also, the results of this paper are a necessary supplement to those presented in Mattos et al. [23] in the sense that both classes of asymmetric distributions, SMSN and SSMN, are special cases of the SSMSN family proposed by Arellano-Valle et al. [3].
An MCEM algorithm is developed by exploring the statistical properties of the class considered, which is implemented in the R software [24]. R code for analyzing the application may be downloaded from the website https://github.com/ClecioFerreira/SSMN-CR. The developed algorithm can be viewed as consisting of two parts, associated with the uncensored data and censored data, respectively. In the case of no censoring, the algorithm naturally reduces to the standard EM algorithm (see [13]). Simulation results under various scenarios and real data analysis indicate that the proposed method can be used to model data that present asymmetry and heavy tails with great flexibility.
Finally, the method proposed in this paper can be extended to multivariate settings and to diagnostic analysis in the SSMN-CR models. Besides, as pointed out by a referee, further research could focus on obtaining closed-form (or implementable) expressions for the conditional expectations in the E-step, such as the recent proposal of Lachos et al. [17] (see also [14]) in the context of SMSN distributions. Although relevant, a deeper investigation of these moments is beyond the scope of the present paper. We thank the anonymous referee for valuable suggestions for future research. We hope to report these findings in a future paper.
Acknowledgments
Camila Borelli Zeller was supported by CNPq and FAPEMIG.
Appendices.
Appendix 1. E-step in the MCEM algorithm.
E-step: The notation used is that of Section 2.2.
For a censored observation i: In this case, we have , and . At the kth iteration, considering the conditional expectations and given in Ferreira et al. [13], for the SN, ST, SSL and SCN distributions, we have that
-
where
for the ST-CR model,
for the SSL-CR model and
for the SCN-CR model , where and denotes the cdf of a Gamma distribution evaluated at x.
Then, for where .
-
. Then,
Appendix 2. Complementary tables.
Table A1. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the SN-CR model.
Fit | n | Measure | | | | λ |
---|---|---|---|---|---|---|
SN-CR | n = 50 | MC Mean | 5.1117 | 1.0033 | 0.8764 | 4.0716 |
SN-CR | n = 50 | MC SD | 0.3700 | 0.0321 | 0.3172 | 3.3792 |
SN-CR | n = 50 | BIAS | 0.2628 | 0.0256 | 0.2772 | 2.5540 |
SN-CR | n = 50 | MSE | 0.1480 | 0.0010 | 0.1149 | 12.4533 |
SN-CR | n = 100 | MC Mean | 5.0397 | 1.0010 | 0.9126 | 3.1606 |
SN-CR | n = 100 | MC SD | 0.1533 | 0.0218 | 0.2034 | 1.3715 |
SN-CR | n = 100 | BIAS | 0.1177 | 0.0168 | 0.1803 | 0.9664 |
SN-CR | n = 100 | MSE | 0.0248 | 0.0005 | 0.0486 | 1.8879 |
SN-CR | n = 200 | MC Mean | 4.9934 | 1.0017 | 0.9686 | 3.1818 |
SN-CR | n = 200 | MC SD | 0.1091 | 0.0139 | 0.1579 | 0.9829 |
SN-CR | n = 200 | BIAS | 0.0882 | 0.0116 | 0.1295 | 0.7352 |
SN-CR | n = 200 | MSE | 0.0118 | 0.0002 | 0.0257 | 0.9895 |
SN-CR | n = 400 | MC Mean | 5.0379 | 0.9971 | 0.9222 | 2.8531 |
SN-CR | n = 400 | MC SD | 0.0733 | 0.0108 | 0.0989 | 0.5434 |
SN-CR | n = 400 | BIAS | 0.0656 | 0.0089 | 0.1035 | 0.4633 |
SN-CR | n = 400 | MSE | 0.0068 | 0.0001 | 0.0157 | 0.3140 |
SN-CR | n = 600 | MC Mean | 5.0367 | 0.9968 | 0.9234 | 2.8786 |
SN-CR | n = 600 | MC SD | 0.0483 | 0.0081 | 0.0895 | 0.4779 |
SN-CR | n = 600 | BIAS | 0.0496 | 0.0070 | 0.0958 | 0.4064 |
SN-CR | n = 600 | MSE | 3.6514e−03 | 7.5503e−05 | 1.3798e−02 | 2.4086e−01 |
Table A2. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the ST-CR model.
Fit | n | Measure | | | | λ |
---|---|---|---|---|---|---|
ST-CR | n = 50 | MC Mean | 5.1159 | 0.9979 | 0.8569 | 3.5389 |
ST-CR | n = 50 | MC SD | 0.2566 | 0.0329 | 0.3755 | 2.4125 |
ST-CR | n = 50 | BIAS | 0.2269 | 0.0269 | 0.3354 | 1.8752 |
ST-CR | n = 50 | MSE | 0.0786 | 0.0011 | 0.1601 | 6.0526 |
ST-CR | n = 100 | MC Mean | 5.0498 | 1.0006 | 0.8687 | 3.0797 |
ST-CR | n = 100 | MC SD | 0.1545 | 0.0201 | 0.2704 | 1.5868 |
ST-CR | n = 100 | BIAS | 0.1241 | 0.0157 | 0.2436 | 1.1727 |
ST-CR | n = 100 | MSE | 0.0261 | 0.0004 | 0.0896 | 2.4991 |
ST-CR | n = 200 | MC Mean | 5.0031 | 1.0005 | 0.9954 | 3.2745 |
ST-CR | n = 200 | MC SD | 0.1518 | 0.0172 | 0.1999 | 1.3447 |
ST-CR | n = 200 | BIAS | 0.1172 | 0.0130 | 0.1786 | 0.8941 |
ST-CR | n = 200 | MSE | 0.0228 | 0.0003 | 0.0410 | 1.8656 |
ST-CR | n = 400 | MC Mean | 5.0539 | 1.0002 | 0.8382 | 2.5117 |
ST-CR | n = 400 | MC SD | 0.0794 | 0.0105 | 0.1224 | 0.5376 |
ST-CR | n = 400 | BIAS | 0.0768 | 0.0088 | 0.1546 | 0.6256 |
ST-CR | n = 400 | MSE | 0.0091 | 0.0001 | 0.0396 | 0.5246 |
ST-CR | n = 600 | MC Mean | 5.0457 | 1.0003 | 0.8622 | 2.5864 |
ST-CR | n = 600 | MC SD | 0.0695 | 0.0090 | 0.1143 | 0.5060 |
ST-CR | n = 600 | BIAS | 0.0676 | 0.0072 | 0.1496 | 0.5394 |
ST-CR | n = 600 | MSE | 6.8751e−03 | 8.0643e−05 | 3.1923e−02 | 4.2452e−01 |
Table A3. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the SSL-CR model.
Fit | n | Measure | | | | λ |
---|---|---|---|---|---|---|
SSL-CR | n = 50 | MC Mean | 5.1150 | 0.9930 | 0.9192 | 4.0079 |
SSL-CR | n = 50 | MC SD | 0.4000 | 0.0324 | 0.2857 | 2.9206 |
SSL-CR | n = 50 | BIAS | 0.2354 | 0.0253 | 0.2376 | 2.1866 |
SSL-CR | n = 50 | MSE | 0.1716 | 0.0011 | 0.0874 | 9.4603 |
SSL-CR | n = 100 | MC Mean | 5.0714 | 1.0001 | 0.8839 | 3.1415 |
SSL-CR | n = 100 | MC SD | 0.2213 | 0.0237 | 0.2628 | 1.8025 |
SSL-CR | n = 100 | BIAS | 0.1759 | 0.0189 | 0.2292 | 1.2252 |
SSL-CR | n = 100 | MSE | 0.0536 | 0.0005 | 0.0819 | 3.2365 |
SSL-CR | n = 200 | MC Mean | 4.9934 | 1.00037 | 1.010 | 3.2875 |
SSL-CR | n = 200 | MC SD | 0.1138 | 0.0168 | 0.1482 | 1.0242 |
SSL-CR | n = 200 | BIAS | 0.0929 | 0.0136 | 0.1196 | 0.7542 |
SSL-CR | n = 200 | MSE | 0.0129 | 0.0003 | 0.0218 | 1.1212 |
SSL-CR | n = 400 | MC Mean | 5.0493 | 0.9952 | 0.9031 | 2.8113 |
SSL-CR | n = 400 | MC SD | 0.0842 | 0.0123 | 0.1113 | 0.4879 |
SSL-CR | n = 400 | BIAS | 0.0786 | 0.0102 | 0.1206 | 0.4222 |
SSL-CR | n = 400 | MSE | 0.0094 | 0.0002 | 0.0216 | 0.2713 |
SSL-CR | n = 600 | MC Mean | 5.0457 | 0.9958 | 0.9154 | 2.8626 |
SSL-CR | n = 600 | MC SD | 0.0644 | 0.0090 | 0.0942 | 0.4728 |
SSL-CR | n = 600 | BIAS | 0.0657 | 0.0081 | 0.1059 | 0.3980 |
SSL-CR | n = 600 | MSE | 6.1938e−03 | 9.7448e−05 | 1.5947e−02 | 2.4017e−01 |
Table A4. Experiment I – Scenario 1: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting the SCN-CR model.
Fit | n | Measure | | | | λ |
---|---|---|---|---|---|---|
SCN-CR | n = 50 | MC Mean | 5.0916 | 1.0021 | 0.9308 | 3.5118 |
SCN-CR | n = 50 | MC SD | 0.2524 | 0.0365 | 0.4163 | 2.4528 |
SCN-CR | n = 50 | BIAS | 0.20968 | 0.0289 | 0.3548 | 1.8838 |
SCN-CR | n = 50 | MSE | 0.0714 | 0.0013 | 0.1764 | 6.2180 |
SCN-CR | n = 100 | MC Mean | 5.0817 | 1.0014 | 0.8307 | 2.8386 |
SCN-CR | n = 100 | MC SD | 0.1788 | 0.0223 | 0.2375 | 1.6371 |
SCN-CR | n = 100 | BIAS | 0.1483 | 0.0178 | 0.2423 | 1.2717 |
SCN-CR | n = 100 | MSE | 0.0383 | 0.0005 | 0.0845 | 2.6794 |
SCN-CR | n = 200 | MC Mean | 5.0049 | 0.9999 | 1.0096 | 3.3595 |
SCN-CR | n = 200 | MC SD | 0.1071 | 0.0131 | 0.2015 | 1.1964 |
SCN-CR | n = 200 | BIAS | 0.0838 | 0.0103 | 0.1609 | 0.9328 |
SCN-CR | n = 200 | MSE | 0.0117 | 0.0001 | 0.0403 | 1.5463 |
SCN-CR | n = 400 | MC Mean | 5.0548 | 0.9972 | 0.8862 | 2.7467 |
SCN-CR | n = 400 | MC SD | 0.0937 | 0.0114 | 0.1325 | 0.6159 |
SCN-CR | n = 400 | BIAS | 0.0816 | 0.00954 | 0.1480 | 0.5433 |
SCN-CR | n = 400 | MSE | 0.0114 | 0.0002 | 0.0303 | 0.4397 |
SCN-CR | n = 600 | MC Mean | 5.0367 | 0.9968 | 0.9234 | 2.8786 |
SCN-CR | n = 600 | MC SD | 0.0483 | 0.0081 | 0.0895 | 0.4779 |
SCN-CR | n = 600 | BIAS | 0.0496 | 0.0070 | 0.0958 | 0.4064 |
SCN-CR | n = 600 | MSE | 3.6514e−03 | 7.5503e−05 | 1.3798e−02 | 2.4086e−01 |
Table A5. Experiment II – Study I – without censorship and and censorships: MC Mean and MC SD are the respective average values (Mean) and the corresponding standard deviations (SD) of the MCEM estimates across all samples from fitting SSMN-CR models.
| Cens. | Fit | Measure | | | |
|---|---|---|---|---|---|
| | SN-CR | MC Mean | 1.2085 | −1.0076 | −4.0386 |
| | SN-CR | MC SD | 0.3412 | 0.0888 | 0.2686 |
| | SN-CR | BIAS | 0.3256 | 0.0629 | 0.2153 |
| | SN-CR | MSE | 0.1664 | 0.0066 | 0.0729 |
| | SN-CR | MC CP | 91 | 91 | 91 |
| | ST-CR | MC Mean | 0.9348 | −1.0051 | −4.0446 |
| | ST-CR | MC SD | 0.3333 | 0.0776 | 0.2508 |
| | ST-CR | BIAS | 0.2425 | 0.0625 | 0.2086 |
| | ST-CR | MSE | 0.0941 | 0.0064 | 0.0642 |
| | ST-CR | MC CP | 98 | 98 | 98 |
| | SSL-CR | MC Mean | 0.9963 | −1.0059 | −4.0473 |
| | SSL-CR | MC SD | 0.3371 | 0.0807 | 0.2526 |
| | SSL-CR | BIAS | 0.2470 | 0.0605 | 0.2098 |
| | SSL-CR | MSE | 0.0969 | 0.0060 | 0.0654 |
| | SSL-CR | MC CP | 97 | 97 | 97 |
| | SN-CR | MC Mean | 0.7449 | −0.9869 | −3.9911 |
| | SN-CR | MC SD | 0.3199 | 0.0804 | 0.3297 |
| | SN-CR | BIAS | 0.3285 | 0.0695 | 0.2618 |
| | SN-CR | MSE | 0.1588 | 0.0072 | 0.1077 |
| | SN-CR | MC CP | 88 | 88 | 88 |
| | ST-CR | MC Mean | 1.0867 | −0.9907 | −4.0016 |
| | ST-CR | MC SD | 0.2956 | 0.0796 | 0.3058 |
| | ST-CR | BIAS | 0.2779 | 0.0632 | 0.2387 |
| | ST-CR | MSE | 0.1142 | 0.0060 | 0.0926 |
| | ST-CR | MC CP | 96 | 96 | 96 |
| | SSL-CR | MC Mean | 0.9069 | −0.9878 | −3.9854 |
| | SSL-CR | MC SD | 0.2985 | 0.0767 | 0.3036 |
| | SSL-CR | BIAS | 0.2778 | 0.0648 | 0.2380 |
| | SSL-CR | MSE | 0.1125 | 0.0065 | 0.0915 |
| | SSL-CR | MC CP | 95 | 95 | 95 |
| | SN-CR | MC Mean | 0.9468 | −0.9607 | −3.9132 |
| | SN-CR | MC SD | 0.3507 | 0.0773 | 0.3951 |
| | SN-CR | BIAS | 0.4186 | 0.0708 | 0.3153 |
| | SN-CR | MSE | 0.2652 | 0.0079 | 0.1621 |
| | SN-CR | MC CP | 78 | 78 | 78 |
| | ST-CR | MC Mean | 0.6126 | −0.9655 | −3.9511 |
| | ST-CR | MC SD | 0.3409 | 0.0759 | 0.3462 |
| | ST-CR | BIAS | 0.2808 | 0.0689 | 0.2805 |
| | ST-CR | MSE | 0.1246 | 0.0071 | 0.1211 |
| | ST-CR | MC CP | 95 | 95 | 95 |
| | SSL-CR | MC Mean | 0.7741 | −0.9635 | −3.9353 |
| | SSL-CR | MC SD | 0.3361 | 0.0733 | 0.3497 |
| | SSL-CR | BIAS | 0.3166 | 0.0678 | 0.2832 |
| | SSL-CR | MSE | 0.1628 | 0.0066 | 0.1253 |
| | SSL-CR | MC CP | 85 | 85 | 85 |
| | SN-CR | MC Mean | 0.4142 | −0.9465 | −3.7297 |
| | SN-CR | MC SD | 0.4844 | 0.1016 | 0.3782 |
| | SN-CR | BIAS | 0.6136 | 0.0904 | 0.3773 |
| | SN-CR | MSE | 0.5411 | 0.0131 | 0.2146 |
| | SN-CR | MC CP | 76 | 76 | 76 |
| | ST-CR | MC Mean | 0.7429 | −0.9609 | −3.7649 |
| | ST-CR | MC SD | 0.4471 | 0.0999 | 0.3583 |
| | ST-CR | BIAS | 0.4299 | 0.0839 | 0.3317 |
| | ST-CR | MSE | 0.2984 | 0.0114 | 0.1824 |
| | ST-CR | MC CP | 89 | 89 | 89 |
| | SSL-CR | MC Mean | 0.5526 | −0.9493 | −3.7246 |
| | SSL-CR | MC SD | 0.4284 | 0.0982 | 0.3461 |
| | SSL-CR | BIAS | 0.5008 | 0.0860 | 0.3502 |
| | SSL-CR | MSE | 0.3818 | 0.0121 | 0.1944 |
| | SSL-CR | MC CP | 82 | 82 | 82 |
| | SN-CR | MC Mean | 0.2360 | −0.9130 | −3.7078 |
| | SN-CR | MC SD | 0.5684 | 0.1000 | 0.4731 |
| | SN-CR | BIAS | 0.8115 | 0.1142 | 0.4578 |
| | SN-CR | MSE | 0.8279 | 0.0179 | 0.3069 |
| | SN-CR | MC CP | 66 | 66 | 66 |
| | ST-CR | MC Mean | 0.5699 | −0.9260 | −3.6838 |
| | ST-CR | MC SD | 0.4966 | 0.0982 | 0.3958 |
| | ST-CR | BIAS | 0.5953 | 0.1046 | 0.4263 |
| | ST-CR | MSE | 0.5047 | 0.0154 | 0.2551 |
| | ST-CR | MC CP | 88 | 88 | 88 |
| | SSL-CR | MC Mean | 0.3360 | −0.9076 | −3.6342 |
| | SSL-CR | MC SD | 0.5110 | 0.0973 | 0.4017 |
| | SSL-CR | BIAS | 0.7316 | 0.1109 | 0.4512 |
| | SSL-CR | MSE | 0.6995 | 0.0171 | 0.2936 |
| | SSL-CR | MC CP | 72 | 72 | 72 |
Note: MC CP is the coverage probability at
Funding Statement
This work was supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico and Fundação de Amparo à Pesquisa do Estado de Minas Gerais.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- 1. Andrews D.F. and Mallows C.L., Scale mixtures of normal distributions, J. Roy. Stat. Soc. B 36 (1974), pp. 99–102.
- 2. Arellano-Valle R.B., Castro L.M., González-Farías G., and Muñoz-Gajardo K.A., Student-t censored regression model: Properties and inference, Stat. Methods Appl. 21 (2012), pp. 453–473. doi: 10.1007/s10260-012-0199-y
- 3. Arellano-Valle R.B., Ferreira C.S., and Genton M.G., Scale and shape mixtures of multivariate skew-normal distributions, J. Multivar. Anal. 166 (2018), pp. 98–110. doi: 10.1016/j.jmva.2018.02.007
- 4. Azzalini A., A class of distributions which includes the normal ones, Scand. J. Stat. 12 (1985), pp. 171–178.
- 5. Berkane M., Kano Y., and Bentler P.M., Pseudo maximum likelihood estimation in elliptical theory: Effects of misspecification, Comput. Stat. Data Anal. 18 (1994), pp. 255–267. doi: 10.1016/0167-9473(94)90175-9
- 6. Branco M.D. and Dey D.K., A general class of multivariate skew elliptical distributions, J. Multivar. Anal. 79 (2001), pp. 99–113. doi: 10.1006/jmva.2000.1960
- 7. Dempster A., Laird N., and Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Stat. Soc. B 39 (1977), pp. 1–38.
- 8. Fagundes R.A., Souza R.M.C., and Cysneiros F.J.A., Robust regression with application to symbolic interval data, Eng. Appl. Artif. Intell. 26 (2013), pp. 564–573. doi: 10.1016/j.engappai.2012.05.004
- 9. Feigelson E.D., AstrodatR: Astronomical data, R package v. 0.1, 2014. Available at https://cran.r-project.org/web/packages/astrodatR/.
- 10. Feigelson E.D. and Babu G.J., Modern Statistical Methods for Astronomy: With R Applications, Cambridge University Press, Cambridge, 2012.
- 11. Ferreira C.S., Bolfarine H., and Lachos V.H., Skew scale mixture of normal distributions: Properties and estimation, Stat. Methodol. 8 (2011), pp. 154–171. doi: 10.1016/j.stamet.2010.09.001
- 12. Ferreira C.S., Bolfarine H., and Lachos V.H., Likelihood-based inference for multivariate skew scale mixtures of normal distributions, AStA Adv. Stat. Anal. 100 (2016), pp. 421–441. doi: 10.1007/s10182-016-0266-z
- 13. Ferreira C.S., Lachos V.H., and Bolfarine H., Inference and diagnostics in skew scale mixtures of normal regression models, J. Stat. Comput. Simul. 85 (2015), pp. 517–537. doi: 10.1080/00949655.2013.828057
- 14. Galarza C.E., Momentos de distribuições multivariadas duplamente truncadas, Tese de doutorado, Departamento de Estatística, IMECC-UNICAMP, 2020.
- 15. Garay A.M., Lachos V.H., Bolfarine H., and Cabral C.R.B., Linear censored regression models with scale mixtures of normal distributions, Stat. Papers 58 (2017), pp. 247–278. doi: 10.1007/s00362-015-0696-9
- 16. Jørgensen B., Statistical Properties of the Generalized Inverse Gaussian Distribution, Lecture Notes in Statistics, Springer, Heidelberg, 1982.
- 17. Lachos V.H., Garay A.M., and Cabral C.R.B., Moments of truncated skew-normal/independent distributions, Brazilian J. Probab. Stat. (in press), (2019). Available at https://www.imstat.org/wp-content/uploads/2019/03/BJPS438.pdf.
- 18. Lange K. and Sinsheimer J.S., Normal/independent distributions and their applications in robust regression, J. Comput. Graph. Stat. 2 (1993), pp. 175–198.
- 19. Louis T.A., Finding the observed information matrix when using the EM algorithm, J. Roy. Stat. Soc. B (Methodol.) 44 (1982), pp. 226–233.
- 20. Massuia M.B., Cabral C.R.B., Matos L.A., and Lachos V.H., Influence diagnostics for Student-t censored linear regression models, Statistics 49 (2015), pp. 1074–1094. doi: 10.1080/02331888.2014.958489
- 21. Massuia M.B., Garay A.M., Lachos V.H., and Cabral C.R.B., Bayesian analysis of censored linear regression models with scale mixtures of skew-normal distributions, Stat. Interface 10 (2017), pp. 425–439. doi: 10.4310/SII.2017.v10.n3.a7
- 22. Matos L.A., Castro L.M., and Lachos V.H., Censored mixed-effects models for irregularly observed repeated measures with applications to HIV viral loads, Test 25 (2016), pp. 627–653. doi: 10.1007/s11749-016-0486-2
- 23. Mattos T.B., Garay A.M., and Lachos V.H., Likelihood-based inference for censored linear regression models with scale mixtures of skew-normal distributions, J. Appl. Stat. 45 (2018), pp. 2039–2066. doi: 10.1080/02664763.2017.1408788
- 24. R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2018. Available at http://www.R-project.org/.
- 25. Santos N., López R.G., Israelian G., Mayor M., Rebolo R., García-Gil A., de Taoro M.P., and Randich S., Beryllium abundances in stars hosting giant planets, Astron. Astrophys. 386 (2002), pp. 1028–1038. doi: 10.1051/0004-6361:20020280
- 26. Wei G.C.G. and Tanner M.A., A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, J. Am. Stat. Assoc. 85 (1990), pp. 699–704. doi: 10.1080/01621459.1990.10474930