Abstract
Bartolucci et al. (2003) extended the distributional assumption of random regression models for longitudinal data with undetectable values and informative drop-outs from the normal distribution (Lyles et al., 2000) to the elliptically contoured distribution (ECD). In this paper, the random regression models are constructed on the multivariate skew ECD. A real data set is used to illustrate that, when the symmetry assumption is violated, skew ECDs can fit some unimodal continuous data better than Gaussian distributions or, more generally, continuous symmetric distributions. A simulation study is also conducted to illustrate model fit under a variety of skew ECDs. All computations were performed with SAS/STAT, version 9.13.
Keywords: Multivariate analysis, skew elliptically contoured distributions, skew power exponential distributions, gamma distributions, maximum likelihood functions, censoring, informative drop-outs, undetectable
1 Introduction
The class of skew ECDs is large: it contains both symmetric and asymmetric distributions, with heavy or thin tails, a wide range of skewness, and different levels of kurtosis (leptokurtic, mesokurtic and platykurtic). When the data are skewed, modeling under skew ECD assumptions can give more accurate and precise predictions of the outcome variables, narrower (1 − α)100% prediction intervals, and more robust estimators.
An early drop-out caused by illness or death is known as an informative drop-out. Undetectable data, on the other hand, arise when some events occur before the start of the observation period, or when a value falls below the limit of detection of the instrument used to record the data; such data are usually called left-censored. As pointed out by Lyles et al. (2000), when left-censored values or informative drop-outs are present, random-effects linear models (Laird and Ware, 1982) and generalized estimating equations (GEE) (Liang and Zeger, 1986) give biased estimates of key parameters such as the population-average HIV RNA slope and intercept.
Louis (1982) used asymptotic approximation methods to deal with left-censored and informative drop-out data. Hughes (1999) and Schluchter (1992) implemented maximum likelihood (ML) estimation via the Expectation-Maximization (EM) algorithm to handle left-censored and informative drop-out data. Lyles et al. (2000) combined the approaches of Hughes (1999) and Schluchter (1992) into a single likelihood, integrated over subject-specific random slopes and intercepts, that takes both informative drop-outs and undetectable data into account. They then maximized this likelihood with respect to the fixed effects and the other parameters.
Bartolucci et al. (2003) extended the distributional assumption of the random regression models of Lyles et al. (2000), used in the analysis of longitudinal data with both undetectable values and informative drop-outs, from the normal distribution to the ECD. In this paper, we generalize the random regression models proposed by Bartolucci et al. (2003) by constructing them on multivariate skew elliptically contoured distributions. The estimation of the fixed parameters in the random regression models is invariant under different distributional assumptions. For unimodal asymmetric continuous data, the skew ECD models fit better than the ECD models, which in turn fit better than classical normal models.
To illustrate the usefulness of our models, we use data from the HIV Epidemiology Research Study (HERS), the same data used by Lyles et al. (2000). Our analysis shows that, when the symmetry assumption is violated, the skew ECD can fit unimodal continuous data better than Gaussian distributions or, more generally, continuous symmetric distributions. We also present a simulation study illustrating model fit under a variety of skew elliptically contoured distributions. A convenient family of skew elliptically contoured distributions is the skew power exponential family.
2 Skew Power Exponential Distributions and Models
The skew power exponential distributions can be used to model light- and heavy-tailed, asymmetric, unimodal continuous data. The power exponential distribution (Lindsey, 1999) is a special case of the Kotz-type distribution (Nadarajah, 2003), which is a subclass of the ECD; hence the skew power exponential distributions form a subset of the skew ECDs. DiCiccio and Monti (2004) discussed likelihood inference for the parameters of the univariate skew power exponential (USPE) family, the information matrix of the maximum likelihood estimators, and applications of USPE distributions to robust estimation problems. Following Fang (2003) (with λ = 0) and Zheng et al. (2003), we define the USPE distribution and the multivariate skew power exponential (MSPE) distribution, respectively.
First, we define the USPE distribution as follows:
| (2.1) |
where x is a random variable, α is the skew regulator (skewness parameter), σ is a scale constant related to the variance of x, β is a shape parameter and μ is a location parameter. If X follows a USPE distribution with parameters μ, σ, β and α, we write X ~ USPE(μ, σ, β, α), or X ~ USPE(β, α) if μ = 0 and σ = 1. If α is negative, X is left-skewed (Figure 1); if α is positive, X is right-skewed (Figure 2).
Figure 1.

Densities of the USPE(0, 1, 0.70, -3) and N(0, 1).
Figure 2.

Densities of the USPE(0, 1, 1.10, 2.5) and N(0, 1).
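Because the closed form of (2.1) is not reproduced in this text, the Python sketch below uses an assumed Azzalini-type construction, (2/σ) f0(z) Φ(αz) with z = (x − μ)/σ, where f0 is the standard power exponential density with kernel exp(−|z|^{2β}/2) and Φ is the standard normal distribution function. It is not necessarily identical to (2.1), but it has the properties described above: β = 1 and α = 0 give N(0, 1), β < 1 gives heavier tails, and the sign of α sets the direction of skew. The helper names (pe_density, uspe_density, trapezoid) are hypothetical.

```python
import math


def pe_density(z: float, beta: float) -> float:
    """Standard power exponential density exp(-|z|**(2*beta) / 2) / c(beta).

    beta = 1 gives N(0, 1); beta < 1 gives heavier tails, beta > 1 lighter tails.
    """
    c = math.gamma(1.0 + 1.0 / (2.0 * beta)) * 2.0 ** (1.0 + 1.0 / (2.0 * beta))
    return math.exp(-0.5 * abs(z) ** (2.0 * beta)) / c


def std_normal_cdf(u: float) -> float:
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def uspe_density(x: float, mu: float = 0.0, sigma: float = 1.0,
                 beta: float = 1.0, alpha: float = 0.0) -> float:
    """ASSUMED skew form (2 / sigma) * f0(z) * Phi(alpha * z), z = (x - mu) / sigma.

    alpha < 0 skews left, alpha > 0 skews right, alpha = 0 is symmetric.
    """
    z = (x - mu) / sigma
    return 2.0 / sigma * pe_density(z, beta) * std_normal_cdf(alpha * z)


def trapezoid(f, a: float, b: float, n: int = 20000) -> float:
    """Plain trapezoidal rule, used only to check normalisation and the mean."""
    h = (b - a) / n
    return h * (0.5 * f(a) + 0.5 * f(b) + sum(f(a + i * h) for i in range(1, n)))


# Figure 1 settings, USPE(0, 1, 0.70, -3): total mass should be about 1, mean < 0
total = trapezoid(lambda x: uspe_density(x, 0.0, 1.0, 0.70, -3.0), -40.0, 40.0)
mean = trapezoid(lambda x: x * uspe_density(x, 0.0, 1.0, 0.70, -3.0), -40.0, 40.0)
```

The same check with the Figure 2 settings, USPE(0, 1, 1.10, 2.5), gives a positive mean, consistent with right skew.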
Second, we define the MSPE distribution as follows:
| (2.2) |
where x is a random vector of size k, Ω is a k × k positive definite matrix related to the covariance matrix of x, β is a shape parameter, μ is a location vector of size k, and α is a constant vector of size k that regulates the skewness.
If x follows an MSPE distribution with parameters μ, Ω, β and α, we write x ~ MSPE(μ, Ω, β, α), or x ~ MSPE(β, α) if μ = 0 and Ω = Ik. If x follows a two-dimensional MSPE distribution, we write x ~ MSPE(μ1, μ2, σ1, σ12, σ2, β, α1, α2) (Figure 3).
Figure 3.

Densities of the MSPE(0, 0, 1, 0.13, 1, 1.5, 5, -0.01).
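As with the univariate case, (2.2) is not reproduced here, so the following Python sketch of a bivariate density is built on an assumption: 2 f_MPE(x) Φ(α′(x − μ)), where f_MPE is the multivariate power exponential density with kernel exp(−q^β/2), q = (x − μ)′Ω⁻¹(x − μ), and its standard normalising constant. With β = 1 and α = 0 it reduces to the bivariate normal. The function names are hypothetical, and the grid check uses the Figure 3 parameter values.

```python
import math


def mspe2_density(x1: float, x2: float, mu=(0.0, 0.0), s11: float = 1.0,
                  s12: float = 0.0, s22: float = 1.0, beta: float = 1.0,
                  alpha=(0.0, 0.0)) -> float:
    """ASSUMED bivariate skew power exponential density:
    2 * f_MPE(x) * Phi(alpha' (x - mu)), where f_MPE has kernel exp(-q**beta / 2)
    and q = (x - mu)' Omega^{-1} (x - mu), Omega = [[s11, s12], [s12, s22]].
    """
    p = 2  # dimension
    det = s11 * s22 - s12 * s12
    if s11 <= 0 or det <= 0:
        raise ValueError("Omega must be positive definite")
    d1, d2 = x1 - mu[0], x2 - mu[1]
    # quadratic form using the closed-form inverse of a 2 x 2 matrix
    q = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    # normalising constant of the multivariate power exponential density
    k = p * math.gamma(p / 2.0) / (
        math.pi ** (p / 2.0)
        * math.gamma(1.0 + p / (2.0 * beta))
        * 2.0 ** (1.0 + p / (2.0 * beta)))
    f_mpe = k / math.sqrt(det) * math.exp(-0.5 * q ** beta)
    skew_arg = alpha[0] * d1 + alpha[1] * d2
    return 2.0 * f_mpe * 0.5 * (1.0 + math.erf(skew_arg / math.sqrt(2.0)))


def grid_mass(f, lo: float = -6.0, hi: float = 6.0, n: int = 240) -> float:
    """Midpoint-rule approximation of the integral of f(u, v) over [lo, hi]^2."""
    h = (hi - lo) / n
    pts = [lo + (i + 0.5) * h for i in range(n)]
    return h * h * sum(f(u, v) for u in pts for v in pts)


# Figure 3 settings: MSPE(0, 0, 1, 0.13, 1, 1.5, 5, -0.01)
def fig3(u: float, v: float) -> float:
    return mspe2_density(u, v, s12=0.13, beta=1.5, alpha=(5.0, -0.01))
```

grid_mass(fig3) should be close to 1, and the large positive α1 pushes mass toward positive x1.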
We use the model of Lyles et al. (2000), changing only the underlying distributional assumptions. The linear random-effects regression model is:
$$Y_{ij} = (\alpha + a_i) + (\beta + b_i)\, t_{ij} + e_{ij} \qquad (2.3)$$
where α denotes the fixed intercept and β the fixed slope. We take the response Yij to be the base 10 logarithm of HIV RNA measured at the jth time point tij (j = 1, 2, ⋯ , ni) for the ith woman (i = 1, 2, ⋯ , k, with k = 528 and 1 ≤ ni ≤ 5 for our data set). We assume that the error terms eij are distributed as USPE(0, σ, υ1, α), the random intercept deviations ai as USPE(0, σ1, υ2, α1), and the random slope deviations bi as USPE(0, σ2, υ2, α2), with cov(ai, bi) = c·σ12, where c is the correction coefficient. The joint distribution of ai and bi is MSPE2(0, Ω2, υ2, (α1, α2)′), where Ω2 = (σij) is a 2 × 2 positive definite matrix. Based on the trivariate power exponential distribution model (Bartolucci et al., 2003) and Definition 1 of Fang (2003), we assume that the 3-dimensional random vector formed by the random intercept, the random slope and the log 'survival' time is trivariate skew power exponential.
The joint p.d.f. of this vector is given by
| (2.4) |
where υ2 is the shape parameter, ωi = (ai, bi, ci)′, and , is the natural logarithm of the 'survival' time for the ith subject. The 'survival' time is defined as the time from baseline to the occurrence of an event (e.g., illness or death) before the end of the study; α = (α1, α2, α3)′ is the skew parameter vector, and μt is the mean of .
From now on, we use the same or similar notations as above. The p.d.f. of Yij given ai, bi is as follows:
| (2.5) |
The p.d.f. of the 2-dimensional random vector (ai, bi) is bivariate skew power exponential:
| (2.6) |
The marginal p.d.f. of ai is given by
| (2.7) |
The marginal p.d.f. of bi is as follows:
| (2.8) |
The conditional p.d.f. of ai given bi = b is given by
| (2.9) |
Suppose x is distributed as (2.2) and x′ = (x(1)′,x(2)′), where x(1) is k1 × 1, x(2) is k2 × 1 and k1 + k2 = k. The conditional p.d.f. of x(1) given x(2) is as follows:
| (2.10) |
The next section applies the theory above to a longitudinal data set. To obtain the ML estimates of the parameters, the probability density functions (2.4) through (2.10) developed in this section are used to build the likelihood functions. The maximization is implemented in SAS with PROC IML.
3 Applications
In clinical studies of human immunodeficiency virus (HIV) infection, the number of copies of HIV ribonucleic acid (RNA) per milliliter of plasma is often used to measure disease progression. The HERS, cited by Lyles et al. (2000) as an example of a large-scale epidemiologic investigation, is a prospective, multisite cohort study of HIV infection. Its primary objectives are to describe the natural history of HIV disease progression among women and to identify factors associated with prognosis. The study rationale, organization and methods have been described in detail elsewhere (Smith et al., 1997).
The HERS data were used to illustrate that the skew ECD can fit asymmetric, unimodal continuous data better than Gaussian distributions or, more generally, continuous symmetric distributions. The data set contains 1,864 RNA measurements on 528 HIV-infected women aged 16-55 years, collected in the HIV Epidemiology Research Study from April 1993 to June 1998. Overall, 25 subjects (4.7%) dropped out, accounting for 77 informative drop-out observations, and 745 observations (40%) were undetectable, i.e., left-censored below 500 copies per milliliter. For a more detailed description of the data, including definitions, the reader is referred to Lyles et al. (2000).
We use the general integrated likelihood expressions given by Lyles et al. (2000) to carry out estimation and inference under the skew ECD assumption, using the SAS software package. The likelihood function is maximized with respect to the parameters by the NLPQN routine in IML, using the dual quasi-Newton optimization method. The double integral for each subject was computed by quadrature, and the Hessian matrix was obtained with the NLPFDD routine in IML. Because SAS has no built-in functions for generic skew ECDs, we approximated the integrals with Simpson's rule, a method for the numerical approximation of definite integrals. Specifically, it is the following approximation:
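$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left[f(a) + 4f\!\left(\frac{a+b}{2}\right) + f(b)\right] \qquad (3.11)$$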
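The SAS/IML code used in the paper is not reproduced here. As a hedged illustration only, the Python sketch below implements the composite form of Simpson's rule, which applies (3.11) to consecutive pairs of subintervals of width h (so (b − a)/6 becomes h/3), together with an iterated version for a double integral, such as a per-subject integral over the random intercept and slope. The function names simpson and simpson2d are hypothetical.

```python
import math
from typing import Callable


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 100) -> float:
    """Composite Simpson's rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # interior weights alternate 4, 2, 4, ..., 4
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def simpson2d(g: Callable[[float, float], float], a1: float, b1: float,
              a2: float, b2: float, n1: int = 60, n2: int = 60) -> float:
    """Iterated Simpson's rule for the double integral of g(u, v)."""
    return simpson(lambda u: simpson(lambda v: g(u, v), a2, b2, n2), a1, b1, n1)


# Example: a standard bivariate normal density integrates to about 1
mass = simpson2d(lambda u, v: math.exp(-0.5 * (u * u + v * v)) / (2.0 * math.pi),
                 -8.0, 8.0, -8.0, 8.0)
```

Simpson's rule is exact for cubics, so simpson(lambda x: x ** 3, 0.0, 2.0, 2) returns 4.0; in a likelihood, the truncation limits and the number of nodes would need tuning to the tails of the assumed densities.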
Our calculations showed that the univariate distributions of the random intercept and random slope, and their joint distribution, are symmetric. However, the error term follows a non-normal, long-tailed skew ECD unless the undetectable values are replaced by the lower limit of detection or by half of it. We therefore assume that the error term follows a skew power exponential distribution, a subclass of the skew ECD.
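In likelihoods of the Lyles et al. (2000) type, an undetectable value enters through the distribution function at the detection limit rather than through the density. The Python sketch below shows one observation's log-likelihood contribution, conditional on the random effects, under an assumed skew power exponential error of the form (2/σ) f0(z) Φ(αz) (not necessarily the paper's (2.1)), with the distribution function computed by Simpson's rule and the 500 copies/mL detection limit taken to the log10 scale. The full likelihood would integrate such terms over (ai, bi) and add the drop-out component, which is not shown; all names are hypothetical.

```python
import math
from typing import Optional

LOD_LOG10 = math.log10(500.0)  # 500 copies/mL detection limit on the log10 scale


def pe_density(z: float, beta: float) -> float:
    """Standard power exponential density exp(-|z|**(2*beta) / 2) / c(beta)."""
    c = math.gamma(1.0 + 1.0 / (2.0 * beta)) * 2.0 ** (1.0 + 1.0 / (2.0 * beta))
    return math.exp(-0.5 * abs(z) ** (2.0 * beta)) / c


def skew_pe_pdf(e: float, sigma: float, beta: float, alpha: float) -> float:
    """ASSUMED skew power exponential error density (2/sigma) f0(z) Phi(alpha z)."""
    z = e / sigma
    phi = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return 2.0 / sigma * pe_density(z, beta) * phi


def skew_pe_cdf(e: float, sigma: float, beta: float, alpha: float,
                n: int = 4000) -> float:
    """P(error <= e) by composite Simpson's rule from a far-left cut-off."""
    lo = -60.0 * sigma  # mass below this point is treated as negligible
    if e <= lo:
        return 0.0
    h = (e - lo) / n  # n is even
    total = skew_pe_pdf(lo, sigma, beta, alpha) + skew_pe_pdf(e, sigma, beta, alpha)
    for i in range(1, n):
        weight = 4.0 if i % 2 else 2.0
        total += weight * skew_pe_pdf(lo + i * h, sigma, beta, alpha)
    return total * h / 3.0


def obs_loglik(y: Optional[float], mean: float, sigma: float, beta: float,
               alpha: float, undetectable: bool) -> float:
    """Log-likelihood contribution of one log10 RNA value given (a_i, b_i).

    mean = (fixed intercept + a_i) + (fixed slope + b_i) * t_ij.
    A detected value contributes the error density at y - mean; an
    undetectable (left-censored) value contributes P(Y_ij < LOD) = F(LOD - mean).
    """
    if undetectable:
        return math.log(skew_pe_cdf(LOD_LOG10 - mean, sigma, beta, alpha))
    return math.log(skew_pe_pdf(y - mean, sigma, beta, alpha))
```

With β = 1 the assumed error is skew normal, for which F(0) = 1/2 − arctan(α)/π; this gives a convenient check on the numerical distribution function.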
4 Results
The approximate ML estimates and related statistics are given in Table 1. We used the Akaike Information Criterion (AIC; Akaike, 1974) to compare the multivariate power exponential models with the skew power exponential models. In Version 9.13 of SAS/STAT software, AIC is defined in 'smaller-is-better' form: AIC = −2L + 2d, where L denotes the maximized log-likelihood and d denotes the dimension of the model, i.e., the number of parameters estimated in the likelihood function. (The number of observations n does not enter AIC; it enters the BIC used in Section 5.) Four models are under discussion.
Table 1.
Results of HERS Data Analysis
| Model | υ1 | υ2 | s | d | AIC |
|---|---|---|---|---|---|
| M1 | 0.6574(0.116) | 0.6574(0.116) | 0.0000 | 7 | 3928.101 |
| M2 | 0.5173(0.055) | 1.3706(0.215) | 0.0000 | 12 | 4064.791 |
| M3 | 0.6489(0.011) | 0.6489(0.011) | -0.3178(0.166) | 8 | 3927.380 |
| M4 | 0.7386(0.018) | 0.6415(0.020) | -0.0219(0.129) | 13 | 4045.155 |
- Numbers in parentheses are standard errors of the corresponding estimates;
- d denotes the number of parameters estimated in the ML function;
- υ1, υ2 denote the shape parameters in the ML function;
- s denotes the skew parameter in the ML function;
- AIC denotes the Akaike Information Criterion.
Model 1 (M1)
ECDs are assumed in this model, which accounts for undetectable values only. Furthermore, we assume that both shape parameters are equal, i.e., υ1 = υ2. Seven parameters (α, β, , , σ12, σ2, υ1) are estimated in the ML function.
Model 2 (M2)
ECDs are assumed in this model. Undetectable values, informative drop-outs and right-censored values are handled simultaneously, and we do not assume υ1 = υ2.
Model 3 (M3)
This model is the same as M1 except that the error terms in the linear random-effects regression model (2.3) are assumed to follow a skew power exponential distribution. The distributions of the random intercept and slope are still assumed to be symmetric.
Model 4 (M4)
This model is the same as M2 except that the error terms in the linear random-effects regression model (2.3) are assumed to follow a skew power exponential distribution. The distributions of the random intercept and slope are still assumed to be symmetric.
For the estimation algorithm for M1 and M2, the reader is referred to Bartolucci et al. (2003). For M3 and M4, the same algorithm applies once the relevant ECD densities in the likelihood functions are replaced by skew ECD densities. We now summarize the findings of the HERS data analysis. Table 1 shows that each skew ECD model has a lower AIC than its ECD counterpart. The improvement of M3 over M1 is small: AIC decreases from 3928.10 to 3927.38. Models M2 and M4 treat undetectable values, informative drop-outs and right-censored observations simultaneously; here M4 decreases AIC from 4064.79 (M2) to 4045.16, a reduction of about 19.6. Overall, by the AIC criterion, M4 is the most appropriate model when all of these features of the data are taken into account.
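The comparison above is simple arithmetic on Table 1. The short Python sketch below, using only the d and AIC values reported there, backs out −2L from AIC = −2L + 2d and computes the AIC reduction of each skew model relative to its symmetric counterpart (M3 against M1, M4 against M2). The dictionary and function names are hypothetical.

```python
TABLE1 = {  # model: (d, AIC) as reported in Table 1
    "M1": (7, 3928.101),
    "M2": (12, 4064.791),
    "M3": (8, 3927.380),
    "M4": (13, 4045.155),
}


def minus_two_loglik(model: str) -> float:
    """Recover -2L from AIC = -2L + 2d."""
    d, aic = TABLE1[model]
    return aic - 2 * d


def aic_reduction(symmetric: str, skew: str) -> float:
    """How much lower (better) the skew model's AIC is than its ECD counterpart."""
    return TABLE1[symmetric][1] - TABLE1[skew][1]


for sym, skew in (("M1", "M3"), ("M2", "M4")):
    print(f"{skew} vs {sym}: AIC lower by {aic_reduction(sym, skew):.3f}; "
          f"-2L {minus_two_loglik(sym):.3f} -> {minus_two_loglik(skew):.3f}")
```

For M3 versus M1, −2L drops by 2.721 (3914.101 to 3911.380), most of which is offset by the 2-unit penalty for the single extra skew parameter; this is why the AIC gain is only 0.721.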
5 Simulation
In this section, we present a simulation study illustrating model fit under a variety of skew power exponential distributions. We simulated data for model 3 with parameter values (a0, b0, , , σ12, σ2) = (3, 0.5, 0.5, 0.1, 0.1, 0.2); skew parameter α = 0.1, 0.3, ⋯ , 1.5; shape parameter υ = 0.6, 0.8, 1.0, 1.2, 1.4; and sample size n = 1000. These values of α and υ cover various combinations of skewness and kurtosis. Table 2 shows that, on the same simulated data sets, model 3 is better than or at least equivalent to model 1: for every parameter combination, both the AIC and the Bayesian information criterion (BIC) (Schwarz, 1978) of model 3 are either considerably smaller than or close to those of model 1.
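The paper does not describe its data-generation code. The Python sketch below shows one way, under stated assumptions, to simulate a subject under model (2.3) with skewed errors. Power exponential draws use the fact that if Z has density proportional to exp(−|z|^{2β}/2), then |Z|^{2β}/2 follows a Gamma(1/(2β), 1) distribution; skewing uses the perturbation-of-symmetry representation (keep Z with probability Φ(αZ), otherwise flip its sign), which targets the assumed density (2/σ) f0(z) Φ(αz). The fixed effects a0 = 3 and b0 = 0.5 come from the text; the random effects are drawn bivariate normal for simplicity, and their covariance, the error scale, and the visit times are hypothetical illustrative values, since not all of the Section 5 variance parameters are legible here.

```python
import math
import random


def r_power_exp(rng: random.Random, beta: float) -> float:
    """Standard power exponential draw (density prop. to exp(-|z|**(2*beta)/2)).

    If W = |Z|**(2*beta) / 2 then W ~ Gamma(shape=1/(2*beta), scale=1).
    """
    w = rng.gammavariate(1.0 / (2.0 * beta), 1.0)
    z = (2.0 * w) ** (1.0 / (2.0 * beta))
    return z if rng.random() < 0.5 else -z


def r_skew_pe(rng: random.Random, sigma: float, beta: float, alpha: float) -> float:
    """Draw from the ASSUMED skew form (2/sigma) f0(z) Phi(alpha z):
    keep Z with probability Phi(alpha Z), otherwise flip its sign."""
    z = r_power_exp(rng, beta)
    keep_prob = 0.5 * (1.0 + math.erf(alpha * z / math.sqrt(2.0)))
    return sigma * (z if rng.random() < keep_prob else -z)


def simulate_subject(rng: random.Random, times, a0: float = 3.0, b0: float = 0.5,
                     var_a: float = 0.25, cov_ab: float = 0.05, var_b: float = 0.1,
                     sigma_e: float = 0.5, beta: float = 1.0, alpha: float = 0.5):
    """One subject's responses under (2.3) with skewed errors.

    var_a, cov_ab, var_b and sigma_e are hypothetical; random effects are
    bivariate normal here purely for simplicity.
    """
    # 2 x 2 Cholesky factor of the random-effects covariance matrix
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(var_b - l21 * l21)
    u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    a, b = l11 * u1, l21 * u1 + l22 * u2
    return [(a0 + a) + (b0 + b) * t + r_skew_pe(rng, sigma_e, beta, alpha)
            for t in times]


rng = random.Random(2024)
subject = simulate_subject(rng, [0.0, 0.5, 1.0, 1.5, 2.0])  # hypothetical visit times
```

With β = 1 this error model is the skew normal, whose mean is δ√(2/π) with δ = α/√(1 + α²), which gives an easy check on the sampler.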
Table 2.
Simulation Results
| α | 0.10 | 0.30 | 0.50 | 0.70 | 0.90 | 1.10 | 1.30 | 1.50 |
|---|---|---|---|---|---|---|---|---|
| υ = 0.60 | | | | | | | | |
| AICN | 2799.6 | 3005.5 | 3101.2 | 3127.7 | 3199.8 | 3097.9 | 3049.6 | 3068.2 |
| AICS | 2758.1 | 3000.0 | 3103.2 | 3128.0 | 3195.4 | 3086.6 | 3037.3 | 3051.3 |
| ABSD | 41.5 | 5.4 | 2.1 | 0.3 | 4.4 | 11.2 | 12.3 | 16.9 |
| BICN | 2834.0 | 3039.7 | 3135.5 | 3162 | 3234 | 3132.2 | 3084.0 | 3102.5 |
| BICS | 2797.4 | 3039.2 | 3142.5 | 3167 | 3235 | 3125.9 | 3076.5 | 3090.6 |
| ABSD | 36.6 | 0.5 | 7.0 | 5.2 | 0.5 | 6.3 | 7.4 | 12.0 |
| υ = 0.80 | | | | | | | | |
| AICN | 2266.1 | 2452.4 | 2418.8 | 2470.3 | 2487.8 | 2267.5 | 2336.2 | 2222.5 |
| AICS | 2252.5 | 2448.7 | 2420.9 | 2472.1 | 2489.1 | 2260.6 | 2320.9 | 2207.0 |
| ABSD | 13.5 | 3.7 | 2.2 | 1.8 | 1.3 | 6.9 | 15.3 | 15.5 |
| BICN | 2300.4 | 2486.7 | 2453.1 | 2504.7 | 2522.2 | 2301.9 | 2370.6 | 2256.8 |
| BICS | 2291.8 | 2487.9 | 2460.2 | 2511.4 | 2528.4 | 2299.9 | 2360.2 | 2246.2 |
| ABSD | 8.6 | 1.2 | 7.1 | 6.7 | 6.2 | 2.0 | 10.4 | 10.6 |
| υ = 1.00 | | | | | | | | |
| AICN | 1799.3 | 2060.2 | 2025.0 | 2080.3 | 1991.1 | 1954.7 | 1982.0 | 1865.8 |
| AICS | 1784.6 | 2059.0 | 2027.2 | 2081.4 | 1992.9 | 1951.2 | 1977.9 | 1847.4 |
| ABSD | 14.7 | 1.2 | 2.1 | 1.0 | 1.8 | 3.5 | 4.1 | 18.5 |
| BICN | 1833.6 | 2094.5 | 2059.4 | 2114.7 | 2025.4 | 1989.1 | 2016.3 | 1900.2 |
| BICS | 1823.9 | 2098.3 | 2066.4 | 2120.6 | 2032.2 | 1990.5 | 2017.1 | 1886.7 |
| ABSD | 9.8 | 3.8 | 7.0 | 5.9 | 6.7 | 1.4 | 0.8 | 13.5 |
| υ = 1.20 | | | | | | | | |
| AICN | 1536.5 | 1685.5 | 1833.6 | 1762.3 | 1698.7 | 1685.6 | 1605.1 | 1447.6 |
| AICS | 1517.5 | 1680.8 | 1833.1 | 1764.3 | 1696.4 | 1679.9 | 1578.9 | 1430.4 |
| ABSD | 19.0 | 4.7 | 0.4 | 2.0 | 2.3 | 5.7 | 26.2 | 17.2 |
| BICN | 1570.9 | 1719.9 | 1867.9 | 1796.7 | 1733.0 | 1720.0 | 1639.5 | 1481.9 |
| BICS | 1556.8 | 1720.1 | 1872.4 | 1803.6 | 1735.6 | 1719.2 | 1618.2 | 1469.6 |
| ABSD | 14.1 | 0.2 | 4.5 | 6.9 | 2.6 | 0.8 | 21.3 | 12.3 |
| υ = 1.40 | | | | | | | | |
| AICN | 1277.9 | 1548.7 | 1606.4 | 1547.1 | 1554.9 | 1420.6 | 1398.4 | 1296.6 |
| AICS | 1253.7 | 1532.3 | 1607.6 | 1546.6 | 1557.3 | 1413.3 | 1390.9 | 1258.5 |
| ABSD | 24.2 | 16.5 | 1.2 | 0.5 | 2.5 | 7.3 | 7.5 | 38.1 |
| BICN | 1312.2 | 1583.1 | 1640.7 | 1581.4 | 1589.2 | 1455.0 | 1432.8 | 1330.9 |
| BICS | 1293.0 | 1571.5 | 1646.8 | 1585.8 | 1596.6 | 1452.6 | 1430.1 | 1297.7 |
| ABSD | 19.3 | 11.6 | 6.1 | 4.4 | 7.4 | 2.4 | 2.6 | 33.2 |
Notes to Table 2:
α is the skew parameter of the error distribution in (2.3);
AICN denotes the AIC of model 1 under the power exponential distribution assumption;
AICS denotes the AIC of model 3 under the skew power exponential distribution assumption;
BICN denotes the BIC of model 1 under the power exponential distribution assumption;
BICS denotes the BIC of model 3 under the skew power exponential distribution assumption;
ABSD denotes the absolute difference in AIC or BIC between models 1 and 3.
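The AIC and BIC rows of Table 2 are linked: since AIC = −2L + 2d and BIC = −2L + d ln n, BIC = AIC + d(ln n − 2). The Python sketch below assumes n = 1000 (the simulated sample size) and d = 7 for model 1 and d = 8 for model 3 (the parameter counts of M1 and M3 in Table 1), and reproduces the tabulated BIC values for υ = 0.60 from the AIC values up to rounding. It also recomputes the ABSD rows. Names are hypothetical.

```python
import math

N_SIM = 1000              # simulated sample size reported in Section 5
D_MODEL1, D_MODEL3 = 7, 8  # parameter counts of M1 and M3 from Table 1


def bic_from_aic(aic: float, d: int, n: int = N_SIM) -> float:
    """AIC = -2L + 2d and BIC = -2L + d ln n, so BIC = AIC + d (ln n - 2)."""
    return aic + d * (math.log(n) - 2.0)


def absd(xs, ys):
    """Element-wise absolute differences, as in the ABSD rows."""
    return [abs(x - y) for x, y in zip(xs, ys)]


# upsilon = 0.60 block of Table 2, alpha = 0.10, 0.30, ..., 1.50
AICN = [2799.6, 3005.5, 3101.2, 3127.7, 3199.8, 3097.9, 3049.6, 3068.2]
AICS = [2758.1, 3000.0, 3103.2, 3128.0, 3195.4, 3086.6, 3037.3, 3051.3]
BICN = [2834.0, 3039.7, 3135.5, 3162.0, 3234.0, 3132.2, 3084.0, 3102.5]
BICS = [2797.4, 3039.2, 3142.5, 3167.0, 3235.0, 3125.9, 3076.5, 3090.6]

for a, b in zip(AICN, BICN):
    print(f"model 1: BIC from AIC {bic_from_aic(a, D_MODEL1):.1f}, tabulated {b}")
print("ABSD (AIC):", [round(x, 1) for x in absd(AICN, AICS)])
```

Because model 3 has one more parameter, its BIC penalty exceeds model 1's by ln n − 2 ≈ 4.9 relative to AIC, which is why the BIC differences are shifted against model 3 compared with the AIC differences.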
The absolute differences are large when the skew parameter is below 0.3; above 0.3 they increase with the skew parameter, and they begin to decrease slightly once the skew parameter passes about 1.9 or 2.1. We attribute this pattern to the nonlinear relation between the skew parameter and the skewness coefficient. The Pearson correlation coefficient between the absolute difference in AIC between the two models and the absolute estimated skew parameter is 0.86 (p < 0.001), which indicates that the difference between the two models is more pronounced when the absolute estimated skew parameter is larger. On the other hand, the average absolute differences in AIC between the two models for shape parameters 0.6, 0.8, 1.0, 1.2 and 1.4 are 17.3, 12.4, 12.6, 14.0 and 21.6, respectively (15.6 overall), which shows that the difference between the two models is more pronounced when the shape parameter departs further from 1.0. The same conclusions hold if BIC is used instead of AIC.
6 Discussion
The magnitude of the estimated skew parameters for the HERS data is not large, because we took the base 10 logarithm of the number of copies of HIV RNA per milliliter of plasma. Nevertheless, the skew ECD models improve on the ECD models in terms of AIC. In Section 5, by contrast, the simulated data sets cover both large and small degrees of skewness and kurtosis.
Next, we discuss some possible extensions. The skew power exponential distributions, which include the skew normal distributions as a subset, are just one member of the larger skew ECD family. Extending the skew power exponential distribution assumptions of the models discussed here is a challenging task of great interest in both theory and practice. For example, the skew power exponential distribution could be replaced by skew t, skew generalized t, skew Cauchy or other skew distributions.
Acknowledgments
The authors would like to thank Lytt Gardner (Division of HIV Prevention, CDC, Atlanta) for providing the motivation and the data of the HIV Epidemiology Research Study used in this paper. The authors would also like to thank the Editor-in-Chief, Professor Shahjahan Khan, the associate editors, the referees, and Zhao Peng for their thoughtful and constructive comments. Their suggestions and comments have strengthened our paper.
Footnotes
This work was supported, in part, by the National Institutes of Health (DA14037, DA15131, DA17804, DA17805, MH62464 and MH68391), the Sarah M. and Charles E. Seay Endowed Chair in Child Psychiatry at UT Southwestern Medical Center.
Contributor Information
Shimin Zheng, Department of Psychiatry, UTSW Medical Center, Dallas, TX 75390, U.S.A. & Department of Finance, Nanjing Audit University, Nanjing 210029, P. R. China. Shimin.Zheng@UTSouthwestern.edu.
Uma Rao, Department of Psychiatry, UTSW Medical Center, Dallas, TX 75390, U.S.A. Uma.Rao@UTSouthwestern.edu.
Alfred A. Bartolucci, Department of Biostatistics, UAB, Birmingham AL 35294, U.S.A. ABartolucci@ms.soph.uab.edu
Karan P. Singh, Department of Biostatistics, UNTHSC, Fort Worth, TX 76107, U.S.A. ksingh@hsc.unt.edu
References
- Akaike H. A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control. 1974;AC-19:716–723.
- Bartolucci AA, Zheng S, Singh KP. Random regression models Based On The Elliptically Contoured Distribution Assumptions With Applications To Longitudinal Data. Journal of Modern Applied Statistical Methods. 2003;2:259–370.
- DiCiccio TJ, Monti AC. Inferential aspects of the skew exponential power distribution. Journal of the American Statistical Association, Theory and Methods. 2004;99:439–450.
- Fang BQ. The skew elliptical distributions and their quadratic forms. Journal of Multivariate Analysis. 2003;87:298–314.
- Hughes JP. Mixed effects models with censored data with application to HIV RNA Levels. Biometrics. 1999;55:625–629. doi: 10.1111/j.0006-341x.1999.00625.x.
- Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982;38:963–974.
- Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
- Lindsey JK. Multivariate elliptically contoured distributions for repeated measurements. Biometrics. 1999;55:1277–1280. doi: 10.1111/j.0006-341x.1999.01277.x.
- Louis TA. Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, B. 1982;44:226–233.
- Lyles RH, et al. Random regression models for human immunodeficiency virus ribonucleic acid data subject to left censoring and informative drop-outs. Journal of the Royal Statistical Society Series C, Applied Statistics. 2000a;49:485–497.
- Lyles RH, et al. SAS Programs and Simulation Data Sets. 2000b. http://www.blackwellpublishers.co.uk/rss/
- Nadarajah Saralees. The Kotz-type distribution with applications. Statistics. 2003;34:341–358.
- Schluchter MD. Methods for the analysis of informatively censored longitudinal data. Statistics in Medicine. 1992;11:1861–1870. doi: 10.1002/sim.4780111408.
- Smith DK, et al. Design and baseline participant characteristics of the Human Immunodeficiency Virus Epidemiology Research (HER) study: a prospective cohort study of human immunodeficiency virus infection in US women. American Journal of Epidemiology. 1997;146:459–469. doi: 10.1093/oxfordjournals.aje.a009299.
- Todd J, et al. Performance characteristics for the quantitation of plasma HIV-1 RNA using branched DNA signal amplification technology. Journal AIDS. 1995;10:S35–S44.
- Zheng S, Bae S, Bartolucci AA, Singh KP. Power exponential distribution. Journal AIDS. 2003;4:97–111.
