ABSTRACT
In this paper, we propose a new kind of multivariate t distribution that allows different degrees of freedom for each univariate component. Compared with the classical multivariate t distribution, it offers a more flexible model specification that can accommodate varying tail weights across the marginals in multivariate data modeling. In particular, it can include components following the multivariate normal distribution, and it contains the product of independent t-distributions as a special case. Subsequently, it is extended to the regression model as the joint distribution of the error terms. Important distributional properties are explored and useful statistical methods are developed. The flexibility of the specified structure in better capturing the characteristics of the data is illustrated by both simulation studies and real data analyses.
Keywords: Expectation/conditional maximization algorithm, multivariate t distribution, multivariate t regression model, multivariate truncated normal distribution, stochastic representation
1. Introduction
As a natural generalization of the univariate Student's t distribution, the multivariate t (MVT) distribution is a robust alternative to the multivariate normal distribution in the analysis of multivariate continuous data with heavy tails or outliers. The MVT distribution was first derived independently in [4] and [6], the latter concentrating on the bivariate case. Some characterizations of the MVT distribution were presented in [5,16]. An efficient computational procedure for the bivariate case was considered in [1]. Maximum likelihood estimation methods for the MVT distribution with missing data were discussed in [17,19–21]. Regression models with MVT error terms have been widely investigated in [8,18,29]. A comprehensive review of the topic was given in [14], the first monograph on the MVT distribution. The corresponding mathematical properties, such as the stochastic representation (SR), consistency property, density expansion, moments and conditional distributions, can also be found in [23], and the associated estimation methods can be found in [24]. Recently, a sampling method for the MVT distribution, together with an R package, was provided in [11].
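The most common way of constructing the MVT distribution is through the SR of a normal random vector and an independent chi-squared random variable. Assume that the random vector $\mathbf{z} \sim N_d(\mathbf{0}, \boldsymbol{\Sigma})$, the random variable $V \sim \chi^2(\nu)$, and they are independent (denoted by $\mathbf{z} \perp V$). Define
| $\mathbf{x} = \boldsymbol{\mu} + \sqrt{\nu / V}\,\mathbf{z}$ | (1) |
then $\mathbf{x}$ is said to follow a multivariate t distribution, denoted by $\mathbf{x} \sim t_d(\boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu)$, where $\boldsymbol{\mu}$ is the location parameter vector, $\boldsymbol{\Sigma}$ is a positive-definite scale matrix and ν is the degrees of freedom. When $\nu = 1$, the distribution reduces to the multivariate Cauchy distribution. As $\nu \to \infty$, the distribution approaches the multivariate normal distribution. Hence, the parameter ν may be viewed as a robustness tuning parameter (see [22], p. 5332). It can be fixed in advance or inferred from the observed data. An equivalent expression of (1) is
| $\mathbf{x} = \boldsymbol{\mu} + U^{-1/2}\,\mathbf{z}$ | (2) |
where $U = V/\nu \sim \text{Gamma}(\nu/2, \nu/2)$ (the density of the gamma distribution with shape $\alpha$ and rate $\beta$ is denoted by $\text{Gamma}(u \mid \alpha, \beta)$ with $\alpha > 0$ and $\beta > 0$), and $\mathbf{z} \perp U$. We call (1) or (2) the SR of the classical MVT or Type I MVT random vector.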
From the SR (1) or (2), we have observed that the Type I MVT distribution has the following particular disadvantages:
All components follow univariate t distributions with the same degrees of freedom ν and hence the same amount of tail weight (see [13], p. 163).
The Type I MVT random vector with a finite ν includes neither a component following the univariate normal distribution nor a sub-vector following the r-dimensional normal distribution, where and .
The Type I MVT random vector can never contain statistically independent components since all components share a common random variable V or U, even when is diagonal.
These drawbacks limit its applicability. To overcome the first drawback, within the framework of copulas, a class of meta-elliptical distributions, including a special member called the asymmetric multivariate t (AMVT) distribution whose marginals are univariate t distributions with different degrees of freedom, was proposed in [7]. It was further pointed out there that the AMVT distribution and the Type I MVT distribution have the same copula (i.e. the same correlation structure). Hence, the AMVT distribution still cannot overcome the second and third drawbacks. Moreover, no statistical inference methods were developed for this particular distribution. Then, a new bivariate t distribution with marginals having different degrees of freedom was proposed in [13], by defining
| (3) |
where are assumed to be mutually independent, , and with . Two possible multivariate extensions of (3) were also mentioned in [13]:
| (4) |
where are assumed to be mutually independent, , and for with ; and
| (5) |
where the and are the same as defined above and with for . We have three comments on (3)–(5). First, all components in (3)–(5) follow univariate t distributions with possibly different degrees of freedom, i.e. the first drawback of the Type I MVT distribution can be overcome; however, the second and third drawbacks remain. Second, only the distributional properties for the bivariate case were studied in [13], and no statistical inference methods were provided. Third, for multivariate cases, it was remarked in [13] that ‘It appears, however, that only in the bivariate case are the joint density function and hence conditional distributions fully tractable. It is this that prompted the publication of the current special case, along with any independent interest, this special case may possess’. An alternative t-distribution, obtained by replacing the common gamma divisor in (2) with p i.i.d. gamma divisors, was then proposed in [9]; the possibility of allowing these independent gamma divisors to have different degrees of freedom was only mentioned there, without any detailed discussion. By inserting multidimensional weights into the Gaussian scale mixture, a generalized multivariate t-distribution was obtained in [10]. However, the resulting marginal is a linear combination of univariate t-distributions, so the degrees of freedom of the marginal distributions have no intuitive interpretation. For the two data sets presented in Section 6 below, there is a dependency structure among the components, while the components do not share the same tail weight. The classical MVT distribution is then no longer appropriate, and a new tool is needed to overcome the limitations of existing models and to allow broader application.
In this paper, we propose a new kind of MVT distribution that allows different degrees of freedom for each univariate component, called the Type II MVT distribution. The proposed distribution has several notable features: (a) all components follow univariate t-distributions with possibly different degrees of freedom; (b) it can include components following the multivariate normal distribution when the corresponding ; (c) it can contain statistically independent components, so that the product of independent t distributions is a special case. Compared with the classical MVT distribution, this new structure can better capture the characteristics of the data. We first derive important distributional properties and then develop useful statistical inference methods.
The rest of the paper is organized as follows. In Section 2, the Type II MVT distribution is outlined and some distributional properties are explored. In Section 3, the maximum likelihood estimation of the parameters via the Monte Carlo expectation/conditional maximization (ECM) algorithm and test of independence are developed, and then an extended regression model is introduced and investigated. Bayesian methods are presented in Section 4. In Section 5, some simulation studies are performed to evaluate the proposed methods. Two real data sets are used to compare the proposed distribution with the classical MVT distribution in Section 6. Finally, a discussion is given in Section 7.
2. Type II multivariate t distribution
We define a new MVT distribution through the SR instead of the joint probability density function (pdf). A random vector is said to follow the Type II MVT distribution, denoted by , if can be stochastically represented as
| (6) |
where , , , for , , , and . The vector could be fixed in advance or it can be estimated from the observed data. Note that if , the distribution of in (6) becomes the alternative multivariate t distribution proposed by [9].
Compared with the Type I MVT distribution, the Type II MVT distribution possesses the following three significant characteristics:
From (6), it is easy to obtain , indicating that all components follow univariate t-distributions with possibly different degrees of freedom; i.e. for .
The Type II MVT random vector can include components following a multivariate normal distribution if the corresponding .
The Type II MVT random vector can contain statistically independent components. From (7) and (8) below, in the Type II MVT distribution, any two components ( ) are independent provided that are independent, whereas in the Type I MVT distribution, any two components are always dependent since they share a common factor U. In particular, when is diagonal, the Type II MVT density becomes the product of d independent univariate t densities.
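The contrast between the two constructions can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the Type II SR takes the component-wise form "location plus normal component divided by the square root of its own independent Gamma(ν_i/2, ν_i/2) divisor", as suggested by the description above, and it treats an infinite ν_i as a divisor fixed at 1 (a normal component). The names `cholesky`, `gamma_divisor` and `sample_mvt` are hypothetical helpers introduced here.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive-definite matrix."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gamma_divisor(rng, nu):
    """Draw U ~ Gamma(shape=nu/2, rate=nu/2); an infinite nu gives U = 1."""
    if math.isinf(nu):
        return 1.0
    # random.gammavariate takes (shape, scale), so scale = 2 / nu
    return rng.gammavariate(nu / 2.0, 2.0 / nu)

def sample_mvt(mu, sigma, nu, n, seed=0):
    """Draw n vectors. A scalar nu gives Type I (one shared divisor);
    a list of nu_i gives the assumed Type II form (independent divisors)."""
    rng = random.Random(seed)
    L = cholesky(sigma)
    d = len(mu)
    type2 = isinstance(nu, (list, tuple))
    draws = []
    for _ in range(n):
        e = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(d)]  # z ~ N(0, sigma)
        if type2:
            u = [gamma_divisor(rng, nu_i) for nu_i in nu]
        else:
            u = [gamma_divisor(rng, nu)] * d
        draws.append([mu[i] + z[i] / math.sqrt(u[i]) for i in range(d)])
    return draws

# Heavy-tailed first component (nu_1 = 3), normal second component
sample = sample_mvt([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [3.0, float("inf")], n=5)
```

Under this assumed form, a diagonal scale matrix makes the components of a Type II draw independent, while a Type I draw keeps the shared divisor and therefore stays dependent.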
Remark 2.1
The dependency structure of the Type II MVT distribution induced by the SR (6) is totally determined by the variance–covariance matrix of . Note that the Gamma random variables are independent, so all are free parameters and the estimation of each is separate and straightforward. In contrast, the dependency structure of the two distributions induced by the SRs (4) and (5), respectively, is due solely to the common random variable , since the correlation from the multivariate normal distribution is not incorporated. Under the construction of (4) or (5), the estimation of the 's is more complicated since an ascending order constraint or for is involved.
2.1. Density function of Type II MVT distribution
To derive the joint pdf of the Type II MVT distribution, we first introduce the multivariate truncated normal distribution. A d-dimensional random vector is said to follow the multivariate normal distribution truncated in the box , denoted by with and , if its joint pdf is (see [12], p. 210)
where is the realization of .
Now, we can rewrite (6) as a mixture of
| (7) |
where is the realization of and is the realization of . The joint pdf of is obtained as (for detailed derivation, see Appendix):
| (8) |
where
| (9) |
and with
| (10) |
To have an insight into the function , we first define the mixed moments of a random vector. Let a random vector have the joint pdf . Given , where are non-negative constants, then
is called the mixed moment of in power of with respect to . Note that C is the normalizing constant of , so given by (9) is the mixed moment of in power of with respect to . It is clear from (8) that this distribution is not of the elliptical form (see [7]).
Let , where consists of the first q components and consists of the last components. For the conditional distribution of given , the derivation of is not easy since the marginal density of depends on a multiple integral over hypercubes. For visualization, we present the conditional distribution curves for the bivariate case. The parameters are set as
Case 1: , , . The curves of are plotted for , respectively.
Case 2: , , . The curves of are plotted for , respectively.
From Figure 1, the curves of have similar shapes as the value of varies under the same parameter configuration.
Figure 1.
The conditional distribution curves for in Type II MVT distribution. (a) , , ; (b) , , .
2.2. Moments and correlation
The mean vector and variance–covariance matrix of are, respectively, given by
where
for . Thus, the correlation coefficient between and is given by
| (11) |
From (11), we can see that the sign of is determined by and can be arbitrary (positive, zero or negative).
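To make the role of the degrees of freedom in the correlation concrete, the following sketch evaluates the correlation implied by the component-wise SR assumed for the Type II construction (each component divided by the square root of its own independent Gamma(ν_i/2, ν_i/2) variable). Under that assumption, the covariance is the scale covariance times the product of the two means of the inverse square-root divisors, and each variance is the scale variance times ν_i/(ν_i − 2). The function names are hypothetical, and the formula is a derivation under the stated assumption rather than a transcription of (11).

```python
import math

def inv_sqrt_gamma_mean(nu):
    """E[U^(-1/2)] for U ~ Gamma(shape=nu/2, rate=nu/2); equals 1 when nu is infinite."""
    if math.isinf(nu):
        return 1.0
    return math.sqrt(nu / 2.0) * math.exp(
        math.lgamma((nu - 1.0) / 2.0) - math.lgamma(nu / 2.0)
    )

def type2_correlation(sigma_ii, sigma_jj, sigma_ij, nu_i, nu_j):
    """Correlation of two components under the assumed SR; needs nu_i, nu_j > 2."""
    def var_factor(nu):
        return 1.0 if math.isinf(nu) else nu / (nu - 2.0)
    cov = sigma_ij * inv_sqrt_gamma_mean(nu_i) * inv_sqrt_gamma_mean(nu_j)
    sd_i = math.sqrt(sigma_ii * var_factor(nu_i))
    sd_j = math.sqrt(sigma_jj * var_factor(nu_j))
    return cov / (sd_i * sd_j)

# Illustrative inputs (not taken from the paper's tables)
print(type2_correlation(1.0, 1.0, 0.5, 5.0, 5.0))
print(type2_correlation(1.0, 1.0, -0.5, 4.0, 20.0))
```

The sign of the result follows the sign of the off-diagonal scale element, and the magnitude is always smaller than the correlation of the underlying normal vector unless both degrees of freedom are infinite.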
2.3. Comparison of the densities
We compare the pdfs of the Type I MVT distribution and the Type II MVT distribution . For the same and , if one sets where , from (2) and (6) we can see that the marginal density of each is identical under the two distributions, following . However, the shapes of the two joint density functions may be quite different. Moreover, the Type II MVT distribution is more flexible since it can contain the case of .
For more comparisons, we consider d = 2 and set the parameter configurations as and for both distributions, with ν being the degrees of freedom in Type I MVT, being the degrees of freedom in Type II MVT, and . To illustrate the differences, we take 30 equally spaced values of within and 30 equally spaced values of within . The maximum and minimum of the Type I and Type II MVT densities over these grid points are computed and displayed in Table 1. The correlation coefficients between the two components, denoted by ρ, are also compared for the two distributions. In addition, the shapes of the corresponding densities are shown as contour plots and three-dimensional perspective plots in Figures 2 and 3. It is observed that when the two densities have the same marginal tail weights, the Type II density surface is a bit flatter than the Type I surface, and the correlation coefficient between components in Type II MVT is also weaker than that in Type I MVT, which is not surprising as the dependence in Type II MVT relies only on the dependence from the multivariate normal vector.
Table 1.
Maximum and minimum values of Type I and II MVT densities for .
| Maximum | 0.156351 | 0.133184 | 0.111747 | 0.106234 |
| Minimum | 0.001228 | 0.000528 | ||
| ρ | 0 | 0 | 0.816497 | 0.767150 |
Note: : Density function of Type I MVT distribution; : Density function of Type II MVT distribution, c.f. (8).
Figure 2.
The contour plots and 3-D perspectives of Type I and Type II MVT density curves given , . (a1)–(a2) Type I MVT with ; (b1)–(b2) Type II MVT with .
Figure 3.
The contour plots and 3-D perspectives of Type I and Type II MVT density curves given , . (a1)–(a2) Type I MVT with ; (b1)–(b2) Type II MVT with .
Moreover, we could have different 's in Type II MVT, thus we choose several combinations of parameters with . Similarly, the maximum and minimum of densities are presented in Table 2 and their various shapes are displayed in Figure 4.
Table 2.
Maximum and minimum values of Type II MVT density function for .
| Maximum | 0.137650 | 0.103573 |
| Minimum | 0.000397 | |
| ρ | 0 | 0.713626 |
Figure 4.
The contour plots and 3-D perspectives of Type II MVT density curves. (a1)–(a2) , , ; (b1)–(b2) , , .
3. Estimation of parameters and test of independence
3.1. MLEs of via the Monte Carlo ECM algorithm
Since the value of is usually unknown in practice, we need to estimate together with . Let and the observed data be , where denotes the realization of for . The mixture expression of the Type II MVT random vector specified by (7) motivates us to employ the ECM algorithm to obtain the MLEs of the parameters. For each ( ), based on (7), we introduce the corresponding latent vector , where for . The missing data are denoted by , where is the realization of , and the complete data are . The complete-data likelihood function is given by
where . Let and is its realization with and . Then the log-likelihood function becomes
The CM-step is to calculate the complete-data conditional MLEs of as given by
| (12) |
And the complete-data MLE of is the solution to the equation
| (13) |
where is the digamma function. The E-step is to replace the terms involving 's in (12)–(13), i.e. and for and , by their conditional expectations. To this end, we first need to derive the conditional distribution of , which is given by
| (14) |
where , and is defined by (10). Since the normalizing constant in (14) is not available in closed form, we will perform the Monte Carlo implementation in the E-step (see [2,15,28]).
Given (14), the Monte Carlo method for calculating the conditional expectations is summarized as follows.
- Step 1:
- Step 2: Approximately calculate the conditional expectations as
| (16) |
| (17) |
for .
By combining (12)–(13) with (16)–(17), we could obtain the MLEs of .
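As a rough illustration of a Monte Carlo E-step, the sketch below approximates conditional expectations of the latent divisors by self-normalized importance sampling. This is a swapped-in technique chosen for brevity, not necessarily the sampling scheme of Steps 1 and 2. It assumes the component-wise SR with independent Gamma(ν_i/2, ν_i/2) divisors, uses those gamma distributions as proposals, and weights each draw by the conditional normal likelihood of the observation. The function names and the choice of returned quantities (conditional means of U_i and of log U_i) are illustrative.

```python
import math
import random

def quad_form(z, sigma_inv):
    """z^T sigma_inv z for plain Python lists."""
    d = len(z)
    return sum(z[i] * sigma_inv[i][j] * z[j] for i in range(d) for j in range(d))

def mc_e_step(x, mu, sigma_inv, nu, G=5000, seed=0):
    """Importance-sampling estimates of E[U_i | x] and E[log U_i | x]."""
    rng = random.Random(seed)
    d = len(x)
    r = [x[i] - mu[i] for i in range(d)]
    us, log_w = [], []
    for _ in range(G):
        # Proposal: the assumed prior U_i ~ Gamma(shape=nu_i/2, rate=nu_i/2)
        u = [rng.gammavariate(nu[i] / 2.0, 2.0 / nu[i]) for i in range(d)]
        z = [math.sqrt(u[i]) * r[i] for i in range(d)]
        # log f(x | u) up to a constant: Jacobian prod sqrt(u_i) times the normal kernel
        lw = 0.5 * sum(math.log(ui) for ui in u) - 0.5 * quad_form(z, sigma_inv)
        us.append(u)
        log_w.append(lw)
    m = max(log_w)                       # subtract the maximum for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    e_u = [sum(wg * ug[i] for wg, ug in zip(w, us)) / total for i in range(d)]
    e_log_u = [sum(wg * math.log(ug[i]) for wg, ug in zip(w, us)) / total for i in range(d)]
    return e_u, e_log_u

e_u, e_log_u = mc_e_step([1.2, -0.4], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [4.0, 8.0], G=2000)
```

In one dimension the conditional distribution of the divisor is itself a gamma distribution, which gives a convenient closed-form check of the Monte Carlo output.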
Regarding the convergence of the Monte Carlo ECM algorithm, the specification of G is very important. A large value of G moves the approximation closer to the true maximizer, but it is time-consuming. Therefore, it is inefficient to start with a large value of G when the current approximation to the maximizer is far from the true value. Instead, it is recommended to monitor the convergence of the algorithm until the process has almost stabilized, then to terminate the algorithm and use the stabilized point as the new initial value to continue with a large value of G. This further decreases the Monte Carlo variability and yields a maximizer closer to the true value. In addition, let be the t-th approximation of ; the stopping rule of the Monte Carlo ECM algorithm is then set as , where δ is a predetermined precision.
3.2. Testing hypothesis of independence
Suppose that we want to test the following hypotheses
Under , the likelihood ratio test (LRT) statistic is given by
| (18) |
where are the constrained MLEs of under while are the unconstrained MLEs of . Under , Type II MVT distribution reduces to the product of d independent univariate t distributions, thus the constrained MLEs are easily obtained. The test statistic T asymptotically follows a chi-squared distribution with degrees of freedom under , and the corresponding is given by
| (19) |
For a given significance level α, the null hypothesis should be rejected if .
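The decision rule can be sketched as follows. The code assumes the usual form of the LRT statistic (twice the gap between the unconstrained and constrained maximized log-likelihoods) and assumes that the chi-squared degrees of freedom equal d(d − 1)/2, the number of off-diagonal scale parameters set to zero under independence; both are labeled assumptions because the corresponding expressions are not reproduced here. The helper `chi2_sf` uses the standard recurrence for the regularized upper incomplete gamma function so that only the standard library is needed.

```python
import math

def chi2_sf(t, df):
    """P(chi-squared with integer df > t), via the incomplete-gamma recurrence."""
    if t <= 0:
        return 1.0
    half = t / 2.0
    if df % 2 == 0:
        k, sf = 2, math.exp(-half)             # df = 2: exp(-t/2)
    else:
        k, sf = 1, math.erfc(math.sqrt(half))  # df = 1: erfc(sqrt(t/2))
    while k < df:
        # Q(k/2 + 1, t/2) = Q(k/2, t/2) + (t/2)^(k/2) exp(-t/2) / Gamma(k/2 + 1)
        sf += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return sf

def independence_lrt(loglik_h0, loglik_h1, d, alpha=0.05):
    """Return (T, df, p-value, reject) for the assumed independence test."""
    t = 2.0 * (loglik_h1 - loglik_h0)
    df = d * (d - 1) // 2   # assumption: number of off-diagonal constraints under H0
    p = chi2_sf(t, df)
    return t, df, p, p < alpha

# Hypothetical log-likelihood values, for illustration only
print(independence_lrt(-100.0, -95.0, d=2))
```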
3.3. Type II multivariate t regression model
In existing multivariate linear regression models, the error terms are often assumed to follow a multivariate normal or t distribution, where the latter can improve the robustness of the normal model for data with outliers. Since the proposed Type II MVT distribution is much more flexible than Type I MVT distribution, we adopt it as the joint distribution of the error terms. Then Type II MVT regression model is formulated as
| (20) |
where is an vector of response, , , is a positive-definite matrix, is the vector of degrees of freedom, is the known covariate vector for the subject j and is the vector of regression coefficients. Note that can be expressed in matrix form as
where . Now we rewrite (20) as
| (21) |
and the objective is to estimate , and .
Let be the realization of , and the observed data be denoted by . As in Section 3.1, we introduce latent variables for and such that
where the notations , and are the same as those denoted in Section 3.1.
The Monte Carlo ECM algorithm is also employed to give the MLEs of parameters. The first two CM-steps are to calculate the conditional MLEs
| (22) |
and the third CM-step is to calculate , which is the solution to the same equation (13) for . The E-step is to replace and for by their conditional expectations, which can be calculated similarly to (15)–(17) based on the following conditional distribution
| (23) |
where and .
It is quite difficult to obtain the standard deviations of MLEs and Wald confidence intervals of parameters due to the complexity of the observed-data likelihood function and the observed information matrix. Fortunately, the posterior samples of these parameters are relatively easy to generate by using the Gibbs sampling as shown in the next section. Based on the posterior samples, we can provide the posterior means, the posterior standard deviations and Bayesian credible intervals.
4. Bayesian methods
In this section, we discuss Bayesian methods for the Type II MVT distribution. The prior distribution of the parameters for the classical MVT distribution has been discussed in [17], in which , and ν are assumed independent. A locally uniform prior is assigned to , and an inverse Wishart prior is assigned to as
| (24) |
where m is a scalar and is a non-negative definite matrix. If m = d and (i.e. zero matrix), is the non-informative prior; if m = −1 and , is the flat prior. Priors for the hyperparameter ν, the degrees of freedom, have been suggested in the literature (see [17,25]). One choice is a flat prior distribution for , i.e.
| (25) |
For the proposed Type II MVT distribution , similarly we can apply (24) as the priors for , and extend the prior for the univariate hyperparameter in (25) to the multivariate case as
| (26) |
The posterior distribution of the complete-data is
where
Firstly, the conditional predictive distribution of missing data is derived as
where
| (27) |
To generate samples of , we combine the Gibbs sampling method with the acceptance–rejection (AR) algorithm (see [27]). We propose that the envelope density and the envelope constant are given by
| (28) |
respectively, where
and denotes the density of the uniform distribution . Since and are positive definite, for any non-zero vector we always have . Thus, from (27), it follows that
since the function attains its maximum at when . For , we have . Overall, the envelope constant c specified by (28) is always greater than or equal to 1, indicating that the function in (27) is minorized by .
Thus, the procedures to generate posterior samples by AR algorithm are given as:
- Step 1: For each , draw from through the AR algorithm with the following two steps:
  - Draw , and independently draw for and for . Set .
- Step 2: Draw from via the Gibbs sampling method with the following loop:
where
and generate from the posterior distribution of given via the grid method, with the density for each given by
The above methods could be extended to the regression model specified in (20) or (21) by assigning priors for .
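The acceptance–rejection idea can be illustrated with a one-dimensional toy version. This is not the envelope of (28): it assumes a single latent divisor with a Gamma(ν/2, ν/2) prior, uses that prior as the envelope, and accepts a candidate with probability proportional to the factor sqrt(u)·exp(−δ²u/2), which is bounded and attains its maximum at u = 1/δ². Here δ stands for a standardized residual, and the function name `ar_sample_u` is a hypothetical helper introduced for this sketch.

```python
import math
import random

def ar_sample_u(nu, delta, rng):
    """One draw from f(u) proportional to g(u) * w(u), where g is the
    Gamma(shape=nu/2, rate=nu/2) density and w(u) = sqrt(u) * exp(-delta^2 * u / 2).
    Requires delta != 0."""
    w_max = math.exp(-0.5) / abs(delta)   # maximum of w, attained at u = 1 / delta^2
    while True:
        u = rng.gammavariate(nu / 2.0, 2.0 / nu)   # candidate from the envelope g
        w = math.sqrt(u) * math.exp(-0.5 * delta * delta * u)
        if rng.random() * w_max <= w:              # accept with probability w / w_max
            return u

rng = random.Random(0)
draws = [ar_sample_u(5.0, 1.0, rng) for _ in range(1000)]
print(sum(draws) / len(draws))
```

In this toy setting the target is a gamma distribution with shape (ν + 1)/2 and rate (ν + δ²)/2, so the sample mean can be compared with (ν + 1)/(ν + δ²).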
5. Simulation studies
The components in the Type I MVT distribution must be dependent and share the same degrees of freedom. In contrast, the proposed Type II MVT distribution can accommodate different tail weights on the univariate t marginals and even approximately normal margins. Thus, we conduct some simulations to compare the fitting performance of the two distributions when data are generated from one of them, and to highlight the characteristics of the Type II MVT distribution.
We consider bivariate and trivariate cases, i.e. d = 2, 3. The sample size is set as n = 100, the mean vector is always chosen to be zero, and the scale matrix is set with diagonal elements for all i, while the off-diagonal elements for will be specified in each case. Under different configurations of parameters , samples are generated from either or , as indicated in the first line of each experiment in the following two tables, and the results are obtained by fitting both distributions to the samples. In the case of d = 2, we have Experiments 1–4; the estimation results, including the average MLE and the corresponding mean squared error (MSE) of the estimates based on L = 1000 replications, are presented in Table 3. The Akaike information criterion (AIC) and the Bayesian information criterion (BIC), averaged over the replications for each fitted model, are used to reflect the fitting performance.
Table 3.
Comparison of estimation performance with d = 2.
| Experiment 1: , , | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Type I | ν | AIC | BIC | ||||||
| MLE | 0.0009 | 0.0024 | 0.9794 | 0.4919 | 0.9955 | 2.9793 | 682.7873 | 698.4183 | |
| MSE | 0.0142 | 0.0127 | 0.0401 | 0.0194 | 0.0394 | 0.3975 | – | – | |
| Type II | AIC | BIC | |||||||
| MLE | 0.0011 | 0.0026 | 1.0712 | 0.6064 | 1.0978 | 3.4333 | 3.4489 | 694.9977 | 713.2339 |
| MSE | 0.0158 | 0.0145 | 0.0681 | 0.0480 | 0.0785 | – | – | – | – |
| Experiment 2: , , | |||||||||
| Type I | ν | AIC | BIC | ||||||
| MLE | −0.0024 | −0.0035 | 1.1145 | 0.4475 | 1.1175 | 3.2953 | 705.6417 | 721.2727 | |
| MSE | 0.0140 | 0.0148 | 0.0734 | 0.0168 | 0.0802 | – | – | – | |
| Type II | AIC | BIC | |||||||
| MLE | −0.0006 | −0.0014 | 0.9967 | 0.4965 | 1.0077 | 3.1612 | 3.2248 | 697.3014 | 715.5376 |
| MSE | 0.0127 | 0.0134 | 0.0553 | 0.0165 | 0.0567 | 0.8421 | 0.8900 | – | – |
| Experiment 3: , , | |||||||||
| Type I | ν | AIC | BIC | ||||||
| MLE | −0.0025 | −0.0050 | 1.2948 | 0.4552 | 0.8495 | 4.8398 | 653.4693 | 669.1004 | |
| MSE | 0.0166 | 0.0128 | 0.1672 | 0.0164 | 0.0475 | – | – | – | |
| Type II | AIC | BIC | |||||||
| MLE | −0.0024 | −0.0047 | 0.9957 | 0.5007 | 0.9884 | 3.0641 | 10.0156 | 646.4751 | 664.7113 |
| MSE | 0.0152 | 0.0121 | 0.0551 | 0.0159 | 0.0266 | 0.7348 | 5.0330 | – | – |
| Experiment 4: , , | |||||||||
| Type I | ν | AIC | BIC | ||||||
| MLE | −0.0030 | 0.0005 | 1.0831 | 0.0013 | 1.0807 | 5.5423 | 666.9350 | 682.5660 | |
| MSE | 0.0127 | 0.0133 | 0.0470 | 0.0097 | 0.0533 | – | – | – | |
| Type II | AIC | BIC | |||||||
| MLE | −0.0029 | 0.0005 | 0.9560 | 0.0006 | 0.9533 | 4.6581 | 4.6488 | 663.7192 | 681.9554 |
| MSE | 0.0123 | 0.0126 | 0.0372 | 0.0096 | 0.0417 | 1.4100 | 1.4080 | – | – |
Note: MLE is the average of 1000 point estimates with precision via the Monte Carlo ECM algorithm; Mean squared error (MSE) of the estimate is equal to the sum of the variance and the squared bias of the estimator.
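The decomposition mentioned in the table note can be checked numerically. The sketch below is a minimal illustration with made-up replicate estimates; `mse_decomposition` is a hypothetical helper, and the variance uses the divisor L (not L − 1) so that the identity "MSE = variance + squared bias" holds exactly over the replications.

```python
def mse_decomposition(estimates, true_value):
    """Return (average estimate, bias, variance, MSE) over the replications."""
    L = len(estimates)
    mean_est = sum(estimates) / L
    bias = mean_est - true_value
    variance = sum((e - mean_est) ** 2 for e in estimates) / L   # divisor L
    mse = sum((e - true_value) ** 2 for e in estimates) / L
    return mean_est, bias, variance, mse

# Made-up replicate estimates of a parameter whose true value is 1.0
mean_est, bias, variance, mse = mse_decomposition([0.9, 1.1, 1.2, 0.8, 1.05], 1.0)
print(mean_est, bias, variance, mse)
```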
In the case of d = 3, we conduct Experiments 5–6, and the estimation accuracies of the parameters are compared by MLE and MSE as summarized in Table 4. For a more direct view, box plots are used to present the estimation results for each experiment in Figures 5–7.
Figure 6.
(a1–a3) Experiment 3: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions; (b1–b3) Experiment 4: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions.
Table 4.
Comparison of estimation performance with d = 3.
| Experiment 5: , , | ||||||
|---|---|---|---|---|---|---|
| Type I | ||||||
| MLE | 0.0233 | 0.0020 | 0.0250 | 0.9997 | −0.4970 | 0.5032 |
| MSE | 0.0102 | 0.0118 | 0.0101 | 0.0230 | 0.0118 | 0.0122 |
| Type I | ν | |||||
| MLE | 0.9996 | −0.4974 | 1.0042 | 3.0911 | ||
| MSE | 0.0261 | 0.0118 | 0.0280 | 0.3512 | ||
| Type II | ||||||
| MLE | 0.0213 | 0.0016 | 0.0218 | 0.9147 | −0.5093 | 0.5146 |
| MSE | 0.0070 | 0.0080 | 0.0077 | 0.0398 | 0.0191 | 0.0191 |
| Type II | ||||||
| MLE | 0.9116 | −0.5081 | 0.9157 | 2.9983 | 2.9775 | 2.9868 |
| MSE | 0.0434 | 0.0182 | 0.0446 | – | – | – |
| Experiment 6: , , | ||||||
| Type I | ||||||
| MLE | 0.0356 | 0.0075 | 0.0330 | 1.2886 | −0.5332 | 0.5327 |
| MSE | 0.0155 | 0.0177 | 0.0157 | 0.1558 | 0.0181 | 0.0175 |
| Type I | ν | |||||
| MLE | 1.2996 | −0.5349 | 1.2854 | 4.0222 | ||
| MSE | 0.1648 | 0.0185 | 0.1547 | – | ||
| Type II | ||||||
| MLE | 0.0230 | 0.0045 | 0.0206 | 0.9020 | −0.4621 | 0.4629 |
| MSE | 0.0074 | 0.0089 | 0.0082 | 0.0405 | 0.0132 | 0.0125 |
| Type II | ||||||
| MLE | 0.9039 | −0.4646 | 0.8966 | 2.9699 | 2.9454 | 2.9617 |
| MSE | 0.0433 | 0.0139 | 0.0445 | 0.3139 | 0.3137 | 0.3377 |
Note: MLE is the average of 1000 point estimates with precision via the Monte Carlo ECM algorithm; Mean squared error (MSE) of the estimate is equal to the sum of the variance and the squared bias of the estimator.
Figure 5.
(a1–a3) Experiment 1: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions; (b1–b3) Experiment 2: Box plots for MLEs of the parameters by bivariate Type I and Type II MVT distributions.
Figure 7.
(a1–a3) Experiment 5: Box plots for MLEs of the parameters by trivariate Type I and Type II MVT distributions; (b1–b3) Experiment 6: Box plots for MLEs of the parameters by trivariate Type I and Type II MVT distributions.
From Tables 3 and 4, it is observed that the performance of the Monte Carlo ECM algorithm in parameter estimation of the Type II MVT distribution is quite satisfactory in the sense that all the average MLEs are close to their true values. When data are indeed generated from the Type I MVT distribution as in Experiments 1 and 5, the point estimates under Type I MVT are closer to their true values and are more stable, with smaller MSEs for most of the parameters, and Type I MVT outperforms Type II MVT in model fitting in terms of AIC and BIC values. Nevertheless, the estimation by Type II MVT still gives an acceptable result: the estimates of are not too far away from the true values and the estimates of the 's are close to each other. Moreover, when samples are generated from the Type II MVT distribution with identical 's as in Experiments 2, 4 and 6, the estimation performance of the Type II MVT distribution is better, as the estimates are more accurate and have smaller MSEs; under Type I MVT, Figures 5–7 show that the estimates of the parameters, especially and ν, are more likely to produce outliers. If the true values of and are distinct as in Experiment 3, the Type II MVT distribution clearly outperforms the Type I MVT distribution, which has only a single degrees-of-freedom parameter, and it better depicts the different tail weights of the marginal components.
6. Applications
To ease the numerical work and graphical presentation, we focus on the bivariate case to provide the numerical illustration of the presented methodologies.
6.1. Sport data
We make use of a data set collected by the Australian Institute of Sport and reported in [3], containing several variables measured on n = 202 Australian athletes. Specifically, we consider the pair of variables BMI (body mass index) and LBM (lean body mass). Table 5 summarizes the parameter estimates from the Type I bivariate t distribution with unknown ν and the Type II bivariate t distribution with unknown .
Table 5.
MLEs of parameters for fitting (BMI, LBM).
| MLEs of parameters | Criterion | ||||||
|---|---|---|---|---|---|---|---|
| Dist. | ν | log-L | AIC | BIC | |||
| Type I | 11.0467 | – | −1228.471 | 2468.942 | 2488.792 | ||
| Type II | – | −1223.653 | 2461.307 | 2484.464 | |||
To test the independence between BMI and LBM, from (18) and (19), the value of the test statistic T is given as and the corresponding p-value , indicating that the null hypothesis should be rejected at 0.05 significance level.
Figure 8 displays the scatter plot and the superimposed contours of the fitted Type II bivariate t distribution for the (BMI, LBM) pair. When comparing the two models, the proposed Type II distribution is selected by both AIC and BIC, and it also has a larger log-likelihood value. Besides, the estimated correlation coefficient from (11) is , which is very close to the sample index . Kendall's tau and Spearman's rho, which measure the rank correlation between BMI and LBM, are and , respectively. All of these statistics reveal an obvious positive correlation between the two variables, a feature that is well captured by the proposed Type II MVT distribution.
Figure 8.
Scatter plot (BMI, LBM) and fitted Type II bivariate t distribution.
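The model-selection criteria in Table 5 can be recomputed from the reported log-likelihoods with the usual definitions AIC = −2 log L + 2k and BIC = −2 log L + k log n. The parameter counts used below (k = 6 for Type I and k = 7 for Type II in the bivariate case, i.e. two location parameters, three scale parameters and one or two degrees of freedom) are an assumption made for this sketch; they are consistent with the reported AIC and BIC values up to rounding.

```python
import math

def aic(loglik, k):
    """Akaike information criterion."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian information criterion."""
    return -2.0 * loglik + k * math.log(n)

n = 202  # number of athletes
# (log-likelihood from Table 5, assumed number of free parameters)
fits = {"Type I": (-1228.471, 6), "Type II": (-1223.653, 7)}
for name, (loglik, k) in fits.items():
    print(name, round(aic(loglik, k), 3), round(bic(loglik, k, n), 3))
```

Although the Type II model carries one more parameter, its gain in log-likelihood outweighs the penalty under both criteria.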
6.2. Tuberculosis vaccine data
The data consist of 13 trials on the efficacy of the Bacillus Calmette–Guérin (BCG) vaccine against tuberculosis for a vaccinated group and a non-vaccinated control group, as presented in Table 6. Let VD, VND denote the number of disease cases and non-disease cases in the vaccinated group, and NVD, NVND denote the number of disease cases and non-disease cases in the non-vaccinated group, respectively. Some covariates are available: the geographic latitude of the place where the study was done and the year of publication. Here, we will carry out a bivariate analysis on the log-odds of tuberculosis in the vaccinated and non-vaccinated control arms. Let where and , and the covariates are chosen to be , where , . The Type II bivariate t regression model is applied to analyze this data set.
Table 6.
Data from clinical trials on the efficacy of BCG vaccine in the prevention of tuberculosis (see [26]).
| Trial | Vaccinated: Disease | Vaccinated: No disease | Not vaccinated: Disease | Not vaccinated: No disease | Latitude | Year |
|---|---|---|---|---|---|---|
| 1 | 4 | 119 | 11 | 128 | 44 | 48 |
| 2 | 6 | 300 | 29 | 274 | 55 | 49 |
| 3 | 3 | 228 | 11 | 209 | 42 | 60 |
| 4 | 62 | 13536 | 248 | 12619 | 52 | 77 |
| 5 | 33 | 5036 | 47 | 5761 | 13 | 73 |
| 6 | 180 | 1361 | 372 | 1079 | 44 | 53 |
| 7 | 8 | 2537 | 10 | 619 | 19 | 73 |
| 8 | 505 | 87886 | 499 | 87892 | 13 | 80 |
| 9 | 29 | 7470 | 45 | 7232 | 27 | 68 |
| 10 | 17 | 1699 | 65 | 1600 | 42 | 61 |
| 11 | 186 | 50448 | 141 | 27197 | 18 | 74 |
| 12 | 5 | 2493 | 3 | 2338 | 33 | 69 |
| 13 | 27 | 16886 | 29 | 17825 | 33 | 76 |
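The bivariate response can be computed directly from the counts in Table 6. The sketch below assumes that the two responses are the arm-wise log-odds log(VD/VND) and log(NVD/NVND); the exact definitions were lost from the text above, so this form, the helper names `log_odds` and `pearson`, and the choice to report a Pearson correlation are all assumptions made for illustration.

```python
import math

# (VD, VND, NVD, NVND) for the 13 trials, copied from Table 6.
TRIALS = [
    (4, 119, 11, 128),
    (6, 300, 29, 274),
    (3, 228, 11, 209),
    (62, 13536, 248, 12619),
    (33, 5036, 47, 5761),
    (180, 1361, 372, 1079),
    (8, 2537, 10, 619),
    (505, 87886, 499, 87892),
    (29, 7470, 45, 7232),
    (17, 1699, 65, 1600),
    (186, 50448, 141, 27197),
    (5, 2493, 3, 2338),
    (27, 16886, 29, 17825),
]


def log_odds(cases, non_cases):
    """Log-odds of disease within one arm (assumed response definition)."""
    return math.log(cases / non_cases)


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


y_vaccinated = [log_odds(vd, vnd) for vd, vnd, _, _ in TRIALS]
y_control = [log_odds(nvd, nvnd) for _, _, nvd, nvnd in TRIALS]
r = pearson(y_vaccinated, y_control)
print(f"sample correlation of the two log-odds series: {r:.3f}")
```

None of the cell counts is zero, so no continuity correction is needed before taking logarithms.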
By adopting the bivariate t regression model specified by (21) and setting the initial values of as
we calculate the MLEs of these parameters by using the Monte Carlo ECM algorithm (22), (13) and (23), which converged to in 51 iterations with precision . These results are summarized in Table 7.
Table 7.
MLEs and posterior estimates of parameters for the tuberculosis vaccine data.
| Parameter | Coefficient | MLE | Posterior mean | Posterior std | 95% Bayesian credible interval |
|---|---|---|---|---|---|
| | Constant | −5.006 | −5.017 | 0.190 | [−5.408, −4.606] |
| | | 0.009 | 0.005 | 0.021 | [−0.031, 0.046] |
| | | −0.080 | −0.085 | 0.028 | [−0.128, −0.026] |
| | Constant | −4.137 | −4.139 | 0.243 | [−4.652, −3.685] |
| | | 0.043 | 0.035 | 0.032 | [−0.023, 0.116] |
| | | −0.078 | −0.090 | 0.041 | [−0.168, −0.003] |
| | | 0.445 | 0.494 | 0.192 | [0.235, 1.005] |
| | | 0.510 | 0.519 | 0.159 | [0.245, 0.812] |
| | | 0.649 | 0.690 | 0.220 | [0.305, 1.085] |
| | | 13.834 | 12.870 | 0.581 | [12.027, 13.923] |
| | | 3.062 | 3.234 | 0.141 | [3.018, 3.479] |
To illustrate the proposed Bayesian methods in Section 4, we assign the non-informative prior of (24) to (i.e. ) and (26) to , and generate 1000 posterior samples of by using Gibbs sampling embedded with the AR algorithm. After discarding the first half of these samples as burn-in, we calculate the posterior means, the posterior standard deviations and the 95% Bayesian credible intervals shown in the last three columns of Table 7.
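The post-processing step described above (discard the first half of the draws, then summarize the rest) can be sketched as follows. The chain here is a synthetic stand-in rather than output from the Gibbs sampler, the function name `posterior_summary` is hypothetical, and the interval is taken to be equal-tailed, which is an assumption since the type of credible interval is not specified in the text.

```python
import random
import statistics


def posterior_summary(draws, burn_in_fraction=0.5):
    """Summarize one parameter's chain after discarding the initial draws."""
    kept = draws[int(len(draws) * burn_in_fraction):]
    # 39 cut points at 2.5%, 5%, ..., 97.5%; the first and last bound the
    # equal-tailed 95% interval.
    cuts = statistics.quantiles(kept, n=40, method="inclusive")
    return {
        "mean": statistics.fmean(kept),
        "std": statistics.stdev(kept),
        "interval": (cuts[0], cuts[-1]),
    }


# Synthetic stand-in for 1000 Gibbs draws of a single parameter.
rng = random.Random(0)
chain = [rng.gauss(0.0, 1.0) for _ in range(1000)]
summary = posterior_summary(chain)
print(summary)
```

With 1000 draws and a burn-in fraction of one half, each summary is based on 500 retained draws, so the interval end points rest on roughly a dozen draws in each tail.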
Now suppose that we want to test the independence between and , i.e. to test whether is zero or not. Under the null hypothesis, we have for and i = 1, 2. That is, we fit each component by the univariate t regression model. The parameters for each univariate distribution are estimated by
From (18)–(19), the LRT statistic is given by t = 9.6778 and the corresponding . Thus, the null hypothesis should be rejected.
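As a hedged illustration of how such a likelihood ratio statistic is converted into a decision, the sketch below evaluates the upper tail of a chi-square distribution with one degree of freedom. The one-degree-of-freedom reference distribution is an assumption (it corresponds to a single restricted parameter); the reference distribution actually specified in (18)–(19) is not reproduced in this section, and the function name `chi2_sf_1df` is hypothetical.

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail probability P(X > stat) for X ~ chi-square with 1 df.

    Uses the identity P(Z**2 > s) = erfc(sqrt(s / 2)) for standard normal Z.
    """
    if stat < 0:
        raise ValueError("a likelihood ratio statistic cannot be negative")
    return math.erfc(math.sqrt(stat / 2.0))


lrt_stat = 9.6778  # LRT statistic reported in the text
p_value = chi2_sf_1df(lrt_stat)  # valid only under the 1-df assumption
reject = p_value < 0.05
print("reject independence at the 0.05 level:", reject)
```

The closed form avoids any dependency beyond the standard library; for other degrees of freedom a general chi-square survival function would be needed.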
Moreover, from (11), the estimated correlation coefficient between and is , which is acceptable compared with the sample correlation coefficient of 0.924 and indicates a positive correlation between and . Kendall's tau and Spearman's rho for the rank correlation between and are and , concurring with the above result. The difference between the estimates of and from the marginal analysis indicates distinct tail weights for and ; thus the Type II bivariate t regression model provides a more reasonable fit to this data set than the Type I model.
From the regression results, we know that the covariate is insignificant while is significant. Obtaining a final model would require further work; we use this data set only to demonstrate that the proposed model is applicable to regression analysis. Incorporating covariates into the scale structure is a potential direction for future work.
7. Discussions
To overcome three obvious disadvantages associated with the classical MVT distribution, in this paper we introduced a Type II MVT distribution as a new robust alternative to the multivariate normal distribution. The new distribution has three noteworthy characteristics: (1) all components follow univariate t-distributions with not necessarily identical degrees of freedom; (2) it is applicable to multivariate data in which some components are marginally t distributed while others are approximately normally distributed, depending on a small or large value of the corresponding ; and (3) it can contain some dependent components and some statistically independent components. The proposed distribution is therefore more flexible in model specification. In the two real data sets presented in Section 6, the variables are moderately or strongly correlated but have very different tail weights from the marginal perspective. In such cases, both the dependency structure and the differing tail weights should be taken into account. The classical MVT distribution has the conspicuous drawback of imposing the same degrees of freedom on all components. The proposed Type II MVT distribution instead accommodates both aspects, and it gave the better fit in our two real data analyses.
In general, although the joint density of the Type II MVT random vector does not have a closed-form expression, the SR (6) is very useful in the derivation of the marginal distributions, mixed moments, and Monte Carlo ECM algorithm. From the characteristic (1) in Section 2, we have shown that each component follows a univariate t-distribution. In fact, any sub-vector of , say, follows , where , is a sub-matrix consisting of the rows and columns of , and . We also applied the Type II MVT distribution to the linear regression model by adopting it as the joint distribution of the error terms, taking advantage of its flexibility to better capture the characteristics of the data when outliers exist.
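To make the role of the stochastic representation concrete, the following sketch draws from a bivariate distribution of this kind. It assumes that the SR takes the normal scale-mixture form y_i = μ_i + x_i/√τ_i, with x following a bivariate normal distribution with zero mean and scale matrix Σ, and independent τ_i following Gamma distributions with shape ν_i/2 and rate ν_i/2. The exact form of (6) is not reproduced in this section, so this form, the function name and all parameter values below are assumptions chosen for illustration only.

```python
import math
import random


def sample_type2_bivariate_t(n, mu, sigma, nu, rng):
    """Draw n points under the assumed form y_i = mu_i + x_i / sqrt(tau_i)."""
    s11, s12, s22 = sigma[0][0], sigma[0][1], sigma[1][1]
    # Cholesky factor of the 2x2 scale matrix (assumed positive definite).
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    draws = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = (l11 * z1, l21 * z1 + l22 * z2)  # correlated normal vector
        # tau_i ~ Gamma(shape nu_i/2, rate nu_i/2); gammavariate expects
        # (shape, scale), so the scale is 2 / nu_i.
        tau = [rng.gammavariate(v / 2.0, 2.0 / v) for v in nu]
        draws.append(tuple(m + xi / math.sqrt(t)
                           for m, xi, t in zip(mu, x, tau)))
    return draws


# Hypothetical parameter values: one light-tailed and one heavy-tailed margin.
rng = random.Random(1)
draws = sample_type2_bivariate_t(
    2000, mu=(0.0, 0.0), sigma=[[1.0, 0.5], [0.5, 1.0]], nu=(15.0, 3.0), rng=rng
)
print(draws[:3])
```

Under this assumed form, each component is a normal variate divided by the square root of its own Gamma variable, which is why the margins can carry different tail weights while the dependence is inherited from the normal vector.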
Further extension of (6) can be considered as follows:
| (29) |
where , , for , and are mutually independent. In particular, when in (29), it reduces to the Type I MVT distribution; and when , it reduces to the Type II MVT distribution. Note that the Type I MVT distribution (2) and the proposed Type II MVT distribution (6) are non-nested. In (29), the dependency of the random vector can come from both the multivariate normal vector and the common Gamma random variable V. Moreover, although model (29) is similar in form to the two models in (4)–(5), it simplifies parameter estimation because the constraint stated in Remark 2.1 need not be imposed. When analyzing real data, we can start with the general model (29) and turn to a reduced model based on the estimation results. Because model (29) includes the Type I and Type II MVT distributions as special cases, it is convenient for model selection.
For comparison with the model proposed in [10], we similarly decompose the scale matrix into , where is the matrix of eigenvectors of and is a diagonal matrix whose entries are the eigenvalues of ; then the vector in (6) can be reformulated as
| (30) |
where is a d-dimensional Gaussian random vector with zero mean vector and identity covariance matrix. Note that the model in (30) differs from the formulation in [10] because matrix multiplication is not commutative in general; the two are equivalent only when is a diagonal matrix.
Acknowledgments
The authors are grateful to the editor and the anonymous reviewer for their valuable comments and suggestions.
Appendix. Derivation of joint pdf .
Based on the mixture expressions in (7), given the conditional distribution of and the marginal distributions of 's, the joint pdf of is given by
By employing the transformation of for and then , it follows that
where and are defined in (10).
Funding Statement
Chi Zhang's research was supported by the National Natural Science Foundation of China (Grant No. 11801380). Guo-Liang Tian's research was fully supported by the National Natural Science Foundation of China (Grant No. 11771199). Kam Chuen Yuen's research was supported by a Seed Fund for Basic Research of the University of Hong Kong, and a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU17306220). The work of Man-Lai Tang was partially supported through grants from the Research Grants Council of the Hong Kong Special Administrative Region (UGC/FDS14/P06/17, UGC/FDS14/P02/18, and the Research Matching Grant Scheme (RMGS)) and a grant from the National Natural Science Foundation of China (11871124). The computing facilities/software were supported by SAS Viya and the Big Data Intelligence Centre at Hang Seng University of Hong Kong.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
- 1. Amos D.E. and Bulgren W.G., On the computation of a bivariate t-distribution, Math. Comp. 23 (1969), pp. 319–333.
- 2. Booth J.G. and Hobert J.P., Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm, J. R. Stat. Soc. Ser. B. Stat. Methodol. 61 (1999), pp. 265–285.
- 3. Cook R.D. and Weisberg S., An Introduction to Regression Graphics, John Wiley & Sons, New York, 2009.
- 4. Cornish E.A., The multivariate t-distribution associated with a set of normal sample deviates, Aust. J. Phys. 7 (1954), pp. 531–542.
- 5. Cornish E.A., The multivariate t-distribution associated with the general multivariate normal distribution, CSIRO Tech. Paper No. 13, CSIRO Division in Mathematics and Statistics, Adelaide, 1962.
- 6. Dunnett C.W. and Sobel M., A bivariate generalization of Student's t-distribution, with tables for certain special cases, Biometrika 41 (1954), pp. 153–169.
- 7. Fang H.B., Fang K.T. and Kotz S., The meta-elliptical distributions with given marginals, J. Multivariate Anal. 82 (2002), pp. 1–16.
- 8. Fernández C. and Steel M.F.J., Multivariate student-t regression models: Pitfalls and inference, Biometrika 86 (1999), pp. 153–167.
- 9. Finegold M. and Drton M., Robust graphical modeling of gene networks using classical and alternative t-distributions, Ann. Appl. Stat. 5 (2011), pp. 1057–1080.
- 10. Forbes F. and Wraith D., A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweight: Application to robust clustering, Stat. Comput. 24 (2014), pp. 971–984.
- 11. Hofert M., On sampling from the multivariate t distribution, R. J. 5 (2013), pp. 129–136.
- 12. Horrace W.C., Some results on the multivariate truncated normal distribution, J. Multivariate Anal. 94 (2005), pp. 209–221.
- 13. Jones M.C., A dependent bivariate t distribution with marginals on different degrees of freedom, Statist. Probab. Lett. 56 (2002), pp. 163–170.
- 14. Kotz S. and Nadarajah S., Multivariate t Distributions and Their Applications, Cambridge University Press, Cambridge, UK, 2004.
- 15. Levine R.A. and Casella G., Implementations of the Monte Carlo EM algorithm, J. Comput. Graph. Statist. 10 (2001), pp. 422–439.
- 16. Lin P.E., Some characterizations of the multivariate t distribution, J. Multivariate Anal. 2 (1972), pp. 339–344.
- 17. Liu C., Missing data imputation using the multivariate t distribution, J. Multivariate Anal. 53 (1995), pp. 139–158.
- 18. Liu C., Bayesian robust multivariate linear regression with incomplete data, J. Amer. Statist. Assoc. 91 (1996), pp. 1219–1227.
- 19. Liu C., ML estimation of the multivariate t distribution and the EM algorithm, J. Multivariate Anal. 63 (1997), pp. 296–312.
- 20. Liu C. and Rubin D.B., The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence, Biometrika 81 (1994), pp. 633–648.
- 21. Liu C. and Rubin D.B., ML estimation of the t distribution using EM and its extensions, ECM and ECME, Statist. Sinica 5 (1995), pp. 19–39.
- 22. McLachlan G.J., Bean R.W. and Jones L.B.T., Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution, Comput. Statist. Data Anal. 51 (2007), pp. 5327–5338.
- 23. Nadarajah S. and Kotz S., Mathematical properties of the multivariate t distribution, Acta. Appl. Math. 89 (2005), pp. 53–84.
- 24. Nadarajah S. and Kotz S., Estimation methods for the multivariate t distribution, Acta. Appl. Math. 102 (2008), pp. 99–118.
- 25. Relles D.A. and Rogers W.H., Statisticians are fairly robust estimators of location, J. Amer. Statist. Assoc. 72 (1977), pp. 107–111.
- 26. van Houwelingen H.C., Arends L.R. and Stijnen T., Advanced methods in meta-analysis: Multivariate approach and meta-regression, Stat. Med. 21 (2002), pp. 589–624.
- 27. von Neumann J., Various techniques used in connection with random digits, J. Res. Nat. Bur. Stand. Appl. Math. Series 3 (1951), pp. 36–38.
- 28. Wei G.C.G. and Tanner M.A., A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms, J. Amer. Statist. Assoc. 85 (1990), pp. 699–704.
- 29. Zellner A., Bayesian and non-Bayesian analysis of the regression model with multivariate student-t error terms, J. Amer. Statist. Assoc. 71 (1976), pp. 400–405.








