Abstract
Many extensions of the multivariate normal distribution to heavy-tailed distributions have been proposed in the literature, including the scale Gaussian mixture, elliptical, generalized elliptical, and transelliptical distributions. Inference for each family of distributions is well studied. However, the extensions overlap or resemble one another, and it is hard to differentiate one extension from another. For this reason, in practice, researchers simply pick one of the many extensions and apply it to their analysis. In this paper, to guide practitioners toward choosing statistical procedures based on what the data look like rather than on personal preference, we comparatively review the various extensions and their estimators. We also fully investigate the inclusion and exclusion relations among the extensions through Venn diagrams and examples. Moreover, in a numerical study, we illustrate visual differences among the extensions with bivariate plots and compare different scatter matrix estimators on microarray data.
Keywords: Elliptical distributions, generalized elliptical distributions, transelliptical distributions, non-paranormal distributions, scatter matrix, mutual relations
1. Introduction
The multivariate normal distribution (MVN) has inarguably been the most fundamental and overarching cornerstone in the theory of multivariate statistics. For more than a century, it has been shown to possess a bundle of nice theoretical properties, which facilitate the development and application of multivariate procedures in real-life problems. For example, the equality of population means can be tested through Hotelling's $T^2$ procedure [26] under the Gaussian assumption, or hidden clusters in data can be identified by model-based clustering [5]. However, it is well known that multivariate analyses built upon the MVN assumption mostly fail to show acceptable performance when the distributional assumption is violated. In modern sciences, diverse types of data exhibiting non-'normal' behaviors have poured out.
In this context, many alternatives or extensions of MVN have been discussed in order to solve complex problems arising in applications. The family of heavy-tailed distributions covers distributions whose tails decay more slowly than those of the Gaussian; for example, the multivariate t distribution. Research on heavy-tail extensions has produced a considerable number of interesting multivariate distributions. Among them, the following are predominantly used in modeling heavy-tailed data: (1) the scale Gaussian mixture distribution in image processing [22,44], (2) the elliptical distribution in covariance matrix estimation [12] and in mixture modeling [31], (3) the generalized elliptical distribution in scatter matrix estimation under missing data [19], (4) the non-paranormal distribution for estimating a graphical model [36], mutual information [47], and causality [41], and (5) the transelliptical distribution for sparse PCA [23] and separable covariance matrix estimation [42].
No doubt, there exist overviews and reviews of the extended classes of multivariate distributions (not limited to the preceding examples); Adcock and Azzalini [1] cast their attention on the family of skewed distributions, and Babic et al. [4] compared and classified the distributions according to their own criteria. As evidenced by these two review works, many extensions have been studied individually, but their mutual relations and equivalences have been largely missed. Thus, the goal of this paper is to clarify their inclusion relations and characterize their distributional properties. We summarize the inclusions by Venn diagrams and examine whether they are strict by giving concrete examples. We also briefly address scatter matrix estimation in heavy-tailed distributions and the relationships between the estimators. Furthermore, through numerical experiments, (1) we show how data from different heavy-tailed families look different and (2) we compare the scatter matrix estimators on heavy-tailed gene expression data. We hope the account given in this paper will lessen unnecessary confusion that may have arisen in this field and thus serve as a guide for researchers who are concerned about possible remedies when data deviate from normality.
The remainder of this paper is organized as follows. In Section 2, we describe the heavy-tailed distributions and explain related facts. In Section 3, we investigate the relationships between the classes of distributions and between the available estimators of a scatter matrix. In Section 4, we visualize samples from the heavy-tailed distributions and empirically compare the estimators on the gene expression data. Section 5 concludes the paper with a summary and comments.
2. Heavy-tail extensions
We briefly introduce definitions of the heavy-tailed distributions mentioned in the introduction. Their individual characteristics are explained in the subsequent subsections. In this paper, we mainly use a stochastic representation of a random vector that follows a heavy-tailed distribution. For different characterizations, such as those based on a characteristic function or a probability density function, we refer readers to [17].
Assume a p-dimensional random vector $X$ satisfies the following stochastic representation:

$$X =_d \mu + R\,AU, \qquad (1)$$

where $=_d$ means equality in distribution, $U$ is a random vector uniformly distributed on the unit sphere $\mathbb{S}^{p-1}$ in $\mathbb{R}^p$, $R$ is a scalar random variable in $\mathbb{R}$, $\mu$ is a (deterministic) vector in $\mathbb{R}^p$, and $A$ is a (deterministic) $p \times p$ matrix. The class of distributions defined as above contains the location-scale parameters ($\mu$: a location parameter and $\Sigma = AA^\top$: a scatter matrix) as in the Gaussian distribution. However, the additional parameter $R$ controls the tail behavior, which produces diverse characteristics.
By specifying the model parameters $\mu$, $R$, and $A$ or utilizing transformations on $X$, we can express different heavy-tailed distributions through (1). For instance, the multivariate normal distribution with an identity covariance matrix corresponds to the above definition (1) with

$$\mu = 0, \quad A = I_p, \quad R =_d \sqrt{\xi},$$

where $R$ is distributed independently of $U$ and $\xi$ is a random variable from the chi-square distribution with $p$ degrees of freedom. One should be aware that suitable identifiability conditions on $R$ and $A$ are needed in this characterization [23,42]. The conditions are necessary because $(R, A)$ and $(cR, A/c)$ for any positive factor c>0 result in the same representation of $X$. Here, we assume that $\Sigma$ has unit diagonal entries if needed, following [23].
Definition 2.1
Assume $X = (X_1, \ldots, X_p)^\top$ is a multivariate random vector in $\mathbb{R}^p$ that has the stochastic representation (1) with $\mu = 0$. Then, different classes of heavy-tailed distributions are characterized as follows.

(SGM) If $R =_d \sqrt{\lambda\,\xi}$ with a positive random variable λ, where $\xi \sim \chi^2_p$, then $X$ follows the scale Gaussian mixture distribution. Here, λ, $\xi$, and $U$ are (jointly) independent.

(EL) If $R$ is a non-negative random variable, then $X$ follows the elliptical distribution. Here, $R$ and $U$ are independent.

(GEL) (No further restriction) $X$ follows the generalized elliptical distribution.

(NPN) Let $X = (X_1, \ldots, X_p)^\top$. If there exist strictly increasing functions $f_1, \ldots, f_p$ such that $(f_1(X_1), \ldots, f_p(X_p))^\top$ follows the multivariate Gaussian distribution, then $X$ follows the non-paranormal distribution.

(TEL) Let $X = (X_1, \ldots, X_p)^\top$. If there exist strictly increasing functions $f_1, \ldots, f_p$ such that $(f_1(X_1), \ldots, f_p(X_p))^\top$ follows the elliptical distribution, then $X$ follows the transelliptical distribution.
Note that $R$ and $U$ are not necessarily independent when we define GEL. Throughout this paper we will use the above abbreviations (e.g. EL = elliptical distribution).
The stochastic representation proves its utility as a generative model. For example, a random sample of $U$ can be generated by normalizing independent standard normal samples:

$$U =_d \frac{Z}{\|Z\|_2},$$

where each component of $Z = (Z_1, \ldots, Z_p)^\top$ independently follows $N(0, 1)$ and $\|\cdot\|_2$ is the vector 2-norm. If we generate $R$ independently of $U$ (assuming $R$ is from a known distribution), a random sample of $X$ is obtained by multiplying the generated $R$ and $U$, if $\mu = 0$ and $A = I_p$. Finally, it should be noted that we only consider continuous distributions satisfying $P(R = 0) = 0$ for expositional simplicity.
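As a concrete generative sketch, the following Python snippet (assuming NumPy; the helper name and the choice $R =_d \sqrt{p\,F_{p,\nu}}$, which yields the multivariate t distribution with ν degrees of freedom, are our own) draws samples through (1):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_rep(n, mu, A, draw_R):
    """Draw n samples of X = mu + R * A U, with U uniform on the unit sphere."""
    p = len(mu)
    Z = rng.standard_normal((n, p))
    U = Z / np.linalg.norm(Z, axis=1, keepdims=True)   # U = Z / ||Z||_2
    R = draw_R(n)                                      # scalar tail variable
    return mu + R[:, None] * (U @ A.T)

p, nu = 3, 5.0
mu, A = np.zeros(p), np.eye(p)
x_mvn = sample_rep(10_000, mu, A, lambda n: np.sqrt(rng.chisquare(p, n)))  # R = sqrt(chi^2_p)
x_t   = sample_rep(10_000, mu, A, lambda n: np.sqrt(p * rng.f(p, nu, n)))  # R = sqrt(p F_{p,nu})
```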
Parameter estimation in heavy-tailed distributions has been extensively studied in different contexts: for example, estimation of the multivariate t by Nadarajah and Kotz [39], estimation of the multivariate Cauchy by Auderset et al. [3], covariance matrix estimation by Goes et al. [21] and Ke et al. [28], and mean estimation and regression by Lugosi and Mendelson [38]. As these references already discuss the topic intensively, we do not focus on it here, but new comparative results for the estimators are addressed.
Given data, the random variable R (or a functional parameter) is generally not estimated but rather specified by users, and then the other Euclidean parameters ($\mu$, $\Sigma$) are estimated. However, there have been attempts to estimate all parameters together [6,8,50]. We discuss this point at the end of this paper.
2.1. Scale Gaussian mixture distribution (SGM)
One of the most commonly used families of distributions in SGM is the finite mixture of Gaussians. If a discrete λ (the λ in Definition 2.1) takes k values $\lambda_1, \ldots, \lambda_k$ with weights $\pi_1, \ldots, \pi_k$, then $X$ is a mixture Gaussian random variable with density at $x$ given by $\sum_{i=1}^k \pi_i\, \phi(x; 0, \lambda_i \Sigma)$, where $\phi(\cdot\,; 0, \lambda_i \Sigma)$ is the density of the normal distribution with mean zero and non-singular covariance matrix $\lambda_i \Sigma$. Besides, we can construct abundant examples of the SGM family by choosing a continuous mixing variable λ. To name a few, setting λ to be an exponential variable yields the multivariate Laplace distribution, and letting λ be an inverse Gaussian variable produces the multivariate normal inverse Gaussian distribution [15]. More recently, Punzo and Bagnato [45] introduced the multivariate shifted exponential normal distribution, represented by $X =_d Z/\sqrt{W}$ (i.e. $\lambda = 1/W$), where $W$ is from a shifted exponential distribution.
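Sampling from an SGM then amounts to mixing one Gaussian draw with a scalar λ; a small sketch (assuming NumPy; the two mixing choices shown, a two-point λ and an exponential λ, are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sgm(n, Sigma, draw_lambda):
    """X = sqrt(lambda) * Z with Z ~ N(0, Sigma): a scale Gaussian mixture draw."""
    Z = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=n)
    return np.sqrt(draw_lambda(n))[:, None] * Z

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
x_finite  = sample_sgm(5000, Sigma, lambda n: rng.choice([0.5, 4.0], n, p=[0.7, 0.3]))
x_laplace = sample_sgm(5000, Sigma, lambda n: rng.exponential(1.0, n))  # exponential mixing
```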
SGM has been applied to diverse fields (e.g. natural image processing [44,53], biomedical records and time series in astronomy [51], speech signal processing [25], and genetics [14]). Also, it is often set as a prior distribution in Bayesian analysis (with some modifications) [7,40].
2.2. Elliptical distribution (EL)
The most familiar class of distributions in EL is the multivariate normal distribution (where R is the square root of a chi-square random variable). However, not limited to this, various choices of R have also been explored in the literature to address different features of data. Such examples are the multivariate logistic distribution, the Pearson Type II and VII distributions, and the Kotz-type distribution; the specific forms of R can be found in Table 1 of [54].
It is noteworthy that not all EL-distributed random variables possess density functions; when a density exists, however, it takes the following form.
Definition 2.2
A p-dimensional random vector $X$ is said to have an elliptical distribution with mean $\mu \in \mathbb{R}^p$, a scatter matrix $\Sigma$ (positive definite), and a Lebesgue-measurable generator $g : [0, \infty) \to [0, \infty)$ if its density function at $x$ has the form

$$f(x) = c_p\,|\Sigma|^{-1/2}\, g\big((x - \mu)^\top \Sigma^{-1} (x - \mu)\big), \qquad (2)$$

where $c_p$ is a normalizing constant.
A number of known distributions agree with this formulation: the multivariate normal distribution ($g(t) = \exp(-t/2)$) and the multivariate t distribution with ν degrees of freedom ($g(t) = (1 + t/\nu)^{-(\nu + p)/2}$), where the constants $c_p$ are (different) normalizing constants depending on $p$ (and ν). See Section 3.2 for more examples.
Once density functions of EL-distributed variables are accessible, parameter estimation becomes obvious via the principle of maximum likelihood. One well-known case is the sample mean and sample covariance matrix for estimating the mean vector and covariance matrix under the multivariate normal distribution. Another example is the multivariate t distribution, though an explicit form of the estimators is not available and one thus needs dedicated algorithms (see [39] for a treatment of estimation under the multivariate t distribution).
A density function is often intractable or not known a priori, but we can still obtain legitimate estimates of the scatter matrix $\Sigma$. Lindskog et al. [32] proved that the Kendall's tau estimator is invariant to the choice of R, and thus can be used to detect the correlation structure of any EL-distributed data. The gist of the Kendall's tau estimator, or the rank correlation, is that it depends solely on the ranks of the samples, not their magnitudes, thereby bypassing estimation of the scalar random variable R. Recall its definition: the $(j,k)$-th entry of the Kendall's tau matrix is given by

$$\widehat{\tau}_{jk} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} \operatorname{sign}\big( (X_{ij} - X_{i'j})(X_{ik} - X_{i'k}) \big), \qquad (3)$$

where $X_1, \ldots, X_n$ are EL-distributed data and $X_{ij}$ indicates the $j$th entry of $X_i$. The estimator is also valid for the case of TEL, which will be explained later.
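As a concrete sketch of this rank-based route (assuming NumPy/SciPy; the helper name kendall_corr is ours, and the sine transform $\widehat{\rho}_{jk} = \sin(\pi \widehat{\tau}_{jk}/2)$ recovering the correlation under EL follows from [32]):

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_corr(X):
    """Correlation estimate from pairwise Kendall's tau via rho = sin(pi * tau / 2)."""
    p = X.shape[1]
    tau = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau[j, k] = tau[k, j] = kendalltau(X[:, j], X[:, k])[0]
    return np.sin(np.pi * tau / 2.0)   # depends only on ranks, hence invariant to R
```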
Other known statistics for estimating a scatter matrix in EL are a multivariate version of the Kendall's tau estimator [13] and Tyler's M estimator [52]. Both estimators are underpinned by a normalization of the data, which obviates the need for estimating or specifying R. The multivariate Kendall's tau can be used to retrieve an eigenspace of $\Sigma$ [24]. We elucidate Tyler's M estimator in the following subsection since it is also valid in a broader class of distributions (GEL).
2.3. Generalized elliptical distribution (GEL)
The generalized elliptical distribution extends EL by allowing dependence between $R$ and $U$ and/or permitting $R$ to take any value in $\mathbb{R}$. This extension makes it possible to model asymmetry in data, which elliptical distributions cannot appropriately deal with. The skew elliptical distribution in $\mathbb{R}^p$ is often represented by a random vector conditioned on the sign of a real-valued random variable, where their joint distribution is EL. By defining a new $R$ depending on the sign of that real-valued variable, we can find appropriate random variables $R$ and $U$, depending on each other, that define the skew elliptical random vector (Theorem 28 of [18]). This distribution is suited to the analysis of financial data [2], where asymmetry is commonly observed.
The aforementioned Tyler's M estimator is attractive under GEL because R need not be specified in the estimation procedure. To introduce its derivation, we assume $X$ defined in (1) is centered ($\mu = 0$), and then we have

$$\frac{X}{\|X\|_2} =_d \frac{R\,AU}{\|R\,AU\|_2} = \operatorname{sign}(R)\,\frac{AU}{\|AU\|_2}. \qquad (4)$$

Frahm [18] showed that the normalized vector ignoring the sign has the same density as the angular central Gaussian (ACG) distribution with scatter matrix $\Sigma = AA^\top$.
Fact 2.1 (Theorem 1, [19]) —

The normalized vector $S = X/\|X\|_2$ of a GEL-distributed $X$ is distributed over the unit sphere, with density with respect to the uniform measure on the unit sphere given by

$$f(s) = c_p\,|\Sigma|^{-1/2} \big(s^\top \Sigma^{-1} s\big)^{-p/2}, \qquad (5)$$

where $\Sigma = AA^\top$ and $c_p$ is a normalizing constant. (5) is the probability density function of the angular central Gaussian distribution.
Let $\{X_i\}_{i=1}^n$ be a set of samples from GEL and $\{S_i = X_i/\|X_i\|_2\}_{i=1}^n$ its normalized samples ignoring the signs. We can derive the negative log-likelihood function of $\Sigma$ (up to additive and scaling factors) based on (5):

$$\ell(\Sigma) = \log|\Sigma| + \frac{p}{n}\sum_{i=1}^n \log\big(S_i^\top \Sigma^{-1} S_i\big).$$

Its maximum likelihood estimator (MLE) $\widehat{\Sigma}$ is defined by a fixed-point equation:

$$\widehat{\Sigma} = \frac{p}{n}\sum_{i=1}^n \frac{S_i S_i^\top}{S_i^\top \widehat{\Sigma}^{-1} S_i}.$$

It is easy to see, due to (4), that the above equation is equivalent to the one that defines the well-known Tyler's M estimator [52]:

$$\widehat{\Sigma} = \frac{p}{n}\sum_{i=1}^n \frac{X_i X_i^\top}{X_i^\top \widehat{\Sigma}^{-1} X_i}. \qquad (6)$$

The solution is attained through an iterative algorithm. To satisfy the identifiability conditions mentioned earlier, it is sufficient to re-scale the estimator by an appropriate constant once after convergence is met.
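A compact sketch of this fixed-point iteration (assuming NumPy; the helper name is ours, and the trace normalization $\operatorname{tr}(\widehat{\Sigma}) = p$ is one common re-scaling choice):

```python
import numpy as np

def tyler_m(X, tol=1e-8, max_iter=500):
    """Fixed-point iteration for Tyler's M estimator (6); X is n x p with mean zero."""
    n, p = X.shape
    Sigma = np.eye(p)
    for _ in range(max_iter):
        q = np.einsum('ij,jk,ik->i', X, np.linalg.inv(Sigma), X)  # X_i' Sigma^{-1} X_i
        Sigma_new = (p / n) * (X / q[:, None]).T @ X
        Sigma_new *= p / np.trace(Sigma_new)   # re-scale at each step for stability
        if np.max(np.abs(Sigma_new - Sigma)) < tol:
            return Sigma_new
        Sigma = Sigma_new
    return Sigma
```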
It is worth noting that Tyler's M estimator given in (6) is not the MLE (if one exists) with respect to the GEL-distributed $\{X_i\}$, but it is the MLE of the ACG-distributed $\{S_i\}$. Therefore, we need a justification connecting the estimator to the observations $\{X_i\}$. A scatter matrix is often formally defined as a quantity $\Sigma$ satisfying the following identity [19]:

$$\mathbb{E}\left[\frac{p\,XX^\top}{X^\top \Sigma^{-1} X}\right] = \Sigma.$$

We can easily check, by rearranging (6), that Tyler's M estimator satisfies the sample version of the above equation.
2.4. Non-paranormal (NPN) and transelliptical (TEL) distributions
Non-paranormal and transelliptical distributions adopt the idea of copulas, which model the marginal structures and the dependency of variables separately. For NPN, the mutual dependency is captured by the multivariate normal distribution through its covariance matrix, and marginal transformations are then applied to the normally distributed random variables. TEL, in the meantime, considers elliptical random variables instead of MVN.
Lafferty et al. [30] first coined the term 'non-paranormal' to refer to a nonparametric extension of the normal distribution. Indeed, this extension has a richer history in statistics and was first introduced under the name 'meta-Gaussian' [29]. Furthermore, as shown in Lemma 1 of [36], NPN is equivalent to the Gaussian copula, which has its own epic starting from [20,48]. According to Han and Liu [23], TEL traces back to [16], where it was first called a 'meta-elliptical' distribution. However, Liu et al. [35] pointed out that NPN and TEL are slightly different from the meta-Gaussian and meta-elliptical distributions (Lemma 3.5 of [35]). The difference arises when the joint and/or marginal density functions of $X$ are not defined (or do not exist). Thus, NPN and TEL are strictly larger classes than the meta-Gaussian and meta-elliptical distributions, respectively, while each is equivalent to its counterpart when a density exists. Liu et al. [35] asserted that TEL is more attractive due to better interpretability and easier theoretical analysis. Hence, we mainly discuss NPN and TEL here.
To estimate a scatter matrix under NPN, Lafferty et al. [30] use the so-called Winsorized estimator, which estimates the transformations nonparametrically. On the other hand, Liu et al. [34] propose to use the Kendall's tau and Spearman's rho estimators, which do not directly estimate the functions. They achieve the parametric rate with these two estimators, which is faster than the rate obtained in [30]. Han and Liu [23] further verify in their Theorem 3.2 that the Kendall's tau statistic, proved to work under EL, is also valid under TEL. This property is intuitively obvious since the Kendall's tau statistics based on $X$ and its marginally transformed counterpart are exactly the same. We finally remark that identifiability conditions for NPN and TEL are necessary to pin down the distributions. Liu et al. [36] and Lafferty et al. [30] required the monotone transformations to preserve means and variances.
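The invariance to monotone marginal maps is easy to check numerically; a toy sketch (assuming NumPy and the kendall_corr helper from Section 2.2, with exp(x) and x³ as our illustrative transformations):

```python
import numpy as np

rng = np.random.default_rng(2)
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=2000)   # latent MVN
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])         # NPN-style data

# Strictly increasing maps preserve ranks, so the tau-based estimates coincide.
print(np.allclose(kendall_corr(Z), kendall_corr(X)))         # True
```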
3. Mutual relations and estimators
3.1. Mutual relations
We first display the Venn diagram (Figure 1) of the heavy-tailed distributions mentioned in the previous section to give readers a good grasp of their relations. The left side of Figure 1 shows one direction of extension from MVN:

$$\mathrm{MVN} \subsetneq \mathrm{SGM} \subsetneq \mathrm{EL} \subsetneq \mathrm{GEL}.$$

From left to right, the families of heavy-tailed distributions are enlarged as the random variable $R$ in (1) becomes more general or less restricted. Another direction of extension depends on a monotone transformation (Figure 1, right):

$$\mathrm{MVN} \subsetneq \mathrm{NPN}, \qquad \mathrm{EL} \subsetneq \mathrm{TEL}.$$

The two extended cases (NPN and TEL) differ in the underlying distribution to which the transformation is applied.
Figure 1.
Venn diagram of the heavy-tailed distributions. Their abbreviations are given in each section title.
We give concrete examples for the strict inclusions in the above relations.

- SGM∖MVN: the multivariate t distribution.
- EL∖SGM: a random vector uniformly distributed on the p-dimensional unit sphere.
- GEL∖EL: the skew normal distribution.
- NPN∖MVN: MVN transformed by (non-linear) monotone functions.
- TEL∖NPN: the multivariate t distribution transformed by (non-linear) monotone functions.
Adopted from [10], the second example is a case where the density of an EL-distributed random variable does not exist. In contrast, a random vector from SGM generally has a density, because the form $X =_d \sqrt{\lambda}\,Z$ implies absolute continuity, where $Z$ is MVN with identity covariance matrix and is independent of λ.
3.2. Additional relations
Lu et al. [37] proposed a new family of heavy-tailed distributions called pair-elliptical. A random vector belongs to this family if every pair of its components is bivariate elliptical. The pair-normal distribution is defined similarly. Based on these families, Lu et al. [37] described characterizations of MVN and EL. First, MVN is exactly the intersection of EL and NPN, or equivalently that of EL and pair-normal. Also, EL can be characterized as the intersection of TEL and pair-elliptical. The equivalences are summarized as follows:

$$\mathrm{MVN} = \mathrm{EL} \cap \mathrm{NPN} = \mathrm{EL} \cap \text{pair-normal}, \qquad \mathrm{EL} = \mathrm{TEL} \cap \text{pair-elliptical}.$$
Also, we can find a special characterization for a sub-class of EL by considering the consistency property described below.
Fact 3.1 (Theorem 1, [27])1 —
Assume $X^{(p)}$, $p = 1, 2, \ldots$, follow EL and satisfy the two representations (1) and (2). Then, the following two statements are equivalent.

1. The density generator possesses the consistency property, i.e.

$$\int_{-\infty}^{\infty} g_{p+1}\big(x^\top x + x_{p+1}^2\big)\,dx_{p+1} = g_p\big(x^\top x\big)$$

for any $p \ge 1$ and almost every $x \in \mathbb{R}^p$ (w.r.t. the Lebesgue measure), where $g_p$ denotes the generator in dimension p (including its normalizing constant).

2. There exists a random variable Q>0, unrelated to the dimension p, such that for any $p$, $X^{(p)} =_d \mu + Q^{-1/2} A Z^{(p)}$, where $Z^{(p)} \sim N_p(0, I_p)$, and $Z^{(p)}$ and Q are mutually independent.
It should be noted that the second statement in Fact 3.1 depends on a scale random variable Q independent of $Z^{(p)}$. The property is useful for determining whether a family of distributions belongs to SGM; i.e. it is a sufficient condition for a family of distributions to be in SGM. However, it is not necessary: the multivariate logistic distribution does not satisfy the consistency property, yet [49] proved that, in the univariate case, its mixing variable Q is expressed through the Kolmogorov–Smirnov distribution. The relation is drawn as a Venn diagram in Figure 2.
Figure 2.
Venn diagram of the heavy-tailed distributions. The class of EL satisfying the consistency property, denoted by 'CON', is a proper subset of SGM.
We name a few EL families with the consistency property [27]: MVN with $g(t) = \exp(-t/2)$, the multivariate t distribution with $g(t) = (1 + t/\nu)^{-(\nu+p)/2}$ ($\nu > 0$), and the stable laws with characteristic generator $\exp(r\,t^{\alpha/2})$ ($0 < \alpha \le 2$, r<0). By the previous argument, the three examples belong to SGM. On the other hand, EL families not satisfying the consistency property include: the multivariate logistic distribution with $g(t) = e^{-t}/(1 + e^{-t})^2$, the Pearson Type II distribution with $g(t) = (1 - t)^m$ (m>0), the Pearson Type VII distribution with $g(t) = (1 + t/m)^{-N}$ ($N > p/2$, $m > 0$), the multivariate Bessel distribution with $g(t) = (\sqrt{t}/\beta)^{a} K_a(\sqrt{t}/\beta)$ ($a > -p/2$, $\beta > 0$), where $K_a$ is the modified Bessel function of the third kind, and finally the power exponential distribution with $g(t) = \exp(-t^{s}/2)$ ($s > 0$, $s \ne 1$).
Note that all the mutual relationships are only guaranteed in finite-dimensional spaces and can be different in other spaces. For example, Boente et al. [10] proved that the class of EL is equivalent to that of SGM in an infinite-dimensional (separable Hilbert) space.
3.3. Invariance property of scatter matrix estimators
As mentioned in Section 2, different robust estimators are available for the different classes of heavy-tailed distributions. Table 1 summarizes the types of heavy-tailed distributions under which the Kendall's tau and/or Tyler's M estimators are legitimate. The common characteristic of the two estimators is that they work even when R is not explicitly specified; in other words, they are invariant to the specification of R. Kendall's tau is invariant to the monotone functions and the scale random variable R of TEL, while Tyler's M estimator is invariant to R in GEL.
Table 1.
Comparison of scatter matrix estimators for heavy-tailed distributions. 'O' means the estimator is valid under the distribution, while '–' implies the absence of such property.

| Estimator | MVN | SGM | EL | GEL | NPN | TEL |
| --- | --- | --- | --- | --- | --- | --- |
| Kendall's tau | O | O | O | – | O | O |
| Tyler's M | O | O | O | O | – | – |
To delve into this point, we compare the explicit formulas of the estimators based on samples from different distributions sharing the same scatter matrix. Let us denote by $\widehat{\Sigma}^{\mathrm{GEL}}$ the estimator (6) obtained from n samples $X_1, \ldots, X_n$ that follow GEL with scatter matrix $\Sigma$ (and mean zero). On the other hand, assume $S_i$, $i = 1, \ldots, n$, are from the angular central Gaussian with scatter matrix $\Sigma$ (and mean zero), and define its Tyler's M estimator $\widehat{\Sigma}^{\mathrm{ACG}}$ by

$$\widehat{\Sigma}^{\mathrm{ACG}} = \frac{p}{n} \sum_{i=1}^n \frac{S_i S_i^\top}{S_i^\top (\widehat{\Sigma}^{\mathrm{ACG}})^{-1} S_i}.$$

Then, by Fact 2.1, the two Tyler's M estimators are identical in the distributional sense, i.e. $\widehat{\Sigma}^{\mathrm{GEL}} =_d \widehat{\Sigma}^{\mathrm{ACG}}$.
We now turn our attention to TEL. Assume $Z_1$ and $Z_2$ are from MVN and $X_1$ and $X_2$ are from TEL sharing the same scatter matrix. Based on Lemma 5 of [32] and the monotonicity of the transformation functions, it is observed that

$$\operatorname{sign}\big( (X_{1j} - X_{2j})(X_{1k} - X_{2k}) \big) =_d \operatorname{sign}\big( (Z_{1j} - Z_{2j})(Z_{1k} - Z_{2k}) \big). \qquad (7)$$

We write $\widehat{\tau}_{jk}^{\mathrm{TEL}}$ for the rank correlation estimator (3) computed from n copies of $X$. We also define the estimator using n MVN data (copies of $Z$) by

$$\widehat{\tau}_{jk}^{\mathrm{MVN}} = \frac{2}{n(n-1)} \sum_{1 \le i < i' \le n} \operatorname{sign}\big( (Z_{ij} - Z_{i'j})(Z_{ik} - Z_{i'k}) \big). \qquad (8)$$

Since the sums of signs in (3) and (8) are identically distributed (∵ (7)), we can conclude that $\widehat{\tau}_{jk}^{\mathrm{TEL}} =_d \widehat{\tau}_{jk}^{\mathrm{MVN}}$ for all $j, k$.
Table 1 also implies that a multivariate distribution belonging to EL has multiple valid choices for estimating its scatter matrix. One natural question is how different the available estimators are and/or which one is more efficient. It has been shown by Goes et al. [21] and Zhang et al. [56] that Tyler's M estimator and the sample covariance matrix are close to each other under the MVN assumption. Assume that $X_i$, $i = 1, \ldots, n$, are random samples from a mean-zero MVN and let us denote the sample covariance matrix by $\widehat{\Sigma}^{\mathrm{SAM}} = n^{-1}\sum_{i=1}^n X_i X_i^\top$. According to Lemma 2.1 of [56], Tyler's M estimator can be expressed as

$$\widehat{\Sigma}^{\mathrm{TM}} = p \sum_{i=1}^n \widehat{w}_i\, \frac{X_i X_i^\top}{\|X_i\|_2^2},$$

where the weights $(\widehat{w}_1, \ldots, \widehat{w}_n)$, summing to one, are the unique solution to an associated minimization problem. They also prove that the estimated weights are approximately equal to 1/n, that is to say, $\max_i |n \widehat{w}_i - 1| \to 0$ almost surely as $n \to \infty$ when the underlying distribution of $X_i$ is the isotropic Gaussian distribution [56, Lemma 2.3], which is later generalized to the case of an arbitrary scatter matrix [21, Lemma 7]. Using this fact, Goes et al. [21] showed in their Lemma 8 that if $n, p \to \infty$ and $p/n \to \gamma \in (0,1)$, then, for constants depending only on γ, the difference $\|\widehat{\Sigma}^{\mathrm{TM}} - \widehat{\Sigma}^{\mathrm{SAM}}\|_{\max}$ is small with high probability, provided that $\Sigma$ (normalized so that $\operatorname{tr}(\Sigma) = p$) is in the class of approximately sparse matrices introduced in [9]. The concentration is useful for deriving that of $\|\widehat{\Sigma}^{\mathrm{TM}} - \Sigma\|_{\max}$ due to the triangle inequality: $\|\widehat{\Sigma}^{\mathrm{TM}} - \Sigma\|_{\max} \le \|\widehat{\Sigma}^{\mathrm{TM}} - \widehat{\Sigma}^{\mathrm{SAM}}\|_{\max} + \|\widehat{\Sigma}^{\mathrm{SAM}} - \Sigma\|_{\max}$. Deviation inequalities in the element-wise maximum norm have been of consequence in the estimation of high-dimensional covariance/precision matrices (see [43] for more details).
4. Numerical study
4.1. Comparison of heavy-tail extensions
We consider the following generative models of four distributions with mean zero and covariance matrix $\Sigma = AA^\top$:

$$X^{\mathrm{MVN}} =_d R\,AU, \qquad X^{\mathrm{SGM}} =_d \sqrt{\lambda}\,R\,AU, \qquad X^{\mathrm{EL}} =_d R_{\mathrm{EL}}\,AU, \qquad X^{\mathrm{GEL}} =_d R_{\mathrm{GEL}}\,AU,$$

where $U \sim \mathrm{Unif}(\mathbb{S}^{p-1})$, R>0 is the random variable distributed as $\sqrt{\chi^2_p}$, and $R_{\mathrm{EL}}$ and $R_{\mathrm{GEL}}$ are non-negative scalar variables, with $R_{\mathrm{GEL}}$ allowed to depend on $U$. λ is independent of $R$ and $U$. Note that the three extensions of MVN share the same quantities ($R$, $U$, $A$) as $X^{\mathrm{MVN}}$ and have distinct scalar random variables, which are $\sqrt{\lambda}\,R$ for $X^{\mathrm{SGM}}$, $R_{\mathrm{EL}}$ for $X^{\mathrm{EL}}$, and $R_{\mathrm{GEL}}$ for $X^{\mathrm{GEL}}$. Moreover, by applying a monotone transformation to $X^{\mathrm{MVN}}$ and $X^{\mathrm{EL}}$, we can define $X^{\mathrm{NPN}}$ and $X^{\mathrm{TEL}}$. Based on this generating mechanism, we can compare the differences induced by either the scalar random variables or the marginal transformation. To do so empirically, we generate n samples of $X^{\mathrm{MVN}}$ and then sequentially produce those of the other extensions. Here, we fix n = 10000, p = 2, and a common $\Sigma$ (Figure 3).
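Since the exact scalar variables behind Figure 3 are not recoverable here, the sketch below makes illustrative choices (our assumptions: an inverse-gamma λ for SGM, an F-based scalar for EL, a U-dependent scalar for GEL, and the monotone map $f(x) = x^3$):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 10_000, 2
A = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))

Z = rng.standard_normal((n, p))
U = Z / np.linalg.norm(Z, axis=1, keepdims=True)
R = np.sqrt(rng.chisquare(p, n))

x_mvn = (R[:, None] * U) @ A.T
x_sgm = (np.sqrt(1.0 / rng.gamma(2.0, 1.0, n)) * R)[:, None] * U @ A.T  # lambda ~ inverse-gamma
x_el  = np.sqrt(p * rng.f(p, 3.0, n))[:, None] * U @ A.T                # independent heavy scalar
x_gel = (R * np.exp(U[:, 0]))[:, None] * U @ A.T                        # scalar depends on U
x_npn, x_tel = x_mvn ** 3, x_el ** 3                                    # marginal monotone map
```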
Figure 3.
Scatterplot of bivariate random variables. n = 10000 data points from MVN are first generated and then those from the others are created by their definitions depending on the MVN samples.
Compared to MVN, the other distributions exhibit much larger values in absolute value. Especially for TEL, the most extreme value reaches about 5000, which is not commonly observed in the other distributions. Indeed, the scale of samples in TEL changes according to the transformation we use. SGM and EL keep the elliptical shape of contours, but SGM is more concentrated at the center. GEL is the only distribution among the six that displays asymmetric spreading, one of its distinctive characteristics. NPN and TEL show similar non-elliptical dispersion of data at different scales.
4.2. Application to estimation of scatter matrix in genomics data
We provide an illustration of results using different scatter matrix estimators to appreciate their behaviors when heavy-tailed data are given.
The dataset we use here is adopted from the preceding work [55], where gene expressions of the plant Arabidopsis thaliana were measured by Affymetrix GeneChip microarrays and a gene-regulatory network was constructed for the genes related to the isoprenoid pathway. In contrast to the original work under the multivariate Gaussian assumption, Liu et al. [34,36] assumed the data come from a heavy-tailed distribution and found very different results from the original. In this section, we investigate the differences among three scatter matrix estimators (Kendall's tau, Tyler's M, and Pearson's R), which serve as the initial estimators for the estimation of the (sparse) inverse covariance matrix in [34,36].
The gene microarray data contain n = 118 chip records of p = 39 genes,2 which are log-transformed and standardized. For expositional purposes, we apply a monotone function to every observation, making the heavy-tailedness more distinct. Since the Kendall's tau estimator only identifies a correlation structure, the other two estimators are also converted to their correlation counterparts.
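Converting a scatter estimate to its correlation counterpart is a one-line rescaling; a small helper (ours, assuming NumPy):

```python
import numpy as np

def to_correlation(S):
    """Rescale a scatter/covariance matrix estimate to a correlation matrix."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)
```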
The estimators are shown in the top panels of Figure 4. They appear similar, but a closer look reveals subtle differences between Pearson's R and the other two. Specifically, there are clear distinctions in the area bounded by the black box, such as different degrees of correlation and opposite signs. We then numerically compare their eigen-structures. In the bottom left panel of Figure 4, the eigenvalues of each estimator commonly show an elbow-like pattern. While the sharp elbow is inarguably detected at the fourth index for the Kendall's tau and Tyler's M estimators, it is rather obscure where to pick a cutoff point for Pearson's R. Next, we measure the angles between eigenvectors from different correlation estimators. For example, in the bottom right panel of Figure 4, the green line labeled 'K_vs_T' plots the angle between the first eigenvectors from Kendall's tau and Tyler's M, then that between the second eigenvectors, and so on. The first three angles are relatively close to 0, implying each eigenvector is almost parallel to its counterpart. On the other hand, the angles formed with the eigenvectors of Pearson's R are not close to zero. Moreover, we can approximate each estimator using only a few (say, the first three) eigenvalues and eigenvectors, as if the remaining part were a noisy component that might disguise the similarity of the three estimators. The middle row of Figure 4 shows that the approximated Pearson's R does not resemble the other two, which are alike. Taken together, the eigen-structure of the Pearson's R estimator presents a sharp contrast to those of Kendall's tau and Tyler's M.
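The eigen-comparisons used above reduce to a few lines; a sketch (assuming NumPy; the helper names are ours):

```python
import numpy as np

def eig_desc(S):
    """Eigen-decomposition of a symmetric matrix, largest eigenvalues first."""
    vals, vecs = np.linalg.eigh(S)
    return vals[::-1], vecs[:, ::-1]

def angle_deg(u, v):
    """Angle between two eigenvectors, ignoring their sign indeterminacy."""
    c = abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))

def low_rank(S, r=3):
    """Rank-r approximation built from the r leading eigen-pairs."""
    vals, vecs = eig_desc(S)
    return (vecs[:, :r] * vals[:r]) @ vecs[:, :r].T
```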
Figure 4.
Comparison of three correlation matrix estimators. The first row shows their original heatmaps, while the second the approximated versions using the first three eigenvectors. The bottom left panel describes the eigenvalues of each estimator, while the right one compares angles between eigenvectors from two different estimators.
Considering that Kendall's tau and Tyler's M are well-founded estimators for heavy-tailed distributions while Pearson's R is not, the disparity adds a caveat for practitioners that heavy-tailed features should be suitably treated. It also implies that analyses based on heavy-tailed assumptions may end up with different findings. Furthermore, the similarity in eigen-structure between Kendall's tau and Tyler's M may indicate that the first three eigen-pairs carry the central information of the scatter matrix, which could benefit researchers interested in this dataset.
Next, we use principal component (PC) scores for outlier detection, based on the following simple idea [11,46]: data points with exceptionally large (in absolute value) PC scores are regarded as potential outliers. The scores are defined by the inner products between the observed data and the eigenvectors obtained from any of the Kendall's tau, Tyler's M, and Pearson correlation matrix estimators. Since the first four PCs explain most of the variance (see the bottom left panel of Figure 4), we show their pairwise scatter plots in Figure 5.
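A minimal sketch of the scoring step (assuming NumPy; the 3-sigma flagging rule is our illustrative choice, not the rule used in [11,46]):

```python
import numpy as np

def pc_scores(X, S, k=4):
    """Project data X (n x p) onto the k leading eigenvectors of the estimator S."""
    vals, vecs = np.linalg.eigh(S)
    return X @ vecs[:, ::-1][:, :k]

def flag_outliers(scores, c=3.0):
    """Flag observations with any exceptionally large (absolute) PC score."""
    return np.where(np.any(np.abs(scores) > c * scores.std(axis=0), axis=1))[0]
```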
Figure 5.
Scatterplot matrix of PC scores obtained by three different correlation matrices (labeled on the top). PC1 is on the x-axis, while PC2, PC3, PC4 on the y-axes.
When using the robust estimators (Tyler's M, Kendall's tau), it can be seen that the 3rd, 13th, and 113th observations are far away from the majority of observations. However, for Pearson's estimator, the third observation deviates so substantially from the majority that it may hinder one from capturing the variance of the others, which include the 13th and 113th points. This could result in different scientific conclusions about outliers.
5. Conclusion
MVN is the most fundamental and commonly used distributional assumption in multivariate analysis due to its attractive theoretical properties, but mostly due to its ease of use and accessibility. Therefore, practitioners and even statistical experts are inclined to overlook deviations from Gaussianity found in real-world data and innocently take advantage of the good properties of MVN. One factor behind this phenomenon is that researchers are not fully aware that many alternatives to MVN are already in place. To this end, this paper presents a review of heavy-tailed multivariate distributions and conducts an in-depth investigation into their relationships. We hope this work will encourage researchers to use the extensions properly in future research.
We finally conclude the paper with a discussion. One appealing characteristic of the heavy-tailed families is that they are parametric distributions that can specify the location and scale parameters, which promises the interpretability of statistical models. To date, when estimating $\mu$ and $\Sigma$, most research has exploited one of two schemes. On the one hand, the distribution of R is fully specified so that the maximum likelihood estimators are attained through the likelihood function of $X$. On the other hand, one can pursue an estimator invariant to R, such as the Kendall's tau estimator. However, the former approach is only applicable to limited (known) distributions, while the latter cannot support further procedures (e.g. prediction) because a generative model is not fully specified. These limitations lead to a natural question of whether one can directly estimate the distribution of R from data. For example, estimating the degrees of freedom in the t distribution is an instance of this problem [33]. We believe it is a practically important problem that will attract many researchers' attention in the field.
Funding Statement
This work was partially supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) [grant number NRF-2021R1A2C1010786].
Notes
1. We present two of the five statements in the theorem for simplicity. We refer those interested in more details to the original paper.
2. Originally, p = 40 was used, but the currently available data contain only 39 genes.
Disclosure statement
No potential conflict of interest was reported by the authors.
References
- 1. Adcock C. and Azzalini A., A selective overview of skew-elliptical and related distributions and of their applications, Symmetry 12 (2020), p. 118.
- 2. Adcock C., Eling M., and Loperfido N., Skewed distributions in finance and actuarial science: A review, Euro. J. Finance 21 (2015), pp. 1253–1281.
- 3. Auderset C., Mazza C., and Ruh E.A., Angular Gaussian and Cauchy estimation, J. Multivar. Anal. 93 (2005), pp. 180–197.
- 4. Babic S., Gelbgras L., Hallin M., and Ley C., Optimal tests for elliptical symmetry: Specified and unspecified location, arXiv e-prints (2019). Available at arXiv:1911.08171.
- 5. Banfield J.D. and Raftery A.E., Model-based Gaussian and non-Gaussian clustering, Biometrics 49 (1993), pp. 803–821.
- 6. Battey H. and Linton O., Nonparametric estimation of multivariate elliptic densities via finite mixture sieves, J. Multivar. Anal. 123 (2014), pp. 43–67.
- 7. Bhattacharya A., Chakraborty A., and Mallick B.K., Fast sampling with Gaussian scale mixture priors in high-dimensional regression, Biometrika 103 (2016), pp. 985–991.
- 8. Bhattacharyya S. and Bickel P.J., Adaptive estimation in elliptical distributions with extensions to high dimensions, preprint (2012). Available at http://www.science.oregonstate.edu/bhattash/Site/Research.html.
- 9. Bickel P.J. and Levina E., Covariance regularization by thresholding, Ann. Statist. 36 (2008), pp. 2577–2604.
- 10. Boente G., Salibián Barrera M., and Tyler D.E., A characterization of elliptical distributions and some optimality properties of principal components for functional data, J. Multivar. Anal. 131 (2014), pp. 254–264.
- 11. Chen X., Zhang B., Wang T., Bonni A., and Zhao G., Robust principal component analysis for accurate outlier sample detection in RNA-seq data, BMC Bioinformatics 21 (2020), pp. 1–20.
- 12. Chen Y., Wiesel A., and Hero A., Robust shrinkage estimation of high-dimensional covariance matrices, IEEE Trans. Signal Process. 59 (2011), pp. 4097–4107.
- 13. Choi K. and Marden J., A multivariate version of Kendall's τ, J. Nonparametr. Stat. 9 (1998), pp. 261–293.
- 14. Cui T., Havulinna A., Marttinen P., and Kaski S., Informative Gaussian scale mixture priors for Bayesian neural networks, arXiv e-prints (2020). Available at arXiv:2002.10243.
- 15. Doulgeris A.P. and Eltoft T., Scale mixture of Gaussian modelling of polarimetric SAR data, EURASIP J. Adv. Signal Process. 2010 (2009), pp. 1–12.
- 16. Fang H.-B., Fang K.-T., and Kotz S., The meta-elliptical distributions with given marginals, J. Multivar. Anal. 82 (2002), pp. 1–16.
- 17. Fang K., Kotz S., and Ng K., Symmetric Multivariate and Related Distributions, Chapman & Hall/CRC Monographs on Statistics & Applied Probability, Taylor & Francis, 1990.
- 18. Frahm G., Generalized elliptical distributions: Theory and applications, PhD thesis, University of Cologne, 2004.
- 19. Frahm G. and Jaekel U., A generalization of Tyler's M-estimators to the case of incomplete data, Comput. Stat. Data Anal. 54 (2010), pp. 374–393.
- 20. Fréchet M., Sur les tableaux de corrélation dont les marges sont données, Ann. Univ. Lyon Sci. Sect. A 14 (1951), pp. 53–77.
- 21. Goes J., Lerman G., and Nadler B., Robust sparse covariance estimation by thresholding Tyler's M-estimator, Ann. Statist. 48 (2020), pp. 86–110.
- 22. Gupta P., Moorthy A.K., Soundararajan R., and Bovik A.C., Generalized Gaussian scale mixtures: A model for wavelet coefficients of natural images, Signal Process. Image Commun. 66 (2018), pp. 87–94.
- 23. Han F. and Liu H., Scale-invariant sparse PCA on high-dimensional meta-elliptical data, J. Am. Stat. Assoc. 109 (2014), pp. 275–287.
- 24. Han F. and Liu H., ECA: High-dimensional elliptical component analysis in non-Gaussian distributions, J. Am. Stat. Assoc. 113 (2018), pp. 252–268.
- 25. Hao J., Lee T., and Sejnowski T.J., Speech enhancement using Gaussian scale mixture models, IEEE Trans. Audio Speech Lang. Process. 18 (2010), pp. 1127–1136.
- 26. Hotelling H., The generalization of Student's ratio, Ann. Math. Statist. 2 (1931), pp. 360–378.
- 27. Kano Y., Consistency property of elliptic probability density functions, J. Multivar. Anal. 51 (1994), pp. 139–147.
- 28. Ke Y., Minsker S., Ren Z., Sun Q., and Zhou W.-X., User-friendly covariance estimation for heavy-tailed distributions, Stat. Sci. 34 (2019), pp. 454–471.
- 29. Krzysztofowicz R. and Kelly K.S., A meta-Gaussian distribution with specified marginals, Tech. Rep., University of Virginia, 1996.
- 30. Lafferty J., Liu H., and Wasserman L., Sparse nonparametric graphical models, Stat. Sci. 27 (2012), pp. 519–537.
- 31. Li S., Yu Z., and Mandic D., A universal framework for learning the elliptical mixture model, IEEE Trans. Neural Netw. Learn. Syst. 32 (2020), pp. 1–15.
- 32. Lindskog F., McNeil A., and Schmock U., Kendall's tau for elliptical distributions, in Credit Risk, Physica-Verlag HD, Heidelberg, 2003, pp. 149–156.
- 33. Liu C. and Rubin D., ML estimation of the t distribution using EM and its extensions, ECM and ECME, Stat. Sin. 5 (1995), pp. 19–39.
- 34. Liu H., Han F., Yuan M., Lafferty J., and Wasserman L., High-dimensional semiparametric Gaussian copula graphical models, Ann. Statist. 40 (2012), pp. 2293–2326.
- 35. Liu H., Han F., and Zhang C., Transelliptical graphical modeling under a hierarchical latent variable framework, Tech. Rep., 2012.
- 36. Liu H., Lafferty J., and Wasserman L., The nonparanormal: Semiparametric estimation of high dimensional undirected graphs, J. Mach. Learn. Res. 10 (2009), pp. 2295–2328.
- 37. Lu J., Han F., and Liu H., Robust scatter matrix estimation for high dimensional distributions with heavy tails, IEEE Trans. Inf. Theory 67 (2020), pp. 5283–5304.
- 38. Lugosi G. and Mendelson S., Mean estimation and regression under heavy-tailed distributions: A survey, Found. Comut. Math. 19 (2019), pp. 1145–1190.
- 39. Nadarajah S. and Kotz S., Estimation methods for the multivariate t distribution, Acta Appl. Math. 102 (2008), pp. 99–118.
- 40. Nalci A., Fedorov I., Al-Shoukairi M., Liu T.T., and Rao B.D., Rectified Gaussian scale mixtures and the sparse non-negative least squares problem, IEEE Trans. Signal Process. 66 (2018), pp. 3124–3139.
- 41. Nandy P., Maathuis M.H., and Richardson T.S., Estimating the effect of joint interventions from observational data in sparse high-dimensional settings, Ann. Statist. 45 (2017), pp. 647–674.
- 42. Niu L., Liu X., and Zhao J., Robust estimator of the correlation matrix with sparse Kronecker structure for a high-dimensional matrix-variate, J. Multivar. Anal. 177 (2020), p. 104598.
- 43. Park S., Wang X., and Lim J., Estimating high-dimensional covariance and precision matrices under general missing dependence, Electron. J. Stat. 15 (2021), pp. 4868–4915.
- 44. Portilla J., Strela V., Wainwright M.J., and Simoncelli E.P., Image denoising using scale mixtures of Gaussians in the wavelet domain, IEEE Trans. Image Process. 12 (2003), pp. 1338–1351.
- 45. Punzo A. and Bagnato L., Allometric analysis using the multivariate shifted exponential normal distribution, Biom. J. 62 (2020), pp. 1525–1543.
- 46. Saha P., Roy N., Mukherjee D., and Sarkar A.K., Application of principal component analysis for outlier detection in heterogeneous traffic data, Procedia Comput. Sci. 83 (2016), pp. 107–114.
- 47. Singh S. and Póczos B., Nonparanormal information estimation, in Proceedings of the 34th International Conference on Machine Learning, Vol. 70 of Proceedings of Machine Learning Research, PMLR, Sydney, Australia, 2017, pp. 3210–3219.
- 48. Sklar M., Fonctions de répartition à n dimensions et leurs marges, Publ. Inst. Stat. Univ. Paris 8 (1959), pp. 229–231.
- 49. Stefanski L.A., A normal scale mixture representation of the logistic distribution, Stat. Probab. Lett. 11 (1990), pp. 69–70.
- 50. Stute W. and Werner U., Nonparametric Estimation of Elliptically Contoured Densities, Springer Netherlands, Dordrecht, 1991, pp. 173–190.
- 51. Tak H., Ellis J.A., and Ghosh S.K., Robust and accurate inference via a mixture of Gaussian and Student's t errors, J. Comput. Graph. Stat. 28 (2019), pp. 415–426.
- 52. Tyler D.E., A distribution-free M-estimator of multivariate scatter, Ann. Statist. 15 (1987), pp. 234–251.
- 53. Wainwright M.J. and Simoncelli E.P., Scale mixtures of Gaussians and the statistics of natural images, in Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, MIT Press, Cambridge, MA, 1999, pp. 855–861.
- 54. Wang X. and Yan J., Practical notes on multivariate modeling based on elliptical copulas, J. Soc. Française Stat. 154 (2013), pp. 102–115.
- 55. Wille A., Zimmermann P., Vranová E., Fürholz A., Laule O., Bleuler S., Hennig L., Prelic A., von Rohr P., Thiele L., Zitzler E., Gruissem W., and Bühlmann P., Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana, Genome Biol. 5 (2004), p. R92.
- 56. Zhang T., Cheng X., and Singer A., Marčenko–Pastur law for Tyler's M-estimator, J. Multivar. Anal. 149 (2016), pp. 114–123.




