Robust inference for skewed data in health sciences

Amarnath Nandy; Ayanendranath Basu; Abhik Ghosh

doi:10.1080/02664763.2021.1891527

2021 Feb 25;49(8):2093–2123.

PMCID: PMC9225436 PMID: 35757589

ABSTRACT

Health data are often not symmetric to be adequately modeled through the usual normal distributions; most of them exhibit skewed patterns. They can indeed be modeled better through the larger family of skew-normal distributions covering both skewed and symmetric cases. Since outliers are not uncommon in complex real-life experimental datasets, a robust methodology automatically taking care of the noises in the data would be of great practical value to produce stable and more precise research insights leading to better policy formulation. In this paper, we develop a class of robust estimators and testing procedures for the family of skew-normal distributions using the minimum density power divergence approach with application to health data. In particular, a robust procedure for testing of symmetry is discussed in the presence of outliers. Two efficient computational algorithms are discussed. Besides deriving the asymptotic and robustness theory for the proposed methods, their advantages and utilities are illustrated through simulations and a couple of real-life applications for health data of athletes from Australian Institute of Sports and AIDS clinical trial data.

Keywords: skew normal (SN) distribution, robust minimum density power divergence estimation, Wald-type test, test for symmetry, genetic algorithm, influence function

1. Introduction

Health science is an integral part of medical research where the objective is to improve the quality of human (as well as animal) health through appropriate scientific insight generation and necessary policy implementation. The backbone of health science research is the efficient analysis of health data obtained from several designed or observational medical experiments or surveys with appropriate target questions in mind. The innovations and insights generated from such analyses are essential in medical research to develop cures for different sorts of illness and ensure better health quality; they are also important for any country (and even globally) to prepare appropriate health policies.

The conventional statistical distribution used in modeling different kinds of health data is often the bell-shaped and symmetric normal distribution. Although it works for some health measurements like height or weight of patients, most health data, especially those measured on some clinical metrics, often exhibit empirical skewness so that the conventional normal distribution cannot be used to model or analyze them [45]. Among several possible parametric distributions to model skewed data, possibly the most popular one is the Azzalini-type Skew Normal (SN) distribution family [1–4], which also covers the usual symmetric normal distribution as a special case; see Figure 1 for a wide variety of distributional shapes (densities) of the SN distribution obtained by varying the shape parameter. Lately, this SN distribution has been successfully applied to model and analyze different types of recent biomedical data [6,14,16–19,26,28,33,34,40,43,45,47,49,50,54,57,60,62]. In this paper, we focus on the SN distribution family for modeling data from different health measurements under one umbrella and on the inference using the estimators of the corresponding SN parameters.

Figure 1. — Probability densities of SN distribution with parameters $μ = 0$ , $σ = 1$ and different values of γ.

The SN distribution is defined in terms of three parameters, namely, the location parameter $μ \in R$ , the scale parameter $σ \in R^{+}$ and the skewness parameter $γ \in R$ and is denoted by $S N (μ, σ, γ)$ . In particular, if $μ = 0$ and $σ = 1$ , it is referred to as the standard SN distribution and is denoted by $S N (γ)$ . The probability density function (pdf) and the cumulative distribution function (cdf) of the SN(μ,σ,γ) distribution are given, respectively, as

\begin{aligned} f_{θ} (x) & = \frac{2}{σ} ϕ (\frac{x - μ}{σ}) Φ (γ \frac{x - μ}{σ}), x \in R, \end{aligned}

(1)

\begin{aligned} F_{θ} (x) & = Φ (\frac{x - μ}{σ}) - 2 T (\frac{x - μ}{σ}, γ), x \in R, \end{aligned}

(2)

where $θ = (μ, σ, γ)^{T}$ is the vector of unknown parameters, ϕ and Φ are the pdf and the cdf of the standard normal distribution, respectively, and $T (h, a)$ is Owen's function defined as

T (h, a) = \frac{1}{2 π} \int_{0}^{a} \frac{e^{- \frac{h^{2}}{2} (1 + x^{2})}}{1 + x^{2}} d x, h, a \in R .

The mean, variance and skewness ( $γ_{1}$ ) of a random variable X having SN(μ,σ,γ) distribution are given by $E (X) = μ + σ δ \sqrt{\frac{2}{π}}$ , $V a r (X) = σ^{2} (1 - \frac{2 δ^{2}}{π})$ , with $δ = γ (1 + γ^{2})^{- 1 / 2}$ , and $γ_{1} = \frac{(4 - π) γ^{3}}{2 (\frac{π}{2} + (\frac{π}{2} - 1) γ^{2})^{3 / 2}} .$ Clearly, the SN distribution is positively and negatively skewed according to the sign of the parameter γ; see Figure 1. At the particular case $γ = 0$ , the SN distribution $S N (μ, σ, 0)$ has skewness zero and coincides with symmetric normal distribution, $N (μ, σ^{2})$ , having mean μ and variance $σ^{2}$ .
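As a quick numerical check of the density (1), the distribution function (2) and the moment formulas above, the following minimal sketch implements them directly and compares them with SciPy. The function names `sn_pdf`, `sn_cdf` and `sn_moments` are illustrative and not from the paper; the comparison assumes that `scipy.stats.skewnorm` with shape `a = γ`, `loc = μ` and `scale = σ` uses the same parametrization as $S N (μ, σ, γ)$ .

```python
import numpy as np
from scipy.special import ndtr, owens_t
from scipy.stats import skewnorm


def sn_pdf(x, mu, sigma, gamma):
    """Density (1): (2 / sigma) * phi(z) * Phi(gamma * z), with z = (x - mu) / sigma."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    phi = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return 2.0 / sigma * phi * ndtr(gamma * z)


def sn_cdf(x, mu, sigma, gamma):
    """Distribution function (2): Phi(z) - 2 * T(z, gamma), T being Owen's function."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return ndtr(z) - 2.0 * owens_t(z, gamma)


def sn_moments(mu, sigma, gamma):
    """Mean, variance and skewness coefficient gamma_1 from the formulas above."""
    delta = gamma / np.sqrt(1.0 + gamma**2)
    mean = mu + sigma * delta * np.sqrt(2.0 / np.pi)
    var = sigma**2 * (1.0 - 2.0 * delta**2 / np.pi)
    skew = (4.0 - np.pi) * gamma**3 / (
        2.0 * (np.pi / 2.0 + (np.pi / 2.0 - 1.0) * gamma**2) ** 1.5
    )
    return mean, var, skew


# Illustrative parameter values (not taken from the paper).
mu, sigma, gamma = 1.0, 2.0, 3.0
x = np.linspace(-4.0, 8.0, 13)
ref = skewnorm(gamma, loc=mu, scale=sigma)
print("max |pdf - scipy pdf|:", np.max(np.abs(sn_pdf(x, mu, sigma, gamma) - ref.pdf(x))))
print("max |cdf - scipy cdf|:", np.max(np.abs(sn_cdf(x, mu, sigma, gamma) - ref.cdf(x))))
print("mean, variance, skewness:", sn_moments(mu, sigma, gamma))
```

Setting `gamma = 0.0` in `sn_moments` returns the mean μ, the variance $σ^{2}$ and zero skewness, in line with the normal special case.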

Given a random sample $X_{1}, \dots, X_{n}$ from a skewed population, we can fit the SN distribution by estimating the parameters $θ = (μ, σ, γ)^{T}$ based on the observed data and the subsequent inference can be done based upon these estimates. The usual method of estimation under SN model is the maximum likelihood estimator (MLE) which is asymptotically the most efficient at the model. But, a major drawback of the MLE is its extreme non-robust nature against data contaminations, outliers or model misspecifications; this further makes all the MLE-based inference highly unstable yielding incorrect insights. However, it is not unusual to have some outlying observations in modern complex datasets due to several external or erroneous factors/activities. Hence, a robust inference procedure automatically taking care of the noises (outliers) in the data would be of great practical value to produce stable and more precise research insights leading to better policy formulation. To further motivate our work in the context of health data analyses, let us consider the following real data example.

A motivating example (AIS data):

We consider the data on health measurements of 706 Australian athletes from 12 different sports which were collected at the Australian Institute of Sports (AIS) in 1990 by Telford and Cunningham [53] to investigate the relationships of the five routine hematological measures, namely, the hemoglobin concentration (HC), hematocrit (H), red cell count (RCC), white cell count (WCC) and plasma ferritin concentration (PFC) in the blood of these athletes with their height (Ht), weight (Wt) and the sports type. These measurements were recorded on 1604 occasions in total, based on blood samples collected from the athletes' forearm veins amidst periods of moderate to intense training but at least 6 h after a training session. Some important derived health measurements like body-mass index (BMI) and lean body mass (LBM) are also reported. The data were later used by several researchers in different statistical inference problems; in particular, a few of them fitted the SN distribution with the MLE, but only to a few measurements and/or a part of the data [43,61].

Let us here consider eight important health measurements, namely, HC, RCC, WCC, PFC, BMI, LBM, Ht and Wt, from 202 athletes as available in the R package ‘DAAG’, and plot the corresponding histograms and box-plots in Figure 2. From the figure, it may be seen that several of these variables exhibit clearly skewed patterns. For example, RCC, WCC, PFC, BMI and LBM have mild to pronounced positive skewness. Ht, on the other hand, has negative skewness. The other two, HC and Wt, are more difficult to judge visually. The SN family of distributions, therefore, can be used to model all these health measurements. However, in all the cases, the respective box-plots reveal one or more outlying values which make the MLE and the associated inference highly unstable. The MLE-based fits are also shown in the figures along with the histograms, which clearly show the inability of the MLE to adequately model the bulk of the data due to the presence of a few outlying points. In particular, the fitted distributions (by MLE) have a somewhat different mode and skewness compared to the majority of the empirical data for the measurements HC, PFC, BMI, LBM and Ht due to strong outlier effects. We have also computed the MLE after removing the outliers identified through the respective box-plots; these are presented in Table 6 along with the full data MLE (and their standard errors), and the changes in the estimates due to the presence of outliers are quite drastic in most cases. Although there is only one outlier in HC, its effect is quite dramatic in that the deletion of this single observation leads to a reversal in the sign of γ; the MLE of γ for the full data (with the outlier) is 0.9655 and any standard testing procedure based on the MLE will reject the hypothesis of negative skewness of this distribution, although the removal of this single outlier produces a value of $- 1.7941$ as the MLE of γ. The MLE of σ increases drastically for PFC, BMI and Wt due to the presence of outliers (73.8403, 4.1327 and 17.6825, respectively, for the full data compared to the corresponding MLEs 57.6705, 2.3489 and 13.0243 for the outlier deleted data) and hence results in an inadequate fit for their mode. These examples clearly illustrate the non-robust and unstable nature of the MLE-based inference under the SN distribution in the presence of outliers, leading to contradictory insights!

Figure 2. — Histograms and box-plots of different Health indicator variables from the AIS Data. The SN distribution fitted by the MLE is also shown along with the histograms. (a) Hemoglobin Conc. (HC), (b) red cell count (RCC), (c) white cell count (WCC), (d) plasma Ferritin Conc. (PFC), (e) Body Mass Index (BMI), (f) lean body mass (LBM), (g) height (Ht) and (h) weight (Wt)

Table 6.

The SN parameter estimates (standard errors) for the measurements in AIS data, obtained through MDPDEs at different α, the MLE (at $α = 0$ ) and the outlier deleted MLE. The number of outliers found for each measurement is reported in parentheses below its name in the first column.

		α
Variable(Outlier)		0(MLE)	0.1	0.3	0.5	0.7	1.0	Outlier deleted MLE
HC	μ	40.664	43.387	45.384	46.382	46.382	46.383	46.440
(1)		(0.187)	(0.339)	(0.546)	(0.670)	(1.084)	(1.463)	(0.336)
	σ	4.387	4.883	4.881	4.876	4.876	4.876	4.880
		(0.185)	(0.330)	(0.438)	(0.641)	(1.300)	(1.802)	(0.333)
	γ	0.966	−1.752	−1.762	−1.766	−1.766	−1.766	−1.794
		(0.104)	(0.472)	(0.661)	(0.942)	(1.354)	(1.979)	(0.449)
RCC	μ	4.296	4.543	4.539	4.525	4.528	4.529	4.543
(1)		(0.020)	(0.021)	(0.022)	(0.024)	(0.027)	(0.028)	(0.018)
	σ	0.622	0.525	0.530	0.504	0.521	0.533	0.466
		(0.024)	(0.026)	(0.027)	(0.028)	(0.031)	(0.035)	(0.022)
	γ	1.607	0.506	0.502	0.500	0.499	0.498	0.499
		(0.086)	(0.090)	(0.095)	(0.100)	(0.107)	(0.153)	(0.086)
WCC	μ	5.106	5.481	5.472	5.471	5.471	5.471	5.475
(4)		(0.098)	(0.099)	(0.105)	(0.106)	(0.119)	(0.156)	(0.094)
	σ	2.690	2.311	2.197	2.193	2.191	2.191	2.184
		(0.105)	(0.109)	(0.125)	(0.126)	(0.147)	(0.196)	(0.099)
	γ	2.727	1.716	1.712	1.712	1.712	1.712	1.703
		(0.146)	(0.164)	(0.164)	(0.166)	(0.216)	(0.250)	(0.143)
PFC	μ	20.244	23.197	23.196	23.196	23.196	23.196	23.226
(12)		(1.519)	(1.595)	(1.742)	(1.834)	(1.977)	(2.200)	(1.546)
	σ	73.840	67.835	61.832	57.832	57.832	57.832	57.671
		(2.905)	(3.178)	(3.663)	(3.689)	(4.386)	(4.868)	(2.934)
	γ	9.143	7.096	6.097	6.097	6.097	6.097	6.066
		(0.361)	(0.606)	(0.628)	(0.854)	(1.097)	(1.783)	(0.402)
BMI	μ	19.970	21.344	22.294	22.291	22.291	22.291	22.315
(7)		(0.066)	(0.067)	(0.072)	(0.077)	(0.079)	(0.169)	(0.060)
	σ	4.133	2.646	2.386	2.369	2.368	2.369	2.349
		(0.125)	(0.132)	(0.136)	(0.146)	(0.159)	(0.190)	(0.117)
	γ	2.313	1.227	0.595	0.194	0.195	0.194	0.174
		(0.084)	(0.087)	(0.092)	(0.096)	(0.105)	(0.216)	(0.086)
LBM	μ	50.383	50.953	50.953	50.953	50.953	50.953	50.958
(1)		(0.765)	(0.796)	(0.801)	(0.854)	(0.959)	(1.351)	(0.768)
	σ	19.493	18.726	18.727	18.726	18.726	18.726	18.718
		(0.856)	(0.886)	(0.955)	(1.124)	(1.349)	(1.978)	(0.840)
	γ	2.424	2.197	2.197	2.197	2.197	2.197	2.195
		(0.106)	(0.199)	(0.199)	(0.212)	(0.229)	(0.531)	(0.106)
Ht	μ	187.072	184.794	184.794	184.794	184.794	184.794	184.771
(3)		(0.582)	(0.612)	(0.623)	(0.647)	(0.653)	(1.095)	(0.571)
	σ	11.952	10.115	10.113	10.113	10.112	10.112	10.094
		(0.321)	(0.339)	(0.384)	(0.435)	(0.517)	(0.517)	(0.315)
	γ	−1.074	−0.673	−0.676	−0.677	−0.678	−0.678	−0.674
		(0.154)	(0.166)	(0.178)	(0.185)	(0.195)	(0.284)	(0.153)
Wt	μ	64.066	69.063	72.063	72.063	72.063	72.062	72.143
(4)		(0.378)	(0.395)	(0.416)	(0.437)	(0.495)	(0.758)	(0.373)
	σ	17.682	14.076	13.072	13.071	13.071	13.071	13.024
		(0.646)	(0.686)	(0.741)	(0.748)	(0.805)	(0.874)	(0.645)
	γ	1.232	0.848	0.450	0.250	0.250	0.250	0.240
		(0.085)	(0.087)	(0.092)	(0.097)	(0.104)	(0.120)	(0.085)

However, the use of SN distribution for modeling the health measurements and its skewness is indeed justifiable through its distributional structure as well as technically from the concept of selective sampling [3,5]. For a brief explanation, suppose that we want to model a health measurement variable $U_{1}$ , which is assumed to be standardized for simplicity. In most cases, health data are collected from a random sample of an appropriately defined subpopulation satisfying a minimum health standard; in the above example of AIS data, all observations are collected from trained athletes who are known to be healthier than others (in some appropriate health measurement scale). Suppose such a subpopulation is defined in terms of the condition $U_{0} > τ$ for a population random variable $U_{0}$ ; without loss of generality, we may assume $U_{0}$ to be also standardized. Assume $τ = 0$ , $U_{0}$ is normally distributed and has a correlation ρ with $U_{1}$ . Then, even if $U_{1}$ is normally distributed in the population, its distribution over the subpopulation, i.e. conditional distribution given $U_{0} > 0$ is indeed SN(γ) with $γ = ρ (1 - ρ^{2})^{- 1 / 2}$ . It can only be symmetric if the target variable is uncorrelated with sub-population defining variable $U_{0}$ .
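The selective sampling argument above can be checked by simulation. The sketch below is an illustration under stated assumptions, not part of the paper: it draws a standard bivariate normal pair $(U_{0}, U_{1})$ with correlation ρ, keeps $U_{1}$ only when $U_{0} > 0$ , and compares the retained values with $S N (γ)$ for $γ = ρ (1 - ρ^{2})^{- 1 / 2}$ . The helper names, the value `rho = 0.7`, the sample size and the seed are arbitrary choices.

```python
import numpy as np
from scipy.special import ndtr, owens_t
from scipy.stats import kstest


def selected_sample(rho, n, rng):
    """Draw n standard bivariate normal pairs with correlation rho; keep U1 where U0 > 0."""
    u0 = rng.standard_normal(n)
    u1 = rho * u0 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    return u1[u0 > 0.0]


def sn_cdf_standard(x, gamma):
    """Standard SN(gamma) distribution function, Phi(x) - 2 * T(x, gamma), as in (2)."""
    return ndtr(x) - 2.0 * owens_t(x, gamma)


rho = 0.7                                   # illustrative correlation
gamma = rho / np.sqrt(1.0 - rho**2)         # implied SN shape parameter
rng = np.random.default_rng(12345)          # fixed seed, for reproducibility only
u1_sel = selected_sample(rho, 200_000, rng)

# With this gamma, delta = gamma / sqrt(1 + gamma^2) equals rho, so the SN(gamma)
# mean is rho * sqrt(2 / pi) and its variance is 1 - 2 * rho^2 / pi.
ks = kstest(u1_sel, lambda x: sn_cdf_standard(x, gamma)).statistic
print("sample mean / SN mean:", u1_sel.mean(), rho * np.sqrt(2.0 / np.pi))
print("sample var  / SN var :", u1_sel.var(), 1.0 - 2.0 * rho**2 / np.pi)
print("Kolmogorov-Smirnov distance to SN(gamma):", ks)
```

Setting `rho = 0.0` gives `gamma = 0.0`, so the selected subpopulation is again standard normal, which matches the remark that symmetry requires $U_{1}$ to be uncorrelated with $U_{0}$ .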

Thus, the SN distribution is a natural and often unavoidable choice in health data analyses, and it is the use of the MLE that makes the ultimate inference erroneous under data contamination. It is therefore important to develop an appropriate robust inference methodology for the SN distribution family. Unfortunately, little attention has been paid to this issue in the literature, except for a few isolated attempts for some particular applications only [8,30,44,64]. In this paper, we develop a simple yet highly efficient robust inference procedure for the SN distribution that can even be generalized to any complex inference problem associated with skewed data quite easily.

It is also possible to estimate skewness nonparametrically [e.g. 12,13,31,37], so a few brief words are necessary here to argue in favor of the parametric approach considered in this paper. It is well known that parametric inference, when based on properly specified models, is more efficient (sometimes significantly more efficient) than the corresponding nonparametric methods. Evidently, no model fits a set of real data exactly, so the model specification becomes important. However, robust parametric methods have the advantage that they remain useful and informative even when the specified parametric model is only approximately correct (and are therefore not as restrictive as the strictly defined parametric model in classical inference). Thus we trust that unless the data are wildly discrepant in comparison to the skew normal model, our method will perform better than the available nonparametric solutions to this problem. This will be justified further via numerical simulation studies, where the significantly superior performance of our proposed parametric estimators is illustrated in comparison to the two most recent existing nonparametric estimates of skewness from [12,37].

Among several approaches to robust inference, we consider the minimum distance approach to estimate the parameters of the SN distribution by minimizing an appropriate divergence (distance) measure between the data and the model density. In particular, we consider the density power divergence (DPD) measure [9] which has lately been extremely popular because of generating extremely robust estimators along with high asymptotic efficiency [10,11,20–25]. In this paper, we first define the minimum DPD estimator (MDPDE) of the parameters of the SN distribution based on a random sample and discuss its asymptotic properties like consistency and asymptotic normality. Their asymptotic variance can then be consistently estimated to obtain the standard errors of our proposed estimators and their robustness properties are discussed through the influence function analysis. Since there are complexities in the computation of the MLE itself for SN distribution [61], the computation of the MDPDE is also challenging; we have developed an efficient algorithm for this purpose using the concept of Genetic Algorithms [48]. We next develop a robust Wald-type test based on the proposed MDPDE along with their asymptotic and robustness properties. The important particular case of testing the hypothesis of symmetry ( $γ = 0$ ) under the SN alternatives is discussed in great detail. The fixed-sample performances of the proposed estimation and testing procedures are illustrated through extensive simulation studies. Our proposals are then applied to reanalyze the motivating AIS dataset as well as to analyze data from an AIDS clinical trial for robust inferential insights. Finally, the paper ends with some concluding discussion about our work and its possible future extensions.

2. The minimum DPD estimation for the SN distributions

2.1. Estimating equation

The DPD family [9] is indexed by a single tuning parameter $α \geq 0$ , controlling the trade-off between robustness and efficiency. For two densities g and f, both being absolutely continuous with respect to some common dominating measure μ, the DPD measure between g and f is defined as

\begin{aligned} d_{α} (g, f) & = \int \{f^{1 + α} - (1 + \frac{1}{α}) g f^{α} + \frac{1}{α} g^{1 + α}\} d μ, if α > 0, \end{aligned}

(3)

\begin{aligned} d_{0} (g, f) & = lim_{α ↓ 0} d_{α} (g, f) = \int g \log \frac{g}{f} d μ . \end{aligned}

(4)

Note that, the DPD at $α = 0$ is nothing but the well-known Kullback–Leibler divergence (KLD) associated with the likelihood approach. We need to minimize the DPD measure between the estimated data density and the postulated model density to obtain the ‘best fitted’ model and the corresponding parameter estimates.
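To make the definitions (3) and (4) concrete, the following sketch evaluates the DPD between two densities on an equally spaced grid and checks numerically that it approaches the Kullback–Leibler divergence as α decreases to zero. It is only an illustration: the function name `dpd`, the grid, and the choice of $g = N (0, 1)$ and $f = N (1, 1)$ (for which the KLD equals 0.5) are assumptions made here, not part of the paper.

```python
import numpy as np
from scipy.stats import norm


def dpd(g, f, alpha, grid):
    """Density power divergence d_alpha(g, f) of (3)-(4), by a sum on an equally spaced grid."""
    dx = grid[1] - grid[0]
    gx, fx = g(grid), f(grid)
    if alpha == 0.0:
        # Kullback-Leibler limit (4); grid points where g is exactly 0 contribute nothing
        mask = gx > 0.0
        return np.sum(gx[mask] * (np.log(gx[mask]) - np.log(fx[mask]))) * dx
    integrand = (
        fx ** (1.0 + alpha)
        - (1.0 + 1.0 / alpha) * gx * fx**alpha
        + (1.0 / alpha) * gx ** (1.0 + alpha)
    )
    return np.sum(integrand) * dx


grid = np.linspace(-12.0, 12.0, 4801)   # wide enough that both densities vanish at the ends
g = norm(0.0, 1.0).pdf                  # "true" density (illustrative)
f = norm(1.0, 1.0).pdf                  # "model" density (illustrative)

for alpha in (1.0, 0.5, 0.1, 1e-3, 0.0):
    print(f"alpha = {alpha:6.3f}   d_alpha(g, f) = {dpd(g, f, alpha, grid):.6f}")
```

For densities that are not negligible outside the grid, the grid limits would need to be widened accordingly.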

Suppose we have a random sample $X_{1}, \dots, X_{n}$ from a population having true density g with the associated distribution function G (with the associated measure μ being the Lebesgue measure). We wish to model them by the SN distribution having density $f_{θ}$ and distribution function $F_{θ}$ , which are given in (1) and (2), respectively. Then, the minimum DPD estimator (MDPDE) of the unknown model parameter $θ$ is to be obtained by minimizing $d_{α} (\hat{g}, f_{θ})$ over the parameter space $Θ = R \times R^{+} \times R$ , where $\hat{g}$ is an estimate of g based on the observed sample. One major advantage of the DPD measure is that we can avoid estimating density g by nonparametric smoothing, which often has several complications like bandwidth selection, curse of dimensionality, etc. This is because we can rewrite the form of the DPD from (3), using the relation $g d μ = d G$ (since g is an absolutely continuous density of G with respect to the dominating measure μ), as

d_{α} (g, f_{θ}) = \int f_{θ}^{1 + α} d μ - (1 + \frac{1}{α}) \int f_{θ}^{α} d G + K,

where the last term $K = \frac{1}{α} \int g^{1 + α} d μ$ is independent of the parameter $θ$ and has no effect in our target minimization with respect to $θ \in Θ$ . Noting that the second term can be estimated just by plugging in the empirical estimate of G, namely the empirical CDF obtained based on the sample $X_{1}, \dots, X_{n}$ , the MDPDE can be obtained by minimizing the simpler objective function

H_{n} (θ) = \int f_{θ}^{1 + α} d μ - (1 + \frac{1}{α}) \frac{1}{n} \sum_{i = 1}^{n} f_{θ} (X_{i})^{α} .

For the present case of SN distribution, using the form of $f_{θ}$ from (1) with μ being the usual Lebesgue measure, the above MDPDE objective function has the form

\begin{aligned} H_{n} (θ) = H_{n} (μ, σ, γ) & = \int_{- \infty}^{\infty} {(\frac{2}{σ})}^{α + 1} ϕ {(\frac{x - μ}{σ})}^{α + 1} Φ {(γ \frac{x - μ}{σ})}^{α + 1} d x \\ - (1 + \frac{1}{α}) \frac{1}{n} \sum_{i = 1}^{n} {(\frac{2}{σ})}^{α} ϕ {(\frac{x_{i} - μ}{σ})}^{α} Φ {(γ \frac{x_{i} - μ}{σ})}^{α} . \end{aligned}

(5)

Note that, the integral part of the objective function (5) does not have a tractable closed-form expression, and hence we need to compute it numerically during the simultaneous minimization of $H_{n} (μ, σ, γ)$ with respect to the three parameters $(μ, σ, γ)$ . By standard differentiation, we get the estimating equations of our MDPDE as given by

\frac{1}{n} \sum_{i = 1}^{n} u_{θ} (X_{i}) {(\frac{2}{σ})}^{α} ϕ {(\frac{x_{i} - μ}{σ})}^{α} Φ {(γ \frac{x_{i} - μ}{σ})}^{α} = ξ_{α} (θ),

(6)

where $u_{θ} (x) = \frac{\partial}{\partial θ} \log f_{θ} (x)$ is the score function of the SN distribution and has the form

\begin{aligned} u_{θ} (x) & = \left( \frac{x - μ}{σ^{2}} - \frac{γ ϕ (γ \frac{x - μ}{σ})}{σ Φ (γ \frac{x - μ}{σ})}, \; \frac{(x - μ)^{2}}{σ^{3}} - γ \frac{(x - μ)}{σ^{2}} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})} - \frac{1}{σ}, \right. \\ & \qquad \left. \frac{(x - μ)}{σ} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})} \right)^{T}, \end{aligned}

(7)

and

ξ_{α} (θ) = \int u_{θ} (x) f_{θ} (x)^{α + 1} d x = {(ξ_{α}^{(1)} (θ), ξ_{α}^{(2)} (θ), ξ_{α}^{(3)} (θ))}^{T} .

(8)

Clearly, there is no closed form solution of the above MDPDE estimating equations in (6) and we need to solve them numerically in order to obtain the MDPDEs based on a given sample. An efficient method for the computation of the MDPDE is discussed later in Section 3.
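The objective function (5) is straightforward to evaluate numerically. The sketch below is a minimal illustration, assuming α > 0 and approximating the integral term by a sum over an equally spaced grid spanning $μ \pm 12 σ$ ; the names `sn_pdf` and `mdpde_objective`, the grid settings and the simulated data are assumptions made here and are not taken from the paper. The last lines compare the value of $H_{n}$ at a fixed parameter value with and without one gross outlier.

```python
import numpy as np
from scipy.special import ndtr


def sn_pdf(x, mu, sigma, gamma):
    """SN density (1)."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return 2.0 / sigma * np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) * ndtr(gamma * z)


def mdpde_objective(theta, x, alpha, half_width=12.0, n_grid=4001):
    """H_n(mu, sigma, gamma) of (5) for alpha > 0; the integral is a sum on a grid."""
    mu, sigma, gamma = theta
    if sigma <= 0.0 or alpha <= 0.0:
        return np.inf  # illustrative convention for points outside the admissible range
    grid = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n_grid)
    dx = grid[1] - grid[0]
    integral = np.sum(sn_pdf(grid, mu, sigma, gamma) ** (1.0 + alpha)) * dx
    data_term = np.mean(sn_pdf(x, mu, sigma, gamma) ** alpha)
    return integral - (1.0 + 1.0 / alpha) * data_term


# Simulated SN(0, 1, 2) data via delta * |Z0| + sqrt(1 - delta^2) * Z1 (illustrative only).
rng = np.random.default_rng(0)
gamma_true = 2.0
delta = gamma_true / np.sqrt(1.0 + gamma_true**2)
x_clean = (delta * np.abs(rng.standard_normal(300))
           + np.sqrt(1.0 - delta**2) * rng.standard_normal(300))
x_contaminated = np.append(x_clean, 50.0)   # one gross outlier

theta_true = (0.0, 1.0, gamma_true)
h_clean = mdpde_objective(theta_true, x_clean, alpha=0.5)
h_contaminated = mdpde_objective(theta_true, x_contaminated, alpha=0.5)
print("H_n without outlier:", h_clean)
print("H_n with outlier   :", h_contaminated)
```

Because each observation enters (5) only through $f_{θ} (X_{i})^{α}$ , a point far in the tail contributes almost nothing for α > 0, so the two printed values differ only slightly.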

It is important to note that the MDPDE is indeed an M-estimator, since its estimating equation can be written in the form $\sum_{i} ψ (X_{i}, θ) = 0$ for a model-based ψ-function; see Equation (6) to identify it. Further, as $α \to 0$ , we have $\int f_{θ}^{1 + α} d μ \to 1$ and $f_{θ} (X_{i})^{α} = 1 + α \log f_{θ} (X_{i}) + O (α^{2})$ , so the MDPDE objective function in (5) satisfies $[H_{n} (θ) + \frac{1}{α}] \to - \frac{1}{n} \sum_{i = 1}^{n} \log f_{θ} (X_{i})$ , the negative of the (average) log-likelihood function, and the MDPDE estimating equation in (6) coincides with the usual score equation leading to the MLE. Hence, the MDPDEs at $α > 0$ can be thought of as a generalization of the MLE to achieve greater robustness against data contamination.

2.2. Asymptotic efficiency and standard error

The asymptotic distribution of the MDPDE for the present case of the SN distribution can easily be obtained from its general theory or the M-estimation theory. In particular, the minimum DPD estimators are $\sqrt{n}$ -consistent and asymptotically normal. At a given $α \geq 0$ , if the corresponding MDPDE obtained based on a random sample of size n is denoted by ${\hat{θ}}_{α, n}$ , and the true parameter value is $θ_{0}$ , we have

\sqrt{n} ({\hat{θ}}_{α, n} - θ_{0}) \overset{D}{\to} N_{3} (0_{3}, Σ_{α} (θ)),

where $0_{p}$ is a p-vector with all entries zero and $Σ_{α} (θ) = J_{α} (θ)^{- 1} K_{α} (θ) J_{α} (θ)^{- 1} .$ Here, for SN distributions with $θ = (μ, σ, γ)^{T}$ , the $3 \times 3$ matrices $K_{α} (θ)$ and $J_{α} (θ)$ are given by

\begin{aligned} J_{α} (θ) & = \int u_{θ} (x) u_{θ}^{T} (x) f_{θ} (x)^{α + 1} d x = (\begin{matrix} \begin{array}{ccc} N_{α}^{(11)} (θ) & N_{α}^{(12)} (θ) & N_{α}^{(13)} (θ) \\ N_{α}^{(12)} (θ) & N_{α}^{(22)} (θ) & N_{α}^{(23)} (θ) \\ N_{α}^{(13)} (θ) & N_{α}^{(23)} (θ) & N_{α}^{(33)} (θ) \end{array} \end{matrix}), \end{aligned}

(9)

\begin{aligned} K_{α} (θ) & = \int u_{θ} (x) u_{θ}^{T} (x) f_{θ} (x)^{2 α + 1} d x - ξ_{α} (θ) ξ_{α} (θ)^{T} \\ & = (\begin{matrix} N_{2 α}^{(11)} (θ) - ξ_{α}^{(1)} (θ)^{2} & N_{2 α}^{(12)} (θ) - ξ_{α}^{(1)} (θ) ξ_{α}^{(2)} (θ) & N_{2 α}^{(13)} (θ) - ξ_{α}^{(1)} (θ) ξ_{α}^{(3)} (θ) \\ N_{2 α}^{(12)} (θ) - ξ_{α}^{(1)} (θ) ξ_{α}^{(2)} (θ) & N_{2 α}^{(22)} (θ) - ξ_{α}^{(2)} (θ)^{2} & N_{2 α}^{(23)} (θ) - ξ_{α}^{(2)} (θ) ξ_{α}^{(3)} (θ) \\ N_{2 α}^{(13)} (θ) - ξ_{α}^{(1)} (θ) ξ_{α}^{(3)} (θ) & N_{2 α}^{(23)} (θ) - ξ_{α}^{(2)} (θ) ξ_{α}^{(3)} (θ) & N_{2 α}^{(33)} (θ) - ξ_{α}^{(3)} (θ)^{2} \end{matrix}), \end{aligned}

(10)

where $ξ_{α} (θ)$ and $f_{θ}$ are as defined in (8) and (1), respectively, and

\begin{aligned} N_{α}^{(11)} (θ) & = \int_{- \infty}^{\infty} ((\frac{x - μ}{σ^{2}}) - \frac{γ}{σ} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})})^{2} f_{θ} (x)^{α + 1} d x \\ N_{α}^{(22)} (θ) & = \int_{- \infty}^{\infty} (\frac{(x - μ)^{2}}{σ^{3}} - \frac{1}{σ} - \frac{γ (x - μ)}{σ^{2}} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})})^{2} f_{θ} (x)^{α + 1} d x \\ N_{α}^{(33)} (θ) & = \int_{- \infty}^{\infty} (\frac{x - μ}{σ} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})})^{2} f_{θ} (x)^{α + 1} d x \\ N_{α}^{(12)} (θ) & = \int_{- \infty}^{\infty} ((\frac{x - μ}{σ^{2}}) - \frac{γ}{σ} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})}) \\ \times (\frac{(x - μ)^{2}}{σ^{3}} - \frac{1}{σ} - \frac{γ (x - μ)}{σ^{2}} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})}) f_{θ} (x)^{α + 1} d x \\ N_{α}^{(13)} (θ) & = \int_{- \infty}^{\infty} ((\frac{x - μ}{σ^{2}}) - \frac{γ}{σ} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})}) (\frac{x - μ}{σ} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})}) f_{θ} (x)^{α + 1} d x \\ N_{α}^{(23)} (θ) & = \int_{- \infty}^{\infty} (\frac{(x - μ)^{2}}{σ^{3}} - \frac{1}{σ} - \frac{γ (x - μ)}{σ^{2}} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})}) \\ \times (\frac{x - μ}{σ} \frac{ϕ (γ \frac{x - μ}{σ})}{Φ (γ \frac{x - μ}{σ})}) f_{θ} (x)^{α + 1} d x . \end{aligned}

The above integrals do not have closed forms, but we can calculate them numerically to compute the asymptotic variance matrix at different given values of α and $θ$ . Based on these formulas, we can study the asymptotic relative efficiencies (AREs) of the proposed MDPDEs, which are presented in Table 1 at different values of $θ = (μ, σ, γ)^{T}$ for the SN distribution. Note that these AREs decrease with increasing α, but the loss in efficiency is small at small positive α; for instance, at $α = 0.1$ all the AREs in Table 1 remain above 94%.

Table 1.

Asymptotic relative efficiency of the MDPDEs of $(μ, σ, γ)^{T}$ in different SN distributions.

	α
Distribution		0(MLE)	0.05	0.1	0.2	0.3	0.5	0.7	1
SN(0,1,1)	μ	100	99.76	98.13	94.77	86.08	77.26	68.19	58.13
	σ	100	99.10	95.45	91.40	86.93	76.20	64.70	52.16
	γ	100	98.94	95.51	92.42	90.25	84.24	76.18	65.20
SN(0,1,0)	μ	100	99.41	98.22	92.81	85.34	77.00	68.92	57.23
	σ	100	98.87	96.11	91.09	85.66	74.39	63.91	52.82
	γ	100	99.24	98.57	92.86	91.34	83.76	76.82	66.95
SN(0,1,-1)	μ	100	99.07	96.48	91.34	85.55	79.75	68.36	58.44
	σ	100	98.84	95.37	90.66	84.18	76.68	65.48	54.90
	γ	100	98.17	94.96	91.23	89.96	81.19	72.97	65.69

The above asymptotic variance formula can also help us to obtain the standard errors of the MDPDEs in any practical application. For the MDPDE ${\hat{θ}}_{α, n} = ({\hat{μ}}_{α, n}, {\hat{σ}}_{α, n}, {\hat{γ}}_{α, n})^{T}$ , obtained based on a sample of size n, its standard errors are given by $\sqrt{Σ_{α}^{(11)} (θ_{0}) / n}$ , $\sqrt{Σ_{α}^{(22)} (θ_{0}) / n}$ and $\sqrt{Σ_{α}^{(33)} (θ_{0}) / n}$ , respectively, where $Σ_{α}^{(i j)} (θ)$ denotes the $(i, j)$ -th element of the asymptotic variance matrix $Σ_{α} (θ)$ for i, j = 1, 2, 3. A consistent estimate of $Σ_{α} (θ)$ is given by $Σ_{α} ({\hat{θ}}_{α, n})$ , from which we can easily estimate (consistently) the standard errors of the MDPDEs of each parameter $μ, σ$ and γ.
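The sandwich formula $Σ_{α} (θ) = J_{α} (θ)^{- 1} K_{α} (θ) J_{α} (θ)^{- 1}$ and the resulting standard errors can be computed along the following lines. This is a hedged sketch rather than the authors' implementation: the integrals in (8)–(10) are approximated by sums over an equally spaced grid on $μ \pm 12 σ$ , the ratio ϕ/Φ in the score (7) is evaluated on the log scale for numerical stability, and all function names, the sample size `n = 202` and the parameter values in the example are illustrative assumptions.

```python
import numpy as np
from scipy.special import ndtr, log_ndtr


def sn_pdf(x, mu, sigma, gamma):
    """SN density (1)."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return 2.0 / sigma * np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) * ndtr(gamma * z)


def sn_score(x, mu, sigma, gamma):
    """Score function (7); returns an array of shape (3, len(x))."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    t = gamma * z
    # phi(t) / Phi(t), computed on the log scale to stay stable in the left tail
    ratio = np.exp(-0.5 * t**2 - 0.5 * np.log(2.0 * np.pi) - log_ndtr(t))
    u_mu = z / sigma - gamma * ratio / sigma
    u_sigma = z**2 / sigma - 1.0 / sigma - gamma * z * ratio / sigma
    u_gamma = z * ratio
    return np.vstack([u_mu, u_sigma, u_gamma])


def asymptotic_variance(mu, sigma, gamma, alpha, half_width=12.0, n_grid=8001):
    """Sigma_alpha = J^{-1} K J^{-1}, with xi, J, K of (8)-(10) approximated on a grid."""
    grid = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n_grid)
    dx = grid[1] - grid[0]
    u = sn_score(grid, mu, sigma, gamma)
    f = sn_pdf(grid, mu, sigma, gamma)
    xi = (u * f ** (1.0 + alpha)).sum(axis=1) * dx
    J = (u * f ** (1.0 + alpha)) @ u.T * dx
    K = (u * f ** (1.0 + 2.0 * alpha)) @ u.T * dx - np.outer(xi, xi)
    J_inv = np.linalg.inv(J)
    return J_inv @ K @ J_inv


# Asymptotic relative efficiencies (in %) relative to the MLE (alpha = 0) at SN(0, 1, 1).
sigma_mle = asymptotic_variance(0.0, 1.0, 1.0, alpha=0.0)
for alpha in (0.1, 0.5, 1.0):
    sigma_a = asymptotic_variance(0.0, 1.0, 1.0, alpha)
    print(alpha, 100.0 * np.diag(sigma_mle) / np.diag(sigma_a))

# Standard errors for a hypothetical estimate theta_hat from a sample of size n.
n, theta_hat, alpha = 202, (0.0, 1.0, 1.0), 0.5
std_err = np.sqrt(np.diag(asymptotic_variance(*theta_hat, alpha)) / n)
print("standard errors of (mu, sigma, gamma):", std_err)
```

In practice `theta_hat` would be the MDPDE computed from the data, which gives the plug-in estimate $Σ_{α} ({\hat{θ}}_{α, n})$ described above.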

2.3. Robustness: influence function analysis

The robustness of an estimator can be theoretically examined through the classical influence function (IF) analysis [29]. The IF measures the asymptotic (standardized) bias of the estimator caused by an infinitesimal contamination at a distant contamination point (say y). Therefore, boundedness of the IF over the contamination point y keeps the possible bias of the corresponding estimator finite, indicating its robust nature (sometimes also referred to as B-robustness to emphasize the boundedness of the bias). On the other hand, an unbounded IF indicates possible unbounded bias and non-robustness of the estimator. Further, with similar intuition, the supremum of the absolute IF taken over all possible contamination points naturally indicates the extent of (bias) robustness of the corresponding estimator.

From the theory of M-estimators [29] or that of the general MDPDE [9,11], one can obtain the influence function of the MDPDE functional, say $T_{α}$ for a tuning parameter α, under the present case of the SN distribution, which is given by

I F (y, T_{α}, θ) = J_{α} (θ)^{- 1} [u_{θ} (y) f_{θ} (y)^{α} - ξ_{α} (θ)],

(11)

where $ξ_{α}$ and $J_{α}$ are defined as in (8) and (9), respectively. Now, the form of the SN density $f_{θ}$ in (1) and the corresponding score function $u_{θ}$ in (7) clearly indicate that the above IF is bounded in y for all $α > 0$ and unbounded at $α = 0$ . We have presented the IFs of the three parameters $(μ, σ, γ)$ in Figure 3 for different values of α, which clearly illustrate their boundedness for $α > 0$ . This demonstrates the claimed robustness of the MDPDEs at any $α > 0$ and the well-known non-robustness of the MLE at $α = 0$ . Additionally, we can clearly observe the redescending nature of the IFs with increasing values of α which, in turn, indicates the greater extent of robustness with increasing α.
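The influence function (11) can be evaluated numerically in the same spirit. The sketch below is an illustration under assumptions made here (grid-based approximation of $ξ_{α}$ and $J_{α}$ on $μ \pm 12 σ$ , illustrative function names, and the model $S N (0, 1, 1)$ used in Figure 3); it prints the largest absolute IF component at a few contamination points y for the MLE and for one MDPDE.

```python
import numpy as np
from scipy.special import ndtr, log_ndtr


def sn_pdf(x, mu, sigma, gamma):
    """SN density (1)."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return 2.0 / sigma * np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi) * ndtr(gamma * z)


def sn_score(x, mu, sigma, gamma):
    """Score function (7); returns an array of shape (3, len(x))."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    t = gamma * z
    ratio = np.exp(-0.5 * t**2 - 0.5 * np.log(2.0 * np.pi) - log_ndtr(t))  # phi / Phi
    return np.vstack([
        z / sigma - gamma * ratio / sigma,
        z**2 / sigma - 1.0 / sigma - gamma * z * ratio / sigma,
        z * ratio,
    ])


def influence_function(y, mu, sigma, gamma, alpha, half_width=12.0, n_grid=8001):
    """IF(y, T_alpha, theta) of (11) at contamination points y; shape (3, len(y))."""
    grid = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n_grid)
    dx = grid[1] - grid[0]
    u = sn_score(grid, mu, sigma, gamma)
    w = u * sn_pdf(grid, mu, sigma, gamma) ** (1.0 + alpha)
    xi = w.sum(axis=1) * dx          # xi_alpha of (8)
    J = (w @ u.T) * dx               # J_alpha of (9)
    y = np.atleast_1d(np.asarray(y, dtype=float))
    psi = sn_score(y, mu, sigma, gamma) * sn_pdf(y, mu, sigma, gamma) ** alpha - xi[:, None]
    return np.linalg.solve(J, psi)


y = np.array([3.0, 6.0, 12.0, 50.0, 100.0])   # illustrative contamination points
for alpha in (0.0, 0.5):
    vals = influence_function(y, 0.0, 1.0, 1.0, alpha)
    print(f"alpha = {alpha}: max |IF| at each y =", np.abs(vals).max(axis=0))
```

For α > 0 the factor $f_{θ} (y)^{α}$ vanishes in the tails, so the computed IF settles at the constant $- J_{α} (θ)^{- 1} ξ_{α} (θ)$ for distant y, whereas at α = 0 it keeps growing with the score.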

Figure 3. — Influence functions (IF) of the MDPDEs of the parameters of the SN distribution at SN(0,1,1) for different α [Thin solid line: $α = 0$ (MLE); dashed line: $α = 0.1$ ; dotted line: $α = 0.3$ ; dash-dotted line: $α = 0.5$ ; thick solid line: $α = 1$ ].(a) IF for μ. (b) IF for σ. (c) IF for γ.

3. Computation of the MDPDE

In order to compute the MDPDE, we need to minimize the objective function in (5) simultaneously with respect to the three parameters $(μ, σ, γ)$ , or equivalently solve the three estimating equations given in (6). These are not straightforward numerical exercises due to the complex form of the objective function, and standard numerical procedures like the Newton-Raphson algorithm fail. This is also a known problem in the computation of the MLE for the SN distribution, for which some advanced numerical procedures have been tried in the literature. Here, we describe two possible efficient algorithms for the computation of our MDPDE at any given $α > 0$ .

3.1. Genetic algorithm

The genetic algorithm (GA) has been successfully applied for the computation of the MLE under the SN model by [61]. The GA is a useful and appropriately designed randomized search technique to find exact or approximate solutions in an optimization problem. Although John Holland first introduced this algorithm in 1960, it has become popular lately through the works of David Goldberg and others with diverse applications [27,48]. The name unsurprisingly came from its structural similarity with genetic mutations and crossover across generations following the basic principle of the Darwinian Theory of ‘Survival of the Fittest’. For an optimization problem, we need to consider an appropriate fitness function (often the objective function itself) which produces the fitness value of each possible (candidate) solution under the objective criterion. Then, in brief, the GA starts with an initial set of candidate solutions (chromosomes) and iterates over the subsequent generations to produce new sets of solutions (chromosomes) through recombination and mutation, where the solutions with better fitness values have a higher chance of being present in the subsequent generation so that the objective function is improved towards optimality.

To compute the MDPDE using GA, we consider the objective function $H_{n} (θ) = H_{n} (μ, σ, γ)$ as the fitness function, with lower values indicating greater fitness of the solution vector $θ = (μ, σ, γ)$. Then, the algorithm traverses through the following steps.

GA for Computation of the MDPDE:

Step 1. We start with an initial set of N candidate solutions denoted as $P^{(0)} = \{θ_{1}^{(0)}, θ_{2}^{(0)}, \dots, θ_{N}^{(0)}\}$. Set m = 0.
Step 2. Compute the fitness function $H_{n} (θ)$ for each solution in $P^{(m)} = \{θ_{1}^{(m)}, θ_{2}^{(m)}, \dots, θ_{N}^{(m)}\}$.
Step 3. From the set $P^{(m)}$, we choose some parent solutions to generate new solutions (offspring) through the ‘Fitness Proportionate Selection’ scheme, where the probability of selection is proportional to the (better) fitness values. Alternative schemes like ‘Roulette Wheel Selection’ or ‘Tournament Selection’ [15] can also be used.
Step 4. We form a new set of N candidate solutions, denoted as $P^{(m + 1)} = \{θ_{1}^{(m + 1)}, θ_{2}^{(m + 1)}, \dots, θ_{N}^{(m + 1)}\}$, for the next iteration (generation) using the following two steps:
1. We choose a specific number (say $N_{E}$) of elite solutions (survivors) from $P^{(m)}$ which are carried forward to the next iteration (generation) without any alterations. They are again chosen by the criterion of having the best fitness values.
2. For generating the remaining $(N - N_{E})$ candidate solutions for the next generation, we perform crossover and mutation operations (through some weighted combination) on the solutions from $P^{(m)}$ according to some pre-specified crossover probability ($P_{C}$) and mutation probability ($P_{M}$). The crossover leads the solutions towards convergence, while mutation increases diversity among the solutions to avoid being stuck at a local optimum.
Step 5. Set m = m + 1 and go to Step 2.
Step 6. Repeat Step 2 to Step 5 until an appropriate (pre-specified) convergence criterion is satisfied.

When stopped, the fittest solution in the last iteration (generation) is returned as the optimal solution (which is the required MDPDE).

Note that, in order to implement the above GA, we need to first specify the necessary tuning parameters $N, N_{E}, P_{C}, P_{M}$; it is suggested to take $N_{E}$ as $5\%$ of N, a higher value of $P_{C}$ and a lower value of $P_{M}$ for faster convergence [48]. In all our numerical experiments (simulation studies), we have used the R package ‘GA’ to implement the genetic algorithm with N = 50, $N_{E} = 2$, $P_{M} = 0.1$, $P_{C} = 0.8$ and a maximum of 5000 iterations (generations) as the stopping criterion. However, one challenge in using this approach is to choose appropriate values of these tuning parameters for any real-life application.
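The GA steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions and does not reproduce the internals of the R package ‘GA’: it uses binary tournament selection (one of the alternative schemes mentioned above), a random weighted average of two parents as crossover, Gaussian mutation clipped to user-supplied box bounds, and a fixed number of generations as the stopping rule. The names `genetic_minimize` and `toy_fitness` are hypothetical.

```python
import random

def genetic_minimize(fitness, bounds, pop_size=50, n_elite=2, p_cross=0.8,
                     p_mut=0.1, generations=200, seed=0):
    """Minimize `fitness` over a box; returns (best_solution, best_value)."""
    rng = random.Random(seed)

    def tournament(scored):
        # Binary tournament: the candidate with the lower objective value wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    # Step 1: random initial population inside the bounds.
    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        # Step 2: evaluate and rank the current generation.
        scored = sorted(((fitness(s), s) for s in population), key=lambda t: t[0])
        # Step 4.1: elites survive unchanged.
        next_pop = [list(s) for _, s in scored[:n_elite]]
        # Steps 3 and 4.2: selection, crossover and mutation for the rest.
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = list(p1)
            if rng.random() < p_cross:
                w = rng.random()
                child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < p_mut:
                    step = rng.gauss(0.0, 0.1 * (hi - lo))
                    child[j] = min(hi, max(lo, child[j] + step))
            next_pop.append(child)
        population = next_pop
    best = min(population, key=fitness)
    return best, fitness(best)

def toy_fitness(theta):
    """Toy convex objective with minimum at (1, -2, 0.5), for illustration only."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2 + (theta[2] - 0.5) ** 2

best, value = genetic_minimize(toy_fitness, [(-5.0, 5.0)] * 3)
```

For the MDPDE, `fitness` would be $H_{n} (θ)$ evaluated on the observed sample, with bounds chosen to cover plausible values of $(μ, σ, γ)$; the defaults mirror N = 50, $N_{E} = 2$, $P_{C} = 0.8$ and $P_{M} = 0.1$ used in the simulations.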

3.2. Gradient descent method

The method of gradient descent is another popular first-order iterative optimization algorithm, mostly used in machine learning [38,51]. To find the minimum of the objective function, this method progresses iteratively by updating the parameter values, taking steps proportional to the negative of the gradient (first-order derivative) of the objective function. For choosing these steps in each iteration, there are various types of algorithms available in the literature [46,55]. It is important to note here that this gradient descent approach might converge only to a local minimum depending on the initial parameter value considered; however, if the function is convex, which is mostly the case for our MDPDE, we expect to achieve the global minimum by starting with any reasonable initial value.

Considering again the MDPDE objective function $H_{n} (θ) = H_{n} (μ, σ, γ)$ , the gradient descent algorithm can be used to find its minimum, i.e. the required MDPDE, through the following steps:

Gradient Descent for Computation of the MDPDE:

Step 1. Start with an initial parameter value $θ_{0} = (μ_{0}, σ_{0}, γ_{0})$ and a step size (tuning parameter) $λ > 0$. Set m = 0.
Step 2. Calculate $\nabla H_{n} (θ_{m})$, the derivative of the function $H_{n} (θ)$ with respect to $θ$ evaluated at the point $θ_{m}$ (the solution at the m-th step of iteration).
Step 3. Update the solution at the $(m + 1)$-th step as: $θ_{m + 1} = θ_{m} - λ \nabla H_{n} (θ_{m})$.
Step 4. Set m = m + 1 and go to Step 2.
Step 5. Repeat Step 2 to Step 4 until an appropriate convergence criterion is satisfied.

Here, we only need to choose one tuning parameter λ for the gradient descent algorithm and there exist several suggestions for its optimum selection; see, e.g. [7,63]. For all our numerical illustrations here, we have taken λ = 0.04, the initial parameter value to be the maximum penalized likelihood estimate of $θ = (μ, σ, γ)$ obtained by using the R function ‘sn.mple’, and the convergence criterion to be no significant (relative) change in the objective function. It has been observed through extensive simulation studies that the gradient descent and the genetic algorithm perform quite similarly for the computation of the MDPDE under the SN model, with gradient descent taking significantly less computation time. Accordingly, the real data applications are performed using the gradient descent algorithm only.
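The gradient descent iteration with a fixed step size can be sketched as follows. This Python illustration approximates the gradient by central finite differences, which is an assumption made to keep the example short (an analytic gradient of $H_{n}$ could be used instead), and stops when the relative change in the objective falls below a tolerance. The names `gradient_descent`, `numerical_gradient` and `toy_objective` are hypothetical, and the toy objective merely stands in for $H_{n} (θ)$.

```python
def numerical_gradient(objective, theta, h=1e-5):
    """Central finite-difference approximation of the gradient at theta."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        grad.append((objective(up) - objective(down)) / (2.0 * h))
    return grad

def gradient_descent(objective, theta0, lam=0.04, rel_tol=1e-10, max_iter=10000):
    """Fixed-step gradient descent; returns (theta, objective value)."""
    theta = list(theta0)
    value = objective(theta)
    for _ in range(max_iter):
        grad = numerical_gradient(objective, theta)
        # Update: theta_{m+1} = theta_m - lambda * gradient
        theta = [t - lam * g for t, g in zip(theta, grad)]
        new_value = objective(theta)
        # Stop when the relative change in the objective is negligible.
        converged = abs(new_value - value) <= rel_tol * max(abs(value), 1e-12)
        value = new_value
        if converged:
            break
    return theta, value

def toy_objective(theta):
    """Toy convex objective with minimum value 1 at (1, -2, 0.5)."""
    return ((theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 2.0) ** 2
            + 0.5 * (theta[2] - 0.5) ** 2 + 1.0)

theta_hat, min_value = gradient_descent(toy_objective, [0.0, 0.0, 0.0])
```

With λ = 0.04 the iteration contracts slowly along flat directions of the objective, which is why a good starting value (such as the penalized likelihood estimate mentioned above) matters in practice.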

4. Robust Wald-type tests based on MDPDE

4.1. General theory for composite hypotheses

We now consider the problem of testing statistical hypotheses. Suppose that, based on a random sample $X_{1}, \dots, X_{n}$ from the SN distribution, we want to test the composite hypothesis

H_{0} : θ \in Θ_{0} against H_{1} : θ \notin Θ_{0},

(12)

for some closed subset $Θ_{0}$ of the parameter space Θ. In most applications, the restricted (null) parameter space $Θ_{0}$ is defined by a set of $r \leq 3$ restrictions of the form $m (θ) = 0_{r}$, where $m : Θ \to R^{r}$ is a known function. We assume that the $3 \times r$ matrix $M (θ) = \frac{\partial m^{T} (θ)}{\partial θ}$ exists, is continuous in $θ$ and satisfies rank $[M (θ)] = r$. The simplest possible case is $Θ_{0} = \{θ_{0}\}$ for some fixed $θ_{0} = (μ_{0}, σ_{0}, γ_{0}) \in Θ$, where r = 3, $m (θ) = θ - θ_{0}$ and $M (θ) = I_{3}$, the identity matrix of order 3. Other common cases for the SN distribution could be testing for one or two parameters (among three), considering the remaining parameter(s) as nuisance. As noted earlier, the usual test based on the MLE is non-robust, and hence we discuss a Wald-type test based on the robust MDPDEs following Basu et al. [10].

If ${\hat{θ}}_{α, n}$ denotes the MDPDE of $θ$ based on the given sample, the Wald-type test statistic for testing the hypothesis (12) is given by

W_{α, n} = n m^{T} ({\hat{θ}}_{α, n}) [M^{T} ({\hat{θ}}_{α, n}) Σ_{α} ({\hat{θ}}_{α, n}) M ({\hat{θ}}_{α, n})]^{- 1} m ({\hat{θ}}_{α, n}),

(13)

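where $Σ_{α} (θ) = J_{α}^{- 1} (θ) K_{α} (θ) J_{α}^{- 1} (θ)$ is the asymptotic variance matrix of the MDPDE, with $J_{α}$ and $K_{α}$ as defined in (9) and (10), respectively. At $α = 0$, this Wald-type test statistic coincides with the usual Wald test based on the MLE.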

From the asymptotic distribution of the MDPDE ${\hat{θ}}_{α, n}$ in Section 2.2, it immediately follows that $W_{α, n}$ asymptotically follows a (central) chi-squared distribution, $χ_{r}^{2}$, with r degrees of freedom under the null hypothesis in (12). Therefore, we reject $H_{0}$ in (12) at the $τ_{0}$ level of significance if $W_{α, n} \geq χ_{r, τ_{0}}^{2}$, the $(1 - τ_{0})$-th quantile of the $χ_{r}^{2}$ distribution.

From the general theory of Basu et al. [10], the MDPDE-based Wald-type test is consistent at any fixed alternative. Under the contiguous hypothesis of the form $H_{1, n} : θ_{n} = θ_{0} + n^{- 1 / 2} d$, with $θ_{0} \in Θ_{0}$ and $d \in R^{3} \setminus \{0_{3}\}$, $W_{α, n}$ asymptotically follows a non-central chi-squared distribution, denoted as $χ_{r, δ}^{2}$, having r degrees of freedom and the non-centrality parameter $δ = d^{T} Q_{α} (θ_{0}) d$, with $Q_{α} (θ_{0}) = M (θ_{0}) [M^{T} (θ_{0}) Σ_{α} (θ_{0}) M (θ_{0})]^{- 1} M^{T} (θ_{0})$. Based on this result, an approximate expression of the contiguous power function of the test based on $W_{α, n}$ can be calculated as $Π_{α} (θ_{n}) = 1 - G_{χ_{r, δ}^{2}} (χ_{r, τ_{0}}^{2}),$ where $G_{χ_{r, δ}^{2}}$ is the cdf of the $χ_{r, δ}^{2}$ distribution.

The robustness properties of the MDPDE-based Wald-type tests were first discussed by [25] for general parametric models, which also hold for our SN distribution case. For completeness, we restate the main results briefly. At the null distribution with $θ_{0} \in Θ_{0}$ , the first-order IF of the Wald-type test statistic is inconclusive (identically zero) but the second-order IF has the form

I F_{2} (y, W_{α, n}, θ_{0}) = 2 I F (y, T_{α}, θ_{0})^{T} Q_{α} (θ_{0}) I F (y, T_{α}, θ_{0}) .

(14)

Note that this (second-order) IF of the MDPDE-based Wald-type test statistic directly depends on $I F (y, T_{α}, θ_{0})$, the IF of the underlying MDPDE used. Based on our earlier exploration in Section 2.3, the test IF in (14) will then be bounded in the contamination point y for any $α > 0$, which implies the robustness of the test based on the MDPDE-based statistic in (13).

Again from the general theory of [25], one can see the robustness of the level and power of the MDPDE-based Wald-type tests for any $α > 0$ through their bounded level influence function (LIF) and the power influence function (PIF). In particular, the LIF of any order is identically zero and the PIF for testing at the significance level $τ_{0}$ has the form

P I F (y, W_{α, n}, θ_{0}) = C_{r}^{*} (d^{T} Q_{α} (θ_{0}) d) d^{T} Q_{α} (θ_{0}) I F (y, T_{α}, θ_{0}),

(15)

where $C_{r}^{*} (s) = e^{- \frac{s}{2}} \sum_{v = 0}^{\infty} s^{v - 1} 2^{- v} (2 v - s) P (χ_{r + 2 v}^{2} > χ_{r, τ_{0}}^{2}) / v! .$ Again the PIF is a linear function of the IF of the MDPDE and hence bounded for all $α > 0$ indicating power robustness of the Wald-type test based on (13).

4.2. Robust test for symmetry: normal versus skew-Normal

We now discuss a particular testing problem in the context of SN distribution, namely, the test of symmetry in the SN family through the null hypothesis $H_{0} : γ = 0$ . Note that, under this null hypothesis, the SN distribution coincides with the usual (symmetric) normal distribution and hence it indeed provides a test for normality against the non-symmetric SN alternatives. Let us discuss our proposal for a slightly general hypothesis, namely,

H_{0} : γ = γ_{0} against H_{1} : γ \neq γ_{0},

(16)

for a pre-fixed real $γ_{0}$. The choice $γ_{0} = 0$ yields the test for symmetry against the SN alternatives. Here, μ and σ are unknown nuisance parameters. In the notation of Section 4.1, we have $Θ_{0} = \{θ = (μ, σ, γ)^{T} : μ \in R, σ \in R^{+}, γ = γ_{0}\}$, r = 1, $m (θ) = γ - γ_{0}$ and $M (θ) = (0, 0, 1)^{T}$.

Denoting the MDPDE as ${\hat{θ}}_{α, n} = ({\hat{μ}}_{α, n}, {\hat{σ}}_{α, n}, {\hat{γ}}_{α, n})$ , our MDPDE-based Wald-type test statistic (13) has a simplified form for testing (16) which is given by

W_{α, n} = \frac{n {({\hat{γ}}_{α, n} - γ_{0})}^{2}}{Σ_{α}^{(33)} ({\hat{θ}}_{α, n})},

(17)

where $Σ_{α}^{(33)} (θ)$ is the $(3, 3)$ -th element of $Σ_{α} (θ)$ . Then, under the null hypothesis in (16), $W_{α, n}$ asymptotically follows $χ_{1}^{2}$ distribution and the test can be performed by comparing $W_{α, n}$ with the corresponding critical values. Further, the approximate expression of power function at the contiguous hypothesis of the form $H_{1, n} : γ = γ_{0} + n^{- 1 / 2} d$ , with $d \in R$ , is given by

Π_{α} (θ_{n}) = 1 - G_{χ_{1, δ}^{2}} (χ_{1, τ_{0}}^{2}), with δ = \frac{d^{2}}{Σ_{α}^{(33)} (θ_{0})}, θ_{0} \in Θ_{0} .

We have numerically calculated this asymptotic contiguous power for testing symmetry ( $γ_{0} = 0$ ) at 5% level of significance by the MDPDE-based Wald-type test with different values of α, which are presented in Table 2 for $θ_{0} = (0, 1, 0)^{T}$ . It is clear that, just like the ARE of the MDPDE, the contiguous power of the MDPDE-based test also decreases as α increases but this loss is not quite significant at small $α > 0$ . For larger values of d, i.e. alternatives further away from the null, the power eventually becomes one for all $α \geq 0$ in accordance with the consistency of these tests.
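For r = 1, both the p-value of the statistic in (17) and the contiguous power formula reduce to standard normal probabilities, since $χ_{1}^{2}$ is the distribution of $Z^{2}$ and $χ_{1, δ}^{2}$ is that of $(Z + \sqrt{δ})^{2}$ for a standard normal Z. The following minimal Python sketch uses this fact; the function names are hypothetical, and the value of $Σ_{α}^{(33)}$ must be supplied by the user (the numbers in the usage lines are illustrative and are not taken from Table 2).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution

def wald_statistic(gamma_hat, sigma33_hat, n, gamma0=0.0):
    """W = n * (gamma_hat - gamma0)^2 / Sigma^(33), as in (17)."""
    return n * (gamma_hat - gamma0) ** 2 / sigma33_hat

def chi2_1_pvalue(w):
    """Upper tail P(chi^2_1 > w) = P(|Z| > sqrt(w))."""
    return math.erfc(math.sqrt(w / 2.0))

def contiguous_power(d, sigma33, level=0.05):
    """1 - G(critical value) for a non-central chi^2_1 with delta = d^2 / Sigma^(33)."""
    root_crit = _STD_NORMAL.inv_cdf(1.0 - level / 2.0)  # sqrt of chi^2_1 critical value
    root_delta = abs(d) / math.sqrt(sigma33)
    upper = 1.0 - _STD_NORMAL.cdf(root_crit - root_delta)
    lower = _STD_NORMAL.cdf(-root_crit - root_delta)
    return upper + lower

# Illustrative inputs only (hypothetical estimate and variance).
w = wald_statistic(gamma_hat=0.5, sigma33_hat=2.0, n=100)
p_value = chi2_1_pvalue(w)
power_at_d3 = contiguous_power(3.0, sigma33=2.0)
```

At d = 0 the power function returns the nominal level, and it increases to one as |d| grows, in line with the consistency of the test.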

Table 2.

Asymptotic contiguous power of the MDPDE-based Wald-type test for testing symmetry ( $γ_{0} = 0$ ) at 5% level of significance with $θ_{0} = (0, 1, 0)^{T}$ and different values of α and d.

	α
d	0	0.05	0.1	0.2	0.3	0.5	0.7	1
3.00	0.6685	0.6678	0.6662	0.6471	0.6327	0.6011	0.5507	0.4905
3.50	0.7982	0.7975	0.7960	0.7785	0.7649	0.7342	0.6827	0.6175
4.00	0.8915	0.8910	0.8899	0.8763	0.8655	0.8401	0.7948	0.7329
4.50	0.9489	0.9485	0.9478	0.9390	0.9317	0.9137	0.8792	0.8275
5.00	0.9790	0.9788	0.9784	0.9736	0.9694	0.9585	0.9356	0.8974
5.50	0.9925	0.9924	0.9922	0.9900	0.9879	0.9823	0.9690	0.9440
6.00	0.9977	0.9977	0.9976	0.9967	0.9958	0.9933	0.9866	0.9721
7.00	0.9999	0.9999	0.9998	0.9998	0.9997	0.9993	0.9982	0.9947
8.00	1.0000	1.0000	1.0000	1.0000	1.0000	1.0000	0.9998	0.9993
9.00	1.0000	1.0000	1.0000	1.0000	1.0000	1.0000	1.0000	0.9999


Next the robustness of the Wald-type test based on the statistic (17) for testing (16) can be studied through the second-order influence function of the test statistic and the PIF. From the general formulas presented in Section 4.1, we can easily calculate these measures in the present case of testing (16) as given by

\begin{aligned} I F_{2} (y, W_{α, n}, θ_{0}) & = 2 I F (y, T_{α}^{(γ)}, θ_{0})^{2} / Σ_{α}^{(33)} (θ_{0}) . \end{aligned}

(18)

\begin{aligned} P I F (y, W_{α, n}, θ_{0}) & = C_{r}^{*} (d^{2} / Σ_{α}^{(33)} (θ_{0})) I F (y, T_{α}^{(γ)}, θ_{0}) d / Σ_{α}^{(33)} (θ_{0}), \end{aligned}

(19)

where $T_{α}^{(γ)}$ is the MDPDE functional corresponding to γ and hence $I F (y, T_{α}^{(γ)}, θ_{0})$ is given by the third component of the full 3-dimensional IF vector given in (11). For illustration, we have presented the plots of these $I F_{2}$ and PIF for different values of α at $γ_{0} = 1$, $θ_{0} = (0, 1, 1)^{T}$ and d = 4 in Figure 4; note that the plot of the corresponding $I F (y, T_{α}^{(γ)}, θ_{0})$ is as given in Figure 3(c). It is evident from these figures that the MDPDE-based Wald-type test statistic (17) has a bounded second-order IF as well as a bounded PIF for all $α > 0$, indicating the claimed robustness; the extent of robustness increases as α increases.

Figure 4. — Second-order IF and the PIF of the MDPDE-based Wald-type test for testing $γ = 1$ with $θ_{0} = (0, 1, 1)^{T}$ and d = 4, for different α [Thin solid line: $α = 0$ (MLE); dashed line: $α = 0.1$; dotted line: $α = 0.3$; dash-dotted line: $α = 0.5$; thick solid line: $α = 1$].

5. Simulation study

5.1. Performance of the MDPDE

We now examine the finite sample performances of the MDPDE for the SN distribution through a Monte-Carlo simulation study. We simulate random samples from the SN distribution, using the R package ‘sn’, for different sizes n = 50, 100, and the true parameter values $(μ, σ, γ) = (0, 1, 5)$. Based on each simulated sample, we compute the MDPDEs at different α, including the MLE at $α = 0$, using the genetic algorithm described in Section 3. Replicating this process 1000 times, we compute the empirical bias and MSE of the MDPDEs of the three parameters $(μ, σ, γ)$ for the fitted SN distribution. Further, to examine the robustness property, we repeat the above simulation exercise by contaminating $100ϵ\%$ of each sample by observations from a distant contaminating distribution. We have considered the contamination proportion ϵ to be 0.05 and 0.1, leading to 5% and 10% contamination respectively, and four different contaminating distributions, namely SN(10,1,5), SN($- 10$,1,5), SN(0,5,5) and SN(0,1,1). For each situation, the bias and MSE under contaminated data are again computed using 1000 replications for MDPDEs at different α and are compared with their pure data values. We report the exact values of the biases and MSEs of each of the three parameters $(μ, σ, γ)$ under each scenario in Tables 3–4.
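The data-generating part of this design can be sketched in Python as follows. The sketch assumes the well-known stochastic representation $X = μ + σ (δ |Z_{0}| + \sqrt{1 - δ^{2}} Z_{1})$ with $δ = γ / \sqrt{1 + γ^{2}}$ for SN variates, and it implements contamination by replacing a fixed number of observations with draws from the contaminating distribution; both are illustrative choices and may differ from what the R package ‘sn’ and the original simulation code do. All function names are hypothetical.

```python
import math
import random

def rsn(n, mu, sigma, gamma, rng):
    """Draw n values from SN(mu, sigma, gamma) via the half-normal representation."""
    delta = gamma / math.sqrt(1.0 + gamma * gamma)
    scale = math.sqrt(1.0 - delta * delta)
    draws = []
    for _ in range(n):
        z0 = rng.gauss(0.0, 1.0)
        z1 = rng.gauss(0.0, 1.0)
        draws.append(mu + sigma * (delta * abs(z0) + scale * z1))
    return draws

def contaminated_sample(n, eps, clean, outlier, rng):
    """Sample of size n with round(eps * n) values from the contaminating SN law."""
    n_out = round(eps * n)
    return rsn(n - n_out, *clean, rng) + rsn(n_out, *outlier, rng)

def bias_and_mse(estimates, truth):
    """Empirical bias and MSE of replicated estimates of a scalar parameter."""
    m = len(estimates)
    bias = sum(estimates) / m - truth
    mse = sum((e - truth) ** 2 for e in estimates) / m
    return bias, mse

rng = random.Random(1)
sample = contaminated_sample(100, 0.1, (0.0, 1.0, 5.0), (10.0, 1.0, 5.0), rng)
```

In a full replication loop, an estimator of $(μ, σ, γ)$ would be applied to each such sample and `bias_and_mse` would be called once per parameter on the 1000 resulting estimates.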

Table 3.

Empirical Biases and MSEs of the parametric MDPDEs, for various α, and the two non-parametric estimates (QSN and MC) at sample size n = 50.

		Parametric MDPDE with α						Non-parametric		Parametric MDPDE with α						Non-parametric
ϵ		0(MLE)	0.1	0.3	0.5	0.7	1	QSN	MC	0(MLE)	0.1	0.3	0.5	0.7	1	QSN	MC
		Bias								MSE
0	μ	0.008	0.012	0.014	0.020	0.042	0.096	0.403	–	0.005	0.007	0.009	0.012	0.019	0.026	0.195	–
	σ	−0.011	−0.059	−0.096	−0.109	−0.179	−0.249	−0.546	–	0.007	0.014	0.022	0.037	0.092	0.111	0.320	–
	γ	0.810	0.926	1.047	1.169	1.285	1.342	−3.585	−4.071	1.638	2.109	2.588	2.973	3.468	3.791	13.182	17.188
Outliers from SN(10,1,5)
0.05	μ	0.117	0.091	0.087	0.044	0.049	0.107	0.409	–	0.038	0.027	0.022	0.015	0.024	0.039	0.209	–
	σ	−0.295	−0.261	−0.152	−0.108	−0.177	−0.257	−0.485	–	0.110	0.102	0.075	0.042	0.099	0.135	0.268	–
	γ	1.218	1.146	1.104	1.002	1.258	1.368	−3.517	−3.904	3.427	3.194	3.088	2.985	3.616	4.083	12.751	15.841
0.1	μ	0.124	0.102	0.099	0.055	0.062	0.159	0.373	–	0.045	0.032	0.026	0.019	0.032	0.055	0.309	–
	σ	−0.326	−0.297	−0.215	−0.132	−0.181	−0.318	−0.357	–	0.136	0.122	0.092	0.061	0.118	0.157	0.458	–
	γ	1.341	1.258	1.175	1.099	1.302	1.418	−3.354	−3.624	3.799	3.478	3.268	3.085	3.798	4.197	12.300	13.722
Outliers from SN(−10,1,5)
0.05	μ	−0.124	−0.089	−0.073	0.041	0.072	0.109	0.363	–	0.048	0.031	0.023	0.017	0.026	0.045	0.167	–
	σ	−0.207	−0.184	−0.105	−0.097	−0.174	−0.316	−0.540	–	0.179	0.125	0.095	0.055	0.112	0.154	0.315	–
	γ	1.194	1.122	1.041	0.934	1.160	1.386	−3.628	−4.377	3.480	3.199	3.114	3.085	3.484	4.054	13.464	20.029
0.1	μ	−0.143	−0.100	−0.081	0.055	0.090	0.140	0.318	–	0.059	0.043	0.035	0.020	0.037	0.056	0.140	–
	σ	−0.280	−0.232	−0.142	−0.111	−0.186	−0.324	−0.534	–	0.244	0.148	0.101	0.060	0.122	0.188	0.309	–
	γ	1.226	1.148	1.099	1.001	1.186	1.465	−3.678	−4.487	3.754	3.397	3.221	3.115	3.759	4.259	13.830	22.768
Outliers from SN(0,5,5)
0.05	μ	0.083	0.071	0.060	0.053	0.078	0.103	0.409	–	0.039	0.026	0.018	0.014	0.020	0.039	0.206	–
	σ	0.282	0.229	0.134	−0.109	−0.158	−0.269	−0.503	–	0.376	0.291	0.173	0.093	0.154	0.283	0.280	–
	γ	1.250	1.118	1.042	0.929	1.240	1.367	−3.540	−3.983	3.675	3.393	3.152	2.987	3.545	4.082	12.891	16.485
0.1	μ	0.103	0.092	0.076	0.067	0.091	0.125	0.403	–	0.045	0.038	0.027	0.017	0.027	0.046	0.208	–
	σ	0.309	0.255	0.166	−0.115	−0.209	−0.302	−0.441	–	0.402	0.336	0.249	0.131	0.214	0.352	0.234	–
	γ	1.279	1.143	1.078	0.996	1.270	1.382	−3.472	−3.835	3.870	3.495	3.202	3.046	3.630	4.356	12.468	15.237
Outliers from SN(0,1,1)
0.05	μ	−0.065	−0.051	−0.041	−0.029	0.076	0.115	0.402	–	0.096	0.044	0.027	0.015	0.022	0.036	0.196	–
	σ	−0.233	−0.178	−0.123	−0.119	−0.159	−0.279	−0.544	–	0.253	0.207	0.146	0.088	0.149	0.196	0.32	–
	γ	1.202	1.154	1.109	1.071	1.166	1.455	−3.600	−147	3.579	3.284	3.105	3.008	3.529	4.155	13.285	17.855
0.1	μ	−0.082	−0.062	−0.055	−0.048	0.097	0.142	0.396	–	0.137	0.097	0.048	0.024	0.032	0.048	0.191	–
	σ	−0.273	−0.214	−0.171	−0.145	−0.182	−0.319	−0.542	–	0.308	0.234	0.159	0.093	0.153	0.251	0.317	–
	γ	1.254	1.224	1.124	1.108	1.199	1.494	−3.617	−4.168	3.822	3.527	3.266	3.063	3.764	4.391	13.386	18.018


Table 4.

Empirical Biases and MSEs of the parametric MDPDEs, for various α, and the two non-parametric estimators (QSN and MC) at sample size n = 100.

		Parametric MDPDE with α						Non-parametric		Parametric MDPDE with α						Non-parametric
ϵ		0(MLE)	0.1	0.3	0.5	0.7	1	QSN	MC	0(MLE)	0.1	0.3	0.5	0.7	1	QSN	MC
		Bias								MSE
0	μ	0.005	0.010	0.011	0.015	0.031	0.080	0.398	–	0.003	0.004	0.005	0.006	0.007	0.010	0.178	–
	σ	−0.005	−0.063	−0.084	−0.099	−0.122	−0.202	−0.540	–	0.005	0.007	0.009	0.018	0.032	0.058	0.304	–
	γ	0.397	0.484	0.695	0.990	1.094	1.178	−3.427	−3.786	0.860	1.128	1.295	2.089	2.969	3.194	13.112	16.188
Outliers from SN(10,1,5)
0.05	μ	0.100	0.083	0.055	0.022	0.039	0.093	0.406	–	0.029	0.021	0.015	0.009	0.016	0.033	0.187	–
	σ	−0.248	−0.220	−0.115	−0.103	−0.149	−0.217	−0.484	–	0.090	0.062	0.045	0.027	0.094	0.124	0.251	–
	γ	1.089	1.025	1.009	0.962	1.119	1.234	−3.476	−3.833	2.967	2.747	2.337	2.169	2.997	3.499	12.448	14.957
0.1	μ	0.106	0.095	0.077	0.036	0.051	0.103	0.388	–	0.034	0.025	0.016	0.010	0.025	0.042	0.178	–
	σ	−0.286	−0.237	−0.164	−0.114	−0.169	−0.245	−0.386	–	0.128	0.072	0.073	0.049	0.108	0.139	0.175	–
	γ	1.213	1.143	1.058	1.003	1.215	1.366	−3.459	−3.604	3.271	3.008	2.770	2.485	3.258	3.984	12.157	13.187
Outliers from SN(-10,1,5)
0.05	μ	−0.103	−0.079	−0.047	0.017	0.036	0.101	0.360	–	0.035	0.024	0.014	0.010	0.018	0.039	0.149	–
	σ	−0.191	−0.175	−0.095	−0.088	−0.161	−0.242	−0.540	–	0.146	0.092	0.054	0.029	0.098	0.137	0.315	–
	γ	1.128	1.038	1.009	0.916	1.107	1.249	−3.672	−4.029	3.324	2.961	2.500	2.184	3.143	3.562	13.629	18.652
0.1	μ	−0.131	−0.088	−0.060	0.034	0.068	0.127	0.316	–	0.047	0.034	0.024	0.014	0.028	0.049	0.119	–
	σ	−0.240	−0.204	−0.121	−0.093	−0.170	−0.283	−0.533	–	0.207	0.120	0.082	0.043	0.113	0.185	0.296	–
	γ	1.167	1.075	1.034	0.959	1.129	1.357	−3.716	−4.138	3.479	3.175	3.160	2.842	3.328	4.059	13.814	20.281
Outliers from SN(0,5,5)
0.05	μ	0.072	0.065	0.056	0.049	0.066	0.106	0.407	–	0.030	0.021	0.014	0.008	0.013	0.028	0.187	–
	σ	0.244	0.188	0.123	−0.097	−0.132	−0.236	−0.502	–	0.326	0.263	0.148	0.081	0.119	0.245	0.267	–
	γ	1.169	1.057	1.002	0.885	1.130	1.260	−3.501	−3.902	3.415	3.148	2.948	2.362	3.019	3.749	12.121	15.515
0.1	μ	0.082	0.074	0.059	0.051	0.077	0.113	0.402	–	0.038	0.030	0.016	0.009	0.020	0.035	0.186	–
	σ	0.259	0.229	0.145	0.104	−0.158	−0.287	−0.443	–	0.385	0.297	0.201	0.105	0.172	0.296	0.216	–
	γ	1.240	1.120	1.054	0.954	1.195	1.334	−3.541	−3.779	3.552	3.239	3.098	2.687	3.472	4.138	12.705	14.523
Outliers from SN(0,1,1)
0.05	μ	−0.060	−0.045	−0.034	−0.026	0.062	0.093	0.400	–	0.066	0.037	0.014	0.008	0.013	0.024	0.179	–
	σ	−0.198	−0.147	−0.113	−0.105	−0.129	−0.246	−0.541	–	0.191	0.125	0.082	0.049	0.089	0.169	0.310	–
	γ	1.106	1.032	1.014	1.002	1.097	1.386	−3.616	−4.049	3.413	3.124	2.678	2.284	3.195	3.798	13.218	16.766
0.1	μ	−0.067	−0.060	−0.046	−0.035	0.071	0.103	0.394	–	0.114	0.075	0.036	0.059	0.101	0.037	0.175	–
	σ	−0.236	−0.179	−0.143	−0.125	−0.152	−0.282	−0.542	–	0.286	0.191	0.145	0.078	0.126	0.185	0.307	–
	γ	1.195	1.148	1.075	1.016	1.136	1.317	−3.665	−4.083	3.571	3.283	2.913	2.479	3.514	4.124	13.589	17.049


Additionally, we compare the proposed parametric MDPDE with the two most recent existing non-parametric estimates of skewness, namely the ones proposed by [12] and [37]; we will refer to these two non-parametric estimates as ‘MC’ and ‘QSN’, respectively. For the sake of completeness, brief descriptions of MC and QSN are provided in Appendix A.1 and A.2; note that QSN provides (non-parametric) estimates of all the parameters of the SN distribution (using quartiles), whereas MC only estimates the skewness parameter γ. In all our simulation settings, along with the proposed MDPDE, we also compute the MC and QSN from the simulated samples, and report their biases and MSEs based on 1000 replications in Tables 3–4.

One can clearly note that the bias and MSE under pure data increase with α, but the increase is reasonably small at smaller positive values of α. On the other hand, under contaminated data the bias and MSE increase significantly for the MLE (at $α = 0$), whereas those for MDPDEs with larger α remain closer to their pure data values; the stability increases with increasing values of $α > 0$. Based on the efficiency and robustness trade-off, it has been observed that, in all the situations considered, the MDPDE with α around 0.5 produces the smallest values of bias and MSE under contamination, which are significantly lower than those obtained by the MLE under contamination.

The non-parametric estimators, QSN and MC, are significantly robust against most contamination scenarios but, as expected, they have significantly poorer performance compared to the proposed parametric MDPDEs for both pure and contaminated data. This clearly justifies the usefulness and importance of the parametric approach, along with the proposed robust MDPDEs, over the non-parametric ones when a parametric (SN) distribution can be assumed for the majority of sample data.

5.2. Performance of the MDPDE-based Wald-type test

To visualize the performance of the proposed MDPDE-based Wald-type tests, we have again performed several simulation studies. We consider the problem of testing symmetry through the hypothesis $H_{0}$ : $γ = 0$ against $H_{1}$ : $γ \neq 0$, for which the Wald-type test statistic $W_{α, n}$ is as given in (17) with $γ_{0} = 0$. We first simulate random samples of size n from the SN(0,1,0) distribution and perform the MDPDE-based Wald-type test for different α, including the classical Wald test at $α = 0$. Based on 1000 replications, we then compute the empirical levels of the tests, measured as the proportion of test statistics exceeding the chi-square critical value among the 1000 replications. Subsequently, to compute the empirical power of the tests, we repeat the above exercise but now generate random samples from an alternative SN(0,1,1) distribution. Finally, to illustrate the claimed robustness, we recalculate the level and power of the Wald-type tests after contaminating $100ϵ\%$ of each sample in the previous simulation exercises with $ϵ = 0.05, 0.1$. The contaminating observations are generated from SN(0,1,3) and SN(0,1,−3) distributions, respectively, for the level and power calculations. In Table 5, we report all the resulting empirical levels and powers obtained from different simulation scenarios with increasing sample sizes n.

Table 5.

Empirical levels and powers of the MDPDE-based Wald-type tests for different α.

			α
	ϵ	SampleSize (n)	0(MLE)	0.1	0.3	0.5	0.7	1.0
Level	0	50	0.114	0.139	0.156	0.181	0.208	0.243
		100	0.061	0.105	0.121	0.163	0.184	0.206
		150	0.055	0.072	0.091	0.107	0.123	0.147
		200	0.052	0.065	0.077	0.086	0.096	0.107
		250	0.052	0.059	0.065	0.075	0.081	0.092
		300	0.051	0.054	0.058	0.063	0.069	0.078
		350	0.051	0.054	0.056	0.06	0.066	0.074
		400	0.05	0.052	0.054	0.057	0.061	0.067
		500	0.05	0.051	0.052	0.055	0.057	0.061
	0.05	50	0.787	0.429	0.258	0.187	0.162	0.136
		100	0.846	0.476	0.272	0.183	0.139	0.112
		150	0.861	0.489	0.283	0.141	0.108	0.094
		200	0.877	0.495	0.29	0.127	0.089	0.078
		250	0.88	0.502	0.301	0.122	0.074	0.064
		300	0.891	0.507	0.315	0.11	0.063	0.055
		350	0.897	0.516	0.319	0.108	0.06	0.055
		400	0.904	0.525	0.324	0.104	0.058	0.054
		500	0.911	0.536	0.331	0.101	0.057	0.052
	0.10	50	0.825	0.478	0.286	0.21	0.179	0.142
		100	0.859	0.502	0.294	0.205	0.153	0.126
		150	0.873	0.497	0.292	0.178	0.117	0.105
		200	0.886	0.506	0.314	0.159	0.094	0.087
		250	0.897	0.519	0.327	0.141	0.079	0.071
		300	0.904	0.522	0.332	0.138	0.067	0.06
		350	0.912	0.53	0.341	0.128	0.066	0.058
		400	0.923	0.541	0.355	0.119	0.063	0.057
		500	0.928	0.553	0.362	0.116	0.061	0.055
Power	0	50	0.94	0.953	0.968	0.979	0.99	1
		100	0.962	0.974	0.985	0.994	1	1
		150	0.973	0.982	0.996	1	1	1
		200	0.981	0.989	0.998	1	1	1
		250	1	1	1	1	1	1
		300	1	1	1	1	1	1
	0.05	50	0.23	0.565	0.792	0.956	0.978	1
		100	0.249	0.582	0.815	0.983	0.994	1
		150	0.261	0.598	0.849	0.991	1	1
		200	0.306	0.639	0.905	1	1	1
		250	0.369	0.692	0.941	1	1	1
		300	0.414	0.761	0.985	1	1	1
	0.10	50	0.241	0.573	0.801	0.963	0.987	1
		100	0.258	0.596	0.831	0.99	0.999	1
		150	0.269	0.607	0.863	0.994	1	1
		200	0.317	0.65	0.918	1	1	1
		250	0.383	0.708	0.954	1	1	1
		300	0.427	0.782	0.996	1	1	1


It can be observed from the table that, under pure data, the levels are inflated for Wald-type tests with larger α. However, through more extensive simulations, we can see that the levels stabilize to the desired $5\%$ significance level at n = 300, although this happens for the classical MLE-based Wald test (at $α = 0$) at n = 150 itself. As a result, the pure data power always appears higher for the Wald-type tests with larger α, and it indeed becomes one for all α at n = 250. However, the main advantage of the MDPDE-based Wald-type tests appears in the stability of their levels and powers under contamination in sample data. For the Wald test at $α = 0$, the level inflates significantly due to contamination but becomes more stable with increasing α. Similarly, the power of the classical Wald test decreases drastically under contamination but regains its high values for the MDPDE-based Wald-type tests with larger $α \geq 0.3$. Therefore, the MDPDE-based Wald-type tests with moderately large $α > 0$ always produce more power with slightly inflated levels, which remain stable even under different contamination levels.

6. Real data applications

6.1. AIS dataset

Let us consider again the motivating dataset and use the MDPDE to obtain the estimates of the fitted SN distributions. We consider again the important health indicator variables as in Figure 2 and compute the MDPDEs of the parameters of the fitted SN distribution for each variable using the algorithm described in Section 3. We have also estimated the standard errors of the resulting MDPDEs using the formula described in Section 2.2. The parameter estimates, along with their standard errors, for all eight variables are reported in Table 6. The outlier deleted MLEs, obtained after removing the outliers identified through the respective box-plots, are also presented in Table 6 for reference.

It can be easily observed from Table 6 that the MLE changes drastically for all the variables due to the presence of outliers, but the proposed MDPDEs with larger $α > 0$ computed over the full data remain extremely close to the outlier deleted MLE. Thus, the use of the MDPDEs with larger $α > 0$ leads to robust insights even in the presence of outliers in the data; most of the time the MDPDEs with large values of α are very close to the cleaned data MLE. However, we need larger values of $α > 0$ if the strength of the outliers increases (more in number or greater distance from the data center) and vice versa. In the present example, the variables HC, RCC and LBM all have one outlying data-point, but the MDPDEs of RCC and LBM become quite close to the outlier deleted MLE at $α \approx 0.1$, whereas HC requires a larger $α \approx 0.5$ due to the greater distance of the outlier in this case; among other variables, PFC, having 12 outliers, requires $α \approx 0.5$ to generate robust estimates, whereas the corresponding value of α is 0.3 for the measurements BMI, WCC and Wt. In summary, all MDPDEs with $α \geq 0.5$ generate estimates similar to the outlier deleted MLE in all cases, although sometimes a substantially lower $α > 0$ may also produce stable results (like $α = 0.1$ for Ht).

To illustrate the robustness aspect of the MDPDEs more clearly, we have also recomputed the MDPDEs for outlier deleted data for all $α \geq 0$ and compared them with the corresponding full data values; the greater robustness can be measured by the lower values of their relative differences defined as

R D (ν) = \frac{| {\hat{ν}}_{full} - {\hat{ν}}_{clean} |}{{\hat{ν}}_{full}} \times 100\%, \quad ν \in \{μ, σ, γ\},

where ${\hat{ν}}_{full}$ and ${\hat{ν}}_{clean}$ denote, respectively, the estimates of $ν \in \{μ, σ, γ\}$ obtained from the full data with outliers and the outlier deleted data. For all the eight measurements, the relative differences (RDs) of the MDPDEs over different α are plotted in Figure 5. Clearly, the RDs are significantly high for the MLE (at $α = 0$); they are as high as 1200% and 400% for the skewness parameter γ for BMI and Wt, respectively. But these RDs decrease for MDPDEs as $α > 0$ increases and become very close to zero for $α \geq 0.5$ in all the cases; they already become close to zero at $α \approx 0.2$ for HC, RCC, WCC, LBM and Ht. Among the three parameters, the effect of outliers is seen to be most significant for γ followed by σ, and the effect is often minimum for the parameter μ. All these illustrations clearly show the claimed robustness of the proposed MDPDE with larger $α > 0$ for analyses of the present AIS dataset.
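As a small worked illustration of the RD measure, the following Python sketch evaluates the formula exactly as printed above, with the full-data estimate in the denominator. The function name and the input values are hypothetical and are not taken from Table 6.

```python
def relative_difference(full, clean):
    """RD in percent: |full - clean| / full * 100, following the displayed formula."""
    return abs(full - clean) / full * 100.0

# Hypothetical estimates of one parameter from full and outlier-deleted data.
rd_example = relative_difference(full=1.2, clean=1.0)
```

Since the denominator is not taken in absolute value, a negative full-data estimate would give a negative RD under this literal reading; using `abs(full)` instead would be a natural variant if only the magnitude is of interest.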

Figure 5. — Relative differences (RDs) of the MDPDEs and MLE (at $α = 0$ ) due to the presence of outliers, plotted over α, for different Health indicator variables from the AIS Data [Solid line: μ; dotted line: σ; dashed line: γ]. (a) Hemoglobin Conc. (HC), (b) red cell count (RCC), (c) white cell count (WCC), (d) plasma Fer. Conc. (PFC), (e) Body Mass Index (BMI), (f) lean body mass (LBM), (g) height (Ht), (h) weight (Wt).

Next, let us study the performance of the proposed MDPDE-based Wald-type tests for generating inference for the present AIS data. We have examined several types of simple and composite parametric hypotheses for different health measurements in AIS data with or without outliers. Since the results are similar in all cases, for brevity, we report the six most interesting cases as follows. All six null hypotheses are composite as we assume the other remaining parameters to be unknown under the null as well as the corresponding omnibus alternative hypothesis.

Variable	HC	WCC	LBM	PFC	BMI	Wt
Hypothesis $H_{0}$	$γ = - 1.8$	$γ = 1.7$	$γ = 2$	$σ = 57$	$σ = 4$	$μ = 72$

For all these hypotheses, we have computed the p-values using the MDPDE-based Wald-type tests for different α for the full data as well as the outlier-deleted data; these are plotted in Figure 6. Note that the usual Wald test at $α = 0$ is strongly affected by the outliers and, in most cases, leads to opposite conclusions, with clearly different significance levels, depending on whether the outliers are present. However, the proposed MDPDE-based tests with $α > 0$ provide stable inference, similar to what we would have obtained after removing the outliers, for all the variables except PFC; for the testing problem in PFC, we need $α \geq 0.2$ to obtain robust inference because of the excessive amount of contamination. Another interesting case is LBM, where the p-values obtained from the full data and the outlier-removed data are almost the same except for the classical Wald test at $α = 0$; the corresponding p-values obtained by the Wald test are 0.000066 and 0.065, respectively. Thus, here also the inference at the 5% significance level changes because of the outlier for the Wald test, but the proposed MDPDE-based tests at $α > 0$ yield the more reasonable inference of failing to reject the hypothesis even in the presence of outliers. These observations further support our claimed robustness of the proposed MDPDE-based Wald-type tests.

Figure 6. — P-values obtained for different hypotheses testing problems for AIS data using the MDPDE-based Wald-type tests for the full data (solid line) and the outlier-deleted data (dotted line). (a) Variable: HC; $H_{0} : γ = - 1.8$ . (b) Variable: WCC; $H_{0} : γ = 1.7$ . (c) Variable: LBM; $H_{0} : γ = 2$ . (d) Variable: PFC; $H_{0} : σ = 57$ . (e) Variable: BMI; $H_{0} : σ = 4$ . (f) Variable: Wt; $H_{0} : μ = 72$ .
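To make the single-parameter hypotheses above concrete, the following Python sketch computes a Wald-type p-value from an estimate and its standard error. It assumes that, for a hypothesis fixing one parameter, the Wald-type statistic reduces to the squared standardized difference and is asymptotically chi-square with one degree of freedom under the null, as is standard for Wald-type tests. The function name `wald_type_pvalue` and the numerical inputs are hypothetical and are not taken from the AIS analysis.

```python
import math


def wald_type_pvalue(estimate, std_error, null_value):
    """Wald-type statistic and p-value for H0: parameter = null_value.

    Assumes the statistic ((estimate - null_value) / std_error)^2 is
    asymptotically chi-square with 1 degree of freedom under H0.
    """
    w = ((estimate - null_value) / std_error) ** 2
    # Survival function of a chi-square(1) variable at w equals erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


# Hypothetical inputs: an estimate of 1.5 with standard error 0.25, tested against 2.0.
w, p = wald_type_pvalue(estimate=1.5, std_error=0.25, null_value=2.0)
print(f"W = {w:.3f}, p-value = {p:.4f}")
```

In an MDPDE-based analysis, the estimate and its standard error would be those obtained at the chosen α, so the same calculation can be repeated over α to trace curves like those in Figure 6.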

6.2. AIDS clinical trial data

Our second example is an AIDS clinical trial (ACTG 315) including 46 HIV-1 infected patients treated with a potent antiretroviral drug cocktail based on protease inhibitor and reverse transcriptase drugs (ritonavir, 3TC and AZT). During the study, the viral load, CD4 count (CD4) and CD8 count (CD8) were measured several times on different days from the start of the treatment (generally 4–10 measurements per patient). The corresponding data have been analyzed by several statisticians [39,58,59] and are available in the R package 'qrNLMM'. In particular, Castro et al. [14] fitted the skew-normal distribution to these data in a regression setting.

Here, we consider the variables CD4, CD8 and the logarithm of the viral load (LGVIRAL) measured on the second day after the start of the study for each patient. The corresponding histograms and the SN fits by the MLE are presented in Figure 7; clearly the distributions are skewed, but the MLE is unable to fit them properly for CD8 and LGVIRAL because of the outliers visible in the respective box-plots in the same figure. The MLE-based fits shown along with the histograms clearly fail to model the bulk of the data adequately because of a few outlying points. In particular, the SN distributions fitted by the MLE have a clearly different mode for both CD8 and LGVIRAL due to strong outlier effects.

Figure 7. — Histograms and box-plots of different variables, measured at day 2, from the ACTG 315 Data. The SN distributions fitted by the MLE are also shown along with the histograms. (a) CD4, (b) CD8, (c) LGVIRAL.

We next compute the proposed MDPDEs of the parameters of the fitted SN distribution for each of the three health measurements and compare them with the MLEs and the outlier deleted MLEs; the resulting estimates and their estimated standard errors are presented in Table 7. From the table, we can see that the MDPDEs at any $α > 0$ are very similar to the MLE for CD4, where there are no outliers in the data. For LGVIRAL, the MDPDEs with $α \geq 0.3$ produce robust results which are significantly different from the MLE and are close to the outlier deleted MLE. For CD8, however, the MDPDEs with $α \geq 0.1$ are all similar but significantly different from both the MLE and the outlier deleted MLE. To see which one provides the more robust fit, in Figure 8 we have plotted the fitted SN densities obtained by the MDPDE at $α = 0.5$, the MLE and the outlier deleted MLE, along with the histograms of CD8 and LGVIRAL. In both cases, the MDPDE seems to provide the best fit to the major bulk of the histogram, even better than the outlier deleted MLE. This suggests that there are other, masked outliers in the data which are not detectable by the usual box-plot technique, and hence illustrates the advantage of our proposed MDPDEs over outlier deletion methods in providing stable inference from contaminated data.

Table 7.

The parameter estimates (standard errors) for the SN distribution fitted to the three health measurements in the AIDS clinical trial data (measured at day 2), obtained through the MDPDEs at different α, the MLE (at $α = 0$) and the outlier deleted MLE. The number of outliers found for each measurement is reported below its name in the first column.

		α
Variable (Outlier)		0(MLE)	0.1	0.3	0.5	0.7	1.0	Outlier deleted MLE
CD4	μ	252.650	252.634	252.634	252.634	252.634	252.634	252.650
(0)		(1.538)	(1.729)	(1.954)	(2.177)	(2.280)	(2.470)	(1.538)
	σ	89.204	89.204	89.204	89.204	89.204	89.204	89.204
		(1.169)	(1.243)	(1.290)	(1.343)	(1.475)	(1.989)	(1.160)
	γ	−1.084	−1.057	−1.058	−1.059	−1.059	−1.059	−1.084
		(0.675)	(0.696)	(0.733)	(0.738)	(0.752)	(0.755)	(0.679)
CD8	μ	407.251	407.141	407.140	407.140	407.140	407.140	408.972
(2)		(1.605)	(1.736)	(1.804)	(2.063)	(2.160)	(2.708)	(1.284)
	σ	757.601	594.563	594.562	594.562	594.562	594.562	570.614
		(1.839)	(2.297)	(2.406)	(2.749)	(3.386)	(3.986)	(1.771)
	γ	109.581	9.739	9.738	9.738	9.738	9.738	87.762
		(1.291)	(2.264)	(2.370)	(2.442)	(2.751)	(3.216)	(0.790)
LGVIRAL	μ	5.374	4.548	4.592	4.598	4.597	4.595	4.558
(1)		(0.076)	(0.076)	(0.078)	(0.086)	(0.105)	(0.278)	(0.053)
	σ	0.902	0.713	0.675	0.642	0.630	0.620	0.510
		(0.076)	(0.079)	(0.085)	(0.096)	(0.117)	(0.166)	(0.055)
	γ	−1.231	1.222	1.271	1.380	1.485	1.609	1.679
		(0.160)	(0.305)	(0.326)	(0.330)	(0.331)	(0.927)	(0.344)

Figure 8. — The SN distributions fitted by the MDPDE at $α = 0.5$ (dashed line), the MLE (solid line) and the outlier deleted MLE (dotted line), along with the histograms of the measurements LGVIRAL and CD8 in AIDS clinical trial data. (a) CD8, (b) LGVIRAL.

Finally, as in the previous example, we have observed here also that the MDPDEs with $α \geq 0.3$ are extremely stable in the presence and absence of the outliers, and produce robust inference for any parametric hypothesis testing problem for these clinical trial data as well. We therefore omit these numbers for brevity.

7. Concluding remarks

In this paper, we have discussed new robust inference procedures, based on the popular minimum DPD approach, for the SN distribution, which is useful in modelling noisy skewed data. The minimum DPD estimators of the SN parameters are described along with their asymptotic and robustness properties, and two efficient computational algorithms have been proposed. We then discuss the robust testing procedure through the MDPDE-based Wald-type tests and their properties, with detailed illustrations for testing symmetry against SN alternatives. The usefulness of the SN distribution and the proposed robust inference in the context of health data analysis have been argued and illustrated empirically.

The proposed approach would also be helpful to develop an outlier detection algorithm under the Skew-normal distributions. As per the knowledge of the authors, there is little existing literature on a formal way to define outliers for the skew-normal distributions; most of the available ones use the idea of adjusting the box-plot based on some non-parametric measure of skewness; see, for example [35–37,42,52]. Since we have demonstrated significantly superior performance of the proposed MDPDE of the skew-normal parameters compared to the existing non-parametric approaches, the use of these MDPDEs in adjusting the box-plot and the subsequent outlier detection is expected to yield much improved and stable inference. Considering the length and the content of the present paper, we have deferred the detailed discussions on such MDPDE-based outlier detection techniques for skew-normal distribution to a future work.

Another issue worth mentioning here is the choice of the robustness tuning parameter α in real-life applications. Although we have suggested some empirical choices for the tuning parameter α to be used in practice, more detailed future research along this line would be necessary to develop an algorithm for data-driven selection of α. One can extend the approach of [56] for this purpose with a few specific adjustments for the skew-normal distributions.

Further, the present work opens up several new directions in health research. The proposed methodology is a generalization of the MLE which can be extended to different inferential problems in health studies with skewed data to generate stable insight. For example, the immediate extensions would be robust inference under regression models for skewed responses or the comparison of different populations of skewed data. The latter can be used, e.g. in finding differential genes from expression data which are skewed in nature. One can also suitably extend the proposed MDPDE-based robust inference procedures to other recently developed parametric distributions, like the Skew-reflected Gompertz [32] and the Two-piece normal [41], which behave similarly to the skew-normal distribution. We hope to pursue some of these extensions in our future work.

Acknowledgements

The authors wish to thank the Editor and the two anonymous referees for their careful reading of the manuscript and several constructive suggestions which have significantly improved the paper. The research of the third author (AG) is partially supported by the INSPIRE Faculty Research Grant from Department of Science and Technology, Government of India, and the research of the first and third authors (AN and AG) is partially supported by a Start-up Research Grant from Indian Statistical Institute.

Appendix. The two non-parametric estimators of skewness

A.1. QSN: A quartile-based estimator of skew-normal parameters

A quartile-based (non-parametric) estimator for the parameters $(μ, σ, γ)$ of an SN distribution was recently proposed in [37]; we will refer to it as the QSN estimator. For a given sample $X_{1}, \dots, X_{n}$ from an SN distribution, we estimate its parameters sequentially as follows.

First, estimate the skewness parameter γ from the empirical skewness measure as
$\hat{γ} = sk (X_{1}, \dots, X_{n}) = \frac{Q_{3} - Q_{2}}{Q_{2} - Q_{1}},$
where $Q_{1}$, $Q_{2}$ and $Q_{3}$ denote the first, second and third quartiles, respectively, of the observed sample.
Next, based on $\hat{γ}$, we estimate the parameter σ by
$\hat{σ} = \frac{Q_{3} - Q_{1}}{q_{3}^{'} - q_{1}^{'}},$
where $q_{1}^{'}$ and $q_{3}^{'}$ denote the first and third quartiles of the $SN (0, 1, \hat{γ})$ distribution.
Finally, based on $\hat{γ}$ and $\hat{σ}$, we estimate the parameter μ by
$\hat{μ} = Q_{2} - q_{2}^{″},$
where $q_{2}^{″}$ is the median of the $SN (0, \hat{σ}, \hat{γ})$ distribution.
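The three steps above can be sketched in Python as follows. This is only a minimal illustration that follows the steps literally as stated in this appendix (including taking the quartile ratio directly as $\hat{γ}$); it is not the implementation of [37]. The helper names (`sn_pdf`, `sn_cdf`, `sn_quantile`, `qsn_estimate`), the use of the standard SN density $2 φ(z) Φ(γ z)$, the numerical integration settings, the sample-quartile convention of Python's `statistics.quantiles`, and the example sample are all assumptions made for illustration.

```python
import math
import statistics


def sn_pdf(z, gamma):
    """Density of SN(0, 1, gamma): 2 * phi(z) * Phi(gamma * z)."""
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * math.erfc(-gamma * z / math.sqrt(2.0))
    return 2.0 * phi * big_phi


def sn_cdf(x, gamma, lower=-12.0, n=1000):
    """CDF of SN(0, 1, gamma) by composite Simpson integration (n must be even)."""
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = sn_pdf(lower, gamma) + sn_pdf(x, gamma)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * sn_pdf(lower + k * h, gamma)
    return total * h / 3.0


def sn_quantile(p, gamma):
    """Quantile of SN(0, 1, gamma) by bisection on the CDF."""
    a, b = -12.0, 12.0
    for _ in range(50):
        mid = 0.5 * (a + b)
        if sn_cdf(mid, gamma) < p:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)


def qsn_estimate(sample):
    """Return (mu_hat, sigma_hat, gamma_hat) following the three QSN steps as stated."""
    q1, q2, q3 = statistics.quantiles(sample, n=4)  # sample quartiles
    if q2 == q1:
        raise ValueError("Q2 equals Q1; the quartile ratio is undefined")
    gamma_hat = (q3 - q2) / (q2 - q1)                       # step 1
    sigma_hat = (q3 - q1) / (sn_quantile(0.75, gamma_hat)
                             - sn_quantile(0.25, gamma_hat))  # step 2
    # Median of SN(0, sigma, gamma) is sigma times the median of SN(0, 1, gamma).
    mu_hat = q2 - sigma_hat * sn_quantile(0.5, gamma_hat)   # step 3
    return mu_hat, sigma_hat, gamma_hat


# Hypothetical right-skewed sample, used only to exercise the function.
example = [1, 2, 3, 4, 5, 7, 9, 12]
mu_hat, sigma_hat, gamma_hat = qsn_estimate(example)
print(mu_hat, sigma_hat, gamma_hat)
```

Because every step depends on the data only through the sample quartiles, a few extreme observations in the tails leave the estimates essentially unchanged.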

A.2. MC: A non-parametric estimator of the skewness parameter

We consider another non-parametric estimator of the skewness parameter γ in our set-up obtained from a robust measure of skewness known as medcouple (MC) [12], which we will refer to as the MC estimator of γ. For a given sample $X_{1}, \dots, X_{n}$ , the skewness measure MC is defined as

$MC = \underset{X_{i} \leq Q_{2} \leq X_{j}}{\mathrm{median}} \, h (X_{i}, X_{j}),$

where $Q_{2}$ denotes the sample median and the kernel function h is given by

$h (X_{i}, X_{j}) = \frac{(X_{j} - Q_{2}) - (Q_{2} - X_{i})}{X_{j} - X_{i}}, \quad \text{for all } X_{i} \neq X_{j}.$

It clearly follows from the above definition that the MC always lies between $- 1$ and $+ 1$. A positively skewed distribution yields a positive value of MC, whereas MC becomes negative for a negatively skewed distribution. It becomes zero for symmetric distributions.
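The definition of MC can be illustrated with a direct (quadratic-time) computation over all admissible pairs. The Python sketch below is a naive illustration only: the function name `medcouple_naive` and the example samples are hypothetical, pairs with $X_{i} = X_{j}$ are simply skipped as in the kernel definition above, and no special treatment of observations tied with the median is attempted.

```python
import statistics


def medcouple_naive(sample):
    """Naive O(n^2) medcouple: median of h(x_i, x_j) over x_i <= Q2 <= x_j, x_i != x_j."""
    q2 = statistics.median(sample)
    left = [x for x in sample if x <= q2]
    right = [x for x in sample if x >= q2]
    kernel_values = [
        ((xj - q2) - (q2 - xi)) / (xj - xi)
        for xi in left
        for xj in right
        if xi != xj  # the kernel is defined only for distinct points
    ]
    # Raises statistics.StatisticsError if no admissible pair exists (all values equal).
    return statistics.median(kernel_values)


mc_symmetric = medcouple_naive([1, 2, 3, 4])       # symmetric sample
mc_right_skew = medcouple_naive([1, 2, 3, 4, 10])  # long right tail
print(mc_symmetric, mc_right_skew)
```

For the symmetric sample the kernel values are balanced around zero, while the long right tail in the second sample pushes the median of the kernel values above zero, in line with the sign behavior described above.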

We can easily obtain an estimate, the MC estimate, of the skewness parameter γ of our SN distribution from this MC via appropriate transformation. More specifically, we equate the value MC to the corresponding population skewness parameter to get an estimate of $δ = \frac{γ}{\sqrt{1 + γ^{2}}}$ as

$\hat{δ} = \sqrt{\frac{π}{2}} \, \frac{| MC |^{2/3} \, \mathrm{sign} (MC)}{| MC |^{2/3} + ((4 - π) / 2)^{2/3}}.$

From this $\hat{δ}$ , the MC estimate of γ is then given by

$\hat{γ} = \frac{\hat{δ}}{\sqrt{1 - \hat{δ}^{2}}}.$
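The final step simply inverts the relation $δ = γ / \sqrt{1 + γ^{2}}$, which is well defined only for $| δ | < 1$. A minimal Python sketch of the two maps is given below; the function names `delta_from_gamma` and `gamma_from_delta` and the test value are hypothetical, chosen only to check that the maps are inverses of each other.

```python
import math


def delta_from_gamma(gamma):
    """Map the skewness parameter gamma to delta = gamma / sqrt(1 + gamma^2)."""
    return gamma / math.sqrt(1.0 + gamma ** 2)


def gamma_from_delta(delta):
    """Invert the map: gamma = delta / sqrt(1 - delta^2), valid only for |delta| < 1."""
    if abs(delta) >= 1.0:
        raise ValueError("delta must lie strictly between -1 and 1")
    return delta / math.sqrt(1.0 - delta ** 2)


# Round trip on a hypothetical value of gamma.
gamma_value = 2.0
delta_value = delta_from_gamma(gamma_value)
gamma_back = gamma_from_delta(delta_value)
print(delta_value, gamma_back)
```

The guard on $| δ |$ matters in practice, since any estimate of δ at or beyond the boundary has no finite counterpart γ.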

Funding Statement

The research of the third author (A. G.) is partially supported by the INSPIRE Faculty Research Grant from Department of Science and Technology, Government of India, and the research of the first and third authors (A. N. and A. G.) is partially supported by a Start-up Research Grant from Indian Statistical Institute.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

1. Azzalini A., A class of distributions which includes the normal ones, Scand. J. Stat. 12 (1985), pp. 171–178.
2. Azzalini A., Further results on a class of distributions which includes the normal ones, Statistica 46 (1986), pp. 199–208.
3. Azzalini A., The skew-normal distribution and related multivariate families, Scand. J. Stat. 32 (2005), pp. 159–188.
4. Azzalini A., Skew-symmetric families of distributions, in International Encyclopedia of Statistical Science, M. Lovric, ed., Springer, Berlin Heidelberg, 2011, pp. 1344–1346.
5. Azzalini A. and Regoli G., The work of Fernando de Helguero on non-normality arising from selection, Chilean J. Stat. 3 (2012).
6. Bandyopadhyay D., Lachos V.H., Abanto-Valle C.A. and Ghosh P., Linear mixed models for skew-normal/independent bivariate responses with an application to periodontal disease, Stat. Med. 29 (2010), pp. 2643–2655.
7. Barzilai J. and Borwein J.M., Two-point step size gradient methods, IMA J. Numer. Anal. 8 (1988), pp. 141–148.
8. Basso R.M., Lachos V.H., Cabral C.R.B. and Ghosh P., Robust mixture modeling based on scale mixtures of skew-normal distributions, Comput. Stat. Data Anal. 54 (2010), pp. 2926–2941.
9. Basu A., Harris I.R., Hjort N.L. and Jones M.C., Robust and efficient estimation by minimising a density power divergence, Biometrika 85 (1998), pp. 549–559.
10. Basu A., Mandal A., Martin N. and Pardo L., Generalized Wald-type tests based on minimum density power divergence estimators, Statist. 50 (2016), pp. 1–26.
11. Basu A., Shioya H. and Park C., Statistical Inference: The Minimum Distance Approach, Chapman and Hall/CRC, Boca Raton, FL, 2011.
12. Brys G., Hubert M. and Struyf A., A robust measure of skewness, J. Comput. Graph Stat. 13 (2004), pp. 996–1017.
13. Brys G., Hubert M. and Struyf A., A comparison of some new measures of skewness, in Developments in Robust Statistics, R. Dutter, P. Filzmoser, U. Gather, and P. J. Rousseeuw, eds., Physica, Heidelberg, 2003, pp. 98–113.
14. Castro L.M., Wang W.L., Lachos V.H., Inacio de Carvalho V. and Bayes C.L., Bayesian semiparametric modeling for HIV longitudinal data with censoring and skewness, Statist. Meth. Med. Res. 28 (2019), pp. 1457–1476.
15. Chudasama C., Shah S.M. and Panchal M., Comparison of parents selection methods of genetic algorithm for TSP, International Conference on Computer Communication and Networks, CSI-COMNET-2011, Proceedings 2011, pp. 85–87.
16. Crocetta C. and Loperfido N., Maximum likelihood estimation of correlation between maximal oxygen consumption and the 6-min walk test in patients with chronic heart failure, J. Appl. Stat. 36 (2009), pp. 1101–1108.
17. da Silva Ferreira C., Vilca F. and Bolfarine H., Diagnostics analysis for skew-normal linear regression models: applications to a quality of life dataset, Braz. J. Prob. Stat. 32 (2018), pp. 525–544.
18. Daly C.H., Higgins V., Adeli K., Grey V.L. and Hamid J.S., Reference interval estimation: methodological comparison using extensive simulations and empirical data, Clin. Biochem. 50 (2017), pp. 1145–1158.
19. Ghalani M.R. and Zadkarami M.R., Investigation of covariance structures in modelling longitudinal ordinal responses with skew normal random effect, Communic. Stat. Simul. Comput. 50 (2019), pp. 1–16.
20. Ghosh A., Robust inference under the beta regression model with application to health care studies, Statist. Meth. Med. Res. 28 (2019), pp. 871–888.
21. Ghosh A. and Basu A., Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression, Electron. J. Stat. 7 (2013), pp. 2420–2456.
22. Ghosh A. and Basu A., Robust estimation for non-homogeneous data and the selection of the optimal tuning parameter: the DPD approach, J. Appl. Stat. 42 (2015), pp. 2056–2072.
23. Ghosh A. and Basu A., Robust and efficient parameter estimation based on censored data with stochastic covariates, Statist. 51 (2017), pp. 801–823.
24. Ghosh A., Basu A. and Pardo L., Robust Wald-type tests under random censoring with applications to clinical trial analyses, Preprint (2019), arXiv:1708.09695v2 [stat.ME].
25. Ghosh A., Mandal A., Martin N. and Pardo L., Influence analysis of robust Wald-type test, J. Mult. Anal. 147 (2016), pp. 102–126.
26. Giuntella O., Why does the health of immigrants deteriorate? Evidence from birth records, J. Health Econ. 54 (2017), pp. 1–16.
27. Goldberg D.E., Genetic Algorithm in Search, Optimization and Machine Learning, Addison-Wesley Longman Publishing Company Inc., Boston, 1989.
28. Gutman R. and Rubin D.B., Estimation of causal effects of binary treatments in unconfounded studies with one continuous covariate, Statist. Meth. Med. Res. 26 (2017), pp. 1199–1215.
29. Hampel F.R., Ronchetti E., Rousseeuw P.J. and Stahel W.A., Robust Statistics: The Approach Based on Influence Functions, John Wiley & Sons, New York, USA, 1986.
30. Hashimoto S., Robust estimation of skew-normal distribution with location and scale parameters via log-regularly varying functions, Int. J. Stat. Syst. 12 (2017), pp. 813–822.
31. Hinkley D.V., On power transformations to symmetry, Biometrika 62 (1975), pp. 101–111.
32. Hoseinzadeh A., Maleki M., Khodadadi Z. and Contreras-Reyes J.E., The Skew-Reflected-Gompertz distribution for analysing symmetric and asymmetric data, J. Comput. Appl. Math 349 (2019), pp. 132–141.
33. Hossain A. and Beyene J., Application of skew normal distribution for detecting differential expression to micro-RNA data, J. Appl. Stat. 42 (2015), pp. 477–491.
34. Huang C.Y. and Ku M.S., Asymmetry effect of particle size distribution on content uniformity and over-potency risk in low-dose solid drugs, J. Pharm. Sci. 99 (2010), pp. 4351–4362.
35. Hubert M. and Vandervieren E., An adjusted boxplot for skewed distributions, Comput. Stat. Data Anal. 52 (2008), pp. 5186–5201.
36. Hubert M. and Van der Veeken S., Outlier detection for skewed data, J. Chemom. 22 (2008), pp. 235–246.
37. Huh M-H. and Lee Y., Skew normal boxplots and outliers, Commun. Stat. Appl. Meth. 19 (2012), pp. 591–595.
38. Kim D. and Fessler J.A., Optimized first-order methods for smooth convex minimization, Math. Prog. 159 (2016), pp. 81–107.
39. Lachos V.H., Castro L.M. and Dey D.K., Bayesian inference in nonlinear mixed-effects models using normal independent distributions, Comput. Stat. Data Anal. 64 (2013), pp. 237–252.
40. Liu L., Strawderman R.L., Johnson B.A. and O'Quigley J.M., Analyzing repeated measures semi-continuous data, with application to an alcohol dependence study, Statist. Meth. Med. Res. 25 (2016), pp. 133–152.
41. Maleki M., Contreras-Reyes J.E. and Mahmoudi M.R., Robust mixture modelling based on two-piece scale mixtures of normal family, Axioms 8 (2019), pp. 38.
42. Meropi P., Bikos C. and George Z., Outlier detection in skewed data, Simul. Model Practice Theor. 87 (2018), pp. 191–209.
43. Ngunkeng G., Statistical analysis of skew normal distribution and its applications, Doctoral dissertation, Bowling Green State University, 2013.
44. Nurminen H., Ardeshiri T., Piche R. and Gustafsson F., Robust inference for state-space models with skewed measurement noise, IEEE Signal Proces. Lett. 22 (2015), pp. 1898–1902.
45. Partlett C., Asymmetry and other distributional properties in medical research data, Doctoral dissertation, University of Birmingham, 2015.
46. Robbins H. and Monro S., A stochastic approximation method, Ann. Math. Stat. 22 (1951), pp. 400–407.
47. Sengupta D., Choudhary P.K. and Cassey P., Modeling and Analysis of Method Comparison Data with Skewness and Heavy Tails, Ordered Data Analysis, Modeling and Health Research Methods, Springer, Cham, 2015, pp. 169–187.
48. Sivanandam S.N. and Deepa S.N., Introduction to Genetic Algorithm, Springer-Verlag, Berlin Heidelberg, 2008.
49. Smirnova E., Huzurbazar S. and Jafari F., PERFect: PERmutation filtering test for microbiome data, Biostatistics 20 (2018), pp. 615–631.
50. Smith V.A., Neelon B., Preisser J.S. and Maciejewski M.L., A marginalized two-part model for longitudinal semicontinuous data, Statist. Meth. Med. Res. 26 (2017), pp. 1949–1968.
51. Snyman J.A. and Wilke D.N., Practical Mathematical Optimization: Basic Optimization Theory and Gradient Based Algorithms, 2nd ed., Springer Optim. Appl., Springer, 2018, pp. 133.
52. Sun Y., Hering A.S. and Browning J.M., Robust bivariate error detection in skewed data with application to historical radiosonde winds, Environmetrics 28 (2017), pp. e2431.
53. Telford R.D. and Cunningham R.B., Sex, sport, and body-size dependency of hematology in highly trained athletes, Med. Sci. Sports Exerc. 23 (1991), pp. 788–794.
54. van den Hout A. and Matthews F.E., A piecewise-constant Markov model and the effects of study design on the estimation of life expectancies in health and ill health, Statist. Meth. Med. Res. 18 (2009), pp. 145–162.
55. Vandenberghe L., Fast Gradient Methods, Lecture notes for EE236C at UCLA, 2019.
56. Warwick J. and Jones M.C., Choosing a robustness tuning parameter, J. Stat. Comput. Simul. 75 (2005), pp. 581–588.
57. Wason J.M. and Mander A.P., The choice of test in phase II cancer trials assessing continuous tumour shrinkage when complete responses are expected, Statist. Meth. Med. Res. 24 (2015), pp. 909–919.
58. Wu L., A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies, J. Amer. Statist. Assoc. 97 (2002), pp. 955–964.
59. Wu H. and Ding A.A., Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials, Biometrics 55 (1999), pp. 410–418.
60. Xing D., Huang Y., Chen H., Zhu Y., Dagne G.A. and Baldwin J., Bayesian inference for two-part mixed-effects model using skew distributions, with application to longitudinal semicontinuous alcohol data, Statist. Meth. Med. Res. 26 (2017), pp. 1838–1853.
61. Yalçinkaya A., Şenoğlu B. and Yolcu U., Maximum likelihood estimation for the parameters of skew normal distribution using genetic algorithm, Swarm Evolut. Comput. 38 (2018), pp. 127–138.
62. Yiu S. and Tom B.D., Two-part models with stochastic processes for modelling longitudinal semicontinuous data: computationally efficient inference and modelling the overall marginal mean, Statist. Meth. Med. Res. 27 (2018), pp. 3679–3695.
63. Yuan Y., Step-sizes for the gradient method, AMS/IP Stud. Adv. Math. 42 (1999), pp. 785–805.
64. Zeller C.B., Cabral C.R. and Lachos V.H., Robust mixture regression modeling based on scale mixtures of skew-normal distributions, TEST 25 (2016), pp. 375–396.

[CIT0001] 1.Azzalini A., A class of distributions which includes the normal ones, Scand. J. Stat. 12 (1985), pp. 171–178. [Google Scholar]

[CIT0002] 2.Azzalini A., Further results on a class of distributions which includes the normal ones, Statistica 46 (1986), pp. 199–208. [Google Scholar]

[CIT0003] 3.Azzalini A., The skew-normal distribution and related multivariate families, Scand. J. Stat. 32 (2005), pp. 159–188. [Google Scholar]

[CIT0004] 4.Azzalini A., Skew-symmetric families of distributions, in International Encyclopedia Statistical Science, M. Lovric, ed., Springer, Berlin Heidelberg, 2011, pp. 1344–1346.

[CIT0005] 5.Azzalini A. and Regoli G., The work of Fernando de Helguero on non-normality arising from selection, Chilean J. Stat. 3 (2012). [Google Scholar]

[CIT0006] 6.Bandyopadhyay D., Lachos V.H., AbantoValle C.A. and Ghosh P., Linear mixed models for skewnormal/independent bivariate responses with an application to periodontal disease, Stat. Med. 29 (2010), pp. 2643–2655. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0007] 7.Barzilai J. and Borwein J.M., Two-point step size gradient methods, IMA J. Numer. Anal. 8 (1988), pp. 141–148. [Google Scholar]

[CIT0008] 8.Basso R.M., Lachos V.H., Cabral C.R.B. and Ghosh P., Robust mixture modeling based on scale mixtures of skew-normal distributions, Comput. Stat. Data Anal. 54 (2010), pp. 2926–2941. [Google Scholar]

[CIT0009] 9.Basu A., Harris I.R., Hjort N.L. and Jones M.C., Robust and efficient estimation by minimising a density power divergence, Biometrika 85 (1998), pp. 549–559. [Google Scholar]

[CIT0010] 10.Basu A., Mandal A., Martin N. and Pardo L., Generalized Wald-type tests based on minimum density power divergence estimators, Statist. 50 (2016), pp. 1–26. [Google Scholar]

[CIT0011] 11.Basu A., Shioya H. and Park C., Statistical Inference: The Minimum Distance Approach, Chapman and Hall/CRC, Boca Raton, FL, 2011. [Google Scholar]

[CIT0012] 12.Brys G., Hubert M. and Struyf A., A robust measure of skewness, J. Comput. Graph Stat. 13 (2004), pp. 996–1017. [Google Scholar]

[CIT0013] 13.Brys G., Hubert M. and Struyf A., A comparison of some new measures of skewness, in Developments in Robust Statistics, R. Dutter, P. Filzmoser, U. Gather, and P. J. Rousseeuw, eds., Physica, Heidelberg, 2003, pp. 98–113

[CIT0014] 14.Castro L.M., Wang W.L., Lachos V.H., Inacio de Carvalho V. and Bayes C.L., Bayesian semiparametric modeling for HIV longitudinal data with censoring and skewness, Statist Methods Med Res 28 (2019), pp. 1457–1476. [DOI] [PubMed] [Google Scholar]

[CIT0015] 15.Chudasama C., Shah S.M. and Panchal M., Comparison of parents selection methods of genetic algorithm for TSP, International Conference on Computer Communication and Networks, CSI-COMNET-2011, Proceedings 2011, pp. 85–87.

[CIT0016] 16.Crocetta C. and Loperfido N., Maximum likelihood estimation of correlation between maximal oxygen consumption and the 6-min walk test in patients with chronic heart failure, J. Appl. Stat. 36 (2009), pp. 1101–1108. [Google Scholar]

[CIT0017] 17.da Silva Ferreira C., Vilca F. and Bolfarine H., Diagnostics analysis for skew-normal linear regression models: applications to a quality of life dataset, Braz. J. Prob. Stat. 32 (2018), pp. 525–544. [Google Scholar]

[CIT0018] 18.Daly C.H., Higgins V., Adeli K., Grey V.L. and Hamid J.S., Reference interval estimation: methodological comparison using extensive simulations and empirical data, Clin. Biochem. 50 (2017), pp. 1145–1158. [DOI] [PubMed] [Google Scholar]

[CIT0019] 19.Ghalani M.R. and Zadkarami M.R., Investigation of covariance structures in modelling longitudinal ordinal responses with skew normal random effect, Communic. Stat. Simul. Comput. 50 (2019), pp. 1–16. [Google Scholar]

[CIT0020] 20.Ghosh A., Robust inference under the beta regression model with application to health care studies, Statist. Meth. Medical Res. 28 (2019), pp. 871–888. [DOI] [PubMed] [Google Scholar]

[CIT0021] 21.Ghosh A. and Basu A., Robust estimation for independent non-homogeneous observations using density power divergence with applications to linear regression, Electron. J. Stat. 7 (2013), pp. 2420–2456. [Google Scholar]

[CIT0022] 22.Ghosh A. and Basu A., Robust estimation for non-Homogeneous data and the selection of the optimal tuning parameter: the DPD approach, J. Appl. Stat. 42 (2015), pp. 2056–2072. [Google Scholar]

[CIT0023] 23.Ghosh A. and Basu A., Robust and efficient parameter estimation based on censored data with stochastic covariates, Statist. 51 (2017), pp. 801–823. [Google Scholar]

[CIT0024] 24.Ghosh A., Basu A. and Pardo L., Robust Wald-Type tests under random censoring with applications to clinical trial analyses, Preprint (2019), arXiv:1708.09695v2 [stat.ME] [DOI] [PubMed]

[CIT0025] 25.Ghosh A., Mandal A., Martin N. and Pardo L., Influence analysis of robust Wald-type test, J. Mult. Anal. 147 (2016), pp. 102–126. [Google Scholar]

[CIT0026] 26.Giuntella O., Why does the health of immigrants deteriorate? Evidence from birth records, J. Health Econ. 54 (2017), pp. 1–16. [DOI] [PubMed] [Google Scholar]

[CIT0027] 27.Goldberg D.E., Genetic Algorithm in Search, Optimization and Machine Learning, Addison-Wesley Longman Publishing Company Inc., Boston, 1989. [Google Scholar]

[CIT0028] 28.Gutman R. and Rubin D.B., Estimation of causal effects of binary treatments in unconfounded studies with one continuous covariate, Statist. Meth. Med. Res. 26 (2017), pp. 1199–1215. [DOI] [PMC free article] [PubMed] [Google Scholar]

[CIT0029] 29.Hampel F.R., Ronchetti E., Rousseeuw P.J. and Stahe l W.A.. Robust Statistics: The Approach Based on Influence Functions, John Wiley & Sons, New York, USA, 1986. [Google Scholar]

[CIT0030] 30. Hashimoto S., Robust estimation of skew-normal distribution with location and scale parameters via log-regularly varying functions, Int. J. Stat. Syst. 12 (2017), pp. 813–822.

[CIT0031] 31. Hinkley D.V., On power transformations to symmetry, Biometrika 62 (1975), pp. 101–111.

[CIT0032] 32. Hoseinzadeh A., Maleki M., Khodadadi Z. and Contreras-Reyes J.E., The Skew-Reflected-Gompertz distribution for analysing symmetric and asymmetric data, J. Comput. Appl. Math. 349 (2019), pp. 132–141.

[CIT0033] 33. Hossain A. and Beyene J., Application of Skew normal distribution for detecting differential expression to micro-RNA data, J. Appl. Stat. 42 (2015), pp. 477–491.

[CIT0034] 34. Huang C.Y. and Ku M.S., Asymmetry effect of particle size distribution on content uniformity and over-potency risk in low-dose solid drugs, J. Pharm. Sci. 99 (2010), pp. 4351–4362.

[CIT0035] 35. Hubert M., An adjusted boxplot for skewed distributions, Comput. Stat. Data Anal. 52 (2008), pp. 5186–5201.

[CIT0036] 36. Hubert M. and Van der Veeken S., Outlier detection for skewed data, J. Chemom. 22 (2008), pp. 235–246.

[CIT0037] 37. Huh M-H. and Lee Y., Skew normal boxplots and outliers, Commun. Stat. Appl. Meth. 19 (2012), pp. 591–595.

[CIT0038] 38. Kim D. and Fessler J.A., Optimized first-order methods for smooth convex minimization, Math. Prog. 159 (2016), pp. 81–107.

[CIT0039] 39. Lachos V.H., Castro L.M. and Dey D.K., Bayesian inference in nonlinear mixed-effects models using normal independent distributions, Comput. Stat. Data Anal. 64 (2013), pp. 237–252.

[CIT0040] 40. Liu L., Strawderman R.L., Johnson B.A. and O'Quigley J.M., Analyzing repeated measures semi-continuous data, with application to an alcohol dependence study, Statist. Meth. Med. Res. 25 (2016), pp. 133–152.

[CIT0041] 41. Maleki M., Contreras-Reyes J.E. and Mahmoudi M.R., Robust mixture modelling based on two-Piece scale mixtures of normal family, Axioms 8 (2019), pp. 38.

[CIT0042] 42. Meropi P., Bikos C. and George Z., Outlier detection in skewed data, Simul. Model Practice Theor. 87 (2018), pp. 191–209.

[CIT0043] 43. Ngunkeng G., Statistical analysis of skew normal distribution and its applications, Doctoral dissertation, Bowling Green State University, 2013.

[CIT0044] 44. Nurminen H., Ardeshiri T., Piche R. and Gustafsson F., Robust inference for state-space models with skewed measurement noise, IEEE Signal Proces. Lett. 22 (2015), pp. 1898–1902.

[CIT0045] 45. Partlett C., Asymmetry and other distributional properties in medical research data, Doctoral dissertation, University of Birmingham, 2015.

[CIT0046] 46. Robbins H. and Monro S., A stochastic approximation method, Ann. Math. Stat. 22 (1951), pp. 400–407.

[CIT0047] 47.Sengupta D., Choudhary P.K. and Cassey P., Modeling and Analysis of Method Comparison Data with Skewness and Heavy Tails, Ordered Data Analysis, Modeling and Health Research Methods, Springer, Cham, 2015, pp. 169–187.

[CIT0048] 48. Sivanandam S.N. and Deepa S.N., Introduction to Genetic Algorithms, Springer-Verlag, Berlin Heidelberg, 2008.

[CIT0049] 49. Smirnova E., Huzurbazar S. and Jafari F., PERFect: PERmutation filtering test for microbiome data, Biostatistics 20 (2018 Jun 18), pp. 615–631.

[CIT0050] 50. Smith V.A., Neelon B., Preisser J.S. and Maciejewski M.L., A marginalized two-part model for longitudinal semicontinuous data, Statist. Meth. Med. Res. 26 (2017), pp. 1949–1968.

[CIT0051] 51. Snyman J.A. and Wilke D.N., Practical Mathematical Optimization: Basic Optimization Theory and Gradient-Based Algorithms, 2nd ed., Springer Optim. Appl., Springer, 2018, pp. 133.

[CIT0052] 52. Sun Y., Hering A.S. and Browning J.M., Robust bivariate error detection in skewed data with application to historical radiosonde winds, Environmetrics 28 (2017), pp. e2431.

[CIT0053] 53. Telford R.D. and Cunningham R.B., Sex, sport, and body-size dependency of hematology in highly trained athletes, Med. Sci. Sports Exerc. 23 (1991), pp. 788–794.

[CIT0054] 54. van den Hout A. and Matthews F.E., A piecewise-constant Markov model and the effects of study design on the estimation of life expectancies in health and ill health, Statist. Meth. Med. Res. 18 (2009), pp. 145–162.

[CIT0055] 55. Vandenberghe L., Fast Gradient Methods, Lecture notes for EE236C at UCLA, 2019.

[CIT0056] 56. Warwick J. and Jones M.C., Choosing a robustness tuning parameter, J. Stat. Comput. Simul. 75 (2005), pp. 581–588.

[CIT0057] 57. Wason J.M. and Mander A.P., The choice of test in phase II cancer trials assessing continuous tumour shrinkage when complete responses are expected, Statist. Meth. Med. Res. 24 (2015), pp. 909–919.

[CIT0058] 58. Wu L., A joint model for nonlinear mixed-effects models with censoring and covariates measured with error, with application to AIDS studies, J. Amer. Statist. Assoc. 97 (2002), pp. 955–964.

[CIT0059] 59. Wu H. and Ding A.A., Population HIV-1 dynamics in vivo: applicable models and inferential tools for virological data from AIDS clinical trials, Biometrics 55 (1999), pp. 410–418.

[CIT0060] 60. Xing D., Huang Y., Chen H., Zhu Y., Dagne G.A. and Baldwin J., Bayesian inference for two-part mixed-effects model using skew distributions, with application to longitudinal semicontinuous alcohol data, Statist. Meth. Med. Res. 26 (2017), pp. 1838–1853.

[CIT0061] 61. Yalçınkaya A., Şenoğlu B. and Yolcu U., Maximum likelihood estimation for the parameters of skew normal distribution using genetic algorithm, Swarm Evolut. Comput. 38 (2018), pp. 127–138.

[CIT0062] 62. Yiu S. and Tom B.D., Two-part models with stochastic processes for modelling longitudinal semicontinuous data: computationally efficient inference and modelling the overall marginal mean, Statist. Meth. Med. Res. 27 (2018), pp. 3679–3695.

[CIT0063] 63. Yuan Y., Step-sizes for the gradient method, AMS/IP Stud. Adv. Math. 42 (1999), pp. 785–805.

[CIT0064] 64. Zeller C.B., Cabral C.R. and Lachos V.H., Robust mixture regression modeling based on scale mixtures of skew-normal distributions, TEST 25 (2016), pp. 375–396.
