Abstract
An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed. The test can be viewed as a Neyman smooth test and it only relies on the standardized residual vector. If the global procedure indicates a violation of at least one of the assumptions, the components of the global test statistic can be utilized to gain insights into which assumptions have been violated. The procedure can also be used in conjunction with associated deletion statistics to detect unusual observations. Simulation results are presented indicating the sensitivity of the procedure in detecting model violations under a variety of situations, and its performance is compared with three potential competitors, including a procedure based on the Box-Cox power transformation. The procedure is demonstrated by applying it to a new car mileage data set and a water salinity data set that has been used previously to illustrate model diagnostics.
Keywords: Box-Cox transformation, Deletion statistics, Model diagnostics and validation, Neyman smooth test, Outlier detection, Score test
1 Linear Model and its Assumptions
One of the most important models in Statistics is the linear model, in which the relationship between an observable n × 1 response vector Y and an observable n × p design matrix X of predictor variables is given by
Y = Xβ + σε,  (1)
where β is a p × 1 vector of unknown coefficients, σ is an unknown scale parameter, and ε is an n × 1 vector of unobservable error variables. Conditionally on X, ε has a multivariate normal distribution with mean vector 0 and covariance matrix I, the n × n identity matrix. This distributional assumption, together with the linear link specification in (1), are enumerated as four distinct assumptions: (A1) (Linearity) E{Yi|X} = xiβ, where xi is the ith row of X; (A2) (Homoscedasticity) Var{Yi|X} = σ2, i = 1, 2, …, n; (A3) (Uncorrelatedness) Cov{Yi, Yj|X} = 0, (i ≠ j); and (A4) (Normality) (Y1, Y2, …, Yn)|X have a multivariate normal distribution. Assumptions (A3) and (A4) imply that, given X, Yi, i = 1, 2, …, n, are independent normal random variables. Without loss of generality, we assume that X is of full rank with n > p, so rank(X) = p. Under (A1)–(A4), the maximum likelihood (ML) estimators of β and σ2 are given, respectively, by
b = (XtX)−1XtY and s2 = Yt(I − P[X])Y/n,  (2)
where P[X] = H = X(XtX)−1Xt is the projection operator on the linear subspace generated by the columns of X. The estimator b in (2) is also the least-squares (LS) estimator of β. The usual procedures for constructing confidence ellipsoids/intervals and for testing hypotheses for β and σ2 rely on the validity of (A1)–(A4). The consequences of the breakdown of any of these four assumptions are well-known, and possible remedial measures such as variable transformations, weighted regression, incorporating additional predictor variables and, if need be, the adoption of nonparametric methods, have also been discussed (see, for example, Neter, Kutner, Nachtsheim, and Wasserman 1996).
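As a minimal illustration of (2) (not part of the original development), the estimates can be computed directly in R; the design matrix and data below are arbitrary placeholders.

```r
# Minimal sketch of the LS/ML estimates in (2); X and Y below are arbitrary placeholders.
set.seed(1)
n <- 50
X <- cbind(1, runif(n))                    # n x p design matrix with an intercept column
Y <- X %*% c(2, 1) + rnorm(n)              # data satisfying (A1)-(A4)

H  <- X %*% solve(t(X) %*% X) %*% t(X)     # projection (hat) matrix P[X]
b  <- solve(t(X) %*% X, t(X) %*% Y)        # LS/ML estimate of beta
s2 <- sum((Y - X %*% b)^2) / n             # ML estimate of sigma^2 (divisor n, not n - p)
```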
Assessment of whether assumptions (A1)–(A4) are satisfied, based on the data (Y, X), has received considerable attention. Assessment procedures typically involve the standardized residuals R, herein defined according to
R = (Y − Xb)/s,  (3)
There are other types of residuals that have been used in model validation and diagnostics. The studentized residuals are R′ = (R′1, …, R′n) with R′i = Ri/√(1 − hii), where hii is the ith diagonal element of H. Other residuals are Theil's (1965) best linear unbiased scalar covariance (BLUS) residuals as well as recursive or sequential residuals; see the review by Kianifard and Swallow (1996). We focus our attention on R, as this is the residual vector that naturally arises from our theoretical development.
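For concreteness, a small R sketch contrasting the two residual types; the scaling of R by the ML estimate s is our reading of (3) and should be checked against the definition there.

```r
# Standardized (ML-scaled) and studentized residuals for an lm fit;
# scaling by the ML estimate s is an assumed reading of definition (3).
set.seed(2)
dat <- data.frame(x = runif(30))
dat$y <- 1 + 2 * dat$x + rnorm(30)
fit <- lm(y ~ x, data = dat)

e   <- resid(fit)                          # raw residuals
s   <- sqrt(sum(e^2) / length(e))          # ML scale estimate
R   <- e / s                               # standardized residuals R
hii <- hatvalues(fit)                      # leverages h_ii
Rp  <- R / sqrt(1 - hii)                   # studentized residuals R'
```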
Important work in assessing the model assumptions includes Tukey (1949) for assessing (A1); Durbin and Watson (1950, 1951) for assessing (A3); and Anscombe (1961) and Anscombe and Tukey (1963) for assumptions (A4) and (A2). Many of these methods are summarized and discussed in Cook and Weisberg (1982) and Atkinson (1985). It should be noted that the residuals are not independent and may have different variances even if (A1)–(A4) hold, in contrast to the i.i.d. structure of ε = (Y − Xβ)/σ, which is the counterpart of R when the model parameters β and σ are known. The impact of substituting estimators for the unknown parameters to obtain the residuals, in particular the non-negligible change in the distributional properties of the residuals even in large samples, has been duly noted; cf., Durbin and Watson (1950), Anscombe and Tukey (1963), Theil (1965), and Atkinson (1985).
In the assessment of (A1)–(A4) through graphical methods, the impact of the aforementioned substitution is mostly ignored, potentially giving rise to inaccurate assessment. Moreover, even apart from this issue, the interpretation of graphical methods is highly subjective, for though a picture is worth a thousand words, beauty is in the eye of the beholder. Furthermore, a particular plot is used to assess a specific assumption, and sometimes the synergistic impact of combinations of violations of (A1)–(A4) is not clear. It is therefore beneficial to augment these plots with a numerical measure of the degree to which (A1)–(A4) are violated.
Formal significance tests for (A1)–(A4) involve testing the null hypothesis (H0) versus the alternative hypothesis (H1), where
H0: assumptions (A1), (A2), (A3), and (A4) all hold, versus H1: at least one of the assumptions (A1)–(A4) does not hold.  (4)
The typical structure of such a test is to define a statistic S(R) whose sampling distribution is known under H0 and such that departures from H0 will manifest in terms of larger values of S(R). Given an observed residual vector R = r, one calculates the p-value via p = P{S(R) > S(r) | H0}, and the decision to reject H0 is based on the magnitude of p. However, existing formal significance tests are typically tests for a specific assumption, hence are not simultaneous or global tests for the four assumptions (A1)–(A4). For instance, there are tests for the normality assumption (cf., Anscombe and Tukey 1963); there are tests for link mis-specifications (cf., Tukey 1949); there are tests for heterogeneity of variances (cf., Cook and Weisberg 1983; Bickel 1978; Anscombe 1961); and there are tests for the uncorrelatedness or independence of the error components (cf., Durbin and Watson 1950, 1951; Theil and Nagar 1961). See also Kianifard and Swallow (1996) for procedures that use the recursive residuals for significance testing of the different assumptions. The difficulty with these tests is that each is designed to detect departures from one assumption, and the impact of violations of other assumptions on this test, as well as its sensitivity to these violations, are not apparent. Hence, when a specific test indicates a violation, it might be due to the violation of another assumption which affects this test. For example, a test for normality could be affected by a mis-specified link function or dependent error components. One may decide to perform tests for each of the different assumptions, but this will lead to an increase in the Type I error probability when the results of these tests are combined, though some corrective measure such as a Bonferroni adjustment could be implemented to alleviate this inflation. There is therefore a need to have a global test for all the assumptions (A1)–(A4) that controls the Type I error rate and which could be used especially if the analyst does not have an idea of which set of assumptions is violated. If such a test indicates that at least one of the assumptions is not satisfied, then directional tests may be used to determine the assumptions that have been violated. Knowing the set of assumptions that has been violated is important for instituting appropriate remedial measures, such as variable transformations, adjustments in the link function, utilizing lagged values, etc.
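As a quick numerical illustration of the Type I error inflation just mentioned, combining four independent 5%-level tests without adjustment gives a familywise error rate of about 18.5%, which a Bonferroni adjustment restores to roughly 5%:

```r
# Familywise Type I error when four independent 5%-level tests are combined,
# with and without a Bonferroni adjustment (illustrative arithmetic only).
alpha <- 0.05
1 - (1 - alpha)^4       # no adjustment: about 0.185
1 - (1 - alpha / 4)^4   # Bonferroni at alpha/4 per test: about 0.049
```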
In this paper we propose such a global test. An important consideration in our proposal is that the procedure should be simple and easy to implement, but at the same time should be theoretically justifiable. Our procedure is based on the residual vector R, and the theoretical development of the procedure relies on the idea of a smooth test (cf., Thomas and Pierce 1979; Rayner and Best 1986, 1989). The components of the global test can also be used as directional tests for determining the assumptions that have been violated. Because functions of R generally possess complicated distributional properties, asymptotic distributional properties for the global test are ascertained. For small sample sizes, computer-intensive methods may be employed to determine p-values. We also discuss deletion statistics based on the global statistic that can be used to identify outlying or influential observations. Moreover, the mathematical framework for the test procedure is quite general and allows for a broad class of tests to be generated by changing the embedding functions (see Section 3). However, with the goal of obtaining an easy-to-implement procedure, and to recover some currently-used directional tests, we have confined ourselves to a particular set of embedding functions. As a reviewer pointed out, a better procedure may arise via a different choice of embedding functions, but possibly at the cost of greater complexity. Even with this potential limitation, the performance of the proposed procedure is still commendable as seen in the simulation studies in Section 5 and the applications in Section 6.
There is a deeper foundational issue regarding model-building in relation to the validation of the model assumptions and the additional inferences that are made such as testing hypotheses, construction of confidence intervals about the regression parameters, or the prediction of future observations. For example, suppose that the linear model assumptions are validated through formal and/or graphical methods using the observed data, so this validation process is subject to error, and then regression parameters are estimated using the same data and via procedures derived under the linear model and its assumptions. How should one assess the properties of these estimators in light of this two-step process? There is a growing literature and ongoing active research on the more general, but related, area of inference after model selection; see for instance Hjort and Claeskens (2003), Claeskens and Hjort (2003), and Dukić and Peña (2005), and references in these papers. This is an important issue that needs to be addressed, but this paper focuses on formally validating the model assumptions.
The paper is organized as follows. Section 2 describes and discusses the global and the component statistics. The theoretical justification of the global procedure is presented in Section 3 where it is derived as a Neyman smooth test. The asymptotic normality and the asymptotic independence of the components are established in this section. Deletion statistics, obtained by excluding an observation from the analysis, are described in Section 4. Section 5 presents simulation studies that examined the properties of the procedures. Section 6 illustrates the applications of these procedures to two real data sets. Concluding thoughts are provided in Section 7.
2 Validation Procedures
We first present the tests in this section, and then provide theoretical justification in the next section. Henceforth, we assume that X has as its first column the n × 1 vector 1 = (1, 1, …, 1)t, so that we are incorporating an intercept term in model (1). Recalling that the ith component of the residual vector R is Ri = (Yi − Ŷi)/s, where Ŷi is the ith fitted value, the first three component statistics are as follows:
(5)
(6)
(7)
where, with for an n × q matrix Z, we define
(8)
The fourth component statistic requires a user-supplied n × 1 vector V, which by default is set to be the standardized time sequence V = (1, 2, …, n)t/n. It is defined via
(9)
with . The global test statistic is defined as
(10)
An appealing feature of this global statistic is that variants of the statistics , have been considered for significance testing purposes in earlier papers. For instance, statistics related to and have appeared in Anscombe and Tukey (1963), and a statistic related to has been considered by Cook and Weisberg (1983); Bickel (1978); Anscombe (1961) in the context of testing for heteroscedasticity. Statistic is related to test for additivity. One of the main contributions of this paper is combining these different directional statistics in a global statistic and determining its properties. We will see in ensuing sections that this combined global statistic serves as an omnibus statistic for globally testing all the assumptions of the linear model.
For large n, which for application purposes will be understood to mean that n − p ≥ 30, the global test for the hypotheses H0 versus H1 in (4) at an asymptotic significance level of α is:
Reject H0 whenever the global statistic in (10) exceeds the 100(1 − α)th percentile of the central chi-squared distribution with four degrees-of-freedom (df).  (11)
If the test in (11) leads to the rejection of H0, the component statistics could be examined by comparing their values to the corresponding chi-squared percentile with one df, or perhaps more appropriately to the critical values of the tests in (19) and (20), to get an indication of which particular assumption or assumptions have been violated. The following are rough guidelines in interpreting the values of these component statistics, with these guidelines suggested by the theoretical considerations to be presented in Section 3 and the simulation results in Section 5: (i) Skewed error distributions will usually be indicated by large values of the statistic ; (ii) Deviations of the kurtosis of the true error distribution from that of the normal distribution will generally be revealed by large values of the statistic ; (iii) The use of a misspecified link function, possibly due to the absence of other predictor variables in the model, will mostly be detected by large values of the statistic ; (iv) The presence of heteroscedastic errors and/or dependent errors will typically manifest in large values of the statistic ; and (v) Simultaneous violations of at least two of the assumptions (A1)–(A4) will be manifested by large values of several of these component statistics.
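Assuming the four component statistics have already been computed from (5)–(9), and taking the global statistic in (10) to be their sum (consistent with its four-df chi-squared reference distribution), the decision rule in (11) and the component-wise follow-up can be sketched in R as follows; the numerical values of g1–g4 below are placeholders.

```r
# Sketch of the global decision rule in (11); g1-g4 are placeholder values for
# the component statistics in (5)-(9), and the global statistic is taken as their sum.
g <- c(g1 = 1.2, g2 = 0.4, g3 = 5.1, g4 = 0.9)
G_global <- sum(g)

p_global  <- pchisq(G_global, df = 4, lower.tail = FALSE)  # global test against chi^2_4
p_comps   <- pchisq(g,        df = 1, lower.tail = FALSE)  # directional follow-up, chi^2_1 each
reject_H0 <- p_global < 0.05
```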
3 Theoretical Development of Procedure
From (1), if the true parameter values β and σ are known, we may refer to the error vector ε, perhaps with some abuse of terminology, as the vector of ‘true’ residuals R0. Thus R0 ≡ R0(σ2, β) = (Y − Xβ)/σ, which is therefore equal in distribution to the error vector ε. If H0 holds, then the density function of R0 is
fR0(r0) = φ(r01)φ(r02)⋯φ(r0n),
where φ(·) is the standard normal density function. Following the idea of constructing a ‘smooth’ test (cf., Thomas and Pierce 1979; Rayner and Best 1989), we embed fR0(r0) into a class of density functions, indexed by θ = (θ1, θ2, …, θ6)t, whose members are of the form
(12)
where with
The particular choice of the Qi(z; σ2, β) functions is motivated by our desire to recover commonly-used directional statistics. Other forms for the Qi(z; σ2, β) functions, such as trigonometric or wavelet functions, are certainly possible, and may lead to procedures with better properties. The function C(θ; σ2, β) in (12) is a proportionality constant that makes a density function. A straightforward calculation shows that this constant satisfies , where Z is a standard normal random variable. Notice that in the embedding class, the null hypothesis density function obtains when θ = 0. When β = (β1,β*t)t is fixed, this larger family, which does not depend on β1, is an exponential family of densities, hence possesses many of the nice properties intrinsic to exponential families. Furthermore, observe that if we allow for the case where β* = 0, then the larger model is not identifiable since (θ5 = 0, β* ≠ 0) and (θ5 ≠ 0, β* = 0) both lead to the same distribution. But since this model validation issue only becomes practically important in the presence of a ‘trend,’ which is the case where β* ≠ 0, then in the theoretical development we assume that β* resides in ℜp−1 \ {0}, which does not lead to any technical difficulties as this is still an open set in ℜp−1.
Let us first consider the case where β and σ2 are known, so R0 = R0(σ2, β) is observable. Within the embedding class of density functions specified by (12), the score test for versus is easily developed. The use of score tests in this situation is appealing because it is known that score tests are endowed with a “robustness of optimality” property, see Chen (1983, 1985) regarding this property, and Cox and Hinkley (1974) for a general discussion of score tests. In our setting, it is straightforward to see that the score test statistic at θ = 0 equals
Since, under the null hypothesis, the components of R0 are i.i.d. standard normal variables, then for any positive integer k, and , and so the covariance matrix of is
where , k = 2, 4. If, as n → ∞, the following conditions are satisfied:
There exists a nonsingular p × p matrix Σx such that
There exists a function Ω(β) such that
There exists a such that
and
then it follows from the Lindeberg-Feller Central Limit Theorem (CLT) that, under H0,
where
(13)
In this situation where β and σ2 are assumed known, notice the asymptotic dependence between the components Q1, Q3, and Q5, as well as between Q2 and Q4. An asymptotic α-level score test for versus rejects whenever
However, since σ2 and β are unknown, neither R0 nor are observable. There is therefore a need to use estimators for σ2 and β in R0(σ2, β), and by substituting the ML estimators s2 and b given in (2), respectively, we obtain the (estimated) residual vector R = R0(s2, b) given in (3). To develop a test based on R, we need the asymptotic distribution of Q(R; s2, b) under H0. Towards this goal, observe that the ML estimating equations for σ2 and β that give rise to s2 and b are
(14)
(15)
Augmenting the vector Q with A and B and invoking the Lindeberg-Feller CLT, we find that, under H0 together with the conditions guaranteeing the asymptotic normality of Q(R0(σ2, β); σ2, β) enumerated earlier,
(16)
where
with
and with μX and Γ(β) defined according to
By virtue of (14) and (15), when s2 and b are substituted for σ2 and β, respectively, the last two components in the augmented vector are both equal to zero. Consequently, it follows by multivariate normal theory, or it could be established more formally by relying on Pierce's (1982) result, that
where Ξ11.2 = Ξ11 − Ξ12Ξ22−1Ξ21. To provide a simplified form for this limiting covariance matrix, we establish the following intermediate result.
Lemma 1 If the first column of X is 1, then
Proof: Write X = [1 W], so that XtX may be partitioned accordingly. Applying the partitioned matrix inverse theorem (cf., Anderson 1984, Th. A.3.3), we obtain
Since μX = (1 μW), then the assertion immediately follows by matrix multiplication.||
By straightforward multiplication, and applying Lemma 1, we obtain
(17)
with The matrix Δ(σ2, β) is the correction factor in the limiting covariance matrix arising from plugging-in s2 and b for σ2 and β, respectively. This factor is clearly non-negligible. Finally, from (13) and (17), a simplified form of Ξ11.2 is
(18)
where We formally state this asymptotic result as a theorem.
Theorem 1 If assumptions (A1)–(A4) hold for the linear model in (1) with X having as its first column the vector 1, and if conditions (a)–(e) enumerated earlier hold, then n−½Q(R; s2, b) converges in distribution to a zero-mean normal distribution with covariance matrix Ξ11.2 given in (18).
Note the invariance of this asymptotic result to re-scaling, that is, the result is independent of σ. This is a consequence of the facts that the model is scale-invariant and the residual vector is scale-equivariant. The theorem also indicates that Q1(R; s2, b) and Q2(R; s2, b) are degenerate at zero, hardly a surprise since these quantities are the estimating functions for σ2 and β. What is surprising, instead, is the asymptotic independence of Q3(R; s2, b) and Q5(R; s2, b), since as noted earlier, Q3(R0; σ2, β) and Q5(R0; σ2, β) are not asymptotically independent. Thus, interestingly and unexpectedly, replacing the unknown parameters by their ML estimators in the quantities Q(R0(σ2, β); σ2, β) made all the components asymptotically independent!
The quantities Ω(β), ΣX, and Γ(β) can be consistently estimated by their empirical counterparts and with β replaced by b. Their respective estimators are those given in (8), and so we are able to obtain a consistent estimator of Ξ11.2. The score statistic for testing versus , with σ2 and β considered as nuisance parameters, is the quadratic form of with quadratic matrix , where, for a matrix M, M− denotes a generalized inverse. It is immediate to see that this statistic is
where the component statistics and the global statistic are as defined in (5), (6), (7), (9), and (10), respectively. Theorem 1 therefore justifies the use of the chi-squared distribution with four df for assessing the magnitude of , as well as the one df chi-squared distributions for each of the component statistics.
Before proceeding, we mention three possible competing test procedures to the -based test. These competing tests are also included in the simulation studies. The first is to perform simultaneous testing using the test statistics but incorporating a Bonferroni adjustment. By virtue of the asymptotic results above, this test is as follows:
Reject H0 whenever Gmax, the maximum of the four component statistics, exceeds the 100(1 − α/4)th percentile of the chi-squared distribution with one df.  (19)
This amounts to rejecting H0 if at least one of the unidirectional tests rejects H0 at level of significance of α/4.
The second competing test arises by recognizing that, under H0 and by invoking the asymptotic independence of the component statistics, the asymptotic distribution of the test statistic Gmax in (19) is that of the maximum of four independent chi-squared random variables, each with one df. As a consequence, an asymptotic α-level test of H0 is provided by:
Reject H0 whenever Gmax exceeds the 100[(1 − α)1/4]th percentile of the chi-squared distribution with one df.  (20)
When α = .05, the critical values of the tests in (19) and (20) equal 6.239 and 6.205, respectively. This explains the almost identical behaviors of these two tests observed in the simulation studies (see Section 5).
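Both critical values can be reproduced from the chi-squared quantile function with one df:

```r
# Critical values at alpha = .05 for the tests in (19) and (20).
alpha <- 0.05
qchisq(1 - alpha / 4,       df = 1)   # 6.239: Bonferroni test (19)
qchisq((1 - alpha)^(1 / 4), df = 1)   # 6.205: max of four independent chi^2_1, test (20)
```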
The third competitor, referred to in Section 5 as BoxCox, is the use of the Box and Cox (1964) power transformation. The idea is to fit the linear model to the transformed responses, with the transformation being
Y(γ) = (Yγ − 1)/γ for γ ≠ 0, and Y(γ) = log(Y) for γ = 0,  (21)
where γ is the transformation parameter and γ̂ is its ML estimate. The null hypothesis H0 is then rejected if the hypothesis γ = 1, that is, that no transformation is needed, is rejected. The test for γ = 1 utilized a likelihood-ratio test, with the numerical implementation relying on the function boxcox in the MASS package of Venables and Ripley for the R language (Ihaka and Gentleman 1996).
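A sketch of this competitor as we read it: profile the Box-Cox log-likelihood with MASS::boxcox and reject when the likelihood-ratio statistic for γ = 1 exceeds the one-df chi-squared critical value. The data frame and formula below are placeholders.

```r
# Box-Cox competitor: likelihood-ratio test of gamma = 1 via the profile
# likelihood from MASS::boxcox. 'dat' and the formula are placeholders; the
# response must be positive for the transformation to be defined.
library(MASS)

set.seed(3)
dat <- data.frame(x = runif(100))
dat$y <- exp(1 + 2 * dat$x + rnorm(100) / 4)

bc    <- boxcox(y ~ x, data = dat, lambda = seq(-2, 3, 0.01), plotit = FALSE)
llmax <- max(bc$y)                          # profile log-likelihood at the ML estimate of gamma
ll1   <- bc$y[which.min(abs(bc$x - 1))]     # profile log-likelihood at gamma = 1
LR    <- 2 * (llmax - ll1)
reject_H0 <- LR > qchisq(0.95, df = 1)      # reject the linear-model H0 when gamma = 1 is rejected
```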
4 Deletion Statistics
It is important to accompany assessment of model assumptions with investigation for unusual observations (either outlying or influential), since such observations could impact inferences regarding model validity. Unusual observations may arise either as a consequence of model violations or as rare outcomes when the data in fact adhere to the model. In the first case, exclusion of the unusual observations from analysis may have little impact because the global test remains sensitive to violations in the remaining data, or may lead to a nonsignificant global test because the excluded observations aided detection of violations substantially. In the second case, when the data meet model assumptions apart from rare exceptions, unusual observations may cause the global test to indicate violations, so that their deletion would then permit the procedure to reflect the adherence of the remaining data to the model. In any case, unusual observations should be handled with caution, and solid justification is required for their exclusion, as in the examples in Section 6.
A natural -based procedure for detecting unusual observations arises from the well-known idea of deletion statistics, which reflect the change in values of statistics after the deletion of an observation. For a statistic T, denote by T[i] the value of the statistic after the ith observation is deleted. We will be interested in the quantities
(22)
which represent the percent relative change in the value of the global statistic after the deletion of the ith observation. The idea is that an observation with a large absolute value of this deletion statistic is either an outlier or has a large influence. The sign of this global deletion statistic is also informative, since a positive (negative) value indicates that the deleted observation makes the assumptions more (less) plausible.
Related to the statistic in (22) is the p-value after the deletion of the ith observation, that is,
where is the observed value of the global statistic after deletion of the ith observation. The evaluation of this probability could be performed using the (approximate) chi-squared distribution with 4 df. The idea is that if p[i] is quite different from the other p[j]’s, this is indicative that the ith observation is either an outlier or an influential observation.
A potentially useful and interesting plot is the scatterplot of p[·] = (p[1], …, p[n])t versus the deletion statistics in (22). Following Tukey’s (1977) idea, we indicate in our plots the observation labels of those points beyond the outer fences of either the deletion statistics or p[·]. Such observations are unusual in that they either have a large influence on the value or the p-value of the global statistic. This plotting idea will be demonstrated in the illustrative examples in Section 6.
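The leave-one-out computations behind (22) and p[i] can be sketched as follows; global_stat() is a hypothetical helper returning the global statistic for a fitted model, and the sign convention for the percent change should be checked against (22). The outer fences follow Tukey's boxplot rule.

```r
# Leave-one-out deletion diagnostics for the global statistic (sketch).
# 'global_stat' is a hypothetical helper returning the global statistic for a fitted
# lm; the sign convention of the percent change should be checked against (22).
deletion_diagnostics <- function(fit, data, global_stat) {
  n    <- nrow(data)
  G    <- global_stat(fit)
  Gdel <- vapply(seq_len(n),
                 function(i) global_stat(update(fit, data = data[-i, , drop = FALSE])),
                 numeric(1))
  data.frame(obs        = seq_len(n),
             pct_change = 100 * (Gdel - G) / G,                     # cf. (22)
             p_del      = pchisq(Gdel, df = 4, lower.tail = FALSE)) # deleted-case p-values
}

# Tukey-style outer fences used to flag unusual points in a numeric vector v.
outer_fences <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.75)))
  c(lower = q[1] - 3 * (q[2] - q[1]), upper = q[2] + 3 * (q[2] - q[1]))
}
```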
5 Properties of Procedures
Simulation studies were performed to assess the achieved levels and powers of the proposed tests for small to moderate sample sizes. The simulation runs to assess the levels each had 20,000 replications (except for the BoxCox test which was added later at the suggestion of a reviewer), while the runs to determine the powers of the tests had 5,000 replications. The simulation code was in the R language (Ihaka and Gentleman 1996), using the function lm and built-in random number generators. For each set of simulation runs associated with a particular combination of simulation parameters, a common covariate sequence x1, x2, …, xn, generated from the standard uniform distribution was used.
The first set of runs was to determine if the procedures achieve a pre-specified 5% level of significance for the sample sizes considered. The sample size n took values ranging from 5 to 1200; see Table 1. The response values were generated according to the model
Table 1.
Simulated levels (%) of the asymptotic 5%-level tests based on 20,000 replications, except for the Box–Cox test, for which they are based on 1,000 replications. The data were generated according to Yi = xi + εi, where the xi's are a fixed sequence generated from a standard uniform distribution, and the εi's are i.i.d. N(0, 1) variates. The model Yi = β0 + β1xi was fitted.
RunNum | n |  |  |  |  |  | MaxTest | BonfTest | BoxCox
---|---|---|---|---|---|---|---|---|---
1 | 5 | 0.000 | 0.000 | 12.285 | 0.000 | 0.000 | 0.000 | 0.000 | 4.5
2 | 15 | 2.445 | 0.995 | 6.425 | 2.545 | 2.685 | 2.440 | 2.370 | 4.9
3 | 30 | 3.660 | 2.000 | 5.620 | 3.910 | 4.145 | 3.500 | 3.460 | 4.2
4 | 50 | 4.060 | 2.495 | 5.260 | 4.355 | 4.520 | 3.990 | 3.945 | 4.8
5 | 100 | 4.680 | 3.135 | 4.935 | 4.760 | 5.095 | 4.520 | 4.445 | 5.9
6 | 150 | 4.645 | 3.555 | 5.100 | 4.995 | 5.030 | 4.700 | 4.620 | 4.4
7 | 200 | 4.800 | 3.640 | 5.180 | 5.100 | 5.075 | 4.920 | 4.825 | 5.4
8 | 300 | 4.845 | 3.820 | 4.875 | 4.920 | 4.990 | 4.780 | 4.675 | 5.2
9 | 400 | 4.805 | 4.285 | 5.015 | 5.205 | 5.135 | 5.135 | 5.040 | 5.4
10 | 600 | 5.055 | 4.365 | 4.735 | 4.795 | 5.065 | 5.045 | 4.980 | 3.6
11 | 800 | 4.940 | 4.420 | 5.230 | 4.885 | 5.200 | 5.055 | 4.975 | 5.4
12 | 1200 | 5.060 | 4.700 | 5.210 | 5.045 | 5.185 | 5.300 | 5.195 | 5.0
Yi = xi + εi, i = 1, 2, …, n,  (23)
where εi’s were generated from a standard normal distribution. The model
Yi = β0 + β1xi + σεi, i = 1, 2, …, n,  (24)
was fitted, and the resulting residuals Ri, i = 1, 2, …, n, were used in the testing procedures. The vector V in (9) was the default standardized time sequence. Table 1 summarizes the observed empirical rejection rates. Note that for small sample sizes (n ≤ 30), the asymptotic approximation is not satisfactory. Except for the test based on one of the directional statistics, the procedures tend to be conservative. For moderate to large sample sizes, the procedures achieve significance levels close to the nominal 5%, though the -based statistic has a mild degree of conservatism even when the sample size is large. The rate of convergence to the limiting distribution for this statistic is rather slow, as has been noted in earlier papers; see for instance Doornik and Hansen (1994).
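The layout of these level simulations can be sketched as below; test_fun is a placeholder standing in for any of the 5%-level tests under study.

```r
# Skeleton of the level simulations behind Table 1: data from Y_i = x_i + e_i,
# model Y_i = b0 + b1 x_i fitted. 'test_fun' is a placeholder returning TRUE
# when the chosen 5%-level test rejects.
simulate_level <- function(n, nrep, test_fun) {
  x <- runif(n)                     # fixed covariate sequence, common to all replications
  rejections <- replicate(nrep, {
    y   <- x + rnorm(n)             # true model: Y_i = x_i + e_i, e_i ~ N(0, 1)
    fit <- lm(y ~ x)
    test_fun(fit)
  })
  100 * mean(rejections)            # simulated level, in percent
}
```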
For the power simulations, n takes values in the set {15, 30, 50, 100, 200}. We examined the achieved powers of the tests for n = 15 and n = 30 because, apart from the -based test, the results in Table 1 show that the tests are conservative, which may be acceptable except for a potential decrease in power. Power simulations were performed for specific types of departures from the model assumptions, and for multiple violations of the assumptions. The generic data generation model is given by
(25)
In this model, all the assumptions are satisfied whenever β2 = 0 or γ = 0, σ2 = 1, α = 0, and the εi are i.i.d. N(0, 1). To induce a dependent error structure, two models were considered. The first model, which endows the error sequence with a martingale structure, has
(26)
where the innovations are i.i.d. from N(0, 1). The second model induces an autoregressive [AR(1)] structure via
(27)
with ρ being a dependence parameter. Extensive summaries of the simulated powers for all the sample sizes considered in the simulation and for many varieties of departures from model assumptions are summarized in a series of tables in a technical report of the same title as this manuscript, which is available upon request from the authors. To conserve space, we summarize in Table 2 representative cases for each specific departure from model assumptions. Table 3 contains the results when multiple violations occur. The conclusions obtained from this representative summary in Table 2 coincide with those obtained from the extensive tables in the technical report. In the discussion of the simulation results that follows, we also refer to and make use of the more extensive tables in the technical report.
Table 2.
Simulated powers for two sample sizes and specific types of departure from the linear model assumptions. The true model that generated the data is given in (25) with β2 = 0, γ = 0, σ2 = 1, α = 0, and εi i.i.d. N(0, 1), except for the specific change, containing the violation, listed in the column headed ‘Relevant Model Parameters’.
Type of Model Violation | Relevant Model Parameters | n |  |  |  |  |  | MaxTest | BonfTest | BoxCox
---|---|---|---|---|---|---|---|---|---|---
Non-Normal | t5 | 30 | 20.8 | 21.06 | 5.8 | 10.7 | 24 | 21 | 20 | 17.2 |
100 | 39.1 | 60.80 | 5.3 | 16.7 | 60 | 57 | 57 | 28.9 | ||
Error Distribution | χ₅² | 30 | 47.9 | 19.70 | 5.4 | 10.4 | 34 | 32 | 31 | 77.9 |
100 | 99.0 | 57.54 | 5.4 | 14.7 | 93 | 96 | 96 | 100.0 | ||
Heteroscedastic | α = 2 | 30 | 26.7 | 59.48 | 13.4 | 26.4 | 59 | 54 | 54 | 64.9 |
100 | 40.8 | 99.20 | 12.1 | 44.3 | 98 | 98 | 98 | 75.0 | ||
Error | σ2 = 2 | 30 | 12.1 | 10.66 | 6.6 | 40.8 | 27 | 25 | 25 | 9.7 |
100 | 19.0 | 39.00 | 4.5 | 97.3 | 91 | 92 | 92 | 12.0 | ||
Misspecified Link | β2 = 3 | 30 | 3.2 | 1.58 | 29.3 | 3.8 | 11 | 14 | 14 | 8.8 |
γ = 2 | 100 | 4.2 | 2.86 | 54.7 | 5.1 | 31 | 35 | 35 | 9.3 | |
Function | β2 = 5 | 30 | 3.5 | 1.48 | 32.6 | 3.8 | 13 | 16 | 16 | 12.6 |
γ = 2 | 100 | 4.4 | 2.92 | 95.5 | 5.4 | 82 | 88 | 88 | 35.7 | |
Dependent Error | Martingale | 30 | 15.6 | 7.30 | 2.6 | 39.3 | 27 | 27 | 26 | 25.4 |
Type | 100 | 56.2 | 37.84 | 1.2 | 73.8 | 75 | 72 | 72 | 54.1 | |
Structure | Markov | 30 | 7.4 | 0.86 | 6.4 | 22.6 | 14 | 13 | 12 | 7.3 |
Type (ρ = 5) | 100 | 28.1 | 26.58 | 3.5 | 51.1 | 55 | 50 | 50 | 29.2 |
Table 3.
Simulated powers (%) of the tests for models where all the four assumptions are violated. The number of replications is 5,000. The true model that generated the data is given in (25), and the model in (24) is fitted to the data.
Run | β2 | γ | EDᵃ | α | GRᵇ | σ2 | TTᶜ | ρ | n |  |  |  |  |  | MaxTest | BonfTest | BoxCox
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | 2 | 2 | t10 | 1 | T | 1.5 | MTᵈ | NA | 15 | 10 | 8.5 | 11.7 | 3.4 | 13 | 9.7 | 9.6 | 37
2 | 2 | 2 | t10 | 1 | T | 1.5 | MT | NA | 30 | 21 | 28.0 | 8.5 | 21.1 | 33 | 28.4 | 28.2 | 42 |
3 | 2 | 2 | t10 | 1 | T | 1.5 | MT | NA | 50 | 38 | 75.6 | 23.7 | 44.0 | 80 | 75.4 | 75.1 | 68 |
4 | 2 | 2 | t10 | 1 | T | 1.5 | MT | NA | 100 | 45 | 92.0 | 25.2 | 47.4 | 92 | 89.8 | 89.7 | 68 |
5 | 2 | 2 | t10 | 1 | T | 1.5 | MT | NA | 200 | 52 | 99.9 | 42.1 | 85.4 | 100 | 99.9 | 99.9 | 79 |
6 | 2 | 0.5 | χ₁₀² | 2 | T | 1.5 | MT | NA | 15 | 16 | 14.1 | 6.8 | 27.5 | 25 | 19.1 | 19.0 | 26
7 | 2 | 0.5 | χ₁₀² | 2 | T | 1.5 | MT | NA | 30 | 40 | 56.9 | 8.2 | 17.8 | 57 | 52.0 | 51.9 | 45
8 | 2 | 0.5 | χ₁₀² | 2 | T | 1.5 | MT | NA | 50 | 55 | 88.1 | 17.6 | 54.3 | 90 | 86.0 | 85.9 | 49
9 | 2 | 0.5 | χ₁₀² | 2 | T | 1.5 | MT | NA | 100 | 77 | 99.9 | 11.9 | 49.8 | 100 | 99.5 | 99.5 | 66
10 | 2 | 0.5 | χ₁₀² | 2 | T | 1.5 | MT | NA | 200 | 92 | 100.0 | 9.3 | 69.5 | 100 | 100.0 | 100.0 | 84
11 | 1 | 2 | LGᵉ | 2 | T | 2 | ARᶠ | 3 | 15 | 11 | 29.6 | 42.8 | 3.3 | 49 | 36.9 | 36.5 | 57
12 | 1 | 2 | LG | 2 | T | 2 | AR | 3 | 30 | 50 | 87.4 | 22.8 | 34.0 | 84 | 81.6 | 81.5 | 51 |
13 | 1 | 2 | LG | 2 | T | 2 | AR | 3 | 50 | 57 | 97.8 | 8.9 | 75.9 | 98 | 97.2 | 97.1 | 45 |
14 | 1 | 2 | LG | 2 | T | 2 | AR | 3 | 100 | 64 | 100.0 | 14.2 | 90.2 | 100 | 100.0 | 100.0 | 51 |
15 | 1 | 2 | LG | 2 | T | 2 | AR | 3 | 200 | 70 | 100.0 | 9.6 | 98.9 | 100 | 100.0 | 100.0 | 52 |
16 | − 1 | 3 | t4 | 1 | T | 2 | AR | − 5 | 15 | 25 | 29.0 | 5.8 | 19.5 | 34 | 25.0 | 24.8 | 31 |
17 | − 1 | 3 | t4 | 1 | T | 2 | AR | − 5 | 30 | 43 | 59.6 | 7.2 | 29.0 | 61 | 55.6 | 55.4 | 35 |
18 | − 1 | 3 | t4 | 1 | T | 2 | AR | − 5 | 50 | 59 | 93.0 | 12.6 | 51.0 | 92 | 90.1 | 90.0 | 47 |
19 | − 1 | 3 | t4 | 1 | T | 2 | AR | − 5 | 100 | 68 | 99.7 | 8.9 | 89.6 | 100 | 99.6 | 99.6 | 53 |
20 | − 1 | 3 | t4 | 1 | T | 2 | AR | − 5 | 200 | 74 | 100.0 | 9.6 | 95.6 | 100 | 100.0 | 100.0 | 59 |
ᵃ Error Distribution
ᵇ Grouping
ᶜ Time Trend
ᵈ Martingale
ᵉ Logistic
ᶠ AR(1)
The first type of violation examined was a non-normal error distribution. We considered several types of error distributions, broadly classified into symmetric and skewed distributions. The first four cases in Table 2 present the simulated powers of the tests when the error distribution is symmetric (t-distributed with 5 df), and is right-skewed (a centered χ2 with 5 df). The technical report includes the results for other error distributions such as the logistic, double exponential, t, and χ2 with df other than 5. The global test is quite good relative to the best directional test based on the four component statistics, with its power not significantly degraded by combining the four statistics, and sometimes exceeding those based on the best directional test. The best directional test statistic is which is a kurtosis-type statistic. Notice that the test does not have any power for detecting this error distribution mis-specification. As expected, when the df of the t-distribution increases, the power of the tests decreases. The powers of the MaxTest in (20) and BonfTest in (19) are almost identical, and for these symmetric distributions are lower than the -based test. Additional runs where the error distribution is a normal contaminated with a t1- or t3-distribution were also performed. For contaminating proportions of .1 and .3, the results indicate that the -based test possesses good detection abilities for this violation, and has slightly higher power than the MaxTest and BonfTest. For the t-distributed error distribution, the performance of the BoxCox test was poor relative to the -test when the sample size is large.
When the errors have shifted chi-squared distributions, the global test performs acceptably well relative to the best test among the four directional tests, with the powers slightly degraded due to combining the four directional tests, some of which do not have good power against this assumptional departure. The best directional test statistic is , which is the skewness-type statistic. The statistic does not have any detection power for this alternative. When the df increases, the power diminishes, because the chi-square distribution approaches the normal distribution. The MaxTest and the BonfTest perform just slightly better than the -based test for some values of n. On the other hand, the BoxCox test performed very well for this right-skewed error distribution. Its power is significantly higher than the other tests for sample size n = 30. This superior performance of the BoxCox test could be intuitively explained by the fact that the transformation is especially appropriate for non-normal and non-symmetric error distributions. Interestingly, this non-symmetric error distribution is the only instance in Table 2 in which the BoxCox test totally dominated the other tests.
The next set of simulation runs concerns the situation where (A2) is violated, so that the conditional variances of the Yi’s are not equal. Two models were considered for this purpose. The first model has variances that depend on the covariate values. Specifically, the true model is
(28)
where the εi's are i.i.d. from N(0, 1). The simulated powers for α = 2 are summarized in the fifth and sixth rows of Table 2. The best directional test for this departure is the -test, with the global test performing best among all the tests. Again, the test based on has very low power for this heteroscedastic model, though it is not totally devoid of detection power when n = 200.
(29)
with the εi's also i.i.d. from N(0, 1). The seventh and eighth rows of Table 2 present the simulated powers for data arising from the model with σ1 = 1, σ2 = 2. The best directional test in this situation is based on , followed by the test based on . The global test also possesses acceptable power, but has lower power compared to that of the best directional test. The powers of the MaxTest and BonfTest are just slightly lower than the -test.
The next set of runs were for mis-specified link functions, that is, when (A1) is violated. The data analyzed were generated according to the model
(30)
with the εi's i.i.d. from N(0, 1). The ninth to twelfth rows of Table 2 provide the simulated powers of the tests for two different sets of (β2, γ). Interestingly, the directional tests based on are not at all sensitive to this violation. The best directional test is based on . The global test also has detection power towards this mis-specification, although its power is quite degraded relative to that of the best directional test, possibly because the other three tests have no power against this alternative. Furthermore, the MaxTest and BonfTest have better powers than the -test. When β2 = 1 and γ ∈ {.5, 2}, the powers of the tests are very low. However, this should not be perceived as a defect of the tests because this is a consequence of the fact that for these parameter sets, the signal-to-noise ratios (SNR) are very low. This SNR is measured via
(31)
with E{MSE(Model A)|Model B} being the expectation of the mean-squared error when Model A is fitted, with the expectation evaluated with respect to Model B. Thus, E{MSE(True)|True} = σ2. It is straightforward to show that for the simulation model in (30), the SNR satisfies, for large n:
For the values of (β2, γ, σ) utilized in the simulation studies, the resulting SNR values explain the ordering of the simulated powers for the -based test. Note in particular that the SNR for one of the two parameter sets shown in Table 2 is only slightly larger than that for the other, and this is reflected by the small differences in the observed powers for the -based test for these two sets of values of (β2, γ).
For the simulation runs concerning violations of assumption (A3), we considered the two models described earlier for generating martingale-type and AR-type structures. In the simulation, we performed runs for ρ ∈ {.5, 1, 2, 5, 10}. The last four rows of Table 2 present the simulated powers of the tests under these dependent error models, with ρ = 5 for the AR-type structure. For the martingale structure, the best directional test is based on with the global test surpassing the performance of this best directional test for large n, and also being slightly better than the MaxTest and BonfTest. For the AR(1) structure, the best is also the -test, with the global test’s power also very good, and again the power of the global test is best for large n. The tests based on and also have some detection abilities for this violation, but are not competitive with the -based test or the global test. The test based on possesses no ability to detect this particular type of violation. The global test performs slightly better than the MaxTest and BonfTest for the AR(1) error structure.
Finally, we consider the situation where several of the assumptions are violated simultaneously. We expect that the global test is ideally suited for this situation. Table 3 presents the achieved powers of the tests for four sets of simulation parameters where all four assumptions (A1)-(A4) are violated as in (25). All four directional tests have detection abilities. The performance of the global test is extremely commendable, as its power is generally higher than any of the directional tests as well as the MaxTest and the BonfTest. It is interesting to observe that the BoxCox test sometimes has higher power than the -test for small n, but the rate of increase of its power as n increases is relatively slow compared to the latter test. It is conceivable that the BoxCox test has higher power over the -test when n is small simply because the latter test is highly conservative for small n.
6 Illustrative Examples
Example 1: The first illustration pertains to car mileage data gathered by the first author while commuting from Ann Arbor, Michigan to Bowling Green, Ohio during the period October 20, 1996 through January 27, 1999. There were 205 observations corresponding to gas fill-ups for the following variables: Date, the date of the gas fill-up; NumGallons (denoted Y ), the number of gallons of regular unleaded gasoline pumped into the car; MilesLastFill (denoted X1), the distance travelled since the last fill-up; NumDaysBetw (denoted X2), the number of days since last fill-up; and AveMilesGal, the miles per gallon between gas fill-ups. This data set is at the URL: http://www.stat.sc.edu/~pena/DataSets/CarMileage.txt. We fit the multiple linear regression model Yi = β0+β1X1i+β2X2i+σεi, i= 1,2,…,205, where εis are i.i.d. N(0,1) variates. Scatterplots of Y versus X1 and X2 are provided in the first two plots of Figure 1, respectively. The Pearson's correlation coefficient between Y and X1 is .653, between Y and X2 is − .002, and between X1 and X2 is − .378. Other summary statistics for this data set are provided in Table 4.
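Assuming the file at the URL above is whitespace-delimited with a header row whose column names match the variable names listed here, the fit can be reproduced along these lines:

```r
# Fit of the car-mileage regression; the read.table() call assumes a
# whitespace-delimited file with a header row matching the variable names above.
mileage <- read.table("http://www.stat.sc.edu/~pena/DataSets/CarMileage.txt",
                      header = TRUE)
fit <- lm(NumGallons ~ MilesLastFill + NumDaysBetw, data = mileage)
summary(fit)
cor(mileage[, c("NumGallons", "MilesLastFill", "NumDaysBetw")])  # cf. the quoted correlations
```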
Figure 1.
Relevant plots for the analysis of the car mileage data. The first plot is a scatterplot of Y versus X1; the second is that between Y and X2. The next four scatterplots are for successive re-analyses with some observations excluded.
Table 4.
Summary statistics for the car mileage data set, where FQ and TQ are the first and third quartiles, respectively
Statistic | X2 = NumDays Between | Y=Number Gallons | X1 =MilesLast Fill | AveMiles Gal |
---|---|---|---|---|
Min | 1 | 8.199 | 207.0 | 18.37 |
FQ | 3 | 12.459 | 362.3 | 28.28 |
Med | 4 | 12.909 | 379.3 | 29.46 |
TQ | 5 | 13.344 | 394.5 | 30.58 |
Max | 26 | 14.209 | 447.0 | 33.47 |
Mean | 4.1 | 12.823 | 375.5 | 29.29 |
SD | 2.7 | 0.778 | 31.7 | 1.89 |
Table 5 contains the results of the analysis, including the F-value, the estimates of the regression coefficients, the validation statistics, and the coefficient of determination. The row labeled ‘None’ pertains to the analysis where all 205 observations were used. If the model assumptions are satisfied, the regression coefficients β1 and β2 are both found to be significantly different from zero. However, the global statistic has a p-value of essentially zero, and the p-values associated with two of the component statistics are also very small, indicating violation of model assumptions.
Table 5.
Results of analyses of the car mileage data using the complete data, and with excluded observations. The F-column contains the F-statistic value for testing that β1 = β2 = 0, while the R2-column contains the coefficient of determination. Validation statistics are shown with their p-values in parentheses; coefficient estimates are shown with their standard errors in parentheses.
Excluded Obs # | Global |  |  |  |  | F | b0 | b1 | b2 |  | R2
---|---|---|---|---|---|---|---|---|---|---|---
None | 24.26 (≈0) | 0.03 (.86) | 17.24 (≈ 0) | 6.90 (.01) | 0.08 (.78) | 99.34 (0) | 5.48 (.53) | .019 (.001) | .08 (.02) | .56 | 50% |
19, 56, 67, 146, 200 | 13.09 (.01) | 5.07 (.02) | 0.08 (.78) | 7.81 (.01) | 0.14 (.71) | 105.3 (0) | 5.00 (.54) | .019 (.001) | .15 (.02) | .47 | 52% |
19, 56, 67, 146, 164, 200 | 8.99 (.06) | 3.15 (.08) | 0.09 (.76) | 4.47 (.03) | 1.28 (.26) | 128.9 (0) | 4.64 (.52) | .019 (.001) | .24 (.03) | .45 | 57% |
19, 56, 58, 67, 146, 164, 200 | 6.80 (.15) | 2.81 (.09) | 0.08 (.78) | 2.84 (.09) | 1.07 (.30) | 133.2 (0) | 4.48 (.52) | .020 (.001) | .24 (.03) | .44 | 58% |
As advocated in Section 4, we examine the data for unusual observations. The third plot in Figure 1, which is a scatterplot of the deletion diagnostics, indicates that the 19th, 56th, 67th, 146th, and 200th observations are highly unusual. Details of these observations and others that were excluded in further analyses are in Table 6. The dates of these observations reveal their unusual nature. The 19th observation was obtained on Christmas Eve just before a long trip. In contrast to usual practice, though the gas tank was still almost half-full, a decision was made to fully fill it, thus lowering fuel efficiency; the 146th observation was obtained during a long trip which mostly covered interstate highway driving; and the 200th observation encompassed a period when the car was driven during a blizzard and was stuck in deep snow, explaining the low fuel efficiency. The 56th and 67th observations showed up among these unusual values primarily because of their X2-values of 26 (on vacation) and 22 (in repair shop) days, respectively.
Table 6.
Details of the observations that were excluded from the analyses.
Obs # Excluded | X2 =NumDays Between | Y=Number Gallons | X1 =MilesLast Fill | AveMiles Gal | Date Fill-Up |
---|---|---|---|---|---|
19 | 4 | 8.199 | 207.0 | 25.25 | 12/24/96 |
56 | 26 | 12.058 | 316.2 | 26.22 | 6/10/97 |
58 | 4 | 13.043 | 430.8 | 33.03 | 6/19/97 |
67 | 22 | 12.417 | 346.9 | 27.94 | 8/15/97 |
146 | 4 | 9.809 | 311.8 | 31.79 | 5/24/98 |
164 | 21 | 11.937 | 278.8 | 23.36 | 8/17/98 |
200 | 10 | 13.138 | 241.4 | 18.37 | 1/8/99 |
We refitted the linear model with these five observations excluded from the analysis. The results are summarized in the third row in Table 5, which still indicates violations of model assumptions. More importantly, the fourth plot in Figure 1 reveals the 164th observation (in the original data set) to be highly unusual. Similar to the 56th and 67th observations, it has a large value of X2 (equal to 21 due to vacation). Observe also the sensitivity of the directional statistics to the presence of unusual observations. In the first analysis, the p-values of and were high and low, respectively, but after the exclusion of the unusual observations, this pattern reversed.
Further excluding this 164th observation (see the fourth row of Table 5) now yields a global statistic of 8.99 with a p-value of .06, indicating that the model assumptions appear viable, though one of the component statistics has a p-value of .03. A glimpse at the scatterplot of the deletion diagnostics, the fifth plot in Figure 1, reveals the 58th observation (in the original data set) to be unusual, though there was no obvious explanation for this being so, in contrast to the other six values. Thus, though it may not be fully justifiable, in the final analysis we also excluded the 58th observation. The results are provided in the last row of Table 5, which show that all validation test statistics have p-values larger than .05, indicating that after the exclusion of the seven unusual observations pinpointed by the deletion statistics, the linear model assumptions appear acceptable. Observe also that the scatterplot of the deletion diagnostics, the sixth plot in Figure 1, no longer shows any unusual observations. However, note that the p-values of two of the component statistics are both between .05 and .10, which may indicate mild violations of the normality and link function assumptions. For these reduced data, the correlation coefficient between Y and X1 is .618; between Y and X2 it is .219, a significant increase from the original correlation of − .002, indicating the impact of the unusual observations; and between X1 and X2 it is − .323.
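The successive refits can be reproduced with lm's subset argument (re-using the mileage data frame read in the earlier sketch; row numbers refer to the original 205 observations):

```r
# Successive refits with the flagged observations removed (row numbers refer to
# the original 205 observations; 'mileage' as read in the earlier sketch).
excl1 <- c(19, 56, 67, 146, 200)
fit1  <- lm(NumGallons ~ MilesLastFill + NumDaysBetw, data = mileage, subset = -excl1)

excl2 <- c(excl1, 164)
fit2  <- update(fit1, subset = -excl2)

excl3 <- c(excl2, 58)
fit3  <- update(fit1, subset = -excl3)

cor(mileage$NumGallons[-excl3], mileage$NumDaysBetw[-excl3])  # about .22 for the reduced data
```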
Example 2: The second example is a multiple regression analysis of the water salinity data of Ruppert and Carroll (1980) (see Table 3 in their paper), which they used to illustrate robust regression techniques, and which was also used for illustrative purposes in Atkinson (1985, pp. 48–52). The data set consisted of 28 observations on the variables Salinity, the water salinity at the specified time period; LagSalinity, the water salinity lagged two weeks; Trend, representing one of the six biweekly periods in March to May; and WaterFlow, the river discharge. The response variable is Salinity, while the predictors are LagSalinity, Trend, and WaterFlow. The first part of the analyses fitted the multiple regression model
Salinityi = β0 + β1LagSalinityi + β2Trendi + β3WaterFlowi + σεi, i = 1, 2, …, 28.  (32)
The fitted model had b0 = 9.590, b1 = .777, b2 = − .026, and b3 = − .295. The coefficients β0, β1, and β3 were significantly different from zero. The multiple R2 was 82.6%. When the model validation procedures are applied, the resulting p-values are all large. The global test thus indicates that model assumptions are acceptable though, as noted by a reviewer, the nearness of these p-values to one also raises a ‘too good a fit’ concern.
An examination of the plot of the deletion diagnostics, the first plot in Figure 2, reveals that the 16th observation is highly unusual. In Atkinson (1985) the unusual nature of this observation was revealed using a half-normal plot of a diagnostic statistic. It was also pointed out that the value of WaterFlow for this observation is the cause for its being unusual and highly leveraged. The fact that the global test did not conclude violation of model assumptions, even with this very unusual observation, may cast doubt on the effectiveness of the validation procedure. Further analyses of the data, however, reveal the reason for this behavior. LagSalinity is an excellent linear predictor of Salinity, with the correlation coefficient between them equal to .872; whereas WaterFlow is not highly correlated with Salinity, with their correlation coefficient equal to − .477. In addition, the correlation coefficient between LagSalinity and WaterFlow is − .261. Consequently, when the variables LagSalinity and WaterFlow are included in the regression model, the effect of WaterFlow is diminished by the presence of LagSalinity. When LagSalinity is used in the linear regression model without WaterFlow, the procedures indicate that model assumptions are not violated. However, when WaterFlow is used as the sole predictor variable for Salinity, the procedures indicate violations of model assumptions and, in addition, the 16th observation is highly unusual. Therefore, when LagSalinity and WaterFlow are both in the linear regression model and the original data is used, then model assumptions are in fact satisfied. Thus the conclusion from the global test that the assumptions are satisfied, even with the unusual observation, is not cause for alarm regarding the effectiveness of the proposed validation procedure.
Figure 2.
Relevant plots for the salinity data example. The first, second, and sixth plots are the scatterplots of for the original data, corrected data, and for the corrected data with a quadratic WaterFlow term in the model, respectively. The third and fourth plot depict the resulting values of and the correlation between Salinity and WaterFlow for different replacement values for WaterFlow in the 16th observation. The fifth is a scatterplot of Salinity and WaterFlow using Atkinson's replacement value of 23.443.
We follow Atkinson (1985, p. 49) by supposing that the value of 33.443 for WaterFlow for this 16th observation was a misprint of 23.443. We re-fitted the model in (32) but using 23.443 in place of 33.443. The resulting analysis yielded b0 = 18.39, b1 = .70, b2 = − .15, and b3 = − .63, with β0, β1, and β3 significantly different from zero. The multiple R2 was 89.28%. Applying the model validation procedures, we find that, though the global statistic has a p-value exceeding 10%, the p-value for the component associated with the link function is .04, which seems to indicate a mild problem in the link function. The scatterplot of the deletion diagnostics, the second plot in Figure 2, indicates no unusual observations, except possibly for the 5th observation.
To gain further insight about these data, we examined the impact of different replacement values for WaterFlow in the 16th observation, specifically on the resulting value of the global statistic. The third plot in Figure 2 presents the values of the global statistic for different replacement values for WaterFlow. Note that the value is largest, hence most indicative of violations of model assumptions, when the replacement value is about 23.9, which is close to the value of 23.443 used by Atkinson. The fourth plot in Figure 2 presents the values of the correlation coefficients between Salinity and WaterFlow for different replacement values, and from this plot the largest absolute correlation is at a value very close to Atkinson’s replacement of 23.443. Using this value, the correlation coefficient between Salinity and WaterFlow is − .646. Because the correlation coefficients between Salinity and WaterFlow are largest for replacement values in the interval from 23.3 to 23.9, the impact of WaterFlow in the linear regression model when LagSalinity is also included in the model is not easily diminished, in contrast to the situation with the original data. Consequently, at such replacement values, potential model violations, especially with regard to the linearity assumption for WaterFlow, materialize. The fifth plot in Figure 2, a scatterplot between Salinity and WaterFlow when using the replacement value of 23.443, partly reveals a curvilinear relationship between Salinity and WaterFlow.
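The replacement-value sensitivity analysis can be sketched as a loop over candidate values for WaterFlow in observation 16; the salinity data frame and the global_stat() helper are assumptions of the sketch, not objects supplied with the paper.

```r
# Sensitivity of the fit to the WaterFlow value of observation 16 (sketch).
# 'salinity' is assumed to hold Salinity, LagSalinity, Trend, WaterFlow;
# 'global_stat' is the hypothetical helper for the global validation statistic.
candidates <- seq(20, 34, by = 0.1)
sensitivity <- t(sapply(candidates, function(w) {
  d <- salinity
  d$WaterFlow[16] <- w
  fit <- lm(Salinity ~ LagSalinity + Trend + WaterFlow, data = d)
  c(WaterFlow16 = w,
    cor_SalFlow = cor(d$Salinity, d$WaterFlow),
    global      = global_stat(fit))
}))
```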
Recognizing the possible problem with the link function, we follow Atkinson’s (1985, p. 51) suggestion of incorporating a quadratic term of WaterFlow and we fitted the model
Salinityi = β0 + β1LagSalinityi + β2Trendi + β3WaterFlowi + β4WaterFlowi2 + σεi, i = 1, 2, …, 28.  (33)
The resulting estimates are b0 = 67.49, b1 = .68, b2 = − .25, b3 = − 4.57, and b4 = .08, and the multiple R2 was 91.65%. Only β2 did not turn out to be significantly different from zero, with a p-value of .053. The model validation statistics were recomputed for this fit, and the scatterplot of the deletion diagnostics, which is the sixth plot in Figure 2, no longer shows any unusual observations.
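The quadratic-term model in (33) can be fitted with an I(WaterFlow^2) term, again assuming a salinity data frame with Atkinson's corrected value in place:

```r
# Model (33) with a quadratic WaterFlow term, using the corrected value 23.443.
salinity$WaterFlow[16] <- 23.443
fit33 <- lm(Salinity ~ LagSalinity + Trend + WaterFlow + I(WaterFlow^2), data = salinity)
summary(fit33)
```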
7 Concluding Remarks
In this paper a global test procedure for validating the four assumptions of the linear model is proposed. The global test statistic, which is a function of the model residuals, is formed from four asymptotically independent statistics, each having the potential of detecting a particular violation. The level and power properties of the tests were examined through simulation studies, which indicate that the global and directional tests possess the ability to detect different types of violations of the model assumptions. As such, the tests provide a formal method for globally assessing the validity of model assumptions. Deletion statistics and graphical methods based on the global statistic can be used to identify unusual observations. The proposed formal procedure may help in eliminating, or at least reducing, the oftentimes subjective assessment of the validity of model assumptions when using existing graphical techniques.
Other issues remain to be addressed. First, there is the problem of developing an adaptive method. From the simulation results, the power of the global test is generally lower than the best directional test when only one assumption is violated. Some of the directional tests have no power for detecting certain types of alternatives, hence they tend to dilute the power when included in this global statistic. We conjecture that it will be possible to have the data dictate which among the four directional test statistics to combine to form a global test statistic, and by doing so we expect that the resulting adaptive global test may acquire increased power. Several approaches to determining which directional test statistics to combine present themselves, such as those using information measures like the Schwartz (1978) Bayesian information criterion (BIC) or the Akaike information criterion (AIC) (Akaike 1973). Second, the chi-square approximation is not satisfactory for small sample sizes, though we point out that except for the -based statistic, the approximation leads to conservative tests. Two possible ways of alleviating this problem are to utilize empirical estimates of the covariance matrices instead of using the theoretical matrices, or to use computationally-intensive methods to determine the critical regions of the tests. Third, the main motivation for choosing the smoothing functions in the density embedding is to recover some commonly used one-dimensional test statistics, with the aim of formally combining them into one global test statistic. This has been achieved in this paper. However, one is not limited in choosing the functions that enter into the embedding. Finally, to make these linear model validation procedures accessible to practitioners, we plan to provide the procedures through a computer package in the R Library.
Acknowledgments
The authors wish to thank the associate editor and editor for their very useful and perceptive comments, suggestions, and changes which led to significant improvements in the manuscript.
Footnotes
AMS Subject Classification: Primary: 62J20 Secondary: 62J05, 62H15
References
- Akaike H. Information theory and the maximum likelihood principle. Budapest: Akademiai Kiado; 1973. pp. 610–624.
- Anderson T. An Introduction to Multivariate Statistical Analysis. 2nd ed. New York: John Wiley & Sons; 1984.
- Anscombe F. Examination of residuals. Proc Fourth Berkeley Symp. 1961;1:1–36.
- Anscombe F, Tukey J. The Examination and Analysis of Residuals. Technometrics. 1963:141–160.
- Atkinson A. Plots, Transformations, and Regression: An Introduction to Graphical Methods of Diagnostic Regression Analysis. Oxford: Clarendon Press; 1985.
- Bickel P. Using residuals robustly I: Tests for heteroscedasticity, nonlinearity. Annals of Statistics. 1978;6:266–291.
- Box G, Cox D. An analysis of transformations (with discussion). Journal of the Royal Statistical Society, Series B. 1964;26:211–252.
- Chen C. Score tests for regression models. Journal of the American Statistical Association. 1983;78:158–161.
- Chen C. Robustness aspects of score tests for generalized linear and partially linear regression models. Technometrics. 1985;27:277–283.
- Claeskens G, Hjort N. The Focussed Information Criterion (with discussion). Journal of the American Statistical Association. 2003;98:900–945.
- Cook R. Detection of influential observations in linear regression. Technometrics. 1977;19:15–18.
- Cook R, Weisberg S. Residuals and Influence in Regression. New York: Chapman and Hall; 1982.
- Cook R, Weisberg S. Diagnostics for Heteroscedasticity in Regression. Biometrika. 1983;70:1–10.
- Cox D, Hinkley D. Theoretical Statistics. London: Chapman and Hall; 1974.
- Doornik J, Hansen H. An omnibus test for univariate and multivariate normality. 1994. citeseer.nj.nec.com/doornik94omnibu.html.
- Dukić V, Peña E. Variance estimation in a Model with Gaussian Submodels. Journal of the American Statistical Association. 2005;100:296–309. doi: 10.1198/016214504000000818.
- Durbin J, Watson G. Testing for Serial Correlation in Least Squares Regression: I. Biometrika. 1950;37:409–428.
- Durbin J, Watson G. Testing for Serial Correlation in Least Squares Regression: II. Biometrika. 1951;38:159–178.
- Hjort N, Claeskens G. Frequentist Model Average Estimators. Journal of the American Statistical Association. 2003;98:879–899.
- Ihaka R, Gentleman R. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996;5:299–314.
- Kianifard F, Swallow W. A Review of the Development and Application of Recursive Residuals in Linear Models. Journal of the American Statistical Association. 1996;91:391–400.
- Neter J, Kutner M, Nachtsheim C, Wasserman W. Applied Linear Statistical Models. 4th ed. Irwin; 1996.
- Neyman J. “Smooth” test for goodness of fit. Skand Aktuarietidskr. 1937;20:150–199.
- Pierce D. The asymptotic effect of substituting estimators for parameters in certain types of statistics. Annals of Statistics. 1982;10:475–478.
- Rayner J, Best D. Neyman-Type Smooth Test for Location-Scale Families. Biometrika. 1986;73:437–446.
- Rayner J, Best D. Smooth Tests of Goodness of Fit. New York: Oxford University Press; 1989.
- Ruppert D, Carroll R. Trimmed Least Squares Estimation in the Linear Model. Journal of the American Statistical Association. 1980;75:828–838.
- Schwartz G. Estimating the dimension of a model. Annals of Statistics. 1978;6:461–464.
- Theil H. The Analysis of Disturbances in Regression Analysis. Journal of the American Statistical Association. 1965:1067–1079.
- Theil H, Nagar A. Testing the Independence of Regression Disturbances. Journal of the American Statistical Association. 1961;56:793–806.
- Thomas D, Pierce D. Neyman’s smooth goodness-of-fit test when the hypothesis is composite. Journal of the American Statistical Association. 1979;74:441–445.
- Tukey J. One degree of freedom for nonadditivity. Biometrics. 1949;5:232–242.
- Tukey J. Exploratory Data Analysis. Reading, MA: Addison-Wesley; 1977.