Author manuscript; available in PMC: 2013 Dec 18.
Published in final edited form as: Commun Stat Theory Methods. 2011 Feb 17;26(3). doi: 10.1080/03610927708831932

THE DISTRIBUTION OF COOK’S D STATISTIC

Keith E Muller 1, Mario Chen Mok 2
PMCID: PMC3867306  NIHMSID: NIHMS445698  PMID: 24363487

Abstract

Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F for overall regression captures an erratically varying proportion of the values.

We describe the exact distribution of Cook’s statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations.

Keywords: regression diagnostics, influence, residual analysis

1. INTRODUCTION

1.1 Motivation

A wide variety of applications in the medical, social, and physical sciences use regression models with continuous predictors. Often the predictors may plausibly be assumed to follow a multivariate Gaussian distribution. For example, a paleontologist may wish to model total skeleton length of fossils of a particular species, as a function of sizes for a limited number of bones. Many diagnostics have been suggested to aid in evaluating the validity of such models.

Most research in regression diagnostics has centered on the impact of deleting a single observation, with many different measures suggested. Cook (1977) recommended evaluating the standardized shift in the vector of estimated regression coefficients. He suggested comparing the statistic to the median of the F statistic for the test of all coefficients equal to zero. Observations highlighted in this way merit further examination, in terms of both their credibility and their implications for the validity of the model assumptions.

Belsley, Kuh, and Welsch (1980, p28) and Cook and Weisberg (1982, p114) discussed two alternatives for judging diagnostic statistics. Internal scaling involves judging a value with respect to the distribution in the sample at hand. External scaling involves judging a value with respect to the distribution that might occur over repeated samples. Both principles have merit in data analysis.

A standard approach for a diagnostic with known sampling distribution, such as studentized residuals, involves three steps. First, highlight observations by reference to the sampling distribution. Second, investigate the highlighted observations' values and roles in the analysis. Third, decide on the disposition of the observation, in light of all knowledge about the data. Possible actions include doing nothing, correcting a discovered error, or deleting an impossible value.

Data analysts first encountering p-values for regression diagnostics may hope to use them for automatic elimination of observations. Sophisticated analysts use the reference distributions to provide a common metric for the three step process (highlight, investigate, decide). Kleinbaum, Kupper, and Muller (1988, p201), in their introductory regression book, summarized their discussion of diagnostics by stating: “One should be cautioned that deleting the most deviant observations will in all cases slightly improve, and sometimes substantially improve, the fit of the model. One must be careful not to data snoop simply in order to polish the fit of the model by discarding troublesome data points.”

Although conceptually attractive to some observers, Cook’s statistic has not elicited universal enthusiasm. For example, Obenchain (1977) suggested ignoring the statistic and concentrating on its two components, the residual and the leverage. The difficulty in using the statistic stems from uncertainty as to what cut-point to use for highlighting troublesome observations. Our experience led us to the belief that the statistic flags only values already highlighted by residual analysis. Unpublished simulations (Chen Mok, 1993) confirmed the impression.

The ability to compute quantiles for Cook’s statistic based on Gaussian predictors, described in §2, provides an accurate metric for the statistic and hence allows the diagnostic to consistently highlight values worthy of further examination. The new results in this paper also imply a framework and approach for describing distributions and other properties of other diagnostics.

1.2 Related Earlier Work

Nearly all current regression texts consider regression diagnostics in some detail. Excellent book-length treatments include, in chronological order, Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), Atkinson (1985), and Chatterjee and Hadi (1986).

We consider two versions of the General Linear Univariate Model (GLUM) with iid Gaussian errors. For each observational unit the predictors will be assumed to be either a set of fixed values or to follow a multivariate Gaussian distribution. Sampson (1974) described the setting with fixed predictors as the conditional model, and the setting with Gaussian predictors as the unconditional model. As detailed in §2, the distribution and interpretation of Cook’s statistic depend directly on the distribution of the predictors. See Jensen and Ramirez (1996, 1997) for the distribution of Cook’s statistic for fixed predictors.

2. DISTRIBUTION THEORY

2.1 Notation and Definitions

In this section we present many standard results for regression diagnostics. Rather than cite a single source for each result, we recommend that the reader consult any of the book-length treatments just cited. LaMotte (1994) provided a “Rosetta Stone” for translating among the many names used for residuals.

A number of standard distributions must be considered. In general, indicate the cumulative distribution function (CDF) of the random variable U, which depends on parameters $\alpha_1$ through $\alpha_k$, as $F_U(t; \alpha_1 \ldots \alpha_k)$, with density $f_U(t; \alpha_1 \ldots \alpha_k)$ and pth quantile $F_U^{-1}(p; \alpha_1 \ldots \alpha_k)$. For notational convenience write the CDF of U|V = v as $F_{U|v}(t; \alpha_1 \ldots \alpha_k)$. Resolution of conflict between random variable and matrix notation and the random or fixed nature of a variable will be specified when not obvious from context. Let N(μ, Σ) indicate a multivariate Gaussian vector, with mean μ, non-singular covariance Σ, and CDF Φ(t; μ, Σ). Most results in this paper involve χ², F, or β random variables (Johnson and Kotz, Chapter 17, 1970a; Chapters 24 and 26, 1970b). Let χ²(ν) indicate a central χ² random variable on ν degrees of freedom, and let F(ν₁, ν₂) indicate a central F random variable on ν₁ and ν₂ degrees of freedom. Similarly let β(κ₁, κ₂) indicate a β random variable, with support (0, 1).

Most results for regression diagnostics concern fixed predictors, and hence the conditional model described by Sampson (1974). In particular, consider

$$y_{N\times 1} = X_{N\times q}\,\beta_{q\times 1} + e_{N\times 1}. \qquad (2.1)$$

Let $y_i$ indicate the ith row of y, $X_i$ the ith row of X, and $e_i$ the ith row of e. Here X contains fixed values, known conditionally on having designated the sampling units, β contains fixed unknown values, and $F_{e|X}(t) = \Phi(t; 0, \sigma^2 I_N)$. Assume throughout that N > q and that X has full rank q. Let ν = (N − q) indicate the error degrees of freedom. Indicate the usual estimators as

$$\hat\beta = (X'X)^{-1}X'y, \qquad (2.2)$$
$$\hat\sigma^2 = y'(I - H)y/\nu. \qquad (2.3)$$

Define

$$H = X(X'X)^{-1}X', \qquad (2.4)$$

the hat matrix, because $\hat y = Hy$ (Hoaglin and Welsch, 1978). Let $h_i$ indicate the ith diagonal element of H, the leverage for the ith observation:

$$h_i = X_i(X'X)^{-1}X_i'. \qquad (2.5)$$

Refer to

$$\hat e = (y - \hat y) \qquad (2.6)$$

as the vector of residuals. Note that

$$F_{\hat e|X}(t) = \Phi\left[t; 0, \sigma^2(I - H)\right]. \qquad (2.7)$$

In turn define the ith squared standardized residual as

$$R_i^2 = \frac{\hat e_i^2}{\hat\sigma^2(1 - h_i)}. \qquad (2.8)$$

Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982) and Atkinson (1985) reviewed the algebra of deletion and properties of residuals. Let (−i) indicate deletion of the ith observation and index the N statistics generated by doing so. Let $X_{(-i)}$ indicate the (N − 1) × q matrix created by deleting the ith row, with corresponding leverage $h_{(i)} = X_i(X_{(-i)}'X_{(-i)})^{-1}X_i'$. The process creates sets of N estimates of β, $\{\hat\beta_{(i)}\}$, predicted values, $\{\hat y_{(i)} = X_i\hat\beta_{(i)}\}$, residuals, $\{\hat e_{(i)} = y_i - \hat y_{(i)}\}$, and variance estimates, $\{\hat\sigma_{(i)}^2\}$. The resulting squared and standardized residual, the studentized residual, equals

$$R_{(i)}^2 = \frac{\hat e_{(i)}^2}{\hat\sigma_{(i)}^2(1 + h_{(i)})} = \frac{\hat e_i^2}{\hat\sigma_{(i)}^2(1 - h_i)} = R_i^2\left(\frac{\nu - 1}{\nu - R_i^2}\right), \qquad (2.9)$$

with

$$F_{R_{(i)}^2|X}(t) = F_F(t; 1, \nu - 1). \qquad (2.10)$$

Cook's statistic measures the standardized shift in predicted values, and equivalently the shift in $\hat\beta$, due to deleting the ith observation:

$$D_i = \frac{(\hat y_{(i)} - \hat y)'(\hat y_{(i)} - \hat y)}{q\hat\sigma^2} = \frac{(\hat\beta_{(i)} - \hat\beta)'(X'X)(\hat\beta_{(i)} - \hat\beta)}{q\hat\sigma^2}. \qquad (2.11)$$

Furthermore

$$D_i = R_i^2\,\frac{h_i}{q(1 - h_i)} = R_i^2 C_i. \qquad (2.12)$$

Finding d such that Pr{Di > d} = α would provide a metric for Cook’s statistic. This idea motivates the current work. The results also provide a test of whether a particular Di arose from the distribution of Di implied by the GLUM assumptions. As highlighted in §1.1 and §4.3, the latter interpretation has more risks than benefits in practical use for the diagnostic setting.
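To make the definitions concrete, the following sketch (in Python with numpy, our illustration language; the paper's own computations used SAS IML) builds H, the leverages, the standardized residuals, and Cook's statistic for simulated data, and checks the identity (2.12) against the direct deletion form (2.11). All numeric values are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    N, q = 25, 4                               # N observations, q columns including the intercept
    X = np.column_stack([np.ones(N), rng.standard_normal((N, q - 1))])
    y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.standard_normal(N)

    H = X @ np.linalg.solve(X.T @ X, X.T)      # hat matrix (2.4)
    h = np.diag(H)                             # leverages (2.5)
    e_hat = y - H @ y                          # residuals (2.6)
    nu = N - q                                 # error degrees of freedom
    sigma2 = e_hat @ e_hat / nu                # (2.3)
    R2 = e_hat**2 / (sigma2 * (1 - h))         # squared standardized residuals (2.8)
    D = R2 * h / (q * (1 - h))                 # Cook's statistic via (2.12)

    # Cross-check against the deletion form (2.11) for observation i.
    i = 0
    keep = np.arange(N) != i
    b_full = np.linalg.solve(X.T @ X, X.T @ y)
    b_del = np.linalg.solve(X[keep].T @ X[keep], X[keep].T @ y[keep])
    diff = b_del - b_full
    assert np.isclose(D[i], diff @ (X.T @ X) @ diff / (q * sigma2))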

2.2 The Distribution of Cook’s Statistic for Fixed Predictors

For fixed predictors Ci does not vary randomly. Hence, conditional on X,

$$D_i = C_i R_i^2 = C_i\,\nu\,\beta\!\left[\tfrac{1}{2}, \tfrac{\nu - 1}{2}\right]. \qquad (2.13)$$

Usually $C_i \ne C_{i'}$ if $i \ne i'$. The value of $C_i$ does not vary randomly with fixed predictors, but does depend on the ith leverage, $h_i$, and hence typically varies across sampling units.

In order to provide a metric for judging Cook's statistic it would seem natural to eliminate the heterogeneity between sampling units which occurs with fixed predictors. However, doing so eliminates the variability due to $C_i$ and makes $D_i$ a simple multiple of $R_i^2$, with no distinct information. At least with predictor values assigned by the experimenter, Obenchain's (1977) preference for considering the leverages and residuals separately seems appealing. See Jensen and Ramirez (1996, 1997) for a thorough treatment of fixed predictors.

2.3 The Distribution of Cook’s Statistic for Gaussian Predictors

Theorem

Let $a_0 = [q(N-1)]^{-1}$, $a_1 = (q-1)N[q\nu(N-1)]^{-1}$, and $t_0 = \max(a_0, d/\nu)$. For d > 0 and Gaussian predictors

$$\Pr\{D_i \le d\} = 1 - \int_{t_0}^{\infty} \Pr\left\{\beta\!\left(\tfrac{1}{2}, \tfrac{\nu-1}{2}\right) > \frac{d}{\nu t}\right\} f_{C_i}(t)\,dt, \qquad (2.14)$$

with corresponding density

$$f_D(d) = \int_{t_0}^{\infty} f_\beta\!\left(\frac{d}{\nu t}; \tfrac{1}{2}, \tfrac{\nu-1}{2}\right)(\nu t)^{-1} f_{C_i}(t)\,dt. \qquad (2.15)$$

Here

$$f_{C_i}(t) = \begin{cases} 0 & t < a_0 \\ f_F\!\left[(t - a_0)a_1^{-1};\; q-1, \nu\right] a_1^{-1} & a_0 \le t. \end{cases} \qquad (2.16)$$

Lemma 1

(Weisberg, 1985, p114) Conditional on knowing X (fixed X)

$$R_i^2 = \nu\,\beta\!\left[\tfrac{1}{2}, \tfrac{\nu-1}{2}\right]. \qquad (2.17)$$

Lemma 2

A leverage value from a model containing an intercept and (q – 1) multivariate Gaussian predictors, with each row iid, equals a one-to-one function of an F random variable.

Proof

Belsley, Kuh, and Welsch (p66, 1980) proved that

$$F_i = \frac{(h_i - \tfrac{1}{N})/(q-1)}{(1 - h_i)/\nu} = F(q-1, \nu). \qquad (2.18)$$

Solving their result for hi yields

$$h_i = \frac{F_i(q-1)/\nu + 1/N}{1 + F_i(q-1)/\nu}. \qquad (2.19)$$

Lemma 3

With Gaussian predictors, Ci = a0 + a1Fi, so that

$$\Pr\{C_i \le t\} = \Pr\{a_0 + a_1 F_i \le t\} = \Pr\{F_i \le (t - a_0)/a_1\}, \qquad (2.20)$$

and

$$f_{C_i}(t) = \begin{cases} 0 & t < a_0 \\ f_F\!\left[(t - a_0)a_1^{-1};\; q-1, \nu\right] a_1^{-1} & a_0 \le t. \end{cases} \qquad (2.21)$$

Proof

For Gaussian predictors the expression in (2.19) for hi allows stating

$$C_i = \frac{F_i(q-1)/\nu + 1/N}{q(1 - 1/N)} = a_0 + a_1 F_i. \qquad (2.22)$$
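A quick numeric check of Lemma 3, a sketch under the definitions above: a leverage produced by the map (2.19) yields a $C_i$ that matches the linear form $a_0 + a_1 F_i$ of (2.22). The value of $F_i$ is arbitrary.

    import numpy as np

    N, q = 30, 5
    nu = N - q
    a0 = 1.0 / (q * (N - 1))
    a1 = (q - 1) * N / (q * nu * (N - 1))

    F_i = 1.7                                   # any realization of F(q-1, nu)
    c = F_i * (q - 1) / nu
    h_i = (c + 1.0 / N) / (1.0 + c)             # leverage from (2.19)
    C_direct = h_i / (q * (1.0 - h_i))          # definition of C_i in (2.12)
    assert np.isclose(C_direct, a0 + a1 * F_i)  # linear form (2.22)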

Lemma 4

Let X* = XT, with T a full rank q × q matrix of constants. Note that $T^{-t} = (T')^{-1} = (T^{-1})'$. Then H does not vary under this transformation of the predictors.

Proof

Observe that

$$H^* = X^{*}(X^{*\prime}X^{*})^{-1}X^{*\prime} = XT(T'X'XT)^{-1}T'X' = XT\left[T^{-1}(X'X)^{-1}T^{-t}\right]T'X' = X(X'X)^{-1}X' = H. \qquad (2.23)$$

Corollary 4.1

H does not vary due to the covariance matrix of iid random predictors.

Proof

Let Σx = F F’ indicate a factoring of the (q – 1) × (q – 1) covariance matrix of a row of random predictors, assumed full rank. Choosing

$$T = \begin{bmatrix} 1 & 0 \\ 0 & F^{-t} \end{bmatrix} \qquad (2.24)$$

corresponds to considering a new model with predictors X* = XT. The model contains an intercept and q – 1 random predictors, with Σx* = I.

Corollary 4.2

$h_i$, $\hat e_i$, $\hat\sigma^2$, $R_i$, $C_i$ and $D_i$ do not vary due to full rank transformation of the predictors or the covariance matrix of random predictors.

Proof

Each quantity depends on X only through elements of H.

Lemma 5

With Gaussian predictors $F_{R_{(i)}^2|h_i}(t) = F_{R_{(i)}^2|X}(t)$.

Proof

Consider $R_{(i)}^2$ in terms of three pieces: $(1 - h_i)$, $\hat\sigma_{(i)}^2$, and $\hat e_i^2$.

  1. Obviously $(1 - h_i)$ depends on X only through $h_i$.

  2. Conditional on X, $\hat\sigma_{(i)}^2(\nu - 1)/\sigma^2 = \chi^2(\nu - 1)$, which does not depend on X.

  3. $F_{\hat e_i|X}(t) = \Phi[t; 0, (1 - h_i)\sigma^2]$ and therefore $F_{\hat e_i|X}(t) = F_{\hat e_i|h_i}(t)$.

  4. Conditional on X, by the nature of deletion $\hat e_i^2$ and $\hat\sigma_{(i)}^2$ are statistically independent (LaMotte, 1994, example 1), and $F_{\hat e_i, \hat\sigma_{(i)}^2|X}(t_1, t_2) = F_{\hat e_i|X}(t_1)\,F_{\hat\sigma_{(i)}^2|X}(t_2)$.

  5. Combining 1) through 4) completes the proof.

Corollary 5.1

With Gaussian predictors $F_{R_i^2|h_i}(t) = F_{R_i^2|X}(t)$.

Proof

Use the last line of (2.9) to write $R_i^2 = \nu\left[(\nu - 1)R_{(i)}^{-2} + 1\right]^{-1}$. Hence $R_i^2$ depends on X only through $R_{(i)}^2$, which depends on X only through $h_i$.

Corollary 5.2

With Gaussian predictors $F_{R_i^2|C_i}(t) = F_{R_i^2|X}(t)$.

Proof

$C_i = h_i/[q(1 - h_i)]$ and hence depends on X only through $h_i$.

Proof of the Theorem

Use the law of total probability to state

$$\Pr\{D_i > d\} = \int_{a_0}^{\infty} \Pr\{(R_i^2\,|\,C_i = t) > d/t\}\, f_{C_i}(t)\,dt. \qquad (2.25)$$

Equation (2.17) describes the distribution function of $R_i^2$ conditional on X, which equals the distribution of $R_i^2$ conditional on $C_i$, by Corollary 5.2. Combining the distribution in (2.17) with (2.25) allows concluding that

$$\Pr\{D_i > d\} = \begin{cases} 1 & d \le 0 \\ \int_{a_0}^{\infty} \Pr\left\{\beta\!\left(\tfrac{1}{2}, \tfrac{\nu-1}{2}\right) > \frac{d}{\nu t}\right\} f_{C_i}(t)\,dt & 0 < d < a_0\nu \\ \int_{d/\nu}^{\infty} \Pr\left\{\beta\!\left(\tfrac{1}{2}, \tfrac{\nu-1}{2}\right) > \frac{d}{\nu t}\right\} f_{C_i}(t)\,dt & a_0\nu \le d. \end{cases} \qquad (2.26)$$

Note that $t_0 = \max(a_0, d/\nu)$ and simplify. Finding the density requires differentiating each form in (2.26) separately, and recognizing that the lower limit depends on d. The two apparently distinct forms reduce to a single one upon noting that $f_\beta[1; \tfrac{1}{2}, (\nu-1)/2] = 0$.

2.4 Computational Forms for Numerical Integration

Although tantalizing in form, the integral for the CDF of $D_i$ does not allow closed form integration. Numerical integration allows accurate and convenient computation of $\Pr\{D_i > d\}$. Both functions in the integral require careful consideration in order to produce a form amenable to computation. Among various forms considered, the ones used here provide the simplest proofs and least computational time for any level of accuracy, except perhaps for small values of $\Pr\{D_i \le d\}$. Interest usually centers on large values of $\Pr\{D_i \le d\}$.

Two distinct representations create a finite region of integration, which greatly simplifies numerical integration. First express the density of $C_i$ in terms of an F. If $u = (t - a_0)/a_1$, so that $t = a_1 u + a_0$ and $u_0 = (t_0 - a_0)/a_1$, then

$$\Pr\{D_i > d\} = \int_{u_0}^{\infty} \Pr\left\{\beta\!\left[\tfrac{1}{2}, \tfrac{\nu-1}{2}\right] > \frac{d}{\nu(a_1 u + a_0)}\right\} f_F(u; q-1, \nu)\,du, \qquad (2.27)$$

or equivalently

$$\Pr\{D_i > d\} = \int_{u_0}^{\infty} \Pr\left\{F(1, \nu-1) > (\nu-1)\left[(a_1 u + a_0)\nu d^{-1} - 1\right]^{-1}\right\} f_F(u; q-1, \nu)\,du. \qquad (2.28)$$

The relationship of F and β random variables allows creating a finite region of integration. If $z = (q-1)u[\nu + (q-1)u]^{-1}$ then $u = \nu(q-1)^{-1}z(1-z)^{-1}$ and $z_0 = (q-1)u_0[\nu + (q-1)u_0]^{-1}$. Also let

$$s(z) = (\nu-1)\left\{\left[a_1\nu(q-1)^{-1}z(1-z)^{-1} + a_0\right]\nu d^{-1} - 1\right\}^{-1}. \qquad (2.29)$$

With this transformation

$$\Pr\{D_i > d\} = \int_{z_0}^{1} \Pr\{F(1, \nu-1) > s(z)\}\, f_\beta\!\left(z; \tfrac{q-1}{2}, \tfrac{\nu}{2}\right)dz. \qquad (2.30)$$

A second useful representation results from applying the transformation w = u/(1 + u) to the integral in (2.28). With w0 = u0/(1 + u0) and

$$h(w) = (\nu-1)\left\{\left[a_1 w(1-w)^{-1} + a_0\right]\nu d^{-1} - 1\right\}^{-1} \qquad (2.31)$$

it follows that

$$\Pr\{D_i > d\} = \int_{w_0}^{1} \Pr\{F(1, \nu-1) > h(w)\}\, f_F\!\left[\frac{w}{1-w}; q-1, \nu\right]\frac{1}{(1-w)^2}\,dw. \qquad (2.32)$$
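The finite-region form (2.32) translates directly into a short numerical integration. The sketch below uses scipy's adaptive quadrature instead of the Simpson's rule described in §3.1, so it illustrates the formula rather than reproducing the authors' code; the q = 2 singularity at w = 0 noted in §3.4 can slow it down.

    from scipy import integrate, stats

    def prob_cooks_d_exceeds(d, N, q):
        """Pr{D_i > d} from (2.32); intercept plus q-1 Gaussian predictors, d > 0."""
        nu = N - q
        a0 = 1.0 / (q * (N - 1))
        a1 = (q - 1) * N / (q * nu * (N - 1))
        t0 = max(a0, d / nu)
        u0 = (t0 - a0) / a1
        w0 = u0 / (1.0 + u0)

        def integrand(w):
            u = w / (1.0 - w)
            h_w = (nu - 1.0) / ((a1 * u + a0) * nu / d - 1.0)   # h(w) in (2.31)
            return stats.f.sf(h_w, 1, nu - 1) * stats.f.pdf(u, q - 1, nu) / (1.0 - w) ** 2

        value, _ = integrate.quad(integrand, w0, 1.0, limit=200)
        return value

    print(prob_cooks_d_exceeds(0.5, 25, 4))    # tail probability of a single D_i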

2.5 Approximations

Equation (2.27) allows recognizing that $\Pr\{D_i > d\}$ equals the expected value of a function of a random variable whenever $t_0 = a_0$. For fixed q, $\lim_{N\to\infty} d/\nu = \lim_{N\to\infty} a_0 = 0$. Consequently the expected value interpretation holds, at least asymptotically, in all cases. The accuracy of a series based on treating the integral as an expected value depends both on the remainder term and on any discrepancy due to $d/\nu > a_0$.

Creating a two term Taylor's series approximation for (2.30) involves noting that $\mathcal{E}\{\beta[(q-1)/2, \nu/2]\} = (q-1)/(\nu + q - 1)$. Ignoring any discrepancy due to $d/\nu > a_0$ yields

$$\Pr\{D_i > d\} \approx \Pr\left\{F(1, \nu-1) > (\nu-1)\left\{[a_1 + a_0]\nu d^{-1} - 1\right\}^{-1}\right\}. \qquad (2.33)$$

Applying a series expansion for an F random variable, using (2.27) or (2.28), requires ν > 2k to ensure a finite kth moment. If ν > 2 then $\mathcal{E}\{F(q-1, \nu)\} = \nu/(\nu - 2)$ and, ignoring discrepancy due to $d/\nu > a_0$, a two term series equals

$$\Pr\{D_i > d\} \approx \Pr\left\{F(1, \nu-1) > (\nu-1)\left\{\left[a_1\frac{\nu}{\nu-2} + a_0\right]\nu d^{-1} - 1\right\}^{-1}\right\}. \qquad (2.34)$$

For ν ≤ 2, a one term F based expansion about the number 1 yields

$$\Pr\{D_i > d\} \approx \Pr\left\{F(1, \nu-1) > (\nu-1)\left\{[a_1 + a_0]\nu d^{-1} - 1\right\}^{-1}\right\}, \qquad (2.35)$$

which coincides with the two term expansion for the β representation in (2.33). The approximate probability of (2.35) will never be greater than that of (2.34).

The probability approximations imply approximations for quantiles of Di:

$$\tilde d_p = [a_1 m + a_0]\,\nu\left[1 + (\nu - 1)/F_F^{-1}(p; 1, \nu-1)\right]^{-1}. \qquad (2.36)$$

Here m = ν/(ν − 2) in (2.36) yields the quantile implied by (2.34), while m = 1 yields the quantile implied by (2.35). Assigning m the value of the median, $F_F^{-1}(.50; q-1, \nu)$, or the mode, $\nu(q-3)/[(q-1)(\nu+2)]$ for q > 3, also provides a one term approximation.
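The quantile approximation (2.36) needs only one F quantile evaluation; a sketch, with m chosen as in the footnote to Table III:

    from scipy import stats

    def approx_cooks_d_quantile(p, N, q):
        """Approximate pth quantile of D_i from (2.36)."""
        nu = N - q
        a0 = 1.0 / (q * (N - 1))
        a1 = (q - 1) * N / (q * nu * (N - 1))
        m = nu / (nu - 2.0) if nu > 2 else 1.0
        f_p = stats.f.ppf(p, 1, nu - 1)        # F^{-1}(p; 1, nu-1)
        return (a1 * m + a0) * nu / (1.0 + (nu - 1.0) / f_p)

    print(approx_cooks_d_quantile(0.95, 25, 4))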

One convenient form for creating a long series arises from (2.28):

$$\begin{aligned} \Pr\{D_i > d\} &= \int_{u_0}^{\infty} \Pr\left\{F(1, \nu-1) > (\nu-1)\left[(a_1 u + a_0)\nu d^{-1} - 1\right]^{-1}\right\} f_F(u; q-1, \nu)\,du \\ &= \int_{u_0}^{\infty} \Pr\left\{F(\nu-1, 1) \le \left[(a_1 u + a_0)\nu d^{-1} - 1\right](\nu-1)^{-1}\right\} f_F(u; q-1, \nu)\,du \\ &= \int_{u_0}^{\infty} \Pr\{F(\nu-1, 1) \le c_1 u + c_0\}\, f_F(u; q-1, \nu)\,du \\ &= \int_{u_0}^{\infty} P(u)\, f_F(u; q-1, \nu)\,du, \end{aligned} \qquad (2.37)$$

with $c_1 = a_1 \nu d^{-1}(\nu-1)^{-1}$ and $c_0 = (a_0 \nu d^{-1} - 1)(\nu-1)^{-1}$.

In turn

$$P^{(0)}(u) = \int_0^{c_1 u + c_0} f_F(s; \nu-1, 1)\,ds, \qquad P^{(1)}(u) = c_1 f_F(c_1 u + c_0; \nu-1, 1), \qquad P^{(k)}(u) = c_1^{k}\, f_F^{(k-1)}(c_1 u + c_0; \nu-1, 1). \qquad (2.38)$$

2.6 Large Sample Properties

The behavior of Di in large samples merits separate consideration. The results have both analytic and computational value. Rather than study Di directly, consider Di* = ν·Di. Then

$$\Pr\{D_i^* > d^*\} = \Pr\{\nu D_i > d^*\} = \Pr\{D_i > d^*/\nu\} = \Pr\{D_i > d\}, \qquad (2.39)$$

with d = d*/ν. Using (2.28) the distribution function for Di* may be expressed as

$$\Pr\{D_i^* > d^*\} = \int_{u_0^*}^{\infty} \Pr\{F(1, \nu-1) > s(d^*/\nu, u)\}\, f_F(u; q-1, \nu)\,du, \qquad (2.40)$$

with

$$s(d^*/\nu, u) = (\nu-1)\left\{\nu\left[u\left(\frac{q-1}{q}\right)\left(\frac{N}{N-1}\right) + q^{-1}\left(\frac{\nu}{N-1}\right)\right](d^*)^{-1} - 1\right\}^{-1}, \qquad (2.41)$$

$u_0^* = [t_0(d^*/\nu) - a_0]/a_1$, and $t_0(d^*/\nu) = \max(a_0, d^*/\nu^2)$.

Consider Di* as N → ∞. In that case

$$\lim_{N\to\infty} s(d^*/\nu, u) = \frac{d^*}{u\left(\frac{q-1}{q}\right) + q^{-1}} = \frac{d^* q}{u(q-1) + 1}. \qquad (2.42)$$

That $\lim_{N\to\infty} a_0 = 0$ and $\lim_{N\to\infty} d^*/\nu^2 = 0$ combine to imply $\lim_{N\to\infty} u_0^* = 0$. Therefore

$$\lim_{N\to\infty} \Pr\{D_i^* > d^*\} = \int_0^{\infty} \Pr\left\{\chi^2(1) > \frac{d^* q}{u(q-1) + 1}\right\}(q-1)\, f_{\chi^2}\!\left[(q-1)u;\, q-1\right]du. \qquad (2.43)$$

Let w = (q – 1)u, so that dw = (q – 1)du. Then

$$\lim_{N\to\infty} \Pr\{D_i^* > d^*\} = \int_0^{\infty} \Pr\left\{\chi^2(1) > \frac{d^* q}{w + 1}\right\} f_{\chi^2}(w;\, q-1)\,dw. \qquad (2.44)$$

A Taylor's series about $\mathcal{E}(W) = (q-1)$ yields the two term approximation

$$\Pr\{D_i^* > d^*\} \approx \Pr\{\chi^2(1) > d^*\}. \qquad (2.45)$$

Also, with d = d*/ν, for large N

$$\Pr\{D_i > d\} \approx \Pr\{\chi^2(1) > \nu d\}, \qquad (2.46)$$

with corresponding quantile approximation

$$\tilde d_p \approx F_{\chi^2}^{-1}(p; 1)/\nu. \qquad (2.47)$$

The F based approximation in (2.36) provides more accuracy, except in large samples. Additional terms are required for the approximation to vary with q.

Three conclusions follow. First, as N increases Di converges to a degenerate random variable with all mass at zero. Second, Di* converges to a non-degenerate random variable. Third, calculations of quantiles in terms of Di* can greatly reduce numerical difficulties with large samples.
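In code, (2.46) and (2.47) become one-line approximations. A sketch with scipy; as noted above, additional series terms would be required for any dependence on q, so q enters only through ν = N − q:

    from scipy import stats

    def chi2_tail_large_N(d, N, q):
        # Pr{D_i > d} ~ Pr{chi-square(1) > nu*d}, equation (2.46)
        return stats.chi2.sf((N - q) * d, 1)

    def chi2_quantile_large_N(p, N, q):
        # d~_p ~ chi-square(1) quantile divided by nu, equation (2.47)
        return stats.chi2.ppf(p, 1) / (N - q)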

2.7 The Maximum of N Values of Cook’s Statistic

Fitting a linear model leads to considering N values of Di. The non-independence of the set of Di makes an analytic description of their joint distribution unclear, and computing associated probabilities rather onerous. Despite that, ignoring the multiple testing problem would lead to spuriously rejecting valid data. A Bonferroni correction provides the simplest strategy.

A multiple-testing correction for $D_i$ with fixed predictors reduces to consideration of the same issue for residuals. Cook and Prescott (1981) examined the accuracy of a Bonferroni correction in evaluating N residuals, for fixed X. They provided useful lower bounds, based on residual correlations, to complement the Bonferroni upper bound. The accuracy of the Bonferroni correction decreases as correlations among the residuals increase. Experimental designs with purposeful confounding can create extremely high correlations among some pairs of residuals. Recall that, given X, the residuals have covariance matrix $(I_N - H)\sigma^2$. The Bonferroni correction seems more likely to be universally applicable with Gaussian predictors. As described in §2.3, for the study of $\{D_i\}$ the covariance matrix for each row of Gaussian predictors may be assumed to be $I_{q-1}$. Hence the expected correlation for any pair of residuals should be modest and asymptotically zero. The excellent performance of the Bonferroni correction with independent events of small probability promises good accuracy here.

3. NUMERICAL EVALUATIONS

3.1 Exact Probability and Quantile Computations

All exact probabilities reported in this paper were computed by applying Simpson's rule to equation (2.32). All calculations were expressed in terms of the variable $D_i^* = \nu D_i$ in order to provide better numerical accuracy for large sample cases. All exact quantiles were computed via a bisection algorithm (Thisted, 1988, p169) applied to equation (2.32), in terms of $D_i^*$, with an approximate quantile from equation (2.34) or (2.35) as the starting value. Properties of the function were exploited to refine the code and speed convergence. See Kennedy and Gentle (1980) or Thisted (1988) for descriptions of Simpson's rule, as well as general discussions of numerical integration, the use of transformations to finite regions such as the one used here, and function inversion algorithms.
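A sketch of this strategy, reusing prob_cooks_d_exceeds() and approx_cooks_d_quantile() from the earlier sketches. For simplicity it bisects on d directly rather than on d* = ν·d, so it shows the inversion logic without the rescaling used for very large samples:

    def exact_cooks_d_quantile(p, N, q, tol=1e-10):
        """Invert Pr{D_i > d} = 1 - p by bisection (cf. Thisted, 1988)."""
        target = 1.0 - p
        lo = hi = approx_cooks_d_quantile(p, N, q)      # starting value, (2.36)
        while prob_cooks_d_exceeds(lo, N, q) < target:  # expand down until bracketed
            lo /= 2.0
        while prob_cooks_d_exceeds(hi, N, q) > target:  # expand up until bracketed
            hi *= 2.0
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if prob_cooks_d_exceeds(mid, N, q) > target:
                lo = mid                                # quantile lies above mid
            else:
                hi = mid
            if hi - lo < tol * hi:
                break
        return 0.5 * (lo + hi)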

3.2 A Simulation

A small simulation study was conducted in order to verify the accuracy of the computational strategy detailed at the end of §2.4, and to assess the accuracy of a Bonferroni correction in evaluating N values of $D_i$. Assumptions followed those in §2: $y = X\beta + e$ holds, with $\{e_i\}$ iid Gaussian, β fixed and unknown, $\{X_i\}$ iid multivariate Gaussian and independent of $\{e_i\}$. In such cases, finding d such that $\Pr\{D_i > d\} = \alpha$ depends only on N, q, and α. The value of d provides a test of whether a particular $D_i$ arose from the hypothesized distribution.

All data were generated under the stated assumptions. Empirical size of the test of $D_i$ was tabulated across replicates. Two factors were varied in a factorial design: N ∈ {25, 50, 100} and q ∈ {2, 4, 8}. For each replicate the first $D_i$ was tested at α ∈ {.01, .05} and the largest $D_i$ was tested at α ∈ {.01/N, .05/N}. The pseudo-random generation of data, under valid assumptions, ensures that the first $D_i$ represents a pseudo-randomly selected value. In contrast, the distribution of the largest $D_i$ depends on the remaining N − 1 values.

Let Z = [e G] indicate an N × q matrix, with $\mathrm{row}_i(Z) \sim N(0, I)$. For each combination of N and q a total of 20,000 replicates of Z were created in SAS IML©, using the function NORMAL. Next X = [1 G], $y = X\beta + e$, and $\{D_i\}$ were computed for each, with $\beta = 0_q$ (which implies y = e). The first $D_i$, the largest $D_i$, N, and q were stored for each replicate.
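A compact re-creation of this design might look as follows, a sketch in Python rather than SAS IML, with exact_cooks_d_quantile() taken from the sketch in §3.1 and the replicate count reduced for speed:

    import numpy as np

    def one_replicate(rng, N, q):
        Z = rng.standard_normal((N, q))        # Z = [e G], rows iid N(0, I)
        e, G = Z[:, 0], Z[:, 1:]
        X = np.column_stack([np.ones(N), G])
        y = e                                  # beta = 0_q implies y = e
        H = X @ np.linalg.solve(X.T @ X, X.T)
        h = np.diag(H)
        r = y - H @ y
        s2 = r @ r / (N - q)
        D = (r**2 / (s2 * (1 - h))) * h / (q * (1 - h))   # (2.12)
        return D[0], D.max()

    rng = np.random.default_rng(1)
    N, q, reps, alpha = 25, 4, 2000, 0.05
    cut_single = exact_cooks_d_quantile(1 - alpha, N, q)
    cut_max = exact_cooks_d_quantile(1 - alpha / N, N, q)
    sims = np.array([one_replicate(rng, N, q) for _ in range(reps)])
    print("single:", (sims[:, 0] > cut_single).mean())
    print("largest:", (sims[:, 1] > cut_max).mean())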

Table I summarizes the empirical size for the tests of Di with Gaussian predictors, as a function of N, q, and α. The formulas derived in §2 provided accurate probabilities for the simulations of a single value. Furthermore the Bonferroni approximation was quite accurate.

TABLE I.

Empirical Test Size, $\hat\alpha$, for Single and Largest $D_i$ with 20,000 Replications; Standard Error .0007 (α = .01) or .0015 (α = .05)

                Single               Largest
q      N    α = .01    .05      .01/N    .05/N
2     25      .010    .051      .010     .048
      50      .011    .049      .010     .049
     100      .009    .050      .011     .051
4     25      .010    .049      .010     .047
      50      .011    .050      .011     .050
     100      .011    .051      .010     .051
8     25      .010    .047      .010     .049
      50      .010    .052      .010     .048
     100      .011    .050      .009     .051

3.3 Comparisons of Approximations

Table II contains probabilities of $D_i$ exceeding $F_F^{-1}(.50; q-1, \nu)$, and N times the probabilities. Test size systematically and rapidly decreases with N. Ideally a cut-point allows consistent interpretation across regression analyses. The median, or any other quantile, of F(q − 1, ν) does not allow such consistency.

TABLE II.

Probability of Di Exceeding F Median as a Function of Sample Size (N) and Number of Gaussian Predictors (q – 1)

                              q
N             2             4             8             16
                 Pr{D_i > F_F^{-1}(.50; q−1, ν)}
25       7.02·10^−3    1.53·10^−3    1.39·10^−3    1.19·10^−2
50       7.46·10^−4    3.47·10^−5    6.02·10^−6    5.61·10^−5
100      3.59·10^−5    1.85·10^−7    3.10·10^−9    1.43·10^−10
200      5.55·10^−7    1.20·10^−10   <1·10^−14     <1·10^−14
                 N·Pr{D_i > F_F^{-1}(.50; q−1, ν)}
25       1.76·10^−1    3.82·10^−2    3.46·10^−2    2.98·10^−1
50       3.73·10^−2    1.73·10^−3    3.01·10^−4    2.81·10^−3
100      3.59·10^−3    1.85·10^−5    3.10·10^−7    1.43·10^−8
200      1.11·10^−4    2.41·10^−8    1.39·10^−12   <1·10^−14
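Assuming the reconstruction of (2.32) above, the pattern in Table II can be explored directly with prob_cooks_d_exceeds() from the sketch in §2.4; the tail probability beyond the F(q − 1, ν) median collapses as N grows:

    from scipy import stats

    q = 4
    for N in (25, 50, 100, 200):
        f_median = stats.f.ppf(0.50, q - 1, N - q)
        print(N, prob_cooks_d_exceeds(f_median, N, q))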

The approximate quantile in equation (2.36) also provides a cut-point requiring only one evaluation of $F_F^{-1}(\cdot\,; \cdot, \cdot)$. Such values were computed for N and q as in Table II, with target test sizes of .01 and .05. Equation (2.32) was integrated with Simpson's rule to compute exact probabilities of exceeding the approximate quantiles. In order to approximately evaluate a Bonferroni correction, the same process was followed for target test sizes of .01/N and .05/N, with the additional step of multiplying the probabilities by N. As can be seen in Table III, the exact test size ranges from .052 to .058 for a target α of .05, and from .014 to .026 for a target α of .01. The results corresponding to a Bonferroni correction (in the right half of Table III) involve smaller tail probabilities and were much less accurate. An overall target α of .01 gave approximate test sizes ranging from .073 to .398, while a target α of .05 gave approximate test sizes ranging from .200 to .709 (for the conditions examined). Accuracy improves with increasing sample size and number of predictors.

TABLE III.

Probability of Di Exceeding F-Based Approximate Quantile as a Function of Sample Size (N) and Number of Gaussian Predictors (q – 1)

                            q
            2      4      8      16          2      4      8      16
N               Pr{D_i > d̃.01}¹                N·Pr{D_i > d̃.01/N}¹
25        .022   .023   .022   .026        .182   .166   .149   .296
50        .020   .020   .018   .016        .212   .168   .113   .085
100       .019   .019   .017   .014        .298   .203   .118   .069
200       .019   .019   .016   .014        .398   .268   .142   .073
                Pr{D_i > d̃.05}¹                N·Pr{D_i > d̃.05/N}¹
25        .054   .058   .057   .058        .295   .286   .260   .403
50        .053   .057   .055   .054        .369   .323   .240   .192
100       .053   .056   .055   .053        .498   .405   .270   .181
200       .053   .056   .055   .053        .709   .539   .330   .200
¹ $\tilde d_p = [a_1 m + a_0]\,\nu\left[1 + (\nu - 1)/F_F^{-1}(p; 1, \nu-1)\right]^{-1}$, with m = ν/(ν − 2) if ν > 2 and m = 1 otherwise.

Table IV provides exact critical values of ν·Di for α ∈ {.01/N,.05/N} and a range of N and q. Quantiles in Table IV were computed by a simple bisection algorithm (Thisted, 1988, p 169) applied to equation (2.32). An approximate quantile from equation (2.35) provided the starting value. Algorithmic stability across a large range of sample sizes required using d* = ν × d.

TABLE IV.

Exact Critical Values of ν · Di for α ∈ {.01/N, .05/N}

q – 1 = 1 2 3 4 5 6 7 8 9 10 15 20 40 80
N α = .05/N
10 15.2 16.6 18.2 21.3 28.4 50.2 192.9 . . . . . . .
15 15.6 15.6 15.5 15.6 15.9 16.7 18.0 20.3 24.8 34.7 . . . .
20 16.4 16.0 15.5 15.1 15.0 14.9 15.0 15.3 15.7 16.4 40.0 . . .
25 17.2 16.6 15.9 15.3 15.0 14.7 14.6 14.5 14.5 14.6 16.9 44.6 . .
30 17.9 17.1 16.3 15.7 15.2 14.8 14.6 14.4 14.3 14.2 14.6 17.5 . .
40 19.2 18.2 17.2 16.4 15.8 15.4 15.0 14.7 14.5 14.4 13.9 14.1 . .
60 21.3 19.9 18.6 17.7 17.0 16.4 16.0 15.7 15.3 15.1 14.3 13.9 14.5 .
80 23.0 21.2 19.8 18.7 17.9 17.3 16.8 16.4 16.1 15.7 14.9 14.3 13.8 .
100 24.3 22.3 20.7 19.6 18.7 18.0 17.5 17.1 16.7 16.4 15.4 14.8 13.9 15.6
200 28.8 26.1 24.1 22.5 21.4 20.5 20.0 19.3 18.0 18.4 17.2 16.5 15.2 14.6
400 33.9 30.2 27.6 25.8 24.4 23.5 22.4 22.1 21.3 20.5 19.4 18.5 16.8 16.0
800 40.2 34.0 32.1 29.3 28.2 26.7 25.8 24.4 24.3 23.3 21.8 20.3 18.8 17.5
α = .01/N
10 28.7 31.1 35.1 44.1 66.8 50.5 964.1 . . . . . . .
15 26.9 26.1 25.7 25.8 26.7 28.5 31.8 37.8 50.1 80.7 . . . .
20 27.2 25.7 24.2 23.6 23.2 23.1 23.3 23.9 24.9 26.5 92.1 . . .
25 27.9 25.8 24.3 23.2 22.5 22.0 21.7 21.6 21.6 21.8 27.0 102.3 . .
30 28.7 26.4 24.4 23.3 22.4 21.8 21.3 21.0 20.7 20.6 21.5 27.7 . .
40 30.2 27.4 25.4 23.9 22.8 22.2 21.4 20.9 20.5 20.1 19.4 19.7 . .
60 32.3 29.3 26.6 25.2 24.2 23.0 22.2 21.6 21.2 20.7 19.4 18.9 19.8 .
80 34.4 31.1 28.1 26.4 25.0 23.9 23.3 22.4 21.8 21.3 19.8 19.2 17.8 .
100 36.1 32.6 29.2 27.3 25.8 24.4 24.2 23.3 22.2 22.1 20.2 19.2 18.0 20.7
200 41.2 37.3 34.2 31.3 29.4 28.4 26.9 25.8 25.6 24.5 22.4 21.3 19.3 18.6
400 49.4 45.0 37.6 35.3 34.1 31.0 31.0 29.3 28.2 28.2 25.6 23.3 21.2 20.1
800 68.4 57.7 52.6 40.6 36.9 36.9 33.6 33.6 30.5 30.5 27.7 25.2 22.9 22.9

Different rows in Table IV have different patterns. The range reflects a varying distance from a boundary condition. The studentized residual embedded in $D_i$ requires N − q − 1 > 0. The critical value of $D_i$ or $\nu D_i$ may be taken to be infinity for N − 1 ≤ q. The table covers q − 1 ≤ 80. Rows with N ≤ 80 include the boundary and show a marked upturn in the rightmost value. Rows with N ≥ 200 have no entries near the boundary, and hence display a monotone pattern.

3.4 Comments on Algorithms

Computing and verifying the results in this paper led us to program and evaluate both formulations described in §2.4. Equation (2.32) needed less computation for a fixed accuracy. The advantage of the transformation in (2.32) arises from the shape of the function as both N and q get large. The good performance of the transformation reflects the nature of the random variable W = U/(1 + U), with U following an F distribution. Even though both numerator and denominator degrees of freedom increase, the distribution function of W does not degenerate to a point mass. In contrast, the other two formulations involve convergence (as sample size increases) to degenerate random variables and hence degenerate functions. Two difficulties with (2.32) should be noted. First, q = 2 creates a singularity at zero, which often represents the end-point of the interval of integration. Second, extremely large values of d (corresponding to values far beyond those in Table IV) may increase the computational burden.

The care required to ensure reasonable numerical performance across a wide range of conditions should not be surprising, given the random variables involved. Kennedy and Gentle (§5.5 and §5.6, 1980) discussed the difficulties in computing F and β probabilities and quantiles. They concluded that no single approach works with all parameter combinations. Thisted (§5.2.2, 1988) provided related material.

4. DISCUSSION

4.1 The Role of Sample Size in Regression Diagnostics

Considering $D_i^* = \nu D_i$ rather than $D_i$ creates a computational advantage. The two alternatives also reflect two mutually exclusive behaviors for regression diagnostics. For a fixed amount of deviance, distinctions among observations shrink as sample size increases for $h_i$ and $D_i$ (both converge to zero). In contrast, a fixed amount of deviance yields an essentially constant interpretation, no matter the sample size, for $\hat e_i$ and $R_i$. In order to emphasize the distinction, compare $h_i$ to

$$h_i^* = \frac{\nu h_i}{q} = \frac{X_i(X'X/\nu)^{-1}X_i'}{q} = \frac{(X_i - 0)(X'X/\nu)^{-1}(X_i - 0)'}{q}. \qquad (4.1)$$

Obviously $h_i^*$ exhibits the second type of behavior, across N and q. Note that $h_i^*$ corresponds to the Mahalanobis distance from the origin.

Both types of behavior have merit. Statistics of the first type better reflect the impact of a single observation on the total analysis. With the first type of statistic, the misleading effect of a single observation eventually drowns in rising sample size. Statistics of the second type highlight a given deviant observation, no matter what the sample size. The second type’s consistent range of values across sample sizes simplifies interpretation. For example, no matter what the sample size, Ri = 7 would demand further attention to the observation.

Sample size also plays a familiar role in the interpretation of $D_i^*$ and $D_i$. As always, one must distinguish between statistical "significance" (a small p-value, reflecting rarity of the value) and scientific importance (a difference of consequence in practice). In the present context, some data analysts judge importance by the size of an estimated regression coefficient, $\hat\beta_j$. Others consider the standardized version, $\hat\beta_j/\hat\sigma(\hat\beta_j)$, the corresponding semi-partial correlation coefficient, $r(Y, X_j \mid \{X_1 \ldots X_{j-1}, X_{j+1} \ldots X_{q-1}\})$, or the corresponding sums of squares. A regression coefficient may differ significantly from zero yet have no practical importance. Recall that $D_i = (\hat\beta_{(i)} - \hat\beta)'(X'X)(\hat\beta_{(i)} - \hat\beta)/[q\hat\sigma^2]$ captures the shift in (standardized) regression coefficients. Hence to judge the importance of an observation highlighted by $D_i$ one should examine the shift in $\hat\beta$, sums of squares, or multiple correlation due to deleting the observation.
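A sketch of that recommended follow-up: refit without observation i and report the coefficient shift along with the full-sample and deleted-sample squared multiple correlations.

    import numpy as np

    def deletion_impact(X, y, i):
        """Shift in coefficients and R-squared caused by deleting observation i."""
        keep = np.arange(len(y)) != i
        b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        b_del, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)

        def r_squared(Xs, ys, b):
            resid = ys - Xs @ b
            return 1.0 - (resid @ resid) / ((ys - ys.mean()) @ (ys - ys.mean()))

        return b_del - b_full, r_squared(X, y, b_full), r_squared(X[keep], y[keep], b_del)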

4.2 Open Questions and Potential Applications

Di represents one example of a closely related set of diagnostic statistics, including DFFITSi and DFBETASj(i) (Cook and Weisberg, 1982), and a modification of Cook’s Di (Atkinson, 1985). The approach presented here for computing probabilities and quantiles appears to allow similar computations for at least some of the related statistics.

The distribution of the predictors and the purpose of the analysis strongly affect the interpretation of the diagnostics. The results cover only two situations: all fixed predictors, or all Gaussian predictors plus the special fixed predictor, the intercept. Random but non-Gaussian predictors were not considered here. Do the results provide an approximation whose quality improves as ν increases? Are the results robust with respect to the form of the distribution?

Another generalization involves models with both fixed and Gaussian predictors. Consider, for example, ANCOVA models, which contain one or more fixed effects, and one or more Gaussian predictors. The fixed parameters for a single fixed factor or for any factorial design can be expressed, without loss of generality, as a set of G cell means, with corresponding columns in X of full rank (G). The statistical independence between rows of data allows the likelihood to be separated into G components. The theory in §1-4 treats the special case of G = 1. Consequently the new results apply to each of the G sets of data, considered separately. However, such an analysis would only identify influence within a group, not with respect to all observations in the analysis. Considering all observations simultaneously appears to require additional theoretical results.

More general models may have fixed parameters not expressible as a full-rank cell mean coding (such as a fixed-block design), and/or contain interactions between fixed and Gaussian or Gaussian and Gaussian predictors. Again new theoretical results appear necessary.

Two or more observations may mask the influence of each other. Consequently some research on diagnostics has focused on the impact of deleting two or more observations. Although very appealing, such diagnostics usually create substantially greater analytic and computational difficulty. Cook and Weisberg (1982) discussed generalizing $D_i$ in this fashion. The theory described here does not accommodate their generalization in any straightforward fashion. The more general result seems worth pursuing. See Jensen and Ramirez (1996, 1997) for the fixed predictor case. Furthermore, generalizing the results stated here to multivariate regression (two or more responses) also has merit.

Computational algorithms for probabilities and quantiles of Di deserve more attention. A series representation would likely provide the best solution, although an even better behaved function for integration might suffice.

4.3 Abusing the Results for Data Analysis

The new results should never be used for automatically discarding an observation. A widely cited episode in meteorology illustrates the danger of automatic deletion. In order to help process the flood of data from U.S. weather satellites, automatic outlier detection and rejection was applied as part of data formatting and reduction (Kenward, 1988). British scientists (Farman, Gardiner, and Shanklin, 1985), using ground station data, reported a dramatic downward trend across time in ozone levels over the Antarctic. U.S. NASA scientists confirmed the infamous "hole" in the ozone by re-examining their accumulated satellite data, with automatic outlier detection disabled.

4.4 Using the Results for Data Analysis

As discussed in §1.1, using any diagnostic involves a three step process: 1) highlight bothersome values, 2) investigate the highlighted values, and 3) decide on a disposition, using scientific principles. Whenever the Gaussian predictors assumption seems reasonable, we recommend using the probability and quantile computations for $\nu D_i$ to highlight observations worthy of investigation. As indicated, explicitly compute and compare $\hat\beta$ and $\hat\beta_{(i)}$. As discussed in §4.1, examining the shift in sum of squares or correlation also has appeal. We believe the results presented here provide a useful metric for $D_i$ and valuable insight into its nature and performance.

ACKNOWLEDGMENTS

Muller's work was supported in part by NCI grant P01 CA47982-04, NIH grant M01 RR000-46-33, and NIEHS grant N01-ES-35356. The authors gratefully acknowledge comments on earlier drafts by anonymous reviewers.

Contributor Information

Keith E. Muller, Dept. of Biostatistics, CB#7400, University of North Carolina, Chapel Hill, North Carolina 27599.

Mario Chen Mok, Dept. of Biostatistics, CB#7400, University of North Carolina, Chapel Hill, North Carolina 27599.

BIBLIOGRAPHY

  1. Atkinson AC. Plots, Transformations, and Regression. Clarendon Press; Oxford: 1985.
  2. Belsley DA, Kuh E, Welsch RE. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley; New York: 1980.
  3. Chatterjee S, Hadi AS. Influential Observations, High Leverage Points, and Outliers in Linear Regression. Statistical Science. 1986;1:379–416.
  4. Chen Mok M. Evaluating Cook’s D Statistic in Theory and Practice: A Simulation Study. Department of Biostatistics, University of North Carolina; Chapel Hill: 1993. Unpublished Master’s Paper.
  5. Cook RD. Detection of Influential Observations in Linear Regression. Technometrics. 1977;19:15–18.
  6. Cook RD, Prescott P. On the Accuracy of Bonferroni Significance Levels for Detecting Outliers in Linear Models. Technometrics. 1981;23:59–63.
  7. Cook RD, Weisberg S. Residuals and Influence in Regression. Chapman and Hall; New York: 1982.
  8. Farman JC, Gardiner BG, Shanklin JD. Large losses of total ozone in Antarctica reveal seasonal ClOx/NOx interaction. Nature. 1985;315:207–210.
  9. Hoaglin DC, Welsch RE. The Hat Matrix in Regression and ANOVA. American Statistician. 1978;32:17–22.
  10. Jensen DR, Ramirez DE. Computing the CDF of Cook’s DI Statistic. In: Prat A, Ripoll E, editors. Proceedings of the 12th Symposium in Computational Statistics; Barcelona, Spain. Instituto de Estadística de Catalunya; 1996. pp. 65–66.
  11. Jensen DR, Ramirez DE. Some exact properties of Cook’s DI. In: Rao CR, Balakrishnan N, editors. Handbook of Statistics-16: Order Statistics and Their Applications. North-Holland; Amsterdam: 1997. in press.
  12. Johnson NL, Kotz S. Continuous Univariate Distributions - 1. Wiley; New York: 1970a.
  13. Johnson NL, Kotz S. Continuous Univariate Distributions - 2. Houghton Mifflin; Boston: 1970b.
  14. Kennedy WJ, Jr., Gentle JE. Statistical Computing. Marcel Dekker; New York: 1980.
  15. Kenward M. Surprise, Surprise. New Scientist. 1988;117(1606):16.
  16. Kleinbaum DG, Kupper LL, Muller KE. Applied Regression Analysis and Other Multivariable Methods. Second Edition. Duxbury Press; Boston: 1988.
  17. LaMotte LR. A Note on the Role of Independence in t Statistics Constructed From Linear Statistics in Regression Models. American Statistician. 1994;48:238–240.
  18. Obenchain RL. Letter to the Editor. Technometrics. 1977;19:348–349.
  19. Sampson AR. A Tale of Two Regressions. Journal of the American Statistical Association. 1974;69:682–689.
  20. Thisted RA. Elements of Statistical Computing. Chapman and Hall; New York: 1988.
  21. Weisberg S. Applied Linear Regression. Wiley; New York: 1985.
