A weighted generalized score statistic for comparison of predictive values of diagnostic tests

Andrzej S Kosinski

doi:10.1002/sim.5587

Author manuscript; available in PMC: 2013 Aug 28.

Published in final edited form as: Stat Med. 2012 Aug 22;32(6):964–977. doi: 10.1002/sim.5587

PMCID: PMC3756153 NIHMSID: NIHMS499894 PMID: 22912343

Abstract

Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting.

Keywords: Correlated data, Diagnostic test, Paired design, Positive and negative predictive values, Score statistic

1 Introduction

Diagnostic tests are important in medicine especially when the gold standard for ascertainment of disease is invasive or expensive. We consider diagnostic tests with binary result and assume gold standard is available for comparison with the test result. As a motivating example, Table 1 presents the coronary artery disease data [1]. This is a paired design, in which all the patients receive both diagnostic tests, and it is the situation we consider. The gold standard for coronary artery disease is coronary angiography and the diagnostic tests are exercise stress test (Test 1) and clinical history of chest pain (Test 2). Two important measures used to evaluate diagnostic tests are positive and negative predictive values. Positive predictive value (ppv) is the probability of disease (as identified by the gold standard) when the diagnostic test is positive and negative predictive value (npv) is the probability of no disease when the diagnostic test is negative. We consider testing of equality of predictive values of two diagnostic tests. Although a joint comparison of positive and negative predictive values has been considered [2, 3], we focus presently on comparison of positive and negative predictive values separately.

Table 1.

Coronary artery disease data

	CAD		No CAD
	Test 2 result		Test 2 result
Test 1 result	Chest pain	No chest pain	Chest pain	No chest pain
Positive stress test	473	29	22	46
Negative stress test	81	25	44	151


CAD = coronary artery disease

Since both tests are measured on the same patients, the estimated predictive values of the two tests are correlated. To account for this when testing equality of predictive values, the generalized estimating equations (GEE) approach [4] with the generalized score test [5, 6] was considered [7]. Alternatively, data in Table 1 were considered to have a multinomial distribution and application of the δ-technique provided Wald test statistics [2, 3, 8]. These test statistics, as reported in the literature, tend to have rather complex formulas that are not very intuitive. Sometimes the formulas differ even for mathematically equivalent test statistics. Thus, there is a need for simple formulas like those that exist for testing equality of two proportions in the two independent samples case.

We propose such novel, simple, and intuitive algebraic re-formulations of the existing test statistics for comparing predictive values. The new formulas facilitate comparisons between the Wald statistics and with the generalized score statistic. A new formulation of the generalized score statistic clearly shows that if each patient receives only one test (unpaired design) then this statistic does not always reduce to the score test statistic commonly used in the independent samples situation. Motivated by this, we propose a new generalized score based statistic for testing equality of two predictive values. The suggested statistic always reduces to the score statistic with independent samples. We call this new statistic the weighted generalized score (wgs) test statistic because it results from consideration of weights when computing empirical covariance matrix needed for the generalized score statistic. The new statistic has superior type I error behavior when compared to the other available test statistics, as demonstrated with simulations.

The paper is organized as follows. Section 2 develops simple formulations of the multinomial based Wald statistics without and with transformation (log, logit) of predictive values. Section 3 deals with the GEE based test statistics resulting from consideration of a logistic model. Section 3.1 derives simple formulation of the empirical Wald statistic and shows that it is equivalent to the multinomial based Wald statistic with the logit transformed predictive values. Section 3.2 derives two algebraically different but mathematically equivalent formulations of the generalized score statistic. The first formulation shows that the generalized score statistic can be quite similar to the multinomial based Wald statistic and the second formulation explains why in the independent samples situation the generalized score statistic does not reduce to the commonly used score statistic. Section 4.1 utilizes these insights to motivate a need for weights when computing the empirical covariance matrix. Section 4.2 presents the proposed weights and the resulting new weighted generalized score statistics. Section 5 contains simulations, Section 6 a detailed example, and discussion follows in Section 7.

2 Proposed new formulations of the multinomial based Wald statistics

This section introduces notations and develops simple formulations of the multinomial based Wald statistics without and with transformation of predictive values. The log and logit transformations are considered.

Table 2 displays the general notation for paired study design data. Appropriate marginal counts are denoted by a dot in the dimension over which marginalization occurs. For brevity we denote n = n_•••. Using this notation, an estimate of ppv for Test 1 is ${\hat{PPV}}_{1} = n_{+ • D} ∕ n_{+ • •}$ and for Test 2 is ${\hat{PPV}}_{2} = n_{• + D} ∕ n_{• + •}$ . Similarly, ${\hat{NPV}}_{1} = n_{- • \bar{D}} ∕ n_{- • •}$ and ${\hat{NPV}}_{2} = n_{• - \bar{D}} ∕ n_{• - •}$ .

Table 2.

Paired design data notations

	Disease (D)		No disease $(\bar{D})$
	Test 2 result		Test 2 result
Test 1 result	Positive (+)	Negative (−)	Positive (+)	Negative (−)
Positive (+)	$n_{+ + D}$	$n_{+ - D}$	$n_{+ + \bar{D}}$	$n_{+ - \bar{D}}$
Negative (−)	$n_{- + D}$	$n_{- - D}$	$n_{- + \bar{D}}$	$n_{- - \bar{D}}$
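
As an illustration of the dot notation, the following minimal sketch computes the four estimated predictive values from the Table 1 counts. The dictionary layout, the label 'N' for the disease-free group, and the helper name `margin` are illustrative choices, not part of the paper.

```python
# Table 1 counts keyed by (Test 1 result, Test 2 result, disease status).
# '+'/'-' are test results; 'D' is disease, 'N' is no disease (D-bar in the text).
counts = {
    ('+', '+', 'D'): 473, ('+', '-', 'D'): 29,
    ('-', '+', 'D'): 81,  ('-', '-', 'D'): 25,
    ('+', '+', 'N'): 22,  ('+', '-', 'N'): 46,
    ('-', '+', 'N'): 44,  ('-', '-', 'N'): 151,
}

def margin(t1=None, t2=None, d=None):
    """Sum counts over every dimension left as None (the 'dot' in the notation)."""
    return sum(v for (a, b, c), v in counts.items()
               if t1 in (None, a) and t2 in (None, b) and d in (None, c))

ppv1 = margin(t1='+', d='D') / margin(t1='+')   # n_{+.D} / n_{+..}
ppv2 = margin(t2='+', d='D') / margin(t2='+')   # n_{.+D} / n_{.+.}
npv1 = margin(t1='-', d='N') / margin(t1='-')   # n_{-.Dbar} / n_{-..}
npv2 = margin(t2='-', d='N') / margin(t2='-')   # n_{.-Dbar} / n_{.-.}
print(ppv1, ppv2, npv1, npv2)
```

Note that the denominators differ between the two tests (570 versus 620 positives, 301 versus 251 negatives), which is why a simple paired comparison of proportions does not apply.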


Several statistics for comparing predictive values are derived by considering the cells in Table 2 to have a multinomial distribution. Let $g (π) = f (ppv_{1}) - f (ppv_{2}) = f (π_{+ • D} ∕ π_{+ • •}) - f (π_{• + D} ∕ π_{• + •})$ and $h (π) = f (npv_{1}) - f (npv_{2}) = f (π_{- • \bar{D}} ∕ π_{- • •}) - f (π_{• - \bar{D}} ∕ π_{• - •})$ , where f(·) is a monotone differentiable function. Utilizing the δ-technique, the Wald statistics for testing H₀ : ppv₁ = ppv₂ and H₀ : npv₁ = npv₂ are thus

\begin{matrix} T_{f (PPV)}^{M} & = \frac{g {(\hat{π})}^{2}}{{[\frac{\partial g (π)}{\partial π}]}_{π = \hat{π}}^{T} {\hat{Σ}}_{π} {[\frac{\partial g (π)}{\partial π}]}_{π = \hat{π}}} = \frac{{\{f ({\hat{PPV}}_{1}) - f ({\hat{PPV}}_{2})\}}^{2}}{var \{f ({\hat{PPV}}_{1}) - f ({\hat{PPV}}_{2})\}} \\ T_{f (NPV)}^{M} & = \frac{h {(\hat{π})}^{2}}{{[\frac{\partial h (π)}{\partial π}]}_{π = \hat{π}}^{T} {\hat{Σ}}_{π} {[\frac{\partial h (π)}{\partial π}]}_{π = \hat{π}}} = \frac{{\{f ({\hat{NPV}}_{1}) - f ({\hat{NPV}}_{2})\}}^{2}}{var \{f ({\hat{NPV}}_{1}) - f ({\hat{NPV}}_{2})\}}, \end{matrix}

where $n {\hat{Σ}}_{π} = diag (\hat{π}) - \hat{π} {\hat{π}}^{T}$ . The test statistics are distributed asymptotically as the $χ_{1}^{2}$ distribution under the corresponding null hypotheses.

For f(x) = x, the formulas for the denominators of the above test statistics in terms of counts in Table 2 were provided in [3, 8]. The formulas presented in [3] and [8] are mathematically equivalent but differ algebraically. These formulas are rather complex with no easy intuitive insight. Transformation f(x) = log x was considered in [2, 8]. Similarly, the provided formulas while mathematically equivalent are different algebraically, and are also rather involved and not easily intuitive. Below we present novel, simple, and intuitive algebraic re-formulations of these multinomial based Wald test statistics.

Considering f(x) = x and the estimated variance-covariance matrix of $f ({\hat{PPV}}_{1})$ and $f ({\hat{PPV}}_{2})$ as derived in Appendix A, we have

T_{PPV}^{M} = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{\frac{{\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})}{n_{+ • •}} + \frac{{\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})}{n_{• + •}} - 2 \times C^{PPV} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})}

with

C^{PPV} = {(n_{+ • •} + n_{• + •})}^{- 1} \{n_{+ + D} (1 - {\hat{PPV}}_{1}) (1 - {\hat{PPV}}_{2}) + n_{+ + \bar{D}} {\hat{PPV}}_{1} {\hat{PPV}}_{2}\} .

(1)

Here, the variance of ${\hat{PPV}}_{1} - {\hat{PPV}}_{2}$ is explicitly expressed as sum of the commonly used binomial variances of ${\hat{PPV}}_{1}$ and ${\hat{PPV}}_{2}$ minus two times covariance. The covariance depends on the estimates of positive predictive values and on the frequencies of concordant positive test results for diseased (n_++D) and disease free groups $(n_{+ + \bar{D}})$ .
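
The untransformed Wald statistic and its covariance term can be sketched as below, using the Table 1 margins (570 Test 1 positives of which 502 are diseased, 620 Test 2 positives of which 554 are diseased, and 473 and 22 concordant positives). The function and argument names are illustrative, not the paper's.

```python
def wald_ppv(n_ppD, n_ppN, n1, n2, ppv1, ppv2):
    """Multinomial-based Wald statistic for H0: ppv1 = ppv2 (untransformed).

    n_ppD, n_ppN: concordant-positive counts in diseased / disease-free patients
    n1, n2:       numbers of Test 1 positives and Test 2 positives
    """
    c_ppv = (n_ppD * (1 - ppv1) * (1 - ppv2) + n_ppN * ppv1 * ppv2) / (n1 + n2)
    var = (ppv1 * (1 - ppv1) / n1 + ppv2 * (1 - ppv2) / n2
           - 2 * c_ppv * (1 / n1 + 1 / n2))
    return (ppv1 - ppv2) ** 2 / var

# Paired data from Table 1.
t_paired = wald_ppv(473, 22, 570, 620, 502 / 570, 554 / 620)
# Same margins with no concordant positives: the covariance term vanishes and
# the usual two-independent-samples Wald statistic is recovered.
t_indep = wald_ppv(0, 0, 570, 620, 502 / 570, 554 / 620)
print(t_paired, t_indep)
```

Because the covariance term is non-negative, ignoring the pairing inflates the variance and shrinks the statistic for these data.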

Considering f(x) = log x, we have

T_{\log PPV}^{M} = \frac{{(\log {\hat{PPV}}_{1} - \log {\hat{PPV}}_{2})}^{2} \times {\hat{PPV}}_{1} {\hat{PPV}}_{2}}{\frac{{\hat{PPV}}_{2} (1 - {\hat{PPV}}_{1})}{n_{+ • •}} + \frac{{\hat{PPV}}_{1} (1 - {\hat{PPV}}_{2})}{n_{• + •}} - 2 C^{PPV} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})} .

In addition, with f(x) = logit x = log{x/(1 – x)} we have

T_{logit PPV}^{M} = \frac{{(logit {\hat{PPV}}_{1} - logit {\hat{PPV}}_{2})}^{2} \times {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1}) {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})}{\frac{{\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})}{n_{• + •}} + \frac{{\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})}{n_{+ • •}} - 2 C^{PPV} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})} .
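
One way to check the rearranged log and logit formulas is to compare them with a direct delta-method computation, var{f(p1) - f(p2)} ≈ f'(p1)² var(p1) + f'(p2)² var(p2) - 2 f'(p1) f'(p2) cov(p1, p2). The sketch below does this for the Table 1 data; all function names are illustrative.

```python
import math

def cov_ppv(n_ppD, n_ppN, n1, n2, p1, p2):
    """Estimated covariance of the two PPV estimates: C^PPV * (1/n1 + 1/n2)."""
    c = (n_ppD * (1 - p1) * (1 - p2) + n_ppN * p1 * p2) / (n1 + n2)
    return c * (1 / n1 + 1 / n2)

def wald_delta(f, fprime, n_ppD, n_ppN, n1, n2, p1, p2):
    """Generic delta-method Wald statistic for H0: f(ppv1) = f(ppv2)."""
    v1, v2 = p1 * (1 - p1) / n1, p2 * (1 - p2) / n2
    cov = cov_ppv(n_ppD, n_ppN, n1, n2, p1, p2)
    var = (fprime(p1) ** 2 * v1 + fprime(p2) ** 2 * v2
           - 2 * fprime(p1) * fprime(p2) * cov)
    return (f(p1) - f(p2)) ** 2 / var

def logit(p):
    return math.log(p / (1 - p))

def wald_log(n_ppD, n_ppN, n1, n2, p1, p2):
    """Closed form for f(x) = log x, as rearranged in the text."""
    cov = cov_ppv(n_ppD, n_ppN, n1, n2, p1, p2)
    num = (math.log(p1) - math.log(p2)) ** 2 * p1 * p2
    return num / (p2 * (1 - p1) / n1 + p1 * (1 - p2) / n2 - 2 * cov)

def wald_logit(n_ppD, n_ppN, n1, n2, p1, p2):
    """Closed form for f(x) = logit x; note the swapped denominators n2, n1."""
    cov = cov_ppv(n_ppD, n_ppN, n1, n2, p1, p2)
    num = (logit(p1) - logit(p2)) ** 2 * p1 * (1 - p1) * p2 * (1 - p2)
    return num / (p1 * (1 - p1) / n2 + p2 * (1 - p2) / n1 - 2 * cov)

args = (473, 22, 570, 620, 502 / 570, 554 / 620)   # Table 1 margins
t_log, t_logit = wald_log(*args), wald_logit(*args)
print(t_log, t_logit)
```

The closed forms are simply the delta-method statistics with numerator and denominator multiplied by a common factor, which is why the sample sizes appear swapped in the logit version.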

Similarly, utilizing the estimated variance-covariance matrix of $f ({\hat{NPV}}_{1})$ and $f ({\hat{NPV}}_{2})$ (see Appendix A), we suggest the following formulas for the Wald statistics testing equality of two negative predictive values:

\begin{matrix} T_{NPV}^{M} & = \frac{{({\hat{NPV}}_{1} - {\hat{NPV}}_{2})}^{2}}{\frac{{\hat{NPV}}_{1} (1 - {\hat{NPV}}_{1})}{n_{- • •}} + \frac{{\hat{NPV}}_{2} (1 - {\hat{NPV}}_{2})}{n_{• - •}} - 2 C^{NPV} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}})} \\ T_{\log NPV}^{M} & = \frac{{(\log {\hat{NPV}}_{1} - \log {\hat{NPV}}_{2})}^{2} \times {\hat{NPV}}_{1} {\hat{NPV}}_{2}}{\frac{{\hat{NPV}}_{2} (1 - {\hat{NPV}}_{1})}{n_{- • •}} + \frac{{\hat{NPV}}_{1} (1 - {\hat{NPV}}_{2})}{n_{• - •}} - 2 C^{NPV} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}})} \\ T_{logit NPV}^{M} & = \frac{{(logit {\hat{NPV}}_{1} - logit {\hat{NPV}}_{2})}^{2} \times {\hat{NPV}}_{1} (1 - {\hat{NPV}}_{1}) {\hat{NPV}}_{2} (1 - {\hat{NPV}}_{2})}{\frac{{\hat{NPV}}_{1} (1 - {\hat{NPV}}_{1})}{n_{• - •}} + \frac{{\hat{NPV}}_{2} (1 - {\hat{NPV}}_{2})}{n_{- • •}} - 2 C^{NPV} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}})} \end{matrix}

with

C^{NPV} = {(n_{- • •} + n_{• - •})}^{- 1} \{n_{- - D} {\hat{NPV}}_{1} {\hat{NPV}}_{2} + n_{- - \bar{D}} (1 - {\hat{NPV}}_{1}) (1 - {\hat{NPV}}_{2})\} .

The proposed formulas can be seen as directly expanding the commonly used statistics for two independent samples and reduce to such statistics when there are no concordant results (i.e. $n_{+ + D} = n_{+ + \bar{D}} = n_{- - D} = n_{- - \bar{D}} = 0$ ), as is the case in the two independent samples situation. We will show below that the logit transformation based test statistic turns out to be the same as the empirical Wald test statistic from the GEE approach proposed in [7].

3 Proposed new formulations of the GEE based statistics

This section deals with the GEE based test statistics resulting from consideration of a logistic model. Section 3.1 derives simple formulation of the empirical Wald statistic and shows that it is equivalent to the multinomial based Wald statistic with the logit transformed predictive values. Section 3.2 derives two algebraically different but mathematically equivalent formulations of the generalized score statistic. The first formulation shows that the generalized score statistic can be quite similar to the multinomial based Wald statistic and the second formulation explains why in the independent samples situation the generalized score statistic does not reduce to the commonly used score statistic.

3.1 New formulation of the empirical Wald statistic

The GEE approach with disease status as the dependent variable was considered in [7]. For implementation of this approach, data in Table 2 can be re-expressed, with i denoting patient (cluster), j denoting test number within a cluster (j = 1, 2). Disease status is denoted by D_ij (1=disease, 0=no disease), covariate Z_ij indicates type of diagnostic test (1=Test 1, 0=Test 2), and T_ij denotes test result (1=positive, 0=negative). To test equality of positive predictive values, only data with at least one positive test result (i.e. T_i1 = 1 or T_i2 = 1) need to be used and the model

logit P (D_{ij} = 1 ∣ Z_{ij}, T_{ij} = 1) = α_{PPV} + β_{PPV} Z_{ij}

(2)

allows test of the hypothesis H₀ : β_ppv = 0 which is equivalent to H₀ : ppv₁ = ppv₂. To test equality of negative predictive values (H₀ : npv₁ = npv₂) the model logit P (D_ij = 1∣Z_ij, T_ij = 0) = α_npv + β_npvZ_ij is considered for data with at least one negative test result, and hypothesis H₀ : β_npv = 0 is tested.

The GEE empirical Wald statistic for testing H₀ : β_ppv = 0 in model (2) is $T_{PPV}^{GW} = {\hat{β}}_{PPV}^{2} ∕ {var}_{e} ({\hat{β}}_{PPV})$ , where the denominator is the second (last) diagonal element of the empirical variance-covariance matrix V_e = V_mI_eV_m. The matrix V_m is the model based variance-covariance matrix and matrix I_e is the empirical information matrix. Use of the identity working correlation matrix in the context of diagnostic tests is advocated in [7, 9] and thus, using notations from the above section, the information matrix can be expressed as

I = \sum_{i} [\begin{matrix} t_{i 1} & t_{i 2} \\ z_{i 1} t_{i 1} & z_{i 2} t_{i 2} \end{matrix}] cov (D_{i}) {[\begin{matrix} t_{i 1} & t_{i 2} \\ z_{i 1} t_{i 1} & z_{i 2} t_{i 2} \end{matrix}]}^{T} = \sum_{i} [\begin{matrix} t_{i 1} & t_{i 2} \\ t_{i 1} & 0 \end{matrix}] cov (D_{i}) [\begin{matrix} t_{i 1} & t_{i 1} \\ t_{i 2} & 0 \end{matrix}]

(3)

where cov(D_i) is a variance-covariance matrix of D_i = [D_i1, D_i2]^T. The model based variance-covariance matrix $V_{m} = I_{m}^{- 1}$ is derived in Appendix B and the empirical information matrix I_e results from (3) by substituting for cov(D_i) the matrix

{cov}_{e} (D_{i}) = [\begin{matrix} d_{i 1} - {\hat{d}}_{i 1} \\ d_{i 2} - {\hat{d}}_{i 2} \end{matrix}] {[\begin{matrix} d_{i 1} - {\hat{d}}_{i 1} \\ d_{i 2} - {\hat{d}}_{i 2} \end{matrix}]}^{T} = [\begin{matrix} d_{i 1} - {\hat{PPV}}_{1} \\ d_{i 2} - {\hat{PPV}}_{2} \end{matrix}] {[\begin{matrix} d_{i 1} - {\hat{PPV}}_{1} \\ d_{i 2} - {\hat{PPV}}_{2} \end{matrix}]}^{T} .

The empirical variance-covariance matrix V_e = V_mI_eV_m is derived in Appendix C and the novel algebraic formulation of the GEE empirical Wald test statistic for testing H₀ : β_ppv = 0 in model (2) is as follows:

T_{PPV}^{GW} = \frac{{\hat{β}}_{PPV}^{2} \times {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1}) {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})}{\frac{{\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})}{n_{• + •}} + \frac{{\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})}{n_{+ • •}} - 2 C^{PPV} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})} .

Similarly, the GEE empirical Wald statistic to test equality of two negative predictive values (H₀ : β_npv = 0) is

T_{NPV}^{GW} = \frac{{\hat{β}}_{NPV}^{2} \times {\hat{NPV}}_{1} (1 - {\hat{NPV}}_{1}) {\hat{NPV}}_{2} (1 - {\hat{NPV}}_{2})}{\frac{{\hat{NPV}}_{1} (1 - {\hat{NPV}}_{1})}{n_{• - •}} + \frac{{\hat{NPV}}_{2} (1 - {\hat{NPV}}_{2})}{n_{- • •}} - 2 C^{NPV} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}})} .

Above formulations show that the GEE empirical Wald statistics are the same as the corresponding multinomial based Wald statistics utilizing the logit transformation (derived in Section 2), because ${\hat{β}}_{NPV} = logit {\hat{NPV}}_{2} - logit {\hat{NPV}}_{1}$ and ${\hat{β}}_{PPV} = logit {\hat{PPV}}_{1} - logit {\hat{PPV}}_{2}$ .
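
The equivalence can also be checked numerically by building the sandwich matrix V_m I_e V_m patient by patient and comparing with the closed form. The sketch below does this for the Table 1 patients with at least one positive test. The explicit form of the model-based information used here is our own working under the identity working correlation (Appendix B is not reproduced in this section), so treat it as an assumption of the sketch.

```python
import math

# (t1, t2, d, count) for Table 1 patients with at least one positive test.
cells = [(1, 1, 1, 473), (1, 0, 1, 29), (0, 1, 1, 81),
         (1, 1, 0, 22), (1, 0, 0, 46), (0, 1, 0, 44)]

n1 = sum(c for t1, t2, d, c in cells if t1)            # Test 1 positives
n2 = sum(c for t1, t2, d, c in cells if t2)            # Test 2 positives
p1 = sum(c for t1, t2, d, c in cells if t1 and d) / n1
p2 = sum(c for t1, t2, d, c in cells if t2 and d) / n2
logit = lambda p: math.log(p / (1 - p))
beta = logit(p1) - logit(p2)                           # estimate of beta_PPV

# Model-based information (assumed form) and its inverse V_m.
a, b = n1 * p1 * (1 - p1), n2 * p2 * (1 - p2)
Im = [[a + b, a], [a, a]]
det = Im[0][0] * Im[1][1] - Im[0][1] * Im[1][0]
Vm = [[Im[1][1] / det, -Im[0][1] / det], [-Im[1][0] / det, Im[0][0] / det]]

# Empirical information: sum of outer products of per-patient contributions
# [t1*(d - p1) + t2*(d - p2), t1*(d - p1)], as in expression (3).
Ie = [[0.0, 0.0], [0.0, 0.0]]
for t1, t2, d, c in cells:
    u = (t1 * (d - p1) + t2 * (d - p2), t1 * (d - p1))
    for r in range(2):
        for s in range(2):
            Ie[r][s] += c * u[r] * u[s]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][s] for k in range(2)) for s in range(2)]
            for r in range(2)]

Ve = matmul(matmul(Vm, Ie), Vm)
t_sandwich = beta ** 2 / Ve[1][1]

# Closed-form statistic from the text.
c_ppv = (473 * (1 - p1) * (1 - p2) + 22 * p1 * p2) / (n1 + n2)
den = p1 * (1 - p1) / n2 + p2 * (1 - p2) / n1 - 2 * c_ppv * (1 / n1 + 1 / n2)
t_closed = beta ** 2 * p1 * (1 - p1) * p2 * (1 - p2) / den
print(t_sandwich, t_closed)
```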

3.2 Two new formulations of the generalized score statistic

Utilizing model (2) the generalized score statistic formula for testing equality of two positive predictive values is presented in [7] and it is

T_{PPV}^{GS} = \frac{{\{\sum_{i = 1}^{N_{P}} \sum_{j = 1}^{m_{i}} D_{ij} (Z_{ij} - \bar{Z})\}}^{2}}{\sum_{i = 1}^{N_{P}} {\{\sum_{j = 1}^{m_{i}} (D_{ij} - \bar{D}) (Z_{ij} - \bar{Z})\}}^{2}}

where $\bar{D} = (\sum_{i = 1}^{N_{P}} \sum_{j = 1}^{m_{i}} D_{ij}) ∕ \sum_{i = 1}^{N_{P}} m_{i}$ , $\bar{Z} = (\sum_{i = 1}^{N_{P}} \sum_{j = 1}^{m_{i}} Z_{ij}) ∕ \sum_{i = 1}^{N_{P}} m_{i}$ , N_P is the number of patients with at least one positive test outcome, and m_i is the number of positive test results for the i-th patient.

Below, we derive two interesting (mathematically equivalent) formulations of the above statistic, with the first formulation facilitating its comparison with the multinomial based Wald statistic and the second one for direct comparison with the commonly used two independent samples score statistic. In addition, the new formulations provide motivation for the new weighted generalized score statistic we introduce in Section 4.

The general form of the generalized score statistic is $T^{GS} = {({LV}_{m}^{H_{0}} S^{H_{0}})}^{T} {({LV}_{e}^{H_{0}} L^{T})}^{- 1} {LV}_{m}^{H_{0}} S^{H_{0}}$ , where L is the contrast matrix with r rows (number of considered contrasts) and p columns (number of parameters), S is the score vector, V_m is the model based variance-covariance matrix, and V_e = V_mI_eV_m is an empirical variance-covariance matrix with I_e denoting the empirical information matrix. Quantities S, V_m, and I_e are computed under H₀. The generalized score statistic is distributed under H₀ as $χ_{r}^{2}$ .

For positive predictive values consider model (2) and because, as indicated earlier, the identity working matrix is required for application of GEE in the context of this paper [7, 9], the score vector under H₀ : β_ppv = 0 is (see Appendix D for details)

S^{H_{0}} = \sum_{i} [\begin{matrix} t_{i 1} & t_{i 2} \\ z_{i 1} t_{i 1} & z_{i 2} t_{i 2} \end{matrix}] [\begin{matrix} d_{i 1} - {\hat{d}}_{i 1}^{H_{0}} \\ d_{i 2} - {\hat{d}}_{i 2}^{H_{0}} \end{matrix}] = [\begin{matrix} 0 \\ ({\hat{PPV}}_{1} - {\hat{PPV}}_{2}) ∕ (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}}) \end{matrix}],

where ${\hat{d}}_{i 1}^{H_{0}} = {\hat{d}}_{i 2}^{H_{0}} = {\hat{PPV}}_{p} = (n_{+ • D} + n_{• + D}) ∕ (n_{+ • •} + n_{• + •})$ is the pooled positive predictive value. Alternatively ${\hat{PPV}}_{p} = w_{1} {\hat{PPV}}_{1} + w_{2} {\hat{PPV}}_{2}$ with w₁ = n_+••/(n_+••+n_•+•) and w₂ = 1 – w₁. The model based variance-covariance matrix $V_{m}^{H_{0}}$ has the same formulation as matrix V_m derived in Appendix B but with ${\hat{PPV}}_{p}$ substituted for ${\hat{PPV}}_{1}$ and ${\hat{PPV}}_{2}$ . To test H₀ : β_ppv = 0, the contrast matrix L = [0, 1] is used, and the generalized score statistic can be expressed as

T_{PPV}^{GS} = \frac{{({LV}_{m}^{H_{0}} S^{H_{0}})}^{T} {LV}_{m}^{H_{0}} S^{H_{0}}}{{LV}_{m}^{H_{0}} I_{e}^{H_{0}} V_{m}^{H_{0}} L^{T}} = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{[\begin{matrix} \frac{- 1}{n_{• + •}} & \frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}} \end{matrix}] I_{e}^{H_{0}} {[\begin{matrix} \frac{- 1}{n_{• + •}} & \frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}} \end{matrix}]}^{T}},

(4)

with the empirical information matrix $I_{e}^{H_{0}}$ obtained by inserting the matrix

{cov}_{e}^{H_{0}} (D_{i}) = [\begin{matrix} d_{i 1} - {\hat{d}}_{i 1}^{H_{0}} \\ d_{i 2} - {\hat{d}}_{i 2}^{H_{0}} \end{matrix}] {[\begin{matrix} d_{i 1} - {\hat{d}}_{i 1}^{H_{0}} \\ d_{i 2} - {\hat{d}}_{i 2}^{H_{0}} \end{matrix}]}^{T} = [\begin{matrix} d_{i 1} - {\hat{PPV}}_{p} \\ d_{i 2} - {\hat{PPV}}_{p} \end{matrix}] {[\begin{matrix} d_{i 1} - {\hat{PPV}}_{p} \\ d_{i 2} - {\hat{PPV}}_{p} \end{matrix}]}^{T}

in place of cov(D_i) in expression (3). After brief algebra, the first new formulation of the generalized score statistic for testing equality of two positive predictive values is

T_{PPV}^{GS} = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{\frac{{\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})}{n_{+ • •}} + \frac{{\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})}{n_{• + •}} + R_{PPV} - 2 C_{p}^{PPV} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})},

(5)

with

\begin{matrix} R_{PPV} & = {({\hat{PPV}}_{1} - {\hat{PPV}}_{p})}^{2} ∕ n_{+ • •} + {({\hat{PPV}}_{2} - {\hat{PPV}}_{p})}^{2} ∕ n_{• + •} \\ C_{p}^{PPV} & = {(n_{+ • •} + n_{• + •})}^{- 1} \{n_{+ + D} {(1 - {\hat{PPV}}_{p})}^{2} + n_{+ + \bar{D}} {\hat{PPV}}_{p}^{2}\} . \end{matrix}

This formulation shows that the generalized score statistic can be quite similar to the multinomial based Wald statistic $T_{PPV}^{M}$ if R_ppv is small and $C_{p}^{PPV}$ is close to C^ppv.

Further algebra leads to the second new formulation of the generalized score statistic

T_{PPV}^{GS} = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{\{{\hat{PPV}}_{p} (1 - {\hat{PPV}}_{p}) + W_{PPV} - 2 C_{p}^{PPV}\} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})},

(6)

where $W_{PPV} = (2 {\hat{PPV}}_{p} - {\hat{PPV}}_{1} - {\hat{PPV}}_{2}) (2 {\hat{PPV}}_{p} - 1)$ . Since W_ppv is not always equal to zero, the above formulation explicitly shows that the generalized score statistic does not always reduce to the score statistic when applied to independent samples (see Section 4 for details).
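
A quick numerical check that expressions (4), (5) and (6) agree is sketched below for the Table 1 data. The patient-level form of the denominator in (4) uses the fact that the row vector in (4) times the design matrix in (3) reduces to [t_i1/n_+••, -t_i2/n_•+•]; that simplification is our own algebra and should be read as an assumption of the sketch.

```python
# (t1, t2, d, count): Table 1 patients with at least one positive test.
cells = [(1, 1, 1, 473), (1, 0, 1, 29), (0, 1, 1, 81),
         (1, 1, 0, 22), (1, 0, 0, 46), (0, 1, 0, 44)]

n1 = sum(c for t1, t2, d, c in cells if t1)
n2 = sum(c for t1, t2, d, c in cells if t2)
p1 = sum(c for t1, t2, d, c in cells if t1 and d) / n1
p2 = sum(c for t1, t2, d, c in cells if t2 and d) / n2
pp = (n1 * p1 + n2 * p2) / (n1 + n2)          # pooled PPV under H0
s = 1 / n1 + 1 / n2
n_ppD = sum(c for t1, t2, d, c in cells if t1 and t2 and d)
n_ppN = sum(c for t1, t2, d, c in cells if t1 and t2 and not d)

c_p = (n_ppD * (1 - pp) ** 2 + n_ppN * pp ** 2) / (n1 + n2)
r = (p1 - pp) ** 2 / n1 + (p2 - pp) ** 2 / n2
w = (2 * pp - p1 - p2) * (2 * pp - 1)
diff2 = (p1 - p2) ** 2

# Expression (4): quadratic form in the empirical information, patient by patient.
den4 = sum(c * (t1 * (d - pp) / n1 - t2 * (d - pp) / n2) ** 2
           for t1, t2, d, c in cells)
gs4 = diff2 / den4
# Expression (5): Wald-like form with the extra term R_PPV.
gs5 = diff2 / (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2 + r - 2 * c_p * s)
# Expression (6): score-like form with the extra term W_PPV.
gs6 = diff2 / ((pp * (1 - pp) + w - 2 * c_p) * s)
print(gs4, gs5, gs6)
```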

Similarly, two new formulations of the generalized score statistic for testing equality of two negative predictive values are

\begin{matrix} T_{NPV}^{GS} & = \frac{{({\hat{NPV}}_{1} - {\hat{NPV}}_{2})}^{2}}{\frac{{\hat{NPV}}_{1} (1 - {\hat{NPV}}_{1})}{n_{- • •}} + \frac{{\hat{NPV}}_{2} (1 - {\hat{NPV}}_{2})}{n_{• - •}} + R_{NPV} - 2 C_{p}^{NPV} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}})} \\ T_{NPV}^{GS} & = \frac{{({\hat{NPV}}_{1} - {\hat{NPV}}_{2})}^{2}}{\{{\hat{NPV}}_{p} (1 - {\hat{NPV}}_{p}) + W_{NPV} - 2 C_{p}^{NPV}\} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}})}, \end{matrix}

where

\begin{matrix} {\hat{NPV}}_{p} & = (n_{- • \bar{D}} + n_{• - \bar{D}}) ∕ (n_{- • •} + n_{• - •}) \\ R_{NPV} & = {({\hat{NPV}}_{1} - {\hat{NPV}}_{p})}^{2} ∕ n_{- • •} + {({\hat{NPV}}_{2} - {\hat{NPV}}_{p})}^{2} ∕ n_{• - •} \\ C_{p}^{NPV} & = {(n_{- • •} + n_{• - •})}^{- 1} \{n_{- - D} {\hat{NPV}}_{p}^{2} + n_{- - \bar{D}} {(1 - {\hat{NPV}}_{p})}^{2}\} \\ W_{NPV} & = (2 {\hat{NPV}}_{p} - {\hat{NPV}}_{1} - {\hat{NPV}}_{2}) (2 {\hat{NPV}}_{p} - 1) . \end{matrix}

4 Development of the weighted generalized score statistic

This section motivates and presents the new statistic. Section 4.1 utilizes insights provided by the two formulations of the generalized score statistic to motivate a need for weights when computing the empirical covariance matrix. Section 4.2 presents the proposed weights and the new weighted generalized score statistics.

4.1 Motivation

The formulation (6) of the generalized score statistic shows that in the two independent samples situation (here n_+•• denotes size of sample with Test 1 and n_•+• size of an independent sample with Test 2) the generalized score statistic $T_{PPV}^{GS}$ does not always reduce to the commonly used score statistic

T_{PPV}^{S} = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{{\hat{PPV}}_{p} (1 - {\hat{PPV}}_{p}) (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})} .

(7)

Even though with independent samples we have $C_{p}^{PPV} = 0$ because there are no concordant positive test results, the term W_ppv equals zero only if ${\hat{PPV}}_{1} = {\hat{PPV}}_{2}$ or ${\hat{PPV}}_{p} = 0.5$ , or in a balanced situation, i.e. when n_+•• = n_•+•. Hence, in general, the generalized score statistic does not reduce to the score statistic when dealing with two independent samples. In addition, formulation (5) shows that the generalized score statistic can be quite similar to the multinomial Wald statistic $T_{PPV}^{M}$ if R_ppv is small and $C_{p}^{PPV}$ is close to C^ppv. In fact, in the two independent samples situation (here $C_{p}^{PPV} = C^{PPV} = 0$ ) with unequal sample sizes (n_+•• ≠ n_•+•), the generalized score statistic closely tracks the multinomial based Wald statistic $T_{PPV}^{M}$ rather than the score statistic. An example of this relationship is presented in Figure 1, which shows how these statistics change as the proportion of Test 1 positives among Test 1 or Test 2 positives changes. In a balanced situation this proportion is w₁ ≡ n_+••/(n_+•• + n_•+•) = 0.5 and the generalized score statistic and the score statistic have the same value. This is because with w₁ = 0.5 the term W_ppv in expression (6) equals zero. However, when the two independent samples have different sizes (w₁ ≠ 0.5) then, as w₁ changes, the generalized score statistic surprisingly tracks the multinomial based $T_{PPV}^{M}$ rather than the score statistic (7).

Figure 1. Behavior of the generalized score statistic (solid curve), the score statistic (dashed curve) and the multinomial based Wald statistic $T_{PPV}^{M}$ (dotted curve) in the two independent samples situation as a function of proportion of Test 1 positives among patients with positive result on either of the two diagnostic tests (ppv₁ = 0.80, ppv₂ = 0.85, total n = 900).
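
The qualitative pattern described for the figure can be reproduced with a few lines of code. The sketch below plugs the stated values (ppv₁ = 0.80, ppv₂ = 0.85, n = 900) in as if they were the estimates and uses non-integer sample sizes where needed; both are simplifying assumptions, so the numbers are illustrative rather than a reproduction of the published curves.

```python
def three_stats(p1, p2, n1, n2):
    """Score, generalized score and Wald statistics for two independent samples."""
    pp = (n1 * p1 + n2 * p2) / (n1 + n2)       # pooled estimate
    s = 1 / n1 + 1 / n2
    diff2 = (p1 - p2) ** 2
    w = (2 * pp - p1 - p2) * (2 * pp - 1)      # W_PPV from expression (6)
    score = diff2 / (pp * (1 - pp) * s)                       # expression (7)
    gs = diff2 / ((pp * (1 - pp) + w) * s)                    # (6) with C_p = 0
    wald = diff2 / (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # T^M with C = 0
    return score, gs, wald

n = 900
for w1 in (0.2, 0.35, 0.5, 0.65, 0.8):
    n1 = w1 * n
    score, gs, wald = three_stats(0.80, 0.85, n1, n - n1)
    print(f"w1={w1:.2f}  score={score:.3f}  GS={gs:.3f}  Wald={wald:.3f}")
```

At w₁ = 0.5 the score and generalized score values coincide, while away from 0.5 the generalized score value stays close to the Wald value.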

To alleviate this behavior of the generalized score statistic we develop a weighted generalized score statistic which always reduces to the score statistic when applied to two independent samples. Motivation for the form of the proposed statistic arises from comparison of the following two formulas, with the first one a re-expression of the score statistic (7) and the second one a re-expression of the generalized score statistics (6) in the two independent samples situation:

\begin{matrix} T_{PPV}^{S} & = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{[w_{1} \{\frac{1}{n_{+ • •}} \sum_{i} t_{i 1} {(d_{i 1} - {\hat{PPV}}_{p})}^{2}\} + w_{2} \{\frac{1}{n_{• + •}} \sum_{i} t_{i 2} {(d_{i 2} - {\hat{PPV}}_{p})}^{2}\}] (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})} \\ T_{PPV}^{GS} & = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{[w_{2} \{\frac{1}{n_{+ • •}} \sum_{i} t_{i 1} {(d_{i 1} - {\hat{PPV}}_{p})}^{2}\} + w_{1} \{\frac{1}{n_{• + •}} \sum_{i} t_{i 2} {(d_{i 2} - {\hat{PPV}}_{p})}^{2}\}] (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})} . \end{matrix}

Recall that w₁ = n_+••/(n_+•• + n_•+•) and w₂ = 1 – w₁. The only difference between the above two formulas is the opposite weighting of the averages in the curly brackets. This observation leads to consideration of weights (defined in Section 4.2) when computing the empirical covariance matrix utilized in derivation of the generalized score statistic. With these weights the generalized score statistic will be equal to the score statistic in the independent samples situation. An alternative intuition behind the suggestion of weights is that if w₁ > 0.5 then the pooled estimate ${\hat{PPV}}_{p}$ is influenced more by the Test 1 data; thus the distance $d_{i 1} - {\hat{PPV}}_{p}$ used in the empirical covariance matrix is too small and should be up-weighted and distance $d_{i 2} - {\hat{PPV}}_{p}$ down-weighted. Similarly, if w₁ < 0.5 then ${\hat{PPV}}_{p}$ is influenced more by the Test 2 data and the distance $d_{i 1} - {\hat{PPV}}_{p}$ should be down-weighted with distance $d_{i 2} - {\hat{PPV}}_{p}$ up-weighted.
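
The opposite weighting can be made concrete with a small sketch for two independent samples. The sample sizes and proportions (300 and 600 positives with estimated predictive values 0.80 and 0.85) are hypothetical, and the function names are illustrative.

```python
def avg_sq_dev(p, pp):
    """Average of (d - pp)^2 over a sample in which a proportion p has d = 1."""
    return p * (1 - pp) ** 2 + (1 - p) * pp ** 2

def reexpressed(p1, p2, n1, n2):
    """Score and generalized score statistics written with the w1, w2 weights."""
    w1 = n1 / (n1 + n2)
    w2 = 1 - w1
    pp = w1 * p1 + w2 * p2
    s = 1 / n1 + 1 / n2
    a1, a2 = avg_sq_dev(p1, pp), avg_sq_dev(p2, pp)
    diff2 = (p1 - p2) ** 2
    score = diff2 / ((w1 * a1 + w2 * a2) * s)   # averages weighted by own sample share
    gs = diff2 / ((w2 * a1 + w1 * a2) * s)      # same averages, weights swapped
    return score, gs, pp, s

score, gs, pp, s = reexpressed(0.80, 0.85, 300, 600)
print(score, gs)
```

With own-share weights the bracket collapses to the pooled binomial variance term, which is exactly the denominator of (7); swapping the weights is what introduces W_ppv.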

4.2 The proposed weighted generalized score statistic

We propose weights $v_{j}^{PPV} = w_{j} {(w_{1} w_{2})}^{- 1 ∕ 2} (j = 1, 2)$ , i.e. $v_{1}^{PPV} = {(n_{+ • •} ∕ n_{• + •})}^{1 ∕ 2}$ and $v_{2}^{PPV} = {(n_{• + •} ∕ n_{+ • •})}^{1 ∕ 2}$ , and the following weighted empirical covariance matrix:

{cov}_{we}^{H_{0}} (D_{i}) = [\begin{matrix} v_{1}^{PPV} (d_{i 1} - {\hat{PPV}}_{p}) \\ v_{2}^{PPV} (d_{i 2} - {\hat{PPV}}_{p}) \end{matrix}] {[\begin{matrix} v_{1}^{PPV} (d_{i 1} - {\hat{PPV}}_{p}) \\ v_{2}^{PPV} (d_{i 2} - {\hat{PPV}}_{p}) \end{matrix}]}^{T} .

The weighted empirical information matrix $I_{we}^{H_{0}}$ is obtained by substituting the above matrix for cov(D_i) in expression (3). Subsequently, expression (4) with $I_{we}^{H_{0}}$ inserted in place of $I_{e}^{H_{0}}$ leads to the proposed weighted generalized score (wgs) statistic:

T_{PPV}^{WGS} = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{\{{\hat{PPV}}_{p} (1 - {\hat{PPV}}_{p}) - 2 C_{p}^{PPV}\} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}})} .

(8)

The above statistic is the same as the generalized score statistic in a balanced design (i.e. when w₁ = 0.5), because then the term W_ppv in formulation (6) of the generalized score statistic equals zero. However, a balanced situation is not likely in a paired design and in a more common unbalanced situation these two statistics are different. In the independent samples case ( $n_{+ + D} = n_{+ + \bar{D}} = 0$ , i.e. $C_{p}^{PPV} = 0$ ), the proposed weighted generalized score statistic (8) reduces to the score statistic (7) even in an unbalanced design, in contrast to the generalized score statistic.
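
Both properties can be checked numerically. The sketch below computes the WGS statistic for positive predictive values in two ways, from the weighted per-patient residuals and from the closed form (8), and then confirms the reduction to the score statistic (7) on an unbalanced unpaired data set. The unpaired counts are hypothetical, and the patient-level form of the denominator is our own simplification of expression (4).

```python
import math

def summarize(cells):
    """cells: (t1, t2, d, count) tuples for patients with a positive test."""
    n1 = sum(c for t1, t2, d, c in cells if t1)
    n2 = sum(c for t1, t2, d, c in cells if t2)
    p1 = sum(c for t1, t2, d, c in cells if t1 and d) / n1
    p2 = sum(c for t1, t2, d, c in cells if t2 and d) / n2
    pp = (n1 * p1 + n2 * p2) / (n1 + n2)
    return n1, n2, p1, p2, pp

def wgs_weighted_sum(cells):
    """WGS via residuals weighted by v1 = sqrt(n1/n2), v2 = sqrt(n2/n1)."""
    n1, n2, p1, p2, pp = summarize(cells)
    v1, v2 = math.sqrt(n1 / n2), math.sqrt(n2 / n1)
    den = sum(c * (v1 * t1 * (d - pp) / n1 - v2 * t2 * (d - pp) / n2) ** 2
              for t1, t2, d, c in cells)
    return (p1 - p2) ** 2 / den

def wgs_closed_form(cells):
    """WGS via expression (8)."""
    n1, n2, p1, p2, pp = summarize(cells)
    n_ppD = sum(c for t1, t2, d, c in cells if t1 and t2 and d)
    n_ppN = sum(c for t1, t2, d, c in cells if t1 and t2 and not d)
    c_p = (n_ppD * (1 - pp) ** 2 + n_ppN * pp ** 2) / (n1 + n2)
    return (p1 - p2) ** 2 / ((pp * (1 - pp) - 2 * c_p) * (1 / n1 + 1 / n2))

def score_statistic(cells):
    """Two-independent-samples score statistic, expression (7)."""
    n1, n2, p1, p2, pp = summarize(cells)
    return (p1 - p2) ** 2 / (pp * (1 - pp) * (1 / n1 + 1 / n2))

paired = [(1, 1, 1, 473), (1, 0, 1, 29), (0, 1, 1, 81),
          (1, 1, 0, 22), (1, 0, 0, 46), (0, 1, 0, 44)]            # Table 1
unpaired = [(1, 0, 1, 240), (1, 0, 0, 60), (0, 1, 1, 510), (0, 1, 0, 90)]

print(wgs_weighted_sum(paired), wgs_closed_form(paired))
print(wgs_closed_form(unpaired), score_statistic(unpaired))
```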

Similarly, for testing equality of two negative predictive values the weights are $v_{1}^{NPV} = {(n_{- • •} ∕ n_{• - •})}^{1 ∕ 2}$ and $v_{2}^{NPV} = {(n_{• - •} ∕ n_{- • •})}^{1 ∕ 2}$ , and the following weighted generalized score statistic is suggested:

T_{NPV}^{WGS} = \frac{{({\hat{NPV}}_{1} - {\hat{NPV}}_{2})}^{2}}{\{{\hat{NPV}}_{p} (1 - {\hat{NPV}}_{p}) - 2 C_{p}^{NPV}\} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}})} .

(9)
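
For negative predictive values the roles of the disease groups are swapped in the covariance term, which is easy to get wrong in code. A minimal sketch of (9) with the Table 1 counts follows (301 Test 1 negatives of which 195 are disease free, 251 Test 2 negatives of which 197 are disease free, and 25 and 151 concordant negatives); argument names are illustrative.

```python
def wgs_npv(n_mmD, n_mmN, n1N, n2N, n1, n2):
    """Weighted generalized score statistic for H0: npv1 = npv2.

    n_mmD, n_mmN: concordant-negative counts in diseased / disease-free patients
    n1N, n2N:     disease-free counts among Test 1 / Test 2 negatives
    n1, n2:       numbers of Test 1 negatives and Test 2 negatives
    """
    npv1, npv2 = n1N / n1, n2N / n2
    npv_p = (n1N + n2N) / (n1 + n2)                       # pooled NPV
    # Diseased concordant negatives pair with npv_p^2, disease-free with (1 - npv_p)^2.
    c_p = (n_mmD * npv_p ** 2 + n_mmN * (1 - npv_p) ** 2) / (n1 + n2)
    return (npv1 - npv2) ** 2 / ((npv_p * (1 - npv_p) - 2 * c_p)
                                 * (1 / n1 + 1 / n2))

t_npv = wgs_npv(25, 151, 195, 197, 301, 251)
print(t_npv)
```

The result would be referred to the chi-square distribution with one degree of freedom.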

In the next Section we demonstrate with simulations that the weighted generalized score statistic has superior type I error behavior as compared to the generalized score statistic, as well as compared to the discussed multinomial based statistics.

5 Simulations

We performed simulations to evaluate size and power of test statistics. Each true scenario considers predictive values, prevalence of disease, and measure of association between the two test results. Rather than assume the same degree of association of diagnostic tests in each disease category, we consider a more general approach and allow differential association of two tests for diseased and not diseased groups, as parameterized with odds ratios OR_D and ${OR}_{\bar{D}}$ , respectively. This may be a more realistic scenario because, for example, in the CAD data (Table 1) the odds ratio in the diseased group is 473×25/(29×81) = 5.03 and it is substantially larger than the odds ratio 22 × 151/(46 × 44) = 1.64 in the group with no disease.
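
The two odds ratios quoted above follow directly from the Table 1 cells, as in this short sketch (the function name is illustrative):

```python
def odds_ratio(pos_pos, pos_neg, neg_pos, neg_neg):
    """Cross-product ratio of a 2x2 table of Test 1 by Test 2 results."""
    return pos_pos * neg_neg / (pos_neg * neg_pos)

or_d = odds_ratio(473, 29, 81, 25)      # patients with CAD
or_nd = odds_ratio(22, 46, 44, 151)     # patients without CAD
print(round(or_d, 2), round(or_nd, 2))  # 5.03 1.64
```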

Hence, seven true parameters define a simulation scenario: OR_D, ${OR}_{\bar{D}}$ , ppv₁, ppv₂, npv₁, npv₂, and disease prevalence θ. These parameters determine the true multinomial distribution for the paired design considered in this paper; the multinomial probabilities are shown in Table 3. To derive them, we first compute sensitivity se and specificity sp of each test from the specified predictive values and the disease prevalence. Quantities x and y in Table 3 are the margin-compatible solutions of the quadratic equations obtained by equating the cross-product ratio of the first two columns (diseased group) to OR_D and that of the last two columns (non-diseased group) to ${OR}_{\bar{D}}$ , respectively. The conditional independence situation occasionally assumed in the literature, in which the tests are independent conditional on disease status, is equivalent to assuming ${OR}_{D} = {OR}_{\bar{D}} = 1$ . In this situation, the quadratic equations reduce to linear equations with solutions x = se₁se₂ and y = sp₁sp₂, and the Table 3 probabilities are then simply those expected under no association between the tests within a disease status category.

Table 3.

Multinomial distribution probabilities for specified predictive values ppv₁, ppv₂, npv₁, and npv₂, disease prevalence θ, and odds ratios OR_D and ${OR}_{\bar{D}}$ . Quantities x and y are compatible solutions of quadratic equations resulting from equating cross-product of the first two columns to OR_D and the last two columns to ${OR}_{\bar{D}}$ , respectively.

	Disease (D)		No disease $(\bar{D})$
	Test 2 result		Test 2 result
Test 1 result	Positive	Negative	Positive	Negative
Positive	θ x	θ(se₁ – x)	(1 – θ)(1 – sp₁ – sp₂ + y)	(1 – θ)(sp₂ – y)
Negative	θ(se₂ – x)	θ(1 – se₁ – se₂ + x)	(1 – θ)(sp₁ – y)	(1 – θ)y
se₁ = ppv₁(npv₁ – 1 + θ)/{θ(ppv₁ + npv₁ – 1)},
sp₁ = npv₁(ppv₁ – θ)/{(1 – θ)(ppv₁ + npv₁ – 1)}
se₂ = ppv₂(npv₂ – 1 + θ)/{θ(ppv₂ + npv₂ – 1)},
sp₂ = npv₂(ppv₂ – θ)/{(1 – θ)(ppv₂ + npv₂ – 1)}
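The construction of Table 3 can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the simulations: the function names (`se_sp`, `joint_prob`, `table3_probabilities`) and the cell keys are hypothetical, and the closed-form root of the quadratic is our own rearrangement of the cross-product equation described above.

```python
import math


def se_sp(ppv, npv, theta):
    """Sensitivity and specificity implied by ppv, npv and prevalence theta (Table 3 footnote)."""
    denom = ppv + npv - 1.0
    se = ppv * (npv - 1.0 + theta) / (theta * denom)
    sp = npv * (ppv - theta) / ((1.0 - theta) * denom)
    return se, sp


def joint_prob(a, b, odds_ratio):
    """Joint probability z of a 2x2 table with margins a, b and the given odds ratio.

    Solves z (1 - a - b + z) = OR (a - z)(b - z); the root taken with the
    minus sign is the one compatible with the margins.
    """
    if odds_ratio == 1.0:
        return a * b  # the quadratic degenerates to a linear equation
    k = odds_ratio - 1.0
    s = 1.0 + (a + b) * k
    return (s - math.sqrt(s * s - 4.0 * odds_ratio * k * a * b)) / (2.0 * k)


def table3_probabilities(ppv1, ppv2, npv1, npv2, theta, or_d, or_nd):
    """Multinomial cell probabilities keyed by (Test 1, Test 2, status); 'N' = no disease."""
    se1, sp1 = se_sp(ppv1, npv1, theta)
    se2, sp2 = se_sp(ppv2, npv2, theta)
    x = joint_prob(se1, se2, or_d)    # P(both tests positive | diseased)
    y = joint_prob(sp1, sp2, or_nd)   # P(both tests negative | not diseased)
    return {
        ("+", "+", "D"): theta * x,
        ("+", "-", "D"): theta * (se1 - x),
        ("-", "+", "D"): theta * (se2 - x),
        ("-", "-", "D"): theta * (1.0 - se1 - se2 + x),
        ("+", "+", "N"): (1.0 - theta) * (1.0 - sp1 - sp2 + y),
        ("+", "-", "N"): (1.0 - theta) * (sp2 - y),
        ("-", "+", "N"): (1.0 - theta) * (sp1 - y),
        ("-", "-", "N"): (1.0 - theta) * y,
    }


# One of the null scenarios of Table 4: equal predictive values, theta = 0.3.
probs = table3_probabilities(0.75, 0.75, 0.80, 0.80, theta=0.3, or_d=5.0, or_nd=2.0)
```

The resulting probabilities sum to one, and the within-group cross-product ratios recover the specified odds ratios, which is a convenient sanity check before generating data.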


For each considered true scenario, the corresponding multinomial distribution described in Table 3 was derived, and data (as in Table 2) with total sample size n were generated repeatedly from this distribution. When a generated data set did not allow one of the considered statistics to be computed because of zero cells or predictive values equal to zero or one, we added a very small number to each cell of that data set. We generated data four million times because results are displayed to three decimal digits, and with this number of simulations the length of the 95% confidence interval for the proportion of rejected tests is less than 0.001. For each simulated data set, H₀ was rejected if a test statistic exceeded the 0.95 quantile of the $χ_{1}^{2}$ distribution.
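The simulation procedure can be sketched as follows. This is a small-scale illustration under stated assumptions rather than the code behind Tables 4 and 5: it uses only the weighted generalized score statistic for PPV (in the form used in Section 6), a hypothetical null scenario with two identical, conditionally independent tests, 2,000 replicates instead of four million, and an illustrative value `eps = 1e-6` for the "very small number" added to each cell. All function and variable names are hypothetical.

```python
import math
import random

CHI2_1_Q95 = 3.841458820694124   # 0.95 quantile of the chi-square distribution with 1 df
Z_975 = 1.959963984540054        # 0.975 quantile of the standard normal distribution
CELLS = ["++D", "+-D", "-+D", "--D", "++N", "+-N", "-+N", "--N"]  # Test 1, Test 2, status


def ci_length(p, n_sim):
    """Length of the normal-approximation 95% CI for a rejection proportion p."""
    return 2.0 * Z_975 * math.sqrt(p * (1.0 - p) / n_sim)


def wgs_ppv(n):
    """Weighted generalized score statistic for H0: PPV1 = PPV2 from cell counts n."""
    n1 = n["++D"] + n["+-D"] + n["++N"] + n["+-N"]   # Test 1 positive
    n2 = n["++D"] + n["-+D"] + n["++N"] + n["-+N"]   # Test 2 positive
    ppv1 = (n["++D"] + n["+-D"]) / n1
    ppv2 = (n["++D"] + n["-+D"]) / n2
    ppv_p = (2 * n["++D"] + n["+-D"] + n["-+D"]) / (n1 + n2)   # pooled estimate
    c_p = (n["++D"] * (1 - ppv_p) ** 2 + n["++N"] * ppv_p ** 2) / (n1 + n2)
    return (ppv1 - ppv2) ** 2 / ((ppv_p * (1 - ppv_p) - 2 * c_p) * (1 / n1 + 1 / n2))


def empirical_size(probs, n, n_sim, seed=0, eps=1e-6):
    """Proportion of simulated data sets in which H0 is rejected at the 5% level."""
    rng = random.Random(seed)
    weights = [probs[c] for c in CELLS]
    rejected = 0
    for _ in range(n_sim):
        counts = dict.fromkeys(CELLS, 0)
        for cell in rng.choices(CELLS, weights=weights, k=n):
            counts[cell] += 1
        try:
            t = wgs_ppv(counts)
        except ZeroDivisionError:
            # Statistic not computable: add a small number to every cell (assumed eps).
            t = wgs_ppv({c: v + eps for c, v in counts.items()})
        rejected += t > CHI2_1_Q95
    return rejected / n_sim


# Hypothetical null scenario: both tests have se = 0.8 and sp = 0.7 and are
# independent given disease status, so PPV1 = PPV2 holds by construction.
theta, se, sp = 0.5, 0.8, 0.7
null_probs = {
    "++D": theta * se * se, "+-D": theta * se * (1 - se),
    "-+D": theta * (1 - se) * se, "--D": theta * (1 - se) ** 2,
    "++N": (1 - theta) * (1 - sp) ** 2, "+-N": (1 - theta) * (1 - sp) * sp,
    "-+N": (1 - theta) * sp * (1 - sp), "--N": (1 - theta) * sp * sp,
}
size = empirical_size(null_probs, n=200, n_sim=2000)
worst_case_length = ci_length(0.5, 4_000_000)   # upper bound over all proportions p
```

Evaluating `ci_length` at p = 0.5 gives the widest possible interval, so a value below 0.001 there confirms the stated precision for any rejection proportion.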

Table 4 displays empirical size for the multinomial Wald statistics, the generalized score statistic, and the proposed weighted generalized score statistics. The results for the multinomial based Wald statistic using logit transformation and the GEE empirical Wald statistic are displayed in one column, because these statistics are the same as shown in Section 3.1. We considered ppv₁ = ppv₂ = 0.75, npv₁ = npv₂ = 0.80, and varied prevalence of disease θ and total sample size n. We specified OR_D = 5.0 and ${OR}_{\bar{D}} = 2.0$ , which is similar to the CAD data in Table 1. As seen in Table 4, the proposed weighted generalized score (wgs) statistics preserve the nominal 0.05 type I error better than the other statistics. The empirical size of T^m is similar to or slightly larger than that of the generalized score statistic, as anticipated from expression (5), but the empirical sizes of both statistics are moderately inflated, mainly with lower disease prevalence or smaller sample sizes. The empirical size falls below the nominal level for the logit (or equivalently GEE empirical Wald) and log transformation multinomial statistics, especially for smaller sample sizes.

Table 4.

Percent of 4,000,000 repeated simulations with H₀ rejected at 5 percent level. True predictive values are PPV₁ = PPV₂ = 0.75 and NPV₁ = NPV₂ = 0.80; OR_D = 5 and ${OR}_{\bar{D}} = 2$ are the odds ratios defining the degree of association between results of the two tests among diseased and not-diseased; n is the total number of patients with two tests; and θ is the prevalence of disease.

		H₀ : PPV₁ = PPV₂					H₀ : NPV₁ = NPV₂
θ	n	T ^m	$T_{\log}^{M}$	$T_{logit}^{M}$	T ^gs	T ^wgs	T ^m	$T_{\log}^{M}$	$T_{logit}^{M}$	T ^gs	T ^wgs
				T ^gw					T ^gw
0.3	50	9.3	3.6	0.9	7.0	4.2	4.8	3.8	3.7	4.8	4.2
	100	6.6	4.4	2.4	6.0	5.0	5.1	4.7	4.7	5.1	4.8
	200	5.7	4.6	4.2	5.5	5.0	5.1	4.8	4.8	5.1	4.9
	300	5.5	4.7	4.5	5.3	5.0	5.0	4.9	4.9	5.0	4.9
	400	5.4	4.8	4.7	5.3	5.0	5.0	4.9	4.9	5.0	4.9
	500	5.3	4.8	4.7	5.2	5.0	5.0	5.0	5.0	5.0	5.0
0.4	50	6.4	4.4	2.6	6.1	4.9	5.4	4.0	3.7	5.3	4.5
	100	5.7	4.6	4.2	5.6	5.0	5.3	4.6	4.6	5.2	4.9
	200	5.3	4.8	4.7	5.3	5.0	5.1	4.8	4.8	5.1	5.0
	300	5.2	4.9	4.8	5.2	5.0	5.1	4.9	4.9	5.1	4.9
	400	5.2	4.9	4.8	5.1	5.0	5.1	4.9	4.9	5.1	5.0
	500	5.1	4.9	4.9	5.1	5.0	5.0	4.9	4.9	5.0	5.0
0.5	50	5.9	4.5	3.8	5.8	4.9	5.6	3.8	2.9	5.4	4.5
	100	5.4	4.8	4.6	5.4	5.0	5.4	4.5	4.3	5.3	4.9
	200	5.2	4.9	4.8	5.2	5.0	5.2	4.8	4.7	5.2	5.0
	300	5.1	4.9	4.9	5.1	5.0	5.1	4.8	4.8	5.1	5.0
	400	5.1	4.9	4.9	5.1	5.0	5.1	4.9	4.9	5.1	5.0
	500	5.1	5.0	4.9	5.1	5.0	5.1	4.9	4.9	5.1	5.0


Table 5 summarizes empirical power when the true predictive values are ppv₁ = 0.75, ppv₂ = 0.85, npv₁ = 0.80, and npv₂ = 0.90. Here we also consider OR_D = 5.0 and ${OR}_{\bar{D}} = 2.0$ , and vary prevalence of disease θ and total sample size n. Empirical power of the multinomial T^m and the generalized score statistics may be somewhat overestimated because of potential for the inflated type I error as demonstrated in Table 4. Power of the proposed weighted generalized score statistic is slightly higher or similar to power of the generalized score statistic. Although not shown, as expected, power for the conditional independence situation is less than power with the chosen positive association between the two tests.

Table 5.

Percent of 4,000,000 repeated simulations with H₀ rejected at 5 percent level. True predictive values are PPV₁ = 0.75, PPV₂ = 0.85, NPV₁ = 0.80, and NPV₂ = 0.90; OR_D = 5 and ${OR}_{\bar{D}} = 2$ are the odds ratios defining the degree of association between results of the two tests among diseased and not-diseased; n is the total number of patients with two tests; and θ is the prevalence of disease.

		H₀ : PPV₁ = PPV₂					H₀ : NPV₁ = NPV₂
θ	n	T ^m	$T_{\log}^{M}$	$T_{logit}^{M}$	T ^gs	T ^wgs	T ^m	$T_{\log}^{M}$	$T_{logit}^{M}$	T ^gs	T ^wgs
				T ^GW					T ^GW
0.3	50	11.7	4.4	4.0	8.8	10.6	51.4	47.3	40.2	51.4	47.4
	100	15.0	9.5	12.2	13.4	16.6	82.3	81.1	80.0	82.3	81.2
	200	24.8	20.3	26.0	23.5	27.5	98.5	98.4	98.3	98.5	98.4
	300	35.0	31.2	37.1	33.8	37.9	99.9	99.9	99.9	99.9	99.9
	400	44.7	41.5	47.0	43.7	47.6	100	100	100	100	100
	500	53.5	50.8	55.8	52.6	56.2	100	100	100	100	100
0.4	50	16.2	11.5	8.8	15.3	15.4	34.6	30.2	23.4	34.3	32.1
	100	26.6	22.8	23.9	25.9	26.7	61.5	58.9	57.8	61.3	60.4
	200	46.7	44.1	46.1	46.2	47.3	89.3	88.7	88.7	89.3	89.1
	300	63.3	61.6	63.3	62.9	63.9	97.6	97.5	97.5	97.6	97.5
	400	75.7	74.6	75.9	75.4	76.2	99.5	99.5	99.5	99.5	99.5
	500	84.4	83.7	84.6	84.2	84.8	99.9	99.9	99.9	99.9	99.9
0.5	50	23.6	19.5	16.7	23.3	21.9	22.3	17.5	12.3	21.5	21.7
	100	40.7	38.0	38.0	40.5	40.0	41.7	37.9	37.1	41.2	41.6
	200	68.2	66.9	67.3	68.1	68.0	70.1	68.4	69.3	69.8	70.4
	300	84.6	84.0	84.3	84.5	84.6	86.3	85.6	86.1	86.2	86.6
	400	93.1	92.8	93.0	93.0	93.1	94.2	93.9	94.2	94.2	94.4
	500	97.1	97.0	97.0	97.1	97.1	97.7	97.6	97.7	97.7	97.7


In conclusion, the simulation results support use of the proposed weighted generalized score statistics by demonstrating excellent size and power behavior.

6 Example

We consider the coronary artery disease data [1] presented in Table 1 (n = 871) and provide detailed calculations which may be easily followed by practitioners with their own data.

First we compute the weighted generalized score statistic for testing equality of positive predictive values. The number of patients with positive Test 1 is n_+•• = 473+29+22+46 = 570 and with positive Test 2 is n_•+• = 473 + 81 + 22 + 44 = 620. This is an unbalanced situation because n_+•• ≠ n_•+•. Positive predictive values are estimated as ${\hat{PPV}}_{1} = (473 + 29) ∕ 570 = 0.881$ and ${\hat{PPV}}_{2} = (473 + 81) ∕ 620 = 0.894$ . The pooled estimate is ${\hat{PPV}}_{p} = (473 + 29 + 473 + 81) ∕ (570 + 620) = 0.887$ . Subsequently, ${\hat{PPV}}_{p} (1 - {\hat{PPV}}_{p}) = 0.100, 473 {(1 - {\hat{PPV}}_{p})}^{2} + 22 {\hat{PPV}}_{p}^{2} = 23.322$ , and $2 C_{p}^{PPV} = 2 \times 23.322 ∕ (570 + 620) = 0.0392$ . Substituting computed values we obtain

T_{PPV}^{WGS} = \frac{{({\hat{PPV}}_{1} - {\hat{PPV}}_{2})}^{2}}{\left\{ {\hat{PPV}}_{p} (1 - {\hat{PPV}}_{p}) - 2 C_{p}^{PPV} \right\} (\frac{1}{570} + \frac{1}{620})} = 0.807 .

Hence, utilizing the $χ_{1}^{2}$ distribution, we have p = 0.37 and conclude that there is no evidence of a difference between the two positive predictive values. As an aside, observe that the term $2 C_{p}^{PPV}$ reduces the denominator substantially and thus correlation between Test 1 and Test 2 has a major impact in this data set. For comparison, the generalized score statistic is 0.802 and the multinomial based Wald statistics are as follows: 0.802 for the untransformed version, 0.800 for the log transformed version, and 0.806 for the logit transformed version (or equivalently the GEE empirical Wald).

The weighted generalized score statistic for comparison of two negative predictive values is computed similarly. We have n_–•• = 81+25+44+151 = 301, n_•–• = 29+25+46+151 = 251, and the estimated negative predictive values are ${\hat{NPV}}_{1} = (44 + 151) ∕ 301 = 0.648$ and ${\hat{NPV}}_{2} = (46 + 151) ∕ 251 = 0.785$ . The pooled estimate is ${\hat{NPV}}_{p} = (44 + 151 + 46 + 151) ∕ (301 + 251) = 0.710$ . Then ${\hat{NPV}}_{p} (1 - {\hat{NPV}}_{p}) = 0.206, 25 {\hat{NPV}}_{p}^{2} + 151 {(1 - {\hat{NPV}}_{p})}^{2} = 25.294$ , and $2 C_{p}^{NPV} = 2 \times 25.294 ∕ (301 + 251) = 0.0916$ . Subsequently,

T_{NPV}^{WGS} = \frac{{({\hat{NPV}}_{1} - {\hat{NPV}}_{2})}^{2}}{\left\{ {\hat{NPV}}_{p} (1 - {\hat{NPV}}_{p}) - 2 C_{p}^{NPV} \right\} (\frac{1}{301} + \frac{1}{251})} = 22.502 .

Here, p < 0.001, and we conclude that the two negative predictive values differ. The generalized score statistic is 23.579 and the multinomial based Wald statistics are as follows: 23.725 for the untransformed version, 22.437 for the log transformed version, and 21.742 for the logit transformed version (or equivalently for the GEE empirical Wald statistic).
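The calculations of this Section can be reproduced with a short script. The sketch below is one possible arrangement, not an official implementation: the function name `wgs`, the dictionary layout of the Table 1 counts, and the use of `math.erfc` for the $χ_{1}^{2}$ tail probability are our own choices.

```python
import math

# CAD data (Table 1): counts keyed by (Test 1 result, Test 2 result).
diseased = {("+", "+"): 473, ("+", "-"): 29, ("-", "+"): 81, ("-", "-"): 25}
non_diseased = {("+", "+"): 22, ("+", "-"): 46, ("-", "+"): 44, ("-", "-"): 151}


def wgs(hit, miss, s):
    """Weighted generalized score statistic for two predictive values.

    s    : "+" for PPV, "-" for NPV (the test result being conditioned on)
    hit  : counts of subjects whose status agrees with result s
           (diseased for PPV, non-diseased for NPV)
    miss : counts of the remaining subjects
    """
    o = "-" if s == "+" else "+"
    hit1 = hit[(s, s)] + hit[(s, o)]            # Test 1 result s, status agrees
    hit2 = hit[(s, s)] + hit[(o, s)]            # Test 2 result s, status agrees
    n1 = hit1 + miss[(s, s)] + miss[(s, o)]     # all subjects with Test 1 result s
    n2 = hit2 + miss[(s, s)] + miss[(o, s)]     # all subjects with Test 2 result s
    pv1, pv2 = hit1 / n1, hit2 / n2
    pv_p = (hit1 + hit2) / (n1 + n2)            # pooled predictive value
    c_p = (hit[(s, s)] * (1 - pv_p) ** 2 + miss[(s, s)] * pv_p ** 2) / (n1 + n2)
    return (pv1 - pv2) ** 2 / ((pv_p * (1 - pv_p) - 2 * c_p) * (1 / n1 + 1 / n2))


def chi2_1_pvalue(t):
    """Upper tail probability of the chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(t / 2.0))


t_ppv = wgs(diseased, non_diseased, "+")
t_npv = wgs(non_diseased, diseased, "-")
print(f"PPV: T = {t_ppv:.3f}, p = {chi2_1_pvalue(t_ppv):.2f}")
print(f"NPV: T = {t_npv:.3f}, p = {chi2_1_pvalue(t_npv):.2g}")
```

Passing the two disease-status tables in swapped order with `s = "-"` lets the same function serve for negative predictive values, mirroring the symmetry between statistics (8) and (9).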

7 Discussion

We have proposed new weighted generalized score test statistics (8) and (9) for testing equality of positive and negative predictive values of two diagnostic tests in a paired design. Simulations indicate that these statistics preserve type I error better than the generalized score statistic or the multinomial distribution based Wald statistics. The proposed statistics are intuitive, simple to compute, and in the absence of correlated data (unpaired design) they naturally reduce to the commonly used score statistic for comparison of two proportions with independent samples. Thus, we recommend the weighted generalized score (wgs) test statistic for hypothesis testing and for corresponding sample size computations. We have also developed novel simple formulas for the existing Wald statistics, which may be used to compute confidence intervals for the difference of two predictive values. Although the simple univariable formulas for comparison of predictive values provide intuitive insight, a GEE based regression analysis is necessary when adjustment for covariates is of interest, especially if performance of a test varies substantially according to patient characteristics [7]. This is similar to the co-existence of logistic regression methodology and the commonly used “paper-and-pencil” formulas for comparison of two proportions. Finally, we have proposed the weighted generalized score statistic in the setting of diagnostic tests, but with further development the introduced concept may lead to an improved score test within the general GEE framework.

Acknowledgement

This project was supported by the National Center for Research Resources and the National Center for Advancing Translational Science of the National Institutes of Health through the Clinical and Translational Science Award Number UL1RR024128.

Appendix A. Derivation of multinomial based variance-covariance matrix of predictive values

The matrix of derivatives of f(ppv₁) = f(π_+•D/π_+••) and f(ppv₂) = f(π_•+D/π_•+•) with respect to $π = {[π_{+ + D}, π_{- + D}, π_{+ - D}, π_{- - D}, π_{+ + \bar{D}}, π_{- + \bar{D}}, π_{+ - \bar{D}}, π_{- - \bar{D}}]}^{T}$ is G = BD, where D is a diagonal matrix with derivatives f’(ppv₁) and f’(ppv₂) on the diagonal, and

B = {[\begin{matrix} 1 - {PPV}_{1} & 0 & 1 - {PPV}_{1} & 0 & - {PPV}_{1} & 0 & - {PPV}_{1} & 0 \\ 1 - {PPV}_{2} & 1 - {PPV}_{2} & 0 & 0 & - {PPV}_{2} & - {PPV}_{2} & 0 & 0 \end{matrix}]}^{T} [\begin{matrix} \frac{1}{π_{+ • •}} & 0 \\ 0 & \frac{1}{π_{• + •}} \end{matrix}] .

Utilizing the δ-technique and the variance-covariance matrix Σ_π of the multinomial distribution we have B^T π = 0 and

\begin{matrix} {cov}_{f (PPV)}^{M, δ} ([\begin{matrix} f ({\hat{PPV}}_{1}) \\ f ({\hat{PPV}}_{2}) \end{matrix}]) = {\hat{G}}^{T} {\hat{Σ}}_{π} \hat{G} = \hat{D} {\hat{B}}^{T} \frac{diag (\hat{π}) - \hat{π} {\hat{π}}^{T}}{n_{• • •}} \hat{B} \hat{D} = \hat{D} {\hat{B}}^{T} \frac{diag (\hat{π})}{n_{• • •}} \hat{B} \hat{D} \\ = [\begin{matrix} f^{'} {({\hat{PPV}}_{1})}^{2} \times \frac{{\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})}{n_{+ • •}} & f^{'} ({\hat{PPV}}_{1}) f^{'} ({\hat{PPV}}_{2}) \times C^{PPV} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}}) \\ f^{'} ({\hat{PPV}}_{1}) f^{'} ({\hat{PPV}}_{2}) \times C^{PPV} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}}) & f^{'} {({\hat{PPV}}_{2})}^{2} \times \frac{{\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})}{n_{• + •}} \end{matrix}] \end{matrix}

where $C^{PPV} = \frac{1}{n_{+ • •} + n_{• + •}} \left\{ n_{+ + D} (1 - {\hat{PPV}}_{1}) (1 - {\hat{PPV}}_{2}) + n_{+ + \bar{D}} {\hat{PPV}}_{1} {\hat{PPV}}_{2} \right\}$ .

Similarly, variance-covariance matrix ${cov}_{f (NPV)}^{M, δ}$ of $f ({\hat{NPV}}_{1})$ and $f ({\hat{NPV}}_{2})$ is

[\begin{matrix} f^{'} {({\hat{NPV}}_{1})}^{2} \times \frac{{\hat{NPV}}_{1} (1 - {\hat{NPV}}_{1})}{n_{- • •}} & f^{'} ({\hat{NPV}}_{1}) f^{'} ({\hat{NPV}}_{2}) \times C^{NPV} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}}) \\ f^{'} ({\hat{NPV}}_{1}) f^{'} ({\hat{NPV}}_{2}) \times C^{NPV} (\frac{1}{n_{- • •}} + \frac{1}{n_{• - •}}) & f^{'} {({\hat{NPV}}_{2})}^{2} \times \frac{{\hat{NPV}}_{2} (1 - {\hat{NPV}}_{2})}{n_{• - •}} \end{matrix}]

where $C^{NPV} = \frac{1}{n_{- • •} + n_{• - •}} \left\{ n_{- - D} {\hat{NPV}}_{1} {\hat{NPV}}_{2} + n_{- - \bar{D}} (1 - {\hat{NPV}}_{1}) (1 - {\hat{NPV}}_{2}) \right\}$ .
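With f taken as the identity (so that f′ = 1), the matrices above give the variance of the difference of two predictive values as the sum of the diagonal terms minus twice the off-diagonal term. The sketch below applies this to the CAD data of Table 1 to obtain the untransformed Wald statistic and a Wald-type 95% confidence interval for the difference. The function name `wald_difference` and the data layout are hypothetical, and the interval is a plain normal-approximation construction that we assume here for illustration.

```python
import math

Z_975 = 1.959963984540054   # 0.975 quantile of the standard normal distribution

# CAD data (Table 1): counts keyed by (Test 1 result, Test 2 result).
diseased = {("+", "+"): 473, ("+", "-"): 29, ("-", "+"): 81, ("-", "-"): 25}
non_diseased = {("+", "+"): 22, ("+", "-"): 46, ("-", "+"): 44, ("-", "-"): 151}


def wald_difference(hit, miss, s):
    """Untransformed Wald statistic and 95% CI for PV1 - PV2 (Appendix A, f = identity).

    s = "+" with hit = diseased gives PPV; s = "-" with hit = non-diseased gives NPV.
    """
    o = "-" if s == "+" else "+"
    hit1 = hit[(s, s)] + hit[(s, o)]
    hit2 = hit[(s, s)] + hit[(o, s)]
    n1 = hit1 + miss[(s, s)] + miss[(s, o)]
    n2 = hit2 + miss[(s, s)] + miss[(o, s)]
    pv1, pv2 = hit1 / n1, hit2 / n2
    # Covariance term C^{PPV} (or C^{NPV}) built from subjects with both results equal to s.
    c = (hit[(s, s)] * (1 - pv1) * (1 - pv2) + miss[(s, s)] * pv1 * pv2) / (n1 + n2)
    var = pv1 * (1 - pv1) / n1 + pv2 * (1 - pv2) / n2 - 2 * c * (1 / n1 + 1 / n2)
    diff = pv1 - pv2
    half_width = Z_975 * math.sqrt(var)
    return diff ** 2 / var, (diff - half_width, diff + half_width)


t_ppv, ci_ppv = wald_difference(diseased, non_diseased, "+")
t_npv, ci_npv = wald_difference(non_diseased, diseased, "-")
```

The resulting statistics agree with the untransformed Wald values quoted in Section 6 (0.802 for PPV and 23.725 for NPV).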

Appendix B. Derivation of model based variance-covariance matrix

\begin{matrix} I_{m} & = \sum_{i} [\begin{matrix} t_{i 1} & t_{i 2} \\ t_{i 1} & 0 \end{matrix}] [\begin{matrix} {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1}) & 0 \\ 0 & {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2}) \end{matrix}] [\begin{matrix} t_{i 1} & t_{i 1} \\ t_{i 2} & 0 \end{matrix}] \\ = [\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}] n_{+ • •} {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1}) + [\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}] n_{• + •} {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2}) . \end{matrix}

Then

V_{m} = I_{m}^{- 1} = \frac{1}{n_{+ • •} {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})} [\begin{matrix} 0 & 0 \\ 0 & 1 \end{matrix}] + \frac{1}{n_{• + •} {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})} [\begin{matrix} 1 & - 1 \\ - 1 & 1 \end{matrix}] .
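The inversion can be verified directly. Writing, as shorthand introduced only for this check, $a = n_{+ • •} {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})$ and $b = n_{• + •} {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})$ :

```latex
I_{m} = \begin{bmatrix} a + b & a \\ a & a \end{bmatrix}, \qquad
\det I_{m} = (a + b)\,a - a^{2} = ab, \qquad
I_{m}^{-1} = \frac{1}{ab} \begin{bmatrix} a & -a \\ -a & a + b \end{bmatrix}
           = \frac{1}{a} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
           + \frac{1}{b} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}
```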

Appendix C. Derivation of empirical variance-covariance matrix

\begin{matrix} I_{e} = & \sum_{i} [\begin{matrix} t_{i 1} & t_{i 2} \\ t_{i 1} & 0 \end{matrix}] [\begin{matrix} {(d_{i 1} - {\hat{PPV}}_{1})}^{2} & (d_{i 1} - {\hat{PPV}}_{1}) (d_{i 2} - {\hat{PPV}}_{2}) \\ (d_{i 2} - {\hat{PPV}}_{2}) (d_{i 1} - {\hat{PPV}}_{1}) & {(d_{i 2} - {\hat{PPV}}_{2})}^{2} \end{matrix}] [\begin{matrix} t_{i 1} & t_{i 1} \\ t_{i 2} & 0 \end{matrix}] \\ = & [\begin{matrix} 1 & 1 \\ 1 & 1 \end{matrix}] \sum_{i} t_{i 1} {(d_{i 1} - {\hat{PPV}}_{1})}^{2} + [\begin{matrix} 1 & 0 \\ 0 & 0 \end{matrix}] \sum_{i} t_{i 2} {(d_{i 2} - {\hat{PPV}}_{2})}^{2} \\ + [\begin{matrix} 2 & 1 \\ 1 & 0 \end{matrix}] \sum_{i} t_{i 1} t_{i 2} (d_{i 1} - {\hat{PPV}}_{1}) (d_{i 2} - {\hat{PPV}}_{2}) \end{matrix}

Since $\sum_{i} t_{i 1} {(d_{i 1} - {\hat{PPV}}_{1})}^{2} = n_{+ • •} {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1}), \sum_{i} t_{i 2} {(d_{i 2} - {\hat{PPV}}_{2})}^{2} = n_{• + •} {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})$ and $\sum_{i} t_{i 1} t_{i 2} (d_{i 1} - {\hat{PPV}}_{1}) (d_{i 2} - {\hat{PPV}}_{2}) = (n_{+ • •} + n_{• + •}) C^{PPV}$ (notation in Appendix A), then using I_m and $V_{m} = I_{m}^{- 1}$ from Appendix B leads to the empirical variance-covariance matrix V_e = V_mI_eV_m expressed as

\begin{matrix} V_{e} & = V_{m} \left\{ I_{m} + (n_{+ • •} + n_{• + •}) C^{PPV} [\begin{matrix} 2 & 1 \\ 1 & 0 \end{matrix}] \right\} V_{m} \\ = V_{m} + [\begin{matrix} 0 & 1 \\ 1 & - 2 \end{matrix}] \frac{C^{PPV}}{{\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1}) {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})} (\frac{1}{n_{+ • •}} + \frac{1}{n_{• + •}}) . \end{matrix}
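The second equality rests on one matrix product, which can be checked by hand. With the shorthand $a = n_{+ • •} {\hat{PPV}}_{1} (1 - {\hat{PPV}}_{1})$ and $b = n_{• + •} {\hat{PPV}}_{2} (1 - {\hat{PPV}}_{2})$ (introduced here only for this check), and using $V_{m} I_{m} V_{m} = V_{m}$ :

```latex
V_{m} = \begin{bmatrix} \frac{1}{b} & -\frac{1}{b} \\ -\frac{1}{b} & \frac{1}{a} + \frac{1}{b} \end{bmatrix},
\qquad
V_{m} \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix} V_{m}
  = \frac{1}{ab} \begin{bmatrix} 0 & 1 \\ 1 & -2 \end{bmatrix},
\qquad
\frac{n_{+\bullet\bullet} + n_{\bullet+\bullet}}{ab}
  = \frac{\frac{1}{n_{+\bullet\bullet}} + \frac{1}{n_{\bullet+\bullet}}}
         {\hat{PPV}_{1} (1 - \hat{PPV}_{1})\, \hat{PPV}_{2} (1 - \hat{PPV}_{2})}
```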

Appendix D. Derivation of score vector

\begin{matrix} S^{H_{0}} & = \sum_{i} [\begin{matrix} t_{i 1} & t_{i 2} \\ z_{i 1} t_{i 1} & z_{i 2} t_{i 2} \end{matrix}] [\begin{matrix} d_{i 1} - {\hat{d}}_{i 1}^{H_{0}} \\ d_{i 2} - {\hat{d}}_{i 2}^{H_{0}} \end{matrix}] = \sum_{i} [\begin{matrix} t_{i 1} & t_{i 2} \\ t_{i 1} & 0 \end{matrix}] [\begin{matrix} d_{i 1} - {\hat{PPV}}_{p} \\ d_{i 2} - {\hat{PPV}}_{p} \end{matrix}] \\ = [\begin{matrix} \sum_{i} d_{i 1} t_{i 1} + \sum_{i} d_{i 2} t_{i 2} - {\hat{PPV}}_{p} \sum_{i} (t_{i 1} + t_{i 2}) \\ \sum_{i} d_{i 1} t_{i 1} - {\hat{PPV}}_{p} \sum_{i} t_{i 1} \end{matrix}] = [\begin{matrix} n_{+ • D} + n_{• + D} - (n_{+ • •} + n_{• + •}) {\hat{PPV}}_{p} \\ n_{+ • D} - n_{+ • •} {\hat{PPV}}_{p} \end{matrix}] \\ = [\begin{matrix} 0 \\ n_{+ • •} ({\hat{PPV}}_{1} - {\hat{PPV}}_{p}) \end{matrix}] = [\begin{matrix} 0 \\ ({\hat{PPV}}_{1} - {\hat{PPV}}_{2}) ∕ (1 ∕ n_{+ • •} + 1 ∕ n_{• + •}) \end{matrix}] . \end{matrix}
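The last two equalities follow from the definition of the pooled estimate, as the following short calculation shows (a worked step added for the reader, using only quantities defined above):

```latex
\hat{PPV}_{p}
  = \frac{n_{+\bullet D} + n_{\bullet+D}}{n_{+\bullet\bullet} + n_{\bullet+\bullet}}
  = \frac{n_{+\bullet\bullet} \hat{PPV}_{1} + n_{\bullet+\bullet} \hat{PPV}_{2}}
         {n_{+\bullet\bullet} + n_{\bullet+\bullet}}
\;\Longrightarrow\;
n_{+\bullet\bullet} (\hat{PPV}_{1} - \hat{PPV}_{p})
  = \frac{n_{+\bullet\bullet}\, n_{\bullet+\bullet}}{n_{+\bullet\bullet} + n_{\bullet+\bullet}}
    (\hat{PPV}_{1} - \hat{PPV}_{2})
  = \frac{\hat{PPV}_{1} - \hat{PPV}_{2}}
         {\frac{1}{n_{+\bullet\bullet}} + \frac{1}{n_{\bullet+\bullet}}}
```

The first component of the score vector is zero for the same reason: multiplying the pooled estimate by $n_{+ • •} + n_{• + •}$ returns exactly $n_{+ • D} + n_{• + D}$ .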

References

[1] Weiner DA, Ryan TJ, McCabe CH, Kennedy JW, Schloss M, Tristani F, Chaitman BR, Fisher LD. Exercise stress-testing: correlations among history of angina, ST-segment response and prevalence of coronary-artery disease in the Coronary Artery Surgery Study (CASS). The New England Journal of Medicine. 1979;301:230–235. doi: 10.1056/NEJM197908023010502.
[2] Moskowitz CS, Pepe MS. Comparing the predictive values of diagnostic tests: sample size and analysis for paired study designs. Clinical Trials. 2006;3:272–279. doi: 10.1191/1740774506cn147oa.
[3] Roldán Nofuentes JA, Luna del Castillo JD, Montero Alonso MA. Global hypothesis test to simultaneously compare the predictive values of two binary diagnostic tests. Computational Statistics and Data Analysis. 2012;56:1161–1173. doi: 10.1016/j.csda.2011.06.003.
[4] Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22. doi: 10.1093/biomet/73.1.13.
[5] Boos DD. On generalized score tests. American Statistician. 1992;46:327–333. doi: 10.2307/2685328.
[6] Rotnitzky A, Jewell NP. Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika. 1990;77:485–497. doi: 10.1093/biomet/77.3.485.
[7] Leisenring W, Alonzo T, Pepe MS. Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics. 2000;56:345–351. doi: 10.1111/j.0006-341X.2000.00345.x.
[8] Wang W, Davis CS, Soong S. Comparison of predictive values of two diagnostic tests from the same sample of subjects using weighted least squares. Statistics in Medicine. 2006;25:2215–2229. doi: 10.1002/sim.2332.
[9] Pepe MS, Anderson GL. A cautionary note for inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics. 1994;23:939–951. doi: 10.1080/03610919408813210.
