Author manuscript; available in PMC: 2013 Aug 28.
Published in final edited form as: Stat Med. 2012 Aug 22;32(6):964–977. doi: 10.1002/sim.5587

A weighted generalized score statistic for comparison of predictive values of diagnostic tests

Andrzej S Kosinski 1
PMCID: PMC3756153  NIHMSID: NIHMS499894  PMID: 22912343

Abstract

Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting.

Keywords: Correlated data, Diagnostic test, Paired design, Positive and negative predictive values, Score statistic

1 Introduction

Diagnostic tests are important in medicine especially when the gold standard for ascertainment of disease is invasive or expensive. We consider diagnostic tests with binary result and assume gold standard is available for comparison with the test result. As a motivating example, Table 1 presents the coronary artery disease data [1]. This is a paired design, in which all the patients receive both diagnostic tests, and it is the situation we consider. The gold standard for coronary artery disease is coronary angiography and the diagnostic tests are exercise stress test (Test 1) and clinical history of chest pain (Test 2). Two important measures used to evaluate diagnostic tests are positive and negative predictive values. Positive predictive value (ppv) is the probability of disease (as identified by the gold standard) when the diagnostic test is positive and negative predictive value (npv) is the probability of no disease when the diagnostic test is negative. We consider testing of equality of predictive values of two diagnostic tests. Although a joint comparison of positive and negative predictive values has been considered [2, 3], we focus presently on comparison of positive and negative predictive values separately.

Table 1.

Coronary artery disease data

| Test 1 result | CAD: chest pain (Test 2) | CAD: no chest pain (Test 2) | No CAD: chest pain (Test 2) | No CAD: no chest pain (Test 2) |
|---|---|---|---|---|
| Positive stress test | 473 | 29 | 22 | 46 |
| Negative stress test | 81 | 25 | 44 | 151 |

CAD = coronary artery disease

Since both tests are measured on the same patients, the estimated predictive values of the two tests are correlated. To account for this when testing equality of predictive values, the generalized estimating equations (GEE) approach [4] with the generalized score test [5, 6] was considered [7]. Alternatively, data in Table 1 were considered to have a multinomial distribution and application of the δ-technique provided Wald test statistics [2, 3, 8]. These test statistics, as reported in the literature, tend to have rather complex formulas that offer little intuition. Sometimes, formulas differ even for mathematically equivalent test statistics. Thus, there is a need for simple formulas like those available for testing equality of two proportions in the two independent samples case.

We propose such novel, simple, and intuitive algebraic re-formulations of the existing test statistics for comparing predictive values. The new formulas facilitate comparisons between the Wald statistics and with the generalized score statistic. A new formulation of the generalized score statistic clearly shows that if each patient receives only one test (unpaired design) then this statistic does not always reduce to the score test statistic commonly used in the independent samples situation. Motivated by this, we propose a new generalized score based statistic for testing equality of two predictive values. The suggested statistic always reduces to the score statistic with independent samples. We call this new statistic the weighted generalized score (wgs) test statistic because it results from consideration of weights when computing empirical covariance matrix needed for the generalized score statistic. The new statistic has superior type I error behavior when compared to the other available test statistics, as demonstrated with simulations.

The paper is organized as follows. Section 2 develops simple formulations of the multinomial based Wald statistics without and with transformation (log, logit) of predictive values. Section 3 deals with the GEE based test statistics resulting from consideration of a logistic model. Section 3.1 derives simple formulation of the empirical Wald statistic and shows that it is equivalent to the multinomial based Wald statistic with the logit transformed predictive values. Section 3.2 derives two algebraically different but mathematically equivalent formulations of the generalized score statistic. The first formulation shows that the generalized score statistic can be quite similar to the multinomial based Wald statistic and the second formulation explains why in the independent samples situation the generalized score statistic does not reduce to the commonly used score statistic. Section 4.1 utilizes these insights to motivate a need for weights when computing the empirical covariance matrix. Section 4.2 presents the proposed weights and the resulting new weighted generalized score statistics. Section 5 contains simulations, Section 6 a detailed example, and discussion follows in Section 7.

2 Proposed new formulations of the multinomial based Wald statistics

This section introduces notation and develops simple formulations of the multinomial based Wald statistics without and with transformation of predictive values. The log and logit transformations are considered.

Table 2 displays general notation for paired study design data. Marginal counts are denoted by a dot in the dimension over which marginalization occurs. For brevity we denote $n = n_{\bullet\bullet\bullet}$. Using this notation, an estimate of ppv for Test 1 is $\widehat{PPV}_1 = n_{+\bullet D}/n_{+\bullet\bullet}$ and for Test 2 is $\widehat{PPV}_2 = n_{\bullet+D}/n_{\bullet+\bullet}$. Similarly, $\widehat{NPV}_1 = n_{-\bullet\bar{D}}/n_{-\bullet\bullet}$ and $\widehat{NPV}_2 = n_{\bullet-\bar{D}}/n_{\bullet-\bullet}$.

Table 2.

Paired design data notations

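| Test 1 result | Disease ($D$): Test 2 positive (+) | Disease ($D$): Test 2 negative (−) | No disease ($\bar{D}$): Test 2 positive (+) | No disease ($\bar{D}$): Test 2 negative (−) |
|---|---|---|---|---|
| Positive (+) | $n_{++D}$ | $n_{+-D}$ | $n_{++\bar{D}}$ | $n_{+-\bar{D}}$ |
| Negative (−) | $n_{-+D}$ | $n_{--D}$ | $n_{-+\bar{D}}$ | $n_{--\bar{D}}$ |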

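Several statistics for comparing predictive values are derived by considering the cells in Table 2 to have a multinomial distribution with cell probabilities $\pi$. Let $g(\pi) = f(ppv_1) - f(ppv_2) = f(\pi_{+\bullet D}/\pi_{+\bullet\bullet}) - f(\pi_{\bullet+D}/\pi_{\bullet+\bullet})$ and $h(\pi) = f(npv_1) - f(npv_2) = f(\pi_{-\bullet\bar{D}}/\pi_{-\bullet\bullet}) - f(\pi_{\bullet-\bar{D}}/\pi_{\bullet-\bullet})$, where $f(\cdot)$ is a monotone differentiable function. Utilizing the δ-technique, the Wald statistics for testing $H_0: ppv_1 = ppv_2$ and $H_0: npv_1 = npv_2$ are thus

$$T^{M}_{f(PPV)} = \frac{g(\hat\pi)^2}{\left[\frac{\partial g(\pi)}{\partial \pi}\right]^T_{\pi=\hat\pi} \hat\Sigma_\pi \left[\frac{\partial g(\pi)}{\partial \pi}\right]_{\pi=\hat\pi}} = \frac{\{f(\widehat{PPV}_1) - f(\widehat{PPV}_2)\}^2}{\widehat{var}\{f(\widehat{PPV}_1) - f(\widehat{PPV}_2)\}}$$

$$T^{M}_{f(NPV)} = \frac{h(\hat\pi)^2}{\left[\frac{\partial h(\pi)}{\partial \pi}\right]^T_{\pi=\hat\pi} \hat\Sigma_\pi \left[\frac{\partial h(\pi)}{\partial \pi}\right]_{\pi=\hat\pi}} = \frac{\{f(\widehat{NPV}_1) - f(\widehat{NPV}_2)\}^2}{\widehat{var}\{f(\widehat{NPV}_1) - f(\widehat{NPV}_2)\}},$$

where $n\hat\Sigma_\pi = \mathrm{diag}(\hat\pi) - \hat\pi\hat\pi^T$. Under the corresponding null hypotheses, the test statistics are asymptotically distributed as $\chi^2_1$.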

For f(x) = x, the formulas for the denominators of the above test statistics in terms of counts in Table 2 were provided in [3, 8]. The formulas presented in [3] and [8] are mathematically equivalent but differ algebraically. These formulas are rather complex with no easy intuitive insight. Transformation f(x) = log x was considered in [2, 8]. Similarly, the provided formulas while mathematically equivalent are different algebraically, and are also rather involved and not easily intuitive. Below we present novel, simple, and intuitive algebraic re-formulations of these multinomial based Wald test statistics.

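Considering $f(x) = x$ and the estimated variance-covariance matrix of $f(\widehat{PPV}_1)$ and $f(\widehat{PPV}_2)$ as derived in Appendix A, we have

$$T^{M}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\dfrac{\widehat{PPV}_1(1-\widehat{PPV}_1)}{n_{+\bullet\bullet}} + \dfrac{\widehat{PPV}_2(1-\widehat{PPV}_2)}{n_{\bullet+\bullet}} - 2\,C_{PPV}\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}$$

with

$$C_{PPV} = (n_{+\bullet\bullet} + n_{\bullet+\bullet})^{-1}\left\{n_{++D}(1-\widehat{PPV}_1)(1-\widehat{PPV}_2) + n_{++\bar{D}}\,\widehat{PPV}_1\widehat{PPV}_2\right\}. \tag{1}$$

Here, the variance of $\widehat{PPV}_1 - \widehat{PPV}_2$ is explicitly expressed as the sum of the commonly used binomial variances of $\widehat{PPV}_1$ and $\widehat{PPV}_2$ minus two times the covariance. The covariance depends on the estimates of positive predictive values and on the frequencies of concordant positive test results for the diseased ($n_{++D}$) and disease free ($n_{++\bar{D}}$) groups.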

Considering f(x) = log x, we have

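$$T^{M}_{\log PPV} = \frac{(\log\widehat{PPV}_1 - \log\widehat{PPV}_2)^2 \times \widehat{PPV}_1\widehat{PPV}_2}{\dfrac{\widehat{PPV}_2(1-\widehat{PPV}_1)}{n_{+\bullet\bullet}} + \dfrac{\widehat{PPV}_1(1-\widehat{PPV}_2)}{n_{\bullet+\bullet}} - 2\,C_{PPV}\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}.$$

In addition, with $f(x) = \mathrm{logit}\, x = \log\{x/(1-x)\}$ we have

$$T^{M}_{\mathrm{logit} PPV} = \frac{(\mathrm{logit}\,\widehat{PPV}_1 - \mathrm{logit}\,\widehat{PPV}_2)^2 \times \widehat{PPV}_1(1-\widehat{PPV}_1)\widehat{PPV}_2(1-\widehat{PPV}_2)}{\dfrac{\widehat{PPV}_1(1-\widehat{PPV}_1)}{n_{\bullet+\bullet}} + \dfrac{\widehat{PPV}_2(1-\widehat{PPV}_2)}{n_{+\bullet\bullet}} - 2\,C_{PPV}\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}.$$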

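Similarly, utilizing the estimated variance-covariance matrix of $f(\widehat{NPV}_1)$ and $f(\widehat{NPV}_2)$ (see Appendix A), we suggest the following formulas for the Wald statistics testing equality of two negative predictive values:

$$T^{M}_{NPV} = \frac{(\widehat{NPV}_1 - \widehat{NPV}_2)^2}{\dfrac{\widehat{NPV}_1(1-\widehat{NPV}_1)}{n_{-\bullet\bullet}} + \dfrac{\widehat{NPV}_2(1-\widehat{NPV}_2)}{n_{\bullet-\bullet}} - 2\,C_{NPV}\left(\dfrac{1}{n_{-\bullet\bullet}} + \dfrac{1}{n_{\bullet-\bullet}}\right)}$$

$$T^{M}_{\log NPV} = \frac{(\log\widehat{NPV}_1 - \log\widehat{NPV}_2)^2 \times \widehat{NPV}_1\widehat{NPV}_2}{\dfrac{\widehat{NPV}_2(1-\widehat{NPV}_1)}{n_{-\bullet\bullet}} + \dfrac{\widehat{NPV}_1(1-\widehat{NPV}_2)}{n_{\bullet-\bullet}} - 2\,C_{NPV}\left(\dfrac{1}{n_{-\bullet\bullet}} + \dfrac{1}{n_{\bullet-\bullet}}\right)}$$

$$T^{M}_{\mathrm{logit} NPV} = \frac{(\mathrm{logit}\,\widehat{NPV}_1 - \mathrm{logit}\,\widehat{NPV}_2)^2 \times \widehat{NPV}_1(1-\widehat{NPV}_1)\widehat{NPV}_2(1-\widehat{NPV}_2)}{\dfrac{\widehat{NPV}_1(1-\widehat{NPV}_1)}{n_{\bullet-\bullet}} + \dfrac{\widehat{NPV}_2(1-\widehat{NPV}_2)}{n_{-\bullet\bullet}} - 2\,C_{NPV}\left(\dfrac{1}{n_{-\bullet\bullet}} + \dfrac{1}{n_{\bullet-\bullet}}\right)}$$

with

$$C_{NPV} = (n_{-\bullet\bullet} + n_{\bullet-\bullet})^{-1}\left\{n_{--D}\,\widehat{NPV}_1\widehat{NPV}_2 + n_{--\bar{D}}(1-\widehat{NPV}_1)(1-\widehat{NPV}_2)\right\}.$$

The proposed formulas can be seen as directly extending the commonly used statistics for two independent samples, and they reduce to such statistics when there are no concordant results (i.e. $n_{++D} = n_{++\bar{D}} = n_{--D} = n_{--\bar{D}} = 0$), as is the case in the two independent samples situation. We will show below that the logit transformation based test statistic turns out to be the same as the empirical Wald test statistic from the GEE approach proposed in [7].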
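As a rough illustration of the formulas in this section, the following Python sketch evaluates the three multinomial based Wald statistics from the eight cell counts of a paired table. The dictionary layout keyed by (Test 1 result, Test 2 result, disease status) and the names `wald_ppv`, `wald_npv`, and `flip` are assumptions made for this illustration and are not part of the original paper. The NPV versions are obtained by relabeling negative results as "positive" and non-diseased as "diseased", which turns the NPV formulas into the PPV ones.

```python
from math import log

# Counts keyed by (Test 1 result, Test 2 result, disease status); 1 = positive / diseased.
# Values are the coronary artery disease counts from Table 1.
CAD = {(1, 1, 1): 473, (1, 0, 1): 29, (0, 1, 1): 81, (0, 0, 1): 25,
       (1, 1, 0): 22, (1, 0, 0): 46, (0, 1, 0): 44, (0, 0, 0): 151}


def flip(n):
    """Swap positive/negative and diseased/non-diseased so that NPV plays the role of PPV."""
    return {(1 - t1, 1 - t2, 1 - d): c for (t1, t2, d), c in n.items()}


def wald_ppv(n, transform="identity"):
    """Multinomial based Wald statistic for H0: ppv1 = ppv2 (identity, log, or logit)."""
    n1 = sum(c for (t1, _, _), c in n.items() if t1 == 1)  # Test 1 positives
    n2 = sum(c for (_, t2, _), c in n.items() if t2 == 1)  # Test 2 positives
    p1 = (n[1, 1, 1] + n[1, 0, 1]) / n1
    p2 = (n[1, 1, 1] + n[0, 1, 1]) / n2
    q1, q2 = 1 - p1, 1 - p2
    c_ppv = (n[1, 1, 1] * q1 * q2 + n[1, 1, 0] * p1 * p2) / (n1 + n2)  # expression (1)
    cov2 = 2 * c_ppv * (1 / n1 + 1 / n2)  # two times the covariance term
    if transform == "identity":
        return (p1 - p2) ** 2 / (p1 * q1 / n1 + p2 * q2 / n2 - cov2)
    if transform == "log":
        return (log(p1) - log(p2)) ** 2 * p1 * p2 / (p2 * q1 / n1 + p1 * q2 / n2 - cov2)
    if transform == "logit":
        num = (log(p1 / q1) - log(p2 / q2)) ** 2 * p1 * q1 * p2 * q2
        return num / (p1 * q1 / n2 + p2 * q2 / n1 - cov2)
    raise ValueError(f"unknown transform: {transform}")


def wald_npv(n, transform="identity"):
    """Same statistic for H0: npv1 = npv2, computed on the relabeled table."""
    return wald_ppv(flip(n), transform)


for name in ("identity", "log", "logit"):
    print(name, round(wald_ppv(CAD, name), 3), round(wald_npv(CAD, name), 3))
```

Applied to the Table 1 counts, the sketch can be compared against the values quoted for the coronary artery disease data in Section 6.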

3 Proposed new formulations of the GEE based statistics

This section deals with the GEE based test statistics resulting from consideration of a logistic model. Section 3.1 derives simple formulation of the empirical Wald statistic and shows that it is equivalent to the multinomial based Wald statistic with the logit transformed predictive values. Section 3.2 derives two algebraically different but mathematically equivalent formulations of the generalized score statistic. The first formulation shows that the generalized score statistic can be quite similar to the multinomial based Wald statistic and the second formulation explains why in the independent samples situation the generalized score statistic does not reduce to the commonly used score statistic.

3.1 New formulation of the empirical Wald statistic

The GEE approach with disease status as the dependent variable was considered in [7]. For implementation of this approach, data in Table 2 can be re-expressed, with i denoting patient (cluster), j denoting test number within a cluster (j = 1, 2). Disease status is denoted by Dij (1=disease, 0=no disease), covariate Zij indicates type of diagnostic test (1=Test 1, 0=Test 2), and Tij denotes test result (1=positive, 0=negative). To test equality of positive predictive values, only data with at least one positive test result (i.e. Ti1 = 1 or Ti2 = 1) need to be used and the model

$$\mathrm{logit}\, P(D_{ij} = 1 \mid Z_{ij}, T_{ij} = 1) = \alpha_{PPV} + \beta_{PPV} Z_{ij} \tag{2}$$

allows test of the hypothesis H0 : βppv = 0 which is equivalent to H0 : ppv1 = ppv2. To test equality of negative predictive values (H0 : npv1 = npv2) the model logit P (Dij = 1∣Zij, Tij = 0) = αnpv + βnpvZij is considered for data with at least one negative test result, and hypothesis H0 : βnpv = 0 is tested.

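The GEE empirical Wald statistic for testing $H_0: \beta_{PPV} = 0$ in model (2) is $T^{GW}_{PPV} = \hat\beta_{PPV}^2 / \widehat{var}_e(\hat\beta_{PPV})$, where the denominator is the second (last) diagonal element of the empirical variance-covariance matrix $V_e = V_m I_e V_m$. The matrix $V_m$ is the model based variance-covariance matrix and the matrix $I_e$ is the empirical information matrix. Use of the identity working correlation matrix in the context of diagnostic tests is advocated in [7, 9] and thus, using notation from the above section, the information matrix can be expressed as

$$I = \sum_i \begin{bmatrix} t_{i1} & t_{i2} \\ z_{i1}t_{i1} & z_{i2}t_{i2} \end{bmatrix} cov(D_i) \begin{bmatrix} t_{i1} & t_{i2} \\ z_{i1}t_{i1} & z_{i2}t_{i2} \end{bmatrix}^T = \sum_i \begin{bmatrix} t_{i1} & t_{i2} \\ t_{i1} & 0 \end{bmatrix} cov(D_i) \begin{bmatrix} t_{i1} & t_{i1} \\ t_{i2} & 0 \end{bmatrix} \tag{3}$$

where $cov(D_i)$ is a variance-covariance matrix of $D_i = [D_{i1}, D_{i2}]^T$. The model based variance-covariance matrix $V_m = I_m^{-1}$ is derived in Appendix B and the empirical information matrix $I_e$ results from (3) by substituting for $cov(D_i)$ the matrix

$$cov_e(D_i) = \begin{bmatrix} d_{i1} - \hat d_{i1} \\ d_{i2} - \hat d_{i2} \end{bmatrix} \begin{bmatrix} d_{i1} - \hat d_{i1} \\ d_{i2} - \hat d_{i2} \end{bmatrix}^T = \begin{bmatrix} d_{i1} - \widehat{PPV}_1 \\ d_{i2} - \widehat{PPV}_2 \end{bmatrix} \begin{bmatrix} d_{i1} - \widehat{PPV}_1 \\ d_{i2} - \widehat{PPV}_2 \end{bmatrix}^T.$$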

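The empirical variance-covariance matrix $V_e = V_m I_e V_m$ is derived in Appendix C and the novel algebraic formulation of the GEE empirical Wald test statistic for testing $H_0: \beta_{PPV} = 0$ in model (2) is as follows:

$$T^{GW}_{PPV} = \frac{\hat\beta_{PPV}^2 \times \widehat{PPV}_1(1-\widehat{PPV}_1)\widehat{PPV}_2(1-\widehat{PPV}_2)}{\dfrac{\widehat{PPV}_1(1-\widehat{PPV}_1)}{n_{\bullet+\bullet}} + \dfrac{\widehat{PPV}_2(1-\widehat{PPV}_2)}{n_{+\bullet\bullet}} - 2\,C_{PPV}\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}.$$

Similarly, the GEE empirical Wald statistic to test equality of two negative predictive values ($H_0: \beta_{NPV} = 0$) is

$$T^{GW}_{NPV} = \frac{\hat\beta_{NPV}^2 \times \widehat{NPV}_1(1-\widehat{NPV}_1)\widehat{NPV}_2(1-\widehat{NPV}_2)}{\dfrac{\widehat{NPV}_1(1-\widehat{NPV}_1)}{n_{\bullet-\bullet}} + \dfrac{\widehat{NPV}_2(1-\widehat{NPV}_2)}{n_{-\bullet\bullet}} - 2\,C_{NPV}\left(\dfrac{1}{n_{-\bullet\bullet}} + \dfrac{1}{n_{\bullet-\bullet}}\right)}.$$

The above formulations show that the GEE empirical Wald statistics are the same as the corresponding multinomial based Wald statistics utilizing the logit transformation (derived in Section 2), because $\hat\beta_{NPV} = \mathrm{logit}\,\widehat{NPV}_2 - \mathrm{logit}\,\widehat{NPV}_1$ and $\hat\beta_{PPV} = \mathrm{logit}\,\widehat{PPV}_1 - \mathrm{logit}\,\widehat{PPV}_2$.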

3.2 Two new formulations of the generalized score statistic

Utilizing model (2) the generalized score statistic formula for testing equality of two positive predictive values is presented in [7] and it is

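$$T^{GS}_{PPV} = \frac{\left\{\sum_{i=1}^{N_P}\sum_{j=1}^{m_i} D_{ij}(Z_{ij} - \bar{Z})\right\}^2}{\sum_{i=1}^{N_P}\left\{\sum_{j=1}^{m_i}(D_{ij} - \bar{D})(Z_{ij} - \bar{Z})\right\}^2}$$

where $\bar{D} = \left(\sum_{i=1}^{N_P}\sum_{j=1}^{m_i} D_{ij}\right) / \sum_{i=1}^{N_P} m_i$, $\bar{Z} = \left(\sum_{i=1}^{N_P}\sum_{j=1}^{m_i} Z_{ij}\right) / \sum_{i=1}^{N_P} m_i$, $N_P$ is the number of patients with at least one positive test outcome, and $m_i$ is the number of positive test results for the $i$-th patient.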

Below, we derive two interesting (mathematically equivalent) formulations of the above statistic, with the first formulation facilitating its comparison with the multinomial based Wald statistic and the second one for direct comparison with the commonly used two independent samples score statistic. In addition, the new formulations provide motivation for the new weighted generalized score statistic we introduce in Section 4.

The general form of the generalized score statistic is $T^{GS} = (L V_m^{H_0} S^{H_0})^T (L V_e^{H_0} L^T)^{-1} L V_m^{H_0} S^{H_0}$, where $L$ is the contrast matrix with $r$ rows (number of considered contrasts) and $p$ columns (number of parameters), $S$ is the score vector, $V_m$ is the model based variance-covariance matrix, and $V_e = V_m I_e V_m$ is an empirical variance-covariance matrix with $I_e$ denoting the empirical information matrix. Quantities $S$, $V_m$, and $I_e$ are computed under $H_0$. The generalized score statistic is distributed under $H_0$ as $\chi^2_r$.

For positive predictive values consider model (2) and because, as indicated earlier, the identity working matrix is required for application of GEE in the context of this paper [7, 9], the score vector under H0 : βppv = 0 is (see Appendix D for details)

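$$S^{H_0} = \sum_i \begin{bmatrix} t_{i1} & t_{i2} \\ z_{i1}t_{i1} & z_{i2}t_{i2} \end{bmatrix} \begin{bmatrix} d_{i1} - \hat d^{H_0}_{i1} \\ d_{i2} - \hat d^{H_0}_{i2} \end{bmatrix} = \begin{bmatrix} 0 \\ \dfrac{\widehat{PPV}_1 - \widehat{PPV}_2}{\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}} \end{bmatrix},$$

where $\hat d^{H_0}_{i1} = \hat d^{H_0}_{i2} = \widehat{PPV}_p = (n_{+\bullet D} + n_{\bullet+D})/(n_{+\bullet\bullet} + n_{\bullet+\bullet})$ is the pooled positive predictive value. Alternatively, $\widehat{PPV}_p = w_1\widehat{PPV}_1 + w_2\widehat{PPV}_2$ with $w_1 = n_{+\bullet\bullet}/(n_{+\bullet\bullet} + n_{\bullet+\bullet})$ and $w_2 = 1 - w_1$. The model based variance-covariance matrix $V_m^{H_0}$ has the same formulation as the matrix $V_m$ derived in Appendix B but with $\widehat{PPV}_p$ substituted for $\widehat{PPV}_1$ and $\widehat{PPV}_2$. To test $H_0: \beta_{PPV} = 0$, the contrast matrix $L = [0, 1]$ is used, and the generalized score statistic can be expressed as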

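$$T^{GS}_{PPV} = \frac{(L V_m^{H_0} S^{H_0})^T L V_m^{H_0} S^{H_0}}{L V_m^{H_0} I_e^{H_0} V_m^{H_0} L^T} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\begin{bmatrix} -\dfrac{1}{n_{\bullet+\bullet}} & \dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}} \end{bmatrix} I_e^{H_0} \begin{bmatrix} -\dfrac{1}{n_{\bullet+\bullet}} & \dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}} \end{bmatrix}^T}, \tag{4}$$

with the empirical information matrix $I_e^{H_0}$ obtained by inserting the matrix

$$cov_e^{H_0}(D_i) = \begin{bmatrix} d_{i1} - \hat d^{H_0}_{i1} \\ d_{i2} - \hat d^{H_0}_{i2} \end{bmatrix} \begin{bmatrix} d_{i1} - \hat d^{H_0}_{i1} \\ d_{i2} - \hat d^{H_0}_{i2} \end{bmatrix}^T = \begin{bmatrix} d_{i1} - \widehat{PPV}_p \\ d_{i2} - \widehat{PPV}_p \end{bmatrix} \begin{bmatrix} d_{i1} - \widehat{PPV}_p \\ d_{i2} - \widehat{PPV}_p \end{bmatrix}^T$$

in place of $cov(D_i)$ in expression (3). After brief algebra, the first new formulation of the generalized score statistic for testing equality of two positive predictive values is

$$T^{GS}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\dfrac{\widehat{PPV}_1(1-\widehat{PPV}_1)}{n_{+\bullet\bullet}} + \dfrac{\widehat{PPV}_2(1-\widehat{PPV}_2)}{n_{\bullet+\bullet}} + R_{PPV} - 2\,C^{p}_{PPV}\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}, \tag{5}$$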

with

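$$R_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_p)^2}{n_{+\bullet\bullet}} + \frac{(\widehat{PPV}_2 - \widehat{PPV}_p)^2}{n_{\bullet+\bullet}}$$

$$C^{p}_{PPV} = (n_{+\bullet\bullet} + n_{\bullet+\bullet})^{-1}\left\{n_{++D}(1-\widehat{PPV}_p)^2 + n_{++\bar{D}}\,\widehat{PPV}_p^2\right\}.$$

This formulation shows that the generalized score statistic can be quite similar to the multinomial based Wald statistic $T^{M}_{PPV}$ if $R_{PPV}$ is small and $C^{p}_{PPV}$ is close to $C_{PPV}$.

Further algebra leads to the second new formulation of the generalized score statistic

$$T^{GS}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\left\{\widehat{PPV}_p(1-\widehat{PPV}_p) + W_{PPV} - 2\,C^{p}_{PPV}\right\}\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}, \tag{6}$$

where $W_{PPV} = (2\widehat{PPV}_p - \widehat{PPV}_1 - \widehat{PPV}_2)(2\widehat{PPV}_p - 1)$. Since $W_{PPV}$ is not always equal to zero, the above formulation explicitly shows that the generalized score statistic does not always reduce to the score statistic when applied to independent samples (see Section 4 for details).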

Similarly, two new formulations of the generalized score statistic for testing equality of two negative predictive values are

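$$T^{GS}_{NPV} = \frac{(\widehat{NPV}_1 - \widehat{NPV}_2)^2}{\dfrac{\widehat{NPV}_1(1-\widehat{NPV}_1)}{n_{-\bullet\bullet}} + \dfrac{\widehat{NPV}_2(1-\widehat{NPV}_2)}{n_{\bullet-\bullet}} + R_{NPV} - 2\,C^{p}_{NPV}\left(\dfrac{1}{n_{-\bullet\bullet}} + \dfrac{1}{n_{\bullet-\bullet}}\right)}$$

$$T^{GS}_{NPV} = \frac{(\widehat{NPV}_1 - \widehat{NPV}_2)^2}{\left\{\widehat{NPV}_p(1-\widehat{NPV}_p) + W_{NPV} - 2\,C^{p}_{NPV}\right\}\left(\dfrac{1}{n_{-\bullet\bullet}} + \dfrac{1}{n_{\bullet-\bullet}}\right)},$$

where

$$\widehat{NPV}_p = \frac{n_{-\bullet\bar{D}} + n_{\bullet-\bar{D}}}{n_{-\bullet\bullet} + n_{\bullet-\bullet}}$$

$$R_{NPV} = \frac{(\widehat{NPV}_1 - \widehat{NPV}_p)^2}{n_{-\bullet\bullet}} + \frac{(\widehat{NPV}_2 - \widehat{NPV}_p)^2}{n_{\bullet-\bullet}}$$

$$C^{p}_{NPV} = (n_{-\bullet\bullet} + n_{\bullet-\bullet})^{-1}\left\{n_{--D}\,\widehat{NPV}_p^2 + n_{--\bar{D}}(1-\widehat{NPV}_p)^2\right\}$$

$$W_{NPV} = (2\widehat{NPV}_p - \widehat{NPV}_1 - \widehat{NPV}_2)(2\widehat{NPV}_p - 1).$$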
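The equivalence of formulations (5) and (6) can be checked numerically. The Python sketch below evaluates both on the Table 1 counts; the dictionary layout and the names `gs_ppv`, `gs_npv`, and `flip` are assumptions made for this illustration only. The NPV case reuses the PPV code after relabeling negative results as "positive" and non-diseased as "diseased".

```python
# Counts keyed by (Test 1 result, Test 2 result, disease status); 1 = positive / diseased.
CAD = {(1, 1, 1): 473, (1, 0, 1): 29, (0, 1, 1): 81, (0, 0, 1): 25,
       (1, 1, 0): 22, (1, 0, 0): 46, (0, 1, 0): 44, (0, 0, 0): 151}


def flip(n):
    """Relabel the table so that NPV plays the role of PPV."""
    return {(1 - t1, 1 - t2, 1 - d): c for (t1, t2, d), c in n.items()}


def gs_ppv(n):
    """Return the generalized score statistic via formulation (5) and via formulation (6)."""
    n1 = sum(c for (t1, _, _), c in n.items() if t1 == 1)  # Test 1 positives
    n2 = sum(c for (_, t2, _), c in n.items() if t2 == 1)  # Test 2 positives
    p1 = (n[1, 1, 1] + n[1, 0, 1]) / n1
    p2 = (n[1, 1, 1] + n[0, 1, 1]) / n2
    pp = (n1 * p1 + n2 * p2) / (n1 + n2)                   # pooled predictive value
    s = 1 / n1 + 1 / n2
    cp = (n[1, 1, 1] * (1 - pp) ** 2 + n[1, 1, 0] * pp ** 2) / (n1 + n2)
    r = (p1 - pp) ** 2 / n1 + (p2 - pp) ** 2 / n2
    w = (2 * pp - p1 - p2) * (2 * pp - 1)
    diff2 = (p1 - p2) ** 2
    form5 = diff2 / (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2 + r - 2 * cp * s)
    form6 = diff2 / ((pp * (1 - pp) + w - 2 * cp) * s)
    return form5, form6


def gs_npv(n):
    return gs_ppv(flip(n))


print([round(v, 3) for v in gs_ppv(CAD)])
print([round(v, 3) for v in gs_npv(CAD)])
```

Both formulations should agree up to floating point error, since they differ only by algebraic rearrangement.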

4 Development of the weighted generalized score statistic

This section motivates and presents the new statistic. Section 4.1 utilizes insights provided by the two formulations of the generalized score statistic to motivate a need for weights when computing the empirical covariance matrix. Section 4.2 presents the proposed weights and the new weighted generalized score statistics.

4.1 Motivation

Formulation (6) of the generalized score statistic shows that in the two independent samples situation (here $n_{+\bullet\bullet}$ denotes the size of the sample with Test 1 and $n_{\bullet+\bullet}$ the size of an independent sample with Test 2) the generalized score statistic $T^{GS}_{PPV}$ does not always reduce to the commonly used score statistic

$$T^{S}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\widehat{PPV}_p(1-\widehat{PPV}_p)\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}. \tag{7}$$

Even though with independent samples we have $C^{p}_{PPV} = 0$ because there are no concordant positive test results, the term $W_{PPV}$ equals zero only if $\widehat{PPV}_1 = \widehat{PPV}_2$ or $\widehat{PPV}_p = 0.5$, or in a balanced situation, i.e. when $n_{+\bullet\bullet} = n_{\bullet+\bullet}$. Hence, in general, the generalized score statistic does not reduce to the score statistic when dealing with two independent samples. In addition, formulation (5) shows that the generalized score statistic can be quite similar to the multinomial Wald statistic $T^{M}_{PPV}$ if $R_{PPV}$ is small and $C^{p}_{PPV}$ is close to $C_{PPV}$. In fact, in the two independent samples situation (here $C^{p}_{PPV} = C_{PPV} = 0$) with unequal sample sizes ($n_{+\bullet\bullet} \ne n_{\bullet+\bullet}$), the generalized score statistic closely tracks the multinomial based Wald statistic $T^{M}_{PPV}$ rather than the score statistic. An example of this relationship is presented in Figure 1, which shows how these statistics change as the proportion of Test 1 positives among Test 1 or Test 2 positives changes. In a balanced situation this proportion is $w_1 = n_{+\bullet\bullet}/(n_{+\bullet\bullet} + n_{\bullet+\bullet}) = 0.5$ and the generalized score statistic and the score statistic have the same value. This is because with $w_1 = 0.5$ the term $W_{PPV}$ in expression (6) equals zero. However, when the two independent samples have different sizes ($w_1 \ne 0.5$) then, as $w_1$ changes, the generalized score statistic surprisingly tracks the multinomial based $T^{M}_{PPV}$ rather than the score statistic (7).

Figure 1.

Behavior of the generalized score statistic (solid curve), the score statistic (dashed curve) and the multinomial based Wald statistic TPPVM (dotted curve) in the two independent samples situation as a function of proportion of Test 1 positives among patients with positive result on either of the two diagnostic tests (ppv1 = 0.80, ppv2 = 0.85, total n = 900)
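The pattern described for Figure 1 can be reproduced numerically. The sketch below is an illustration under an assumption: sample estimates are set equal to the stated true values (ppv1 = 0.80, ppv2 = 0.85, total n = 900), which may differ from how the original figure was produced. The function name `independent_samples_stats` is hypothetical. With independent samples the concordance terms are zero, so the statistics follow from expressions (6) and (7) and from the Wald formula of Section 2.

```python
def independent_samples_stats(w1, ppv1=0.80, ppv2=0.85, total=900):
    """Generalized score, score, and Wald statistics for two independent samples.

    w1 is the proportion of Test 1 positives among all positives; estimates are
    set to the true predictive values (an assumption of this sketch).
    """
    n1 = w1 * total
    n2 = total - n1
    pp = w1 * ppv1 + (1 - w1) * ppv2              # pooled predictive value
    s = 1 / n1 + 1 / n2
    diff2 = (ppv1 - ppv2) ** 2
    w_term = (2 * pp - ppv1 - ppv2) * (2 * pp - 1)
    score = diff2 / (pp * (1 - pp) * s)            # expression (7)
    gs = diff2 / ((pp * (1 - pp) + w_term) * s)    # expression (6) with zero concordance term
    wald = diff2 / (ppv1 * (1 - ppv1) / n1 + ppv2 * (1 - ppv2) / n2)
    return gs, score, wald


for w1 in (0.2, 0.35, 0.5, 0.65, 0.8):
    gs, score, wald = independent_samples_stats(w1)
    print(f"w1={w1:.2f}  GS={gs:.3f}  score={score:.3f}  Wald={wald:.3f}")
```

At $w_1 = 0.5$ the generalized score and score statistics coincide, while away from balance the generalized score value lies much closer to the Wald value than to the score value.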

To alleviate this behavior of the generalized score statistic we develop a weighted generalized score statistic which always reduces to the score statistic when applied to two independent samples. Motivation for the form of the proposed statistic arises from comparison of the following two formulas, with the first one a re-expression of the score statistic (7) and the second one a re-expression of the generalized score statistic (6) in the two independent samples situation:

$$T^{S}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\left[w_1\left\{\dfrac{1}{n_{+\bullet\bullet}}\sum_i t_{i1}(d_{i1} - \widehat{PPV}_p)^2\right\} + w_2\left\{\dfrac{1}{n_{\bullet+\bullet}}\sum_i t_{i2}(d_{i2} - \widehat{PPV}_p)^2\right\}\right]\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}$$

$$T^{GS}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\left[w_2\left\{\dfrac{1}{n_{+\bullet\bullet}}\sum_i t_{i1}(d_{i1} - \widehat{PPV}_p)^2\right\} + w_1\left\{\dfrac{1}{n_{\bullet+\bullet}}\sum_i t_{i2}(d_{i2} - \widehat{PPV}_p)^2\right\}\right]\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}.$$

Recall that $w_1 = n_{+\bullet\bullet}/(n_{+\bullet\bullet} + n_{\bullet+\bullet})$ and $w_2 = 1 - w_1$. The only difference between the above two formulas is the opposite weighting of the averages in the curly brackets. This observation leads to consideration of weights (defined in Section 4.2) when computing the empirical covariance matrix utilized in derivation of the generalized score statistic. With these weights the generalized score statistic will be equal to the score statistic in the independent samples situation. An alternative intuition behind the weights is that if $w_1 > 0.5$ then the pooled estimate $\widehat{PPV}_p$ is influenced more by the Test 1 data; thus the distance $d_{i1} - \widehat{PPV}_p$ used in the empirical covariance matrix is too small and should be up-weighted, and the distance $d_{i2} - \widehat{PPV}_p$ down-weighted. Similarly, if $w_1 < 0.5$ then $\widehat{PPV}_p$ is influenced more by the Test 2 data and the distance $d_{i1} - \widehat{PPV}_p$ should be down-weighted, with the distance $d_{i2} - \widehat{PPV}_p$ up-weighted.

4.2 The proposed weighted generalized score statistic

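We propose weights $v_j^{PPV} = w_j (w_1 w_2)^{-1/2}$ $(j = 1, 2)$, i.e. $v_1^{PPV} = (n_{+\bullet\bullet}/n_{\bullet+\bullet})^{1/2}$ and $v_2^{PPV} = (n_{\bullet+\bullet}/n_{+\bullet\bullet})^{1/2}$, and the following weighted empirical covariance matrix:

$$cov_{we}^{H_0}(D_i) = \begin{bmatrix} v_1^{PPV}(d_{i1} - \widehat{PPV}_p) \\ v_2^{PPV}(d_{i2} - \widehat{PPV}_p) \end{bmatrix} \begin{bmatrix} v_1^{PPV}(d_{i1} - \widehat{PPV}_p) \\ v_2^{PPV}(d_{i2} - \widehat{PPV}_p) \end{bmatrix}^T.$$

The weighted empirical information matrix $I_{we}^{H_0}$ is obtained by substituting the above matrix for $cov(D_i)$ in expression (3). Subsequently, expression (4) with $I_{we}^{H_0}$ inserted in place of $I_e^{H_0}$ leads to the proposed weighted generalized score (wgs) statistic:

$$T^{WGS}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\left\{\widehat{PPV}_p(1-\widehat{PPV}_p) - 2\,C^{p}_{PPV}\right\}\left(\dfrac{1}{n_{+\bullet\bullet}} + \dfrac{1}{n_{\bullet+\bullet}}\right)}. \tag{8}$$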

The above statistic is the same as the generalized score statistic in a balanced design (i.e. when $w_1 = 0.5$), because then the term $W_{PPV}$ in formulation (6) of the generalized score statistic equals zero. However, a balanced situation is not likely in a paired design, and in the more common unbalanced situation these two statistics are different. In the independent samples case ($n_{++D} = n_{++\bar{D}} = 0$, i.e. $C^{p}_{PPV} = 0$), the proposed weighted generalized score statistic (8) reduces to the score statistic (7) even in an unbalanced design, in contrast to the generalized score statistic.

Similarly, for testing equality of two negative predictive values the weights are $v_1^{NPV} = (n_{-\bullet\bullet}/n_{\bullet-\bullet})^{1/2}$ and $v_2^{NPV} = (n_{\bullet-\bullet}/n_{-\bullet\bullet})^{1/2}$, and the following weighted generalized score statistic is suggested:

$$T^{WGS}_{NPV} = \frac{(\widehat{NPV}_1 - \widehat{NPV}_2)^2}{\left\{\widehat{NPV}_p(1-\widehat{NPV}_p) - 2\,C^{p}_{NPV}\right\}\left(\dfrac{1}{n_{-\bullet\bullet}} + \dfrac{1}{n_{\bullet-\bullet}}\right)}. \tag{9}$$
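To make the role of the weights concrete, the Python sketch below evaluates the denominator of expression (4) directly as a sum of squared, weighted residual contrasts over the cells of the paired table, and compares the result with the closed form (8). The names `score_type_ppv`, `wgs_closed_form`, and `flip`, and the dictionary layout, are assumptions made for this illustration. Setting both weights to one gives the unweighted generalized score statistic.

```python
from math import sqrt

# Counts keyed by (Test 1 result, Test 2 result, disease status); 1 = positive / diseased.
CAD = {(1, 1, 1): 473, (1, 0, 1): 29, (0, 1, 1): 81, (0, 0, 1): 25,
       (1, 1, 0): 22, (1, 0, 0): 46, (0, 1, 0): 44, (0, 0, 0): 151}


def flip(n):
    """Relabel the table so that NPV plays the role of PPV."""
    return {(1 - t1, 1 - t2, 1 - d): c for (t1, t2, d), c in n.items()}


def margins(n):
    n1 = sum(c for (t1, _, _), c in n.items() if t1 == 1)
    n2 = sum(c for (_, t2, _), c in n.items() if t2 == 1)
    p1 = (n[1, 1, 1] + n[1, 0, 1]) / n1
    p2 = (n[1, 1, 1] + n[0, 1, 1]) / n2
    pp = (n1 * p1 + n2 * p2) / (n1 + n2)
    return n1, n2, p1, p2, pp


def score_type_ppv(n, weighted=True):
    """Expression (4) with the (optionally weighted) empirical information matrix."""
    n1, n2, p1, p2, pp = margins(n)
    v1, v2 = (sqrt(n1 / n2), sqrt(n2 / n1)) if weighted else (1.0, 1.0)
    denom = 0.0
    for (t1, t2, d), count in n.items():
        # Per-patient contrast: t1*v1*(d - pp)/n1 - t2*v2*(d - pp)/n2, squared and summed.
        denom += count * ((t1 * v1 / n1 - t2 * v2 / n2) * (d - pp)) ** 2
    return (p1 - p2) ** 2 / denom


def wgs_closed_form(n):
    """Closed form (8)."""
    n1, n2, p1, p2, pp = margins(n)
    cp = (n[1, 1, 1] * (1 - pp) ** 2 + n[1, 1, 0] * pp ** 2) / (n1 + n2)
    return (p1 - p2) ** 2 / ((pp * (1 - pp) - 2 * cp) * (1 / n1 + 1 / n2))


print(round(score_type_ppv(CAD), 3), round(wgs_closed_form(CAD), 3))
print(round(score_type_ppv(flip(CAD)), 3), round(wgs_closed_form(flip(CAD)), 3))
```

The product of the two weights is one, which is why the concordance term $2C^{p}_{PPV}$ is unchanged by the weighting while the two variance terms are rebalanced.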

In the next Section we demonstrate with simulations that the weighted generalized score statistic has superior type I error behavior as compared to the generalized score statistic, as well as compared to the discussed multinomial based statistics.

5 Simulations

We performed simulations to evaluate size and power of the test statistics. Each true scenario considers predictive values, prevalence of disease, and a measure of association between the two test results. Rather than assume the same degree of association of diagnostic tests in each disease category, we consider a more general approach and allow differential association of the two tests for diseased and not diseased groups, as parameterized with odds ratios $OR_D$ and $OR_{\bar{D}}$, respectively. This may be a more realistic scenario because, for example, in the CAD data (Table 1) the odds ratio in the diseased group is 473 × 25/(29 × 81) = 5.03 and it is substantially larger than the odds ratio 22 × 151/(46 × 44) = 1.64 in the group with no disease.

Hence, seven true parameters define a simulation scenario ($OR_D$, $OR_{\bar{D}}$, ppv1, ppv2, npv1, npv2, and disease prevalence θ). These parameters define the corresponding true multinomial distribution reflecting the paired design considered in this paper, and the multinomial probabilities are shown in Table 3. To derive probabilities of this multinomial distribution, we first compute sensitivity $se$ and specificity $sp$ of each test based on the considered predictive values and the disease prevalence. Quantities $x$ and $y$ in Table 3 are margin compatible solutions of quadratic equations resulting from equating the cross-product ratio of the first two columns (diseased group) to $OR_D$ and of the last two columns (non-diseased group) to $OR_{\bar{D}}$, respectively. The conditional independence situation occasionally assumed in the literature, in which tests are considered independent conditional on the disease status, is equivalent to the assumption that $OR_D = OR_{\bar{D}} = 1$. In this situation, the quadratic equations reduce to linear equations with solutions $x = se_1 se_2$ and $y = sp_1 sp_2$, and the Table 3 probabilities are then simply the probabilities expected under no association between tests within a disease status category.

Table 3.

Multinomial distribution probabilities for specified predictive values ppv1, ppv2, npv1, and npv2, disease prevalence θ, and odds ratios $OR_D$ and $OR_{\bar{D}}$. Quantities $x$ and $y$ are compatible solutions of quadratic equations resulting from equating the cross-product ratio of the first two columns to $OR_D$ and of the last two columns to $OR_{\bar{D}}$, respectively.

| Test 1 result | Disease ($D$): Test 2 positive | Disease ($D$): Test 2 negative | No disease ($\bar{D}$): Test 2 positive | No disease ($\bar{D}$): Test 2 negative |
|---|---|---|---|---|
| Positive | $\theta x$ | $\theta(se_1 - x)$ | $(1-\theta)(1 - sp_1 - sp_2 + y)$ | $(1-\theta)(sp_2 - y)$ |
| Negative | $\theta(se_2 - x)$ | $\theta(1 - se_1 - se_2 + x)$ | $(1-\theta)(sp_1 - y)$ | $(1-\theta)y$ |

$se_1 = ppv_1(npv_1 - 1 + \theta)/\{\theta(ppv_1 + npv_1 - 1)\}$,
$sp_1 = npv_1(ppv_1 - \theta)/\{(1-\theta)(ppv_1 + npv_1 - 1)\}$,
$se_2 = ppv_2(npv_2 - 1 + \theta)/\{\theta(ppv_2 + npv_2 - 1)\}$,
$sp_2 = npv_2(ppv_2 - \theta)/\{(1-\theta)(ppv_2 + npv_2 - 1)\}$.
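The construction of Table 3 can be sketched in Python as follows. The quadratic for $x$ (and likewise $y$) is solved in closed form, taking the root that respects the margins; for an odds ratio of one the equation is linear. The function names `joint`, `sens_spec`, and `table3_probs`, and the dictionary layout keyed by (Test 1 result, Test 2 result, disease status), are assumptions made for this illustration.

```python
from math import sqrt


def joint(a, b, odds_ratio):
    """Joint probability of two binary events with margins a, b and a given odds ratio."""
    if abs(odds_ratio - 1.0) < 1e-12:
        return a * b                                   # conditional independence
    s = 1.0 + (a + b) * (odds_ratio - 1.0)
    disc = s * s - 4.0 * odds_ratio * (odds_ratio - 1.0) * a * b
    return (s - sqrt(disc)) / (2.0 * (odds_ratio - 1.0))  # margin compatible root


def sens_spec(ppv, npv, theta):
    """Sensitivity and specificity implied by ppv, npv and prevalence theta."""
    denom = ppv + npv - 1.0
    se = ppv * (npv - 1.0 + theta) / (theta * denom)
    sp = npv * (ppv - theta) / ((1.0 - theta) * denom)
    return se, sp


def table3_probs(ppv1, ppv2, npv1, npv2, theta, or_d, or_nd):
    """Cell probabilities keyed by (Test 1 result, Test 2 result, disease); 1 = positive / diseased."""
    se1, sp1 = sens_spec(ppv1, npv1, theta)
    se2, sp2 = sens_spec(ppv2, npv2, theta)
    x = joint(se1, se2, or_d)    # P(both tests positive | disease)
    y = joint(sp1, sp2, or_nd)   # P(both tests negative | no disease)
    return {
        (1, 1, 1): theta * x,
        (1, 0, 1): theta * (se1 - x),
        (0, 1, 1): theta * (se2 - x),
        (0, 0, 1): theta * (1 - se1 - se2 + x),
        (1, 1, 0): (1 - theta) * (1 - sp1 - sp2 + y),
        (1, 0, 0): (1 - theta) * (sp2 - y),
        (0, 1, 0): (1 - theta) * (sp1 - y),
        (0, 0, 0): (1 - theta) * y,
    }


probs = table3_probs(0.75, 0.85, 0.80, 0.90, theta=0.4, or_d=5.0, or_nd=2.0)
for cell, value in probs.items():
    print(cell, round(value, 4))
```

Not every combination of predictive values and prevalence yields valid probabilities; a scenario is usable only if all eight values are non-negative.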

For each considered true scenario, the corresponding multinomial distribution described in Table 3 was derived, and data (as in Table 2) with total sample size n were generated repeatedly from this distribution. We added a very small number to each cell of the generated data when a particular data set made it impossible to compute one of the considered statistics because of zero cells or predictive values equal to zero or one. We generated data four million times because results are displayed to three decimal digits, and this number of simulations ensures that the length of the 95% confidence interval for the proportion of rejected tests is less than 0.001. For each simulation, $H_0$ was rejected if a test statistic exceeded the 0.95 quantile of the $\chi^2_1$ distribution.
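A scaled-down version of this simulation procedure can be sketched with the Python standard library. This is only an illustration: it uses far fewer replications than the four million reported, evaluates only the weighted generalized score statistic for positive predictive values under a null scenario with equal predictive values, and, as a simplification that departs from the paper, counts a data set as a non-rejection when the statistic cannot be computed instead of adding a small number to each cell. All function names are hypothetical.

```python
import random
from math import sqrt

CHI2_1_Q95 = 3.841459  # 0.95 quantile of the chi-square distribution with 1 df


def joint(a, b, odds_ratio):
    """Joint probability with margins a, b and a given odds ratio (margin compatible root)."""
    if abs(odds_ratio - 1.0) < 1e-12:
        return a * b
    s = 1.0 + (a + b) * (odds_ratio - 1.0)
    return (s - sqrt(s * s - 4.0 * odds_ratio * (odds_ratio - 1.0) * a * b)) / (2.0 * (odds_ratio - 1.0))


def null_cell_probs(ppv, npv, theta, or_d, or_nd):
    """Table 3 probabilities when both tests share the same ppv and npv."""
    pos = (theta - 1.0 + npv) / (ppv + npv - 1.0)  # P(test positive)
    se = ppv * pos / theta
    sp = npv * (1.0 - pos) / (1.0 - theta)
    x, y = joint(se, se, or_d), joint(sp, sp, or_nd)
    return {(1, 1, 1): theta * x, (1, 0, 1): theta * (se - x),
            (0, 1, 1): theta * (se - x), (0, 0, 1): theta * (1 - 2 * se + x),
            (1, 1, 0): (1 - theta) * (1 - 2 * sp + y), (1, 0, 0): (1 - theta) * (sp - y),
            (0, 1, 0): (1 - theta) * (sp - y), (0, 0, 0): (1 - theta) * y}


def wgs_ppv(n):
    """Weighted generalized score statistic (8); None when it cannot be computed."""
    n1 = n[1, 1, 1] + n[1, 0, 1] + n[1, 1, 0] + n[1, 0, 0]
    n2 = n[1, 1, 1] + n[0, 1, 1] + n[1, 1, 0] + n[0, 1, 0]
    if n1 == 0 or n2 == 0:
        return None
    p1 = (n[1, 1, 1] + n[1, 0, 1]) / n1
    p2 = (n[1, 1, 1] + n[0, 1, 1]) / n2
    pp = (n1 * p1 + n2 * p2) / (n1 + n2)
    cp = (n[1, 1, 1] * (1 - pp) ** 2 + n[1, 1, 0] * pp ** 2) / (n1 + n2)
    denom = (pp * (1 - pp) - 2 * cp) * (1 / n1 + 1 / n2)
    return (p1 - p2) ** 2 / denom if denom > 0 else None


def empirical_size(probs, n_total, reps, seed):
    """Proportion of simulated data sets in which H0 is rejected at the 5 percent level."""
    rng = random.Random(seed)
    cells = list(probs)
    weights = [probs[c] for c in cells]
    rejections = 0
    for _ in range(reps):
        counts = dict.fromkeys(cells, 0)
        for cell in rng.choices(cells, weights=weights, k=n_total):
            counts[cell] += 1
        t = wgs_ppv(counts)
        if t is not None and t > CHI2_1_Q95:
            rejections += 1
    return rejections / reps


probs = null_cell_probs(ppv=0.75, npv=0.80, theta=0.4, or_d=5.0, or_nd=2.0)
print(empirical_size(probs, n_total=200, reps=2000, seed=1))
```

With only a few thousand replications the Monte Carlo error is on the order of half a percentage point, so the output is a rough check only and cannot resolve the third decimal digit.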

Table 4 displays empirical size for the multinomial Wald statistics, the generalized score statistic, and the proposed weighted generalized score statistics. The results for the multinomial based Wald statistic using the logit transformation and the GEE empirical Wald statistic are displayed in one column, because these statistics are the same, as shown in Section 3.1. We considered ppv1 = ppv2 = 0.75, npv1 = npv2 = 0.80, and varied prevalence of disease θ and total sample size n. We specified $OR_D = 5.0$ and $OR_{\bar{D}} = 2.0$, which is similar to the CAD data in Table 1. As seen in Table 4, the proposed weighted generalized score (wgs) statistics preserve the nominal 0.05 type I error better than the other statistics. The empirical size of $T^M$ is similar to or slightly larger than that of the generalized score statistic, as anticipated from expression (5), but the empirical sizes of both statistics are moderately inflated, mainly with lower disease prevalence or smaller sample sizes. The empirical size falls below the nominal level for the logit (or equivalently GEE empirical Wald) and log transformation multinomial statistics, especially for smaller sample sizes.

Table 4.

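Percent of 4,000,000 repeated simulations with $H_0$ rejected at the 5 percent level. True predictive values are PPV1 = PPV2 = 0.75 and NPV1 = NPV2 = 0.80, $OR_D = 5$ and $OR_{\bar{D}} = 2$ are odds ratios defining the degree of association between results of the two tests among diseased and not-diseased, n is the total number of patients with two tests, and θ is the prevalence of disease. The first five statistic columns refer to $H_0$: PPV1 = PPV2 and the last five to $H_0$: NPV1 = NPV2.

| θ | n | PPV: $T^{M}$ | PPV: $T^{M}_{\log}$ | PPV: $T^{M}_{\mathrm{logit}}$ (= $T^{GW}$) | PPV: $T^{GS}$ | PPV: $T^{WGS}$ | NPV: $T^{M}$ | NPV: $T^{M}_{\log}$ | NPV: $T^{M}_{\mathrm{logit}}$ (= $T^{GW}$) | NPV: $T^{GS}$ | NPV: $T^{WGS}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 50 | 9.3 | 3.6 | 0.9 | 7.0 | 4.2 | 4.8 | 3.8 | 3.7 | 4.8 | 4.2 |
| | 100 | 6.6 | 4.4 | 2.4 | 6.0 | 5.0 | 5.1 | 4.7 | 4.7 | 5.1 | 4.8 |
| | 200 | 5.7 | 4.6 | 4.2 | 5.5 | 5.0 | 5.1 | 4.8 | 4.8 | 5.1 | 4.9 |
| | 300 | 5.5 | 4.7 | 4.5 | 5.3 | 5.0 | 5.0 | 4.9 | 4.9 | 5.0 | 4.9 |
| | 400 | 5.4 | 4.8 | 4.7 | 5.3 | 5.0 | 5.0 | 4.9 | 4.9 | 5.0 | 4.9 |
| | 500 | 5.3 | 4.8 | 4.7 | 5.2 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 |
| 0.4 | 50 | 6.4 | 4.4 | 2.6 | 6.1 | 4.9 | 5.4 | 4.0 | 3.7 | 5.3 | 4.5 |
| | 100 | 5.7 | 4.6 | 4.2 | 5.6 | 5.0 | 5.3 | 4.6 | 4.6 | 5.2 | 4.9 |
| | 200 | 5.3 | 4.8 | 4.7 | 5.3 | 5.0 | 5.1 | 4.8 | 4.8 | 5.1 | 5.0 |
| | 300 | 5.2 | 4.9 | 4.8 | 5.2 | 5.0 | 5.1 | 4.9 | 4.9 | 5.1 | 4.9 |
| | 400 | 5.2 | 4.9 | 4.8 | 5.1 | 5.0 | 5.1 | 4.9 | 4.9 | 5.1 | 5.0 |
| | 500 | 5.1 | 4.9 | 4.9 | 5.1 | 5.0 | 5.0 | 4.9 | 4.9 | 5.0 | 5.0 |
| 0.5 | 50 | 5.9 | 4.5 | 3.8 | 5.8 | 4.9 | 5.6 | 3.8 | 2.9 | 5.4 | 4.5 |
| | 100 | 5.4 | 4.8 | 4.6 | 5.4 | 5.0 | 5.4 | 4.5 | 4.3 | 5.3 | 4.9 |
| | 200 | 5.2 | 4.9 | 4.8 | 5.2 | 5.0 | 5.2 | 4.8 | 4.7 | 5.2 | 5.0 |
| | 300 | 5.1 | 4.9 | 4.9 | 5.1 | 5.0 | 5.1 | 4.8 | 4.8 | 5.1 | 5.0 |
| | 400 | 5.1 | 4.9 | 4.9 | 5.1 | 5.0 | 5.1 | 4.9 | 4.9 | 5.1 | 5.0 |
| | 500 | 5.1 | 5.0 | 4.9 | 5.1 | 5.0 | 5.1 | 4.9 | 4.9 | 5.1 | 5.0 |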

Table 5 summarizes empirical power when the true predictive values are ppv1 = 0.75, ppv2 = 0.85, npv1 = 0.80, and npv2 = 0.90. Here we also consider $OR_D = 5.0$ and $OR_{\bar{D}} = 2.0$, and vary prevalence of disease θ and total sample size n. Empirical power of the multinomial $T^M$ and the generalized score statistics may be somewhat overstated because of the inflated type I error demonstrated in Table 4. Power of the proposed weighted generalized score statistic is slightly higher than or similar to power of the generalized score statistic. Although not shown, as expected, power for the conditional independence situation is less than power with the chosen positive association between the two tests.

Table 5.

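Percent of 4,000,000 repeated simulations with $H_0$ rejected at the 5 percent level. True predictive values are PPV1 = 0.75, PPV2 = 0.85, NPV1 = 0.80, and NPV2 = 0.90, $OR_D = 5$ and $OR_{\bar{D}} = 2$ are odds ratios defining the degree of association between results of the two tests among diseased and not-diseased, n is the total number of patients with two tests, and θ is the prevalence of disease. The first five statistic columns refer to $H_0$: PPV1 = PPV2 and the last five to $H_0$: NPV1 = NPV2.

| θ | n | PPV: $T^{M}$ | PPV: $T^{M}_{\log}$ | PPV: $T^{M}_{\mathrm{logit}}$ (= $T^{GW}$) | PPV: $T^{GS}$ | PPV: $T^{WGS}$ | NPV: $T^{M}$ | NPV: $T^{M}_{\log}$ | NPV: $T^{M}_{\mathrm{logit}}$ (= $T^{GW}$) | NPV: $T^{GS}$ | NPV: $T^{WGS}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3 | 50 | 11.7 | 4.4 | 4.0 | 8.8 | 10.6 | 51.4 | 47.3 | 40.2 | 51.4 | 47.4 |
| | 100 | 15.0 | 9.5 | 12.2 | 13.4 | 16.6 | 82.3 | 81.1 | 80.0 | 82.3 | 81.2 |
| | 200 | 24.8 | 20.3 | 26.0 | 23.5 | 27.5 | 98.5 | 98.4 | 98.3 | 98.5 | 98.4 |
| | 300 | 35.0 | 31.2 | 37.1 | 33.8 | 37.9 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 |
| | 400 | 44.7 | 41.5 | 47.0 | 43.7 | 47.6 | 100 | 100 | 100 | 100 | 100 |
| | 500 | 53.5 | 50.8 | 55.8 | 52.6 | 56.2 | 100 | 100 | 100 | 100 | 100 |
| 0.4 | 50 | 16.2 | 11.5 | 8.8 | 15.3 | 15.4 | 34.6 | 30.2 | 23.4 | 34.3 | 32.1 |
| | 100 | 26.6 | 22.8 | 23.9 | 25.9 | 26.7 | 61.5 | 58.9 | 57.8 | 61.3 | 60.4 |
| | 200 | 46.7 | 44.1 | 46.1 | 46.2 | 47.3 | 89.3 | 88.7 | 88.7 | 89.3 | 89.1 |
| | 300 | 63.3 | 61.6 | 63.3 | 62.9 | 63.9 | 97.6 | 97.5 | 97.5 | 97.6 | 97.5 |
| | 400 | 75.7 | 74.6 | 75.9 | 75.4 | 76.2 | 99.5 | 99.5 | 99.5 | 99.5 | 99.5 |
| | 500 | 84.4 | 83.7 | 84.6 | 84.2 | 84.8 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 |
| 0.5 | 50 | 23.6 | 19.5 | 16.7 | 23.3 | 21.9 | 22.3 | 17.5 | 12.3 | 21.5 | 21.7 |
| | 100 | 40.7 | 38.0 | 38.0 | 40.5 | 40.0 | 41.7 | 37.9 | 37.1 | 41.2 | 41.6 |
| | 200 | 68.2 | 66.9 | 67.3 | 68.1 | 68.0 | 70.1 | 68.4 | 69.3 | 69.8 | 70.4 |
| | 300 | 84.6 | 84.0 | 84.3 | 84.5 | 84.6 | 86.3 | 85.6 | 86.1 | 86.2 | 86.6 |
| | 400 | 93.1 | 92.8 | 93.0 | 93.0 | 93.1 | 94.2 | 93.9 | 94.2 | 94.2 | 94.4 |
| | 500 | 97.1 | 97.0 | 97.0 | 97.1 | 97.1 | 97.7 | 97.6 | 97.7 | 97.7 | 97.7 |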

In conclusion, the simulation results support use of the proposed weighted generalized score statistics by demonstrating excellent size and power behavior.

6 Example

We consider the coronary artery disease data [1] presented in Table 1 (n = 871) and provide detailed calculations which may be easily followed by practitioners with their own data.

First we compute the weighted generalized score statistic for testing equality of positive predictive values. The number of patients with positive Test 1 is $n_{+\bullet\bullet} = 473+29+22+46 = 570$ and with positive Test 2 is $n_{\bullet+\bullet} = 473+81+22+44 = 620$. This is an unbalanced situation because $n_{+\bullet\bullet} \neq n_{\bullet+\bullet}$. Positive predictive values are estimated as $\widehat{PPV}_1 = (473+29)/570 = 0.881$ and $\widehat{PPV}_2 = (473+81)/620 = 0.894$. The pooled estimate is $\widehat{PPV}_p = (473+29+473+81)/(570+620) = 0.887$. Subsequently, $\widehat{PPV}_p(1-\widehat{PPV}_p) = 0.100$, $473(1-\widehat{PPV}_p)^2 + 22\,\widehat{PPV}_p^2 = 23.322$, and $2C_p^{PPV} = 2 \times 23.322/(570+620) = 0.0392$. Substituting computed values we obtain

$$T^{WGS}_{PPV} = \frac{(\widehat{PPV}_1 - \widehat{PPV}_2)^2}{\left\{\widehat{PPV}_p(1-\widehat{PPV}_p) - 2C_p^{PPV}\right\}\left(\frac{1}{570}+\frac{1}{620}\right)} = 0.807.$$

Hence, utilizing the $\chi^2_1$ distribution, we have p = 0.37 and conclude that there is no evidence of a difference between the two positive predictive values. As an aside, observe that the term $2C_p^{PPV}$ reduces the denominator substantially and thus correlation between Test 1 and Test 2 has a major impact in this data set. For comparison, the generalized score statistic is 0.802 and the multinomial based Wald statistics are as follows: 0.802 for the untransformed version, 0.800 for the log transformed version, and 0.806 for the logit transformed version (or equivalently the GEE empirical Wald statistic).

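The weighted generalized score statistic for comparison of two negative predictive values is computed similarly. We have $n_{-\bullet\bullet} = 81+25+44+151 = 301$, $n_{\bullet-\bullet} = 29+25+46+151 = 251$, and the estimated negative predictive values are $\widehat{NPV}_1 = (44+151)/301 = 0.648$ and $\widehat{NPV}_2 = (46+151)/251 = 0.785$. The pooled estimate is $\widehat{NPV}_p = (44+151+46+151)/(301+251) = 0.710$. Then $\widehat{NPV}_p(1-\widehat{NPV}_p) = 0.206$, $25\,\widehat{NPV}_p^2 + 151(1-\widehat{NPV}_p)^2 = 25.294$, and $2C_p^{NPV} = 2 \times 25.294/(301+251) = 0.0916$. Subsequently,

$$T^{WGS}_{NPV} = \frac{(\widehat{NPV}_1 - \widehat{NPV}_2)^2}{\left\{\widehat{NPV}_p(1-\widehat{NPV}_p) - 2C_p^{NPV}\right\}\left(\frac{1}{301}+\frac{1}{251}\right)} = 22.502.$$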

Here, p < 0.001, and we conclude that the two negative predictive values differ. The generalized score statistic is 23.579 and the multinomial based Wald statistics are as follows: 23.725 for the untransformed version, 22.437 for the log transformed version, and 21.742 for the logit transformed version (or equivalently for the GEE empirical Wald statistic).
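The two hand calculations above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names and argument names are hypothetical, the counts are those quoted in this section, and the p-value uses the identity that for one degree of freedom the chi-square tail probability equals erfc(sqrt(t/2)).

```python
import math


def wgs_statistic(n1, n2, x1, x2, both_success, both_failure):
    """Weighted generalized score statistic for H0: PV1 = PV2 (paired design).

    n1, n2        -- patients with the relevant result (positive for PPV,
                     negative for NPV) on Test 1 and on Test 2
    x1, x2        -- of those, how many count toward the predictive value
                     (diseased for PPV, not diseased for NPV)
    both_success  -- patients with that result on BOTH tests who count
                     toward the predictive value
    both_failure  -- patients with that result on BOTH tests who do not
    """
    pv1, pv2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Pooled covariance term C_p, built from the concordant pairs only.
    c_pooled = (both_success * (1 - pooled) ** 2
                + both_failure * pooled ** 2) / (n1 + n2)
    variance = (pooled * (1 - pooled) - 2 * c_pooled) * (1 / n1 + 1 / n2)
    return (pv1 - pv2) ** 2 / variance


def chi2_1df_pvalue(statistic):
    # For 1 degree of freedom, P(chi-square > t) = erfc(sqrt(t / 2)).
    return math.erfc(math.sqrt(statistic / 2))


# Counts quoted in the example (coronary artery disease data, n = 871).
t_ppv = wgs_statistic(570, 620, 473 + 29, 473 + 81, 473, 22)
t_npv = wgs_statistic(301, 251, 44 + 151, 46 + 151, 151, 25)
p_ppv = chi2_1df_pvalue(t_ppv)
p_npv = chi2_1df_pvalue(t_npv)
print(round(t_ppv, 3), round(p_ppv, 2))
print(round(t_npv, 3), p_npv < 0.001)
```

The same function serves both hypotheses because the negative predictive value case only swaps which concordant group plays the role of "success". The printed statistics should come out close to the values reported in the text.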

7 Discussion

We have proposed new weighted generalized score test statistics (8) and (9) for testing equality of positive and negative predictive values of two diagnostic tests in a paired design. Simulations indicate that these statistics preserve type I error better than the generalized score statistic or the multinomial distribution based Wald statistics. The proposed statistics are intuitive, simple to compute, and in the absence of correlated data (unpaired design) they naturally reduce to the commonly used score statistic for comparison of two proportions with independent samples. Thus, we recommend the weighted generalized score (WGS) test statistic for hypothesis testing and for corresponding sample size computations. We have also developed novel simple formulas for the existing Wald statistics, and they may be used for computation of confidence intervals for the difference of two predictive values. Although the simple univariable formulas for comparison of predictive values provide intuitive insight, a GEE based regression analysis is necessary when adjustment for covariates is of interest, especially if performance of a test varies substantially according to patient characteristics [7]. This is similar to the co-existence of logistic regression methodology and the commonly used “paper-and-pencil” formulas for comparison of two proportions. Finally, we have proposed the weighted generalized score statistic in the setting of diagnostic tests, but with further development the introduced concept may lead to an improved score test within the general GEE framework.

Acknowledgement

This project was supported by the National Center for Research Resources and the National Center for Advancing Translational Science of the National Institutes of Health through the Clinical and Translational Science Award Number UL1RR024128.

Appendix A. Derivation of multinomial based variance-covariance matrix of predictive values

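The matrix of derivatives of $f(ppv_1) = f(\pi_{+\bullet D}/\pi_{+\bullet\bullet})$ and $f(ppv_2) = f(\pi_{\bullet+D}/\pi_{\bullet+\bullet})$ with respect to $\pi = [\pi_{++D}, \pi_{-+D}, \pi_{+-D}, \pi_{--D}, \pi_{++\bar D}, \pi_{-+\bar D}, \pi_{+-\bar D}, \pi_{--\bar D}]^T$ is $G = BD$, where $D$ is a diagonal matrix with derivatives $f'(ppv_1)$ and $f'(ppv_2)$ on the diagonal, and

$$B = \begin{bmatrix} 1-PPV_1 & 0 & 1-PPV_1 & 0 & -PPV_1 & 0 & -PPV_1 & 0 \\ 1-PPV_2 & 1-PPV_2 & 0 & 0 & -PPV_2 & -PPV_2 & 0 & 0 \end{bmatrix}^T \begin{bmatrix} \frac{1}{\pi_{+\bullet\bullet}} & 0 \\ 0 & \frac{1}{\pi_{\bullet+\bullet}} \end{bmatrix}.$$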

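Utilizing the δ-technique and the variance-covariance matrix $\Sigma_\pi$ of the multinomial distribution we have $B^T\pi = 0$ and

$$\operatorname{cov}^{f(PPV)}_{M,\delta}\left(\begin{bmatrix} f(\widehat{PPV}_1) \\ f(\widehat{PPV}_2) \end{bmatrix}\right) = \hat G^T \hat\Sigma_\pi \hat G = \hat D \hat B^T \frac{\operatorname{diag}(\hat\pi) - \hat\pi\hat\pi^T}{n} \hat B \hat D = \hat D \hat B^T \frac{\operatorname{diag}(\hat\pi)}{n} \hat B \hat D$$

$$= \begin{bmatrix} f'(\widehat{PPV}_1)^2 \times \dfrac{\widehat{PPV}_1(1-\widehat{PPV}_1)}{n_{+\bullet\bullet}} & f'(\widehat{PPV}_1) f'(\widehat{PPV}_2) \times C^{PPV}\left(\dfrac{1}{n_{+\bullet\bullet}}+\dfrac{1}{n_{\bullet+\bullet}}\right) \\ f'(\widehat{PPV}_1) f'(\widehat{PPV}_2) \times C^{PPV}\left(\dfrac{1}{n_{+\bullet\bullet}}+\dfrac{1}{n_{\bullet+\bullet}}\right) & f'(\widehat{PPV}_2)^2 \times \dfrac{\widehat{PPV}_2(1-\widehat{PPV}_2)}{n_{\bullet+\bullet}} \end{bmatrix}$$

where $C^{PPV} = \dfrac{1}{n_{+\bullet\bullet}+n_{\bullet+\bullet}}\left\{n_{++D}(1-\widehat{PPV}_1)(1-\widehat{PPV}_2) + n_{++\bar D}\,\widehat{PPV}_1\widehat{PPV}_2\right\}$.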

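Similarly, the variance-covariance matrix $\operatorname{cov}^{f(NPV)}_{M,\delta}$ of $f(\widehat{NPV}_1)$ and $f(\widehat{NPV}_2)$ is

$$\begin{bmatrix} f'(\widehat{NPV}_1)^2 \times \dfrac{\widehat{NPV}_1(1-\widehat{NPV}_1)}{n_{-\bullet\bullet}} & f'(\widehat{NPV}_1) f'(\widehat{NPV}_2) \times C^{NPV}\left(\dfrac{1}{n_{-\bullet\bullet}}+\dfrac{1}{n_{\bullet-\bullet}}\right) \\ f'(\widehat{NPV}_1) f'(\widehat{NPV}_2) \times C^{NPV}\left(\dfrac{1}{n_{-\bullet\bullet}}+\dfrac{1}{n_{\bullet-\bullet}}\right) & f'(\widehat{NPV}_2)^2 \times \dfrac{\widehat{NPV}_2(1-\widehat{NPV}_2)}{n_{\bullet-\bullet}} \end{bmatrix}$$

where $C^{NPV} = \dfrac{1}{n_{-\bullet\bullet}+n_{\bullet-\bullet}}\left\{n_{--D}\,\widehat{NPV}_1\widehat{NPV}_2 + n_{--\bar D}(1-\widehat{NPV}_1)(1-\widehat{NPV}_2)\right\}$.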
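With the identity transformation (f' = 1), the matrices above give the variance of the difference of two predictive values as the sum of the two diagonal terms minus twice the off-diagonal term. The sketch below applies this to the counts from the example in Section 6. The function name, its arguments, and the normal quantile 1.96 for a 95% interval are illustrative assumptions, not part of the paper.

```python
import math


def wald_difference(n1, n2, x1, x2, both_success, both_failure, z=1.96):
    """Untransformed Wald statistic and confidence interval for PV1 - PV2.

    n1, n2        -- patients with the relevant test result on Test 1, Test 2
    x1, x2        -- of those, how many count toward the predictive value
    both_success  -- concordant patients who count toward the predictive value
    both_failure  -- concordant patients who do not
    z             -- normal quantile (1.96 assumed here for a 95% interval)
    """
    pv1, pv2 = x1 / n1, x2 / n2
    # Covariance term C from Appendix A, using the unpooled estimates.
    c = (both_success * (1 - pv1) * (1 - pv2)
         + both_failure * pv1 * pv2) / (n1 + n2)
    var_diff = (pv1 * (1 - pv1) / n1 + pv2 * (1 - pv2) / n2
                - 2 * c * (1 / n1 + 1 / n2))
    diff = pv1 - pv2
    half_width = z * math.sqrt(var_diff)
    return diff ** 2 / var_diff, (diff - half_width, diff + half_width)


# PPV: positives on Test 1 / Test 2; concordant positives 473 (D), 22 (not D).
wald_ppv, ci_ppv = wald_difference(570, 620, 473 + 29, 473 + 81, 473, 22)
# NPV: negatives on Test 1 / Test 2; concordant negatives 151 (not D), 25 (D).
wald_npv, ci_npv = wald_difference(301, 251, 44 + 151, 46 + 151, 151, 25)
print(round(wald_ppv, 3), ci_ppv)
print(round(wald_npv, 3), ci_npv)
```

The Wald statistics should agree closely with the untransformed values quoted in Section 6 (0.802 and 23.725), which gives a quick check that the counts were entered in the right roles.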

Appendix B. Derivation of model based variance-covariance matrix

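$$I_m = \sum_i \begin{bmatrix} t_{i1} & t_{i2} \\ t_{i1} & 0 \end{bmatrix} \begin{bmatrix} \widehat{PPV}_1(1-\widehat{PPV}_1) & 0 \\ 0 & \widehat{PPV}_2(1-\widehat{PPV}_2) \end{bmatrix} \begin{bmatrix} t_{i1} & t_{i1} \\ t_{i2} & 0 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} n_{+\bullet\bullet}\widehat{PPV}_1(1-\widehat{PPV}_1) + \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} n_{\bullet+\bullet}\widehat{PPV}_2(1-\widehat{PPV}_2).$$

Then

$$V_m = I_m^{-1} = \frac{1}{n_{+\bullet\bullet}\widehat{PPV}_1(1-\widehat{PPV}_1)} \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} + \frac{1}{n_{\bullet+\bullet}\widehat{PPV}_2(1-\widehat{PPV}_2)} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.$$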

Appendix C. Derivation of empirical variance-covariance matrix

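$$I_e = \sum_i \begin{bmatrix} t_{i1} & t_{i2} \\ t_{i1} & 0 \end{bmatrix} \begin{bmatrix} (d_{i1}-\widehat{PPV}_1)^2 & (d_{i1}-\widehat{PPV}_1)(d_{i2}-\widehat{PPV}_2) \\ (d_{i2}-\widehat{PPV}_2)(d_{i1}-\widehat{PPV}_1) & (d_{i2}-\widehat{PPV}_2)^2 \end{bmatrix} \begin{bmatrix} t_{i1} & t_{i1} \\ t_{i2} & 0 \end{bmatrix}$$

$$= \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \sum_i t_{i1}(d_{i1}-\widehat{PPV}_1)^2 + \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \sum_i t_{i2}(d_{i2}-\widehat{PPV}_2)^2 + \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix} \sum_i t_{i1}t_{i2}(d_{i1}-\widehat{PPV}_1)(d_{i2}-\widehat{PPV}_2).$$

Since $\sum_i t_{i1}(d_{i1}-\widehat{PPV}_1)^2 = n_{+\bullet\bullet}\widehat{PPV}_1(1-\widehat{PPV}_1)$, $\sum_i t_{i2}(d_{i2}-\widehat{PPV}_2)^2 = n_{\bullet+\bullet}\widehat{PPV}_2(1-\widehat{PPV}_2)$, and $\sum_i t_{i1}t_{i2}(d_{i1}-\widehat{PPV}_1)(d_{i2}-\widehat{PPV}_2) = (n_{+\bullet\bullet}+n_{\bullet+\bullet})\,C^{PPV}$ (notation in Appendix A), using $I_m$ and $V_m = I_m^{-1}$ from Appendix B leads to the empirical variance-covariance matrix $V_e = V_m I_e V_m$ expressed as

$$V_e = V_m\left\{I_m + (n_{+\bullet\bullet}+n_{\bullet+\bullet})\,C^{PPV} \begin{bmatrix} 2 & 1 \\ 1 & 0 \end{bmatrix}\right\}V_m = V_m + \begin{bmatrix} 0 & 1 \\ 1 & -2 \end{bmatrix} \frac{C^{PPV}}{\widehat{PPV}_1(1-\widehat{PPV}_1)\,\widehat{PPV}_2(1-\widehat{PPV}_2)}\left(\frac{1}{n_{+\bullet\bullet}}+\frac{1}{n_{\bullet+\bullet}}\right).$$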
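The closed forms in Appendices B and C can be checked numerically. The sketch below uses the positive-test counts from the example in Section 6 and plain 2x2 list arithmetic. It assumes a logit link, so that the second regression coefficient is the difference of the logits of the two positive predictive values. That assumption and all variable names are illustrative rather than taken from the paper.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Counts from the example in Section 6 (positive-test records only).
n1, n2 = 570, 620            # patients positive on Test 1, on Test 2
both_d, both_nd = 473, 22    # positive on both tests: diseased, not diseased
ppv1, ppv2 = (473 + 29) / n1, (473 + 81) / n2
w1, w2 = ppv1 * (1 - ppv1), ppv2 * (1 - ppv2)
c = (both_d * (1 - ppv1) * (1 - ppv2) + both_nd * ppv1 * ppv2) / (n1 + n2)

# Appendix B: model-based information and its closed-form inverse.
i_m = [[n1 * w1 + n2 * w2, n1 * w1], [n1 * w1, n1 * w1]]
v_m = [[1 / (n2 * w2), -1 / (n2 * w2)],
       [-1 / (n2 * w2), 1 / (n1 * w1) + 1 / (n2 * w2)]]
identity_check = matmul(v_m, i_m)

# Appendix C: empirical information, then the sandwich Vm Ie Vm.
cross = (n1 + n2) * c
i_e = [[i_m[0][0] + 2 * cross, i_m[0][1] + cross],
       [i_m[1][0] + cross, i_m[1][1]]]
v_e = matmul(matmul(v_m, i_e), v_m)

# Closed form from the last display of Appendix C.
k = c / (w1 * w2) * (1 / n1 + 1 / n2)
v_e_closed = [[v_m[0][0], v_m[0][1] + k], [v_m[1][0] + k, v_m[1][1] - 2 * k]]

# Empirical Wald statistic for the test-type coefficient (logit link assumed).
beta = math.log(ppv1 / (1 - ppv1)) - math.log(ppv2 / (1 - ppv2))
gee_wald = beta ** 2 / v_e[1][1]
print(round(gee_wald, 3))
```

If the reconstruction is right, the sandwich product and the closed form agree to rounding error, and the resulting statistic should be close to the GEE empirical Wald value of 0.806 quoted in Section 6.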

Appendix D. Derivation of score vector

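$$S_{H_0} = \sum_i \begin{bmatrix} t_{i1} & t_{i2} \\ z_{i1}t_{i1} & z_{i2}t_{i2} \end{bmatrix} \begin{bmatrix} d_{i1}-\hat d_{i1}^{H_0} \\ d_{i2}-\hat d_{i2}^{H_0} \end{bmatrix} = \sum_i \begin{bmatrix} t_{i1} & t_{i2} \\ t_{i1} & 0 \end{bmatrix} \begin{bmatrix} d_{i1}-\widehat{PPV}_p \\ d_{i2}-\widehat{PPV}_p \end{bmatrix} = \begin{bmatrix} \sum_i d_{i1}t_{i1} + \sum_i d_{i2}t_{i2} - \widehat{PPV}_p \sum_i (t_{i1}+t_{i2}) \\ \sum_i d_{i1}t_{i1} - \widehat{PPV}_p \sum_i t_{i1} \end{bmatrix}$$

$$= \begin{bmatrix} n_{+\bullet D} + n_{\bullet+D} - (n_{+\bullet\bullet}+n_{\bullet+\bullet})\widehat{PPV}_p \\ n_{+\bullet D} - n_{+\bullet\bullet}\widehat{PPV}_p \end{bmatrix} = \begin{bmatrix} 0 \\ n_{+\bullet\bullet}(\widehat{PPV}_1-\widehat{PPV}_p) \end{bmatrix} = \begin{bmatrix} 0 \\ \dfrac{\widehat{PPV}_1-\widehat{PPV}_2}{\frac{1}{n_{+\bullet\bullet}}+\frac{1}{n_{\bullet+\bullet}}} \end{bmatrix}.$$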
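The last equality in the score vector follows from the definition of the pooled estimate. A short sketch of the omitted algebra, writing $n_1$ and $n_2$ as shorthand (not the paper's notation) for the numbers of patients with positive Test 1 and positive Test 2:

```latex
\begin{align*}
\widehat{PPV}_p &= \frac{n_1\widehat{PPV}_1 + n_2\widehat{PPV}_2}{n_1+n_2},\\
n_1\bigl(\widehat{PPV}_1-\widehat{PPV}_p\bigr)
  &= \frac{n_1 n_2}{n_1+n_2}\bigl(\widehat{PPV}_1-\widehat{PPV}_2\bigr)
   = \frac{\widehat{PPV}_1-\widehat{PPV}_2}{\dfrac{1}{n_1}+\dfrac{1}{n_2}}.
\end{align*}
```

The first component of the score vector is zero for the same reason: the pooled estimate is the total number of diseased positive-test records divided by the total number of positive-test records.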

References

  • [1] Weiner DA, Ryan TJ, McCabe CH, Kennedy JW, Schloss M, Tristani F, Chaitman BR, Fisher LD. Exercise stress-testing: correlations among history of angina, ST-segment response and prevalence of coronary-artery disease in the Coronary Artery Surgery Study (CASS). The New England Journal of Medicine. 1979;301:230–235. doi: 10.1056/NEJM197908023010502.
  • [2] Moskowitz CS, Pepe MS. Comparing the predictive values of diagnostic tests: sample size and analysis for paired study designs. Clinical Trials. 2006;3:272–279. doi: 10.1191/1740774506cn147oa.
  • [3] Roldán Nofuentes JA, Luna del Castillo JD, Montero Alonso MA. Global hypothesis test to simultaneously compare the predictive values of two binary diagnostic tests. Computational Statistics and Data Analysis. 2012;56:1161–1173. doi: 10.1016/j.csda.2011.06.003.
  • [4] Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22. doi: 10.1093/biomet/73.1.13.
  • [5] Boos DD. On generalized score tests. American Statistician. 1992;46:327–333. doi: 10.2307/2685328.
  • [6] Rotnitzky A, Jewell NP. Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika. 1990;77:485–497. doi: 10.1093/biomet/77.3.485.
  • [7] Leisenring W, Alonzo T, Pepe MS. Comparisons of predictive values of binary medical diagnostic tests for paired designs. Biometrics. 2000;56:345–351. doi: 10.1111/j.0006-341X.2000.00345.x.
  • [8] Wang W, Davis CS, Soong S. Comparison of predictive values of two diagnostic tests from the same sample of subjects using weighted least squares. Statistics in Medicine. 2006;25:2215–2229. doi: 10.1002/sim.2332.
  • [9] Pepe MS, Anderson GL. A cautionary note for inference for marginal regression models with longitudinal data and general correlated response data. Communications in Statistics. 1994;23:939–951. doi: 10.1080/03610919408813210.
