Statistics Corner

S White

2002 Apr;14(1):26–28.


PMCID: PMC3345413 PMID: 27528923

Welcome to this new feature in the Journal! In this “Corner” we will look at a specific situation and the statistical methods that can be applied to the analysis of data.

Consider this scenario:

You want to know how well a simple test may diagnose something, and save the need for a more difficult, invasive or expensive test that would provide a definitive answer.

For example, in a patient with palpable lymph nodes, is the cause tuberculosis or not?

To answer this with confidence you need an invasive procedure, preferably excision biopsy and histology (EBH). But such a procedure is not only invasive but also difficult, expensive and slow. You may therefore want to know how well a non-invasive, quick method would provide the answer.

One method (M₁) could be whether simple examination finds that the nodes are ‘matted’ or not, ie whether they seem to be stuck together in groups. Another method could be a Mantoux (tuberculin) test (M₂). You want to evaluate each of these tests on its own for usefulness in diagnosing TB as the cause of the lymphadenopathy. You also want to know if one test is superior to the other.

You plan to conduct a study to evaluate and compare these two methods. The best design uses each method, M₁ and M₂, independently, as well as the invasive one, on all patients studied who have palpable lymph nodes. In all the calculations that follow, we will assume that excision biopsy and histology (EBH) is 100% accurate - it will serve as our ‘gold standard’. How should you plan to analyse your data?

You will need to select statistical tests to:

Evaluate each method;
Compare the two methods.

We will consider these questions in turn. To illustrate the statistical tests to be described, suppose you have collected data on 100 patients. Some of the data are shown in Table 1 (only 6 patients are shown; the full table would list all 100 cases), and the full data set can be summarised in a three-way cross-tabulation (Table 2).

Table 1.

Data listing according to diagnostic method (EBH = excision biopsy and histology, M1 = palpation, M2 = Mantoux test)

Patient	EBH	M1	M2
1	TB	TB	TB
2	TB	Not TB	TB
3	Not TB	Not TB	TB
4	TB	TB	TB
5	Not TB	Not TB	Not TB
(etc):	:	:	:
100	Not TB	TB	Not TB


Table 2.

Cross-tabulation of classifications using methods M1 and M2 by actual TB status (determined by EBH)

Result of methods		Actual status
M₁	M₂	TB	Not TB
TB	TB	39	0
TB	Not TB	18	11
Not TB	TB	0	2
Not TB	Not TB	6	24
Total		63	37


Assessment of the accuracy of each method

When the true diagnosis is known there are four commonly used statistics to measure the accuracy. Do you know what they are?

Sensitivity is the first of the four statistics. It indicates what proportion of those with the disease (TB) are correctly classified. It can be calculated using the first column of figures in Table 2.

Hence the sensitivity of method M₁ is 90% ((39+18)/63 = 57/63) and for method M₂ it is 62% ((39+0)/63 = 39/63). So, in the sample considered, M₁ is more sensitive than M₂ - ie it is better at diagnosing TB when the patient has TB.

The second statistic is specificity. This indicates what proportion of cases without TB are correctly identified. It can be calculated using the second column of figures in Table 2. The specificity of method M₁ is 70% (26/37) and for method M₂ it is 95% (35/37). So now we see that M₂ is more specific in this sample than M₁ - ie it is better at not diagnosing TB when the patient does not have TB. In summary, method M₁ is more sensitive but M₂ is more specific.

These two statistics indicate the proportions of patients correctly classified. By contrast, the positive predictive value (PPV) [and negative predictive value (NPV)] indicate what proportion of those classified to have [not have] the disease actually do [not] have it. These values depend on the proportion of patients with the disease in the sample used. To find the PPV for M₁, we must first count all those patients diagnosed as TB by M₁ (39+0+18+11=68), and then ask ‘what percentage of these patients actually have TB?’ The number with TB is 39+18=57, so the PPV is 57/68, or 84%. Use the same approach to calculate the PPV for M₂, and the NPVs for both M₁ and M₂ (see Table 3).
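The four statistics can be checked with a short script. The following is a minimal sketch in Python, using the counts from Table 2; the names `TABLE_2` and `accuracy_statistics` are illustrative choices, not part of the original article.

```python
# Counts from Table 2, keyed by (M1 result, M2 result); True means "TB".
# Each value is (number with TB, number without TB) according to EBH.
TABLE_2 = {
    (True, True): (39, 0),
    (True, False): (18, 11),
    (False, True): (0, 2),
    (False, False): (6, 24),
}


def accuracy_statistics(method_index):
    """Return the four accuracy statistics for M1 (index 0) or M2 (index 1)."""
    tp = fp = fn = tn = 0
    for results, (with_tb, without_tb) in TABLE_2.items():
        if results[method_index]:   # the method says "TB"
            tp += with_tb           # true positives
            fp += without_tb        # false positives
        else:                       # the method says "Not TB"
            fn += with_tb           # false negatives
            tn += without_tb        # true negatives
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


for name, index in (("M1", 0), ("M2", 1)):
    stats = accuracy_statistics(index)
    print(name, {key: f"{100 * value:.0f}%" for key, value in stats.items()})
```

Rounded to whole percentages, the values printed should agree with the figures quoted in the text for both methods.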

Table 3.

Summary statistics for the two methods

Statistic	Method M₁	Method M₂
Sensitivity	90%	62%
Specificity	70%	95%
Positive predictive value	84% (57/68)	95% (39/41)
Negative predictive value	81% (26/32)	59% (35/59)


From visual inspection of this Table we see some differences, but how can we compare the accuracy of the two tests?

Comparison of the two methods

The first step is to decide what to compare and the second is to identify a table (or tables) that summarise the data in a suitable way. What should we compare? Can you identify a two-by-two cross-tabulation that would be appropriate?

Four options that you could consider are shown in Tables 4 to 7, each of which can be derived from Table 2. Let's look at what these tables tell us.
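To make the derivation from Table 2 concrete, here is a rough Python sketch that rebuilds the counts behind each of the four candidate tables. The function names are hypothetical and the layout of the results is only one possible choice.

```python
# Counts from Table 2, keyed by (M1 result, M2 result); True means "TB".
# Each value is (number with TB, number without TB) according to EBH.
TABLE_2 = {
    (True, True): (39, 0),
    (True, False): (18, 11),
    (False, True): (0, 2),
    (False, False): (6, 24),
}


def classified_as_tb(method_index):
    """Table 4: how many patients a method labels as TB, ignoring true status."""
    return sum(
        with_tb + without_tb
        for results, (with_tb, without_tb) in TABLE_2.items()
        if results[method_index]
    )


def correctly_classified(method_index):
    """Table 5: how many patients a method classifies correctly."""
    total = 0
    for results, (with_tb, without_tb) in TABLE_2.items():
        # A "TB" result is correct for true TB cases, "Not TB" for the rest.
        total += with_tb if results[method_index] else without_tb
    return total


def paired_table(status_index):
    """Tables 6 and 7: M1 by M2 counts for diseased (0) or non-diseased (1)."""
    return {results: counts[status_index] for results, counts in TABLE_2.items()}


print("Table 4:", classified_as_tb(0), classified_as_tb(1))
print("Table 5:", correctly_classified(0), correctly_classified(1))
print("Table 6:", paired_table(0))
print("Table 7:", paired_table(1))
```

Keeping the data as one three-way table and deriving the rest means each two-by-two table is guaranteed to be consistent with the others.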

Table 4.

Cross-tabulation of classifications by method used

Method	Classification
	TB	Not TB
M₁	68	32
M₂	41	59


Table 7.

Cross-tabulation of classifications by methods for non-diseased patients

Method	Method M₁		Total
M₂	TB	Not TB
TB	0	2	2
Not TB	11	24	35
Total	11	26	37


Table 4 simply tells us how many of the sample were classified as diseased, without reference to the true status. All this table tells us is that more subjects were classified as diseased using method M₁.

Table 5 does use the true diagnosis, enabling us to know what proportion of the subjects were correctly classified by each method. On this measure method M₁ seems to be better (83% correct compared with 74%). However, a problem with this table is that it gives no indication of whether the method is both sensitive and specific. We've already seen that M₁ seems to win on sensitivity and M₂ on specificity.

Table 5.

Cross-tabulation of correctness of classification by method

Method	Classification
	Correct	Incorrect
M₁	83	17
M₂	74	26


Tables 6 and 7 cross-tabulate the classifications by the two methods for patients with TB (the sensitivity data) and without TB (the specificity data) respectively. (The sensitivities of the two methods are derived from the marginal totals of Table 6: 57/63 for M₁ and 39/63 for M₂.)

Table 6.

Cross-tabulation of classifications by methods for diseased patients

Method	Method M₁		Total
M₂	TB	Not TB
TB	39 (a)	0 (c)	39
Not TB	18 (b)	6 (d)	24
Total	57	6	63


These are the two tables we'll use to compare the sensitivities and specificities. But before we identify a test to use, what are our hypotheses?

The null hypothesis (to be accepted if the result is not significant) is that the methods are equally accurate, or that each method is equally likely to give a correct classification. The alternative is that one method is better than the other.

A simple statistical test you might consider is Pearson's X² test or Fisher's exact test. For Table 6 there are too few observations in the second column to use Pearson's X². For Fisher's exact test the two-tailed p-value is 0.004. This indicates that there is a significant association between the classifications from the two methods. But that is to be expected. What it doesn't tell us is whether one method is better than the other.

A test that compares the error rates of the two methods is McNemar's test¹. Let's reconsider Table 6. When the two methods agree (cells labelled (a) and (d)) this contributes nothing to McNemar's test statistic. The test compares the pairs of classifications where the two methods disagree (cells (b) and (c)). The test statistic can be derived using the formula:

X^{2} = \frac{(|b - c| - 1)^{2}}{b + c}, or equivalently z = \frac{|b - c| - 1}{\sqrt{b + c}}

X² is compared with a χ² distribution with one degree of freedom (and z with the standard Normal distribution). For Table 6 (b=18, c=0) X²=16.1, for which p<0.0001. This indicates that the observed difference in sensitivities of the methods is significant. Similarly, from Table 7 (b=11, c=2) the specificities are significantly different too (X²=4.92, p=0.027). So method M₁ is better than M₂ at detecting TB when present, while M₂ is better at not falsely finding it.
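The two test statistics can be reproduced with a few lines of Python. This is a minimal sketch of the continuity-corrected formula given above, using only the standard library; the function name `mcnemar` is an illustrative choice. The p-value uses the identity that, for a χ² variable with one degree of freedom, the upper tail probability equals erfc(√(x/2)).

```python
import math


def mcnemar(b, c):
    """Continuity-corrected McNemar statistic and its p-value (1 df).

    b and c are the two discordant cell counts; b + c must be positive.
    """
    x2 = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p


# Discordant cells for patients with TB (18 and 0) and without TB (11 and 2).
for label, b, c in (("sensitivity", 18, 0), ("specificity", 11, 2)):
    x2, p = mcnemar(b, c)
    print(f"{label}: X2 = {x2:.2f}, p = {p:.4f}")
```

Only the discordant cells enter the calculation, which is why the cells where the methods agree can be ignored.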

Having identified a significant difference in sensitivity (or specificity) we may also wish to derive a confidence interval for the difference. Again, this should not be done in the usual way for comparing proportions from independent samples. The appropriate method¹ uses all of the frequencies in the table to estimate the sensitivities (p₁ and p₂). The standard error of their difference is:

\mathrm{se}(p_{1} - p_{2}) = \frac{1}{n} \sqrt{b + c - \frac{(b - c)^{2}}{n}}

For our example the 95% confidence interval for the difference in sensitivities is (21.1%, 36.1%). Similarly, a 95% confidence interval for the difference in specificities is (17.5%, 31.2%). So method M₁ is better in at least 21% of patients who have TB, but worse in at least 17% of patients who do not have TB.
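The interval calculation can be sketched in Python as follows. This is an illustration of the standard-error formula above, not a definitive implementation: the function name `paired_difference_ci` is hypothetical, and it assumes that n is the number of paired observations in the table being analysed and that 1.96 is used as the Normal multiplier for a 95% interval.

```python
import math


def paired_difference_ci(b, c, n, z=1.96):
    """Approximate confidence interval for a difference in paired proportions.

    b and c are the discordant cell counts and n is the number of pairs
    (assumed here to be the total of the two-by-two table).
    """
    difference = (b - c) / n                       # p1 - p2
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n   # standard error
    return difference - z * se, difference + z * se


lower, upper = paired_difference_ci(18, 0, 63)     # Table 6
print(f"{100 * lower:.1f}% to {100 * upper:.1f}%")
```

Called with the Table 6 total (n = 63), this sketch gives an interval of roughly 17% to 40%, which is wider than the one quoted above. The quoted limits are reproduced more closely if 100 is used for n inside the standard error, so it is worth checking which n is intended before relying on either figure.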

In this example the two methods are significantly different both in terms of sensitivity and specificity. In this case the method to be preferred depends on whether sensitivity or specificity is more important. The priority might be to diagnose those who have the disease correctly, and thus for sensitivity to be more important. But this has to be balanced with the numbers of individuals who are misdiagnosed, ie who really have a different disease. The lower the prevalence of the disease the higher is the proportion that is wrongly diagnosed as diseased.
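The effect of prevalence on misdiagnosis can be illustrated numerically. The sketch below assumes that sensitivity and specificity stay fixed as prevalence changes and applies Bayes' rule to obtain the positive predictive value; the function name `ppv` and the prevalences of 30% and 10% are hypothetical values chosen for illustration only.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value for a given prevalence (Bayes' rule)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Sensitivity and specificity of M1 from the sample (57/63 and 26/37).
sensitivity_m1, specificity_m1 = 57 / 63, 26 / 37

# 0.63 is the proportion with TB in the sample; the others are hypothetical.
for prevalence in (0.63, 0.30, 0.10):
    value = ppv(sensitivity_m1, specificity_m1, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {value:.0%}, "
          f"wrongly labelled as TB {1 - value:.0%}")
```

At the sample prevalence of 63% the function returns the PPV of 57/68 found earlier, and the proportion of positive results that are wrong rises as the prevalence falls.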

Summary

Statistics commonly used to assess accuracy of a diagnostic tool are sensitivity, specificity, positive predictive value and negative predictive value.

If two diagnostic tools are to be compared and both have been applied to the same subjects then McNemar's test provides a method of testing whether there are any differences between the accuracy of two tests. A confidence interval for the difference in sensitivities or specificities can also be formed. These enable the difference to be estimated and hence for a valid statistically based comparison to be made.

Invitation

If you have a suggestion of a statistical issue that you would like to be considered in this Corner please send it to me, either by email: swhite@mlw.medcol.mw or at the Department of Community Health, College of Medicine, Private Bag 360, Chichiri, Blantyre 3.

References

1. Altman DG. Practical Statistics for Medical Research. Boca Raton; London; New York: Chapman and Hall/CRC; 1991. pp. 237–259.

