Malawi Medical Journal. 2002 Apr;14(1):26–28.

Statistics Corner

S White
PMCID: PMC3345413  PMID: 27528923

Welcome to this new feature in the Journal! In this “Corner” we will look at a specific situation and the statistical methods that can be applied to the analysis of data.

Consider this scenario:

You want to know how well a simple test can diagnose a condition, sparing the need for a more difficult, invasive or expensive test that would provide a definitive answer.

For example, in a patient with palpable lymph nodes, is the cause tuberculosis or not?

To answer this with confidence you need an invasive procedure, preferably excision biopsy and histology (EBH). But such a procedure is not only invasive but also difficult, expensive and slow. You may therefore want to know how well a non-invasive, quick method would provide the answer.

One method (M1) could be whether simple examination finds that the nodes are ‘matted’ or not, ie whether they seem to be stuck together in groups. Another method could be a Mantoux (tuberculin) test (M2). You want to evaluate each of these tests on its own for usefulness in diagnosing TB as the cause of the lymphadenopathy. You also want to know if one test is superior to the other.

You plan to conduct a study to evaluate and compare these two methods. The best design applies each method, M1 and M2, independently, as well as the invasive one, to every patient in the study who has palpable lymph nodes. In all the calculations that follow, we will assume that excision biopsy and histology (EBH) is 100% accurate; it will serve as our 'gold standard'. How should you plan to analyse your data?

You will need to select statistical tests to:

  1. Evaluate each method;

  2. Compare the two methods.

We will consider these questions in turn. To illustrate the statistical tests to be described suppose you have collected data on 100 patients. Some of the data are shown in Table 1 (this only shows 6 patients - the full table would list all 100 cases), which can be summarised in a three-way cross-tabulation (Table 2).

Table 1.

Data listing according to diagnostic method (EBH = excision biopsy and histology, M1 = palpation, M2 = Mantoux test)

Patient   EBH       M1        M2
1         TB        TB        TB
2         TB        Not TB    TB
3         Not TB    Not TB    TB
4         TB        TB        TB
5         Not TB    Not TB    Not TB
(etc.)    ...       ...       ...
100       Not TB    TB        Not TB

Table 2.

Cross-tabulation of classifications using methods M1 and M2 by actual TB status (determined by EBH)

Result of methods     Actual status
M1        M2          TB      Not TB
TB        TB          39      0
TB        Not TB      18      11
Not TB    TB          0       2
Not TB    Not TB      6       24
Total                 63      37
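To make the step from Table 1 to Table 2 concrete, here is a minimal Python sketch that tallies patient-level records into the three-way cross-tabulation. It assumes each patient is stored as a tuple (EBH, M1, M2) of "TB" / "Not TB" strings; the variable names are illustrative. Only the six patients listed in Table 1 are included, so the counts it prints are a toy illustration, not the study totals in Table 2.

```python
from collections import Counter

# Patient-level records as in Table 1: (EBH, M1, M2).
# Only the six patients shown in Table 1 are listed; the full study has 100.
records = [
    ("TB", "TB", "TB"),              # patient 1
    ("TB", "Not TB", "TB"),          # patient 2
    ("Not TB", "Not TB", "TB"),      # patient 3
    ("TB", "TB", "TB"),              # patient 4
    ("Not TB", "Not TB", "Not TB"),  # patient 5
    ("Not TB", "TB", "Not TB"),      # patient 100
]

# Count each (M1, M2, EBH) combination: these are the cells of Table 2.
# A Counter returns 0 for combinations that never occur.
cells = Counter((m1, m2, ebh) for ebh, m1, m2 in records)

labels = ("TB", "Not TB")
print("M1      M2      TB  Not TB")
for m1 in labels:
    for m2 in labels:
        print(f"{m1:<8}{m2:<8}{cells[(m1, m2, 'TB')]:<4}{cells[(m1, m2, 'Not TB')]}")
```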

Assessment of the accuracy of each method

When the true diagnosis is known, there are four commonly used statistics to measure the accuracy of a diagnostic method. Do you know what they are?

Sensitivity is the first of the four statistics. It indicates what proportion of those with the disease (TB) are correctly classified as having it. It can be calculated using the first column of figures in Table 2.

Hence the sensitivity of method M1 is 90% ({39+18}/63) and for method M2 it is 62% ({39+0}/63). So, in the sample considered, M1 is more sensitive than M2, ie it is better at diagnosing TB when the patient has TB.
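The same arithmetic as a short Python sketch, assuming the first column of Table 2 is stored as a dictionary keyed by the (M1, M2) results; the names tb, sens_m1 and sens_m2 are illustrative.

```python
# First column of Table 2 (patients who truly have TB),
# keyed by (M1 result, M2 result).
tb = {
    ("TB", "TB"): 39,
    ("TB", "Not TB"): 18,
    ("Not TB", "TB"): 0,
    ("Not TB", "Not TB"): 6,
}
n_tb = sum(tb.values())  # 63 patients with TB

# Sensitivity = proportion of TB patients whom the method calls "TB".
sens_m1 = sum(v for (m1, _), v in tb.items() if m1 == "TB") / n_tb
sens_m2 = sum(v for (_, m2), v in tb.items() if m2 == "TB") / n_tb

print(f"Sensitivity M1: {sens_m1:.0%}")
print(f"Sensitivity M2: {sens_m2:.0%}")
```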

The second statistic is specificity. This indicates what proportion of those without TB are correctly identified as not having it. It can be calculated using the second column of figures in Table 2. The specificity of method M1 is 70% (26/37) and for method M2 it is 95% (35/37). So now we see that M2 is more specific than M1 in this sample, ie it is better at not diagnosing TB when the patient does not have TB. In summary, method M1 is more sensitive but M2 seems more specific.
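A matching sketch for specificity, assuming the second column of Table 2 is stored as a dictionary keyed by the (M1, M2) results; the names not_tb, spec_m1 and spec_m2 are illustrative.

```python
# Second column of Table 2 (patients without TB),
# keyed by (M1 result, M2 result).
not_tb = {
    ("TB", "TB"): 0,
    ("TB", "Not TB"): 11,
    ("Not TB", "TB"): 2,
    ("Not TB", "Not TB"): 24,
}
n_not_tb = sum(not_tb.values())  # 37 patients without TB

# Specificity = proportion of non-TB patients whom the method calls "Not TB".
spec_m1 = sum(v for (m1, _), v in not_tb.items() if m1 == "Not TB") / n_not_tb
spec_m2 = sum(v for (_, m2), v in not_tb.items() if m2 == "Not TB") / n_not_tb

print(f"Specificity M1: {spec_m1:.0%}")
print(f"Specificity M2: {spec_m2:.0%}")
```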

These two statistics indicate the proportions correctly classified. By contrast, the positive predictive value (PPV) indicates what proportion of those classified as having the disease actually have it, and the negative predictive value (NPV) what proportion of those classified as not having the disease are truly free of it. These values depend on the proportion of patients in the sample who have the disease (the prevalence). To find the positive predictive value for M1, we must first count all those patients diagnosed as TB by M1 (39+0+18+11=68), and then ask 'what percentage of these patients actually have TB?' The number with TB is 39+18=57, so the PPV is 57/68, or 84%. Use the same approach to calculate the PPV for M2, and the NPVs for both M1 and M2 (see Table 3).
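The counting described above can be sketched in Python as follows, assuming Table 2 is stored as a dictionary keyed by (M1 result, M2 result, true status). The helper predictive_values is a hypothetical name; it should reproduce the PPV and NPV figures in Table 3.

```python
# Table 2 stored as {(M1 result, M2 result, true status): count}.
table2 = {
    ("TB", "TB", "TB"): 39, ("TB", "TB", "Not TB"): 0,
    ("TB", "Not TB", "TB"): 18, ("TB", "Not TB", "Not TB"): 11,
    ("Not TB", "TB", "TB"): 0, ("Not TB", "TB", "Not TB"): 2,
    ("Not TB", "Not TB", "TB"): 6, ("Not TB", "Not TB", "Not TB"): 24,
}


def predictive_values(method):
    """Return (PPV, NPV) for method 0 (M1) or 1 (M2); key[2] is the EBH result."""
    called_tb = sum(v for k, v in table2.items() if k[method] == "TB")
    true_pos = sum(v for k, v in table2.items() if k[method] == "TB" and k[2] == "TB")
    called_not = sum(v for k, v in table2.items() if k[method] == "Not TB")
    true_neg = sum(v for k, v in table2.items() if k[method] == "Not TB" and k[2] == "Not TB")
    # PPV: of those called TB, the share who have TB; NPV: the reverse.
    return true_pos / called_tb, true_neg / called_not


for name, idx in (("M1", 0), ("M2", 1)):
    ppv, npv = predictive_values(idx)
    print(f"{name}: PPV {ppv:.0%}, NPV {npv:.0%}")
```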

Table 3.

Summary statistics for the two methods

Statistic                    Method M1      Method M2
Sensitivity                  90%            62%
Specificity                  70%            95%
Positive predictive value    84% (57/68)    95% (39/41)
Negative predictive value    81% (26/32)    59% (35/59)

From visual inspection of this Table we see some differences, but how can we compare the accuracy of the two tests?

Comparison of the two methods

The first step is to decide what to compare and the second is to identify a table (or tables) that summarise the data in a suitable way. What should we compare? Can you identify a two-by-two cross-tabulation that would be appropriate?

Four options that you could consider are shown in Tables 4 to 7, each of which can be derived from Table 2. Let's look at what these tables tell us.

Table 4.

Cross-tabulation of classifications by method used

Method    Classification
          TB      Not TB
M1        68      32
M2        41      59

Table 7.

Cross-tabulation of classifications by methods for non-diseased patients

                  Method M1
Method M2         TB      Not TB    Total
TB                0       2         2
Not TB            11      24        35
Total             11      26        37

Table 4 simply tells us how many of the sample were classified as diseased by each method, without reference to the true status. All it shows is that more subjects were classified as diseased using method M1 (68 versus 41).
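Table 4 can be obtained from Table 2 by ignoring the true status. A minimal sketch, assuming Table 2 is stored as a dictionary mapping (M1, M2) results to a pair (count with TB, count without TB); classified_tb is a hypothetical helper name.

```python
# Table 2 stored as {(M1 result, M2 result): (count with TB, count without TB)}.
table2 = {
    ("TB", "TB"): (39, 0),
    ("TB", "Not TB"): (18, 11),
    ("Not TB", "TB"): (0, 2),
    ("Not TB", "Not TB"): (6, 24),
}


def classified_tb(method):
    """Patients called TB by method 0 (M1) or 1 (M2), ignoring the true status."""
    return sum(tb + not_tb for key, (tb, not_tb) in table2.items() if key[method] == "TB")


total = sum(tb + not_tb for tb, not_tb in table2.values())  # 100 patients
for name, idx in (("M1", 0), ("M2", 1)):
    n = classified_tb(idx)
    print(f"{name}: TB {n}, Not TB {total - n}")
```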

Table 5 does use the true diagnosis, showing what proportion of the subjects were correctly classified by each method (83% by M1, 74% by M2). On this basis method M1 seems to be better. However, this table gives no indication of whether a method is both sensitive and specific. We've already seen that M1 seems to win on sensitivity and M2 on specificity.

Table 5.

Cross-tabulation of correctness of classification by method

Method    Classification
          Correct    Incorrect
M1        83         17
M2        74         26
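Table 5 can likewise be derived from Table 2: a classification counts as correct when it agrees with EBH. This sketch assumes Table 2 is stored as a dictionary mapping (M1, M2) results to (count with TB, count without TB); correct is a hypothetical helper name.

```python
# Table 2 stored as {(M1 result, M2 result): (count with TB, count without TB)}.
table2 = {
    ("TB", "TB"): (39, 0),
    ("TB", "Not TB"): (18, 11),
    ("Not TB", "TB"): (0, 2),
    ("Not TB", "Not TB"): (6, 24),
}


def correct(method):
    """Number correctly classified by method 0 (M1) or 1 (M2), using EBH as truth."""
    # A TB patient is correct if called "TB"; a non-TB patient if called "Not TB".
    return sum(
        tb if key[method] == "TB" else not_tb
        for key, (tb, not_tb) in table2.items()
    )


n = sum(tb + not_tb for tb, not_tb in table2.values())
for name, idx in (("M1", 0), ("M2", 1)):
    right = correct(idx)
    print(f"{name}: correct {right}, incorrect {n - right}")
```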

Tables 6 and 7 cross-tabulate the classifications by the two methods for patients with and without TB respectively, and so hold the sensitivity and specificity data. (The sensitivity of each method is derived from the margins of Table 6: 57/63 for M1 and 39/63 for M2.)

Table 6.

Cross-tabulation of classifications by methods for diseased patients

                  Method M1
Method M2         TB        Not TB    Total
TB                39 (a)    0 (c)     39
Not TB            18 (b)    6 (d)     24
Total             57        6         63
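Tables 6 and 7 are the two columns of Table 2 rearranged as paired two-by-two tables. The sketch below assumes Table 2 is stored as a dictionary mapping (M1, M2) results to (count with TB, count without TB); it extracts the cells using the (a) to (d) labels of Table 6 and recovers the sensitivities from the margins. paired_table is a hypothetical helper name.

```python
# Table 2 stored as {(M1 result, M2 result): (count with TB, count without TB)}.
table2 = {
    ("TB", "TB"): (39, 0),
    ("TB", "Not TB"): (18, 11),
    ("Not TB", "TB"): (0, 2),
    ("Not TB", "Not TB"): (6, 24),
}


def paired_table(status):
    """Cells (a)-(d) for TB patients (status 0, Table 6) or others (status 1, Table 7).

    a = both methods say TB, b = M1 TB / M2 Not TB,
    c = M1 Not TB / M2 TB, d = both say Not TB.
    """
    return {
        "a": table2[("TB", "TB")][status],
        "b": table2[("TB", "Not TB")][status],
        "c": table2[("Not TB", "TB")][status],
        "d": table2[("Not TB", "Not TB")][status],
    }


table6 = paired_table(0)
table7 = paired_table(1)
print("Table 6:", table6)
print("Table 7:", table7)

# Sensitivities from the margins of Table 6: M1 column total and M2 row total.
n6 = sum(table6.values())
print(f"Sensitivity M1 = {(table6['a'] + table6['b']) / n6:.0%}")
print(f"Sensitivity M2 = {(table6['a'] + table6['c']) / n6:.0%}")
```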

These are the two tables we'll use to compare the sensitivities and specificities. But before we identify a test to use, what are our hypotheses?

The null hypothesis (which is retained if the result is not significant) is that the methods are equally accurate, or that each method is equally likely to give a correct classification. The alternative is that one method is better than the other.

A simple statistical test you might consider is Pearson's χ² test or Fisher's exact test. For Table 6 there are too few observations in the second column to use Pearson's χ² test. For Fisher's exact test the two-tailed p-value is 0.004. This indicates that there is a significant association between the classifications from the two methods. But that is to be expected. What it doesn't tell us is whether one method is better than the other.

A test that compares the error rates of the two methods is McNemar's test [1]. Let's reconsider Table 6. Pairs for which the two methods agree (cells labelled (a) and (d)) contribute nothing to McNemar's test statistic. The test compares the pairs of classifications where the two methods disagree (cells (b) and (c)). The test statistic, which includes a continuity correction, can be derived using the formula:

$$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}, \quad \text{or equivalently} \quad z = \frac{|b - c| - 1}{\sqrt{b + c}}$$

χ² is compared with the χ² distribution with 1 degree of freedom (and z with the standard Normal distribution). For Table 6, χ² = (18 − 1)²/18 = 16.1, for which p < 0.0001. This indicates that the observed difference in the sensitivities of the methods is significant. Similarly, Table 7 (b = 11, c = 2) shows that the specificities are significantly different too (χ² = 4.92, p = 0.027). So method M1 is better than M2 at detecting TB when it is present, while M2 is better at not falsely finding it.
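A minimal sketch of McNemar's test with the continuity correction, using only the Python standard library. It relies on the fact that a χ² statistic on 1 degree of freedom is the square of a standard Normal z, so the p-value can be taken from math.erfc rather than a statistics package. The function name mcnemar is hypothetical, not a library API, and the clamp at zero for b = c and the error for b + c = 0 are our own safeguards rather than part of the formula as given.

```python
import math


def mcnemar(b, c):
    """McNemar's test with continuity correction for a paired 2x2 table.

    b and c are the two discordant cells (pairs on which the methods disagree).
    Returns (chi-squared statistic on 1 degree of freedom, two-sided p-value).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs: the test is undefined")
    # Clamp at zero so that b == c gives chi2 = 0 rather than a positive value.
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # chi2 on 1 df is z squared for a standard Normal z, so the two-sided
    # p-value is P(|Z| > z) = erfc(z / sqrt(2)).
    z = math.sqrt(chi2)
    p = math.erfc(z / math.sqrt(2))
    return chi2, p


# Discordant cells: Table 6 (sensitivity) and Table 7 (specificity).
for name, (b, c) in (("Sensitivity", (18, 0)), ("Specificity", (11, 2))):
    chi2, p = mcnemar(b, c)
    print(f"{name}: chi2 = {chi2:.2f}, p = {p:.2g}")
```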

Having identified a significant difference in sensitivity (or specificity), we may also wish to derive a confidence interval for the difference. Again, this should not be done in the usual way for comparing proportions from independent samples, because the same patients are assessed by both methods. The appropriate method [1] uses all of the frequencies in the paired table to estimate the sensitivities (p1 = (a + b)/n and p2 = (a + c)/n, where n is the number of patients in the table). The standard error of their difference is:

$$\mathrm{se}(p_1 - p_2) = \frac{1}{n}\sqrt{b + c - \frac{(b - c)^2}{n}}$$

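For the sensitivities (Table 6, n = 63) the difference is p1 − p2 = (b − c)/n = 18/63 = 28.6%, and its 95% confidence interval, (p1 − p2) ± 1.96 se, is (17.4%, 39.7%). Similarly, the difference in specificities (Table 7, n = 37) is 9/37 = 24.3% in favour of M2, with a 95% confidence interval of (6.9%, 41.7%). So, taking the lower confidence limits, method M1 is better in at least 17% of patients who have TB, but worse in at least 7% of patients who do not have TB.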
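The standard-error formula for paired proportions can be sketched in Python as below, applied with n equal to the number of patients in each paired table (63 for Table 6, 37 for Table 7). The function name paired_difference_ci and the default z = 1.96 for a 95% interval are illustrative choices. For Table 7, (b − c)/n equals the specificity of M2 minus that of M1, because b and c count the non-TB patients wrongly called TB by M1 only and by M2 only.

```python
import math


def paired_difference_ci(a, b, c, d, z=1.96):
    """Difference between two paired proportions with an approximate CI.

    Cells follow the (a)-(d) labels of Table 6: a = both methods say TB,
    b = M1 only says TB, c = M2 only says TB, d = neither says TB.
    p1 = (a + b) / n and p2 = (a + c) / n, so p1 - p2 = (b - c) / n.
    Returns (difference, lower limit, upper limit); z = 1.96 gives 95%.
    """
    n = a + b + c + d
    diff = (b - c) / n
    se = math.sqrt(b + c - (b - c) ** 2 / n) / n
    return diff, diff - z * se, diff + z * se


# Table 6 (patients with TB): sensitivity of M1 minus sensitivity of M2.
sens = paired_difference_ci(39, 18, 0, 6)
# Table 7 (patients without TB): (b - c) / n is the specificity of M2 minus that of M1.
spec = paired_difference_ci(0, 11, 2, 24)

for name, (diff, lo, hi) in (("Sensitivity", sens), ("Specificity", spec)):
    print(f"{name}: difference {diff:.1%}, 95% CI ({lo:.1%}, {hi:.1%})")
```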

In this example the two methods differ significantly in both sensitivity and specificity, so the method to be preferred depends on which of the two matters more. The priority might be to correctly diagnose those who have the disease, which would make sensitivity more important. But this has to be balanced against the number of individuals who are misdiagnosed as having TB, ie who really have a different disease. The lower the prevalence of the disease, the higher the proportion of those diagnosed as diseased who do not actually have it.
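To see how prevalence drives misdiagnosis, the sketch below applies Bayes' theorem to M1's sample sensitivity (57/63) and specificity (26/37). It assumes these values carry over unchanged to populations with other prevalences; the prevalences other than this sample's 63% are arbitrary illustrative values, not study data. At 63% it should return the PPV of 57/68 given in Table 3.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sample estimates for method M1 (Table 3).
sens_m1, spec_m1 = 57 / 63, 26 / 37

# Illustrative prevalences (not study data); 0.63 is the prevalence in this sample.
for prev in (0.63, 0.3, 0.1):
    value = ppv(sens_m1, spec_m1, prev)
    print(f"prevalence {prev:.0%}: PPV {value:.0%}, wrongly diagnosed {1 - value:.0%}")
```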

Summary

Statistics commonly used to assess accuracy of a diagnostic tool are sensitivity, specificity, positive predictive value and negative predictive value.

If two diagnostic tools are to be compared and both have been applied to the same subjects, then McNemar's test provides a way of testing whether the accuracy of the two tools differs. A confidence interval for the difference in sensitivities or specificities can also be formed. Together these allow the size of the difference to be estimated and a valid, statistically based comparison to be made.

Invitation

If you have a suggestion of a statistical issue that you would like to be considered in this Corner please send it to me, either by email: swhite@mlw.medcol.mw or at the Department of Community Health, College of Medicine, Private Bag 360, Chichiri, Blantyre 3.

References

  1. Altman DG. Practical Statistics for Medical Research. Boca Raton; London; New York: Chapman and Hall/CRC; 1991. pp. 237–259.

Articles from Malawi Medical Journal : The Journal of Medical Association of Malawi are provided here courtesy of Kamuzu University of Health Sciences and Medical Association of Malawi
