Abstract
The overall accuracy of an enzyme-linked immunosorbent assay (ELISA) used to detect Johne’s disease at herd level was explored in relation to an imperfect test (fecal culture) in 57 Israeli dairy herds. Receiver-operating characteristic (ROC) analysis indicated an area under the curve (AUC) that corresponded to a test accuracy of 82.0% (95% confidence interval: 69.5% to 90.9%), with optimized herd sensitivity and herd specificity of 70.4% and 83.3%, respectively, and predictive values of 79.2% (+) and 75.8% (−). The optimal ELISA cutoff was 3.16% (> 3.16% seropositive cows in a herd), which was associated with likelihood ratios (LR) of 4.22 (+LR) and 0.36 (−LR), and post-test probabilities of 0.79 (+) and 0.17 (−). For herds with ≤ 200 cows (n = 19 herds), the 95% confidence interval (CI) for the AUC was 0.62–0.97 and the optimal cutoff was 3.33% (HSe = 87.5%, HSp = 81.8%); for herds with > 200 but ≤ 270 cows (n = 19 herds), the 95% AUC CI was 0.62–0.97 and the optimal cutoff was 1.13% (HSe = 90.0%, HSp = 77.8%); and for herds with > 270 cows (n = 19 herds), the 95% AUC CI was 0.69–0.99 and the optimal cutoff was 0.7% (HSe = 100.0%, HSp = 70.0%). The AUC was not influenced by across-herd prevalence [R2 (adjusted) = 0.0, P > 0.05]. Findings may be applied to facilitate targeted sampling of herds similar to those evaluated. For instance, a test cutoff of 0.76% could be considered for “ruling disease in,” while a cutoff of 3.7% could be used for “ruling disease out.” Caveats that may influence this analysis are discussed.
Introduction
Paratuberculosis (Johne’s disease) is becoming a major concern of the dairy industry, because of both its economic consequences and its possible role in (human) Crohn’s disease (1). Consequently, the evaluation of diagnostic test accuracy is a critical issue.
At least 6 issues influence the evaluation process: 1) the unit of analysis, herd vs. cow [this issue is related to the fact that not all cases are identical (2–4)]; 2) the lack of a perfect (“gold standard”) reference test (5,6); 3) the stage of the infection [at best, positive tests detect bacterial shedders but not all infected animals (7)]; 4) the potential loss of information and bias generated when the results produced by continuous tests (e.g., those of a serological test) are dichotomized (as positive or negative) on the basis of a single cutoff point that may be arbitrarily selected and/or not applicable to all populations (8,9); 5) the dependence of sensitivity (Se), specificity (Sp), and test predictive values on prevalence, which varies across herds (10); and 6) the role of covariates (11).
The accuracy of a test is related to the scale of the unit of study. This may be an animal, a herd, a region (e.g., a state), or a country (12). While many evaluations of tests used to diagnose bovine paratuberculosis have been conducted (5,6,10,13), most have been animal-level based studies. Because individual animals within a herd tend to be more alike than animals chosen at random across herds (a phenomenon also known as cluster-correlated binary responses), herd-level evaluations have been recommended (6). Consequently, prevalence (and test accuracy) should be assessed at herd level, not at animal level (2,3).
Comparisons of tests aimed at diagnosing Johne’s disease are particularly problematic because of the current absence of perfect tests: fecal culture and ELISA are imperfect tests. Fecal culture is not a perfect test to use for the detection of Mycobacterium avium, subspecies paratuberculosis (MAP) because: 1) the infecting organism may survive as an obligatory intracellular parasite, which may be expressed as false negative results, and 2) even when positive bacterial recovery is accomplished, the magnitude of positive results is not necessarily representative of the prevalence in the population at large, and bias is also likely to occur when screening is the purpose of testing (14). Selection bias may occur with ELISA-based testing when animals are regarded as negative solely on the basis of repeated serologic testing (15).
The previous issue relates to the several stages into which this disease may be categorized. At least 3 stages have been described: pre-patent and preclinical (not detectable by either serology or microbial culture); preclinical and detectable by microbial culture; and clinical, seropositive, and detectable by microbial culture (5,7). These numerous disease stages indicate that, at best, diagnostic tests may identify shedder animals and/or animals that have elicited a humoral immune response, rather than all truly infected/non-infected animals.
When test results produced by continuous data (i.e., those generated by serology) are interpreted against reference tests that generate dichotomous data (i.e., fecal cultures, if interpreted qualitatively as bacterial isolation or no isolation), a fallacy (dichotomization of continuous data) may occur (8,9). When a cutoff point of a continuous variable is selected, biased results may occur when that selection is arbitrarily conducted, and the purpose of testing (or the population to be tested) differs (16). For instance, when test cutoffs are kept constant over time or across locations, repeated serological testing followed by culling of test-positive animals may increase the proportion of false negative results (i.e., the lower the prevalence, the lower the test sensitivity and, therefore, the higher the percentage of false negative results).
This concept highlights the importance of apparent prevalence, which is defined by the selected cutoff point. Prevalence influences Se, Sp, and, therefore, test predictive values: testing in settings with low prevalence results in low positive predictive values, while the opposite occurs when prevalence is high (17). Consequently, ideal diagnostic tests (i.e., accurate tests) should be independent from (variable) prevalence.
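The dependence of predictive values on prevalence can be illustrated with a short calculation. The sketch below is not part of the original study: the function name `predictive_values` and the sensitivity and specificity figures (0.80 and 0.90) are hypothetical values chosen only for illustration.

```python
def predictive_values(se, sp, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence (Bayes' theorem)."""
    true_pos = se * prevalence
    false_pos = (1.0 - sp) * (1.0 - prevalence)
    false_neg = (1.0 - se) * prevalence
    true_neg = sp * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test (Se = 0.80, Sp = 0.90) applied at three prevalence levels.
for prev in (0.05, 0.25, 0.50):
    ppv, npv = predictive_values(0.80, 0.90, prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these assumed inputs, the positive predictive value rises from roughly 0.30 at 5% prevalence to roughly 0.89 at 50% prevalence, while sensitivity and specificity are held fixed.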
Covariates, such as milk yield or age, may also influence the diagnostic testing procedure (11). However, the relationship between herd size and (serology-based) test accuracy has not been investigated.
The use of receiver operating characteristic (ROC) analysis may address the previously mentioned challenges or opportunities. The ROC analysis has 5 major advantages: 1) it is relatively insensitive to variations in prevalence; 2) it can evaluate test accuracy across the entire range of (continuous) data (it does not require a predetermined cutoff point); 3) it facilitates cutoff selection and optimization, adjusted for the purpose of the test (i.e., for ruling disease in, as in screening studies; or for ruling disease out, as in confirmatory studies); 4) it facilitates cost-benefit like analyses; and 5) it blends naturally into herd-level assessments, because the process of cutoff selection and optimization is the same as determination of herd-level prevalence and, therefore, ROC analysis facilitates animal survey designs (16–18).
Materials and methods
Sampled population
Based on voluntary participation, the exposure to MAP was investigated between 2003 and 2004 in 74 Israeli dairy herds (16 562 cows; mean cows/farm: 229.7) from which blood and fecal samples were collected.
ELISA
Sera were harvested and stored at 4°C until tested (within 5 days after collection). Antibodies against MAP were determined with a commercial ELISA, conducted according to manufacturer’s guidelines (IDEXX ELISA; HerdCheck Mpt Ab, IDEXX Scandinavia, Osterbybruk, Sweden). Serum samples were considered positive when the sample-positive control (S/P) ratio was > 0.30.
Fecal culture
Microbial isolation was performed on individual fecal samples of all seropositive animals (Table 1). Fecal culture was performed at the Kimron Veterinary Institute using hexadecylpyridinium chloride (HPC) as a decontaminant, a centrifugation method, and Herrold’s egg yolk agar slants (Becton, Dickinson and Co., Sparks, Maryland, USA) as the culture medium (19). In brief, 3 g of sample were mixed with 30 mL of 1% HPC in a 50-mL sample collection tube. The suspension was allowed to settle for 30 min at room temperature. Then, 15 mL of the supernatant were transferred to a 50-mL centrifuge tube and centrifuged at 1700 g for 20 min at room temperature. The supernatant was discarded and the pellet mixed with a 1% HPC-brain heart infusion. The solution was incubated at 37°C overnight and centrifuged at 1700 g for 20 min at room temperature. Two hundred microliters of 0.9% saline solution were added to the pellet, the mixture was vortexed, and 100 μL were inoculated into the Herrold’s egg yolk medium tubes. One Herrold tube per sample was incubated for at least 14 wk and examined weekly after 4 wk. Mycobacterium avium subsp. paratuberculosis was confirmed based on 4 criteria: 1) typical colony morphology, 2) growth rate, 3) mycobactin dependence, and 4) acid-fast staining.
Table 1.
Individual MAP-related serological and microbiological results (n = 57 herds)
| Herd ID | Cows tested (herd size) | ELISA-positive cows | Within-herd prevalence | Microbial isolation (a) | Herd ID | Cows tested (herd size) | ELISA-positive cows | Within-herd prevalence | Microbial isolation (a) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 90 | 7 | 0.0777778 | Yes | 30 | 876 | 6 | 0.0068493 | No |
| 2 | 96 | 2 | 0.0208333 | No | 31 | 458 | 23 | 0.0502183 | Yes |
| 3 | 87 | 8 | 0.0919540 | Yes | 32 | 134 | 5 | 0.0373134 | Yes |
| 4 | 250 | 11 | 0.0440000 | Yes | 33 | 221 | 8 | 0.0361991 | Yes |
| 5 | 91 | 3 | 0.0329670 | No | 34 | 279 | 10 | 0.0358423 | Yes |
| 6 | 44 | 1 | 0.0227273 | No | 35 | 99 | 2 | 0.0202020 | No |
| 7 | 280 | 10 | 0.0357143 | No | 36 | 507 | 3 | 0.0059172 | No |
| 8 | 273 | 4 | 0.0146520 | No | 37 | 247 | 1 | 0.0040486 | No |
| 9 | 265 | 3 | 0.0113208 | No | 38 | 242 | 1 | 0.0041322 | Yes |
| 10 | 46 | 1 | 0.0217391 | No | 39 | 289 | 2 | 0.0069204 | No |
| 11 | 158 | 5 | 0.0316456 | No | 40 | 305 | 7 | 0.0229508 | Yes |
| 12 | 177 | 3 | 0.0169492 | Yes | 41 | 226 | 11 | 0.0486726 | Yes |
| 13 | 235 | 1 | 0.0042553 | No | 42 | 229 | 6 | 0.0262009 | No |
| 14 | 161 | 12 | 0.0745342 | Yes | 43 | 262 | 2 | 0.0076336 | No |
| 15 | 246 | 9 | 0.0365854 | No | 44 | 194 | 1 | 0.0051546 | No |
| 16 | 294 | 2 | 0.0068027 | No | 45 | 55 | 2 | 0.0363636 | No |
| 17 | 319 | 4 | 0.0125392 | No | 46 | 252 | 1 | 0.0039683 | No |
| 18 | 224 | 1 | 0.0044643 | No | 47 | 237 | 3 | 0.0126582 | Yes |
| 19 | 280 | 4 | 0.0142857 | Yes | 48 | 271 | 5 | 0.0184502 | Yes |
| 20 | 266 | 5 | 0.0187970 | Yes | 49 | 929 | 9 | 0.0096878 | Yes |
| 21 | 239 | 12 | 0.0502092 | Yes | 50 | 259 | 14 | 0.0540541 | Yes |
| 22 | 45 | 3 | 0.0666667 | No | 51 | 331 | 1 | 0.0030211 | No |
| 23 | 56 | 2 | 0.0357143 | Yes | 52 | 268 | 3 | 0.0111940 | No |
| 24 | 44 | 1 | 0.0227273 | No | 53 | 216 | 12 | 0.0555556 | Yes |
| 25 | 44 | 3 | 0.0681818 | Yes | 54 | 862 | 6 | 0.0069606 | No |
| 26 | 38 | 1 | 0.0263158 | No | 55 | 244 | 8 | 0.0327869 | Yes |
| 27 | 363 | 15 | 0.0413223 | Yes | 56 | 320 | 1 | 0.0031250 | No |
| 28 | 149 | 10 | 0.0671141 | Yes | 57 | 345 | 23 | 0.0666667 | Yes |
| 29 | 493 | 16 | 0.0324544 | Yes | All | 14510 | 335 | | |

(a) No: no bacterial growth in culture; Yes: bacterial growth in culture.
Case definition
The herd was used as the unit of analysis in this study. Positive and negative cases were defined after ROC analysis (16–18). A herd that displayed ELISA values above the (post-analysis) optimized cutoff (i.e., with within-herd prevalence above the specified cutoff) was regarded as positive; otherwise, it was considered to be negative. However, the magnitude that defined whether a herd was classified as disease-positive or disease-negative was not constant; it varied, depending on the purpose and context of each testing.
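A minimal sketch of this case definition follows. The function names are illustrative assumptions rather than the authors' software; the herd counts are taken from Table 1 (herds 1 and 2), and the two cutoffs are the overall optimum (3.16%) and the lower screening-oriented value (0.76%) quoted elsewhere in this article.

```python
def within_herd_prevalence(elisa_positive, cows_tested):
    """Proportion of tested cows in a herd that were ELISA-positive."""
    return elisa_positive / cows_tested


def classify_herd(elisa_positive, cows_tested, cutoff):
    """A herd is test-positive when its within-herd prevalence exceeds the cutoff."""
    return within_herd_prevalence(elisa_positive, cows_tested) > cutoff


# Herd 1: 7 of 90 cows ELISA-positive; herd 2: 2 of 96 (Table 1).
for herd_id, positives, tested in ((1, 7, 90), (2, 2, 96)):
    for cutoff in (0.0316, 0.0076):
        status = classify_herd(positives, tested, cutoff)
        print(f"herd {herd_id}, cutoff {cutoff:.4f}: positive={status}")
```

Herd 2 (about 2.1% seropositive cows) is classified as negative at the 3.16% cutoff but as positive at the 0.76% cutoff, which illustrates why the classification depends on the purpose of testing.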
ROC analysis
Receiver-operating characteristic (ROC) curves, the area under the curve (AUC), test predictive values (+/−), likelihood ratios (+/−), and optimal cutoff points were computed with MedCalc 9.2 (20). Other statistical tests were performed using a statistical package (Minitab 14; Minitab, State College, Pennsylvania, USA).
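For readers unfamiliar with these quantities, the sketch below shows one way an AUC and an optimized cutoff can be computed from herd-level data. It is an illustration under stated assumptions, not a reproduction of MedCalc: the AUC is computed as the Mann-Whitney probability that a culture-positive herd has a higher within-herd prevalence than a culture-negative herd, and the cutoff is chosen by maximizing the Youden index (Se + Sp − 1), which is an assumed criterion because the text does not state which rule the software applied. Only herds 1 to 12 of Table 1 are used, so the output does not correspond to the study results.

```python
def auc_mann_whitney(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def youden_cutoff(pos_scores, neg_scores):
    """Return (cutoff, Se, Sp) maximizing Se + Sp - 1; test-positive means score > cutoff."""
    best = None
    for c in sorted(set(pos_scores) | set(neg_scores)):
        se = sum(s > c for s in pos_scores) / len(pos_scores)
        sp = sum(s <= c for s in neg_scores) / len(neg_scores)
        j = se + sp - 1.0
        if best is None or j > best[0]:
            best = (j, c, se, sp)
    return best[1], best[2], best[3]


# Within-herd prevalence for herds 1-12 of Table 1, split by fecal culture result.
culture_positive = [0.0777778, 0.0919540, 0.0440000, 0.0169492]
culture_negative = [0.0208333, 0.0329670, 0.0227273, 0.0357143,
                    0.0146520, 0.0113208, 0.0217391, 0.0316456]

auc = auc_mann_whitney(culture_positive, culture_negative)
cutoff, se, sp = youden_cutoff(culture_positive, culture_negative)
print(f"AUC={auc:.4f}  cutoff>{cutoff:.4f}  Se={se:.2f}  Sp={sp:.2f}")
```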
Results
Fifty-seven of the 74 tested herds were considered to be sero-positive. Microbial isolation was achieved in 27 of those 57 herds (Tables 1 and 2). The area under the ROC curve (AUC) indicated that the ELISA test was, on average, 82.0% accurate (Figure 1A). The 95% confidence interval of the AUC ELISA ranged from 69.5% to 90.9%, which rejected the null hypothesis of a non-informative test (a test associated with a 50% AUC; Table 2A).
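The standard error reported for the overall AUC (0.057; Table 2A) can be approximated from the AUC and the two group sizes alone. The sketch below assumes the Hanley-McNeil approximation; the text does not state which estimator the software used, so this choice and the function name are assumptions.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC under the Hanley-McNeil approximation."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


# Overall analysis: AUC = 0.82, 27 culture-positive and 30 culture-negative herds.
auc, n_pos, n_neg = 0.82, 27, 30
se = hanley_mcneil_se(auc, n_pos, n_neg)
z = (auc - 0.5) / se  # z statistic against the non-informative value AUC = 0.50
print(f"SE = {se:.3f}, z = {z:.2f}")
```

Under this assumption the standard error comes out close to 0.057, and the z statistic is well above the value needed for P < 0.001, in line with the test of the null hypothesis AUC = 0.50 in Table 2A.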
Table 2.
ROC analysis and post-test probabilities of the ELISA test (n = 57 herds)
2A. Overall ROC analysis

| Parameter | Value |
|---|---|
| Positive group (fecal culture = 1) | n = 27 |
| Negative group (fecal culture = 0) | n = 30 |
| Across-herd disease prevalence | 47.4% |
| AUC (area under the ROC curve) | 82.0% |
| AUC 95% confidence interval | 69.5–90.9 |
| Standard error | 0.057 |
| P level for null hypothesis AUC = 0.50 | < 0.001 |
| Optimal test cutoff | 3.16% |
| Positive likelihood ratio (+LR) | 4.22 |
| Negative likelihood ratio (−LR) | 0.36 |
| Herd sensitivity (%) | 70.37 |
| Herd specificity (%) | 83.33 |
| Test positive predictive value (%) | 79.2 |
| Test negative predictive value (%) | 75.8 |

2B. Herd-adjusted ROC analysis (n = 19 herds per sub-population)

| Parameter | ≤ 200 cows | > 200, ≤ 270 cows | > 270 cows |
|---|---|---|---|
| Positive group (fecal culture = 1) | n = 8 | n = 10 | n = 9 |
| Negative group (fecal culture = 0) | n = 11 | n = 9 | n = 10 |
| Across-herd disease prevalence | 42.1% | 52.6% | 47.4% |
| AUC (area under the ROC curve) | 85.2% | 85.6% | 91.1% |
| AUC 95% confidence interval | 61.6–96.9 | 62.0–97.0 | 68.9–98.8 |
| Standard error | 0.096 | 0.089 | 0.073 |
| P level for null hypothesis AUC = 0.50 | < 0.001 | < 0.001 | < 0.001 |
| Optimal test cutoff | 3.3% | 1.13% | 0.7% |
| Positive likelihood ratio (+LR) | 3.21 | 4.05 | 3.33 |
| Negative likelihood ratio (−LR) | 0.17 | 0.13 | 0.00 |
| Herd sensitivity (%) | 87.5 | 90.0 | 100 |
| Herd specificity (%) | 81.8 | 77.8 | 70.0 |
| Test positive predictive value (%) | 77.8 | 75.0 | 75.0 |
| Test negative predictive value (%) | 90.0 | 85.7 | 100 |

2C. Calculation of post-test probabilities (overall)

| Step | Source or formula | +LR | −LR |
|---|---|---|---|
| 1. Likelihood ratio | ROC analysis (Table 2A) | +LR = 4.22 | −LR = 0.36 |
| 2. Pre-test disease probability | Across-herd prevalence (Table 2A) | 0.474 | 0.474 |
| 3. Conversion of pre-test disease probability into pre-test odds | Pre-test odds = prevalence/(1 − prevalence) | 0.474/0.526 = 0.901 | 0.36/0.64 = 0.562 |
| 4. Calculation of post-test odds based on Bayes’ theorem | Post-test odds = pre-test odds × LR | 0.901 × 4.22 = 3.8022 | 0.562 × 0.36 = 0.2025 |
| 5. Conversion of post-test odds into post-test disease probability | Post-test probability = post-test odds/(1 + post-test odds) | 3.802/4.802 = 0.792 | 0.2025/1.2025 = 0.168 |
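The odds-based steps of Table 2C can be written as a single function. The sketch below is illustrative (the function name is an assumption) and applies the same pre-test probability, the across-herd prevalence of 0.474, to both likelihood ratios.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability and a likelihood ratio into a post-test probability."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


prevalence = 0.474  # across-herd prevalence (Table 2A)
after_positive = post_test_probability(prevalence, 4.22)  # +LR
after_negative = post_test_probability(prevalence, 0.36)  # -LR
print(f"after a positive test: {after_positive:.3f}")
print(f"after a negative test: {after_negative:.3f}")
```

For a positive result this reproduces the tabulated 0.792. For a negative result, starting from the same pre-test odds (0.901) gives approximately 0.245, which is close to the complement of the negative predictive value in Table 2A (100% − 75.8% = 24.2%). The −LR column of Table 2C instead starts from 0.36/0.64 and arrives at 0.168, so readers may wish to check which pre-test odds were intended.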
Figure 1.
A: Area under the ROC curve (AUC) of a serum ELISA test for bovine paratuberculosis, conducted in reference to microbial fecal cultures. The plot describes: 1) the mean AUC plot (slanted line, closed circles), 2 and 3) the lower (slanted line, open triangles) and upper (slanted line, open circles) 95% confidence limits. The diagonal straight line represents the performance expected of a non-informative test (one with 50% herd sensitivity and 50% herd specificity). B: Relationship between ROC-based estimates for herd sensitivity and specificity (mean, lower and upper 95% confidence limits) and ELISA cutoffs [percentage of seropositive cows within herds (within-herd prevalence)], expressed as decimal fractions (e.g., 0.02 = 2%). C: ROC analysis-based estimation of ELISA outcomes [false positives (FP), true negatives (TN), true positives (TP), and false negatives (FN)] in relation to fecal cultures (0 = no bacterial isolation, 1 = bacterial isolation). D: Correlation between herd size (number of cows/herd) and ELISA cutoffs (r = −0.35, P < 0.01). E: Herd size-adjusted relationship between herd sensitivity/specificity and ELISA cutoffs. Three subpopulations (each with n = 19 herds) were created based on herd size: a) with ≤ 200 cows/herd, b) with > 200 but ≤ 270 cows/herd, and c) with > 270 cows/herd. F: Plot of the regression analysis of (across-herd) prevalence on AUC [R2 (adjusted) = 0.0].
The point at which herd sensitivity (HSe) and herd specificity (HSp) intersected (when plotted as a function of the ELISA cutoffs), together with their 95% confidence intervals, indicated the range of ELISA values that could result in optimal cutoffs. These varied from 1.2% to 3.8% (Figure 1B). The ROC analysis indicated that the optimal cutoff point was 3.16%; this resulted in an HSe of 70.37% and an HSp of 83.33% (Table 2A).
This optimal cutoff point helped to tentatively classify herds as true positive (TP), true negative (TN), false positive (FP), or false negative (FN). Of the 27 bacteriologically positive herds, 19 were considered to be TP because their ELISA readings were above the optimized cutoff, while 8 were considered to be FN because their readings were below this cutoff. Among the 25 herds showing ELISA values above the ROC-optimized cutoff, 6 were considered to be FP because no bacteria could be isolated from their fecal cultures (Figure 1C).
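The relationship between such a 2 × 2 classification and the accuracy measures of Table 2A can be sketched as follows. The counts used here (19 TP, 8 FN, 25 TN, 5 FP) are an assumption: they are back-calculated from the herd sensitivity (70.37% of 27 herds) and herd specificity (83.33% of 30 herds) in Table 2A, and they imply one fewer false-positive herd than the 6 quoted in the text, possibly because one herd lies at the cutoff. The function name is illustrative.

```python
def accuracy_measures(tp, fn, tn, fp):
    """Derive Se, Sp, predictive values, and likelihood ratios from 2 x 2 counts."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "se": se,
        "sp": sp,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": se / (1.0 - sp),
        "lr_neg": (1.0 - se) / sp,
    }


# Counts back-calculated from Table 2A (assumption, see the paragraph above).
measures = accuracy_measures(tp=19, fn=8, tn=25, fp=5)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

With these counts, the sketch returns values that round to the entries of Table 2A (HSe 70.37%, HSp 83.33%, +PV 79.2%, −PV 75.8%, +LR 4.22, −LR 0.36).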
Additional ROC-based analyses considered the role of a covariate (herd size) on test performance. Overall, a significant, negative correlation was observed between herd size and ELISA cutoffs (r = −0.35, P < 0.008; Figure 1D). When the data were divided into 3 subpopulations of 19 herds each according to herd size [herds with ≤ 200 cows (median: 90 cows/herd), herds with > 200 but ≤ 270 cows (median: 244 cows/herd), and herds with > 270 cows (median: 320 cows/herd)], the median herd size was positively correlated with HSe (r = 0.86) and negatively correlated with HSp (r = −0.93), although neither correlation reached statistical significance. For the same subpopulations, the optimal ELISA cutoffs were negatively correlated with HSe (r = −0.77) and positively correlated with HSp (r = 0.85), although neither correlation achieved statistical significance (Figure 1E). Complementing the lack of statistical significance of those relationships, the AUC was shown to be insensitive to across-herd prevalence: analysis of AUC, regressed on across-herd prevalence, did not reach significance [R2 (adjusted) = 0.0, P > 0.05; Figure 1F].
When the number of animals per herd was ≤ 200 (Table 2B), the observed optimal ELISA cutoff was > 3.33%; this value was associated with an HSe of 87.5%, an HSp of 81.8%, and an AUC of 85.2% (Figures 2A, B). When the number of animals per herd was > 200 but ≤ 270, the observed optimal ELISA cutoff was > 1.13%; this value was associated with an HSe of 90.0%, an HSp of 77.8%, and an AUC of 85.6% (Figures 2C, D). When the number of animals per herd was > 270, the observed optimal ELISA cutoff was > 0.7%, with an HSe equal to 100.0%, an HSp of 70.0%, and an associated AUC of 91.1% (Figures 2E, F). These results indicated that the estimated test accuracy was only minimally influenced by herd size: across herd sizes, the average AUC ranged between 85.2 and 91.1% (Figures 2B, D, F).
Figure 2.
Herd size-adjusted ROC analyses and test outcomes. Mean, lower, and upper 95% confidence limits for the AUC and test estimates, for herds with ≤ 200 cows (A, B); herds with > 200 but ≤ 270 cows (C, D); and herds with > 270 cows (E, F). Optimal ELISA cutoffs were: 3.33% (≤ 200 cows/herd), 1.13% (> 200 but ≤ 270 cows/herd), and 0.7% (> 270 cows/herd).
In relation to the overall test predictive values, the overall ELISA cutoffs showed significant correlations (r = 0.96, in relation to +PV; and r = −0.92, in relation to − PV, both with P < 0.01). Their intersection also coincided with an optimal ELISA cutoff of 3.16% (Figure 3A). No significant correlations were observed between (+/−) PV and ELISA cutoffs (not shown).
Figure 3.
A: Relationship between test (positive/negative) predictive values and ELISA cutoffs. The central rectangle indicates the range of values where positive and negative predictive values intersect, which coincides with the optimal ELISA cutoffs indicated by ROC analysis. B, C: Correlations between positive (B) and negative (C) likelihood ratios (LR) and ELISA cutoffs (r ≥ 0.81, both with P < 0.01).
Significant correlations were observed between ELISA cutoffs and (positive/negative) LRs (r ≥ 0.81, P ≤ 0.001; Figures 3B, C). The optimal ELISA cutoff was associated with a positive likelihood ratio (+LR) of 4.22 and a negative likelihood ratio (−LR) of 0.36. That is, a truly positive (paratuberculosis-infected) herd was 4.22 times as likely to give a positive test result as a truly negative (non-infected) herd, whereas a truly infected herd was only 0.36 times as likely to give a negative test result as a truly non-infected herd. Expressed as post-test probabilities, a herd with a positive test result had a probability of 0.79 of being truly infected, whereas a herd with a negative test result had a probability of 0.17 of being infected (Table 2C).
Discussion
This study found an overall AUC of 0.82 which, according to a widely shared guideline (21), indicated a moderately accurate test (with AUC > 0.7 but < 0.9). However, for herds with > 270 cows, the (95%) upper limit of the AUC interval was 0.99. As expected (16,17), AUC was insensitive to prevalence (Figure 1F). The overall AUC observed in this study was similar to that obtained by Kostoulas et al (19) while testing goats. However, unlike that report, in the study reported here, lowering the ELISA cutoff decreased HSp faster than HSe increased: reducing the cutoff from 3.16% to 2.3% did not change HSe (70.4%), but reduced HSp from 83.3% to 73.3% (Figure 1B). As herd size increased, HSe increased and HSp decreased (Table 2B).
The study reported here did not assess fecal culture colony forming units (CFUs). This limitation is probably irrelevant because, regardless of quantitative microbiological data, true infection cannot be estimated from shedders. While CFU quantification does not improve test sensitivity, if animals are in earlier disease stages, it clearly augments testing costs. This results in samples of smaller size, a reported possible limitation of evaluations like this (7), which could induce a bias, as when within-herd clustering is not accounted for (2,3).
This study found major differences with the ELISA ROC-based analyses reported by McKenna et al (13). While the overall AUC observed in this study was 82% (and, for 1 subset, the AUC was 91%), AUC values reported by McKenna et al were below 58%. A major difference between the current study and McKenna et al is that the unit of study in this report was the herd, not the individual animal. As a result, the influence of an individual animal-based testing over the global test accuracy differed substantially between these studies. The influence of individual animals in the current study was, comparatively, of less magnitude than in the study of McKenna et al (where each tested animal had the same weight on the overall test accuracy analysis). Such discrepancies are not unexpected when the unit of analysis differs. An ELISA test applied to individual animals may be less sensitive than when applied to herds (22). One reason for this discrepancy is that when all cases are equally weighted (as when the unit of analysis is the animal), within-herd clustering is not accounted for. However, animals within a herd (i.e., clustered) are more likely to be similar than animals selected at random across herds (3). For that reason, testing of clusters (such as herds) has been prioritized over testing of individual animals, in both this and other diseases (4,6,23,24).
While numerous reports that have assessed test accuracy in paratuberculosis have either focused on animal-level studies or tested herds based on limited samples (5,6,10,25), to the best of our knowledge, the study reported here is the first to use ROC analysis to assess test accuracy while (a) focusing on herds and (b) considering the influence of herd size as a covariate. In herd size-adjusted analyses, ELISA cutoffs appeared to be negatively correlated with HSe and positively correlated with HSp, although not achieving statistical significance (Figure 1E).
However, those correlations may be immaterial in evaluating test performance. The (herd size-adjusted) across-herd prevalence did not influence AUC: the regression analysis yielded an R2 (adjusted) equal to 0.0 (Figure 1F). This finding supported the hypothesis that ROC-based analysis (i.e., a test based on continuous data) is more valid than Se/Sp-based assessments (i.e., analyses based on dichotomous data). It has been observed (26) that test accuracy should not be based on HSe and HSp, but, rather, on likelihood ratios (LRs) because, unlike Se and Sp, computation of LRs does not require previous dichotomization of test results.
The positive predictive value (+PV) here obtained in association with the optimal ELISA cutoff in the overall evaluation (79.2%) was higher than previously reported in other ELISAs used to detect paratuberculosis (27,28). In contrast, the −PV was lower. For herds having > 270 cows, the +/− PVs coincided with those reported previously. Although +/− PVs depend on prevalence (18,24), LRs are not influenced by prevalence (29).
The generation of LRs provided a tool for estimating post-test probabilities. A statistically significant correlation was observed between either negative (< 1) or positive (> 1) LRs and ELISA cutoffs (at least between 0.01 and 0.04 cutoffs). This strongly suggests that the accuracy of this test holds over a large range of cutoffs.
While classic (binomial) approaches based on dichotomous data do not facilitate cost-benefit like analysis, ROC analysis (using LRs) does (29). Accordingly, this ROC analysis-based approach could be used to determine the optimal cutoffs required to increase gains or reduce costs (30). For instance, if ruling disease in (screening) were the priority, high HSe (> 96.0%) could be obtained with a test cutoff of 0.76% (not shown). If, instead, the goal were to reduce testing costs or to reduce the chances of culling herds erroneously diagnosed as infected (ruling disease out), high HSp (> 96.0%) would be achieved with a 3.7% cutoff (not shown). These findings could also be applied in designing herd-level animal surveys, using previously reported algorithms (4,12,15,31,32). For instance, the standard error of the AUC may be used for sample-size calculations (18).
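The purpose-dependent choice of cutoff described above can be sketched as a simple scan over candidate cutoffs. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, candidate cutoffs are restricted to observed within-herd prevalence values, and only herds 1 to 12 of Table 1 are used, so the cutoffs returned differ from the 0.76% and 3.7% quoted in the text.

```python
def sens_spec(pos_scores, neg_scores, cutoff):
    """Herd sensitivity and specificity when test-positive means score > cutoff."""
    se = sum(s > cutoff for s in pos_scores) / len(pos_scores)
    sp = sum(s <= cutoff for s in neg_scores) / len(neg_scores)
    return se, sp


def cutoff_for_target(pos_scores, neg_scores, target, prioritize):
    """Return the observed cutoff meeting a target sensitivity or specificity.

    prioritize="sensitivity": highest cutoff with Se >= target.
    prioritize="specificity": lowest cutoff with Sp >= target.
    Returns None when no observed value meets the target.
    """
    candidates = sorted(set(pos_scores) | set(neg_scores))
    if prioritize == "sensitivity":
        ok = [c for c in candidates if sens_spec(pos_scores, neg_scores, c)[0] >= target]
        return max(ok) if ok else None
    if prioritize == "specificity":
        ok = [c for c in candidates if sens_spec(pos_scores, neg_scores, c)[1] >= target]
        return min(ok) if ok else None
    raise ValueError("prioritize must be 'sensitivity' or 'specificity'")


# Within-herd prevalence for herds 1-12 of Table 1, split by fecal culture result.
culture_positive = [0.0777778, 0.0919540, 0.0440000, 0.0169492]
culture_negative = [0.0208333, 0.0329670, 0.0227273, 0.0357143,
                    0.0146520, 0.0113208, 0.0217391, 0.0316456]

low = cutoff_for_target(culture_positive, culture_negative, 0.96, "sensitivity")
high = cutoff_for_target(culture_positive, culture_negative, 0.96, "specificity")
print(f"cutoff favoring sensitivity: > {low:.4f}")
print(f"cutoff favoring specificity: > {high:.4f}")
```

As in the text, the cutoff that secures high herd sensitivity is lower than the one that secures high herd specificity; which of the two is preferable depends on the relative costs assigned to false-negative and false-positive herds.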
These results should be taken with caution for the following reasons: 1) serological herd-level evaluation to predict paratuberculosis infection may not be valid or generalizable if it is routinely conducted with the same test cutoff; 2) test accuracy may be regarded to be population-specific, and a given population (a herd) may change by virtue of repeated sampling and culling; 3) disease stage was not estimated; and 4) ROC analysis may be sensitive to sample size (7,29). While the overall analysis exceeded the conventional rule of a minimum of 20 observations per diseased and non-diseased groups (33), the herd size-based analyses reported herein approached but did not meet that criterion.
References
- 1.Hermon-Taylor J, Bull TJ, Sheridan JM, Cheng J, Stellakis ML, Sumar N. Causation of Crohn’s disease by Mycobacterium avium subspecies paratuberculosis. Can J Gastroenterol. 2000;14:521–539. doi: 10.1155/2000/798305. [DOI] [PubMed] [Google Scholar]
- 2.Jordan D. Aggregate testing for the evaluation of Johne’s disease herd status. Aust Vet J. 1996;73:16–19. doi: 10.1111/j.1751-0813.1996.tb09947.x. [DOI] [PubMed] [Google Scholar]
- 3. Jordan D, McEwen SA. Herd-level test performance based on uncertain estimates of individual test performance, individual true prevalence and herd true prevalence. Prev Vet Med. 1998;36:187–209. doi: 10.1016/s0167-5877(98)00087-7.
- 4. Audigé L, Beckett S. A quantitative assessment of the validity of animal-health surveys using stochastic modelling. Prev Vet Med. 1999;38:259–276. doi: 10.1016/s0167-5877(98)00135-4.
- 5. Nielsen SS. Variance components of an enzyme-linked immunosorbent assay for detection of IgG antibodies in milk samples to Mycobacterium avium subspecies paratuberculosis in dairy cattle. J Vet Med B. 2002;49:384–387. doi: 10.1046/j.1439-0450.2002.00592.x.
- 6. Collins MT, Wells SJ, Petrini KR, Collins JE, Schultz RD, Whitlock RH. Evaluation of five antibody detection tests for diagnosis of bovine paratuberculosis. Clin Diagn Lab Immunol. 2005;12:685–692. doi: 10.1128/CDLI.12.6.685-692.2005.
- 7. Crossley BM, Zagmutt-Vergara FJ, Fyock TL, Whitlock RL, Gardner IA. Fecal shedding of Mycobacterium avium subsp. paratuberculosis by dairy cows. Vet Microbiol. 2005;107:257–263. doi: 10.1016/j.vetmic.2005.01.017.
- 8. MacCallum RC, Zhang S, Preacher KJ, Rucker DD. On the practice of dichotomization of quantitative variables. Psychol Meth. 2002;7:19–40. doi: 10.1037/1082-989x.7.1.19.
- 9. Choi YK, Johnson WO, Thurmond MC. Diagnosis using predictive probabilities without cutoffs. Stat Med. 2006;25:699–717. doi: 10.1002/sim.2365.
- 10. Collins MT. Interpretation of a commercial bovine paratuberculosis enzyme-linked immunosorbent assay by using likelihood ratios. Clin Diagn Lab Immunol. 2002;9:1367–1371. doi: 10.1128/CDLI.9.6.1367-1371.2002.
- 11. Wang C, Turnbull BW, Gröhn YT, Nielsen SS. Estimating receiver operating characteristic curves with covariates when there is no perfect reference test for diagnosis of Johne’s Disease. J Dairy Sci. 2006;89:3038–3046. doi: 10.3168/jds.S0022-0302(06)72577-2.
- 12. Audigé L, Doherr MG, Hauser R, Salman MD. Stochastic modelling as a tool for planning animal-health surveys and interpreting screening-test results. Prev Vet Med. 2001;49:1–17. doi: 10.1016/s0167-5877(01)00182-9.
- 13. McKenna SLB, Keefe GP, Barkema HW, Sockett DC. Evaluation of three ELISAs for Mycobacterium avium subsp. paratuberculosis using tissue and fecal culture as comparison standards. Vet Microbiol. 2005;110:105–111. doi: 10.1016/j.vetmic.2005.07.010.
- 14. Nielsen SS, Grønbæk C, Agger JF, Houe H. Maximum-likelihood estimation of sensitivity and specificity of ELISA and faecal culture for diagnosis of paratuberculosis. Prev Vet Med. 2002;53:191–204. doi: 10.1016/s0167-5877(01)00280-x.
- 15. Fosgate GT, Adesiyun AA, Hird DW, Hietala SK. Likelihood ratio estimation without a gold standard: A case study evaluating a brucellosis c-ELISA in cattle and water buffalo in Trinidad. Prev Vet Med. 2006;75:189–205. doi: 10.1016/j.prevetmed.2006.02.007.
- 16. Gardner IA, Greiner M. Receiver-operating characteristic curves and likelihood ratios: Improvements over traditional methods for the evaluation and application of veterinary clinical pathological tests. Vet Clin Pathol. 2006;35:8–17. doi: 10.1111/j.1939-165x.2006.tb00082.x.
- 17. Linden A. Measuring diagnostic and predictive accuracy in disease management: An introduction to receiver operating characteristic (ROC) analysis. J Eval Clin Prac. 2006;12:132–139. doi: 10.1111/j.1365-2753.2005.00598.x.
- 18. Obuchowski NA, McClish DK. Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Statist Med. 1997;16:1529–1542. doi: 10.1002/(sici)1097-0258(19970715)16:13<1529::aid-sim565>3.0.co;2-h.
- 19. Kostoulas P, Leontides L, Enøe C, Billinis C, Florou M, Sofia M. Bayesian estimation of sensitivity and specificity of serum ELISA and faecal culture for diagnosis of paratuberculosis in Greek dairy sheep and goats. Prev Vet Med. 2006;76:56–73. doi: 10.1016/j.prevetmed.2006.04.006.
- 20. MedCalc 9.2 [Web site on the Internet]. Available at http://www.medcalc.be. Last accessed November 30, 2006.
- 21. Swets JA. Measuring the accuracy of diagnostic systems. Science. 1988;240:1285–1293. doi: 10.1126/science.3287615.
- 22. Carman S, Josephson G, McEwen B, et al. Field validation of a commercial blocking ELISA to differentiate antibody to transmissible gastroenteritis virus (TGEV) and porcine respiratory coronavirus and to identify TGEV-infected swine herds. J Vet Diagn Invest. 2002;14:97–105. doi: 10.1177/104063870201400202.
- 23. Branscum AJ, Gardner IA, Wagner BA, McInturff PS, Salman MD. Effect of diagnostic error on intracluster correlation coefficient estimation. Prev Vet Med. 2005;69:63–75. doi: 10.1016/j.prevetmed.2005.01.015.
- 24. Norby B, Bartlett PC, Grooms DL, Kaneene JB, Bruning-Fann CS. Use of simulation modeling to estimate herd-level sensitivity, specificity, and predictive values of diagnostic tests for detection of tuberculosis in cattle. Am J Vet Res. 2005;66:1285–1291. doi: 10.2460/ajvr.2005.66.1285.
- 25. van Schaik G, Stehman SM, Jacobson RH, Schukken YH, Shin SJ, Lein DH. Cow-level evaluation of a kinetics ELISA with multiple cutoff values to detect fecal shedding of Mycobacterium avium subspecies paratuberculosis in New York State dairy cows. Prev Vet Med. 2005;72:221–236. doi: 10.1016/j.prevetmed.2005.01.019.
- 26. Deeks JJ, Altman DG. Diagnostic tests 4: Likelihood ratios. BMJ. 2004;329:168–169. doi: 10.1136/bmj.329.7458.168.
- 27. Jark U, Ringena I, Franz B, Gerlach GF, Beyerbach M, Franz B. Development of an ELISA technique for serodiagnosis of bovine paratuberculosis. Vet Microbiol. 1997;57:189–198. doi: 10.1016/s0378-1135(97)00125-9.
- 28. Munjal SK, Boehmer J, Beyerbach M, Strutzberg-Minder K, Homuth M. Evaluation of a LAM ELISA for diagnosis of paratuberculosis in sheep and goats. Vet Microbiol. 2004;103:107–114. doi: 10.1016/j.vetmic.2004.07.019.
- 29. Greiner M, Pfeiffer D, Smith RD. Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Prev Vet Med. 2000;45:23–41. doi: 10.1016/s0167-5877(00)00115-x.
- 30. Brown CD, Davis HT. Receiver operating characteristics curves and related decision measures: A tutorial. Chemom Intell Lab Syst. 2006;80:24–38.
- 31. Humphry RW, Cameron A, Gunn GJ. A practical approach to calculate sample size for herd prevalence surveys. Prev Vet Med. 2004;65:173–188. doi: 10.1016/j.prevetmed.2004.07.003.
- 32. Wagner B, Salman MD. Strategies for two-stage sampling designs for estimating herd-level prevalence. Prev Vet Med. 2004;66:1–17. doi: 10.1016/j.prevetmed.2004.07.008.
- 33. Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L. The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform. 2005;38:404–415. doi: 10.1016/j.jbi.2005.02.008.



