Scientific Reports. 2025 Jun 5;15:19728. doi: 10.1038/s41598-025-05223-6

Comparative performance and age dependence of tuberculin and defined antigen bovine tuberculosis skin tests assessed with Bayesian latent class analysis

Matios Lakew 1,2,✉,#, Andrew J K Conlan 3,✉,#, Biniam Tadesse 2, Sreenidhi Srinivasan 4, Bekele Yalew 2, Teferi Benti 2, Abebe Olani 2, Getachew Kinfe 2, Tigist Ashagrie 2, Ashebir Abebe 2, Abebe Fromsa 1,5, Musse Girma Abdela 1, Berecha Bayissa 6, Solomon Gebre 2, Adane Mihret 7, Getnet Abie Mekonnen 2, Gobena Ameni 1,8, Hagos Ashenafi 1, James L N Wood 3, Balako Gumi 1, Vivek Kapur 9,10
PMCID: PMC12141502  PMID: 40473835

Abstract

Tuberculin skin tests (TST), the primary diagnostic tool for bovine tuberculosis (bTB), cross-react with the BCG vaccine. Recently developed defined antigen skin tests (DSTs) aim to differentiate infected amongst vaccinated animals. We evaluated the field performance of different interpretations of the TST and DSTs relative to IGRA and IDEXX M. bovis antibody tests. This panel of tests was assessed in 446 unvaccinated cattle across 22 Ethiopian dairy herds using Bayesian latent class models. We extended the standard Walter-Hui model to include age-related effects to explore evidence of the presence of diagnostic anergy. The latent class models estimate the sensitivity and specificity of the DSTs at 84–88% and 79–85%, respectively. The DSTs perform intermediately between the comparative intradermal test (CIT, sensitivity 77%, specificity 100%) and the single intradermal test (SIT, sensitivity 99%, specificity 76%). We observed significant age-related declines in test sensitivity, most notably for the CIT (declining from 75% to 52% over 9 years) and DST10 (83% to 68%), while other tests showed more stable sensitivity across age groups. This variable pattern across tests suggests mechanisms beyond simple age-related anergy. Together, these findings demonstrate that the DSTs’ higher sensitivity than the CIT and comparable or better specificity than the SIT, combined with their ability to distinguish vaccinated animals, create a viable pathway for implementing BCG vaccination programs. Given the absence of any gold standard definition of infection with bTB, latent class analyses are essential to assess the relative performance of different diagnostic tests. While our results provide encouraging news for the sensitivity of the new DSTs, the high prevalence of bTB within our study population makes our design underpowered to assess their specificity.
Future research, including assessment of the specificity of DSTs in disease-free populations, optimization of test formulation, and validation through large-scale field trials, is essential to fully establish the case for their use in vaccination and surveillance programs.

Keywords: Bovine tuberculosis, Defined antigen skin test, Tuberculin, DIVA, Age-dependent diagnostics, Bayesian latent class analysis, BCG vaccination, Disease control

Subject terms: Immunology, Microbiology

Introduction

Bovine tuberculosis (bTB) is a chronic infectious disease of cattle, caused by members of the Mycobacterium tuberculosis complex (MTBC). Tuberculin skin tests (TST), developed over a century ago1, are the standard tests used for the diagnosis of bTB in live cattle2. The most commonly used formats of the TST today are the single intradermal test (SIT), which uses bovine purified protein derivative (PPD) alone, and the comparative intradermal test (CIT), which employs both avian and bovine PPD. The TST has been used to eliminate bTB from, or significantly reduce its prevalence in, several countries3,4. Nevertheless, the limitations in the diagnostic performance of the test are well known and have been extensively reported5.

The sensitivity of the PPD-based tuberculin skin test can be compromised by various factors, leading to false negative results. Common reasons for false negative tuberculin reactions include testing during the pre-allergic (unreactive or occult) period, desensitization to bovine tuberculin, cross-reaction with environmental mycobacteria, immune suppression during the early post-partum period, low potency of PPDs and operator error5–9. Cattle in an advanced stage of disease can also enter a so-called “anergic” state, in which, despite extensive disease, they fail to react to tuberculin8. However, despite the potential importance of anergy for the effectiveness of control in endemically infected herds, neither the rate at which infected animals progress to an anergic state nor the prevalence of anergic animals has ever been estimated. Given the chronic nature of bTB, we would expect the prevalence of infection to strictly increase with age in the absence of anergy, selective removal of infected animals through testing, or excess rates of mortality10. Thus, a decline in the apparent prevalence of bTB with age in endemically infected populations would be the expected signature of diagnostic anergy. However, this signature could also be explained by excess mortality among infected animals, or by farmers removing animals on the basis of their productivity. A decline in the apparent prevalence of bTB with age based on a single diagnostic test is therefore a necessary, but not sufficient, condition for the presence of anergic animals within a herd. Establishing the presence of anergic animals for any particular diagnostic test therefore requires estimating differential patterns of response with age for multiple tests applied to the same animals.
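
The argument above can be illustrated with a simple age-catalytic model. The sketch below assumes a constant force of infection λ, so that true prevalence at age a is 1 − exp(−λa) and can only increase with age; the expected apparent prevalence then turns over only if test sensitivity falls with age (an anergy-like effect) or, outside this toy model, if infected animals are removed. The function names, the value of λ and the linear sensitivity decline are all hypothetical choices for illustration, not estimates from this study.

```python
import math

def true_prevalence(age: float, foi: float) -> float:
    """Catalytic model: probability of having been infected by `age` under a
    constant force of infection `foi` (per year), ignoring removal, excess
    mortality and recovery."""
    return 1.0 - math.exp(-foi * age)

def apparent_prevalence(age, foi, sensitivity, specificity=1.0):
    """Expected proportion of test-positive animals at `age`. `sensitivity` is
    a function of age, so an anergy-like effect can be represented as a
    sensitivity that declines in older animals."""
    p = true_prevalence(age, foi)
    return p * sensitivity(age) + (1.0 - p) * (1.0 - specificity)

def constant_se(age):
    return 0.85  # hypothetical: no anergy

def declining_se(age):
    return max(0.0, 0.85 - 0.08 * age)  # hypothetical anergy-like decline

FOI = 0.15  # hypothetical force of infection per year
ages = range(1, 10)
no_anergy = [apparent_prevalence(a, FOI, constant_se) for a in ages]
with_anergy = [apparent_prevalence(a, FOI, declining_se) for a in ages]

# Without anergy the apparent prevalence rises monotonically with age.
print(all(x < y for x, y in zip(no_anergy, no_anergy[1:])))  # True
# With a declining sensitivity the curve peaks and then falls.
peak_age = 1 + with_anergy.index(max(with_anergy))
print(peak_age)  # 4
```

As the text notes, observing such a turnover for one test cannot by itself separate anergy from removal of infected animals; that requires comparing the age patterns of several tests on the same animals.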

Various factors that reduce the specificity of the TST have also been documented, resulting in false positive results. Tuberculin PPD contains antigens that cross react with the BCG vaccine: the only current vaccine candidate for cattle vaccination against bTB2,5,9,11. Therefore, the use of cattle vaccination to control bTB depends on the development and validation of a test which can successfully distinguish between infected and vaccinated animals (DIVA capability).

To address these challenges, defined antigen skin tests (DSTs) have been developed using M. bovis antigens such as ESAT-6, CFP-10, and Rv3615c12–15. DSTs offer improved specificity and DIVA (Differentiating Infected amongst Vaccinated Animals) capability16–20. The successful implementation of DSTs could revolutionize bTB control strategies, particularly in countries considering BCG vaccination. However, the comparative performance of DSTs against traditional tuberculin-based tests in field conditions, especially across different age groups, remains poorly understood. This knowledge gap is critical, as age-related effects on test performance could significantly impact the effectiveness of control programs.

In the absence of a gold standard definition of infection, latent class analysis can be used to evaluate the performance of a competing set of diagnostic tests21,22. Given results for at least two diagnostic tests applied to the same animals, sampled from at least two populations with different (but non-zero) prevalence, the foundational Walter-Hui (WH) latent class model can in principle infer both the true prevalence of infection within each population and the sensitivity and specificity of each test. Such models have previously been applied to diagnostics for bTB in cattle23–25. The Walter-Hui model depends on two key assumptions. Firstly, the form of the likelihood requires that the panel of tests being assessed are (or can be assumed to be) conditionally independent. A conditional dependence between two tests would be introduced should the result of one test influence another independently of the true infection status of the animal. A biological dependence between tests (such as would be expected for DSTs and TSTs that target the same basic immune responses) does not necessarily imply a statistical dependence between test results. Predictive model checks can be used to test this assumption, and we use them in this study to validate our models. The second, less discussed assumption is that the set of tests can in principle identify all infected animals within a sample or population. If there is a subset of individuals that the available diagnostic tests cannot consistently identify, then unknown biases may be introduced into the estimated test characteristics. For this reason, we would argue that the primary utility of latent class analysis, and the objective of this study, is to assess the relative differences in diagnostic performance between the evaluated tests rather than their absolute values.
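
For two tests in two populations, the Walter-Hui likelihood writes the probability of each joint result as a mixture over the latent infection status, and this is where conditional independence enters: given true status, the two test probabilities simply multiply. The sketch below is a minimal illustration with hypothetical function names and counts, not the authors' implementation (which extends this to more tests and age effects). It also shows why two populations are needed: six parameters (two prevalences, two sensitivities, two specificities) are matched by six degrees of freedom (three free cell probabilities in each population).

```python
import math
from typing import Dict, Tuple

Cell = Tuple[int, int]  # (result of test 1, result of test 2); 1 = positive

def cell_probabilities(prev, se1, sp1, se2, sp2) -> Dict[Cell, float]:
    """Joint probability of each pair of results for two tests that are
    conditionally independent given the latent infection status."""
    probs = {}
    for r1 in (0, 1):
        for r2 in (0, 1):
            # Infected animals test positive with probability Se.
            p_inf = (se1 if r1 else 1 - se1) * (se2 if r2 else 1 - se2)
            # Uninfected animals test positive with probability 1 - Sp.
            p_uninf = ((1 - sp1) if r1 else sp1) * ((1 - sp2) if r2 else sp2)
            probs[(r1, r2)] = prev * p_inf + (1 - prev) * p_uninf
    return probs

def walter_hui_loglik(counts, prevalences, se1, sp1, se2, sp2) -> float:
    """Multinomial log-likelihood (up to a constant) summed over populations;
    counts[k] maps each cell to the number of animals in population k."""
    ll = 0.0
    for table, prev in zip(counts, prevalences):
        probs = cell_probabilities(prev, se1, sp1, se2, sp2)
        ll += sum(n * math.log(probs[cell]) for cell, n in table.items())
    return ll

# Hypothetical counts for two herds with different prevalence.
herd_a = {(1, 1): 30, (1, 0): 8, (0, 1): 5, (0, 0): 57}
herd_b = {(1, 1): 4, (1, 0): 6, (0, 1): 3, (0, 0): 87}
probs = cell_probabilities(0.3, 0.8, 0.95, 0.85, 0.9)
print(round(sum(probs.values()), 10))  # 1.0
loglik = walter_hui_loglik([herd_a, herd_b], [0.35, 0.05], 0.8, 0.95, 0.85, 0.9)
print(loglik)
```

In a Bayesian fit, this log-likelihood would be combined with priors on the six parameters and explored by MCMC.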

In this study, we extend the Walter-Hui Bayesian latent class model for diagnostic test evaluation to assess and compare the field performance of the DST with standard tuberculin-based tests, using a larger and more representative sample set (446 animals from 22 dairy herds) than has previously been reported. In contrast to previous studies on diagnostics for bTB, we collected information on the age of animals, which allows us to extend the baseline Walter-Hui model to incorporate an age-catalytic model in which the prevalence of infection and the sensitivity of diagnostic tests can vary with age. We use these models to test for the presence of anergic animals within Ethiopian herds.

Results

The performance of the defined antigen skin test (DST) in the field was compared with that of the PPD-based tuberculin skin test, IGRA and IDEXX M. bovis antibody test on dairy farms. The study involved 446 animals, 142 of which underwent repeated testing. The apparent within-herd prevalence during the first round of testing, as measured by the CIT, ranged from 0 to 87% (to the nearest whole %), with a mean of 20% and a median of 7%. The distribution of prevalences within herds is therefore right-skewed, with a small number of extremely high prevalence herds pulling the mean upwards. The apparent prevalence and degree of skew varied between tests, with the highest average proportion of positive results within herds (and least skew) for the SIT, with a mean of 47% and median of 50% (Table 1).

Table 1.

Summary of the apparent within-herd prevalence at first round of testing (22 herds, n = 446 animals).

Test Mean Median Lower quartile Upper quartile
CIT 20 7 0 28
CITz 42 40 20 59
DST10 36 31 13 49
DST30 40 39 18 56
DSTF 38 33 19 48
IDEXX 7 1 0 14
IGRA PPD 31 26 8 48
SIT 47 50 21 65

CIT, Comparative Intradermal Test, threshold > 4 mm; CITz, CIT threshold > 0 mm; DST10, DST30, DSTF, Defined-antigen Skin Tests, threshold ≥ 2 mm; IDEXX, ELISA for the detection of M. bovis antibody; IGRA PPD, Interferon Gamma Release Assay with Purified Protein Derivative; SIT, Single Intradermal Test, threshold > 2 mm.
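
The interpretations in the footnote above amount to simple decision rules. The sketch below assumes that each skin-test reading is the increase in skin thickness (mm) at the injection site, and that the CIT and CITz are applied to the bovine-minus-avian difference in those increases; the footnote gives only the thresholds, so these input definitions, the `Readings` record and the `interpret` function are hypothetical. The IGRA and IDEXX readouts are taken on whatever scale the thresholds refer to.

```python
from dataclasses import dataclass

@dataclass
class Readings:
    """Hypothetical per-animal readings; skin increases are in mm."""
    bovine_ppd: float  # increase at the bovine PPD site
    avian_ppd: float   # increase at the avian PPD site
    dst10: float
    dst30: float
    dstf: float
    igra: float        # IGRA readout on the study's scale
    idexx: float       # IDEXX readout on the study's scale

def interpret(r: Readings) -> dict:
    """Apply the cut-offs listed in the Table 1 footnote."""
    comparative = r.bovine_ppd - r.avian_ppd  # assumed CIT input
    return {
        "CIT": comparative > 4,
        "CITz": comparative > 0,
        "SIT": r.bovine_ppd > 2,
        "DST10": r.dst10 >= 2,
        "DST30": r.dst30 >= 2,
        "DSTF": r.dstf >= 2,
        "IGRA": r.igra >= 0.1,
        "IDEXX": r.idexx >= 0.3,
    }

example = Readings(bovine_ppd=5.0, avian_ppd=2.5, dst10=2.0, dst30=1.5,
                   dstf=3.0, igra=0.25, idexx=0.1)
print(interpret(example))
```

Note that the DSTs use "≥" while the PPD tests use ">", so a reading of exactly 2 mm counts as positive for a DST but negative for the SIT.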

Age-dependence of apparent prevalence

The association between the apparent prevalence of bTB and age varied between the eight different diagnostic tests (Fig. 1). For all tests other than the IDEXX test, which had extremely low sensitivity in this study, the risk of positivity increases up to a peak at around 6 years of age and is followed by a (weaker) decline of the kind we would expect from the accumulation of anergic animals in these age groups (but see below). To quantify this effect, we calculated the relative difference between the peak apparent prevalence for each diagnostic and the value estimated by the GAM at the upper age bound (9 years). All diagnostics show a drop in apparent prevalence of varying magnitude, suggesting different age-related effects for each test.

Fig. 1.

Age dependence of apparent prevalence for each diagnostic test. The apparent prevalence of bTB (proportion of positive tests) was estimated using eight tests: CIT (> 4 mm), CITz (> 0 mm), SIT (> 2 mm), the three DSTs (DST10, DST30, DSTF ≥ 2 mm), IGRA (≥ 0.1), and IDEXX (≥ 0.3) for 9 discrete annual age groups. Animals older than ten years were excluded from this analysis due to the sparsity of animals surviving to this age. Raw values are presented as points with lines indicating 95% (binomial) confidence intervals. The trend with age for each test is illustrated by a thin plate spline with 4 knots estimated from a (binomial) GAM. Standard errors for the smoothed trend line are plotted as a (shaded) ribbon strip. All plots demonstrate a relatively weak association with age, with the highest risk of positivity in middle-aged cattle (the 4–6 year age groups). Note that the apparent drop in prevalence for animals in age group 7 results both from the declining number of animals surviving beyond this age and from the uneven distribution of these animals between herds with very different average levels of prevalence. The relative difference between the peak prevalence and the estimated prevalence at the upper age limit (9 years) from the GAM demonstrates a wide range of variability in the magnitude of the potential “anergy” effect (bottom right inset panel). For the relative difference, uncertainty is estimated (and presented as line intervals) using the upper and lower standard errors on the mean of the estimated thin-plate spline from the GAM smoother.
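
A minimal sketch of the two quantities shown in Fig. 1: per-age-group apparent prevalence with a 95% binomial interval, and the relative difference between the peak of a smoothed curve and its value at the upper age bound. The Wilson score interval is used here as one common choice (the caption does not state which binomial interval was used), the relative difference is assumed to be (peak − final value)/peak, and the smoothed curve is supplied directly rather than refitted as a GAM. All counts and curve values are hypothetical.

```python
import math

def wilson_interval(k: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion k/n."""
    if n == 0:
        raise ValueError("empty age group")
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def relative_drop(curve):
    """(peak - final value) / peak for a smoothed prevalence curve ordered
    by age; 0 means no decline after the peak."""
    peak = max(curve)
    return (peak - curve[-1]) / peak

# Hypothetical positives / animals tested in age groups 1..9.
positives = [3, 10, 18, 25, 27, 24, 12, 9, 6]
tested = [40, 55, 60, 62, 58, 52, 30, 25, 20]
for age, (k, n) in enumerate(zip(positives, tested), start=1):
    lo, hi = wilson_interval(k, n)
    print(f"age {age}: {k / n:.2f} ({lo:.2f}-{hi:.2f})")

# Hypothetical smoothed curve at ages 1..9 (e.g. GAM predictions).
smoothed = [0.08, 0.18, 0.28, 0.38, 0.44, 0.45, 0.42, 0.38, 0.33]
print(round(relative_drop(smoothed), 3))  # 0.267
```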

Latent class analyses

As described in the Methods section, we estimated four candidate models (WH, WH-A, WH-C and WH-AC) to quantify the evidence in support of the anergy hypothesis, which was motivated by the qualitative age patterns described above (Fig. 1). The WH and WH-A models were rejected due to evidence of a systematic lack of fit. These models, which do not include an age-catalytic term for the “true” (or latent) prevalence, systematically overestimate prevalence in younger age groups and underestimate it in older age groups (Supplementary Figs. 2, 3). Given that the Walter-Hui framework relies on the trade-off between the rates of false and true positives expected in populations with differing prevalence, it is no surprise that this lack of fit with respect to age leads to a systematic overestimate of the pairwise probability of agreement between tests (Supplementary Figs. 6, 7) in younger age groups (1–2 years) and an underestimate in older age groups (6–7 years). The two remaining models with a catalytic component (WH-C and WH-AC) provide substantially better fits for both the age distribution (Supplementary Figs. 4, 5) and the pairwise agreement between tests (Supplementary Figs. 8, 9), demonstrating no evidence for conditional dependence between test results.
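
Under conditional independence, the expected probability that two tests are both positive in a group with true prevalence p is p·Se₁·Se₂ + (1 − p)(1 − Sp₁)(1 − Sp₂). A posterior predictive check simulates replicate counts of such pairs from posterior draws and asks whether the observed count looks typical. The sketch below takes "agreement" to mean joint positivity and uses a simple one-sided p-value; both are assumptions, since the exact statistic behind the supplementary figures is not given here, and the draws and observed count are hypothetical.

```python
import random

def prob_both_positive(prev, se1, sp1, se2, sp2):
    """P(test 1 +, test 2 +) assuming conditional independence given status."""
    return prev * se1 * se2 + (1 - prev) * (1 - sp1) * (1 - sp2)

def ppc_pvalue(observed, n_animals, draws, rng):
    """Share of replicated counts at least as large as the observed count.
    Values close to 0 or 1 flag a systematic misfit for this age group."""
    exceed = 0
    for d in draws:
        p = prob_both_positive(**d)
        replicated = sum(rng.random() < p for _ in range(n_animals))
        exceed += replicated >= observed
    return exceed / len(draws)

rng = random.Random(1)
# Hypothetical posterior draws for one age group and one pair of tests.
draws = [
    {"prev": rng.betavariate(30, 70),
     "se1": rng.betavariate(80, 20), "sp1": rng.betavariate(95, 5),
     "se2": rng.betavariate(85, 15), "sp2": rng.betavariate(85, 15)}
    for _ in range(2000)
]
print(ppc_pvalue(observed=24, n_animals=100, draws=draws, rng=rng))
```

Repeating this check within each age group is what exposes the age-related misfit of models without a catalytic term.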

Leave-one-out (LOO) cross-validation suggests that the model with both age-dependent prevalence and test sensitivity (WH-AC) has a significantly better fit to the data based on the difference in expected log pointwise predictive density (elpd; Table 2), and it is therefore chosen as our final model.

Table 2.

Model comparison through difference in expected log pointwise predictive density (elpd).

Model elpd difference Standard error
WH-AC 0.0 0.0
WH-AC (uniform) − 0.1 2.9
WH-C − 5.0 2.6
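
Table 2 follows the usual convention of reporting each model's elpd relative to the best model, with a standard error computed from the paired pointwise differences, sqrt(n × sample variance). A minimal sketch of that calculation, assuming pointwise LOO elpd values are already available for each model (for example from the R `loo` package or ArviZ); the function name and the simulated pointwise values are hypothetical.

```python
import math
import random
import statistics

def elpd_difference(pointwise_best, pointwise_other):
    """Summed difference in pointwise LOO elpd (other minus best) and its
    standard error, sqrt(n * sample variance of the paired differences)."""
    if len(pointwise_best) != len(pointwise_other):
        raise ValueError("models must be scored on the same observations")
    d = [o - b for b, o in zip(pointwise_best, pointwise_other)]
    return sum(d), math.sqrt(len(d) * statistics.variance(d))

rng = random.Random(0)
n_obs = 446
best = [rng.gauss(-1.0, 0.5) for _ in range(n_obs)]      # hypothetical
other = [b + rng.gauss(-0.011, 0.12) for b in best]      # hypothetical
diff, se = elpd_difference(best, other)
print(f"elpd_diff = {diff:.1f}, se = {se:.1f}, ratio = {diff / se:.1f}")
```

For the WH-C row of Table 2, the difference corresponds to about 1.9 standard errors (−5.0/2.6), while the uniform-prior refit differs by only a small fraction of its standard error.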

Test sensitivity and specificity

The IDEXX M. bovis antibody test has the lowest estimated sensitivity of the panel of evaluated tests, at 19% (95% CrI 16–28). However, this is likely due to the timing of sample collection, which was conducted before stimulation with the tuberculin-based skin test. Setting aside the results for the IDEXX M. bovis antibody test, our final selected model places the two most common formats of the tuberculin test at the extremes of estimated sensitivity and specificity. The SIT has the highest estimated sensitivity at 99% (95% CrI 96–100) but the lowest specificity at 76% (95% CrI 70–82), while the CIT has the highest specificity at 100% (95% CrI 98–100) but a lower sensitivity of 77% (95% CrI 67–84). The new DIVA tests (DST10, DST30 and DSTF) are estimated to have diagnostic characteristics intermediate between these extremes, with estimated sensitivities of 84, 88 and 87%, respectively (Table 3), greater than that of the CIT but lower than that of the SIT. The sensitivity and specificity of the peptide-based and fusion protein-based DSTs were qualitatively similar, with overlapping 95% credible intervals (Fig. 2, Table 3).

Table 3.

Estimated sensitivity and specificity of diagnostic tests with 95% credible intervals (CrI), to the nearest whole percent. Youden’s Index gives a measure of the global accuracy of each test. Defined as (sensitivity + specificity) − 1, the value ranges from 0 (poor) to 1 (perfect).

Model Test Sensitivity (95% CrI) Specificity (95% CrI) Youden’s index (95% CrI)
WH-AC CIT 77 (67–84) 100 (98–100) 0.77 (0.7–0.8)
CITz 93 (84–98) 84 (79–89) 0.80 (0.7–0.9)
DST10 84 (73–90) 84 (79–89) 0.68 (0.6–0.8)
DST30 88 (76–94) 79 (74–84) 0.67 (0.5–0.8)
DSTF 87 (75–94) 85 (80–89) 0.72 (0.6–0.8)
IDEXX 19 (16–28) 99 (97–100) 0.17 (0.1–0.3)
IGRA 83 (69–91) 92 (88–95) 0.75 (0.6–0.8)
SIT 99 (96–100) 76 (70–82) 0.75 (0.7–0.8)
WH-AC (uniform priors) CIT 78 (68–85) 100 (100–100) 0.78 (0.7–0.8)
CITz 98 (91–100) 85 (80–89) 0.83 (0.7–0.9)
DST10 85 (74–91) 84 (79–89) 0.69 (0.6–0.8)
DST30 89 (77–95) 79 (74–84) 0.68 (0.5–0.8)
DSTF 89 (76–95) 85 (80–89) 0.73 (0.6–0.8)
IDEXX 18 (16–27) 100 (98–100) 0.18 (0.2–0.3)
IGRA 85 (72–92) 92 (88–96) 0.78 (0.6–0.9)
SIT 100 (99–100) 76 (70–82) 0.76 (0.7–0.8)

The test cutoffs are as follows: CIT (> 4 mm), CITz (> 0 mm), SIT (> 2 mm), DST10, DST30, and DSTF (all ≥ 2 mm), IGRA (≥ 0.1), and IDEXX (≥ 0.3).
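
Because Youden's index J = Se + Sp − 1 combines two uncertain parameters, its credible interval is best computed draw by draw from the joint posterior rather than by combining the separate sensitivity and specificity intervals; for the same reason, the median of J need not equal the sum of the tabulated medians minus one. A minimal sketch with hypothetical draws (drawn independently here for simplicity; with real joint posterior draws, any correlation between Se and Sp is automatically respected):

```python
import math
import random

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def youden_summary(se_draws, sp_draws, level=0.95):
    """Median and equal-tailed credible interval of J = Se + Sp - 1,
    computed from paired posterior draws."""
    j = sorted(se + sp - 1.0 for se, sp in zip(se_draws, sp_draws))
    tail = (1.0 - level) / 2.0
    return quantile(j, 0.5), quantile(j, tail), quantile(j, 1.0 - tail)

rng = random.Random(42)
# Hypothetical posterior draws, loosely centred on the DST10 row of Table 3.
se_draws = [rng.betavariate(84, 16) for _ in range(4000)]
sp_draws = [rng.betavariate(84, 16) for _ in range(4000)]
print("J = %.2f (%.2f-%.2f)" % youden_summary(se_draws, sp_draws))
print(round(0.84 + 0.84 - 1, 2))  # 0.68, the tabulated DST10 value
```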

Fig. 2.

Sensitivity and specificity of the 8 diagnostic tests as estimated by our selected final model (WH-AC). The respective cut-offs for the eight tests are: DST10, DST30 and DSTF (all at ≥ 2 mm), CIT (> 4 mm), CITz (CIT > 0 mm), SIT (> 2 mm), IGRA (≥ 0.1), and IDEXX (≥ 0.3). Point (median, red dot) estimates of sensitivity (left) and specificity (right) are presented for each test, with lines indicating the 95% credible intervals (thick red line) and posterior range (thinner red line). A shaded density strip (grey scale) for each quantity illustrates the shape of the posterior distribution, with intensity of ink proportional to the posterior density.

Age dependence of test sensitivity

The final selected WH-AC model estimates varying patterns of age-dependence for all eight tests (Fig. 3). However, the CIT and DST10 tests are the only tests where we estimate a biologically significant decline in the estimated sensitivity with age corresponding to a drop from ~ 75% in calves to ~ 52% in adults aged 9 years old for the CIT, and from ~ 83% to ~ 68% for the DST10.

Fig. 3.

Dependence of test sensitivity on age. The estimated sensitivity of the eight tests (interpretations) as a function of age for our final selected model (WH-AC). Posterior predictive distributions for the respective test sensitivity in each discrete age group are presented as point intervals with median estimated value (red dot), 95% intervals (thick red line) and posterior predictive range (thinner red line). A linear smoothed trend line is added (red) to illustrate the relative strength of the estimated age-effect for each test.
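
This section does not state the functional form of the age effect on sensitivity. One common choice, assumed here purely for illustration, is a logit-linear model, logit Se(a) = α + β·a. The sketch solves for the α and β that pass through the approximate CIT values quoted above (about 75% and 52%); treating "calves" as age 1 is a further assumption, and the helper names are hypothetical.

```python
import math

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def fit_two_points(age1, se1, age2, se2):
    """Intercept and slope of a logit-linear sensitivity curve passing
    exactly through two (age, sensitivity) points."""
    slope = (logit(se2) - logit(se1)) / (age2 - age1)
    intercept = logit(se1) - slope * age1
    return intercept, slope

# Approximate CIT values from the text; age 1 for calves is an assumption.
alpha, beta = fit_two_points(1, 0.75, 9, 0.52)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f} per year")
for age in range(1, 10):
    print(age, round(inv_logit(alpha + beta * age), 2))
# Multiplicative change in the odds of a positive result per year of age.
print(round(math.exp(beta), 3))
```

Under this assumed form, the CIT decline corresponds to the odds of detecting an infected animal falling by roughly 12% per year of age.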

Inferred “true” prevalence within herds

In a latent class model, the estimates of test sensitivity and specificity are conditional on, and jointly estimated with, the “true” prevalence within each herd as inferred from the full set of diagnostic tests. For our purposes, the inferred prevalence is essentially a nuisance parameter to be integrated out when estimating the test characteristics. For completeness, we present the inferred within-herd prevalence in the Supplementary Information (Supplementary Figs. 10 and 11). We note that estimates of the average within-herd prevalence are imprecise because of the significant variation in risk of infection with age within each herd. Failing to collect the age of cattle, and to adjust for this systematic variation within herds, could lead to underestimates of the uncertainty in both the test characteristics and the inferred prevalence from latent class models for bTB.

Sensitivity of estimates to prior assumptions

To evaluate the sensitivity of our model estimates to our assumed prior distributions, we refitted the final selected model (WH-AC) using uniform prior distributions for the test parameters and the log-transformed force of infection. The predictive performance of this alternative fit was statistically indistinguishable (elpd difference −0.1, standard error 2.9; Table 2), and the estimates of sensitivity and specificity were nearly identical, differing by at most 5 percentage points (CITz sensitivity; Table 3).

Test agreement for repeat tested animals

As previously described, 142 animals were tested twice, 2–3 months apart (Supplementary Fig. 13). The percentage of agreement (concordant results) was higher for tests that used the standard PPD antigen (CIT, CITz, SIT and IGRA), ranging from 83 to 87%, whereas the percentage of agreement between the initial and repeated tests for the defined antigen skin tests (DST10, DST30 and DSTF) ranged from 73 to 74% (Table 4).

The repeatability of the tests was assessed using Cohen’s kappa coefficient. Good (substantial) agreement between the initial and repeated test results (kappa 0.66 to 0.75) was observed for the CITz, SIT and IGRA tests, whereas kappa values for the CIT and the defined antigen skin tests were between 0.46 and 0.57, indicating moderate agreement (Table 4). However, only the difference between the CITz and the DSTs is statistically significant, as judged by non-overlapping 95% confidence intervals; the 95% confidence intervals for all other pairs of tests overlap.

Table 4.

Concordant and discordant test results of twice tested animals and test agreement between initial and repeated test using Cohen’s Kappa Coefficient (n = 142).

Initial and repeated test CIT CITz DST10 DST30 DSTF IGRA SIT
Initial and repeated test result, Number positive (%)
Initial 39 (27.5) 75 (52.8) 77 (54.2) 83 (58.5) 80 (56.3) 58 (40.8) 82 (57.7)
Repeated 37 (26.1) 69 (48.6) 60 (42.3) 67 (47.2) 61 (43) 55 (38.7) 83 (58.5)
Concordant and discordant test result (Initial/Repeated), n (%)
+/+ 26 (18) 63 (44) 50 (35) 56 (39) 51 (36) 45 (32) 72 (51)
+/− 13 (9) 12 (9) 27 (19) 27 (19) 29 (20) 12 (9) 10 (7)
−/+ 11 (8) 6 (4) 10 (7) 11 (8) 10 (7) 10 (7) 11 (8)
−/− 92 (65) 61 (43) 55 (39) 48 (34) 52 (37) 75 (52) 49 (34)
Concordant 118 (83) 124 (87) 105 (74) 104 (73) 103 (73) 120 (84) 121 (85)
Discordant 24 (17) 18 (13) 37 (26) 38 (27) 39 (27) 22 (16) 21 (15)
Test agreement using Cohen’s Kappa Coefficient
Kappa coefficient (95% CI) 0.57 (0.42–0.72) 0.75 (0.64–0.86) 0.49 (0.35–0.62) 0.47 (0.33–0.61) 0.46 (0.32–0.6) 0.66 (0.54–0.79) 0.7 (0.58–0.82)
Agreement Moderate Good Moderate Moderate Moderate Good Good
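
Cohen's kappa compares the observed proportion of concordant results with the proportion expected by chance from the marginal positivity rates of the two test rounds. The sketch below applies it to the CIT and CITz counts in Table 4, with a simple large-sample (Wald) interval; the function name is hypothetical, and the interval formula is one of several in use, so it need not match the tabulated intervals exactly.

```python
import math

def cohens_kappa(pp: int, pn: int, np_: int, nn: int, z: float = 1.96):
    """Cohen's kappa for a 2x2 table of initial/repeat results.
    pp: +/+, pn: +/-, np_: -/+, nn: -/-.
    Returns kappa and an approximate Wald interval."""
    n = pp + pn + np_ + nn
    observed = (pp + nn) / n
    init_pos = (pp + pn) / n   # positivity at the initial test
    rep_pos = (pp + np_) / n   # positivity at the repeat test
    expected = init_pos * rep_pos + (1 - init_pos) * (1 - rep_pos)
    kappa = (observed - expected) / (1 - expected)
    # Simple large-sample standard error.
    se = math.sqrt(observed * (1 - observed) / (n * (1 - expected) ** 2))
    return kappa, kappa - z * se, kappa + z * se

# Counts from Table 4 (+/+, +/-, -/+, -/-).
for name, cells in {"CIT": (26, 13, 11, 92), "CITz": (63, 12, 6, 61)}.items():
    k, lo, hi = cohens_kappa(*cells)
    print(f"{name}: kappa = {k:.2f} ({lo:.2f}-{hi:.2f})")
```

This reproduces the tabulated kappa values of 0.57 and 0.75. The Wald interval matches the CITz interval and is slightly wider than the tabulated CIT interval, which suggests a different standard error was used for the table.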

Discussion

This study assessed the field performance of the DST and compared it with PPD-based tuberculin tests. Animals from intensive dairy farms with different levels of bTB prevalence were included in the study, enabling a latent class analysis to be carried out. Skin tests were performed using peptide-based DSTs (DST10, DST30), a fusion protein-based DST (DSTF), and PPDs (SIT, CIT, CITz). Additionally, blood-based IGRA and IDEXX M. bovis antibody tests were conducted as additional points of reference. The sensitivity and specificity of all eight tests were estimated using a Bayesian latent class model. We extended the baseline Walter-Hui model to adjust for the qualitative patterns of age-dependence in test positivity observed in Ethiopian herds and to test for evidence of the impact of anergy on tuberculin test results.

Conditional dependence and model validation

Conditional dependence is a common concern for the application of latent class methods, particularly in our case, where we compare different formats of the basic tuberculin test that depend on measuring the response to subsets of the same set of antigens. In this study, we explicitly tested this assumption using posterior predictive checks and found no evidence for (statistical) dependence between any of the tests for our final selected model. However, we also demonstrated that failing to adjust for the increase in prevalence with age expected for a chronic disease can impose a pattern of apparent conditional dependence between diagnostic tests. This model misspecification, demonstrated for the WH and WH-A models, illustrates that failing to collect the age of animals in studies of diagnostic performance for chronic infections in endemically infected populations has the potential to bias estimates as well as to reduce the information available to infer test characteristics. We acknowledge an important limitation of our approach: the age-catalytic model assumes that the populations, in this case herds, are at an endemic equilibrium, and that the differences in average prevalence between herds reflect systematic variation in the risk (force) of infection between herds. If disease is emerging or re-emerging within herds, this assumption may itself introduce bias, and it cannot easily be tested without longitudinal data.

Comparisons of test performance

Key to the approval of DSTs for use in control programs, or for demonstrating freedom from infection for trade, is demonstrating that they are equivalent to tuberculin testing. Our final selected model suggests that both the sensitivity and specificity of the peptide-based DSTs (DST10, DST30) and the fusion protein-based DST (DSTF) are intermediate between those of the higher specificity CIT and the higher sensitivity SIT in this population. This has important implications for the potential use of DSTs, as the differences in characteristics of the SIT and CIT mean that they are typically recommended for use in different situations. The higher specificity of the CIT means that it is more commonly used for screening, where large numbers of animals are tested and there is a desire to minimize false positives. Conversely, for assurance of freedom from infection for trade, or for clearing herds of infection, the SIT is often favored due to its higher sensitivity.

Our results suggest that the DSTs provide a level of performance that is a compromise between these two extremes (and use cases). For countries where screening is based on the standard CIT (4 mm cut-off), our results suggest that—in terms of test sensitivity at least—switching to the DST could provide similar or better effectiveness in identifying and clearing infections from herds, even before considering the additional benefits of vaccination. Indeed, vaccination could provide substantial benefits in reducing prevalence in endemic herds across all testing regimes. However, an important caveat is that the specificity estimates from our latent class model would lead to an unacceptably high number of false positive reactors and TB incidents given the frequency of testing within statutory programs in countries such as the UK and Ireland26.
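
The concern about specificity follows from simple arithmetic: each test of an uninfected animal produces a false reactor with probability 1 − Sp, so the expected number of false reactors grows with the number of uninfected animals tested, and the probability that a truly uninfected herd of n animals yields at least one reactor at a single test is 1 − Spⁿ (assuming independent results across animals). The herd size below is hypothetical; the specificities are the point estimates from Table 3, with the CIT represented by the lower bound of its credible interval because its point estimate rounds to 100%.

```python
def expected_false_positives(n_uninfected: int, specificity: float) -> float:
    """Expected number of false reactors among truly uninfected animals."""
    return n_uninfected * (1.0 - specificity)

def prob_any_reactor(herd_size: int, specificity: float) -> float:
    """P(at least one reactor) in a truly uninfected herd, assuming
    independent test results across animals."""
    return 1.0 - specificity ** herd_size

HERD_SIZE = 50  # hypothetical uninfected herd
specificities = {
    "CIT (lower 95% CrI bound)": 0.98,
    "DSTF": 0.85,
    "DST10": 0.84,
    "DST30": 0.79,
}
for test, sp in specificities.items():
    print(f"{test}: {expected_false_positives(HERD_SIZE, sp):.1f} expected "
          f"false reactors, P(>=1 reactor) = {prob_any_reactor(HERD_SIZE, sp):.3f}")
```

With specificities in the 79–85% range, almost every uninfected herd of this size would be expected to produce reactors at each test, which is why frequent statutory testing magnifies the cost of imperfect specificity.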

An important limitation of our study design is the lack of post-mortem examination of study animals, and thus of the opportunity to confirm infection through pathology and culture. This limits our ability to evaluate the absolute values of the test characteristics, particularly specificity. Given the recruitment of animals from commercial dairy farms, and the need for a representative distribution of ages, post-mortem examination would not have been practically (or economically) possible in our study design. Although an imperfect gold standard, post-mortem confirmation could have identified groups of animals insensitive to ante-mortem tests. The existence of such a group of animals would change estimates of the true prevalence of infection within our herds and thus the estimated test characteristics of all of the ante-mortem diagnostic tests. In contrast to the requirement (assumption) of conditional independence between tests, the effect of such missing information cannot be assessed or tested through posterior checks. For this reason, the absolute specificity of the DST would be better demonstrated through large-scale trials of the tests in populations known to be free from disease, such as those currently being carried out in the UK and due to report in 2025. While our study provides valuable insights into the relative performance of different diagnostic tests, future studies in disease-free populations will be crucial for establishing the true specificity of the DST.

Another consideration for use of the DST in countries with established control programs is that many countries (such as Ireland and the UK) use more severe interpretations of the CIT (as with the CITz in our study) to enhance sensitivity in high-risk herds, a flexibility that the DST does not currently provide5,27,28. For countries that base control on the SIT, while vaccination benefits may still be valuable in endemic herds, the DSTs’ lower sensitivity compared with the SIT would require careful consideration. In these contexts, particularly for establishing disease freedom or enabling trade and movement certification, the more sensitive test interpretations would minimize false negatives.

The expected benefits of cattle vaccination must be balanced against the need for optimal test sensitivity in different contexts26. Future research could enhance diagnostic capability through two complementary approaches: improving the sensitivity of current DSTs while maintaining their DIVA capabilities, or developing new molecularly defined tuberculins that maintain the high sensitivity of the SIT while gaining improved specificity. The latter approach has shown promise in recent studies29, suggesting multiple pathways to achieve optimal diagnostic performance for different control contexts.

Several other studies have assessed the specificity of the DST based on its cross-reactivity with non-tuberculous environmental mycobacteria and BCG. A study on guinea pigs30 indicated that animals experimentally inoculated with non-tuberculous mycobacterial species exhibited no cross-reactions to either the peptide-based or fusion protein-based DSTs. Only guinea pigs exposed to M. bovis and M. caprae exhibited responses to both types of DSTs30. Other studies have also reported specificity of up to 100% for the DST in non-BCG-vaccinated control animals, with no false positive results31. Various studies have also assessed the specificity of the DST reagent, composed of the three specific antigens ESAT-6, CFP-10, and Rv3615c, in BCG-vaccinated cattle. These studies highlighted the DST’s effectiveness in differentiating between infected and vaccinated animals (DIVA). No cross-reactions with the DST were reported in cattle that received either a single dose of BCG vaccine or were revaccinated17,18,31.

In our study, we found an extremely low sensitivity of the IDEXX M. bovis antibody test. This is likely to be a consequence of samples being collected before the PPD injection for skin testing, which is well known to substantially increase the sensitivity of the IDEXX M. bovis antibody test through the anamnestic antibody response (boosting effect)32–34. The IDEXX M. bovis antibody test was included in the latent class analysis because of the value of including a test that measures a different immunological response from the tuberculin and DST antigen assays. Studies have also reported that serological tests can be used to identify subpopulations of CIT-negative but truly infected animals, as confirmed by post-mortem examination and/or culture positivity35.

Test repeatability and concordance

The repeatability of tests was assessed on 142 animals using the initial and repeated test results; a higher percentage of concordant results was observed for PPD-based tests than for DSTs. Additionally, the kappa coefficient indicated good agreement between initial and repeated tests for the CITz, SIT and IGRA, while agreement for the DSTs (and the CIT) was moderate. Given the 2–3 month gap between the initial and repeated tests, various factors could have led to discordant results. Some animals changed from a positive to a negative result, which may reflect factors that reduce test sensitivity, while others changed from a negative to a positive result, which could be a consequence of the earlier test missing infection or of infection acquired after the initial test9. The magnitude of the skin-thickness response in bTB reactor animals has been reported to be lower for the DST than for PPD20. Since a positive DST result is defined as a skin-thickness increase of ≥ 2 mm, measurement error could also contribute to discordant results, as readings clustering around the cut-off are more likely to fall on different sides of it on repeat testing. Overall, several factors, including the time between tests, new infections, and measurement error, could contribute to discordant results. These findings underscore the need for further research, including post-mortem confirmation, to improve our understanding of test reliability.
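
The effect of readings clustering near the cut-off can be illustrated with a simple measurement-error model, assumed here purely for illustration: if an animal's true skin-thickness increase is x mm and each reading adds independent Gaussian error with standard deviation σ, a single reading is positive with probability p = P(x + ε ≥ 2), and two readings of an unchanged animal disagree with probability 2p(1 − p). The value σ = 0.5 mm and the function names are hypothetical.

```python
from statistics import NormalDist

def prob_positive(true_increase_mm, sigma_mm, cutoff_mm=2.0):
    """P(reading >= cutoff) when reading = true increase + N(0, sigma^2)."""
    return 1.0 - NormalDist(true_increase_mm, sigma_mm).cdf(cutoff_mm)

def prob_discordant(true_increase_mm, sigma_mm, cutoff_mm=2.0):
    """Probability that two independent readings fall on opposite sides
    of the cut-off, assuming the true response is unchanged."""
    p = prob_positive(true_increase_mm, sigma_mm, cutoff_mm)
    return 2.0 * p * (1.0 - p)

SIGMA = 0.5  # hypothetical measurement error (mm)
for x in (0.5, 1.5, 2.0, 2.5, 4.0):
    print(f"true increase {x} mm: P(discordant) = {prob_discordant(x, SIGMA):.2f}")
```

Discordance peaks at 50% for animals whose true response sits exactly at the cut-off and vanishes for responses well away from it, so a test whose reactor responses are smaller in amplitude, as reported for the DST, is expected to show lower repeatability even without any change in infection status.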

Age-dependence of test sensitivity

The inclusion of age-dependence for both the prevalence and the sensitivity of diagnostic tests was found to significantly improve the overall fit and predictive (strictly, classification) ability of our model for this data set. Moreover, the age-catalytic component, which adjusts for the expected increase in bTB prevalence with age, was essential for the Walter-Hui model to pass the full set of posterior predictive checks. However, the measured effect of age was small and variable: it was most pronounced for the CIT, and no effect was measured for the SIT. Although the potential for anergy has been repeatedly reported in the literature5,8,9,36,37, this is the first attempt we are aware of to quantify its impact in naturally infected populations. However, the large variability between tests in the estimated age-dependence of test positivity is inconsistent with the anergy hypothesis. A priori, we might expect anergy to affect equally all of the tuberculin and DST diagnostics, which stimulate the same basic immune response. Yet, while the estimated relative difference in apparent prevalence for all diagnostics (other than IDEXX) shows a similar reduction with age from the peak prevalence (Fig. 1), the latent class model estimates mixed effects of age on test sensitivity, with no apparent impact of age on the SIT and CITz tests. Our design cannot necessarily distinguish which mechanism is driving the patterns of age-dependence. While anergy was the motivating factor for this study, it is not the only mechanism that could lead to a levelling out or decline in the estimated age-prevalence curve. A higher rate of exposure to the disease in calves, e.g. through bulk milk feeding, could lead to a levelling out of the age-prevalence curve, but could not in itself explain a decline.
Selective removal of infected animals from herds, due either to mortality or to reduced production and productivity of infected animals, could generate such a “turnover” of the apparent prevalence curves, in line with the patterns seen in countries with established test-and-slaughter programs10. Our analysis also assumes that the age distribution within each herd is stable and represents an endemic equilibrium, with variation between herds modelled by estimating a different force of infection for each herd. If the herds are not endemically infected, this variation could also arise from differences in the time of introduction of disease. For the small number of herds in which we were able to estimate prevalence twice, we found no significant differences in apparent (Supplementary Table 1) or inferred prevalence between the two time points, consistent with earlier results from repeat testing of herds in Northern Ethiopia38.

Given the lack of age-dependence for the SIT and for the severe interpretation of the comparative test (CITz), a parsimonious explanation might be that the pattern is not in fact the result of animals developing a state of anergy, but rather of increasing exposure to environmental mycobacteria with age interfering with the comparative test. This hypothesis is supported by the differential patterns of response to avian and bovine tuberculin seen in test-negative animals (Supplementary Fig. 12). While avian and bovine reaction sizes decrease linearly with age in SIT-positive animals, the average avian reaction size in bovine non-reactors has a non-linear relationship with age, reaching a minimum at around 7 years of age and increasing in older animals thereafter. However, this hypothesis cannot explain the (albeit less pronounced) decline in estimated sensitivity for the DST10 and DST30 tests.

We considered only the relationship between age and test positivity, without adjusting for other known risk factors (such as desensitization from repeated testing, operator error, or co-infection with parasites)5,39,40. Previous studies have reported that the size of the tuberculin reaction (CIT) is associated with age, with younger animals (< 2.85 years) typically exhibiting larger reactions than older ones41,42. While inconclusive, our results indicate that further research is needed to validate the predicted drop in sensitivity of the CIT in older animals, which could have significant implications for countries, notably the UK and Ireland, that rely on the CIT to demonstrate freedom from infection within herds. O’Hagan et al. estimated a smaller age-related decline in CIT sensitivity using a two-population latent class model comparing CIT status with post-mortem confirmation43. Age has consistently been reported as a risk factor for bTB in the UK and Ireland44; however, the pattern of test positivity, with risk increasing up to 2–3 years of age before plateauing, is markedly different from the distribution of visible lesions and confirmed infection within reactors, which declines with age in both Ireland and the UK10,45,46. It should be noted that the distribution of disease in the cattle populations of the UK and Ireland is quite different from that in the endemic setting in Ethiopia. In regions where bTB is actively managed, cattle populations are subject to frequent testing and immediate removal of test-positive animals. Consequently, positive animals are less likely to accumulate in older age groups, and infections observed in older cattle are more likely to result from recent transmission. As such, the potential importance to managed populations of the age-dependent effect measured in this study would depend on whether the effect is one of time since infection (the anergy hypothesis) or of the age of animals at infection.
This distinction cannot be made from our data, and it would also be challenging to assess in the UK and Ireland because of the lower overall infection pressure and the low frequency of infections in older animals.

Concluding comments and future directions

Our study provides critical insights into the performance of DSTs compared to traditional tuberculin tests, with important implications for bTB control strategies. The intermediate sensitivity and specificity of DSTs between the SIT and CIT suggest that these tests could serve as a valuable tool in bTB control programs, particularly in countries considering the implementation of BCG vaccination. The observed age-dependent decrease in CIT sensitivity, although requiring further validation, could have profound implications for countries using the CIT for demonstrating herd-level freedom from infection.

Future research should focus on enhancing the sensitivity of DSTs to match or exceed that of the SIT, while maintaining their DIVA capability. Additionally, large-scale trials in known bTB-free populations are crucial to definitively establish the specificity of DSTs. Further investigation into age-related effects on test performance is warranted, potentially leading to age-adjusted testing protocols. Finally, studies incorporating post-mortem examinations would provide valuable data to refine our understanding of test characteristics and the true prevalence of bTB in study populations and are essential for optimizing bTB control strategies, particularly in countries considering the integration of BCG vaccination or relying on comparative intradermal tests for trade certification.

Methods

Study area

The study was conducted in intensive dairy farms from Sebeta and Sendafa towns in central Ethiopia. Previous bTB test results available at the Animal Health Institute (AHI) were used to select dairy farms with a history of bTB positivity and with variable levels of within-herd prevalence (Supplementary Table 1).

Study animals

The study was carried out using exotic and crossbred (Holstein Friesian or Jersey crossed with local breeds) dairy cattle managed under intensive production systems on the selected farms. The dairy cattle included in the study were kept indoors and fed grass hay inside the barn, with limited or no outdoor grazing. All healthy cattle were tested for bTB, with the exception of calves younger than 5 months, sick animals under treatment, and animals in the last month of pregnancy or the early postpartum period (within 1 month of calving)5,9. These exclusion criteria were set to avoid factors known to be associated with false-negative results to tuberculin skin tests in cattle. Because two to three antigens were injected on each side of the neck, calves younger than five months were excluded due to the limited space on their necks. None of the tested cattle had received BCG vaccination.

Study design

A prospective cross-sectional study was carried out, comprising initial testing of 446 animals across 22 dairy farms and a retest, after an interval of 2–3 months, of 142 animals from 11 of those farms, selected by convenience sampling. In total, 588 sets of animal test results (446 initial and 142 repeat) were collected. Screening for bTB was conducted using a skin test with five antigens, the IGRA, and the IDEXX M. bovis antibody ELISA.

Skin tests

The five antigens used for the skin tests were: two concentrations of a peptide-based DST (referred to as DST10 and DST30), a fusion protein-based DST (DSTF), and the tuberculin PPDs (PPD-A and PPD-B). These antigens were injected simultaneously into each animal, three on the left side and two on the right side of the neck. The antigens were rotated across the five injection sites using a Latin square design, so that differences in skin sensitivity between injection sites would affect all five antigens equally. A minimum distance of ~ 10–12 cm was maintained between injection sites (Supplementary Fig. 1)7,47.
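A Latin square rotation can be sketched as a cyclic shift of the antigen order from one animal to the next. The site labels below are hypothetical (the actual layout is in Supplementary Fig. 1), and the cyclic square is only one of many valid Latin squares; the study's exact square is not given in the text.

```python
# Antigen names follow the text; the site labels are hypothetical.
ANTIGENS = ["PPD-B", "PPD-A", "DST10", "DST30", "DSTF"]
SITES = ["left-1", "left-2", "left-3", "right-1", "right-2"]


def site_assignment(animal_index: int) -> dict:
    """Cyclic Latin-square rotation: each successive animal shifts the antigen order by one site."""
    n = len(ANTIGENS)
    return {SITES[s]: ANTIGENS[(s + animal_index) % n] for s in range(n)}


# Five consecutive animals form one complete 5 x 5 Latin square.
for i in range(5):
    print(i, site_assignment(i))
```

Within any block of five consecutive animals, every antigen appears exactly once at every site, which is the property that balances site effects across antigens.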

The tuberculin skin test

The tuberculin tests were performed following the World Organisation for Animal Health (WOAH) guidelines2. Bovine PPD (3000 IU/dose) and avian PPD (2500 IU/dose) were administered intradermally using a McLintock syringe with the bevel of the needle facing outwards (0.1 ml final volume; Prionics/ThermoFisher Scientific, Lelystad B.V., The Netherlands). Skin thickness was measured by the same operator using manual Irish calipers before and 72 ± 4 h after PPD injection. The increases in skin thickness at the PPD-B (ΔB) and PPD-A (ΔA) sites between the pre- and post-inoculation measurements were used for interpretation. Under the WOAH standard interpretation of the CIT, a reaction was considered positive if ΔB − ΔA > 4 mm. Under a more severe interpretation, animals were deemed positive if CIT > 0 mm (CITz, ΔB > ΔA)2,48,49.

In the SIT, which uses PPD-B only, an animal was considered positive if the increase in skin-fold thickness was more than 2 mm (ΔB > 2 mm)49.
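The three tuberculin interpretations can be sketched as simple predicates on the same pair of skin-thickness increases. The function names and readings are hypothetical; the thresholds are those stated above, and, as in the text, each interpretation is treated as binary (no inconclusive band is modelled).

```python
def increase_mm(before_mm: float, after_mm: float) -> float:
    """Increase in skin-fold thickness between the pre-injection and 72 h readings."""
    return after_mm - before_mm


def cit_standard(delta_b: float, delta_a: float) -> bool:
    """WOAH standard comparative test: positive if dB - dA > 4 mm."""
    return delta_b - delta_a > 4.0


def cit_z(delta_b: float, delta_a: float) -> bool:
    """Severe comparative interpretation (CITz): positive if dB > dA."""
    return delta_b > delta_a


def sit(delta_b: float) -> bool:
    """Single intradermal test as defined in this study: positive if dB > 2 mm."""
    return delta_b > 2.0


# Hypothetical readings (mm) for one animal.
delta_b = increase_mm(7.0, 13.0)   # bovine PPD site: 6.0 mm
delta_a = increase_mm(7.5, 10.5)   # avian PPD site: 3.0 mm
print(cit_standard(delta_b, delta_a), cit_z(delta_b, delta_a), sit(delta_b))  # False True True
```

The example shows how one animal can be negative on the standard CIT yet positive on both CITz and the SIT, which is the ordering of sensitivity the interpretations are designed to produce.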

Defined antigen skin test (DST)

Both the synthetic peptide cocktail-based and the recombinant fusion protein-based defined antigen skin test (DST) reagents, prepared from the three mycobacterial antigens ESAT-6, CFP-10 and Rv3615c, were used for the skin test20,31. The peptide-based DST consists of chemically synthesized peptides, offering easier production and quality control and a lower regulatory burden than the recombinant fusion protein-based DST. In contrast, the recombinant fusion protein-based DST (DSTF) uses a single recombinant fusion protein containing the three antigens. The lyophilized peptide-based DST (100 µg of each peptide per vial; peptides synthesized by GenScript, USA and USV Private Limited, India) was diluted with sterile distilled water before injection50. For DST10, vials were reconstituted with 1000 µl of sterile water to give a concentration of 10 µg of each peptide per 100 µl. For DST30, vials were reconstituted with 333 µl of sterile water to give a concentration of 30 µg of each peptide per 100 µl. DST10 and DST30 were each injected intradermally (0.1 ml) using a McLintock syringe to assess the field performance of the peptide-based DST. Skin thickness was measured by the same operator before and 72 h after injection. An animal was considered positive if there was an increase of 2 mm or more in skin-fold thickness19,20.

The recombinant fusion protein of the three antigens ESAT-6, CFP-10 and Rv3615c (DSTF), supplied as a ready-to-use solution (300 µg total protein/ml; Lionex, Germany), was injected simultaneously with the two concentrations of the peptide-based DST (DST10 and DST30) and the PPDs31. The same 0.1 ml dose of DSTF was injected using a McLintock syringe, and the result was considered positive if there was an increase of 2 mm or more in skin-fold thickness.
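The dose arithmetic above can be checked with a short calculation. The helper names are hypothetical, and the sketch assumes the lyophilized powder adds negligible volume on reconstitution.

```python
DOSE_UL = 100.0  # 0.1 ml intradermal injection volume


def ug_per_dose(amount_ug: float, volume_ul: float, dose_ul: float = DOSE_UL) -> float:
    """Amount delivered per injection when `amount_ug` is dissolved in `volume_ul`
    (assumes the lyophilized powder adds negligible volume)."""
    return amount_ug * dose_ul / volume_ul


def dst_positive(increase_mm: float) -> bool:
    """DST interpretation used here: positive if the skin-fold increase is >= 2 mm."""
    return increase_mm >= 2.0


dst10 = ug_per_dose(100.0, 1000.0)  # 100 ug of each peptide in 1000 ul
dst30 = ug_per_dose(100.0, 333.0)   # 100 ug of each peptide in 333 ul
dstf = ug_per_dose(300.0, 1000.0)   # 300 ug total protein per ml, ready to use
print(f"DST10 {dst10:.2f} ug, DST30 {dst30:.2f} ug per peptide; DSTF {dstf:.2f} ug total protein")
```

DST30 works out to about 30.03 µg of each peptide per dose, and DSTF delivers 30 µg of total protein per dose, so per-antigen amounts are not directly comparable between the two reagents. Note also that the DST cutoff is inclusive (≥ 2 mm) whereas the SIT cutoff is strict (> 2 mm), so a reading of exactly 2 mm counts as positive only for the DSTs.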

Interferon gamma release assay (IGRA)

The IGRA was performed using the commercially available BOVIGAM M. bovis gamma interferon test kit for cattle (product number 63326; Prionics, Switzerland). Whole blood samples were collected in lithium heparin vacutainer tubes before antigen injection for the skin test and were immediately transported at room temperature to the Animal Health Institute, Ethiopia, for antigen stimulation within a maximum of three hours. The whole blood samples were stimulated with PPD-B and PPD-A at final concentrations of 300 and 250 IU/ml, respectively. RPMI 1640 medium with L-glutamine and Pokeweed Mitogen (PWM; 10 μg/ml final concentration) were used as the negative (nil) and positive controls for the whole blood stimulation, respectively. The samples were incubated in a humidified atmosphere at 37 °C and 5% CO2 for 16–24 h, after which the plates were centrifuged at 300 × g for 10 min at room temperature and the supernatant (plasma) was harvested. The harvested plasma was tested with the BOVIGAM ELISA kit, following the manufacturer’s instructions, to detect interferon gamma (IFN-γ) secreted by the stimulated T cells51–53, and absorbance was measured at 450 nm using an ELISA plate reader. Each sample was stimulated and tested in duplicate, and the mean optical density (OD) value was used to interpret the results. A reaction was considered positive when the difference between the mean OD values for bovine PPD and avian PPD was equal to or greater than 0.1.
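The IGRA decision rule can be sketched as follows. The function name and OD readings are hypothetical, the nil and PWM control validity checks are not modelled, and rounding the difference to three decimals is an assumed reader precision added so that floating-point noise cannot flip a result lying exactly on the 0.1 cutoff.

```python
from statistics import mean
from typing import Sequence


def igra_positive(od_ppdb: Sequence[float], od_ppda: Sequence[float], cutoff: float = 0.1) -> bool:
    """Positive if mean OD(PPD-B) minus mean OD(PPD-A) is >= cutoff.

    Duplicate wells are averaged; the difference is rounded to 3 decimals
    (an assumed reader precision) before comparison with the cutoff.
    """
    difference = mean(od_ppdb) - mean(od_ppda)
    return round(difference, 3) >= cutoff


# Hypothetical duplicate OD450 readings for three animals.
print(igra_positive([0.80, 0.84], [0.30, 0.32]))  # difference 0.51
print(igra_positive([0.41, 0.43], [0.36, 0.38]))  # difference 0.05
print(igra_positive([0.45, 0.47], [0.35, 0.37]))  # difference 0.10, on the cutoff
```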

Serological tests

The presence of antibodies to M. bovis in serum samples was detected using the IDEXX M. bovis antibody test kit (IDEXX M. bovis, product number 99–29853). Blood samples were collected in plain vacutainer tubes before antigen injection for the skin test. Serum was harvested after keeping the blood samples in a slightly slanted position at room temperature for 24 h. The serum samples, along with the positive and negative controls included in the kit, were first diluted 1/50 with the sample diluent. The test was conducted according to the IDEXX ELISA kit instructions, with samples tested in duplicate, and absorbance was measured at 450 nm using an ELISA reader. To interpret the results, the sample-to-positive (S/P) ratio was calculated for each sample to determine the presence or absence of M. bovis antibodies; samples with S/P ≥ 0.30 were considered positive.
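The text does not state how the S/P ratio is computed. The sketch below assumes the conventional ELISA form, (sample − negative control) / (positive control − negative control) on mean OD450 values, which should be checked against the kit insert; the function names and readings are hypothetical.

```python
from statistics import mean
from typing import Sequence


def sp_ratio(sample_od: Sequence[float], neg_ctrl_od: Sequence[float],
             pos_ctrl_od: Sequence[float]) -> float:
    """Sample-to-positive ratio, ASSUMING the form (S - NC) / (PC - NC) on mean OD450 values.

    The exact formula and any plate-validity criteria should be taken from the kit insert.
    """
    nc = mean(neg_ctrl_od)
    pc = mean(pos_ctrl_od)
    if pc <= nc:
        raise ValueError("positive control must read above negative control")
    return (mean(sample_od) - nc) / (pc - nc)


def idexx_positive(sp: float, cutoff: float = 0.30) -> bool:
    """Positive if S/P >= 0.30, the cutoff used in this study."""
    return sp >= cutoff


# Hypothetical duplicate OD450 readings.
sp = sp_ratio([0.50, 0.52], neg_ctrl_od=[0.10, 0.10], pos_ctrl_od=[1.10, 1.10])
print(round(sp, 2), idexx_positive(sp))
```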

Ethical clearance

Ethical clearance to conduct the study was obtained from the Animal Research Scientific and Ethics Review Committee (ARSERC) of the Animal Health Institute (Ref. number: ARSERC/EC/003/26/09/2019). The screening test for bTB was carried out following the World Organisation for Animal Health’s guidelines2.

Statistics and reproducibility

Cohen’s Kappa coefficient was calculated to assess test agreement for double-tested animals. Kappa values were interpreted according to ref. 54 as follows: ≤ 0, no agreement; 0.01–0.20, slight agreement; 0.21–0.40, fair agreement; 0.41–0.60, moderate agreement; 0.61–0.80, good (or substantial) agreement; and 0.81–1.00, almost perfect agreement.
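A minimal sketch of Cohen’s Kappa for paired initial and repeat results, together with the verbal bands listed above, is shown below. The function names and the twelve-animal example are hypothetical; values strictly between 0 and 0.01, which the published bands leave unassigned, are treated here as slight.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Cohen's kappa for two paired sets of categorical results (e.g. initial vs repeat test)."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("need two non-empty result lists of equal length")
    observed = sum(a == b for a, b in zip(first, second)) / n
    counts_1, counts_2 = Counter(first), Counter(second)
    # Chance agreement from the marginal frequencies of each category.
    expected = sum(counts_1[c] * counts_2[c] for c in counts_1.keys() | counts_2.keys()) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa: float) -> str:
    """Verbal bands as listed above (values in (0, 0.01) are treated as slight)."""
    if kappa <= 0:
        return "no agreement"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "good (substantial)")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical initial and repeat results (1 = positive) for 12 animals.
initial = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
retest = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
k = cohens_kappa(initial, retest)
print(round(k, 3), agreement_label(k))
```

Here 10 of 12 results agree (0.83) against a chance agreement of 0.5, giving a Kappa of about 0.67, which falls in the good (substantial) band.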

Age-related trends in test positivity

To explore the qualitative trends in test positivity with age (Fig. 1), we fitted a thin-plate spline (restricted to 4 knots) to the testing data (aggregated into 9 annual age groups for animals between 0 and 9 years old) using a generalized additive model (GAM). The GAM was fitted with a binomial family and the default logit link, so that the declining number of animals in older age cohorts is accounted for through the binomial counts.

Walter-Hui latent class models

For this study we adapt the Walter-Hui latent class modelling framework previously used to evaluate the performance of the DST antigens in buffalo (Bubalus bubalis)50. The Walter-Hui model provides a likelihood to estimate the sensitivity and specificity of competing diagnostic tests when samples are available from at least two populations with differing prevalence. This likelihood depends on the assumption that the probability of a given test being positive for an individual depends only on that individual’s true (latent) disease status (disease free or infected) and not on the results of the other tests. To satisfy this requirement, we exclude the set of repeated test measurements. Thus, for the latent class model we analyze 446 unique animal test results from 22 herds.

Following refs. 22 and 55, we parameterize the model using a probit ($\Phi^{-1}$) link function:

graphic file with name 41598_2025_5223_Article_Equa.gif
graphic file with name 41598_2025_5223_Article_Equb.gif

To ensure numerical stability we restrict the sensitivity parameters (on the probit scale) Inline graphic to the range Inline graphic. A common issue with this class of models is that the likelihood can be symmetric under relabeling of the latent variable. This can lead to a bimodal posterior distribution in which two modes, one where the true positive rate is greater than the false positive rate (TPR > FPR) and one where the reverse holds, fit the data equally well. Given the performance of the SIT and CIT tests in other contexts and the further analysis presented in ref. 50, we consider the situation where FPR > TPR to be biologically implausible, and we choose prior assumptions that force selection of the TPR > FPR mode by restricting Inline graphic and Inline graphic to the range Inline graphic. This is equivalent to assuming that no test has a sensitivity or specificity of < 16%. For the baseline Walter-Hui model (WH) we further assume normal priors for the test parameters and uniform priors for the true within-herd prevalence in each herd (the latent variable to be inferred for each herd):

graphic file with name 41598_2025_5223_Article_Equc.gif
graphic file with name 41598_2025_5223_Article_Equd.gif
graphic file with name 41598_2025_5223_Article_Eque.gif
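The conditional-independence likelihood on which the Walter-Hui framework builds can be sketched for a single animal as a two-class mixture. This is an illustration, not the authors’ Stan code: the function names are hypothetical, specificity is placed on the probit scale purely for illustration (the paper’s exact parameterization is given in its equations), and the −1 bound in the final line is inferred from the stated 16% rather than taken from the text.

```python
import itertools
import math
from typing import Sequence


def phi(x: float) -> float:
    """Standard normal CDF: maps a probit-scale parameter to a probability."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def pattern_probability(results: Sequence[int], prevalence: float,
                        se_probit: Sequence[float], sp_probit: Sequence[float]) -> float:
    """P(observed vector of binary test results) under conditional independence.

    Mixture of two latent classes: infected (probability = prevalence) and disease free.
    Sensitivity and specificity are both given on the probit scale here (an assumption
    made for illustration).
    """
    if not (len(results) == len(se_probit) == len(sp_probit)):
        raise ValueError("one result and one parameter pair per test")
    p_infected, p_free = 1.0, 1.0
    for t, a, b in zip(results, se_probit, sp_probit):
        se, sp = phi(a), phi(b)
        p_infected *= se if t else 1.0 - se
        p_free *= (1.0 - sp) if t else sp
    return prevalence * p_infected + (1.0 - prevalence) * p_free


def herd_log_likelihood(patterns: Sequence[Sequence[int]], prevalence: float,
                        se_probit: Sequence[float], sp_probit: Sequence[float]) -> float:
    """Log-likelihood for one herd; Se/Sp are shared across herds, prevalence is herd-specific."""
    return sum(math.log(pattern_probability(r, prevalence, se_probit, sp_probit)) for r in patterns)


def all_patterns(n_tests: int):
    """Every possible combination of binary results for n_tests tests."""
    return list(itertools.product((0, 1), repeat=n_tests))


# A lower bound of -1 on the probit scale corresponds to a probability of about 0.159 (~16%).
print(round(phi(-1.0), 3))
```

Summing the per-herd log-likelihoods over herds with different prevalences is what makes sensitivity and specificity identifiable in the Walter-Hui design; the probabilities of all result patterns within a herd sum to one.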

Motivated by the qualitative patterns of test results with age, we extended this baseline model to allow test sensitivity (but not specificity) and the latent within-herd prevalence to vary with the age of animals:

graphic file with name 41598_2025_5223_Article_Equf.gif

where Inline graphic is the age of animal Inline graphic (to the nearest year) and Inline graphic is the maximum age of animals in the study (rounded to the nearest year, Inline graphic). Inline graphic measures the effect of age on the sensitivity of each test Inline graphic. We restrict this parameter to the range Inline graphic to allow for both positive and negative effects of age and choose a normal (0,1) prior:

graphic file with name 41598_2025_5223_Article_Equg.gif

To model an age-dependence in the prevalence of infection within herds we use a catalytic infection model10, where we assume that the herds are endemically infected and all animals experience the same average fixed force of infection Inline graphic. Under these assumptions we can write differential equations for the proportion of susceptible (S) and infected (I) animals with respect to age:

graphic file with name 41598_2025_5223_Article_Equh.gif
graphic file with name 41598_2025_5223_Article_Equi.gif

where Inline graphic is the age-dependent mortality rate. The age of animals was estimated to the nearest year, so we formulate our model likelihood in terms of the proportions of animals in annual age cohorts. Assuming that all animals are born susceptible, and as shown in ref. 10, the proportions of susceptible (Inline graphic) and infective (Inline graphic) animals in each coarse age cohort (Inline graphic) are given by:

graphic file with name 41598_2025_5223_Article_Equj.gif
graphic file with name 41598_2025_5223_Article_Equk.gif

where Inline graphic is now a vector with elements Inline graphic corresponding to mortality rate within cohort Inline graphic.

graphic file with name 41598_2025_5223_Article_Equl.gif

Using this solution we can write the probability that an animal of age Inline graphic in herd Inline graphic will be infected as:

graphic file with name 41598_2025_5223_Article_Equm.gif
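The catalytic model can be illustrated numerically. The sketch below is one way to compute cohort-averaged proportions infected under the stated assumptions (all animals born susceptible, a constant force of infection, and piecewise-constant mortality that is the same for susceptible and infected animals); it is not necessarily identical to the paper’s cohort equations, and the function names and rates are hypothetical.

```python
import math
from typing import List, Sequence


def _mean_exp(rate: float) -> float:
    """Integral of exp(-rate * t) for t in [0, 1], with the rate -> 0 limit handled."""
    return 1.0 if rate < 1e-12 else (1.0 - math.exp(-rate)) / rate


def cohort_prevalence(foi: float, mortality: Sequence[float]) -> List[float]:
    """Proportion infected in each annual age cohort of a stationary herd.

    Assumes dS/da = -(foi + mu) S and dI/da = foi S - mu I, with all animals born
    susceptible, a constant force of infection `foi`, and a piecewise-constant
    mortality rate mu (one value per year of age, equal for S and I).
    """
    s, n = 1.0, 1.0  # susceptible and total survivors at the start of the age band
    out = []
    for mu in mortality:
        s_time = s * _mean_exp(foi + mu)  # animal-time spent susceptible in this band
        n_time = n * _mean_exp(mu)        # animal-time alive in this band
        out.append(1.0 - s_time / n_time)
        s *= math.exp(-(foi + mu))
        n *= math.exp(-mu)
    return out


# Hypothetical force of infection and removal rates (the study used estimates from ref. 56).
print([round(p, 3) for p in cohort_prevalence(0.3, [0.2, 0.15, 0.15, 0.3, 0.5])])
```

Because mortality is shared by susceptible and infected animals here, the cohort prevalence can only increase with age; a decline would require an additional mechanism, such as the selective removal of infected animals discussed above.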

To ensure numerical stability we sample the force of infection on the log scale defining:

graphic file with name 41598_2025_5223_Article_Equn.gif

where we constrain Inline graphic and define a shrinkage prior:

graphic file with name 41598_2025_5223_Article_Equo.gif

with hyperparameter:

graphic file with name 41598_2025_5223_Article_Equp.gif

To test the sensitivity of our estimates to these prior assumptions, we refitted the final selected model using uniform priors:

graphic file with name 41598_2025_5223_Article_Equq.gif
graphic file with name 41598_2025_5223_Article_Equr.gif
graphic file with name 41598_2025_5223_Article_Equs.gif
graphic file with name 41598_2025_5223_Article_Equt.gif

Finally, we use estimates of the age-dependent removal rates Inline graphic obtained by fitting a piecewise constant demographic model to a larger collection of dairy herds from across Ethiopia (see ref. 56 for full details and data).

We fit four alternative models to explore the extent to which age-dependent effects are supported by the data and may influence the estimated diagnostic performance of the DST:

  • WH model (Baseline Walter-Hui): test sensitivity and prevalence independent of age

  • WH-A (Age) model: prevalence independent of age, test sensitivity dependent on age

  • WH-C (Catalytic) model: prevalence dependent on age, test sensitivity independent of age

  • WH-AC (Age-Catalytic) model: prevalence and test sensitivity dependent on age

The latent class models were implemented in Stan and estimated using Hamiltonian MCMC57. Convergence was assessed through visual inspection of the chains and standard diagnostic statistics ($\hat{R}$) for all parameters after 4,000 iterations for 8 chains. Leave-one-out (LOO) cross-validation58 was used for model comparison and selection. The expected log pointwise predictive density, $\widehat{\mathrm{elpd}}_{\mathrm{loo}}$, which measures the predictive accuracy of the model when a single observation is left out, was estimated by standard importance sampling (SIS-LOO). The difference ($\Delta\widehat{\mathrm{elpd}}$) between the $\widehat{\mathrm{elpd}}_{\mathrm{loo}}$ values of alternative models fitted to the same data provides a measure of their relative predictive accuracy, and the standard error of the difference gives a measure of its uncertainty. Standard errors comparable to the magnitude of $\Delta\widehat{\mathrm{elpd}}$ suggest that the relative predictive accuracy of the two models is indistinguishable.
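The model-comparison quantities can be sketched from pointwise LOO values. This follows the paired-difference calculation reported by `loo_compare()` in the R loo package (the standard error is the square root of n times the sample variance of the pointwise differences); the pointwise values below are hypothetical and would in practice come from the fitted models.

```python
import math
from statistics import stdev
from typing import Sequence, Tuple


def elpd_difference(pointwise_a: Sequence[float], pointwise_b: Sequence[float]) -> Tuple[float, float]:
    """Difference in summed pointwise elpd (model A minus model B) and its standard error.

    SE = sqrt(n) * sd(pointwise differences), the paired-difference estimate used
    in LOO model comparison.
    """
    if len(pointwise_a) != len(pointwise_b) or len(pointwise_a) < 2:
        raise ValueError("need paired pointwise values for at least two observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    return sum(diffs), math.sqrt(len(diffs)) * stdev(diffs)


# Hypothetical pointwise LOO values for four observations under two models.
delta, se = elpd_difference([-1.0, -2.0, -1.5, -0.5], [-1.2, -2.1, -1.4, -0.9])
print(f"delta elpd = {delta:.2f}, SE = {se:.2f}, ratio = {delta / se:.1f}")
```

In this toy example the difference (0.60) is only about 1.4 times its standard error (0.42), the situation the text describes as indistinguishable predictive accuracy.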

Posterior predictive checks for model fit and assumption of conditional independence

Model fit was assessed by visual inspection of the posterior predictive distributions of the fitted models to the distribution of test results stratified by age and herd.

The SIT and CIT results have an implicit dependence on each other, in the sense that both are calculated from the magnitude of the observed reaction to bovine tuberculin. Comparison with avian tuberculin in the CIT is intended to raise specificity, at the cost of reduced sensitivity. As tuberculin contains the DST antigens at lower concentrations, some biological dependence between the DST and tuberculin-based assays is also expected. A biological dependence between tests suggests, but does not guarantee, a statistical association between the results of two tests strong enough to violate the assumption of conditional independence. If the correlation between (true) infection status and the result of each test is large compared with the correlation between the test results themselves, then the conditional dependence will not affect the parameter estimates and the test results can effectively be treated as (statistically) independent.

Dendukuri et al. 2009 proposed a method for testing whether the assumption of conditional independence is satisfied by calculating the pairwise probability of agreement between each pair of diagnostic tests:

graphic file with name 41598_2025_5223_Article_Equu.gif

Any systematic differences between the observed pairwise agreement and the values expected from the estimated model would imply a violation of the assumption of conditional independence.
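The comparison can be sketched for one pair of tests, assuming the expected agreement is computed under conditional independence from the fitted sensitivities, specificities and prevalence. The function names are hypothetical; the example uses the SIT and CIT point estimates from the Abstract with a hypothetical prevalence of 0.5, whereas a full check would average over each animal’s herd- and age-specific infection probability and over posterior draws.

```python
from typing import Sequence


def expected_agreement(prevalence: float, se_j: float, sp_j: float,
                       se_k: float, sp_k: float) -> float:
    """P(tests j and k agree) implied by a two-class model with conditional independence."""
    agree_if_infected = se_j * se_k + (1.0 - se_j) * (1.0 - se_k)
    agree_if_free = sp_j * sp_k + (1.0 - sp_j) * (1.0 - sp_k)
    return prevalence * agree_if_infected + (1.0 - prevalence) * agree_if_free


def observed_agreement(results_j: Sequence[int], results_k: Sequence[int]) -> float:
    """Proportion of animals on which the two tests give the same result."""
    if not results_j or len(results_j) != len(results_k):
        raise ValueError("need paired, non-empty result lists")
    return sum(a == b for a, b in zip(results_j, results_k)) / len(results_j)


# Point estimates from the Abstract (SIT: Se 0.99, Sp 0.76; CIT: Se 0.77, Sp 1.00)
# with a hypothetical prevalence of 0.5.
print(round(expected_agreement(0.5, 0.99, 0.76, 0.77, 1.00), 4))
```

An observed agreement consistently above the model-implied value for a pair of tests would point towards positive conditional dependence between them.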

Supplementary Information

Acknowledgements

The authors would like to express their gratitude to the dairy farms for their permission, assistance in animal handling, and provision of information during the bTB testing. We sincerely thank Dr. Martin Vordermeier and Dr. Gareth Jones for providing the DSTF reagent (fusion protein-based DST) and for their insightful discussions, critical review of the manuscript, and invaluable guidance throughout this study.

Abbreviations

bTB

Bovine tuberculosis

CIT

Comparative intradermal test

DIVA

Differentiate infected from vaccinated animals

DST

Defined antigen skin test

IGRA

Interferon gamma release assay

PPD

Purified protein derivative

SIT

Single intradermal test

TST

Tuberculin skin test

WOAH

World Organisation for Animal Health

Author contributions

Conceptualization: A.J.K.C., S.S., J.L.N.W., V.K. Methodology: M.L., A.J.K.C., S.S., J.L.N.W., V.K. Investigation: M.L., B.T., B.Y., T.B., A.O., G.K., T.A., A.A., A.F., M.G.A., B.B., S.G., G.A.M., A.M. Data analysis: A.J.K.C., M.L., V.K. Funding acquisition: A.J.K.C., G.A., J.L.N.W., V.K. Project administration: V.K. Supervision: A.J.K.C., A.M., H.A., B.G., G.A., G.A.M., S.S., S.G., V.K. Writing—original draft: M.L., A.J.K.C., V.K. Writing—review and editing: all authors.

Funding

This research was carried out as part of the Accelerating Bovine Tuberculosis Control in Developing Countries (ABTBCD) project, which received funding from a grant (OPP1176950) provided by the Bill & Melinda Gates Foundation and the Foreign, Commonwealth and Development Office.

Data availability

All the analyzed data have been included in the manuscript and supplementary materials. Supplementary Tables 2 and 3 present a full cross-tabulation of the eight test results for each animal. Full data tables stratified by age and herd, and all code, are available in the GitHub code repository: https://github.com/MonkeyMyshkin/LatentDST. For further inquiries, please contact the corresponding authors.

Declarations

Competing interests

The Pennsylvania State University (SS and VK) is in the process of applying for intellectual property protection for a peptide-based DIVA skin test under patent number WO/2020/208368 (ref. 59). The authors declare no other competing interests.

Ethics statement

Ethical clearance was obtained from the Animal Research Scientific and Ethics Review Committee (ARSERC) of the Animal Health Institute (Reference number: ARSERC/EC/003/26/09/2019). All field work was conducted in accordance with WOAH guidelines. Furthermore, the study was reported in accordance with ARRIVE guidelines.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Matios Lakew and Andrew J. K. Conlan contributed equally to this work.

Contributor Information

Matios Lakew, Email: matioslakew@gmail.com.

Andrew J. K. Conlan, Email: ajkc2@cam.ac.uk

Vivek Kapur, Email: vxk1@psu.edu.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-05223-6.

References

  • 1.Kaufmann, S. H. & Schaible, U. E. 100th anniversary of Robert Koch’s Nobel Prize for the discovery of the tubercle bacillus. Trends Microbiol.13, 469–475 (2005). [DOI] [PubMed] [Google Scholar]
  • 2.WOAH. Mammalian tuberculosis (infection with Mycobacterium tuberculosis complex), WOAH Terrestrial Manual, 2022 Chapter 3.1.13. https://www.woah.org/fileadmin/Home/eng/Health_standards/tahm/3.01.13_Mammalian_tuberculosis.pdf (2022).
  • 3.More, S. J., Radunz, B. & Glanville, R. J. Lessons learned during the successful eradication of bovine tuberculosis from Australia. Vet. Rec.177, 224–232 (2015). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Olmstead, A. & Rhode, P. An impossible undertaking: the eradication of bovine tuberculosis in the United States. J. Econ. Hist.64, 734–772 (2004). [Google Scholar]
  • 5.de la Rua-Domenech, R. et al. Ante mortem diagnosis of tuberculosis in cattle: a review of the tuberculin tests, gamma-interferon assay and other ancillary diagnostic techniques. Res. Vet. Sci.81, 190–210 (2006). [DOI] [PubMed] [Google Scholar]
  • 6.Bakker, D. & Good, M. Quality Control of Purified Protein Derivative Tuberculins: Essential for Effective Bovine Tuberculosis Control and Eradication Programmes (2018).
  • 7.Duignan, A., Kenny, K., Bakker, D. & Good, M. Tuberculin PPD potency assays in naturally infected tuberculous cattle as a quality control measure in the irish bovine tuberculosis eradication programme. Front. Vet. Sci.6, 328 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Kleeberg, H. H. The tuberculin test in cattle. J. S. Afr. Vet. Assoc.31, 213–225 (1960). [Google Scholar]
  • 9.Monaghan, M. L., Doherty, M. L., Collins, J. D., Kazda, J. F. & Quinn, P. J. The tuberculin test. Vet. Microbiol.40, 111–124 (1994). [DOI] [PubMed] [Google Scholar]
  • 10. Brooks-Pollock, E. et al. Age-dependent patterns of bovine tuberculosis in cattle. Vet. Res. 44, 1–9 (2013).
  • 11. Whelan, A. O. et al. Lack of correlation between BCG-induced tuberculin skin test sensitisation and protective immunity in cattle. Vaccine 29, 5453–5458 (2011).
  • 12. Berthet, F. X., Rasmussen, P. B., Rosenkrands, I., Andersen, P. & Gicquel, B. A Mycobacterium tuberculosis operon encoding ESAT-6 and a novel low-molecular-mass culture filtrate protein (CFP-10). Microbiology (Reading) 144 (Pt 11), 3195–3203 (1998).
  • 13. Sidders, B. et al. Screening of highly expressed mycobacterial genes identifies Rv3615c as a useful differential diagnostic antigen for the Mycobacterium tuberculosis complex. Infect. Immun. 76, 3932–3939 (2008).
  • 14. Sørensen, A. L., Nagai, S., Houen, G., Andersen, P. & Andersen, A. B. Purification and characterization of a low-molecular-mass T-cell antigen secreted by Mycobacterium tuberculosis. Infect. Immun. 63, 1710–1717 (1995).
  • 15. Vordermeier, H. M., Jones, G. J., Buddle, B. M., Hewinson, R. G. & Villarreal-Ramos, B. Bovine tuberculosis in cattle: vaccines, DIVA tests, and host biomarker discovery. Annu. Rev. Anim. Biosci. 4, 87–109 (2016).
  • 16. Bayissa, B. et al. Field evaluation of specific mycobacterial protein-based skin test for the differentiation of Mycobacterium bovis-infected and Bacillus Calmette Guerin-vaccinated crossbred cattle in Ethiopia. Transbound. Emerg. Dis. 69, e1–e9 (2022).
  • 17. Srinivasan, S. et al. A defined antigen skin test that enables implementation of BCG vaccination for control of bovine tuberculosis: proof of concept. Front. Vet. Sci. 7, 391 (2020).
  • 18. Subramanian, S. et al. Defined antigen skin test for bovine tuberculosis retains specificity on revaccination with bacillus Calmette–Guérin. Front. Vet. Sci. 9 (2022).
  • 19. Whelan, A. O. et al. Development of a skin test for bovine tuberculosis for differentiating infected from vaccinated animals. J. Clin. Microbiol. 48, 3176–3181 (2010).
  • 20. Srinivasan, S. et al. A defined antigen skin test for the diagnosis of bovine tuberculosis. Sci. Adv. 5, eaax4899 (2019).
  • 21. Hui, S. L. & Walter, S. D. Estimating the error rates of diagnostic tests. Biometrics 167–171 (1980).
  • 22. Collins, J. & Huynh, M. Estimation of diagnostic test accuracy without full verification: a review of latent class methods. Stat. Med. 33, 4141–4169 (2014).
  • 23. Alvarez, J. et al. Evaluation of the sensitivity and specificity of bovine tuberculosis diagnostic tests in naturally infected cattle herds using a Bayesian approach. Vet. Microbiol. 155, 38–43 (2012).
  • 24. Pucken, V.-B. et al. Evaluating diagnostic tests for bovine tuberculosis in the southern part of Germany: A latent class analysis. PLoS ONE 12, e0179847 (2017).
  • 25. Singhla, T. et al. Determination of the sensitivity and specificity of bovine tuberculosis screening tests in dairy herds in Thailand using a Bayesian approach. BMC Vet. Res. 15, 149 (2019).
  • 26. Conlan, A. J. K. et al. Potential benefits of cattle vaccination as a supplementary control for bovine tuberculosis. PLoS Comput. Biol. 11, e1004038 (2015).
  • 27. Nuñez-Garcia, J. et al. Meta-analyses of the sensitivity and specificity of ante-mortem and post-mortem diagnostic tests for bovine tuberculosis in the UK and Ireland. Prev. Vet. Med. 153, 94–107 (2018).
  • 28. Ryan, E. et al. The Irish bTB eradication programme: combining stakeholder engagement and research-driven policy to tackle bovine tuberculosis. Ir. Vet. J. 76, 32 (2023).
  • 29. Middleton, S. et al. A molecularly defined skin test reagent for the diagnosis of bovine tuberculosis compatible with vaccination against Johne’s disease. Sci. Rep. 11, 2929 (2021).
  • 30. Fernández-Veiga, L. et al. Differences in skin test reactions to official and defined antigens in guinea pigs exposed to non-tuberculous and tuberculous bacteria. Sci. Rep. 13, 2936 (2023).
  • 31. Jones, G. J. et al. Test performance data demonstrates utility of a cattle DIVA skin test reagent (DST-F) compatible with BCG vaccination. Sci. Rep. 12, 12052 (2022).
  • 32. Waters, W. R. et al. Effects of serial skin testing with purified protein derivative on the level and quality of antibodies to complex and defined antigens in Mycobacterium bovis-infected cattle. Clin. Vaccine Immunol. 22, 641–649 (2015).
  • 33. Wood, P. R. & Rothel, J. S. In vitro immunodiagnostic assays for bovine tuberculosis. Vet. Microbiol. 40, 125–135 (1994).
  • 34. Casal, C. et al. Strategic use of serology for the diagnosis of bovine tuberculosis after intradermal skin testing. Vet. Microbiol. 170, 342–351 (2014).
  • 35. McCallan, L. et al. Serological test performance for bovine tuberculosis in cattle from herds with evidence of on-going infection in Northern Ireland. PLoS ONE 16, e0245655 (2021).
  • 36. Pollock, J. M. & Neill, S. D. Mycobacterium bovis infection and tuberculosis in cattle. Vet. J. 163, 115–127 (2002).
  • 37. Lepper, A. W., Pearson, C. W. & Corner, L. A. Anergy to tuberculin in beef cattle. Aust. Vet. J. 53, 214–216 (1977).
  • 38. Mekonnen, G. A. et al. Dynamics and risk of transmission of bovine tuberculosis in the emerging dairy regions of Ethiopia. Epidemiol. Infect. 149, e69 (2021).
  • 39. Kelly, R. F. et al. Association of Fasciola gigantica co-infection with bovine tuberculosis infection and diagnosis in a naturally infected cattle population in Africa. Front. Vet. Sci. 5 (2018).
  • 40. Claridge, J. et al. Fasciola hepatica is associated with the failure to detect bovine tuberculosis in dairy cattle. Nat. Commun. 3, 853 (2012).
  • 41. Byrne, A. W. et al. Modelling the variation in skin-test tuberculin reactions, post-mortem lesion counts and case pathology in tuberculosis-exposed cattle: Effects of animal characteristics, histories and co-infection. Transbound. Emerg. Dis. 65, 844–858 (2018).
  • 42. Byrne, A. W. et al. Bovine tuberculosis in youngstock cattle: A narrative review. Front. Vet. Sci. 9 (2022).
  • 43. O’Hagan, M. J. H. et al. Test characteristics of the tuberculin skin test and post-mortem examination for bovine tuberculosis diagnosis in cattle in Northern Ireland estimated by Bayesian latent class analysis with adjustments for covariates. Epidemiol. Infect. 147, e209 (2019).
  • 44. Broughan, J. M. et al. A review of risk factors for bovine tuberculosis infection in cattle in the UK and Ireland. Epidemiol. Infect. 144, 2899–2926 (2016).
  • 45. O’Hagan, M. J. H. et al. Risk factors for visible lesions or positive laboratory tests in bovine tuberculosis reactor cattle in Northern Ireland. Prev. Vet. Med. 120, 283–290 (2015).
  • 46. Byrne, A. W. et al. Bovine tuberculosis visible lesions in cattle culled during herd breakdowns: the effects of individual characteristics, trade movement and co-infection. BMC Vet. Res. 13, 400 (2017).
  • 47. Haagsma, J., O’Reilly, L., Dobbelaer, R. & Murphy, T. A comparison of the relative potencies of various bovine PPD tuberculins in naturally infected tuberculous cattle. J. Biol. Stand. 10, 273–284 (1982).
  • 48. Goodchild, A. V., Downs, S. H., Upton, P., Wood, J. L. & de la Rua-Domenech, R. Specificity of the comparative skin test for bovine tuberculosis in Great Britain. Vet. Rec. 177, 258 (2015).
  • 49. EU. The Council of the European Economic Community: Council Directive of 26 June 1964 on animal health problems affecting intra-Community trade in bovine animals and swine (64/432/EEC, with later amendments) (OJ 121, 29.7.1964, p. 1977). 1964L0432—EN—27.05.2015—019.002—2. https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:01964L0432-20150527&rid=1 (2015) (accessed Sep 2021).
  • 50. Kumar, M. et al. Comparative analysis of tuberculin and defined antigen skin tests for detection of bovine tuberculosis in buffaloes (Bubalus bubalis) in Haryana state, India. BMC Vet. Res. 20, 65 (2024).
  • 51. Wood, P. R., Corner, L. A. & Plackett, P. Development of a simple, rapid in vitro cellular assay for bovine tuberculosis based on the production of gamma interferon. Res. Vet. Sci. 49, 46–49 (1990).
  • 52. Wood, P. R. et al. Field comparison of the interferon-gamma assay and the intradermal tuberculin test for the diagnosis of bovine tuberculosis. Aust. Vet. J. 68, 286–290 (1991).
  • 53. OIE. BOVIGAM® – Mycobacterium bovis gamma interferon test kit for cattle. OIE Procedure for Registration of Diagnostic Kits, Abstract sheet. https://www.oie.int/app/uploads/2021/03/oie-register-bovigam-abstract-v1-05-2015.pdf (2015).
  • 54. Landis, J. R. & Koch, G. G. The measurement of observer agreement for categorical data. Biometrics 159–174 (1977).
  • 55. Dendukuri, N., Hadgu, A. & Wang, L. Modeling conditional dependence between diagnostic tests: a multiple latent variable model. Stat. Med. 28, 441–461 (2009).
  • 56. Fromsa, A. et al. BCG vaccination reduces bovine tuberculosis transmission, improving prospects for elimination. Science 383, eadl3962 (2024).
  • 57. Carpenter, B. et al. Stan: A probabilistic programming language. J. Stat. Softw. 76 (2017).
  • 58. Vehtari, A., Gelman, A. & Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27, 1413–1432 (2017).
  • 59. Kapur, V., Srinivasan, S., Vordermeier, H. & Jones, G., inventors. Diagnostic reagents. WIPO Patent (2020).


Data Availability Statement

All analyzed data are included in the manuscript and supplementary materials. Supplementary Tables 2 and 3 give a full cross-tabulation of the eight test results for each animal. Full data tables stratified by age and herd, together with all code, are available in the GitHub repository https://github.com/MonkeyMyshkin/LatentDST. For further inquiries, please contact the corresponding authors.


Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
