PLOS One. 2025 Sep 5;20(9):e0329668. doi: 10.1371/journal.pone.0329668

Machine learning for predicting the diagnosis of tuberculous versus malignant pleural effusion: External validation and accuracy in two different settings

Alberto Garcia-Zamalloa 1,*, Rafael Arnay 2, Iván Castilla-Rodriguez 2, Javier Mar 3, Jose Manuel Gonzalez-Cava 2, Oliver Ibarrondo 4, Iñaki Salegui 5, Juan Antonio De Miguel 5, Nekane Mugica 5, Borja Aguinagalde 6, Jon Zabaleta 6, Begoña Basauri 7, Marta Alonso 8, Nekane Azcue 8, Eva Gil 9, Irati Garmendia 10, Jorge Taboada 4,11
Editor: Guocan Yu 12
PMCID: PMC12412920  PMID: 40911586

Abstract

Objective

To perform an external validation of a previously reported machine learning (ML) approach for predicting the diagnosis of pleural tuberculosis.

Patients and Methods

We defined two cohorts: a Training group, comprising 273 out of 1,220 effusions from our prospective study (2013–2022); and a Testing group, from a retrospective analysis of 360 effusions from 832 consecutive patients in Bajo Deba health district (1996–2012). All the effusions included were exudative and lymphocytic. In the Training and Testing groups respectively, 49 and 104 cases were tuberculous, 143 and 92 were malignant, and 81 and 164 were diagnosed with “other diseases”; pre-test probabilities of pleural tuberculosis were 4% and 12.5%. Variables included were: age, pH, adenosine deaminase, glucose, protein, and lactate dehydrogenase levels, and white cell counts (total and differential) in pleural fluid. We used two ML classifiers: binary (tuberculous and non-tuberculous), and three-class (tuberculous, malignant, and others); and compared them with Bayesian analysis.

Results

The best binary classifier yielded a sensitivity of 88%, specificity of 98%, and accuracy of 95%. The best three-class classifier achieved the same accuracy and correctly classified 83% (77/92) of malignant cases. Several ML models yielded higher positive predictive values (PPVs) than Bayesian analysis based on ADA > 40 U/l and lymphocyte percentage ≥ 50%, whose PPV in the Testing group was 92%.

Conclusions

This external validation confirms the good performance of the previously reported ML approach for predicting the diagnosis of pleural tuberculosis based on exudative and lymphocytic pleural effusions, and for discriminating the cases most likely to be malignant. Additionally, ML was more accurate than the Bayesian approach in our study.

Introduction

Tuberculosis (TB) remains a major global public health problem. In its 2024 global report, the World Health Organization estimated that 10.8 million people worldwide fell ill with TB in 2023 (95% uncertainty interval (UI): 10.1–11.7 million) and that 1.25 million died of the disease that year [1]. Tuberculous pleural effusion (TPE) is the second most prevalent manifestation of extrapulmonary TB, after tuberculous lymphadenitis [2]. TPE is also a paucibacillary manifestation of tuberculous disease, and the usefulness of various biomarkers in pleural fluid has been extensively analysed. The most promising biomarker is adenosine deaminase (ADA): its measurement is cheap, easy and available in most laboratories, and it has shown uniformly high diagnostic performance in five consecutive meta-analyses, with a mean sensitivity and specificity of 92–93% and 90–92% respectively for a diagnostic cut-off point of 40 U/l [3–7].

Traditionally, the diagnostic approach to incident cases has been shaped by regional variation in overall TPE prevalence [8]. Accordingly, based on a Bayesian interpretation of its diagnostic accuracy, pleural fluid ADA is considered optimal as a rule-in test in high TB prevalence settings and as a rule-out test in low prevalence settings [9,10]. More recently, Shaw et al. proposed an ADA level greater than 40 U/l for ruling in TPE where the local TB incidence exceeds 125 cases per 100,000 population, and an ADA level below 30 U/l for ruling out TPE in low-incidence countries [11].
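To make the rule-in/rule-out reasoning concrete, the minimal sketch below applies Bayes' theorem to a test whose sensitivity and specificity stay fixed while only the pre-test probability changes. The values 0.92 and 0.91 are assumptions chosen inside the meta-analytic ranges quoted above, and the two prevalences (4% and 30%) are purely illustrative; `predictive_values` is a hypothetical helper, not part of the authors' code.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Bayes' theorem: (PPV, NPV) of a binary test at a given pre-test probability."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Assumed accuracy of pleural fluid ADA > 40 U/l (within the meta-analytic ranges).
SENSITIVITY, SPECIFICITY = 0.92, 0.91

for prevalence in (0.04, 0.30):  # illustrative low and high pre-test probabilities
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"pre-test {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

Under these assumptions the PPV climbs from roughly 0.30 to roughly 0.81 between the two prevalences, while the NPV stays above 0.96 throughout: the same cut-off is convincing for ruling out at low prevalence and for ruling in only at high prevalence.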

The progressively wider use of machine learning (ML) tools in medicine is providing researchers with new opportunities to improve diagnostic accuracy by including additional biomarkers. In 2021, our group published a comparison of different ML algorithms for the diagnosis of TPE in a low prevalence scenario (3.8%) based on 230 exudative and lymphocytic pleural effusions diagnosed in Gipuzkoa Region from 2013 to 2020 [12]. The variables included in the predictive models were age, ADA and results from the routine analysis of pleural fluid including cell counts and biochemical tests. The best ML algorithm (support vector classifier, SVC, a particular type of support vector machine) achieved a sensitivity and specificity of 91% and 98% respectively; furthermore, compared with the Bayesian analysis of ADA level > 40 U/l plus lymphocyte percentage ≥ 50% in pleural fluid, the positive predictive value (PPV) increased from 42.4% to 70.5% in a 5% pre-test probability scenario. As the presumptive diagnosis of TPE has been traditionally driven by its prevalence, we needed to confirm that the implemented tool maintains its diagnostic properties in other scenarios featuring a different tuberculous prevalence.
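As a consistency check on the 70.5% figure, substituting the SVC's sensitivity (0.91) and specificity (0.98) into Bayes' theorem at a 5% pre-test probability reproduces it. The 42.4% figure for ADA > 40 U/l plus lymphocyte percentage ≥ 50% depends on sensitivity and specificity values from the earlier study that are not restated here, so it is not recomputed.

$$
\mathrm{PPV} = \frac{0.91 \times 0.05}{0.91 \times 0.05 + (1 - 0.98)(1 - 0.05)} = \frac{0.0455}{0.0455 + 0.0190} = \frac{0.0455}{0.0645} \approx 0.705
$$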

The aim of this study was to externally validate the aforementioned model by testing the trained ML algorithm in a setting with a different TPE prevalence, namely 360 exudative and lymphocytic pleural effusions diagnosed in Bajo Deba Health District from 1996 to 2012. In a second step, we compared the diagnostic accuracy of the ML procedure and the classical Bayesian analysis for TPE in both clinical scenarios (Bajo Deba 1996–2012 and Gipuzkoa 2013–2022).

Materials and methods

We pooled data from two groups of patients from Gipuzkoa (Basque Country, Spain) from January 1996 to the present. All the patients had been diagnosed with pleural effusion, had undergone diagnostic thoracentesis, and had their ADA level measured in the pleural fluid sample obtained. The first group, with which the ML models were trained (Training group), comprised patients from the whole region of Gipuzkoa included in a previously published article on the diagnostic accuracy of ML for TPE [12], for whom data were collected from January 2013 to December 2020. We extended this group with patients prospectively recruited across the region until December 2022. The second group, with which the ML models were tested (Testing group), comprised a retrospective cohort of patients from a specific area (Bajo Deba health district) for whom data were collected from January 1996 to December 2012 (partial results from 1998 to 2008 were published in 2012) [13]. The retrospective project had the permission of the local Ethics Committee of Mendaro Hospital and the approval of the Western Gipuzkoa Clinical Research Commission. In the prospective project, all patients gave written informed consent (for the only minor included, a 17-year-old girl, her parents signed the informed consent form), and the protocol was evaluated and approved by the Clinical Research Ethics Committee of Gipuzkoa (Record number 11/12). Data were accessed for research purposes from September to December 2012 in the retrospective project and from March 2013 to December 2022 in the prospective one.

The mean annual incidence rate of TB was 12.59 cases per 100,000 population in Gipuzkoa from 2013 to 2022 and 43.17 cases per 100,000 population in Bajo Deba health district from 1996 to 2012. Although a Tuberculosis Control Programme was not implemented in the Basque Country until 2003 [14], data recording has been highly reliable since 1995.

The following variables were included: age, pleural fluid ADA level, and pleural fluid routine test results: pH, glucose, protein, and lactate dehydrogenase (LDH) levels, and white blood cell counts (total and differential). It was not possible to include the red blood cell count in the analysis since it had not been routinely assessed in the referral hospital of Bajo Deba health district during the specified period (1996–2012). Further, to ensure consistency with the methodology employed in the 2013–2022 Gipuzkoa study, we considered only the first pleural fluid sample from each case. The databases were anonymized and authors did not have access to information that could identify individual participants.

Diagnostic criteria for tuberculous, malignant and parapneumonic pleural effusion applied to all the cases from the external validation/Testing group were as follows [15,16]:

  • Confirmed tuberculous pleural effusion: positive culture or Xpert MTB/RIF assay in pleural fluid, pleural tissue or sputum.

  • Probable tuberculous pleural effusion: granulomatous inflammation in pleural tissue and/or exudative and lymphocytic pleural effusion with ADA > 40 U/l and complete recovery with antituberculosis treatment.

  • Confirmed malignant pleural effusion: malignant cells found in pleural biopsy tissue or pleural fluid.

  • Paramalignant pleural effusion: cancer diagnosed de novo, no other cause of pleural effusion identified, absence of malignant cells in pleural fluid and pleural biopsy tissue, and parallel evolution of malignant disease and pleural effusion.

  • Parapneumonic effusion: alveolar consolidation diagnosed by chest X-ray or computed tomography with ipsilateral pleural effusion, and total recovery of both with antibiotic treatment.

The rest of the cases in this group were diagnosed by well-defined clinical criteria, and all the cases in the Training group had gold standard quality diagnoses [12].

Overall, in the Training group (Gipuzkoa 2013–2022), 273 pleural effusions were exudative and lymphocytic; of these, 49 were of tuberculous origin and 143 were malignant. In the Testing group (Bajo Deba 1996–2012), 360 of 832 episodes of pleural effusion were exudative and lymphocytic; TB was diagnosed in 104 of these 360 cases, and 92 had a malignant aetiology.

The sample size for the Training group was computed with a statistical program (Epidat 3.1). For a sensitivity of 95%, specificity of 90%, prevalence (pre-test probability) of 10%, significance level of 5%, power of 80%, and precision of 5%, the minimum sample size was 200 pleural effusions [12]. For the Testing group, because data were collected retrospectively, we tried to include as many pleural effusions as possible; we were able to record reliable data from 1996 to 2012.

In line with previous studies on pleural diseases since the 1990s, the term “prevalence” is used here for the number of cases of a specific type of pleural effusion divided by the total number of pleural effusions studied in a given clinical setting over a known period of time. In this context, “prevalence” can be considered a synonym of “pre-test probability”. Note, however, that for ML the term “pre-test probability” specifically refers to the number of pleural fluid samples with a given diagnosis divided by the number of pleural fluid samples included in the study, not in the clinical scenario. Table 1 reports both of these quantities; in addition, S1 Table and S2 Table in the Supporting information list the aetiology and the diagnostic criteria met for all the cases included in the study.

Table 1. Number of cases in the Training and Testing groups; pre-test probability for 1) local prevalence of tuberculous pleural effusion and 2) cases included in the machine learning analysis (that is, considering only the cases with exudative and lymphocytic effusion samples).

| Group | GIPUZKOA 2013–2022 (Training group) | BAJO DEBA 1996–2012 (Testing group) |
| --- | --- | --- |
| Cases with thoracentesis, n | 1220 | 832 |
| Exudative and lymphocytic pleural effusions included, n | 273 | 360 |
| Non-tuberculous cases (exudative and lymphocytic effusions), n | 224 | 256 |
| Cases of tuberculosis, n | 49 | 104 |
| Local pre-test probability or “prevalence” | 4% (49/1220) | 12.5% (104/832) |
| Machine learning pre-test probability or “prevalence” | 17.9% (49/273) | 28.9% (104/360) |
| Malignant cases, n | 143 | 92 |
| Other cases, n | 81 | 164 |
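The two kinds of pre-test probability in Table 1 differ only in their denominator. The short sketch below recomputes both from the tabulated counts; the dictionary layout and the helper name `pretest_probabilities` are illustrative assumptions, not the authors' code.

```python
# Counts copied from Table 1.
COHORTS = {
    "Gipuzkoa 2013-2022 (Training)": {"tb": 49, "thoracenteses": 1220, "included": 273},
    "Bajo Deba 1996-2012 (Testing)": {"tb": 104, "thoracenteses": 832, "included": 360},
}


def pretest_probabilities(counts):
    """Return (local 'prevalence', ML pre-test probability) for one cohort."""
    local = counts["tb"] / counts["thoracenteses"]  # over all effusions in the setting
    ml = counts["tb"] / counts["included"]          # over exudative, lymphocytic samples only
    return local, ml


for name, counts in COHORTS.items():
    local, ml = pretest_probabilities(counts)
    print(f"{name}: local {local:.1%}, machine learning {ml:.1%}")
```

Restricting the denominator to exudative and lymphocytic samples roughly quadruples the pre-test probability in Gipuzkoa (about 4% to about 18%) and more than doubles it in Bajo Deba (12.5% to 104/360 ≈ 0.289).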

Age and pleural fluid ADA level were by far the most relevant variables in our reference study (Training group); we therefore compared the two populations on these two variables, treated as categorical. The results of these comparisons are presented in Table 2.

Table 2. Comparative analysis of the categorical variables age and ADA level in the reference (Training) and external validation (Testing) groups (p < 0.05 considered statistically significant).

| Variable | Category | Tuberculosis, Gipuzkoa 2013–2022 | Tuberculosis, Bajo Deba 1996–2012 | p value | Malignancy, Gipuzkoa 2013–2022 | Malignancy, Bajo Deba 1996–2012 | p value | Others, Gipuzkoa 2013–2022 | Others, Bajo Deba 1996–2012 | p value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Number of cases | | 49 | 104 | | 143 | 92 | | 81 | 164 | |
| Age (years) | <30 | 7 (14.29%) | 57 (54.81%) | <0.001 | 0 (0.00%) | 0 (0.00%) | 0.191 | 2 (2.47%) | 3 (1.83%) | 0.872 |
| | 30–60 | 22 (44.90%) | 22 (21.15%) | | 34 (23.78%) | 15 (16.30%) | | 19 (23.46%) | 36 (21.95%) | |
| | ≥60 | 20 (40.82%) | 25 (24.04%) | | 109 (76.22%) | 77 (83.70%) | | 60 (74.07%) | 125 (76.22%) | |
| ADA | ≤40 U/l | 2 (4.08%) | 8 (7.69%) | 0.503 | 133 (93.01%) | 89 (96.74%) | 0.258 | 77 (95.06%) | 159 (96.95%) | 0.483 |
| | >40 U/l | 47 (95.92%) | 96 (92.31%) | | 10 (6.99%) | 3 (3.26%) | | 4 (4.94%) | 5 (3.05%) | |

As Table 2 shows, age and ADA level have similar distributions in the two groups. The only statistically significant difference was in the age of patients diagnosed with TPE, who were younger in the external validation (Testing) group (Bajo Deba 1996–2012). We attribute this to the TB incidence in that period being higher than in the Training group (Gipuzkoa 2013–2022), which implies more TB bacilli circulating in the community and, in turn, a higher proportion of primary infections, such as TPE, in young people [17,18].

In addition, Fig 1 shows the distribution of “tuberculous” and “others” samples as a function of ADA and age in the Training and Testing groups. Since all pleural fluid samples are exudative and lymphocytic, this distribution reflects the combination of the two criteria ADA > 40 U/l and lymphocyte percentage ≥ 50% (ADA 40 + LP 50). The horizontal line marks ADA = 40 U/l. Patients in the Training and Testing groups show similar distributions, and we therefore consider a comparison between them valid.

Fig 1. Distribution of “Tuberculosis” and “others” samples in terms of ADA and age in Training (Gipuzkoa 2013-2022, left) and Testing (Bajo Deba 1996-2012, right) groups.


Two distinct analyses were conducted. The primary analysis employed ML techniques to categorize samples into two groups: tuberculous and non-tuberculous. A secondary analysis utilized the same techniques to further differentiate malignant cases, resulting in three categories: tuberculous, malignant, and “others”. During both the training and testing phases, the following variables were used: age, pleural fluid ADA level, and routine parameters derived from pleural fluid analysis (including pH, glucose, protein, and LDH levels, and total and differential white blood cell counts). In this study, the six classifiers employed were the same as those used in the previous study, and are among the most widely used types of ML classifier: multilayer perceptron, logistic regression, SVC, decision tree, K-nearest neighbours, and random forest. As in our previous study, we utilized the Python scikit-learn library to implement these classifiers. For each classifier, we explored a range of parameter values in conjunction with a 5-fold cross-validation approach.
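The training and external-validation workflow described above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' code: the feature names, the hyperparameter grids, the use of accuracy as the tuning score, the feature scaling step, and the synthetic stand-in data are all hypothetical. What it does reflect from the text is the set of six classifier types, tuning with 5-fold cross-validation on the Training cohort, and applying the tuned models unchanged to a separate Testing cohort.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical column order mirroring the variables named in the text.
FEATURES = ["age", "ada", "ph", "glucose", "protein", "ldh",
            "wbc_total", "lymphocyte_pct", "neutrophil_pct"]

# Assumed grids: the article does not list the parameter values it explored.
CANDIDATES = {
    "logistic_regression": (LogisticRegression(max_iter=2000), {"clf__C": [0.1, 1, 10]}),
    "svc": (SVC(), {"clf__C": [0.1, 1, 10], "clf__gamma": ["scale", 0.1]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"clf__max_depth": [2, 3, 5]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 9]}),
    "random_forest": (RandomForestClassifier(random_state=0), {"clf__n_estimators": [50, 100]}),
    "mlp": (MLPClassifier(max_iter=1000, random_state=0),
            {"clf__hidden_layer_sizes": [(8,), (16,)]}),
}


def evaluate(y_true, y_pred):
    """Binary metrics with 1 = tuberculous (positive class), 0 = non-tuberculous."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    }


def train_and_validate(X_train, y_train, X_test, y_test):
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    results = {}
    for name, (estimator, grid) in CANDIDATES.items():
        # Scaling sits inside the pipeline so each CV fold is scaled on its own training part.
        pipe = Pipeline([("scale", StandardScaler()), ("clf", estimator)])
        search = GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")
        search.fit(X_train, y_train)      # tuning uses the Training cohort only
        y_pred = search.predict(X_test)   # external validation: no refitting on test data
        results[name] = evaluate(y_test, y_pred)
    return results


def synthetic_cohort(n, tb_fraction, rng):
    """Random stand-in data (NOT the study data): TB samples get higher ADA, lower age."""
    y = (rng.random(n) < tb_fraction).astype(int)
    X = rng.normal(size=(n, len(FEATURES)))
    X[:, FEATURES.index("ada")] += 2.5 * y
    X[:, FEATURES.index("age")] -= 1.0 * y
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = synthetic_cohort(273, 0.18, rng)  # sizes echo Table 1
X_test, y_test = synthetic_cohort(360, 0.29, rng)
results = train_and_validate(X_train, y_train, X_test, y_test)
for name, m in results.items():
    print(name, {k: round(float(v), 2) for k, v in m.items()})
```

For the three-class analysis, the same loop would apply with labels 0, 1 and 2 in `y`; the binary `evaluate` helper would then need a multi-class counterpart.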

Additionally, we compared positive and negative predictive values as a function of pre-test probability for the different trained ML models, as well as for a classification based on the ADA 40 + LP 50 criterion. The estimated positive predictive value (PPV) was calculated as a function of the pre-test probability (prevalence) using the sensitivity and specificity of each classifier obtained in the training dataset (Gipuzkoa):

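$$\mathrm{PPV}_{\mathrm{est}} = \frac{\mathrm{sensitivity} \times \mathrm{prevalence}}{(\mathrm{sensitivity} \times \mathrm{prevalence}) + (1 - \mathrm{specificity}) \times (1 - \mathrm{prevalence})}$$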

The real PPVs in Gipuzkoa and Bajo Deba were calculated as the true positives divided by the sum of the true positives and false positives, obtained in each dataset:

$$\mathrm{PPV}_{\mathrm{real}} = \frac{TP}{TP + FP}$$

Finally, for comparability with our previous study, we performed a three-class classification: tuberculous, malignant, and others.

Results

Fig 2 shows the confusion matrices of the cross-validation results for all the ML models in the training set (Gipuzkoa), and the testing results (Bajo Deba) (0: others, 1: tuberculous; predicted values in columns, real values in rows).

Fig 2. Confusion matrices in cross-validation (Gipuzkoa, left) and in testing (Bajo Deba, right).


Besides the classification results obtained with the ML models, we also conducted the same classification using the ADA 40 + LP 50 criterion to classify a sample as tuberculous. Using this criterion, we classified 210 cases of non-tuberculous disease as non-tuberculous (true negatives, TNs), 47 cases of TB as tuberculous (true positives, TPs), 14 cases of non-tuberculous disease as tuberculous (false positives, FPs) and 2 cases of TB as non-tuberculous disease (false negatives, FNs) in the Gipuzkoa dataset. On the other hand, in the Bajo Deba dataset, we obtained 248 TNs, 96 TPs, 8 FPs and 8 FNs.
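The four confusion counts above are enough to recover the ADA 40 + LP 50 rows of Table 3. The sketch below does this arithmetic; `metrics` is a hypothetical helper name, and tuberculosis is treated as the positive class, as in the text.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and PPV from binary confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
    }


# ADA 40 + LP 50 counts reported in the text (TB = positive class).
gipuzkoa = metrics(tp=47, fp=14, tn=210, fn=2)
bajo_deba = metrics(tp=96, fp=8, tn=248, fn=8)
for name, m in (("Gipuzkoa", gipuzkoa), ("Bajo Deba", bajo_deba)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Rounded to two decimals, these values match the ADA 40 + LP 50 row of Table 3 in both datasets; for example, Gipuzkoa specificity is 210/224 = 0.9375 and Bajo Deba specificity is 248/256 ≈ 0.969.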

Table 3 lists the metrics of accuracy, sensitivity, specificity, and PPV for each dataset for the different trained ML models and the ADA 40 + LP 50 criterion.

Table 3. Accuracy, sensitivity, specificity, and positive predictive value for each dataset for the trained machine learning models and the ADA 40 + LP 50 criterion.

| Classifier type | Gipuzkoa accuracy | Gipuzkoa sensitivity | Gipuzkoa specificity | Gipuzkoa PPV | Bajo Deba accuracy | Bajo Deba sensitivity | Bajo Deba specificity | Bajo Deba PPV |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Logistic regression | 0.95 | 0.98 | 0.94 | 0.77 | 0.94 | 0.86 | 0.98 | 0.95 |
| Support vector machine | 0.96 | 0.98 | 0.96 | 0.83 | 0.95 | 0.88 | 0.98 | 0.94 |
| Decision tree | 0.90 | 1.00 | 0.88 | 0.65 | 0.94 | 0.95 | 0.93 | 0.85 |
| K-nearest neighbours | 0.92 | 0.90 | 0.92 | 0.71 | 0.95 | 0.91 | 0.96 | 0.91 |
| Random forest | 0.95 | 0.96 | 0.94 | 0.78 | 0.95 | 0.90 | 0.96 | 0.91 |
| Multilayer perceptron | 0.96 | 0.94 | 0.96 | 0.85 | 0.94 | 0.82 | 0.98 | 0.96 |
| ADA 40 + LP 50 | 0.94 | 0.96 | 0.94 | 0.77 | 0.96 | 0.92 | 0.97 | 0.92 |

PPV, positive predictive value; ADA 40 + LP 50, adenosine deaminase > 40 U/l and lymphocyte percentage ≥ 50%.

In Fig 3, the comparison of positive and negative predictive values as a function of pre-test probability is shown for the different trained ML models, as well as for a classification based on the ADA 40 + LP 50 criterion. The dashed vertical lines indicate the ML TPE pre-test probability points for the training (0.18) and testing (0.29) datasets (recall that the corresponding clinical scenario pre-test probabilities or “local prevalence of TPE” are 4% and 12.5% respectively).

Fig 3. Positive and negative predictive values as a function of pre-test probability.


Table 4 compares the real and estimated PPVs obtained for each ML model as well as for a classification based on ADA 40 + LP 50, at the particular pre-test probabilities in the training (Gipuzkoa: 0.18) and testing (Bajo Deba: 0.29) datasets.

Table 4. Real and estimated positive predictive values in the training and testing datasets.

| Classifier type | Gipuzkoa estimated PPV | Gipuzkoa real PPV | Bajo Deba estimated PPV | Bajo Deba real PPV |
| --- | --- | --- | --- | --- |
| Logistic regression | 0.78 | 0.77 | 0.87 | 0.95 |
| Support vector machine | 0.84 | 0.83 | 0.91 | 0.94 |
| Decision tree | 0.64 | 0.65 | 0.77 | 0.85 |
| K-nearest neighbours | 0.71 | 0.71 | 0.82 | 0.91 |
| Random forest | 0.77 | 0.78 | 0.87 | 0.91 |
| Multilayer perceptron | 0.83 | 0.85 | 0.91 | 0.96 |
| ADA 40 + LP 50 | 0.77 | 0.77 | 0.86 | 0.92 |

ADA 40 + LP 50: adenosine deaminase > 40 U/l and lymphocyte percentage ≥ 50%.

The real PPV of several ML models exceeds that of the ADA 40 + LP 50 criterion in both datasets (SVC, random forest and multilayer perceptron in Gipuzkoa; logistic regression, SVC and multilayer perceptron in Bajo Deba). In addition, the estimated PPVs underestimate the real values obtained in the Bajo Deba dataset for all ML models as well as for the ADA 40 + LP 50-based classifier.
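The last row of Table 4 can be reproduced from the ADA 40 + LP 50 confusion counts reported in the Results. The sketch below, with hypothetical helper names, projects the PPV from the Gipuzkoa sensitivity and specificity at each dataset's ML pre-test probability (49/273 and 104/360) and compares it with the observed TP / (TP + FP).

```python
def estimated_ppv(sensitivity, specificity, prevalence):
    """PPV projected via Bayes' theorem from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def real_ppv(tp, fp):
    """Observed PPV: true positives over all positive calls."""
    return tp / (tp + fp)


# Gipuzkoa (training) accuracy of ADA 40 + LP 50: 47 TP, 2 FN, 210 TN, 14 FP.
se_train = 47 / (47 + 2)
sp_train = 210 / (210 + 14)

est_gipuzkoa = estimated_ppv(se_train, sp_train, 49 / 273)
est_bajo_deba = estimated_ppv(se_train, sp_train, 104 / 360)
real_gipuzkoa = real_ppv(tp=47, fp=14)
real_bajo_deba = real_ppv(tp=96, fp=8)

print(f"Gipuzkoa:  estimated {est_gipuzkoa:.2f}, real {real_gipuzkoa:.2f}")
print(f"Bajo Deba: estimated {est_bajo_deba:.2f}, real {real_bajo_deba:.2f}")
```

This gives 0.77 versus 0.77 in Gipuzkoa and 0.86 versus 0.92 in Bajo Deba, matching Table 4: the projection carries the Gipuzkoa false-positive rate into a dataset where the criterion actually produced fewer false positives.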

Fig 4 shows the confusion matrices of the classification results in Gipuzkoa (left) and Bajo Deba (right), respectively (0: tuberculous, 1: malignant, and 2: others; predicted values in columns, real values in rows).

Fig 4. Confusion matrices for each classifier in Gipuzkoa (left) and Bajo Deba (right) (0: tuberculous, 1: malignant, 2: others; predicted values in columns, real values in rows).


Discussion

In this study, we externally validated the results of our prospective study that assessed the usefulness of ML, beyond the ADA 40 + LP 50 criterion, for predicting the diagnosis of TPE in a low prevalence setting during 2013–2022 [12]. We conducted the validation in a different prevalence scenario and observed some additional behaviours of the ML classifiers worth noting in this context.

TPE is a type IV hypersensitivity reaction to mycobacterial antigens, and it is not certain whether this immunogenic response requires live mycobacteria. There is a well-established initial influx of polymorphonuclear leukocytes into the pleural fluid, followed by macrophages, and eventually almost all TPE fluids become lymphocytic [19,20]. An ideal diagnostic tool to confirm TPE should detect Mycobacterium tuberculosis or any of its specific subunits or components directly, optimally in pleural fluid obtained by diagnostic thoracentesis; however, because of its paucibacillary nature, current microbiological and nucleic acid amplification tests show low sensitivity: < 5% with acid-fast bacilli smear in pleural fluid, 30–50% with mycobacterial culture and 21–47% with a nucleic acid amplification test compared with a composite reference standard [21]. Pleural biopsy (closed or guided by video-thoracoscopy) is more sensitive but also more invasive, and hence not suitable in all clinical scenarios; moreover, Mycobacterium tuberculosis grows in pleural tissue culture in 75% of cases, the others being diagnosed from the histological finding of granuloma in pleural tissue, which shows a specificity of 95% but gives no information on microbiological characteristics or drug resistance [22].

For all these reasons, great efforts have been made over recent decades to obtain a surrogate diagnostic tool for TPE, pleural fluid biomarkers being the most extensively explored. They provide a probability for ruling in and/or ruling out the disease, and among them, ADA is the most cost-effective: it is standardized and has a universally accepted diagnostic cut-off value (40 U/l) [12]. Its combination with lymphocyte percentage increases ADA’s specificity [23], as we also confirmed in our first study from 1998 to 2008 [13]. In our second study [12], we included both variables in an ML approach. This third study is an external validation of the latter, performed in a higher TPE prevalence scenario.

Using the ADA 40 + LP 50 criterion, specificity decreases when moving from a high- to a low-incidence scenario. This is because, proportionally, there are more malignant samples with ADA > 40 U/l in low-incidence scenarios, which increases the number of false positives (8 classified erroneously as tuberculous out of a total of 256 non-tuberculous samples in the case of Bajo Deba, compared to 14 classified erroneously as tuberculous out of a total of 224 non-tuberculous samples in the case of Gipuzkoa).

Nonetheless, some ML models show a less marked decrease in specificity when transitioning from the high- to the low-incidence scenario. For example, the SVC achieved sensitivities and specificities of 98% and 96% in Gipuzkoa and 88% and 98% in Bajo Deba, while the logistic regression classifier provided sensitivities and specificities, respectively, of 98% and 94% in Gipuzkoa and 86% and 98% in Bajo Deba. This is because these models rely not only on the ADA 40 + LP 50 criterion but also a combination of other features, each with its relative influence, ADA and age being the most important ones.

Both the ADA 40 + LP 50 criterion and the ML models provide lower sensitivity in Bajo Deba health district than in Gipuzkoa. Two factors may explain this. First, Bajo Deba had more TB cases with ADA ≤ 40 U/l in the first pleural fluid sample (8 of 104, versus 2 of 49 in Gipuzkoa; Table 2), so both the ML models and the ADA > 40 criterion fail to generalize correctly to these cases. Second, and this affects only the ML models, the patterns of ADA and age, the two most discriminatory variables, differ between the Gipuzkoa and Bajo Deba datasets. Fig 5 shows the tuberculous, malignant, and other cases as a function of ADA and age in Gipuzkoa (left) and Bajo Deba (right), with the false negatives produced by the SVC in the Bajo Deba dataset marked with red crosses. Much more TB was circulating among young people in Bajo Deba than in Gipuzkoa; although the ML models did not see this pattern in the training set, they generalize well to it, because the Gipuzkoa dataset contained no non-tuberculous samples in those ranges of ADA and age either. However, at ADA levels slightly above 40 U/l and ages around 80 years, most Gipuzkoa samples are malignant or “other”, whereas the Bajo Deba dataset contains tuberculous samples in that region, and most of the trained ML models therefore fail to generalize there.

Fig 5. Tuberculous, malignant, and other cases as a function of adenosine deaminase level and age in Gipuzkoa (left) and in Bajo Deba (right).


False negatives produced by the support vector classifier in the Bajo Deba dataset are marked with red crosses.

Comparing ML and Bayesian analysis, the PPV of the ML models exceeds that of ADA 40 + LP 50 in several cases, for both the Gipuzkoa and Bajo Deba datasets. In addition, the estimated PPVs underestimate the real values obtained in the Bajo Deba dataset for all ML models as well as for the ADA 40 + LP 50-based classifier (Fig 3 and Table 4). This indicates that the “tuberculous” and “others” classes are more readily separable in the Bajo Deba dataset than in the Gipuzkoa dataset, primarily because Bajo Deba has fewer malignant cases with high ADA (3 of 92 malignant effusions with ADA > 40 U/l, versus 10 of 143 in Gipuzkoa; Table 2). Consequently, the PPV projected for a classifier from the sensitivity and specificity obtained in Gipuzkoa falls short of the real PPV obtained in Bajo Deba for that classifier.
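A short derivation, in our own notation, shows why estimated and real PPVs coincide when every input comes from the same dataset and diverge when they do not. Let a dataset contain $N_{+}$ tuberculous and $N_{-}$ non-tuberculous samples, with $N = N_{+} + N_{-}$, so that sensitivity $\mathrm{Se} = TP/N_{+}$, specificity $\mathrm{Sp} = TN/N_{-}$ (hence $1 - \mathrm{Sp} = FP/N_{-}$) and prevalence $p = N_{+}/N$:

$$
\mathrm{Se}\cdot p = \frac{TP}{N_{+}}\cdot\frac{N_{+}}{N} = \frac{TP}{N}, \qquad (1-\mathrm{Sp})(1-p) = \frac{FP}{N_{-}}\cdot\frac{N_{-}}{N} = \frac{FP}{N}
$$

$$
\mathrm{PPV}_{\mathrm{est}} = \frac{\mathrm{Se}\cdot p}{\mathrm{Se}\cdot p + (1-\mathrm{Sp})(1-p)} = \frac{TP/N}{TP/N + FP/N} = \frac{TP}{TP + FP} = \mathrm{PPV}_{\mathrm{real}}
$$

In Bajo Deba, the estimate keeps the Gipuzkoa specificity, whereas the observed specificity is higher (for ADA 40 + LP 50, 248/256 ≈ 0.97 versus 210/224 ≈ 0.94). Fewer false positives therefore enter the real denominator, and the real PPV exceeds the estimate.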

Concerning the results of the three-class classification (tuberculous/malignant/others), as can be observed, practically all three-class classifiers provide sensitivity/specificity in the classification of tuberculous cases against the others similar to that of the two-class classifiers. For example, the Random Forest classifier correctly detects 47 cases of TPE out of a total of 49 in Gipuzkoa and 92 out of a total of 104 in Bajo Deba, but at the same time, it is capable of correctly classifying 118 malignant cases out of a total of 143 malignant cases in Gipuzkoa, as well as correctly classifying 74 malignant cases out of a total of 92 malignant cases in Bajo Deba (Fig 4).
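Comparing a three-class classifier with the binary ones requires collapsing its 3 × 3 confusion matrix into "tuberculous versus the rest" (or "malignant versus the rest"). The sketch below shows this one-vs-rest reduction; the matrix `toy` is invented purely to exercise the function and is NOT taken from Fig 4. The per-class recalls reuse the Random Forest counts quoted in the text.

```python
import numpy as np


def one_vs_rest(cm, positive):
    """Sensitivity and specificity of one class against all others.

    cm: square confusion matrix with true classes in rows and predicted in columns.
    """
    cm = np.asarray(cm)
    tp = cm[positive, positive]
    fn = cm[positive].sum() - tp     # true class, predicted as something else
    fp = cm[:, positive].sum() - tp  # other classes predicted as this class
    tn = cm.sum() - tp - fn - fp
    return tp / (tp + fn), tn / (tn + fp)


# Toy matrix (0 = tuberculous, 1 = malignant, 2 = others); invented numbers.
toy = [[9, 0, 1],
       [1, 16, 3],
       [0, 4, 6]]
sens_tb, spec_tb = one_vs_rest(toy, positive=0)
sens_mal, spec_mal = one_vs_rest(toy, positive=1)
print(f"TB vs rest: sensitivity {sens_tb:.2f}, specificity {spec_tb:.2f}")

# Random Forest per-class recall from the counts reported in the text.
rf_recall = {
    "Gipuzkoa tuberculous": 47 / 49,
    "Bajo Deba tuberculous": 92 / 104,
    "Gipuzkoa malignant": 118 / 143,
    "Bajo Deba malignant": 74 / 92,
}
for label, value in rf_recall.items():
    print(f"{label}: {value:.1%}")
```

Malignant-versus-rest figures come from the same function with `positive=1`, which is how a single three-class model can be scored both for TPE detection and for flagging likely malignant effusions.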

This study has some limitations. First, the analysis of the Testing dataset is retrospective and single-centre, and the quality of diagnosis is lower than in the Training dataset. Second, the TB incidence rate is higher in the Testing group; at the same time, this has provided an opportunity to analyse the behaviour of the ML classifiers in different prevalence scenarios of a transmissible disease. Moreover, the number of cases included, the homogeneity of the groups and the comparable behaviour of the two most important variables are strengths that support the validity of our findings. Our ML model is freely available as an app (https://pleurapp.ispana.es/) to help other physicians and thoracic surgeons apply this approach to exudative and lymphocytic pleural effusions.

Conclusions

With this external validation of our previously reported model, we confirm that an ML approach combining ADA with age and routine pleural fluid parameters is suitable for predicting the diagnosis of pleural TB in any prevalence scenario, and secondarily, for discriminating the cases most likely to be malignant amongst exudative and lymphocytic non-tuberculous effusions. Additionally, we conclude that ML is more accurate than Bayesian analysis in different prevalence scenarios, although the epidemiological behaviour of transmissible diseases (e.g., the relationship between incidence rate and patient age) is a challenge to be tackled in developing ML approaches.

Supporting information

S1 Table. Number of cases by diagnosis obtained in the Training and Testing groups.

(DOCX)

S2 Table. Diagnostic criteria met by tuberculous and malignant cases (listing each case just once under the highest quality criterion among all the criteria met).

(DOCX)


Data Availability

All the data included in the study are fully available in the Zenodo public repository: https://doi.org/10.5281/zenodo.15576280. Additionally, we have deposited our laboratory protocols in protocols.io to enhance the reproducibility of our results: dx.doi.org/10.17504/protocols.io.4r3l2pw53g1y/v1.

Funding Statement

The author(s) received no specific funding for this work.

References

  • 1. Global tuberculosis report 2024 [Internet]. Geneva: World Health Organization; 2024. Available from: https://www.who.int/teams/global-tuberculosis-programme/tb-reports/global-tuberculosis-report-2024/tb-disease-burden
  • 2. Cohen LA, Light RW. Tuberculous pleural effusion. Turk Thorac J. 2015;16:1–9.
  • 3. Palma RM, Bielsa S, Esquerda A, Martínez-Alonso M, Porcel JM. Diagnostic Accuracy of Pleural Fluid Adenosine Deaminase for Diagnosing Tuberculosis. Meta-analysis of Spanish Studies. Arch Bronconeumol (Engl Ed). 2019;55(1):23–30. doi: 10.1016/j.arbres.2018.05.007
  • 4. Greco S, Girardi E, Masciangelo R. Adenosine deaminase and interferon gamma measurements for the diagnosis of tuberculous pleurisy: a meta-analysis. Int J Tuberc Lung Dis. 2003;7:777–86.
  • 5. Liang Q-L, Shi H-Z, Wang K, Qin S-M, Qin X-J. Diagnostic accuracy of adenosine deaminase in tuberculous pleurisy: a meta-analysis. Respir Med. 2008;102(5):744–54. doi: 10.1016/j.rmed.2007.12.007
  • 6. Aggarwal AN, Agarwal R, Sehgal IS, Dhooria S. Adenosine deaminase for diagnosis of tuberculous pleural effusion: A systematic review and meta-analysis. PLoS One. 2019;14(3):e0213728. doi: 10.1371/journal.pone.0213728
  • 7. Morisson P, Neves DD. Evaluation of adenosine deaminase in the diagnosis of pleural tuberculosis: a Brazilian meta-analysis. J Bras Pneumol. 2008;34(4):217–24. doi: 10.1590/s1806-37132008000400006
  • 8. Mummadi SR, Stoller JK, Lopez R, et al. Epidemiology of Adult Pleural Disease in the United States. Chest. 2021;160:1534–51.
  • 9. Porcel JM. Biomarkers in the diagnosis of pleural diseases: a 2018 update. Ther Adv Respir Dis. 2018;12:1753466618808660. doi: 10.1177/1753466618808660
  • 10. Antonangelo L, Faria CS, Sales RK. Tuberculous pleural effusion: diagnosis & management. Expert Rev Respir Med. 2019;13(8):747–59. doi: 10.1080/17476348.2019.1637737
  • 11. Shaw JA, Ahmed L, Koegelenberg CFN. Effusions related to TB. In: Maskell NA, Laursen CB, Lee YCG, et al., editors. Pleural Disease [Internet]. Sheffield, United Kingdom: European Respiratory Society; 2020 [cited 2024 May 22]. p. 172–192. Available from: http://erspublications.com/lookup/doi/10.1183/2312508X.10023819
  • 12. Garcia-Zamalloa A, Vicente D, Arnay R, Arrospide A, Taboada J, Castilla-Rodríguez I, et al. Diagnostic accuracy of adenosine deaminase for pleural tuberculosis in a low prevalence setting: A machine learning approach within a 7-year prospective multi-center study. PLoS One. 2021;16(11):e0259203. doi: 10.1371/journal.pone.0259203
  • 13. Garcia-Zamalloa A, Taboada-Gomez J. Diagnostic accuracy of adenosine deaminase and lymphocyte proportion in pleural fluid for tuberculous pleurisy in different prevalence scenarios. PLoS One. 2012;7(6):e38729. doi: 10.1371/journal.pone.0038729
  • 14. Tuberculosis Prevention and Control Program. Workgroup. Consensus document. 2001. Available from: https://www.ogasun.ejgv.euskadi.eus/r51-catpub/es/k75aWebPublicacionesWar/k75aObtenerPublicacionDigitalServlet?R01HNoPortal=true&N_LIBR=051622&N_EDIC=0001&C_IDIOM=es&FORMATO=.pdf
  • 15. Roberts ME, Rahman NM, Maskell NA, Bibby AC, Blyth KG, Corcoran JP, et al. British Thoracic Society Guideline for pleural disease. Thorax. 2023;78(11):1143–56. doi: 10.1136/thorax-2023-220304
  • 16. Herrera Lara S, Fernández-Fabrellas E, Juan Samper G, Marco Buades J, Andreu Lapiedra R, Pinilla Moreno A, et al. Predicting Malignant and Paramalignant Pleural Effusions by Combining Clinical, Radiological and Pleural Fluid Analytical Parameters. Lung. 2017;195(5):653–60. doi: 10.1007/s00408-017-0032-3
  • 17. Valdés L, Ferreiro L, Cruz-Ferro E, González-Barcala FJ, Gude F, Ursúa MI, et al. Recent epidemiological trends in tuberculous pleural effusion in Galicia, Spain. Eur J Intern Med. 2012;23(8):727–32. doi: 10.1016/j.ejim.2012.06.014
  • 18. Garcia Zamalloa A, Taboada Gómez J, Arrospide Elgerresta A. Pleural tuberculosis in young people as an indicator of general tuberculosis rate. Eur Respir J. 2014;44:P2689.
  • 19.Vorster MJ, Allwood BW, Diacon AH, Koegelenberg CFN. Tuberculous pleural effusions: advances and controversies. J Thorac Dis. 2015;7(6):981–91. doi: 10.3978/j.issn.2072-1439.2015.02.18 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.Sahn SA, Huggins JT, San José ME, Álvarez-Dobaño JM, Valdés L. Can tuberculous pleural effusions be diagnosed by pleural fluid analysis alone? Int J Tuberc Lung Dis. 2013;17(6):787–93. doi: 10.5588/ijtld.12.0892 [DOI] [PubMed] [Google Scholar]
  • 21.Chan KKP, Lee YCG. Tuberculous pleuritis: clinical presentations and diagnostic challenges. Curr Opin Pulm Med. 2024;30(3):210–6. doi: 10.1097/MCP.0000000000001052 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Porcel JM. Advances in the diagnosis of tuberculous pleuritis. Ann Transl Med. 2016;4(15):282. doi: 10.21037/atm.2016.07.23 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Burgess LJ, Maritz FJ, Le Roux I, Taljaard JJ. Combined use of pleural adenosine deaminase with lymphocyte/neutrophil ratio. Increased specificity for the diagnosis of tuberculous pleuritis. Chest. 1996;109(2):414–9. doi: 10.1378/chest.109.2.414 [DOI] [PubMed] [Google Scholar]

Decision Letter 0

Guocan Yu

15 May 2025

PONE-D-25-09667 Machine learning for predicting the diagnosis of pleural tuberculosis: external validation and accuracy in two different settings. PLOS ONE

Dear Dr. Garcia-Zamalloa,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 29 2025 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols . Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols .

We look forward to receiving your revised manuscript.

Kind regards,

Guocan Yu

Academic Editor

PLOS ONE

Journal requirements: 

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf .

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, we expect all author-generated code to be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. In the online submission form, you indicated that [All the data are included in a database in our Department, and they cannot be shared publicly, but we have no problem to share all them for everyone who ask for it and who meet the criteria for access to confidential data.].

All PLOS journals now require all data underlying the findings described in their manuscript to be freely available to other researchers, either (1) in a public repository, (2) within the manuscript itself, or (3) uploaded as supplementary information.

This policy applies to all data except where public deposition would breach compliance with the protocol approved by your research ethics board. If your data cannot be made publicly available for ethical or legal reasons (e.g., public availability would compromise patient privacy), please explain your reasons on resubmission and your exemption request will be escalated for approval.

4. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Peer Review for Manuscript PONE-D-25-09667

Date: 9-April-2025

“Machine learning for predicting the diagnosis of pleural tuberculosis: external validation and accuracy in two different settings”

These are my comments of the peer review for the manuscript requested.

My general comments:

In general, the authors present a well-conducted external validation study. This research builds on the authors’ recent publication (2021) of a model developed in a prospective cohort, using adenosine deaminase (ADA) for pleural tuberculosis in a low tuberculosis (TB) prevalence setting. The training cohort showed promising results, providing a good rationale for an external validation study. The analysis is very good; however, I think there is room for improvement in the writing. In my perception, the structure and presentation of the article are still not smooth.

The strength of this external validation is that the developed pleural TB (TPE) model was tested in a completely different cohort with a different TB/TPE incidence. Although the test cohort showed slightly lower accuracy and predictive values than the training cohort, I think these real-life data are highly valuable, as they demonstrate the applicability of the machine learning TPE models in an external validation cohort.

In clinical practice, tuberculous pleural effusion (TPE) is difficult to diagnose, particularly in immunocompromised patients and/or resource-limited countries with a high TB burden. Hence, I raise a clinical question regarding the immune status of the study participants (training and test cohorts). Because HIV co-infection is highly prevalent among patients with TPE, do the authors have any data on HIV infection or immune testing (CD4 cell counts, etc.) in these cohorts?

In addition, a short justification of the sample size calculation for the external validation cohort would further enhance the validity of the study.

My specific comments:

1. Title: The title fully describes study aim and objectives.

2. Abstract: The abstract is well written. However, I suggest one minor amendment, as below:

Lines 45 to 47:

The authors report the sample size of the Test cohort (832 consecutive patients in Bajo Deba health district, 1996-2012) but not that of the Train cohort from the prospective cohort study (2013-2020). For consistency and transparency, the sample size of the Train cohort (how many patients?) should also be stated in this section.

3. Introduction:

There is room for improvement in the introduction section, as follows:

Line 67: “95% UI” should be written out in full, as readers may not be familiar with the abbreviation UI.

Lines 92 to 94: “The model is freely available as an app (at https://pleurapp.ispana.es/) to help other physicians or thoracic surgeons apply this approach when dealing with exudative and lymphocytic pleural effusions”

This information should not be placed in the introduction because it is not connected to the flow of the main argument. It could be moved to the discussion as appropriate.

Lines 99 to 100 in the introduction section:

“we compared the diagnostic accuracy of the ML procedure and the classical Bayesian analysis system for TPE in both different clinical scenarios (Bajo Deba 1996-2012 and Gipuzkoa 2013-2022)”.

The authors mention the two study cohorts to be modelled without having introduced them. Hence, I recommend a brief introduction (1-2 sentences) of these two cohorts in the preceding paragraphs.

4. Materials and Methods and Results

There is room for improvement in the Materials and Methods section regarding data presentation.

4.1 I understand that the authors aimed to place emphasis on the Test cohort (external validation) and therefore presented the Test study cohort as the first group and the Train study cohort as the second group. Intuitively, this style of data presentation causes readers (like me) some confusion and makes them reread and rethink. Therefore, I recommend that the authors follow the usual convention: first group = Train cohort, second group = external validation cohort.

This order of presentation should be consistent throughout the manuscript text and tables. In Tables 1 and 2, the authors first present the Test and then the Train cohort;

then, in Tables 3 and 4, the Train and then the Test cohort.

This switch in data presentation may confuse readers (as in my case), who need time to reread and rethink the study data.

Therefore, I highly recommend a consistent presentation style for the study data; I prefer presenting the training cohort data before the testing cohort data (from left to right), as is conventional.

4.2 In Lines 213 to 218, the authors do not need to describe the confusion matrices of all the machine learning models one by one, because readers can find all this information in Figures 2 and 4. Only the salient features of these data should be stated in the text.

4.3 Lines 237-239 and Lines 242-245

“The estimated PPV area is calculated as a function of the pre-test probability (prevalence) using the sensitivity and specificity of each classifier obtained in the training dataset (Gipuzkoa): (sensitivity * prevalence) / ((sensitivity * prevalence) + (1 - specificity) * (1 - prevalence)).”

“The real PPVs in Gipuzkoa and Bajo Deba are calculated as the true positives divided by the sum of the true positives and false positives, obtained in each dataset: TP / (TP + FP)”

The formulas should be moved to the Methods section, where they are more appropriate.
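As an illustration of the two formulas quoted in point 4.3, the following Python sketch computes the estimated PPV as a function of pre-test probability from a classifier's sensitivity and specificity, and the observed (real) PPV from confusion-matrix counts. The function names are hypothetical, and the sensitivity and specificity used in the sweep (0.88 and 0.98) are simply the best binary classifier figures reported in the abstract, chosen for illustration; they are not necessarily the training-set values the authors used.

```python
def estimated_ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' theorem: PPV from sensitivity, specificity and pre-test probability."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def observed_ppv(tp: int, fp: int) -> float:
    """Real PPV in a dataset: TP / (TP + FP)."""
    return tp / (tp + fp)


if __name__ == "__main__":
    # Illustrative operating point (assumed: the abstract's best binary classifier).
    sens, spec = 0.88, 0.98
    # Sweep pre-test probabilities, including the 4% and 12.7% quoted in the abstract.
    for prev in (0.04, 0.127, 0.20, 0.30):
        print(f"prevalence={prev:.3f}  estimated PPV={estimated_ppv(sens, spec, prev):.3f}")
```

Because specificity is fixed while prevalence varies, the sweep makes visible how strongly the same classifier's PPV depends on the setting, which is why the estimated and real PPVs are compared in both cohorts.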

5. Discussion:

The discussion is comprehensive and well written. In this study, the machine learning models outperformed Bayesian modelling, as shown in a different study setting with a different prevalence of TPE and malignancy.

Conclusion:

I think this is a great study, and the minor amendments suggested would make it more comprehensible for readers. I agree that this study is appropriate for publication.

Many thanks,

Best regards,

Reviewer #2: Overall well-written.

See the attached DOCX file for some suggestions on reorganization.

Some of the Results have been included in the Methods section.

The difference in TB incidence between the two groups is not very large.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes:  Nguyen Tat Thanh (MD, PhD)

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Attachment

Submitted filename: Peer Review-Manuscript PONE-D-25-09667.pdf

pone.0329668.s003.pdf (220KB, pdf)
Attachment

Submitted filename: PONE-D-25-09667_reviewer.docx

pone.0329668.s004.docx (435.3KB, docx)
PLoS One. 2025 Sep 5;20(9):e0329668. doi: 10.1371/journal.pone.0329668.r002

Author response to Decision Letter 1


4 Jun 2025

Reviewer #1: Peer Review for Manuscript PONE-D-25-09667

Date: 9-April-2025

“Machine learning for predicting the diagnosis of pleural tuberculosis: external validation and accuracy in two different settings”

- (REVIEWER): These are my comments of the peer review for the manuscript requested.

My general comments:

In general, the authors present a well-conducted external validation study. This research builds on the authors’ recent publication (2021) of a model developed in a prospective cohort, using adenosine deaminase (ADA) for pleural tuberculosis in a low tuberculosis (TB) prevalence setting. The training cohort showed promising results, providing a good rationale for an external validation study. The analysis is very good; however, I think there is room for improvement in the writing. In my perception, the structure and presentation of the article are still not smooth.

* (AUTHORS) Thank you very much for your general suggestion about the structure of the work. We have thoroughly followed your comments and, as a result, we think that we have improved the overall structure of the paper. Particularly, we have presented the training results first and then the testing results. Besides, we have improved the Material and Methods section,

- (R) The strength of this external validation is that the developed pleural TB (TPE) model was tested in a completely different cohort with a different TB/TPE incidence. Although the test cohort showed slightly lower accuracy and predictive values than the training cohort, I think these real-life data are highly valuable, as they demonstrate the applicability of the machine learning TPE models in an external validation cohort.

In clinical practice, tuberculous pleural effusion (TPE) is difficult to diagnose, particularly in immunocompromised patients and/or resource-limited countries with a high TB burden. Hence, I raise a clinical question regarding the immune status of the study participants (training and test cohorts). Because HIV co-infection is highly prevalent among patients with TPE, do the authors have any data on HIV infection or immune testing (CD4 cell counts, etc.) in these cohorts?

* (A) Thank you for your question. Indeed, tuberculosis is more prevalent among patients coinfected with HIV, but in this regard we note that:

- Screening for human immunodeficiency virus (HIV) was performed in all patients diagnosed with any form of tuberculosis in the Gipuzkoa Region, following the guidelines of the Tuberculosis Control Program implemented in the Basque Country since 2003.

- There were no cases of TPE with HIV coinfection in our 2013–2022 series from the Gipuzkoa Region (the Training Cohort). Only three patients with HIV infection developed pleural effusion during this period, and it was malignant in all three cases, as we reported in our prospective project (1). The absence of TPE among HIV-coinfected patients was probably due to widespread antiretroviral treatment.

- Unfortunately, in the Testing Cohort (Bajo Deba Health District, 1996–2012), patients coinfected with HIV were attended and followed up at the Regional Donostia University Hospital. This cohort was retrospective, and we only have the information stored at the Bajo Deba Health District Hospital.

- Nevertheless, as pointed out in our first report covering 1998 to 2008 (2), ADA has been shown to be equally reliable in HIV-positive patients with TPE, even in those with low CD4 T-cell counts (3,4), and even in renal transplant recipients (5).

o 1) Garcia-Zamalloa A, Vicente D, Arnay R, Arrospide A, Taboada J, Castilla-Rodriguez I, et al. (2021) Diagnostic accuracy of adenosine deaminase for pleural tuberculosis in a low prevalence setting: A machine learning approach within a 7-year prospective multi-center study. PLoS ONE 16(11): e0259203. https://doi.org/10.1371/journal.pone.0259203

o 2) Garcia-Zamalloa A, et al. (2012) Diagnostic accuracy of adenosine deaminase and lymphocyte proportion in pleural fluid for tuberculous pleurisy in different prevalence scenarios. PLoS ONE 7 (6): e38729.

o 3) Riantawan P, et al. (1999) Diagnostic value of pleural fluid adenosine deaminase in tuberculous pleuritis with reference to HIV coinfection and a Bayesian analysis. Chest 116: 97-103.

o 4) Baba K, et al. (2008) Adenosine deaminase activity is a sensitive marker for the diagnosis of tuberculous pleuritis in patients with low CD4 counts. PLoS ONE 3 (7): e2788.

o 5) Krenke R, et al. (2010) Use of pleural fluid levels of adenosine deaminase and interferon gamma in the diagnosis of tuberculous pleuritis. Curr Opin Pulm Med 16: 367-375.

- (R) In addition, a short justification of the sample size calculation for the external validation cohort would further enhance the validity of the study.

* (A) Thank you very much for the suggestion. We have included this information in the manuscript.

In addition, because the Testing Cohort (Bajo Deba 1996-2012) was retrospective, our intention was to include as many pleural effusions as possible.

Nevertheless, as stated in our 2021 report (1), the minimum sample size was calculated to be 200 patients.

o 1) Garcia-Zamalloa A, Vicente D, Arnay R, Arrospide A, Taboada J, Castilla-Rodriguez I, et al. (2021) Diagnostic accuracy of adenosine deaminase for pleural tuberculosis in a low prevalence setting: A machine learning approach within a 7-year prospective multi-center study. PLoS ONE 16(11): e0259203. https://doi.org/10.1371/journal.pone.0259203

- (R) My specific comments:

1. Title: The title fully describes study aim and objectives.

* (A) We finally decided to modify the title following Reviewer 2’s suggestion: “Machine learning for predicting the diagnosis of tuberculous versus malignant pleural effusion: external validation and accuracy in two different settings”. Thank you.

2. Abstract: The abstract is well written. However, I suggest one minor amendment, as below:

Lines 45 to 47:

The authors report the sample size of the Test cohort (832 consecutive patients in Bajo Deba health district, 1996-2012) but not that of the Train cohort from the prospective cohort study (2013-2020). For consistency and transparency, the sample size of the Train cohort (how many patients?) should also be stated in this section.

* (A) Thank you for the suggestion, which we find very reasonable. We have included the number of pleural effusions in the Training cohort, collected from 2013 to 2020 and subsequently extended to 2022.

3. Introduction:

There is room for improvement in the introduction section, as follows:

Line 67: “95% UI” should be written out in full, as readers may not be familiar with the abbreviation UI.

* (A) Thank you for the suggestion. We have modified it.

- (R) Lines 92 to 94: “The model is freely available as an app (at https://pleurapp.ispana.es/) to help other physicians or thoracic surgeons apply this approach when dealing with exudative and lymphocytic pleural effusions”

This information should not be placed in the introduction because it is not connected to the flow of the main argument. It could be moved to the discussion as appropriate.

* (A) Thank you very much for your suggestion. We have moved it to the end of the Discussion, as its final paragraph.

- (R) Lines 99 to 100 in the introduction section:

“we compared the diagnostic accuracy of the ML procedure and the classical Bayesian analysis system for TPE in both different clinical scenarios (Bajo Deba 1996-2012 and Gipuzkoa 2013-2022)”.

The authors mention the two study cohorts to be modelled without having introduced them. Hence, I recommend a brief introduction (1-2 sentences) of these two cohorts in the preceding paragraphs.

* (A) Thank you for the suggestion. We have included a brief description of the pleural effusions included in the two cohorts and of the two prevalence settings. Following Reviewer 2’s amendments, we also changed the term “higher prevalence setting” to “different prevalence setting”.

4. Materials and Methods and Results

There is room for improvement in the Materials and Methods section regarding data presentation.

4.1 I understand that the authors aimed to place emphasis on the Test cohort (external validation) and therefore presented the Test study cohort as the first group and the Train study cohort as the second group. Intuitively, this style of data presentation causes readers (like me) some confusion and makes them reread and rethink. Therefore, I recommend that the authors follow the usual convention: first group = Train cohort, second group = external validation cohort.

This order of presentation should be consistent throughout the manuscript text and tables. In Tables 1 and 2, the authors first present the Test and then the Train cohort;

then, in Tables 3 and 4, the Train and then the Test cohort.

This switch in data presentation may confuse readers (as in my case), who need time to reread and rethink the study data.

Therefore, I highly recommend a consistent presentation style for the study data; I prefer presenting the training cohort data before the testing cohort data (from left to right), as is conventional.

* (A) Thank you for the recommendation. We have restructured the text and tables to introduce Gipuzkoa (Training) first and then Bajo Deba (Testing).

4.2 In Lines 213 to 218, the authors do not need to describe the confusion matrices of all the machine learning models one by one, because readers can find all this information in Figures 2 and 4. Only the salient features of these data should be stated in the text.

* (A) Thank you for the suggestion. We have added a clarification at the beginning of the paragraph to emphasize that it refers to the TP, FP, TN and FN values of a classification method that simply applies the ADA 40 + LP 50 criterion (ADA > 40 U/l and lymphocyte percentage ≥ 50%). The aim of this paragraph is to provide a comparison with the results obtained by the ML models shown in Figures 2 and 4.
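To make this comparison concrete, the sketch below applies the ADA 40 + LP 50 criterion as a simple rule-based classifier and tallies TP, FP, TN and FN against the final diagnosis. It assumes the thresholds stated in the abstract (ADA > 40 U/l and lymphocyte percentage ≥ 50%); the `Effusion` record type, its field names and the toy records are hypothetical and do not come from either cohort.

```python
from dataclasses import dataclass


@dataclass
class Effusion:
    ada_u_per_l: float      # pleural fluid adenosine deaminase (U/l)
    lymphocyte_pct: float   # lymphocytes as % of pleural fluid white cells
    is_tuberculous: bool    # final (reference-standard) diagnosis


def ada40_lp50(e: Effusion) -> bool:
    """Rule-based prediction: positive if ADA > 40 U/l and lymphocytes >= 50%."""
    return e.ada_u_per_l > 40 and e.lymphocyte_pct >= 50


def confusion_counts(effusions):
    """Return (TP, FP, TN, FN) of the ADA 40 + LP 50 rule against the final diagnosis."""
    tp = fp = tn = fn = 0
    for e in effusions:
        predicted = ada40_lp50(e)
        if predicted and e.is_tuberculous:
            tp += 1
        elif predicted:
            fp += 1
        elif not e.is_tuberculous:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


if __name__ == "__main__":
    # Hypothetical toy records, for illustration only.
    sample = [
        Effusion(65.0, 90.0, True),    # rule positive, tuberculous: TP
        Effusion(48.0, 80.0, False),   # rule positive, not tuberculous: FP
        Effusion(18.0, 70.0, False),   # rule negative, not tuberculous: TN
        Effusion(38.0, 95.0, True),    # rule negative, tuberculous: FN
    ]
    print(confusion_counts(sample))
```

The same four counts are what an ML model's confusion matrix reports, so the rule and the classifiers can be compared cell by cell on the same effusions.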

4.3 Lines 237-239 and Lines 242-245

“The estimated PPV area is calculated as a function of the pre-test probability (prevalence) using the sensitivity and specificity of each classifier obtained in the training dataset (Gipuzkoa): (sensitivity * prevalence) / ((sensitivity * prevalence) + (1 - specificity) * (1 - prevalence)).”

“The real PPVs in Gipuzkoa and Bajo Deba are calculated as the true positives divided by the sum of the true positives and false positives, obtained in each dataset: TP / (TP + FP)”

The formulas should be moved to the Methods section, where they are more appropriate.

* (A) Following your recommendation, we have modified the Materials and Methods section to introduce the comparative analysis of ML and Bayesian analysis for estimating positive and negative predictive values as a function of pre-test probability. We have placed the mentioned formulas in this section.
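The response mentions negative as well as positive predictive values, but only the PPV formula is quoted. The sketch below assumes the standard Bayes expression for NPV, NPV = specificity·(1 − prevalence) / (specificity·(1 − prevalence) + (1 − sensitivity)·prevalence), which is the counterpart of the quoted PPV formula; the manuscript's exact expression is not reproduced here. The function names and the illustrative operating point (0.88 / 0.98, the abstract's best binary classifier) are assumptions.

```python
def estimated_npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' theorem: NPV from sensitivity, specificity and pre-test probability."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


def observed_npv(tn: int, fn: int) -> float:
    """Real NPV in a dataset: TN / (TN + FN)."""
    return tn / (tn + fn)


def npv_curve(sensitivity: float, specificity: float, steps: int = 100):
    """NPV on an evenly spaced grid of pre-test probabilities strictly between 0 and 1."""
    grid = [i / steps for i in range(1, steps)]
    return [(p, estimated_npv(sensitivity, specificity, p)) for p in grid]


if __name__ == "__main__":
    # Illustrative operating point (assumed: the abstract's best binary classifier).
    for prevalence, npv in npv_curve(0.88, 0.98, steps=10):
        print(f"pre-test probability={prevalence:.1f}  NPV={npv:.3f}")
```

Unlike PPV, NPV falls as pre-test probability rises, so a classifier that rules out TPE reliably in a low-prevalence district may do so less reliably where TPE is more common.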

5. Discussion:

The discussion is comprehensive and well written. In this study, the machine learning models outperformed Bayesian modelling, as shown in a different study setting with a different prevalence of TPE and malignancy.

Conclusion:

I think this is a great study, and the minor amendments suggested would make it more comprehensible for readers. I agree that this study is appropriate for publication.

* (A) Many thanks,

Best regards,

- Reviewer #2: Overall well-written.

See the attached DOCX file for some suggestions on reorganization.

Some of the Results have been included in the Methods section.

* (A) Thank you for the amendment. It is true that the Materials and Methods section presents a comparative analysis of our data. However, the aim of this analysis is to show that there are no statistically significant differences between the two data sets. We prefer to regard this analysis as part of the methodology we followed to validate our Materials (Data) before training and testing the ML models with these data, rather than treat it as a result in itself. In addition, following some suggestions of Reviewer 1, we have expanded the Materials and Methods section with the methodology followed to train and test the ML models and the comparative analysis of ML and Bayesian analysis for estimating positive and negative predictive values as a function of pre-test probability.

- (R) The difference in TB incidence between the two groups is not very large.

* (A) Thank you for the suggestion. We replaced the term “higher prevalence” with “different prevalence”.

We also changed the title following Reviewer 2’s suggestion and we made some corrections to keep a consistent order in the presentation of the Training and Testing results (in this order).

Attachment

Submitted filename: 20250531_response_to_reviewers.docx

pone.0329668.s006.docx (24KB, docx)

Decision Letter 1

Guocan Yu

21 Jul 2025

Machine learning for predicting the diagnosis of tuberculous versus malignant pleural effusion: external validation and accuracy in two different settings.

PONE-D-25-09667R1

Dear Dr. Alberto Garcia-Zamalloa,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice will be generated when your article is formally accepted. Please note, if your institution has a publishing partnership with PLOS and your article meets the relevant criteria, all or part of your publication costs will be covered. Please make sure your user information is up-to-date by logging into Editorial Manager® and clicking the ‘Update My Information' link at the top of the page. If you have any questions relating to publication charges, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Guocan Yu

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Peer Review for Manuscript PONE-D-25-09667R1

Date: 10-June-2025

“Machine learning for predicting the diagnosis of tuberculous versus malignant pleural effusion: external validation and accuracy in two different settings”

The revised manuscript is much improved. All the suggested points have been resolved. I agree that the paper should be published.

My general comments:

In general, the authors presented a well-conducted external validation study. This research builds on the authors’ recent publication (2021) of a model developed in a prospective cohort, using adenosine deaminase (ADA) for pleural tuberculosis in a low tuberculosis (TB) prevalence setting. The training cohort showed promising results, providing a good rationale for an external validation study. The analysis is very good; however, I think there is room for improvement in the writing. In my perception, the structure and presentation of the article are still not smooth.

The plus point of this external validation is that the developed tuberculous pleural effusion (TPE) model was tested in a completely different cohort with a different TB/TPE incidence. In this study, although the test cohort showed slightly lower accuracy and predictive values than the Train cohort, I think these real-life data are highly valuable, demonstrating the applicability of the machine learning TPE models. This highlights the value of testing on real-life data in the external validation cohort.

In clinical practice, tuberculous pleural effusion (TPE) is difficult to diagnose, particularly in immunocompromised patients and/or in resource-limited countries with a high TB burden. Hence, I raise a clinical question regarding the immune status of the study participants (training and test cohorts). Because the prevalence of HIV co-infection is highly significant among patients with TPE, do the authors have any data on HIV infection or immune testing (CD4 cell counts, etc.) in these cohorts?

In addition, a short justification of the sample size calculation for the external validation cohort would enhance the validity of the study as well.

My review: All my comments were appropriately answered.

My specific comments:

1. Title: The title fully describes study aim and objectives.

My review: The updated title is accepted.

2. Abstract: The abstract is well written. However, I think one minor point needs amendment, as follows:

Lines 45 to 47:

The authors presented the sample size of the Test cohort (832 consecutive patients in Bajo Deba health district, 1996-2012) but did not give the sample size of the Train cohort from the prospective cohort study (2013-2020). For consistency and transparency, the sample size of the Train cohort (how many patients?) should also be stated in this section.

My review: These comments have been amended as appropriate.

3. Introduction:

There is room for improvement in the introduction section, as follows:

Line 67: 95% UI needs to be written out in full, as readers may not be familiar with the abbreviation UI.

My review: My comment was resolved as appropriate.

Lines 92 to 94: “The model is freely available as an app (at https://pleurapp.ispana.es/) to help other physicians or thoracic surgeons apply this approach when dealing with exudative and lymphocytic pleural effusions”

→ This information should not be placed in the introduction because it is not connected to the flow of the main idea discussed. It could be relocated to the Discussion as appropriate.

My review: My comment was resolved as appropriate.

Lines 99 to 100 in the introduction section:

“we compared the diagnostic accuracy of the ML procedure and the classical Bayesian analysis system for TPE in both different clinical scenarios (Bajo Deba 1996-2012 and Gipuzkoa 2013-2022)”.

→ The authors mention the two study cohorts to be modelled without having introduced them beforehand. Hence, I recommend a brief introduction (1-2 sentences) of these two cohorts in the preceding paragraphs.

My review: My comment was resolved as appropriate.

4. Materials and Methods and Results

There is room for improvement in the Materials and Methods section regarding data presentation.

4.1 I understand that the authors aimed to emphasize the Test cohort (external validation) and so deliberately presented the Test cohort as the first group and the Train cohort as the second group. Intuitively, this style of data presentation causes readers (like me) some confusion, requiring us to reread and rethink. Therefore, I recommend that the authors present the data in the conventional order: first group = Train cohort, and second group = external validation cohort.

This order of presentation should be consistent throughout the manuscript text and Tables. In Tables 1 and 2, the authors first present the Test cohort and then the Train cohort;

then in Tables 3 and 4, the Train cohort and then the Test cohort.

→ This switch in data presentation will confuse readers (e.g., in my case) and make them spend time rereading and rethinking the study data.

Therefore, I highly recommend a consistent presentation style for the study data; I prefer presenting the training cohort data before the testing cohort data (from left to right), as is conventional.

My review: My comment was resolved as appropriate.

4.2 In Lines 213 to 218, the authors do not need to describe the confusion matrices of all the machine learning models in detail one by one, because readers can find all this information in Figures 2 and 4. Only the salient features of these data should be stated in the text.

My review: My comment was resolved as appropriate.

4.3 Lines 237-239 and Lines 242-245

“The estimated PPV area is calculated as a function of the pre-test probability (prevalence) using the sensitivity and specificity of each classifier obtained in the training dataset (Gipuzkoa):

(sensitivity * prevalence) / ((sensitivity * prevalence) + (1 - specificity) * (1 - prevalence)).”

“The real PPVs in Gipuzkoa and Bajo Deba are calculated as the true positives divided by the sum of the true positives and false positives, obtained in each dataset: TP / (TP + FP)”

→ The formulas should be relocated to the Methods section, where they are more appropriate.

My review: My comment was resolved as appropriate.
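The two formulas quoted in point 4.3 can be sketched together in Python. Sensitivity and specificity are taken from a training confusion matrix, the estimated PPV is then traced across pre-test probabilities, and the observed PPV is computed directly from the counts in a second dataset. This minimal illustration assumes a binary TB/non-TB classifier. The `ConfusionCounts` type, the function names and all the counts below are hypothetical and are not results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with tuberculous effusion as the positive class."""
    tp: int  # tuberculous, predicted tuberculous
    fp: int  # non-tuberculous, predicted tuberculous
    tn: int  # non-tuberculous, predicted non-tuberculous
    fn: int  # tuberculous, predicted non-tuberculous


def sensitivity_specificity(counts: ConfusionCounts) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return counts.tp / (counts.tp + counts.fn), counts.tn / (counts.tn + counts.fp)


def estimated_ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """(sens * p) / ((sens * p) + (1 - spec) * (1 - p)), i.e. Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    denominator = true_pos + false_pos
    if denominator == 0.0:
        raise ValueError("PPV is undefined: no positive test results are expected")
    return true_pos / denominator


def observed_ppv(counts: ConfusionCounts) -> float:
    """Real PPV in a dataset: TP / (TP + FP)."""
    predicted_positive = counts.tp + counts.fp
    if predicted_positive == 0:
        raise ValueError("PPV is undefined when no case is predicted positive")
    return counts.tp / predicted_positive


if __name__ == "__main__":
    # Hypothetical training confusion matrix (NOT study data).
    training = ConfusionCounts(tp=45, fp=10, tn=190, fn=5)
    sens, spec = sensitivity_specificity(training)

    # Estimated PPV curve over a grid of pre-test probabilities.
    for prevalence in (0.01, 0.05, 0.10, 0.20, 0.30):
        ppv = estimated_ppv(sens, spec, prevalence)
        print(f"prevalence {prevalence:.2f}: estimated PPV {ppv:.3f}")

    # Hypothetical external dataset (NOT study data): observed PPV to set against the curve.
    external = ConfusionCounts(tp=80, fp=8, tn=300, fn=12)
    print(f"observed PPV in external set: {observed_ppv(external):.3f}")
```

Each dataset's observed PPV can be marked at its own prevalence on the estimated curve. A gap between the point and the curve indicates that sensitivity and/or specificity in that dataset differ from the training estimates.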

5. Discussion:

The discussion is comprehensive and well written. In this study, the machine learning models outperformed Bayesian modelling, as shown in a different study setting with a different prevalence of TPE and malignancy.

My review: My comment was resolved as appropriate.

Conclusion:

→ I think this is a great study. The revised manuscript resolves all my comments. I agree that this study is appropriate for publication.

Many thanks,

Best regards,

Reviewer #3: The authors have addressed the revisions. All required questions have been answered, and all responses meet formatting specifications.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Nguyen Tat Thanh, MD PhD

Reviewer #3: Yes: Harun Agca

**********

Acceptance letter

Guocan Yu

PONE-D-25-09667R1

PLOS ONE

Dear Dr. Garcia-Zamalloa,

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now being handed over to our production team.

At this stage, our production department will prepare your paper for publication. This includes ensuring the following:

* All references, tables, and figures are properly cited

* All relevant supporting information is included in the manuscript submission

* There are no issues that prevent the paper from being properly typeset

You will receive further instructions from the production team, including instructions on how to review your proof when it is ready. Please keep in mind that we are working through a large volume of accepted articles, so please give us a few days to review your paper and let you know the next and final steps.

Lastly, if your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

You will receive an invoice from PLOS for your publication fee after your manuscript has reached the completed accept phase. If you receive an email requesting payment before acceptance or for any other service, this may be a phishing scheme. Learn how to identify phishing emails and protect your accounts at https://explore.plos.org/phishing.

If we can help with anything else, please email us at customercare@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Guocan Yu

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Table. Number of cases by diagnosis obtained in the Training and Testing groups.

    (DOCX)

    pone.0329668.s001.docx (15.8KB, docx)
    S2 Table. Diagnostic criteria met by tuberculous and malignant cases (listing each case just once under the highest quality criterion among all the criteria met).

    (DOCX)

    pone.0329668.s002.docx (13.8KB, docx)
    Attachment

    Submitted filename: Peer Review-Manuscript PONE-D-25-09667.pdf

    pone.0329668.s003.pdf (220KB, pdf)
    Attachment

    Submitted filename: PONE-D-25-09667_reviewer.docx

    pone.0329668.s004.docx (435.3KB, docx)
    Attachment

    Submitted filename: 20250531_response_to_reviewers.docx

    pone.0329668.s006.docx (24KB, docx)

    Data Availability Statement

    All the data included in the study are fully available and uploaded in the Zenodo public repository: https://doi.org/10.5281/zenodo.15576280. Additionally, we have deposited our laboratory protocols in protocols.io, in order to enhance the reproducibility of our results: dx.doi.org/10.17504/protocols.io.4r3l2pw53g1y/v1.


    Articles from PLOS One are provided here courtesy of PLOS
