ABSTRACT
Improved biomarkers for predicting progression to active tuberculosis (TB) are urgently needed, especially in people with HIV, who are at elevated risk. We used high-throughput plasma proteomics and machine learning to identify signatures associated with TB progression in this population. From the Swiss HIV Cohort Study, we analyzed plasma samples collected at least 6 months before TB diagnosis from 91 participants who later developed TB. We selected 293 controls matched for demographic and clinical parameters who remained TB-free to achieve a risk score specific to active TB. In total, 583 samples were analyzed, with 613–1,283 proteins quantified per sample. A random forest classifier predicted a significantly higher median probability of TB progression for cases (33%) than for controls (16%; P < 0.001). In this matched population, the score achieved an area under the receiver-operating characteristic curve of 0.77, an area under the precision-recall curve (AUPRC) of 0.60 (as compared to an expected AUPRC of 0.29), as well as a specificity of 87.3% and a sensitivity of 58.6% using the optimal threshold of 0.311. The plasma proteome of individuals who progressed to active TB showed a distinct shift toward systemic inflammation, B cell activation, and immunoglobulin production. Independent of progression to active TB, the proteome score correlated with broader indicators of immune suppression, including lower CD4 counts and unsuppressed HIV RNA. This suggests that integrating proteomic and clinical data could enhance the overall predictive power of the score.
IMPORTANCE
We still lack reliable tools to predict who will develop tuberculosis (TB) among people with HIV. Moreover, the underlying biological events driving progression remain poorly understood. Our study reveals early immune changes that include unexpected alterations in B cell activation and antibody responses. These findings suggest that humoral immunity may play a more important role in TB pathogenesis than previously recognized and offer promising new directions for biomarker discovery and targeted prevention.
KEYWORDS: HIV/TB co-infection, predicting tuberculosis, machine learning, proteomics
INTRODUCTION
Active tuberculosis (TB) remains a leading cause of death from an infectious agent, with around 1.25 million deaths globally in 2023 (1). While it is estimated that around 28% of the global population has been exposed to Mycobacterium tuberculosis (MTB), fewer than 10% of infected individuals progress to active TB (2). Predicting progression to active TB is thus critical toward identifying individuals at highest risk and allocating preventive measures effectively. A key milestone in achieving this goal, also integrated into the World Health Organization’s strategy to end the global TB epidemic, involves developing reliable biomarkers to diagnose and predict active TB (3).
People with human immunodeficiency virus (HIV) (PWH) are disproportionately affected by the TB epidemic (4). HIV accelerates the progression to active TB, and predicting active TB in PWH is particularly challenging. Current immunological tests for MTB infection, such as the tuberculin skin test (TST) and interferon-gamma release assays (IGRA), depend on a functional immune system and specifically rely on CD4+ T cells. Since HIV-1 targets CD4+ T cells, leading to their depletion and dysfunction, these tests often perform poorly in PWH. Even with antiretroviral therapy (ART) and CD4+ T cell recovery, the MTB-specific CD4+ T cell repertoire remains compromised in most individuals (5, 6). Since the risk for the emergence of active TB is highest within the first 2 years following MTB exposure, developing predictive tools that identify progression toward active TB within the critical first 2-to-4-year window would enable more targeted preventive strategies (7, 8).
This project utilizes high-throughput quantitative proteomics analysis of samples collected in the Swiss HIV Cohort Study (SHCS). Samples from PWH who developed active TB were compared with matched controls (i.e., PWH who did not develop active TB). Using machine learning techniques, we developed a risk score for active TB progression and analyzed its association with other participant characteristics. The workflow for generating and validating the protein-based risk score is visualized in Fig. 1.
Fig 1.
Representation of the workflow toward generating a protein risk score for developing active tuberculosis. Created in BioRender. Kusejko, K. (2025) (https://BioRender.com/ske23uw).
MATERIALS AND METHODS
Swiss HIV Cohort Study (SHCS)
The SHCS launched in 1988 is a multicentric cohort study enrolling adult PWH in Switzerland (9). In biannual follow-up visits, in-depth clinical, demographic, and laboratory information are collected. In addition, plasma and peripheral blood mononuclear cell (PBMC) samples are collected routinely for each participant on a yearly basis. According to standard care guidelines, a tuberculosis test (either TST or IGRA) is performed once at the SHCS enrollment for every participant. The SHCS was approved by the ethics committees of the participating institutions (BASEC-Nr. 2023-02080), and all participants provided written informed consent.
Definitions
Active tuberculosis is recorded as follows: (i) definitive pulmonary tuberculosis: confirmed by direct detection (i.e., PCR approaches or culture of pulmonary sample); (ii) presumptive pulmonary tuberculosis: by detection of acid-fast bacilli in sputum without confirmation by culture or PCR and response to specific treatment, or suggestive infiltrate on chest X-ray and anamnestic exposure to TB and response to specific treatment; (iii) definitive extrapulmonary tuberculosis: by culture of non-pulmonary sample; and (iv) presumptive extrapulmonary tuberculosis: by the detection of acid-fast bacilli in stool, blood, bodily fluids, or tissue or through histological analysis of lymph nodes, where the mycobacterial species cannot be identified by culture. This diagnosis is further supported by the presence of a concurrent confirmed diagnosis of pulmonary tuberculosis or by a clinical response to standard anti-tuberculosis treatment (8–10).
Selection of the study population
All SHCS participants with a diagnosis of active TB and available plasma samples between 6 months and 4 years before diagnosis were included in this study. A control population (i.e., SHCS participants without diagnosis of active TB and without reported preventive treatment against TB) (rifampicin, rifabutin, isoniazid) was matched with a ratio of 1:3 based on a weighted algorithm, including body mass index (BMI) at sample date, age, gender, ethnicity, CD4 cell counts, HIV transmission group (men who have sex with men [MSM], heterosexual, other), and HIV RNA at the sample date. The summed weights concerning these six characteristics were based on a previous analysis in the same population (8). To exclude a diagnosis of active TB for the control population, we required a follow-up of at least 5 years after the respective sample date. Furthermore, we required that either both the case and the selected control have another opportunistic infection within 3 months of the sample date, or both have none. For each case sample, up to 3 control samples were selected. In order to balance potential MTB infection within the control population, around half of the controls were chosen having a positive IGRA or TST result (control MTB+), with the other half having negative IGRA or TST results (control MTB−).
Sample preparation
The samples were ordered from the decentralized SHCS biobank and centrally processed at the University Hospital Zurich within 1 week by the same person using the PreOmics iST Kit in the 96-well format (11). The processing involved the addition of 4 µL of plasma and 100 µL of lysis buffer, followed by loading 50 µL onto the cartridge. On each batch, standardization samples were added as quality control to potentially detect plate differences consisting of a mixture of 2–5 µL of each included sample. Case and control samples were randomly assigned across batches (Fig. S1).
Proteomics measurement
Quantitative data were acquired in dia-PASEF mode (data independent acquisition-parallel accumulation serial fragmentation) on a timsTOF Pro mass spectrometer equipped with the nanoElute Chromatography System (Bruker) (11, 12). This method uses trapped ion mobility separations to increase sensitivity and resolving power combined with data-independent acquisition to ensure robustness and data completeness. All analyses were performed using ~200 ng purified peptides. More details on the workflow can be found in the supplemental methods. To monitor the variation between sample batches and workflow steps, we employed two types of quality control (QC) samples: the first QC set consisted of a pooled sample (full process QC) included twice on every plate. The second QC set comprised a random mixture of 100 samples spiked with iRT peptides analyzed 78 times throughout the complete data acquisition process to assess instrument performance (measurement QC). These raw mass spectrometry (MS) data were monitored using QuiC software (Biognosys, Version 4) to ensure reproducibility of the QC indices. In addition, to assess potential issues with sample handling in the preparation phase, we calculated 3 quality measures suggested by Geyer et al.: the ratio of platelet and erythrocyte contamination as well as sample coagulation (13).
Proteomics data analysis
Spectronaut (version 18.1) was used for comparative analysis to transform diaPASEF raw data into precursor and fragment identifications by deconvolution-based analysis. The spectral library was generated using all raw data and 14 IM-GPF data using the default settings and used for library-based search analysis. All data were searched against the human proteome sequence database downloaded from UniProt on 7 January 2022, comprising 20,387 entries, using trypsin as the digestion enzyme. Cysteine carbamidomethylation was set as a fixed modification, and methionine oxidation and excision at the N terminus were selected as variable modifications. A maximum of two missed cleavages and up to five variable modifications were allowed. The FDR cutoff was set to 1% at PSM, peptide, and protein levels. The cutoffs for precursor q-value, precursor peptide, and protein q-value (experiment and run) were all set to 0.01 (14). Protein inference used the IDPicker algorithm using peak area for quantification. Imputation and normalization were performed using default global imputing and cross-run normalization methods, respectively (15). Log10 transformation was applied to the data matrix containing quantity information of 1,412 different proteins for each sample.
Statistical analysis
All analyses were carried out using R (version 4.3.1). The main steps were the following: (i) training and validation set: we randomly split participants with active TB (cases) and their matched controls into 1/3 validation and 2/3 training cases, with the corresponding matched controls comprising the validation and training data set; (ii) selection of candidate proteins: within the training set, potential candidate proteins for the development of a prognostic TB score were selected using a linear mixed-effects model (R package lme4), including a time trend concerning time to TB diagnosis; all proteins with a P-value of less than 0.03, unadjusted for multiple testing at this pre-selection step, either for the comparison case versus control or the interaction with time were selected as potential candidate proteins; with this threshold, we selected roughly 5% of the full library as potential candidates; (iii) development of a prognostic score: a random forest algorithm (R package caret) with fivefold cross-validation (i.e., further division of the training set), Cohen’s kappa as accuracy metric, and weighing of the cases with a ratio of 1:3 (reflecting the distribution of cases and controls) was applied using the pre-selected protein candidates; here, only one sample per patient was used, namely, the sample closest to the active TB diagnosis for cases; and (iv) validation of the prognostic score: the score was evaluated using the validation set to assess the final score performance; we used the area under the receiver operating characteristic (ROC) curve (AUC) as well as the area under the precision-recall curve (AUPRC) to assess precision in this imbalanced data set (ratio case versus control was around 1:3); in addition, the model performance stratified by MTB+ and MTB− controls (using T tests to evaluate differences in the score) was assessed, as well as the correlation of the timing of the sample and the score by using linear regression.
Association of the score and clinical characteristics
The association of the score with the following demographic and clinical characteristics was assessed within the validation set: age, sex, ethnicity (white, black, other), body mass index, MTB status, diagnosis of additional opportunistic infections, and several laboratory parameters (HIV RNA, creatinine, leukocyte count, triglycerides, CD3, CD4, CD8 cell count, HDL, cholesterol, glucose).
Comparison with existing scores
To understand the performance of the newly generated risk score compared to that of other published diagnostic and prognostic risk scores for active TB, we searched for published scores where all proteins were available in our 1,412-protein library. For each identified published score, we first re-trained a random forest model with restriction to proteins included in the published risk score using the same training set as before and evaluated the score on our validation set (i.e., a validation set not included for the generation of the model). We repeated this procedure for all identified published risk scores separately. We compared the performance of our developed score and the published scores using the AUC-ROC, as this was the metric usually assessed in other publications.
RESULTS
Patient characteristics
We identified 91 participants with a diagnosis of active TB, with a total of 178 available samples (68 in the range of 6–12 months, 62 in the range of 1–2 years, and 48 in the range of 2–4 years before active TB diagnosis). The control population comprised 293 SHCS participants with a total of 405 samples (Fig. S2). Most of the cases were male (58.2%) with a median age at diagnosis of 38 (interquartile range [IQR] = 31–43.5). Around half (46.1%) had a suppressed HIV RNA viral load (<400 copies/mL) at the sample date closest to TB diagnosis (Table 1).
TABLE 1.
Basic characteristics of the included participants stratified by cases and controls
| Parameter | Cases | Controls |
|---|---|---|
| Total no. of patients | 91 | 293 |
| Gender | ||
| Male | 53 (58.2%) | 169 (57.7%) |
| Female | 38 (41.8%) | 124 (42.3%) |
| Ethnicity | ||
| White | 45 (49.5%) | 183 (62.5%) |
| Black | 34 (37.4%) | 86 (29.4%) |
| Other | 12 (13.2%) | 24 (8.2%) |
| Risk group | ||
| MSM | 19 (20.9%) | 73 (24.9%) |
| HET | 49 (53.8%) | 147 (50.2%) |
| IDU | 20 (22%) | 59 (20.1%) |
| Other | 3 (3.3%) | 14 (4.8%) |
| Age at sample date: median [IQR] | 38 (31–43.5) | 41 (36–44) |
| CD4 at sample date | ||
| >500 | 22 (24.2%) | 128 (43.7%) |
| 350–500 | 25 (27.5%) | 69 (23.5%) |
| 200–350 | 19 (22%) | 54 (19.1%) |
| <200 | 25 (27.5%) | 42 (14.3%) |
| RNA at sample date | ||
| Suppressed (<400 copies/mL) | 42 (46.2%) | 179 (61.1%) |
| >400 copies/mL | 49 (53.8%) | 114 (38.9%) |
| BMI | ||
| <18.5 | 9 (9.9%) | 21 (7.2%) |
| 18.5–25 | 59 (64.8%) | 187 (63.8%) |
| >25 | 21 (23.1%) | 85 (29%) |
| Total no. of samples | 178 | 405 |
| Time period | ||
| 6–12 months | 68 (38.2%) | 165 (40.7%) |
| 1–2 years | 62 (34.8%) | 155 (38.3%) |
| >2 years | 48 (27%) | 85 (21%) |
| Calendar year of samples: median [IQR] | 2005 (2002–2009) | 2007 (2004–2011) |
Plasma proteomics data
The generated spectral library contained 19,041 peptide precursors for 1,412 proteins, from which 613 to 1,283 proteins per sample (median = 906, IQR = 838–1,021) were quantified across the full sample set (Fig. 2A). After reviewing all quality check procedures, all samples could be kept within the data set (Fig. S3 and S4).
Fig 2.
Information on proteins in the library. (A) Number of proteins detected per sample. The red points indicate the samples, which were outliers with respect to other quality metrics (see Fig. S3); (B) significant (P < 0.03) association between cases and controls determined via linear mixed-effects model; (C) time trend of all proteins with respect to time to diagnosis: each line shows the time trend of one protein; and (D) proteins showing higher or increased quantities, plus overlap; and (E) proteins showing lower or decreasing quantities, plus overlap.
Proteomic signature of TB progression reveals enrichment in inflammatory and humoral immune pathways
We obtained 446 samples in the training set, for which 1,412 different proteins were measured. We chose the unadjusted P-value cut-off of 0.03 for a difference between cases and controls or a time trend, respectively, leading to 81 (5.7%) candidate proteins. Of these, 73 were below this chosen cut-off when comparing cases and controls, with 43 showing an upregulation and 30 showing a downregulation (Fig. 2B; Table S1). Furthermore, 29 proteins were below the chosen threshold concerning the time trend, with 21 showing an increase toward diagnosis and eight showing a decrease (Fig. 2C; Table S1). Enriched pathways included acute-phase response, blood coagulation, and wound healing (Fig. 3A through C).
Fig 3.
Importance of candidate proteins: (A) biological processes involved in proteins that are higher in cases or have an increasing slope toward diagnosis of active TB; (B) biological processes involved in proteins that are lower in cases or have a decreasing slope toward diagnosis of active TB; and (C) variable importance within the random forest algorithm.
Figure 3 illustrates the functional interpretation of these candidate proteins. Panels (A) and (B) show chord diagrams mapping proteins with higher abundance in cases or increasing toward TB diagnosis (A) and those lower in cases or decreasing (B) to Gene Ontology biological processes. Upregulated proteins were strongly linked to acute inflammatory responses, B cell-mediated immunity, immunoglobulin production, and defense responses to pathogens. In contrast, downregulated proteins were associated with metabolic processes, wound healing, and hemostasis.
Panel (c) ranks the top 20 proteins by importance in the random forest classifier. Notably, several immunoglobulin heavy and light chain variants featured prominently among the most predictive proteins, reinforcing a central role of humoral immunity in TB progression.
Prognostic score for active TB
A random forest algorithm was developed using 81 candidate proteins and 284 training set samples, selecting one sample per individual (the sample closest to the diagnosis date for cases). The model was validated on a separate data set including 100 samples. The median score for cases progressing to active TB (i.e., the median probability for a case sample progressing to active TB as predicted by the random forest classifier) was 0.33 (IQR: 0.19–0.39) compared to 0.16 (IQR: 0.13–0.26) for controls (P < 0.001), indicating a more than twofold higher predicted risk for TB progression among cases (Fig. 4A). This approximate twofold difference was robust when evaluating samples within 6–12 months (0.34, IQR: 0.19–0.39) and those more than 1 year (0.30, IQR: 0.20–0.41) before diagnosis of active TB separately, although with small sample sizes per category (Fig. S5). Similarly, there was only a slight increase but no significant relationship between the time now as a continuous variable from sample collection to TB diagnosis and the score among 29 test set case samples (P = 0.52, Fig. 4B). However, including all available samples (i.e., training and validation set [N = 583]) did reveal a trend of an increasing score within the same individuals toward TB diagnosis (Fig. S6). Notably, the score did not differ significantly between MTB+ (median: 0.15, IQR: 0.10–0.24) and MTB− (median: 0.17, IQR: 0.13–0.27) controls (P = 0.28).
Fig 4.
Performance of the risk score within the test set: (A) case versus controls, (B) risk score within cases over time, (C) clinical and demographic characteristics associated with the risk score, first unadjusted for case/control status and second adjusted for case/control status, (D) stratification by HIV viral load suppression, and (E) stratification by CD4 cell count.
In terms of the predictive power of the risk score, we obtained an AUC-ROC of 0.77 and an AUPRC of 0.60 (as compared to an expected AUPRC of 0.29, as 29 cases are among the 100 control samples). The optimal threshold on the ROC curve was 0.311, reaching a specificity of 87.3% and a sensitivity of 58.6%.
External validation of previously identified candidate proteins
We identified several previously published protein-based predictive scores with included proteins present in our library (Fig. 5). First, we assessed the three-protein signature developed by Penn-Nicholson et al. based on 46 cases and 106 control samples from an HIV-uninfected adolescent population in South Africa (16). None of the three proteins were among our 81 selected candidate proteins, yet the predictive performance yielded an AUC of 0.63 in our data set compared to 0.72 (0.64–0.81) for 181–360 days before diagnosis in the original publication. Second, we evaluated the candidate proteins from the six-protein prognostic score in the South Africa cohort (N = 30) developed by Singer et al. in a population of PWH. While in the original population, the prognostic performance was reported as an AUC of 0.86 for 180–365 days before diagnosis, we achieved an AUC of 0.60 in our validation set (17). Third, we examined the five-protein panel developed by Garay-Baquero et al., which was based on 11 patients without HIV and 10 healthy controls from South Africa and Peru (18). Four of the five proteins were also identified in our candidate set, yielding an AUC of 0.63 in our validation set. Next, we tested the three-protein panel by Xu et al., which was developed from 40 cases and 40 controls in China (19). Only one of these proteins overlapped with our candidate set, resulting in a limited predictive performance of AUC = 0.49 in our data set. Finally, we evaluated the three-protein biomarker identified by Jiang et al. based on 136 cases and 66 controls in China (20). Two of the three proteins overlapped with our candidate set, achieving an AUC of 0.57 in our validation set. The last three reported scores were only assessed for diagnostic performance in the original publications; thus, the direct comparison with our obtained AUCs for prognostic purposes should not be overinterpreted.
Fig 5.
Performance of other candidate protein sets within our sample set. (A) The five investigated publications are listed in bold overlapping proteins within our 81-candidate set. The AUC-ROC reported in the original publication as well as the AUC-ROC of predicting active TB within our validation set are included in the third and fourth columns, respectively; and (B) visualization of the receiver operating characteristics curves (ROC).
Correlation with patient characteristics
Among the clinical and demographic variables tested in the validation cohort, CD4 count (P = 0.0004), suppressed HIV RNA (P = 0.02), cholesterol (P = 0.02), and HDL (P = 0.02) were each significantly associated with the proteomic risk score (Fig. 4C). These associations remained largely significant after adjusting for case/control status, particularly for CD4 count (P = 0.0003), suppressed HIV RNA (P = 0.01), and HDL (P = 0.03). Importantly, the risk score showed consistent shifts across categories of CD4 count and viral suppression status (Fig. 4D and E), indicating that the score reflects established clinical indicators. The most revealing pattern emerged when comparing groups with mismatched clinical and virologic profiles. Cases with suppressed HIV RNA and viremic controls had similar risk scores. However, the score remained distinctly higher in cases than in controls when both groups had suppressed HIV RNA. This distinction highlights the added value of plasma proteomics: it reveals biological differences that are not captured by standard clinical measurements alone.
The highly granular clinical phenotype data available within the SHCS allows us in addition to retrospectively investigating the characteristics of individual patients, introducing substantial additional value by helping us to understand why the score misclassified certain controls. For example, one control patient had a risk score of greater than 0.6 for developing active TB (Fig. 4A). This patient had a CD4 cell count of only 59 cells/µL and a concurrent herpes zoster infection, both indicative of significant immune suppression. Additionally, three control patients with risk scores between 0.4 and 0.5 had CD4 counts of 117, 364, and 280 cells/µL, respectively. For reference, the mean CD4 count among control samples was 380 cells/µL, highlighting that lower CD4 counts are associated with higher TB risk scores.
Furthermore, two patients had infections with Mycobacterium avium: one control patient with a risk score of 0.36 and one case patient co-infected with both Mycobacterium tuberculosis and M. avium who had a risk score of 0.38. These scores were significantly higher than the median risk score in the validation data set, which was 0.21, suggesting that immune suppression and the presence of any opportunistic infection may contribute to elevated risk scores.
DISCUSSION
Technical advances of high-throughput proteomics and novel computational tools offer promising solutions for the development of new biomarkers, such as urgently needed alternatives to T-cell-based TB diagnostics (16, 21, 22). In order to apply these new methods, however, well-characterized pre-disease onset samples and long follow-up durations are necessary. In this study, we leveraged longitudinally and routinely collected samples and data within the SHCS to develop a proteome TB risk score in PWH. With robust case-control matching, including viremic individuals, we were able to assess the predictive power of this score specific to TB, even including temporal variation with respect to the time point of active TB.
Among the most striking findings of our analysis was the consistent association of B cell responses and immunoglobulin production with progression to active tuberculosis. While historically the role of antibodies in TB has been considered limited, emerging evidence challenges this paradigm. A landmark study demonstrated that individuals with latent TB produce antibodies with distinct Fc functional profiles and glycosylation patterns compared to those with active TB, suggesting that protective antibody responses may shape disease outcomes (23). More recently, TB-specific antibodies were shown to exhibit unique effector functions and subclass distributions depending on the clinical disease state, further supporting the concept that qualitative features of humoral immunity influence TB pathogenesis. Additional studies have emphasized the immunomodulatory role of B cells and antibodies in TB through mechanisms, including opsonization, antigen presentation, and inflammatory regulation (24–26). In HIV-TB co-infection, where T cell immunity is often compromised, antibody-mediated mechanisms may become even more relevant, though they remain underexplored. HIV-associated B cell dysfunction is known to alter antibody quality and repertoire, which could influence TB susceptibility and progression (27). Our findings strongly suggest that, even in people with HIV, humoral immune perturbations—particularly involving B cell activation and immunoglobulin production—are key features of TB progression (28). This supports a growing recognition of the role of antibodies in TB and HIV-TB co-infection (29).
Our plasma proteome score demonstrated a performance comparable to or better than other published biomarkers in predicting active TB. While some studies have reported very high performance for distinguishing active TB from healthy controls, these results must be interpreted cautiously. Factors, such as study setting, sample availability, and the methods used to calculate the scores, play a critical role in performance estimates. To evaluate how our newly developed score compares to existing proteome-based biomarkers, we identified five studies with published scores that included proteins present in our library. For example, four out of five proteins of the five-protein risk score developed by Garay-Baquero et al. were overlapping with our identified proteins, but nevertheless showed limited predictive power within our data set (18). This indicates potential limitations concerning reproducibility of such scores; however, we observed a significant improvement in performance when adding additional proteins. Similar findings were observed for the other protein biomarkers we investigated. This analysis of using our data set as an external validation of published risk scores, thus, suggests that the predictive power of risk scores including only a small set of proteins is most likely limited. Although some single proteins showed some predictive power, including also proteins previously identified in other publications, the combination of all 81 proteins outperformed this predictive performance, suggesting that expanding the protein panel enhances predictive accuracy and robustness.
Interestingly, our proteome score shows no strong evidence of time dependency over the observation period. This is in contrast to published transcriptome-based scores, which showed increasing trends as the sample date approaches the diagnosis of clinically overt disease, potentially driven by type 1 interferon-driven (30). This indicates that the proteome score is less likely to diagnose early disease and instead reflects the overall immune status. This is also seen in the regression analysis, which revealed an inverse correlation between our risk score and CD4 cell counts and the occurrence of opportunistic infections, suggesting that the score provides a proxy for overall immune status. TB often co-occurs with other opportunistic infections in PWH, and our analysis highlights the multifaceted nature of the score. Notably, two individuals with M. avium infection exhibited particularly high-risk scores, with one subsequently developing active TB, while the other did not.
Choosing the optimal threshold, our newly developed risk score reaches a sensitivity of 58.6% and a specificity of 87.3%. A recent publication within the same study population estimated a sensitivity of only 20% and a specificity of 95% of the current gold standard in terms of predicting progression to active TB at least 6 months before diagnosis (8). The high sensitivity of our newly developed risk score is particularly remarkable given that the case and control population was matched by demographic and clinical characteristics, in particular ethnicity, CD4 cell count, and HIV RNA suppression, factors which are independently associated with risk for developing active TB. In another recent publication within our group, a machine learning algorithm for predicting active TB was developed using clinical and demographic parameters only, reaching a sensitivity of 70.1% and a specificity of 81.0% (AUC-ROC: 0.83) (31). Given the promising predictive power of both—only clinical and demographic values or only information given by the proteome (with other factors matched)—suggests that combining clinical and demographic information with proteomics might even yield better results.
This study has several strengths and limitations. A major strength is the clear phenotype definition for active TB and the longitudinal and retrospective design enabling the selection of control participants who did not progress to active TB over a follow-up period of more than 5 years. This, to our knowledge, represents the most stringent criterion for a control population used in the development of prognostic (or diagnostic) TB risk scores. A further strength of our study is the use of TB-negative and MTB-positive controls, where we observed surprisingly no significant difference between the two groups. This control MTB+ population in our study is distinctive, as MTB infection preventive measures were not applied as indicated by established guidelines, making them an understudied group. This provides a unique opportunity to differentiate true progression to active TB from MTB infection, offering valuable insights for future studies. We initially aimed to create a TB-specific proteomic risk score by carefully matching cases and controls on key clinical and demographic variables. While this matching strategy helped reduce confounding, it was not perfect—some factors strongly associated with active TB, such as low CD4 count or high HIV viral load remained unevenly distributed. Despite this, the analysis revealed that the risk score still correlated with CD4 count and HIV viral load independently of the TB case/control status. Specifically, lower CD4 counts and viremic HIV infection were each associated with higher risk scores, suggesting that the proteomic signal reflects the host immune status beyond the presence of active TB alone. One limitation of the study is inherent to the design, which required plasma samples to be collected at least 6 months before the diagnosis of active TB. This criterion excluded individuals with late HIV presentation and simultaneous TB diagnosis, thereby reducing the study population. Specifically, only 11% of participants with active TB met the criterion of diagnosis at least 6 months after enrolling in the SHCS with available plasma samples.
In summary, our study underscores the potential of proteomics as a robust approach for identifying biomarkers relevant to TB risk, particularly in the context of PWH. By leveraging a well-defined cohort with longitudinal sampling and stringent control criteria, we were able to develop a proteome-based score that not only predicts TB progression but also reflects a broader immune status. While the proteome score appears not to diagnose early subclinical TB, its association with immune suppression and opportunistic infections highlights its relevance in populations with complex immune dynamics. Future studies with larger sample sizes and diverse immune-suppressed patients will be essential to validate these findings and further refine the utility of proteomic risk scores for both TB-specific and general immune health applications.
ACKNOWLEDGMENTS
We would like to thank all participants of the SHCS. Furthermore, we would like to thank all study nurses, data managers, and physicians involved in collecting data, as well as all members of the SHCS: Abela, I.A., Aebi-Popp, K., Anagnostopoulos, A., Battegay, M., Bernasconi, E., Braun, D.L., Bucher, H.C., Calmy, A., Cavassini, M. (Chairman of the Clinical and Laboratory Committee), Ciuffi, A., Dollenmaier, G., Egger, M., Elzi, L., Fehr, J.S., Fellay, J., Frigerio Malossa, S., Furrer, H., Fux, C.A., Günthard, H.F. (President of the SHCS), Hachfeld, A., Haerry, D.H.U. (deputy of "Positive Council"), Hasse, B., Hirsch, H.H., Hoffmann, M., Hösli, I., Huber, M., Jackson-Perry, D. (patient representatives), Kahlert, C.R. (Chairman of the Mother & Child Substudy), Keiser, O., Klimkait, T., Kouyos, R.D., Kovari, H., Kusejko, K. (Head of Data Center), Labhardt, N.D., Leuzinger, K., Martinez de Tejada. B., Marzolini, C., Metzner, K.J., Müller, N., Nemeth, J., Nicca, D., Notter, J., Paioni, P., Pantaleo, G., Perreau, M., Rauch, A., Salazar-Vizcaya, L.P., Schmid, P., Segeral, O., Speck, R.F., Stöckle, M., Tarr, P.E., Trkola, A., Wandeler, G. (Chairman of the Scientific Board), Weisser. M., and Yerly, S.
This study has been financed within the framework of the Swiss HIV Cohort Study supported by the Swiss National Science Foundation (grant #201369) and SHCS project #P895. The data are gathered from five Swiss university Hospitals, two Cantonal hospitals, 15 affiliated hospitals, and 36 private physicians (listed in https://www.shcs.ch/health-care-providers/). Katharina Kusejko was supported by the Swiss National Science Foundation under grant #218408. Johannes Nemeth was supported by the Swiss National Science Foundation under grant #310030_200407.
K.K. and J.N. designed the study, performed the analyses, and wrote the first draft of the manuscript. B.C., M.A., and D.R. prepared the samples and performed the proteomics measurements. K.K. and D.D. generated the machine learning pipeline. M.Z. and R.D.K. critically assessed the bioinformatics pipeline and statistical analyses. M.H., N.L., G.W., M.C., S.H., E.B., and H.F.G. were involved in data collection. All authors reviewed and edited the manuscript.
Contributor Information
Katharina Kusejko, Email: katharina.kusejko@usz.ch.
Alan N. Engelman, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
DATA AVAILABILITY
The Swiss HIV Cohort Study (SHCS) established in 1988 collects clinical data and plasma samples from people living with HIV in Switzerland. All participants have provided written informed consent; however, strict confidentiality rules apply. Specifically, the SHCS is obligated to ensure that individuals cannot be re-identified from shared data, which prevents the release of detailed participant-level data (https://www.shcs.ch/for-researchers/open-data-statement-shcs/). To enable transparency and reproducibility while respecting these privacy obligations, we provide the protein quantities plus key sample meta data (case/control status, timing of the sample, MTB status, train/validation-status) in the Supplemental Data. With these data, the reproducibility of the protein candidate pre-selection and development of the risk score are ensured. For projects that require participant-level data variables, a proposal needs to be submitted to the Scientific Board of the SHCS: https://www.shcs.ch/for-researchers/who-can-submit/.
SUPPLEMENTAL MATERIAL
The following material is available online at https://doi.org/10.1128/mbio.01585-25.
Supplemental methods, figures, and tables.
Information on the 583 samples.
ASM does not own the copyrights to Supplemental Material that may be linked to, or accessed through, an article. The authors have granted ASM a non-exclusive, world-wide license to publish the Supplemental Material files. Please contact the corresponding author directly for reuse.
REFERENCES
- 1. WHO . 2024. Tuberculosis (TB). Available from: https://www.who.int/news-room/fact-sheets/detail/tuberculosis
- 2. Houben R, Dodd PJ. 2016. The global burden of latent tuberculosis infection: a re-estimation using mathematical modelling. PLoS Med 13:e1002152. doi: 10.1371/journal.pmed.1002152 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3. WHO . 2024. The end TB strategy. Available from: https://www.who.int/teams/global-tuberculosis-programme/the-end-tb-strategy
- 4. Kwan CK, Ernst JD. 2011. HIV and tuberculosis: a deadly human syndemic. Clin Microbiol Rev 24:351–376. doi: 10.1128/CMR.00042-10 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5. Gasser O, Bihl FK, Wolbers M, Loggi E, Steffen I, Hirsch HH, Günthard HF, Walker BD, Brander C, Battegay M, Hess C, for the Swiss HIV Cohort Study . 2007. HIV patients developing primary CNS lymphoma lack EBV-specific CD4+ T cell function irrespective of absolute CD4+ T cell counts. PLoS Med 4:e96. doi: 10.1371/journal.pmed.0040096 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Stuehler C, Bernardini C, Elzi L, Stoeckle M, Zimmerli S, Furrer H, Günthard HF, Leibundgut-Landmann S, Battegay M, Khanna N. 2016. Immune recovery in HIV-infected patients after Candida esophagitis is impaired despite long-term antiretroviral therapy. AIDS 30:1923–1933. doi: 10.1097/QAD.0000000000001126 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7. Menzies NA, Wolf E, Connors D, Bellerose M, Sbarra AN, Cohen T, Hill AN, Yaesoubi R, Galer K, White PJ, Abubakar I, Salomon JA. 2018. Progression from latent infection to active disease in dynamic tuberculosis transmission models: a systematic review of the validity of modelling assumptions. Lancet Infect Dis 18:e228–e238. doi: 10.1016/S1473-3099(18)30134-8 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Zeeb M, Tepekule B, Kusejko K, Reiber C, Kälin M, Bartl L, Notter J, Furrer H, Hoffmann M, Hirsch HH, Calmy A, Cavassini M, Labhardt ND, Bernasconi E, Braun DL, Günthard HF, Kouyos RD, Nemeth J, Meier J, Schäfer Y, Follonier O, Perraudin D, Amstad M, the Swiss HIV Cohort Study . 2023. Understanding the decline of incident, active tuberculosis in people with human immunodeficiency virus in Switzerland. Clin Infect Dis 77:1303–1311. doi: 10.1093/cid/ciad330 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 9. Scherrer AU, Traytel A, Braun DL, Calmy A, Battegay M, Cavassini M, Furrer H, Schmid P, Bernasconi E, Stoeckle M, et al. 2022. Cohort profile update: the swiss HIV cohort study (SHCS). Int J Epidemiol 51:33–34j. doi: 10.1093/ije/dyab141 [DOI] [PubMed] [Google Scholar]
- 10. Kusejko K, Günthard HF, Olson GS, Zens K, Darling K, Khanna N, Furrer H, Vetter P, Bernasconi E, Vernazza P, Hoffmann M, Kouyos RD, Nemeth J, the Swiss HIV Cohort Study . 2020. Diagnosis of latent tuberculosis infection is associated with reduced HIV viral load and lower risk for opportunistic infections in people living with HIV. PLoS Biol 18:e3000963. doi: 10.1371/journal.pbio.3000963 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Kulak NA, Pichler G, Paron I, Nagaraj N, Mann M. 2014. Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat Methods 11:319–324. doi: 10.1038/nmeth.2834 [DOI] [PubMed] [Google Scholar]
- 12. Penny J, Arefian M, Schroeder GN, Bengoechea JA, Collins BC. 2023. A gas phase fractionation acquisition scheme integrating ion mobility for rapid diaPASEF library generation. Proteomics 23:2200038. doi: 10.1002/pmic.202200038 [DOI] [PubMed] [Google Scholar]
- 13. Geyer PE, Voytik E, Treit PV, Doll S, Kleinhempel A, Niu L, Müller JB, Buchholtz M-L, Bader JM, Teupser D, Holdt LM, Mann M. 2019. Plasma proteome profiling to detect and avoid sample‐related biases in biomarker studies. EMBO Mol Med 11:e10427. doi: 10.15252/emmm.201910427 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Baker CP, Bruderer R, Abbott J, Arthur JSC, Brenes AJ. 2023. Optimising Spectronaut search parameters to improve data quality with minimal proteome coverage reductions in DIA analyses of heterogeneous samples. bioRxiv. doi: 10.1101/2023.10.11.561927 [DOI] [PMC free article] [PubMed]
- 15. Tyanova S, Temu T, Sinitcyn P, Carlson A, Hein MY, Geiger T, Mann M, Cox J. 2016. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13:731–740. doi: 10.1038/nmeth.3901 [DOI] [PubMed] [Google Scholar]
- 16. Penn-Nicholson A, Hraha T, Thompson EG, Sterling D, Mbandi SK, Wall KM, Fisher M, Suliman S, Shankar S, Hanekom WA, Janjic N, Hatherill M, Kaufmann SHE, Sutherland J, Walzl G, De Groote MA, Ochsner U, Zak DE, Scriba TJ, ACS and GC6–74 cohort study groups . 2019. Discovery and validation of a prognostic proteomic signature for tuberculosis progression: a prospective cohort study. PLoS Med 16:e1002781. doi: 10.1371/journal.pmed.1002781 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17. Singer SN, Ndumnego OC, Kim RS, Ndung’u T, Anastos K, French A, Churchyard G, Paramithiothis E, Kasprowicz VO, Achkar JM. 2022. Plasma host protein biomarkers correlating with increasing Mycobacterium tuberculosis infection activity prior to tuberculosis diagnosis in people living with HIV. EBioMedicine 75:103787. doi: 10.1016/j.ebiom.2021.103787 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18. Garay-Baquero DJ, White CH, Walker NF, Tebruegge M, Schiff HF, Ugarte-Gil C, Morris-Jones S, Marshall BG, Manousopoulou A, Adamson J, Vallejo AF, Bielecka MK, Wilkinson RJ, Tezera LB, Woelk CH, Garbis SD, Elkington P. 2020. Comprehensive plasma proteomic profiling reveals biomarkers for active tuberculosis. JCI Insight 5:137427. doi: 10.1172/jci.insight.137427 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Xu D, Li Y, Li X, Wei L, Pan Z, Jiang T, Chen Z, Wang C, Cao W, Zhang X, Ping Z, Liu C, Liu J, Li Z, Li J. 2015. Serum protein S100A9, SOD3, and MMP9 as new diagnostic biomarkers for pulmonary tuberculosis by iTRAQ‐coupled two‐dimensional LC‐MS/MS. Proteomics 15:58–67. doi: 10.1002/pmic.201400366 [DOI] [PubMed] [Google Scholar]
- 20. Jiang T-T, Shi L-Y, Wei L-L, Li X, Yang S, Wang C, Liu C-M, Chen Z-L, Tu H-H, Li Z-J, Li J-C. 2017. Serum amyloid A, protein Z, and C4b-binding protein β chain as new potential biomarkers for pulmonary tuberculosis. PLOS One 12:e0173304. doi: 10.1371/journal.pone.0173304 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Mousavian Z, Källenius G, Sundling C. 2023. From simple to complex: protein‐based biomarker discovery in tuberculosis. Eur J Immunol 53:e2350485. doi: 10.1002/eji.202350485 [DOI] [PubMed] [Google Scholar]
- 22. Mateos J, Estévez O, González-Fernández Á, Anibarro L, Pallarés Á, Reljic R, Mussá T, Gomes-Maueia C, Nguilichane A, Gallardo JM, Medina I, Carrera M. 2020. Serum proteomics of active tuberculosis patients and contacts reveals unique processes activated during Mycobacterium tuberculosis infection. Sci Rep 10:3844. doi: 10.1038/s41598-020-60753-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Lu LL, Chung AW, Rosebrock TR, Ghebremichael M, Yu WH, Grace PS, Schoen MK, Tafesse F, Martin C, Leung V, Mahan AE, Sips M, Kumar MP, Tedesco J, Robinson H, Tkachenko E, Draghi M, Freedberg KJ, Streeck H, Suscovich TJ, Lauffenburger DA, Restrepo BI, Day C, Fortune SM, Alter G. 2016. A functional role for antibodies in tuberculosis. Cell 167:433–443. doi: 10.1016/j.cell.2016.08.072 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Achkar JM, Casadevall A. 2013. Antibody-mediated immunity against tuberculosis: implications for vaccine development. Cell Host Microbe 13:250–262. doi: 10.1016/j.chom.2013.02.009 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. Watson A, Li H, Ma B, Weiss R, Bendayan D, Abramovitz L, Ben-Shalom N, Mor M, Pinko E, Bar Oz M, Wang Z, Du F, Lu Y, Rybniker J, Dahan R, Huang H, Barkan D, Xiang Y, Javid B, Freund NT. 2021. Human antibodies targeting a Mycobacterium transporter protein mediate protection against tuberculosis. Nat Commun 12:602. doi: 10.1038/s41467-021-20930-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26. Grace PS, Dolatshahi S, Lu LL, Cain A, Palmieri F, Petrone L, Fortune SM, Ottenhoff THM, Lauffenburger DA, Goletti D, Joosten SA, Alter G. 2021. Antibody subclass and glycosylation shift following effective TB treatment. Front Immunol 12:679973. doi: 10.3389/fimmu.2021.679973 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27. Adeoye B, Nakiyingi L, Moreau Y, Nankya E, Olson AJ, Zhang M, Jacobson KR, Gupta A, Manabe YC, Hosseinipour MC, Kumwenda J, Sagar M. 2023. Mycobacterium tuberculosis disease associates with higher HIV-1-specific antibody responses. iScience 26:106631. doi: 10.1016/j.isci.2023.106631 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28. Zeeb M, Kusejko K, Hartnack S, Pasin C, Abela IA, Rusert P, Liechti T, Kadelka C, Notter J, Eichenberger A, Hoffmann M, Hirsch HH, Calmy A, Cavassini M, Labhardt ND, Bernasconi E, Günthard HF, Kouyos RD, Trkola A, Nemeth J, and the Swiss HIV Cohort Study . 2025. Infection with Mycobacterium tuberculosis alters the antibody response to HIV-1. PLoS Pathog 21:e1013350. doi: 10.1371/journal.ppat.1013350 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29. Hermann C, King CG. 2021. TB or not to be: what specificities and impact do antibodies have during tuberculosis? Oxf Open Immunol 2:iqab015. doi: 10.1093/oxfimm/iqab015 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30. Zak DE, Penn-Nicholson A, Scriba TJ, Thompson E, Suliman S, Amon LM, Mahomed H, Erasmus M, Whatney W, Hussey GD, et al. 2016. A blood RNA signature for tuberculosis disease risk: a prospective cohort study. Lancet 387:2312–2322. doi: 10.1016/S0140-6736(15)01316-1 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31. Bartl L, Zeeb M, Kälin M, Loosli T, Notter J, Furrer H, Hoffmann M, Hirsch HH, Zangerle R, Grabmeier-Pfistershammer K, Knappik M, Calmy A, Fernandez JD, Labhardt ND, Bernasconi E, Günthard HF, Kouyos RD, Kusejko K, Nemeth J. 2025. Machine learning-based prediction of active tuberculosis in people with human immunodeficiency virus using clinical data. Clin Infect Dis:ciaf149. doi: 10.1093/cid/ciaf149 [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.
Supplementary Materials
Supplemental methods, figures, and tables.
Information on the 583 samples.
Data Availability Statement
The Swiss HIV Cohort Study (SHCS) established in 1988 collects clinical data and plasma samples from people living with HIV in Switzerland. All participants have provided written informed consent; however, strict confidentiality rules apply. Specifically, the SHCS is obligated to ensure that individuals cannot be re-identified from shared data, which prevents the release of detailed participant-level data (https://www.shcs.ch/for-researchers/open-data-statement-shcs/). To enable transparency and reproducibility while respecting these privacy obligations, we provide the protein quantities plus key sample meta data (case/control status, timing of the sample, MTB status, train/validation-status) in the Supplemental Data. With these data, the reproducibility of the protein candidate pre-selection and development of the risk score are ensured. For projects that require participant-level data variables, a proposal needs to be submitted to the Scientific Board of the SHCS: https://www.shcs.ch/for-researchers/who-can-submit/.





