PLOS Global Public Health. 2024 Jul 23;4(7):e0002643. doi: 10.1371/journal.pgph.0002643

Strong effect of demographic changes on Tuberculosis susceptibility in South Africa

Oshiomah P Oyageshio 1,*,#, Justin W Myrick 2,#, Jamie Saayman 3, Lena van der Westhuizen 3, Dana R Al-Hindi 4, Austin W Reynolds 5, Noah Zaitlen 6, Eileen G Hoal 3, Caitlin Uren 3,7, Marlo Möller 3,7,*, Brenna M Henn 1,2,4,*
Editor: Indira Govender8
PMCID: PMC11265723  PMID: 39042651

Abstract

South Africa is among the world’s top eight tuberculosis (TB) burden countries, and despite a focus on HIV-TB co-infection, most of the population living with TB are not HIV co-infected. The disease is endemic across the country, with 80–90% exposure by adulthood. We investigated epidemiological risk factors for TB in the Northern Cape Province, South Africa: an understudied TB endemic region with extreme TB incidence (926/100,000). We leveraged the population’s high TB incidence and community transmission to design a case-control study with similar mechanisms of exposure between the groups. We recruited 1,126 participants with suspected TB from 12 community health clinics and generated a cohort of 774 individuals (cases = 374, controls = 400) after implementing our enrollment criteria. All participants were GeneXpert Ultra tested for active TB by a local clinic. We assessed important risk factors for active TB using logistic regression and random forest modeling. We find that factors commonly identified in other global populations tend to replicate in our study, e.g. male gender and residence in a town had significant effects on TB risk (OR: 3.01 [95% CI: 2.20–4.12]; OR: 3.20 [95% CI: 2.26–4.55]). We also tested for demographic factors that may uniquely reflect historical changes in health conditions in South Africa. We find that socioeconomic status (SES) significantly interacts with an individual’s age (p = 0.0005), indicating that the protective effect of higher SES changed across age cohorts. We further find that being born in a rural area and moving to a town strongly increases TB risk, while town birthplace and current rural residence is protective. These interaction effects reflect rapid demographic changes, specifically SES over recent generations and mobility, in South Africa. Our models show that such risk factors combined explain 19–21% of the variance (r2) in TB case/control status.

Introduction

Tuberculosis (TB) is the world’s leading cause of death due to infectious disease, currently greater than COVID-19 [1]. The causative agent, Mycobacterium tuberculosis (M.tb), is an obligate intracellular pathogen mainly infecting the lungs, and sometimes other organs [2, 3]. Approximately 25% of the world’s population is infected with M.tb and the annual death toll is similar to COVID-19 (~1.5 million deaths). South Africa is amongst the top 30 ‘high burden’ countries coping with TB, TB/HIV co-infection, and multi-drug resistance or rifampicin-resistant TB (MDR/RR-TB). TB is South Africa’s leading natural cause of death [4] with an extremely high prevalence (446/100,000, [5]) and accounts for 3.3% of all global TB cases [1]. The Northern Cape presently has the highest TB incidence in South Africa (ZF Mgcawu district 926/100,000 [5]), but the lowest HIV prevalence (7.1% vs 13.5% National average), including the lowest density of people living with HIV [6].

Determinants of active TB progression are multifaceted, including genetics, nutrition, social and economic conditions, behavior, and sex-specific biology [1, 7, 8]. Initial M.tb infection is largely determined by exogenous factors, such as TB prevalence in the community, population density (e.g., prisons), and working conditions (e.g., mining, healthcare workers) [9–13]. The lifetime risk of progressing to active TB following infection is 10%. This risk is the highest within the first 5 years of initial infection and is typically considered to be mediated by the host’s innate and cell-mediated immune system [9, 14]. Individual (i.e., host) factors, however, have also been shown to increase risk of progressing to active disease. These include HIV/AIDS, poor nutrition or low body mass index, indoor air pollution (e.g., cooking with wood and poor ventilation), smoking, alcohol abuse, diabetes mellitus, and intravenous drug use [1, 9, 14, 15]. Studies in India have shown undernutrition to be among the strongest determinants for TB risk [15]. In South Africa specifically, poor living conditions, unemployment, low SES, age and male gender, race, smoking, and marital status have all been identified as contributing to TB risk [16–21].

The extent of these determinants’ effects can vary across and within populations, necessitating epidemiological studies in differing contexts and communities [8]. Compared to medium or high-TB-incidence countries, the effect sizes for alcohol abuse, homelessness, and intravenous drug use are stronger in low-incidence populations [22]. In South Africa, multilevel modeling approaches have shown that provincial [16] and community income inequality [18] have strong effects on TB incidence and progression, independent of individual-level risk factors.

HIV increases TB risk by 20-fold, the largest known risk factor for progression to active TB, and TB is the leading cause of AIDS-related deaths [23]. The effect of HIV on suppressing the host immune system can reactivate a latent M.tb infection and increase susceptibility to initial infection [14, 23, 24]. Despite HIV being the strongest TB determinant, other TB risk factors explain the majority of global TB cases [9]. In South Africa, 59% of people with TB on the National TB Programme (i.e. on TB medication) are co-infected with HIV [25]. However, South Africa’s first national TB prevalence survey found that only 28% of people with TB were also people living with HIV (PLWHIV) [25], a finding underscoring the necessity to extend TB research to those living without HIV in high TB burden areas. At the provincial level in South Africa, HIV prevalence explains little of TB incidence (r2 = 0.036) [26]. The Western and Northern Cape Provinces have among the highest TB incidence yet the lowest HIV prevalence [27].

Here, we present a TB case-control study characterizing the individual-level risk factors for TB progression among HIV-negative patients with suspected TB from the Northern Cape. Because the Northern Cape has the highest TB incidence but the lowest HIV prevalence and PLWHIV density, as well as an overall low population density, canonical risk factors do not appear to be driving the extraordinary incidence rates. To focus on factors other than immune suppression, we exclude PLWHIV from the analysis. Controls from our study sample are people with suspected TB from local health clinics who were microbiologically confirmed to be negative for active TB. Controls are assumed to either have been previously exposed to or infected with M.tb (i.e., LTBI). In South Africa, TB transmission is driven largely by community spread, rather than household contacts [28, 29]. Cases, in contrast, are people who have microbiologically confirmed active TB or self-report a past active TB episode. We test three separate models comprising common risk factors, as well as factors that may uniquely affect South Africa. We find that exogenous factors like SES, cohort age, and residence/birthplace have a strong effect on TB progression, often equal to or greater than endogenous factors like gender or smoking/alcohol. These results suggest further research into the causal mechanisms behind exogenous risk factors and opportunities for TB prevention are warranted.

Methods

Research ethics statement

This study has been approved by the Health Research Ethics Committee (HREC) of Stellenbosch University (N11/07/210A) and the Northern Cape Department of Health (NC2015/008). All participants were adults (18 years and older) and provided written informed formal consent. Authors Justin W. Myrick, Jamie Saayman, Lena van der Westhuizen and Marlo Möller had access to identifiable information about participants as they were directly involved in data collection or database management. Access to these records commenced on 26th January 2016, and is still ongoing as it is an integral part of the Northern Cape Tuberculosis Project (NCTB).

Inclusivity in global research

Additional information regarding the ethical, cultural, and scientific considerations specific to inclusivity in global research is included in the Supporting Information (S1 Checklist).

Study design and recruitment

Participants (18 years and older) provided written informed consent and were recruited from 12 community health clinics from the ZF Mgcawu district in the Northern Cape Province of South Africa from 26th January 2016 – 15th May 2017, and 11th December 2018 – 11th March 2020. Community health clinics are the front line for TB screening and treatment, visited by 87% of people who seek TB care [25]. TB nurses referred patients with suspected TB (with ≥2 TB symptoms: cough for ≥2 weeks, night sweats, weight loss, and fever ≥2 weeks, or interaction with a TB index contact) and known TB patients to our on-site RAs. All study participants took a clinic-administered sputum GeneXpert Ultra test for active TB at the time of the study interview and provided saliva for genotyping. Clinic medical charts were accessed by a staff research nurse to record GeneXpert test results and verify HIV status and TB history.

Case-control assignment

Cases and controls were assigned based on both the participant’s medical charts and self-reported data (Fig 1). Cases include anyone with active pulmonary TB in their lifetime and that was HIV-negative. Thus, cases could be partitioned into 1) clinically confirmed active TB (n = 208) at the time of enrollment, and 2) self-reported past TB episode(s) (n = 166). GeneXpert results, diagnostic test date, TB strain (drug resistance), and TB medication regimen were used to validate clinically confirmed progression to active TB. Past TB episodes are based on self-report, mainly due to older medical charts which were not reliably available, discarded, or difficult to locate by clinic staff.

Fig 1. Case-control decision tree.

Study participants were categorized as cases or controls based on medical record information and self-reported data. All participants were GeneXpert tested for active TB infection at the time of enrollment. Past TB episodes were self-reported and cross-referenced with medical records when available.

We defined controls as HIV-negative clinic patients with suspected TB symptoms who had a negative GeneXpert Ultra result, and no history of active pulmonary TB at the time of study enrollment. Controls in our study are likely to be latently infected with M.tb (LTBI) or to have been exposed in their lifetime. A majority of the population in high TB burden South African suburbs are LTBI, 88% by ages 31–35 [30].

Our exclusion criteria removed participants with unknown TB or HIV status, as well as PLWHIV.
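
The assignment rules described above (and summarized in Fig 1) can be sketched as a small decision function. This is only an illustration of the logic as stated in the text, not the authors' own script; the function name, argument names, and return labels are hypothetical.

```python
from typing import Optional


def assign_group(hiv_status: Optional[str],
                 genexpert_positive: Optional[bool],
                 past_tb_episode: Optional[bool]) -> str:
    """Return 'excluded', 'case_active', 'case_past', or 'control'.

    hiv_status: 'negative', 'positive', or None when unknown (hypothetical coding).
    genexpert_positive: GeneXpert Ultra result at enrollment, None when unknown.
    past_tb_episode: self-reported past active TB, None when unknown.
    """
    # Exclusion criteria: unknown HIV status or people living with HIV.
    if hiv_status != "negative":
        return "excluded"
    # Cases: active TB confirmed at enrollment ...
    if genexpert_positive is True:
        return "case_active"
    # ... or a self-reported past TB episode.
    if past_tb_episode is True:
        return "case_past"
    # Controls: negative GeneXpert and no history of active TB.
    if genexpert_positive is False and past_tb_episode is False:
        return "control"
    # Anything else means TB status could not be established.
    return "excluded"


if __name__ == "__main__":
    print(assign_group("negative", False, False))
```

Checking for confirmed active TB before self-reported history keeps the two case subgroups (n = 208 and n = 166 in the text) mutually exclusive in this sketch.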

Study covariates

We collected demographic information that included date of birth, place of birth, current residence, self-identified gender, self-reported ethnic identity, and parental ethnic identities. Behavioral variables include smoking, drinking and diabetes (Supplementary Methods in S1 Text). In our analyses, we only used binary measures for smoking, drinking and diabetes (“Do you smoke?”, “Do you have diabetes?”). Age at enrollment was used as a continuous variable for all analyses and binned for calculating empirical odds. Socioeconomic status “SES” was operationalized as number of years of education, i.e., the highest completed level of education. McKenzie et al. [31] have shown that education level in this dataset positively predicts body mass index in TB controls, so it tracks access to resources and food security, if only as a crude measure.

Residence and birthplace locations are categorized as rural (≤2,000 people) and town (>2,000 people). Population size was derived from the South African census and when census data was absent, e.g., a farm, we used Google Earth (earth.google.com) to estimate population size based on the number of dwellings. Places that did not have a census size available in Stats SA (statssa.gov.za) typically were very small communities, like a farm or a small settlement. By visualizing the settlement through Google Earth, we were able to estimate whether the community size was >2,000 people. We used the average household size from the census data from all locations in the district listed by Stats SA (statssa.gov.za), then counted the number of dwellings in blocks and multiplied by the average household size [4] to get an estimated population size. While an imprecise measure of population size, the lack of government census data for a community is itself an indicator of its rural locality.
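
The dwelling-count estimate and the rural/town cut-off described above amount to one multiplication and one comparison. The sketch below illustrates this; the function names and the example inputs (120 dwellings, 4.0 people per household) are hypothetical and are not taken from the study data.

```python
RURAL_MAX_POPULATION = 2000  # threshold stated in the text: rural is <= 2,000 people


def estimate_population(dwelling_count: int, avg_household_size: float) -> float:
    """Estimate population as counted dwellings times average household size."""
    return dwelling_count * avg_household_size


def classify_locality(population: float) -> str:
    """Classify a place as 'rural' or 'town' using the 2,000-person threshold."""
    return "rural" if population <= RURAL_MAX_POPULATION else "town"


if __name__ == "__main__":
    # Hypothetical settlement without census data.
    population = estimate_population(dwelling_count=120, avg_household_size=4.0)
    print(population, classify_locality(population))
```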

Statistical analyses

Statistical analyses were performed in R (version 4.2.3). We calculated Pearson correlations with the R package ggcorrplot. All categorical variables were numerically coded to “0” and “1”. Classification models for our binary, qualitative dependent variable (“case”/ “control”) included logistic regression and random forest—a machine learning classifier robust to non-linear associations and unknown variable interactions [32] (Supplementary Methods in S1 Text). To calculate empirical odds, we binned our participants into 7 age groups, dividing the number of controls by the number of cases in each age bin. All R scripts for analyses are available at https://github.com/oshiomah1/NCTB-Epidemiology-Project.

Imputing missing data

To maximize our sample size, we imputed missing data for diabetes, smoking, and years of education (Table C in S1 Text). The proportion of missing data was low overall, i.e. below 5%, except for alcohol use, for which missingness was 35%; the alcohol use measure was implemented after a pilot study. We chose to exclude alcohol from imputation and model analyses due to high missingness. Multivariate imputation was performed using the R package MICE, implementing chained equations in which every variable with incomplete data is imputed conditional on all the data from other variables in the dataset [33]. To initiate the MICE procedure, we created a matrix of variables consisting of age, gender, height, HIV, mother’s ethnicity, father’s ethnicity, diabetes, smoking status, and years of education. Notably, ethnicity and height variables were not used in our regression analysis but bore potential relevance to our missing variables, so they were included to improve the statistical inference of our imputation. We set the parameters of our imputation using recommended settings [33, 34], generating two imputed datasets (m = 2) that were run for 10 iterations each (maxit = 10). We used a classification and regression tree method which is robust in epidemiological datasets similar to ours [35].

To cross-validate our imputation method, we randomly sampled ten percent of known values in each variable and converted them to missing values (Table C in S1 Text). Next, these missing values were imputed using the procedure described above, and the missing value was compared to the original value. For continuous variables, the average percent difference between imputed and original value was used to calculate the cross-validation (CV) score while the average accuracy of the imputed variable was used to generate the CV score for binary variables. This procedure was carried out on one variable at a time for 100 iterations. Cross-validation results revealed that years of education and diabetes were sufficiently imputed (CV score > 10%).
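
The masking step of this cross-validation can be illustrated in a few lines. The sketch below hides ten percent of the known values and scores an imputation against the originals; it does not reproduce the MICE/CART imputation itself, which would be run between the two steps. All function names are hypothetical, and skipping zero-valued originals in the percent difference is an assumption made here to avoid dividing by zero.

```python
import random
from typing import List, Optional, Sequence, Tuple

Value = Optional[float]


def mask_known_values(values: Sequence[Value], fraction: float = 0.10,
                      seed: int = 0) -> Tuple[List[Value], List[int]]:
    """Set a random fraction of the known (non-None) entries to None."""
    rng = random.Random(seed)
    known = [i for i, v in enumerate(values) if v is not None]
    n_mask = round(len(known) * fraction)
    masked_idx = sorted(rng.sample(known, n_mask))
    masked = list(values)
    for i in masked_idx:
        masked[i] = None
    return masked, masked_idx


def binary_accuracy(original: Sequence[Value], imputed: Sequence[Value],
                    idx: Sequence[int]) -> float:
    """Share of masked binary entries that were imputed back correctly."""
    if not idx:
        raise ValueError("no masked entries to score")
    return sum(1 for i in idx if original[i] == imputed[i]) / len(idx)


def mean_percent_difference(original: Sequence[float], imputed: Sequence[float],
                            idx: Sequence[int]) -> float:
    """Average absolute percent difference for masked continuous entries."""
    # Assumption: entries whose original value is 0 are skipped.
    diffs = [abs(imputed[i] - original[i]) / abs(original[i]) * 100
             for i in idx if original[i] != 0]
    if not diffs:
        raise ValueError("no non-zero masked entries to score")
    return sum(diffs) / len(diffs)
```

Repeating the mask, impute, score cycle with different seeds mirrors the 100 iterations per variable described above.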

Obtaining and visualizing model coefficients

After MICE imputation, we used the ‘psfmi’ package [36] to implement our logistic regression models, obtaining pooled odds ratios using Rubin’s Rules [37]. Each model was Bonferroni corrected using a baseline of p<0.05. For the Residence model, we set lifetime rural dwellers as the baseline and manually calculated contrasts for the other three comparisons. To illustrate the covariate effects from our models, we extracted the first imputed dataset from the MICE output, used the R ‘glm’ function to implement the logistic regression models, and then used the ‘effects’ package [38] to visualize the odds of Active TB.
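
For readers unfamiliar with Rubin’s Rules, the pooling step can be sketched as follows: the pooled estimate is the mean of the per-imputation estimates, and its variance combines the average within-imputation variance with the between-imputation variance inflated by (1 + 1/m). This is a minimal illustration and not the ‘psfmi’ implementation; the function names are hypothetical, and the confidence interval uses a normal approximation (z = 1.96) rather than the t-based degrees of freedom of the full method.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               std_errors: Sequence[float]) -> Tuple[float, float]:
    """Pool per-imputation estimates; return (pooled estimate, pooled SE)."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])      # average within-imputation variance
    between = variance(estimates) if m > 1 else 0.0    # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


def pooled_odds_ratio(log_odds: Sequence[float], std_errors: Sequence[float],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) from log-odds coefficients of each imputed dataset."""
    est, se = pool_rubin(log_odds, std_errors)
    return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)
```

Pooling is done on the log-odds scale and exponentiated afterwards, because the coefficients, not the odds ratios, are approximately normally distributed.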

Genetic data processing & ancestry estimation

A subset of participants (n = 159) was genotyped for >2 million single nucleotide polymorphisms (SNPs) on the Illumina H3Africa array. Genetic data processing involved DNA extraction from saliva samples, common variant calling with GenomeStudio, rare variant calling with zCall, and further data cleaning using plink2 (Supplementary Methods in S1 Text). Global (i.e., genome-wide) ancestry estimates were calculated using ADMIXTURE v1.13 [39]. The Luhya, Maasai, Himba, British, Palestinian, Chinese, Bangladeshi, Tamil, Ju|’hoansi San, Khomani San, and Nama populations were used as reference groups encompassing all major ancestry sources. ADMIXTURE was run in groups of maximally unrelated individuals to avoid biasing the ancestry estimates. We assumed k = 5 possible ancestries, inferred in unsupervised mode for each of the running groups. After matching clusters, we merged ancestry estimates across all running groups, averaging individuals that appeared in multiple running groups using pong [40]. We further tested whether population stratification affected the results of the logistic regression models by including 10 principal components (computed with plink2) and re-computing regressions for just the subset of n = 159 individuals (Supplementary Methods in S1 Text).

Results

TB case-control classification

1,126 participants were partitioned into preliminary cases, preliminary controls, and unverified TB status (571, 504, and 51 respectively). After excluding participants with unverified TB status and those with either unknown or positive HIV status, 774 participants remained in the study (374 cases and 400 controls; Table A in S1 Text).

Socio-behavioral and demographic characteristics of the cohort

Men and women were equally represented in the dataset (384 vs. 390, respectively, Table A in S1 Text). Men were more likely to drink alcohol (p < 0.001) and smoke (p < 0.05). A high fraction of our participants self-reported smoking (67%) and drinking alcohol (46%); smoking and drinking were moderately correlated with each other (r = 0.36, p < 0.05; Fig A in S1 Text). Women were more likely to have diabetes (p = 0.0004; Fig A in S1 Text) and, on average, had more education than men (female mean = 8.2 years, male mean = 7.8 years).

We use the number of years of education as a proxy for socio-economic status “SES” (Methods). The mean educational attainment in our cohort is 8 years, equivalent to completing primary school, and is similar between individuals recruited in rural areas and towns (ANOVA, p > 0.1). In the ZF Mgcawu District census [41], 13% of adults have not completed primary school as compared to 25.3% of our participants. Age was moderately correlated with SES (r = -0.5, p < 0.05; Fig A in S1 Text) such that older participants tended to have lower SES. Cases and controls had similar mean ages, 43.6 and 43.1 years respectively (Wilcoxon rank sum p-value = 0.959) (Fig 2A). We found a significant difference in SES between cases and controls, with means of 7.7 years and 8.3 years, respectively (Wilcoxon rank sum p-value = 0.0019) (Fig 2B).

Fig 2.

Density plots of continuous variables A) Age by case-control status B) SES by case-control status.

To investigate the possibility of selection bias we computed empirical odds of active TB by age group. Assuming that age is a cumulative measure of exposure (that is, capturing the amount of time someone is exposed to TB), the empirical odds of TB should increase monotonically with age. Instead, we observe a non-monotonic trend: the odds of active TB increase progressively from ages 18 up to 38, then reverse, decreasing progressively from age 39 onward, with the 79–88 age group having the lowest empirical odds (Fig B in S1 Text).

Ethnicity and genetic ancestry

Individuals were asked to self-identify their ethnicity without categorical prompts. 88.4% of participants (both TB cases and controls) self-identify as Coloured, followed by 4.2% as a Khoe-San ethnicity (e.g., Nama, San), 4.6% as Tswana, 1.3% as Xhosa, and 1.9% as “other”. Whilst we acknowledge that in some contexts the term “Coloured” has derogatory connotations, in South Africa it is a recognized ethnicity as well as a racial category. People who self-identify using this term tend to have genetic ancestries from multiple geographic origins, including the indigenous Khoe-San groups (e.g., Khoekhoe, San), Bantu-speaking, European, Indian, Malaysian (Southeast Asian) slaves, or people of mixed ancestry and their descendants [42]. The use of “Coloured” in this context reflects the self-identified cultural attributes of the participants, as well as possible historical and genetic attributes. Ethnicity is reported out of respect for participants’ choice of identity.

Genetic ancestry characterization was performed for 159 participants to assess if there was significant variation among clinics which could potentially confound analysis. Since genetic information was not available for all participants in the study, individual ancestry was not directly factored into the logistic regression models and random forest models. Khoe-San ancestry varied across clinic locations (Fig 3A) but remained the majority ancestry at each site (mean = 56%), followed by Bantu-speaking African ancestry (mean = 21%), European ancestry (mean = 16%), South Asian ancestry (mean = 5%), and East Asian ancestry (mean = 2%) (Fig 3B).

Fig 3. Khoe-San ancestry is the primary genetic ancestry in clinics from the Northern Cape, South Africa.

A subset of participants (n = 159) was genotyped to obtain the average genome-wide ancestry proportions across all individuals for each clinic. A) Khoe-San ancestry is the largest proportion of ancestry in our sample and varies significantly across study sites. The boxplots show the median, the 25th and 75th percentile and 1.5 times said percentile, and all outliers as dots. B) The study population is admixed with 5 distinct ancestries, with the southern African indigenous Khoe-San ancestry being the highest proportion of ancestry at all study sites.

There was statistically significant variation in ancestry at the clinic level (ANOVA: Khoe-San, F = 6.9, p < 1e-06; Bantu-speaking, F = 7, p < 6e-05; European, F = 5, p < 6e-05; South Asian, F = 8, p < 4.8e-08; East Asian, F = 4.5, p < 0.001, Fig 3A). The statistically significant differences in the proportion of mean ancestry were generally between clinics in the Kalahari (Askham and Rietfontein, except Groot Mier) versus clinics along the Orange River (Harry Surtie, Dorp, Kakamas, and Keimoes) determined by Tukey HSD post hoc tests. The Kalahari clinics tended to have more Khoe-San ancestry (20–30% more) than the Orange River clinics and less Bantu-speaking African ancestry (~20% less). Notably, Groot Mier in the Kalahari had more European (11–19% more) and greater South and East Asian ancestry (4–10% and 2–3% more, respectively) than most other clinics (Fig 3B), likely due to Groot Mier’s history as an early European colonial post [43].

Hypotheses and models of progression to active TB

We designed three logistic regression models [44] to examine the risk factors that determine TB case/control status in our sample. Our first model, which we termed the “common risk factor” model (n = 774), includes six covariates known to be common behavioral or demographic risk factors for TB. We hypothesized that risk factors identified from prior studies are also significantly associated with active TB progression in our HIV-negative population.

Common risk factor model: TB Status ~ gender + smoking + diabetes + residence + age + SES

Health disparities are one of the many consequences of apartheid in South Africa [45, 46]. The end of apartheid circa 1994 improved social mobility and educational access; however, health disparities in the Northern Cape (and other provinces) still remain problematic [47]. We formulated two alternative models which included variables potentially important to South African populations, involving the change in SES over time, and migration between rural and urban areas. We hypothesized that there are differential effects of SES (education) on TB status due to the sociopolitical effects of apartheid. To capture the effect of lived experience vis-à-vis apartheid on TB outcomes we designed an “SES model” (n = 774). This model includes the common risk factors as above but allows for an interaction between age and SES to account for changing economic conditions over the past eighty years. For example, completion of a high school equivalent education in 1960 did not afford the same economic benefits as completion of a high school education in 2010. We predict that for younger cohorts, higher SES is protective against TB; in contrast, for individuals born during apartheid, higher SES would have little effect on lifetime TB status. Age was kept as a continuous variable because apartheid was not a historically binary event.

SES model: TB Status ~ common risk factor model + age * SES

Residing in an urban or rural environment is an established risk factor for TB status [11, 48–50] and was included in our common risk factor model, as above. Previously, we have shown that migration from an individual’s natal town has increased over the past two generations in the rural Northern Cape Province [31]. In addition, a recent longitudinal study leveraging data from South Africa’s National Health Laboratory Service showed that incorporating cross-municipality migration improves the ability to predict TB incidence [51]. For our “residence model”, we hypothesized that migrating from a rural to urban area in one’s lifetime increases the odds of active TB status due to greater exposure to M.tb. Here, we include an interaction between current residence and birthplace in the common risk factor model. Setting this interaction allows us to examine four patterns, namely: lifetime rural residence, rural birthplace to urban residence, urban birthplace to rural residence, and lifetime urban residence.

Residence Model: TB Status ~ common risk factor model + residence * birthplace

The common risk factor model (pseudo R2 = 19%, n = 774, Table 1) performed slightly worse than the SES model (pseudo R2 = 21%, n = 774, Table 2). The residence model had a smaller sample size (pseudo R2 = 19%, n = 720, Table 2) than the common risk factor model due to missing birthplace data for some individuals. For an equal comparison, we re-ran the common risk factor and SES models with the same individuals as in the residence model. For the reduced dataset, the residence model and SES model had comparable pseudo R2 while the common risk factor model had a slightly worse value (pseudo R2 = 18% vs. 19%) (Table B in S1 Text). Therefore, we present results from all three models and contrast the variable effects (Tables 1 and 2). All significance levels were Bonferroni corrected, assuming an α = 0.05.

Table 1. Odds ratios and p-values for the demographic and socio-behavioral variables used in the common risk factor model.

Common Risk Factor (n = 774)
Variable | Odds Ratio (95% CI) | p value
(Intercept) | 0.23 [0.08, 0.66] | 0.007
Residence | 3.20 [2.26, 4.55] | p < 0.001**
Gender | 3.01 [2.20, 4.12] | p < 0.001**
Age | 0.99 [0.97, 1.00] | 0.046
Years of education (SES) | 0.91 [0.86, 0.97] | 0.006*
Smoker | 1.32 [0.94, 1.85] | 0.111
Diabetes | 1.27 [0.67, 2.43] | 0.462
n | 774 |
R2 | 0.19 |

* alpha = 0.05

** alpha = 0.01
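
As a worked illustration of the Bonferroni correction, the sketch below divides α by the number of tested terms and flags the p values from Table 1. The number of tests is an assumption here (the six covariates, excluding the intercept); the text does not state the exact count used. Under that assumption the threshold is 0.05/6 ≈ 0.0083, which is consistent with the single-asterisk flags in the table. P values reported as “p < 0.001” are entered as the upper bound 0.001, and the function names are hypothetical.

```python
from typing import Dict


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def flag_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag each term whose p value falls below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# P values from Table 1; "p < 0.001" is entered as its upper bound.
TABLE_1_P_VALUES = {
    "Residence": 0.001,
    "Gender": 0.001,
    "Age": 0.046,
    "Years of education (SES)": 0.006,
    "Smoker": 0.111,
    "Diabetes": 0.462,
}

if __name__ == "__main__":
    for name, significant in flag_significant(TABLE_1_P_VALUES).items():
        print(name, significant)
```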

Table 2. Odds ratios and p-values for the demographic and socio-behavioral variables used in the residence model and SES model.

Variable | Residence Model: Odds Ratio (95% CI) | Residence Model: p value | SES Model: Odds Ratio (95% CI) | SES Model: p value
(Intercept) | 0.36 [0.12, 1.12] | 0.077 | 3.69 [0.50, 27.36] | 0.197
Birthplace^ | 2.84 [1.28, 6.21] | 0.005* | - | -
Residence^ | 6.10 [2.97, 12.5] | p < 0.001** | 3.29 [2.30, 4.71] | p < 0.001**
Gender^ | 3.04 [2.19, 4.21] | p < 0.001** | 3.02 [2.20, 4.15] | p < 0.001**
Age | 0.98 [0.97, 1.00] | 0.022 | 0.94 [0.91, 0.97] | p < 0.001**
Years of education (SES) | 0.92 [0.86, 0.98] | 0.013 | 0.66 [0.54, 0.81] | p < 0.001**
Smoker | 1.32 [0.93, 1.89] | 0.124 | 1.22 [0.85, 1.73] | 0.274
Diabetes | 1.49 [0.74, 3.00] | 0.267 | 1.28 [0.67, 2.45] | 0.453
Years of education: Age | - | - | 1.01 [1.00, 1.01] | p < 0.001**
Birthplace: Residence | 0.33 [0.13, 0.86] | 0.024 | - | -
n | 720 | | 774 |
R2 | 0.19 | | 0.21 |

* alpha = 0.05

** alpha = 0.01

^ We report the OR with a baseline of rural = 1, for Birthplace; a baseline of town = 1, for Residence; for Gender the OR is listed for males, and for Smoking and Diabetes the OR is listed for a positive response.

Common risk factors

Across the three models, males consistently have about three times the odds of active TB compared with females (OR = 3.01 [2.20, 4.12], p < 0.001; Tables 1 and 2, and Fig C in S1 Text). All logistic regression models showed insufficient statistical evidence for an effect of smoking (common risk factor model: OR = 1.32 [0.94, 1.85], p = 0.11; Tables 1 and 2) or diabetes (common risk factor model: OR = 1.27 [0.67, 2.43], p = 0.46; Tables 1 and 2) on TB risk. Despite the lack of significance, we note that smoking had an effect size in the expected direction (Fig C in S1 Text). The variable with the strongest effect size was current residence in towns, i.e., areas with a population size greater than 2,000 people (OR = 3.20 [2.26, 4.55], p < 0.0001; Tables 1 and 2 and Fig C in S1 Text).

Age interacts with SES

In the common risk factor model, age does not significantly affect TB risk, but higher SES has a protective effect (OR = 0.91 [0.86, 0.97], p = 0.006; Table 1). In the SES model, SES significantly affects TB status depending on age group (OR = 1.01 [1.00, 1.01], p < 0.001, Table 2). The effect is such that higher SES at younger ages (18–59 years old) is protective against TB, and higher SES at older ages (>59 years) increases risk (Fig 4).
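
To see how an interaction term translates into an age-specific effect, note that in a logistic model with an age × SES term, the odds ratio for one additional year of education at a given age is the SES odds ratio multiplied by the interaction odds ratio raised to the power of age. The sketch below applies this to the SES-model values in Table 2, assuming age enters the model in uncentered years (an assumption; the coding is not stated in this section). Because the interaction odds ratio is rounded to two decimals in the table, the output is only a rough illustration of the direction of the effect and will not reproduce the exact crossover age shown in Fig 4. The function name is hypothetical.

```python
def education_or_at_age(or_ses: float, or_interaction: float, age: float) -> float:
    """Odds ratio per extra year of education at a given age.

    Equivalent to exp(b_ses + b_interaction * age) on the log-odds scale.
    """
    return or_ses * or_interaction ** age


if __name__ == "__main__":
    # Rounded SES-model values from Table 2.
    OR_SES, OR_INTERACTION = 0.66, 1.01
    for age in (20, 40, 60, 80):
        print(age, round(education_or_at_age(OR_SES, OR_INTERACTION, age), 2))
```

A value below 1 indicates that additional education is protective at that age, and a value above 1 indicates increased odds, matching the qualitative pattern described above.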

Fig 4. Logistic regression interaction plots.

A) The odds of active TB by education level vary across age groups (shown above by the different color lines). More years of education decreases the odds of active TB in younger age groups, but this pattern reverses in the oldest age groups. B) Effect plot from the residence model visualizing an interaction term between birthplace and current residence. Regardless of birthplace, the odds of active TB are highest in individuals who currently reside in towns. Individuals born in towns and currently residing in rural areas have the lowest odds of active TB.

TB risk by residence and birthplace

We analyzed the relationship between TB status and a change of locality between birthplace (rural area or town) and current residence (rural area or town) during an individual’s lifetime. We expected to see a difference in the odds of active TB between lifelong residents and those who have moved between locales. Under such a model, lifelong rural dwellers would have the lowest odds and lifelong town dwellers would have the highest odds. We set an interaction term between current residence and birthplace classified into town/rural, (OR = 0.33 [0.13–0.86], p = 0.024, Table 2). To break down the interaction effect, we set lifelong residents of rural areas as the baseline, comparing the three other residence patterns with this baseline group. We found that lifelong town dwellers had about twice the odds of active TB (OR = 2.16 [1.43–3.28], p < 0.001) relative to the baseline. Individuals born in a rural area and currently residing in a town had similar outcomes as lifelong town dwellers (OR = 2.19 [1.14–4.20], p = 0.018). Taken together, these show that town residence increases risk regardless of birthplace. Interestingly, individuals who were born in a town and later moved to rural areas are even more protected than individuals born and currently residing in rural areas (OR = 0.33 [0.16–0.71], p = 0.004) (Fig 4).

Random forest modeling

As an alternative to logistic regression, we trained a random forest model to classify TB status using all the variables from the common risk factor model. We configured the model to grow 5000 classification trees (Supplementary Methods in S1 Text). The model ranked gender, current residence, SES, and age as the most important predictors (Fig 5). Diabetes and smoking were classified as uninformative predictors of TB status. The model had an overall “out-of-bag” misclassification rate of 23%.
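The "out-of-bag" error relies on the fact that each tree leaves out part of the cohort. As a rough sketch, assuming each tree is grown on a bootstrap sample of the same size as the cohort (a common default that is not stated in the text), the share of trees for which a given participant is out-of-bag can be estimated as follows; the function name is illustrative.

```python
import math


def prob_out_of_bag(n_samples: int) -> float:
    """Probability that one participant is absent from a bootstrap sample
    of size n_samples drawn with replacement from n_samples participants."""
    return (1.0 - 1.0 / n_samples) ** n_samples


N_PARTICIPANTS = 774  # cohort size reported in the text
N_TREES = 5000        # number of trees reported in the text

p_oob = prob_out_of_bag(N_PARTICIPANTS)
expected_oob_trees = p_oob * N_TREES

print(f"P(out-of-bag for one tree) = {p_oob:.3f}")
print(f"Expected out-of-bag trees per participant = {expected_oob_trees:.0f}")
print(f"Large-sample limit exp(-1) = {math.exp(-1):.3f}")
```

Under this assumption, each participant is classified by a little over a third of the trees, none of which saw that participant during training, so the out-of-bag misclassification rate behaves like a built-in validation estimate.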

Fig 5. Random forest model with common risk factor variables (n = 774).

Random subsets of all 6 variables on the y-axis were used to grow 5000 trees to classify participants into cases and controls. The model had an overall “out-of-bag” misclassification rate of 23%. Variables with higher variable importance are most crucial for case-control classification. Predictor variables with negative variable importance values worsen the ability of the model to classify TB status.

Discussion

We examined common demographic risk factors for TB, constructing what is, to our knowledge, the largest TB epidemiological study in a Northern Cape clinical population (n = 774). Using logistic regression and random forest models, we show that gender, SES, and current residence locality are significant variables or important predictors of TB risk. Neither smoking nor diabetes is associated with increased TB risk in any model. Of the three logistic regression models, the two with interaction terms, SES by age (“SES model”) and birthplace by residence (“residence model”), had similar explanatory power, and both improved on the common risk factor model.

In South African townships, M.tb is community spread [28, 29] and about 88% of adults are infected by ages 31–35 [30]. Here, we demonstrate the utility of sampling in high disease incidence populations to rapidly build datasets with large sample sizes of TB cases and controls with similar pathogen exposure. Validating M.tb exposure in controls is traditionally done by a tuberculin skin test (TST) and/or interferon-gamma release assay (IGRA; e.g., QuantiFERON), thereby assigning LTBI status. We did not perform IGRA and TST tests for all controls in our cohort because IGRA testing requires blood draws and is often prohibitively expensive for large cohort studies, and TST is not readily available in South Africa. To validate the LTBI rate in our controls, we IGRA-tested a random selection of our sample controls (n = 70) within 3 days of each participant’s GeneXpert result and found that they have an 87% LTBI rate (IGRA+). The highly significant model results, in combination with the IGRA+ subset of controls, suggest that our sampling strategy reliably categorizes TB disease risk.
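Because the LTBI rate comes from a subsample of 70 controls, it carries sampling uncertainty. The sketch below illustrates one common way to quantify it, a Wilson score interval. It is not part of the reported analysis, and the count of 61 IGRA-positive controls is an assumption inferred from the reported 87%, not a figure given in the text.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1.0 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


N_TESTED = 70
# Assumption: 61 IGRA-positive controls, inferred from the reported 87%.
N_POSITIVE = 61

low, high = wilson_interval(N_POSITIVE, N_TESTED)
print(
    f"IGRA+ proportion = {N_POSITIVE / N_TESTED:.3f}, "
    f"approx. 95% interval {low:.3f} to {high:.3f}"
)
```

An interval of this kind indicates how closely the subsample estimate can be compared with the roughly 88% infection prevalence cited for township adults.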

We considered whether our sampling strategy displayed any indication of selection bias. We hypothesized that the risk of active TB should monotonically increase with age, reflecting cumulative lifetime exposure to the M.tb pathogen. Contrary to this, our findings revealed a non-monotonic relationship. Starting with the youngest age group (18–28 years), the empirical odds of active TB increased with age, peaking in the 29–38-year-old age group, followed by a progressive decline in empirical odds through the oldest age group (79–88 years) (Fig B in S1 Text). This unexpected pattern may indicate selection bias, possibly driven by survivor bias (individuals who died from TB are absent from the study population in older age bins) and/or by controls that are not representative of the general population (sample-selection bias). Generally, selection bias is difficult to measure and mitigate [52–54], especially in case-control studies where controls are recruited from a clinical setting. Recall bias in older adults could also lead to the lower observed empirical OR in older age bins; however, because TB treatment is a six-month course, and incomplete treatment regimens lead to relapse or life-threatening drug-resistant TB, it is generally a life event that people remember, with the exception of pediatric TB.

SES model effects

Age-specific TB risk varies across the lifespan. The greatest risk of TB is during infancy; risk decreases through adolescence, then increases and peaks between 25–35 years old, followed by a decrease and another peak after 65 years [48, 55]. Age was not a significant variable in the logistic regression except when interacting with SES. SES’s protective effect on TB risk is most evident among 18–39-year-olds, but the trend reverses and higher SES increases risk among the eldest individuals (>69 years; Fig 4A), those who grew up and reached adulthood during Apartheid. Higher SES increasing TB risk at older ages is contrary to findings in populations in the United States and Mexico [55]. This unique pattern may reflect South Africa’s recent history of Apartheid and post-Apartheid societal and economic shifts. During Apartheid, individuals from historically marginalized backgrounds had limited career options, but some were able to become teachers, police officers, or nurses. Such occupations are associated with higher education requirements and would have facilitated access to larger salaries, transportation, and mobility, potentially leading to better diagnosis and treatment. Alternatively, the observed pattern of our interaction effect at older ages could be explained by selection bias or our operationalization of SES in this study. Highest completed level of education (e.g., grade, diploma, degree, etc.) is a blunt measure of SES and does not fully capture all SES facets, including social, economic, and cultural capital [56, 57]; in addition, universal access to education increased post-Apartheid [58]. Finally, we only sampled from community health clinics, not private clinics, thereby missing a fraction of the SES spectrum.

Residence model effects

Consistent with previous research [21, 59–61], we find TB risk is associated with living in larger towns. In our prior work, mobility in the Northern and Western Cape populations changed over the past 3 generations, with the highest levels of mobility in the grandparental generation [62]. Therefore, we tested whether mobility (operationalized as a different birthplace and residence) affected TB risk. Individuals currently residing in towns (regardless of birthplace) had higher odds of active TB compared to those born in towns who migrated to rural areas and to lifetime rural dwellers. Unexpectedly, the individuals with the lowest odds of active TB are those born in a town who moved to a rural area (Fig 4B). When we returned results to the community, the clinic staff hypothesized that despite nationally standardized BCG vaccinations, rural areas may have lower vaccination rates (observation communicated by clinical staff in the study catchment); those born in town would therefore benefit from a greater likelihood of BCG vaccination during childhood and from low adult M.tb exposure while living in rural areas. The benefits of the BCG vaccine, however, attenuate in adolescence, and memory of childhood TB episodes suffers from recall bias. Another possibility is that the town-born and rural-residence group accrued more wealth in towns before moving to a rural area, affording a different lifestyle than their rural neighbors (e.g., larger homes, less crowding, cleaner cooking fuel, etc.). This unique combination of factors may explain why the town-born rural-residence group has even lower odds of active TB compared to the lifetime rural dwellers. Future work should consider collecting birthplace in addition to current residence to better identify TB risk, as M.tb exposure varies across the lifespan.

Common risk factors

Across global studies, men are on average 1.7 times more likely to have TB [63–65]. Sex biases like this are common in other infectious diseases [66, 67] and are attributable to an intersection of sex (biological factors, e.g., immune function) and gender (social and behavioral factors, e.g., risk-taking behavior) [68]. Despite smoking not being a significant TB risk, we found 75.5% of men smoke compared to 55.8% of women, indicating at least some gender differences in risky behaviors in the Northern Cape population.

Smoking and alcohol consumption have been shown to increase TB risk and mortality in the Northern Cape and at the national level [18, 69–71]. In our models, smoking was not significantly associated with TB risk, and alcohol was excluded from our models due to high missingness. Self-reporting biases in observational studies like this one are a concern for variables like smoking, alcohol consumption, and SES measures [72]. Our sample, however, reports much higher levels of smoking compared to large-scale national surveys (e.g., [73]; men: 75.5% vs. 41%; women: 55.8% vs. 21%, respectively), suggesting minimal self-report bias in our study. The weak effect of smoking observed from our models may be due to our method of binary classification. We collected fine-scale smoking phenotypes (Supplementary Methods in S1 Text), but because of the high missingness of these phenotypes, we ultimately classified participants as Smokers/Non-smokers. This stratification may mask the heterogeneity of smoking behaviors, such as casual and binge substance use or differences in the types of smoking materials consumed.

Ancestry & ethnicity

Finally, we highlight that our study included enrollment from 12 different clinics, some of which are more than 250 kilometers apart. We surveyed ethnicity and genetic ancestry to test for population structure in the sample. Such structure can confound analysis if genetic ancestry tracks differential host risk for progression to TB or if different ethnicities have different cultural norms. Previous studies have described the high proportion of Khoe-San ancestry in Northern Cape communities, but these largely focused on descendant groups who identify as Khoe-San (e.g., the ≠Khomani San, the Nama, Karretjie) rather than the general population [74]. Here, we show the clinical study population to be admixed with 4 other distinct ancestries (Fig 3), reflecting recent historical events. These include the Bantu expansion into Southern Africa, European colonialism, the Dutch East India Company (aka VOC) slave trade, and the displacement and forced settlement of indigenous South African Khoe-San groups, especially in the last few generations in the Northern Cape [43, 75]. Although we do observe heterogeneity in ancestry across clinics, correcting for the top 10 genetic principal components did not change the logistic regression results (Fig D in S1 Text and S1 Data). To our knowledge, this is the first study to report ancestry proportions of clinical populations in the Northern Cape Province, South Africa. This work provides a baseline to design future studies, such as exploring host genetic correlates of active TB progression in this population (Supplementary Discussion in S1 Text).

Conclusion

Active TB progression is a multifactorial process involving the environment, genetics, and their interaction [1, 7]. Our results from the NCTB cohort indicate that sociodemographic variables strongly impact active TB risk. Effects that are unique to the Northern Cape Province may reflect how changes in the pre- to post-apartheid environment modified social factors, such as SES and mobility, which in turn impacted lifetime TB risk.

Supporting information

S1 Checklist

(DOCX)

S1 Data

(XLSX)

S1 Text

(DOCX)

Acknowledgments

We would like to thank all the participant communities in the Northern Cape for their continued trust and support in helping us undertake this project. We would also like to thank our community research assistants and translators who assisted in data collection for the project. We are grateful to Prof. Faadiel Essop, Dr. Desiree Petersen, Prof. Eileen Hoal, and Prof. Leslie Swartz for closely reading this manuscript. We would also like to thank Dr. Chris Gignoux and Dr. Mark Grote for statistical advice. Finally, we want to thank the Department of Health in the Northern Cape Province, South Africa for their continued support of the project.

Data Availability

All R scripts for statistical data analysis and visualization are available at https://github.com/oshiomah1/NCTB-Epidemiology-Project. The relevant raw genetic data are deposited in the European Genome-phenome Archive (study accession number: EGAS00001007850). To maintain the privacy and anonymity of our study participants, and following our IRB-approved protocol, epidemiological data are available upon reasonable request. For access, please contact the Stellenbosch University Health Research Ethics Office at ethics@sun.ac.za and Dr. Marlo Möller at marlom@sun.ac.za.

Funding Statement

This work was funded by NIH grant R35GM133531 to BMH. This work was also partially funded by the South African government through the South African Medical Research Council and the National Research Foundation (UID41744) to all members of DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research: MM, CU, LW and JS. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the South African government. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. World Health Organization. Global Tuberculosis Report 2021 [Internet]. 2021. Oct [cited 2022 Aug 8]. Available from: https://www.who.int/publications/i/item/9789240037021
  • 2. Sharma SK, Mohan A. Extrapulmonary Tuberculosis. Indian J Med Res. 2004. Nov 11;120(4):316.
  • 3. Stallings CL, Glickman MS. Is Mycobacterium tuberculosis stressed out? A critical assessment of the genetic evidence. Microbes Infect Inst Pasteur. 2010. Dec;12(14–15):1091–1101.
  • 4. Statistics South Africa. Mortality and Causes of death in South Africa: Findings from death notification 2018 [Internet]. 2018. [cited 2022 Apr 4]. Available from: https://www.statssa.gov.za/publications/P03093/P030932017.pdf
  • 5. National Institute for Communicable Diseases. Microbiologically Confirmed Pulmonary TB—South Africa. 2019. TB Online Surveillance Dashboard. Available from: https://www.nicd.ac.za/tb-surveillance-dashboard/
  • 6. Kim H, Tanser F, Tomita A, Vandormael A, Cuadros DF. Beyond HIV prevalence: identifying people living with HIV within underserved areas in South Africa. BMJ Glob Health. 2021. Apr 1;6(4):e004089. doi: 10.1136/bmjgh-2020-004089
  • 7. Glaziou P, Falzon D, Floyd K, Raviglione M. Global epidemiology of tuberculosis. Semin Respir Crit Care Med. 2013;34(1):3–16. doi: 10.1055/s-0032-1333467
  • 8. Lacerda SNB, De Abreu TRC, De Figueiredo TMRM, De Luna FDT, De Sousa MAN, De Abreu LC, et al. Individual and social vulnerabilities upon acquiring tuberculosis: A literature systematic review. Int Arch Med. 2014. Jul;7(1):1–8.
  • 9. Bloom BR, Atun R, Cohen T, Dye C, Fraser H, Gomez GB, et al. Tuberculosis. In: Holmes KK, Bertozzi S, Bloom BR, Jha P, editors. Major Infectious Diseases [Internet]. 3rd ed. Washington (DC): The International Bank for Reconstruction and Development / The World Bank; 2017. [cited 2024 Mar 18]. Available from: http://www.ncbi.nlm.nih.gov/books/NBK525174/
  • 10. Boelaert JR, Vandecasteele SJ, Appelberg R, Gordeuk VR. The effect of the host’s iron status on tuberculosis. J Infect Dis. 2007. Jun 15;195(12):1745–53. doi: 10.1086/518040
  • 11. Chadha VK, Kumar P, Jagannatha PS, Vaidyanathan PS, Unnikrishnan KP. Average annual risk of tuberculous infection in India. Int J Tuberc Lung Dis. 2005. Jan;9(1):116–8.
  • 12. Hanifa Y, Grant AD, Lewis J, Corbett EL, Fielding K, Churchyard G. Prevalence of latent tuberculosis infection among gold miners in South Africa. Int J Tuberc Lung Dis. 2009. Jan;13(1):39–46.
  • 13. Isler MA, Rivest P, Mason J, Brassard P. Screening employees of services for homeless individuals in Montréal for tuberculosis infection. J Infect Public Health. 2013. Jun;6(3):209–15.
  • 14. Bruchfeld J, Correia-Neves M, Källenius G. Tuberculosis and HIV Coinfection. Cold Spring Harb Perspect Med. 2015. Jul;5(7):a017871. doi: 10.1101/cshperspect.a017871
  • 15. Bhargava A, Benedetti A, Oxlade O, Pai M, Menzies D. Undernutrition and the incidence of tuberculosis in India: national and subnational estimates of the population-attributable fraction related to undernutrition. Natl Med J India. 2014;27(3):128–33.
  • 16. Dhlakama H, Lougue S, Mwambi HG, Ogunsakin RE. A Multilevel Analysis of the Associated and Determining Factors of TB among Adults in South Africa: Results from National Income Dynamics Surveys 2008 to 2017. Int J Environ Res Public Health. 2022. Aug 25;19(17):10611. doi: 10.3390/ijerph191710611
  • 17. Dhlakama H, Lougue S. Bayesian Modelling of Tuberculosis Risk Factors in South Africa 2014. Int J Stat Med Res. 2017. Feb 27;6(1):34–48.
  • 18. Harling G, Ehrlich R, Myer L. The social epidemiology of tuberculosis in South Africa: A multilevel analysis. Soc Sci Med. 2008;66(2):492–505. doi: 10.1016/j.socscimed.2007.08.026
  • 19. Mahomed H, Hawkridge T, Verver S, Geiter L, Hatherill M, Abrahams DA, et al. Predictive factors for latent tuberculosis infection among adolescents in a high-burden area in South Africa. Int J Tuberc Lung Dis. 2011. Mar;15(3):331–6.
  • 20. Moyo S, Ismail F, Walt MV der, Ismail N, Mkhondo N, Dlamini S, et al. Prevalence of bacteriologically confirmed pulmonary tuberculosis in South Africa, 2017–19: a multistage, cluster-based, cross-sectional survey. Lancet Infect Dis. 2022. Aug 1;22(8):1172–80. doi: 10.1016/S1473-3099(22)00149-9
  • 21. Ncayiyana JR, Bassett J, West N, Westreich D, Musenge E, Emch M, et al. Prevalence of latent tuberculosis infection and predictive factors in an urban informal settlement in Johannesburg, South Africa: a cross-sectional study. BMC Infect Dis. 2016. Nov 8;16(1):661. doi: 10.1186/s12879-016-1989-x
  • 22. Fok A, Numata Y, Schulzer M, FitzGerald MJ. Risk factors for clustering of tuberculosis cases: a systematic review of population-based molecular epidemiology studies. Int J Tuberc Lung Dis. 2008. May 1;12(5):480–92.
  • 23. WHO. Global Tuberculosis Report 2013. World Health Organization; 2013.
  • 24. Diedrich CR, Flynn JL. HIV-1/mycobacterium tuberculosis coinfection immunology: how does HIV-1 exacerbate tuberculosis? Infect Immun. 2011. Apr;79(4):1407–17. doi: 10.1128/IAI.01126-10
  • 25. South African National Department of Health, South African Medical Research Council, Human Sciences Research Council, National Institute for Communicable Diseases, World Health Organization, United States Agency for International Development, et al. The First National TB Prevalence Survey | South Africa 2018 [Internet]. 2018. [cited 2021 Sep 11]. Available from: https://www.nicd.ac.za/wp-content/uploads/2021/02/TB-Prevalence-survey-report_A4_SA_TPS-Short_Feb-2021.pdf
  • 26. Squires A. Epidemiology of tuberculosis in Northern Cape Province [Internet]. 2011. Available from: https://www.researchgate.net/profile/Adeline-Squires/publication/272821340_Epidemiology_of_tuberculosis_in_Northern_Cape_Province/links/54f004100cf2432ba657c798/Epidemiology-of-tuberculosis-in-Northern-Cape-Province.pdf
  • 27. National Institute for Communicable Diseases [Internet]. Microbiologically Confirmed Pulmonary TB—Province. 2017. TB Online Surveillance Dashboard. Available from: https://www.nicd.ac.za/tb-surveillance-dashboard/
  • 28. Verver S, Warren RM, Munch Z, Richardson M, et al. Proportion of tuberculosis transmission that takes place in households in a high-incidence area. The Lancet. 2004. Jan 17;363(9404):212–4.
  • 29. Middelkoop K, Mathema B, Myer L, Shashkina E, Whitelaw A, Kaplan G, et al. Transmission of Tuberculosis in a South African Community With a High Prevalence of HIV Infection. J Infect Dis. 2015. Jan 1;211(1):53–61. doi: 10.1093/infdis/jiu403
  • 30. Wood R, Liang H, Wu H, Middelkoop K, Oni T, Rangaka MX, et al. Changing prevalence of tuberculosis infection with increasing age in high-burden townships in South Africa. Int J Tuberc Lung Dis. 2010. Apr;14(4):406–12.
  • 31. Smith MH, Myrick JW, Oyageshio O, Uren C, Saayman J, Boolay S, et al. Epidemiological correlates of overweight and obesity in the Northern Cape Province, South Africa. PeerJ. 2023. Feb 9;11:e14723. doi: 10.7717/peerj.14723
  • 32. Bi Q, Goodman KE, Kaminsky J, Lessler J. What is Machine Learning? A Primer for the Epidemiologist. Am J Epidemiol. 2019. Dec 31;188(12):2222–39. doi: 10.1093/aje/kwz189
  • 33. Buuren S van, Groothuis-Oudshoorn K. mice: Multivariate Imputation by Chained Equations in R. J Stat Softw. 2011. Dec 12;45:1–67.
  • 34. Rubin DB. Multiple Imputation for Nonresponse in Surveys [Internet]. John Wiley & Sons, Ltd; 1987. [cited 2024 Apr 19]. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/9780470316696.fmatter
  • 35. Burgette LF, Reiter JP. Multiple Imputation for Missing Data via Sequential Regression Trees. Am J Epidemiol. 2010. Nov 1;172(9):1070–6. doi: 10.1093/aje/kwq260
  • 36. Heymans MW. psfmi: Prediction Model Pooling, Selection and Performance Evaluation Across Multiply Imputed Datasets [Internet]. Available from: https://mwheymans.github.io/psfmi/
  • 37. Little RJ, Rubin DB. Statistical analysis with missing data. John Wiley & Sons; 2019. April 23.
  • 38. Fox J. Effect Displays in R for Generalised Linear Models. J Stat Softw. 2003. Jul 22;8:1–27.
  • 39. Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009. Sep;19(9):1655. doi: 10.1101/gr.094052.109
  • 40. Behr AA, Liu KZ, Liu-Fang G, Nakka P, Ramachandran S. pong: fast analysis and visualization of latent clusters in population genetic data. Bioinforma Oxf Engl. 2016. Sep 15;32(18):2817–23. doi: 10.1093/bioinformatics/btw327
  • 41. Statistics South Africa. Provincial profile: Northern Cape Community Survey 2016, Report 03-01-14 [Internet]. 2018. Available from: http://cs2016.statssa.gov.za/wp-content/uploads/2018/07/NorthernCape.pdf
  • 42. Adhikari M. The Sons of Ham: Slavery and the Making of Coloured Identity. South Afr Hist J. 1992. Nov 1;27(1):95–112.
  • 43. Legassick M. Hidden Histories of Gordonia: Land dispossession and resistance in the Northern Cape, 1800–1990. NYU Press; 2016.
  • 44. LaValley MP. Logistic Regression. Circulation. 2008. May 6;117(18):2395–9. doi: 10.1161/CIRCULATIONAHA.106.682658
  • 45. Baker PA. From Apartheid to Neoliberalism: Health Equity in Post-Apartheid South Africa. Int J Health Serv. 2010. Jan 1;40(1):79–95. doi: 10.2190/HS.40.1.e
  • 46. Maphumulo WT, Bhengu BR. Challenges of quality improvement in the healthcare of South Africa post-apartheid: A critical review. Curationis. 2019. May 29;42(1):e1–9. doi: 10.4102/curationis.v42i1.1901
  • 47. Mhlanga D, Garidzirai R. The Influence of Racial Differences in the Demand for Healthcare in South Africa: A Case of Public Healthcare. Int J Environ Res Public Health. 2020. Jan;17(14):5043. doi: 10.3390/ijerph17145043
  • 48. Davies PDO. Risk factors for tuberculosis. Monaldi Arch Chest Dis. 2005. Mar;63(1):37–46. doi: 10.4081/monaldi.2005.656
  • 49. de Vries G, van Hest NAH, Baars HWM, Šebek MMGG, Richardus JH. Factors associated with the high tuberculosis case rate in an urban area. Int J Tuberc Lung Dis. 2010. Jul 1;14(7):859–65.
  • 50. Smith JP, Oeltmann JE, Hill AN, Tobias JL, Boyd R, Click ES, et al. Characterizing tuberculosis transmission dynamics in high-burden urban and rural settings. Sci Rep. 2022. Apr 26;12(1):6780. doi: 10.1038/s41598-022-10488-2
  • 51. Fofana AM, Moultrie H, Scott L, Jacobson KR, Shapiro AN, Dor G, et al. Cross-municipality migration and spread of tuberculosis in South Africa. Sci Rep. 2023. Feb 15;13(1):2674. doi: 10.1038/s41598-023-29804-5
  • 52. Szklo M, Nieto FJ. Epidemiology: Beyond the Basics. 3rd ed. Jones & Bartlett Learning; 2014.
  • 53. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2008.
  • 54. Woodward M. Epidemiology: Study Design and Data Analysis, 3rd ed. CRC Press; 2013.
  • 55. Scordo JM, Aguillón-Durán GP, Ayala D, Quirino-Cerrillo AP, Rodríguez-Reyna E, Mora-Guzmán F, et al. A prospective cross-sectional study of tuberculosis in elderly Hispanics reveals that BCG vaccination at birth is protective whereas diabetes is not a risk factor. PLOS ONE. 2021. Jul;16(7):e0255194–e0255194. doi: 10.1371/journal.pone.0255194
  • 56. Khalatbari-Soltani S, Maccora J, Blyth FM, Joannès C, Kelly-Irving M. Measuring education in the context of health inequalities. Int J Epidemiol. 2022. Jun 1;51(3):701–8. doi: 10.1093/ije/dyac058
  • 57. Shavers VL. Measurement of socioeconomic status in health disparities research. J Natl Med Assoc. 2007. Sep;99(9):1013–23.
  • 58. Department of Education. South African Schools Act [Internet]. 1996. Nov [cited 2024 Mar 19]; Available from: https://www.gov.za/sites/default/files/gcis_document/201409/act84of1996.pdf
  • 59. Beiranvand R, Karimi A, Delpisheh A, Sayehmiri K, Soleimani S, Ghalavandi S. Correlation Assessment of Climate and Geographic Distribution of Tuberculosis Using Geographical Information System (GIS). Iran J Public Health. 2016. Jan;45(1):86–93.
  • 60. Hoffner S, Hadadi M, Rajaei E, Farnia P, Ahmadi M, Jaberansari Z, et al. Geographic characterization of the tuberculosis epidemiology in iran using a geographical information system. Biomed Biotechnol Res J. 2018;2(3):213.
  • 61. Sikalengo G, Hella J, Mhimbira F, Rutaihwa LK, Bani F, Ndege R, et al. Distinct clinical characteristics and helminth co-infections in adult tuberculosis patients from urban compared to rural Tanzania. Infect Dis Poverty. 2018. Mar 24;7(1):24. doi: 10.1186/s40249-018-0404-9
  • 62. Reynolds A, Grote MN, Myrick JW, Al-Hindi DR, Siford RL, Mastoras M, et al. Persistence of matrilocal post-marital residence across multiple generations in Southern Africa. Human Nature. 2023. Jun;34(2):295–323. doi: 10.1007/s12110-023-09452-4
  • 63. Hertz D, Schneider B. Sex differences in tuberculosis. Semin Immunopathol. 2019. Mar;41(2):225–37. doi: 10.1007/s00281-018-0725-6
  • 64. Holmes CB, Hausler H, Nunn P. A review of sex differences in the epidemiology of tuberculosis. Int J Tuberc Lung Dis. 1998. Feb 1;2(2):96–104.
  • 65. Neyrolles O, Quintana-Murci L. Sexual Inequality in Tuberculosis. PLOS Med. 2009. Dec 22;6(12):e1000199. doi: 10.1371/journal.pmed.1000199
  • 66. World Health Organization (Western Pacific Region). Taking sex and gender into account in emerging infectious disease programme: an analytical framework [Internet]. 2007. [cited 2022 Aug 29]. Available from: https://www.who.int/publications-detail-redirect/9789290615323
  • 67. Wizemann TM, Pardue ML. Exploring the Biological Contributions to Human Health [Internet]. National Academies Press (US); 2001. [cited 2022 Aug 29]. Available from: https://www.ncbi.nlm.nih.gov/books/NBK222288/
  • 68. Lawry LL, Lugo-Robles R, McIver V. Improvements to a framework for gender and emerging infectious diseases. Bull World Health Organ. 2021. Sep 1;99(9):682–4. doi: 10.2471/BLT.20.275636
  • 69. Peltzer K, Louw J, Mchunu G, Naidoo P, Matseke G, Tutshana B. Hazardous and Harmful Alcohol Use and Associated Factors in Tuberculosis Public Primary Care Patients in South Africa. Int J Environ Res Public Health. 2012. Sep;9(9):3245–57. doi: 10.3390/ijerph9093245
  • 70. Sitas F, Urban M, Bradshaw D, Kielkowski D, Bah S, Peto R. Tobacco attributable deaths in South Africa. Tob Control. 2004. Dec;13(4):396–9. doi: 10.1136/tc.2004.007682
  • 71. Wessels J, Walsh CM, Nel M. Smoking habits and alcohol use of patients with tuberculosis at Standerton Tuberculosis Specialised Hospital, Mpumalanga, South Africa. Health SA SA Gesondheid. 2019. Oct 8;24:1146. doi: 10.4102/hsag.v24i0.1146
  • 72. Althubaiti A. Information bias in health research: definition, pitfalls, and adjustment methods. J Multidiscip Healthc. 2016. May 4;9:211–7. doi: 10.2147/JMDH.S104807
  • 73. National Department of Health (NDoH), Statistics South Africa, South African Medical Research Council, ICF. South Africa Demographic and Health Survey 2016. [Internet]. [cited 2022 Aug 8]. Available from: https://dhsprogram.com/pubs/pdf/FR337/FR337.pdf
  • 74. Uren C, Kim M, Martin AR, Bobo D, Gignoux CR, van Helden PD, et al. Fine-Scale Human Population Structure in Southern Africa Reflects Ecogeographic Boundaries. Genetics. 2016. Sep 1;204(1):303–14. doi: 10.1534/genetics.116.187369
  • 75. Penn N. The Forgotten Frontier: Colonist and Khoisan on the Cape’s Northern Frontier in the 18th Century. Juta and Company Ltd; 2005.
PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002643.r001

Decision Letter 0

Indira Govender

8 Jan 2024

PGPH-D-23-02135

Strong Effect of Demographic Changes on Tuberculosis Susceptibility in South Africa

PLOS Global Public Health

Dear Dr. Oyageshio,

Thank you for submitting your manuscript to PLOS Global Public Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Global Public Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

Both reviewers have major issues related to how ancestry is used to draw conclusions in this analysis; the selection of cases and controls (specifically pertaining to HIV status) and subsequent biases; the validity of methods used to determine TB disease history; and the statistical analyses. Please address all major and minor comments in your revised submission.

==============================

Please submit your revised manuscript by 4 February 2024. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at globalpubhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgph/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Indira Govender, MBChB, MMed, FCPHM

Academic Editor

PLOS Global Public Health

Journal Requirements:

1. Please include a complete copy of PLOS’ questionnaire on inclusivity in global research in your revised manuscript. Our policy for research in this area aims to improve transparency in the reporting of research performed outside of researchers’ own country or community. The policy applies to researchers who have travelled to a different country to conduct research, research with Indigenous populations or their lands, and research on cultural artefacts. The questionnaire can also be requested at the journal’s discretion for any other submissions, even if these conditions are not met.  Please find more information on the policy and a link to download a blank copy of the questionnaire here: https://journals.plos.org/globalpublichealth/s/best-practices-in-research-reporting. Please upload a completed version of your questionnaire as Supporting Information when you resubmit your manuscript.

Additional Editor Comments (if provided):


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Does this manuscript meet PLOS Global Public Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Partly

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Global Public Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This study focuses on the factors that affect TB risk in an understudied TB endemic region. The authors use a case-control study and find that the following factors increase TB risk: male gender, age x socioeconomic status, and birthplace x residence locality. The variables included in the interactions did not have main effects on TB risk, nor did smoking and alcohol consumption. Finally, the authors performed genotyping and report that participants had a majority Khoe-San ancestry, but genetic ancestry is never linked to TB risk. This paper represents a large amount of collaborative and interdisciplinary work and an important description of an understudied population. My main comment is that several parts of the analyses and decisions behind them are unclear, including whether and how genetic ancestry is analyzed in the context of TB.

Major comments:

The introduction discusses how genetic variation and genetic ancestry can impact TB risk, and line 91 says that ancestry proportions are investigated. However, I didn’t see these analyses anywhere in the paper.

Line 259 – Do you examine ancestry effects on host susceptibility? This concept is brought up again but no analyses are described in the methods.

Are there any relatives in your study? Or population structure? If so, please consider linear mixed effects models that control for pairwise genetic relatedness and/or principal components of genetic variation. Since you have genetic data, I’m not sure why it isn’t used for data analysis either to allow for modeling of a key covariate and/or to examine ancestry effects on susceptibility.

Why are HIV positive individuals not considered cases? They still have TB. From the flow diagram in Figure 1 it also looks like HIV positive status does not exclude someone from being a control, which seems like it would create bias if they are excluded from one group and not the other.

Minor comments:

Line 52 – “The extent of these determinants’ effects varies across and within populations, necessitating epidemiological studies in differing contexts and communities”. I’m sure this is true, but a few examples would really help. In general, the motivation for the particular study (beyond studying an understudied population) could use some fleshing out.

Can you provide some more background about why TB/HIV co-infection is so common? Is the thought that individuals succumb to TB after being infected with HIV because of compromised immune systems? It is also confusing that the intro focuses on TB/HIV co-infection, and then these individuals are excluded from the actual study.

It is very inconvenient to have the figure legends embedded in the text, but the figures themselves at the end of the manuscript.

It’s not clear until the methods that the study is really comparing suspected/latent TB individuals vs individuals with currently active or past active TB. Could you make that clear in the intro and reframe some of the background about risk factors for progression to active TB (rather than risk factors for TB alone)?

Line 159-161 – Please expand on this procedure. How were Google Earth images translated into population size? It would also be nice to compare census data to your Google Earth method for a few regions that have census data, to see how correlated the methods are and if there is any bias. Also, this part of the methods makes it sound like you gather population density info, but then there’s no explanation of how this is binarized into the variable you use for modeling (town vs rural).
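As an illustration of the comparison requested above, the following is a minimal sketch, not the authors' procedure. It assumes paired population estimates for the same localities from the census and from the imagery-based method, and reports their Pearson correlation and mean relative bias. The function names and all numbers are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_relative_bias(estimate, reference):
    """Mean of (estimate - reference) / reference; positive means overestimation."""
    return sum((e - r) / r for e, r in zip(estimate, reference)) / len(reference)


# Hypothetical values for illustration only; these are not data from the study.
census = [1200, 5400, 300, 15000, 800]
imagery = [1100, 5900, 350, 14200, 900]

r = pearson_r(imagery, census)
bias = mean_relative_bias(imagery, census)
print(f"Pearson r = {r:.3f}, mean relative bias = {bias:+.1%}")
```

A high correlation with a non-zero mean relative bias would suggest the imagery-based method ranks localities correctly but systematically over- or underestimates their size.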

Line 219 - Please justify the use of the particular populations included as reference groups in the ADMIXTURE analyses. Also why was k=5 chosen? Were other values of k tested?

Line 290 – Why not run the common risk factor model for the individuals included in the residence model? That way you could ask whether the inclusion of residence variables improves the model. As it stands, it’s very unclear because there is no base model to compare to.
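One common way to make such a comparison, sketched here under stated assumptions rather than as the authors' analysis, is a likelihood ratio test between two nested logistic models fitted to the same individuals. The log-likelihood values below are hypothetical placeholders, and `chi2_sf` is a small helper written for this sketch so that it needs only the standard library.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then step up by 2 with the recurrence
    # Q(x | nu + 2) = Q(x | nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1).
    nu = 1 if df % 2 else 2
    q = erfc(sqrt(x / 2)) if nu == 1 else exp(-x / 2)
    while nu < df:
        q += (x / 2) ** (nu / 2) * exp(-x / 2) / gamma(nu / 2 + 1)
        nu += 2
    return q


def likelihood_ratio_test(loglik_base, loglik_full, extra_params):
    """Return (statistic, p-value) for a base model nested in a fuller model."""
    stat = 2.0 * (loglik_full - loglik_base)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical log-likelihoods; both models must be fitted to the same rows.
stat, p = likelihood_ratio_test(loglik_base=-480.0, loglik_full=-470.0, extra_params=3)
print(f"LR statistic = {stat:.2f}, p = {p:.2g}")
```

The test is only valid when the base model is refitted on the reduced set of individuals that have residence data, which is the point of the comment above.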

Table 1 – You don’t need a blank row for “Current Residence - Rural (reference)”

Apologies if I missed it, but could you provide some ideas in the discussion about the interaction between birthplace and current residence, since the effects go in the opposite direction of what was expected?

Reviewer #2: This study aims to assess risk factors for TB in a sample of patients attending health facilities in the Northern Cape province of South Africa. Consistent with previous studies, male sex and lower educational attainment were found to be risk factors, but in further analysis the effect of educational attainment was found to differ substantially with respect to age. Also somewhat surprisingly, although there was a substantially higher TB risk in town dwellers, people who had been born in towns and resided in rural areas had the lowest TB risk.

Major comments

The study design is unusual because the “cases” are defined as people who either have current TB (microbiologically confirmed by GeneXpert testing) or who self-report having had TB previously. The latter is less ideal because self-report is generally not considered very reliable (there may be recall bias, particularly if the TB episode occurred a very long time ago, or there may be social desirability bias given the stigma associated with TB). The authors don’t report what fraction of the cases were self-reported versus microbiologically diagnosed, and so it’s difficult to know how serious this limitation is. This information is also important in assessing how much of the TB is past versus current TB. Did the authors assess whether there were differences in the TB risk factors when considering separately the current TB cases and the self-reported past TB cases?

There are a number of places in the paper where the authors allude to historical changes (or in some cases “demographic changes”) as possible explanations for interactions between risk factors, but without providing coherent explanations or clear hypotheses. For example, in lines 377-9: “This study demonstrates a possible unique historical context to South Africa, (post-)Apartheid differential effects between sociodemographic and health outcomes.” It is difficult to make a convincing argument when past TB and current TB are analysed as a single outcome.

Related to the previous point, the authors could do more to compare their study with other South African studies on the role of socio-economic status. It’s difficult to know whether the risk factors they describe as “different from other global populations” (in the abstract) are really a peculiarity of this study design/setting or consistent with other South African data. As examples of South African studies that have assessed the effect of socio-economic status on TB risk, see Ncayiyana et al (BMC Infectious Diseases, 2016, 16:661), Mahomed et al (Int J Tuberc Lung Dis, 2011, 15(3):331-6) and Harling et al (Social Science & Medicine, 2008, 492-505). It should also be acknowledged that educational attainment may be a crude measure of socio-economic status (especially in the post-Apartheid context of universal access to education). In addition, participants were recruited from public health facilities, and so people using private healthcare are not represented, which obviously limits the ability to draw conclusions about TB risk at higher SES levels.

The finding that TB risk is lowest in people who were born in towns but now reside in rural areas is a bit unexpected. The authors attempt to provide an explanation for this in the Discussion, arguing that BCG vaccine coverage is probably better in urban areas than in rural areas, meaning the town-born individuals are more protected early in life. But this argument seems a bit tenuous, given that the authors don’t cite any evidence of low BCG vaccine coverage in rural areas, and given that the BCG vaccine is generally thought to provide only limited protection in early childhood (a life stage that most people are unlikely to remember when asked if they’ve ever had TB). Another explanation might be that the people who were born in towns but now reside in rural areas are generally of higher socio-economic status.

The decision to exclude people living with HIV (PLWH) from the cases but not the controls seems a bit odd, and the rationale isn’t clear in the main paper. In the supplementary materials it becomes clear that the authors are trying to focus on non-HIV risk factors (it would perhaps have helped to explain this in the main text). But if one is going to exclude PLWH, does it not make sense to exclude them from both the cases and the controls? Describing HIV-positive non-TB patients as “resisters” (in the supplementary materials) is a bit weak, since the authors haven’t done any TST/IGRA testing, but even if it is true, how does this justify the decision to include them as controls?

The interpretation of Figure 2 could be questioned. The authors interpret the (relatively) low TB risk at older ages as indicating a “survivor bias”. Although that is possible, one should also consider that (a) these are self-reported data that could be affected by recall bias (more important in older adults), and (b) these are data from patients attending health facilities, and not a population-based sample (one might expect that older adults would be more likely to be attending health facilities because of chronic conditions). I did also wonder whether it was really necessary to include Figure 2 in the paper (and if so, whether it should not go in the Results section, rather than the Methods section).

A limitation is that 35% of the alcohol data in this study were imputed. Although this is mentioned in the supplementary materials, it’s important enough that it should be mentioned in the main text. The high level of missing alcohol data cou

PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002643.r003

Decision Letter 1

Indira Govender

3 Apr 2024

PGPH-D-23-02135R1

Strong Effect of Demographic Changes on Tuberculosis Susceptibility in South Africa

PLOS Global Public Health

Dear Dr. Oshiomah Oyageshio,

Thank you for submitting your manuscript to PLOS Global Public Health. After careful consideration, we feel that it has merit but does not fully meet PLOS Global Public Health’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

==============================

We appreciate the authors having made significant progress in addressing the first round of comments from reviewers. However, there remain major concerns around the selection and description of controls, and the availability of data used in this analysis. Please note that due to a technical issue, reviewer 1 was only able to mark the revisions as acceptable but has provided feedback requesting minor revisions, which are inserted below:

The authors have done a thorough job of addressing my comments. I appreciate the attention to detail and found the responses to be generally robust.

I have two minor outstanding comments:

Line 219 - Please justify the use of the particular populations included as reference groups in the ADMIXTURE analyses. Also why was k=5 chosen? Were other values of k tested?

● We have previously analyzed genetic ancestry data for other populations from the Western and Northern Cape (e.g. Uren et al 2016, Genetics; Petersen et al. 2013, PLoS Genetics; De Wit et al 2010, Human Genetics). Based on the observed ancestry proportions, we know that there are two minority ancestries (eastern Asian: Indonesian, Malay, Chinese; southern Asian: Sri Lanka, India, Bangladesh), typically less than 5%, derived from the VOC slave trade and present at lower fractions than in Cape Town. European migration is well documented historically and in interviews. Interviews also suggest that recent migration of Bantu-speakers, including Xhosa, Tswana and others, impacts the communities. Finally, the Khoe-San ancestry could have been broken down into 2 clusters based on some differentiation between Khoekhoe and San-derived demographic histories. But this ancestry differentiation estimate tends to be unstable in ADMIXTURE, likely due to small sample sizes for the Khoekhoe. Hence, k=5.

Additional reviewer response: Got it. Please incorporate this information into the methods so it is clear to all readers.

Are there any relatives in your study? Or population structure? If so, please consider linear mixed effects models that control for pairwise genetic relatedness and/or principal components of genetic variation. Since you have genetic data, I’m not sure why it isn’t used for data analysis either to allow for modeling of a key covariate and/or to examine ancestry effects on susceptibility.

●  We do see moderate population structure across clinics. Please see Supplementary section “Genetic ancestry and ethnicity” for detailed discussion. 


●  We agree it would be appropriate to include PC covariates or GRM for this analysis. The goal of this project is to perform these association analyses in the future with more data; however, we have only genotyped ~159 samples, which will be underpowered to identify genetic associations. Grant applications to genotype the full dataset have not yet been funded, and the cost would total ~$60,000, prohibiting us from including them in the current paper.


Additional review response: Thanks, that makes sense and I understand the financial limitations for genotyping the whole cohort. However, 159 is a reasonable sample size for checking whether your scientific conclusions are likely to change when you account for population structure, which you demonstrate is pretty substantial. For example, I think it would be worthwhile to check if effect sizes dramatically change in magnitude or direction when you include PCs in the model using the reduced dataset. Or potentially you could ask about average TB patterns as a function of average per-clinic admixture proportions. Again, given that you do see population structure, I think it's worthwhile to try to get some handle on how it could be impacting/confounding current conclusions, even though I understand larger datasets would be more ideal to really dig into this.
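The per-clinic summary suggested above could be sketched as follows. This is an illustrative outline only, assuming one record per genotyped participant with a clinic label, case/control status, and a single ancestry proportion; the record layout, clinic labels, and values are hypothetical.

```python
from collections import defaultdict


def per_clinic_summary(records):
    """Return {clinic: (case_fraction, mean_ancestry_proportion, n)}."""
    groups = defaultdict(list)
    for clinic, is_case, ancestry in records:
        groups[clinic].append((is_case, ancestry))
    summary = {}
    for clinic, rows in groups.items():
        n = len(rows)
        case_fraction = sum(1 for is_case, _ in rows if is_case) / n
        mean_ancestry = sum(a for _, a in rows) / n
        summary[clinic] = (case_fraction, mean_ancestry, n)
    return summary


# Hypothetical records: (clinic label, TB case?, ancestry proportion in [0, 1]).
records = [
    ("clinic_A", True, 0.62), ("clinic_A", False, 0.58),
    ("clinic_B", True, 0.41), ("clinic_B", True, 0.45), ("clinic_B", False, 0.39),
    ("clinic_C", False, 0.70), ("clinic_C", False, 0.66),
]

for clinic, (frac, anc, n) in sorted(per_clinic_summary(records).items()):
    print(f"{clinic}: n={n}, case fraction={frac:.2f}, mean ancestry={anc:.2f}")
```

With only 12 clinics and roughly 159 genotyped samples, a summary like this is descriptive: it can flag whether case fraction tracks average ancestry across clinics, but it cannot by itself separate ancestry from other clinic-level differences.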

==============================

Please submit your revised manuscript by 3 May 2024. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at globalpubhealth@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pgph/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

We look forward to receiving your revised manuscript.

Kind regards,

Indira Govender, MBChB, MMed, FCPHM

Academic Editor

PLOS Global Public Health

Journal Requirements:

Additional Editor Comments (if provided):

1. Please include a complete copy of PLOS’ questionnaire on inclusivity in global research in your revised manuscript. Our policy for research in this area aims to improve transparency in the reporting of research performed outside of researchers’ own country or community. The policy applies to researchers who have travelled to a different country to conduct research, research with Indigenous populations or their lands, and research on cultural artefacts. The questionnaire can also be requested at the journal’s discretion for any other submissions, even if these conditions are not met.  Please find more information on the policy and a link to download a blank copy of the questionnaire here: https://journals.plos.org/globalpublichealth/s/best-practices-in-research-reporting. Please upload a completed version of your questionnaire as Supporting Information when you resubmit your manuscript.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: (No Response)

**********

2. Does this manuscript meet PLOS Global Public Health’s publication criteria? Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe methodologically and ethically rigorous research with conclusions that are appropriately drawn based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available (please refer to the Data Availability Statement at the start of the manuscript PDF file)?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS Global Public Health does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: The authors have made significant attempts to address the concerns raised around the first submission, and I appreciate that a number of aspects of the paper are now clearer. I also appreciate that they have removed the HIV-positive individuals from the controls (to be consistent with the definition of cases). Unfortunately, though, there are still a number of significant problems.

Firstly, the authors insist on describing their controls as population-based controls, which is simply not true. The controls were recruited from health facilities and were “TB suspects”, which presumably means they had symptoms suggestive of TB. This strongly suggests the controls are NOT representative of the general population. Failure to acknowledge this issue leads to problems in the interpretation of the results. As I commented in the original submission, older people are more likely to attend clinics than younger people, and thus one cannot assume the age distribution of the controls to be the same as the age distribution of the general adult population. This makes it dangerous to infer a survivor bias based on the difference in age distribution between cases and controls. The paragraph in lines 91-102 is still confusing and unnecessary.

Secondly, the authors still are not providing a clear motivation for why the genetic data belong in this study. The data don’t say anything about TB risk, so don’t they belong in a separate paper?

Thirdly, the authors have only partially addressed my comment about the unexpected finding that TB risk is lowest in people who are born in towns but migrated to rural areas. It’s unsatisfactory that the authors rely on hypotheses of clinic staff that are neither logical nor supported by evidence. If they really wanted to document the perspectives of clinic staff (as they say in their response letter), I would have expected a related research objective with an appropriate qualitative research methodology.

Fourthly, the authors have removed from the main text the first statistical model (the “Common risk factor model”, i.e. the model without interactions). It’s not clear why this was done – I find it quite difficult to interpret the results of the interaction models without being able to compare against the results of the simpler model without interactions.

Lastly, there is still a lot of explanation and interpretation in the Results section that should rather have gone in the Methods section or other sections. For example, there is much description of the statistical models and the hypotheses that are being tested, whic

PLOS Glob Public Health. doi: 10.1371/journal.pgph.0002643.r005

Decision Letter 2

Indira Govender

13 Jun 2024

Strong Effect of Demographic Changes on Tuberculosis Susceptibility in South Africa

PGPH-D-23-02135R2

Dear Mr Oshiomah Oyageshio,

We thank you for your considered responses to all queries raised by the reviewers and are pleased to inform you that your manuscript 'Strong Effect of Demographic Changes on Tuberculosis Susceptibility in South Africa' has been provisionally accepted for publication in PLOS Global Public Health.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests.

Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they'll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact globalpubhealth@plos.org.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Global Public Health.

Best regards,

Indira Govender

Academic Editor

PLOS Global Public Health

***********************************************************

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 Checklist (DOCX): pgph.0002643.s001.docx (66KB, docx)

    S1 Data (XLSX): pgph.0002643.s002.xlsx (18.5KB, xlsx)

    S1 Text (DOCX): pgph.0002643.s003.docx (2.6MB, docx)

    Attachment (submitted filename: Response to Reviewers.pdf): pgph.0002643.s004.pdf (2.7MB, pdf)

    Attachment (submitted filename: Response to Reviewers2.pdf): pgph.0002643.s005.pdf (209.5KB, pdf)

    Data Availability Statement

    All R scripts for statistical data analysis and visualization are available at https://github.com/oshiomah1/NCTB-Epidemiology-Project. The relevant raw genetic data is deposited in the European Genome-phenome Archive (study accession number: EGAS00001007850). To maintain the privacy and anonymity of our study participants, and following our IRB-approved protocol, epidemiological data is available upon reasonable request. For access, please contact the Stellenbosch University Health Research Ethics Office at ethics@sun.ac.za and Dr. Marlo Möller at marlom@sun.ac.za.


    Articles from PLOS Global Public Health are provided here courtesy of PLOS
