Abstract
Previously, the International Tuberculosis Host Genetics Consortium (ITHGC) demonstrated the power of large-scale GWAS analysis across diverse ancestries in identifying tuberculosis (TB) susceptibility loci (Schurz et al., 2024). Although the ITHGC identified a significant genetic correlate in the human leukocyte antigen (HLA)-II region, this association did not replicate in the African ancestry-specific analysis, due to small sample size and the inclusion of admixed samples. Our study aimed to build upon the findings from the ITHGC and identify TB susceptibility loci in an admixed South African cohort using the local ancestry allelic adjusted association (LAAA) model. We identified a suggestive association peak (rs3117230, p-value = 5.292 × 10⁻⁶, OR = 0.437, SE = 0.182) in the HLA-DPB1 gene originating from KhoeSan ancestry. These findings extend the work of the ITHGC, underscore the need for innovative strategies in studying complex admixed populations, and confirm the role of the HLA-II region in TB susceptibility in admixed South African samples.
Research organism: Human
Introduction
Tuberculosis (TB) is a communicable disease caused by Mycobacterium tuberculosis (M.tb) (World Health Organization, 2023). M.tb infection has a wide range of clinical manifestations from asymptomatic, non-transmissible, or so-called ‘latent’, infections to active TB (Zaidi et al., 2023). Approximately 1/4 of the global population is infected with M.tb, but only 5–15% of infected individuals will develop active TB (Menzies et al., 2021). Several factors increase the risk of progressing to active TB, including co-infection with HIV and comorbidities, such as diabetes mellitus, asthma and other airway and lung diseases (Glaziou et al., 2018). Socio-economic factors including smoking, malnutrition, alcohol abuse, intravenous drug use, prolonged residence in a high burdened community, overcrowding, informal housing and poor sanitation also influence M.tb transmission and infection (Cudahy et al., 2020; Escombe et al., 2019; Laghari et al., 2019; Matose et al., 2019; Smith et al., 2023). Additionally, individual variability in infection and disease progression has been attributed to variation in the host genome (Schurz et al., 2024; Uren et al., 2020; Verhein et al., 2018; Uren et al., 2021). Numerous genome-wide association studies (GWASs) investigating TB susceptibility have been conducted across different population groups. However, findings from these studies often do not replicate across population groups (Möller and Kinnear, 2020; Möller et al., 2018; Uren et al., 2017). This lack of replication could be caused by small sample sizes, variation in phenotype definitions among studies, variation in linkage disequilibrium (LD) patterns across different population groups and the presence of population-specific effects (Möller and Kinnear, 2020). Additionally, complex LD patterns within population groups, produced by admixture, impede the detection of statistically significant loci when using traditional GWAS methods (Swart et al., 2020).
The International Tuberculosis Host Genetics Consortium (ITHGC) performed a meta-analysis of TB GWAS results including 14 153 TB cases and 19 536 controls of African, Asian and European ancestries (Schurz et al., 2024). The multi-ancestry meta-analysis identified one genome-wide significant variant (rs28383206) in the human leukocyte antigen (HLA)-II region (p=5.2 x 10–9, OR = 0.89, 95% CI=0.84–0.95). The association peak at the HLA-II locus encompassed several genes encoding crucial antigen presentation proteins (including HLA-DR and HLA-DQ). While ancestry-specific association analyses in the European and Asian cohorts also produced suggestive peaks in the HLA-II region, the African ancestry-specific association test did not yield any significant associations or suggestive peaks. The authors described possible reasons for the lack of associations, including the smaller sample size compared to the other ancestry-specific meta-analyses, increased genetic diversity within African individuals and population stratification produced by two admixed cohorts from the South African Coloured (SAC) population (Schurz et al., 2024). The SAC population (as termed in the South African census; Lehohla, 2012) forms part of a multi-way (up to five-way) admixed population with ancestral contributions from Bantu-speaking African (~30%), KhoeSan (~30%), European (~20%), and East (~10%) and Southeast Asian (~10%) populations (Chimusa et al., 2013). The diverse genetic background of admixed individuals can lead to population stratification, potentially introducing confounding variables. However, the power to detect statistically significant loci in admixed populations can be improved by leveraging admixture-induced local ancestry (Swart et al., 2021; Swart et al., 2022a).
Since previous computational algorithms did not include local ancestry as a covariate for GWASs, the local ancestry allelic adjusted association model (LAAA) was developed to overcome this limitation (Duan et al., 2018). The LAAA model identifies ancestry-specific alleles associated with the phenotype by including the minor alleles and the corresponding ancestry of the minor alleles (obtained by local ancestry inference) as covariates. The LAAA model has been successfully applied in a cohort of multi-way admixed SAC individuals to identify novel variants associated with TB susceptibility (Swart et al., 2021; Swart et al., 2022b).
Our study builds upon the findings from the ITHGC (Schurz et al., 2024) and aims to resolve the challenges faced in African ancestry-specific association analysis. Here, we explore host genetic correlates of TB in a complex admixed SAC population using the LAAA model.
Results
Global and local ancestry inference
After close inspection of global ancestry proportions generated using ADMIXTURE, the number of contributing ancestries (K, selected as the value with the lowest cross-validation error) was K=3 for the Xhosa individuals and K=5 for the SAC individuals (Figure 1). This is consistent with previous global ancestry deconvolution results (Chimusa et al., 2014; Choudhury et al., 2021). It is evident that our cohort is a complex, highly admixed group with ancestral contributions from the indigenous KhoeSan (~22–30%), Bantu-speaking African (~30–72%), European (~5–24%), Southeast Asian (~11%), and East Asian (~5%) population groups.
Figure 1. Genome-wide ancestral proportions of all individuals in the merged dataset.
Ancestral proportions for each individual are plotted vertically with different colours representing different contributing ancestries.
Local ancestry was estimated for all individuals. Admixture between geographically distinct populations creates complex ancestral and admixture-induced LD blocks, which can be visualised using local ancestry karyograms. Figure 2 shows karyograms for three individuals from the merged dataset. It is evident that, despite individuals being from the same population group, each possesses unique patterns of local ancestry arising from differing numbers and lengths of ancestral segments.
Figure 2. Local ancestry karyograms of three admixed individuals from the SAC population.
Each admixed individual (A, B and C) has unique local ancestry patterns generated by admixture among geographically distinct ancestral population groups.
Local ancestry-allelic adjusted analysis
LAAA models were successfully applied for all five contributing ancestries (KhoeSan, Bantu-speaking African, European, East Asian and Southeast Asian). Although no variants reached genome-wide significance, a suggestive peak was identified in the HLA-II region of chromosome 6 when using the LAAA model and adjusting for KhoeSan ancestry (Figure 3). The QQ-plot suggested minimal genomic inflation, which was verified by calculating the genomic inflation factor (λ=1.05289; Figure 3—figure supplement 1). The lead variants identified using the LAAA model whilst adjusting for KhoeSan ancestry in this region on chromosome 6 are summarised in Table 1. The suggestive peak encompasses the HLA-DPA1/B1 (major histocompatibility complex, class II, DP alpha 1/beta 1) genes (Figure 4). It is noteworthy that without the LAAA model, this suggestive peak would not have been observed for this cohort. This highlights the importance of utilising the LAAA model in future association studies when investigating disease susceptibility loci in admixed individuals, such as the SAC population.
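The genomic inflation factor reported above is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null. The following is a minimal sketch of that convention, not the authors' code; the function name and the evenly spaced toy p-values are hypothetical, and two-sided p-values from a 1-df test are assumed.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(p_values):
    """Return lambda: median observed chi-square (1 df) over the expected null median."""
    nd = NormalDist()
    # A two-sided p-value p corresponds to |z| = Phi^-1(1 - p/2); chi-square = z^2.
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN

# Hypothetical, evenly spaced p-values (an idealised uniform null), for illustration only.
n = 999
uniform_p = [(i + 0.5) / n for i in range(n)]
lam = genomic_inflation(uniform_p)
print(round(lam, 3))
```

Values of λ close to 1, as in the toy null above, indicate little inflation of test statistics; values well above 1 would point to residual stratification or other systematic bias.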
Figure 3. Log transformation of association signals obtained for KhoeSan ancestry whilst using the LAAA model on chromosome 6.
The thresholds for genome-wide significance (p-value = 5 x 10–8) and suggestive significance (p-value = 1 x 10–5) and the significance threshold for admixture mapping (p-value = 2.5 x 10–6) are shown. The four different models are represented in black (global ancestry only - GAO), blue (local ancestry effect - LAO), orange (ancestry plus allelic effect - APA), and pink (local ancestry adjusted allelic effect - LAAA).
Figure 3—figure supplement 1. QQ-plot of expected p-values and observed p-values for the association signals obtained for KhoeSan ancestry located on chromosome 6.
Table 1. Suggestive associations (p-value <1e–5) for the LAAA analysis adjusting for KhoeSan local ancestry on chromosome 6.
| Position | Marker name | Ref | Alt | AltFreq | OR (95% CI) | SE | p-value (x10–6) | Gene | Location | Imputed/typed | INFO score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 33075635 | rs3117230 | A | G | 0.370 | 0.437 (0.306; 0.624) | 0.182 | 5.292 | HLA-DPB1 | Intergenic | Genotyped | NA |
| 33048661 | rs1042151 | A | G | 0.325 | 0.437 (0.305; 0.627) | 0.184 | 6.806 | HLA-DPB1 | Exonic | Imputed | 0.992 |
| 33058874 | rs2179920 | C | T | 0.369 | 0.445 (0.313; 0.633) | 0.180 | 6.960 | HLA-DPB1 | Intergenic | Genotyped | NA |
| 33072266 | rs2064478 | C | T | 0.371 | 0.447 (0.313; 0.637) | 0.181 | 8.222 | HLA-DPB1 | Intergenic | Imputed | 1 |
| 33072729 | rs3130210 | G | T | 0.371 | 0.447 (0.313; 0.637) | 0.181 | 8.222 | HLA-DPB1 | Intergenic | Imputed | 0.999 |
| 33073440 | rs2064475 | G | A | 0.371 | 0.447 (0.313; 0.637) | 0.181 | 8.222 | HLA-DPB1 | Intergenic | Imputed | 1 |
| 33074348 | rs3117233 | T | C | 0.371 | 0.447 (0.313; 0.637) | 0.181 | 8.222 | HLA-DPB1 | Intergenic | Imputed | 1 |
| 33074707 | rs3130213 | G | A | 0.371 | 0.447 (0.313; 0.637) | 0.181 | 8.222 | HLA-DPB1 | Intergenic | Imputed | 0.970 |
Ref, reference allele; Alt, alternate allele; AltFreq, alternate allele frequency; OR, odds ratio; SE, standard error.
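The OR, SE and confidence interval columns in Table 1 are linked by a standard Wald calculation on the log-odds scale. The sketch below assumes that SE is the standard error of log(OR) and that the intervals are 95% Wald intervals; both are assumptions that appear consistent with the tabulated values rather than details stated in the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def wald_summary(odds_ratio, se, level=0.95):
    """Wald confidence interval and two-sided p-value from an OR and the SE of log(OR)."""
    beta = math.log(odds_ratio)                       # effect on the log-odds scale
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    lower = math.exp(beta - z_crit * se)
    upper = math.exp(beta + z_crit * se)
    # Two-sided normal tail probability, computed with erfc for numerical accuracy.
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return lower, upper, p_value

# Lead variant rs3117230 from Table 1: OR = 0.437, SE = 0.182.
lower, upper, p_value = wald_summary(0.437, 0.182)
print(f"95% CI: ({lower:.3f}; {upper:.3f}), p = {p_value:.2e}")
```

Because the tabulated OR and SE are rounded, the p-value recovered this way only approximates the reported value.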
Figure 4. Regional plot indicating the nearest genes in the region of the lead variant (rs3117230) observed on chromosome 6.
SNPs in linkage disequilibrium (LD) with the lead variant are coloured red/orange. The lead variant is indicated in purple. Functional protein-coding genes are coded in red and non-functional (pseudo-genes) are indicated in black.
The lead variant within this suggestive peak lies within COL11A2P1 (collagen type XI alpha 2 pseudogene 1). COL11A2P1 is an unprocessed pseudogene (ENSG00000228688). Unprocessed pseudogenes are seldom transcribed and translated into functional proteins (Witek and Mohiuddin, 2024). HLA-DPB1 and HLA-DPA1 are the closest functional protein-coding genes to our lead variants. The lead variant identified in the ITHGC meta-analysis, rs28383206, was not present in our genotype or imputed datasets. The ITHGC imputed genotypes using the 1000 Genomes (1000 G) reference panel (Schurz et al., 2024). The lead variant, rs28383206, has an alternate allele frequency of 11.26% in the African population subgroup within the 1000 G dataset (https://www.ncbi.nlm.nih.gov/snp/rs28383206). However, rs28383206 is absent from our in-house whole-genome sequencing (WGS) datasets, which include Bantu-speaking African and KhoeSan individuals. This absence suggests that rs28383206 might not have been imputed in our datasets using the AGR reference panel, potentially due to its low alternate allele frequency in southern African populations. Our merged dataset contained two variants located within 800 base pairs of rs28383206: rs482205 (6:32576009) and rs482162 (6:32576019). However, these variants were not significantly associated with TB status in our cohort (Supplementary file 1).
Discussion
The LAAA analysis of host genetic susceptibility to TB, involving 952 TB cases and 592 controls, identified one suggestive association peak when adjusting for KhoeSan local ancestry. The association peak identified in this study encompasses the HLA-DPB1 gene, a highly polymorphic locus, with over 2000 documented allelic variants (Robinson et al., 2020). This association is noteworthy given that HLA-DPB1 alleles have been associated with TB resistance (Dawkins et al., 2022; Ravikumar et al., 1999; Selvaraj et al., 2008). The direction of effect of the lead variants in our study (Table 1) similarly suggests a protective effect against developing active TB. However, variants in HLA-DPB1 were not identified in the ITHGC meta-analysis.
The ITHGC did not identify any significant associations or suggestive peaks in their African ancestry-specific analyses. Notably, the suggestive peak in the HLA-DPB1 region was only captured in our cohort using the LAAA model whilst adjusting for KhoeSan local ancestry. This underscores the importance of incorporating global and local ancestry in association studies investigating complex multi-way admixed individuals, as the genetic heterogeneity present in admixed individuals (produced as a result of admixture-induced and ancestral LD patterns) may cause association signals to be missed when using traditional association models (Duan et al., 2018; Swart et al., 2022b).
We did not replicate the significant association signal in HLA-DRB1 identified by the ITHGC. However, the ITHGC also did not replicate this association in their own African ancestry-specific analysis. The significant association, rs28383206, identified by the ITHGC meta-analysis appears to be tagging the HLA-DQA1*02:01 allele, which is associated with TB in Icelandic and Asian populations (Li et al., 2021; Sveinbjornsson et al., 2016; Zheng et al., 2018). It is possible that this association signal is specific to non-African populations, but additional research is required to verify this hypothesis. Both our study and the ITHGC independently pinpointed variants associated with TB susceptibility in different genes within the HLA-II locus (Figure 5). The HLA-II region spans ~0.8 Mb on chromosome 6p21.32 and encompasses the HLA-DP, -DR, and -DQ alpha and beta chain genes. The HLA-II complex is the human form of the major histocompatibility complex class II (MHC-II) proteins on the surface of antigen presenting cells, such as monocytes, dendritic cells and macrophages. The innate immune response against M.tb involves phagocytosis by alveolar macrophages. In the phagosome, mycobacterial antigens are processed for presentation on MHC-II on the surface of the antigen presenting cell. Previous studies have suggested that M.tb interferes with the MHC-II pathway to enhance intracellular persistence and delay activation of the adaptive immune response (Oliveira-Cortez et al., 2016). For example, M.tb can inhibit phagosome maturation and acidification, thereby limiting antigen processing and presentation on MHC-II molecules (Chang et al., 2005).
Given that MHC-II plays an essential role in the adaptive immune response to TB and numerous studies have identified HLA-II variants associated with TB (Cai et al., 2019; Chihab et al., 2023; de Sá et al., 2020; Harishankar et al., 2018; Schurz et al., 2024; Selvaraj et al., 2008), additional research is required to elucidate the effects of HLA-II variation on TB risk status.
Figure 5. A schematic diagram of the location of HLA-II genes associated with TB susceptibility.

Genes in red were identified by the ITHGC. Genes in blue were identified by this study.
This analysis has a few limitations. First, unlike the ITHGC manuscript, we did not validate our SNP peak in the HLA-II region through fine mapping. Although we initially considered performing HLA imputation and fine-mapping using the HIBAG R package, as described in the ITHGC article (https://hibag.s3.amazonaws.com/hlares_index.html#estimates), the African HIBAG model was trained on genotype data from African American and HapMap YRI populations, which have minimal to no KhoeSan ancestry. Since our association peak likely originates from KhoeSan ancestral haplotype blocks, using an imputation reference panel that includes individuals with KhoeSan ancestry is essential to this analysis. We acknowledge that HLA typing could validate the importance of our lead SNPs in the HLA-II region and support the LAAA model, but this was not feasible due to the absence of a suitable reference panel that includes KhoeSan ancestry. Second, our analysis has a notable case-control imbalance (cases/controls = 1.610). While many studies discuss methods for addressing case-control imbalances with more controls than cases, which can inflate type 1 error rates (Dai et al., 2021; Öztornaci et al., 2023; Zhou et al., 2018), few address the implications of a large case-to-control ratio like ours (952 cases to 592 controls). To assess the impact of this imbalance, we used the Michigan genetic association study (GAS) power calculator (Skol et al., 2006). Under an additive disease model with an estimated prevalence of 0.15, a disease allele frequency of 0.3, a genotype relative risk of 1.5, and a default significance level of 7×10⁻⁶, we achieved an expected power of approximately 75%. With a balanced sample size of 950 cases and 950 controls, power would exceed 90%, but it would drop significantly with a smaller balanced cohort of 590 cases and 590 controls. Given these results, we proceeded with our analysis to maximise statistical power despite the case-control imbalance.
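The ordering of the three designs compared above can be illustrated with the effective sample size of a case-control study, a common heuristic defined as 4 / (1/N_cases + 1/N_controls). This is a rough sketch for intuition only; it is not the GAS power calculator and does not reproduce the power values quoted in the text.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size of a case-control design: 4 / (1/n_cases + 1/n_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

designs = {
    "imbalanced (952 cases, 592 controls)": (952, 592),
    "balanced (950 cases, 950 controls)": (950, 950),
    "balanced (590 cases, 590 controls)": (590, 590),
}
n_eff = {name: effective_sample_size(*counts) for name, counts in designs.items()}
for name, value in n_eff.items():
    print(f"{name}: effective N = {value:.0f}")
```

Under this heuristic the imbalanced cohort sits between the two balanced alternatives, which agrees with the decision to retain all cases rather than downsample to a smaller balanced cohort.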
In conclusion, the application of the LAAA model to a highly admixed SAC cohort revealed a suggestive association signal in the HLA-II region associated with protection against TB that was not identified by the African ancestry-specific analysis performed by the ITHGC. Our study builds on the results of the ITHGC by demonstrating an alternative method to identify association signals in cohorts with complex genetic ancestry. This analysis shows the value of including individual global and local ancestry in genetic association analyses. Furthermore, we confirm HLA-II loci associations with TB susceptibility in an admixed South African population, highlighting the role of the adaptive immune system in TB susceptibility and resistance.
Materials and methods
Data
This study included the two SAC admixed datasets from the ITHGC analysis [RSA(A) and RSA(M)] as well as four additional TB case-control datasets obtained from admixed South African population groups (Table 2). Like the SAC population, the Xhosa population is admixed with Bantu-speaking African and KhoeSan ancestral contributions (Choudhury et al., 2021). All datasets were collected over the past 30 years under different research projects (Daya et al., 2013; Kroon et al., 2020; Schurz et al., 2018; Smith et al., 2023; Ugarte-Gil et al., 2020) and individuals that were included in the analyses consented to the use of their data in future research regarding TB host genetics. Across all datasets, TB cases were bacteriologically confirmed (culture positive) or diagnosed by GeneXpert. Controls were healthy individuals with no history of TB disease or treatment. However, given the high prevalence of TB in South Africa, estimated at 852 cases (95% CI 679–1026) per 100,000 individuals aged 15 years and older (Cudahy et al., 2020), most controls have likely been exposed to M.tb at some point (Gallant et al., 2010). For all datasets, cases and controls were obtained from the same community and thus share similar socio-economic status and health care access.
Table 2. Summary of the datasets included in analysis.
| Dataset | Genotyping platform | Self-reported ethnicity | Cases/controls | Reference |
|---|---|---|---|---|
| RSA(A) | Affymetrix 500 k | SAC | 642/91 | Daya et al., 2013 |
| RSA(M) | MEGA array 1.1 M | SAC | 555/440 | Schurz et al., 2018; Swart et al., 2021 |
| RSA(TANDEM) | H3Africa array | SAC and Bantu-speaking African | 161/133 | Swart et al., 2022b |
| RSA(NCTB) | H3Africa array | SAC | 49/111 | Oyageshio et al., 2023 |
| RSA(Worcester) | H3Africa array | SAC | 61 cases | Unpublished |
| RSA(Xhosa) | Whole genome sequencing | IsiXhosa | 44/120 | Unpublished |
A list of sites genotyped on the Infinium H3Africa array (https://chipinfo.h3abionet.org/browse) was extracted from the whole-genome sequenced [RSA(Xhosa)] dataset and treated as genotype data in subsequent analyses. Quality control (QC) of raw genotype data was performed using PLINK v1.9 (Purcell et al., 2007). In all datasets, individuals were screened for sex concordance and discordant sex information was corrected based on X chromosome homozygosity estimates (F estimate <0.2 for females and F estimate >0.8 for males). In the event that sex information could not be corrected based on homozygosity estimates, individuals with missing or discordant sex information were removed. Individuals with genotype call rates less than 90% and SNPs with more than 5% missingness were removed as described previously (Swart et al., 2021). Monomorphic sites were removed. Each SNP was tested for deviation from Hardy-Weinberg equilibrium (HWE), and sites with an HWE p-value below 10⁻⁵ were removed. Sex chromosomes were excluded from the analysis. The genome coordinates across all datasets were checked for consistency and, if necessary, converted to GRCh37 using the UCSC liftOver tool (Kuhn et al., 2013). The number of individuals and variants remaining after genotype QC is shown in Supplementary file 2.
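The two missingness filters described above can be sketched in plain Python on a toy genotype matrix. This is an illustration of the logic only, not the PLINK workflow used in the study; the function name, the data layout and the order of the two steps (individuals first, then SNPs) are assumptions made for the example.

```python
MISSING = None  # marker for a missing genotype call in the toy data

def call_rate_filter(genotypes, min_sample_call_rate=0.90, max_snp_missing=0.05):
    """Filter a dict of sample_id -> list of genotype calls (0/1/2 or MISSING)."""
    # Step 1: remove individuals with a genotype call rate below 90%.
    kept = {
        sample: calls
        for sample, calls in genotypes.items()
        if sum(c is not MISSING for c in calls) / len(calls) >= min_sample_call_rate
    }
    if not kept:
        return {}, []
    n_snps = len(next(iter(kept.values())))
    # Step 2: remove SNPs with more than 5% missing calls among the remaining individuals.
    kept_snps = [
        j for j in range(n_snps)
        if sum(calls[j] is MISSING for calls in kept.values()) / len(kept) <= max_snp_missing
    ]
    filtered = {s: [calls[j] for j in kept_snps] for s, calls in kept.items()}
    return filtered, kept_snps

# Hypothetical toy data: 4 individuals genotyped at 10 SNPs.
toy = {
    "s1": [0, 1, 2, 0, 1, 2, 0, 1, 2, 0],
    "s2": [1, 1, 0, 2, 1, 0, 0, 2, 2, 1],
    "s3": [0, MISSING, 2, 0, MISSING, 2, 0, 1, 2, 0],  # call rate 80%: removed
    "s4": [2, 1, 0, MISSING, 1, 0, 2, 1, 0, 2],        # call rate 90%: kept
}
filtered, kept_snps = call_rate_filter(toy)
print(sorted(filtered), kept_snps)
```

In this toy example the SNP at index 3 is dropped because one of the three remaining individuals lacks a call there, which exceeds the 5% limit.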
Genotype datasets were pre-phased using SHAPEIT v2 (Delaneau et al., 2013) and imputed using the Positional Burrows-Wheeler Transform (PBWT) algorithm through the Sanger Imputation Server (SIS; Durbin, 2014). The African Genome Resource (AGR) panel (n=4956), accessed via the SIS, was used as the reference panel for imputation (Gurdasani et al., 2015) since it has been shown that the AGR is the best reference panel for imputation of missing genotypes for samples from the SAC population (Schurz et al., 2019). Imputed data were filtered to remove sites with imputation quality INFO scores less than 0.95. Individual datasets were screened for relatedness using KING software (Manichaikul et al., 2010) and individuals up to second degree relatedness were removed. A total of 7,544,769 markers overlapped across all six datasets. This list of intersecting markers was extracted from each dataset using the PLINK --extract flag. The datasets were then merged using PLINK v1.9. After merging, all individuals missing more than 10% of genotypes were removed, markers with more than 5% missing data were excluded and a HWE filter was applied to controls (threshold <10⁻⁵). The merged dataset was screened for relatedness using KING, and individuals up to second degree relatedness were subsequently removed. The final merged dataset after QC and data filtering (including the removal of related individuals) consisted of 1544 individuals (952 TB cases and 592 healthy controls). A total of 7,510,057 variants passed QC and filtering parameters.
Global ancestry inference
ADMIXTURE was used to determine the number of contributing ancestral populations in our multi-way admixed population cohort (Alexander and Lange, 2011). ADMIXTURE estimates individual ancestry proportions and population allele frequencies for a given number of contributing ancestral populations (denoted by K), and the best-fitting K is identified through cross-validation (CV). All 1544 individuals were grouped into running groups of equal size together with the 191 reference individuals (Table 3). Running groups were created to ensure approximately equal numbers of reference and admixed individuals. Xhosa and SAC samples were divided into separate running groups.
Table 3. Ancestral populations included for global ancestry deconvolution.
| Population | n | Source |
|---|---|---|
| European (British – GBR) | 40 | 1000 Genomes (1000 G) phase 3 (Auton et al., 2015) |
| East Asian (Chinese – CHB) | 40 | 1000 G phase 3 |
| Bantu-speaking African (Yoruba – YRI) | 40 | 1000 G phase 3 |
| Southeast Asian (Malaysian) | 38 | Singapore Sequencing Malay Project (SSMP) (Wong et al., 2013) |
| KhoeSan (Nama) | 33 | African Genome Variation Project (AGVP/ADRP) (Gurdasani et al., 2015) |
Redundant SNPs were removed through LD pruning in PLINK, removing each SNP with LD r² >0.1 within a 50-SNP sliding window (advanced by 10 SNPs at a time). Ancestral proportions were inferred in an unsupervised manner for K=3–6 (1 iteration). The best value of K for the data was selected by choosing the K value with the lowest CV error across all running groups. Ten iterations of K=3 and K=5 were run for the Xhosa and SAC individuals respectively. Since it has been shown that RFMix (Maples et al., 2013) outperforms ADMIXTURE in determining global ancestry proportions (Uren et al., 2020), RFMix was also used to refine inferred global ancestry proportions. Global ancestral proportions were visualised using PONG (Behr et al., 2016).
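The selection of K by cross-validation can be sketched as follows. The CV errors below are hypothetical placeholders rather than values from this study, and averaging the error over running groups is one plausible reading of "lowest CV error across all running groups", not a detail confirmed by the text.

```python
from statistics import mean

def best_k(cv_errors_by_group):
    """Return the K with the lowest mean CV error across running groups.

    cv_errors_by_group: dict mapping running-group name -> {K: CV error}.
    """
    # Only consider K values that were evaluated in every running group.
    ks = set.intersection(*(set(errors) for errors in cv_errors_by_group.values()))
    mean_error = {k: mean(errors[k] for errors in cv_errors_by_group.values()) for k in ks}
    return min(mean_error, key=mean_error.get), mean_error

# Hypothetical CV errors for K = 3..6 in two running groups (illustration only).
toy_cv = {
    "group_1": {3: 0.512, 4: 0.505, 5: 0.498, 6: 0.503},
    "group_2": {3: 0.515, 4: 0.507, 5: 0.499, 6: 0.501},
}
k_selected, mean_error = best_k(toy_cv)
print(k_selected)
```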
Local ancestry inference
The merged dataset and the reference file (containing reference populations from Table 3) were phased separately using SHAPEIT2. The local ancestry for each position in the genome was inferred using RFMix (Maples et al., 2013). Default parameters were used, but the number of generations since admixture was set to 15 for the SAC individuals and 20 for the Xhosa individuals (as determined by previous studies) (Uren et al., 2016). RFMix was run with three expectation maximisation iterations and the --reanalyse-reference flag.
Batch effect screening and correction
Merging separate datasets generated at different timepoints and/or facilities, as we have done here, will undoubtedly introduce batch effects. Principal component analysis (PCA) is a common method used to visualise batch effects, where the first two principal components (PCs) are plotted with each sample coloured by batch, and a separation of colours is indicative of a batch effect (Nyamundanda et al., 2017). However, it is difficult to differentiate between separation caused by population structure and separation caused by batch effect using PCA alone. An alternative method to detect batch effects (Chen et al., 2022) involves coding case/control status by batch, followed by running an association analysis testing each batch against all other batches. If any single dataset has more positive signals compared to the other datasets, then batch effects may be responsible for producing spurious results. Batch effects can be resolved by removing those SNPs which pass the genome-wide significance threshold from the merged dataset. We have adapted this batch effect correction method for application in a highly admixed cohort with complex population structure (Croock et al., 2024). Code required to execute batch effect correction procedures is publicly available (https://github.com/TBHostGenetics/data_harmonisation, copy archived at Croock, 2025). Our modified method was used to remove 36 627 SNPs affected by batch effects from our merged dataset.
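The basic idea of testing each batch against all other batches can be sketched with a simple allelic chi-square test. This is a simplified illustration under stated assumptions: it uses raw allele counts without covariates, it does not reproduce the adapted procedure for admixed cohorts (Croock et al., 2024), and all names and counts are hypothetical.

```python
import math

def allelic_chi2_p(alt_a, n_a, alt_b, n_b):
    """P-value of a 2x2 allelic chi-square test (1 df) comparing two groups.

    alt_*: alternate allele counts; n_*: total allele counts (2 x individuals).
    """
    total = n_a + n_b
    rows = [n_a, n_b]
    cols = [alt_a + alt_b, total - alt_a - alt_b]
    observed = [[alt_a, n_a - alt_a], [alt_b, n_b - alt_b]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))

def flag_batch_snps(alt_counts, allele_totals, threshold=5e-8):
    """Return indices of SNPs where any batch differs from all others at p < threshold."""
    batches = list(alt_counts)
    n_snps = len(alt_counts[batches[0]])
    flagged = set()
    for batch in batches:
        others = [b for b in batches if b != batch]
        n_rest = sum(allele_totals[b] for b in others)
        for j in range(n_snps):
            alt_rest = sum(alt_counts[b][j] for b in others)
            p = allelic_chi2_p(alt_counts[batch][j], allele_totals[batch], alt_rest, n_rest)
            if p < threshold:
                flagged.add(j)
    return flagged

# Hypothetical alternate allele counts at two SNPs in three batches of 500 individuals.
toy_alt = {"batch_A": [300, 300], "batch_B": [310, 300], "batch_C": [295, 700]}
toy_totals = {"batch_A": 1000, "batch_B": 1000, "batch_C": 1000}
flagged = flag_batch_snps(toy_alt, toy_totals)
print(sorted(flagged))
```

SNPs flagged in this way would be removed from the merged dataset before association testing; in the toy data only the second SNP, whose frequency is markedly different in one batch, is flagged.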
Local ancestry allelic adjusted association analysis
The LAAA association model was used to investigate if there are allelic, ancestry-specific or ancestry-specific allelic associations with TB susceptibility in our merged dataset. Global ancestral components inferred by RFMix, age and sex were included as covariates in the association tests (Supplementary file 3). Variants with minor allele frequency (MAF) <1% were removed to improve the stability of the association tests. A total of 784,557 autosomal markers (with MAF >1%) and 1544 unrelated individuals (952 TB cases and 592 healthy controls) were available for further analyses. Of the markers included in the final dataset, 535,193 sites were imputed. Dosage files, which code the number of alleles of a specific ancestry at each locus across the genome, were compiled. Separate regression models for each ancestral contribution were fitted to investigate which ancestral contribution is associated with TB susceptibility. Code required to execute the LAAA model is publicly available (https://github.com/TBHostGenetics/LAAA-model, copy archived at Swart, 2025). Details regarding the models have been described elsewhere (Swart et al., 2022b); but in summary, four regression models were tested to detect the source of the association signals observed:
(1) Null model or global ancestry (GA) model
The null model only includes global ancestry, sex and age covariates. This test investigates whether an additive allelic dose exerts an effect on the phenotype (without including local ancestry of the allele).
(2) Local ancestry (LA) model
This model is used in admixture mapping to identify ancestry-specific variants associated with a specific phenotype. The LA model evaluates the number of alleles of a specific ancestry at a locus and includes the corresponding marginal effect as a covariate in association analyses.
(3) Ancestry plus allelic (APA) model
The APA model simultaneously performs model (1) and (2). This model tests whether an additive allelic dose exerts an effect on the phenotype whilst adjusting for local ancestry.
(4) Local ancestry adjusted allelic (LAAA) model
The LAAA model is an extension of the APA model, which models the combination of the minor allele and ancestry of the minor allele at a specific locus and the effect this interaction has on the phenotype.
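The predictors that distinguish the four models can be sketched from phased alleles and their inferred local ancestry. The sketch assumes, as one reading of the descriptions above, that model (1) uses the minor allele count, model (2) the ancestry count, model (3) both, and model (4) adds the count of minor alleles carried on the ancestry of interest; the function name and example haplotypes are hypothetical, and the covariates (global ancestry, age, sex) are omitted.

```python
def laaa_terms(alleles, ancestries, minor_allele, ancestry):
    """Per-locus predictors for one individual from two phased haplotypes.

    alleles:    pair of alleles, one per haplotype, e.g. ("G", "A")
    ancestries: pair of local ancestry labels for the same haplotypes
    Returns (g, l, h):
      g = number of minor alleles (allelic dose)
      l = number of haplotypes of the given ancestry at the locus
      h = number of minor alleles carried on haplotypes of that ancestry
    """
    g = sum(a == minor_allele for a in alleles)
    l = sum(x == ancestry for x in ancestries)
    h = sum(a == minor_allele and x == ancestry for a, x in zip(alleles, ancestries))
    return g, l, h

# Two hypothetical individuals with the same genotype and the same ancestry counts,
# but with the minor allele on different ancestral backgrounds.
on_khoesan = laaa_terms(("G", "A"), ("KhoeSan", "European"), "G", "KhoeSan")
on_european = laaa_terms(("G", "A"), ("European", "KhoeSan"), "G", "KhoeSan")
print(on_khoesan, on_european)
```

The two individuals are indistinguishable to models (1) to (3) because their allelic dose and ancestry dose are identical; only the third term, used in the LAAA model, separates them.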
The R package STEAM (Significance Threshold Estimation for Admixture Mapping; Grinde et al., 2019) was used to determine the admixture mapping significance threshold given the global ancestral proportions of each individual and the number of generations since admixture (g=15). For the LA model, STEAM estimated a genome-wide significance threshold of p-value <2.5 × 10⁻⁶. The traditional genome-wide significance threshold of 5 × 10⁻⁸ was used for the GA, APA and LAAA models, as recommended by the authors of the LAAA model (Duan et al., 2018). Results from the analysis performed on chromosome 6 whilst adjusting for KhoeSan ancestry are documented in Supplementary file 4.
Acknowledgements
We acknowledge the support of the DSI-NRF Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research (SAMRC CTR), Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa. We also acknowledge the Centre for High Performance Computing (CHPC), South Africa, for providing computational resources. This research was partially funded by the South African government through the SAMRC and the Harry Crossley Research Foundation.
Funding Statement
The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.
Contributor Information
Caitlin Uren, Email: caitlinu@sun.ac.za.
Bavesh D Kana, University of the Witwatersrand, South Africa.
Funding Information
This paper was supported by the following grants:
South African Medical Research Council to Dayna Adrienne Croock.
Harry Crossley Foundation to Dayna Adrienne Croock.
Additional information
Competing interests
No competing interests declared.
Author contributions
Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing.
Resources, Supervision, Methodology, Writing – review and editing.
Conceptualization, Supervision, Methodology, Writing – review and editing.
Conceptualization, Supervision, Writing – review and editing.
Conceptualization, Data curation, Supervision, Writing – review and editing.
Conceptualization, Resources, Data curation, Supervision, Project administration, Writing – review and editing.
Ethics
Ethics approval was granted by the Health Research Ethics Committee (HREC) of Stellenbosch University, South Africa (project number S22/02/031). Individuals who were included in the analyses consented to the use of their data in future research regarding TB host genetics.
Additional files
Data availability
The current manuscript is a computational study, so no new genetic data were generated. Access to the retrospective genetic datasets analysed can be requested through the data access processes of the original studies. Where a dataset is yet to be published, access will be considered upon reasonable request, in line with the initial participant consent; please email caitlinu@sun.ac.za. Summary statistics for the covariate data for individuals in the cohort are available in Supplementary file 3, and LAAA model results for chromosome 6 (adjusted for KhoeSan ancestry) are available in Supplementary file 4. Code required to perform genotype QC, imputation, ancestry inference and batch effect procedures is publicly available (https://github.com/TBHostGenetics/data_harmonisation; copy archived at Croock, 2025). Code required to execute the LAAA model is publicly available (https://github.com/TBHostGenetics/LAAA-model; copy archived at Swart, 2025).
The following previously published dataset was used:
Oyageshio OP, Myrick JW, Saayman J, van der Westhuizen L, Al-Hindi D, Reynolds AW, Zaitlen N, Uren C, Möller M, Henn BM. 2023. Investigating Host Genetic Risk Factors for Tuberculosis in Highly Endemic South African Populations. European Genome-Phenome Archive. EGAS00001007850
References
- Alexander DH, Lange K. Enhancements to the ADMIXTURE algorithm for individual ancestry estimation. BMC Bioinformatics. 2011;12:246. doi: 10.1186/1471-2105-12-246.
- Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR, 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74. doi: 10.1038/nature15393.
- Behr AA, Liu KZ, Liu-Fang G, Nakka P, Ramachandran S. pong: fast analysis and visualization of latent clusters in population genetic data. Bioinformatics. 2016;32:2817–2823. doi: 10.1093/bioinformatics/btw327.
- Cai L, Li Z, Guan X, Cai K, Wang L, Liu J, Tong Y. The research progress of host genes and tuberculosis susceptibility. Oxidative Medicine and Cellular Longevity. 2019;2019:9273056. doi: 10.1155/2019/9273056.
- Chang ST, Linderman JJ, Kirschner DE. Multiple mechanisms allow Mycobacterium tuberculosis to continuously inhibit MHC class II-mediated antigen presentation by macrophages. PNAS. 2005;102:4530–4535. doi: 10.1073/pnas.0500362102.
- Chen D, Tashman K, Palmer DS, Neale B, Roeder K, Bloemendal A, Churchhouse C, Ke ZT. A data harmonization pipeline to leverage external controls and boost power in GWAS. Human Molecular Genetics. 2022;31:481–489. doi: 10.1093/hmg/ddab261.
- Chihab LY, Kuan R, Phillips EJ, Mallal SA, Rozot V, Davis MM, Scriba TJ, Sette A, Peters B, Lindestam Arlehamn CS, Group SS. Expression of specific HLA class II alleles is associated with an increased risk for active tuberculosis and a distinct gene expression profile. HLA. 2023;101:124–137. doi: 10.1111/tan.14880.
- Chimusa ER, Daya M, Möller M, Ramesar R, Henn BM, van Helden PD, Mulder NJ, Hoal EG. Determining ancestry proportions in complex admixture scenarios in South Africa using a novel proxy ancestry selection method. PLOS ONE. 2013;8:e73971. doi: 10.1371/journal.pone.0073971.
- Chimusa ER, Zaitlen N, Daya M, Möller M, van Helden PD, Mulder NJ, Price AL, Hoal EG. Genome-wide association study of ancestry-specific TB risk in the South African Coloured population. Human Molecular Genetics. 2014;23:796–809. doi: 10.1093/hmg/ddt462.
- Choudhury A, Sengupta D, Ramsay M, Schlebusch C. Bantu-speaker migration and admixture in southern Africa. Human Molecular Genetics. 2021;30:R56–R63. doi: 10.1093/hmg/ddaa274.
- Croock D, Swart Y, Schurz H, Petersen DC, Möller M, Uren C. Data harmonization guidelines to combine multi-platform genomic data from admixed populations and boost power in genome-wide association studies. Current Protocols. 2024;4:e1055. doi: 10.1002/cpz1.1055.
- Croock D. Data_harmonisation. Software Heritage. 2025. swh:1:rev:e9709c5a9257c2622637c418bc410f7b832a5cd7. https://archive.softwareheritage.org/swh:1:dir:34c80bd568d8d7bf8deaa5e190c8f35f9bf2caf7;origin=https://github.com/TBHostGenetics/data_harmonisation;visit=swh:1:snp:0bc87001d37daf4994a047e4979bdc48e140b021;anchor=swh:1:rev:e9709c5a9257c2622637c418bc410f7b832a5cd7
- Cudahy PGT, Wilson D, Cohen T. Risk factors for recurrent tuberculosis after successful treatment in a high burden setting: a cohort study. BMC Infectious Diseases. 2020;20:789. doi: 10.1186/s12879-020-05515-4.
- Dai X, Fu G, Zhao S, Zeng Y. Statistical learning methods applicable to genome-wide association studies on unbalanced case-control disease data. Genes. 2021;12:736. doi: 10.3390/genes12050736.
- Dawkins BA, Garman L, Cejda N, Pezant N, Rasmussen A, Rybicki BA, Levin AM, Benchek P, Seshadri C, Mayanja-Kizza H, Iannuzzi MC, Stein CM, Montgomery CG. Novel HLA associations with outcomes of Mycobacterium tuberculosis exposure and sarcoidosis in individuals of African ancestry using nearest-neighbor feature selection. Genetic Epidemiology. 2022;46:463–474. doi: 10.1002/gepi.22490.
- Daya M, van der Merwe L, Galal U, Möller M, Salie M, Chimusa ER, Galanter JM, van Helden PD, Henn BM, Gignoux CR, Hoal E. A panel of ancestry informative markers for the complex five-way admixed South African coloured population. PLOS ONE. 2013;8:e82224. doi: 10.1371/journal.pone.0082224.
- Delaneau O, Howie B, Cox AJ, Zagury JF, Marchini J. Haplotype estimation using sequencing reads. American Journal of Human Genetics. 2013;93:687–696. doi: 10.1016/j.ajhg.2013.09.002.
- de Sá NBR, Ribeiro-Alves M, da Silva TP, Pilotto JH, Rolla VC, Giacoia-Gripp CBW, Scott-Algara D, Morgado MG, Teixeira SLM. Clinical and genetic markers associated with tuberculosis, HIV-1 infection, and TB/HIV-immune reconstitution inflammatory syndrome outcomes. BMC Infectious Diseases. 2020;20:59. doi: 10.1186/s12879-020-4786-5.
- Duan Q, Xu Z, Raffield LM, Chang S, Wu D, Lange EM, Reiner AP, Li Y. A robust and powerful two-step testing procedure for local ancestry adjusted allelic association analysis in admixed populations. Genetic Epidemiology. 2018;42:288–302. doi: 10.1002/gepi.22104.
- Durbin R. Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT). Bioinformatics. 2014;30:1266–1272. doi: 10.1093/bioinformatics/btu014.
- Escombe AR, Ticona E, Chávez-Pérez V, Espinoza M, Moore DAJ. Improving natural ventilation in hospital waiting and consulting rooms to reduce nosocomial tuberculosis transmission risk in a low resource setting. BMC Infectious Diseases. 2019;19:88. doi: 10.1186/s12879-019-3717-9.
- Gallant CJ, Cobat A, Simkin L, Black GF, Stanley K, Hughes J, Doherty TM, Hanekom WA, Eley B, Beyers N, Jaïs J-P, van Helden P, Abel L, Alcaïs A, Hoal EG, Schurr E. Impact of age and sex on mycobacterial immunity in an area of high tuberculosis incidence. The International Journal of Tuberculosis and Lung Disease. 2010;14:952–959.
- Glaziou P, Floyd K, Raviglione MC. Global epidemiology of tuberculosis. Seminars in Respiratory and Critical Care Medicine. 2018;39:271–285. doi: 10.1055/s-0038-1651492.
- Grinde KE, Brown LA, Reiner AP, Thornton TA, Browning SR. Genome-wide significance thresholds for admixture mapping studies. American Journal of Human Genetics. 2019;104:454–465. doi: 10.1016/j.ajhg.2019.01.008.
- Gurdasani D, Carstensen T, Tekola-Ayele F, Pagani L, Tachmazidou I, Hatzikotoulas K, Karthikeyan S, Iles L, Pollard MO, Choudhury A, Ritchie GRS, Xue Y, Asimit J, Nsubuga RN, Young EH, Pomilla C, Kivinen K, Rockett K, Kamali A, Doumatey AP, Asiki G, Seeley J, Sisay-Joof F, Jallow M, Tollman S, Mekonnen E, Ekong R, Oljira T, Bradman N, Bojang K, Ramsay M, Adeyemo A, Bekele E, Motala A, Norris SA, Pirie F, Kaleebu P, Kwiatkowski D, Tyler-Smith C, Rotimi C, Zeggini E, Sandhu MS. The African genome variation project shapes medical genetics in Africa. Nature. 2015;517:327–332. doi: 10.1038/nature13997.
- Harishankar M, Selvaraj P, Bethunaickan R. Influence of genetic polymorphism towards pulmonary tuberculosis susceptibility. Frontiers in Medicine. 2018;5:213. doi: 10.3389/fmed.2018.00213.
- Kroon EE, Kinnear CJ, Orlova M, Fischinger S, Shin S, Boolay S, Walzl G, Jacobs A, Wilkinson RJ, Alter G, Schurr E, Hoal EG, Möller M. An observational study identifying highly tuberculosis-exposed, HIV-1-positive but persistently TB, tuberculin and IGRA negative persons with M. tuberculosis specific antibodies in Cape Town, South Africa. EBioMedicine. 2020;61:103053. doi: 10.1016/j.ebiom.2020.103053.
- Kuhn RM, Haussler D, Kent WJ. The UCSC genome browser and associated tools. Briefings in Bioinformatics. 2013;14:144–161. doi: 10.1093/bib/bbs038.
- Laghari M, Sulaiman SAS, Khan AH, Talpur BA, Bhatti Z, Memon N. Contact screening and risk factors for TB among the household contact of children with active TB: a way to find source case and new TB cases. BMC Public Health. 2019;19:1274. doi: 10.1186/s12889-019-7597-0.
- Lehohla P. South African Census 2011 Meta-data (Report No. 03-01-47; p. 130). South African Census. Statistics South Africa; 2012.
- Li M, Hu Y, Zhao B, Chen L, Huang H, Huai C, Zhang X, Zhang J, Zhou W, Shen L, Zhen Q, Li B, Wang W, He L, Qin S. A next generation sequencing combined genome-wide association study identifies novel tuberculosis susceptibility loci in Chinese population. Genomics. 2021;113:2377–2384. doi: 10.1016/j.ygeno.2021.05.035.
- Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26:2867–2873. doi: 10.1093/bioinformatics/btq559.
- Maples BK, Gravel S, Kenny EE, Bustamante CD. RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. American Journal of Human Genetics. 2013;93:278–288. doi: 10.1016/j.ajhg.2013.06.020.
- Matose MT, Poluta M, Douglas TS. Natural ventilation as a means of airborne tuberculosis infection control in minibus taxis. South African Journal of Science. 2019;115:9/10. doi: 10.17159/sajs.2019/5737.
- Menzies NA, Swartwood N, Testa C, Malyuta Y, Hill AN, Marks SM, Cohen T, Salomon JA. Time since infection and risks of future disease for individuals with Mycobacterium tuberculosis infection in the United States. Epidemiology. 2021;32:70–78. doi: 10.1097/EDE.0000000000001271.
- Möller M, Kinnear CJ, Orlova M, Kroon EE, van Helden PD, Schurr E, Hoal EG. Genetic resistance to Mycobacterium tuberculosis infection and disease. Frontiers in Immunology. 2018;9:2219. doi: 10.3389/fimmu.2018.02219.
- Möller M, Kinnear CJ. Human global and population-specific genetic susceptibility to Mycobacterium tuberculosis infection and disease. Current Opinion in Pulmonary Medicine. 2020;26:302–310. doi: 10.1097/MCP.0000000000000672.
- Nyamundanda G, Poudel P, Patil Y, Sadanandam A. A novel statistical method to diagnose, quantify and correct batch effects in genomic studies. Scientific Reports. 2017;7:10849. doi: 10.1038/s41598-017-11110-6.
- Oliveira-Cortez A, Melo AC, Chaves VE, Condino-Neto A, Camargos P. Do HLA class II genes protect against pulmonary tuberculosis? A systematic review and meta-analysis. European Journal of Clinical Microbiology & Infectious Diseases. 2016;35:1567–1580. doi: 10.1007/s10096-016-2713-x.
- Oyageshio OP, Myrick JW, Saayman J, van der Westhuizen L, Al-Hindi D, Reynolds AW, Zaitlen N, Uren C, Möller M, Henn BM. Strong effect of demographic changes on tuberculosis susceptibility in South Africa. medRxiv. 2023 doi: 10.1371/journal.pgph.0002643. https://www.medrxiv.org/content/10.1101/2023.11.02.23297990v1
- Öztornaci RO, Syed H, Morris AP, Taşdelen B. The use of class imbalanced learning methods on ULSAM data to predict the case–control status in genome-wide association studies. Journal of Big Data. 2023;10:174. doi: 10.1186/s40537-023-00853-x.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics. 2007;81:559–575. doi: 10.1086/519795.
- Ravikumar M, Dheenadhayalan V, Rajaram K, Lakshmi SS, Kumaran PP, Paramasivan CN, Balakrishnan K, Pitchappan RM. Associations of HLA-DRB1, DQB1 and DPB1 alleles with pulmonary tuberculosis in south India. Tubercle and Lung Disease. 1999;79:309–317. doi: 10.1054/tuld.1999.0213.
- Robinson J, Barker DJ, Georgiou X, Cooper MA, Flicek P, Marsh SGE. IPD-IMGT/HLA Database. Nucleic Acids Research. 2020;48:D948–D955. doi: 10.1093/nar/gkz950.
- Schurz H, Kinnear CJ, Gignoux C, Wojcik G, van Helden PD, Tromp G, Henn B, Hoal EG, Möller M. A sex-stratified genome-wide association study of tuberculosis using a multi-ethnic genotyping array. Frontiers in Genetics. 2018;9:678. doi: 10.3389/fgene.2018.00678.
- Schurz H, Müller SJ, van Helden PD, Tromp G, Hoal EG, Kinnear CJ, Möller M. Evaluating the accuracy of imputation methods in a five-way admixed population. Frontiers in Genetics. 2019;10:34. doi: 10.3389/fgene.2019.00034.
- Schurz H, Naranbhai V, Yates TA, Gilchrist JJ, Parks T, Dodd PJ, Möller M, Hoal EG, Morris AP, Hill AVS. International tuberculosis host genetics consortium. eLife. 2024;13:84394. doi: 10.7554/eLife.84394.
- Selvaraj P, Raghavan S, Swaminathan S, Alagarasu K, Narendran G, Narayanan PR. HLA-DQB1 and -DPB1 allele profile in HIV infected patients with and without pulmonary tuberculosis of south India. Infection, Genetics and Evolution. 2008;8:664–671. doi: 10.1016/j.meegid.2008.06.005.
- Skol AD, Scott LJ, Abecasis GR, Boehnke M. Joint analysis is more efficient than replication-based analysis for two-stage genome-wide association studies. Nature Genetics. 2006;38:209–213. doi: 10.1038/ng1706.
- Smith MH, Myrick JW, Oyageshio O, Uren C, Saayman J, Boolay S, van der Westhuizen L, Werely C, Möller M, Henn BM, Reynolds AW. Epidemiological correlates of overweight and obesity in the Northern Cape Province, South Africa. PeerJ. 2023;11:e14723. doi: 10.7717/peerj.14723.
- Sveinbjornsson G, Gudbjartsson DF, Halldorsson BV, Kristinsson KG, Gottfredsson M, Barrett JC, Gudmundsson LJ, Blondal K, Gylfason A, Gudjonsson SA, Helgadottir HT, Jonasdottir A, Jonasdottir A, Karason A, Kardum LB, Knežević J, Kristjansson H, Kristjansson M, Love A, Luo Y, Magnusson OT, Sulem P, Kong A, Masson G, Thorsteinsdottir U, Dembic Z, Nejentsev S, Blondal T, Jonsdottir I, Stefansson K. HLA class II sequence variants influence tuberculosis risk in populations of European ancestry. Nature Genetics. 2016;48:318–322. doi: 10.1038/ng.3498.
- Swart Y, van Eeden G, Sparks A, Uren C, Möller M. Prospective avenues for human population genomics and disease mapping in southern Africa. Molecular Genetics and Genomics. 2020;295:1079–1089. doi: 10.1007/s00438-020-01684-8.
- Swart Y, Uren C, van Helden PD, Hoal EG, Möller M. Local ancestry adjusted allelic association analysis robustly captures tuberculosis susceptibility loci. Frontiers in Genetics. 2021;12:716558. doi: 10.3389/fgene.2021.716558.
- Swart Y, van Eeden G, Uren C, van der Spuy G, Tromp G, Möller M. Cis-eQTL mapping of TB-T2D comorbidity elucidates the involvement of African ancestry in TB susceptibility. bioRxiv. 2022a doi: 10.1101/2022.10.19.512814.
- Swart Y, van Eeden G, Uren C, van der Spuy G, Tromp G, Möller M. GWAS in the Southern African context. bioRxiv. 2022b doi: 10.1101/2022.02.16.480704.
- Swart Y. LAAA-model. Software Heritage. 2025. swh:1:rev:2026dd400d02cdea6de32d578bdab657df8d251b. https://archive.softwareheritage.org/swh:1:dir:89b4ab6d74cb1bf1f1601e077db7b8e3154177d2;origin=https://github.com/TBHostGenetics/LAAA-model;visit=swh:1:snp:863ae787ed6e43134d24b9b5a2fed72c3f7a52c9;anchor=swh:1:rev:2026dd400d02cdea6de32d578bdab657df8d251b
- Ugarte-Gil C, Alisjahbana B, Ronacher K, Riza AL, Koesoemadinata RC, Malherbe ST, Cioboata R, Llontop JC, Kleynhans L, Lopez S, Santoso P, Marius C, Villaizan K, Ruslami R, Walzl G, Panduru NM, Dockrell HM, Hill PC, Mc Allister S, Pearson F, Moore DAJ, Critchley JA, van Crevel R. Diabetes mellitus among pulmonary tuberculosis patients from 4 tuberculosis-endemic countries: the TANDEM study. Clinical Infectious Diseases. 2020;70:780–788. doi: 10.1093/cid/ciz284.
- Uren C, Kim M, Martin AR, Bobo D, Gignoux CR, van Helden PD, Möller M, Hoal EG, Henn BM. Fine-scale human population structure in Southern Africa reflects ecogeographic boundaries. Genetics. 2016;204:303–314. doi: 10.1534/genetics.116.187369.
- Uren C, Henn BM, Franke A, Wittig M, van Helden PD, Hoal EG, Möller M. A post-GWAS analysis of predicted regulatory variants and tuberculosis susceptibility. PLOS ONE. 2017;12:e0174738. doi: 10.1371/journal.pone.0174738.
- Uren C, Hoal EG, Möller M. Putting RFMix and ADMIXTURE to the test in a complex admixed population. BMC Genetics. 2020;21:40. doi: 10.1186/s12863-020-00845-3.
- Uren C, Hoal EG, Möller M. Mycobacterium tuberculosis complex and human coadaptation: a two-way street complicating host susceptibility to TB. Human Molecular Genetics. 2021;30:R146–R153. doi: 10.1093/hmg/ddaa254.
- Verhein KC, Vellers HL, Kleeberger SR. Inter-individual variation in health and disease associated with pulmonary infectious agents. Mammalian Genome. 2018;29:38–47. doi: 10.1007/s00335-018-9733-z.
- Witek J, Mohiuddin SS. Biochemistry, Pseudogenes in StatPearls. StatPearls Publishing; 2024.
- Wong LP, Ong RTH, Poh WT, Liu X, Chen P, Li R, Lam KKY, Pillai NE, Sim KS, Xu H, Sim NL, Teo SM, Foo JN, Tan LWL, Lim Y, Koo SH, Gan LSH, Cheng CY, Wee S, Yap EPH, Ng PC, Lim WY, Soong R, Wenk MR, Aung T, Wong TY, Khor CC, Little P, Chia KS, Teo YY. Deep whole-genome sequencing of 100 southeast Asian Malays. American Journal of Human Genetics. 2013;92:52–66. doi: 10.1016/j.ajhg.2012.12.005.
- World Health Organization. Global tuberculosis report 2023 (World Health Organization, Ed.; p. 75). World Health Organization. 2023. [August 4, 2023]. https://www.who.int/publications/i/item/9789240083851
- Zaidi SMA, Coussens AK, Seddon JA, Kredo T, Warner D, Houben R, Esmail H. Beyond latent and active tuberculosis: a scoping review of conceptual frameworks. EClinicalMedicine. 2023;66:102332. doi: 10.1016/j.eclinm.2023.102332.
- Zheng R, Li Z, He F, Liu H, Chen J, Chen J, Xie X, Zhou J, Chen H, Wu X, Wu J, Chen B, Liu Y, Cui H, Fan L, Sha W, Liu Y, Wang J, Huang X, Zhang L, Xu F, Wang J, Feng Y, Qin L, Yang H, Liu Z, Cui Z, Liu F, Chen X, Gao S, Sun S, Shi Y, Ge B. Genome-wide association study identifies two risk loci for tuberculosis in Han Chinese. Nature Communications. 2018;9:4072. doi: 10.1038/s41467-018-06539-w.
- Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, Bastarache LA, Wei WQ, Denny JC, Lin M, Hveem K, Kang HM, Abecasis GR, Willer CJ, Lee S. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nature Genetics. 2018;50:1335–1341. doi: 10.1038/s41588-018-0184-y.





