Objective:
Human leucocyte antigen (HLA) class I alleles are the main host genetic factors involved in controlling HIV-1 viral load (VL). Nevertheless, HLA diversity has proven a significant challenge in association studies. We assessed how accounting for binding affinities of HLA class I alleles to HIV-1 peptides facilitate association testing of HLA with HIV-1 VL in a heterogeneous cohort.
Design:
Cohort from the Strategic Timing of AntiRetroviral Treatment (START) study.
Methods:
We imputed HLA class I alleles from host genetic data (2546 HIV+ participants) and sampled immunopeptidomes from 2079 host-paired viral genomes (targeted amplicon sequencing). We predicted HLA class I binding affinities to HIV-1 and unspecific peptides, grouping alleles into functional clusters through consensus clustering. These functional HLA class I clusters were used to test associations with HIV VL.
Results:
We identified four clades totaling 30 HLA alleles accounting for 11.4% variability in VL. We highlight HLA-B∗57:01 and B∗57:03 as functionally similar but yet overrepresented in distinct ethnic groups, showing when combined a protective association with HIV+ VL (log, β −0.25; adj. P-value < 0.05). We further demonstrate only a slight power reduction when using unspecific immunopeptidomes, facilitating the use of the inferred functional HLA groups in other studies
Conclusion:
The outlined computational approach provides a robust and efficient way to incorporate HLA function and peptide diversity, aiding clinical association studies in heterogeneous cohorts. To facilitate access to the proposed methods and results we provide an interactive application for exploring data.
Keywords: human leucocyte antigen, immunopeptidomes, machine learning, major histocompatibility complex, viral load
Introduction
The human leucocyte antigen (HLA) is a critical component of the host immune response. HLA Class I alleles mediate the antiviral response through the presentation of intracellular viral peptides for recognition by cytotoxic T cells. This mechanism is critical to the host's defense against diverse pathogens, which is why it is among the most genetically diverse regions in the human genome as evidenced by its association with our variable response to infectious disease [1]. In the context of HIV, HLA alleles are the canonical host genetic factors associated with viral load (VL) [2], altogether contributing up to 12% of its variability [3]. Viral diversity, on the other hand, is thought to explain two to four times as much (20−46%) [4]. The combined relevance of both host and viral genomics suggests the necessity for the simultaneous analysis of both in the context of infectious disease [5]. Different approaches have been used to study this interaction such as genome-to-genome [6] or peptidome-wide associations [7]. In addition, when considering the interplay between host HLA alleles and viral peptides, their association could be explained functionally in terms of epitope binding and presentation.
Analysis of genetic variance within the HLA region and its associations with clinical outcomes is challenged by population-dependent distributions and the diversity of HLA haplotypes. Reaching statistical power within this region is difficult and while accounting for the effects of this important immunological region is necessary, it is often ignored altogether. Yet, the function of HLA is conserved as a core component of the immune response [8,9]. This suggests there may be shared functional features within and between populations, even in the presence of genotypic plasticity. Understanding the structure of this mutual information could prove relevant for the study of host−pathogen interactions where high mutation rates are observed such as in HIV [10].
Immunopeptidomes have been used to estimate functional similarities between HLA alleles in what was initially denominated HLA supertypes [11]. These functional groups are based on the propensity of various alleles to bind similar sets of peptides. The algorithms used to estimate these binding profiles have gradually improved over time [12,13]. However, the consideration of HLA functional groups in the context of Genome-Wide Association Studies (GWAS) has largely been overlooked [14].
In this study, we considered the ability of HLA proteins to functionally bind peptides leveraging the cost-effectiveness of high-throughput genotyping as opposed to directly assaying HLA function. We provide a framework based on state-of-the-art computational methods for the study of predicted immunopeptidomes and functional HLA groups in the context of HIV-1 infection. In doing so, we demonstrate the increased statistical power that is gained by moving away from a purely genetic approach to a functional one with implications for studying the immune response either directly or as a confounder. We processed HIV immunopeptidomes that incorporate intra-host viral diversity and imputed HLA alleles from the same geographically diverse cohort of people with HIV (PWH) and antiretroviral therapy (ART) naïve participants from the Strategic Timing of AntiRetroviral Treatment (START) trial [15]. Interactive results of this work and complementary data are available via a web application (https://persimune-health-informatics.shinyapps.io/PAW2022Zucco__HLA_HIV_INSIGHT/).
Methods
Ethics
Host and viral samples in this study were extracted and analyzed from participants in the START clinical trial (NCT00867048) [15], conducted by the International Network for Strategic Initiatives in Global HIV Trials (INSIGHT) and the Community Programs for Clinical Research on AIDS (CPCRA). Written consent for the study and genetic analyses were obtained from the participants and approved by participants’ site ethics review committees.
Human leucocyte antigen class I alleles
Imputation of HLA class I alleles (HLA-A, HLA-B and HLA-C) for 2546 genotyped ART-naive, HIV+ participants was performed with HIBAG at 4-digit resolution [16]. Full details of the imputation process and quality control are described in previous publications [2,17]. A multiethnic pre-trained model was used for imputation with a minimum out-of-bag accuracy of over 90% for all loci. HLA diversity was measured by the inverse Simpson index based on the HLA allele frequencies per locus and country. This index represents the complement of the probability that two participants would have the same HLA allele for a selected locus in a country.
Immunopeptidomes and binding affinity prediction
Plasma samples were obtained for a subset of 2079 ART-naïve, HIV+ participants from 21 countries enrolled in the START study. Viral RNA was sequenced, paired-end, using Illumina MiSeq and covered two amplicons in the HXB2 genome positioned 1485–5058 and 5967–9517. The sample preparation, library preparation, sequencing procedure, and detailed quality controls have been described previously [18]. Raw reads were fragmented into 27-mers using KAT [19] and those with a count higher than 1 were translated into peptide sequences of nine amino acids length to fit the mean length of HLA Class I epitopes. Peptides were mapped to 10 major HIV proteins (Asp, Gag-Pol, Nef, Vpr, Vpu, gp160, Vif, Pr55, Rev, Tat) from NCBI RefSeq NC001802.1 (Supplementary file 1) using BLAST 2.8.1 (blastp-short). Hits with an E-value > 1E−05 were excluded to remove low-quality k-mers by considering exact matches. To compare HIV peptidomes with random sequences, a set of half-million 9-mers were generated by processing the same number of random protein sequences from Uniprot. Binding affinities for 268 class I HLA alleles to both HIV and random peptidomes using NetMHCpan 4.0 [20]. Three different immunopeptidome subsets were generated by selecting peptides from the top 10% binding affinities, 10% most variable peptides, and peptides that potentially bind to at least 10% of the alleles using <500 nmol/l as general threshold [12].
Consensus clustering
Hierarchical clustering was implemented using two different linkage functions. Average linkage was used for measuring relationships between HLA alleles represented by dissimilarity defined as cosine, correlation and Euclidean distances. For clustering based on Ward linkage, cosine and correlation distances were corrected by the square root to satisfy the triangular inequality necessary to operate in Euclidean space [21]. To generate an ensemble of clustering solutions, we employed consensus clustering to mitigate bias from the subset, distance metric, or chosen linkage function [22]. A consensus matrix (Cij) of size (n × n) was built where each element is the number of times an ith allele clustered together with a jth allele [23] at a varying total number of clusters selected (3–160). The consensus matrix was then processed by hierarchical clustering with average linkage after transforming the values into dissimilarity scores (1 – Cij). Clustering was performed in Python 3.7.1 using the Scipy library [24].
Statistical analyses
Associations of log10-transformed HIV VL with each node of the consensus tree were tested by linear regression and adjusted by sex, self-reported race, and country (Supplementary Table 1). HIV VL was measured once at study entry [2] and log10-transformed to approximate normality (Figure S3, Supplemental Digital Content). Tested HLA alleles had to be present in more than 10% of the participants. Multiple testing was controlled by a Benjamini–Hochberg procedure using a q-value <0.05 to identify associations. Analyses were performed in R v3.6.0 [25].
Data visualization and availability
Consensus clustering dendrograms and association coefficients were depicted using Interactive Tree of Life (iTOL) [26] and tanglegrams, which were generated by the dendextend R package [27]. We provide a flexible visualization of peptide-to-HLA binding profiles across viral proteins. Data downloads and access to supplementary materials are made available through the app (https://persimune-health-informatics.shinyapps.io/PAW2022Zucco__HLA_HIV_INSIGHT/).
Results
Baseline characteristics for the START cohort
We used baseline data and the genotypes of 2546 participants from the START trial. Next-generation sequencing of HIV samples was retrieved for a subset of 2079 participants. All participants included in the trial were asymptomatic HIV+ and ART-naive with two CD4+ cell counts >500/μl at least 14 days apart within 60 days of enrollment in the trial. Baseline characteristics for study participants can be found in Table 1.
Table 1.
Characteristics | Genotyped participants (n = 2546) |
Median age (IQR) (years) | 36 (29–45) |
Sex [n (%)] | |
Female | 511 (20.1) |
Male | 2035 (79.9) |
Race/ethnic group [n (%)] | |
Black | 577 (22.7) |
Hispanic | 498 (19.6) |
Asian | 26 (1.0) |
White | 1404 (55.2) |
Other | 41 (1.6) |
Geographical region [n (%)] | |
Africa | 343 (13.5) |
Asia | 0 (0) |
Australia and New Zealand | 96 (3.8) |
Europe and Israel | 1148 (45.1) |
Latin America | 499 (19.6) |
United States and Canada | 460 (18.1) |
Mode of HIV-infection [n (%)] | |
Sexual contact | |
MSM | 1633 (64.1) |
With a person of the opposite sex | 751 (29.5) |
Injection-drug use | 45 (1.8) |
Other | 117 (4.6) |
Median time since HIV diagnosis (IQR) (years) | 1 (0–3) |
ART-naive [n (%)] | 2546 (100) |
Median CD4+ T-cell count (IQR) (cells/μl) | 651 (585–759) |
Median HIV viral load (IQR) (copies/ml) | 14 833 (3503–46 000) |
IQR, interquartile range; MSM, men who have sex with men; ART, antiretroviral treatment.
Prediction of patient-derived HIV immunopeptidomes
Our approach for immunopeptidome generation initially yielded 9.88 × 107 peptide 9-mers. After filtering and mapping with BLAST to 10 major HIV proteins, a total of 173 792 nine-mers were considered. This accounted for a 99.22% coverage across the reference proteome. Among the final list, we observed 136 best-defined CTL/CD8+ epitopes from Los Alamos (version 2019-11-20; Supplementary file 2).
Data exploration
To facilitate the exploration of the results, a web application was developed (Fig. 1) providing tools to interactively navigate the global and local diversity of imputed HLA alleles, HIV subtype-derived peptides, and their corresponding binding profiles. Shannon and Simpson diversity indices can be directly visualized on the world map highlighting the geographical diversity of the cohort. This interactive tool facilitates the further examination of specific HLA allele, HIV subtype, or peptide frequencies for the entire cohort and options to explore details at a country level.
Higher human leucocyte antigen-A diversity is associated with a lower mean viral load per country
The diversity of HLA Class I alleles measured in terms of the inverse Simpson index was calculated using the HLA allele frequencies per locus and country for comparison. A negative univariate correlation (R = −0.65, P = 0.0018) between HLA-A diversity and mean HIV log10(viral load) per country was found. This indicates that countries in our cohort with a high diversity of HLA-A alleles would show lower levels of HIV VL in the population represented in terms of mean log10(VL). No significant correlation was observed for HLA-B and HLA-C, respectively (Fig. 2).
Consensus clustering of HIV-derived immunopeptidomes improves statistical power compared to random immunopeptidomes
Consensus trees were generated based on a half-million random peptides from Uniprot and the predicted HIV immunopeptidome under the same methodology. The correlation between both trees was 0.985 indicating high similarity. Minor differences were found in allele distances (Figure S1, Supplemental Digital Content) with only a few major structural changes such as the displacement of B∗35:20 from a predominantly B∗15 node (unspecific immunopeptidome) to one dominated by B∗35 alleles (HIV immunopeptidome). Another two examples are HLA-B∗15:13 and B∗15:58 which moved from a node of predominantly HLA-C alleles (unspecific immunopeptidome) to a node of HLA-B alleles (HIV immunopeptidome). These small differences, when combined, were translated into an increase in statistical power (Figure S2, Supplemental Digital Content). Overall, when assessed in a multivariate model including all HLA class I alleles as covariates the combined percentage of explained variance accounted for 11.44% of the VL after adjustment for sex, self-reported race, and country. Hence, our clusters of HLA function accounted for HIV-VL variance to a similar degree as reported by Bartha et al.[3] for genotypic evidence in a comparatively pure European cohort. This highlights that accounting for HLA function is sufficient to explain variability in VL in heterogeneous populations comparable to studies performed in homogeneous cohorts [3,7].
Human leucocyte antigen class I functional nodes associations with HIV-viral load
Associations between the HLA functional nodes and measurements of HIV-VL taken at study entry were tested using linear regression. Four nodes with a consistent effect size were associated with HIV-VL (Fig. 3). These nodes were composed of 30 HLA class I alleles of which 11 were observed in participants with measurable VL at study entry. Two functional nodes were associated with a lower VL, one group composed of HLA-B∗57:01, B∗58:01, B∗57:02, and B∗57:03 (β −0.25, q-value 7.02E−06) and the second group composed of a pair of HLA-C∗08 alleles, HLA-C∗08:04 and C∗08:01 (β−0.29, q-value 0.042). In contrast, two nodes showed an association with higher VL, one cluster composed of six HLA-B∗44 alleles, B∗44:05, B∗44:08, B∗44:04, B∗44:03, B∗44:02, B∗44:27 (β 0.15, q-value 0.003) and a mixed group composed of 16 alleles: B∗35:20, B∗35:16, B∗35:10, B∗35:43, B∗35:08, B∗35:19, B∗35:41, B∗35:01, B∗35:17, B∗35:05, B∗44:06, B∗56:03, B∗53:01, B∗15:08 and B∗15:11 (β 0.13, q-value 0.048). From these alleles, only two (HLA-B∗57:01 and B∗57:03) were associated with HIV-VL when tested at the individual allele level (Fig. 3).
Discussion
In this study, we implemented a computational approach based on consensus clustering of HLA alleles using predicted immunopeptidomes to explore associations of functional HLA groups in a geographically diverse cohort of PWH. We defined functional similarity as differences in epitope binding profiles between HLA alleles and performed HLA imputation on patients enrolled in the START study. We combined the host genetic information with viral genomics through the prediction of immunopeptidomes. Viral sequences derived from a subset of participants were processed into peptides to predict binding affinities to 268 HLA class I alleles which allowed us to generate distance matrices of HLA alleles. Consensus clustering allowed us to agglomerate HLA alleles into nodes by their functionality. Four nodes were found to be associated with HIV-VL, implicating alleles that could not be detected when performing independent allele-specific tests alone. These four nodes accounted for a total of 30 HLA alleles of which 11 were observed in our cohort.
The effects differed among the four nodes associated with HIV-VL. One node containing four well characterized alleles HLA-B∗57:01, B∗58:01, B∗57:02, and B∗57:03 showed an association with lower HIV-VL indicating a protective effect. This effect has been previously reported on an individual allele level [28]; we confirmed and observed shared binding affinity profiles that now indicate their common mode of function. While having a similar effect in HIV-VL, these four alleles are represented at different frequencies between populations. For example, HLA-B∗57:01 carriers are of European descent compared to B∗57:03 which is predominant among those of African descent [2]. The similarity in function was captured by our analysis despite the differences in allele frequencies. A second protective node contained two alleles, HLA-C∗08:04 and C∗08:01, which have been reported in an admixed population and detected after adjusting for multiple factors such as CD4+ and CD8+ counts [29]. We showed that such association could not only be detected in our study but that functional clustering provided a clear and simple method to do it with greater mechanistic insight. We report a node of six HLA-B∗44 alleles, in this case, associated with a higher VL. This finding contradicts a previous study in a cohort of participants of Chinese descent [30] in which a few alleles of the cluster were found to have a protective effect. Given the large effect of viral diversity on VL, this could be a result of regional adaptation that could not be accounted for in our study due to the lack of participants from the same region. While this suggests a weakness in our study due to the underrepresentation of some ethnicities, divergence in our results from studies on homogeneous populations could be a useful tool to detect localized effects. However, data from a cohort with sufficient Chinese representation is needed to confirm the findings.
Defining discrete nodes of HLA alleles is challenging as distances between alleles are based on predicted binding affinities and measured continuously. Therefore, metrics to define a fixed number of clusters failed to suggest a robust threshold for cutting the trees obtained by consensus clustering. For this reason, the final HIV-associated groups of alleles were determined by assessing the trees hierarchically from leaves (individual HLA alleles) to roots (broad HLA nodes), thus optimizing groups based on their association with VL when the combination of alleles in all child nodes supported it. In this way, the methodology implemented provides an advantage to traditional HLA association studies by adaptively increasing the signal of functional groups allowing to uncover associations that could not be found at the individual allele level due to the number of statistical comparisons or sample size [14]. Grouping or clustering of variables to increase statistical power and limit multiple testing is common practice in epidemiological studies. The functional clustering presented here, however, is based on the assumption that the majority of HLA functionality is mediated by peptide binding affinities. This serves to facilitate study design by simplifying decisions on the size and diversity of the cohort while allowing a biological interpretation of the results. For example, it enables the inference of the effect of unobserved HLA alleles on HIV-VL given their common function to those observed. These are notable challenges across clinical studies attempting to measure the effect on the immune response. Altogether, the functional clustering approach presented is neither specific to HIV nor VL and may be transferred to study HLAs and outcomes in other host-pathogen interactions.
Similar computational approaches were proposed to explore HLA class I molecules and their role in HIV-1 infection: These studies mainly focused on genome-to-genome approaches [6], techniques to find new epitopes in European participants using a peptidome approach [7] or to explore the interactions of HLA class I molecules to other ligands [31]. In contrast, we focused on the host factors, the HLA class I alleles, accounting for the high diversity of the cohort and inclusion of a larger number of viral sequences to predict immunopeptidomes and expand on genome-to-genome analyses previously performed in our same cohort [18]. While implementations of HLA clustering in the context of HIV-1 have been developed based on a Bayesian framework [13] these were limited by the diversity of their data and assumptions of the clustering parameters. Extra considerations must be taken when clustering predicted immunopeptidomes to avoid bias introduced by predominantly low binding affinity predictions, which can affect the reliability of the computed distances between HLA alleles. To solve this, we used consensus clustering by computing different subsets of immunopeptidomes and avoiding bias from non-binders when calculating distances between HLA class I alleles under multiple linkage functions. The ensemble of implemented techniques avoids bias due to high dimensionality and has shown success when applied to clustering of biological data [22].
We propose diverse methodological improvements in the analysis of host-viral genetics. From the host genetics perspective, we showed that imputed HLA class I alleles can be used to calculate HLA diversity without requiring full HLA sequences. We found an inverse correlation between HLA diversity in terms of the Simpson index of HLA-A alleles and HIV VL per country. Further assessment of these results is necessary since the reported correlations could be influenced by potential cofounders [32]. From the viral genetics perspective, we incorporated viral sequences using a fast k-mer approach to avoid generating a consensus sequence, especially for pathogens of high genetic variability and to take into account the intra-host viral diversity [33]. When combining host and viral information through predicted binding affinities, we showed that the differences between clustering based on specific HIV and random immunopeptidomes facilitate a slightly higher resolution of the trees. We suggest that the unspecific HLA clusters from the latter approach could be used for other infectious phenotypes. This is likely due to the diversity of viral genomes present in our dataset, and this may not be the case for smaller, more homogeneous cohorts. However, it suggests that our immunopeptidomes could work as a proxy for other similarly diverse studies that do not have access to paired viral genomes. Alternatively, new immunopeptidomes may be proposed based entirely on synthetic data at the risk of increased type-II error.
Although we demonstrate the utility of the implemented methodology on HIV, there is also a clear road to extend it to the analysis of other pathogens eliciting class-I immune responses as well as the opportunity to analyze the variation of HLA structure in populations and within the context of multiple pathogens. Another possible application could be as a screening tool for the most effective peptides either for targeting populations carrying specific allelic distributions or maximizing coverage of vaccines across geographies. Finally, we propose that the focused consideration of HLA function in terms of peptide binding affinities provides a promising approach to inform modern vaccine design. This would be relevant in the advent of mRNA vaccines coming to fruition after decades of research [34] as they take advantage of proper peptide selection. As an example, the need for a new class of tools that account for both variable immunogenic coverage and clinically relevant mutations has been highlighted during the recent SARS-CoV-2 pandemic [35]. To facilitate open research, the results of this work are made available for common use via a web application (https://persimune-health-informatics.shinyapps.io/PAW2022Zucco__HLA_HIV_INSIGHT/)
Acknowledgements
We would like to thank all participants in the START trial and all trial investigators. See N Engl J Med 2015;373:705–807 [15] for the complete list of START investigators.
Sources of funding: This study was supported by the Danish National Research Foundation (DNRF126) and the National Institute of Allergy and Infectious Diseases, Division of Clinical Research and Division of AIDS (National Institutes of Health grants UM1-AI068641, UM1-AI120197 and U01-AI136780). The START trial was supported by the National Institute of Allergy and Infectious Diseases, National Institutes of Health Clinical Center, National Cancer Institute, National Heart, Lung, and Blood Institute, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institute of Mental Health, National Institute of Neurological Disorders and Stroke, National Institute of Arthritis and Musculoskeletal and Skin Diseases, Agence Nationale de Recherches sur le SIDA et les Hépatites Virales (France), National Health and Medical Research Council (Australia), National Research Foundation (Denmark), Bundes Ministerium für Bildung und Forschung (Germany), European AIDS Treatment Network, Medical Research Council (United Kingdom), National Institute for Health Research, National Health Service (United Kingdom), and the University of Minnesota. Antiretroviral drugs were donated to the central drug repository by AbbVie, Bristol-Myers Squibb, Gilead Sciences, GlaxoSmithKline/ViiV Healthcare, Janssen Scientific Affairs, and Merck.
Author contributions: A.G.Z., J.D.L. and C.R.M. conceived the study. A.G.Z., M.B., M.G. and C.E. prepared the data. A.G.Z. performed the statistical and computational analyses. A.G.Z. and C.R.M. drafted the manuscript. All authors contributed to data interpretation, critically revised the manuscript, and approved the final version.
Conflicts of interest
There are no conflicts of interest.
Supplementary Material
Footnotes
Supplemental digital content is available for this article.
References
- 1.Tian C, Hromatka BS, Kiefer AK, Eriksson N, Noble SM, Tung JY, et al. Genome-wide association and HLA region fine-mapping studies identify susceptibility loci for multiple common infections. Nat Commun 2017; 8:599. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2.Ekenberg C, Tang MH, Zucco AG, Murray DD, MacPherson CR, Hu X, et al. Association between single-nucleotide polymorphisms in HLA alleles and human immunodeficiency virus type 1 viral load in demographically diverse, antiretroviral therapy–naive participants from the Strategic Timing of AntiRetroviral Treatment Trial. J Infect Dis 2019; 220:1325–1334. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Bartha I, McLaren PJ, Brumme C, Harrigan R, Telenti A, Fellay J. Estimating the respective contributions of human and viral genetic variation to HIV control. PLoS Comput Biol 2017; 13:e1005339. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Fraser C, Lythgoe K, Leventhal GE, Shirreff G, Hollingsworth TD, Alizon S, et al. Virulence and pathogenesis of HIV-1 infection: an evolutionary perspective. Science 2014; 343:1243727. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.McLaren PJ, Carrington M. The impact of host genetic variation on infection with HIV-1. Nat Immunol 2015; 16:577–583. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, John M, et al. A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control. eLife Sci 2013; 2:e01123. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Arora J, McLaren PJ, Chaturvedi N, Carrington M, Fellay J, Lenz TL. HIV peptidome-wide association study reveals patient-specific epitope repertoires associated with HIV control. PNAS 2019; 116:944–949. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Aflalo A, Boyle LH. Polymorphisms in MHC class I molecules influence their interactions with components of the antigen processing and presentation pathway. Int J Immunogenet 2021; 48:317–325. [DOI] [PubMed] [Google Scholar]
- 9.Buhler S, Nunes JM, Sanchez-Mazas A. HLA class I molecular variation and peptide-binding properties suggest a model of joint divergent asymmetric selection. Immunogenetics 2016; 68:401–416. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Carlson JM, Le AQ, Shahid A, Brumme ZL. HIV-1 adaptation to HLA: a window into virus–host immune interactions. Trends Microbiol 2015; 23:212–224. [DOI] [PubMed] [Google Scholar]
- 11.Sette A, Sidney J. HLA supertypes and supermotifs: a functional perspective on HLA polymorphism. Curr Opin Immunol 1998; 10:478–482. [DOI] [PubMed] [Google Scholar]
- 12.Thomsen M, Lundegaard C, Buus S, Lund O, Nielsen M. MHCcluster, a method for functional clustering of MHC molecules. Immunogenetics 2013; 65:655–665. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.van Dorp C, Kesmir C. Estimating HLA disease associations using similarity trees. bioRxiv 2018; 4:408302. [Google Scholar]
- 14.Kennedy AE, Ozbek U, Dorak MT. What has GWAS done for HLA and disease associations?. Int J Immunogenet 2017; 44:195–211. [DOI] [PubMed] [Google Scholar]
- 15.Insight Start Study Group. Initiation of antiretroviral therapy in early asymptomatic HIV infection. N Engl J Med 2015; 373:795–807. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 16.Zheng X, Shen J, Cox C, Wakefield JC, Ehm MG, Nelson MR, et al. HIBAG—HLA genotype imputation with attribute bagging. Pharmacogenomics J 2014; 14:192–200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Ekenberg C, Reekie J, Zucco AG, Murray DD, Sharma S, Macpherson CR, et al. The association of human leukocyte antigen alleles with clinical disease progression in HIV-positive cohorts with varied treatment strategies. AIDS 2021; 35:783–789. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Gabrielaite M, Bennedbæk M, Zucco AG, Ekenberg C, Murray DD, Kan VL, et al. Human immunotypes impose selection on viral genotypes through viral epitope specificity. J Infect Dis 2021; (224):2053–2063. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Mapleson D, Garcia Accinelli G, Kettleborough G, Wright J, Clavijo BJ. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 2017; 33:574–576. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: improved peptide–MHC Class I interaction predictions integrating eluted ligand and peptide binding affinity data. J Immunol 2017; 199:3360–3368. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21.van Dongen S, Enright AJ. Metric distances derived from cosine similarity and Pearson and Spearman correlations. arXiv preprint arXiv 2012; 1208.3145. [Google Scholar]
- 22.Ronan T, Qi Z, Naegle KM. Avoiding common pitfalls when clustering biological data. Sci Signal 2016; 9:re6–re16. [DOI] [PubMed] [Google Scholar]
- 23.Strehl A, Ghosh J. Cluster ensembles — a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 2003; 3:583–617. [Google Scholar]
- 24.Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020; 17:261–272. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25.R Core Team, R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2019. Available at: https://www.R-project.org/. [Google Scholar]
- 26.Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res 2019; 47:W256–W259. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 27.Galili T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics 2015; 31:3718–3720. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Goulder PJR, Walker BD. HIV and HLA Class I: an evolving relationship. Immunity 2012; 37:426–440. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 29.Valenzuela-Ponce H, Garrido-Rodríguez D, Soto-Nava M, García-Téllez T, Escamilla-Gómez T, et al. Novel HLA class I associations with HIV-1 control in a unique genetically admixed population. Sci Rep 2018; 8:6111. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 30.Zhang X, Huang X, Xia W, Li W, Zhang T, Wu H, et al. HLA-B∗44 is associated with a lower viral set point and slow CD4 decline in a cohort of chinese homosexual men acutely infected with HIV-1. Clin Vaccine Immunol 2013; 20:1048–1054. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 31.Debebe BJ, Boelen L, Lee JC, Thio CL, Astemborski J, Kirk G, et al. Identifying the immune interactions underlying HLA class I disease associations. eLife 2020; 9:e54558. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 32.Arora J, Pierini F, McLaren PJ, Carrington M, Fellay J, Lenz TL. HLA heterozygote advantage against HIV-1 is driven by quantitative and qualitative differences in HLA allele-specific peptide presentation. Mol Biol Evol 2020; 37:639–650. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 33.Fellay J, Pedergnana V. Exploring the interactions between the human and viral genomes. Hum Genet 2019; 139:777–781. [DOI] [PubMed] [Google Scholar]
- 34.Jackson NAC, Kester KE, Casimiro D, Gurunathan S, DeRosa F. The promise of mRNA vaccines: a biotech and industrial perspective. npj Vaccines 2020; 5:11. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 35.Liu G, Carter B, Gifford DK. Predicted cellular immunity population coverage gaps for SARS-CoV-2 subunit vaccines and their augmentation by compact peptide sets. Cell Syst 2021; 12:102–107e4. [DOI] [PMC free article] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.