ABSTRACT
Understanding tissue-based HIV-1 proviral population structure is important for improving treatment strategies for individuals with HIV-associated neurological disorders (HAND). Previous analyses have revealed HIV-1 envelope (env) population structure between brain and peripheral tissues as well as Env functional differences, especially in individuals with HAND. Furthermore, population structure has been detected among different anatomical locations in the brain itself, although such patterns are inconsistent across individuals and less strongly associated with the presence/absence of HAND. Here, we utilized the Pacific Biosciences single-molecule real-time (SMRT) high-throughput technology to generate thousands of sequences for each tissue, along with phylogenetic and distance-based analyses, to investigate env sequences from paired brain and spleen samples from eight individuals with/without HAND. To account for the high error rate associated with SMRT sequencing, we used a clustering approach to identify high-quality consensus sequences representative of ≥10 reads (“HQCS10”). In parallel, we characterized variable regions from nonclustered sequences to identify potential functional differences. We found evidence for significant population structure between brain and spleen tissues, as well as among brain tissues and within the same brain tissue, in individuals both with and without HAND. Variable region analysis showed differences in length and charge among brain and nonbrain tissues as well as within the brain, suggesting possible functional differences. Our results demonstrate the complexity of HIV-1 env structure/gene flow among tissues and support the concept that selective pressures in different tissue microenvironments drive viral evolution and adaptation.
IMPORTANCE Understanding the evolution of HIV-1 in the brain compared to other tissues is important for improving treatment strategies for individuals with HIV-associated neurological disorders (HAND). We utilized high-throughput sequencing technology to generate thousands of full-length env sequences from paired brain and spleen samples from eight individuals with/without HAND. We found significant viral population structure for participants both with and without HAND, providing robust evidence for the brain as a compartmentalized tissue and potentially a viral reservoir. We also found striking genetic differences between virus populations, even from the same tissue, suggesting the potential for functional differences and the possibility for multiple evolutionary pathways that result in similar tropisms and/or other tissue-adapted characteristics. Our results demonstrate the complexity of viral population structure within the brain and suggest that analysis of peripheral blood samples alone may not be fully informative with respect to improving strategies to treat or eradicate HIV-1.
KEYWORDS: compartmentalization, phylogenetics, reservoir
INTRODUCTION
Neurological disorders are common in untreated people living with HIV-1 (PLWH), with clinical manifestations ranging from asymptomatic neurocognitive impairment or mild neurocognitive disorders to HIV-1-associated dementia (HAD) (1). The detection of higher levels of HIV-1 proviral DNA in the brains of individuals with HIV-associated neurological disorders (HAND) than in asymptomatic individuals (2) suggests a direct role for HIV-1 replication in the development of HIV-1-related neurological disorders. With the common use of suppressive antiretroviral therapy (ART) regimens, severe neurocognitive disorders are now less common, yet up to 50% of PLWH on suppressive ART experience some form of HAND (1).
Understanding tissue-based HIV-1 proviral population structure is important for improving treatment strategies for individuals with HAND and possibly preventing the occurrence of HAND from the outset. Previous analyses, including our own, have revealed HIV-1 envelope (env) population structure as well as Env functional differences in brain tissue compared with peripheral tissues, especially in individuals with neurological disorders (2–6). In addition, population structure has been detected among different anatomical locations in the brain itself, although such patterns are inconsistent across individuals and less strongly associated with the presence/absence of HAND (3, 6–9).
Recent comparative studies of HIV-1 evolution in brain and immune tissues have typically used single-genome analysis (SGA), which combines endpoint dilution and Sanger sequencing, to directly amplify and sequence single HIV-1 env genomes (2, 5, 10, 11). This approach reduces the chance of experimentally resampling HIV-1 sequences amplified during PCR or generated by PCR-mediated recombination (12, 13); moreover, PCR-amplified env can also be expressed on pseudoviruses to investigate Env cellular tropism and entry phenotypes (2, 5). However, only a small number of variants are feasibly generated from each sample by SGA, which may limit the evaluation of the full range of env present.
We have previously used an ultradeep sequencing approach (single-molecule real-time [SMRT] sequencing) to generate tens of thousands of full-length env sequences from multisite autopsy (blood, brain, lymph node, and gastrointestinal [GI] tract) samples from a single individual with undetectable plasma HIV-1 RNA on ART, according to the methods outlined previously by Laird Smith and colleagues (14). Analyses demonstrated tight compartmentalization of env sequences and macrophage tropism in brain tissues compared to samples from tissues outside the brain (6). Here, we have applied the same ultradeep sequencing approach, along with phylogenetic tree-based and distance-based analyses, to investigate env sequences from paired brain and immune tissue (spleen) samples from eight individuals with and without HAND. The resulting analyses allow a more thorough characterization of the complexity of HIV-1 population structure and variant exchange across and between brain and immune tissues and support the concept that selective pressures in different tissue microenvironments drive the evolution and adaptation of HIV-1 envelope.
RESULTS
Cohort.
Clinical and laboratory findings on all individuals studied are summarized in Table 1, including neurological disease status, duration of infection, CD4 count, viral load (plasma and cerebrospinal fluid [CSF]), and drug therapy for each subject. Four individuals had a diagnosis of HAND, one had a diagnosis of minor cognitive motor disorder, and two did not have evidence of neurologic disorders related to HIV. Clinical classification was not available for one individual. For all eight study participants, at least one section of frontal lobe and spleen tissues was available. For three individuals (CE116, CE161, and 10017), two different sections of frontal lobe specimens were available, and for one individual (7766), two different sections of cerebellum were available.
TABLE 1.
Study participant characteristics^e
| Study participant | HIV-associated neurological disorder^a (no. of days before death) | Duration of infection (yrs) | CD4 count (cells/mm³) (no. of days before death) | Plasma viral load (no. of days before death) | Cerebrospinal fluid viral load^b | Antiretroviral therapy(ies)^c (no. of days before death) |
|---|---|---|---|---|---|---|
| 6568 | Y (163) | 17 | 77 (79) | >750,000 (402) | ND | EFV (163) |
| CE125 | N (0) | 10 | 218 (1,230) | ND | ND | 3TC, IDV, D4T (ND) |
| 10-12 | ND | ND | ND | ND | ND | 3TC, AZT (ND) |
| CA110 | Y (0) | 21 | 21 (60) | 198,957 (60) | 2,268 | ND |
| 10017 | Y (191) | 9 | 7 (319) | 389,120 (381) | ND | 3TC, D4T, SQV, ZDV (ND) |
| CE116 | MCMD^d (63) | 13 | 80 (63) | 342,386 (63) | 856,342 (0) | D4T, DDI, NFV (63) |
| CE161 | N (192) | 13 | 11 (192) | 246,000 (192) | 2,850 | 3TC, D4T, KTA, TFV (192) |
| 7766 | Y (0) | 12 | 43 (21) | 1,843 (20) | ND | 3TC, ABC, EFV (22) |
^a Y, yes; N, no.
^b Cerebrospinal fluid RNA values at the time of death.
^c 3TC, lamivudine; ABC, abacavir; D4T, stavudine; DDI, didanosine; EFV, efavirenz; KTA, lopinavir-ritonavir (Kaletra); TFV, tenofovir; SQV, saquinavir; ZDV, zidovudine.
^d MCMD, minor cognitive motor disorder.
^e ND, no data.
Genetic sequences.
A total of 742,980 reads were obtained for 22 tissue sections obtained from eight study participants (see Table S1 in the supplemental material). Each tissue section resulted in a median of 23,753 reads (range, 9,393 to 99,273). These original reads were processed into alignments, resulting in a total of 707,975 aligned reads, with a median of 23,327 aligned reads and a range of 8,857 to 95,286 per tissue section. These aligned reads were condensed into a total of 42,644 high-quality consensus sequences (HQCS), with a median of 1,163 HQCS and a range of 21 to 7,773 per tissue section. Finally, a total of 867 HQCS representative of ≥10 reads (HQCS10) were generated, with a median of 43 HQCS10 and a range of 1 to 105 HQCS10 per tissue section.
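The final filtering step described above (keeping only consensus sequences that represent ≥10 reads) can be illustrated with a minimal sketch. It assumes that each read has already been assigned to a cluster by an upstream clustering step that is not shown here; the function name `hqcs10` and the `read -> cluster` mapping are hypothetical and are not taken from the pipeline used in this study.

```python
from collections import Counter

def hqcs10(cluster_of_read, min_reads=10):
    """Return {cluster_id: read_count} for clusters supported by >= min_reads reads.

    `cluster_of_read` is a hypothetical mapping of read name -> cluster id,
    assumed to come from an upstream clustering step.
    """
    sizes = Counter(cluster_of_read.values())
    return {cid: n for cid, n in sizes.items() if n >= min_reads}

# Toy example: 25 reads assigned to three clusters of sizes 12, 10, and 3.
reads = {f"read{i}": "c1" for i in range(12)}
reads.update({f"read{i}": "c2" for i in range(12, 22)})
reads.update({f"read{i}": "c3" for i in range(22, 25)})

kept = hqcs10(reads)  # only c1 and c2 meet the >=10-read threshold
```

A threshold of this kind trades sensitivity for reliability: rare variants are discarded, which is consistent with the sharp drop from HQCS to HQCS10 counts reported above.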
Overall, the number of HQCS10 for spleen among study participants was fairly consistent and ranged between 31 and 105 (Fig. 1). On the other hand, the number of HQCS10 in frontal lobe from study participants ranged between 1 and 73. For the three study participants from whom two separate frontal lobe sections were analyzed (participants CE116, CE161, and 10017), the number of HQCS10 was highly variable, with 2-fold to 69-fold differences between sections. Similarly, there was a 10-fold difference in the number of HQCS between the two cerebellar tissue sections studied from a single individual (7766). The relative number of HQCS10 between brain and spleen was also variable among study participants.
FIG 1.
Numbers of sequences (y axis) for the aligned, high-quality consensus sequence (HQCS), and HQCS10 data sets. The height of the bar is proportional to the number of sequences. Tissues are colored according to the key (FL #1, frontal lobe section 1; FL #2, frontal lobe section 2, CB #1, cerebellum section 1; CB #2, cerebellum section 2; OC, occipital lobe; SP, spleen).
The number of spleen HQCS among study participants was more variable than the number of spleen HQCS10 (range, 548 to 7,773). For the brain HQCS, the range was even greater (21 to 4,383). For the frontal lobe, there was a modest but significant correlation between the numbers of HQCS and HQCS10 (R² = 0.56; P = 0.008), which increased when a single outlier (participant CE116, frontal lobe section 2) was removed (R² = 0.92; P < 0.001). On the other hand, there was no significant correlation found between the numbers of HQCS and HQCS10 for spleen. There was also no significant correlation between the number of aligned sequences and either HQCS or HQCS10 for either spleen or frontal lobe.
Phylogenies.
An initial tree was inferred using the HQCS10 from all study participants to ensure no cross-contamination. A few HQCS (each representing <100 sequences) were identified as cross-contamination and removed from further analysis. All study participants were separated from each other with 100% bootstrap support (Fig. 2). Trees were then inferred for each study participant individually. Two representative trees are shown in Fig. 3 and 4; the remaining trees are shown in Fig. S1 to S6. The patterns in the phylogenies for all eight study participants are described in Table 2.
FIG 2.
Maximum likelihood phylogeny of HQCS10 from all study participants (n = 8). Branches are scaled in substitutions per site according to the scale on the left. Symbols represent HQCS10 variants and are colored according to the tissue of origin (red, frontal lobe; orange, occipital lobe; magenta, cerebellum; green, spleen). Colored portions of the wheel designate the study participant according to the label.
FIG 3.
Maximum likelihood phylogeny of study participant 7766. Branches are scaled in substitutions per site according to the scale on the left. Symbols represent HQCS10 variants and are scaled by the number of reads that they represent and colored according to the tissue of origin. Colored boxes designate the clades containing the cerebellum (green and pink) and frontal lobe (yellow) sequences. Asterisks indicate major branches with >70% bootstrap support.
FIG 4.

Maximum likelihood phylogeny of study participant 10017. Branches are scaled in substitutions per site according to the scale on the left. Symbols represent HQCS10 variants and are scaled by the number of reads that they represent and colored according to the tissue of origin. The colored box (yellow) designates the clades containing the frontal lobe sequences. Asterisks indicate major branches with >70% bootstrap support.
TABLE 2.
Phylogenetic patterns for eight study participants^a
| Study participant | Tissue sections | Separation (brain, spleen) | Separation (within brain, same tissue) | Separation (within brain, different tissues) |
|---|---|---|---|---|
| 6568 | FL, SP | FL single variant; groups within the SP clade | NA | NA |
| CE125 | FL, SP | FL single variant; groups within the SP clade | NA | NA |
| 10-12 | FL, SP | Yes (96%); spillover of FL into SP | NA | NA |
| CA110 | FL, OC, SP | Yes (OC + FL, 100%); FL single variant | NA | FL single variant within larger OC clade |
| 10017 | FL1, FL2, SP | Yes (100%); spillover of SP into FL1 | No; both FLs are interspersed | NA |
| CE116 | FL1, FL2, SP | Yes (FL1, 92%); no (FL2 and SP interspersed) | Yes (FL1, 92%); FL2 interspersed with SP | NA |
| CE161 | FL1, FL2, SP | Yes (FL1, 100%; FL2, 78%) | Yes (100%) | NA |
| 7766 | FL, CB1, CB2, SP | Yes (92%) | No; CB1 clade within larger CB2 group | FL clade (62%) descendant from CB clade |
^a NA, not applicable; FL, frontal lobe; CB, cerebellum; OC, occipital lobe; SP, spleen.
For two study participants with a single frontal lobe and spleen section (6568 [HAND] and CE125 [no HAND]) (Fig. S1 and S2), the frontal lobe reads were represented by a single HQCS10 variant. Interestingly, the frontal lobe HQCS10 variant was separated from the spleen sequences by a long branch. For the third study participant with a single frontal lobe and spleen section (10-12), the spleen and frontal lobe HQCS10 variants were separated with 96% bootstrap support, although four HQCS10 variants from the spleen were intermixed with the frontal lobe clade (Fig. S3). For one study participant (CA110 [HAND]) with a single section from the frontal lobe, occipital lobe, and spleen, the HQCS10 variants from both brain tissues were grouped with 100% bootstrap support (Fig. S4). Again, the frontal lobe reads were represented by a single HQCS10 variant.
For all four study participants with a single spleen section and two sections from the same brain tissue specimen, there was some degree of separation between spleen and brain. In two cases, one with HAND (participant 7766) (Fig. 3) and one without HAND (participant CE161) (Fig. S5), HQCS10 from brain were completely separate from HQCS10 from spleen with high bootstrap support (78% to 100%). In a third case (10017 [HAND]) (Fig. 4), HQCS10 from brain were separated from HQCS10 from spleen with high bootstrap support (100%), although two HQCS10 brain variants were also in the spleen clade. In the final case (CE116 [minor cognitive motor disorder, MCMD]) (Fig. S6), HQCS10 from one brain sample were completely separate from spleen with high bootstrap support (92%), while the HQCS10 from the second brain sample were interspersed.
With respect to separation between HQCS10 from different sections from the same brain tissue, in two cases (CE161 [no HAND] and CE116 [MCMD]), there was complete separation (bootstrap values of 100% and 92%, respectively). In the study participant with two cerebellum sections (7766 [HAND]), the HQCS10 from cerebellum section 1 were monophyletic and descended from cerebellum section 2 (Fig. 3). In the final case (10017 [HAND]), the HQCS10 from both frontal lobe sections were interspersed (Fig. 4).
Phylogenetic tests for population structure.
To objectively test for the presence of population structure among the different tissue specimens/sections, we used the Slatkin-Maddison test and the correlation coefficient test, both with 1,000 permutations, to construct a random distribution of migration events. For the six study participants with >1 HQCS10 from brain and spleen tissues, we grouped all sequences from brain tissues and compared brain and spleen (Table 3). For the Slatkin-Maddison test, all comparisons were significant (P < 0.001), indicating fewer migrations from one tissue to another than expected by chance. For the correlation coefficient tests, both metrics (branch counts and path lengths) were significant (P < 0.001) for three study participants (CA110, 10017, and 7766) and nearly significant (P = 0.002) for a fourth (CE161), indicating that sequences from the same tissue were significantly closer in the tree than those from different tissues. For participant 10-12, the P values of the correlation coefficient tests were low (P = 0.010) but not significant. For participant CE116, both metrics were also not significant (P = 0.030 and P = 1.0).
TABLE 3.
Phylogenetics-based population structure statistics^a
| Study participant | Analysis | Slatkin-Maddison observed/expected (P value) | Correlation coefficient of branch counts (P value) | Correlation coefficient of path lengths (P value) |
|---|---|---|---|---|
| 10-12 | Brain vs spleen | 5/8 (<0.001) | 0.52 (0.010) | 0.49 (0.010) |
| CA110 | Brain vs spleen | 4/5 (<0.001) | 0.23 (<0.001) | 0.4823 (<0.001) |
| 10017 | Brain vs spleen | 3/6 (<0.001) | 0.54 (<0.001) | 0.2623 (<0.001) |
| CE116 | Brain vs spleen | 8/18 (<0.001) | 0.05 (1.0) | −0.12 (0.030) |
| CE161 | Brain vs spleen | 2/10 (<0.001) | 0.07 (<0.001) | 0.0823 (0.002) |
| 7766 | Brain vs spleen | 2/12 (<0.001) | 0.51 (<0.001) | 0.7523 (<0.001) |
| 10017 | SP vs FL1 vs FL2 | 9/12 (<0.001) | 0.45 (<0.001) | 0.2223 (<0.001) |
| CE116 | SP vs FL1 vs FL2 | 9/19 (<0.001) | 0.43 (<0.001) | 0.2623 (<0.001) |
| CE161 | SP vs FL1 vs FL2 | 3/10 (<0.001) | 0.64 (<0.001) | 0.7023 (<0.001) |
| 7766 | SP vs FL vs CB | 2/11 (<0.001) | 0.59 (<0.001) | 0.6323 (<0.001) |
| 7766 | SP vs FL vs CB1 vs CB2 | 3/15 (<0.001) | 0.57 (<0.001) | 0.6023 (<0.001) |
^a P values for each test are shown. Significant results are in boldface type (alpha = 0.001).
For the three study participants with two frontal lobe sections (10017, CE116, and CE161), we performed tests again, this time considering each section a separate population. All results were significant (P < 0.001), again consistent with strong population structure.
For one study participant with two cerebellum sections and a frontal lobe section (7766), we ran the tests two additional times, once by grouping both cerebellum sections together and once by considering each brain section as a separate group. In all instances, all tests were again significant (P < 0.001).
Altogether, these results were in agreement with the observations of the maximum likelihood (ML) phylogenies.
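The logic of the Slatkin-Maddison test used in this section can be illustrated with a minimal sketch: count the minimum number of tissue changes ("migrations") on the tree by Fitch parsimony, and then compare that count with the counts obtained after randomly reassigning tissue labels to the tips. The sketch below is an illustration only and is not the software used in this study. It assumes a rooted, strictly binary tree encoded as nested 2-tuples, and the function names and the toy tree are hypothetical.

```python
import random

def fitch_migrations(tree, tissue_of):
    """Return (state_set, migration_count) for a rooted binary tree via Fitch parsimony.

    `tree` is a nested 2-tuple whose leaves are tip names (str);
    `tissue_of` maps each tip name to its tissue label.
    """
    if isinstance(tree, str):
        return {tissue_of[tree]}, 0
    left_states, left_count = fitch_migrations(tree[0], tissue_of)
    right_states, right_count = fitch_migrations(tree[1], tissue_of)
    shared = left_states & right_states
    if shared:
        return shared, left_count + right_count
    # No shared state: one additional migration is required at this node.
    return left_states | right_states, left_count + right_count + 1

def tips(tree):
    """List the tip names of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return tips(tree[0]) + tips(tree[1])

def slatkin_maddison(tree, tissue_of, n_perm=1000, seed=0):
    """Return (observed migrations, mean permuted migrations, permutation P value)."""
    rng = random.Random(seed)
    names = tips(tree)
    observed = fitch_migrations(tree, tissue_of)[1]
    labels = [tissue_of[name] for name in names]
    hits = 0
    total = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomly reassign tissue labels to tips
        count = fitch_migrations(tree, dict(zip(names, labels)))[1]
        total += count
        hits += count <= observed
    return observed, total / n_perm, hits / n_perm

# Toy tree: four brain tips and four spleen tips in two separate clades.
tree = ((("b1", "b2"), ("b3", "b4")), (("s1", "s2"), ("s3", "s4")))
tissue_of = {name: ("brain" if name.startswith("b") else "spleen") for name in tips(tree)}
observed, expected, p_value = slatkin_maddison(tree, tissue_of)
```

Fewer observed migrations than expected under permutation, as in the observed/expected column of Table 3, indicates that sequences from the same tissue cluster together more than chance would predict.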
Genetic distance tests for population structure.
We then examined population structure by calculating the Fst values using four different metrics based on genetic distances in the HQCS10 sequence alignments. For the six study participants with >1 HQCS10 from brain and spleen tissues, we grouped all sequences from brain tissues and compared brain and spleen (Table 4). For five study participants (10-12, CA110, 10017, CE161, and 7766), all four metrics were significant (P < 0.001), consistent with population structure. In the sixth participant (CE116), only one metric (Hudson nearest neighbor statistic [Snn]) was significant.
TABLE 4.
Distance-based structure statistics (Fst)^a
| Study participant | Analysis | Hudson, Slatkin, and Maddison value (P value) | Slatkin value (P value) | Hudson, Boos, and Kaplan value (P value) | Hudson (Snn) value (P value) |
|---|---|---|---|---|---|
| 10-12 | Brain vs spleen | 0.385 (<0.001) | 0.239 (<0.001) | 0.238 (<0.001) | 0.887 (<0.001) |
| CA110 | Brain vs spleen | 0.412 (<0.001) | 0.260 (<0.001) | 0.112 (<0.001) | 1.000 (<0.001) |
| 10017 | Brain vs spleen | 0.182 (<0.001) | 0.100 (<0.001) | 0.094 (<0.001) | 0.952 (<0.001) |
| CE116 | Brain vs spleen | 0.035 (0.172) | 0.018 (0.172) | 0.015 (0.172) | 0.953 (<0.001) |
| CE161 | Brain vs spleen | 0.141 (<0.001) | 0.076 (<0.001) | 0.065 (<0.001) | 0.994 (<0.001) |
| 7766 | Brain vs spleen | 0.432 (<0.001) | 0.275 (<0.001) | 0.196 (<0.001) | 1.000 (<0.001) |
| 10017 | FL vs FL | 0.126 (0.289) | 0.067 (0.289) | 0.058 (0.289) | 0.478 (0.746) |
| CE116 | FL vs FL | 0.264 (<0.001) | 0.152 (<0.001) | 0.137 (<0.001) | 0.990 (<0.001) |
| CE161 | FL vs FL | 0.583 (<0.001) | 0.411 (<0.001) | 0.385 (<0.001) | 0.991 (<0.001) |
| 7766 | CB vs CB | 0.232 (0.019) | 0.131 (0.019) | 0.047 (0.019) | 0.911 (0.015) |
| 7766 | CB vs FL | 0.248 (<0.001) | 0.141 (<0.001) | 0.140 (<0.001) | 0.992 (<0.001) |
^a P values for each test are shown. Significant results are in boldface type (alpha = 0.001).
We then calculated the Fst values between the two sections from the same brain tissue for the four study participants with those available. In two cases (CE116 and CE161), all metrics were significant (P < 0.001). For participant 10017, none of the metrics were significant, while for participant 7766, the P values were low although not significant (P = 0.015 to 0.019). Finally, for participant 7766, we compared all of the cerebellum sequences together with the frontal lobe sequences. All metrics were significant (P < 0.001).
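As an illustration of how a distance-based Fst and its permutation P value can be obtained, the sketch below implements one common form of the Hudson, Slatkin, and Maddison estimator, Fst = 1 − Hw/Hb, where Hw is the mean pairwise distance within groups and Hb is the mean pairwise distance between groups. It is a simplified illustration and not the software used in this study. It assumes two groups of aligned, equal-length sequences with at least two sequences each, and it uses raw difference counts as the distance (the ratio is unaffected by normalizing for alignment length). The function names and toy sequences are hypothetical.

```python
import random
from itertools import combinations

def differences(a, b):
    """Number of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))

def mean_within(pop):
    """Mean pairwise distance among sequences of one group (needs >= 2 sequences)."""
    pairs = list(combinations(pop, 2))
    return sum(differences(a, b) for a, b in pairs) / len(pairs)

def fst_hsm(pop1, pop2):
    """Fst = 1 - Hw/Hb for two groups of aligned sequences."""
    hw = (mean_within(pop1) + mean_within(pop2)) / 2
    hb = sum(differences(a, b) for a in pop1 for b in pop2) / (len(pop1) * len(pop2))
    return 1 - hw / hb

def permutation_test(pop1, pop2, n_perm=1000, seed=0):
    """Return (observed Fst, fraction of label permutations with Fst >= observed)."""
    rng = random.Random(seed)
    observed = fst_hsm(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign sequences to the two groups
        if fst_hsm(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    return observed, hits / n_perm

# Toy alignment: two groups that differ at the first five sites.
brain = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATA"]
spleen = ["CCCCCAAAAA", "CCCCCAAAAT", "CCCCCAAATA"]
fst, p_value = permutation_test(brain, spleen)
```

With only three sequences per group, as in this toy input, the permutation P value cannot become very small regardless of how distinct the groups are, which illustrates why sections with few HQCS10 have limited power in tests of this kind.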
Variable region analysis.
Since the variable regions were removed from the sequence alignments prior to clustering into HQCS, and variable regions could not be associated with specific sequences, we separately assessed the similarities among variable region (V1-V3) lengths and charges among participants/tissues/sections (see Table S2 for the number of sequences for each study participant).
The overall lengths were similar between spleen and brain tissues for each subject (Fig. S7). For participants 7766 and CE116, there was more variation in the V1 and V2 lengths in the brain sequences than in the spleen sequences. For participants 7766 and CE161, in the brain tissues with >1 section (cerebellum and frontal lobe, respectively), one of these sections had sequences with lengths not seen in the other brain section or the spleen. On the other hand, in study participant 10017, lengths from both frontal lobe sections and the spleen were similar. In study participants 6568 and CE125, more variation in length was seen in the spleen than in the brain.
Similar trends were seen in the distribution of charge in V1, V2, and V3 as noted above for lengths (Fig. S8). In study participants 6568, CE125, 10-12, and CA110, variation was generally higher in the spleen than in the brain sequences. For participants CE161 and 7766, sequences with distinctly different charges were seen in the spleen and at least one of the brain sections. In study participants 10017 and CE116, charges in all tissues/sections were similar.
These patterns in variable region characteristics largely reflected the patterns observed in the phylogenies, where participants CE161 and 7766 showed complete compartmentalization between spleen and brain sequences and participants 6568 and CE125 each had a single HQCS10 in the brain.
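The two variable region properties compared above, length and charge, are simple to compute once the amino acid sequence of a region has been extracted. The sketch below is an illustration under stated assumptions and is not the method used in this study: it assumes that net charge is defined as the number of basic residues (K and R) minus the number of acidic residues (D and E), and that gap characters are ignored when measuring length. The function names and the example peptide are hypothetical.

```python
POSITIVE = set("KR")  # basic residues counted as +1 (assumed definition)
NEGATIVE = set("DE")  # acidic residues counted as -1 (assumed definition)

def region_length(aa_seq):
    """Length of a variable region in amino acids, ignoring alignment gaps."""
    return len(aa_seq.replace("-", ""))

def net_charge(aa_seq):
    """Net charge as (K + R) - (D + E)."""
    positives = sum(residue in POSITIVE for residue in aa_seq)
    negatives = sum(residue in NEGATIVE for residue in aa_seq)
    return positives - negatives

# Toy V3-like peptide used purely for illustration.
v3_like = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
length = region_length(v3_like)
charge = net_charge(v3_like)
```

Applying these two functions to every sequence in a tissue section and comparing the resulting distributions gives the kind of per-tissue summary described for Fig. S7 and S8.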
As expected, the vast majority of viruses (>99%) in all participants/tissues/sections were predicted to use the CCR5 coreceptor (Fig. S9 and Table S3). The exceptions were the spleen for two study participants, in which ∼72% (CE125) and ∼3% (10-12) of the viruses were predicted to use CXCR4, respectively, and one frontal lobe in participant CE116, where ∼9% of viruses were predicted to use CXCR4.
DISCUSSION
The ability to investigate the population structure of HIV-1 within and among tissues, particularly the brain, is limited by a number of factors, including the difficulty in obtaining tissue from study participants and the depth and breadth of the sequencing approach. Moreover, the use of different approaches to analyze sequences has led to varying definitions of and conclusions about sequence compartmentalization (15). Here, we used multiple tree-based and distance-based statistical methods to investigate population structure in matching tissue samples (frontal lobe and spleen) from eight study participants. We utilized the Pacific Biosciences (PacBio) SMRT high-throughput technology to generate thousands of full-length env sequences for each tissue, according to the methodological approach of Laird Smith et al. (14), to reduce noise from potential sequencing errors. In parallel, we also investigated the variable region characteristics from nonclustered sequences to identify potential functional differences within and among tissues.
Using both phylogenetics-based and distance-based approaches, we found evidence for some degree of population structure between brain and spleen tissues in all six study participants with >1 HQCS10 from both tissues. This is consistent with other studies that have found compartmentalization between brain and nonbrain tissues (2–6). In the two cases with a single HQCS10 from the brain (6568 [HAND] and CE125 [no HAND]), the long branch leading to the brain variant was also consistent with an independently evolving population. In our two study participants with tissues from multiple parts of the brain (i.e., frontal lobe and occipital lobe, or frontal lobe and cerebellum), the frontal lobe sequences were descendant from the other brain tissue, which is also consistent with our previous work that showed that the frontal lobe sequences were the most distinct from the other brain tissues (6). However, although evidence for population structure between brain and nonbrain was clear in five cases, in three cases, some intermixing between brain and nonbrain tissues was evident (10017 [HAND], CE116 [MCMD], and 10-12 [no diagnosis]), suggesting some degree of trafficking across tissues. We note that population structure between brain and nonbrain was observed in both the presence and absence of HAND.
We also found evidence for population structure within the brain, both between different brain tissues and between different sections of the same brain tissue specimen. In fact, only one out of the four participants who had sequences from multiple sections from the same brain tissue specimen showed evidence for a lack of population structure (10017 [HAND]). In another case, sequences from the two frontal lobe sections were monophyletic with respect to each other and the spleen (CE161 [no HAND]), while in another case (CE116 [MCMD]), sequences from one frontal lobe section were monophyletic with respect to sequences from other tissue sections, although sequences from the other frontal lobe section were interspersed with those from the spleen. In the final case (7766 [HAND]), while the sequences from both sections of cerebellum were monophyletic with respect to the spleen, one section clearly contained less diversity than the other section. For that case, the frontal lobe sequences were also distinct from the cerebellum sequences. We are following up on the observation of within-tissue population structure by using single-genome amplification and Sanger sequencing to strengthen these findings.
The patterns seen in the variable region analysis were generally consistent with those seen in the phylogeny and suggest potential phenotypic differences within populations. Notably, we found that ∼9% of viruses in one frontal lobe section (but <1% in the other frontal lobe section) in participant CE116, an individual with long-standing HIV-1 infection and a low absolute CD4 T cell count, were predicted to use the CXCR4 coreceptor. This is somewhat unusual since the majority of viruses in the brain are expected to use the CCR5 coreceptor in concordance with the predominant cell types in the brain (e.g., macrophages and other macrophage lineage cells) (5). These predicted X4 viruses may represent infection of migratory CD4+ T cells from another tissue, or (less likely) they may have evolved within the brain itself. Additional studies are under way to further characterize this potentially interesting subpopulation.
We note that the detection of population structure among rapidly evolving viruses like HIV-1 may be affected by a number of biological factors, including the time since infection, the viral population size within each tissue, antiretroviral therapy, and the presence of physical partitions, including the blood-brain barrier, which may affect the penetrance of immune effector cells or molecules. Methodological factors, including the virus sample size, tissue specimen size, sequencing strategy, and testing choices, may also bias results (15). While we attempted to address some of the methodological factors here, we note that our results are necessarily limited by the inclusion of a single time point for all participants and the possible effects of various lengths of infection and drug regimens. We also note that while seven participants were on ART, several individuals were viremic at their last visit, which likely indicates a lack of/difficulty with adherence or discontinuation of therapy. It is possible that rebounding virus populations may account for some of the patterns observed. On the other hand, the detection of significant population structure in the brain for all six of the participants with >1 HQCS10 for brain and spleen tissues (both with and without HAND) and the long branches leading to the single brain HQCS10 variant in the other two cases (one with and one without HAND) suggest that the observed patterns largely derive from virus dynamics that were established well before death and did not result from a transient increase in viremia. Furthermore, our findings are consistent with previous studies, including our previous study of a virologically suppressed PLWH on ART (6), and provide additional robust evidence for the brain as a compartmentalized tissue and potentially a viral reservoir, independent of HAND status.
In addition, the striking genetic differences between virus populations, even from the same tissue, suggest the potential for functional differences and the possibility for multiple evolutionary pathways that result in similar tropisms and/or other tissue-adapted characteristics. We are following up on these findings by performing functional assays of Sanger-sequenced clones representative of these subpopulations.
Unlike other high-throughput platforms, which generate short (<500-bp) reads and require computational reconstruction of the original gene sequence, the SMRT technology generates reads that cover the entire env gene. While the error rate of the unprocessed long reads is higher than those of other platforms, the circularization of the amplicon during sequencing (circularized consensus sequencing [CCS]) allows numerous polymerase passes of the same amplicon, from which a consensus sequence can be generated with drastically reduced error rates (16, 17). In addition, Ho and coauthors used plasmid sequencing studies to determine that at 7 full CCS passes, the error rate was 0.02% (18). We used 15 complete amplicon passes as our minimum for inclusion, out of an abundance of caution to limit the influence of sequence error. In addition, Laird Smith and coauthors found that 80% of sequencing errors using this technology were insertions and deletions concentrated in homopolymeric regions (14). Our manual removal of spurious insertions (i.e., sites with >95% gaps) further reduced the influence of sequence error in our alignments and subsequent analyses. Finally, clustering the reads and using only the HQCS10 variants should prevent spurious sequence errors from affecting the analysis.
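Two of the error-control steps described above, the minimum of 15 CCS passes and the removal of alignment sites with >95% gaps, can be sketched as follows. In this study the spurious insertions were removed manually, so the code is only an illustration of the criteria, not the procedure that was used. The read records (dictionaries with hypothetical `name` and `passes` keys) and the function names are assumptions made for the example.

```python
def filter_by_passes(reads, min_passes=15):
    """Keep read records with at least `min_passes` complete CCS passes."""
    return [read for read in reads if read["passes"] >= min_passes]

def drop_gappy_columns(alignment, max_gap_fraction=0.95):
    """Remove alignment columns in which more than `max_gap_fraction` of rows are gaps."""
    n_rows = len(alignment)
    keep = [
        col for col in range(len(alignment[0]))
        if sum(seq[col] == "-" for seq in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in alignment]

# Toy reads: only the first record meets the 15-pass minimum.
reads = [{"name": "r1", "passes": 20}, {"name": "r2", "passes": 7}]
passing = filter_by_passes(reads)

# Toy alignment: 1 of 40 sequences carries an insertion at the third site,
# so that column is 97.5% gaps and is removed.
alignment = ["AC-G"] * 39 + ["ACTG"]
cleaned = drop_gappy_columns(alignment)
```

Columns that are exactly 95% gaps are retained in this sketch because the stated criterion is strictly greater than 95%.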
A potential limitation of our results might be the effect of recombination. We addressed this experimentally by using a high-fidelity polymerase and increasing the PCR extension times; however, these measures may not entirely eliminate PCR-based recombination and do not address polymerase template switching and nonenzymatic DNA damage. On the other hand, Laird Smith and coauthors experimentally calculated a recombination rate of <1% in their SMRT-derived data set (14), suggesting that, with a similar experimental method, recombination may not impact the results to a substantial degree.
Altogether, the results from this study demonstrate the complexity of viral population structure within the brain. The results further caution against relying upon a single tissue section and/or shallow sequencing methods to infer evolutionary history in its entirety from these low-copy-number samples. The complexity of viral population structures, as well as the differences in functional characteristics that they may represent, warrants further investigation and suggests that analysis of peripheral blood samples alone may not be fully informative with respect to improving strategies to treat or eradicate HIV-1.
MATERIALS AND METHODS
Tissue specimens.
Postmortem tissue specimens from eight individuals with or without HAND were provided by the National Neuro-AIDS Tissue Consortium (NNTC) (n = 7) and the National Disease Research Interchange (NDRI) (n = 1). Brain and blood or immune tissue samples provided by the NNTC were immediately frozen at −80°C. Samples provided by the NDRI from a single individual (10-12) were immediately placed on ice and examined within 48 h of death. The UMass Chan Medical School institutional review board (IRB) considered that this research was not human subject research (IRB docket number H00014098). Tissues collected at autopsy included at least one frontal lobe and one spleen tissue specimen from each individual; occipital lobe and cerebellum were studied when available. In order to investigate potential compartmentalization within brain tissues, two different sections of frontal lobe specimens (approximately 25 mg each) were examined from three individuals (CE116, CE161, and 10017), and two different sections of cerebellum were examined from another individual (7766).
Nucleic acid isolation and PCR HIV-1 env amplification.
Genomic DNA was isolated from each section of tissue (25 mg of brain tissue and 10 mg of spleen tissue for each sample) using the QIAamp DNA minikit (Qiagen) according to the manufacturer’s protocol. DNA was eluted in nuclease-free, PCR-grade water and stored at −80°C until analysis. For PacBio SMRT sequencing, a nested PCR approach was used to amplify bulk ∼3-kb full-length env products as previously described (6). We used the Phusion high-fidelity DNA polymerase (New England BioLabs) to reduce PCR polymerase errors. To reduce PCR-mediated recombination, we increased our extension times to 3 min to allow more time for the completion of each amplicon before denaturation. PCR products were imaged on a 2% agarose gel with GelGreen nucleic acid gel stain (Biotium) using a blue-light transilluminator and purified using a QIAquick gel extraction kit (Qiagen Inc.). PCR products were eluted in 50 μl nuclease-free, PCR-grade water and were sent to the Interdisciplinary Center for Biotechnology Research core sequencing laboratory at the University of Florida for library preparation and sequencing using the PacBio Sequel instrument. The LR v3 SMRT cell was utilized with 20-h collection times for each run, with 4 to 8 libraries multiplexed in a single SMRT cell.
PacBio sequence processing.
SMRT raw reads for each tissue were quality filtered at 99% quality using the SMRT pipeline to remove low-quality reads. The quality-filtered reads were further selected to include only circular consensus reads of >2,000 bp with at least 15 complete amplicon passes using PacBioSmartPipe tools (v.2.3.0). For each tissue-specific data file, reads were assembled into a single endpoint dilution tissue-specific reference sequence using BLASR (v.2.3.0). The region containing the majority of the reads was extracted, and this resulting file was designated the “original” alignment (see Table S1 in the supplemental material). To remove spurious insertions, this original sequence alignment was then gap stripped using Geneious Prime software (Geneious), retaining only columns with ≥95% coverage. Sequences with >100 bp missing compared to the reference sequence were removed from the alignment. Nonhomologous variable regions (V1 to V5) were also removed, and the alignment was manually optimized using AliView (v.1.17.1) (19). This alignment was designated “aligned” (Table S1).
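The filtering steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (which relied on the SMRT pipeline, BLASR, and Geneious Prime); the `CcsRead` record and all function names are hypothetical, and the thresholds are exposed as parameters so that they can be matched to the values stated in the text.

```python
from typing import Dict, List, NamedTuple


class CcsRead(NamedTuple):
    """Hypothetical record for one circular consensus read."""
    name: str
    sequence: str
    passes: int  # number of complete amplicon passes


def select_reads(reads: List[CcsRead], min_length: int = 2000,
                 min_passes: int = 15) -> List[CcsRead]:
    """Keep reads longer than min_length with at least min_passes passes."""
    return [r for r in reads
            if len(r.sequence) > min_length and r.passes >= min_passes]


def strip_sparse_columns(alignment: Dict[str, str],
                         min_coverage: float = 0.95) -> Dict[str, str]:
    """Drop alignment columns in which too few sequences carry a base."""
    rows = list(alignment.values())
    n_rows = len(rows)
    keep = [i for i in range(len(rows[0]))
            if sum(row[i] != "-" for row in rows) / n_rows >= min_coverage]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


def drop_incomplete(alignment: Dict[str, str], reference_name: str,
                    max_missing: int = 100) -> Dict[str, str]:
    """Remove sequences missing more than max_missing reference positions."""
    ref = alignment[reference_name]
    kept = {}
    for name, seq in alignment.items():
        # Count gaps in this sequence at positions where the reference has a base.
        missing = sum(1 for r, s in zip(ref, seq) if r != "-" and s == "-")
        if missing <= max_missing:
            kept[name] = seq
    return kept
```

In this sketch, an inserted column present in only a small fraction of reads falls below `min_coverage` and is removed, which is the intended effect of gap stripping on insertion-type sequencing errors.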
High-quality consensus sequences (HQCS) were generated to correct for sequencing errors that may have resulted from the SMRT sequencing technology, according to the methods described previously by Laird Smith et al. (14). In brief, sequences from each tissue section were clustered at 99% genetic identity using USEARCH v10.0.24 (20). Each cluster was then assigned a representative sequence (the HQCS) by taking the centroid sequence of the cluster. The number of HQCS in each tissue/study participant is noted in Table S1 as “HQCS.” To further reduce the effect of potential sequence errors, we included only HQCS that represented ≥10 sequences from the original alignment (HQCS10) (Table S1).
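The idea behind clustering and cluster-size filtering can be sketched as follows. This is a simplified stand-in and not USEARCH: it assumes pre-aligned sequences of equal length, measures identity as the fraction of matching columns, and uses the first member of each cluster as its centroid. The function names and the `min_size` parameter are hypothetical.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str],
                   threshold: float = 0.99) -> List[Tuple[str, List[str]]]:
    """Assign each sequence to the first centroid it matches at >= threshold."""
    clusters: List[Tuple[str, List[str]]] = []
    for seq in seqs:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No existing centroid is close enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters


def centroids_with_support(clusters: List[Tuple[str, List[str]]],
                           min_size: int) -> List[str]:
    """Return centroids of clusters with at least min_size member reads."""
    return [centroid for centroid, members in clusters
            if len(members) >= min_size]
```

Requiring a minimum cluster size discards centroids supported by only a few reads, which are the ones most likely to reflect residual sequencing error rather than a real variant.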
PacBio sequence analysis.
All of the HQCS10 for each study participant were combined and used to generate an initial phylogenetic tree. Subsequent trees were inferred using participant-specific HQCS10. Trees were generated using the Hasegawa-Kishino-Yano nucleotide model of substitution (21) with gamma-distributed among-site rate variation and 1,000 ultrafast bootstraps to assess branch support using IQ-TREE v2 (22). Trees were midpoint rooted.
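A tree inference run with these settings could be assembled as in the short Python sketch below. The exact command line used in this study is not reported, so this is only one plausible invocation: the executable name `iqtree2`, the file paths, and the helper name are assumptions, while `-s`, `-m`, `-B`, and `--prefix` are standard IQ-TREE 2 options. Midpoint rooting is a separate step that this command does not perform.

```python
from typing import List


def iqtree_command(alignment_path: str, prefix: str,
                   bootstraps: int = 1000) -> List[str]:
    """Build an IQ-TREE 2 argument list for an HKY+G tree with UFBoot support."""
    return [
        "iqtree2",               # assumed executable name on the PATH
        "-s", alignment_path,    # input alignment (hypothetical path)
        "-m", "HKY+G",           # HKY model with gamma-distributed rate variation
        "-B", str(bootstraps),   # ultrafast bootstrap replicates
        "--prefix", prefix,      # prefix for output files
    ]

# The list can be passed to subprocess.run(cmd, check=True) to launch the run.
```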
Population structure was assessed using tree-based and distance-based methods implemented in HyPhy (23). Specifically, for tree-based methods, the Slatkin-Maddison test (24) and the correlation coefficient test (using branch counts and path lengths) (25) were used to assess tree structure for different compartments in study participants with >1 HQCS10 for each group. One thousand permutations were performed to construct a distribution of random migration events. For distance-based methods, four measures of the fixation index (Fst) (Hudson, Slatkin, and Maddison [26]; Slatkin [27]; Hudson, Boos, and Kaplan [28]; and Hudson [29]) were used to determine population structure from the HQCS10 alignments using the Tamura-Nei (30) model of nucleotide substitution. One thousand permutations were performed to randomly distribute the sequences into subpopulations.
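The logic of a distance-based permutation test can be sketched in Python as follows. This is an illustration and not the HyPhy implementation: it uses the Hudson, Slatkin, and Maddison form Fst = 1 − Hw/Hb, substitutes a simple p-distance for the Tamura-Nei distance, takes Hw as the unweighted average of the two within-compartment means, and assumes at least two aligned sequences per compartment. All function names are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(pop: Sequence[str]) -> float:
    """Mean pairwise distance among sequences from one compartment."""
    pairs = list(combinations(pop, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def hsm_fst(pop1: Sequence[str], pop2: Sequence[str]) -> float:
    """Fst = 1 - Hw/Hb for two compartments."""
    h_w = (mean_within(pop1) + mean_within(pop2)) / 2
    h_b = sum(p_distance(a, b) for a in pop1 for b in pop2) / (len(pop1) * len(pop2))
    return 0.0 if h_b == 0 else 1 - h_w / h_b


def permutation_test(pop1: Sequence[str], pop2: Sequence[str],
                     n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return observed Fst and a one-sided permutation P value."""
    rng = random.Random(seed)
    observed = hsm_fst(pop1, pop2)
    pooled: List[str] = list(pop1) + list(pop2)
    n1 = len(pop1)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign sequences to compartments, keeping group sizes.
        rng.shuffle(pooled)
        if hsm_fst(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

A small P value indicates that the observed between-compartment differentiation is rarely matched when compartment labels are shuffled, which is the sense in which the permutations test for population structure.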
Variable region analysis.
We extracted variable region 1 (V1), V2, and V3 from the original alignments for each study participant/tissue/section. V1 and V2 were extracted as a single unit initially. The V1-V2 and V3 alignments were then “cleaned” to include only sequences that had no ambiguous bases and that contained a beginning and ending cysteine. V1 and V2 were then split into separate alignments (see Table S2 for final numbers of sequences). Length and charge were determined for all sequences using the Variable Region Characteristics tool at the Los Alamos HIV-1 Database (https://www.hiv.lanl.gov/content/sequence/VAR_REG_CHAR/index.html). Predicted tropism was determined using the PSSM tool (https://indra.mullins.microbiol.washington.edu/webpssm/).
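The cleaning criteria and the two summary features can be illustrated with a minimal Python sketch operating on translated (amino acid) loop sequences. It is not the Los Alamos tool, whose exact charge rule may differ; the rule assumed here counts lysine and arginine as +1, aspartate and glutamate as −1, and histidine as neutral. Ambiguous bases are assumed to appear as non-standard residues after translation, and the function names are hypothetical.

```python
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def is_clean_loop(aa: str) -> bool:
    """True if the loop starts and ends with cysteine and has no ambiguous residues."""
    return (len(aa) >= 2 and aa[0] == "C" and aa[-1] == "C"
            and set(aa) <= STANDARD_RESIDUES)


def loop_length(aa: str) -> int:
    """Number of amino acids in the loop, including both cysteines."""
    return len(aa)


def net_charge(aa: str) -> int:
    """Net charge under the assumed rule: K/R = +1, D/E = -1, all others 0."""
    positive = aa.count("K") + aa.count("R")
    negative = aa.count("D") + aa.count("E")
    return positive - negative
```

For example, under these assumptions the illustrative V3-like string `CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC` passes the cleaning check, has a length of 35, and has a net charge of +3.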
Data availability.
Raw sequence data files are available at the NCBI database under BioProject accession number PRJNA746593. Alignments of the HQCS10 are available at https://github.com/Bioinfoexperts/pacbio.
ACKNOWLEDGMENTS
We are grateful to the individuals who donated their tissues to enable this research and the National Neuro-AIDS Tissue Consortium and the National Disease Research Interchange for providing samples.
This work was supported by National Institutes of Health grants R01NS107022, R01NS095749, and UL1TR001453 (K.L.).
R.R., D.J.N., S.C., and S.L.L. are employed by BioInfoExperts, LLC.
Footnotes
Supplemental material is available online only.
Contributor Information
Katherine Luzuriaga, Email: katherine.luzuriaga@umassmed.edu.
Frank Kirchhoff, Ulm University Medical Center.
REFERENCES
1. Farhadian S, Patel P, Spudich S. 2017. Neurological complications of HIV infection. Curr Infect Dis Rep 19:50. doi:10.1007/s11908-017-0606-5.
2. Gonzalez-Perez MP, Peters PJ, O’Connell O, Silva N, Harbison C, Cummings Macri S, Kaliyaperumal S, Luzuriaga K, Clapham PR. 2017. Identification of emerging macrophage-tropic HIV-1 R5 variants in brain tissue of AIDS patients without severe neurological complications. J Virol 91:e00755-17. doi:10.1128/JVI.00755-17.
3. Lamers SL, Gray RR, Salemi M, Huysentruyt LC, McGrath MS. 2011. HIV-1 phylogenetic analysis shows HIV-1 transits through the meninges to brain and peripheral tissues. Infect Genet Evol 11:31–37. doi:10.1016/j.meegid.2010.10.016.
4. Lamers SL, Salemi M, Galligan DC, Morris A, Gray R, Fogel G, Zhao L, McGrath MS. 2010. Human immunodeficiency virus-1 evolutionary patterns associated with pathogenic processes in the brain. J Neurovirol 16:230–241. doi:10.3109/13550281003735709.
5. Gonzalez-Perez MP, O’Connell O, Lin R, Sullivan WM, Bell J, Simmonds P, Clapham PR. 2012. Independent evolution of macrophage-tropism and increased charge between HIV-1 R5 envelopes present in brain and immune tissue. Retrovirology 9:20. doi:10.1186/1742-4690-9-20.
6. Brese RL, Gonzalez-Perez MP, Koch M, O’Connell O, Luzuriaga K, Somasundaran M, Clapham PR, Dollar JJ, Nolan DJ, Rose R, Lamers SL. 2018. Ultradeep single-molecule real-time sequencing of HIV envelope reveals complete compartmentalization of highly macrophage-tropic R5 proviral variants in brain and CXCR4-using variants in immune and peripheral tissues. J Neurovirol 24:439–453. doi:10.1007/s13365-018-0633-5.
7. Smit TK, Wang B, Ng T, Osborne R, Brew B, Saksena NK. 2001. Varied tropism of HIV-1 isolates derived from different regions of adult brain cortex discriminate between patients with and without AIDS dementia complex (ADC): evidence for neurotropic HIV variants. Virology 279:509–526. doi:10.1006/viro.2000.0681.
8. Chang J, Jozwiak R, Wang B, Ng T, Ge YC, Bolton W, Dwyer DE, Randle C, Osborn R, Cunningham AL, Saksena NK. 1998. Unique HIV type 1 V3 region sequences derived from six different regions of brain: region-specific evolution within host-determined quasispecies. AIDS Res Hum Retroviruses 14:25–30. doi:10.1089/aid.1998.14.25.
9. Smit TK, Brew BJ, Tourtellotte W, Morgello S, Gelman BB, Saksena NK. 2004. Independent evolution of human immunodeficiency virus (HIV) drug resistance mutations in diverse areas of the brain in HIV-infected patients, with and without dementia, on antiretroviral treatment. J Virol 78:10133–10148. doi:10.1128/JVI.78.18.10133-10148.2004.
10. Schnell G, Joseph S, Spudich S, Price RW, Swanstrom R. 2011. HIV-1 replication in the central nervous system occurs in two distinct cell types. PLoS Pathog 7:e1002286. doi:10.1371/journal.ppat.1002286.
11. Dunfee RL, Thomas ER, Gabuzda D. 2009. Enhanced macrophage tropism of HIV in brain and lymphoid tissues is associated with sensitivity to the broadly neutralizing CD4 binding site antibody b12. Retrovirology 6:69. doi:10.1186/1742-4690-6-69.
12. Keele BF, Giorgi EE, Salazar-Gonzalez JF, Decker JM, Pham KT, Salazar MG, Sun C, Grayson T, Wang S, Li H, Wei X, Jiang C, Kirchherr JL, Gao F, Anderson JA, Ping L-H, Swanstrom R, Tomaras GD, Blattner WA, Goepfert PA, Kilby JM, Saag MS, Delwart EL, Busch MP, Cohen MS, Montefiori DC, Haynes BF, Gaschen B, Athreya GS, Lee HY, Wood N, Seoighe C, Perelson AS, Bhattacharya T, Korber BT, Hahn BH, Shaw GM. 2008. Identification and characterization of transmitted and early founder virus envelopes in primary HIV-1 infection. Proc Natl Acad Sci USA 105:7552–7557. doi:10.1073/pnas.0802203105.
13. Simmonds P, Balfe P, Peutherer JF, Ludlam CA, Bishop JO, Brown AJ. 1990. Human immunodeficiency virus-infected individuals contain provirus in small numbers of peripheral mononuclear cells and at low copy numbers. J Virol 64:864–872. doi:10.1128/JVI.64.2.864-872.1990.
14. Laird Smith M, Murrell B, Eren K, Ignacio C, Landais E, Weaver S, Phung P, Ludka C, Hepler L, Caballero G, Pollner T, Guo Y, Richman D, IAVI Protocol C Investigators & The IAVI African HIV Research Network, Poignard P, Paxinos EE, Kosakovsky Pond SL, Smith DM. 2016. Rapid sequencing of complete env genes from primary HIV-1 samples. Virus Evol 2:vew018. doi:10.1093/ve/vew018.
15. Zárate S, Kosakovsky Pond SL, Shapshak P, Frost SDW. 2007. Comparative study of methods for detecting sequence compartmentalization in human immunodeficiency virus type 1. J Virol 81:6643–6651. doi:10.1128/JVI.02268-06.
16. Travers KJ, Chin CS, Rank DR, Eid JS, Turner SW. 2010. A flexible and efficient template format for circular consensus sequencing and SNP detection. Nucleic Acids Res 38:e159. doi:10.1093/nar/gkq543.
17. Carneiro MO, Russ C, Ross MG, Gabriel SB, Nusbaum C, DePristo MA. 2012. Pacific Biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics 13:375. doi:10.1186/1471-2164-13-375.
18. Ho CKY, Raghwani J, Koekkoek S, Liang RH, Van der Meer JTM, Van Der Valk M, De Jong M, Pybus OG, Schinkel J, Molenkamp R. 2017. Characterization of hepatitis C virus (HCV) envelope diversification from acute to chronic infection within a sexually transmitted HCV cluster by using single-molecule, real-time sequencing. J Virol 91:e02262-16. doi:10.1128/JVI.02262-16.
19. Larsson A. 2014. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30:3276–3278. doi:10.1093/bioinformatics/btu531.
20. Edgar RC. 2010. Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26:2460–2461. doi:10.1093/bioinformatics/btq461.
21. Hasegawa M, Kishino H, Yano T. 1985. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174. doi:10.1007/BF02101694.
22. Minh BQ, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, von Haeseler A, Lanfear R. 2020. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol Biol Evol 37:1530–1534. doi:10.1093/molbev/msaa015.
23. Kosakovsky Pond SL, Frost SDW, Muse SV. 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21:676–679. doi:10.1093/bioinformatics/bti079.
24. Slatkin M, Maddison WP. 1989. A cladistic measure of gene flow inferred from the phylogenies of alleles. Genetics 123:603–613. doi:10.1093/genetics/123.3.603.
25. Critchlow DE, Li S, Nourijelyani K, Pearl D. 2000. Some statistical methods for phylogenetic trees with application to HIV disease. Math Comput Model 32:69–81. doi:10.1016/S0895-7177(00)00120-5.
26. Hudson RR, Slatkin M, Maddison WP. 1992. Estimation of levels of gene flow from DNA sequence data. Genetics 132:583–589. doi:10.1093/genetics/132.2.583.
27. Slatkin M. 1993. Isolation by distance in equilibrium and non-equilibrium populations. Evolution 47:264–279. doi:10.1111/j.1558-5646.1993.tb01215.x.
28. Hudson RR, Boos DD, Kaplan NL. 1992. A statistical test for detecting geographic subdivision. Mol Biol Evol 9:138–151. doi:10.1093/oxfordjournals.molbev.a040703.
29. Hudson RR. 2000. A new statistic for detecting genetic differentiation. Genetics 155:2011–2014. doi:10.1093/genetics/155.4.2011.
30. Tamura K, Nei M. 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10:512–526. doi:10.1093/oxfordjournals.molbev.a040023.
Supplementary Materials
Fig. S1 to S9 and Tables S1 to S3 (jvi.01202-21-s0001.pdf, PDF file, 0.3 MB).