Abstract
Rare CD4 T cells that contain HIV under antiretroviral therapy represent an important barrier to HIV cure1–3, but the infeasibility of isolating and characterizing these cells in their natural state has led to uncertainty about whether they possess distinctive attributes that HIV cure-directed therapies might exploit. Here we address this challenge using a microfluidic technology that isolates the transcriptomes of HIV-infected cells based solely on the detection of HIV DNA. HIV-DNA+ memory CD4 T cells in the blood from people receiving antiretroviral therapy showed inhibition of six transcriptomic pathways, including death receptor signalling, necroptosis signalling and antiproliferative Gα12/13 signalling. Moreover, two groups of genes identified by network co-expression analysis were significantly associated with HIV-DNA+ cells. These genes (n = 145) accounted for just 0.81% of the measured transcriptome and included negative regulators of HIV transcription that were higher in HIV-DNA+ cells, positive regulators of HIV transcription that were lower in HIV-DNA+ cells, and other genes involved in RNA processing, negative regulation of mRNA translation, and regulation of cell state and fate. These findings reveal that HIV-infected memory CD4 T cells under antiretroviral therapy are a distinctive population with host gene expression patterns that favour HIV silencing, cell survival and cell proliferation, with important implications for the development of HIV cure strategies.
Subject terms: Pathogens, HIV infections
HIV-infected memory CD4 T cells under antiretroviral therapy are a distinctive population of cells with transcriptomic patterns that favour HIV silencing, cell survival and cell proliferation.
Main
Understanding how HIV persists during antiretroviral therapy (ART) can advance the search for a safe and scalable HIV cure. A central example of this is the latent reservoir concept, in which some HIV proviruses are thought to persist by maintaining a quiescent state that spares their host cells from virus- or immune-mediated killing2. Evidence supporting this concept includes the presence of rare memory CD4 T cells in ex vivo samples that inducibly express HIV1,3,4, as well as data from culture models demonstrating molecular blocks to HIV transcription, particularly in resting cells5–11. These and other findings have prompted the development of latency-reversing agents (LRAs) that can induce HIV transcription with the goal of exposing infected cells to elimination in vivo. However, the lack of a demonstrable reduction in reservoir size in clinical trials of LRAs12–16 has emphasized how much remains unknown about the barriers to an HIV cure. Of particular importance is the long-standing uncertainty about the biology of HIV-infected CD4 T cell reservoirs. As cells containing quiescent viruses in the blood and tissues have not been identifiable without substantial manipulation, it has been impossible to establish whether these rare cells have special attributes that favour HIV latency or otherwise help to account for HIV persistence under ART. Studies attempting to circumvent this obstacle by detecting HIV enrichment in phenotypic, functional or anatomic CD4 T cell subsets17–27—in some cases using advanced single-cell analyses28,29—have found low levels of infected cells across subsets and emphasized the heterogeneity of the infected cell pool. Thus, the identification of distinctive biological signatures among HIV-infected CD4 T cells under ART has emerged as a central challenge in HIV cure research.
To help address this challenge, we developed a custom microfluidic technology that enables the unbiased detection and gene expression profiling of HIV-infected cells directly ex vivo. The technology, termed focused interrogation of cells by nucleic acid detection and sequencing (FIND-seq)30, separates millions of single cells within water-in-oil droplets for immediate lysis, followed by polyadenylated RNA sequence recovery and then sorting according to HIV DNA detection. This approach isolates whole transcriptomes from cells containing quiescent viruses without the need for in vitro latency reversal, thereby capturing a transcriptome-wide profile of these cells in their natural state. Here we used FIND-seq in people with HIV receiving long-term ART to analyse host gene expression patterns of memory CD4 T cells containing HIV gag DNA—a marker of the HIV-infected cell reservoir that encompasses both intact and defective virus sequences31. Our results reveal distinctive transcriptomic signatures that help to explain HIV-infected CD4 T cell persistence despite the suppression of virus replication, highlighting important opportunities for further progress towards an HIV cure.
HIV-DNA+ cell transcriptome sorting
FIND-seq uses three microfluidic devices to isolate polyadenylated RNA sequences from HIV-DNA+ cells (Fig. 1a–c). The first device loads millions of single cells into water-in-oil droplets with a strongly denaturing lysis buffer and molten agarose covalently conjugated to oligo-dT (Fig. 1a). After encapsulation, the agarose in each single-cell droplet is cooled to form a hydrogel that retains high-molecular-mass DNA as well as polyadenylated RNA. This approach maintains compartmentalization among cells during oil removal, incubations, washes and reagent exchanges, therefore enabling optimized cell lysis, mRNA reverse-transcription and subsequent PCR while preventing interference between steps (Extended Data Fig. 1a–d). The second device reinjects washed hydrogels containing single-cell transcriptome cDNA and genomic DNA into a second emulsion for HIV gag DNA detection (Fig. 1b). The third device uses an accurate dielectrophoretic sorter32 to separate droplets on the basis of their fluorescence (Fig. 1c) for subsequent whole-transcriptomic analysis (Fig. 1d and Extended Data Fig. 1e). Using dilutions of latently infected human J-Lat T cells in uninfected human Jurkat T cells, FIND-seq droplet cytometry detected HIV-DNA+ cells with an estimated sensitivity of 50% and a per-droplet false-positive rate of 1 in 300,000 (Fig. 1e). Transcriptome sequencing in HIV-DNA+ droplets sorted from a 1:1 mixture of J-Lat and mouse cells revealed >99% human sequences (Extended Data Fig. 1f,g). These findings demonstrate that FIND-seq accurately detects rare HIV-DNA+ cells and isolates the transcriptomes from these cells.
Transcriptome sequencing after FIND-seq
We tested whether FIND-seq-sorted transcriptomes accurately represent the cells from which they are sorted by using mixtures of J-Lat T cells and Raji human B cells (Extended Data Fig. 2a). We cultured J-Lat and Raji cell lines separately and performed RNA sequencing (RNA-seq) analysis of each using standard protocols. At the same time, a 1:100 mixture of J-Lat and Raji cells was analysed using FIND-seq (Extended Data Fig. 2b). Gene expression differences between J-Lat and Raji cells after standard processing were significantly correlated with differences between HIV-DNA+ and HIV-DNA− cells after FIND-seq processing (R = 0.47, P = 2.2 × 10−16; Extended Data Fig. 2c). Furthermore, differential expression between J-Lat and Raji cells analysed using FIND-seq identified canonical T cell and B cell genes (Extended Data Fig. 2d) and agreed with published findings (Extended Data Fig. 2e). These results demonstrate that FIND-seq can be used to study the transcriptomic signatures of rare HIV-DNA+ cells.
FIND-seq of HIV-DNA+ cells ex vivo
To define gene expression patterns of HIV-DNA+ memory CD4 T cells under ART, we applied FIND-seq to magnetically purified memory CD4 T cell samples from five people with HIV receiving long-term ART that was initiated during chronic infection (Supplementary Table 1). Droplet cytometry data acquired during sorting demonstrated between 534 and 2,153 HIV-DNA+ cells per million (Extended Data Fig. 3a), consistent with previous studies using quantitative PCR analysis of extracted DNA19,20. False-positive frequencies of HIV-DNA+ memory CD4 T cells measured in three HIV-uninfected control participants ranged between 7 and 19 per million (Extended Data Fig. 3b). To maximize sorted transcriptome cDNA quantity and therefore reduce the need for extensive whole-transcriptome amplification (WTA) that could skew gene abundance in the sequencing libraries, we collected all droplets after HIV detection PCR in aliquots of 100 cell-equivalents. Sorting resulted in different numbers of aliquots collected across participants owing to the different frequencies of HIV-DNA+ cells (Extended Data Fig. 3c). After WTA and sequencing, we used a prospective curation process to select only those samples with a high library quality for further analysis (Methods). This resulted in a set of 22 curated samples from three people with HIV (Supplementary Table 2 and Extended Data Fig. 4).
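As a rough illustration of how the control measurements bound the background in these counts, the following Python sketch divides the false-positive frequency measured in HIV-uninfected controls by the HIV-DNA+ frequency measured in participants. It assumes that the control background applies unchanged to participant samples, which is an assumption made here for illustration; the function name is hypothetical.

```python
def background_fraction(observed_per_million, background_per_million):
    """Fraction of HIV-DNA+ calls that could be background,
    assuming the control false-positive rate carries over unchanged."""
    return background_per_million / observed_per_million

# Frequencies reported in the text (events per million memory CD4 T cells).
lowest_observed, highest_observed = 534, 2153
lowest_background, highest_background = 7, 19

# Least favourable pairing: fewest true events, most background.
worst_case = background_fraction(lowest_observed, highest_background)
# Most favourable pairing: most true events, least background.
best_case = background_fraction(highest_observed, lowest_background)

print(f"worst case: {worst_case:.1%}, best case: {best_case:.1%}")
```

Under this assumption, background would account for a few per cent of positive calls at most, even in the least favourable pairing of the reported ranges.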
Host transcriptomes of HIV-DNA+ cells
Using the curated dataset (Supplementary Table 3), we first compared host gene expression between HIV-DNA+ and HIV-DNA− memory CD4 T cells at the global level. Unsupervised clustering revealed partial segregation between HIV-DNA+ and HIV-DNA− cell transcriptomes (Fig. 2a), and the use of Euclidean distance as a summary measure of transcriptomic relatedness demonstrated that distances between HIV-DNA+ and HIV-DNA− cell samples were significantly greater than distances among HIV-DNA− cell samples (P = 8.0 × 10−4; Fig. 2b). However, we also observed sample clustering by participant (Fig. 2a) as well as significantly greater Euclidean distances among HIV-DNA+ cell samples than among HIV-DNA− cell samples (P = 2.7 × 10−5; Fig. 2b). We conclude that the whole-transcriptome clustering analysis suggested distinctive host gene expression by HIV-DNA+ memory CD4 T cells, but also indicated that transcriptomic differences among populations of HIV-DNA+ cells and across study participants are substantial sources of variation in the dataset.
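The distance comparison described above can be sketched in Python as follows. This is a minimal illustration, not the analysis pipeline used in the study: the expression vectors are hypothetical toy values, the helper names are invented for this example, and the statistical test applied to the resulting distances is not shown.

```python
import math
from itertools import combinations, product

def within_group_distances(samples):
    """Euclidean distance for every pair of samples in one group."""
    return [math.dist(a, b) for a, b in combinations(samples, 2)]

def between_group_distances(group_a, group_b):
    """Euclidean distance for every cross-group pair of samples."""
    return [math.dist(a, b) for a, b in product(group_a, group_b)]

# Hypothetical normalized expression vectors (three genes per sample).
hiv_dna_neg = [[1.0, 2.0, 3.0], [1.0, 2.0, 4.0]]
hiv_dna_pos = [[4.0, 6.0, 3.0], [1.0, 6.0, 6.0]]

neg_neg = within_group_distances(hiv_dna_neg)
pos_pos = within_group_distances(hiv_dna_pos)
pos_neg = between_group_distances(hiv_dna_pos, hiv_dna_neg)

print("HIV-DNA- vs HIV-DNA-:", neg_neg)
print("HIV-DNA+ vs HIV-DNA+:", pos_pos)
print("HIV-DNA+ vs HIV-DNA-:", pos_neg)
```

In this toy example, both the cross-group distances and the distances among HIV-DNA+ samples exceed the distance among HIV-DNA− samples, mirroring the two patterns reported in Fig. 2b.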
Host gene differential expression
To identify individual genes and transcriptomic pathways that were characteristic of HIV-DNA+ memory CD4 T cells, we performed differential gene expression (DGE) analysis using two distinct approaches (Supplementary Table 4). Using a combined approach that analysed participants as biological replicates, we identified 2,776 differentially expressed genes (DEGs; absolute fold change > 1.5, FDR ≤ 0.05) (Extended Data Fig. 5a). Pathway enrichment analysis on the basis of these DEGs yielded several cancer- and cell-cycle-related pathways (Fig. 2c), suggesting differences between HIV-DNA+ and HIV-DNA− memory CD4 T cells related to cell proliferation and survival. Notably, a comparison of DEG lists defined for each of the participants separately revealed only 11 DEGs common to all three participants (Extended Data Fig. 5b–d). However, pathway enrichment analysis using participant-specific DEG lists (absolute fold change ≥ 2, P ≤ 0.01) identified six pathways that shared concordant direction across participants (Fig. 2d and Supplementary Table 5). All six concordant pathways showed z-activation scores of <0, indicating pathway inhibition in HIV-DNA+ cells relative to HIV-DNA− cells. Notably, these inhibited pathways in HIV-DNA+ cells included death receptor signalling, necroptosis signalling and the anti-proliferative Gα12/13 signalling pathway33. Inferences of pathway inhibition arose from both decreased expression of pathway activators and increased expression of pathway inhibitors in HIV-DNA+ cells and depended on differential expression of distinct pathway genes in different participants (Fig. 2e). We conclude that although many individual DEGs distinguishing HIV-DNA+ cells from HIV-DNA− cells differed between the participants, higher-order analysis revealed that inhibition of cell death and anti-proliferative signalling are shared attributes of HIV-DNA+ memory CD4 T cells under ART.
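The DEG thresholds used in the combined analysis (absolute fold change > 1.5, FDR ≤ 0.05) can be illustrated with a short Python sketch. The Benjamini–Hochberg procedure is assumed here as the FDR method purely for illustration, since the text above does not name the procedure; gene names, fold changes and P values are hypothetical.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(genes, log2_fold_changes, p_values,
                min_abs_fold_change=1.5, max_fdr=0.05):
    """Keep genes passing both the fold-change and the FDR threshold."""
    fdr = benjamini_hochberg(p_values)
    log2_threshold = math.log2(min_abs_fold_change)
    return [gene
            for gene, lfc, q in zip(genes, log2_fold_changes, fdr)
            if abs(lfc) > log2_threshold and q <= max_fdr]

# Hypothetical input: log2 fold change (HIV-DNA+ / HIV-DNA-) and raw P values.
genes = ["geneA", "geneB", "geneC", "geneD"]
log2_fc = [1.2, -0.9, 0.3, 2.0]
raw_p = [0.001, 0.01, 0.02, 0.8]

degs = select_degs(genes, log2_fc, raw_p)
print(degs)
```

Here geneC fails the fold-change threshold and geneD fails the FDR threshold, so only geneA and geneB are retained.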
Analysis of co-expressed gene signatures
We anticipated that pooled sequencing from diverse HIV-DNA+ memory CD4 T cells under ART could dilute signals from infected cell subpopulations, thereby limiting the detection of informative features of HIV-infected cells in conventional DGE analysis. To identify transcriptomic signatures of HIV-DNA+ cells as groups of genes, we used weighted gene co-expression network analysis (WGCNA) to define gene modules on the basis of correlation patterns across samples (Supplementary Table 6). Within the curated set of 22 samples that together expressed 17,898 different genes, this process produced 28 distinct gene modules of varying sizes (Fig. 3a). Correlating module gene expression patterns with cell infection status (that is, HIV-DNA+ versus HIV-DNA−) identified significant correlations for module 5 (60 genes, R = 0.46, P = 0.03) and module 28 (85 genes, R = 0.78, P = 2 × 10−5) (Fig. 3a). Thus, unsupervised clustering using WGCNA revealed two groups of genes, together accounting for only 0.81% of the measured transcriptome, that distinguished HIV-DNA+ from HIV-DNA− memory CD4 T cells in ART-treated people with HIV.
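Two quantities in this paragraph lend themselves to a brief worked example in Python: the share of the measured transcriptome covered by modules 5 and 28, and the correlation of a module-level summary value with infection status. The module sizes and gene total are taken from the text; the per-sample summary values are hypothetical, and the plain Pearson correlation shown is a simplified stand-in for the module–trait step rather than the WGCNA implementation itself.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Values reported in the text.
module_sizes = {"module 5": 60, "module 28": 85}
genes_measured = 17898
module_fraction = sum(module_sizes.values()) / genes_measured
print(f"{sum(module_sizes.values())} genes = {module_fraction:.2%} of transcriptome")

# Hypothetical module summary value per sample (for example, an eigengene)
# and infection status coded as 1 = HIV-DNA+, 0 = HIV-DNA-.
status = [1, 1, 1, 0, 0, 0]
module_summary = [0.5, 0.3, 0.4, -0.4, -0.5, -0.3]
r = pearson(module_summary, status)
print(f"module-status correlation R = {r:.2f}")
```

The first calculation reproduces the 0.81% figure (145 of 17,898 genes). Coding infection status as 0 or 1 makes the correlation a point-biserial one, which is one simple way to relate a continuous module summary to a binary trait.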
To characterize the differences between HIV-DNA+ and HIV-DNA− memory CD4 T cells reflected by these modules, we analysed the module gene lists using Gene Ontology (GO). In both modules, we found statistically significant enrichment (adjusted P ≤ 0.05) for genes related to the regulation of gene expression at the transcriptional and post-transcriptional levels (Fig. 3b). Module 28 was enriched for GO terms related to mRNA splicing and processing. Module 5 was enriched for genes involved in mRNA degradation by nonsense-mediated decay, which has been linked to negative post-transcriptional regulation of HIV gene expression in vitro34. Moreover, module 5 was enriched for terms related to cell survival, activation and proliferation, including regulation of death receptor signalling, regulation of calcineurin–NFAT signalling and DNA-damage checkpoint regulation. We conclude that GO analysis of WGCNA module genes identified transcriptional and post-transcriptional gene regulation as well as several cell state regulatory processes as distinguishing attributes of HIV-DNA+ memory CD4 T cells under ART.
Furthermore, we examined the transcriptomic differences between HIV-DNA+ and HIV-DNA− memory CD4 T cells by inspecting a filtered list of the 44 genes in WGCNA modules 5 and 28 that showed at least twofold average difference between HIV-DNA+ and HIV-DNA− cell populations and a concordant direction between populations across the participants (Fig. 3c, Extended Data Table 1 and Supplementary Table 6). We noted that 8 out of 44 genes were previously implicated in the regulation of HIV transcription. Four genes were linked to negative regulation of HIV transcription through histone modification (EHMT135, RBBP436 and MTA137) or promoter-proximal pausing of RNA polymerase II (CTR938), and were higher in HIV-DNA+ cells. The remaining four genes were linked to positive regulation of HIV transcriptional initiation (GTF2I39 and MAPKAPK340) or elongation (NCOA141 and SNW142), and were lower in HIV-DNA+ cells. We conclude that host gene expression signatures of HIV-DNA+ memory CD4 T cells under ART were relatively non-permissive for HIV transcription.
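The filter used to shortlist module genes (at least twofold average difference with a concordant direction across participants) can be sketched in Python as follows. The sketch assumes that the average is taken on the log2 scale, which the text does not specify; the gene names and values are hypothetical.

```python
def concordant_twofold(per_participant_log2fc, min_abs_log2fc=1.0):
    """Select genes with >= twofold mean difference (|mean log2FC| >= 1)
    whose direction of change is the same in every participant."""
    selected = []
    for gene, values in per_participant_log2fc.items():
        mean_lfc = sum(values) / len(values)
        all_up = all(v > 0 for v in values)
        all_down = all(v < 0 for v in values)
        if abs(mean_lfc) >= min_abs_log2fc and (all_up or all_down):
            selected.append(gene)
    return selected

# Hypothetical log2 fold changes (HIV-DNA+ / HIV-DNA-) in three participants.
toy_module_genes = {
    "geneA": [1.5, 1.1, 0.9],     # higher in all three, mean above twofold
    "geneB": [2.5, 2.0, -0.2],    # large mean but discordant direction
    "geneC": [-1.2, -1.0, -1.4],  # lower in all three, mean above twofold
    "geneD": [0.4, 0.5, 0.3],     # concordant but below twofold
}

shortlist = concordant_twofold(toy_module_genes)
print(shortlist)
```

Requiring concordance in addition to the fold-change threshold excludes genes such as geneB, for which a large average is driven by a subset of participants.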
Extended Data Table 1.
Genes identified in WGCNA modules 5 and 28 that showed a ≥2-fold average expression difference between HIV DNA+ and HIV DNA− memory CD4 T cells and a concordant direction across all 3 participants. aCalculated as relative expression level between HIV DNA+ and HIV DNA− cells in DGE analysis using GWB v. 21.0.3. bReported and/or predicted gene product function, curated from https://www.uniprot.org. cSubcellular localization of encoded gene product, curated from https://www.proteinatlas.org. FAS, focal adhesion sites; LD, lipid droplets; AF, actin filaments; nr, not reported. dDocumented effect of gene product expression on HIV expression level, based on in vitro studies cited in Results.
We next examined the remaining 36 genes from the filtered module 5 and 28 gene lists. Ten of these genes encoded RNA-processing factors. In module 5, these included higher levels in HIV-DNA+ cells of antiviral defence factor NCBP143 and post-splicing complex component RNPS144, both of which have been linked to nonsense-mediated decay. Module 5 also included higher levels in HIV-DNA+ cells of G3BP2, a stress granule factor in a gene family that has been implicated in cytoplasmic sequestration and translational inhibition of HIV mRNAs45. mRNA-processing factors in module 28 included higher levels in HIV-DNA+ cells of PRRC2A—a reader of N6-methyladenosine RNA modifications that can be induced by HIV infection in vitro46—and the splicing regulator SRPK. Among the additional 26 genes, we noted that module 28 included USP19 and LRRFIP2, which can inhibit apoptosis47 or pyroptosis48 and were higher in HIV-DNA+ cells, and TLN149, which is required for antigen-driven T cell proliferation mediated through immunological synapses49 and was also higher in HIV-DNA+ cells. Finally, we noted multiple module 28 genes involved in the DNA-damage response and mitochondrial function. We conclude that the transcriptomic signatures of HIV-DNA+ memory CD4 T cells under ART suggest that these cells have the capacity for post-transcriptional HIV silencing, and are also consistent with DGE-based indications of increased cell survival and proliferation.
Enrichment of signatures in cell subsets
To investigate the origins of HIV-DNA+ memory CD4 T cell transcriptomic signatures identified by co-expression network analysis, we compared these signatures with the transcriptomes of defined CD4 T cell subsets. We isolated circulating naive and memory CD4 T cell subsets from nine ART-treated people with HIV (Supplementary Table 1) using fluorescence-activated cell sorting (FACS) (Extended Data Fig. 6), defined subset gene expression using RNA-seq and finally used gene set enrichment analysis (GSEA) to compare gene expression signatures in the sorted memory subsets (defined by expression relative to the naive subset) against co-expression network analysis signatures of HIV-DNA+ cells (Extended Data Table 2). This revealed significant enrichment of the module 5 signature in memory CD4 T cells of the CD27+CCR7+CD45RO+CXCR5+CCR6− peripheral T follicular helper (TFH) phenotype50. No significant enrichment was observed for the module 5 signature in any other subset, or for the module 28 signature in any of the subsets. We conclude that, taken together, the transcriptomic signatures of HIV-DNA+ memory CD4 T cells under ART did not map to defined CD4 T cell subsets, although the module 5 signature showed partial similarity to the signature of CCR6− peripheral TFH cells in ART-treated people with HIV.
Extended Data Table 2.
Naïve and memory CD4 T cell subsets were sorted from PBMC of 9 ART-treated PWH as shown in Extended Data Fig. 6 and were then subjected to standard RNA-seq. Transcriptomic signatures of HIV DNA+ memory CD4 T cells defined by WGCNA modules were compared with transcriptomes of memory CD4 T cell subsets using GSEA pre-ranked analysis as described in Methods. Each WGCNA module gene list was separated into two lists, one each for genes that were either higher (up) or lower (down) in HIV DNA+ memory CD4 T cells relative to HIV DNA− memory CD4 T cells. A false-discovery rate p ≤0.05 was considered to represent statistically significant enrichment. The enrichment score and false-discovery rate for the CD27+CCR7+CD45RO+CXCR5+CCR6− subset are shown in bold italics to indicate statistically significant enrichment for this subset. na, not attempted due to gene set size below minimum required for analysis.
HIV RNA expression analysis
Finally, we used the curated set of 22 samples to analyse HIV transcriptional patterns in HIV-DNA+ memory CD4 T cells under ART by aligning transcriptome sequence reads to a reference HIV genome (Fig. 4). We found that some HIV-DNA+ cell samples showed hundreds of HIV reads (Fig. 4a), including one sample from participant 2510 with two distinct virus sequences (Fig. 4b,c) that suggested processive HIV transcripts from at least two cells in the sorted aliquot of 100 cells. Nevertheless, HIV read percentages for all HIV-DNA+ cell samples were <0.05% (Fig. 4a), which is 100-fold lower than previously reported for HIV-expressing cells sequenced after in vitro stimulation51. These findings are consistent with latent infection and/or HIV sequence defects that limit virus transcription in HIV-DNA+ cells. HIV genome coverage patterns of mapped reads were notable for isolated peaks interspersed with areas of no coverage (Fig. 4d), suggesting atypical transcription start sites52, transcripts from proviruses with deletion mutations and/or chance sampling variations. Spliced transcripts were not detected even by manually inspecting and mapping individual mates of read pairs using BLAST. The use of assembly-based tools to produce contigs from reads that did not initially map to the human reference yielded no HIV contigs from 5/6 HIV-DNA+ cell samples and did not substantially increase mapped HIV read counts in the remaining sample (not shown). We conclude that polyadenylated RNA-seq in HIV-DNA+ memory CD4 T cells from ART-treated people with HIV did not reveal either full-length genomic HIV transcripts or spliced HIV messages encoding accessory proteins.
Discussion
The absence of evidence for HIV reservoir size reduction in ‘shock and kill’ clinical trials has bred uncertainty about the role of therapeutic HIV latency reversal and the use of the latent reservoir concept. Meanwhile, attempts to understand the mechanisms of HIV persistence under ART by identifying distinctive attributes of HIV-infected CD4 T cells have faced major technical obstacles. Using microfluidic technology developed to study HIV-DNA+ memory CD4 T cells under ART in their natural state, we identified host gene expression signatures in these rare cells that were intrinsically non-permissive for the transcription of the virus. This supports the concept that these cells are a latent reservoir and links HIV transcriptional quiescence in vivo to host gene expression patterns that are specific to infected cells. Furthermore, host transcriptomic signatures of HIV-DNA+ memory CD4 T cells under ART indicated that the persistence of these cells may involve additional mechanisms beyond HIV transcriptional silencing, including post-transcriptional HIV silencing, resistance to cell death and resistance to anti-proliferative signalling. These findings are consistent with incomplete latency reversal by early LRAs53–58 and the persistence of infected cells observed even after cell stimulation both in vitro59 and in vivo12–16. Overall, our results reveal a host cell transcriptomic signature whose further elucidation may lead to the development of new HIV cure strategies.
The origins of the gene expression patterns that we identified in this study will require further investigation. In part, these patterns may arise progressively under ART through the selective elimination of cells that do not express them. Selection for an HIV-silencing signature may occur among cells that are competent to express toxic virus gene products in vivo, while selection for cell survival and proliferation could apply to the entire HIV-DNA+ cell pool. Importantly, this selection model implies that there are pre-existing differences among CD4 T cells in the expression of HIV silencing, cell survival and cell proliferation signatures that did not trace in their entirety to a single memory CD4 T cell subset. These signatures may therefore reflect mixed contributions from multiple subsets, each with modest enrichment for the virus, perhaps exemplified by our partial mapping of one co-expressed module signature to peripheral TFH cells. At the same time, it is also possible that some gene expression patterns of HIV-infected memory CD4 T cells are a consequence of HIV infection in these cells. Cellular transcriptomic reprogramming could represent a host response to HIV integration or other life cycle steps, as suggested by co-expressed module signature genes encoding virus-induced and DNA-damage response factors. Alternatively, although we detected little evidence of polyadenylated HIV RNA expression in HIV-DNA+ cells, it remains possible that components of infecting HIV virions, or HIV expression products whose transcripts went undetected in our sequencing (owing to transient expression or limited method sensitivity), might actively reprogram host cell gene expression. Future studies elucidating such mechanisms may yield new targets for HIV cure strategies.
Our findings in this study have several limitations. First, owing to technical challenges, we sorted and sequenced pools of HIV-DNA+ cell transcriptomes without distinguishing between intact and defective HIV genomes31. As a result, technical advances in FIND-seq and/or new technologies will be required to define how the transcriptomic signatures identified here are distributed among individual cells. Analysis of HIV-DNA+ cells at the single-cell level will avoid dilution of signatures from reservoir subpopulations, thereby refining and extending the findings from this study. Single-cell transcriptomic analyses that distinguish between intact and defective HIV may also clarify whether HIV silencing signatures arise strictly by selection within translation-competent reservoirs, or whether these signatures can arise even when the infecting virus genome has acquired lethal mutations during reverse transcription. Second, although many of the transcriptomic signature genes identified here have well-defined roles in regulating HIV gene expression, cell survival or cell proliferation, the roles of other genes in HIV persistence will require further study. Those signature genes that have RNA-processing functions but have not previously been linked with HIV replication will be of particular interest, as some of these could contribute to post-transcriptional regulation of HIV gene expression while others might serve only as markers of infected cells. Third, our findings address neither the durability of transcriptomic signature expression within each infected cell nor the distribution of cells expressing signature genes across diverse tissue compartments, raising important questions about reservoir cell dynamics that impact the development of HIV cure strategies. Fourth, as our study included a small number of participants, it is possible that larger FIND-seq studies performed in diverse participant populations and incorporating technical improvements to increase the recovery of high-quality data will reveal signatures that were not detected here. Finally, it is important to acknowledge that the barriers to HIV cure under ART may include virus reservoirs outside the memory CD4 T cell pool60–62.
Notwithstanding these limitations, our findings highlight two parallel but complementary paths in translational and basic research towards an HIV cure. The first is an increased emphasis on in vivo studies targeting the full range of mechanisms that both maintain HIV quiescence and prevent the death of HIV-infected cells. The approaches taken may include synergistic combinations of LRAs targeting diverse HIV transcriptional and translational blocks, paired with therapies that potentiate physiological CD4 T cell death. However, as the complexity of therapeutic combinations increases, their potential for significant toxicity may become a growing concern. Thus, the second path forward is an ongoing effort to define gene expression patterns within HIV-infected cellular reservoirs and to understand their mechanistic basis. The intent is for these approaches to reveal how HIV silencing, cell survival and cell proliferation programs come to be expressed among the diverse memory CD4 T cells present in vivo, therefore generating additional insights that may be translated to effective and safe HIV-cure-directed therapies.
Methods
Study participants
Recruitment of study participants with HIV was performed in compliance with relevant ethical regulations under the IRB-approved SCOPE protocol (NCT00187512) at San Francisco General Hospital. Participants were enrolled from the SCOPE cohort on the basis of sample availability at the time of study, without use of sample size calculations, blinding or randomization. Demographic and clinical laboratory data were collected at San Francisco General Hospital and are reported in Supplementary Table 1. All of the participants provided informed consent before study. Prescreening of participant samples to ensure adequate numbers of HIV-DNA+ memory CD4 T cells for FIND-seq analysis was performed in parallel sample aliquots using fluorescence-assisted clonal amplification63.
Cell lines
Jurkat human T cells (TIB-152, ATCC), HIV-DNA+ J-Lat full-length human T cells (clone 6.3, ARP-9846)64 and Raji human B cells (CCL-86, ATCC) were cultured in Gibco RPMI Medium 1640 (Thermo Fisher Scientific, 11875093) with penicillin and streptomycin (Thermo Fisher Scientific, 15140122) and 10% fetal bovine serum (FBS). Mouse fibroblasts (NIH/3T3, CRL-1658, ATCC) were cultured in Dulbecco’s modified Eagle’s medium (DMEM) with penicillin and streptomycin (Thermo Fisher Scientific, 15140122) and 10% FBS. Before use, 3T3 cells were dissociated using 0.25% trypsin-EDTA (Thermo Fisher Scientific, 25200-072) and neutralized in DMEM with 10% FBS. Cell lines were used without authentication or mycoplasma contamination testing.
Fabrication of microfluidic devices
Standard photolithography techniques were used to fabricate microfluidic devices at the Harvard Medical School Microfluidics Facility. Silicon wafers were spin-coated with SU-8 2025/2050 photoresist (Kayaku Advanced Materials) and ultraviolet-patterned using a mask aligner. After developing, the wafers were baked overnight and used as master moulds for soft-lithography. In brief, the PDMS prepolymer and curing agent were mixed by hand at a ratio of 10 to 1 (Momentive, RTV615), degassed for at least 1 h, poured onto the mould and degassed until no bubbles remained. PDMS was baked overnight at 65 °C before holes were punched using a 0.75 mm biopsy punch and bonded to a glass slide (75 × 50 × 1.0 mm, Thermo Fisher Scientific, 12–550C) with a plasma bonder (Technics Plasma Etcher 500-II). Bonded devices were made hydrophobic with Aquapel with a 30 s contact time, flushed with HFE-7500, purged with air and baked for at least 1 h before use.
Cell line validation studies
Cells were washed twice with Hanks’ balanced salt solution (HBSS, no calcium, no magnesium, Thermo Fisher Scientific, 14170112) and then counted, mixed (mouse:human 1:1; J-Lat:Raji 1:100), and resuspended in HBSS containing 18% OptiPrep Density Gradient Medium (Sigma-Aldrich) for FIND-seq. For standard RNA-seq studies performed in parallel, aliquots of 5 × 10^4 cells were lysed in RNAzol RT (Molecular Research Center) and stored at −80 °C until subsequent total RNA extraction according to the manufacturer’s instructions. Whole-transcriptome cDNA was then generated from total RNA by reverse transcription using 6 mM MgCl2, 1 M betaine, 7.5% PEG-8000, 1 mM dNTP, 2 U µl−1 Maxima H-minus reverse transcriptase (Thermo Fisher Scientific, EP0753), 0.5 U µl−1 RNase inhibitor (Lucigen, NxGen) and 2 µM SMART TSO (AAGCAGTGGTATCAACGCAGAGTGAATrGrGrG). This cDNA was purified using AMPure XP beads (Beckman Coulter), and was then processed for WTA by PCR, with library preparation as previously described65. FIND-seq sample processing and library preparation were performed as described below. The correlation between the DGE results from standard RNA-seq and FIND-seq was analysed using stat_cor (method = “pearson”) in R (v.4.1.0). The results from the J-Lat:Raji mixing study were compared with published transcriptomic signatures of CD4 T cells and B cells66 using GSEA.
PBMC processing for FIND-seq
Approximately 20–30 million cryopreserved peripheral blood mononuclear cells (PBMCs) from each study participant were used for FIND-seq. Cryopreserved PBMC suspensions were thawed in a 37 °C water bath, washed in prewarmed RPMI with 10% FBS, and sedimented by centrifugation at 300 rpm (Sorvall Legend XT). Untouched memory CD4 T cells were then isolated by magnetic-column-based negative selection (Miltenyi, 130-091-893). Cells were counted manually with a haemocytometer using Trypan blue, and aliquots of 5 × 10^4 cells were lysed and stored in RNAzol RT.
FIND-seq
FIND-seq was performed as described previously30. In brief, four syringes were prepared for microfluidic cell encapsulation: lysis buffer, agarose, cells and oil. The lysis buffer consisted of 20 mM Tris-HCl pH 7.5, 1,000 mM LiCl, 1% LiDS, 10 mM EDTA, 10 mM DTT and 0.4 µg µl−1 proteinase K. Conjugated agarose-dT was heated to 95 °C for 1 h before use and was kept heated throughout the run using a custom syringe heater. A 10 ml syringe was loaded with oil (Bio-Rad, 186–3005) for droplet generation. All of the syringes were connected to the microfluidic device using PE/2 tubing (Scientific Commodities, BB31695-PE/2). To make droplets, pumps were run at 600 μl h−1 (cell mixture), 1,200 μl h−1 (agarose), 600 μl h−1 (lysis buffer), and 5,000 μl h−1 (oil) using a bubble-triggered drop generator67. Air was controlled to break the jet and generate 53–55 µm droplets. After lysis at 55 °C for 2 h, droplets were cooled at 4 °C overnight to allow agarose gelation. Solid agarose microspheres (beads) were removed from the oil using a drop-breaking procedure. All of the steps were performed at 4 °C to prevent dissociation of mRNA from the poly(T) oligonucleotides. The beads were removed from the oil and washed five times. For each wash, the beads were incubated in wash buffer for 5 min on ice, centrifuged at 4,700 rpm for 10 min and aspirated before the next wash. Beads were first washed in wash buffer 1 containing 20 mM Tris-HCl pH 7.5, 500 mM LiCl, 0.1% LiDS and 0.1 mM EDTA. Next, the beads were washed twice with wash buffer 2 containing 20 mM Tris-HCl pH 7.5 and 500 mM NaCl. Finally, the beads were washed twice in 5× reverse transcription buffer containing 250 mM Tris-HCl pH 8.3, 375 mM KCl, 15 mM MgCl2 and 50 mM DTT and filtered with a 100 µm cell strainer. 
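The encapsulation flow rates and droplet size stated above imply an approximate droplet volume and generation rate, which the following Python sketch works through. It assumes spherical droplets, a representative diameter of 54 µm (the midpoint of the stated 53–55 µm range) and that all aqueous volume ends up in droplets. The function names are hypothetical and the figures are order-of-magnitude estimates rather than measured values.

```python
import math

# Aqueous flow rates from the text, in microlitres per hour
AQUEOUS_FLOWS_UL_PER_H = {"cells": 600, "agarose": 1200, "lysis_buffer": 600}

def droplet_volume_pl(diameter_um):
    """Volume of a spherical droplet in picolitres (1 pl = 1,000 cubic micrometres)."""
    radius_um = diameter_um / 2
    return (4 / 3) * math.pi * radius_um ** 3 / 1000

def droplets_per_second(total_flow_ul_per_h, diameter_um):
    """Approximate generation rate, assuming all aqueous volume forms droplets."""
    flow_pl_per_s = total_flow_ul_per_h * 1e6 / 3600  # 1 microlitre = 1e6 picolitres
    return flow_pl_per_s / droplet_volume_pl(diameter_um)

total_aqueous = sum(AQUEOUS_FLOWS_UL_PER_H.values())
# Fraction of each droplet contributed by the lysis buffer stream
lysis_fraction = AQUEOUS_FLOWS_UL_PER_H["lysis_buffer"] / total_aqueous
# Assumed representative diameter of 54 micrometres
rate = droplets_per_second(total_aqueous, 54)
```

Under these assumptions each droplet holds roughly 80 pl and droplets form at several thousand per second, with the lysis buffer stream making up one quarter of the aqueous volume.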
The beads were resuspended in reverse transcription master mix to a final concentration of 6 mM MgCl2, 1 M betaine, 7.5% PEG-8000, 1 mM dNTP, 2 U µl−1 Maxima H-minus reverse transcriptase (Thermo Fisher Scientific, EP0753), 0.5 U µl−1 RNase inhibitor (Lucigen, NxGen) and 2 µM SMART TSO (AAGCAGTGGTATCAACGCAGAGTGAATrGrGrG). Reverse transcription was completed at 25 °C for 30 min, followed by 90 min at 42 °C. The tubes were mixed continuously with an inverter during all incubations. After reverse transcription, the beads were washed five times with 0.1% Pluronic in RNase/DNase-free water.
After reverse transcription, the cell occupancy of agarose beads was quantified by microscopy and successful reverse transcription was checked using WTA before continuing with bead reinjection and sorting. Agarose beads containing cellular genomes and transcriptomes were reinjected into droplets to perform single-cell HIV detection PCR. Beads were mixed with PCR reagents to achieve a final concentration of 1× TaqPath Mastermix (Thermo Fisher Scientific, A30866), PEG-6000 (0.5% (w/v)), Tween-20 (0.5% (w/v)), F-127 Pluronic (0.5% (w/v)), BSA (0.1 mg ml−1), HIV gag forward primer (CACTGTGTTTAGCATGGTGTTT, 900 nM), HIV gag reverse primer (TCAGCCCAGAAGTAATACCCATGT, 900 nM) and HIV gag hydrolysis probe (CY5-ATTATCAGAAGGAGCCACCCCACAAGA-3′ Iowa Black RQ, 250 nM)68. To generate the final 1× reaction mixture concentration, beads were soaked in 2× PCR master mix on a shaker for 30 min in the dark. Next, the beads were centrifuged and loaded into a 3 ml syringe. The remaining 1× PCR master mix (supernatant) was loaded into a separate 3 ml syringe. Finally, the beads and 1× PCR master mix were reinjected in the microfluidic device to encapsulate the beads into 70 µm droplets69. Agarose beads were re-encapsulated in droplets with about 70% loading, which is not accounted for in the detection efficiency calculation. Droplets were collected in 40 µl aliquots in PCR strips and thermocycled as follows: 88 °C for 10 min; then 55 cycles of 88 °C for 30 s and 60 °C for 1 min. After thermocycling, droplets were transferred into a 3 ml syringe for microfluidic sorting.
HIV-DNA+ and HIV-DNA− droplets were sorted on the basis of the HIV PCR signal using a concentric sorter as previously described32. For HIV-DNA−-sorted samples, we sorted 100 cell equivalents based on the number of genomes per hydrogel bead determined previously, collecting a mixture of HIV-DNA− cell droplets and cell-free droplets. For HIV-DNA+-sorted samples, we sorted aliquots of 100 droplets. The sorter was run with the following flow rates: 180 μl h−1 cell droplets, 6,000 μl h−1 bias oil (HFE-7500), 250 μl h−1 spacer oil (HFE-7500) and 3,500 μl h−1 extra spacer oil (HFE-7500). To sort, the 2 M NaCl on-chip electrode was polarized using a high-voltage amplifier at 1,200 V, 4,000 Hz for 15 cycles with 120 μs delay. We sorted into 1.5 ml Eppendorf tubes, removed all but 20 µl of the oil, added 50 µl of distilled nuclease-free water and centrifuged the sample at 20,000g for 5 min, and then stored the samples at −80 °C.
Before performing WTA on sorted HIV-DNA+ droplets in each participant, we determined the WTA cycle number that was required to amplify transcriptome cDNA from 100 cells in that participant. Accordingly, we first performed WTA on HIV-DNA−-sorted sample aliquots. Sorted HIV-DNA− sample aliquots (frozen at −80 °C) were heated to 60 °C on a heat block for 10 min, mixed carefully by pipetting and centrifuged at 20,000g for 5 min. The aqueous layer was then transferred to PCR strips and a WTA PCR reaction was performed using 1× KAPA HiFi Master mix (Roche, KK2601) and 0.4 μM Smart-seq2 primer (AAGCAGTGGTATCAACGCAGAGT). Sorted material was thermocycled as follows: 95 °C for 3 min; then 18–22 cycles of 98 °C for 15 s, 67 °C for 20 s and 68 °C for 4 min; then 72 °C for 5 min, with a 4 °C terminal hold. The WTA was performed at three different cycle numbers: 18, 20 and 22 cycles. All reactions were subsequently purified using a 1.2:1 ratio of AMPure XP beads (Beckman Coulter), with the final elution performed in 20 µl of nuclease-free water. After WTA, the DNA yield was quantified using the Qubit 4 Fluorometer and DNA size distribution was assayed using a Bioanalyzer 2100 with High Sensitivity DNA chip. On the basis of these results, the HIV-DNA+-sorted samples were processed as above using the minimal cycle number required to achieve a concentration of greater than 2 ng µl−1 in 20 µl of elution volume.
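The cycle-number selection rule can be sketched in Python as follows. The function name minimal_cycle_number and the pilot yields are hypothetical; only the 2 ng µl−1 threshold and the tested cycle numbers come from the text.

```python
def minimal_cycle_number(yield_by_cycles, threshold_ng_per_ul=2.0):
    """Return the lowest tested cycle number whose yield exceeds the threshold.

    yield_by_cycles maps a WTA cycle number to the measured concentration
    (ng per microlitre in the 20 microlitre elution). Returns None if no
    tested cycle number passes.
    """
    passing = [cycles for cycles, conc in yield_by_cycles.items()
               if conc > threshold_ng_per_ul]
    return min(passing) if passing else None

# Hypothetical Qubit readings from the HIV-DNA- pilot aliquots of one participant
pilot = {18: 0.7, 20: 2.4, 22: 6.1}
chosen = minimal_cycle_number(pilot)
```

Choosing the lowest passing cycle number limits amplification to what is needed for library preparation.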
Sequencing and read preprocessing
Libraries were prepared from transcriptome material sorted by FIND-seq using the Nextera XT Library Preparation Kit with v2 indexes. Individual sample libraries were combined at equimolar amounts to produce a single library pool. The library was quantified using the KAPA SYBR FAST Universal qPCR Kit. The library concentration and fragment size distribution were confirmed using the Agilent Bioanalyzer 2100 with High Sensitivity DNA chip. The library was diluted and denatured in accordance with the Illumina MiSeq System Denature and Dilute Libraries Guide (document 15039740). Cell line libraries were sequenced on the Illumina MiSeq system in 2 × 75 bp runs, and the selected libraries were subsequently sequenced again on the Illumina HiSeq 4000 system in a 2 × 75 bp run, operated using the Illumina HiSeq Control Software (HCS) v.3.4.0. For samples from people with HIV, libraries were first pooled and run on the Illumina MiSeq system in a 2 × 75 bp run, then rebalanced and run on the Illumina HiSeq 4000 system in a 2 × 75 bp run. Raw sequencing data were converted to fastq format using the bcl2fastq2 script (v.2.20) from Illumina and the reads were demultiplexed using sample-specific indexes. The resulting fastq files were trimmed for quality, ambiguity and presence of read-through adapters using the ‘Trim reads’ tool with the default settings in CLC Genomics Workbench (GWB) v.21.0.3. The quality of the raw and trimmed reads was assessed using QC tools in GWB.
Participant sample data quality filtering
Owing to the abundance of HIV-DNA− cells in samples from ART-treated people with HIV, HIV-DNA− cells were sorted in multiple replicates. Sequencing data were generated from 53 HIV-DNA+ and HIV-DNA− cell samples sorted by FIND-seq from 5 people with HIV. A prospective curation approach was used to exclude low-quality samples from downstream transcriptomic analysis. HIV-DNA− sample quality was assessed according to the following parameters: (1) the total number of reads sequenced; (2) the percentage of intergenic and intronic reads; (3) the proportion of ribosomal RNA (rRNA) reads; and (4) the exonic fragment count (Supplementary Table 2). Samples that had a paired-end read count of less than 10^6 and had >35% mapped intergenic reads were excluded. Furthermore, within each participant, HIV-DNA− samples that differed qualitatively from other replicates by having lower exonic reads or higher rRNA content were removed. If all HIV-DNA− samples were removed for a participant, that participant was excluded from further analysis. After the removal of 31 FIND-seq-sorted samples in this curation process, 22 HIV-DNA+ and HIV-DNA− samples belonging to participants 2208, 2510 and 3209 remained (Supplementary Table 2).
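The numerical part of this curation can be expressed as a simple filter, sketched below in Python. The sketch takes the read-count threshold to be 10^6 paired-end reads and applies the two criteria jointly, as the sentence is worded. The class and function names are hypothetical, and the qualitative comparison of replicates within each participant is not reproduced.

```python
from dataclasses import dataclass

@dataclass
class SampleQC:
    sample_id: str          # hypothetical identifier
    paired_end_reads: int   # total paired-end read count
    pct_intergenic: float   # percentage of mapped reads that are intergenic

MIN_READS = 10 ** 6
MAX_PCT_INTERGENIC = 35.0

def fails_threshold_filter(sample):
    """Exclusion rule as worded in the text: low read count and high intergenic share."""
    return (sample.paired_end_reads < MIN_READS
            and sample.pct_intergenic > MAX_PCT_INTERGENIC)

def curate(samples):
    """Return only the samples that pass the threshold filter."""
    return [s for s in samples if not fails_threshold_filter(s)]
```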
Analysis pipeline testing
The transcriptomes of primary cell samples generated by FIND-seq showed high proportions of intronic and intergenic reads (Extended Data Fig. 4). We therefore performed a second, deeper sequencing of libraries from the J-Lat:Raji cell mixing study and tested whether bioinformatics pipelines that address coverage bias and/or genomic DNA contamination might mitigate the effects of these patterns on the gene expression results. In total, we evaluated nine different pipelines using control data from the J-Lat:Raji cell line mixing study. The details of each pipeline are found below; the default options and parameters were used for all tools unless otherwise noted. Reads were mapped against the GRCh38 (ENSEMBL v.100) reference with coding gene annotations only for all pipelines tested.
CLC Genomics Workbench
CLC Genomics Workbench (GWB) v.20 and v.21 (https://digitalinsights.qiagen.com/) were tested using the default settings for mapping and abundance estimation using the RNA-seq analysis tool. For DGE analysis in GWB v.21, the option to filter average expression before FDR correction was selected.
3′ tag counting
Raw reads were preprocessed and mapped using GWB v.21. As in a previous study70, reads were mapped to the region within 1,500 bp from the 3′ end of the gene and expression values were calculated in GWB. Analysis of DGE was also performed in GWB.
Salmon with positional bias correction
Salmon v.1.3.0 was implemented as it includes an algorithm for transcript expression quantification that incorporates bias modelling to account for position specific and other biases that are commonly seen in RNA-seq data71. Read mapping generated from GWB v.20 was used as the input. Post-quantification analysis of DGE was performed using EdgeR (v.3.32.1)72 and DESeq2 (v.1.30.1)73.
SeqMonk DNA contamination correction
We considered that relatively high intergenic read proportions in sorted samples might be due to library incorporation of the genomic DNA retained with each cell during FIND-seq. We therefore used the SeqMonk expression quantification (http://www.bioinformatics.babraham.ac.uk/projects/seqmonk/) pipeline v.1.47.2, which estimates and corrects count data for each transcript using the density of intergenic reads. Read mapping previously processed in GWB v.20 was used as the input. Analysis of DGE was performed in DESeq2. Expression quantification and DGE with or without DNA contamination correction (SeqMonk) were evaluated, and each was tested with or without automatic independent filtering (DESeq2).
Selection of the analysis pipeline
For each pipeline, transcriptome accuracy was assessed by comparing J-Lat:Raji FIND-seq mixing study DGE results with the DGE detected between J-Lat cells and the unsorted J-Lat:Raji mixture in standard RNA-seq. DEGs were considered as those with an absolute fold change of ≥1.5 and FDR ≤ 0.05. DEGs identified in standard RNA-seq but not in FIND-seq were considered to be false negatives (FN); those identified only after FIND-seq as false positives (FP); and those identified in both FIND-seq and standard RNA-seq as true positives (TP). Based on this, the sensitivity (or recall) as TP/(TP + FN) and positive predictive value (PPV) as TP/(TP + FP) for each analysis process were calculated (Supplementary Table 7).
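The sensitivity and PPV definitions above translate directly into set operations on DEG lists, as in the following Python sketch. The function name and the gene labels in the usage note are hypothetical; standard RNA-seq DEGs are treated as the reference set.

```python
def sensitivity_and_ppv(reference_degs, test_degs):
    """Compare a pipeline's DEG calls (test) with the standard RNA-seq DEGs (reference).

    Returns (sensitivity, PPV), where sensitivity = TP / (TP + FN)
    and PPV = TP / (TP + FP).
    """
    reference, test = set(reference_degs), set(test_degs)
    tp = len(reference & test)   # called in both
    fn = len(reference - test)   # called only in standard RNA-seq
    fp = len(test - reference)   # called only after FIND-seq
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv
```

For example, with reference genes {A, B, C, D} and FIND-seq calls {B, C, E}, there are two true positives, two false negatives and one false positive, giving a sensitivity of 0.5 and a PPV of 2/3.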
GWB v.20 and v.21 yielded the highest combination of sensitivity and PPV. Pipelines that corrected for coverage bias and DNA contamination did not increase the sensitivity, and in several cases showed lower PPV. Although GWB v.20 had a higher PPV than GWB v.21, there were developments in the GWB v.21 transcriptome analysis pipeline that were anticipated to reduce noise in primary cell samples. Thus, the pipeline in GWB v.21 was selected for the analysis of participant samples.
DGE between HIV-DNA+ and HIV-DNA− memory CD4 T cells
As described above, transcriptome data from FIND-seq-sorted material contained higher proportions of intronic and intergenic sequences than the standard RNA-seq data. These non-exonic sequences were also abundant in material that was subjected to only the hydrogel encapsulation and cDNA synthesis steps of FIND-seq, consistent with the requisite co-retention of cell genomic DNA with transcriptome material and with efficient nuclear lysis and capture of immature transcripts in our hydrogel-based workflow. Accordingly, after curating the participant samples on the basis of quality, differential expression using only exonic reads was performed (Supplementary Table 3). Using GWB v.21, a combined analysis was performed using the Wald test with Benjamini–Hochberg multiple-testing correction by defining DEGs between HIV-DNA+ and HIV-DNA− samples using data from the three participants as biological replicates, while controlling for any interparticipant differences in expression. Moreover, a participant-specific analysis was performed by determining DEGs within each participant separately (Supplementary Table 4). The default settings for all other parameters for the differential expression for RNA-seq tool were used except for Filter on average expression for FDR correction, which was enabled for all analyses. Unless otherwise noted, cut-offs for statistical significance of DEGs were absolute fold change of ≥1.5 and FDR ≤ 0.05.
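For readers unfamiliar with the multiple-testing step, the following Python sketch shows a standard Benjamini–Hochberg adjustment followed by the DEG cut-offs stated above. It is not the GWB implementation and does not reproduce the Wald test or the filter on average expression. The function names are hypothetical, and fold changes are assumed to be signed linear values (for example 2.0 or −2.0).

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def is_deg(fold_change, fdr, min_abs_fc=1.5, max_fdr=0.05):
    """Apply the cut-offs from the text: absolute fold change >= 1.5 and FDR <= 0.05."""
    return abs(fold_change) >= min_abs_fc and fdr <= max_fdr
```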
Euclidean distance calculation
Pairwise Euclidean distances between the curated samples were calculated using the dist function in R (v.4.1.0) using a matrix of counts per million mapped reads (CPM) gene expression values as input. For each sample of a given HIV DNA status group (that is, HIV-DNA+ or HIV-DNA−), average intragroup and intergroup distances to all other curated samples were calculated, with values plotted in GraphPad Prism (v.9.3.1). Statistical significance of distance differences between groups was calculated using Mann–Whitney U-tests.
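The intragroup and intergroup averaging can be sketched in Python as follows. This is an illustration of the calculation rather than the R code used in the study; the function names and input layout (one CPM vector per sample, one group label per sample) are assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def _mean(values):
    return sum(values) / len(values) if values else float("nan")

def mean_intra_inter(samples, groups):
    """For each sample, average its distance to same-group and other-group samples.

    samples: dict mapping sample name -> CPM vector
    groups:  dict mapping sample name -> group label (e.g. "HIV-DNA+")
    Returns a dict mapping sample name -> (mean intragroup, mean intergroup).
    """
    result = {}
    for name, vec in samples.items():
        intra, inter = [], []
        for other, other_vec in samples.items():
            if other == name:
                continue  # skip the zero distance of a sample to itself
            d = euclidean(vec, other_vec)
            if groups[other] == groups[name]:
                intra.append(d)
            else:
                inter.append(d)
        result[name] = (_mean(intra), _mean(inter))
    return result
```

The per-sample averages from the two groups can then be compared with a rank-based test such as the Mann–Whitney U-test.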
Transcriptomic pathway expression differences between HIV-DNA+ and HIV-DNA− cells
Ingenuity Pathway Analysis (Qiagen, summer release 2021) was used to identify enriched biological pathways (Supplementary Table 5) on the basis of DEG lists. For the combined analysis considering samples from different participants as biological replicates, DEGs with an absolute fold change of ≥1.5 and FDR ≤ 0.05 were used. For the participant-specific analysis, DEGs with a fold change of ≥2 and raw P ≤ 0.01 were used and pathways regulated in the same direction for all three participants were identified.
The directionality of enrichment of pathways for each analysis was determined from the z-score, which is calculated in Ingenuity Pathway Analysis to represent predicted relative pathway activity. The z-score for each pathway was calculated using the list of genes annotated to that pathway and meeting criteria for statistically significant differential expression between HIV-DNA+ and HIV-DNA− cells. A simplified z-score was calculated as follows: Z = (N+ − N−)/√N, where N+ and N− are the numbers of genes whose direction of regulation is concordant or discordant, respectively, with predictions from the literature. A positive z-score implies activation of a pathway, whereas a negative z-score implies inhibition. Statistical significance of the enrichment of a pathway was determined using a right-tailed Fisher’s exact test as described previously74. Networks of pathways identified as inhibited across participants and their corresponding genes were plotted using ClusterProfiler (v.4.1.1)75.
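The simplified z-score can be illustrated with a short Python sketch. The text does not define N explicitly; the sketch assumes N = N+ + N−, that is, the number of pathway genes with a predicted direction. The function name is hypothetical.

```python
import math

def simplified_z_score(n_concordant, n_discordant):
    """Z = (N+ - N-) / sqrt(N), assuming N = N+ + N- (an assumption, see lead-in)."""
    n_total = n_concordant + n_discordant
    if n_total == 0:
        raise ValueError("z-score is undefined when no genes have a predicted direction")
    return (n_concordant - n_discordant) / math.sqrt(n_total)
```

Under this assumption, a pathway with two concordant and seven discordant genes gives Z = −5/3, consistent with predicted inhibition.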
WGCNA
Weighted gene co-expression network analysis76 was performed in R using the WGCNA package (v.1.70) with a gene expression matrix of CPM values. Genes detected in <2 samples were excluded from analysis. The one-step automatic method was used for network construction and module detection. A soft thresholding power (β) of 6 was selected based on approximate scale-free topology using the function pickSoftThreshold. The co-expression network was built with a minimum module size of 30, reassignThreshold of 0 and mergeCutHeight of 0.25. The default values were used for the other parameters. Co-expressed modules of genes that correlated with HIV-DNA+ and HIV-DNA− status were identified. Modules that were correlated with the traits with P ≤ 0.05 were considered to be significant. GO enrichment analysis for the genes belonging to the two WGCNA modules significantly correlated with cell HIV DNA status was performed using Enrichr (29 March 2021 release)77,78. Enrichment analysis was performed using a Fisher’s exact test with Benjamini–Hochberg multiple-testing correction.
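Two of the steps above, the gene filter and the soft-thresholding transform, can be sketched in Python. This is not the WGCNA package. It assumes that "detected" means CPM > 0 and that the network is unsigned, so that adjacency is the absolute correlation raised to the power β. The function names are hypothetical.

```python
def filter_genes(cpm_by_gene, min_samples=2):
    """Keep genes detected (assumed here to mean CPM > 0) in at least min_samples samples."""
    return {
        gene: values
        for gene, values in cpm_by_gene.items()
        if sum(1 for v in values if v > 0) >= min_samples
    }

def soft_threshold_adjacency(correlation, beta=6):
    """Unsigned soft-threshold adjacency: |correlation| raised to the power beta."""
    return abs(correlation) ** beta
```

With β = 6, a correlation of 0.5 maps to an adjacency of about 0.016 whereas a correlation of 0.9 maps to about 0.53, so weak correlations are strongly down-weighted.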
Analysis of HIV reads
To identify sequence reads representing HIV RNA, we created a combined human (GRCh38, ENSEMBL v.100) and HIV (GenBank: KT284371) reference. The HIV sequence for this reference was derived from the clade B representative in the 2016 LANL HIV sequence compendium, with deletions in the LTR regions replaced by the corresponding sequence and annotations from HXB2CG (GenBank: K03455 M38432), and with masking of the gag amplicon detected in FIND-seq. Reads were aligned to the combined reference using the Map reads to reference tool with the default settings in GWB (v.21). Counts were obtained for reads extracted from mapping to the combined reference. Mapped reads were visualized using GWB and Integrated Genome Viewer (v.2.11.9).
The frequencies of sequence variants in HIV reads compared to the reference sequence were examined to assess the presence of multiple virus sequences. To do this, a consensus of aligned sequences was generated and reads mapping to the HIV genome were extracted. These reads were then mapped against the consensus reference sequence. The resulting mapping was improved by local realignment in areas containing insertions and deletions (indels). Variants were then identified using the ‘low frequency’ variant caller in GWB v.21 with a minimum coverage of 2, minimum count of 1, inclusion of broken reads and without relative read direction filter applied. The default options for the other parameters were used. The list of variants obtained was manually inspected and filtered to remove (1) those with a frequency above 50% (thus representing the predominant sequence rather than a minor variant) and (2) those with read count = 1 or that represented presumptive technical insertions in homopolymeric regions.
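The post-calling variant filter can be summarized in a Python sketch. In the study this step involved manual inspection; here the homopolymer judgement is represented by a hypothetical boolean flag, and the class and function names are likewise illustrative.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    position: int
    frequency_pct: float   # variant frequency among reads covering the position
    read_count: int        # number of reads supporting the variant
    homopolymer_insertion: bool = False  # hypothetical flag set during manual review

def keep_minor_variant(variant):
    """Apply the two removal criteria described in the text."""
    if variant.frequency_pct > 50:
        return False  # predominant sequence rather than a minor variant
    if variant.read_count == 1 or variant.homopolymer_insertion:
        return False  # single-read support or presumptive technical insertion
    return True
```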
Moreover, the Sequences from HIV Easily Reconstructed (SHIVER)79 pipeline (v.1.5.8) was tested to create a hybrid reference from de novo assembled contigs of HIV reads for individual samples and closely matched reference sequences. In brief, reads were mapped to the GRCh38 (ENSEMBL v.100) reference using the Map reads to reference tool in GWB v.21 with stringent settings, with the length fraction and similarity fraction parameters set to 0.8. Unmapped reads were then collected and paired reads among them were processed using the de novo assembly tool in GWB (v.21) with the default settings. We also tested the iterative virus assembler (IVA; v.1.0.11) to perform de novo assembly from the unmapped reads using the default settings, but did not recover HIV contigs using this tool. Contig sequences obtained from GWB (v.21) were exported in fasta format and were processed using the SHIVER pipeline with the default settings. A clade B HIV genome obtained from the 2016 LANL sequence compendium was used as a reference.
Enrichment analysis of WGCNA modules in defined CD4 T cell subsets
Viably cryopreserved PBMCs from ART-treated people with HIV were thawed and stained for FACS with LIVE/DEAD Aqua stain (Molecular Probes) and the following antibodies (with the indicated dilutions): CXCR5-Alexa Fluor 488 (1:7; BD), CCR5-Cy7PE (1:10; BD), CD27-Cy5PE (1:10; Beckman Coulter), CD45RO-PE-Texas Red (1:12; Beckman Coulter), CD14-PE (1:80; BD), CD11c-PE (1:40; BD), CD3-H7APC (1:5; BD), CCR7-Alexa Fluor 700 (1:8; BD), CD20-APC (1:5; BD), CD56-APC (1:10; BD), T cell receptor gamma delta (TCR-γδ)-APC (1:5; BD), PD1-Brilliant Violet 711 (1:10; BioLegend), CD8-Qdot 655 (1:200; Invitrogen), CD4-Qdot 605 (1:200; Invitrogen), CD57-Qdot 585 (1:50; Invitrogen) and CCR6-Brilliant Violet 421 (1:10; BD). Stained samples were sorted into CD4 T cell subsets using the FACSAria (BD) system by first gating for single cells that were CD3+, Aqualow and negative for CD11c, CD14, CD20, CD56 and TCR-γδ. The remaining events that were CD4+ and CD8− were then collected as naive (CD27+CD45RO−) or memory CD4 T cell subsets (see memory subset definitions in Extended Data Table 2). Sorted cell subsets were processed for total RNA extraction and whole-transcriptome sequencing as described previously63. The resulting data were processed using the standard pipeline in GWB v.21 using the human reference (GRCh38, ENSEMBL v. 100) with only the coding gene annotations. The resulting CPM values were exported and provided as an input to GSEA (v.4.2.3)80,81. Enrichment of module 5 and 28 signatures (separated into genes upregulated and downregulated between HIV-DNA+ and HIV-DNA− cells) was identified in transcriptome data from each memory CD4 T cell subset (with data from the naive CD4 T cell subset serving as a reference). GSEA was run using the default settings for all of the parameters.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at 10.1038/s41586-022-05556-6.
Acknowledgements
We thank the participants in this study; and N. Morgan for discussions. J-Lat full-length cells, clone 6.3, contributed by E. Verdin, were obtained through the NIH HIV Reagent Program, Division of AIDS, NIAID. This work was supported by NIH U01 AI129206-01 (to E.A.B. and A.R.A.), the CZ Biohub (to A.R.A.), the American Foundation for AIDS Research investment grant 109537-61-RGRL (to E.A.B.), the Delaney AIDS Research Enterprise (DARE) to Find a Cure 1UM1AI126611-01 (to S.G.D.), the NIH Office of AIDS Research Strategic Fund (to E.A.B.), the Intramural AIDS Targeted Antiviral Program (IATAP; to D.C.D.), the NIH Intramural Research Program (to D.C.D. and E.A.B.) and by NIH R01AI149699 (to F.J.Q. and A.R.A.). I.C.C. is supported by a transition grant from the NIH (K22AI152644).
Author contributions
Conceptualization: D.C.D., A.R.A. and E.A.B. Methodology: I.C.C., A.R.A. and E.A.B. FIND-seq analysis of HIV-DNA+ cells: I.C.C., S.T., M.A.W. and S. Shah. Molecular optimization studies: I.C.C., S.T., S. Smith, M.H., S.H.K., D.G.B., J.S.L., D.K., S.Z., S.C. and S.D. Library preparation and sequencing: S. Smith and A.R.H. Flow cytometry analysis of CD4 T cell subsets: M.A.-L., M.T. and L.P. Bioinformatic analysis: P.M., I.C.C. and S.B. Resources: I.C.C., R.H., S.G.D., F.J.Q., D.C.D., A.R.A. and E.A.B. Manuscript preparation: I.C.C., P.M. and E.A.B. Supervision: F.J.Q., D.C.D., A.R.A. and E.A.B.
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work.
Data availability
Transcriptome sequencing data from human study participants were deposited with controlled access in the database of Genotypes and Phenotypes (dbGaP; phs003095.v1.p1). Transcriptome sequencing data from cell line experiments were deposited in the NCBI Sequencing Read Archive (SRA; accessions PRJNA819479 and PRJNA893817). Gene sets M3077 and M3076 analysed in Extended Data Fig. 2 are available online (https://www.gsea-msigdb.org/). Source data are provided with this paper.
Competing interests
I.C.C., A.R.A. and E.A.B. have prepared a provisional patent application for submission related to the technology used in this study.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Adam R. Abate, Eli A. Boritz
Contributor Information
Adam R. Abate, Email: arabate@gmail.com
Eli A. Boritz, Email: eli.boritz@nih.gov
Extended data is available for this paper at 10.1038/s41586-022-05556-6.
Supplementary information
The online version contains supplementary material available at 10.1038/s41586-022-05556-6.
References
- 1.Finzi, D. et al. Identification of a reservoir for HIV-1 in patients on highly active antiretroviral therapy. Science278, 1295–1300 (1997). 10.1126/science.278.5341.1295 [DOI] [PubMed] [Google Scholar]
- 2.Siliciano, J. D. & Siliciano, R. F. In vivo dynamics of the latent reservoir for HIV-1: new insights and implications for cure. Annu. Rev. Pathol.17, 271–294 (2022). 10.1146/annurev-pathol-050520-112001 [DOI] [PubMed] [Google Scholar]
- 3.Wong, J. K. et al. Recovery of replication-competent HIV despite prolonged suppression of plasma viremia. Science278, 1291–1295 (1997). 10.1126/science.278.5341.1291 [DOI] [PubMed] [Google Scholar]
- 4.Procopio, F. A. et al. A novel assay to measure the magnitude of the inducible viral reservoir in HIV-infected individuals. EBioMedicine2, 874–883 (2015). 10.1016/j.ebiom.2015.06.019 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 5.Barboric, M. et al. Tat competes with HEXIM1 to increase the active pool of P-TEFb for HIV-1 transcription. Nucleic Acids Res.35, 2003–2012 (2007). 10.1093/nar/gkm063 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Bosque, A. & Planelles, V. Induction of HIV-1 latency and reactivation in primary memory CD4+ T cells. Blood113, 58–65 (2009). 10.1182/blood-2008-07-168393 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Ghose, R., Liou, L. Y., Herrmann, C. H. & Rice, A. P. Induction of TAK (cyclin T1/P-TEFb) in purified resting CD4+ T lymphocytes by combination of cytokines. J. Virol.75, 11336–11343 (2001). 10.1128/JVI.75.23.11336-11343.2001 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Kinoshita, S. et al. The T cell activation factor NF-ATc positively regulates HIV-1 replication and gene expression in T cells. Immunity6, 235–244 (1997). 10.1016/S1074-7613(00)80326-X [DOI] [PubMed] [Google Scholar]
- 9.Nabel, G. & Baltimore, D. An inducible transcription factor activates expression of human immunodeficiency virus in T cells. Nature326, 711–713 (1987). 10.1038/326711a0 [DOI] [PubMed] [Google Scholar]
- 10.Sedore, S. C. et al. Manipulation of P-TEFb control machinery by HIV: recruitment of P-TEFb from the large form by Tat and binding of HEXIM1 to TAR. Nucleic Acids Res.35, 4347–4358 (2007). 10.1093/nar/gkm443 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Van Lint, C., Emiliani, S., Ott, M. & Verdin, E. Transcriptional activation and chromatin remodeling of the HIV-1 promoter in response to histone acetylation. EMBO J.15, 1112–1120 (1996). 10.1002/j.1460-2075.1996.tb00449.x [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Archin, N. M. et al. Interval dosing with the HDAC inhibitor vorinostat effectively reverses HIV latency. J. Clin. Invest.127, 3126–3135 (2017). 10.1172/JCI92684 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Archin, N. M. et al. Administration of vorinostat disrupts HIV-1 latency in patients on antiretroviral therapy. Nature487, 482–485 (2012). 10.1038/nature11286 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Elliott, J. H. et al. Short-term administration of disulfiram for reversal of latent HIV infection: a phase 2 dose-escalation study. Lancet HIV2, e520–e529 (2015). 10.1016/S2352-3018(15)00226-X [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Rasmussen, T. A. et al. Panobinostat, a histone deacetylase inhibitor, for latent-virus reactivation in HIV-infected patients on suppressive antiretroviral therapy: a phase 1/2, single group, clinical trial. Lancet HIV1, e13–e21 (2014). 10.1016/S2352-3018(14)70014-1 [DOI] [PubMed] [Google Scholar]
- 16.Sogaard, O. S. et al. The depsipeptide romidepsin reverses HIV-1 latency in vivo. PLoS Pathog.11, e1005142 (2015). 10.1371/journal.ppat.1005142 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Banga, R. et al. PD-1+ and follicular helper T cells are responsible for persistent HIV-1 transcription in treated aviremic individuals. Nat. Med.22, 754–761 (2016). 10.1038/nm.4113 [DOI] [PubMed] [Google Scholar]
- 18. Banga, R. et al. Blood CXCR3+ CD4 T cells are enriched in inducible replication competent HIV in aviremic antiretroviral therapy-treated individuals. Front. Immunol. 9, 144 (2018). 10.3389/fimmu.2018.00144
- 19. Brenchley, J. M. et al. T-cell subsets that harbor human immunodeficiency virus (HIV) in vivo: implications for HIV pathogenesis. J. Virol. 78, 1160–1168 (2004). 10.1128/JVI.78.3.1160-1168.2004
- 20. Chomont, N. et al. HIV reservoir size and persistence are driven by T cell survival and homeostatic proliferation. Nat. Med. 15, 893–900 (2009). 10.1038/nm.1972
- 21. Douek, D. C. et al. HIV preferentially infects HIV-specific CD4+ T cells. Nature 417, 95–98 (2002). 10.1038/417095a
- 22. Gosselin, A. et al. Peripheral blood CCR4+CCR6+ and CXCR3+CCR6+CD4+ T cells are highly permissive to HIV-1 infection. J. Immunol. 184, 1604–1616 (2010). 10.4049/jimmunol.0903058
- 23. Hiener, B. et al. Identification of genetically intact HIV-1 proviruses in specific CD4+ T cells from effectively treated participants. Cell Rep. 21, 813–822 (2017). 10.1016/j.celrep.2017.09.081
- 24. Lee, G. Q. et al. Clonal expansion of genome-intact HIV-1 in functionally polarized Th1 CD4+ T cells. J. Clin. Invest. 127, 2689–2696 (2017). 10.1172/JCI93289
- 25. Mendoza, P. et al. Antigen-responsive CD4+ T cell clones contribute to the HIV-1 latent reservoir. J. Exp. Med. 217, e20200051 (2020). 10.1084/jem.20200051
- 26. Simonetti, F. R. et al. Antigen-driven clonal selection shapes the persistence of HIV-1-infected CD4+ T cells in vivo. J. Clin. Invest. 131, e145254 (2021). 10.1172/JCI145254
- 27. Yukl, S. A. et al. Differences in HIV burden and immune activation within the gut of HIV-positive patients receiving suppressive antiretroviral therapy. J. Infect. Dis. 202, 1553–1561 (2010). 10.1086/656722
- 28. Collora, J. A. et al. Single-cell multiomics reveals persistence of HIV-1 in expanded cytotoxic T cell clones. Immunity 55, 1013–1031 (2022). 10.1016/j.immuni.2022.03.004
- 29. Weymar, G. H. J. et al. Distinct gene expression by expanded clones of quiescent memory CD4+ T cells harboring intact latent HIV-1 proviruses. Cell Rep. 40, 111311 (2022). 10.1016/j.celrep.2022.111311
- 30. Clark, I. C. et al. Identification of astrocyte regulators by nucleic acid cytometry. Nature 10.1038/s41586-022-05613-0 (2023).
- 31. Ho, Y. C. et al. Replication-competent noninduced proviruses in the latent reservoir increase barrier to HIV-1 cure. Cell 155, 540–551 (2013). 10.1016/j.cell.2013.09.020
- 32. Clark, I. C., Thakur, R. & Abate, A. R. Concentric electrodes improve microfluidic droplet sorting. Lab Chip 18, 710–713 (2018). 10.1039/C7LC01242J
- 33. Herroeder, S. et al. Guanine nucleotide-binding proteins of the G12 family shape immune functions by controlling CD4+ T cell adhesiveness and motility. Immunity 30, 708–720 (2009). 10.1016/j.immuni.2009.02.010
- 34. Rao, S. et al. Host mRNA decay proteins influence HIV-1 replication and viral gene expression in primary monocyte-derived macrophages. Retrovirology 16, 3 (2019). 10.1186/s12977-019-0465-2
- 35. Ding, D. et al. Involvement of histone methyltransferase GLP in HIV-1 latency through catalysis of H3K9 dimethylation. Virology 440, 182–189 (2013). 10.1016/j.virol.2013.02.022
- 36. Wang, J. et al. Retinoblastoma binding protein 4 represses HIV-1 long terminal repeat-mediated transcription by recruiting NR2F1 and histone deacetylase. Acta Biochim. Biophys. Sin. 51, 934–944 (2019). 10.1093/abbs/gmz082
- 37. Cismasiu, V. B. et al. BCL11B is a general transcriptional repressor of the HIV-1 long terminal repeat in T lymphocytes through recruitment of the NuRD complex. Virology 380, 173–181 (2008). 10.1016/j.virol.2008.07.035
- 38. Gao, R. et al. Competition between PAF1 and MLL1/COMPASS confers the opposing function of LEDGF/p75 in HIV latency and proviral reactivation. Sci. Adv. 6, eaaz8411 (2020). 10.1126/sciadv.aaz8411
- 39. Malcolm, T., Kam, J., Pour, P. S. & Sadowski, I. Specific interaction of TFII-I with an upstream element on the HIV-1 LTR regulates induction of latent provirus. FEBS Lett. 582, 3903–3908 (2008). 10.1016/j.febslet.2008.10.032
- 40. Yang, X., Chen, Y. & Gabuzda, D. ERK MAP kinase links cytokine signals to activation of latent HIV-1 infection by stimulating a cooperative interaction of AP-1 and NF-κB. J. Biol. Chem. 274, 27981–27988 (1999). 10.1074/jbc.274.39.27981
- 41. Kino, T., Slobodskaya, O., Pavlakis, G. N. & Chrousos, G. P. Nuclear receptor coactivator p160 proteins enhance the HIV-1 long terminal repeat promoter by bridging promoter-bound factors and the Tat-P-TEFb complex. J. Biol. Chem. 277, 2396–2405 (2002). 10.1074/jbc.M106312200
- 42. Bres, V., Gomes, N., Pickle, L. & Jones, K. A. A human splicing factor, SKIP, associates with P-TEFb and enhances transcription elongation by HIV-1 Tat. Genes Dev. 19, 1211–1226 (2005). 10.1101/gad.1291705
- 43. Gebhardt, A. et al. The alternative cap-binding complex is required for antiviral defense in vivo. PLoS Pathog. 15, e1008155 (2019). 10.1371/journal.ppat.1008155
- 44. Lykke-Andersen, J., Shu, M. D. & Steitz, J. A. Communication of the position of exon-exon junctions to the mRNA surveillance machinery by the protein RNPS1. Science 293, 1836–1839 (2001). 10.1126/science.1062786
- 45. Cobos Jimenez, V. et al. G3BP1 restricts HIV-1 replication in macrophages and T-cells by sequestering viral RNA. Virology 486, 94–104 (2015). 10.1016/j.virol.2015.09.007
- 46. Csosz, E. et al. Analysis of networks of host proteins in the early time points following HIV transduction. BMC Bioinform. 20, 398 (2019). 10.1186/s12859-019-2990-3
- 47. Mei, Y., Hahn, A. A., Hu, S. & Yang, X. The USP19 deubiquitinase regulates the stability of c-IAP1 and c-IAP2. J. Biol. Chem. 286, 35380–35387 (2011). 10.1074/jbc.M111.282020
- 48. Jin, J. et al. LRRFIP2 negatively regulates NLRP3 inflammasome activation in macrophages by promoting Flightless-I-mediated caspase-1 inhibition. Nat. Commun. 4, 2075 (2013). 10.1038/ncomms3075
- 49. Wernimont, S. A. et al. Contact-dependent T cell activation and T cell stopping require talin1. J. Immunol. 187, 6256–6267 (2011). 10.4049/jimmunol.1102028
- 50. Pallikkuth, S. et al. Peripheral T follicular helper cells are the major HIV reservoir within central memory CD4 T cells in peripheral blood from chronically HIV-infected individuals on combination antiretroviral therapy. J. Virol. 90, 2718–2728 (2015). 10.1128/JVI.02883-15
- 51. Cohn, L. B. et al. Clonal CD4+ T cells in the HIV-1 latent reservoir display a distinct gene profile upon reactivation. Nat. Med. 24, 604–609 (2018). 10.1038/s41591-018-0017-7
- 52. Kuniholm, J. et al. Intragenic proviral elements support transcription of defective HIV-1 proviruses. PLoS Pathog. 17, e1009982 (2021). 10.1371/journal.ppat.1009982
- 53. Blazkova, J. et al. Effect of histone deacetylase inhibitors on HIV production in latently infected, resting CD4+ T cells from infected individuals receiving effective antiretroviral therapy. J. Infect. Dis. 206, 765–769 (2012). 10.1093/infdis/jis412
- 54. Falcinelli, S. D. et al. Combined noncanonical NF-kappaB agonism and targeted BET bromodomain inhibition reverse HIV latency ex vivo. J. Clin. Invest. 132, e157281 (2022). 10.1172/JCI157281
- 55. Grau-Exposito, J. et al. Latency reversal agents affect differently the latent reservoir present in distinct CD4+ T subpopulations. PLoS Pathog. 15, e1007991 (2019). 10.1371/journal.ppat.1007991
- 56. Sannier, G. et al. Combined single-cell transcriptional, translational, and genomic profiling reveals HIV-1 reservoir diversity. Cell Rep. 36, 109643 (2021). 10.1016/j.celrep.2021.109643
- 57. Yukl, S. A. et al. HIV latency in isolated patient CD4+ T cells may be due to blocks in HIV transcriptional elongation, completion, and splicing. Sci. Transl. Med. 10, eaap9927 (2018). 10.1126/scitranslmed.aap9927
- 58. Baxter, A. E. et al. Single-cell characterization of viral translation-competent reservoirs in HIV-infected individuals. Cell Host Microbe 20, 368–380 (2016). 10.1016/j.chom.2016.07.015
- 59. Ren, Y. et al. BCL-2 antagonism sensitizes cytotoxic T cell-resistant HIV reservoirs to elimination ex vivo. J. Clin. Invest. 130, 2542–2559 (2020). 10.1172/JCI132374
- 60. Cochrane, C. R. et al. Intact HIV proviruses persist in the brain despite viral suppression with ART. Ann. Neurol. 92, 532–544 (2022). 10.1002/ana.26456
- 61. Heesters, B. A. et al. Follicular dendritic cells retain infectious HIV in cycling endosomes. PLoS Pathog. 11, e1005285 (2015). 10.1371/journal.ppat.1005285
- 62. Pinzone, M. R. et al. Naive infection predicts reservoir diversity and is a formidable hurdle to HIV eradication. JCI Insight 6, e150794 (2021). 10.1172/jci.insight.150794
- 63. Boritz, E. A. et al. Multiple origins of virus persistence during natural control of HIV infection. Cell 166, 1004–1015 (2016). 10.1016/j.cell.2016.06.039
- 64. Jordan, A., Bisgrove, D. & Verdin, E. HIV reproducibly establishes a latent infection after acute infection of T cells in vitro. EMBO J. 22, 1868–1877 (2003). 10.1093/emboj/cdg188
- 65. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013). 10.1038/nmeth.2639
- 66. Hutcheson, J. et al. Combined deficiency of proapoptotic regulators Bim and Fas results in the early onset of systemic autoimmunity. Immunity 28, 206–217 (2008). 10.1016/j.immuni.2007.12.015
- 67. Yan, Z. H., Clark, I. C. & Abate, A. R. Rapid encapsulation of cell and polymer solutions with bubble-triggered droplet generation. Macromol. Chem. Phys. 218, 1600297 (2017).
- 68. Pasternak, A. O. et al. Highly sensitive methods based on seminested real-time reverse transcription-PCR for quantitation of human immunodeficiency virus type 1 unspliced and multiply spliced RNA and proviral DNA. J. Clin. Microbiol. 46, 2206–2211 (2008). 10.1128/JCM.00055-08
- 69. Clark, I. C. & Abate, A. R. Microfluidic bead encapsulation above 20 kHz with triggered drop formation. Lab Chip 18, 3598–3605 (2018). 10.1039/C8LC00514A
- 70. Sigurgeirsson, B., Emanuelsson, O. & Lundeberg, J. Sequencing degraded RNA addressed by 3′ tag counting. PLoS ONE 9, e91851 (2014). 10.1371/journal.pone.0091851
- 71. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017). 10.1038/nmeth.4197
- 72. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010). 10.1093/bioinformatics/btp616
- 73. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014). 10.1186/s13059-014-0550-8
- 74. Kramer, A., Green, J., Pollard, J. Jr & Tugendreich, S. Causal analysis approaches in Ingenuity Pathway Analysis. Bioinformatics 30, 523–530 (2014). 10.1093/bioinformatics/btt703
- 75. Wu, T. et al. clusterProfiler 4.0: a universal enrichment tool for interpreting omics data. Innovation 2, 100141 (2021).
- 76. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008). 10.1186/1471-2105-9-559
- 77. Chen, E. Y. et al. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14, 128 (2013). 10.1186/1471-2105-14-128
- 78. Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016). 10.1093/nar/gkw377
- 79. Wymant, C. et al. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver. Virus Evol. 4, vey007 (2018). 10.1093/ve/vey007
- 80. Mootha, V. K. et al. PGC-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003). 10.1038/ng1180
- 81. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005). 10.1073/pnas.0506580102
Data Availability Statement
Transcriptome sequencing data from human study participants were deposited with controlled access in the database of Genotypes and Phenotypes (dbGaP; phs003095.v1.p1). Transcriptome sequencing data from cell line experiments were deposited in the NCBI Sequence Read Archive (SRA; accessions PRJNA819479 and PRJNA893817). Gene sets M3077 and M3076 analysed in Extended Data Fig. 2 are available online (https://www.gsea-msigdb.org/). Source data are provided with this paper.