Author manuscript; available in PMC: 2025 Nov 21.
Published in final edited form as: Lancet Microbe. 2025 Jul 21;6(9):101122. doi: 10.1016/j.lanmic.2025.101122

Characterisation of a persistent SARS-CoV-2 infection lasting more than 750 days in a person living with HIV: a genomic analysis

Joseline M Velasquez-Reyes 1,2,3, Beau Schaeffer 4, Scott R Curry 5, Victoria Overbeck 6,7, Cole Sher-Jan 8,9, Bradford P Taylor 10, Jacquelyn Turcinovic 11,12,13, Krutika Kuppalli 14,15,*, John H Connor 16,17,18,*, William P Hanage 19,*
PMCID: PMC12633830  NIHMSID: NIHMS2110331  PMID: 40706602

Summary

Background

People who are immunocompromised can develop persistent SARS-CoV-2 infections. Several viral mutations accumulated during the course of such persistent infections have also been observed in prominent variants of concern (VOCs). Here, we characterise persistent infection and viral evolution of SARS-CoV-2 lasting more than 750 days in a person with advanced HIV-1 infection.

Methods

Between March, 2021, and July, 2022, eight clinical specimens were collected from a person living with HIV, neither receiving antiretroviral therapy nor virally suppressed, and presumed to have been initially infected with SARS-CoV-2 in mid-May, 2020. Viral RNA was extracted from each swab and an amplicon-based sequencing approach was used for genomic analysis of SARS-CoV-2. Variable sites were characterised at the consensus and subconsensus levels, and phylogenetic tools were applied to analyse viral evolution. Publicly available SARS-CoV-2 sequences from GenBank were leveraged to contextualise our sequenced samples and identify any potential evidence of transmission.

Findings

Genomes formed a monophyletic cluster in the B.1 lineage. 68 consensus and 67 subconsensus single nucleotide variants were observed over the course of infection. The intrahost clock rate remained similar to that of the interhost rate in contemporaneous community sequences (6·74 × 10−4 [95% credible interval 5·05 × 10−4 to 8·54 × 10−4] substitutions per site per year vs 6·11 × 10−4 [5·54 × 10−5 to 6·66 × 10−4]). Mutations grouped into two distinct subpopulations present throughout infection. 10 non-synonymous mutations in the spike protein gene were at positions in common with those defining the omicron lineage (BA.1 or BA.2), of which nine were present before November, 2021. Nine of 18 substitutions present throughout infection were rare in online databases, suggesting a lack of long transmission chains descending from this individual.

Interpretation

Convergent SARS-CoV-2 evolution, both in and outside the spike protein, observed in this study suggests parallels with the evolutionary process leading to emergence of the omicron VOC. The inferred absence of onward infections might indicate a loss of transmissibility during adaptation to a single host. Our results underscore the importance of appropriate treatment to cure persistent SARS-CoV-2 infections and monitoring them to understand how mutations contribute to viral adaptation.

Funding

National Institute of General Medical Sciences of the National Institutes of Health, Centers for Disease Control and Prevention, the National Institute of Allergy and Infectious Diseases, MassCPR, and Morris Singer Foundation.

Introduction

Since the emergence of SARS-CoV-2 in 2019, a global effort to collect and sequence viral genomes has documented the evolution of the virus and increasingly divergent variants of concern (VOCs). Starting in November, 2021, the omicron (BA.1 and BA.2) VOC spread globally, replacing previously established viral lineages at an unprecedented rate and placing an acute short-term burden on health-care systems. The success of omicron even among populations with high levels of immunity from vaccination and previous infection reflects the combination of an intrinsically highly transmissible virus and further mutations conferring the ability to evade the antibody response elicited by vaccines and previous infections. The exact origins of omicron and other VOCs remain unknown, but multiple factors suggest persistent infections among immunocompromised hosts as a possible source.1–7

Persistent SARS-CoV-2 infections among people who are immunocompromised have been associated with a more rapid accumulation of mutations and evidence of selection.1,8 The distinct selective pressures within an immunocompromised host, along with the absence of population bottlenecks caused by repeat transmission events, present the viral genome with an opportunity to explore evolutionary possibilities that would otherwise remain inaccessible. However, infections in people who are immunocompromised are unusual selective environments; the resulting virus might be able to more effectively bind to angiotensin-converting enzyme 2 (ACE2) to improve infectivity during the ongoing infection,1,8 yet lose the ability to transmit efficiently. Nevertheless, should the virus remain capable of transmission, these adaptations might serve as a source for new VOCs.

At the end of 2023, an estimated 39·9 million people were living with HIV worldwide, of whom 23% are estimated to not be receiving antiretroviral therapy (ART). People living with HIV, especially those without access to—or otherwise not receiving—ART might have moderate-to-severe immunodeficiency. The evolution of SARS-CoV-2 in people living with HIV is recognised to be important and several cases have been documented, yet studies of poorly controlled infection among people living with HIV who are not receiving ART have been scarce, in part because people living with HIV frequently struggle to access adequate health care. The reports that do exist, together with observations of infections in other immunosuppressed hosts, suggest that inadequately treated HIV infections provide an environment conducive to SARS-CoV-2 evolution.5,9,10 Here we present an analysis of SARS-CoV-2 evolution during a single infection lasting more than 750 days in a person living with advanced HIV.

Methods

Brief clinical report

This study was a longitudinal genomic analysis of SARS-CoV-2 isolated from a man aged 41 years with HIV/AIDS diagnosed in 2002, who was not reliably taking ART, had a CD4 count less than 35 cells per μL, and had a viral load of 32 753 copies per mL with a history of cryptococcal meningitis, condyloma acuminatum, and genital herpes simplex virus type 2 for 18 years (see appendix pp 1–3 for clinical details). This man developed cough, headaches, myalgias, and malaise in mid-May 2020 after exposure to a close contact with laboratory-confirmed COVID-19. At the time of symptom onset, he was unable to access health-care services, thus laboratory confirmation of SARS-CoV-2 was not performed until he presented to the emergency room and was hospitalised in September, 2020, with cough, sore throat, weight loss, and dyspnoea. Although he had persistent respiratory symptoms for the 26 months following May, 2020, and required four additional hospitalisations, there was only one recorded instance of oxygen saturation dropping as low as 94%. Despite multiple follow-up attempts over the course of infection, consistent engagement with medical care was never established, ART was not reliably accessed, and no therapeutics for SARS-CoV-2 were prescribed. PCR testing was consistently positive for SARS-CoV-2 until his death 26 months after initially reporting symptoms, from causes not clearly related to infection. 2 days before his death, nasopharyngeal, throat, and mid-turbinate samples were all positive for SARS-CoV-2. This study was approved by and conducted under the oversight of the Institutional Review Board at the Medical University of South Carolina (Charleston, SC, USA; protocol 00108994), and participant consent was waived because the patient died before the study was conducted.

Sampling protocol

Eight clinical samples positive for SARS-CoV-2 were collected from the patient by medical staff at the Medical University of South Carolina and stored at −80°C in viral transport medium. Six viral genomes were sequenced from nasopharyngeal swabs and two from throat swabs. Samples were collected at 312, 401, 522, 630 (nasal and throat), 640, and 776 (nasal and throat) estimated days post-infection (EDPI) based on symptom onset (figure 1).11 No samples were collected from the first positive SARS-CoV-2 test. RNA was extracted and sequenced using an established protocol (see appendix p 10). Patient data were extracted from the Medical University of South Carolina Epic electronic health record (Epic Systems, Verona, WI, USA).

Figure 1: Summary of chronic infection and viral genome lineages.

(A) Timeline of hospitalisations, administered antibody examinations, HIV viral load, and collection of samples. (B) Timeline of collected samples with context of globally circulating SARS-CoV-2 strains. Each sequenced sample also contains information regarding its Pango18 lineage. Ct=cycle threshold. EDPI=estimated days post-infection. Ig=immunoglobulin. *Throat samples collected.

Sequence processing pipeline

Primer trimming was done using fastp,12 read classification with kraken2,13 alignment to Wuhan isolate reference (MN908947.3) using bowtie2,14 variant calling with LoFreq,15 and consensus genome creation with samtools.16 Single nucleotide variants (SNVs) were called using LoFreq.15 SnpEff and SnpSift were used to annotate the resulting variant call format files.17 Lineages were assigned using Pangolin.11

Variable sites

Variable sites were determined at consensus and subconsensus levels according to alternative allele frequency relative to the Wuhan/Hu-1/2019 reference strain (MN908947.3). Only sites with a sequencing depth of 25 reads or more were considered for the analysis. Consensus analyses were conducted using full genomes incorporating variant sites where alternative allele frequencies were more than 0·5. Subconsensus analyses were conducted using variant call format files filtered to SNVs with alternative allele frequency of 0·1 or more. Hierarchical clustering was applied on the frequency of the SNVs using Euclidean distance and visualised as a dendrogram using Python version 3. The ratio of non-synonymous to synonymous mutations (NS/S) was calculated at each timepoint to quantify selective pressure over the course of infection. The specific mutations observed (ie, C to T, A to G, etc) at each timepoint were randomly permuted to produce 1000 simulated genomes with the same mutations but at random positions for each sample to simulate a null model of evolution. From these simulations an average NS/S ratio was calculated for each timepoint. Spike (S) protein SNVs over time at 0·2 or more allele frequency were compared with mutations in S proteins from SARS-CoV-2 VOCs collected from outbreak.info. The SNVs accumulated in the S protein over time at 0·8 allele frequency or greater were visualised using ChimeraX.18 The N protein SNVs were compared with known CD8+ T-cell epitopes to identify any evidence of CD8+ T cell–virus interaction.
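The depth and allele-frequency thresholds described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Snv` record, the `classify` function, and the mutually exclusive labels are hypothetical conveniences (in the text, the subconsensus analyses include every SNV at a frequency of 0·1 or more, consensus sites among them), and the example values are invented for illustration.

```python
from dataclasses import dataclass

# Thresholds taken from the text above.
MIN_DEPTH = 25          # sites need a sequencing depth of 25 reads or more
CONSENSUS_AF = 0.5      # consensus: alternative allele frequency above 0.5
SUBCONSENSUS_AF = 0.1   # subconsensus analyses: frequency of 0.1 or more


@dataclass(frozen=True)
class Snv:
    """Hypothetical record for one variant call (not a LoFreq data type)."""
    position: int
    ref: str
    alt: str
    depth: int
    alt_frequency: float


def classify(snv: Snv) -> str:
    """Label a call as excluded, consensus, subconsensus, or below_threshold.

    Assumption: labels are made mutually exclusive here for readability.
    """
    if snv.depth < MIN_DEPTH:
        return "excluded"
    if snv.alt_frequency > CONSENSUS_AF:
        return "consensus"
    if snv.alt_frequency >= SUBCONSENSUS_AF:
        return "subconsensus"
    return "below_threshold"


if __name__ == "__main__":
    # Invented example calls; positions and values are not patient data.
    calls = [
        Snv(100, "C", "T", depth=400, alt_frequency=0.97),
        Snv(200, "G", "A", depth=120, alt_frequency=0.32),
        Snv(300, "A", "G", depth=12, alt_frequency=0.80),
    ]
    for call in calls:
        print(call.position, classify(call))
```

A frequency of exactly 0·5 is labelled subconsensus in this sketch, because the text defines consensus as strictly more than 0·5.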

Phylogenetics

For phylogenetic analysis, 292 contextual sequences were obtained from GenBank by randomly sampling genomes from the USA between Jan 1, 2020, and Dec 31, 2020, using web-based functionality. These sequences were filtered for complete genomes from human hosts. Multiple nucleotide sequence alignment was performed using MAFFT.19 We estimated the clock rate of SARS-CoV-2 within this infection (see appendix pp 10–11). To determine possible convergent evolution at sites other than those characterising known variants, we also examined the frequency of those substitutions present across the entirety of the infection in unrelated genomes in the Global Initiative on Sharing All Influenza Data (GISAID).20 We additionally explored whether onward transmission could have occurred using Ultrafast Sample placement on Existing tRees (UShER)21 to place our set of eight consensus genomes on a global SARS-CoV-2 phylogeny of genomes from GenBank (update July 2, 2024) to examine any closely related sequences (appendix p 10).
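As a reading aid for clock rates expressed in substitutions per site per year, the following back-of-envelope sketch converts such a rate into substitutions per genome. It is not the estimation procedure used in this study (which is described in the appendix); it only multiplies the point estimates reported in the Summary by the length of the MN908947.3 reference (29 903 nucleotides). The function names are hypothetical.

```python
GENOME_LENGTH = 29_903  # nucleotides in the MN908947.3 reference genome


def substitutions_per_year(rate_per_site_per_year: float,
                           genome_length: int = GENOME_LENGTH) -> float:
    """Convert a per-site clock rate into substitutions per genome per year."""
    return rate_per_site_per_year * genome_length


def expected_substitutions(rate_per_site_per_year: float, days: float,
                           genome_length: int = GENOME_LENGTH) -> float:
    """Expected substitutions accumulated over a number of days at a fixed rate."""
    per_year = substitutions_per_year(rate_per_site_per_year, genome_length)
    return per_year * days / 365.25


if __name__ == "__main__":
    intrahost = substitutions_per_year(6.74e-4)  # point estimate from the Summary
    interhost = substitutions_per_year(6.11e-4)  # point estimate from the Summary
    print(f"intrahost: {intrahost:.1f} substitutions per genome per year")
    print(f"interhost: {interhost:.1f} substitutions per genome per year")
```

Under these assumptions the two point estimates correspond to roughly 20 and 18 substitutions per genome per year. The conversion ignores the credible intervals and any rate variation along the genome.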

Role of the funding source

The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

Results

A patient living with advanced HIV (CD4 <35 cells per μL) developed COVID-19 symptoms in May, 2020, but did not receive a confirmed COVID-19 diagnosis until September, 2020. Following a laboratory-confirmed diagnosis of SARS-CoV-2, the patient was hospitalised on five different occasions before death from causes unrelated to infection (figure 1A). Throughout the SARS-CoV-2 infection, the patient had HIV viral loads ranging from 32 752 to 956 994 copies per mL (figure 1A). At each hospital admission, the patient had a CD4+ cell count below 50 cells per μL and was PCR-positive for SARS-CoV-2. During two hospitalisations, the patient was IgG-negative for SARS-CoV-2 (312 and 630 EDPI).

All eight samples were successfully sequenced to an average depth of over 400 reads (appendix p 4). Pangolin analysis of the consensus genomes of these samples identified all genomes as B.1. The B.1 classification persisted through population sweeps in which the alpha (B.1.1.7), delta (B.1.617.2), and omicron (BA.1 and BA.2) VOCs (figure 1B) outcompeted previously circulating viral lineages (including B.1), suggesting the host experienced a single persistent infection event without reinfection.

Figure 2A shows the number of SNVs at different allele frequency thresholds (0·5, 0·75, 0·90, and 0·99). Across the infection, samples exhibited 25–60 mutations at allele frequency greater than 0·5, 19–58 mutations at allele frequency greater than 0·75, 10–55 mutations at allele frequency greater than 0·90, and 10–40 at allele frequency greater than 0·99. All genomes sequenced in this study formed a single monophyletic clade, characterised by a combination of variant sites found only in SARS-CoV-2 isolates from this patient occurring on the branch separating them from community samples (appendix p 5). There is high support for the split grouping the samples collected 401 and 776 EDPI from the others (figure 2B). The final two samples form a distinct clade with a most recent common ancestor that predates four of the other six, suggesting the presence of multiple lineages that were maintained over at least this period. We also noted clustering of contemporaneous nasal and throat genomes (two pairs), suggesting that the same population was dominant in both sample sites at that point in time. No homoplasic sites were found in the phylogeny, suggesting that recombination with circulating viruses was unlikely over the course of infection. A phylogeny constructed using UShER showed patient genomes formed a monophyletic clade, indicating no evidence for onward transmission among more than 8 million genomes submitted to GISAID between 2020 and 2022 (figure 3). Of 18 mutations we identified as fixed in this infection, nine were rare in the global population of viruses (3% or less of the genomes sampled over the period above; table).

Figure 2: Characterisation of evolution and selection over the course of infection.

(A) Number of SNVs present at each timepoint according to four allele frequency cutoff values (0·50, 0·75, 0·90, 0·99). (B) Maximum clade credibility tree of all patient samples (nasopharyngeal or mid-turbinate and throat) with coloured node posterior support values and tips labelled by EDPI. Branch lengths reflect date of sampling. (C) Ratio of non-synonymous to synonymous mutations over time for only nasal samples. The grey dots represent the expected non-synonymous to synonymous mutation ratios under the null hypothesis of no selective pressure. (D) Posterior probability distributions of estimated molecular clock rates of patient infection (intrahost) compared with contextual sequences (interhost). EDPI=estimated days post-infection. SNV=single nucleotide variant.

Figure 3: Investigating onward transmission.

Focused clade from larger divergence-scaled (substitutions) UShER phylogeny showing that patient samples (red) cluster together at the end of a branch in the context of all SARS-CoV-2 genomes available on GenBank.

UShER=ultrafast sample placement on existing trees.

Table: Mutations present at consensus and prevalence of fixed SNVs

Open reading frame | Nucleotide | Viral protein | Amino acid | Prevalence
Non-coding | C241T | Non-coding | | 98·09%
ORF1AB | C1059T | NSP2 | Thr85Ile | 5·14%
ORF1AB | C3037T | NSP3 | | 99·51%
ORF1AB | GTCTGGTTTT11287G | NSP6 | Ser106delLys (3aa del) | 26·27%
ORF1AB | A12091G | NSP7 | | 0·01%
ORF1AB | C14408T | NSP12 | Pro323Ser | 99·56%
ORF1AB | G18255T | NSP14 | Met72Ile | 0·67%
ORF1AB | G20580T | NSP15 | | 0·08%
S | C21846T | Spike (N terminal) | Thr95Ile | 25·79%
S | G21986A | Spike (N terminal) | Gly142Ser | 0·02%
S | A23403G | Spike (C terminal) | Asp614Gly | 99·73%
S | C24138A | Spike (FP) | Thr859Asn | 0·28%
S | G24410A | Spike (HR1) | Asp950Asn | 55·79%
ORF3a | G25563T | ORF3a | Gln57His | 7·13%
E | C26299T | Envelope | Leu19Phe | 0·01%
ORF8 | C27889T | Non-coding | | 0·24%
Non-coding | TAA28270T | Non-coding | | 0·00%
N | C28866T | Nucleocapsid | Thr198Ile | 0·03%

Mutations present at consensus across the entirety of the infection period, listed by open reading frame, and prevalence of fixed SNVs as a percentage of genomes in the Global Initiative on Sharing All Influenza Data database. One non-coding fixed mutation is unique to this infection while others are extremely rare. SNV=single nucleotide variant.

The non-synonymous to synonymous mutation ratio was not constant over the course of infection but was consistent with positive selection despite the patient’s immunodeficiency, suggesting that immune pressure is not the only selective force driving this phenomenon (figure 2C). For each timepoint, we randomly permuted the profile of variant sites throughout the genome to simulate a null model of evolution. This approach confirmed that the ratio of non-synonymous to synonymous substitutions was higher than would be expected if their positions were selected at random through genetic drift (appendix p 6).
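The permutation null model described above can be sketched as follows. This is a simplified illustration, not the code used in this study: it treats the input as a single in-frame coding sequence (the real genome contains several open reading frames and non-coding regions), it pools counts across simulations instead of averaging per-simulation ratios, and the function names and example sequence are hypothetical.

```python
import random

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def is_synonymous(seq: str, pos: int, alt: str) -> bool:
    """True if replacing seq[pos] with alt leaves the encoded amino acid unchanged.

    Assumption: seq is one reading frame whose length is a multiple of three.
    """
    start = pos - pos % 3
    codon = seq[start:start + 3]
    offset = pos - start
    mutated = codon[:offset] + alt + codon[offset + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


def permuted_counts(seq, mutations, n_sim=1000, seed=0):
    """Place each observed (ref, alt) change at a random matching site.

    Returns pooled (non_synonymous, synonymous) counts over n_sim simulations.
    """
    rng = random.Random(seed)
    ns_total = s_total = 0
    for _ in range(n_sim):
        used = set()  # avoid placing two changes at the same site in one genome
        for ref, alt in mutations:
            candidates = [i for i, base in enumerate(seq)
                          if base == ref and i not in used]
            if not candidates:
                continue
            pos = rng.choice(candidates)
            used.add(pos)
            if is_synonymous(seq, pos, alt):
                s_total += 1
            else:
                ns_total += 1
    return ns_total, s_total


if __name__ == "__main__":
    toy_orf = "ATGGCTCTTGGCAAACGTTGA"  # invented sequence, not viral data
    observed = [("C", "T"), ("G", "A"), ("A", "G")]
    ns, s = permuted_counts(toy_orf, observed, n_sim=1000)
    print("null NS:", ns, "null S:", s)
```

An observed NS/S ratio well above the pooled null ratio would point in the same direction as the comparison reported above, although a faithful reproduction would need the annotated reading frames of the reference genome.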

We estimated the clock rate of SARS-CoV-2 within this infection to be 6·74 × 10−4 (95% credible intervals 5·05 × 10−4 to 8·54 × 10−4) substitutions per site per year compared with 6·11 × 10−4 (5·54 × 10−5 to 6·66 × 10−4) in the 292 contextual samples (figure 2D; appendix p 5). We noted 216 SNVs at various frequencies (appendix p 8) relative to the Wuhan-Hu-1 genome over the course of infection. There were 67 present more than once at subconsensus and 68 SNVs present more than once at consensus level. The 68 SNVs spanned the whole length of the genome, with 24 in ORF1ab, 20 in the S protein-encoding gene, and 18 in proteins encoded 3′ to S protein (figure 4A). Among the mutations found in all genomes from the patient was an NSP6 deletion which is found in multiple VOCs, including omicron.22,23 ORF6 and the S gene had the largest number of changes per codon. 18 SNVs were consistently found in all six nasal genomes at consensus. These SNVs were distributed across the genome (eight in ORF1AB, five in S, and five in other viral proteins), suggesting viral persistence rather than reinfection (table). Outside of these mutations, we note that the other SNVs that rose to consensus did not arise in a linear manner; rather, the genome gained some mutations and lost others inconsistently (figure 4A). To understand the viral populations at a subconsensus and consensus level, we applied hierarchical clustering to our SNVs heatmap creating a dendrogram visualisation using Euclidean distance (figure 4C). The clustering formed three distinct clusters. The orange cluster includes intrahost SNVs from 312, 630, and 640 EDPI. The green cluster consists of intrahost SNVs from 401 and 776 EDPI. Lastly, the blue cluster consists of only intrahost SNVs from 522 EDPI. 
When we coloured each sample by cluster on a timeline (figure 4B), the cluster identification for each genome was not in chronological order, which illustrates fluctuation of the viral population across infection.
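The hierarchical clustering of SNV frequency profiles by Euclidean distance can be sketched in a few lines of standard-library Python. The linkage method is not stated in the text, so average linkage is an assumption here, and the sample names and frequency vectors are invented toy values, not patient data. The function names are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(a, b, vectors):
    """Mean Euclidean distance between all pairs of samples from two clusters."""
    total = sum(math.dist(vectors[i], vectors[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def cluster(vectors, k):
    """Naive agglomerative clustering: merge the closest pair until k clusters remain.

    Assumption: average linkage; the study does not state which linkage was used.
    """
    clusters = [frozenset([name]) for name in vectors]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], vectors))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


if __name__ == "__main__":
    # Each vector holds the allele frequency of three hypothetical SNVs.
    toy = {
        "s1": [0.9, 0.1, 0.0],
        "s2": [0.8, 0.2, 0.0],
        "s3": [0.1, 0.9, 0.1],
        "s4": [0.0, 1.0, 0.0],
        "s5": [0.0, 0.0, 1.0],
    }
    for group in cluster(toy, k=3):
        print(sorted(group))
```

With these toy values the procedure groups s1 with s2 and s3 with s4, and leaves s5 on its own, mirroring how samples with similar SNV frequency profiles fall into the same cluster regardless of collection order.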

Figure 4: SNV analysis including subconsensus SNVs.

(A) Overview of the SNVs present for each genome over the course of infection (312–776 EDPI) compared with the Wuhan-Hu-1 reference. The heatmap scale represents the intrahost SNV frequency in the viral population. The location of non-synonymous SNVs in each gene is indicated below the heatmap as the codon position, followed by the location in the viral genome (eg, NSP2 Thr85Iso•1059 indicates a mutation in the 85th codon of the NSP2 gene, at nucleotide 1059 in the genome, leading to a change from threonine to isoleucine). For non-coding and synonymous SNVs we report only the position in the genome. (B) Timeline of the genomics in the context of similarity created by hierarchical clustering. (C) Dendrogram representing mutations at subconsensus and consensus. EDPI=estimated days post-infection. SNV=single nucleotide variant.

A total of 43 mutations in S were observed (≥0·2 allele frequency; appendix p 8), of which 12 were at positions that define VOCs such as alpha, delta, BA.1, and BA.2 (figure 5B). Of these, ten are at positions fixed in BA.1, seven of which are also present in BA.2, including Ser477Asn and Glu484Lys (Glu484Ala in omicron lineages), associated with changes to ACE2 binding and immune evasion, and His655Tyr and Asn679Lys which have been associated with attenuation and adaptation to upper airway replication.24 We also note that nine of the ten mutations in S that define the BA.1 omicron subvariant, six of which are also found in BA.2, were present in this infection before the emergence of omicron in November, 2021.

Figure 5: Overview of SNVs present in the spike protein genome over the course of infection.

(A) Schematic representation of the full-length spike protein. (B) Summary of mutation site changes similar to or shared with variants of concern such as alpha, delta, omicron BA.1, and omicron BA.2. (C and D) Structural depiction of the mutational changes accumulated over the course of infection at ≥0·8 allele frequency. (C) shows mutations with spike orientated vertically, and (D) shows mutations at bird’s eye view down the core of spike protein. CTD=C-terminal domain. HR1=heptad repeat 1. NTD=N-terminal domain. RBD=receptor binding domain. S1=spike protein subunit 1. S2=spike protein subunit 2. SNV=single nucleotide variant. *Mutational cluster present in the patient at 312 estimated days post-infection and shared with omicron BA.1 (sequences from Dec 27, 2021).

Figure 5C shows the location of the accumulated S protein mutations across infection at 0·8 allele frequency or greater on the prefusion form of the protein. Mutations are located primarily on the surface, with several mutations in the receptor binding domain, N-terminal domain, and furin cleavage sites (see figure 5A for reference). No mutations accumulated in the core of the prefusion structure (figure 5D). Comparing these mutations with those observed in BA.1, it was noted that the tertiary locations of the unique mutations in omicron differed from those in the sequences we analysed. Many BA.1 mutations are found at the interface between subunits 1 and 2 of the S protein, but the SARS-CoV-2 genomes from this infection contained no mutations in this region but rather a cluster of mutations within the receptor binding domain (appendix p 9). Although short-read sequencing is incapable of phasing SNVs at distant sites into the same haplotype, we note that the presence of Asn439Lys was inversely related with that of Glu484Lys at three timepoints (ie, 522, 630, and 640 EDPI), consistent with the continued circulation of multiple viral sublineages within the host which were characterised by one or the other of these mutations. A similar pattern was shown by the 69 and 70 deletion which was present in two different samples during the course of infection (appendix p 9). CD8+ T cells, which remain even during the advanced stages of HIV infection, have been reported to target epitopes clustered in the N protein;25 however none of the intrahost SNVs recorded in these genomes were located in these regions of the gene25 (table).

Discussion

Literature suggests that there is difficulty clearing SARS-CoV-2 infection when treating individuals living with advanced or uncontrolled HIV as a result of T-cell dysfunction, reduction in neutralisation response, and B-cell alterations—immunological features that are not present or reduced in individuals with suppressed HIV.26 As a result, SARS-CoV-2 infections in people living with HIV can become protracted. We note that among the persistent infections that are described in the scientific literature, few have been reported from the early B.1 lineage, and to our knowledge the infection we describe herein is the longest reported to date. We also note the difference in the rate at which mutations reach fixation at the within-host level compared with the population level. SARS-CoV-2 adaptation over more than 750 days of infection in a person with HIV led to the acquisition of S protein mutations associated with the evasion of neutralising antibodies and enhancement of receptor binding, with many present at the first timepoint sampled (312 EDPI). Due to the lack of a detectable humoral immune response to SARS-CoV-2 and low CD4+ levels, these mutational changes are, probably, primarily adaptations to this host, rather than specific and strong selection for immune evasion. Interestingly, we noted an absence of N mutations which might suggest a lack of interaction between CD8+ T cells and the virus.25 Convergent evolution (at S protein and NSP6) between the viral population within this patient and the omicron VOC suggests that an immunocompromised host environment can drive evolution towards genotypes similar to the highly divergent omicron, supporting the hypothesis that new VOCs might appear from persistent infections in people who are immunocompromised and highlighting the importance of characterising SARS-CoV-2 infections among people living with HIV, curing those infections, and assuring access to ART.

S protein adaptations that developed over this course of infection (Thr95Ile, Gly142Val, Ala475Val, and Glu484Lys) were similar to a COVID-19 case in a person living with HIV infected with SARS-CoV-2 for 190 days.27 We also observed deletions present in other VOCs, including one at the 69 and 70 codon positions in the S protein found in some VOCs and not others. Notably, this deletion was characteristic of the alpha variant as well as the BA.1 and BA.5 subvariants of omicron but not the delta VOC or BA.2 subvariants which were respectively dominant in the periods between alpha and BA.1, and between BA.1 and BA.5. The significance of, and reasons for, this alternating pattern are unclear. However, the transient presence of this deletion in the infection described in this paper could suggest a form of negative frequency-dependent selection, in which the rare allele at the position gains a selective advantage that diminishes once it becomes common, and hence the selective benefit switches to the other allele. In ecology and evolution, negative frequency-dependent selection is an important means of maintaining population diversity in both pathogens and hosts.28,29 Further evidence of convergent evolution is a three-nucleotide NSP6 deletion which is associated with an impaired type 1 interferon response.30 This deletion has been reported in all VOCs except delta. Although the full selective consequences of this deletion remain to be established, given the evidence surrounding VOC emergence from long-term infections,31 it might be a signature of within-host adaptation.

The prevalence of many intrahost SNVs at a level greater than 20% but below fixation illustrates the viral swarm that developed over this long-term infection. There is no universally dominant genome except for the last patient sample. Instead, there are many different genomes representing different evolutionary histories explored within the host. The reasons for persistent subpopulations remain unclear but might include tissue tropism. We were unable to investigate possible intermittent co-infections with other pathogens due to the lack of residual samples for their detection. It is technically possible that undocumented changes in immune function led to fluctuating selection pressure, although we have no evidence of a humoral response to the virus nor any record of the patient ever receiving a COVID-19 vaccine, and CD4+ counts were consistently low.

The virus population that developed over this long-term infection appears to have been well adapted for successful replication within this specific host. There was no evidence for superinfection or displacement by alpha, delta, or omicron variants despite these viruses sweeping through the USA over the course of infection. Such adaptation to one individual could have resulted in trade-offs that limited onward transmission. Onward transmission from persistent infections has been inconsistently reported.9,10,32 We found no evidence for any related viruses that might be descendants of this infection among the millions of SARS-CoV-2 genomes publicly available, even though the viral load in nasal samples remained high throughout infection. However, we cannot exclude the possibility of short transmission chains that were not sampled in genomic surveillance.

All sequences from this patient contained a fixed mutation at position TAA-28 270-T that is not found in any other sequence present in GISAID but was present in one other chronic SARS-CoV-2 infection.33 This convergent pattern might suggest that the mutation at position 28 270 produces viruses adapted to success in an individual infection but poor at transmission. Finally, we acknowledge that undescribed patient behaviour could also explain the apparent lack of onward transmission; the patient was relatively isolated in a rural area of South Carolina not served by sewer systems or subject to wastewater surveillance monitoring for similar infections in the community.

The present work has several limitations, the foremost of which is the small number of samples and the uncertainty of the date of infection. These reflect the difficulty in accessing and collecting samples from patients who have few interactions with health-care infrastructure. The extremely long duration of this infection reflects the uncontrolled nature of the patient’s HIV infection. ART has been shown to restore immune function and lead to the clearance of SARS-CoV-2 in similar cases.27,34 In this case, adequate ART was not achieved. Despite enduring infection and immunosuppression the patient did not receive vaccines or antiviral therapies for COVID-19. The SARS-CoV-2 infection ended only at death from causes that were apparently independent of infection. This report emphasises that SARS-CoV-2 can establish a chronic, non-lethal infection in individuals with compromised immune function that leads to considerable viral evolution. Clearing these infections should be a priority for health-care systems.

Research in context.

Evidence before this study

Persistent SARS-CoV-2 infections in immunocompromised individuals lead to substantial intrahost evolution. These often heavily mutated genomes have been hypothesised as sources of future variants of concern (VOCs), but few long-term infections have been longitudinally documented, and few of those have lasted more than 2 years. We searched PubMed for articles published between March, 2020, and Sept 15, 2024, using the keywords “chronic SARS-CoV-2” or “persistent SARS-CoV-2”, combined with “immunocompromised” and “evolution” without language restrictions. This search resulted in 62 relevant papers which characterised the viral evolutionary path and explored specific fixed mutations. These studies described persistent infections ranging from 157 to 457 days, mostly focusing on the rapid accumulation of mutations in the spike protein region. Our study elucidates the evolutionary trajectory of the full SARS-CoV-2 genome in a person living with HIV over a much longer time horizon—776 days. The characteristics of evolution during this infection are important for the large number of people living with inadequately controlled HIV globally and for the evolution of VOCs.

Added value of this study

By analysing viral genomes across a multi-year infection period, we noted fluctuations in selective pressure on the virus and a large, continually changing landscape of subconsensus mutations. We observed the rise and fall of different combinations of mutations at different times of infection, supporting the idea that in long-term SARS-CoV-2 infections, the dominant genome can change over time and multiple viral populations can persist in a single infection. We also noted that spike and NSP6 mutations associated with VOCs, specifically omicron, appeared in this long-term infection months before their emergence in the community following the omicron surge at the end of 2021. The convergent changes we observed offer support for the proposition that long-term infections with early-pandemic viruses have a propensity to develop omicron-like changes both within and outside of the spike protein.

Implications of all the available evidence

The growing body of literature describing intrahost SARS-CoV-2 evolution suggests that persistent infections in immunocompromised people are a plausible source of VOCs, with which they often display convergent evolution. Our analysis shows that these changes might be transiently present in persistent infections, and that mutations previously believed to positively or negatively affect transmission might also be associated with other phenotypes, such as attenuation. The body of evidence surrounding the evolution of these persistent infections provides justification for further efforts to reduce infection risk among vulnerable, immunocompromised populations and for the further study of intrahost SARS-CoV-2 evolution.

Acknowledgments

Research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under award number T32GM100842. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The work was also supported by grants from the Centers for Disease Control and Prevention (NU50CK000629–01-00 paid to JHC and 200–2016-91779 to WPH) and the National Institute of Allergy and Infectious Diseases (R01AI128344; paid to WPH), as well as support from the Morris-Singer Foundation and MassCPR. KK conducted this work while at the Medical University of South Carolina. We thank all data contributors (ie, the authors and their originating laboratories responsible for obtaining the specimens, and their submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based). Molecular graphics and analyses were performed with UCSF ChimeraX, developed by the Resource for Biocomputing, Visualisation, and Informatics at the University of California, San Francisco, CA, USA, with support from the National Institutes of Health (R01-GM129325), the Office of Cyber Infrastructure and Computational Biology, and National Institute of Allergy and Infectious Diseases. We want to thank Manish Sagar for sharing his expertise regarding the immune system status for people living with HIV, and Mohsan Saeed for helpful discussions of NSP6. During the preparation of this work the authors used Grammarly in order to assist with language. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Footnotes

Declaration of interests

WPH has served as a consultant for Shionogi, Merck Vaccines, Pfizer, Biobot Analytics, and Vedanta Biosciences. All other authors declare no competing interests.

Contributor Information

Joseline M Velasquez-Reyes, Program in Bioinformatics, Boston University, Boston, MA, USA; Department of Virology, Immunology, and Microbiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA; National Emerging Infectious Diseases Laboratories, Boston University, Boston, MA, USA.

Beau Schaeffer, Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA.

Scott R Curry, Division of Infectious Diseases, Department of Medicine, Medical University of South Carolina, Charleston, SC, USA.

Victoria Overbeck, Department of Epidemiology, Boston University School of Public Health, Boston, MA, USA; Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA.

Cole Sher-Jan, Department of Virology, Immunology, and Microbiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA; National Emerging Infectious Diseases Laboratories, Boston University, Boston, MA, USA.

Bradford P Taylor, Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA.

Jacquelyn Turcinovic, Program in Bioinformatics, Boston University, Boston, MA, USA; Department of Virology, Immunology, and Microbiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA; National Emerging Infectious Diseases Laboratories, Boston University, Boston, MA, USA.

Krutika Kuppalli, Department of Medicine, University of Texas Southwestern Medical Center, Dallas, TX, USA; O’Donnell School of Public Health, University of Texas Southwestern Medical Center, Dallas, TX, USA.

John H Connor, Program in Bioinformatics, Boston University, Boston, MA, USA; Department of Virology, Immunology, and Microbiology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA; National Emerging Infectious Diseases Laboratories, Boston University, Boston, MA, USA.

William P Hanage, Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T H Chan School of Public Health, Boston, MA, USA.

Data sharing

The Illumina sequencing data for all samples used in the study were submitted to the National Center for Biotechnology Information under BioProject PRJNA1193019. The BioProject will be available upon publication.

References

  • 1. Chaguza C, Hahn AM, Petrone ME, et al. Accelerated SARS-CoV-2 intrahost evolution leading to distinct genotypes during chronic infection. Cell Rep Med 2023; 4: 100943.
  • 2. Avanzato VA, Matson MJ, Seifert SN, et al. Case study: prolonged infectious SARS-CoV-2 shedding from an asymptomatic immunocompromised individual with cancer. Cell 2020; 183: 1901–12.e9.
  • 3. Karim F, Moosa M, Gosnell B, et al. Persistent SARS-CoV-2 infection and intra-host evolution in association with advanced HIV infection. medRxiv 2021; published online June 4. 10.1101/2021.06.03.21258228 (preprint).
  • 4. Baang JH, Smith C, Mirabelli C, et al. Prolonged severe acute respiratory syndrome coronavirus 2 replication in an immunocompromised patient. J Infect Dis 2021; 223: 23–27.
  • 5. Cele S, Karim F, Lustig G, et al. SARS-CoV-2 prolonged infection during advanced HIV disease evolves extensive immune escape. Cell Host Microbe 2022; 30: 154–62.e5.
  • 6. Sepulcri C, Dentone C, Mikulska M, et al. The longest persistence of viable SARS-CoV-2 with recurrence of viremia and relapsing symptomatic COVID-19 in an immunocompromised patient-a case study. Open Forum Infect Dis 2021; 8: ofab217.
  • 7. Weigang S, Fuchs J, Zimmer G, et al. Within-host evolution of SARS-CoV-2 in an immunosuppressed COVID-19 patient as a source of immune escape variants. Nat Commun 2021; 12: 6405.
  • 8. Machkovech HM, Hahn AM, Garonzik Wang J, et al. Persistent SARS-CoV-2 infection: significance and implications. Lancet Infect Dis 2024; 24: e453–62.
  • 9. Álvarez H, Ruiz-Mateos E, Juiz-González PM, et al. SARS-CoV-2 evolution and spike-specific CD4+ T-cell response in persistent COVID-19 with severe HIV immune suppression. Microorganisms 2022; 10: 143.
  • 10. Ko SH, Radecki P, Belinky F, et al. Rapid intra-host diversification and evolution of SARS-CoV-2 in advanced HIV infection. Nat Commun 2024; 15: 7240.
  • 11. O’Toole Á, Scher E, Underwood A, et al. Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol 2021; 7: veab064.
  • 12. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018; 34: i884–90.
  • 13. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol 2019; 20: 257.
  • 14. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods 2012; 9: 357–59.
  • 15. Wilm A, Aw PPK, Bertrand D, et al. LoFreq: a sequence-quality aware, ultra-sensitive variant caller for uncovering cell-population heterogeneity from high-throughput sequencing datasets. Nucleic Acids Res 2012; 40: 11189–201.
  • 16. Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009; 25: 2078–79.
  • 17. Cingolani P, Platts A, Wang LL, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 2012; 6: 80–92.
  • 18. Pettersen EF, Goddard TD, Huang CC, et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci 2021; 30: 70–82.
  • 19. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 2013; 30: 772–80.
  • 20. Khare S, Gurry C, Freitas L, et al. GISAID’s role in pandemic response. China CDC Wkly 2021; 3: 1049–51.
  • 21. Turakhia Y, Thornlow B, Hinrichs AS, et al. Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat Genet 2021; 53: 809–16.
  • 22. Chen DY, Chin CV, Kenney D, et al. Spike and nsp6 are key determinants of SARS-CoV-2 omicron BA.1 attenuation. Nature 2023; 615: 143–50.
  • 23. Feng S, O’Brien A, Chen DY, Saeed M, Baker SC. SARS-CoV-2 nonstructural protein 6 from alpha to omicron: evolution of a transmembrane protein. mBio 2023; 14: e0068823.
  • 24. Escalera A, Laporte M, Turner S, et al. The impact of S2 mutations on omicron SARS-CoV-2 cell surface expression and fusogenicity. Emerg Microbes Infect 2024; 13: 2297553.
  • 25. Choy C, Chen J, Li J, et al. SARS-CoV-2 infection establishes a stable and age-independent CD8+ T cell response against a dominant nucleocapsid epitope using restricted T cell receptors. Nat Commun 2023; 14: 6725.
  • 26. Nkosi T, Chasara C, Papadopoulos AO, et al. Unsuppressed HIV infection impairs T cell responses to SARS-CoV-2 infection and abrogates T cell cross-recognition. Elife 2022; 11: e78374.
  • 27. Cele S, Karim F, Lustig G, et al. SARS-CoV-2 evolved during advanced HIV disease immunosuppression has Beta-like escape of vaccine and Delta infection elicited immunity. medRxiv 2021; published online Dec 7. 10.1101/2021.09.14.21263564 (preprint).
  • 28. Papkou A, Gokhale CS, Traulsen A, Schulenburg H. Host-parasite coevolution: why changing population size matters. Zoology (Jena) 2016; 119: 330–38.
  • 29. Christie MR, McNickle GG. Negative frequency dependent selection unites ecology and evolution. Ecol Evol 2023; 13: e10327.
  • 30. Bills CJ, Xia H, Chen JYC, et al. Mutations in SARS-CoV-2 variant nsp6 enhance type-I interferon antagonism. Emerg Microbes Infect 2023; 12: 2209208.
  • 31. Khandia R, Singhal S, Alqahtani T, et al. Emergence of SARS-CoV-2 omicron (B.1.1.529) variant, salient features, high global health concerns and strategies to counter it amid ongoing COVID-19 pandemic. Environ Res 2022; 209: 112816.
  • 32. Karim F, Riou C, Bernstein M, et al. Clearance of persistent SARS-CoV-2 associates with increased neutralising antibodies in advanced HIV disease post-ART initiation. Nat Commun 2024; 15: 2360.
  • 33. Khatamzas E, Rehn A, Muenchhoff M, et al. Emergence of multiple SARS-CoV-2 mutations in an immunocompromised host. medRxiv 2021; published online Jan 15. 10.1101/2021.01.10.20248871 (preprint).
  • 34. Karim F, Gazy I, Cele S, et al. HIV status alters disease severity and immune cell responses in Beta variant SARS-CoV-2 infection wave. Elife 2021; 10: e67397.
