Abstract
The 2023 monkeypox (mpox) epidemic was caused by a subclade IIb descendant of a monkeypox virus (MPXV) lineage traced back to Nigeria in 1971. Person-to-person transmission appears higher than for clade I or subclade IIa MPXV, possibly caused by genomic changes in subclade IIb MPXV. Key genomic changes could occur in the genome’s low-complexity regions (LCRs), which are challenging to sequence and are often dismissed as uninformative. Here, using a combination of highly sensitive techniques, we determine a high-quality MPXV genome sequence of a representative of the current epidemic with LCRs resolved at unprecedented accuracy. This reveals significant variation in short tandem repeats within LCRs. We demonstrate that LCR entropy in the MPXV genome is significantly higher than that of single-nucleotide polymorphisms (SNPs) and that LCRs are not randomly distributed. In silico analyses indicate that expression, translation, stability, or function of MPXV orthologous poxvirus genes (OPGs), including OPG153, OPG204, and OPG208, could be affected in a manner consistent with the established “genomic accordion” evolutionary strategies of orthopoxviruses. We posit that genomic studies focusing on phenotypic MPXV differences should consider LCR variability.
Subject terms: Pox virus, Viral infection, Molecular evolution
The 2023 monkeypox outbreak was caused by a subclade IIb monkeypox virus (MPXV). Here, using advanced sequencing techniques, the authors identify variations on low-complexity regions of the MPXV genome and describe their potential as evolutionary drivers.
Introduction
Monkeypox virus (MPXV) is a double-stranded DNA virus classified in genus Orthopoxvirus (varidnavirian Nucleocytoviricota: Poxviridae: Chordopoxvirinae) along with other viruses, such as vaccinia virus (VACV) and variola virus (VARV), that can also infect humans1. MPXV causes “monkeypox (mpox)” (World Health Organization [WHO] International Classification of Diseases, Eleventh Revision [ICD-11] code 1E71)2.
First encountered in 1958 in crab-eating macaques imported to Belgium3, MPXV has caused sporadic human disease outbreaks since the 1970s in Eastern, Middle, and Western Africa, totaling approximately 25,000 cases (case fatality rate 1–10%)4, and also sporadic disease outbreaks among wild monkeys and apes5,6. Exposure to MPXV animal reservoirs, in particular rope squirrels and sun squirrels, is a significant risk factor for human infections7.
Since May 2022, multiple European countries have reported a continuously increasing number of MPXV infections and associated disease, including clusters of cases associated with potential superspreading events in Belgium, Spain, and the United Kingdom (UK). As of January 10, 2024, a total of 94,274 cases had been reported in 118 countries/territories/areas in all six WHO regions. While the number of new cases has decreased over time, cases of the disease are still occurring among vaccinated individuals. Therefore, as the duration of the virus’s circulation in humans increases, the risk of emergence of a more transmissible variant capable of causing larger outbreaks escalates.
Phylogenetically, historic MPXV isolates cluster into two clades8, designated I and II9,10. Clade I viruses are considered more virulent and transmissible than clade II viruses8,9,11. The viruses of the 2022 epidemic belong to subclade IIb12–14, a line of descent of MPXV that had been circulating in Nigeria, likely since 197115.
The clinical presentation of mpox caused by MPXV clade I or subclade IIa includes fever, headache, lymphadenopathy, and/or malaise, followed by a characteristic rash that progresses centrifugally from maculopapules via vesicles and pustules to crusts that may occur on the face, body, mucous membranes, palms of the hands, and soles of the feet16. The clinical presentation of subclade IIb infection diverges from classical mpox by having a good prognosis, self-limiting but infectious skin lesions (typically emerging at and restricted to the genital, perineal/perianal, and/or peri-oral areas) before the development of fever, lymphadenopathy, and malaise. Generalized disease usually manifests with a rash that has not been widely observed in the current outbreak. Human-to-human transmission is substantially higher in outbreaks associated with subclade IIb MPXV than those caused by clade I and subclade IIa17–21. The R0 for MPXV IIb among men who have sex with men (MSM) is higher than 1. Transmission may be catalyzed by a decrease in protection associated with the VARV/smallpox vaccination campaign that ended in 198022,23. Furthermore, a change in transmission route may be the cause of the difference in clinical presentation and pathogenesis, as was shown in animal models24.
Orthopoxvirus infections are classified as systemic or localized25. The involved orthopoxvirus and the immune status of the host are determinants of generalized or localized infection. Different mechanisms of virion entry and egress, as well as virus-encoded host restriction factors, also play pivotal roles in determining the clinical manifestation of infection26–30. Localized usually means that signs are restricted to the site of viral entry, which is the most common clinical presentation described in the 2022 mpox outbreak. Changes in the genome of the current MPXV variant, such as gene loss31, may explain both trends.
The MPXV genome is a linear, ≈197-kb-long double-stranded DNA with covalently closed hairpin ends. The genome’s densely packed orthologous poxvirus genes (OPGs)32 are distributed over a central conserved region (“core”) and flanking terminal regions, each of which ends in identical but oppositely oriented ≈6.4-kb-long inverted terminal repetitions (ITRs). Roughly 193 open reading frames (ORFs) encode proteins with ≥60 amino-acid residues. “Housekeeping” proteins involved in MPXV transcription, replication, and virion assembly are encoded by OPGs located in the central conserved region, whereas proteins involved in host range and pathogenesis are mostly encoded by OPGs located in the terminal regions33. Like all orthopoxvirus genomes, the MPXV genome contains numerous tandem repeats in the ITRs as well as nucleotide homopolymers all over the genome33–36. However, other similar structures were observed throughout the MPXV genome in the form of short tandem repeats (STRs). Moreover, initial observations appear to indicate that these STRs (which may consist of dinucleotide, trinucleotide, or more complex palindromic repeats) are localized in areas where more variation is observed, suggesting a crucial role in MPXV biology and evolution.
Orthopoxviruses rapidly acquire higher fitness by massive gene amplification (genome expansion) when encountering severe bottlenecks in vitro. This amplification, akin to gene reduplication in organismal evolution, enables gene copies to accumulate mutations, potentially resulting in protein variants that can overcome the bottlenecks. Subsequent gene copy reduction (genome contraction) offsets the costs associated with increasing genome length, thereby retaining the adaptive mutations37. Orthopoxviruses also rapidly adapt to selective pressures by single-nucleotide insertions (genome expansion) or deletions (genome contractions) within poly-A or poly-T stretches, resulting in easily reversible gene-inactivating or re-activating frameshifts38. These rhythmic genome expansions and contractions are referred to as “genomic accordions” at the gene and base level37. Given the overall conservation of STRs in orthopoxvirus genomes, we hypothesized that their variation could be a third type of genomic accordion and that, overall, variation in these areas of low complexity (which we designate here as low-complexity regions [LCRs]), rather than single-nucleotide polymorphisms (SNPs), could be the key to understanding the unusual epidemiology of 2022 subclade IIb MPXV. We do not know whether this different epidemiology is due intrinsically to the virus, to different host behavior, and/or to different transmission routes, but all should influence the composition of the MPXV genome, as different selective and purifying pressures would necessarily leave marks on the affected viral effectors.
In this study, we provide a comprehensive genomic characterization of LCRs during the mpox outbreak. Our analysis shows that LCRs are non-randomly distributed across the genome and that they display higher entropy than SNPs. Importantly, our findings highlight three specific gene candidates that warrant further investigation in relation to transmissibility and/or adaptation. We therefore propose a focused examination of LCR variability in future MPXV genomic analyses. The distinctive characteristics observed in LCRs emphasize their potential significance in understanding the dynamics of the currently circulating viral clades and suggest promising avenues for targeted research.
Results
De novo assembly of subclade IIb lineage B.1 MPXV genome sequence 353R
Using a template-based mapping approach, shotgun metagenomic short-read-based sequencing of nucleic acids in vesicular lesion swabs from Spanish mpox patients resulted in the determination of 48 MPXV consensus genome sequences with at least 10X read depth. A median of 39,697,742 high-quality reads per swab (maximum = 111,030,976; minimum = 7,780,032) were obtained using a NovaSeq 6000 Illumina sequencer. Although 98.12% of the reads were assigned as being of human origin, a median of 74,085 MPXV reads (maximum = 27,516,891; minimum = 30,854) sufficed to cover >99% of the genome (Supplementary Data 1). Oxford Nanopore MinION Mk1B sequencing of swab 353R generated 410,050 reads, with a median read length of 420 nucleotides and a median read quality of 10.9 Phred.
Read mapping indicated that LCRs of the MPXV genome were mostly unresolved. More importantly, those results were biased by the reference genome used as a scaffold. In general, reference-mapping software tools resolve LCRs by “following” the pattern provided in the scaffold reference genome instead of reporting the actual pattern (Figure S1a). To determine the actual LCR sequence, we explored assembly strategies generally used for resolving eukaryotic genomes, which mostly combine different sequencing technologies. To increase the chances of success, we applied these technologies to an mpox patient sample with a high proportion of high-quality viral reads (swab 353R). The de novo assembly obtained from Illumina NovaSeq (2×150-bp paired-end reads), MiSeq (2×300-bp paired-end reads), and Oxford Nanopore MinION Mk1B sequencing generated 3, 2, and 1 contigs belonging to MPXV, covering 97, 97, and 101% of the MPXV-M5312_HM12_Rivers sequence, respectively (Fig. 1). The Oxford Nanopore MinION Mk1B de novo assembly had a median depth of coverage of 196X.
Characterization and validation of non-randomly distributed LCRs in MPXV genome sequence 353R
We applied a systematic approach for LCR discovery to the MPXV 353R sequence that resulted in the identification of 21 LCRs (13 STRs, 8 homopolymers; Table 1 and Supplementary Data 2). Two pairs of LCRs (1/4 and 10/11) are located in the ITRs and are identical copies in reverse-complementary form.
Table 1.
Name | Location start (a) | Location end (b) | Repeat unit (c) | Pattern (d) | Nearest OPG (e) | Type of LCR (f) | Relative position to the OPG (g) | Distance in bp (h) | Copenhagen notation (i) | Vaccinia virus (VACV) notation (j) | Comments |
---|---|---|---|---|---|---|---|---|---|---|---|
LCR1 | 5369 | 5624 | 16 | [AACTAACTTATGACTT]n | OPG003 (ITR) | STR | Downstream | 72 | Cop-C19L | NA | |
LCR1 | 5369 | 5624 | 16 | [AACTAACTTATGACTT]n | OPG015 (ITR) | STR | Upstream | 35 | CPXV-017 | NA | |
LCR2 | 174,063 | 174,112 | 2 | [AT]n | NA | STR | Downstream | 46 | Cop-B16R | B14R | |
LCR3 | 179,872 | 180,345 | 9 | ATAT [ACATTATAT]n | OPG208 | STR | ATG Start/Promoter | 21 | Cop-K2L | B19R | SPI-1 apoptosis inhibition |
LCR4 | 193,504 | 193,759 | 16 | [AAGTCATAAGTTAGTT]n | OPG003 (ITR) | STR | Downstream | 72 | Cop-C19L | NA | |
LCR4 | 193,504 | 193,759 | 16 | [AAGTCATAAGTTAGTT]n | OPG015 (LITR) | STR | Upstream | 35 | CPXV-017 | NA | |
LCR5 | 133,895 | 133,918 | 1 | [T]n | MPXVgp137 | homopolymer | Upstream | 889 | Cop-A25L | A27L | Fragmented gene area |
LCR6 | 133,980 | 133,989 | 10 | [CAATCTTTCT]n | MPXVgp137 | STR | Upstream | 818 | Cop-A25L | A27L | |
LCR7 | 137,319 | 137,375 | 3 | [ATC]n | OPG153 | STR | Inside ORF | NA | Cop-A28L | A26L | Attachment MVs/laminin |
LCR8 | 147,655 | 147,718 | 5 + 7 | [ATATTTT]n [ATTTT]n [ATATTTT]n [ATTTT]n [ATATTTT]n [ATTTT]n [ATATTTT]n | OPG171 | STR | Upstream | 75 | Cop-A42R | A42R | |
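LCR8 | 147,655 | 147,718 | 5 + 7 | [ATATTTT]n [ATTTT]n [ATATTTT]n [ATTTT]n [ATATTTT]n [ATTTT]n [ATATTTT]n | OPG170 | STR | Upstream | 70 | Cop-A41L | A41L | |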
LCR9 | 151,350 | 151,417 | 9 | [TATGAAG]n [GATATGAT]n [GATATGATG]n [GATATGAT]n | OPG176 | STR | Upstream | 12 | Cop-A46R | A47R | |
LCR10 | 197,830 | 197,842 | 1 | [T]n | OPG001 (ITR) | homopolymer | Downstream | 225 | NA | NA | |
LCR11 | 1286 | 1298 | 1 | [T]n | OPG001 (ITR) | homopolymer | Downstream | 225 | NA | NA | |
LCR12 | 29,326 | 29,364 | 1 | [A]n | OPG044 | homopolymer | Inside ORF | NA | Cop-K7R | B15R | C-terminal position |
LCR13 | 76,896 | 76,904 | 1 | [T]n | OPG097 | homopolymer | Inside ORF | NA | Cop-L3L/L4R | L3L/L4R | |
LCR14 | 81,658 | 81,666 | 1 | [T]n | OPG104 | homopolymer | Inside ORF | NA | Cop-J5L | L5L | Essential for viral replication |
LCR15 | 140,911 | 140,977 | 9 | [ATAACAATT]n [ATAATTGTT]n [ATAATAATT]n [ATAATTGTT]n | OPG159 | STR | Inside ORF | NA | Cop-A31L | A33L | PKR inhibitor candidate? / C-terminal position |
LCR16 | 153,457 | 153,465 | 1 | [A]n | OPG180 | homopolymer | Upstream | 15 | Cop-A50R | A50R | |
LCR17 | 163,979 | 164,003 | 4 | [TAAC]n | OPG188 | STR | Downstream | 82 | Cop-B2R | B4R | |
LCR18 | 166,865 | 166,920 | 7 | [AATAATT]n | OPG190 | STR | Downstream | 15 | Cop-B5R | B6R | |
LCR19 | 170,508 | 170,563 | 6 | [GATACA]n | OPG197 | STR | Inside ORF | NA | Cop-B11R | B11R | Hypothetical protein |
LCR20 | 172,868 | 172,876 | 1 | [T]n | OPG199 | homopolymer | Downstream | 56 | Cop-K2L | SPI-2/B12R | |
LCR21 | 175,299 | 175,357 | 6 | [GATGAA]n | OPG204 | STR | ATG Start/Promoter | NA | Cop-B19R | B16R | Alternative ATG repeat start |
Low-complexity regions (LCRs) are described using nucleotide base-pair coordinates with reference to the high-quality genome (HQG) sequence (ENA Accession #OX044336). Listed are the length of the repeat unit, description of the sequence (with n = number of repeats for this particular genome), identification of the nearest annotated orthologous poxvirus gene (OPG), type of LCR (STR or homopolymer), position of the LCR relative to the nearest gene, and distance of the LCR to the nearest gene. OPG notations follow the standardized nomenclature32; vaccinia virus (VACV) Copenhagen strain and classical VACV gene notations are shown in addition to enable comparisons. NA not applicable.
(a) Nucleotide base coordinate in reference HQG (Genbank #OX044336).
(b) Nucleotide base coordinate in reference HQG (Genbank #OX044336).
(c) Length of the repeat unit (in nucleotides) in the HQG (Genbank #OX044336).
(d) Description of the pattern of the LCR in representative MPXV, in which n is the number of repeats for this particular genome.
(e) Identification according to Senkevich et al. of nearest identified gene; new notation.
(f) Type of LCR: short tandem repeat or homopolymer.
(g) Position of the LCR relative to the nearest gene.
(h) Distance of the LCR to the nearest gene.
(i) Notation of the gene in the VACV Copenhagen strain.
(j) Notation of the gene in the VACV Western Reserve strain.
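The kind of scan that produces entries such as those in Table 1 can be illustrated with a minimal sketch. This is not the discovery pipeline used in the study; the thresholds (`MIN_HOMOPOLYMER`, `MAX_UNIT`, `MIN_COPIES`), the function names, and the toy sequence are assumptions chosen for illustration. Only the homopolymer cutoff (>8 nt) and the longest unit length (16 nt) echo values reported in the tables.

```python
import re

# Assumed thresholds for this sketch (not the study's actual parameters).
MIN_HOMOPOLYMER = 9   # Table 3 counts homopolymers longer than 8 nt
MAX_UNIT = 16         # longest repeat unit listed in Table 1
MIN_COPIES = 3        # hypothetical minimum number of perfect tandem copies

HOMOPOLYMER_RE = re.compile(r"([ACGT])\1{%d,}" % (MIN_HOMOPOLYMER - 1))
STR_RE = re.compile(r"([ACGT]{2,%d}?)\1{%d,}" % (MAX_UNIT, MIN_COPIES - 1))


def find_homopolymers(seq):
    """Return (start, end, base, length) with 0-based start, exclusive end."""
    return [(m.start(), m.end(), m.group(1), m.end() - m.start())
            for m in HOMOPOLYMER_RE.finditer(seq)]


def find_strs(seq):
    """Return (start, end, unit, copies) for perfect tandem repeats."""
    hits = []
    for m in STR_RE.finditer(seq):
        unit = m.group(1)
        if len(set(unit)) == 1:
            continue  # a run of one base is a homopolymer, not an STR
        hits.append((m.start(), m.end(), unit, (m.end() - m.start()) // len(unit)))
    return hits


# Toy sequence: an [AT]25 repeat and a [T]13 homopolymer between short spacers.
toy = "GGC" + "AT" * 25 + "CCG" + "T" * 13 + "GAC"
print(find_strs(toy))          # [(3, 53, 'AT', 25)]
print(find_homopolymers(toy))  # [(56, 69, 'T', 13)]
```

The sketch only detects perfect copies of a single unit, so compound patterns such as LCR8 or LCR9, or repeats with an imperfect terminal copy, would need a more tolerant method.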
In general, LCRs were resolved using the assembly obtained from single-molecule sequencing and further validated using short-read sequencing since most sequences were 13–67 bp long and therefore were covered by reads from each side or flanking region without mismatches (Figure S1b; Supplementary Note 1). All LCRs were validated this way with the exception of LCR pair 1/4 (256 bp) and LCR3 (468 bp), which were only resolved with single-molecule sequencing reads due to their lengths (Table 2).
Table 2.
Name | Repeat Unit | Pattern HQG | Number of repeats HQG | Nearest Gene | Variation | Type of Variation | Entropy threshold > 0.03 | Resolved correctly in RefSeq | Nanopore | MiSeq | NovaSeq | # Supporting Reads MiSeq | # Supporting Reads NovaSeq |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
LCR4 | 16 | TAGTCATAAGTTAGTT [AAGTCATAAGTTAGTT]15 | 16 | OPG003 (ITR) | NR | Length | NA | No (b) | Yes | No | No | NA | NA |
LCR3 | 9 | ATAT [ACATTATAT]52 | 52 | OPG208 | Yes | Length | NA | Yes | Yes | No | No | NA | NA |
LCR1 | 16 | [AACTAACTTATGACTT]15 AACTAACTTATGACTA | 16 | OPG003 (ITR) | NR | Length | NA | No (b) | Yes | No | No | NA | NA |
LCR2 | 2 | [AT]25 | 25 | NA | Yes | Length | 1.66 | No | Yes | Yes | Yes | 768 | 90 |
LCR5 | 1 | [T]24 | 24 | OPG152 | Yes | Length | 1.535 | Yes | No | No | Yes | NA | 112 |
LCR10 | 1 | [T]13 | 13 | OPG001 (ITR) | Yes | Length | 0.63 | No (b) | Yes | No | No | 6561 | 11,945 |
LCR11 | 1 | [T]13 | 13 | OPG001 (ITR) | Yes | Length | 0.627 | No (b) | Yes | Yes | Yes | 6448 | 11,589 |
LCR21 | 6 | [GATGAA]4 GATGA | 4.5 | OPG204 | Yes | Mutation | 0.207 | Yes | Yes | Yes | Yes | 6578 | 6661 |
LCR7 | 3 | [ATC]14 TATGAT [ATC]3 | 19 | OPG153 | Yes | Length | 0.181 | Yes | Yes | Yes | Yes | 4541 | 6607 |
LCR9 | 9 | [TATGAAG]1 [GATATGAT]1 [GATATGATG]5 [GATATGAT]1 | 8 | OPG176 | No | NA | 0 | Yes | Yes | Yes | Yes | 5208 | 5737 |
LCR8 | 5 + 7 | [ATATTTT]1 [ATTTT]1 [ATATTTT]3 [ATTTT]1 [ATATTTT]2 [ATTTT]1 [ATATTTT]1 | 10 | OPG171 | No | NA | 0 | Yes | Yes | Yes | Yes | 6581 | 6790 |
LCR6 | 10 | [CAATCTTTCT]1 | 1 | OPG152 | Yes (a) | NA | 0 | No (a) | Yes | Yes | Yes | 4884 | 12,930 |
LCR20 | 1 | [T]9 | 9 | OPG199 | No | NA | 0 | Yes | Yes | Yes | Yes | 10,106 | 13,315 |
LCR19 | 6 | GATTCA [GATACA]8 GAT | 9.3 | OPG197 | No | NA | 0 | Yes | Yes | Yes | Yes | 4119 | 4685 |
LCR18 | 7 | [AATAATT]3 AATAA | 3 | OPG190 | No | NA | 0 | Yes | Yes | Yes | Yes | 9755 | 11,838 |
LCR17 | 4 | [TAAC]6 T | 6.1 | OPG188 | No | NA | 0 | Yes | Yes | Yes | Yes | 7388 | 9474 |
LCR16 | 1 | [A]9 | 9 | OPG180 | No | NA | 0 | Yes | Yes | Yes | Yes | 10,340 | 16,044 |
LCR15 | 9 | [ATAACAATT]4 [ATAATTGTT]1 [ATAATAATT]1 [ATAATTGTT]1 | 7 | OPG159 | No | NA | 0 | Yes | Yes | Yes | Yes | 7067 | 6569 |
LCR14 | 1 | [T]9 | 9 | OPG104 | No | NA | 0 | Yes | Yes | Yes | Yes | 7819 | 12,521 |
LCR13 | 1 | [T]9 | 9 | OPG097/098 | No | NA | 0 | Yes | Yes | Yes | Yes | 7480 | 12,126 |
LCR12 | 1 | [A]9 | 9 | OPG044 | No | NA | 0 | Yes | Yes | Yes | Yes | 9789 | 13,592 |
Listed are the type and number of supporting reads for each LCR. Definitions of quality: Yes, LCR is found entirely in the assembly in one contig; no, LCR is not assembled with the reported method. All LCRs with entropy levels above 0.15 are shaded in gray. OPG orthologous poxvirus gene.
(a) LCR6 is a 10-bp repeat that was reported early in the outbreak as an insertion (https://virological.org/t/first-german-genome-sequence-of-monkeypox-virus-associated-to-multi-country-outbreak-in-may-2022/812). In our dataset, we have not seen any variation in this area.
(b) LCR pairs 1/4 and 10/11 are located in ITRs. Given that no read covering this area reached a unique area outside of the ITRs, we cannot technically state that we resolved the repeat. Nonetheless, the ITRs should be identical based on the known poxvirid replication mode.
LCR3 contains a complex tandem repeat with the sequence ATAT [ACATTATAT]n with n = 52 in the MPXV 353R genome sequence. No publicly available MPXV genome sequence contains a tandem repeat of similar length (likely due to the limitations of the high-throughput sequencing techniques used for their characterization). However, applying the analysis to 35 publicly available MPXV National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) datasets of single-molecule raw reads allowed us to resolve the LCR3 of some (Fig. 2a). Fifteen datasets revealed supporting long reads that include both LCR3 flanking regions. Interestingly, four subclade IIb lineage B.1 MPXV sequences associated with the 2022 mpox epidemic and available in SRA have 54–62 repeats in LCR3. Their number of repeats (n) separates these sequences from 2018–2019 subclade IIb lineage A sequences that have 12–42 repeats in LCR3, indicating that LCR3 is a region of genomic instability and high variability.
LCR pair 1/4 contains a tandem repeat with the sequence [AACTAACTTATGACTT]n with n = 16 in the MPXV 353R genome sequence. In contrast, LCR pair 1/4 has n = 8 in the sequences of the subclade IIb lineage B.1 MPXV isolate MPXV_USA_2022_MA001 and the lineage A reference isolate MPXV-M5312_HM12_Rivers (Table 3). Inclusion of NCBI SRA datasets into the analysis confirmed the n = 16 value (Fig. 2b). In addition, the analysis of the SRA datasets revealed subclade IIb lineage-specific repeat differences in LCR pair 1/4. Lineage A.1 virus genomes are polymorphic, having 14–19 repeats; lineage A.2 viruses have 23–26 repeats; and lineage A viruses have 32–71 repeats. In contrast, lineage B.1 viruses consistently have 16 repeats. While LCR3 appears to have “increased” in length since the spillover, LCR pair 1/4 appears to be decreasing in length, thus behaving like an “accordion” over time.
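The repeat counts above depend on single-molecule reads that contain both flanking regions of an LCR. A minimal sketch of that idea follows; it is not the study's method. The function name and the flank sequences are hypothetical placeholders, and the matching is exact, so it ignores sequencing errors and imperfect terminal copies that real long reads would contain.

```python
import re


def count_spanning_repeats(read, left_flank, unit, right_flank):
    """Count tandem copies of `unit` in a read that spans both flanks.

    Returns None when the read does not contain left flank, repeat block,
    and right flank contiguously (i.e., the read does not span the LCR).
    """
    pattern = (re.escape(left_flank)
               + "((?:" + re.escape(unit) + ")+)"
               + re.escape(right_flank))
    match = re.search(pattern, read)
    if match is None:
        return None
    return len(match.group(1)) // len(unit)


# Hypothetical flanks; the unit is the LCR pair 1/4 repeat from Table 1.
LEFT, RIGHT = "GGCCTTAAGC", "CGATCCGGTA"
UNIT = "AACTAACTTATGACTT"

spanning_read = "TT" + LEFT + UNIT * 16 + RIGHT + "AA"
truncated_read = "TT" + LEFT + UNIT * 10   # ends inside the repeat

print(count_spanning_repeats(spanning_read, LEFT, UNIT, RIGHT))   # 16
print(count_spanning_repeats(truncated_read, LEFT, UNIT, RIGHT))  # None
```

Returning `None` for reads that end inside the repeat reflects why short reads cannot size LCR pair 1/4 or LCR3: a read that never reaches the second flank only gives a lower bound on n.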
Table 3.
Feature | MPXV-M5312_HM12_Rivers | MPXV_USA_2022_MA001 | 353R |
---|---|---|---|
Genome length | 197,209 | 197,205 | 198,547 |
SNPs (a) | NA | 67 | 69 |
Indels (a) | NA | 10 del, 7 ins | 11 del, 6 ins |
Homopolymeric sites (b) | 408 | 405 | 399 |
Unique SNPs | NA | 0 | 2 |
LCR pair 1/4 | 8 | 8 | 16 |
LCR2 | 22 | 24 | 25 |
LCR3 | 18 | 16 | 52 |
LCR5 | 25 | 28 | 24 |
LCR6 | 2 | 1 | 1 |
LCR7 | 19 | 17.6 | 17.6 |
LCR8 | 10 | 10 | 10 |
LCR9 | 8 | 6 | 6 |
LCR pair 10/11 | 17 | 14 | 13 |
LCR12 | 9 | 9 | 9 |
LCR13 | 9 | 9 | 9 |
LCR14 | 9 | 9 | 9 |
LCR15 | 7 | 7 | 7 |
LCR16 | 9 | 9 | 9 |
LCR17 | 6.1 | 6.1 | 6.1 |
LCR18 | 3.5 | 3.5 | 3.5 |
LCR19 | 9.3 | 9.3 | 9.3 |
LCR20 | 9 | 9 | 9 |
LCR21 | 4.5 | 4.5 | 4.5 |
Low-complexity region (LCR) repetitions for each genome are indicated. Discrepant numbers (n) of LCR repeats are given in bold. indels number of insertions (ins) or deletions (del) of bases; LCR low-complexity region; SNP single-nucleotide polymorphism.
(a) SNPs and indels vs MPXV-M5312_HM12_Rivers.
(b) Homopolymers >8 nt in length.
The subclade IIb lineage B.1 353R and MPXV_USA_2022_MA001 genome sequences have the same 67 SNPs called against the subclade IIb lineage A reference isolate MPXV-M5312_HM12_Rivers genome sequence (Supplementary Data 7). Additionally, the 353R sequence has two additional paired SNPs in the left and right ITRs (5595G→A; 191,615C→T compared with the MPXV-M5312_HM12_Rivers sequence) that result in the introduction of a stop codon in OPG015. We observed this variation in only two other patient samples in our dataset. The 353R and MPXV_USA_2022_MA001 sequences also differ by two insertions or deletions (indels) at positions 133,077 and 173,273, which correspond to differences in LCR5 and LCR2, respectively. As a result of the accurate resolution of the LCRs, the genome of 353R is 1342 bp longer than the MPXV_USA_2022_MA001 sequence and 1338 bp longer than the MPXV-M5312_HM12_Rivers sequence. Most of the variation is due to differences in the length of LCR pair 1/4 and LCR3, along with minor length differences in LCR2, LCR5, and LCR pair 10/11 (Table 3). In general, the number of repeats (n) found with the hybrid assembly approach we used here doubled the length of LCRs.
Based on the higher resolution of the MPXV 353R genome sequence, in particular regarding LCRs, we propose this sequence as the new MPXV high-quality genome (HQG) reference sequence39.
LCRs are not associated with defective genomes
Sequencing MPXV DNA directly from skin lesions has the advantage of avoiding artifacts arising by virus passage in cell culture, but defective virus genomes that are unable to replicate might be overrepresented. Consequently, identified LCRs or SNPs might not be important in MPXV pathogenesis as they could be part of defective virus genomes rather than the genome of replicative virus. To rule out this scenario, we analyzed our genomic data (Supplementary Data 4) to evaluate whether the variability observed could be the result of proliferation of defective MPXV genomes.
Defective MPXV genomes may arise naturally in various ways (e.g., via mutations resulting in stop codons or nucleotide insertions or deletions that interrupt ORFs). We hypothesized that insertions or deletions of entire codons (nucleotide triplets) would be significantly less likely to occur just by chance than insertions or deletions of single nucleotides. Therefore, we re-analyzed all majority and minority alleles to identify these types of changes. No stop codons were detected in any allele in the LCRs inside of genes, enabling us to quickly rule out this scenario. We then focused on variants leading to LCR length variation (e.g., due to insertions or deletions). None of the seven LCRs located inside of genes caused major allele frameshifts. Among minor alleles, only changes in LCR12 (homopolymer, three samples, frequency range of 0.08–0.26) and LCR14 (homopolymer, allele frequency of 0.037) were observed; neither was related to changes in coding regions. Further, we relaxed the inclusion criteria to consider all alleles occurring at a frequency equal to or higher than 0.03 regardless of the depth. This enabled us to perform a statistical comparison among these seven LCRs and others located in non-coding regions of the genome (Figure S6). Among the seven LCRs detected, five, including LCR3, LCR7, and LCR21 (identified as primary candidates driving adaptation), evolved solely as changes in codons (Figure S6). These analyses collectively enabled us to confidently rule out polymerase slippage and defective virions as drivers of the observed variability in LCRs.
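The distinction between whole-codon and frameshifting length changes can be made concrete with a small sketch. The function name and the example copy numbers are illustrative assumptions, not values or code from the study; the only rule encoded is that a length change inside an ORF preserves the reading frame when it is a multiple of three nucleotides.

```python
def indel_effect(unit_length, copies_ref, copies_alt):
    """Classify a change in repeat copy number inside an ORF.

    unit_length: length of the repeat unit in nucleotides
    copies_ref / copies_alt: copy number in the reference and variant allele
    """
    delta_nt = (copies_alt - copies_ref) * unit_length
    if delta_nt == 0:
        return "no length change"
    # A multiple of 3 nt adds or removes whole codons; anything else shifts the frame.
    return "in-frame" if delta_nt % 3 == 0 else "frameshift"


# Illustrative alleles (copy numbers are hypothetical):
print(indel_effect(3, 19, 17))  # trinucleotide STR such as LCR7 -> in-frame
print(indel_effect(1, 9, 10))   # homopolymer gaining one base -> frameshift
print(indel_effect(1, 9, 12))   # homopolymer gaining three bases -> in-frame
```

Under this rule, any copy-number change in a repeat whose unit length is a multiple of three (for example a 3-, 6-, or 9-nt unit) is always in-frame, whereas homopolymers can switch a gene off and on again with single-base changes.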
LCRs are non-randomly distributed in the monkeypox virus (MPXV) genome
We compared the distribution of LCRs between the different major functional protein OPG groups following the classification of Senkevich et al.32. Differences between functional groups were statistically significant (Kruskal–Wallis test, χ2 p-value < 0.001). Pairwise analysis demonstrated that the functional group “Housekeeping genes/Core” (orthopoxvirus genomic central conserved region) includes LCRs at a significantly lower frequency (multiple pairwise-comparison Wilcoxon test) than functional groups “ANK/PRANC” (false discovery rate [FDR]-corrected p-value < 0.0001), “Bcl-2 domain” (FDR-corrected p-value = 0.04), and “Accessory/Other” (FDR-corrected p-value < 0.0001) (Fig. 3a). These analyses indicate that LCRs in orthopoxvirus genomes are non-randomly distributed and that there is a significant purifying selection force against introducing LCRs in central conserved region areas.
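For readers unfamiliar with FDR correction of pairwise tests, the following sketch shows one common procedure, the Benjamini–Hochberg adjustment. The text does not state which FDR method was used, so this choice is an assumption, and the raw p-values below are made up purely for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each adjusted value is the raw p-value scaled by (number of tests / rank), capped so that adjusted values never decrease as raw p-values increase.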
Next, we compared the degree of diversity among the 21 identified LCRs with the observed SNP variability that had been the focus of the field. In the 353R HQG sequence, LCRs 2, 5, 7, 10, 11, and 21 had intra-host genetic diversity, with entropy values that ranged from 0.18 (LCR7) to 1.66 (LCR2), with an average of 0.81 and a standard deviation (SD) of 0.64 among them (Table 2). Only five nucleotide positions (1285; 6412; 88,807; 133,894; and 145,431) had intra-host genetic diversity at the SNP level. The entropy values ranged from 0.17 (position 133,894) to 0.69 (position 6412), with an average of 0.38 and an SD of 0.21 among them. Interestingly, a Student’s t-test revealed a significantly higher level of diversity in LCRs than in SNPs (p-value = 0.021; Fig. 3b).
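The entropy summary above can be reproduced from the values listed in Table 2. The sketch below assumes Shannon entropy with the natural logarithm, which is not stated in the text but is consistent with the maximum SNP value of 0.69 (approximately ln 2, the entropy of two equally frequent alleles). The function name is illustrative.

```python
import math
from statistics import mean, stdev


def shannon_entropy(freqs):
    """Shannon entropy (natural log, assumed) of a list of allele frequencies."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Two equally frequent alleles give ln(2), about 0.69.
print(round(shannon_entropy([0.5, 0.5]), 2))  # 0.69

# Entropy values for the six variable LCRs, copied from Table 2.
lcr_entropy = {"LCR2": 1.66, "LCR5": 1.535, "LCR10": 0.63,
               "LCR11": 0.627, "LCR21": 0.207, "LCR7": 0.181}
values = list(lcr_entropy.values())

print(round(mean(values), 2))   # 0.81
print(round(stdev(values), 2))  # 0.64 (sample standard deviation)
```

The reported SD of 0.64 matches the sample (n − 1) standard deviation; the population formula would give about 0.59.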
Then, we characterized, collected, and compared the allele frequencies for all LCRs from all dataset samples (Supplementary Data 3) applying the filters described above. Our analyses revealed that the average inter-sample Euclidean distances at LCRs ranged from 0.05 (LCR21) to 0.73 (LCR2) (Fig. 3c). We found statistically significant differences between LCRs (Kruskal–Wallis χ2 p-value < 0.001). More specifically, multiple pairwise comparison Wilcoxon test results showed that all LCRs have significantly different levels of inter-sample distances (FDR-corrected p-values < 0.001), except for LCR10 versus LCR11 (FDR-corrected p-value = 0.48) and LCR2 versus LCR5 (FDR-corrected p-value = 0.25) (Fig. 3c). Average distances in SNPs were 0.0018–0.4168. Our randomization tests revealed that all LCRs have a significantly higher level of inter-sample diversity than the SNPs (all FDR-corrected p-values < 0.05) (Fig. 3c). These analyses uncovered that most of the variability in the MPXV genome is located in LCRs. Consequently, we propose that studies focusing on MPXV genomic sequence variation should include LCRs as a possible source of meaningful variation.
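An inter-sample Euclidean distance between allele-frequency profiles can be sketched as follows. The representation of each sample as a mapping from allele (here, repeat count) to frequency, the function names, and the three example profiles are assumptions for illustration; the study's exact computation may differ.

```python
import math
from itertools import combinations


def allele_distance(a, b):
    """Euclidean distance between two allele-frequency profiles.

    Each profile maps an allele (e.g., repeat count) to its frequency;
    alleles missing from a profile are treated as frequency 0.
    """
    alleles = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in alleles))


def mean_pairwise_distance(samples):
    """Average distance over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    return sum(allele_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical profiles for one homopolymer LCR in three samples.
s1 = {13: 1.0}
s2 = {13: 0.5, 14: 0.5}
s3 = {14: 1.0}

print(round(allele_distance(s1, s3), 3))               # 1.414
print(round(mean_pairwise_distance([s1, s2, s3]), 3))  # 0.943
```

With this definition, two samples fixed for different alleles are at the maximum distance of √2, and identical profiles are at distance 0.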
LCRs might be more phylogenetically informative than SNPs for inter-host sequence analysis
Analysis of only two mpox patient samples (353R and 349R) resulted in sufficient sequence coverage information to enable allele frequency comparison in most LCRs (Fig. 4a). Their side-by-side comparison revealed differences in allele frequency in some of them (LCR2, LCR5, and LCR pair 10/11) (Fig. 4b). The sequence coverage achieved with the remaining samples only enabled unequivocal resolution of a subset of LCRs (i.e., those with reads covering both flanking regions: LCRs 2, 5, 7, 8, 9, and pair 10/11). LCR8 and LCR9 were identical across the entire sample set. However, LCR7 and LCR pair 10/11 had considerable intra-host variation, as well as differences in the preponderant allele (LCR pair 10/11) between samples (Fig. 4c).
Phylogenetic (Figure S2a) and haplotype network (Figure S2b) analyses of the sequences determined from the mpox patient samples yielded limited information regarding the outbreak. Most sequences were highly similar and are, therefore, part of the basal ancestral MPXV subclade IIb lineage B.1 node. Some sequences formed supported clusters and grouped as follows (see also Supplementary Data 4):
group 1 (lineage B.1): sequences from patients 395, 399, 441, and Floridian MPXV isolate MPXV_USA_2022_FL002 (GenBank #ON676704);
group 2 (lineage B.1): sequences from patients 347, 352R, 353R, 416, and Spanish MPXV isolate MPXV/ES0001/HUGTiP/2022 (GenBank #ON622718); all share a stop-codon mutation in OPG015;
group 3 (lineage B.1.3): sequence from patient 2369, together with Slovenian MPXV isolate SLO (GenBank #ON609725.2), French isolate MPXV_FR_HCL0001_2022 (GenBank #ON622722), and 38 other sequences worldwide; defined by NBT03_gp174 mutation G190,660A, resulting in an R84K amino-acid residue change;
group 4 (lineage B.1): sequences from patients 417 and 2437;
group 5 (lineage B.1.1): sequences from patients 698, 1300, 2388, 2428, German MPXV isolate MPXV/Germany/2022/RKI01 (GenBank #ON637938.1), and 97 other sequences worldwide; defined by OPG094 (Cop-G9R) mutation G74,360A, resulting in an R194H amino-acid residue change; and
group 6 (lineage B.1): sequences from patients 2309 and 2317.
Only one epidemiological link among group samples was identified; patients 395 and 399 were sexual partners who attended events together in both Portugal and Spain.
In summary, there appears to be limited value in full-genome SNP characterization for transmission analysis.
Conservation and variation in proteins encoded by OPGs and codon usage analysis in OPG LCRs
Analyses showed that LCRs were associated with intra-host and inter-sample variation. Although most LCRs that showed variability in our sample set (pair 1/4, 2, 3, 5, 7, pair 10/11, and 21), three (3, 7, and 21) are located in regions that, considering orthopoxvirus evolutionary history, are associated with virulence or transmission40,41. Noteworthy, three of the 21 highly repetitive areas identified in our intra-host variation analysis (those of LCRs 5, 6, and 7) are located in a defined central conserved region of the orthopoxvirus genome between positions 130,000 and 138,000 (Fig. 1). This region contains OPG152 (Cop-A25L) (which is truncated in the MPXV genome), OPG153 (Cop-A26L) (directly affected by LCR7), and OPG154 (Cop-A27L). LCR7 is the only STR that is located at the center of a functional ORF. In contrast, LCR3 and LCR21 are situated in the promoter/start area, potentially modifying the ORF start site. The repeat area of LCR7 encodes a poly-D homopolymer in a nonstructured region of OPG153 (Fig. 5a). The changes we uncovered result in the insertion of two isoleucyls. This change resembles the primary structure found in clade I viruses. In contrast, pre-2017 African subclade IIa viruses lack such insertions.
Another region of potential functional impact is the area between positions 170,000 and 180,000 that includes LCRs 19, 20, 2, 21, and 3 (Fig. 1). The LCR3 repeat [ACATTATAT]n is located 21 bp upstream of the putative translation start site of OPG208 (Cop-K2L). Importantly, a methionyl codon is located immediately upstream of LCR3. The usage of this start codon would result in the introduction of an Ile-Ile-Tyr repeat. This codon has a medium to low probability of being used as a start codon in the cognate mRNA (T base in position −3), compared to the “strong” Kozak sequence of the downstream putative start codon. Nevertheless, LCR3 remained in-frame in all clade II MPXV samples, indicating the possibility of an alternative translation start site (Fig. 5b). Interestingly, LCR3 is not located in-frame in most clade I viruses. This may be significant because OPG208 is a member of a set of genes most likely responsible for increased virulence of clade I MPXV compared to classical (pre-epidemic) subclade IIa MPXV40. The LCR3 tandem repeat [ACATTATAT]n is present with n = 52, 54, and 62 copies in epidemic subclade IIb lineage B viruses (Fig. 2a), whereas it is n = 7, 37, and 27 in subclade IIa MPXV isolates Sierra Leone (GenBank #AY741551), MPXV-WRAIR7-61 (GenBank #AY603973), and MPXV-COP-58 (GenBank #AY753185) sequences, respectively40, as well as n = 16 for clade I MPXV isolate Zaire-96-I-16 (GenBank #AF380138). Interestingly, all publicly available subclade IIb lineage A single-molecule long-read sequence data imply a repeat n < 40 (Fig. 2a). The LCR3 repeat sequence may have implications beyond its repetitive nature, potentially influencing promoter function. Notably, a codon usage analysis of MPXV 202242 unveiled biases in the usage of different codons. This analysis also suggested an increased adaptation to primates and unique evolutionary processes within MPXV. TAC codons appear to be underrepresented in the MPXV genome42. LCR3 introduces 52 TAC copies.
Intriguingly, when considering the alternative methionine codon upstream of LCR3, the usage of TAC codons is significantly changed compared to its regular utilization in MPXV (Fig. 5c). This altered usage pattern aligns more closely with that observed in humans (Fig. 5c), similar to the minimal adaptation to primates observed by Zhou et al.42. Additionally, the STR downstream of LCR21 introduces a methionyl codon upstream of the putative start codon for OPG204 (Cop-B19R) (Fig. 5d). Preliminary analysis of the Kozak sequence indicates a medium to high probability for translation compared to the putative start codon (Fig. 5d). These observations highlight the nuanced influence of repeat sequences and codon usage on the translational landscape of MPXV, providing intriguing avenues for further investigation.
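The frame dependence of the LCR3 repeat discussed above can be illustrated with a minimal Python sketch. It translates a few copies of the ACATTATAT unit in each of the three forward frames using the standard genetic code and counts TAC codons. The function names are illustrative and are not part of the study's code, and the frame actually used in the viral mRNA depends on flanking sequence that is not reproduced here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_in_frame(seq, frame):
    """Split seq into complete codons starting at offset `frame` (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def translate(seq, frame):
    """Translate the complete codons of seq in the given frame."""
    return "".join(CODON_TABLE[c] for c in codons_in_frame(seq, frame))


unit = "ACATTATAT"   # LCR3 repeat unit from the text
tract = unit * 3     # three copies, for illustration only

for frame in range(3):
    n_tac = codons_in_frame(tract, frame).count("TAC")
    print(frame, translate(tract, frame), "TAC codons:", n_tac)
```

Only one of the three frames yields the Ile-Ile-Tyr pattern, and in that frame every complete repeat copy contributes one TAC codon, which is why the number of repeat copies bears directly on TAC usage.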
The remaining two LCRs (pair 1/4 and pair 10/11) are located downstream of known ORFs; thus, their variation is less likely to be associated with a change in phenotype.
Experimental demonstration of the impact of LCR variability on viral ORF stability, transcription, and/or translation
Three LCRs are located within the putative protein-coding sections of genes OPG153, OPG204, and OPG208. We observed that alterations in OPG204 and OPG208 are found close to the N-terminal area and, thus, hypothesized they could be connected to the regulation of protein synthesis. The LCR repeat in OPG208 gives rise to an Ile-Ile-Tyr repeat (Fig. 5b), and a Met-Lys repeat is created by the variation in the N-terminal domain in OPG204 (Fig. 5d). In contrast, the modification observed in OPG153, a poly-D amino-acid homopolymer (Fig. 5a), was located at the center of the ORF, but its resulting nature resembled features regulating protein degradation in other mammalian systems. Thus, we designed an initial experiment to demonstrate the presence of alternative start codons (ATGs) and N-terminal domains in viral mRNAs of subclade IIb MPXV strains.
Characterization of OPG204 and OPG208 promoters and their transcriptomic data during MPXV infection suggest that LCRs 3 and 21 are transcribed
We utilized open-access RNA-seq data to examine alterations in the transcriptional profile of MPXV during infection43. We used these data to identify the top (e.g., 10 or more abundant transcripts) early and late expressed viral genes (see DESeq2 results in Supplementary Data 8). Our ultimate objective was to pinpoint their conserved promoters. Both types of promoters, as previously characterized for orthopoxviruses44,45, exhibit a high ratio of adenine and thymine bases (Figure S6a). Late promoters encompass the initial start codon of the gene, whereas early promoters are typically located approximately 20 nucleotides upstream of the start codon (Figure S6b). These characteristics are akin to those previously reported for VACV44,45. We successfully identified the promoter sequences of OPG204 and OPG208. Interestingly, we confirmed their location upstream of the LCRs and the implied alternative start codon linked to these regions (Figure S6c and S6d).
To further substantiate the transcription of these LCRs, along with their associated alternative start codons, we assessed the read coverage of the 5’ end (spanning a region that begins 40 nucleotides upstream of the start codon and concludes 10 nucleotides downstream of it) for a collection of early-expressed genes. We anticipated that areas expected to be transcribed (e.g., immediately downstream of promoters) would possess higher read coverage than regions flanking the gene (Fig. 6a–h). The LCRs and alternative start codons exhibited significantly higher read coverage than the non-coding regions (Fig. 6g and Fig. 6h). This finding held true for both OPG204 and OPG208. This additional evidence supported our hypothesis that the LCRs and the ATGs associated with them are transcribed, extending the translated protein.
Discussion
MPXV subclade IIb traces back to a human MPXV infection that likely occurred after spillover from a local animal reservoir in Ihie, Abia State, Nigeria, in 1971. An additional 10 human infections with MPXV of this lineage were detected through 1978, when this lineage seems to have disappeared. However, in 2017, it reemerged in Yenagoa, Bayelsa State, Nigeria15. Since then, hundreds of mpox cases have been reported and MPXV belonging to the subclade IIb lineage has been sampled in several countries. However, there were no secondary cases prior to the 2022 epidemic46–49.
Subclade IIb viruses cause mpox that presents differently than the classical disease caused by clade I and subclade IIa viruses; i.e., subclade IIb infections are associated with higher prevalence among adults than adolescents, are predominant in the MSM community, and are efficiently transmitted human-to-human from localized infectious skin lesions rather than requiring disseminated infection17–21.
Comparative genomics demonstrated obvious relationships between orthopoxvirus genotype and phenotype, driven by selective pressure from hosts31,40,50–55. Consequently, it was expected that increased MPXV genotype IIb human-to-human transmission would go hand in hand with genotypic changes. However, since orthopoxvirus genomes are organized in redundant ways37,38,56,57, genotypic changes were expected to be modulating, rather than radical.
Genomic characterization of the 2022 epidemic has focused on describing its evolutionary history and tracking MPXV introductions into western countries. The 2022 MPXV cluster diverges from predecessor viruses by an average of 50 SNPs. Of these, the majority (n = 24) are non-synonymous mutations with a second minority subset of synonymous mutations (n = 18) and a few intergenic differences (n = 4)58. Genome editing by apolipoprotein B mRNA-editing catalytic polypeptide-like 3 (APOBEC3) and deletion of immunomodulatory genes were also noted59,60. MPXV sublineages mostly represent very small variations, usually characterized by one or two SNP differences to basal phylogenetic nodes14. Four of the determined MPXV genome sequences can be assigned to global lineage B.1.1, one to lineage B.1.3, and the remaining ones to lineage B.1. Due to the limited uncovered relationship between SNPs and epidemiological links, we propose that future MPXV genomic epidemiological analyses include LCRs.
Our analyses located considerable MPXV genomic variability in areas previously considered of poor informative value, i.e., in LCRs. Because LCR entropy is significantly higher than that of SNPs, because LCRs are not randomly distributed in defined coding areas in the genome, and because genomic accordions are a rapid path for orthopoxvirus adaptation during serial passaging37,38, we posit that LCR changes might be associated with MPXV transmissibility differences over time.
Eight LCRs had evident signs of intra-host and inter-sample variation (pair 1/4, 2, 3, 5, 6, 7, pair 10/11, and 21). Five of them (5, 6, 7, 3, and 21) were co-located in two areas of the MPXV genome: base pairs 130,000–135,000 (5, 6, and 7), which are in the central conserved region of the orthopoxvirus genome in which most “housekeeping” genes are located; and base pairs 170,000–180,000 (3 and 21), which are located in the immunomodulatory area (Fig. 1). One of those LCRs is located inside the translated region of gene OPG153, whereas LCRs associated with OPG204 and OPG208 would increase the length of the genes by extending the N-terminal domains with repetitive codons. The OPG153 repeat results in a poly-D amino-acid homopolymer string (Fig. 5a); the LCR repeat in OPG208 results in an Ile-Ile-Tyr repeat (Fig. 5b); and the N-terminal domain variation in OPG204 results in a Met-Lys repeat (Fig. 5d). Thus, to demonstrate the biological relevance of our findings, we analyzed publicly available transcriptome data obtained at different times during the MPXV replication cycle, allowing for the identification of expression of viral transcripts43. We first identified the early and late putative MPXV promoters (Figure S6). We also demonstrated the presence of viral mRNA derived from both OPG204 and OPG208 that included the putative LCR-encoding region, a position of the promoter compatible with expression of the extended viral mRNA, and complete absence of the shorter previously annotated mRNA. These data demonstrate that LCR-containing areas of the genome are of functional importance.
Protein translation rates are in part regulated by the availability of mRNA codon-cognate amino-acylated tRNAs. The non-optimal tyrosyl codons in MPXV LCRs 3 and 21 suggest such translation modulation for OPG208 and OPG204, respectively. The protein encoded by OPG208, Serpin-K2, is a serine protease inhibitor-like protein that functions as an inhibitor of apoptosis in VACV-infected cells61,62 that could prevent VACV proliferation and protect nearby cells63,64. Our analysis points to considering OPG208 as a potential MPXV virulence marker, opposite to what has been stated previously40. The protein encoded by OPG204, B16, is a secreted decoy receptor for interferon type I65,66. We did not observe any repeat number changes in OPG204-associated LCR21, but SNPs in clade I, subclade IIa, and subclade IIb result in alternative translational start sites followed by a lysyl, suggesting these SNPs could have direct effects on OPG204 translation.
The changes observed in OPG153 stand out as they are located inside a region that is under high selective pressure for transmission in a “housekeeping” orthologous poxvirus gene, which is involved in virion attachment and egress32. Thus, we urge that these areas be scrutinized for changes that might affect the MPXV interactome.
The OPG153 expression product (A26) attaches orthopoxvirus particles to laminin67 and regulates their egress41,68–70. OPG153 is unique, as it is the central conserved region gene that has been “lost” the most times during orthopoxvirus evolution32. Inactivation of OPG153 genes by frameshift mutations occurs rapidly in experimental evolution models38, resulting in increased virus replication levels, changes in particle morphogenesis, decreased particle-to-PFU ratios, and pathogenesis modulation38,41. Finally, A26 is the main target of the host antibody response to orthopoxvirus infection71,72. Thus, any OPG153 modulation is likely of significance.
In summary, our in silico comparative genomics and in vitro functional genomics findings expand the concept of genome accordions as a simple and recurrent mechanism of adaptation on a genomic scale in orthopoxvirus evolution. A consequence of this broadening is the recognition that areas of the MPXV genome (LCRs) might be relevant in adaptability of the orthopoxvirus replication cycle. Further characterization of their functional biology, which will need to incorporate “loss-of-function” and genetically modified strains to understand their roles, will likely improve our understanding of current mpox epidemiology and clinical presentation.
Methods
Study population
This study includes confirmed human mpox cases diagnosed from May 18 to July 14, 2022, at the Centro Nacional de Microbiología (CNM), Instituto de Salud Carlos III, Madrid, Spain. The study was performed as part of the public health response to the current mpox epidemic by the Spanish Ministry of Health. Sample information is listed in Supplementary Data 1 and Supplementary Data 5.
The samples used in this work were obtained in the context of the Microbiological Surveillance and Diagnosis Program for the mpox outbreak conducted by the Centro Nacional de Microbiología, Instituto de Salud Carlos III. The study was based on routine testing, did not involve any additional sampling or tests, and used stored RNA extracts, so specific ethical approval was not required for this study. All sequenced viruses were obtained from patients who gave consent for their samples to be analyzed for diagnosis or surveillance purposes.
Study sample processing
Swabs of vesicular lesions from study patients in viral transport media were sent refrigerated to CNM. Nucleic acids were extracted at CNM using either a QIAamp MinElute Virus Spin DNA Kit (#57704; Qiagen, Germantown, MD, USA) or QIAamp Viral RNA Mini Kit (#52904; Qiagen) according to the manufacturer’s recommendations. Inactivation of samples was conducted in a certified class II biological safety cabinet in a biosafety level (BSL) 2 laboratory using BSL-3 best practices with appropriate personal protective equipment.
MPXV laboratory confirmation
MPXV detection by PCR in a sample was considered laboratory confirmation and resulted in inclusion of the swab in the study. A previously described orthopoxvirus-generic real-time PCR (qPCR) was used for screening73. A previously described and validated conventional nested PCR targeting OPG002 (Cop-C22L) (encoding a TNF receptor) was used for confirmation of results74.
MPXV genome sequencing
Sequencing libraries were prepared with a tagmentation-based Illumina DNA Prep kit (#20060060; Illumina, San Diego, CA, USA) and run in a NovaSeq 6000 SP Reagent Kit (#20028312; Illumina) flow cell using 2×150 paired-end sequencing. To improve assembly quality, the library from swab 353R, an unpassaged vesicular fluid from a confirmed case, was also run in a MiSeq Reagent Kit v3 (#MS-102-3003; Illumina) flow cell using 2×300 paired-end sequencing. Additionally, sample 353R was also analyzed by single-molecule methods using nanopore sequencing (Oxford Nanopore Technologies, Oxford, UK). For nanopore sequencing, 210 ng of DNA was extracted from swab 353R and used to prepare a sequence library with a Rapid Sequencing Kit (#SQK-RAD114; Oxford Nanopore Technologies); the library was analyzed in an FLO-MIN106D (#FLO-MIN106D; Oxford Nanopore Technologies) flow cell for 25 h. The process rendered 1.12 Gb of filter-passed bases.
Bioinformatics
De novo assembly and annotation of subclade IIb lineage B.1 MPXV genome sequence 353R
Due to the high yield of MPXV genomic material in a preparatory run, swab 353R was selected as source material for the determination of an MPXV high-quality genome (HQG) sequence. Single-molecule long-sequencing reads were preprocessed using Porechop v0.3.2pre75 with default parameters. Reads were de novo assembled using Flye v2.9-b176876 in single-molecule sequencing raw read mode with default parameters, resulting in one MPXV contig of 198,254 bp. Short 2×150 sequencing reads were mapped with Bowtie2 v2.4.477 against the selected contig, and resulting BAM files were used to correct the assembly using Pilon v1.2478. At this intermediate step, this corrected sequence was used as a reference in the nf-core/viralrecon v2.4.1 pipeline79 for mapping and consensus generation with short-read sequencing. The allele frequency threshold of 0.5 was used for including variant positions in the corrected contig.
Short MiSeq 2×300 and NovaSeq 2×150 sequencing reads were also assembled de novo using the nf-core/viralrecon v2.4.1 pipeline, written in Nextflow80 in collaboration between the nf-core community81 and the Unidad de Bioinformática, Instituto de Salud Carlos III, Madrid, Spain (https://github.com/BU-ISCIII). FASTQ files containing raw reads were quality controlled using FASTQC v0.11.982. Raw reads were trimmed using fastp v0.23.283. The sliding-window quality-filtering approach was performed, scanning the read with a 4-base-wide sliding window and cutting 3’ and 5’ base ends when average quality per base dropped below a Qphred of 20. Reads shorter than 50 bp and reads with more than 10% read quality under Qphred 20 were removed. Host genome reads were removed via kmer-based mapping of the trimmed reads against the human genome reference sequence GRCh38 (https://www.ncbi.nlm.nih.gov/data-hub/genome/GCF_000001405.26/) using Kraken 2 v2.1.284. The remaining non-host reads were assembled using SPADES v3.15.385,86 in rnaviral mode. A fully ordered MPXV genome sequence was generated using ABACAS v1.3.187, based on the MPXV isolate MPXV_USA_2022_MA001 (Nextstrain subclade IIb lineage B.1) sequence (GenBank #ON563414.3)88. The independently obtained de novo assemblies and reference-based consensus genomes obtained from swab 353R were aligned using MAFFT v7.47589 and visually inspected for variation using Jalview v2.11.090.
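The read-trimming criteria described above (a 4-base sliding window with a mean Qphred threshold of 20, a 50-bp minimum length, and at most 10% of bases below Qphred 20) can be sketched in Python as follows. This is a simplified illustration operating on a list of per-base quality scores; it is not fastp's implementation, whose exact windowing behavior may differ, and the function names are hypothetical.

```python
def trim_ends(quals, window=4, min_mean_q=20):
    """Return (start, end) slice bounds kept after trimming both read ends.

    Bases are dropped one at a time from the 5' end, then from the 3' end,
    while the mean quality of the window at that end is below min_mean_q.
    """
    start, end = 0, len(quals)
    while end - start >= window and sum(quals[start:start + window]) / window < min_mean_q:
        start += 1
    while end - start >= window and sum(quals[end - window:end]) / window < min_mean_q:
        end -= 1
    return start, end


def passes_filters(quals, min_len=50, max_low_frac=0.10, low_q=20):
    """Keep reads of at least min_len bases with at most max_low_frac low-quality bases."""
    if len(quals) < min_len:
        return False
    low = sum(q < low_q for q in quals)
    return low / len(quals) <= max_low_frac


# Toy read: poor-quality bases at both ends, good quality in the middle.
quals = [5, 5, 5, 5] + [30] * 60 + [5, 5]
start, end = trim_ends(quals)
kept = quals[start:end]
print(start, end, passes_filters(kept))
```

In this simplified version the trimmed read may retain a single low-quality base at each end, because trimming stops as soon as the end window's mean reaches the threshold.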
Systematic identification of low-complexity regions in orthopoxvirus genomes
Detection of STRs in the HQG sequence and other orthopoxvirus genomes was performed with Tandem repeats finder91, using default parameters. Briefly, the algorithm works without the need to specify either the pattern or its length. Tandem repeats are identified considering percent identity and frequency of insertion (ins) or deletion (del) of bases (indels) between adjacent pattern copies, using statistically based recognition criteria. Since Tandem repeats finder does not detect single-nucleotide repeats, we developed an R script to systematically identify homopolymers of at least 9 nucleotide residues in all available orthopoxvirus genome sequences. STRs and homopolymers were annotated as LCRs.
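The homopolymer search described above was implemented by the authors in R; an equivalent minimal sketch in Python, assuming an uppercase or lowercase A/C/G/T sequence string and a hypothetical function name, could look like this:

```python
import re


def find_homopolymers(seq, min_len=9):
    """Return (start, end, base, length) for each single-nucleotide run.

    Coordinates are 1-based and inclusive. Runs shorter than min_len are ignored.
    """
    # One captured base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start() + 1, m.end(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


# Toy sequence: one run of nine A residues (reported) and one run of eight T residues (ignored).
example = "ACGT" + "A" * 9 + "CG" + "T" * 8 + "G"
print(find_homopolymers(example))
```

Ambiguous bases such as N are deliberately excluded from the pattern in this sketch, so runs of N are not reported as homopolymers.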
Curation of LCRs in the MPXV high-quality virus genome sequence
We curated LCRs in the HQG sequence using a modified version of STRsearch92. Once provided with identifying flanking regions, STRsearch performed a profile analysis of STRs in massively parallel sequencing data. To ensure high-quality characterization of LCR alleles, we modified the script (https://github.com/BU-ISCIII/MPXstreveal) to complement reverse reads that map against the reverse genome strand according to their BAM flags. In addition, output was modified to add information later accessed by a custom Python script to select only reads containing both LCR flanking regions. All LCRs in the HQG sequence were manually validated using STRsearch results and de novo assemblies obtained from all sequencing approaches. When an LCR was only resolved by single-molecule long-sequencing technologies (LCR pair 1/4 and LCR3), we also analyzed publicly available data by downloading all single-molecule long-sequencing data from NCBI SRA (https://www.ncbi.nlm.nih.gov/sra) as of August 10, 2022, and analyzed the data according to Supplementary Note 1 found in Supplementary Information file.
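The read-selection step described above, in which only reads containing both LCR flanking regions are kept, can be sketched as follows. This is a minimal illustration using exact string matching, not the modified STRsearch code or the authors' custom script; the flank sequences and function names are hypothetical.

```python
from collections import Counter


def repeat_allele(read, left_flank, right_flank, unit):
    """Return the repeat copy number if the read spans both flanks, else None.

    Assumes exact matches to both flanks and an uninterrupted repeat tract.
    """
    i = read.find(left_flank)
    if i == -1:
        return None
    tract_start = i + len(left_flank)
    j = read.find(right_flank, tract_start)
    if j == -1:
        return None
    tract = read[tract_start:j]
    n, remainder = divmod(len(tract), len(unit))
    if remainder or tract != unit * n:
        return None
    return n


def allele_counts(reads, left_flank, right_flank, unit):
    """Count reads supporting each repeat copy number."""
    alleles = (repeat_allele(r, left_flank, right_flank, unit) for r in reads)
    return Counter(a for a in alleles if a is not None)


unit = "ACATTATAT"
reads = [
    "GGCC" + unit * 3 + "TTGG",        # spans both flanks, 3 copies
    "AAGGCC" + unit * 2 + "TTGGAA",    # spans both flanks, 2 copies
    "GGCC" + unit * 3,                 # right flank missing: discarded
    unit * 3 + "TTGG",                 # left flank missing: discarded
]
print(allele_counts(reads, "GGCC", "TTGG", unit))
```

Requiring both flanks means that reads ending inside the repeat cannot be mistaken for shorter alleles, which is why long repeats such as LCR3 may only be resolved by reads long enough to span the whole tract.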
Final MPXV high-quality virus genome sequence assembly
The consensus genome constructed with the nf-core/viralrecon v2.4.1 pipeline using the corrected de novo contig as stated above, along with the resulting curated and validated consensus LCRs, were used to build the final HQG reference sequence using a custom Python script. The resulting HQG is available from the European Nucleotide Archive (#OX044336.2).
Generation of MPXV high-quality virus genome reference-based consensus sequence for all other samples
For the remaining specimens, sequencing reads were analyzed for MPXV genome sequence determination using the nf-core/viralrecon v2.4.1 pipeline. Trimmed reads were mapped with Bowtie2 v2.4.4 against the HQG sequence and the sequence of subclade IIb lineage A MPXV isolate M5312_HM12_Rivers (GenBank #MT903340.1)93. Picard v2.26.1094 and SAMtools v1.1495 were used to generate MPXV genome mapping statistics. iVar v1.3.196, which calls for low-frequency and high-frequency variants, was used for variant calling. Variants with an allele frequency higher than 75% were kept to be included in the consensus genome sequence. BCFtools v1.1497 was used to obtain the MPXV genome sequence consensus with filtered variants and masked genomic regions with coverage values lower than 10X. All variants, included or not, in the consensus genome sequence, were annotated using SnpEff v5.0e98 and SnpSift v4.399. Final summary reports were created using MultiQC v1.11100. Consensus genome sequences were analyzed with Nextclade v2.4.1101 using the “MPXV (All clades)” dataset (timestamp 2022-08-19T12:00:00Z). Raw reads and consensus genomes are available from the European Nucleotide Archive (#ERS12168855–ERS12168865, #ERS12168867, #ERS12168868, and #ERS13490510–ERS13490543).
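The consensus rules stated above (variants with an allele frequency above 75% are included, and positions with coverage below 10X are masked) can be illustrated with a short Python sketch. It handles single-base substitutions only and is not a substitute for the iVar and BCFtools steps of the pipeline, which also handle insertions and deletions; the input structures and function name are assumptions made for illustration.

```python
def build_consensus(reference, variants, depth, min_af=0.75, min_depth=10):
    """Apply high-frequency substitutions and mask low-coverage positions.

    reference: reference sequence string
    variants:  iterable of (position, alt_base, allele_frequency), 1-based positions
    depth:     per-base coverage, one value per reference position
    """
    seq = list(reference)
    for pos, alt, af in variants:
        if af > min_af:
            seq[pos - 1] = alt
    for i, d in enumerate(depth):
        if d < min_depth:
            seq[i] = "N"
    return "".join(seq)


reference = "ACGTACGT"
variants = [(2, "T", 0.90), (5, "G", 0.60)]   # only the first exceeds 75%
depth = [20, 20, 20, 5, 20, 20, 20, 20]       # position 4 is below 10X
print(build_consensus(reference, variants, depth))
```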
Intra-host and inter-host allele frequency analyses
Intra-host genetic entropy (defined as -sum(Xi*log(Xi)), in which Xi denotes each of the allele frequencies in a position) was calculated according to the SNP frequencies of each position along the genome using nf-core/viralrecon v2.4.1 pipeline results. Similarly, genetic entropy for each LCR was calculated considering the frequencies of repeat lengths.
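The entropy definition above can be written as a short Python function. The source does not state the logarithm base, so the natural logarithm is assumed here; the function names and the example counts are illustrative only.

```python
import math


def genetic_entropy(frequencies):
    """Compute -sum(Xi * log(Xi)) over allele frequencies, skipping zero frequencies."""
    return -sum(x * math.log(x) for x in frequencies if x > 0)


def frequencies_from_counts(counts):
    """Convert read counts per allele (or per repeat length) into frequencies."""
    total = sum(counts.values())
    return [c / total for c in counts.values()]


# SNP position with a 90%/10% allele split.
print(genetic_entropy([0.9, 0.1]))

# LCR with reads supporting three different repeat lengths (hypothetical counts).
lcr_counts = {52: 60, 53: 30, 54: 10}
print(genetic_entropy(frequencies_from_counts(lcr_counts)))
```

A position with a single allele has zero entropy, and entropy increases as reads are spread more evenly over more alleles, which is why multi-allelic repeat-length distributions can score higher than biallelic SNPs.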
LCR intra-host and inter-host variations in the sample set were analyzed using a modified version of STRsearch. As a quality check for this analysis, STRsearch results (Supplementary Data 5) were filtered, keeping alleles with at least 10 reads spanning the region and allele frequency above 0.03. Quality control and allele frequency graphs were created using a customized R script.
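The quality filter described above can be sketched as follows. The text does not specify whether the 10-read minimum applies to the total number of spanning reads or to each allele; this sketch assumes the former, and the function name and input format are hypothetical.

```python
def filter_alleles(allele_reads, min_depth=10, min_freq=0.03):
    """Return {allele: frequency} for alleles passing the quality filter.

    allele_reads maps each allele (e.g., repeat copy number) to the number of
    spanning reads supporting it. Assumption: min_depth applies to the total
    number of spanning reads for the LCR.
    """
    depth = sum(allele_reads.values())
    if depth < min_depth:
        return {}
    return {
        allele: count / depth
        for allele, count in allele_reads.items()
        if count / depth > min_freq
    }


print(filter_alleles({7: 90, 8: 8, 9: 2}))   # the 2% allele is removed
print(filter_alleles({7: 4, 8: 1}))          # too few spanning reads
```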
Pairwise genetic distances between samples were calculated as Euclidean distances (defined as ||X − Y|| = sqrt(sum((xi − yi)^2)), in which xi and yi are the allele frequencies of samples x and y at a given position, respectively), thus accounting for the major and minor alleles at each analyzed position. Distances were calculated individually for each variable LCR (STRs 2, 5, 7, 10, 11, and 21) and for each of all 5422 SNPs showing inter-sample variability (compared to MPXV-M5312_HM12_Rivers). The distributions of inter-sample distances were compared between LCRs using a Kruskal–Wallis test (χ2 p-values) followed by multiple pairwise comparisons between groups (Wilcoxon test), with p-values subjected to the false discovery rate (FDR) correction. A randomization test was used to test whether inter-sample variability in LCRs is higher than that in SNPs: first, the average Euclidean distance for each LCR and each SNP position was calculated; then, the average value of each LCR was compared to a random sample of 1000 values from the distribution of mean distances from the SNPs along the genome. The p-value was calculated from the percentage of times that the mean of the LCR was higher than the randomly taken values from the SNPs.
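The distance calculation and randomization test described above can be sketched in Python. Two details are assumptions not stated in the text: the random sample of SNP mean distances is drawn with replacement, and the p-value is taken as the fraction of draws that are at least as large as the LCR mean (i.e., one minus the fraction of times the LCR mean is higher). Function names are illustrative.

```python
import math
import random


def euclidean_distance(x, y):
    """Euclidean distance between two samples' allele-frequency profiles.

    x and y map alleles to frequencies at one position; missing alleles count as 0.
    """
    alleles = set(x) | set(y)
    return math.sqrt(sum((x.get(a, 0.0) - y.get(a, 0.0)) ** 2 for a in alleles))


def randomization_p_value(lcr_mean, snp_means, n_draws=1000, seed=0):
    """Fraction of random SNP mean distances that are >= the LCR mean distance."""
    rng = random.Random(seed)
    draws = [rng.choice(snp_means) for _ in range(n_draws)]
    return sum(d >= lcr_mean for d in draws) / n_draws


# Two samples at one LCR: one fixed for 7 copies, one split between 7 and 8.
print(euclidean_distance({7: 1.0}, {7: 0.5, 8: 0.5}))

# Hypothetical mean distances for SNP positions, compared with one LCR mean.
snp_means = [0.0, 0.01, 0.02, 0.05]
print(randomization_p_value(0.4, snp_means))
```

Because the distance uses all allele frequencies rather than only the consensus allele, two samples with the same major allele but different minor-allele frequencies still have a nonzero distance.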
Phylogenetic analysis of the MPXV central conserved region
Variant calling and SNP matrix generation were performed using Snippy v4.4.5102, including sequence samples and representative MPXV genome sequences downloaded from GenBank (Supplementary Data 5). The SNP matrix with both invariant and variant sites was used for phylogenetic analysis using IQ-Tree 2 v. 2.1.4-beta103 via predicted model K3Pu+F+I and 1000 bootstrap replicates. A phylogenetic tree was visualized and annotated using iTOL v6.5.8104. The SNP matrix was also used for generating the haplotype network using PopArt v1.7105.
Selected MPXV ORF analysis
Representative orthopoxvirus genomes32 were downloaded from GenBank together with the consensus genome sequences from the specimens analyzed in this study (Supplementary Data 5). MPXV genomes were assigned to clades and lineages according to the most recent Nextstrain nomenclature recommendations14 using Nextclade v2.4.1. Annotations from the RefSeq #NC_063383.1 (subclade IIb lineage A MPXV virus isolate MPXV-M5312_HM12_Rivers) GFF file were transferred to all FASTA genome sequences using Liftoff v1.6.3106. OPG153 was extracted using AGAT v0.9.1107 and multi-FASTA files were generated for each group and gene. OPG204 and OPG208 alternative annotation start site ORFs were re-annotated in Geneious Prime (Biomatters, San Diego, CA, USA), and extracted as alignments. We used MUSCLE v3.8.1551 for aligning each multi-FASTA file and Jalview v2.11.0 for inspecting and editing the alignments. Finally, MetaLogo v1.1.2108 was used for creating and aligning the sequence logos for each orthopoxvirus group of the OPG153 LCR7, OPG204 LCR21, and OPG208 LCR3.
Comparison of LCR frequencies in protein functional groups
The potential biological impact of LCRs was evaluated by mapping the frequency and location of STRs and homopolymers in the orthopoxvirus genome and considering the biological function of the affected genes. The frequency of inclusion of LCRs between distinct functional groups of genes was compared as previously described32. Orthopoxvirus genomes (n = 231, Akhmeta virus [AKMV]: n = 6 sequences; alaskapox virus [AKPV]: n = 1; cowpox virus [CPXV]: n = 82; ectromelia virus [ECTV]: n = 5; MPXV: n = 62; VACV: n = 18; VARV: n = 57) include 216 functionally annotated OPGs classified in 6 categories (“Housekeeping genes/Core”, ANK/PRANC family, Bcl-2 domain family, BTB/Kelch domain family, PIE family, and “Accessory/Other” [e.g., virus–host interacting genes]). The frequency was calculated after normalizing the number of LCRs registered with the sample size of the OPG alignment in the categories described above. Statistical analysis of the significance of differences was performed by applying a Kruskal–Wallis test (χ2 p-values) followed by a non-parametric multiple pairwise comparison between groups (Wilcoxon test), with p-values subjected to FDR correction.
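The FDR correction applied to the pairwise p-values can be illustrated with a short Python sketch. The text does not name the specific FDR procedure; the Benjamini–Hochberg step-up method is assumed here because it is the most commonly used one, and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise group comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```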
Viral Transcriptomics
Data selection
RNA-seq samples were retrieved from NCBI SRA (ncbi.nlm.nih.gov/sra) under accession ERP141806. The RNA-seq data are from MPXV isolated from skin lesion samples from a patient; samples had been cultured in CV-1 cells for 1–24 h post-infection, 3 replicates each. The original samples were subjected to total RNA isolation, library preparation, poly(A)+ enrichment, and RNA sequencing with ONT MinION. The BAM files retrieved from SRA were preprocessed files in which reads had been aligned to MPXV ON563414.3 and human GCF_015252025.143.
DEG analyses
Default parameters were set in htseq-count109, which was used to obtain transcript counts for each MPXV gene (ON563414.3 reference) for every sample. Differential gene-expression analyses were then performed with DESeq2 (R package)110 to differentiate early (those overexpressed at early time points) versus late (most expressed at late time points) viral genes. A linear-regression analysis considering the variables “replicate” and “timepoint” was performed.
Definition of promoters
For the top 10 early and late genes (defined as those genes with padj <0.05 that had the largest absolute log2 fold change values), we retrieved the sequence representing the 100 nucleotides upstream of their canonical start codon (ATG), considering the coordinates of the reference genome ON563414.3. Early and late genes were then aligned separately with the Muscle alignment tool111 from MEGA11112. VACV early and late genes have unique conserved TATA boxes44,45,113. We used these reports as templates to define the likely promoters in MPXV.
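The gene selection and upstream-sequence retrieval described above can be sketched in Python. The sketch assumes DESeq2 results exported as records with the standard `padj` and `log2FoldChange` columns and a genome held as a plain string; the function names, the toy genome, and the strand handling are illustrative assumptions rather than the authors' code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def top_genes(results, n=10, alpha=0.05):
    """Genes with padj < alpha, ranked by largest absolute log2 fold change."""
    significant = [r for r in results if r["padj"] < alpha]
    ranked = sorted(significant, key=lambda r: abs(r["log2FoldChange"]), reverse=True)
    return ranked[:n]


def upstream_sequence(genome, start, strand, length=100):
    """Return the `length` bases upstream of a start codon, in gene orientation.

    start is the 1-based genome coordinate of the first base of the start codon
    (for minus-strand genes, the highest coordinate of the codon).
    """
    if strand == "+":
        return genome[max(0, start - 1 - length):start - 1]
    region = genome[start:start + length]
    return region.translate(COMPLEMENT)[::-1]


results = [
    {"gene": "geneA", "padj": 0.001, "log2FoldChange": -3.2},
    {"gene": "geneB", "padj": 0.200, "log2FoldChange": 5.0},   # not significant
    {"gene": "geneC", "padj": 0.010, "log2FoldChange": 1.5},
]
print([r["gene"] for r in top_genes(results, n=2)])

genome = "TTTTTCCCCCATGAAA"   # toy genome; ATG starts at position 11
print(upstream_sequence(genome, 11, "+", length=5))
```

For genes encoded on the reverse strand, the upstream region lies at higher genome coordinates and must be reverse-complemented before alignment so that all promoters are compared in the same orientation.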
After defining the most likely consensus early and late promoters from the general viral transcripts, we identified the specific promoters of OPG204 (which was among the top upregulated genes at 1 h post-infection) and of OPG208 (overexpressed at neither 1 h nor 24 h post-infection but expressed at the earliest timepoint).
Read depth analysis
We later analyzed the region spanning 40 bases before the start of translation to the first 10 nucleotides from the analyzed coding region of all top early genes using SAMtools95 to assess the read depth at the 5’ end, and later we compared them to OPG204 and OPG208, which were both defined as early genes. The idea behind this analysis is that, given poly(A)+ enrichment, regions that are transcribed (e.g., containing a coding region and untranslated 5’ region) should be covered at significantly higher depth than non-transcribed areas (noise).
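The 5’-end window used for the read-depth comparison can be made explicit with a small Python sketch. It assumes a per-base depth list for the reference (for example, parsed from depth output generated with SAMtools) and 1-based start-codon coordinates; the function names and toy depth profile are hypothetical.

```python
def five_prime_window(start, strand, upstream=40, downstream=10):
    """Return the 1-based inclusive window around a start codon.

    The window covers `upstream` bases before the first base of the start codon
    and the first `downstream` bases of the coding region.
    """
    if strand == "+":
        return (start - upstream, start + downstream - 1)
    return (start - downstream + 1, start + upstream)


def mean_depth(depth, window):
    """Mean read depth over a 1-based inclusive window, clipped to the reference."""
    lo, hi = window
    lo, hi = max(lo, 1), min(hi, len(depth))
    values = depth[lo - 1:hi]
    return sum(values) / len(values) if values else 0.0


# Toy depth profile: 200 positions, with coverage only over positions 60-109.
depth = [0] * 59 + [30] * 50 + [0] * 91
window = five_prime_window(100, "+")
print(window, mean_depth(depth, window))
```

Comparing this window mean with the mean depth of flanking, presumably non-transcribed positions gives the signal-versus-noise contrast described in the text.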
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Acknowledgements
We would like to thank the work of the Rapid Response Unit of the National Center for Microbiology, especially MªJosé Buitrago, and Cristobal Belda, ISCIII General Director. We also thank Anya Crane (Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health) for critically editing the manuscript and Jiro Wada (Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health) for helping with figure preparation. The work for this study performed at Instituto de Salud Carlos III was partially funded by Acción Estratégica “Impacto clínico y microbiológico del brote por el virus de la viruela del mono en pacientes en España (2022): proyecto multicéntrico MONKPOX-ESP22” (CIBERINFEC) (M.P.S.S.). The work for this study done at the Icahn School of Medicine at Mount Sinai Department of Microbiology as part of Global Health Emerging Pathogen Institute activities was funded by institutional funds (G.P.) from the Icahn School of Medicine at Mount Sinai Department of Microbiology in support of Global Health Emerging Pathogen Institute activities. This work was also supported in part through Laulima Government Solutions, LLC, prime contract with the U.S. National Institute of Allergy and Infectious Diseases (NIAID) under Contract No. HHSN272201800013C. J.H.K. performed this work as an employee of Tunnell Government Services (TGS), a subcontractor of Laulima Government Solutions, LLC, under Contract No. HHSN272201800013C. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the U.S. Army. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Health and Human Services or of the institutions and companies affiliated with the authors, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
Author contributions
Conceptualization, S.M., S.V., A.N., I.J., M.S.-L., A.G.-S., G.P., I.C., and M.P.S.S. Methodology, S.M., S.V., A.N., S.V.-F., J.A.P.-G., N.F.-G., A.Z., J.H.K., M.S.-L., N.D., J.R.K., E.G., S.G., and G.P. Investigation, S.M., S.V., A.N., S.V.-F., J.A.P.-G., N.F.-G., A.Z., E.O., O.A., A.M.-G., A.D.-I., V.E., C.G., F.M., P.S.-M., M.T., A.V., J.-C.G., I.T., M.C. del R., L.M-D., M.L., A.G., L.C., A.G., J.C., L.H., P.J., M.L.N.-R., I.J., M.E.A.-A., C.L., L. del R., I.E., M.S., M.A.M., J.H.K., M.S.-L., N.D.P., J.R.K., E.G., S.G., A.G.-S., G.P., I.C., and M.P.S.S. Formal Analysis, S.M., S.V., A.N., S.V.-F., J.A.P.-G., N.F.-G., J.H.K., M.S.-L., N.D.P., J.R.K., E.G., G.P., I.C., and M.S.S. Writing – Original Draft, S.M., S.V., A.N., S.V.-F., and G.P. Writing – Review & Editing, S.M., S.V., A.N., S.V.-F., J.H.K., A.G.-S., G.P., I.C., and M.P.S.S. Visualization, S.M., S.V., A.N., S.V.-F., and G.P. Supervision, A.G.-S., G.P., I.C., and M.P.S.S. Resources, A.G.-S., G.P., I.C., and M.P.S.S. Funding Acquisition, A.G.-S., G.P., I.C., and M.P.S.S.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work.
Data availability
The data generated in this study are provided within the article, Supplementary Information, and Supplementary Data files. The high-quality MPXV genome sequence is available in the European Nucleotide Archive (ENA) database (accession number OX044336; https://www.ebi.ac.uk/ena/browser/view/OX044336). Accession numbers for all the samples sequenced in this manuscript can be found in Supplementary Data 1. Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Gustavo Palacios (gustavo.palacios@mssm.edu).
Code availability
All scripts and code used for this study can be found in the GitHub repository (https://github.com/BU-ISCIII/MPXstreveal)114.
Competing interests
A.G.-S. has consulting agreements with the following companies involving cash and/or stock: Castlevax, Amovir, Vivaldi Biosciences, Contrafect, 7Hills Pharma, Avimex, Vaxalto, Pagoda, Accurius, Esperovax, Farmak, Applied Biological Laboratories, Pharmamar, Paratus, CureLab Oncology, CureLab Veterinary, Synairgen, and Pfizer, outside of the reported work. A.G.-S. has been an invited speaker at meeting events organized by Seqirus, Janssen, Abbott, and AstraZeneca. A.G.-S. is an inventor on patents and patent applications on the use of antivirals and vaccines for the treatment and prevention of virus infections and cancer, owned by the Icahn School of Medicine at Mount Sinai, New York, outside of the reported work. The other authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
These authors contributed equally: Sara Monzón, Sarai Varona, Anabel Negredo, Santiago Vidal-Freire.
These authors jointly supervised this work: Isabel Cuesta, Maripaz P. Sánchez-Seco, Gustavo Palacios.
Supplementary information
The online version contains supplementary material available at 10.1038/s41467-024-46949-7.
References
- 1. International Committee on Taxonomy of Viruses. Current ICTV Taxonomy Release. https://ictv.global/taxonomy (2022).
- 2. World Health Organization. ICD-11. International Classification of Diseases 11th Revision. https://icd.who.int/en/ (2022).
- 3. Magnus PV, Andersen EK, Petersen KB, Birch-Andersen A. A pox-like disease in cynomolgus monkeys. Acta Pathol. Microbiol. Scand. 2009;46:156–176. doi: 10.1111/j.1699-0463.1959.tb00328.x.
- 4. Beer EM, Rao VB. A systematic review of the epidemiology of human monkeypox outbreaks and implications for outbreak strategy. PLoS Negl. Trop. Dis. 2019;13:e0007791. doi: 10.1371/journal.pntd.0007791.
- 5. Patrono LV, et al. Monkeypox virus emergence in wild chimpanzees reveals distinct clinical outcomes and viral diversity. Nat. Microbiol. 2020;5:955–965. doi: 10.1038/s41564-020-0706-0.
- 6. Radonić A, et al. Fatal monkeypox in wild-living sooty mangabey, Côte d’Ivoire, 2012. Emerg. Infect. Dis. 2014;20:1009–1011. doi: 10.3201/eid2006.131329.
- 7. Khodakevich L, Ježek Z, Messinger D. Monkeypox virus: ecology and public health significance. Bull. World Health Organ. 1988;66:747–752.
- 8. Likos AM, et al. A tale of two clades: monkeypox viruses. J. Gen. Virol. 2005;86:2661–2672. doi: 10.1099/vir.0.81215-0.
- 9. World Health Organization. Monkeypox: experts give virus variants new names. https://www.who.int/news/item/12-08-2022-monkeypox--experts-give-virus-variants-new-names (2022).
- 10. Happi C, et al. Urgent need for a non-discriminatory and non-stigmatizing nomenclature for monkeypox virus. PLoS Biol. 2022;20:e3001769. doi: 10.1371/journal.pbio.3001769.
- 11. Damon IK. Status of human monkeypox: clinical disease, epidemiology and research. Vaccine. 2011;29:D54–D59. doi: 10.1016/j.vaccine.2011.04.014.
- 12. Antinori A, et al. Epidemiological, clinical and virological characteristics of four cases of monkeypox support transmission through sexual contact, Italy, May 2022. Euro Surveill. 2022;27:2200421. doi: 10.2807/1560-7917.ES.2022.27.22.2200421.
- 13. Vivancos R, et al. Community transmission of monkeypox in the United Kingdom, April to May 2022. Euro Surveill. 2022;27:2200422. doi: 10.2807/1560-7917.ES.2022.27.22.2200422.
- 14. Nextstrain. Genomic epidemiology of monkeypox virus. https://nextstrain.org/monkeypox/hmpxv1 (2022).
- 15. Faye O, et al. Genomic characterisation of human monkeypox virus in Nigeria. Lancet Infect. Dis. 2018;18:246. doi: 10.1016/S1473-3099(18)30043-4.
- 16. Ježek Z, Szczeniowski M, Paluku KM, Mutombo M. Human monkeypox: clinical features of 282 patients. J. Infect. Dis. 1987;156:293–298. doi: 10.1093/infdis/156.2.293.
- 17. Bunge EM, et al. The changing epidemiology of human monkeypox—A potential threat? A systematic review. PLoS Negl. Trop. Dis. 2022;16:e0010141. doi: 10.1371/journal.pntd.0010141.
- 18. Otu A, Ebenso B, Walley J, Barceló JM, Ochu CL. Global human monkeypox outbreak: atypical presentation demanding urgent public health action. Lancet Microbe. 2022;3:e554–e555. doi: 10.1016/S2666-5247(22)00153-7.
- 19. Thornhill JP, et al. Monkeypox virus infection in humans across 16 countries — April–June 2022. N. Engl. J. Med. 2022;387:679–691. doi: 10.1056/NEJMoa2207323.
- 20. Vusirikala A, et al. Epidemiology of early monkeypox virus transmission in sexual networks of gay and bisexual men, England, 2022. Emerg. Infect. Dis. 2022;28:2082–2086. doi: 10.3201/eid2810.220960.
- 21. Ulaeto DO, Dunning J, Carroll MW. Evolutionary implications of human transmission of monkeypox: the importance of sequencing multiple lesions. Lancet Microbe. 2022;3:e639–e640. doi: 10.1016/S2666-5247(22)00194-X.
- 22. Grant R, Nguyen LL, Breban R. Modelling human-to-human transmission of monkeypox. Bull. World Health Organ. 2020;98:638–640. doi: 10.2471/BLT.19.242347.
- 23. Rimoin AW, et al. Major increase in human monkeypox incidence 30 years after smallpox vaccination campaigns cease in the Democratic Republic of Congo. Proc. Natl Acad. Sci. USA. 2010;107:16262–16267. doi: 10.1073/pnas.1005769107.
- 24. Reynolds MG, et al. Clinical manifestations of human monkeypox influenced by route of infection. J. Infect. Dis. 2006;194:773–780. doi: 10.1086/505880.
- 25. Damon, I. K. in Fields Virology (eds D. M. Knipe & P. M. Howley) 2160–2184 (Williams & Wilkins, 2013).
- 26. Liu R, et al. SPI-1 is a missing host-range factor required for replication of the attenuated modified vaccinia Ankara (MVA) vaccine vector in human cells. PLoS Pathog. 2019;15:e1007710. doi: 10.1371/journal.ppat.1007710.
- 27. McFadden G. Poxvirus tropism. Nat. Rev. Microbiol. 2005;3:201–213. doi: 10.1038/nrmicro1099.
- 28. Moss B. Poxvirus entry and membrane fusion. Virology. 2006;344:48–54. doi: 10.1016/j.virol.2005.09.037.
- 29. Moss B. Membrane fusion during poxvirus entry. Semin. Cell Dev. Biol. 2016;60:89–96. doi: 10.1016/j.semcdb.2016.07.015.
- 30. Roberts KL, Smith GL. Vaccinia virus morphogenesis and dissemination. Trends Microbiol. 2008;16:472–479. doi: 10.1016/j.tim.2008.07.009.
- 31. Kugelman JR, et al. Genomic variability of monkeypox virus among humans, Democratic Republic of the Congo. Emerg. Infect. Dis. 2014;20:232–239. doi: 10.3201/eid2002.130118.
- 32. Senkevich TG, Yutin N, Wolf YI, Koonin EV, Moss B. Ancient gene capture and recent gene loss shape the evolution of orthopoxvirus-host interaction genes. mBio. 2021;12:e0149521. doi: 10.1128/mBio.01495-21.
- 33. Moss, B. & Smith, G. L. in Fields Virology. Volume 2: DNA viruses (eds P. M. Howley, D. M. Knipe, B. A. Damania, & J. I. Cohen) 573–613 (Wolters Kluwer, 2021).
- 34. Wittek R, Moss B. Tandem repeats within the inverted terminal repetition of vaccinia virus DNA. Cell. 1980;21:277–284. doi: 10.1016/0092-8674(80)90135-X.
- 35. Shchelkunov SN, et al. Analysis of the monkeypox virus genome. Virology. 2002;297:172–194. doi: 10.1006/viro.2002.1446.
- 36. Slabaugh MB, Roseman NA, Mathews CK. Amplification of the ribonucleotide reductase small subunit gene: analysis of novel joints and the mechanism of gene duplication in vaccinia virus. Nucleic Acids Res. 1989;17:7073–7088. doi: 10.1093/nar/17.17.7073.
- 37. Elde NC, et al. Poxviruses deploy genomic accordions to adapt rapidly against host antiviral defenses. Cell. 2012;150:831–841. doi: 10.1016/j.cell.2012.05.049.
- 38. Senkevich TG, Zhivkoplias EK, Weisberg AS, Moss B. Inactivation of genes by frameshift mutations provides rapid adaptation of an attenuated vaccinia virus. J. Virol. 2020;94:e01053-20. doi: 10.1128/JVI.01053-20.
- 39. Ladner JT, et al. Standards for sequencing viral genomes in the era of high-throughput sequencing. mBio. 2014;5:e01360-14. doi: 10.1128/mBio.01360-14.
- 40. Chen N, et al. Virulence differences between monkeypox virus isolates from West Africa and the Congo basin. Virology. 2005;340:46–63. doi: 10.1016/j.virol.2005.05.030.
- 41. Kastenmayer RJ, et al. Elimination of A-type inclusion formation enhances cowpox virus replication in mice: implications for orthopoxvirus evolution. Virology. 2014;452-453:59–66. doi: 10.1016/j.virol.2013.12.030.
- 42. Zhou J, Wang X, Zhou Z, Wang S. Insights into the evolution and host adaptation of the monkeypox virus from a codon usage perspective: focus on the ongoing 2022 outbreak. Int. J. Mol. Sci. 2023;24:11524. doi: 10.3390/ijms241411524.
- 43. Kakuk B, et al. In-depth temporal transcriptome profiling of monkeypox and host cells using nanopore sequencing. Sci. Data. 2023;10:262. doi: 10.1038/s41597-023-02149-4.
- 44. Davison AJ, Moss B. Structure of vaccinia virus early promoters. J. Mol. Biol. 1989;210:749–769. doi: 10.1016/0022-2836(89)90107-1.
- 45. Knutson BA, Liu X, Oh J, Broyles SS. Vaccinia virus intermediate and late promoter elements are targeted by the TATA-binding protein. J. Virol. 2006;80:6784–6793. doi: 10.1128/JVI.02705-05.
- 46. Vaughan A, et al. Two cases of monkeypox imported to the United Kingdom, September 2018. Euro Surveill. 2018;23:1800509. doi: 10.2807/1560-7917.ES.2018.23.38.1800509.
- 47. Cohen-Gihon I, et al. Identification and whole-genome sequencing of a monkeypox virus strain isolated in Israel. Microbiol. Resour. Announc. 2020;9:e01524-19. doi: 10.1128/MRA.01524-19.
- 48. Ng OT, et al. A case of imported monkeypox in Singapore. Lancet Infect. Dis. 2019;19:1166. doi: 10.1016/S1473-3099(19)30537-7.
- 49. Yong SEF, et al. Imported monkeypox, Singapore. Emerg. Infect. Dis. 2020;26:1826–1830. doi: 10.3201/eid2608.191387.
- 50. Baroudy BM, Moss B. Sequence homologies of diverse length tandem repetitions near ends of vaccinia virus genome suggest unequal crossing over. Nucleic Acids Res. 1982;10:5673–5679. doi: 10.1093/nar/10.18.5673.
- 51. Esposito JJ, et al. Genome sequence diversity and clues to the evolution of variola (smallpox) virus. Science. 2006;313:807–812. doi: 10.1126/science.1125134.
- 52. Gubser C, Hué S, Kellam P, Smith GL. Poxvirus genomes: a phylogenetic analysis. J. Gen. Virol. 2004;85:105–117. doi: 10.1099/vir.0.19565-0.
- 53. Hendrickson RC, Wang C, Hatcher EL, Lefkowitz EJ. Orthopoxvirus genome evolution: the role of gene loss. Viruses. 2010;2:1933–1967. doi: 10.3390/v2091933.
- 54. Shchelkunov SN. Orthopoxvirus genes that mediate disease virulence and host tropism. Adv. Virol. 2012;2012:524743. doi: 10.1155/2012/524743.
- 55. Shchelkunov SN, et al. Human monkeypox and smallpox viruses: genomic comparison. FEBS Lett. 2001;509:66–70. doi: 10.1016/S0014-5793(01)03144-1.
- 56. McLysaght A, Baldi PF, Gaut BS. Extensive gene gain associated with adaptive evolution of poxviruses. Proc. Natl Acad. Sci. USA. 2003;100:15655–15660. doi: 10.1073/pnas.2136653100.
- 57. Bratke KA, McLysaght A. Identification of multiple independent horizontal gene transfers into poxviruses using a comparative genomics approach. BMC Evol. Biol. 2008;8:67. doi: 10.1186/1471-2148-8-67.
- 58. Isidro J, et al. Phylogenomic characterization and signs of microevolution in the 2022 multi-country outbreak of monkeypox virus. Nat. Med. 2022;28:1569–1572. doi: 10.1038/s41591-022-01907-y.
- 59. O’Toole, Á. & Rambaut, A. Initial observations about putative APOBEC3 deaminase editing driving short-term evolution of MPXV since 2017. https://virological.org/t/initial-observations-about-putative-apobec3-deaminase-editing-driving-short-term-evolution-of-mpxv-since-2017/830 (2022).
- 60. Jones, T. C. et al. Genetic variability, including gene duplication and deletion, in early sequences from the 2022 European monkeypox outbreak. bioRxiv 2022.07.23.501239. doi: 10.1101/2022.07.23.501239 (2022).
- 61. Kotwal GJ, Moss B. Vaccinia virus encodes two proteins that are structurally related to members of the plasma serine protease inhibitor superfamily. J. Virol. 1989;63:600–606. doi: 10.1128/jvi.63.2.600-606.1989.
- 62. Kettle S, Blake NW, Law KM, Smith GL. Vaccinia virus serpins B13R (SPI-2) and B22R (SPI-1) encode Mr 38.5 and 40K, intracellular polypeptides that do not affect virus virulence in a murine intranasal model. Virology. 1995;206:136–147. doi: 10.1016/S0042-6822(95)80028-X.
- 63. Jorgensen I, Rayamajhi M, Miao EA. Programmed cell death as a defence against infection. Nat. Rev. Immunol. 2017;17:151–164. doi: 10.1038/nri.2016.147.
- 64. Brooks MA, Ali AN, Turner PC, Moyer RW. A rabbitpox virus serpin gene controls host range by inhibiting apoptosis in restrictive cells. J. Virol. 1995;69:7688–7698. doi: 10.1128/jvi.69.12.7688-7698.1995.
- 65. Hernáez B, et al. A virus-encoded type I interferon decoy receptor enables evasion of host immunity through cell-surface binding. Nat. Commun. 2018;9:5440. doi: 10.1038/s41467-018-07772-z.
- 66. Colamonici OR, Domanski P, Sweitzer SM, Larner A, Buller RM. Vaccinia virus B18R gene encodes a type I interferon-binding protein that blocks interferon α transmembrane signaling. J. Biol. Chem. 1995;270:15974–15978. doi: 10.1074/jbc.270.27.15974.
- 67. Chiu W-L, Lin C-L, Yang M-H, Tzou D-L, Chang W. Vaccinia virus 4c (A26L) protein on intracellular mature virus binds to the extracellular cellular matrix laminin. J. Virol. 2007;81:2149–2157. doi: 10.1128/JVI.02302-06.
- 68. Ulaeto D, Grosenbach D, Hruby DE. The vaccinia virus 4c and A-type inclusion proteins are specific markers for the intracellular mature virus particle. J. Virol. 1996;70:3372–3377. doi: 10.1128/jvi.70.6.3372-3377.1996.
- 69. Howard AR, Senkevich TG, Moss B. Vaccinia virus A26 and A27 proteins form a stable complex tethered to mature virions by association with the A17 transmembrane protein. J. Virol. 2008;82:12384–12391. doi: 10.1128/JVI.01524-08.
- 70. Liu L, Cooper T, Howley PM, Hayball JD. From crescent to mature virion: vaccinia virus assembly and maturation. Viruses. 2014;6:3787–3808. doi: 10.3390/v6103787.
- 71. Keasey S, et al. Proteomic basis of the antibody response to monkeypox virus infection examined in cynomolgus macaques and a comparison to human smallpox vaccination. PLoS One. 2010;5:e15547. doi: 10.1371/journal.pone.0015547.
- 72. Pugh C, et al. Povidone iodine ointment application to the vaccination site does not alter immunoglobulin G antibody response to smallpox vaccine. Viral Immunol. 2016;29:361–366. doi: 10.1089/vim.2016.0025.
- 73. Fedele CG, Negredo A, Molero F, Sánchez-Seco MP, Tenorio A. Use of internally controlled real-time genome amplification for detection of variola virus and other orthopoxviruses infecting humans. J. Clin. Microbiol. 2006;44:4464–4470. doi: 10.1128/JCM.00276-06.
- 74. Sánchez-Seco MP, et al. Detection and identification of orthopoxviruses using a generic nested PCR followed by sequencing. Br. J. Biomed. Sci. 2006;63:79–85. doi: 10.1080/09674845.2006.11732725.
- 75. Wick RR, Judd LM, Gorrie CL, Holt KE. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb. Genom. 2017;3:e000132. doi: 10.1099/mgen.0.000132.
- 76. Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 2019;37:540–546. doi: 10.1038/s41587-019-0072-8.
- 77. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 78. Walker BJ, et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One. 2014;9:e112963. doi: 10.1371/journal.pone.0112963.
- 79. Patel, H. et al. nf-core/viralrecon: nf-core/viralrecon v2.4.1 - Plastered Magnesium Marmoset. https://zenodo.org/record/6320980#.YzJOTzTMJyE. doi: 10.5281/zenodo.6320980 (2022).
- 80. Di Tommaso P, et al. Nextflow enables reproducible computational workflows. Nat. Biotechnol. 2017;35:316–319. doi: 10.1038/nbt.3820.
- 81. Ewels PA, et al. The nf-core framework for community-curated bioinformatics pipelines. Nat. Biotechnol. 2020;38:276–278. doi: 10.1038/s41587-020-0439-x.
- 82. Andrews, S. FASTQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
- 83. Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–i890. doi: 10.1093/bioinformatics/bty560.
- 84. Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:257. doi: 10.1186/s13059-019-1891-0.
- 85. Antipov D, Korobeynikov A, McLean JS, Pevzner PA. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics. 2016;32:1009–1015. doi: 10.1093/bioinformatics/btv688.
- 86. Prjibelski A, Antipov D, Meleshko D, Lapidus A, Korobeynikov A. Using SPAdes De Novo Assembler. Curr. Protoc. Bioinformatics. 2020;70:e102. doi: 10.1002/cpbi.102.
- 87. Assefa S, Keane TM, Otto TD, Newbold C, Berriman M. ABACAS: algorithm-based automatic contiguation of assembled sequences. Bioinformatics. 2009;25:1968–1969. doi: 10.1093/bioinformatics/btp347.
- 88. Gigante CM, et al. Multiple lineages of monkeypox virus detected in the United States, 2021–2022. Science. 2022;378:560–565.
- 89. Katoh K, Rozewicki J, Yamada KD. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform. 2019;20:1160–1166. doi: 10.1093/bib/bbx108.
- 90. Waterhouse AM, Procter JB, Martin DMA, Clamp M, Barton GJ. Jalview Version 2—a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25:1189–1191. doi: 10.1093/bioinformatics/btp033.
- 91. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999;27:573–580. doi: 10.1093/nar/27.2.573.
- 92. Wang D, et al. STRsearch: a new pipeline for targeted profiling of short tandem repeats in massively parallel sequencing data. Hereditas. 2020;157:8. doi: 10.1186/s41065-020-00120-6.
- 93. Mauldin MR, et al. Exportation of monkeypox virus from the African continent. J. Infect. Dis. 2022;225:1367–1376. doi: 10.1093/infdis/jiaa559.
- 94. The Broad Institute. Picard command-line tools. https://broadinstitute.github.io/picard/ (2018).
- 95. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 96. Grubaugh ND, et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 2019;20:8. doi: 10.1186/s13059-018-1618-7.
- 97. Danecek P, et al. Twelve years of SAMtools and BCFtools. GigaScience. 2021;10:giab008. doi: 10.1093/gigascience/giab008.
- 98. Cingolani P, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly. 2012;6:80–92. doi: 10.4161/fly.19695.
- 99. Cingolani P, et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front. Genet. 2012;3:35. doi: 10.3389/fgene.2012.00035.
- 100. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016;32:3047–3048. doi: 10.1093/bioinformatics/btw354.
- 101. Aksamentov I, Roemer C, Hodcroft EB, Neher RA. Nextclade: clade assignment, mutation calling and quality control for viral genomes. J. Open Source Softw. 2021;6:3773. doi: 10.21105/joss.03773.
- 102. Seemann, T. Snippy: rapid haploid variant calling and core genome alignment. https://github.com/tseemann/snippy (2015).
- 103. Minh BQ, et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 2020;37:1530–1534. doi: 10.1093/molbev/msaa015.
- 104. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021;49:W293–W296. doi: 10.1093/nar/gkab301.
- 105. Bandelt H-J, Forster P, Röhl A. Median-joining networks for inferring intraspecific phylogenies. Mol. Biol. Evol. 1999;16:37–48. doi: 10.1093/oxfordjournals.molbev.a026036.
- 106. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics. 2021;37:1639–1643. doi: 10.1093/bioinformatics/btaa1016.
- 107. Dainat, J., Hereñú, D., LucileSol & pascal-git. NBISweden/AGAT: AGAT-v0.8.1. https://zenodo.org/record/5834795#.YzJOCzTMJyE. doi: 10.5281/zenodo.5834795 (2022).
- 108. Chen Y, et al. MetaLogo: a heterogeneity-aware sequence logo generator and aligner. Brief Bioinform. 2022;23:1–7. doi: 10.1093/bib/bbab591.
- 109. Anders S, Pyl PT, Huber W. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–169. doi: 10.1093/bioinformatics/btu638.
- 110. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
- 111. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113. doi: 10.1186/1471-2105-5-113.
- 112. Tamura K, Stecher G, Kumar S. MEGA11: Molecular evolutionary genetics analysis Version 11. Mol. Biol. Evol. 2021;38:3022–3027. doi: 10.1093/molbev/msab120.
- 113. Holley J, et al. Engineered promoter-switched viruses reveal the role of poxvirus maturation protein A26 as a negative regulator of viral spread. J. Virol. 2021;95:e0101221. doi: 10.1128/JVI.01012-21.
- 114. Monzon, S. V. et al. MPXstreveal v1.0.0 (v1.0.0). doi: 10.5281/zenodo.10721675 (2024).