Author manuscript; available in PMC: 2023 Jun 12.
Published in final edited form as: Science. 2022 Oct 20;378(6619):560–565. doi: 10.1126/science.add4153

Multiple lineages of monkeypox virus detected in the United States, 2021–2022

Crystal M Gigante 1, Bette Korber 2, Matthew H Seabolt 1,3, Kimberly Wilkins 1, Whitni Davidson 1, Agam K Rao 1, Hui Zhao 1, Todd G Smith 1, Christine M Hughes 1, Faisal Minhaj 1, Michelle A Waltenburg 1, James Theiler 4, Sandra Smole 5, Glen R Gallagher 5, David Blythe 6, Robert Myers 6, Joann Schulte 7, Joey Stringer 7, Philip Lee 8, Rafael M Mendoza 9, LaToya A Griffin-Thomas 10, Jenny Crain 11, Jade Murray 12, Annette Atkinson 12, Anthony H Gonzalez 13, June Nash 13, Dhwani Batra 1, Inger Damon 1, Jennifer McQuiston 1, Christina L Hutson 1, Andrea M McCollum 1, Yu Li 1,*
PMCID: PMC10258808  NIHMSID: NIHMS1900512  PMID: 36264825

Abstract

Monkeypox is a viral zoonotic disease endemic in Central and West Africa. In May 2022, dozens of non-endemic countries reported hundreds of monkeypox cases, most with no epidemiological link to Africa. We identified two lineages of monkeypox virus (MPXV) among two 2021 and seven 2022 U.S. monkeypox cases: the major 2022 outbreak variant, B.1, and a minor contemporaneously sampled variant called A.2. Analyses of mutations among these two variants revealed an extreme preference for GA-to-AA mutations indicative of human APOBEC3 cytosine deaminase activity among Clade IIb MPXV (previously West African, Nigeria) sampled since 2017. Such mutations were not enriched within other MPXV clades. These findings suggest that APOBEC3 editing may be a recurrent and a dominant driver of MPXV evolution within the current outbreak.

One-Sentence Summary:

We report multiple introductions of monkeypox viruses of different origins into the U.S. and find evidence of APOBEC3 editing since 2017.


Monkeypox is a viral zoonotic disease caused by monkeypox virus (MPXV) that is endemic in West and Central Africa. There have been several reported cases of travel-associated monkeypox in non-African countries in recent years. In 2003, an outbreak of monkeypox in the United States (U.S.) was linked to imported African small mammals (1). In 2017, the largest monkeypox outbreak in western Africa occurred in Nigeria after decades of no identified cases (2), and during 2018 to 2021, eight cases were exported from Nigeria to non-endemic countries (2–5). In 2021, there were two U.S. monkeypox cases in travelers from Nigeria, in Maryland and Texas (4, 5). In May of 2022, this pattern of monkeypox cases in travelers from Nigeria shifted. As of September 28, 2022, 67,602 cases of monkeypox had been reported in 99 non-endemic countries, most with no epidemiological link to Africa (6), and 25,509 cases had been confirmed in the U.S. (7).

Comparison of MPXV sequences from nine U.S. monkeypox cases from 2021 and 2022 (ON563414.3, ON674051, ON675438, ON676703 – ON676708) revealed two distinct lineages (Fig. 1) within MPXV Clade IIb (formerly named West African MPXV found east of the Dahomey Gap). Neutral nomenclature was chosen to reduce stigmatization that can be associated with naming based on locations (8). We maintain the designation of two clades of MPXV (Clade I: formerly Congo Basin/Central African and Clade II: formerly West African) based on genetic distance and evidence-based clinical differences (9–12). Five of the seven May 2022 U.S. sequences formed a monophyletic clade with 2022 MPXV sequences from Europe (Fig. 1), with most genomes within this clade containing 0 – 2 nucleotide changes in non-repeat regions (Fig. S1, Table S1). This clade will be referred to as the current predominant 2022 MPXV outbreak variant, B.1 (based on proposed MPXV naming from Nextstrain (13)). MPXV from a 2021 travel-associated case from Nigeria to Maryland (USA_2021_MD) displayed high similarity to variant B.1 sequences, with approximately 13 nucleotide differences (Figs. 1 and S1, Table 1). The USA_2021_MD and 2022 outbreak sequences shared many mutations that separated them from MPXV sequences from Nigeria and travel-associated cases from 2017 – 2019 (Table S2).

Fig. 1. Phylogenetic analysis of Clade IIb MPXV genome sequences.

Variant B.1 sequences are shown in blue text; variant A.2 is shown in orange; green branches indicate Clade IIb. Phylogenetic analysis was performed in BEAST v1.8.3 using HKY+G model and constant coalescent prior on complete genome alignments after removing all sites containing gaps. Scale bar is in substitutions per site; posterior support values are shown at branch points.
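The gap-removal step mentioned in the caption can be illustrated with a minimal sketch. It assumes the alignment is already loaded as a mapping from sequence name to aligned string; the function name `strip_gap_columns` and the toy sequences are hypothetical and are not taken from the authors' pipeline.

```python
def strip_gap_columns(alignment, gap_chars="-"):
    """Drop every alignment column in which at least one sequence has a gap."""
    names = list(alignment)
    if not names:
        return {}
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("aligned sequences must all have the same length")
    # Keep only the columns where no sequence carries a gap character.
    keep = [
        col for col in range(length)
        if all(alignment[n][col] not in gap_chars for n in names)
    ]
    return {n: "".join(alignment[n][col] for col in keep) for n in names}


# Toy alignment with hypothetical names; columns 3 and 6 each contain a gap.
toy_alignment = {
    "seq_a": "ATG-CGTA",
    "seq_b": "ATGACG-A",
    "seq_c": "ATGACGTA",
}
gap_free = strip_gap_columns(toy_alignment)
for name, seq in gap_free.items():
    print(name, seq)
```

Removing whole columns keeps all sequences the same length, which is what a site-by-site substitution model requires.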

Table 1. Unique coding nucleotide changes in 2022 predominant MPXV outbreak variant B.1.

Mutations listed were shared among all MPXV Clade IIb variant B.1 sequences examined in Figure 1 and were not present in USA_2021_MD or UK-P2 (MT903344.1). Gene homologs in Vaccinia virus Copenhagen (VACV-Cop) are given for each gene. Pos is position in MT903344.1. Additional information can be found in Table S2.

| Pos | MD 2021 | 2022 Outbreak | Type | Effect | AA change | Note |
|---|---|---|---|---|---|---|
| 3120, 194114 | G | A | snp | synonymous | | VACV-Cop C19L Ankyrin-like |
| 39148 | C | T | snp | missense | Glu353Lys | VACV-Cop F13L major envelope antigen of EEV |
| 73248 | G | A | snp | missense | Asp88Asn | VACV-Cop G8R VLTF-1 (late transcription factor 1) |
| 74214 | G | A | snp | missense | Met142Ile | VACV-Cop G9R Entry/fusion complex component |
| 77392 | G | A | snp | missense | Glu162Lys | VACV-Cop L4R ss/dsDNA binding protein |
| 84596 | C | T | snp | synonymous | | VACV-Cop J6R RNA polymerase subunit (RPO147) |
| 150480 | C | T | snp | missense | His221Tyr | VACV-Cop A46R IL-1/TLR signaling inhibitor |
| 170273 | G | A | snp | synonymous | | VACV-Cop B12R Ser/Thr Kinase |
| 183534 | C | T | snp | missense | Pro722Ser | VACV-Cop B21R Membrane-associated glycoprotein |
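The synonymous and missense labels in Table 1 follow from the standard genetic code. The sketch below shows how a single-nucleotide codon change could be classified. The codons are hypothetical choices that are merely consistent with the listed amino acid changes; the actual MPXV codons and gene orientations are not given here, and the `classify_codon_change` helper is illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_change(ref_codon, alt_codon):
    """Return (ref_aa, alt_aa, effect) for a codon substitution on the coding strand."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return ref_aa, alt_aa, effect


# Hypothetical codons chosen only to match amino acid changes in Table 1.
examples = [
    ("GAA", "AAA"),  # Glu -> Lys
    ("GAT", "AAT"),  # Asp -> Asn
    ("ATG", "ATA"),  # Met -> Ile
    ("CAT", "TAT"),  # His -> Tyr
    ("GAA", "GAG"),  # Glu -> Glu, synonymous
]
for ref, alt in examples:
    print(ref, alt, classify_codon_change(ref, alt))
```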

Two 2022 U.S. MPXV sequences, USA_2022_FL001 and USA_2022_VA001, and one 2021 U.S. sequence (USA_2021_TX) formed a monophyletic clade (variant A.2) that was polyphyletic to other 2022 MPXV sequences from the U.S. and Europe (Fig. 1). Each genome contained approximately 80 nucleotide changes relative to variant B.1 MPXV sequences (Fig. S1). The three variant A.2 genomes displayed approximately 30 unique nucleotide differences from each other (Fig. S1, Table S3). Each case also reported travel to different countries in the Middle East and West Africa (Table S4). Together, these observations suggest that although the three MPXVs share a common ancestor, they likely represent separate introductions to the U.S.
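Counts such as "approximately 80 nucleotide changes" are pairwise differences between aligned genomes. A minimal sketch of such a count follows, assuming equal-length aligned sequences and skipping gaps and ambiguous bases; the sequence names and strings are toy placeholders, not real MPXV data.

```python
from itertools import combinations


def count_differences(seq1, seq2, ignore="-N"):
    """Count positions where two aligned sequences differ, skipping gaps and Ns."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq1.upper(), seq2.upper())
        if a != b and a not in ignore and b not in ignore
    )


# Toy aligned sequences (hypothetical).
toy = {
    "genome_1": "ACGTACGTAC",
    "genome_2": "ACATACGTAT",
    "genome_3": "ACGTNCGTAC",
}
pairwise = {
    (x, y): count_differences(toy[x], toy[y])
    for x, y in combinations(sorted(toy), 2)
}
for pair, n_diff in pairwise.items():
    print(pair, n_diff)
```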

Real-time PCR testing of USA_2021_MD and the five 2022 MPXV variant B.1 samples revealed a lower sensitivity (average Ct delay of 6.88, ranging from 5.3 to 8.3) in the Orthopoxvirus generic OPX3 real-time PCR assay (14) compared to the Monkeypox Clade II-specific real-time PCR assay (15) (Table S5, when performed as described in Methods). By contrast, similar Ct values were observed in the two assays for the USA_2022_FL001, USA_2022_VA001, and USA_2021_TX samples (average difference of −0.78 Ct, ranging from −1.4 to 0.5). By careful comparison of Ct values between the two assays, laboratories can, in theory, differentiate between cases belonging to variant B.1 and cases from other Clade IIb MPXV without sequencing. Sequence examination revealed a SNP in the reverse primer binding site for the OPX3 real-time PCR assay (DNA polymerase gene, VACV-Cop E9L, C322T) in USA_2021_MD and variant B.1 sequences that was absent from other Clade IIb MPXV sequences. The decreased sensitivity in the OPX3 assay is unrelated to the U.S. Food and Drug Administration-cleared VAC1 assay (16) used for MPXV screening. Use of different commercial reagents, run parameters, or PCR platforms may yield different results.
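The Ct comparison described above can be sketched as a small calculation. The delay is the OPX3 average Ct minus the Clade II-specific average Ct, each averaged over replicates. The Ct values and the 3.0-cycle cutoff below are hypothetical illustrations: the cutoff was picked only because it lies between the two reported ranges, it is not a validated threshold, and results may differ with other reagents or platforms.

```python
from statistics import mean


def ct_delay(opx3_cts, clade2_cts):
    """OPX3 average Ct minus Clade II-specific average Ct (positive means OPX3 is delayed)."""
    return mean(opx3_cts) - mean(clade2_cts)


def has_b1_like_delay(delay, cutoff=3.0):
    """Flag a delay large enough to suggest the OPX3 primer-site SNP.

    The cutoff is an assumption for illustration, not a validated value.
    """
    return delay >= cutoff


# Hypothetical triplicate Ct values for two samples.
sample_x = ct_delay([27.1, 27.3, 27.2], [20.4, 20.2, 20.3])
sample_y = ct_delay([21.0, 21.2, 21.1], [21.9, 21.8, 22.0])
print(round(sample_x, 2), has_b1_like_delay(sample_x))
print(round(sample_y, 2), has_b1_like_delay(sample_y))
```

A flag of this kind would at most prioritize samples for sequencing; it does not replace it.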

When we compared the 2022 outbreak MPXV sequences in Fig. 1 to other MPXV sequences in Lineage A, we noticed a striking preponderance of G-to-A mutations, specifically 5ʹ GA-to-AA or 5ʹ GG-to-AG, indicative of host Apolipoprotein B mRNA Editing Catalytic Polypeptide-like3 (APOBEC3) activity (17, 18) (Fig. 2, Table 2). Looking at this systematically, we found that a significant enrichment of APOBEC3 context G-to-A mutations was evident throughout Lineage A, among Clade IIb MPXV sequences sampled from 2017 to 2022 (Fig. 2). Overall, among unique mutations within Lineage A, 167 G-to-A mutations were in an APOBEC3 context, while nine G-to-A mutations were not in an APOBEC3 motif, and only 14 mutations were not G-to-A (Fig. 2, Table 2). The vast majority of the APOBEC3 context G-to-A mutations were specifically GA-to-AA (156 out of 167, 95%, Fig. 2), indicating that if these mutations were generated by APOBEC3 editing, APOBEC3G was not the major form, as it produces GG-to-AG changes (19, 20).
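The mutation classes used in this analysis can be sketched in a few lines. The code below assumes an ungapped ancestral sequence and an aligned derived sequence on the forward strand. A forward-strand C-to-T is treated as a G-to-A on the reverse complement, so its APOBEC3 context is read from the preceding forward base (TC for GA-to-AA, CC for GG-to-AG). The function names and toy sequences are hypothetical and this is not the authors' code.

```python
from collections import Counter


def classify_mutation(ancestor, i, derived_base):
    """Classify one substitution at 0-based forward-strand index i by APOBEC3 context."""
    ref = ancestor[i]
    if ref == "G" and derived_base == "A":
        nxt = ancestor[i + 1] if i + 1 < len(ancestor) else ""
        if nxt == "A":
            return "GA-to-AA"
        if nxt == "G":
            return "GG-to-AG"
        if nxt in ("C", "T"):
            return "G-to-A non-APOBEC"
        return "G-to-A, context unknown"
    if ref == "C" and derived_base == "T":
        # Reverse-complement view: the base 3' of the G is the complement
        # of the forward-strand base immediately 5' of the C.
        prev = ancestor[i - 1] if i > 0 else ""
        if prev == "T":
            return "GA-to-AA"
        if prev == "C":
            return "GG-to-AG"
        if prev in ("A", "G"):
            return "G-to-A non-APOBEC"
        return "G-to-A, context unknown"
    return "other"


def tally_mutations(ancestor, derived):
    """Tally mutation classes between an ancestor and an aligned derived sequence."""
    if len(ancestor) != len(derived):
        raise ValueError("sequences must have equal length")
    return Counter(
        classify_mutation(ancestor, i, d)
        for i, (a, d) in enumerate(zip(ancestor, derived))
        if a != d
    )


# Toy sequences (hypothetical) with five substitutions.
ancestor = "AGATCGGTGCTA"
derived = "AAATTAGTACTG"
counts = tally_mutations(ancestor, derived)
print(counts)
```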

Fig. 2. Analysis of APOBEC3 motif mutations in MPXV.

A. Maximum Likelihood phylogenetic tree using IQ-TREE. Enrichment for G-to-A APOBEC context mutations (shown by blue line) was found in Lineage A, within Clade IIb (green branches). A detailed version of this tree with taxa names is included in Fig. S2. The scale bar indicates the number of substitutions per sequence site. B. Mutational patterns found among Clade I, Clade IIa, and Lineage A in the phylogeny above. All unique mutations in a clade relative to the most recent common ancestor of that clade are shown in a single panel, and the class of each mutation is shown on either the forward or reverse complement strand. The increased number of APOBEC3 context mutations, indicated by blue ticks, in Lineage A, versus red and grey captures the dominance of GA-to-AA APOBEC3 context mutations in this lineage and contrasts with Clades I and IIa. The exact numbers and statistics are provided in Table 2. The A.2 and B.1 variants within Lineage A in the tree, and an updated analysis of 397 variant B.1 sequences from GISAID, demonstrate the continuing dominance of GA-to-AA APOBEC3 context mutations among currently sampled outbreak sequences. C. Bar charts showing the total number of each class of unique mutations within each clade in Fig. 2B, Clade I, Clade IIa, Lineage A, as well as for 397 sequences from lineage B.1 from Fig. S3A.

Table 2. Summary of the counts of different classes of all unique mutations that were observed in different clades shown in Fig. 2.

C-to-T mutations in the forward strand are included as G-to-A in the reverse complement, as a simplified way to tally all mutations that occurred in the context of an APOBEC3 motif in the viral genome relative to those that did not. The p-values are based on a Fisher’s exact test of a contingency table that considered all G’s either in an APOBEC context (GA or GG) or not in an APOBEC context (GC or GT), and then determined the number of G-to-A substitutions that occurred within each of the two contexts. An APOBEC motif Odds Ratio >1 indicates APOBEC context enrichment.

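Contingency Table

| Group | Clade | All other mutations | G-to-A APOBEC context | G in APOBEC context, no change | G-to-A non APOBEC context | G not in APOBEC context, no change | p-value, 2-sided | q-value | Odds Ratio | GA-to-AA or GG-to-AG |
|---|---|---|---|---|---|---|---|---|---|---|
| Non-APOBEC context | I | 277 | 72 | 36552 | 176 | 30580 | <0.000001 | 0.000006 | 0.3423 | 45 + 27 |
| Non-APOBEC context | IIa | 225 | 65 | 36548 | 132 | 30578 | <0.000001 | 0.000006 | 0.412 | 35 + 30 |
| APOBEC context enriched | Lineage A | 14 | 167 | 36415 | 9 | 30669 | <0.000001 | 0.000006 | 15.63 | 159 + 8 |
| APOBEC context enriched | B.1, N=12 | 0 | 8 | 36517 | 0 | 30675 | 0.0095 | 0.022 | inf | 8 + 0 |
| APOBEC context enriched | B.1, N=397 | 115 | 275 | 35064 | 33 | 29705 | <0.000001 | 0.000006 | 7.06 | 261 + 14 |
| APOBEC context enriched | A.2 | 3 | 52 | 36516 | 3 | 30674 | <0.000001 | 0.000006 | 14.56 | 51 + 1 |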
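The odds ratios and two-sided Fisher's exact p-values in Table 2 can be recomputed from the four contingency counts in each row. The sketch below is a plain-Python illustration, not the authors' code. It sums hypergeometric probabilities no larger than that of the observed table, which is one common definition of the two-sided test, and uses log-gamma to avoid overflow. The function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(a, b, c, d):
    """(a/b) / (c/d) for the table [[a, b], [c, d]]; inf when b*c is zero (assumes a*d > 0)."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        # Hypergeometric probability of x mutated sites falling in row 1.
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(total, col1)

    observed = log_prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(col1, row1) + 1):
        lp = log_prob(x)
        if lp <= observed + 1e-7:  # tables at least as extreme as the observed one
            p_value += exp(lp)
    return min(p_value, 1.0)


# Rows from Table 2: a = G-to-A in APOBEC context, b = unchanged G in APOBEC context,
# c = G-to-A outside APOBEC context, d = unchanged G outside APOBEC context.
lineage_a = (167, 36415, 9, 30669)
b1_n12 = (8, 36517, 0, 30675)

print(round(odds_ratio(*lineage_a), 2), fisher_exact_two_sided(*lineage_a))
print(odds_ratio(*b1_n12), round(fisher_exact_two_sided(*b1_n12), 4))
```

Under these definitions the Lineage A row gives an odds ratio of about 15.63, and the B.1, N=12 row gives a p-value close to the 0.0095 listed in the table.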

We explored the abundance of APOBEC3 motif G-to-A mutations along the evolutionary trajectory from the common ancestor of Lineage A to variants A.2 and B.1 (Figs. 2, S2). The path from the estimated ancestor of Lineage A to the estimated ancestors of both variant B.1 and A.2, including each internal branch, revealed significantly increased APOBEC3 motif G-to-A changes, indicating that sequential acquisition of mutations in distinct individuals sampled since 2017 has contributed to the high proportion of GA-to-AA APOBEC3 context mutations (Fig. S2 and Table S6). Given the limited variation found in 2022 outbreak sequences in variant B.1, there were only a small number of SNPs relative to the ancestor for any one sequence. When considered together, however, all 8 SNPs observed among the 12 variant B.1 genomes available in early May 2022 were GA-to-AA APOBEC3 context mutations (Figs. 2, S2 and Table S6). Outbreak-related sequence data has increased very rapidly, enabling analysis of 397 outbreak genomes sampled between May 1 and July 15, 2022, available through the GISAID data sharing initiative (21). This greatly expanded data set confirmed the pattern observed among the first twelve sequences: 275/308 (89%) of observed unique G-to-A mutations occurred in an APOBEC3 context, and 261/275 (95%) of these were specifically 5’ GA-to-AA (Figs. 2 and S3, Table S7). Beyond variant B.1, by partitioning Lineage A into sublineages (A.2, Nigeria 2017, and Nigeria 2017 – 2019; Fig. 2), we found that G-to-A mutations in an APOBEC3 context were enriched throughout Lineage A, including in A.2 (Figs. 2, S2 and Table S6). Four additional MPXV sequences belonging to variant A.2 have now been detected in India and Thailand (22); 45/47 (96%) of their G-to-A mutations were in an APOBEC3 context, and all 45 of these were specifically 5’ GA-to-AA (Table S7).

In contrast, APOBEC3 context G-to-A changes were lower than expected by chance relative to other G-to-A changes across Clade I and Clade IIa MPXV (Figs. 2 and S2, Table S8). To better resolve where within the phylogeny the switch in mutational patterns arose, we evaluated the mutational frequencies along the internal branches in Clade IIb leading into Lineage A (Figs. 2 and S2, Table S8); these were not found to be statistically enriched for APOBEC3 context G-to-A changes.

Of note, in the July 15 GISAID sample there were three highly related sequences, one each from Italy, Spain and France, that were chimeric in that most of the genome was typical of variant B.1, but each ITR region carried 5 consecutive bases that were typically found among other Lineage A and Clade IIb MPXV but not among B.1 sequences. A detailed analysis of these sequences is provided in Fig. S4, and they were excluded from the analysis shown in Fig. S3. This pattern is potentially indicative of recombination, an important aspect of poxvirus evolution (23); however, assembly errors caused by reference-calling in low coverage regions could not be ruled out as a possible explanation.

Given the clear transition in the mutational pattern in Lineage A relative to Clades I and IIa (Fig. 2), we explored the hypothesis that there was a concomitant change in the evolutionary rate in Lineage A. First, we used the fixed local clock model as implemented in BEAST to compare the estimated mean evolutionary rate of Lineage A (7.2×10^−6 ± 8.9×10^−7, standard deviation) to those of Clade I (1.9×10^−6 ± 3.1×10^−7) and Clade IIa (3.9×10^−6 ± 8.9×10^−7) (Table S9). We also separated Lineage A, retaining Nigerian-SE-1971 as an outgroup, from Clade I and IIa and repeated the analysis using two trees; we felt this to be the more appropriate alternative given the transition in the underlying evolutionary model. In this analysis we found Lineage A had an even higher evolutionary rate estimate of 2.8×10^−5 ± 8.8×10^−6, while estimates for Clades I and IIa were essentially unchanged (Table S9). Thus, using either approach, we observed higher evolutionary rates within Lineage A, consistent with mutations being driven by host editing mediated through APOBEC3, over what is expected by errors in the viral polymerase. An elevated evolutionary rate for clade B.1 outbreak viruses has been similarly reported by others (18, 22). A cautionary note regarding the robustness of estimates of rates of evolutionary change in this scenario is that if the mutations are indeed dominated by host-mediated mutations, different people may have different levels of APOBEC3 activity, and this may change over time. Such transient host-dependent increases in G-to-A APOBEC3 motif enriched mutations in single infected hosts are observed both in HIV-1 infection in humans and SHIV infections in monkeys (24, 25).
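As a worked illustration of the single-tree comparison, the fold differences between the mean rate estimates quoted above can be computed directly. This sketch uses only the three mean values from the fixed local clock analysis, in whatever units Table S9 reports, and ignores their uncertainty, so the ratios are rough guides rather than estimates with error bars. The function name is hypothetical.

```python
# Mean rate estimates quoted in the text for the fixed local clock model.
mean_rates = {
    "Lineage A": 7.2e-6,
    "Clade I": 1.9e-6,
    "Clade IIa": 3.9e-6,
}


def fold_over(rates, focal="Lineage A"):
    """Ratio of the focal clade's mean rate to each other clade's mean rate."""
    return {name: rates[focal] / r for name, r in rates.items() if name != focal}


folds = fold_over(mean_rates)
for clade, fold in folds.items():
    print(f"Lineage A vs {clade}: {fold:.1f}-fold")
```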

All publicly available MPXV genomes from the 2022 Monkeypox outbreak to date belong to Clade II, which may cause less severe disease and have a lower case fatality rate than Clade I MPXV (10–13). Genomes published during the 2022 outbreak share a common ancestor with MPXV sequences from Nigeria (Clade IIb); however, sequences from surrounding countries are limited and most of our understanding of these relationships comes from viruses linked to or identified in Nigeria. The high similarity among genomes of the current predominant 2022 MPXV outbreak variant B.1 is comparable to the 0.4 to 1.5 SNPs reported between genomes from epidemiologically linked samples from the same transmission chain (3). Among Clade IIb sequences, many unique mutations were shared between USA_2021_MD and variant B.1 genomes, further indicating that any evolutionary force leading to these changes most likely preceded the 2022 outbreak and that the changes were present in a common ancestor shared by variant B.1 and USA_2021_MD. In contrast, there were sufficient differences (>30 SNPs) among the three variant A.2 sequences to suggest they likely represent separate introductions into the U.S.

After the re-emergence of MPXV in Nigeria in 2017 and prior to 2022, there had only been two reported events of person-to-person transmission of MPXV outside of Africa (3, 26). Several factors may be contributing to this recent increased spread between humans, including behaviors involving close contact and failure to recognize or diagnose monkeypox to prevent spread. It is currently unclear if APOBEC3 motif or other mutations observed in the 2022 outbreak MPXV variants have affected or will affect viral transmissibility or pathogenicity. An E353K mutation in F13L, which codes for the target of tecovirimat (an antiviral agent indicated for the treatment of smallpox), was shared by all sequences in the predominant 2022 outbreak cluster; functional studies revealed no effect on the efficacy of tecovirimat (EC50 was 0.007731 ± 0.002 μM for 353E and 0.002860 ± 0.001 μM for 353K) (Fig. S5).

APOBEC3 proteins are an important component of the vertebrate innate immune system and restrict the replication of exogenous viruses through cytosine-to-uracil deaminase activity (19, 27). APOBEC3 proteins act mainly on single-stranded DNA and have been extensively studied in RNA viruses, including HIV (20) but can also act on DNA viruses (28, 29). It is unlikely the GA-to-AA substitutions we report here are due to sequencing error as the sequences used in the analyses were produced across many different laboratories using multiple methods, including both short and long read approaches, and both with and without PCR. The presence of mutations consistent with APOBEC3 editing in several branches of the Clade IIb MPXV will cause discordance in estimates of evolutionary rate and divergence time using many standard methods.

Specific enrichment of APOBEC3 motif mutations in Clade IIb MPXV since 2017 may suggest something different in virus-host interactions, facilitating this mutational pattern. Such a pattern might be caused by sustained transmission in a new host or a new route of infection. Primates, including humans, exhibit evolutionary expansion of the APOBEC3 family to seven members that are not seen in rodents (30), which are thought to be the primary reservoir of monkeypox virus in Africa. Additionally, APOBEC3 has been found in higher levels in mucosal tissues than in skin in humans (31), thus permucosal transmission may provide increased opportunity for APOBEC3 editing of MPXV. Prior to this outbreak, APOBEC3 editing was not recognized or appreciated as a mechanism of poxvirus mutation. Two studies in vaccinia virus found (1) overexpression of APOBEC3 had no effect on viral replication and (2) endogenous or overexpressed APOBEC3 was not degraded by vaccinia virus, as it is by other viruses (32, 33). However, neither of these studies performed sequencing. The fact that the enrichment can be observed at essentially all levels within Clade IIb MPXV since 2017 suggests it is a recurrent and dominant mutational effect in recent MPXV evolution. Furthermore, the majority of mutations detected among expanding outbreak samples were 5ʹ GA-to-AA (Fig. 2, Table S7), indicating that this mutational bias is continuing.

Supplementary Material

Fig. S1. Nucleotide changes among Clade IIb MPXV genome sequences. The predominant 2022 MPXV outbreak variant B.1 and outbreak variant A.2 are highlighted. The large node at the center of variant B.1 represents 13 identical sequences; country abbreviations are given for sake of space (sequences used can be found in Table S1). GenBank accession numbers are given for all other samples. Sequence differences between nodes are indicated by the numbers on the branches. Unlabeled nodes represent hypothetical common ancestors, lines connecting nodes do not represent direct links between cases. GBR: United Kingdom; USA: United States of America; FRA: France; BEL: Belgium; PRT: Portugal; ITA: Italy; ESP: Spain; NLD: Netherlands; CHE: Switzerland; SVN: Slovenia.

Fig. S2. A detailed version of the ML tree shown in main text Fig. 2A. Ancestral nodes of interest are noted here, and these are used to track the statistical exploration of G-to-A mutations in an APOBEC context through different lineages in the tree summarized in Fig. 2B in the main text and provided in detail in Tables S6 (for Lineage A) and S8 (for Clade I and II outside Lineage A). The sequence name of each of the taxa are shown, preceded by their GenBank accession number.

Fig. S3. A. Details of mutational patterns among 397 variant B.1 sequences available through GISAID that were sampled between May 1, 2022, and July 15, 2022. Each horizontal line represents a unique sequence, and tick marks indicate all mutations relative to the ancestor of the 2022 outbreak variant B.1. Sequences that are phylogenetically similar based on shared mutations are proximal to each other along the Y-axis, and so vertical columns of mutations are indicative of shared mutations between sequences. One would expect roughly the same number of red and blue mutations if G-to-A mutations were randomly occurring; the preponderance of blue ticks indicates the extreme bias favoring G-to-A mutations embedded in a 5’ GA-to-AA context among these international 2022 outbreak strains. The GA-to-AA pattern dominates (dark blue) over the alternative APOBEC motif 5’ GG-to-AG (cyan), as we have seen throughout Lineage A. 114 sequences among the 397 variant B.1 sequences had no SNPs relative to the ancestral form of the genome and these are not shown in the figure. GY-to-AY (red) indicates a G-to-A change followed by a C or T, the G-to-A mutations that are not in an APOBEC motif. B. Geographic distribution of sampling of the 397 sequences used for part A, based on the sequences available from GISAID sampled between May 1, 2022, and July 15, 2022. The relative area of a given red circle reflects the number of available sequences. This map captures the level of contributions of early monkeypox genomes to GISAID and the geographic distribution of the available data; it is not meant to reflect the confirmed cases.

Fig. S4. Chimeric sequences within Clade IIb. A. We acquired 400 full genome MPXV sequences from GISAID sampled between May 1, 2022, and July 15, 2022, for APOBEC analysis. While confirming their lineage association, we identified 3 related sequences that were chimeric: they carried 5 consecutive SNPs, mirrored in each ITR, that were shared with more ancestral sequences from Lineage A and Clade IIb but not found in variant B.1. Otherwise, these sequences were B.1-like throughout the remainder of the genome. These sequences originated from three different European laboratories*. The graphic is a “Highlighter” plot (a www.hiv.lanl.gov tool) that shows every SNP relative to the USA 2022 MA001 sequence for a small set of background sequences used to explore the hypothesis that the chimeric viruses were the result of recombination. We used the Recombination Analysis Program (RAPR) (PMID: 29765018 https://www.hiv.lanl.gov/content/sequence/RAP2017/rap.html) and excluded the 5’ ITR from the analysis so that we did not overcount the repeated mutations in both ITRs (marked in gray). The putative recombinant sequence from Spain was incomplete and did not span the 5’ ITR region; therefore, we used the 3’ region for analysis to enable including all three related forms. We determined that a string of 5 mutations in series that differed from the representative variant B.1 sequences and that are shared with more ancestral Clade IIb sequences is unlikely to have occurred by chance alone (p-value 3.71e-07). Thus, the RAPR analysis supports this being a recombinant lineage, although alternatively this pattern may have emerged as a systematic sequencing artifact. In either case, the analysis indicated these sequences were chimeric and not simple variant B.1 viruses, and so we removed them from the subsequent analysis of variant B.1 sequences in Fig. S3.
Red sequence names indicate sequences identified by RAPR as representative of candidate parental lineages, and blue the putative recombinant lineage. B. The pattern of interest in the ITR was consistently found in sequences throughout Clade IIb, including samples from the 1970s through contemporary samples excluding the B.1 variant. In the 2021_MD sequence, four of the five variant B.1 mutations were apparent.

*EPI_ISL_13302316: Laboratory of Clinical Microbiology, Virology and Bioemergencies. ASST-Fatebenefratelli-Sacco, L.Sacco University Hospital, Milano, Italy; EPI_ISL_13331716: Genomics Division, Instituto Tecnologico y de Energias Renovables (ITER), Poligono Industrial de Granadilla Santa Cruz de Tenerife, Spain (the 3’ ITR sequence was not available); EPI_ISL_13052287: Virology, GENomique EPIdemiologique des maladies Infectieuses, Lyon, France.

Fig. S5. Sensitivity of 2022 outbreak MPXVs to tecovirimat (TPOXX). A. Aggregate results of cytopathic effect (CPE) assay showing cell growth in the presence of MPXV after treatment with different doses of TPOXX. Error bars indicate 95% confidence intervals based on four statistical and two biological replicates at each dose per group. B. Half-maximal effective concentration EC50 of TPOXX for different MPXV isolates. Average plus 95% confidence intervals were based on 24 values from four statistical and two biological replicates.

Table S1. List of Genbank accession numbers for sequences used in haplotype network analysis (Fig. S1).

Table S2. Variant table showing differences in genomes of USA_2022_MA001 (ON563414.3) and USA_2021_MD (ON676708.1) compared to MT903344.1 UK-P2. Changes shared among USA_2022_MA001 and USA_2021_MD are highlighted in green. Position and annotation information is based on reference MT903344.1.

Table S3. Variant table showing differences in genomes of U.S. variant A.2 USA_2022_FL001 (ON674051.1), USA_2022_VA001 (ON675438.1) and USA_2021_TX (ON676707.1) compared to MT903344.1 UK-P2. Changes shared among all three A.2 sequences are highlighted in green. Position and annotation information is based on reference MT903344.1.

Table S4. Details for U.S. MPXV genomes from nine 2021 – 2022 cases.

Table S5. Summary PCR results for OPX3 (14) and Clade II-specific (15) real-time PCR assay. Average Ct value is based on triplicate testing for each assay. Difference (diff) was calculated by subtracting Clade II-specific average Ct from OPX3 average Ct value.

Table S6. Mutational pattern details within Lineage A, where APOBEC3 motif G-to-A mutations are enriched. Here we compare the frequencies of G-to-A substitutions in APOBEC3 and non-APOBEC3 contexts in Lineage A and each relevant sub-lineage within Lineage A. This table details the mutational patterns throughout the entire Lineage A, and G-to-A mutations in an APOBEC3 context are highly enriched relative to other mutations. First, to determine the overall level of enrichment for these mutations in Lineage A, we compressed every unique SNP mutation that arose within Lineage A onto a single sequence we call “All unique SNPs in Lineage A merged”; the analysis of the merged data is highlighted in gray at the top of the table, and this summary was also used in Figure 2 in the main text. Within Lineage A, 167 G-to-A mutations arose in an APOBEC3 context (5’ GA-to-AA or GG-to-AG, in blue); 36,415 Gs in an APOBEC context, GA or GG in the ancestral sequence, did not change. In contrast, only 9 G-to-A mutations arose outside of an APOBEC3 context (GY-to-AY, where Y is C or T, in red), while 30,669 GY’s remained unchanged. A simple two-sided Fisher’s exact test was used to test the null hypothesis that G-to-A mutations were distributed randomly between the two contexts; q-values (37, 38) were calculated based on all p-values included in Tables S6 and S8 to correct for multiple tests. The G-to-A mutations in an APOBEC3 context are highly enriched. An odds ratio greater than one (highlighted in light blue) indicates more G-to-A mutations in an APOBEC3 context. We also tallied all other SNPs that were not G-to-A (in gray); these were rare, with only 14 in all of Lineage A.
To resolve if the enrichment pattern for G-to-A mutations in an APOBEC context was observed throughout Lineage A, we compared the mutational patterns on the internal branches within Lineage A, and we found all partitions of the data throughout Lineage A were enriched for G-to-A in an APOBEC3 context. While variant B.1, the predominant 2022 outbreak variant, had too few changes between the ancestor and any one leaf for a single sequence to be significant, all of the 8 mutations observed in the 12 earliest sequences from variant B.1 available for our initial analysis were G-to-A in an APOBEC3 context, and this enrichment was significant when considered together as a merged sequence. There were enough mutations in the longer branches in variant A.2 for statistical significance of the pattern to be evident for each of the 3 taxa independently. G-to-A in an APOBEC3 context was also enriched among the Nigerian sequences, in the 2018/2019 export set, and on each of 4 major internal branches within Lineage A.
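The multiple-testing correction mentioned above can be illustrated with a related and widely used procedure. The sketch below implements Benjamini-Hochberg adjusted p-values. This is a swapped-in technique chosen for simplicity and is not necessarily the q-value method cited as (37, 38), so its output should not be expected to reproduce the q-values in the tables. The input p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four separate tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print([round(q, 4) for q in adj])
```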

Table S8. Mutational pattern details in Clade I and II outside of Lineage A, where G-to-A mutations outside of an APOBEC3 motif are modestly enriched, and G-to-A mutations are less favored overall compared to Lineage A. Here we compare Clade I, Clade IIa and Clade IIb data prior to the emergence of Lineage A. In these clades, there was no evident enrichment for G-to-A changes in APOBEC3 context. Instead, there was a modest, but overall significant, enrichment for G-to-A substitutions outside of an APOBEC3 context (the two-sided p-values for the merged data for Clades I and IIa were significant), and the Odds Ratios were less than one throughout these clades, indicating G-to-A changes in APOBEC3 context were relatively diminished compared to other G-to-A mutations.

Table S9. Rate estimates for Lineage A, Clade I and Clade IIa. Analysis was performed in BEAST v1.8.3 using the alignment from Figure 2. Mean rates are shown with 95% highest posterior density intervals in parentheses.

Table S10. GISAID acknowledgement table for variant B.1 sequences

Table S11. GISAID acknowledgement table for variant A.2 sequences

Table S7. Mutational pattern details in viral sequences sampled in humans in 2022, where APOBEC3 motif G-to-A mutations are enriched.

To confirm that the enrichment for G-to-A substitutions was retained in the rapidly expanding available outbreak sequence data, we extended our original analysis to include an additional 397 variant B.1 sequences that were available in GISAID as of July 15, 2022; these data supported our initial findings. A summary of all merged unique mutations in this set of 397 sequences is provided in this table, and the merged data is presented in Fig. 2B in the main text; a detailed graphic displaying the details of the analyses is provided in the supplemental Fig. S3. We also identified four additional variant A.2 sequences, two from Thailand and two from India, that became available through GISAID later in July. These too were highly enriched for G-to-A substitutions, and the statistics are provided in this table. The column headers are described in Table S6. *EPI_ISL_14011193: Bangkok Hospital Phuket, Thai Red Cross Emerging Infectious Diseases Clinical Center and Faculty of Medicine, Chulalongkorn University; EPI_ISL_13953611: Indian Council of Medical Research-National Institute of Virology; EPI_ISL_14049245: Indian Council of Medical Research-National Institute of Virology; EPI_ISL_13983888: Bangkok Hospital Phuket, National Institute of Health, Department of Medical Sciences, Ministry of Public Health, Thailand. The GISAID acknowledgments tables for these two data sets are provided in Tables S10 and S11.

Acknowledgments:

We thank Brian Foley for providing a draft alignment of MPXV. We would like to thank Duncan MacCannell, Ketan Patel, Ying Tao, Sue Tong, Blake Cherney, and John Barnes from the U.S. Centers for Disease Control and Prevention; Alexander Kim and Salimatu Lukula from the Maryland Department of Health Laboratory; Barbara Downes from Fairfax County, Virginia Department of Health; Catherine Brown, Larry Madoff, Mary DeMartino, Erika Buzby, Stephanie Ash, and Joshua Hall from the Massachusetts Department of Health; Juan Jaramillo from Dallas County LRN; and Destiny Hairfield and Kimberly Stratton from Virginia Department of General Services - Division of Consolidated Laboratory Services. We gratefully acknowledge the authors from the originating and submitting laboratories who generated and shared, via GISAID, the genetic sequence data on which this research is based; acknowledgment tables are included (Tables S10 and S11).

Funding:

No external funding was used for the CDC effort in this investigation. BK was supported by the Laboratory Directed Research and Development program of Los Alamos National Laboratory under project number 20220660ER.

Footnotes

Data and materials availability:

All data are available in the main text or public databases and GISAID. MPXV genomes were deposited to GenBank under accession numbers ON563414.3, ON674051, ON675438, and ON676703 – ON676708. Alignments and code have been made available (41–43).

Supplementary Materials

Materials and Methods

Tables S1 – S11

Figures S1 – S5

References 34–43

References and Notes

Supplementary Materials

Fig. S1. Nucleotide changes among Clade IIb MPXV genome sequences. The predominant 2022 MPXV outbreak variant B.1 and outbreak variant A.2 are highlighted. The large node at the center of variant B.1 represents 13 identical sequences; country abbreviations are given to save space (the sequences used can be found in Table S1). GenBank accession numbers are given for all other samples. Sequence differences between nodes are indicated by the numbers on the branches. Unlabeled nodes represent hypothetical common ancestors; lines connecting nodes do not represent direct links between cases. GBR: United Kingdom; USA: United States of America; FRA: France; BEL: Belgium; PRT: Portugal; ITA: Italy; ESP: Spain; NLD: Netherlands; CHE: Switzerland; SVN: Slovenia.

Fig. S2. A detailed version of the ML tree shown in main text Fig. 2A. Ancestral nodes of interest are noted here, and these are used to track the statistical exploration of G-to-A mutations in an APOBEC3 context through different lineages in the tree, summarized in Fig. 2B in the main text and provided in detail in Tables S6 (for Lineage A) and S8 (for Clades I and II outside Lineage A). The sequence name of each taxon is shown, preceded by its GenBank accession number.

Fig. S3. A. Details of mutational patterns among 397 variant B.1 sequences available through GISAID that were sampled between May 1, 2022, and July 15, 2022. Each horizontal line represents a unique sequence, and tick marks indicate all mutations relative to the ancestor of the 2022 outbreak variant B.1. Sequences that are phylogenetically similar based on shared mutations are proximal to each other along the Y-axis, so vertical columns of tick marks indicate mutations shared between sequences. One would expect roughly the same number of red and blue mutations if G-to-A mutations were occurring randomly; the preponderance of blue ticks indicates the extreme bias favoring G-to-A mutations embedded in a 5’ GA-to-AA context among these international 2022 outbreak strains. The GA-to-AA pattern (dark blue) dominates over the alternative APOBEC3 motif 5’ GG-to-AG (cyan), as we have seen throughout Lineage A. GY-to-AY (red) indicates a G-to-A change followed by a C or T, that is, a G-to-A mutation that is not in an APOBEC3 motif. Of the 397 variant B.1 sequences, 114 had no SNPs relative to the ancestral form of the genome, and these are not shown in the figure. B. Geographic distribution of sampling of the 397 sequences used for part A, based on the sequences available from GISAID sampled between May 1, 2022, and July 15, 2022. The relative area of a given red circle reflects the number of available sequences. This map captures the level of contributions of early monkeypox genomes to GISAID and the geographic distribution of the available data; it is not meant to reflect the distribution of confirmed cases.
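The three color categories described in the Fig. S3 caption can be illustrated with a minimal sketch. It classifies each G-to-A change by the base immediately 3’ of the mutated G in the ancestral sequence. The function names and the toy sequences are hypothetical, and the sketch reads one strand only; it is not the code used for the published analysis, which may also account for changes seen as C-to-T on the reported strand.

```python
from collections import Counter

# Base 3' of the mutated G -> context label used in the figure caption.
APOBEC3_CONTEXTS = {"A": "GA-to-AA", "G": "GG-to-AG"}


def classify_g_to_a(ancestral: str, pos: int) -> str:
    """Label a G-to-A change at 0-based index `pos` by its 3' neighbor.

    Hypothetical helper: forward strand only, ancestral context only.
    """
    seq = ancestral.upper()
    if seq[pos] != "G":
        raise ValueError("ancestral base at pos is not G")
    if pos + 1 >= len(seq):
        return "unclassified"  # no 3' neighbor available
    nxt = seq[pos + 1]
    if nxt in APOBEC3_CONTEXTS:
        return APOBEC3_CONTEXTS[nxt]
    if nxt in ("C", "T"):
        return "GY-to-AY"  # Y = C or T, outside the APOBEC3 motif
    return "unclassified"  # ambiguous base such as N


def tally_contexts(ancestral: str, derived: str) -> Counter:
    """Count G-to-A changes per context between two aligned sequences."""
    counts = Counter()
    for i, (a, d) in enumerate(zip(ancestral.upper(), derived.upper())):
        if a == "G" and d == "A":
            counts[classify_g_to_a(ancestral, i)] += 1
    return counts


# Toy aligned pair (made up for illustration, not MPXV data).
toy_counts = tally_contexts("TGATGGCGCA", "TAATAGCACA")
```

In the toy pair, the three G-to-A changes fall one each into the GA-to-AA, GG-to-AG, and GY-to-AY categories; on real data the caption reports a strong excess of the first category.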

Fig. S4. Chimeric sequences within Clade IIb. A. We acquired 400 full-genome MPXV sequences from GISAID sampled between May 1, 2022, and July 15, 2022, for APOBEC3 analysis. While confirming their lineage association, we identified 3 related sequences that were chimeric: they carried 5 consecutive SNPs, mirrored in each ITR, that were shared with more ancestral sequences from Lineage A and Clade IIb but not found in variant B.1. Otherwise, these sequences were B.1-like throughout the remainder of the genome. These sequences originated from three different European laboratories*. The graphic is a “Highlighter” plot (a www.hiv.lanl.gov tool) that shows every SNP relative to the USA 2022 MA001 sequence for a small set of background sequences used to explore the hypothesis that the chimeric viruses were the result of recombination. We used the Recombination Analysis Program (RAPR) (PMID: 29765018 https://www.hiv.lanl.gov/content/sequence/RAP2017/rap.html) and excluded the 5’ ITR from the analysis so that we did not overcount the repeated mutations in both ITRs (marked in gray). The putative recombinant sequence from Spain was incomplete and did not span the 5’ ITR region; therefore, we used the 3’ region for analysis so that all three related forms could be included. We determined that a string of 5 mutations in series that differed from the representative variant B.1 sequences and that are shared with more ancestral Clade IIb sequences is unlikely to have occurred by chance alone (p-value 3.71e-07). Thus, the RAPR analysis supports this being a recombinant lineage, although this pattern may alternatively have emerged as a systematic sequencing artifact. In either case, the analysis indicated that these sequences were chimeric and not simple variant B.1 viruses, so we removed them from the subsequent analysis of variant B.1 sequences in Fig. S3.
Red sequence names indicate sequences identified by RAPR as representative of candidate parental lineages, and blue sequence names indicate the putative recombinant lineage. B. The pattern of interest in the ITR was consistently found in sequences throughout Clade IIb, from samples collected in the 1970s through contemporary samples, excluding the B.1 variant. In the 2021_MD sequence, four of the five variant B.1 mutations were apparent.

*EPI_ISL_13302316: Laboratory of Clinical Microbiology, Virology and Bioemergencies. ASST-Fatebenefratelli-Sacco, L.Sacco University Hospital, Milano, Italy; EPI_ISL_13331716: Genomics Division, Instituto Tecnologico y de Energias Renovables (ITER), Poligono Industrial de Granadilla Santa Cruz de Tenerife, Spain (the 3’ ITR sequence was not available); EPI_ISL_13052287: Virology, GENomique EPIdemiologique des maladies Infectieuses, Lyon, France.

Fig. S5. Sensitivity of 2022 outbreak MPXVs to tecovirimat (TPOXX). A. Aggregate results of the cytopathic effect (CPE) assay showing cell growth in the presence of MPXV after treatment with different doses of TPOXX. Error bars indicate 95% confidence intervals based on four statistical and two biological replicates at each dose per group. B. Half-maximal effective concentration (EC50) of TPOXX for different MPXV isolates. Averages and 95% confidence intervals were based on 24 values from four statistical and two biological replicates.

Table S1. List of GenBank accession numbers for sequences used in the haplotype network analysis (Fig. S1).

Table S2. Variant table showing differences in genomes of USA_2022_MA001 (ON563414.3) and USA_2021_MD (ON676708.1) compared to MT903344.1 UK-P2. Changes shared among USA_2022_MA001 and USA_2021_MD are highlighted in green. Position and annotation information is based on reference MT903344.1.

Table S3. Variant table showing differences in genomes of U.S. variant A.2 USA_2022_FL001 (ON674051.1), USA_2022_VA001 (ON675438.1) and USA_2021_TX (ON676707.1) compared to MT903344.1 UK-P2. Changes shared among all three A.2 sequences are highlighted in green. Position and annotation information is based on reference MT903344.1.

Table S4. Details for U.S. MPXV genomes from nine 2021 – 2022 cases.

Table S5. Summary of PCR results for the OPX3 (14) and Clade II-specific (15) real-time PCR assays. The average Ct value is based on triplicate testing for each assay. The difference (diff) was calculated by subtracting the Clade II-specific average Ct from the OPX3 average Ct value.

Table S6. Mutational pattern details within Lineage A, where APOBEC3 motif G-to-A mutations are enriched. Here we compare the frequencies of G-to-A substitutions in APOBEC3 and non-APOBEC3 contexts in Lineage A and each relevant sub-lineage within Lineage A. This table details the mutational patterns throughout the entire Lineage A; G-to-A mutations in an APOBEC3 context are highly enriched relative to other mutations. First, to determine the overall level of enrichment for these mutations in Lineage A, we compressed every unique SNP mutation that arose within Lineage A onto a single sequence we call “All unique SNPs in Lineage A merged”; the analysis of the merged data is highlighted in gray at the top of the table, and this summary was also used in Figure 2 in the main text. Within Lineage A, 167 G-to-A mutations arose in an APOBEC3 context (5’ GA-to-AA or GG-to-AG, in blue); 36,415 Gs in an APOBEC3 context (GA or GG in the ancestral sequence) did not change. In contrast, only 9 G-to-A mutations arose outside of an APOBEC3 context (GY-to-AY, where Y is C or T, in red), while 30,669 GYs remained unchanged. A simple two-sided Fisher’s exact test was used to test the null hypothesis that G-to-A mutations were distributed randomly between the two contexts; q-values (37, 38) were calculated based on all p-values included in Tables S6 and S8 to correct for multiple tests. G-to-A mutations in an APOBEC3 context are highly enriched. An odds ratio greater than one (highlighted in light blue) indicates more G-to-A mutations in an APOBEC3 context. We also tallied all other SNPs that were not G-to-A (in gray); these were rare, with only 14 in all of Lineage A.
To determine whether the enrichment for G-to-A mutations in an APOBEC3 context was observed throughout Lineage A, we compared the mutational patterns on the internal branches within Lineage A, and we found that all partitions of the data throughout Lineage A were enriched for G-to-A in an APOBEC3 context. While variant B.1, the predominant 2022 outbreak variant, had too few changes between the ancestor and any one leaf for a single sequence to be significant, all 8 mutations observed in the 12 earliest sequences from variant B.1 available for our initial analysis were G-to-A in an APOBEC3 context, and this enrichment was significant when the sequences were considered together as a merged sequence. There were enough mutations in the longer branches in variant A.2 for statistical significance of the pattern to be evident for each of the 3 taxa independently. G-to-A in an APOBEC3 context was also enriched among the Nigerian sequences, in the 2018/2019 export set, and on each of 4 major internal branches within Lineage A.
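The merged Lineage A comparison described above can be reproduced as a 2×2 table using the four counts quoted in the caption. The sketch below is a standard-library implementation of the sample odds ratio and a two-sided Fisher’s exact test. The function names are hypothetical, the two-sided p-value uses the common convention of summing all tables no more probable than the observed one, and the q-value correction (37, 38) is not reproduced; it is an illustration, not the authors’ code.

```python
from math import exp, lgamma


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def log_p(x: int) -> float:
        return log_comb(row1, x) + log_comb(row2, col1 - x) - log_comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = log_p(a)
    total = sum(
        exp(log_p(x)) for x in range(lo, hi + 1) if log_p(x) <= observed + 1e-7
    )
    return min(total, 1.0)


# Counts quoted in the Table S6 caption (merged Lineage A data):
# rows = APOBEC3 context, non-APOBEC3 context; columns = G-to-A, unchanged G.
a, b = 167, 36415
c, d = 9, 30669

lineage_a_or = odds_ratio(a, b, c, d)
lineage_a_p = fisher_exact_two_sided(a, b, c, d)
print(f"odds ratio = {lineage_a_or:.2f}, two-sided p = {lineage_a_p:.3g}")
```

With these counts the odds ratio is about 15.6, well above one, which matches the direction of enrichment reported in the table; where SciPy is available, `scipy.stats.fisher_exact` offers an equivalent test.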

