PLOS Biology. 2020 Feb 11;18(2):e3000611. doi: 10.1371/journal.pbio.3000611

Combining genomics and epidemiology to track mumps virus transmission in the United States

Shirlee Wohl 1,2,*,#, Hayden C Metsky 1,3,#, Stephen F Schaffner 1,2,4,#, Anne Piantadosi 1,5, Meagan Burns 6, Joseph A Lewnard 7, Bridget Chak 1,2, Lydia A Krasilnikova 1,2, Katherine J Siddle 1,2, Christian B Matranga 1, Bettina Bankamp 8, Scott Hennigan 6, Brandon Sabina 6, Elizabeth H Byrne 1,2, Rebecca J McNall 8, Rickey R Shah 1,2, James Qu 1, Daniel J Park 1, Soheyla Gharib 9, Susan Fitzgerald 9, Paul Barreira 9, Stephen Fleming 6, Susan Lett 6, Paul A Rota 8, Lawrence C Madoff 6,10, Nathan L Yozwiak 1,2, Bronwyn L MacInnis 1,4,‡,*, Sandra Smole 6, Yonatan H Grad 4,11,12, Pardis C Sabeti 1,2,4,13
Editor: Sara Y Del Valle 14
PMCID: PMC7012397  PMID: 32045407

Abstract

Unusually large outbreaks of mumps across the United States in 2016 and 2017 raised questions about the extent of mumps circulation and the relationship between these and prior outbreaks. We paired epidemiological data from public health investigations with analysis of mumps virus whole genome sequences from 201 infected individuals, focusing on Massachusetts university communities. Our analysis suggests continuous, undetected circulation of mumps locally and nationally, including multiple independent introductions into Massachusetts and into individual communities. Despite the presence of these multiple mumps virus lineages, the genomic data show that one lineage has dominated in the US since at least 2006. Widespread transmission was surprising given high vaccination rates, but we found no genetic evidence that variants arising during this outbreak contributed to vaccine escape. Viral genomic data allowed us to reconstruct mumps transmission links not evident from epidemiological data or standard single-gene surveillance efforts and also revealed connections between apparently unrelated mumps outbreaks.


A study of recent mumps outbreaks in the United States reveals transmission patterns within and between Massachusetts communities and highlights the importance of combining detailed genomic and epidemiological data, both locally and nationwide.

Introduction

An unusually large number of mumps cases were reported in the United States in 2016 and 2017, despite high rates of vaccination [1,2]. In the prevaccination era, mumps was a routine childhood disease, with over 150,000 cases reported in the US annually [1]. After the mumps vaccine was introduced in 1967, mumps incidence declined by more than 99% [1]. Case counts rose again briefly in the mid-1980s and then continued to decrease after a national outbreak of measles prompted the recommendation of 2 Measles-Mumps-Rubella (MMR) vaccine doses in 1989 [3]. In the early 2000s, only a few hundred cases of mumps were reported annually in the US [1], attesting to the success of vaccination, possibly combined with decreasing clinical suspicion. This apparently low nationwide incidence was interrupted by an outbreak of >5,000 cases in the Midwestern US in 2006 [4], followed by a period of low incidence with minor outbreaks until 2016. This recent resurgence in mumps is partially explained by waning vaccine-induced immunity [5], but the extent to which genetic changes in circulating viruses have contributed is not yet clear.

In Massachusetts, over 250 cases were reported in 2016 and over 170 in 2017, far exceeding the usual state incidence of <10 cases per year [6]. As seen in other recent outbreaks, most cases were associated with academic institutions [4] and other close-contact settings, including prisons [7] and tightly-knit ethnic and religious communities [8,9]. Mumps was reported to the Massachusetts Department of Public Health (MDPH) by 18 colleges and universities in the state, including Harvard University (Harvard), University of Massachusetts Amherst (UMass), and Boston University (BU)—the 3 institutions with the largest numbers of reported cases. Of the individuals infected, 65% had the recommended 2 doses of the MMR vaccine (S1 Table).

We used whole genome sequencing, phylogenetic analysis, and transmission reconstruction to investigate the spread of mumps at multiple geographic scales, including within a college campus, more widely in Massachusetts, and across the US. Pathogen sequence data have become an important tool for understanding the spread of infectious diseases in near real time, allowing researchers to pinpoint outbreak origins [10,11], resolve transmission patterns [12], and detect changes throughout the genome that could affect disease severity or the effectiveness of vaccines and diagnostics [13–16]. Such data have been shown to be most useful when analyzed alongside epidemiological data [12,17,18], although the field is still exploring in detail how genomics can contribute to understanding and controlling outbreaks [19]. Mumps outbreaks in 2016 and 2017 in the US, particularly those in universities, provided an opportunity to apply these ideas to the mumps virus and to further this exploration in the context of a closely monitored, largely self-contained campus setting.

Results and discussion

We generated 203 whole mumps virus genomes from buccal swabs from patients who tested positive by polymerase chain reaction (PCR) for mumps virus (Fig 1A), of which 158 were from Massachusetts during the 2016 and 2017 outbreak (Fig 1B), with 92 from Harvard, BU, or UMass in particular. The remaining 43 genomes were from 15 other states, collected between 2014 and 2017 (Table 1). These 203 genomes come from 259 PCR-positive samples, 56 of which were excluded because they did not produce data that met our definition of a complete genome (see Materials and methods) because of low viral loads, often caused by collecting the samples too late in the course of infection (S1A and S1B Fig). These 56 samples that did not produce genomes were well distributed across the 2016 through 2017 study period and showed no geographic clustering (Massachusetts versus elsewhere).

Fig 1. Massachusetts mumps outbreak overview.

(A) Maximum clade credibility tree of 225 mumps virus genotype G whole genome sequences, including 200 generated in this study. Labels on internal nodes indicate posterior support. Clades I and II contain 91% of the samples from the 2016 and 2017 Massachusetts (MA) outbreak; I-outbreak and II-outbreak are the largest clades within them that contain only samples from the outbreak. Clade 0-UM contains samples associated with UMass other than those in Clades I and II; the same is true for 0-BU (BU) and 0-HU (Harvard). II-community contains primarily samples associated with a local Massachusetts community. (B) Number of reported mumps cases in 2016 by epidemiological week in Massachusetts (gray) and in this study (blue). (C) Probability distributions for the date of the most recent common ancestor (computed from tMRCA) of selected clades (see S2 Table for additional clades). Dotted line is the mean of each distribution. BU, Boston University; HU, Harvard University; tMRCA, time to the most recent common ancestor; UM, University of Massachusetts Amherst; UMass, University of Massachusetts Amherst.

Table 1. Summary of samples and genomes.

Counts of samples sequenced and genomes generated by source (MDPH or CDC), date, and mumps virus PCR result. [G] indicates genotype G genomes. Two genomes are from a second sample of a patient already included in the data set.

Source | Dates | PCR result | Samples | Genomes | Genomes (unique patients) | Genomes [G] (unique patients)
CDC | 2014–2015 | + | 26 | 18 | 18 | 18
CDC | 2016–2017 | + | 33 | 25 | 25 | 25
MDPH | 2014–2015 | + | 6 | 2 | 2 | 2
MDPH | 2016–2017 | + | 194 | 158 | 156 | 155
MDPH | 2016–2017 | − | 29 | 0 | 0 | 0
Total | | | 288 | 203 | 201 | 200

Abbreviations: CDC, US Centers for Disease Control and Prevention; MDPH, Massachusetts Department of Public Health; PCR, polymerase chain reaction
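The counts in Table 1 can be cross-checked against the totals quoted in the text with a short tally. The sketch below is illustrative only: the row values are copied from the table, the PCR-negative row is identified from the statement that 29 PCR-negative samples were sequenced, and all variable and function names are hypothetical.

```python
# Rows of Table 1: (source, dates, PCR result, samples, genomes,
# genomes from unique patients, genotype G genomes from unique patients).
TABLE_1 = [
    ("CDC", "2014-2015", "+", 26, 18, 18, 18),
    ("CDC", "2016-2017", "+", 33, 25, 25, 25),
    ("MDPH", "2014-2015", "+", 6, 2, 2, 2),
    ("MDPH", "2016-2017", "+", 194, 158, 156, 155),
    ("MDPH", "2016-2017", "-", 29, 0, 0, 0),  # PCR-negative samples
]


def column_totals(rows):
    """Sum the four numeric columns (indices 3 to 6) across all rows."""
    return tuple(sum(row[i] for row in rows) for i in range(3, 7))


totals = column_totals(TABLE_1)
# Samples that tested positive by PCR, regardless of sequencing outcome.
pcr_positive = sum(row[3] for row in TABLE_1 if row[2] == "+")
# PCR-positive samples that did not yield a complete genome.
excluded = pcr_positive - totals[1]

print("Column totals:", totals)
print("PCR-positive samples:", pcr_positive)
print("Excluded samples:", excluded)
```

The tally reproduces the Total row of the table and the 259 PCR-positive samples and 56 exclusions described in the text.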

The median sequencing depth of the 203 successful genomes was 176x (first quintile: 42.4x; fourth quintile: 651.8x; S1C Fig) and all were >82% complete (182 genomes were >99% complete). These full genomes provide substantially more data than sequences of the small hydrophobic (SH) gene alone (accounting for <3% of the genome), which is conventionally used for classifying mumps virus [20,21]. For the Massachusetts samples, epidemiological data (including university affiliation and vaccination status) and contact tracing data collected by the MDPH and local universities were also available (S1 Table; see S1 Data for full line list). We note that 2 of the individuals in the Massachusetts data set had paired samples from 2 timepoints; in both cases, we excluded the genome from the later sample in subsequent analyses, leaving 201 genomes from unique patients. Of the individuals from Massachusetts in this set, 72% had known vaccination status, of whom 93% had 1 or more MMR doses. We also sequenced 29 PCR-negative samples from suspected mumps cases; in only one was there limited evidence for mumps virus, but we did identify 4 other viruses (one in each of 4 of these samples; S3 Table), 2 of which are known to cause parotitis [22], a characteristic symptom of mumps [23].

The viral whole genome data led to 2 key findings about the origin and spread of the recent outbreaks. First, our analysis revealed that the outbreaks—in Massachusetts and also more broadly in the US—were largely the product of a single mumps lineage and that this lineage was responsible for most if not all US mumps outbreaks since at least 2006. This lineage belongs to mumps virus genotype G [20,21] as do all but 1 of the 201 genomes from unique patients in our data set. Unless otherwise stated, all subsequent analyses refer only to these 200 genotype G genomes (Table 1).

Phylogenetic analysis of these 200 genomes, along with the other 25 publicly available genotype G genomes, shows that almost all (192 of the 200 samples in our data set; 211 of the 225 total) belong to a single lineage within genotype G, despite having been collected from all over the country (Fig 1A; also supported by a principal component analysis; see S2 Fig). This lineage descends from the US mumps outbreak in 2006. Single-gene data (see discussion of SH gene below) show no evidence for extensive transmission of this lineage outside the US, suggesting that most mumps cases in the US since 2006 are the result of ongoing transmission of a single lineage within the United States; specifically, this connects previously unassociated cases (with published genomes) between 2006 and 2016 [1,8,24]. This also suggests that unreported infections may be common and may be driving ongoing transmission [23,25].

The distribution of cases within the phylogenetic tree further suggests geographic movement of mumps virus on short time scales. For example, clade I (Fig 1A) contains virus genomes collected in 2015 through 2017 from Massachusetts, from elsewhere in the Northeast, from the South, and from the Midwest. The date of the most recent common ancestor of this clade was most likely in 2013 or later (Fig 1C and S2 Table), which implies that the virus spread to all of those regions in less than 5 years. Similarly, clade II contains genomes from all of those regions as well as one from the western US, all collected during the same time period. Clade II likely dates from no earlier than 2014, suggesting spread to these regions occurred in an even shorter amount of time. Together, these observations provide strong evidence for regional circulation in the Northeast, coupled with at least occasional movement to more distant geographic regions. Further sampling of these other regions may reveal additional long-distance dispersal events, which would indicate widespread geographic movement of mumps. This would not be surprising if transmission frequently occurs through college students, many of whom reside in close-contact settings and also travel long distances to return home.

The second key finding about the recent spread of mumps virus relates to its origins in Massachusetts. Although this 2006 lineage dominated the Massachusetts outbreak, the outbreak was not caused by a single introduction of mumps virus into the state but rather consists of at least 6 distinct viral clades (Fig 1A). Several of these were sublineages of the dominant 2006 lineage, including the 2 largest, Clades I and II, which comprise more than 90% of samples from Massachusetts (13% and 77%, respectively). These 2 clades diverged before the 2016 outbreak began (Fig 1C and S2 Table), indicating that they independently contributed to it. The remaining 4 clades (0-UM, 0-HU, 0-BU, plus one genotype K genome) likewise represent independent introductions, one of which (0-BU) also falls within the 2006 lineage. Such multiple introductions, indicating widespread transmission of mumps virus, have been observed elsewhere in the US [25] and are also seen in this data set within individual universities: samples from UMass span 2 clades (II and 0-UM), Harvard samples span 3 clades (I, II, and 0-HU), and BU samples span 3 clades (I, II, and 0-BU) (S3 Fig). This emerging pattern of complexity, which has only become apparent with viral genomic data, provides a previously inaccessible look into the degree to which mumps is moving within and between states and communities in the US.

Epidemiological investigation showed that each of the 4 minor clades described above contains at least one sample from a patient with foreign travel history during the incubation period of the virus (12–25 days before symptom onset [26]; S1 Data). Together, the genomic and epidemiological data suggest that most mumps virus cases in the US belong to the primary 2006 lineage but that small clusters of cases can be attributed to repeated importation of the virus from outside the country. These imported cases do not appear to be important contributors to the overall burden of mumps in the US.

The finding that a single mumps lineage has been successful in a highly vaccinated population [2], despite repeated introductions of other lineages, raised the possibility that mutations within this lineage have contributed to its success, perhaps by enabling vaccine escape. Because mutations contributing to the success of the entire lineage necessarily occurred early in the evolution of that lineage, we note here the differences between fixed mutations in our data set and the strain used in the mumps vaccine.

We found that there were numerous fixed differences between samples from the 2016 outbreak and the Jeryl Lynn vaccine strain in regions of immunological significance (S4A Fig and S1 Text), consistent with a recent similar analysis [27]. In the hemagglutinin-neuraminidase (HN) protein, the primary target of neutralizing antibodies [28], we observed 32 sites with fixed amino acid substitutions between our sequences and the Jeryl Lynn strain. Thirty of these sites were conserved between our sequences and a cell-passaged clinical strain that was isolated from Iowa in 2006 (accession: JX287385), near the beginning of mumps resurgence in the US. The Iowa 2006 strain was previously shown to be neutralized by sera from both vaccinated and naturally infected individuals, but to a lower degree than neutralization of the Jeryl Lynn strain itself [29–31], raising the possibility that some of these mutations may confer partly reduced neutralization susceptibility. In addition, we observed 2 positions at which our sequences differed from both the Jeryl Lynn and the Iowa 2006 strains. At these 2 positions, the variant observed in our sequences was also present in most other genotype G sequences published to date, including in sequences from a recent study from the Netherlands [28]. Further studies are warranted to test the neutralization susceptibility of strains containing these variants, because the Iowa 2006 sequence may not be fully representative of most currently circulating genotype G viruses.

We also looked for any evidence of ongoing adaptation to the vaccine during the outbreak. We considered this as a possibility because the vaccine was introduced relatively recently in the history of the mumps virus, recently enough that the virus could still be adapting to it. Additionally, in the absence of widespread natural infection, vaccination now constitutes the largest immunological selective force on mumps virus in the US. To investigate this, we paired genomic data with vaccination records to look for any evidence of changes in the mumps virus genome during the outbreak that led to antigenic variation from the Jeryl Lynn vaccine strain.

We first tested whether nucleotide substitutions in genomes from the Massachusetts outbreak clustered by time since vaccination, or whether vaccinated individuals clustered on certain branches of the phylogenetic tree; neither was the case (S4B and S4E Fig). Second, we looked for signals of positive selection (using the dN/dS statistic) in the 225 genotype G genomes in our data set; a signal here would suggest that nonsynonymous mutations in a particular gene were being favored by ongoing selective pressure. We found no evidence for selection in any gene or at any specific site (S4C and S4D Fig). Thus, we did not find direct evidence of genetic variants arising within this outbreak that contributed to vaccine escape, although we note that both tests have quite limited statistical power in this data set. This finding is consistent with a recent study that proposed waning vaccine-induced immunity as a driving factor in recent US mumps outbreaks [5]; this hypothesis is also supported by our own data, in which we find that time since vaccination differs between Massachusetts individuals testing positive and negative for mumps virus by PCR in 2016 through 2017, with longer times since vaccination observed in mumps-positive patients (S5 Fig).
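The dN/dS statistic compares the rate of nonsynonymous substitutions per nonsynonymous site with the rate of synonymous substitutions per synonymous site. The following is a deliberately simplified counting sketch for two aligned, in-frame coding sequences, assuming the standard genetic code. It is not the method used in this study: it ignores codons that differ at more than one position, applies no correction for multiple substitutions, and treats changes to stop codons as nonsynonymous. All function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3).

    Each of the 9 possible single-base changes contributes 1/3 of a site
    if it leaves the encoded amino acid unchanged.
    """
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3
    return total


def count_substitutions(seq_a, seq_b):
    """Return (syn_diffs, nonsyn_diffs, syn_sites, nonsyn_sites)."""
    syn_diffs = nonsyn_diffs = 0
    syn_sites = 0.0
    n_codons = 0
    for i in range(0, len(seq_a) - 2, 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        if a not in CODON_TABLE or b not in CODON_TABLE:
            continue  # skip codons with gaps or ambiguous bases
        n_codons += 1
        syn_sites += (synonymous_sites(a) + synonymous_sites(b)) / 2
        # Simplification: only codons with exactly one difference are counted.
        if sum(x != y for x, y in zip(a, b)) == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    nonsyn_sites = 3 * n_codons - syn_sites
    return syn_diffs, nonsyn_diffs, syn_sites, nonsyn_sites


def dn_ds(seq_a, seq_b):
    """Uncorrected dN/dS ratio, or None when it is undefined."""
    sd, nd, s, n = count_substitutions(seq_a, seq_b)
    if sd == 0 or s == 0 or n == 0:
        return None
    return (nd / n) / (sd / s)


# Toy example: one synonymous (AAA -> AAG) and one nonsynonymous
# (AAA -> GAA) change.
print(dn_ds("AAAAAA", "AAGGAA"))
```

A ratio well above 1 in a gene would be the kind of signal interpreted as positive selection, whereas ratios below 1 are typical of genes under purifying selection.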

Understanding transmission routes can be crucial in guiding the public health response to an outbreak: for example, whether efforts should be directed toward controlling mumps spread within a university or preventing virus importation. In the Massachusetts mumps outbreak, detailed genomic data allowed us both to confirm connections suggested by public health investigation and to identify new links between cases. The phylogeny described above shows that mumps samples from different Massachusetts universities were genetically similar and fell within the 2 primary clades (S3 Fig), consistent with the epidemiological interpretation that these contemporaneous cases were part of 1 large mumps outbreak in 2016 through 2017. It also showed an unexpected connection between mumps cases in a local, nonacademic community (Clade II-community) and those at Harvard: the II-community cases fall within the predominantly Harvard Clade II, suggesting a spillover event from the university into the wider community (Fig 2A). Cases in these 2 clades were classified as distinct outbreaks during initial public health investigation based on epidemiological data [32], different demographic makeup of the 2 populations (older adults with no obvious university connection versus mostly college-aged students), and a 5-month gap between the last confirmed cases at Harvard and the cases in the local community. However, the genomic data clearly suggest a connection, supported by additional epidemiological investigation that identified 3 individuals affiliated with both Harvard and the local community who could have acted as transmission links.

Fig 2. Epidemiological modeling and transmission reconstruction.

(A) Zoom view of Clade II-community and its ancestors (see Fig 1A). Arrows: individuals affiliated with both II-community and Harvard. (B) Number of importations into Harvard calculated without (left) and with (right) viral genetic information as input. Each point represents a sample from the posterior distribution of RE(t = 0) and the number of introductions, based on simulated transmission dynamics. (C) Transmission reconstruction of individuals within Clade II-outbreak; samples are colored by institution affiliation (light purple: other institution; n/a: no affiliation; question mark: unknown affiliation). Left: reconstruction using epidemiological data only; all individuals in Clade II-outbreak with known epidemiological links (red arrows) are shown. Right: reconstruction using mumps genomes and collection dates. Arrow shading indicates probability of direct transmission between individuals (minimum probability shown: 0.3); cases with 1 or more inferred links are shown and are colored by institution. Arrows outlined in red represent transmission events identified by both genomic and epidemiological data. Faded nodes are those only connected by shared activity links (i.e., no inferred or known direct transmission). BU, Boston University; Harvard, Harvard University; RE, effective reproduction number; UMass, University of Massachusetts Amherst.

Because of the large number of cases reported and sequenced and the contact tracing information available, we were able to quantify mumps transmission dynamics within Harvard. We first used an epidemiological model (S6 Fig) [5] without genomic data to estimate transmission within the university, but it did not permit us to distinguish between a single mumps introduction followed by high transmission and multiple introductions followed by low transmission (Fig 2B left). We then modified the model to incorporate the number of viral lineages (Clades 0-HU, II-community, and 2 subclades within Clade II) observed within Harvard. This markedly improved our ability to distinguish these scenarios, supporting an estimate of 5 (95% CI: 4–18) distinct introductions to Harvard and an effective reproduction number (RE) of 1.70 (95% CI: 1.50–1.91; Fig 2B right). An RE well above 1.0 means that the outbreak could be self-sustaining, which highlights the importance of controlling on-campus transmission during mumps outbreaks.
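To illustrate why an effective reproduction number above 1 implies a potentially self-sustaining outbreak, the sketch below computes the expected number of new cases per transmission generation under a simple branching process with constant RE. This is a toy calculation, not the epidemiological model used in this study: it ignores depletion of susceptible individuals, interventions, and chance effects, and the function name and parameters are hypothetical.

```python
def expected_cases_per_generation(r_e, generations, introductions=1):
    """Expected new cases in each generation of a simple branching process.

    Generation 0 holds the introduced cases; each case then produces
    r_e new cases on average, so generation g holds introductions * r_e**g.
    """
    return [introductions * r_e ** g for g in range(generations + 1)]


# Point estimates quoted in the text: RE = 1.70 and 5 introductions.
growing = expected_cases_per_generation(1.70, generations=4, introductions=5)
# For comparison, a hypothetical RE below 1 with the same introductions.
fading = expected_cases_per_generation(0.80, generations=4, introductions=5)

print([round(x, 1) for x in growing])
print([round(x, 1) for x in fading])
```

With RE above 1 the expected case count grows from one generation to the next, whereas with RE below 1 each introduction tends to die out, so case numbers would then depend mainly on the rate of new introductions.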

The high-resolution data from Harvard allowed us to estimate transmission links between individual cases, which can aid in targeting containment efforts aimed at high-risk individuals or groups. For this purpose, we focused on Clade II-outbreak because it was largely contained within a single institution (Harvard) and had dense sampling. When we attempted to link individual mumps cases within this clade using contact tracing data alone, we could only infer direct mumps transmissions (“contact links”) between 2 pairs of individuals (Fig 2C left). We then used the genomic data to validate genetic distance as a proxy for epidemiological linkage (S7A and S7B Fig) and, based on these results, used genomic data and sampling dates to reconstruct possible mumps transmissions. The reconstruction (Fig 2C right, see also S7C Fig), which estimates and assigns a probability to direct transmission links, supports 1 of the 2 pairs of direct mumps transmissions as well as 2 “shared activity links” (see Materials and methods) and also suggests many new links between individuals without any known contacts. Indirect links, however, are difficult to identify using genomic data alone, illustrating the complementary contributions of genomic and epidemiological data for reconstructing detailed transmission, which can guide response efforts for mumps and other outbreaks in close-contact settings.
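In its simplest form, the genetic distance between two aligned genomes is the number of sites at which they differ. The sketch below is a minimal illustration that assumes pre-aligned sequences of equal length and skips ambiguous or missing bases. It is not the procedure used in this study, and the sample names and sequences are made up.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have an unambiguous
    base (A, C, G, or T) and the two bases differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_distances(genomes):
    """Map each unordered pair of sample names to its SNP distance."""
    return {
        (x, y): snp_distance(genomes[x], genomes[y])
        for x, y in combinations(sorted(genomes), 2)
    }


# Hypothetical toy alignment; N marks a position without sequence coverage.
toy_genomes = {
    "case_1": "ACGTACGTAC",
    "case_2": "ACGTACGTAT",
    "case_3": "ACNTACCTAT",
}
for pair, distance in pairwise_distances(toy_genomes).items():
    print(pair, distance)
```

Pairs of cases separated by few differences are candidates for close epidemiological linkage, which is the relationship that was validated against contact tracing data before the reconstruction.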

Conventional sequence-based mumps surveillance has been limited to the SH gene. The SH gene is a small (316-nucleotide), convenient target for sequencing [20,33] and is thus the region for which the most sequence data are available. We used the 3,646 publicly available SH sequences from mumps cases around the world (Fig 3A) to assess whether SH sequences would be adequate to distinguish the lineages and transmission patterns identified above. To do this, we constructed the mumps phylogeny using the SH sequence alone and compared it to the whole genome phylogeny. There was limited variation within each genotype (genotype G shown in Fig 3B), and neither of our main phylogenetic conclusions based on whole genome data could be ascertained from SH alone: we could not determine the relationship between the spillover into the local community and its source in Harvard (other than that they belonged to the same clade), and we were unable to determine the relationship between the 2016 and 2017 outbreak and the 2006 lineage (S8 Fig). Our other key findings, such as the detailed picture of transmission within Harvard and a refined RE estimate, rely on these 2 conclusions. Nevertheless, SH data were sufficient to resolve population structure on a global scale and to confirm that the 11 known clinically relevant genotypes are associated with particular world regions (Fig 3C) [34]. Moreover, an analysis of global migration reveals significant movement of mumps virus between the US and Europe. For details on the global SH analysis, including a discussion of its limitations, see S1 Text, S9 and S10 Figs.

Fig 3. Global spread of mumps virus based on SH gene sequences.

Colors in all panels are by region (legend in bottom right). (A) Number of SH sequences in our data set from each of the 15 regions. (B) Identical genotype G sequences over time from 1995 through 2017. Each dot represents a sample; each row contains samples with identical SH sequences, except the bottom, which includes samples with sequences distinct from those in the above 5 categories. Numbers on the right: percentage of all genotype G samples found in that row. (C) Maximum clade credibility tree of 3,646 publicly available SH gene sequences, including 193 complete SH sequences generated in this study. SH, small hydrophobic.

The combination of high-quality genomic and epidemiological data from the Massachusetts mumps outbreak revealed the extent to which mumps is circulating in the US, connected previously unrelated outbreaks, and allowed us to trace transmission within and between individual communities. Given the high-quality genomic data we were able to produce from mumps clinical samples, as well as the limited information that can be gleaned from SH sequencing, it is worth considering whether future public health surveillance of mumps should incorporate whole genome sequencing. The collection of these detailed data, which we have made available to the community (see Materials and methods), was only possible through extensive collaboration between state and national public health agencies, academic researchers, and affected universities throughout the greater Boston, Massachusetts area. We hope that these partnerships, fostered in response to a surge in mumps cases in Massachusetts in 2016 and 2017, will facilitate real-time genomic and epidemiological data generation, analysis, and sharing in future outbreaks of any pathogen.

Materials and methods

Ethics statement

The study protocol was approved by the MDPH, Centers for Disease Control and Prevention (CDC), and Massachusetts Institute of Technology (MIT) Institutional Review Boards (IRB) (MDPH IRB 00000701, project 906066). Harvard University Faculty of Arts and Sciences and the Broad Institute ceded review of sequencing and secondary analysis to the MDPH IRB through authorization agreements. The MDPH IRB waived informed consent given this research met the requirements pursuant to 45 CFR 46.116 (d). The CDC IRB determined this project to be nonhuman subjects research as only deidentified leftover diagnostic samples were utilized. In compliance with the IRB agreement, Harvard University, University of Massachusetts Amherst, and Boston University granted approval for publication of their institution names in this paper.

Sample collections and study subjects

Buccal swab samples were obtained from suspected and confirmed mumps cases tested at MDPH and CDC. Samples from MDPH (“Cases in Study,” Fig 1B) include all cases with a positive mumps PCR result (see “PCR diagnostic assays performed at MDPH and CDC” below) collected between 1 January 2014 and 30 June 2017. Demographic information for all cases reported in Massachusetts (Fig 1B and Table 1) includes all confirmed and probable mumps cases reported to MDPH in that time period. Probable cases include cases with a positive mumps IgM assay result or those with an epidemiological link to a confirmed case [35]. Samples from CDC are a selection of PCR-positive cases submitted to the CDC for testing between 2014 and 2017. See S1 Data for deidentified information, including metadata, about study participants.

Viral RNA isolation

Sample inactivation and RNA extraction were performed at the MDPH, Broad Institute, and CDC. At MDPH, viral samples were inactivated by adding 300 μL Lysis/Binding Buffer (Roche) to 200 μL sample, vortexing for 15 seconds, and incubating lysate at room temperature for 30 minutes. RNA was then extracted following the standard external lysis extraction protocol from the MagNA Pure LC Total Nucleic Acid Isolation Kit (Roche) using a final elution volume of 60 μL. At the Broad Institute, samples were inactivated by adding 252 μL Lysis/Binding Buffer (ThermoFisher) to 100 μL sample. RNA was then extracted following the standard protocol from the MagMAX Pathogen RNA/DNA Kit (ThermoFisher) using a final elution volume of 75 μL. At CDC, RNA extraction followed the standard protocol from the QiaAmp Viral RNA mini kit (Qiagen).

PCR diagnostic assays performed at MDPH and CDC

Diagnostic tests for presence of mumps virus were performed at the MDPH and CDC using the CDC Real-Time (TaqMan) RT-PCR Assay for the Detection of Mumps Virus RNA in Clinical Samples [8,36]. Each sample was run in triplicate using both the Mumps N Gene assay (MuN) and RNase P (RP) assay using this protocol. RT-PCR was performed on the Applied Biosystems 7500 Fast Real-Time PCR system or Applied Biosystems Prism 7900HT Sequence Detection System instrument.

PCR quantification assays performed at Broad Institute

Mumps virus RNA was quantified at the Broad Institute using the Power SYBR Green RNA-to-Ct 1-Step qRT-PCR assay (Life Technologies) and CDC MuN primers. The 10 μL assay mix included 3 μL RNA, 0.3 μL each of mumps virus forward and reverse primers at 5 μM concentration, 5 μL 2x Power SYBR RT-PCR Mix, and 0.08 μL 125x RT Enzyme Mix. The cycling conditions were 48 °C for 30 minutes and 95 °C for 10 minutes, followed by 45 cycles of 95 °C for 15 seconds and 60 °C for 30 seconds with a melt curve of 95 °C for 15 seconds, 55 °C for 15 seconds, and 95 °C for 15 seconds. RT-PCR was performed on the ThermoFisher QuantStudio 6 instrument. To determine viral copy number, a double-stranded gene fragment (IDT gBlock) was used as a standard. This standard is a 171 bp fragment of the mumps genome (GenBank accession: NC_002200) including the amplicon (sequence: GGA TCG ATG CTA CAG TGT ACT AAT CCA GGC TTG GGT GAT GGT CTG TAA ATG TAT GAC AGC GTA CGA CCA ACC TGC TGG ATC TGC TGA TCG GCG ATT TGC GAA ATA CCA GCA GCA AGG TCG CCT GGA AGC AAG ATA CAT GCT GCA GCC AGA AGC CCA AAG GTT GAT TCA AAC).
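Copy number is commonly derived from a standard of known concentration by fitting a standard curve, in which Ct is modeled as a linear function of log10 copy number across a dilution series. The sketch below shows that general calculation in plain Python. The dilution series and Ct values are hypothetical, the function names are illustrative, and the exact quantification procedure used with the gBlock standard is not specified here.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies in the reaction."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means perfect doubling)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical 10-fold dilution series of the standard (copies per reaction)
# and the Ct values measured for each dilution.
standard_copies = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_cts = [33.0, 29.5, 26.0, 22.5, 19.0]

slope, intercept = fit_standard_curve(standard_copies, standard_cts)
sample_copies = copies_from_ct(24.0, slope, intercept)  # hypothetical sample Ct
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"efficiency={amplification_efficiency(slope):.2f}")
print(f"estimated copies in reaction: {sample_copies:.0f}")
```

Dividing the estimated copies by the RNA input volume (3 μL in the assay mix above) would give copies per microliter of extracted RNA.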

23S rRNA content in samples was quantified using the same Power SYBR Green RNA-to-Ct 1-Step qRT-PCR assay kit and cycling conditions. Primers were used to amplify a 183 bp universally conserved region of the 23S rRNA (fwd: 93a, GGG TTC AGA ACG TCG TGA GA; rev: 97ar, CCC GCT TAG ATG CTT TCA GC) [37]. To determine 23S rRNA copy number, a double-stranded gene fragment (IDT gBlock) was used as a standard. This standard is a 214 bp fragment of the Streptococcus HTS2 genome (accession: NZ_CP016953) (sequence: AGC GGC ACG CGA GCT GGG TTC AGA ACG TCG TGA GAC AGT TCG GTC CCT ATC CGT CGC GGG CGT AGG AAA TTT GAG AGG ATC TGC TCC TAG TAC GAG AGG ACC AGA GTG GAC TTA CCG CTG GTG TAC CAG TTG TCT CGC CAG AGG CAT CGC TGG GTA GCT ATG TAG GGA AGG GAT AAA CGC TGA AAG CAT CTA AGT GTG AAA CCC ACC TCA AGA T). Data from both assays, each performed only on a subset of samples, are reported in S1 Data.
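The relationship between the primers, the standard, and the stated amplicon size can be checked by locating the primer binding sites in the standard sequence. The sketch below is an illustrative check, with hypothetical function names, using the sequences exactly as printed above. The search tolerates up to two mismatches per primer, on the assumption that primers designed against a broadly conserved region need not match one particular genome perfectly.

```python
# Sequences copied from the text; spaces are removed before use.
STANDARD = (
    "AGC GGC ACG CGA GCT GGG TTC AGA ACG TCG TGA GAC AGT TCG GTC CCT ATC "
    "CGT CGC GGG CGT AGG AAA TTT GAG AGG ATC TGC TCC TAG TAC GAG AGG ACC "
    "AGA GTG GAC TTA CCG CTG GTG TAC CAG TTG TCT CGC CAG AGG CAT CGC TGG "
    "GTA GCT ATG TAG GGA AGG GAT AAA CGC TGA AAG CAT CTA AGT GTG AAA CCC "
    "ACC TCA AGA T"
).replace(" ", "")
FWD_PRIMER = "GGG TTC AGA ACG TCG TGA GA".replace(" ", "")
REV_PRIMER = "CCC GCT TAG ATG CTT TCA GC".replace(" ", "")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_site(template, query, max_mismatches=2):
    """Return the 0-based start of the best match of query in template,
    allowing up to max_mismatches substitutions, or None if none is found."""
    best = None
    for start in range(len(template) - len(query) + 1):
        window = template[start:start + len(query)]
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return None if best is None else best[0]


def amplicon_length(template, fwd, rev, max_mismatches=2):
    """Length from the first base of the forward primer site to the last
    base of the reverse primer site on the template's top strand."""
    fwd_start = find_site(template, fwd, max_mismatches)
    rev_start = find_site(template, reverse_complement(rev), max_mismatches)
    if fwd_start is None or rev_start is None:
        return None
    return rev_start + len(rev) - fwd_start


print("standard length:", len(STANDARD))
print("amplicon length:", amplicon_length(STANDARD, FWD_PRIMER, REV_PRIMER))
```

With the sequences as printed, the computed lengths agree with the 214 bp standard and the 183 bp amplicon described in the text.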

Bacterial rRNA depletion

Bacterial rRNA was depleted from some RNA samples (see S1 Data) using the Ribo-Zero Bacteria Kit (Illumina). At the hybridization step, the 40 μL reaction mix included 5 μL RNA sample, 4 μL Ribo-Zero Reaction Buffer, 8 μL Ribo-Zero Removal Solution, 22.5 μL water, and 0.5 μL synthetic RNA (25 fg) used to track potential cross-contamination (gift from M. Salit, NIST). Bacterial rRNA-depleted samples were purified using 1.8x volumes Agencourt RNAClean XP beads (Beckman Coulter) and eluted in 10 μL water for cDNA synthesis.

Illumina library construction and sequencing

cDNA synthesis was performed as described in previously published RNA-seq methods [38]. In samples in which bacterial rRNA was not depleted, 25 fg synthetic RNA was added at the beginning of cDNA synthesis to track sample cross-contamination. Positive control libraries were prepared from a mock mumps virus sample in which cultured Enders strain (ATCC VR-106) mumps was spiked into a composite buccal swab sample from healthy patients and diluted to mumps virus RT-qPCR Ct = 21. This mock sample was extracted using the viral RNA isolation protocol described above, except that total nucleic acid was eluted in 100 μL. Negative control libraries were prepared from nuclease-free water. Illumina Nextera XT was used for library preparation: indexed libraries were generated using 16 cycles of PCR, and each sample was indexed with a unique barcode. Libraries were pooled equally based on molar concentration and sequenced on the Illumina HiSeq 2500 (100 or 150 bp paired-end reads) platform.

Hybrid capture

Viral hybrid capture was performed as previously described [38] using 2 different probe sets. One probe set was designed to target mumps and measles viruses (V-MM probe set), and the other was designed to target 356 species of viruses known to infect humans (V-All probe set) [39]. Capture using V-All was used to enrich viral sequences primarily in samples in which we could not detect mumps virus, as well as in other samples (see S1 Data for a list of which samples were captured using which probe set). As described in the work by Metsky and colleagues [39], the probe sets were designed to capture the diversity across all publicly available genome sequences on GenBank for these viruses. Probe sequences can be downloaded here: https://github.com/broadinstitute/catch/tree/cf500c69/probe-designs.

Genome assembly

We used viral-ngs version 1.18.1 [40] to assemble genomes from all sequencing runs. Viral-ngs is freely available under a BSD license (https://viral-ngs.readthedocs.io/en/latest/). We used a set of mumps sequences (accessions: JX287389.1, FJ211586.1, AB000386.1, JF727652.1, AY685920.1, AB470486.1, GU980052.1, NC_002200.1, AF314558.1, AB823535.1, AF467767.2) to taxonomically filter these reads. We de novo assembled reads and scaffolded against the mumps genome with accession JX287389.1 to assemble a genome for each replicate. Then, we pooled read data from all sequencing replicates of each sample and repeated this assembly process to obtain final genomes. Each time we ran viral-ngs, we set the “assembly_min_length_fraction_of_reference” and “assembly_min_unambig” parameters to 0.01. Technical replicates had high concordance: in 27 samples prepared more than once, only 2 base calls differed across replicates.

We replaced deletions in the coding regions with ambiguity (“N”). In one sample, MuVs/Massachusetts.USA/11.16/5 [G], with an insertion at position 3,903 (based on a full 15,384-nucleotide mumps virus genome, e.g., accession JN012242.1) we removed a poorly supported (<5 reads covering the site) extra “A” in a homopolymer region.

To calculate sequencing metrics (S1 Fig), we used SAMtools [41] to downsample raw reads for each replicate to 1 million reads and then reran assembly as described above. Samples from 1 contaminated sequencing batch were excluded, as were all replicates from PCR-negative samples. In cases in which samples from 2 time points were sequenced from a single patient, we included only the first time point in the collection interval analysis (S1D Fig).

Metagenomic analysis

We used the V-All probe set for capture on all samples from suspected mumps cases with a negative mumps PCR result (n = 29). A subset of PCR-positive samples was also sequenced with this probe set (n = 145; without capture = “unbiased,” n = 111). We used the mock Enders strain mumps sample as a positive control on a sequencing run containing all PCR-negative samples, as well as a water sample as a negative control. We used the metagenomic tool Kraken version 0.10.6 [42] via viral-ngs to identify the presence of viral taxa in each sample. We built a database similar to the one described in the work by Metsky and colleagues [39], except without insect species. This database encompasses the known diversity of viruses known to infect humans. It is publicly available, in 3 parts, at https://storage.googleapis.com/sabeti-public/meta_dbs/kraken_full_20170522/ [file], where [file] is database.idx.lz4 (595 MB), database.kdb.lz4 (75 GB), and taxonomy.tar.lz4 (66 MB). Because of the possibility of contamination, we prepared a second, independent sequencing replicate on all PCR-negative samples with evidence for mumps or another virus and required both replicates to contain reads matching the virus detected in the sample. We found no evidence of pathogenic viruses other than mumps in PCR-positive samples.

We required the total raw read count for any genus in any sample to be twice (in practice, 7 times) that in any negative control from any sequencing batch. For any sample that had one or more pathogenic viral genera that passed this filter and had deduplicated reads well distributed across the relevant viral genome, we attempted contig assembly: we used viral-ngs to filter all sample reads against all NCBI GenBank [43] entries matching the identified species and then de novo assembled reads using Trinity [44] through viral-ngs and scaffolded against the closest matching full genome identified by a blastn query [45]. We report all viruses identified via this method in S3 Table.
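The negative-control read-count filter described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name, the dictionary layout, and the example counts are assumptions, and only the 2-fold threshold comes from the text.

```python
def genera_passing_control_filter(sample_counts, control_counts, min_fold=2.0):
    """Return genera whose raw read count in a sample is at least `min_fold`
    times the largest count seen for that genus in any negative control.

    sample_counts: dict mapping genus name -> raw read count in the sample.
    control_counts: list of such dicts, one per negative control (all batches).
    """
    passing = []
    for genus, count in sample_counts.items():
        # Highest count for this genus across every negative control.
        max_control = max((c.get(genus, 0) for c in control_counts), default=0)
        if count > 0 and count >= min_fold * max_control:
            passing.append(genus)
    return sorted(passing)


# Illustrative (made-up) counts: one sample and two negative controls.
example_sample = {"Rubulavirus": 140, "Influenzavirus B": 6}
example_controls = [{"Rubulavirus": 20}, {"Rubulavirus": 5, "Influenzavirus B": 4}]
example_passing = genera_passing_control_filter(example_sample, example_controls)
```

In this sketch a genus that never appears in a negative control passes as soon as it has any reads, so the downstream requirement that reads be well distributed across the genome remains the stricter check.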

In parallel, we used SPAdes [46] within viral-ngs to de novo assemble contiguous sequence from all de-duplicated, depleted reads. We used the metagenomic tool DIAMOND version 0.9.13 [47] with the nr database downloaded 29 May 2017, followed by blastn [45] of DIAMOND-flagged contigs. Using this method, we confirmed the presence of all previously identified viruses except influenza B virus, for which we never assembled a contiguous sequence. We found no evidence of additional pathogenic viruses using this method.

Criteria for pooling across replicates

We prepared one or more sequencing libraries from each sample and attempted to sequence and assemble a genome from each of these replicates. We required a replicate of a sample to contain 3,000 unambiguous base calls for its read data to be included in that sample’s final genome assembly. This threshold was based on the maximum number of unambiguous bases (2,820) observed in negative controls across all uncontaminated sequencing batches. One sequencing batch showed evidence of contamination: we were able to assemble 7,615 unambiguous mumps bases from a water sample, with a median coverage of 4x. For samples prepared in this batch only, we implemented an additional requirement for including a replicate in pooling: the assembly must have a median depth of coverage of ≥20x, 5 times the median depth of coverage of the water sample.
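The pooling criteria above reduce to a simple decision rule, sketched here. The function and constant names are hypothetical; the thresholds (3,000 unambiguous bases, and a median depth of at least 20x for replicates from the contaminated batch) are taken from the text.

```python
MIN_UNAMBIGUOUS_BASES = 3000          # above the 2,820 seen in negative controls
MIN_MEDIAN_DEPTH_CONTAMINATED = 20    # 5 times the 4x median depth of the water sample


def include_replicate(unambiguous_bases, median_depth, from_contaminated_batch):
    """Decide whether a replicate's reads are pooled into the final assembly."""
    if unambiguous_bases < MIN_UNAMBIGUOUS_BASES:
        return False
    # The depth requirement applies only to the batch with evidence of contamination.
    if from_contaminated_batch and median_depth < MIN_MEDIAN_DEPTH_CONTAMINATED:
        return False
    return True
```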

Multiple sequence alignment of genotype G whole genomes

We required a mumps genome to contain 11,538 unambiguous base calls (75% of the total 15,384-nucleotide genome with GenBank accession JN012242.1) for inclusion in the alignment of whole genome sequences that we used for downstream analysis. For 2 patients with samples taken at 2 time points (MuVs/Massachusetts.USA/19.16/5 [G] (1) and MuVs/Massachusetts.USA/19.16/5 [G] (2–20.16); MuVs/Massachusetts.USA/16.16/6 [G] (1) and MuVs/Massachusetts.USA/16.16/6 [G] (2–17.16)), we only included the earlier sample in downstream analyses. The final alignment of whole genome sequences contains only samples belonging to genotype G; we did not include MuVs/Massachusetts.USA/24.17/5 [K], which belongs to genotype K, in the alignment.

In this alignment, we also included 25 mumps virus genomes published on NCBI GenBank [43]. These comprise all of the sequences with organism “Mumps rubulavirus” available as of September 2017 that meet the following criteria: sequence length ≥14,000 nucleotides, belong to genotype G, sample collection year and country of origin reported in GenBank, no evidence of extensive virus passaging or modification (for vaccine development, for example). The accessions are KY969482, KY996512, KY996511, KY996510, KY680540, KY680539, KY680538, KY680537, KY006858, KY006857, KY006856, KY604739, KF738114, KF738113, KF481689, KM597072, JX287391, JX287390, JX287389, JX287387, JX287385, JN012242, JN635498, AF280799, EU370207.

We aligned mumps virus genomes using MAFFT version 7.221 [48] with default parameters. We provide the sequences and alignments used in analyses at http://doi.org/10.5281/zenodo.3338599.

Visualization of coverage depth across genomes

We plotted aggregate depth of coverage across the 200 samples whose genomes were included in the final alignment (S1C Fig) as described in the work by Metsky and colleagues [49]. We aligned reads against the reference genome with accession JX287389.1 and plotted over a 200-nt sliding window.

Analysis of within- and between-sample variants

We ran V-Phaser 2.0 [50] via viral-ngs on all pooled reads mapping to a sample assembly to identify within-sample variants (S2 Data). To call a variant, we required a minimum of 5 forward and reverse reads, as well as no more than 10-fold strand bias, as previously described [51]. Samples with genomes generated by the sequencing batch that showed evidence of contamination (see “Criteria for pooling across replicates” above) were not included in within-host variant analysis. When analyzing variants in known contacts, we used pairs of samples designated as “contact links,” as described in “Relationship between epidemiological and genetic data” below.
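The read-support and strand-bias filters can be illustrated with a short sketch. It assumes that the minimum of 5 reads applies to each strand separately and that strand bias is the ratio of the larger to the smaller per-strand count; the text does not spell out either definition, and the function name is hypothetical.

```python
def passes_variant_filters(fwd_reads, rev_reads, min_per_strand=5, max_strand_bias=10.0):
    """Apply read-support and strand-bias filters to one candidate variant.

    fwd_reads / rev_reads: reads supporting the variant on each strand.
    """
    if fwd_reads < min_per_strand or rev_reads < min_per_strand:
        return False
    if min(fwd_reads, rev_reads) == 0:
        # Guards against division by zero if min_per_strand is set to 0.
        return False
    # Assumed definition of strand bias: larger count divided by smaller count.
    bias = max(fwd_reads, rev_reads) / min(fwd_reads, rev_reads)
    return bias <= max_strand_bias
```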

Between-sample variants were called by comparing each final genome sequence to JX287385.1, the earlier of the 2 available whole genomes from the 2006 mumps outbreak in Iowa, US (S2 Data). We ignored all fully or partially ambiguous base calls and excluded sequences that did not descend from the USA_2006 clade from this analysis. When examining amino acid changes in HN given vaccination status (see “SH and HN multiple sequence alignment” below), we ignored sequences from patients with unknown vaccination history.

Maximum likelihood estimation and root-to-tip regression

We generated a maximum likelihood tree using the whole genome genotype G multiple sequence alignment. We used IQ-TREE version 1.3.13 [52] with a GTR substitution model and rooted the tree on the oldest sequence in this data set (accession KF738113.1) in FigTree version 1.4.2 [53].

To estimate root-to-tip distance of samples in the primary US lineage, we subsetted the full genotype G alignment to include only samples in, or descended from, the USA_2006 clade (see Fig 1A) and used TempEst version 1.5 [54] with the best-fitting root (heuristic residual mean squared function) to estimate distance from the root. We used scikit-learn version 0.14.1 [55] in Python to perform linear regression of distances on dates.
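The regression of root-to-tip distances on sampling dates can be sketched with ordinary least squares. The study used scikit-learn; the version below uses only the Python standard library so that it is self-contained, and the function name and example values are illustrative. The slope is conventionally read as a substitution rate per site per year and the x-intercept as a rough estimate of the root date.

```python
def fit_root_to_tip(dates, distances):
    """Least-squares fit of root-to-tip distance on decimal sampling date.

    Returns (slope, intercept, x_intercept).
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    x_intercept = -intercept / slope  # date at which fitted distance is zero
    return slope, intercept, x_intercept


# Made-up points lying exactly on a line with rate 4e-4 and root date 2000.
example_dates = [2006.0, 2010.0, 2016.0]
example_distances = [4e-4 * (d - 2000.0) for d in example_dates]
example_fit = fit_root_to_tip(example_dates, example_distances)
```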

We also generated maximum likelihood trees using the SH gene only (full 316-nucleotide mRNA), HN (coding region only), F (coding region only), and a concatenation of the aforementioned SH, HN, and F regions (S8 Fig). For each tree, we started with the whole genome genotype G alignment (225 sequences) and extracted the relevant region(s). We then removed any sequence with 2 or more consecutive ambiguous bases (“N”s) in any of SH, HN, or F, leaving 209 sequences in each alignment. We used IQ-TREE version 1.5.5 with a GTR substitution model [56] to generate maximum likelihood trees.

Molecular dating using BEAST

We performed all molecular clock analyses on whole genome sequences using BEAST version 1.8.4 [57]. We excluded from the CDS the portion of the V protein after the insertion site [58] because of reading frame ambiguity in that region. On the CDS, we used the SRD06 substitution model [59], which breaks codons into 2 partitions (positions [1+2] and 3) with HKY substitution models [60] and allows gamma site heterogeneity [61] (4 categories) on each. We used a separate partition on noncoding sequence with an HKY substitution model and gamma site heterogeneity. To accommodate inexact dates in 7 sequences from NCBI GenBank, we used sampled tip dates [62].

We tested 6 models as described in the work by Metsky and colleagues [49]. Each was a combination of one of 2 clock models (strict clock and uncorrelated relaxed clock with log-normal distribution [63]) and 1 of 3 coalescent tree priors (constant size population, exponential growth population, and Bayesian Skygrid model [64] with 20 parameters). On each model, we estimated marginal likelihood with path-sampling (PS) and stepping-stone sampling (SS) [65,66] (S2 Table) after sampling 100 path steps each with a chain length of 2 million.

We sampled trees and other parameters on each model by running BEAST for 200 million MCMC steps, sampling every 20,000 steps, and removing 20 million steps as burn-in. We report the mean clock rate as the substitution rate for relaxed clock models. On the sampled trees, we used TreeAnnotator version 1.8.4 to find the maximum clade credibility (MCC) tree and visualized it in FigTree version 1.4.3 [53]. To estimate tMRCAs (Fig 1C and S2 Table), we ran BEAST again for each of the 6 models, drawing from these same sets of sampled trees (without any parameters), for 10,000 steps, sampling every step. We selected a relaxed clock and Skygrid model for plots of tMRCA distributions (Fig 1C) and MCC trees over the whole genome genotype G sequences.

Additionally, we plotted (S11 Fig) a Skygrid reconstruction of the scaled population size (N_eτ) using results from the selected model. The earliest time point on this plot is the lower (more recent) 95% HPD bound on the estimated root height.

Gene- and site-specific dN/dS analyses

We used BEAST version 1.8.4 [57] to estimate dN/dS per-site (S4C Fig and S3 Data) and per-gene (S4D Fig) using the same alignment of 225 whole genome sequences described above (again, removing the portion of the V gene after the insertion site).

For site-specific dN/dS estimation, we used the CDS as input and created a separate partition for each codon position (3 partitions). We used an HKY substitution model [60] on each partition and an uncorrelated relaxed clock with log-normal distribution [63] for branch rates. Here, we sampled from the same set of trees that were sampled as described above in “Molecular Dating using BEAST” (relaxed clock with Skygrid tree prior). We ran BEAST for 10 million MCMC steps, sampling every 10,000 steps. We estimated site-specific dN/dS at each sampled state using renaissance counting [67,68] and show summary statistics at each site after discarding 1 million steps as burn-in.

For per-gene estimation, we created 8 separate partitions: 7 correspond to the CDS of a gene (F, HN, L, M, NP, SH, partial V), and the last corresponds to noncoding sequence. For each gene partition, we used a Goldman-Yang codon model [69] with its own parameters for dN/dS (omega) and clock rate. For the noncoding partition, we used an HKY substitution model [60] and gamma site heterogeneity [61] (4 categories). We sampled tip dates as with the molecular clock analyses above and used a Bayesian Skyline tree prior [70] (10 groups). We ran BEAST for 200 million MCMC steps to sample trees and parameter values, discarded 20 million steps as burn-in, and plotted the posterior distribution of omega for each gene partition.

Principal component analysis

The data set for PCA consisted of all SNPs from sites with exactly 2 alleles in the set of all genotype G genomes. We imputed missing data with the R package missMDA [71] and calculated principal components with the R package FactoMineR [72]. We discarded 14 samples as outliers based on visual inspection, leaving 211 samples in the final set.

Relationship between epidemiological and genetic data

We obtained detailed epidemiological data for samples shared by MDPH from the Massachusetts Virtual Epidemiologic Network (MAVEN) surveillance system, an integrated web-based disease surveillance and case management system [73]. We defined 2 types of epidemiological links: “contact links,” between individuals who were determined to be close contacts during public health investigation and had symptom onset dates 7 to 33 days apart (individuals with mumps are usually considered infectious 2 days before through 5 days after onset of parotid swelling, with a typical incubation period of 16–18 days, ranging from 12–25 days) [26]; and “shared activity links,” between individuals who participated in the same extracurricular activity (e.g., a sports team or university club) or frequented a specific residence or athletic facility. When we refer to epidemiological links without specifying link type, we include both types of links.

We calculated pairwise genetic distance between all pairs of samples in the whole genome genotype G alignment. For each pair, the genetic distance score is s/n, in which s is the number of unambiguous differing sites (both sequences must have an unambiguous base at the site, and the called bases must differ) and n is the number of sites at which both sequences have an unambiguous base call.
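The distance score s/n can be computed as in the following sketch. It assumes that the two sequences are rows of the same alignment (equal length) and that only A, C, G, and T count as unambiguous; the function name is hypothetical.

```python
UNAMBIGUOUS_BASES = set("ACGT")


def genetic_distance(seq_a, seq_b):
    """Return s/n for two aligned sequences.

    n: sites at which both sequences have an unambiguous base call.
    s: those sites at which the called bases differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    n = 0
    s = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in UNAMBIGUOUS_BASES and b in UNAMBIGUOUS_BASES:
            n += 1
            if a != b:
                s += 1
    if n == 0:
        raise ValueError("no sites with unambiguous calls in both sequences")
    return s / n
```

For example, comparing `ACGTNA-C` with `ACCTAANC` leaves 6 comparable sites, of which 1 differs, giving a distance of 1/6.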

To visualize the similarity between genomes and its relationship to epidemiological linkage, we performed multidimensional scaling on sequences in Clade II-outbreak (Fig 1A). This clade comprises mostly cases from Harvard and the related community outbreak. Using their pairwise genetic distances, we calculated a metric multidimensional scaling to 2 dimensions in R with cmdscale [74]. We then split the range of the output coordinates evenly into a 100 × 100 grid, collapsed each point into this grid, and plotted the number of points at each grid coordinate; this improves visualization of nearly overlapping points (identical or near-identical genomes). We plotted curves that represent epidemiological links between cases within each of the grid coordinates. This is shown in S7A Fig.

To determine the ability of genetic distance to predict epidemiological linkage, we again looked specifically at cases within Clade II-outbreak (Fig 1A). Using the Python scikit-learn package [55], we constructed a receiver operating characteristic (ROC) curve using pairwise distance between II-outbreak cases as the predictor variable and presence or absence of an epidemiological link as the binary response variable. This is shown in S7B Fig.
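The ROC construction can be sketched without external packages. The study used scikit-learn; this standard-library stand-in assumes that a pair is predicted to be linked when its genetic distance is at or below a threshold, and the function names are illustrative.

```python
def roc_points(distances, linked):
    """ROC points (false positive rate, true positive rate), sweeping a
    threshold over the observed distances; pairs with distance <= threshold
    are predicted to be epidemiologically linked."""
    n_pos = sum(1 for flag in linked if flag)
    n_neg = len(linked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one linked and one unlinked pair")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(distances)):
        tp = sum(1 for d, flag in zip(distances, linked) if flag and d <= threshold)
        fp = sum(1 for d, flag in zip(distances, linked) if not flag and d <= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An area near 1 would indicate that linked pairs almost always have smaller genetic distances than unlinked pairs, whereas an area near 0.5 would indicate no predictive value.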

Model of mumps transmission in a university setting

We developed a stochastic model for mumps virus transmission accounting for the natural history of infection, vaccination status, and control measures implemented in response to the outbreak at Harvard. Our stochastic model of mumps virus transmission included the stages after initial infection, the durations of which we inferred using data from previous clinical studies (S6A and S6C Fig). These included the gamma-distributed incubation period from infection to onset of mumps virus shedding in saliva [75]; the gamma-distributed period of latent infection from shedding onset to parotitis onset [75,76]; and the log-normally distributed time from parotitis onset to the cessation of shedding (defined in the work by Polgreen and colleagues [77]). For asymptomatic cases, we defined the total duration of shedding (γ) as the sum of independent random draws from the durations of shedding before and after parotitis onset, based on the lack of any reported difference in durations of shedding for symptomatic and asymptomatic cases [75]. To account for case isolation interventions implemented at Harvard, we modeled the removal of symptomatic individuals one day after onset of parotitis. In comparison to the 70% probability for symptoms given infection among unvaccinated individuals [78], we modeled the probability of symptoms given infection as uniformly distributed between 27.3% and 38.3% [5,79].

We used previous estimates of the effectiveness and waning rate of mumps vaccination [5] and of the vaccination status distribution of individuals on a university campus [80] to account for susceptibility to infection among the Harvard population (N = 22,000). We scaled risk for mumps infection, given exposure, to time since receipt of the last vaccine dose, yielding the hazard ratio

$$\xi_i = e^{\omega_0}\,\tau_i^{\omega_1}$$

for an individual i who received their last dose τi years previously, relative to an unvaccinated individual. For fitted values from the work by Lewnard and Grad [5], estimates were below 1.0 for individuals vaccinated since 1967, when the Jeryl Lynn vaccine was introduced (S6D Fig).

Given the instantaneous hazard of infection for an as yet uninfected individual i exposed to I(t) infected individuals

$$\lambda_i(t) = \frac{\beta\,\xi_i\,I(t)}{N-1},$$

the probability of evading infection over the course of a 1-day simulated time step was exp(−λi (t)). The per-contact transmission rate (β) was measured from the initial (preintroduction) value of the effective reproductive number:

$$\beta = R_E(0)\,\left(\bar{\gamma}\,\bar{\xi}\right)^{-1}.$$
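One daily step of the transmission process can be sketched as below. The sketch reads the hazard as β·ξ_i·I(t) divided by N − 1 and uses the escape probability exp(−λ_i(t)) from the text; the function names, the list of per-individual hazard ratios, and the seeded random number generator are illustrative assumptions, and the stages of infection and case isolation are left out.

```python
import math
import random


def infection_hazard(beta, xi_i, n_infected, population_size):
    """Instantaneous hazard for one uninfected individual: beta * xi_i * I(t) / (N - 1)."""
    return beta * xi_i * n_infected / (population_size - 1)


def escape_probability(hazard):
    """Probability of evading infection over a 1-day time step."""
    return math.exp(-hazard)


def simulate_day(beta, hazard_ratios, n_infected, population_size, rng):
    """Return indices of as-yet-uninfected individuals infected during one day.

    hazard_ratios: xi_i for each uninfected individual (1.0 if unvaccinated).
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    newly_infected = []
    for idx, xi_i in enumerate(hazard_ratios):
        p_escape = escape_probability(
            infection_hazard(beta, xi_i, n_infected, population_size)
        )
        if rng.random() > p_escape:
            newly_infected.append(idx)
    return newly_infected
```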

Inferring transmission dynamics

The number of cases (71) and identification of multiple, distinct viral clades within Harvard suggested limited permeation of mumps after any introduction. We simulated dynamics of individual transmission chains to understand the epidemiological course of introduced viral lineages and to infer values of RE(0) and the number of importations of mumps virus. We used the simulation model to sample from the distribution of the number of cases (X, including the index infection if symptomatic) resulting from a single introduction over a 1.5-year time course:

$$f\{x_i \mid R_E(0)\} = P[X = x_i \mid R_E(0)].$$

We resampled according to f{xi|RE(0)} to define the distribution of the cumulative number of cases (Z) resulting from Y introductions, conditioned on RE(0):

$$g\{z_k \mid R_E(0), Y\} = P\left[Z = z_k = \sum_{i=1}^{y_j} x_i \,\middle|\, R_E(0), Y = y_j\right].$$

Of the 71 cases at Harvard, 66 had mumps genomes in our data set, so we ran simulations where Z ≥ 66, drawing k = 66 cases at random to determine the number of distinct lineages (S, defined by the index infection) expected to be present within such a sample. The probability of obtaining 66 sequences and observing S = sm lineages among them is

$$h\{s_m \mid R_E(0), Y, K = 66\} = P[S = s_m \mid R_E(0), Y = y_j, K = 66] \times P[Z \ge 66 \mid R_E(0), Y].$$

The posterior density of our model also accounted for the probability of observing 71 symptomatic cases in total. Defined in terms of the number of introductions and the initial reproductive number, the model posterior was proportional to

$$h\{4 \mid R_E(0), Y, K = 66\} \times g\{71 \mid R_E(0), Y\},$$

where 4 is the number of viral lineages in the 66 Harvard cases (representing clades 0-HU, II-community, and 2 subclades within Clade II). We measured this probability from 100,000 iterates for each pairing of RE(0) ∈ {0.10, 0.11, …, 2.50} and Y ∈ {1, 2,…, 200}.
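The lineage-sampling step can be sketched as a small Monte Carlo routine. Here `draw_outbreak_size` is a hypothetical stand-in for the full transmission simulation (it should return the number of cases from a single introduction), and the other names are likewise illustrative. Dividing by the total number of iterations, rather than by the number of iterations with Z ≥ k, estimates the product of the conditional probability of S and the probability that Z ≥ k.

```python
import random


def lineages_in_sample(cases_per_introduction, k, rng):
    """Draw k cases at random (without replacement) from all cases and count
    the distinct lineages, each lineage being defined by its introduction.
    Returns None when fewer than k cases are available (Z < k)."""
    labels = [
        lineage
        for lineage, size in enumerate(cases_per_introduction)
        for _ in range(size)
    ]
    if len(labels) < k:
        return None
    return len(set(rng.sample(labels, k)))


def lineage_count_distribution(draw_outbreak_size, n_introductions, k, n_iter, rng):
    """Estimate P[S = s and Z >= k] for each s by repeated simulation."""
    counts = {}
    for _ in range(n_iter):
        sizes = [draw_outbreak_size(rng) for _ in range(n_introductions)]
        s = lineages_in_sample(sizes, k, rng)
        if s is not None:
            counts[s] = counts.get(s, 0) + 1
    return {s: c / n_iter for s, c in counts.items()}
```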

Last, we defined the minimum necessary third-dose vaccine coverage (C) to bring the effective reproductive number below unity using the relation

$$[1 - \mathrm{VE}(0)] \times C\,R_E(t=0) \le 1$$

sampling from previous estimates [5] of the protection that mumps vaccination confers against infection before immunity wanes (defined here as VE(0)).

Transmission reconstruction using outbreaker

We used the R package outbreaker version 1.1–7 [81] to reconstruct transmission for samples included in Clade II-outbreak. We estimated the generation interval by fitting a gamma distribution, via maximum likelihood, to the time between symptom onset dates for cases with confirmed epidemiological links (S6E Fig). We used the same distribution for the colonization time and set the maximum number of generations between a case and its most recent sampled ancestor to 40. The resulting estimates are nearly identical to those reported in previous studies [82]. We ran outbreaker 6 times in parallel, each with 1 million MCMC steps, and discarded the first 10% of states as burn-in. We assessed run convergence and combined results for 5 of the 6 parallel runs to determine the reconstructed transmission tree (Fig 2C right). For each link in the reconstruction, the support is the frequency of the link in the samples from the posterior (excluding the burn-in). To reconstruct transmission using SH sequences only (S7C Fig), we extracted the SH gene from the II-outbreak alignment and ran outbreaker as described above, using the results from all 6 parallel runs in the analysis.

SH and HN multiple sequence alignment

To analyze all published SH and HN mumps sequences, we searched NCBI GenBank in July 2017 for all nucleotide sequences with organism “Mumps rubulavirus.” We performed a pairwise alignment between each sequence s_i and a reference genome (accession: JX287389.1) using MAFFT version 7.221 [48] with parameters “--localpair --maxiterate 1000 --preservecase.” We then extracted the SH sequence from each s_i based on the reference coordinates in the alignment, removing all SH sequences without the full 316-nucleotide region and all SH sequences with an insertion or deletion (“indel”) relative to the reference. We then used MAFFT with parameters “--localpair --maxiterate 1000 --retree 2 --preservecase” to create a multiple sequence alignment of the extracted SH gene sequences and removed any sequences with indels in this final alignment. We repeated the same process for the HN region, requiring the full 1,749-nucleotide coding region.

In both the SH and HN alignments, we removed sequences from vaccine strains (i.e., genotype N, or another genotype marked as “(VAC)” or “vaccine”). We also removed sequences with GenBank records indicating extensive passaging. In the SH alignment only, we removed sequences with no reported collection date or country of origin, because these data are required for phylogeographic analyses. In samples with a collection decade (e.g., 1970s) but not a specific year, we assigned the first year of the decade; in samples with only a collection year, we assigned a decimal year of year + 0.5 (e.g., 1970.5); in samples with year and month but no day, we used the day halfway through the given month (e.g., 2015–03 becomes 2015-03-15) to calculate the decimal year; and in samples with an epidemiological week but no specific day, we approximated the decimal year as year + (epi week / 52), except samples collected in epidemiological week 52 were relabeled as week 51.999 to avoid confusion with year-only samples.
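The date-assignment rules above can be sketched as a single function. Several details are assumptions rather than statements from the text: the function signature, the use of day 15 for every month (following the 2015-03-15 example), the conversion of a full date to a decimal year as the elapsed fraction of that calendar year, and returning the first year of a decade without adding 0.5.

```python
from datetime import date


def date_to_decimal_year(d):
    """Convert a calendar date to a decimal year (elapsed fraction of the year)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def decimal_year(year, month=None, day=None, epi_week=None, decade_only=False):
    """Assign a decimal year from a partially specified collection date."""
    if decade_only:
        # Decade only (e.g., 1970s): first year of the decade.
        return float(year - year % 10)
    if epi_week is not None:
        # Week 52 is relabeled 51.999 so it stays distinct from year-only samples.
        week = 51.999 if epi_week == 52 else epi_week
        return year + week / 52
    if month is None:
        # Year only: midpoint of the year.
        return year + 0.5
    if day is None:
        # Year and month only: assumed mid-month day, as in 2015-03-15.
        day = 15
    return date_to_decimal_year(date(year, month, day))
```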

In both the SH and HN alignments, we relabeled outdated genotypes (M, E, and any subgenotypes [21]) and constructed a maximum likelihood tree (using IQ-TREE with a GTR substitution model, as described above) to assign a genotype if one was not reported on GenBank. We preserved genotypes designated as “Unclassified” [21].

To each alignment, we added all SH or HN sequences from individual patients generated in this study, except those with 2 or more consecutive ambiguous bases (“N”s) in the SH or HN region. The sequences used in the SH and HN analyses are listed in S4 Data.

SH phylogeographic analysis

To perform phylogenetic and phylogeographic analyses of the SH gene sequence, we first sampled trees using BEAST version 1.8.4 [57]. We used constant size population and strict clock models and used the HKY substitution model [60] with 4 rate categories and no codon partitioning. We ran BEAST in 4 replicates, each for 500 million states with sampling every 50,000 states, and removed the first 150 million states as burn-in. We verified convergence of all parameters across the 4 replicates and then combined the 4 replicates using LogCombiner.

We used TreeAnnotator to determine the MCC tree (Fig 3C) from a resampling of 350 trees and visualized the result in FigTree version 1.4.3 [53]. We computed a kernel density estimate of the probability distribution of the tMRCA over all sampled states for the 11 genotypes in this data set (S9 Fig).

To construct distributions of estimates, we used resampling on the input sequences, similar to prior work facing sampling biases [83]. To perform this resampling, we focused on only samples that were collected both within a window of time and from a geographic region with sufficient sampling. Namely, we considered only sequences sampled in 2010 or afterward and collapsed the locations shown on the full data set (Fig 3A) to just 4 global regions: US (consisting of only samples from the US), Europe (consisting of samples whose location was labeled as Eastern Europe, Northern Europe, Southern Europe, Western Europe, or the United Kingdom), East Asia (consisting of samples whose location was labeled as Eastern Asia or Japan), and South/Southeast Asia (consisting of samples whose location was labeled as Southeast Asia or Southern Asia). These 4 regions encompass 3,541 of the 3,646 SH gene sequences used in our analysis. We ignored samples from 5 locations: Canada; Caribbean, Central America, and South America; Middle and Eastern Africa; Northern Africa; Middle East. See S10A Fig for a visual representation of these 4 regions. Then, we randomly sampled 10 sequences (without replacement) from each region for each year (i.e., 2010–2011, 2011–2012, etc.). We resampled the input sequences with this strategy 100 times.
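A single round of the stratified resampling can be sketched as follows. The record layout (sequence identifier, region, year bin) and the function name are hypothetical, and the sketch assumes that a region-year stratum with fewer than 10 sequences contributes all of them, a case the text does not address.

```python
import random
from collections import defaultdict


def resample_sequences(records, per_group=10, rng=None):
    """Sample up to `per_group` sequences, without replacement, from each
    (region, year) stratum.

    records: iterable of (sequence_id, region, year_bin) tuples.
    """
    rng = rng or random.Random()
    groups = defaultdict(list)
    for seq_id, region, year_bin in records:
        groups[(region, year_bin)].append(seq_id)
    chosen = []
    for key in sorted(groups):
        members = groups[key]
        # Assumption: strata smaller than per_group are kept whole.
        chosen.extend(rng.sample(members, min(per_group, len(members))))
    return chosen
```

Calling this function 100 times with independent random draws would yield 100 resampled input sets of the kind described above.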

See S1 Text for a description of limitations of this resampling strategy.

For each of the 100 resamplings of the input sequences, we ran BEAST to sample trees, as described above, for 100 million states sampling every 10,000 states; we removed 10 million states as burn-in and resampled to obtain 1,000 sampled trees.

Then, we performed phylogeographic analyses on each of the 100 samplings of input sequences by drawing from their sampled trees. We used a discrete trait substitution model [84] on location in BEAST version 1.8.4. To estimate transition rates between locations we used a nonreversible CTMC model with 4² − 4 = 12 rates. Furthermore, to evaluate the significance of routes in the diffusion process, we added indicator variables to each rate through Bayesian stochastic search variable selection (BSSVS); we set the number of nonzero rates to have a Poisson prior with a mean of 3.0, placing considerable prior probability on having the fewest rates needed to explain the diffusion. We ran BEAST with 10 million states, sampling every 1,000 states, and removed the first 1 million as burn-in. At each sampled state, we logged the complete Markov jump history [85,86], as well as a tree with the reconstructed ancestral location of each node.

To determine an MCC tree across the 100 samplings of input sequences, we ran TreeAnnotator on the sampled trees from each of the 100 samplings and then selected the MCC tree, from the 100 options, with the highest clade credibility score. We show this one, colored by reconstructed ancestral locations, in S10B Fig.

For each sampling xi of the 100 samplings of input sequences, we counted the number of jumps between each pair of locations at each state using the complete Markov jump history after resampling the jump history every 10,000 states. For each xi, at each state, we calculated the fraction of migrations between each region pair by dividing the number of migrations between the pair by the total number of migrations at that state. To quantify support for migration routes in each xi, we calculated Bayes factors (BFs) on the rate indicator variables. We calculated the posterior probability that a rate is nonzero as the mean of the indicator variable over the MCMC states, thereby providing a posterior odds. We calculated the prior probability that a rate is nonzero as the expected number of nonzero rates divided by the number of rates, which reduces to 1/N, where N is the number of locations; thus, the prior odds is 1/(N-1) or, in this case, 1/3. We set an upper limit of 10,000 on the BF and a lower limit of 1.0. To estimate the proportion of ancestry for each xi, we used skyline statistics via PACT [87] to calculate proportion of ancestry at each location from each other location: in particular, for each xi, we used the sampled trees with ancestral locations as input (after resampling them every 10,000 states), padded the trees with migration events, broke the trees into temporal windows of 0.1 year going back 5 years prior to sampling, and estimated the proportion of history from tips in each time window.
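The Bayes factor calculation for a single migration rate can be sketched as below. The function name is hypothetical; the prior probability of 1/N, the prior odds of 1/(N − 1), and the limits of 1.0 and 10,000 follow the text.

```python
def bayes_factor(indicator_samples, n_locations, lower=1.0, upper=10000.0):
    """Bayes factor for a rate being nonzero, from sampled 0/1 indicators.

    indicator_samples: indicator values for one rate across MCMC states.
    n_locations: number of discrete locations N (4 in this analysis).
    """
    posterior_p = sum(indicator_samples) / len(indicator_samples)
    prior_p = 1.0 / n_locations
    prior_odds = prior_p / (1.0 - prior_p)  # equals 1 / (N - 1)
    if posterior_p >= 1.0:
        # Posterior odds are infinite; report the upper limit.
        return upper
    posterior_odds = posterior_p / (1.0 - posterior_p)
    return min(upper, max(lower, posterior_odds / prior_odds))
```

With N = 4, for example, a rate whose indicator is nonzero in 75% of sampled states has posterior odds of 3 and prior odds of 1/3, giving a BF of 9.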

To summarize phylogeographic results across the 100 resamplings x_i of the input sequences, we show probability distributions across the x_i. When plotting the fraction of migrations to each region from each other (S10C Fig), we calculated the mean of this fraction across all the sampled states in each of the 100 runs to produce a point estimate for each x_i and show the distribution of these means across the 100 x_i. We calculated the BFs on migration routes between the 4 regions by combining the sampled indicator variables across all 100 x_i to compute the posterior odds (S10D Fig). Similarly, the proportion of ancestry plotted between a pair of locations in a time window (S10E Fig) is the mean across the 100 x_i of the mean proportion for that pair in that time window from each x_i. We calculated the pointwise percentile bands in S10F Fig from the mean proportions in each x_i across the x_i (i.e., they are percentiles across the resamplings of the input sequences).
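The two-level summary described above (a point estimate per resampling, then a distribution or percentile band across resamplings) can be sketched as follows. This is an illustration under stated assumptions: each run is assumed to be an array of jump counts with one row per sampled state and one column per route, and the function names are hypothetical. Applying percentile bands to migration fractions here is only for demonstration; in the text, the bands are computed on the proportions of ancestry.

```python
import numpy as np


def migration_fractions(jump_counts):
    """Per-state fraction of all jumps that occur along each route.

    jump_counts: array of shape (n_states, n_routes).
    """
    counts = np.asarray(jump_counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # States with no jumps give NaN instead of raising a division warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, counts / totals, np.nan)


def summarize_runs(runs, band=(2.5, 97.5)):
    """Summarize a list of jump-count arrays, one per resampling.

    Returns the per-run mean fractions (n_runs, n_routes) and the
    pointwise lower and upper percentiles across runs.
    """
    point_estimates = np.array(
        [np.nanmean(migration_fractions(r), axis=0) for r in runs]
    )
    lo, hi = np.percentile(point_estimates, band, axis=0)
    return point_estimates, lo, hi
```

Averaging within each run first means every resampling contributes one value per route, so the spread across runs reflects variation due to the choice of input sequences.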

Supporting information

S1 Fig. Sequencing results and predictors of outcome.

(A) Distribution of mumps virus (MuV) RT-qPCR Ct value, taken at sample source, for all sequencing replicates prepared with both depletion and capture (see Materials and methods). Genome (blue): a replicate produces a genome passing the thresholds described in Materials and methods. MuV RT-qPCR serves as a predictor of sequencing outcome. (B) Distribution of collection interval (days between symptom onset and sample collection) for all samples prepared with both depletion and capture. Genome (blue) is defined as in panel A. Samples taken more than 4 days after symptom onset did not produce genomes in this study [88]. (C) Relative sequencing depth of coverage aggregated across 203 mumps genomes. (D) Number of unambiguous bases in the genome assembly of each sample by MuV:23S ratio (MuV copies by MuV RT-qPCR divided by 23S copies by 23S RT-qPCR; see Materials and methods). Each point is a replicate, colored by sequencing preparation method. (E) Normalized MuV reads (unique MuV reads divided by raw sequencing depth) in each sample by MuV:23S ratio. Points are as in panel D. Nine points with fraction mumps reads >0.04 are beyond the y-axis limits. In panels A, B, D, and E, reads from each replicate were downsampled to 1 million prior to assembly (see Materials and methods). In panels D and E, 1 point with a MuV:23S ratio <10^−8 and 3 points with a MuV:23S ratio >10^−3 are beyond the x-axis limits. Ct, cycle threshold; MuV, mumps virus; RT-qPCR, real-time quantitative polymerase chain reaction.

(TIF)

S2 Fig. Maximum likelihood tree, root-to-tip regression, and principal component analysis.

(A) Maximum likelihood tree of the 225 mumps virus genotype G genomes used in this study. Tips are colored by sample source (MDPH or CDC); previously published genomes are indicated by unfilled circles. (B) Root-to-tip regression of genomes shown in panel A, rooted on GenBank accession KF738113 (Pune.IND, 1986). (C) Root-to-tip regression of genomes in the clade containing the two USA 2006 sequences (USA_2006; see Fig 1A) as well as their descendants. (D) Principal component analysis of genetic variants from the genomes in panel A. Each point is a genome colored by its geographic location. CDC, Centers for Disease Control and Prevention; MDPH, Massachusetts Department of Public Health.

(TIF)

S3 Fig. Phylogenetic tree colored by institution.

MCC tree of the 225 mumps virus genotype G genomes used in this study, colored by academic institution. Clades are labeled as in Fig 1A. MCC, maximum clade credibility.

(TIF)

S4 Fig. Amino acid substitution in the mumps virus genome.

(A) Variation in genomes generated in this study. Each row represents one of the 119 mumps HN amino acid sequences from the individuals in our study who had known vaccine status. Samples are displayed in order of descending time since last MMR vaccine dose. Colored variants indicate variation from the consensus of all included sequences. (B) Variation in all published genotype G HN sequences. Each row represents one of the 456 publicly available mumps genotype G HN sequences (including from genomes generated in this study). Identical sequences are collapsed and then grouped by hierarchical clustering. In both panels, amino acid substitutions relative to the Jeryl Lynn vaccine strain are highlighted in blue, with orange indicating a second variant allele and green indicating a third. Light red bars indicate possible neutralizing antibody epitopes, and dark red bars indicate potential N-glycosylation sites. (C) Estimate of dN/dS at each amino acid site in MuV coding regions, calculated across all 225 genotype G genomes used in this study. At each site, the mean estimate and 95% credible interval (not corrected for multiple testing) are shown. (D) Posterior density of dN/dS in each gene, using the same data set. (E) MCC tree of the 225 mumps genotype G genomes used in this study, colored by vaccination status. Clades are labeled as in Fig 1A. HN, hemagglutinin-neuraminidase; MCC, maximum clade credibility; MMR, Measles-Mumps-Rubella; MuV, mumps virus.

(TIF)

S5 Fig. Comparison of PCR-positive and PCR-negative Massachusetts samples.

Comparison between 521 mumps PCR-negative samples tested in Massachusetts between 2016-01-01 and 2017-06-30, and 198 mumps samples from unique patients in Massachusetts in the same time period. In all panels, percentages have been recalculated after removing unknowns. (A) Vaccination status of positives (n = 198) and negatives (n = 309) with known vaccination status. A chi-square test suggests there is no relationship between PCR result and vaccination status (p = 0.012). (B) Years since vaccination for positives (n = 198) and negatives (n = 309) with known vaccination status. A chi-square test suggests there is a relationship between PCR result and years since vaccination (p = 1.19 × 10^−7), and the plot shows that most recently vaccinated individuals were PCR-negative. (C) Collection interval for positives (n = 196) and negatives (n = 477) with known symptom onset and known collection date. A chi-square test suggests there is no relationship between PCR result and collection interval (p = 0.918). PCR, polymerase chain reaction.

(TIF)

S6 Fig. Parameters used in epidemiological models.

We illustrate fitted distributions of parameters of the modeled natural history of mumps infection. (A) We calibrate a gamma distribution to the duration of the incubation period (defined from the time of mumps virus exposure to the onset of shedding) using data from experimental human mumps virus infections with known exposure times [75]. (B) Onset of mumps shedding generally precedes onset of symptoms in the clinical course. We fit a gamma distribution describing the period of latent shedding to pooled data from 2 studies [75] and (C) apply previous estimates of the distribution of the duration of shedding after parotitis onset [77]. (D) We obtain estimates of the distribution of vaccine protection within a university population by pairing previous estimates of the association between the strength of vaccine protection and time since receipt of the last dose [5] with data on vaccine coverage in a large university [80]. (E) We infer the distribution of the generation interval length in the Harvard outbreak using data from 10 cases with known exposure sources (“contact link”). A gamma distribution fitted by maximum likelihood recovers mean and dispersion estimates nearly identical to those reported in earlier mumps outbreaks [82].

(TIF)

S7 Fig. Connection between epidemiological and genetic data.

(A) Multidimensional scaling applied to samples in Clade II-outbreak (see Fig 1A). Each point is a mumps genome and pairwise dissimilarities are based on Hamming distance (see Materials and methods). Genomes with known epidemiological links are connected with a red line. (B) ROC curve for samples within Clade II-outbreak using pairwise genetic distance (calculated as in panel A) as a predictor of epidemiological linkage. (C) Transmission reconstruction using individuals within Clade II-outbreak, using the SH gene or whole genome sequences. Inferred links with probability ≥0.7 are shown; arrows are colored by data set. Nodes with one or more inferred links are shown and are colored by institution. See also Fig 2C, right. ROC, receiver operating characteristic; SH, small hydrophobic.

(TIF)

S8 Fig. Trees produced with single-gene and multigene sequences.

Maximum likelihood trees using (A) the SH gene only, (B) the HN protein only, (C) a concatenation of the HN protein, the F coding region, and the SH gene, and (D) the complete mumps genome. In all panels, tips are colored by clades as defined in Fig 1A and S2 Table. The HN protein sequence captures the epidemiologically relevant clades considerably better than the SH gene, and the tree created from SH+HN+F (nearly 25% of the genome) closely resembles the tree created from whole genome sequences. F, fusion protein; HN, hemagglutinin-neuraminidase; SH, small hydrophobic.

(TIF)

S9 Fig. tMRCA probability distributions for mumps genotypes using SH gene sequences.

The date of the most recent genotype A clinical sample is indicated, excluding samples that closely resemble a mumps vaccine strain. SH, small hydrophobic; tMRCA, time to the most recent common ancestor.

(TIF)

S10 Fig. Additional analyses of global mumps spread using SH gene sequences.

(A) World map indicating number of SH sequences in our data set from each of 15 regions; the 4 circled global regions represent the 4 regions from which we resampled input for migration analyses (see Materials and methods for details regarding geographic and temporal resampling of sequences). (B) Tree with the highest clade credibility across all trees generated on resampled input from 4 global regions. Branch line thickness corresponds to posterior support for ancestry (indicated by branch color). (C) Migration between the 4 global regions shown in panel A. Each plot shows a posterior probability density, taken across resampled input, of the fraction of all reconstructed migrations that occur to the destination (indicated in upper right) from each of the other 3 sources. (D) Migration between the 4 global regions shown in panel A. Shading of each migration route indicates its statistical support (quantified with BF) in explaining the diffusion of mumps virus. (E) Average proportion of geographic ancestry of samples in each of the 4 global regions (labeled) from each of the 4 regions (colored), going back 5 years from sample collection. (F) Average proportion of Europe in geographic ancestry of US samples, and vice versa. Shaded regions are pointwise percentile bands (2.5% to 97.5%) across 100 resamplings of the input sequences. Colors for panels B through F are by global region, as shown in the bottom right. BF, Bayes factor; SH, small hydrophobic.

(TIF)

S11 Fig. Skygrid reconstruction of population size.

The scaled population size (N_eτ) according to a Skygrid reconstruction. Dark line represents the median population size across samples from the posterior, and shaded area represents the 95% HPD interval. HPD, highest posterior density.

(TIF)

S1 Table. Sample metadata.

Demographic information of all mumps cases in Massachusetts between 2016-01-01 and 2017-06-30, and the subset of these included in this study.

(TIF)

S2 Table. Model selection and tMRCA estimates across models.

(A) Marginal likelihoods estimated in 6 models: combinations of 3 coalescent tree priors (constant size population, exponential growth population, and Skygrid) and 2 clock models (strict clock and uncorrelated relaxed clock with log-normal distribution). Estimates are with PS and SS. The BFs are calculated against the model with constant size population and a strict clock. (B) Mean estimates of clock rate, date of tree root, and tMRCAs of the clades shown in Fig 1A (excluding Clade 0-HU, which consists of one sample). USA-4 corresponds to “Clades I and II” in Fig 1A. Below each mean estimate is the 95% highest posterior density interval. tMRCAs of additional unlabeled nodes are available at http://doi.org/10.5281/zenodo.3338599. BF, Bayes factor; PS, path sampling; SS, stepping-stone sampling; tMRCA, time to the most recent common ancestor.

(TIF)

S3 Table. Viruses identified in mumps PCR-negative samples.

Influenzavirus B is the only segmented virus listed, and we identify 51 reads mapping to 6 of the 8 segments: in order, 2, 6, 0, 0, 8, 10, 4, 21 reads to each segment. PCR, polymerase chain reaction.

(TIF)

S1 Data. Sample metadata.

Information and metadata regarding all mumps PCR-positive and PCR-negative samples on which sequencing was attempted. PCR, polymerase chain reaction.

(XLSX)

S2 Data. Single nucleotide polymorphisms.

List of identified nonsynonymous SNPs, including the frequency of each SNP. Separate list of identified within-host variants above 2% frequency. Separate list of SNPs in possible immunogenic regions of HN and NP in US samples from 2016 to 2017 (all positions relative to the relevant gene). HN, hemagglutinin-neuraminidase; NP, nucleoprotein; SNP, single nucleotide polymorphism.

(XLSX)

S3 Data. dN/dS values.

Calculated dN/dS value and confidence interval for each amino acid position across the mumps virus genome.

(XLSX)

S4 Data. Publicly available SH and HN sequences.

List of SH and HN sequences and corresponding metadata used in analysis, as well as SH sequences available for resampling. HN, hemagglutinin-neuraminidase; SH, small hydrophobic.

(XLSX)

S1 Text

(DOCX)

Acknowledgments

We thank A. Matthews and S. Winnicki for management and guidance; I. Shlyakhter, S. Weingarten-Gabbay, S. Ye, C. Tomkins-Tinch, and other members of the Sabeti Laboratory for discussions and reading of the manuscript; J. Hall, P. Patel, E. Buzby, K. Chen, and F. Halpern-Smith for mumps diagnosis and laboratory support; A. Osinski, C. Brandeburg, H. Johnson, J. Cohen, K. Royce, M. Popstefanija, N. Harrington, R. Hernandez, and J. Leaf for case management and epidemiological investigation; T. Mason and the Broad Institute Genomics Platform for sequencing support; M. Salit for sharing reagents. We are indebted to mumps patients and clinical and epidemiological teams for making this work possible.

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention, the National Institute of General Medical Sciences, the National Institute of Allergy and Infectious Diseases, or the National Institutes of Health.

Abbreviations

BF, Bayes factor
BSSVS, Bayesian stochastic search variable selection
BU, Boston University
CDC, US Centers for Disease Control and Prevention
CI, confidence interval
Ct, cycle threshold
F, fusion protein
Harvard, Harvard University
HN, hemagglutinin-neuraminidase
HPD, highest posterior density
MA, Massachusetts
MCC, maximum clade credibility
MDPH, Massachusetts Department of Public Health
MMR, Measles-Mumps-Rubella
MuV, mumps virus
NP, nucleoprotein
PCA, principal component analysis
PCR, polymerase chain reaction
PS, path sampling
R_E, effective reproduction number
ROC, receiver operating characteristic
RT-qPCR, real-time quantitative polymerase chain reaction
SH, small hydrophobic
SNP, single nucleotide polymorphism
SS, stepping-stone sampling
tMRCA, time to the most recent common ancestor
UMass, University of Massachusetts Amherst

Data Availability

All code and data generated as part of this study are publicly available at: http://doi.org/10.5281/zenodo.3338599. The 203 mumps virus whole genome sequences generated in this study, as well as nine low quality sequences not included in the analysis, are also available on NCBI GenBank under BioProject accession PRJNA394142 (accession numbers MF965196–MF965318 and MG986380–MG986468). All associated metadata is available in linked NCBI BioSample entries, and in S1 Data.

Funding Statement

Funding was provided by: NIH NIAID U19AI110818 (Broad Institute); NIH NIAID U54GM088558 (J.A.L.); Howard Hughes Medical Institute (P.C.S.); Harvard University Burke Global Health Fellowship (P.C.S.); Amazon Web Services Cloud Credits for Research (P.C.S.). The project described was supported by award number T32GM007753 from the National Institute of General Medical Sciences (E.H.B.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Centers for Disease Control and Prevention. Mumps Cases and Outbreaks. 9 Feb 2018. https://www.cdc.gov/mumps/outbreaks.html. [cited 9 Mar 2018].
  • 2. Centers for Disease Control and Prevention. FastStats—Immunization. J Infect Dis. 2017;198: 508–515.
  • 3. Centers for Disease Control and Prevention. Measles Prevention: Recommendations of the Immunization Practices Advisory Committee (ACIP). MMWR Surveill Summ. 1989;38: 1–18.
  • 4. Dayan GH, Quinlisk MP, Parker AA, Barskey AE, Harris ML, Schwartz JMH, et al. Recent resurgence of mumps in the United States. N Engl J Med. 2008;358: 1580–1589. 10.1056/NEJMoa0706589
  • 5. Lewnard JA, Grad YH. Vaccine waning and mumps re-emergence in the United States. Sci Transl Med. 2018.
  • 6. Centers for Disease Control and Prevention. National Notifiable Diseases Surveillance System. 2017.
  • 7. Walkty A, Van Caeseele P, Hilderman T, Buchan S, Weiss E, Sloane M, et al. Mumps in prison: description of an outbreak in Manitoba, Canada. Can J Public Health. 2011;102: 341–344.
  • 8. Barskey AE, Schulte C, Rosen JB, Handschur EF, Rausch-Phung E, Doll MK, et al. Mumps outbreak in Orthodox Jewish communities in the United States. N Engl J Med. 2012;367: 1704–1713. 10.1056/NEJMoa1202865
  • 9. Fields VS, Safi H, Waters C, Dillaha J, Capelle L, Riklon S, et al. Mumps in a highly vaccinated Marshallese community in Arkansas, USA: an outbreak report. Lancet Infect Dis. 2019;19: 185–192. 10.1016/S1473-3099(18)30607-8
  • 10. Faria NR, Kraemer MUG, Hill SC, Goes de Jesus J, Aguiar RS, Iani FCM, et al. Genomic and epidemiological monitoring of yellow fever virus transmission potential. Science. 2018;361: 894–899. 10.1126/science.aat7115
  • 11. Weill F-X, Domman D, Njamkepo E, Almesbahi AA, Naji M, Nasher SS, et al. Genomic insights into the 2016–2017 cholera epidemic in Yemen. Nature. 2019. 10.1038/s41586-018-0818-3
  • 12. Gardy JL, Johnston JC, Ho Sui SJ, Cook VJ, Shah L, Brodkin E, et al. Whole-genome sequencing and social-network analysis of a tuberculosis outbreak. N Engl J Med. 2011;364: 730–739. 10.1056/NEJMoa1003176
  • 13. Holmes EC, Ghedin E, Miller N, Taylor J, Bao Y, St George K, et al. Whole-genome analysis of human influenza A virus reveals multiple persistent lineages and reassortment among recent H3N2 viruses. PLoS Biol. 2005;3(9): e300 10.1371/journal.pbio.0030300
  • 14. Yuan L, Huang X-Y, Liu Z-Y, Zhang F, Zhu X-L, Yu J-Y, et al. A single mutation in the prM protein of Zika virus contributes to fetal microcephaly. Science. 2017;358: 933–936. 10.1126/science.aam7120
  • 15. Diehl WE, Lin AE, Grubaugh ND, Carvalho LM, Kim K, Kyawe PP, et al. Ebola Virus Glycoprotein with Increased Infectivity Dominated the 2013–2016 Epidemic. Cell. 2016;167: 1088–1098.e6. 10.1016/j.cell.2016.10.014
  • 16. Kugelman JR, Sanchez-Lockhart M, Andersen KG, Gire S, Park DJ, Sealfon R, et al. Evaluation of the potential impact of Ebola virus genomic drift on the efficacy of sequence-based candidate therapeutics. MBio. 2015;6 10.1128/mBio.02227-14
  • 17. Grubaugh ND, Ladner JT, Lemey P, Pybus OG, Rambaut A, Holmes EC, et al. Tracking virus outbreaks in the twenty-first century. Nat Microbiol. 2019;4: 10–19. 10.1038/s41564-018-0296-2
  • 18. Snitkin ES, Zelazny AM, Thomas PJ, Stock F, NISC Comparative Sequencing Program Group, Henderson DK, et al. Tracking a hospital outbreak of carbapenem-resistant Klebsiella pneumoniae with whole-genome sequencing. Sci Transl Med. 2012;4: 148ra116.
  • 19. Gardy JL, Loman NJ. Towards a genomics-informed, real-time, global pathogen surveillance system. Nat Rev Genet. 2018;19: 9–20. 10.1038/nrg.2017.88
  • 20. Jin L, Rima B, Brown D, Orvell C, Tecle T, Afzal M, et al. Proposal for genetic characterisation of wild-type mumps strains: preliminary standardisation of the nomenclature. Arch Virol. 2005;150: 1903–1909. 10.1007/s00705-005-0563-4
  • 21. World Health Organization. Mumps virus nomenclature update: 2012. Weekly Epidemiological Record. 2012;87: 217–224. 24340404
  • 22. Elbadawi LI, Talley P, Rolfes MA, Millman AJ, Reisdorf E, Kramer NA, et al. Non-mumps Viral Parotitis During the 2014–2015 Influenza Season in the United States. Clin Infect Dis. 2018;67: 493–501. 10.1093/cid/ciy137
  • 23. Centers for Disease Control and Prevention. Mumps For Healthcare Providers. 12 Jul 2018. https://www.cdc.gov/mumps/hcp.html. [cited 20 Dec 2018].
  • 24. Centers for Disease Control and Prevention (CDC). Update: mumps outbreak—New York and New Jersey, June 2009-January 2010. MMWR Morb Mortal Wkly Rep. 2010;59: 125–129.
  • 25. Moncla L BA. Multiple introductions of mumps virus into Washington State. 22 Oct 2018. https://bedford.io/blog/mumps-in-washington-state/. [cited 20 Dec 2018].
  • 26. Kimberlin DW, Brady MT, Jackson MA, Long SS. Red Book, (2015): 2015 Report of the Committee on Infectious Diseases. Am Acad Pediatrics; 2015.
  • 27. Gouma S, Vermeire T, Van Gucht S, Martens L, Hutse V, Cremer J, et al. Differences in antigenic sites and other functional regions between genotype A and G mumps virus surface proteins. Sci Rep. 2018;8: 13337 10.1038/s41598-018-31630-z
  • 28. Wolinsky JS, Waxham MN, Server AC. Protective effects of glycoprotein-specific monoclonal antibodies on the course of experimental mumps virus meningoencephalitis. J Virol. 1985;53: 727–734.
  • 29. Rasheed MAU, Hickman CJ, McGrew M, Sowers SB, Mercader S, Hopkins A, et al. Decreased humoral immunity to mumps in young adults immunized with MMR vaccine in childhood. Proc Natl Acad Sci U S A. 2019. 10.1073/pnas.1905570116
  • 30. Rubin SA, Link MA, Sauder CJ, Zhang C, Ngo L, Rima BK, et al. Recent mumps outbreaks in vaccinated populations: no evidence of immune escape. J Virol. 2012;86: 615–620. 10.1128/JVI.06125-11
  • 31. Rubin SA, Qi L, Audet SA, Sullivan B, Carbone KM, Bellini WJ, et al. Antibody induced by immunization with the Jeryl Lynn mumps vaccine strain effectively neutralizes a heterologous wild-type mumps virus associated with a large outbreak. J Infect Dis. 2008;198: 508–515. 10.1086/590115
  • 32. Clemmons N, Hickman C, Lee A, Marin M, Patel M. Mumps. In: Centers for Disease Control and Prevention, editor. Manual for the Surveillance of Vaccine-Preventable Diseases. 2018.
  • 33. Yeo RP, Afzal MA, Forsey T, Rima BK. Identification of a new mumps virus lineage by nucleotide sequence analysis of the SH gene of ten different strains. Arch Virol. 1993;128: 371–377. 10.1007/bf01309447
  • 34. Jin L, Örvell C, Myers R, Rota PA, Nakayama T, Forčić D, et al. Genomic diversity of mumps virus and global distribution of the 12 genotypes. Rev Med Virol. 2015;25: 85–101. 10.1002/rmv.1819
  • 35. Centers for Disease Control and Prevention. Mumps 2012 Case Definition. In: National Notifiable Diseases Surveillance System (NNDSS) [Internet]. 2012. https://wwwn.cdc.gov/nndss/conditions/mumps/case-definition/2012/. [cited 9 Mar 2018].
  • 36. Centers for Disease Control and Prevention. Real-time (TaqMan) RT-PCR Assay for the Detection of Mumps Virus RNA in Clinical Samples. 2 Apr 2010. https://www.cdc.gov/mumps/downloads/lab-rt-pcr-assay-detect.pdf. [cited 9 Mar 2018].
  • 37. Van Camp G, Chapelle S, De Wachter R. Amplification and sequencing of variable regions in bacterial 23S ribosomal RNA genes with conserved primer sequences. Curr Microbiol. 1993;27: 147–151. 10.1007/bf01576012
  • 38. Matranga CB, Andersen KG, Winnicki S, Busby M, Gladden AD, Tewhey R, et al. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples. Genome Biol. 2014;15.
  • 39. Metsky HC, Siddle KJ, Gladden-Young A, Qu J, Yang DK, Brehio P, et al. Capturing sequence diversity in metagenomes with comprehensive and scalable probe design. Nat Biotechnol. 2019;37: 160–168. 10.1038/s41587-018-0006-x
  • 40. Chris Tomkins-Tinch, Simon Ye, Hayden Metsky, Irwin Jungreis, Rachel Sealfon, Xiao Yang, Kristian Andersen, Mike Lin, and Daniel Park. viral-ngs. 10.5281/zenodo.1030020. [cited 30 Sep 2019].
  • 41. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25: 2078–2079. 10.1093/bioinformatics/btp352
  • 42. Wood DE, Salzberg SL. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 2014;15: R46 10.1186/gb-2014-15-3-r46
  • 43. Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2016;44: D67 10.1093/nar/gkv1276
  • 44. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29: 644–652. 10.1038/nbt.1883
  • 45. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215: 403–410. 10.1016/S0022-2836(05)80360-2
  • 46. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19: 455–477. 10.1089/cmb.2012.0021
  • 47. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2015;12: 59–60. 10.1038/nmeth.3176
  • 48. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30: 772–780. 10.1093/molbev/mst010
  • 49. Metsky HC, Matranga CB, Wohl S, Schaffner SF, Freije CA, Winnicki SM, et al. Zika virus evolution and spread in the Americas. Nature. 2017;546: 411–415. 10.1038/nature22402
  • 50. Yang X, Charlebois P, Macalalad A, Henn MR, Zody MC. V-Phaser 2: variant inference for viral populations. BMC Genomics. 2013;14: 674 10.1186/1471-2164-14-674
  • 51. Gire SK, Goba A, Andersen KG, Sealfon RSG, Park DJ, Kanneh L, et al. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science. 2014;345: 1369–1372. 10.1126/science.1259657
  • 52. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32: 268–274. 10.1093/molbev/msu300
  • 53. Rambaut A. FigTree. 2016. http://tree.bio.ed.ac.uk/software/figtree/. [cited 30 Sep 2019].
  • 54. Rambaut A, Lam TT, Max Carvalho L, Pybus OG. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen). Virus Evol. 2016;2: vew007.
  • 55. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12: 2825–2830.
  • 56. Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures on mathematics in the life sciences. 1986;17: 57–86.
  • 57. Drummond AJ, Suchard MA, Xie D, Rambaut A. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 2012;29: 1969–1973. 10.1093/molbev/mss075
  • 58. Paterson RG, Lamb RA. RNA editing by G-nucleotide insertion in mumps virus P-gene mRNA transcripts. J Virol. 1990;64: 4137–4145.
  • 59. Shapiro B, Rambaut A, Drummond AJ. Choosing appropriate substitution models for the phylogenetic analysis of protein-coding sequences. Mol Biol Evol. 2006;23: 7–9. 10.1093/molbev/msj021
  • 60. Hasegawa M, Kishino H, Yano T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol. 1985;22: 160–174. 10.1007/bf02101694
  • 61. Yang Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol. 1994;39: 306–314. 10.1007/bf00160154
  • 62. Shapiro B, Ho SYW, Drummond AJ, Suchard MA, Pybus OG, Rambaut A. A Bayesian phylogenetic method to estimate unknown sequence ages. Mol Biol Evol. 2011;28: 879–887. 10.1093/molbev/msq262
  • 63. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A. Relaxed phylogenetics and dating with confidence. PLoS Biol. 2006;4(5): e88 10.1371/journal.pbio.0040088
  • 64. Gill MS, Lemey P, Faria NR, Rambaut A, Shapiro B, Suchard MA. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol Biol Evol. 2013;30: 713–724. 10.1093/molbev/mss265
  • 65. Baele G, Lemey P, Bedford T, Rambaut A, Suchard MA, Alekseyenko AV. Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol Biol Evol. 2012;29: 2157–2167. 10.1093/molbev/mss084
  • 66. Baele G, Li WLS, Drummond AJ, Suchard MA, Lemey P. Accurate Model Selection of Relaxed Molecular Clocks in Bayesian Phylogenetics. Molecular Biology and Evolution. 2012. pp. 239–243. 10.1093/molbev/mss243
  • 67. O’Brien JD, Minin VN, Suchard MA. Learning to count: robust estimates for labeled distances between molecular sequences. Mol Biol Evol. 2009;26: 801–814. 10.1093/molbev/msp003
  • 68. Lemey P, Minin VN, Bielejec F, Kosakovsky Pond SL, Suchard MA. A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection. Bioinformatics. 2012;28: 3248–3256. 10.1093/bioinformatics/bts580
  • 69. Goldman N, Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994;11: 725–736. 10.1093/oxfordjournals.molbev.a040153
  • 70. Drummond AJ, Rambaut A, Shapiro B, Pybus OG. Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol. 2005;22: 1185–1192. 10.1093/molbev/msi103
  • 71. Josse J, Husson F, Others. missMDA: a package for handling missing values in multivariate data analysis. J Stat Softw. 2016;70: 1–31.
  • 72. Lê S, Josse J, Husson F. FactoMineR: an R package for multivariate analysis. J Stat Softw. 2008. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.422.7829&rep=rep1&type=pdf. [cited 9 Mar 2018].
  • 73. Troppy S, Haney G, Cocoros N, Cranston K, DeMaria A Jr. Infectious disease surveillance in the 21st century: an integrated web-based surveillance and case management system. Public Health Rep. 2014;129: 132–138. 10.1177/003335491412900206
  • 74. R Core Team. R: A Language and Environment for Statistical Computing. In: R Foundation for Statistical Computing [Internet]. https://www.R-project.org. [cited 9 Mar 2018].
  • 75. Henle G, Henle W. Isolation of mumps virus from human beings with induced apparent or inapparent infections. J Exp Med. 1948;88: 223–232. 10.1084/jem.88.2.223
  • 76. Ennis FA, Jackson D. Isolation of virus during the incubation period of mumps infection. J Pediatr. 1968;72: 536–537. 10.1016/s0022-3476(68)80347-6
  • 77. Polgreen PM, Bohnett LC, Cavanaugh JE, Gingerich SB, Desjardin LE, Harris ML, et al. The duration of mumps virus shedding after the onset of symptoms. Clin Infect Dis. 2008;46: 1447–1449. 10.1086/587104
  • 78. Galazka AM, Robertson SE, Kraigher A. Mumps and mumps vaccine: a global review. Bull World Health Organ. 1999;77: 3–14.
  • 79.Dittrich S, Hahné S, van Lier A, Kohl R, Boot H, Koopmans M, et al. Assessment of serological evidence for mumps virus infection in vaccinated children. Vaccine. 2011;29: 9271–9275. 10.1016/j.vaccine.2011.09.072 [DOI] [PubMed] [Google Scholar]
  • 80.Cardemil CV, Dahl RM, James L, Wannemuehler K, Gary HE, Shah M, et al. Effectiveness of a Third Dose of MMR Vaccine for Mumps Outbreak Control. N Engl J Med. 2017;377: 947–956. 10.1056/NEJMoa1703309 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 81.Jombart T, Cori A, Didelot X, Cauchemez S, Fraser C, Ferguson N. Bayesian reconstruction of disease outbreaks by combining epidemiologic and genomic data. PLoS Comput Biol. 2014;10(1): e1003457 10.1371/journal.pcbi.1003457 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 82.Vink MA, Bootsma MCJ, Wallinga J. Serial intervals of respiratory infectious diseases: a systematic review and analysis. Am J Epidemiol. 2014;180: 865–875. 10.1093/aje/kwu209 [DOI] [PubMed] [Google Scholar]
  • 83.Bedford T, Riley S, Barr IG, Sroor S, Chadha M, Cox NJ, et al. Global circulation patterns of seasonal influenza viruses vary with antigenic drift. Nature. 2015;523: 217–220. 10.1038/nature14460 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 84.Lemey P, Rambaut A, Drummond AJ, Suchard MA. Bayesian phylogeography finds its roots. PLoS Comput Biol. 2009;5(9): e1000520 10.1371/journal.pcbi.1000520 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 85.Gong LI, Suchard MA, Bloom JD. Stability-mediated epistasis constrains the evolution of an influenza protein. Elife. 2013;2: e00631 10.7554/eLife.00631 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 86.Minin VN, Suchard MA. Fast, accurate and simulation-free stochastic mapping. Philos Trans R Soc Lond B Biol Sci. 2008;363: 3985–3995. 10.1098/rstb.2008.0176 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 87.Bedford T. Posterior Analysis of Coalescent Trees (PACT). https://github.com/trvrb/PACT. [cited 14 Jan 2018].
  • 88.Rota JS, Rosen JB, Doll MK, McNall RJ, McGrew M, Williams N, et al. Comparison of the sensitivity of laboratory diagnostic methods from a well-characterized outbreak of mumps in New York city in 2009. Clin Vaccine Immunol. 2013;20: 391–396. 10.1128/CVI.00660-12 [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision Letter 0

Lauren A Richardson

24 Jul 2019

Dear Dr Sabeti,

Thank you for submitting your manuscript entitled "Combining genomics and epidemiology to track mumps virus transmission in the United States" for consideration as a Research Article by PLOS Biology.

Your manuscript has now been evaluated by the PLOS Biology editorial staff as well as by an academic editor with relevant expertise and I am writing to let you know that we would like to send your submission out for external peer review.

However, before we can send your manuscript to reviewers, we need you to complete your submission by providing the metadata that is required for full assessment. To this end, please login to Editorial Manager where you will find the paper in the 'Submissions Needing Revisions' folder on your homepage. Please click 'Revise Submission' from the Action Links and complete all additional questions in the submission questionnaire.

*Please be aware that, due to the voluntary nature of our reviewers and academic editors, manuscripts may be subject to delays during the holiday season. Thank you for your patience.*

**Important**: Please also see below for further information regarding completing the MDAR reporting checklist. The checklist can be accessed here: https://plos.io/MDARChecklist

Please re-submit your manuscript and the checklist, within two working days, i.e. by Jul 26 2019 11:59PM.

Login to Editorial Manager here: https://www.editorialmanager.com/pbiology

During resubmission, you will be invited to opt-in to posting your pre-review manuscript as a bioRxiv preprint. Visit http://journals.plos.org/plosbiology/s/preprints for full details. If you consent to posting your current manuscript as a preprint, please upload a single Preprint PDF when you re-submit.

Once your full submission is complete, your paper will undergo a series of checks in preparation for peer review. Once your manuscript has passed all checks it will be sent out for review.

Feel free to email us at plosbiology@plos.org if you have any queries relating to your submission.

Kind regards,

Lauren A Richardson, Ph.D

Senior Editor

PLOS Biology

==================

INFORMATION REGARDING THE REPORTING CHECKLIST:

PLOS Biology is pleased to support the "minimum reporting standards in the life sciences" initiative (https://osf.io/preprints/metaarxiv/9sm4x/). This effort brings together a number of leading journals and reproducibility experts to develop minimum expectations for reporting information about Materials (including data and code), Design, Analysis and Reporting (MDAR) in published papers. We believe broad alignment on these standards will be to the benefit of authors, reviewers, journals and the wider research community and will help drive better practise in publishing reproducible research.

We are therefore participating in a community pilot involving a small number of life science journals to test the MDAR checklist. The checklist is intended to help authors, reviewers and editors adopt and implement the minimum reporting framework.

IMPORTANT: We have chosen your manuscript to participate in this trial. The relevant documents can be located here:

MDAR reporting checklist (to be filled in by you): https://plos.io/MDARChecklist

**We strongly encourage you to complete the MDAR reporting checklist and return it to us with your full submission, as described above.** We would also be very grateful if you could complete this author survey:

https://forms.gle/seEgCrDtM6GLKFGQA

Additional background information:

Interpreting the MDAR Framework: https://plos.io/MDARFramework

Please note that your completed checklist and survey will be shared with the minimum reporting standards working group. However, the working group will not be provided with access to the manuscript or any other confidential information including author identities, manuscript titles or abstracts. Feedback from this process will be used to consider next steps, which might include revisions to the content of the checklist. Data and materials from this initial trial will be publicly shared in September 2019. Data will only be provided in aggregate form and will not be parsed by individual article or by journal, so as to respect the confidentiality of responses.

Please treat the checklist and elaboration as confidential as public release is planned for September 2019.

We would be grateful for any feedback you may have.

Decision Letter 1

Lauren A Richardson

4 Sep 2019

Dear Dr Sabeti,

Thank you very much for submitting your manuscript "Combining genomics and epidemiology to track mumps virus transmission in the United States" for consideration as a Research Article at PLOS Biology. Your manuscript has been evaluated by the PLOS Biology editors, an Academic Editor with relevant expertise, and by several independent reviewers.

As you will read, the reviewers appreciate many aspects of your study. However, they also raise some concerns that will need to be addressed in a revision. Of particular note, Reviewers #2 and #3 both believe that further evidence that vaccine escape is not contributing to the outbreak is needed. Reviewer #2 and the Academic Editor also believe that the scope of the results and title should be tempered to reflect the region where samples were obtained.

In light of the reviews (below), we will not be able to accept the current version of the manuscript, but we would welcome resubmission of a much-revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent for further evaluation by the reviewers.

Your revisions should address the specific points made by each reviewer. Please submit a file detailing your responses to the editorial requests and a point-by-point response to all of the reviewers' comments that indicates the changes you have made to the manuscript. In addition to a clean copy of the manuscript, please upload a 'track-changes' version of your manuscript that specifies the edits made. This should be uploaded as a "Related" file type. You should also cite any additional relevant literature that has been published since the original submission and mention any additional citations in your response.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

Before you revise your manuscript, please review the following PLOS policy and formatting requirements checklist PDF: http://journals.plos.org/plosbiology/s/file?id=9411/plos-biology-formatting-checklist.pdf. It is helpful if you format your revision according to our requirements - should your paper subsequently be accepted, this will save time at the acceptance stage.

Please note that as a condition of publication PLOS' data policy (http://journals.plos.org/plosbiology/s/data-availability) requires that you make available all data used to draw the conclusions arrived at in your manuscript. If you have not already done so, you must include any data used in your manuscript either in appropriate repositories, within the body of the manuscript, or as supporting information (N.B. this includes any numerical values that were used to generate graphs, histograms etc.). For an example see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

For manuscripts submitted on or after 1st July 2019, we require the original, uncropped and minimally adjusted images supporting all blot and gel results reported in an article's figures or Supporting Information files. We will require these files before a manuscript can be accepted so please prepare them now, if you have not already uploaded them. Please carefully read our guidelines for how to prepare and upload this data: https://journals.plos.org/plosbiology/s/figures#loc-blot-and-gel-reporting-requirements.

Upon resubmission, the editors will assess your revision and if the editors and Academic Editor feel that the revised manuscript remains appropriate for the journal, we will send the manuscript for re-review. We aim to consult the same Academic Editor and reviewers for revised manuscripts but may consult others if needed.

We expect to receive your revised manuscript within two months. Please email us (plosbiology@plos.org) to discuss this if you have any questions or concerns, or would like to request an extension. At this stage, your manuscript remains formally under active consideration at our journal; please notify us by email if you do not wish to submit a revision and instead wish to pursue publication elsewhere, so that we may end consideration of the manuscript at PLOS Biology.

When you are ready to submit a revised version of your manuscript, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' where you will find your submission record.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive thus far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Lauren A Richardson, Ph.D

Senior Editor

PLOS Biology

*****************************************************

Reviews

Reviewer #1: Nathan D. Grubaugh, signed review

Wohl et al. provide an excellent investigation of the 2016-17 mumps virus outbreak in Massachusetts using genomic epidemiology. While I didn't have access to previous reviews and author responses, this version reads very well and tells a very interesting and complete story. The analyses all seem appropriate and the figures look nice and are mostly clear. I highly recommend this for publication and provide only a couple minor comments about the figures.

1. It wasn't until I read the main text that I really understood figure 2 (and I typically look at all of the figures before reading). There is a lot of important info buried in the legend, especially to explain B and C. Could the authors add a label to show that the left panels are estimates made from epi data and the right panels include genomic data? If this is clear from the start, the figure would be much clearer and more powerful.

2. Given the space in the main text, I would recommend bringing some of the very nice and informative supplemental figures to the main text. The authors can choose, but I would recommend S1, S4 (but update the title to make it clear that this was used to investigate vaccine escape, then perhaps simplify it to showcase the main points), and S9.

--------------

Reviewer #2:

Wohl et al. present a molecular epidemiological analysis of recent mumps outbreaks in Massachusetts during 2016-17. Using whole genome viral data, they are able to reconstruct the spread of mumps between neighboring communities including several universities in the region and connect these local outbreaks to larger-scale transmission patterns in the United States. They also explore whether mumps has undergone antigenic evolution to escape vaccine-based immunity, but do not find any genetic changes that can be correlated with vaccination patterns.

While this paper reports a perfectly valid molecular epidemiological analysis, the viral sequence dataset they analyze is limited to only ~200 samples collected mainly in MA from 2016-17. The authors' conclusions about mumps circulation in the larger U.S. are therefore somewhat suspect, and I believe the authors overinterpret what such sparsely collected data can tell them about larger circulation patterns (see main criticism below).

Main criticism:

1) Results Lines 102-103: The authors find that a single mumps lineage has dominated in the US since 2006 and then argue that this “implies that continuous and extensive geographic movement of mumps virus, rather than isolated outbreaks, underlies mumps in the US” -> I don’t actually see evidence for this claim in the phylogeny shown in Figure 1. If anything there is a fair amount of clustering in the phylogeny by region, and lineages sampled in MA and NE one year are most closely related to lineages sampled in those regions the previous year. It’s therefore possible that the G genotype spread slowly throughout the US and the virus is now persisting locally in each region.

2) The possibility that mutations may have allowed for vaccine escape is of major interest, but no mutations were found to strongly correlate with vaccination status or show signs of positive selection (dN/dS > 1). But classic dN/dS ratios would be very underpowered to detect selection here (Kryazhimskiy et al, PLoS Genetics, 2008) since there has presumably not been enough time for the multiple mutations needed to produce a signal of positive selection under these tests to accumulate. In light of this, maybe it would be better to plot the frequency of each mutation among sampled viruses over time and look if any mutations in antigenic regions have dramatically increased in frequency?
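The frequency-over-time check suggested in point 2 could be sketched roughly as follows. This is only an illustration, not the authors' analysis: the function name, the period labels, and the toy aligned sequences are all hypothetical, and a real analysis would read sampling dates and an alignment from the study data.

```python
from collections import defaultdict

def variant_frequency_by_period(samples, site):
    """Return {period: {allele: frequency}} for one alignment column.

    `samples` is an iterable of (period, sequence) pairs, where `period`
    is any sortable label (e.g. "2016-Q4") and `sequence` is an aligned
    string. Missing or ambiguous calls ("N", "-") are skipped.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for period, seq in samples:
        allele = seq[site]
        if allele in ("N", "-"):
            continue
        counts[period][allele] += 1

    freqs = {}
    for period in sorted(counts):
        total = sum(counts[period].values())
        freqs[period] = {a: n / total for a, n in counts[period].items()}
    return freqs

# Hypothetical toy data: (sampling period, aligned sequence)
toy = [
    ("2016-Q3", "ACGT"),
    ("2016-Q3", "ACGT"),
    ("2016-Q4", "ACGT"),
    ("2016-Q4", "ATGT"),
    ("2017-Q1", "ATGT"),
    ("2017-Q1", "ATGT"),
    ("2017-Q1", "ANGT"),  # ambiguous call at site 1, ignored
]
freqs = variant_frequency_by_period(toy, site=1)
for period, by_allele in freqs.items():
    print(period, by_allele)
```

A rising frequency at a site in an antigenic region would only flag a candidate for follow-up; with sparse sampling it cannot by itself distinguish selection from drift or founder effects.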

Minor remarks:

Line 544: ‘e-i(t)’ -> I’m guessing this should be e^{-lambda_i(t)}?

--------------

Reviewer #3: Christophe Fraser, signed review

The authors generate approximately 200 new genomes of mumps virus circulating during an unusually large epidemic in and around Boston universities. They show that the virus consists of multiple widely circulating lineages. They highlight the increased resolution into transmission. Using the SH gene, they compare their sample to viruses circulating globally, and make the case for a persistent US-European lineage causing most cases. They demonstrate that the virus has an effective reproductive number RE well above one, and that control measures need to focus on blocking local transmission. A lot of work went into this impressively comprehensive analysis.

This study is well executed. The case that there is a lot of local transmission is convincing, and the finding of multiple closely related lineages circulating is surprising.

The authors also address the question of whether the virus has mutated to escape vaccine, and conclude that waning immunity is a more likely explanation for persistent transmission. I was less convinced by the robustness of this section of the paper. I was not convinced about the power of the dN dS analysis to detect escape from the vaccine selection pressure. At least, it was not clearly motivated what assumptions or hypotheses were being tested.

Perhaps more could be made of the data on time since vaccination, comparing negative and positive samples, and also formally testing whether substitutions are associated with time since vaccination.

The data sharing statement should refer both to the sequences and to the associated metadata; currently the authors only propose to share sequences.

Specific comments:

Introductory paragraph, lines 46-56. How sure are the authors that temporally changing patterns of reporting might not affect these observations?

Line 60. ‘academic institutions and other close contact settings.’ Please use more specific language. What is a ‘close contact setting’? Does this include a household?

Line 60, and repeated several times later. Where possible, avoid open ended ranges such as ‘at least 18’, and instead use ranges and estimates.

Line 65. I am not convinced that ‘phylogenomic analysis’ is a thing and think ‘phylogenetic analysis’ would be clearer.

Line 76. The number of genomes is variously 201 (here), 200 (line 96), and 203 (Line 220 and Figure S1). Please check to make this consistent, or specify if slightly different samples were used for different analyses.

Line 81. ‘Median sequencing depth’. Also specify 95% range.

Line 81. ‘All genomes were at least 82% complete.’ This is not a fair description of the sequencing success rate, if I understand correctly the information provided in Fig S1. The authors generated 203 genomes from 259 samples. So 56 samples (22%) were excluded, because of low viral loads associated with late collection of the samples. This is relevant information for readers interested in assessing the sequencing method used here. It is also possibly relevant as a sampling bias to consider in the discussion.
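The arithmetic behind the comment above can be checked in a few lines. The two counts are the ones quoted from Fig S1; the variable names are illustrative only.

```python
samples_attempted = 259  # samples, as quoted from Fig S1
genomes_passing = 203    # genomes generated from those samples

excluded = samples_attempted - genomes_passing
excluded_fraction = excluded / samples_attempted
success_fraction = genomes_passing / samples_attempted

print(f"excluded: {excluded} ({excluded_fraction:.1%})")
print(f"sequencing success rate: {success_fraction:.1%}")
```

Rounded to the nearest percent, the excluded share matches the 22% given in the comment.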

Lines 91-95. Whilst I am convinced that this conclusion is likely correct from the paper, I don’t see how you can conclude this from the analysis of the genomes (Figure 1). It rather seems to be a conclusion of the partial genes (Figure 3). To conclude this, you need to place the diversity in your sample in the context of the full global diversity, which you can only do with the SH gene. If you agree, then it seems you may need to present Figure 3 before Figure 1. I can see this is awkward since you want to showcase the genome trees & data, but you could make a case for the broad level information from a conventional SH analysis, and then zooming into the much higher resolution whole genome data.

Lines 129-139. To me, this was the most opaque section of the paper. First on dN/dS: Why is a dN/dS analysis appropriate for addressing the question of vaccine escape? Why would vaccine selection produce higher dN than non-vaccine immunological selection? Is the study powered for this comparison? It is not clear to me that a codon-by-codon analysis, as shown in Fig S4A or B, is valid without correcting for multiple testing, and I don’t understand what conclusion, if any, the authors are drawing from this analysis. In terms of the substitutions, the authors conclude that most mutations fixed here were already present in an ancestral sample (Iowa 2006), and thus that immune escape is unlikely (lines 991-998). The authors should make explicit the assumption of some kind of additive model here: why is it not plausible that one of the two mutations identified, 336 and 474, could alone be responsible for vaccine escape? Without further wet-lab neutralisation assays, it must be hard to tell. Overall, I can see why the authors did the analyses; the question is interesting. But the conclusions seem vague at this stage.
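To make the multiple-testing concern concrete, the sketch below adjusts a set of per-codon p-values. The review does not name a correction method; Bonferroni and Benjamini-Hochberg are used here only as two standard choices, and the p-values are hypothetical placeholders rather than values from the manuscript.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values (controls the family-wise error rate)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate).

    Results are returned in the same order as the input.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-codon p-values, one per tested site
codon_p = [0.01, 0.04, 0.03, 0.005]
print("Bonferroni:", bonferroni(codon_p))
print("Benjamini-Hochberg:", benjamini_hochberg(codon_p))
```

With thousands of codons tested across a genome, a site that looks notable at an unadjusted threshold can easily fail either correction, which is the substance of the concern.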

Lines 158-181. This is a convincing analysis, and really highlights the power of the whole genome data.

Methods. The authors use well-established in-house viral genomics methods and publicly available phylogenetic software.

BEAST analysis: could you show the estimated skygrid plots for effective population size (Ne.tau)?

PCA analysis (lines 478-481). Does this add anything to the paper?

Stochastic model. The branching process approach looks sensible. The model estimates for lines 581-586 seem interesting, but I can’t see any estimates in the paper?
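For readers unfamiliar with the approach, a generic branching process can be sketched as below. This is not the manuscript's model, whose details are not reproduced in this review: it is a plain Galton-Watson process with Poisson-distributed secondary cases, and the function names, the parameter `r_eff`, and the case cap are all assumptions made for illustration.

```python
import math
import random

def poisson_draw(mean, rng):
    """Draw from a Poisson distribution (Knuth's method; fine for small means)."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_outbreak(r_eff, initial_cases=1, max_cases=10_000, seed=None):
    """Return the number of cases in each generation of one simulated outbreak.

    Each case independently infects a Poisson(r_eff) number of new cases.
    The simulation stops at extinction or once `max_cases` is reached.
    """
    rng = random.Random(seed)
    generations = [initial_cases]
    total = initial_cases
    while generations[-1] > 0 and total < max_cases:
        new_cases = sum(poisson_draw(r_eff, rng) for _ in range(generations[-1]))
        generations.append(new_cases)
        total += new_cases
    return generations

# Example: one stochastic realisation with a hypothetical r_eff of 1.5
print(simulate_outbreak(r_eff=1.5, seed=1))
```

Repeating the simulation over many seeds gives a distribution of outbreak sizes, which is the kind of quantity such a model is typically fitted against.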

Table S1. Please include the time since vaccination for both the positive and negative samples. Could more not be made of a comparison between the two?

Lines 1005-1010. This seems an orphan paragraph. Could you add a sentence to place these findings in context?

Lines 1013-1045. These are interesting findings on lack of diversity in host, though paragraph 1035-1045 seems very speculative given the data.

Decision Letter 2

Lauren A Richardson

9 Dec 2019

Dear Dr Sabeti,

Thank you for submitting your revised Research Article entitled "Combining genomics and epidemiology to track mumps virus transmission in the United States" for publication in PLOS Biology. I have now obtained advice from two of the original reviewers and have discussed their comments with the Academic Editor.

Based on the reviews, we will probably accept this manuscript for publication, assuming that you will modify the manuscript to address the remaining points raised by the reviewers. Congratulations!

We expect to receive your revised manuscript within two weeks. Your revisions should address the specific points made by each reviewer. In addition to the remaining revisions and before we will be able to formally accept your manuscript and consider it "in press", we also need to ensure that your article conforms to our guidelines. A member of our team will be in touch shortly with a set of requests. As we can't proceed until these requirements are met, your swift response will help prevent delays to publication.

*Copyediting*

Upon acceptance of your article, your final files will be copyedited and typeset into the final PDF. While you will have an opportunity to review these files as proofs, PLOS will only permit corrections to spelling or significant scientific errors. Therefore, please take this final revision time to assess and make any remaining major changes to your manuscript.

NOTE: If Supporting Information files are included with your article, note that these are not copyedited and will be published as they are submitted. Please ensure that these files are legible and of high quality (at least 300 dpi) in an easily accessible file format. For this reason, please be aware that any references listed in an SI file will not be indexed. For more information, see our Supporting Information guidelines:

https://journals.plos.org/plosbiology/s/supporting-information

*Published Peer Review History*

Please note that you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. Please see here for more details:

https://blogs.plos.org/plos/2019/05/plos-journals-now-open-for-published-peer-review/

*Early Version*

Please note that an uncorrected proof of your manuscript will be published online ahead of the final version, unless you opted out when submitting your manuscript. If, for any reason, you do not want an earlier version of your manuscript published online, uncheck the box. Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us as soon as possible if you or your institution is planning to press release the article.

*Protocols deposition*

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosbiology/s/submission-guidelines#loc-materials-and-methods

*Submitting Your Revision*

To submit your revision, please go to https://www.editorialmanager.com/pbiology/ and log in as an Author. Click the link labelled 'Submissions Needing Revision' to find your submission record. Your revised submission must include a cover letter, a Response to Reviewers file that provides a detailed response to the reviewers' comments (if applicable), and a track-changes file indicating any changes that you have made to the manuscript.

Please do not hesitate to contact me should you have any questions.

Sincerely,

Lauren A Richardson, Ph.D

Senior Editor

PLOS Biology

------------------------------------------------------------------------

ETHICS STATEMENT:

The Ethics Statements in the submission form and Methods section of your manuscript should match verbatim. Please ensure that any changes are made to both versions.

-- Please include the full name of the IACUC/ethics committee that reviewed and approved the study protocol/permit/project license. Please also include an approval number.

-- Please include information about the form of consent (written/oral) given for research involving human participants. All research involving human participants must have been approved by the authors' Institutional Review Board (IRB) or an equivalent committee, and all clinical investigation must have been conducted according to the principles expressed in the Declaration of Helsinki.

------------------------------------------------------------------------

Reviews

Reviewer #2:

The authors' revision has improved an already good paper. But I still disagree with their major conclusion that widespread geographic dispersal must be common.

Specifically, the authors say that: "Given the modest sampling in this dataset from outside the Northeast, finding such wide geographic dispersal suggests that long-distance migration of the virus is common in the US"

I hate to be difficult, but I really don't think this is the most parsimonious conclusion to be drawn from the phylogeny in Figure 1. Rather I would say there is strong evidence for regional circulation (e.g. in MA and the NE) with occasional long-distance dispersal to more distant regions of the U.S. After all, there is relatively strong clustering even from regions like the Midwest that were very under-sampled. Of course, more sampling from different regions may in fact reveal long-distance dispersal is common, I just don't feel that this should be assumed without evidence.

---------------

Reviewer #3: Christophe Fraser, signed review

The authors have done a great job responding to the comments. They have strengthened their conclusions on waning immunity. They have clarified the motivations for the different linked analyses and highlighted limitations where appropriate. This is an impressively comprehensive study with several interesting and (to me, at least) unexpected conclusions.

Decision Letter 3

Lauren A Richardson

3 Jan 2020

Dear Dr Sabeti,

On behalf of my colleagues and the Academic Editor, Sara Y Del Valle, I am pleased to inform you that we will be delighted to publish your Research Article in PLOS Biology.

The files will now enter our production system. You will receive a copyedited version of the manuscript, along with your figures for a final review. You will be given two business days to review and approve the copyedit. Then, within a week, you will receive a PDF proof of your typeset article. You will have two days to review the PDF and make any final corrections. If there is a chance that you'll be unavailable during the copy editing/proof review period, please provide us with contact details of one of the other authors whom you nominate to handle these stages on your behalf. This will ensure that any requested corrections reach the production department in time for publication.

Early Version

The version of your manuscript submitted at the copyedit stage will be posted online ahead of the final proof version, unless you have already opted out of the process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

PRESS

We frequently collaborate with press offices. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximise its impact. If the press office is planning to promote your findings, we would be grateful if they could coordinate with biologypress@plos.org. If you have not yet opted out of the early version process, we ask that you notify us immediately of any press plans so that we may do so on your behalf.

We also ask that you take this opportunity to read our Embargo Policy regarding the discussion, promotion and media coverage of work that is yet to be published by PLOS. As your manuscript is not yet published, it is bound by the conditions of our Embargo Policy. Please be aware that this policy is in place both to ensure that any press coverage of your article is fully substantiated and to provide a direct link between such coverage and the published work. For full details of our Embargo Policy, please visit http://www.plos.org/about/media-inquiries/embargo-policy/.

Thank you again for submitting your manuscript to PLOS Biology and for your support of Open Access publishing. Please do not hesitate to contact me if I can provide any assistance during the production process.

Kind regards,

Hannah Harwood

Publication Assistant,

PLOS Biology

on behalf of

Lauren Richardson,

Senior Editor

PLOS Biology

Associated Data


    Supplementary Materials

    S1 Fig. Sequencing results and predictors of outcome.

    (A) Distribution of mumps virus (MuV) RT-qPCR Ct value, taken at sample source, for all sequencing replicates prepared with both depletion and capture (see Materials and methods). Genome (blue): a replicate produces a genome passing the thresholds described in Materials and methods. MuV RT-qPCR serves as a predictor of sequencing outcome. (B) Distribution of collection interval (days between symptom onset and sample collection) for all samples prepared with both depletion and capture. Genome (blue) is defined as in panel A. Samples taken more than 4 days after symptom onset did not produce genomes in this study [88]. (C) Relative sequencing depth of coverage aggregated across 203 mumps genomes. (D) Number of unambiguous bases in the genome assembly of each sample by MuV:23S ratio (MuV copies by MuV RT-qPCR divided by 23S copies by 23S RT-qPCR; see Materials and methods). Each point is a replicate, colored by sequencing preparation method. (E) Normalized MuV reads (unique MuV reads divided by raw sequencing depth) in each sample by MuV:23S ratio. Points are as in panel D. Nine points with fraction mumps reads >0.04 are beyond the y-axis limits. In panels A, B, D, and E, reads from each replicate were downsampled to 1 million prior to assembly (see Materials and methods). In panels D and E, 1 point with a MuV:23S ratio <10⁻⁸ and 3 points with a MuV:23S ratio >10⁻³ are beyond the x-axis limits. Ct, cycle threshold; MuV, mumps virus; RT-qPCR, real-time quantitative polymerase chain reaction.

    (TIF)

    S2 Fig. Maximum likelihood tree, root-to-tip regression, and principal component analysis.

    (A) Maximum likelihood tree of the 225 mumps virus genotype G genomes used in this study. Tips are colored by sample source (MDPH or CDC); previously published genomes are indicated by unfilled circles. (B) Root-to-tip regression of genomes shown in panel A, rooted on GenBank accession KF738113 (Pune.IND, 1986). (C) Root-to-tip regression of genomes in the clade containing the two USA 2006 sequences (USA_2006; see Fig 1A) as well as their descendants. (D) Principal component analysis of genetic variants from the genomes in panel A. Each point is a genome colored by its geographic location. CDC, Centers for Disease Control and Prevention; MDPH, Massachusetts Department of Public Health.

    (TIF)
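
The root-to-tip regressions in panels B and C of S2 Fig can be illustrated with a minimal sketch. It assumes each genome is reduced to a sampling date (decimal year) and a root-to-tip distance read off a rooted tree; the function name and the five data points are hypothetical and are not taken from the study. The slope approximates the substitution rate, and the x-intercept approximates the date of the root.

```python
def root_to_tip_fit(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date): the slope (substitutions/site/year) and the
    x-intercept, i.e. the date at which the fitted distance reaches zero.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("no positive temporal signal in these data")
    intercept = mean_d - slope * mean_t
    return slope, -intercept / slope


# Hypothetical tips (not study data): decimal year and distance from the root.
dates = [2006.0, 2010.0, 2014.0, 2016.0, 2017.0]
distances = [0.0064, 0.0080, 0.0096, 0.0104, 0.0108]

rate, root_date = root_to_tip_fit(dates, distances)
print(f"rate = {rate:.2e} subs/site/year, root date = {root_date:.1f}")
```

A regression like this is a quick check for temporal signal; it does not replace the Bayesian clock models summarized in S2 Table.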

    S3 Fig. Phylogenetic tree colored by institution.

    MCC tree of the 225 mumps virus genotype G genomes used in this study, colored by academic institution. Clades are labeled as in Fig 1A. MCC, maximum clade credibility.

    (TIF)

    S4 Fig. Amino acid substitution in the mumps virus genome.

    (A) Variation in genomes generated in this study. Each row represents one of the 119 mumps HN amino acid sequences from the individuals in our study who had known vaccine status. Samples are displayed in order of descending time since last MMR vaccine dose. Colored variants indicate variation from the consensus of all included sequences. (B) Variation in all published genotype G HN sequences. Each row represents one of the 456 publicly available mumps genotype G HN sequences (including from genomes generated in this study). Identical sequences are collapsed and then grouped by hierarchical clustering. In both panels, amino acid substitutions relative to the Jeryl Lynn vaccine strain are highlighted in blue, with orange indicating a second variant allele and green indicating a third. Light red bars indicate possible neutralizing antibody epitopes, and dark red bars indicate potential N-glycosylation sites. (C) Estimate of dN/dS at each amino acid site in MuV coding regions, calculated across all 225 genotype G genomes used in this study. At each site, the mean estimate and 95% credible interval (not corrected for multiple testing) is shown. (D) Posterior density of dN/dS in each gene, using the same data set. (E) MCC tree of the 225 mumps genotype G genomes used in this study, colored by vaccination status. Clades are labeled as in Fig 1A. HN, hemagglutinin-neuraminidase; MCC, maximum clade credibility; MMR, Measles-Mumps-Rubella; MuV, mumps virus.

    (TIF)

    S5 Fig. Comparison of PCR-positive and PCR-negative Massachusetts samples.

    Comparison between 521 mumps PCR-negative samples tested in Massachusetts between 2016-01-01 and 2017-06-30, and 198 mumps samples from unique patients in Massachusetts in the same time period. In all panels, percentages have been recalculated after removing unknowns. (A) Vaccination status of positives (n = 198) and negatives (n = 309) with known vaccination status. A chi-square test suggests there is no relationship between PCR result and vaccination status (p = 0.012). (B) Years since vaccination for positives (n = 198) and negatives (n = 309) with known vaccination status. A chi-square test suggests there is a relationship between PCR result and vaccination status (p = 1.19 × 10⁻⁷), and the plot shows that most recently vaccinated individuals were PCR-negative. (C) Collection interval for positives (n = 196) and negatives (n = 477) with known symptom onset and known collection date. A chi-square test suggests there is no relationship between PCR result and collection interval (p = 0.918). PCR, polymerase chain reaction.

    (TIF)

    S6 Fig. Parameters used in epidemiological models.

    We illustrate fitted distributions of parameters of the modeled natural history of mumps infection. (A) We calibrate a gamma distribution to the duration of the incubation period (defined from the time of mumps virus exposure to the onset of shedding) using data from experimental human mumps virus infections with known exposure times [75]. (B) Onset of mumps shedding generally precedes onset of symptoms in the clinical course. We fit a gamma distribution describing the period of latent shedding to pooled data from 2 studies [75] and (C) apply previous estimates of the distribution of the duration of shedding after parotitis onset [77]. (D) We obtain estimates of the distribution of vaccine protection within a university population by pairing previous estimates of the association between the strength of vaccine protection and time since receipt of the last dose [5] to data on vaccine coverage in a large university [80]. (E) We infer the distribution of the generation interval length in the Harvard outbreak using data from 10 cases with known exposure sources (“contact link”). A gamma distribution fitted by maximum likelihood recovers mean and dispersion estimates nearly identical to those reported in earlier mumps outbreaks [82].

    (TIF)
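
The maximum likelihood gamma fit mentioned for panel E of S6 Fig can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the ten interval values are hypothetical, the function names are invented for this example, and the digamma function is approximated by a numerical derivative of `math.lgamma` to keep the sketch free of third-party dependencies. The shape k solves ln(k) − ψ(k) = ln(mean) − mean(ln x), and the scale is then mean / k.

```python
import math


def digamma(x, h=1e-5):
    """Central-difference derivative of log-gamma (adequate for a sketch)."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)


def fit_gamma_mle(samples, lo=1e-2, hi=1e3, iters=200):
    """Maximum likelihood gamma fit; returns (shape, scale).

    The search is limited to shapes in [lo, hi], an assumption of this sketch.
    """
    if any(x <= 0 for x in samples):
        raise ValueError("gamma samples must be positive")
    n = len(samples)
    mean = sum(samples) / n
    s = math.log(mean) - sum(math.log(x) for x in samples) / n
    if s <= 0:
        raise ValueError("samples show no dispersion; the fit is degenerate")
    # ln(k) - digamma(k) decreases in k, so bisect (on a log scale) for value s.
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if math.log(mid) - digamma(mid) > s:
            lo = mid
        else:
            hi = mid
    shape = math.sqrt(lo * hi)
    return shape, mean / shape


# Hypothetical generation intervals in days (not the study's data).
intervals = [14, 16, 17, 18, 18, 19, 20, 21, 23, 25]
shape, scale = fit_gamma_mle(intervals)
print(f"shape = {shape:.2f}, scale = {scale:.3f}, mean = {shape * scale:.2f} days")
```

By construction the fitted mean (shape × scale) equals the sample mean, so the shape parameter carries all of the information about dispersion.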

    S7 Fig. Connection between epidemiological and genetic data.

    (A) Multidimensional scaling applied to samples in Clade II-outbreak (see Fig 1A). Each point is a mumps genome and pairwise dissimilarities are based on Hamming distance (see Materials and methods). Genomes with known epidemiological links are connected with a red line. (B) ROC curve for samples within Clade II-outbreak using pairwise genetic distance (calculated as in panel A) as a predictor of epidemiological linkage. (C) Transmission reconstruction using individuals within Clade II-outbreak, using the SH gene or whole genome sequences. Inferred links with probability ≥0.7 are shown; arrows are colored by data set. Nodes with one or more inferred links are shown and are colored by institution. See also Fig 2C, right. ROC, receiver operating characteristic; SH, small hydrophobic.

    (TIF)
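
The idea behind panels A and B of S7 Fig, pairwise Hamming distance used as a predictor of epidemiological linkage, can be sketched in a few lines. The sequences, sample identifiers, and link set below are hypothetical, and skipping sites with ambiguous bases is an assumption of this sketch rather than a description of the study's method. The area under the ROC curve is computed as the probability that a randomly chosen linked pair is genetically closer than a randomly chosen unlinked pair.

```python
from itertools import combinations

UNAMBIGUOUS = set("ACGT")


def hamming(a, b):
    """Count aligned sites where both bases are unambiguous and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x in UNAMBIGUOUS and y in UNAMBIGUOUS and x != y
    )


def linkage_auc(seqs, linked_pairs):
    """AUC for the rule 'smaller distance predicts an epidemiological link'.

    seqs: dict of sample id -> aligned sequence.
    linked_pairs: set of frozenset({id_a, id_b}) with known links.
    """
    linked, unlinked = [], []
    for i, j in combinations(sorted(seqs), 2):
        d = hamming(seqs[i], seqs[j])
        (linked if frozenset((i, j)) in linked_pairs else unlinked).append(d)
    if not linked or not unlinked:
        raise ValueError("need at least one linked and one unlinked pair")
    # Ties between a linked and an unlinked pair count as half a win.
    wins = sum((dl < du) + 0.5 * (dl == du) for dl in linked for du in unlinked)
    return wins / (len(linked) * len(unlinked))


# Hypothetical toy alignment and links (not study data).
seqs = {
    "s1": "ACGTACGT",
    "s2": "ACGTACGA",
    "s3": "ACGAACTA",
    "s4": "TCGAACTA",
}
links = {frozenset(("s1", "s2")), frozenset(("s3", "s4"))}
auc = linkage_auc(seqs, links)
print(f"AUC = {auc:.2f}")
```

An AUC near 0.5 would mean genetic distance carries no information about linkage, while values near 1 mean linked pairs are consistently closer than unlinked ones.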

    S8 Fig. Trees produced with single-gene and multigene sequences.

    Maximum likelihood trees using (A) the SH gene only, (B) the HN protein only, (C) a concatenation of the HN protein, the F coding region, and the SH gene, and (D) the complete mumps genome. In all panels, tips are colored by clades as defined in Fig 1A and S2 Table. The HN protein sequence does a significantly better job at capturing the epidemiologically relevant clades than the SH gene, and the tree created from SH+HN+F (nearly 25% of the genome) closely resembles the tree created from whole genome sequences. F, fusion protein; HN, hemagglutinin-neuraminidase; SH, small hydrophobic.

    (TIF)

    S9 Fig. tMRCA probability distributions for mumps genotypes using SH gene sequences.

    The date of the most recent genotype A clinical sample is indicated, excluding samples that closely resemble a mumps vaccine strain. SH, small hydrophobic; tMRCA, time to the most recent common ancestor.

    (TIF)

    S10 Fig. Additional analyses of global mumps spread using SH gene sequences.

    (A) World map indicating number of SH sequences in our data set from each of 15 regions; the 4 circled global regions represent the 4 regions from which we resampled input for migration analyses (see Materials and methods for details regarding geographic and temporal resampling of sequences). (B) Tree with the highest clade credibility across all trees generated on resampled input from 4 global regions. Branch line thickness corresponds to posterior support for ancestry (indicated by branch color). (C) Migration between the 4 global regions shown in panel A. Each plot shows a posterior probability density, taken across resampled input, of the fraction of all reconstructed migrations that occur to the destination (indicated in upper right) from each of the other 3 sources. (D) Migration between the 4 global regions shown in panel A. Shading of each migration route indicates its statistical support (quantified with BF) in explaining the diffusion of mumps virus. (E) Average proportion of geographic ancestry of samples in each of the 4 global regions (labeled) from each of the 4 regions (colored), going back 5 years from sample collection. (F) Average proportion of Europe in geographic ancestry of US samples, and vice versa. Shaded regions are pointwise percentile bands (2.5% to 97.5%) across 100 resamplings of the input sequences. Colors for panels B, C, D, E, and F are by global region, as shown in the bottom right. BF, Bayes factor; SH, small hydrophobic.

    (TIF)

    S11 Fig. Skygrid reconstruction of population size.

    The scaled population size (Neτ) according to a Skygrid reconstruction. Dark line represents the median population size across samples from the posterior, and shaded area represents the 95% HPD interval. HPD, highest posterior density.

    (TIF)

    S1 Table. Sample metadata.

    Demographic information of all mumps cases in Massachusetts between 2016-01-01 and 2017-06-30, and the subset of these included in this study.

    (TIF)

    S2 Table. Model selection and tMRCA estimates across models.

    (A) Marginal likelihoods estimated in 6 models: combinations of 3 coalescent tree priors (constant size population, exponential growth population, and Skygrid) and 2 clock models (strict clock and uncorrelated relaxed clock with log-normal distribution). Estimates are with PS and SS. The BF are calculated against the model with constant size population and a strict clock. (B) Mean estimates of clock rate, date of tree root, and tMRCAs of the clades shown in Fig 1A (excluding Clade 0-HU, which consists of one sample). USA-4 corresponds to “Clades I and II” in Fig 1A. Below each mean estimate is the 95% highest posterior density interval. tMRCAs of additional unlabeled nodes are available at http://doi.org/10.5281/zenodo.3338599. BF, Bayes factor; PS, path sampling; SS, stepping-stone sampling; tMRCA, time to the most recent common ancestor.

    (TIF)
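
The model comparison in panel A of S2 Table rests on Bayes factors computed against a reference model. A minimal sketch, assuming log marginal likelihoods are already available from path sampling or stepping-stone sampling: the log Bayes factor of each model is its log marginal likelihood minus that of the reference. The model labels and numbers below are hypothetical placeholders, not values from the table.

```python
def log_bayes_factors(log_marginal_likelihoods, reference):
    """Return log BF of every model against the named reference model."""
    ref = log_marginal_likelihoods[reference]
    return {name: value - ref for name, value in log_marginal_likelihoods.items()}


# Hypothetical log marginal likelihoods (placeholders, not study results).
lml = {
    "constant_strict": -25010.0,
    "exponential_strict": -25012.0,
    "skygrid_relaxed": -25000.5,
}

log_bf = log_bayes_factors(lml, reference="constant_strict")
best = max(log_bf, key=log_bf.get)
for name, value in sorted(log_bf.items()):
    print(f"{name}: log BF = {value:+.1f}")
```

Working on the log scale avoids underflow, since the marginal likelihoods themselves are far too small to represent directly.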

    S3 Table. Viruses identified in mumps PCR-negative samples.

    Influenzavirus B is the only segmented virus listed, and we identify 51 reads mapping to 6 of the 8 segments: in order, 2, 6, 0, 0, 8, 10, 4, 21 reads to each segment. PCR, polymerase chain reaction.

    (TIF)

    S1 Data. Sample metadata.

    Information and metadata regarding all mumps PCR-positive and PCR-negative samples on which sequencing was attempted. PCR, polymerase chain reaction.

    (XLSX)

    S2 Data. Single nucleotide polymorphisms.

    List of identified nonsynonymous SNPs, including the frequency of each SNP. Separate list of identified within-host variants above 2% frequency. Separate list of SNPs in possible immunogenic regions of HN and NP in US samples from 2016 to 2017 (all positions relative to the relevant gene). HN, hemagglutinin-neuraminidase; NP, nucleoprotein; SNP, single nucleotide polymorphism.

    (XLSX)

    S3 Data. dN/dS values.

    Calculated dN/dS value and confidence interval for each amino acid position across the mumps virus genome.

    (XLSX)

    S4 Data. Publicly available SH and HN sequences.

    List of SH and HN sequences and corresponding metadata used in analysis, as well as SH sequences available for resampling. HN, hemagglutinin-neuraminidase; SH, small hydrophobic.

    (XLSX)

    S1 Text

    (DOCX)

    Attachment

    Submitted filename: mumps-plosbio-response.docx

    Attachment

    Submitted filename: mumps-plosbio-response2.docx

    Data Availability Statement

    All code and data generated as part of this study are publicly available at: http://doi.org/10.5281/zenodo.3338599. The 203 mumps virus whole genome sequences generated in this study, as well as nine low quality sequences not included in the analysis, are also available on NCBI GenBank under BioProject accession PRJNA394142 (accession numbers MF965196–MF965318 and MG986380–MG986468). All associated metadata is available in linked NCBI BioSample entries, and in S1 Data.


    Articles from PLoS Biology are provided here courtesy of PLOS
