2018 Dec 13;4(1):10–19. doi: 10.1038/s41564-018-0296-2

Tracking virus outbreaks in the twenty-first century

Nathan D Grubaugh 1,2, Jason T Ladner 3, Philippe Lemey 4, Oliver G Pybus 5, Andrew Rambaut 6,7, Edward C Holmes 8, Kristian G Andersen 1,9
PMCID: PMC6345516  NIHMSID: NIHMS1003524  PMID: 30546099

Abstract

Emerging viruses have the potential to impose substantial mortality, morbidity and economic burdens on human populations. Tracking the spread of infectious diseases to assist in their control has traditionally relied on the analysis of case data gathered as the outbreak proceeds. Here, we describe how many of the key questions in infectious disease epidemiology, from the initial detection and characterization of outbreak viruses, to transmission chain tracking and outbreak mapping, can now be much more accurately addressed using recent advances in virus sequencing and phylogenetics. We highlight the utility of this approach with the hypothetical outbreak of an unknown pathogen, ‘Disease X’, suggested by the World Health Organization to be a potential cause of a future major epidemic. We also outline the requirements and challenges, including the need for flexible platforms that generate sequence data in real-time, and for these data to be shared as widely and openly as possible.

Subject terms: Molecular evolution, Genomics, Infectious diseases


This Review Article describes how recent advances in viral genome sequencing and phylogenetics have enabled key issues associated with outbreak epidemiology to be more accurately addressed, and highlights the requirements and challenges for generating, sharing and using such data when tackling a viral outbreak.

Main

Emerging infectious diseases present one of the greatest public health challenges of the twenty-first century. Among these are zoonotic viruses that originate from reservoir species, often mammals, and jump to humans to cause disease syndromes of varying form and severity. An emerging virus, depending on its ability to transmit among humans, can lead to individual or a few sporadic cases, resulting in a localized outbreak that requires public health intervention or, in the worst scenarios, can develop into a large epidemic or global pandemic. Such emergence events over the past two decades are numerous and varied. They include viruses not previously encountered, such as the SARS and MERS coronaviruses1–3, and familiar foes that have reappeared to cause outbreaks, such as swine- and avian-origin influenza4,5, and Ebola6 and Zika7 viruses. Although many outbreaks end naturally or are controlled quickly, questions remain over how best to scientifically respond to these events.

The broad-scale factors responsible for viral emergence have been well documented and include human population growth, increased frequency and reach of travel, changing patterns of land use, changing diets, wars and social upheaval and climate change8,9. These factors increase interactions between humans and reservoir hosts, facilitating exposure to zoonotic viruses and spillover infections in people, and allow emerging viruses to spread more easily through human populations. The interactions between virus genetics, ecology and the host factors that determine virus emergence are so complex that it is impossible to predict what virus will cause the next epidemic, making it essential that our response is scientifically informed, robust and efficient10.

The emergence of virus outbreaks generates a set of common questions, whose answers are central to disease mitigation and control (Table 1), and which at times can only be answered by sequencing of viral genomes. These include what is the virus, is it novel, or does it represent the re-emergence of a known pathogen; what is its mode of transmission; where does the emerging virus come from (in particular, what is its reservoir host and/or geographic source); what ecological factors underpin its emergence; how many introductions into humans have there been; what is the timing of these introduction events, and was there a period of undetected transmission before the first reported case; during flare-ups and future outbreaks, how are they connected to previous events; and what is the nature of virus evolution and is there evidence for local adaptation? In the past, many of these questions were addressed using case (incidence) data, which led to estimates of key epidemic parameters such as the basic reproductive number (R0, the expected number of secondary cases produced by each case at the start of the outbreak) that were used to inform epidemic control policy. Although still of fundamental importance, case data alone cannot inform public health management with the level of precision necessary for all targeted interventions. Recent advances in virus genome sequencing and phylogenetic analyses, however, mean that we are now in a position to answer such questions with molecular precision, and open new areas of investigations not previously possible based on epidemiological data alone (Table 1).

Table 1.

Critical questions addressed by viral genomic epidemiology

Question | Example from genomic epidemiology | References
What virus is causing the outbreak? | Metagenomic sequencing from patient samples revealed a novel virus (Lujo virus) as the causal virus for an outbreak in South Africa in 2008 | 23
How is the virus transmitting? | Sequencing studies of MERS coronavirus combined with coalescent approaches showed that human outbreaks are driven by seasonally varying zoonotic transfer of viruses from camels | 90, 91
Where did the outbreak begin? | Large-scale sequencing efforts and phylogenetic analyses showed that the 2009 influenza A/H1N1 pandemic originated in swine populations from Mexico | 5, 46
What factors drive the outbreak? | Analysis of more than 1,600 Ebola virus genomes identified critical factors that contributed to the spread of the virus during the 2013–2016 epidemic in West Africa | 61
How many introductions have there been? | Sequencing of Zika virus from patients and mosquitos in Florida showed that multiple introduction events of the virus sustained the 2016 outbreak in Miami and surrounding counties | 62
When did the outbreak begin? | Large-scale studies showed that the Zika epidemic in the Americas likely started in Brazil more than a year earlier than was initially believed | 7, 92–94
Are outbreaks linked? | Analysis of Ebola virus genomes during the 2013–2016 epidemic showed that the virus can persist for more than a year in survivors, and be responsible for flare-ups of the outbreak via sexual transmission | 52, 57, 95, 96
How is the virus evolving? | Sequencing studies during the 2013–2016 Ebola epidemic identified mutations in the virus genome that rapidly rose to a high frequency, compatible with increased fitness; experimental follow-up studies showed that some of those mutations were probably Ebola virus adapting to a new host | 71, 72, 97

Examples of commonly used software packages for genomic epidemiology investigations are available at: http://www.virological.org/c/software.

Virus genomics has been used to investigate infectious disease outbreaks for several decades. This is possible because viruses, particularly those with RNA genomes, generate genetic variation on the same timescale as virus transmission, through a combination of high rates of mutation and replication11,12. Consequently, it is possible to infer epidemiological and emergence dynamics from virus genomes sampled and sequenced over short epidemic timescales. We term the science of using genomics and associated analyses ‘genomic epidemiology’.

Initially, genomic approaches relied on indirect methods (for example, restriction fragment length polymorphisms13) to infer genotypes and differentiate between virus strains. As direct sequencing technologies advanced, there was a transition toward the use of nucleotide sequences from fragments of virus genomes for this purpose14–21. Now, thanks to advances in high-throughput sequencing and decreasing costs22, most virus genomics studies utilize data sets containing tens to thousands of (near) complete virus genomes.

In this Review Article, we will show how our ability to track and understand infectious disease outbreaks has been revolutionized by the addition of virus genomics data. We will highlight the varied uses of virus genomics during the different stages of viral outbreaks, from initial virus detection to understanding the factors contributing towards global spread (Box 1). We will show how genomic epidemiology can be used to track the spread of emerging viruses, where the challenges lie, and establish an agenda for future work. Although we focus on human disease, the genome-based methodologies that we describe can be equally applied to animal and plant infections. Similarly, the increasing ability to rapidly sequence complete genomes of bacterial species means that these technologies offer much to the study of emerging bacterial disease, including those associated with antimicrobial resistance.

Box 1 Outbreak of Disease X: a hypothetical scenario.

In addition to the Ebola, SARS and Zika viruses, the WHO watchlist of viruses that may lead to public health emergencies98 acknowledged for the first time that the next serious epidemic may be caused by a currently unknown virus—Disease X. Its inclusion emphasizes the need for flexible and deployable platforms to understand and combat disease outbreaks of many varieties. Most likely, Disease X may be a known microorganism believed to cause no or mild human disease, as was the case for Zika virus before its epidemic in the Americas. Disease X could emerge anywhere in the world and, given the mobility of human populations, it could spread to distant and highly populated regions within days or weeks. To illustrate how genomic epidemiology can successfully reveal important aspects of disease emergence and inform epidemic control efforts, we present a hypothetical scenario in which Disease X successfully jumped into humans, established sustained transmission and caused severe disease.

In Miami, Florida (United States), a 22-year-old man sought medical assistance after an influenza-like illness suddenly progressed to a dangerously high fever and laboured breathing. He reported golfing activity at nearby resorts, harbouring clusters of wildlife, including birds. He was admitted into the emergency room and within 3 days died of pneumonia. During this time, 5 other young adults presented with similar symptoms to Miami-area hospitals. Standard molecular diagnostics for commonly suspected pathogens were negative, but immunoglobulin M antibodies collected from each patient were slightly cross-reactive to MERS and SARS coronaviruses. Since the virus could not be conclusively identified with conventional assays, metagenomic sequencing was used to identify Disease X as a novel human virus, most closely related to other coronaviruses in ducks (see figure, panel a). Importantly, due to the relatedness of the novel virus to a family of viruses with well-defined host-ranges, these data led to a hypothesis about its potential origin and reservoir (overwintering migratory birds in the nearby Everglades wetlands) and allowed for the development of virus-specific diagnostics and targeted sequencing approaches.

Within 3 weeks, there were 40 new laboratory-confirmed Disease X cases, including 8 from healthcare workers who contacted the original 6 cases, and 5 total deaths (an 11% apparent case fatality rate). Targeted sequencing from 15 patients and related viruses, including from ducks across Southern Florida, revealed that the human Disease X viruses clustered together on a phylogenetic tree and shared a common ancestor with virus genomes from ducks near Palm Beach, suggesting there was a single zoonotic spillover event and subsequent human-to-human transmission (panel b). A molecular clock phylogenetic analysis further indicated that the common ancestor of the human viruses existed several months ago, suggesting that the first patient identified was not the first case of the outbreak, and highlighting the possibility of many more unreported or asymptomatic cases.

As the outbreak progressed, there was a critical need to understand transmission to help control further spread. Traditional epidemiology, including contact tracing, provided important insights into the risk factors for transmission. Virus genome sequencing was used to infer transmission chains that linked each infected patient (panel c). These analyses revealed that: (1) transmission occurred primarily between individuals that had been in close proximity; and (2) a few individuals infected most of the known cases. In response, an action plan of patient isolation/containment and widespread use of facemasks was implemented to reduce close contact and aerosol transmission.

The Disease X outbreak peaked within a year, resulting in ~2,000 cases in Florida and several imported cases throughout the world. Most of the imported cases did not result in secondary local infections, with the exception of two healthcare workers in New York City and a large outbreak of more than 100 cases near Havana, Cuba. Factors leading to local and global spread were investigated by layering transportation, geographic, climatic, economic, and demographic information into a large phylogenetic data set of Disease X viruses (panel d). Analyses indicated that virus dispersal from Miami was more likely to occur to large cities that were either: (1) in close driving proximity; or (2) connected by direct flights with high travel volumes. Once in a new city, the success of virus transmission was correlated with low economic status and high population density. This raised concerns about Disease X outbreaks emerging in low-income and densely populated countries within the Caribbean and Central America. The WHO used this information to implement comprehensive surveillance and response efforts in at-risk nations.

Real-time genomic investigation of Disease X. a, Metagenomic sequencing revealed that Disease X, which could not be identified using standard clinical assays, was a novel virus. b, Targeted sequencing from additional human cases and from related viruses uncovered the likely animal reservoir, the time period that it was introduced into the human population (represented by * in the lower panel), and that subsequent transmission was human-to-human. c, More intensive virus genome sequencing was used to construct detailed transmission chains and identify potential control measures. d, Layering additional climatic (pictured in the lower panel; https://www.climate.gov/maps-data), transportation, geographic, economic and demographic information into a large phylogenetic data set revealed the risk factors that facilitated local and global spread. Images and icons courtesy of S. Knemeyer.

Outbreak detection

Most infectious disease outbreaks start with clinicians noticing unusual patterns. Patients may present with patterns of symptoms that are similar to those of more common diseases, but which, after repeated observation and diagnostic testing, may deviate in scale, seasonality or severity. At this very beginning of an outbreak, the most critical task is therefore to identify a causal pathogen. Historically, virus identification has been performed using molecular tools, such as polymerase chain reaction (PCR) and enzyme-linked immunosorbent assay (ELISA), that directly recognize pathogen-derived material (Box 2), or conventional non-molecular techniques, such as microscopy. The advent of untargeted metagenomic sequencing directly from clinical samples, however, means that we are now on the cusp of being able to detect human viruses in a single step, without a priori knowledge of putative causal pathogens (Box 2). The major advantage of sequencing-based approaches is the ability to detect novel viruses—such as the initial appearances of SARS2, MERS3 or Lujo virus23—or unexpected ones, as exemplified by Ebola virus during the 2013–2016 epidemic in West Africa24.

Once an outbreak has been detected and a causal virus identified, several basic questions can immediately be answered about the virus itself, including: (1) whether it is novel or previously known to infect humans; and (2) if we have the diagnostics, vaccines and therapeutics available to fight it. Importantly, the generation of virus genomics data at this stage will provide deeper insights into these questions by uncovering molecular details not possible with conventional tools. Phylogenetics will also provide an additional level of detail, revealing virus origins, evolutionary characteristics and connections to previous outbreaks in the same region, or to transmissions in other regions6. Given high enough relatedness to other members of a virus family with well-defined reservoir hosts (for example, old-world arenaviruses25), the sequence identification of novel virus species can also be informative about potential reservoirs.

Box 2 Molecular technologies for detecting and tracking outbreaks.

Traditional methods. The methods traditionally used to diagnose infectious disease agents in patients are developed to detect either antigens (for example, ELISAs and lateral flow assays), or nucleic acids (for example, PCR) derived from the pathogen. These assays are typically designed to recognize either single (for example, Ebola virus) or closely related (for example, Filoviridae) pathogens. Versions of such assays may also be combined in a multiplexed fashion to detect a small number of different pathogens (for example, haemorrhagic fever viruses). While most laboratories are capable of running these assays, they are often not available for uncommon or novel pathogens, and running multiple rounds of testing can take weeks. They also require a priori knowledge of putative pathogens and cannot typically be used to detect outbreaks that are caused by novel, highly divergent, understudied or rare pathogens.

Deployable solutions. Over the last several years, robust and deployable solutions have been developed for pathogen detection that do not require the maintenance of a cold chain, which can be difficult or impossible under many outbreak conditions. Simple-to-use, point-of-care rapid diagnostic tests have the potential to transform early outbreak detection. For example, the ReEBOV antigen rapid test for Ebola virus infection developed during the recent epidemic could be deployed throughout sub-Saharan Africa to help detect new outbreaks99,100. Simple nucleic acid assays, such as loop-mediated isothermal amplification (LAMP) developed for Zika virus101, H5N1 avian influenza virus102 and SARS coronavirus103, have eliminated the need for thermal cycling and most power requirements. New and creative advances in microfluidics104, nanowire arrays105 and field-effect biosensors106,107 are also helping to reduce the barriers to efficient and rapid diagnostics, while increasing sensitivity and specificity of detection. Of particular interest for deployment in resource-limited settings are paper-based engineered gene circuits, such as sensors designed for strain-specific Ebola virus detection108. They are stable for long-term storage at room temperature and are activated by rehydration, and thus can be used in remote environments. Very recently, highly sensitive and deployable CRISPR-based diagnostics have also been developed that utilize CRISPR–Cas13/12a to detect pathogen-derived nucleic acids109–112. Similar to the traditional methods described above, all of these tools require a priori knowledge of probable causal pathogens and the availability of antibodies, genome sequences or other pathogen characteristics.

Sequencing-based methods. Untargeted metagenomic sequencing provides a potential one-step solution for outbreak pathogen detection of both known and novel pathogens, and may be able to replace the need for multiple individual pathogen assays24. The main advantage of metagenomic sequencing is that it does not require a priori knowledge of the pathogen, but comes at the expense of specialized equipment, increased costs and bioinformatic complexity. Although high backgrounds of host nucleic acid and/or low pathogen titres in clinical samples can make pathogen detection difficult, host gene depletion113 and pathogen enrichment114,115 methods can help alleviate these issues. After the first outbreak pathogen genome sequence has been obtained, targeted approaches using next-generation sequencing can also be developed. This was the case for both of the recent Zika and Ebola epidemics116,117, where cheaper and faster amplicon-based approaches were rapidly developed and deployed to track both of the epidemics. The most common platforms used for these purposes are those developed by Illumina (for example, MiSeq and HiSeq), because they have high accuracy and throughput, but they also have high costs and relatively short read lengths (up to 300 base pairs). Cheaper portable devices, such as the miniaturized Oxford Nanopore MinION, can help to produce data in close to real time directly in-country and under austere conditions93,117. This is a significant advancement because, along with open data sharing, rapid diagnostics and sequencing, such devices help promote a comprehensive and collaborative response network.

First snapshot of an outbreak

Immediately after a viral outbreak has been identified there exists a ‘fog-of-war’. Information on the extent of the outbreak, the timing and nature of its source, and the contribution of human-to-human transmission will be extremely limited, yet these data are critical to designing effective responses. Genomic epidemiology, if applied quickly and comprehensively, holds the potential to answer these questions24.

To provide an initial snapshot of an outbreak, it is important to understand the diversity of circulating viruses from as many cases as possible. Virus genetic diversity, measured as the average number of nucleotide differences among viruses in the population, will increase as an outbreak progresses due to the accumulation of genetic changes in virus genomes at each round of viral replication6. If this rate of mutational accumulation is relatively constant (that is, it conforms to a ‘molecular clock’ of evolutionary change26), then the rate at which it occurs (referred to as the ‘evolutionary rate’) allows us to estimate when the sequenced viruses last shared a common ancestor. Critically, this provides a lower bound on when an outbreak began, and how long the virus had been circulating prior to discovery5,27,28. If the virus genomes have been sampled over only a limited time-scale, so that only a few mutations have accumulated in the virus population, then evolutionary rates will need to be based on those from prior outbreaks or extrapolations from related viruses29. Later in an epidemic, when viruses have been sequenced over a sufficient period of time to capture mutational accumulation, evolutionary rates can be readily estimated directly from virus genomes sampled during the outbreak30–32. Evolutionary rate estimates, however, can be sensitive to model specification over short periods of time33 and depend on the timescale of measurement34. Such issues, as well as the unwarranted implications about changes in transmissibility and virulence that may accompany seemingly inflated evolutionary rates, have been discussed in detail in the context of the 2013–2016 Ebola epidemic in West Africa6.
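
The molecular clock reasoning in this paragraph can be illustrated with a root-to-tip regression, used here as a simple stand-in for the fuller statistical analyses normally applied to outbreak data. The sketch below assumes hypothetical inputs: sampling dates in decimal years and, for each genome, its genetic distance from the root of a tree (substitutions per site). The function name and the synthetic values are illustrative assumptions and are not taken from any real outbreak.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, tmrca): the slope (substitutions/site/year) and the
    date at which the fitted line crosses zero distance, which serves as
    a rough estimate of when the sampled viruses shared a common ancestor.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired observations")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal in the data")
    intercept = mean_d - rate * mean_t
    return rate, -intercept / rate


# Synthetic, perfectly clock-like data (hypothetical values only):
# distance grows at 0.001 substitutions/site/year from an ancestor at 2013.8.
example_dates = [2014.2, 2014.4, 2014.6, 2014.8, 2015.0]
example_distances = [0.001 * (t - 2013.8) for t in example_dates]
example_rate, example_tmrca = root_to_tip_regression(example_dates, example_distances)
```

With real data the points scatter around the line, and a weak or negative slope is a warning that the sampling window is too short to estimate the rate directly, which is the situation described above for early outbreaks.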

A common approach to phylogenetic analysis of the genetic diversity of a virus population is to infer a tree from sampled virus genomes with branches measured in units of time (that is, a rooted, time-calibrated tree). This can provide estimates of the date of the last common ancestor at the root of the tree, as well as each individual branching event. As an approximation, these branching events correspond to virus transmission from one case to the next, an insight that offers further key information about the unfolding outbreak35. In addition, models of how the process of virus transmission relates to the shape of phylogenetic trees (Fig. 1) enable important epidemiological inferences. In particular, coalescent models relate the rate at which virus lineages of a phylogenetic tree merge, as common ancestors, to the size of the epidemic. This uses the simple premise that, for a sample of virus genomes, the larger the outbreak is, the further back in time the common ancestor will be found (Fig. 1a–c).
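
The premise that larger outbreaks push the common ancestor further back in time can be made concrete with the simplest coalescent model. The sketch below assumes a constant, haploid population of size `pop_size` with time measured in generations, in which the expected waiting time while k lineages remain is 2N / (k(k − 1)). Real epidemics change in size, so this is an illustration of the principle rather than a model one would fit to outbreak data; the function name is hypothetical.

```python
def expected_tmrca(n_samples, pop_size):
    """Expected time (in generations) to the common ancestor of a sample.

    Assumes a constant-size haploid population under the standard
    coalescent: summing 2N / (k(k-1)) for k = 2..n gives 2N(1 - 1/n).
    """
    if n_samples < 2:
        raise ValueError("need at least two sampled genomes")
    if pop_size <= 0:
        raise ValueError("population size must be positive")
    return sum(2.0 * pop_size / (k * (k - 1)) for k in range(2, n_samples + 1))


# Same sample size, different (hypothetical) outbreak sizes.
small_outbreak = expected_tmrca(n_samples=10, pop_size=100)
large_outbreak = expected_tmrca(n_samples=10, pop_size=1000)
```

For a fixed number of sampled genomes the expected tree depth scales linearly with population size under these assumptions, which is why tree shape carries information about epidemic size.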

Fig. 1. Outbreak scenarios and the resulting phylogenetic trees of virus genomes from sampled human cases.

The first three scenarios show a single introduction from a non-human reservoir followed by human-to-human spread. a, A small outbreak from a recent zoonosis with a commensurately short tree, suggesting recent emergence. R0 is greater than 1, indicating the potential to cause a large outbreak. b, A medium-sized outbreak with a deeper tree and internal nodes dispersed. With R0 close to 1, this suggests that emergence into humans was not recent and its transmission potential is just sufficient to persist. The root of the tree is not the index case, meaning the zoonosis could be older. c, A large outbreak with R0 greater than 1, and thus exhibiting exponential growth in case numbers. Distinctively for a growing epidemic, internal nodes tend to be towards the root of the tree, suggesting that only a small fraction of the total cases were sampled. d, A scenario of repeated zoonotic jumps with limited human-to-human transmission. The internal parts of the tree represent the diversity of the virus in the non-human reservoir and the human-to-human transmission cases are closely related. Icons courtesy of S. Knemeyer.

Early in an outbreak, one of the primary concerns is to understand the rate at which the virus may be spreading through the human population. As noted in the introduction, this can be assessed by estimating R0, which is critical for epidemiological projections and for planning public health responses. While R0 can be calculated through epidemiological analyses of case counts, accurate estimates of such data may not be available early in an outbreak, since they require a time-series of cases. As demonstrated during the early spread of the novel influenza A/H1N1 virus in 2009, phylogenetic inference of epidemic growth based on virus genomics can provide estimates of R0 comparable to those inferred from case data36. These calculations can be performed using coalescent models that directly estimate R0, based on classic susceptible–infected–recovered (SIR) models37,38. A similar group of models analyse patterns of lineage birth–death, linking the shape of trees to the rate at which virus lineages split and go extinct, and have recently gained popularity39,40. Both approaches were applied during the 2013–2016 epidemic in West Africa to calculate R0 to assess Ebola virus transmission dynamics, and illuminated the impact of ‘superspreader’ events41,42. All of these methods, however, are beholden to the inherent uncertainty of genome sequence data, especially at the start of an epidemic where such sequences exhibit limited variability and sampling may be biased. Hence, phylogenetic estimates of R0, although probably indicative of broad characteristics such as epidemic growth, may not be precise enough to make critical decisions in the absence of corroborating (epidemiological) information.
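
To show how an epidemic growth rate, whether inferred from a phylogeny or from case counts, relates to R0 in the classic SIR setting, the sketch below uses the early-epidemic approximation in which cases grow exponentially at rate r = γ(R0 − 1), where γ is the recovery rate, so that R0 = 1 + r/γ. This assumes an exponentially distributed infectious period with mean 1/γ; other generation-time assumptions give different relationships. The function names and numeric inputs are hypothetical.

```python
import math


def growth_rate_from_doubling_time(doubling_time):
    """Exponential growth rate implied by a doubling time (same time units)."""
    if doubling_time <= 0:
        raise ValueError("doubling time must be positive")
    return math.log(2) / doubling_time


def r0_from_growth_rate(growth_rate, infectious_period):
    """R0 under a simple SIR model: R0 = 1 + r * D, with D = 1 / gamma."""
    if infectious_period <= 0:
        raise ValueError("infectious period must be positive")
    return 1.0 + growth_rate * infectious_period


# Hypothetical inputs: cases double every 7 days, mean infectious period 5 days.
r = growth_rate_from_doubling_time(7.0)
r0 = r0_from_growth_rate(r, 5.0)
```

Because the result depends directly on the assumed infectious period, uncertainty in that quantity carries straight through to R0, which is one reason such estimates call for corroborating epidemiological information.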

The initial snapshot of virus genome sequences can also provide critical insights into the role of zoonotic transmission during an outbreak (Fig. 1d). Genomic analyses, for example, revealed that Lassa fever virus, which is endemic in West Africa43, primarily spreads via repeated transmission from local rodent reservoirs, as opposed to sustained human-to-human transmission44. This is in contrast to Ebola virus during the 2013–2016 epidemic in West Africa, where genomic epidemiology showed that the outbreak was the result of a single zoonotic spillover, followed by sustained human-to-human transmission45.

Given availability of virus genomes from potential zoonotic reservoirs, another aim of early virus sequencing from an outbreak is to uncover the identity and geographic location of the reservoir host. The influenza A/H1N1 pandemic that started in 2009 was quickly recognized as being a likely species jump from pigs, as all of the virus genomic segments closely matched those previously seen in swine4,5. Like the 2013–2016 Ebola epidemic in West Africa, the influenza A/H1N1 pandemic probably started as a single introduction into humans that occurred a few months before it was detected5. The initial suspicion, and later confirmation, that the spillover occurred in Mexico, was complicated by a lack of widespread zoonotic genomic surveillance in this region. Retrospective sequencing of samples from Mexican pigs, however, showed that there were close relatives of the human virus circulating in this country at the time of the epidemic, confirming its origin46.

Transmission chain tracking

Beyond the initial characterization of an outbreak, virus genome sequencing offers enormous potential for determining transmission chains to understand networks of ‘who-infected-whom’. The tracking of transmission chains has long been a standard part of public health responses to outbreaks, providing critical information that can be used to interrupt virus spread and reduce the magnitude of an outbreak. This work has traditionally been performed using interview-based contact tracing, which is labour intensive and limited by the availability and openness of patients for interviews. This approach is particularly challenging during large outbreaks characterized by large numbers of co-occurring transmission chains.

Virus genomic-based approaches can provide much more in-depth information compared to traditional non-sequencing based approaches, as the branching patterns of phylogenetic trees approximately correspond to transmission from one case to the next35 (Fig. 1). Virus genome sequences, for example, were used to reconstruct the spread of foot-and-mouth disease virus in the United Kingdom, including the identification of superspreader events47–49. Genomic data also played a critical role in understanding flare-ups during the West African Ebola outbreak50–52, where phylogenetic analyses showed that most of the flare-ups were linked to persistently infected Ebola survivors (Fig. 2a), thereby demonstrating sexual transmission of the virus50,52. None of these insights would have been possible without virus genomic data.

Fig. 2. Transmission chain tracking during outbreaks using virus genomics.

a, Viral genome sequences were used to distinguish between competing hypotheses for the source of the viruses that triggered the Ebola flare-ups in West Africa. The three main hypotheses and their expected genomic signatures are illustrated here with a hypothetical haplotype network. Genomes from all of the observed flare-ups grouped closely with genomes sequenced from patients in the same country, from earlier in the outbreak (bottom left), consistent with transmission from persistent sources. In contrast, genomes linked to re-introductions from neighbouring countries (right) would be expected to cluster with genomes from a different country and from late in the outbreak. In the case of independent spillovers from a reservoir host (top left, that is, independent sampling from the diversity circulating within the reservoir), the spillover genomes would be linked to the main outbreak by a long branch originating from near the root of the network. GIN, Guinea; LBR, Liberia; SLE, Sierra Leone. b, Expected ‘genomic resolution’ for the inference of transmission chains at the level of individual infections. Resolution is dependent on the serial interval between infections (x-axis; used as a proxy for epidemiological generation time), as well as the genome size and nucleotide substitution rate (y-axis). Icons courtesy of S. Knemeyer.

The utility of virus genomic data for the inference of transmission chains is dependent on several factors, including: (1) the evolutionary rate of the virus; (2) the length of time between the infections of interest; and (3) the proportion of sampled cases; which together determine the resolution of the genetic signal (Fig. 2b). Although RNA viruses exhibit remarkably high evolutionary rates53, their small genome sizes and short epidemiological generation times often result in, on average, less than one substitution per transmission event54–56 (Fig. 2b). Hence, virus genomics alone often cannot be expected to perfectly reconstruct transmission chains at the level of individual infections. Combined with epidemiological data, however, virus genomics provides a powerful tool for restricting the number of possible transmission scenarios and for supporting novel modes of transmission47,57,58. In addition, most phylogenetics-based transmission chain analyses have been performed using virus consensus sequences (that is, a single genome per sample/patient that represents the average of the virus population), which may limit resolution. However, as virus infections exhibit diverse intra-host populations (containing intra-host single nucleotide variants (iSNVs)44), newer methods incorporating viral iSNVs may greatly increase the resolution of transmission chain analyses so long as multiple variants are transmitted between hosts59.
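
The dependence of resolution on evolutionary rate, genome size and the time between infections can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that substitutions accumulate as a Poisson process at a constant rate across the genome; the function names and the parameter values are illustrative placeholders, not estimates taken from this Review.

```python
import math


def expected_substitutions(rate_per_site_per_year, genome_length, serial_interval_days):
    """Expected substitutions across the whole genome during one serial interval."""
    return rate_per_site_per_year * genome_length * (serial_interval_days / 365.0)


def prob_at_least_one(expected):
    """Probability of one or more substitutions, assuming a Poisson process."""
    return 1.0 - math.exp(-expected)


# Hypothetical scenarios: (rate in substitutions/site/year, genome length in
# nucleotides, serial interval in days). These are placeholders only.
scenarios = {
    "short genome, short interval": (1e-3, 10_000, 3),
    "longer genome, longer interval": (1e-3, 19_000, 15),
}

for label, (rate, length, interval) in scenarios.items():
    lam = expected_substitutions(rate, length, interval)
    print(f"{label}: {lam:.2f} expected substitutions, "
          f"P(at least one) = {prob_at_least_one(lam):.2f}")
```

When the expected count falls well below one, many consecutive cases in a chain are expected to carry identical consensus genomes, which is why epidemiological data are needed to narrow down who infected whom.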

Outbreak mapping

As described in the previous sections, genomic epidemiology can be used to detect an outbreak, show its origin and elucidate transmission patterns. Evolutionary inferences from virus genomes, unlike non-sequencing based methods, can also be used to dissect the spatial structure and dynamics of spread, as well as to assess how an epidemic may unfold through time and space.

Uncovering the spatial patterns of virus spread during outbreaks is a key objective that has been transformed by genomic epidemiology. Reconstructing a detailed spatial history of virus spread from the origin of an outbreak is generally a task for phylogeographic methods60, which provide location estimates for every ancestral node in a virus phylogeny using simple stochastic (or ‘random walk’) models. Phylogeographic analyses, for example, were used to show how Ebola virus spread across West Africa during the 2013–2016 epidemic61 (Fig. 3). Importantly, virus genome sampling with strong spatiotemporal coverage allowed for the dissection of the entire epidemic into a metapopulation of short- and long-lived transmission chains61. Similar analyses were also used to show that multiple introductions were responsible for sustaining the 2016 Zika outbreak in Florida62. It is important, however, to appreciate the uncertainty of phylogeographic estimates, and to bear in mind that such analyses may only be capable of elucidating partial pictures of outbreak spread. In addition, sampling biases may severely affect these analyses, although the coalescent and birth–death models mentioned above have been extended to account for aspects of virus population structure63–65, making the analyses more robust to sampling heterogeneity66.
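
To make the idea of estimating a location at an ancestral node concrete, the following sketch applies Felsenstein's pruning algorithm to a tiny, fixed tree under a symmetric continuous-time Markov chain in which every move between distinct locations occurs at the same rate. This is a deliberate simplification: the tree, branch lengths, migration rate and location labels are all hypothetical, and full phylogeographic analyses typically estimate the tree and the migration model jointly rather than fixing them.

```python
import math

LOCATIONS = ["A", "B", "C"]  # hypothetical sampling locations


def transition_prob(start, end, time, rate, k):
    """P(end | start) after `time` under a symmetric k-state Markov chain."""
    decay = math.exp(-k * rate * time)
    if start == end:
        return 1.0 / k + (1.0 - 1.0 / k) * decay
    return (1.0 - decay) / k


def partial_likelihoods(node, rate, k):
    """Likelihood of the tip locations below `node` for each location at `node`."""
    if "location" in node:  # tip: the location is observed
        return [1.0 if state == node["location"] else 0.0 for state in range(k)]
    result = [1.0] * k
    for child, branch_length in node["children"]:
        child_like = partial_likelihoods(child, rate, k)
        for state in range(k):
            result[state] *= sum(
                transition_prob(state, child_state, branch_length, rate, k)
                * child_like[child_state]
                for child_state in range(k)
            )
    return result


def root_posterior(tree, rate, k):
    """Posterior over root locations, assuming a uniform prior on the root."""
    likelihoods = partial_likelihoods(tree, rate, k)
    total = sum(likelihoods)
    return [value / total for value in likelihoods]


# Hypothetical tree: three tips sampled in A (index 0) and one in B (index 1).
# Branch lengths are in years; the rate is in migration events per year.
example_tree = {"children": [
    ({"children": [({"location": 0}, 0.10), ({"location": 0}, 0.10)]}, 0.10),
    ({"children": [({"location": 0}, 0.15), ({"location": 1}, 0.15)]}, 0.05),
]}

for name, prob in zip(LOCATIONS, root_posterior(example_tree, rate=1.0, k=3)):
    print(f"P(root in {name}) = {prob:.3f}")
```

The output is a probability for each candidate location rather than a single answer, which reflects the uncertainty in phylogeographic estimates noted above. Adding or removing tips from one location shifts these probabilities, a simple illustration of how sampling bias can affect the inference.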

Fig. 3. Integration and testing predictors of phylogeographic spread.

We illustrate the concept of this approach using the 2013–2016 Ebola epidemic in West Africa. Geographic distances between all pairs of locations, in this case administrative areas in Guinea, Sierra Leone and Liberia, as well as population sizes at the origin and destination of these pairs, are combined into a transition rate matrix through a generalized linear model. This matrix parameterizes the phylogenetic process of spread that is being estimated. Each predictor is associated with a coefficient, β, which denotes the strength of its contribution: some predictors (for example, population size) are positively associated with the intensity of migration, whereas others (for example, geographic distance) are negatively associated. A coefficient of 0 implies that the predictor is excluded from the model (represented in the figure by the transparent matrix with β = 0).

Phylogeographic inference methods can also be used to provide insights into the factors driving virus spread67 (Fig. 3). Such analyses are enabled by the integration of virus genomics with diverse metadata sets and are critically dependent on the timeliness of data generation and open sharing. These approaches were initially introduced to confirm the key role of human air transportation in the global circulation of influenza viruses67, but they have also been useful in untangling complex virus transmission dynamics on smaller scales61. To illustrate these methods, in Fig. 3 we show an application of generalized linear modelling that explains Ebola virus migration rates between locations as a function of several potential predictors, in order to infer virus spread during the West African Ebola outbreak. In this case, geographic distances and population sizes at the location of origin and destination combine into a gravity model of spread, with virus transmission largely occurring within large population centres and geographic spread being more frequent over shorter distances61. These phylodynamic studies illustrate the growing importance of data integration for virus genomic analyses55, which critically depend on accurate metadata (for example, sampling date and sampling location), as well as other data sources that can capture host mobility and geographic, demographic and epidemiological context.
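
The way predictors combine into a transition rate matrix can be sketched in a few lines. The example below assumes a log-linear form in which each predictor is log-transformed and standardized before being weighted by its coefficient β, so that a coefficient of 0 removes the predictor from the model. The locations, population sizes, distances and coefficient values are hypothetical, and in a real analysis the coefficients are estimated from the genomic data rather than fixed by hand.

```python
import math
from itertools import permutations
from statistics import mean, pstdev

# Hypothetical locations with population sizes and pairwise distances (km).
populations = {"X": 1_000_000, "Y": 200_000, "Z": 50_000}
distances_km = {
    frozenset(("X", "Y")): 50.0,
    frozenset(("X", "Z")): 300.0,
    frozenset(("Y", "Z")): 280.0,
}


def standardized_log(values):
    """Log-transform, then centre to mean 0 and scale to unit variance."""
    logs = {key: math.log(value) for key, value in values.items()}
    log_list = list(logs.values())
    centre, spread = mean(log_list), pstdev(log_list)
    return {key: (value - centre) / spread for key, value in logs.items()}


def build_predictors(populations, distances_km):
    """One value per ordered (origin, destination) pair for each predictor."""
    pairs = list(permutations(populations, 2))
    return {
        "distance": standardized_log(
            {(i, j): distances_km[frozenset((i, j))] for i, j in pairs}),
        "origin_population": standardized_log(
            {(i, j): populations[i] for i, j in pairs}),
        "destination_population": standardized_log(
            {(i, j): populations[j] for i, j in pairs}),
    }


def glm_rates(predictors, beta):
    """Relative migration rate per pair: exp(sum over predictors of beta * x)."""
    pairs = next(iter(predictors.values()))
    return {
        pair: math.exp(sum(beta[name] * predictors[name][pair] for name in predictors))
        for pair in pairs
    }


predictors = build_predictors(populations, distances_km)
# Hypothetical coefficients: distance reduces migration, population increases it.
beta = {"distance": -1.0, "origin_population": 0.5, "destination_population": 0.5}

for (origin, destination), rate in sorted(glm_rates(predictors, beta).items()):
    print(f"{origin} -> {destination}: relative rate {rate:.2f}")
```

With these signs, pairs of large, nearby population centres receive the highest relative rates, which is the gravity-model behaviour described above.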

Inter-epidemic evolution and spread

Once outbreaks have been brought under control or (temporarily) resolved, phylogenetic analyses can provide insights into evolutionary patterns during inter-epidemic periods by comparing virus genome sequences sampled across different outbreaks. The most fundamental question is whether the virus in question has been able to persist in human populations between outbreaks, so that each new outbreak has arisen from an endemically circulating lineage (for example, dengue virus), or whether they represent independent zoonotic spillover events from an animal reservoir (for example, Ebola virus). With sufficient sampling of viruses from human and reservoir species, this question can be answered using standard phylogenetic analysis. For example, although both dengue virus and yellow fever virus have transmission cycles that involve mosquitoes and humans (urban transmission) or nonhuman primates (sylvatic transmission), phylogenetic analyses have shown that dengue virus is now an entirely endemic urban virus that does not rely on its sylvatic vectors and hosts to seed new epidemics68. Most human outbreaks of yellow fever, in contrast, have been shown by virus genomics approaches to represent independent emergences of the virus from sylvatic sources, rather than spread via an urban cycle69,70.

Inter-epidemic analyses can also be used to elucidate the nature of virus evolution and spread in reservoir species, which are probably characterized by different evolutionary forces than those seen during human outbreaks71,72. For example, although human outbreaks of Ebola have happened relatively frequently since the 1970s, each outbreak starts as an independent spillover of the virus from an animal (probably bat73) reservoir. Hence, the inter-epidemic evolution of Ebola virus occurs in a species other than humans, such that patterns of genetic divergence among the viruses associated with human epidemics can provide insight into viral replication and transmission within reservoir hosts. For example, there have been suggestions that Ebola virus has spread across Africa in a wave-like manner in its reservoir species74; however, phylogenetic analyses incorporating virus genomic data from recent outbreaks are incompatible with this scenario6. Additionally, while Ebola virus normally evolves according to a relatively constant molecular clock6,45,75–77, the phylogenetic branch leading to the viruses sequenced from the small Ebola outbreak that occurred in the Democratic Republic of the Congo in 2014, concurrent with the 2013–2016 epidemic in West Africa, was characterized by a far lower evolutionary rate78. Although the reasons for this reduction in evolutionary tempo are unclear, it is possible that it reflects Ebola virus evolution in a different (unknown) reservoir species that experiences a lower rate of viral replication. Alternatively, this rate disparity may result from the existence of different viral replication states within the same reservoir host, similar to that described during human epidemics, with faster rates observed during continuous human-to-human transmission and slower rates during persistent infections of Ebola survivors79.
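
One common exploratory check of clock-like evolution, offered here as an illustration rather than as the method used in the studies cited above, is a regression of root-to-tip genetic divergence against sampling date. Under a roughly constant clock the points fall near a straight line whose slope approximates the evolutionary rate and whose x-intercept approximates the date of the root. The sketch below uses ordinary least squares on hypothetical values; because tips share branches they are not independent observations, so the fit should be read as a diagnostic rather than a formal test.

```python
def root_to_tip_regression(sampling_dates, divergences):
    """Least-squares fit of divergence (substitutions/site) on date (decimal years).

    Returns (slope, x_intercept): an approximate evolutionary rate and an
    approximate date at which divergence from the root is zero.
    """
    n = len(sampling_dates)
    mean_x = sum(sampling_dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in sampling_dates)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(sampling_dates, divergences))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, -intercept / slope


# Hypothetical tips: sampling dates and root-to-tip divergences.
dates = [2014.2, 2014.5, 2014.9, 2015.3, 2015.8]
divergence = [0.0004, 0.0007, 0.0011, 0.0016, 0.0020]

rate, root_date = root_to_tip_regression(dates, divergence)
print(f"approximate rate: {rate:.2e} substitutions/site/year")
print(f"approximate root date: {root_date:.2f}")
```

A lineage that sits well below the fitted line has accumulated fewer substitutions than its sampling date would predict, which is the kind of pattern that prompts closer investigation of a reduced evolutionary rate.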

Requirements and challenges in genomic epidemiology

Virus genomic methods for outbreak investigation and control are powerful additions to more traditional epidemiological approaches but are critically dependent on well planned and coordinated efforts. The foremost need for genomic epidemiology is timely access to clinical samples and data, which should be built on productive and equitable collaborations with local communities, public health agencies, outbreak responders, local clinics and researchers80. For each clinical sample to be used for virus genomic sequencing, it is essential to obtain a minimal set of metadata related to the infection, including: (1) the date of sample collection and/or onset of symptoms; and (2) the location of sampling. Additional information can greatly increase the utility of genomic epidemiology, including the availability of: (3) travel and contact history; (4) suspected source of infection; and (5) clinical outcome and symptoms. Other factors, including patient history, age, sex and economic status can also help to reveal risk factors underlying infection and transmission. Within ethical constraints, it is important that communication lines remain open so that researchers undertaking data analysis can return actionable results to the public health community.
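
The minimal metadata set described above lends itself to a simple completeness check at the point of data entry. The sketch below is one possible representation: the class name, field names and validation rule are assumptions made for illustration, not a published standard, and any real schema would need to respect local ethical and privacy requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SampleMetadata:
    """Metadata accompanying one clinical sample (hypothetical schema)."""
    sample_id: str
    # Essential: (2) location of sampling.
    location: str
    # Essential: (1) date of sample collection and/or onset of symptoms.
    collection_date: Optional[date] = None
    symptom_onset_date: Optional[date] = None
    # Additional information that increases utility: items (3) to (5).
    travel_history: Optional[str] = None
    contact_history: Optional[str] = None
    suspected_source: Optional[str] = None
    clinical_outcome: Optional[str] = None
    symptoms: Optional[str] = None

    def missing_required(self) -> List[str]:
        """Names of essential items that are absent from this record."""
        missing = []
        if self.collection_date is None and self.symptom_onset_date is None:
            missing.append("collection_date or symptom_onset_date")
        if not self.location.strip():
            missing.append("location")
        return missing


record = SampleMetadata(
    sample_id="sample-001",           # hypothetical identifier
    location="District A",            # hypothetical location
    collection_date=date(2018, 1, 5),
)
print(record.missing_required())
```

Flagging incomplete records before sequencing begins makes it easier to request the missing dates or locations while the information is still retrievable.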

Other large-scale data resources are essential for investigating the spatio-temporal history and spread of an outbreak. These include the temporal and spatial distribution of cases, ecological conditions, vector abundance, environmental factors and travel patterns. Integration of these other data sources with virus genomic data may reveal new properties of an outbreak, potentially leading to actionable measures55,61,67. Non-genomic data often comes from established networks of collaborations, or from the public domain, highlighting the value of open data and data sharing to outbreak investigations.

An important benefit of genomic epidemiology is that it can directly compare and jointly analyse virus genome sequences obtained during an epidemic, even if those sequences were generated by different laboratories. Consequently, there is an urgent need to make genomic and epidemiological data and analysis tools publicly available during ongoing epidemics81. This movement is supported by the World Health Organization (WHO), which has called for data pertaining to public health emergencies to be disseminated openly and immediately following generation, and not withheld until the acceptance or publication of a corresponding scientific paper82. More recently, the WHO has outlined the current and future benefits of virus genome data sharing during outbreaks83. Combined with an acceleration of making manuscripts available via preprint servers such as arXiv and bioRxiv, especially during outbreaks84, there has been a shift towards scientists storing their data and source code on repositories such as GitHub (https://www.github.com), Synapse (https://www.synapse.org) and Data Dryad (https://www.datadryad.org), in close to real-time for others to use. Furthermore, extensive online communities and forums such as Twitter (https://www.twitter.com), Virological (http://virological.org), FluTrackers (https://www.flutrackers.com), ProMED (https://www.promedmail.org), Nextstrain (https://www.nextstrain.org), HealthMap (https://www.healthmap.org) and Microreact (https://www.microreact.org) allow rapid dissemination of unpublished results and analyses. In our experience, not only does the process of open science promote new collaborations and lead to more accurate scientific insights into outbreak research, but it helps in getting relevant information rapidly into the hands of decision makers. Despite these advances, however, the speed, nature and extent of virus genome data sharing is inconsistent, sometimes resulting in confusion over what is, or should be, best practice81,85.

Future perspective

Genomic epidemiology promises much to the study and control of infectious disease outbreaks, particularly if viral genomes can be acquired and analysed in real-time. The accumulated set of these data, together with the rapid development of sophisticated software packages (http://virological.org/c/software), will provide a valuable resource for the mitigation and control of future outbreaks. Ultimately, with sufficient genome sequences from individual viral genera and/or families, it may be possible to categorize viruses by their phylogenetic patterns and utilize this information in epidemic preparedness. For example, as well as considering obvious biological features of viruses such as their genome structure and mode of transmission, it may be possible to group viruses according to a series of evolutionary variables such as rate of evolutionary change, extent of antigenic evolution, frequency of recombination, pattern of geographic spread and population dynamics. This information may then help forecast the evolutionary behaviour of any virus, should it re-emerge in human populations, and assist in the selection of future vaccine strains86–88. This information will also help counter the alarmist claims that emerging viruses will evolve novel phenotypes, such as airborne transmission in the case of Ebola virus89, that often accompany any major disease outbreak. It is clear, however, that a more fundamental understanding of the genetic and ecological barriers of virus spillover into human populations is needed to better identify risk factors for disease emergence. Long-term capacity building, partnerships with local communities, and commitments to long-term investments on these fronts will go a long way towards better enabling the global community to effectively and rapidly deal with future emerging outbreaks80.

Acknowledgements

We thank G. Dudas and S. Knemeyer for help with figure creation. N.D.G. is supported by NIH training grant 5T32AI007244-33. P.L. and A.R. acknowledge funding from the European Research Council under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 725422-ReservoirDOCS) and from the Wellcome Trust Collaborative Award (grant number 206298/Z/17/Z—ARTICnetwork). P.L. acknowledges support by the Research Foundation—Flanders (‘Fonds voor Wetenschappelijk Onderzoek - Vlaanderen’, G066215N, G0D5117N and G0B9317N). O.G.P. is supported by the European Union’s Seventh Framework Programme (FP7/2007-2013)/European Research Council (614725-PATHPHYLODYN) and by the Oxford Martin School. E.C.H. is supported by an ARC Australian Laureate Fellowship (FL170100022). K.G.A. is a Pew Biomedical Scholar, and is supported by NIH NCATS CTSA UL1TR002550, NIAID contract HHSN272201400048C, NIAID R21AI137690, NIAID U19AI135995, and The Ray Thomas Foundation.

Author contributions

All listed authors have contributed to the conceptualization, writing and preparation of the manuscript.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Jason T. Ladner, Email: jason.ladner@nau.edu

Andrew Rambaut, Email: a.rambaut@ed.ac.uk

Edward C. Holmes, Email: edward.holmes@sydney.edu.au

References

  • 1. Drosten C, et al. Identification of a novel coronavirus in patients with severe acute respiratory syndrome. N. Engl. J. Med. 2003;348:1967–1976. doi: 10.1056/NEJMoa030747.
  • 2. Ksiazek TG, et al. A novel coronavirus associated with severe acute respiratory syndrome. N. Engl. J. Med. 2003;348:1953–1966. doi: 10.1056/NEJMoa030781.
  • 3. Zaki AM, van Boheemen S, Bestebroer TM, Osterhaus ADME, Fouchier RAM. Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia. N. Engl. J. Med. 2012;367:1814–1820. doi: 10.1056/NEJMoa1211721.
  • 4. Novel Swine-Origin Influenza A (H1N1) Virus Investigation Team et al. Emergence of a novel swine-origin influenza A (H1N1) virus in humans. N. Engl. J. Med. 2009;360:2605–2615. doi: 10.1056/NEJMoa0903810.
  • 5. Smith GJD, et al. Origins and evolutionary genomics of the 2009 swine-origin H1N1 influenza A epidemic. Nature. 2009;459:1122–1125. doi: 10.1038/nature08182.
  • 6. Holmes EC, Dudas G, Rambaut A, Andersen KG. The evolution of Ebola virus: insights from the 2013–2016 epidemic. Nature. 2016;538:193–200. doi: 10.1038/nature19790.
  • 7. Grubaugh ND, Faria NR, Andersen KG, Pybus OG. Genomic insights into Zika virus emergence and spread. Cell. 2018;172:1160–1162. doi: 10.1016/j.cell.2018.02.027.
  • 8. Morse, S. S. in Plagues and Politics (ed. Mullan, F.) 8–26 (Palgrave Macmillan, London, 2001).
  • 9. Wolfe ND, Dunavan CP, Diamond J. Origins of major human infectious diseases. Nature. 2007;447:279–283. doi: 10.1038/nature05775.
  • 10. Holmes EC, Rambaut A, Andersen KG. Pandemics: spend on surveillance, not prediction. Nature. 2018;558:180–182. doi: 10.1038/d41586-018-05373-w.
  • 11. Holland J, et al. Rapid evolution of RNA genomes. Science. 1982;215:1577–1585. doi: 10.1126/science.7041255.
  • 12. Duffy S, Shackelton LA, Holmes EC. Rates of evolutionary change in viruses: patterns and determinants. Nat. Rev. Genet. 2008;9:267–276. doi: 10.1038/nrg2323.
  • 13. Kiko H, Niggemann E, Rüger W. Physical mapping of the restriction fragments obtained from bacteriophage T4 dC-DNA with the restriction endonucleases SmaI, KpnI and BglII. Mol. Gen. Genet. 1979;172:303–312. doi: 10.1007/BF00271730.
  • 14. Chungue E, Deubel V, Cassar O, Laille M, Martin PM. Molecular epidemiology of dengue 3 viruses and genetic relatedness among dengue 3 strains isolated from patients with mild or severe form of dengue fever in French Polynesia. J. Gen. Virol. 1993;74:2765–2770. doi: 10.1099/0022-1317-74-12-2765.
  • 15. Lanciotti RS, et al. Origin of the West Nile virus responsible for an outbreak of encephalitis in the northeastern United States. Science. 1999;286:2333–2337. doi: 10.1126/science.286.5448.2333.
  • 16. Kinnunen L, Pöyry T, Hovi T. Generation of virus genetic lineages during an outbreak of poliomyelitis. J. Gen. Virol. 1991;72:2483–2489. doi: 10.1099/0022-1317-72-10-2483.
  • 17. McNearney T, et al. Limited sequence heterogeneity among biologically distinct human immunodeficiency virus type 1 isolates from individuals involved in a clustered infectious outbreak. Proc. Natl Acad. Sci. USA. 1990;87:1917–1921. doi: 10.1073/pnas.87.5.1917.
  • 18. Nichol ST, et al. Genetic identification of a hantavirus associated with an outbreak of acute respiratory illness. Science. 1993;262:914–917. doi: 10.1126/science.8235615.
  • 19. Ou CY, et al. Molecular epidemiology of HIV transmission in a dental practice. Science. 1992;256:1165–1171. doi: 10.1126/science.256.5060.1165.
  • 20. Power JP, et al. Molecular epidemiology of an outbreak of infection with hepatitis C virus in recipients of anti-D immunoglobulin. Lancet. 1995;345:1211–1213. doi: 10.1016/S0140-6736(95)91993-7.
  • 21. Rossouw E, Tsilimigras CW, Schoub BD. Molecular epidemiology of a coxsackievirus B3 outbreak. J. Med. Virol. 1991;34:165–171. doi: 10.1002/jmv.1890340306.
  • 22. Shendure J, Ji H. Next-generation DNA sequencing. Nat. Biotechnol. 2008;26:1135–1145. doi: 10.1038/nbt1486.
  • 23. Briese T, et al. Genetic detection and characterization of Lujo virus, a new hemorrhagic fever-associated arenavirus from southern Africa. PLoS Pathog. 2009;5:e1000455. doi: 10.1371/journal.ppat.1000455.
  • 24. Gardy JL, Loman NJ. Towards a genomics-informed, real-time, global pathogen surveillance system. Nat. Rev. Genet. 2018;19:9–20. doi: 10.1038/nrg.2017.88.
  • 25. Salazar-Bravo J, Ruedas LA, Yates TL. Mammalian reservoirs of arenaviruses. Curr. Top. Microbiol. Immunol. 2002;262:25–63. doi: 10.1007/978-3-642-56029-3_2.
  • 26. dos Reis M, Donoghue PCJ, Yang Z. Bayesian molecular clock dating of species divergences in the genomics era. Nat. Rev. Genet. 2016;17:71–80. doi: 10.1038/nrg.2015.8.
  • 27. Rambaut A, Holmes E. The early molecular epidemiology of the swine-origin A/H1N1 human influenza pandemic. PLoS Curr. 2009;1:RRN1003. doi: 10.1371/currents.RRN1003.
  • 28. Korber B. Timing the ancestor of the HIV-1 pandemic strains. Science. 2000;288:1789–1796. doi: 10.1126/science.288.5472.1789.
  • 29. Cotten M, et al. Full-genome deep sequencing and phylogenetic analysis of novel human betacoronavirus. Emerg. Infect. Dis. 2013;19:736–42B. doi: 10.3201/eid1905.130057.
  • 30. Rambaut A. Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics. 2000;16:395–399. doi: 10.1093/bioinformatics/16.4.395.
  • 31. Drummond A, Pybus OG, Rambaut A. Inference of viral evolutionary rates from molecular sequences. Adv. Parasitol. 2003;54:331–358. doi: 10.1016/S0065-308X(03)54008-8.
  • 32. Drummond AJ, Nicholls GK, Rodrigo AG, Solomon W. Estimating mutation parameters, population history and genealogy simultaneously from temporally spaced sequence data. Genetics. 2002;161:1307–1320. doi: 10.1093/genetics/161.3.1307.
  • 33. Möller S, du Plessis L, Stadler T. Impact of the tree prior on estimating clock rates during epidemic outbreaks. Proc. Natl Acad. Sci. USA. 2018;115:4200–4205. doi: 10.1073/pnas.1713314115.
  • 34. Duchêne S, Holmes EC, Ho SYW. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates. Proc. Biol. Sci. 2014;281:20140732. doi: 10.1098/rspb.2014.0732.
  • 35. Hall MD, Woolhouse MEJ, Rambaut A. Using genomics data to reconstruct transmission trees during disease outbreaks. Rev. Sci. Tech. 2016;35:287–296. doi: 10.20506/rst.35.1.2433.
  • 36. Fraser C, et al. Pandemic potential of a strain of influenza A (H1N1): early findings. Science. 2009;324:1557–1561. doi: 10.1126/science.1176062.
  • 37. Volz EM, Kosakovsky Pond SL, Ward MJ, Leigh Brown AJ, Frost SDW. Phylodynamics of infectious disease epidemics. Genetics. 2009;183:1421–1430. doi: 10.1534/genetics.109.106021.
  • 38. Rasmussen DA, Ratmann O, Koelle K. Inference for nonlinear epidemiological models using genealogies and time series. PLoS Comput. Biol. 2011;7:e1002136. doi: 10.1371/journal.pcbi.1002136.
  • 39. Stadler T, et al. Estimating the basic reproductive number from viral sequence data. Mol. Biol. Evol. 2012;29:347–357. doi: 10.1093/molbev/msr217.
  • 40. Kühnert D, Stadler T, Vaughan TG, Drummond AJ. Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model. J. R. Soc. Interface. 2014;11:20131106. doi: 10.1098/rsif.2013.1106.
  • 41. Stadler T, Kühnert D, Rasmussen DA, du Plessis L. Insights into the early epidemic spread of Ebola in Sierra Leone provided by viral sequence data. PLoS Curr. 2014. doi: 10.1371/currents.outbreaks.02bc6d927ecee7bbd33532ec8ba6a25f.
  • 42. Volz E, Pond S. Phylodynamic analysis of Ebola virus in the 2014 Sierra Leone epidemic. PLoS Curr. 2014. doi: 10.1371/currents.outbreaks.6f7025f1271821d4c815385b08f5f80e.
  • 43. McCormick JB, Fisher-Hoch SP. Lassa fever. Curr. Top. Microbiol. Immunol. 2002;262:75–109. doi: 10.1007/978-3-642-56029-3_4.
  • 44. Andersen KG, et al. Clinical sequencing uncovers origins and evolution of Lassa virus. Cell. 2015;162:738–750. doi: 10.1016/j.cell.2015.07.020.
  • 45. Gire SK, et al. Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak. Science. 2014;345:1369–1372. doi: 10.1126/science.1259657.
  • 46. Mena I, et al. Origins of the 2009 H1N1 influenza pandemic in swine in Mexico. eLife. 2016;5:e16777. doi: 10.7554/eLife.16777.
  • 47. Morelli MJ, et al. A Bayesian inference framework to reconstruct transmission trees using epidemiological and genetic data. PLoS Comput. Biol. 2012;8:e1002768. doi: 10.1371/journal.pcbi.1002768.
  • 48. Cottam EM, et al. Integrating genetic and epidemiological data to determine transmission pathways of foot-and-mouth disease virus. Proc. Biol. Sci. 2008;275:887–895. doi: 10.1098/rspb.2007.1442.
  • 49. Cottam EM, et al. Molecular epidemiology of the foot-and-mouth disease virus outbreak in the United Kingdom in 2001. J. Virol. 2006;80:11274–11282. doi: 10.1128/JVI.01236-06.
  • 50. Mate SE, et al. Molecular evidence of sexual transmission of Ebola virus. N. Engl. J. Med. 2015;373:2448–2454. doi: 10.1056/NEJMoa1509773.
  • 51. Blackley DJ, et al. Reduced evolutionary rate in re-emerged Ebola virus transmission chains. Sci. Adv. 2016;2:e1600378. doi: 10.1126/sciadv.1600378.
  • 52. Diallo B, et al. Resurgence of Ebola virus disease in Guinea linked to a survivor with virus persistence in seminal fluid for more than 500 days. Clin. Infect. Dis. 2016;63:1353–1356. doi: 10.1093/cid/ciw601.
  • 53. Duffy S, Shackelton LA, Holmes EC. Rates of evolutionary change in viruses: patterns and determinants. Nat. Rev. Genet. 2008;9:267–276. doi: 10.1038/nrg2323.
  • 54. Biek R, Pybus OG, Lloyd-Smith JO, Didelot X. Measurably evolving pathogens in the genomic era. Trends Ecol. Evol. 2015;30:306–313. doi: 10.1016/j.tree.2015.03.009.
  • 55. Baele G, Suchard MA, Rambaut A, Lemey P. Emerging concepts of data integration in pathogen phylodynamics. Syst. Biol. 2017;66:e47–e65. doi: 10.1093/sysbio/syw054.
  • 56. Campbell F, Strang C, Ferguson N, Cori A, Jombart T. When are pathogen genome sequences informative of transmission events? PLoS Pathog. 2018;14:e1006885. doi: 10.1371/journal.ppat.1006885.
  • 57. Mate SE, et al. Molecular evidence of sexual transmission of Ebola virus. N. Engl. J. Med. 2015;373:2448–2454. doi: 10.1056/NEJMoa1509773.
  • 58. Resik S, et al. Limitations to contact tracing and phylogenetic analysis in establishing HIV type 1 transmission networks in Cuba. AIDS Res. Hum. Retroviruses. 2007;23:347–356. doi: 10.1089/aid.2006.0158.
  • 59. Worby CJ, Lipsitch M, Hanage WP. Shared genomic variants: identification of transmission routes using pathogen deep-sequence data. Am. J. Epidemiol. 2017;186:1209–1216. doi: 10.1093/aje/kwx182.
  • 60. Faria NR, Suchard MA, Rambaut A, Lemey P. Toward a quantitative understanding of viral phylogeography. Curr. Opin. Virol. 2011;1:423–429. doi: 10.1016/j.coviro.2011.10.003.
  • 61. Dudas G, et al. Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature. 2017;544:309–315. doi: 10.1038/nature22040.
  • 62. Grubaugh ND, et al. Genomic epidemiology reveals multiple introductions of Zika virus into the United States. Nature. 2017;546:401–405. doi: 10.1038/nature22400.
  • 63.Vaughan TG, Kühnert D, Popinga A, Welch D, Drummond AJ. Efficient Bayesian inference under the structured coalescent. Bioinformatics. 2014;30:2272–2279. doi: 10.1093/bioinformatics/btu201. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64.Müller NF, Rasmussen DA, Stadler T. The structured coalescent and its approximations. Mol. Biol. Evol. 2017;34:2970–2981. doi: 10.1093/molbev/msx186. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65.Kühnert D, Stadler T, Vaughan TG, Drummond AJ. Phylodynamics with migration: a computational framework to quantify population structure from genomic data. Mol. Biol. Evol. 2016;33:2102–2116. doi: 10.1093/molbev/msw064. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 66.De Maio N, Wu CH, O’Reilly KM, Wilson D. New routes to phylogeography: a bayesian structured coalescent approximation. PLoS Genet. 2015;11:e1005421. doi: 10.1371/journal.pgen.1005421. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67.Lemey P, et al. Unifying viral genetics and human transportation data to predict the global transmission dynamics of human influenza H3N2. PLoS Pathog. 2014;10:e1003932. doi: 10.1371/journal.ppat.1003932. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 68.Wang E, et al. Evolutionary relationships of endemic/epidemic and sylvatic dengue viruses. J. Virol. 2000;74:3227–3234. doi: 10.1128/JVI.74.7.3227-3234.2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 69.Cardoso J, da C, et al. Yellow fever virus in Haemagogus leucocelaenus and Aedes serratus mosquitoes, southern Brazil, 2008. Emerg. Infect. Dis. 2010;16:1918–1924. doi: 10.3201/eid1612.100608. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 70. Faria NR, et al. Genomic and epidemiological monitoring of yellow fever virus transmission potential. Science. 2018;361:894–899. doi: 10.1126/science.aat7115.
  • 71. Diehl WE, et al. Ebola virus glycoprotein with increased infectivity dominated the 2013–2016 epidemic. Cell. 2016;167:1088–1097. doi: 10.1016/j.cell.2016.10.014.
  • 72. Urbanowicz RA, et al. Human adaptation of Ebola virus during the West African outbreak. Cell. 2016;167:1079–1085. doi: 10.1016/j.cell.2016.10.013.
  • 73. Leroy EM, et al. Fruit bats as reservoirs of Ebola virus. Nature. 2005;438:575–576. doi: 10.1038/438575a.
  • 74. Walsh PD, Biek R, Real LA. Wave-like spread of Ebola Zaire. PLoS Biol. 2005;3:e371. doi: 10.1371/journal.pbio.0030371.
  • 75. Carroll SA, et al. Molecular evolution of viruses of the family Filoviridae based on 97 whole-genome sequences. J. Virol. 2013;87:2608–2616. doi: 10.1128/JVI.03118-12.
  • 76. Dudas G, Rambaut A. Phylogenetic analysis of Guinea 2014 EBOV Ebolavirus outbreak. PLoS Curr. 2014. doi: 10.1371/currents.outbreaks.84eefe5ce43ec9dc0bf0670f7b8b417d.
  • 77. Rambaut A, et al. Comment on ‘Mutation rate and genotype variation of Ebola virus from Mali case sequences’. Science. 2016;353:658. doi: 10.1126/science.aaf3823.
  • 78. Lam TTY, Zhu H, Chong YL, Holmes EC, Guan Y. Puzzling origins of the Ebola outbreak in the Democratic Republic of the Congo, 2014. J. Virol. 2015;89:10130–10132. doi: 10.1128/JVI.01226-15.
  • 79. Blackley DJ, et al. Reduced evolutionary rate in reemerged Ebola virus transmission chains. Sci. Adv. 2016;2:e1600378. doi: 10.1126/sciadv.1600378.
  • 80. Yozwiak NL, et al. Roots, not parachutes: research collaborations combat outbreaks. Cell. 2016;166:5–8. doi: 10.1016/j.cell.2016.06.029.
  • 81. Yozwiak NL, Schaffner SF, Sabeti PC. Data sharing: make outbreak research open access. Nature. 2015;518:477–479. doi: 10.1038/518477a.
  • 82. WHO. Policy statement on data sharing by WHO in the context of public health emergencies (as of 13 April 2016). Wkly. Epidemiol. Rec. 2016;91:237–240.
  • 83. WHO R&D Blueprint Meeting on Pathogen Genetic Sequence Data (GSD) Sharing in the Context of Public Health Emergencies, 28–29 September 2017 (WHO, 2017).
  • 84. Johansson MA, Reich NG, Meyers LA, Lipsitch M. Preprints: an underutilized mechanism to accelerate outbreak science. PLoS Med. 2018;15:e1002549. doi: 10.1371/journal.pmed.1002549.
  • 85. Callaway E. Zika-microcephaly paper sparks data-sharing confusion. Nature News. 2016. doi: 10.1038/nature.2016.19367.
  • 86. Luksza M, Lässig M. A predictive fitness model for influenza. Nature. 2014;507:57–61. doi: 10.1038/nature13087.
  • 87. Smith DJ, et al. Mapping the antigenic and genetic evolution of influenza virus. Science. 2004;305:371–376. doi: 10.1126/science.1097211.
  • 88. Neher RA, Bedford T, Daniels RS, Russell CA, Shraiman BI. Prediction, dynamics, and visualization of antigenic phenotypes of seasonal influenza viruses. Proc. Natl Acad. Sci. USA. 2016;113:E1701–E1709. doi: 10.1073/pnas.1525578113.
  • 89. Osterholm MT, et al. Transmission of Ebola viruses: what we know and what we do not know. mBio. 2015;6:e00137. doi: 10.1128/mBio.00137-15.
  • 90. Sabir JSM, et al. Co-circulation of three camel coronavirus species and recombination of MERS-CoVs in Saudi Arabia. Science. 2016;351:81–84. doi: 10.1126/science.aac8608.
  • 91. Dudas G, Carvalho LM, Rambaut A, Bedford T. MERS-CoV spillover at the camel-human interface. eLife. 2018;7.
  • 92. Faria NR, et al. Zika virus in the Americas: early epidemiological and genetic findings. Science. 2016;352:345–349. doi: 10.1126/science.aaf5036.
  • 93. Faria NR, et al. Establishment and cryptic transmission of Zika virus in Brazil and the Americas. Nature. 2017;546:406–410. doi: 10.1038/nature22401.
  • 94. Metsky HC, et al. Zika virus evolution and spread in the Americas. Nature. 2017;546:411–415. doi: 10.1038/nature22402.
  • 95. Christie A, et al. Possible sexual transmission of Ebola virus — Liberia, 2015. MMWR Morb. Mortal. Wkly. Rep. 2015;64:479–481.
  • 96. Whitmer SLM, et al. Active Ebola virus replication and heterogeneous evolutionary rates in EVD survivors. Cell Rep. 2018;22:1159–1168. doi: 10.1016/j.celrep.2018.01.008.
  • 97. Dietzel E, Schudt G, Krähling V, Matrosovich M, Becker S. Functional characterization of adaptive mutations during the West African Ebola virus outbreak. J. Virol. 2017;91:e01913-16. doi: 10.1128/JVI.01913-16.
  • 98. List of Blueprint Priority Diseases (WHO, 2018); https://www.who.int/blueprint/priority-diseases/en/
  • 99. Boisen ML, et al. Field validation of the ReEBOV antigen rapid test for point-of-care diagnosis of Ebola virus infection. J. Infect. Dis. 2016;214:S203–S209. doi: 10.1093/infdis/jiw261.
  • 100. Broadhurst MJ, et al. ReEBOV antigen rapid test kit for point-of-care and laboratory-based testing for Ebola virus disease: a field validation study. Lancet. 2015;386:867–874. doi: 10.1016/S0140-6736(15)61042-X.
  • 101. Chotiwan N, et al. Rapid and specific detection of Asian- and African-lineage Zika viruses. Sci. Transl. Med. 2017;9:eaag0538. doi: 10.1126/scitranslmed.aag0538.
  • 102. Imai M, et al. Development of H5-RT-LAMP (loop-mediated isothermal amplification) system for rapid diagnosis of H5 avian influenza virus infection. Vaccine. 2006;24:6679–6682. doi: 10.1016/j.vaccine.2006.05.046.
  • 103. Hong TCT, et al. Development and evaluation of a novel loop-mediated isothermal amplification method for rapid detection of severe acute respiratory syndrome coronavirus. J. Clin. Microbiol. 2004;42:1956–1961. doi: 10.1128/JCM.42.5.1956-1961.2004.
  • 104. Hattersley SM, Greenman J, Haswell SJ. The application of microfluidic devices for viral diagnosis in developing countries. Methods Mol. Biol. 2013;949:285–303. doi: 10.1007/978-1-62703-134-9_19.
  • 105. Patolsky F, et al. Electrical detection of single viruses. Proc. Natl Acad. Sci. USA. 2004;101:14017–14022. doi: 10.1073/pnas.0406159101.
  • 106. Chen Y, et al. Field-effect transistor biosensor for rapid detection of Ebola antigen. Sci. Rep. 2017;7:10974. doi: 10.1038/s41598-017-11387-7.
  • 107. Afsahi S, et al. Novel graphene-based biosensor for early detection of Zika virus infection. Biosens. Bioelectron. 2018;100:85–88. doi: 10.1016/j.bios.2017.08.051.
  • 108. Pardee K, et al. Paper-based synthetic gene networks. Cell. 2014;159:940–954. doi: 10.1016/j.cell.2014.10.004.
  • 109. Gootenberg JS, et al. Nucleic acid detection with CRISPR-Cas13a/C2c2. Science. 2017;356:438–442. doi: 10.1126/science.aam9321.
  • 110. Myhrvold C, et al. Field-deployable viral diagnostics using CRISPR-Cas13. Science. 2018;360:444–448. doi: 10.1126/science.aas8836.
  • 111. Gootenberg JS, et al. Multiplexed and portable nucleic acid detection platform with Cas13, Cas12a, and Csm6. Science. 2018;360:439–444. doi: 10.1126/science.aaq0179.
  • 112. Chen JS, et al. CRISPR-Cas12a target binding unleashes indiscriminate single-stranded DNase activity. Science. 2018;360:436–439. doi: 10.1126/science.aar6245.
  • 113. Gu W, et al. Depletion of Abundant Sequences by Hybridization (DASH): using Cas9 to remove unwanted high-abundance species in sequencing libraries and molecular counting applications. Genome Biol. 2016;17:41. doi: 10.1186/s13059-016-0904-5.
  • 114. Siddle KJ, et al. Capturing diverse microbial sequence with comprehensive and scalable probe design. bioRxiv. 2018. doi: 10.1101/279570.
  • 115. Matranga CB, et al. Enhanced methods for unbiased deep sequencing of Lassa and Ebola RNA viruses from clinical and biological samples. Genome Biol. 2014;15:519. doi: 10.1186/s13059-014-0519-7.
  • 116. Quick J, et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat. Protoc. 2017;12:1261–1276. doi: 10.1038/nprot.2017.066.
  • 117. Quick J, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature. 2016;530:228–232. doi: 10.1038/nature16996.

Articles from Nature Microbiology are provided here courtesy of Nature Publishing Group
