PLOS One. 2009 Mar 16;4(3):e4876. doi: 10.1371/journal.pone.0004876

Genetic Diversity in the SIR Model of Pathogen Evolution

Isabel Gordo 1,*, M Gabriela M Gomes 1,2, Daniel G Reis 1, Paulo R A Campos 3
Editor: Robert DeSalle 4
PMCID: PMC2653725  PMID: 19287490

Abstract

We introduce a model for assessing the levels and patterns of genetic diversity in pathogen populations whose epidemiology follows a susceptible-infected-recovered (SIR) model. We model the pathogen population as a metapopulation composed of subpopulations (infected hosts), within which pathogens replicate and mutate, and infected hosts transmit pathogens to uninfected hosts. We show that the level of pathogen variation is well predicted by analytical expressions: neutral molecular variation in the pathogen is bounded by the level of infection and increases with the duration of infection. We then introduce selection into the model and study the invasion probability of a new pathogenic strain whose fitness, R0(1+s), is higher than that of the resident strain, R0. We show that this invasion probability is given by the relative increment in R0 of the new pathogen, s. By analyzing the patterns of genetic diversity in this framework, we identify the molecular signatures of strain replacement and compare them with those observed in influenza A sequences.

Introduction

Understanding molecular variation in populations with a complex demographic history is of utmost importance [1], [2]. This is so not only because most natural populations do not have simple demographic histories [3], but also because populations such as those of microbes that cause human diseases do not conform to a simple unstructured, constant-size population model [4], [5]. In fact, the standard neutral model of M. Kimura [6], which has provided a null model against which interesting alternative hypotheses about molecular evolution and variation can be formulated, is far too simple for understanding pathogen genetic diversity. With this motivation, we have studied a non-standard neutral model that aims to be simple enough, but not too simple, to account for some of the demographic processes likely to occur in natural pathogen populations. The susceptible-infected-recovered (SIR) framework has been used extensively in mathematical epidemiology [7], where the focus lies on how the prevalence and dynamics of infection vary with the transmission capacity of the pathogen and the characteristics of the host immune response [8]. In this framework, pathogens are treated as static entities whose evolution is disregarded, at least in the short term. But for pathogens with high mutation rates [9], such as RNA viruses, or even bacteria, it may not be safe to ignore pathogen evolution, even in the short term [10].

Recently, pathogen mutation has been incorporated into some epidemiological models [4], [5], [11], [12]. For example, Boni [11] studied an SIR model in which pathogen mutation was introduced in a simple way. The model keeps track of pathogen lineages that give rise to new lineages through mutation, which implicitly occurs at the transmission stage. Furthermore, the host population is effectively infinite and intra-host drift is not considered. It was shown that if all the new strains that are continuously created are selectively equivalent, then diversity increases at a constant rate (U) and the number of extant lineages at a given time t is Poisson distributed with mean Ut. In this model, evolution within each host is not explicitly considered, and U represents the rate of fixation of new mutations within a host. Other authors have used the powerful tool of coalescent theory [13] to analyse pathogen genetic diversity, but have assumed that pathogen populations follow the Wright-Fisher model of an unstructured population that fluctuates in size [14]. Here we introduce a modelling framework (Figure 1) that explicitly considers both the population structure of pathogens, which is related to the contact structure of their hosts, and intra-host evolution, where pathogens mutate and new strains can stochastically go extinct. Initially, we consider a neutral evolutionary process in which every new strain, although genetically different, is phenotypically equivalent to any other strain; i.e., each strain has the same transmissibility and causes infections of the same duration. We ask what level and pattern of sequence diversity should be expected under this scenario once both epidemiological equilibrium and genetic equilibrium between mutation and drift are achieved. We then study the pattern of diversity when a new epidemic occurs. Finally, we introduce selection into the framework and compare the patterns of diversity expected under our model with those observed in natural influenza A isolates.

Figure 1. SIR model with reinfection and selection.


I_{i,j} denotes a host that has been infected with the i-th strain and is currently carrying pathogen strain j. The e_i are the different possible rates of recovery from the current infection; when all e_i are equal, we have a neutral model with reinfection. R_i denotes a host that has recovered from infection with strain i. We simulated a strong selective advantage of the new strain by setting e = 0.1 and E = 7e, e1 = 3e and e2 = 7e.

Methods

The SIR epidemiological model

In standard formulations of disease dynamics, the time evolution of the different classes of hosts is described by a simple set of ordinary differential equations [7]. In this formulation, the population is assumed to be homogeneous and infinitely large, so that stochastic events are negligible. In the SIR model, hosts can be in one of three states: susceptible (S), infected (I) and recovered (R). A susceptible host becomes infected at rate β when in contact with infected individuals, and an infected individual recovers at rate τ. Under these dynamics, the SIR model is described by the following set of differential equations:

$$\frac{dS}{dt} = \mu - \beta S I - \mu S, \qquad \frac{dI}{dt} = \beta S I - (\tau + \mu) I, \qquad \frac{dR}{dt} = \tau I - \mu R$$

where μ corresponds to the birth and death rates of hosts. By measuring time in units of the duration of infection, $\tilde{t} = t/T$, where $T = 1/(\tau + \mu)$, and using the normalization condition $S + I + R = 1$, which allows one of the equations to be omitted, the model can be rewritten as

$$\frac{dS}{d\tilde{t}} = \varepsilon\,(1 - S) - R_0 S I, \qquad \frac{dI}{d\tilde{t}} = (R_0 S - 1)\, I \tag{1}$$

where $\varepsilon = \mu/(\tau + \mu)$, and $R_0 = \beta/(\tau + \mu)$ is the relevant parameter of the model, known as the basic reproductive number [7]. The basic reproductive number is the average number of secondary cases that a typical infected individual causes in a completely susceptible population during its entire infectious period. System (1) has two equilibria: $(S^*, I^*) = (1, 0)$ and $(S^*, I^*) = (1/R_0,\ \varepsilon(1 - 1/R_0))$. The disease-free equilibrium, $(1, 0)$, exists for every $R_0$ and is stable for $R_0 < 1$, whereas the endemic equilibrium, $(1/R_0,\ \varepsilon(1 - 1/R_0))$, exists and is stable only when $R_0 > 1$.
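To make the stability statement concrete, here is a minimal sketch that integrates System (1), assuming it takes the standard rescaled form dS/dt = ε(1 − S) − R0·S·I and dI/dt = (R0·S − 1)·I (time in units of the infection duration). Forward-Euler integration from an almost fully susceptible population should settle on the endemic equilibrium (1/R0, ε(1 − 1/R0)) when R0 > 1 and lose the infection when R0 < 1. The function names, step size and integration horizon are illustrative choices, not part of the original model.

```python
def integrate_sir(R0, eps, S0=0.999, I0=0.001, t_max=500.0, dt=0.01):
    """Forward-Euler integration of the rescaled SIR system.

    Assumed form: dS/dt = eps*(1 - S) - R0*S*I,  dI/dt = (R0*S - 1)*I,
    with time measured in units of the duration of infection.
    Returns the final (S, I).
    """
    S, I = S0, I0
    for _ in range(int(round(t_max / dt))):
        dS = eps * (1.0 - S) - R0 * S * I
        dI = (R0 * S - 1.0) * I
        S, I = S + dt * dS, I + dt * dI
    return S, I


def endemic_equilibrium(R0, eps):
    """Endemic equilibrium (S*, I*) of the assumed system; meaningful only for R0 > 1."""
    return 1.0 / R0, eps * (1.0 - 1.0 / R0)


if __name__ == "__main__":
    print(integrate_sir(3.0, 0.1), endemic_equilibrium(3.0, 0.1))
    print(integrate_sir(0.8, 0.1))  # below threshold: I decays towards 0
```

Euler steps are used only for brevity; because the fixed points of the Euler map coincide with the zeros of the right-hand side, the long-run state matches the analytical equilibrium exactly, and a smaller dt only changes the transient.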

Here, we study a discrete-time SIR model in which we explicitly consider the evolution of pathogens in a finite, structured host population and follow their genetic diversity.

Including pathogen genetic diversity in the SIR model

To study the genetic diversity of a pathogen whose epidemiology follows the SIR model, we consider a discrete model of a structured population [5], [15]. The pathogen population is modeled as a metapopulation in which each host is a deme; D hosts are assumed. An empty deme represents a host in either the susceptible or the recovered state, whereas a full deme corresponds to an infected host. A deme that is currently full (infected) moves to the recovered state with probability e (Figure 1). With probability b, any given host moves to the susceptible state. Thus, the probability that a currently full deme becomes empty is e+b. A deme that is currently empty and in the susceptible state can become full (infected) through the transmission of pathogens (migrants) from filled demes. This implies that the transmission rate, β, is proportional to the migration/recolonization rate, m. If an infection event occurs at a given time, then in the next time step the pathogens reproduce with mutation (at mutation rate U per genome per generation) to give rise to a diverse population of size Nd, which is the maximum number of parasites within an infected host.

We have considered a homogeneous contact structure, in which any given host is connected to the remaining K = D − 1 hosts. When a susceptible host is in contact with an infected host, it receives a number of migrant pathogens that is assumed to be Poisson distributed with mean Ndm. This implies that the mean level of transmission, β, corresponds to NdmK. An empty deme in the recovered state is not allowed to receive pathogens, which corresponds to the SIR model with no reinfection. As in the deterministic description of the SIR model, the relevant parameter is $R_0 = \beta/(\tau + \mu)$; considering small values of e, b and Ndm, for which probabilities are similar to rates [16], in our discrete-time stochastic analogue the relevant parameter becomes:

$$R_0 = \frac{N_d\, m\, K}{e + b} \tag{2}$$

Therefore, we can estimate the proportion of infected hosts as:

$$I = \frac{b}{e + b}\left(1 - \frac{1}{R_0}\right) \tag{3}$$

We compared the infection levels in the simulations with this expectation and found close agreement between Eq. (3) and our simulation results, which demonstrates the correspondence between the stochastic model with a fully connected host contact structure and the traditional deterministic SIR model (see Figure 2). We have checked that the results do not depend on the total pathogen effective population size within hosts, Nd, or on the number of demes, D. We have also verified that R0 = 1 is the critical value for a non-zero probability of an outbreak.
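The host-level part of the discrete model can be sketched by tracking only the numbers of susceptible, infected and recovered hosts; pathogen sequences are ignored here. This is a hypothetical illustration, not the authors' simulation code, and it assumes: a fully connected population with K = D − 1 contacts; m chosen from R0 = Nd·m·K/(e + b); per-step transitions I→R with probability e, I→S and R→S with probability b; and a susceptible host becoming infected if it receives at least one of its Poisson-distributed migrants (total mean Nd·m times the number of infected hosts). The long-run infected fraction is compared with b/(e + b)·(1 − 1/R0), the form assumed here for Eq. (3). NumPy is assumed to be available.

```python
import numpy as np


def predicted_infected_fraction(R0, e, b):
    """Endemic infected fraction, assuming Eq. (3) reads I = b/(e+b) * (1 - 1/R0)."""
    return b / (e + b) * (1.0 - 1.0 / R0) if R0 > 1 else 0.0


def simulate_host_states(D, R0, e, b, Nd, steps, seed=0):
    """Hypothetical count-level sketch of the discrete SIR metapopulation.

    Returns an array with the fraction of infected hosts after each time step.
    """
    rng = np.random.default_rng(seed)
    K = D - 1                        # fully connected host contact structure
    m = R0 * (e + b) / (Nd * K)      # migration rate chosen so that Nd*m*K/(e+b) = R0
    # Start near the endemic state to avoid stochastic fade-out after a first epidemic.
    I = int(D * predicted_infected_fraction(R0, e, b))
    S = min(D - I, int(D / R0))
    R = D - S - I
    frac = np.empty(steps)
    for t in range(steps):
        # Each infected host sends Poisson(Nd*m) migrants, so a susceptible host
        # receives Poisson(Nd*m*I) migrants in total and is infected if it gets >= 1.
        p_inf = 1.0 - np.exp(-Nd * m * I)
        new_inf = rng.binomial(S, p_inf)
        # Infected hosts recover (prob e), are replaced by a susceptible (prob b), or stay infected.
        to_R, to_S, stay_I = rng.multinomial(I, [e, b, 1.0 - e - b])
        # Recovered hosts are replaced by susceptible newborns with probability b.
        R_to_S = rng.binomial(R, b)
        S, I, R = S - new_inf + to_S + R_to_S, stay_I + new_inf, R - R_to_S + to_R
        frac[t] = I / D
    return frac
```

For example, with the Figure 2 values D = 2000, e = 0.1, b = 0.02 and Nd = 10, and R0 = 3, the time-averaged output of `simulate_host_states(2000, 3.0, 0.1, 0.02, 10, 3000)` after a burn-in period can be compared with the predicted value of about 0.111. Counting hosts with binomial and multinomial draws is equivalent to updating each deme independently, because all hosts in the same state share the same transition probabilities under homogeneous contact.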

Figure 2. Fraction of infected individuals, I, as a function of the basic reproductive number R0.


The parameter values are Nd = 10, D = 2000, e = 0.1 and b = 0.02 (full symbols) and Nd = 20, D = 7000, e = 0.04 and b = 0.01 (empty symbols). The dashed line is the theoretical prediction of Equation (3).

The measures of genetic diversity that we studied were the mean number of pairwise differences between sequences in random samples of the whole pathogen population (π) and the number of segregating sites. From these we calculated Tajima's D statistic, whose expected value is zero for a constant-size population following the Wright-Fisher model of neutral evolution. Genetic diversity was evaluated both at equilibrium and before epidemiological and genetic equilibrium were reached.
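The three summary statistics above can be computed from a sample of aligned sequences as follows. This sketch uses the standard Tajima (1989) constants and assumes equal-length sequences with no gaps or missing data; it is not the DnaSP implementation used for the influenza data, and the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt


def diversity_stats(seqs):
    """Return (pi, S, D) for a sample of aligned, equal-length sequences.

    pi: mean number of pairwise differences; S: number of segregating sites;
    D: Tajima's D, returned as NaN when S == 0 (the statistic is undefined).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    S = sum(len(set(column)) > 1 for column in zip(*seqs))
    # Tajima's constants for a sample of size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    if S == 0:
        return pi, S, float("nan")
    D = (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
    return pi, S, D
```

A sample in which every mutation is a singleton (as after a recent sweep or expansion) gives π < S/a1 and hence a negative D, whereas an excess of intermediate-frequency variants gives a positive D.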

Model for studying selection

Certain pathogens show rapid evolution [17], [18], [19], [20], and genetic analyses have strongly suggested the action of positive selection in some regions of their genomes. To understand the signature of selection on pathogen molecular variation in our model, we started by considering the invasion of the resident pathogen population by a phenotypically slightly distinct variant. The new variant is assumed to have a higher fitness, resulting from a lower rate of clearance by the infected host, but is otherwise identical to the resident pathogen strain. A randomly chosen host is infected with the strain with the higher R0, and the fate of this strain is followed in the population until loss or fixation. By fixation, we mean that the mutant has spread through the whole population and is not only the dominant strain but the only strain present. From several thousand independent simulation runs of this process, the fixation probability of the mutant is estimated as the number of runs in which fixation occurred divided by the total number of runs. The mean time to fixation is also obtained; it corresponds to the number of generations the mutant takes from its appearance until its fixation.
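The estimation procedure just described (fixation probability as the fraction of independent runs ending in fixation, mean time averaged over the runs that fixed) can be sketched on a stand-in process. As an assumption for illustration only, the full metapopulation simulation is replaced here by a simple Moran birth-death process among N infected hosts, with the mutant's relative fitness 1 + s; counting N birth-death events as one generation is also an assumption, and the function names are hypothetical.

```python
import random


def moran_run(N, r, rng):
    """Simulate one Moran birth-death run started from a single mutant.

    Returns (fixed, steps): whether the mutant reached frequency N, and the
    number of birth-death events until absorption.
    """
    i, steps = 1, 0
    while 0 < i < N:
        steps += 1
        # Reproducer chosen with probability proportional to fitness (mutant r, resident 1).
        mutant_born = rng.random() < r * i / (r * i + (N - i))
        # Individual to be replaced chosen uniformly at random.
        mutant_dies = rng.random() < i / N
        i += int(mutant_born) - int(mutant_dies)
    return i == N, steps


def estimate_fixation(N, s, runs, seed=0):
    """Monte Carlo estimate of the fixation probability and mean fixation time (generations)."""
    rng = random.Random(seed)
    fix_times = []
    for _ in range(runs):
        fixed, steps = moran_run(N, 1.0 + s, rng)
        if fixed:
            fix_times.append(steps / N)  # N birth-death events counted as one generation
    p_fix = len(fix_times) / runs
    t_fix = sum(fix_times) / len(fix_times) if fix_times else float("nan")
    return p_fix, t_fix
```

For instance, `estimate_fixation(20, 0.1, 3000)` should return a probability close to the Moran value (1 − 1/1.1)/(1 − 1.1^−20) ≈ 0.107, and setting s = 0 should give roughly the neutral value 1/N. The mean time is averaged only over runs that fixed, matching the definition of time to fixation given above.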

SIR model with reinfection and selection - a simple model for influenza A evolution

Influenza A is an RNA virus that causes annual winter epidemics in temperate climates, while circulating throughout the year in the tropics. With a high mutation rate, the influenza A virus population can generate considerable genetic variability and, in the absence of selection, could potentially attain high levels of genetic diversity. However, its genetic diversity has been found to be reduced periodically (see [12]), and this is associated with cluster transitions. The evolutionary forces responsible for these patterns of molecular evolution are not well understood, although there is consensus that some form of selection is driving influenza A genome evolution [12], [21], [22], [23].

Previous modeling work has suggested that, in the case of influenza A, evolution occurs through the successive accumulation of neutral mutations, increasing viral diversity, followed by a sharp decline in diversity resulting from the fixation of a mutant strain that has escaped host immune surveillance [12], [24]. Motivated by this, we introduce selection into our neutral model in a simple, yet insightful, way for understanding influenza evolution. We assume that some level of reinfection can occur, such that a host in the recovered state can be reinfected with probability β. In this context, we study a model in which a new viral strain that is sufficiently distinct genetically (i.e., has accumulated a given number of previously neutral mutations) receives a selective advantage when it infects a host that has recovered from an initial infection caused by the old strain. The advantage is that the new strain causes a slightly longer infection in this host. Note, though, that the new strain causes exactly the same duration of infection as the old one when it infects a host that has never been infected. A diagram of this model with all the relevant parameters is shown in Figure 1. In this model, we studied the pattern of genetic diversity by introducing a genetic distance threshold, denoted dc, such that pathogens carrying more than dc mutations acquire a selective advantage.

Sequence data of Influenza A virus

Complete coding sequences of the hemagglutinin (HA) gene of A/H3N2 influenza viruses from New York State, USA, were collected from the NCBI Influenza Database [25]. A file with the sequences is provided as a supplement (Sequences.fas, Text S1). We calculated the genetic diversity and Tajima's D of 683 sequences from the years 1993 to 2006 using DnaSP 4.20.2 [26]. The analysis was done by season, where a season was defined as the time window between September and May.
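Grouping isolates by season, as described above, amounts to mapping each collection date to a September-May window. A minimal sketch, assuming collection dates are available as (year, month) pairs and that isolates collected from June to August are left out of every season; the function names and the "YYYY-YYYY" label format are hypothetical.

```python
def season_label(year, month):
    """Return the September-May season containing (year, month), e.g. '2002-2003',
    or None for June-August, which falls outside the seasonal window."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    if month >= 9:
        return f"{year}-{year + 1}"
    if month <= 5:
        return f"{year - 1}-{year}"
    return None


def group_by_season(records):
    """Group (year, month, sequence) records into a dict keyed by season label."""
    groups = {}
    for year, month, seq in records:
        label = season_label(year, month)
        if label is not None:
            groups.setdefault(label, []).append(seq)
    return groups
```

Each group can then be passed to a diversity calculation (π and Tajima's D) separately, giving one value per season as in Figure 7.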

Results

Level of genetic diversity

We have studied the levels and patterns of genetic diversity in a pathogen population whose dynamics follow the SIR epidemiological model. Figure 3 displays the level of diversity observed in random samples from the entire pathogen metapopulation as a function of R0. In the figure, R0 was increased by incrementing the migration rate m, i.e., by increasing the rate of transmission, while keeping all other parameters fixed. Intuitively, one expects the level of diversity to increase with the level of infection, since the overall pathogen population size increases; we also expect the level of diversity to increase with the pathogen mutation rate. Furthermore, it has been shown [27] that in a metapopulation with extinction and recolonization, genetic diversity decreases with increasing extinction rates. More formally, we find that the level of diversity π is well approximated by the simple expression

graphic file with name pone.0004876.e019.jpg (4)

where I is given by equation (3). This expression shows that neutral genetic diversity is proportional to the level and duration of infection and to the mutation rate.

Figure 3. The level of genetic diversity in the pathogen population as a function of R0.


Transmission is governed by the SIR model (1). Parameter values are D = 30000, e = 0.1, U = 0.00005 and b = 0.005 (panel A) and U = 0.00002 and b = 0.015 (panel B); Nd = 5 (filled triangles), Nd = 10 (empty diamonds) and Nd = 20 (grey circles). The solid black line is the expected level of diversity given by equation (4). In panel A, the pathogen population could not be maintained for the lowest values of R0.

We also note that equation (4) performs best when host population turnover is much slower than recovery from infection (i.e., when b is much lower than e). From Figure 3 and several other simulations, we find that this theoretical curve fits the simulated genetic diversity better when b/e is small, with deviations appearing when both this ratio and R0 are large (compare panel A, in which b/e = 0.05, with panel B, in which b/e = 0.15).

Many common infectious diseases, such as influenza, on which we focus later, have low R0 (around 2–4) [28] and very low ratios b/e (around 0.0002), so equation (4) should generally provide a good approximation to the level of genetic diversity expected under neutral evolution for many infectious diseases.

Frequency spectrum of neutral mutations

We also studied how the frequency spectrum of segregating neutral mutations in this metapopulation model compares with that expected under the standard neutral model of molecular evolution. To do so, we measured the average Tajima's D statistic in samples of the simulated pathogen population. We always found values of Tajima's D similar to those expected under the standard neutral model, i.e., average values of D ≈ 0 (see Table 1 for some examples). This implies that demographic structure such as that studied here is difficult to detect with a classical measure of deviation from the standard neutral frequency spectrum.

Table 1. Frequency distribution of mutations assessed by measuring Tajima's D.

b = 0.005
Nd R0 D 2SE
5 3 0.06 0.02
5 5 0.09 −0.02
5 10 0.09 −0.05
5 20 0.13 0.03
5 30 0.09 −0.10
5 50 0.13 0.04
5 100 0.11 −0.05
20 3 −0.08 0.12
20 5 −0.05 0.10
20 10 −0.09 0.10
20 20 −0.14 0.10
20 30 −0.14 0.11
20 50 −0.09 0.12
20 100 −0.05 0.14

b = 0.015
Nd R0 D 2SE
5 1.5 0.01 0.08
5 2 0.04 0.14
5 3 −0.11 0.12
5 5 −0.08 0.18
5 10 −0.06 0.20
5 20 −0.04 0.19
5 30 −0.04 0.19
5 50 −0.15 0.16
5 100 0.04 0.17
10 1.5 −0.08 0.07
10 2 0.02 0.14
10 3 0.04 0.14
10 5 −0.21 0.13
10 10 −0.06 0.16

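Fixed parameters were D = 10000 hosts, U = 0.0001 and e = 0.1. In the table columns, D denotes the mean value of Tajima's D and 2SE twice its standard error.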

Approach to Equilibrium

In many situations, we may be interested in the level of diversity before equilibrium is attained. This is particularly relevant during or after an epidemic [11]. We studied this by simulation (Figure 4) and considered the following heuristic approximation for the average level of diversity. Suppose that the whole convergence to equilibrium can be approximated by a change in pathogen population size starting from a single individual. Initially there is no diversity, and when Inline graphic the level of diversity will be approximately Inline graphic, where Inline graphic with Inline graphic, as we have seen before.

Figure 4. Approach to equilibrium (R0 = 2.5).


The parameter values are D = 5000, U = 0.0001, e = 0.1, b = 0.008 and Nd = 10. The dashed line is the level of infection in the population, crosses are the average pairwise diversity as a function of time (in generations), and the black line is the prediction of equation (5), using the level of infection obtained in the simulations. Grey triangles represent the mean value of Tajima's D statistic. In the initial period of the epidemic, diversity increases with a slope of approximately 2U (linear regression slope 0.0002, R² = 0.99).

Tajima [29] has shown that, for a single unstructured population that fluctuates in size (Nt), the number of segregating sites, S, in a sample of size n changes in time according to

graphic file with name pone.0004876.e024.jpg

which for Inline graphic where Inline graphic gives

graphic file with name pone.0004876.e027.jpg

Because at equilibrium the level and pattern of genetic diversity are similar to those expected under a standard neutral model with an effective population size of Inline graphic, we try the following heuristic approximation for the change in diversity levels over time:

graphic file with name pone.0004876.e029.jpg (5)

where I_t is approximated by the number of infected hosts in the SIR model.

Figure 4 shows that equation (5) provides a good approximation for the level of diversity as the population approaches both epidemiological and genetic equilibrium. At the peak of the epidemic, diversity levels are low and Tajima's D is strongly negative. After the initial epidemic, both diversity levels and Tajima's D start to increase. Diversity initially increases at a rate of approximately 2U and closely follows the levels predicted by equation (5).
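Equation (5) itself is not reproduced here, but the behaviour it describes (an initial increase at about 2U per generation followed by saturation) can be illustrated with a generic recursion for the expected number of pairwise differences in a haploid Wright-Fisher population of time-varying size N_t: two lineages coalesce with probability 1/N_t per generation and otherwise accumulate 2U new differences, so π_{t+1} = (1 − 1/N_t)(π_t + 2U). This recursion is an illustrative assumption, not the authors' heuristic; in particular, how N_t relates to the number of infected hosts I_t is left to the caller, and the function name is hypothetical.

```python
def expected_pairwise_diversity(N_series, U, pi0=0.0):
    """Iterate pi_{t+1} = (1 - 1/N_t) * (pi_t + 2U) over a sequence of population sizes.

    N_series: effective population size in each generation (each >= 1).
    Returns the list [pi_0, pi_1, ..., pi_T].
    """
    pis = [pi0]
    for N in N_series:
        if N < 1:
            raise ValueError("population size must be at least 1")
        # Lineages that have not coalesced keep their differences and gain 2U more.
        pis.append((1.0 - 1.0 / N) * (pis[-1] + 2.0 * U))
    return pis
```

Starting from a single individual (N = 1) the recursion returns π = 0, and while π is still far below its equilibrium it grows by almost 2U per generation, consistent with the slope reported in Figure 4. For a constant size N it converges to 2U(N − 1), the familiar 2NU of the standard neutral model up to a factor (N − 1)/N.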

Invasion of new pathogenic strains

We considered the case where a new strain with a higher R0 is introduced into the population. This new strain is assumed to carry a beneficial mutation that makes it more virulent, in the sense that it causes a longer infection in the host. We then asked how likely such a virulent strain is to invade and, on average, how long the invasion takes. As is common in the population genetics literature [30], we define the relative selective advantage of the new strain as $s = (R_0' - R_0)/R_0$, where $R_0'$ is the basic reproductive number of the new strain. Figure 5 shows the simulation results of several independent introductions of mutant strains with different selective advantages. The probability of replacement depends on the selective advantage according to $P_{fix} \approx s/(1+s)$. Because in this model, to a very good approximation, any given host is infected with either the old strain or the new strain, and the number of infected hosts is approximately constant, the fixation process can be well approximated by a simple Moran model of birth and death of infected hosts, in which the probability of birth of infected hosts carrying the new strain is slightly higher than that of infected hosts carrying the old strain [31]. Under the Moran model, the probability of fixation of a single beneficial mutant is $(1 - 1/r)/(1 - 1/r^N)$, where r is the fitness of the new mutation and N is the population size. In our case r = 1+s and N corresponds to the number of infected hosts, which, for large N, leads to the probability of fixation given above.
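The Moran argument can be checked numerically. This is a minimal sketch, assuming that s is defined through R0' = R0(1 + s), that the Moran population size N is the number of infected hosts, and that this number follows b/(e + b)·(1 − 1/R0)·D (the form assumed here for Eq. (3)); the helper names are hypothetical. With the Figure 5 values D = 5000, e = 0.1, b = 0.01 and R0 = 4.5, N is about 354 hosts, large enough that the exact Moran probability coincides with s/(1 + s).

```python
def relative_advantage(R0_new, R0_old):
    """Relative selective advantage s, defined through R0_new = R0_old * (1 + s)."""
    return R0_new / R0_old - 1.0


def moran_fixation_probability(r, N):
    """Fixation probability of a single mutant of relative fitness r in a Moran population of size N."""
    if r == 1.0:
        return 1.0 / N  # neutral limit
    return (1.0 - 1.0 / r) / (1.0 - r ** (-N))


def infected_hosts(D, e, b, R0):
    """Hypothetical helper: number of infected hosts, assuming I = b/(e+b) * (1 - 1/R0)."""
    return D * b / (e + b) * (1.0 - 1.0 / R0)


if __name__ == "__main__":
    s = relative_advantage(4.95, 4.5)               # a 10% higher R0
    N = round(infected_hosts(5000, 0.1, 0.01, 4.5))  # Figure 5 parameters
    print(N, moran_fixation_probability(1.0 + s, N), s / (1.0 + s))
```

For small s, s/(1 + s) is approximately s itself, which is the sense in which the invasion probability is given by the relative increment in R0.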

Figure 5. Probability and mean time to replacement.


D = 5000, b = 0.01, e = 0.1 and initial R0 = 4.5; there is no reinfection. Nd = 10 (circles) and Nd = 20 (squares). (A) Probability of replacement by a new strain with selective advantage s (Pfix). (B) Mean time to replacement by the new selected strain (Tfix).

Pattern of diversity under the invasion of escape mutant strains

We now study a model in which selection is introduced in a simple, yet relevant, way for understanding influenza evolution. We assume that, after recovery from a first infection, hosts can be reinfected (see Figure 1). If a host experiences a second infection with a strain that is genetically distant from the one that caused the first infection, its clearance rate is lower than if the second infection were caused by a genetically similar virus (e1 < E and e1 < e2). This simply says that the repertoire of antibodies built up during infection with a given pathogen will not be optimal against an antigenically distinct pathogen. Moreover, we assume that this effect is asymmetric: an infection with the invading strain that is preceded by an infection with the resident strain has a lower clearance rate than if the order is reversed (e1 < e2). The argument is that the invading strain emerged in the presence of antibodies against the resident and escaped them successfully, while the reverse is not true: the resident emerged before the maturation of antibodies against the invader, and therefore has never been under their selective pressure.
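The clearance rules above can be summarised as a lookup on a host's infection history. This is a hypothetical sketch: the string labels, function names and default values (taken from the Figure 6 parameters e = 0.1, E = 0.7, e1 = 0.3, e2 = 0.7) are illustrative, and treating two strains as antigenically distinct when they differ at dc or more sites follows the description of the simulations with dc = 2, which is an assumption about how the threshold is applied.

```python
RESIDENT, INVADER = "resident", "invader"  # hypothetical strain labels


def hamming(a, b):
    """Number of differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def antigenically_distinct(a, b, dc=2):
    """Assumed rule: strains are antigenically distinct if they differ at dc or more sites."""
    return hamming(a, b) >= dc


def clearance_rate(prior, current, e=0.1, E=0.7, e1=0.3, e2=0.7):
    """Per-step clearance probability of a host currently infected with `current`.

    `prior` is the strain of the host's previous infection, or None for a first infection.
    """
    if prior is None:
        return e   # first infection: both strains cause infections of the same duration
    if prior == current:
        return E   # reinfection with an antigenically similar strain
    if (prior, current) == (RESIDENT, INVADER):
        return e1  # escape strain after immunity to the resident: slowest reinfection clearance
    if (prior, current) == (INVADER, RESIDENT):
        return e2  # resident after immunity to the invader
    raise ValueError("unknown strain label")
```

Under these defaults the invader gains its advantage only in previously infected hosts (e1 < e2 and e1 < E), while primary infections are cleared at the same rate e whichever strain causes them.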

To follow the level and pattern of genetic diversity in the sequences, we assumed that two strains are antigenically distinct when they differ by two or more mutations. Figure 6 illustrates what we observed over many different simulations. As the new strain invades the population, it leaves a molecular signature in genetic sequences sampled randomly from the pathogen population. As can be seen in the figure (see also Supporting Information Figure S1, Figure S2 and Figure S3), as the new strain sweeps through the population, the average number of pairwise differences between sequences increases, accompanied by a substantial increase in Tajima's D, which becomes positive. When the new strain becomes dominant in the population, diversity decreases to very low levels, accompanied by a change in the sign of Tajima's D, which now becomes consistently negative. Thus, a rapid and drastic change in the sign of Tajima's D, together with a corresponding decrease in diversity, is a molecular signature of a new strain becoming dominant in the population. In Figure 7 we plot the values of π and Tajima's D for the HA gene of influenza A sampled in New York. Interestingly, in the seasons 2002–2003 and 2003–2004 we observe a rapid increase in diversity accompanied by an increase in Tajima's D, which becomes positive, followed by a rapid decrease in diversity and a change in the sign of Tajima's D.

Figure 6. Time plot of the pattern of diversity during the replacement of a new strain.


On the left scale, we plot π (gray line) and Tajima's D (filled triangles); on the right scale, we plot the total frequency of infection (dashed line) and the frequency of hosts infected with the new selected strain (solid line). Parameters are as follows: D = 30000, initial R0 = 4, U = 0.0001, dc = 2, e = 0.1, E = 0.7, e1 = 0.3, e2 = 0.7 and b = 0.005.

Figure 7. Pattern and level of genetic diversity in the coding region of hemagglutinin gene of A/H3N2 influenza virus.


Sequences were sampled in New York State, USA, over several seasons, where a season was defined as the time window between September and May. Across the period analysed, Tajima's D (full triangles) is negative, but in 2002–2003 Tajima's D reaches a positive value, which accompanies an increase in genetic diversity π (open squares).

Table 2 shows the probability that the new strain replaces the old strain, as well as the mean time that this replacement takes. The results in Table 2 show that, under the selection described (e1 < e2), the probability of fixation of the new strain is always higher than the probability of replacement by a neutral strain (e1 = e2), and that the time for it to sweep through the population can be short. Because the selection of the new strain over the resident acts upon reinfection through the asymmetry e1/e2 < 1, the probability of replacement increases and the mean time to replacement decreases as this ratio becomes smaller, all else being equal.

Table 2. Combinations of model parameters used in simulations.

D R0 e E e1 e2 b Pfix Tfix
30000 4 0.1 0.7 0.1 0.7 0.005 0.65 (0.06) 301
30000 4 0.1 0.7 0.2 0.7 0.005 0.53 (0.06) 621
30000 4 0.1 0.7 0.3 0.7 0.005 0.41 (0.06) 1042
30000 4 0.1 0.7 0.5 0.7 0.005 0.11 (0.06) 2358
30000 4 0.1 0.7 0.6 0.7 0.005 0.05 (0.04) 5184
30000 4 0.1 0.7 0.1 0.6 0.005 0.67 (0.09) 1809
30000 3.2 0.1 0.7 0.6 0.7 0.005 0.03 (0.03) 6792
30000 3.2 0.1 0.7 0.5 0.7 0.005 0.04 (0.04) 2753
30000 3.2 0.1 0.7 0.3 0.7 0.005 0.22 (0.05) 1047
30000 3.2 0.1 0.7 0.2 0.7 0.005 0.48 (0.06) 597
30000 3.2 0.1 0.7 0.1 0.7 0.005 0.60 (0.06) 250
10000 4 0.1 0.7 0.1 0.35 0.005 0.34 (0.09) 771
10000 4 0.1 0.7 0.1 0.5 0.005 0.66 (0.09) 383
10000 4 0.1 0.7 0.1 0.6 0.005 0.72 (0.09) 297
10000 4 0.1 0.7 0.1 0.7 0.005 0.71 (0.09) 150

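Fixed parameters were Nd = 5 and U = 0.0001. Pfix is the probability of replacement of the resident by the new strain and Tfix the mean time to replacement (in generations).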

Discussion

We have developed a simulation framework aimed at establishing the expected levels and patterns of neutral genetic diversity under the epidemiological SIR model. Unlike previous modeling frameworks [11], [12], [23], we have introduced a demographic structure in which both within- and between-host evolutionary processes can be studied. This framework should be applicable to many different pathogens. We have found that DNA/RNA sequence variability is not only proportional to the level of infection in the population but also depends specifically on the duration of individual infection, such that, for the same prevalence of infection, pathogens that cause longer infections can sustain more genetic variability. For example, the genetic diversity is 0.26±0.04 for e = 0.2, but for a longer infection period (e = 0.1) it is 1.11±0.20 (when R0 = 20, D = 30000, Nd = 10, U = 0.00001 and b = 0.01). When extending this model to incorporate selection among strains that are continuously generated through mutation, we found that simple forms of selection lead to simple predictions for the probability of replacement by new strains. For concreteness, imagine that a new strain is introduced into a population, either through mutation or immigration. The only phenotypic change in this new strain is a small increase in the duration of infection, which significantly increases its chance of replacement. For example, a 10% increase in the duration of infection leads to a 10-fold increase in the chance of replacement.

We then studied strain phenotypic diversity generated by mutation, which leads to antigenic differences in a pathogen such as the influenza A virus, and incorporated it into an SIR model with reinfection. Despite its simplicity, the model attempts to capture genetic properties of influenza A drift evolution. Influenza A evolution is roughly characterized by two evolutionary phenomena: shift and drift [22]. Shift events are associated with subtype replacements and typically cause pandemics. Between shifts, antigenic drift occurs, characterized by the accumulation of mutations that generate viral diversity. The majority of these mutations are thought to be neutral, but some, or certain combinations, can lead to an antigenically distinct virus that will be subject to selection because it has a reproductive advantage. Motivated by what has been suggested for the evolution of influenza A, we followed the patterns of sequence diversity under a model in which, after the accumulation of a critical number of neutral mutations, the pathogenic strain gains a reproductive advantage. We found two clear molecular signatures of replacement in the model: a rapid reduction in diversity and a change in the sign of Tajima's D (from positive to negative) as replacement occurs. This molecular signature is observed in influenza A sequences over the seasons between 2001 and 2004.

A debate has arisen about whether influenza A drift evolution is driven by continuous positive Darwinian selection [21] or by epochal selection [12], [24]. Our model makes clear predictions about the molecular signatures of each scenario, provided that sufficiently frequent samples exist. If continuous positive selection is occurring, we should expect to see repeated molecular signatures of replacement (repeated decreases in π co-occurring with repeated changes in the sign of Tajima's D), as in Figure 6. If long periods of neutral evolution occur, no such pattern is expected, as in Figure 4.

The model presented combines pathogen transmission, mutation and selection under minimal assumptions that are satisfied by many pathogens, which makes the results and conclusions widely applicable.

Supporting Information

Figure S1

Time plot of the pattern of diversity during the replacement of a new strain. On the left scale we plot diversity (gray line) and Tajima's D (filled triangles); on the right scale we plot the total frequency of infection (dashed line) and the frequency of hosts infected with the new selected strain (solid line). Parameters are as follows: D = 30000, initial R0 = 4, U = 0.0001, dc = 2, e = 0.1, E = 0.7, e1 = 0.2, e2 = 0.7 and b = 0.005.

(0.04 MB TIF)

Figure S2

Time plot of the pattern of diversity during replacement by a new strain. The left axis shows diversity (gray line) and Tajima's D (filled triangles); the right axis shows the total frequency of infection (dashed line) and the frequency of hosts infected with the new selected strain (solid line). Parameters are as follows: D = 10000, initial R0 = 4, U = 0.0001, dc = 2, e = 0.1, E = 0.7, e1 = 0.3, e2 = 0.7 and b = 0.005.

(0.04 MB TIF)

Figure S3

Time plot of the pattern of diversity during replacement by a new strain. The left axis shows diversity (gray line) and Tajima's D (filled triangles); the right axis shows the total frequency of infection (dashed line) and the frequency of hosts infected with the new selected strain (solid line). Parameters are as follows: D = 30000, initial R0 = 4, U = 0.00005, dc = 2, e = 0.1, E = 0.7, e1 = 0.3, e2 = 0.7 and b = 0.005.

(0.04 MB TIF)

Text S1

Supporting sequence information.

(1.30 MB DOC)

Acknowledgments

We thank Katia Koelle for helpful discussions.

Footnotes

Competing Interests: The authors have declared that no competing interests exist.

Funding: We thank the Portuguese Research Council (FCT) for financial support through grant PTDC/BIA-BDE/65276/2006. We also thank FCT (grant PTDC/BIA-BDE/65276/2006) and the Fundação Calouste Gulbenkian for financial support of hermes, a High Performance Computing Centre hosted at the Instituto Gulbenkian de Ciência, Oeiras. PRAC is supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Ciência e Tecnologia do Estado de Pernambuco (FACEPE) and program PRONEX/MCT-CNPq-FACEPE. MGG and DGR are supported by FCT and the European Commission grant MEXT-CT-2004-14338. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Grenfell BT, Pybus OG, Gog JR, Wood JLN, Daly JM, et al. Unifying the epidemiological and evolutionary dynamics of pathogens. Science. 2004;303:327–332. doi: 10.1126/science.1090727.
  • 2. Wilson DJ, Falush D, McVean G. Germs, genomes and genealogies. Trends in Ecology & Evolution. 2005;20:39–45. doi: 10.1016/j.tree.2004.10.009.
  • 3. Rousset F. Genetic structure and selection in subdivided populations. Princeton: Princeton University Press; 2004. p. 288.
  • 4. Campos PRA, Gordo I. Pathogen genetic variation in small-world host contact structures. Journal of Statistical Mechanics: Theory and Experiment. 2006:L12003.
  • 5. Gordo I, Campos PRA. Patterns of genetic variation in populations of infectious agents. BMC Evolutionary Biology. 2007;7:116. doi: 10.1186/1471-2148-7-116.
  • 6. Kimura M. The neutral theory of molecular evolution. Cambridge: Cambridge University Press; 1983. p. 384.
  • 7. Anderson RM, May RM. Infectious diseases of humans: dynamics and control. Oxford: Oxford University Press; 1991. p. 768.
  • 8. Gomes MG, White LJ, Medley GF. Infection, reinfection, and vaccination under suboptimal immune protection: epidemiological perspectives. J Theor Biol. 2004;228:539–549. doi: 10.1016/j.jtbi.2004.02.015.
  • 9. Drake JW, Charlesworth B, Charlesworth D, Crow JF. Rates of spontaneous mutation. Genetics. 1998;148:1667–1686. doi: 10.1093/genetics/148.4.1667.
  • 10. Perfeito L, Fernandes L, Mota C, Gordo I. Adaptive mutations in bacteria: high rate and small effects. Science. 2007;317:813–815. doi: 10.1126/science.1142284.
  • 11. Boni MF, Gog JR, Andreasen V, Feldman MW. Epidemic dynamics and antigenic evolution in a single season of influenza A. Proc Biol Sci. 2006;273:1307–1316. doi: 10.1098/rspb.2006.3466.
  • 12. Koelle K, Cobey S, Grenfell B, Pascual M. Epochal evolution shapes the phylodynamics of interpandemic influenza A (H3N2) in humans. Science. 2006;314:1898–1903. doi: 10.1126/science.1132745.
  • 13. Hudson RR. Gene genealogies and the coalescent process. Oxford: Oxf. Surv. Evol. Biol; 1990. pp. 1–45.
  • 14. Pybus OG, Charleston MA, Gupta S, Rambaut A, Holmes EC, et al. The epidemic behavior of the hepatitis C virus. Science. 2001;292:2323–2325. doi: 10.1126/science.1058321.
  • 15. Combadao J, Campos PR, Dionisio F, Gordo I. Small-world networks decrease the speed of Muller's ratchet. Genet Res. 2007;89:7–18. doi: 10.1017/S0016672307008658.
  • 16. Renshaw E. Modelling biological populations in space and time. Cambridge: Cambridge University Press; 1991. p. 424.
  • 17. Bazykin GA, Dushoff J, Levin SA, Kondrashov AS. Bursts of nonsynonymous substitutions in HIV-1 evolution reveal instances of positive selection at conservative protein sites. Proc Natl Acad Sci U S A. 2006;103:19396–19401. doi: 10.1073/pnas.0609484103.
  • 18. Taylor BS, Sobieszczyk ME, McCutchan FE, Hammer SM. Medical progress: The challenge of HIV-1 subtype diversity. New England Journal of Medicine. 2008;358:1590–1602. doi: 10.1056/NEJMra0706737.
  • 19. Kay A, Zoulim F. Hepatitis B virus genetic variability and evolution. Virus Research. 2007;127:164–176. doi: 10.1016/j.virusres.2007.02.021.
  • 20. Simmonds P. Genetic diversity and evolution of hepatitis C virus - 15 years on. Journal of General Virology. 2004;85:3173–3188. doi: 10.1099/vir.0.80401-0.
  • 21. Shih ACC, Hsiao TC, Ho MS, Li WH. Simultaneous amino acid substitutions at antigenic sites drive influenza A hemagglutinin evolution. Proc Natl Acad Sci U S A. 2007;104:6283–6288. doi: 10.1073/pnas.0701396104.
  • 22. Earn DJD, Dushoff J, Levin SA. Ecology and evolution of the flu. Trends in Ecology & Evolution. 2002;17:334–340.
  • 23. Gokaydin D, Oliveira-Martins JB, Gordo I, Gomes MG. The reinfection threshold regulates pathogen diversity: the case of influenza. J R Soc Interface. 2007;4:137–142. doi: 10.1098/rsif.2006.0159.
  • 24. Wolf YI, Viboud C, Holmes EC, Koonin EV, Lipman DJ. Long intervals of stasis punctuated by bursts of positive selection in the seasonal evolution of influenza A virus. Biology Direct. 2006;1:34. doi: 10.1186/1745-6150-1-34.
  • 25. Bao YM, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, et al. The influenza virus resource at the national center for biotechnology information. Journal of Virology. 2008;82:596–601. doi: 10.1128/JVI.02005-07.
  • 26. Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003;19:2496–2497. doi: 10.1093/bioinformatics/btg359.
  • 27. Pannell JR, Charlesworth B. Effects of metapopulation processes on measures of genetic diversity. Philos Trans R Soc Lond B Biol Sci. 2000;355:1851–1864. doi: 10.1098/rstb.2000.0740.
  • 28. Heffernan JM, Smith RJ, Wahl LM. Perspectives on the basic reproductive ratio. J R Soc Interface. 2005;2:281–293. doi: 10.1098/rsif.2005.0042.
  • 29. Tajima F. The effect of change in population-size on DNA polymorphism. Genetics. 1989;123:597–601. doi: 10.1093/genetics/123.3.597.
  • 30. Crow JF, Kimura M. Introduction to population genetics. New York: Harper & Row Publishers; 1970. p. 591.
  • 31. Slatkin M. Fixation probabilities and fixation times in a subdivided population. Evolution. 1981;35:477–488. doi: 10.1111/j.1558-5646.1981.tb04911.x.
