Skip to main content
National Science Review logoLink to National Science Review
. 2020 Sep 24;8(1):nwaa246. doi: 10.1093/nsr/nwaa246

On the founder effect in COVID-19 outbreaks: how many infected travelers may have started them all?

Yongsen Ruan 1, Zhida Luo 2, Xiaolu Tang 3, Guanghao Li 4, Haijun Wen 5, Xionglei He 6, Xuemei Lu 7,, Jian Lu 8,, Chung-I Wu 9,
PMCID: PMC7543514  PMID: 34676089

Abstract

How many incoming travelers (I0 at time 0, equivalent to the ‘founders’ in evolutionary genetics) infected with SARS-CoV-2 who visit or return to a region could have started the epidemic of that region? I0 would be informative about the initiation and progression of epidemics. To obtain I0, we analyze the genetic divergence among viral populations of different regions. By applying the ‘individual-output’ model of genetic drift to the SARS-CoV-2 diversities, we obtain I0 < 10, which could have been achieved by one infected traveler in a long-distance flight. The conclusion is robust regardless of the source population, the continuation of inputs (It for t > 0) or the fitness of the variants. With such a tiny trickle of human movement igniting many outbreaks, the crucial stage of repressing an epidemic in any region should, therefore, be the very first sign of local contagion when positive cases first become identifiable. The implications of the highly ‘portable’ epidemics, including their early evolution prior to any outbreak, are explored in the companion study (Ruan et al., personal communication).

Keywords: SARS-CoV-2, COVID-19, population genetics, genetic drift, founder effect

INTRODUCTION

It is generally accepted that, in principle, a small number of infected people arriving in a new place could trigger an epidemic (if the basic reproduction number, R0, is not too small [1–3]). The main issue is how many travelers actually started each epidemic. Here, ‘a new place’ may mean a country, or a bordered region, within which the bulk of human interactions happen. Relative to the within-region movement, a bordered region is lightly connected to the rest of the world. Since the epidemic in any bordered region could have been started by one single infected traveler, or by 1000 of them, we take the population genetic approach to analyzing the divergence among viral populations in relation to the ‘founder effect’ [4].

We shall let It be the amount of input at time t (i.e. the number of infected people coming into an uninfected region). The crucial number is I0, i.e. the first batch of input. The magnitude of It is important in public health practice. If It has been large with continual input lasting for weeks, then a bordered region may be able to prevent the epidemic from being exported to (or being imported from) other regions, solely by restricting human movements out of (or into) its borders. On the other hand, if the epidemic in a region could be started with I0 < 10 (and It>0 ∼ 0), then sealing off either emigration or immigration would not be effective in stopping a pandemic. Unless the bordered regions are maintaining zero infections, the danger would be coming mainly from within their borders. Here, we aim to infer It, in particular, in the early period of an epidemic.

In this study, we use a population genetic framework [5]. Because the focus is the stochastic differentiation among viral populations, epidemiological models generally do not cover such topics. The genetic drift formulation used here also permits the calculation of the extinction probability of the invasion of the virus ([5], Ruan et al., personal communication). Epidemiological parameters, such as the number of uninfected individuals, the effects of quarantine and the development of immunity [6,7], are not considered here as population differentiation takes place in the earliest stage of the invasion. During this stage, neither quarantine nor herd immunity has yet become a major factor in the outbreak.

Theory

To estimate It, a conventional method is to inspect the changes in the population size of the viruses, Nt [8,9]. Viral population size corresponds to the number of infected individuals, assuming a viral clone in each person. Because Nt is only weakly dependent on It, the conventional approach does not offer the resolution we seek for. It would be more informative to examine multiple populations for their differences in genetic polymorphism. The differences would depend strongly on It at the very beginning of the epidemic (Fig. 1).

Figure 1.

Figure 1.

Schematic diagram of viral population divergence among regions. In G0 (generation 0), I0 = 8 and infected individuals arrive in regions 1–3 with 3, 5 and 7 of them, respectively, carrying the L-type virus. In the beginning, genetic drift is particularly strong and the frequency of L fluctuates as modeled by the Branching Process. G8 is about 5 weeks after the first arrival when the data are collected. After G8, the fluctuation is greatly dampened due to the large population size. In the later stage (after G20), even weak selection could drive gene frequency toward the fixation of the more contagious genotype. Regions 4–6 are not independent samples and are not included in the analysis (see Data of the main text).

For studying population differentiation, the source population infecting the travelers needs to harbor genetic variants in non-trivial frequencies to yield informative data [10]. For example, Tang et al. [11] reported the existence of two lineages that are distinguishable by two Single Nucleotide Polymorphisms (SNPs), one being a Serine/Lysine (S/L) polymorphism. According to ref. 11, the S lineage accounts for ∼30% and the L lineage accounts for ∼70% among the 103 viral genomes they examined.

For the ease of estimating It’s, the variants should ideally be neutral in fitness. Indeed, since variants under selection would have a short retention time in the population, SNPs are often neutral [1214]. While the fitness differential between the L and S variants remains unclear, our simulations show that the estimation of It is only weakly dependent on selection (see below).

The estimation of I0 as well as It>0 are conducted for multiple viral populations, which should ideally originate from independent samples. Among these populations, we model their differentiation as a function of It. It is not likely to be very large between regions (including the source) reachable only by air flight, each of which may carry at most a small number of infected passengers. As long as all extant populations are derived from the same source population, the estimates of It are only weakly dependent on the actual genetic make-up of the source. For that reason, the source population need not be known.

The hypothesis is that the viral populations seeded by the infected travelers have experienced strong fluctuation in gene frequency. This may happen at the beginning of the epidemic when Nt is small. Soon afterwards, the fluctuation in gene frequencies would be quickly dampened as Nt grows. The fluctuation in gene frequencies due to the random transmissions of genes is referred to as genetic drift [14,15] or the founder effect [4]. The standard formulation of genetic drift by the Wright-Fisher model (or the alternative model of Moran [16]) is not applicable for tracking the viral population. Instead, we use the ‘individual output’ model we previously proposed [5]. All models assume discrete generations. Based on the infection dynamics estimated in a recent study [17], we assume that each discrete generation is ∼4 days. If we use a longer or shorter generation time, the outcome would be similar as long as the progeny production is calibrated with the generation time.

From one generation to the next, each individual produces k ‘descendants’ (or infects k others) with the mean of E(k) and the variance of V(k). In the Wright-Fisher model, k follows the Poisson distribution and V(k) is tied to E(k) [5]. In the ‘individual output’ model, k may follow any distribution, which is often measurable but not in any common form. E(k) dictates the population growth, Nt, while V(k) determines the fluctuation in Nt and in gene frequency. We will attempt to obtain E(k) and V(k) from the empirical data and, for a comparison, will also allow V(k) = E(k) to approximate the Wright-Fisher model. Nt is a function of E(k), V(k) and It. Here, we assume It to be a constant, hence, It = I0 for all t’s. At time T,

graphic file with name M1.gif (1)

If E(k) is not too small, E(NT) would depend mainly on It of the first few generations. In fact, the results are often similar whether there is constant input or not (i.e. It = I0, or It = 0 when t ≥ 1). In other words, Equation (1′) below would yield similar results to Equation (1).

graphic file with name M2.gif (1′)

With reasonable accuracy, E(k) can be obtained from the growth trajectory of Nt but I0 has to be obtained by a different means. While many of the assumptions such as exponential growth and the constancy of It may hold for only a few generations, most of our results depend primarily on the dynamics of the first few generations. The actual trajectory of each population would also depend on V(k). Using the simpler Equation (1′),

graphic file with name M3.gif (2)

To obtain I0 for Equations (1) or (1′), we have to model gene frequencies. Using the example of the S/L polymorphism, we let XT be the frequency of the L lineage at time T. If the fitness of the S and L type is the same, then

graphic file with name M4.gif (3)
graphic file with name M5.gif (4)

where X0 is the frequency in the source population. Equations (3) and (4) will need some modifications if we use Equation (1), or if we consider the fitness difference between L and S (see Supplement).

Simulations

The actual realization of XT in each population is obtained by iteration described here. We assume two types of viruses (L type and S type; [11]). The relative fitness of L type to S type is 1 + s (s = 0 represents no selection). In addition, there is a source population, in which the frequency of the L type is X0. At generation t, there will be It immigrants from the source population. I0 is the founder population size. A parameter T sets the time limit of immigration. Thus,

graphic file with name M6.gif

At generation t − 1, the numbers of the L type and S type are Lt−1 and St−1 respectively. Also, Nt−1 = Lt−1 + St−1 and Xt−1 = Lt−1/Nt−1. After one generation, there will be It (It = IL + IS; IL, IS are the numbers of L and S type, respectively) immigrants from the source population. In addition, Nt−1 will increase to Nt.. Thus, at generation t, the numbers of the L and S type are

graphic file with name M7.gif
graphic file with name M8.gif

where Inline graphic is the progeny number of the i-th individual of either type. The distribution of Inline graphic will be defined in the next section. If there is selection, the number of L type will change as follows:

graphic file with name M11.gif

At generation t, the population size and the frequency of L type are

graphic file with name M12.gif
graphic file with name M13.gif

Based on the definition above, we simulate the stochastic changes of the viral population until it reaches the 20th generation (i.e. t = 20). Since genetic drift is negligible when Nt is large (e.g. >105), we simulate the trajectory by a deterministic model when Nt > 105.

To quantify the population differentiation, we calculate the pairwise Fst values [14,18], defined below. With Xt and Yt for a pair of populations,

graphic file with name M14.gif (5)

where Inline graphic. If Fst = 0, Xt = Yt and the two populations are identical in gene frequency. If Fst = 1, the two populations are maximally differentiated with Xt = 0 and Yt = 1, or vice versa.

Defining the parameter set (I0, T, X0, s) and the distribution of k

We set I0 = 1, 5, 10, 20, 50, 100 and T = 0, 1, 2, 3, 20. Thus, (I0, T) = (5, 3) means five travelers each generation for four generations, counting T = 0 as the initial batch. We set T in this range but will show that T = 0, 1 or 20 hardly matters. The frequency of L lineage in the source population is X0 = 0.1, 0.3, 0.5, 0.7, 0.9. Again, the polymorphism frequency in the source population has turned out to have little effect on the differentiation among populations. In addition, we also set s = 0, 0.1 to study the effect of selection. For each parameter set (I0, T, X0, s), we repeat the simulation 100 times.

As stated, the conventional Wright-Fisher Model requires k (progeny number of an individual) to follow a Poisson distribution with V(k) = E(k). Here, we assume the spread of virus to be associated with the social network, which usually follows the power law [19,20]. Specifically, we let k follow Zipf's law (a discrete power-law distribution; [21]):

graphic file with name M16.gif

where Inline graphic. The mean and variance of k are

graphic file with name M18.gif (6)
graphic file with name M19.gif (7)

The estimate of the basic reproduction number (R0) of SARS-CoV-2 ranges from 1.4 to 6.5 [22–24]. Here, we focus on the early phase of the viral population growth by using R0 = 6.5. The relationship between E(k) and R0 is as follows:

graphic file with name M20.gif (8)

where τ is the serial interval and G is the generation time, and τ is estimated to be ∼5 days [23,25] and G is 4 days [17]. Then, Inline graphic. (Note that E(k) would become smaller with smaller G, as stated above.) To have k follow the power law with E(k) = 4.5, we assume M = 30 and c = 1.3 in Equation (6), which yields V(k) = 45. As noted in Chen YX et al. [5], the strength of genetic drift depends mainly on E(k) and V(k) rather than the actual distribution. By setting V(k) so large, we ensure that the I0 estimate would be on the high side (see Discussion).

Data

The estimation of I0 as well as It>0 should be done on multiple viral populations that are independent samples from the same source. If they are not fully independent, then our estimates of It would be conservative (i.e. over-estimation) since any exchange between populations should reduce the divergence.

Here, we generate two sets of data as shown in Table 1. In the hypothetical Set I, the gene frequency is taken from a normal distribution with the mean of 0.7 and standard deviation of 0.06. The mean is close to the average frequency of the L type in this pandemic. Set I is generated to show the possible pattern of population divergence if I0 is in the hundreds.

Table 1.

Two computationally generated datasets for the frequency of the L lineage.

Region A1a A2 A3 B1 C1 C2 C3 D1 D2b D3
I (hypothetical) 0.73 0.70 0.71 0.64 0.75 0.61 0.70 0.74 0.82 0.69
II (realistic) 0.70 0.95 0.40 0.80 1.00 0.99 0.50 0.33 0.95 0.67

aEach letter in the region code indicates a separate continent and the number indicates a country or region; bD2 could have been derived from C1 or C2 due to the inter-continental travel pattern. Its exclusion from the analysis would increase the spread of Fst in Figs 24 and would thus lead to a smaller I0 estimate.

In Set II, we assign gene frequencies to 10 populations in Table 1 using the reported frequencies as a guide (GISAID (https://www.gisaid.org/); see Supplement). These 10 populations, distributed among four continents, are as likely to be independently derived as we could ascertain. The choice is based on three criteria: (i) the geography of the countries/regions and the distance between them; (ii) the timing of the documented onset of the epidemic; (iii) the abundance of DNA sequences. We consult the frequencies in samples collected before late March 2020, corresponding roughly to G8 in Fig. 1. Due to the rapidly changing data reporting (GISAID [26]), the frequency profile of Set II is plausibly realistic as reported in mid-April (see Supplement). Readers with access to the more up-to-date data can compare the new observations with the theoretical results to improve the estimation of It.

Comparisons between simulations and data

In Fig. 2, the Fst distributions based on the datasets of Table 1 are presented. The two very different distributions would be informative when compared with the simulated distributions.

Figure 2.

Figure 2.

The pairwise Fst distribution from the two datasets of Table 1. Fst is calculated by Equation (5). Given 10 regions, there are 45 (= 10 × 9/2) pairwise comparisons.

Given the large number of possible combinations of parameters, (I0, T, X0, s), the task could have been daunting. Fortunately, all but one parameter would have little impact on the results. In Fig. 3A–D, only X0 varies and the Fst distributions are very similar. The simplest explanation is that small populations would all deviate from the initial frequency, X0, regardless of where it is. Figure 3E–H shows that the results do not depend strongly on T either, T being the time duration of traveler input. Since, at T = 2, the number of infected individuals would have swelled by 20-fold without any new input, the results are not significantly affected by later inputs.

Figure 3.

Figure 3.

The Fst distribution at G20. For each parameter set of (I0, T, X0, s), we repeat the simulations 100 times. For all panels, I0 = 10, s = 0. Panel A–D, T = 20 and X0 ranges from 0.1 to 0.7. Panel E–H, X0 = 0.7, T ranges from 0 to 20. These eight panels show that neither X0 nor T would impact the Fst distribution much.

In Fig. 4, the value of I0 is varied from 10, 50 to 100. Here, we assume no travelers arriving after the first generation because their contributions, as shown in Fig. 3E–H, are insubstantial. The left panels (Fig. 4A, C and E) show the trajectories of 100 populations, which diverge in the first generation or two and then evolve steadily as the population size increases. On the right panels are the comparisons between the simulated and ‘observed’ distributions; the latter being those based on Dataset II (Fig. 4B and D) or Dataset I (Fig. 4F). It is clear that the expected distribution of Fst is very sensitive to It (I0, in particular). The more realistic Dataset II can only be explained by I0 ≤ 10 whereas the artificial Dataset I would agree with I0 ≥ 100. We should note that the extensive survey of parameter space (see Supplement) shows the conclusion of I0 ≤ 10 to be robust.

Figure 4.

Figure 4.

Comparisons between data and simulations with various I0 values. For all panels, s = 0 (no selection), X0 = 0.7 and T = 0. The value of I0 is shown next to each panel. Panel (A, C and E): the frequency of L type over time (100 repeats), the average is showed by the orange dotted line. Panel (B and D): Fst distribution; the simulation results are in black and the distributions from Dataset II (realistic data) are in orange. Panel (F): like panels B and D but Dataset I is used.

In the results presented above, L and S lineages are assumed neutral. Intuitively, selection should drive the trajectories to converge and the left panels (Fig. 5A, C and E) do show that trend. If we let the L type enjoy a 10% selective advantage, the results of Fig. 5 still indicate I0 < 10. Note that the range of I0 spans a smaller range in Fig. 5 than that in Fig. 4. In other words, with selection, the estimated I0 should be even smaller than indicated above.

Figure 5.

Figure 5.

Same as Fig. 4 except that (i) s = 0.1 with selection favoring the L type; (ii) X0 = 0.5 so the L type would not reach fixation so quickly.

The simulation results are based on the distribution of k that follows the power law (see Equations 6 and 7) with V(k) ∼ 10E(k). Such a large V(k) means that the genetic drift would be very large, requiring large It’s to reduce the drift. It is hence interesting that, even under such stringent conditions, I0 is still <10. In the Supplement, we show that the estimated value of I0 would be substantially lower if we use the Poisson distribution of k, associated with the conventional Wright-Fisher model. With V(k) = E(K), I0 would be 2–4. Hence, the conclusion presented in this section is robust.

Inference of parameters (I0, T, X0, s)

In the last section, we present a range of parameter values that yield the expected population divergence for a comparison with the data. To corroborate the visual comparisons, we also carry out formal inferences using the ABC (Approximate Bayesian Computation) procedure on Dataset II (see Supplement). The results of Fig. S3 and S4 indeed confirm the visual impression as the observed divergence is sensitive only to I0, but not to the variation in T, X0 and s. The formal inference of I0 at 2–5 is even smaller than the visual impression would suggest. Finally, this analysis considers only the number of infections that can spread the virus further. A more extensive model that incorporates the development of symptoms, the border control and the local quarantine, all of which may contribute to the suppression of the epidemics, will be presented later (Ruan et al., personal communication).

DISCUSSION

In the theory of genetic drift [5], even 100 infected travelers from a source viral population would give rise to a fairly uniform level of genetic polymorphism among bordered regions. In contrast, the reported data indicate substantial divergence among countries (GISAID (https://www.gisaid.org/); see Supplement). Dataset II of Table 1 is realistic in this respect. The divergent polymorphisms across countries depend mainly on a critical parameter—the size of the first cohort arriving in a country, I0, which is estimated to be <10. The number may in fact be smaller than it seems since a long distance flight carrying one single infected but symptomless patient could infect this many people, all of whom would be without symptoms upon arrival [27].

On the robustness of the estimation

In Figs 3 and 5, we show that, despite the complexity of the model with many parameters, none of them (T, X0, s), except I0, plays a significant role in the divergence among viral populations. As discussed in conjunction with Fig. 3, the distribution of k does not matter either. In fact, in the standard Wright-Fisher model, the estimate of I0 would be <5. It is also noted that the E(k) and V(k) values used are for a generation time of 4 days. For a shorter generation time, the values would be correspondingly smaller and the results should be similar.

The model also assumes that each population is an independent sample of the source population. Since all populations are likely to exchange some individuals due to traveling, the actual divergence among populations would be even smaller than simulated. In other words, to attain the observed level of divergence, I0 would have to be even smaller than estimated. Considering all these variables, we believe the conclusion of I0 < 10 to be robust. In the analysis of regional divergence, the results would depend strongly on the smaller I0 of the two regions being compared. Hence, the assumption of the same I0 among all regions would be a reasonable one.

Subsequent viral evolution after arriving in a new continent

While our focus is on the divergence in the first few generations, we now briefly discuss the subsequent evolution after this initial critical period. The primary lineage delineation, the S/L polymorphism defined by two SNPs [11], has many subtypes (see Supplementary data and Table S1 for details). For example, western European countries including Italy, Switzerland, Germany and Belgium are predominantly of the L type with a similar abundance in the L2 subtype. In contrast, while Japan is also predominantly of the L type, it has mainly the L1 subtype. This contrast suggests that Japan may represent an independent sample from the western European samples, which have likely been spreading regionally after the initial seeding. Another example is the S1 and S2 subtypes, which differentiate between the samples from China and the west coast of the US.

These patterns suggest that, after the initial seeding, each major region or continent has been evolving along an independent path. Since the initial seeding may be extremely difficult to prevent, the onus is to suppress the regional spread. The analyses of the subtypes in Asia, Australia and various parts of North America would offer additional details of the spread of the virus, as has been done recently [28–31]. These details are beyond the scope of this study, which focuses on the early stages of the viral spread.

Implications

The analysis suggests that the COVID-19 epidemic in each region surveyed was likely started by a very small number of travelers (I0 < 10).  With such a tiny trickle of human movement, it would have been very inefficient for any region to prevent infected individuals from exporting an epidemic to (or importing it from) other places. For that reason, the crucial stage of repressing an epidemic in any region should be the very first sign of local contagion.

Finally, due to the ‘portability’ of COVID-19, each epidemic, including the first one on record, could have easily been imported. Where then did all these epidemics begin? While the interest in the ‘origin’ is intense, we suggest the question be broadened as ‘the origin and early evolution’ of SARS-CoV-2. The latter implies a process whereas the former seems to mean a single time point. The process of early evolution may have stretched over different regions in a long time-span and involved multiple host species. Like many other evolutionary questions on origin, we suggest the question be phrased as the early evolution of SARS-CoV-2, rather than be about the ‘origin’. The former implies a process whereas the latter seems to mean a single time point. This distinction is important as seen in the debates on the ‘origin’ of dogs [32,33] and new species in novel environments [34]. By compressing a process into a simple ‘origin’, we may be asking a false question about, say, ‘the first dog’ or ‘the first patient’. The possible early evolution of SARS-CoV-2 is addressed in the companion study (Ruan et al., personal communication).

Acknowledgements

We thank all those who have contributed sequences to the GISAID (Global Initiative on Sharing All Influenza Data) database (https://www.gisaid.org/). Acknowledgements and detailed information of the genome sequences we consulted for this study are given in Table S2. The actual data fitted to our model were computationally generated (see the main text). We thank Drs Yaping Zhang, Suhua Shi, Weiwei Zhai, Jianzhi George Zhang, Bingjie Chen, Zheng Hu and Jindong Zhao for comments on and suggestions for this manuscript.

Contributor Information

Yongsen Ruan, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China.

Zhida Luo, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China.

Xiaolu Tang, State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China.

Guanghao Li, CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

Haijun Wen, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China.

Xionglei He, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China.

Xuemei Lu, State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China.

Jian Lu, State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences, Peking University, Beijing 100871, China.

Chung-I Wu, State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China.

SUPPLEMENTARY DATA

Supplementary data are available at NSR online.

FUNDING

This work was supported by grants from the Ministry of Science and Technology of the People's Republic of China (No. 2020YFC0847000) to H.W., X.L. and J.L. This work was funded by the National Natural Science Foundation of China (31730046, 91731000 to C.I.W), Guangdong Basic and Applied Basic Research Foundation (2020B1515020030, 2019A1515010708), and CAS ‘Light of West China’ Program.

Conflict of interest statement. None declared.

REFERENCES

  • 1. Smith DL, Battle KE, Hay SIet al. . Ross, Macdonald, and a theory for the dynamics and control of mosquito-transmitted pathogens. PLoS Pathog 2012; 8: e1002588. doi: 10.1371/journal.ppat.1002588 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. Philadelphia: Lippincott Williams & Wilkins, 2008. [Google Scholar]
  • 3. Woolhouse ME. Population biology of emerging and re-emerging pathogens. Trends Microbiol 2002; 10: S3–7. doi: 10.1016/S0966-842X(02)02428-9 [DOI] [PubMed] [Google Scholar]
  • 4. Rambaut A, Posada D, Crandall KAet al. . The causes and consequences of HIV evolution. Nat Rev Genet 2004; 5: 52–61. doi: 10.1038/nrg1246 [DOI] [PubMed] [Google Scholar]
  • 5. Chen Y, Tong D, Wu CI. A new formulation of random genetic drift and its application to the evolution of cell populations. Mol Biol Evol 2017; 34: 2057–64. doi: 10.1093/molbev/msx161 [DOI] [PubMed] [Google Scholar]
  • 6. Hethcote HW. The mathematics of infectious diseases. Siam Review 2000; 42: 599–653. doi: 10.1137/S0036144500371907 [Google Scholar]
  • 7. Zhao S, Chen H. Modeling the epidemic dynamics and control of COVID-19 outbreak in China. Quant Biol 2020; 8: 11–9. doi:10.1007/s40484-020-0199-0 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8. van den Driessche P. Reproduction numbers of infectious disease models. Infect Dis Model 2017; 2: 288–303. doi:10.1016/j.idm.2017.06.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9. Heffernan JM, Smith RJ, Wahl LM. Perspectives on the basic reproductive ratio. J R Soc Interface 2005; 2: 281–93. doi: 10.1098/rsif.2005.0042 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10. Poo M-m, Wu C-I. Moral imperative for immediate release of 2019-nCoV sequence data. Natl Sci Rev 2020; 7: 719–20. doi: 10.1093/nsr/nwaa030 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Tang X, Wu C, Li Xet al. . On the origin and continuing evolution of SARS-CoV-2. Natl Sci Rev 2020; 7: 1012–23. doi: 10.1093/nsr/nwaa036 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12. Kimura M, Ohta T. The average number of generations until fixation of a mutant gene in a finite population. Genetics 1969; 61: 763–71. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13. Kimura M. The Neutral Theory of Molecular Evolution. Cambridge: Cambridge University Press, 1983. [Google Scholar]
  • 14. Crow JF, Kimura M. An Introduction to Population Genetics Theory. New York, Evanston and London: Harper & Row Publishers, 1970. [Google Scholar]
  • 15. Li W-H. Molecular Evolution. New York; Basingstoke: W.H. Freeman; Palgrave; (distributor), 2007. [Google Scholar]
  • 16. Moran PAP. Random processes in genetics. Math Proc Cambridge Philos Soc 1958; 54: 60–71. doi: 10.1017/S0305004100033193 [Google Scholar]
  • 17. Wolfel R, Corman VM, Guggemos Wet al. . Virological assessment of hospitalized patients with COVID-2019. Nature 2020; 581: 465–9. doi: 10.1038/s41586-020-2196-x [DOI] [PubMed] [Google Scholar]
  • 18. Hartl DL, Clark AG. Principles of Population Genetics. Massachusetts: Sinauer Associates, 1997. [Google Scholar]
  • 19. Muchnik L, Pei S, Parra LCet al. . Origins of power-law degree distribution in the heterogeneity of human activity in social networks. Sci Rep 2013; 3: 1783. doi: 10.1038/srep01783 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Newman M. Networks: An Introduction. Oxford: OUP, 2010. [Google Scholar]
  • 21. Zipf GK. Human Behavior and the Principle of Least Effort. Oxford: Addison-Wesley Press, 1949. [Google Scholar]
  • 22. Liu Y, Gayle AA, Wilder-Smith Aet al. . The reproductive number of COVID-19 is higher compared to SARS coronavirus. J Travel Med 2020; 27: taaa021. doi:10.1093/jtm/taaa021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23. Ferretti L, Wymant C, Kendall Met al. . Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 2020; 368: eabb6936. doi:10.1126/science.abb6936 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24. Kucharski AJ, Russell TW, Diamond Cet al. . Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infect Dis 2020; 20: 553–8. doi:10.1016/S1473-3099(20)30144-4 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Tindale L, Coombe M, Stockdale JEet al. . Transmission interval estimates suggest pre-symptomatic spread of COVID-19. eLife 2020; 9: e57149. doi:10.7554/eLife.5714932568070 [Google Scholar]
  • 26. Shu Y, McCauley J. GISAID: global initiative on sharing all influenza data - from vision to reality. Euro Surveill 2017; 22: 30494. doi:10.2807/1560-7917.ES.2017.22.13.30494 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27. Li R, Pei S, Chen Bet al. . Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 2020; 368: 489–93. doi: 10.1126/science.abb3221 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 28. Rambaut A, Holmes EC, O'Toole Aet al. . A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat Microbiol 2020; 5: 1403–7. doi:10.1038/s41564-020-0770-5 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29. Bai Y, Jiang D, Lon JRet al. . Evolution and molecular characteristics of SARS-CoV-2 genome. bioRxiv 2020; doi: 10.1101/2020.04.24.058933. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30. Fauver JR, Petrone ME, Hodcroft EBet al. . Coast-to-coast spread of SARS-CoV-2 during the early epidemic in the United States. Cell 2020; 181: 990–6. doi: 10.1016/j.cell.2020.04.021 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31. Gonzalez-Reiche AS, Hernandez MM, Sullivan MJet al. . Introductions and early spread of SARS-CoV-2 in the New York City area. Science 2020; 369: 297–301. doi: 10.1126/science.abc1917 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32. Wang GD, Shao XJ, Bai Bet al. . Structural variation during dog domestication: insights from gray wolf and dhole genomes. Natl Sci Rev 2019; 6: 110–22. doi: 10.1093/nsr/nwy076 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33. Wei FW. Structural variation provides novel insights into dog domestication. Natl Sci Rev 2019; 6: 123. doi: 10.1093/nsr/nwy099 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34. He Z, Li X, Yang Met al. . Speciation with gene flow via cycles of isolation and migration: insights from multiple mangrove taxa. Natl Sci Rev 2019; 6: 275–88. doi: 10.1093/nsr/nwy078 [DOI] [PMC free article] [PubMed] [Google Scholar]

Articles from National Science Review are provided here courtesy of Oxford University Press

RESOURCES