Abstract
Evolutionary graph theory investigates how spatial constraints affect processes that model evolutionary selection, e.g. the Moran process. Its principal goals are to find the fixation probability and the conditional distributions of fixation time, and to show how they are affected by different graphs that impose spatial constraints. Fixation probabilities have generated significant attention, but much less is known about the conditional time distributions, even for simple graphs. Those conditional time distributions are difficult to calculate, so we consider a close proxy for them: the number of times the mutant population size changes before absorption. We employ martingales to obtain the conditional characteristic functions (CCFs) of that proxy for the Moran process on the complete bipartite graph. We consider the Moran process on the complete bipartite graph as an absorbing random walk in two dimensions. We then extend Wald’s martingale approach to sequential analysis from one dimension to two. Our expressions for the CCFs are novel, compact and exact, and their parameter dependence is explicit. We show that our CCFs closely approximate those of absorption time. Martingales provide an elegant framework to solve principal problems of evolutionary graph theory. It should be possible to extend our analysis to more complex graphs than we show here.
Keywords: Moran process, stochastic process, birth–death process, evolutionary model, fixation time
1. Introduction
The spread of some novelty in a population can be modelled by stochastic processes [1,2]. For example, stochastic processes have modelled the spread of cancer cells in healthy tissue [3,4], disease in a population [5] and social trends [6]. Birth–death processes are a subset of stochastic processes that model the spread of genetic mutations in a resident population [7–9]. One particularly popular birth–death process is the Moran process, which has generated significant research interest since its introduction over 60 years ago [10–15].
The Moran birth–death process models the evolutionary selection of a novel mutation [10]. Briefly, it considers a population of fixed size [8]. Every individual in the population is either a mutant or a resident. On every time step, we choose one individual to reproduce and another to die. The offspring of the former replaces the latter, so the total population size remains constant. The difference between mutants and residents is that mutants have a different probability of being selected to reproduce than residents. This discrepancy is meant to model ‘fitness’. We repeat this birth–death selection procedure until the entire population comprises either mutants or residents. Our goals are to find the ‘fixation probability’ of the initial mutant population, and the (conditional) distribution of the number of time steps required to reach fixation or extinction [16–18].
Evolutionary graph theory studies the impact of spatial constraints on these fixation probabilities and times [11]. It considers the Moran process constrained by a graph, where the graph nodes represent individuals, and the graph edges dictate which individuals can be replaced by other individuals’ offspring. The complete bipartite graph is a simple example of this concept [13,19–21]. It divides the population of individuals into two groups (figure 1). All individuals in one group are connected only to those in the other group. These connections constrain the Moran process such that the offspring from one group can only replace individuals in the other group. For example, imagine two separate colonies of sponges on the seabed. Say that the sponge larvae are programmed to swim away from their parent before searching the seabed for a suitable location to colonize [22]. If only two suitable locations exist in the immediate area, then offspring from one colony settle only in the other. We can model this simple ecosystem as a Moran process on a complete bipartite graph. We want to investigate how its spatial constraint impacts fixation probabilities and times compared with the original, fully connected Moran process [23].
The fixation probability of the complete bipartite graph is well known [19–21]. Much less is known about its conditional fixation time distributions [24–27]. Most prior results on fixation times were obtained via simulation or Markov chain analysis, or by restricting attention to mean fixation times or to a restricted population size [18,24–27]. Very few general analytical results exist for fixation times on evolutionary graphs [27], even for graphs as simple as the complete bipartite graph. In a previous paper, we showed that we can apply Wald’s martingale [28] to the original, fully connected Moran process if we eliminate time steps where the mutant population size does not change [16]. We can therefore obtain tractable expressions for the full conditional characteristic functions (CCFs) of the number of times that the mutant population size changes before the mutants go extinct or fix, i.e. the number of ‘active steps’ [29]. Perhaps this approach could be extended to find analogous CCFs of the Moran process on more general graphs.
In this paper, we will extend Wald’s martingale to report the CCFs for the Moran process on the complete bipartite graph. We consider the bipartite graph as a two-dimensional random walk, with each dimension representing the mutant population size in one partition. We will show how to obtain fixation probabilities and the CCFs of active steps from a two-dimensional product martingale. Our expressions for the CCFs are novel, elegant, exact, and their parameter dependence is explicit. We will investigate the parameter dependence of our CCFs by evaluating them in different regions of parameter space. We will show that our CCFs of the number of active steps can accurately approximate the CCFs of fixation time. We will establish conditions for that approximation to be particularly accurate. Our analysis demonstrates that martingales are a powerful tool to solve fundamental problems in evolutionary graph theory, often within a few lines of mathematics [16,21,30–32].
2. Results
2.1. Problem statement and notation
For a more detailed introduction to the Moran birth–death process, see [8,11].
Figure 1 is a schematic of the Moran birth–death process on a complete bipartite graph [19,20,33]. All individuals in a population are divided into two partitions of sizes A and B, e.g. A = 4 and B = 2 in figure 1. Individuals are either mutants (red nodes) or residents (blue nodes). The only difference between the species is that mutants are chosen to reproduce with a different probability relative to the residents. This difference is meant to model ‘fitness’ and is parametrized by r [8]. All individuals in one partition are connected only to those in the other partition (black lines, figure 1). These connections constrain the Moran process such that offspring from one partition can only replace individuals in the other partition.
We consider the Moran process on the complete bipartite graph as a two-dimensional random walk, where the mutant population size fluctuates in two partitions. Let $S_{t-1} = [S_{a,t-1}, S_{b,t-1}]$ represent the size of the mutant populations in each partition on time step $t-1$ (e.g. $S_{t-1} = [2, 1]$ in figure 1). Let $X_t = [X_{a,t}, X_{b,t}]$ represent the change in the mutant population size on time step $t$. For example, if a mutant from A is chosen to reproduce and a resident from B to die, then $X_t = [0, 1]$ (enlarged individuals and thick connection, figure 1). We write
$$S_t = S_0 + \sum_{i=1}^{t} X_i,$$
where $S_0 = [S_{a,0}, S_{b,0}]$ is the initial mutant population size. On every time step we make a new observation of $X_t$ and add it to the sum until all individuals are mutants (figure 1b(i)) or residents (figure 1b(ii)).
Our goal is to find the probability that an initial mutant population $S_0$ eventually achieves fixation (i.e. the fixation probability), and how many time steps $T$ are required to do so. Since all connections are undirected, $T$ is almost surely finite [34]. Let $a = [A, B]$ and $b = [0, 0]$ represent the two possible final states of the bipartite graph (figure 1b). The fixation probability is then $\alpha = P(S_T = a)$ and the extinction probability is $1 - \alpha = P(S_T = b)$. We also want to find the conditional distributions $P(T \mid S_T = a)$ and $P(T \mid S_T = b)$ of $T$.
It is very difficult to calculate $P(T \mid S_T = a)$ and $P(T \mid S_T = b)$, even for simpler birth–death processes like the fully connected, one-dimensional Moran process [16–18]. Instead, we consider $C_T$, the number of times that the mutant population size has changed upon absorption. Let $Y_t$ indicate whether the mutant population size changes on time step $t$
$$Y_t = \begin{cases} 1, & X_t \neq [0, 0], \\ 0, & X_t = [0, 0]. \end{cases}$$
Initializing $C_0 = 0$, we write $C_t = C_0 + \sum_{i=1}^{t} Y_i$. Note that $C_T$ depends on $T$, so we interpret them as proxies to each other [16].
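The quantities T and CT can be made concrete with a short simulation. The following is a minimal sketch, not the simulation code linked later in the paper; it assumes the birth–death update described above (a parent chosen with probability proportional to fitness, its offspring replacing a uniformly chosen individual in the other partition). The function name and arguments are illustrative.

```python
import random

def simulate_bipartite_moran(A, B, sa0, sb0, r, rng):
    """One trial of the Moran birth-death process on a complete bipartite graph.

    Returns (fixed, T, C_T): whether the mutants fixed, the number of time
    steps, and the number of steps on which the mutant count changed.
    """
    # 1 marks a mutant, 0 a resident; indices 0..A-1 form partition A.
    state = [1] * sa0 + [0] * (A - sa0) + [1] * sb0 + [0] * (B - sb0)
    mutants, T, C = sa0 + sb0, 0, 0
    while 0 < mutants < A + B:
        # Parent is chosen proportionally to fitness (r for mutants, 1 for residents).
        weights = [r if s else 1.0 for s in state]
        parent = rng.choices(range(A + B), weights=weights)[0]
        # Offspring replaces a uniformly chosen individual in the other partition.
        child = rng.randrange(A, A + B) if parent < A else rng.randrange(0, A)
        change = state[parent] - state[child]   # +1, 0 or -1 mutants
        state[child] = state[parent]
        mutants += change
        T += 1
        C += (change != 0)
    return mutants == A + B, T, C

rng = random.Random(0)
fixed, T, C = simulate_bipartite_moran(4, 2, 2, 1, 1.5, rng)
print(fixed, T, C)
```

Every trial satisfies CT ≤ T, since an active step is by definition a time step on which the mutant count changes.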
We will identify a product martingale that yields α and the full CCFs of CT.
2.2. Fixation probability and times from a two-dimensional martingale
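First, we show how to obtain the fixation probability and CCFs of $C_T$ from a two-dimensional product martingale, assuming that we can find one. Say we find a product martingale of the form
$$E\left[f^{S_{a,t}} g^{S_{b,t}} h^{C_t} \mid S_{t-1}, C_{t-1}\right] = f^{S_{a,t-1}} g^{S_{b,t-1}} h^{C_{t-1}}, \tag{2.1}$$
where $h$ is a free complex variable and $f = f(h)$ and $g = g(h)$ are functions of $h$ that are independent of $S_{t-1}$. We say that $h$, $f(h)$ and $g(h)$ satisfying equation (2.1) define a ‘product martingale’ because its exponentiation turns sums into products
$$f^{S_{a,t}} g^{S_{b,t}} h^{C_t} = f^{S_{a,t-1}} g^{S_{b,t-1}} h^{C_{t-1}} \, f^{X_{a,t}} g^{X_{b,t}} h^{Y_t}.$$
If we can show that $E\left[f^{X_{a,t}} g^{X_{b,t}} h^{Y_t} \mid S_{t-1}\right] = 1$, then equation (2.1) is true and we have a product martingale.
We can immediately calculate the fixation probability and CCFs of $C_T$ from equation (2.1) [28]. Taking the expectation of both sides of equation (2.1)
$$E\left[f^{S_{a,t}} g^{S_{b,t}} h^{C_t}\right] = E\left[f^{S_{a,t-1}} g^{S_{b,t-1}} h^{C_{t-1}}\right].$$
By induction
$$E\left[f^{S_{a,t}} g^{S_{b,t}} h^{C_t}\right] = f^{S_{a,0}} g^{S_{b,0}},$$
assuming that $S_0$ is known (non-random) and $C_0 = 0$. Doob’s optional stopping theorem states that a randomly stopped martingale is also a martingale [35,36]. Inserting a random variable $T$ for $t$
$$E\left[f^{S_{a,T}} g^{S_{b,T}} h^{C_T}\right] = f^{S_{a,0}} g^{S_{b,0}}.$$
Splitting the expectation, conditional on fixation or extinction
$$\alpha \, E\left[f^{S_{a,T}} g^{S_{b,T}} h^{C_T} \mid S_T = a\right] + (1 - \alpha) \, E\left[f^{S_{a,T}} g^{S_{b,T}} h^{C_T} \mid S_T = b\right] = f^{S_{a,0}} g^{S_{b,0}}.$$
Inserting the fixation and extinction boundaries $a = [A, B]$ and $b = [0, 0]$
$$\alpha f^{A} g^{B} \, E\left[h^{C_T} \mid S_T = a\right] + (1 - \alpha) \, E\left[h^{C_T} \mid S_T = b\right] = f^{S_{a,0}} g^{S_{b,0}}. \tag{2.2}$$
We obtain the fixation probability and times from equation (2.2) by inserting special values for the free variable $h$ into it [16]. For the fixation probability, insert $h = 1$ (recall that $f$ and $g$ are functions of $h$)
$$\alpha f^{A} g^{B} + (1 - \alpha) = f^{S_{a,0}} g^{S_{b,0}}.$$
Rearranging for $\alpha$
$$\alpha = \frac{f^{S_{a,0}} g^{S_{b,0}} - 1}{f^{A} g^{B} - 1}.$$
For the CCFs, insert $h = e^{\tau}$ into equation (2.2), where $\tau$ is a purely imaginary free variable
$$\alpha f^{A} g^{B} \, E\left[e^{\tau C_T} \mid S_T = a\right] + (1 - \alpha) \, E\left[e^{\tau C_T} \mid S_T = b\right] = f^{S_{a,0}} g^{S_{b,0}}.$$
We recognize the conditional expectations as the CCFs of $C_T$, $\psi_{C_T|a}(\tau) = E\left[e^{\tau C_T} \mid S_T = a\right]$ and $\psi_{C_T|b}(\tau) = E\left[e^{\tau C_T} \mid S_T = b\right]$.
Assume that there are two pairs of complex functions $(f_1(h), g_1(h))$ and $(f_2(h), g_2(h))$ that satisfy equation (2.1) [28]. Separately inserting those pairs into equation (2.2), we obtain a system of two equations
$$\begin{aligned} \alpha f_1^{A} g_1^{B} \, \psi_{C_T|a}(\tau) + (1 - \alpha) \, \psi_{C_T|b}(\tau) &= f_1^{S_{a,0}} g_1^{S_{b,0}}, \\ \alpha f_2^{A} g_2^{B} \, \psi_{C_T|a}(\tau) + (1 - \alpha) \, \psi_{C_T|b}(\tau) &= f_2^{S_{a,0}} g_2^{S_{b,0}}. \end{aligned} \tag{2.3}$$
We have two equations, so we can solve for both $\psi_{C_T|a}(\tau)$ and $\psi_{C_T|b}(\tau)$.
The key condition that we need to apply this analysis is
$$E\left[f^{X_{a,t}} g^{X_{b,t}} h^{Y_t} \mid S_{t-1}\right] = 1 \tag{2.4}$$
for two pairs of state-independent $f(h)$ and $g(h)$. We now show that this condition can be met for the Moran process on a complete bipartite graph.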
2.3. A two-dimensional martingale for the complete bipartite graph
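For compact notation, let $F_{t-1}$ represent the total fitness of the bipartite graph on time step $t-1$: $F_{t-1} = rS_{a,t-1} + A - S_{a,t-1} + rS_{b,t-1} + B - S_{b,t-1}$. We use shorthand for the graph’s transition probabilities
$$p_{X_a^{\pm}} = P(X_t = [\pm 1, 0] \mid S_{t-1}), \qquad p_{X_b^{\pm}} = P(X_t = [0, \pm 1] \mid S_{t-1}), \qquad p_{X^0} = P(X_t = [0, 0] \mid S_{t-1}).$$
These transition probabilities are
$$p_{X_a^{+}} = \frac{r S_{b,t-1}}{F_{t-1}} \frac{A - S_{a,t-1}}{A}, \qquad p_{X_a^{-}} = \frac{B - S_{b,t-1}}{F_{t-1}} \frac{S_{a,t-1}}{A},$$
$$p_{X_b^{+}} = \frac{r S_{a,t-1}}{F_{t-1}} \frac{B - S_{b,t-1}}{B}, \qquad p_{X_b^{-}} = \frac{A - S_{a,t-1}}{F_{t-1}} \frac{S_{b,t-1}}{B},$$
and $p_{X^0} = 1 - p_{X_a^{+}} - p_{X_a^{-}} - p_{X_b^{+}} - p_{X_b^{-}}$.
We want to find state-independent $f(h)$ and $g(h)$ such that equation (2.4) is true. Writing the expectation
$$p_{X^0} + h \left( f \, p_{X_a^{+}} + f^{-1} p_{X_a^{-}} + g \, p_{X_b^{+}} + g^{-1} p_{X_b^{-}} \right) = 1.$$
Inserting $p_{X^0}$ and rearranging
$$(hf - 1) \, p_{X_a^{+}} + (hg^{-1} - 1) \, p_{X_b^{-}} + (hg - 1) \, p_{X_b^{+}} + (hf^{-1} - 1) \, p_{X_a^{-}} = 0. \tag{2.5}$$
Equation (2.5) is true if the following two equations are true
$$(hf - 1) \, p_{X_a^{+}} + (hg^{-1} - 1) \, p_{X_b^{-}} = 0, \qquad (hg - 1) \, p_{X_b^{+}} + (hf^{-1} - 1) \, p_{X_a^{-}} = 0.$$
We split equation (2.5) this way because when we insert our expressions for transition probabilities, all state dependence cancels
$$rB(hf - 1) + A(hg^{-1} - 1) = 0, \qquad rA(hg - 1) + B(hf^{-1} - 1) = 0.$$
With two equations, we can solve for $f$ and $g$ as functions of $h$. Rearranging the right equation for $g$
$$g = \frac{rA + B}{rAh} - \frac{B}{rAf}.$$
Substituting $g$ in the left equation and rearranging, we see that $f$ is the solution to a quadratic equation
$$rBh(rA + B) \, f^2 + \left[ rh^2 (A^2 - B^2) - (A + rB)(rA + B) \right] f + Bh(A + rB) = 0.$$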
There are two pairs of complex functions (f1(h), g1(h)) and (f2(h), g2(h)) that satisfy equation (2.4) for the Moran process on the complete bipartite graph. In particular, f1 and f2 are given by the quadratic formula (one corresponds to the plus sign and the other to the minus sign in the quadratic formula). Then g1 and g2 are linearly related to the inverse of those two solutions.
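The construction can be checked numerically. The sketch below is a hedged illustration rather than the authors' code: it assumes the birth–death update in which a parent is chosen proportionally to fitness and its offspring replaces a uniformly chosen individual in the other partition, and the quadratic coefficients in `martingale_pairs` come from eliminating g between the two state-independent equations under that assumption. All function and variable names are illustrative.

```python
import cmath

def martingale_pairs(A, B, r, h):
    """Return [(f1, g1), (f2, g2)] for a given complex h (assumed derivation)."""
    qa = r * B * h * (r * A + B)
    qb = r * h**2 * (A**2 - B**2) - (A + r * B) * (r * A + B)
    qc = B * h * (A + r * B)
    root = cmath.sqrt(qb * qb - 4 * qa * qc)
    pairs = []
    for sign in (1, -1):
        f = (-qb + sign * root) / (2 * qa)               # quadratic formula
        g = (r * A + B) / (r * A * h) - B / (r * A * f)  # linear in 1/f
        pairs.append((f, g))
    return pairs

def one_step_expectation(A, B, sa, sb, r, f, g, h):
    """E[f^Xa * g^Xb * h^Y | S = (sa, sb)] under the assumed update rule."""
    F = r * sa + (A - sa) + r * sb + (B - sb)   # total fitness
    pa_up = r * sb * (A - sa) / (F * A)         # mutant from B replaces resident in A
    pa_dn = (B - sb) * sa / (F * A)             # resident from B replaces mutant in A
    pb_up = r * sa * (B - sb) / (F * B)         # mutant from A replaces resident in B
    pb_dn = (A - sa) * sb / (F * B)             # resident from A replaces mutant in B
    p_stay = 1 - pa_up - pa_dn - pb_up - pb_dn
    return p_stay + h * (f * pa_up + pa_dn / f + g * pb_up + pb_dn / g)

h = cmath.exp(0.3j)
for f, g in martingale_pairs(10, 4, 1.5, h):
    worst = max(abs(one_step_expectation(10, 4, sa, sb, 1.5, f, g, h) - 1)
                for sa in range(11) for sb in range(5))
    print(f"f={f:.4f}, g={g:.4f}, max |E - 1| over all states = {worst:.1e}")
```

If the pairs are correct, the one-step expectation equals 1 in every state up to floating-point error, which is the state independence that the martingale condition requires.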
Figure 2 plots $f_1$, $f_2$ (a,c), $g_1$ and $g_2$ (b,d) as functions of $\tau$, where $h = e^{\tau}$. We plot these functions for r = 0.5 (a,b) and r = 1.5 (c,d). The real (red traces) and imaginary (black traces) parts of each function are plotted separately. Note that the real and imaginary parts of all functions are even and odd about τ = 0, respectively. Characteristic functions also have this property. Each panel in figure 2 shows that there are two functions $f_1$ (solid traces, a,c) and $f_2$ (dashed traces, a,c), and $g_1$ (solid traces, b,d) and $g_2$ (dashed traces, b,d) that satisfy equation (2.4).
When τ = 0, one of those two functions passes through the point (1, 0i) (pink and grey dots, figure 2). This observation reflects the fact that, when h = 1 (or τ = 0), the solution f = 1 and g = 1 to equation (2.4) is trivial. The other function passes through a non-trivial point (f0, 0i) or (g0, 0i) when τ = 0 (red and black dots, figure 2). These points are what we use to evaluate the fixation probability.
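Setting h = 1 (or τ = 0) and solving for f and g above, we find those non-trivial points:
$$f_0 = \frac{A + rB}{r(rA + B)}, \qquad g_0 = \frac{B + rA}{r(rB + A)}.$$
The fixation probability of the bipartite graph is then
$$\alpha = \frac{f_0^{S_{a,0}} g_0^{S_{b,0}} - 1}{f_0^{A} g_0^{B} - 1},$$
which is consistent with previous results [20,21,33].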
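As a hedged illustration, the fixation probability can be evaluated directly from the non-trivial point (f0, g0). The closed forms used for f0 and g0 below are derived under the assumed birth–death update (parent chosen proportionally to fitness, offspring placed uniformly in the other partition) and are degenerate at r = 1, where f0 = g0 = 1. The function name is illustrative.

```python
def fixation_probability(A, B, sa0, sb0, r):
    """Fixation probability from the non-trivial h = 1 solution; assumes r != 1."""
    f0 = (A + r * B) / (r * (r * A + B))
    g0 = (B + r * A) / (r * (r * B + A))
    return (f0**sa0 * g0**sb0 - 1) / (f0**A * g0**B - 1)

# Parameter values used for figure 3 of the text, with r = 1.5 chosen for illustration.
print(fixation_probability(10, 4, 1, 1, 1.5))

# Isothermal check: with A = B the value matches the fully connected Moran
# process with N = A + B individuals and i = sa0 + sb0 initial mutants.
r, N, i = 2.0, 6, 2
print(fixation_probability(3, 3, 1, 1, r), (1 - r**-i) / (1 - r**-N))
```

The second print compares the bipartite expression with the classical Moran formula (1 − r^−i)/(1 − r^−N), which it should reproduce when the partitions are equal.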
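We obtain the CCFs of $C_T$ by rearranging equation (2.3)
$$\psi_{C_T|a}(\tau) = \frac{f_1^{S_{a,0}} g_1^{S_{b,0}} - f_2^{S_{a,0}} g_2^{S_{b,0}}}{\alpha \left( f_1^{A} g_1^{B} - f_2^{A} g_2^{B} \right)}, \qquad \psi_{C_T|b}(\tau) = \frac{f_1^{A} g_1^{B} f_2^{S_{a,0}} g_2^{S_{b,0}} - f_2^{A} g_2^{B} f_1^{S_{a,0}} g_1^{S_{b,0}}}{(1 - \alpha) \left( f_1^{A} g_1^{B} - f_2^{A} g_2^{B} \right)},$$
and inserting our expressions for $f_1$, $f_2$, $g_1$ and $g_2$.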
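The two CCFs can be evaluated numerically by solving the 2 × 2 system for a given point on the unit circle. This is a minimal sketch under stated assumptions: it takes h = exp(iθ) with θ real, uses transition probabilities derived from the standard birth–death update, and requires r ≠ 1 and a transient starting state. The name `ccfs_active_steps` is hypothetical.

```python
import cmath

def ccfs_active_steps(A, B, sa0, sb0, r, theta):
    """Return (psi_a, psi_b): CCFs of C_T given fixation and extinction."""
    h = cmath.exp(1j * theta)
    # Fixation probability from the non-trivial solution at h = 1.
    f0 = (A + r * B) / (r * (r * A + B))
    g0 = (B + r * A) / (r * (r * B + A))
    alpha = (f0**sa0 * g0**sb0 - 1) / (f0**A * g0**B - 1)
    # Two roots of the quadratic in f, each with its matching g.
    qa = r * B * h * (r * A + B)
    qb = r * h**2 * (A**2 - B**2) - (A + r * B) * (r * A + B)
    qc = B * h * (A + r * B)
    root = cmath.sqrt(qb * qb - 4 * qa * qc)
    P, Q = [], []
    for sign in (1, -1):
        f = (-qb + sign * root) / (2 * qa)
        g = (r * A + B) / (r * A * h) - B / (r * A * f)
        P.append(f**A * g**B)          # martingale factor on the fixation boundary
        Q.append(f**sa0 * g**sb0)      # martingale factor at the starting state
    psi_a = (Q[0] - Q[1]) / (alpha * (P[0] - P[1]))
    psi_b = (P[0] * Q[1] - P[1] * Q[0]) / ((1 - alpha) * (P[0] - P[1]))
    return psi_a, psi_b

# Smallest possible graph: one mutant facing one resident absorbs after exactly
# one active step, so both CCFs should equal exp(i * theta).
theta = 0.7
print(ccfs_active_steps(1, 1, 1, 0, 2.0, theta), cmath.exp(1j * theta))
```

The two-node case is a convenient sanity check because CT = 1 with certainty there, whichever boundary absorbs the process.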
In the special case A = B (i.e. the graph is isothermal), those functions have a simple form
$$f_{1,2} = g_{1,2} = \frac{1 + r \pm \sqrt{(1 + r)^2 - 4rh^2}}{2rh}.$$
Since $f_1 = g_1$ and $f_2 = g_2$, our two-dimensional product martingale reduces to one dimension. Furthermore, $f_1$ and $f_2$ are equivalent to functions derived from a one-dimensional product martingale applied to the fully connected Moran process (cf. eqn. 2.7 in [16]). When the bipartite graph is isothermal, its CCFs of $C_T$ are equivalent to those of the fully connected Moran process [16].
2.4. Parameter dependence of the CCFs
The parameter dependence of $\psi_{C_T|b}(\tau)$ and $\psi_{C_T|a}(\tau)$ is explicit, so we can investigate it by simply evaluating them in different regions of parameter space.
2.4.1. Strong selection expedites extinction and fixation
Figure 3 plots $\psi_{C_T|b}(\tau)$ (a,c) and $\psi_{C_T|a}(\tau)$ (b,d) for a complete bipartite graph with two values of r (a,b and c,d). Our parameter values were A = 10, B = 4, $S_{a,0} = 1$ and $S_{b,0} = 1$. We plot the real (pink) and imaginary (grey) parts of the CCFs separately. Note that the real and imaginary parts of the CCFs are even and odd, and they pass through 1 and 0 at τ = 0, respectively.
Figure 3 compares $\psi_{C_T|b}(\tau)$ and $\psi_{C_T|a}(\tau)$ (solid traces) with simulation results from 100 000 trials of the Moran process (dashed traces). On each trial, we counted how many times the mutant population size changed, and whether it fixed or went extinct. We then applied the Fourier transform to that simulation data, and again plot its real and imaginary parts separately (dashed red and black traces, figure 3). We also compared our expression for α with the percentage of simulations where the mutants fixed (lower-right numbers, c,d). Our simulation code is available online at https://github.com/travismonk/bipartite. Simulation results match our theory extremely closely because our analysis is exact, and we ran sufficiently many simulations to converge to that solution.
Figure 3 also reports the conditional first two moments of $C_T|b$ and $C_T|a$. The black and red numbers in the left panels report $E[C_T \mid b]$ and $E[C_T^2 \mid b]$, respectively. They report $E[C_T \mid a]$ and $E[C_T^2 \mid a]$ in the right panels. Those moments are visualized by the black line and red parabola in each panel of figure 3. Analytical expressions for the conditional kth moment are found by evaluating derivatives of the CCFs
$$E\left[C_T^k \mid a\right] = \left. \frac{d^k \psi_{C_T|a}(\tau)}{d\tau^k} \right|_{\tau = 0}, \qquad E\left[C_T^k \mid b\right] = \left. \frac{d^k \psi_{C_T|b}(\tau)}{d\tau^k} \right|_{\tau = 0}.$$
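One way to turn simulated counts into a curve comparable with an analytical CCF is the empirical characteristic function, i.e. the sample mean of exp(iθc) over the recorded counts. The sketch below assumes that this is what is meant by applying the Fourier transform to the simulation data; the function name and the sample values are made up for illustration.

```python
import cmath

def empirical_ccf(samples, theta):
    """Empirical characteristic function: sample mean of exp(i * theta * c)."""
    return sum(cmath.exp(1j * theta * c) for c in samples) / len(samples)

# Hypothetical active-step counts recorded on trials that ended in extinction.
counts_given_extinction = [2, 2, 4, 2, 6, 4, 2, 8]
for theta in (0.0, 0.25, 0.5):
    value = empirical_ccf(counts_given_extinction, theta)
    print(theta, round(value.real, 4), round(value.imag, 4))
```

Counts from trials ending in fixation and in extinction are kept in separate lists so that each empirical curve estimates one conditional characteristic function.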
Those analytical expressions are not compact, so we omit them here. But it is easy to estimate at least the first few moments by visually inspecting the CCFs (e.g. black lines and red parabolas, figure 3). We can investigate how those moments depend on parameters by visually comparing CCFs that we calculate in different regions of parameter space.
For example, figure 3 shows that the first and second moments of CT decrease as selection increases (cf. a,b and c,d). When r is large, the drift of the mutant population size is more positive. When the drift is strongly positive, St is unlikely to increase far from S0 and then go extinct. Such paths to extinction require larger CT on average because they traverse more states of the graph. So it is less likely that a path with large CT will go extinct, because many of those paths are very unlikely to be observed. Therefore, when r is large, extinctions usually happen quickly. By the same logic, when r is large, it is unlikely for St to decrease substantially before reaching a. Those paths to fixation also require larger CT on average, so fixation usually happens quickly once the initial mutants gain a foothold on the graph. These arguments are no longer true when r = 1. When selection is neutral, it is more likely for St to drift higher from S0 and then go extinct, or drift lower and then fix. Since these longer paths to fixation or extinction are more probable, the conditional first and second moments of CT increase (cf. black and red numbers, figure 3).
2.4.2. Increasing the population size delays fixation more than extinction
Figure 4 illustrates how $\psi_{C_T|b}(\tau)$ (a) and $\psi_{C_T|a}(\tau)$ (b) depend on the total population size of the complete bipartite graph (i.e. A + B). Again, we plot the real (red/pink traces) and imaginary (black/grey traces) parts of the CCFs separately. We fixed r = 1.01, $S_{a,0} = 1$ and $S_{b,0} = 1$, and calculated CCFs of $C_T$ for three pairs of values for A and B. In all plots, we constrained the population sizes A and B to have the same ratio A/B = 3 (figure 4).
Figure 4 shows that $\psi_{C_T|a}(\tau)$ is more sensitive to population size than $\psi_{C_T|b}(\tau)$. $\psi_{C_T|a}(\tau)$ concentrates in a smaller neighbourhood about the origin as population size grows. That concentration dramatically affects its second moment (red numbers, b), as illustrated by the dashed red or pink parabolas. For these parameter values, that second moment increases by over an order of magnitude when the population size doubles. $\psi_{C_T|b}(\tau)$ is also dependent on a, but not as dramatically as $\psi_{C_T|a}(\tau)$. The mean of $C_T$ to extinction (black numbers, a) doubles as population size doubles, as visualized by the slopes of the dashed lines. Higher-order moments of $C_T|b$ do not appear to be sensitive to changes in a.
These results are qualitatively similar to previous results we found for the fully connected Moran process [16]. As a moves farther from $S_0$, $E[C_T \mid a]$ increases because $S_t$ must traverse a larger number of states, which implies a larger number of population size changes. But $E[C_T \mid b]$ increases as well because longer paths to extinction become possible. For example, if a = (9, 3), then the Moran process cannot visit the state (9, 3) and then go extinct because it is already fixed. If we increase a, then that long path to extinction becomes possible. $E[C_T^2 \mid a]$ and $E[C_T^2 \mid b]$ increase as a increases for the same reason. As the distance between b and a increases, we can observe longer paths to absorption. Summing the square of those longer path lengths can result in a significantly higher second moment of $C_T$. This observation is especially true when the probabilities of observing those longer paths are non-negligible, i.e. when selection is weak.
2.4.3. Increasing the starting state delays extinction more than fixation
Figure 5 is analogous to figure 4, except we altered the starting state $S_0$ instead of a. We fixed r = 2, A = 3, B = 10, and varied $S_0$ as indicated in the legend. Figure 5 shows that $\psi_{C_T|b}(\tau)$ is more sensitive to changes in $S_0$ than $\psi_{C_T|a}(\tau)$ when $S_0$ is closer to b than a. For example, increasing $S_{a,0}$ and $S_{b,0}$ by one more than doubles $E[C_T \mid b]$ (black numbers, dashed black/grey lines, left panel). By contrast, $E[C_T \mid a]$ and $E[C_T^2 \mid a]$ decrease slightly as $S_0$ increases.
These results are sensible. As we increase the distance between $S_0$ and b, we expect $C_T|b$ to increase because there are more states for $S_t$ to traverse before extinction. Since increasing the distance between $S_0$ and b necessarily decreases the distance between $S_0$ and a, we expect $C_T|a$ to decrease for the converse reason. However, increasing $S_0$ also makes longer paths to fixation possible. For example, if $S_0 = (S_{a,0}, S_{b,0}) = (1, 0)$, then $S_{a,t}$ cannot decrease by 1 and then fix because the mutants would already be extinct. But if $S_0 = (2, 0)$, then this path to fixation is possible, and it requires a slightly larger number of mutant population size changes on average. These two effects of increasing $S_0$ on $C_T|a$ partially offset each other, so $\psi_{C_T|a}(\tau)$ is relatively insensitive to changes in $S_0$.
2.4.4. Asymmetric partition sizes delay extinction and fixation
Figure 6 is analogous to figures 4 and 5, except we fixed the total population size in all plots. We fixed r = 1.01, $S_{a,0} = 0$ and $S_{b,0} = 1$, and calculated the CCFs for three pairs of values for A and B. In all plots, we constrained A and B to sum to 12, but placed different numbers of those 12 individuals in the partitions (figure 6 legend). By doing so we can investigate how $\psi_{C_T|b}(\tau)$ and $\psi_{C_T|a}(\tau)$ depend on the asymmetry of partition sizes. We can also investigate how this asymmetry impacts the CCFs with respect to an isothermal graph.
Figure 6 shows that asymmetric partition sizes significantly impact both CCFs. It shows that the isothermal graph has the lowest first and second moments of both CT|b and CT|a (lightest pink and grey traces). As the population sizes of the partitions become increasingly asymmetric, the graph requires more population size changes to achieve extinction or fixation. To explain this observation, consider a star graph [11–13,21,25], i.e. a bipartite graph where one partition has only one individual and the other has many individuals (red and black traces, figure 6). On a given time step, an individual from the populous partition is more likely to be selected to reproduce because that partition has more individuals. Then the lonely individual in the other partition will be replaced on most time steps. Therefore, most mutant population size changes are that lonely individual flipping between mutant and resident. So the star graph requires many population size changes for the mutant population size in the populous partition to grow or shrink. This result is consistent with previous computational simulations showing that the fixation time T increases as the asymmetry of partition sizes increases [26].
2.5. Approximating the CCFs of the number of time steps
Figure 7 compares the CCFs of $C_T$ with those of $T$. Figure 7a,c compares $\psi_{C_T|b}(\tau)$ (solid traces) with $\psi_{T|b}(\tau)$ (dashed traces), and figure 7b,d compares $\psi_{C_T|a}(\tau)$ with $\psi_{T|a}(\tau)$. The bottom and top x-axes in figure 7 correspond to the independent variable of the CCFs of $C_T$ and $T$, respectively. Again, the real (pink and red) and imaginary (grey and black) parts of the CCFs are plotted separately. We obtained $\psi_{T|a}(\tau)$ and $\psi_{T|b}(\tau)$ by simulating the Moran process on the complete bipartite graph 200 000 times. Our simulation code is available online at https://github.com/travismonk/bipartite. We stored $T|a$ or $T|b$ after each simulation, depending on whether the Moran process achieved fixation or extinction, and computed their Fourier transforms. In all plots, r = 1.01, $S_{a,0} = 1$ and $S_{b,0} = 0$. The partition sizes were either A = 4 and B = 6 (a,b) or A = 9 and B = 1 (c,d).
Figure 7 shows that $\psi_{C_T|b}(\tau)$ and $\psi_{C_T|a}(\tau)$ approximate $\psi_{T|b}(\tau)$ and $\psi_{T|a}(\tau)$ to within a scaling constant for these parameter values. Equivalently, we can approximate $C_T|b \propto T|b$ and $C_T|a \propto T|a$ when r ≈ 1 and the partition sizes are small. That proportionality approximation is particularly accurate when the process fixes, starting from a small initial mutant population size (b,d). These results are analogous to those obtained for the fully connected Moran process [16]. The sojourn times of the fully connected Moran process do not significantly vary across state space when its population size is small, selection is weak, and the process fixes [16]. Therefore the scaling approximation $C_T|a \propto T|a$ is particularly accurate in this region of parameter space for the Moran process. However, if the Moran process achieves extinction, its sojourn times can vary significantly over state space, and the approximation $C_T|b \propto T|b$ may be inaccurate. Figure 7 shows that these observations hold for the complete bipartite graph when the graph is almost isothermal (a,b). When the partition sizes are highly asymmetric (c,d), both scaling approximations $C_T|a \propto T|a$ and $C_T|b \propto T|b$ are very accurate. This result suggests that the sojourn times of the complete bipartite graph do not vary significantly over state space if its partition sizes are highly asymmetric, e.g. the star graph.
Figure 8 is identical to figure 7, except we set r = 3. In the fully connected Moran process, the proportionality approximation CT|a ∝ T|a loses accuracy as selection departs from r ≈ 1 [16]. Figure 8 suggests that this result also holds for the complete bipartite graph (b,d). Given an appropriate scaling constant, we can accurately estimate the first few moments of T|a from CT|a, but higher-order moments are less accurately approximated than they were in figure 7. Figure 8 also suggests that our proportionality approximation is more accurate when the partition sizes are asymmetric (cf. a,b and c,d), as we observed in figure 7. Again, this observation suggests that the sojourn times of the complete bipartite graph do not appreciably change over state space when the partition sizes are asymmetric.
Figure 9 compares the conditional means $E[C_T \mid b]$ and $E[C_T \mid a]$ (blue squares) with $E[T \mid b]$ and $E[T \mid a]$ (red circles). We set r = 1, 2, 3, 4 and 5, and ran 100 000 simulations of the Moran process on a bipartite graph for each value of r. We calculated the means of those 100 000 simulations conditional on extinction (a,c) or fixation (b,d). We repeated these simulations for two bipartite graphs. The first bipartite graph had partition sizes A = 4 and B = 6 (figure 9a,b), and the second had partition sizes A = 9 and B = 1 (figure 9c,d). Each panel has two y-axes. The left, blue y-axis corresponds to the conditional means of $C_T$, and the right, red y-axis corresponds to the conditional means of $T$.
Figure 9 suggests that the relationship between the conditional means of CT and T is not straightforward. When the mutant population size of the Moran process changes from St−1 to St, it remains in the state St for a geometrically distributed number of time steps before changing again. If that geometric distribution was constant over all transient states of the process, then the conditional means of CT and T would be proportional to each other. The proportionality constant would be that geometric distribution’s mean, and the red and blue markers in figure 9 would overlap perfectly. However, the means of those geometric distributions depend on St. For example, if the mutant population is very close to extinction or fixation, then the probability of the mutant population size changing on a time step is small. On most time steps we will observe a resident replacing a resident or a mutant replacing a mutant. But if the mutant and resident population sizes are equal, then we are more likely to observe a change in the mutant population size on a time step. The state dependence of those geometric means is why the conditional absorption time distributions of the Moran process are so difficult to calculate [16]. Our martingale methodology shows that when we eliminate those state-dependent geometric distributions by focusing on ‘active steps,’ we obtain clean and exact expressions for that quantity’s CCFs.
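The state dependence of those geometric waiting times is easy to see numerically. The following sketch computes the mean sojourn time in a transient state as the reciprocal of the probability that the mutant population size changes on a time step. It assumes the standard birth–death update (parent chosen proportionally to fitness, offspring placed uniformly in the other partition); the function name is illustrative.

```python
def mean_sojourn_time(A, B, sa, sb, r):
    """Mean number of time steps spent in transient state (sa, sb) before the
    mutant population size changes, i.e. the mean of a geometric distribution."""
    F = r * sa + (A - sa) + r * sb + (B - sb)   # total fitness
    # Probability that the mutant count changes in partition B, then in A.
    p_change_b = (r * sa * (B - sb) + (A - sa) * sb) / (F * B)
    p_change_a = (r * sb * (A - sa) + (B - sb) * sa) / (F * A)
    return 1.0 / (p_change_a + p_change_b)

# Neutral selection on the A = 4, B = 6 graph: states near absorption hold the
# process for longer than states with a balanced mix of mutants and residents.
for state in [(1, 0), (2, 3), (4, 5)]:
    print(state, mean_sojourn_time(4, 6, *state, r=1.0))
```

If these means were identical across all transient states, the conditional means of CT and T would differ only by that common factor.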
3. Discussion
Martingales can be interpreted as conservation laws for stochastic processes [16]. A martingale states that the expectation of some quantity does not change throughout a stochastic process. So if we know that expectation at the beginning of a stochastic process, then by induction we know it upon absorption. For example, we found two pairs of state-independent functions (f1(h), g1(h)) and (f2(h), g2(h)) such that equation (2.4) is true for the complete bipartite graph. Since we found a conservation law, we do not need to construct a Markov transition matrix [12,24], or evaluate recursion relations over all state space [13,37], or assume simplifying limits [14,37] to analyse the complete bipartite graph. Those alternative approaches are valuable because they are flexible tools to investigate a wider class of evolutionary graphs than we consider here. But if we can find a conservation law for a particular graph, then we can immediately exploit it to obtain elegant expressions for statistics of interest upon absorption.
The key step in applying martingale analysis is to somehow eliminate state dependence from an expectation that depends on some random variables of interest. This elimination step is easier for some random variables than others. For example, we have not yet found a martingale that depends on the number of time steps before absorption T for the complete bipartite graph. But we can find a martingale that depends on the number of ‘active steps’ of the graph, CT [29,38]. So by switching our random variable of interest, we facilitate clean analysis. Martingales may not be applicable to all problems of interest in evolutionary graph theory. But they are very helpful in identifying problems that are conducive to tractable analysis.
The evolutionary graph theory literature has primarily studied the conditional distributions or moments of $T$ instead of $C_T$ [14,24,25,29,37]. We suggest switching their order of importance for four reasons. First, eliminating time steps where the graph does not change has no impact on the Moran process. By definition, the transition probabilities of the Moran process are unaffected by time steps where the mutant population remains unaltered [8,11]. Second, eliminating those time steps in simulations reduces computation time and power consumption [38], especially as the mutant population size approaches extinction or fixation [17]. We can eliminate those time steps in simulations by calculating transition probabilities conditional on the mutant population changing, i.e. $P(X_t = x \mid X_t \neq [0, 0], S_{t-1}) = P(X_t = x \mid S_{t-1}) / \left(1 - P(X_t = [0, 0] \mid S_{t-1})\right)$ for each non-zero change $x$. Third, if we insist on obtaining conditional distributions of $T$, then we can sometimes closely approximate them from our conditional distributions of $C_T$ anyway [16]. Fourth, eliminating those time steps facilitates clean, elegant and exact expressions for the CCFs of active steps.
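A simulation restricted to active steps can be sketched as follows. This is an illustrative example, not the code in the linked repository: it assumes the standard birth–death update, and it samples each change of the mutant population directly from the four non-zero transitions. Because conditioning on a change only rescales those four probabilities, unnormalized weights suffice.

```python
import random

def simulate_active_steps(A, B, sa0, sb0, r, rng):
    """Return (fixed, C_T), sampling only steps on which the mutant count changes."""
    sa, sb, C = sa0, sb0, 0
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    while 0 < sa + sb < A + B:
        # Weights proportional to the four transition probabilities; the common
        # factor 1/F and the no-change probability drop out after conditioning.
        weights = [r * sb * (A - sa) / A,   # mutant from B replaces resident in A
                   (B - sb) * sa / A,       # resident from B replaces mutant in A
                   r * sa * (B - sb) / B,   # mutant from A replaces resident in B
                   (A - sa) * sb / B]       # resident from A replaces mutant in B
        dsa, dsb = rng.choices(moves, weights=weights)[0]
        sa, sb, C = sa + dsa, sb + dsb, C + 1
    return sa + sb == A + B, C

rng = random.Random(0)
trials = [simulate_active_steps(10, 4, 1, 1, 1.5, rng) for _ in range(10000)]
fixed_fraction = sum(fixed for fixed, _ in trials) / len(trials)
print(fixed_fraction)
```

The loop executes once per active step rather than once per time step, which is where the savings come from when the process lingers near an absorbing boundary.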
Our expressions for CCFs are exact because the Moran process must absorb exactly on either the fixation or extinction boundary. Since those boundaries are integers, and since the Moran process can only increase or decrease by 1, it cannot exceed them. Generally, random walks can exceed their absorbing boundaries. For example, consider a stochastic process where we continue adding observations of standard Gaussian random variables until the cumulative sum exceeds one of two (constant) absorbing boundaries [28]. This stochastic process can absorb at an infinite number of possible values because it can exceed its boundaries. Martingales only provide approximate results for global statistics such as absorption probabilities and times when barrier excess is not zero [28] or otherwise calculable [39]. Determining when martingales provide accurate approximations of global statistics for such stochastic processes, or bounds on them, is an active research topic. But this issue is irrelevant for the Moran process [21,28,31,32] and related birth–death processes [7,15], because martingales yield exact results for these problems.
Martingales are a particularly powerful approach to study evolutionary graphs because they are exempt from the curse of dimensionality. As the dimensionality of a graph increases (i.e. as we divide a population into more partitions), martingale analysis does not necessarily increase in complexity. Previous results have demonstrated this remarkable property by calculating fixation probabilities for certain kinds of evolutionary graphs with arbitrary dimensionality [32]. Our results here suggest that we can extend our analysis to find the CCFs of CT for higher-dimensional graphs. We can obtain CCFs of CT for a one-dimensional graph (i.e. the fully connected Moran process [16]), and a two-dimensional graph (i.e. the complete bipartite graph). Therefore, we should be able to obtain them for graphs of arbitrary dimensionality as well.
Martingales’ ability to scale with dimensionality is unmatched by other popular approaches to analysing such graphs, e.g. Markov chains and simulations [12,14,24,37]. As the dimensionality of a graph increases, the dimensionality of the Markov chain must increase because we need to account for more possible transitions of the graph on a time step. Calculating global statistics from a high-dimensional Markov matrix quickly becomes intractable, even if the elements in the matrix have simple mathematical forms [18]. Simulations quickly become prohibitively time-consuming to execute as graphs become more complex. When graphs have more partitions, they have more parameters (e.g. the population size in each partition is a parameter). Exploring how global statistics vary in high-dimensional parameter space is infeasible because simulation results are only valid for the specific parameter values we used in the simulation. Martingales can yield compact expressions for those global statistics that are valid over all parameter space, regardless of their dimensionality.
Some evolutionary graphs are probably not conducive to martingale analysis. We found a martingale by exploiting a symmetry in the state dependence of the mutant population increasing by one and decreasing by one. For example, the state dependence in the transition probability of a mutant offspring from partition A replacing a resident in B is $S_{a,t-1}(B - S_{b,t-1})$. The state dependence in the transition probability of a resident offspring from partition B replacing a mutant in A is $(B - S_{b,t-1})S_{a,t-1}$. Since those state dependencies are the same, we can cancel them in equation (2.4). If we consider graphs with directed edges [29], then this symmetry is destroyed. It will be significantly more difficult, if not impossible, to cancel state dependencies in transition probabilities over all state space. So it will be very difficult to find a conservation law for an evolutionary graph with directed edges. Martingale analysis may be unsuitable for graphs with directed connections.
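The cancellation can be checked numerically. The sketch below assumes the standard birth–death transition probabilities on a complete bipartite graph with partition sizes A and B and mutant fitness r. The function name and the parameter values are hypothetical. It uses exact rational arithmetic to show that the ratio of the two transition probabilities is the same in every state.

```python
from fractions import Fraction

def bd_ratio(sa, sb, A, B, r):
    """P(mutant from A replaces resident in B) / P(resident from B replaces mutant in A)."""
    total_fitness = r * (sa + sb) + (A - sa) + (B - sb)
    # A mutant in A reproduces, then its offspring lands on a resident in B.
    p_b_up = Fraction(r * sa, total_fitness) * Fraction(B - sb, B)
    # A resident in B reproduces, then its offspring lands on a mutant in A.
    p_a_down = Fraction(B - sb, total_fitness) * Fraction(sa, A)
    return p_b_up / p_a_down

A, B, r = 3, 5, Fraction(3, 2)  # assumed example values
ratios = {bd_ratio(sa, sb, A, B, r)
          for sa in range(1, A + 1)   # need at least one mutant in A
          for sb in range(0, B)}      # need at least one resident in B
```

The set `ratios` contains the single value r·A/B: the shared factor Sa(B − Sb) and the total fitness both cancel, leaving no dependence on the state.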
The applicability of martingale analysis is also sensitive to whether birth or death occurs first in a birth–death process (i.e. a birth–death or death–birth process [1,15,40–42]), and whether the selection of the dying or reproducing individual is fitness-dependent. In the original Moran process, we select the reproducing individual before the dying individual, and birth selection is fitness-dependent. We showed that we can eliminate state dependence in evaluating equation (2.4) for the original Moran process on a complete bipartite graph. Now suppose that we select an individual to die before choosing another to reproduce on a time step. If we define death selection to be fitness-dependent in the death–birth process, then we preserve the symmetry of state dependence and martingale analysis remains applicable. But if the dying individual is chosen first, and reproduction selection is fitness-dependent, then that symmetry is destroyed. Seemingly trivial changes in the definition of the birth–death process can significantly impact the applicability of martingale analysis.
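The two death–birth variants can be compared with a short sketch on a complete bipartite graph with partition sizes A and B and mutant fitness r. It rests on labelled assumptions: in the fitness-dependent death variant, a mutant dies with weight 1/r and a resident with weight 1, and the replacement is a uniformly chosen neighbour. In the fitness-dependent birth variant, death is uniform and the replacing neighbour is chosen in proportion to fitness. The function names and parameter values are hypothetical.

```python
from fractions import Fraction

def db_ratio_fit_death(sa, sb, A, B, r):
    """Ratio P(Sb increases) / P(Sa decreases) when death is fitness-dependent."""
    death_weight = Fraction(sa + sb) / r + (A - sa) + (B - sb)
    # A resident in B dies; a uniformly chosen neighbour in A is a mutant.
    p_b_up = Fraction(B - sb) / death_weight * Fraction(sa, A)
    # A mutant in A dies; a uniformly chosen neighbour in B is a resident.
    p_a_down = (Fraction(sa) / r) / death_weight * Fraction(B - sb, B)
    return p_b_up / p_a_down

def db_ratio_fit_birth(sa, sb, A, B, r):
    """The same ratio when death is uniform and birth is fitness-dependent."""
    n = A + B
    # A resident in B dies; neighbours in A compete by fitness to replace it.
    p_b_up = Fraction(B - sb, n) * (r * sa) / (r * sa + (A - sa))
    # A mutant in A dies; neighbours in B compete by fitness to replace it.
    p_a_down = Fraction(sa, n) * Fraction(B - sb) / (r * sb + (B - sb))
    return p_b_up / p_a_down

A, B, r = 3, 5, Fraction(3, 2)  # assumed example values
states = [(sa, sb) for sa in range(1, A + 1) for sb in range(0, B)]
fit_death = {db_ratio_fit_death(sa, sb, A, B, r) for sa, sb in states}
fit_birth = {db_ratio_fit_birth(sa, sb, A, B, r) for sa, sb in states}
```

Under these assumptions `fit_death` holds the single value r·B/A, whereas `fit_birth` holds many different values, so the state dependence no longer cancels.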
Martingales may also be applicable to other extensions of the Moran process constrained by graphs. Instead of haploid reproduction, we can consider diploid reproduction models, where two individuals can sexually reproduce only if they are connected by an edge on the graph [7]. We can consider birth–death processes with more than two competing species, each with different fitnesses [43,44]. We can consider heterogeneous graphs, where fitness is attributed to nodes on the graph, as well as the species of the individual occupying it [45]. We can consider evolutionary games on graphs, where individuals connected by graph edges compete in games for some pay-off [25,42,46–48]. Whether or not martingale analysis is applicable to any of these Moran process extensions is an open research question. To show that it is, we need to find a quantity whose expectation is either one (a product martingale) or zero (a sum martingale), regardless of the state of the process. Then we can manipulate that conservation law to extract statistical quantities of interest. Finding such an expectation may be quite laborious or impossible, depending on the complexity of the stochastic process and the exploitable symmetries in it. In such cases, we should defer to other methods of analysis such as simulations, Markov chains, diffusion approximations, etc. But if we can find such an expectation, then martingale theory yields clean, elegant, exact and explicit expressions for statistics of interest. So those other methodologies should not be our default approaches to analysing evolutionary models, but rather our fallback options.
Data accessibility
Data and relevant code for this research work are stored in GitHub: https://github.com/travismonk/bipartite and have been archived within the Zenodo repository: https://doi.org/10.5281/zenodo.5504342.
Authors' contributions
T.M. conceived the study, did the maths, produced the figures, wrote the simulation code and drafted the manuscript. A.v.S. interpreted the mathematical results, checked the code and critically revised the manuscript.
Competing interests
The authors declare no competing interests.
Funding
No funding has been received for this article.
References
- 1. Kaveh K, Komarova NL, Kohandel M. 2015. The duality of spatial death–birth and birth–death processes and limitations of the isothermal theorem. R. Soc. Open Sci. 2, 140465. (doi:10.1098/rsos.140465)
- 2. Voorhees B. 2015. Birth-death models of information spread in structured populations. In ISCS 2014: Interdisciplinary Symposium on Complex Systems: Emergence, Complexity and Computation, vol. 14 (eds Sanayei AE, Rössler O, Zelinka I), pp. 67-76. Cham, Switzerland: Springer.
- 3. Beerenwinkel N, Schwarz RF, Gerstung M, Markowetz F. 2015. Cancer evolution: mathematical models and computational inference. Syst. Biol. 64, e1-e25. (doi:10.1093/sysbio/syu081)
- 4. Manem V, Kohandel M, Komarova N, Sivaloganathan S. 2014. Spatial invasion dynamics on random and unstructured meshes: implications for heterogeneous tumor populations. J. Theor. Biol. 349, 66-73. (doi:10.1016/j.jtbi.2014.01.009)
- 5. Britton T, House T, Lloyd AL, Mollison D, Riley S, Trapman P. 2015. Five challenges for stochastic epidemic models involving global transmission. Epidemics 10, 54-57. (doi:10.1016/j.epidem.2014.05.002)
- 6. Watts DJ. 2002. A simple model of global cascades on random networks. Proc. Natl Acad. Sci. USA 99, 5766-5771. (doi:10.1073/pnas.082090499)
- 7. Whigham PA, Spencer HG. 2021. Graph-structured populations and the Hill–Robertson effect. R. Soc. Open Sci. 8, 201831. (doi:10.1098/rsos.201831)
- 8. Nowak M. 2006. Evolutionary dynamics: exploring the equations of life. Cambridge, MA: Belknap Press of Harvard University Press.
- 9. Herrerías-Azcué F, Pérez-Muñuzuri V, Galla T. 2019. Motion, fixation probability and the choice of an evolutionary process. PLoS Comput. Biol. 15, 1-22. (doi:10.1371/journal.pcbi.1007238)
- 10. Moran P. 1962. The statistical processes of evolutionary theory. Oxford, UK: Clarendon Press.
- 11. Lieberman E, Hauert C, Nowak MA. 2005. Evolutionary dynamics on graphs. Nature 433, 312-316. (doi:10.1038/nature03204)
- 12. Díaz J, Mitsche D. 2021. A survey of the modified Moran process and evolutionary graph theory. Comput. Sci. Rev. 39, 100347. (doi:10.1016/j.cosrev.2020.100347)
- 13. Broom M, Rychtar J. 2008. An analysis of the fixation probability of a mutant on special classes of non-directed graphs. Proc. R. Soc. A 464, 2609-2627. (doi:10.1098/rspa.2008.0058)
- 14. Allen B, Lippner G, Chen YT, Fotouhi B, Momeni N, Yau ST, Nowak MA. 2017. Evolutionary dynamics on any population structure. Nature 544, 227-230. (doi:10.1038/nature21723)
- 15. Tkadlec J, Pavlogiannis A, Chatterjee K, Nowak MA. 2020. Limits on amplifiers of natural selection under death-birth updating. PLoS Comput. Biol. 16, 1-13. (doi:10.1371/journal.pcbi.1007494)
- 16. Monk T, van Schaik A. 2020. Wald’s martingale and the conditional distributions of absorption time in the Moran process. Proc. R. Soc. A 476, 20200135. (doi:10.1098/rspa.2020.0135)
- 17. Hathcock D, Strogatz SH. 2019. Fitness dependence of the fixation-time distribution for evolutionary dynamics on graphs. Phys. Rev. E 100, 012408. (doi:10.1103/PhysRevE.100.012408)
- 18. Ashcroft P, Traulsen A, Galla T. 2015. When the mean is not enough: calculating fixation time distributions in birth-death processes. Phys. Rev. E 92, 042154. (doi:10.1103/PhysRevE.92.042154)
- 19. Zhu ZN, Zhang C, Wu Y, Liu W, Yang X. 2012. Fixation probabilities on complete star and bipartite digraphs. Discrete Dyn. Nat. Soc. 2012, 940465. (doi:10.1155/2012/940465)
- 20. Voorhees B, Murray A. 2013. Fixation probabilities for simple digraphs. Proc. R. Soc. A 469, 20120676. (doi:10.1098/rspa.2012.0676)
- 21. Monk T, Green P, Paulin M. 2014. Martingales and fixation probabilities of evolutionary graphs. Proc. R. Soc. A 470, 20130730. (doi:10.1098/rspa.2013.0730)
- 22. Leys SP, Degnan BM. 2001. Cytological basis of photoresponsive behavior in a sponge larva. Biol. Bull. 201, 323-338. (doi:10.2307/1543611)
- 23. Yagoobi M, Traulsen A. 2021. Fixation probabilities in network structured meta-populations. Sci. Rep. 11, 1-9. (doi:10.1038/s41598-021-97187-6)
- 24. Hindersin L, Traulsen A. 2014. Counterintuitive properties of the fixation time in network-structured populations. J. R. Soc. Interface 11, 20140606. (doi:10.1098/rsif.2014.0606)
- 25. Broom M, Hadjichrysanthou C, Rychtář J. 2010. Evolutionary games on graphs and the speed of the evolutionary process. Proc. R. Soc. A 466, 1327-1346. (doi:10.1098/rspa.2009.0487)
- 26. Tkadlec J, Pavlogiannis A, Chatterjee K, Nowak MA. 2019. Population structure determines the tradeoff between fixation probability and fixation time. Commun. Biol. 2, 138. (doi:10.1038/s42003-019-0373-y)
- 27. Askari M, Samani KA. 2015. Analytical calculation of average fixation time in evolutionary graphs. Phys. Rev. E 92, 042707. (doi:10.1103/PhysRevE.92.042707)
- 28. Wald A. 1944. On cumulative sums of random variables. Ann. Math. Stat. 15, 283-296. (doi:10.1214/aoms/1177731235)
- 29. Díaz J, Goldberg LA, Richerby D, Serna M. 2016. Absorption time of the Moran process. Random Struct. Algor. 49, 137-159. (doi:10.1002/rsa.20617)
- 30. Ottino-Loffler B, Scott J, Strogatz S. 2017. Evolutionary dynamics of incubation periods. eLife 6, e30212. (doi:10.7554/eLife.30212)
- 31. Monk T, Paulin MG, Green P. 2015. Ecological constraints on the origin of neurones. J. Math. Biol. 71, 1299-1324. (doi:10.1007/s00285-015-0862-7)
- 32. Monk T. 2018. Martingales and the fixation probability of high-dimensional evolutionary graphs. J. Theor. Biol. 451, 10-18. (doi:10.1016/j.jtbi.2018.04.039)
- 33. Houchmandzadeh B, Vallade M. 2013. Exact results for fixation probability of bithermal evolutionary graphs. Biosyst. 112, 49-54. (doi:10.1016/j.biosystems.2013.03.020)
- 34. Díaz J, Goldberg LA, Mertzios GB, Richerby D, Serna M, Spirakis PG. 2014. Approximating fixation probabilities in the generalized Moran process. Algorithmica 69, 78-91. (doi:10.1007/s00453-012-9722-7)
- 35. Doob J. 1953. Stochastic processes. New York, NY: Wiley.
- 36. Ross S. 1996. Stochastic processes, 2nd edn. New York, NY: John Wiley and Sons.
- 37. Allen B, Sample C, Steinhagen P, Shapiro J, King M, Hedspeth T, Goncalves M. 2021. Fixation probabilities in graph-structured populations under weak selection. PLoS Comput. Biol. 17, 1-25. (doi:10.1371/journal.pcbi.1008695)
- 38. Cuesta FA, Sequeiros PG, Lozano-Rojo Á. 2015. Fast and asymptotic computation of the fixation probability for Moran processes on graphs. Biosyst. 129, 25-35. (doi:10.1016/j.biosystems.2015.01.007)
- 39. Klüppelberg C, Kyprianou AE, Maller RA. 2004. Ruin probabilities and overshoots for general Lévy insurance risk processes. Ann. Appl. Probab. 14, 1766-1801. (doi:10.1214/105051604000000927)
- 40. Allen B et al. 2020. Transient amplifiers of selection and reducers of fixation for death-birth updating on graphs. PLoS Comput. Biol. 16, 1-20. (doi:10.1371/journal.pcbi.1007529)
- 41. Zukewich J, Kurella V, Doebeli M, Hauert C. 2013. Consolidating birth-death and death-birth processes in structured populations. PLoS ONE 8, 1-7. (doi:10.1371/journal.pone.0054639)
- 42. Sood V, Antal T, Redner S. 2008. Voter models on heterogeneous networks. Phys. Rev. E 77, 041121. (doi:10.1103/PhysRevE.77.041121)
- 43. Baxter G, Blythe R, McKane A. 2007. Exact solution of the multi-allelic diffusion model. Math. Biosci. 209, 124-170. (doi:10.1016/j.mbs.2007.01.001)
- 44. Ferreira EM, Neves AGM. 2020. Fixation probabilities for the Moran process with three or more strategies: general and coupling results. J. Math. Biol. 81, 277-314. (doi:10.1007/s00285-020-01510-0)
- 45. Kaveh K, McAvoy A, Chatterjee K, Nowak MA. 2020. The Moran process on 2-chromatic graphs. PLoS Comput. Biol. 16, 1-18. (doi:10.1371/journal.pcbi.1008402)
- 46. Allen B, Lippner G, Nowak MA. 2019. Evolutionary games on isothermal graphs. Nat. Commun. 10, 5107. (doi:10.1038/s41467-019-13006-7)
- 47. Ohtsuki H, Nowak MA. 2006. Evolutionary games on cycles. Proc. R. Soc. B 273, 2249-2256. (doi:10.1098/rspb.2006.3576)
- 48. Altrock PM, Traulsen A, Nowak MA. 2017. Evolutionary games on cycles with strong selection. Phys. Rev. E 95, 022407. (doi:10.1103/PhysRevE.95.022407)