Approximations for the hitchhiking effect caused by the evolution of antimalarial-drug resistance

Kristan A Schneider; Yuseob Kim

doi:10.1007/s00285-010-0353-9

Author manuscript; available in PMC: 2012 Jun 1.

Published in final edited form as: J Math Biol. 2010 Jul 11;62(6):789–832. doi: 10.1007/s00285-010-0353-9


PMCID: PMC3242009 NIHMSID: NIHMS287685 PMID: 20623287

Abstract

An analytically feasible, deterministic model for the spread of drug resistance among human malaria parasites, which incorporates all characteristics of the complex malaria-transmission cycle, was introduced by Schneider and Kim (Theor. Popul. Biol., 2010). The model accounts for the fact that only a fraction of infected hosts receive drug treatment and that hosts can be co-infected by different numbers of parasites. Furthermore, the model incorporates host heterogeneity. Antimalarial-drug resistance is assumed to be caused by a single locus with two alleles, a sensitive one and a resistant one. The most important result for this model is that an analytical solution for the frequencies of a linked neutral biallelic locus exists. However, the exact solution does not admit an explicit form and cannot be interpreted straightforwardly in terms of the model parameters. Here, we establish simple approximations for the equilibrium frequency at the neutral locus. Under the assumption that the resistant allele is initially rare (the biologically most relevant assumption in this context) and that recombination is weak, the approximations become similar to the approximations in the standard hitchhiking model. However, there are crucial differences. In particular, because of the high degree of selfing among malaria parasites in their sexual phase, a genome-wide reduction of relative heterozygosity occurs if selection is sufficiently strong. It turns out that the approximations are accurate even if the recombination rates are not small and the resistant allele is initially not very rare. The main advantage of our approximations is that they are easy to interpret in terms of model parameters. Moreover, they allow predictions of the size of the valley of reduced heterozygosity around the selected locus for given model parameters. Conversely, for a given reduction of heterozygosity, it is possible to identify the corresponding parameters. Finally, we show that incorporating host heterogeneity leads to an increased hitchhiking effect.

Keywords: Selective sweep, Co-infections, Drug concentration, Relative heterozygosity, Host heterogeneity, Malaria, Plasmodium falciparum

1 Introduction

Human malaria is an infectious disease caused by parasites belonging to the genus Plasmodium, which is endemic in most tropical and subtropical regions of the world. Infections with Plasmodium falciparum, the most virulent form of human malaria, result in one to three million deaths per year worldwide (cf. WHO 2000; Korenromp et al. 2005). Malaria control is highly dependent on drug treatments that kill parasites in infected hosts. However, attempts to control malaria have been thwarted by the rapid evolution of antimalarial-drug resistance, which has been described as a public health disaster (cf. Marsh 1998).

The limited repertoire of safe, effective, and affordable antimalarial drugs has made research on the emergence and dispersion of resistance a global health priority. Mathematical models that can use input from genetic data to investigate the dynamics of mutations associated with drug resistance are urgently needed for designing drug-deployment policies that can increase the lifespan of the available drugs. This requires a detailed understanding of the population genetic processes that lead to the emergence and dispersion of drug resistance. Unfortunately, the highly complex nature of the malaria-transmission cycle, as well as complex demographic and environmental factors, complicates efforts to develop theoretical models.

Malaria parasites undergo a complex transmission cycle with sexual phases in the mosquito vector and asexual phases in the infected host (cf. e.g. Daily 2006; Prugnolle et al. 2009). A human host is inoculated with sporozoites by the bite of an infected Anopheles mosquito during its blood meal. In the human host the sporozoites first migrate to the liver, where they differentiate into hepatic merozoites. These are released into the blood stream, where some of the hepatic merozoites form gametocytes. The haploid gametocytes are taken up by a mosquito during its blood meal and immediately reproduce sexually in the mosquito’s gut. Consequently, they undergo recombination. Completing the transmission cycle, this step is followed by the production of haploid sporozoites in the mosquito’s salivary glands, from which they can be inoculated into a human host.

Another source of complication is that many environmental and clinical factors differ significantly across the worldwide distribution of this parasite. The transmission rate and hence the number of secondary infections varies from very low rates in parts of South America, over intermediate rates in Southeast Asia, to high rates in Africa. On the other hand the level of host-acquired immunity is much higher (which means a high number of asymptomatic infections) in most of the affected areas in Africa than in other parts of the world. Such variation in host-acquired immunity affects drug use.

Among others, two important variables that summarize the demographic and clinical setting of the particular geographic area are the (average) number of parasites (m) co-infecting a given host (the average multiplicity of infection), which is determined by transmission intensity, and the proportion (α) of infected hosts that are drug treated. This parameter depends mainly on how many hosts acquired immunity.

Several studies built population genetic models that demonstrated profound effects of m and α on the rate of drug-resistance evolution (Hastings and Mackinnon 1998; Mackinnon and Hastings 1998; Hastings 2003, 2006). However, it is still unclear which mechanisms are important for the spread of resistance, e.g., intra-host dynamics, drug half-life, multiple drug treatment, migration, multiple infections, recombination, or mutation.

Over the last two decades substantial advances regarding the genetic basis of antimalarial-drug resistance have been made. It is known that specific point mutations in the dhfr and dhps regions underlie resistance to pyrimethamine and sulfadoxine (cf. Cowman et al. 1988; Triglia and Cowman 1994; Brooks et al. 1994), that point mutations in mitochondrial DNA underlie resistance to atovaquone (cf. Schwobel et al. 2003), and that mutations in the pfcrt gene cause resistance to chloroquine (CQ).

The spread of mutations causing drug resistance leads to a valley of reduced genetic variation at linked neutral regions. This removal of pre-existing variation occurs because recombination cannot effectively break the initial association with the neutral background in which the mutant first occurred, a process known as genetic hitchhiking or a selective sweep (Maynard Smith and Haigh 1974; Stephan et al. 1992; Barton 2000). For instance, Nair et al. (2003) observed a severe reduction of variation at microsatellite loci spanning over a 100 kb region surrounding the dhfr gene in a Southeast Asian population of P. falciparum. The extent of this pattern depends on how fast the favored allele increases to high frequency while meiotic recombination is constantly eroding the association between the favored allele and the surrounding chromosome segment (Kim and Stephan 2002). Selective sweeps have been initially studied for randomly mating populations of constant size, and homogenous constant selection pressures (e.g., Maynard Smith and Haigh 1974), to which we refer as the ‘standard model’, or standard ‘hitchhiking’.

In malaria biology, detection of selective sweeps has mainly contributed to confirming the location of drug-resistance mutations and elucidating their mutational origins (Wootton et al. 2002; Nash et al. 2005; Mita et al. 2007; Nair et al. 2007; McCollum et al. 2008), while fewer studies have attempted to relate the span of selective sweeps to the strength of drug selection (Nair et al. 2003). Recently, Schneider and Kim (2010) introduced an analytically feasible model for the spread of antimalarial-drug resistance, which allows genetic hitchhiking to be studied. The model covers the important characteristics of the transmission cycle, incorporates host heterogeneity, i.e., different classes of treated and untreated hosts, and accounts for the fact that hosts can be infected by different numbers of parasites. Moreover, the model yields simple conditions for the spread of resistance and its speed in terms of the fitness parameters and α. Hence, it is useful for finding ‘optimal’ treatment strategies to prevent or slow down the spread of resistance (for more discussion see Schneider and Kim 2010).

Studies based on known recombination rates and the result of the standard model of selective sweeps concluded that the observed patterns of selective sweeps are compatible with the predictions (Nair et al. 2003). However, Schneider and Kim (2010) discuss in detail why the application of the standard selective-sweep model to a malaria-parasite population is highly problematic. In particular, the high degree of selfing among malaria parasites will lead to a genome-wide reduction of heterozygosity if selection is sufficiently strong (cf. also Hedrick 1980).

In the absence of reliable public-health records, a retrospective analysis of the mechanisms and parameters underlying the spread of antimalarial-drug resistance may be achieved indirectly through the patterns of selective sweeps. However, the analytical solution for the valley of reduced heterozygosity in the model of Schneider and Kim (2010) is not explicit, which limits its applicability. Moreover, it is not at all straightforward to interpret this solution in terms of the model parameters.

In this article we derive accurate approximations to the exact, analytical solutions of Schneider and Kim (2010). The approximations under the assumption that the resistant allele is initially rare and that recombination is weak are similar to but different from the usual approximation for standard hitchhiking. Unlike in standard hitchhiking, our approximations are accurate even if the recombination rates are large and the resistant allele is initially not very rare. Moreover, using the approximations, we will show that incorporating host heterogeneity will result in an increased hitchhiking effect compared to our basic model, with corresponding selection parameters, which accounts just for one class of treated and one class of untreated hosts. The simple form of the approximations will allow us to easily interpret the effect of genetic hitchhiking in terms of the model parameters. Such approximations are extremely useful to achieve applicability of the results of Schneider and Kim (2010) to real data. Moreover, we show that the approximate solutions can be easily applied to identify the range of parameters that give rise to given levels of reduced heterozygosity in a given range of recombination distances. Finally, we discuss the differences compared with standard hitchhiking and give an outlook how our results can be applied to real data.

2 The model

We consider two biallelic loci. The first locus is subject to selection with a sensitive allele A_S and a resistant allele A_R segregating. The second locus is selectively neutral with the alleles N₁ and N₂ segregating. We will use the notation and parametrization summarized in Table 1.

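Table 1. Summary of notation

| Haplotype | A_SN₁ | A_SN₂ | A_RN₁ | A_RN₂ |
| --- | --- | --- | --- | --- |
| Frequency | p₁ | p₂ | p₃ | p₄ |
| Fitness in untreated hosts | $W_{1}^{(U)}$ | $W_{2}^{(U)}$ | $W_{3}^{(U)}$ | $W_{4}^{(U)}$ |
| Parametrization (untreated hosts) | 1 | 1 | 1 − s | 1 − s |
| Fitness in treated hosts | $W_{1}^{(T)}$ | $W_{2}^{(T)}$ | $W_{3}^{(T)}$ | $W_{4}^{(T)}$ |
| Parametrization (treated hosts) | 1 − d_S | 1 − d_S | 1 − d_R | 1 − d_R |
| Sensitive/resistant | Sen. | Sen. | Res. | Res. |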

The table shows the notation of the four haplotypes, their frequencies, their fitnesses in treated and untreated hosts, the parametrization of fitnesses that we use for the illustrations in the following sections, and whether the haplotypes are sensitive or resistant. Here, s reflects the metabolic costs of the resistant allele, while d_S and d_R indicate how efficiently the drug wipes out the sensitive and resistant parasites, respectively.

Let p denote the frequency of the resistant allele A_R. We have p = p₃ + p₄. Furthermore, we denote the frequency of the neutral allele N₁ among the sensitive and resistant haplotypes by R and Q, respectively. We therefore have $R = \frac{p_{1}}{p_{1} + p_{2}} = \frac{p_{1}}{1 - p}$ and $Q = \frac{p_{3}}{p_{3} + p_{4}} = \frac{p_{3}}{p}$ .

Moreover, we denote the recombination rate between the two loci by r, and the vector of haplotype frequencies by $\mathbf{p} = (p_{1}, \dots, p_{4})$. We assume that each host is infected randomly and independently by exactly m haploids (parasite strains). (We assume m to be a fixed parameter until Sect. 4.2, where we assume that m follows a fixed frequency distribution.)

Hosts acquire parasite strains according to their frequencies in the mosquito phase. Hence, the configuration of infections in hosts is multinomially distributed with parameters m and p₁, … , p₄. We assume that all haploids in a newly infected host have equal frequencies. Hence, the relative frequency of a haploid drawn from the parasite population among mosquitos is $\frac{1}{m}$ . Let us denote a multi-index by $\mathbf{m} = (m_{1}, \dots, m_{4})$, and the sum over its components by $| \mathbf{m} | = \sum_{i = 1}^{4} m_{i}$ . The probability that a host is infected by m_i copies of haplotype i (i = 1, … , 4, $| \mathbf{m} | = m$) is given by

(\begin{matrix} m \\ \mathbf{m} \end{matrix}) \mathbf{p}^{\mathbf{m}},

(1)

where $(\begin{matrix} m \\ \mathbf{m} \end{matrix}) = \frac{m!}{\prod_{i = 1}^{4} m_{i}!}$ denotes the respective multinomial coefficient and, as usual, $\mathbf{p}^{\mathbf{m}} : = \prod_{i = 1}^{4} p_{i}^{m_{i}}$ .
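As a small illustration of (1), the following Python sketch evaluates the multinomial infection probability and checks that it sums to one over all host configurations. The function names and the numerical haplotype frequencies are illustrative assumptions, not part of the model specification.

```python
from itertools import product
from math import factorial, prod


def infection_probability(counts, freqs):
    """Probability (1) that a host carries counts[i] copies of haplotype i."""
    coeff = factorial(sum(counts))
    for m_i in counts:
        coeff //= factorial(m_i)          # multinomial coefficient m! / prod(m_i!)
    return coeff * prod(p ** m_i for p, m_i in zip(freqs, counts))


def configurations(m, n=4):
    """All multi-indices (m_1, ..., m_n) of non-negative integers with |m| = m."""
    return [c for c in product(range(m + 1), repeat=n) if sum(c) == m]


# Illustrative haplotype frequencies p_1, ..., p_4 (assumed values).
freqs = (0.4, 0.3, 0.2, 0.1)
m = 3
total = sum(infection_probability(c, freqs) for c in configurations(m))
print(len(configurations(m)), total)
```

Enumerating all configurations is only practical for small m; it is used here to make the normalization of (1) explicit.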

After a host is infected, the parasites reproduce clonally in the host. An infected host either receives drug treatment or remains untreated. We assume that a fixed proportion α of infected hosts in the population is treated, whereas the remaining hosts are untreated. Thus, the probability for a host to be treated is simply α. The rate of reproduction of the haplotypes differs between treated and untreated hosts. The absolute fitness of a parasite strain is the expected number of its descendants in the host at the time of the mosquito visit. The absolute frequency of haplotype i in an untreated host before a mosquito takes its blood meal is denoted by $\frac{m_{i}}{m} W_{i}^{(U)}$ , whereas the absolute frequency of haplotype i in a treated host before a mosquito takes its blood meal is denoted by $\frac{m_{i}}{m} W_{i}^{(T)}$ . [In the following, wherever appropriate, we use the superscript (.) as a placeholder for either of the superscripts (U) and (T).] Some of these haploids form male or female gametocytes. Their frequencies are assumed to be proportional to the numbers of the respective haplotypes. Furthermore, we assume that male and female gametocytes occur at the same frequencies. We assume that the number of gametocytes of a given type taken up by a mosquito during its blood meal from an infected host is proportional to the frequency of that type in the host. Let γ denote the proportionality constant, which is assumed to be the same for each mosquito. Hence, if the absolute frequencies of parasites in an infected host are $\frac{m_{1}}{m} W_{1}^{(.)}, \dots, \frac{m_{4}}{m} W_{4}^{(.)}$ , the absolute frequencies of parasites absorbed during the blood meal are $γ \frac{m_{1}}{m} W_{1}^{(.)}, \dots, γ \frac{m_{4}}{m} W_{4}^{(.)}$ . Note that this takes drug efficiency into account, because a mosquito will absorb a smaller number of parasites from a host in which drugs have efficiently eliminated parasites.

In the mosquitos’ guts recombination occurs immediately after the blood meal, during the phase in which meiosis occurs. In the gut of a mosquito that has taken its blood meal from a host initially infected with m_i copies of haplotype i (i = 1, … , 4), the probability that a male k-gametocyte fertilizes a female l-gametocyte is

\frac{γ \frac{m_{k}}{m} W_{k}^{(.)}}{\frac{γ}{m} W_{\mathbf{m}}^{(.)}} \cdot \frac{γ \frac{m_{l}}{m} W_{l}^{(.)}}{\frac{γ}{m} W_{\mathbf{m}}^{(.)}} = \frac{m_{k} W_{k}^{(.)} m_{l} W_{l}^{(.)}}{{(W_{\mathbf{m}}^{(.)})}^{2}},

where

\frac{γ}{m} W_{\mathbf{m}}^{(.)} = \frac{γ}{m} \sum_{k = 1}^{4} m_{k} W_{k}^{(.)}

is the absolute frequency of parasite haploids in the mosquito’s gut and $\mathbf{m} = (m_{1}, \dots, m_{4})$ is the configuration of the infection. The above probability is the relative frequency of a male k-gametocyte times that of a female l-gametocyte. Therefore, the absolute frequency of pairings of a male k-gametocyte and a female l-gametocyte is the probability of such a fertilization times the absolute number of parasites in the gut, i.e.,

\frac{γ^{2} \frac{m_{k}}{m} W_{k}^{(.)} \frac{m_{l}}{m} W_{l}^{(.)}}{\frac{γ}{m} W_{\mathbf{m}}^{(.)}} = \frac{\frac{γ}{m} m_{k} W_{k}^{(.)} m_{l} W_{l}^{(.)}}{W_{\mathbf{m}}^{(.)}} .

(2)

The probability that the fertilization of a female l-gametocyte by a male k-gametocyte produces a haplotype i is denoted by R(kl → i). Therefore, the absolute frequencies of haplotype i in the population of mosquitos that took their blood meal from treated and untreated hosts are respectively

p_{i}^{* (.)} = \sum_{| \mathbf{m} | = m} (\begin{matrix} m \\ \mathbf{m} \end{matrix}) \mathbf{p}^{\mathbf{m}} \sum_{k, l = 1}^{4} \frac{γ m_{k} W_{k}^{(.)} m_{l} W_{l}^{(.)}}{m W_{\mathbf{m}}^{(.)}} R (kl \to i),

where the sum runs over all multi-indices $\mathbf{m} = (m_{1}, \dots, m_{4})$ with $| \mathbf{m} | = m$, $W_{\mathbf{m}}^{(.)} = \sum_{k = 1}^{4} m_{k} W_{k}^{(.)}$, and $\mathbf{p} = (p_{1}, \dots, p_{4})$ is the vector of haplotype frequencies. Therefore, the relative frequencies of haplotypes in the mosquito population become

p_{i}^{'} = \frac{α p_{i}^{* (T)} + (1 - α) p_{i}^{* (U)}}{α \sum_{k = 1}^{4} p_{k}^{* (T)} + (1 - α) \sum_{k = 1}^{4} p_{k}^{* (U)}} = \frac{p_{i}^{*}}{\sum_{k = 1}^{4} p_{k}^{*}},

(3a)

where

p_{i}^{*} : = α p_{i}^{* (T)} + (1 - α) p_{i}^{* (U)} .

(3b)

For what follows, let us define the average fitness of the resistant allele among treated and untreated hosts by

λ : = α W_{3}^{(T)} + (1 - α) W_{3}^{(U)}

(4a)

and that of the sensitive allele by

μ : = α W_{1}^{(T)} + (1 - α) W_{1}^{(U)} .

(4b)

If p(t) denotes the frequency of the resistant allele at time t, it changes according to $p (t) = \frac{λ^{t} p (0)}{λ^{t} p (0) + μ^{t} (1 - p (0))}$ , i.e., according to the standard haploid one-locus selection model (cf. Schneider and Kim 2010). Thus, the resistant allele will sweep through the population if and only if λ > μ; otherwise it will be lost (or remain constant in frequency). Hence, in the following we will always assume λ > μ without further mention. Note that the spread of the resistant allele is independent of m. Moreover, it is straightforward to calculate the time until a given level of resistance is reached (cf. Result 2 in Schneider and Kim 2010). Because of the relatively simple form of the dynamics of the resistant allele, it is possible to derive simple conditions for the spread of resistance and its speed. Example 1 in Schneider and Kim (2010) directly links the fitness parameters to the spread of fixation. Furthermore, assuming that the fitnesses in treated hosts are functions of the administered drug concentration, Example 2 in Schneider and Kim (2010) illustrates the impact of the drug concentration and α on the spread of resistance and its speed. Assuming that the fitness of resistant parasites decays more slowly than that of the sensitive ones as a function of the drug concentration, Schneider and Kim (2010) found that resistance spreads most quickly for intermediate drug concentrations and large α.
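The closed-form trajectory above is easy to evaluate numerically. The following Python sketch computes λ and μ from (4a) and (4b), evaluates p(t), and solves p(t) = p_target for t by rewriting the trajectory in terms of the odds p/(1 − p). The function names, the numerical fitness values, and the inversion formula are assumptions made for illustration; the formula is a direct rearrangement of the trajectory and is not claimed to be the form given in Result 2 of Schneider and Kim (2010).

```python
import math


def mean_fitnesses(alpha, W1_T, W3_T, W1_U, W3_U):
    """Return (lam, mu) as defined in (4a) and (4b)."""
    lam = alpha * W3_T + (1 - alpha) * W3_U   # resistant allele
    mu = alpha * W1_T + (1 - alpha) * W1_U    # sensitive allele
    return lam, mu


def resistant_frequency(t, p0, lam, mu):
    """Closed-form frequency p(t) of the resistant allele."""
    grown = lam ** t * p0
    return grown / (grown + mu ** t * (1 - p0))


def generations_to_reach(p_target, p0, lam, mu):
    """Real-valued t with p(t) = p_target, using p/(1-p) = (lam/mu)^t * p0/(1-p0)."""
    odds_ratio = (p_target / (1 - p_target)) / (p0 / (1 - p0))
    return math.log(odds_ratio) / math.log(lam / mu)


# Illustrative fitness values (assumed, not taken from the paper).
lam, mu = mean_fitnesses(alpha=0.5, W1_T=0.1, W3_T=0.8, W1_U=1.0, W3_U=0.9)
t_99 = generations_to_reach(0.99, p0=0.01, lam=lam, mu=mu)
print(lam, mu, t_99, resistant_frequency(t_99, 0.01, lam, mu))
```

Because only the ratio λ/μ enters the odds, the time to reach a given resistance level scales inversely with log(λ/μ).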

However, the focus of this article is genetic hitchhiking. The following relations were derived in Schneider and Kim (2010).

\bar{W} p_{i}^{'} = p_{i} (α W_{i}^{(T)} + (1 - α) W_{i}^{(U)}) + r_{i} p (1 - p) (Q - R) ϑ_{p, m},

(5a)

where

\begin{array}{rcl} r_{1} & = & - r_{2} = - r_{3} = r_{4} = r, \\ \bar{W} & = & p λ + (1 - p) μ, \\ ϑ_{p, m} & : = & \sum_{k = 0}^{m - 2} (\begin{matrix} m - 2 \\ k \end{matrix}) θ_{k, m} p^{k} {(1 - p)}^{m - 2 - k}, \end{array}

(5b)

θ_{k, m} : = α θ_{k, m}^{(T)} + (1 - α) θ_{k, m}^{(U)},

(5c)

and

θ_{k, m}^{(.)} : = \frac{(m - 1) W_{1}^{(.)} W_{3}^{(.)}}{(m - 1 - k) W_{1}^{(.)} + (k + 1) W_{3}^{(.)}} .

(5d)
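The recursion (5a)–(5d) can be sketched in a few lines of Python. The function names, the representation of fitnesses as pairs (W₁, W₃), and the numerical values below are illustrative assumptions; the neutrality of the second locus is used in the form W₁ = W₂ and W₃ = W₄.

```python
from math import comb


def theta(k, m, W1, W3):
    """theta_{k,m}^{(.)} from (5d) for one host class."""
    return (m - 1) * W1 * W3 / ((m - 1 - k) * W1 + (k + 1) * W3)


def vartheta(p, m, alpha, WT, WU):
    """vartheta_{p,m} from (5b)-(5c); WT and WU are (W1, W3) pairs."""
    total = 0.0
    for k in range(m - 1):                      # k = 0, ..., m - 2
        th = alpha * theta(k, m, *WT) + (1 - alpha) * theta(k, m, *WU)
        total += comb(m - 2, k) * th * p ** k * (1 - p) ** (m - 2 - k)
    return total


def next_generation(freqs, r, m, alpha, WT, WU):
    """One step of (5a) for the haplotype frequencies (p1, p2, p3, p4)."""
    p1, p2, p3, p4 = freqs
    p = p3 + p4                                 # frequency of the resistant allele
    R, Q = p1 / (1 - p), p3 / p                 # N1 among sensitive / resistant
    mu = alpha * WT[0] + (1 - alpha) * WU[0]
    lam = alpha * WT[1] + (1 - alpha) * WU[1]
    mean_w = p * lam + (1 - p) * mu
    link = r * p * (1 - p) * (Q - R) * vartheta(p, m, alpha, WT, WU)
    fitness = (mu, mu, lam, lam)
    signs = (1, -1, -1, 1)                      # r_1 = -r_2 = -r_3 = r_4 = r
    return tuple((f * w + sg * link) / mean_w
                 for f, w, sg in zip(freqs, fitness, signs))


# Illustrative parameters (assumed values).
alpha, WT, WU = 0.5, (0.1, 0.8), (1.0, 0.9)
freqs = (0.5, 0.4, 0.06, 0.04)
new = next_generation(freqs, r=0.05, m=3, alpha=alpha, WT=WT, WU=WU)
print(new, sum(new))
```

The recombination terms cancel in the sum over haplotypes, so the updated frequencies again sum to one, and the resistant-allele frequency follows the one-locus dynamics regardless of r and m.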

It was further shown in Schneider and Kim (2010) that

R^{'} = R + rp (Q - R) \frac{ϑ_{p, m}}{μ}

(6a)

and

Q^{'} = Q - r (1 - p) (Q - R) \frac{ϑ_{p, m}}{λ} .

(6b)

From the relations (5) and (6) the equilibrium frequency of the neutral allele N₁ was calculated to be

{\hat{Q}}^{(m)} = Q_{0} + \frac{r (R_{0} - Q_{0})}{λ} \sum_{τ = 0}^{\infty} \frac{ϑ_{p_{τ}, m}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{τ} + 1} \prod_{l = 0}^{τ - 1} Λ_{m, l},

(7)

where p₀, Q₀, and R₀ are the respective initial frequencies,

ϑ_{p_{t}, m} = {(\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{t} + 1)}^{2 - m} \sum_{k = 0}^{m - 2} (\begin{matrix} m - 2 \\ k \end{matrix}) θ_{k, m} {(\frac{p_{0}}{1 - p_{0}})}^{k} {(\frac{λ}{μ})}^{tk},

and

Λ_{m, t} : = 1 - r \frac{ϑ_{p_{t}, m}}{λ μ} (\frac{p_{0} λ^{t + 1} + (1 - p_{0}) μ^{t + 1}}{p_{0} λ^{t} + (1 - p_{0}) μ^{t}}) .

(8)

It was mentioned by Schneider and Kim (2010) that, for practical purposes, it suffices to sum (7) until the time of quasi-fixation, for which they provided an explicit formula. However, it is not at all obvious how Q̂^(m) is influenced by the various model parameters. In particular, when studying genetic hitchhiking using (7) it seems infeasible to predict the width of the valley of reduced heterozygosity in terms of the recombinational distance. Hence, simple but accurate approximations for (7) that permit a simple interpretation in terms of the model parameters are highly desirable.
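To make (7) and (8) concrete, the following Python sketch evaluates the truncated series and compares it with a direct iteration of the recursions (6a) and (6b). All function names, the truncation after a fixed number of generations, and the parameter values are assumptions for illustration; the identities 1/(p₀/(1 − p₀) (λ/μ)^τ + 1) = 1 − p_τ and (p₀ λ^{τ+1} + (1 − p₀) μ^{τ+1})/(p₀ λ^τ + (1 − p₀) μ^τ) = p_τ λ + (1 − p_τ) μ are used to avoid large powers.

```python
from math import comb


def theta(k, m, W1, W3):
    """theta_{k,m}^{(.)} from (5d)."""
    return (m - 1) * W1 * W3 / ((m - 1 - k) * W1 + (k + 1) * W3)


def vartheta(p, m, alpha, WT, WU):
    """vartheta_{p,m} from (5b)-(5c); WT and WU are (W1, W3) pairs."""
    return sum(
        comb(m - 2, k)
        * (alpha * theta(k, m, *WT) + (1 - alpha) * theta(k, m, *WU))
        * p ** k * (1 - p) ** (m - 2 - k)
        for k in range(m - 1)
    )


def q_hat_series(Q0, R0, p0, r, m, alpha, WT, WU, generations=500):
    """Truncated version of the series (7)."""
    mu = alpha * WT[0] + (1 - alpha) * WU[0]
    lam = alpha * WT[1] + (1 - alpha) * WU[1]
    p, product, total = p0, 1.0, 0.0            # product = prod_{l < tau} Lambda_{m,l}
    for _ in range(generations):
        vt = vartheta(p, m, alpha, WT, WU)
        mean_w = p * lam + (1 - p) * mu
        total += vt * (1 - p) * product
        product *= 1 - r * vt * mean_w / (lam * mu)   # Lambda_{m,tau} from (8)
        p = lam * p / mean_w
    return Q0 + r * (R0 - Q0) / lam * total


def q_hat_iterated(Q0, R0, p0, r, m, alpha, WT, WU, generations=500):
    """Direct iteration of (6a) and (6b)."""
    mu = alpha * WT[0] + (1 - alpha) * WU[0]
    lam = alpha * WT[1] + (1 - alpha) * WU[1]
    Q, R, p = Q0, R0, p0
    for _ in range(generations):
        vt = vartheta(p, m, alpha, WT, WU)
        Q, R = (Q - r * (1 - p) * (Q - R) * vt / lam,
                R + r * p * (Q - R) * vt / mu)
        p = lam * p / (p * lam + (1 - p) * mu)
    return Q


# Illustrative parameters (assumed values).
args = dict(Q0=0.2, R0=0.8, p0=0.01, r=0.02, m=3,
            alpha=0.5, WT=(0.1, 0.8), WU=(1.0, 0.9))
from_series = q_hat_series(**args)
from_iteration = q_hat_iterated(**args)
print(from_series, from_iteration)
```

Both routes give the same value up to rounding, which illustrates that (7) is simply the accumulated change of Q along the trajectory of the resistant allele.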

3 Approximations

We shall now calculate simple approximations for (7). First, we treat the special case m = 2, and later deduce the general case from it.

3.1 Equilibrium frequencies at the neutral locus for m = 2

The equilibrium frequency of the neutral allele N₁ is given by

\hat{Q} : = {\hat{Q}}^{(2)} = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \sum_{τ = 0}^{\infty} \frac{\prod_{l = 0}^{τ - 1} Λ_{l}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{τ} + 1},

where

ϑ : = ϑ_{p_{t}, 2}

(9)

is independent of the trajectory p_t of the resistant allele and in particular of p₀, and

Λ_{l} : = Λ_{2, l} .

(10)

In the following we will derive upper and lower bounds as well as approximations for Q̂ which exhibit a much simpler form. This permits us to explore the effects of the various parameters on Q̂.

First, note that Λ_l is monotone decreasing in l. To see this, note that $Λ_{l} < Λ_{l - 1}$ is equivalent to

1 - r \frac{ϑ}{λ μ} (\frac{p_{0} λ^{l + 1} + (1 - p_{0}) μ^{l + 1}}{p_{0} λ^{l} + (1 - p_{0}) μ^{l}}) < 1 - r \frac{ϑ}{λμ} (\frac{p_{0} λ^{l} + (1 - p_{0}) μ^{l}}{p_{0} λ^{l - 1} + (1 - p_{0}) μ^{l - 1}}) .

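Cross-multiplying, cancelling the terms $p_{0}^{2} λ^{2 l}$ and ${(1 - p_{0})}^{2} μ^{2 l}$ that appear on both sides, and dividing by p₀(1 − p₀), this can be simplified to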

λ^{l - 1} μ^{l + 1} + λ^{l + 1} μ^{l - 1} > 2 λ^{l} μ^{l} .

Dividing by $λ^{l - 1} μ^{l - 1}$, this simplifies to

{(λ - μ)}^{2} > 0 .

Moreover, note that

\underline{Λ} : = \lim_{l \to \infty} Λ_{l} = \lim_{l \to \infty} 1 - r \frac{ϑ}{λμ} (\frac{p_{0} λ + (1 - p_{0}) μ {(\frac{μ}{λ})}^{l}}{p_{0} + (1 - p_{0}) {(\frac{μ}{λ})}^{l}}) = 1 - r \frac{ϑ}{μ} .

(11)

Hence, by using

\bar{Λ} : = Λ_{0} = 1 - \frac{r ϑ (p_{0} λ + (1 - p_{0}) μ)}{λ μ} \geq Λ_{l} \geq \underline{Λ},

(12)

we obtain

{\bar{Λ}}^{τ} \geq \prod_{l = 0}^{τ - 1} Λ_{l} \geq {\underline{Λ}}^{τ} .

Furthermore, because λ > μ we have

1 - \frac{r ϑ (p_{0} λ + (1 - p_{0}) μ)}{λ μ} \leq 1 - \frac{r ϑ (p_{0} μ + (1 - p_{0}) μ)}{λ μ} = 1 - \frac{r ϑ}{λ} = : \tilde{Λ},

(13)

and consequently Λ̃ ≥ Λ̅.
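The monotonicity of Λ_l and the bounds (11)–(13) can be checked numerically. The Python sketch below does this for m = 2; the function names and the parameter values are illustrative assumptions.

```python
def vartheta_m2(alpha, WT, WU):
    """vartheta for m = 2: (5b)-(5d) reduce to the single term k = 0."""
    (W1_T, W3_T), (W1_U, W3_U) = WT, WU
    return (alpha * W1_T * W3_T / (W1_T + W3_T)
            + (1 - alpha) * W1_U * W3_U / (W1_U + W3_U))


def lambda_sequence(p0, r, vt, lam, mu, n):
    """Lambda_0, ..., Lambda_{n-1} from (8) with m = 2."""
    values, p = [], p0
    for _ in range(n):
        mean_w = p * lam + (1 - p) * mu       # equals the ratio appearing in (8)
        values.append(1 - r * vt * mean_w / (lam * mu))
        p = lam * p / mean_w                  # one-locus dynamics of the resistant allele
    return values


# Illustrative parameters (assumed values).
alpha, WT, WU = 0.5, (0.1, 0.8), (1.0, 0.9)
lam = alpha * WT[1] + (1 - alpha) * WU[1]
mu = alpha * WT[0] + (1 - alpha) * WU[0]
r, p0 = 0.05, 0.01
vt = vartheta_m2(alpha, WT, WU)

seq = lambda_sequence(p0, r, vt, lam, mu, n=30)
lower = 1 - r * vt / mu        # limit (11)
upper_bar = seq[0]             # Lambda_0, see (12)
upper_tilde = 1 - r * vt / lam # bound (13)
print(lower, seq[-1], upper_bar, upper_tilde)
```

For these values the sequence decreases from Λ₀ towards the limit in (11), and Λ₀ itself lies below the bound in (13).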

Let us define

Q^{(a)} : = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \sum_{τ = 0}^{\infty} \frac{a^{τ}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{τ} + 1} .

(14)

Note that by relabeling the alleles, we can assume without loss of generality that R₀ > Q₀. Hence, (14) is monotone increasing in a. Thus, we can formulate our first result.

Result 1

Let Q^(a) be defined as in (14). Then, Q^(a) is monotone increasing in a. Moreover, let Λ̲, Λ̅ and Λ̃ be defined as in (11), (12), and (13), respectively. Then

\bar{Q} : = Q^{(\bar{Λ})} and \tilde{Q} : = Q^{(\tilde{Λ})}

are upper bounds for Q̂, and

\underline{Q} : = Q^{(\underline{Λ})}

is a lower bound for Q̂.
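Result 1 can be illustrated numerically by evaluating Q^(a) from (14) for the three choices of a and comparing with the exact value for m = 2. The following Python sketch assumes R₀ > Q₀; the function names, the truncation after n terms, and the parameter values are assumptions made for illustration.

```python
def q_of_a(a, Q0, R0, p0, r, vt, lam, mu, n=400):
    """Q^(a) from (14), truncated after n terms."""
    odds, power, total = p0 / (1 - p0), 1.0, 0.0
    for _ in range(n):
        total += power / (odds + 1)
        power *= a
        odds *= lam / mu
    return Q0 + r * vt * (R0 - Q0) / lam * total


def q_hat(Q0, R0, p0, r, vt, lam, mu, n=400):
    """Exact equilibrium frequency for m = 2, truncated after n terms."""
    odds, product, total = p0 / (1 - p0), 1.0, 0.0
    for _ in range(n):
        total += product / (odds + 1)
        p = odds / (odds + 1)
        product *= 1 - r * vt * (p * lam + (1 - p) * mu) / (lam * mu)
        odds *= lam / mu
    return Q0 + r * vt * (R0 - Q0) / lam * total


# Illustrative parameters (assumed values); vt is vartheta for m = 2.
alpha, WT, WU = 0.5, (0.1, 0.8), (1.0, 0.9)
lam = alpha * WT[1] + (1 - alpha) * WU[1]
mu = alpha * WT[0] + (1 - alpha) * WU[0]
vt = (alpha * WT[0] * WT[1] / (WT[0] + WT[1])
      + (1 - alpha) * WU[0] * WU[1] / (WU[0] + WU[1]))
Q0, R0, p0, r = 0.2, 0.8, 0.01, 0.05

common = (Q0, R0, p0, r, vt, lam, mu)
q_low = q_of_a(1 - r * vt / mu, *common)                                    # lower bound
q_bar = q_of_a(1 - r * vt * (p0 * lam + (1 - p0) * mu) / (lam * mu), *common)
q_tilde = q_of_a(1 - r * vt / lam, *common)
q_exact = q_hat(*common)
print(q_low, q_exact, q_bar, q_tilde)
```

The printed values appear in increasing order, in line with Result 1.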

Although the expressions for these bounds are simpler than that for Q̂, they are still not very explicit. Hence, we shall derive upper and lower bounds for Q̅ and Q̲, or more generally for Q^(a).

First, note that ϑ < λ. This is because it is equivalent to

α W_{3}^{(T)} + (1 - α) W_{3}^{(U)} > \frac{α W_{1}^{(T)} W_{3}^{(T)}}{W_{1}^{(T)} + W_{3}^{(T)}} + \frac{(1 - α) W_{1}^{(U)} W_{3}^{(U)}}{W_{1}^{(U)} + W_{3}^{(U)}}

which obviously holds because

\frac{W_{1}^{(T)}}{W_{1}^{(T)} + W_{3}^{(T)}} < 1 and \frac{W_{1}^{(U)}}{W_{1}^{(U)} + W_{3}^{(U)}} < 1.

Essentially the same calculation shows ϑ < μ. Therefore, we have $\frac{ϑ}{λ} < 1$ and $\frac{ϑ}{μ} < 1$ . This combined with (11) and (13) implies $\frac{1}{2} < \underline{Λ} < \tilde{Λ} \leq 1$ , since $r \in [0, \frac{1}{2}]$ . Hence, in the following we will always assume $a \in [\underline{Λ}, \tilde{Λ}] \subseteq [\frac{1}{2}, 1]$ .

Notice that Q̃ has exactly the same form as in the standard haploid hitchhiking model derived by Maynard Smith and Haigh (1974; eq. 8), with r replaced by $\frac{ϑ r}{λ}$ and p₀ replaced by $\frac{μ p_{0}}{λ (1 - p_{0}) + μ p_{0}}$ . Thus, we can interpret this as studying standard hitchhiking with an ‘effective’ recombination rate $\frac{ϑ r}{λ}$ , which is smaller than r, and an ‘effective’ initial frequency of $\frac{μ p_{0}}{λ (1 - p_{0}) + μ p_{0}}$ , which is smaller than the actual initial frequency. The adjusted recombination rate leads to a more severe hitchhiking effect, whereas the adjusted initial frequency leads to a less pronounced hitchhiking effect. The combination of these two factors leads to a more severe hitchhiking effect (cf. Schneider and Kim 2010).

Next, we need two definitions. For a ∈ ℝ and k ∈ ℕ⁺ the upper Pochhammer symbol is defined by

{(a)}_{\bar{k}} : = \prod_{l = 0}^{k - 1} (a + l) .

The hypergeometric function is defined by

{}_{r}F_{s} (\begin{matrix} a_{1}, \dots, a_{r} \\ b_{1}, \dots, b_{s} \end{matrix} ∣ z) : = \sum_{k = 0}^{\infty} \frac{z^{k}}{k!} \frac{\prod_{u = 1}^{r} {(a_{u})}_{\bar{k}}}{\prod_{υ = 1}^{s} {(b_{υ})}_{\bar{k}}},

where r, s ∈ ℕ, z, a₁, …, a_r ∈ ℝ, b₁, … , $b_{s} \in ℝ \ ℤ_{0}^{-}$ .
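The two definitions translate directly into code. The following Python sketch implements the upper Pochhammer symbol and a truncated version of the hypergeometric series; the function names and the fixed number of terms are assumptions, the empty product for k = 0 is taken to be 1, and the truncation is only meaningful where the series converges. Successive terms are obtained from their ratio to avoid large intermediate factorials.

```python
import math


def rising_factorial(a, k):
    """Upper Pochhammer symbol (a)_k = a (a + 1) ... (a + k - 1); equals 1 for k = 0."""
    result = 1.0
    for l in range(k):
        result *= a + l
    return result


def hypergeometric(a_params, b_params, z, terms=500):
    """Truncated series rFs(a_1..a_r; b_1..b_s | z), built from the ratio of successive terms."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        ratio = z / (k + 1)
        for a in a_params:
            ratio *= a + k
        for b in b_params:
            ratio /= b + k
        term *= ratio
    return total


# Checks against elementary special cases.
z = 0.5
print(rising_factorial(3, 4))                                   # 3 * 4 * 5 * 6
print(hypergeometric([1, 1], [2], z), -math.log(1 - z) / z)     # 2F1(1, 1; 2 | z)
print(hypergeometric([], [], 1.0), math.e)                      # 0F0( | z) = exp(z)
```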

With the above definitions we are able to formulate our first theorem.

Theorem 1

Assume $a \neq {(\frac{μ}{λ})}^{l}$ for l ∈ ℕ, and let csc $x = \frac{1}{\sin x}$ denote the cosecant of x. If $p_{0} < \frac{1}{2}$ , the function

ψ (a) : = Q_{0} + \frac{rϑ (R_{0} - Q_{0})}{λ} [\frac{π \csc (\frac{π \log a}{\log λ - \log μ})}{(\log λ - \log μ) {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log λ - \log μ}}} - \frac{1}{\log a} {}_{2}F_{1} (\begin{matrix} 1, \frac{\log a}{\log λ - \log μ} \\ 1 + \frac{\log a}{\log λ - \log μ} \end{matrix} ∣ \frac{- p_{0}}{1 - p_{0}})]

(15)

is a lower bound for Q^(a). If $p_{0} < \frac{λ}{λ + μ}$ ,

Ψ (a) : = Q_{0} + \frac{rϑ (R_{0} - Q_{0})}{λ} [\frac{π \csc (\frac{π \log a}{\log λ - \log μ})}{(\log λ - \log μ) {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log λ - \log μ}}} - \frac{1}{a \log a} {}_{2}F_{1} (\begin{matrix} 1, \frac{\log a}{\log λ - \log μ} \\ 1 + \frac{\log a}{\log λ - \log μ} \end{matrix} ∣ \frac{- p_{0} μ}{(1 - p_{0}) λ})]

(16)

is an upper bound for Q^(a).
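The bounds of Theorem 1 can be evaluated with elementary functions and a short power series. The Python sketch below is an illustration under stated assumptions: it uses the fact that, for the parameters occurring here, ₂F₁(1, ρ; 1 + ρ | z) reduces to Σ ρ/(ρ + k) z^k for |z| < 1; it takes the argument of the hypergeometric function in the upper bound to be −p₀μ/((1 − p₀)λ), which is the value obtained from G₁(−1) in the proof; and all function names and parameter values are hypothetical.

```python
import math


def hyp2f1_special(rho, z, terms=400):
    """2F1(1, rho; 1 + rho | z) for |z| < 1, as the series sum_k rho / (rho + k) z^k."""
    total, power = 0.0, 1.0
    for k in range(terms):
        total += rho / (rho + k) * power
        power *= z
    return total


def bounds(a, Q0, R0, p0, r, vt, lam, mu):
    """Return (psi(a), Psi(a)) as in (15) and (16)."""
    eta = math.log(lam) - math.log(mu)
    rho = math.log(a) / eta
    c = p0 / (1 - p0)
    scale = r * vt * (R0 - Q0) / lam
    common = math.pi / math.sin(math.pi * rho) / (eta * c ** rho)
    lower = Q0 + scale * (common - hyp2f1_special(rho, -c) / math.log(a))
    upper = Q0 + scale * (common
                          - hyp2f1_special(rho, -c * mu / lam) / (a * math.log(a)))
    return lower, upper


def q_of_a(a, Q0, R0, p0, r, vt, lam, mu, n=400):
    """Q^(a) from (14), truncated after n terms."""
    odds, power, total = p0 / (1 - p0), 1.0, 0.0
    for _ in range(n):
        total += power / (odds + 1)
        power *= a
        odds *= lam / mu
    return Q0 + r * vt * (R0 - Q0) / lam * total


# Illustrative parameters (assumed values); vt is vartheta for m = 2.
lam, mu, vt = 0.85, 0.55, 0.5 * 0.08 / 0.9 + 0.5 * 0.9 / 1.9
Q0, R0, p0, r = 0.2, 0.8, 0.01, 0.01
a = 1 - r * vt / lam
psi_lower, psi_upper = bounds(a, Q0, R0, p0, r, vt, lam, mu)
q_a = q_of_a(a, Q0, R0, p0, r, vt, lam, mu)
print(psi_lower, q_a, psi_upper)
```

The series used for the hypergeometric function requires p₀ < 1/2 for the lower bound and p₀ < λ/(λ + μ) for the upper bound, which are the conditions stated in the theorem.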

Proof

Consider the function

g (x) = \frac{a^{x}}{c b^{x} + 1}

(17)

for x > −1 with a ∈ (0, 1), and let

b : = \frac{λ}{μ} > 1, and c : = \frac{p_{0}}{1 - p_{0}} .

(18)

By calculating the derivative, which is given by

g^{'} (x) = \frac{a^{x} (\log a + c b^{x} (\log a - \log b))}{{(1 + c b^{x})}^{2}}

and obviously always negative, we recognize that g is monotone decreasing as a function in x. Thus, we obtain the estimate

\int_{0}^{\infty} g (x) d x \leq \sum_{τ = 0}^{\infty} g (τ) \leq \int_{0}^{\infty} g (x - 1) d x = \int_{- 1}^{\infty} g (x) d x .

(19)
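The estimate (19) is the usual comparison of a sum with integrals for a decreasing function. It can be checked numerically with a simple quadrature; the Python sketch below uses a composite Simpson rule, an upper integration limit of 200 in place of infinity, and illustrative values of a, b, and c, all of which are assumptions.

```python
def g(x, a, b, c):
    """The function g from (17)."""
    return a ** x / (c * b ** x + 1)


def simpson(f, lo, hi, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3


# Illustrative values (assumed): a in (0, 1), b = lam / mu > 1, c = p0 / (1 - p0).
a, b, c = 0.99, 0.85 / 0.55, 0.01 / 0.99
f = lambda x: g(x, a, b, c)

cutoff = 200                                   # g is negligible beyond this point
lower_integral = simpson(f, 0.0, cutoff, 20000)
upper_integral = simpson(f, -1.0, cutoff, 20100)
series = sum(f(tau) for tau in range(cutoff))
print(lower_integral, series, upper_integral)
```

The difference between the two integrals is the integral of g over [−1, 0], which is at most g(−1), so the two bounds enclose the sum within less than one unit of the summand.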

Since we have

Q^{(a)} = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \sum_{τ = 0}^{\infty} g (τ),

it follows from (19) that

ψ (a) = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \int_{0}^{\infty} g (x) d x

(20)

is a lower bound for Q^(a), and that

Ψ (a) = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \int_{- 1}^{\infty} g (x) d x

(21)

is an upper bound for Q^(a).

We will now manipulate the above equations. First, rewrite g as

g (x) = \frac{e^{xξ}}{1 + {ce}^{xη}},

(22)

where

ξ : = \log a and η : = \log b .

(23)

Assume first that $c e^{xη} < 1$. Then the factor $\frac{1}{1 + c e^{xη}}$ can be expanded into a geometric series, so that g can be rewritten as

g (x) = e^{xξ} \sum_{k = 0}^{\infty} {(- {ce}^{xη})}^{k} = \sum_{k = 0}^{\infty} {(- c)}^{k} e^{x (ξ + kη)},

(24)

and converges absolutely for all x satisfying $c e^{xη} < 1$. If $c e^{xη} > 1$, we have $\frac{1}{c e^{xη}} < 1$ and can expand (22) into the power series

g (x) = \frac{e^{xξ}}{{ce}^{xη}} \frac{1}{1 - \frac{- 1}{{ce}^{xη}}} = \frac{e^{xξ}}{{ce}^{xη}} \sum_{k = 0}^{\infty} {(- {ce}^{xη})}^{- k} = - \sum_{k = 0}^{\infty} {(- c)}^{- (k + 1)} e^{x (- η (k + 1) + ξ)},

which converges absolutely for all x satisfying $c e^{xη} > 1$.

Let us further define $x_{0} : = - \frac{\log c}{\log b}$ . The assumption $p_{0} < \frac{1}{2}$ guarantees x₀ > 0. The absolute convergence of the above series enables us to integrate term by term, i.e., for x < x₀ we have

\begin{array}{rcl} G_{1} (x) & = & \int g (x) d x = \int \sum_{k = 0}^{\infty} {(- c)}^{k} e^{x (ηk + ξ)} d x = \sum_{k = 0}^{\infty} \int {(- c)}^{k} e^{x (ηk + ξ)} d x \end{array}

(25a)

\begin{array}{rcl} & = & \sum_{k = 0}^{\infty} {(- c)}^{k} \frac{e^{x (ηk + ξ)}}{ηk + ξ} = \frac{e^{xξ}}{η} \sum_{k = 0}^{\infty} {(- c)}^{k} \frac{e^{xηk}}{k + \frac{ξ}{η}} \\ & = & \frac{e^{x ξ}}{ξ} \sum_{k = 0}^{\infty} \frac{\frac{ξ}{η}}{k + \frac{ξ}{η}} {(- c e^{xη})}^{k} \end{array}

(25b)

= \frac{e^{xξ}}{ξ} \sum_{k = 0}^{\infty} \frac{{(\frac{ξ}{η})}_{\bar{k}} {(1)}_{\bar{k}}}{{(1 + \frac{ξ}{η})}_{\bar{k}}} \frac{{(- {ce}^{xη})}^{k}}{k!} = \frac{e^{xξ}}{ξ} {}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - {ce}^{xη}) .

(25c)

Note that we have used the fact that $\frac{ξ}{η} \neq - k$ for all k ∈ ℕ in (25b), which is guaranteed by the assumption $a \neq {(\frac{μ}{λ})}^{k}$ for all k ∈ ℕ.

Similarly, for x > x₀ we have

\begin{array}{rcl} G_{2} (x) & = & \int g (x) d x = \int - \sum_{k = 0}^{\infty} {(- c)}^{- (k + 1)} e^{x (- η (k + 1) + ξ)} d x \\ & = & - \sum_{k = 0}^{\infty} \int {(- c)}^{- (k + 1)} e^{x (- η (k + 1) + ξ)} d x \end{array}

(26a)

\begin{array}{rcl} & = & - \sum_{k = 0}^{\infty} {(- c)}^{- (k + 1)} \frac{e^{x (- η (k + 1) + ξ)}}{- η (k + 1) + ξ} \\ & = & \frac{- e^{x ξ}}{η} \sum_{k = 0}^{\infty} {(- c)}^{- (k + 1)} \frac{e^{- x η (k + 1)}}{- (k + 1) + \frac{ξ}{η}} \end{array}

(26b)

= \frac{- e^{xξ}}{η} \sum_{k = 1}^{\infty} {(- c)}^{- k} \frac{e^{- x η k}}{- k + \frac{ξ}{η}} = \frac{- e^{xξ}}{ξ} \sum_{k = 1}^{\infty} \frac{- \frac{ξ}{η}}{k - \frac{ξ}{η}} {(\frac{- 1}{c e^{xη}})}^{k}

(26c)

\begin{matrix} = & \frac{- e^{xξ}}{ξ} \sum_{k = 1}^{\infty} \frac{{(- \frac{ξ}{η})}_{\bar{k}} {(1)}_{\bar{k}}}{{(1 - \frac{ξ}{η})}_{\bar{k}}} \frac{{(\frac{- 1}{c e^{x η}})}^{k}}{k!} \\ = & \frac{e^{x ξ}}{ξ} [1 - {}_{2}F_{1} (\begin{matrix} 1, - \frac{ξ}{η} \\ 1 - \frac{ξ}{η} \end{matrix} ∣ - \frac{1}{c e^{xη}})] . \end{matrix}

(26d)

It should be mentioned that we did not need the assumption $a \neq {(\frac{μ}{λ})}^{k}$ for all k ∈ ℕ in (26).
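As a sanity check on (25c) and (26d), one can differentiate G₁ and G₂ numerically and compare with g. The Python sketch below does this with a central difference; the series are written in the simplified forms Σ ρ/(k + ρ) z^k and Σ (−ρ)/(k − ρ) z^k with ρ = ξ/η, and the function names, step size, evaluation points, and parameter values are assumptions for illustration.

```python
import math


def g(x, xi, eta, c):
    """g from (22)."""
    return math.exp(x * xi) / (1 + c * math.exp(x * eta))


def G1(x, xi, eta, c, terms=200):
    """Antiderivative (25c), valid where c * exp(x * eta) < 1."""
    rho, z = xi / eta, -c * math.exp(x * eta)
    total, power = 0.0, 1.0
    for k in range(terms):
        total += rho / (k + rho) * power
        power *= z
    return math.exp(x * xi) / xi * total


def G2(x, xi, eta, c, terms=200):
    """Antiderivative (26d), valid where c * exp(x * eta) > 1."""
    rho, z = xi / eta, -1.0 / (c * math.exp(x * eta))
    # 1 - 2F1(1, -rho; 1 - rho | z) equals minus the sum over k >= 1 below.
    tail, power = 0.0, z
    for k in range(1, terms):
        tail += -rho / (k - rho) * power
        power *= z
    return -math.exp(x * xi) / xi * tail


def central_difference(F, x, h=1e-5):
    """Numerical derivative of F at x."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Illustrative values (assumed): a = 0.99, lam / mu = 0.85 / 0.55, p0 = 0.01.
xi, eta, c = math.log(0.99), math.log(0.85 / 0.55), 0.01 / 0.99
x0 = -math.log(c) / eta                       # boundary between the two expansions

left, right = 2.0, 20.0                       # one point on each side of x0
err_left = abs(central_difference(lambda x: G1(x, xi, eta, c), left) - g(left, xi, eta, c))
err_right = abs(central_difference(lambda x: G2(x, xi, eta, c), right) - g(right, xi, eta, c))
print(x0, err_left, err_right)
```

Both errors are at the level expected from the finite-difference step, which supports the signs and prefactors in (25c) and (26d).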

Next, assume $p_{0} < \frac{1}{2}$ , i.e., x₀ > 0. Hence, we have

\begin{array}{rcl} \int_{0}^{\infty} g (x) d x & = & \int_{0}^{x_{0}} g (x) d x + \int_{x_{0}}^{\infty} g (x) d x \\ & = & \lim_{x \to x_{0} -} G_{1} (x) - G_{1} (0) + \lim_{x \to + \infty} G_{2} (x) - \lim_{x \to x_{0} +} G_{2} (x) . \end{array}

Since $\frac{1}{k + \frac{ξ}{η}} \to 0$ monotonically as k → ∞, the Leibniz criterion for alternating series implies that

G_{1} (x_{0}) = \frac{e^{- \frac{ξ \log c}{η}}}{η} \sum_{k = 0}^{\infty} \frac{{(- 1)}^{k}}{k + \frac{ξ}{η}}

(27)

converges, i.e., −∞ < G₁(x₀) < ∞. Furthermore, by a similar argument we obtain −∞ < G₂(x₀) < ∞. We also see that

\begin{matrix} | G_{2} (x) | & = & | \frac{- e^{xξ}}{η} \sum_{k = 1}^{\infty} {(- c)}^{- k} \frac{e^{- xηk}}{- k + \frac{ξ}{η}} | \leq \frac{e^{xξ}}{η} \sum_{k = 1}^{\infty} | {(- c)}^{- k} \frac{e^{- xηk}}{- k + \frac{ξ}{η}} | \\ = & \frac{e^{xξ}}{η} \sum_{k = 1}^{\infty} {| \frac{1}{{ce}^{xη}} |}^{k} \frac{1}{k - \frac{ξ}{η}} \leq \frac{e^{x ξ}}{η} \sum_{k = 1}^{\infty} {| \frac{1}{{ce}^{xη}} |}^{k} = \frac{e^{xξ}}{η (c e^{xη} - 1)} . \end{matrix}

Hence, for x > x₀, i.e., for $c e^{xη} > 1$ , the series representation (26d) of G₂(x) is absolutely convergent. Therefore, we obtain

\lim_{x \to x_{0} +} G_{2} (x) = G_{2} (x_{0})

(28)

and

\lim_{x \to + \infty} G_{2} (x) = \frac{0}{ξ} [1 - {}_{2}F_{1} (\begin{matrix} 1, - \frac{ξ}{η} \\ 1 - \frac{ξ}{η} \end{matrix} ∣ 0)] = 0 .

(29)

Similarly, we see that the series representation of G₁(x) is absolutely convergent for x < x₀. Therefore, we have

\lim_{x \to x_{0} -} G_{1} (x) = G_{1} (x_{0}) .

(30)

We obtain

\begin{array}{l} \int_{0}^{\infty} g (x) d x & = & G_{1} (x_{0}) - G_{1} (0) - G_{2} (x_{0}) \\ = & \frac{e^{- \frac{ξ \log c}{η}}}{ξ} {}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - 1) - \frac{1}{ξ} {}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - c) - \frac{e^{- \frac{ξ \log c}{η}}}{ξ} [1 - {}_{2}F_{1} (\begin{matrix} 1, - \frac{ξ}{η} \\ 1 - \frac{ξ}{η} \end{matrix} ∣ - 1)] . \end{array}

(31)

If $p_{0} < \frac{λ}{λ + μ}$ , we have x₀ > − 1. The above calculations are still valid and yield

\begin{array}{l} \int_{- 1}^{\infty} g (x) d x & = & G_{1} (x_{0}) - G_{1} (- 1) - G_{2} (x_{0}) \\ = & \frac{e^{- \frac{ξ \log c}{η}}}{ξ} {}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - 1) - \frac{e^{- ξ}}{ξ} {}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - {ce}^{- η}) - \frac{e^{- \frac{ξ \log c}{η}}}{ξ} [1 - {}_{2}F_{1} (\begin{matrix} 1, - \frac{ξ}{η} \\ 1 - \frac{ξ}{η} \end{matrix} ∣ - 1)] . \end{array}

(32)

Let us now simplify (31) and (32). Note that

\begin{array}{l} A & : = & {}_{2}F_{1} (\begin{matrix} 1, κ \\ 1 + κ \end{matrix} ∣ - 1) + {}_{2}F_{1} (\begin{matrix} 1, - κ \\ 1 - κ \end{matrix} ∣ - 1) \\ = & \sum_{k = 0}^{\infty} \frac{{(1)}_{\bar{k}} {(κ)}_{\bar{k}}}{{(κ + 1)}_{\bar{k}}} \frac{{(- 1)}^{k}}{k!} + \sum_{k = 0}^{\infty} \frac{{(1)}_{\bar{k}} {(- κ)}_{\bar{k}}}{{(- κ + 1)}_{\bar{k}}} \frac{{(- 1)}^{k}}{k!} \\ = & \sum_{k = 0}^{\infty} \frac{{(- 1)}^{k} κ}{κ + k} + \sum_{k = 0}^{\infty} \frac{- {(- 1)}^{k} κ}{- κ + k} = \sum_{k = 0}^{\infty} \frac{2 κ^{2} {(- 1)}^{k}}{κ^{2} - k^{2}} \\ = & 2 κ^{2} π^{2} \sum_{k = 0}^{\infty} \frac{{(- 1)}^{k}}{κ^{2} π^{2} - k^{2} π^{2}} . \end{array}

By using the well-known series

\csc (z) = - \frac{1}{z} + 2 z \sum_{k = 0}^{\infty} \frac{{(- 1)}^{k}}{z^{2} - k^{2} π^{2}} for \frac{z}{π} \notin ℤ,

we arrive at

A = π κ \csc (π κ) + 1 .

Because $a \neq {(\frac{μ}{λ})}^{k}$ for k ∈ ℕ, we have $\frac{ξ}{η} \neq - k$ for all k ∈ ℕ. Because we assume a ∈ [0, 1] and λ > μ, we also have $a \neq {(\frac{μ}{λ})}^{k}$ for k ∈ ℤ⁻, i.e., $\frac{ξ}{η} \notin ℤ^{+}$ . Hence, we have $\frac{ξ}{η} \notin ℤ$ . Therefore, we obtain

\int_{0}^{\infty} g (x) d x = \frac{π \csc (π \frac{ξ}{η})}{η c^{\frac{ξ}{η}}} - \frac{1}{ξ} {}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - c)

and

\int_{- 1}^{\infty} g (x) d x = \frac{π \csc (π \frac{ξ}{η})}{{ηc}^{\frac{ξ}{η}}} - \frac{e^{- ξ}}{ξ} {}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - {ce}^{- η}) .

Combining the above with (20) and (21) and the definitions of ξ, η, and c immediately yields (15) and (16).
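As an illustrative numerical check (not part of the original derivation), the closed form for $\int_{0}^{\infty} g (x) d x$ can be compared with direct quadrature. The sketch below assumes that g(x) = e^{xξ}/(1 + c e^{xη}), which is the function the geometric series in (26a) sums to, and that ξ = log a, η = log λ − log μ and c = p₀/(1 − p₀). The parameter values and all function names are hypothetical and chosen only for illustration.

```python
import math

def hyp2f1_one(kappa, z, terms=100):
    """2F1(1, kappa; 1 + kappa | z) from its power series; requires |z| < 1."""
    return sum(kappa / (kappa + k) * z**k for k in range(terms))

def g(x, xi, eta, c):
    """Integrand, assumed here to be exp(x*xi) / (1 + c*exp(x*eta))."""
    return math.exp(x * xi) / (1.0 + c * math.exp(x * eta))

def integral_closed_form(xi, eta, c):
    """Closed form of the integral of g over [0, inf) for c < 1 and xi/eta not an integer."""
    kappa = xi / eta
    first = math.pi / (math.sin(math.pi * kappa) * eta * c**kappa)
    return first - hyp2f1_one(kappa, -c) / xi

def simpson(f, lo, hi, n=100_000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    odd = sum(f(lo + i * h) for i in range(1, n, 2))
    even = sum(f(lo + i * h) for i in range(2, n, 2))
    return h / 3.0 * (f(lo) + 4.0 * odd + 2.0 * even + f(hi))

# Hypothetical parameter values (p0 < 1/2, so c < 1 and x0 > 0)
a, lam, mu, p0 = 0.9, 1.0, 0.5, 0.01
xi = math.log(a)
eta = math.log(lam) - math.log(mu)
c = p0 / (1.0 - p0)

exact = integral_closed_form(xi, eta, c)
# The integrand decays like exp((xi - eta) * x), so truncating at x = 100 is harmless here.
numeric = simpson(lambda x: g(x, xi, eta, c), 0.0, 100.0)
print(exact, numeric)
```

The two printed numbers should agree to many digits. The series helper is only suitable for |z| < 1, which corresponds to the case p₀ < 1/2 treated above.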

The assumptions $p_{0} < \frac{1}{2}$ , or $p_{0} < \frac{λ}{λ + μ}$ hold whenever one aims to study the hitchhiking effect of a mutation that is initially rare. If the mutation is not initially rare, for instance because one is interested in mutations from standing genetic variation that become beneficial after a change in the environment, i.e., if one wants to study soft selective sweeps (cf. Hermisson and Pennings 2005), Theorem 1 is not applicable. The study of soft selective sweeps is relevant and hence we shall also treat the cases $p_{0} \geq \frac{1}{2}$ and $p_{0} \geq \frac{λ}{λ + μ}$ .

Remark 1

The proof of Theorem 1 reveals that we have to replace (15) by

ψ (a) = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ \log a} [1 - {}_{2}F_{1} (\begin{array}{l} 1, \frac{\log a}{\log μ - \log λ} \\ 1 + \frac{\log a}{\log μ - \log λ} \end{array} ∣ - \frac{1 - p_{0}}{p_{0}})]

(33)

if $p_{0} \geq \frac{1}{2}$ . Moreover, we have to replace (16) by

Ψ (a) = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λa \log a} [1 - {}_{2}F_{1} (\begin{matrix} 1, \frac{\log a}{\log μ - \log λ} \\ 1 + \frac{\log a}{\log μ - \log λ} \end{matrix} ∣ - \frac{(1 - p_{0}) λ}{p_{0} μ})]

(34)

if $p_{0} \geq \frac{λ}{λ + μ}$ . Notably, (33) and (34) also hold if $a = {(\frac{μ}{λ})}^{k}$ for k ∈ ℕ. Furthermore, (33) and (34) are the continuations of (15) and (16) as functions in p₀.

Theorem 1 assumes $a \neq {(\frac{μ}{λ})}^{k}$ for all k ∈ ℕ. If this assumption is violated we have to make the following adjustments.

Remark 2

Assume l ∈ ℕ. If $p_{0} < \frac{1}{2}$ ,

ψ ({(\frac{μ}{λ})}^{l}) = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ (\log μ - \log λ)} [\sum_{k = 0}^{l - 1} \frac{{(- 1)}^{k}}{k - l} {(\frac{p_{0}}{1 - p_{0}})}^{k} - {(\frac{- p_{0}}{1 - p_{0}})}^{l} \log p_{0}],

(35)

is a lower bound for $Q^{({(\frac{μ}{λ})}^{l})}$ . If $p_{0} < \frac{λ}{λ + μ}$ ,

Ψ ({(\frac{μ}{λ})}^{l}) = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ (\log μ - \log λ)} [\sum_{k = 0}^{l - 1} \frac{{(- 1)}^{k}}{k - l} {(\frac{μ}{λ})}^{k - l} {(\frac{p_{0}}{1 - p_{0}})}^{k} - {(\frac{- p_{0}}{1 - p_{0}})}^{l} \log (\frac{p_{0} μ}{p_{0} μ + (1 - p_{0}) λ})]

(36)

is an upper bound for $Q^{({(\frac{μ}{λ})}^{l})}$ . Moreover, (35) and (36) are the continuations of (15) and (16) in the limit $a \to {(\frac{μ}{λ})}^{l}$ . If $p_{0} \geq \frac{1}{2}$ , (35) has to be replaced by (33). If $p_{0} \geq \frac{λ}{λ + μ}$ , (36) has to be replaced by (34).

The proof can be found in Appendix A.1.

Let

\tilde{Ψ} : = Ψ (\tilde{Λ}), \bar{Ψ} : = Ψ (\bar{Λ}), and \underline{Ψ} : = Ψ (\underline{Λ})

(37)

and analogously

\tilde{ψ} : = ψ (\tilde{Λ}), \bar{ψ} : = ψ (\bar{Λ}), and \underline{ψ} : = ψ (\underline{Λ}) .

(38)

We immediately obtain the following corollary.

Corollary 1

We have Ψ̃ ≥ Ψ̅ ≥ Q̅ ≥ Q̂ ≥ ψ̲, hence Ψ̃ and Ψ̅ are upper bounds for Q̂, and ψ̲ is a lower bound for Q̂. Moreover, we have Ψ̃ ≥ Q̃ ≥ ψ̃, Ψ̅ ≥ Q̅ ≥ ψ̅, and Ψ̲ ≥ Q̲ ≥ ψ̲.

Figure 1 illustrates Q̂ and its bounds Q̲ and Q̅, as well as their bounds Ψ̅, Ψ̲, ψ̅, and ψ̲ for various parameters. It becomes obvious that the upper bound Q̅ is very close to Q̂, whereas the lower bound Q̲ greatly underestimates Q̂.

Fig. 1 — Equilibrium frequency Q̂ of the neutral allele N₁ and various bounds for Q̂ as a function of r for different parameter combinations. The panels show Q̂ along with its upper and lower bounds Q̅ and Q̲, as well as their respective upper and lower bounds Ψ̅ and ψ̅, and Ψ̲ and ψ̲. The parameters for the various plot panels are specified in the *boxes* above the panels. In all panels Q̂ and Q̅ are almost identical

It turns out that Q̃ and Q̅ are almost identical unless p₀ is large. The same holds for Ψ̃ and Ψ̅. This is illustrated in Fig. 2.

Although Ψ(a) and ψ(a) have closed-form expressions, these expressions involve the hypergeometric function. In the following we shall derive approximations for Ψ(a) and ψ(a) that make no use of the hypergeometric function. Our first step is the following theorem.

Theorem 2

For $p_{0} < \frac{1}{2}$ let

ϕ (a) : = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} [{(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} (\frac{1}{\log a} - \frac{1}{\log a + \log λ - \log μ} - \frac{1}{\log a - \log λ + \log μ}) - \frac{1}{\log a} + \frac{p_{0}}{(\log a + \log λ - \log μ) (1 - p_{0})}]

(39)

and, for $p_{0} < \frac{λ}{λ + μ}$ let

Φ (a) : = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} [{(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} (\frac{1}{\log a} - \frac{1}{\log a + \log λ - \log μ} - \frac{1}{\log a - \log λ + \log μ}) - \frac{1}{a \log a} + \frac{p_{0} μ}{a (\log a + \log λ - \log μ) (1 - p_{0}) λ}] .

(40)

If $p_{0} < \frac{1}{2}$ , ϕ(a) ≈ ψ(a) and if $p_{0} < \frac{λ}{λ + μ}$ , Φ(a) ≈ Ψ (a).

The proof can be found in Appendix A.1.

If $p_{0} \geq \frac{1}{2}$ , or $p_{0} \geq \frac{λ}{λ + μ}$ we have to modify the definitions of ϕ(a), or Φ(a), respectively.

Remark 3

For $p_{0} \geq \frac{1}{2}$ let

ϕ (a) : = Q_{0} - \frac{r ϑ (R_{0} - Q_{0}) (1 - p_{0})}{λ p_{0} (\log λ - \log μ - \log a)}

(41)

and, for $p_{0} \geq \frac{λ}{λ + μ}$ let

Φ (a) : = Q_{0} - \frac{r ϑ (R_{0} - Q_{0}) (1 - p_{0})}{μ p_{0} (\log λ - \log μ - \log a)} .

(42)

If $p_{0} \geq \frac{1}{2}$ , ϕ(a) ≈ ψ(a) and if $p_{0} \geq \frac{λ}{λ + μ}$ , Φ(a) ≈ Ψ(a). Moreover, (41) and (42) are the continuations of (39) and (40) regarded as functions in p₀.

The proof can be found in Appendix A.1.

From Theorem 2, Remark 3, and the definitions of Λ̲, Λ̅, and Λ̃ we immediately obtain the following corollary.

Corollary 2

Let Ψ, ψ, Φ and ϕ be defined as in Theorems 1 and 2, and Remarks 1–3. Furthermore, let

\tilde{Φ} : = Φ (\tilde{Λ}), \bar{Φ} : = Φ (\bar{Λ}), \underline{Φ} : = Φ (\underline{Λ})

(43)

and

\tilde{ϕ} : = ϕ (\tilde{Λ}), \bar{ϕ} : = ϕ (\bar{Λ}), \underline{ϕ} : = ϕ (\underline{Λ}) .

(44)

Then Φ̃ ≈ Ψ̃, Φ̅ ≈ Ψ̅, Φ̲ ≈ Ψ̲, ϕ̃ ≈ ψ̃, ϕ̅ ≈ ψ̅, and ϕ̲ ≈ ψ̲.

The approximations provided in Corollary 2 are very accurate unless p₀ becomes too large. This is illustrated in Fig. 3.

Note that we derived the upper and lower bounds for Q^(a) by using the estimate (19). However, we can use (19) for the standard estimate

\sum_{k = 0}^{\infty} g (k) \approx \frac{1}{2} \int_{- 1}^{\infty} g (x) d x + \frac{1}{2} \int_{0}^{\infty} g (x) d x .

(45)

Hence, the estimates summarized in the next corollary follow immediately.

Corollary 3

Estimates for Q̂ are given by

{\bar{Q}}^{*} : = \frac{1}{2} (\bar{Ψ} + \bar{ψ}) and {\tilde{Q}}^{*} : = \frac{1}{2} (\tilde{Ψ} + \tilde{ψ}),

(46)

and

{\bar{Q}}^{\circ} : = \frac{1}{2} (\bar{Φ} + \bar{ϕ}) and {\tilde{Q}}^{\circ} = \frac{1}{2} (\tilde{Φ} + \tilde{ϕ}) .

(47)

The approximations of Corollary 3 are illustrated in Fig. 4. It is obvious that the above approximations are very accurate for various parameters.
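The idea behind the estimate (45) can be illustrated numerically. Because g is positive and decreasing, the series is bracketed by the two integrals, and their average is typically much closer to the series than either bound. The following sketch assumes g(x) = e^{xξ}/(1 + c e^{xη}) with ξ = log a, η = log λ − log μ and c = p₀/(1 − p₀); the parameter values are hypothetical.

```python
import math

def g(x, xi, eta, c):
    """Summand/integrand, assumed to be exp(x*xi) / (1 + c*exp(x*eta))."""
    return math.exp(x * xi) / (1.0 + c * math.exp(x * eta))

def simpson(f, lo, hi, n=100_000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    odd = sum(f(lo + i * h) for i in range(1, n, 2))
    even = sum(f(lo + i * h) for i in range(2, n, 2))
    return h / 3.0 * (f(lo) + 4.0 * odd + 2.0 * even + f(hi))

# Hypothetical parameter values
a, lam, mu, p0 = 0.99, 1.0, 0.5, 0.01
xi = math.log(a)
eta = math.log(lam) - math.log(mu)
c = p0 / (1.0 - p0)
f = lambda x: g(x, xi, eta, c)

series = sum(f(k) for k in range(200))   # truncated sum over k = 0, 1, 2, ...
lower = simpson(f, 0.0, 150.0)           # integral over [0, inf), truncated
upper = simpson(f, -1.0, 150.0)          # integral over [-1, inf), truncated
midpoint = 0.5 * (lower + upper)         # right-hand side of (45)

print(lower, series, midpoint, upper)
```

For these values the gap between the two integrals is close to one, whereas the midpoint differs from the series only in the third decimal place.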

Although we already arrived at relatively simple approximations for Q̂, they are still difficult to interpret in terms of the involved parameters. In the following we will concentrate on the case in which the initial frequency p₀ of the resistant allele is small. This is the most relevant case for studying genetic hitchhiking.

3.2 Approximations for rare mutations

So far in our derivations we did not assume that the initial frequency p₀ of the resistant allele is small. However, since this is the biologically most relevant situation, we shall derive further approximations for the equilibrium frequencies of the neutral alleles under the assumption that the mutation is initially rare, i.e., p₀ ≈ 0 and 1 − p₀ ≈ 1. Since we have Λ̃ ≈ Λ̅ under this assumption we focus on estimates based on Λ̃.

For p₀ ≈ 0 we obtain the following theorem.

Theorem 3

Let

A (a) : = Q_{0} + \frac{rϑ (R_{0} - Q_{0})}{λ} [\frac{1}{\log a} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} - 1) - \frac{1 - a}{2 a \log a}] .

(48)

For p₀ ≈ 0 we have Q°(a) ≈ A(a).

The proof can be found in Appendix A.1.

In accordance with our previous notation we set

\bar{A} : = A (\bar{Λ}) and \tilde{A} : = A (\tilde{Λ}) .

(49)

These approximations are illustrated in Fig. 5.

Now, we shall additionally assume that a ≈ 1.

Theorem 4

Let a(x) = 1 − rx with $x \in [\frac{ϑ}{λ}, \frac{ϑ}{μ}]$ and

B (x) : = Q_{0} + \frac{ϑ (R_{0} - Q_{0})}{xλ} (1 - p_{0}^{\frac{rx}{\log λ - \log μ}}) .

(50)

For p₀ ≈ 0 and rx ≈ 0 we have Q°(a(x)) ≈ B(x).

Proof

We have log a(x) ≈ −rx. Hence,

\begin{matrix} Q^{\circ} (a (x)) \approx Q_{0} + \frac{rϑ (R_{0} - Q_{0})}{λ} [\frac{1}{- rx} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{rx}{\log λ - \log μ}} - 1) + \frac{rx}{2 (1 - rx) rx}] \\ = Q_{0} + \frac{ϑ (R_{0} - Q_{0})}{xλ} [1 - {(\frac{p_{0}}{1 - p_{0}})}^{\frac{rx}{\log λ - \log μ}} + \frac{rx}{2 (1 - rx)}] . \end{matrix}

If rx ≈ 0, we can neglect $\frac{rx}{2 (1 - rx)}$ and obtain

Q^{\circ} (a (x)) \approx Q_{0} + \frac{ϑ (R_{0} - Q_{0})}{xλ} (1 - {(\frac{p_{0}}{1 - p_{0}})}^{\frac{rx}{\log λ - \log μ}}) .

Moreover, because p₀ ≈ 0 we have $\frac{p_{0}}{1 - p_{0}} \approx p_{0}$ . Thus,

Q^{\circ} (a (x)) \approx Q_{0} + \frac{ϑ (R_{0} - Q_{0})}{xλ} (1 - p_{0}^{\frac{rx}{\log λ - \log μ}}) .

Note that we have $a (\frac{ϑ (p_{0} λ + (1 - p_{0}) μ)}{λμ}) = \bar{Λ}$ and $a (\frac{ϑ}{λ}) = \tilde{Λ}$ . Let us define

\bar{B} : = B (\frac{ϑ (p_{0} λ + (1 - p_{0}) μ)}{λμ}) and \tilde{B} : = B (\frac{ϑ}{λ}) .

(51)

We obtain the following corollary.

Corollary 4

We have

\tilde{B} = R_{0} - (R_{0} - Q_{0}) p_{0}^{\frac{rϑ}{λ (\log λ - \log μ)}} .

(52)

For p₀ ≈ 0 and $\frac{rϑ}{λ} \approx 0$ , we have Q̃° ≈ B̃.
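To see how close A(a) from (48) and B(x) from (50) are for a rare mutation and weak recombination, one can evaluate both directly. The sketch below is only an illustration: the values of r, ϑ, λ, μ, p₀, Q₀ and R₀ are hypothetical, and the function names are not taken from the original.

```python
import math

def A_approx(a, r, theta, lam, mu, p0, Q0, R0):
    """Approximation (48), written in terms of a."""
    log_a = math.log(a)
    expo = log_a / (math.log(mu) - math.log(lam))
    bracket = ((p0 / (1.0 - p0)) ** expo - 1.0) / log_a - (1.0 - a) / (2.0 * a * log_a)
    return Q0 + r * theta * (R0 - Q0) / lam * bracket

def B_approx(x, r, theta, lam, mu, p0, Q0, R0):
    """Approximation (50), written in terms of x, where a(x) = 1 - r*x."""
    expo = r * x / (math.log(lam) - math.log(mu))
    return Q0 + theta * (R0 - Q0) / (x * lam) * (1.0 - p0 ** expo)

# Hypothetical parameter values with p0 and r*x both small
r, theta, lam, mu, p0, Q0, R0 = 0.01, 0.5, 1.0, 0.5, 1e-3, 1.0, 0.5
x = theta / lam                      # the choice of x that gives a(x) = 1 - r*theta/lam

A_val = A_approx(1.0 - r * x, r, theta, lam, mu, p0, Q0, R0)
B_val = B_approx(x, r, theta, lam, mu, p0, Q0, R0)
# Closed form (52) for the same choice of x
B_closed = R0 - (R0 - Q0) * p0 ** (r * theta / (lam * (math.log(lam) - math.log(mu))))

print(A_val, B_val, B_closed)
```

The remaining difference between A_val and B_val is dominated by the term $\frac{rx}{2 (1 - rx)}$ that is dropped in the proof of Theorem 4.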

Figure 5 illustrates the approximations A̅, Ã, B̅, B̃. The approximations are very accurate unless p₀ is too large (p₀ ≳ 0.1), and they remain acceptably accurate even for p₀ = 0.1.

By expanding $p_{0}^{x}$ into a Taylor series around x = 0, and by setting $x = \frac{rϑ}{λ (\log λ - \log μ)}$ we obtain

\tilde{B} = R_{0} - (R_{0} - Q_{0}) \sum_{k = 0}^{\infty} \frac{r^{k}}{k!} {(\frac{ϑ}{λ (\log λ - \log μ)})}^{k} {(\log p_{0})}^{k} .

(53)

Hence, for r ≈ 0 we can further approximate this by neglecting terms of order O(r²) or O(r³) and higher, and obtain

\tilde{C} = Q_{0} - (R_{0} - Q_{0}) \frac{rϑ \log p_{0}}{λ (\log λ - \log μ)}

(54)

and

\tilde{D} = Q_{0} - (R_{0} - Q_{0}) \frac{rϑ \log p_{0}}{λ (\log λ - \log μ)} - (R_{0} - Q_{0}) \frac{r^{2}}{2} {(\frac{ϑ}{λ (\log λ - \log μ)})}^{2} {(\log p_{0})}^{2} .

(55)

Figure 6 shows Q̂ along with its approximations C̃ and D̃. It is obvious that these approximations are only accurate for very small r. Not surprisingly D̃ is a much better approximation than the linear approximation C̃.

Fig. 6 — Equilibrium frequency Q̂ of the neutral allele N₁ and various bounds of Q̂ as a function of r for various parameter combinations. The panels show Q̂ along with its upper bounds Q̅ and Q̃, as well as the approximations C̃ and D̃.
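The behaviour of the first- and second-order truncations can be reproduced with a few lines of code. The sketch below writes u = x log p₀ with $x = \frac{rϑ}{λ (\log λ - \log μ)}$ , so that $p_{0}^{x} = e^{u}$ is replaced by 1 + u (linear) or 1 + u + u²/2 (quadratic). Parameter values are hypothetical.

```python
import math

def b_c_d(r, theta, lam, mu, p0, Q0, R0):
    """Return (B, C, D): the closed form and its first- and second-order truncations."""
    u = r * theta / (lam * (math.log(lam) - math.log(mu))) * math.log(p0)  # u = x*log(p0) <= 0
    B = R0 - (R0 - Q0) * math.exp(u)     # exp(u) equals p0**x
    C = Q0 - (R0 - Q0) * u               # exp(u) replaced by 1 + u
    D = C - (R0 - Q0) * u**2 / 2.0       # exp(u) replaced by 1 + u + u**2/2
    return B, C, D

# Hypothetical parameter values
theta, lam, mu, p0, Q0, R0 = 0.5, 1.0, 0.5, 1e-3, 1.0, 0.5

B_small, C_small, D_small = b_c_d(0.01, theta, lam, mu, p0, Q0, R0)  # small r
B_large, C_large, D_large = b_c_d(0.5, theta, lam, mu, p0, Q0, R0)   # large r

print(abs(C_small - B_small), abs(D_small - B_small))
print(abs(C_large - B_large), abs(D_large - B_large))
```

For the small recombination rate the quadratic truncation is markedly closer than the linear one, while for the large rate both truncations are far from the closed form, in line with the remark that they are accurate only for very small r.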

3.3 The hitchhiking effect for general m

Since the sensitive allele goes extinct in the population, the equilibrium frequency of the neutral allele N₁ is given by (7), i.e., by

{\hat{Q}}^{(m)} = Q_{0} + \frac{r (R_{0} - Q_{0})}{λ} \sum_{τ = 0}^{\infty} \frac{ϑ_{p_{τ}, m}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{τ} + 1} \prod_{l = 0}^{τ - 1} Λ_{m, l} .

The above formula has the same structure as in the case m = 2. If we set m = 1 there is no recombination and hence the initial proportions of the neutral allele among genotypes with the resistant allele remain constant over time. Hence, we have ${\hat{Q}}^{(1)} = Q_{0}^{(1)}$ .

We shall now derive approximations for Q̂^(m). As is easily seen from (7), we can apply our approximations from the case m = 2 if we make the approximations $\prod_{l = 0}^{τ - 1} Λ_{m, l} \approx a^{τ}$ and ϑ_{p_τ,m} ≈ b for suitable constants a and b. We first approximate ϑ_{p_τ,m} by a constant. From (5b) we obtain

ϑ_{p_{τ}, m} = \sum_{k = 0}^{m - 2} (\underset{k}{m - 2}) p_{τ}^{k} {(1 - p_{τ})}^{m - 2 - k} \times (\frac{α (m - 1) W_{1}^{(T)} W_{3}^{(T)}}{(m - k - 1) W_{1}^{(T)} + (k + 1) W_{3}^{(T)}} + \frac{(1 - α) (m - 1) W_{1}^{(U)} W_{3}^{(U)}}{(m - k - 1) W_{1}^{(U)} + (k + 1) W_{3}^{(U)}}) .

Let

{\bar{ϑ}}^{(m)} : = \max_{k = 0, …, m - 2} {\frac{α (m - 1) W_{1}^{(T)} W_{3}^{(T)}}{(m - k - 1) W_{1}^{(T)} + (k + 1) W_{3}^{(T)}} + \frac{(1 - α) (m - 1) W_{1}^{(U)} W_{3}^{(U)}}{(m - k - 1) W_{1}^{(U)} + (k + 1) W_{3}^{(U)}}}

(56)

and

{\underline{ϑ}}^{(m)} : = \min_{k = 0, …, m - 2} {\frac{α (m - 1) W_{1}^{(T)} W_{3}^{(T)}}{(m - k - 1) W_{1}^{(T)} + (k + 1) W_{3}^{(T)}} + \frac{(1 - α) (m - 1) W_{1}^{(U)} W_{3}^{(U)}}{(m - k - 1) W_{1}^{(U)} + (k + 1) W_{3}^{(U)}}} .

(57)

Hence, we have

ϑ_{p_{τ}, m} \leq {\bar{ϑ}}^{(m)} \sum_{k = 0}^{m - 2} (\underset{k}{m - 2}) p_{τ}^{k} {(1 - p_{τ})}^{m - 2 - k} = {\bar{ϑ}}^{(m)},

where the last equality follows from the binomial formula. Similarly, we obtain ϑ_{p_τ,m} ≥ ϑ̲^(m).
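The bounding argument is simply that ϑ_{p,m} is a binomially weighted average of the m − 1 bracketed terms, so it must lie between their minimum and maximum. A minimal sketch, with hypothetical fitness values and function names introduced only for illustration:

```python
import math

def theta_k(k, m, alpha, W1T, W3T, W1U, W3U):
    """Bracketed term for a given k: treated plus untreated contribution."""
    treated = alpha * (m - 1) * W1T * W3T / ((m - k - 1) * W1T + (k + 1) * W3T)
    untreated = (1 - alpha) * (m - 1) * W1U * W3U / ((m - k - 1) * W1U + (k + 1) * W3U)
    return treated + untreated

def vartheta(p, m, **w):
    """Binomially weighted average of theta_k over k = 0, ..., m - 2."""
    return sum(math.comb(m - 2, k) * p**k * (1 - p) ** (m - 2 - k) * theta_k(k, m, **w)
               for k in range(m - 1))

def vartheta_bounds(m, **w):
    """Minimum and maximum of theta_k over k, as in (57) and (56)."""
    values = [theta_k(k, m, **w) for k in range(m - 1)]
    return min(values), max(values)

# Hypothetical fitness values: sensitive parasites do poorly in treated hosts
W = dict(alpha=0.5, W1T=0.1, W3T=1.0, W1U=1.0, W3U=0.9)
m = 5
lo, hi = vartheta_bounds(m, **W)
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, lo, vartheta(p, m, **W), hi)
```

At p = 0 the weighted average reduces to the k = 0 term and at p = 1 to the k = m − 2 term, so the bounds are attained at the extremes whenever the terms are monotone in k.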

Therefore, we obtain

1 - r \frac{{\bar{ϑ}}^{(m)}}{λμ} (\frac{p_{0} λ^{t + 1} + (1 - p_{0}) μ^{t + 1}}{p_{0} λ^{t} + (1 - p_{0}) μ^{t}}) \leq Λ_{m, t} \leq 1 - r \frac{{\underline{ϑ}}^{(m)}}{λμ} (\frac{p_{0} λ^{t + 1} + (1 - p_{0}) μ^{t + 1}}{p_{0} λ^{t} + (1 - p_{0}) μ^{t}}) .

Using the same approximation as in the case m = 2 yields

{\underline{Λ}}^{(m)} : = 1 - r \frac{{\bar{ϑ}}^{(m)}}{μ} \leq Λ_{m, t} \leq 1 - r \frac{{\underline{ϑ}}^{(m)}}{λ μ} (p_{0} λ + (1 - p_{0}) μ) = : {\bar{Λ}}^{(m)} .

Hence, by replacing ϑ and a by ϑ̅^(m) and Λ̅^(m), or ϑ̲^(m) and Λ̲^(m), respectively, we obtain the upper and lower bounds

{\bar{Q}}^{(m)} = Q_{0} + \frac{r {\bar{ϑ}}^{(m)} (R_{0} - Q_{0})}{λ} \sum_{k = 0}^{\infty} \frac{{({\bar{Λ}}^{(m)})}^{k}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{k} + 1}

and

{\underline{Q}}^{(m)} = Q_{0} + \frac{r {\underline{ϑ}}^{(m)} (R_{0} - Q_{0})}{λ} \sum_{k = 0}^{\infty} \frac{{({\underline{Λ}}^{(m)})}^{k}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{k} + 1},

respectively.

It turns out that the upper and lower bounds are very inaccurate estimates for Q̂^(m). However, it is possible to find relatively accurate approximations for Q̂^(m), at least if p₀ is small, which is the most important case for our current purpose. If p₀ ≈ 0, we have $\frac{p_{0}}{1 - p_{0}} \approx 0$ , so that we obtain

\begin{array}{l} ϑ_{p_{τ}, m} & \approx & \sum_{k = 0}^{m - 2} (\underset{k}{m - 2}) {(\frac{p_{0}}{1 - p_{0}})}^{k} {(\frac{λ}{μ})}^{t (k)} \\ \times (\frac{α (m - 1) W_{1}^{(T)} W_{3}^{(T)}}{(m - k - 1) W_{1}^{(T)} + (k + 1) W_{3}^{(T)}} + \frac{(1 - α) (m - 1) W_{1}^{(U)} W_{3}^{(U)}}{(m - k - 1) W_{1}^{(U)} + (k + 1) W_{3}^{(U)}}) \\ \approx & \frac{α (m - 1) W_{1}^{(T)} W_{3}^{(T)}}{(m - 1) W_{1}^{(T)} + W_{3}^{(T)}} + \frac{(1 - α) (m - 1) W_{1}^{(U)} W_{3}^{(U)}}{(m - 1) W_{1}^{(U)} + W_{3}^{(U)}} = : {\tilde{ϑ}}^{(m)} . \end{array}

(58)

From (5c) we see that ϑ̃^(m) = θ_{0,m}. Moreover, define

{\tilde{Λ}}^{(m)} : = 1 - \frac{r {\tilde{ϑ}}^{(m)}}{λ} .

(59)

Hence, by approximating ϑ_{p_τ,m} by ϑ̃^(m) and Λ_m,l by Λ̃^(m) we obtain

{\hat{Q}}^{(m)} \approx {\tilde{Q}}^{(m)} = Q_{0} + \frac{r {\tilde{ϑ}}^{(m)} (R_{0} - Q_{0})}{λ} \sum_{τ = 0}^{\infty} \frac{{({\tilde{Λ}}^{(m)})}^{τ}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{τ} + 1} .

(60)

This approximation turns out to be sufficiently accurate unless p₀ is too large.

Clearly, Q̃^(m) can be further approximated as in the case m = 2. However, note that for m > 2 the assumption that p₀ is small is always implicitly made because otherwise the approximation (58) for ϑ_{p_τ,m} will be inaccurate. Hence, only those approximations that were derived under the assumption of p₀ ≈ 0 are meaningful if m > 2, i.e., those of Theorem 4 and Corollary 4. We can therefore summarize:

Result 2

Let m ≥ 2 and

{\tilde{B}}^{(m)} : = R_{0} - (R_{0} - Q_{0}) p_{0}^{\frac{r {\tilde{ϑ}}^{(m)}}{λ (\log λ - \log μ)}} .

(61)

Then we have Q̂^(m) ≈ B̃^(m).
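A direct comparison of the series (60) with the closed form (61) can be sketched as follows. The sketch assumes, for the basic model with one treated and one untreated class, that λ and μ are the mean fitnesses (1 − α)W₃^(U) + αW₃^(T) and (1 − α)W₁^(U) + αW₁^(T); all numerical values and function names are hypothetical.

```python
import math

def theta_tilde(m, alpha, W1T, W3T, W1U, W3U):
    """Constant approximation (58) of vartheta for small p0."""
    return (alpha * (m - 1) * W1T * W3T / ((m - 1) * W1T + W3T)
            + (1 - alpha) * (m - 1) * W1U * W3U / ((m - 1) * W1U + W3U))

def Q_tilde(m, r, p0, Q0, R0, lam, mu, terms=300, **w):
    """Series (60), truncated after `terms` terms (kept moderate to avoid float overflow)."""
    th = theta_tilde(m, **w)
    Lam = 1.0 - r * th / lam
    c = p0 / (1.0 - p0)
    s = sum(Lam**t / (c * (lam / mu) ** t + 1.0) for t in range(terms))
    return Q0 + r * th * (R0 - Q0) / lam * s

def B_tilde(m, r, p0, Q0, R0, lam, mu, **w):
    """Closed-form approximation (61)."""
    th = theta_tilde(m, **w)
    return R0 - (R0 - Q0) * p0 ** (r * th / (lam * (math.log(lam) - math.log(mu))))

# Hypothetical fitness values and mean fitnesses (assumed definitions of lambda and mu)
W = dict(alpha=0.5, W1T=0.1, W3T=1.0, W1U=1.0, W3U=0.9)
lam = (1 - W["alpha"]) * W["W3U"] + W["alpha"] * W["W3T"]
mu = (1 - W["alpha"]) * W["W1U"] + W["alpha"] * W["W1T"]
r, p0, Q0, R0 = 0.01, 1e-3, 1.0, 0.5

for m in range(2, 6):
    print(m, Q_tilde(m, r, p0, Q0, R0, lam, mu, **W), B_tilde(m, r, p0, Q0, R0, lam, mu, **W))
```

For these values the two columns differ only in the third or fourth decimal place. Note that this compares (61) with the approximation (60), not with the exact Q̂^(m), which would additionally require ϑ_{p_τ,m} and Λ_{m,l} from (5b) and (8).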

Note that the approximations become worse for large m. However, as noted in Schneider and Kim (2010), the differences in Q̂^(m) for different values of m become small for large m. Since in reality m is bounded by a maximum possible value, it should be sufficient to assume m < 10. For these values Result 2 is still accurate. This is illustrated in Fig. 7. Furthermore, note that Result 2 becomes delicate for large m, because it is based on the approximation (58), which becomes very inaccurate as m → ∞.

Fig. 7 — Equilibrium frequency Q̂ ^(m) (*thin lines*) of the neutral allele N₁ versus the approximation B^(m) (*thick dashed lines*) for various values of m. The parameters are as in Fig. 3

4 The general model

Schneider and Kim (2010) also presented two generalizations of their model. First, they argued that the differentiation into treated and untreated hosts is oversimplified, and that host heterogeneity should be taken into account. Host heterogeneity can reflect for instance different levels of drug concentration, drug decay, different levels of host-acquired immunity, different immune responses etc. Second, they argued that it is oversimplified to assume that each host is infected by the same number of parasites, i.e., that m is a fixed parameter, and showed how the model can be generalized to the case in which m follows a given frequency distribution.

We shall briefly summarize the two generalizations, and show how our results are generalized.

4.1 Host heterogeneity

Assume again a fixed proportion α of hosts is treated. We divide the treated and untreated hosts into various discrete “treated classes” and “untreated classes”. Let the proportion of infected hosts that fall into class j (j ∈ ℕ) be α_j. Let U ⊆ ℕ and T = ℕ\U be the sets of untreated and treated classes, respectively. Hence, we have $\sum_{j \in U} α_{j} = 1 - α$ and $\sum_{j \in T} α_{j} = α$ . Let us denote the fitnesses of parasite haplotypes carrying the sensitive and resistant allele in hosts that fall into class j by $W_{1}^{(j)}$ and $W_{2}^{(j)}$ , and $W_{3}^{(j)}$ and $W_{4}^{(j)}$ , respectively.

Note that in the original formulation of the model we have $W_{k}^{(U)} = \sum_{j \in U} \frac{α_{j}}{1 - α} W_{k}^{(j)} = E [W_{k}^{(j)} ∣ U]$ , and $W_{k}^{(T)} = \sum_{j \in T} \frac{α_{j}}{α} W_{k}^{(j)} = E [W_{k}^{(j)} ∣ T]$ . Hence, it can be regarded as the approximation that all untreated hosts are subsumed in just one class, in which $W_{k}^{(U)}$ is the mean fitness of the parasites among them and all treated hosts are subsumed in just one class, in which $W_{k}^{(T)}$ is the mean fitness of the parasites among them.

In the model accounting for host heterogeneity the equilibrium frequency of the neutral allele N₁ is still given by (7); however, the parameters λ and μ are given by

λ = \sum_{j = 0}^{\infty} α_{j} W_{3}^{(j)} = E [W_{3}^{(j)}] = (1 - α) E [W_{3}^{(j)} ∣ U] + α E [W_{3}^{(j)} ∣ T]

(62a)

and

μ = \sum_{j = 0}^{\infty} α_{j} W_{1}^{(j)} = E [W_{1}^{(j)}] = (1 - α) E [W_{1}^{(j)} ∣ U] + α E [W_{1}^{(j)} | T],

(62b)

whereas ϑ_p,m and Λ_m,t are still given by (5b) and (8), however in (5b) θ_k,m has to be replaced by

θ_{k, m} = (m - 1) \sum_{j = 0}^{\infty} \frac{α_{j} W_{1}^{(j)} W_{3}^{(j)}}{(m - 1 - k) W_{1}^{(j)} + (k + 1) W_{3}^{(j)}} .

(63)

Hence, in all approximations ϑ̃^(m) has to be replaced by

{\tilde{ϑ}}^{(m)} = \sum_{j = 0}^{\infty} \frac{α_{j} (m - 1) W_{1}^{(j)} W_{3}^{(j)}}{(m - 1) W_{1}^{(j)} + W_{3}^{(j)}} .

(64)

We summarize:

Result 3

Let m ≥ 2. The equilibrium frequency of the neutral allele N₁, Q̂^(m), is approximately given by

{\tilde{B}}^{(m)} : = R_{0} - (R_{0} - Q_{0}) p_{0}^{\frac{r {\tilde{ϑ}}^{(m)}}{λ (\log λ - \log μ)}},

(65)

where λ,μ, and ϑ̃^(m) are given by (62a), (62b), and (64), respectively.

It was mentioned in Schneider and Kim (2010) that accounting for host heterogeneity results in a more pronounced hitchhiking effect compared to the basic model if the values of λ and μ coincide, or more precisely if $W_{k}^{(U)} = E [W_{k}^{(j)} ∣ U]$ , and $W_{k}^{(T)} = E [W_{k}^{(j)} ∣ T]$ . The reason is that under this assumption ϑ̃^(m) given by (64) is smaller than if it is given by (58), and (for fixed m) the respective approximations for the hitchhiking effect correspond to standard hitchhiking with recombination rate $\frac{{\tilde{ϑ}}^{(m)} r}{λ}$ and initial frequency $\frac{μ p_{0}}{λ (1 - p_{0}) + μ p_{0}}$ . We will prove this in Sect. 5.
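The comparison between (64) and (58) can be made concrete with a small example. The sketch below evaluates ϑ̃^(m) once with several host classes and once after pooling the treated classes and the untreated classes into one class each with the corresponding mean fitnesses. The class proportions and fitness values are hypothetical, and the helper names are introduced only for this illustration.

```python
import math

def theta_tilde_classes(m, classes):
    """Formula (64); classes is a list of tuples (alpha_j, W1_j, W3_j)."""
    return sum(a * (m - 1) * w1 * w3 / ((m - 1) * w1 + w3) for a, w1, w3 in classes)

def pooled(classes):
    """Collapse a group of classes into one class carrying the group's mean fitnesses."""
    total = sum(a for a, _, _ in classes)
    mean_w1 = sum(a * w1 for a, w1, _ in classes) / total
    mean_w3 = sum(a * w3 for a, _, w3 in classes) / total
    return [(total, mean_w1, mean_w3)]

def mean_fitness(classes, index):
    """lambda (index 2, resistant) or mu (index 1, sensitive) as in (62a) and (62b)."""
    return sum(c[0] * c[index] for c in classes)

# Hypothetical classes: (proportion, W1 = sensitive fitness, W3 = resistant fitness)
treated = [(0.25, 0.05, 1.0), (0.25, 0.15, 1.0)]
untreated = [(0.30, 1.0, 0.95), (0.20, 1.0, 0.825)]

heterogeneous = treated + untreated
basic = pooled(treated) + pooled(untreated)

for m in range(2, 7):
    print(m, theta_tilde_classes(m, heterogeneous), theta_tilde_classes(m, basic))
```

By construction λ and μ are identical in both versions, while the heterogeneous value of ϑ̃^(m) is the smaller one for every m shown. This is what one would expect from Jensen's inequality, since each summand is a concave function of the pair of fitnesses.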

4.2 Number of co-infections

Now, assume that the number m of parasites that infect a host follows some probability distribution over the population. Let κ_m denote the probability that a host is infected by m ≥ 1 parasites. Naturally, we have $\sum_{m = 1}^{\infty} κ_{m} = 1$ .

As shown in Schneider and Kim (2010), the equilibrium frequency of the neutral allele N₁ is given by

{\hat{Q}}^{(κ)} = Q_{0} + \frac{r (R_{0} - Q_{0})}{λ} \sum_{τ = 0}^{\infty} \frac{ϑ_{p_{τ}, κ}}{\frac{p_{0}}{1 - p_{0}} {(\frac{λ}{μ})}^{τ} + 1} \prod_{l = 0}^{τ - 1} Λ_{κ, l},

(66a)

with

ϑ_{p_{τ}, κ} = \sum_{m = 1}^{\infty} κ_{m} ϑ_{p_{τ}, m} and Λ_{κ, t} = \sum_{m = 1}^{\infty} κ_{m} Λ_{m, t},

(67a)

where ϑ_{p_τ,m} and Λ_m,t are given by (5b) and (8) (possibly with λ, μ, ϑ_p,m, Λ_m,t, and θ_k,m adjusted as in Sect. 4.1 to incorporate host heterogeneity). Hence, the equilibrium frequency of the neutral allele N₁ can be approximated following the calculations of Sect. 3. We can summarize this in the following result.

Result 4

The equilibrium frequency of the neutral allele N₁, Q̂^(κ) is approximately given by

{\tilde{Q}}^{(κ)} \approx {\tilde{B}}^{(κ)} : = R_{0} - (R_{0} - Q_{0}) p_{0}^{\frac{r {\tilde{ϑ}}^{(κ)}}{λ (\log λ - \log μ)}}

(68)

with ${\tilde{ϑ}}^{(κ)} = \sum_{m = 1}^{\infty} κ_{m} {\tilde{ϑ}}^{(m)}$ , where λ, μ, and ϑ̃^(m) are given by (4) and (58), respectively. If one wants to incorporate host heterogeneity, λ, μ, and ϑ̃^(m) are given by (62) and (64), respectively.

5 Equilibrium heterozygosity and the hitchhiking effect

From now on we assume that m follows a probability distribution as in Sect. 4.2 and we account for host heterogeneity as in Sect. 4.1 unless otherwise mentioned.

Recall that the equilibrium heterozygosity is given by

\hat{H} = 2 {\hat{Q}}^{(κ)} (1 - {\hat{Q}}^{(κ)}) .

(69)

We have seen in the last section that the equilibrium frequency, and its upper and lower bounds and approximations are given by expressions of the form Q₀ + (R₀ − Q₀) A(r), where A is a function of the recombination rate that has to be chosen appropriately. For brevity we will suppress the dependence of Q̂^(κ) on r unless necessary. Let us regard R₀, i.e., the initial frequency of N₁ among sensitive parasites, as a random variable and the heterozygosity as a function of R₀. Let us write

\hat{H} = \hat{H} (R_{0}) = 2 (Q_{0} + (R_{0} - Q_{0}) A (r)) (1 - Q_{0} - (R_{0} - Q_{0}) A (r)) .

(70)

Given the initial frequency of the neutral allele N₁ is R₀, the beneficial mutation occurs initially in association with allele N₁ with probability R₀, and in association with allele N₂ with probability (1 − R₀). Hence, we have Q₀ = 1 with probability R₀ and Q₀ = 0 with probability (1 − R₀). Therefore, the average heterozygosity given R₀ is calculated to be

\begin{array}{l} E (\hat{H} ∣ R_{0}) & = & 2 R_{0} (1 + (R_{0} - 1) A (r)) (1 - R_{0}) A (r) + 2 (1 - R_{0}) R_{0} A (r) (1 - R_{0} A (r)) \\ = & 2 (R_{0} - R_{0}^{2}) (2 A (r) - A^{2} (r)) . \end{array}

(71)

Hence, according to the law of total expectation, the average heterozygosity is calculated to be

E (\hat{H}) = E (E (\hat{H} ∣ R_{0})) = 2 (E (R_{0}) - E (R_{0}^{2})) (2 A (r) - A^{2} (r)) .

(72)

The initial heterozygosity is given by H₀ = 2R₀(1 − R₀). Hence, the fraction of the expected equilibrium heterozygosity over the initial heterozygosity is given by

H (r) = \frac{E (\hat{H})}{E (H_{0})} = 2 A (r) - A^{2} (r),

(73)

which is independent of the distribution of R₀. Since A(0) = 0 we have H(0) = 0.
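The averaging step can be checked directly: for a fixed R₀, one averages (70) over the two possible values Q₀ = 1 and Q₀ = 0 and divides by the initial heterozygosity. The following sketch uses arbitrary illustrative values of R₀ and A(r); the function names are hypothetical.

```python
def equilibrium_heterozygosity(Q0, R0, A):
    """Heterozygosity (70) for an equilibrium frequency of the form Q0 + (R0 - Q0) * A."""
    Q_hat = Q0 + (R0 - Q0) * A
    return 2.0 * Q_hat * (1.0 - Q_hat)

def relative_heterozygosity(R0, A):
    """Average of (70) over Q0 in {1, 0}, divided by the initial heterozygosity 2*R0*(1 - R0)."""
    expected = (R0 * equilibrium_heterozygosity(1.0, R0, A)             # Q0 = 1 with probability R0
                + (1.0 - R0) * equilibrium_heterozygosity(0.0, R0, A))  # Q0 = 0 otherwise
    return expected / (2.0 * R0 * (1.0 - R0))

for R0 in (0.1, 0.5, 0.9):
    for A in (0.0, 0.3, 1.0):
        print(R0, A, relative_heterozygosity(R0, A), 2.0 * A - A**2)
```

In every row the last two columns coincide, and the value does not depend on R₀, which is why the ratio in (73) is independent of the distribution of R₀.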

If we further approximate Q̂ by B̃ given by (52), we obtain

H^{(κ)} (r) \approx {\tilde{H}}^{(κ)} = 1 - p_{0}^{\frac{2 r {\tilde{ϑ}}^{(κ)}}{λ (\log λ - \log μ)}} .

(74)

From (74) it is seen that H^(κ) shows a strong genome-wide reduction if κ₁ is large and selection is sufficiently strong. The approximation is illustrated in Fig. 8.

Fig. 8 — Relative expected heterozygosity H^(κ) (*thin lines*) of the neutral allele N₁ versus the approximation H̃^(κ) (*thick dashed lines*) for different distributions κ. In all cases we assume a truncated exponential distribution with range 1–10 and mean κ̅. By truncated we mean that the probability of m = 10 is the probability that m ≥ 10 for a poisson distribution with mean κ̅. The parameters are as in Fig. 3

For the special case that m is constant the above reduces to $H^{(m)} (r) \approx {\tilde{H}}^{(m)} = 1 - p_{0}^{\frac{2 r {\tilde{ϑ}}^{(m)}}{λ (\log λ - \log μ)}}$ which is increasing as a function of m. The reason is that ϑ̃^(m) is monotone increasing in m, either if it is given by (58) or in the case of host heterogeneity by (64).
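The dependence of the approximation on m, and on the distribution κ through ${\tilde{ϑ}}^{(κ)} = \sum_{m} κ_{m} {\tilde{ϑ}}^{(m)}$ , can be explored with the following sketch. It uses the basic-model expression (58) for ϑ̃^(m) and assumes that λ and μ are the mean fitnesses of resistant and sensitive parasites; all numerical values, the two example distributions, and the function names are hypothetical.

```python
import math

def theta_tilde(m, alpha, W1T, W3T, W1U, W3U):
    """Formula (58); equals 0 for m = 1 (no recombination between distinct strains)."""
    return (alpha * (m - 1) * W1T * W3T / ((m - 1) * W1T + W3T)
            + (1 - alpha) * (m - 1) * W1U * W3U / ((m - 1) * W1U + W3U))

def theta_kappa(kappa, **w):
    """Mixture over the number of co-infections; kappa maps m to its probability."""
    return sum(prob * theta_tilde(m, **w) for m, prob in kappa.items())

def H_tilde(theta, r, p0, lam, mu):
    """Relative heterozygosity approximation of the form (74)."""
    return 1.0 - p0 ** (2.0 * r * theta / (lam * (math.log(lam) - math.log(mu))))

# Hypothetical fitness values and assumed mean fitnesses
W = dict(alpha=0.5, W1T=0.1, W3T=1.0, W1U=1.0, W3U=0.9)
lam = (1 - W["alpha"]) * W["W3U"] + W["alpha"] * W["W3T"]
mu = (1 - W["alpha"]) * W["W1U"] + W["alpha"] * W["W1T"]
r, p0 = 0.01, 1e-3

H_by_m = [H_tilde(theta_tilde(m, **W), r, p0, lam, mu) for m in range(1, 11)]

mostly_single = {1: 0.8, 2: 0.15, 3: 0.05}      # hypothetical distribution with large kappa_1
mostly_multiple = {1: 0.1, 2: 0.3, 3: 0.6}      # hypothetical distribution with small kappa_1
H_single = H_tilde(theta_kappa(mostly_single, **W), r, p0, lam, mu)
H_multiple = H_tilde(theta_kappa(mostly_multiple, **W), r, p0, lam, mu)

print(H_by_m)
print(H_single, H_multiple)
```

The list H_by_m starts at zero for m = 1 and increases with m, and the distribution with many single infections yields the smaller relative heterozygosity, i.e., the stronger reduction.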

It was mentioned in Schneider and Kim (2010) that accounting for host heterogeneity results in a more pronounced hitchhiking compared to the basic model if the values of λ and μ coincide, or, more precisely, if $W_{k}^{(U)} = E [W_{k}^{(j)} ∣ U]$ and $W_{k}^{(T)} = E [W_{k}^{(j)} ∣ T]$ . The reason is that, for every m, under these assumptions ϑ̃^(m) given by (64) is smaller than if it is given by (58). Hence, this holds also for the corresponding values of ϑ̃^(κ), and consequently (74) is smaller if host heterogeneity is incorporated. We shall summarize this as a remark and prove it in the appendix.

Remark 4

Host heterogeneity leads to an increased hitchhiking effect, i.e., to a stronger reduction in relative heterozygosity H̃^(κ)(r), compared to the basic model with only one class of treated and one class of untreated hosts with corresponding fitness parameters, i.e., $W_{k}^{(U)} = E [W_{k}^{(j)} ∣ U]$ , and $W_{k}^{(T)} = E [W_{k}^{(j)} ∣ T]$ .

For fixed large m the differences in H̃^(m) become very small and vanish in the limit m → ∞ because H̃^(m) approaches the classical approximation for standard hitchhiking.

Note that the last statement of the remark is delicate, because the approximation H̃^(m) will be inaccurate for very large m.

We can use (73), or (74) to calculate the maximum recombination distance for which a given reduction in relative heterozygosity can be observed. This is relevant for predicting the width of the valley of relative heterozygosity for given selection parameters. By comparison of (73), or (74) with empirical data on the relative heterozygosity, it is possible to re-evaluate or validate parameter estimates (e.g., for α, λ, μ, etc.).

From (73) we obtain the maximum recombination rate, r̂, for which the relative heterozygosity is smaller than β, by solving the equation H(r̂) = β. Solving this equation first with respect to A(r̂), i.e., solving 2A(r̂) − A²(r̂) = β for the root in [0, 1], we obtain

A (\hat{r}) = 1 - \sqrt{1 - β} .

(75)

By using the approximation according to (74), we obtain

\hat{r} = \frac{λ \log (1 - β) (\log λ - \log μ)}{2 {\tilde{ϑ}}^{(κ)} \log p_{0}} .

(76)

Figure 9 illustrates the valley of reduced heterozygosity as a function of α for different distributions κ of m. Such illustrations can be used to determine the range of parameters that lead to a given reduction in relative heterozygosity.

Fig. 9 — Contour plot of the valley of reduced heterozygosity for different distributions κ. In all cases we assume a truncated exponential distribution with range 1–10 and mean κ̅. By truncated we mean that the probability of m = 10 is the probability that m ≥ 10 for a poisson distribution with mean κ̅. In all panels the same selection parameters are assumed. The parameters are summarized in the *boxes* above the panels
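The width of the valley can be computed by inverting the approximation (74) directly: setting $1 - p_{0}^{\frac{2 r {\tilde{ϑ}}^{(κ)}}{λ (\log λ - \log μ)}} = β$ and solving for r. The sketch below does this and checks the result by substituting back. The values of ϑ̃^(κ), λ, μ and p₀ are hypothetical, and the function names are introduced only for illustration.

```python
import math

def H_tilde(r, theta, lam, mu, p0):
    """Relative heterozygosity approximation of the form (74)."""
    return 1.0 - p0 ** (2.0 * r * theta / (lam * (math.log(lam) - math.log(mu))))

def r_hat(beta, theta, lam, mu, p0):
    """Recombination rate at which H_tilde equals beta (0 <= beta < 1)."""
    return (lam * (math.log(lam) - math.log(mu)) * math.log(1.0 - beta)
            / (2.0 * theta * math.log(p0)))

# Hypothetical parameter values
theta, lam, mu, p0 = 0.28, 0.95, 0.55, 1e-3

for beta in (0.1, 0.5, 0.9):
    r = r_hat(beta, theta, lam, mu, p0)
    print(beta, r, H_tilde(r, theta, lam, mu, p0))
```

Substituting r̂ back into the approximation returns β, and r̂ grows with β, so a milder reduction in heterozygosity extends over a larger recombination distance.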

6 Discussion

We obtained a closed-form approximation for the expected heterozygosity shaped by genetic hitchhiking in the model of antimalarial drug-resistance evolution proposed in Schneider and Kim (2010). This model aims to capture the effect of multiple infections per host (m) and drug-treatment rate (α), which are considered the most important epidemiological parameters that characterize geographic differences in the dynamics of drug resistance (Escalante et al. 2009), as well as the complex malaria transmission cycle on the pattern of selective sweeps caused by drug-resistant mutations (Daily 2006; Prugnolle et al. 2009). Due to the complexity of the model, the exact solution is expressed as a sum of infinitely many (or a very large number of) terms. Here, we provided numerous approximate solutions with varying degrees of accuracy. Notably, using the assumption that the starting frequency of the resistant allele under positive directional selection is low, we could obtain a solution that is simple enough to allow clear biological interpretations regarding the effects of epidemiological parameters. Furthermore, our approximations are flexible enough to incorporate arbitrary distributions of hosts with different infection rates and/or host heterogeneity, e.g., arbitrary distributions of hosts with different drug concentrations. The latter condition, arising due to a slow decay of antimalarial drugs in the bodies of treated patients, was demonstrated to be crucial for the initiation of drug resistance evolution (Hastings et al. 2002). For more discussion on how this model can be used to predict the spread of resistance and its speed, and how it can be used to design ‘optimal’ treatment strategies we refer to Schneider and Kim (2010).

The mean fitness of the resistant parasites, λ, and of the sensitive parasites, μ, are crucial in the considered model. If λ and μ are not too different, log λ − log μ corresponds to the selection coefficient of the beneficial (resistant) mutation. Then, our approximation (74) is essentially identical to that obtained under the standard hitchhiking model by Maynard Smith and Haigh (1974), except for the modifying factors of the recombination rate, $\frac{{\tilde{ϑ}}^{(κ)}}{λ}$ , and of the initial frequency, $\frac{μ}{λ (1 - p_{0}) + μ p_{0}}$ . Therefore, the dynamics of hitchhiking unique to this malaria model are summarized by these factors. We have $\frac{{\tilde{ϑ}}^{(κ)}}{λ} < 1$ because $\frac{(m - 1) W_{1}^{(j)}}{(m - 1) W_{1}^{(j)} + W_{3}^{(j)}} < 1$ for all host classes j and all m. The latter fraction represents the probability that a given resistant gametocyte pairs with a sensitive gametocyte in the body of a mosquito that took its blood meal from a host of class j, when the frequency of the resistant allele is low. (As in the standard hitchhiking model, the final heterozygosity is predominantly determined by the dynamics of the resistant allele at its early stage.) If the drug in the host is very effective ( $W_{1}^{(j)} ≪ W_{3}^{(j)}$ ), it will greatly reduce $\frac{{\tilde{ϑ}}^{(κ)}}{λ}$ . That is, the strength of selection determines the effective rate of recombination (the decay of association between the beneficial and the neutral allele), unlike in the standard hitchhiking model, in which the two factors are decoupled. Moreover, the approximation (74) will remain accurate for larger r and p₀ than in the standard hitchhiking model because of the two adjustment factors $\frac{{\tilde{ϑ}}^{(κ)}}{λ}$ and $\frac{μ}{λ (1 - p_{0}) + μ p_{0}}$ .

Assume m is constant. It is also clear from the above that the number, m, of independent parasite strains infecting a host determines the effective recombination rate. A small value of m thus increases the hitchhiking effect. In the extreme case of m = 1, genetic variation in the population is completely wiped out because parasites effectively reproduce asexually. On the other hand, with m → ∞ our approximation approaches that of standard hitchhiking. Note, however, that in the exact solution $\frac{ϑ^{(m)}}{λ}$ is still less than one, even with m → ∞, because the approximation leading to ϑ̃^(m) (Eq. 58) assumes that the frequency of the resistant allele is low and hence that the probability that a given host is infected by two or more resistant strains is negligible. If m becomes large, however, this probability is no longer negligible. Therefore, the combined effect of strong drug pressure and the limited number of clones in hosts can greatly amplify the effect of genetic hitchhiking beyond the level predicted by the standard model. If m is not constant, the hitchhiking effect is more pronounced for distributions κ that are skewed more strongly toward small m. It will be particularly pronounced if a large fraction of infections are single infections.
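The dependence on m described above can be illustrated numerically. The following is a minimal sketch, assuming a single host class, so that the modifying factor of the recombination rate reduces to the pairing probability $\frac{(m - 1) W_{1}}{(m - 1) W_{1} + W_{3}}$ discussed earlier. The function name and the fitness values are illustrative assumptions and are not taken from the original analysis.

```python
def outcrossing_factor(m, w1, w3):
    """Return (m - 1) * W1 / ((m - 1) * W1 + W3) for a single host class.

    m  : number of independent parasite strains per host (assumed constant)
    w1 : fitness of sensitive parasites in this host class (W_1)
    w3 : fitness of resistant parasites in this host class (W_3), must be > 0
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    sensitive_weight = (m - 1) * w1
    return sensitive_weight / (sensitive_weight + w3)


# Illustrative fitness values (assumptions, not parameters from the paper).
for m in (1, 2, 5, 10, 100, 10_000):
    weak = outcrossing_factor(m, w1=0.8, w3=1.0)     # mildly effective drug
    strong = outcrossing_factor(m, w1=0.05, w3=1.0)  # very effective drug
    print(f"m={m:>6}  mild drug: {weak:.4f}  effective drug: {strong:.4f}")
```

In this sketch the factor vanishes for m = 1 (no effective recombination), stays below one for every finite m, and is smaller when the drug is more effective, in line with the verbal argument.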

Our approximation also reveals another important departure of our model from previous models that assume random mating and homogeneous selective pressures. In the standard model, the allele-frequency trajectory of a beneficial mutation is necessary and sufficient information for predicting its hitchhiking effect on linked neutral variation (Betancourt et al. 2004; Chevin and Hospital 2008). For example, the speed with which the beneficial mutation increases in the population determines the size of the genomic regions affected by hitchhiking. However, in our model with host heterogeneity, different combinations of parameter values (j and $W_{i}^{(j)}$ ) that specify the same allele-frequency trajectory of the resistance allele may generate different hitchhiking effects. Schneider and Kim (2010) showed that the changes of sensitive and resistant allele frequencies are uniquely determined by their absolute fitness, μ and λ, respectively. With host heterogeneity, the fitness is simply the mean of $W_{1}^{(j)}$ or $W_{3}^{(j)}$ weighted by the frequencies of host classes (α_j). The modifying factor of the effective recombination rate, $\frac{{\tilde{ϑ}}^{(m)}}{λ}$ (or more generally $\frac{{\tilde{ϑ}}^{(κ)}}{λ}$ ), however, is not a linear function of $W_{1}^{(j)}$ or $W_{3}^{(j)}$ . As a result, as shown in Remark 4, $\frac{{\tilde{ϑ}}^{(m)}}{λ}$ (or $\frac{{\tilde{ϑ}}^{(κ)}}{λ}$ ) decreases as one introduces host heterogeneity while keeping λ and μ constant. This makes our simplest approximation, which assumes no host heterogeneity, a conservative predictor of the hitchhiking effect.

Comparison of the approximate solutions with empirical observations of local reductions of genetic variation around the loci of drug-resistance mutations, combined with other genetic (e.g. recombination rate, the frequency change of drug resistance) and epidemiological (e.g. the mean number of independent parasite clones per host) information, will greatly advance our understanding of antimalarial drug-resistance evolution. In particular, if empirical data on the reduction in heterozygosity are available, our results can be used to determine possible ranges for parameters that are unknown and/or infeasible to measure. It should be noted that the ratio λ/μ can be easily estimated from retrospective genetic data. Since $\log \frac{p_{t}}{1 - p_{t}} = \log \frac{p_{0}}{1 - p_{0}} + t \log \frac{λ}{μ}$ , log λ/μ is just the slope of the linear regression of the logarithm of the ratio of resistant over sensitive parasites, measured at different time points, on time. However, it is difficult to scale time, because the number of transmission cycles per year is difficult to quantify. The parameter α should also be easy to measure. In contrast, the distribution κ of m will be difficult to estimate. However, the distribution of m, and especially the fraction of single infections, leads to a genome-wide reduction in relative heterozygosity if selection is sufficiently strong. This genome-wide reduction might be used to estimate the distribution κ. Applying our results to real data, however, lies beyond the scope of this article and will be addressed in a follow-up paper.
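The regression estimate of log λ/μ mentioned above can be sketched as follows. This is a minimal illustration on synthetic data, assuming that time is measured in transmission cycles and that the odds of the resistant allele change by the factor λ/μ per cycle; the function names, the initial frequency, and the fitness ratio are hypothetical choices made only for the example. The regression is fitted with an intercept, which absorbs the initial log-odds.

```python
import math


def simulate_frequencies(p0, fitness_ratio, n_cycles):
    """Synthetic resistant-allele frequencies p_0, ..., p_n (assumed dynamics)."""
    odds0 = p0 / (1.0 - p0)
    freqs = []
    for t in range(n_cycles + 1):
        odds = odds0 * fitness_ratio ** t  # odds grow by lambda/mu per cycle
        freqs.append(odds / (1.0 + odds))
    return freqs


def log_fitness_ratio(times, resistant_freqs):
    """Least-squares slope of log(p_t / (1 - p_t)) regressed on t."""
    if len(times) != len(resistant_freqs) or len(times) < 2:
        raise ValueError("need at least two matching (time, frequency) pairs")
    logits = [math.log(p / (1.0 - p)) for p in resistant_freqs]
    t_mean = sum(times) / len(times)
    y_mean = sum(logits) / len(logits)
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logits))
    return sxy / sxx


# Hypothetical example: p0 = 0.01 and lambda/mu = 1.3 over ten cycles.
times = list(range(11))
freqs = simulate_frequencies(p0=0.01, fitness_ratio=1.3, n_cycles=10)
slope = log_fitness_ratio(times, freqs)
print(f"estimated log(lambda/mu) = {slope:.6f}, true value = {math.log(1.3):.6f}")
```

With real data the time points would be calendar dates rather than cycle counts, so the slope would estimate log λ/μ only up to the unknown number of transmission cycles per unit time, as noted in the text.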

Acknowledgments

This work was funded by the National Institutes of Health grant R01GM084320. We want to thank Prof. Ananias Escalante for helpful comments on an earlier draft of this work. We gratefully acknowledge the fruitful discussions with him on this and similar topics. We also want to thank two anonymous reviewers.

Appendix A

A.1 Bounds and approximations

Proof of Remark 2

If $a = {(\frac{μ}{λ})}^{l}$ for l ∈ ℕ we just need to adapt the proof of Theorem 1. The derivation of G₁ assumed $\frac{ξ}{η} \notin ℤ$ . If this assumption is violated we have to adjust the derivation. In this case we have $- \frac{ξ}{η} = l$ and we can replace (24) by

\begin{array}{rcl} g (x) & = & \frac{e^{- xlη}}{c e^{xη} + 1} = e^{- xlη} \sum_{k = 0}^{\infty} {(- c e^{xη})}^{k} = \sum_{k = 0}^{\infty} {(- c)}^{k} e^{xη (k - l)} \\ & = & {(- c)}^{l} \sum_{k = 0}^{\infty} {(- c)}^{k - l} e^{xη (k - l)} . \end{array}

Hence, we have to replace (25) in the proof of Theorem 1 by

\begin{array}{rcl} G_{1} (x) & = & \int g (x) d x = \int {(- c)}^{l} \sum_{k = 0}^{\infty} {(- c)}^{k - l} e^{xη (k - l)} d x \\ & = & {(- c)}^{l} \sum_{k = 0}^{\infty} \int {(- c)}^{k - l} e^{xη (k - l)} d x \\ & = & {(- c)}^{l} (\sum_{k = 0}^{l - 1} \int {(- c)}^{k - l} e^{xη (k - l)} d x + \int d x + \sum_{k = l + 1}^{\infty} \int {(- c)}^{k - l} e^{xη (k - l)} d x) \\ & = & {(- c)}^{l} (\sum_{k = 0}^{l - 1} {(- c)}^{k - l} \frac{e^{xη (k - l)}}{η (k - l)} + x + \sum_{k = l + 1}^{\infty} {(- c)}^{k - l} \frac{e^{xη (k - l)}}{η (k - l)}) \\ & = & {(- c)}^{l} (\sum_{k = 0}^{l - 1} {(- c)}^{k - l} \frac{e^{xη (k - l)}}{η (k - l)} + x - \frac{1}{η} \sum_{k = 1}^{\infty} {(- 1)}^{k - 1} \frac{{(c e^{xη})}^{k}}{k}) \\ & = & {(- c)}^{l} (\sum_{k = 0}^{l - 1} {(- c)}^{k - l} \frac{e^{xη (k - l)}}{η (k - l)} + x - \frac{\log (1 + c e^{xη})}{η}) . \end{array}

Thus, we have

G_{1} (- 1) = {(- c)}^{l} (\sum_{k = 0}^{l - 1} \frac{{(- c)}^{k - l} e^{- η (k - l)}}{η (k - l)} - 1 - \frac{\log (1 + c e^{- η})}{η}),

(77)

G_{1} (0) = {(- c)}^{l} (\sum_{k = 0}^{l - 1} \frac{{(- c)}^{k - l}}{η (k - l)} - \frac{\log (1 + c)}{η}),

(78)

and

G_{1} (x_{0}) = {(- c)}^{l} (\sum_{k = 0}^{l - 1} \frac{{(- 1)}^{(k - l)}}{η (k - l)} + x_{0} - \frac{\log 2}{η}) .

(79)

Combining the above with (26), the definitions of ξ, η, c, (31), and (32) yields

\begin{array}{l} ψ ({(\frac{μ}{λ})}^{l}) & = & Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \\ \times [\frac{{(\frac{p_{0}}{1 - p_{0}})}^{l}}{\log λ - \log μ} (\sum_{k = 0}^{l - 1} \frac{{(- 1)}^{k}}{k - l} (1 - {(\frac{p_{0}}{1 - p_{0}})}^{k - l}) - \log 2 p_{0}) \\ - \frac{{(\frac{p_{0}}{1 - p_{0}})}^{l}}{l (\log λ - \log μ)} (1 - {}_{2}F_{1} (\begin{matrix} 1, l \\ 1 + l \end{matrix} ∣ - 1))], \end{array}

(80)

and

\begin{array}{rcl} Ψ ({(\frac{μ}{λ})}^{l}) & = & Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \\ & & \times [\frac{{(\frac{p_{0}}{1 - p_{0}})}^{l}}{\log λ - \log μ} (\sum_{k = 0}^{l - 1} \frac{{(- 1)}^{k}}{k - l} (1 - {(\frac{p_{0} μ}{(1 - p_{0}) λ})}^{k - l}) \\ & & - \log (\frac{p_{0} λ}{p_{0} μ + (1 - p_{0}) λ}) + \log λ + \log μ - \log 2) \\ & & - \frac{{(\frac{p_{0}}{1 - p_{0}})}^{l}}{l (\log λ - \log μ)} (1 - {}_{2}F_{1} (\begin{matrix} 1, l \\ 1 + l \end{matrix} ∣ - 1))], \end{array}

(81)

respectively. Note that we have

1 - {}_{2}F_{1} (\begin{matrix} 1, l \\ 1 + l \end{matrix} ∣ - 1) = 1 - \sum_{k = 0}^{\infty} \frac{{(1)}_{\bar{k}} {(l)}_{\bar{k}} {(- 1)}^{k}}{{(l + 1)}_{\bar{k}} k!} = 1 - \sum_{k = 0}^{\infty} \frac{l {(- 1)}^{k}}{l + k}

(82)

= 1 - l {(- 1)}^{l} (\sum_{k = 1}^{\infty} \frac{{(- 1)}^{k}}{k} - \sum_{k = 1}^{l - 1} \frac{{(- 1)}^{k}}{k})

(83)

= \frac{l {(- 1)}^{2 l}}{l} - l {(- 1)}^{l} (\log 2 - \sum_{k = 1}^{l - 1} \frac{{(- 1)}^{k}}{k})

(84)

= l {(- 1)}^{l} (\sum_{k = 1}^{l} \frac{{(- 1)}^{k}}{k} - \log 2)

(85)

= l \sum_{k = 0}^{l - 1} \frac{{(- 1)}^{k}}{k - l} - l {(- 1)}^{l} \log 2.

(86)

Hence, (80) and (81) simplify to

\begin{array}{rcl} ψ ({(\frac{μ}{λ})}^{l}) & = & Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} \\ & & \times [\frac{{(\frac{p_{0}}{1 - p_{0}})}^{l}}{\log μ - \log λ} (\sum_{k = 0}^{l - 1} \frac{{(- 1)}^{k}}{k - l} {(\frac{p_{0}}{1 - p_{0}})}^{k - l} - \log p_{0})], \end{array}

(87)

and

Ψ ({(\frac{μ}{λ})}^{l}) = Q_{0} + \frac{r ϑ (R_{0} - Q_{0})}{λ} [\frac{{(\frac{p_{0}}{1 - p_{0}})}^{l}}{\log μ - \log λ} (\sum_{k = 0}^{l - 1} \frac{{(- 1)}^{k}}{k - l} {(\frac{p_{0} μ}{(1 - p_{0}) λ})}^{k - l} - \log (\frac{p_{0} μ}{p_{0} μ + (1 - p_{0}) λ}))],

(88)

which equal (36) and (35), respectively.

Now, let us regard g(x) given by (17) as a function of x and a, and write g_a(x) for it. Clearly, for x ≥ 0, g_a(x) is monotone increasing in a, and for x < 0 it is monotone decreasing. Since by Theorem 1 the integral $\int_{\tilde{x}}^{\infty} g_{a} (x) d x$ exists for x̃ ∈ {−1, 0} and $a \neq {(\frac{μ}{λ})}^{l}$ , it follows by the monotone convergence theorem and, in the case x̃ = −1, also by the dominated convergence theorem that $\lim_{a \to a_{0}} \int_{\tilde{x}}^{\infty} g_{a} (x) d x = \int_{\tilde{x}}^{\infty} g_{a_{0}} (x) d x$ . By setting $a_{0} = {(\frac{μ}{λ})}^{l}$ it follows that (36) and (35) are the continuations of (16) and (15) in the limit $a \to {(\frac{μ}{λ})}^{l}$ .

If x₀ ≤ 0, we do not need the function G₁, and hence the derivations from Theorem 1 and Remark 1 hold for ψ. The same holds for Ψ if x₀ ≤ −1. This finishes the proof.

Proof of Theorem 2

For κ ≠ −l (l ∈ ℕ) we have

{}_{2}F_{1} (\begin{matrix} 1, κ \\ 1 + κ \end{matrix} ∣ z) = \sum_{k = 0}^{\infty} \frac{{(κ)}_{\bar{k}} z^{k}}{{(κ + 1)}_{\bar{k}}} = 1 + κ \sum_{k = 1}^{\infty} \frac{z^{k}}{κ + k} \approx 1 + \frac{κz}{κ + 1} .

(89)
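The quality of the first-order truncation in (89) can be checked numerically. The following is a minimal sketch that sums the series $1 + κ \sum_{k \geq 1} \frac{z^{k}}{κ + k}$ directly and compares it with $1 + \frac{κz}{κ + 1}$ . It only covers |z| < 1, where the partial sums converge quickly; the function names, the number of terms, and the test values of κ and z are assumptions made for illustration.

```python
import math


def hyp2f1_series(kappa, z, terms=200):
    """Partial sum of 2F1(1, kappa; 1 + kappa; z) = 1 + kappa * sum z**k / (kappa + k).

    Intended for |z| < 1 and kappa not equal to a negative integer.
    """
    total = 1.0
    power = 1.0
    for k in range(1, terms + 1):
        power *= z
        total += kappa * power / (kappa + k)
    return total


def first_order(kappa, z):
    """First-order truncation used in (89): 1 + kappa * z / (kappa + 1)."""
    return 1.0 + kappa * z / (kappa + 1.0)


# Sanity check against the closed form 2F1(1, 1; 2; z) = -log(1 - z) / z.
print(hyp2f1_series(1.0, 0.5), -math.log(1.0 - 0.5) / 0.5)

# Truncation error for a small and a moderate argument (illustrative values).
for z in (-0.01, -0.5):
    exact = hyp2f1_series(0.5, z)
    approx = first_order(0.5, z)
    print(f"z={z:+.2f}  series={exact:.6f}  first order={approx:.6f}  "
          f"error={abs(exact - approx):.2e}")
```

As expected for a truncated power series, the error in this sketch grows with |z|, so the approximation is best when the argument of the hypergeometric function is small.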

First assume $a \neq {(\frac{μ}{λ})}^{l}$ for l ∈ ℕ, i.e., $\frac{ξ}{η} \neq - l$ .

Setting $κ = \frac{ξ}{η}$ and z = −c gives

{}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - c) \approx 1 - \frac{cξ}{η + ξ} = 1 - \frac{p_{0} \log a}{(\log a + \log λ - \log μ) (1 - p_{0})},

(90)

whereas setting $κ = \frac{ξ}{η}$ and $z = - \frac{c}{e^{η}}$ gives

{}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - \frac{c}{e^{η}}) \approx 1 - \frac{cξ}{e^{η} (η + ξ)} = 1 - \frac{p_{0} μ \log a}{(\log a + \log λ - \log μ) (1 - p_{0}) λ} .

(91)

Moreover, setting $κ = \frac{- ξ}{η}$ and z = −1 yields

{}_{2}F_{1} (\begin{matrix} 1, - \frac{ξ}{η} \\ 1 - \frac{ξ}{η} \end{matrix} ∣ - 1) \approx 1 - \frac{ξ}{ξ - η} = 1 - \frac{\log a}{\log a - \log λ + \log μ},

(92)

whereas by setting $κ = \frac{ξ}{η}$ and z = −1 we obtain

{}_{2}F_{1} (\begin{matrix} 1, \frac{ξ}{η} \\ 1 + \frac{ξ}{η} \end{matrix} ∣ - 1) \approx 1 - \frac{ξ}{ξ + η} = 1 - \frac{\log a}{\log a + \log λ - \log μ} .

(93)

First, combining (93), (90), and (92) with (31), and using this approximation in (20) yields (39) by using the definitions of ξ and η. Similarly, we obtain (40) by combining (93), (91), (92), with (32) and (21).

Clearly, (39) and (40) are continuous, in particular at $a = {(\frac{μ}{λ})}^{l}$ . Since ψ(a) and Ψ(a) have continuations at $a = {(\frac{μ}{λ})}^{l}$ (l ∈ ℕ) according to Remark 2, we have ϕ(a) ≈ ψ(a) and Φ(a) ≈ Ψ(a) for all a. This finishes the proof.

Proof of Remark 3

We obtain (41) by combining (33) with (89) for $κ = \frac{\log a}{\log μ - \log λ}$ and $z = - \frac{1 - p_{0}}{p_{0}}$ , whereas we obtain (42) by combining (34) with (89) for $κ = \frac{\log a}{\log μ - \log λ}$ and $z = - \frac{(1 - p_{0}) λ}{p_{0} μ}$ .

Clearly (39), (40), (41), and (42) are continuous functions in p₀. For $p_{0} = \frac{1}{2}$ it is easily seen that (39) equals (41), whereas for $p_{0} = \frac{λ}{λ + μ}$ (40) equals (42). Hence, (41) and (42) are the continuations in p₀ of (39) and (40), respectively.

Proof of Theorem 3

Note that we have

Q^{\circ} (a) = Q_{0} + \frac{rϑ (R_{0} - Q_{0})}{λ} A,

(94)

where

\begin{array}{l} A & = & {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} (\frac{1}{\log a} - \frac{1}{\log \frac{a λ}{μ}} + \frac{1}{\log \frac{a μ}{λ}}) \\ - \frac{1}{2 a \log a} + \frac{p_{0} μ}{2 (1 - p_{0}) aλ \log \frac{aλ}{μ}} - \frac{1}{2 \log a} + \frac{p_{0}}{2 (1 - p_{0}) \log \frac{aλ}{μ}} \\ = & {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} (\frac{1}{\log a} + \frac{1}{\log \frac{aμ}{λ}}) - \frac{1}{\log a} (\frac{1 + a}{2 a}) \\ - {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} (\frac{1}{\log \frac{aλ}{μ}}) + \frac{p_{0}}{2 (1 - p_{0}) \log \frac{aλ}{μ}} (\frac{μ}{aλ} + 1) \\ = & {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} (\frac{1}{\log a} + \frac{1}{\log \frac{aμ}{λ}}) - \frac{1 + a}{2 a \log a} \\ - \frac{p_{0}}{(1 - p_{0}) \log \frac{aλ}{μ}} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log \frac{a λ}{μ}}{\log μ - \log λ}} - 1) + \frac{p_{0}}{2 (1 - p_{0}) \log \frac{μ}{aλ}} (\frac{μ}{a λ} - 1) \\ = & \frac{1}{\log a} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} - 1) - \frac{p_{0}}{(1 - p_{0}) \log \frac{aλ}{μ}} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log \frac{a λ}{μ}}{\log μ - \log λ}} - 1) \\ - \frac{1 - a}{2 a \log a} + \frac{1}{\log \frac{aμ}{λ}} {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} - \frac{p_{0}}{2 (1 - p_{0}) \log \frac{μ}{aλ}} (\frac{μ}{aλ} - 1) . \end{array}

(95)

Furthermore, $\frac{x - 1}{\log x}$ is monotone increasing in x and $\lim_{x \to 0} \frac{x - 1}{\log x} = 0$ . Since λ > μ and $\frac{1}{2} \leq a \leq 1$ , we have $0 < \frac{μ}{aλ} \leq 2$ . Hence, we obtain

0 \leq \frac{\frac{μ}{aλ} - 1}{\log \frac{μ}{aλ}} \leq \frac{1}{\log 2},

and we have

0 \leq \frac{p_{0}}{2 (1 - p_{0}) \log \frac{μ}{aλ}} (\frac{μ}{aλ} - 1) \leq \frac{p_{0}}{2 \log 2} (1 + O (p_{0})),

(96)

which follows from the fact that $\frac{1}{1 - p_{0}}$ can be written as a geometric series.

Therefore,

\begin{array}{l} A & \approx & \frac{1}{\log a} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} - 1) - \frac{p_{0}}{(1 - p_{0}) \log \frac{aλ}{μ}} \\ \times ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log \frac{aλ}{μ}}{\log μ - \log λ}} - 1) - \frac{1 - a}{2 a \log a} + \frac{1}{\log \frac{a μ}{λ}} {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} . \end{array}

(97)

For C > 0, the function $f (x) = \frac{C^{x} - 1}{x}$ is monotone increasing in x, because $f^{'} (x) = \frac{1 - C^{x} + C^{x} \log C^{x}}{x^{2}} \geq 0$ . Furthermore, we have $\lim_{x \to 0} f (x) = \log C$ . By choosing $C = {(\frac{p_{0}}{1 - p_{0}})}^{\frac{1}{\log μ - \log λ}}$ , we see that

\frac{p_{0}}{(1 - p_{0}) \log \frac{aλ}{μ}} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log \frac{a λ}{μ}}{\log μ - \log λ}} - 1)

is negligible compared with

\frac{1}{\log a} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} - 1) .

(98)

Furthermore, also

\frac{1}{\log \frac{aμ}{λ}} {(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}}

(99)

is negligible compared with (98). Therefore,

A \approx \frac{1}{\log a} ({(\frac{p_{0}}{1 - p_{0}})}^{\frac{\log a}{\log μ - \log λ}} - 1) - \frac{1 - a}{2 a \log a} .

(100)
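How much is lost in passing from (95) to (100) can be explored numerically. The following is a minimal sketch that evaluates the first expression for A in (95) and the simplified form (100) for a few initial frequencies p₀. The function names and the parameter values (a = 0.9, λ = 1.5, μ = 1) are assumptions chosen only for illustration; values with a = 1, aλ = μ, or aμ = λ must be avoided because the logarithms in the denominators vanish there.

```python
import math


def a_full(a, p0, lam, mu):
    """First expression for A in (95)."""
    c = p0 / (1.0 - p0)
    log_a = math.log(a)
    log_al = math.log(a * lam / mu)  # log(a * lambda / mu)
    log_am = math.log(a * mu / lam)  # log(a * mu / lambda)
    kappa = log_a / (math.log(mu) - math.log(lam))
    return (c ** kappa * (1.0 / log_a - 1.0 / log_al + 1.0 / log_am)
            - 1.0 / (2.0 * a * log_a)
            + p0 * mu / (2.0 * (1.0 - p0) * a * lam * log_al)
            - 1.0 / (2.0 * log_a)
            + p0 / (2.0 * (1.0 - p0) * log_al))


def a_simple(a, p0, lam, mu):
    """Simplified approximation (100)."""
    c = p0 / (1.0 - p0)
    log_a = math.log(a)
    kappa = log_a / (math.log(mu) - math.log(lam))
    return (c ** kappa - 1.0) / log_a - (1.0 - a) / (2.0 * a * log_a)


def relative_gap(a, p0, lam, mu):
    """Relative difference between (100) and the first expression in (95)."""
    full = a_full(a, p0, lam, mu)
    return abs(a_simple(a, p0, lam, mu) - full) / abs(full)


# Illustrative parameters (assumptions): a = 0.9, lambda = 1.5, mu = 1.0.
for p0 in (1e-3, 1e-4, 1e-6, 1e-8):
    print(f"p0={p0:.0e}  full={a_full(0.9, p0, 1.5, 1.0):.4f}  "
          f"simple={a_simple(0.9, p0, 1.5, 1.0):.4f}  "
          f"relative gap={relative_gap(0.9, p0, 1.5, 1.0):.4f}")
```

For these illustrative values the relative gap shrinks as p₀ decreases, which is consistent with the neglected terms being small when the resistant allele is initially rare.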

A.2 Relative heterozygosity

Proof of Remark 4

Let $f (x, y) : = \frac{1}{\frac{1}{x} + \frac{1}{y}}$ for x, y ∈ ℝ⁺. Its Hessian matrix is calculated to be

H = (\begin{matrix} \frac{\partial^{2} f}{\partial x^{2}} & \frac{\partial^{2} f}{\partial y \partial x} \\ \frac{\partial^{2} f}{\partial x \partial y} & \frac{\partial^{2} f}{\partial y^{2}} \end{matrix}) = (\begin{matrix} \frac{- 2 y^{2}}{{(x + y)}^{3}} & \frac{2 xy}{{(x + y)}^{3}} \\ \frac{2 xy}{{(x + y)}^{3}} & \frac{- 2 x^{2}}{{(x + y)}^{3}} \end{matrix}) .

Clearly, we have $\frac{- 2 y^{2}}{{(x + y)}^{3}} < 0$ , $\frac{- 2 x^{2}}{{(x + y)}^{3}} < 0$ and det H = 0, i.e., the diagonal entries of H are negative and its determinant vanishes, so H is negative semidefinite. Hence, f is concave but not strictly concave (note that f (x, x) = x/2). Hence, for positive random variables X and Y defined on a probability space (Ω, A, P) and a sub-σ-algebra B, Jensen’s inequality for higher dimensions yields

E [f (X, Y) ∣ B] \leq f (E [X ∣ B], E [Y ∣ B]) .

Now, choose (Ω, A, P) = (ℕ, P(ℕ), P), where P(j) = α_j. Moreover, choose $X (j) = (m - 1) W_{1}^{(j)}, Y (j) = W_{3}^{(j)}$ and B = {∅, U, T, ℕ}. Then Jensen’s inequality gives

\begin{array}{rcl} \sum_{j \in U} \frac{α_{j} (m - 1) W_{1}^{(j)} W_{3}^{(j)}}{(m - 1) W_{1}^{(j)} + W_{3}^{(j)}} & = & (1 - α) \sum_{j \in U} \frac{\frac{α_{j}}{1 - α}}{\frac{1}{(m - 1) W_{1}^{(j)}} + \frac{1}{W_{3}^{(j)}}} = (1 - α) E [f (X, Y) ∣ U] \\ & \leq & (1 - α) f (E [X ∣ U], E [Y ∣ U]) = \frac{(1 - α)}{\frac{1}{(m - 1) W_{1}^{(U)}} + \frac{1}{W_{3}^{(U)}}} \\ & = & \frac{(1 - α) (m - 1) W_{1}^{(U)} W_{3}^{(U)}}{(m - 1) W_{1}^{(U)} + W_{3}^{(U)}} . \end{array}

Similarly, we obtain

\sum_{j \in T} \frac{α_{j} (m - 1) W_{1}^{(j)} W_{3}^{(j)}}{(m - 1) W_{1}^{(j)} + W_{3}^{(j)}} \leq \frac{α (m - 1) W_{1}^{(T)} W_{3}^{(T)}}{(m - 1) W_{1}^{(T)} + W_{3}^{(T)}} .

Combination of the above yields

\sum_{j = 0}^{\infty} \frac{α_{j} (m - 1) W_{1}^{(j)} W_{3}^{(j)}}{(m - 1) W_{1}^{(j)} + W_{3}^{(j)}} \leq \frac{(1 - α) (m - 1) W_{1}^{(U)} W_{3}^{(U)}}{(m - 1) W_{1}^{(U)} + W_{3}^{(U)}} + \frac{α (m - 1) W_{1}^{(T)} W_{3}^{(T)}}{(m - 1) W_{1}^{(T)} + W_{3}^{(T)}},

(101)

i.e., ϑ̃^(m) as given by (64) is no larger than ϑ̃^(m) as given by (58).

Clearly, in the limit m → ∞ equality holds in (101). Moreover, ϑ̃^(m) → λ. Hence, in the limit H̃^(m) reduces to the approximation for standard hitchhiking.

The proof is finished by applying the argument formulated above the remark.
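Inequality (101) can be checked numerically for a concrete set of host classes. The following is a minimal sketch, assuming that $W_{i}^{(U)}$ and $W_{i}^{(T)}$ denote the means of $W_{i}^{(j)}$ over the untreated and treated classes, weighted by α_j and normalized by the total frequency of the respective group, as suggested by the conditional expectations in the proof. All function names and parameter values are hypothetical.

```python
def theta_by_class(alphas, w1, w3, m):
    """Left-hand side of (101): sum over all host classes j."""
    return sum(a * (m - 1) * x * y / ((m - 1) * x + y)
               for a, x, y in zip(alphas, w1, w3))


def pooled_term(alphas, w1, w3, m):
    """One term of the right-hand side of (101) for a pooled group of classes."""
    weight = sum(alphas)
    if weight == 0 or m == 1:
        return 0.0
    x = sum(a * v for a, v in zip(alphas, w1)) / weight  # conditional mean of W_1
    y = sum(a * v for a, v in zip(alphas, w3)) / weight  # conditional mean of W_3
    return weight * (m - 1) * x * y / ((m - 1) * x + y)


def theta_pooled(alphas, w1, w3, treated, m):
    """Right-hand side of (101): untreated (U) and treated (T) groups pooled."""
    total = 0.0
    for flag in (False, True):
        idx = [j for j, t in enumerate(treated) if t == flag]
        total += pooled_term([alphas[j] for j in idx], [w1[j] for j in idx],
                             [w3[j] for j in idx], m)
    return total


# Hypothetical host classes: one untreated class and three treated classes
# that differ in drug concentration (values are illustrative assumptions).
alphas = [0.5, 0.2, 0.2, 0.1]
treated = [False, True, True, True]
w1 = [1.0, 0.6, 0.3, 0.05]   # fitness of sensitive parasites per class
w3 = [1.0, 0.95, 0.9, 0.9]   # fitness of resistant parasites per class

for m in (2, 3, 10, 1000):
    lhs = theta_by_class(alphas, w1, w3, m)
    rhs = theta_pooled(alphas, w1, w3, treated, m)
    print(f"m={m:>5}  by class: {lhs:.6f}  pooled: {rhs:.6f}")
```

In this example the class-wise sum never exceeds the pooled value, and both approach λ = Σ_j α_j W₃^(j) as m grows, in agreement with the statements following (101).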

Contributor Information

Kristan A. Schneider, Email: kristan.schneider@asu.edu, School of Life Sciences, Arizona State University, 1711 South Rural Road, Tempe, AZ 85287, USA; Department of Mathematics, University of Vienna, Nordbergstrasse 15, UZA 4, 1090 Vienna, Austria; CEMI/Biodesign Institute, Arizona State University, P. O. Box 875301, Tempe, AZ 85287-5301, USA.

Yuseob Kim, School of Life Sciences, Arizona State University, 1711 South Rural Road, Tempe, AZ 85287, USA; Center for Evolutionary Medicine and Informatics, Biodesign Institute, 1001 S. McAllister Ave., Tempe, AZ 85281, USA.

References

Barton NH. Genetic hitchhiking. Philos Trans R Soc Lond B Biol Sci. 2000;355(1403):1553–1562. doi: 10.1098/rstb.2000.0716. http://rstb.royalsocietypublishing.org/content/355/1403/1553.abstract. [DOI] [PMC free article] [PubMed]
Betancourt AJ, Kim Y, Orr HA. A pseudohitchhiking model of X vs. autosomal diversity. Genetics. 2004;168(4):2261–2269. doi: 10.1534/genetics.104.030999. http://www.genetics.org/cgi/content/abstract/168/4/2261. [DOI] [PMC free article] [PubMed]
Brooks DR, Wang P, Read M, Watkins WM, Sims PFG, Hyde JE. Sequence variation of the hydroxymethyldihydropterin pyrophosphokinase: dihydropteroate synthase gene in lines of the human malaria parasite Plasmodium falciparum with differing resistance to sulfadoxine. Eur J Biochem. 1994;224(2):397–405. doi: 10.1111/j.1432-1033.1994.00397.x. http://dx.doi.org/10.1111/j.1432-1033.1994.00397.x. [DOI] [PubMed]
Chevin L-M, Hospital F. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics. 2008;180(3):1645–1660. doi: 10.1534/genetics.108.093351. http://www.genetics.org/cgi/content/abstract/180/3/1645. [DOI] [PMC free article] [PubMed]
Cowman AF, Morry MJ, Biggs BA, Cross GA, Foote SJ. Amino acid changes linked to pyrimethamine resistance in the dihydrofolate reductase-thymidylate synthase gene of Plasmodium falciparum. Proc Natl Acad Sci USA. 1988;85(23):9109–9113. doi: 10.1073/pnas.85.23.9109. http://www.pnas.org/content/85/23/9109.abstract. [DOI] [PMC free article] [PubMed]
Daily JP. Antimalarial drug therapy: the role of parasite biology and drug resistance. J Clin Pharmacol. 2006;46(12):1487–1497. doi: 10.1177/0091270006294276. http://jcp.sagepub.com/cgi/content/abstract/46/12/1487. [DOI] [PubMed]
Escalante AA, Smith DL, Kim Y. The dynamics of mutations associated with anti-malarial drug resistance in plasmodium falciparum. Trends Parasitol. 2009;25(12):557–563. doi: 10.1016/j.pt.2009.09.008. http://www.sciencedirect.com/science/article/B6W7G-4XJ9BS6-1/2/04d13d69d2006be02c12ef0d051cc5c6. [DOI] [PMC free article] [PubMed]
Hastings IM. Response to: the puzzling links between malaria transmission level and drug resistance. Trends Parasitol. 2003;19(4):160–161. doi: 10.1016/s1471-4922(03)00054-0. http://www.sciencedirect.com/science/article/B6W7G-483SMKD-1/2/680a5da4f74f112c6e1b87c270a9921d. [DOI] [PubMed]
Hastings IM. Complex dynamics and stability of resistance to antimalarial drugs. Parasitology. 2006;132(05):615–624. doi: 10.1017/S0031182005009790. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=428221&fulltextType=RA&fileId=S0031182005009790. [DOI] [PubMed]
Hastings IM, Mackinnon MJ. The emergence of drug-resistant malaria. Parasitology. 1998;117(05):411–417. doi: 10.1017/s0031182098003291. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=24263&fulltextType=RA&fileId=S0031182098003291. [DOI] [PubMed]
Hastings IM, Watkins WM, White NJ. The evolution of drug-resistant malaria: the role of drug elimination half-life. Philos Trans R Soc Lond B Biol Sci. 2002;357(1420):505–519. doi: 10.1098/rstb.2001.1036. http://rstb.royalsocietypublishing.org/content/357/1420/505.abstract. [DOI] [PMC free article] [PubMed]
Hedrick PW. Hitchhiking: a Comparison of Linkage and Partial Selfing. Genetics. 1980;94(3):791–808. doi: 10.1093/genetics/94.3.791. http://www.genetics.org/cgi/content/abstract/94/3/791. [DOI] [PMC free article] [PubMed]
Hermisson J, Pennings PS. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 2005;169(4):2335–2352. doi: 10.1534/genetics.104.036947. http://www.genetics.org/cgi/content/abstract/169/4/2335. [DOI] [PMC free article] [PubMed]
Kim Y, Stephan W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics. 2002;160(2):765–777. doi: 10.1093/genetics/160.2.765. http://www.genetics.org/cgi/content/abstract/160/2/765. [DOI] [PMC free article] [PubMed]
Korenromp E, Miller J, Nahlen B, Wardlaw T, Young M. World malaria report 2005. World Health Organization (WHO); Geneva: 2005. [Google Scholar]
Mackinnon MJ, Hastings IM. The evolution of multiple drug resistance in malaria parasites. Trans R Soc Tropical Med Hygiene. 1998;92(2):188–195. doi: 10.1016/s0035-9203(98)90745-3. http://www.sciencedirect.com/science/article/B75GP-4BY31N7-MS/2/bfd1e32b1cf5fd0f23e10978a356bd36. [DOI] [PubMed]
Marsh K. Malaria disaster in africa. Lancet. 1998;352(9132):924–924. doi: 10.1016/S0140-6736(05)61510-3. http://www.sciencedirect.com/science/article/B6T1B-4FWV357-JX/2/c9facab67b868b8fddd289f050bfae19. [DOI] [PubMed]
Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23(01):23–35. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1754360&fulltextType=RA&fileId=S0016672300014634. [PubMed]
McCollum AM, Basco LK, Tahar R, Udhayakumar V, Escalante AA. Hitchhiking and selective sweeps of Plasmodium falciparum sulfadoxine and pyrimethamine resistance alleles in a population from Central Africa. Antimicrob Agents Chemother. 2008;52(11):4089–4097. doi: 10.1128/AAC.00623-08. http://aac.asm.org/cgi/content/abstract/52/11/4089. [DOI] [PMC free article] [PubMed]
Mita T, Tanabe K, Takahashi N, Tsukahara T, Eto H, Dysoley L, Ohmae H, Kita K, Krudsood S, Looareesuwan S, Kaneko A, Bjorkman A, Kobayakawa T. Independent evolution of pyrimethamine resistance in Plasmodium falciparum isolates in Melanesia. Antimicrob Agents Chemother. 2007;51(3):1071–1077. doi: 10.1128/AAC.01186-06. http://aac.asm.org/cgi/content/abstract/51/3/1071. [DOI] [PMC free article] [PubMed]
Nair S, Nash D, Sudimack D, Jaidee A, Barends M, Uhlemann A-C, Krishna S, Nosten F, Anderson TJC. Recurrent gene amplification and soft selective sweeps during evolution of multidrug resistance in malaria parasites. Mol Biol Evol. 2007;24(2):562–573. doi: 10.1093/molbev/msl185. http://mbe.oxfordjournals.org/cgi/content/abstract/24/2/562. [DOI] [PubMed]
Nair S, Williams JT, Brockman A, Paiphun L, Mayxay M, Newton PN, Guthmann J-P, Smithuis FM, Hien TT, White NJ, Nosten F, Anderson TJC. A selective sweep driven by pyrimethamine treatment in Southeast Asian malaria parasites. Mol Biol Evol. 2003;20(9):1526–1536. doi: 10.1093/molbev/msg162. http://mbe.oxfordjournals.org/cgi/content/abstract/20/9/1526. [DOI] [PubMed]
Nash D, Nair S, Mayxay M, Newton PN, Guthmann J-P, Nosten F, Anderson TJ. Selection strength and hitchhiking around two anti-malarial resistance genes. Proc R Soc B Biol Sci. 2005;272(1568):1153–1161. doi: 10.1098/rspb.2004.3026. http://rspb.royalsocietypublishing.org/content/272/1568/1153.abstract. [DOI] [PMC free article] [PubMed]
Prugnolle F, Durand P, Renaud F, Rousset F. Effective size of the hierarchically structured populations of the agent of malaria: a coalescent-based model. Heredity. 2009:1–7. doi: 10.1038/hdy.2009.127. http://dx.doi.org/10.1038/hdy.2009.127. [DOI] [PubMed]
Schneider KA, Kim Y. An analytical model for genetic hitchhiking in malaria parasites caused by drug resistance. Theor Popul Biol. 2010 doi: 10.1016/j.tpb.2010.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schwobel B, Alifrangis M, Salanti A, Jelinek T. Different mutation patterns of atovaquone resistance to Plasmodium falciparum in vitro and in vivo: rapid detection of codon 268 polymorphisms in the cytochrome b as potential in vivo resistance marker. Malaria J. 2003;2(1):5. doi: 10.1186/1475-2875-2-5. http://www.malariajournal.com/content/2/1/5. [DOI] [PMC free article] [PubMed]
Stephan W, Wiehe THE, Lenz MW. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor Popul Biol. 1992;41(2):237–254. http://www.sciencedirect.com/science/article/B6WXD-4F1Y9N0-3M/2/1245281bba0c6b542457fdd75c343edf.
Triglia T, Cowman AF. Primary structure and expression of the dihydropteroate synthetase gene of Plasmodium falciparum. Proc Natl Acad Sci USA. 1994;91(15):7149–7153. doi: 10.1073/pnas.91.15.7149. http://www.pnas.org/content/91/15/7149.abstract. [DOI] [PMC free article] [PubMed]
WHO. WHO Expert Committee on malaria. World Health Organ Tech Rep Ser. 2000;892:1–74. [PubMed]
Wootton JC, Feng X, Ferdig MT, Cooper RA, Mu J, Baruch DI, Magill AJ, Su X-z. Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature. 2002;418(6895):320–323. doi: 10.1038/nature00813. http://dx.doi.org/10.1038/nature00813. [DOI] [PubMed]

[R1] Barton NH. Genetic hitchhiking. Philos Trans R Soc Lond B Biol Sci. 2000;355(1403):1553–1562. doi: 10.1098/rstb.2000.0716. http://rstb.royalsocietypublishing.org/content/355/1403/1553.abstract. [DOI] [PMC free article] [PubMed]

[R2] Betancourt AJ, Kim Y, Orr HA. A pseudohitchhiking model of X vs. autosomal diversity. Genetics. 2004;168(4):2261–2269. doi: 10.1534/genetics.104.030999. http://www.genetics.org/cgi/content/abstract/168/4/2261. [DOI] [PMC free article] [PubMed]

[R3] Brooks DR, Wang P, Read M, Watkins WM, Sims PFG, Hyde JE. Sequence variation of the hydroxymethyldihydropterin pyrophosphokinase: dihydropteroate synthase gene in lines of the human malaria parasite Plasmodium falciparum with differing resistance to sulfadoxine. Eur J Biochem. 1994;224(2):397–405. doi: 10.1111/j.1432-1033.1994.00397.x. http://dx.doi.org/10.1111/j.1432-1033.1994.00397.x. [DOI] [PubMed]

[R4] Chevin L-M, Hospital F. Selective sweep at a quantitative trait locus in the presence of background genetic variation. Genetics. 2008;180(3):1645–1660. doi: 10.1534/genetics.108.093351. http://www.genetics.org/cgi/content/abstract/180/3/1645. [DOI] [PMC free article] [PubMed]

[R5] Cowman AF, Morry MJ, Biggs BA, Cross GA, Foote SJ. Amino acid changes linked to pyrimethamine resistance in the dihydrofolate reductase-thymidylate synthase gene of Plasmodium falciparum. Proc Natl Acad Sci USA. 1988;85(23):9109–9113. doi: 10.1073/pnas.85.23.9109. http://www.pnas.org/content/85/23/9109.abstract. [DOI] [PMC free article] [PubMed]

[R6] Daily JP. Antimalarial drug therapy: the role of parasite biology and drug resistance. J Clin Pharmacol. 2006;46(12):1487–1497. doi: 10.1177/0091270006294276. http://jcp.sagepub.com/cgi/content/abstract/46/12/1487. [DOI] [PubMed]

[R7] Escalante AA, Smith DL, Kim Y. The dynamics of mutations associated with anti-malarial drug resistance in plasmodium falciparum. Trends Parasitol. 2009;25(12):557–563. doi: 10.1016/j.pt.2009.09.008. http://www.sciencedirect.com/science/article/B6W7G-4XJ9BS6-1/2/04d13d69d2006be02c12ef0d051cc5c6. [DOI] [PMC free article] [PubMed]

[R8] Hastings IM. Response to: the puzzling links between malaria transmission level and drug resistance. Trends Parasitol. 2003;19(4):160–161. doi: 10.1016/s1471-4922(03)00054-0. http://www.sciencedirect.com/science/article/B6W7G-483SMKD-1/2/680a5da4f74f112c6e1b87c270a9921d. [DOI] [PubMed]

[R9] Hastings IM. Complex dynamics and stability of resistance to antimalarial drugs. Parasitology. 2006;132(05):615–624. doi: 10.1017/S0031182005009790. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=428221&fulltextType=RA&fileId=S0031182005009790. [DOI] [PubMed]

[R10] Hastings IM, Mackinnon MJ. The emergence of drug-resistant malaria. Parasitology. 1998;117(05):411–417. doi: 10.1017/s0031182098003291. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=24263&fulltextType=RA&fileId=S0031182098003291. [DOI] [PubMed]

[R11] Hastings IM, Watkins WM, White NJ. The evolution of drug-resistant malaria: the role of drug elimination half-life. Philos Trans R Soc Lond B Biol Sci. 2002;357(1420):505–519. doi: 10.1098/rstb.2001.1036. http://rstb.royalsocietypublishing.org/content/357/1420/505.abstract. [DOI] [PMC free article] [PubMed]

[R12] Hedrick PW. Hitchhiking: a Comparison of Linkage and Partial Selfing. Genetics. 1980;94(3):791–808. doi: 10.1093/genetics/94.3.791. http://www.genetics.org/cgi/content/abstract/94/3/791. [DOI] [PMC free article] [PubMed]

[R13] Hermisson J, Pennings PS. Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 2005;169(4):2335–2352. doi: 10.1534/genetics.104.036947. http://www.genetics.org/cgi/content/abstract/169/4/2335. [DOI] [PMC free article] [PubMed]

[R14] Kim Y, Stephan W. Detecting a local signature of genetic hitchhiking along a recombining chromosome. Genetics. 2002;160(2):765–777. doi: 10.1093/genetics/160.2.765. http://www.genetics.org/cgi/content/abstract/160/2/765. [DOI] [PMC free article] [PubMed]

[R15] Korenromp E, Miller J, Nahlen B, Wardlaw T, Young M. World malaria report 2005. World Health Organization (WHO); Geneva: 2005. [Google Scholar]

[R16] Mackinnon MJ, Hastings IM. The evolution of multiple drug resistance in malaria parasites. Trans R Soc Tropical Med Hygiene. 1998;92(2):188–195. doi: 10.1016/s0035-9203(98)90745-3. http://www.sciencedirect.com/science/article/B75GP-4BY31N7-MS/2/bfd1e32b1cf5fd0f23e10978a356bd36. [DOI] [PubMed]

[R17] Marsh K. Malaria disaster in Africa. Lancet. 1998;352(9132):924–924. doi: 10.1016/S0140-6736(05)61510-3. http://www.sciencedirect.com/science/article/B6T1B-4FWV357-JX/2/c9facab67b868b8fddd289f050bfae19. [DOI] [PubMed]

[R18] Maynard Smith J, Haigh J. The hitch-hiking effect of a favourable gene. Genet Res. 1974;23(01):23–35. http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=1754360&fulltextType=RA&fileId=S0016672300014634. [PubMed]

[R19] McCollum AM, Basco LK, Tahar R, Udhayakumar V, Escalante AA. Hitchhiking and selective sweeps of Plasmodium falciparum sulfadoxine and pyrimethamine resistance alleles in a population from Central Africa. Antimicrob Agents Chemother. 2008;52(11):4089–4097. doi: 10.1128/AAC.00623-08. http://aac.asm.org/cgi/content/abstract/52/11/4089. [DOI] [PMC free article] [PubMed]

[R20] Mita T, Tanabe K, Takahashi N, Tsukahara T, Eto H, Dysoley L, Ohmae H, Kita K, Krudsood S, Looareesuwan S, Kaneko A, Bjorkman A, Kobayakawa T. Independent evolution of pyrimethamine resistance in Plasmodium falciparum isolates in Melanesia. Antimicrob Agents Chemother. 2007;51(3):1071–1077. doi: 10.1128/AAC.01186-06. http://aac.asm.org/cgi/content/abstract/51/3/1071. [DOI] [PMC free article] [PubMed]

[R21] Nair S, Nash D, Sudimack D, Jaidee A, Barends M, Uhlemann A-C, Krishna S, Nosten F, Anderson TJC. Recurrent gene amplification and soft selective sweeps during evolution of multidrug resistance in malaria parasites. Mol Biol Evol. 2007;24(2):562–573. doi: 10.1093/molbev/msl185. http://mbe.oxfordjournals.org/cgi/content/abstract/24/2/562. [DOI] [PubMed]

[R22] Nair S, Williams JT, Brockman A, Paiphun L, Mayxay M, Newton PN, Guthmann J-P, Smithuis FM, Hien TT, White NJ, Nosten F, Anderson TJC. A selective sweep driven by pyrimethamine treatment in Southeast Asian malaria parasites. Mol Biol Evol. 2003;20(9):1526–1536. doi: 10.1093/molbev/msg162. http://mbe.oxfordjournals.org/cgi/content/abstract/20/9/1526. [DOI] [PubMed]

[R23] Nash D, Nair S, Mayxay M, Newton PN, Guthmann J-P, Nosten F, Anderson TJ. Selection strength and hitchhiking around two anti-malarial resistance genes. Proc R Soc B Biol Sci. 2005;272(1568):1153–1161. doi: 10.1098/rspb.2004.3026. http://rspb.royalsocietypublishing.org/content/272/1568/1153.abstract. [DOI] [PMC free article] [PubMed]

[R24] Prugnolle F, Durand P, Renaud F, Rousset F. Effective size of the hierarchically structured populations of the agent of malaria: a coalescent-based model. Heredity. 2009:1–7. doi: 10.1038/hdy.2009.127. http://dx.doi.org/10.1038/hdy.2009.127. [DOI] [PubMed]

[R25] Schneider KA, Kim Y. An analytical model for genetic hitchhiking in malaria parasites caused by drug resistance. Theor Popul Biol. 2010. doi: 10.1016/j.tpb.2010.06.005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R26] Schwobel B, Alifrangis M, Salanti A, Jelinek T. Different mutation patterns of atovaquone resistance to Plasmodium falciparum in vitro and in vivo: rapid detection of codon 268 polymorphisms in the cytochrome b as potential in vivo resistance marker. Malaria J. 2003;2(1):5. doi: 10.1186/1475-2875-2-5. http://www.malariajournal.com/content/2/1/5. [DOI] [PMC free article] [PubMed]

[R27] Stephan W, Wiehe THE, Lenz MW. The effect of strongly selected substitutions on neutral polymorphism: analytical results based on diffusion theory. Theor Popul Biol. 1992;41(2):237–254. http://www.sciencedirect.com/science/article/B6WXD-4F1Y9N0-3M/2/1245281bba0c6b542457fdd75c343edf.

[R28] Triglia T, Cowman AF. Primary structure and expression of the dihydropteroate synthetase gene of Plasmodium falciparum. Proc Natl Acad Sci USA. 1994;91(15):7149–7153. doi: 10.1073/pnas.91.15.7149. http://www.pnas.org/content/91/15/7149.abstract. [DOI] [PMC free article] [PubMed]

[R29] WHO. WHO Expert Committee on malaria. World Health Organ Tech Rep Ser. 2000;892:1–74. [PubMed]

[R30] Wootton JC, Feng X, Ferdig MT, Cooper RA, Mu J, Baruch DI, Magill AJ, Su X-z. Genetic diversity and chloroquine selective sweeps in Plasmodium falciparum. Nature. 2002;418(6895):320–323. doi: 10.1038/nature00813. http://dx.doi.org/10.1038/nature00813. [DOI] [PubMed]
