Author manuscript; available in PMC: 2012 Jun 1.
Published in final edited form as: J Math Biol. 2010 Jul 11;62(6):789–832. doi: 10.1007/s00285-010-0353-9

Approximations for the hitchhiking effect caused by the evolution of antimalarial-drug resistance

Kristan A Schneider 1,2,3, Yuseob Kim 4,5
PMCID: PMC3242009  NIHMSID: NIHMS287685  PMID: 20623287

Abstract

An analytically feasible, deterministic model for the spread of drug resistance among human malaria parasites, which incorporates all characteristics of the complex malaria-transmission cycle, was introduced by Schneider and Kim (Theor. Popul Biol, 2010). The model accounts for the fact that only a fraction of infected hosts receive drug treatment and that hosts can be co-infected by different numbers of parasites. Furthermore, the model incorporates host heterogeneity. Antimalarial-drug resistance is assumed to be caused by a single locus with two alleles, a sensitive one and a resistant one. The most important result for this model is that an analytical solution for the frequencies of a linked neutral biallelic locus exists. However, the exact solution does not admit an explicit form and cannot be interpreted straightforwardly in terms of the model parameters. Here, we establish simple approximations for the equilibrium frequency at the neutral locus. Under the assumption that the resistant allele is initially rare (the biologically most relevant assumption in this context) and that recombination is weak, the approximations become similar to the approximations in the standard hitchhiking model. However, there are crucial differences. In particular, because of the high degree of selfing among malaria parasites in their sexual phase, a genome-wide reduction of relative heterozygosity occurs if selection is sufficiently strong. It turns out that the approximations are accurate even if the recombination rates are not small and the resistant allele is initially not very rare. The main advantage of our approximations is that they are easy to interpret in terms of model parameters. Moreover, they allow predictions of the size of the valley of reduced heterozygosity around the selected locus for given model parameters. Conversely, for a given reduction of heterozygosity, it is possible to identify the corresponding parameters. Finally, we show that incorporating host heterogeneity leads to an increased hitchhiking effect.

Keywords: Selective sweep, Co-infections, Drug concentration, Relative heterozygosity, Host heterogeneity, Malaria, Plasmodium falciparum

1 Introduction

Human malaria is an infectious disease caused by parasites belonging to the genus Plasmodium, which is endemic in most tropical and subtropical regions of the world. Infections with Plasmodium falciparum, the most virulent form of human malaria, result in one to three million deaths per year worldwide (cf. WHO 2000; Korenromp et al. 2005). Malaria control is highly dependent on drug treatments that kill parasites in infected hosts. However, attempts to control malaria have been thwarted by the rapid evolution of antimalarial-drug resistance, which has been described as a public health disaster (cf. Marsh 1998).

The limited repertoire of safe, effective, and affordable antimalarial drugs has made research on the emergence and dispersion of resistance a global health priority. Mathematical models that can use input from genetic data to investigate the dynamics of mutations associated with drug resistance are urgently needed for designing drug-deployment policies that can increase the lifespan of the available drugs. This requires a detailed understanding of population genetic processes that lead to the emergence and dispersion of drug resistance. Unfortunately, the highly complex nature of the malaria-transmission cycle as well as complex demographic and environmental factors aggravate the efforts to elaborate theoretical models.

Malaria parasites undergo a complex transmission cycle with sexual phases in the mosquito vector and asexual phases in the infected host (cf. e.g. Daily 2006; Prugnolle et al. 2009). A human host is inoculated with sporozoites by the bite of an infected Anopheles mosquito during its blood meal. In the human host the sporozoites first migrate to the liver, where they differentiate into hepatic merozoites. These are released into the blood stream, where some of them form gametocytes. The haploid gametocytes are taken up by a mosquito during its blood meal and immediately reproduce sexually in the mosquito's gut. Consequently, they undergo recombination. Completing the transmission cycle, this step is followed by the production of haploid sporozoites in the mosquito's salivary glands, from which they can be inoculated into a human host.

Another source of complication is that many environmental and clinical factors differ significantly across the worldwide distribution of this parasite. The transmission rate and hence the number of secondary infections varies from very low rates in parts of South America, over intermediate rates in Southeast Asia, to high rates in Africa. On the other hand the level of host-acquired immunity is much higher (which means a high number of asymptomatic infections) in most of the affected areas in Africa than in other parts of the world. Such variation in host-acquired immunity affects drug use.

Among others, two important variables that summarize the demographic and clinical setting of a particular geographic area are the (average) number of parasites (m) co-infecting a given host (the average multiplicity of infection), which is determined by transmission intensity, and the proportion (α) of infected hosts that are drug treated. The latter parameter depends mainly on how many hosts have acquired immunity.

Several studies built population genetic models that demonstrated profound effects of m and α on the rate of drug-resistance evolution (Hastings and Mackinnon 1998; Mackinnon and Hastings 1998; Hastings 2003, 2006). However, it is still unclear which mechanisms are important to the spread of resistance, e.g., intra-host dynamics, drug half-life, multiple drug treatment, migration, multiple infections, recombination, or mutation.

Over the last two decades substantial advances regarding the genetic basis of antimalarial-drug resistance have been made. It is known that specific point mutations in the dhfr and dhps regions underlie resistance to pyrimethamine and sulfadoxine (cf. Cowman et al. 1988; Triglia and Cowman 1994; Brooks et al. 1994), that point mutations in mitochondrial DNA underlie resistance to atovaquone (cf. Schwobel et al. 2003), and that mutations in the pfcrt gene cause resistance to chloroquine (CQ).

The spread of mutations causing drug resistance leads to a valley of reduced genetic variation at linked neutral regions. This removal of pre-existing variation occurs because recombination cannot effectively break the initial association with the neutral background in which the mutant first occurred, a process known as genetic hitchhiking or a selective sweep (Maynard Smith and Haigh 1974; Stephan et al. 1992; Barton 2000). For instance, Nair et al. (2003) observed a severe reduction of variation at microsatellite loci spanning over a 100 kb region surrounding the dhfr gene in a Southeast Asian population of P. falciparum. The extent of this pattern depends on how fast the favored allele increases to high frequency while meiotic recombination is constantly eroding the association between the favored allele and the surrounding chromosome segment (Kim and Stephan 2002). Selective sweeps have been initially studied for randomly mating populations of constant size, and homogenous constant selection pressures (e.g., Maynard Smith and Haigh 1974), to which we refer as the ‘standard model’, or standard ‘hitchhiking’.

In malaria biology, the detection of selective sweeps has mainly contributed to confirming the location of drug-resistance mutations and elucidating their mutational origins (Wootton et al. 2002; Nash et al. 2005; Mita et al. 2007; Nair et al. 2007; McCollum et al. 2008), while fewer studies have attempted to relate the span of selective sweeps to the strength of drug selection (Nair et al. 2003). Recently, Schneider and Kim (2010) introduced an analytically feasible model for the spread of antimalarial-drug resistance, which allows the study of genetic hitchhiking. The model covers the important characteristics of the transmission cycle, incorporates host heterogeneity, i.e., different classes of treated and untreated hosts, and accounts for the fact that hosts can be infected by different numbers of parasites. Moreover, the model yields simple conditions for the spread of resistance and its speed in terms of the fitness parameters and α. Hence, it is useful for finding 'optimal' treatment strategies to prevent or slow down the spread of resistance (for more discussion see Schneider and Kim 2010).

Studies based on known recombination rates and the result of the standard model of selective sweeps concluded that the observed patterns of selective sweeps are compatible with the predictions (Nair et al. 2003). However, Schneider and Kim (2010) discuss in detail why the application of the standard selective-sweep model to a malaria-parasite population is highly problematic. In particular, the high degree of selfing among malaria parasites will lead to a genome-wide reduction of heterozygosity if selection is sufficiently strong (cf. also Hedrick 1980).

In the absence of reliable public-health records, a retrospective analysis of the mechanisms and parameters underlying the spread of antimalarial-drug resistance may be achieved indirectly through the patterns of selective sweeps. However, the analytical solution for the valley of reduced heterozygosity in the model of Schneider and Kim (2010) is not explicit, which limits its applicability. Moreover, it is not at all straightforward to interpret these results in terms of the model parameters.

In this article we derive accurate approximations to the exact, analytical solutions of Schneider and Kim (2010). Under the assumption that the resistant allele is initially rare and that recombination is weak, the approximations are similar to, but not identical with, the usual approximation for standard hitchhiking. Unlike in standard hitchhiking, our approximations are accurate even if the recombination rates are large and the resistant allele is initially not very rare. Moreover, using the approximations, we will show that incorporating host heterogeneity results in an increased hitchhiking effect compared with our basic model with corresponding selection parameters, which accounts for just one class of treated and one class of untreated hosts. The simple form of the approximations will allow us to easily interpret the effect of genetic hitchhiking in terms of the model parameters. Such approximations are needed to make the results of Schneider and Kim (2010) applicable to real data. Moreover, we show that the approximate solutions can easily be applied to identify the range of parameters that give rise to given levels of reduced heterozygosity in a given range of recombination distances. Finally, we discuss the differences compared with standard hitchhiking and give an outlook on how our results can be applied to real data.

2 The model

We consider two biallelic loci. The first locus is subject to selection, with a sensitive allele $A_S$ and a resistant allele $A_R$ segregating. The second locus is selectively neutral, with the alleles $N_1$ and $N_2$ segregating. We will use the notation and parametrization summarized in Table 1.

Table 1.

Summary of notation

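| Haplotype | $A_SN_1$ | $A_SN_2$ | $A_RN_1$ | $A_RN_2$ |
|---|---|---|---|---|
| Frequency | $p_1$ | $p_2$ | $p_3$ | $p_4$ |
| Fitness in untreated hosts | $W_1^{(U)} = 1$ | $W_2^{(U)} = 1$ | $W_3^{(U)} = 1-s$ | $W_4^{(U)} = 1-s$ |
| Fitness in treated hosts | $W_1^{(T)} = 1-d_S$ | $W_2^{(T)} = 1-d_S$ | $W_3^{(T)} = 1-d_R$ | $W_4^{(T)} = 1-d_R$ |
| Sensitive/resistant | Sen. | Sen. | Res. | Res. |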

The table shows the notation for the four haplotypes, their frequencies, their fitnesses in treated and untreated hosts, the parametrization of fitnesses that we use for the illustrations in the following sections, and whether the haplotypes are sensitive or resistant. Here, $s$ reflects metabolic costs of the resistant allele, while $d_S$ and $d_R$ indicate how efficiently the drug wipes out the sensitive and resistant parasites, respectively.

Let $p$ denote the frequency of the resistant allele $A_R$. We have $p = p_3 + p_4$. Furthermore, we denote the frequency of the neutral allele $N_1$ among the sensitive and resistant haplotypes by $R$ and $Q$, respectively. We therefore have $R = \frac{p_1}{p_1+p_2} = \frac{p_1}{1-p}$ and $Q = \frac{p_3}{p_3+p_4} = \frac{p_3}{p}$.

Moreover, we denote the recombination rate between the two loci by $r$, and the vector of haplotype frequencies by $\mathbf{p} = (p_1, \dots, p_4)$. We assume that each host is infected randomly and independently by exactly $m$ haploids (parasite strains). (We assume $m$ to be a fixed parameter until Sect. 4.2, where we assume that $m$ follows a fixed frequency distribution.)

Hosts acquire parasite strains according to their frequencies in the mosquito phase. Hence, the configuration of infections in hosts is multinomially distributed with parameters $m$ and $p_1, \dots, p_4$. We assume that all haploids in a newly infected host have equal frequencies. Hence, the relative frequency of a haploid drawn from the parasite population among mosquitos is $\frac{1}{m}$. Let us denote a multi-index by $\mathbf{m} = (m_1, \dots, m_4)$, and the sum over its components by $|\mathbf{m}| = \sum_{i=1}^{4} m_i$. The probability that a host is infected by $m_i$ copies of haplotype $i$ ($i = 1, \dots, 4$, $|\mathbf{m}| = m$) is given by

$$\binom{m}{\mathbf{m}}\,\mathbf{p}^{\mathbf{m}}, \tag{1}$$

where $\binom{m}{\mathbf{m}} = \frac{m!}{\prod_{i=1}^{4} m_i!}$ denotes the respective multinomial coefficient and, as usual, $\mathbf{p}^{\mathbf{m}} := \prod_{i=1}^{4} p_i^{m_i}$.

After a host is infected, the parasites reproduce clonally in the host. An infected host either receives drug treatment or is untreated. We assume that a fixed proportion $\alpha$ of infected hosts in the population is treated, whereas the remaining hosts are untreated. Thus, the probability for a host to be treated is simply $\alpha$. The rate of reproduction of the haplotypes is different in treated and untreated hosts. The absolute fitness of a parasite strain is the expected number of its descendants in the host at the time of the mosquito visit. The absolute frequency (fitness) of haplotype $i$ in an untreated host before a mosquito takes its blood meal is denoted by $\frac{m_i}{m}W_i^{(U)}$, whereas the absolute frequency of haplotype $i$ in a treated host before a mosquito takes its blood meal is denoted by $\frac{m_i}{m}W_i^{(T)}$. [In the following, wherever it is appropriate, we use the superscript $(.)$ to stand for either of the superscripts $(U)$ and $(T)$.] Some of these haploids form male or female gametocytes. Their frequencies are assumed to be proportional to the number of respective haplotypes. Furthermore, we assume that male and female gametocytes occur at the same frequencies. We assume that the number of different gametocytes taken up by a mosquito during its blood meal from an infected host is proportional to their frequency in the host. Let $\gamma$ denote the constant of proportionality, which is assumed to be the same for each mosquito. Hence, if the absolute frequencies of parasites in an infected host are $\frac{m_1}{m}W_1^{(.)}, \dots, \frac{m_4}{m}W_4^{(.)}$, the absolute frequencies of parasites absorbed during the blood meal are $\gamma\frac{m_1}{m}W_1^{(.)}, \dots, \gamma\frac{m_4}{m}W_4^{(.)}$. Note that this takes drug efficiency into account, because a mosquito will absorb a smaller number of parasites from a host in which drugs have efficiently eliminated parasites.

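In the mosquitos' guts recombination occurs immediately after the blood meal, during the phase in which meiosis occurs. In the gut of a mosquito that has taken its blood meal from a host initially infected with $m_i$ copies of haplotype $i$ ($i = 1, \dots, 4$), the probability that a male $k$-gametocyte fertilizes a female $l$-gametocyte is

$$\frac{\gamma\frac{m_k}{m}W_k^{(.)}}{\frac{\gamma}{m}W_{\mathbf{m}}^{(.)}}\cdot\frac{\gamma\frac{m_l}{m}W_l^{(.)}}{\frac{\gamma}{m}W_{\mathbf{m}}^{(.)}} = \frac{m_kW_k^{(.)}\,m_lW_l^{(.)}}{\left(W_{\mathbf{m}}^{(.)}\right)^2},$$

where

$$\frac{\gamma}{m}W_{\mathbf{m}}^{(.)} = \frac{\gamma}{m}\sum_{k=1}^{4} m_kW_k^{(.)}$$

is the frequency of parasite haploids in the mosquito's gut. The above probability is the relative frequency of a male $k$-gametocyte times that of a female $l$-gametocyte. Therefore, the absolute frequency of pairings of a male $k$-gametocyte and a female $l$-gametocyte is the probability of such a fertilization times the absolute number of parasites in the gut, i.e.,

$$\frac{\gamma^2\frac{m_k}{m}W_k^{(.)}\frac{m_l}{m}W_l^{(.)}}{\frac{\gamma}{m}W_{\mathbf{m}}^{(.)}} = \frac{\gamma}{m}\,\frac{m_kW_k^{(.)}\,m_lW_l^{(.)}}{W_{\mathbf{m}}^{(.)}}. \tag{2}$$

The probability that a fertilization of a female $l$-gametocyte by a male $k$-gametocyte produces a haplotype $i$ is denoted by $R(kl \to i)$. Therefore, the absolute frequencies of haplotype $i$ in the population of mosquitos that took their blood meal from treated and untreated hosts are, respectively,

$$p_i^{(.)} = \sum_{|\mathbf{m}| = m}\binom{m}{\mathbf{m}}\,\mathbf{p}^{\mathbf{m}}\sum_{k,l=1}^{4}\frac{\gamma\, m_kW_k^{(.)}\,m_lW_l^{(.)}}{m\,W_{\mathbf{m}}^{(.)}}\,R(kl \to i),$$

where the sum runs over all multi-indices $\mathbf{m}$ with nonnegative integer components and $|\mathbf{m}| = m$. Therefore, the relative frequencies of haplotypes in the mosquito population in the next generation, denoted by $p_i'$, become

$$p_i' = \frac{\alpha p_i^{(T)} + (1-\alpha)p_i^{(U)}}{\alpha\sum_{k=1}^{4}p_k^{(T)} + (1-\alpha)\sum_{k=1}^{4}p_k^{(U)}} = \frac{p_i^{*}}{\sum_{k=1}^{4}p_k^{*}}, \tag{3a}$$

where

$$p_i^{*} := \alpha p_i^{(T)} + (1-\alpha)p_i^{(U)}. \tag{3b}$$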

For what follows, let us define the average fitness of the resistant allele among treated and untreated hosts by

$$\lambda := \alpha W_3^{(T)} + (1-\alpha)W_3^{(U)} \tag{4a}$$

and that of the sensitive allele by

$$\mu := \alpha W_1^{(T)} + (1-\alpha)W_1^{(U)}. \tag{4b}$$

If $p(t)$ denotes the frequency of the resistant allele at time $t$, it changes according to $p(t) = \frac{\lambda^t p(0)}{\lambda^t p(0) + \mu^t(1-p(0))}$, i.e., according to the standard haploid one-locus selection model (cf. Schneider and Kim 2010). Thus, the resistant allele will sweep through the population if and only if $\lambda > \mu$; otherwise it will be lost (or remain constant in frequency). Hence, in the following we will always assume $\lambda > \mu$ without further mention. Note that the spread of the resistant allele is independent of $m$. Moreover, it is straightforward to calculate the time until a given level of resistance is reached (cf. Result 2 in Schneider and Kim 2010). Because of the relatively simple form of the dynamics of the resistant allele, it is possible to derive simple conditions for the spread of resistance and its speed. Example 1 in Schneider and Kim (2010) directly links the fitness parameters to the spread of fixation. Furthermore, assuming that the fitnesses in treated hosts are functions of the administered drug concentration, Example 2 in Schneider and Kim (2010) illustrates the impact of the drug concentration and $\alpha$ on the spread of resistance and its speed. Assuming that the fitness of resistant parasites decays more slowly than that of the sensitive ones as a function of the drug concentration, they found that resistance spreads most quickly for intermediate drug concentrations and large $\alpha$.
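The one-locus dynamics described above can be sketched numerically. The following is a minimal illustration, not code from the paper: the function names and the parameter values (`alpha`, `s`, `d_S`, `d_R`) are assumptions chosen for the example, and the waiting time is obtained here simply by inverting the odds relation $p(t)/(1-p(t)) = (\lambda/\mu)^t\,p(0)/(1-p(0))$ that follows from the closed form, which need not match the exact formulation of Result 2 in Schneider and Kim (2010).

```python
import math

def resistant_frequency(p0, lam, mu, t):
    """Closed-form p(t) of the haploid one-locus selection model."""
    grown = lam**t * p0
    return grown / (grown + mu**t * (1.0 - p0))

def next_generation(p, lam, mu):
    """One step of the recursion p' = lam*p / (lam*p + mu*(1 - p))."""
    return lam * p / (lam * p + mu * (1.0 - p))

def generations_until(p0, lam, mu, target):
    """Real-valued t with p(t) = target; requires lam > mu and p0 < target."""
    odds_ratio = (target / (1.0 - target)) / (p0 / (1.0 - p0))
    return math.log(odds_ratio) / math.log(lam / mu)

# Illustrative (assumed) parameter values, using the parametrization of Table 1.
alpha, s, d_S, d_R = 0.5, 0.1, 0.9, 0.2
lam = alpha * (1 - d_R) + (1 - alpha) * (1 - s)  # (4a)
mu = alpha * (1 - d_S) + (1 - alpha) * 1.0       # (4b)
t99 = generations_until(1e-3, lam, mu, 0.99)     # time to reach 99% resistance
```

Because only the ratio $\lambda/\mu$ enters the waiting time, any change of $\alpha$ or of the drug efficacies acts on the speed of the sweep solely through that ratio.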

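However, the focus of this article is genetic hitchhiking. The following relations were derived in Schneider and Kim (2010):

$$\bar{W}p_i' = p_i\left(\alpha W_i^{(T)} + (1-\alpha)W_i^{(U)}\right) + r_i\,p(1-p)(Q-R)\,\vartheta_{p,m}, \tag{5a}$$

where

$$r_1 = -r_2 = -r_3 = r_4 = r,\qquad \bar{W} = p\lambda + (1-p)\mu,\qquad \vartheta_{p,m} := \sum_{k=0}^{m-2}\binom{m-2}{k}\theta_{k,m}\,p^k(1-p)^{m-2-k}, \tag{5b}$$

$$\theta_{k,m} := \alpha\theta_{k,m}^{(T)} + (1-\alpha)\theta_{k,m}^{(U)}, \tag{5c}$$

and

$$\theta_{k,m}^{(.)} := \frac{(m-1)\,W_1^{(.)}W_3^{(.)}}{(m-1-k)\,W_1^{(.)} + (k+1)\,W_3^{(.)}}. \tag{5d}$$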
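As a concrete reading of (5b)-(5d), the sketch below evaluates $\theta_{k,m}^{(.)}$, $\theta_{k,m}$ and $\vartheta_{p,m}$ directly from their definitions. The function names and the convention of passing each host class as a pair $(W_1, W_3)$ are assumptions made for this illustration only.

```python
from math import comb

def theta_host(k, m, w_sens, w_res):
    """(5d) for one host class, with w_sens = W_1 and w_res = W_3."""
    return (m - 1) * w_sens * w_res / ((m - 1 - k) * w_sens + (k + 1) * w_res)

def theta(k, m, alpha, fit_T, fit_U):
    """(5c): fit_T and fit_U are (W_1, W_3) pairs for treated/untreated hosts."""
    return (alpha * theta_host(k, m, *fit_T)
            + (1 - alpha) * theta_host(k, m, *fit_U))

def vartheta(p, m, alpha, fit_T, fit_U):
    """(5b): binomially weighted average of theta_{k,m} over k = 0..m-2."""
    return sum(
        comb(m - 2, k) * theta(k, m, alpha, fit_T, fit_U)
        * p**k * (1 - p) ** (m - 2 - k)
        for k in range(m - 1)
    )
```

For $m = 2$ the sum has the single term $k = 0$, so `vartheta` does not depend on `p`; this is the property exploited in Sect. 3.1.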

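It was further shown in Schneider and Kim (2010) that

$$R' = R + r\,p\,(Q-R)\,\frac{\vartheta_{p,m}}{\mu} \tag{6a}$$

and

$$Q' = Q - r\,(1-p)\,(Q-R)\,\frac{\vartheta_{p,m}}{\lambda}. \tag{6b}$$

From the relations (5) and (6) the equilibrium frequency of the neutral allele $N_1$ was calculated to be

$$\hat{Q}(m) = Q_0 + \frac{r(R_0-Q_0)}{\lambda}\sum_{\tau=0}^{\infty}\frac{\vartheta_{p_\tau,m}}{\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{\tau}+1}\prod_{l=0}^{\tau-1}\Lambda_{m,l}, \tag{7}$$

where $p_0$, $Q_0$, and $R_0$ are the respective initial frequencies,

$$\vartheta_{p_t,m} = \left(\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{t}+1\right)^{2-m}\sum_{k=0}^{m-2}\binom{m-2}{k}\theta_{k,m}\left(\frac{p_0}{1-p_0}\right)^{k}\left(\frac{\lambda}{\mu}\right)^{tk},$$

and

$$\Lambda_{m,t} := 1 - \frac{r\,\vartheta_{p_t,m}}{\lambda\mu}\left(\frac{p_0\lambda^{t+1}+(1-p_0)\mu^{t+1}}{p_0\lambda^{t}+(1-p_0)\mu^{t}}\right). \tag{8}$$

It was mentioned by Schneider and Kim (2010) that, for practical purposes, it suffices to sum (7) until the time of quasi-fixation, for which they provided an explicit formula. However, it is not at all obvious how $\hat{Q}(m)$ is influenced by the various model parameters. In particular, when studying genetic hitchhiking using (7), it seems infeasible to predict the width of the valley of reduced heterozygosity in terms of the recombinational distance. Hence, simple but accurate approximations for (7) that permit a simple interpretation in terms of the model parameters are highly desirable.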
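To make the relation between the recursions (6a), (6b) and the sum (7) tangible, the following sketch computes the equilibrium frequency in two ways: by iterating the recursions together with the one-locus recursion for $p$, and by summing (7) with the factors (8). It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, `vartheta_fn` is any user-supplied function $p \mapsto \vartheta_{p,m}$, and the fixed cutoff of 300 generations stands in for the time of quasi-fixation.

```python
def q_hat_iterated(p0, Q0, R0, r, lam, mu, vartheta_fn, generations=300):
    """Iterate (6a), (6b) and p' = lam*p/(lam*p + mu*(1-p)); return final Q."""
    p, Q, R = p0, Q0, R0
    for _ in range(generations):
        v = vartheta_fn(p)
        diff = Q - R
        Q, R = Q - r * (1 - p) * diff * v / lam, R + r * p * diff * v / mu
        p = lam * p / (lam * p + mu * (1 - p))
    return Q

def q_hat_summed(p0, Q0, R0, r, lam, mu, vartheta_fn, generations=300):
    """Truncated version of the sum (7) with the factors Lambda_{m,l} of (8)."""
    c, b = p0 / (1 - p0), lam / mu
    total, product = 0.0, 1.0
    for tau in range(generations):
        odds = c * b**tau               # p_tau / (1 - p_tau)
        p_tau = odds / (1 + odds)
        v = vartheta_fn(p_tau)
        total += v * product / (odds + 1)
        # The bracketed ratio in (8) equals p_tau*lam + (1 - p_tau)*mu.
        product *= 1 - r * v / (lam * mu) * (p_tau * lam + (1 - p_tau) * mu)
    return Q0 + r * (R0 - Q0) / lam * total
```

Both routines should agree up to rounding, since $1 - p_\tau = 1/\bigl(\tfrac{p_0}{1-p_0}(\lambda/\mu)^\tau + 1\bigr)$ and $Q_\tau - R_\tau = (Q_0 - R_0)\prod_{l<\tau}\Lambda_{m,l}$.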

3 Approximations

We shall now calculate simple approximations for (7). First, we treat the special case m = 2, and later deduce the general case from it.

3.1 Equilibrium frequencies at the neutral locus for m = 2

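The equilibrium frequency of the neutral allele $N_1$ is given by

$$\hat{Q} := \hat{Q}(2) = Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\sum_{\tau=0}^{\infty}\frac{\prod_{l=0}^{\tau-1}\Lambda_l}{\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{\tau}+1},$$

where

$$\vartheta := \vartheta_{p_t,2} \tag{9}$$

is independent of the trajectory $p_t$ of the resistant allele, and in particular of $p_0$, and

$$\Lambda_l := \Lambda_{2,l}. \tag{10}$$

In the following we will derive upper and lower bounds as well as approximations for $\hat{Q}$ that have a much simpler form. This permits us to explore the effects of the various parameters on $\hat{Q}$.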

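First, note that $\Lambda_l$ is monotone decreasing in $l$. To see this, note that $\Lambda_l < \Lambda_{l-1}$ is equivalent to

$$1 - \frac{r\vartheta}{\lambda\mu}\left(\frac{p_0\lambda^{l+1}+(1-p_0)\mu^{l+1}}{p_0\lambda^{l}+(1-p_0)\mu^{l}}\right) < 1 - \frac{r\vartheta}{\lambda\mu}\left(\frac{p_0\lambda^{l}+(1-p_0)\mu^{l}}{p_0\lambda^{l-1}+(1-p_0)\mu^{l-1}}\right).$$

This can be simplified to

$$\lambda^{l-1}\mu^{l+1} + \lambda^{l+1}\mu^{l-1} > 2\lambda^{l}\mu^{l}.$$

Dividing by $\lambda^{l-1}\mu^{l-1}$, this simplifies to

$$(\lambda-\mu)^2 > 0.$$

Moreover, note that

$$\underline{\Lambda} := \lim_{l\to\infty}\Lambda_l = \lim_{l\to\infty}\left[1 - \frac{r\vartheta}{\lambda\mu}\left(\frac{p_0\lambda+(1-p_0)\mu\left(\frac{\mu}{\lambda}\right)^{l}}{p_0+(1-p_0)\left(\frac{\mu}{\lambda}\right)^{l}}\right)\right] = 1 - \frac{r\vartheta}{\mu}. \tag{11}$$

Hence, by using

$$\overline{\Lambda} := \Lambda_0 = 1 - \frac{r\vartheta\,(p_0\lambda+(1-p_0)\mu)}{\lambda\mu} \ge \Lambda_l \ge \underline{\Lambda}, \tag{12}$$

we obtain

$$\underline{\Lambda}^{\tau} \le \prod_{l=0}^{\tau-1}\Lambda_l \le \overline{\Lambda}^{\tau}.$$

Furthermore, because $\lambda > \mu$ we have

$$1 - \frac{r\vartheta\,(p_0\lambda+(1-p_0)\mu)}{\lambda\mu} \le 1 - \frac{r\vartheta\,(p_0\mu+(1-p_0)\mu)}{\lambda\mu} = 1 - \frac{r\vartheta}{\lambda} =: \tilde{\Lambda}, \tag{13}$$

and consequently $\tilde{\Lambda} \ge \overline{\Lambda}$.

Let us define

$$Q(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\sum_{\tau=0}^{\infty}\frac{a^{\tau}}{\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{\tau}+1}. \tag{14}$$

Note that by relabeling the alleles, we can assume without loss of generality that $R_0 > Q_0$. Hence, (14) is monotone increasing in $a$. Thus, we can formulate our first result.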

Result 1

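Let $Q(a)$ be defined as in (14). Then, $Q(a)$ is monotone increasing in $a$. Moreover, let $\underline{\Lambda}$, $\overline{\Lambda}$ and $\tilde{\Lambda}$ be defined as in (11), (12), and (13), respectively. Then

$$\overline{Q} := Q(\overline{\Lambda}) \quad\text{and}\quad \tilde{Q} := Q(\tilde{\Lambda})$$

are upper bounds for $\hat{Q}$, and

$$\underline{Q} := Q(\underline{\Lambda})$$

is a lower bound for $\hat{Q}$.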
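Result 1 can be checked numerically for $m = 2$. The sketch below is a minimal illustration with hypothetical function names and assumed parameter values; it truncates all infinite sums after a fixed number of terms, which is harmless here because the summands decay geometrically.

```python
def Q_of_a(a, p0, Q0, R0, r, lam, mu, vth, terms=400):
    """Truncated evaluation of Q(a) as defined in (14)."""
    c, b = p0 / (1 - p0), lam / mu
    series = sum(a**tau / (c * b**tau + 1) for tau in range(terms))
    return Q0 + r * vth * (R0 - Q0) / lam * series

def lambda_bounds(p0, r, lam, mu, vth):
    """Return (Lambda_lower, Lambda_bar, Lambda_tilde) from (11)-(13)."""
    lower = 1 - r * vth / mu
    bar = 1 - r * vth * (p0 * lam + (1 - p0) * mu) / (lam * mu)
    tilde = 1 - r * vth / lam
    return lower, bar, tilde

def q_hat_m2(p0, Q0, R0, r, lam, mu, vth, terms=400):
    """Exact equilibrium frequency for m = 2 (sum with the products of Lambda_l)."""
    c, b = p0 / (1 - p0), lam / mu
    total, product = 0.0, 1.0
    for tau in range(terms):
        odds = c * b**tau
        p_tau = odds / (1 + odds)
        total += product / (odds + 1)
        product *= 1 - r * vth / (lam * mu) * (p_tau * lam + (1 - p_tau) * mu)
    return Q0 + r * vth * (R0 - Q0) / lam * total

# Assumed example values (R0 > Q0, lam > mu, vth < mu).
args = dict(p0=0.05, Q0=0.2, R0=0.6, r=0.05, lam=0.85, mu=0.55, vth=0.28)
low, bar, tilde = lambda_bounds(args["p0"], args["r"], args["lam"], args["mu"], args["vth"])
Q_low, Q_bar, Q_tilde = (Q_of_a(a, **args) for a in (low, bar, tilde))
Q_exact = q_hat_m2(**args)
```

With these values one expects `Q_low <= Q_exact <= Q_bar <= Q_tilde`, in line with Result 1.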

Although the expressions for these bounds are simpler than that of $\hat{Q}$, they are still not very explicit. Hence, we shall derive upper and lower bounds for them, or more generally for $Q(a)$.

First, note that $\vartheta < \lambda$. This is because it is equivalent to

$$\alpha W_3^{(T)} + (1-\alpha)W_3^{(U)} > \alpha\frac{W_1^{(T)}W_3^{(T)}}{W_1^{(T)}+W_3^{(T)}} + (1-\alpha)\frac{W_1^{(U)}W_3^{(U)}}{W_1^{(U)}+W_3^{(U)}},$$

which obviously holds because

$$\frac{W_1^{(T)}}{W_1^{(T)}+W_3^{(T)}} < 1 \quad\text{and}\quad \frac{W_1^{(U)}}{W_1^{(U)}+W_3^{(U)}} < 1.$$

Essentially the same calculation shows $\vartheta < \mu$. Therefore, we have $\frac{\vartheta}{\lambda} < 1$ and $\frac{\vartheta}{\mu} < 1$. This combined with (11) and (13) implies $\frac{1}{2} < \underline{\Lambda} < \tilde{\Lambda} \le 1$, since $r \in [0, \frac{1}{2}]$. Hence, in the following we will always assume $a \in [\underline{\Lambda}, \tilde{\Lambda}] \subseteq [\frac{1}{2}, 1]$.

Notice that $\tilde{Q}$ has exactly the same form as in the standard haploid hitchhiking model derived by Maynard Smith and Haigh (1974; eq. 8), with $r$ replaced by $\frac{\vartheta r}{\lambda}$ and $p_0$ replaced by $\frac{\mu p_0}{\lambda(1-p_0)+\mu p_0}$. Thus, we can interpret this as studying standard hitchhiking with an 'effective' recombination rate $\frac{\vartheta r}{\lambda}$, which is smaller than $r$, and an 'effective' initial frequency of $\frac{\mu p_0}{\lambda(1-p_0)+\mu p_0}$, which is smaller than the actual initial frequency. The adjusted recombination rate leads to a more severe hitchhiking effect, whereas the adjusted initial frequency leads to a less pronounced hitchhiking effect. The combination of these factors leads to a more severe hitchhiking effect (cf. Schneider and Kim 2010).
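The correspondence with standard hitchhiking can be verified numerically. The sketch below assumes that the standard haploid result can be written as $Q_0 + (R_0-Q_0)(1-p)\sum_{n\ge 0} \rho(1-\rho)^n / \bigl(1-p+p\,\sigma^{n+1}\bigr)$ with recombination rate $\rho$, initial frequency $p$ and fitness ratio $\sigma$; this particular form, like all function names below, is an assumption of the example rather than a quotation of Maynard Smith and Haigh (1974). Under that assumption, inserting the effective parameters reproduces $Q(a)$ from (14) at $a = 1 - \vartheta r/\lambda$.

```python
def effective_parameters(p0, r, lam, mu, vth):
    """'Effective' recombination rate and initial frequency described above."""
    r_eff = vth * r / lam
    p_eff = mu * p0 / (lam * (1 - p0) + mu * p0)
    return r_eff, p_eff

def standard_hitchhiking(p_init, rec, fitness_ratio, Q0, R0, terms=400):
    """Assumed form of the standard haploid hitchhiking sum (see lead-in)."""
    series = sum(
        rec * (1 - rec) ** n / (1 - p_init + p_init * fitness_ratio ** (n + 1))
        for n in range(terms)
    )
    return Q0 + (R0 - Q0) * (1 - p_init) * series

def Q_tilde(p0, Q0, R0, r, lam, mu, vth, terms=400):
    """Q(a) from (14) evaluated at a = 1 - vth*r/lam."""
    a, c, b = 1 - vth * r / lam, p0 / (1 - p0), lam / mu
    series = sum(a**tau / (c * b**tau + 1) for tau in range(terms))
    return Q0 + r * vth * (R0 - Q0) / lam * series

p0, Q0, R0, r, lam, mu, vth = 0.05, 0.2, 0.6, 0.05, 0.85, 0.55, 0.28
r_eff, p_eff = effective_parameters(p0, r, lam, mu, vth)
via_standard = standard_hitchhiking(p_eff, r_eff, lam / mu, Q0, R0)
via_model = Q_tilde(p0, Q0, R0, r, lam, mu, vth)
```

The two values coincide because $p_{\text{eff}}/(1-p_{\text{eff}}) = \frac{\mu}{\lambda}\,\frac{p_0}{1-p_0}$, which shifts the exponent of $\lambda/\mu$ in the denominator by one.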

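Next, we need two definitions. For $a \in \mathbb{R}$ and $k \in \mathbb{N}_+$ the upper Pochhammer symbol is defined by

$$(a)^{\overline{k}} := \prod_{l=0}^{k-1}(a+l).$$

The hypergeometric function is defined by

$${}_rF_s\!\left(\left.\begin{matrix}a_1,\dots,a_r\\ b_1,\dots,b_s\end{matrix}\,\right|\,z\right) := \sum_{k=0}^{\infty}\frac{z^k}{k!}\,\frac{\prod_{u=1}^{r}(a_u)^{\overline{k}}}{\prod_{\upsilon=1}^{s}(b_\upsilon)^{\overline{k}}},$$

where $r, s \in \mathbb{N}$, $z, a_1, \dots, a_r \in \mathbb{R}$, and $b_1, \dots, b_s$ are real numbers that are not non-positive integers.

With the above definitions we are already able to formulate our first theorem.

Theorem 1

Assume $a \ne \left(\frac{\mu}{\lambda}\right)^{l}$ for $l \in \mathbb{N}$, and let $\csc x = \frac{1}{\sin x}$ denote the cosecant of $x$. If $p_0 < \frac{1}{2}$, the function

$$\psi(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\left[\frac{\pi\csc\!\left(\frac{\pi\log a}{\log\lambda-\log\mu}\right)}{\log\lambda-\log\mu}\left(\frac{p_0}{1-p_0}\right)^{-\frac{\log a}{\log\lambda-\log\mu}} - \frac{1}{\log a}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \frac{\log a}{\log\lambda-\log\mu}\\ 1+\frac{\log a}{\log\lambda-\log\mu}\end{matrix}\,\right|\,-\frac{p_0}{1-p_0}\right)\right] \tag{15}$$

is a lower bound for $Q(a)$. If $p_0 < \frac{\lambda}{\lambda+\mu}$,

$$\Psi(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\left[\frac{\pi\csc\!\left(\frac{\pi\log a}{\log\lambda-\log\mu}\right)}{\log\lambda-\log\mu}\left(\frac{p_0}{1-p_0}\right)^{-\frac{\log a}{\log\lambda-\log\mu}} - \frac{1}{a\log a}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \frac{\log a}{\log\lambda-\log\mu}\\ 1+\frac{\log a}{\log\lambda-\log\mu}\end{matrix}\,\right|\,-\frac{p_0\mu}{(1-p_0)\lambda}\right)\right] \tag{16}$$

is an upper bound for $Q(a)$.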

Proof

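Consider the function

$$g(x) = \frac{a^x}{c\,b^x+1} \tag{17}$$

for $x > -1$ with $a \in (0, 1)$, and let

$$b := \frac{\lambda}{\mu} > 1, \quad\text{and}\quad c := \frac{p_0}{1-p_0}. \tag{18}$$

By calculating the derivative, which is given by

$$g'(x) = \frac{a^x\left(\log a + c\,b^x(\log a - \log b)\right)}{(1+c\,b^x)^2}$$

and obviously always negative, we recognize that $g$ is monotone decreasing as a function of $x$. Thus, we obtain the estimate

$$\int_0^{\infty}g(x)\,dx \le \sum_{\tau=0}^{\infty}g(\tau) \le \int_0^{\infty}g(x-1)\,dx = \int_{-1}^{\infty}g(x)\,dx. \tag{19}$$

Since we have

$$Q(a) = Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\sum_{\tau=0}^{\infty}g(\tau),$$

it follows from (19) that

$$\psi(a) = Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\int_0^{\infty}g(x)\,dx \tag{20}$$

is a lower bound for $Q(a)$, and that

$$\Psi(a) = Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\int_{-1}^{\infty}g(x)\,dx \tag{21}$$

is an upper bound for $Q(a)$.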

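We will now manipulate the above equations. First, rewrite $g$ as

$$g(x) = \frac{e^{x\xi}}{1+c\,e^{x\eta}}, \tag{22}$$

where

$$\xi := \log a \quad\text{and}\quad \eta := \log b. \tag{23}$$

Assume first that $c\,e^{x\eta} < 1$. Then $g$ can be expanded into a geometric series,

$$g(x) = e^{x\xi}\sum_{k=0}^{\infty}\left(-c\,e^{x\eta}\right)^k = \sum_{k=0}^{\infty}(-c)^k e^{x(\xi+k\eta)}, \tag{24}$$

which converges absolutely for all $x$ satisfying $c\,e^{x\eta} < 1$. If $c\,e^{x\eta} > 1$, we have $\frac{1}{c\,e^{x\eta}} < 1$ and can expand (22) into the power series

$$g(x) = \frac{e^{x\xi}}{c\,e^{x\eta}}\cdot\frac{1}{1+\frac{1}{c\,e^{x\eta}}} = \frac{e^{x\xi}}{c\,e^{x\eta}}\sum_{k=0}^{\infty}\left(-c\,e^{x\eta}\right)^{-k} = -\sum_{k=0}^{\infty}(-c)^{-(k+1)}e^{x(-\eta(k+1)+\xi)},$$

which converges absolutely for all $x$ satisfying $c\,e^{x\eta} > 1$.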

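Let us further define $x_0 := -\frac{\log c}{\log b}$. The assumption $p_0 < \frac{1}{2}$ guarantees $x_0 > 0$. The absolute convergence of the above series enables us to integrate term by term, i.e., for $x < x_0$ we have

$$G_1(x) = \int g(x)\,dx = \int\sum_{k=0}^{\infty}(-c)^k e^{x(\eta k+\xi)}\,dx = \sum_{k=0}^{\infty}(-c)^k\int e^{x(\eta k+\xi)}\,dx \tag{25a}$$

$$= \sum_{k=0}^{\infty}(-c)^k\frac{e^{x(\eta k+\xi)}}{\eta k+\xi} = \frac{e^{x\xi}}{\eta}\sum_{k=0}^{\infty}\frac{(-c)^k e^{x\eta k}}{k+\frac{\xi}{\eta}} = \frac{e^{x\xi}}{\xi}\sum_{k=0}^{\infty}\frac{\frac{\xi}{\eta}}{k+\frac{\xi}{\eta}}\left(-c\,e^{x\eta}\right)^k \tag{25b}$$

$$= \frac{e^{x\xi}}{\xi}\sum_{k=0}^{\infty}\frac{\left(\frac{\xi}{\eta}\right)^{\overline{k}}(1)^{\overline{k}}}{\left(1+\frac{\xi}{\eta}\right)^{\overline{k}}}\,\frac{\left(-c\,e^{x\eta}\right)^k}{k!} = \frac{e^{x\xi}}{\xi}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \xi/\eta\\ 1+\xi/\eta\end{matrix}\,\right|\,-c\,e^{x\eta}\right). \tag{25c}$$

Note that we have used the fact that $\frac{\xi}{\eta} \ne -k$ for all $k \in \mathbb{N}$ in (25b), which is guaranteed by the assumption $a \ne \left(\frac{\mu}{\lambda}\right)^k$ for all $k \in \mathbb{N}$.

Similarly, for $x > x_0$ we have

$$G_2(x) = \int g(x)\,dx = -\int\sum_{k=0}^{\infty}(-c)^{-(k+1)}e^{x(-\eta(k+1)+\xi)}\,dx = -\sum_{k=0}^{\infty}(-c)^{-(k+1)}\int e^{x(-\eta(k+1)+\xi)}\,dx \tag{26a}$$

$$= -\sum_{k=0}^{\infty}(-c)^{-(k+1)}\frac{e^{x(-\eta(k+1)+\xi)}}{-\eta(k+1)+\xi} = \frac{e^{x\xi}}{\eta}\sum_{k=0}^{\infty}\frac{(-c)^{-(k+1)}e^{-x\eta(k+1)}}{(k+1)-\frac{\xi}{\eta}} \tag{26b}$$

$$= \frac{e^{x\xi}}{\eta}\sum_{k=1}^{\infty}\frac{(-c)^{-k}e^{-x\eta k}}{k-\frac{\xi}{\eta}} = -\frac{e^{x\xi}}{\xi}\sum_{k=1}^{\infty}\frac{-\frac{\xi}{\eta}}{k-\frac{\xi}{\eta}}\left(-\frac{1}{c\,e^{x\eta}}\right)^k \tag{26c}$$

$$= -\frac{e^{x\xi}}{\xi}\sum_{k=1}^{\infty}\frac{\left(-\frac{\xi}{\eta}\right)^{\overline{k}}(1)^{\overline{k}}}{\left(1-\frac{\xi}{\eta}\right)^{\overline{k}}}\,\frac{\left(-\frac{1}{c\,e^{x\eta}}\right)^k}{k!} = \frac{e^{x\xi}}{\xi}\left[1 - {}_2F_1\!\left(\left.\begin{matrix}1,\ -\xi/\eta\\ 1-\xi/\eta\end{matrix}\,\right|\,-\frac{1}{c\,e^{x\eta}}\right)\right]. \tag{26d}$$

It should be mentioned that we did not need the assumption $a \ne \left(\frac{\mu}{\lambda}\right)^k$ for all $k \in \mathbb{N}$ in (26).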

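Next, assume $p_0 < \frac{1}{2}$, i.e., $x_0 > 0$. Hence, we have

$$\int_0^{\infty}g(x)\,dx = \int_0^{x_0}g(x)\,dx + \int_{x_0}^{\infty}g(x)\,dx = \lim_{x\to x_0-}G_1(x) - G_1(0) + \lim_{x\to+\infty}G_2(x) - \lim_{x\to x_0+}G_2(x).$$

Since $\frac{1}{k+\xi/\eta} \to 0$, the Leibniz criterion implies that

$$G_1(x_0) = \frac{e^{-\xi\frac{\log c}{\eta}}}{\eta}\sum_{k=0}^{\infty}\frac{(-1)^k}{k+\frac{\xi}{\eta}} \tag{27}$$

converges, i.e., $-\infty < G_1(x_0) < \infty$. Furthermore, by a similar argument we obtain $-\infty < G_2(x_0) < \infty$. We also see that

$$|G_2(x)| = \left|\frac{e^{x\xi}}{\eta}\sum_{k=1}^{\infty}\frac{(-c)^{-k}e^{-x\eta k}}{k-\frac{\xi}{\eta}}\right| \le \frac{e^{x\xi}}{\eta}\sum_{k=1}^{\infty}\left|\frac{(-c)^{-k}e^{-x\eta k}}{k-\frac{\xi}{\eta}}\right| = \frac{e^{x\xi}}{\eta}\sum_{k=1}^{\infty}\left|\frac{1}{c\,e^{x\eta}}\right|^k\frac{1}{k-\frac{\xi}{\eta}} \le \frac{e^{x\xi}}{\eta}\sum_{k=1}^{\infty}\left|\frac{1}{c\,e^{x\eta}}\right|^k = \frac{e^{x\xi}}{\eta\left(c\,e^{x\eta}-1\right)}.$$

Hence, for $x > x_0$ this implies that the series representation (26d) of $G_2(x)$ is absolutely convergent. Therefore, we obtain

$$\lim_{x\to x_0+}G_2(x) = G_2(x_0) \tag{28}$$

and

$$\lim_{x\to+\infty}G_2(x) = \frac{0}{\xi}\left[1 - {}_2F_1\!\left(\left.\begin{matrix}1,\ -\xi/\eta\\ 1-\xi/\eta\end{matrix}\,\right|\,0\right)\right] = 0. \tag{29}$$

Similarly, we see that the series representation of $G_1(x)$ is absolutely convergent for $x < x_0$. Therefore, we have

$$\lim_{x\to x_0-}G_1(x) = G_1(x_0). \tag{30}$$

We obtain

$$\int_0^{\infty}g(x)\,dx = G_1(x_0) - G_1(0) - G_2(x_0) = \frac{e^{-\xi\frac{\log c}{\eta}}}{\xi}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \xi/\eta\\ 1+\xi/\eta\end{matrix}\,\right|\,-1\right) - \frac{1}{\xi}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \xi/\eta\\ 1+\xi/\eta\end{matrix}\,\right|\,-c\right) - \frac{e^{-\xi\frac{\log c}{\eta}}}{\xi}\left[1 - {}_2F_1\!\left(\left.\begin{matrix}1,\ -\xi/\eta\\ 1-\xi/\eta\end{matrix}\,\right|\,-1\right)\right]. \tag{31}$$

If $p_0 < \frac{\lambda}{\lambda+\mu}$, we have $x_0 > -1$. The above calculations are still valid and yield

$$\int_{-1}^{\infty}g(x)\,dx = G_1(x_0) - G_1(-1) - G_2(x_0) = \frac{e^{-\xi\frac{\log c}{\eta}}}{\xi}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \xi/\eta\\ 1+\xi/\eta\end{matrix}\,\right|\,-1\right) - \frac{e^{-\xi}}{\xi}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \xi/\eta\\ 1+\xi/\eta\end{matrix}\,\right|\,-c\,e^{-\eta}\right) - \frac{e^{-\xi\frac{\log c}{\eta}}}{\xi}\left[1 - {}_2F_1\!\left(\left.\begin{matrix}1,\ -\xi/\eta\\ 1-\xi/\eta\end{matrix}\,\right|\,-1\right)\right]. \tag{32}$$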

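Let us now simplify (31) and (32). Note that

$$A := {}_2F_1\!\left(\left.\begin{matrix}1,\ \kappa\\ 1+\kappa\end{matrix}\,\right|\,-1\right) + {}_2F_1\!\left(\left.\begin{matrix}1,\ -\kappa\\ 1-\kappa\end{matrix}\,\right|\,-1\right) = \sum_{k=0}^{\infty}\frac{(1)^{\overline{k}}(\kappa)^{\overline{k}}}{(\kappa+1)^{\overline{k}}}\,\frac{(-1)^k}{k!} + \sum_{k=0}^{\infty}\frac{(1)^{\overline{k}}(-\kappa)^{\overline{k}}}{(1-\kappa)^{\overline{k}}}\,\frac{(-1)^k}{k!} = \sum_{k=0}^{\infty}\frac{(-1)^k\kappa}{\kappa+k} + \sum_{k=0}^{\infty}\frac{(-1)^k\kappa}{\kappa-k} = \sum_{k=0}^{\infty}\frac{2\kappa^2(-1)^k}{\kappa^2-k^2} = 2\kappa^2\pi^2\sum_{k=0}^{\infty}\frac{(-1)^k}{\kappa^2\pi^2-k^2\pi^2}.$$

By using the well-known series

$$\csc(z) = -\frac{1}{z} + 2z\sum_{k=0}^{\infty}\frac{(-1)^k}{z^2-k^2\pi^2} \quad\text{for } z \notin \pi\mathbb{Z},$$

we arrive at

$$A = \pi\kappa\csc(\pi\kappa) + 1.$$

Because $a \ne \left(\frac{\mu}{\lambda}\right)^k$ for $k \in \mathbb{N}$, the ratio $\frac{\xi}{\eta}$ is not a non-positive integer. Because we assume $a \in [0, 1]$ and $\lambda > \mu$, we also have $a \ne \left(\frac{\mu}{\lambda}\right)^k$ for $k \in \mathbb{Z}$. Hence, we have $\frac{\xi}{\eta} \notin \mathbb{Z}$. Therefore, applying the above with $\kappa = \frac{\xi}{\eta}$ to (31) and (32), in which the two hypergeometric functions evaluated at $-1$ enter through $A - 1$, we obtain

$$\int_0^{\infty}g(x)\,dx = \frac{\pi\csc\!\left(\pi\frac{\xi}{\eta}\right)}{\eta}\,c^{-\frac{\xi}{\eta}} - \frac{1}{\xi}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \xi/\eta\\ 1+\xi/\eta\end{matrix}\,\right|\,-c\right)$$

and

$$\int_{-1}^{\infty}g(x)\,dx = \frac{\pi\csc\!\left(\pi\frac{\xi}{\eta}\right)}{\eta}\,c^{-\frac{\xi}{\eta}} - \frac{e^{-\xi}}{\xi}\,{}_2F_1\!\left(\left.\begin{matrix}1,\ \xi/\eta\\ 1+\xi/\eta\end{matrix}\,\right|\,-c\,e^{-\eta}\right).$$

Combining the above with (20) and (21) and the definitions of $\xi$, $\eta$, and $c$ immediately yields (15) and (16).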
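The two integrals at the end of the proof can be evaluated without a special-function library, because for $|z| < 1$ the hypergeometric function ${}_2F_1(1,\kappa;1+\kappa;z)$ reduces to the series $\sum_k z^k\kappa/(\kappa+k)$ used in (25b). The sketch below is a minimal illustration under that restriction (so it requires $p_0 < \frac12$ and $a \ne (\mu/\lambda)^l$); the function names and test values are assumptions of the example. It compares the integrals with the directly summed series $\sum_\tau g(\tau)$, which by (19) must lie between them.

```python
import math

def hyp2f1_special(kappa, z, tol=1e-16, max_terms=100000):
    """2F1(1, kappa; 1 + kappa; z) = sum_k z^k * kappa / (kappa + k), |z| < 1."""
    total, power = 0.0, 1.0
    for k in range(max_terms):
        term = power * kappa / (kappa + k)
        total += term
        if abs(term) < tol:
            break
        power *= z
    return total

def integral_bounds(a, p0, lam, mu):
    """Return (int_0^inf g dx, int_{-1}^inf g dx) for g(x) = a^x / (c b^x + 1)."""
    xi, eta = math.log(a), math.log(lam / mu)
    c = p0 / (1 - p0)
    kappa = xi / eta
    common = math.pi / math.sin(math.pi * kappa) / eta * c ** (-kappa)
    lower = common - hyp2f1_special(kappa, -c) / xi
    upper = common - hyp2f1_special(kappa, -c * mu / lam) / (a * xi)
    return lower, upper

def series_sum(a, p0, lam, mu, terms=400):
    """Directly summed sum_{tau >= 0} g(tau), truncated after `terms` terms."""
    c, b = p0 / (1 - p0), lam / mu
    return sum(a**tau / (c * b**tau + 1) for tau in range(terms))
```

Multiplying either integral by $r\vartheta(R_0-Q_0)/\lambda$ and adding $Q_0$ gives $\psi(a)$ and $\Psi(a)$ as in (20) and (21).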

The assumptions $p_0 < \frac{1}{2}$ or $p_0 < \frac{\lambda}{\lambda+\mu}$ hold whenever one aims to study the hitchhiking effect of a mutation that is initially rare. If the mutation is initially not rare, because one is interested in mutations from standing genetic variation that become beneficial, for instance because of a change in the environment, i.e., if one wants to study soft selective sweeps (cf. Hermisson and Pennings 2005), Theorem 1 is not applicable. The study of soft selective sweeps is relevant, and hence we shall also treat the cases $p_0 \ge \frac{1}{2}$ and $p_0 \ge \frac{\lambda}{\lambda+\mu}$.

Remark 1

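The proof of Theorem 1 reveals that we have to replace (15) by

$$\psi(a) = Q_0 - \frac{r\vartheta(R_0-Q_0)}{\lambda\log a}\left[1 - {}_2F_1\!\left(\left.\begin{matrix}1,\ \frac{\log a}{\log\mu-\log\lambda}\\ 1+\frac{\log a}{\log\mu-\log\lambda}\end{matrix}\,\right|\,-\frac{1-p_0}{p_0}\right)\right] \tag{33}$$

if $p_0 \ge \frac{1}{2}$. Moreover, we have to replace (16) by

$$\Psi(a) = Q_0 - \frac{r\vartheta(R_0-Q_0)}{\lambda\,a\log a}\left[1 - {}_2F_1\!\left(\left.\begin{matrix}1,\ \frac{\log a}{\log\mu-\log\lambda}\\ 1+\frac{\log a}{\log\mu-\log\lambda}\end{matrix}\,\right|\,-\frac{(1-p_0)\lambda}{p_0\mu}\right)\right] \tag{34}$$

if $p_0 \ge \frac{\lambda}{\lambda+\mu}$. Notably, (33) and (34) also hold if $a = \left(\frac{\mu}{\lambda}\right)^k$ for $k \in \mathbb{N}$. Furthermore, (33) and (34) are the continuations of (15) and (16) as functions of $p_0$.

Theorem 1 assumes $a \ne \left(\frac{\mu}{\lambda}\right)^k$ for all $k \in \mathbb{N}$. If this assumption is violated we have to make the following adjustments.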

Remark 2

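Assume $l \in \mathbb{N}$. If $p_0 < \frac{1}{2}$,

$$\psi\!\left(\left(\tfrac{\mu}{\lambda}\right)^{l}\right) = Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda(\log\mu-\log\lambda)}\left[\sum_{k=0}^{l-1}\frac{(-1)^k}{k-l}\left(\frac{p_0}{1-p_0}\right)^{k} + \left(-\frac{p_0}{1-p_0}\right)^{l}\log p_0\right] \tag{35}$$

is a lower bound for $Q\!\left(\left(\frac{\mu}{\lambda}\right)^{l}\right)$. If $p_0 < \frac{\lambda}{\lambda+\mu}$,

$$\Psi\!\left(\left(\tfrac{\mu}{\lambda}\right)^{l}\right) = Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda(\log\mu-\log\lambda)}\left[\sum_{k=0}^{l-1}\frac{(-1)^k}{k-l}\left(\frac{\mu}{\lambda}\right)^{k-l}\left(\frac{p_0}{1-p_0}\right)^{k} + \left(-\frac{p_0}{1-p_0}\right)^{l}\log\!\left(\frac{p_0\mu}{p_0\mu+(1-p_0)\lambda}\right)\right] \tag{36}$$

is an upper bound for $Q\!\left(\left(\frac{\mu}{\lambda}\right)^{l}\right)$. Moreover, (35) and (36) are the continuations of (15) and (16) in the limit $a \to \left(\frac{\mu}{\lambda}\right)^{l}$. If $p_0 \ge \frac{1}{2}$, (35) has to be replaced by (33). If $p_0 \ge \frac{\lambda}{\lambda+\mu}$, (36) has to be replaced by (34).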

The proof can be found in Appendix A.1.
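For $a = (\mu/\lambda)^l$ the integrals in (19) are elementary, which makes this case easy to check by hand or by machine. The sketch below is not taken from Appendix A.1: it uses closed forms obtained here by substituting $u = (\mu/\lambda)^x$ in $\int g\,dx$ and expanding $u^l/(u+c)$ by polynomial division, with signs fixed by requiring that the case $l = 0$ gives the positive value $-\log p_0/(\log\lambda-\log\mu)$. Function names and parameter values are assumptions of the example.

```python
import math

def integer_case_integrals(l, p0, lam, mu):
    """(int_0^inf g, int_{-1}^inf g) for g(x) = a^x/(c b^x + 1), a = (mu/lam)^l."""
    c = p0 / (1 - p0)
    pref = 1.0 / (math.log(mu) - math.log(lam))  # negative, since lam > mu
    lower = pref * (
        sum((-1) ** k / (k - l) * c**k for k in range(l))
        + (-c) ** l * math.log(p0)
    )
    upper = pref * (
        sum((-1) ** k / (k - l) * (mu / lam) ** (k - l) * c**k for k in range(l))
        + (-c) ** l * math.log(p0 * mu / (p0 * mu + (1 - p0) * lam))
    )
    return lower, upper

def series_sum_integer(l, p0, lam, mu, terms=400):
    """Directly summed sum_{tau >= 0} g(tau) with a = (mu/lam)^l."""
    c, b = p0 / (1 - p0), lam / mu
    a = (mu / lam) ** l
    return sum(a**tau / (c * b**tau + 1) for tau in range(terms))
```

As in (20) and (21), multiplying by $r\vartheta(R_0-Q_0)/\lambda$ and adding $Q_0$ turns the two integrals into lower and upper bounds for $Q\bigl((\mu/\lambda)^l\bigr)$.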

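Let

$$\tilde{\Psi} := \Psi(\tilde{\Lambda}),\quad \overline{\Psi} := \Psi(\overline{\Lambda}),\quad\text{and}\quad \underline{\Psi} := \Psi(\underline{\Lambda}) \tag{37}$$

and analogously

$$\tilde{\psi} := \psi(\tilde{\Lambda}),\quad \overline{\psi} := \psi(\overline{\Lambda}),\quad\text{and}\quad \underline{\psi} := \psi(\underline{\Lambda}). \tag{38}$$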

We immediately obtain the following corollary.

Corollary 1

We have Ψ̃ ≥ Ψ̅ ≥ ψ̲, hence Ψ̃ and Ψ̅ are upper bounds for Q̂, and ψ̲ is a lower bound for Q̂. Moreover, we have Ψ̃ ≥ ψ̃, Ψ̅ ≥ ψ̅, and Ψ̲ ≥ ψ̲.

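Figure 1 illustrates $\hat{Q}$ and its bounds $\overline{Q}$ and $\underline{Q}$, as well as their bounds $\overline{\Psi}$, $\underline{\Psi}$, $\overline{\psi}$, and $\underline{\psi}$ for various parameters. It becomes obvious that the upper bound is very close to $\hat{Q}$, whereas the lower bound greatly underestimates $\hat{Q}$.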

Fig. 1

Equilibrium frequency of the neutral allele $N_1$ and various bounds for $\hat{Q}$ as a function of $r$ for different parameter combinations. The panels show $\hat{Q}$ along with its upper and lower bounds $\overline{Q}$ and $\underline{Q}$, as well as their respective upper and lower bounds $\overline{\Psi}$ and $\overline{\psi}$, and $\underline{\Psi}$ and $\underline{\psi}$. The parameters for the various plot panels are specified in the boxes above the panels. In all panels $\hat{Q}$ and its upper bound $\overline{Q}$ are almost identical

It turns out that $\overline{Q}$ and $\tilde{Q}$ are almost identical unless $p_0$ is large. The same holds for $\tilde{\Psi}$ and $\overline{\Psi}$. This is illustrated in Fig. 2.

Fig. 2

Equilibrium frequency of the neutral allele $N_1$ and various bounds of $\hat{Q}$ as a function of $r$ for different parameter combinations. The panels show $\hat{Q}$ along with its upper bounds $\overline{Q}$ and $\tilde{Q}$, as well as their respective upper and lower bounds $\overline{\Psi}$ and $\overline{\psi}$, and $\underline{\Psi}$ and $\underline{\psi}$. The parameters for the various plot panels are specified in the boxes above the panels

Although $\Psi(a)$ and $\psi(a)$ have closed-form expressions, these expressions involve the hypergeometric function. In the following we shall derive approximations for $\Psi(a)$ and $\psi(a)$ that make no use of the hypergeometric function. Our first step is the following theorem.

Theorem 2

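For $p_0 < \frac{1}{2}$ let

$$\phi(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\left[\left(\frac{p_0}{1-p_0}\right)^{\frac{\log a}{\log\mu-\log\lambda}}\left(\frac{1}{\log a} - \frac{1}{\log a+\log\lambda-\log\mu} + \frac{1}{\log\lambda-\log\mu-\log a}\right) - \frac{1}{\log a} + \frac{p_0}{(\log a+\log\lambda-\log\mu)(1-p_0)}\right] \tag{39}$$

and, for $p_0 < \frac{\lambda}{\lambda+\mu}$, let

$$\Phi(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\left[\left(\frac{p_0}{1-p_0}\right)^{\frac{\log a}{\log\mu-\log\lambda}}\left(\frac{1}{\log a} - \frac{1}{\log a+\log\lambda-\log\mu} + \frac{1}{\log\lambda-\log\mu-\log a}\right) - \frac{1}{a\log a} + \frac{p_0\mu}{a(\log a+\log\lambda-\log\mu)(1-p_0)\lambda}\right]. \tag{40}$$

If $p_0 < \frac{1}{2}$, $\phi(a) \approx \psi(a)$, and if $p_0 < \frac{\lambda}{\lambda+\mu}$, $\Phi(a) \approx \Psi(a)$.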

The proof can be found in Appendix A.1.

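If $p_0 \ge \frac{1}{2}$ or $p_0 \ge \frac{\lambda}{\lambda+\mu}$, we have to modify the definitions of $\phi(a)$ or $\Phi(a)$, respectively.

Remark 3

For $p_0 \ge \frac{1}{2}$ let

$$\phi(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)(1-p_0)}{\lambda\,p_0\,(\log\lambda-\log\mu-\log a)} \tag{41}$$

and, for $p_0 \ge \frac{\lambda}{\lambda+\mu}$, let

$$\Phi(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)(1-p_0)}{\mu\,a\,p_0\,(\log\lambda-\log\mu-\log a)}. \tag{42}$$

If $p_0 \ge \frac{1}{2}$, $\phi(a) \approx \psi(a)$, and if $p_0 \ge \frac{\lambda}{\lambda+\mu}$, $\Phi(a) \approx \Psi(a)$. Moreover, (41) and (42) are the continuations of (39) and (40) regarded as functions of $p_0$.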

The proof can be found in Appendix A.1.
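The elementary approximations are simple enough to evaluate directly. The sketch below implements the bracketed parts of $\phi$ and $\Phi$ (that is, the approximations of $\int_0^\infty g\,dx$ and $\int_{-1}^\infty g\,dx$ before multiplication by $r\vartheta(R_0-Q_0)/\lambda$) in the form obtained here by truncating the hypergeometric series after the linear term and the cosecant series after its first correction. Signs and the factor $1/a$ in the second branch of `Phi_integral` are fixed by requiring continuity in $p_0$ at the switching points; these choices, the function names, and the test values are assumptions of this example and have not been checked against Appendix A.1. The case $a = \mu/\lambda$, where $\log a + \log\lambda - \log\mu = 0$, is excluded.

```python
import math

def phi_integral(a, p0, lam, mu):
    """Elementary approximation of int_0^inf g dx (bracket of phi)."""
    xi, eta = math.log(a), math.log(lam / mu)
    c = p0 / (1 - p0)
    if p0 < 0.5:
        bracket = 1 / xi - 1 / (xi + eta) + 1 / (eta - xi)
        return c ** (-xi / eta) * bracket - 1 / xi + c / (xi + eta)
    return 1 / (c * (eta - xi))

def Phi_integral(a, p0, lam, mu):
    """Elementary approximation of int_{-1}^inf g dx (bracket of Phi)."""
    xi, eta = math.log(a), math.log(lam / mu)
    c = p0 / (1 - p0)
    if p0 < lam / (lam + mu):
        bracket = 1 / xi - 1 / (xi + eta) + 1 / (eta - xi)
        return (c ** (-xi / eta) * bracket - 1 / (a * xi)
                + c * mu / (lam * a * (xi + eta)))
    return lam / (mu * a * c * (eta - xi))

def series_sum(a, p0, lam, mu, terms=400):
    """Directly summed sum_{tau >= 0} a^tau / (c b^tau + 1)."""
    c, b = p0 / (1 - p0), lam / mu
    return sum(a**tau / (c * b**tau + 1) for tau in range(terms))

# The average of the two approximations estimates the sum itself.
midpoint = 0.5 * (phi_integral(0.9, 0.05, 0.85, 0.55)
                  + Phi_integral(0.9, 0.05, 0.85, 0.55))
```

For the assumed values ($a = 0.9$, $p_0 = 0.05$, $\lambda = 0.85$, $\mu = 0.55$) the midpoint differs from the directly summed series by roughly one to two percent.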

From Theorem 2, Remark 3, and the definitions of Λ̲, Λ̅, and Λ̃ we immediately obtain the following corollary.

Corollary 2

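Let $\Psi$, $\psi$, $\Phi$ and $\phi$ be defined as in Theorems 1 and 2, and Remarks 1–3. Furthermore, let

$$\tilde{\Phi} := \Phi(\tilde{\Lambda}),\quad \overline{\Phi} := \Phi(\overline{\Lambda}),\quad \underline{\Phi} := \Phi(\underline{\Lambda}) \tag{43}$$

and

$$\tilde{\phi} := \phi(\tilde{\Lambda}),\quad \overline{\phi} := \phi(\overline{\Lambda}),\quad \underline{\phi} := \phi(\underline{\Lambda}). \tag{44}$$

Then $\tilde{\Phi} \approx \tilde{\Psi}$, $\overline{\Phi} \approx \overline{\Psi}$, $\underline{\Phi} \approx \underline{\Psi}$, $\tilde{\phi} \approx \tilde{\psi}$, $\overline{\phi} \approx \overline{\psi}$, and $\underline{\phi} \approx \underline{\psi}$.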

The approximations provided in Corollary 2 are very accurate unless p0 becomes too large. This is illustrated in Fig. 3.

Fig. 3

Equilibrium frequency of the neutral allele $N_1$ and various bounds of $\hat{Q}$ as a function of $r$ for different parameter combinations. The panels show $\hat{Q}$ along with its upper bounds $\overline{Q}$ and $\tilde{Q}$, as well as their respective upper and lower bounds $\overline{\Psi}$ and $\overline{\psi}$, and $\tilde{\Psi}$ and $\tilde{\psi}$, along with their approximations $\overline{\Phi}$ and $\overline{\phi}$, and $\tilde{\Phi}$ and $\tilde{\phi}$, respectively. The parameters for the various plot panels are specified in the boxes above the panels

Note that we derived the upper and lower bounds for Q(a) by using the estimate (19). However, we can use (19) for the standard estimate

$$\sum_{k=0}^{\infty} g(k) \approx \frac{1}{2}\int_{-1}^{\infty} g(x)\,dx + \frac{1}{2}\int_{0}^{\infty} g(x)\,dx. \tag{45}$$

Hence, the estimates summarized in the next corollary follow immediately.

Corollary 3

Estimates for Q̂ are given by

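$$\overline{Q}^{*} := \tfrac{1}{2}\left(\overline{\Psi}+\overline{\psi}\right)\quad\text{and}\quad \tilde Q^{*} := \tfrac{1}{2}\left(\tilde\Psi+\tilde\psi\right), \tag{46}$$

and

$$\overline{Q}^{\circ} := \tfrac{1}{2}\left(\overline{\Phi}+\overline{\phi}\right)\quad\text{and}\quad \tilde Q^{\circ} := \tfrac{1}{2}\left(\tilde\Phi+\tilde\phi\right). \tag{47}$$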

The approximations of Corollary 3 are illustrated in Fig. 4, which shows that they are very accurate over a range of parameter combinations.
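The averaging step behind (45) can be tried on a toy example. The sketch below is only an illustration: it uses a hypothetical decreasing summand g(x) = exp(−cx), not the function g of (17), because its sum and both integrals have elementary closed forms. For a positive decreasing g the integral from 0 bounds the sum from below and the integral from −1 bounds it from above, and their mean is typically much closer to the sum than either bound.

```python
import math

def integral_to_infinity(lower, c):
    """Closed form of the integral of exp(-c*x) from `lower` to infinity."""
    return math.exp(-c * lower) / c

c = 0.1                                        # hypothetical decay rate
exact_sum = 1.0 / (1.0 - math.exp(-c))         # geometric series sum_{k>=0} exp(-c*k)
upper = integral_to_infinity(-1.0, c)          # integral from -1: upper bound
lower = integral_to_infinity(0.0, c)           # integral from 0: lower bound
midpoint = 0.5 * (upper + lower)               # averaged estimate as in (45)

print("sum      ", exact_sum)
print("bounds   ", lower, upper)
print("midpoint ", midpoint)
```

The same averaging is what turns the pairs (Ψ, ψ) and (Φ, ϕ) into the single estimates of Corollary 3.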

Fig. 4.

Equilibrium frequency Q̂ of the neutral allele N1 and various bounds of Q̂ as a function of r for different parameter combinations. The panels show Q̂ along with its upper bounds, as well as its approximations Q̅*, Q̃*, Q̅°, and Q̃°. The parameters are the same as in Fig. 3.

Although we have already arrived at relatively simple approximations for Q̂, they are still difficult to interpret in terms of the involved parameters. In the following we will concentrate on the case in which the initial frequency p0 of the resistant allele is small. This is the most relevant case for studying genetic hitchhiking.

3.2 Approximations for rare mutations

So far in our derivations we did not assume that the initial frequency p0 of the resistant allele is small. However, since this is the biologically most relevant situation, we shall derive further approximations for the equilibrium frequencies of the neutral alleles under the assumption that the mutation is initially rare, i.e., p0 ≈ 0 and 1 − p0 ≈ 1. Since we have Λ̃ ≈ Λ̅ under this assumption, we focus on estimates based on Λ̃.

For p0 ≈ 0 we obtain the following theorem.

Theorem 3

Let

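$$A(a) := Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\left[\frac{1}{\log a}\left(\left(\frac{p_0}{1-p_0}\right)^{\frac{\log a}{\log\mu-\log\lambda}}-1\right)-\frac{1-a}{2a\log a}\right]. \tag{48}$$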

For p0 ≈ 0 we have Q°(a) ≈ A(a).

The proof can be found in Appendix A.1.

In accordance with our previous notation we set

$$\overline{A} := A(\overline{\Lambda})\quad\text{and}\quad \tilde A := A(\tilde\Lambda). \tag{49}$$

These approximations are illustrated in Fig. 5.

Fig. 5.

Equilibrium frequency Q̂ of the neutral allele N1 and various bounds of Q̂ as a function of r for different parameter combinations. The panels show Q̂ along with its upper bounds, as well as the approximations A̅, Ã, B̅, B̃. The parameters are the same as in Fig. 3.

Now, we shall additionally assume that a ≈ 1.

Theorem 4

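Let a(x) = 1 − rx with $x \in \left[\frac{\vartheta}{\lambda}, \frac{\vartheta}{\mu}\right]$ and

$$B(x) := Q_0 + \frac{\vartheta(R_0-Q_0)}{\lambda x}\left(1-p_0^{\frac{rx}{\log\lambda-\log\mu}}\right). \tag{50}$$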

For p0 ≈ 0 and rx ≈ 0 we have Q°(a(x)) ≈ B(x).

Proof

We have log a(x) ≈ −rx. Hence,

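$$Q^{\circ}(a(x)) \approx Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}\left[-\frac{1}{rx}\left(\left(\frac{p_0}{1-p_0}\right)^{\frac{rx}{\log\lambda-\log\mu}}-1\right)+\frac{rx}{2(1-rx)\,rx}\right] = Q_0 + \frac{\vartheta(R_0-Q_0)}{\lambda x}\left[1-\left(\frac{p_0}{1-p_0}\right)^{\frac{rx}{\log\lambda-\log\mu}}+\frac{rx}{2(1-rx)}\right].$$

If rx ≈ 0, we can neglect $\frac{rx}{2(1-rx)}$ and obtain

$$Q^{\circ}(a(x)) \approx Q_0 + \frac{\vartheta(R_0-Q_0)}{\lambda x}\left(1-\left(\frac{p_0}{1-p_0}\right)^{\frac{rx}{\log\lambda-\log\mu}}\right).$$

Moreover, because p0 ≈ 0 we have $\frac{p_0}{1-p_0} \approx p_0$. Thus,

$$Q^{\circ}(a(x)) \approx Q_0 + \frac{\vartheta(R_0-Q_0)}{\lambda x}\left(1-p_0^{\frac{rx}{\log\lambda-\log\mu}}\right).$$

Note that we have $a\!\left(\frac{\vartheta(p_0\lambda+(1-p_0)\mu)}{\lambda\mu}\right) = \overline{\Lambda}$ and $a\!\left(\frac{\vartheta}{\lambda}\right) = \tilde\Lambda$. Let us define

$$\overline{B} := B\!\left(\frac{\vartheta(p_0\lambda+(1-p_0)\mu)}{\lambda\mu}\right)\quad\text{and}\quad \tilde B := B\!\left(\frac{\vartheta}{\lambda}\right). \tag{51}$$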

We obtain the following corollary.

Corollary 4

We have

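$$\tilde B = R_0 - (R_0-Q_0)\,p_0^{\frac{r\vartheta}{\lambda(\log\lambda-\log\mu)}}. \tag{52}$$

For p0 ≈ 0 and $\frac{r\vartheta}{\lambda} \approx 0$, we have Q̃° ≈ B̃.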

Figure 5 illustrates the approximations A̅, Ã, B̅, B̃. The approximations are very accurate unless p0 is too large (p0 ≳ 0.1), and they remain acceptably accurate even for p0 = 0.1.
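To get a feeling for (52), the closed form can be evaluated directly. The following is a minimal sketch; the function name `b_tilde` and all numerical values are hypothetical choices for illustration and are not taken from the figures.

```python
import math

def b_tilde(R0, Q0, p0, r, theta, lam, mu):
    """Approximation (52): R0 - (R0 - Q0) * p0**(r*theta / (lam*(log lam - log mu)))."""
    if not (lam > mu > 0 and 0 < p0 < 1):
        raise ValueError("expects lam > mu > 0 and 0 < p0 < 1")
    exponent = r * theta / (lam * (math.log(lam) - math.log(mu)))
    return R0 - (R0 - Q0) * p0 ** exponent

# Hypothetical setting: N1 has frequency 0.5 among sensitive parasites (R0),
# and the resistant allele arose on an N1 background (Q0 = 1).
params = dict(R0=0.5, Q0=1.0, p0=1e-3, theta=0.5, lam=0.95, mu=0.55)
for r in (0.0, 0.01, 0.1, 1.0):
    print(r, b_tilde(r=r, **params))
```

Without recombination (r = 0) the neutral allele keeps its initial frequency Q0 among resistant parasites, whereas for large r the value approaches R0, i.e., the hitchhiking effect vanishes.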

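By expanding $p_0^{x}$ into a Taylor series around x = 0, and by setting $x = \frac{r\vartheta}{\lambda(\log\lambda-\log\mu)}$, we obtain

$$\tilde B = R_0 - (R_0-Q_0)\sum_{k=0}^{\infty}\frac{r^{k}}{k!}\left(\frac{\vartheta}{\lambda(\log\lambda-\log\mu)}\right)^{k}\log^{k} p_0. \tag{53}$$

Hence, for r ≈ 0 we can further approximate this by neglecting terms of order O(r²) or O(r³) and higher, and obtain

$$\tilde C = Q_0 - (R_0-Q_0)\frac{r\vartheta\log p_0}{\lambda(\log\lambda-\log\mu)} \tag{54}$$

and

$$\tilde D = Q_0 - (R_0-Q_0)\frac{r\vartheta\log p_0}{\lambda(\log\lambda-\log\mu)} - (R_0-Q_0)\frac{r^{2}}{2}\left(\frac{\vartheta}{\lambda(\log\lambda-\log\mu)}\right)^{2}\log^{2} p_0. \tag{55}$$

Figure 6 shows Q̂ along with its approximations C̃ and D̃. These approximations are accurate only for very small r. Not surprisingly, the quadratic approximation D̃ is much better than the linear approximation C̃.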
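The loss of accuracy of the truncated series can be checked numerically. The sketch below assumes that the linear and quadratic approximations are the first- and second-order Taylor polynomials in r of (52), written with z = rϑ log p0 / (λ(log λ − log μ)); the function name and parameter values are hypothetical.

```python
import math

def hitchhiking_terms(R0, Q0, p0, r, theta, lam, mu):
    """Return (B, C, D): closed form (52) and its 1st/2nd-order truncations in r."""
    z = r * theta * math.log(p0) / (lam * (math.log(lam) - math.log(mu)))  # z <= 0
    B = R0 - (R0 - Q0) * math.exp(z)      # p0**x == exp(x * log p0)
    C = Q0 - (R0 - Q0) * z                # keep terms up to order r
    D = C - (R0 - Q0) * z * z / 2.0       # keep terms up to order r**2
    return B, C, D

params = dict(R0=0.5, Q0=1.0, p0=1e-3, theta=0.5, lam=0.95, mu=0.55)  # hypothetical
for r in (0.001, 0.01, 0.05, 0.2):
    B, C, D = hitchhiking_terms(r=r, **params)
    print(f"r={r}: B={B:.5f}  |C-B|={abs(C - B):.2e}  |D-B|={abs(D - B):.2e}")
```

Because log p0 is large in absolute value for a rare allele, z leaves the neighbourhood of zero quickly, which is why both truncations are useful only for very small r.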

Fig. 6.

Equilibrium frequency Q̂ of the neutral allele N1 and various bounds of Q̂ as a function of r for various parameter combinations. The panels show Q̂ along with its upper bounds, as well as the approximations C̃ and D̃.

3.3 The hitchhiking effect for general m

Since the sensitive allele goes extinct in the population, the equilibrium frequency of the neutral allele N1 is given by (7), i.e., by

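$$\hat Q^{(m)} = Q_0 + \frac{r(R_0-Q_0)}{\lambda}\sum_{\tau=0}^{\infty}\frac{\vartheta_{p_\tau,m}}{1+\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{\tau+1}}\prod_{l=0}^{\tau-1}\Lambda_{m,l}.$$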

The above formula has the same structure as in the case m = 2. If we set m = 1 there is no recombination and hence the initial proportions of the neutral allele among genotypes with the resistant allele remain constant over time. Hence, we have $\hat Q^{(1)} = Q_0^{(1)}$.

We shall now derive approximations for Q̂(m). As is easily seen from (7), we can apply our approximations from the case m = 2 if we make the approximations $\prod_{l=0}^{\tau-1}\Lambda_{m,l} \approx a^{\tau}$ and $\vartheta_{p_\tau,m} \approx b$. We first approximate $\vartheta_{p_\tau,m}$ by a constant. From (5b) we obtain

$$\vartheta_{p_\tau,m} = \sum_{k=0}^{m-2}\binom{m-2}{k}p_\tau^{k}(1-p_\tau)^{m-2-k}\times\left(\alpha\frac{(m-1)W_1^{(T)}W_3^{(T)}}{(m-k-1)W_1^{(T)}+(k+1)W_3^{(T)}}+(1-\alpha)\frac{(m-1)W_1^{(U)}W_3^{(U)}}{(m-k-1)W_1^{(U)}+(k+1)W_3^{(U)}}\right).$$

Let

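$$\overline{\vartheta}^{(m)} := \max_{k=0,\ldots,m-2}\left\{\alpha\frac{(m-1)W_1^{(T)}W_3^{(T)}}{(m-k-1)W_1^{(T)}+(k+1)W_3^{(T)}}+(1-\alpha)\frac{(m-1)W_1^{(U)}W_3^{(U)}}{(m-k-1)W_1^{(U)}+(k+1)W_3^{(U)}}\right\} \tag{56}$$

and

$$\underline{\vartheta}^{(m)} := \min_{k=0,\ldots,m-2}\left\{\alpha\frac{(m-1)W_1^{(T)}W_3^{(T)}}{(m-k-1)W_1^{(T)}+(k+1)W_3^{(T)}}+(1-\alpha)\frac{(m-1)W_1^{(U)}W_3^{(U)}}{(m-k-1)W_1^{(U)}+(k+1)W_3^{(U)}}\right\}. \tag{57}$$

Hence, we have

$$\vartheta_{p_\tau,m} \le \overline{\vartheta}^{(m)}\sum_{k=0}^{m-2}\binom{m-2}{k}p_\tau^{k}(1-p_\tau)^{m-2-k} = \overline{\vartheta}^{(m)},$$

where the last equality follows from the binomial formula. Similarly, we obtain $\vartheta_{p_\tau,m} \ge \underline{\vartheta}^{(m)}$.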
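Since the binomial weights sum to one, the quantity obtained from (5b) is a weighted average of m − 1 terms and is therefore sandwiched between their minimum and maximum. A minimal sketch, assuming the two-class (treated/untreated) form of the terms used in (56) and (57); the fitness values in `W` and the function names are hypothetical:

```python
from math import comb

# Hypothetical fitnesses: W1 = sensitive, W3 = resistant; T = treated, U = untreated.
W = dict(alpha=0.5, W1T=0.1, W3T=1.0, W1U=1.0, W3U=0.9)

def theta_km(k, m, alpha, W1T, W3T, W1U, W3U):
    """Term number k inside the max/min of (56) and (57)."""
    treated = (m - 1) * W1T * W3T / ((m - k - 1) * W1T + (k + 1) * W3T)
    untreated = (m - 1) * W1U * W3U / ((m - k - 1) * W1U + (k + 1) * W3U)
    return alpha * treated + (1 - alpha) * untreated

def vartheta_p(p, m, **w):
    """Binomial average of the terms for resistant-allele frequency p."""
    return sum(comb(m - 2, k) * p**k * (1 - p)**(m - 2 - k) * theta_km(k, m, **w)
               for k in range(m - 1))

m = 5
terms = [theta_km(k, m, **W) for k in range(m - 1)]
upper, lower = max(terms), min(terms)
for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, lower, vartheta_p(p, m, **W), upper)
```

At p = 0 the average collapses to the k = 0 term, which is the constant used later when the resistant allele is assumed to be rare.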

Therefore, we obtain

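$$1-\frac{r\overline{\vartheta}^{(m)}}{\lambda\mu}\left(\frac{p_0\lambda^{t+1}+(1-p_0)\mu^{t+1}}{p_0\lambda^{t}+(1-p_0)\mu^{t}}\right) \le \Lambda_{m,t} \le 1-\frac{r\underline{\vartheta}^{(m)}}{\lambda\mu}\left(\frac{p_0\lambda^{t+1}+(1-p_0)\mu^{t+1}}{p_0\lambda^{t}+(1-p_0)\mu^{t}}\right).$$

Using the same approximation as in the case m = 2 yields

$$\underline{\Lambda}^{(m)} := 1-\frac{r\overline{\vartheta}^{(m)}}{\mu} \le \Lambda_{m,t} \le 1-\frac{r\underline{\vartheta}^{(m)}}{\lambda\mu}\bigl(p_0\lambda+(1-p_0)\mu\bigr) =: \overline{\Lambda}^{(m)}.$$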

Hence, by replacing ϑ and a by ϑ̅(m) and Λ̅(m), or ϑ̲(m) and Λ̲(m), respectively, we obtain the upper and lower bounds

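$$\overline{Q}^{(m)} = Q_0 + \frac{r\overline{\vartheta}^{(m)}(R_0-Q_0)}{\lambda}\sum_{k=0}^{\infty}\frac{\bigl(\overline{\Lambda}^{(m)}\bigr)^{k}}{1+\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{k+1}}$$

and

$$\underline{Q}^{(m)} = Q_0 + \frac{r\underline{\vartheta}^{(m)}(R_0-Q_0)}{\lambda}\sum_{k=0}^{\infty}\frac{\bigl(\underline{\Lambda}^{(m)}\bigr)^{k}}{1+\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{k+1}},$$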

respectively.

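It turns out that the upper and lower bounds are very inaccurate estimates for Q̂(m). However, it is possible to find relatively accurate approximations for Q̂(m), at least if p0 is small, which is the most important case for our current purpose. If p0 ≈ 0, we have $\frac{p_0}{1-p_0} \approx 0$, so that we obtain

$$\vartheta_{p_\tau,m} \approx \sum_{k=0}^{m-2}\binom{m-2}{k}\left(\frac{p_0}{1-p_0}\right)^{k}\left(\frac{\lambda}{\mu}\right)^{tk}\times\left(\alpha\frac{(m-1)W_1^{(T)}W_3^{(T)}}{(m-k-1)W_1^{(T)}+(k+1)W_3^{(T)}}+(1-\alpha)\frac{(m-1)W_1^{(U)}W_3^{(U)}}{(m-k-1)W_1^{(U)}+(k+1)W_3^{(U)}}\right) \approx \alpha\frac{(m-1)W_1^{(T)}W_3^{(T)}}{(m-1)W_1^{(T)}+W_3^{(T)}}+(1-\alpha)\frac{(m-1)W_1^{(U)}W_3^{(U)}}{(m-1)W_1^{(U)}+W_3^{(U)}} =: \tilde\vartheta^{(m)}. \tag{58}$$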

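From (5c) we see that $\tilde\vartheta^{(m)} = \theta_{0,m}$. Moreover, define

$$\tilde\Lambda^{(m)} := 1-\frac{r\tilde\vartheta^{(m)}}{\lambda}. \tag{59}$$

Hence, by approximating $\vartheta_{p_\tau,m}$ by $\tilde\vartheta^{(m)}$ and $\Lambda_{m,l}$ by $\tilde\Lambda^{(m)}$ we obtain

$$\hat Q^{(m)} \approx \tilde Q^{(m)} = Q_0 + \frac{r\tilde\vartheta^{(m)}(R_0-Q_0)}{\lambda}\sum_{\tau=0}^{\infty}\frac{\bigl(\tilde\Lambda^{(m)}\bigr)^{\tau}}{1+\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{\tau+1}}. \tag{60}$$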

This approximation turns out to be sufficiently accurate unless p0 is too large.

Clearly, Q̃(m) can be further approximated as in the case m = 2. However, note that for m > 2 the assumption that p0 is small is always implicitly made because otherwise the approximation (58) for $\vartheta_{p_\tau,m}$ will be inaccurate. Hence, only those approximations that were derived under the assumption of p0 ≈ 0 are meaningful if m > 2, i.e., those of Theorem 4 and Corollary 4. We can therefore summarize:

Result 2

Let m ≥ 2 and

$$\tilde B^{(m)} := R_0 - (R_0-Q_0)\,p_0^{\frac{r\tilde\vartheta^{(m)}}{\lambda(\log\lambda-\log\mu)}}. \tag{61}$$

Then we have $\hat Q^{(m)} \approx \tilde B^{(m)}$.

Note that the approximations become worse for large m. However, as noted in Schneider and Kim (2010), the differences in Q̂(m) for different values of m become small for large m. Since, in reality, m should be bounded by a maximum possible value, it should be sufficient to assume m < 10. For these values Result 2 is still accurate. This is illustrated in Fig. 7. Furthermore, note that Result 2 becomes delicate for large m, because it is based on the approximation (58), which becomes very inaccurate as m → ∞.
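The closed form of Result 2 can be compared with a direct summation of the series in (60). The sketch below rests on two assumptions that should be checked against the original typeset equations: that the summand of (60) is Λ̃^τ / (1 + p0/(1 − p0) · (λ/μ)^(τ+1)), and that λ and μ are the α-weighted mean fitnesses of resistant and sensitive parasites. Fitness values and function names are hypothetical. Only the factor multiplying (R0 − Q0) is compared.

```python
import math

W = dict(alpha=0.5, W1T=0.1, W3T=1.0, W1U=1.0, W3U=0.9)   # hypothetical fitnesses
lam = (1 - W["alpha"]) * W["W3U"] + W["alpha"] * W["W3T"]  # mean fitness, resistant
mu = (1 - W["alpha"]) * W["W1U"] + W["alpha"] * W["W1T"]   # mean fitness, sensitive

def theta_tilde(m, alpha, W1T, W3T, W1U, W3U):
    """Constant defined in (58), i.e. the k = 0 term."""
    t = (m - 1) * W1T * W3T / ((m - 1) * W1T + W3T)
    u = (m - 1) * W1U * W3U / ((m - 1) * W1U + W3U)
    return alpha * t + (1 - alpha) * u

def series_factor(r, theta, p0, tol=1e-14):
    """(r*theta/lam) * sum over tau of Lam**tau / (1 + odds at generation tau + 1)."""
    Lam = 1.0 - r * theta / lam
    odds = p0 / (1.0 - p0) * lam / mu     # resistant:sensitive odds at tau + 1 = 1
    weight, total = 1.0, 0.0              # weight holds Lam**tau
    while True:
        term = weight / (1.0 + odds)
        total += term
        if term < tol:
            break
        odds *= lam / mu                  # iterative update avoids float overflow
        weight *= Lam
    return r * theta / lam * total

def closed_factor(r, theta, p0):
    """1 - p0**(r*theta/(lam*(log lam - log mu))), the factor implied by (61)."""
    return 1.0 - p0 ** (r * theta / (lam * (math.log(lam) - math.log(mu))))

r, p0 = 0.01, 1e-3
for m in range(2, 11):
    th = theta_tilde(m, **W)
    print(m, round(th, 4), round(series_factor(r, th, p0), 5), round(closed_factor(r, th, p0), 5))
```

In this setting the two columns differ by a few thousandths, and both grow with m because the constant from (58) increases with m.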

Fig. 7.

Equilibrium frequency Q̂(m) (thin lines) of the neutral allele N1 versus the approximation B̃(m) (thick dashed lines) for various values of m. The parameters are as in Fig. 3.

4 The general model

Schneider and Kim (2010) also presented two generalizations of their model. First, they argued that the differentiation into treated and untreated hosts is oversimplified, and that host heterogeneity should be taken into account. Host heterogeneity can reflect for instance different levels of drug concentration, drug decay, different levels of host-acquired immunity, different immune responses etc. Second, they argued that it is oversimplified to assume that each host is infected by the same number of parasites, i.e., that m is a fixed parameter, and showed how the model can be generalized to the case in which m follows a given frequency distribution.

We shall briefly summarize the two generalizations, and show how our results are generalized.

4.1 Host heterogeneity

Assume again that a fixed proportion α of hosts is treated. We divide the treated and untreated hosts into various discrete “treated classes” and “untreated classes”. Let the proportion of infected hosts that fall into class j (j ∈ ℕ) be αj. Let U ⊆ ℕ and T = ℕ\U be the sets of untreated and treated classes, respectively. Hence, we have $\sum_{j\in U}\alpha_j = 1-\alpha$ and $\sum_{j\in T}\alpha_j = \alpha$. Let us denote the fitnesses of parasite haplotypes carrying the sensitive and resistant allele in hosts that fall into class j by $W_1^{(j)}$ and $W_2^{(j)}$, and $W_3^{(j)}$ and $W_4^{(j)}$, respectively.

Note that in the original formulation of the model we have $W_k^{(U)} = \sum_{j\in U}\frac{\alpha_j}{1-\alpha}W_k^{(j)} = E[W_k^{(j)}\mid U]$ and $W_k^{(T)} = \sum_{j\in T}\frac{\alpha_j}{\alpha}W_k^{(j)} = E[W_k^{(j)}\mid T]$. Hence, it can be regarded as the approximation in which all untreated hosts are subsumed in a single class, with $W_k^{(U)}$ the mean fitness of the parasites among them, and all treated hosts are subsumed in a single class, with $W_k^{(T)}$ the mean fitness of the parasites among them.

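In the model accounting for host heterogeneity, the equilibrium frequency of the neutral allele N1 is still given by (7); however, the parameters λ and μ are given by

$$\lambda = \sum_{j=0}^{\infty}\alpha_j W_3^{(j)} = E[W_3^{(j)}] = (1-\alpha)E[W_3^{(j)}\mid U]+\alpha E[W_3^{(j)}\mid T] \tag{62a}$$

and

$$\mu = \sum_{j=0}^{\infty}\alpha_j W_1^{(j)} = E[W_1^{(j)}] = (1-\alpha)E[W_1^{(j)}\mid U]+\alpha E[W_1^{(j)}\mid T], \tag{62b}$$

whereas $\vartheta_{p,m}$ and $\Lambda_{m,t}$ are still given by (5b) and (8); however, in (5b) $\theta_{k,m}$ has to be replaced by

$$\theta_{k,m} = (m-1)\sum_{j=0}^{\infty}\alpha_j\frac{W_1^{(j)}W_3^{(j)}}{(m-1-k)W_1^{(j)}+(k+1)W_3^{(j)}}. \tag{63}$$

Hence, in all approximations $\tilde\vartheta^{(m)}$ has to be replaced by

$$\tilde\vartheta^{(m)} = \sum_{j=0}^{\infty}\alpha_j\frac{(m-1)W_1^{(j)}W_3^{(j)}}{(m-1)W_1^{(j)}+W_3^{(j)}}. \tag{64}$$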

We summarize:

Result 3

Let m ≥ 2. The equilibrium frequency of the neutral allele N1, Q̂(m), is approximately given by

$$\tilde B^{(m)} := R_0 - (R_0-Q_0)\,p_0^{\frac{r\tilde\vartheta^{(m)}}{\lambda(\log\lambda-\log\mu)}}, \tag{65}$$

where λ, μ, and ϑ̃(m) are given by (62a), (62b), and (64), respectively.

It was mentioned in Schneider and Kim (2010) that accounting for host heterogeneity results in a more pronounced hitchhiking effect compared to the basic model if the values of λ and μ coincide, or more precisely if $W_k^{(U)} = E[W_k^{(j)}\mid U]$ and $W_k^{(T)} = E[W_k^{(j)}\mid T]$. The reason is that under this assumption ϑ̃(m) given by (64) is smaller than if it is given by (58), and (for fixed m) the respective approximations for the hitchhiking effect correspond to standard hitchhiking with recombination rate $\frac{\tilde\vartheta^{(m)}r}{\lambda}$ and initial frequency $\frac{\mu p_0}{\lambda(1-p_0)+\mu p_0}$. We will prove this in Sect. 5.
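The comparison between (64) and (58) is easy to reproduce numerically. The sketch below splits the treated and the untreated hosts into two hypothetical classes each, evaluates (64) on the four classes, and then evaluates the same formula after pooling each group into one class with the group's mean fitnesses, which is the basic model. All class weights and fitness values are invented for illustration.

```python
def theta_tilde_classes(m, classes):
    """(64): classes is a list of (alpha_j, W1_j, W3_j) whose weights sum to one."""
    return sum(a * (m - 1) * w1 * w3 / ((m - 1) * w1 + w3) for a, w1, w3 in classes)

def pooled(classes):
    """Collapse a group of classes into one class with the weighted mean fitnesses."""
    a = sum(c[0] for c in classes)
    w1 = sum(c[0] * c[1] for c in classes) / a
    w3 = sum(c[0] * c[2] for c in classes) / a
    return (a, w1, w3)

# Hypothetical host classes: (proportion, sensitive fitness W1, resistant fitness W3).
treated = [(0.2, 0.05, 1.00), (0.3, 0.30, 0.95)]    # alpha = 0.5 in total
untreated = [(0.25, 1.00, 0.90), (0.25, 0.80, 0.85)]

for m in range(2, 11):
    hetero = theta_tilde_classes(m, treated + untreated)
    basic = theta_tilde_classes(m, [pooled(treated), pooled(untreated)])
    print(m, round(hetero, 5), round(basic, 5))
```

Pooling leaves λ and μ unchanged because they are linear in the class fitnesses, but it increases the constant in the exponent, so the heterogeneous model predicts the stronger hitchhiking effect.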

4.2 Number of co-infections

Now, assume that the number m of parasites that infect a host follows some probability distribution over the population. Let κm denote the probability that a host is infected by m ≥ 1 parasites. Naturally, we have $\sum_{m=1}^{\infty}\kappa_m = 1$.

As shown in Schneider and Kim (2010), the equilibrium frequency of the neutral allele N1 is given by

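$$\hat Q^{(\kappa)} = Q_0 + \frac{r(R_0-Q_0)}{\lambda}\sum_{\tau=0}^{\infty}\frac{\vartheta_{p_\tau,\kappa}}{1+\frac{p_0}{1-p_0}\left(\frac{\lambda}{\mu}\right)^{\tau+1}}\prod_{l=0}^{\tau-1}\Lambda_{\kappa,l}, \tag{66a}$$

with

$$\vartheta_{p_\tau,\kappa} = \sum_{m=1}^{\infty}\kappa_m\vartheta_{p_\tau,m}\quad\text{and}\quad \Lambda_{\kappa,t} = \sum_{m=1}^{\infty}\kappa_m\Lambda_{m,t}, \tag{67a}$$

where $\vartheta_{p_\tau,m}$ and $\Lambda_{m,t}$ are given by (5b) and (8) (possibly with λ, μ, $\vartheta_{p,m}$, $\Lambda_{m,t}$, and $\theta_{k,m}$ adjusted as in Sect. 4.1 to incorporate host heterogeneity). Hence, the equilibrium frequency of the neutral allele N1 can be approximated following the calculations of Sect. 3. We can summarize this in the following result.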

Result 4

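The equilibrium frequency of the neutral allele N1, Q̂(κ), is approximately given by

$$\hat Q^{(\kappa)} \approx \tilde B^{(\kappa)} := R_0 - (R_0-Q_0)\,p_0^{\frac{r\tilde\vartheta^{(\kappa)}}{\lambda(\log\lambda-\log\mu)}} \tag{68}$$

with $\tilde\vartheta^{(\kappa)} = \sum_{m=1}^{\infty}\kappa_m\tilde\vartheta^{(m)}$, where λ, μ, and ϑ̃(m) are given by (4) and (58), respectively. If one wants to incorporate host heterogeneity, λ, μ, and ϑ̃(m) are given by (62) and (64), respectively.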
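Result 4 only requires averaging the constants of (58) over the distribution κ. A minimal sketch for the basic model with one treated and one untreated class; the distribution `kappa`, the fitness values, and the function names are hypothetical, and λ and μ are assumed to be the α-weighted mean fitnesses.

```python
import math

W = dict(alpha=0.5, W1T=0.1, W3T=1.0, W1U=1.0, W3U=0.9)   # hypothetical fitnesses
lam = (1 - W["alpha"]) * W["W3U"] + W["alpha"] * W["W3T"]
mu = (1 - W["alpha"]) * W["W1U"] + W["alpha"] * W["W1T"]

def theta_tilde(m, alpha, W1T, W3T, W1U, W3U):
    """Constant (58) for a fixed number m of co-infecting parasites; zero for m = 1."""
    t = (m - 1) * W1T * W3T / ((m - 1) * W1T + W3T)
    u = (m - 1) * W1U * W3U / ((m - 1) * W1U + W3U)
    return alpha * t + (1 - alpha) * u

def theta_kappa(kappa, **w):
    """Mixture over the distribution kappa = {m: probability of m parasites}."""
    return sum(prob * theta_tilde(m, **w) for m, prob in kappa.items())

def b_kappa(R0, Q0, p0, r, kappa, **w):
    """Approximation (68)."""
    exponent = r * theta_kappa(kappa, **w) / (lam * (math.log(lam) - math.log(mu)))
    return R0 - (R0 - Q0) * p0 ** exponent

kappa = {1: 0.5, 2: 0.3, 3: 0.2}          # hypothetical: half of the hosts singly infected
print(theta_kappa(kappa, **W))
print(b_kappa(R0=0.5, Q0=1.0, p0=1e-3, r=0.05, kappa=kappa, **W))
```

Single infections contribute nothing to the mixture, so a population consisting only of singly infected hosts keeps the neutral allele at Q0 for every r, in line with the case m = 1 discussed in Sect. 3.3.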

5 Equilibrium heterozygosity and the hitchhiking effect

From now on we assume that m follows a probability distribution as in Sect. 4.2 and we account for host heterogeneity as in Sect. 4.1 unless otherwise mentioned.

Recall that the equilibrium heterozygosity is given by

$$\hat H = 2\hat Q^{(\kappa)}\bigl(1-\hat Q^{(\kappa)}\bigr). \tag{69}$$

We have seen in the last section that the equilibrium frequency, and its upper and lower bounds and approximations, are given by expressions of the form $Q_0 + (R_0-Q_0)A(r)$, where A is a function of the recombination rate that has to be chosen appropriately. For brevity we will suppress the dependence of Q̂(κ) on r unless necessary. Let us regard R0, i.e., the initial frequency of N1 among sensitive parasites, as a random variable and the heterozygosity as a function of R0. Let us write

$$\hat H = \hat H(R_0) = 2\bigl(Q_0+(R_0-Q_0)A(r)\bigr)\bigl(1-Q_0-(R_0-Q_0)A(r)\bigr). \tag{70}$$

Given the initial frequency of the neutral allele N1 is R0, the beneficial mutation occurs initially in association with allele N1 with probability R0, and in association with allele N2 with probability (1 − R0). Hence, we have Q0 = 1 with probability R0 and Q0 = 0 with probability (1 − R0). Therefore, the average heterozygosity given R0 is calculated to be

$$E(\hat H\mid R_0) = 2R_0\bigl(1+(R_0-1)A(r)\bigr)(1-R_0)A(r) + 2(1-R_0)R_0A(r)\bigl(1-R_0A(r)\bigr) = 2(R_0-R_0^{2})\bigl(2A(r)-A^{2}(r)\bigr). \tag{71}$$

Hence, by the law of total expectation, the average heterozygosity is calculated to be

$$E(\hat H) = E\bigl(E(\hat H\mid R_0)\bigr) = 2\bigl(E(R_0)-E(R_0^{2})\bigr)\bigl(2A(r)-A^{2}(r)\bigr). \tag{72}$$

The initial heterozygosity is given by H0 = 2R0(1 − R0). Hence, the ratio of the expected equilibrium heterozygosity to the expected initial heterozygosity is given by

$$H(r) = \frac{E(\hat H)}{E(H_0)} = 2A(r)-A^{2}(r), \tag{73}$$

which is independent of the distribution of R0. Since A(0) = 0 we have H(0) = 0.
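The averaging over the two possible genetic backgrounds of the resistant mutation can be verified with a few lines of arithmetic. In the sketch below, `A` stands for an arbitrary value of A(r) between 0 and 1, and the function names are hypothetical.

```python
def H_hat(R0, Q0, A):
    """Equilibrium heterozygosity (70) for a given background Q0 of the mutation."""
    Q = Q0 + (R0 - Q0) * A
    return 2.0 * Q * (1.0 - Q)

def expected_H_given_R0(R0, A):
    """The mutation arises on N1 (Q0 = 1) with probability R0, else on N2 (Q0 = 0)."""
    return R0 * H_hat(R0, 1.0, A) + (1.0 - R0) * H_hat(R0, 0.0, A)

for R0 in (0.1, 0.5, 0.8):
    for A in (0.0, 0.3, 1.0):
        ratio = expected_H_given_R0(R0, A) / (2.0 * R0 * (1.0 - R0))
        print(R0, A, ratio, 2 * A - A * A)
```

The ratio does not depend on R0, which is why the relative heterozygosity is independent of the distribution of R0.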

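If we further approximate Q̂(κ) by B̃(κ), cf. (52) and (68), we obtain

$$H^{(\kappa)}(r) \approx \tilde H^{(\kappa)} = 1 - p_0^{\frac{2r\tilde\vartheta^{(\kappa)}}{\lambda(\log\lambda-\log\mu)}}. \tag{74}$$

From (74) it is seen that H(κ) shows a strong genome-wide reduction if κ1, the proportion of single infections, is large and selection is sufficiently strong. The approximation is illustrated in Fig. 8.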

Fig. 8.

Relative expected heterozygosity H(κ) (thin lines) of the neutral allele N1 versus the approximation H̃(κ) (thick dashed lines) for different distributions κ. In all cases we assume a truncated exponential distribution with range 1–10 and mean κ̅. By truncated we mean that the probability of m = 10 is the probability that m ≥ 10 for a Poisson distribution with mean κ̅. The parameters are as in Fig. 3.

For the special case that m is constant the above reduces to $H^{(m)}(r) \approx \tilde H^{(m)} = 1 - p_0^{\frac{2r\tilde\vartheta^{(m)}}{\lambda(\log\lambda-\log\mu)}}$, which is increasing as a function of m. The reason is that ϑ̃(m) is monotone increasing in m, whether it is given by (58) or, in the case of host heterogeneity, by (64).

It was mentioned in Schneider and Kim (2010) that accounting for host heterogeneity results in a more pronounced hitchhiking effect compared to the basic model if the values of λ and μ coincide, or, more precisely, if $W_k^{(U)} = E[W_k^{(j)}\mid U]$ and $W_k^{(T)} = E[W_k^{(j)}\mid T]$. The reason is that, for every m, under these assumptions ϑ̃(m) given by (64) is smaller than if it is given by (58). Hence, this holds also for the corresponding values of ϑ̃(κ), and consequently (74) is smaller if host heterogeneity is incorporated. We shall summarize this as a remark and prove it in the appendix.

Remark 4

Host heterogeneity leads to an increased hitchhiking effect, i.e., to a stronger reduction in relative heterozygosity H(κ)(r), compared to the basic model with only one class of treated and one class of untreated hosts with corresponding fitness parameters, i.e., $W_k^{(U)} = E[W_k^{(j)}\mid U]$ and $W_k^{(T)} = E[W_k^{(j)}\mid T]$.

For fixed large m the differences become very small and vanish in the limit m → ∞, because the approximation then approaches the classical approximation for standard hitchhiking.

Note that the last statement of the remark is delicate, because the approximation will be inaccurate for very large m.

We can use (73), or (74) to calculate the maximum recombination distance for which a given reduction in relative heterozygosity can be observed. This is relevant for predicting the width of the valley of relative heterozygosity for given selection parameters. By comparison of (73), or (74) with empirical data on the relative heterozygosity, it is possible to re-evaluate or validate parameter estimates (e.g., for α, λ, μ, etc.).

From (73) we obtain the maximum recombination rate, r̂, for which the relative heterozygosity is smaller than β by solving the equation H(r̂) = β. If we first solve this equation with respect to A(r̂), we obtain

$$A(\hat r) = 1-\sqrt{1-\beta}. \tag{75}$$

By using the approximation according to (74), we obtain

$$\hat r = \frac{\lambda\log(1-\beta)\,(\log\lambda-\log\mu)}{2\tilde\vartheta^{(\kappa)}\log p_0}. \tag{76}$$
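The width of the valley can be computed by inverting the approximation (74). The sketch below assumes that (76) is exactly this inverse, i.e., that it solves 1 − p0^(2rϑ̃/(λ(log λ − log μ))) = β for r; `theta_k` stands for ϑ̃(κ), and all numerical values are hypothetical.

```python
import math

def rel_het(r, p0, theta_k, lam, mu):
    """Approximate relative heterozygosity (74)."""
    return 1.0 - p0 ** (2.0 * r * theta_k / (lam * (math.log(lam) - math.log(mu))))

def r_hat(beta, p0, theta_k, lam, mu):
    """Recombination rate at which the relative heterozygosity (74) reaches beta."""
    s = math.log(lam) - math.log(mu)
    return lam * math.log(1.0 - beta) * s / (2.0 * theta_k * math.log(p0))

setting = dict(p0=1e-3, theta_k=0.3, lam=0.95, mu=0.55)   # hypothetical values
for beta in (0.1, 0.5, 0.9):
    r = r_hat(beta, **setting)
    print(beta, r, rel_het(r, **setting))
```

Loci closer to the selected site than r̂ are expected to retain less than a fraction β of their initial heterozygosity. Because r̂ is inversely proportional to ϑ̃(κ), distributions κ with many single infections widen the valley.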

Figure 9 illustrates the valley of reduced heterozygosity as a function of α for different distributions κ of m. Such illustrations can be used to determine the range of parameters that lead to a given reduction in relative heterozygosity.

Fig. 9.

Contour plot of the valley of reduced heterozygosity for different distributions κ. In all cases we assume a truncated exponential distribution with range 1–10 and mean κ̅. By truncated we mean that the probability of m = 10 is the probability that m ≥ 10 for a Poisson distribution with mean κ̅. In all panels the same selection parameters are assumed. The parameters are summarized in the boxes above the panels.

6 Discussion

We obtained a closed-form approximation for the expected heterozygosity shaped by genetic hitchhiking in the model of antimalarial drug-resistance evolution proposed in Schneider and Kim (2010). This model aims to capture the effect of multiple infections per host (m) and drug-treatment rate (α), which are considered the most important epidemiological parameters that characterize geographic differences in the dynamics of drug resistance (Escalante et al. 2009), as well as the complex malaria transmission cycle on the pattern of selective sweeps caused by drug-resistant mutations (Daily 2006; Prugnolle et al. 2009). Due to the model complexity the exact solution is expressed by the summation of infinite (or a very large number of) terms. Here, we provided numerous approximate solutions with varying degree of accuracy. Notably, using the assumption that the starting frequency of the resistant allele under positive directional selection is low, we could obtain a solution that is simple enough to allow clear biological interpretations regarding the effects of epidemiological parameters. Furthermore, our approximations are flexible enough to incorporate arbitrary distributions of hosts with different infection rates and/or host heterogeneity, e.g., arbitrary distributions of hosts with different drug concentrations. The latter condition, arising due to a slow decay of antimalarial drugs in the bodies of treated patients, was demonstrated to be crucial for the initiation of drug resistance evolution (Hastings et al. 2002). For more discussion on how this model can be used to predict the spread of resistance and its speed, and how it can be used to design ‘optimal’ treatment strategies we refer to Schneider and Kim (2010).

The mean fitness of the resistant parasites, λ, and of the sensitive parasites, μ, are crucial in the considered model. If λ and μ are not too different, log λ − log μ corresponds to the selection coefficient of the beneficial (resistant) mutation. Then, our approximation (74) is basically identical to that under the standard model of hitchhiking obtained by Maynard Smith and Haigh (1974), except for the modifying factors of the recombination rate, $\tilde\vartheta^{(\kappa)}/\lambda$, and of the initial frequency, $\frac{\mu}{\lambda(1-p_0)+\mu p_0}$. Therefore, the dynamics of hitchhiking unique to this malaria model are summarized by these factors. We have $\tilde\vartheta^{(\kappa)}/\lambda < 1$ because $\frac{(m-1)W_1^{(j)}}{(m-1)W_1^{(j)}+W_3^{(j)}} < 1$ for all host classes j and all m. The latter fraction represents the probability that a given resistant gametocyte pairs with a sensitive gametocyte in the body of a mosquito which took its blood meal from a host of class j, when the frequency of the resistant allele is low. (As with the standard hitchhiking model, the final heterozygosity is predominantly determined by the dynamics of the resistant allele at its early stage.) If the drug in the host is very effective ($W_1^{(j)} \ll W_3^{(j)}$), it will greatly reduce $\tilde\vartheta^{(\kappa)}/\lambda$. That is, the strength of selection determines the effective rate of recombination (the decay of association between the beneficial and the neutral allele), unlike in the standard hitchhiking model, in which the two factors are decoupled. Moreover, the approximation (74) will be more accurate even for larger r and p0 compared to the standard hitchhiking model because of the two adjustment factors $\tilde\vartheta^{(\kappa)}/\lambda$ and $\frac{\mu}{\lambda(1-p_0)+\mu p_0}$.

Assume m is constant. It is also obvious from the above that the number, m, of independent parasite strains infecting a host determines the effective recombination rate. A small value of m thus increases the hitchhiking effect. In the extreme case of m = 1, genetic variation in the population is completely wiped out as parasites reproduce effectively asexually. On the other hand, with m → ∞ our approximation approaches that of standard hitchhiking. Note, however, that in the exact solution $\vartheta^{(m)}/\lambda$ is still less than one, even with m → ∞, because the approximation leading to ϑ̃(m) (Eq. 58) assumes that the frequency of the resistant allele is low and hence that the probability that a given host is infected by two or more resistant strains is negligible. However, if m increases to a large number, this probability is no longer negligible. Therefore, the combined effect of strong drug pressure and the limited number of clones in hosts can greatly amplify the effect of genetic hitchhiking beyond the level predicted by the standard model. If m is not constant, the hitchhiking effect is more pronounced for more left-skewed distributions κ. It will be particularly pronounced if a large fraction of infections are single infections.

Our approximation also reveals another important departure of our model from the previous models that assume random mating and homogeneous selective pressures. In the standard model, the allele-frequency trajectory of a beneficial mutation is necessary and sufficient information for predicting its hitchhiking effect on the linked neutral variation (Betancourt et al. 2004; Chevin and Hospital 2008). For example, the speed with which the beneficial mutation increases in the population determines the size of the genomic regions affected by hitchhiking. However, in our model with host heterogeneity, different combinations of parameter values ($\alpha_j$ and $W_i^{(j)}$) that specify the same allele-frequency trajectory of the resistance allele may generate different hitchhiking effects. Schneider and Kim (2010) showed that the changes of sensitive and resistant allele frequencies are uniquely determined by their absolute fitness, μ and λ, respectively. With host heterogeneity, the fitness is simply the mean of $W_1^{(j)}$ or $W_3^{(j)}$ weighted by the frequencies of host classes ($\alpha_j$). The modifying factor of the effective recombination rate, $\tilde\vartheta^{(m)}/\lambda$ (or more generally $\tilde\vartheta^{(\kappa)}/\lambda$), however, is not a linear function of $W_1^{(j)}$ or $W_3^{(j)}$. As a result, as shown in Remark 4, $\tilde\vartheta^{(m)}/\lambda$ (or $\tilde\vartheta^{(\kappa)}/\lambda$) decreases as one introduces host heterogeneity while keeping λ and μ constant. This makes our simplest approximation assuming no host heterogeneity a conservative predictor of the hitchhiking effect.

Comparison of approximate solutions with empirical observation of local reductions of genetic variation around the loci of drug resistance mutations, combined with other genetic (e.g. recombination rate, the frequency change of drug resistance) and epidemiological (e.g. the mean number of independent parasite clones per host) information, will greatly advance our understanding of antimalarial drug-resistance evolution. In particular, if empirical data for the reduction in heterozygosity are available, our results can be used to determine possible ranges for parameters that are unknown and/or infeasible to measure. It should be noted that the fraction λ/μ can be easily estimated from retrospective genetic data. Since $\log\frac{p_t}{1-p_t} = \log\frac{p_0}{1-p_0} + t\log\frac{\lambda}{\mu}$, log λ/μ is just the slope of the linear regression of the logarithm of the ratio of resistant over sensitive parasites measured at different time points. However, it is difficult to scale time, because the number of transmission cycles per year is difficult to quantify. Moreover, the parameter α should also be easy to measure. In contrast, the distribution κ of m will be difficult to estimate. However, the distribution of m, and especially the proportion of single infections, leads to a genome-wide reduction in relative heterozygosity if selection is sufficiently strong. This genome-wide reduction might be used to estimate the distribution κ. However, applying our results to real data lies beyond the scope of this article and will be accomplished in a follow-up paper.
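The regression mentioned above can be sketched as follows. The data are synthetic and noise-free: they are generated from the deterministic dynamics in which the ratio of resistant to sensitive parasites is multiplied by λ/μ in each transmission cycle, with hypothetical values of λ, μ and p0. The helper `fit_line` is an ordinary least-squares fit written out by hand, so that no external library is needed.

```python
import math

def fit_line(ts, ys):
    """Ordinary least squares for y = intercept + slope * t."""
    n = len(ts)
    t_bar, y_bar = sum(ts) / n, sum(ys) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    sxx = sum((t - t_bar) ** 2 for t in ts)
    slope = sxy / sxx
    return slope, y_bar - slope * t_bar

lam, mu, p0 = 0.95, 0.55, 1e-3              # hypothetical "true" values
ts = list(range(15))                         # time in transmission cycles
odds = [p0 / (1 - p0) * (lam / mu) ** t for t in ts]
ps = [o / (1 + o) for o in odds]             # resistant-allele frequency per cycle
ys = [math.log(p / (1 - p)) for p in ps]     # log ratio of resistant to sensitive

slope, intercept = fit_line(ts, ys)
print("estimated log(lam/mu):", slope, " true:", math.log(lam / mu))
print("estimated initial log-odds:", intercept)
```

With real data the time axis would be in calendar units, so the slope estimates log(λ/μ) multiplied by the unknown number of transmission cycles per unit time, which is the scaling problem noted above.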

Acknowledgments

This work was funded by the National Institutes of Health grant R01GM084320. We want to thank Prof. Ananias Escalante for helpful comments on an earlier draft of this work. We gratefully acknowledge the fruitful discussions with him on this and similar topics. We also want to thank two anonymous reviewers.

Appendix A

A.1 Bounds and approximations

Proof of Remark 2

If $a = \left(\frac{\mu}{\lambda}\right)^{l}$ for l ∈ ℕ we just need to adapt the proof of Theorem 1. The derivation of G1 assumed $\frac{\xi}{\eta} \ne -l$. If this assumption is violated we have to adjust the derivation. In this case we have $\frac{\xi}{\eta} = -l$ and we can replace (24) by

g(x)=exlηce+1=exlηk=0(ce)k=k=0(c)ke(kl)=(c)lk=0(c)kle(kl).

Hence, we have to replace (25) in the proof of Theorem 1 by

G1(x)=g(x)dx=(c)lk=0(c)kle(kl)dx=(c)lk=0(c)kle(kl)dx=(c)l(k=0l1(c)kle(kl)dx+dx+k=l+1(c)kle(kl)dx)=(c)l(k=0l1(c)kle(kl)η(kl)+x+k=l+1(c)kle(kl)η(kl))=(c)l(k=0l1(c)kle(kl)η(kl)+x1ηk=1(c)k1exηkk)=(c)l(k=0l1(c)kle(kl)η(kl)+xlog(1+ce)η),

Thus, we have

G1(1)=(c)l(k=0l1(c)klη(kl)1log(1+ceη)η), (77)
G1(0)=(c)l(k=0l1(c)klη(kl)log(1+c)η), (78)

and

G1(x0)=(c)l(k=0l1(1)(kl)η(kl)+x0log2η). (79)

Combining the above with (26), the definitions of ξ, η, c, (31), and (32) yields

ψ((μλ)l)=Q0+rϑ(R0Q0)λ×[(p01p0)llogλlogμ(k=0l1(1)kkl(1(p01p0)kl)log2p0)(p01p0)ll(logλlogμ)(1F12(1,l1+l1))], (80)

and

Ψ((μλ)l)=Q0+rϑ(R0Q0)λ×[(p01p0)llogλlogμ(k=0l1(1)kkl(1(p0μ(1p0)λ)kl)log(p0λp0μ+(1p0)λ)+logλ+logμlog2)(p01p0)ll(logλlogμ)(1F12(1,l1+l1))], (81)

respectively. Note that we have

1F12(1,l1+l1)=1k=0(1)k¯(l)k¯(1)k(l+1)k¯k!=1k=0l(1)kl+k (82)
=1l(1)l(k=1(1)kkk=1l1(1)kk) (83)
=l(1)2lll(1)l(log2k=1l1(1)kk) (84)
=l(1)l(k=1l(1)kklog2) (85)
=lk=0l1(1)kkll(1)llog2. (86)

Hence, (80) and (81) simplify to

ψ((μλ)l)=Q0+rϑ(R0Q0)λ×[(p01p0)llogμlogλ(k=0l1(1)kkl(p01p0)kllogp0)], (87)

and

Ψ((μλ)l)=Q0+rϑ(R0Q0)λ[(p01p0)llogμlogλ(k=0l1(1)kkl(p0μ(1p0)λ)kllog(p0μp0μ+(1p0)λ))], (88)

which equal (36) and (35), respectively.

Now, let us regard g(x) given by (17) as a function in x and a and write ga(x) for it. Clearly for x ≥ 0, ga(x) is monotone increasing in a, and for x < 0 it is monotone decreasing. Since by Theorem 1 the integral xga(x)dx exists for ∈ {−1, 0} and a(μλ)l, it follows by the Theorems of monotone convergence and, in case = −1, also by the theorem of dominated convergence that limaa0xga(x)dx=xga0(x)dx. By setting a0=(μλ)l it follows that (36) and (35) are the continuations of (16) and (15) in the limit a(μλ)l.

If x0 ≤ 0, we do not need the function G1, and hence the derivations from Theorem 1 and Remark 1 hold for ψ. The same holds for Ψ if x0 ≤ −1. This finishes the proof.

Proof of Theorem 2

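For κ ≠ −l (l ∈ ℕ) we have

$${}_2F_1(1,\kappa;1+\kappa;z) = \sum_{k=0}^{\infty}\frac{(\kappa)^{\overline{k}}z^{k}}{(\kappa+1)^{\overline{k}}} = 1+\kappa\sum_{k=1}^{\infty}\frac{z^{k}}{\kappa+k} \approx 1+\frac{\kappa z}{\kappa+1}. \tag{89}$$

First assume $a \ne \left(\frac{\mu}{\lambda}\right)^{l}$ for l ∈ ℕ, i.e., $\frac{\xi}{\eta} \ne -l$.

Setting $\kappa = \frac{\xi}{\eta}$ and z = −c gives

$${}_2F_1\!\left(1,\tfrac{\xi}{\eta};1+\tfrac{\xi}{\eta};-c\right) \approx 1-\frac{c\,\xi}{\eta+\xi} = 1-\frac{p_0\log a}{(\log a+\log\lambda-\log\mu)(1-p_0)}, \tag{90}$$

whereas setting $\kappa = \frac{\xi}{\eta}$ and $z = -c\,e^{-\eta}$ gives

$${}_2F_1\!\left(1,\tfrac{\xi}{\eta};1+\tfrac{\xi}{\eta};-c\,e^{-\eta}\right) \approx 1-\frac{c\,\xi}{e^{\eta}(\eta+\xi)} = 1-\frac{p_0\,\mu\log a}{(\log a+\log\lambda-\log\mu)(1-p_0)\lambda}. \tag{91}$$

Moreover, setting $\kappa = -\frac{\xi}{\eta}$ and z = −1 yields

$${}_2F_1\!\left(1,-\tfrac{\xi}{\eta};1-\tfrac{\xi}{\eta};-1\right) \approx 1-\frac{\xi}{\xi-\eta} = 1-\frac{\log a}{\log a-\log\lambda+\log\mu}, \tag{92}$$

whereas by setting $\kappa = \frac{\xi}{\eta}$ and z = −1 we obtain

$${}_2F_1\!\left(1,\tfrac{\xi}{\eta};1+\tfrac{\xi}{\eta};-1\right) \approx 1-\frac{\xi}{\xi+\eta} = 1-\frac{\log a}{\log a+\log\lambda-\log\mu}. \tag{93}$$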

First, combining (93), (90), and (92) with (31), and using this approximation in (20) yields (39) by using the definitions of ξ and η. Similarly, we obtain (40) by combining (93), (91), (92), with (32) and (21).

Clearly, (39) and (40) are continuous, in particular also at $a = \left(\frac{\mu}{\lambda}\right)^{l}$. Since ψ(a) and Ψ(a) have continuations at $a = \left(\frac{\mu}{\lambda}\right)^{l}$ for l ∈ ℕ according to Remark 2, we have ϕ(a) ≈ ψ(a) and Φ(a) ≈ Ψ(a) for all a. This finishes the proof.

Proof of Remark 3

We obtain (41) by combining (33) with (89) for $\kappa = \frac{\log a}{\log\mu-\log\lambda}$ and $z = -\frac{1-p_0}{p_0}$, whereas we obtain (42) by combining (34) with (89) for $\kappa = \frac{\log a}{\log\mu-\log\lambda}$ and $z = -\frac{(1-p_0)\lambda}{p_0\mu}$.

Clearly (39), (40), (41), and (42) are continuous functions of p0. For $p_0 = \tfrac{1}{2}$ it is easily seen that (39) equals (41), whereas for $p_0 = \frac{\lambda}{\lambda+\mu}$ (40) equals (42). Hence, (41) and (42) are the continuations in p0 of (39) and (40), respectively.

Proof of Theorem 3

Note that we have

$$Q^{\circ}(a) = Q_0 + \frac{r\vartheta(R_0-Q_0)}{\lambda}A, \tag{94}$$

where

A=(p01p0)logalogμlogλ(1loga1logaλμ+1logaμλ)12aloga+p0μ2(1p0)logμ12loga+p02(1p0)logμ=(p01p0)logalogμlogλ(1loga+1logλ)1loga(1+a2a)(p01p0)logalogμlogλ(1logμ)+p02(1p0)logμ(μ+1)=(p01p0)logalogμlogλ(1loga+1logλ)1+a2alogap0(1p0)logμ((p01p0)logaλμlogμlogλ1)+p02(1p0)logμ(μaλ1)=1loga((p01p0)logalogμlogλ1)p0(1p0)logμ((p01p0)logaλμlogμlogλ1)1a2aloga+1logλ(p01p0)logalogμlogλp02(1p0)logμ(μ1). (95)

Furthermore, x1logx is monotone increasing in x and limx0x1logx=0. Since λ > μ and 12a1, we have 0<μ2. Hence, we obtain

0μ1logμ1log2,

and we have

0p02(1p0)logμ(μ1)p02log2(1+O(p0)), (96)

which follows from the fact that 11p0 can be written as a geometric series.

Therefore,

A1loga((p01p0)logalogμlogλ1)p0(1p0)logμ×((p01p0)logμlogμlogλ1)1a2aloga+1logaμλ(p01p0)logalogμlogλ. (97)

For C > 0 we obtain f(x)=Cx1x is monotone increasing in x, because f(x)=1Cx+CxlogCxx20. Furthermore, we have limx→0 f (x) = log C. By choosing C=(p01p0)1logμlogλ, we see that

p0(1p0)logμ((p01p0)logaλμlogμlogλ1)

is negligible compared with

$$\frac{1}{\log a}\left(\left(\frac{p_0}{1-p_0}\right)^{\frac{\log a}{\log\mu-\log\lambda}}-1\right). \tag{98}$$

Furthermore, also

1logλ(p01p0)logalogμlogλ (99)

is negligible compared with (98). Therefore,

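$$A \approx \frac{1}{\log a}\left(\left(\frac{p_0}{1-p_0}\right)^{\frac{\log a}{\log\mu-\log\lambda}}-1\right)-\frac{1-a}{2a\log a}. \tag{100}$$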

A.2 Relative heterozygosity

Proof of Remark 4

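Let $f(x,y) := \frac{1}{\frac{1}{x}+\frac{1}{y}}$ for x, y ∈ ℝ+. Its Hessian matrix is calculated to be

$$H = \begin{pmatrix}\dfrac{\partial^{2}f}{\partial x^{2}} & \dfrac{\partial^{2}f}{\partial y\,\partial x}\\[2mm] \dfrac{\partial^{2}f}{\partial x\,\partial y} & \dfrac{\partial^{2}f}{\partial y^{2}}\end{pmatrix} = \begin{pmatrix}-\dfrac{2y^{2}}{(x+y)^{3}} & \dfrac{2xy}{(x+y)^{3}}\\[2mm] \dfrac{2xy}{(x+y)^{3}} & -\dfrac{2x^{2}}{(x+y)^{3}}\end{pmatrix}.$$

Clearly, we have $-\frac{2y^{2}}{(x+y)^{3}} < 0$ and det H = 0, so H is negative semidefinite. Hence, f is concave but not strictly concave (note that f(x, x) = x/2). Hence, for positive random variables X and Y defined on a probability space (Ω, A, P) and a sub-σ-algebra B, Jensen’s inequality for conditional expectations in higher dimensions yields

$$E[f(X,Y)\mid B] \le f\bigl(E[X\mid B],E[Y\mid B]\bigr).$$

Now, choose (Ω, A, P) = (ℕ, P(ℕ), P), where P(j) = αj. Moreover, choose $X(j) = (m-1)W_1^{(j)}$, $Y(j) = W_3^{(j)}$ and B = {∅, U, T, ℕ}. Then Jensen’s inequality gives

$$\sum_{j\in U}\alpha_j\frac{(m-1)W_1^{(j)}W_3^{(j)}}{(m-1)W_1^{(j)}+W_3^{(j)}} = (1-\alpha)\sum_{j\in U}\frac{\alpha_j}{1-\alpha}\,\frac{1}{\frac{1}{(m-1)W_1^{(j)}}+\frac{1}{W_3^{(j)}}} = (1-\alpha)E[f(X,Y)\mid U] \le (1-\alpha)f\bigl(E[X\mid U],E[Y\mid U]\bigr) = (1-\alpha)\frac{1}{\frac{1}{(m-1)W_1^{(U)}}+\frac{1}{W_3^{(U)}}} = (1-\alpha)\frac{(m-1)W_1^{(U)}W_3^{(U)}}{(m-1)W_1^{(U)}+W_3^{(U)}}.$$

Similarly, we obtain

$$\sum_{j\in T}\alpha_j\frac{(m-1)W_1^{(j)}W_3^{(j)}}{(m-1)W_1^{(j)}+W_3^{(j)}} \le \alpha\frac{(m-1)W_1^{(T)}W_3^{(T)}}{(m-1)W_1^{(T)}+W_3^{(T)}}.$$

Combination of the above yields

$$\sum_{j=0}^{\infty}\alpha_j\frac{(m-1)W_1^{(j)}W_3^{(j)}}{(m-1)W_1^{(j)}+W_3^{(j)}} \le (1-\alpha)\frac{(m-1)W_1^{(U)}W_3^{(U)}}{(m-1)W_1^{(U)}+W_3^{(U)}}+\alpha\frac{(m-1)W_1^{(T)}W_3^{(T)}}{(m-1)W_1^{(T)}+W_3^{(T)}}, \tag{101}$$

i.e., ϑ̃(m) given by (64) is smaller than if it is given by (58).

Clearly, in the limit m → ∞ equality holds in (101). Moreover, ϑ̃(m) → λ. Hence, in the limit the approximation reduces to the approximation for standard hitchhiking.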

The proof is finished by applying the argument formulated above the remark.
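The properties of f used in this proof can be checked numerically. The sketch below evaluates the Hessian entries obtained by differentiating f(x, y) = xy/(x + y) and tests concavity at a midpoint; the sample points and function names are arbitrary.

```python
def f(x, y):
    """f(x, y) = 1 / (1/x + 1/y) = x*y / (x + y) for x, y > 0."""
    return x * y / (x + y)

def hessian(x, y):
    """Second derivatives (f_xx, f_xy, f_yy) of f."""
    s3 = (x + y) ** 3
    return (-2.0 * y * y / s3, 2.0 * x * y / s3, -2.0 * x * x / s3)

for x, y in ((1.0, 2.0), (3.0, 1.0), (0.5, 0.5)):
    fxx, fxy, fyy = hessian(x, y)
    print((x, y), "f_xx =", fxx, " det H =", fxx * fyy - fxy * fxy)

# Midpoint concavity: f at the average is at least the average of f.
(x1, y1), (x2, y2) = (1.0, 2.0), (3.0, 1.0)
print(f((x1 + x2) / 2, (y1 + y2) / 2), (f(x1, y1) + f(x2, y2)) / 2)

# Along the diagonal f is linear, f(x, x) = x/2, so concavity is not strict.
print(f(4.0, 4.0))
```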

Contributor Information

Kristan A. Schneider, Email: kristan.schneider@asu.edu, School of Life Sciences, Arizona State University, 1711 South Rural Road, Tempe, AZ 85287, USA; Department of Mathematics, University of Vienna, Nordbergstrasse 15, UZA 4, 1090 Vienna, Austria; CEMI/Biodesign Institute, Arizona State University, P. O. Box 875301, Tempe, AZ 85287-5301, USA.

Yuseob Kim, School of Life Sciences, Arizona State University, 1711 South Rural Road, Tempe, AZ 85287, USA; Center for Evolutionary Medicine and Informatics, Biodesign Institute, 1001 S. McAllister Ave., Tempe, AZ 85281, USA.

