Abstract
Aggregation, a fundamental feature of parasite distributions, has been measured using a variety of indices. We use the definition that parasite–host system A is more aggregated than parasite–host system B if any given proportion of the parasite population is concentrated in a smaller proportion of the host population A than of host population B. This leads to indices based on the Lorenz curve such as the Gini index (Poulin’s D), coefficient of variation and the Hoover index, all of which measure departure from a uniform distribution. The Hoover index is particularly useful because it can be interpreted directly in terms of parasites and hosts. An alternative view of aggregation is degree of departure from a Poisson (or random) distribution, as used in the index of dispersion and the negative binomial k. These and Lloyd’s mean crowding index are reinterpreted and connected back to Lorenz curves. Aggregation has occasionally been defined as the slope from Taylor’s law, although the slope appears unrelated to other indices. The Hoover index may be the method of choice when data points are available, and the coefficient of variation when only variance and mean are given.
Keywords: aggregation, Lorenz curve, Hoover index, Gini index, coefficient of variation, measurement error
1. Introduction
It is estimated that nearly half of all animal species are parasitic, that is, they live on or in another living creature from which they derive nourishment [1]. An essential feature of almost all parasites is that they are not distributed evenly across host populations; typically, a few hosts have many parasites and many hosts have few or none. Uneven distribution is characteristic of many living organisms but appears particularly acute in parasites, where it has significant ecological consequences. It can limit parasite population size through: mortality of heavily infected hosts, stimulation of a host protective response, probability of encounter with interspecific competitors, and reduced parasite fecundity (the ‘crowding’ effect). Parasite aggregation also facilitates cross-fertilization, bringing with it all the genetic benefits that entails. As a result of these effects, the phenomenon is considered of prime importance in parasite ecology [2,3].
Determining the causes of aggregation from theory and from field data has been hampered by confusion over its measurement [4]. The problem is that there is no universally recognized definition for parasite aggregation. This was cogently stated by Pielou [5] for spatial ecology in general when she wrote ‘the phrase “degree of aggregation” describes a vague, undefined notion that is open to several interpretations. If aggregation is to be measured we must first choose from a number of possibilities some measurable property of a spatial pattern that is to be called its aggregation, and the method of measurement is then implicit in the chosen definition. Thus the several existing ways of measuring aggregation are not different methods of measuring the same thing: they measure different things.’ This is the situation in parasitology today. There are a number of methods, some summarized in Wilson et al. [6], but, despite the reassuring lines from Reiczigel et al. [7] that their interpretations are identical and they more or less predict each other, the different methods compare different things and are not readily comparable (figure 1). As a consequence, different methods can give opposing answers [8–10], and treating them as measuring the same thing can lead to conclusions that are not entirely justified.
Figure 1.

Plot of the index of dispersion (variance-to-mean ratio (VMR)) against the Gini index (Poulin’s D) calculated on samples of helminth parasites from the great cormorant (Phalacrocorax carbo) data reported in table 7 of Kanarek & Zaleśny [8]. There is no distinguishable pattern, contrary to what one would expect if the two indices measured the same quantity.
Here, we start with a definition of aggregation based on the concept of Poulin [11]. We then explore the mathematical relationships of commonly used methods dealing first with those based on departure from a uniform distribution, specifically Lorenz curves. Departure from Poisson is examined in relation to the index of dispersion, k, and Lloyd’s mean crowding, and we conclude with some comments on Taylor’s law.
2. Lorenz curves
In a pioneering paper Poulin [11] considered that aggregation was evident if, ‘when hosts in a sample are ranked from least to most infected, and the cumulative number of parasites is plotted against the cumulative number of hosts; the cumulative number of parasites rises more rapidly as the more heavily infected hosts are reached.’ The curve which Poulin described is essentially the Lorenz curve used extensively in economics and statistics [12].
Using the Lorenz curve we can make the following comparative definition of aggregation: parasite–host system A is more aggregated than parasite–host system B if any given proportion of the parasite population is concentrated in a smaller proportion of the host population A than of host population B. Graphically, this means that the Lorenz curve of the more aggregated population A is below the Lorenz curve of the less aggregated population B everywhere as in figure 2a. Such Lorenz curves are said to be ordered. Under this definition of aggregation, only the host’s parasite burden relative to the total number of parasites in the sample is important and not the absolute counts.
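The construction described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `lorenz_curve` and the sample counts are hypothetical and are not taken from the paper.

```python
def lorenz_curve(counts):
    """Return (cumulative host fraction, cumulative parasite fraction) points."""
    ordered = sorted(counts)              # rank hosts from least to most infected
    total = sum(ordered)
    if total == 0:
        raise ValueError("at least one host must carry a parasite")
    n = len(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(ordered, start=1):
        running += x
        points.append((i / n, running / total))
    return points

sample = [0, 0, 1, 2, 7]                  # hypothetical parasite counts per host
curve = lorenz_curve(sample)
for host_fraction, parasite_fraction in curve:
    print(f"{host_fraction:.2f} {parasite_fraction:.2f}")
```

Comparing two samples then amounts to checking whether one list of points lies below the other at every host fraction; if the curves cross, the definition above gives no ordering.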
Figure 2.
Lorenz curves for three distributions: (a) uniform distribution (solid), Poisson distribution with mean m = 3 (dashed), negative binomial distribution with m = 3 and k = 1 (dotted). Note that the less aggregated distribution (dashed) is above the more aggregated distribution (dotted) everywhere. Thus, according to our definition, the negative binomial distribution is more aggregated than the Poisson distribution with this choice of parameters. (b) Lorenz curves for the Poisson distribution with m = 1 (dashed) and negative binomial distribution with m = 3 and k = 1 (dotted line). As the curves cross, we cannot say one distribution is more aggregated than the other, using the definition given at the start of this section.
The straight line in figure 2 represents a parasite–host system in which all hosts have exactly the same parasite burden, termed a ‘uniform’ distribution in parasitology, but a ‘degenerate’ distribution in statistics. When we refer to a uniform distribution throughout the paper, we are referring to the parasitological uniform distribution not a statistical uniform distribution. This is the least aggregated parasite–host system. It lies above the Lorenz curves of all other distributions.
As Lorenz ordering is only a partial ordering, it can happen that the curves cross, in which case the curves are said to be unordered (figure 2b). It is desirable, therefore, to have indices which agree with Lorenz ordering when present, but still allow comparison of parasite–host systems even when the curves are unordered. Such an index is necessarily a Schur convex function of the hosts’ burden relative to the average number of parasites per host [12, definition 2.2]. The family of Schur convex functions is large. For example, if $x_1, \ldots, x_n$ are the parasite counts from a sample of n hosts and g is any continuous convex function, then the index $I_g$ defined by
$$I_g = \frac{1}{n}\sum_{i=1}^{n} g\!\left(\frac{x_i}{\bar{x}}\right), \tag{2.1}$$
where $\bar{x} = n^{-1}\sum_{i=1}^{n} x_i$, is a Schur convex function of the relative parasite burden [12, corollary 3.2.1]. It is also desirable that the index does not depend on the labelling of hosts, that is, taking hosts in a particular order. However, this does not refine the class of reasonable indices since all Schur convex functions possess this property. If there were other properties identified in the parasitological literature that an aggregation measure should possess, then we could use these to characterize all aggregation measures. That is the approach used in economics to construct inequality indices, often called the axiomatic approach [13]. In the absence of such properties, we will briefly discuss three important indices that are consistent with our definition of aggregation.
The first of these is the Hoover index, also known as the Robin–Hood index or Pietra index, which does not appear to have been used much in the study of parasite aggregation. The Hoover index can be calculated from
$$H = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{2\sum_{i=1}^{n} x_i},$$
which is the sum of the absolute differences from the mean divided by twice the sum of the values. It can also be expressed as
$$H = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{2}\left|\frac{x_i}{\bar{x}} - 1\right|,$$
which has the form of a Schur convex function (2.1) with $g(x) = |x - 1|/2$. It takes the minimal value of 0 when all hosts have the same parasite burden and the maximal value when all parasites are concentrated in a single host. The maximal value that the Hoover index can take is $1 - n^{-1}$, so for large sample sizes this will be approximately 1. The behaviour of the index between these two extremes is particularly simple; the Hoover index gives the proportion of parasites in the sample that need to be redistributed among the hosts in order to produce a uniform distribution. Therefore, an increase of 0.1 in the Hoover index means that an additional 10% of the parasite population would have to be redistributed, regardless of the specific value of the index. When calculating the Hoover index from data, the ‘bias corrected and accelerated’ bootstrap can be applied to construct confidence intervals for the estimate [14, ch. 14]. The main limitation of the index is that two distributions that satisfy Lorenz ordering, and so have different levels of aggregation (according to our definition), could have the same value for the Hoover index if the proportion of parasites in hosts with above average parasite burden is the same (figure 3).
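As a minimal sketch of the verbal definition above (the sum of the absolute differences from the mean divided by twice the sum of the values), the Hoover index can be computed as follows. The function name and the sample counts are hypothetical.

```python
def hoover_index(counts):
    """Sum of absolute deviations from the mean, divided by twice the total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one host must carry a parasite")
    mean = total / len(counts)
    return sum(abs(x - mean) for x in counts) / (2 * total)

sample = [0, 0, 1, 2, 7]        # hypothetical counts: 10 parasites in 5 hosts
print(hoover_index(sample))     # share of parasites to move to equalize burdens
```

For this sample the mean is 2, so the host carrying 7 parasites holds 5 above the mean. Moving those 5 (half of the 10 parasites) to the hosts below the mean gives every host a burden of 2, which agrees with the redistribution interpretation.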
Figure 3.
Three Lorenz curves each with a Hoover index of approximately 0.42 but derived from distributions of very different shapes. According to our definition of aggregation, the distribution indicated by the dotted line is less aggregated than the distribution indicated by the dashed line, which is less aggregated than the distribution indicated by the dot-dashed line. The dotted line is a Lorenz curve of a distribution supported on two values. The dashed line is the Lorenz curve of a negative binomial distribution with m = 3 and k = 1. The dot-dashed curve is the Lorenz curve of the most aggregated distribution compared with this negative binomial distribution having the same Hoover index. The graph demonstrates that in extreme cases the Hoover index does not necessarily distinguish between levels of aggregation.
The coefficient of variation (cv), given by the ratio of the standard deviation to the mean, has been used on a few occasions in the literature as a measure of parasite aggregation [15, ch. 4]. Squaring the coefficient of variation yields a Schur convex function of the form (2.1) with $g(x) = (x - 1)^2$, which is the ratio of the variance to the squared mean. Therefore, the coefficient of variation can be used to compare levels of aggregation when only the means and variances are available. Like the Hoover index, the coefficient of variation takes the minimal value of 0 when all hosts have the same parasite burden and the maximal value when all parasites are concentrated in a single host, although the maximal value is now $\sqrt{n - 1}$. Although the index lacks a clear interpretation, we connect it to other indices in §4.
The coefficient of variation has a property that is particularly useful for studying aggregation in heterogeneous populations. The squared coefficient of variation is a member of the one-parameter family of indices that preserves the Lorenz ordering and, when applied to heterogeneous populations, can be decomposed into a sum of within group aggregation and between group aggregation [16]. Shaw et al. [17] showed empirically that populations of parasites appeared more aggregated, as measured by $k^{-1}$, than their constituent subpopulations formed by conditioning on certain host characteristics such as sex, herd, age and location. To see when this behaviour might be expected for the squared coefficient of variation, let $\mathrm{cv}_j$ and $\bar{x}_j$ denote the coefficient of variation and mean for the j-th subpopulation, let $\pi_j$ denote the proportion of hosts from the j-th subpopulation, and let $\bar{x}$ denote the mean of the whole population. The squared coefficient of variation of the population can be decomposed as
$$\mathrm{cv}^2 = \sum_j \pi_j \left(\frac{\bar{x}_j}{\bar{x}}\right)^2 \mathrm{cv}_j^2 + \sum_j \pi_j \left(\frac{\bar{x}_j}{\bar{x}} - 1\right)^2.$$
The last term in the above equation is the squared coefficient of variation for a population where each host’s parasite burden is replaced by the average parasite burden for its subpopulation. As
$$\sum_j \pi_j \left(\frac{\bar{x}_j}{\bar{x}}\right)^2 \ge \left(\sum_j \pi_j \frac{\bar{x}_j}{\bar{x}}\right)^2 = 1$$
by Jensen’s inequality, it follows that if the subpopulations have similar values for the coefficient of variation but considerable differences in the means, then the population as a whole will appear more aggregated than the component subpopulations.
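The within-group and between-group split can be checked numerically. The sketch below assumes the usual variance decomposition (total variance equals the weighted within-group variances plus the variance of the group means), with variances computed using divisor n; the function names and the two groups of counts are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def cv_squared(values):
    """Variance (divisor n) divided by the squared mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) * m ** 2)

def decompose_cv_squared(groups):
    """Return (pooled cv^2, within-group term, between-group term)."""
    pooled = [x for group in groups for x in group]
    n, m = len(pooled), mean(pooled)
    within = sum(len(g) / n * (mean(g) / m) ** 2 * cv_squared(g) for g in groups)
    between = sum(len(g) / n * (mean(g) / m - 1) ** 2 for g in groups)
    return cv_squared(pooled), within, between

# Hypothetical subpopulations with similar spread but very different means.
groups = [[0, 1, 2, 1], [10, 14, 12, 12, 12]]
total, within, between = decompose_cv_squared(groups)
print(total, within, between)
```

With these counts the between-group term dominates, so the pooled sample looks far more aggregated than either subpopulation on its own.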
The final index we consider in this section is the Gini index, which can be evaluated as
$$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|}{2 n^2 \bar{x}}.$$
Poulin [11] proposed to use this as a measure of aggregation, which he called the index of discrepancy (D). It has since become commonly accepted [7]. The index is usually defined as the ratio of the area between the Lorenz curve and straight line, and the area under the straight line. Like the Hoover index, the Gini index takes minimal and maximal values of 0 and $1 - n^{-1}$, respectively. Although the index may be interpreted in terms of the Lorenz curve, it lacks a direct interpretation in terms of parasites and hosts. It also lacks the ability to be decomposed into a sum of within group aggregation and between group aggregation, except under the unlikely scenario where the distributions of parasite burden of the groups do not overlap [18].
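A minimal sketch of the Gini index follows, assuming the standard mean-absolute-difference form (the sum of all pairwise absolute differences divided by 2n² times the mean), which attains the maximal value of 1 − 1/n stated above. The function name and data are hypothetical.

```python
def gini_index(counts):
    """Sum of |x_i - x_j| over all ordered pairs, divided by 2 * n^2 * mean."""
    n = len(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one host must carry a parasite")
    pairwise = sum(abs(a - b) for a in counts for b in counts)
    return pairwise / (2 * n * total)     # 2 * n^2 * mean equals 2 * n * total

print(gini_index([0, 0, 1, 2, 7]))        # hypothetical sample of five hosts
```

The double loop is quadratic in the number of hosts, which is acceptable for typical host sample sizes; a sorted-rank formula would be faster for very large samples.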
The Gini index, Hoover index and coefficient of variation are closely related to each other and to the level of prevalence through the following inequalities (proof given in the electronic supplementary material):
$$1 - \text{prevalence} \le H \le G \le 2H \le \mathrm{cv}. \tag{2.2}$$
These inequalities are illustrated in figures 4 and 5. Figure 4 gives values of the indices calculated assuming an infinite sample size from a negative binomial distribution, which is known to provide a good fit for many parasite distributions [19], with mean and variance related by Taylor’s law. The prevalence–abundance relationship implied by Taylor’s law and the negative binomial distribution has been previously noted [21]. In figure 5, the same indices were calculated directly from data for samples of the monogenean Gotocotyle secunda and metacestodes of the cestode Otobothrium cysticum collected from narrow-barred Spanish mackerel Scomberomorus commerson (data from [22] and available in the electronic supplementary material).
Figure 4.

Values of the four indices at different means calculated from the negative binomial distribution with mean and variances to match Taylor’s law described in Shaw & Dobson [19] ($\log_{10}(v) = 1.098 + 1.551 \times \log_{10}(m)$). The five curves from top to bottom are the coefficient of variation, twice the Hoover index, the Gini index, the Hoover index, and 1 − prevalence. The order of the curves is as stated in inequality (2.2). Equations for the mean absolute deviation and mean absolute difference required to compute the Hoover and Gini index, respectively, were taken from Ramasubban [20]. (Online version in colour.)
Figure 5.
Values for the four indices calculated from samples of the monogenean Gotocotyle secunda from seven locations and samples of the cestode Otobothrium cysticum from six locations.
These figures suggest that these indices are larger when prevalence is small, as inequality (2.2) demonstrates. Furthermore, since prevalence is bounded by the mean, inequality (2.2) implies a lower bound of 1 − m on each of these indices. Thus, the increase at very small values of the mean could be considered a consequence of the method of calculation, but, as Hurlbert [23] points out, if the mathematical properties of the index do not conform to our concept of aggregation, then we have the option of reconsidering the concept.
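The ordering of the indices stated in the caption of figure 4 (1 − prevalence, Hoover, Gini, twice Hoover, coefficient of variation) can be verified on any sample. The sketch below uses a hypothetical sample and assumes the variance is computed with divisor n.

```python
import math

def summary_indices(counts):
    """Return (1 - prevalence, Hoover, Gini, cv) for a sample of parasite counts."""
    n = len(counts)
    total = sum(counts)
    mean = total / n
    uninfected = sum(1 for x in counts if x == 0) / n
    hoover = sum(abs(x - mean) for x in counts) / (2 * total)
    gini = sum(abs(a - b) for a in counts for b in counts) / (2 * n * total)
    cv = math.sqrt(sum((x - mean) ** 2 for x in counts) / n) / mean
    return uninfected, hoover, gini, cv

sample = [0, 0, 0, 1, 1, 2, 3, 5, 9, 19]   # hypothetical counts, mean 4
uninfected, hoover, gini, cv = summary_indices(sample)
print(uninfected, hoover, gini, 2 * hoover, cv)
```

The printed values appear in non-decreasing order for this sample, as the inequalities require.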
3. Lorenz curves and incomplete counts
A complete count of a parasite burden is not always possible. In heavily parasitized hosts, one may restrict the count to only part of the host to reduce the workload involved [22,24]. Parasites such as lice can be hard to isolate so that not all parasites are counted [25]. Sometimes it is necessary to rely on indirect measures of parasite burden based on coprological or haematological samples [6, §§2.3.3 and 2.3.4]. More generally, a count of the total number of parasites per host can exceed the resources available. In statistics, these counts are said to be subject to ‘measurement error’, a topic that has received little attention in the parasitological literature.
Measurement error can be viewed as a random transformation of the true parasite burden. It is known that very few transformations (both deterministic and random) preserve the Lorenz order of distributions [12, ch. 4]. Hence indices based on Lorenz curves are likely to be affected by incomplete counts.
To make the discussion more concrete, we will assume the following simple model of measurement error where each parasite is counted with a fixed probability ω independently of the other parasites. In other words, conditional on the true parasite burden, the incomplete parasite count is a realization from a binomial distribution with success probability ω and number of trials given by the true parasite burden. This is called ‘binomial thinning’ or just ‘thinning’ in the statistics literature. Assuming an infinite sample size, the observed level of aggregation of a parasite distribution decreases as the probability of including a parasite in the count increases. This is a consequence of Arnold [12, theorem 3.4] (proof given in the electronic supplementary material).
The phenomenon is illustrated in figure 6 using Hoover and Gini indices. The solid lines are derived from a negative binomial distribution representing a typical parasite distribution. The dashed lines are derived from a uniform distribution. When the probability of including a parasite in the count is more than 0.2, the dashed lines are below the solid lines so the negative binomial appears more aggregated than the uniform distribution. However, when the probability of inclusion is sufficiently small, less than 0.2 in our example, the dashed lines are above the solid lines, meaning that the sample from the uniform distribution appears more aggregated than the sample from the negative binomial distribution. Although the Hoover and Gini indices do not admit a simple explicit expression under thinning, we can bound the effect of thinning on these indices (proof given in the electronic supplementary material). Let Hω and Gω denote the Hoover and Gini indices for the population where parasites are counted with probability ω. Then we have the inequalities
| 3.1 |
and
| 3.2 |
where m is the mean parasite burden for the population. As the mean of the thinned counts is ωm, these inequalities show that the indices for the thinned counts will not differ much from the values without thinning provided the means from the thinned counts are not small.
Figure 6.
The Hoover index and Gini index are calculated for the negative binomial distribution with m = 50 and k = 3 (solid lines) and uniform distribution with m = 10 (dashed lines) subject to binomial thinning, calculated using equations from Ramasubban [20] and Johnson [26]. From inequality (2.2), the curve for the Gini index is above the curve for the Hoover index. As the probability of the parasite being counted decreases, both indices increase.
While the Hoover and Gini indices can only be bounded or computed numerically, the coefficient of variation, denoted $\mathrm{cv}_\omega$, has the explicit expression
$$\mathrm{cv}_\omega = \sqrt{\mathrm{cv}_1^2 + \frac{1 - \omega}{\omega m}}, \tag{3.3}$$
where parasites are counted with probability ω. From this equation, we see that $\mathrm{cv}_\omega$ has the same behaviour as $H_\omega$ and $G_\omega$ in that $\mathrm{cv}_\omega$ increases rapidly as the probability of inclusion decreases to zero for a given distribution. The equation also allows the coefficient of variation without thinning, $\mathrm{cv}_1$, to be readily estimated if the probability of inclusion is known. For example, for G. secunda in figure 5, gills on only one side of the fish were counted, hence we take ω = 0.5. The estimated cv and mean from the thinned count were 1.37 and 1.17, respectively. Adjusting for thinning, the true cv is estimated to be 1.20. For the O. cysticum samples, parasites were counted from a 3% section of the gut. The estimated cv and mean from the thinned count were 2.80 and 9.9, respectively. After adjusting for thinning, the true cv is estimated to be 2.78. The adjustments in these two cases were not large since the means from the thinned counts, given by ωm, were relatively large. In summary, while the indices computed from thinned counts may overestimate the level of aggregation, the difference will be small unless the mean parasite burden in the thinned count is small.
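The two adjustments quoted above can be reproduced with a short calculation. The sketch assumes the standard moments of binomial thinning (thinned mean ωm and thinned variance ω²v + ω(1 − ω)m), which imply that the squared cv of the thinned counts exceeds the unthinned squared cv by (1 − ω) divided by the thinned mean. The function name is hypothetical.

```python
import math

def unthinned_cv(cv_thinned, mean_thinned, omega):
    """Estimate the cv of complete counts from the cv and mean of thinned counts."""
    # Under binomial thinning: cv_thinned^2 = cv_true^2 + (1 - omega) / mean_thinned.
    cv_squared = cv_thinned ** 2 - (1 - omega) / mean_thinned
    if cv_squared < 0:
        raise ValueError("inputs are inconsistent with binomial thinning")
    return math.sqrt(cv_squared)

print(round(unthinned_cv(1.37, 1.17, 0.5), 2))   # G. secunda, one side of the gills
print(round(unthinned_cv(2.80, 9.9, 0.03), 2))   # O. cysticum, 3% section of the gut
```

Both results round to the adjusted values reported in the text (1.20 and 2.78).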
4. Index of dispersion and measures related to the Poisson distribution
An alternative view of aggregation that has been dominant in the parasitology literature is as a departure from the Poisson distribution, also called the random distribution in parasitology. A simple model of host–parasite interaction, assuming (i) encounters between parasites and the host follow a Poisson process, (ii) no parasite reproduction occurs within the host, and (iii) parasites die at a density-independent death rate, leads to a Poisson distribution for parasite burden for a host of a given age [27]. Departures from a Poisson distribution for parasite burden, therefore, indicate a violation of one or more of these assumptions [27,28]. The importance of stratifying samples by host age, among other demographic variables, has been noted by a number of researchers [6, section 2.3.1]. Since combining age classes inflates various estimates of aggregation [17,29], it is not possible to reject the Poisson null model using data that ignore the age of the hosts.
One common way to measure departure from a Poisson distribution is using the index of dispersion, I, also called the variance to mean ratio (VMR). For a Poisson distribution I = 1. Biological distributions are rarely Poisson and usually the variance is greater than the mean, so I > 1. These considerations lead to defining an aggregated parasite distribution as one in which I > 1 [2,3,6]. However, Hurlbert [23] used a series of hypothetical situations to show that I is not necessarily a good indication of departure from a Poisson distribution. Indeed, although I = 1 for a Poisson distribution, the converse is false: I = 1 does not imply that the distribution is Poisson.
Closely related to the index of dispersion is the parameter k of the negative binomial distribution. Early work of Milne [30] on sheep ticks and Crofton [31] on the acanthocephalan Polymorphus minutus in Gammarus pulex found that the negative binomial distribution provided a good model of the distributions of these parasites. The extensive study by Shaw et al. [17] showed that this pattern holds for a large number of parasite–host systems. Since then, the parameter k of the negative binomial has become a commonly used inverse measure of aggregation (e.g. [32]). Assuming a negative binomial distribution, the two measures are related by $k^{-1} = (I - 1)/m$.
The parameter k has been used as an indicator of closeness to the Poisson distribution because if the mean is fixed, as k approaches infinity, the distribution approaches the Poisson. The importance of the mean is often forgotten when comparing k with the Poisson distribution, with the result that it is sometimes claimed that the negative binomial is close to the Poisson when k is sufficiently large, e.g. k ≥ 8 [3] or k ≥ 20 [6], without regard to the size of the mean.
To illustrate this, suppose we measure the distance between two distributions using total variation, which is equivalent to Hurlbert’s Dp. Total variation takes the maximum value of 1 when the two distributions do not overlap. Direct calculation shows that the total variation distance between a Poisson distribution with m = 200 and a negative binomial distribution with k = 20 and m = 200 is approximately 0.5. On the other hand, the total variation distance between a Poisson distribution with m = 2 and a negative binomial distribution with the same value of k (20) and m = 2 is approximately 0.02. Thus, for these two distributions with identical k values the total variation distance is much greater at a large mean than at a small mean. This distinction is not purely theoretical. Suppose we simulate data from a negative binomial with m = 200 and k = 20 and then fit both the negative binomial distribution and Poisson distributions using maximum likelihood. Using the Akaike information criterion to decide between the two distributions, the negative binomial would be chosen with probability around 0.9 for sample sizes as small as five. Likewise, it has been suggested that small values of k indicate a departure from randomness. This can also be shown not to hold in general using similar calculations.
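The total variation comparison can be reproduced by direct summation. The sketch below assumes the negative binomial is parameterised by its mean m and parameter k (variance m + m²/k) and truncates the sum far into the upper tail; the function names are hypothetical.

```python
import math

def poisson_logpmf(x, m):
    return x * math.log(m) - m - math.lgamma(x + 1)

def negbin_logpmf(x, m, k):
    # Negative binomial with mean m and aggregation parameter k.
    return (math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
            + k * math.log(k / (k + m)) + x * math.log(m / (k + m)))

def total_variation(m, k):
    """Half the summed absolute difference between the two probability functions."""
    sd = math.sqrt(m + m * m / k)      # negative binomial standard deviation
    upper = int(m + 40 * sd) + 50      # truncation point; the tail beyond is negligible
    return 0.5 * sum(
        abs(math.exp(poisson_logpmf(x, m)) - math.exp(negbin_logpmf(x, m, k)))
        for x in range(upper + 1)
    )

tv_large_mean = total_variation(200, 20)
tv_small_mean = total_variation(2, 20)
print(round(tv_large_mean, 2), round(tv_small_mean, 2))
```

Working with log probabilities avoids overflow in the gamma functions at large counts. The same k gives a much larger distance at m = 200 than at m = 2, in line with the values quoted above.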
Unlike I and k, Lloyd’s mean crowding is an aggregation measure that has a clear interpretation in terms of parasites and hosts [33]. As applied in parasitology, mean crowding can be defined as the mean number per parasite of other conspecific parasites in the same host. Mean crowding (mc) can be expressed in terms of the mean (m) and variance (v) of the host’s parasite burden as
$$\mathrm{mc} = m + \frac{v}{m} - 1 = m + I - 1.$$
For a Poisson distribution mc = m so mean crowding could potentially describe departures from a Poisson distribution but, given its close connection to the index of dispersion, it will necessarily have the same defects. Lloyd also proposed a ‘patchiness’ index, which he defined as how many times as ‘crowded’ an individual is, on average, as it would have to be if the same population had a Poisson distribution.
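Mean crowding can be computed either from its definition (averaging, over parasites, the number of other parasites in the same host) or from the mean and variance as m + v/m − 1, a form consistent with mc = m for the Poisson distribution. The sketch below checks that the two routes agree when the variance uses divisor n; the function names and sample are hypothetical, and patchiness is taken as mean crowding divided by the mean.

```python
def mean_crowding_from_counts(counts):
    """Average, over parasites, of the number of other parasites in the same host."""
    return sum(x * (x - 1) for x in counts) / sum(counts)

def mean_crowding(m, v):
    """Mean crowding from the mean m and variance v of parasite burden."""
    return m + v / m - 1

def patchiness(m, v):
    """Mean crowding relative to the mean (Poisson value is 1)."""
    return mean_crowding(m, v) / m

sample = [0, 0, 1, 2, 7]                                 # hypothetical counts
m = sum(sample) / len(sample)
v = sum((x - m) ** 2 for x in sample) / len(sample)      # variance with divisor n
print(mean_crowding_from_counts(sample), mean_crowding(m, v), patchiness(m, v))
```

Each of the 7 parasites in the heaviest host shares it with 6 others, which is why that single host dominates the result.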
Although mean crowding and patchiness have usually been interpreted by comparison with the Poisson distribution, one can also study the behaviour of these two indices relative to the uniform distribution. The first observation we can make in this direction is that mc = m − 1 for the uniform distribution since I = 0, and this is the smallest value mean crowding can take for a fixed integer mean. In this case, we may interpret I as the increase in mean crowding over what one would observe for a uniform distribution with the same mean.
When the mean is not an integer, the smallest value mean crowding can take is no longer m − 1 since a non-integer mean for an integer-valued random variable implies that the distribution is not uniform. The lower bound on mean crowding for a given mean is
$$\mathrm{mc} \ge m - 1 + \frac{(m - r + 1)(r - m)}{m}, \tag{4.1}$$
where r is the integer such that r − 1 < m ≤ r. The distribution which attains this lower bound is actually the least aggregated distribution in the Lorenz ordering with mean m (the proof is given in the electronic supplementary material).
For non-integer means the minimum mean crowding is greater than m − 1 (figure 7). Supposing a parasite distribution has a mean of 1.5 and variance of 23.5 (from Shaw & Dobson [19]), then mc = 16.2, but the mc for the least aggregated distribution at this mean is 0.67, not 0.5. Thus if the mean is not a whole number the calculated mc will be higher than adjacent integer means (figure 7). The large change observed in the range [0, 1] (figure 7) is due to minimum mean crowding being zero for m < 1. The difference decreases quickly as m increases.
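The minimum mean crowding values quoted above can be reproduced under one assumption, labelled here as such: that the least aggregated distribution with mean m places every host on one of the two integers adjacent to m, so its variance is (m − r + 1)(r − m) with r − 1 < m ≤ r. The function name is hypothetical.

```python
import math

def min_mean_crowding(m):
    """Mean crowding when every host carries r - 1 or r parasites, r - 1 < m <= r."""
    r = math.ceil(m)
    variance = (m - r + 1) * (r - m)   # two-point distribution on r - 1 and r
    return m - 1 + variance / m

m, v = 1.5, 23.5                        # example from the text
observed = m + v / m - 1                # mean crowding from mean and variance
print(round(observed, 1), round(min_mean_crowding(m), 2))
print(round(observed - min_mean_crowding(m), 1), round(v / m, 1))
```

The gap between this minimum and m − 1 is the two-point variance divided by m, which is at most 1/(4m); this matches the dashed line described for figure 7.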
Figure 7.
The difference between the minimum mean crowding and m − 1 plotted against m. The dashed line is 1/(4m) and shows the speed at which the difference decreases.
Overall, the index of dispersion I appears to be a good approximation of the increase in mean crowding over that observed for the least aggregated distribution with the same mean, at least when the mean is not small. In our example, I = 15.7 (from 23.5/1.5) compared with an increase in mean crowding of 15.5 over the least aggregated distribution with a mean of 1.5.
Returning to the patchiness index, we note that
$$\frac{\mathrm{mc}}{m} = \left(1 - \frac{1}{m}\right) + \mathrm{cv}^2.$$
So cv² approximates the increase in patchiness over the least aggregated distribution with the same mean. The accuracy of the approximation depends on the mean.
5. Taylor’s law
Taylor’s law describes a frequently observed relationship between the mean and variance. Specifically, it asserts that the variance should be proportional to some (positive) power of the mean. While theoretical studies have proposed various biological and non-biological causes [21,34], numerous empirical studies have shown the relationship to be a consistent feature of many parasite populations. In these, the exponent from Taylor’s law, that is, the slope in the log-log plot of variance against mean, is frequently around 1.5. Rarely does the slope exceed 2.
The slope clearly defines the relationship between the mean and variance. Taylor [35] suggested that the slope was not only a true population statistic, but also an ‘index of aggregation’ that described an intrinsic property of the organisms concerned. Consequently, the slope has been used by a number of authors as the basis for a measure of aggregation [8,10,36–43]. While the methodology has been used to explore the relationship between the distribution of parasites and biologically meaningful covariates [21], it is far from clear how the slope from Taylor’s law relates to other concepts of aggregation.
One connection that is sometimes made is that for the Poisson (random) distribution the mean is equal to the variance so the resulting slope is 1. This has led some authors to test if the slope is 1, the rejection of which indicates a departure from the Poisson distribution [6,8,41,44]. A difficulty with using the slope in this way is that, although a slope of 1 indicates that the index of dispersion is constant as a function of the mean, it does not necessarily indicate that the distribution is close to the Poisson distribution since the intercept is often very different from zero [19]. One way to overcome this is to use the slope together with the intercept to estimate the index of dispersion at a specified mean, e.g. the I10 as used by Lester & McVinish [45] and the dispersion score of Lester [46].
Many of the measures of aggregation mentioned earlier are functions of the mean and variance so Taylor’s law implies that these must be a function of the mean. In a typical parasite population where $\log_{10}(v) = 1.098 + 1.551 \times \log_{10}(m)$ [19] the index of dispersion and Lloyd’s mean crowding index display greater variation over the usual range of mean values than other indices such as patchiness or the coefficient of variation. Although the Hoover and Gini indices are not simply functions of the mean and variance, Taylor’s law will constrain their behaviour, as suggested by figure 4. This does not, however, affect the interpretation of these indices.
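To see how Taylor’s law turns the moment-based indices into functions of the mean, the sketch below evaluates them along the fitted line quoted from Shaw & Dobson [19]. The function names and the grid of means are hypothetical choices for illustration.

```python
import math

# Taylor's law coefficients quoted in the text from Shaw & Dobson [19].
INTERCEPT, SLOPE = 1.098, 1.551

def taylor_variance(m):
    """Variance implied by Taylor's law at mean m."""
    return 10 ** (INTERCEPT + SLOPE * math.log10(m))

def implied_indices(m):
    v = taylor_variance(m)
    dispersion = v / m                  # index of dispersion I
    crowding = m + dispersion - 1       # Lloyd's mean crowding
    return {
        "I": dispersion,
        "mc": crowding,
        "patchiness": crowding / m,
        "cv": math.sqrt(v) / m,
    }

for m in (0.1, 1, 10, 100):             # illustrative grid of means
    print(m, {name: round(value, 2) for name, value in implied_indices(m).items()})
```

Because the slope lies between 1 and 2, the index of dispersion rises with the mean while the coefficient of variation falls, and the former changes by a larger factor over the same range of means.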
The properties of the indices discussed above are summarized in table 1.
Table 1.
Properties of the indices.
6. Discussion
We followed Poulin [11] in his concept of aggregation and his proposed method of measuring distance from a uniform distribution using a Lorenz curve. This is ideal because the full dataset is used, thus potentially leading to greater sensitivity when comparing samples. He proposed that the Gini index be used. We suggest that the Hoover index may be more useful, in part because it has a clear interpretation in terms of parasites and hosts.
Although separate from the issue of whether or not the Hoover index makes a good definition of aggregation, it is important to note that the index is relatively robust to the kind of measurement error we discuss in §3, provided the mean of the partial count is not small.
Some authors have suggested that the level of aggregation is characteristic of a parasite species [35,41]. This will depend on how the aggregation is measured. Authors may find it an attractive idea to have a value that is relatively independent of other parameters such as parasite abundance. Then samples or groups of samples could be ranked according to their distance from a uniform distribution. In general, indices based on Lorenz curves accomplish this, although the distances from a uniform distribution gradually decrease as the means increase (assuming Taylor’s law holds).
At very small means the distributions of most macroparasites have high values of aggregation, as measured by indices based on the Lorenz curve. For the Hoover index, this is consistent with the interpretation that the index is the proportion of parasites in the sample that need to be redistributed among the hosts in order to produce a uniform distribution. As the mean becomes very small the proportion that may need to be moved, e.g. one out of a total of two parasites, will become large.
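This interpretation can be checked with a short Python sketch. It assumes the usual form of the Hoover index, the sum of absolute deviations from the mean divided by twice the total number of parasites. The function name and the host counts are hypothetical.

```python
def hoover_index(counts):
    """Proportion of parasites that would have to move to equalize all hosts."""
    n, total = len(counts), sum(counts)
    mean = total / n
    return sum(abs(x - mean) for x in counts) / (2 * total)


# Two hosts sharing two parasites, both in one host: one of the two parasites
# must move to reach a uniform distribution.
print(hoover_index([2, 0]))  # 0.5

# A larger hypothetical sample that is already uniform needs no redistribution.
print(hoover_index([5, 5, 5, 5]))  # 0.0
```

With only a handful of parasites in the sample, moving a single parasite already changes the index by a large step, in line with the high values seen at very small means.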
The alternative approach is the measurement of distance from a Poisson distribution, typically using means and variances or some derivative. Lloyd’s mean crowding index discussed above has the attractive property of having a clear interpretation in terms of parasites and hosts. Other methods, generally having little interpretative value, include Morisita’s index of dispersion [47], which is related to Lloyd’s patchiness, and a constraint-based approach [48]. They are not explored in this paper. We do caution, however, that although the interconnections between various indices may be helpful for interpreting reported results when original data are not available, this should not be taken as a reason for their use.
In deciding which method may be the most useful in the future, we follow the view of Reiczigel et al. [7], which favours an index that has a clear and distinct biological interpretation. The Hoover index can be interpreted as the proportion of parasites in the sample that need to be redistributed among the hosts in order to produce a uniform distribution. This is less clear with other indices, including the other recommended index, the coefficient of variation. However, the Hoover index and the coefficient of variation both follow Lorenz ordering and both have the advantage of simplicity of calculation (see electronic supplementary material for equations).
The major question we raise here is: what do authors mean by aggregation? Until we have some consensus we may not be able to investigate this basic property of macroparasite distributions further. Careful definition will help to describe parasite population biology and in turn lead to identification of the sources of the aggregation. This may reveal host or parasite responses that are critical to maintaining the parasite population; for example, diplectanids appeared to be far more aggregated than can be accounted for with the current knowledge about their life cycles [45]. Indices with a clear interpretation such as the Hoover index may clarify this area of parasitology.
An improved understanding of aggregation, its importance and its origin could be obtained by investigating the degree of aggregatedness with respect to a null distribution. Taking the uniform distribution as the null model we can say that under the null model (i) all hosts face the same mortality risk, (ii) all hosts have the same stimulation of their immune system, (iii) all parasites face the same level of interspecific competition, (iv) all parasites have the same access to mates, and so forth. Perhaps there is no simple generating process which will result in a uniform distribution of parasites, but a theoretical investigation of the causes of aggregation requires some sort of generating process.
Supplementary Material
Acknowledgements
The authors thank Drs S. Blomberg, B.R. Moore, R.D. Adlard and two anonymous reviewers for their helpful comments on an earlier draft of the manuscript.
Data accessibility
Data for figure 5 are provided as electronic supplementary material.
Authors' contributions
R.J.G.L. initiated and guided the study and drafted the manuscript. R.M. analysed the indices and helped draft the manuscript. Both authors gave final approval for publication.
Funding
R.M. is supported in part by the Australian Research Council (Discovery grant no. DP150101459 and the ARC Centre of Excellence for Mathematical and Statistical Frontiers, CE140100049).
References
- 1. Weinstein SB, Kuris AM. 2016. Independent origins of parasitism in Animalia. Biol. Lett. 12, 20160324. (doi:10.1098/rsbl.2016.0324)
- 2. Goater TM, Goater CP, Esch GW. 2014. Parasitism: the diversity and ecology of animal parasites, 2nd edn. Cambridge, UK: Cambridge University Press.
- 3. Poulin R. 2007. Evolutionary ecology of parasites, 2nd edn. Princeton, NJ: Princeton University Press.
- 4. Johnson PTJ, Hoverman JT. 2014. Heterogeneous hosts: how variation in host size, behaviour and immunity affects parasite aggregation. J. Anim. Ecol. 83, 1103–1112. (doi:10.1111/1365-2656.12215)
- 5. Pielou EC. 1977. Mathematical ecology. New York, NY: Wiley.
- 6. Wilson K, Bjørnstad ON, Dobson AP, Merler S, Poglayen G, Randolph SE, Read AF, Skorping A. 2001. Heterogeneities in macroparasite infections: patterns and processes. In The ecology of wildlife diseases (eds PJ Hudson, A Rizzoli, BT Grenfell, H Heesterbeek, AP Dobson). Oxford, UK: Oxford University Press.
- 7. Reiczigel J, Marozzi M, Fábián I, Rózsa L. 2019. Biostatistics for parasitologists—a primer to quantitative parasitology. Trends Parasitol. 35, 277–281. (doi:10.1016/j.pt.2019.01.003)
- 8. Kanarek G, Zaleśny G. 2014. Extrinsic- and intrinsic-dependent variation in component communities and patterns of aggregations in helminth parasites of great cormorant (Phalacrocorax carbo) from N.E. Poland. Parasitol. Res. 113, 837–850. (doi:10.1007/s00436-013-3714-7)
- 9. Sarabeev V, Balbuena JA, Morand S. 2019. Aggregation patterns of helminth populations in the introduced fish, Liza haematocheilus (Teleostei: Mugilidae): disentangling host–parasite relationships. Int. J. Parasitol. 49, 83–91. (doi:10.1016/j.ijpara.2018.10.004)
- 10. Sherrard-Smith E, Perkins SE, Chadwick EA, Cable J. 2015. Spatial and seasonal factors are key determinants in the aggregation of helminths in their definitive hosts: Pseudamphistomum truncatum in otters (Lutra lutra). Int. J. Parasitol. 45, 75–83. (doi:10.1016/j.ijpara.2014.09.004)
- 11. Poulin R. 1993. The disparity between observed and uniform distributions: a new look at parasite aggregation. Int. J. Parasitol. 23, 931–944. (doi:10.1016/0020-7519(93)90060-C)
- 12. Arnold BC. 1987. Majorization and the Lorenz order: a brief introduction. Lecture Notes in Statistics, vol. 43. New York, NY: Springer.
- 13. Chakravarty SR. 1999. Measuring inequality: the axiomatic approach. In Handbook of income inequality measurement (ed. J Silber). Recent Economic Thought Series, vol. 71, pp. 163–184. Dordrecht, The Netherlands: Springer.
- 14. Efron B, Tibshirani RJ. 1994. An introduction to the bootstrap. London, UK: Chapman & Hall.
- 15. Brown CR, Brown MB. 1996. Coloniality in the cliff swallow. Chicago, IL: University of Chicago Press.
- 16. Shorrocks AF. 1980. The class of additively decomposable inequality measures. Econometrica 48, 613–625. (doi:10.2307/1913126)
- 17. Shaw DJ, Grenfell BT, Dobson AP. 1998. Patterns of macroparasite aggregation in wildlife host populations. Parasitology 117, 597–610. (doi:10.1017/S0031182098003448)
- 18. Ebert U. 1988. On the decomposition of inequality: partitions into non-overlapping subgroups. In Measurement in economics (ed. W Eichhorn), pp. 399–412. New York, NY: Physica-Verlag.
- 19. Shaw DJ, Dobson AP. 1995. Patterns of macroparasite abundance and aggregation in wildlife populations: a quantitative review. Parasitology 111, S111–S133. (doi:10.1017/S0031182000075855)
- 20. Ramasubban TA. 1958. The mean difference and the mean deviation of some discontinuous distributions. Biometrika 45, 549–556. (doi:10.1093/biomet/45.3-4.549)
- 21. Morand S, Krasnov B. 2008. Why apply ecological laws to epidemiology? Trends Parasitol. 24, 304–309. (doi:10.1016/j.pt.2008.04.003)
- 22. Lester RJG, Thompson C, Moss H, Barker SC. 2001. Movement and stock structure of narrow-barred Spanish mackerel as indicated by parasites. J. Fish Biol. 59, 833–842. (doi:10.1111/j.1095-8649.2001.tb00154.x)
- 23. Hurlbert SH. 1990. Spatial distribution of the montane unicorn. Oikos 58, 257–271. (doi:10.2307/3545216)
- 24. Skirnisson K, Thorarinsdottir ST, Nielsen OK. 2012. The parasite fauna of rock ptarmigan (Lagopus muta) in Iceland: prevalence, intensity, and distribution within the host population. Comp. Parasitol. 79, 44–55. (doi:10.1654/4481.1)
- 25. Koop JAH, Clayton DH. 2013. Evaluation of two methods for quantifying passeriform lice. J. Field Ornithol. 84, 210–215. (doi:10.1111/jofo.12020)
- 26. Johnson NL. 1957. A note on the mean deviation of the binomial distribution. Biometrika 44, 532–533. (doi:10.1093/biomet/44.3-4.532)
- 27. Anderson RM, Gordon DM. 1982. Processes influencing the distribution of parasite numbers within host populations with special emphasis on parasite-induced host mortalities. Parasitology 85, 373–398. (doi:10.1017/S0031182000055347)
- 28. Isham V. 1995. Stochastic models of host-macroparasite interaction. Ann. Appl. Probab. 5, 720–740. (doi:10.1214/aoap/1177004702)
- 29. Gregory RD, Woolhouse MEJ. 1993. Quantification of parasite aggregation: a simulation study. Acta Trop. 54, 131–139. (doi:10.1016/0001-706X(93)90059-K)
- 30. Milne A. 1943. The comparison of sheep-tick populations (Ixodes ricinus L.). Ann. Appl. Biol. 30, 240–253. (doi:10.1111/j.1744-7348.1943.tb06195.x)
- 31. Crofton HD. 1971. A quantitative approach to parasitism. Parasitology 62, 179–193. (doi:10.1017/S0031182000071420)
- 32. Kent ML, Gaulke CA, Watral V, Sharpton TJ. 2018. Pseudocapillaria tomentosa in laboratory zebrafish Danio rerio: patterns of infection and dose response. Dis. Aquat. Organ. 131, 121–131. (doi:10.3354/dao03286)
- 33. Lloyd M. 1967. Mean crowding. J. Anim. Ecol. 36, 1–30. (doi:10.2307/3012)
- 34. Cohen JE, Meng X. 2015. Random sampling of skewed distributions implies Taylor’s power law of fluctuation scaling. Proc. Natl Acad. Sci. USA 112, 7749–7754. (doi:10.1073/pnas.1503824112)
- 35. Taylor LR. 1961. Aggregation, variance and the mean. Nature 189, 732–735. (doi:10.1038/189732a0)
- 36. Boag B, Tompkins DM, Hudson PJ. 2001. Patterns of parasite aggregation in the wild European rabbit (Oryctolagus cuniculus). Int. J. Parasitol. 31, 1421–1428. (doi:10.1016/S0020-7519(01)00270-3)
- 37. Gethings OJ, Sage RB, Leather SR. 2015. Spatial distribution of infectious stages of the nematode Syngamus trachea within pheasant (Phasianus colchicus) release pens on estates in the south west of England: potential density dependence? Vet. Parasitol. 212, 267–274. (doi:10.1016/j.vetpar.2015.07.016)
- 38. Johnson PTJ, Wilber MQ. 2017. Biological and statistical processes jointly drive population aggregation: using host–parasite interactions to understand Taylor’s power law. Proc. R. Soc. B 284, 20171388. (doi:10.1098/rspb.2017.1388)
- 39. Koprivnikar J, Riepe TB, Calhoun DM, Johnson PTJ. 2018. Whether larval amphibians school does not affect the parasite aggregation rule: testing the effects of host spatial heterogeneity in field and experimental studies. Oikos 127, 99–110. (doi:10.1111/oik.04249)
- 40. Mestre A, Monros JS, Mesquita-Joanes F. 2014. The influence of environmental factors on abundance and prevalence of a commensal ostracod hosted by an invasive crayfish: are ‘parasite rules’ relevant to non-parasitic symbionts? Freshwater Biol. 59, 2107–2121. (doi:10.1111/fwb.12412)
- 41. Pérez-del-Olmo A, Morand S, Raga JA, Kostadinova A. 2011. Abundance-variance and abundance-occupancy relationships in a marine host–parasite system: the importance of taxonomy and ecology of transmission. Int. J. Parasitol. 41, 1361–1370. (doi:10.1016/j.ijpara.2011.09.003)
- 42. Sarabeev V, Balbuena JA, Morand S. 2017. The effects of host introduction on the relationships between species richness and aggregation in helminth communities of two species of grey mullets (Teleostei: Mugilidae). Vie et Milieu - Life Environ. 67, 121–130. (doi:10.1016/j.parint.2015.01.001)
- 43. Shvydka S, Sarabeev V, Estruch VD, Cadarso-Suárez C. 2018. Optimum sample size to estimate mean parasite abundance in fish parasite surveys. Helminthologia 55, 52–59. (doi:10.1515/helm-2017-0054)
- 44. Matthee S, Krasnov BR. 2009. Searching for generality in the patterns of parasite abundance and distribution: ectoparasites of a South African rodent, Rhabdomys pumilio. Int. J. Parasitol. 39, 781–788. (doi:10.1016/j.ijpara.2008.12.003)
- 45. Lester RJG, McVinish R. 2016. Does moving up a food chain increase aggregation in parasites? J. R. Soc. Interface 13, 20160102. (doi:10.1098/rsif.2016.0102)
- 46. Lester RJG. 2012. Overdispersion in marine fish parasites. J. Parasitol. 98, 718–721. (doi:10.1645/GE-3017.1)
- 47. Fong CR. 2016. High density and strong aggregation do not increase prevalence of the isopod Hemioniscus balani (Buchholz, 1866), a parasite of the acorn barnacle Chthamalus fissus (Darwin, 1854) in California. J. Crustacean Biol. 36, 46–49. (doi:10.1163/1937240X-00002398)
- 48. Wilber MQ, Johnson PTJ, Briggs CJ. 2017. When can we infer mechanism from parasite aggregation? A constraint-based approach to disease ecology. Ecology 98, 688–702. (doi:10.1002/ecy.1675)





