Evolutionary Applications. 2008 Mar 18;1(3):462–474. doi: 10.1111/j.1752-4571.2008.00015.x

Estimation of effective number of breeders from molecular coancestry of single cohort sample

Tetsuro Nomura 1
PMCID: PMC3352377  PMID: 25567728

Abstract

The effective population size, Ne, is an important parameter in population genetics and conservation biology. It is, however, difficult to directly estimate Ne from demographic data in many wild species. Alternatively, the use of genetic data has received much attention in recent years. In the present study, I propose a new method for estimating the effective number of breeders Neb from a parameter of allele sharing (molecular coancestry) among sampled progeny. The bias and confidence interval of the new estimator are compared with those from a published method, i.e. the heterozygote-excess method, using computer simulation. Two population models are simulated: the noninbred population, which consists of noninbred and nonrelated parents, and the inbred population, which is composed of inbred and related parents. Both methods give essentially unbiased estimates of Neb when applied to the noninbred population. In the inbred population, the proposed method gives a downward biased estimate, but the confidence interval is remarkably narrower than that in the noninbred population. The estimate from the heterozygote-excess method is nearly unbiased in the inbred population, but suffers from a larger confidence interval. By combining the estimates from the two methods as a harmonic mean, the reliability is remarkably improved.

Keywords: effective number of breeders, effective population size, genetic estimate, molecular coancestry, single cohort sample

Introduction

The effective population size, Ne, is one of the most important parameters in population genetics and conservation biology, because this parameter determines both the amount of genetic drift and the rate of inbreeding (Crow and Kimura 1970; Falconer and Mackay 1996). Ne can be estimated from demographic data such as the number of parents and the variance in their progeny number (Caballero 1994). However, the demographic data needed to estimate Ne are often unavailable for wild species. As an alternative to estimating Ne from demographic data, methods for estimating Ne from genetic data have been developed (for reviews, see Waples 1991; Schwartz et al. 1999; Beaumont 2003; Leberg 2005; Wang 2005). These methods differ in the time scale on which Ne is measured. Some of them infer the long-term Ne in the past on an evolutionary time scale, and others estimate the current or short-term Ne (Waples 1991; Wang 2005). For solving practical issues such as managing a small population of an endangered species, an accurate estimate of the current or short-term Ne is of special importance, and such estimation is the major concern of this study.

To date, three methods are available for this purpose: the temporal method (Nei and Tajima 1981; Pollak 1983; Waples 1989), the linkage disequilibrium method (Hill 1981) and the heterozygote-excess method (Pudovkin et al. 1996; Luikart and Cornuet 1999). These methods actually assess the effective number of breeders (Neb) of a cohort from which a sample is obtained. If the sample consists of reproductive adults, Neb is nearly equivalent to Ne in populations with nonoverlapping generations (Schwartz et al. 1999; and as will be discussed later). Ne can be estimated from Neb in populations with overlapping generations, if the age structure is known (Waples 1991).

The logic behind the temporal method is that the change of allele frequency in samples separated in time is a reflection of genetic drift. This method is the most tested of the genetic Neb estimators and has been used to estimate Neb of various species (Schwartz et al. 1999). The primary weakness of this method is that two or more samples separated in time are necessary (Schwartz et al. 1999). This can be expensive and, by nature, time-consuming. The linkage disequilibrium method is based on the fact that genetic drift generates nonrandom association among alleles at different loci. Despite the obvious advantage that this method can be used to estimate Neb from a single cohort sample, there are several drawbacks (Schwartz et al. 1999; Wang 2005). Perhaps the most critical one is that the estimator assumes an isolated equilibrium population with a constant effective size, which may not be tenable for natural populations of endangered species. The heterozygote-excess method is based on the fact that when the breeding population is small, binomial sampling error produces allele frequency differences between male and female breeders, resulting in an excess of heterozygotes in their progeny (Robertson 1965). As in the linkage disequilibrium method, this method has the advantage that only a single cohort sample is required. Further, this method is appealing because the estimate is easily computed. However, there are few applications of this method, presumably because of its low precision, as empirically shown by Luikart and Cornuet (1999).

Several authors (Waples 1991; Pudovkin et al. 1996; Luikart and Cornuet 1999) emphasized the importance of exploring a method that gives an estimate independent of those from existing methods, because combining several independent estimates is expected to give higher precision than any of the separate estimates. In the present study, a novel method for estimating Neb from genetic data of a single cohort sample is proposed. The estimator is obtained from a simple parameter (molecular coancestry) of allele sharing among sampled individuals. Reliability of the new estimator is compared with that from the heterozygote-excess method using computer simulation. Improvement of the reliability attained by combining the two methods is also examined.

Methods

Estimation of Neb from parent-based coancestry

Although a monoecious diploid population is assumed throughout the following derivation, the extension to dioecious diploid species is straightforward and the same estimation method is applicable to the population.

Let ft be the coancestry among two randomly sampled individuals in generation t, and P be the probability that two randomly sampled alleles each from different individuals in generation t come from the same individual in generation t − 1. The recurrence equation for the coancestry is given by

$$f_t = \frac{1 + F_{t-1}}{2} P + (1 - P) f_{t-1} \tag{1}$$

(Crow and Kimura 1970, p. 102), where Ft−1 is the inbreeding coefficient of individuals in generation t − 1. Following the definition by Crow and Kimura (1970, p. 347), we define the effective number of breeders (Neb), or strictly the inbreeding effective number, as

$$N_{eb} = \frac{1}{P} \tag{2}$$

We set the base population of ft at the population of generation t − 1 by assuming Ft−1 = ft−1 = 0. Putting t − 1 = 0 in (1), we obtain from (1) and (2), f1 = P/2 = 1/(2Neb) and

$$N_{eb} = \frac{1}{2 f_1} \tag{3}$$

This means that an estimate of Neb can be obtained if the parent-based coancestry (f1) among individuals in one cohort is estimated.
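As a minimal sketch of this conversion, assuming the relation Neb = 1/(2 f1) that follows from setting F0 = f0 = 0 in the recurrence, the function below maps a parent-based coancestry to an effective number of breeders. The function name is hypothetical, and reporting a non-positive coancestry as infinity is an assumption modeled on the treatment of negative estimates in the simulation section.

```python
import math


def neb_from_coancestry(f1: float) -> float:
    """Convert a parent-based coancestry f1 into Neb = 1 / (2 * f1).

    Assumption: a zero or negative f1 (possible for an estimate) is
    reported as an infinite Neb rather than a negative number.
    """
    if f1 <= 0:
        return math.inf
    return 1.0 / (2.0 * f1)


if __name__ == "__main__":
    # A coancestry of 0.05 corresponds to 10 effective breeders.
    print(neb_from_coancestry(0.05))
```

The smaller the coancestry among cohort members, the larger the implied number of breeders, so sampling error in a small f1 translates into a large error in Neb.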

Estimation of parent-based coancestry

Molecular coancestry

For locus l, molecular coancestry fM,xy,l (frequently called ‘molecular similarity index’) between individual x having alleles a and b and individual y having alleles c and d is defined as (Malécot 1948)

$$f_{M,xy,l} = \frac{1}{4}\left(I_{ac} + I_{ad} + I_{bc} + I_{bd}\right) \tag{4}$$

where indicator Iac is one when allele a of individual x is identical to allele c of individual y, and zero otherwise, etc. When there are L marker loci, molecular coancestry fM,xy is the average molecular coancestry over all loci (Toro et al. 2002, 2003):

$$f_{M,xy} = \frac{1}{L} \sum_{l=1}^{L} f_{M,xy,l}$$
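The definition above (the proportion of the four cross-individual allele comparisons that match, averaged over loci) can be sketched as follows. The genotype representation, a pair of allele labels per locus, and the function names are illustrative assumptions rather than anything specified in the text.

```python
from typing import Sequence, Tuple

# One diploid genotype at one locus: two allele labels.
Genotype = Tuple[str, str]


def locus_coancestry(x: Genotype, y: Genotype) -> float:
    """Molecular coancestry at one locus: mean of the four indicators
    I_ac, I_ad, I_bc, I_bd comparing alleles (a, b) of x with (c, d) of y."""
    a, b = x
    c, d = y
    return ((a == c) + (a == d) + (b == c) + (b == d)) / 4.0


def molecular_coancestry(x: Sequence[Genotype], y: Sequence[Genotype]) -> float:
    """Average of the per-locus molecular coancestry over all L loci."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("both individuals need the same non-empty set of loci")
    return sum(locus_coancestry(gx, gy) for gx, gy in zip(x, y)) / len(x)


if __name__ == "__main__":
    ind_x = [("A", "B"), ("A", "A")]
    ind_y = [("A", "B"), ("B", "B")]
    print(molecular_coancestry(ind_x, ind_y))
```

Two identical heterozygotes share two of the four comparisons, so their per-locus value is 0.5, not 1; only identical homozygotes reach 1.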

Molecular coancestry arises not only from alleles that are identical by descent but also from alleles that are alike in state (AIS). Molecular coancestry is, therefore, an upward biased estimator of the coancestry relative to an arbitrary base population. When sl denotes the probability that two alleles at locus l are AIS in the base population, the expected molecular coancestry between individuals x and y at locus l is (Oliehoek et al. 2006)

$$E\left[f_{M,xy,l}\right] = f_{xy} + (1 - f_{xy}) s_l \tag{5}$$

where fxy is the coancestry between individuals x and y expressed relative to the base population.
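To illustrate how the AIS probability inflates molecular coancestry, the sketch below assumes the relation E[fM,xy,l] = fxy + (1 − fxy) sl: two alleles match either because they are identical by descent or, failing that, because they are alike in state. The second function inverts this relation to recover fxy once sl is known. Both function names are hypothetical.

```python
def expected_molecular_coancestry(f_xy: float, s_l: float) -> float:
    """Expected molecular coancestry at a locus, given the true
    coancestry f_xy and the AIS probability s_l in the base population."""
    return f_xy + (1.0 - f_xy) * s_l


def coancestry_from_molecular(f_m: float, s_l: float) -> float:
    """Invert the relation above: f_xy = (f_M - s_l) / (1 - s_l)."""
    if not 0.0 <= s_l < 1.0:
        raise ValueError("s_l must lie in [0, 1)")
    return (f_m - s_l) / (1.0 - s_l)


if __name__ == "__main__":
    # Half-sibs (f_xy = 0.125) at a locus with s_l = 0.3.
    f_m = expected_molecular_coancestry(0.125, 0.3)
    print(f_m, coancestry_from_molecular(f_m, 0.3))
```

An underestimated sl makes every corrected coancestry too large, which is the mechanism behind the downward bias in Neb discussed later for inbred populations.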

Equation (5) shows that a value for sl is needed for each locus to obtain fxy. If allele frequencies in the base population are known without errors, sl is computed as $s_l = \sum_{i=1}^{n_l} p_i^2$, where nl is the number of alleles in locus l and pi the frequency of the ith allele in locus l in the base population. Because allele frequencies in the base population are, however, usually unknown, sl needs to be estimated. A similar problem arises in estimating any relatedness from molecular markers. In most of the published works (e.g. Ritland 1996; Lynch and Ritland 1999), allele frequencies have been estimated from the current population for which relatedness is estimated, meaning that the base population is set equal to the current population. For our purpose, this approximation leads to an apparent contradiction, because it implicitly assumes no drift in allele frequencies between parent and progeny generations (i.e. Neb = ∞).
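When base-population allele frequencies are known, the AIS probability is the chance that two alleles drawn at random are of the same type, that is, the sum of squared allele frequencies. A minimal sketch, with a hypothetical function name and an assumed tolerance for checking that the frequencies sum to one:

```python
from typing import Sequence


def ais_probability(freqs: Sequence[float]) -> float:
    """Probability that two random alleles at a locus are alike in state,
    computed as the sum of squared allele frequencies."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in freqs)


if __name__ == "__main__":
    print(ais_probability([0.5, 0.5]))   # two equally frequent alleles
    print(ais_probability([0.1] * 10))   # ten equally frequent alleles
```

For a fixed number of alleles, the value is smallest when frequencies are equal, so highly polymorphic loci with even frequencies carry the least AIS noise.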

Estimation of f1 from fM,xy

Irrespective of the upward bias, simulations suggest that molecular coancestry can be a good indicator of the coancestry relative to an arbitrary base population (e.g. Toro et al. 2003; Oliehoek et al. 2006). We take advantage of this property to convert the molecular coancestry to the parent-based coancestry (f1).

Suppose that n individuals are sampled from progeny in a given generation, for which f1 is estimated. We assume that the sample consists of at least two nonsib families. This assumption will be satisfied except for a population with an extremely small number of parents, such as a population with only one male parent in a polygynous species. Thus, for a given individual in the sample, at least one nonsib pair should be included among the possible n − 1 pairs with other sampled members. The underlying concept of our estimation is that the nonsib pairs can be inferred from molecular coancestry. Fernández and Toro (2006) showed that a sib-ship can be reconstructed from molecular coancestry with high accuracy, suggesting that the inference of nonsib pairs based on molecular coancestry has fairly high precision.

We assume that pairs inferred to be nonsibs (putative nonsibs) are true nonsibs (i.e. fxy = 0). Thus, substituting the average molecular coancestry ($\bar{f}_{M,l}$) for locus l over all pairs of putative nonsibs into (5) gives an estimate of sl:

$$\hat{s}_l = \bar{f}_{M,l} \tag{6}$$

With the weight wl to optimize the contributions of loci to the estimate of coancestry, suggested by Oliehoek et al. (2006), the parent-based coancestry between individuals x and y, f1,xy, is estimated as

graphic file with name eva0001-0462-mu5.jpg

where

graphic file with name eva0001-0462-mu6.jpg

(Oliehoek et al. 2006)

graphic file with name eva0001-0462-mu7.jpg

and $\hat{p}_i$ is the estimated frequency of allele i in locus l from the sampled individuals. Note that the weight wl puts more weight on loci with small sl and with many alleles at nearly equal frequency. The estimate of f1 is simply obtained by averaging $\hat{f}_{1,xy}$ over the $n(n-1)/2$ pairs:

$$\hat{f}_1 = \frac{2}{n(n-1)} \sum_{x<y} \hat{f}_{1,xy}$$

And from (3), Neb is estimated by

$$\hat{N}_{eb} = \frac{1}{2 \hat{f}_1} \tag{7}$$

Selection method for putative nonsib pairs

The simplest method for selecting putative nonsibs from all the possible pairs is to select a given number (n0) of pairs with the smallest molecular coancestry. However, this method leads to an underestimation of sl, because of the positive correlation between fM,xy and fM,xy,l due to the finite number of marker loci (L). For example, in an extreme case where only one marker locus is available (L = 1), the selection of the smallest fM,xy automatically results in the selection of pairs with the smallest fM,xy,l. When the number of selected pairs (n0) is much smaller than the number of the actually existing nonsib pairs, the average of fM,xy,l over the selected n0 pairs is expected to be lower than that of fM,xy,l over all the actually existing nonsib pairs, leading to an underestimation of sl [cf. equation (6)].

In a strictly statistical sense, the selection of putative nonsibs for the estimation of sl should be based on data independent of the sample from which sl is estimated. This problem could be largely solved by excluding the information on locus l in selecting putative nonsib pairs for the estimation of sl. Denoting the molecular coancestry between individuals x and y excluding the information on locus l by fM,xy,/l, we can compute it as

$$f_{M,xy,/l} = \frac{L f_{M,xy} - f_{M,xy,l}}{L - 1} \tag{8}$$

For estimating sl, the selection of n0 pairs with the smallest coancestry is based on this partial molecular coancestry.
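The partial molecular coancestry is simply the multilocus average with locus l left out. A small sketch, assuming the per-locus values for one pair are already available as a list (the function name and input layout are illustrative):

```python
from typing import Sequence


def partial_coancestry(per_locus: Sequence[float], l: int) -> float:
    """Molecular coancestry of one pair averaged over all loci except l.

    Equivalent to (L * f_M - f_M_l) / (L - 1), where f_M is the mean of
    per_locus and f_M_l = per_locus[l].
    """
    n_loci = len(per_locus)
    if n_loci < 2:
        raise ValueError("at least two loci are needed to exclude one")
    total = sum(per_locus)
    return (total - per_locus[l]) / (n_loci - 1)


if __name__ == "__main__":
    values = [0.5, 0.0, 1.0]  # per-locus coancestry for one pair
    print(partial_coancestry(values, 1))
```

Because locus l does not enter the ranking statistic, pairs chosen by it are not automatically those with the lowest allele sharing at locus l.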

In the present study, the following selection method was applied: (i) Assign sequential numbers (i = 1, 2, …, n) to the n sampled individuals. (ii) For the first individual (i = 1), select the pair with the smallest fM,xy,/l [computed from (8)] from the n − 1 pairs with other members. (iii) For each subsequent individual (i ≥ 2), select the pair with the smallest fM,xy,/l in the same manner, but exclude from the n − 1 candidate pairs any pair already selected in a previous step, to avoid selecting the same pair twice. (iv) As a result, n0 (= n) pairs with the smallest fM,xy,/l are obtained. (v) Average fM,xy,l [computed from (4)] over the n0 pairs; the average ($\bar{f}_{M,l}$) is the estimate of sl [cf. equation (6)]. (vi) Repeat steps (ii)–(v) until estimates of sl are obtained for all marker loci.
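The selection steps above can be sketched for a single locus as follows. This is an illustration under stated assumptions, not the author's implementation: the input layout (a list of symmetric n × n matrices of per-locus molecular coancestry), the function name, the tie-breaking rule (first minimum wins) and the handling of an individual whose candidate pairs have all been taken (it is skipped) are choices made here.

```python
import math
from typing import List, Set, Tuple

Matrix = List[List[float]]


def estimate_ais_probability(f_locus: List[Matrix], l: int) -> float:
    """Estimate s_l for locus l from putative nonsib pairs.

    f_locus[k][x][y] is the molecular coancestry between individuals
    x and y at locus k. Pairs are ranked by the coancestry computed
    without locus l; one pair is selected per individual.
    """
    n_loci = len(f_locus)
    n = len(f_locus[l])
    if n_loci < 2 or n < 2:
        raise ValueError("need at least two loci and two individuals")

    selected: Set[Tuple[int, int]] = set()
    for i in range(n):
        best_pair, best_val = None, math.inf
        for j in range(n):
            if j == i:
                continue
            pair = (min(i, j), max(i, j))
            if pair in selected:
                continue  # never select the same pair twice
            # Partial coancestry: average over all loci except l.
            partial = sum(f_locus[k][i][j] for k in range(n_loci) if k != l)
            partial /= n_loci - 1
            if partial < best_val:
                best_pair, best_val = pair, partial
        if best_pair is not None:  # skipped if every candidate was taken
            selected.add(best_pair)

    # Average the coancestry at locus l itself over the selected pairs.
    return sum(f_locus[l][x][y] for x, y in selected) / len(selected)
```

Calling the function once per locus, with l running over all marker loci, corresponds to step (vi).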

Computer simulation

Computer simulation was carried out to evaluate the reliability of the presented method. Genotypes of individuals in the initial population were generated by assigning alleles randomly sampled from an infinite (conceptual) gene pool with a uniform allele frequency distribution with two alleles for the ‘low-polymorphic’ marker loci case or 10 alleles for the ‘high-polymorphic’ marker loci case. The number of loci was 80 for both polymorphic cases. Prior to progeny sampling for the estimation of Neb, eight generations of random mating with a breeding system defined below were simulated to accumulate inbreeding and relationship. As the breeding system, monogamy and polygyny were modeled. Under the monogamy model, an equal number of male and female parents (N/2) were randomly paired to form N/2 permanent couples. Progeny (parents of the next generation) were produced from a randomly sampled couple, and the sampling of a couple and the reproduction were repeated until N/2 replacements of each sex had been obtained. Under the polygyny model, Nm males and Nf (>Nm) females were generated, and each female was mated with a randomly sampled male (thus, there are Nf fixed matings). Progeny were produced from a randomly sampled mating, and this was repeated to obtain Nm males and Nf females as the parents of the next generation. In the final generation, a sample of n progeny was obtained in the same manner of reproduction as in the respective breeding system. From the loci each with at least two segregating alleles in the sampled progeny, L = 5–30 loci were randomly chosen as marker loci. For the standard parental population size, N = 10 in monogamy, and Nm = 5 males and Nf = 20 females in polygyny were used. Sample size of progeny (n) in the final generation was 100 for the two breeding systems.

In the low-polymorphic marker loci case, all the marker loci have exactly two alleles (nl = 2) as in single nucleotide polymorphisms, but the allele frequency distribution varies among the loci. In the high-polymorphic marker loci case, not only the allele frequency distribution but also the number of alleles varies among the loci. With the above standard population size, the average number of alleles per marker locus was 3.83 in monogamy, and 5.31 in polygyny, which would be comparable with the allele number of microsatellite markers in a practical survey. This type of data generation is referred to as the ‘inbred population’ model, in the sense that the parental population of sampled progeny consists of inbred and related individuals, which will be a general situation in populations of endangered species.
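The reproduction step under monogamy described above (draw a couple at random, then form one offspring) can be sketched as below. This is a simplified illustration: the genotype layout, the function name and the assumption that loci segregate independently (no linkage) are not stated in the text, and sexes are not tracked.

```python
import random
from typing import List, Tuple

Genotype = Tuple[str, str]       # two alleles at one locus
Individual = List[Genotype]      # one genotype per locus
Couple = Tuple[Individual, Individual]


def sample_progeny_monogamy(
    couples: List[Couple], n: int, rng: random.Random
) -> List[Individual]:
    """Sample n progeny; each comes from a couple drawn at random with
    replacement and receives one allele per parent at every locus."""
    progeny = []
    for _ in range(n):
        father, mother = rng.choice(couples)
        child = [
            # Assumption: loci are unlinked, so each locus is sampled
            # independently of the others.
            (rng.choice(f_geno), rng.choice(m_geno))
            for f_geno, m_geno in zip(father, mother)
        ]
        progeny.append(child)
    return progeny


if __name__ == "__main__":
    rng = random.Random(2008)
    father = [("A", "B"), ("C", "C")]
    mother = [("D", "E"), ("F", "G")]
    print(sample_progeny_monogamy([(father, mother)], 3, rng))
```

Drawing couples with replacement produces variation in family size, which is what makes the sampled cohort family-structured.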

As another type of data generation, the ‘noninbred population’ model was also simulated. The assignment of initial genotypes and the advancement of generations were exactly the same as in the inbred population, except that the number of generations advanced was seven. At the final generation, the allele frequency distribution of each locus was recorded. Then, genotypes of parents were regenerated by assigning alleles randomly sampled from an infinite gene pool with the recorded allele frequency distribution. The sampling of progeny and the choice of marker loci were the same as in the inbred population. These procedures produce a parental population consisting of noninbred and nonrelated individuals but having the same quality of molecular information as in the corresponding inbred population. This type of data generation could be an approximation of a recently recolonized population in an ephemeral habitat.

In additional computations, different sizes of parental population and progeny sample were examined. The effect of unequal contribution of parents on the estimates was also evaluated under monogamy with N = 10, by considering the following two patterns of unequal contributions of N/2 = 5 couples: (0.4, 0.3, 0.1, 0.1, 0.1) and (0.6, 0.1, 0.1, 0.1, 0.1). The number of replicated runs for each combination of population model, breeding system and variables was 5000.

Demographic effective number of breeders (Neb,demo) under monogamy model was computed from the standard formula of the inbreeding effective size (Caballero 1994):

graphic file with name eva0001-0462-m9.jpg 9

where Inline graphic and Inline graphic are the mean and variance of the number of progeny of couples, respectively. The expression of Inline graphic under the simulated condition is given in Appendix A. Neb,demo under polygyny is computed as

graphic file with name eva0001-0462-m10.jpg 10

The derivation of this equation is shown in Appendix B. Neb from pedigree coancestry was also computed, simply by substituting the average parent-based pedigree coancestry of sampled progeny into (7). The computed Neb agreed well with Neb,demo. Thus, only the value of Neb,demo is presented in the results, and it is referred to as the true value of the simulation. In addition to the estimate (denoted as Inline graphic hereafter) obtained from (7), the estimate from the heterozygote-excess method (Inline graphic; Pudovkin et al. 1996) was computed for comparison. The locus-specific Inline graphic is estimated as

graphic file with name eva0001-0462-mu19.jpg

where

graphic file with name eva0001-0462-mu20.jpg

and Hobs,i and Hexp,i are the observed and expected proportion of heterozygotes having allele i, respectively. The multilocus estimate was simply computed as the harmonic mean of the locus-specific estimates over the marker loci, following the previous simulation studies (Pudovkin et al. 1996; Luikart and Cornuet 1999). In both methods, when a negative estimate was obtained, the estimate was regarded as infinite ($\hat{N}_{eb} = \infty$).
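The harmonic mean is used both to pool locus-specific estimates and to combine the estimates from the two methods. A minimal sketch follows; the function name is hypothetical, and letting an infinite estimate contribute a reciprocal of zero is an assumption about how infinite values enter the mean (the text states only that negative estimates are regarded as infinite).

```python
import math
from typing import Iterable


def harmonic_mean_neb(estimates: Iterable[float]) -> float:
    """Harmonic mean of Neb estimates.

    Negative estimates are first replaced by infinity, as in the text.
    Assumption: an infinite estimate contributes 1/inf = 0 to the sum
    of reciprocals.
    """
    values = [math.inf if e < 0 else e for e in estimates]
    if not values:
        raise ValueError("no estimates given")
    if any(v == 0 for v in values):
        raise ValueError("a zero estimate has no reciprocal")
    reciprocal_sum = sum(0.0 if math.isinf(v) else 1.0 / v for v in values)
    if reciprocal_sum == 0.0:
        return math.inf  # every estimate was infinite
    return len(values) / reciprocal_sum


if __name__ == "__main__":
    print(harmonic_mean_neb([8.0, -3.0]))  # the negative value acts as infinity
```

Under this convention a single finite estimate keeps the combined value finite, which is one reason a harmonic mean is convenient for estimators that frequently return infinity.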

As a criterion of evaluation, the harmonic mean of estimates over 5000 replicates was computed. Furthermore, to characterize the variation and distribution of estimates, 10th, 50th and 90th percentiles in replicates were calculated. The xth percentile was obtained as the 5000 × (x/100)th smallest estimate in 5000 replicated estimates.
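The percentile rule stated above (the 5000 × (x/100)th smallest of 5000 estimates) can be sketched as follows. Rounding the rank up when it is not a whole number is an assumption added here for replicate counts other than 5000; the function name is hypothetical.

```python
import math
from typing import Sequence


def rank_percentile(estimates: Sequence[float], x: float) -> float:
    """Return the (R * x / 100)th smallest of R estimates.

    Infinite estimates sort last, so a high percentile can be infinite.
    Assumption: a fractional rank is rounded up, and the rank is at least 1.
    """
    if not estimates:
        raise ValueError("no estimates given")
    ordered = sorted(estimates)
    rank = max(1, math.ceil(len(ordered) * x / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    sample = [12.0, 9.5, math.inf, 10.2, 8.1, 15.0, 11.3, 9.9, 10.8, 30.0]
    print(rank_percentile(sample, 10), rank_percentile(sample, 50),
          rank_percentile(sample, 90))
```

Working with ranks rather than interpolated quantiles keeps the percentiles well defined even when some replicates are infinite.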

Results and discussion

The left and middle panels in Fig. 1 (A: monogamy and B: polygyny) illustrate the 10th, 50th and 90th percentiles, and the harmonic mean of 5000 replicated estimates of the effective number of breeders (Neb) from the heterozygote-excess and molecular coancestry methods applied to the noninbred population with L = 5–20 high-polymorphic marker loci. The three percentiles indicate that the distributions of estimates from both methods are skewed upward. The 50th percentile and harmonic mean were, however, close to Neb,demo (10 for monogamy and 13.79 for polygyny) in both methods. Under monogamy, the interval between 10th and 90th percentiles in Inline graphic tended to be wider than that in Inline graphic, whereas the reverse tendency was observed under polygyny.

Figure 1.

Harmonic mean (marked by open circle), and 10th, 50th and 90th percentiles (marked by bar) of 5000 estimated effective numbers of breeders in the noninbred population under (A) monogamy with N = 10 (half of each sex) parents and (B) polygyny with Nm = 5 male and Nf = 20 female parents, for the case of high-polymorphic marker loci. The sample size of progeny is n = 100. Inline graphic is the estimate from heterozygote-excess method (Pudovkin et al. 1996), Inline graphic the estimate from equation (7) and Inline graphic the estimate by the harmonic mean of Inline graphic and Inline graphic. The value in top of each graph is the clipped 90th percentile, and the value in parentheses is the percentage of replicates with Inline graphic. The dashed line shows the effective number of breeders expected from demographic parameters (Neb,demo = 10 under monogamy and 13.79 under polygyny, respectively).

The corresponding simulation results in the inbred population are shown in Fig. 2. Although the 50th percentile and harmonic mean show that the heterozygote-excess method gives an essentially unbiased estimate of Neb, the estimate from the molecular coancestry method tends to be biased downward. The degree of bias became larger as the number of marker loci increased. Inbreeding and relationship in the parental population had quite different impacts on the confidence intervals of the two methods. The interval between the 10th and 90th percentiles of the heterozygote-excess estimate was wider in the inbred population than in the noninbred population (Fig. 1). The increase in the confidence interval was more remarkable under monogamy. In fact, the 90th percentile under monogamy was infinite even with L = 20 marker loci. In contrast, the interval of the molecular coancestry estimate was remarkably narrower in the inbred population. For example, the 10th and 90th percentiles of the molecular coancestry estimate under monogamy with L = 20 marker loci were 3.75 and 12.93, respectively.

Figure 2.

Harmonic mean (marked by open circle), and 10th, 50th and 90th percentiles (marked by bar) of 5000 estimated effective numbers of breeders in the inbred population under (A) monogamy with N = 10 (half of each sex) parents and (B) polygyny with Nm = 5 male and Nf = 20 female parents, for the case of high-polymorphic marker loci. The sample size of progeny is n = 100. Inline graphic is the estimate from heterozygote-excess method (Pudovkin et al. 1996), Inline graphic the estimate from equation (7) and Inline graphic the estimate by harmonic mean of Inline graphic and Inline graphic. The value in top of each graph is the clipped 90th percentile, and the value in parentheses is the percentage of replicates with Inline graphic. The dashed line shows the effective number of breeders expected from demographic parameters (Neb,demo = 10 under monogamy and 13.79 under polygyny, respectively).

In a strict sense, the heterozygote-excess method is valid only when the progeny are produced by random union of gametes (Pudovkin et al. 1996; Luikart and Cornuet 1999). When the progeny are produced by individual-based pairwise matings such as monogamy and polygyny, the sample of progeny is family-structured. In such a sample, heterozygote deficiency generated by the interfamily Wahlund effect may mask the heterozygote excess, reducing the usefulness of the heterozygote-excess method (Luikart and Cornuet 1999). Using computer simulation, Luikart and Cornuet (1999) examined the effect of a family-structured sample on the reliability of the heterozygote-excess method. They found that the heterozygote-excess method gives an essentially unbiased estimate even with a family-structured sample. However, the existence of family structure in sampled progeny substantially increased the variance of estimates under monogamy. The simulation data of Luikart and Cornuet (1999) were generated in the same manner as the noninbred population of the present study. Thus, their sample of progeny contains only sib families. On the other hand, the sample of progeny from the inbred population consists of families with various degrees of relationship (e.g. cousins). The increased confidence interval observed in Fig. 2 indicates that the application of the heterozygote-excess method to such a sample reduces the reliability, although the method still gives an unbiased estimate. The reduction of reliability will be more serious under monogamy (Fig. 2).

As detailed information on the estimation process in the molecular coancestry method, Table 1 gives the observed and estimated [from equation (6)] AIS probability (sl) in the parental population, and the average estimated parent-based coancestry among actual nonsibs (NS), actual half-sibs (HS), actual full-sibs (FS) and all pairs of sampled progeny, for the case of monogamy and polygyny with L = 15 high-polymorphic marker loci. All the values are shown as the average over 5000 replicates (and over 15 marker loci for sl). In the noninbred population, the estimated AIS probability was close to the observed value, giving average estimates of the parent-based coancestries in the three categories (NS, HS and FS) close to the pedigree coancestries, i.e. 0, 0.125 and 0.25 for NS, HS and FS, respectively. Thus, the molecular coancestry method gives an essentially unbiased estimate of Neb for the noninbred population (Fig. 1). However, the process of selecting putative nonsibs in the molecular coancestry method causes a problem when applied to the inbred population. The selection method may select the actual nonsibs with a reasonably high probability. However, the putative nonsibs selected from the inbred population may be less related through more remote ancestry than the average nonsibs among the sampled progeny. As seen from Table 1, this causes an underestimation of the AIS probability, implying that the base population for coancestry is set at a generation further back than the parental generation. This overrun in setting the base population results in an overestimation of the parent-based coancestry, leading to a downward bias of the molecular coancestry estimate, as observed in Fig. 2. Despite this drawback, the narrow confidence interval of the molecular coancestry estimate in the inbred population is attractive in practical use. Although the molecular coancestry method will be less useful for a point estimate of Neb in inbred populations, it will be useful for detecting a small Neb.

Table 1.

Observed and estimated AIS probability, and estimated parent-based coancestries among actual nonsibs (NS), actual half-sibs (HS), actual full-sibs (FS) and all pairs of sampled progeny from the noninbred and inbred parental populations under monogamy with N =10 parents or polygyny with Nm=5 male and Nf=20 female parents, for the case of L =15 high-polymorphic marker loci and the sample size of n =100.

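| Breeding system | Population | AIS probability: observed | AIS probability: estimated | Estimated parent-based coancestry among actual NS | among actual HS | among actual FS | among all pairs |
|---|---|---|---|---|---|---|---|
| Monogamy | Noninbred | 0.3587 | 0.3571 | 0.0045 | | 0.2552 | 0.0546 |
| Monogamy | Inbred | 0.3565 | 0.3366 | 0.0346 | | 0.2651 | 0.0806 |
| Polygyny | Noninbred | 0.2967 | 0.2972 | 0.0008 | 0.1259 | 0.2503 | 0.0370 |
| Polygyny | Inbred | 0.2981 | 0.2830 | 0.0237 | 0.1418 | 0.2592 | 0.0579 |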

The AIS probability is the average over 5000 replicates and 15 marker loci, and the coancestry is the average over 5000 replicates.

The simulation results for the estimation with the low-polymorphic marker loci are shown in the left and middle panels in Fig. 3(A) for noninbred and Fig. 3(B) for inbred populations in monogamy. Results in polygyny (data not shown) were essentially similar to those in monogamy. As seen from the 10th and 90th percentiles of the heterozygote-excess estimate, the heterozygote-excess method suffers from a larger confidence interval. In fact, even with L = 30 marker loci, the 90th percentile of the heterozygote-excess estimate was still infinite in both noninbred and inbred populations. In contrast, the molecular coancestry method gave an estimate with a practically acceptable confidence interval when L = 30 marker loci were available.

Figure 3.

Harmonic mean (marked by open circle), and 10th, 50th and 90th percentiles (marked by bar) of 5000 estimated effective numbers of breeders in the (A) noninbred and (B) inbred populations under monogamy with N = 10 (half of each sex) parents, for the case of low-polymorphic marker loci. The sample size of progeny is n = 100. Inline graphic is the estimate from heterozygote-excess method (Pudovkin et al. 1996), Inline graphic the estimate from equation (7) and Inline graphic the estimate by harmonic mean of Inline graphic and Inline graphic. The value at the top of each graph is the clipped 90th percentile, and the value in parentheses is the percentage of replicates with Inline graphic. The dashed line shows the effective number of breeders expected from demographic parameters (Neb,demo = 10).

Table 2 shows the results from simulation runs with additional combinations of the number of parents and sample size, for the case of L = 15 high-polymorphic marker loci. As the harmonic mean of replicated estimates agreed well with the 50th percentile, it is not shown in the table. The general properties of estimates, e.g. a small bias of estimation from both methods in the noninbred population and a downward bias of the molecular coancestry estimate in the inbred population, were similar to those observed in Figs 1–3. A remarkable point in Table 2 is the narrower confidence interval of the molecular coancestry estimate in a small sample of progeny from a small inbred population. For example, under monogamy with N = 10 parents, the 90th percentile of the molecular coancestry estimate from n = 10 progeny was 38.2, while the corresponding percentile of the heterozygote-excess estimate was infinite. In most practical situations in conservation biology, the population in question will be small and inbred, and may suffer from a low reproductive ability. The molecular coancestry method could significantly contribute to the detection of small Neb in such populations. The magnitude of the downward bias of the molecular coancestry estimate increased in a larger inbred population, as seen from the 50th percentiles in monogamy with N = 50 and polygyny with Nm = 20 and Nf = 80, which may limit the usefulness of the molecular coancestry method. However, even in these populations, the narrow confidence interval of the molecular coancestry estimate would be of practical significance for obtaining a conservative estimate of Neb.

Table 2.

Percentiles (10th, 50th and 90th) of estimated effective number of breeders for 5000 replicated simulation runs in the noninbred and inbred populations with several additional combinations of the number of parents and sample size.

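| Population | Breeding system | N or Nm:Nf | Neb,demo | n | Heterozygote-excess 10th | Heterozygote-excess 50th | Heterozygote-excess 90th | Equation (7) 10th | Equation (7) 50th | Equation (7) 90th | Harmonic mean 10th | Harmonic mean 50th | Harmonic mean 90th |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Noninbred | Monogamy | 10 | 10 | 10 | 4.84 | 11.99 | ∞ (23.2) | 4.10 | 8.27 | ∞ (10.3) | 5.39 | 9.42 | 27.01 (2.1) |
| Noninbred | Monogamy | 10 | 10 | 20 | 5.24 | 11.01 | ∞ (16.7) | 4.48 | 8.81 | 114.5 (8.5) | 5.90 | 9.57 | 24.42 (1.2) |
| Noninbred | Monogamy | 50 | 50 | 50 | 19.73 | 55.33 | ∞ (26.5) | 17.0 | 45.80 | ∞ (23.1) | 22.58 | 44.75 | 285.37 (6.3) |
| Noninbred | Polygyny | 5:20 | 13.79 | 20 | 7.63 | 16.18 | ∞ (14.4) | 6.11 | 12.42 | ∞ (12.0) | 8.80 | 13.81 | 38.51 (1.7) |
| Noninbred | Polygyny | 5:20 | 13.79 | 50 | 8.73 | 15.17 | 73.97 (5.8) | 7.06 | 13.57 | 85.49 (6.7) | 9.09 | 14.15 | 30.01 (0.5) |
| Noninbred | Polygyny | 20:80 | 53.78 | 100 | 25.28 | 59.03 | ∞ (17.6) | 21.62 | 50.24 | ∞ (18.2) | 28.10 | 52.03 | 203.54 (3.0) |
| Inbred | Monogamy | 10 | 10 | 10 | 4.46 | 12.18 | ∞ (26.5) | 3.43 | 6.70 | 38.20 (5.7) | 4.90 | 8.03 | 18.09 (0.9) |
| Inbred | Monogamy | 10 | 10 | 20 | 4.81 | 10.99 | ∞ (22.8) | 3.51 | 6.60 | 22.29 (3.6) | 5.08 | 7.85 | 16.58 (0.3) |
| Inbred | Monogamy | 50 | 50 | 50 | 17.50 | 50.37 | ∞ (23.4) | 11.58 | 20.30 | 85.59 (4.7) | 16.58 | 27.83 | 69.50 (1.0) |
| Inbred | Polygyny | 5:20 | 13.79 | 20 | 7.52 | 16.19 | ∞ (17.6) | 5.00 | 9.31 | 41.06 (4.8) | 7.26 | 11.45 | 25.37 (0.6) |
| Inbred | Polygyny | 5:20 | 13.79 | 50 | 8.47 | 15.85 | ∞ (10.0) | 5.31 | 8.85 | 21.79 (1.6) | 7.71 | 11.33 | 19.90 (0) |
| Inbred | Polygyny | 20:80 | 53.78 | 100 | 23.61 | 57.84 | ∞ (19.7) | 15.01 | 24.62 | 73.89 (2.6) | 21.44 | 33.73 | 72.07 (0.4) |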

Fifteen (L = 15) high-polymorphic marker loci were assumed.

N, the number of parents (half of each sex) in monogamy; Nm, the number of male parents and Nf, the number of female parents in polygyny; Neb,demo, effective number of breeders expected from demographic parameters. The three groups of percentile columns give, from left to right, the estimated Neb from the heterozygote-excess method, the estimated Neb from equation (7), and the harmonic mean of these two estimates.

Figures in parentheses are the percentage of replicates with an infinite estimate.
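The summary statistics reported in the table can be computed from a list of replicate estimates as in the following minimal sketch. It assumes that an infinite estimate is stored as `math.inf` and uses a nearest-rank percentile rule; the percentile rule, the function names and the replicate values are illustrative assumptions, not details taken from the simulations.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile (q in 0..100); math.inf entries sort last."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(estimates):
    """10th/50th/90th percentiles plus the percentage of infinite estimates."""
    n_infinite = sum(math.isinf(x) for x in estimates)
    return {
        "p10": percentile(estimates, 10),
        "p50": percentile(estimates, 50),
        "p90": percentile(estimates, 90),
        "pct_infinite": 100 * n_infinite / len(estimates),
    }

# Hypothetical replicate estimates; two of the ten runs returned infinity.
replicates = [4.8, 6.1, 7.9, 9.4, 10.2, 11.5, 13.0, 18.7, math.inf, math.inf]
summary = summarize(replicates)
```

With more than 10% of replicates infinite, the 90th percentile itself becomes infinite, which is the pattern seen in many rows of the table.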

The effect of unequal contributions of parents on estimates of Neb is shown in Table 3, in which monogamy with N = 10 parents (half of each sex) and a sample size of n = 100 offspring was assumed. In all the cases computed, the 90th percentile in the molecular coancestry method was much smaller than in the heterozygote-excess method. As unequal contribution of parents is an important cause of Ne being smaller than the census number of breeders (Frankham 1995), the higher accuracy of the present method observed in Table 3 is of practical appeal.

Table 3.

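Percentiles (10th, 50th and 90th) of estimated effective number of breeders for 5000 replicated simulation runs with unequal contribution of parents under monogamy in the noninbred and inbred populations with N = 10 (half of each sex) parents and the sample size of n = 100.

Column groups: HE, estimate from the heterozygote-excess method; MC, estimate from the molecular coancestry method (equation 7); HM, harmonic mean of the HE and MC estimates.

| Contribution | Neb,demo | Population | HE 10th | HE 50th | HE 90th | MC 10th | MC 50th | MC 90th | HM 10th | HM 50th | HM 90th |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4, 0.3, 0.1, 0.1, 0.1 | 7.18 | Noninbred | 4.53 | 8.14 | 302.02 (9.3) | 3.59 | 6.91 | 18.55 (2.1) | 4.81 | 7.31 | 13.46 (0.2) |
| 0.4, 0.3, 0.1, 0.1, 0.1 | 7.18 | Inbred | 4.07 | 8.30 | ∞ (16.9) | 2.69 | 5.45 | 14.09 (1.1) | 4.09 | 6.31 | 10.95 (0) |
| 0.6, 0.1, 0.1, 0.1, 0.1 | 5.03 | Noninbred | 3.80 | 6.82 | 107.07 (8.8) | 2.26 | 4.74 | 13.90 (2.0) | 3.40 | 5.42 | 9.94 (0.1) |
| 0.6, 0.1, 0.1, 0.1, 0.1 | 5.03 | Inbred | 3.63 | 7.24 | ∞ (14.6) | 1.76 | 4.17 | 12.50 (1.6) | 2.96 | 5.02 | 8.90 (0.1) |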

Fifteen (L = 15) high-polymorphic marker loci were assumed.

Contribution: expected contributions of the N/2 = 5 couples to the sample.

Neb,demo, effective number of breeders expected from demographic parameters. The three groups of percentile columns give, from left to right, the estimated Neb from the heterozygote-excess method, the estimated Neb from equation (7), and the harmonic mean of these two estimates.

Figures in parentheses are the percentage of replicates with an infinite estimate.

Figure 4 shows the joint distribution of estimates from the heterozygote-excess and molecular coancestry methods applied to the inbred population under polygyny with Nm = 5 and Nf = 20 parents and L = 15 high-polymorphic marker loci. The product-moment and Spearman’s rank correlations, excluding pairs with an infinite estimate, were −0.003 and −0.164, respectively. Correlations of similar magnitude were obtained in all other cases simulated. An interesting point in Fig. 4 is that overestimation by one method tends not to coincide with overestimation by the other. At present, it is not theoretically obvious how to combine several estimates of Neb optimally to give a single best estimate (Wang 2005). As a tentative method, I combined the two estimates as the harmonic mean, following the suggestion of Waples (1991):

Figure 4.


Joint distribution of estimates of effective number of breeders from heterozygote-excess (Inline graphic) and molecular coancestry (Inline graphic) methods in the inbred population under polygyny with Nm = 5 male and Nf = 20 female parents and n = 100 sample of progeny. Estimates outside the graph were clipped.

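$$\hat{N}_{eb}^{\,(\mathrm{combined})} = \frac{2}{\dfrac{1}{\hat{N}_{eb}^{\,(\mathrm{het})}} + \dfrac{1}{\hat{N}_{eb}^{\,(\mathrm{coa})}}}$$

where the superscripts het and coa denote the estimates from the heterozygote-excess and molecular coancestry methods, respectively.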

The harmonic mean is expected to work well in the present case because overestimation rarely occurs in both methods at once; an overestimated Neb returned by one method is filtered out and the combined estimate is largely determined by the estimate from the other method. The properties of the combined estimate are shown in the right panels of Figs 1–3 and in the harmonic-mean columns of Tables 2 and 3. The combined estimate in the inbred population was biased downward because of the downward bias of the molecular coancestry estimate. However, as expected, the confidence interval of the estimate was substantially narrowed compared with those of the separate estimates. It is notable that the improvement is larger for lower marker quality, i.e. for a smaller number of marker loci and/or a smaller number of alleles at each locus (Figs 1–3), and for a smaller sample size (Table 2). Although the development of an optimal method for combining separate estimates into a single estimate deserves further investigation with sophisticated statistical tools, the above results strongly suggest that a highly reliable estimate can be obtained from an optimal combination.
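The filtering effect of the harmonic mean can be illustrated with a short sketch. It assumes that both estimates are positive and that an infinite estimate is represented by `math.inf`, whose reciprocal is taken as zero; the function name and the input values are hypothetical.

```python
import math

def combine_harmonic(est_a, est_b):
    """Harmonic mean of two positive Neb estimates (math.inf allowed)."""
    inverse_sum = sum(0.0 if math.isinf(x) else 1.0 / x for x in (est_a, est_b))
    if inverse_sum == 0.0:
        return math.inf  # both estimates are infinite
    return 2.0 / inverse_sum

# An overestimate (or an infinite estimate) from one method is damped:
moderate = combine_harmonic(16.0, 8.0)       # two finite estimates
damped = combine_harmonic(200.0, 10.0)       # one large overestimate
capped = combine_harmonic(math.inf, 10.0)    # one infinite estimate
```

Because each reciprocal is non-negative, the combined value can never exceed twice the smaller of the two estimates, so a single runaway estimate cannot dominate the result.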

Some of the limitations of the method proposed in this study are shared by most of the published methods: marker alleles are assumed to be selectively neutral, mating within the population is assumed to be random and immigration from other populations is assumed to be absent (Leberg 2005). In addition, the present method involves a problem associated with age at sampling. Estimation of Ne from the recurrence equation (1) is based on the assumption that the average coancestries in two successive generations are measured at the same age stage. In fact, the application of the present method to a sample of juveniles gives an estimate of ‘the effective number of breeders’. Even in a population with nonoverlapping generations, this estimate can differ greatly from Ne, depending on the pattern of survival from juvenile to adult. Following Crow and Morton (1955), we consider two extreme patterns of survival: (i) random survival and (ii) survival of the family as a unit. In the random survival model, each juvenile survives to adulthood independently with the expected survival rate s. Under this pattern of survival, the average coancestry among adults is expected to be unchanged from that among the juveniles. Thus, if the present method is applied to a population with nonoverlapping generations, the estimated Neb can be regarded as an estimate of Ne. Under survival of the family as a unit, all juveniles in a family either survive or die together. With the average survival rate s in the population, the estimated Neb obtained from a sample of juveniles is related to Ne as Inline graphic (for the theoretical aspect of the above consideration, see Appendix C). Although this model describes an extreme pattern of survival, the estimated Neb of animals with low fecundity and high survival rate, such as mammals and birds in which parental care of the brood is generally observed, should be interpreted cautiously. On the other hand, the estimated Neb will give an appropriate estimate of Ne when the method is applied to animals with high fecundity and low survival rate, such as marine invertebrates and fishes, whose survival seems to be essentially random.
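The contrast between the two survival patterns can be made concrete with a small simulation. The sketch below is a toy illustration under stated assumptions: equal family sizes, full-sib families only, and the proportion of sib pairs used as a proxy for average coancestry (with unrelated, noninbred parents, only sib pairs have nonzero coancestry). All names and parameter values are hypothetical.

```python
import random

def sib_pair_proportion(family_ids):
    """Proportion of all pairs of individuals that belong to the same family."""
    n = len(family_ids)
    counts = {}
    for fam in family_ids:
        counts[fam] = counts.get(fam, 0) + 1
    sib_pairs = sum(k * (k - 1) for k in counts.values())
    return sib_pairs / (n * (n - 1))

def random_survival(juveniles, s, rng):
    """Each juvenile survives independently with probability s."""
    return [fam for fam in juveniles if rng.random() < s]

def family_unit_survival(juveniles, s, rng):
    """Whole families survive or die; a fraction s of the families is kept."""
    families = sorted(set(juveniles))
    kept = set(rng.sample(families, round(s * len(families))))
    return [fam for fam in juveniles if fam in kept]

rng = random.Random(1)
n_families, family_size, s = 40, 40, 0.25  # hypothetical toy values
juveniles = [fam for fam in range(n_families) for _ in range(family_size)]

p_juv = sib_pair_proportion(juveniles)
p_random = sib_pair_proportion(random_survival(juveniles, s, rng))
p_family = sib_pair_proportion(family_unit_survival(juveniles, s, rng))
```

In this toy setting `p_random` stays close to `p_juv`, whereas `p_family` is roughly 1/s times larger, so adults are more closely related than a juvenile sample would indicate.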

The present method involves additional problems associated with the method of selecting putative nonsibs. One is how to determine the number (n0) of pairs selected as putative nonsibs. Although the selection method applied in the present study automatically sets n0 equal to the number (n) of sampled progeny, this is an arbitrary choice. With a smaller n0, it is more likely that the selected pairs are actually nonsibs, but the coancestry among them will underestimate the AIS probability; with a larger n0, the opposite holds. Another problem is drift-induced linkage disequilibrium among marker loci. In small populations, drift-induced linkage disequilibrium may be an important factor (Hill 1981) and may reduce the degree to which loci provide independent information about coancestry. This may reduce the effectiveness of the selection criterion for putative nonsibs defined by equation (8). One possible way of solving these problems and improving the estimates of Neb from molecular coancestry is the use of a sib-ship reconstruction technique. To date, several methods for sib-ship reconstruction from molecular markers have been developed using different algorithms, such as Markov chain Monte Carlo (MCMC) algorithms (Almudevar and Field 1999; Thomas and Hill 2002; Wang 2004) and simulated annealing (Almudevar 2003; Fernández and Toro 2006), and have been reviewed by Blouin (2003) and Butler et al. (2004). I here take the method proposed by Fernández and Toro (2006) as a trial example of the use of a sib-ship reconstruction technique for estimating Neb. With their method, we can find the sib-ships among sampled individuals that yield a parent-based coancestry matrix with the highest correlation with the molecular coancestry matrix. A notable feature of their method is that it is free from the assumption of linkage equilibrium among marker loci. Two ways of using the reconstructed sib-ships were examined. In the first method (SR1), the reconstructed sib-ships were directly used for computing Inline graphic in equation (7). In the second method (SR2), the average locus-specific coancestry among the inferred nonsib pairs was used for estimating sl as in equation (6). A simulation with 200 replicates was run for the case of polygyny in the inbred population with Nm = 5 and Nf = 20 parents, a sample of n = 100 progeny and L = 15 high-polymorphic marker loci. The results are summarized in Table 4. The two methods with sib-ship reconstruction worked quite well; they gave nearly unbiased estimates (50th percentiles of 15.07 and 13.38 against Neb,demo = 13.79) and narrower confidence intervals. Although further evaluations including other published methods for sib-ship reconstruction should be carried out under a wide range of scenarios, the results in Table 4 suggest the potential for improving the molecular coancestry method.
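The scoring idea behind such a reconstruction, the correlation between a parent-based coancestry matrix and the molecular coancestry matrix, can be sketched as follows. This is only an illustration of the criterion, not the algorithm of Fernández and Toro (2006): it omits the search over candidate sib-ships, assumes full-sib families from unrelated noninbred parents (coancestry 1/4 for sibs, 0 otherwise), and uses hypothetical genotypes and function names.

```python
import math
from itertools import combinations

def molecular_coancestry(ind_a, ind_b):
    """Mean over loci of the probability that two alleles, one drawn at
    random from each individual, are alike in state."""
    per_locus = []
    for (a1, a2), (b1, b2) in zip(ind_a, ind_b):
        matches = (a1 == b1) + (a1 == b2) + (a2 == b1) + (a2 == b2)
        per_locus.append(matches / 4)
    return sum(per_locus) / len(per_locus)

def pearson(xs, ys):
    """Product-moment correlation (undefined if either input is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def sibship_score(genotypes, family_of):
    """Correlation between molecular coancestry and the coancestry implied
    by a candidate assignment of individuals to full-sib families."""
    pairs = list(combinations(range(len(genotypes)), 2))
    molecular = [molecular_coancestry(genotypes[i], genotypes[j]) for i, j in pairs]
    implied = [0.25 if family_of[i] == family_of[j] else 0.0 for i, j in pairs]
    return pearson(molecular, implied)

# Four hypothetical individuals typed at two loci (allele labels are arbitrary).
genotypes = [
    ((1, 2), (3, 3)),
    ((1, 2), (3, 4)),
    ((5, 6), (7, 8)),
    ((5, 5), (7, 8)),
]
true_score = sibship_score(genotypes, [0, 0, 1, 1])
alt_score = sibship_score(genotypes, [0, 1, 0, 1])
```

A reconstruction method of this kind would search over family assignments for the one with the highest score; here the assignment that groups individuals sharing alleles scores higher than the alternative.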

Table 4.

Harmonic mean and percentiles (10th, 50th and 90th) of two estimates (from methods SR1 and SR2) of effective number of breeders from 200 replicated simulation runs with a combined use of the molecular coancestry method and a sib-ship reconstruction technique.

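| Estimate | Harmonic mean | 10th percentile | 50th percentile | 90th percentile |
|---|---|---|---|---|
| Heterozygote excess | 16.11 | 9.10 | 16.41 | 111.56 (5.5) |
| Molecular coancestry | 8.07 | 5.32 | 8.14 | 16.33 (0.1) |
| Sib-ship reconstruction (SR1) | 14.39 | 10.74 | 15.07 | 18.54 (0) |
| Sib-ship reconstruction (SR2) | 12.84 | 9.66 | 13.38 | 17.67 (0) |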

The corresponding values from the heterozygote-excess and molecular coancestry methods are also presented. Polygyny with Nm = 5 male and Nf = 20 female parents in the inbred population with L = 15 high-polymorphic marker loci and the sample size of n = 100 was assumed. The effective number of breeders expected from demographic parameters is 13.79.

Figures in parentheses are the percentage of replicates with an infinite estimate.

Acknowledgments

I thank Troy Day and four anonymous referees for their helpful comments on the manuscript and Jesús Fernández for sending me Fortran code of his algorithm. This work was supported in part by grant-in-aid for scientific research (no. 19658104) from the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Appendix A – Expression of Inline graphic in equation (9)

In general, variance of x can be written as

$$V(x) = E_y\left[V(x \mid y)\right] + V_y\left[E(x \mid y)\right] \qquad \text{(A1)}$$

where $E(x \mid y)$ and $V(x \mid y)$ are the expectation and variance of x conditional on a given y, respectively (Mood et al. 1987, p. 159). We apply this formula to the derivation of expression of Inline graphic.
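The decomposition of a variance into the expected conditional variance plus the variance of the conditional expectation can be checked numerically. The sketch below does so exactly on a small discrete joint distribution; the probabilities and names are hypothetical and serve only to illustrate the identity.

```python
# Hypothetical joint probabilities P(x, y) on a small discrete support.
joint = {
    (0, "a"): 0.10, (1, "a"): 0.30,
    (0, "b"): 0.25, (2, "b"): 0.35,
}

def mean_var(dist):
    """Mean and variance of a {value: weight} distribution (weights normalized)."""
    total = sum(dist.values())
    mean = sum(v * p for v, p in dist.items()) / total
    var = sum((v - mean) ** 2 * p for v, p in dist.items()) / total
    return mean, var

# Left-hand side: variance of the marginal distribution of x.
marginal_x = {}
for (x, _), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0.0) + p
_, total_var = mean_var(marginal_x)

# Right-hand side: conditional mean and variance of x for each y.
p_y, conditional = {}, {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p
    conditional.setdefault(y, {})[x] = p
stats = {y: mean_var(dist) for y, dist in conditional.items()}

expected_cond_var = sum(p_y[y] * stats[y][1] for y in p_y)
grand_mean = sum(p_y[y] * stats[y][0] for y in p_y)
var_of_cond_mean = sum(p_y[y] * (stats[y][0] - grand_mean) ** 2 for y in p_y)
```

The two sides, `total_var` and `expected_cond_var + var_of_cond_mean`, agree up to floating-point rounding.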

Let ci be the expected contribution of the ith couple to the cohort of offspring and ki the number of offspring of the ith couple in a sample of size n. Applying (A1), we obtain

graphic file with name eva0001-0462-mu52.jpg

where Inline graphic is the mean of ci.

For example, in the simulation condition assumed in Figs 1–3 and Table 2, Inline graphic for all i, giving

graphic file with name eva0001-0462-mu55.jpg

Substituting this expression of Inline graphic and Inline graphic into (9) gives

graphic file with name eva0001-0462-mu58.jpg

as expected.

Appendix B – Derivation of equation (10)

A general formulation of the effective size (Ne) of populations with unequal sex ratio and variation in mating success has been given by Nomura (2005). Consider a population with a polygynous (harem) breeding system with Nm male and Nf female parents, in which a male mates with several females and a female mates with only one male. Let dmi be the number of matings of male parent Inline graphic with the mean Inline graphic and variance Inline graphic. Assuming a Poisson distribution of litter size (the number of newborns per mating), the equation given by Nomura (2005) reduces to

graphic file with name eva0001-0462-m12.jpg (B1)

where Inline graphic is the coefficient of variation of dmi. Under the condition of the present simulation, the number of matings (dmi) of male parents follows a binomial distribution with the mean Inline graphic and variance Inline graphic, giving

graphic file with name eva0001-0462-mu65.jpg

Substituting this expression into (B1) leads to

graphic file with name eva0001-0462-mu66.jpg

Putting Neb,demo = Ne, we obtain equation (10).
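As a numerical illustration, the sketch below rests on two assumptions that are not transcribed from (B1): each of the Nf females mates with one male chosen at random, so that the number of matings per male is binomial with mean Nf/Nm and squared coefficient of variation (Nm − 1)/Nf, and the male term of the classical sex-ratio formula 1/Ne = 1/(4Nm) + 1/(4Nf) is inflated by one plus that squared coefficient of variation. Under these assumptions the function reproduces the Neb,demo values listed for polygyny in Table 2; the function name is hypothetical.

```python
def neb_demo_polygyny(n_males, n_females):
    """Demographic Neb under harem polygyny (assumed form, see lead-in)."""
    # Squared coefficient of variation of matings per male for a
    # binomial(n_females, 1 / n_males) number of matings.
    cv2 = (n_males - 1) / n_females
    inverse_ne = (1 + cv2) / (4 * n_males) + 1 / (4 * n_females)
    return 1 / inverse_ne

small = neb_demo_polygyny(5, 20)    # Nm = 5, Nf = 20
large = neb_demo_polygyny(20, 80)   # Nm = 20, Nf = 80
```

Algebraically the same quantity can be written as 4NmNf/(2Nm + Nf − 1), which gives 400/29 and 6400/119 for the two cases above.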

Appendix C – Effect of age at sampling on relation between Ne and Neb

For simplicity, consider a population of monogamous species with an equal number (N/2 = Nm = Nf) of male and female parents. Generations are assumed to be discrete (nonoverlapping). Let kei be the number of offspring at the early age stage (juveniles) contributed by family (couple) i, and kai be the number of offspring at the later age stage (reproductive adults) contributed by family i. The average survival rate from juvenile to adult is s. According to the standard formula of effective population size (Caballero 1994), the effective number of breeders of juveniles Neb and the effective population size Ne (or equivalently the effective number of breeders of adults) are expressed as

graphic file with name eva0001-0462-mu67.jpg

and

graphic file with name eva0001-0462-m13.jpg (C1)

We consider two extreme survival models: (i) random survival and (ii) survival of the family as a unit. Although μka = sμke in both models, the expression of Inline graphic and consequently the relation between Neb and Ne depend on the model of survival assumed, as shown below.

Random survival

Applying equation (A1) and noting Inline graphic, we obtain an expression of Inline graphic as

graphic file with name eva0001-0462-m14.jpg (C2)

Substituting (C2) into (C1) gives

graphic file with name eva0001-0462-mu71.jpg

Survival of the family as a unit

Under this model, the expression corresponding to (C2) is

graphic file with name eva0001-0462-mu72.jpg

Substituting this expression into (C1) leads to

graphic file with name eva0001-0462-mu73.jpg

Literature cited

  1. Almudevar A. A simulated annealing algorithm for maximum likelihood pedigree reconstruction. Theoretical Population Biology. 2003;63:63–75. doi: 10.1016/s0040-5809(02)00048-5.
  2. Almudevar A, Field C. Estimation of single-generation sibling relationships based on DNA markers. Journal of Agricultural, Biological, and Environmental Statistics. 1999;4:136–165.
  3. Beaumont MA. Conservation genetics. In: Balding DJ, Bishop M, Cannings C, editors. Handbook of Statistical Genetics. London: Wiley; 2003. pp. 779–812.
  4. Blouin MS. DNA-based methods for pedigree reconstruction and kinship analysis in natural populations. Trends in Ecology and Evolution. 2003;18:503–511.
  5. Butler K, Field C, Herbinger CM, Smith BR. Accuracy, efficiency and robustness of four algorithms allowing full sibship reconstruction from DNA marker data. Molecular Ecology. 2004;13:1589–1600. doi: 10.1111/j.1365-294X.2004.02152.x.
  6. Caballero A. Developments in the prediction of effective population size. Heredity. 1994;73:657–679. doi: 10.1038/hdy.1994.174.
  7. Crow JF, Kimura M. An Introduction to Population Genetics Theory. Minneapolis: Burgess Publishing; 1970.
  8. Crow JF, Morton NE. Measurement of gene frequency drift in small populations. Evolution. 1955;9:202–214.
  9. Falconer DS, Mackay TFC. Introduction to Quantitative Genetics. 4th edn. Essex: Longman; 1996.
  10. Fernández J, Toro MA. A new method to estimate relatedness from molecular markers. Molecular Ecology. 2006;15:1657–1667. doi: 10.1111/j.1365-294X.2006.02873.x.
  11. Frankham R. Effective population size/adult population size ratios in wildlife: a review. Genetical Research. 1995;66:95–106. doi: 10.1017/S0016672308009695.
  12. Hill WG. Estimation of effective population size from data on linkage disequilibrium. Genetical Research. 1981;38:209–216.
  13. Leberg P. Genetic approaches for estimating the effective size of populations. The Journal of Wildlife Management. 2005;69:1385–1399.
  14. Luikart G, Cornuet J-M. Estimating the effective number of breeders from heterozygote excess in progeny. Genetics. 1999;151:1211–1216. doi: 10.1093/genetics/151.3.1211.
  15. Lynch M, Ritland K. Estimation of pairwise relatedness with molecular markers. Genetics. 1999;152:1753–1766. doi: 10.1093/genetics/152.4.1753.
  16. Malécot G. Les mathématiques de l’hérédité. Paris: Masson et Cie; 1948.
  17. Mood AM, Graybill FA, Boes DC. Introduction to the Theory of Statistics. 3rd edn. Singapore: McGraw-Hill; 1987.
  18. Nei M, Tajima F. Genetic drift and estimation of effective population size. Genetics. 1981;98:625–640. doi: 10.1093/genetics/98.3.625.
  19. Nomura T. Effective population size under random mating with a finite number of matings. Genetics. 2005;171:1441–1442. doi: 10.1534/genetics.104.029769.
  20. Oliehoek PA, Windig JJ, Van Arendonk JAM, Bijma P. Estimating relatedness between individuals in general populations with a focus on their use in conservation programs. Genetics. 2006;173:483–496. doi: 10.1534/genetics.105.049940.
  21. Pollak E. A new method for estimating the effective population size from allele frequency changes. Genetics. 1983;104:531–548. doi: 10.1093/genetics/104.3.531.
  22. Pudovkin AI, Zaykin DV, Hedgecock D. On the potential for estimating the effective number of breeders from heterozygote-excess in progeny. Genetics. 1996;144:383–387. doi: 10.1093/genetics/144.1.383.
  23. Ritland K. Estimators for pairwise relatedness and individual inbreeding coefficients. Genetical Research. 1996;67:175–186.
  24. Robertson A. The interpretation of genotype ratios in domestic animal populations. Animal Production. 1965;7:319–324.
  25. Schwartz MK, Tallmon DA, Luikart G. Review of DNA-based census and effective population size estimators. Animal Conservation. 1999;1:293–299.
  26. Thomas SC, Hill WG. Sibship reconstruction in hierarchical population structure using Markov chain Monte Carlo techniques. Genetical Research. 2002;79:227–234. doi: 10.1017/s0016672302005669.
  27. Toro MA, Barragán C, Ovilo C, Rodrigáñez J, Rodriguez C, Silió L. Estimation of coancestry in Iberian pigs using molecular markers. Conservation Genetics. 2002;3:309–320.
  28. Toro MA, Barragan C, Ovilo C. Estimation of genetic variability of the founder population in a conservation scheme using microsatellites. Animal Genetics. 2003;34:226–228. doi: 10.1046/j.1365-2052.2003.00988.x.
  29. Wang J. Sibship reconstruction from genetic data with typing errors. Genetics. 2004;166:1963–1979. doi: 10.1534/genetics.166.4.1963.
  30. Wang J. Estimation of effective population sizes from data on genetic markers. Philosophical Transactions of the Royal Society B. 2005;360:1395–1409. doi: 10.1098/rstb.2005.1682.
  31. Waples RS. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics. 1989;121:379–391. doi: 10.1093/genetics/121.2.379.
  32. Waples RS. Genetic methods for estimating the effective size of cetacean populations. In: Hoelzel AR, editor. Genetic Ecology of Whales and Dolphins, Special Issue 13. London: International Whaling Commission; 1991. pp. 279–300.

Articles from Evolutionary Applications are provided here courtesy of Wiley
