Abstract
Estimating effective population size (N e) from linkage disequilibrium (LD) information (N e(LD)) has the operational advantage of requiring a single sample. However, N e(LD) estimates assume discrete generations, and their performance is constrained by demographic factors. Such concerns have so far received little empirical attention. The pedigree of the endangered Gochu Asturcelta pig breed includes individuals classified into discrete filial generations and individuals with overlapping generations. Up to 780 individuals were typed with a set of 17 microsatellites. The performance of N e(LD) was compared with N e estimates obtained using genealogical information, molecular coancestry (N e(M)), and a temporal (two‐sample) method (N e(JR)). Molecular‐based estimates of N e exceeded those obtained using pedigree data. Estimates of N e(LD) for filial generations F3 and F4 (17.0 and 17.3, respectively) were lower and steadier than those obtained using yearly or biannual samplings. N e(LD) estimates for samples including overlapping generations were comparable with those obtained for the discrete filial generations only when the sampling span approached a generation interval and a demographic correction for bias was applied. Single‐sample N e(M) estimates were lower than their N e(LD) counterparts. N e(M) estimates are likely to partially reflect the number of founders rather than population size. In any case, estimates of LD and molecular coancestry tend to covary and, therefore, N e(M) and N e(LD) can hardly be considered independent. Demographically adjusted estimates of N e(JR) and N e(LD) took comparable values when (1) the two samples used for the former were separated by about one equivalent complete generation in the pedigree and (2) the sampling span used for the latter approached a generation interval.
Overall, the empirical evidence given in this study suggests that the operational advantage of single‐sample methods for obtaining molecular‐based estimates of N e is unclear. Estimates of N e obtained using methods based on molecular information should be interpreted with caution.
Keywords: Coancestry, effective population size, linkage disequilibrium, pedigree, single sample, temporal sampling
Introduction
There is increasing interest in estimating effective population size (N e) using linkage disequilibrium (LD) information. Although these methodologies have mainly been used in natural populations (Waples 1991; Barker 2011), they are also of interest in livestock populations with shallow pedigrees, in which no sound estimates of effective population size can be obtained from genealogies (Cervantes et al. 2011b).
The advantages of using LD information are clear in terms of time and operational costs: a single sample can provide estimates of what is probably the most important evolutionary parameter of a given population. However, computation of effective population size using linkage disequilibrium (N e(LD)) has well‐known sources of severe bias, namely sample size, marker set size, and minor allele frequency (England et al. 2006; Luikart et al. 2010; Waples and Do 2010).
In any case, the major operational constraint for the estimation of N e(LD) is that this approach assumes discrete generations and is well suited only to semelparous age‐structured species. In iteroparous species such as livestock, in which overlapping generations are the rule, N e(LD) values are more likely to estimate the effective number of breeders (N b; the effective number of adult individuals that give rise to a cohort) than the effective size per generation (Waples 2006; Barker 2011; Goyache et al. 2011). Although N e and N b are closely related, the relationship between them differs widely among species and within populations (Waples et al. 2013, 2014). In such a scenario, the analysis of livestock populations with known mating policy, demographic structure, and pedigrees can shed light on the performance of N e(LD) in natural iteroparous populations.
The demography of a population with overlapping generations is likely to affect molecular‐based estimates of N e. When temporal (two‐sample) methods for computing N e are applied, samples cannot be assumed to be independent, and "temporal" estimates of N e must be adjusted using life‐history data (Jorde and Ryman 1995, 1996). A similar rationale has recently been applied to estimates of N e(LD) obtained from single‐cohort samples: Waples et al. (2014) suggested adjusting estimates of N e(LD) for demographic bias using the ratio N b/N e. This ratio can be calculated accurately from two key life‐history traits (Waples et al. 2011, 2013).
The demographic concerns described above are not usually addressed, even in research carried out in livestock (Corbin et al. 2010, 2012; Flury et al. 2010; Goyache et al. 2011). The endangered Gochu Asturcelta pig breed (Menéndez et al. 2016a) offers a unique scenario in which to address them. A recovery program for the breed started in 2002 using six founders (three boars and three sows). The reproductive careers of the founders and their direct descendants were prolonged as much as possible, and strict breeding policies avoiding matings between close relatives were applied in the population (Menéndez et al. 2016a). This made it possible to identify, across years, a number of individuals that could be classified into discrete filial generations: F1 (direct descendants of two founders), F2 (F1 × F1 crosses), and so on up to F5 (F4 × F4 crosses). This unique scenario allows estimates of N e(LD) obtained for discrete generations to be compared, within the same population, with those obtained using yearly or biannual cohort samplings. Further, the effect of the correction for demographic bias, using parameters obtained by direct observation of the pedigree, can also be assessed.
The current research will analyze both the information registered in the herdbook of the Gochu Asturcelta pig breed from 2006 to 2010 and the genotypes obtained for paternity testing. This will allow us to assess the performance of N e(LD) in the following scenarios: (1) samples obtained from discrete filial generations; (2) samples obtained from yearly cohorts; and (3) samples drawn from a number of yearly cohorts spanning a period equal to or exceeding the generation length. The effect of demographic adjustment of estimates will be assessed as well. For descriptive purposes, performance will be compared with estimates of N e obtained using single‐sample molecular coancestry, temporal (two‐sample) methods, and genealogical information.
Materials and Methods
Data available and sampling
Pedigree data recently analyzed by Menéndez et al. (2016a) were available. Data consisted of 3156 records (including six founders), from 515 litters, with father and mother known, registered in the herdbook of the breeders association (ACGA) from its foundation to August 2014. A total of 109 boars and 309 sows had offspring in data. Genealogies were traced to identify 11 F1 individuals (offspring of two founders), 47 F2 individuals (F1 × F1), 216 F3 individuals (F2 × F2), 147 F4 individuals (F3 × F3), and seven F5 individuals (F4 × F4).
Table 1 gives a detailed description of the data used. Analyses were limited to the period in which F3 and F4 individuals were born (from 2006 to 2010). This ensured that sample size and pedigree depth (at least three equivalents to complete generations; Gutiérrez et al. 2009) were sufficient to obtain reliable results. Finally, pedigree analyses involved a total of 2248 individuals born between 2006 and 2010, including 363 F3 or F4 individuals and 1885 individuals with different pedigree depths due to overlapping generations.
Table 1.
Description of samples used per year of birth. The numbers of litters and individuals involved in computations are detailed according to pedigree knowledge: (a) individuals included in discrete generations (F3 or F4) and (b) individuals with overlapping genealogies. Both the number of individuals registered in the herdbook (used for genealogical analyses) and the number of individuals typed (in parentheses) are given.
| Year of birth | Discrete generations: litters | Discrete generations: F3 | Discrete generations: F4 | Discrete generations: subtotal | Overlapped generations: litters | Overlapped generations: individuals | Totals: litters | Totals: individuals |
|---|---|---|---|---|---|---|---|---|
| 2006 | 6 | 39 (32) | 0 (0) | 39 (32) | 5 | 44 (42) | 11 | 83 (74) |
| 2007 | 14 | 99 (98) | 6 (6) | 105 (104) | 18 | 130 (32) | 32 | 235 (136) |
| 2008 | 21 | 50 (34) | 109 (95) | 159 (129) | 51 | 404 (82) | 72 | 563 (211) |
| 2009 | 5 | 22 (21) | 26 (26) | 48 (47) | 85 | 676 (178) | 90 | 724 (225) |
| 2010 | 3 | 6 (6) | 6 (6) | 12 (12) | 79 | 631 (122) | 82 | 643 (134) |
| Totals | 49 | 216 (191) | 147 (133) | 363 (324) | 238 | 1885 (456) | 287 | 2248 (780) |
A set of 17 microsatellites (IGF1, S0002, S0026, S0071, S0101, S0155, S0225, S0226, S0227, SW240, SW632, SW857, SW911, SW936, SW951, S0005, and S0090), used in paternity testing and diversity analyses (Menéndez et al. 2015, 2016b), was typed in a representative sample of the available individuals. Most of the microsatellites used are included in the ISAG–FAO panel (http://www-lgc.toulouse.inra.fr/pig/panel/panel2004.htm). Primer sequences and polymerase chain reaction (PCR) conditions can be found in Menéndez et al. (2016b). PCR was carried out in a GeneAmp 9700 thermocycler (Applied Biosystems, Barcelona, Spain), and genotyping was performed on an ABI 3130 automated DNA sequencer (Applied Biosystems).
Genotypes of a total of 780 individuals were available. They included (1) 324 of 363 (89%) F3 or F4 individuals and (2) 456 of 1885 (24%) individuals born between 2006 and 2010 with overlapping generations in their pedigree. Altogether, the yearly samples available varied from 83 (74 typed) individuals born in 2006 to 724 (225 typed) individuals born in 2009 (see Table 1).
According to the data structure described above, analyses were sequentially carried out on (1) discrete filial generations (F3 and F4); (2) yearly cohorts from 2006 to 2010; and (3) sequential biannual samplings mimicking the average generation interval of 1.8 ± 0.03 years reported for the Gochu Asturcelta breed by Menéndez et al. (2016a). As the mating policy avoids crosses between close relatives (Menéndez et al. 2016a), a random mating model was assumed when necessary.
Genealogical estimates of effective population size
The equivalent to complete generations traced (t), computed as $t = \sum (1/2)^n$, where n is the number of generations separating the individual from each known ancestor (Maignel et al. 1996), was calculated for each individual in the pedigree born in the five‐year period 2006–2010.
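The recursion below is a minimal sketch of how t can be accumulated over a pedigree, assuming a hypothetical dictionary that maps each individual to its (sire, dam) pair, with None for an unknown parent. Each known parent contributes 1/2 plus half of its own t, which expands to (1/2)^n for every known ancestor n generations back; we assume an ancestor reached through several paths is counted once per path. It is an illustration, not ENDOG's code.

```python
from functools import lru_cache

# Hypothetical toy pedigree: individual -> (sire, dam); None = unknown parent.
PEDIGREE = {
    "A": (None, None), "B": (None, None), "C": (None, None), "D": (None, None),
    "E": ("A", "B"), "F": ("C", "D"),  # offspring of founders
    "G": ("E", "F"),                   # complete pedigree back to the founders
    "H": ("G", None),                  # dam unknown
}


@lru_cache(maxsize=None)
def equivalent_complete_generations(ind: str) -> float:
    """Return t = sum of (1/2)**n over the known ancestors of `ind`.

    A known parent contributes 1/2 for itself plus 1/2 of its own t,
    which expands to (1/2)**n for each ancestor n generations back
    (counted once per path through the pedigree).
    """
    t = 0.0
    for parent in PEDIGREE[ind]:
        if parent is not None:
            t += 0.5 * (1.0 + equivalent_complete_generations(parent))
    return t


for ind in "EGH":
    print(ind, equivalent_complete_generations(ind))  # E 1.0, G 2.0, H 1.5
```

Individual H has t = 1.5 rather than 3 because its dam is unknown, which is how incomplete pedigrees lower t.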
Effective population sizes (N e) and their standard errors were estimated on the basis of the individual increase in inbreeding, ΔF i (Gutiérrez et al. 2008, 2009), and the increase in coancestry, ΔC ij (Cervantes et al. 2011a), computed as $\Delta F_i = 1 - \sqrt[t_i - 1]{1 - F_i}$ and $\Delta C_{ij} = 1 - \sqrt[(t_i + t_j)/2]{1 - C_{ij}}$, where F i is the inbreeding coefficient of individual i, C ij is the coancestry coefficient between individuals i and j (the inbreeding of a descendant of both), and t i and t j are their respective equivalents to complete generations. Effective sizes were then computed by averaging the individual increase in inbreeding over all individuals, and the increase in coancestry over all pairs of individuals, in a reference subpopulation, using N e F i = $1/(2\,\overline{\Delta F})$ and N e C ij = $1/(2\,\overline{\Delta C})$. Finally, following Cervantes et al. (2011a), the ratio N e C ij/N e F i was computed to ascertain whether there was hidden structure in the data.
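A minimal Python sketch of these formulas follows. It assumes that F i, C ij, and t i have already been computed elsewhere (for instance by ENDOG), that every reference individual has t i > 1, and it does not reproduce the standard errors. With F i = 0.20 and t i = 3 (close to the F3 means in Table 2), ΔF i ≈ 0.106, in line with the tabulated 0.10.

```python
from itertools import combinations
from statistics import mean


def delta_f(f_i: float, t_i: float) -> float:
    """Individual increase in inbreeding: 1 - (1 - F_i) ** (1 / (t_i - 1))."""
    return 1.0 - (1.0 - f_i) ** (1.0 / (t_i - 1.0))


def delta_c(c_ij: float, t_i: float, t_j: float) -> float:
    """Pairwise increase in coancestry: 1 - (1 - C_ij) ** (2 / (t_i + t_j))."""
    return 1.0 - (1.0 - c_ij) ** (2.0 / (t_i + t_j))


def ne_from_inbreeding(f: list[float], t: list[float]) -> float:
    """N_e = 1 / (2 * mean Delta F_i); assumes every t_i > 1."""
    return 1.0 / (2.0 * mean(delta_f(fi, ti) for fi, ti in zip(f, t)))


def ne_from_coancestry(c: list[list[float]], t: list[float]) -> float:
    """N_e = 1 / (2 * mean Delta C_ij) over all pairs i < j of a symmetric matrix."""
    pairs = combinations(range(len(t)), 2)
    return 1.0 / (2.0 * mean(delta_c(c[i][j], t[i], t[j]) for i, j in pairs))


# One individual with F = 0.20 and t = 3.
print(round(delta_f(0.20, 3.0), 3))                 # 0.106
print(round(ne_from_inbreeding([0.20], [3.0]), 2))  # 4.74
```

Averaging the individual ΔF i values, rather than converting a mean F, is what lets individuals with different pedigree depths contribute to a single estimate.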
Single‐sample molecular estimates of effective population size
N e(LD) was estimated as $1/[3(\hat{r}^2 - 1/S)]$, where $\hat{r}^2$ is the estimate of the squared correlation of alleles at pairs of loci and S is the sample size, using the modification proposed by Waples (2006), which corrects for biases resulting from the presence of rare alleles and was empirically adapted to different sample sizes and mating systems (here, large sample sizes, S ≥ 30, and random mating apply; see Waples and Do 2010). To check the consistency of the results, three separate analyses were performed, removing, respectively, alleles with frequencies (P crit) lower than 0.05, 0.02, and 0.01. A jackknife procedure was used to construct 95% confidence intervals of the estimates.
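For illustration, the sketch below computes Hill's basic estimator and a large‐sample bias‐corrected version. The constants in the second function (expected sampling r² of 1/S + 3.19/S² and the 2.76 term) are those we recall for S ≥ 30 from Waples (2006); they are an assumption to be checked against that paper, and the function is not NeEstimator's implementation. Computing $\hat{r}^2$ from genotypes, the P crit filtering, and the jackknife are omitted, and the input values are illustrative, not taken from Table 2.

```python
import math


def ne_hill(r2: float, s: float) -> float:
    """Hill (1981): N_e = 1 / (3 * (r2 - 1/S))."""
    return 1.0 / (3.0 * (r2 - 1.0 / s))


def ne_ld_large_sample(r2: float, s: float) -> float:
    """Bias-corrected LD estimator for S >= 30 (constants as recalled from
    Waples 2006; verify before use). Returns math.inf when no drift signal remains."""
    if s < 30:
        raise ValueError("the constants used here are for S >= 30")
    expected_sampling_r2 = 1.0 / s + 3.19 / s**2
    r2_prime = r2 - expected_sampling_r2  # LD attributable to drift
    if r2_prime <= 0.0:
        return math.inf
    disc = 1.0 / 9.0 - 2.76 * r2_prime
    if disc < 0.0:
        raise ValueError("r2' too large for this approximation")
    return (1.0 / 3.0 + math.sqrt(disc)) / (2.0 * r2_prime)


s = 100
print(round(ne_hill(0.02, s), 1))                                    # 33.3
print(round(ne_ld_large_sample(0.01 + 1 / s + 3.19 / s**2, s), 1))  # 31.1
```

Both functions subtract the LD expected from sampling alone before inverting, which is why a sample whose $\hat{r}^2$ barely exceeds 1/S yields a very large (or infinite) N e.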
For consistency with the genealogical methods, single‐sample N e was also estimated using the molecular coancestry method proposed by Nomura (2008) as N e(M) = $1/(2\bar{f})$, where $\bar{f}$ is the average, over the n(n−1)/2 pairs of individuals, of the molecular coancestry between two individuals i and j over the L loci analyzed, and $s_l = \sum_i p_i^2$ is the expected homozygosity at locus l. Nomura (2008) followed the suggestion of Oliehoek et al. (2006) of (1) removing from the computations those alleles alike‐in‐state but not identical‐by‐descent and (2) weighting the contributions of loci by a function of the allele frequencies p i at each locus, to increase the importance of loci with small s l and balanced allele frequencies. This method uses alleles at any frequency. A jackknife procedure was used to construct 95% confidence intervals of the estimates.
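The sketch below is a simplified, hypothetical version of this single‐sample estimator: each pairwise, per‐locus coancestry is corrected for alikeness in state as (f − s l)/(1 − s l), loci are weighted equally (Nomura's weighting is not reproduced), self‐coancestries are excluded, and N e(M) = 1/(2 f̄). The input layout and function names are assumptions, and this is not NeEstimator's implementation. The last line checks only the final step against F3 in Table 2.

```python
import math
from itertools import combinations


def locus_coancestry(g1: tuple, g2: tuple) -> float:
    """Probability that one allele drawn at random from each individual is alike in state."""
    return sum(a == b for a in g1 for b in g2) / 4.0


def ne_molecular_coancestry(genotypes: list, allele_freqs: list) -> float:
    """Simplified single-sample N_e = 1 / (2 * mean pairwise coancestry).

    genotypes[i][l] is the (allele, allele) pair of individual i at locus l;
    allele_freqs[l] lists the allele frequencies at locus l (polymorphic loci only,
    so that s_l < 1). Loci are weighted equally, unlike Nomura (2008).
    """
    s = [sum(p * p for p in freqs) for freqs in allele_freqs]  # expected homozygosity
    n_loci = len(s)
    pair_means = []
    for i, j in combinations(range(len(genotypes)), 2):  # self-coancestries excluded
        corrected = [
            (locus_coancestry(genotypes[i][l], genotypes[j][l]) - s[l]) / (1.0 - s[l])
            for l in range(n_loci)
        ]
        pair_means.append(sum(corrected) / n_loci)
    f_bar = sum(pair_means) / len(pair_means)
    return math.inf if f_bar <= 0.0 else 1.0 / (2.0 * f_bar)


# Final step only: a mean coancestry of 0.0681 (F3, Table 2) gives N_e(M) of about 7.3.
print(round(1.0 / (2.0 * 0.0681), 1))  # 7.3
```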
Two‐sample molecular estimates of effective population size
To illustrate differences between single‐sample and two‐sample estimators of molecular‐based N e, the unbiased temporal method proposed by Jorde and Ryman (2007), which has been shown to give consistent estimates across cohort pairs in a livestock framework (Goyache et al. 2011), was also applied. This method is based on the estimator F s, computed over the A alleles at each locus from x i and y i, the frequencies of the ith allele in the first and second samples, respectively, and z i, the average frequency of the ith allele over samples. Computations were performed under sampling plan I (Waples 1989), in which individuals are sampled nondestructively and subsequently returned to the population. Under this sampling plan, the Jorde and Ryman (2007) estimator of N e (here noted as N e(JR)) is obtained from F s together with n y, the number of individuals in the second sample, ñ, the harmonic mean of the sample sizes n x and n y, and N, the actual census size of the population at the time of first sampling. A jackknife procedure was used to construct 95% confidence intervals of the estimates.
Demographic adjustment for bias
Following Waples et al. (2014), N e(LD) estimates were corrected by multiplying them by the ratio N b/N e (i.e., dividing them by N e/N b), where N b and N e are the effective number of breeders and the effective population size, respectively. The ratio N b/N e was estimated from demographic information using a discrete‐time, age‐structured, deterministic model with age‐specific survival rates (s x) and birth rates (b x) calculated separately for males and females (Table S1). The model assumes that (1) reproduction occurs at intervals of exactly one time unit (here one year); (2) survival and fecundity are independent of events in previous time periods; (3) there is no upper bound to the number of offspring an individual can produce in one breeding cycle; and (4) fecundities are scaled so that the population is stable and produces a fixed number (N1) of individuals per cohort that survive to their first birthday (age 1).
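Multiplying the original (bracketed) N e(LD) estimates in Table 2 by N b/N e = 0.667 reproduces the corrected values reported there; the short sketch below makes that arithmetic explicit. The function name is hypothetical, and the ratio itself comes from AgeNe, which is not reimplemented here.

```python
def adjust_ld_for_age_structure(ne_ld_raw: float, nb_over_ne: float = 0.667) -> float:
    """Scale a mixed-cohort N_e(LD) estimate by N_b/N_e (0.667 in this study)."""
    return ne_ld_raw * nb_over_ne


# Original N_e(LD) estimates for the yearly cohorts (Table 2, values in square brackets).
original = {"Cohort2006": 9.4, "Cohort2007": 41.2, "Cohort2008": 25.3,
            "Cohort2009": 28.4, "Cohort2010": 19.8}
for name, value in original.items():
    print(name, round(adjust_ld_for_age_structure(value), 1))
# Cohort2006 6.3, Cohort2007 27.5, Cohort2008 16.9, Cohort2009 18.9, Cohort2010 13.2
```

Because N b < N e here, the adjustment always lowers the estimate, which is consistent with the upward bias of the unadjusted values.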
Following Jorde and Ryman (1995, 1996), N e(JR) estimates were corrected by multiplying them by the ratio C/G, where C is a correction factor obtained from life table data (see Table S2) and G is the generation interval. Factor C accounts for variance due to mortality as a cohort passes from one age class to the next and for genetic covariance among cohorts (because individuals from multiple age classes are the parents of a given cohort). Computing factor C requires a basic life table with age‐specific survival rates (l i) and birth rates (i.e., gametic contributions; b i) for each age class i (see Table S2).
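As a check on the arithmetic, the sketch below applies C/G with the values used in this study (C = 4.01 from FactorC and G = 1.8 years from Menéndez et al. 2016a) to the original N e(JR) estimates of Table 3. The function name is hypothetical; computing C itself from the life table is not attempted.

```python
def adjust_jr_for_overlap(ne_jr_raw: float, c: float = 4.01, g: float = 1.8) -> float:
    """Multiply a temporal N_e(JR) estimate by C/G (C = 4.01, G = 1.8 years here)."""
    return ne_jr_raw * c / g


print(round(4.01 / 1.8, 2))  # 2.23
# Original estimates from Table 3: consecutive pairs, then three-year pairs.
for raw in (6.0, 14.4, 7.0, 14.8, 11.4, 11.6, 10.6):
    print(raw, round(adjust_jr_for_overlap(raw), 1))
# adjusted: 13.4, 32.1, 15.6, 33.0, 25.4, 25.8, 23.6 (as in Table 3)
```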
In all cases, life table data were estimated directly from the Gochu Asturcelta pig pedigree, limiting parental age to 5 years.
Software used
All demographic and genealogical analyses were computed using the program ENDOG v4.8 (Gutiérrez and Goyache 2005) freely available at http://www.ucm.es/info/prodanim/html/JP_Web.htm
Molecular‐based estimates of N e were computed in all cases using the program NeEstimator (Do et al. 2014) freely available at http://www.molecularfisherieslaboratory.com.au/neestimator-software/.
Ratio N b/N e was computed from life table data using the program AgeNe (Waples et al. 2011, 2013) freely available at http://conserver.iugo-cafe.org/user/Robin%20Waples/AgeNe.
The Jorde and Ryman's (1995, 1996) correction factor C was computed using a program kindly provided by Dr P. E. Jorde (http://folk.uio.no/ejorde/software/factorc.zip).
Results
Table 2 gives the estimates of N e obtained using linkage disequilibrium (N e(LD)), molecular coancestry (N e(M)), and pedigree information. When discrete filial generations were considered, genealogical estimates of N e were very similar, varying from N e F i = 5.0 ± 0.8 for F3 to N e C ij = 5.6 ± 0.3 for F4. Estimates of N e F i and N e C ij were comparable across yearly and biannual samplings, with the lowest estimates for the cohort born in 2006. The average individual increase in inbreeding ($\overline{\Delta F}$) tended to have similar values across both yearly and biannual samplings. However, both N e F i and N e C ij tended to increase with pedigree depth (and size of the breeding stock), varying from N e F i = 4.6 ± 1.9 for Cohort2006 to N e C ij = 9.2 ± 0.3 for Cohort2010. Further, the ratio N e C ij/N e F i was roughly 1 for F3 and F4. However, this ratio increased over the years from 1.09 for Cohort2006 to 1.39 for Cohort2010, suggesting the existence of slight hidden structure in the Gochu Asturcelta pedigree (Table 2).
Table 2.
Number of individuals (N) involved and estimates of effective size for each discrete generation, yearly cohort, and biannual sampling analyzed in the Gochu Asturcelta pig breed population, computed via molecular‐based methods (linkage disequilibrium, N e(LD), and molecular coancestry, N e(M)) and pedigree information (individual increase in inbreeding, N e F i, and individual increase in coancestry, N e C ij). The 95% confidence intervals of the molecular‐based estimates are given in parentheses, and the standard errors of the genealogical estimates follow ±. Additionally, the estimated squared correlation among alleles ($\hat{r}^2$) and the mean molecular coancestry ($\bar{f}$) are given for the molecular‐based methods, and mean inbreeding (F), mean equivalent to complete generations (t), and average individual increase in inbreeding ($\overline{\Delta F}$) are provided for pedigree data.
| Sampling | Nᵇ (molecular) | $\hat{r}^2$ | N e(LD)ᶜ | $\bar{f}$ | N e(M) | Nᵇ (pedigree) | F | t | $\overline{\Delta F}$ | N e F i | N e C ij | N e C ij/N e F i |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Discrete generations | | | | | | | | | | | | |
| Generation 3 | 191 | 0.00532 | 17.0 (14.4; 19.9) | 0.0681 | 7.3 (5.2; 9.9) | 216 | 0.20 ± 0.08 | 3 | 0.10 ± 0.04 | 5.0 ± 0.8 | 5.4 ± 0.5 | 1.08 |
| Generation 4 | 133 | 0.00770 | 17.3 (13.5; 21.8) | 0.0951 | 5.3 (2.3; 9.3) | 147 | 0.25 ± 0.04 | 4 | 0.09 ± 0.01 | 5.5 ± 0.2 | 5.6 ± 0.3 | 1.02 |
| Yearly sampling | | | | | | | | | | | | |
| Cohort2006 | 74 | 0.01410 | 6.3ᵈ [9.4ᵉ (7.7; 11.2)] | 0.0925 | 5.4 (4.1; 6.8) | 83 | 0.17 ± 0.15 | 2.7 ± 0.7 | 0.10 ± 0.10 | 4.6 ± 1.9 | 5.0 ± 0.6 | 1.09 |
| Cohort2007 | 136 | 0.00753 | 27.5ᵈ [41.2ᵉ (31.9; 53.7)] | 0.0390 | 12.8 (3.1; 29.4) | 235 | 0.20 ± 0.10 | 3.2 ± 0.4 | 0.10 ± 0.05 | 5.2 ± 1.0 | 6.2 ± 0.4 | 1.19 |
| Cohort2008 | 211 | 0.00481 | 16.9ᵈ [25.3ᵉ (20.5; 31.1)] | 0.0666 | 7.5 (4.1; 12) | 563 | 0.22 ± 0.05 | 3.8 ± 0.4 | 0.08 ± 0.02 | 6.0 ± 0.6 | 7.2 ± 0.4 | 1.20 |
| Cohort2009 | 225 | 0.00451 | 18.9ᵈ [28.4ᵉ (23.3; 34.4)] | 0.0442 | 11.3 (6.3; 18.8) | 724 | 0.22 ± 0.05 | 4.2 ± 0.5 | 0.08 ± 0.02 | 6.3 ± 0.8 | 8.3 ± 0.4 | 1.32 |
| Cohort2010 | 134 | 0.00764 | 13.2ᵈ [19.8ᵉ (16.4; 23.7)] | 0.0062 | 8.4 (0.1; 40.4) | 643 | 0.25 ± 0.07 | 4.7 ± 0.6 | 0.08 ± 0.02 | 6.6 ± 0.7 | 9.2 ± 0.3 | 1.39 |
| Biannual samplingᵃ | | | | | | | | | | | | |
| Sampling2006–2007 | 210 | 0.00483 | 20.1ᵈ [30.2ᵉ (25.8; 38.3)] | 0.0500 | 10.0 (6.3; 14.5) | 318 | 0.19 ± 0.11 | 3.1 ± 0.5 | 0.10 ± 0.06 | 5.0 ± 1.4 | 5.7 ± 0.5 | 1.14 |
| Sampling2007–2008 | 347 | 0.00291 | 24.2ᵈ [36.3ᵉ (29.3; 44.6)] | 0.0356 | 14.1 (4.6; 28.8) | 798 | 0.21 ± 0.07 | 3.6 ± 0.5 | 0.09 ± 0.03 | 5.7 ± 0.8 | 6.7 ± 0.4 | 1.18 |
| Sampling2008–2009 | 436 | 0.00231 | 20.0ᵈ [30.0ᵉ (25.2; 35.5)] | 0.0570 | 8.8 (5.7; 12.5) | 1287 | 0.22 ± 0.05 | 4.0 ± 0.5 | 0.08 ± 0.02 | 6.2 ± 0.7 | 7.7 ± 0.4 | 1.24 |
| Sampling2009–2010 | 359 | 0.00281 | 21.8ᵈ [32.7ᵉ (28.3; 37.7)] | 0.0240 | 20.8 (7.2; 41.5) | 1367 | 0.24 ± 0.06 | 4.4 ± 0.6 | 0.08 ± 0.03 | 6.5 ± 0.7 | 8.6 ± 0.4 | 1.32 |
ᵃ Sampling mimicking the mean generation interval reported by Menéndez et al. (2016a) for the whole pedigree of the Gochu Asturcelta breed (1.8 ± 0.03 years).
ᵇ Number of individuals involved in the estimates.
ᶜ Values obtained removing alleles with frequencies (P crit) lower than 0.05.
ᵈ Estimates of effective size after correction for bias due to age structure.
ᵉ Original estimates of effective size and confidence intervals.
N e(LD) took values of about 17 for both discrete filial generations F3 and F4 (17.0 and 17.3, respectively; Table 2). Estimates of N e(LD) obtained for yearly or biannual samplings were adjusted for generation overlap using the ratio N b/N e computed from demographic information. This ratio took a value of 0.667, corresponding to demographic estimates of N b and N e of 222.9 and 334.9, respectively. Estimates of N e(LD) were highly consistent regardless of the P crit used; therefore, only estimates obtained using P crit = 0.05 are given. When yearly samplings were considered, the corrected estimates were similar to those obtained for the discrete filial generations when sample size was large (16.9 for Cohort2008 and 18.9 for Cohort2009). However, when sample size was smaller (Cohorts 2006, 2007, and 2010), estimates were clearly biased downward or upward. Using biannual samplings, mimicking the average generation interval as recommended by Waples et al. (2013, 2014), the corrected estimates were biased upward, varying from 20.0 for Sampling2008–2009 to 24.2 for Sampling2007–2008. Extending the sampling period to three years did not change this pattern (Table S3). Notably, before demographic correction (using the ratio N b/N e), N e(LD) estimates were always unacceptably biased upward (Table 2).
In general, estimates of N e(M) took lower values than their N e(LD) counterparts (Table 2). N e(M) for discrete filial generation F4 (5.3) was lower than that for F3 (7.3), owing to a noticeable increase in molecular coancestry (9.51% vs. 6.81% in F3). When yearly or biannual samplings were considered, estimates of N e(M) followed a trend similar to that of N e(LD): the higher the N e(LD) values, the higher the N e(M) estimates. Except for Sampling2008–2009, decreases in $\hat{r}^2$ coincided with lower molecular coancestry values, leading to estimates of both N e(M) and N e(LD) that were highly biased upward (see Cohort2007 in Table 2). Again, extending the sampling period to three years did not improve the estimation of N e (Table S3).
Estimates of N e were also obtained using a temporal method, previously tested in the livestock framework (Goyache et al. 2011), to gain more evidence on the performance of single‐sample methods when samples are drawn from a number of yearly cohorts (Table 3). Estimates were corrected for overlapping generations by multiplying the original values by the ratio C/G (2.23), corresponding to a correction factor C, computed following Jorde and Ryman (1995, 1996), of 4.01. Although the program FactorC gave an estimate of the generation interval, G, of 1.93 years, the "real" G of the population, 1.8 (± 0.03) years, reported by Menéndez et al. (2016a), was used, at the risk of slightly overestimating the N e(JR) values. When consecutive yearly samplings were considered, the estimates of N e(JR) varied noticeably, fluctuating from 13.4 (Cohort2006 − Cohort2007) to 33.0 (Cohort2009 − Cohort2010). When the two samples spanned three yearly cohorts (i.e., were two years apart), the estimates became more consistent, varying from 23.6 (from Cohort2008 to Cohort2010) to 25.8 (from Cohort2007 to Cohort2009), suggesting that the drift signal in consecutive yearly samplings was not strong enough to give reliable estimates of N e. Note that the N e(JR) estimates for three‐year samplings were slightly higher than the adjusted N e(LD) estimates obtained for biannual samplings (Table 2) and slightly lower than the adjusted three‐year sampling N e(LD) estimates (Table S3). In any case, these N e(JR) and N e(LD) estimates were fully comparable.
Table 3.
Estimates of N e obtained in the Gochu Asturcelta pig population using the temporal method of Jorde and Ryman (2007; N e(JR)) with all possible combinations formed by consecutive and triennial samplings of the five yearly cohorts available. Both the original estimates of N e(JR) and those adjusted for overlapping generations are given. The 95% confidence intervals of the original estimates are given in parentheses. Sample sizes for each sampling regime are also provided.
| Sample regime | Sample size | N e(JR), original | N e(JR), adjusted | 95% confidence interval (original) |
|---|---|---|---|---|
| Subsequent cohorts | ||||
| From Cohort2006 to Cohort2007 | 74–136 | 6.0 | 13.4 | (4.0;11.9) |
| From Cohort2007 to Cohort2008 | 136–211 | 14.4 | 32.1 | (11.3;19.7) |
| From Cohort2008 to Cohort2009 | 211–225 | 7.0 | 15.6 | (4.8;13.6) |
| From Cohort2009 to Cohort2010 | 225–134 | 14.8 | 33.0 | (11.4;20.9) |
| Triennial sampling | ||||
| From Cohort2006 to Cohort2008 | 74–211 | 11.4 | 25.4 | (8.0;19.4) |
| From Cohort2007 to Cohort2009 | 136–225 | 11.6 | 25.8 | (7.0;32.3) |
| From Cohort2008 to Cohort2010 | 211–134 | 10.6 | 23.6 | (7.8;16.4) |
Discussion
The Gochu Asturcelta pig breed offers a particular scenario that is useful to illustrate the performance of single‐sample methods to estimate N e in animal populations using molecular information. The breeding policy implemented by the breeders association makes it possible to identify individuals that can be classified into discrete filial generations and, therefore, to compare the performance of different methods to estimate N e under two different scenarios: overlapping generations and discrete generations.
Genealogical estimates of effective size obtained using the individual increase in inbreeding (N e F i) and the individual increase in coancestry (N e C ij) were consistent across reference populations (samples) and are in full agreement with those recently reported by Menéndez et al. (2016a) for the most recently registered populations. Genealogical estimates are provided as a frame of reference for understanding the performance of the molecular‐based methods to estimate N e. Note that the genealogical methods applied correct for differences in pedigree depth and completeness among the individuals forming a reference population and, indirectly, account for the effects of mating policy, drift, overlapping generations, selection, and migration, as these are reflected in the pedigree of each individual (Cervantes et al. 2008, 2009; Gutiérrez et al. 2008). Moreover, after the modification suggested by Gutiérrez et al. (2009), and further applied to N e C ij by Cervantes et al. (2011a), N e F i accounts for the absence of self‐fertilization, making it possible to obtain useful estimates of N e from pedigrees with three equivalents to complete generations on average. In the current analysis, the lowest estimates of N e F i and N e C ij were obtained for the 2006 yearly cohort, whose mean pedigree depth (t = 2.7 ± 0.7) is at the limit of estimability (Gutiérrez et al. 2009).
It is not surprising that molecular‐based estimates of N e are higher than those obtained using genealogical data. Very recently, Silió et al. (2016), analyzing two experimental pig lines kept in herds closed for 24–28 generations and subject to a strict minimum coancestry mating policy, reported that molecular‐based estimates of N e based on either inbreeding or coancestry tended to exceed their genealogical counterparts. Unlike pedigree information, which refers to a virtually infinite number of loci, criteria based on observed molecular polymorphism refer to a finite number of loci. Nevertheless, the sample sizes and number of loci used here can be considered sufficient to obtain reliable estimates of effective population size even if the expected N e were moderate or large (Antao et al. 2010).
Performance of the N e(LD) method
Even when discrete filial generations are considered, estimates of N e(LD) are at least threefold higher than the corresponding genealogical estimates (Table 2). However, estimates of N e(LD) for filial generations F3 and F4 were lower and steadier than those obtained using yearly or biannual samplings. The linkage disequilibrium method relies on the fact that, in a system where gametes are randomly distributed among a small number of zygotes, there will be departures from expected genotype frequencies and from expected gametic frequencies, both of which can be used to estimate N e (Hill 1981; Waples 1991). These assumptions hold well only for samples obtained from age‐structured populations. Moreover, with overlapping generations, it is hard to assume that the available samples derive from a population of constant size. If population size changes, the "background" LD from previous generations that has not yet been broken down by recombination between loci and the new LD generated by reproduction of a finite number of individuals reflect different effective sizes; therefore, estimates of N e based on $\hat{r}^2$ can be biased upward or downward for a few generations (Waples 2005; Waples et al. 2014).
In any case, estimates of $\hat{r}^2$ obtained from molecular information in the Gochu Asturcelta pig breed can be biased upward even when discrete filial generations are considered. Assuming selective neutrality and constant population size, demographic information allows $r^2$ to be predicted as $1/[3H(N_e, N_b)]$, where H(N e, N b) is the harmonic mean of N e and N b (see formula (5) in Waples et al. 2014). The demographic prediction of $r^2$ here would be 0.00125, which is lower than the values of $\hat{r}^2$ obtained from molecular information for every sample considered (Table 2). Population studies ideally assume that LD is estimated from samples of unrelated individuals. This assumption does not hold in the pig population analyzed here and is unlikely to hold in most livestock or natural animal populations, thereby biasing the estimates of $\hat{r}^2$ upward. Even though the breeding policy of the Gochu Asturcelta population is under strict control, some hidden structuring, characterized by the ratio N e C ij/N e F i (Cervantes et al. 2011a), has appeared, probably due to excessive use for reproduction of the descendants of two founders (Menéndez et al. 2016a).
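The demographic prediction quoted in the paragraph above can be reproduced from the AgeNe values reported in the Results (N e = 334.9, N b = 222.9): their harmonic mean is about 267.7, and 1/(3 × 267.7) ≈ 0.00125. The sketch below simply makes that calculation explicit; the function names are hypothetical.

```python
def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two positive numbers."""
    return 2.0 / (1.0 / a + 1.0 / b)


def expected_drift_r2(ne: float, nb: float) -> float:
    """Predicted r^2 ~ 1 / (3 * H(N_e, N_b)) under neutrality and constant size."""
    return 1.0 / (3.0 * harmonic_mean(ne, nb))


h = harmonic_mean(334.9, 222.9)
print(round(h, 1), round(expected_drift_r2(334.9, 222.9), 5))  # 267.7 0.00125
```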
Our results confirm that estimates of N e(LD) obtained for filial generations F3 and F4 are more reliable than those obtained in scenarios with overlapping generations. Moreover, without demographic adjustment, N e(LD) estimates for yearly or biannual sampling schemes were strongly biased upward (Table 2). Yearly samplings appeared clearly insufficient to obtain sound estimates of N e(LD), probably due to small sample size (Cohorts 2006 and 2010) or sampling bias (Cohort 2007; see Table 1). Estimates can be substantially biased at small sample sizes unless the true N e is smaller than the sample size used to estimate it (England et al. 2006; Waples 2006). Although biased upward, the current results confirm that N e(LD) estimates are more reliable when the sampling span approaches a generation length (Waples et al. 2014). Such a sampling span increases sample size but also "homogenizes" the actual number of breeders producing the sample across estimates. As theory suggests that N e(LD) estimates are a function of the harmonic mean of N e and N b (Waples et al. 2014), N e(LD) should converge on the true N e when the sampling span approaches the generation length.
Performance of other molecular‐based methods to estimate N e
The Nomura (2008) coancestry‐based method gave lower estimates of N e than N e(LD) (Tables 2 and S3), closer to the “real” genealogical ones in the case of discrete filial generations. However, this may be because N e(M) is more closely related to the number of founders represented in the samples than to population size. Caballero and Toro (2002) reported that 1/(2f), where f is the average molecular coancestry of the analyzed population, is actually the number of founder genome equivalents (N g). N g is a key parameter for assessing genetic losses due to drift; it can be defined as the theoretically expected number of founders that would be required to provide the genetic diversity in the analyzed population if the founders were equally represented and had lost no alleles (Ballou and Lacy 1995). This definition is conceptually different from that of effective population size, the evolutionary analogue of census size proposed by Wright (1931): the size of an idealized population that would give rise to the rate of inbreeding, or the rate of change in the variance of gene frequencies, observed in the analyzed population. Even though the method by Nomura (2008) adjusts for the presence of alleles alike‐in‐state (Oliehoek et al. 2006), the main difference between his method and Caballero and Toro's (2002) approach is that self‐coancestries (s i = (1 + F i)/2, where F i is the individual homozygosity in a molecular context), the diagonal elements of the between‐individual coancestry matrix, are not included in the computations. Because self‐coancestries are usually larger than between‐individual coancestries, the average coancestry used by Nomura's method is lower than f and, therefore, N e(M) > N g. Self‐coancestries weigh heavily in the computation of f: the smaller the sample, the greater the weight of self‐coancestries on f (Cervantes et al. 2011b). In any case, our results agree with those of Miller et al. (2015) in bighorn sheep, suggesting that molecular coancestry and LD tend to covary. Therefore, the two estimates of N e (N e(M) and N e(LD)) cannot be considered independent.
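The effect of excluding self‐coancestries can be seen on a toy coancestry matrix. The Python sketch below is not Nomura's (2008) estimator and applies no correction for alleles alike‐in‐state. It only computes the average coancestry with and without the diagonal, together with N g = 1/(2f) (Caballero and Toro 2002). The matrices are hypothetical, with s i = (1 + F i)/2 on the diagonal and a single constant value off the diagonal.

```python
from typing import List

Matrix = List[List[float]]


def self_coancestry(f_i: float) -> float:
    """Self-coancestry s_i = (1 + F_i) / 2 from individual homozygosity F_i."""
    return (1.0 + f_i) / 2.0


def mean_coancestry(m: Matrix, include_self: bool = True) -> float:
    """Average of the coancestry matrix, optionally skipping the diagonal."""
    n = len(m)
    values = [m[i][j] for i in range(n) for j in range(n)
              if include_self or i != j]
    return sum(values) / len(values)


def founder_genome_equivalents(f: float) -> float:
    """N_g = 1 / (2 f) (Caballero and Toro 2002)."""
    return 1.0 / (2.0 * f)


def toy_matrix(n: int, f_i: float, f_ij: float) -> Matrix:
    """Hypothetical n x n matrix: s_i on the diagonal, f_ij elsewhere."""
    return [[self_coancestry(f_i) if i == j else f_ij for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    for n in (4, 40):
        m = toy_matrix(n, f_i=0.1, f_ij=0.2)
        f_all = mean_coancestry(m, include_self=True)
        f_off = mean_coancestry(m, include_self=False)
        # The diagonal holds n of the n * n entries, i.e. a weight of 1 / n.
        print(f"n={n:2d} diagonal weight={1 / n:.3f}  "
              f"f={f_all:.4f} (Ng={founder_genome_equivalents(f_all):.2f})  "
              f"f without self={f_off:.4f} "
              f"(1/2f={founder_genome_equivalents(f_off):.2f})")
```

Dropping the diagonal lowers the average coancestry and so raises 1/(2f); the gap between the two averages shrinks as n grows (0.2875 versus 0.2 for n = 4, 0.20875 versus 0.2 for n = 40).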
Results obtained using the Jorde and Ryman (2007) approach illustrate that the performance of temporal methods applied to data with overlapping generations is highly dependent on the sampling interval (Waples and Yokota 2007; Barker 2011), owing to the particular age structure of the studied population. Here, N e(JR) estimates obtained using consecutive yearly samples did not accumulate sufficient drift signal and therefore gave inconsistent N e estimates. Conversely, too long a separation between samples gives N e estimates that are highly biased upward (Table S4). Note that, in our example, the genealogical separation between consecutive yearly samples is t ≈ 0.5 equivalent discrete generations, while the samples of the four‐year and five‐year sampling plans (Table S4) are separated by t ≈ 1.5 and t ≈ 2, respectively (Table 2). Under a three‐year sampling plan, the Jorde and Ryman (2007) approach gave consistent estimates, comparable with the adjusted N e(LD) estimates obtained for two‐year (generation‐interval) sampling. In our example, samples obtained under this sampling plan are separated by about t = 1. This scenario is consistent with the performance of this method previously reported in horses (Goyache et al. 2011).
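The sampling‐interval arithmetic above, and why a short interval yields a weak drift signal, can be sketched in Python under two stated assumptions. First, the generation interval is taken as about 2 years, as implied by t ≈ 0.5 between consecutive yearly cohorts, and the five yearly cohorts are assumed to run from 2006 to 2010. Second, drift is measured with a simplified moment‐based temporal estimator, N e = t / (2[F − 1/(2S 0) − 1/(2S t)]), where F is the standardized variance in allele frequency between samples of sizes S 0 and S t. This simplified form is not the Jorde and Ryman (2007) estimator and ignores the correction for overlapping generations; the F values and sample sizes in the usage lines are hypothetical.

```python
import math

GENERATION_INTERVAL_YEARS = 2.0  # assumption: yearly cohorts are t ~ 0.5 apart


def generations_between(first_cohort: int, last_cohort: int,
                        interval: float = GENERATION_INTERVAL_YEARS) -> float:
    """Separation t (in discrete generations) between two yearly cohorts."""
    return (last_cohort - first_cohort) / interval


def temporal_ne(f: float, s0: int, st: int, t: float) -> float:
    """Simplified temporal estimate: t / (2 * (F - 1/(2 S0) - 1/(2 St)))."""
    drift = f - 1.0 / (2.0 * s0) - 1.0 / (2.0 * st)  # F net of sampling noise
    if drift <= 0.0:
        return math.inf  # sampling noise swamps the drift signal
    return t / (2.0 * drift)


if __name__ == "__main__":
    # First and last cohorts of 2-, 3-, 4- and 5-year plans (assumed 2006-2010).
    for last in (2007, 2008, 2009, 2010):
        print(f"2006-{last}: t = {generations_between(2006, last):.1f}")

    # Hypothetical: true Ne = 20, S0 = St = 50, same +/-0.005 error in F.
    for t in (0.5, 1.0):
        f_true = 1.0 / 50 + t / (2.0 * 20.0)  # sampling term + drift term
        estimates = [temporal_ne(f_true + e, 50, 50, t) for e in (0.005, -0.005)]
        print(f"t = {t}: Ne between {estimates[0]:.1f} and {estimates[1]:.1f}")
```

With the same error in F, the estimates range from about 14 to 33 at t = 0.5 but only from about 17 to 25 at t = 1, because the drift component of F doubles while the sampling component stays fixed.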
Conclusions
The current results confirm that N e(LD) can only be considered reliable in populations with overlapping generations when the sampling span approaches a generation interval (Waples et al. 2014). Otherwise, sampling bias can affect LD estimates, probably due to unaccounted variation in molecular coancestry among samples. This may be particularly important in scenarios in which samples are unlikely to consist of unrelated individuals. Furthermore, N e(LD) can only be considered useful if a correction for demographic bias is applied.
In such a framework, even if LD does not vary greatly among yearly cohorts (Miller et al. 2015), the operational advantage of using single‐sample methods to obtain molecular‐based estimates of N e is not clear: while two‐sample methods may need a sampling span exceeding a generation interval, single‐sample methods (namely N e(LD)) need representative sampling of each of the yearly cohorts included in that interval. These concerns apply particularly to natural and domestic populations with long generation intervals. For example, in domestic horses the generation interval usually exceeds 10 years (Cervantes et al. 2009). In such a scenario, it is hard to assume that the available samples are representative of a complete generation interval (Corbin et al. 2010, 2012).
The current study used LD between unlinked loci. The availability of high‐density SNP chips offers the opportunity to estimate N e(LD) using LD between linked loci, thereby improving the performance of the method. However, the concerns about sampling span described above still apply. In fact, high‐density SNP chips have been used to infer the variation of N e over time, expressed as generations in the past (Corbin et al. 2010, 2012; Flury et al. 2010). Even though some of these studies use complex models that account for sources of variation such as sample size, mutation, phasing, or recombination rate, together with data on thousands of linked SNPs (Corbin et al. 2012; Barbato et al. 2015), N e(LD) estimates at a given point in time are always a function of both LD and the between‐SNP distance in Morgans (c). As the choice of c is usually arbitrary, historical estimates of N e mainly depend on LD, which, in turn, depends on the sampling and demographic structure of the studied population.
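The joint dependence on LD and c can be made explicit with the classic approximation for linked loci, E(r²) ≈ 1/(1 + 4N e c), solved for N e and combined with the common convention that LD at distance c reflects N e about 1/(2c) generations ago. The Python sketch below assumes that simple form only. It omits the sample‐size, mutation and phasing corrections that tools such as SNeP (Barbato et al. 2015) can apply, and its r² and c values are hypothetical.

```python
def historical_ne(r2: float, c: float) -> float:
    """Solve E(r2) = 1 / (1 + 4 Ne c) for Ne, with c in Morgans."""
    if not 0.0 < r2 < 1.0 or c <= 0.0:
        raise ValueError("need 0 < r2 < 1 and c > 0")
    return (1.0 / r2 - 1.0) / (4.0 * c)


def generations_ago(c: float) -> float:
    """Common convention: LD at distance c reflects Ne ~1/(2c) generations ago."""
    return 1.0 / (2.0 * c)


if __name__ == "__main__":
    r2 = 0.2  # hypothetical mean r2 for a distance bin
    for c in (0.01, 0.05):
        print(f"c={c:.2f} M: Ne={historical_ne(r2, c):6.1f}, "
              f"about {generations_ago(c):.0f} generations ago")
    # Same c, r2 inflated (e.g. by related individuals in the sample).
    print(f"c=0.05 M, r2=0.25: Ne={historical_ne(0.25, 0.05):6.1f}")
```

The same r² maps to N e = 100 or N e = 20 depending on the c assigned to it, and once c is fixed by the analyst, any upward bias in r² translates directly into a lower N e (15 instead of 20 in the last line).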
Overall, the empirical evidence given in the current study confirms that estimates of N e obtained using methods based on molecular information should be interpreted with caution (Barker 2011; Goyache et al. 2011; Putman and Carbone 2014).
Supporting information
Table S1. Life table used to calculate the ratio between effective number of breeders (N b) and effective population size (N e) using demographic data, as proposed by Waples et al. (2014, see references section).
Table S2. Life table used to calculate the correction factor (C) for overlapping generations proposed by Jorde and Ryman (1995, 1996, see references section).
Table S3. Number of individuals (N) involved and estimates of effective size for three‐year sampling in the Gochu Asturcelta pig breed population computed via molecular‐based methods (linkage disequilibrium, N e(LD), and molecular coancestry, N e(M)) and pedigree information (individual increase in inbreeding, N e F i, and individual increase in coancestry, N e C ij).
Table S4. Estimates of N e obtained in the Gochu Asturcelta pig population using the temporal method of Jorde and Ryman (2007; N e(JR)) with all possible four‐year and five‐year sampling plans formed with combinations of the five yearly cohorts available.
Acknowledgments
This work was partially funded by a specific contract between the Government of Principado de Asturias and Asociación de Criadores de Gochu Asturcelta (ACGA; http://www.gochuasturcelta.org/). IA, IF, and FG are supported by grant FICYT GRUPIN14‐113.
References
- Antao, T., Perez‐Figueroa A., and Luikart G. 2010. Early detection of population declines: high power of genetic monitoring using effective population size estimators. Evol. Appl. 4:144–154.
- Ballou, J. D., and Lacy R. C. 1995. Identifying genetically important individuals for management of genetic variation in pedigreed populations. Pp. 76–111 in Ballou J. D., Gilpin M., and Foose T. J., eds. Population management for survival and recovery: analytical methods and strategies in small population management. Columbia University Press, NY, USA.
- Barbato, M., Orozco‐terWengel P., Tapio M., and Bruford M. W. 2015. SNeP: a tool to estimate trends in recent effective population size trajectories using genome‐wide SNP data. Front. Genet. 6:109.
- Barker, J. S. F. 2011. Effective population size of natural populations of Drosophila buzzatii, with a comparative evaluation of nine methods of estimation. Mol. Ecol. 20:4452–4471.
- Caballero, A., and Toro M. A. 2002. Analysis of genetic diversity for the management of conserved subdivided populations. Conserv. Genet. 3:289–299.
- Cervantes, I., Goyache F., Molina A., Valera M., and Gutiérrez J. P. 2008. Application of individual increase in inbreeding to estimate effective sizes from real pedigrees. J. Anim. Breed. Genet. 125:301–310.
- Cervantes, I., Gutiérrez J. P., Molina A., Goyache F., and Valera M. 2009. Genealogical analyses in open populations: the case of three Arab‐derived Spanish horse breeds. J. Anim. Breed. Genet. 126:335–347.
- Cervantes, I., Goyache F., Molina A., Valera M., and Gutiérrez J. P. 2011a. Estimation of effective population size from the rate of coancestry in pedigreed populations. J. Anim. Breed. Genet. 128:56–63.
- Cervantes, I., Pastor J. M., Gutiérrez J. P., Goyache F., and Molina A. 2011b. Effective population size as a measure of risk status in rare breeds: the case of three Spanish ruminant breeds. Livest. Sci. 138:202–206.
- Corbin, L. J., Blott S. C., Swinburne J. E., Vaudin M., Bishop S. C., and Woolliams J. A. 2010. Linkage disequilibrium and historical effective population size in the Thoroughbred horse. Anim. Genet. 41:8–15.
- Corbin, L. J., Liu A. Y. H., Bishop S. C., and Woolliams J. A. 2012. Estimation of historical effective population size using linkage disequilibria with marker data. J. Anim. Breed. Genet. 129:257–270.
- Do, C., Waples R. S., Peel D., Macbeth G. M., Tillett B. J., and Ovenden J. R. 2014. NeEstimator V2: re‐implementation of software for the estimation of contemporary effective population size (Ne) from genetic data. Mol. Ecol. Resour. 14:209–214.
- England, P. R., Cornuet J.‐M., Berthier P., Tallmon D. A., and Luikart G. 2006. Estimating effective population size from linkage disequilibrium: severe bias in small samples. Conserv. Genet. 7:303–308.
- Flury, C., Tapio M., Sonstegard T., Drögemüller C., Leeb T., Simianer H., et al. 2010. Effective population size of an indigenous Swiss cattle breed estimated from linkage disequilibrium. J. Anim. Breed. Genet. 127:339–347.
- Goyache, F., Álvarez I., Fernández I., Pérez‐Pardal L., Royo L. J., and Lorenzo L. 2011. Usefulness of molecular‐based methods for estimating effective population size in livestock assessed using data from the endangered black‐coated Asturcón pony. J. Anim. Sci. 89:1251–1259.
- Gutiérrez, J. P., and Goyache F. 2005. A note on ENDOG: a computer program for analysing pedigree information. J. Anim. Breed. Genet. 122:172–176.
- Gutiérrez, J. P., Cervantes I., Molina A., Valera M., and Goyache F. 2008. Individual increase in inbreeding allows estimating realised effective sizes from pedigrees. Genet. Sel. Evol. 40:359–378.
- Gutiérrez, J. P., Cervantes I., and Goyache F. 2009. Improving the estimation of realized effective population sizes in farm animals. J. Anim. Breed. Genet. 126:327–332.
- Hill, W. G. 1981. Estimation of effective population size from data on linkage disequilibrium. Genet. Res. 38:209–216.
- Jorde, P. E., and Ryman N. 1995. Temporal allele frequency change and estimation of effective size in populations with overlapping generations. Genetics 139:1077–1090.
- Jorde, P. E., and Ryman N. 1996. Demographic genetics of brown trout (Salmo trutta) and estimation of effective population size from temporal change of allele frequencies. Genetics 143:1369–1381.
- Jorde, P. E., and Ryman N. 2007. Unbiased estimator for genetic drift and effective population size. Genetics 177:927–935.
- Luikart, G., Ryman N., Tallmon D. A., Schwartz M. K., and Allendorf F. W. 2010. Estimation of census and effective population sizes: the increasing usefulness of DNA‐based approaches. Conserv. Genet. 11:355–373.
- Maignel, L., Boichard D., and Vérrier E. 1996. Genetic variability of French dairy breeds estimated from pedigree information. Interbull Bull. 14:49–54.
- Menéndez, J., Álvarez I., Fernández I., de la Roza B., and Goyache F. 2015. Multiple paternity in domestic pig under equally probable natural matings. A case study in the endangered Gochu Asturcelta pig breed. Arch. Anim. Breed. 58:217–220.
- Menéndez, J., Álvarez I., Fernández I., and Goyache F. 2016a. Genealogical analysis of the Gochu Asturcelta pig breed: insights for conservation. Czech J. Anim. Sci. 61:140–143.
- Menéndez, J., Goyache F., Beja‐Pereira A., Fernández I., Menéndez‐Arias N. A., Godinho R., et al. 2016b. Genetic characterization of the endangered Gochu Asturcelta pig breed using microsatellite and mitochondrial markers: insights for the composition of the Iberian native pig stock. Livest. Sci. 187:162–167.
- Miller, J. M., Poissant J., Malenfant R. M., Hogg J. T., and Coltman D. W. 2015. Temporal dynamics of linkage disequilibrium in two populations of bighorn sheep. Ecol. Evol. 5:3401–3412.
- Nomura, T. 2008. Estimation of effective number of breeders from molecular coancestry of single cohort sample. Evol. Appl. 1:462–474.
- Oliehoek, P. A., Windig J. J., van Arendonk J. A. M., and Bijma P. 2006. Estimating relatedness between individuals in general populations with a focus on their use in conservation programs. Genetics 173:483–496.
- Putman, A. I., and Carbone I. 2014. Challenges in analysis and interpretation of microsatellite data for population genetic studies. Ecol. Evol. 4:4399–4428.
- Silió, L., Barragán C., Fernández A. I., García‐Casco J., and Rodríguez M. C. 2016. Assessing effective population size, coancestry and inbreeding effects on litter size using the pedigree and SNP data in closed lines of the Iberian pig breed. J. Anim. Breed. Genet. 133:145–154.
- Waples, R. S. 1989. A generalized approach for estimating effective population size from temporal changes in allele frequency. Genetics 121:379–391.
- Waples, R. S. 1991. Genetic methods for estimating the effective size of Cetacean populations. Rep. Int. Whal. Comm. (Special issue) 13:279–300.
- Waples, R. S. 2005. Genetic estimates of contemporary effective population size: to what time periods do the estimates apply? Mol. Ecol. 14:3335–3352.
- Waples, R. S. 2006. A bias correction for estimates of effective population size based on linkage disequilibrium at unlinked gene loci. Conserv. Genet. 7:167–184.
- Waples, R. S., and Do C. 2010. Linkage disequilibrium estimates of contemporary Ne using highly variable genetic markers: a largely untapped resource for applied conservation and evolution. Evol. Appl. 3:244–262.
- Waples, R. S., and Yokota M. 2007. Temporal estimates of effective population size in species with overlapping generations. Genetics 175:219–233.
- Waples, R. S., Do C., and Chopelet J. 2011. Calculating Ne and Ne/N in age‐structured populations: a hybrid Felsenstein‐Hill approach. Ecology 92:1513–1522.
- Waples, R. S., Luikart G., Faulkner J. R., and Tallmon D. A. 2013. Simple life history traits explain key effective population size ratios across diverse taxa. Proc. Biol. Sci. 280:20131339.
- Waples, R. S., Antao T., and Luikart G. 2014. Effects of overlapping generations on linkage disequilibrium estimates of effective population size. Genetics 197:769–780.
- Wright, S. 1931. Evolution in Mendelian populations. Genetics 16:97–159.