In a recent study, vonHoldt et al. (2008) examined the success of the grey wolf (Canis lupus) reintroduction program into Yellowstone National Park in preserving the genetic variation of the population. They evaluated a variety of aspects of genetic diversity in the wolf population, which originated from 41 founders introduced in 1995 and 1996, and which has remained genetically isolated since the reintroduction. In each of a large number of individuals sampled during the initial recovery period, 1995–2004, vonHoldt et al. (2008) genotyped 26 microsatellite loci. Their analyses, which included estimates of mean observed and expected heterozygosity (HO and HE, respectively), generally indicated that this isolated wolf population is effective at inbreeding avoidance and maintenance of genetic diversity. However, some aspects of their genetic variation analyses appeared to be somewhat incompatible. Levels of expected heterozygosity, calculated using Nei’s (1987) heterozygosity estimator (ĤE), identified a decreasing trend in genetic variation starting from 1997, after the introductions were complete (Figure 1A). The authors suggested that if this trend continues, wolf fitness might decrease due to the negative effects of inbreeding and reduced adaptability. Curiously, the reported ĤO showed the opposite trend to ĤE (Figure 1A), demonstrating increasing proportions of heterozygous individuals, potentially indicative of a reduction in inbreeding over time. ĤO was consistently lower than ĤE, however, a result that might be suggestive of inbreeding. As behavioral observations documented very few cases of inbreeding over the ten years of the study (vonHoldt et al. 2008), it is likely that factors other than inbreeding have contributed to the discrepancy between ĤE and ĤO.
Figure 1.
(A) Annual values of ĤE and ĤO from Table 1 in vonHoldt et al. (2008). (B) Annual values of H̃′E from mean locus heterozygosities calculated using the DeGiorgio & Rosenberg (2009) estimator and excluding missing data. The corresponding annual values of Ĥ′O calculated by excluding missing data are presented for comparison. (C) Annual values of H̃E, ĤE, and Ĥ′E averaged across all loci. H̃E and ĤE treat missing data as an additional allele, whereas Ĥ′E excludes missing data from the calculations. H̃E is calculated using the DeGiorgio & Rosenberg (2009) estimator, which accounts for the bias introduced by related individuals. ĤE and Ĥ′E are calculated using the Nei (1987) estimator, which does not take relatives into account. The legend applies to all three panels.
The Yellowstone wolf dataset of vonHoldt et al. (2008) was unusually enriched for close relatives, due to the small size of the founding population ancestral to all sampled individuals, the lack of gene flow from outside immigrants, the mating hierarchy and high variance of reproductive success in the species, and the near-comprehensive sampling of the population (considering annual census sizes, the per-year proportion of the population sampled was as high as ~86%). Recent developments in the estimation of allele frequencies from inbred and related samples (e.g. Weir 1996; Broman 2001; Bourgain et al. 2004; DeGiorgio & Rosenberg 2009) have demonstrated that the presence of close relatives in a sample introduces a downward bias in ĤE, providing a possible explanation for the unusual heterozygosity observations of vonHoldt et al. (2008). We were therefore interested in determining whether accounting for the bias in ĤE caused by the inclusion of relatives would affect the conclusions of vonHoldt et al. (2008) regarding temporal trends in wolf genetic variation.
A newly developed unbiased estimator for heterozygosity (H̃E) accounts for the presence of close relatives when kinship coefficients (Φ) between individuals in the sample are known (DeGiorgio & Rosenberg 2009). We applied H̃E to genotype and kinship data for the wolves, separately analyzing data from each of the ten years of the study. Data were taken from vonHoldt et al. (2008), employing a pedigree that had previously been constructed using a combination of field observations and pairwise allele-sharing. To adjust for levels of relatedness in the computation of H̃E, for each year, at each locus, we first calculated the average pairwise kinship coefficient (Φ̄) across pairs of individuals sampled at the locus (Figure 2). To determine Φ between pairs of wolves, we used inferred relationships from wolf pedigrees and the algorithm of Lange (2002, pp. 81–83), as implemented by Atkinson & Therneau (2008). For individuals with two unknown parents, we considered the unknown parents to be founders unrelated to all sampled individuals. In rare instances in which the identity of only one parent was uncertain, we considered possible half-siblings to be full-siblings. In computing both heterozygosity and Φ̄ at a locus, we excluded from calculations at that locus individuals for which data were missing. Calculations applied to samples with missing data excluded in this manner are indicated by a “prime” (e.g. H̃′E). After estimating per-locus heterozygosities, we averaged them across loci to obtain overall annual estimates.
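The relationship between the two estimators at a single locus can be illustrated with a minimal sketch. It assumes the kinship-adjusted estimator takes the form (1 − Σp̂ᵢ²)/(1 − Φ̄), with Φ̄ averaged over all ordered pairs of individuals, self-pairs included; under that assumption it reduces to the Nei (1987) form 2n/(2n − 1)·(1 − Σp̂ᵢ²) when individuals are unrelated and non-inbred. The function names, the toy genotypes, and the kinship matrices are hypothetical and are not taken from the wolf data.

```python
from collections import Counter


def gene_identity(genotypes):
    """Sum of squared sample allele frequencies for diploid genotypes (a, b)."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def nei_he(genotypes):
    """Nei-style estimator: 2n/(2n-1) * (1 - sum of squared frequencies)."""
    n = len(genotypes)
    return (2 * n) / (2 * n - 1) * (1 - gene_identity(genotypes))


def mean_kinship(phi):
    """Average of an n x n kinship matrix over all ordered pairs (self-pairs included)."""
    n = len(phi)
    return sum(map(sum, phi)) / (n * n)


def kinship_adjusted_he(genotypes, phi):
    """Assumed adjusted form: (1 - sum of squared frequencies) / (1 - mean kinship)."""
    return (1 - gene_identity(genotypes)) / (1 - mean_kinship(phi))


# Hypothetical toy sample: individuals 0 and 1 are either unrelated or full siblings.
genotypes = [("a", "b"), ("a", "b"), ("c", "d")]
phi_unrelated = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]
phi_sibs = [[0.5, 0.25, 0.0], [0.25, 0.5, 0.0], [0.0, 0.0, 0.5]]

print(nei_he(genotypes))
print(kinship_adjusted_he(genotypes, phi_unrelated))
print(kinship_adjusted_he(genotypes, phi_sibs))
```

With unrelated, non-inbred individuals the two functions agree; once the sibling kinship of 0.25 is entered, the adjusted value exceeds the Nei-style value, which is the direction of the correction discussed in the text.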
Figure 2.
Annual kinship coefficients averaged across all pairs of individuals genotyped for each locus, then averaged across all 26 loci. Individuals with missing data at a locus were excluded in Φ̄ computations at the locus for this plot and for the calculation of H̃′E; they were not excluded in Φ̄ computations used in calculating H̃E.
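As an illustration of how pairwise kinship coefficients can be obtained from a pedigree, the following sketch uses the standard recursive definition of kinship. It is a generic implementation written for this example, not the code of Atkinson & Therneau (2008), and it is not claimed to reproduce the exact procedure of Lange (2002). It assumes that the pedigree lists parents before their offspring and that an unknown parent (`None`) is a founder unrelated to everyone else. The individual labels are hypothetical.

```python
from functools import lru_cache


def kinship_matrix(pedigree):
    """Kinship coefficients from a pedigree.

    pedigree: list of (id, sire, dam) tuples, parents listed before offspring;
    None marks an unknown parent, treated as an unrelated founder.
    Returns (ids, matrix) with matrix[i][j] the kinship of ids[i] and ids[j].
    """
    order = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    parents = {ind: (sire, dam) for ind, sire, dam in pedigree}

    @lru_cache(maxsize=None)
    def phi(a, b):
        if a is None or b is None:
            return 0.0  # unknown parents are unrelated to everyone
        if a == b:
            sire, dam = parents[a]
            return 0.5 * (1.0 + phi(sire, dam))  # self-kinship, (1 + f) / 2
        if order[a] < order[b]:
            a, b = b, a  # always recurse through the later-listed individual
        sire, dam = parents[a]
        return 0.5 * (phi(sire, b) + phi(dam, b))

    ids = [ind for ind, _, _ in pedigree]
    return ids, [[phi(a, b) for b in ids] for a in ids]


# Hypothetical pedigree: two founders, two full siblings, and a sibling-mating offspring.
pedigree = [
    ("F", None, None),
    ("M", None, None),
    ("A", "F", "M"),
    ("B", "F", "M"),
    ("C", "A", "B"),
]
ids, phi = kinship_matrix(pedigree)
n = len(ids)
mean_phi = sum(map(sum, phi)) / (n * n)  # average over all ordered pairs
print(dict(zip(ids, phi[ids.index("A")])), mean_phi)
```

Averaging the resulting matrix over the individuals genotyped at a locus gives one possible per-locus Φ̄; whether self-pairs enter the average is a convention that should be matched to the estimator being used.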
When the downward bias introduced by the inclusion of relatives is taken into account through the use of kinship coefficients, in the period after the introductions, the mean H̃′E across loci shows no decreasing trend over time (Figure 1B), in contrast to the reported loss of variation over time seen for ĤE by vonHoldt et al. (2008). The downward trend in ĤE detected by vonHoldt et al. (2008) is instead likely to be due to increasing average kinship in the sample after all founders had been introduced (Figure 2). Additionally, as would be expected if inbreeding is rare, H̃′E and Ĥ′O match more closely, both in value and in the lack of a temporal trend (Figure 1B), than do the values of ĤE and ĤO (Figure 1A) reported by vonHoldt et al. (2008). In fact, for each year of the study, considering paired lists of locus heterozygosities, we found H̃′E and Ĥ′O not to be significantly different at the P<0.05 level (Table 1). This similarity of H̃′E and Ĥ′O, and the absence of a downward temporal trend in these quantities, are consistent with the low levels of inbreeding observed; these results are also compatible with the viewpoint of vonHoldt et al. (2008) that the population is thriving in terms of genetic diversity.
Table 1.
P-values for two-sided Wilcoxon signed-rank tests, comparing pairs of statistics across the 26 loci in the study.
Year | Ĥ′E vs. Ĥ′O | Ĥ′E vs. H̃′E | H̃′E vs. Ĥ′O |
---|---|---|---|
1995 | 0.0143 | 2.98×10⁻⁸ | 0.5317 |
1996 | 0.3666 | 2.98×10⁻⁸ | 0.5955 |
1997 | 0.3403 | 2.98×10⁻⁸ | 0.7835 |
1998 | 0.2079 | 2.98×10⁻⁸ | 0.4834 |
1999 | 0.0176 | 2.98×10⁻⁸ | 0.9602 |
2000 | 0.0220 | 2.98×10⁻⁸ | 0.9800 |
2001 | 0.0067 | 2.98×10⁻⁸ | 0.8613 |
2002 | 0.0056 | 2.98×10⁻⁸ | 0.9602 |
2003 | 0.0176 | 2.98×10⁻⁸ | 0.9602 |
2004 | 0.0079 | 2.98×10⁻⁸ | 0.7835 |
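The value of about 2.98e-8 repeated in the middle column of Table 1 equals 2/2²⁶, the smallest two-sided P-value an exact signed-rank test can return for 26 untied, nonzero paired differences; this is consistent with one statistic exceeding the other at every locus in every year. The text does not say whether the P-values were computed exactly or by approximation, so the sketch below is only an illustration of the exact null distribution. The function name is hypothetical.

```python
def signed_rank_exact_p(w_plus, n):
    """Exact two-sided P-value for the Wilcoxon signed-rank statistic W+.

    Assumes n nonzero paired differences with no tied absolute values.
    """
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks 1..n whose sum is s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    w = min(w_plus, max_sum - w_plus)  # use the smaller tail
    tail = sum(counts[: w + 1])
    return min(1.0, 2 * tail / 2 ** n)


# All 26 differences share the same sign, so W+ is 0 (or its maximum, 351).
print(f"{signed_rank_exact_p(0, 26):.3g}")  # prints 2.98e-08
```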
It is important to note that in our calculations of H̃′E and Ĥ′O, we treated the genotype data slightly differently from vonHoldt et al. (2008). In their computations of ĤE and ĤO, missing data were treated as a separate allele. For a given individual at a given locus in the data of vonHoldt et al. (2008), data were always missing for both alleles or neither allele; therefore, treating missing data as an allele depresses ĤO by increasing the proportion of “homozygotes.” Comparing Ĥ′O (Figure 1B) to ĤO (Figure 1A), we can observe that the upward trend in ĤO not observed for Ĥ′O is partly explained by a difference in the treatment of missing data. As we will see below, however, this difference does not explain the difference in the trends seen for H̃′E (Figure 1B) and ĤE (Figure 1A).
Treating missing data as a separate allele inflates ĤE, by adding another allele to the total number of distinct alleles in the calculation. Consequently, to ensure that the difference we observed between H̃′E (Figure 1B) and the vonHoldt et al. (2008) estimates of ĤE (Figure 1A) was not the result of our differential handling of missing data, we compared annual values of H̃E, obtained with the same approach to missing data as vonHoldt et al. (2008), to the previously reported values of ĤE (Figure 1C, top and center lines). H̃E, calculated with missing data counted as an allele, shows the same lack of temporal trend as H̃′E, calculated with missing data excluded, and it differs from ĤE, calculated without accounting for relatives and including missing data as an allele. Additionally, the Nei (1987) estimator applied to samples with missing data excluded (Ĥ′E) shows a similar trend to the vonHoldt et al. (2008) values of ĤE, with missing data treated as a distinct allele (Figure 1C, bottom and center lines). We therefore conclude that the qualitative difference in expected heterozygosity we observe between the DeGiorgio & Rosenberg (2009) estimator and the vonHoldt et al. (2008) use of the Nei (1987) estimator is due to differences in how the estimators treat relatedness, not in how missing data were handled.
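The two effects of coding missing genotypes as an extra allele, a lower observed heterozygosity and a higher Nei-style expected heterozygosity, can be seen in a small made-up example. The genotypes, the `MISSING` label, and the function names below are hypothetical; the sketch assumes, as in the data described above, that a genotype is missing at both alleles or at neither.

```python
from collections import Counter

MISSING = "?"  # hypothetical label for the extra "missing" allele


def prepare(genotypes, missing_as_allele):
    """Either recode missing genotypes (None) as MISSING/MISSING or drop them."""
    if missing_as_allele:
        return [g if g is not None else (MISSING, MISSING) for g in genotypes]
    return [g for g in genotypes if g is not None]


def observed_het(genotypes):
    """Proportion of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def nei_he(genotypes):
    """Nei-style estimator: 2n/(2n-1) * (1 - sum of squared allele frequencies)."""
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    identity = sum((c / (2 * n)) ** 2 for c in counts.values())
    return (2 * n) / (2 * n - 1) * (1 - identity)


# Hypothetical locus: eight typed individuals and two with missing genotypes.
locus = [(1, 2), (1, 1), (2, 2), (1, 2), (1, 2), (2, 1), (1, 1), (2, 2), None, None]

for flag in (False, True):
    data = prepare(locus, missing_as_allele=flag)
    print(flag, observed_het(data), nei_he(data))
```

In this example the recoded missing genotypes count as "homozygotes", which lowers the observed value, while the additional allele class raises the expected value, so the gap between the two widens for reasons unrelated to inbreeding.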
In summary, using the unbiased DeGiorgio & Rosenberg (2009) estimator of expected heterozygosity with the Yellowstone grey wolves, we have determined that expected and observed heterozygosity are similar (Figure 1B), and that indicators of genetic diversity do in fact correspond with behavioral observations of low inbreeding levels in the population. Our results also contrast with the previously published computations (vonHoldt et al. 2008) by finding no particular trend in expected heterozygosity over time. Additionally, whereas Ĥ′E and Ĥ′O differ significantly at the P<0.05 level for seven of the ten years of the study, the adjusted H̃′E matches Ĥ′O more closely across all ten years (Table 1). Thus, this example illustrates that the inherent bias in the standard Nei (1987) expected heterozygosity estimator due to sampling of relatives can have a sizeable impact on estimated heterozygosity values, and that the adjustment provided by the new DeGiorgio & Rosenberg (2009) estimator can alter the interpretation in cases in which relationships among individuals are largely known. As the Yellowstone wolves examined by vonHoldt et al. (2008) provide a prototypical genetic study of related individuals from a small natural population, our analysis suggests that the DeGiorgio & Rosenberg (2009) estimator will be informative in future analyses of the dynamics of gene diversity in the presence of close relatives.
Acknowledgments
We thank M. Cronin and an anonymous reviewer for helpful comments, and M. DeGiorgio for insight into H̃E. Support for this work was provided by the Burroughs Wellcome Fund, the Alfred P. Sloan Foundation, and NIH grant R01 GM081441. Data are available upon request from B. M. vH. (bvonhold@ucla.edu).
References
- Atkinson B, Therneau T. kinship: mixed-effects Cox models, sparse matrices, and modeling data from large pedigrees. R package version 1.1.0-22. 2008.
- Broman KW. Estimation of allele frequencies with data on sibships. Genetic Epidemiology. 2001;20:307–315. doi: 10.1002/gepi.2.
- Bourgain C, Abney M, Schneider D, Ober C, McPeek MS. Testing for Hardy-Weinberg equilibrium in samples with related individuals. Genetics. 2004;168:2349–2361. doi: 10.1534/genetics.104.031617.
- DeGiorgio M, Rosenberg NA. An unbiased estimator of gene diversity in samples containing related individuals. Molecular Biology and Evolution. 2009;26:501–512. doi: 10.1093/molbev/msn254.
- Lange K. Mathematical and Statistical Methods for Genetic Analysis. 2nd ed. New York: Springer-Verlag; 2002.
- Nei M. Molecular Evolutionary Genetics. New York: Columbia University Press; 1987.
- vonHoldt BM, Stahler DR, Smith DW, Earl DA, Pollinger JP, Wayne RK. The genealogy and genetic viability of reintroduced Yellowstone grey wolves. Molecular Ecology. 2008;17:252–274. doi: 10.1111/j.1365-294X.2007.03468.x.
- Weir BS. Genetic Data Analysis II. Sunderland, Massachusetts: Sinauer Associates; 1996.