Extrapolating abundance curves has no predictive power for estimating microbial biodiversity

Amy Willis

doi:10.1073/pnas.1608281113

letter

Proc Natl Acad Sci USA. 2016 Aug 10;113(35):E5096. doi: 10.1073/pnas.1608281113

PMCID: PMC5024625 PMID: 27512033

Locey and Lennon (1) recently conducted an analysis of microbial and macrobial communities to investigate the effect of sample size (N, number of individuals or reads observed) on community species richness (S), species evenness (Simpson), frequency distribution skew, and frequency count of the most abundant taxa. They argue that log–log linear models fit these relationships, specifically claiming that the index of the power law between sample size and species richness is consistent across macro- and microorganisms. Furthermore, they use the “lognormal model of biodiversity” to estimate global microbial biodiversity at around 10¹¹ to 10¹² taxa. Although their claims are appealing and elegant, I argue (from a statistical perspective) that their methods were inappropriate for the desired investigation.

The lognormal model of biodiversity (ref. 2 and references therein) posits that by modeling and extrapolating a species abundance curve it is possible to estimate the total number of species in the community. Unfortunately, this is untrue. Extrapolating abundance curves, accumulation curves, and rarefaction curves is unsound statistical practice.

The invalidity of extrapolating abundance curves rests on the distinction between relationships that are correlative but not predictive and relationships that are both correlative and predictive. Generally, statisticians use independent quantities associated with the environmental system (e.g., pH or temperature) to predict dependent quantities (e.g., species richness in a lake). However, observed species richness is not a quantity associated with the environmental system: It is a result of the sampling procedure. To illustrate, consider an ecosystem composed of bamboo, pandas, flies, and fish (true $S = 4$), and suppose our estimator of total diversity ($\hat{S}$) is sample diversity (as in ref. 1). We begin by sampling only $N = 20$ individuals and observe only bamboo and flies ($\hat{S} = 2$). If we continue sampling up to $N = 100$ individuals we may also find a fish ($\hat{S} = 3$), and if we continue we may eventually find a panda. However, the true number of distinct species in the ecosystem is unchanged for all choices of N: Only $\hat{S}$ changes. Thus, although there is a correlation between N and $\hat{S}$, there is no correlation between N and S, because true biodiversity (richness) in the ecosystem exists regardless of the experiment and experimenter. The lognormal model of biodiversity therefore has no predictive power for true biodiversity: It describes features of the experiment and not of the universe.
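The bamboo, panda, fly, and fish illustration can be mimicked with a small simulation. The sketch below is only illustrative: the relative abundances, the seed, and the function names `draw_individuals` and `observed_richness_curve` are assumptions introduced here, not values or methods from the letter. It draws individuals at random and reports the observed richness at increasing N, while the true richness never changes.

```python
import random

# Hypothetical relative abundances for the four-species ecosystem in the text.
# These numbers are illustrative assumptions, not data from the letter.
ABUNDANCE = {"bamboo": 0.70, "flies": 0.28, "fish": 0.019, "panda": 0.001}
TRUE_S = len(ABUNDANCE)  # true richness, fixed regardless of sampling effort


def draw_individuals(n, rng):
    """Draw n individuals independently according to ABUNDANCE."""
    species = list(ABUNDANCE)
    weights = list(ABUNDANCE.values())
    return rng.choices(species, weights=weights, k=n)


def observed_richness_curve(sample, sizes):
    """Observed richness among the first N individuals of one sample, for each N."""
    return {n: len(set(sample[:n])) for n in sizes}


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = draw_individuals(10_000, rng)
curve = observed_richness_curve(sample, [20, 100, 1_000, 10_000])
for n, s_hat in curve.items():
    print(f"N = {n:>6}: observed richness = {s_hat}, true S = {TRUE_S}")
```

Because each smaller sample is a prefix of the larger one, the observed richness can only stay the same or grow with N, and it can never exceed the fixed true value of 4. Whatever trend appears in the printed values reflects sampling effort alone.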

The only correct (statistically admissible) way to estimate species richness is by modeling the frequency counts: singletons, $f_{1}$ ; doubletons, $f_{2}$ ; tripletons, $f_{3}$ ; and so on. Probabilistic models permit extrapolation from $f_{1}, f_{2}, f_{3} \dots$ to predict $f_{0}$ , the number of species in the population that were not observed. The statistical literature on this problem dates to ref. 3, with recommendations available for the best models for both macro- and microorganism richness (4, 5).
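To make the frequency counts concrete, the sketch below tabulates $f_{1}, f_{2}, \dots$ from a list of observed individuals and then predicts $f_{0}$ with the Chao1 estimator. Chao1 is swapped in here purely as a simple, widely known example; it is not mentioned in the letter, it gives only a lower bound on richness, and it is not one of the models recommended in refs. 4 and 5. The function names and the toy sample are likewise assumptions made for illustration.

```python
from collections import Counter


def frequency_counts(sample):
    """Map j -> f_j, the number of species observed exactly j times."""
    per_species = Counter(sample)         # species -> times observed
    return Counter(per_species.values())  # j -> number of species seen j times


def chao1(sample):
    """Chao1 lower-bound richness estimate: S_obs + f1^2 / (2 * f2).

    When no doubletons are observed (f2 == 0), the bias-corrected form
    f1 * (f1 - 1) / 2 is used for the unseen-species term instead.
    """
    f = frequency_counts(sample)
    s_obs = sum(f.values())               # number of species actually observed
    f1, f2 = f.get(1, 0), f.get(2, 0)
    f0_hat = f1 * f1 / (2 * f2) if f2 > 0 else f1 * (f1 - 1) / 2
    return s_obs + f0_hat


# Toy sample (hypothetical): one common species, two doubletons, four singletons.
toy_sample = ["a"] * 5 + ["b"] * 2 + ["c"] * 2 + ["d", "e", "f", "g"]
counts = frequency_counts(toy_sample)
print("f1 =", counts[1], "f2 =", counts[2], "Chao1 =", chao1(toy_sample))
```

The point of the sketch is the shape of the calculation rather than the particular estimator: the inputs are the frequency counts, and the output is a prediction of the unobserved count $f_{0}$ added to the observed richness.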

The historical popularity of extrapolating abundance curves is a poor argument for its continued use. I encourage the authors to consider the statistical perspective on this problem and hope that improved communication between biodiversity statisticians and ecologists will advance understanding of biodiversity.

Footnotes

The author declares no conflict of interest.

References

1. Locey KJ, Lennon JT. Scaling laws predict global microbial diversity. Proc Natl Acad Sci USA. 2016;113(21):5970–5975. doi: 10.1073/pnas.1521291113.
2. Curtis TP, Sloan WT, Scannell JW. Estimating prokaryotic diversity and its limits. Proc Natl Acad Sci USA. 2002;99(16):10494–10499. doi: 10.1073/pnas.142680199.
3. Fisher RA, Corbet S, Williams CB. The relation between the number of species and the number of individuals in a random sample of an animal population. J Anim Ecol. 1943;12:42–58.
4. Bunge J, Fitzpatrick M. Estimating the number of species: A review. J Am Stat Assoc. 1993;88(421):364–373.
5. Willis A, Bunge J. Estimating diversity via frequency ratios. Biometrics. 2015;71(4):1042–1049. doi: 10.1111/biom.12332.
