Scaling of species distribution explains the vast potential marine prokaryote diversity

Victor M Eguíluz; Guillem Salazar; Juan Fernández-Gracia; John K Pearman; Josep M Gasol; Silvia G Acinas; Shinichi Sunagawa; Xabier Irigoien; Carlos M Duarte

doi:10.1038/s41598-019-54936-y

2019 Dec 10;9:18710. doi: 10.1038/s41598-019-54936-y

PMCID: PMC6904450 PMID: 31822687

Abstract

Global ocean expeditions have provided minimum estimates of the ocean’s prokaryote diversity of about 40,000 species, supported by apparent asymptotes in the number of prokaryotes detected with increasing sampling effort. This figure represents <1% of the species cataloged in the Earth Microbiome Project, even though the ocean is the largest habitat in the biosphere. Here we demonstrate that the abundance of prokaryote OTUs follows a scaling that can be represented by a power-law distribution, and as a consequence, we demonstrate, mathematically and through simulations, that the asymptote of rarefaction curves is an apparent one, which is only reached with sample sizes approaching the entire ecosystem. We experimentally confirm these findings using exhaustive repeated sampling of a prokaryote community in the Red Sea and the exploration of global assessments of prokaryote diversity in the ocean. Our findings indicate that, far from having achieved a thorough sampling of prokaryote species abundance in the ocean, global expeditions provide just a start for this quest, as the richness in the global ocean is much larger than estimated.

Subject terms: Microbial ecology; Statistical physics, thermodynamics and nonlinear dynamics

Introduction

The ocean, the largest habitat in the biosphere, is a microbial-dominated ecosystem holding an estimated 10²⁹ prokaryote cells¹. Exploration of the ocean biodiversity associated with this huge prokaryote pool was long prevented by limitations in the cultivation of marine prokaryotes². This barrier was partially overcome by efficient sequencing approaches, typically targeting the genes that code for the 16S region of rDNA, which allow the definition and enumeration of the operational taxonomic units (OTUs) present in a sample, thereby providing a culture-free basis to assess biodiversity somewhat equivalent to that of species numbers³. In the past decade, global ocean expeditions and research based on them have used these technological developments to attempt to estimate the total number of prokaryote OTUs in the ocean^4–8. For instance, the TARA Oceans Expedition explored prokaryote biodiversity in the upper ocean and described the detection of 35,650 prokaryote OTUs⁵ in a set of globally distributed samples, with the exception of the Arctic, while the Malaspina Expedition gave a minimum estimate of the number of prokaryote OTUs in the deep ocean that is an order of magnitude lower, at around 3,700⁴. The TARA Expedition estimated the total richness to be 37,470 OTUs based on the Chao estimator, which defines a lower bound on species richness. This result should therefore be interpreted as at least 37,470 OTUs in the upper ocean.

The fraction of the total volume of the ocean sampled by any study is minimal and thus requires extreme extrapolation (over 20 orders of magnitude) from the number of species found in the samples to an estimate for the global ocean. The approach used is that of rarefaction curves, a development first introduced in 1943 by Fisher et al. to provide a basis to estimate the species richness of Malaysian butterflies⁹, and subsequently popularized by Sanders (1968)¹⁰ to compare benthic invertebrate species richness from marine surveys with different sample sizes. Rarefaction curves use resampling techniques to develop a curve of the number of species against the number of samples collected¹¹. Initially introduced to evaluate how comprehensive the assessment of species numbers was based on a sampling set, the approach was subsequently used to infer the total number of species in the ecosystem investigated as that corresponding to the asymptote of the curve¹². This approach was adopted to deliver estimates of the prokaryote species richness in the global ocean^4,5. These estimates correspond mathematically to minimum estimates (e.g., Chao estimator)¹³, yet their precision has not been assessed. Indeed, beyond the apparent asymptote in rarefaction curves, other estimators have been proposed to estimate species richness^13–16. Marine prokaryote communities are characterized by the presence of a few abundant OTUs and a large number of rare OTUs², suggesting a much broader distribution of OTU abundance than that required to reliably apply rarefaction curves to estimate the global biodiversity of prokaryotes. Here we examine the scaling of prokaryote diversity in the ocean as a step toward better understanding the extent to which current assessments may underestimate prokaryote diversity in the global ocean.
We do so using an array of novel approaches, including assessments across the global ocean coupled with experimental and in silico tests, to establish the scaling of ocean microbial diversity and explore its implications for the discovery of microbial diversity.
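As an illustration of why such estimators are lower bounds, the following minimal sketch computes one common member of the Chao family, the bias-corrected Chao1 estimator, from a vector of per-OTU read counts. It is an assumption that this variant resembles what the cited studies used; the function name `chao1` and the toy counts are hypothetical.

```python
def chao1(abundances):
    """Bias-corrected Chao1 lower bound on richness.

    abundances: iterable of per-OTU read counts (non-negative integers).
    Uses S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)), where F1 and F2 are the
    numbers of OTUs observed exactly once and exactly twice.
    """
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)                      # observed richness
    f1 = sum(1 for a in counts if a == 1)    # singletons
    f2 = sum(1 for a in counts if a == 2)    # doubletons
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


# Hypothetical toy community: six observed OTUs, three of them singletons.
toy_counts = [10, 5, 1, 1, 1, 2]
print(chao1(toy_counts))
```

The estimate only adds a correction driven by the rarest observed OTUs, so it cannot anticipate OTUs that remain far below the detection threshold of the sample.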

Results

Prokaryote diversity in the upper and deep ocean

The distributions of prokaryote OTU abundance in the upper ocean and deep ocean samples of the TARA Oceans⁵ and Malaspina⁴ Expeditions conform to broad distributions with power-law behavior, P(x) ~ x^{−1−α}, where x represents the abundance measured in number of reads. The tail of the distribution is characterized by a scaling exponent α = 1.57 for the upper ocean and α = 0.89 for the deep ocean (Fig. 1), similar to the classic power-law describing the number of species per taxon of Willis and Yule (1922)¹⁷. A comparison to other broad distributions (lognormal, Weibull) shows that a distribution with a power-law tail (either pure power-law or truncated power-law) is the most likely fit (Table 1). This finding implies that, in the upper ocean, the most abundant 1% of OTUs account for 40% of the sequences while the least abundant 90% of sampled OTUs account for only 10% of the sequences; in the deep ocean, the most abundant 1% of OTUs account for more than 70% of the sequences while the least abundant 90% of sampled OTUs account for only 8% of the sequences.
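The shares of sequences quoted above can be computed directly from a vector of OTU abundances. The sketch below shows one way to do so; the helper names and the toy abundance vector are hypothetical and are not taken from the expedition datasets.

```python
def share_of_most_abundant(abundances, fraction):
    """Fraction of all reads held by the top `fraction` of OTUs (by abundance)."""
    ranked = sorted(abundances, reverse=True)
    k = max(1, int(round(fraction * len(ranked))))   # number of OTUs in the top group
    return sum(ranked[:k]) / sum(ranked)


def share_of_least_abundant(abundances, fraction):
    """Fraction of all reads held by the bottom `fraction` of OTUs (by abundance)."""
    ranked = sorted(abundances)
    k = max(1, int(round(fraction * len(ranked))))   # number of OTUs in the bottom group
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical toy community: one dominant OTU and 99 singletons.
toy = [1000] + [1] * 99
print(share_of_most_abundant(toy, 0.01), share_of_least_abundant(toy, 0.90))
```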

Fig. 1. Abundance distribution of prokaryote OTUs in the upper and deep ocean. The rank vs abundance distribution for the (A) upper ocean and (B) deep ocean shows broad distributions with power-law tails. The abundance-rank distribution, r ~ x^{−α}, where r is the rank of abundance x, has the same functional dependence (only the ranks have to be normalized between 0 and 1) as the complementary cumulative distribution CCD, CCD(x) = ∑_{i=x}^{∞} P(i), where P(i) is the abundance distribution. Thus, if the abundance-rank distribution is given by r ~ x^{−α}, the abundance distribution decays as P(x) ~ x^{−1−α}. (A) For the upper ocean, the abundance distribution shows a double power-law decay separated at a characteristic scale of 2,313 reads: for abundances x < 2,313, the scaling exponent is 0.37 (blue line); for abundances x > 2,313, the scaling exponent is α = 1.57 (see Materials and Methods). (B) For the deep ocean, the abundance-rank distribution is characterized by a power-law decay, P(x) ~ x^{−1−α}, with an exponent of α = 0.89 (red line).

Table 1.

Comparing fitting models to the prokaryote abundance distribution.

	ΔAIC PL	ΔAIC TPL	ΔAIC LN	ΔAIC Weibull	α	standard error (α)	β	λ
Upper ocean	0.47	0	0.74	0.77	1.57	0.09	1.34	0.000019
Deep ocean	0.03	0	2.01	15	0.89	0.02	0.73	0.000002
Mesocosm C1	23	0	8.44	5.84	0.52	0.02	0.41	0.000106
Mesocosm C2	0	2	2.01	3.00	0.52	0.31	0.52	0
Mesocosm C3	19	0	13	11	0.53	0.02	0.43	0.000088
Mesocosm C4	26	0	8.36	5.57	0.54	0.02	0.42	0.000119
Mesocosm C5	38	0	13	9.49	0.57	0.02	0.38	0.000178
Mesocosm C6	18	0	13	11	0.52	0.02	0.44	0.000080

The delta Akaike Information Criterion (ΔAIC) indicates the most likely fit (value 0) and the difference to the most likely fit. For the six cases reported, the most likely fit is a distribution with a power-law decay (either pure or truncated). The parameter of a power-law distribution P(x) ~ x^{−1−α} is the scaling exponent α; the parameters of the truncated power-law P(x) ~ x^{−1−β} exp(−λx) are the scaling exponent β and the characteristic abundance λ (λ = 0 for a pure power-law). ΔAIC PL: delta Akaike Information Criterion for the power-law distribution fit; ΔAIC TPL: delta Akaike Information Criterion for the truncated power-law distribution fit; ΔAIC LN: delta Akaike Information Criterion for the log-normal distribution fit; ΔAIC Weibull: delta Akaike Information Criterion for the Weibull distribution fit. The standard error of the power-law scaling exponent (α) is also reported. For the upper ocean, the prokaryote abundance distribution shows a double power-law regime. A Maximum Likelihood Estimation for a double power-law model gives P(x) ~ x^{−1−δ}, with exponent δ = 0.36 for x < 2,313, and P(x) ~ x^{−1−α}, with exponent α = 1.54 for x ≥ 2,313 (see Materials and Methods).

Theoretical scaling

Prokaryote diversity and, in general, species diversity can be characterized by measures such as the Shannon and Simpson indices, which, by giving greater weight to the more abundant, common species, provide estimators with less uncertainty¹³ (Supplementary Table 1). However, the presence of rare species impacts the estimation of species richness. Species richness scales with sampling effort as a consequence of the power-law tail of the distribution of prokaryote abundance. Let us assume that the number of OTUs of abundance x, n_x, is given by n_x = A x^{−1−α}, where A is a normalizing constant, the scaling exponent α is larger than 0, and the abundances are in the range x ∈ [1, N_max]. Thus, the total species richness, S, is given by S = ∑_{x=1}^{N_max} n_x. In the limit of large N_max, the richness can be approximated as $S = A ζ(1 + α)$, that is, $A = S/ζ(1 + α)$, where ζ is the Riemann zeta function. The total number of reads N can be obtained as N = ∑_{x=1}^{N_max} x n_x. For α > 1, we obtain

N = \frac{ζ(α)}{ζ(1 + α)} S

For α < 1, in the continuous limit $N = A \int_{1}^{N_{\max}} x^{-α} dx = \frac{1}{(1 - α) ζ(1 + α)} S (N_{\max}^{1 - α} - 1)$ and under the assumption that $N_{\max}^{1 - α} ≫ 1$, we obtain

N = \frac{1}{(1 - α) ζ(1 + α)} S N_{\max}^{1 - α}

Finally, the abundance of the most abundant OTU can be evaluated as the value N_max at which there is only one group with abundance larger than or equal to N_max, that is, in the continuous limit $\int_{N_{\max}}^{\infty} n_{x} dx = 1$. This leads to $S \propto N_{\max}^{α}$ (a detailed calculation can be found in ref. ¹⁸).

Combining the previous expressions, we obtain the following scaling laws: $S \propto N_{\max}^{α}$ and, for α < 1,

S \propto N_{\max}^{α} \propto N^{α}

while for α > 1

S \propto N_{\max}^{α} \propto N .

The same scaling laws are obtained in the Yule model¹⁹ (which can also be mapped to the Simon model^20,21), where the scaling exponent α is related to the ratio between speciation rate g and group growth s, $α$ = g/s. Systems showing distributions with power-law tails are ubiquitous: several methodologies have been described to fit and compare different functional forms as well as mechanisms to explain their origin^18,22–24.
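The two scaling regimes can be checked numerically without any random sampling. The sketch below evaluates the sums for S and N directly, fixing A through the continuous-limit condition that exactly one OTU has abundance of at least N_max, and then fits the slope of log S against log N over several values of N_max. The exponents 0.5 and 1.5 and the range of N_max are illustrative assumptions, and convergence to the asymptotic slope is slow when α is close to 1.

```python
import numpy as np


def richness_and_reads(alpha, n_max):
    """Return (S, N) for n_x = A * x**(-1 - alpha), x = 1..n_max.

    A is fixed by the continuous-limit condition
    integral_{n_max}^{inf} A * x**(-1 - alpha) dx = 1, i.e. A = alpha * n_max**alpha.
    """
    x = np.arange(1, n_max + 1, dtype=float)
    A = alpha * float(n_max) ** alpha
    n_x = A * x ** (-1.0 - alpha)
    return n_x.sum(), (x * n_x).sum()


def scaling_exponent(alpha, n_max_values):
    """Least-squares slope of log S versus log N across the given N_max values."""
    pairs = [richness_and_reads(alpha, m) for m in n_max_values]
    S = np.array([p[0] for p in pairs])
    N = np.array([p[1] for p in pairs])
    slope, _intercept = np.polyfit(np.log(N), np.log(S), 1)
    return slope


n_max_grid = [10**3, 10**4, 10**5, 10**6]      # illustrative range only
print(scaling_exponent(0.5, n_max_grid))        # expected to be close to alpha
print(scaling_exponent(1.5, n_max_grid))        # expected to be close to 1
```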

Empirical and in silico scaling

The scaling of species richness and the distribution of species abundances are two sides of the same coin. The power-law distribution of prokaryote species abundance implies that species richness (S) scales with sampling effort (N, number of reads) as S ~ N^γ, where (i) γ equals the exponent of the rank-abundance power-law (i.e., γ = α) when this exponent is α < 1, as observed in the deep ocean (Malaspina Expedition, Fig. 2), and (ii) S is proportional to sampling effort (i.e., γ = 1) for larger exponents α > 1, as observed for the upper ocean (TARA Expedition, Fig. 2). Indeed, the power-law scaling of species richness with sampling effort implicit in the power-law form of the prokaryote species abundance distribution (Fig. 1) implies that the asymptote of rarefaction curves is artifactual and that the number of species does not approach any asymptote at the sampling effort thus far deployed by global expeditions (Fig. 2). This expectation was confirmed by producing an in silico global ocean microbiome with an underlying distribution of prokaryote species abundance with the same shape and exponent as those empirically derived for the upper and deep ocean (dotted lines in Fig. 2). The in silico data were obtained, first, by expanding the empirically fitted data to larger populations and, second, by randomly generating OTU abundances from the expanded distributions (see Materials and Methods). These simulations showed that increasing sampling effort, expressed as the total number of 16S reads sequenced, about 30 to 50 times relative to that applied to the upper and deep ocean by the TARA Oceans (3.3 × 10⁶ reads, ref. ⁵) and Malaspina (1.8 × 10⁶ reads, ref. ⁴) Expeditions, respectively, would lead to estimates of prokaryote species richness 4.2 and 1.2 times greater than inferred on the basis of rarefaction curves for the upper and deep ocean, respectively (Fig. 2 and Supplementary Fig. 1).
The estimators are calculated for a global population of 10⁸ reads, which corresponds to 1 liter of upper ocean water (10⁵ prokaryote cells/ml) and 10 liters of deep ocean water (10⁴ prokaryote cells/ml) (Supplementary Table 1).
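For readers unfamiliar with rarefaction, the expected number of OTUs in a random subsample of n reads drawn without replacement can be written in closed form using the standard hypergeometric expression. The sketch below implements that textbook formula; it is not claimed to be the exact resampling procedure used for Fig. 2, and the function names and toy community are hypothetical.

```python
import math


def log_binomial(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_richness(abundances, n):
    """Expected number of OTUs seen in a subsample of n reads (no replacement).

    E[S_n] = sum_i [1 - C(N - N_i, n) / C(N, n)], with N the total number of
    reads and N_i the number of reads of OTU i.
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of reads")
    log_denominator = log_binomial(total, n)
    expected = 0.0
    for a in abundances:
        if a == 0:
            continue                      # absent OTUs never contribute
        if total - a < n:
            expected += 1.0               # subsample too large to miss this OTU
        else:
            expected += 1.0 - math.exp(log_binomial(total - a, n) - log_denominator)
    return expected


# Hypothetical toy community with a few abundant and several rare OTUs.
toy = [500, 200, 50, 5, 2, 1, 1, 1]
curve = [expected_richness(toy, n) for n in (10, 100, 760)]
print(curve)
```

Evaluating the function on a grid of n values gives the rarefaction curve; for heavy-tailed abundance vectors the curve keeps rising until n approaches the total number of reads.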

Fig. 2. Number of species as a function of the number of reads. The expected number of OTUs in a random sampling of the total population grows sublinearly with sample size, S ~ N^γ. (A) In the upper ocean (continuous black line), we can identify a first quasi-linear regime with γ = 0.90 (95% confidence interval <0.01) and a second regime with γ = 0.33 (95% confidence interval <0.01), while (B) in the deep ocean (continuous red line) the exponent is γ = 0.62 (95% confidence interval <0.01). The number of OTUs in the upper ocean (horizontal dotted black line) is estimated at 35,650 OTUs⁵ and in the deep ocean (horizontal dotted red line) the maximum number of OTUs found is 3,695⁴.

Mesocosm experiment

We challenged the mathematically derived predictions, tested and confirmed by the in silico experiment, by enclosing a plankton community of the Central Red Sea in duplicate, and sampling and sequencing it every day for 20 days²⁵ (see Materials and Methods). The number of prokaryote OTUs detected in the sampled Central Red Sea community continued to increase with additional sampling effort (Fig. 3), and their abundance followed a power-law distribution with an average exponent of α = 0.53, comparable to that obtained for the deep ocean (α = 0.89) and for the less abundant OTUs of the upper ocean (α = 0.36) (Fig. 3D). In line with the upper and deep ocean cases, a comparative analysis performed for all the samples of the mesocosm experiment in three experimental conditions (control, single dose Nitrate-Phosphate addition and single dose Nitrate-Phosphate-Silicate addition) shows that a distribution with a power-law decay (either as a pure power-law or a truncated power-law) is the most likely fit (Supplementary Tables 2–7). The results confirmed the expectation that the number of OTUs retrieved in this community increased, on average, with the power 0.46 of the cumulative number of 16S reads sequenced, without a clear asymptotic behavior despite exhaustive sampling (Fig. 3A–C and Tables 1 and 2).

Fig. 3. Scaling of the number of OTUs with the number of reads in an experiment. The number of prokaryote OTUs as a function of the number of reads is plotted, on a log-log scale, every two days as the experiment runs for 20 days in different conditions: (A) control (Mesocosm C1 and C2), (B) single dose nitrate-phosphate addition (NP) (Mesocosm C3 and C4), and (C) single dose nitrate-phosphate-silicate addition (NPS) (Mesocosm C5 and C6). For all the conditions, we plot two replicates. The number of OTUs, S, scales with the number of reads, N, as S ~ N^γ, with γ = 0.44, 0.40 (control), 0.38, 0.40 (NP), 0.48, 0.52 (NPS). The insets show the same data on a linear scale (same ranges as main plots), where an apparent saturation asymptote is observed. (D) Abundance vs rank plot for one of the controls for successive days from bottom to top. The exponent of a power-law distribution fit, P(x) ~ x^{−1−α}, for the aggregated data after 20 days (black line) is α = 0.52.

Table 2.

Scaling exponents and confidence interval for the mesocosm experiment.

	Scaling exponent γ	Confidence Interval (95%)	Days of observation
Mesocosm C1	0.44	0.026	14
Mesocosm C2	0.70	0.089	18
Mesocosm C3	0.44	0.039	19
Mesocosm C4	0.29	0.043	17
Mesocosm C5	0.36	0.062	19
Mesocosm C6	0.54	0.076	20

For each condition and for each replicate of the mesocosm experiment, the number of prokaryote species is fitted against the number of reads as S ~ N^γ with a least-squares method, and the confidence intervals are calculated according to the number of days of observations in each condition.
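A least-squares fit of S ~ N^γ is commonly performed as a straight-line fit in log-log space. The sketch below assumes that approach (the source does not state whether the fit was done on log-transformed data), applies it to synthetic points generated with a known exponent, and averages the six exponents listed in Table 2. The function name and the synthetic series are hypothetical.

```python
import numpy as np


def fit_scaling_exponent(reads, otus):
    """Fit S = c * N**gamma by linear least squares on log-transformed data."""
    gamma, log_c = np.polyfit(np.log(reads), np.log(otus), 1)
    return gamma, np.exp(log_c)


# Synthetic cumulative series with a known exponent, for illustration only.
reads = np.array([1e4, 2e4, 5e4, 1e5, 2e5])
otus = 3.0 * reads ** 0.46
gamma_hat, prefactor = fit_scaling_exponent(reads, otus)

# Scaling exponents as listed in Table 2 (Mesocosm C1 to C6).
table2_gamma = {"C1": 0.44, "C2": 0.70, "C3": 0.44, "C4": 0.29, "C5": 0.36, "C6": 0.54}
mean_gamma = sum(table2_gamma.values()) / len(table2_gamma)
print(gamma_hat, mean_gamma)
```

The mean of the Table 2 values is consistent with the average power of 0.46 quoted in the Results.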

Discussion

The results presented show that the abundance of different prokaryotic species in the ocean is described by a power-law distribution, which implies that the total number of OTUs continues to increase with increasing sampling effort, with a power given by that of the rank-abundance power-law. The dependence of the estimated richness on sampling effort is not an exclusive property of a power-law distribution, and it has also been reported for lognormal distributions both theoretically²⁶ and empirically^7,23. We expect that the effort-dependence of species richness applies to distributions with sufficiently long tails, which are thus characterized by the presence of many rare species (OTUs). Thus, in the presence of a rare biosphere², the effort-dependence of richness estimates is the expected outcome. Hence, the estimates that the upper and deep ocean contain ca. 37,000 and 3,700 prokaryote OTUs^4,5, respectively, derived from rarefaction curves, are underestimates (Fig. 2). The estimation of diversity based on sampling effort (both the number of samples collected and the sequencing depth applied to each sample) still represents a challenge and requires broad extrapolations. We have addressed the estimation of prokaryote diversity with the parsimonious assumption that the sampled distribution represents the population distribution, which is further supported by the relatively conserved shape of these abundance distributions when sampling is replicated, as in our mesocosm experiment (see Supplementary Tables 2–7). Thus, we have explored the estimation of prokaryote diversity derived from fitting different underlying distributions to the upper and deep ocean, and the mesocosm experiment. Future research increasing sampling effort, both for individual communities and for locations across the ocean, is likely to yield OTU counts much higher than these estimates.
The power-law distribution of species richness is not a new observation in ecology^27–31 but is rooted in the seminal work of Willis and Yule showing a power-law distribution of species membership within taxa¹⁷. Indeed, a recent estimate of oceanic prokaryote species richness, derived by extrapolating across more than 20 orders of magnitude the relationship between species numbers and number of cells sampled to match the 10²⁹ prokaryote cells estimated in the global ocean, led to an estimate of 10¹⁰ different OTUs for this ecosystem⁷. Whereas the estimate derived from such wild extrapolation rests on a number of assumptions and does not necessarily reflect the shape of the species abundance distribution of oceanic prokaryotes, it supports our empirical, mathematical, modeling and experimental results, which indicate that the number of prokaryote OTUs in the ocean is far larger than currently estimated. A much-enhanced sampling effort is, therefore, required to unveil the prokaryote diversity concealed within the rare biosphere. Enhanced sampling efforts should be deployed both to retrieve the least abundant components of any one community and to benefit from the dynamics of microbial populations, which can bring otherwise rare components of the microbial biosphere to a level of abundance where they may be retrieved in sequencing projects (e.g., ref. ³²). Efforts to achieve an inventory of prokaryotic OTUs in the ocean will require far more exhaustive sampling than deployed to date, combined with sound extrapolation approaches rooted in the observed abundance distributions of prokaryotic OTUs.

Materials and Methods

Data and experimental design

We analyzed three datasets. From the TARA expedition, we collected the abundance of 18,022 OTUs from the surface water and deep chlorophyll maximum layers in 63 and 46 sites, respectively, containing 3,323,839 reads⁵ (available at http://ocean-microbiome.embl.de/companion.html). From the Malaspina expedition, we collected the abundance of 3,695 free-living and particle-attached OTUs from 30 globally distributed sites in the bathypelagic ocean⁴ (available at https://github.com/GuillemSalazar/MolEcol_2015). The experimental data reported the OTU abundance every day for a period of 20 days in three experimental conditions: (a) control (referred to as Mesocosm C1 and C2), (b) single dose Nitrate-Phosphate addition (referred to as C3 and C4), and (c) single dose Nitrate-Phosphate-Silicate addition (referred to as C5 and C6) (Nitrate = 2 µM, Phosphate = 0.12 µM, Silicate = 3.75 µM)²⁵. Samples range from an average of 11,126 ± 5,400 (SD) reads, yielding 337 ± 100 (SD) OTUs, on the first day to an aggregated number of 212,761 ± 22,000 (SD) reads and 1,331 ± 56 (SD) OTUs after completion of the experiment. Raw reads, on which the OTU counts were based, have been deposited in the NCBI Sequence Read Archive under the accession number SRP051855.

Statistical analysis

Abundance distribution

The fits of the power-law, truncated power-law, lognormal, and stretched exponential (Weibull) distributions were obtained with Maximum Likelihood Estimation applied to the empirical data³³. For the upper ocean, we also fitted a double power-law distribution.
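As a minimal sketch of what a maximum likelihood fit of a power-law tail involves, the code below uses the closed-form estimator for the continuous case, α̂ = n / Σ ln(x_i / x_min), for a density P(x) ~ x^{−1−α} on x ≥ x_min, with the usual large-sample standard error α̂/√n. Treating read counts as continuous and fixing x_min by hand are simplifying assumptions; the procedure of ref. ³³ may differ, and the function names are hypothetical.

```python
import numpy as np


def fit_power_law_tail(x, x_min=1.0):
    """Continuous maximum likelihood estimate of alpha for P(x) ~ x**(-1 - alpha).

    Only observations with x >= x_min enter the fit.
    Returns (alpha_hat, standard_error).
    """
    tail = np.asarray(x, dtype=float)
    tail = tail[tail >= x_min]
    alpha_hat = tail.size / np.log(tail / x_min).sum()
    return alpha_hat, alpha_hat / np.sqrt(tail.size)


def sample_power_law(alpha, size, rng, x_min=1.0):
    """Draw from the same density by inverting its cumulative distribution."""
    u = 1.0 - rng.random(size)            # uniform on (0, 1], avoids division by zero
    return x_min * u ** (-1.0 / alpha)


rng = np.random.default_rng(0)
synthetic = sample_power_law(0.89, 200_000, rng)   # synthetic data, not survey data
alpha_hat, alpha_se = fit_power_law_tail(synthetic)
print(alpha_hat, alpha_se)
```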

In silico prokaryote diversity: upper ocean

We proposed a distribution with two power-law regimes, with the parameter values (scaling exponents and transition point) obtained as described below: P(x) = A x^{−1−δ} for abundances x ≤ x_c, and P(x) = B x^{−1−α} for x > x_c. The condition that the distribution is continuous at x_c (A x_c^{−1−δ} = B x_c^{−1−α}) and the normalization (∫₁^∞ P(x) dx = 1 in the continuous approximation) lead to the values A = αδ (α + (δ − α) x_c^{−δ})⁻¹ and B = A x_c^{α−δ}. We assigned to the exponents α and δ and to the transition point x_c the values obtained from the Maximum Likelihood Estimation: α = 1.54, δ = 0.36, and x_c = 2,313.
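The constants of the double power-law can be verified numerically. The sketch below uses the closed forms stated in the extrapolation subsection of the Methods, A = αδ/(α + (δ − α) x_c^{−δ}) and B = A x_c^{α−δ}, which assume a continuous density normalized on [1, ∞), and checks both continuity at x_c and unit total probability for the fitted upper-ocean values. The function names are hypothetical.

```python
def double_power_law_constants(alpha, delta, x_c):
    """Prefactors (A, B) of P(x) = A x**(-1-delta) for x <= x_c, B x**(-1-alpha) above."""
    A = alpha * delta / (alpha + (delta - alpha) * x_c ** (-delta))
    B = A * x_c ** (alpha - delta)
    return A, B


def total_probability(alpha, delta, x_c):
    """Integral of the density over [1, inf), computed analytically piece by piece."""
    A, B = double_power_law_constants(alpha, delta, x_c)
    head = A * (1.0 - x_c ** (-delta)) / delta     # integral over [1, x_c]
    tail = B * x_c ** (-alpha) / alpha             # integral over [x_c, inf)
    return head + tail


alpha, delta, x_c = 1.54, 0.36, 2313.0             # fitted upper-ocean values
A, B = double_power_law_constants(alpha, delta, x_c)
print(A, B, total_probability(alpha, delta, x_c))
```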

In silico prokaryote diversity: deep ocean

We proposed a shifted power-law to capture the power-law tail and the deviation at the head of the distribution: P(x) = (α/(1 + x₀)) ((x + x₀)/(1 + x₀))^{−1−α}. The parameters α and x₀ can be obtained by Maximum Likelihood Estimation from the coupled equations α = N_OTU / Σ log((x₀ + x_i)/(1 + x₀)) and (x₀ + 1) Σ 1/(x₀ + x_i) = N_OTU α/(1 + α). To solve these implicit equations, we proposed initial values of x₀ and α, evaluated the previous expressions, and obtained new values x₀′ and α′. We repeated these steps until we reached the condition |x₀′ − x₀| < T, for some convergence value T. For T = 10⁻⁶, the values we obtained are α = 0.89 and x₀ = 20.34.
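The iterative scheme can be sketched as a fixed-point iteration. The code below assumes the density normalized on [1, ∞), P(x) = α (1 + x₀)^α (x + x₀)^{−1−α}, whose stationarity conditions are α = n / Σ log((x_i + x₀)/(1 + x₀)) and 1 + x₀ = α n / ((1 + α) Σ 1/(x_i + x₀)). The update order, starting value, and function names are assumptions made for illustration, and the data are synthetic draws rather than the Malaspina abundances.

```python
import numpy as np


def fit_shifted_power_law(x, x0_init=1.0, tol=1e-6, max_iter=100_000):
    """Fixed-point iteration for (alpha, x0) of a shifted power-law on x >= 1."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x0 = x0_init
    for _ in range(max_iter):
        # Update alpha given the current shift.
        alpha = n / np.log((x + x0) / (1.0 + x0)).sum()
        # Update the shift given the new alpha.
        x0_new = alpha * n / ((1.0 + alpha) * (1.0 / (x + x0)).sum()) - 1.0
        converged = abs(x0_new - x0) < tol
        x0 = x0_new
        if converged:
            break
    alpha = n / np.log((x + x0) / (1.0 + x0)).sum()
    return alpha, x0


def sample_shifted_power_law(alpha, x0, size, rng):
    """Inverse-CDF sampling: P(X > x) = ((x + x0) / (1 + x0))**(-alpha) for x >= 1."""
    u = 1.0 - rng.random(size)            # uniform on (0, 1]
    return (1.0 + x0) * u ** (-1.0 / alpha) - x0


rng = np.random.default_rng(12345)
synthetic = sample_shifted_power_law(0.89, 20.34, 100_000, rng)
alpha_hat, x0_hat = fit_shifted_power_law(synthetic)
print(alpha_hat, x0_hat)
```

The iteration contracts slowly (each step removes only a modest fraction of the remaining error in x₀), so a tight tolerance can require a few hundred passes over the data.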

Akaike Information Criterion (AIC)

The Akaike Information Criterion is defined as AIC = −2 log L + 2V, where L is the maximum likelihood of a fitted model and V is the number of free parameters. The delta Akaike Information Criterion is calculated as ΔAIC = AIC − AIC_min, where AIC_min corresponds to the minimum value over all the candidate models and AIC is the value of the candidate model. The Akaike weight

w_{i} (AIC) = \frac{\exp (- \frac{1}{2} Δ_{i} AIC)}{\sum_{k = 1}^{M} \exp (- \frac{1}{2} Δ_{k} AIC)}

can be interpreted as the probability that the model is the best model (in the AIC sense, that it minimizes the Kullback–Leibler discrepancy), given the data and the set of candidate models (e.g., Burnham & Anderson, 2001).
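The weights defined above are straightforward to evaluate. The sketch below applies the formula to the ΔAIC values listed for the upper ocean in Table 1; the function name is hypothetical.

```python
import math


def akaike_weights(delta_aic):
    """Map a dict of model name -> delta AIC to a dict of Akaike weights."""
    relative = {name: math.exp(-0.5 * d) for name, d in delta_aic.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# Delta AIC values for the upper ocean row of Table 1.
upper_ocean = {"PL": 0.47, "TPL": 0.0, "LN": 0.74, "Weibull": 0.77}
weights = akaike_weights(upper_ocean)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

With ΔAIC differences below 1, as in this row, no single candidate receives a dominant weight, which is consistent with the cautious wording that a power-law tail is the most likely, rather than the only plausible, fit.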

Extrapolation of abundance distributions for larger number of samples

For the upper ocean, the abundance distribution is fitted to a double power-law defined as P(x) = A x^{−1−δ} for x < x_c and P(x) = B x^{−1−α} for x > x_c. A continuity condition (A x_c^{−1−δ} = B x_c^{−1−α}) and the normalization condition (1 = ∫₁^∞ P(x) dx) give the values for the constants A and B as A = αδ (α + (δ − α) x_c^{−δ})⁻¹ and B = A x_c^{α−δ}. In order to fit this distribution, we have to obtain estimates for the two exponents δ and α and for the cutoff x_c. We first use the maximum likelihood method implemented in ref. ³⁰, which fits the exponent for the tail α and the value of the cutoff x_c. Then we adjust the value of the exponent for the range [1, x_c] by using the same method, only fixing the minimum value to 1 and disregarding any data over the cutoff value x_c. In order to extract the behavior of the parameters for an increasingly large ecosystem, we used increasingly large random aggregations of samples from the TARA Oceans Expedition (139 samples in total). The average parameters for aggregations of samples with a similar total number of reads are shown in the left column of Supplementary Fig. 2 in black, and the error bars reflect their standard deviation. Next, in order to extrapolate these parameters to larger numbers of reads, we fitted the estimated parameters to some simple curves (shown in red in Supplementary Fig. 2). The results were x_c = 0.0002 · N_reads^{1.1} + 52.6, δ = 0.32 (1 + 0.71 exp(−N_reads/570007)) and α = 1.42 (1 − 0.2 exp(−N_reads/110185)). Note that the values of the scaling exponent of the tail of the distribution α are in agreement with recently reported estimates³⁴. For the in silico generation of larger samples, we extrapolated the parameter values to the value corresponding to the desired number of reads and generated random numbers from the corresponding distribution up to the desired number of reads, using the method of the inversion of the cumulative distribution.
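The generation step can be sketched as follows. The code evaluates the extrapolation curves quoted above and draws abundances from the double power-law by inverting its cumulative distribution, using the continuous density normalized on [1, ∞). Treating abundances as continuous values and the function names are assumptions for illustration; the original implementation may differ, for example in how draws are rounded or accumulated up to the target number of reads.

```python
import math
import numpy as np


def extrapolated_parameters(n_reads):
    """Upper-ocean parameter curves quoted in the text: returns (alpha, delta, x_c)."""
    x_c = 0.0002 * n_reads ** 1.1 + 52.6
    delta = 0.32 * (1.0 + 0.71 * math.exp(-n_reads / 570007))
    alpha = 1.42 * (1.0 - 0.2 * math.exp(-n_reads / 110185))
    return alpha, delta, x_c


def cdf_at_crossover(alpha, delta, x_c):
    """Probability mass of the head regime, F(x_c)."""
    A = alpha * delta / (alpha + (delta - alpha) * x_c ** (-delta))
    return A * (1.0 - x_c ** (-delta)) / delta


def sample_double_power_law(alpha, delta, x_c, size, rng):
    """Inverse-CDF sampling from the double power-law on [1, inf)."""
    A = alpha * delta / (alpha + (delta - alpha) * x_c ** (-delta))
    B = A * x_c ** (alpha - delta)
    f_c = cdf_at_crossover(alpha, delta, x_c)
    u = rng.random(size)                          # uniform on [0, 1)
    x = np.empty(size)
    head = u < f_c
    # Head: F(x) = A (1 - x**-delta) / delta  ->  x = (1 - delta u / A)**(-1/delta)
    x[head] = (1.0 - delta * u[head] / A) ** (-1.0 / delta)
    # Tail: 1 - F(x) = B x**-alpha / alpha    ->  x = (alpha (1 - u) / B)**(-1/alpha)
    x[~head] = (alpha * (1.0 - u[~head]) / B) ** (-1.0 / alpha)
    return x


rng = np.random.default_rng(7)
draws = sample_double_power_law(1.54, 0.36, 2313.0, 200_000, rng)
print(extrapolated_parameters(1e8), (draws < 2313.0).mean())
```

To build a community of a given size, one could keep drawing abundances in batches until their running sum reaches the desired number of reads.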

For the deep ocean, the abundance distribution is fitted to a shifted power-law P(x) = A(x + x₀)^{−1−α} with a maximum possible value for the abundance, x_max. The value of A is given by the normalization condition (1 = ∫₁^{x_max} P(x) dx) and is A = α((1 + x₀)^{−α} − (x_max + x₀)^{−α})⁻¹. In this case, we again need to estimate three parameters to fit the distribution. To estimate them, we first fitted the exponent α and the shifting parameter x₀ by iteratively solving the maximum likelihood equations:

α = S {(\sum_{i = 1}^{S} \log \frac{x_{i} + x_{0}}{1 + x_{0}})}^{- 1}

1 + x_{0} = α S {((1 + α) \sum_{i = 1}^{S} \frac{1}{x_{i} + x_{0}})}^{- 1},

where S stands for the number of data points. With those estimated parameters, we estimated the maximum abundance x_max through the average abundance ⟨x⟩ found in the data by solving the implicit equation ⟨x⟩ = ∫₁^{x_max} x P(x) dx:

⟨x⟩ = \frac{α}{1 - α} \frac{{(x_{\max} + x_{0})}^{1 - α} - {(1 + x_{0})}^{1 - α}}{{(1 + x_{0})}^{- α} - {(x_{\max} + x_{0})}^{- α}} - x_{0}
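Because the mean of the truncated distribution grows monotonically with the truncation point, the implicit equation for x_max can be solved by bisection. The sketch below does this on a logarithmic grid, using the fitted deep-ocean values α = 0.89 and x₀ = 20.34 and a target mean taken from the deep-ocean extrapolation curve for ⟨x⟩ given in this section, evaluated at 1.8 × 10⁶ reads. The solver, its bracketing interval, and the function names are assumptions, not the authors' procedure.

```python
import math


def mean_abundance(x_max, alpha, x0):
    """Mean of the shifted power-law truncated to [1, x_max] (valid for alpha != 1)."""
    numerator = (x_max + x0) ** (1.0 - alpha) - (1.0 + x0) ** (1.0 - alpha)
    denominator = (1.0 + x0) ** (-alpha) - (x_max + x0) ** (-alpha)
    return alpha / (1.0 - alpha) * numerator / denominator - x0


def solve_x_max(mean_target, alpha, x0, lo=2.0, hi=1e12, iterations=200):
    """Bisection in log space for the x_max that reproduces mean_target."""
    if not mean_abundance(lo, alpha, x0) < mean_target < mean_abundance(hi, alpha, x0):
        raise ValueError("target mean is outside the bracketing interval")
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)                  # geometric midpoint
        if mean_abundance(mid, alpha, x0) < mean_target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


n_reads = 1.8e6                                   # Malaspina sequencing effort
mean_target = 0.00042 * n_reads ** 0.97 + 23.6    # extrapolation curve for <x>
x_max = solve_x_max(mean_target, alpha=0.89, x0=20.34)
print(mean_target, x_max)
```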

The parameters are shown in the right column of Supplementary Fig. 2 and again in black are average estimates with standard deviations shown with error bars, and in red the simple fitted curves used for the extrapolation. In this case the simple curves fitted were x₀ = 0.000003 N_reads^1.1 – 1, α = 0.88 (1 − 0.45 exp(−N_reads/363263)) and <x> = 0.00042 N_reads^0.97 + 23.6.

The estimation for a larger number of reads was performed as for the upper ocean but using the proper shifted power-law distribution as given by the extrapolated parameters.

Acknowledgements

This research was funded by the Malaspina Circumnavigation Expedition supported by the Spanish Ministry of Science and Innovation through project Consolider-Ingenio Malaspina 2010 (CSD2008-00077) as well as King Abdullah University of Science and Technology (KAUST) through baseline funding to C.M. Duarte and X. Irigoien; by Agencia Estatal de Investigación (AEI) and Fondo Europeo de Desarrollo Regional (FEDER) through projects SPASIMM FIS2016-80067-P (AEI/FEDER, UE) and REMEI (CTM2015-70340-R); and by the Spanish State Research Agency through the María de Maeztu Program for Units of Excellence in R&D (MDM-2017-0711 to IFISC). S.S. is supported by the ETH and Helmut Horten Foundation. We thank Craig Michel for sequencing library preparation and Laura Casas for laboratory work. Further, we thank Naroa Aldanondo, Susana Carvalho, Amr Gusti, Karie Holtermann, Ioannis Georgakakis, Nazia Mojib and Tane Sinclair-Taylor as well as the personnel of the Coastal & Marine Resources core laboratory (CMOR) for their help in undertaking the sampling.

Author contributions

V.M.E. and C.M.D. conceived the idea; V.M.E., C.M.D. and J.F.-G. performed the analysis; V.M.E., G.S., J.F.-G., J.K.P., J.M.G., S.A., S.S., X.I. and C.M.D. contributed to the discussion and the writing of the manuscript.

Data availability

The TARA expedition dataset is available at http://ocean-microbiome.embl.de/companion.html; the Malaspina expedition dataset is available at https://github.com/GuillemSalazar/MolEcol_2015; and the experimental data have been deposited in the NCBI Sequence Read Archive under the accession number SRP051855.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information is available for this paper at 10.1038/s41598-019-54936-y.

References

1. Whitman WB, Coleman DC, Wiebe WJ. Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences. 1998;95:6578–6583. doi: 10.1073/pnas.95.12.6578.
2. Pedrós-Alió C, Manrubia S. The vast unknown microbial biosphere. Proceedings of the National Academy of Sciences. 2016;113:6585–6587. doi: 10.1073/pnas.1606105113.
3. Rosselló-Mora R, Amann R. The species concept for prokaryotes. FEMS Microbiology Reviews. 2001;25:39–67. doi: 10.1111/j.1574-6976.2001.tb00571.x.
4. Salazar G, et al. Global diversity and biogeography of deep-sea pelagic prokaryotes. ISME J. 2016;10:596–608. doi: 10.1038/ismej.2015.137.
5. Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B, Zeller G, Mende DR, Alberti A, Cornejo-Castillo FM, Costea PI, Cruaud C, d'Ovidio F, Engelen S, Ferrera I, Gasol JM, Guidi L, Hildebrand F, Kokoszka F, Lepoivre C, Lima-Mendez G, Poulain J, Poulos BT, Royo-Llonch M, Sarmento H, Vieira-Silva S, Dimier C, Picheral M, Searson S, Kandels-Lewis S, Bowler C, de Vargas C, Gorsky G, Grimsley N, Hingamp P, Iudicone D, Jaillon O, Not F, Ogata H, Pesant S, Speich S, Stemmann L, Sullivan MB, Weissenbach J, Wincker P, Karsenti E, Raes J, Acinas SG, Bork P, Boss E, Bowler C, Follows M, Karp-Boss L, Krzic U, Reynaud EG, Sardet C, Sieracki M, Velayoudon D. Structure and function of the global ocean microbiome. Science. 2015;348:1261359. doi: 10.1126/science.1261359.
6. Zinger L, et al. Global Patterns of Bacterial Beta-Diversity in Seafloor and Seawater Ecosystems. PLOS ONE. 2011;6:e24570. doi: 10.1371/journal.pone.0024570.
7. Locey KJ, Lennon JT. Scaling laws predict global microbial diversity. Proceedings of the National Academy of Sciences. 2016;113:5970–5975. doi: 10.1073/pnas.1521291113.
8. Yooseph S, Nealson KH, Rusch DB, McCrow JP, Dupont CL, Kim M, Johnson J, Montgomery R, Ferriera S, Beeson K, Williamson SJ, Tovchigrechko A, Allen AE, Zeigler LA, Sutton G, Eisenstadt E, Rogers Y-H, Friedman R, Frazier M, Venter JC. Genomic and functional adaptation in surface ocean planktonic prokaryotes. Nature. 2010;468:60–66. doi: 10.1038/nature09530.
9. Fisher RA, Corbet AS, Williams CB. The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population. Journal of Animal Ecology. 1943;12:42–58. doi: 10.2307/1411.
10. Sanders HL. Marine Benthic Diversity: A Comparative Study. The American Naturalist. 1968;102:243–282. doi: 10.1086/282541.
11. Gart JJ, Siegel AF, German RZ. Rarefaction and Taxonomic Diversity. Biometrics. 1982;38:235–241. doi: 10.2307/2530306.
12. Gotelli NJ, Colwell RK. Quantifying biodiversity: procedures and pitfalls in the measurement and comparison of species richness. Ecology Letters. 2001;4:379–391. doi: 10.1046/j.1461-0248.2001.00230.x.
13. Haegeman B, Hamelin J, Moriarty J, Neal P, Dushoff J, Weitz JS. Robust estimation of microbial diversity in theory and in practice. The ISME Journal. 2013;7:1092–1101. doi: 10.1038/ismej.2013.10.
14. Gotelli NJ, Colwell RK. Estimating species richness. Biological diversity: frontiers in measurement and assessment. 2011;12:39–54.
15. Gotelli NJ, Chao A. In Encyclopedia of Biodiversity (Second Edition) 195–211 (Academic Press, 2013).
16. Chao A, Colwell RK, Lin C-W, Gotelli NJ. Sufficient sampling for asymptotic minimum species richness estimators. Ecology. 2009;90:1125–1133. doi: 10.1890/07-2147.1.
17. Willis JC, Yule GU. Some Statistics of Evolution and Geographical Distribution in Plants and Animals, and their Significance. Nature. 1922;109:177–179. doi: 10.1038/109177a0.
18. Newman MEJ. Power laws, Pareto distributions and Zipf’s law. Contemporary Physics. 2005;46:323–351. doi: 10.1080/00107510500052444.
19. Yule GU. A Mathematical Theory of Evolution, Based on the Conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society B: Biological Sciences. 1925;213:21–87. doi: 10.1098/rstb.1925.0002.
20. Simon HA. On a class of skew distribution functions. Biometrika. 1955;42:425–440. doi: 10.1093/biomet/42.3-4.425.
21. Simkin MV, Roychowdhury VP. Re-inventing Willis. Physics Reports. 2011;502:1–35. doi: 10.1016/j.physrep.2010.12.004.
22. Perc M. The Matthew effect in empirical data. J. R. Soc. Interface. 2014;11:20140378. doi: 10.1098/rsif.2014.0378.
23. Deluca A, Corral A. Fitting and goodness-of-fit test of non-truncated and truncated power-law distributions. Acta Geophysica. 2013;61:1351–1394. doi: 10.2478/s11600-013-0154-9.
24. Voitalov I, van der Hoorn P, van der Hofstad R, Krioukov D. Scale-free networks well done. Phys. Rev. Research. 2019;1:033034.
25. Pearman JK, Casas L, Merle T, Michell C, Irigoien X. Bacterial and protist community changes during a phytoplankton bloom. Limnology and Oceanography. 2016;61:198–213. doi: 10.1002/lno.10212.
26. Curtis TP, Sloan WT, Scannell JW. Estimating prokaryotic diversity and its limits. Proceedings of the National Academy of Sciences. 2002;99:10494–10499. doi: 10.1073/pnas.142680199.
27. Hoffmann KH, et al. Power law rank–abundance models for marine phage communities. FEMS Microbiology Letters. 2007;273:224–228. doi: 10.1111/j.1574-6968.2007.00790.x.
28. Datta S, Delius GW, Law R, Plank MJ. A stability analysis of the power-law steady state of marine size spectra. Journal of Mathematical Biology. 2011;63:779–799. doi: 10.1007/s00285-010-0387-z.
29. Rozenfeld AF, et al. Network analysis identifies weak and strong links in a metapopulation system. Proceedings of the National Academy of Sciences. 2008;105:18824–18829. doi: 10.1073/pnas.0805571105.
30. Edwards RA, Rohwer F. Viral metagenomics. Nature Reviews Microbiology. 2005;3:504. doi: 10.1038/nrmicro1163.
31. Angly F, et al. PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information. BMC Bioinformatics. 2005;6:41. doi: 10.1186/1471-2105-6-41.
32. Hugoni M, et al. Structure of the rare archaeal biosphere and seasonal dynamics of active ecotypes in surface coastal waters. Proceedings of the National Academy of Sciences. 2013;110:6004–6009. doi: 10.1073/pnas.1216863110.
33. Clauset A, Shalizi CR, Newman MEJ. Power-Law Distributions in Empirical Data. SIAM Rev. 2009;51:661–703. doi: 10.1137/070710111.
34. Ser-Giacomi E, et al. Ubiquitous abundance distribution of non-dominant plankton across the global ocean. Nature Ecology & Evolution. 2018;2:1243–1249. doi: 10.1038/s41559-018-0587-2.

