Proceedings of the National Academy of Sciences of the United States of America. 2006 Feb 13;103(8):2736–2739. doi: 10.1073/pnas.0511083103

Phanerozoic marine biodiversity dynamics in light of the incompleteness of the fossil record

Peter J Lu, Motohiro Yogo, Charles R Marshall
PMCID: PMC1413823  PMID: 16477008

Abstract

Long-term evolutionary dynamics have been approached through quantitative analysis of the fossil record, but without explicitly taking its incompleteness into account. Here we explore the temporal covariance structure of per-genus origination and extinction rates for global marine fossil genera throughout the Phanerozoic, both before and after corrections for the incompleteness of the fossil record. Using uncorrected data based on Sepkoski’s compendium, we find significant autocovariance within origination and extinction rates, as well as covariance between extinction and origination, not one, but two, intervals later, corroborating evidence for the unexplained temporal gap found by past studies. However, these effects vanish when the data are corrected for the incompleteness of the fossil record. Instead, we observe significant covariance only between extinction and origination in the immediately following intervals. The gap in the response of the biosphere to extinction in the uncorrected fossil record thus appears to be an artifact of the incompleteness of the fossil record, specifically due to episodic variation in the probability that taxa will be preserved, on time scales comparable to the temporal resolution of Sepkoski’s data. Our results also indicate that at that temporal resolution (the stage/substage of duration ≈5 million years), changes in origination and extinction do not persist for longer than one interval, except that elevated origination rates immediately after extinction may last for more than a single interval. Thus, although certain individual cases may deviate from the overall pattern, we find that in general the biosphere’s response to perturbation is immediate geologically and usually short-lived.

Keywords: extinction, Jaanusson effect, origination, Signor–Lipps effect, vector autoregression


Sepkoski’s compendia of marine fossil families (1) and genera (2, 3) have been central to the analysis of the long-term patterns of origination, extinction, and overall diversity of marine animals throughout the Phanerozoic. Many analyses have looked for temporal correlations between rates of origination, extinction, and total biodiversity in stages and substages of the Phanerozoic (4–9). Of special interest has been the discovery of the delayed recovery of diversity after times of unusually high extinction (5, 6), as well as the observation that times of high extinction and origination tend to persist for more than a substage (5–7). These results suggest either that the causes of elevated rates of origination or extinction persist for many millions of years, and/or that the biosphere has some intrinsic limits in how fast it can recover from extinction (6).

However, as all of these studies have recognized, Sepkoski’s raw data do not include any corrections for the incompleteness of the fossil record. Importantly, there is growing evidence that this incompleteness, and its temporal variation, can have significant effects on perceived patterns of origination, extinction, and biodiversity in the fossil record (10–12). While some analyses have been adjusted in an attempt to account for incompleteness (e.g., ref. 6), none has performed a basic time-series analysis on a data set that has used geological and paleontological first principles to quantitatively correct the observed rates of origination and extinction derived from Sepkoski’s compendium of marine genera for the effects of incompleteness. Foote has recently presented such a data set (13), and its differences from Sepkoski’s raw data are striking. Specifically, Foote has noted that the measured peaks of origination and extinction in the corrected data are far more volatile than in Sepkoski’s uncorrected data (representative time series are presented in Fig. 1), and he cautioned that between-stage correlations in rates of origination and extinction may indeed be artifacts created by the incompleteness of the fossil record.

Fig. 1.

Per-genus rates of (a) origination $O_i$ and (b) extinction $E_i$ as a function of time. Uncorrected data (Upper) and data corrected for the incompleteness of the fossil record (Lower) are presented. Diversity has been counted by using the boundary-crosser method, assuming pulsed turnover (13). Different geological periods are indicated in standard colors; average rates are indicated by dotted lines.

Analytic Approach

Here we examine how the previously reported correlations of between-stage rates of origination and extinction (5, 6, 8) are affected when Foote’s corrected data are subjected to time-series analysis. Specifically, we use vector autoregression (VAR) analysis (14) to examine the temporal covariance structure between per-genus origination and extinction rates of Phanerozoic marine genera, for both Sepkoski’s raw data and the data corrected by Foote for the incompleteness of the fossil record (13).

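For each $i$th interval of duration $t_i$ in the time-series data (Fig. 1), we calculated the covariances between per-genus rates of origination $O_i$ and extinction $E_i$ with the originations $O_{i-n}$ and extinctions $E_{i-n}$ in the $n$ previous intervals, denoted VAR($n$). Combining originations and extinctions into the two-dimensional vector $x_i = (O_i, E_i)$, the VAR(1) model is $x_i = \Phi_1 x_{i-1} + \varepsilon_i$, where $\Phi_1$ is the 2 × 2 matrix of normalized covariance coefficients relating to origination and extinction one interval previous, $\varepsilon_i$ represents serially uncorrelated random noise, and intercepts have been omitted to simplify notation. The VAR(2) model extends the analysis to two lags, and is defined as $x_i = \Phi_1 x_{i-1} + \Phi_2 x_{i-2} + \varepsilon_i$, explicitly:

$$
\begin{pmatrix} O_i \\ E_i \end{pmatrix}
=
\begin{pmatrix} \Phi_1^{oo} & \Phi_1^{eo} \\ \Phi_1^{oe} & \Phi_1^{ee} \end{pmatrix}
\begin{pmatrix} O_{i-1} \\ E_{i-1} \end{pmatrix}
+
\begin{pmatrix} \Phi_2^{oo} & \Phi_2^{eo} \\ \Phi_2^{oe} & \Phi_2^{ee} \end{pmatrix}
\begin{pmatrix} O_{i-2} \\ E_{i-2} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_i^{o} \\ \varepsilon_i^{e} \end{pmatrix}
\tag{1}
$$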
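The VAR(2) model above can be estimated equation by equation with ordinary least squares. The following is a minimal sketch under stated assumptions, not the authors' code: the function name `fit_var`, the synthetic series, and the "true" coefficient values are all hypothetical, and an intercept is included even though the text omits it from the notation. The matrix orientation is chosen so that entry [0, 1] of each returned matrix is the extinction → origination coefficient and entry [1, 0] is the origination → extinction coefficient.

```python
import numpy as np

def fit_var(x, n_lags=2):
    """OLS fit of x_i = c + sum_k Phi_k x_{i-k} + eps_i, where x has shape (T, d)."""
    x = np.asarray(x, dtype=float)
    T, d = x.shape
    # Regressors for interval i: intercept, x_{i-1}, ..., x_{i-n_lags}
    lagged = [x[n_lags - k : T - k] for k in range(1, n_lags + 1)]
    X = np.hstack([np.ones((T - n_lags, 1))] + lagged)
    Y = x[n_lags:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (1 + d * n_lags, d)
    intercept = B[0]
    # Phi_k[r, c] multiplies component c of x_{i-k} in the equation for component r of x_i
    phis = [B[1 + (k - 1) * d : 1 + k * d].T for k in range(1, n_lags + 1)]
    return intercept, phis

# Synthetic check with hypothetical coefficients (columns of x: origination, extinction)
rng = np.random.default_rng(0)
phi1_true = np.array([[0.1, 0.8],   # origination responds to extinction one interval back
                      [0.0, 0.2]])
phi2_true = np.zeros((2, 2))
x = np.zeros((5000, 2))
for i in range(2, len(x)):
    x[i] = phi1_true @ x[i - 1] + phi2_true @ x[i - 2] + rng.normal(scale=0.1, size=2)

intercept, (phi1_hat, phi2_hat) = fit_var(x, n_lags=2)
print(np.round(phi1_hat, 2))
print(np.round(phi2_hat, 2))
```

With a long synthetic series the estimates should sit close to the values used to generate it. The real series have only 77 to 101 intervals, so their estimates carry correspondingly larger standard errors.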

Previous analyses (4, 6, 8) have determined autocorrelations and cross-correlations, separately normalized to the range from −1 to +1, which precludes direct numerical comparison between the magnitudes of an autocorrelation within a time series and a cross-correlation with another. Here the multidimensional VAR technique (14) simultaneously determines the relative contributions of the two, but also uses a different normalization: clearly, autocovariance and covariance coefficients cannot both be constrained to the same finite range if their relative magnitudes are not bounded. Instead, the VAR coefficients are normalized by the inverse of the variance, which in our two-dimensional system constrains the autocovariance coefficients (diagonal $\Phi$ matrix elements) to the range from −1 to +1, the same as for the more familiar autocorrelation (the autocovariance of a time series of one-dimensional scalars). An origination autocovariance coefficient $\Phi_1^{oo}$ = 0.5 ± 0.2 implies that, holding extinction at its average value, if origination in the previous time interval is 1% above average, origination will be 0.5 ± 0.2% above average in the present interval. By contrast, the covariance coefficients (off-diagonal $\Phi$ matrix elements) are determined relative to the normalized autocovariance coefficients; they are therefore numerically unconstrained, and in particular can have values greater than +1. Although these covariance coefficients cannot be easily mapped to cross-correlation coefficients, quantitative interpretation is nonetheless straightforward. For example, a value of the covariance coefficient $\Phi_1^{oe}$ = 1.5 ± 0.5 implies that, holding extinction at its average value, if origination in the previous time interval is 1% above average, extinction will be 1.5 ± 0.5% above average in the present interval.
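The two illustrative readings above can be written out as a short worked calculation. This is a sketch under stated assumptions: the symbols $\Delta O$ and $\Delta E$ (deviations from the respective averages) are introduced here for convenience, only one lag is used, and the noise term is dropped so that the result is an expected value.

```latex
\begin{aligned}
\Delta O_i &= \Phi_1^{oo}\,\Delta O_{i-1} + \Phi_1^{eo}\,\Delta E_{i-1}, \\
\Delta E_i &= \Phi_1^{oe}\,\Delta O_{i-1} + \Phi_1^{ee}\,\Delta E_{i-1}.
\end{aligned}
% Illustrative values from the text: \Phi_1^{oo} = 0.5,\ \Phi_1^{oe} = 1.5,
% with \Delta O_{i-1} = 1\% and \Delta E_{i-1} = 0 (extinction held at its average):
\begin{aligned}
\Delta O_i &= 0.5 \times 1\% + \Phi_1^{eo} \times 0 = 0.5\%, \\
\Delta E_i &= 1.5 \times 1\% + \Phi_1^{ee} \times 0 = 1.5\%.
\end{aligned}
```

Because the previous extinction deviation is set to zero, the values of $\Phi_1^{eo}$ and $\Phi_1^{ee}$ do not enter this particular example.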

Data

We analyzed several published tabulations of per-genus rates of origination and extinction of Phanerozoic genera. The uncorrected raw binned data come from the compendium of fossil genera published by Sepkoski (3), and have been used in previous studies (6–8). We used the 101 time intervals running from the Cambrian (543 million years ago) to the present, and excluded genera appearing in only a single interval (“singletons”) to minimize Lagerstätten and monographic effects. Per-genus rates of origination (extinction) were defined as the fraction of total genera that appeared (disappeared) in an interval.
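The binned rate definition above can be sketched in a few lines. This is an illustration only: the function name `per_genus_rates` and the toy ranges are hypothetical, and "total genera" is assumed here to mean every non-singleton genus whose range includes the interval, which may differ in detail from the tabulation actually used.

```python
from collections import Counter

def per_genus_rates(ranges, n_intervals):
    """Return [(origination_rate, extinction_rate), ...] per interval.

    ranges: (first, last) interval index for each genus, with 0 the oldest interval.
    """
    kept = [(f, l) for f, l in ranges if f != l]  # exclude singletons
    originations = Counter(f for f, _ in kept)
    extinctions = Counter(l for _, l in kept)
    rates = []
    for i in range(n_intervals):
        total = sum(1 for f, l in kept if f <= i <= l)  # genera present in interval i
        if total == 0:
            rates.append((None, None))  # rate undefined when nothing is present
        else:
            rates.append((originations[i] / total, extinctions[i] / total))
    return rates

# Toy example: five genera, one of which, (2, 2), is a singleton and is dropped
toy_ranges = [(0, 2), (0, 1), (1, 3), (2, 2), (1, 2)]
toy_rates = per_genus_rates(toy_ranges, n_intervals=4)
print(toy_rates)
```

In the toy data, interval 1 contains four non-singleton genera, two of which first appear there and one of which last appears there, giving rates of 0.5 and 0.25.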

A second major tabulation is Foote’s boundary-crossing data (13), which were broken into 77 intervals corresponding mainly to geological stages. With the boundary-crosser approach, we have only point estimates of origination and extinction rates. Compiling the rates of origination and extinction depends on where in the intervening interval the taxa actually originated or went extinct. Foote used two end-member approaches for dealing with this uncertainty: (i) he assumed that taxa originated and went extinct at a uniform rate throughout the interval (the model of continuous turnover); or, (ii) he assumed that all originations occur at the beginning, and all extinctions, at the end, of the interval (the model of pulsed turnover). The uncorrected data presented in Fig. 1 are tabulated assuming this pulsed model (figure 2 of ref. 13), which currently seems the more likely approach (15).

Fig. 2.

Correcting for the incompleteness of the fossil record causes autocovariance within origination, and within extinction, to disappear. Scatter plots of origination $O_{i-1}$ in a given interval vs. origination in the following interval $O_i$ are presented for Sepkoski’s uncorrected (a) and corrected (c) data. Similar scatter plots of extinction are presented for uncorrected (b) and corrected (d) data. Boundary-crossers with pulsed turnover are used in all panels. Color coding corresponds to geologic time period, as in Fig. 1. Averages are indicated by black lines. The autocovariance present in the uncorrected data (a and b), manifest as the clumping of data points around the gray line of slope 1, disappears in the corrected data (c and d).

Neither tabulation method, however, accounts for the incompleteness of the fossil record. Foote has used preservation and rock volume effects to estimate true origination and extinction per-taxon rates, in both continuous and pulsed turnover scenarios (figures 3 and 4 of ref. 13), which we term “corrected” data. While it is inherently challenging to verify any model that corrects for missing data, to date Foote is the only one to have made such a correction, and it is highly desirable to determine what effect these corrections have on the perceived diversity dynamics derived from analysis of the uncorrected data. Analyzing the corrected data is especially important because the incompleteness of the fossil record distorts our view of the history of life; for example, observed times of extinction always predate true times of extinction, unless fossil misidentifications or correlation errors occur. The observed temporal ranges of taxa are not only incomplete but also biased.

Fig. 3.

Episodic incompleteness of the fossil record can cause a rapid recovery from extinction to appear delayed. (a) Complete preservation: an extinction pulse at $t_{i-1}$ followed immediately by complete recovery (origination pulse) at $t_i$. (b) Uniform, incomplete preservation: assuming an exponential distribution of waiting times between a fossil’s apparent and true extinction (6), the pulse of extinction at $t_{i-1}$ (dark purple) is smeared over $t_{i-2}$ and $t_{i-3}$ (lighter purple); the analogous argument applies to origination (blue). Nonetheless, the origination distribution’s peak at $t_i$ still immediately follows the extinction distribution’s peak at $t_{i-1}$, leaving temporal correlations qualitatively unchanged relative to the case of complete preservation. (c) Episodic, incomplete preservation: preservability in the time interval $t_i$ immediately after an extinction at $t_{i-1}$ is often comparatively low (13), so that taxa that actually originated during $t_i$ will be reported to have originated in $t_{i+1}$ and $t_{i+2}$. This drop in preservability at $t_i$ further decreases the reported origination rate at $t_i$ (blue arrow) while concomitantly increasing the apparent rates in $t_{i+1}$ and $t_{i+2}$ (white arrows), creating the artifactual temporal lag/delay in time of recovery from extinction that appears in Sepkoski’s data (13).

Results

As a baseline to compare with previous studies, we first analyzed the uncorrected data in both binned and pulsed boundary-crosser tabulations. We then analyzed the corrected data in the preferred pulsed model, and for comparison we also analyzed the corrected data by assuming continuous turnover. Our results are presented in Table 1, with coefficients reported as the point estimate ± one standard error, with significant coefficients (P < 0.05) in boldface type.

Table 1.

VAR(2) coefficients for all data sets, reported as point estimate ± heteroskedasticity-robust standard errors

| Covariance | Lag | Coefficient | Uncorrected binned | Uncorrected pulsed | Corrected continuous | Corrected pulsed |
|---|---|---|---|---|---|---|
| Origination autocovariance | 1 | $\Phi_1^{oo}$ | 0.60 ± 0.10 | 0.59 ± 0.21 | −0.10 ± 0.13 | 0.05 ± 0.10 |
| | 2 | $\Phi_2^{oo}$ | 0.00 ± 0.08 | −0.26 ± 0.10 | −0.01 ± 0.13 | −0.13 ± 0.07 |
| Origination → extinction covariance | 1 | $\Phi_1^{oe}$ | 0.17 ± 0.13 | 0.08 ± 0.05 | 0.13 ± 0.13 | 0.07 ± 0.04 |
| | 2 | $\Phi_2^{oe}$ | −0.07 ± 0.11 | −0.02 ± 0.02 | 0.22 ± 0.12 | −0.00 ± 0.03 |
| Extinction → origination covariance | 1 | $\Phi_1^{eo}$ | 0.08 ± 0.07 | 0.36 ± 0.38 | 0.62 ± 0.15 | 1.37 ± 0.42 |
| | 2 | $\Phi_2^{eo}$ | 0.23 ± 0.05 | 1.16 ± 0.30 | 0.13 ± 0.13 | 1.49 ± 0.54 |
| Extinction autocovariance | 1 | $\Phi_1^{ee}$ | 0.37 ± 0.17 | 0.46 ± 0.16 | 0.06 ± 0.14 | 0.12 ± 0.12 |
| | 2 | $\Phi_2^{ee}$ | 0.05 ± 0.11 | 0.05 ± 0.10 | −0.19 ± 0.15 | 0.01 ± 0.11 |

Numbers in boldface type are 5% significant values (P < 0.05). Φ components are as in Eq. 1. Uncorrected binned data are from Sepkoski’s compilation (3). Boundary-crosser data, from both uncorrected (pulsed) and corrected (continuous and pulsed) analyses, are as in figures 2, 3, and 4 in ref. 13.
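Table 1 reports heteroskedasticity-robust standard errors but does not say which estimator was used. As a hedged sketch, the simplest such estimator (White's "HC0" sandwich form) is shown below for a single regression equation; the choice of HC0, the function name `ols_with_robust_se`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS coefficients with White (HC0) heteroskedasticity-robust standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over observations of e_i^2 * x_i x_i^T
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Synthetic check (hypothetical data): noise variance grows with |regressor|
rng = np.random.default_rng(1)
z = rng.normal(size=2000)
X_demo = np.column_stack([np.ones_like(z), z])
y_demo = 1.0 + 2.0 * z + rng.normal(size=z.size) * np.abs(z)
beta_demo, se_demo = ols_with_robust_se(X_demo, y_demo)
print(np.round(beta_demo, 2), np.round(se_demo, 3))
```

Each row of a VAR coefficient matrix corresponds to one such regression, with the lagged origination and extinction rates as the columns of `X`.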

Because several factors might have influenced our results, we also ran several additional regressions: we included $t_i$ as another variable to account for varying interval duration, and we included a linear drift (detrended the data), to account for long-term secular decline. These results, presented in Table 2, which is published as supporting information on the PNAS web site, agree within error with those in Table 1.
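One way to set up such robustness regressions is to add the extra controls as columns of the design matrix. The sketch below is an assumption about the setup, not a description of the authors' procedure: the function name `design_with_controls` is hypothetical, the linear drift is represented by the interval index, and including a trend column is used as the equivalent of detrending the series before fitting.

```python
import numpy as np

def design_with_controls(x, durations, n_lags=2):
    """Build regressors [1, t_i, i, x_{i-1}, ..., x_{i-n_lags}] and responses x_i."""
    x = np.asarray(x, dtype=float)
    durations = np.asarray(durations, dtype=float)
    rows = np.arange(n_lags, len(x))
    cols = [np.ones(len(rows)),      # intercept
            durations[rows],         # interval duration t_i
            rows.astype(float)]      # linear drift in the interval index
    for k in range(1, n_lags + 1):
        cols.extend(x[rows - k].T)   # lag-k origination, then lag-k extinction
    return np.column_stack(cols), x[rows]

# Hypothetical inputs: 60 intervals of (origination, extinction) rates and durations
rng = np.random.default_rng(2)
x_demo = rng.random((60, 2))
t_demo = rng.uniform(2.0, 10.0, size=60)
X_ctrl, Y_ctrl = design_with_controls(x_demo, t_demo)
coef, *_ = np.linalg.lstsq(X_ctrl, Y_ctrl, rcond=None)
print(X_ctrl.shape, coef.shape)
```

Rows 3 onward of `coef` hold the lag coefficients, which can then be compared with those from the regression without controls.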

For the uncorrected data, in both binned and boundary-crosser tabulations, we find that per-genus origination and extinction rates have strong autocovariance, even while accounting for extinction-origination covariance (Fig. 2 a and b). This finding agrees with earlier results (4–7), which have formed the basis for the suggestion of inertia in the rates of origination and extinction; i.e., once origination (6, 7) or extinction (5–7) rates increase (decrease), they remain high (low). The only significant covariance is between rates of origination and the rates of extinction two intervals previous, or about 5–10 million years ($\Phi_2^{eo}$ = 1.16 ± 0.30 for boundary-crossers). That is, the present origination rate depends on the extinction rate two intervals previous, and not on the rate in the immediately preceding interval (i.e., $\Phi_1^{eo}$ is insignificant). Although calculated by a completely separate analysis, this result agrees with the peak at 10 million years in the cross-correlation analysis by Kirchner and Weil (6) that has been used to infer delayed recovery after extinction.

The picture changes drastically, however, when Foote’s corrections for the incompleteness of the fossil record (13) are applied to Sepkoski’s raw data. The autocovariances vanish entirely (Fig. 2 c and d). Note that the statistical result is not a negative autocovariance: it is not the case that a high rate of extinction presages a low rate in the next interval, for example. The temporal gap between extinction and subsequent origination also disappears entirely; no evidence for delayed recovery remains. In Foote’s continuous model, the two-interval lag $\Phi_2^{eo}$ becomes insignificant, replaced by a significant one-interval lag $\Phi_1^{eo}$. In Foote’s pulsed model, both one-interval and two-interval lags contribute significantly.
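The significance statements for the extinction → origination coefficients can be checked roughly against the point estimates and standard errors in Table 1. The sketch below assumes a two-sided test against a standard normal reference distribution, which is an assumption made here; the paper does not state the test used, so values near the 5% threshold could be classified differently. The helper name `two_sided_p` is hypothetical.

```python
import math

def two_sided_p(estimate, std_err):
    """z statistic and two-sided p value under a standard normal approximation."""
    z = estimate / std_err
    return z, math.erfc(abs(z) / math.sqrt(2))

# (estimate, standard error) pairs copied from Table 1
eo_coefficients = {
    "uncorrected pulsed, lag 1": (0.36, 0.38),
    "uncorrected pulsed, lag 2": (1.16, 0.30),
    "corrected continuous, lag 1": (0.62, 0.15),
    "corrected continuous, lag 2": (0.13, 0.13),
    "corrected pulsed, lag 1": (1.37, 0.42),
    "corrected pulsed, lag 2": (1.49, 0.54),
}

significant = {}
for label, (est, se) in eo_coefficients.items():
    z, p = two_sided_p(est, se)
    significant[label] = p < 0.05
    print(f"{label}: z = {z:.2f}, p = {p:.4f}")
```

Under this approximation the pattern matches the text: only the two-interval lag is significant in the uncorrected pulsed data, only the one-interval lag in the corrected continuous data, and both lags in the corrected pulsed data.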

Discussion

Our analysis suggests that the temporal gap between extinction and subsequent origination in the uncorrected data is an artifact of the smearing back of extinctions [Signor–Lipps effect (16)] and the smearing forward of originations [the Sppil–Rongis (6, 17) or Jaanusson (18, 19) effect] due to incomplete preservation. If a taxon’s true last occurrence is missed because of incomplete preservation, its extinction will be mistakenly reported too early; conversely, originations will be misreported too late. However, if the fossil record were uniformly incomplete, then, as Kirchner and Weil point out (6), the Signor–Lipps and Jaanusson effects are not sufficient to explain why delayed recovery after extinction should be observed in the uncorrected data if delayed recovery were not a real feature of the history of life (Fig. 3 a and b). But the Signor–Lipps and Jaanusson effects can produce the false appearance of delayed recovery if the fossil record is nonuniformly incomplete (Fig. 3c). In fact, there is strong empirical evidence that preservation potential is not uniform through time (10–12). Foote’s corrections to Sepkoski’s data allow for stage-by-stage variation in the quality of preservation, and he found not only that preservation probabilities vary widely from stage to stage, but also that they correlate highly with the amount of preserved marine sedimentary rock (figure 5 of ref. 13). It appears that the delayed recovery observed in the uncorrected data is specifically an artifact of the high temporal variability in the incompleteness of the fossil record (Fig. 3). Piecemeal examination of the stratigraphic record lends support to this conclusion, where extinction is often observed in intervals of relatively high preservability, whereas the subsequent originations first appear in immediately following intervals of comparatively low preservability (13).

Highly episodic preservation probabilities are also expected, given our knowledge of the processes responsible for the deposition of sedimentary rock, which houses the fossil record. Explicit sequence stratigraphic modeling of how both tectonic factors and changes in eustatic sea-level affect the structure of the sedimentary rock record predicts that, even with completely uniform origination and extinction, the rock record should exhibit artifactual peaks in extinction followed by delayed peaks in origination (20).
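The contrast between uniform and episodic incompleteness (Fig. 3 b and c) can be illustrated with a toy calculation of when taxa are first observed. This is a simplified sketch under strong assumptions: preservation is independent from interval to interval, every taxon survives long enough to be sampled eventually, and all numerical values and the function name `observed_first_appearances` are hypothetical rather than taken from Foote's data.

```python
def observed_first_appearances(true_orig, preservation):
    """Expected fraction of taxa first *observed* in each interval.

    true_orig[s]: fraction of taxa that truly originate in interval s.
    preservation[t]: probability that an extant taxon is preserved in interval t.
    """
    n = len(preservation)
    observed = [0.0] * n
    for start, fraction in enumerate(true_orig):
        not_yet_seen = fraction
        for t in range(start, n):
            observed[t] += not_yet_seen * preservation[t]
            not_yet_seen *= 1.0 - preservation[t]
    return observed

# A single true origination pulse in interval 2 (right after an extinction in interval 1)
pulse = [0.0, 0.0, 1.0, 0.0, 0.0]
uniform = observed_first_appearances(pulse, [0.5, 0.5, 0.5, 0.5, 0.5])
episodic = observed_first_appearances(pulse, [0.5, 0.5, 0.1, 0.6, 0.6])  # poor record in interval 2

print("uniform :", uniform)
print("episodic:", episodic)
```

With uniform preservation the observed origination peak stays in interval 2 and is merely smeared forward. With low preservation in interval 2 the observed peak moves to interval 3, producing an apparent one-interval delay in recovery even though the true response was immediate.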

Incomplete preservation, whether episodic or not, may also cause members of a group of taxa that in reality all became extinct in a single interval to appear to have gone extinct over several intervals in the uncorrected data (Fig. 3 b and c) (6). This smearing out will broaden the spikes in the extinction and origination record, and the resulting multiperiod peaks will have high autocovariance, which we observe in the uncorrected data. When the smearing is taken out, Foote’s corrected data appear much more volatile, as evident in Fig. 1.
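The link between smearing and autocovariance can be demonstrated with a small simulation. This is a hedged sketch with invented numbers: the "true" series is serially uncorrelated noise, and half of each interval's extinctions are assumed to be recorded one interval too early. The helper name `lag1_autocorr` is hypothetical.

```python
import numpy as np

def lag1_autocorr(series):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    s = np.asarray(series, dtype=float)
    s = s - s.mean()
    return float(np.dot(s[:-1], s[1:]) / np.dot(s, s))

rng = np.random.default_rng(42)
true_rates = rng.normal(size=20000)  # serially uncorrelated "true" extinction intensities

# Signor-Lipps-style smearing: interval t records half of its own extinctions
# plus half of those that truly belong to interval t + 1
smeared = 0.5 * true_rates[:-1] + 0.5 * true_rates[1:]

r_true = lag1_autocorr(true_rates)
r_smeared = lag1_autocorr(smeared)
print(f"true series:    {r_true:.3f}")
print(f"smeared series: {r_smeared:.3f}")
```

For this equal-weight two-interval smearing the expected lag-1 autocorrelation is 0.5, even though the underlying series has none. That is the qualitative pattern seen when comparing the uncorrected and corrected data.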

Our results suggest that the Signor–Lipps and Jaanusson effects each shorten stratigraphic ranges on time scales of stage or substage duration (≈5 million years); i.e., that originations (extinctions) typically occur one stratigraphic interval too late (too early) in the fossil record. This idea is consistent with the results of quantitative sequence stratigraphic models, where species are allowed to evolve and be preserved in basins that are accumulating sediment (21, 22). In these computer simulations, observed stratigraphic ranges may have gaps that span up to half the true species durations; the incompleteness of the stratigraphic ranges implied by Foote’s corrections of Sepkoski’s data is of a similar magnitude. Moreover, data on the preservability of marine taxa are also consistent with the finding that the incompleteness of the fossil record has typically shortened stratigraphic ranges by one or more stages/substages. Under the assumptions of pulsed turnover, a preservability of 0.5 implies that there is a 50% chance that a taxon’s time of first (last) occurrence will not be preserved in the earliest (latest) interval in which it really was extant (15). Foote and Sepkoski (23) estimated the preservability of marine groups in the fossil record, finding that preservabilities are typically about 0.5 at this stratigraphic resolution (e.g., Anthozoa, 0.4–0.5; Crinoidea, 0.36–0.37; Gastropoda, 0.41–0.55; Bivalvia, 0.45–0.51; Echinoidea, 0.56–0.65). Several major groups have significantly lower values (e.g., Malacostraca, 0.19–0.33; Osteichthyes, 0.15–0.29; Chondrichthyes, 0.07–0.18), implying even larger Signor–Lipps and Jaanusson effects, whereas only two have much higher values (Brachiopoda, 0.95–1.0; Cephalopoda, 0.85–0.90), suggesting that the Signor–Lipps and Jaanusson effects should not be significant for these groups in the type of analysis we have conducted here.
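The relation between preservability and range shortening can be made explicit with a simple geometric model. This is a sketch under assumptions introduced here: preservation is independent between intervals with a constant per-interval probability $p$, $k$ denotes the number of intervals by which an observed first (or last) occurrence is displaced, and truncation by the taxon's finite true duration is ignored, which matters most when $p$ is small.

```latex
\Pr(k) = p\,(1-p)^{k}, \qquad k = 0, 1, 2, \ldots
\qquad\Longrightarrow\qquad
\Pr(k \ge 1) = 1 - p, \qquad
\mathbb{E}[k] = \frac{1-p}{p}.
% p = 0.5:   \Pr(k \ge 1) = 0.5,   \mathbb{E}[k] = 1         (about one interval)
% p = 0.95:  \Pr(k \ge 1) = 0.05,  \mathbb{E}[k] \approx 0.05 (negligible displacement)
% p = 0.15:  \Pr(k \ge 1) = 0.85,  \mathbb{E}[k] \approx 5.7  (an upper bound, since real ranges are finite)
```

For $p \approx 0.5$ the expected displacement of about one interval is consistent with originations (extinctions) typically appearing one stratigraphic interval too late (too early).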

The results of the VAR analysis depend in part on when within the stratigraphic intervals the true originations and extinctions actually occurred. At present, there is no definitive answer as to how they are distributed, and although the pulsed model seems more likely, the case is less compelling for originations than for extinctions (15). Assuming that Foote’s corrections improve our view of global evolutionary trends, our analysis suggests that on average the response of origination rates to extinction is immediate, and may persist for more than one time interval, if the pulsed model is indeed the better correction for the limited temporal resolution in Sepkoski’s compendium. The lack of autocovariance in the corrected data also suggests that the response of the biosphere to perturbation in general may be both immediate and short-lived (24).

Finally, it is important to remember that in analyses such as ours here, we are reporting an average, or dominant, signal for the entire time series. In time series as causally rich and complex as Phanerozoic marine diversity, there will be individual instances that do not conform to the general pattern. Whereas our analysis does not support the finding of delayed recovery after extinctions in general (6), empirical data do suggest a delayed recovery in certain individual cases, such as following the end-Permian mass extinction (25), although empirical data from local sections can be hampered by the lack of control for sequence stratigraphic architecture (22). As always, in analyzing phenomenologically rich systems there is a tradeoff between large-scale generalities, which we present here, and detailed event-specific description and explanation.

Supplementary Material

Supporting Table

Acknowledgments

We thank R. Bambach for his binned and phyla tabulations of the data from the late J. Sepkoski’s compendium of fossil genera. We thank M. Foote for providing the raw data from ref. 13 and both M. Foote and S. Holland for very helpful reviews and invaluable discussion. This work was partially supported by National Science Foundation Grants EAR-000385 and DEB-0083983 (to C.R.M.).

Abbreviation

VAR, vector autoregression.

Footnotes

Conflict of interest statement: No conflicts declared.

References

1. Sepkoski J. J., Jr. Milwaukee Pub. Mus. Contr. Biol. Geol. 1992;51:1–125.
2. Sepkoski J. J., Jr. J. Paleontol. 1997;71:533–539. doi: 10.1017/s0022336000040026.
3. Sepkoski J. J., Jr. Bull. Am. Paleontol. 2002;363:1–563.
4. Quinn J. F. Paleobiology. 1987;13:465–478.
5. Stanley S. M. Paleobiology. 1990;16:401–414.
6. Kirchner J. W., Weil A. Nature. 2000;404:177–180. doi: 10.1038/35004564.
7. Kirchner J. W. Nature. 2002;415:65–68. doi: 10.1038/415065a.
8. Kirchner J. W., Weil A. Proc. R. Soc. London B; 2000. pp. 1301–1309.
9. Rohde R. A., Muller R. Nature. 2005;434:208–210. doi: 10.1038/nature03339.
10. Alroy J., Marshall C. R., Bambach R. K., Besuzko K., Foote M., Fürsich F. T., Hansen T. A., Holland S. M., Ivany L. C., Jablonski D., et al. Proc. Natl. Acad. Sci. USA. 2001;98:6261–6266. doi: 10.1073/pnas.111144698.
11. Peters S. E., Foote M. Nature. 2002;416:420–424. doi: 10.1038/416420a.
12. Smith A. B. Phil. Trans. R. Soc. London B. 2001;356:351–367. doi: 10.1098/rstb.2000.0768.
13. Foote M. J. Geol. 2003;111:125–148.
14. Hamilton J. D. Time Series Analysis. Princeton, NJ: Princeton Univ. Press; 1994. pp. 291–350.
15. Foote M. Paleobiology. 2005;31:6–20.
16. Signor P. W., III, Lipps J. H. Geol. Soc. Am. Spec. Pap. 1982;190:291–296.
17. Marshall C. R. In: New Approaches to Speciation in the Fossil Record. Erwin D. H., Anstey R. L., editors. New York: Columbia Univ. Press; 1995. pp. 208–235.
18. Marshall C. R. In: The Adequacy of the Fossil Record. Donovan S. K., Paul C. R. C., editors. London: Wiley; 1998. pp. 23–53.
19. Jaanusson V. In: The Ordovician System: Proceedings of a Palaeontological Association Symposium Birmingham, September 1974. Bassett M. G., editor. Cardiff: Univ. of Wales Press and National Museum of Wales; 1976. pp. 301–326.
20. Holland S. M. Paleobiology. 1995;2:92–109.
21. Holland S. M., Patzkowsky M. E. Palaios. 2002;17:134–146.
22. Holland S. M., Patzkowsky M. E. Geology. 1999;27:491–494.
23. Foote M., Sepkoski J. J., Jr. Nature. 1999;398:415–417. doi: 10.1038/18872.
24. Newman M. E. J. Proc. R. Soc. London B; 1996. pp. 1605–1610.
25. Payne J. L., Lehrmann D. J., Wei J. Y., Orchard M. J., Shrag D. P., Knoll A. H. Science. 2004;305:506–509. doi: 10.1126/science.1097023.
