Abstract
The need for open, reproducible science is of growing concern in the twenty-first century, with multiple initiatives like the widely supported FAIR principles advocating for data to be Findable, Accessible, Interoperable and Reusable. Plant ecological and evolutionary studies are not exempt from the need to ensure that the data upon which their findings are based are accessible and allow for replication in accordance with the FAIR principles. However, authors commonly neglect the collection and curation of herbarium specimens, a foundational aspect of studies involving plants. Without publicly available specimens, huge numbers of studies that rely on the field identification of plants are fundamentally not reproducible. We argue that the collection and public availability of herbarium specimens is not only good botanical practice but is also fundamental in ensuring that plant ecological and evolutionary studies are replicable, and thus scientifically sound. Data repositories that adhere to the FAIR principles must make sure that the original data are traceable to and re-examinable at their empirical source. In order to secure replicability, and adherence to the FAIR principles, substantial changes need to be brought about to restore the practice of collecting and curating specimens, to educate students about their importance, and to properly fund the herbaria which house them.
Keywords: FAIR principles, plant ecology, plant evolution, diversity, reproducibility, open science
1. Introduction
The twenty-first century is the century of data science. Data science relies on the reuse of scholarly datasets, which are produced by empirical research. The ever-growing production and huge potential of the vast amounts of such data available have led to the development of principles that aim to ensure the reusability and accessibility of data. This set of principles was grouped under ‘FAIR’ by Wilkinson et al. in 2016 [1], who recognized the urgent need for data management in science to conform to rigorous standards. They set out that data should be Findable, Accessible, Interoperable and Reusable, with a strong focus on enhancing the capacity of machines to automatically find and use these data [1]. These principles are steadily being incorporated into data management plans, with journals and funding bodies requiring that scientists deposit their data and code alongside the traditional scientific outputs of peer-reviewed articles [2].
Less attention has been paid, however, to the quality of the original data. These raw data are the ground on which the four pillars of FAIR stand, and that ground might very quickly turn into quicksand if the revision and replication of the original research are not possible. For FAIR to work, data must be traceable not only to the original research but also to the primary source of evidence, which itself must be accessible and revisable.
Here, we argue that plant ecological and evolutionary studies need to recognize the importance of specimens as their primary data, which should therefore be archived according to the FAIR principles. As science moves towards more open structures, with funding bodies such as the European Research Council increasingly requiring accessible and reproducible data and analyses in accordance with the FAIR principles [3], it is crucial that those who work with plants ensure that the data we provide are truly reproducible and open. In order to achieve that we need to deposit specimens, the primary data of plant ecological and evolutionary sciences, in appropriately archived repositories.
2. Why are specimens the primary data of plant ecological and evolutionary studies?
We believe that the first obstacle for the development of truly open and reproducible science arises from a general misunderstanding of what the raw data actually are in plant ecology and evolution. Disciplines such as archaeology and palaeontology treat artefacts and fossils as primary data that must be included within a public repository as a requirement for the publication of their research. It seems appropriate that scientists exploring different aspects of plant diversity acknowledge, therefore, that their primary data are the specimens themselves, in addition to species counts or lists.
The disclosure of details of vouchers is mandatory for publications in many taxonomic journals. Outside of these specialist journals, however, details of vouchers are often not required, even when wild plants have been identified and studied. When a plant becomes part of an ecological or evolutionary study, its identification is the result of the interpretation of a researcher and, therefore, species identifications do not constitute the primary data of this research. The primary data are, rather, the original specimens studied [4]. If prepared and curated properly they become the tangible materials that ground the plant sciences in the real, natural world. Herbarium specimens are also a valuable resource for future researchers looking to either revisit the original study or to answer new questions.
3. Why is it important to collect and preserve specimens?
The identification of living material can be challenging, with many groups requiring high levels of taxonomical expertise. Materials identified within taxonomically complex groups such as cryptogams, grasses, sedges and many tropical plant families are easily confused, especially if they derive from places where local floras are scarce, incomplete or non-existent. Where floras are available, their taxonomic proposals may disagree in the treatment of many taxa and areas. An absence of references to the taxonomical proposals followed is common (see Endnote) and the omission of authorities of names is the norm rather than the exception. The frequent neglect of these methodological details means that research is simply not reproducible because the basic methodological process of specimen identification cannot be repeated. If, however, specimens have been collected, the work remains reproducible and revisable by taxonomic experts.
When biodiversity research involves a great number of common or easily identifiable specimens, visual identifications are common practice and specimens rarely end up in public collections. Although in many cases there will be no taxonomic ambiguity, there are many situations where problems may arise. Take, for example, the case of Hedera L. in the Iberian Peninsula. A recent revision of the genus found that the widely distributed, easily identified ivy (Hedera helix L.), a long-standing species, actually coexists with two other species of Hedera (H. hibernica Bean ex DC and H. maderensis subsp. iberica McAllister) whose discrimination relies on the shape and colour of the trichomes borne on the young leaves and sterile shoots [5], characters unlikely to be recorded in pictures or any other ancillary data. Published research on Iberian forests, where ivy is commonly used as a climate-sensitive indicator species, is now only revisable if vouchers can be revisited to incorporate these findings.
A lack of reproducibility is particularly concerning as the use of meta-analyses increases and becomes more influential. In a rapidly changing world, studies of evolutionary and ecological change use massive datasets covering long time scales and wide geographical ranges. Specimens under public custody are an invaluable, and sometimes exclusive, source of these datasets. Without them, the results of meta-analyses will be at best not reproducible, but at worst scientifically and empirically dubious. This situation may stem from informal observations being falsely conflated with natural history [6], when in reality natural history requires the meticulous curation of specimens. Without the sometimes maligned practice of natural history, the results of large-scale data analyses may be unrepeatable, unreliable or even invalid [7].
4. Why are publicly accessible specimens imperative for open, reproducible science?
Open, reproducible science depends on public and free accessibility of primary data, an area of increasing awareness and concern [1,8]. As a community, we abide by open policies for data and code, which are now often published alongside manuscripts. Consequently, databases such as the Global Biodiversity Information Facility (www.gbif.org) or GenBank (https://www.ncbi.nlm.nih.gov/genbank/) host free, accessible information on species, their systematics and distributions, and in the latter case genetic sequences. There has not been sufficient work carried out on plant sequences in GenBank to know how many might be based on incorrect identification [9]. If the primary data, that is the specimens themselves, are properly recorded and archived, identifications can be revised and mistakes can be amended [10]. What happens, however, when there are no vouchers upon which to ground ecological and evolutionary research?
In some cases, the presence of ancillary data, such as pictures, small tissue samples for genetic analyses, or genetic sequences, allows for verification of species identity [11]. These additional tools for verification, which build upon the foundation of herbaria as repositories of verified specimens, all need to follow the same FAIR principles, and be accessible for downstream checking and verification. This is not to disregard the usefulness of new technologies, such as high-quality photographs which can provide extremely powerful tools to identify, and therefore vouch for, specimens under certain circumstances and given that they are made publicly available through databases or herbaria (where they can be included as valid vouchers [12]).
In many ecological and evolutionary studies, the only evidence of the plants studied is a species name, traceable solely to the researcher who identified the organism, and not to vouchers deposited in a herbarium (or another verifiable source for identification). This means that without publicly available specimens, even if the data from these studies are freely accessible and used in meta-analyses, they cannot be re-visited, verified or challenged. This could introduce untraceable and incommensurable sources of error into these meta-analyses. Strikingly, contributions of specimens to public herbaria have been in decline throughout the last decades [13].
5. Why is the curation and preservation of specimens a dying practice?
Pressing plants is costly and time-consuming, involves following detailed methods and protocols, and requires practice and craftsmanship. This once widespread skill has fallen out of university curricula in favour of other more ‘contemporary’ content [14]. Regrettably, many biodiversity researchers nowadays lack even basic training in the collection of specimens.
Even when specimens are collected, not many public institutions are in a position to manage the vast number of pressed plants that may derive from ecological research. Herbaria and natural history museums across the world have access to limited and/or decreasing funding and staff, with many having closed [15]. This means that researchers may not have access to an appropriate repository, or that the ones they do have access to might not be able to accept specimens for logistical or financial reasons.
6. Towards FAIR: how can we solve this problem?
In order that plant ecological and evolutionary studies adhere to the FAIR principles (that data are Findable, Accessible, Interoperable and Reusable), we make some recommendations under each guiding principle.
1. Findable: journals and funding bodies should require details of vouchers from researchers working on wild plants. Researchers should curate and deposit their vouchers in appropriate herbaria, or in online repositories if photographic vouchers are used.
2. Accessible: specimens should be available to researchers who visit herbaria in person, but herbaria should also be supported to digitize their collections. Digitization will allow wider access to the specimens and will ensure that in the event of fires or floods their details are not completely lost.
3. Interoperable: in order for their specimens to be usable by other scientists, researchers should adhere to the well-established botanical practices of specimen collection and curation (e.g. [16]), and ensure that these practices are taught to the next generation of scientists.
4. Reusable: researchers should advocate for the continued funding of herbaria, within institutions and as part of grant applications, so that these key archives of biological data can preserve specimens in perpetuity, ensuring their reusability.
In practice, evolutionary and ecological scientists must consider, early in the design of their projects, how identification mistakes can be amended or future taxonomical innovations incorporated into their research. The specimen production policy should then arise from the answer to this question. Every individual sampled for a phylogenetic study is susceptible to being misidentified, and any non-amendable mistakes in the identification of the specimens could impact the validity of the study, making vouchers essential. In other instances, such as in broader ecological studies, a single specimen per species is often enough to vouch for its taxonomical identity when it occurs multiple times at a given, referenced site.
7. Concluding remarks
As the advancement of science sheds new light on our pre-existing knowledge of nature, we must be able to re-examine it under new perspectives. Initiatives like the FAIR principles are essential to this aim. FAIR compliant data, nonetheless, must be revisable. Data must be traceable to their original sources of evidence, where they can be revisited and challenged. For plant ecological and evolutionary studies, this is impossible in the absence of vouchers.
Acknowledgements
We want to thank A/Prof. Lindsey Gillson for contributing to the general discussion and for her useful comments and insights on previous versions of this manuscript. We would also like to acknowledge the insightful comments of two anonymous reviewers, who greatly improved the arguments and flow of the manuscript.
Endnote
Nomenclature in this paper follows the International Plant Names Index (IPNI). When names are derived from cited literature, nomenclature follows the taxonomical proposal of the original work.
Data accessibility
This article has no additional data.
Authors' contributions
Both authors contributed equally to this manuscript.
Competing interests
We declare we have no competing interests.
Funding
S.M. acknowledges funding from the NRF Competitive Programme for Rated Researchers (118538). A.C.M.J. acknowledges funding from the NRF African Origins Platform (117666).
References
- 1. Wilkinson MD, et al. 2016. Comment: The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 160018. (doi:10.1038/sdata.2016.18)
- 2. Culina A, van den Berg I, Evans S, Sánchez-Tójar A. 2020. Low availability of code in ecology: a call for urgent action. PLoS Biol. 18, e3000763. (doi:10.1371/journal.pbio.3000763)
- 3. European Commission. 2016. Guidelines on FAIR data management in Horizon 2020, 6. Brussels, Belgium: European Commission.
- 4. Schilthuizen M, Vairappan CS, Slade EM, Mann DJ, Miller JA. 2015. Specimens as primary data: museums and ‘open science’. Trends Ecol. Evol. 30, 237–238. (doi:10.1016/j.tree.2015.03.002)
- 5. Valcarcel V, Vargas Gómez P. 2002. Hacia un tratamiento taxonómico de las hiedras (Hedera L. Araliaceae) ibéricas: de caracteres morfológicos a moleculares. An. del Jardín Botánico Madrid 59, 363–368.
- 6. Irwin A. 2019. The everything mapper. Nature 573, 478–481. (doi:10.1038/d41586-019-02846-4)
- 7. Brooks SJ, et al. 2011. Natural history collections as sources of long-term datasets. Trends Ecol. Evol. 26, 153–154. (doi:10.1016/j.tree.2010.12.009)
- 8. Molloy JC. 2011. The open knowledge foundation: open data means better science. PLoS Biol. 9, e1001195. (doi:10.1371/journal.pbio.1001195)
- 9. Culley TM. 2013. Why vouchers matter in botanical research. Appl. Plant Sci. 1, 1300076. (doi:10.3732/apps.1300076)
- 10. Goodwin ZA, Harris DJ, Filer D, Wood JRI, Scotland RW. 2015. Widespread mistaken identity in tropical plant collections. Curr. Biol. 25, R1066–R1067. (doi:10.1016/j.cub.2015.10.002)
- 11. Troudet J, Vignes-Lebbe R, Grandcolas P, Legendre F. 2018. The increasing disconnection of primary biodiversity data from specimens: how does it happen and how to handle it? Syst. Biol. 67, 1110–1119. (doi:10.1093/sysbio/syy044)
- 12. LaFrankie JV, Chua AI. 2015. Application of digital field photographs as documents for tropical plant inventory. Appl. Plant Sci. 3, 1400116. (doi:10.3732/apps.1400116)
- 13. Meineke EK, Davis CC, Davies TJ. 2018. The unrealized potential of herbaria for global change biology. Ecol. Monogr. 88, 505–525. (doi:10.1002/ecm.1307)
- 14. Woodland DW. 2007. Are botanists becoming the dinosaurs of biology in the 21st century? South Afr. J. Bot. 73, 343–346. (doi:10.1016/j.sajb.2007.03.005)
- 15. Gropp RE. 2003. Are university natural science collections going extinct? Bioscience 53, 550. (doi:10.1641/0006-3568(2003)053[0550:aunscg]2.0.co;2)
- 16. Bridson D, Forman L, Royal Botanic Gardens Kew. 1998. The herbarium handbook. London, UK: Royal Botanic Gardens Kew.
