Abstract
Physical samples and their associated data and metadata underpin scientific discoveries across disciplines and can enable new science when appropriately archived. However, there are significant gaps in current practices and infrastructure that prevent accurate provenance tracking, reproducibility, and attribution. For most samples, descriptive metadata are sparse, inaccessible, or absent. Samples and associated data and metadata may also be scattered across numerous physical collections, data repositories, laboratories, data files, and papers with no clear linkage or provenance tracking as new information is generated over time. The Earth Science Information Partners (ESIP) Physical Samples Curation Cluster has therefore developed guidance for scientific authors on ‘Publishing Open Research Using Physical Samples.’ This involved synthesizing existing practices, gathering community feedback, and assessing real-world examples. We identified improvements needed to enable authors to efficiently cite and link Earth science samples and related data, and to track their use. Our goal is to help improve the discoverability, interoperability, and reuse of physical samples and their associated data and metadata. Though primarily focused on the needs of Earth and environmental sciences, these guidelines are broadly applicable.
Subject terms: Environmental sciences, Research data, Biogeochemistry, Solid Earth sciences
Introduction
Physical samples and their associated data and metadata are primary building blocks across a wide range of scientific research. They represent features of interest or living things1,2, underpin discoveries across disciplines, and are critical to the scientific process. Examples include soil or water samples collected to represent environmental conditions, a rock from a geologic outcrop, or a preserved organism. When samples and associated data and metadata are findable, accessible, interoperable, and reusable (FAIR)3 and as “open as possible”4–6, new science becomes possible7,8. For example, users can instantly integrate and download species occurrence records published in the Global Biodiversity Information Facility (GBIF) by over 2,000 institutions and other sources worldwide. As a result, GBIF records are used frequently in new synthesis studies and are cited in more than two publications per day9. However, for many data types and disciplines, including Earth and environmental sciences, standard practices and tools for sample, data, and metadata discovery, integration, and use are either at an early stage of adoption10 or do not yet exist. This paper outlines practices that enhance sample, data, and metadata discovery in Earth and environmental sciences and increase the pace of new scientific insights.
Progress in funding policies, standards, and infrastructure for samples continues to motivate greater discovery, sharing and reuse of samples and associated data and metadata9,10. Recent updates to the U.S. National Science Foundation (NSF), Division of Earth Sciences (EAR) Data and Sample Policy require that:
“All data and sample metadata underlying peer-reviewed scholarly publications resulting from EAR support must now be made publicly accessible at or before the time of publication, and no later than two (2) years after completion of data collection or generation, via appropriate long-lived FAIR-aligned repositories”11.
However, significant infrastructural and sociotechnical gaps prevent accurate provenance tracking12, reproducibility7,13, and attribution14,15. For the vast majority of samples, descriptive metadata are sparse, inaccessible, or absent16–18. Samples and associated data and metadata may also be scattered across numerous physical collections, data repositories, laboratories, data files, and papers with no clear linkage or provenance tracking as new information is generated over time12,19. Yet there is a growing need to connect related interdisciplinary sample-associated data and metadata spanning diverse fields and data systems20.
There is also a need for researchers to respect Indigenous Data Sovereignty and Governance for samples collected on lands and waters belonging to Indigenous Peoples, and to track appropriate use of such samples. While beyond the scope of the current work, researchers should be aware of and uphold the CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) Principles for Indigenous Data Governance21 for both the collection and long-term management of samples and any derivative data on those samples.
Overall, practices for publishing and citing sample-associated data have been inconsistent, and clear guidelines across disciplines are lacking. This has led to several consequences: (1) research that uses samples may not be reproducible7; (2) it can be time-consuming or even impossible to track related data and information about samples22; (3) samples can be difficult to find and reuse; and (4) sample collection managers are less able to show the impact of their collections and curatorial work15.
To address these challenges, as part of the Earth Science Information Partners (ESIP) Physical Samples Curation Cluster, we sought to develop recommended practices for publishing and citing physical samples in scientific research. The focus here was on field-collected physical samples in Earth and environmental sciences, and associated subsamples and/or derivative data that may span the Earth, environmental and biological disciplines (e.g., microbial genomics subsamples and data from a parent soil sample, associated plant data). We first briefly review existing community practices and infrastructure to enable sample publication, citation, discovery, and tracking. We present four use cases that exemplify how sample metadata sharing, citation, and tracking need to be improved. These cases respectively demonstrate efforts to:
Efficiently publish and cite large numbers of samples and associated data and metadata;
Provide attribution and credit for those involved in physical sample collection and curation to demonstrate the value of investing in collections;
Track the use of sample data generated by analysts and laboratories; and
Connect related interdisciplinary sample data, metadata, and other research outputs.
We then outline recommended practices for Earth and environmental scientists publishing sample-related work, which are meant to support these use cases. Finally, we discuss planned future work, existing obstacles, and proposed solutions to improve data discovery, integration, and attribution for physical samples.
Existing community practices and infrastructure
To develop consistent guidelines for researchers using samples and associated data and metadata in publications, we first surveyed existing community practices and infrastructure regarding (1) standard sample metadata; (2) sample identification; (3) publishing samples and associated data and metadata; and (4) sample citation. Example organizations that provide infrastructure, recommendations, or policies for managing samples and associated data are provided in Supplemental Table 1, with additional information summarized in this section.
Standard sample metadata
Formal standards organizations, such as the Open Geospatial Consortium (OGC)23, Genomic Standards Consortium (GSC)24, and Biodiversity Information Standards (TDWG)25, have developed, and continue to maintain and expand, metadata standards for describing physical samples in specific disciplines. Ad hoc groups have also come together to define metadata formats and templates for specific communities26. A detailed comparison is beyond the scope of the present work, but a previous review compared existing metadata templates and standards for describing physical samples27, which are available for biodiversity records25, 'omics material28 (such as genomics, metagenomics, metabolomics), Earth and environmental science samples29,30, and ecosystem science samples31. The Internet of Samples (iSamples) project used and built upon this crosswalk comparison to identify commonalities and develop a schema for core sample metadata across disciplines10.
Metadata about physical samples (such as sample type and material) and their collection details (such as geographic location and collection date) provide the information needed for sample discovery and for potential integration and reuse. For example, the BioSample database maintained by the National Center for Biotechnology Information (NCBI) contains records describing the physical materials from which the sequence information stored in other NCBI databases, such as GenBank, is derived25. Implementation of standard metadata practices has enabled search and access to genetic sequence data in GenBank since 197932 and to aggregated species occurrence records in GBIF since 200133. As of October 2023, GenBank included over 3.7 billion nucleotide sequences for 557,000 formally described species34, and as of February 2025 there were close to 3.1 billion species occurrence records in GBIF, enabling a wide variety of synthesis studies. The System for Earth and Extraterrestrial Sample Registration (SESAR)29,30 contains metadata records for >5 million samples, including rock, mineral, sediment, and soil samples; rock, sediment, and ice cores; as well as samples of volcanic gas, fluids (e.g., seawater, river water, hydrothermal fluids), and biological specimens collected as part of Earth, planetary, and environmental sciences research.
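To make the role of core metadata in discovery concrete, the following Python sketch filters sample records by material, a latitude/longitude bounding box, and collection date. The `SampleRecord` fields and the `find_samples` function are hypothetical illustrations, not the BioSample, SESAR, or iSamples schema; real systems use their own field names and controlled vocabularies.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical core sample metadata; field names are illustrative only."""
    pid: str          # persistent identifier, e.g. an IGSN ID
    sample_type: str  # e.g. "core", "grab"
    material: str     # e.g. "soil", "rock", "seawater"
    latitude: float   # decimal degrees (WGS84 assumed)
    longitude: float  # decimal degrees (WGS84 assumed)
    collected: date   # collection date


def find_samples(records, material, lat_range, lon_range, after=None):
    """Return records of one material inside a lat/lon box, optionally collected on or after `after`."""
    lat_min, lat_max = lat_range
    lon_min, lon_max = lon_range
    hits = []
    for r in records:
        if r.material != material:
            continue
        if not (lat_min <= r.latitude <= lat_max and lon_min <= r.longitude <= lon_max):
            continue
        if after is not None and r.collected < after:
            continue
        hits.append(r)
    return hits
```

Without the location, material, and date fields, none of these filters could be applied, which is why sparse metadata makes samples effectively undiscoverable.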
We found that disciplinary data repositories often provide information on sample metadata templates or requirements, while generalist data repositories, journal publishers, and U.S. agencies often do not provide such guidance (Supplemental Table 1).
Sample identification
Persistent identifiers (PIDs) are globally unique strings that are associated with standard metadata and are resolvable, with “links that continue to provide access [to a digital object or file] into the indefinite future”35. PIDs identifying a variety of digital objects have enabled data and metadata sharing, integration, and reuse across domains, including cultural heritage36, scholarly communication37,38, and the natural sciences19,39. PIDs are preferable to nonresolvable identifiers, such as Universally Unique Identifiers (UUIDs) and Darwin Core Triplets, which are commonly used for biological specimens in natural history collections. While such identifiers can be turned into resolvable PIDs by embedding them in a URL, they must then be associated with standard metadata and maintained by an institution committed to long-term preservation. Nonresolvable identifiers often contain errors and duplicates and are ineffective for linking related data22. Increased use of sample PIDs is essential to enable more effective tools for sample tracking and citation.
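The difference between resolvable and nonresolvable identifiers can be sketched in a few lines of Python. The sketch assumes that sample PIDs are written in DOI form, as in the SESAR example cited later in this paper (10.58052/IENHR006K), and resolves them through the public `https://doi.org/` resolver. The DOI pattern is a simplification of the full DOI syntax, and the Darwin Core triplet shown in the usage note is a hypothetical example.

```python
import re
import uuid

# Simplified DOI syntax: "10." + numeric registrant prefix + "/" + non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def classify_identifier(identifier: str) -> str:
    """Label an identifier as 'doi', 'uuid', 'dwc_triplet', or 'unknown'."""
    s = identifier.strip()
    if DOI_PATTERN.match(s):
        return "doi"
    try:
        uuid.UUID(s)  # raises ValueError for anything that is not a well-formed UUID
        return "uuid"
    except ValueError:
        pass
    # Darwin Core triplet: institutionCode:collectionCode:catalogNumber (all parts non-empty)
    if s.count(":") == 2 and all(s.split(":")):
        return "dwc_triplet"
    return "unknown"


def resolver_url(identifier: str):
    """Return a resolvable URL for DOI-form PIDs, or None when no global resolver applies."""
    s = identifier.strip()
    if classify_identifier(s) == "doi":
        return "https://doi.org/" + s
    return None
```

For instance, `resolver_url("10.58052/IENHR006K")` yields a doi.org link, whereas a UUID or a triplet such as `UMMZ:Mamm:12345` yields `None`: nothing outside the issuing institution knows where to send a user.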
Within the Earth and environmental sciences, standard practices and tools for assigning PIDs to samples have been in place for decades and enable access, integration, and reuse of high-value data and metadata. International Generic Sample Number (IGSN) IDs and other sample PIDs40 must be managed by organizations committed to long-term sample and data preservation41,42. Although the IGSN ID was originally established in 2004 for Earth science samples, its use has since expanded to include a wide range of interdisciplinary samples19,41,43–45. Over 12.5 million IGSN IDs have been created across allocating agents46. Major organizations, such as the national geological surveys of the US, UK, Australia, South Korea, and Germany, also use IGSN IDs for their collections.
Organizations within the biodiversity research community are also increasingly using PIDs for physical specimens and/or digital representations of specimens in order to track the use of samples and associated research products. For example, member organizations within the Consortium of European Taxonomic Facilities (CETAF) have implemented CETAF stable identifiers for specimens. These are URI-based identifiers that direct humans to a web page about the specimen and computers to a machine-readable, RDF-encoded metadata record47. The Distributed System of Scientific Collections (DiSSCo) is a research infrastructure for 200+ European natural science collections that recently began implementing services to provide PIDs for online digital representations of specimens. These digital specimen records act as complementary online surrogates for physical specimens in natural science collections and can be updated to link physical specimens/samples to associated data as those data are published over time48,49.
Particularly in interdisciplinary ecological studies, Earth science samples collected in the field may be associated with microbial and other omics (genomics, metagenomics, metabolomics) analyses. Microbiologists will typically refer to an isolate by a strain identifier (e.g., “Kra1”) or by equivalent prefixed culture collection accession numbers (e.g., ATCC 35583, DSM 2078, JCM 9277)50. These are not globally unique and can lead to ambiguity in the literature and public databases, limiting the utility of these accession numbers in searching, indexing, and provenance tracking. For genomics, this ambiguity is mitigated by the use of BioSample accession numbers51. The potentially rich metadata associated with these identifiers is openly available online, providing information about the source sample, as well as references to additional identifier classes for biological projects, analyses, and sequence data. These accession numbers and associated metadata provide near-ideal unique identifiers that lend themselves to efficient retrieval and literature search, though hierarchy concerns and indexing gaps limit their use in generating citation metrics.
Publishing samples and associated data and metadata
Funding agencies now have general requirements to publish all data associated with scientific publications52 (see Supplemental Table 1). Most journals now use data availability statements and increasingly require that data be published upon submission53 (https://www.nature.com/nature-portfolio/editorial-policies/reporting-standards). However, few organizations provide explicit guidance on how to create and publish datasets associated with samples, or how to connect samples to downstream data. Even among organizations that recommend using sample PIDs and assigning standard metadata, we found that explicit guidance on publishing downstream data and using sample PIDs throughout the sample data lifecycle was usually lacking (Supplemental Table 1). One exception is the Interdisciplinary Earth Data Alliance (IEDA2), which supports sample identifier hierarchies with parent/child relationships and allows users to provide IGSN IDs associated with their datasets (Supplemental Table 1).
Biodiversity science and genomics are generally further along in providing infrastructure for publishing and connecting standardized sample metadata and associated data. In GBIF, occurrence records can be part of datasets, but records are published at the institution level rather than by individual researchers publishing their own data (note: datasets in GBIF are often whole natural history collections)54. While GBIF now recommends the use of sample/specimen PIDs, it does not require them. Other infrastructure in the biodiversity research community connects specimens or digital specimens to downstream research products using XML-based data exchange standards and/or the Resource Description Framework (RDF). RDF is a general framework for representing interconnected data on the web (Supplemental Table 1), and one common serialization for data interchange is JSON-LD. NCBI and other International Nucleotide Sequence Database Collaboration (INSDC) databases connect related sample data using identifier hierarchies (BioProject, BioSample, and Sequence Read Archive (SRA) accessions) and LinkOut services that connect to other related resources.
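As an illustration of how RDF serialized as JSON-LD can connect a dataset to the samples it was derived from, the sketch below builds a minimal record with Python's standard `json` module. It assumes the schema.org vocabulary, using the `Dataset` type and the generic `isBasedOn` property; the vocabulary choice, the function name, and the `10.1234/...` identifiers are all hypothetical and do not represent how any specific infrastructure encodes these links.

```python
import json


def dataset_jsonld(dataset_doi, title, sample_pids):
    """Build a minimal JSON-LD description of a dataset derived from physical samples."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": "https://doi.org/" + dataset_doi,
        "name": title,
        # Assumption: schema.org isBasedOn ("a resource from which this work is derived")
        # is used here to point at the source samples' resolvable PIDs.
        "isBasedOn": [{"@id": "https://doi.org/" + pid} for pid in sample_pids],
    }


doc = dataset_jsonld(
    "10.1234/example-dataset",           # hypothetical dataset DOI
    "Example tephra geochemistry",
    ["10.1234/example-sample-1"],         # hypothetical DOI-form sample PID
)
print(json.dumps(doc, indent=2))
```

Because every node is identified by a resolvable URL, a harvester that reads this record can follow the `@id` links to each sample's own metadata without any repository-specific code.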
Sample citation practices
Despite decades of PID infrastructure and sample metadata standards development, there are no consistent recommendations across disciplines for citing samples and associated data, and attribution practices (or lack thereof) vary. By sample citation, we mean referencing sample PIDs in a structured, clearly accessible way within the paper text (methods section, relevant tables, data availability statement), in the reference list, and/or within associated dataset metadata and data files. Note that a full citation in a paper should include a formal entry in the reference list55; however, this does not work for many use cases involving large numbers of samples, as addressed in later sections. In such cases, we can explore practical sample citation practices that make associated studies more FAIR and enable alternative ways to track sample use, such as text mining tools56 or an emerging approach developed by the Research Data Alliance (RDA) Complex Citations Working Group that would enable citing large numbers of entities57.
Many physical sample repositories and natural history collections request acknowledgement when their samples are used, but practices vary, and many are not sufficient to enable the tracking of individual sample use58. Each institution recommends a distinct practice, which often includes museum catalog numbers and the institution name; PIDs may or may not be required. The Field Museum in Chicago, for example, requests that specimens or objects be cited in their preferred format: [occurrenceID].[catalogNumber].[data publisher]59. The citation formats for museum collections at the Smithsonian National Museum of Natural History and the American Museum of Natural History (AMNH) are dependent on the division or department under their loan policies60,61. For example, the Smithsonian Mineral Sciences collection requires users to cite their collections based on what is available: catalog number, EZID, IGSN ID62. The AMNH Paleontology department requires a copy of the manuscript for records and catalog citation63.
This variation in recommended citations within and across disciplines results in even greater variations in how authors actually acknowledge long-term collections and samples used, if they do so at all. For example, authors will often mention sample repositories in the Acknowledgments section of a paper, and list individual sample names on a map or table shared as supplemental materials64. In both cases, the identifiers may be inconsistently abbreviated with no information about the current archives where the physical samples are held, which is important if the samples need to be accessed and re-analyzed. This reduces the reproducibility of the study, inhibits sample reuse, and makes automated identification or digital scraping of sample citations very difficult or even impossible.
Ethics and the CARE principles for samples and associated data and metadata
We recognize that not all data derived from samples can be fully open. Cyberinfrastructure designed to share sample metadata must also be designed to protect sensitive data, in particular sensitive sample locations. Samples that are sensitive and/or restricted must be protected through appropriate access controls and have any restrictions documented (such as permits, ethics agreements, access moratoriums). The decision as to whether certain samples and derived data can be made public is not necessarily that of the researcher. For example, the Nagoya Protocol addresses ‘Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity’65.
When collecting and managing samples related to Indigenous Peoples and their lands and waters, authors should consult the CARE Principles for Indigenous Data Governance21. The CARE Principles were developed by Indigenous Peoples, scholars, nonprofit organizations, and governments to address concerns about the secondary use of samples and data derived from these samples collected on their lands. The CARE Principles, which complement the FAIR Principles66, aim to 1) respect Indigenous data sovereignty, and 2) support open data, including secondary use21,67. For future sample acquisition, it is essential that the relevant Indigenous communities are engaged prior to any samples being collected, and that wherever possible, local knowledge is included in the collection process to avoid incidents such as the unauthorized sampling of Bishop Tuff in California and other cases elsewhere68. Operationalization of the CARE Principles by publishers, data repositories, and researchers is just beginning. Metadata guidance, such as the Indigenous Metadata Bundle Communiqué69 is important to incorporate in future development of sample metadata standards70,71.
Methods
ESIP is a 501(c)(3) nonprofit supported by NASA, NOAA, USGS, and 130+ member organizations, providing leadership in promoting the collection, stewardship, and use of Earth science data, information, and knowledge. This includes about 30 collaboration areas where members meet regularly to work together on common data challenges. The Physical Samples Curation Cluster is one such group, which we organized in January 2021 to promote the discovery, access, and use of physical samples and associated data. Our members and target community include researchers who collect, identify, analyze, and use samples and related data products; professionals who manage samples in physical collections; data repository managers; and other cyberinfrastructure providers who support tools and services for physical samples. This includes subject-matter experts from universities; federal organizations such as the U.S. Geological Survey, NASA, NOAA, and the US Department of Energy; major U.S. scientific sample repositories such as the USGS Core Research Center and the Oregon State University Marine and Geology Repository; data repositories such as the Interdisciplinary Earth Data Alliance and the ESS-DIVE data repository; and the international IGSN e.V.
The working group started with the goal of addressing social and technical needs for tracking and publishing sample-related research across scientific disciplines.
Use cases for tracking sample use
In group discussions, we identified specific use cases that demonstrated common needs across disciplines for tracking samples to better support sample and data management, data synthesis, and appropriate credit for researchers and institutions. We then outlined real use cases encountered in our work as sample-data experts to inform the final recommendations, including:
Efficiently publish and cite a large number of samples and associated data and metadata;
Provide credit for those involved in physical sample collection and curation to demonstrate the value of investing in collections;
Track the use of sample data generated by analysts and laboratories; and
Connect related interdisciplinary sample data, metadata and other research outputs.
Drafting guidelines through community feedback and review
Use cases then informed the author guidelines for researchers submitting scientific publications involving physical samples. These guidelines were based on existing best practices27,42,45 (Supplemental Table 1), use cases that illustrated current challenges and needs, and extensive community feedback and review.
We gathered feedback in regular working meetings and conference sessions. During the monthly meetings, we held discussions, working sessions, and relevant talks on challenges, needs, and visions for publishing and tracking scientific samples and associated data. The group engaged more broadly by convening seven conference sessions at biannual ESIP meetings, the 2022 American Geophysical Union meeting, and the Society for the Preservation of Natural History Collections conference. We designed the ESIP sessions, in particular, to collect specific feedback through individual reflection (via digital collaborative documents and whiteboards), group discussion, and anonymous poll/survey questions72,73. Input from presentations and feedback during group meetings informed drafts of the guidelines and improved later versions.
To further refine the guidelines, we coordinated with several related projects and international efforts. This included the Sampling Nature Research Coordination Network, Internet of Samples (iSamples) project10, Australian Research Data Commons (ARDC) Information Management for Physical Samples Community of Practice, the RDA Complex Citations Working Group, and the RDA Physical Samples and Collections in the Research Data Ecosystem Interest Group.
We also worked through examples where six different projects, including researchers who had not previously used standard sample identifiers and metadata, applied recommended practices (see Supplemental Information for methods and outcomes).
Results
Use case review: needs for tracking sample use
The following use cases informed the final recommendations for scientific authors, journal publishers, data repositories, and indexers, as presented in the results and discussion sections.
Use Case 1: Efficiently publish and cite a large number of samples and associated data and metadata
Many studies that involve physical samples use dozens, hundreds, or even thousands of samples and subsamples. Tracking the samples associated with these datasets is important for identifying the impact of samples, particularly where sample collection was expensive or re-sampling is impossible. This includes tracking the full body of knowledge associated with any given sample, and appropriate attribution. There is no widely adopted method to efficiently cite a large number of datasets74, let alone the physical samples linked to them. However, with current infrastructure, dataset metadata can already list all samples used in a given dataset. For example, several data repositories currently include sample PIDs as related identifiers in dataset metadata (e.g., EarthChem Library (ECL) and GFZ Data Services).
A real-world example: connecting samples and data in the Interdisciplinary Earth Data Alliance
IEDA2 is a collaborative, NSF-funded data infrastructure that consists of several complementary data systems, including ECL and SESAR. IEDA2 data systems provide services for publishing sample-based analytical datasets using consistent sample metadata and PIDs for samples (IGSN IDs) to unambiguously connect samples and derived data.
SESAR offers IGSN ID registration and management services for researchers and collection curators worldwide, enabling them to permanently store and update sample metadata—including images and links to related datasets and publications—on a persistent and publicly accessible digital sample landing page (e.g., 10.58052/IENHR006K; Fig. 1). Researchers may register IGSN IDs by entering metadata for a single sample in a web form, uploading a standardized spreadsheet template for one or more samples, or sending XML-encoded sample metadata from their local sample metadata management system through SESAR’s application programming interface (API). SESAR also enables linking related samples (by sites, parent–child samples, and/or sibling samples) by providing parent IGSN ID metadata.
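Because each registered record stores only its parent IGSN ID, the child and sibling links shown on a landing page can be derived by inverting those parent links. The Python sketch below shows that idea under stated assumptions: the plain `{sample_id: parent_id}` mapping and the `CORE-1...` identifiers are hypothetical stand-ins for IGSN IDs, and the functions are illustrative rather than part of SESAR's API.

```python
from collections import defaultdict


def build_hierarchy(parent_of):
    """Invert a {sample_id: parent_id or None} mapping into {parent_id: [child_ids]}."""
    children = defaultdict(list)
    for pid, parent in parent_of.items():
        if parent is not None:
            children[parent].append(pid)
    return children


def siblings(pid, parent_of, children):
    """Samples sharing the same parent as `pid`, excluding `pid` itself."""
    parent = parent_of.get(pid)
    if parent is None:
        return []
    return [c for c in children[parent] if c != pid]


def lineage(pid, parent_of):
    """Walk parent links upward: [pid, parent, grandparent, ...]."""
    chain, seen = [pid], {pid}
    while parent_of.get(chain[-1]) is not None:
        nxt = parent_of[chain[-1]]
        if nxt in seen:  # guard against accidental cycles in hand-entered metadata
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


# Hypothetical identifiers standing in for IGSN IDs.
parent_of = {"CORE-1": None, "CORE-1-A": "CORE-1", "CORE-1-B": "CORE-1", "CORE-1-A-i": "CORE-1-A"}
children = build_hierarchy(parent_of)
print(siblings("CORE-1-A", parent_of, children))  # ['CORE-1-B']
print(lineage("CORE-1-A-i", parent_of))           # ['CORE-1-A-i', 'CORE-1-A', 'CORE-1']
```

Storing only the upward link keeps registration simple for the person submitting a subsample, while the downward and sideways views can be computed whenever a landing page is rendered.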
Fig. 1.
Diagram depicting linkages between ECL (10.26022/IEDA/112300) and SESAR. (a) During dataset submission, authors are provided with a dedicated PID metadata field to provide sample PIDs. Once the dataset is submitted, the system verifies and hyperlinks the PIDs (in this case IGSN IDs). (b) Linked IGSN IDs lead to a permanent, publicly available metadata “landing page.” For the sample shown, additional subsample (“child”) IGSN IDs have been registered and are linked. The IGSN ID registrant has provided the DOI for the dataset shown in (a) in a dedicated metadata field for related URLs or DOIs. (c) A “child” subsample metadata record links back to the “parent” sample IGSN ID (b) and to other subsamples (“siblings”). The IGSN ID registrant has manually provided the DOI for the dataset shown in (a).
EarthChem provides two distinct, but complementary services: First, it enables access to large volumes of published laboratory analytical data for terrestrial samples (ca. 50 million analytical data points), aggregated and harmonized into synthesis databases with human and machine-actionable interfaces to search and retrieve analysis-ready data75. Second, EarthChem enables publishing and archiving datasets in ECL, a FAIR data repository providing standardized data templates for specific disciplines (for example, data derived from volcanic tephra samples76). Researchers contributing data to ECL can provide IGSN IDs within a designated column in the data templates and in a distinct metadata field during dataset submission. Upon publication, IGSN IDs are displayed on the dataset landing page, and link to individual IGSN ID sample pages (Fig. 1).
As of February 2025, more than 26% (440) of EarthChem’s 1,641 published datasets included links to IGSN IDs, with 34,382 unique IGSN IDs recorded. Within SESAR, ~25,000 publicly available samples have been linked to EarthChem datasets. This reflects strong community interest and buy-in for a future where these systems have automated links for sample and data discovery, and efforts are underway to develop this through the IEDA2 Geosamples Data Nexus.
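Automated sample-to-dataset links amount to inverting the links that datasets already record. The sketch below assumes a hypothetical mapping from dataset DOIs to the sample PIDs they list (for example, taken from a designated IGSN ID column) and builds the reverse index a sample landing page would need. It is not a description of how EarthChem, SESAR, or the Geosamples Data Nexus are implemented. Sample PIDs are trimmed and upper-cased before comparison because DOI names are case-insensitive.

```python
from collections import defaultdict


def sample_to_datasets(dataset_samples):
    """Invert {dataset_doi: [sample_pid, ...]} into {sample_pid: sorted [dataset_doi, ...]}."""
    index = defaultdict(set)
    for doi, pids in dataset_samples.items():
        for pid in pids:
            # Normalize so "10.1234/sample-1 " and "10.1234/SAMPLE-1" count as the same sample.
            index[pid.strip().upper()].add(doi)
    return {pid: sorted(dois) for pid, dois in index.items()}


# Hypothetical dataset and sample identifiers.
links = {
    "10.1234/DATASET-A": ["10.1234/SAMPLE-1", "10.1234/SAMPLE-2"],
    "10.1234/DATASET-B": ["10.1234/sample-1 "],
}
print(sample_to_datasets(links))
```

Re-running this inversion whenever a dataset is published is one way a sample landing page could stay current without its registrant manually adding each new dataset DOI.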
Summary of needs
To support efficiently publishing and citing large numbers of samples and associated data, we suggest the following:
Authors should use PIDs for their samples, and include them as a column in data files and/or dataset metadata.
When registering dataset DOIs, data repositories should include sample PIDs as related IDs registered with DataCite.
Sample metadata and data repositories should enable automatic updates to sample metadata profiles and dataset landing pages as new data and metadata are published. For example, when samples are included in a dataset, the sample landing page should automatically update with a link to that dataset. This requires either that PIDs be processed through an indexer or that other functional links exist between the pertinent repositories and sample landing pages.
When a dataset is cited, samples included in that dataset should be automatically recognized and tracked in metrics, for example as addressed in the recommendations of the RDA Complex Citations Working Group57.
Users should be able to easily access sample PIDs and metadata on dataset landing pages; for example, through a weblink or the option to download.
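The first two recommendations above can be connected in a short Python sketch: read the sample PID column of a data file and format each unique PID as a related-identifier entry for dataset DOI registration. The keys `relatedIdentifier`, `relatedIdentifierType`, and `relationType` follow the DataCite Metadata Schema, and `IsDerivedFrom` is one of its relation types, but choosing that relation type, the `sample_pid` column name, and treating PIDs as DOI-form IGSN IDs are assumptions for illustration. The function only builds the metadata; it does not contact DataCite.

```python
import csv
import io


def related_identifiers_from_csv(csv_text, pid_column="sample_pid", relation_type="IsDerivedFrom"):
    """Turn the sample PID column of a CSV data file into DataCite-style relatedIdentifiers entries.

    Assumes PIDs are DOI-form IGSN IDs; the column name and relation type are illustrative choices.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    pids = [(row.get(pid_column) or "").strip() for row in reader]
    unique_pids = [pid for pid in dict.fromkeys(pids) if pid]  # de-duplicate, keep order, drop blanks
    return [
        {
            "relatedIdentifier": pid,
            "relatedIdentifierType": "DOI",
            "relationType": relation_type,
        }
        for pid in unique_pids
    ]


# Hypothetical data file: one row per analysis, so the same sample can appear more than once.
data_file = """sample_pid,SiO2_wt_pct
10.1234/SAMPLE-1,50.1
10.1234/SAMPLE-2,48.9
10.1234/SAMPLE-1,50.3
"""
for entry in related_identifiers_from_csv(data_file):
    print(entry)
```

De-duplicating matters because analytical data files usually contain several rows per sample, while the dataset metadata should list each sample once.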
Use Case 2: Provide credit for those involved in physical sample collection and curation to demonstrate the value of investing in collections
Tracking sample use is crucial for giving credit to the individuals and organizations involved in sample collection and curation, such as sample collectors and the repositories and collection managers who curate and manage samples, and for funders evaluating impact. For example, physical sample repositories must regularly show the impact of their collections to justify their work and continue to acquire funding. When samples are not cited, data and sample stewards are unable to fully document their contributions to science14, and collection managers are less able to demonstrate the impact of collections, which in turn threatens the sustainability of these valuable scientific assets.
A real-world example: showing the impact of the University of Michigan Museum of Zoology
The University of Michigan Museum of Zoology’s (UMMZ) Mammal Division manages over 150,000 specimens that are used in a broad range of scientific studies. Each of these specimens has a catalog number—a unique identifier within the UMMZ that is associated with both the physical sample and its metadata—but not a PID. To track the use of their collections, staff (led by author CWT) ask researchers who use the collection to include catalog numbers and acknowledge the use of the collections in any subsequent publications. CWT and his team maintain a bibliography in Google Scholar that lists these papers, as well as papers authored by the collection staff and students.
While the publications in the Google Scholar bibliography have accrued over 96,000 citations, this total is only a rough proxy for specimen use. Because papers by the collection staff are mixed with papers using the collection, the bibliography does not show the impact of specific specimens over time and, therefore, does not precisely show the impact of collections management. To address this, authors SL, ERC, KF, RN, CWT, and AKT77 used text mining to extract catalog numbers and generate metrics of use. The results were underwhelming: of 1,297 papers examined, only 245 (about 19%) included catalog numbers. Instead, researchers thanked the collection in the acknowledgments section without citing specimens, listed specimens in supplementary material that could not be effectively identified and mined, or listed other identifiers that were not used by UMMZ (e.g., GenBank IDs).
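To illustrate why this kind of mining is brittle, here is a minimal sketch of regular-expression extraction. It is not the method used in the cited study. The citation pattern is an assumption, namely "UMMZ" followed by optional punctuation and a number, and real papers format catalog numbers far less consistently. Acknowledgments that name the museum without a number are, correctly, not counted.

```python
import re

# Assumed citation pattern (hypothetical): "UMMZ" followed by optional
# whitespace or punctuation and then a catalog number, e.g. "UMMZ 12345".
CATALOG_PATTERN = re.compile(r"\bUMMZ[\s:#-]*(\d{1,6})\b")


def extract_catalog_numbers(text):
    """Return the set of catalog numbers cited in a paper's text."""
    return {match.group(1) for match in CATALOG_PATTERN.finditer(text)}


def citation_rate(papers):
    """Fraction of papers (paper id -> full text) citing at least one catalog number."""
    if not papers:
        return 0.0
    citing = [pid for pid, text in papers.items() if extract_catalog_numbers(text)]
    return len(citing) / len(papers)
```

Applied to the counts reported above, this ratio would be 245 / 1,297, or roughly 0.19. PIDs would remove the need for guessing such patterns, because a resolvable PID has a single, predictable syntax.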
Summary of needs
To provide credit for physical sample collection and curation, and to demonstrate the value of investing in collections, we need the following:
Managers of physical collections should explore assigning PIDs to their specimens. While this takes considerable time and effort, it would make citations easier to “mine” because PIDs are consistently formatted and resolve to an online metadata catalog.
When samples have PIDs, the PIDs must be listed in any papers that use the samples. This could be done by listing the PID in the text of the paper, by citing a sample in the references section, or by including sample PIDs in a dataset cited by the paper.
Publishers, indexers, and data repositories should work together to aggregate and track the use of all PID types. This might mean that publishers recognize and hyperlink sample PIDs in paper text, indexers build new tools to harvest PIDs from papers and datasets, or data repositories take steps to expose sample PIDs to indexers.
Subsamples taken from a parent sample should be clearly linked to the parent through related identifiers. For example, GenBank records must link to parent/source samples (ideally with the sample PID) when relevant.
Use Case 3: Track the use of sample data generated from laboratories
Similar to the sample collectors and physical collections described in Use Case 2, laboratories conducting analyses on samples need to be able to demonstrate the value of their work to funders. Understanding how data are reused is also essential for identifying service improvements that can benefit the laboratories themselves and the communities they serve; for example, focusing on thematic areas that are heavily cited, improving the efficiency of laboratory processes, or allocating resources toward products and services with high-impact potential. However, a laboratory that publishes data or provides samples loses control over provenance information (records of how the samples and data are used) as soon as they reach a third party. Approaches that accumulate metadata in a consistent manner across systems and preserve full provenance information for samples and any derived data are greatly needed.
Real-world example: citations for data generated by the Joint Genome Institute (JGI)
The JGI provides integrated high-throughput sequencing of samples, DNA design and synthesis, metabolomics, and computational analysis. To track its impact on scientific research, JGI developed the Data Citation Explorer56, a web service that identifies the use of genomic data products in published literature even where those products are not properly cited. The service employs heuristics to discover occurrences of unique identifiers associated with genomic data in the text, and it reconstructs graphs that restore many of the missing connections among these related classes of identifiers. The Data Citation Explorer has identified around 4,000 publications citing JGI data using NCBI identifiers or other standard identifier types. However, concurrent manual expert analyses found that most researchers, if they cite anything at all, cite publications associated with the datasets produced from samples. The authors estimate that there are tens of thousands of such “nonstandard” references to JGI data that cannot yet be identified using automated tools56.
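The two steps described above, finding identifier mentions in text and then restoring missing links between identifier classes, can be sketched as follows. This sketch is not the Data Citation Explorer's code. The accession formats are assumptions for illustration: NCBI BioSample ("SAMN" plus eight digits), NCBI SRA run ("SRR" plus digits), and JGI GOLD project ("Gp" plus seven digits). The `edges` table stands in for whatever crosswalk links these identifiers, and it must list each link in both directions.

```python
import re
from collections import deque

# Assumed accession formats, for illustration only.
ID_PATTERNS = {
    "biosample": re.compile(r"\bSAMN\d{8}\b"),
    "sra_run": re.compile(r"\bSRR\d{6,}\b"),
    "gold_project": re.compile(r"\bGp\d{7}\b"),
}


def find_identifiers(text):
    """Return every accession-like string found in the text."""
    found = set()
    for pattern in ID_PATTERNS.values():
        found.update(pattern.findall(text))
    return found


def linked_products(start_ids, edges, is_product):
    """Breadth-first search over an undirected identifier graph.

    `edges` maps an identifier to related identifiers (e.g., a run to its
    BioSample, a BioSample to a sequencing project). Returns the reachable
    identifiers for which `is_product` is true.
    """
    seen, queue = set(start_ids), deque(start_ids)
    while queue:
        current = queue.popleft()
        for neighbor in edges.get(current, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return {node for node in seen if is_product(node)}
```

With this approach, a paper that mentions only a sequencing run can still be credited to the project that produced it, provided that the run, sample, and project identifiers are linked somewhere in the graph.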
Summary of needs
The following would facilitate tracking use of sample data generated by laboratories:
Researchers should follow consistent guidelines on how samples and associated data should be cited. Particularly with a rise in interdisciplinary work, it would be beneficial to use and enforce similar practices across disciplines, journals, institutions and funders.
Scholars, laboratory managers, and others who register sample identifiers should use PIDs that are globally unique and can be identified and indexed using automated tools.
Sample metadata and data repositories should use consistent methods of search and retrieval of sample data and metadata (for example, URL formats, API standards, metadata formats), and implement standards to unambiguously link and exchange information for related PIDs78.
Provenance information must be propagated when laboratory and/or sample PIDs are used.
Use Case 4: Connect interdisciplinary sample data, metadata and other research outputs
Interdisciplinary studies that connect diverse data to understand multiscale processes often involve sample data. These highly related data may be analyzed and published separately on multiple data systems across disciplines, creating a challenge to connect subsamples and data types from the same samples. Future researchers attempting to find and reuse such data often have no way of tracking sample provenance without contacting the authors. These combined challenges make data synthesis involving interdisciplinary samples very difficult.
Real-world example: biogeochemical samples from projects of the United States Department of Energy’s Biological and Environmental Research Program (U.S. DOE BER)
The U.S. DOE BER program is highly interdisciplinary, and samples from its projects are often used to enhance models and predictions of ecological processes and biogeochemical responses to ecosystem disturbances. Scientists on these projects have faced sample tracking challenges due to inefficiencies in the processes of submitting samples to different data systems and laboratories and then compiling the resulting data. One such project, the River Corridor Hydro-biogeochemistry Science Focus Area, studies hydrologic, biogeochemical, and microbial function within river corridors79. Researchers collected a series of individual surface water samples (e.g., 10.58052/IEWDR00RT), sediment samples (e.g., 10.58052/IEWDR0149), and filter samples (e.g., 10.58052/IEWDR00UI) at almost 100 sites worldwide (e.g., 10.58052/IEWDR00P4) as part of the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS). DNA and RNA were extracted from the filter and sediment samples (subsamples/child samples; e.g., 10.58052/IEWDR00UI) and sent to the JGI for metagenomic and metatranscriptomic sequencing. Water and sediment samples were also sent to the Environmental Molecular Sciences Laboratory (EMSL) for metabolomics analyses (Fig. 2). The researchers created sample sets and documented their workflows in the DOE Systems Biology Knowledgebase (KBase), and they registered an associated study on the National Microbiome Data Collaborative (NMDC) portal (https://data.microbiomedata.org/details/study/nmdc:sty-11-5tgfr349) as part of the Genome Resolved Open Watersheds database (GROWdb) effort80. NMDC enables easy access to distributed microbiome and related data. In addition, NMDC supports submitting and storing sample metadata in the MIxS standard format for describing contextual information on the sampling and sequencing of genomic material.
Analysis and visualizations from the sample set were incorporated into formally published datasets for long-term preservation and documentation in the Environmental System Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) data repository81,82. These datasets were then referenced in the final journal publications associated with the data79,83–85.
Fig. 2.
Tracking and linking one source material sample from the River Corridor Hydro-biogeochemistry Science Focus Area project, based on the iSamples relational data model10 which links related samples based on entities such as project, sampling site, subsamples, as well as other related links.
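The kind of relational model shown in Fig. 2 can be sketched with a small record type. This is a hypothetical simplification, not the iSamples schema. Each sample stores its own PID, an optional parent PID, an optional sampling-site PID, and the DOIs of the datasets containing its data. All identifiers in the usage example are made up. Walking parent links from any subsample back to its source sample lets the datasets produced from every subsample be collected under the original field sample.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sample:
    pid: str
    sample_type: str
    site: Optional[str] = None    # PID of the sampling site, if registered
    parent: Optional[str] = None  # PID of the parent sample, if this is a subsample
    datasets: List[str] = field(default_factory=list)  # DOIs of datasets with its data


def lineage(samples, pid):
    """Return the chain of PIDs from a sample up to its original source sample."""
    chain = [pid]
    while samples[chain[-1]].parent is not None:
        chain.append(samples[chain[-1]].parent)
    return chain


def datasets_for_source(samples, source_pid):
    """Collect dataset DOIs attached to a source sample or any of its descendants."""
    found = set()
    for sample in samples.values():
        if source_pid in lineage(samples, sample.pid):
            found.update(sample.datasets)
    return found
```

For example, a DNA extract whose parent is a filter sample resolves back to that filter, so a metagenome dataset built from the extract is credited to the filter sample even if it never cites the filter directly.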
The process of submitting the associated data to multiple systems and adding links and other information as new data and metadata are generated is currently inefficient. The more recently developed NMDC portal has made significant progress and currently enables centralized submission of standardized sample metadata for microbiome research to JGI and EMSL. In addition, the five BER data systems are working toward a more deeply integrated data ecosystem, including automated metadata exchange and global search across systems.
Summary of needs
To efficiently connect interdisciplinary sample data, metadata and other research outputs, we need the following:
Researchers should use sample PIDs for any Earth and environmental source/parent samples and subsamples sent to laboratories.
Data repositories and laboratories should promote or provide field apps and other tools for automated registration of sample PIDs with standard metadata at the time of collection/creation of the sample, or soon after, and upon sending subsamples to different laboratories and user facilities (automatically creating resource maps that specify and display sample relationships).
Laboratories and data systems should provide tools that map varying, but similar, metadata requirements across different systems86.
Data repositories and data systems should develop programmatic interfaces or APIs to automatically connect and exchange data and metadata using sample PIDs. For example, such interfaces could automatically cross-link records across data systems as new data and metadata are generated, and support systems for tracking sample use and citations as new data and metadata are published and (re)used over time87.
Relevant data systems should coordinate to better integrate samples and associated data across projects and data systems.
Recommended practices for scientists publishing sample-based research
The final guidance document, “Publishing Open Research Using Physical Samples: Guidance for Authors”, includes foundational elements to make samples and associated data and metadata Open and FAIR88. Adoption of this guidance would help to track sample usage over time, which in turn supports reproducible research, data integration, reuse, and credit. The full guidance document includes links to specific examples and additional information on why each step is needed. We have also condensed the guidelines for community distribution within the Earth sciences. These guidelines can be used directly by individual researchers, journal publishers, or data repositories, and can be modified to provide more targeted instructions for specific communities.
Key elements of the full scientific author guide for publishing Open and FAIR research using physical samples88 include the following:
Step 1. Describe samples with rich metadata
Describe key characteristics and collection details of samples in a study using a domain-specific standard or reporting format relevant to your sample type25,27–30. This usually involves creating a sample metadata file using a standard template. Specific key metadata fields include sample type, how and where it was collected, by whom, and where it is archived (if applicable). This file is then used to register sample PIDs (Step 2), or is included in relevant datasets (Step 3).
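As a rough illustration of Step 1, the following sketch writes a sample metadata file from a fixed template and refuses rows that leave required fields blank. The column names are hypothetical stand-ins for the key fields listed above. A real study should use the columns of the domain-specific standard or reporting format for its sample type. The archive location is treated as optional because it applies only to archived samples.

```python
import csv
import io

# Hypothetical minimal template; substitute the columns of the relevant standard.
TEMPLATE_FIELDS = [
    "sample_name", "sample_type", "collection_method",
    "latitude", "longitude", "collection_date",
    "collector", "archive_location",
]
OPTIONAL_FIELDS = {"archive_location"}  # only applicable if the sample is archived


def write_sample_metadata(rows, out):
    """Write sample rows as CSV, refusing any row with an empty required field."""
    for row in rows:
        missing = [
            f for f in TEMPLATE_FIELDS
            if f not in OPTIONAL_FIELDS and not str(row.get(f, "")).strip()
        ]
        if missing:
            raise ValueError(f"{row.get('sample_name', '?')}: missing {missing}")
    writer = csv.DictWriter(out, fieldnames=TEMPLATE_FIELDS)  # absent keys become ""
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_sample_metadata([{
        "sample_name": "S1", "sample_type": "sediment",
        "collection_method": "grab sampler", "latitude": "46.37",
        "longitude": "-119.27", "collection_date": "2019-08-01",
        "collector": "A. Researcher",
    }], buffer)
    print(buffer.getvalue())
```

Validating at write time catches gaps while the collection details are still fresh, before the file is used to register PIDs (Step 2) or is bundled into a dataset (Step 3).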
Step 2. Assign and/or use identifiers for samples
Assign and/or use existing sample PIDs to track samples and associated data; some institutions or data systems may assign sample PIDs for you. The guidance document provides details for how to assign PIDs in different scenarios.
Step 3. Publish and cite samples in datasets
Publish a dataset that includes your sample PIDs and associated data; see existing guidance on how and where to publish datasets. If your samples have PIDs, include them in your dataset(s) metadata, and include a sample PID column (such as column header “IGSN” or “Sample PID”) within all data files containing sample data. If your samples do not have PIDs associated with standard metadata, also include a sample metadata file that clearly describes all sample collection details (Step 1) as part of your dataset. Then cite the dataset in the reference section of your paper and include it in your data availability statement89.
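A simple pre-submission check for Step 3 can confirm that a data file has a sample PID column and that every row fills it. The sketch below assumes CSV data files and uses the two column headers named above. The function name and its behavior are illustrative, not part of any repository's submission tooling.

```python
import csv
import io

PID_COLUMNS = ("IGSN", "Sample PID")  # column headers suggested in the guidance


def check_pid_column(handle, name="data file"):
    """Return 1-based data-row numbers whose sample PID cell is empty.

    Raises ValueError if no recognized sample PID column is present.
    """
    reader = csv.DictReader(handle)
    headers = reader.fieldnames or []
    column = next((c for c in PID_COLUMNS if c in headers), None)
    if column is None:
        raise ValueError(f"{name}: no sample PID column (expected one of {PID_COLUMNS})")
    return [
        i for i, row in enumerate(reader, start=1)
        if not (row[column] or "").strip()
    ]


if __name__ == "__main__":
    data = io.StringIO("Sample PID,DOC_mg_L\nsample-1,1.2\n,0.8\n")
    print(check_pid_column(data))  # row 2 is missing its PID
```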
Step 4. Cite sample identifiers in paper
If referring to samples within the text and/or table(s) of your paper, use sample PIDs in a consistent standard format when describing methods or findings. This includes a prefix identifying the PID type before the number, and either a hyperlink to the sample landing page (for example, igsn:10.58052/IEGRW002B) or the full URL (https://doi.org/10.58052/IEGRW002B), depending on journal requirements. This will make your PID findable by both humans and computers.
Note that for valuable samples archived in collections, you should cite sample PIDs in the text or references section where possible. However, when using large numbers of samples, you can cite a dataset that in turn cites the individual samples included (Step 3).
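Consistent formatting in Step 4 is easy to automate. This sketch assumes that the sample PID is an IGSN ID registered as a DOI, as in the example above. It accepts the bare identifier, an "igsn:" or "doi:" prefix, or a doi.org URL, normalizes the input to the bare form, and renders either the prefixed form or the resolvable URL. The function names are hypothetical.

```python
import re

# Accepts "10.xxxx/SUFFIX", optionally preceded by "igsn:", "doi:", or a doi.org URL.
PID_RE = re.compile(
    r"^(?:igsn:|doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def normalize_pid(raw):
    """Strip prefixes and surrounding whitespace, returning the bare DOI-style identifier."""
    match = PID_RE.match(raw.strip())
    if not match:
        raise ValueError(f"not a DOI-style sample PID: {raw!r}")
    return match.group(1)


def format_pid(raw, style="prefixed"):
    """Render a sample PID as 'igsn:<id>' or as a resolvable doi.org URL."""
    pid = normalize_pid(raw)
    if style == "prefixed":
        return f"igsn:{pid}"  # assumes the PID is an IGSN ID
    if style == "url":
        return f"https://doi.org/{pid}"
    raise ValueError(f"unknown style: {style}")
```

Running every PID in a manuscript or data table through one function like this keeps the text uniform, which is what lets indexers recognize the identifiers reliably.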
See Supplemental Information for details on how we applied these recommended practices in the ESS-DIVE data repository.
Discussion
The author guide for publishing sample-based research (summarized above) is one step toward enabling physical sample discovery, tracking, and attribution. However, author guidelines alone are not enough. There are multiple ways in which scientists, repositories, PID organizations, publishers, and citation indexers can further develop the physical sample research ecosystem (Fig. 3), including:
Promote and incentivize adoption of standard practices;
Research institutions, physical collections, and laboratories facilitate use of PIDs;
Implement clear guidelines and editorial review for publishing and citing sample-based research;
Implement standards for connecting samples and datasets to related outputs;
Improve citation metrics and provenance tracking; and
Coordinate across systems to enable automated data and metadata exchange and global sample search.
Fig. 3.
Diagram of the sample-based research ecosystem. Blue arrows represent the exchange of metadata as enabled by physical sample PIDs. Areas in which scientific practices need to improve are indicated by orange symbols, and areas in which technical infrastructure is needed are indicated by red symbols. These improvements (called out with numbers N, in the lower left and right corners of triangles) would enable efficient sample and data use tracking, and make sample data more FAIR.
Promote and incentivize adoption of standard practices
One of the biggest obstacles to supporting sample tracking and citation is cultural: while there is growing awareness of the need to describe and cite samples used in research, it is not yet the norm for most scientists to do so. Scientists may not be aware of the possibility and benefits of citing PIDs, and, lacking incentives to do otherwise, they follow disciplinary traditions. We need to promote the best practices for publishing sample-based research. This involves advancing incentives that encourage researchers to follow recommended practices, and developing tools that make this process easier and more rewarding. Such incentives include useful tools for easy sample data submission, integration, and visualization (as demonstrated by resources like GBIF), and sample citation counts or other records of where and how samples and associated data are used56,90.
For newly collected samples, PIDs and standard metadata can be effectively assigned in the field at the time of sample collection using automated field apps, such as “Dirt to Desktop”91 and StraboSpot92. These GPS-enabled apps automatically capture precise geographic coordinates at the time of collection and can be preset to collect a consistent set of metadata attributes for a major field sampling campaign. Not all sampling locations have internet access, but the information can be stored offline and automatically uploaded to the home database when an internet connection becomes available91. These apps also remove the chance of transcription errors and save time and money92. The IGSN ID, geolocation, time of collection, and other critical metadata are efficiently and consistently captured in the field and are ready to submit on return.
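The offline behavior described above is a store-and-forward pattern. This minimal sketch illustrates the pattern and makes no claim about how "Dirt to Desktop" or StraboSpot work internally. The queue class, its JSON-lines file, and the caller-supplied `upload` function are all hypothetical. Records are appended locally while offline and sent in order once a connection is available. Any record whose upload raises `ConnectionError` is kept for a later retry.

```python
import json
from pathlib import Path


class OfflineSampleQueue:
    """Hypothetical store-and-forward queue for field sample records."""

    def __init__(self, path):
        self.path = Path(path)

    def record(self, sample):
        # Append one record as a JSON line; works without any network access.
        with self.path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(sample) + "\n")

    def flush(self, upload):
        """Upload queued records in order; keep any that fail. Returns the number sent."""
        if not self.path.exists():
            return 0
        lines = self.path.read_text(encoding="utf-8").splitlines()
        pending = [json.loads(line) for line in lines if line]
        remaining, sent = [], 0
        for sample in pending:
            try:
                upload(sample)
                sent += 1
            except ConnectionError:
                remaining.append(sample)
        # Rewrite the queue with only the records that still need uploading.
        self.path.write_text(
            "".join(json.dumps(s) + "\n" for s in remaining), encoding="utf-8"
        )
        return sent
```

Because each record is written to disk at capture time, a dropped connection or a closed app does not lose field data, and nothing needs to be re-entered by hand.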
We can promote a culture of sample citation and PID use by mentoring and training researchers, as well as through funding and journal requirements. Some funders now recommend or require the use of sample PIDs in data management plans, which is an important step. For example, the U.S. NSF GEO Data and Sample Policy requests IGSN ID registration through SESAR11. Journals can include guidance for samples in their publication requirements (for example, AGU includes IGSN IDs in its guidance for authors).
Research institutions, physical collections, and laboratories facilitate use of PIDs
The research institutions, physical collections, and laboratories that manage physical samples have a major role to play in facilitating sample publication, citation, and tracking. PID registration can be more readily incorporated into required (and ideally automated) workflows of these institutions throughout the sample collection and management lifecycle. After source sample PIDs are assigned, researchers from institutions analyzing samples may additionally facilitate sample tracking and citation by minting child PIDs for any subsamples taken from their collections (Fig. 3, 1–3).
Existing services and new technology are lowering barriers for institutions to adopt PIDs and develop local PID allocation services. For 20 years, SESAR has provided curation services and a free, user-friendly sample registration service for researchers. The Internet of Samples (iSamples) project is experimenting with methods of aggregating sample records by serving data from existing local sample PID registration services using a unified metadata profile10. DiSSCo uses an approach that assigns DOIs to the digital records for specimens (not to the physical specimens themselves), which could simplify updating records for large numbers of specimens contributed by different sample collectors/owners, make it easier to link to related resources over time, and avoid relabeling physical samples48,49. Institutions and researchers have a growing number of options to implement standard sample metadata publication and citation in their research.
Implement clear guidelines and editorial review for publishing and citing sample-based research
Data repositories and sample metadata repositories can contribute by providing clear guidance and editorial review on identifying, describing, and reporting use of samples in datasets. While many disciplinary repositories already provide guidance on sample IDs and metadata, there is often little guidance for publishing and connecting related data over time (Supplemental Table 1; Fig. 3, improvements 4 and 5). Such data repositories can now use or adapt the guidelines developed in this publication and the associated guidance document developed by the ESIP Physical Samples Curation Cluster to address these issues and move toward common practices across systems.
Journal publishers must recognize the role of citations for research products beyond research articles, and require citations for datasets, physical samples, and other research outputs (Fig. 3, Improvement 5). Some journal publishers already provide data and software citation guidance93; similar author instructions are needed on where and how to cite samples in publications and/or associated datasets. This includes information about how to encode sample PIDs so that they become linked in the publication process (Fig. 3, Improvements 7–8). This guidance should outline procedures for all components of a paper (how to cite sample PIDs inline in the text and in tables, and how they should appear in Data Availability statements or reference sections), or of a dataset where relevant. During the review process, journal and data editors should ensure that PIDs are formatted so that they can be easily indexed and are reliably linked to related metadata records.
Implement standards for connecting samples and datasets to related research outputs
Sample PIDs and standard metadata are the foundational elements necessary to track and update provenance information. Some data repositories have systems in place for connecting datasets to related entities through, for example, provenance metadata, the RDF metadata framework for exchanging information, and/or the DataCite metadata schema (“RelatedIdentifier” and “RelationType” fields; Supplemental Table 1)94; many of these approaches could be extended to relate samples to their data and publications. However, many data archives do not have dataset metadata fields specifically for samples and other related identifiers. We recommend that data repositories store related sample identifiers in their datasets’ metadata (Fig. 3, Improvement 6). A current benefit of using DataCite metadata is that DataCite uses the “RelatedIdentifier” and “RelationType” fields to track citations to datasets and related works more easily, and thereby better show the impact of data (https://support.datacite.org/docs/connecting-to-works)94. Data repositories serving sample-based research should further provide the functionality to recognize sample PIDs as related entities associated with and cited by the dataset. For example, EarthChem automatically extracts IGSN IDs from data files and clearly displays links to the samples on the dataset landing page (for example, 10.26022/IEDA/112300).
Sample PIDs should generally be linked to other identifiers using defined relationship types, such as those in the DataCite metadata schema described above94. This includes other samples with PIDs (a subsample “IsPartOf” its parent, or a child sample “IsDerivedFrom” its parent) and datasets that include samples (the dataset DOI “HasPart” and “References” the sample DOI/IGSN ID). Connecting sample PIDs to all downstream samples and research products enables indexers to automatically create and track directional linkages. Furthermore, we need to make these related identifiers agnostic to identifier type, going beyond DOIs to include the range of identifiers in use, such as ARKs, BioSample accession numbers51,95, and more.
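The relationships above map onto entries in the DataCite Metadata Schema's related identifier list, where each entry carries `relatedIdentifier`, `relatedIdentifierType`, and `relationType`. The helpers below are a hypothetical sketch that builds such entries as plain dictionaries. They assume that the sample IGSN IDs are registered as DOIs, so their identifier type is "DOI". The DOIs used in the example are placeholders.

```python
def related_identifier(identifier, id_type, relation):
    """One entry in a DataCite-style relatedIdentifiers list."""
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": id_type,
        "relationType": relation,
    }


def dataset_sample_links(sample_dois):
    """Relations a dataset record declares toward each sample it includes."""
    links = []
    for doi in sample_dois:
        links.append(related_identifier(doi, "DOI", "HasPart"))
        links.append(related_identifier(doi, "DOI", "References"))
    return links


def subsample_links(parent_doi, derived=False):
    """Relation a subsample (IsPartOf) or a derived child sample (IsDerivedFrom) declares to its parent."""
    relation = "IsDerivedFrom" if derived else "IsPartOf"
    return [related_identifier(parent_doi, "DOI", relation)]


if __name__ == "__main__":
    print(dataset_sample_links(["10.0000/sample-1"]))
    print(subsample_links("10.0000/parent-sample", derived=True))
```

Because the relation is typed and directional, an indexer can follow it without guessing whether a link means "contains", "cites", or "was made from".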
Improve citation metrics and provenance tracking
Citation metrics work fairly well for journal publications and researchers, but improvements are needed for data and sample citation55. Indexers that currently provide paper and data citation metrics, such as Crossref and DataCite, need to consistently recognize sample PIDs as entities in citation metrics (currently only IGSN IDs/sample DOIs are tracked; Fig. 3, Improvement 9). Moreover, metrics and usage tracking are currently available only for DOIs (which include IGSN IDs). We need metrics and usage tracking for a range of identifiers in order to make sample-based research truly open and FAIR. Existing initiatives, such as the Make Data Count effort, are working toward making data citation work more consistently96.
All institutions involved in the sample collection and data and metadata lifecycle can contribute to a network of related identifiers that links data and metadata across PID registries and related research outputs. If sample PIDs and related identifiers are captured in parent-child sample records and dataset metadata, we can design APIs to efficiently cross-link and exchange information where needed across sample repositories, data repositories, journals, and more when sample PIDs are cited. This will make it possible to track the use of samples and attribute appropriate credit to those involved in sample collection, management, and analysis, as well as document provenance and relationships that make samples and associated data more useful. Tracking sample use will often require traversing multiple links in a graph of related PIDs. For example, this may involve a paper citing a dataset, the dataset citing analyses done on subsamples, and subsamples citing the original source sample collected in the field and/or archived in a museum. Currently, there are few effective ways of doing this traversal, making it challenging to track sample usage en masse.
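The traversal described above can be sketched as a breadth-first search over an inverted citation graph. This is a hypothetical illustration rather than an existing indexer's algorithm. Assume each product's outgoing links (paper to dataset, dataset to subsample, subsample to source sample) are available. Inverting those links and walking outward from a source sample finds every product that depends on it, directly or indirectly.

```python
from collections import deque


def downstream_uses(cites, source):
    """Find every research product that directly or indirectly depends on `source`.

    `cites` maps each product PID to the PIDs it cites or derives from
    (paper -> dataset, dataset -> subsample, subsample -> source sample).
    The graph is inverted and walked breadth-first from the source sample.
    """
    cited_by = {}
    for citing, targets in cites.items():
        for target in targets:
            cited_by.setdefault(target, []).append(citing)
    seen, queue = {source}, deque([source])
    while queue:
        current = queue.popleft()
        for user in cited_by.get(current, []):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    seen.discard(source)
    return seen
```

In a realistic setting, the hard part is not the search itself but assembling `cites`: every hop depends on one system exposing its related identifiers to the next, which is why the linking practices above matter.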
We need a new approach to effectively recognize sample citations. The RDA Complex Citations Working Group has outlined needs across multiple use cases to enable citing large numbers of objects (which may originate across multiple data systems) in a single container citation57. One of the key use cases for complex citations is to make it possible for authors to cite as many samples as needed in a paper or dataset in a machine-readable way, with the goal of enabling both provenance tracking and credit. Indexers, in turn, need to harvest those citations accurately from datasets and journal articles57.
Coordinate across systems to enable automated metadata exchange and global sample search
Sample metadata repositories and data repositories are often siloed and need better integration with one another, as well as connections to journal publishers. For example, many IGSN ID Allocating Agents are specific to a country, discipline, or organization and store richer metadata than that which is shared via DataCite when registering samples for IGSN IDs. Though DataCite Commons now enables searching for samples across all IGSN allocating agents, search is limited to this high-level DataCite metadata, and researchers must visit multiple systems to find more detailed sample metadata. Additionally, these distributed services are often not connected to other key systems where associated metadata and data are added over time, such as laboratories, data repositories, and journal publishers. Sample PIDs could be far more valuable to researchers with tools to automatically cross-link, update, and exchange information about samples over time (Fig. 3, Improvements 6, 8, and 9).
Sample metadata repositories, data repositories, journals, and indexers must coordinate to implement community practices and technical solutions that enable the automated linking and information exchange described above, which can apply to samples or any other PID used. Groups such as the ESIP Physical Samples Curation Cluster (including many of the authors of this paper), the Research Data Alliance (RDA) Physical Samples and Collections in the Research Data Ecosystem, the RDA Coordinating Earth, Space, and Environmental Science Data Preservation and Scholarly Publication Processes Working Group, and the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) can help promote and facilitate such coordination.
There are also emergent infrastructure development projects that aim to bridge these silos. For example, the iSamples project aims to build connections across distributed sample metadata catalogs by aggregating sample metadata into iSamples Central10. This aggregation means that researchers would only need to search for samples in one place, rather than in multiple repositories. Additionally, several US federal agencies plan to develop federated systems allowing discovery of and access to federally funded data and articles (see: NSF National Center for Atmospheric Research97). More generally, there is a push toward ‘open research commons’ within geoscience and beyond. These important efforts should include samples and associated data and metadata as a major component.
Advancing physical sample-based research
Scientists and sample managers face similar challenges with regard to sample use tracking across specific use cases and disciplines. We believe that this limited set of community practices and improved infrastructure can solve many current challenges, and make it possible to create useful tools for sample discovery, visualization, integration and reuse. Standard practices and improved infrastructure can also make it easier to find and access materials that no longer exist outside of a museum, or are no longer available to be sampled, which saves time and money and enables science that would not be possible otherwise98. For example, new studies are published using GBIF records every day, addressing topics such as conservation, species distribution, climate change impacts, macroecological patterns, and more9,99. Similarly, the reuse of genomics datasets has contributed to research with diverse applications, for example in the industrial biotechnology100 and biomedical101 sectors, and has enabled researchers to better understand the biological effects of ecosystem disturbance102. We can advance other environmental science and interdisciplinary studies with more widespread use of standard practices and improvements to infrastructure.
We have described the need for sample and associated data publication and citation guidelines, cultural changes, and infrastructure development to better facilitate physical sample discovery, citation, and tracking. Through years of iterative development, we created author guidelines for sample publication and citation as one step toward this vision. Data repositories and journals can now use and adapt the author guidelines developed by the ESIP Physical Samples Curation Cluster to provide clear guidance for authors submitting data and journal publications88. A key element of these recommendations is the wide implementation and adoption of the Sample PID, which provides a powerful way to link and exchange relevant scientific information across facilities and data systems. Overall, these guidelines would enable future development of automated tools to track sample use over time while making samples and associated data Open and FAIR.
Supplementary information
Supplemental Information: Applying Recommended Practices in the ESS-DIVE Data Repository
Acknowledgements
J.E. Damerow, D. O’Ryan, and S. Cholia were supported through the ESS-DIVE repository by the U.S. DOE’s Office of Science Biological and Environmental Research Program under contract number DE-AC02-05CH11231. A.K. Thomer, N.H. Raia, S. Choe, and K. Lehnert were supported in part through the iSamples Project NSF grant number 2004562 and IEDA2 NSF grant number 2148939. The work conducted by the U.S. Department of Energy Joint Genome Institute (https://ror.org/04xm1d337), a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy operated under Contract No. DE-AC02-05CH11231. The PNNL River Corridor Science Focus Area portion of this work was supported by the U.S. DOE, Office of Biological and Environmental Research (BER), Environmental System Science (ESS) Program. PNNL is operated by Battelle Memorial Institute for the DOE under contract no. DE-AC05-76RL01830. In addition, RNA and DNA samples were processed through the ‘Creating the GROW (Genome Resolved Open Watershed) Database: Leveraging Distributed Research Networks to Understand Watershed Systems’ award (10.46936/10.25585/60001289). We thank those who contributed to the ESIP Physical Samples Curation Cluster and the feedback that all participants provided in developing the author guide described in this work. We also thank those who contributed to the DOE sample interoperability working group, and their feedback/work on approaches to link related samples and data across DOE BER data systems.
Author contributions
Conceptualization: J.E. Damerow, A.K. Thomer, V. Stanley, S. Ramdeen. Write section drafts: J.E. Damerow, N.H. Raia, S. Choe, A.K. Thomer, C. Parker, N. Byers. Contribute to use-case assessment: J.E. Damerow, A.K. Thomer, N.H. Raia, S. Choe, K. Lehnert, D. O’Ryan, N. Byers, M.A. Borton, C. Parker, E. Wood-Charlson, B. Forbes, A. Goldman, C.W. Thompson, S. Lafia, K. Forrest, R. Naples, E.R. Cassidy, T.B.K. Reddy, B. Powers-McCormack, S. Cholia. Review and edit: all co-authors.
Data availability
The resulting “Scientific Author Guide for Publishing Open Research Using Physical Samples,” as well as relevant community meeting presentations are available in the ESIP Figshare research repository72,73,88,103.
The ESS-DIVE data repository has 29 datasets (as of February 2025) compiled into a data portal collection for Environmental System Science samples, which generally include IGSN IDs and standard metadata for associated samples. This includes 10 datasets with detailed links to related samples and other research outputs69,70,81,82,104–111.
Code availability
No new code was generated in this work.
Competing interests
The authors declare no competing interests.
Footnotes
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Contributor Information
Joan E. Damerow, Email: joandamerow@gmail.com
Andrea K. Thomer, Email: athomer@arizona.edu
Supplementary information
The online version contains supplementary material available at 10.1038/s41597-025-05295-z.
References
- 1. Haller, A. et al. The modular SSN ontology: A joint W3C and OGC standard specifying the semantics of sensors, observations, sampling, and actuation. Semantic Web, 10.3233/SW-180320 (2018).
- 2. Janowicz, K., Haller, A., Cox, S., Phuoc, D. L. & Lefrancois, M. SOSA: A Lightweight Ontology for Sensors, Observations, Samples, and Actuators. 10.2139/ssrn.3248499 (2018).
- 3. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018, 10.1038/sdata.2016.18 (2016).
- 4. National Academies of Sciences, Engineering, and Medicine, Policy and Global Affairs, Board on Research Data and Information, Committee on Toward an Open Science Enterprise. Open Science by Design: Realizing a Vision for 21st Century Research. National Academies Press, 10.17226/25116 (2018).
- 5. UNESCO Recommendation on Open Science. https://www.unesco.org/en/open-science/about (accessed 20 Mar 2024) (2023).
- 6. Mons, B. et al. Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science Cloud. Inf Serv Use 37, 49–56, 10.3233/isu-170824 (2017).
- 7. McNutt, M. et al. Liberating field science samples and data. Science 351, 1024–1026, 10.1126/science.aad7048 (2016).
- 8. Sidlauskas, B. et al. Linking big: the continuing promise of evolutionary synthesis. Evolution 64, 871–880, 10.1111/j.1558-5646.2009.00892.x (2010).
- 9. Heberling, J. M., Miller, J. T., Noesgaard, D., Weingart, S. B. & Schigel, D. Data integration enables global biodiversity synthesis. Proc Natl Acad Sci USA 118, 10.1073/pnas.2018093118 (2021).
- 10. Davies, N. et al. Internet of Samples (iSamples): Toward an interdisciplinary cyberinfrastructure for material samples. Gigascience 10, 10.1093/gigascience/giab028 (2021).
- 11. U.S. National Science Foundation. Division of Earth Sciences (EAR) Data and Sample Policy. https://www.nsf.gov/geo/geo-data-policies/ear/ear-data-policy-jul2023.pdf (2023).
- 12. Troudet, J., Vignes-Lebbe, R., Grandcolas, P. & Legendre, F. The Increasing Disconnection of Primary Biodiversity Data from Specimens: How Does It Happen and How to Handle It? Syst Biol 67, 1110–1119, 10.1093/sysbio/syy044 (2018).
- 13. Shiffrin, R. M., Börner, K. & Stigler, S. M. Scientific progress despite irreproducibility: A seeming paradox. Proc Natl Acad Sci USA 115, 2632–2639, 10.1073/pnas.1711786114 (2018).
- 14. Thessen, A. E. et al. Proper attribution for curation and maintenance of research collections: Metadata recommendations of the RDA/TDWG working group. Data Sci J 18, 54, 10.5334/dsj-2019-054 (2019).
- 15. Rouhan, G. et al. The time has come for Natural History Collections to claim co-authorship of research articles. Taxon 66, 1014–1016, 10.12705/665.2 (2017).
- 16. Deck, J. et al. The Genomic Observatories Metadatabase (GeOMe): A new repository for field and sampling event metadata associated with genetic samples. PLoS Biol 15, e2002925, 10.1371/journal.pbio.2002925 (2017).
- 17. Pope, L. C., Liggins, L., Keyse, J., Carvalho, S. B. & Riginos, C. Not the time or the place: the missing spatio-temporal link in publicly available genetic data. Mol Ecol 24, 3802–3809, 10.1111/mec.13254 (2015).
- 18. Roche, D. G., Kruuk, L. E. B., Lanfear, R. & Binning, S. A. Public Data Archiving in Ecology and Evolution: How Well Are We Doing? PLoS Biol 13, e1002295, 10.1371/journal.pbio.1002295 (2015).
- 19. Klump, J. et al. Towards globally unique identification of physical samples: Governance and technical implementation of the IGSN global sample number. Data Sci J 20, 10.5334/dsj-2021-033 (2021).
- 20. Schindel, D. E. & Cook, J. A. The next generation of natural history collections. PLoS Biol 16, e2006125, 10.1371/journal.pbio.2006125 (2018).
- 21. Carroll, S. R. et al. The CARE principles for indigenous data governance. Data Sci J 19, 10.5334/dsj-2020-043 (2020).
- 22. Guralnick, R., Conlin, T., Deck, J., Stucky, B. J. & Cellinese, N. The Trouble with Triplets in Biodiversity Informatics: A Data-Driven Case against Current Identifier Practices. PLoS One 9, e114069, 10.1371/journal.pone.0114069 (2014).
- 23. Schleidt, K. & Rinne, I. OGC Abstract Specification Topic 20: Observations, measurements and samples. Open Geospatial Consortium, http://www.opengis.net/doc/as/om/3.0 (accessed 26 Mar 2025) (2023).
- 24. Field, D. et al. The Genomic Standards Consortium. PLoS Biol 9, e1001088, 10.1371/journal.pbio.1001088 (2011).
- 25. Wieczorek, J. et al. Darwin Core: An Evolving Community-Developed Biodiversity Data Standard. PLoS One 7, e29715, 10.1371/journal.pone.0029715 (2012).
- 26. Crystal-Ornelas, R. et al. Enabling FAIR data in Earth and environmental science with community-centric (meta)data reporting formats. Sci Data 9, 700, 10.1038/s41597-022-01606-w (2022).
- 27. Damerow, J. E. et al. Sample identifiers and metadata to support data management and reuse in multidisciplinary ecosystem sciences. Data Sci J 20, 11, 10.5334/dsj-2021-011 (2021).
- 28. Yilmaz, P. et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat Biotechnol 29, 415–420, 10.1038/nbt.1823 (2011).
- 29. System for Earth Sample Registration (SESAR). SESAR XML Schema for samples. 10.5281/zenodo.3875531 (2020).
- 30. System for Earth Sample Registration (SESAR). SESAR Batch Registration Quick Guide. 10.5281/zenodo.3874923 (2020).
- 31. Damerow, J. et al. Sample Identifiers and Metadata Reporting Format for Environmental Systems Science. [Dataset]. Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE). 10.15485/1660470 (2020).
- 32. Strasser, B. J. Genetics. GenBank–Natural history in the 21st Century? Science 322, 537–538, 10.1126/science.1163399 (2008).
- 33. Robertson, T., Gonzalez, M. L., Hofft, M. & Grosjean, M. Documenting Natural History Collections in GBIF. Biodiversity Information Science and Standards 3, 10.3897/biss.3.37216 (2019).
- 34. Sayers, E. W. et al. GenBank 2023 update. Nucleic Acids Res 51, D141–D144, 10.1093/nar/gkac1012 (2023).
- 35. Kunze, J. Towards Electronic Persistence Using ARK Identifiers. https://escholarship.org/content/qt3bg2w3vs/qt3bg2w3vs.pdf?t=pn0jue (accessed 26 Mar 2025) (2003).
- 36. Kansa, E. C. & Kansa, S. W. Promoting data quality and reuse in archaeology through collaborative identifier practices. Proc Natl Acad Sci USA 119, e2109313118, 10.1073/pnas.2109313118 (2022).
- 37. Cousijn, H. et al. Connected research: The potential of the PID Graph. Patterns (N Y) 2, 100180, 10.1016/j.patter.2020.100180 (2021).
- 38. Davidson, L. A. & Douglas, K. Digital object identifiers: Promise and problems for scholarly publishing. J Electron Publ 4, 10.3998/3336451.0004.203 (1998).
- 39. Clark, T., Martin, S. & Liefeld, T. Globally distributed object identification for biological knowledgebases. Brief Bioinform 5, 59–70, 10.1093/bib/5.1.59 (2004).
- 40. Peyrard, S., Tramoni, J.-P. & Kunze, J. The ARK Identifier Scheme: Lessons Learnt at the BnF and Questions Yet Unanswered. https://escholarship.org/uc/item/58d52295 (accessed 20 Nov 2019) (2014).
- 41. Klump, J. & Huber, R. 20 Years of Persistent Identifiers – Which Systems are Here to Stay? Data Sci J 16, 9, 10.5334/dsj-2017-009 (2017).
- 42. McMurry, J. A. et al. Identifiers for the 21st century: How to design, provision, and reuse persistent identifiers to maximize utility and impact of life science data. PLoS Biol 15, 10.1371/journal.pbio.2001414 (2017).
- 43. Lehnert, K. A., Goldstein, S. L., Lenhardt, C. & Vinayagamoorthy, S. SESAR: Addressing the need for unique sample identification in the Solid Earth Sciences. p SF32A–06, https://www.researchgate.net/publication/241504172_SESAR_Addressing_the_need_for_unique_sample_identification_in_the_Solid_Earth_Sciences (2004).
- 44. Lehnert, K. A. et al. IGSN e.V.: Registration and Identification Services for Physical Samples in the Digital Universe. AGU Fall Meeting Abstracts 2011, 13, IN13B–1324, https://www.researchgate.net/publication/258471230_IGSN_eV_Registration_and_Identification_Services_for_Physical_Samples_in_the_Digital_Universe (2011).
- 45. Lehnert, K., Klump, J., Wyborn, L. & Ramdeen, S. Persistent, Global, Unique: The three key requirements for a trusted identifier system for physical samples. Biodiversity Information Science and Standards 3, e37334, 10.3897/biss.3.37334 (2019).
- 46. Lehnert, K., Klump, J., Ramdeen, S., Wyborn, L. & Haak, L. IGSN 2040 Summary Report: Defining the Future of the IGSN as a Global Persistent Identifier for Material Samples. 10.5281/zenodo.5118289 (2021).
- 47. Güntsch, A. et al. Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects. Database (Oxford) 2017, bax003, 10.1093/database/bax003 (2017).
- 48. Hardisty, A. et al. A choice of persistent identifier schemes for the Distributed System of Scientific Collections (DiSSCo). Res Ideas Outcomes 7, e67379, 10.3897/rio.7.e67379 (2021).
- 49. Hardisty, A. R. et al. Digital extended specimens: Enabling an extensible network of biodiversity data records as integrated digital objects on the Internet. Bioscience 72, 978–987, 10.1093/biosci/biac060 (2022).
- 50. Smith, D. Culture Collections and Biological Resource Centres (BRCs). Encyclopedia of Industrial Biotechnology. 10.1002/9780470054581.eib246 (2009).
- 51. Barrett, T. et al. BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata. Nucleic Acids Res 40, D57–D63, 10.1093/nar/gkr1163 (2012).
- 52. Nelson, A. Memorandum for the heads of executive departments and agencies: Ensuring free, immediate, and equitable access to federally funded research. Office of Science and Technology Policy (OSTP). https://bidenwhitehouse.archives.gov/wp-content/uploads/2022/08/08-2022-OSTP-Public-Access-Memo.pdf (2022).
- 53. Noy, N. & Noy, A. Let go of your data. Nat Mater 19, 128, 10.1038/s41563-019-0539-5 (2020).
- 54. Robertson, T. et al. The GBIF integrated publishing toolkit: facilitating the efficient publishing of biodiversity data on the internet. PLoS One 9, e102623, 10.1371/journal.pone.0102623 (2014).
- 55. Stall, S. et al. Journal Production Guidance for Software and Data Citations. Sci Data 10, 656, 10.1038/s41597-023-02491-7 (2023).
- 56. Byers, N. et al. Identifying genomic data use with the Data Citation Explorer. Sci Data 11, 1200, 10.1038/s41597-024-04049-7 (2024).
- 57. Agarwal, D. et al. Complex Citation Working Group recommendation. 10.5281/ZENODO.14106603 (2024).
- 58. Miller, S. E. et al. Building Natural History Collections for the Twenty-First Century and Beyond. Bioscience 70, 674–687, 10.1093/biosci/biaa069 (2020).
- 59. Preferred Citations. Field Museum of Natural History. https://dams.fieldmuseum.tech/portals/museum-media/#page/preferred-citations (accessed 27 Mar 2025).
- 60. Loan Program. Smithsonian National Museum of Natural History. https://naturalhistory.si.edu/loan-program (accessed 27 Mar 2025).
- 61. Loans. American Museum of Natural History. https://www.amnh.org/research/paleontology/loans (accessed 27 Mar 2025).
- 62. Department of Mineral Sciences Loan Policy. Smithsonian National Museum of Natural History. https://naturalhistory.si.edu/research/mineral-sciences/collections-access/loan-policy (accessed 27 Mar 2025).
- 63. Scanning Procedures. American Museum of Natural History. https://www.amnh.org/research/paleontology/scanning-procedures (accessed 27 Mar 2025).
- 64. Cui, X. et al. Global fjords as transitory reservoirs of labile organic carbon modulated by organo-mineral interactions. Sci Adv 8, eadd0610, 10.1126/sciadv.add0610 (2022).
- 65. Buck, M. & Hamilton, C. The Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity. Rev Eur Community Int Environ Law 20, 47–61, https://www.cbd.int/abs/doc/protocol/nagoya-protocol-en.pdf (2011).
- 66. Carroll, S. R., Herczog, E., Hudson, M., Russell, K. & Stall, S. Operationalizing the CARE and FAIR Principles for Indigenous data futures. Sci Data 8, 108, 10.1038/s41597-021-00892-0 (2021).
- 67. Williamson, B., Provost, S. & Price, C. Operationalising Indigenous data sovereignty in environmental research and governance. Environment and Planning F 2, 281–304, 10.1177/26349825221125496 (2023).
- 68. Sahagún, L. Caltech says it regrets drilling holes in sacred Native American petroglyph site. Los Angeles Times. https://www.latimes.com/environment/story/2021-07-19/caltech-fined-for-damaging-native-american-cultural-site (accessed 24 Mar 2024) (2021).
- 69. Taitingfong, R., Martinez, A., Carroll, S. R., Hudson, M. & Anderson, J. Indigenous Metadata Bundle Communique. Collaboratory for Indigenous Data Governance, ENRICH: Equity for Indigenous Research and Innovation Coordinating Hub, and Tikanga in Technology. https://indigenousdatalab.org/3006-2/ (2023).
- 70. Golan, J. et al. Benefit sharing: Why inclusive provenance metadata matter. Front Genet 13, 1014044, 10.3389/fgene.2022.1014044 (2022).
- 71. Lock, M. et al. Position statement: Research and reconciliation with Indigenous peoples in rural health journals. Aust J Rural Health 30, 6–7, 10.22605/rrh7353 (2022).
- 72. Damerow, J., Thomer, A. & Stanley, V. How can we connect and track use of physical samples and associated data? [Presentation]. ESIP Figshare. 10.6084/m9.figshare.25483765.v1 (2023).
- 73. Damerow, J., Thomer, A. & Stanley, V. Community and Technical Needs to Facilitate Sample Citation. [Presentation]. ESIP Figshare. 10.6084/m9.figshare.25483771.v1 (2024).
- 74. Agarwal, D. et al. Balancing the needs of consumers and producers for scientific data collections. Ecol Inform 101251, 10.1016/j.ecoinf.2021.101251 (2021).
- 75. Lehnert, K. EarthChem - FAIR data for geochemistry, volcanology, and petrology. [Presentation]. Zenodo. 10.5281/zenodo.10737711 (2023).
- 76. Wallace, K. L. et al. Community-established best practice recommendations for tephra studies – from collection through analysis. Sci Data 9, 447, 10.1038/s41597-022-01515-y (2022).
- 77. Lafia, S., Thomer, A., Thompson, C., Cassidy, E. & Polasek, K. Surfacing Specimen Citations: Machine Learning, Manual Annotation, and Impact Metrics for Natural History Collections. American Geophysical Union (AGU), p IN55A–01. https://ui.adsabs.harvard.edu/abs/2022AGUFMIN55A..01L/abstract (2022).
- 78. Cross-Domain Interoperability Framework (CDIF) Working Group, Richard, S. et al. Cross Domain Interoperability Framework (CDIF): Discovery Module (v01 draft for public consultation). 10.5281/zenodo.10252564 (2023).
- 79. Garayburu-Caruso, V. A. et al. Using community science to reveal the global chemogeography of river metabolomes. Metabolites 10, 10.3390/metabo10120518 (2020).
- 80. Borton, M. GROWdb, U.S. River Systems - Samples. [Dataset]. DOE KBase. 10.25982/109073.30/1895615 (2022).
- 81. Toyoda, J. G., Goldman, A. E., Chu, R. K., Danczak, R. E. & Daly, R. A. WHONDRS Summer 2019 sampling campaign: global river corridor surface water FTICR-MS, NPOC, and stable isotopes. [Dataset]. ESS-DIVE. 10.15485/1603775 (2020).
- 82. Goldman, A. E. et al. WHONDRS Summer 2019 Sampling campaign: Global river corridor sediment FTICR-MS, dissolved organic carbon, aerobic respiration, elemental composition, grain size, total nitrogen and organic carbon content, bacterial abundance, and stable isotopes (v8). [Dataset]. ESS-DIVE. 10.15485/1729719 (2020).
- 83. Borton, M. A. et al. It takes a village: using a crowdsourced approach to investigate organic matter composition in global rivers through the lens of ecological theory. Frontiers in Water 4, 10.3389/frwa.2022.870453 (2022).
- 84. Stadler, M. et al. Applying the core-satellite species concept: Characteristics of rare and common riverine dissolved organic matter. Frontiers in Water 5, 10.3389/frwa.2023.1156042 (2023).
- 85. Buser-Young, J. Z. et al. Determining the biogeochemical transformations of organic matter composition in rivers using molecular signatures. Frontiers in Water 5, 10.3389/frwa.2023.1005792 (2023).
- 86. Gill, I. S. et al. The DataHarmonizer: a tool for faster data harmonization, validation, aggregation and analysis of pathogen genomics contextual information. Microb Genom 9, 10.1099/mgen.0.000908 (2023).
- 87. Wood-Charlson, E. M., Crockett, Z., Erdmann, C., Arkin, A. P. & Robinson, C. B. Ten simple rules for getting and giving credit for data. PLoS Comput Biol 18, e1010476, 10.1371/journal.pcbi.1010476 (2022).
- 88. Damerow, J. et al. Publishing Open Research Using Physical Samples: Guidance for Authors. [Documentation]. ESIP Figshare. 10.6084/m9.figshare.24669057.v3 (2025).
- 89. Stall, S. et al. Data documentation and citation checklist. Zenodo. https://zenodo.org/records/7841823 (2023).
- 90. Colavizza, G., Hrynaszkiewicz, I., Staden, I., Whitaker, K. & McGillivray, B. The citation advantage of linking publications to research data. PLoS One 15, e0230416, 10.1371/journal.pone.0230416 (2020).
- 91. Ross, S. et al. FAIRer Data through Digital Recording: The FAIMS Mobile Experience. J Comput Appl Archaeol 5, 271–285, 10.5334/jcaa.96 (2022).
- 92. Walker, D. J. et al. StraboSpot data system for structural geology. Geosphere 15, 533–547 (2019).
- 93. Fox, P. et al. Data and software sharing guidance for authors submitting to AGU journals. 10.5281/zenodo.5124741 (2021).
- 94. DataCite Metadata Working Group. DataCite metadata schema documentation for the publication and citation of research data and other research outputs v4.6. 10.14454/MZV1-5B55 (2024).
- 95. Barrett, T. BioSample. National Center for Biotechnology Information (US), https://www.ncbi.nlm.nih.gov/books/NBK169436/ (accessed 29 Mar 2019) (2013).
- 96. Cousijn, H., Feeney, P., Lowenberg, D., Presani, E. & Simons, N. Bringing citations and usage metrics together to make data count. Data Sci J 18, 9, 10.5334/dsj-2019-009 (2019).
- 97. Mayernik, M., Schuster, D. & Clyne, J. Innovations in open science (IOS) planning workshop: Community expectations for a geoscience data commons - workshop report. NSF National Center for Atmospheric Research, 10.5065/GFBQ-8Y08 (2024).
- 98. Brown, J., Jones, P., Meadows, A. & Murphy, F. UK PID consortium: Cost-benefit analysis. 10.5281/ZENODO.4772627 (2021).
- 99. Ball-Damerow, J. E. et al. Research applications of primary biodiversity databases in the digital age. PLoS One 14, e0215794, 10.1371/journal.pone.0215794 (2019).
- 100. Stewart, R. D. et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat Commun 9, 870, 10.1038/s41467-018-03317-6 (2018).
- 101. Bernheim, A. et al. Prokaryotic viperins produce diverse antiviral molecules. Nature 589, 120–124, 10.1038/s41586-020-2762-2 (2021).
- 102. Planavsky, N., Hood, A., Tarhan, L., Shen, S. & Johnson, K. Store and share ancient rocks. Nature 581, 137–139, 10.1038/d41586-020-01366-w (2020).
- 103. Raia, N. et al. 4 Steps to Publish Open Earth Science Samples. [Documentation]. ESIP Figshare. 10.6084/m9.figshare.24291148.v1 (2023).
- 104. Sorensen, P. et al. Sample collection metadata for soil cores from the East River Watershed, Colorado collected in 2017. [Dataset]. ESS-DIVE. 10.21952/WTR/1573029 (2019).
- 105. Sorensen, P. et al. Soil nitrogen, water content, microbial biomass, and Archaeal, bacterial and fungal communities from the East River Watershed, Colorado collected in 2016-2017. [Dataset]. ESS-DIVE. 10.15485/1577267 (2019).
- 106. Alves, R. J. E. et al. Kinetic and temperature sensitivity properties of soil exoenzymes through the soil profile down to one-meter depth at a temperate coniferous forest (Blodgett, CA). [Dataset]. ESS-DIVE. 10.15485/1830417 (2021).
- 107. Merino, N. et al. Biogeochemistry of Pond B (Savannah River Site, South Carolina, USA): Water column and Sediments. [Dataset]. ESS-DIVE. 10.15485/1910298 (2021).
- 108. Coutelot, F. & Powell, B. Biogeochemistry of pond B (Savannah River Site, South Carolina, USA): Sediment core, total extraction data, pond B Savannah River Site July 2019. [Dataset]. ESS-DIVE. 10.15485/1910299 (2023).
- 109. Pennington, S. C. et al. EXCHANGE Campaign 1: A community-driven baseline characterization of soils, sediments, and water across coastal gradients. [Dataset]. ESS-DIVE. 10.15485/1960313 (2023).
- 110. Garayburu-Caruso, V. A. et al. FTICR, NPOC, TN, and moisture of variably inundated sediment across 48 North American rivers. [Dataset]. ESS-DIVE. 10.15485/1834208 (2021).
- 111. Forbes, B. et al. WHONDRS river corridor dissolved oxygen, temperature, sediment aerobic respiration, grain size, and water chemistry from machine-learning-informed sites across the contiguous United States (v4). [Dataset]. ESS-DIVE. 10.15485/1923689 (2023).
Supplementary Materials
Supplemental Information: Applying Recommended Practices in the ESS-DIVE Data Repository



