2020 Aug 28;3:474. doi: 10.1038/s42003-020-01204-9

Table 1.

Recommendations for the future improvement of data archiving practices.

Columns: Studies affected; Issue; Recommendations

31.4%

Data is not readily accessible

• Data is not deposited

• Data is not deposited to INSDC-affiliated databases

• Accession numbers are incorrect

• Data is private

• Metadata is private

Researchers:

• Make deposited data available upon a manuscript’s publication.

• Ensure accession numbers are correct in the published article.

• Develop community standards on removal of identifying human reads and storage of clean microbiome data.

Publishers:

• Require that the sequencing data is available upon article submission, and remind authors to make the data publicly available by the time of publication [42].

• Demand that datasets are deposited to the appropriate INSDC databases prior to submission in order to guarantee their long-term availability.

Data archives:

• Require that users select a date to make data public during the deposition process.

23.6%

Changes in data formatting practices

• Data is uploaded in legacy file formats

• Single sequence files are uploaded for paired-end data

Researchers:

• Ensure that a minimum set of data is provided to allow for reproducibility. This includes formally collecting and depositing metadata covering experiment, sample, and sequence information, and recording protocols using modern tools [43] (e.g., protocols.io for laboratory protocols and R Notebooks or Jupyter Notebooks for bioinformatics code).

Data archives:

• Allow for the deposition of more diverse file types (e.g., sequence metadata files).

• Develop new standards which require the reporting of metadata on sequencing and sequence processing. It should be possible to provide essential information, such as details of DNA extraction, sequencing, computational processing, and data provenance, via a DOI.

• Adopt a common and precise language regarding ‘best practices’ for data deposition (e.g., the inclusion of primers) [17].

• Keep publicly available changelogs of database guidelines, so that users may understand how and why data was deposited in a particular format in the past.

14.6%

Mislabeling

• Amplicon sequences not listed as ‘amplicon’

• Single sequence files are uploaded for paired-end data

Researchers:

• Become familiar with the terms associated with sequencing and sequence formats for proper data upload [44].

• Interact proactively with database holders (e.g., via the helpdesk) to ensure that data deposition is done correctly.

Publishers:

• Demand that the metadata tables be included during article submission for peer review.

Data archives:

• Recognize that amplicon sequencing is an increasingly interdisciplinary technique, and continue the current trend towards improved documentation and explanations. In particular, users may benefit from more precise guidelines on what constitutes informative metadata for the purposes of archiving (e.g., listing the environment as ‘human’ vs. ‘human gut’, Supplementary Fig. 4).
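
Several of the issues listed in the table could, in principle, be caught by a simple automated check before deposition or during peer review. The Python sketch below is illustrative only and is not part of the original study. The record fields (`accession`, `is_amplicon`, `library_strategy`, `library_layout`, `files`, `environment`), the function `check_record`, and the set `VAGUE_ENVIRONMENTS` are hypothetical names chosen for this example, and the accession pattern is a simplified assumption covering only run accessions that begin with SRR, ERR, or DRR.

```python
import re

# Simplified, assumed pattern for INSDC run accessions (SRR/ERR/DRR + digits).
# It is not a full specification of valid accession numbers.
RUN_ACCESSION = re.compile(r"^[SED]RR\d{6,}$")

# Hypothetical list of environment labels considered too vague to be informative.
VAGUE_ENVIRONMENTS = {"human"}


def check_record(record):
    """Return a list of problems found in one hypothetical metadata record."""
    problems = []

    # Issue: accession numbers are incorrect.
    if not RUN_ACCESSION.match(record.get("accession") or ""):
        problems.append("accession does not look like an INSDC run accession")

    # Issue: amplicon sequences not listed as 'amplicon'.
    strategy = (record.get("library_strategy") or "").upper()
    if record.get("is_amplicon") and strategy != "AMPLICON":
        problems.append("amplicon data not labeled as amplicon")

    # Issue: single sequence files are uploaded for paired-end data.
    layout = (record.get("library_layout") or "").upper()
    if layout == "PAIRED" and len(record.get("files") or []) < 2:
        problems.append("paired-end data has fewer than two sequence files")

    # Issue: uninformative metadata (e.g., 'human' instead of 'human gut').
    environment = (record.get("environment") or "").strip().lower()
    if environment in VAGUE_ENVIRONMENTS:
        problems.append("environment label is too vague")

    return problems


if __name__ == "__main__":
    example = {
        "accession": "SRR1234567",  # made-up accession for illustration
        "is_amplicon": True,
        "library_strategy": "WGS",
        "library_layout": "PAIRED",
        "files": ["sample.fastq"],
        "environment": "human",
    }
    for problem in check_record(example):
        print(problem)
```

A check of this kind does not replace the community standards and archive guidelines recommended above; it only flags records that a person should inspect more closely.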