Cloud Service

PMC provides cloud service access to the following subsets of the PMC Article Datasets:

  • PMC Open Access Subset - Articles in the PMC Open Access Subset are available for reuse under terms specified by the publisher. Most of these articles have machine-readable Creative Commons licenses.
  • Author Manuscript Dataset - The complete set of author manuscripts collected in PMC under a funder policy and made available in machine-readable formats for text mining. NOTE: Author manuscripts with Creative Commons licenses are also part of the PMC Open Access Subset.
  • Historical OCR Dataset - Articles published in the 18th, 19th, and 20th centuries that were scanned as part of an NLM digitization project and have Creative Commons licenses. NOTE: These articles are also part of the PMC Open Access Subset.

As the custodian of these datasets, PMC works to ensure the contents are available in formats and through channels that enable new discovery, in a manner consistent with copyright law.

April 13, 2026: Update on PMC Article Dataset Distribution Changes

As announced on February 12, major changes to PMC's Article Dataset Distribution Services are underway.

On April 13, all legacy files for the PMC Article Datasets were moved to new temporary directories and prefixes on the PMC FTP and Cloud Services.

  • FTP Service: all legacy files were moved to a new directory named "deprecated."
  • Cloud Service: all legacy prefixes were updated to add "deprecated" to the prefix. Prefixes for legacy files now begin with //pmc-oa-opendata/deprecated/.
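
As a rough illustration of the workflow adjustment described above, the following Python sketch rewrites a legacy object key so that it points at the temporary "deprecated" location. It assumes that the remainder of each key is unchanged after the "deprecated/" segment, which this announcement does not state explicitly. The function name and the sample key are hypothetical and are not part of any PMC tooling.

```python
# Bucket name and "deprecated" segment are taken from the announcement above.
# ASSUMPTION: the rest of each legacy key is unchanged after "deprecated/".
BUCKET = "pmc-oa-opendata"
DEPRECATED_SEGMENT = "deprecated/"


def to_deprecated_key(legacy_key: str) -> str:
    """Return the temporary key for a legacy object (hypothetical helper)."""
    key = legacy_key.lstrip("/")
    if key.startswith(DEPRECATED_SEGMENT):
        # Already points at the temporary location; leave it alone.
        return key
    return DEPRECATED_SEGMENT + key


if __name__ == "__main__":
    # Hypothetical legacy key, for illustration only.
    print(to_deprecated_key("some_legacy_prefix/example.xml"))
```

Because all legacy files are scheduled for removal in August 2026, a rewrite like this is only a stopgap while workflows are moved to the updated structure.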

This intentional disruption alerts users to the upcoming changes to the PMC Cloud Service on AWS, while allowing for easy updates to keep existing automated workflows running. We encourage users of the legacy PMC FTP and PMC Cloud Services to begin working with the updated PMC Cloud Service structure and to adjust existing workflows.

All legacy files on the FTP and Cloud Services will be removed in August 2026.

For complete details about this transition, please see the NCBI Insights blog post and our documentation on Accessing PMC Article Datasets Using Amazon Web Services.

Files

The files that PMC distributes via our cloud service for each article include:

  • Metadata in JSON
  • Full-text of the article in NISO Z39.96-2015 JATS XML
  • Full-text of the article in plain text as extracted from the XML
  • Full article PDF (when available)
  • Media files (when available)
  • Supplementary materials (when available)

NOTE: PDFs, media files, and supplementary materials for author manuscripts without Creative Commons licenses are NOT made available as part of the PMC Article Datasets.

In addition to the files above, PMC provides a CSV inventory file, which is updated daily.

Cloud Service Providers

All data made available on cloud services are managed by the National Library of Medicine (NLM). Currently, the cloud service is available only through Amazon Web Services (AWS).

Update Frequency

Continuous

Article versions are updated continuously. Updates include:

  • the addition of new article versions
  • the update of one or more objects belonging to an existing article version
  • in rare cases, the removal of an article version
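
One way a local mirror might account for the three kinds of update listed above is to compare two snapshots of the dataset. The sketch below is a minimal illustration, assuming each snapshot is a mapping from an article-version identifier to some fingerprint of its objects (for example, a checksum). The identifiers, fingerprints, and function name are hypothetical; the actual layout of the PMC inventory file is not described here.

```python
def classify_changes(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots mapping version id -> fingerprint (both hypothetical)."""
    # Present only in the newer snapshot: a new article version.
    added = sorted(k for k in new if k not in old)
    # Present in both but with a different fingerprint: one or more objects changed.
    updated = sorted(k for k in new if k in old and new[k] != old[k])
    # Present only in the older snapshot: the article version was removed.
    removed = sorted(k for k in old if k not in new)
    return {"added": added, "updated": updated, "removed": removed}


if __name__ == "__main__":
    # Made-up identifiers and fingerprints, for illustration only.
    yesterday = {"version-A": "f1", "version-B": "f2", "version-C": "f3"}
    today = {"version-A": "f1", "version-B": "f2-changed", "version-D": "f4"}
    print(classify_changes(yesterday, today))
```

Handling removals explicitly matters even though they are rare, since a mirror that only ever adds files would keep article versions that are no longer distributed.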

License

Articles in these datasets are made available consistent with either the terms of applicable article-level license statements or the funder’s policy. See PMC Copyright for more information.

Contact

pubmedcentral@ncbi.nlm.nih.gov

How to Cite

NIH NLM NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS was accessed on DATE from https://registry.opendata.aws/ncbi-pmc.