PMC Author Manuscript Dataset
The PMC Author Manuscript Dataset (“Dataset”) consists of all author manuscripts that have been made available in PMC in compliance with the NIH Public Access Policy or similar policies of other funders since July 2008. The text of manuscripts in the Dataset may be retrieved in XML and plain text formats using the retrieval methods described below.
- Not all articles in PMC are available for text mining and other reuse.
- The PMC Cloud Service, PMC OAI-PMH Service, PMC FTP Service, E-Utilities and BioC API are the only services that may be used for automated retrieval of PMC content. Systematic retrieval (or bulk retrieval) of articles through any other automated process is prohibited.
- License terms vary. Please refer to the license statement in each article for specific terms of use.
- Users of this dataset are directly and solely responsible for compliance with copyright restrictions and are expected to adhere to the terms and conditions defined by the copyright holder (see the PMC Copyright Notice).
Retrieval Methods
April 13, 2026: Update on PMC Article Dataset Distribution Changes
As announced on February 12, major changes to PMC's Article Dataset Distribution Services are underway.
On April 13, all legacy files for the PMC Article Datasets were moved to new temporary directories and prefixes on the PMC FTP and Cloud Services.
- FTP Service: all legacy files were moved to a new directory named "deprecated."
- Cloud Service: all legacy prefixes were updated to add "deprecated" to the prefix. Prefixes for legacy files now begin with s3://pmc-oa-opendata/deprecated/.
This intentional disruption alerts users to the upcoming changes to the PMC Cloud Service on AWS, while allowing for easy updates to keep existing automated workflows running. We encourage users of the legacy PMC FTP and PMC Cloud Services to begin working with the updated PMC Cloud Service structure and to adjust existing workflows.
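For workflows that store legacy Cloud Service locations, the prefix change can be handled with a small rewrite step. The following is a minimal sketch, not an official NCBI tool. It assumes that legacy objects are addressed as `s3://` URIs in the `pmc-oa-opendata` bucket and that the only change is the added `deprecated/` segment at the start of the key. The function name `to_deprecated_uri` and the example key are hypothetical.

```python
BUCKET = "pmc-oa-opendata"   # bucket name taken from the prefix quoted above
DEPRECATED = "deprecated"    # segment added to all legacy prefixes
SCHEME = "s3://"             # assumption: legacy locations are stored as s3:// URIs


def to_deprecated_uri(legacy_uri: str) -> str:
    """Rewrite a legacy URI so its key starts with 'deprecated/'.

    Assumes the rest of the key is unchanged by the move.
    """
    if not legacy_uri.startswith(SCHEME):
        raise ValueError(f"not an s3 URI: {legacy_uri!r}")

    # Split "bucket/key/parts" into the bucket and the remaining key.
    bucket, _, key = legacy_uri[len(SCHEME):].partition("/")
    if bucket != BUCKET:
        raise ValueError(f"unexpected bucket: {bucket!r}")

    # Leave URIs alone if they already point at the deprecated prefix.
    if key == DEPRECATED or key.startswith(DEPRECATED + "/"):
        return legacy_uri

    return f"{SCHEME}{bucket}/{DEPRECATED}/{key}"


if __name__ == "__main__":
    # Hypothetical key, used only to illustrate the rewrite.
    print(to_deprecated_uri("s3://pmc-oa-opendata/example/path/file.xml"))
```

Because the rewrite leaves already-updated URIs unchanged, it can be applied repeatedly to the same list of locations. It is only a stopgap: the legacy files themselves are scheduled for removal, so workflows still need to move to the updated Cloud Service structure.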
All legacy files on the FTP and Cloud Services will be removed in August 2026.
For complete details about this transition, please see the NCBI Insights blog post and our documentation on Accessing PMC Article Datasets Using Amazon Web Services.
The Author Manuscript Dataset is available via:
Terms of Use
Author manuscripts with specific licenses may be used according to the terms of their licenses. All other author manuscript files are available for text mining and may also be used in ways consistent with the principles of applicable copyright law.
How to Cite
- NIH NLM NCBI PubMed Central (PMC) Article Datasets - Full-Text Biomedical and Life Sciences Journal Articles on AWS was accessed on DATE from https://registry.opendata.aws/ncbi-pmc.