OAI-PMH Service
The PubMed Central OAI-PMH service (PMC-OAI) provides access to metadata of all items in the PubMed Central (PMC) archive, as well as to the full text of a subset of these items.
PMC-OAI is an implementation of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), a standard for retrieving metadata from digital document repositories. Visit the Open Archives Initiative site for more information about the protocol and other activities of the OAI group.
PMC-OAI supports OAI-PMH version 2.0. It does not support earlier versions of the protocol.
PMC also provides a simpler OA web service, which might be more appropriate for programmatic access, depending on your requirements.
If you have questions or comments about this, or any of the other services provided by PMC, please write to the PMC help desk. To stay informed about new or updated tools or services provided by PMC, subscribe to the PMC-Utils-Announce mailing list.
Copyright
Most of the items in this archive are copyright protected, with copyright held by the author(s) or the depositing journal. In general, the OAI service cannot be used to retrieve the full text of articles in PMC. The only exceptions to this policy are articles that are in the public domain and those that are made available under an Open Access provision (as defined in the Open Access Subset). See the PMC Copyright Notice for more information.
Using the PMC-OAI Service
The base URL for the service is https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi.
See the examples, below, for details about how to use this service to retrieve PMC metadata.
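Every request is the base URL plus standard OAI-PMH 2.0 query arguments (`verb`, `metadataPrefix`, `set`, `from`, `until`, `identifier`, `resumptionToken`). The following is a minimal sketch, assuming Python 3 and only the standard library, of building such URLs. `oai_url` is a hypothetical helper, not part of any PMC tool.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"


def oai_url(verb: str, **params: str) -> str:
    """Build a PMC-OAI request URL (hypothetical helper).

    `from` is a Python keyword, so a trailing underscore is stripped from
    argument names: from_="2021-01-01" becomes from=2021-01-01.
    """
    query = {"verb": verb}
    for key, value in params.items():
        query[key.rstrip("_")] = value
    # urlencode percent-encodes reserved characters such as ':' in identifiers.
    return f"{BASE_URL}?{urlencode(query)}"


if __name__ == "__main__":
    # First batch of full-text records from the set of harvestable items.
    print(oai_url("ListRecords", set_="pmc-open", metadataPrefix="pmc"))
```

Keeping argument construction in one place makes it harder to misspell an OAI-PMH argument name or to forget to encode the colons in an OAI identifier.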
High-Volume Retrievals
If you are using a script that makes more than 100 requests of any kind, please run it outside of the PMC system's peak hours. Also, please make sure that your system does not make concurrent requests, even at off-peak times. Peak hours are Monday to Friday, 5:00 AM to 9:00 PM, U.S. Eastern time.
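A batch script can check the peak-hours rule before it starts. This is a minimal sketch: `is_peak_hours` is a hypothetical helper, it assumes the caller passes a time already expressed in U.S. Eastern time, and treating 5:00 AM as inside the window and 9:00 PM as outside it is an assumption about the boundaries.

```python
from datetime import datetime, time

# Peak window as stated above: Monday to Friday, 5:00 AM to 9:00 PM U.S. Eastern.
PEAK_START = time(5, 0)
PEAK_END = time(21, 0)


def is_peak_hours(eastern: datetime) -> bool:
    """Return True if `eastern` falls inside PMC's peak window.

    Assumes `eastern` is already in U.S. Eastern time, for example
    datetime.now(ZoneInfo("America/New_York")).
    """
    is_weekday = eastern.weekday() < 5  # Monday == 0 ... Friday == 4
    return is_weekday and PEAK_START <= eastern.time() < PEAK_END
```

A harvester that passes this check should still issue its requests one at a time from a single loop rather than from a thread pool, since concurrent requests are discouraged even off-peak.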
The PMC-OAI Service supports HTTP compression. For the most efficient data transfer, set the Accept-Encoding HTTP header to 'gzip, deflate'. See also HTTP compression for further information about this capability.
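Python's `urllib.request` sends the header but does not decompress the response for you. The following is a minimal sketch, assuming only the standard library, of sending `Accept-Encoding: gzip, deflate` and undoing whichever encoding the server reports in `Content-Encoding`. The names `decode_body` and `fetch` are hypothetical.

```python
import gzip
import zlib
from typing import Optional
from urllib.request import Request, urlopen


def decode_body(data: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo HTTP compression based on the Content-Encoding header value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(data)
    if encoding == "deflate":
        try:
            # "deflate" is normally zlib-wrapped data...
            return zlib.decompress(data)
        except zlib.error:
            # ...but some servers send raw DEFLATE without the zlib header.
            return zlib.decompress(data, -zlib.MAX_WBITS)
    return data  # identity encoding: body is not compressed


def fetch(url: str) -> bytes:
    """Fetch a URL, advertising gzip/deflate support as recommended above."""
    request = Request(url, headers={"Accept-Encoding": "gzip, deflate"})
    with urlopen(request) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

The raw-DEFLATE fallback is a defensive choice; whether PMC-OAI ever sends raw DEFLATE is not stated here.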
Access to Full Text
Some PMC journals allow harvesting of the full text of all items, others allow it for only some items, and many do not allow it at all. See the PMC Open Access Subset for specifics.
In addition, the parameter set=pmc-open identifies the complete collection of items in PMC for which the full text may be harvested.
Supported Data Formats
Records may be retrieved from the PMC archive in one of the following formats:
- The NISO JATS Journal Archiving and Interchange XML format, for metadata or full-text article records. Schemas and complete documentation are available from https://jats.nlm.nih.gov/archiving/.
  - Metadata: metadataPrefix=pmc_fm
  - Full Text: metadataPrefix=pmc
- The Dublin Core format, http://dublincore.org
  - Metadata: metadataPrefix=oai_dc
OAI PMC identifiers
The identifiers used by this system are of the form:
oai:pubmedcentral.nih.gov:pmcid
where pmcid is the numerical portion of the article's PMCID. For example, the OAI identifier for the article PMC7414748 is oai:pubmedcentral.nih.gov:7414748.
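The mapping between a PMCID and its OAI identifier is a fixed prefix plus the numeric part. Here is a minimal sketch; both function names are hypothetical, and accepting a bare number such as "7414748" is a convenience assumption.

```python
OAI_PREFIX = "oai:pubmedcentral.nih.gov:"


def pmcid_to_oai_id(pmcid: str) -> str:
    """Convert a PMCID such as "PMC7414748" (or bare "7414748") to an OAI identifier."""
    digits = pmcid.strip().upper()
    if digits.startswith("PMC"):
        digits = digits[3:]
    # Only ASCII digits form a valid numeric PMCID part.
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a PMCID: {pmcid!r}")
    return OAI_PREFIX + digits


def oai_id_to_pmcid(oai_id: str) -> str:
    """Convert "oai:pubmedcentral.nih.gov:7414748" back to "PMC7414748"."""
    if not oai_id.startswith(OAI_PREFIX):
        raise ValueError(f"not a PMC OAI identifier: {oai_id!r}")
    digits = oai_id[len(OAI_PREFIX):]
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a PMC OAI identifier: {oai_id!r}")
    return "PMC" + digits
```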
Automatic Segmentation of Large Result Sets
If a ListIdentifiers request results in more than 500 hits, PMC-OAI will return the first 500 with a resumptionToken that can be used to get the remaining items.
If a ListRecords request results in more than 25 hits for PMC full text (metadataPrefix=pmc), or more than 50 hits for PMC metadata (metadataPrefix=pmc_fm) or Dublin Core format metadata (metadataPrefix=oai_dc), PMC-OAI will return the first 25 or 50 records, respectively, with a resumptionToken that can be used to get the remaining records.
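Harvesting a large result set means repeating the request with each returned resumptionToken until none is left. This is a minimal sketch, assuming the standard OAI-PMH 2.0 namespace and flow control (a follow-up request carries only `verb` and `resumptionToken`; an absent or empty token marks the last page). `harvest_pages` is hypothetical, and `fetch` stands for any function that returns a response body for a URL.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode

BASE_URL = "https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi"
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_pages(verb: str, first_params: dict,
                  fetch: Callable[[str], bytes]) -> Iterator[ET.Element]:
    """Yield the parsed root element of every page of a List* response."""
    params = {"verb": verb, **first_params}
    while True:
        # Requests are issued strictly one after another, never concurrently.
        root = ET.fromstring(fetch(f"{BASE_URL}?{urlencode(params)}"))
        yield root
        token = root.find(f".//{OAI_NS}resumptionToken")
        # An absent or empty resumptionToken marks the last page.
        if token is None or not (token.text or "").strip():
            return
        # A resumption request carries only the verb and the token.
        params = {"verb": verb, "resumptionToken": token.text.strip()}
```

Because an OAI-PMH error response (for example, noRecordsMatch) contains no token, the loop also stops there; callers may want to check each page for an `error` element.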
Examples
Identify
Identify the PMC-OAI interface.
https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi?verb=Identify
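The Identify response is worth reading once before harvesting, because its `granularity` element tells you whether `from` and `until` take dates (YYYY-MM-DD) or full timestamps, as defined by OAI-PMH 2.0. Here is a minimal sketch of pulling out a few standard Identify fields; `identify_summary` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
FIELDS = ("repositoryName", "baseURL", "protocolVersion",
          "earliestDatestamp", "granularity")


def identify_summary(xml_body: bytes) -> Dict[str, str]:
    """Return selected Identify fields; missing fields map to ""."""
    root = ET.fromstring(xml_body)
    identify = root.find("oai:Identify", NS)
    if identify is None:
        return {}  # not an Identify response (e.g. an OAI-PMH error)
    return {name: identify.findtext(f"oai:{name}", default="", namespaces=NS).strip()
            for name in FIELDS}
```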
ListMetadataFormats
List metadata formats available in PMC.
https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi?verb=ListMetadataFormats
List metadata formats available for an article with the identifier oai:pubmedcentral.nih.gov:7414748
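In OAI-PMH 2.0, the per-article form of ListMetadataFormats adds an `identifier` argument to the same verb. Either way, the usable prefixes appear as `metadataPrefix` elements. Here is a minimal sketch of extracting them; `metadata_prefixes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def metadata_prefixes(xml_body: bytes) -> List[str]:
    """Return the metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(xml_body)
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [el.text.strip() for el in root.findall(path, NS) if el.text]
```

An error response, such as one for an unknown identifier, yields an empty list rather than raising an exception.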
ListIdentifiers – return a fixed number of identifiers per request
Typically, the response to this request will contain 500 identifiers and will include a resumption token at OAI-PMH/ListIdentifiers/resumptionToken.
List (the first set of) identifiers for all articles released in PMC after 1/1/2021 in PMC front matter format
If the above request returned resumptionToken="oai%3Apubmedcentral.nih.gov%3A64730!2021-01-01!!pmc_fm!", then the following request returns the next set of identifiers:
List identifiers for articles in set=bmj, released between March and July 2020, that are available in full-text XML
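Each ListIdentifiers page carries its identifiers in `header` elements and the token for the next page at OAI-PMH/ListIdentifiers/resumptionToken. Here is a minimal sketch of reading both from one page. `parse_list_identifiers` is a hypothetical helper, and returning `None` for an empty token (the last page) is a design choice.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def parse_list_identifiers(xml_body: bytes) -> Tuple[List[str], Optional[str]]:
    """Return (identifiers, resumption token or None) for one ListIdentifiers page."""
    root = ET.fromstring(xml_body)
    ids = [h.findtext("oai:identifier", default="", namespaces=NS).strip()
           for h in root.findall("oai:ListIdentifiers/oai:header", NS)]
    token = root.findtext("oai:ListIdentifiers/oai:resumptionToken",
                          default="", namespaces=NS).strip()
    return ids, (token or None)
```

The next request then sends `verb=ListIdentifiers` with the returned token as the `resumptionToken` argument, passed back unchanged.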
ListRecords – return a fixed number of records per request
Typically, the response will contain ten full-text XML records.
List (the first set of) full-text XML records for set=bmj released between 3/22/2020 and 6/12/2020
https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi?verb=ListRecords&from=2020-03-22&until=2020-06-12&set=bmj&metadataPrefix=pmc
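Every record in a ListRecords response starts with an OAI `header` holding its identifier, datestamp, and set memberships, followed by the metadata in the requested format. Here is a minimal sketch that reads only the headers, which is enough to track what a selective (from/until/set) harvest has already fetched. `RecordHeader` and `record_headers` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


class RecordHeader(NamedTuple):
    identifier: str
    datestamp: str
    sets: List[str]


def record_headers(xml_body: bytes) -> List[RecordHeader]:
    """Return the OAI header of each record on one ListRecords page."""
    root = ET.fromstring(xml_body)
    headers = []
    for header in root.findall("oai:ListRecords/oai:record/oai:header", NS):
        headers.append(RecordHeader(
            identifier=header.findtext("oai:identifier", default="", namespaces=NS),
            datestamp=header.findtext("oai:datestamp", default="", namespaces=NS),
            # A record may belong to several sets.
            sets=[s.text for s in header.findall("oai:setSpec", NS) if s.text],
        ))
    return headers
```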
ListSets – return a fixed number of sets per request
This response will typically contain the first 500 sets. Use the resumptionToken to get the next batch of sets.
https://www.ncbi.nlm.nih.gov/pmc/oai/oai.cgi?verb=ListSets
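The `setSpec` values in a ListSets response are what you pass as the `set` argument (for example, bmj or pmc-open). Here is a minimal sketch of mapping each setSpec to its human-readable setName; `list_sets` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets(xml_body: bytes) -> Dict[str, str]:
    """Map setSpec -> setName for one page of a ListSets response."""
    root = ET.fromstring(xml_body)
    return {
        s.findtext("oai:setSpec", default="", namespaces=NS):
            s.findtext("oai:setName", default="", namespaces=NS)
        for s in root.findall("oai:ListSets/oai:set", NS)
    }
```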
GetRecord – retrieve a record by identifier.
Get a record in Dublin Core format by identifier.
Get a record in PMC front matter format by identifier.
Get a record in PMC full-text XML format by identifier.
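A Dublin Core record from GetRecord nests `dc:*` elements (title, creator, date, and so on) inside `oai_dc:dc`. Here is a minimal sketch, assuming the response follows the standard oai_dc schema namespaces used by OAI-PMH 2.0, that groups those elements by name. `dublin_core_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
}


def dublin_core_fields(xml_body: bytes) -> Dict[str, List[str]]:
    """Group the Dublin Core elements of a GetRecord (oai_dc) response by name."""
    root = ET.fromstring(xml_body)
    dc = root.find("oai:GetRecord/oai:record/oai:metadata/oai_dc:dc", NS)
    fields: Dict[str, List[str]] = {}
    if dc is None:
        return fields  # e.g. an OAI-PMH error response
    for element in dc:
        name = element.tag.split("}", 1)[-1]  # strip the namespace URI
        if element.text and element.text.strip():
            fields.setdefault(name, []).append(element.text.strip())
    return fields
```

Repeatable elements such as `creator` are kept as lists so that no author is dropped.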