Abstract
The National Center for Biotechnology Information (NCBI) provides biomedical data resources including PubMed®, a repository of citations and abstracts published in life science journals, and ClinicalTrials.gov, a repository of clinical research summaries. NCBI also hosts the NIH Comparative Genomics Resource (CGR) that aims to maximize the impact of eukaryotic genome datasets. NCBI provides search and retrieval operations for most of these data from 40 distinct repositories, knowledgebases, and services. The E-utilities serve as the programming interface for most of these. Resources receiving significant updates in the past year include PubMed, PMC, Bookshelf, SciENcv, CGR, ClinicalTrials.gov, ClinVar, dbSNP, GTR, Pathogen Detection, antimicrobial resistance resources, and PubChem. These resources can be accessed through the NCBI home page at https://www.ncbi.nlm.nih.gov.
Introduction
The National Center for Biotechnology Information (NCBI), a center within the National Library of Medicine (NLM) at the National Institutes of Health (NIH), was created in 1988 to develop information systems for molecular biology [1]. In this article, we provide a brief overview of the NCBI collection of databases, followed by a summary of resources that we significantly updated in the past year. In a slight change from previous years, we discuss updates to DNA and protein sequence resources separately in a companion paper.
NCBI maintains a set of 40 biomedical data resources that collectively contain 5.2 billion records (Table 1), most of which are accessible through the Entrez search and retrieval system [2]. An Entrez search bar appears near the top of the NCBI home page (https://www.ncbi.nlm.nih.gov) and of the pages of these various resources. Each Entrez resource supports simple text queries as well as more complex queries containing Boolean operators (“AND,” “OR,” and “NOT”) and fielded term searches that users can explore in the “Advanced” search linked in the search bar. Each resource also provides multiple data formats appropriate for its data type and offers various downloading functions to retrieve data. In many resources, each record functions much like a node in a knowledge graph, as it is linked to records in the same and other Entrez resources based on relationships asserted by submitters, curators, or computational analysis. Details about these formats and links are available through the home page of each resource.
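As a minimal sketch of how a fielded Boolean query can be sent programmatically, the following Python snippet builds a request URL for ESearch, the E-utilities search operation. The query string and the helper name `esearch_url` are illustrative assumptions, not part of any NCBI library; the snippet only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

# Public E-utilities base URL; ESearch returns the IDs of records matching a query.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for an Entrez database and a query string.

    `esearch_url` is a hypothetical helper written for this example.
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Fielded terms joined by Boolean operators (the query itself is illustrative).
query = "brca1[Title] AND breast neoplasms[MeSH Terms] NOT review[Publication Type]"
url = esearch_url("pubmed", query)
print(url)
```

The same query string can be pasted into the Entrez search bar, which makes it easy to compare interactive and programmatic results.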
Table 1.
NCBI Data Resources (as of 3 September 2025)
| Database | Records | Annual growth | Description |
|---|---|---|---|
| Literature | |||
| PubMed | 39 334 316 | 5% | Scientific and medical abstracts/citations |
| PubMed Central | 11 230 676 | 10% | Full-text journal articles |
| NLM Catalog | 1 653 376 | 0.2% | Index of NLM collections |
| Bookshelf | 1 121 186 | 6% | Books and reports |
| MeSH | 355 572 | 0.1% | Ontology used for PubMed indexing |
| DNA/RNA | |||
| Nucleotide | 673 713 256 | 6% | DNA and RNA sequences from GenBank and RefSeq |
| BioSample | 47 538 647 | 18% | Descriptions of biological source materials |
| SRA | 40 333 544 | 16% | High-throughput DNA/RNA sequence read archive |
| Taxonomy | 2 828 854 | 4% | Taxonomic classification and nomenclature catalog |
| BioProject | 926 825 | 14% | Biological projects providing data to NCBI |
| BioCollections | 8 497 | 0% | Museum, herbaria, and biorepository collections |
| Genes | |||
| GEO Profiles | 128 414 055 | 0% | Gene expression and molecular abundance profiles |
| Gene | 63 123 617 | 15% | Collected information about gene loci |
| GEO Datasets | 8 296 887 | 9% | Functional genomics studies |
| Proteins | |||
| Protein | 1 500 527 183 | 12% | Protein sequences from GenBank and RefSeq |
| Identical Protein Groups | 994 033 355 | 22% | Protein sequences grouped by identity |
| Structure | 240 990 | 8% | Experimentally determined biomolecular structures |
| Protein Family Models | 177 908 | 11% | Conserved domain architectures, HMMs, and BlastRules |
| Conserved Domains | 67 160 | 0% | Conserved protein domains |
| Chemicals | |||
| PubChem Substance | 337 779 072 | 6% | Deposited substance and chemical information |
| PubChem Compound | 122 265 315 | 3% | Chemical information with structures, information, and links |
| PubChem BioAssay | 1 768 720 | 6% | Bioactivity screening studies |
| PubChem Pathways | 250 942 | 4% | Molecular pathways with links to genes, proteins, and chemicals |
| Clinical Genetics | |||
| dbSNP | 1 197 210 835 | 7% | Short genetic variations |
| dbVar | 8 669 169 | 6% | Genome structural variation studies |
| ClinVar | 3 761 822 | 23% | Human variations of clinical significance |
| ClinicalTrials.gov | 551 551 | 9% | Registry of clinical studies |
| MedGen | 231 394 | 3% | Medical genetics literature and links |
| GTR | 68 280 | −3% | Genetic testing registry |
| dbGaP | 1 406 | 0% | Genotype–phenotype interaction studies |
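The record counts in Table 1 can be checked against the 5.2 billion total quoted in the text. The short Python sketch below transcribes the counts from the table and sums them by category. Table 1 lists 30 of the 40 resources, so the sum is a lower bound on the overall total; the variable names are chosen for this example only.

```python
# Record counts transcribed from Table 1 (3 September 2025), grouped by category.
TABLE_1 = {
    "Literature": [39_334_316, 11_230_676, 1_653_376, 1_121_186, 355_572],
    "DNA/RNA": [673_713_256, 47_538_647, 40_333_544, 2_828_854, 926_825, 8_497],
    "Genes": [128_414_055, 63_123_617, 8_296_887],
    "Proteins": [1_500_527_183, 994_033_355, 240_990, 177_908, 67_160],
    "Chemicals": [337_779_072, 122_265_315, 1_768_720, 250_942],
    "Clinical Genetics": [
        1_197_210_835, 8_669_169, 3_761_822, 551_551, 231_394, 68_280, 1_406,
    ],
}

# Per-category subtotals, the grand total, and the number of listed resources.
subtotals = {name: sum(counts) for name, counts in TABLE_1.items()}
total = sum(subtotals.values())
n_resources = sum(len(counts) for counts in TABLE_1.values())

for name, subtotal in subtotals.items():
    print(f"{name:18s} {subtotal:>15,d}")
print(f"{'Total':18s} {total:>15,d}  (~{total / 1e9:.1f} billion, {n_resources} resources)")
```

The listed resources alone sum to roughly 5.19 billion records, which rounds to the 5.2 billion stated above; the Protein, dbSNP, and Identical Protein Groups databases account for most of that total.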
All NCBI resources are committed to following and periodically evaluating their alignment with emerging principles [3] for reliable and competent data management such as FAIR (Findable, Accessible, Interoperable, and Reusable) [4] and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) [5]. FAIR principles accelerate discovery by focusing on the quality of data objects and data sharing by promoting standardized data formats easily read by machines and stable identifiers that allow data to be retrieved consistently over time. The more recent TRUST Principles extend this by promoting reliable and sustainable repositories that communities can trust to preserve data through periods of changing technology and/or community requirements. Users can find details of specific efforts to align with these practices on resource web sites.
Literature resources
PubMed
PubMed (https://pubmed.ncbi.nlm.nih.gov) provides free online access to citations and abstracts for biomedical literature and facilitates searching across the MEDLINE, PubMed Central, and Bookshelf literature resources. In the past year, PubMed added ∼1.7 million citations, growing the database to >39 million citations in 2025. We recently updated the filters interface on the PubMed search results page to provide a more intuitive, user-friendly experience (https://www.nlm.nih.gov/pubs/techbull/so24/so24_pubmed_filters_improvements.html). We designed these updates based on user feedback, web analytics, interviews, and hands-on usability testing with PubMed users from different backgrounds, including medical librarians, clinicians, and scientists. Additionally, the PubMed homepage now includes information about recent development updates and other PubMed-related highlights (https://www.nlm.nih.gov/pubs/techbull/jf25/jf25_pubmed_news.html).
PubMed Central
PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences literature (https://pmc.ncbi.nlm.nih.gov). In 2025, PMC added >900 000 full-text articles bringing the total size of the archive to >11 million articles. These include articles from peer-reviewed journals, author manuscripts funded by NIH and other research funders, and preprints collected under the NIH Preprint Pilot. In 2025, we updated the PMC full-text search (https://pmc.ncbi.nlm.nih.gov/search/), the next step in the ongoing modernization of the PMC product and services (Fig. 1). The update transitions PMC search to the same platform used by PubMed and provides more robust search functionality and more accurate results. We also continued updating the technology behind several public PMC APIs and utilities that are now available as cloud services (Table 2).
Figure 1.
View of the updated search interface in PMC including new controls and filtering options.
Table 2.
Updated PMC API services
| API name | API information |
|---|---|
| EFetch | https://pmc.ncbi.nlm.nih.gov/about/new-in-pmc/#2025-03-05 |
| PMC ID Converter | https://pmc.ncbi.nlm.nih.gov/tools/id-converter-api/ |
| PMC XML Style Checker | https://pmc.ncbi.nlm.nih.gov/tools/stylechecker/ |
| OAI-PMH | https://pmc.ncbi.nlm.nih.gov/tools/oai/ |
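Of the services in Table 2, OAI-PMH follows a public protocol whose requests are plain URLs carrying a `verb` and a few arguments. The sketch below shows how such a request could be assembled in Python. The base URL is a deliberate placeholder (the service's actual address should be taken from the OAI-PMH page listed in Table 2), and `oai_request_url` is a hypothetical helper written for this example.

```python
from urllib.parse import urlencode

# The six request verbs defined by version 2.0 of the OAI-PMH protocol.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL for a given repository base URL."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Placeholder base URL, NOT the PMC endpoint; see the OAI-PMH page in Table 2.
BASE_URL = "https://example.org/oai"

# "from" is a Python keyword, so it is passed through dictionary unpacking.
url = oai_request_url(
    BASE_URL, "ListRecords", metadataPrefix="oai_dc", **{"from": "2025-01-01"}
)
print(url)
```

Validating the verb before building the URL catches typos locally rather than through a protocol-level error response.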
Bookshelf
The NCBI Bookshelf provides free online access to full-text books and documents in life sciences, healthcare, and medicine. Bookshelf contains over a dozen formats collected by the NLM including monographs, reviews, reference works, government publications, standards and guidelines, technical reports, and textbooks. In the past year, Bookshelf added over 1550 books, growing the repository to over 14 600 total books from 185 content providers. Significant peer-reviewed collections added and updated in 2025 were in the subjects of chronic health, toxicology, and epidemiology. Bookshelf continues to support public access to nonjournal article documents such as systematic reviews, technical reports, and data briefs voluntarily submitted by several federal agencies, primarily in the Department of Health and Human Services, making it easier for the public to discover and cite these materials.
SciENcv
SciENcv (Science Experts Network Curriculum Vitae) is a valuable tool for researchers applying for federal funding from agencies such as the NIH, NSF, USDA, and the US Department of Energy. Available at https://www.ncbi.nlm.nih.gov/sciencv, the platform enables users to create and maintain biosketches that meet agency-specific requirements. By linking a SciENcv account to ORCID, researchers benefit from enhanced functionality, including the ability to auto-populate fields with ORCID data, incorporate citations directly from their ORCID profile, and include a persistent identifier on application documents, which several agencies have begun to require as part of the grant application process to support researcher identification. SciENcv continues to evolve in response to user needs and federal agency requirements. Recent enhancements include expanded partnerships with other US federal agencies, the launch of an XML upload feature for Current and Pending (Other) Support documents, and new capabilities for delegates assisting principal investigators with drafting application materials. SciENcv will continue to evolve to support changing federal requirements, particularly as agencies adopt more standardized forms and seek more detailed applicant information.
NIH Comparative Genomics Resource
The NIH Comparative Genomics Resource (CGR) (https://www.ncbi.nlm.nih.gov/cgr/) maximizes the impact of eukaryotic research organisms and their genomic data on biomedical research [3]. CGR includes work to expand and improve genome-related data, to develop new and improved tools for accessing and analyzing data as part of an NCBI toolkit, and to collaborate with communities and other resources to better interconnect data throughout the global biodata ecosystem and integrate into user workflows. Over its initial five-year focused development period, CGR has resulted in new public tools to improve data quality, including NCBI’s Foreign Contamination Screening (FCS) tool suite [6] that identifies and removes contaminating sequences from newly sequenced genomes, and NCBI’s publicly released Eukaryotic Genome Annotation Pipeline (EGAPx, https://github.com/ncbi/egapx) that generates high-quality annotation of genes and proteins in metazoan and plant genomes. EGAPx now produces output optimized for submission to GenBank, and since FCS is also used for screening new eukaryotic and prokaryotic genome submissions, we expect these tools together to increase the number of high-quality annotated eukaryotic genomes available for future comparative genomics studies.
CGR has also expanded the number of tools available for data access and analysis to accelerate scientific discovery. NCBI datasets (https://www.ncbi.nlm.nih.gov/datasets/) support FAIR access to genome, gene, and ortholog information with user-friendly web interfaces, command-line tools, and documented APIs [7]. The Comparative Genome Viewer (https://ncbi.nlm.nih.gov/genome/cgv/, Fig. 2) [8] and Multiple Comparative Genome Viewer (https://www.ncbi.nlm.nih.gov/mcgv/) visualize either pairwise or multiple genome alignments allowing users to examine sequence and structural differences between genomes and how those differences may affect annotated genes. Improvements in BLAST such as ClusteredNR [9] help users better explore protein sequence diversity across the tree of life. The tools developed as part of CGR support a wide range of organisms and are intended to scale with the continuing exponential increase in genomic data.
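To illustrate the kind of programmatic access that NCBI Datasets offers, the Python sketch below builds a URL requesting genome assembly reports for a taxon. The base URL, the path layout, and the `page_size` parameter are assumptions about the Datasets v2 REST API and should be checked against the documentation linked from the Datasets site; `genome_report_url` is a hypothetical helper, and the snippet does not send the request.

```python
from urllib.parse import quote, urlencode

# Assumed base URL and path layout of the NCBI Datasets v2 REST API.
DATASETS_BASE = "https://api.ncbi.nlm.nih.gov/datasets/v2"


def genome_report_url(taxon: str, page_size: int = 10) -> str:
    """Build a URL requesting genome assembly reports for a taxon.

    The taxon (a name or a taxonomy ID) is percent-encoded so that names
    containing spaces form a valid path segment.
    """
    path = f"/genome/taxon/{quote(taxon, safe='')}/dataset_report"
    return f"{DATASETS_BASE}{path}?{urlencode({'page_size': page_size})}"


url = genome_report_url("Pan troglodytes")
print(url)
```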
Figure 2.
Graphical comparison of the Homo sapiens and Pan troglodytes genomes created by the NCBI CGV.
CGR has greatly increased connectivity and collaboration with other resources, including integration of more data from UniProt [10], Ensembl [11], UCSC [12], the Alliance of Genome Resources [13], and others. CGR tools are being used to power new analysis tools like BRC Analytics (https://brc-analytics.org/). With genomes now publicly available for over 20 000 eukaryotic species, the potential of applying comparative genomic data in diverse research applications has never been greater. We continue to encourage feedback on our efforts through feedback buttons on most web pages or e-mails to cgr@nlm.nih.gov.
Clinical resources
ClinicalTrials.gov
Clinical trial registries and results databases are designed to make summaries of clinical research publicly accessible and available in a centralized repository for patients, caregivers, researchers, and the general public. ClinicalTrials.gov (https://clinicaltrials.gov) is the world’s largest publicly available clinical trial registry and results database containing over 540 000 studies with nearly 4 million website visitors each month. We now provide “Fast Forward,” a series of short videos to educate users on the modernized website. They address common user questions on how to accomplish tasks on the website, such as how to search for and download studies of interest, and are available at https://www.nlm.nih.gov/oet/ed/ct/demo_videos.html. The modern website also provides an updated API (https://clinicaltrials.gov/data-api/api) that is consistent with other publicly accessible APIs and delivers standardized data.
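As a minimal sketch of a query against the updated API, the Python snippet below builds a URL for the studies endpoint using a condition and an optional free-text term. The helper name `studies_url` and the example condition are assumptions made for illustration; the parameter names should be confirmed against the API documentation linked above, and the snippet only constructs the URL.

```python
from urllib.parse import urlencode

# Studies endpoint of the ClinicalTrials.gov API (version 2).
CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def studies_url(condition: str, term: str = "", page_size: int = 10) -> str:
    """Build a search URL filtered by condition and, optionally, other terms."""
    params = {"query.cond": condition, "pageSize": page_size, "format": "json"}
    if term:
        params["query.term"] = term
    return f"{CTGOV_STUDIES}?{urlencode(params)}"


url = studies_url("type 2 diabetes")
print(url)
```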
ClinVar
ClinVar [14] archives human genetic variants classified for diseases and drug responses and contains both genomic and somatic variants and functional assertions. Over the past year, ClinVar added 625 000 new variants processed from >1 million submitted records. We also updated ClinVar to better represent functional data that are critical to resolve variants of uncertain significance and variants with conflicting classifications. A given laboratory may generate and submit functional data as part of the evidence for a classification submitted to ClinVar. Research and diagnostic laboratories may also submit functional data without a classification; this includes data from MAVEs (Multiplexed Assays of Variant Effect) that generate high-throughput functional data for variants even before they have been observed in a patient. We now require several fields for submissions of functional data, including the functional consequence of the variant, the assay type, the molecular phenotype measured, a short description of the assay, and the result of the assay for each variant. Optional fields include the disease context for the assay, a citation for the experimental method, the cell line or tissue type used for the assay, the number of replicates or controls for the assay, and a longer description of the assay’s result. We updated ClinVar’s XML files to better represent functional data as an observation of the variant, including a new attribute on the “ObservedIn” element to explicitly tag the observation as functional data. We also updated ClinVar variant pages to reflect these new data types, making it easier for the user to know when functional data are available to assist classifying variants.
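The split between required and optional items for functional data can be pictured as a simple completeness check run before submission. The Python sketch below is illustrative only: the field keys are hypothetical names that mirror the items listed above and are not the column names of ClinVar's actual submission template, and the record contains placeholder text rather than real assay data.

```python
# Hypothetical keys that mirror the items listed above. They are NOT the
# column names of ClinVar's actual submission template.
REQUIRED_FIELDS = (
    "functional_consequence",
    "assay_type",
    "molecular_phenotype",
    "assay_description",
    "assay_result",
)
OPTIONAL_FIELDS = (
    "disease_context",
    "method_citation",
    "cell_line_or_tissue",
    "replicates_or_controls",
    "result_description",
)


def missing_required(record: dict) -> list[str]:
    """Return the required fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def unknown_fields(record: dict) -> list[str]:
    """Return keys that are neither required nor optional."""
    allowed = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    return sorted(set(record) - allowed)


# Placeholder record with one required item left blank.
draft = {
    "functional_consequence": "placeholder consequence",
    "assay_type": "placeholder assay type",
    "molecular_phenotype": "placeholder phenotype",
    "assay_description": "placeholder description",
    "assay_result": "",
    "cell_line_or_tissue": "placeholder cell line",
}
print(missing_required(draft))  # ['assay_result']
print(unknown_fields(draft))    # []
```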
MANE
The Matched Annotation from NCBI and EMBL-EBI (MANE) dataset provides a representative transcript called MANE Select for human genes to support clinical reporting and other applications [15]. MANE version 1.4, released in October 2024, incorporated the first set of noncoding genes. The next iteration of MANE (v1.5) was made available on MANE FTP (https://ftp.ncbi.nlm.nih.gov/refseq/MANE/MANE_human/) in the fall of 2025 and includes MANE Select transcripts for additional noncoding genes, revisions to MANE Select transcripts for six protein-coding genes, and additional MANE transcripts (MANE Plus Clinical) to represent significant alternate isoforms requested by clinical groups. MANE data from the new release are accessible in NCBI resources following an update to the RefSeq annotation of the human reference genome. The RefSeq team at NCBI welcomes feedback on MANE data as well as requests for MANE Plus Clinical transcripts sent to MANE-help@ncbi.nlm.nih.gov.
dbSNP and ALFA
In 2025 we released dbSNP Build 157 and ALFA Release 4 (R4) [16], significantly advancing these resources for genomic research. Build 157 contains over 1.5 billion RefSNP (rs) records, integrating data from major sources including gnomAD v4 [17] and ALFA R4. This build offers crucial annotations, with over 930 million variants having allele frequencies and ∼1.3 million linked to clinical significance from ClinVar. ALFA R4 represents a major milestone, nearly doubling the cohort size to ∼409 000 subjects. This expansion dramatically enhances clinical utility by providing frequency data for over 959 000 ClinVar variants, a 74% increase from the previous release. Aggregating data from ∼898 million total variants, ALFA R4 provides precise allele frequencies for 12 major populations, making it a critical tool for interpreting both rare and common variations. Together, these updates provide an invaluable resource for understanding human genetic diversity and its impact on health, driving improvements in personalized medicine and disease genetics. More details and information about access are available for dbSNP at https://www.ncbi.nlm.nih.gov/snp/ and for ALFA at https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/.
GTR
The NIH Genetic Testing Registry (GTR®) [18] is an international database that currently provides information on 68 273 clinical and 185 research tests from 314 laboratories. Each test record has a unique identifier (GTR ID) and is versioned. GTR helps clinicians choose appropriate tests for patient care, supports laboratories in identifying gaps to expand testing options, aids researchers in exploring the genetic basis of diseases, provides standardized test descriptions for payers’ billing and reimbursement reviews, supports professional societies in advocating for laboratory practice standardization and creating guidelines, and enables public health professionals to assess the genetic testing market, compare quality metrics, analyze testing technologies, identify trends, and evaluate the clinical impact and utilization of genetic tests. GTR made software upgrades and implemented new features based on user feedback, analytics, survey results, and market research activities. The upgrades increased the flexibility and depth of GTR’s search tools, increasing its value to the user community. Specific improvements include a simplified search box that enables complex queries in which users can select disease names, genes, and laboratory names from an autocomplete dictionary and search with a text string. The search logic now returns a list of tests that match the search query, and we optimized search filters by adding the ability to filter by genes, diseases, laboratory names, number of genes, laboratory certifications, and services. We implemented a pop-up that lists all laboratories in the search results, including laboratories with no tests registered in GTR but with a laboratory description that includes services available to the community. When the search query results in only one gene, disease, or laboratory, a summary box appears and provides relevant information and access to pages with more detailed information.
We enhanced the advanced search by enabling searches for “All fields,” searches by American Medical Association Current Procedural Terminology (AMA CPT®) codes (https://www.ama-assn.org/practice-management/cpt), and an option to change Boolean logic. Users can also select a set of tests or the full set of results and download descriptions of the retrieved tests.
Pathogen detection
The NCBI Pathogen Detection Project (https://www.ncbi.nlm.nih.gov/pathogens/) helps public health scientists investigate disease outbreaks by integrating pathogen genomic sequences obtained from cultured bacterial isolates and quickly clustering and identifying related sequences [1]. It has been used successfully to help uncover international foodborne outbreaks and has reduced the burden of disease in the United States for foodborne pathogens [19, 20] along with other success stories (https://www.ncbi.nlm.nih.gov/pathogens/success_stories). As of August 2025, over 2 474 299 pathogen isolates covering 100 bacterial taxa and one emerging fungal pathogen, Candidozyma auris (renamed from Candida auris [21]), are available for analysis. The Isolates Browser (https://www.ncbi.nlm.nih.gov/pathogens/isolates) displays these analysis results on a daily basis, and they are also available on Google Cloud (https://www.ncbi.nlm.nih.gov/pathogens/docs/gcp). These data remain central to many bacterial outbreak detection efforts in the United States and internationally. The FDA, through the GenomeTrakr project, has used NCBI Pathogen Detection to initiate 1332 actions intended to protect consumers from foodborne illness (https://www.fda.gov/food/whole-genome-sequencing-wgs-program/genometrakr-network).
Antimicrobial resistance
The Pathogen Detection team has continued to improve and release updated resources for antimicrobial resistance (AMR) (https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/) [22]. The team has curated 9353 total proteins (8119 AMR proteins, 257 stress response proteins, and 977 virulence proteins) as well as 1500 point mutations and 5503 publication references for proteins and point mutations in the July 2025 release. We analyze all bacterial isolates in the Pathogen Detection Isolates Browser with AMRFinderPlus [22], and the three categories of genes (AMR, stress response, and virulence) are available in the Isolates Browser. Currently, over 2 356 946 isolates have at least one identified AMR gene, over 1 953 938 have at least one identified stress response gene, and over 1 631 099 have at least one identified virulence gene. For the subset of isolates with assemblies in GenBank, MicroBIGG-E (the Microbial Browser for Identification of Genetic and Genomic Elements, https://www.ncbi.nlm.nih.gov/pathogens/microbigge) provides detailed information and sequences for over 42 829 000 genes and point mutations identified by AMRFinderPlus in over 1 866 000 assemblies. These data are available on Google Cloud, including the sequence of those contigs with elements identified by AMRFinderPlus. Researchers are using the AMRFinderPlus results to examine the distribution of AMR genotypes at the scale of tens of thousands of pathogen isolates [23–26]. To provide geographic context for the MicroBIGG-E data, AMR element data for those isolates in MicroBIGG-E with location data in the “geo_loc_name” field in BioSample are included in the MicroBIGG-E Map (https://www.ncbi.nlm.nih.gov/pathogens/microbigge_map/). The Antibiotic Susceptibility Test (AST) Browser (https://www.ncbi.nlm.nih.gov/pathogens/ast/) allows users to search submitter-provided AST data for over 33 200 isolates (https://www.ncbi.nlm.nih.gov/pathogens/submit-data/#ast). It also includes additional data, such as measurement values and testing methods, that are not found in the Isolates Browser display, so users can examine the relationship between measurement values and genomic features.
Chemical resources
PubChem, the largest public repository of information about chemicals [27], added over 60 new data sources in the past year and is now providing chemical information for 122 million compounds. Notably, we added regulatory information from the US Food and Drug Administration (FDA) regarding color additives (https://www.hfpappexternal.fda.gov/scripts/fdcc/index.cfm?set=ColorAdditives) and food contact substances (https://www.fda.gov/food/food-ingredients-packaging/packaging-food-contact-substances-fcs) used in manufacturing, packing, packaging, transporting, or holding food. Data integration with the FDA Generally Recognized As Safe (GRAS) Notice Inventory (https://www.fda.gov/food/food-ingredients-packaging/generally-recognized-safe-gras) now allows users to readily find the GRAS notices for substances used in food and review the basis for a substance’s GRAS designation under its intended conditions of use in food. In addition, we incorporated chemical toxicity information from the Risk Assessment Information System (RAIS) (https://rais.ornl.gov/) and the National Toxicology Program (NTP) Technical Reports (https://ntp.niehs.nih.gov/data/tr) into PubChem. We also integrated information from the California Safe Cosmetics Program (CSCP) Product Database (https://cscpsearch.cdph.ca.gov/search/publicsearch) on cosmetic ingredients known or suspected to cause harm to human health. We also loaded PubChemRDF data [machine-readable PubChem data in the Resource Description Framework (RDF) format] [28, 29] into two RDF databases (Virtuoso [30] and Qlever [31]) available in docker containers (https://pubchem.ncbi.nlm.nih.gov/docs/rdf-cloud). This allows users to readily deploy an RDF database containing PubChem data on a local machine or virtual machine on a cloud computing platform (e.g. Google Cloud Platform) and explore the data using SPARQL queries.
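Once one of the RDF databases is running locally, it can be explored with SPARQL queries sent over HTTP. The Python sketch below prepares such a request using only the standard library. The endpoint address is an assumption (Virtuoso commonly serves SPARQL on port 8890, but the actual host and port depend on how the container is deployed), `sparql_request` is a hypothetical helper, and the query simply lists a few triples for one compound without relying on any particular PubChemRDF predicate.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed local endpoint; adjust the host, port, and path to the deployment.
ENDPOINT = "http://localhost:8890/sparql"

# Lists a few triples whose subject is the PubChem compound with CID 2244.
QUERY = """
PREFIX compound: <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/>
SELECT ?predicate ?object
WHERE { compound:CID2244 ?predicate ?object }
LIMIT 10
"""


def sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query.strip()})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# To run against a live endpoint:
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rows = json.load(response)["results"]["bindings"]
```

Starting from an open-ended query such as this one is a practical way to discover which predicates a compound record carries before writing more selective queries.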
For further information
The resources described here include documentation, other explanatory materials, and references to collaborators and data sources on their respective web sites. A variety of video tutorials are available on the NLM YouTube channel that can be accessed through links in the standard NCBI page footer. User-support staff are available to answer questions at info@ncbi.nlm.nih.gov, and users can view support articles at https://support.nlm.nih.gov. Updates on NCBI resources and database enhancements are described on the NCBI Insights blog (https://ncbiinsights.ncbi.nlm.nih.gov/) and on resource web pages.
Acknowledgements
The authors would like to thank all the NCBI staff who through their dedicated efforts continue to enable NCBI to provide our full collection of services to the community.
Author contributions: Eric Whitney Sayers (Conceptualization [lead], Writing—original draft [lead], Writing—review & editing [lead]), Evan Bolton (Data curation [lead], Software [lead], Writing—original draft [equal]), Anna M Fine (Data curation [lead], Software [lead], Writing—original draft [equal]), Chris Kelly (Data curation [equal], Writing—original draft [supporting]), Sunghwan Kim (Data curation [equal], Software [equal], Writing—original draft [supporting]), Melissa J. Landrum (Data curation [lead], Software [equal], Writing—original draft [equal]), Stacy Lathrop (Data curation [equal], Writing—original draft [equal]), Adriana Malheiro (Data curation [lead], Software [lead], Writing—original draft [equal]), Terence D. Murphy (Conceptualization [equal], Data curation [lead], Software [equal], Writing—original draft [equal]), Lon Phan (Data curation [lead], Software [lead], Writing—original draft [equal]), Shashikant Pujar (Data curation [equal], Software [equal], Writing—original draft [supporting]), Bart Trawick (Data curation [lead], Software [lead], Writing—original draft [equal]), Valerie Anne Schneider (Conceptualization [equal], Funding acquisition [equal], Supervision [equal], Writing—review & editing [equal]), and Kim D. Pruitt (Conceptualization [equal], Funding acquisition [lead], Supervision [lead], Writing—review & editing [lead])
Contributor Information
Eric W Sayers, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Evan E Bolton, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Anna M Fine, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Christopher Kelly, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Sunghwan Kim, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Melissa Landrum, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Stacy Lathrop, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Adriana Malheiro, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Terence D Murphy, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Lon Phan, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Shashikant Pujar, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Barton W Trawick, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Valerie A Schneider, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Kim D Pruitt, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, United States.
Conflict of interest
None declared.
Funding
This work was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health (NIH). The contributions of the NIH authors are considered Works of the United States Government. The findings and conclusions presented in this paper are those of the authors and do not necessarily reflect the views of the NIH or the US Department of Health and Human Services. Funding to pay the Open Access publication charges for this article was provided by the National Library of Medicine, National Institutes of Health.
References
- 1. Sayers EW, Beck J, Bolton EE et al. Database resources of the National Center for Biotechnology Information in 2025. Nucleic Acids Res. 2025;53:D20–9. 10.1093/nar/gkae979.
- 2. Schuler GD, Epstein JA, Ohkawa H et al. Entrez: molecular biology database and retrieval system. Methods Enzymol. 1996;266:141–62.
- 3. Lin D, McAuliffe M, Pruitt KD et al. Biomedical Data Repository Concepts and Management Principles. Sci Data. 2024;11:622. 10.1038/s41597-024-03449-z.
- 4. Wilkinson MD, Dumontier M, Aalbersberg IJ et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3:160018. 10.1038/sdata.2016.18.
- 5. Lin D, Crabtree J, Dillo I et al. The TRUST Principles for digital repositories. Sci Data. 2020;7:144. 10.1038/s41597-020-0486-7.
- 6. Astashyn A, Tvedte ES, Sweeney D et al. Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol. 2024;25:60. 10.1186/s13059-024-03198-7.
- 7. O’Leary NA, Cox E, Holmes JB et al. Exploring and retrieving sequence and metadata for species across the tree of life with NCBI Datasets. Sci Data. 2024;11:732. 10.1038/s41597-024-03571-y.
- 8. Rangwala SH, Rudnev DV, Ananiev VV et al. The NCBI Comparative Genome Viewer (CGV) is an interactive visualization tool for the analysis of whole-genome eukaryotic alignments. PLoS Biol. 2024;22:e3002405. 10.1371/journal.pbio.3002405.
- 9. Sayers EW, Bolton EE, Brister JR et al. Database resources of the National Center for Biotechnology Information in 2023. Nucleic Acids Res. 2023;51:D29–38. 10.1093/nar/gkac1032.
- 10. UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 2025;53:D609–17.
- 11. Dyer SC, Austine-Orimoloye O, Azov AG et al. Ensembl 2025. Nucleic Acids Res. 2025;53:D948–57. 10.1093/nar/gkae1071.
- 12. Perez G, Barber GP, Benet-Pages A et al. The UCSC Genome Browser database: 2025 update. Nucleic Acids Res. 2025;53:D1243–9. 10.1093/nar/gkae974.
- 13. Bult CJ, Sternberg PW. The alliance of genome resources: transforming comparative genomics. Mamm Genome. 2023;34:531–44. 10.1007/s00335-023-10015-2.
- 14. Landrum MJ, Chitipiralla S, Kaur K et al. ClinVar: updates to support classifications of both germline and somatic variants. Nucleic Acids Res. 2025;53:D1313–21. 10.1093/nar/gkae1090.
- 15. Morales J, Pujar S, Loveland JE et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature. 2022;604:310–5. 10.1038/s41586-022-04558-8.
- 16. Phan L, Zhang H, Wang Q et al. The evolution of dbSNP: 25 years of impact in genomic research. Nucleic Acids Res. 2025;53:D925–31. 10.1093/nar/gkae977.
- 17. Chen S, Francioli LC, Goodrich JK et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature. 2024;625:92–100. 10.1038/s41586-023-06045-0.
- 18. Rubinstein WS, Maglott DR, Lee JM et al. The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency. Nucleic Acids Res. 2013;41:D925–35. 10.1093/nar/gks1173.
- 19. Pereira E, Conrad A, Tesfai A et al. Multinational Outbreak of Listeria monocytogenes Infections Linked to Enoki Mushrooms Imported from the Republic of Korea 2016-2020. J Food Protect. 2023;86:100101. 10.1016/j.jfp.2023.100101.
- 20. Brown B, Allard M, Bazaco MC et al. An economic evaluation of the Whole Genome Sequencing source tracking program in the U.S. PLoS One. 2021;16:e0258262. 10.1371/journal.pone.0258262.
- 21. Liu F, Hu ZD, Zhao XM et al. Phylogenomic analysis of the Candida auris-Candida haemuli clade and related taxa in the Metschnikowiaceae, and proposal of thirteen new genera, fifty-five new combinations and nine new species. Persoonia. 2024;52:22–43. 10.3767/persoonia.2024.52.02.
- 22. Feldgarden M, Brover V, Fedorov B et al. Curation of the AMRFinderPlus databases: applications, functionality and impact. Microb Genom. 2022;8:000832.
- 23. Shawrob KSM, Dhariwal A, Salvadori G et al. Large-scale global molecular epidemiology of antibiotic resistance determinants in Streptococcus pneumoniae. Microb Genom. 2025;11:001444. 10.1099/mgen.0.001444.
- 24. Li X, Zhuang Y, Yu Y et al. Interplay of multiple carbapenemases and tigecycline resistance in Acinetobacter species: a serious combined threat. Clinical Microbiology and Infection. 2025;31:128–33. 10.1016/j.cmi.2024.08.027.
- 25. Mack AR, Hujer AM, Mojica MF et al. beta-Lactamase diversity in Pseudomonas aeruginosa. Antimicrob Agents Chemother. 2025;69:e0078524. 10.1128/aac.00785-24.
- 26. Mack AR, Hujer AM, Mojica MF et al. beta-Lactamase diversity in Acinetobacter baumannii. Antimicrob Agents Chemother. 2025;69:e0078424. 10.1128/aac.00784-24.
- 27. Kim S, Chen J, Cheng T et al. PubChem 2025 update. Nucleic Acids Res. 2025;53:D1516–25. 10.1093/nar/gkae1059.
- 28. Fu G, Batchelor C, Dumontier M et al. PubChemRDF: towards the semantic annotation of PubChem compound and substance databases. J Cheminform. 2015;7:34. 10.1186/s13321-015-0084-4.
- 29. Li Q, Kim S, Zaslavsky L et al. A resource description framework (RDF) model of named entity co-occurrences in biomedical literature and its integration with PubChemRDF. J Cheminform. 2025;17:79. 10.1186/s13321-025-01017-0.
- 30. Erling O, Mikhailov I. In: de Virgilio R, Giunchiglia F, Tanca L (eds.), Semantic Web Information Management: A Model-Based Perspective. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010, pp. 501–19. 10.1007/978-3-642-04329-1.
- 31. Bast H, Buchhold B. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. Association for Computing Machinery, Singapore, Singapore, 2017, pp. 647–56. 10.1145/3132847.3132921.



