2015 Nov 8;44(Database issue):D733–D745. doi: 10.1093/nar/gkv1189

Table 1. RefSeq accession prefixes.

| Prefix | Molecule type | Use context | Footnotes |
|---|---|---|---|
| NC_ | DNA | Chromosomes; linkage groups | 1 |
| AC_ | DNA | Chromosomes; linkage groups | 1 |
| NZ_ | DNA | Chromosomes; scaffolds. Used predominantly for prokaryotic genomes. | 2 |
| NT_ | DNA | Scaffolds | 3 |
| NW_ | DNA | Scaffolds | 3 |
| NG_ | DNA | Genomic regions. A genomic region record may represent a single or multiple genetic loci (e.g. rRNA targeted locus, RefSeqGene, non-transcribed pseudogene). | 1 |
| NM_ | mRNA | Protein-coding transcripts | 3, 4 |
| XM_ | mRNA | Protein-coding transcripts | 3, 5 |
| NR_ | RNA | Non-protein-coding transcripts, including lncRNAs, structural RNAs, transcribed pseudogenes, and transcripts with unlikely protein-coding potential from protein-coding genes | 3, 4 |
| XR_ | RNA | Non-protein-coding transcripts, as above | 3, 5 |
| NP_ | protein | Proteins annotated on NM_ transcript accessions or annotated on genomic molecules without an instantiated transcript (e.g. some mitochondrial genomes, viral genomes, and reference bacterial genomes) | 3, 4 |
| AP_ | protein | Proteins annotated on AC_ genomic accessions or annotated on genomic molecules without an instantiated transcript record | 3 |
| XP_ | protein | Proteins annotated on XM_ transcript accessions or annotated on genomic molecules without an instantiated transcript record | 3, 5 |
| YP_ | protein | Proteins annotated on genomic molecules without an instantiated transcript record | 3 |
| WP_ | protein | Proteins that are non-redundant across multiple strains and species. A single protein of this type may be annotated on more than one prokaryotic genome. | 6 |

1. The complete accession number format consists of the prefix, including the underscore, followed by 6 digits, followed by the sequence version number.

2. The complete accession format consists of the prefix, followed by the INSDC accession number that the RefSeq record is based on, followed by the RefSeq sequence version number.

3. The complete accession number format consists of the prefix, including the underscore, followed by 6 or 9 digits, followed by the sequence version number.

4. Records with this accession prefix have been curated by NCBI staff or a model organism database, or are in the pool of accessions that curators work with. These records are referred to as the ‘known’ RefSeq dataset.

5. Records with this accession prefix are generated through either the eukaryotic genome annotation pipeline or the small eukaryotic genome annotation pipeline. Records generated via the first method are referred to as the ‘model’ RefSeq dataset.

6. The complete accession number format consists of the prefix, including the underscore, followed by 9 digits, followed by the version number. The version number is always ‘.1’ because these records are not subject to update. See online documentation for additional information: www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/.
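
The format rules in the footnotes above can be sketched as a small parser. This is a minimal illustration, not NCBI code, and it rests on several assumptions: the function name `parse_refseq_accession`, the `RefSeqAccession` type, and the rule labels are hypothetical; the version number is assumed to be separated by a dot (consistent with the ‘.1’ in footnote 6); and the INSDC accession in NZ_ records is assumed to be uppercase letters followed by digits. Because footnote 5 reserves the term ‘model’ for records from the first of its two pipelines only, the sketch labels XM_, XR_, and XP_ records as "pipeline" rather than "model".

```python
import re
from typing import NamedTuple, Optional

# Patterns for the part between the prefix and the version number.
# The "insdc" pattern is an assumption, not something Table 1 states.
_BODY_PATTERNS = {
    "6": r"[0-9]{6}",               # footnote 1
    "6or9": r"[0-9]{6}|[0-9]{9}",   # footnote 3
    "9": r"[0-9]{9}",               # footnote 6
    "insdc": r"[A-Z]+[0-9]+",       # footnote 2 (assumed shape)
}

# Prefix -> (molecule type, format rule), transcribed from Table 1.
PREFIXES = {
    "NC_": ("DNA", "6"), "AC_": ("DNA", "6"), "NG_": ("DNA", "6"),
    "NZ_": ("DNA", "insdc"),
    "NT_": ("DNA", "6or9"), "NW_": ("DNA", "6or9"),
    "NM_": ("mRNA", "6or9"), "XM_": ("mRNA", "6or9"),
    "NR_": ("RNA", "6or9"), "XR_": ("RNA", "6or9"),
    "NP_": ("protein", "6or9"), "AP_": ("protein", "6or9"),
    "XP_": ("protein", "6or9"), "YP_": ("protein", "6or9"),
    "WP_": ("protein", "9"),
}

CURATED_POOL = frozenset({"NM_", "NR_", "NP_"})  # footnote 4, the 'known' dataset
PIPELINE = frozenset({"XM_", "XR_", "XP_"})      # footnote 5, annotation pipelines


class RefSeqAccession(NamedTuple):
    prefix: str
    body: str
    version: int
    molecule_type: str
    origin: str  # "known", "pipeline", or "unspecified"


def parse_refseq_accession(text: str) -> Optional[RefSeqAccession]:
    """Return the parsed accession, or None if it does not fit Table 1."""
    prefix, rest = text[:3], text[3:]
    if prefix not in PREFIXES:
        return None
    molecule_type, rule = PREFIXES[prefix]
    match = re.fullmatch(
        rf"(?P<body>{_BODY_PATTERNS[rule]})\.(?P<version>[0-9]+)", rest
    )
    if match is None:
        return None
    version = int(match.group("version"))
    # Footnote 6: WP_ records are never updated, so the version is always 1.
    if prefix == "WP_" and version != 1:
        return None
    if prefix in CURATED_POOL:
        origin = "known"
    elif prefix in PIPELINE:
        origin = "pipeline"
    else:
        origin = "unspecified"
    return RefSeqAccession(prefix, match.group("body"), version, molecule_type, origin)


if __name__ == "__main__":
    # Illustrative strings only; the parser checks format, not existence.
    for example in ("NC_000001.11", "NM_000546.6", "NZ_CP011286.1", "WP_000000001.2"):
        print(example, "->", parse_refseq_accession(example))
```

The parser requires a version number because the footnotes define the complete format as including one. Accepting unversioned accessions would mean making the `\.version` part of the pattern optional.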