Table 1. RefSeq accession prefixes.
Prefix | Molecule type | Use context |
---|---|---|
NC_¹ | DNA | Chromosomes; linkage groups |
AC_¹ | DNA | Chromosomes; linkage groups |
NZ_² | DNA | Chromosomes; scaffolds. Used predominantly for prokaryotic genomes. |
NT_³ | DNA | Scaffolds |
NW_³ | DNA | Scaffolds |
NG_¹ | DNA | Genomic regions. A genomic region record may represent one or more genetic loci (e.g. rRNA targeted locus, RefSeqGene, non-transcribed pseudogene). |
NM_³,⁴ | mRNA | Protein-coding transcripts |
XM_³,⁵ | mRNA | Protein-coding transcripts |
NR_³,⁴ | RNA | Non-protein-coding transcripts, including lncRNAs, structural RNAs, transcribed pseudogenes, and transcripts with unlikely protein-coding potential from protein-coding genes |
XR_³,⁵ | RNA | Non-protein-coding transcripts, as above |
NP_³,⁴ | protein | Proteins annotated on NM_ transcript accessions or annotated on genomic molecules without an instantiated transcript (e.g. some mitochondrial genomes, viral genomes, and reference bacterial genomes) |
AP_³ | protein | Proteins annotated on AC_ genomic accessions or annotated on genomic molecules without an instantiated transcript record |
XP_³,⁵ | protein | Proteins annotated on XM_ transcript accessions or annotated on genomic molecules without an instantiated transcript record |
YP_³ | protein | Proteins annotated on genomic molecules without an instantiated transcript record |
WP_⁶ | protein | Proteins that are non-redundant across multiple strains and species. A single protein of this type may be annotated on more than one prokaryotic genome |
¹ The complete accession number format consists of the prefix, including the underscore, followed by 6 digits, followed by the sequence version number.
² The complete accession format consists of the prefix, followed by the INSDC accession number that the RefSeq record is based on, followed by the RefSeq sequence version number.
³ The complete accession number format consists of the prefix, including the underscore, followed by 6 or 9 digits, followed by the sequence version number.
⁴ Records with this accession prefix have been curated by NCBI staff or a model organism database, or are in the pool of accessions that curators work with. These records are referred to as the ‘known’ RefSeq dataset.
⁵ Records with this accession prefix are generated through either the eukaryotic genome annotation pipeline or the small eukaryotic genome annotation pipeline. Records generated by the first pipeline are referred to as the ‘model’ RefSeq dataset.
⁶ The complete accession number format consists of the prefix, including the underscore, followed by 9 digits, followed by the version number. The version number is always ‘.1’ as these records are not subject to update. See online documentation for additional information: www.ncbi.nlm.nih.gov/refseq/about/nonredundantproteins/.
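
The format rules in the footnotes can be turned into a small validator. The Python sketch below is illustrative only and is not an NCBI tool: the names `PREFIXES`, `RefSeqAccession` and `parse_accession` are hypothetical, the digit counts are transcribed from footnotes 1, 3 and 6, and the period before the version number is inferred from the ‘.1’ in footnote 6. The table does not specify the shape of the INSDC accession used for NZ_ records, so the sketch assumes it is a run of uppercase letters and digits.

```python
import re
from typing import NamedTuple, Optional

# Prefix -> (molecule type, allowed digit counts), transcribed from Table 1
# and footnotes 1 (6 digits), 3 (6 or 9 digits) and 6 (9 digits).
# NZ_ (footnote 2) is handled separately in parse_accession.
PREFIXES = {
    "NC_": ("DNA", (6,)),
    "AC_": ("DNA", (6,)),
    "NG_": ("DNA", (6,)),
    "NT_": ("DNA", (6, 9)),
    "NW_": ("DNA", (6, 9)),
    "NM_": ("mRNA", (6, 9)),
    "XM_": ("mRNA", (6, 9)),
    "NR_": ("RNA", (6, 9)),
    "XR_": ("RNA", (6, 9)),
    "NP_": ("protein", (6, 9)),
    "AP_": ("protein", (6, 9)),
    "XP_": ("protein", (6, 9)),
    "YP_": ("protein", (6, 9)),
    "WP_": ("protein", (9,)),
}


class RefSeqAccession(NamedTuple):
    prefix: str
    body: str
    version: int
    molecule_type: str


# Two letters and an underscore, an alphanumeric body, a period, a version.
# Assumption: the version is separated by "." (inferred from footnote 6).
_PATTERN = re.compile(r"^([A-Z]{2}_)([A-Z0-9]+)\.([0-9]+)$")


def parse_accession(text: str) -> Optional[RefSeqAccession]:
    """Return the parsed accession, or None if it breaks the table's rules."""
    match = _PATTERN.match(text.strip())
    if match is None:
        return None
    prefix, body, version = match.group(1), match.group(2), int(match.group(3))

    if prefix == "NZ_":
        # Footnote 2: the body is the underlying INSDC accession.
        # Assumption: any uppercase alphanumeric body is accepted here.
        return RefSeqAccession(prefix, body, version, "DNA")

    if prefix not in PREFIXES:
        return None
    molecule_type, digit_counts = PREFIXES[prefix]
    if not body.isdigit() or len(body) not in digit_counts:
        return None
    if prefix == "WP_" and version != 1:
        # Footnote 6: WP_ records are never updated, so the version is 1.
        return None
    return RefSeqAccession(prefix, body, version, molecule_type)


# The accession strings below are made up to exercise the format rules.
print(parse_accession("NM_123456.2"))
print(parse_accession("WP_123456789.2"))
```

With these made-up inputs, the first call returns a parsed record with molecule type `mRNA`, and the second returns `None` because a WP_ accession must carry version 1. Keeping the rules in a single lookup table means a change to the prefix list only requires editing `PREFIXES`.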