2013 Nov 19;42(Database issue):D756–D763. doi: 10.1093/nar/gkt1114

Table 1.

Annual growth of mammalian and human RefSeq transcript records

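| Release (a) | Taxa (b) | Mammalian: total transcripts | Mammalian: percent models (c) | Mammalian: percent curated (d) | Mammalian: percent ncRNA (e) | Human: total transcripts | Human: total genes (f) | Human: percent models (c) | Human: percent curated (d) | Human: percent ncRNA (e) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 126 980 | 68 | 7 | <1 | 38 556 | na | 50 | 22 | <1 |
| 6 | 10 | 79 686 | 41 | 17 | <1 | 28 176 | na | 23 | 42 | <1 |
| 12 | 19 | 158 111 | 65 | 11 | <1 | 29 490 | na | 18 | 50 | 1 |
| 18 | 28 | 263 628 | 77 | 7 | 5 | 40 342 | 28 514 | 38 | 39 | 2 |
| 24 | 35 | 338 204 | 80 | 7 | 8 | 38 709 | 29 398 | 34 | 46 | 14 |
| 30 | 37 | 340 968 | 77 | 9 | 9 | 45 511 | 27 741 | 41 | 45 | 16 |
| 36 | 42 | 346 976 | 74 | 12 | 8 | 43 589 | 29 071 | 30 | 60 | 13 |
| 42 | 42 | 425 170 | 76 | 12 | 9 | 46 111 | 29 954 | 27 | 63 | 15 |
| 48 | 43 | 470 979 | 76 | 12 | 9 | 46 912 | 27 619 | 20 | 70 | 25 |
| 54 | 45 | 515 900 | 76 | 12 | 9 | 44 951 | 26 440 | 10 | 79 | 23 |
| 60 | 59 | 1 263 067 | 90 | 5 | 6 | 47 619 | 26 266 | 11 | 79 | 24 |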

(a) Release numbers listed correspond to ∼12 month intervals beginning from the first release in June 2003. The number of human transcripts in release 60 (July 2013) reflects the November 2012 genome annotation of three assemblies (GRCh37.p10, HuRef and CHM1_1.0) plus records added through ongoing curation activities.

(b) The number of distinct NCBI Taxonomy IDs included in the RefSeq vertebrate_mammalian FTP directory that have publicly available nuclear genome records. Twelve taxa are represented by un-annotated ENCODE genomic region records only. Mammals for which only a mitochondrial genome sequence is available are excluded. Data reported in Table 1 were extracted from archived reports available at ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-catalog/ in files named ‘RefSeq-release##.catalog.gz’.

(c) The percent of total transcripts that are model RefSeqs (with XM or XR accession prefix) generated by NCBI’s eukaryotic annotation pipeline. The percent known RefSeqs (with NM or NR prefix) can be inferred from this value (100% − percent model RefSeqs = percent known RefSeqs).
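The prefix rule in this footnote can be sketched in Python. This is a minimal illustration, not the procedure used to build Table 1: the function names and the example accessions are hypothetical placeholders, and the input is assumed to be a plain list of transcript accession strings.

```python
from collections import Counter

MODEL_PREFIXES = ("XM_", "XR_")  # model RefSeqs from the annotation pipeline
KNOWN_PREFIXES = ("NM_", "NR_")  # known RefSeqs


def classify(accession: str) -> str:
    """Label a transcript accession as 'model', 'known' or 'other' by prefix."""
    if accession.startswith(MODEL_PREFIXES):
        return "model"
    if accession.startswith(KNOWN_PREFIXES):
        return "known"
    return "other"  # e.g. protein or genomic accessions, which are not counted


def percent_model_and_known(accessions):
    """Return (percent model, percent known) over transcript accessions only."""
    counts = Counter(classify(acc) for acc in accessions)
    total = counts["model"] + counts["known"]
    if total == 0:
        raise ValueError("no transcript accessions found")
    pct_model = 100.0 * counts["model"] / total
    # Percent known follows from the identity: 100% - percent model.
    return pct_model, 100.0 - pct_model


# Hypothetical placeholder accessions, for illustration only.
example = ["NM_000001.1", "NR_000002.1", "XM_000003.1", "XR_000004.1", "NP_000005.1"]
pct_model, pct_known = percent_model_and_known(example)
```

Accessions with other prefixes are skipped so that the denominator contains transcripts only, matching the "percent of total transcripts" wording.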

(d) The percent of total transcripts that are known RefSeq records that have been curated by NCBI staff and are annotated with a ‘validated’ or ‘reviewed’ status in the COMMENT block of the RefSeq record. Validated records have undergone sequence review by NCBI staff, whereas a reviewed record includes curation of descriptive information, such as names, publications and a RefSeq summary, in addition to sequence review. Known RefSeq records that have not been curated are not included; thus, the percentages of model records and curated records do not sum to 100%.
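The remainder left after model and curated records is the share of known records that have not been curated. A minimal Python sketch of that arithmetic, using values from Table 1; the function name is hypothetical, and because the table percentages are rounded the result is approximate.

```python
def percent_uncurated_known(pct_model, pct_curated):
    """Percent of transcripts that are known RefSeqs without curated status.

    Uses: percent known = 100 - percent model, and
          percent uncurated known = percent known - percent curated.
    """
    pct_known = 100 - pct_model
    if pct_curated > pct_known:
        raise ValueError("curated records cannot exceed known records")
    return pct_known - pct_curated


# Human records, release 60: 11% models, 79% curated.
human_r60 = percent_uncurated_known(11, 79)

# Mammalian records, release 1: 68% models, 7% curated.
mammal_r1 = percent_uncurated_known(68, 7)
```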

(e) The percent of total transcripts that are not protein coding. This includes model or known long non-coding RNAs (lncRNA), small RNAs (e.g. microRNA, snoRNA, etc.), ribosomal RNAs and transcribed pseudogenes. Transfer RNAs, which are annotated on genomic records using tRNAscan but not tracked with RefSeq accessions, are not included.

(f) The number of human genes per release was derived using FTP files named ‘release##.accession2geneid.gz’. This file was not provided prior to release 14, so gene counts for earlier releases are shown as ‘na’.
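Because several transcripts can map to the same gene, a gene count of this kind is a count of distinct gene IDs rather than of rows. The Python sketch below illustrates the idea under stated assumptions: the source does not describe the file's column layout, so the (tax_id, gene_id, accession) row shape, the function name and the example rows are all hypothetical. Only the human Taxonomy ID (9606) is a known value.

```python
HUMAN_TAX_ID = 9606  # NCBI Taxonomy ID for Homo sapiens


def count_genes(rows, tax_id=HUMAN_TAX_ID):
    """Count distinct gene IDs among rows belonging to one taxon.

    Each row is assumed to be a (tax_id, gene_id, transcript_accession)
    tuple; this layout is an assumption made for illustration.
    """
    return len({gene_id for row_tax, gene_id, _acc in rows if row_tax == tax_id})


# Hypothetical rows: two human transcripts share gene 1001.
rows = [
    (9606, 1001, "NM_000001.1"),
    (9606, 1001, "NM_000002.1"),
    (9606, 1002, "NR_000003.1"),
    (10090, 2001, "NM_000004.1"),  # mouse row, ignored for the human count
]
human_gene_count = count_genes(rows)
```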