Table 1.
Annual growth of mammalian and human RefSeq transcript records
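Release<sup>a</sup> | Taxa<sup>b</sup> | Mammalian records: total transcripts | Mammalian records: percent models<sup>c</sup> | Mammalian records: percent curated<sup>d</sup> | Mammalian records: percent ncRNA<sup>e</sup> | Human records: total transcripts | Human records: total genes<sup>f</sup> | Human records: percent models<sup>c</sup> | Human records: percent curated<sup>d</sup> | Human records: percent ncRNA<sup>e</sup> |
---|---|---|---|---|---|---|---|---|---|---|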
1 | 5 | 126 980 | 68 | 7 | <1 | 38 556 | na | 50 | 22 | <1 |
6 | 10 | 79 686 | 41 | 17 | <1 | 28 176 | na | 23 | 42 | <1 |
12 | 19 | 158 111 | 65 | 11 | <1 | 29 490 | na | 18 | 50 | 1 |
18 | 28 | 263 628 | 77 | 7 | 5 | 40 342 | 28 514 | 38 | 39 | 2 |
24 | 35 | 338 204 | 80 | 7 | 8 | 38 709 | 29 398 | 34 | 46 | 14 |
30 | 37 | 340 968 | 77 | 9 | 9 | 45 511 | 27 741 | 41 | 45 | 16 |
36 | 42 | 346 976 | 74 | 12 | 8 | 43 589 | 29 071 | 30 | 60 | 13 |
42 | 42 | 425 170 | 76 | 12 | 9 | 46 111 | 29 954 | 27 | 63 | 15 |
48 | 43 | 470 979 | 76 | 12 | 9 | 46 912 | 27 619 | 20 | 70 | 25 |
54 | 45 | 515 900 | 76 | 12 | 9 | 44 951 | 26 440 | 10 | 79 | 23 |
60 | 59 | 1 263 067 | 90 | 5 | 6 | 47 619 | 26 266 | 11 | 79 | 24 |
<sup>a</sup>Release numbers listed correspond to approximately 12-month intervals beginning from the first release in June 2003. The number of human transcripts in release 60 (July 2013) reflects the November 2012 genome annotation of three assemblies (GRCh37.p10, HuRef and CHM1_1.0) plus records added through ongoing curation activities.
<sup>b</sup>The number of distinct NCBI Taxonomy IDs included in the RefSeq vertebrate_mammalian FTP directory that have publicly available nuclear genome records. Twelve taxa are represented by un-annotated ENCODE genomic region records only. Mammals for which only a mitochondrial genome sequence is available are excluded. Data reported in Table 1 were extracted from archived reports available at ftp://ftp.ncbi.nlm.nih.gov/refseq/release/release-catalog/ using files named ‘RefSeq-release##.catalog.gz’.
<sup>c</sup>The percent of total transcripts that are model RefSeqs (with XM or XR accession prefix) generated by NCBI’s eukaryotic annotation pipeline. The percent known RefSeqs (with NM or NR prefix) can be inferred from this value (100% − percent model RefSeqs = percent known RefSeqs).
<sup>d</sup>The percent of total transcripts that are known RefSeq records that have been curated by NCBI staff and are annotated with a ‘validated’ or ‘reviewed’ status in the COMMENT block of the RefSeq record. Validated records have undergone sequence review by NCBI staff, whereas a reviewed record includes curation of descriptive information, such as names, publications and a RefSeq summary, in addition to sequence review. Known RefSeq records that have not been curated are not included; thus, the percentages of model records and curated records do not sum to 100%.
<sup>e</sup>The percent of total transcripts that are not protein coding. This includes model or known long non-coding RNAs (lncRNA), small RNAs (e.g. microRNA, snoRNA), ribosomal RNAs and transcribed pseudogenes. Transfer RNAs, which are annotated on genomic records using tRNAscan but not tracked with RefSeq accessions, are not included.
<sup>f</sup>The number of human genes per release was derived using FTP files named ‘release##.accession2geneid.gz’. This file was not provided prior to release 14, so gene counts for earlier releases are marked ‘na’.
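
The relationships described in footnotes c and d can be illustrated with a minimal Python sketch. It assumes transcripts are classified by accession prefix alone, as footnote c describes. The function names and the sample accessions are hypothetical and used only for illustration; this is not the procedure used to build Table 1, which was derived from the archived release catalog files.

```python
from collections import Counter

# Accession prefixes as described in footnote c.
MODEL_PREFIXES = ("XM_", "XR_")  # model RefSeqs from the annotation pipeline
KNOWN_PREFIXES = ("NM_", "NR_")  # known RefSeqs


def classify(accession):
    """Label a transcript accession as 'model', 'known' or 'other'."""
    if accession.startswith(MODEL_PREFIXES):
        return "model"
    if accession.startswith(KNOWN_PREFIXES):
        return "known"
    return "other"  # not a transcript accession under this scheme


def percent_models(accessions):
    """Percent of transcript accessions that are model RefSeqs."""
    counts = Counter(classify(a) for a in accessions)
    total = counts["model"] + counts["known"]
    if total == 0:
        raise ValueError("no transcript accessions found")
    return 100.0 * counts["model"] / total


def percent_known(pct_models):
    """Footnote c: 100% minus percent model RefSeqs."""
    return 100 - pct_models


def percent_uncurated_known(pct_models, pct_curated):
    """Footnote d: the remainder after model and curated records."""
    return 100 - pct_models - pct_curated


# Human records, release 60 (Table 1): 11% models, 79% curated.
print(percent_known(11))                # 89
print(percent_uncurated_known(11, 79))  # 10
```

For the release 60 human row, this gives 89% known RefSeqs, of which 10 percentage points are known records that have not been curated.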