2021 Jan 25;6:4. doi: 10.1038/s41525-020-00167-4

Fig. 1. nORFs are important to investigate.


a Schematic representation of nORFs and their genomic locations. nORFs (yellow boxes) include short ORFs (sORFs), which are ORFs shorter than 100 aa; alternative ORFs (altORFs), which lie in alternative reading frames of canonical ORFs within protein-coding genes; and undefined ORFs, which have not yet been identified by other studies. nORFs can be found both within protein-coding regions (including the 5’UTR, 3’UTR, CDS, or regions overlapping the CDS and UTRs) and in noncoding regions. They can also lie antisense to genes. ORFs identified within pseudogenes and de novo genes are also classified as nORFs. Reg., regulatory regions. b nORFs (from sORFs.org and OpenProt) have been identified throughout the genome, on all chromosomes. Gray peaks represent the location and density of nORFs on each chromosome, plotted with the R package circlize. nORFs frequently expressed in TCGA or GTEx are shown as black peaks, and those identified as differentially expressed are shown in red. c Mean Ribo-Seq expression and its standard deviation (SD) are plotted for human lymphoblastoid cells from RPFdbV2. Canonical ORFs (cORFs) are shown as blue dots and novel ORFs as orange dots. The black line marks the median expression SD of canonical ORFs. Not all nORFs have noisy expression values; many have SD-to-mean profiles similar to those of cORFs. d Proportion of coding (blue) versus noncoding (red) disease-associated variants in the GWAS, HGMD, and COSMIC datasets. Around 90% of disease-associated variants from GWAS, 80% from COSMIC, and 40% from HGMD map to noncoding regions. To better understand these uncharacterized variants, we evaluate those that fall within nORFs. e The left panel shows the distributions and mean values of CADD scores for variants mapped to known proteins, to sORFs in exonic regions, and to sORFs in non-exonic regions.
The right panel is an estimation plot of the CADD scores, showing the mean difference (with 95% confidence interval) relative to known proteins for all variants mapped to exonic sORFs (range 0.80–0.83) and to non-exonic sORFs (range 2.35–2.38).
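The mean difference in panel e is the kind of quantity an estimation plot reports together with a bootstrap confidence interval. The sketch below computes one such comparison with a simple percentile bootstrap in plain Python; the function name, the resampling scheme, and the defaults are illustrative assumptions, not the authors' method (estimation-plot tools often use a bias-corrected and accelerated bootstrap instead).

```python
import random
import statistics


def mean_difference_ci(reference, test, n_boot=5000, alpha=0.05, seed=0):
    """Return (mean(test) - mean(reference), (ci_low, ci_high)).

    The interval is a percentile bootstrap: each group is resampled with
    replacement, independently, n_boot times, and the CI bounds are the
    alpha/2 and 1 - alpha/2 quantiles of the resampled mean differences.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    observed = statistics.fmean(test) - statistics.fmean(reference)

    diffs = []
    for _ in range(n_boot):
        ref_sample = rng.choices(reference, k=len(reference))
        test_sample = rng.choices(test, k=len(test))
        diffs.append(statistics.fmean(test_sample) - statistics.fmean(ref_sample))
    diffs.sort()

    # Index into the sorted bootstrap distribution to get the percentile bounds.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return observed, (diffs[lo_idx], diffs[hi_idx])
```

For example, `mean_difference_ci(known_scores, exonic_sorf_scores)` would return the point estimate and interval for one comparison, where both arguments are hypothetical lists of per-variant CADD scores.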
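Panel c places each ORF by the mean and SD of its Ribo-Seq expression across samples and draws the median SD of canonical ORFs as a reference line. A minimal Python sketch of that summary follows, assuming each ORF's values are already available as a list of per-sample numbers; the input layout and function names are hypothetical (not the RPFdbV2 format or the authors' pipeline), and "SD at or below the canonical median" is an illustrative cut-off that the caption does not itself define.

```python
import statistics


def mean_sd(values):
    """Mean and sample SD of one ORF's Ribo-Seq values across samples."""
    return statistics.fmean(values), statistics.stdev(values)


def below_canonical_median_sd(canonical, novel):
    """Return the median SD of canonical ORFs and the novel ORFs at or below it.

    `canonical` and `novel` map ORF IDs to per-sample expression values
    (at least two values each, since statistics.stdev needs two points).
    The "at or below the median" cut-off is an illustrative assumption.
    """
    canonical_sds = [mean_sd(values)[1] for values in canonical.values()]
    median_sd = statistics.median(canonical_sds)
    quiet = sorted(
        orf_id for orf_id, values in novel.items() if mean_sd(values)[1] <= median_sd
    )
    return median_sd, quiet
```

The (mean, SD) pairs returned by `mean_sd` correspond to the coordinates of each dot in such a plot, and `median_sd` to the height of the reference line.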