Figure - PMC



2003 Oct 15;31(20):5877–5885. doi: 10.1093/nar/gkg798


Copyright © 2003 Oxford University Press


Breakdown of ORFs based on Sanger Institute annotation versus LDA-based annotation. The Sanger Institute provides three sub-classifications of hypothetical annotations. ORFs that show sequence-based homology to hypothetical annotations in other organisms are termed ‘conserved hypothetical protein’ (denoted here as ‘Conserved’ or ‘C’). If an ORF is on the opposite strand from the predicted coding strand or has unusual GC composition, it is sometimes labeled ‘hypothetical protein, unlikely’ (based on details provided on the Sanger Institute web pages and EMBL entry); these are shown here as ‘Unlikely’ or ‘U’. Finally, some ORFs are annotated merely as ‘hypothetical protein’, and we refer to these as ‘Predicted’ or ‘P’. For ORFs with a function assignment, we use the term ‘Assigned function’ or ‘A’. Plots on the left side of this figure show the distribution of annotations for ORFs that our method would label as likely coding, while plots on the right side of this figure show ORFs that our method would identify as unlikely to be true coding regions.
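
The four annotation classes in this caption can be illustrated with a minimal sketch. It assumes each ORF carries a free-text product annotation and that the three hypothetical sub-classes can be recognised by the phrases quoted in the caption. The function names `classify_annotation` and `tally_annotations` and the example product strings are hypothetical and are not taken from the paper or from the Sanger Institute annotation files.

```python
from collections import Counter


def classify_annotation(product: str) -> str:
    """Map a product annotation string to the caption's codes C, U, P or A.

    Assumption: matching on the phrases quoted in the caption is sufficient;
    real annotation files may word these labels differently.
    """
    text = product.strip().lower()
    if "hypothetical protein" not in text:
        return "A"  # Assigned function
    if "conserved" in text:
        return "C"  # Conserved hypothetical protein
    if "unlikely" in text:
        return "U"  # Hypothetical protein, unlikely
    return "P"      # Plain 'hypothetical protein' (Predicted)


def tally_annotations(products):
    """Count how many annotations fall into each class."""
    return Counter(classify_annotation(p) for p in products)


if __name__ == "__main__":
    # Made-up example annotations, for illustration only.
    example = [
        "conserved hypothetical protein",
        "hypothetical protein, unlikely",
        "hypothetical protein",
        "putative ABC transporter",
    ]
    print(tally_annotations(example))
```

Running the tally separately on the ORFs called likely coding and on those called unlikely coding would give the two sets of counts that the left and right plots compare.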