Comparative and Functional Genomics. 2003 Jul;4(4):432–441. doi: 10.1002/cfg.311

Unravelling the ORFan Puzzle

Naomi Siew, Daniel Fischer
PMCID: PMC2447361  PMID: 18629076

Abstract

ORFans are open reading frames (ORFs) with no detectable sequence similarity to any other sequence in the databases. Every newly sequenced genome contains a significant number of ORFans, and their abundance poses interesting evolutionary puzzles. However, bioinformatics tools reveal little about them, and their study appears to have been underemphasized. Here we present some of the questions raised by the existence of so many ORFans and review studies aimed at understanding ORFans, their functions and their origins. These studies show that ORFans are an untapped area of research that requires further computational and experimental work.
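The operational definition above (an ORF with no detectable similarity to any other database sequence) can be made concrete with a short sketch. It is not the authors' method. The ATG-only start rule, the forward-strand-only scan, the minimum ORF length, the `(subject_id, evalue)` hit format and the 1e-3 E-value cutoff are all illustrative assumptions, and `find_orfs` and `is_orfan` are hypothetical helper names.

```python
from typing import List, Tuple

# Standard genetic code stop codons; ATG is used as the only start codon here
# (a simplifying assumption: real bacterial genes also use GTG/TTG).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) 0-based, end-exclusive ORF coordinates on the forward strand.

    An ORF runs from an ATG to the first in-frame stop codon, stop included.
    ATGs nested inside an ORF that is already open are not reported separately.
    `min_codons` counts codons including the stop codon (hypothetical default).
    """
    seq = seq.upper()
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


def is_orfan(query_id: str, hits: List[Tuple[str, float]],
             evalue_cutoff: float = 1e-3) -> bool:
    """True if no hit to any *other* sequence reaches the E-value cutoff.

    `hits` holds (subject_id, evalue) pairs from a similarity search such as
    BLAST; the format and the 1e-3 cutoff are illustrative assumptions.
    Self-hits are ignored, since every ORF trivially matches itself.
    """
    return not any(subject != query_id and evalue <= evalue_cutoff
                   for subject, evalue in hits)


if __name__ == "__main__":
    genome = "CCATGAAATTTGGGTAGCC"
    print(find_orfs(genome, min_codons=2))                    # [(2, 17)]
    print(is_orfan("orf1", [("orf1", 0.0), ("geneX", 0.5)]))  # True
    print(is_orfan("orf1", [("geneX", 1e-10)]))               # False
```

Because `is_orfan` only looks at the hits it is given, ORFan status in this sketch is relative to the database searched and the cutoff chosen: the same ORF can stop qualifying once a homolog is added to the database or the threshold is relaxed.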

Articles from Comparative and Functional Genomics are provided here courtesy of Wiley