TABLE 2.
SUBSETS OF GENE–PHENOTYPE NETWORK SHOWING MANUALLY CURATED KNOWLEDGE AND COMPUTED KNOWLEDGE
Gene and Reference Found in GO and in PhenoGO | Biological Process Curated in GO | Phenotypic Context Computed in PhenoGO |
---|---|---|
Nerve growth factor β (Ngfb; MGI: 97321) | Perception of pain (GO: 0019233) | Afferent neuron (UMLS: C0027883) |
Vascular endothelial growth factor C (Vegfc; MGI: 109124) (82) | Morphogenesis of embryonic epithelium (GO: 0016331) | Lymphatic vessel (UMLS: C0229889) |
Definition of abbreviations: GO = Gene Ontology; MGI = Mouse Genome Informatics; UMLS = Unified Medical Language System.
Curated relationships of biological processes are found in GO. Manual curation, a rate-limiting process, is generally considered more accurate than knowledge computed in high throughput. The GO Consortium manually curated more than 1,617,028 annotations of genes to GO codes in the last 5 yr, and the intercurator reliability of curated relationships in GO was estimated at about 93% (83). In contrast, it took about 3 yr to develop BioMedLEE and PhenoGO, a natural language processing system and a text-mining tool, respectively, which together can encode gene–GO–phenotype annotations in high throughput with a precision of 85% (35). The PhenoGO system can now process vast quantities of text within a reasonable time. The PhenoGO database, which contains about 550,000 gene–GO–phenotype annotations, can substantially facilitate whole-genome association research by providing a well-organized, ontology-anchored genome–phenome network mined from the large body of information in biomedical journal articles. These annotations are refined with phenotypic context, such as the cell type, tissue, or organ in which a gene is expressed and carries out a function that is often cell-type specific. This refinement is a crucial step toward understanding development and the molecular underpinnings of embryogenesis, and possibly the pathophysiology of diseases. In the table, the biological process associated with each gene by curation is further refined with a phenotypic context obtained by computation.
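The gene–GO–phenotype structure described above can be illustrated with a minimal Python sketch. It encodes only the two rows of the table as records that pair a curated GO process with a computed UMLS context. The class name `Annotation`, its field names, the helper `contexts_for`, and the compact identifier spelling (for example `GO:0019233`) are assumptions made for illustration; they are not the PhenoGO schema or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One gene-GO-phenotype record (hypothetical layout, not the PhenoGO schema)."""
    gene: str          # gene symbol
    gene_id: str       # MGI accession
    go_id: str         # biological process curated in GO
    go_term: str
    context_id: str    # phenotypic context computed in PhenoGO (UMLS concept)
    context_term: str


# The two rows of the table, transcribed as records.
TABLE_ROWS = [
    Annotation("Ngfb", "MGI:97321", "GO:0019233", "perception of pain",
               "UMLS:C0027883", "afferent neuron"),
    Annotation("Vegfc", "MGI:109124", "GO:0016331",
               "morphogenesis of embryonic epithelium",
               "UMLS:C0229889", "lymphatic vessel"),
]


def contexts_for(gene, annotations):
    """Return (curated process, computed context) pairs recorded for a gene symbol."""
    return [(a.go_term, a.context_term) for a in annotations if a.gene == gene]
```

Calling `contexts_for("Vegfc", TABLE_ROWS)` returns the curated process for that gene together with the phenotypic context that refines it, which is the pairing each table row expresses.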