Figure - PMC

Author manuscript; available in PMC: 2009 Jun 23.

Published in final edited form as: Nat Biotechnol. 2008 Sep;26(9):1011–1013. doi: 10.1038/nbt0908-1011

(a) Each data item is a gene pair associated with a variety of features. Some features are real-valued (such as the chromosomal distance between the genes or the correlation coefficient of their expression profiles under a set of conditions). Other features are categorical (such as whether the proteins co-localize or are annotated with the same function). Only a few training examples are shown. (b) A hypothetical decision tree in which each node contains a yes/no question about a single feature of the data items. An example arrives at a leaf according to the answers to the questions. Pie charts indicate the percentage of interactors (green) and noninteractors (red) among the training examples that reach each leaf. New examples are predicted to interact if they reach a predominantly green leaf, or not to interact if they reach a predominantly red leaf. In practice, random forests have been used to predict protein-protein interactions¹⁵.
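The procedure described in panel (b) can be sketched in a few lines of Python. This is only an illustrative sketch of a single hand-built tree, not the tree from the figure and not a random forest: the feature names (`expression_correlation`, `same_function`), the thresholds, and all nine training examples are invented for the illustration. Each internal node asks one yes/no question about a single feature, each leaf stores the counts of interactors and noninteractors that reach it (the "pie chart"), and a new gene pair is predicted to interact if its leaf is mostly interactors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# A gene pair is a mapping from feature name to a real or categorical value.
Example = Dict[str, Union[float, bool]]


@dataclass
class Leaf:
    """Counts of training examples reaching this leaf (the 'pie chart')."""
    interactors: int = 0
    noninteractors: int = 0

    def fraction_interacting(self) -> float:
        total = self.interactors + self.noninteractors
        return self.interactors / total if total else 0.0


@dataclass
class Node:
    """An internal node: one yes/no question about a single feature."""
    question: Callable[[Example], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def find_leaf(tree: Union[Node, Leaf], example: Example) -> Leaf:
    """Follow the answers to the questions until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(example) else tree.no
    return tree


def fit_leaf_counts(tree: Node, training: List[Tuple[Example, bool]]) -> None:
    """Route each labeled training example to its leaf and tally the label."""
    for example, interacts in training:
        leaf = find_leaf(tree, example)
        if interacts:
            leaf.interactors += 1
        else:
            leaf.noninteractors += 1


def predict(tree: Node, example: Example) -> bool:
    """Predict 'interacts' if the leaf reached is mostly interactors."""
    return find_leaf(tree, example).fraction_interacting() > 0.5


# Hypothetical tree: thresholds and structure are assumptions, not from the figure.
tree = Node(
    question=lambda e: e["expression_correlation"] > 0.5,  # real-valued feature
    yes=Node(
        question=lambda e: e["same_function"],  # categorical feature
        yes=Leaf(),
        no=Leaf(),
    ),
    no=Leaf(),
)

# Invented toy training set: (features, known to interact?).
training = [
    ({"expression_correlation": 0.9, "same_function": True}, True),
    ({"expression_correlation": 0.8, "same_function": True}, True),
    ({"expression_correlation": 0.7, "same_function": True}, False),
    ({"expression_correlation": 0.6, "same_function": False}, False),
    ({"expression_correlation": 0.9, "same_function": False}, True),
    ({"expression_correlation": 0.55, "same_function": False}, False),
    ({"expression_correlation": 0.1, "same_function": True}, False),
    ({"expression_correlation": 0.2, "same_function": False}, False),
    ({"expression_correlation": -0.3, "same_function": False}, True),
]

fit_leaf_counts(tree, training)

new_pair = {"expression_correlation": 0.75, "same_function": True}
print(predict(tree, new_pair))
```

In this sketch the tree structure is fixed by hand and only the leaf counts come from the data. A real decision-tree learner would also choose the questions from the training examples, and a random forest would combine the votes of many such trees.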