Author manuscript; available in PMC: 2011 Jun 6.
Published in final edited form as: Toxicol Appl Pharmacol. 2009 Apr 9;237(3):317–330. doi: 10.1016/j.taap.2009.04.002

Figure 5.


Representative subset identification using joint entropy analysis. In panel A, subsets of cytokine co-treatments that maximally maintained the diversity of the full experimental human hepatocyte (HH) data set were identified by exhaustively scoring all possible subsets that contained the no-cytokine and single-cytokine/LPS treatment conditions. The treatments contained in the highest-scoring set of each size are indicated by the white boxes. Red bars represent the joint entropy of the maximally informative set. After 16–19 co-treatments are selected, additional co-treatments do not increase the joint entropy, indicating that the diversity of the full data set can be captured with a well-chosen set of 16–19 co-treatments. See Figure S18 for maximum entropy subset plots for the rat hepatocyte (RH) and HepG2 (G2) data sets. In panel B, maximally informative consensus subsets were chosen using only RH data, only G2 data, both RH and G2 data, or only HH data. The mean performance of the top 100 subsets chosen from each cell system, when scored for joint entropy in the HH data, is plotted along with the mean and standard deviation of the joint entropy of all possible subsets in the HH data. Sets chosen based on RH and G2 data still perform well when scored against the HH data. In panel C, a single consensus set of each size was chosen from each cell system and then scored for joint entropy in the HH data. The probability of randomly choosing a subset with higher joint entropy is plotted as a function of set size. Low values indicate that it is unlikely to randomly select a set with higher information content than the evaluated set. The dashed line represents the average of all possible subsets.
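
The scoring described in this caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that responses have already been discretized into levels, that each row is one observation and each column one treatment, and that random comparison subsets must contain the same required treatments as the evaluated set. The function names (`joint_entropy`, `best_subset`, `prob_random_better`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(data, subset):
    """Joint Shannon entropy (bits) of the treatment columns in `subset`.

    Assumption: `data` is a list of rows, each row holding one discretized
    response level per treatment (column).
    """
    patterns = Counter(tuple(row[j] for j in subset) for row in data)
    total = sum(patterns.values())
    return -sum((c / total) * log2(c / total) for c in patterns.values())


def candidate_subsets(n_cols, size, required=()):
    """Yield every subset of `size` columns that contains all `required` columns."""
    optional = [j for j in range(n_cols) if j not in required]
    for combo in combinations(optional, size - len(required)):
        yield tuple(required) + combo


def best_subset(data, size, required=()):
    """Exhaustively score all candidate subsets and return the highest scoring one."""
    candidates = candidate_subsets(len(data[0]), size, required)
    return max(candidates, key=lambda s: joint_entropy(data, s))


def prob_random_better(data, subset, required=()):
    """Fraction of same-size candidate subsets with strictly higher joint entropy."""
    target = joint_entropy(data, subset)
    scores = [joint_entropy(data, s)
              for s in candidate_subsets(len(data[0]), len(subset), required)]
    return sum(score > target + 1e-12 for score in scores) / len(scores)


# Hypothetical toy data: column 0 is a constant control, column 3 duplicates column 1.
toy = [
    (0, 0, 0, 0), (0, 0, 0, 0), (0, 0, 1, 0), (0, 0, 1, 0),
    (0, 1, 0, 1), (0, 1, 0, 1), (0, 1, 1, 1), (0, 1, 1, 1),
]

chosen = best_subset(toy, size=3, required=(0,))
print(chosen, joint_entropy(toy, chosen))
print(prob_random_better(toy, (0, 1, 3), required=(0,)))
```

In the toy data, adding the redundant column 3 to a set that already contains column 1 leaves the joint entropy unchanged, which mirrors the plateau in panel A. Exhaustive enumeration grows combinatorially with the number of treatments, so this sketch is only practical for small candidate pools.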