Figure 2. Identification of thresholds for gene expression biomarkers from the TG-GATES study.
A. Distribution of values across training and test sets derived from the TG-GATES study. Thresholds were determined as detailed in the Methods. Each of the microarray comparisons (biosets) was compared to one of the six biomarkers, yielding a -Log(p-value) for the correlation. The -Log(p-value) values were rank ordered and divided into liver tumorigenic (red) and non-tumorigenic (blue) groups for the training and test sets. Horizontal lines represent the derived thresholds. The bioset number refers to the rank of the bioset when ordered by -Log(p-value).
B. Box-and-whisker plots of -Log(p-value) values from the non-tumorigenic conditions. Left, values from the TG-GATES training set. Right, values from the TG-GATES test set. The three values deemed outliers in the training set are circled and were not used to determine the thresholds. One test-set value for the CAR biomarker was also considered an outlier.

