2017 Mar 21;19:32. doi: 10.1186/s13058-017-0824-7

Fig. 3.


Generation and validation of the absolute inference of patient signatures (AIPS) models. a Pipeline used for the development of AIPS: (1) using our curated list of 6466 gene signatures, we used the region of independence (ROI95) to obtain assignments in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset; (2) using a cutoff of 80% for the percentage of independent patients, we selected the informative gene signatures (GS); (3) we obtained a list of 3472 informative gene signatures; (4) using the 4510 samples in our training set and the 3472 informative gene signatures, we obtained the gold standard assignments for the training set; (5) using an approach similar to absolute intrinsic molecular subtyping (AIMS) (Paquet et al.), we trained 3472 absolute models mimicking the 3472 informative gene signatures; (6) we selected the final list of models that constitute AIPS by requiring significant agreement with the ROI95 assignments in all the individual datasets present in the training set; and (7) we validated the final list of 1733 AIPS models in the validation set. b Distributions of the kappa statistics for the 1733 selected models forming AIPS (green) and the 1739 models not forming AIPS (gray) in the entire training set (using the median of the individual training sets), the individual training sets, and the validation set. c Heatmaps depicting the percentage of samples of a given ROI95 class (e.g. low, independent (ind.), or high) assigned to another class by AIPS in the training and validation sets. d Number of genes used in the ROI95 versus the AIPS models. e Example of the ROI95 for an epidermal growth factor receptor (EGFR) signature from MSigDB in the McGill validation dataset. AIPS assignments are presented at the top of the heatmap.
f Heatmap, ordered using Euclidean distance and Ward’s linkage method, presenting the different rules used in the AIPS-EGFR models (red means the rule is true and white means the rule is false). Underlined genes in rules marked by a star are enriched in genes upregulated by EGFR in MCF7 cell lines [22]. g Confusion matrix representing the agreement between the single-sample AIPS-EGFR model and the whole-cohort ROI95 assignments. h Confusion matrix representing the agreement between the AIPS-EGFR assignments obtained from the same RNA extraction profiled on different platforms (RNA sequencing (RNA-seq) versus microarray). i Boxplots depicting the distribution of the percentage of agreement for the AIPS partitions performed on The Cancer Genome Atlas (TCGA) samples profiled on both microarray and RNA-seq. We also present a background distribution generated by shuffling the labels 100 times. ER estrogen receptor, HER2 human epidermal growth factor receptor 2, PAM50 prediction analysis of microarray 50
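
The agreement measures named in panels b, g, and i (kappa statistic, confusion matrix, and percentage of agreement against a shuffled-label background) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the class labels `low`/`ind.`/`high` as Python strings, and the toy assignments are assumptions made for illustration, and the kappa shown is the standard unweighted Cohen's kappa, which the caption does not explicitly specify.

```python
import random
from collections import Counter

# Assumed three-class labelling, following the caption's low / ind. / high.
CLASSES = ("low", "ind.", "high")


def confusion_matrix(reference, predicted, classes=CLASSES):
    """Rows are reference classes, columns are predicted classes."""
    counts = Counter(zip(reference, predicted))
    return [[counts[(r, p)] for p in classes] for r in classes]


def percent_agreement(reference, predicted):
    """Percentage of samples given the same class by both assignments."""
    matches = sum(r == p for r, p in zip(reference, predicted))
    return 100.0 * matches / len(reference)


def cohen_kappa(reference, predicted, classes=CLASSES):
    """Unweighted Cohen's kappa: (observed - expected) / (1 - expected)."""
    n = len(reference)
    matrix = confusion_matrix(reference, predicted, classes)
    observed = sum(matrix[i][i] for i in range(len(classes))) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        # Degenerate case (every sample in one class for both assignments);
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def shuffled_background(reference, predicted, n_shuffles=100, seed=0):
    """Percent agreement after shuffling the predicted labels n_shuffles times."""
    rng = random.Random(seed)
    shuffled = list(predicted)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        background.append(percent_agreement(reference, shuffled))
    return background


if __name__ == "__main__":
    # Toy assignments invented for illustration only.
    roi95 = ["low", "low", "ind.", "ind.", "ind.", "high", "high", "high"]
    aips = ["low", "ind.", "ind.", "ind.", "ind.", "high", "high", "ind."]
    print(confusion_matrix(roi95, aips))
    print(percent_agreement(roi95, aips))
    print(cohen_kappa(roi95, aips))
    print(sorted(shuffled_background(roi95, aips))[:5])
```

Comparing the observed percentage of agreement with the shuffled values gives a sense of how much agreement would be expected by chance alone, which is the role the background distribution plays in panel i.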