Table 1. Experimental setup.
Features | n | nopt | Training | Validation | Validation |
Intra/Cross-lab Validation | | | | Intra1 | Cross1 |
Genes | 10962 | 48 | V1 | V2 | V2+W2 |
BC (V1) | 747 | 44 | V1 | V2 | V2+W2 |
BCC (V1+W1+So) | 911 | 66 | V1 | V2 | V2+W2 |
HCC (Se) | 1163 | 111 | V1 | V2 | V2+W2 |
S456 (Se) | 456 | 80 | V1 | V2 | V2+W2 |
Inter-lab Validation | | | | Inter1 | |
Genes | 10962 | 21 | V | W | |
BC (V) | 896 | 55 | V | W | |
BCC (V+So) | 934 | 137 | V | W | |
HCC (Se) | 1163 | 104 | V | W | |
S456 (Se) | 456 | 42 | V | W | |
Intra/Cross-lab Validation | | | | Intra2 | Cross2 |
Genes | 10962 | 101 | W1 | W2 | V2+W2 |
BC (W1) | 576 | 59 | W1 | W2 | V2+W2 |
BCC (V1+W1+So) | 911 | 103 | W1 | W2 | V2+W2 |
HCC (Se) | 1163 | 71 | W1 | W2 | V2+W2 |
S456 (Se) | 456 | 67 | W1 | W2 | V2+W2 |
Inter-lab Validation | | | | Inter2 | |
Genes | 10962 | 58 | W | V | |
BC (W) | 704 | 17 | W | V | |
BCC (W+So) | 762 | 33 | W | V | |
HCC (Se) | 1163 | 78 | W | V | |
S456 (Se) | 456 | 10 | W | V | |
Our experimental setup validates the classifiers on data from the same institution (Intra1 and Intra2), on data from both the same and another institution (Cross1 and Cross2), and on data from another institution only (Inter1 and Inter2). In all cases the training and validation sets do not overlap and are therefore independent. Moreover, the validation data were not used in the first step, in which the unsupervised approach extracts the modules. Each validation scheme includes a gene-based classifier (Genes) and several module-based classifiers (BC, BCC, HCC, and S456). For each module-based classifier we indicate the datasets from which the modules were extracted (Features column, in parentheses), the number of features (n), and the optimal number of modules or genes selected by the train/test protocol (nopt). The Training column gives the dataset on which the train/test protocol was run, and the Validation columns give the datasets used to validate the classifiers. Datasets are abbreviated as follows: V [4], W [3], So [2], and Se [10]. When a dataset is split into two equal, independent parts, subscripts indicate the training (1) and validation (2) parts.
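The six validation schemes in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each dataset is represented as a list of sample identifiers, the split into parts 1 and 2 is a plain random split (the caption does not say how the split was made), and the names `split_in_two` and `build_schemes`, the seed, and the sample identifiers are hypothetical.

```python
import random


def split_in_two(samples, seed=0):
    """Randomly split sample IDs into two disjoint halves (part 1, part 2).

    Assumption: a plain random split; the original split procedure is not
    described in the caption.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def build_schemes(v, w, seed=0):
    """Map each scheme in Table 1 to its (training, validation) sample lists."""
    v1, v2 = split_in_two(v, seed)
    w1, w2 = split_in_two(w, seed)
    return {
        "Intra1": (v1, v2),       # train on V1, validate on V2
        "Cross1": (v1, v2 + w2),  # train on V1, validate on V2+W2
        "Inter1": (v, w),         # train on V,  validate on W
        "Intra2": (w1, w2),       # train on W1, validate on W2
        "Cross2": (w1, v2 + w2),  # train on W1, validate on V2+W2
        "Inter2": (w, v),         # train on W,  validate on V
    }


# Hypothetical sample identifiers, used only to exercise the sketch.
v = [f"V{i}" for i in range(10)]
w = [f"W{i}" for i in range(8)]
schemes = build_schemes(v, w)

for name, (train, valid) in schemes.items():
    # Training and validation sets must never share a sample.
    assert not set(train) & set(valid), name
    print(name, len(train), len(valid))
```

The disjointness check mirrors the caption's statement that training and validation sets never overlap. The module-extraction step is not modelled here.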