PLoS ONE. 2019 Feb 20;14(2):e0198921. doi: 10.1371/journal.pone.0198921

Table 2. Descriptions of the four balancing procedures.

Training and test set sizes are given as numbers of data samples.

Dataset 1
Training set: 975,036 samples
Test set: 193,528 samples
Class balancing: Tomek links removed from the full dataset before the training/test split; class 3 randomly undersampled after the split; SMOTE then applied to classes 1 and 2 to raise their cardinalities to that of class 3 (325,012).

Dataset 2
Training set: 2,293,119 samples
Test set: 201,926 samples
Class balancing: SMOTE applied to classes 1 and 2 to raise their cardinalities to that of class 3 (764,373).

Dataset 3
Training set: 487,464 samples
Test set: 106,028 samples
Class balancing: Tomek links removed from the full dataset before the training/test split; class 3 randomly undersampled after the split; SMOTE then applied to classes 1 and 2 to raise their cardinalities to that of class 3 (162,488).

Dataset 4
Training set: 1,462,503 samples
Test set: 281,028 samples
Class balancing: Tomek links removed from the full dataset before the training/test split; class 3 randomly undersampled after the split; SMOTE then applied to classes 1 and 2 to raise their cardinalities to that of class 3 (487,501).
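The balancing procedure used for datasets 1, 3 and 4 (Tomek link removal before the split, random undersampling of class 3 after the split, then SMOTE on classes 1 and 2) can be sketched with the imbalanced-learn library as below. This is a minimal illustration, not the authors' code: the variable names, the `balance` function, the column layout of `X`/`y`, and the stratified split are assumptions.

```python
# Sketch of the Tomek + undersampling + SMOTE procedure (datasets 1, 3, 4).
# Assumes features X and integer labels y with classes 1, 2 and 3;
# class3_target is the per-class training cardinality reported in Table 2
# (e.g. 325,012 for dataset 1). Illustrative only.
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import TomekLinks, RandomUnderSampler
from imblearn.over_sampling import SMOTE

def balance(X, y, class3_target, test_size, random_state=0):
    # 1) Remove Tomek links from the full dataset, before splitting.
    X, y = TomekLinks(sampling_strategy="auto").fit_resample(X, y)

    # 2) Split into training and test sets.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state)

    # 3) Randomly undersample class 3 in the training set only.
    rus = RandomUnderSampler(sampling_strategy={3: class3_target},
                             random_state=random_state)
    X_tr, y_tr = rus.fit_resample(X_tr, y_tr)

    # 4) SMOTE classes 1 and 2 up to the cardinality of class 3.
    smote = SMOTE(sampling_strategy={1: class3_target, 2: class3_target},
                  random_state=random_state)
    X_tr, y_tr = smote.fit_resample(X_tr, y_tr)
    return X_tr, X_te, y_tr, y_te
```

Dataset 2 differs only in that steps 1 and 3 are skipped: SMOTE alone raises classes 1 and 2 to the size of class 3 (764,373) in the training set.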