2020 Sep 22;20:241. doi: 10.1186/s12911-020-01265-0

Table 2.

Dataset changes due to chart review and data preprocessing

| Process | Step | Variables (+ Target Classes) | Patients (N) |
|---|---|---|---|
| First CRC Dataset | | 142 (+ 1) | 1511 |
| Chart Review | 1) Check extraction method and location | 142 (+ 1) | 1508 |
| | 2) Check for inappropriate data | 142 (+ 1) | 1496 |
| | 3) Select priority variables (First Processed CRC Dataset) | 40 (+ 1) | 1496 |
| Data Preprocessing | 1) Drop redundant variables | 37 (+ 1) | 1496 |
| | 2) Drop variables including 90% ↑ missing values | 32 (+ 1) | 1496 |
| | 3) Drop instances containing missing values | 32 (+ 1) | 1169 |
| | 4) One-hot encoding (Final CRC Dataset) | 54 (+ 5) | 1169 |
| Data Split | 1) Data split (training/testing) | 54 (+ 5) | 935 / 234 |
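
The preprocessing and split steps in Table 2 can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code. The function names, the toy records, and the use of `None` for a missing value are all hypothetical. The sketch reads "90% ↑" as "at or above 90%", which is an assumption. The 0.8 training fraction is also an assumption, inferred from the 935 / 234 counts rather than stated in the table.

```python
import random

MISSING = None  # assumption: a missing value is represented as None


def drop_sparse_columns(rows, threshold=0.9):
    """Drop columns whose share of missing values is at or above `threshold`."""
    if not rows:
        return rows
    n = len(rows)
    keep = [
        col for col in rows[0]
        if sum(r[col] is MISSING for r in rows) / n < threshold
    ]
    return [{col: r[col] for col in keep} for r in rows]


def drop_incomplete_rows(rows):
    """Drop every instance that still contains a missing value."""
    return [r for r in rows if all(v is not MISSING for v in r.values())]


def one_hot_encode(rows, categorical):
    """Replace each categorical column with one 0/1 column per observed level."""
    levels = {c: sorted({r[c] for r in rows}) for c in categorical}
    encoded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k not in categorical}
        for c in categorical:
            for level in levels[c]:
                out[f"{c}={level}"] = int(r[c] == level)
        encoded.append(out)
    return encoded


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records; the variable names are illustrative only.
toy = [
    {"age": 61, "stage": "II", "marker": None, "sex": "F"},
    {"age": 70, "stage": "III", "marker": None, "sex": "M"},
    {"age": None, "stage": "I", "marker": None, "sex": "F"},
    {"age": 55, "stage": "II", "marker": None, "sex": "M"},
]

dense = drop_sparse_columns(toy)        # "marker" is entirely missing, so it is dropped
complete = drop_incomplete_rows(dense)  # the record with a missing age is dropped
final = one_hot_encode(complete, ["stage", "sex"])

# Splitting 1169 placeholder instances with the assumed 0.8 fraction.
train, test = split_rows(list(range(1169)))
```

One-hot encoding widens the dataset without changing the number of instances, which matches the move from 32 (+ 1) to 54 (+ 5) columns at a constant 1169 patients. With the assumed 0.8 fraction, 1169 instances divide into 935 and 234, the same counts as the Data Split row.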