2020 May 25;7:154. doi: 10.1038/s41597-020-0495-6

Table 2.

Columns provided in the metadata table ptbxl_database.csv.

| Section | Variable | Data Type | Description |
|---|---|---|---|
| Identifiers | ecg_id | integer | unique ECG identifier |
| Identifiers | patient_id | integer | unique patient identifier |
| Identifiers | filename_lr | string | path to waveform data (100 Hz) |
| Identifiers | filename_hr | string | path to waveform data (500 Hz) |
| General Metadata | age | integer | age at recording in years (see Fig. 3 left) |
| General Metadata | sex | categorical | sex (male 0, female 1) |
| General Metadata | height | integer | height in centimeters (see Fig. 3 right) |
| General Metadata | weight | integer | weight in kilograms (see Fig. 3 middle) |
| General Metadata | nurse | categorical | involved nurse (pseudonymized) |
| General Metadata | site | categorical | recording site (pseudonymized) |
| General Metadata | device | categorical | recording device |
| General Metadata | recording_date | datetime | ECG recording date and time |
| ECG Statements | report | string | ECG report from diagnosing cardiologist |
| ECG Statements | scp_codes | dictionary | SCP ECG statements (see Tables 6, 7 and 8) |
| ECG Statements | heart_axis | categorical | heart’s electrical axis (see Table 10) |
| ECG Statements | infarction_stadium1 | categorical | infarction stadium (see Table 11) |
| ECG Statements | infarction_stadium2 | categorical | second infarction stadium (see Table 11) |
| ECG Statements | validated_by | categorical | validating cardiologist (pseudonymized) |
| ECG Statements | second_opinion | boolean | flag for second (deviating) opinion |
| ECG Statements | initial_autogenerated_report | boolean | initial autogenerated report by ECG device |
| ECG Statements | validated_by_human | boolean | validated by human |
| Signal Metadata | baseline_drift | string | baseline drift or jump present |
| Signal Metadata | static_noise | string | electric hum/static noise present |
| Signal Metadata | burst_noise | string | burst noise |
| Signal Metadata | electrodes_problems | string | electrodes problems |
| Signal Metadata | extra_beats | string | extra beats |
| Signal Metadata | pacemaker | string | pacemaker |
| Cross-validation Folds | strat_fold | integer | suggested stratified folds |

Each ECG is identified by a unique ID (ecg_id) and comes with a number of ECG statements (scp_codes). These statements can be used to train a multi-label classifier, which can then be evaluated using the proposed fold assignments (strat_fold).
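As a rough illustration of how these three columns fit together, the sketch below parses a few metadata rows, turns scp_codes into multi-hot label vectors, and holds out one fold for evaluation. Several things here are assumptions rather than part of the table above: that scp_codes is serialized as a Python dict literal, that fold 10 is the held-out fold, and the helper names (`load_metadata`, `multi_hot`, `split_by_fold`). The sample rows are invented for illustration and are not real records; in practice the handle would be the opened ptbxl_database.csv.

```python
import ast
import csv
import io

# Illustrative rows only (not real records); the real file has many more columns.
SAMPLE_CSV = '''ecg_id,scp_codes,strat_fold
1,"{'NORM': 100.0, 'SR': 0.0}",3
2,"{'IMI': 50.0}",10
3,"{'NORM': 80.0}",9
'''


def load_metadata(handle):
    """Read metadata rows, converting ids and folds to int and scp_codes to a dict."""
    records = []
    for row in csv.DictReader(handle):
        records.append({
            "ecg_id": int(row["ecg_id"]),
            # Assumption: scp_codes is stored as a Python dict literal string.
            "scp_codes": ast.literal_eval(row["scp_codes"]),
            "strat_fold": int(row["strat_fold"]),
        })
    return records


def multi_hot(records, classes):
    """Encode each record's statements as a 0/1 vector over `classes`."""
    return [[1 if c in r["scp_codes"] else 0 for c in classes] for r in records]


def split_by_fold(records, test_fold):
    """Split records into (train, test) by the suggested fold assignment."""
    train = [r for r in records if r["strat_fold"] != test_fold]
    test = [r for r in records if r["strat_fold"] == test_fold]
    return train, test


records = load_metadata(io.StringIO(SAMPLE_CSV))
# The label vocabulary is every statement code seen in the data, in sorted order.
classes = sorted({code for r in records for code in r["scp_codes"]})
labels = multi_hot(records, classes)
# Assumption: fold 10 is held out for testing.
train, test = split_by_fold(records, test_fold=10)
```

This sketch marks a class as present whenever its code appears in the dictionary and ignores the numeric value attached to each code; a real pipeline might threshold on that value or aggregate codes into broader classes first.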